# Norwegian RAG Chatbot

A Retrieval-Augmented Generation (RAG) chatbot with Norwegian language support, built on Hugging Face's Inference API.

## Features

- **Norwegian Language Support**: Uses Norwegian-capable language models such as NorMistral, Viking, and NorskGPT
- **Document Processing**: Upload and process documents in PDF, TXT, and HTML formats
- **RAG Implementation**: Retrieves relevant context from uploaded documents and passes it to the model when generating a response
- **Embeddable Interface**: Embed the chatbot in any website using an iframe or a JavaScript widget
- **Lightweight Architecture**: Uses Hugging Face's Inference API instead of running models locally

## Architecture

The chatbot keeps local work to a minimum and relies on Hugging Face's hosted models:

1. **Document Processing**: Documents are processed locally: text is extracted and split into chunks
2. **Embedding Generation**: Document chunks are embedded using Hugging Face's Inference API
3. **Retrieval**: When a query is received, the most relevant document chunks are retrieved
4. **Response Generation**: The LLM generates a response based on the retrieved context
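
The retrieval and response-generation steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `toy_embed` is a hypothetical bag-of-words stand-in for the hosted embedding model so the example runs offline, and the function names and the Norwegian prompt template are assumptions.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks ranked by similarity to the query."""
    query_vec = toy_embed(query)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, toy_embed(chunk)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Combine the retrieved context and the question into a single prompt (assumed template)."""
    context = "\n\n".join(context_chunks)
    return f"Kontekst:\n{context}\n\nSpørsmål: {question}\nSvar:"


chunks = [
    "Oslo er hovedstaden i Norge.",
    "Bergen er kjent for regn og fjell.",
    "Tromsø ligger nord for polarsirkelen.",
]
question = "Hva er hovedstaden i Norge?"
top_chunks = retrieve(question, chunks, top_k=1)
prompt = build_prompt(question, top_chunks)
```

In a real deployment the count vectors would be replaced by dense embeddings returned by the Inference API, and `prompt` would be sent to the selected language model.
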
## Getting Started

### Prerequisites

- Python 3.10+
- A Hugging Face account (for API access)

### Installation

1. Clone the repository:
```bash
git clone https://huggingface.co/spaces/username/norwegian-rag-chatbot
cd norwegian-rag-chatbot
```
2. Install dependencies:
```bash
pip install -r requirements-ultra-light.txt
```
3. Set up your Hugging Face API key:
```bash
export HF_API_KEY="your_api_key_here"
```
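
On the Python side, the exported key might be read along these lines. This is a sketch under assumptions: the helper names `get_hf_api_key` and `auth_headers` are hypothetical and not taken from the project; only the `HF_API_KEY` variable name comes from the step above. Failing early with a clear message avoids confusing authentication errors later.

```python
import os
from typing import Mapping, Optional


def get_hf_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key from the environment, or raise a clear error if it is missing."""
    env = os.environ if env is None else env
    key = env.get("HF_API_KEY", "").strip()
    if not key:
        raise RuntimeError("HF_API_KEY is not set; export it before starting the chatbot.")
    return key


def auth_headers(api_key: str) -> dict[str, str]:
    """Build the bearer-token header used for authenticated Hugging Face API requests."""
    return {"Authorization": f"Bearer {api_key}"}
```
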
### Running the Chatbot

```bash
python src/main.py
```

The chatbot will be available at http://localhost:7860

## Usage

### Chat Interface

The main chat interface allows you to:

- Ask questions in Norwegian
- Receive responses based on your uploaded documents
- Adjust temperature and other settings
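
To give a feel for what the temperature setting does, the sketch below applies temperature scaling to a toy set of token scores. It assumes the hosted model uses temperature in the usual way (dividing logits before the softmax); the function name and the example logits are illustrative only.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
sharp = softmax_with_temperature(logits, 0.5)  # low temperature: top token dominates
flat = softmax_with_temperature(logits, 2.0)   # high temperature: flatter distribution
```

Lower values concentrate probability on the most likely token, giving more predictable answers; higher values spread it out, giving more varied ones.
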
### Document Upload

You can upload documents to provide context for the chatbot:

- Supported formats: PDF, TXT, HTML
- Documents are automatically processed and indexed
- The chatbot will use these documents to provide more accurate responses
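
As a rough illustration of the processing step, extracted text might be split into overlapping chunks before indexing. This is a sketch, not the project's implementation: the function name, the word-based splitting, and the default sizes are all assumptions.

```python
def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


sample = " ".join(f"ord{i}" for i in range(10))
chunks = split_into_chunks(sample, chunk_size=4, overlap=1)
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk.
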
### Embedding

You can embed the chatbot in your website using:

- An iframe
- The JavaScript widget
- A direct link
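
For the iframe option, the markup could be generated with a small helper like the one below. The helper name, the default dimensions, and the example URL are hypothetical; substitute the address where your chatbot is actually served.

```python
from html import escape


def iframe_snippet(chatbot_url: str, width: str = "100%", height: str = "600") -> str:
    """Return an HTML iframe tag that embeds the chatbot at the given URL."""
    return (
        f'<iframe src="{escape(chatbot_url, quote=True)}" '
        f'width="{escape(width, quote=True)}" height="{escape(height, quote=True)}" '
        'frameborder="0" title="Norwegian RAG Chatbot"></iframe>'
    )


# Placeholder URL; replace it with your own deployment.
snippet = iframe_snippet("https://username-norwegian-rag-chatbot.hf.space")
```

Escaping the attribute values keeps the markup valid even when the URL contains characters such as `&` or quotes.
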
## Deployment

The chatbot is designed to be deployed to Hugging Face Spaces:

1. Create a new Space on Hugging Face
2. Upload the code to the Space
3. Set the `HF_API_KEY` secret in the Space settings
4. The Space will automatically build and deploy the chatbot

## Models

The chatbot can use various Norwegian language models:

- **NorMistral-7b-scratch**: A large Norwegian language model pretrained from scratch
- **Viking 7B**: A multilingual model for Nordic languages
- **NorskGPT**: A Norwegian language model based on Mistral or Llama 2

For embeddings, it uses:

- **NbAiLab/nb-sbert-base**: A Norwegian sentence embedding model

## License

This project is licensed under the MIT License; see the LICENSE file for details.

## Acknowledgements

- [Hugging Face](https://huggingface.co/) for hosting the models and providing the Inference API
- [Gradio](https://gradio.app/) for the web interface framework
- The creators of the Norwegian language models used in this project
---
name: norwegian-rag-chatbot
title: Norwegian RAG Chatbot
emoji: 🇳🇴
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 4.0.0
app_file: src/main.py
pinned: true
license: mit