Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Smart Research Assistant

A document-aware AI assistant for research summarization, question-answering, and logic-based question generation, built with FastAPI (backend) and Streamlit (frontend) and deployed as a single Docker container.

---

## Features

- **Document Upload:** Upload PDF or TXT documents for processing.
- **Auto Summary:** Generate concise (≤150 words) summaries of uploaded documents.
- **Ask Anything:** Ask free-form questions and receive answers grounded in the uploaded document.
- **Challenge Me:** Generate three logic-based or comprehension-focused questions from the document.
- **Evaluate Responses:** Evaluate user answers to logic-based questions with feedback and justification.
- **Session Management:** Each user session is tracked for document and interaction history.
- **Vector Database Integration:** Uses Pinecone for semantic search and retrieval.
- **Clean UI:** Intuitive web interface for uploading, querying, and interacting with documents.

---

## Architecture

- **Backend:** FastAPI (Python)
  - Handles document upload, storage, and retrieval.
  - Implements endpoints for Q&A, question generation, and answer evaluation.
  - Uses SQLite for session/document management and Pinecone for vector search.
- **Frontend:** Streamlit (Python)
  - Provides a web interface for users to upload documents, ask questions, and receive feedback.
  - Communicates with the backend via HTTP requests.
- **Vector Database:** Pinecone
  - Stores document embeddings for semantic search and retrieval.
- **Deployment:** Single Docker container with both backend and frontend services.
  - FastAPI runs on port 8000 (internal).
  - Streamlit runs on port 7860 (exposed to users).
  - No Nginx or reverse proxy required for a minimal setup.

---

## Setup

### Requirements

- **Docker**
- **Pinecone API key** (for vector search)
- **OpenAI API key** (for LLM inference)

---

### 1. Clone the Repository

```
git clone https://github.com/m-umar-j/smart-research-assistant.git
cd smart-research-assistant
```

---

### 2. Environment Variables

Create a `.env` file in the project root with the following variables:

```
OPENAI_API_KEY=your_openai_key
PINECONE_API_KEY=your_pinecone_key
```

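With the keys in place, the backend can fail fast at startup instead of erroring mid-request. A minimal sketch of such a check (the helper name and error message are illustrative, not from this codebase):

```python
import os

REQUIRED_VARS = ("OPENAI_API_KEY", "PINECONE_API_KEY")

def check_required_env(names=REQUIRED_VARS):
    """Return the values of required environment variables, raising if any is unset."""
    missing = [name for name in names if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}
```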
---

### 3. Build and Run with Docker

```
docker build -t smart-research-assistant .
docker run -p 7860:7860 --env-file .env smart-research-assistant
```

- **Port 7860** is exposed for Streamlit.
- **Port 8000** is used internally for FastAPI.

---

### 4. Access the Application

Open your browser to:

http://localhost:7860

---

## Commands

## Commands

- **Start Streamlit and FastAPI (via `start.sh`):**

```
cd /app && uvicorn backend.main:app --host 0.0.0.0 --port 8000 &
cd /app && streamlit run frontend/app.py --server.port=7860 --server.address=0.0.0.0 --browser.gatherUsageStats=false --server.enableXsrfProtection=false
```

---

---

## Technical Details

### Backend

- **FastAPI endpoints:**
  - `/upload-doc`: Upload and index documents (PDF/TXT).
  - `/list-docs`: List documents by session.
  - `/chat`: Answer questions based on uploaded documents.
  - `/challenge-me`: Generate logic-based questions.
  - `/evaluate-response`: Evaluate user answers to logic-based questions.
- **Database:** SQLite (`research_assistant.db`) for session/document storage.
- **Vector Database:** Pinecone for document embeddings and semantic retrieval.

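The SQLite layer can be as small as one table keyed by session ID. A minimal sketch of session/document bookkeeping, assuming a hypothetical schema (the actual tables in `research_assistant.db` may differ):

```python
import sqlite3
import uuid

def init_db(path="research_assistant.db"):
    """Create the documents table if missing and return a connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS documents (
               session_id  TEXT NOT NULL,
               filename    TEXT NOT NULL,
               uploaded_at TEXT DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn

def add_document(conn, session_id, filename):
    """Record an uploaded document against a session."""
    conn.execute(
        "INSERT INTO documents (session_id, filename) VALUES (?, ?)",
        (session_id, filename),
    )
    conn.commit()

def list_documents(conn, session_id):
    """Return filenames uploaded in this session (mirrors /list-docs)."""
    rows = conn.execute(
        "SELECT filename FROM documents WHERE session_id = ?", (session_id,)
    ).fetchall()
    return [r[0] for r in rows]
```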
### Frontend

### Frontend

- **Streamlit UI:**
  - Upload documents.
  - Display summaries.
  - Ask questions and view answers.
  - Generate and answer logic-based questions.
  - View feedback on answers.

### Data Flow

### Data Flow

1. **User uploads a document.**
2. **The document is split, embedded, and indexed in Pinecone.**
3. **User asks questions or requests logic-based questions.**
4. **Backend retrieves relevant document chunks and generates answers/feedback.**
5. **Frontend displays results to the user.**

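Step 2 hinges on splitting the document into chunks before embedding. A minimal sketch of fixed-size chunking with overlap (the sizes and the helper name are illustrative; the splitter actually used in the codebase may differ):

```python
def split_text(text, chunk_size=500, overlap=50):
    """Split text into overlapping character chunks ready for embedding.

    Overlap preserves context across chunk boundaries so a sentence cut
    in half is still retrievable from the neighbouring chunk.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

Each chunk is then embedded and upserted into the Pinecone index, and retrieval at question time returns the top-scoring chunks as LLM context.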
---

## Known Issues & Workarounds

- **File uploads on Hugging Face Spaces:**
  - Disable XSRF protection in Streamlit (`--server.enableXsrfProtection=false`).
  - File uploads may still be restricted by platform security policies.
- **Database permissions:**
  - Ensure `/app` is writable in Docker (handled by `chmod -R 777 /app` in the Dockerfile).
- **Pinecone indexing:**
  - Ensure the Pinecone index exists and the API key is valid.

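The permissions workaround above lives in the Dockerfile. A sketch of the relevant fragment, assuming a standard slim Python base image (the base image and layer order here are illustrative; the actual Dockerfile may differ):

```dockerfile
FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
# Make /app writable so SQLite can create research_assistant.db at runtime
RUN chmod -R 777 /app

EXPOSE 7860
CMD ["bash", "start.sh"]
```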
---

## Folder Structure

```
smart-research-assistant/
├── backend/             # FastAPI backend code
├── frontend/            # Streamlit frontend code
├── .env                 # Environment variables
├── requirements.txt     # Python dependencies
├── Dockerfile           # Docker build file
├── start.sh             # Startup script
└── README.md            # This file
```

---

## Additional Notes

- **Session management:** Each user session is tracked with a unique ID.
- **Vector search:** Chunks of uploaded documents are embedded and indexed in Pinecone for semantic retrieval.
- **LLM integration:** Uses OpenAI's GPT-4 for question-answering and feedback generation.