umar-100 committed on
Commit fdb6191
·
1 Parent(s): 93836c3

completed readme.md

Files changed (1): README.md (+166 −18)
README.md CHANGED
README.md CHANGED
@@ -9,21 +9,169 @@ pinned: false

 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

- # smart-research-assistant
-
- ## TODO:
- - [] pinecone utilis
- - [] RAG component using langchain/langgraph
- - [] database setup
- - [] backend using FastAPI
- - [] frontend using streamlit
- ## Deliverabilities
- - Answer questions that require comprehension and inference
- - Pose logic-based questions to users and evaluate their responses
- - Justify every answer with a reference from the document
- ### Functional Requirements
- - Input file (pdf/txt)
- - 2 modes (a) Ask anything (b) challenge me
- - Auto summary after document upload
- - Streamlit + FastAPI (current stack)
- - Bonus features i.e. state management and context highlighting
+ # Smart Research Assistant
+
+ A document-aware AI assistant for research summarization, question-answering, and logic-based question generation, built with FastAPI (backend) and Streamlit (frontend), deployed as a single Docker container.
+
+ ---
+
+ ## Features
+
+ - **Document Upload:** Upload PDF or TXT documents for processing.
+ - **Auto Summary:** Generate concise (≤150 words) summaries of uploaded documents.
+ - **Ask Anything:** Ask free-form questions and receive answers grounded in the uploaded document.
+ - **Challenge Me:** Generate three logic-based or comprehension-focused questions from the document.
+ - **Evaluate Responses:** Evaluate user answers to logic-based questions with feedback and justification.
+ - **Session Management:** Each user session is tracked for document and interaction history.
+ - **Vector Database Integration:** Uses Pinecone for semantic search and retrieval.
+ - **Clean UI:** Intuitive web interface for uploading, querying, and interacting with documents.
+
+ ---
+
+ ## Architecture
+
+ - **Backend:** FastAPI (Python)
+   - Handles document upload, storage, and retrieval.
+   - Implements endpoints for Q&A, question generation, and answer evaluation.
+   - Uses SQLite for session/document management and Pinecone for vector search.
+ - **Frontend:** Streamlit (Python)
+   - Provides a web interface for users to upload documents, ask questions, and receive feedback.
+   - Communicates with the backend via HTTP requests.
+ - **Vector Database:** Pinecone
+   - Stores document embeddings for semantic search and retrieval.
+ - **Deployment:** Single Docker container with both backend and frontend services.
+   - FastAPI runs on port 8000 (internal).
+   - Streamlit runs on port 7860 (exposed to users).
+   - No Nginx or reverse proxy required for a minimal setup.
+
+ ---
+
+ ## Setup
+
+ ### Requirements
+
+ - **Docker**
+ - **Pinecone API key** (for vector search)
+ - **OpenAI API key** (for LLM inference)
+
+ ---
+
+ ### 1. Clone the Repository
+
+ ```
+ git clone https://github.com/m-umar-j/smart-research-assistant.git
+ cd smart-research-assistant
+ ```
+
+ ---
+
+ ### 2. Environment Variables
+
+ Create a `.env` file in the project root with the following variables:
+
+ ```
+ OPENAI_API_KEY=your_openai_key
+ PINECONE_API_KEY=your_pinecone_key
+ ```
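For reference, values in this format can be loaded into the process environment before the backend starts. Below is a minimal hand-rolled sketch; the project may instead use a library such as python-dotenv, and `load_env_file` is a hypothetical helper, not part of this repository:

```python
import os

def load_env_file(path: str) -> dict:
    """Parse simple KEY=value lines from a .env-style file into a dict."""
    values = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            # Skip blank lines, comments, and lines without '='
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    return values

# Apply to the current process without overwriting variables already set.
if os.path.exists(".env"):
    for key, value in load_env_file(".env").items():
        os.environ.setdefault(key, value)
```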
+
+ ---
+
+ ### 3. Build and Run with Docker
+
+ ```
+ docker build -t smart-research-assistant .
+ docker run -p 7860:7860 --env-file .env smart-research-assistant
+ ```
+
+ - **Port 7860** is exposed for Streamlit.
+ - **Port 8000** is used internally for FastAPI.
+
+ ---
+
+ ### 4. Access the Application
+
+ Open your browser to:
+
+ ```
+ http://localhost:7860
+ ```
+
+ ---
+
+ ## Commands
+
+ - **Start Streamlit and FastAPI (via `start.sh`):**
+
+ ```
+ cd /app && uvicorn backend.main:app --host 0.0.0.0 --port 8000 &
+ cd /app && streamlit run frontend/app.py --server.port=7860 --server.address=0.0.0.0 --browser.gatherUsageStats=false --server.enableXsrfProtection=false
+ ```
+
+ ---
+
+ ## Technical Details
+
+ ### Backend
+
+ - **FastAPI endpoints:**
+   - `/upload-doc`: Upload and index documents (PDF/TXT).
+   - `/list-docs`: List documents by session.
+   - `/chat`: Answer questions based on uploaded documents.
+   - `/challenge-me`: Generate logic-based questions.
+   - `/evaluate-response`: Evaluate user answers to logic-based questions.
+ - **Database:** SQLite (`research_assistant.db`) for session/document storage.
+ - **Vector Database:** Pinecone for document embeddings and semantic retrieval.
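To illustrate how a client (such as the Streamlit frontend) might call one of these endpoints, here is a standard-library sketch. The payload field names `session_id` and `question` are assumptions for illustration; the actual request schema is defined in the backend code:

```python
import json
from urllib import request

BACKEND_URL = "http://localhost:8000"  # FastAPI's internal port

def build_chat_payload(session_id: str, question: str) -> dict:
    """Assemble the JSON body for a /chat request (field names assumed)."""
    return {"session_id": session_id, "question": question}

def ask(session_id: str, question: str) -> dict:
    """POST a question to /chat and return the decoded JSON response."""
    body = json.dumps(build_chat_payload(session_id, question)).encode()
    req = request.Request(
        f"{BACKEND_URL}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```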
+
+ ### Frontend
+
+ - **Streamlit UI:**
+   - Upload documents.
+   - Display summaries.
+   - Ask questions and view answers.
+   - Generate and answer logic-based questions.
+   - View feedback on answers.
+
+ ### Data Flow
+
+ 1. **User uploads a document.**
+ 2. **Document is split, embedded, and indexed in Pinecone.**
+ 3. **User asks questions or requests logic-based questions.**
+ 4. **Backend retrieves relevant document chunks and generates answers/feedback.**
+ 5. **Frontend displays results to the user.**
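Step 2 above (splitting before embedding) can be sketched as a simple overlapping-chunk function. This is illustrative only; the project likely uses a LangChain text splitter, and the `chunk_size`/`overlap` values here are arbitrary examples:

```python
def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character chunks ready for embedding."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Advance by chunk_size minus overlap so adjacent chunks share context
        start += chunk_size - overlap
    return chunks
```

Each chunk would then be embedded and upserted into the Pinecone index keyed by the session's document.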
+
+ ---
+
+ ## Known Issues & Workarounds
+
+ - **File uploads on Hugging Face Spaces:**
+   - Disable XSRF protection in Streamlit (`--server.enableXsrfProtection=false`).
+   - File uploads may still be restricted by platform security policies.
+ - **Database permissions:**
+   - Ensure `/app` is writable in Docker (handled by `chmod -R 777 /app` in the Dockerfile).
+ - **Pinecone indexing:**
+   - Ensure the Pinecone index exists and the API key is valid.
+
+ ---
+
+ ## Folder Structure
+
+ ```
+ smart-research-assistant/
+ ├── backend/           # FastAPI backend code
+ ├── frontend/          # Streamlit frontend code
+ ├── .env               # Environment variables
+ ├── requirements.txt   # Python dependencies
+ ├── Dockerfile         # Docker build file
+ ├── start.sh           # Startup script
+ └── README.md          # This file
+ ```
+
+ ---
+
+ ## Additional Notes
+
+ - **Session management:** Each user session is tracked with a unique ID.
+ - **Vector search:** Chunks of uploaded documents are embedded and indexed in Pinecone for semantic retrieval.
+ - **LLM integration:** Uses OpenAI's GPT-4 for question-answering and feedback generation.
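A unique session ID of the sort described above could be minted with Python's `uuid` module; this is an assumption for illustration, as the repository's actual ID scheme is not documented here:

```python
import uuid

def new_session_id() -> str:
    """Return a unique id used to key a session's documents and history."""
    return str(uuid.uuid4())
```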
+
+ ---