Omartificial-Intelligence-Space committed (verified)
Commit 35d7319 · Parent: 2812965

Create README.md

Files changed: README.md (+266 −6)
Previous README (removed):

```yaml
---
title: Context Caching Gemini Pdf Qa
emoji: 🐢
colorFrom: purple
colorTo: gray
sdk: docker
pinned: false
---
```

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
New README:

---
license: apache-2.0
title: Long Context Caching Gemini PDF QA
sdk: docker
emoji: 📚
colorFrom: yellow
---

# 📚 Smart Document Analysis Platform

A modern web application that leverages the Google Gemini API's explicit caching to provide efficient document analysis: upload a document once, then ask as many questions as you like.

## 🚀 Features

- **Document Upload**: Upload PDF files via drag-and-drop or from a URL
- **Gemini API Caching**: Documents are cached using Gemini's explicit caching feature
- **Cost-Effective**: Save on API costs by reusing cached document tokens
- **Real-time Chat**: Ask multiple questions about your documents
- **Beautiful UI**: Modern, responsive design with smooth animations
- **Token Tracking**: See how many tokens are cached for cost transparency
- **Smart Error Handling**: Graceful handling of documents too small to meet caching requirements

## 🎯 Use Cases

This platform is perfect for:

- **Research Analysis**: Upload research papers and ask detailed questions
- **Legal Document Review**: Analyze contracts, legal documents, and policies
- **Academic Study**: Work through course materials and textbooks
- **Business Reports**: Analyze quarterly reports, whitepapers, and presentations
- **Technical Documentation**: Review manuals, specifications, and guides

## ⚡️ Deploy on Hugging Face Spaces

You can deploy this app on [Hugging Face Spaces](https://huggingface.co/spaces) using the **Docker** SDK.

### 1. **Select Docker SDK**
- When creating your Space, choose **Docker** (not Gradio, not Static).

### 2. **Project Structure**
Make sure your repo includes:
- `app.py` (Flask app)
- `requirements.txt`
- `Dockerfile`
- `.env.example` (for reference; do not include secrets)

### 3. **Dockerfile**
A sample Dockerfile is provided:
```dockerfile
FROM python:3.10-slim
WORKDIR /app
RUN apt-get update && apt-get install -y build-essential && rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 7860
CMD ["python", "app.py"]
```

### 4. **Port Configuration**
The app runs on the port provided by the `PORT` environment variable (default 7860), as required by Hugging Face Spaces.
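If `app.py` follows this convention, the port handling might look like the following minimal sketch (the helper name `resolve_port` is illustrative, not the app's actual code):

```python
import os

def resolve_port(env=None):
    """Return the port Spaces assigns via PORT, defaulting to 7860."""
    env = os.environ if env is None else env
    return int(env.get("PORT", "7860"))

# app.py would typically end with something like:
#   app.run(host="0.0.0.0", port=resolve_port())
print(resolve_port({}))  # → 7860 when PORT is unset
```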

### 5. **Set Environment Variables**
- In your Space settings, add your `GOOGLE_API_KEY` as a secret environment variable.

### 6. **Push to Hugging Face**
- Push your code to the Space's Git repository.
- The build and deployment happen automatically.

---

## 📋 Prerequisites

- Python 3.8 or higher
- A Google Gemini API key
- An internet connection for API calls

## 🔧 Local Installation

1. **Clone the repository**
   ```bash
   git clone <repository-url>
   cd smart-document-analysis
   ```

2. **Install dependencies**
   ```bash
   pip install -r requirements.txt
   ```

3. **Set up environment variables**
   ```bash
   cp .env.example .env
   ```

   Edit `.env` and add your Google Gemini API key:
   ```
   GOOGLE_API_KEY=your_actual_api_key_here
   ```

4. **Get your API key**
   - Visit [Google AI Studio](https://makersuite.google.com/app/apikey)
   - Create a new API key
   - Copy it into your `.env` file

## 🚀 Running the Application Locally

1. **Start the server**
   ```bash
   python app.py
   ```

2. **Open your browser**
   Navigate to `http://localhost:7860`

3. **Upload a document**
   - Drag and drop a PDF file, or
   - Click to select a file, or
   - Provide a URL to a PDF

4. **Start asking questions**
   Once your document is cached, you can ask as many questions as you like.

## 💡 How It Works

### 1. Document Upload
When you upload a PDF, the application:
- Uploads the file to Gemini's File API
- Checks whether the document meets the minimum token requirement for caching (4,096 tokens)
- If eligible, creates a cache containing the document content
- If the document is too small, returns a helpful error message with suggestions
- Stores cache metadata locally
- Returns a cache ID for future reference

### 2. Question Processing
When you ask a question:
- The question is sent to the Gemini API
- The cached document content is automatically included
- You pay full price only for the question tokens; the cached document tokens are billed at a discounted cached-token rate
- Responses are generated from the cached content

### 3. Cost Savings
- **Without caching**: You pay for document tokens + question tokens on every request
- **With caching**: You pay for the document tokens once (cached tokens are then billed at a reduced rate) + question tokens for each question

## 🔍 API Endpoints

- `GET /` - Main application interface
- `POST /upload` - Upload a PDF file
- `POST /upload-url` - Upload a PDF from a URL
- `POST /ask` - Ask a question about a cached document
- `GET /caches` - List all cached documents
- `DELETE /cache/<cache_id>` - Delete a specific cache
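A quick way to exercise the endpoints from Python, using only the standard library (a hypothetical client sketch: the field names `cache_id` and `question` are assumptions about the request schema, not documented contract):

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # local dev server

def ask_question(cache_id: str, question: str) -> dict:
    """POST a question about a cached document to the /ask endpoint."""
    payload = json.dumps({"cache_id": cache_id, "question": question}).encode()
    request = urllib.request.Request(
        f"{BASE_URL}/ask",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```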

## 📊 Cost Analysis

### Example Scenario
- Document: 10,000 tokens
- Question: 50 tokens
- 10 questions asked

**Without Caching:**
- Tokens billed = (10,000 + 50) × 10 = 100,500 tokens

**With Caching:**
- Tokens billed = 10,000 + (50 × 10) = 10,500 tokens
- **Savings: roughly 90% fewer tokens!** (In dollar terms the savings are somewhat smaller, since cached tokens are billed at a discounted rate rather than being free.)
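The arithmetic above, as a quick sanity check:

```python
doc_tokens, question_tokens, n_questions = 10_000, 50, 10

without_cache = (doc_tokens + question_tokens) * n_questions
with_cache = doc_tokens + question_tokens * n_questions

print(without_cache)                                  # → 100500
print(with_cache)                                     # → 10500
print(f"{1 - with_cache / without_cache:.1%} saved")  # → 89.6% saved
```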

### Token Requirements
- **Minimum for caching**: 4,096 tokens
- **Recommended minimum**: 5,000 tokens for cost-effectiveness
- **Optimal range**: 10,000 - 100,000 tokens
- **Maximum**: Model-specific limits (check the Gemini API docs)
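The pre-flight check implied by these thresholds can be sketched as pure logic (the function name and return labels are illustrative; the numbers come from the table above):

```python
MIN_CACHE_TOKENS = 4096    # hard minimum for explicit caching
RECOMMENDED_TOKENS = 5000  # below this, savings are marginal

def cache_eligibility(token_count: int) -> str:
    """Classify a document's token count against the caching thresholds above."""
    if token_count < MIN_CACHE_TOKENS:
        return "too-small"    # reject with a helpful error message
    if token_count < RECOMMENDED_TOKENS:
        return "eligible"     # cacheable, but barely worth it
    return "recommended"      # comfortably worth caching

print(cache_eligibility(3000))   # → too-small
print(cache_eligibility(12000))  # → recommended
```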

## 🎨 Customization

### Changing the Model
Edit `app.py` and change the model name:
```python
model="models/gemini-2.0-flash-001"  # Current
model="models/gemini-1.5-pro-002"    # Alternative (see the Gemini docs for currently available models)
```

### Custom System Instructions
Modify the system instruction in the cache creation:
```python
system_instruction="Your custom instruction here"
```

### Cache TTL
Add a TTL to the cache-creation config. Note that the Gemini API expects the duration as seconds with an `s` suffix:
```python
config=types.CreateCachedContentConfig(
    system_instruction=system_instruction,
    contents=[document],
    ttl='86400s'  # Cache for 24 hours
)
```

## 🔒 Security Considerations

- API keys are stored in environment variables
- File uploads are validated for PDF format
- Cached content is managed securely through the Gemini API
- No sensitive data is stored locally

## 🚧 Production Deployment

For production deployment:

1. **Use a production WSGI server**
   ```bash
   pip install gunicorn
   gunicorn -w 4 -b 0.0.0.0:7860 app:app
   ```

2. **Add database storage**
   - Replace in-memory storage with PostgreSQL/MySQL
   - Add user authentication
   - Implement session management

3. **Add monitoring**
   - Log API usage and costs
   - Monitor cache hit rates
   - Track user interactions

4. **Security enhancements**
   - Add rate limiting
   - Implement file size limits
   - Add input validation

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request

## 📝 License

This project is licensed under the Apache License 2.0 (as declared in the Space metadata); see the LICENSE file for details.

## 🙏 Acknowledgments

- The Google Gemini API for providing the caching functionality
- The Flask community for the excellent web framework
- The open-source community for inspiration and tools

## 📞 Support

If you encounter any issues:

1. Check the [Gemini API documentation](https://ai.google.dev/docs)
2. Verify your API key is correct
3. Ensure your PDF files are valid
4. Check the browser console for JavaScript errors
5. **For small-document errors**: Upload a larger document or combine multiple documents

## 🔮 Future Enhancements

- [ ] Support for multiple file formats (Word, PowerPoint, etc.)
- [ ] User authentication and document sharing
- [ ] Advanced analytics and usage tracking
- [ ] Integration with cloud storage (Google Drive, Dropbox)
- [ ] Mobile app version
- [ ] Multi-language support
- [ ] Advanced caching strategies
- [ ] Real-time collaboration features
- [ ] Document preprocessing to meet token requirements
- [ ] Batch document processing