---
license: apache-2.0
title: Long Context Caching Gemini PDF QA
sdk: docker
emoji: 📚
colorFrom: yellow
---
# 📚 Smart Document Analysis Platform

A modern web application that uses the Google Gemini API's caching capabilities to provide efficient document analysis. Upload a document once, then ask as many questions as you like while its cache is active!

## 🚀 Features

- **Document Upload**: Upload PDF files via drag-and-drop or URL
- **Gemini API Caching**: Documents are cached using Gemini's explicit caching feature
- **Cost-Effective**: Save on API costs by reusing cached document tokens
- **Real-time Chat**: Ask multiple questions about your documents
- **Beautiful UI**: Modern, responsive design with smooth animations
- **Token Tracking**: See how many tokens are cached for cost transparency
- **Smart Error Handling**: Graceful handling of small documents that don't meet caching requirements

## 🎯 Use Cases

This platform is perfect for:

- **Research Analysis**: Upload research papers and ask detailed questions
- **Legal Document Review**: Analyze contracts, legal documents, and policies
- **Academic Studies**: Study course materials and textbooks
- **Business Reports**: Analyze quarterly reports, whitepapers, and presentations
- **Technical Documentation**: Review manuals, specifications, and guides

## ⚡️ Deploy on Hugging Face Spaces

You can deploy this app on [Hugging Face Spaces](https://huggingface.co/spaces) using the **Docker** SDK.

### 1. **Select Docker SDK**
- When creating your Space, choose **Docker** (not Gradio, not Static).

### 2. **Project Structure**
Make sure your repo includes:
- `app.py` (Flask app)
- `requirements.txt`
- `Dockerfile`
- `.env.example` (for reference, do not include secrets)

### 3. **Dockerfile**
A sample Dockerfile is provided:
```dockerfile
FROM python:3.10-slim
WORKDIR /app
RUN apt-get update && apt-get install -y build-essential && rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 7860
CMD ["python", "app.py"]
```

### 4. **Port Configuration**
The app will run on the port provided by the `PORT` environment variable (default 7860), as required by Hugging Face Spaces.
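
As a minimal sketch of that behavior, the port lookup could look like the following. The helper name `resolve_port` is hypothetical and is not taken from `app.py`; only the `PORT` variable and the 7860 default come from this README.

```python
import os

DEFAULT_PORT = 7860  # default port named in this README


def resolve_port(environ=os.environ) -> int:
    """Return the port from the PORT variable, or the default if it is unset."""
    raw = environ.get("PORT", "").strip()
    return int(raw) if raw else DEFAULT_PORT
```

In a Flask app this value would typically be passed as `app.run(host="0.0.0.0", port=resolve_port())`, so the server is reachable from outside the container.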

### 5. **Set Environment Variables**
- In your Space settings, add your `GOOGLE_API_KEY` as a secret environment variable.

### 6. **Push to Hugging Face**
- Push your code to the Space's Git repository.
- The build and deployment will happen automatically.

---

## 📋 Prerequisites

- Python 3.8 or higher
- Google Gemini API key
- Internet connection for API calls

## 🔧 Local Installation

1. **Clone the repository**
   ```bash
   git clone <repository-url>
   cd smart-document-analysis
   ```

2. **Install dependencies**
   ```bash
   pip install -r requirements.txt
   ```

3. **Set up environment variables**
   ```bash
   cp .env.example .env
   ```
   
   Edit `.env` and add your Google Gemini API key:
   ```
   GOOGLE_API_KEY=your_actual_api_key_here
   ```

4. **Get your API key**
   - Visit [Google AI Studio](https://makersuite.google.com/app/apikey)
   - Create a new API key
   - Copy it to your `.env` file

## 🚀 Running the Application Locally

1. **Start the server**
   ```bash
   python app.py
   ```

2. **Open your browser**
   Navigate to `http://localhost:7860`

3. **Upload a document**
   - Drag and drop a PDF file, or
   - Click to select a file, or
   - Provide a URL to a PDF

4. **Start asking questions**
   Once your document is cached, you can ask unlimited questions!

## 💡 How It Works

### 1. Document Upload
When you upload a PDF, the application:
- Uploads the file to Gemini's File API
- Checks if the document meets minimum token requirements (4,096 tokens)
- If eligible, creates a cache with the document content
- If the document is too small, returns a helpful error message and suggestions
- Stores cache metadata locally
- Returns a cache ID for future reference
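
The eligibility check in these steps can be sketched roughly as below. This is an illustration only: `plan_upload` and its return shape are hypothetical, and the token count is assumed to have been obtained from the Gemini API beforehand. Only the 4,096-token minimum comes from this README.

```python
MIN_CACHE_TOKENS = 4096  # minimum for caching stated in this README


def plan_upload(token_count: int, minimum: int = MIN_CACHE_TOKENS) -> dict:
    """Decide whether a document is large enough to be cached."""
    if token_count >= minimum:
        return {"cacheable": True, "token_count": token_count}
    # Too small: report why and suggest what the user can do next.
    return {
        "cacheable": False,
        "token_count": token_count,
        "error": f"Document has {token_count} tokens; caching needs at least {minimum}.",
        "suggestion": "Upload a larger document or combine multiple documents.",
    }
```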

### 2. Question Processing
When you ask a question:
- The question is sent to Gemini API
- The cached document content is automatically included
- The document tokens are served from the cache at a reduced rate instead of being resent at the full input price with every question
- Responses are generated based on the cached content

### 3. Cost Savings
- **Without caching**: You pay for document tokens + question tokens every time
- **With caching**: You pay for document tokens once + question tokens for each question

## 🔍 API Endpoints

- `GET /` - Main application interface
- `POST /upload` - Upload PDF file
- `POST /upload-url` - Upload PDF from URL
- `POST /ask` - Ask question about cached document
- `GET /caches` - List all cached documents
- `DELETE /cache/<cache_id>` - Delete specific cache
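
A client for these endpoints might be sketched with the standard library as follows. The request body format is not documented here, so the JSON field names `cache_id` and `question` are assumptions, as are the helper names; check `app.py` for the actual payload.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # local address used in this README


def build_ask_request(cache_id: str, question: str) -> urllib.request.Request:
    """Build a POST /ask request for a cached document."""
    # ASSUMPTION: the field names "cache_id" and "question" are hypothetical.
    body = json.dumps({"cache_id": cache_id, "question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_delete_request(cache_id: str) -> urllib.request.Request:
    """Build a DELETE /cache/<cache_id> request."""
    return urllib.request.Request(f"{BASE_URL}/cache/{cache_id}", method="DELETE")
```

Passing either request to `urllib.request.urlopen` sends it to a running server.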

## 📊 Cost Analysis

### Example Scenario
- Document: 10,000 tokens
- Question: 50 tokens
- 10 questions asked

**Without Caching:**
- Cost = (10,000 + 50) × 10 = 100,500 tokens

**With Caching:**
- Cost = 10,000 + (50 × 10) = 10,500 tokens
- **Savings: 90% cost reduction!**
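
The arithmetic above can be reproduced with a small sketch. It counts tokens only, following the simplified model in this scenario, and ignores actual pricing details such as per-token rates or cache storage; the function names are illustrative.

```python
def tokens_without_cache(doc_tokens: int, question_tokens: int, n_questions: int) -> int:
    """Document and question are both sent with every request."""
    return (doc_tokens + question_tokens) * n_questions


def tokens_with_cache(doc_tokens: int, question_tokens: int, n_questions: int) -> int:
    """Document is counted once; only the question is sent per request."""
    return doc_tokens + question_tokens * n_questions


def savings_fraction(doc_tokens: int, question_tokens: int, n_questions: int) -> float:
    """Fraction of tokens saved by caching under this simplified model."""
    without = tokens_without_cache(doc_tokens, question_tokens, n_questions)
    cached = tokens_with_cache(doc_tokens, question_tokens, n_questions)
    return 1 - cached / without


print(tokens_without_cache(10_000, 50, 10))  # 100500
print(tokens_with_cache(10_000, 50, 10))     # 10500
```

Under this model the saving grows with the number of questions, since the document is counted only once.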

### Token Requirements
- **Minimum for caching**: 4,096 tokens
- **Recommended minimum**: 5,000 tokens for cost-effectiveness
- **Optimal range**: 10,000 - 100,000 tokens
- **Maximum**: Model-specific limits (check Gemini API docs)

## 🎨 Customization

### Changing the Model
Edit `app.py` and change the model name:
```python
model="models/gemini-2.0-flash-001"  # Current
# model="models/gemini-2.0-pro-001"  # Alternative: swap in by replacing the line above
```

### Custom System Instructions
Modify the system instruction in the cache creation:
```python
system_instruction="Your custom instruction here"
```

### Cache TTL
Add TTL configuration to cache creation:
```python
from google.genai import types

config=types.CreateCachedContentConfig(
    system_instruction=system_instruction,
    contents=[document],
    ttl='86400s'  # Cache for 24 hours (TTL is a duration in seconds with an "s" suffix)
)
```
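
The Gemini API expresses durations as a number of seconds followed by `s`. A small helper, hypothetical and not part of `app.py`, can build that string from a value in hours:

```python
def ttl_from_hours(hours: float) -> str:
    """Convert a duration in hours to a seconds string such as '86400s'."""
    seconds = int(hours * 3600)
    if seconds <= 0:
        raise ValueError("TTL must be positive")
    return f"{seconds}s"
```

For example, `ttl_from_hours(24)` yields the value to pass as `ttl` for a 24-hour cache.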

## 🔒 Security Considerations

- API keys are stored in environment variables
- File uploads are validated for PDF format
- Cached content is managed securely through Gemini API
- No sensitive data is stored locally

## 🚧 Production Deployment

For production deployment:

1. **Use a production WSGI server**
   ```bash
   pip install gunicorn
   gunicorn -w 4 -b 0.0.0.0:7860 app:app
   ```

2. **Add database storage**
   - Replace in-memory storage with PostgreSQL/MySQL
   - Add user authentication
   - Implement session management

3. **Add monitoring**
   - Log API usage and costs
   - Monitor cache hit rates
   - Track user interactions

4. **Security enhancements**
   - Add rate limiting
   - Implement file size limits
   - Add input validation
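
As one possible starting point for the file size limit and input validation items, the sketch below checks an uploaded payload before it is sent to the API. The 20 MiB limit and the function name are assumptions chosen for illustration, not values from `app.py`.

```python
MAX_UPLOAD_BYTES = 20 * 1024 * 1024  # hypothetical 20 MiB limit
PDF_MAGIC = b"%PDF-"  # PDF files normally begin with this header


def validate_pdf_upload(data: bytes, max_bytes: int = MAX_UPLOAD_BYTES) -> tuple[bool, str]:
    """Return (ok, reason) for an uploaded payload."""
    if not data:
        return False, "empty file"
    if len(data) > max_bytes:
        return False, "file too large"
    # Strict check: some PDF readers tolerate leading junk bytes, this one does not.
    if not data.startswith(PDF_MAGIC):
        return False, "not a PDF"
    return True, "ok"
```

Checking the header bytes is more reliable than trusting the file extension or the client-supplied content type.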

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request

## 📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

## 🙏 Acknowledgments

- Google Gemini API for providing the caching functionality
- Flask community for the excellent web framework
- The open-source community for inspiration and tools

## 📞 Support

If you encounter any issues:

1. Check the [Gemini API documentation](https://ai.google.dev/docs)
2. Verify your API key is correct
3. Ensure your PDF files are valid
4. Check the browser console for JavaScript errors
5. **For small document errors**: Upload a larger document or combine multiple documents

## 🔮 Future Enhancements

- [ ] Support for multiple file formats (Word, PowerPoint, etc.)
- [ ] User authentication and document sharing
- [ ] Advanced analytics and usage tracking
- [ ] Integration with cloud storage (Google Drive, Dropbox)
- [ ] Mobile app version
- [ ] Multi-language support
- [ ] Advanced caching strategies
- [ ] Real-time collaboration features
- [ ] Document preprocessing to meet token requirements
- [ ] Batch document processing