|
--- |
|
license: apache-2.0 |
|
title: Long Context Caching Gemini PDF QA |
|
sdk: docker |
|
emoji: 📄
|
colorFrom: yellow |
|
--- |
|
# 📄 Smart Document Analysis Platform
|
|
|
A modern web application that leverages Google Gemini API's caching capabilities to provide efficient document analysis. Upload documents once, ask questions forever! |
|
|
|
## 🚀 Features
|
|
|
- **Document Upload**: Upload PDF files via drag-and-drop or URL |
|
- **Gemini API Caching**: Documents are cached using Gemini's explicit caching feature |
|
- **Cost-Effective**: Save on API costs by reusing cached document tokens |
|
- **Real-time Chat**: Ask multiple questions about your documents |
|
- **Beautiful UI**: Modern, responsive design with smooth animations |
|
- **Token Tracking**: See how many tokens are cached for cost transparency |
|
- **Smart Error Handling**: Graceful handling of small documents that don't meet caching requirements |
|
|
|
## 🎯 Use Cases
|
|
|
This platform is perfect for: |
|
|
|
- **Research Analysis**: Upload research papers and ask detailed questions |
|
- **Legal Document Review**: Analyze contracts, legal documents, and policies |
|
- **Academic Studies**: Study course materials and textbooks |
|
- **Business Reports**: Analyze quarterly reports, whitepapers, and presentations |
|
- **Technical Documentation**: Review manuals, specifications, and guides |
|
|
|
## ⚡️ Deploy on Hugging Face Spaces
|
|
|
You can deploy this app on [Hugging Face Spaces](https://huggingface.co/spaces) using the **Docker** SDK. |
|
|
|
### 1. **Select Docker SDK** |
|
- When creating your Space, choose **Docker** (not Gradio, not Static). |
|
|
|
### 2. **Project Structure** |
|
Make sure your repo includes: |
|
- `app.py` (Flask app) |
|
- `requirements.txt` |
|
- `Dockerfile` |
|
- `.env.example` (for reference, do not include secrets) |
|
|
|
### 3. **Dockerfile** |
|
A sample Dockerfile is provided: |
|
```dockerfile |
|
FROM python:3.10-slim |
|
WORKDIR /app |
|
RUN apt-get update && apt-get install -y build-essential && rm -rf /var/lib/apt/lists/* |
|
COPY requirements.txt . |
|
RUN pip install --no-cache-dir -r requirements.txt |
|
COPY . . |
|
EXPOSE 7860 |
|
CMD ["python", "app.py"] |
|
``` |
|
|
|
### 4. **Port Configuration** |
|
The app will run on the port provided by the `PORT` environment variable (default 7860), as required by Hugging Face Spaces. |
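A minimal sketch of that port handling in `app.py` (illustrative only; `resolve_port` is a hypothetical helper name, not necessarily what the repo uses):

```python
import os

def resolve_port(default: int = 7860) -> int:
    # Hugging Face Spaces injects the PORT environment variable;
    # fall back to 7860 for local runs.
    return int(os.environ.get("PORT", default))

# In app.py's entry point:
# app.run(host="0.0.0.0", port=resolve_port())
```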
|
|
|
### 5. **Set Environment Variables** |
|
- In your Space settings, add your `GOOGLE_API_KEY` as a secret environment variable. |
|
|
|
### 6. **Push to Hugging Face** |
|
- Push your code to the Space's Git repository. |
|
- The build and deployment will happen automatically. |
|
|
|
--- |
|
|
|
## 📋 Prerequisites
|
|
|
- Python 3.8 or higher |
|
- Google Gemini API key |
|
- Internet connection for API calls |
|
|
|
## 🔧 Local Installation
|
|
|
1. **Clone the repository** |
|
```bash |
|
git clone <repository-url> |
|
cd smart-document-analysis |
|
``` |
|
|
|
2. **Install dependencies** |
|
```bash |
|
pip install -r requirements.txt |
|
``` |
|
|
|
3. **Set up environment variables** |
|
```bash |
|
cp .env.example .env |
|
``` |
|
|
|
Edit `.env` and add your Google Gemini API key: |
|
``` |
|
GOOGLE_API_KEY=your_actual_api_key_here |
|
``` |
|
|
|
4. **Get your API key** |
|
- Visit [Google AI Studio](https://makersuite.google.com/app/apikey) |
|
- Create a new API key |
|
- Copy it to your `.env` file |
|
|
|
## 🏃 Running the Application Locally
|
|
|
1. **Start the server** |
|
```bash |
|
python app.py |
|
``` |
|
|
|
2. **Open your browser** |
|
Navigate to `http://localhost:7860` |
|
|
|
3. **Upload a document** |
|
- Drag and drop a PDF file, or |
|
- Click to select a file, or |
|
- Provide a URL to a PDF |
|
|
|
4. **Start asking questions** |
|
Once your document is cached, you can ask unlimited questions! |
|
|
|
## 💡 How It Works
|
|
|
### 1. Document Upload |
|
When you upload a PDF, the application: |
|
- Uploads the file to Gemini's File API |
|
- Checks if the document meets minimum token requirements (4,096 tokens) |
|
- If eligible, creates a cache with the document content |
|
- If too small, provides a helpful error message and suggestions
|
- Stores cache metadata locally |
|
- Returns a cache ID for future reference |
|
|
|
### 2. Question Processing |
|
When you ask a question: |
|
- The question is sent to Gemini API |
|
- The cached document content is automatically included |
|
- You pay full price only for the question tokens; the cached document tokens are billed at a reduced rate
|
- Responses are generated based on the cached content |
|
|
|
### 3. Cost Savings |
|
- **Without caching**: You pay for document tokens + question tokens every time |
|
- **With caching**: You pay for document tokens once + question tokens for each question |
|
|
|
## 🔌 API Endpoints
|
|
|
- `GET /` - Main application interface |
|
- `POST /upload` - Upload PDF file |
|
- `POST /upload-url` - Upload PDF from URL |
|
- `POST /ask` - Ask question about cached document |
|
- `GET /caches` - List all cached documents |
|
- `DELETE /cache/<cache_id>` - Delete specific cache |
|
|
|
## 📊 Cost Analysis
|
|
|
### Example Scenario |
|
- Document: 10,000 tokens |
|
- Question: 50 tokens |
|
- 10 questions asked |
|
|
|
**Without Caching:** |
|
- Cost = (10,000 + 50) × 10 = 100,500 tokens
|
|
|
**With Caching:** |
|
- Cost = 10,000 + (50 × 10) = 10,500 tokens

- **Savings: roughly 90% fewer input tokens!**
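The arithmetic above can be checked with a short helper (illustrative only; real billing also applies a discounted rate to cached tokens plus a storage charge, which this token count ignores):

```python
def token_cost(doc_tokens: int, question_tokens: int, questions: int,
               cached: bool) -> int:
    """Total input tokens processed for a session of `questions` questions."""
    if cached:
        # Document tokens are sent once, then only questions are sent.
        return doc_tokens + question_tokens * questions
    # Without caching, the document is resent with every question.
    return (doc_tokens + question_tokens) * questions

without_cache = token_cost(10_000, 50, 10, cached=False)  # 100,500
with_cache = token_cost(10_000, 50, 10, cached=True)      # 10,500
savings = 1 - with_cache / without_cache                  # ~0.90
```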
|
|
|
### Token Requirements |
|
- **Minimum for caching**: 4,096 tokens |
|
- **Recommended minimum**: 5,000 tokens for cost-effectiveness |
|
- **Optimal range**: 10,000 - 100,000 tokens |
|
- **Maximum**: Model-specific limits (check Gemini API docs) |
|
|
|
## 🎨 Customization
|
|
|
### Changing the Model |
|
Edit `app.py` and change the model name: |
|
```python |
|
model="models/gemini-2.0-flash-001" # Current |
|
model="models/gemini-1.5-pro-001"  # Alternative that also supports caching
|
``` |
|
|
|
### Custom System Instructions |
|
Modify the system instruction in the cache creation: |
|
```python |
|
system_instruction="Your custom instruction here" |
|
``` |
|
|
|
### Cache TTL |
|
Add TTL configuration to cache creation: |
|
```python |
|
config=types.CreateCachedContentConfig( |
|
system_instruction=system_instruction, |
|
contents=[document], |
|
ttl='24h' # Cache for 24 hours |
|
) |
|
``` |
|
|
|
## 🔒 Security Considerations
|
|
|
- API keys are stored in environment variables |
|
- File uploads are validated for PDF format |
|
- Cached content is managed securely through Gemini API |
|
- No sensitive data is stored locally |
|
|
|
## 🔧 Production Deployment
|
|
|
For production deployment: |
|
|
|
1. **Use a production WSGI server** |
|
```bash |
|
pip install gunicorn |
|
gunicorn -w 4 -b 0.0.0.0:7860 app:app |
|
``` |
|
|
|
2. **Add database storage** |
|
- Replace in-memory storage with PostgreSQL/MySQL |
|
- Add user authentication |
|
- Implement session management |
|
|
|
3. **Add monitoring** |
|
- Log API usage and costs |
|
- Monitor cache hit rates |
|
- Track user interactions |
|
|
|
4. **Security enhancements** |
|
- Add rate limiting |
|
- Implement file size limits |
|
- Add input validation |
|
|
|
## 🤝 Contributing
|
|
|
1. Fork the repository |
|
2. Create a feature branch |
|
3. Make your changes |
|
4. Add tests if applicable |
|
5. Submit a pull request |
|
|
|
## 📝 License
|
|
|
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
|
|
|
## 🙏 Acknowledgments
|
|
|
- Google Gemini API for providing the caching functionality |
|
- Flask community for the excellent web framework |
|
- The open-source community for inspiration and tools |
|
|
|
## 📞 Support
|
|
|
If you encounter any issues: |
|
|
|
1. Check the [Gemini API documentation](https://ai.google.dev/docs) |
|
2. Verify your API key is correct |
|
3. Ensure your PDF files are valid |
|
4. Check the browser console for JavaScript errors |
|
5. **For small document errors**: Upload a larger document or combine multiple documents |
|
|
|
## 🔮 Future Enhancements
|
|
|
- [ ] Support for multiple file formats (Word, PowerPoint, etc.) |
|
- [ ] User authentication and document sharing |
|
- [ ] Advanced analytics and usage tracking |
|
- [ ] Integration with cloud storage (Google Drive, Dropbox) |
|
- [ ] Mobile app version |
|
- [ ] Multi-language support |
|
- [ ] Advanced caching strategies |
|
- [ ] Real-time collaboration features |
|
- [ ] Document preprocessing to meet token requirements |
|
- [ ] Batch document processing |