Electrol_roll / README.md
shivam0109's picture
Added file as per requiremnets
f1a0c7b
|
raw
history blame
1.38 kB

πŸ—³οΈ Hindi Voter PDF Processor with LLM API (OCR + OpenRouter + Gradio)

This app extracts voter information from scanned PDFs using OCR and formats it into a structured CSV using an LLM API (via OpenRouter).


πŸ“¦ Features

  • Extracts text from Hindi/English PDFs using EasyOCR
  • Splits content to avoid LLM token limits
  • Sends chunked JSON to LLM for conversion to clean CSV
  • Uses OpenRouter LLM API (e.g., Gemma-3b)
  • Interactive UI with Gradio
  • Supports download of extracted JSON and final CSV

🌐 Get Your OpenRouter API Key

  • Go to https://openrouter.ai

  • Click Login (use Google/GitHub/Email)

  • Navigate to the Models page

  • Click on a model like gemma-3b, mistral, etc.

  • On the model page, click "Create API Key"

  • Copy the API key

πŸ§ͺ How to Use the Gradio App

πŸ”Ή Tab 1: PDF Processing

  • Upload a Hindi/English scanned PDF

  • Click "Process PDF"

  • View extracted text in JSON format

  • Download JSON file if needed

πŸ”Ή Tab 2: LLM API Processing

  • Paste your OpenRouter API key

  • (Optional) Customize the prompt or add instructions

  • Click "Call LLM API"

  • View structured voter data in CSV format

  • Download the CSV file

  • Enable Debug Mode to see raw API responses for troubleshooting.

πŸ“ Output Files

Extracted JSON and CSV files are saved in the processed_json/ folder.