Zwounds committed on
Commit 0b47d18 · verified · 1 Parent(s): 323677a

Update README.md

  1. README.md +18 -3
README.md CHANGED
@@ -13,7 +13,7 @@ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-
13
 
14
  # CV to CSV Extraction App
15
 
16
- A Gradio application that extracts publications, talks, and other scholarly accomplishments from faculty CVs (PDFs) using Google's Gemini API.
17
 
18
  ## Features
19
 
@@ -22,6 +22,7 @@ A Gradio application that extracts publications, talks, and other scholarly acco
22
  - Display results in a tabular format
23
  - Download results as CSV
24
  - Password protection using Hugging Face secrets
 
25
 
26
  ## Installation
27
 
@@ -39,6 +40,7 @@ A Gradio application that extracts publications, talks, and other scholarly acco
39
  3. Create a `.env` file in the root directory with your Google API key:
40
  ```
41
  GOOGLE_API_KEY=your_google_api_key_here
 
42
  ```
43
 
44
  ## Usage
@@ -74,12 +76,25 @@ A Gradio application that extracts publications, talks, and other scholarly acco
74
 
75
  2. **PDF Processing**: The app extracts text from uploaded PDF files using PyPDF2
76
 
77
- 3. **LLM Processing**: The extracted text is sent to Google's Gemini API to identify faculty names and extract scholarly accomplishments
 
 
 
 
78
 
79
  4. **Categorization**: Accomplishments are categorized into different types based on a decision tree approach
80
 
81
  5. **Results Display**: The extracted accomplishments are displayed in a tabular format and can be downloaded as CSV
82
 
 
 
 
 
 
 
 
 
 
83
  ## Customization
84
 
85
  ### Changing the Password
@@ -101,4 +116,4 @@ To modify the categories of scholarly accomplishments, edit the `MAIN_CATEGORIES
101
 
102
  ## License
103
 
104
- This project is licensed under the MIT License.
 
13
 
14
  # CV to CSV Extraction App
15
 
16
+ A Gradio application that extracts publications, talks, and other scholarly accomplishments from faculty CVs (PDFs) using Google's Gemini API with Pydantic-AI for robust structured data extraction.
17
 
18
  ## Features
19
 
 
22
  - Display results in a tabular format
23
  - Download results as CSV
24
  - Password protection using Hugging Face secrets
25
+ - Robust JSON parsing with Pydantic-AI
26
 
27
  ## Installation
28
 
 
40
  3. Create a `.env` file in the root directory with your Google API key:
41
  ```
42
  GOOGLE_API_KEY=your_google_api_key_here
43
+ APP_PASSWORD=your_app_password_here
44
  ```
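
As a rough illustration of how the two variables above might be consumed, the sketch below reads them from the process environment and fails early if either is missing. It assumes the values arrive as environment variables (for example through a `.env` loader locally, or through Hugging Face secrets on a Space). `Settings` and `load_settings` are hypothetical names, not taken from the app.

```python
import os
from typing import Mapping, NamedTuple, Optional


class Settings(NamedTuple):
    """Hypothetical container for the two values named in the .env file."""
    google_api_key: str
    app_password: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read GOOGLE_API_KEY and APP_PASSWORD, raising if either is unset or empty."""
    env = os.environ if env is None else env
    required = ("GOOGLE_API_KEY", "APP_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return Settings(env["GOOGLE_API_KEY"], env["APP_PASSWORD"])
```

Passing a plain dictionary instead of relying on `os.environ` keeps the function easy to exercise in tests.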
45
 
46
  ## Usage
 
76
 
77
  2. **PDF Processing**: The app extracts text from uploaded PDF files using PyPDF2
78
 
79
+ 3. **LLM Processing with Pydantic-AI**:
80
+ - The extracted text is processed using Pydantic-AI with Google's Gemini model
81
+ - Pydantic models define the structure of the expected data
82
+ - This approach ensures more robust parsing and validation of the extracted data
83
+ - If Pydantic-AI processing fails, the app falls back to the standard Gemini API approach
84
 
85
  4. **Categorization**: Accomplishments are categorized into different types based on a decision tree approach
86
 
87
  5. **Results Display**: The extracted accomplishments are displayed in a tabular format and can be downloaded as CSV
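
The CSV download in step 5 could look roughly like the sketch below, which serializes a list of accomplishment dictionaries with the standard library. The column names and the `accomplishments_to_csv` helper are assumptions made for illustration; the app's actual CSV layout may differ.

```python
import csv
import io
from typing import Dict, Iterable, List

# Hypothetical column names; the app's real CSV layout may differ.
COLUMNS: List[str] = ["faculty_name", "category", "year", "description"]


def accomplishments_to_csv(rows: Iterable[Dict[str, object]]) -> str:
    """Serialize rows to CSV text, blanking missing fields and dropping unknown keys."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(
        buffer,
        fieldnames=COLUMNS,
        restval="",             # value written for a missing column
        extrasaction="ignore",  # silently skip keys that are not in COLUMNS
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()
```

Using `csv.DictWriter` rather than manual string joins means commas and quotes inside descriptions are escaped correctly.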
88
 
89
+ ## Pydantic-AI Integration
90
+
91
+ The app uses Pydantic-AI to improve the reliability of structured data extraction:
92
+
93
+ - **Defined Data Models**: Clear schema definitions for faculty data and accomplishments
94
+ - **Type Validation**: Ensures fields like years and confidence scores are properly typed
95
+ - **Default Values**: Handles missing fields gracefully with sensible defaults
96
+ - **Fallback Mechanism**: If Pydantic-AI extraction fails, the app falls back to standard extraction
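
The four points above can be sketched with plain Pydantic models. This is a minimal illustration and not the app's actual schema: the field names, the `FacultyRecord` and `Accomplishment` classes, and the `parse_record` helper are all assumptions. It also uses the core `pydantic` package directly instead of the Pydantic-AI agent interface, and the fallback is passed in as an ordinary callable.

```python
from typing import Any, Callable, Dict, List, Optional

from pydantic import BaseModel, Field, ValidationError


class Accomplishment(BaseModel):
    # Field names are illustrative, not the app's actual schema.
    description: str
    category: str = "Other"       # default used when the field is missing
    year: Optional[int] = None    # a numeric string such as "2021" is coerced to an int
    confidence: float = Field(default=0.5, ge=0.0, le=1.0)  # must lie in [0, 1]


class FacultyRecord(BaseModel):
    faculty_name: str = "Unknown"
    accomplishments: List[Accomplishment] = Field(default_factory=list)


def parse_record(
    raw: Dict[str, Any],
    fallback: Callable[[Dict[str, Any]], FacultyRecord],
) -> FacultyRecord:
    """Validate raw model output; hand off to the fallback parser if validation fails."""
    try:
        return FacultyRecord(**raw)
    except ValidationError:
        return fallback(raw)
```

A call such as `parse_record(data, fallback=legacy_parse)` would return a validated record when the data fits the schema and defer to the hypothetical `legacy_parse` otherwise.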
97
+
98
  ## Customization
99
 
100
  ### Changing the Password
 
116
 
117
  ## License
118
 
119
+ This project is licensed under the MIT License.