Update README.md
README.md
CHANGED
@@ -13,7 +13,7 @@ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-

 # CV to CSV Extraction App

-A Gradio application that extracts publications, talks, and other scholarly accomplishments from faculty CVs (PDFs) using Google's Gemini API.

 ## Features

@@ -22,6 +22,7 @@ A Gradio application that extracts publications, talks, and other scholarly acco
 - Display results in a tabular format
 - Download results as CSV
 - Password protection using Hugging Face secrets

 ## Installation

@@ -39,6 +40,7 @@ A Gradio application that extracts publications, talks, and other scholarly acco
 3. Create a `.env` file in the root directory with your Google API key:
 ```
 GOOGLE_API_KEY=your_google_api_key_here
 ```

 ## Usage
@@ -74,12 +76,25 @@ A Gradio application that extracts publications, talks, and other scholarly acco

 2. **PDF Processing**: The app extracts text from uploaded PDF files using PyPDF2

-3. **LLM Processing

 4. **Categorization**: Accomplishments are categorized into different types based on a decision tree approach

 5. **Results Display**: The extracted accomplishments are displayed in a tabular format and can be downloaded as CSV

 ## Customization

 ### Changing the Password
@@ -101,4 +116,4 @@ To modify the categories of scholarly accomplishments, edit the `MAIN_CATEGORIES

 ## License

-This project is licensed under the MIT License.
@@ -13,7 +13,7 @@

 # CV to CSV Extraction App

+A Gradio application that extracts publications, talks, and other scholarly accomplishments from faculty CVs (PDFs) using Google's Gemini API with Pydantic-AI for robust structured data extraction.

 ## Features

@@ -22,6 +22,7 @@
 - Display results in a tabular format
 - Download results as CSV
 - Password protection using Hugging Face secrets
+- Robust JSON parsing with Pydantic-AI

 ## Installation

@@ -39,6 +40,7 @@
 3. Create a `.env` file in the root directory with your Google API key:
 ```
 GOOGLE_API_KEY=your_google_api_key_here
+APP_PASSWORD=your_app_password_here
 ```
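
The two keys in the `.env` block above have to reach the app at startup. A minimal sketch of one way to read them follows, assuming the values are already present in the process environment (for example via a `.env` loader or Hugging Face secrets). The `Settings` type and `load_settings` function are hypothetical names used for illustration, not taken from the app's source.

```python
import os
from typing import Mapping, NamedTuple


class Settings(NamedTuple):
    """Hypothetical container for the two values listed in `.env`."""
    google_api_key: str
    app_password: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read both keys and fail early if either is missing or empty."""
    required = ("GOOGLE_API_KEY", "APP_PASSWORD")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return Settings(env["GOOGLE_API_KEY"], env["APP_PASSWORD"])
```

Passing the mapping as a parameter keeps the function easy to test with a plain dictionary, and failing at startup avoids discovering a missing key only when the first CV is uploaded.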

 ## Usage
@@ -74,12 +76,25 @@

 2. **PDF Processing**: The app extracts text from uploaded PDF files using PyPDF2

+3. **LLM Processing with Pydantic-AI**:
+- The extracted text is processed using Pydantic-AI with Google's Gemini model
+- Pydantic models define the structure of the expected data
+- This approach ensures more robust parsing and validation of the extracted data
+- If Pydantic-AI processing fails, the app falls back to the standard Gemini API approach

 4. **Categorization**: Accomplishments are categorized into different types based on a decision tree approach

 5. **Results Display**: The extracted accomplishments are displayed in a tabular format and can be downloaded as CSV
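
The fallback described in step 3 can be sketched as a small wrapper that tries a strict extractor first and switches to a lenient one on any failure. This is an illustration under stated assumptions, not the app's code: `extract_with_fallback`, `strict_json_extract`, and `lenient_extract` are hypothetical names, and the two extractors are local stand-ins for the Pydantic-AI and standard Gemini paths so the example runs without network access.

```python
import json
from typing import Callable, List


def extract_with_fallback(
    text: str,
    primary: Callable[[str], List[dict]],
    fallback: Callable[[str], List[dict]],
) -> List[dict]:
    """Try the structured extractor first; use the fallback if it raises."""
    try:
        return primary(text)
    except Exception as exc:  # broad on purpose: any failure triggers the fallback
        print(f"Structured extraction failed ({exc!r}); using fallback")
        return fallback(text)


def strict_json_extract(text: str) -> List[dict]:
    """Stand-in for the structured path: requires a well-formed JSON list."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of accomplishments")
    return data


def lenient_extract(text: str) -> List[dict]:
    """Stand-in for the standard path: one record per non-empty line."""
    return [{"title": line.strip()} for line in text.splitlines() if line.strip()]
```

Calling `extract_with_fallback(raw_text, strict_json_extract, lenient_extract)` returns the parsed list when the input is valid JSON and otherwise the line-based records, so a single malformed response does not abort processing of the whole CV.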

+## Pydantic-AI Integration
+
+The app uses Pydantic-AI to improve the reliability of structured data extraction:
+
+- **Defined Data Models**: Clear schema definitions for faculty data and accomplishments
+- **Type Validation**: Ensures fields like years and confidence scores are properly typed
+- **Default Values**: Handles missing fields gracefully with sensible defaults
+- **Fallback Mechanism**: If Pydantic-AI extraction fails, the app falls back to standard extraction
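
To make the type-validation and default-value points concrete, here is a rough sketch of the idea using a standard-library dataclass as a dependency-free stand-in. It is not Pydantic and not the app's schema: the `Accomplishment` class, its field names, and the `from_raw` helper are assumptions chosen for illustration. In the app itself, a Pydantic model would play this role.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class Accomplishment:
    """Hypothetical record; field names are illustrative only."""
    title: str
    category: str = "Other"
    year: Optional[int] = None
    confidence: float = 0.0

    @classmethod
    def from_raw(cls, raw: Mapping[str, Any]) -> "Accomplishment":
        """Coerce loosely typed LLM output, falling back to defaults."""
        year = raw.get("year")
        try:
            year = int(year) if year not in (None, "") else None
        except (TypeError, ValueError):
            year = None  # unparseable years such as "n.d." become None
        try:
            confidence = float(raw.get("confidence", 0.0))
        except (TypeError, ValueError):
            confidence = 0.0
        # Clamp the score to the range [0, 1].
        confidence = min(max(confidence, 0.0), 1.0)
        return cls(
            title=str(raw.get("title", "")).strip(),
            category=str(raw.get("category") or "Other"),
            year=year,
            confidence=confidence,
        )
```

For example, `Accomplishment.from_raw({"title": "Talk", "year": "2021"})` yields an integer year and the default category and confidence, which is the kind of normalization the bullets above describe.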
+
 ## Customization

 ### Changing the Password
@@ -101,4 +116,4 @@

 ## License

+This project is licensed under the MIT License.