Merge remote-tracking branch 'hf/main'
- README.md +12 -83
- requirements .txt +0 -6
README.md
CHANGED
@@ -1,86 +1,15 @@
-# RAG 30 Days Sprint
 
-
+---
+title: PDF RAG (Chroma + Groq)
+emoji: ๐
+colorFrom: indigo
+colorTo: green
+sdk: gradio
+sdk_version: "4.44.0"
+app_file: app.py
+pinned: false
+---
 
-
+# PDF RAG (Chroma + Groq)
 
-
-|-----|--------|------------------------|--------|
-| 1 | day1 | Hello world test file | ✅ |
-| 2 | day2 | TBD | ⏳ |
-| ... | ... | ... | ... |
-
-## Folder Structure
-
-rag-30-days/
-│
-├── day1/
-│   └── hello_ai.py
-│
-└── README.md
-
-markdown
-Copy
-Edit
-
-## Goal
-
-To build a production-ready RAG pipeline in 30 days and land a remote AI job by the end of the sprint.
-
-## 🛠️ Tools
-
-- Python
-- LangChain
-- ChromaDB / Weaviate / FAISS
-- OpenAI API
-- Streamlit (optional UI)
-- Git & GitHub
-
-## Progress
-
-Check commits and folders daily to follow the sprint. Each folder corresponds to 1 day of learning and building.
-
-## 📅 Day 1 - Getting Started with Python & Flask
-
-### ✅ What I Learned
-- Refreshed core **Python basics** (variables, functions, classes, etc.)
-- Built my first **Flask API** with real-world JSON responses
-- Practiced structured coding with **Copilot assistance**
-
-### 🛠️ What I Built
-- `hello_ai.py`: A minimal Python script to print a welcome message
-- `api.py`: A Flask application with 3 endpoints:
-  - `/hello`: greeting message
-  - `/calculate`: accepts 2 numbers (POST) and returns their sum
-  - `/ai-ready`: motivational message for AI learning
-
-### 🔮 Tomorrow's Plan
-- Begin **LangChain** setup and environment configuration
-- Start working on **RAG-based document processing**
-- Set up folder structure and `day2` workflow
-
-> One day down, 29 to go. Keep shipping.
-
-## Day 3: First RAG System ✅
-
-### What I Built
-- PDF processing pipeline (loader + optimal chunker)
-- Compared 3 chunking strategies (fixed, recursive, token)
-- ChromaDB vector storage (persistent)
-- SentenceTransformer embeddings (MiniLM)
-- Gradio chat interface (upload PDF → ask)
-- Deployment on Hugging Face Spaces
-
-### Key Learnings
-- Fixed vs Recursive vs Token-based chunking trade-offs
-- Embedding format must be list[list[float]] for Chroma
-- New Chroma API uses `PersistentClient`
-- Prompt design: extractive answers + fallback
-
-### Live Demo
-[HuggingFace Space Link](https://didactic-winner-q7g79xg9gp4626w56-7860.app.github.dev/)
-
-## Contact
-
-Made by [Hamid Omarov](https://www.linkedin.com/in/hamidomarov)
-Check out my portfolio: [Notion Page](https://www.notion.so/AI-Content-Factory-Operations-2400a72a724c8050b5c6ddc0e6a0a77d)
+Upload a PDF and ask questions. Uses ChromaDB for retrieval and Groq LLM for answers.
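
The new README describes a retrieve-then-answer flow: chunks of the uploaded PDF are retrieved for a question and passed to an LLM. The sketch below illustrates that flow under loud assumptions: a toy bag-of-words similarity stands in for the real embeddings and ChromaDB, and it stops at building the prompt rather than calling Groq. The names `chunk_text`, `retrieve`, and `build_prompt` are hypothetical and are not taken from `app.py`.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Fixed-size character chunking with overlap (one of several possible strategies)."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    step = size - overlap
    pieces = (text[i:i + size] for i in range(0, len(text), step))
    return [p for p in pieces if p.strip()]


def _vector(text: str) -> Counter:
    # Toy stand-in for an embedding model: lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question (stand-in for a vector store query)."""
    q = _vector(question)
    ranked = sorted(chunks, key=lambda c: _cosine(q, _vector(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble an extractive prompt with an explicit fallback instruction."""
    joined = "\n---\n".join(context)
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, reply \"I don't know.\"\n\n"
        f"Context:\n{joined}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    doc = "ChromaDB stores embeddings on disk. " * 3 + "Groq serves the language model. " * 3
    question = "Where are embeddings stored?"
    top = retrieve(question, chunk_text(doc, size=80, overlap=20))
    print(build_prompt(question, top))
```

In a real Space the `retrieve` step would presumably be a ChromaDB query and the prompt would be sent to the Groq model; only the shape of the flow is shown here.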
requirements .txt
DELETED
@@ -1,6 +0,0 @@
-gradio
-chromadb
-sentence-transformers
-langchain-groq
-pypdf
-python-dotenv
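
The deleted file listed six packages by their pip names. As a rough aid, the sketch below checks whether the corresponding modules are importable in the current environment. The pip-name to import-name mapping is an assumption about these packages, and `missing_modules` is a hypothetical helper that is not part of the repository.

```python
from importlib.util import find_spec

# pip distribution name -> top-level import name (assumed mapping)
REQUIRED = {
    "gradio": "gradio",
    "chromadb": "chromadb",
    "sentence-transformers": "sentence_transformers",
    "langchain-groq": "langchain_groq",
    "pypdf": "pypdf",
    "python-dotenv": "dotenv",
}


def missing_modules(required: dict[str, str]) -> list[str]:
    """Return the pip names whose import name cannot be located."""
    return [pip_name for pip_name, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("missing:", ", ".join(missing) if missing else "none")
```

`find_spec` only locates a module without importing it, so the check stays cheap even for heavy packages.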