Update README.md

README.md: `@@ -13,94 +13,3 @@ pinned: false`

# PDF RAG (Chroma + Groq)

Upload a PDF and ask questions. Uses ChromaDB for retrieval and Groq LLM for answers.
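
The retrieval half of this flow can be sketched in a few lines. This is a minimal, standard-library-only stand-in, not the app's code. A term-count vector over a shared vocabulary replaces the MiniLM sentence embeddings, and a brute-force cosine-similarity ranking replaces the ChromaDB query. The names `embed`, `cosine`, and `top_k` are hypothetical, and the sketch leaves out the Groq call that turns the retrieved chunks into an answer.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; a stand-in for a real embedding model's tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str, vocab: list[str]) -> list[float]:
    # Toy embedding (NOT MiniLM): term counts over a fixed vocabulary.
    counts = Counter(tokenize(text))
    return [float(counts[word]) for word in vocab]


def cosine(u: list[float], v: list[float]) -> float:
    # Cosine similarity; returns 0.0 when either vector is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(question: str, chunks: list[str], k: int = 2) -> list[str]:
    # Embed every chunk (a list[list[float]], one vector per chunk),
    # then rank chunks by similarity to the embedded question.
    vocab = sorted({word for chunk in chunks for word in tokenize(chunk)})
    chunk_vectors = [embed(chunk, vocab) for chunk in chunks]
    query = embed(question, vocab)
    ranked = sorted(range(len(chunks)),
                    key=lambda i: cosine(query, chunk_vectors[i]),
                    reverse=True)
    return [chunks[i] for i in ranked[:k]]


chunks = [
    "ChromaDB stores embeddings and returns the nearest chunks.",
    "Groq serves the LLM that writes the final answer.",
    "Gradio provides the upload and chat interface.",
]
print(top_k("Which component stores embeddings?", chunks, k=1))
```

The real libraries keep the same data shapes. `SentenceTransformer.encode(...)` returns a NumPy array, and its `.tolist()` is the `list[list[float]]` that Chroma accepts. Chunks go into a collection obtained from `chromadb.PersistentClient(path=...)` via `collection.add(ids=..., documents=..., embeddings=...)`, and `collection.query(query_embeddings=[...], n_results=k)` plays the role of `top_k`.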

Lines 16-106 of the previous README were removed:

# RAG 30 Days Sprint

This repository contains a 30-day sprint to master Retrieval-Augmented Generation (RAG) systems using Python, LangChain, and modern AI tools.

## 📅 Day Tracker

| Day | Folder | Description           | Status |
|-----|--------|-----------------------|--------|
| 1   | day1   | Hello world test file | ✅     |
| 2   | day2   | TBD                   | ⏳     |
| ... | ...    | ...                   | ...    |

## Folder Structure

- `rag-30-days/`
  - `day1/`
    - `hello_ai.py`
  - `README.md`

## Goal

To build a production-ready RAG pipeline in 30 days and land a remote AI job by the end of the sprint.

## 🛠️ Tools

- Python
- LangChain
- ChromaDB / Weaviate / FAISS
- OpenAI API
- Streamlit (optional UI)
- Git & GitHub

## Progress

Check the commits and folders daily to follow the sprint. Each folder corresponds to one day of learning and building.

## 📅 Day 1: Getting Started with Python & Flask

### ✅ What I Learned

- Refreshed core **Python basics** (variables, functions, classes, etc.)
- Built my first **Flask API** with real-world JSON responses
- Practiced structured coding with **Copilot assistance**

### 🛠️ What I Built

- `hello_ai.py`: A minimal Python script that prints a welcome message
- `api.py`: A Flask application with 3 endpoints:
  - `/hello`: greeting message
  - `/calculate`: accepts 2 numbers (POST) and returns their sum
  - `/ai-ready`: motivational message for AI learning
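
The three `api.py` endpoints can be sketched as follows. The README does not show the request format, so several pieces are assumptions: the JSON keys `a` and `b`, the message texts, and the in-process test helper `call`. So that it runs without installing Flask, the sketch also swaps in a plain WSGI callable from the standard library. In Flask, each branch would become a view function decorated with `@app.route(...)` (with `methods=["POST"]` for `/calculate`) that reads the body via `request.get_json()` and returns `jsonify(...)`.

```python
import io
import json


def _json_response(start_response, status: str, payload: dict) -> list[bytes]:
    # Serialize the payload and send JSON headers.
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def app(environ, start_response):
    # Stdlib WSGI stand-in for the Flask app (not the author's api.py).
    path = environ.get("PATH_INFO", "/")
    method = environ.get("REQUEST_METHOD", "GET")

    # Placeholder messages: the real texts are not given in the README.
    if method == "GET" and path == "/hello":
        return _json_response(start_response, "200 OK",
                              {"message": "Hello from the RAG sprint!"})
    if method == "GET" and path == "/ai-ready":
        return _json_response(start_response, "200 OK",
                              {"message": "Keep shipping: you are AI-ready."})

    if method == "POST" and path == "/calculate":
        # Hypothetical contract: the body is JSON like {"a": 2, "b": 3}.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = environ["wsgi.input"].read(length)
        try:
            data = json.loads(raw or b"{}")
            total = float(data["a"]) + float(data["b"])
        except (ValueError, KeyError, TypeError):
            return _json_response(start_response, "400 Bad Request",
                                  {"error": 'expected JSON like {"a": 2, "b": 3}'})
        return _json_response(start_response, "200 OK", {"sum": total})

    return _json_response(start_response, "404 Not Found", {"error": "not found"})


def call(method: str, path: str, payload=None):
    # Minimal in-process client: builds a WSGI environ and captures the status.
    body = json.dumps(payload).encode("utf-8") if payload is not None else b""
    environ = {
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    chunks = app(environ, start_response)
    return captured["status"], json.loads(b"".join(chunks))


print(call("POST", "/calculate", {"a": 2, "b": 3}))
```

Rejecting malformed input with a `400` instead of letting the handler raise keeps the `/calculate` contract explicit. Flask gives the same result with `request.get_json()` plus an explicit error response.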

### Tomorrow's Plan

- Begin **LangChain** setup and environment configuration
- Start working on **RAG-based document processing**
- Set up the folder structure and the `day2` workflow

> One day down, 29 to go. Keep shipping.

## Day 3: First RAG System ✅

### What I Built

- PDF processing pipeline (loader + optimal chunker)
- Compared 3 chunking strategies (fixed, recursive, token)
- ChromaDB vector storage (persistent)
- SentenceTransformer embeddings (MiniLM)
- Gradio chat interface (upload PDF → ask)
- Deployment on Hugging Face Spaces
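
The three chunking strategies compared here can be sketched in plain Python. This is an illustrative version, not the project's chunker, and it rests on three assumptions. Fixed and recursive sizes are measured in characters. The recursive splitter's separator order (paragraph, line, sentence, word) is modeled on the common fallback approach. Whitespace-separated words stand in for real model tokens in the token splitter.

```python
SEPARATORS = ("\n\n", "\n", ". ", " ")  # assumed coarse-to-fine order


def fixed_chunks(text: str, size: int, overlap: int = 0) -> list[str]:
    # Cut every `size` characters; consecutive chunks share `overlap` characters.
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def recursive_chunks(text: str, size: int, separators=SEPARATORS) -> list[str]:
    # Keep text whole if it fits; otherwise split on the coarsest separator present,
    # greedily merge pieces back up to `size`, and recurse into oversized pieces.
    if len(text) <= size:
        return [text] if text.strip() else []
    if not separators:
        return fixed_chunks(text, size)
    sep, rest = separators[0], separators[1:]
    if sep not in text:
        return recursive_chunks(text, size, rest)
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= size:
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(piece) <= size:
            current = piece
        else:
            chunks.extend(recursive_chunks(piece, size, rest))
            current = ""
    if current:
        chunks.append(current)
    return [c for c in chunks if c.strip()]


def token_chunks(text: str, max_tokens: int, overlap: int = 0) -> list[str]:
    # Whitespace words approximate tokens (assumption); a real pipeline would
    # count tokens with the embedding model's tokenizer.
    if not 0 <= overlap < max_tokens:
        raise ValueError("need 0 <= overlap < max_tokens")
    words = text.split()
    if not words:
        return []
    step = max_tokens - overlap
    return [" ".join(words[i:i + max_tokens])
            for i in range(0, max(len(words) - overlap, 1), step)]


doc = "RAG retrieves context.\n\nThen an LLM answers. It cites the chunks it used."
for name, result in [
    ("fixed", fixed_chunks(doc, 30, overlap=5)),
    ("recursive", recursive_chunks(doc, 30)),
    ("token", token_chunks(doc, 5)),
]:
    print(name, result)
```

The trade-off shows up directly in the output. Fixed chunks can cut words and sentences in half. The recursive splitter keeps paragraphs and sentences intact when they fit, although this simple version drops the separator it splits on. Token chunks bound the model-side cost per chunk.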

### Key Learnings

- Trade-offs between fixed-size, recursive, and token-based chunking
- Chroma expects embeddings as `list[list[float]]` (one list of floats per chunk)
- The newer Chroma API uses `PersistentClient` for on-disk storage
- Prompt design: extractive answers with a fallback
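
One way to implement "extractive answers with a fallback" is shown below. The prompt tells the model to answer only with text copied from the retrieved context and to reply with a fixed fallback sentence otherwise. A cheap post-check then swaps any ungrounded answer for that sentence. The wording, the `FALLBACK` string, and the helpers `build_prompt`, `is_grounded`, and `finalize` are all assumptions for illustration, not the project's actual prompt. The prompt string would be sent to the Groq chat model, and that call is omitted here.

```python
# Hypothetical fallback text; the project's real wording is not shown.
FALLBACK = "I could not find the answer in the uploaded PDF."


def build_prompt(question: str, chunks: list[str]) -> str:
    # Number the retrieved chunks so the evidence is easy to inspect.
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only sentences copied from the context below.\n"
        f'If the context does not contain the answer, reply exactly: "{FALLBACK}"\n\n'
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def is_grounded(answer: str, chunks: list[str]) -> bool:
    # An extractive answer should appear verbatim in some retrieved chunk.
    answer = answer.strip()
    if not answer:
        return False
    return answer == FALLBACK or any(answer in chunk for chunk in chunks)


def finalize(answer: str, chunks: list[str]) -> str:
    # Replace anything the context does not support with the fallback.
    return answer.strip() if is_grounded(answer, chunks) else FALLBACK


chunks = ["Chroma stores the embeddings on disk.", "Groq hosts the LLM."]
print(build_prompt("Where are embeddings stored?", chunks))
print(finalize("Chroma stores the embeddings on disk.", chunks))
print(finalize("Probably in the cloud.", chunks))
```

Verbatim matching is strict: it rejects paraphrased answers that are still correct. That rigidity is the intended trade-off when every answer must be traceable to the PDF.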

### Live Demo

[HuggingFace Space Link](https://didactic-winner-q7g79xg9gp4626w56-7860.app.github.dev/)

## Contact

Made by [Hamid Omarov](https://www.linkedin.com/in/hamidomarov)

Check out my portfolio: [Notion Page](https://www.notion.so/AI-Content-Factory-Operations-2400a72a724c8050b5c6ddc0e6a0a77d)