Hamid Omarov committed
Commit 22f4b8b · Parent: 0ebeee2

Add Day 3 README
README.md
CHANGED
@@ -61,6 +61,24 @@ Check commits and folders daily to follow the sprint. Each folder corresponds to
> 📣 One day down, 29 to go. Keep shipping.
## Day 3: First RAG System ✅

### What I Built
- PDF processing pipeline (loader + optimal chunker)
- Compared 3 chunking strategies (fixed, recursive, token)
- ChromaDB vector storage (persistent)
- SentenceTransformer embeddings (MiniLM)
- Gradio chat interface (upload PDF → ask)
- Deployment on Hugging Face Spaces
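
The three chunking strategies compared above can be sketched without any dependencies. This is an illustration under stated assumptions, not the project's chunker: sizes are in characters (tokens for `token_chunks`), the default values are hypothetical, and whitespace splitting stands in for a real model tokenizer such as MiniLM's. The recursive splitter follows the same idea as LangChain's `RecursiveCharacterTextSplitter`, but it is a simplified reimplementation with hypothetical function names.

```python
# Minimal sketches of fixed, recursive, and token-based chunking.
# Default sizes are hypothetical; the README does not state the values used.


def fixed_chunks(text: str, size: int = 200, overlap: int = 20) -> list[str]:
    """Fixed-size character windows; neighbouring windows share `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining text is already covered by the previous window's tail.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def recursive_chunks(
    text: str,
    size: int = 200,
    separators: tuple[str, ...] = ("\n\n", "\n", ". ", " "),
) -> list[str]:
    """Split on the coarsest separator first, merge small pieces, recurse on oversized ones."""
    if len(text) <= size:
        return [text] if text.strip() else []
    if not separators:
        # No separator left: fall back to hard character cuts.
        return [text[i:i + size] for i in range(0, len(text), size)]
    sep, finer = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = f"{current}{sep}{piece}" if current else piece
        if len(candidate) <= size:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) <= size:
            current = piece
        else:
            # Piece is still too long: retry it with the next, finer separator.
            chunks.extend(recursive_chunks(piece, size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


def token_chunks(text: str, max_tokens: int = 50, overlap: int = 5) -> list[str]:
    """Windows of whitespace-separated tokens; stands in for a model tokenizer."""
    if not 0 <= overlap < max_tokens:
        raise ValueError("need 0 <= overlap < max_tokens")
    tokens = text.split()
    if not tokens:
        return []
    step = max_tokens - overlap
    return [
        " ".join(tokens[i:i + max_tokens])
        for i in range(0, max(len(tokens) - overlap, 1), step)
    ]
```

For example, `fixed_chunks("abcdefghij", size=4, overlap=1)` gives `["abcd", "defg", "ghij"]`. Fixed windows are the cheapest but can cut sentences in half; the recursive splitter keeps paragraph and sentence boundaries when it can; token windows keep each chunk's length tied to what the embedding model actually sees.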

### Key Learnings
- Trade-offs between fixed-size, recursive, and token-based chunking
- Embeddings passed to Chroma must be `list[list[float]]`, e.g. convert `SentenceTransformer.encode` output with `.tolist()`
- The newer Chroma API (0.4+) uses `chromadb.PersistentClient` for on-disk storage
- Prompt design: extractive answers plus a fallback
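
The "extractive answers + fallback" learning can be sketched in plain Python. This assumes the fallback is a fixed reply used when retrieval finds nothing relevant or the model returns nothing; `Hit`, `build_prompt`, `answer`, `FALLBACK`, `MIN_SCORE`, and the `llm` callable are hypothetical names and values, not taken from the project's code.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical wording and threshold; the README does not give the actual prompt.
FALLBACK = "I couldn't find the answer in the uploaded document."
MIN_SCORE = 0.3


@dataclass
class Hit:
    text: str     # retrieved chunk
    score: float  # similarity, higher means more relevant


def build_prompt(question: str, hits: List[Hit], min_score: float = MIN_SCORE) -> Optional[str]:
    """Build an extractive-QA prompt, or return None if no chunk clears the threshold."""
    relevant = [h for h in hits if h.score >= min_score]
    if not relevant:
        return None
    # Number the chunks so the model can stay close to the source text.
    context = "\n\n".join(f"[{i}] {h.text}" for i, h in enumerate(relevant, start=1))
    return (
        "Answer using only text copied from the context below.\n"
        f'If the context does not contain the answer, reply exactly: "{FALLBACK}"\n\n'
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def answer(question: str, hits: List[Hit], llm: Callable[[str], str]) -> str:
    """Skip the model entirely when retrieval found nothing; also fall back on empty output."""
    prompt = build_prompt(question, hits)
    if prompt is None:
        return FALLBACK
    return llm(prompt).strip() or FALLBACK
```

Some vector stores, Chroma included, return distances (lower is closer) rather than similarities, so either convert them before building `Hit` objects or flip the comparison in `build_prompt`.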

### Live Demo
[App link (GitHub Codespaces forwarded port)](https://didactic-winner-q7g79xg9gp4626w56-7860.app.github.dev/)

## 📬 Contact