Hamid Omarov committed on
Commit a6f5647 · 2 Parent(s): 2005cbc 5ba6d70

Merge remote-tracking branch 'hf/main'

Files changed (2)
  1. README.md +12 -83
  2. requirements .txt +0 -6
README.md CHANGED
@@ -1,86 +1,15 @@
- # RAG 30 Days Sprint 🚀

- This repository contains a 30-day sprint to master Retrieval-Augmented Generation (RAG) systems using Python, LangChain, and modern AI tools.

- ## 📅 Day Tracker

- | Day | Folder | Description | Status |
- |-----|--------|------------------------|--------|
- | 1 | day1 | Hello world test file | ✅ |
- | 2 | day2 | TBD | ⏳ |
- | ... | ... | ... | ... |
-
- ## 📂 Folder Structure
-
- rag-30-days/
- │
- ├── day1/
- │   └── hello_ai.py
- │
- ├── README.md
-
- markdown
- Copy
- Edit
-
- ## 🧠 Goal
-
- To build a production-ready RAG pipeline in 30 days and land a remote AI job by the end of the sprint.
-
- ## 🛠️ Tools
-
- - Python
- - LangChain
- - ChromaDB / Weaviate / FAISS
- - OpenAI API
- - Streamlit (optional UI)
- - Git & GitHub
-
- ## 📈 Progress
-
- Check commits and folders daily to follow the sprint. Each folder corresponds to 1 day of learning and building.
-
- ## 📅 Day 1 – Getting Started with Python & Flask
-
- ### ✅ What I Learned
- - Refreshed core **Python basics** (variables, functions, classes, etc.)
- - Built my first **Flask API** with real-world JSON responses
- - Practiced structured coding with **Copilot assistance**
-
- ### 🛠️ What I Built
- - `hello_ai.py`: A minimal Python script to print a welcome message
- - `api.py`: A Flask application with 3 endpoints:
-   - `/hello`: greeting message
-   - `/calculate`: accepts 2 numbers (POST) and returns their sum
-   - `/ai-ready`: motivational message for AI learning
-
- ### 🔮 Tomorrow's Plan
- - Begin **LangChain** setup and environment configuration
- - Start working on **RAG-based document processing**
- - Set up folder structure and `day2` workflow
-
- > 👣 One day down, 29 to go. Keep shipping.
-
- ## Day 3: First RAG System ✅
-
- ### What I Built
- - PDF processing pipeline (loader + optimal chunker)
- - Compared 3 chunking strategies (fixed, recursive, token)
- - ChromaDB vector storage (persistent)
- - SentenceTransformer embeddings (MiniLM)
- - Gradio chat interface (upload PDF → ask)
- - Deployment on Hugging Face Spaces
-
- ### Key Learnings
- - Fixed vs Recursive vs Token-based chunking trade-offs
- - Embedding format must be list[list[float]] for Chroma
- - New Chroma API uses `PersistentClient`
- - Prompt design: extractive answers + fallback
-
- ### Live Demo
- 🔗 [HuggingFace Space Link](https://didactic-winner-q7g79xg9gp4626w56-7860.app.github.dev/)
-
- ## 📬 Contact
-
- Made by [Hamid Omarov](https://www.linkedin.com/in/hamidomarov)
- Check out my portfolio: [Notion Page](https://www.notion.so/AI-Content-Factory-Operations-2400a72a724c8050b5c6ddc0e6a0a77d)

+ ---
+ title: PDF RAG (Chroma + Groq)
+ emoji: 📚
+ colorFrom: indigo
+ colorTo: green
+ sdk: gradio
+ sdk_version: "4.44.0"
+ app_file: app.py
+ pinned: false
+ ---

+ # PDF RAG (Chroma + Groq)

+ Upload a PDF and ask questions. Uses ChromaDB for retrieval and Groq LLM for answers.

requirements .txt DELETED
@@ -1,6 +0,0 @@
- gradio
- chromadb
- sentence-transformers
- langchain-groq
- pypdf
- python-dotenv