
RAG 30 Days Sprint 🚀

This repository contains a 30-day sprint to master Retrieval-Augmented Generation (RAG) systems using Python, LangChain, and modern AI tools.

📅 Day Tracker

| Day | Folder | Description | Status |
|-----|--------|-------------|--------|
| 1 | day1 | Hello world test file | ✅ |
| 2 | day2 | TBD | ⏳ |
| ... | ... | ... | ... |

📂 Folder Structure

rag-30-days/
│
├── day1/
│   └── hello_ai.py
│
├── README.md


🧠 Goal

To build a production-ready RAG pipeline in 30 days and land a remote AI job by the end of the sprint.

🛠️ Tools

  • Python
  • LangChain
  • ChromaDB / Weaviate / FAISS
  • OpenAI API
  • Streamlit (optional UI)
  • Git & GitHub

📈 Progress

Check commits and folders daily to follow the sprint. Each folder corresponds to 1 day of learning and building.

📅 Day 1 – Getting Started with Python & Flask

✅ What I Learned

  • Refreshed core Python basics (variables, functions, classes, etc.)
  • Built my first Flask API with real-world JSON responses
  • Practiced structured coding with Copilot assistance

🛠️ What I Built

  • hello_ai.py: A minimal Python script to print a welcome message
  • api.py: A Flask application with 3 endpoints:
    • /hello: greeting message
    • /calculate: accepts 2 numbers (POST) and returns their sum
    • /ai-ready: motivational message for AI learning
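
The three endpoints listed above could be sketched roughly as follows. This is a minimal illustration, not the actual contents of api.py: the JSON field names `a` and `b`, the response messages, and the `calculate_sum` and `create_app` helpers are all assumptions made for the example.

```python
from typing import Any, Dict, Tuple


def calculate_sum(payload: Any) -> Tuple[Dict[str, Any], int]:
    """Validate the JSON body for /calculate and return (response body, HTTP status)."""
    if not isinstance(payload, dict):
        return {"error": "expected a JSON object"}, 400
    try:
        # 'a' and 'b' are assumed field names; the README only says "2 numbers".
        a, b = float(payload["a"]), float(payload["b"])
    except (KeyError, TypeError, ValueError):
        return {"error": "fields 'a' and 'b' must be numbers"}, 400
    return {"a": a, "b": b, "sum": a + b}, 200


def create_app():
    """Application factory exposing the three Day 1 endpoints."""
    # Flask is imported here so calculate_sum stays usable without Flask installed.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.get("/hello")
    def hello():
        return jsonify(message="Hello from the RAG sprint!")

    @app.post("/calculate")
    def calculate():
        # silent=True yields None instead of raising on a missing or invalid body.
        body, status = calculate_sum(request.get_json(silent=True))
        return jsonify(body), status

    @app.get("/ai-ready")
    def ai_ready():
        return jsonify(message="Keep learning: you are AI-ready.")

    return app
```

Assuming the file is saved as api.py, `flask --app api run` should pick up the `create_app` factory. Keeping the validation in a plain function makes the summing logic easy to check without starting a server.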

🔮 Tomorrow's Plan

  • Begin LangChain setup and environment configuration
  • Start working on RAG-based document processing
  • Set up folder structure and day2 workflow

👣 One day down, 29 to go. Keep shipping.

Day 3: First RAG System ✅

What I Built

  • PDF processing pipeline (loader + optimal chunker)
  • Compared 3 chunking strategies (fixed, recursive, token)
  • ChromaDB vector storage (persistent)
  • SentenceTransformer embeddings (MiniLM)
  • Gradio chat interface (upload PDF → ask)
  • Deployment on Hugging Face Spaces
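
To make the three chunking strategies concrete, here is a dependency-free sketch of each. It is an illustration only: the project may well have used library splitters instead, the function names and default sizes are hypothetical, and the token variant counts whitespace-separated words as a stand-in for a real tokenizer.

```python
from typing import List, Sequence


def fixed_chunks(text: str, size: int = 200, overlap: int = 20) -> List[str]:
    """Cut every `size` characters, repeating `overlap` characters between chunks."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop early enough that the last chunk is not just the previous overlap.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def recursive_chunks(
    text: str,
    size: int = 200,
    separators: Sequence[str] = ("\n\n", "\n", ". ", " "),
) -> List[str]:
    """Split on the coarsest separator first; recurse into pieces that are still too long."""
    if len(text) <= size:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: fall back to hard cuts.
        return fixed_chunks(text, size, overlap=0)
    head, rest = separators[0], tuple(separators[1:])
    chunks: List[str] = []
    current = ""
    for piece in text.split(head):
        candidate = f"{current}{head}{piece}" if current else piece
        if len(candidate) <= size:
            current = candidate  # keep merging small pieces
            continue
        if current:
            chunks.append(current)
        if len(piece) > size:
            chunks.extend(recursive_chunks(piece, size, rest))
            current = ""
        else:
            current = piece
    if current:
        chunks.append(current)
    return [c for c in chunks if c.strip()]


def token_chunks(text: str, max_tokens: int = 50) -> List[str]:
    """Group whitespace-separated tokens; a real tokenizer would count subword tokens."""
    tokens = text.split()
    return [" ".join(tokens[i:i + max_tokens]) for i in range(0, len(tokens), max_tokens)]
```

The trade-off shows up in where the cuts land: fixed chunks are predictable but can split mid-sentence, recursive chunks respect paragraph and sentence boundaries at the cost of uneven sizes, and token chunks track a model's context budget more closely than character counts do.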

Key Learnings

  • Fixed vs Recursive vs Token-based chunking trade-offs
  • Embedding format must be list[list[float]] for Chroma
  • Newer Chroma versions use chromadb.PersistentClient for on-disk storage
  • Prompt design: extractive answers + fallback
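
The embedding-format and PersistentClient points can be illustrated together. The sketch below is an assumption-laden example rather than the project's actual code: the helper names, the `./chroma_db` path, the `pdf_chunks` collection name, and the `chunk-N` ID scheme are all hypothetical.

```python
from typing import Iterable, List, Sequence


def to_chroma_embeddings(vectors: Iterable[Iterable[float]]) -> List[List[float]]:
    """Convert any 2-D array-like (for example a NumPy array) into list[list[float]]."""
    return [[float(x) for x in row] for row in vectors]


def store_chunks(
    chunks: Sequence[str],
    vectors: Iterable[Iterable[float]],
    path: str = "./chroma_db",           # hypothetical storage directory
    collection_name: str = "pdf_chunks",  # hypothetical collection name
):
    """Persist chunks and their embeddings in an on-disk Chroma collection."""
    import chromadb  # imported lazily; requires `pip install chromadb`

    client = chromadb.PersistentClient(path=path)
    collection = client.get_or_create_collection(name=collection_name)
    collection.add(
        ids=[f"chunk-{i}" for i in range(len(chunks))],
        documents=list(chunks),
        embeddings=to_chroma_embeddings(vectors),
    )
    return collection
```

With SentenceTransformers, `model.encode(chunks)` returns a NumPy array by default, so passing its result through `to_chroma_embeddings` (or calling `.tolist()` on it) gives the nested-list shape noted above.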

Live Demo

🔗 HuggingFace Space Link

📬 Contact

Made by Hamid Omarov
Check out my portfolio: Notion Page