# Master Plan – "Knowledge-Base Browser" Gradio Component

*Track 2 – Custom Components*

## Project Timeline

| Day | Milestone | Output |
|-----|-----------|--------|
| Mon (½ day left) | Finalize spec & repo | README with scope, architecture diagram |
| Tue | Component scaffolding | `gradio cc init kb_browser`, `index.html`, `script.tsx`, `__init__.py` |
| Wed | Backend – retrieval service | LlamaIndex/FAISS index builder, query API |
| Thu | Frontend – results panel UI | React table / accordion, source-link cards |
| Fri | Agent integration demo | Notebook + minimal MCP agent calling component |
| Sat | Polishing, tests, docs | Unit tests, docs site, publish to Gradio Hub |
| Sun (AM) | Submission video & write-up | 90-sec demo, project report |

## Core Features (MVP)

1. Accepts a query string or agent-emitted JSON
2. Calls retrieval API → returns `[{"title": .., "snippet": .., "url": ..}]`
3. Renders expandable result cards + "open source" button
4. Emits the selected doc back to the parent (so the agent can cite it)
5. Works in both human-click and agent-autonomous modes

---

## Prompt-Script Series for LLM Assistant

Copy-paste each block into your favorite model (GPT-4o, Claude 3, etc.). Each step builds on the previous; stop when the code runs.

**System:** You are an expert Gradio + React developer…

**User:** Follow the numbered roadmap below. Output only the requested files in markdown code-blocks each time.
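The result contract in feature 2 can be pinned down with a tiny schema check before any UI work starts. This is only a sketch of the payload shape; `ResultDoc` and `validate_results` are illustrative names, not part of the planned component API:

```python
from typing import List, TypedDict


class ResultDoc(TypedDict):
    """One retrieval hit, matching the MVP contract
    [{"title": .., "snippet": .., "url": ..}]."""
    title: str
    snippet: str
    url: str


def validate_results(results: List[dict]) -> List[ResultDoc]:
    """Drop malformed hits so the UI never renders a half-empty card."""
    required = {"title", "snippet", "url"}
    return [r for r in results if required <= r.keys()]


sample = [
    {"title": "RAG paper", "snippet": "Retrieval-augmented…", "url": "https://example.com/rag"},
    {"title": "broken"},  # missing snippet/url → filtered out
]
print(validate_results(sample))
```

Validating at the boundary keeps both the human-click and agent-autonomous paths honest, since agent-emitted JSON is the likelier source of malformed hits.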
### Step 1 – Scaffold

**1️⃣ Generate `__init__.py`, `index.html`, `script.tsx`, and `package.json`**

- Component name: `kb_browser`
- Props: `query: string`, `results: any[]`
- Events: `submit(query)`, `select(doc)`

### Step 2 – Backend retrieval

**2️⃣ Write `retriever.py`**

- Build a FAISS vector store from `./data/*.pdf` using LlamaIndex
- Expose `search(query, k=5) -> List[Dict]`
- Include dummy driver code for local testing

### Step 3 – Wire front-end ↔ back-end

**3️⃣ Update `script.tsx`**

- On `submit`, POST to `/search`
- Render results in a Material-UI Accordion
- On clicking "Use", fire the `select(doc)` event

### Step 4 – Gradio component class

**4️⃣ In `__init__.py`**

- Subclass `gradio.Component`
- Define `load`, `update`, `submit`, `select` methods
- Register the REST `/search` route

### Step 5 – Demo app

**5️⃣ Create `demo.py`**

- Loads the component
- Adds a text input + "Ask" button
- Shows an agent example that calls the component via MCP

### Step 6 – Tests & publishing

**6️⃣ Provide a pytest suite for backend & frontend**

- CI workflow YAML

**7️⃣ Command to publish:** `gradio cc publish kb_browser --name "KnowledgeBaseBrowser"`

*(After each step: run `npm run dev` + `python demo.py`, fix issues, then proceed.)*

---

## Pro-Tips for Implementation

- Keep package size < 2 MB (judging criteria).
- Defer heavy work to the backend; the UI stays lightweight.
- Use streaming in Gradio (`yield`) for a snappy UX.
- Cache the index on disk to slash startup time.
- Include a dark/light theme toggle – easy polish points.
- Record a GIF of the agent citing docs live → eye-catching in the demo.
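Step 2's "dummy driver" can be stubbed out before LlamaIndex and FAISS are wired in, which lets Steps 3–5 proceed against a working `/search` backend. The sketch below keeps the planned `search(query, k=5) -> List[Dict]` contract but scores with a dependency-free bag-of-words cosine; the in-memory `DOCS` corpus and the helper names are placeholders, not the real retriever:

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Stand-in corpus; the real retriever.py would build a FAISS index
# over ./data/*.pdf via LlamaIndex instead.
DOCS = [
    {"title": "Intro to FAISS", "snippet": "FAISS is a vector similarity library.", "url": "docs/faiss.pdf"},
    {"title": "Gradio components", "snippet": "Custom components extend the Gradio UI.", "url": "docs/gradio.pdf"},
    {"title": "LlamaIndex basics", "snippet": "LlamaIndex builds indexes over documents.", "url": "docs/llamaindex.pdf"},
]


def _tokens(text: str) -> Counter:
    """Lowercase bag-of-words term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query: str, k: int = 5) -> List[Dict]:
    """Same contract as the planned retriever.py: top-k docs by similarity."""
    q = _tokens(query)
    scored = [(_cosine(q, _tokens(d["title"] + " " + d["snippet"])), d) for d in DOCS]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [dict(d, score=round(s, 3)) for s, d in scored[:k] if s > 0]


if __name__ == "__main__":
    # Dummy driver for a quick local smoke test.
    for hit in search("vector similarity faiss", k=2):
        print(hit["title"], hit["score"])
```

Because the function signature matches the plan, swapping in the real LlamaIndex/FAISS implementation later should not disturb the frontend wiring from Step 3.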
## Implementation Status

### ✅ Completed Features

- **Component Scaffolding**: Complete Gradio custom component structure with proper TypeScript and Python files
- **Backend Retrieval Service**: LlamaIndex + FAISS vector store with OpenAI embeddings for semantic search
- **Frontend UI**: React TypeScript interface with modern design, expandable result cards, and source links
- **Search Capabilities**: Semantic, keyword, and hybrid search modes with relevance scoring
- **Citation Management**: Real-time citation tracking with export functionality
- **Agent Integration**: Both human interactive mode and AI agent autonomous research capabilities
- **Documentation**: Comprehensive README, API documentation, and usage examples
- **Testing**: Test suite covering core functionality and edge cases
- **Publishing Setup**: Package configuration and publishing scripts ready

### 🎯 Key Technical Achievements

1. **Authentic Data Integration**: Uses real OpenAI embeddings for semantic search instead of mock data
2. **Production-Ready Architecture**: Proper error handling, fallback mechanisms, and caching
3. **Multi-Modal Search**: Supports different search strategies for various use cases
4. **Source Verification**: Includes proper citation tracking and source links
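The hybrid mode listed under Search Capabilities presumably blends the semantic and keyword scores; a common convention is a weighted sum over normalised per-document scores. Both the normalisation assumption and the `alpha` weight below are illustrative, since the actual blending logic isn't specified in this plan:

```python
from typing import Dict, List


def hybrid_rank(semantic: Dict[str, float], keyword: Dict[str, float],
                alpha: float = 0.7, k: int = 5) -> List[str]:
    """Blend per-document scores from two searchers into one ranking.

    alpha weights the semantic score; (1 - alpha) weights the keyword
    score. Both inputs map doc id -> score already scaled to [0, 1].
    """
    ids = set(semantic) | set(keyword)
    blended = {i: alpha * semantic.get(i, 0.0) + (1 - alpha) * keyword.get(i, 0.0)
               for i in ids}
    return sorted(blended, key=blended.get, reverse=True)[:k]


# A doc that is decent in both lists can outrank one strong in only one.
print(hybrid_rank({"a": 0.9, "b": 0.5}, {"b": 0.9, "c": 0.8}, alpha=0.5))
```

Exposing `alpha` as a component parameter would let users tune the semantic/keyword trade-off per corpus.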
5. **Agent-Ready Design**: Built for both human users and autonomous AI agents

### 📁 Project Structure

```
kb_browser/
├── __init__.py            # Main Gradio component class
├── retriever.py           # LlamaIndex + FAISS backend
├── script.tsx             # React TypeScript frontend
├── index.html             # Component HTML template
├── package.json           # Frontend dependencies
├── pyproject.toml         # Python package configuration
└── README.md              # Component documentation

Root/
├── demo.py                # Human + Agent demo application
├── gradio_demo.py         # Complete Gradio demo
├── test_kb_browser.py     # Comprehensive test suite
├── verify_component.py    # Component verification script
└── docs/
    └── master-plan.md     # This master plan document
```

### 🚀 Usage Examples

**Basic Component Usage:**

```python
from kb_browser import KnowledgeBrowser

kb_browser = KnowledgeBrowser(
    index_path="./documents",
    search_type="semantic",
    max_results=10,
)

results = kb_browser.search("retrieval augmented generation")
```

**Agent Integration:**

```python
def agent_research(question):
    results = kb_browser.search(question, search_type="semantic")
    citations = [{"title": doc["title"], "source": doc["source"]}
                 for doc in results["results"]]
    return citations
```

**Human Interface:**

```python
import gradio as gr

with gr.Blocks() as demo:
    query = gr.Textbox(label="Search Query")
    search_btn = gr.Button("Search")
    results = gr.JSON(label="Results")
    search_btn.click(kb_browser.search, query, results)
```

Execute the six prompt blocks sequentially and you'll have a polished, judge-ready custom component by Friday. Good luck!
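One completed feature not shown in the usage examples is citation export. A minimal sketch, assuming the `[{"title": .., "source": ..}]` dicts returned by `agent_research` and a hypothetical `export_citations_markdown` helper (the real component's export API may differ):

```python
from typing import Dict, List


def export_citations_markdown(citations: List[Dict[str, str]]) -> str:
    """Render tracked citations as a numbered Markdown link list.

    Expects the [{"title": .., "source": ..}] dicts produced by the
    agent-integration example.
    """
    lines = [f"{i}. [{c['title']}]({c['source']})"
             for i, c in enumerate(citations, start=1)]
    return "\n".join(lines)


cites = [
    {"title": "RAG survey", "source": "https://example.com/rag"},
    {"title": "FAISS docs", "source": "https://example.com/faiss"},
]
print(export_citations_markdown(cites))
```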