abdullahmeda committed
Commit 0955e72
1 Parent(s): 92dd823

finalize hfcontext7 v2

Files changed (11)
  1. README.md +156 -3
  2. Relevant README 1.md +311 -0
  3. Relevant README 2.md +618 -0
  4. app.py +128 -38
  5. make_docs.py +33 -22
  6. make_rag_db.py +34 -11
  7. postBuild +0 -2
  8. repo2txt.py +15 -104
  9. requirements.txt +6 -5
  10. schemas.py +4 -0
  11. utils.py +68 -0
README.md CHANGED
@@ -1,6 +1,6 @@
  ---
- title: HfContext7
- emoji: 🐠
+ title: HFContext7
+ emoji: 🤗
  colorFrom: pink
  colorTo: yellow
  sdk: gradio
@@ -9,8 +9,161 @@ app_file: app.py
  pinned: false
  tags:
  - mcp-server-track
+ - Agents-MCP-Hackathon
  license: apache-2.0
  short_description: Latest 🤗 documentation for LLMs and AI code editors
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # 🤗 HfContext7 MCP Server
+
+ <p align="center">
+ <em>Real-time HuggingFace Documentation for AI Coding Assistants and LLMs</em>
+ </p>
+
+ ## 🚀 What is HfContext7?
+
+ **HfContext7** is a specialized Model Context Protocol (MCP) server designed to provide AI coding assistants and Large Language Models (LLMs) with **real-time, up-to-date documentation** from the HuggingFace ecosystem.
+
+ Inspired by the groundbreaking [Context7 MCP Server](https://github.com/upstash/context7), HfContext7 specifically targets the rapidly evolving HuggingFace libraries, ensuring your AI assistant always has the latest and most accurate information.
+
+ ---
+
+ ## ❌ The Problem We Solve
+
+ The HuggingFace ecosystem evolves at lightning speed. New APIs, features, and best practices emerge constantly, making it challenging for LLMs trained on static datasets to keep up. This leads to:
+
+ - ❌ Outdated code examples based on old training data
+ - ❌ Hallucinated APIs that no longer exist or never existed
+ - ❌ Generic answers that don't reflect current HuggingFace best practices
+ - ❌ Confusion between similar HuggingFace libraries (Transformers, Diffusers, PEFT, etc.)
+
+ ---
+
+ ## ✅ How HfContext7 Solves It
+
+ The HfContext7 MCP server solves these issues by:
+
+ - **Real-time Documentation**: Fetching the latest HuggingFace documentation directly from official sources.
+ - **Semantic Search**: Leveraging advanced embeddings and vector search (powered by Milvus and OpenAI embeddings) to retrieve highly relevant documentation snippets.
+ - **Seamless Integration**: Integrating easily with popular AI coding assistants (Cursor, Claude Desktop, Windsurf, etc.) via MCP.
+
+ Simply add `use hfcontext7` to your prompt:
+
+ ```txt
+ Create a LoRA fine-tuning script for Llama with PEFT. use hfcontext7
+ ```
+
+ ```txt
+ Set up a Gradio interface with Diffusers for image generation. use hfcontext7
+ ```
+
+ HfContext7 instantly provides your AI assistant with accurate, up-to-date HuggingFace documentation and code examples.
+
+ ---
+
+ ## 📚 Supported HuggingFace Libraries (28+)
+
+ HfContext7 supports a wide range of HuggingFace libraries, including:
+
+ - **Transformers** – State-of-the-art NLP models
+ - **Diffusers** – Diffusion models for image/audio generation
+ - **PEFT** – Parameter-Efficient Fine-Tuning (LoRA, etc.)
+ - **TRL** – Transformer Reinforcement Learning
+ - **Datasets** – Access and share datasets
+ - **Accelerate** – Simplified distributed training
+ - **Text Generation Inference (TGI)** – High-performance inference
+ - **Optimum** – Hardware-optimized transformers
+ - **AutoTrain** – No-code training platform
+ - **bitsandbytes** – 8-bit optimizers and quantization
+
+ ...and many more! (Full list available in `repos_config.json`)
+
+ ---
+
+ ## 🛠️ Available Tools
+
+ HfContext7 provides essential tools for AI coding assistants:
+
+ - **`list_huggingface_resources_names`**: Lists all available HuggingFace resources in the documentation database.
+ - **`get_huggingface_documentation`**: Retrieves relevant documentation for a specific topic, optionally filtered by resource names (see the example call below).
+
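+ For illustration, here is a minimal sketch of what calling these tools amounts to, using the functions defined in this Space's `app.py` (the topic and resource names are made-up examples; in practice the tools are invoked by an MCP client rather than imported directly, and importing `app.py` also runs its startup code):
+
+ ```python
+ from app import list_huggingface_resources_names, get_huggingface_documentation
+
+ # Discover which HuggingFace resources are indexed
+ resources = list_huggingface_resources_names()
+ print(resources)  # e.g. ["Transformers", "Diffusers", "PEFT", ...]
+
+ # Retrieve documentation focused on a topic, narrowed to one resource
+ docs = get_huggingface_documentation(
+     topic="LoRA methods PEFT",
+     resource_names=["PEFT"],
+ )
+ print(docs[:500])
+ ```
+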
+ ---
+
+ ## ⚙️ Quick Start
+
+ ### 1. Clone and Install
+
+ ```bash
+ git clone <repo-url>
+ cd hfcontext7
+ pip install -r requirements.txt
+ ```
+
+ ### 2. Set OpenAI API Key
+
+ ```bash
+ echo "OPENAI_API_KEY=your_key_here" > .env
+ ```
+
+ ### 3. Build Documentation Database
+
+ ```bash
+ python make_docs.py
+ python make_rag_db.py
+ ```
+
+ ### 4. Run the Server
+
+ ```bash
+ python app.py
+ ```
+
+ ---
+
+ ## 🔌 MCP Client Setup
+
+ ### Cursor & Claude Desktop Example
+
+ ```json
+ {
+   "mcpServers": {
+     "hfcontext7": {
+       "command": "python",
+       "args": ["/path/to/hfcontext7/app.py"],
+       "env": {
+         "OPENAI_API_KEY": "your_openai_api_key"
+       }
+     }
+   }
+ }
+ ```
+
+ ---
+
+ ## 💡 How It Works
+
+ The HfContext7 MCP server workflow:
+
+ 1. **Crawls** official HuggingFace documentation repositories.
+ 2. **Organizes** documentation using semantic embeddings (OpenAI embeddings + Milvus vector DB).
+ 3. **Serves** relevant documentation snippets directly into your AI assistant's context via MCP (sketched below).
+ 4. **Updates** easily: just re-run the build scripts to refresh documentation.
+
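+ Under the hood, the retrieval core of step 3 is an embed-and-search pass over a Milvus collection. A minimal sketch of that step, mirroring the setup in this repo's `app.py` (the query string is an example):
+
+ ```python
+ import os
+ from pymilvus import MilvusClient, model
+
+ client = MilvusClient("milvus.db")
+ embedding_fn = model.dense.OpenAIEmbeddingFunction(
+     model_name="text-embedding-3-large",
+     api_key=os.environ["OPENAI_API_KEY"],
+     dimensions=3072,
+ )
+
+ # Embed the topic and retrieve the closest documentation chunks
+ query_vectors = embedding_fn.encode_queries(["TGI on Intel GPUs"])
+ results = client.search(
+     collection_name="hf_docs",
+     data=query_vectors,
+     limit=3,
+     output_fields=["text", "file_path", "resource"],
+ )
+ ```
+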
+ ---
+
+ ## 🌟 Inspired by Context7
+
+ This project was heavily inspired by the incredible [Context7 MCP Server](https://github.com/upstash/context7) by Upstash, which revolutionized how LLMs access general development documentation. While Context7 provides broad coverage across many frameworks, HfContext7 focuses specifically on the HuggingFace ecosystem, providing deeper, more specialized knowledge for AI/ML development.
+
+ ---
+
+ ## 📄 License
+
+ Apache 2.0
+
+ ---
+
+ <p align="center">
+ <strong>Stop fighting outdated HuggingFace examples. Get the latest docs in every prompt. 🚀</strong>
+ </p>
Relevant README 1.md ADDED
@@ -0,0 +1,311 @@
+ <h1 align="center">Crawl4AI RAG MCP Server</h1>
+
+ <p align="center">
+ <em>Web Crawling and RAG Capabilities for AI Agents and AI Coding Assistants</em>
+ </p>
+
+ A powerful implementation of the [Model Context Protocol (MCP)](https://modelcontextprotocol.io) integrated with [Crawl4AI](https://crawl4ai.com) and [Supabase](https://supabase.com/) for providing AI agents and AI coding assistants with advanced web crawling and RAG capabilities.
+
+ With this MCP server, you can <b>scrape anything</b> and then <b>use that knowledge anywhere</b> for RAG.
+
+ The primary goal is to bring this MCP server into [Archon](https://github.com/coleam00/Archon) as I evolve it to be more of a knowledge engine for AI coding assistants to build AI agents. This first version of the Crawl4AI/RAG MCP server will be improved upon greatly soon, especially to make it more configurable so you can use different embedding models and run everything locally with Ollama.
+
+ ## Overview
+
+ This MCP server provides tools that enable AI agents to crawl websites, store content in a vector database (Supabase), and perform RAG over the crawled content. It follows the best practices for building MCP servers based on the [Mem0 MCP server template](https://github.com/coleam00/mcp-mem0/) I provided on my channel previously.
+
+ The server includes several advanced RAG strategies that can be enabled to enhance retrieval quality:
+ - **Contextual Embeddings** for enriched semantic understanding
+ - **Hybrid Search** combining vector and keyword search
+ - **Agentic RAG** for specialized code example extraction
+ - **Reranking** for improved result relevance using cross-encoder models
+
+ See the [Configuration section](#configuration) below for details on how to enable and configure these strategies.
+
+ ## Vision
+
+ The Crawl4AI RAG MCP server is just the beginning. Here's where we're headed:
+
+ 1. **Integration with Archon**: Building this system directly into [Archon](https://github.com/coleam00/Archon) to create a comprehensive knowledge engine for AI coding assistants to build better AI agents.
+
+ 2. **Multiple Embedding Models**: Expanding beyond OpenAI to support a variety of embedding models, including the ability to run everything locally with Ollama for complete control and privacy.
+
+ 3. **Advanced RAG Strategies**: Implementing sophisticated retrieval techniques like contextual retrieval, late chunking, and others to move beyond basic "naive lookups" and significantly enhance the power and precision of the RAG system, especially as it integrates with Archon.
+
+ 4. **Enhanced Chunking Strategy**: Implementing a Context7-inspired chunking approach that focuses on examples and creates distinct, semantically meaningful sections for each chunk, improving retrieval precision.
+
+ 5. **Performance Optimization**: Increasing crawling and indexing speed to make it more realistic to "quickly" index new documentation and then leverage it within the same prompt in an AI coding assistant.
+
+ ## Features
+
+ - **Smart URL Detection**: Automatically detects and handles different URL types (regular webpages, sitemaps, text files)
+ - **Recursive Crawling**: Follows internal links to discover content
+ - **Parallel Processing**: Efficiently crawls multiple pages simultaneously
+ - **Content Chunking**: Intelligently splits content by headers and size for better processing (see the sketch below)
+ - **Vector Search**: Performs RAG over crawled content, optionally filtering by data source for precision
+ - **Source Retrieval**: Retrieves the sources available for filtering to guide the RAG process
+
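+ One simple way to implement that header-then-size chunking (an illustrative sketch, not necessarily this repo's exact logic):
+
+ ```python
+ def chunk_markdown(text: str, max_chars: int = 4000) -> list[str]:
+     """Split markdown into chunks: first by headers, then by size."""
+     sections, current = [], []
+     for line in text.splitlines(keepends=True):
+         if line.startswith("#") and current:
+             sections.append("".join(current))
+             current = []
+         current.append(line)
+     if current:
+         sections.append("".join(current))
+
+     # Further split any section that exceeds the size limit
+     chunks = []
+     for section in sections:
+         for i in range(0, len(section), max_chars):
+             chunks.append(section[i : i + max_chars])
+     return chunks
+ ```
+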
+ ## Tools
+
+ The server provides essential web crawling and search tools:
+
+ ### Core Tools (Always Available)
+
+ 1. **`crawl_single_page`**: Quickly crawl a single web page and store its content in the vector database
+ 2. **`smart_crawl_url`**: Intelligently crawl a full website based on the type of URL provided (sitemap, llms-full.txt, or a regular webpage that needs to be crawled recursively)
+ 3. **`get_available_sources`**: Get a list of all available sources (domains) in the database
+ 4. **`perform_rag_query`**: Search for relevant content using semantic search with optional source filtering
+
+ ### Conditional Tools
+
+ 5. **`search_code_examples`** (requires `USE_AGENTIC_RAG=true`): Search specifically for code examples and their summaries from crawled documentation. This tool provides targeted code snippet retrieval for AI coding assistants.
+
+ ## Prerequisites
+
+ - [Docker/Docker Desktop](https://www.docker.com/products/docker-desktop/) if running the MCP server as a container (recommended)
+ - [Python 3.12+](https://www.python.org/downloads/) if running the MCP server directly through uv
+ - [Supabase](https://supabase.com/) (database for RAG)
+ - [OpenAI API key](https://platform.openai.com/api-keys) (for generating embeddings)
+
+ ## Installation
+
+ ### Using Docker (Recommended)
+
+ 1. Clone this repository:
+ ```bash
+ git clone https://github.com/coleam00/mcp-crawl4ai-rag.git
+ cd mcp-crawl4ai-rag
+ ```
+
+ 2. Build the Docker image:
+ ```bash
+ docker build -t mcp/crawl4ai-rag --build-arg PORT=8051 .
+ ```
+
+ 3. Create a `.env` file based on the configuration section below
+
+ ### Using uv directly (no Docker)
+
+ 1. Clone this repository:
+ ```bash
+ git clone https://github.com/coleam00/mcp-crawl4ai-rag.git
+ cd mcp-crawl4ai-rag
+ ```
+
+ 2. Install uv if you don't have it:
+ ```bash
+ pip install uv
+ ```
+
+ 3. Create and activate a virtual environment:
+ ```bash
+ uv venv
+ .venv\Scripts\activate
+ # on Mac/Linux: source .venv/bin/activate
+ ```
+
+ 4. Install dependencies:
+ ```bash
+ uv pip install -e .
+ crawl4ai-setup
+ ```
+
+ 5. Create a `.env` file based on the configuration section below
+
+ ## Database Setup
+
+ Before running the server, you need to set up the database with the pgvector extension:
+
+ 1. Go to the SQL Editor in your Supabase dashboard (create a new project first if necessary)
+
+ 2. Create a new query and paste the contents of `crawled_pages.sql`
+
+ 3. Run the query to create the necessary tables and functions
+
+ ## Configuration
+
+ Create a `.env` file in the project root with the following variables:
+
+ ```
+ # MCP Server Configuration
+ HOST=0.0.0.0
+ PORT=8051
+ TRANSPORT=sse
+
+ # OpenAI API Configuration
+ OPENAI_API_KEY=your_openai_api_key
+
+ # LLM for summaries and contextual embeddings
+ MODEL_CHOICE=gpt-4.1-nano
+
+ # RAG Strategies (set to "true" or "false", default to "false")
+ USE_CONTEXTUAL_EMBEDDINGS=false
+ USE_HYBRID_SEARCH=false
+ USE_AGENTIC_RAG=false
+ USE_RERANKING=false
+
+ # Supabase Configuration
+ SUPABASE_URL=your_supabase_project_url
+ SUPABASE_SERVICE_KEY=your_supabase_service_key
+ ```
+
+ ### RAG Strategy Options
+
+ The Crawl4AI RAG MCP server supports four powerful RAG strategies that can be enabled independently:
+
+ #### 1. **USE_CONTEXTUAL_EMBEDDINGS**
+ When enabled, this strategy enhances each chunk's embedding with additional context from the entire document. The system passes both the full document and the specific chunk to an LLM (configured via `MODEL_CHOICE`) to generate enriched context that gets embedded alongside the chunk content; a sketch of this step follows the bullets below.
+
+ - **When to use**: Enable this when you need high-precision retrieval where context matters, such as technical documentation where terms might have different meanings in different sections.
+ - **Trade-offs**: Slower indexing due to LLM calls for each chunk, but significantly better retrieval accuracy.
+ - **Cost**: Additional LLM API calls during indexing.
+
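+ A minimal sketch of that enrichment step (illustrative: the prompt wording and this particular `openai` client usage are assumptions, not this repo's exact code):
+
+ ```python
+ import os
+ from openai import OpenAI
+
+ llm = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
+
+ def contextualize_chunk(full_document: str, chunk: str, model: str) -> str:
+     """Ask the LLM to situate the chunk within the document, then prepend that context."""
+     prompt = (
+         "Here is a document:\n" + full_document[:20000] +
+         "\n\nGive a short context situating this chunk within the document:\n" + chunk
+     )
+     response = llm.chat.completions.create(
+         model=model,
+         messages=[{"role": "user", "content": prompt}],
+     )
+     context = response.choices[0].message.content
+     return context + "\n---\n" + chunk  # this combined text is what gets embedded
+ ```
+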
+ #### 2. **USE_HYBRID_SEARCH**
+ Combines traditional keyword search with semantic vector search to provide more comprehensive results. The system performs both searches in parallel and intelligently merges the results, prioritizing documents that appear in both result sets (a merge sketch follows the bullets below).
+
+ - **When to use**: Enable this when users might search using specific technical terms or function names, or when exact keyword matches are important alongside semantic understanding.
+ - **Trade-offs**: Slightly slower search queries but more robust results, especially for technical content.
+ - **Cost**: No additional API costs, just computational overhead.
+
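+ One straightforward way to express that merge (an illustrative sketch, not necessarily this repo's exact logic):
+
+ ```python
+ def merge_results(vector_hits: list[dict], keyword_hits: list[dict], limit: int = 10) -> list[dict]:
+     """Prioritize documents found by both searches, then fill with the rest."""
+     keyword_ids = {hit["id"] for hit in keyword_hits}
+     in_both = [hit for hit in vector_hits if hit["id"] in keyword_ids]
+     only_vector = [hit for hit in vector_hits if hit["id"] not in keyword_ids]
+     in_both_ids = {hit["id"] for hit in in_both}
+     only_keyword = [hit for hit in keyword_hits if hit["id"] not in in_both_ids]
+
+     merged = in_both + only_vector + only_keyword
+     return merged[:limit]
+ ```
+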
+ #### 3. **USE_AGENTIC_RAG**
+ Enables specialized code example extraction and storage. When crawling documentation, the system identifies code blocks (≥300 characters), extracts them with surrounding context, generates summaries, and stores them in a separate vector database table specifically designed for code search.
+
+ - **When to use**: Essential for AI coding assistants that need to find specific code examples, implementation patterns, or usage examples from documentation.
+ - **Trade-offs**: Significantly slower crawling due to code extraction and summarization, and requires more storage space.
+ - **Cost**: Additional LLM API calls for summarizing each code example.
+ - **Benefits**: Provides a dedicated `search_code_examples` tool that AI agents can use to find specific code implementations.
+
+ #### 4. **USE_RERANKING**
+ Applies cross-encoder reranking to search results after initial retrieval. Uses a lightweight cross-encoder model (`cross-encoder/ms-marco-MiniLM-L-6-v2`) to score each result against the original query, then reorders results by relevance (see the sketch after these bullets).
+
+ - **When to use**: Enable this when search precision is critical and you need the most relevant results at the top. Particularly useful for complex queries where semantic similarity alone might not capture query intent.
+ - **Trade-offs**: Adds ~100-200ms to search queries depending on result count, but significantly improves result ordering.
+ - **Cost**: No additional API costs - uses a local model that runs on CPU.
+ - **Benefits**: Better result relevance, especially for complex queries. Works with both regular RAG search and code example search.
+
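+ That reranking step maps directly onto the `sentence-transformers` `CrossEncoder` API (a minimal sketch; the query and candidate texts are illustrative):
+
+ ```python
+ from sentence_transformers import CrossEncoder
+
+ reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
+
+ query = "how do I filter a RAG query by source?"
+ candidates = ["first retrieved chunk ...", "second retrieved chunk ...", "third retrieved chunk ..."]
+
+ # Score each (query, document) pair, then reorder candidates by descending score
+ scores = reranker.predict([(query, doc) for doc in candidates])
+ reranked = [doc for _, doc in sorted(zip(scores, candidates), key=lambda p: p[0], reverse=True)]
+ ```
+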
+ ### Recommended Configurations
+
+ **For general documentation RAG:**
+ ```
+ USE_CONTEXTUAL_EMBEDDINGS=false
+ USE_HYBRID_SEARCH=true
+ USE_AGENTIC_RAG=false
+ USE_RERANKING=true
+ ```
+
+ **For AI coding assistant with code examples:**
+ ```
+ USE_CONTEXTUAL_EMBEDDINGS=true
+ USE_HYBRID_SEARCH=true
+ USE_AGENTIC_RAG=true
+ USE_RERANKING=true
+ ```
+
+ **For fast, basic RAG:**
+ ```
+ USE_CONTEXTUAL_EMBEDDINGS=false
+ USE_HYBRID_SEARCH=true
+ USE_AGENTIC_RAG=false
+ USE_RERANKING=false
+ ```
+
+ ## Running the Server
+
+ ### Using Docker
+
+ ```bash
+ docker run --env-file .env -p 8051:8051 mcp/crawl4ai-rag
+ ```
+
+ ### Using Python
+
+ ```bash
+ uv run src/crawl4ai_mcp.py
+ ```
+
+ The server will start and listen on the configured host and port.
+
+ ## Integration with MCP Clients
+
+ ### SSE Configuration
+
+ Once you have the server running with SSE transport, you can connect to it using this configuration:
+
+ ```json
+ {
+   "mcpServers": {
+     "crawl4ai-rag": {
+       "transport": "sse",
+       "url": "http://localhost:8051/sse"
+     }
+   }
+ }
+ ```
+
+ > **Note for Windsurf users**: Use `serverUrl` instead of `url` in your configuration:
+ > ```json
+ > {
+ >   "mcpServers": {
+ >     "crawl4ai-rag": {
+ >       "transport": "sse",
+ >       "serverUrl": "http://localhost:8051/sse"
+ >     }
+ >   }
+ > }
+ > ```
+ >
+ > **Note for Docker users**: Use `host.docker.internal` instead of `localhost` if your client is running in a different container. This applies if you are using this MCP server within n8n!
+
+ ### Stdio Configuration
+
+ Add this server to your MCP configuration for Claude Desktop, Windsurf, or any other MCP client:
+
+ ```json
+ {
+   "mcpServers": {
+     "crawl4ai-rag": {
+       "command": "python",
+       "args": ["path/to/crawl4ai-mcp/src/crawl4ai_mcp.py"],
+       "env": {
+         "TRANSPORT": "stdio",
+         "OPENAI_API_KEY": "your_openai_api_key",
+         "SUPABASE_URL": "your_supabase_url",
+         "SUPABASE_SERVICE_KEY": "your_supabase_service_key"
+       }
+     }
+   }
+ }
+ ```
+
+ ### Docker with Stdio Configuration
+
+ ```json
+ {
+   "mcpServers": {
+     "crawl4ai-rag": {
+       "command": "docker",
+       "args": ["run", "--rm", "-i",
+                "-e", "TRANSPORT",
+                "-e", "OPENAI_API_KEY",
+                "-e", "SUPABASE_URL",
+                "-e", "SUPABASE_SERVICE_KEY",
+                "mcp/crawl4ai-rag"],
+       "env": {
+         "TRANSPORT": "stdio",
+         "OPENAI_API_KEY": "your_openai_api_key",
+         "SUPABASE_URL": "your_supabase_url",
+         "SUPABASE_SERVICE_KEY": "your_supabase_service_key"
+       }
+     }
+   }
+ }
+ ```
+
+ ## Building Your Own Server
+
+ This implementation provides a foundation for building more complex MCP servers with web crawling capabilities. To build your own:
+
+ 1. Add your own tools by creating methods with the `@mcp.tool()` decorator
+ 2. Create your own lifespan function to add your own dependencies
+ 3. Modify the `utils.py` file for any helper functions you need
+ 4. Extend the crawling capabilities by adding more specialized crawlers
Relevant README 2.md ADDED
@@ -0,0 +1,618 @@
+ # Context7 MCP - Up-to-date Code Docs For Any Prompt
+
+ [![Website](https://img.shields.io/badge/Website-context7.com-blue)](https://context7.com) [![smithery badge](https://smithery.ai/badge/@upstash/context7-mcp)](https://smithery.ai/server/@upstash/context7-mcp) [<img alt="Install in VS Code (npx)" src="https://img.shields.io/badge/VS_Code-VS_Code?style=flat-square&label=Install%20Context7%20MCP&color=0098FF">](https://insiders.vscode.dev/redirect?url=vscode%3Amcp%2Finstall%3F%7B%22name%22%3A%22context7%22%2C%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22%40upstash%2Fcontext7-mcp%40latest%22%5D%7D)
+
+ [![繁體中文](https://img.shields.io/badge/docs-繁體中文-yellow)](./docs/README.zh-TW.md) [![簡體中文](https://img.shields.io/badge/docs-簡體中文-yellow)](./docs/README.zh-CN.md) [![한국어 문서](https://img.shields.io/badge/docs-한국어-green)](./docs/README.ko.md) [![Documentación en Español](https://img.shields.io/badge/docs-Español-orange)](./docs/README.es.md) [![Documentation en Français](https://img.shields.io/badge/docs-Français-blue)](./docs/README.fr.md) [![Documentação em Português (Brasil)](<https://img.shields.io/badge/docs-Português%20(Brasil)-purple>)](./docs/README.pt-BR.md) [![Documentazione in italiano](https://img.shields.io/badge/docs-Italian-red)](./docs/README.it.md) [![Dokumentasi Bahasa Indonesia](https://img.shields.io/badge/docs-Bahasa%20Indonesia-pink)](./docs/README.id-ID.md) [![Dokumentation auf Deutsch](https://img.shields.io/badge/docs-Deutsch-darkgreen)](./docs/README.de.md) [![Документация на русском языке](https://img.shields.io/badge/docs-Русский-darkblue)](./docs/README.ru.md) [![Türkçe Doküman](https://img.shields.io/badge/docs-Türkçe-blue)](./docs/README.tr.md) [![Arabic Documentation](https://img.shields.io/badge/docs-Arabic-white)](./docs/README.ar.md)
+
+ ## ❌ Without Context7
+
+ LLMs rely on outdated or generic information about the libraries you use. You get:
+
+ - ❌ Code examples that are outdated and based on year-old training data
+ - ❌ Hallucinated APIs that don't even exist
+ - ❌ Generic answers for old package versions
+
+ ## ✅ With Context7
+
+ Context7 MCP pulls up-to-date, version-specific documentation and code examples straight from the source, and places them directly into your prompt.
+
+ Add `use context7` to your prompt in Cursor:
+
+ ```txt
+ Create a basic Next.js project with app router. use context7
+ ```
+
+ ```txt
+ Create a script to delete the rows where the city is "" given PostgreSQL credentials. use context7
+ ```
+
+ Context7 fetches up-to-date code examples and documentation right into your LLM's context.
+
+ - 1️⃣ Write your prompt naturally
+ - 2️⃣ Tell the LLM to `use context7`
+ - 3️⃣ Get working code answers
+
+ No tab-switching, no hallucinated APIs that don't exist, no outdated code generations.
+
+ ## 📚 Adding Projects
+
+ Check out our [project addition guide](./docs/adding-projects.md) to learn how to add (or update) your favorite libraries to Context7.
+
+ ## 🛠️ Installation
+
+ ### Requirements
+
+ - Node.js >= v18.0.0
+ - Cursor, Windsurf, Claude Desktop, or another MCP client
+
+ <details>
+ <summary><b>Installing via Smithery</b></summary>
+
+ To install Context7 MCP Server for any client automatically via [Smithery](https://smithery.ai/server/@upstash/context7-mcp):
+
+ ```bash
+ npx -y @smithery/cli@latest install @upstash/context7-mcp --client <CLIENT_NAME> --key <YOUR_SMITHERY_KEY>
+ ```
+
+ You can find your Smithery key on the [Smithery.ai webpage](https://smithery.ai/server/@upstash/context7-mcp).
+
+ </details>
+
+ <details>
+ <summary><b>Install in Cursor</b></summary>
+
+ Go to: `Settings` -> `Cursor Settings` -> `MCP` -> `Add new global MCP server`
+
+ Pasting the following configuration into your Cursor `~/.cursor/mcp.json` file is the recommended approach. You may also install in a specific project by creating `.cursor/mcp.json` in your project folder. See the [Cursor MCP docs](https://docs.cursor.com/context/model-context-protocol) for more info.
+
+ > Since Cursor 1.0, you can click the install button below for instant one-click installation.
+
+ #### Cursor Remote Server Connection
+
+ [![Install MCP Server](https://cursor.com/deeplink/mcp-install-dark.svg)](https://cursor.com/install-mcp?name=context7&config=eyJ1cmwiOiJodHRwczovL21jcC5jb250ZXh0Ny5jb20vbWNwIn0%3D)
+
+ ```json
+ {
+   "mcpServers": {
+     "context7": {
+       "url": "https://mcp.context7.com/mcp"
+     }
+   }
+ }
+ ```
+
+ #### Cursor Local Server Connection
+
+ [![Install MCP Server](https://cursor.com/deeplink/mcp-install-dark.svg)](https://cursor.com/install-mcp?name=context7&config=eyJjb21tYW5kIjoibnB4IC15IEB1cHN0YXNoL2NvbnRleHQ3LW1jcCJ9)
+
+ ```json
+ {
+   "mcpServers": {
+     "context7": {
+       "command": "npx",
+       "args": ["-y", "@upstash/context7-mcp"]
+     }
+   }
+ }
+ ```
+
+ <details>
+ <summary>Alternative: Use Bun</summary>
+
+ [![Install MCP Server](https://cursor.com/deeplink/mcp-install-dark.svg)](https://cursor.com/install-mcp?name=context7&config=eyJjb21tYW5kIjoiYnVueCAteSBAdXBzdGFzaC9jb250ZXh0Ny1tY3AifQ%3D%3D)
+
+ ```json
+ {
+   "mcpServers": {
+     "context7": {
+       "command": "bunx",
+       "args": ["-y", "@upstash/context7-mcp"]
+     }
+   }
+ }
+ ```
+
+ </details>
+
+ <details>
+ <summary>Alternative: Use Deno</summary>
+
+ [![Install MCP Server](https://cursor.com/deeplink/mcp-install-dark.svg)](https://cursor.com/install-mcp?name=context7&config=eyJjb21tYW5kIjoiZGVubyBydW4gLS1hbGxvdy1lbnYgLS1hbGxvdy1uZXQgbnBtOkB1cHN0YXNoL2NvbnRleHQ3LW1jcCJ9)
+
+ ```json
+ {
+   "mcpServers": {
+     "context7": {
+       "command": "deno",
+       "args": ["run", "--allow-env", "--allow-net", "npm:@upstash/context7-mcp"]
+     }
+   }
+ }
+ ```
+
+ </details>
+
+ </details>
+
+ <details>
+ <summary><b>Install in Windsurf</b></summary>
+
+ Add this to your Windsurf MCP config file. See the [Windsurf MCP docs](https://docs.windsurf.com/windsurf/mcp) for more info.
+
+ #### Windsurf Remote Server Connection
+
+ ```json
+ {
+   "mcpServers": {
+     "context7": {
+       "serverUrl": "https://mcp.context7.com/sse"
+     }
+   }
+ }
+ ```
+
+ #### Windsurf Local Server Connection
+
+ ```json
+ {
+   "mcpServers": {
+     "context7": {
+       "command": "npx",
+       "args": ["-y", "@upstash/context7-mcp"]
+     }
+   }
+ }
+ ```
+
+ </details>
+
+ <details>
+ <summary><b>Install in VS Code</b></summary>
+
+ [<img alt="Install in VS Code (npx)" src="https://img.shields.io/badge/VS_Code-VS_Code?style=flat-square&label=Install%20Context7%20MCP&color=0098FF">](https://insiders.vscode.dev/redirect?url=vscode%3Amcp%2Finstall%3F%7B%22name%22%3A%22context7%22%2C%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22%40upstash%2Fcontext7-mcp%40latest%22%5D%7D)
+ [<img alt="Install in VS Code Insiders (npx)" src="https://img.shields.io/badge/VS_Code_Insiders-VS_Code_Insiders?style=flat-square&label=Install%20Context7%20MCP&color=24bfa5">](https://insiders.vscode.dev/redirect?url=vscode-insiders%3Amcp%2Finstall%3F%7B%22name%22%3A%22context7%22%2C%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22%40upstash%2Fcontext7-mcp%40latest%22%5D%7D)
+
+ Add this to your VS Code MCP config file. See the [VS Code MCP docs](https://code.visualstudio.com/docs/copilot/chat/mcp-servers) for more info.
+
+ #### VS Code Remote Server Connection
+
+ ```json
+ "mcp": {
+   "servers": {
+     "context7": {
+       "type": "http",
+       "url": "https://mcp.context7.com/mcp"
+     }
+   }
+ }
+ ```
+
+ #### VS Code Local Server Connection
+
+ ```json
+ "mcp": {
+   "servers": {
+     "context7": {
+       "type": "stdio",
+       "command": "npx",
+       "args": ["-y", "@upstash/context7-mcp"]
+     }
+   }
+ }
+ ```
+
+ </details>
+
+ <details>
+ <summary><b>Install in Zed</b></summary>
+
+ It can be installed via [Zed Extensions](https://zed.dev/extensions?query=Context7), or you can add this to your Zed `settings.json`. See the [Zed Context Server docs](https://zed.dev/docs/assistant/context-servers) for more info.
+
+ ```json
+ {
+   "context_servers": {
+     "Context7": {
+       "command": {
+         "path": "npx",
+         "args": ["-y", "@upstash/context7-mcp"]
+       },
+       "settings": {}
+     }
+   }
+ }
+ ```
+
+ </details>
+
+ <details>
+ <summary><b>Install in Claude Code</b></summary>
+
+ Run this command. See the [Claude Code MCP docs](https://docs.anthropic.com/en/docs/agents-and-tools/claude-code/tutorials#set-up-model-context-protocol-mcp) for more info.
+
+ #### Claude Code Remote Server Connection
+
+ ```sh
+ claude mcp add --transport sse context7 https://mcp.context7.com/sse
+ ```
+
+ #### Claude Code Local Server Connection
+
+ ```sh
+ claude mcp add context7 -- npx -y @upstash/context7-mcp
+ ```
+
+ </details>
+
+ <details>
+ <summary><b>Install in Claude Desktop</b></summary>
+
+ Add this to your Claude Desktop `claude_desktop_config.json` file. See the [Claude Desktop MCP docs](https://modelcontextprotocol.io/quickstart/user) for more info.
+
+ ```json
+ {
+   "mcpServers": {
+     "Context7": {
+       "command": "npx",
+       "args": ["-y", "@upstash/context7-mcp"]
+     }
+   }
+ }
+ ```
+
+ </details>
+
+ <details>
+ <summary><b>Install in BoltAI</b></summary>
+
+ Open the "Settings" page of the app, navigate to "Plugins," and enter the following JSON:
+
+ ```json
+ {
+   "mcpServers": {
+     "context7": {
+       "command": "npx",
+       "args": ["-y", "@upstash/context7-mcp"]
+     }
+   }
+ }
+ ```
+
+ Once saved, enter `get-library-docs` in the chat followed by your Context7 documentation ID (e.g., `get-library-docs /nuxt/ui`). More information is available on [BoltAI's Documentation site](https://docs.boltai.com/docs/plugins/mcp-servers). For BoltAI on iOS, [see this guide](https://docs.boltai.com/docs/boltai-mobile/mcp-servers).
+
+ </details>
+
+ <details>
+ <summary><b>Using Docker</b></summary>
+
+ If you prefer to run the MCP server in a Docker container:
+
+ 1. **Build the Docker Image:**
+
+    First, create a `Dockerfile` in the project root (or anywhere you prefer):
+
+    <details>
+    <summary>Click to see Dockerfile content</summary>
+
+    ```Dockerfile
+    FROM node:18-alpine
+
+    WORKDIR /app
+
+    # Install the latest version globally
+    RUN npm install -g @upstash/context7-mcp
+
+    # Expose default port if needed (optional, depends on MCP client interaction)
+    # EXPOSE 3000
+
+    # Default command to run the server
+    CMD ["context7-mcp"]
+    ```
+
+    </details>
+
+    Then, build the image using a tag (e.g., `context7-mcp`). **Make sure Docker Desktop (or the Docker daemon) is running.** Run the following command in the same directory where you saved the `Dockerfile`:
+
+    ```bash
+    docker build -t context7-mcp .
+    ```
+
+ 2. **Configure Your MCP Client:**
+
+    Update your MCP client's configuration to use the Docker command.
+
+    _Example for a cline_mcp_settings.json:_
+
+    ```json
+    {
+      "mcpServers": {
+        "Context7": {
+          "autoApprove": [],
+          "disabled": false,
+          "timeout": 60,
+          "command": "docker",
+          "args": ["run", "-i", "--rm", "context7-mcp"],
+          "transportType": "stdio"
+        }
+      }
+    }
+    ```
+
+    _Note: This is an example configuration. Please refer to the specific examples for your MCP client (like Cursor, VS Code, etc.) earlier in this README to adapt the structure (e.g., `mcpServers` vs `servers`). Also, ensure the image name in `args` matches the tag used during the `docker build` command._
+
+ </details>
+
+ <details>
+ <summary><b>Install in Windows</b></summary>
+
+ The configuration on Windows is slightly different compared to Linux or macOS (_`Cline` is used in the example_). The same principle applies to other editors; refer to the configuration of `command` and `args`.
+
+ ```json
+ {
+   "mcpServers": {
+     "github.com/upstash/context7-mcp": {
+       "command": "cmd",
+       "args": ["/c", "npx", "-y", "@upstash/context7-mcp@latest"],
+       "disabled": false,
+       "autoApprove": []
+     }
+   }
+ }
+ ```
+
+ </details>
+
+ <details>
+ <summary><b>Install in Augment Code</b></summary>
+
+ To configure Context7 MCP in Augment Code, follow these steps:
+
+ 1. Press Cmd/Ctrl Shift P or go to the hamburger menu in the Augment panel
+ 2. Select Edit Settings
+ 3. Under Advanced, click Edit in settings.json
+ 4. Add the server configuration to the `mcpServers` array in the `augment.advanced` object
+
+ ```json
+ "augment.advanced": {
+   "mcpServers": [
+     {
+       "name": "context7",
+       "command": "npx",
+       "args": ["-y", "@upstash/context7-mcp"]
+     }
+   ]
+ }
+ ```
+
+ Once the MCP server is added, restart your editor. If you receive any errors, check the syntax to make sure closing brackets or commas are not missing.
+
+ </details>
+
+ <details>
+ <summary><b>Install in Roo Code</b></summary>
+
+ Add this to your Roo Code MCP configuration file. See the [Roo Code MCP docs](https://docs.roocode.com/features/mcp/using-mcp-in-roo) for more info.
+
+ #### Roo Code Remote Server Connection
+
+ ```json
+ {
+   "mcpServers": {
+     "context7": {
+       "type": "streamable-http",
+       "url": "https://mcp.context7.com/mcp"
+     }
+   }
+ }
+ ```
+
+ #### Roo Code Local Server Connection
+
+ ```json
+ {
+   "mcpServers": {
+     "context7": {
+       "command": "npx",
+       "args": ["-y", "@upstash/context7-mcp"]
+     }
+   }
+ }
+ ```
+
+ </details>
+
+ <details>
+ <summary><b>Install in Zencoder</b></summary>
+
+ To configure Context7 MCP in Zencoder, follow these steps:
+
+ 1. Go to the Zencoder menu (...)
+ 2. From the dropdown menu, select Agent tools
+ 3. Click on Add custom MCP
+ 4. Add the name and server configuration from below, and make sure to hit the Install button
+
+ ```json
+ {
+   "command": "npx",
+   "args": [
+     "-y",
+     "@upstash/context7-mcp@latest"
+   ]
+ }
+ ```
+
+ Once the MCP server is added, you can easily continue using it.
+
+ </details>
+
+ ## 🔧 Environment Variables
+
+ The Context7 MCP server supports the following environment variables:
+
+ - `DEFAULT_MINIMUM_TOKENS`: Set the minimum token count for documentation retrieval (default: 10000)
+
+ Example configuration with environment variables:
+
+ ```json
+ {
+   "mcpServers": {
+     "context7": {
+       "command": "npx",
+       "args": ["-y", "@upstash/context7-mcp"],
+       "env": {
+         "DEFAULT_MINIMUM_TOKENS": "6000"
+       }
+     }
+   }
+ }
+ ```
+
+ ## 🔨 Available Tools
+
+ Context7 MCP provides the following tools that LLMs can use:
+
+ - `resolve-library-id`: Resolves a general library name into a Context7-compatible library ID.
+   - `libraryName` (required): The name of the library to search for
+
+ - `get-library-docs`: Fetches documentation for a library using a Context7-compatible library ID.
+   - `context7CompatibleLibraryID` (required): Exact Context7-compatible library ID (e.g., `/mongodb/docs`, `/vercel/next.js`)
+   - `topic` (optional): Focus the docs on a specific topic (e.g., "routing", "hooks")
+   - `tokens` (optional, default 10000): Max number of tokens to return. Values less than the configured `DEFAULT_MINIMUM_TOKENS` value or the default value of 10000 are automatically increased to that value.
+
+ ## 💻 Development
+
+ Clone the project and install dependencies:
+
+ ```bash
+ bun i
+ ```
+
+ Build:
+
+ ```bash
+ bun run build
+ ```
+
+ <details>
+ <summary><b>Local Configuration Example</b></summary>
+
+ ```json
+ {
+   "mcpServers": {
+     "context7": {
+       "command": "npx",
+       "args": ["tsx", "/path/to/folder/context7-mcp/src/index.ts"]
+     }
+   }
+ }
+ ```
+
+ </details>
+
+ <details>
+ <summary><b>Testing with MCP Inspector</b></summary>
+
+ ```bash
+ npx -y @modelcontextprotocol/inspector npx @upstash/context7-mcp
+ ```
+
+ </details>
+
+ ## 🚨 Troubleshooting
+
+ <details>
+ <summary><b>Module Not Found Errors</b></summary>
+
+ If you encounter `ERR_MODULE_NOT_FOUND`, try using `bunx` instead of `npx`:
+
+ ```json
+ {
+   "mcpServers": {
+     "context7": {
+       "command": "bunx",
+       "args": ["-y", "@upstash/context7-mcp"]
+     }
+   }
+ }
+ ```
+
+ This often resolves module resolution issues in environments where `npx` doesn't properly install or resolve packages.
+
+ </details>
+
+ <details>
+ <summary><b>ESM Resolution Issues</b></summary>
+
+ For errors like `Error: Cannot find module 'uriTemplate.js'`, try the `--experimental-vm-modules` flag:
+
+ ```json
+ {
+   "mcpServers": {
+     "context7": {
+       "command": "npx",
+       "args": ["-y", "--node-options=--experimental-vm-modules", "@upstash/[email protected]"]
+     }
+   }
+ }
+ ```
+
+ </details>
+
+ <details>
+ <summary><b>TLS/Certificate Issues</b></summary>
+
+ Use the `--experimental-fetch` flag to bypass TLS-related problems:
+
+ ```json
+ {
+   "mcpServers": {
+     "context7": {
+       "command": "npx",
+       "args": ["-y", "--node-options=--experimental-fetch", "@upstash/context7-mcp"]
+     }
+   }
+ }
+ ```
+
+ </details>
+
+ <details>
+ <summary><b>General MCP Client Errors</b></summary>
+
+ 1. Try adding `@latest` to the package name
+ 2. Use `bunx` as an alternative to `npx`
+ 3. Consider using `deno` as another alternative
+ 4. Ensure you're using Node.js v18 or higher for native fetch support
+
+ </details>
+
+ ## ⚠️ Disclaimer
+
+ Context7 projects are community-contributed, and while we strive to maintain high quality, we cannot guarantee the accuracy, completeness, or security of all library documentation. Projects listed in Context7 are developed and maintained by their respective owners, not by Context7. If you encounter any suspicious, inappropriate, or potentially harmful content, please use the "Report" button on the project page to notify us immediately. We take all reports seriously and will review flagged content promptly to maintain the integrity and safety of our platform. By using Context7, you acknowledge that you do so at your own discretion and risk.
+
+ ## 🤝 Connect with Us
+
+ Stay updated and join our community:
+
+ - 📢 Follow us on [X](https://x.com/contextai) for the latest news and updates
+ - 🌐 Visit our [Website](https://context7.com)
+ - 💬 Join our [Discord Community](https://upstash.com/discord)
+
+ ## 📺 Context7 In Media
+
+ - [Better Stack: "Free Tool Makes Cursor 10x Smarter"](https://youtu.be/52FC3qObp9E)
+ - [Cole Medin: "This is Hands Down the BEST MCP Server for AI Coding Assistants"](https://www.youtube.com/watch?v=G7gK8H6u7Rs)
+ - [Income Stream Surfers: "Context7 + SequentialThinking MCPs: Is This AGI?"](https://www.youtube.com/watch?v=-ggvzyLpK6o)
+ - [Julian Goldie SEO: "Context7: New MCP AI Agent Update"](https://www.youtube.com/watch?v=CTZm6fBYisc)
+ - [JeredBlu: "Context 7 MCP: Get Documentation Instantly + VS Code Setup"](https://www.youtube.com/watch?v=-ls0D-rtET4)
+ - [Income Stream Surfers: "Context7: The New MCP Server That Will CHANGE AI Coding"](https://www.youtube.com/watch?v=PS-2Azb-C3M)
+ - [AICodeKing: "Context7 + Cline & RooCode: This MCP Server Makes CLINE 100X MORE EFFECTIVE!"](https://www.youtube.com/watch?v=qZfENAPMnyo)
+ - [Sean Kochel: "5 MCP Servers For Vibe Coding Glory (Just Plug-In & Go)"](https://www.youtube.com/watch?v=LqTQi8qexJM)
+
+ ## ⭐ Star History
+
+ [![Star History Chart](https://api.star-history.com/svg?repos=upstash/context7&type=Date)](https://www.star-history.com/#upstash/context7&Date)
+
+ ## 📄 License
+
+ MIT
app.py CHANGED
@@ -2,80 +2,166 @@ import gradio as gr
  import os
  import json
  import subprocess
- import tempfile
  import dotenv
  import shutil
+ import uuid
+
+ from schemas import Response
+ from openai import OpenAI
  from pathlib import Path
- from string import Template
  from pymilvus import MilvusClient, model
+ from repo2txt import make_tree
+ from utils import copy_search_results, create_documentation_string, choice_prompt

  _ = dotenv.load_dotenv()

  subprocess.run(["python3", "make_docs.py"])
  subprocess.run(["python3", "make_rag_db.py"])

- template = Template("""\
- ---
- File: $file_path
- ---
-
- $file_content""")
-
  client = MilvusClient("milvus.db")
  embedding_fn = model.dense.OpenAIEmbeddingFunction(
-     model_name='text-embedding-3-small',  # Specify the model name
-     api_key=os.environ.get('OPENAI_API_KEY'),  # Provide your OpenAI API key
-     dimensions=1536  # Set the embedding dimensionality
+     model_name="text-embedding-3-large",
+     api_key=os.environ.get("OPENAI_API_KEY"),
+     dimensions=3072,
  )

+ oai_client = OpenAI()
+

  def list_huggingface_resources_names() -> list[str]:
      """List all the names of the libraries, services, and other resources available within the HuggingFace ecosystem.

      Returns:
          A list of libraries, services, and other resources available within the HuggingFace ecosystem
      """
-     with open('repos_config.json', 'r') as f:
+     with open("repos_config.json", "r") as f:
          repos = json.load(f)

-     print([repo['title'] for repo in repos])
+     print([repo["title"] for repo in repos])

-     return [repo['title'] for repo in repos]
+     return [repo["title"] for repo in repos]
+
+
+ def search_documents(query, resource_names=None, topk=50):
+     """Search for relevant documents in the Milvus database."""
+     query_vectors = embedding_fn.encode_queries([query])
+
+     search_params = {
+         "collection_name": "hf_docs",
+         "data": query_vectors,
+         "limit": topk,
+         "output_fields": ["text", "file_path", "resource"],
+     }
+
+     if resource_names:
+         if len(resource_names) == 1:
+             search_params["filter"] = f"resource == '{resource_names[0]}'"
+         else:
+             resource_list = "', '".join(resource_names)
+             search_params["filter"] = f"resource in ['{resource_list}']"
+
+     return client.search(**search_params)


  def get_huggingface_documentation(topic: str, resource_names: list[str] = []) -> str:
      """Get the documentation for the given topic and resource names.

      Args:
          topic: Focus the docs on a specific topic (e.g. "Anthropic Provider Chat UI", "LoRA methods PEFT" or "TGI on Intel GPUs")
-         resource_names: A list of relevant resource names to the topic
+         resource_names: A list of relevant resource names to the topic. Must be as specific as possible. Empty list means all resources.

      Returns:
          A string of documentation for the given topic and resource names
      """
-     print(resource_names)
-     query_vectors = embedding_fn.encode_queries([topic])
-     res = client.search(collection_name="hf_docs", data=query_vectors, limit=3, output_fields=["text", "file_path"])
-     print(res)
-
-     docs_paths = [res[0][i]['file_path'] for i in range(len(res[0]))]
-     print(docs_paths)
-
-     documentation = ""
-     for path in docs_paths:
-         with open(path, 'r') as f:
-             content = f.read()
-         documentation += template.substitute(file_path=path.replace('docs/', ''), file_content=content) + "\n\n"
-
-     print(documentation.strip())
-     return documentation.strip()
+     try:
+         # Search for relevant documents
+         query_vectors = embedding_fn.encode_queries([topic])
+
+         search_params = {
+             "collection_name": "hf_docs",
+             "data": query_vectors,
+             "limit": 50,
+             "output_fields": ["text", "file_path", "resource"],
+         }
+
+         if resource_names:
+             if len(resource_names) == 1:
+                 search_params["filter"] = f"resource == '{resource_names[0]}'"
+             else:
+                 resource_list = "', '".join(resource_names)
+                 search_params["filter"] = f"resource in ['{resource_list}']"
+
+         search_results = client.search(**search_params)
+
+         # Create temporary folder and copy files
+         temp_folder = str(uuid.uuid4())
+         copy_search_results(search_results, temp_folder)
+
+         # Generate directory tree
+         tree_structure = make_tree(Path(temp_folder) / "docs")
+
+         # Get relevant file IDs using GPT-4o
+         response = oai_client.responses.parse(
+             model="gpt-4o",
+             input=[
+                 {
+                     "role": "user",
+                     "content": choice_prompt.substitute(
+                         question=topic, tree_structure=tree_structure
+                     ),
+                 }
+             ],
+             text_format=Response,
+         )
+
+         file_ids = response.output_parsed.file_ids
+
+         # Create the documentation string using the file IDs and template
+         documentation_string = create_documentation_string(file_ids, temp_folder)
+
+         # Clean up temporary folder
+         shutil.rmtree(temp_folder, ignore_errors=True)
+
+         return documentation_string
+
+     except Exception as e:
+         return f"Error generating documentation: {str(e)}"
+
+
+ def load_readme() -> str:
+     """Load and return the README content, skipping YAML frontmatter."""
+     try:
+         with open("README.md", "r", encoding="utf-8") as f:
+             content = f.read()
+
+         # Skip YAML frontmatter if it exists
+         if content.startswith("---"):
+             # Find the second '---' line
+             lines = content.split("\n")
+             start_index = 0
+             dash_count = 0
+
+             for i, line in enumerate(lines):
+                 if line.strip() == "---":
+                     dash_count += 1
+                     if dash_count == 2:
+                         start_index = i + 1
+                         break
+
+             # Join the lines after the frontmatter
+             content = "\n".join(lines[start_index:])
+
+         return content
+     except FileNotFoundError:
+         return "README.md not found"


  list_resources_demo = gr.Interface(
      fn=list_huggingface_resources_names,
      inputs=[],
      outputs="json",
      title="HuggingFace Ecosystem Explorer",
-     description="Explore the names of the libraries, services, and other resources available within the HuggingFace ecosystem"
+     description="Explore the names of the libraries, services, and other resources available within the HuggingFace ecosystem",
  )

  get_docs_demo = gr.Interface(
@@ -84,11 +170,15 @@ get_docs_demo = gr.Interface(
      outputs="text",
  )

+ # Create README tab with Markdown component
+ with gr.Blocks() as readme_tab:
+     gr.Markdown(load_readme())
+
  # Create tabbed interface
  demo = gr.TabbedInterface(
-     [list_resources_demo, get_docs_demo],
-     ["List Resources", "Get Documentation"],
-     title="HuggingFace Ecosystem Documentation Explorer",
+     [readme_tab, list_resources_demo, get_docs_demo],
+     ["Quickstart", "List Resources", "Get Documentation"],
+     title="OpenHFContext7 MCP - Up-to-date Code Docs For Any Prompt",
  )

  demo.launch(mcp_server=True)
make_docs.py CHANGED
@@ -51,11 +51,7 @@ def clone_repo(repo_url: str, dir_to_clone: str, target_dir: str) -> bool:
51
  sparse_init = run_command(["git", "sparse-checkout", "init", "--no-cone"], cwd=target_dir)
52
  if not sparse_init: return False
53
 
54
- # Set sparse checkout patterns to only include the specified directory. Pattern explanation:
55
- # '/*' - include all files at root level
56
- # '!/*' - exclude all files at root level (overrides previous)
57
- # f'/{dir_to_clone}/' - include the specific directory
58
- # f'/{dir_to_clone}/**' - include everything under that directory
59
      sparse_init = run_command(["git", "sparse-checkout", "init", "--no-cone"], cwd=target_dir)
      if not sparse_init: return False

+     # Set sparse checkout patterns to only include the specified directory
      sparse_patterns = ['/*', '!/*', f'/{dir_to_clone}/', f'/{dir_to_clone}/**']
      sparse_set = run_command(["git", "sparse-checkout", "set", "--no-cone"] + sparse_patterns, cwd=target_dir)
      if not sparse_set: return False
@@ -72,17 +68,25 @@ def clone_repo(repo_url: str, dir_to_clone: str, target_dir: str) -> bool:
      return True


- def save_section_to_disk(section: Dict, file_path: Path, raw_docs_path: Path):
-
-     title = section["title"]
+ def save_section_to_disk(section: Dict, file_path: Path, raw_docs_path: Path, prefix: str, index: int):
+     """
+     Recursively saves a documentation section to disk with hierarchical numbering.
+     """
+     current_number = f"{prefix}{index}"
+     numbered_title = f"{current_number}. {section['title']}"

      if "sections" in section:
-         file_path = file_path / title
-         os.makedirs(file_path, exist_ok=True)
-         for subsection in section["sections"]:
-             save_section_to_disk(subsection, file_path, raw_docs_path)
+         # This is a directory
+         new_dir_path = file_path / numbered_title
+         os.makedirs(new_dir_path, exist_ok=True)
+
+         # The new prefix for children adds the current number, e.g., "1.1."
+         new_prefix = f"{current_number}."
+         for i, subsection in enumerate(section["sections"], 1):
+             save_section_to_disk(subsection, new_dir_path, raw_docs_path, new_prefix, i)

      else:
+         # This is a file
          try:
              local_path = raw_docs_path / f"{section['local']}.md"

@@ -90,7 +94,9 @@ def save_section_to_disk(section: Dict, file_path: Path, raw_docs_path: Path):
                  local_path = raw_docs_path / f"{section['local']}.mdx"
                  assert local_path.exists(), f"File {local_path} does not exist"

-             shutil.copy(local_path, file_path / f"{title}{local_path.suffix}")
+             # Create the numbered filename
+             new_filename = f"{numbered_title}{local_path.suffix}"
+             shutil.copy(local_path, file_path / new_filename)

          except Exception as e:
              # TODO: Not many cases, but handle symlinks, missing files, and other edge cases
@@ -99,20 +105,23 @@ def save_section_to_disk(section: Dict, file_path: Path, raw_docs_path: Path):

  def make_docs(repos: Dict, args: Dict):

-     for repo in tqdm(repos, desc="Consolidating 🤗 Documentation"):
+     for repo_index, repo in enumerate(tqdm(repos, desc="Consolidating 🤗 Documentation"), 1):
          save_repo_docs_path = Path(f"{args.repos_dir}/{repo['repo_url'].split('/')[-1]}")
          clone_repo(repo["repo_url"], repo["subfolder"], str(save_repo_docs_path))

          repo_docs_path = save_repo_docs_path / repo["subfolder"]
          toctree = parse_toctree_yaml(repo_docs_path / "_toctree.yml")

-         # print(toctree)
-
-         save_doc_path = Path(f"{args.docs_dir}/{repo['title']}")
-         os.makedirs(save_doc_path, exist_ok=True)
-
-         for block in toctree:
-             save_section_to_disk(block, save_doc_path, repo_docs_path)
+         # Create the top-level numbered directory for the repo, e.g., "1. Accelerate"
+         repo_title = f"{repo_index}. {repo['title']}"
+         repo_output_path = Path(args.docs_dir) / repo_title
+         os.makedirs(repo_output_path, exist_ok=True)
+
+         # The initial prefix for numbering is the repo index, e.g., "1."
+         prefix = f"{repo_index}."
+         for block_index, block in enumerate(toctree, 1):
+             # Start the recursive saving with the initial prefix and the block's index
+             save_section_to_disk(block, repo_output_path, repo_docs_path, prefix, block_index)

          shutil.rmtree(save_repo_docs_path)

@@ -128,5 +137,7 @@ if __name__ == "__main__":
      with open("repos_config.json", "r") as f:
          repos = json.load(f)

-     # shutil.rmtree(args.docs_dir)
-     make_docs(repos, args)
+     if os.path.exists(args.docs_dir):
+         shutil.rmtree(args.docs_dir)
+
+     make_docs(repos, args)
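Net effect of this hunk: every repo, section, and page gets a hierarchical number baked into its directory and file names, so ids like "1.3.2" can later be referenced from a prompt. A minimal sketch of the numbering (not part of the commit), run on a toy in-memory toctree with made-up titles:

```python
# Sketch only: mirrors the numbering in save_section_to_disk without touching disk.
from typing import Dict, List

def number_sections(section: Dict, prefix: str, index: int, depth: int = 0) -> None:
    numbered_title = f"{prefix}{index}. {section['title']}"
    print("    " * depth + numbered_title)
    for i, subsection in enumerate(section.get("sections", []), 1):
        # Children extend the prefix, e.g. "1.1." under "1.1. Get started"
        number_sections(subsection, f"{prefix}{index}.", i, depth + 1)

# Toy toctree shaped like an _toctree.yml entry (titles are illustrative)
toctree: List[Dict] = [
    {"title": "Get started", "sections": [
        {"title": "Installation", "local": "installation"},
        {"title": "Quicktour", "local": "quicktour"},
    ]},
    {"title": "Tutorials", "local": "tutorials"},
]

for block_index, block in enumerate(toctree, 1):
    number_sections(block, "1.", block_index)  # "1." would be the repo index prefix

# Prints:
# 1.1. Get started
#     1.1.1. Installation
#     1.1.2. Quicktour
# 1.2. Tutorials
```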
make_rag_db.py CHANGED
@@ -1,5 +1,7 @@
  import os
  import argparse
+ import json
+ import re
  from typing import Dict
  import dotenv
  from pathlib import Path
@@ -18,6 +20,14 @@ def create_collection(client: MilvusClient, collection_name: str, dimension: int
          dimension=dimension,
      )

+
+ def clean_filename(s):
+     s = re.sub(r'\d+(?:\.\d+)*\.\s*', '', s)  # Remove hierarchical numbering (e.g., "28.", "28.1.")
+     s = re.sub(r'[^\w\s/.-]', '', s)          # Remove emojis
+     s = re.sub(r'\s+', ' ', s)                # Clean up extra spaces
+     return s.strip()
+
+
  def main(args: Dict):
      client = MilvusClient("milvus.db")
@@ -29,17 +39,25 @@ def main(args: Dict):

      create_collection(client, args.collection_name, args.dimension)

-     docs = Path(args.docs_dir)
-     md_file_paths = list(docs.rglob('*.md'))
-     mdx_file_paths = list(docs.rglob('*.mdx'))
-     all_file_paths = md_file_paths + mdx_file_paths
-
+     with open(args.repos_config_path, "r") as f:
+         repos = json.load(f)
+
      docs, payloads = [], []
-     for file in tqdm(all_file_paths):
-         embed_string = str(file).replace('docs/', '').replace('.mdx', '').replace('.md', '').replace('/', ' ')
-
-         docs.append(embed_string)
-         payloads.append({'file_path': str(file)})
+     for i, repo in enumerate(repos, 1):
+         docs_path = Path('docs') / f"{i}. {repo['title']}"
+         md_file_paths = list(docs_path.rglob('*.md'))
+         mdx_file_paths = list(docs_path.rglob('*.mdx'))
+         all_file_paths = md_file_paths + mdx_file_paths
+
+         # print(all_file_paths[:5])
+
+         for file in all_file_paths:
+             embed_string = str(file).replace('docs/', '').replace('.mdx', '').replace('.md', '').replace('/', ' ')
+             embed_string = clean_filename(embed_string)
+
+             docs.append(embed_string)
+             payloads.append({'file_path': str(file), 'resource': repo['title']})

      vectors = embedding_fn.encode_documents(docs)
@@ -54,9 +72,14 @@ def main(args: Dict):
  if __name__ == "__main__":
      parser = argparse.ArgumentParser()
      parser.add_argument("--collection_name", type=str, default="hf_docs")
-     parser.add_argument("--model_name", type=str, default="text-embedding-3-small")
-     parser.add_argument("--dimension", type=int, default=1536)
+     parser.add_argument("--model_name", type=str, default="text-embedding-3-large")
+     parser.add_argument("--dimension", type=int, default=3072)
      parser.add_argument("--docs_dir", type=str, default="docs")
+     parser.add_argument("--repos_config_path", type=str, default="repos_config.json")
      args = parser.parse_args()

+     if Path('milvus.db').exists():
+         print("Removing existing Milvus database...")
+         os.remove('milvus.db')
+
      main(args)
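Because the docs tree is now numbered (and repo titles can carry emoji), the raw embed strings pick up noise; `clean_filename` strips it before embedding. A quick illustration using the same regexes, with a made-up input string:

```python
import re

def clean_filename(s):
    s = re.sub(r'\d+(?:\.\d+)*\.\s*', '', s)  # strip hierarchical numbering ("28.", "28.1.")
    s = re.sub(r'[^\w\s/.-]', '', s)          # strip emojis and other symbols
    s = re.sub(r'\s+', ' ', s)                # collapse runs of whitespace
    return s.strip()

raw = "1. Accelerate 1.1. Get started 1.1.1. Installation"
print(clean_filename(raw))  # -> "Accelerate Get started Installation"
```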
postBuild DELETED
@@ -1,2 +0,0 @@
- python3 make_docs.py
- python3 make_rag_db.py
repo2txt.py CHANGED
@@ -5,47 +5,21 @@ This version only includes the functionality to document the structure of a repo
  """

  import os
- import argparse

- def parse_args():
-     """
-     Parse command-line arguments for the script.
-
-     Returns:
-         argparse.Namespace: An object containing the parsed command-line arguments.
-     """
-     parser = argparse.ArgumentParser(
-         description='Document the structure of a repository containing .md and .mdx files.',
-         epilog='Example usage:\n python repo2txt.py -r /path/to/repo -o output.txt',
-         formatter_class=argparse.RawDescriptionHelpFormatter
-     )
-
-     parser.add_argument('-r', '--repo_path', default=os.getcwd(),
-                         help='Path to the directory to process. Defaults to the current directory.')
-     parser.add_argument('-o', '--output_file', default='output.txt',
-                         help='Name for the output text file. Defaults to "output.txt".')
-
-     return parser.parse_args()
-
-
- def should_ignore(item_path, output_file_path):
+ def should_ignore(item_path):
      """
      Determine if a given item should be ignored.
      Only includes .md and .mdx files, ignores hidden files and directories.

      Args:
          item_path (str): The path of the item (file or directory) to check.
-         output_file_path (str): The path of the output file being written to.

      Returns:
          bool: True if the item should be ignored, False otherwise.
      """
      item_name = os.path.basename(item_path)

-     # Ignore the output file itself
-     if os.path.abspath(item_path) == os.path.abspath(output_file_path):
-         return True
-
      # Ignore hidden files and directories
      if item_name.startswith('.'):
          return True
@@ -58,24 +32,26 @@ def should_ignore(item_path, output_file_path):
      # Include directories (they will be traversed)
      return False

-
- def write_tree(dir_path, output_file, output_file_path, prefix="", is_root=True):
+ def make_tree(dir_path, prefix="", is_root=True):
      """
-     Recursively write the directory tree to the output file.
+     Recursively generate the directory tree as a string.

      Args:
          dir_path (str): The path of the directory to document.
-         output_file (file object): The file object to write to.
-         output_file_path (str): The path of the output file being written to.
          prefix (str): Prefix string for line indentation and structure.
          is_root (bool): Flag to indicate if the current directory is the root.
+
+     Returns:
+         str: The tree structure as a string.
      """
+     tree_string = ""
+
      if is_root:
-         output_file.write("└── ./\n")
+         tree_string += "└── ./\n"
          # Add the actual directory name as a child of ./
          actual_dir_name = os.path.basename(dir_path)
          if actual_dir_name:
-             output_file.write(f" └── {actual_dir_name}\n")
+             tree_string += f" └── {actual_dir_name}\n"
              prefix = " "
          else:
              prefix = " "
@@ -84,7 +60,7 @@ def write_tree(dir_path, output_file, output_file_path, prefix="", is_root=True)
      try:
          items = os.listdir(dir_path)
      except PermissionError:
-         return
+         return tree_string

      items.sort()
@@ -92,7 +68,7 @@ def write_tree(dir_path, output_file, output_file_path, prefix="", is_root=True)
      filtered_items = []
      for item in items:
          item_path = os.path.join(dir_path, item)
-         if not should_ignore(item_path, output_file_path):
+         if not should_ignore(item_path):
              filtered_items.append(item)

      num_items = len(filtered_items)
@@ -103,75 +79,10 @@ def write_tree(dir_path, output_file, output_file_path, prefix="", is_root=True)
          new_prefix = "└── " if is_last_item else "├── "
          child_prefix = " " if is_last_item else "│ "

-         output_file.write(f"{prefix}{new_prefix}{item}\n")
+         tree_string += f"{prefix}{new_prefix}{item}\n"

          if os.path.isdir(item_path):
              next_prefix = prefix + child_prefix
-             write_tree(item_path, output_file, output_file_path, next_prefix, is_root=False)
-
-
- def write_file_content(file_path, output_file):
-     """
-     Write the contents of a given file to the output file.
-
-     Args:
-         file_path (str): Path of the file to read.
-         output_file (file object): The file object to write the contents to.
-     """
-     try:
-         with open(file_path, 'r', encoding='utf-8', errors='ignore') as file:
-             for line in file:
-                 output_file.write(line)
-     except Exception as e:
-         output_file.write(f"Error reading file: {e}\n")
-
-
- def write_file_contents_in_order(dir_path, output_file, output_file_path, repo_path):
-     """
-     Recursively document the contents of .md and .mdx files in directory order.
-
-     Args:
-         dir_path (str): The path of the directory to start documenting from.
-         output_file (file object): The file object to write the contents to.
-         output_file_path (str): The path of the output file being written to.
-         repo_path (str): The root path of the repository for relative path calculation.
-     """
-     try:
-         items = os.listdir(dir_path)
-     except PermissionError:
-         return
-
-     items = sorted(item for item in items if not should_ignore(os.path.join(dir_path, item), output_file_path))
-
-     for item in items:
-         item_path = os.path.join(dir_path, item)
-         relative_path = os.path.relpath(item_path, start=repo_path)
-
-         if os.path.isdir(item_path):
-             write_file_contents_in_order(item_path, output_file, output_file_path, repo_path)
-         elif os.path.isfile(item_path):
-             output_file.write(f"\n\n---\nFile: /{relative_path}\n---\n\n")
-             write_file_content(item_path, output_file)
-
-
- def main():
-     """
-     Main function to execute the script logic.
-     """
-     args = parse_args()
-
-     # Check if the provided directory path is valid
-     if not os.path.isdir(args.repo_path):
-         print(f"Error: The specified directory does not exist: {args.repo_path}")
-         return
-
-     with open(args.output_file, 'w', encoding='utf-8') as output_file:
-         output_file.write("Directory Structure:\n\n")
-         write_tree(args.repo_path, output_file, args.output_file, "", is_root=True)
-         write_file_contents_in_order(args.repo_path, output_file, args.output_file, args.repo_path)
-
-     print(f"Documentation generated successfully: {args.output_file}")
-
-
- if __name__ == "__main__":
-     main()
+             tree_string += make_tree(item_path, next_prefix, is_root=False)

+     return tree_string
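With the file writer turned into `make_tree`, callers get the tree back as a string (e.g., to splice into an LLM prompt) instead of a file on disk. A hedged usage sketch, not part of the commit, assuming repo2txt.py is importable from the working directory:

```python
import os
import tempfile

from repo2txt import make_tree  # the rewritten function above

# Build a throwaway directory with one included and one ignored file.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "docs"))
    open(os.path.join(root, "docs", "index.md"), "w").close()
    open(os.path.join(root, "notes.txt"), "w").close()  # skipped: not .md/.mdx

    tree = make_tree(root)  # returns the tree rather than writing it out
    print(tree)
```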
requirements.txt CHANGED
@@ -1,5 +1,6 @@
- pymilvus==2.5.10
- pymilvus_model==0.3.2
- python-dotenv==1.1.0
- PyYAML==6.0.2
- tqdm==4.65.0
+ pymilvus
+ pymilvus_model
+ python-dotenv
+ PyYAML
+ tqdm
+ openai
schemas.py ADDED
@@ -0,0 +1,4 @@
+ from pydantic import BaseModel
+
+ class Response(BaseModel):
+     file_ids: list[str]
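`Response` pins the LLM's file-picking step to a JSON object holding a list of file ids. A small validation sketch (not in the commit), assuming pydantic v2 (`model_validate_json`; v1 would use `parse_raw` instead):

```python
from schemas import Response

# A reply shaped like the "Sample response" in utils.py's choice_prompt.
raw_reply = '{"file_ids": ["1.3.2", "11.4.12", "7.12.11"]}'
parsed = Response.model_validate_json(raw_reply)  # raises on malformed output
print(parsed.file_ids)  # ['1.3.2', '11.4.12', '7.12.11']
```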
utils.py ADDED
@@ -0,0 +1,68 @@
+ import shutil
+ from pathlib import Path
+ from string import Template
+
+
+ doc_template = Template("""
+
+ ---
+ File: $file_path
+ ---
+
+ $file_content
+
+ """.strip())
+
+
+ choice_prompt = Template("""
+
+ The user has asked the following question: $question
+
+ The goal is to get the user the 3 most relevant documentation files to answer the question.
+
+ Here is the tree structure of the documentation. Your task is to return the numeric ids \
+ associated with the 3 most relevant .md and .mdx files.
+
+ <tree>
+ $tree_structure
+ </tree>
+
+ Sample response: ["1.3.2", "11.4.12", "7.12.11"]
+ Top 3 file ids:
+
+ """.strip())
+
+
+ def copy_search_results(search_results, dest_folder):
+     """Copy files from search results to destination folder."""
+     for item in search_results[0]:
+         file_path = item['entity']['file_path']
+         dest_path = Path(dest_folder) / file_path
+
+         dest_path.parent.mkdir(parents=True, exist_ok=True)
+         shutil.copy2(file_path, dest_path)
+
+
+ def create_documentation_string(file_ids, temp_folder):
+     """Create documentation string from file IDs using the template."""
+     documentation_parts = []
+
+     for file_id in file_ids:
+         # Find the corresponding file in the temp folder
+         docs_path = Path(temp_folder) / "docs"
+         for file_path in docs_path.rglob("*.md*"):
+             if file_id in str(file_path):
+                 try:
+                     with open(file_path, 'r', encoding='utf-8') as f:
+                         content = f.read()
+
+                     formatted_doc = doc_template.substitute(
+                         file_path=str(file_path.relative_to(docs_path)),
+                         file_content=content
+                     )
+                     documentation_parts.append(formatted_doc)
+                     break
+                 except Exception as e:
+                     print(f"Error reading file {file_path}: {e}")
+
+     return "\n\n".join(documentation_parts)
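For a feel of what `create_documentation_string` assembles, here is `doc_template` rendered on a placeholder snippet (a sketch, not in the commit; the path and content are made up, the path simply mirrors the numbering scheme from make_docs.py):

```python
from utils import doc_template

print(doc_template.substitute(
    file_path="1. Accelerate/1.1. Get started/1.1.1. Installation.md",
    file_content="# Installation\n\npip install accelerate",
))
# ---
# File: 1. Accelerate/1.1. Get started/1.1.1. Installation.md
# ---
#
# # Installation
#
# pip install accelerate
```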