---
title: Web Search MCP
emoji: π
colorFrom: red
colorTo: green
sdk: gradio
sdk_version: 5.36.2
app_file: app.py
pinned: false
short_description: Search and extract web content for LLM ingestion
---
# Web Search MCP Server
A Model Context Protocol (MCP) server that provides web search capabilities to LLMs, allowing them to fetch and extract content from web pages and news articles.
## Features
- **Dual search modes**:
  - **General Search**: Get diverse results from blogs, documentation, articles, and more
  - **News Search**: Find fresh news articles and breaking stories from news sources
- **Real-time web search**: Search for any topic with up-to-date results
- **Content extraction**: Automatically extracts main article content, removing ads and boilerplate
- **Rate limiting**: Built-in rate limiting (200 requests/hour) to prevent API abuse
- **Structured output**: Returns formatted content with metadata (title, source, date, URL)
- **Flexible results**: Control the number of results (1-20)
## Prerequisites
1. **Serper API Key**: Sign up at [serper.dev](https://serper.dev) to get your API key
2. **Python 3.10+**: Gradio 5, which this server is built on, requires Python 3.10 or newer
3. **MCP-compatible LLM client**: Such as Claude Desktop, Cursor, or any MCP-enabled application
## Installation
1. Clone or download this repository
2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
   Or install manually:
   ```bash
   pip install "gradio[mcp]" httpx trafilatura python-dateutil limits
   ```
3. Set your Serper API key:
   ```bash
   export SERPER_API_KEY="your-api-key-here"
   ```
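Before starting the server, it can help to confirm that the key is actually visible to Python. The following is a minimal sketch, assuming the server reads the key from the `SERPER_API_KEY` environment variable; the `require_api_key` helper is hypothetical and not part of this repository.

```python
import os


def require_api_key(env=None):
    """Return the Serper API key, or raise with setup instructions.

    `env` defaults to os.environ; passing a plain dict makes the check
    easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    key = env.get("SERPER_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "SERPER_API_KEY is not set. "
            'Run: export SERPER_API_KEY="your-api-key-here"'
        )
    return key
```

Calling `require_api_key()` at startup fails fast with a readable message instead of surfacing the problem on the first search.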
## Usage
### Starting the MCP Server
```bash
python app_mcp.py
```
The server will start on `http://localhost:7860` with the MCP endpoint at:
```
http://localhost:7860/gradio_api/mcp/sse
```
### Connecting to LLM Clients
#### Claude Desktop
Add to your `claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "web-search": {
      "command": "python",
      "args": ["/path/to/app_mcp.py"],
      "env": {
        "SERPER_API_KEY": "your-api-key-here"
      }
    }
  }
}
```
#### Direct URL Connection
For clients that support URL-based MCP servers:
1. Start the server: `python app_mcp.py`
2. Connect to: `http://localhost:7860/gradio_api/mcp/sse`
## Tool Documentation
### `search_web` Function
**Purpose**: Search the web for information or fresh news and extract content.
**Parameters**:
- `query` (str, **REQUIRED**): The search query
  - Examples: "OpenAI news", "climate change 2024", "python tutorial"
- `num_results` (int, **OPTIONAL**): Number of results to fetch
  - Default: 4
  - Range: 1-20
  - More results provide more context but take longer
- `search_type` (str, **OPTIONAL**): Type of search to perform
  - Default: "search" (general web search)
  - Options: "search" or "news"
  - Use "news" for fresh, time-sensitive news articles
  - Use "search" for general information, documentation, tutorials
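The parameter rules above can be sketched as a small validation step. This is an illustration only: the `normalize_params` helper is hypothetical, and clamping out-of-range `num_results` values (rather than rejecting them) is an assumption about behavior that the server may not share.

```python
VALID_SEARCH_TYPES = ("search", "news")


def normalize_params(query, num_results=4, search_type="search"):
    """Validate arguments against the documented defaults and ranges."""
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query is required and must be a non-empty string")
    if search_type not in VALID_SEARCH_TYPES:
        raise ValueError('search_type must be "search" or "news"')
    # Assumption: clamp into the documented 1-20 range instead of failing.
    num_results = max(1, min(20, int(num_results)))
    return {
        "query": query.strip(),
        "num_results": num_results,
        "search_type": search_type,
    }
```

For example, `normalize_params("OpenAI news", 50, "news")` keeps the query and search type and reduces `num_results` to 20.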
**Returns**: Formatted text containing:
- Summary of extraction results
- For each article:
  - Title
  - Source and date
  - URL
  - Extracted main content
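To make the shape of the return value concrete, here is a rough sketch of how such text could be assembled. The exact layout, the dictionary keys, and the `format_results` name are assumptions for illustration; the server's real output format may differ.

```python
def format_results(articles):
    """Render articles as text: a summary line, then one block per article.

    Each article is a dict with title, source, date, url, and content keys.
    A content value of None marks a failed extraction (assumed convention).
    """
    extracted = [a for a in articles if a.get("content")]
    lines = [f"Extracted {len(extracted)} of {len(articles)} results.", ""]
    for article in extracted:
        lines += [
            f"## {article['title']}",
            f"Source: {article['source']} | Date: {article['date']}",
            f"URL: {article['url']}",
            "",
            article["content"].strip(),
            "",
        ]
    return "\n".join(lines).rstrip()
```

Keeping the summary on the first line lets an LLM see at a glance how many pages were successfully extracted before it reads the content.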
**When to use each search type**:
- **Use "news" mode for**:
  - Breaking news or very recent events
  - Time-sensitive information ("today", "this week")
  - Current affairs and latest developments
  - Press releases and announcements
- **Use "search" mode for**:
  - General information and research
  - Technical documentation or tutorials
  - Historical information
  - Diverse perspectives from various sources
  - How-to guides and explanations
**Example Usage in LLM**:
```
# News mode examples
"Search for breaking news about OpenAI" -> uses news mode
"Find today's stock market updates" -> uses news mode
"Get latest climate change developments" -> uses news mode
# Search mode examples (default)
"Search for Python programming tutorials" -> uses search mode
"Find information about machine learning algorithms" -> uses search mode
"Research historical data about climate change" -> uses search mode
```
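In practice the LLM picks the mode from the tool description. A client that wants a deterministic fallback could approximate the same decision with a keyword heuristic. The sketch below is hypothetical: the cue words and the `choose_search_type` name are assumptions, not part of this server.

```python
import re

# Hypothetical cue words that suggest a time-sensitive request.
NEWS_CUES = re.compile(
    r"\b(breaking|latest|today|this week|news|press release)\b",
    re.IGNORECASE,
)


def choose_search_type(request):
    """Return "news" for time-sensitive requests, otherwise "search"."""
    return "news" if NEWS_CUES.search(request) else "search"
```

With these cues, the three news examples above map to `"news"` and the three search examples map to `"search"`.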
## Error Handling
The tool handles various error scenarios:
- Missing API key: Returns a clear error message with setup instructions
- Rate limiting: Reports when the 200 requests/hour limit has been exceeded
- Failed extractions: Reports which articles could not be extracted
- Network errors: Returns a readable error message instead of crashing
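One way to picture this behavior is a wrapper that turns each failure into a message the LLM can read. This is a sketch under stated assumptions: the exception classes and the `safe_search` helper are hypothetical names, and the message wording is illustrative rather than the server's actual text.

```python
class MissingApiKey(Exception):
    """Hypothetical: raised when SERPER_API_KEY is not configured."""


class RateLimitExceeded(Exception):
    """Hypothetical: raised when the hourly request quota is used up."""


def safe_search(search_fn, query):
    """Call search_fn(query) and convert failures into readable messages."""
    try:
        return search_fn(query)
    except MissingApiKey:
        return "Error: SERPER_API_KEY is not set. Export it before starting the server."
    except RateLimitExceeded:
        return "Error: rate limit exceeded (200 requests/hour). Try again later."
    except OSError as exc:
        # ConnectionError and TimeoutError are both subclasses of OSError.
        return f"Error: network problem while searching: {exc}"
```

Returning a string instead of raising keeps the tool call successful from the client's point of view, so the LLM can relay the problem to the user.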
## Testing
You can test the server manually:
1. Open `http://localhost:7860` in your browser
2. Enter a search query
3. Adjust the number of results
4. Click "Search" to see the extracted content
## Tips for LLM Usage
1. **Choose the right search type**: Use "news" for fresh, breaking news; use "search" for general information
2. **Be specific with queries**: More specific queries yield better results
3. **Adjust result count**: Use fewer results for quick searches, more for comprehensive research
4. **Check dates**: The tool shows article dates for temporal context
5. **Follow up**: Use the extracted content to ask follow-up questions
## Limitations
- Rate limited to 200 requests per hour
- Extraction quality depends on website structure
- Some websites may block automated access
- News mode focuses on recent articles from news sources
- Search mode provides diverse results but may include older content
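To illustrate what a 200 requests/hour limit means in practice, here is a standard-library sketch of a sliding-window limiter. It is a stand-in for explanation only: the dependency list includes the `limits` package, which the server presumably uses instead, and the `SlidingWindowLimiter` class is a hypothetical name.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests calls within any window_seconds span."""

    def __init__(self, max_requests=200, window_seconds=3600, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self._hits = deque()  # timestamps of accepted requests, oldest first

    def allow(self):
        """Record the request and return True, or return False if over the limit."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._hits and now - self._hits[0] >= self.window_seconds:
            self._hits.popleft()
        if len(self._hits) >= self.max_requests:
            return False
        self._hits.append(now)
        return True
```

With a sliding window, capacity returns gradually as old requests age out, rather than all at once at the top of the hour.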
## Troubleshooting
1. **"SERPER_API_KEY is not set"**: Ensure the environment variable is exported
2. **Rate limit errors**: The server allows 200 requests per hour; wait for the limit to reset before making more requests
3. **No content extracted**: Some websites block scrapers; try different queries
4. **Connection errors**: Check your internet connection and firewall settings