---
title: Web Search MCP
emoji: π
colorFrom: red
colorTo: green
sdk: gradio
sdk_version: 5.36.2
app_file: app.py
pinned: false
short_description: Search and extract web content for LLM ingestion
---

# Web Search MCP Server

A Model Context Protocol (MCP) server that provides web search capabilities to LLMs, allowing them to fetch and extract content from web pages and news articles.

## Features

- **Dual search modes**:
  - **General Search**: Get diverse results from blogs, documentation, articles, and more
  - **News Search**: Find fresh news articles and breaking stories from news sources
- **Real-time web search**: Search for any topic with up-to-date results
- **Content extraction**: Automatically extracts main article content, removing ads and boilerplate
- **Rate limiting**: Built-in rate limiting (200 requests/hour) to prevent API abuse
- **Structured output**: Returns formatted content with metadata (title, source, date, URL)
- **Flexible results**: Control the number of results (1-20)

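The 200 requests/hour limit can be pictured as a sliding-window counter. The sketch below is a standard-library illustration only: the project lists `limits` among its dependencies and most likely relies on it, so the class name `SlidingWindowLimiter` and its `allow` method are hypothetical.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` within any `window_seconds` span (hypothetical helper)."""

    def __init__(self, max_calls=200, window_seconds=3600.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._clock = clock    # injectable so the limiter can be exercised without sleeping
        self._calls = deque()  # timestamps of accepted calls, oldest first

    def allow(self):
        """Return True and record the call if it fits in the window, else False."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.window_seconds:
            self._calls.popleft()
        if len(self._calls) >= self.max_calls:
            return False
        self._calls.append(now)
        return True
```

A caller would check `limiter.allow()` before each search and return a rate-limit message when it is `False`.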
## Prerequisites

1. **Serper API Key**: Sign up at [serper.dev](https://serper.dev) to get your API key
2. **Python 3.10+**: Required by Gradio 5, the SDK this server is built on
3. **MCP-compatible LLM client**: Such as Claude Desktop, Cursor, or any MCP-enabled application

## Installation

1. Clone or download this repository
2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
   Or install manually:
   ```bash
   pip install "gradio[mcp]" httpx trafilatura python-dateutil limits
   ```

3. Set your Serper API key:
   ```bash
   export SERPER_API_KEY="your-api-key-here"
   ```

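To confirm from Python that the key is visible to the process, a check along these lines can help. This is a minimal sketch, not the server's actual code: the helper name `get_serper_api_key` and the wording of its error message are assumptions.

```python
import os


def get_serper_api_key(environ=None):
    """Return the Serper API key or raise with a setup hint (hypothetical helper)."""
    env = os.environ if environ is None else environ
    key = env.get("SERPER_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            'SERPER_API_KEY is not set. Run: export SERPER_API_KEY="your-api-key-here"'
        )
    return key
```

Failing early with a message that names the variable makes a missing key easier to diagnose than a rejected API call later on.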
## Usage

### Starting the MCP Server

```bash
python app_mcp.py
```

The server will start on `http://localhost:7860` with the MCP endpoint at:
```
http://localhost:7860/gradio_api/mcp/sse
```

### Connecting to LLM Clients

#### Claude Desktop
Add to your `claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "web-search": {
      "command": "python",
      "args": ["/path/to/app_mcp.py"],
      "env": {
        "SERPER_API_KEY": "your-api-key-here"
      }
    }
  }
}
```

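If `claude_desktop_config.json` already lists other servers, the `web-search` entry should be merged in rather than pasted over the file. The sketch below shows one way to do that on an in-memory dictionary; the function name `add_web_search_server` and the `other-tool` entry are hypothetical, and the entry itself simply mirrors the JSON above.

```python
import json


def add_web_search_server(config, script_path, api_key):
    """Return a copy of `config` with the `web-search` entry added or replaced."""
    merged = json.loads(json.dumps(config))  # deep copy via a JSON round-trip
    servers = merged.setdefault("mcpServers", {})
    servers["web-search"] = {
        "command": "python",
        "args": [script_path],
        "env": {"SERPER_API_KEY": api_key},
    }
    return merged


# Example: an existing config that already defines another (made-up) server.
existing = {"mcpServers": {"other-tool": {"command": "node", "args": ["server.js"]}}}
updated = add_web_search_server(existing, "/path/to/app_mcp.py", "your-api-key-here")
print(json.dumps(updated, indent=2))
```

The original dictionary is left untouched, so the result can be reviewed before it is written back to disk.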
#### Direct URL Connection
For clients that support URL-based MCP servers:
1. Start the server: `python app_mcp.py`
2. Connect to: `http://localhost:7860/gradio_api/mcp/sse`

## Tool Documentation

### `search_web` Function

**Purpose**: Search the web for information or fresh news and extract content.

**Parameters**:
- `query` (str, **REQUIRED**): The search query
  - Examples: "OpenAI news", "climate change 2024", "python tutorial"

- `num_results` (int, **OPTIONAL**): Number of results to fetch
  - Default: 4
  - Range: 1-20
  - More results provide more context but take longer

- `search_type` (str, **OPTIONAL**): Type of search to perform
  - Default: "search" (general web search)
  - Options: "search" or "news"
  - Use "news" for fresh, time-sensitive news articles
  - Use "search" for general information, documentation, tutorials

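The parameter rules above can be sketched as a small validation step that also describes the resulting Serper call. This is an illustration under assumptions, not the server's code: `build_serper_request` is a hypothetical helper, clamping `num_results` into range (instead of rejecting it) is a design choice made here, and the URL, `X-API-KEY` header, and `q`/`num` fields follow Serper's public API as I understand it.

```python
ALLOWED_SEARCH_TYPES = ("search", "news")


def build_serper_request(query, num_results=4, search_type="search", api_key=""):
    """Validate the tool arguments and describe the Serper call (hypothetical helper)."""
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query is required and must be a non-empty string")
    if search_type not in ALLOWED_SEARCH_TYPES:
        raise ValueError('search_type must be "search" or "news"')
    # Assumption: clamp into 1-20 rather than reject an out-of-range count.
    num_results = max(1, min(20, int(num_results)))
    return {
        "url": f"https://google.serper.dev/{search_type}",
        "headers": {"X-API-KEY": api_key, "Content-Type": "application/json"},
        "json": {"q": query.strip(), "num": num_results},
    }
```

The returned dictionary could be passed to an HTTP client such as `httpx` as the URL, headers, and JSON body of a POST request.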
**Returns**: Formatted text containing:
- Summary of extraction results
- For each article:
  - Title
  - Source and date
  - URL
  - Extracted main content

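As a rough picture of that return value, the sketch below renders a summary line followed by one block per article. The exact layout the server produces is not documented here, so the function name `format_results`, the dictionary keys, and the text format are all assumptions.

```python
def format_results(articles, requested):
    """Render extracted articles as plain text; the layout is an assumption."""
    lines = [f"Extracted {len(articles)} of {requested} results.", ""]
    for i, art in enumerate(articles, start=1):
        lines.append(f"## {i}. {art['title']}")
        # A missing or empty date is shown as "unknown".
        lines.append(f"Source: {art['source']} | Date: {art.get('date') or 'unknown'}")
        lines.append(f"URL: {art['url']}")
        lines.append("")
        lines.append(art["content"])
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```

Keeping the date next to the source gives the LLM the temporal context it needs to judge how current each article is.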
**When to use each search type**:
- **Use "news" mode for**:
  - Breaking news or very recent events
  - Time-sensitive information ("today", "this week")
  - Current affairs and latest developments
  - Press releases and announcements

- **Use "search" mode for**:
  - General information and research
  - Technical documentation or tutorials
  - Historical information
  - Diverse perspectives from various sources
  - How-to guides and explanations

**Example Usage in LLM**:
```
# News mode examples
"Search for breaking news about OpenAI" -> uses news mode
"Find today's stock market updates" -> uses news mode
"Get latest climate change developments" -> uses news mode

# Search mode examples (default)
"Search for Python programming tutorials" -> uses search mode
"Find information about machine learning algorithms" -> uses search mode
"Research historical data about climate change" -> uses search mode
```

## Error Handling

The tool handles the following error scenarios:
- Missing API key: Returns a clear error message with setup instructions
- Rate limiting: Reports when the hourly limit has been exceeded
- Failed extractions: Reports which articles couldn't be extracted
- Network errors: Returns a readable error message instead of failing silently

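One common way to get this behaviour is to wrap the tool call and translate exceptions into text the LLM can relay. The sketch below is hypothetical: `run_tool_safely`, the choice of exception types, and the message wording are assumptions, not the server's actual implementation.

```python
def run_tool_safely(tool, *args, **kwargs):
    """Call `tool` and turn known failures into readable messages (hypothetical wrapper)."""
    try:
        return tool(*args, **kwargs)
    except KeyError as exc:
        # Assumption: a missing environment variable surfaces as a KeyError.
        return f"Error: missing configuration {exc}. Set SERPER_API_KEY and restart the server."
    except (ConnectionError, TimeoutError) as exc:
        return f"Error: network problem while searching ({exc}). Please try again."
    except ValueError as exc:
        return f"Error: invalid arguments ({exc})."
```

Returning a string rather than raising keeps the failure visible in the conversation, where the user can act on it.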
## Testing

You can test the server manually:
1. Open `http://localhost:7860` in your browser
2. Enter a search query
3. Adjust the number of results
4. Click "Search" to see the extracted content

## Tips for LLM Usage

1. **Choose the right search type**: Use "news" for fresh, breaking news; use "search" for general information
2. **Be specific with queries**: More specific queries yield better results
3. **Adjust result count**: Use fewer results for quick searches, more for comprehensive research
4. **Check dates**: The tool shows article dates for temporal context
5. **Follow up**: Use the extracted content to ask follow-up questions

## Limitations

- Rate limited to 200 requests per hour
- Extraction quality depends on website structure
- Some websites may block automated access
- News mode focuses on recent articles from news sources
- Search mode provides diverse results but may include older content

## Troubleshooting

1. **"SERPER_API_KEY is not set"**: Ensure the environment variable is exported in the shell that starts the server
2. **Rate limit errors**: The limit is 200 requests per hour; wait before making more requests
3. **No content extracted**: Some websites block scrapers; try different queries
4. **Connection errors**: Check your internet connection and firewall settings