---
title: Web Search MCP
emoji: πŸ”Ž
colorFrom: red
colorTo: green
sdk: gradio
sdk_version: 5.36.2
app_file: app.py
pinned: false
short_description: Search and extract web content for LLM ingestion
---
# Web Search MCP Server
A Model Context Protocol (MCP) server that provides web search capabilities to LLMs, allowing them to fetch and extract content from web pages and news articles.
## Features
- **Dual search modes**:
- **General Search**: Get diverse results from blogs, documentation, articles, and more
- **News Search**: Find fresh news articles and breaking stories from news sources
- **Real-time web search**: Search for any topic with up-to-date results
- **Content extraction**: Automatically extracts main article content, removing ads and boilerplate
- **Rate limiting**: Built-in limit of 200 requests/hour to prevent API abuse (see the sketch after this list)
- **Structured output**: Returns formatted content with metadata (title, source, date, URL)
- **Flexible results**: Control the number of results (1-20)
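The `limits` package in the requirements suggests the hourly cap is enforced roughly along these lines. This is a minimal sketch, not the actual code in `app.py`; `RATE_LIMIT` and `can_make_request` are illustrative names.
```python
# Minimal sketch of an hourly rate limit with the `limits` package.
# RATE_LIMIT and can_make_request are illustrative names, not from app.py.
from limits import parse
from limits.storage import MemoryStorage
from limits.strategies import MovingWindowRateLimiter

RATE_LIMIT = parse("200/hour")                 # matches the documented 200 requests/hour
limiter = MovingWindowRateLimiter(MemoryStorage())

def can_make_request(key: str = "global") -> bool:
    """Record one request against the hourly window; returns False once the limit is hit."""
    return limiter.hit(RATE_LIMIT, key)
```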
## Prerequisites
1. **Serper API Key**: Sign up at [serper.dev](https://serper.dev) to get your API key
2. **Python 3.8+**: Ensure you have Python installed
3. **MCP-compatible LLM client**: Such as Claude Desktop, Cursor, or any MCP-enabled application
## Installation
1. Clone or download this repository
2. Install dependencies:
```bash
pip install -r requirements.txt
```
Or install manually:
```bash
pip install "gradio[mcp]" httpx trafilatura python-dateutil limits
```
3. Set your Serper API key:
```bash
export SERPER_API_KEY="your-api-key-here"
```
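To confirm the key works before starting the server, you can hit the Serper endpoint directly. This is a standalone check following Serper's public API docs; the exact request shape used by `app.py` may differ.
```python
# Standalone check that SERPER_API_KEY is set and accepted by Serper.
import os
import httpx

api_key = os.environ.get("SERPER_API_KEY")
if not api_key:
    raise SystemExit("SERPER_API_KEY is not set")

resp = httpx.post(
    "https://google.serper.dev/search",   # the news endpoint is /news
    headers={"X-API-KEY": api_key, "Content-Type": "application/json"},
    json={"q": "python tutorial", "num": 4},
)
resp.raise_for_status()
print(resp.json().get("organic", [])[:1])  # first organic result, if any
```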
## Usage
### Starting the MCP Server
```bash
python app_mcp.py
```
The server will start on `http://localhost:7860` with the MCP endpoint at:
```
http://localhost:7860/gradio_api/mcp/sse
```
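Under the hood, a Gradio app exposes this endpoint when launched with `mcp_server=True` (available with `gradio[mcp]`). The following is a stripped-down sketch of that wiring, not the real `app.py`; the actual tool implementation lives in the app itself.
```python
# Stripped-down sketch of how a Gradio app exposes an MCP endpoint.
# The placeholder below stands in for the real search_web tool in app.py.
import gradio as gr

def search_web(query: str, num_results: int = 4, search_type: str = "search") -> str:
    """Search the web or news and return extracted article content (placeholder)."""
    return f"Would search for {query!r} ({search_type}, {num_results} results)"

demo = gr.Interface(
    fn=search_web,
    inputs=[
        gr.Textbox(label="Query"),
        gr.Slider(1, 20, value=4, step=1, label="Number of results"),
        gr.Radio(["search", "news"], value="search", label="Search type"),
    ],
    outputs=gr.Textbox(label="Extracted content"),
)

if __name__ == "__main__":
    demo.launch(mcp_server=True)  # serves the UI and /gradio_api/mcp/sse
```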
### Connecting to LLM Clients
#### Claude Desktop
Add to your `claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "web-search": {
      "command": "python",
      "args": ["/path/to/app_mcp.py"],
      "env": {
        "SERPER_API_KEY": "your-api-key-here"
      }
    }
  }
}
```
#### Direct URL Connection
For clients that support URL-based MCP servers:
1. Start the server: `python app_mcp.py`
2. Connect to: `http://localhost:7860/gradio_api/mcp/sse`
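As a quick connectivity check, you can list the exposed tools with the official `mcp` Python SDK. Note that Gradio may register the tool under a prefixed name, so inspect the listing rather than assuming `search_web`.
```python
# List the tools exposed by the running server over SSE.
# Requires the `mcp` package (the official MCP Python SDK).
import asyncio
from mcp import ClientSession
from mcp.client.sse import sse_client

async def main() -> None:
    async with sse_client("http://localhost:7860/gradio_api/mcp/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            for tool in tools.tools:
                print(tool.name, "-", tool.description)

asyncio.run(main())
```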
## Tool Documentation
### `search_web` Function
**Purpose**: Search the web for information or fresh news and extract content.
**Parameters**:
- `query` (str, **REQUIRED**): The search query
- Examples: "OpenAI news", "climate change 2024", "python tutorial"
- `num_results` (int, **OPTIONAL**): Number of results to fetch
- Default: 4
- Range: 1-20
- More results provide more context but take longer
- `search_type` (str, **OPTIONAL**): Type of search to perform
- Default: "search" (general web search)
- Options: "search" or "news"
- Use "news" for fresh, time-sensitive news articles
- Use "search" for general information, documentation, tutorials
**Returns**: Formatted text containing:
- Summary of extraction results
- For each article:
- Title
- Source and date
- URL
- Extracted main content
**When to use each search type**:
- **Use "news" mode for**:
- Breaking news or very recent events
- Time-sensitive information ("today", "this week")
- Current affairs and latest developments
- Press releases and announcements
- **Use "search" mode for**:
- General information and research
- Technical documentation or tutorials
- Historical information
- Diverse perspectives from various sources
- How-to guides and explanations
**Example Usage in LLM**:
```
# News mode examples
"Search for breaking news about OpenAI" -> uses news mode
"Find today's stock market updates" -> uses news mode
"Get latest climate change developments" -> uses news mode
# Search mode examples (default)
"Search for Python programming tutorials" -> uses search mode
"Find information about machine learning algorithms" -> uses search mode
"Research historical data about climate change" -> uses search mode
```
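For programmatic clients, the same call maps onto the documented parameters roughly as below. This is a hedged sketch using the official `mcp` Python SDK; the tool name Gradio registers may differ from `search_web`, so verify it with `list_tools()` first.
```python
# Connect over SSE and call the search tool with the documented parameters.
import asyncio
from mcp import ClientSession
from mcp.client.sse import sse_client

async def main() -> None:
    async with sse_client("http://localhost:7860/gradio_api/mcp/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "search_web",              # confirm the registered name via list_tools()
                arguments={
                    "query": "latest climate change developments",
                    "num_results": 4,      # 1-20, default 4
                    "search_type": "news", # "search" (default) or "news"
                },
            )
            # The formatted articles (title, source, date, URL, content) come back as text.
            print(result.content[0].text)

asyncio.run(main())
```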
## Error Handling
The tool handles various error scenarios:
- Missing API key: Clear error message with setup instructions
- Rate limiting: Informs when limit is exceeded
- Failed extractions: Reports which articles couldn't be extracted
- Network errors: Graceful error messages
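Extraction failures typically surface at the `trafilatura` step. The pattern is roughly the following sketch; `fetch_and_extract` is an illustrative name and `app.py`'s actual handling may differ.
```python
# Sketch of per-URL extraction with graceful failure reporting.
# fetch_and_extract is an illustrative name; app.py's actual logic may differ.
from typing import Optional
import trafilatura

def fetch_and_extract(url: str) -> Optional[str]:
    """Return the main article text, or None if the page can't be fetched or parsed."""
    try:
        downloaded = trafilatura.fetch_url(url)
        if downloaded is None:                  # blocked, timed out, or non-HTML
            return None
        return trafilatura.extract(downloaded)  # strips ads, navigation, boilerplate
    except Exception:
        return None

text = fetch_and_extract("https://example.com/article")
print(text if text else "Could not extract content from this page")
```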
## Testing
You can test the server manually:
1. Open `http://localhost:7860` in your browser
2. Enter a search query
3. Adjust the number of results
4. Click "Search" to see the extracted content
## Tips for LLM Usage
1. **Choose the right search type**: Use "news" for fresh, breaking news; use "search" for general information
2. **Be specific with queries**: More specific queries yield better results
3. **Adjust result count**: Use fewer results for quick searches, more for comprehensive research
4. **Check dates**: The tool shows article dates for temporal context
5. **Follow up**: Use the extracted content to ask follow-up questions
## Limitations
- Rate limited to 200 requests per hour
- Extraction quality depends on website structure
- Some websites may block automated access
- News mode focuses on recent articles from news sources
- Search mode provides diverse results but may include older content
## Troubleshooting
1. **"SERPER_API_KEY is not set"**: Ensure the environment variable is exported
2. **Rate limit errors**: Wait before making more requests
3. **No content extracted**: Some websites block scrapers; try different queries
4. **Connection errors**: Check your internet connection and firewall settings