---
title: Web Search MCP
emoji: 🔎
colorFrom: red
colorTo: green
sdk: gradio
sdk_version: 5.36.2
app_file: app.py
pinned: false
short_description: Search and extract web content for LLM ingestion
---

# Web Search MCP Server

A Model Context Protocol (MCP) server that provides web search capabilities to LLMs, allowing them to fetch and extract content from web pages and news articles.

## Features

- **Dual search modes**: 
  - **General Search**: Get diverse results from blogs, documentation, articles, and more
  - **News Search**: Find fresh news articles and breaking stories from news sources
- **Real-time web search**: Search for any topic with up-to-date results
- **Content extraction**: Automatically extracts main article content, removing ads and boilerplate
- **Rate limiting**: Built-in limit of 200 requests per hour to prevent API abuse
- **Structured output**: Returns formatted content with metadata (title, source, date, URL)
- **Flexible results**: Control the number of results (1-20)
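
The README does not show how the hourly cap is enforced (the `limits` package in the dependency list is a likely candidate). As a stdlib-only sketch of what "200 requests per hour" means, a sliding-window counter keeps the timestamps of accepted calls and rejects a new call once 200 of them fall within the last hour. The class name `SlidingWindowLimiter` and the injectable `clock` are hypothetical, not part of this project.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical limiter: allow at most `limit` calls in any rolling `window` seconds."""

    def __init__(self, limit=200, window=3600.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the behaviour can be tested without waiting
        self._hits = deque()  # timestamps of accepted calls, oldest first

    def try_acquire(self) -> bool:
        """Record and allow the call if under the limit; otherwise reject it."""
        now = self.clock()
        # Forget calls that have slid out of the rolling window.
        while self._hits and now - self._hits[0] >= self.window:
            self._hits.popleft()
        if len(self._hits) >= self.limit:
            return False  # rejected calls are not recorded
        self._hits.append(now)
        return True
```

A sliding window avoids the burst that a fixed "reset on the hour" counter allows (up to 400 calls straddling the boundary), at the cost of storing one timestamp per accepted call.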

## Prerequisites

1. **Serper API Key**: Sign up at [serper.dev](https://serper.dev) to get your API key
2. **Python 3.10+**: Required by Gradio 5 (the Space pins `sdk_version: 5.36.2`)
3. **MCP-compatible LLM client**: Such as Claude Desktop, Cursor, or any MCP-enabled application

## Installation

1. Clone or download this repository
2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
   Or install manually:
   ```bash
   pip install "gradio[mcp]" httpx trafilatura python-dateutil limits
   ```

3. Set your Serper API key:
   ```bash
   export SERPER_API_KEY="your-api-key-here"
   ```
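
Because the key is read from the environment, checking for it at startup catches the most common setup mistake listed under Troubleshooting. A minimal sketch; the function name `require_api_key` is hypothetical and the exact message the server prints may differ.

```python
import os


def require_api_key(env=None) -> str:
    """Hypothetical helper: return the Serper key or fail with an actionable message."""
    env = os.environ if env is None else env
    key = env.get("SERPER_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "SERPER_API_KEY is not set. Export it before starting the server, e.g. "
            'export SERPER_API_KEY="your-api-key-here"'
        )
    return key
```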

## Usage

### Starting the MCP Server

```bash
python app_mcp.py
```

The server will start on `http://localhost:7860` with the MCP endpoint at:
```
http://localhost:7860/gradio_api/mcp/sse
```

### Connecting to LLM Clients

#### Claude Desktop
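Claude Desktop launches MCP servers as local processes and talks to them over stdio, while this server exposes an SSE endpoint. Start the server first (with `SERPER_API_KEY` exported), then bridge to it with the `mcp-remote` package (requires Node.js) by adding the following to your `claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "web-search": {
      "command": "npx",
      "args": ["mcp-remote", "http://localhost:7860/gradio_api/mcp/sse"]
    }
  }
}
```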

#### Direct URL Connection
For clients that support URL-based MCP servers:
1. Start the server: `python app_mcp.py`
2. Connect to: `http://localhost:7860/gradio_api/mcp/sse`

## Tool Documentation

### `search_web` Function

**Purpose**: Search the web for information or fresh news and extract content.

**Parameters**:
- `query` (str, **REQUIRED**): The search query
  - Examples: "OpenAI news", "climate change 2024", "python tutorial"
  
- `num_results` (int, **OPTIONAL**): Number of results to fetch
  - Default: 4
  - Range: 1-20
  - More results provide more context but take longer

- `search_type` (str, **OPTIONAL**): Type of search to perform
  - Default: "search" (general web search)
  - Options: "search" or "news"
  - Use "news" for fresh, time-sensitive news articles
  - Use "search" for general information, documentation, tutorials
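
To make the parameter contract concrete, the sketch below validates the three arguments and builds the matching HTTP request without sending it. It assumes the tool calls Serper's `/search` and `/news` endpoints with an `X-API-KEY` header and a JSON body of `q` and `num`; the README does not state this, and `build_serper_request` is a hypothetical name.

```python
import json
import urllib.request

# Assumed mapping from search_type to Serper endpoint.
SERPER_ENDPOINTS = {
    "search": "https://google.serper.dev/search",
    "news": "https://google.serper.dev/news",
}


def build_serper_request(query: str, api_key: str, num_results: int = 4,
                         search_type: str = "search") -> urllib.request.Request:
    """Validate search_web-style arguments and build (but do not send) a POST request."""
    query = query.strip()
    if not query:
        raise ValueError("query must be a non-empty string")
    if search_type not in SERPER_ENDPOINTS:
        raise ValueError(f"search_type must be 'search' or 'news', got {search_type!r}")
    # Clamp into the documented 1-20 range instead of rejecting out-of-range values.
    num_results = max(1, min(20, int(num_results)))
    body = json.dumps({"q": query, "num": num_results}).encode("utf-8")
    return urllib.request.Request(
        SERPER_ENDPOINTS[search_type],
        data=body,
        headers={"X-API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Clamping `num_results` is one design choice; returning an error for values outside 1-20 would be equally reasonable and makes mistakes more visible to the calling LLM.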

**Returns**: Formatted text containing:
- Summary of extraction results
- For each article:
  - Title
  - Source and date
  - URL
  - Extracted main content
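
The exact layout of the returned text is not specified here. The sketch below shows one plausible rendering of the fields listed above: a summary line, then one block per article. The `Article` dataclass, `format_results`, and the label wording are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Article:
    """Hypothetical container for one extracted result."""
    title: str
    source: str
    url: str
    content: str
    date: Optional[str] = None


def format_results(articles: List[Article], attempted: int) -> str:
    """Render a summary line followed by title, source/date, URL, and content per article."""
    lines = [f"Successfully extracted {len(articles)} of {attempted} articles", ""]
    for i, art in enumerate(articles, start=1):
        lines += [
            f"{i}. {art.title}",
            f"Source: {art.source} | Date: {art.date or 'unknown'}",
            f"URL: {art.url}",
            "",
            art.content.strip(),
            "",
        ]
    return "\n".join(lines).rstrip() + "\n"
```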

**When to use each search type**:
- **Use "news" mode for**:
  - Breaking news or very recent events
  - Time-sensitive information ("today", "this week")
  - Current affairs and latest developments
  - Press releases and announcements

- **Use "search" mode for**:
  - General information and research
  - Technical documentation or tutorials
  - Historical information
  - Diverse perspectives from various sources
  - How-to guides and explanations

**Example prompts and the mode they should trigger**:
```
# News mode examples
"Search for breaking news about OpenAI" -> uses news mode
"Find today's stock market updates" -> uses news mode
"Get latest climate change developments" -> uses news mode

# Search mode examples (default)
"Search for Python programming tutorials" -> uses search mode
"Find information about machine learning algorithms" -> uses search mode
"Research historical data about climate change" -> uses search mode
```
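
In normal use the calling LLM picks `search_type` from the user's wording. For scripts that call the tool without an LLM in the loop, a crude keyword heuristic reproduces the examples above. This is an illustrative sketch only; the cue list and `choose_search_type` are hypothetical and not part of the server.

```python
import re

# Hypothetical cue words suggesting time-sensitive intent.
NEWS_CUES = re.compile(
    r"\b(?:breaking|latest|today|this week|news|announce(?:s|d|ments?)?|press releases?)\b",
    re.IGNORECASE,
)


def choose_search_type(request: str) -> str:
    """Return "news" if the request contains a time-sensitive cue, else "search"."""
    return "news" if NEWS_CUES.search(request) else "search"
```

A fixed word list will misfire on queries such as "history of news media", which is why leaving the choice to the LLM, as the README recommends, is usually the better option.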

## Error Handling

The tool handles various error scenarios:
- Missing API key: Clear error message with setup instructions
- Rate limiting: Informs when limit is exceeded
- Failed extractions: Reports which articles couldn't be extracted
- Network errors: Graceful error messages
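
Reporting failed extractions amounts to isolating each page, so one blocked or malformed site does not abort the whole search. A minimal sketch of that pattern, with the per-page extractor passed in as a function (in this project it would presumably fetch with `httpx` and extract with `trafilatura`, but that is an assumption). `extract_all` and `summarize` are hypothetical names.

```python
from typing import Callable, Dict, List, Optional, Tuple


def extract_all(urls: List[str],
                extract: Callable[[str], Optional[str]]) -> Tuple[Dict[str, str], List[str]]:
    """Run `extract` on every URL; keep successes and record failures instead of aborting."""
    extracted: Dict[str, str] = {}
    failed: List[str] = []
    for url in urls:
        try:
            text = extract(url)
        except Exception:  # e.g. network errors, timeouts, blocked pages, parser errors
            failed.append(url)
            continue
        if text and text.strip():
            extracted[url] = text.strip()
        else:
            failed.append(url)  # page fetched but no main content found
    return extracted, failed


def summarize(extracted: Dict[str, str], failed: List[str]) -> str:
    """One-line report naming the URLs that could not be extracted."""
    total = len(extracted) + len(failed)
    msg = f"Extracted {len(extracted)}/{total} articles."
    if failed:
        msg += " Could not extract: " + ", ".join(failed)
    return msg
```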

## Testing

You can test the server manually:
1. Open `http://localhost:7860` in your browser
2. Enter a search query
3. Adjust the number of results
4. Click "Search" to see the extracted content

## Tips for LLM Usage

1. **Choose the right search type**: Use "news" for fresh, breaking news; use "search" for general information
2. **Be specific with queries**: More specific queries yield better results
3. **Adjust result count**: Use fewer results for quick searches, more for comprehensive research
4. **Check dates**: The tool shows article dates for temporal context
5. **Follow up**: Use the extracted content to ask follow-up questions

## Limitations

- Rate limited to 200 requests per hour
- Extraction quality depends on website structure
- Some websites may block automated access
- News mode focuses on recent articles from news sources
- Search mode provides diverse results but may include older content

## Troubleshooting

1. **"SERPER_API_KEY is not set"**: Ensure the environment variable is exported
2. **Rate limit errors**: Wait before making more requests
3. **No content extracted**: Some websites block scrapers; try different queries
4. **Connection errors**: Check your internet connection and firewall settings