Enhance README and app.py: clarify search functionality, add search type options, and improve usage examples for web search capabilities.
README.md
CHANGED
@@ -11,11 +11,14 @@ pinned: false
 
 # Web Search MCP Server
 
-A Model Context Protocol (MCP) server that provides web search capabilities to LLMs, allowing them to fetch and extract content from
+A Model Context Protocol (MCP) server that provides web search capabilities to LLMs, allowing them to fetch and extract content from web pages and news articles.
 
 ## Features
 
-- **
+- **Dual search modes**:
+  - **General Search**: Get diverse results from blogs, documentation, articles, and more
+  - **News Search**: Find fresh news articles and breaking stories from news sources
+- **Real-time web search**: Search for any topic with up-to-date results
 - **Content extraction**: Automatically extracts main article content, removing ads and boilerplate
 - **Rate limiting**: Built-in rate limiting (200 requests/hour) to prevent API abuse
 - **Structured output**: Returns formatted content with metadata (title, source, date, URL)
@@ -84,17 +87,23 @@ For clients that support URL-based MCP servers:
 
 ### `search_web` Function
 
-**Purpose**: Search the web for
+**Purpose**: Search the web for information or fresh news and extract content.
 
 **Parameters**:
 - `query` (str, **REQUIRED**): The search query
-  - Examples: "OpenAI news", "climate change 2024", "python
+  - Examples: "OpenAI news", "climate change 2024", "python tutorial"
 
 - `num_results` (int, **OPTIONAL**): Number of results to fetch
   - Default: 4
   - Range: 1-20
   - More results provide more context but take longer
 
+- `search_type` (str, **OPTIONAL**): Type of search to perform
+  - Default: "search" (general web search)
+  - Options: "search" or "news"
+  - Use "news" for fresh, time-sensitive news articles
+  - Use "search" for general information, documentation, tutorials
+
 **Returns**: Formatted text containing:
 - Summary of extraction results
 - For each article:
@@ -103,11 +112,31 @@ For clients that support URL-based MCP servers:
   - URL
   - Extracted main content
 
+**When to use each search type**:
+- **Use "news" mode for**:
+  - Breaking news or very recent events
+  - Time-sensitive information ("today", "this week")
+  - Current affairs and latest developments
+  - Press releases and announcements
+
+- **Use "search" mode for**:
+  - General information and research
+  - Technical documentation or tutorials
+  - Historical information
+  - Diverse perspectives from various sources
+  - How-to guides and explanations
+
 **Example Usage in LLM**:
 ```
-
-"
-"
+# News mode examples
+"Search for breaking news about OpenAI" -> uses news mode
+"Find today's stock market updates" -> uses news mode
+"Get latest climate change developments" -> uses news mode
+
+# Search mode examples (default)
+"Search for Python programming tutorials" -> uses search mode
+"Find information about machine learning algorithms" -> uses search mode
+"Research historical data about climate change" -> uses search mode
 ```
 
 ## Error Handling
@@ -128,17 +157,19 @@ You can test the server manually:
 
 ## Tips for LLM Usage
 
-1. **
-2. **
-3. **
-4. **
+1. **Choose the right search type**: Use "news" for fresh, breaking news; use "search" for general information
+2. **Be specific with queries**: More specific queries yield better results
+3. **Adjust result count**: Use fewer results for quick searches, more for comprehensive research
+4. **Check dates**: The tool shows article dates for temporal context
+5. **Follow up**: Use the extracted content to ask follow-up questions
 
 ## Limitations
 
 - Rate limited to 200 requests per hour
-- Only searches news articles (not general web pages)
 - Extraction quality depends on website structure
 - Some websites may block automated access
+- News mode focuses on recent articles from news sources
+- Search mode provides diverse results but may include older content
 
 ## Troubleshooting
 
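The parameter rules documented in the README changes (a default of 4 results, a 1-20 range, and a `search_type` limited to "search" or "news") can be sketched as a small standalone helper. The name `normalize_args` and the constant `VALID_SEARCH_TYPES` are hypothetical and do not appear in the commit; the clamping and fallback behaviour is modelled on the validation that the `app.py` change performs inline.

```python
from typing import Optional, Tuple

# Options listed in the README for the `search_type` parameter.
VALID_SEARCH_TYPES = ("search", "news")


def normalize_args(
    search_type: str = "search", num_results: Optional[int] = 4
) -> Tuple[str, int]:
    """Hypothetical helper: apply the documented defaults and limits."""
    # A missing count falls back to the documented default of 4.
    if num_results is None:
        num_results = 4
    # Clamp the count to the documented 1-20 range.
    num_results = max(1, min(20, num_results))
    # Unknown search types fall back to general web search.
    if search_type not in VALID_SEARCH_TYPES:
        search_type = "search"
    return search_type, num_results
```

Falling back silently rather than returning an error keeps the tool forgiving when an LLM passes an unexpected value, at the cost of hiding the mistake from the caller.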
app.py
CHANGED
@@ -29,7 +29,8 @@ from limits.aio.strategies import MovingWindowRateLimiter
 
 # Configuration
 SERPER_API_KEY = os.getenv("SERPER_API_KEY")
-
+SERPER_SEARCH_ENDPOINT = "https://google.serper.dev/search"
+SERPER_NEWS_ENDPOINT = "https://google.serper.dev/news"
 HEADERS = {"X-API-KEY": SERPER_API_KEY, "Content-Type": "application/json"}
 
 # Rate limiting
@@ -38,29 +39,45 @@ limiter = MovingWindowRateLimiter(storage)
 rate_limit = parse("200/hour")
 
 
-async def search_web(query: str, num_results: Optional[int] = 4) -> str:
+async def search_web(query: str, search_type: str = "search", num_results: Optional[int] = 4) -> str:
     """
-    Search the web for
+    Search the web for information or fresh news, returning extracted content.
 
-    This tool
-
-
+    This tool can perform two types of searches:
+    - "search" (default): General web search for diverse, relevant content from various sources
+    - "news": Specifically searches for fresh news articles and breaking stories
+
+    Use "news" mode when looking for:
+    - Breaking news or very recent events
+    - Time-sensitive information
+    - Current affairs and latest developments
+    - Today's/this week's happenings
+
+    Use "search" mode (default) for:
+    - General information and research
+    - Technical documentation or guides
+    - Historical information
+    - Diverse perspectives from various sources
 
     Args:
         query (str): The search query. This is REQUIRED. Examples: "apple inc earnings",
             "climate change 2024", "AI developments"
+        search_type (str): Type of search. This is OPTIONAL. Default is "search".
+            Options: "search" (general web search) or "news" (fresh news articles).
+            Use "news" for time-sensitive, breaking news content.
         num_results (int): Number of results to fetch. This is OPTIONAL. Default is 4.
             Range: 1-20. More results = more context but longer response time.
 
     Returns:
-        str: Formatted text containing extracted
+        str: Formatted text containing extracted content with metadata (title,
         source, date, URL, and main text) for each result, separated by dividers.
         Returns error message if API key is missing or search fails.
 
     Examples:
-        - search_web("OpenAI news", 5) - Get 5
-        - search_web("python
-        - search_web("stock market today", 10) - Get 10 articles about today's market
+        - search_web("OpenAI GPT-5", "news", 5) - Get 5 fresh news articles about OpenAI
+        - search_web("python tutorial", "search") - Get 4 general results about Python (default count)
+        - search_web("stock market today", "news", 10) - Get 10 news articles about today's market
+        - search_web("machine learning basics") - Get 4 general search results (all defaults)
     """
     if not SERPER_API_KEY:
         return "Error: SERPER_API_KEY environment variable is not set. Please set it to use this tool."
@@ -69,28 +86,44 @@ async def search_web(query: str, num_results: Optional[int] = 4) -> str:
     if num_results is None:
         num_results = 4
     num_results = max(1, min(20, num_results))
+
+    # Validate search_type
+    if search_type not in ["search", "news"]:
+        search_type = "search"
 
     try:
         # Check rate limit
         if not await limiter.hit(rate_limit, "global"):
             return "Error: Rate limit exceeded. Please try again later (limit: 200 requests per hour)."
 
-        #
-
+        # Select endpoint based on search type
+        endpoint = SERPER_NEWS_ENDPOINT if search_type == "news" else SERPER_SEARCH_ENDPOINT
+
+        # Prepare payload
+        payload = {"q": query, "num": num_results}
+        if search_type == "news":
+            payload["type"] = "news"
+            payload["page"] = 1
+
         async with httpx.AsyncClient(timeout=15) as client:
-            resp = await client.post(
+            resp = await client.post(endpoint, headers=HEADERS, json=payload)
 
         if resp.status_code != 200:
             return f"Error: Search API returned status {resp.status_code}. Please check your API key and try again."
 
-
-        if
+        # Extract results based on search type
+        if search_type == "news":
+            results = resp.json().get("news", [])
+        else:
+            results = resp.json().get("organic", [])
+
+        if not results:
             return (
-                f"No results found for query: '{query}'. Try a different search term."
+                f"No {search_type} results found for query: '{query}'. Try a different search term or search type."
             )
 
         # Fetch HTML content concurrently
-        urls = [
+        urls = [r["link"] for r in results]
         async with httpx.AsyncClient(timeout=20, follow_redirects=True) as client:
             tasks = [client.get(u) for u in urls]
             responses = await asyncio.gather(*tasks, return_exceptions=True)
@@ -99,7 +132,7 @@ async def search_web(query: str, num_results: Optional[int] = 4) -> str:
         chunks = []
         successful_extractions = 0
 
-        for meta, response in zip(
+        for meta, response in zip(results, responses):
             if isinstance(response, Exception):
                 continue
 
@@ -115,16 +148,22 @@ async def search_web(query: str, num_results: Optional[int] = 4) -> str:
 
             # Parse and format date
             try:
-
-
-
+                # For news results, date is in 'date' field; for search results, it might be in 'snippet'
+                date_str = meta.get("date", "")
+                if date_str:
+                    date_iso = dateparser.parse(date_str, fuzzy=True).strftime("%Y-%m-%d")
+                else:
+                    date_iso = "Unknown"
             except Exception:
-                date_iso =
+                date_iso = "Unknown"
 
             # Format the chunk
+            # For search results, source might be in 'displayLink' or domain
+            source = meta.get('source', meta.get('displayLink', meta['link'].split('/')[2]))
+
             chunk = (
                 f"## {meta['title']}\n"
-                f"**Source:** {
+                f"**Source:** {source} "
                 f"**Date:** {date_iso}\n"
                 f"**URL:** {meta['link']}\n\n"
                 f"{body.strip()}\n"
@@ -132,10 +171,10 @@ async def search_web(query: str, num_results: Optional[int] = 4) -> str:
             chunks.append(chunk)
 
         if not chunks:
-            return f"Found {len(
+            return f"Found {len(results)} {search_type} results for '{query}', but couldn't extract readable content from any of them. The websites might be blocking automated access."
 
         result = "\n---\n".join(chunks)
-        summary = f"Successfully extracted content from {successful_extractions} out of {len(
+        summary = f"Successfully extracted content from {successful_extractions} out of {len(results)} {search_type} results for query: '{query}'\n\n---\n\n"
 
         return summary + result
 
@@ -149,8 +188,12 @@ with gr.Blocks(title="Web Search MCP Server") as demo:
         """
         # 🔍 Web Search MCP Server
 
-        This MCP server provides web search capabilities to LLMs. It
-
+        This MCP server provides web search capabilities to LLMs. It can perform general web searches
+        or specifically search for fresh news articles, extracting the main content from results.
+
+        **Search Types:**
+        - **General Search**: Diverse results from various sources (blogs, docs, articles, etc.)
+        - **News Search**: Fresh news articles and breaking stories from news sources
 
         **Note:** This interface is primarily designed for MCP tool usage by LLMs, but you can
         also test it manually below.
@@ -158,18 +201,28 @@ with gr.Blocks(title="Web Search MCP Server") as demo:
     )
 
     with gr.Row():
-
-
-
-
-
+        with gr.Column(scale=3):
+            query_input = gr.Textbox(
+                label="Search Query",
+                placeholder='e.g. "OpenAI news", "climate change 2024", "AI developments"',
+                info="Required: Enter your search query",
+            )
+        with gr.Column(scale=1):
+            search_type_input = gr.Radio(
+                choices=["search", "news"],
+                value="search",
+                label="Search Type",
+                info="Choose search type",
+            )
+
+    with gr.Row():
         num_results_input = gr.Slider(
             minimum=1,
             maximum=20,
             value=4,
             step=1,
             label="Number of Results",
-            info="Optional: How many
+            info="Optional: How many results to fetch (default: 4)",
         )
 
     output = gr.Textbox(
@@ -184,20 +237,21 @@ with gr.Blocks(title="Web Search MCP Server") as demo:
     # Add examples
     gr.Examples(
         examples=[
-            ["OpenAI GPT-5 news", 5],
-            ["
-            ["
-            ["
-            ["
+            ["OpenAI GPT-5 latest developments", "news", 5],
+            ["python programming tutorial", "search", 4],
+            ["stock market today breaking news", "news", 6],
+            ["machine learning algorithms explained", "search", 8],
+            ["climate change 2024 latest news", "news", 4],
+            ["web development best practices", "search", 4],
         ],
-        inputs=[query_input, num_results_input],
+        inputs=[query_input, search_type_input, num_results_input],
         outputs=output,
         fn=search_web,
         cache_examples=False,
     )
 
     search_button.click(
-        fn=search_web, inputs=[query_input, num_results_input], outputs=output
+        fn=search_web, inputs=[query_input, search_type_input, num_results_input], outputs=output
     )
 
 
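The endpoint and payload selection added to `search_web` in this commit can be looked at in isolation, without any network call. The following is a minimal sketch under stated assumptions: the helper names `build_request` and `pick_results` are hypothetical and not part of `app.py`, while the endpoint URLs, payload keys, and the "news"/"organic" result keys are copied from the diff.

```python
from typing import Any, Dict, List, Tuple

# Endpoint constants as introduced in the diff.
SERPER_SEARCH_ENDPOINT = "https://google.serper.dev/search"
SERPER_NEWS_ENDPOINT = "https://google.serper.dev/news"


def build_request(
    query: str, search_type: str, num_results: int
) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical helper: choose the endpoint and build the JSON payload."""
    endpoint = SERPER_NEWS_ENDPOINT if search_type == "news" else SERPER_SEARCH_ENDPOINT
    payload: Dict[str, Any] = {"q": query, "num": num_results}
    if search_type == "news":
        # The diff adds these two keys only for news searches.
        payload["type"] = "news"
        payload["page"] = 1
    return endpoint, payload


def pick_results(search_type: str, data: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Hypothetical helper: read the result list from the decoded response."""
    # News responses are read from "news", general ones from "organic".
    key = "news" if search_type == "news" else "organic"
    return data.get(key, [])
```

Separating these pure steps from the HTTP call would make the branching testable without an API key; the commit itself keeps them inline in `search_web`.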