victor HF Staff committed
Commit 9d978bc · Parent(s): 6ef48c6

Enhance README and app.py: clarify search functionality, add search type options, and improve usage examples for web search capabilities.

Files changed (2):
1. README.md +43 -12
2. app.py +94 -40
README.md CHANGED
@@ -11,11 +11,14 @@ pinned: false
 
 # Web Search MCP Server
 
-A Model Context Protocol (MCP) server that provides web search capabilities to LLMs, allowing them to fetch and extract content from recent news articles.
+A Model Context Protocol (MCP) server that provides web search capabilities to LLMs, allowing them to fetch and extract content from web pages and news articles.
 
 ## Features
 
-- **Real-time web search**: Search for recent news on any topic
+- **Dual search modes**:
+  - **General Search**: Get diverse results from blogs, documentation, articles, and more
+  - **News Search**: Find fresh news articles and breaking stories from news sources
+- **Real-time web search**: Search for any topic with up-to-date results
 - **Content extraction**: Automatically extracts main article content, removing ads and boilerplate
 - **Rate limiting**: Built-in rate limiting (200 requests/hour) to prevent API abuse
 - **Structured output**: Returns formatted content with metadata (title, source, date, URL)
@@ -84,17 +87,23 @@ For clients that support URL-based MCP servers:
 
 ### `search_web` Function
 
-**Purpose**: Search the web for recent news and extract article content.
+**Purpose**: Search the web for information or fresh news and extract content.
 
 **Parameters**:
 - `query` (str, **REQUIRED**): The search query
-  - Examples: "OpenAI news", "climate change 2024", "python updates"
+  - Examples: "OpenAI news", "climate change 2024", "python tutorial"
 
 - `num_results` (int, **OPTIONAL**): Number of results to fetch
   - Default: 4
   - Range: 1-20
   - More results provide more context but take longer
 
+- `search_type` (str, **OPTIONAL**): Type of search to perform
+  - Default: "search" (general web search)
+  - Options: "search" or "news"
+  - Use "news" for fresh, time-sensitive news articles
+  - Use "search" for general information, documentation, tutorials
+
 **Returns**: Formatted text containing:
 - Summary of extraction results
 - For each article:
@@ -103,11 +112,31 @@ For clients that support URL-based MCP servers:
   - URL
   - Extracted main content
 
+**When to use each search type**:
+- **Use "news" mode for**:
+  - Breaking news or very recent events
+  - Time-sensitive information ("today", "this week")
+  - Current affairs and latest developments
+  - Press releases and announcements
+
+- **Use "search" mode for**:
+  - General information and research
+  - Technical documentation or tutorials
+  - Historical information
+  - Diverse perspectives from various sources
+  - How-to guides and explanations
+
 **Example Usage in LLM**:
 ```
-"Search for recent developments in artificial intelligence"
-"Find 10 articles about climate change in 2024"
-"Get news about Python programming language updates"
+# News mode examples
+"Search for breaking news about OpenAI" -> uses news mode
+"Find today's stock market updates" -> uses news mode
+"Get latest climate change developments" -> uses news mode
+
+# Search mode examples (default)
+"Search for Python programming tutorials" -> uses search mode
+"Find information about machine learning algorithms" -> uses search mode
+"Research historical data about climate change" -> uses search mode
 ```
 
 ## Error Handling
@@ -128,17 +157,19 @@ You can test the server manually:
 
 ## Tips for LLM Usage
 
-1. **Be specific with queries**: More specific queries yield better results
-2. **Adjust result count**: Use fewer results for quick searches, more for comprehensive research
-3. **Check dates**: The tool shows article dates for temporal context
-4. **Follow up**: Use the extracted content to ask follow-up questions
+1. **Choose the right search type**: Use "news" for fresh, breaking news; use "search" for general information
+2. **Be specific with queries**: More specific queries yield better results
+3. **Adjust result count**: Use fewer results for quick searches, more for comprehensive research
+4. **Check dates**: The tool shows article dates for temporal context
+5. **Follow up**: Use the extracted content to ask follow-up questions
 
 ## Limitations
 
 - Rate limited to 200 requests per hour
-- Only searches news articles (not general web pages)
 - Extraction quality depends on website structure
 - Some websites may block automated access
+- News mode focuses on recent articles from news sources
+- Search mode provides diverse results but may include older content
 
 ## Troubleshooting
 
 
app.py CHANGED
@@ -29,7 +29,8 @@ from limits.aio.strategies import MovingWindowRateLimiter
 
 # Configuration
 SERPER_API_KEY = os.getenv("SERPER_API_KEY")
-SERPER_ENDPOINT = "https://google.serper.dev/news"
+SERPER_SEARCH_ENDPOINT = "https://google.serper.dev/search"
+SERPER_NEWS_ENDPOINT = "https://google.serper.dev/news"
 HEADERS = {"X-API-KEY": SERPER_API_KEY, "Content-Type": "application/json"}
 
 # Rate limiting
@@ -38,29 +39,45 @@ limiter = MovingWindowRateLimiter(storage)
 rate_limit = parse("200/hour")
 
 
-async def search_web(query: str, num_results: Optional[int] = 4) -> str:
+async def search_web(query: str, search_type: str = "search", num_results: Optional[int] = 4) -> str:
     """
-    Search the web for recent news and information, returning extracted content.
+    Search the web for information or fresh news, returning extracted content.
 
-    This tool searches for recent news articles related to your query and extracts
-    the main content from each article, providing you with fresh, relevant information
-    from the web.
+    This tool can perform two types of searches:
+    - "search" (default): General web search for diverse, relevant content from various sources
+    - "news": Specifically searches for fresh news articles and breaking stories
+
+    Use "news" mode when looking for:
+    - Breaking news or very recent events
+    - Time-sensitive information
+    - Current affairs and latest developments
+    - Today's/this week's happenings
+
+    Use "search" mode (default) for:
+    - General information and research
+    - Technical documentation or guides
+    - Historical information
+    - Diverse perspectives from various sources
 
     Args:
         query (str): The search query. This is REQUIRED. Examples: "apple inc earnings",
            "climate change 2024", "AI developments"
+        search_type (str): Type of search. This is OPTIONAL. Default is "search".
+            Options: "search" (general web search) or "news" (fresh news articles).
+            Use "news" for time-sensitive, breaking news content.
         num_results (int): Number of results to fetch. This is OPTIONAL. Default is 4.
            Range: 1-20. More results = more context but longer response time.
 
     Returns:
-        str: Formatted text containing extracted article content with metadata (title,
+        str: Formatted text containing extracted content with metadata (title,
            source, date, URL, and main text) for each result, separated by dividers.
            Returns error message if API key is missing or search fails.
 
     Examples:
-        - search_web("OpenAI news", 5) - Get 5 recent news articles about OpenAI
-        - search_web("python 3.13 features") - Get 4 articles about Python 3.13
-        - search_web("stock market today", 10) - Get 10 articles about today's market
+        - search_web("OpenAI GPT-5", "news", 5) - Get 5 fresh news articles about OpenAI
+        - search_web("python tutorial", "search") - Get 4 general results about Python (default count)
+        - search_web("stock market today", "news", 10) - Get 10 news articles about today's market
+        - search_web("machine learning basics") - Get 4 general search results (all defaults)
     """
     if not SERPER_API_KEY:
         return "Error: SERPER_API_KEY environment variable is not set. Please set it to use this tool."
@@ -69,28 +86,44 @@ async def search_web(query: str, num_results: Optional[int] = 4) -> str:
     if num_results is None:
         num_results = 4
     num_results = max(1, min(20, num_results))
+
+    # Validate search_type
+    if search_type not in ["search", "news"]:
+        search_type = "search"
 
     try:
         # Check rate limit
         if not await limiter.hit(rate_limit, "global"):
            return "Error: Rate limit exceeded. Please try again later (limit: 200 requests per hour)."
 
-        # Search for news
-        payload = {"q": query, "type": "news", "num": num_results, "page": 1}
+        # Select endpoint based on search type
+        endpoint = SERPER_NEWS_ENDPOINT if search_type == "news" else SERPER_SEARCH_ENDPOINT
+
+        # Prepare payload
+        payload = {"q": query, "num": num_results}
+        if search_type == "news":
+            payload["type"] = "news"
+            payload["page"] = 1
+
         async with httpx.AsyncClient(timeout=15) as client:
-            resp = await client.post(SERPER_ENDPOINT, headers=HEADERS, json=payload)
+            resp = await client.post(endpoint, headers=HEADERS, json=payload)
 
            if resp.status_code != 200:
                return f"Error: Search API returned status {resp.status_code}. Please check your API key and try again."
 
-            news_items = resp.json().get("news", [])
-            if not news_items:
+            # Extract results based on search type
+            if search_type == "news":
+                results = resp.json().get("news", [])
+            else:
+                results = resp.json().get("organic", [])
+
+            if not results:
                return (
-                    f"No results found for query: '{query}'. Try a different search term."
+                    f"No {search_type} results found for query: '{query}'. Try a different search term or search type."
                )
 
        # Fetch HTML content concurrently
-        urls = [n["link"] for n in news_items]
+        urls = [r["link"] for r in results]
        async with httpx.AsyncClient(timeout=20, follow_redirects=True) as client:
            tasks = [client.get(u) for u in urls]
            responses = await asyncio.gather(*tasks, return_exceptions=True)
@@ -99,7 +132,7 @@ async def search_web(query: str, num_results: Optional[int] = 4) -> str:
        chunks = []
        successful_extractions = 0
 
-        for meta, response in zip(news_items, responses):
+        for meta, response in zip(results, responses):
            if isinstance(response, Exception):
                continue
 
@@ -115,16 +148,22 @@ async def search_web(query: str, num_results: Optional[int] = 4) -> str:
 
            # Parse and format date
            try:
-                date_iso = dateparser.parse(meta.get("date", ""), fuzzy=True).strftime(
-                    "%Y-%m-%d"
-                )
+                # For news results, date is in 'date' field; for search results, it might be in 'snippet'
+                date_str = meta.get("date", "")
+                if date_str:
+                    date_iso = dateparser.parse(date_str, fuzzy=True).strftime("%Y-%m-%d")
+                else:
+                    date_iso = "Unknown"
            except Exception:
-                date_iso = meta.get("date", "Unknown")
+                date_iso = "Unknown"
 
            # Format the chunk
+            # For search results, source might be in 'displayLink' or domain
+            source = meta.get('source', meta.get('displayLink', meta['link'].split('/')[2]))
+
            chunk = (
                f"## {meta['title']}\n"
-                f"**Source:** {meta['source']} "
+                f"**Source:** {source} "
                f"**Date:** {date_iso}\n"
                f"**URL:** {meta['link']}\n\n"
                f"{body.strip()}\n"
@@ -132,10 +171,10 @@ async def search_web(query: str, num_results: Optional[int] = 4) -> str:
            chunks.append(chunk)
 
        if not chunks:
-            return f"Found {len(news_items)} results for '{query}', but couldn't extract readable content from any of them. The websites might be blocking automated access."
+            return f"Found {len(results)} {search_type} results for '{query}', but couldn't extract readable content from any of them. The websites might be blocking automated access."
 
        result = "\n---\n".join(chunks)
-        summary = f"Successfully extracted content from {successful_extractions} out of {len(news_items)} search results for query: '{query}'\n\n---\n\n"
+        summary = f"Successfully extracted content from {successful_extractions} out of {len(results)} {search_type} results for query: '{query}'\n\n---\n\n"
 
        return summary + result
 
@@ -149,8 +188,12 @@ with gr.Blocks(title="Web Search MCP Server") as demo:
        """
        # 🔍 Web Search MCP Server
 
-        This MCP server provides web search capabilities to LLMs. It searches for recent news
-        and extracts the main content from articles.
+        This MCP server provides web search capabilities to LLMs. It can perform general web searches
+        or specifically search for fresh news articles, extracting the main content from results.
+
+        **Search Types:**
+        - **General Search**: Diverse results from various sources (blogs, docs, articles, etc.)
+        - **News Search**: Fresh news articles and breaking stories from news sources
 
        **Note:** This interface is primarily designed for MCP tool usage by LLMs, but you can
        also test it manually below.
@@ -158,18 +201,28 @@ with gr.Blocks(title="Web Search MCP Server") as demo:
    )
 
    with gr.Row():
-        query_input = gr.Textbox(
-            label="Search Query",
-            placeholder='e.g. "OpenAI news", "climate change 2024", "AI developments"',
-            info="Required: Enter your search query",
-        )
+        with gr.Column(scale=3):
+            query_input = gr.Textbox(
+                label="Search Query",
+                placeholder='e.g. "OpenAI news", "climate change 2024", "AI developments"',
+                info="Required: Enter your search query",
+            )
+        with gr.Column(scale=1):
+            search_type_input = gr.Radio(
+                choices=["search", "news"],
+                value="search",
+                label="Search Type",
+                info="Choose search type",
+            )
+
+    with gr.Row():
        num_results_input = gr.Slider(
            minimum=1,
            maximum=20,
            value=4,
            step=1,
            label="Number of Results",
-            info="Optional: How many articles to fetch (default: 4)",
+            info="Optional: How many results to fetch (default: 4)",
        )
 
    output = gr.Textbox(
@@ -184,20 +237,21 @@ with gr.Blocks(title="Web Search MCP Server") as demo:
    # Add examples
    gr.Examples(
        examples=[
-            ["OpenAI GPT-5 news", 5],
-            ["climate change 2024", 4],
-            ["artificial intelligence breakthroughs", 8],
-            ["stock market today", 6],
-            ["python programming updates", 4],
+            ["OpenAI GPT-5 latest developments", "news", 5],
+            ["python programming tutorial", "search", 4],
+            ["stock market today breaking news", "news", 6],
+            ["machine learning algorithms explained", "search", 8],
+            ["climate change 2024 latest news", "news", 4],
+            ["web development best practices", "search", 4],
        ],
-        inputs=[query_input, num_results_input],
+        inputs=[query_input, search_type_input, num_results_input],
        outputs=output,
        fn=search_web,
        cache_examples=False,
    )
 
    search_button.click(
-        fn=search_web, inputs=[query_input, num_results_input], outputs=output
+        fn=search_web, inputs=[query_input, search_type_input, num_results_input], outputs=output
    )
 
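The app.py changes key the response handling off the search type: Serper's news endpoint returns hits under a `"news"` key while general search returns them under `"organic"`, and general results may lack a `source` field, hence the fallback chain added to the chunk formatter. A standalone sketch of that logic (`pick_results` and `source_of` are hypothetical helper names, mirroring the expressions in the diff):

```python
def pick_results(response_json: dict, search_type: str) -> list:
    """Select the result list: "news" responses use the "news" key,
    general search responses use "organic"."""
    key = "news" if search_type == "news" else "organic"
    return response_json.get(key, [])


def source_of(meta: dict) -> str:
    """Fallback chain from the diff: explicit 'source', then 'displayLink',
    then the host portion of the result URL."""
    return meta.get("source", meta.get("displayLink", meta["link"].split("/")[2]))
```

For example, a general-search hit with only a `link` field would have its source reported as the URL's host, e.g. `example.com` for `https://example.com/post`.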