How To Build a News Agent with GPT-OSS, Hugging Face Inference & Gradio

OpenAI recently released GPT-OSS 20B and GPT-OSS 120B, two open-weight language models built for strong reasoning, tool use, and developer flexibility. With Hugging Face Inference Providers, you can run them instantly (no GPU setup or local hosting required) and quickly spin up a demo with Gradio.
This cookbook walks you through building a fully functional, AI-powered news agent that runs GPT-OSS via Hugging Face. Your agent will:
- Search top headlines
- Query specific topics and sites
- Optionally fetch full articles for deeper analysis
- Synthesize answers with sources
- Run behind a simple Gradio chat UI
- Log traces to Langfuse for debugging and iteration
You can see it in action here.
Table of contents
- What you'll build
- Prerequisites
- Quickstart
- Configure models and routing
- Tools: search, site-search, fetch
- Agent loop: tool use and synthesis
- UI: Gradio chat app
- Observability with Langfuse
1) What you'll build
You're making a news-focused research agent that runs GPT-OSS models via Hugging Face's inference router.
It:
- Picks the right tool for the query (RSS, Serper News, site-specific search, or article fetch)
- Stops tool use when enough data is gathered
- Always returns a final summary with clickable source links
💡 This is designed as a proof of concept for responsible use: the system prompt explicitly limits full-article scraping to user-requested analysis.
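Gradio's chatbot renders Markdown, so "clickable source links" can be plain `[title](link)` bullets. The sketch below shows one way to build such a list from the tool results. It assumes result dicts with `title` and `link` keys, matching the tools in this guide. In the app itself the model writes its own source list, so `format_sources` is a hypothetical post-processing helper, not part of the original code.

```python
from typing import Any, Dict, Iterable, List


def format_sources(results: Iterable[Dict[str, Any]]) -> str:
    """Render unique results as a Markdown bullet list of [title](link) entries (hypothetical helper)."""
    seen = set()
    lines: List[str] = []
    for item in results:
        link = item.get("link")
        if not link or link in seen:
            continue  # skip results without a URL and duplicate URLs
        seen.add(link)
        # Fall back to the URL as the label; replace brackets so the link syntax stays intact
        title = (item.get("title") or link).replace("[", "(").replace("]", ")")
        lines.append(f"- [{title}]({link})")
    return "\n".join(lines)
```

Deduplicating by URL matters here because the same story often appears in both RSS and search results.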
2) Prerequisites
You'll need:
- Python 3.10+
- An HF token with inference access
- A Serper API key for Google-style search
- Optional: Langfuse keys for tracing
Install dependencies:
```bash
pip install gradio python-dotenv requests trafilatura openai langfuse
```
Why `.env`? It keeps tokens out of your code, so you can commit safely as long as `.env` is listed in `.gitignore`.
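A missing key usually shows up later as a confusing 401 from the router or from Serper. The sketch below makes the app fail fast at startup instead. It assumes the variable names used in this guide (`HF_TOKEN`, `SERPER_API_KEY`) and treats the Langfuse keys as optional. `missing_env_vars` and `load_and_check_env` are hypothetical helpers.

```python
import os
from typing import Iterable, List, Mapping

try:
    # python-dotenv is optional for this sketch; the app itself lists it in requirements.txt
    from dotenv import load_dotenv
except ImportError:
    load_dotenv = None

# Langfuse keys are left out because tracing is optional
REQUIRED_VARS = ("HF_TOKEN", "SERPER_API_KEY")


def missing_env_vars(required: Iterable[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in required if not env.get(name)]


def load_and_check_env(required: Iterable[str] = REQUIRED_VARS) -> None:
    """Load .env if python-dotenv is available, then raise on missing keys."""
    if load_dotenv is not None:
        load_dotenv()  # does not override variables that are already set
    missing = missing_env_vars(required)
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
```

Call `load_and_check_env()` once at the top of `app.py`, before creating the client.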
3) Quickstart
- You can clone the repo (`git clone https://huggingface.co/spaces/fdaudens/gpt-oss-news-agent`) or start from scratch by creating the following file structure:
```
news-agent-gpt-oss/
│
├── app.py             # Main application code (agent logic, tools, Gradio UI)
├── requirements.txt   # Python dependencies to install on HF Spaces
├── .env               # Environment variables (HF token, Serper API key, Langfuse keys)
└── README.md          # (Optional) Documentation / instructions
```
- Fill `.env` with valid keys:
```
HF_TOKEN=hf_************************
SERPER_API_KEY=************************
LANGFUSE_PUBLIC_KEY=...
LANGFUSE_SECRET_KEY=...
LANGFUSE_HOST=https://cloud.langfuse.com
```
- Make sure your `requirements.txt` file contains:
```
gradio
openai
python-dotenv
requests
trafilatura
langfuse
```
If you want to run the code locally to test it:
```bash
python app.py
```
Open the URL printed in your terminal to start chatting.
4) Under the Hood: Configure models and routing
We use Hugging Face's router endpoint to call the GPT-OSS models served by Fireworks:
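```python
import os

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()  # read HF_TOKEN, SERPER_API_KEY, etc. from .env
HF_TOKEN = os.getenv("HF_TOKEN")
SERPER_API_KEY = os.getenv("SERPER_API_KEY")

AVAILABLE_MODELS = [
    "openai/gpt-oss-120b:fireworks-ai",
    "openai/gpt-oss-20b:fireworks-ai",
]
# Default model
DEFAULT_MODEL = "openai/gpt-oss-120b:fireworks-ai"

# The router exposes an OpenAI-compatible API, so the standard OpenAI client works
client = OpenAI(base_url="https://router.huggingface.co/v1", api_key=HF_TOKEN)
```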
Two models are available:
- `openai/gpt-oss-120b:fireworks-ai` (default, stronger reasoning)
- `openai/gpt-oss-20b:fireworks-ai` (faster and cheaper)
⚡ Switch to 20B if you're aiming for faster responses or need to optimize for cost. In my initial tests, it performed impressively well, so I encourage you to experiment with both models and compare their speed, accuracy, and overall feel.
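In these model ids, the part before the colon is the Hub model id. The part after it (`:fireworks-ai`) tells the router which inference provider should serve the request. If you want to compare models or providers without hand-editing string literals, a small helper can rebuild the id. The sketch below shows this. `split_model_id` and `with_provider` are hypothetical helpers, and `"example-provider"` in the test is a placeholder, not a real provider name.

```python
from typing import Optional, Tuple


def split_model_id(model_id: str) -> Tuple[str, Optional[str]]:
    """Split 'org/model:provider' into ('org/model', 'provider'); provider is None when absent."""
    base, sep, provider = model_id.partition(":")
    return base, (provider if sep else None)


def with_provider(model_id: str, provider: str) -> str:
    """Return the same Hub model id pinned to a different provider suffix."""
    base, _ = split_model_id(model_id)
    return f"{base}:{provider}"
```

For example, `with_provider(DEFAULT_MODEL, "fireworks-ai")` returns `DEFAULT_MODEL` unchanged.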
5) Tools: search, site-search, fetch
Here, we build the tools the agent can access. We declare them in the OpenAI function-calling format and map each one to its corresponding Python function.
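The JSON schema and the Python signature are written separately, so they can drift apart. For example, a renamed parameter will make the model's calls fail with `TypeError`. The sketch below shows a consistency check built on `inspect.signature`. It assumes the `TOOLS` list and `FUNCTION_MAP` dict shapes used in this guide. `check_tool_specs` is a hypothetical helper.

```python
import inspect
from typing import Any, Callable, Dict, List


def check_tool_specs(tools: List[Dict[str, Any]], function_map: Dict[str, Callable[..., Any]]) -> List[str]:
    """List mismatches between OpenAI-style tool specs and the Python functions they map to."""
    problems: List[str] = []
    for tool in tools:
        spec = tool["function"]
        name = spec["name"]
        fn = function_map.get(name)
        if fn is None:
            problems.append(f"{name}: no Python function registered")
            continue
        params = inspect.signature(fn).parameters
        schema = spec.get("parameters", {})
        accepts_kwargs = any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())
        # Every property the model may send must be accepted by the function
        for prop in schema.get("properties", {}):
            if prop not in params and not accepts_kwargs:
                problems.append(f"{name}: schema property '{prop}' not in signature")
        # Every parameter without a default must be required, or the call can fail
        required = set(schema.get("required", []))
        for pname, p in params.items():
            if p.kind in (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD):
                continue
            if p.default is inspect.Parameter.empty and pname not in required:
                problems.append(f"{name}: parameter '{pname}' has no default but is not required")
    return problems
```

Once `TOOLS` and `FUNCTION_MAP` are defined, `assert not check_tool_specs(TOOLS, FUNCTION_MAP)` at startup catches drift early.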
a) `fetch_google_news_rss(num=10)`
- Gets top headlines from Google News RSS
- Use for: "what's happening today?"
- Returns title, link, pub date, and source

b) `serper_news_search(query, num=5)`
- Searches news for a specific topic
- Use for: "AI regulation", "climate change"
- Returns title, link, snippet, date, and source

c) `serper_site_search(query, site, num=5)`
- Restricts to a specific domain
- Use for: "site:nytimes.com AI chips"
- Returns title, link, snippet, and favicons

d) `fetch_article(url, max_chars=12000)`
- Fetches and extracts full article text
- Only used for deep analysis / quotes
- Uses Trafilatura for clean text extraction
⚠️ The system prompt discourages unnecessary article scraping: `fetch_article` is reserved for questions about specific article content.
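`serper_site_search` turns its `site` argument into a `site:` operator. The model may pass a full URL (for example `https://www.nytimes.com/section/tech`) rather than a bare domain, so a small normalization step keeps the query predictable. The sketch below assumes you want a bare, lowercase host with any leading `www.` dropped, so subdomains stay in scope. `normalize_site` and `build_site_query` are hypothetical helpers.

```python
from urllib.parse import urlparse


def normalize_site(site: str) -> str:
    """Reduce 'https://www.nytimes.com/section' or ' NYTimes.com ' to a bare host like 'nytimes.com'."""
    site = site.strip()
    # urlparse only fills netloc when the string has a scheme or a '//' prefix
    host = urlparse(site if "//" in site else f"//{site}").netloc or site
    host = host.split("@")[-1].split(":")[0].lower()  # drop credentials and port
    return host[4:] if host.startswith("www.") else host


def build_site_query(query: str, site: str) -> str:
    """Compose the Google-style operator query used for site-restricted search."""
    return f"site:{normalize_site(site)} {query}"
```

Inside `serper_site_search`, the payload would then use `"q": build_site_query(query, site)`.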
Full code for building the tools:
```python
import xml.etree.ElementTree as ET
from typing import Any, Dict, List, Union

import requests


def fetch_google_news_rss(num: int = 10) -> Union[List[Dict[str, Any]], Dict[str, Any]]:
    """Fetch general news from Google News RSS feed."""
    try:
        url = "https://news.google.com/rss"
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        # Parse RSS XML
        root = ET.fromstring(r.content)
        items = root.findall('.//item')
        results = []
        for item in items[:num]:
            title = item.find('title')
            link = item.find('link')
            pub_date = item.find('pubDate')
            source = item.find('source')
            results.append({
                "title": title.text if title is not None else "No title",
                "link": link.text if link is not None else "",
                "pub_date": pub_date.text if pub_date is not None else "No date",
                "source": source.text if source is not None else "Google News"
            })
        return results
    except Exception as e:
        # Return the error instead of raising so the model sees it as tool output
        return {"ok": False, "error": repr(e)}
```
```python
def serper_news_search(query: str, num: int = 5) -> List[Dict[str, Any]]:
    """Fetch news for a specific topic or query."""
    url = "https://google.serper.dev/news"
    headers = {"X-API-KEY": SERPER_API_KEY, "Content-Type": "application/json"}
    # gl/hl set country and language; "tbs": "qdr:d" limits results to the past day
    payload = {"q": query, "gl": "us", "hl": "en", "tbs": "qdr:d"}
    r = requests.post(url, headers=headers, json=payload, timeout=30)
    r.raise_for_status()
    data = r.json()
    results = []
    for item in data.get("news", [])[:num]:
        results.append({
            "title": item.get("title"),
            "link": item.get("link"),
            "snippet": item.get("snippet"),
            "date": item.get("date"),  # date string as returned by Serper (may be relative, e.g. "2 hours ago")
            "source": item.get("source")
        })
    return results
```
```python
def serper_site_search(query: str, site: str, num: int = 5) -> List[Dict[str, Any]]:
    """Site restricted web search."""
    url = "https://google.serper.dev/search"
    headers = {"X-API-KEY": SERPER_API_KEY, "Content-Type": "application/json"}
    payload = {"q": f"site:{site} {query}", "gl": "us", "hl": "en"}
    r = requests.post(url, headers=headers, json=payload, timeout=30)
    r.raise_for_status()
    data = r.json()
    results = []
    for item in data.get("organic", [])[:num]:
        results.append({
            "title": item.get("title"),
            "link": item.get("link"),
            "snippet": item.get("snippet"),
            "favicons": item.get("favicons", {})
        })
    return results
```
```python
import trafilatura


def fetch_article(url: str, max_chars: int = 12000) -> Dict[str, Any]:
    """Fetch and extract clean article text with trafilatura."""
    try:
        # trafilatura.fetch_url() takes no timeout argument (passing one raises TypeError);
        # its download timeout comes from trafilatura's own settings
        downloaded = trafilatura.fetch_url(url)
        text = trafilatura.extract(downloaded, include_comments=False) if downloaded else None
        if not text:
            return {"ok": False, "error": "could_not_extract"}
        text = text.strip()
        if len(text) > max_chars:
            text = text[:max_chars] + " ..."
        return {"ok": True, "text": text}
    except Exception as e:
        return {"ok": False, "error": repr(e)}
```
```python
# OpenAI-style tool specs for function calling
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "fetch_google_news_rss",
            "description": "Fetch general top headlines from Google News RSS feed. Use this when you want to see what's happening in the world today without a specific topic focus.",
            "parameters": {
                "type": "object",
                "properties": {
                    "num": {"type": "integer", "minimum": 1, "maximum": 20, "description": "Number of news items to fetch"}
                },
                "required": []
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "serper_news_search",
            "description": "Search Google News for articles about a specific topic or query. Use this when you need news about particular subjects, companies, or events.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"},
                    "num": {"type": "integer", "minimum": 1, "maximum": 20}
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "serper_site_search",
            "description": "Search a specific news domain for relevant articles.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"},
                    "site": {"type": "string", "description": "Domain like ft.com or nytimes.com"},
                    "num": {"type": "integer", "minimum": 1, "maximum": 10}
                },
                "required": ["query", "site"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "fetch_article",
            "description": "Download and extract the main text of an article from a URL. ONLY use this when the user asks specific questions about article content, details, or wants to analyze/quote from particular articles. Do NOT use this for general news summaries or overviews.",
            "parameters": {
                "type": "object",
                "properties": {
                    "url": {"type": "string"},
                    "max_chars": {"type": "integer", "minimum": 1000, "maximum": 60000}
                },
                "required": ["url"]
            }
        }
    }
]

FUNCTION_MAP = {
    "fetch_google_news_rss": fetch_google_news_rss,
    "serper_news_search": serper_news_search,
    "serper_site_search": serper_site_search,
    "fetch_article": fetch_article,
}
```
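`fetch_google_news_rss` relies on the standard RSS 2.0 layout: `item` elements with `title`, `link`, `pubDate`, and `source` children. To test that parsing without a network call, one option is to move it into a pure function and feed it a hand-written feed. The sketch below does that. `parse_rss_items` and `SAMPLE_FEED` are illustrative and do not reproduce the live Google News payload. `findtext` mirrors the original fallbacks with one small difference: an element that is present but empty yields `""` instead of `None`.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict, List


def parse_rss_items(xml_bytes: bytes, num: int = 10) -> List[Dict[str, Any]]:
    """Parse RSS <item> elements with the same fallbacks as fetch_google_news_rss."""
    root = ET.fromstring(xml_bytes)
    results = []
    for item in root.findall(".//item")[:num]:
        # findtext returns the default only when the child element is missing
        results.append({
            "title": item.findtext("title", default="No title"),
            "link": item.findtext("link", default=""),
            "pub_date": item.findtext("pubDate", default="No date"),
            "source": item.findtext("source", default="Google News"),
        })
    return results


SAMPLE_FEED = b"""<?xml version="1.0"?>
<rss version="2.0"><channel>
  <item>
    <title>Example headline</title>
    <link>https://example.com/story</link>
    <pubDate>Mon, 01 Jan 2024 00:00:00 GMT</pubDate>
    <source url="https://example.com">Example Source</source>
  </item>
  <item><title>Second headline</title></item>
</channel></rss>"""
```

With this split, `fetch_google_news_rss` reduces to a `requests.get` call followed by `parse_rss_items(r.content, num)`.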
6) Agent loop: tool use and synthesis
Goal: Let the model choose tools, run them, and synthesize results into a final answer.
This is the heart of the agent: the decision-making loop where GPT-OSS decides when and how to use tools, processes their outputs, and turns them into a final, well-sourced answer.
The loop works by:
- Giving the model a clear system prompt that acts like a playbook for tool selection.
- Letting the model autonomously call tools in sequence (via OpenAI function-calling).
- Passing the results of each tool call back into the conversation.
- Nudging the model to stop calling tools and synthesize once enough data has been gathered.
This design keeps responses efficient, discourages unnecessary tool calls, and instructs the model to include sources in every answer.
Flow:
- System message = playbook: clear rules for when to use each tool
- On each turn:
  - Call `chat.completions.create(...)` with `tool_choice="auto"`
  - If tools are requested, run them and append the results to `messages`
- From the third step onward, nudge the model after each round of tool calls: "You now have sufficient information. Please provide your final answer with sources."
- Cap the loop at 6 steps to avoid endless tool calling
- Return the answer, or a friendly error if processing fails
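You can exercise this control flow offline by swapping the model for a scripted stub. The sketch below uses simplified stand-ins for the OpenAI response objects; `FakeFunction`, `FakeToolCall`, `FakeMessage`, `mini_agent_loop`, and `scripted_model` are all hypothetical. It mirrors the shape of the loop: dispatch tool calls, append `role: "tool"` results, and stop when a message has content and no tool calls. It does not reproduce `run_agent` line for line, and the system prompt and nudge are omitted.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class FakeFunction:
    name: str
    arguments: str  # JSON-encoded, as in the OpenAI API


@dataclass
class FakeToolCall:
    id: str
    function: FakeFunction


@dataclass
class FakeMessage:
    content: Optional[str] = None
    tool_calls: List[FakeToolCall] = field(default_factory=list)


def mini_agent_loop(model: Callable[[List[Dict[str, Any]]], FakeMessage],
                    tools: Dict[str, Callable[..., Any]], user_prompt: str, max_steps: int = 6) -> str:
    """Alternate model calls and tool execution until the model answers or the cap is hit."""
    messages: List[Dict[str, Any]] = [{"role": "user", "content": user_prompt}]
    for _ in range(max_steps):
        msg = model(messages)
        if msg.tool_calls:
            messages.append({"role": "assistant", "content": msg.content or "", "tool_calls": [
                {"id": c.id, "type": "function",
                 "function": {"name": c.function.name, "arguments": c.function.arguments}}
                for c in msg.tool_calls]})
            for call in msg.tool_calls:
                fn = tools.get(call.function.name)
                try:
                    args = json.loads(call.function.arguments or "{}")
                    result = fn(**args) if fn else {"ok": False, "error": "unknown_tool"}
                except Exception as e:  # bad JSON or bad arguments become tool errors
                    result = {"ok": False, "error": repr(e)}
                messages.append({"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)})
            continue
        if msg.content:
            return msg.content
    return "step limit reached"


def scripted_model(messages: List[Dict[str, Any]]) -> FakeMessage:
    """Request one search first, then answer from the latest tool output."""
    tool_msgs = [m for m in messages if m["role"] == "tool"]
    if not tool_msgs:
        return FakeMessage(tool_calls=[FakeToolCall("call_1", FakeFunction("search", '{"query": "AI"}'))])
    hits = json.loads(tool_msgs[-1]["content"])
    return FakeMessage(content="Sources:\n" + "\n".join(f"- {h['link']}" for h in hits))
```

Because the stub sees the same message list the real model would, this also makes it easy to check that every tool result carries the right `tool_call_id`.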
Full code for building the agent:
```python
import json
import time
from typing import Any, Dict, List, Optional


def call_model(messages: List[Dict[str, Any]], tools=TOOLS, temperature: float = 0.3, model: str = DEFAULT_MODEL):
    """One step with tool calling support."""
    try:
        return client.chat.completions.create(
            model=model,
            temperature=temperature,
            messages=messages,
            tools=tools,
            tool_choice="auto"
        )
    except Exception as e:
        print(f"Error calling model: {e}")
        raise


def run_agent(user_prompt: str, site_limit: Optional[str] = None, model: str = DEFAULT_MODEL) -> str:
    """
    High level prompt for a news agent.
    It may search, read links, then synthesize and cite URLs.
    """
    system = {
        "role": "system",
        "content": (
            "You are a careful news agent. Follow these steps:\n"
            "1. For general news requests: Use fetch_google_news_rss to get top headlines\n"
            "2. For specific topic requests: Use serper_news_search with the topic\n"
            "3. ONLY use fetch_article when the user asks specific questions about article content, details, or wants to analyze/quote from particular articles\n"
            "4. For general news summaries, provide information based on headlines and snippets without fetching full articles\n"
            "5. STOP calling tools and provide your final answer\n"
            "6. Always include a bullet list of sources with URLs\n"
            "IMPORTANT: After reading articles (if any), you must provide your final answer without calling more tools.\n\n"
            "TOOL SELECTION GUIDE:\n"
            "- fetch_google_news_rss: Use for 'what's happening today' or 'top news' requests\n"
            "- serper_news_search: Use for specific topics like 'AI chips', 'Nvidia', 'climate change'\n"
            "- serper_site_search: Use when restricted to specific news sources\n"
            "- fetch_article: ONLY use when user asks about specific article content, details, or wants to analyze particular articles\n"
            "PRIORITY: For general news requests, provide summaries based on headlines and snippets. Only fetch full articles when specifically needed for detailed analysis.\n"
        ),
    }
    # Messages mix plain strings with tool_calls lists, so values are typed as Any
    messages: List[Dict[str, Any]] = [system, {"role": "user", "content": user_prompt}]
    if site_limit:
        messages.append({"role": "user", "content": f"Restrict searches to {site_limit} when appropriate."})
    for step in range(6):  # small safety cap
        try:
            resp = call_model(messages, model=model)
            msg = resp.choices[0].message
            # If the model wants to call tools
            if getattr(msg, "tool_calls", None):
                # Add the assistant message with tool calls to the conversation
                assistant_message = {
                    "role": "assistant",
                    "content": msg.content or "",
                    "tool_calls": [
                        {
                            "id": tool_call.id,
                            "type": "function",
                            "function": {
                                "name": tool_call.function.name,
                                "arguments": tool_call.function.arguments
                            }
                        }
                        for tool_call in msg.tool_calls
                    ]
                }
                messages.append(assistant_message)
                # Process each tool call
                for tool_call in msg.tool_calls:
                    name = tool_call.function.name
                    args = {}
                    try:
                        args = json.loads(tool_call.function.arguments or "{}")
                    except json.JSONDecodeError:
                        args = {}
                    fn = FUNCTION_MAP.get(name)
                    if not fn:
                        messages.append({
                            "role": "tool",
                            "tool_call_id": tool_call.id,
                            "name": name,
                            "content": json.dumps({"ok": False, "error": "unknown_tool"})
                        })
                        continue
                    try:
                        result = fn(**args)
                    except TypeError as e:
                        result = {"ok": False, "error": f"bad_args: {e}"}
                    except Exception as e:
                        result = {"ok": False, "error": repr(e)}
                    tool_response = {
                        "role": "tool",
                        "tool_call_id": tool_call.id,
                        "name": name,
                        "content": json.dumps(result),
                    }
                    messages.append(tool_response)
                # From the third loop iteration on (step index >= 2), remind the model to synthesize
                if step >= 2:
                    messages.append({
                        "role": "user",
                        "content": "You now have sufficient information. Please provide your final answer with sources."
                    })
                # Continue loop so the model can see tool outputs
                continue
            # If we have a final assistant message without tool calls
            if msg.content:
                return msg.content
            # Empty reply: pause briefly, then ask the model again
            time.sleep(0.2)
        except Exception as e:
            # On errors, retry until the last step, then report the error
            if step == 5:  # Last step
                return f"Error occurred during processing: {e}"
            time.sleep(0.5)
            continue
    return "I could not complete the task within the step limit. Try refining your query."
```
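The synthesis reminder is keyed to the loop index (`step >= 2`), not to how many tools have run. A single step can issue several tool calls, and once triggered, the reminder is appended again on every later tool round. If you would rather nudge once, after a fixed number of tool results, one option is to count `role: "tool"` messages. The sketch below takes that approach. `count_tool_results`, `maybe_nudge`, and the threshold of 2 are illustrative choices, not part of the original app.

```python
from typing import Any, Dict, List

NUDGE_TEXT = "You now have sufficient information. Please provide your final answer with sources."


def count_tool_results(messages: List[Dict[str, Any]]) -> int:
    """Count tool outputs already appended to the conversation."""
    return sum(1 for m in messages if m.get("role") == "tool")


def maybe_nudge(messages: List[Dict[str, Any]], min_tool_results: int = 2) -> bool:
    """Append the reminder once enough tool results exist and it is not already present."""
    already_nudged = any(m.get("role") == "user" and m.get("content") == NUDGE_TEXT for m in messages)
    if count_tool_results(messages) >= min_tool_results and not already_nudged:
        messages.append({"role": "user", "content": NUDGE_TEXT})
        return True
    return False
```

In `run_agent`, a call to `maybe_nudge(messages)` would replace the `if step >= 2:` block, right after the tool results are appended.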
7) UI: Gradio chat app
The Gradio interface turns your backend logic into an interactive web app with almost no extra code. It gives users:
- A simple chat window for questions and answers
- A model selector for switching between GPT-OSS variants
- Example prompts to guide usage

This makes your agent easy to demo, share, and iterate on without building a custom frontend.
Every interaction is wrapped with Langfuse tracing, so you can inspect each request's inputs, outputs, and errors. This makes it easier to debug, refine prompts, and monitor performance.
- Wrap `chat_with_agent` in `@observe()`
- Log inputs, model choice, and history length
- Log outputs and metadata (lengths, success), and errors with `success=False`
Code:
```python
import gradio as gr
from langfuse import get_client, observe


@observe()
def chat_with_agent(message, history, model):
    """Handle chat messages and return agent responses."""
    if not message.strip():
        return history
    lf = get_client()
    lf.update_current_trace(
        input={"user_message": message, "model": model, "history_length": len(history)}
    )
    try:
        response = run_agent(message, None, model)
        lf.update_current_trace(
            output={"agent_response": response},
            metadata={
                "model": model,
                "message_length": len(message),
                "response_length": len(response),
                "success": True,
            },
        )
        history.append({"role": "user", "content": message})
        history.append({"role": "assistant", "content": response})
        return history
    except Exception as e:
        lf.update_current_trace(
            output={"error": str(e)},
            metadata={"success": False, "error": str(e)},
        )
        error_msg = f"Sorry, I encountered an error: {str(e)}"
        history.append({"role": "user", "content": message})
        history.append({"role": "assistant", "content": error_msg})
        return history


def clear_chat():
    """Clear the chat history."""
    return [], ""


# Create the Gradio interface
with gr.Blocks(
    title="Chat with the News",
    theme=gr.themes.Monochrome()
) as demo:
    # Header using Gradio markdown
    gr.Markdown("""
# 📰 Chat with the News
Your AI-powered news research assistant with real-time search capabilities, based on [GPT-OSS models](https://huggingface.co/collections/openai/gpt-oss-68911959590a1634ba11c7a4) and running on inference providers.
""")
    # Examples section using Gradio markdown
    gr.Markdown("""
### 💡 Try these examples:
- **General:** "What are the top news stories today?"
- **Specific topic:** "What's the latest on artificial intelligence?"
- **Site-specific:** "What's the latest climate change news on the BBC?"
""")
    # Model selector
    model_selector = gr.Dropdown(
        choices=AVAILABLE_MODELS,
        value=DEFAULT_MODEL,
        label="🤖 Select Model",
        info="Choose between GPT-OSS 120B and 20B models"
    )
    # Message input
    msg = gr.Textbox(
        label="Ask me about the news",
        placeholder="What would you like to know about today?",
        lines=2
    )
    # Buttons in a row
    with gr.Row():
        submit_btn = gr.Button("Send", variant="primary", size="lg")
        clear_btn = gr.Button("🗑️ Clear Chat", variant="secondary", size="lg")
    # Chat interface
    chatbot = gr.Chatbot(
        label="News Agent",
        height=500,
        show_label=False,
        container=True,
        type="messages"
    )
    # Event handlers
    submit_btn.click(
        chat_with_agent,
        inputs=[msg, chatbot, model_selector],
        outputs=[chatbot],
        show_progress=True
    )
    msg.submit(
        chat_with_agent,
        inputs=[msg, chatbot, model_selector],
        outputs=[chatbot],
        show_progress=True
    )
    clear_btn.click(
        clear_chat,
        outputs=[chatbot, msg]
    )
    # Instructions using Gradio markdown
    gr.Markdown("""
---
### ℹ️ How it works
This AI agent can search Google News, fetch articles from specific sources, and provide comprehensive news summaries with proper citations. It uses real-time data and can restrict searches to specific news domains when requested.
**Model Selection:**
- **GPT-OSS 120B**: Larger, more capable model for complex reasoning tasks
- **GPT-OSS 20B**: Faster, more efficient model for quick responses
""")

# Launch the app
if __name__ == "__main__":
    demo.launch(
        server_name="0.0.0.0",
        server_port=7860,
        share=False,
        show_error=True
    )
```
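With `type="messages"`, `gr.Chatbot` expects history as a list of `{"role": ..., "content": ...}` dicts. That is why `chat_with_agent` appends one user entry and one assistant entry per turn. You can check this bookkeeping without Gradio or Langfuse by isolating it, as the sketch below does. `append_turn` is a hypothetical helper, and the `agent` callable stands in for `run_agent`.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def append_turn(history: List[Message], message: str, agent: Callable[[str], str]) -> List[Message]:
    """Mirror chat_with_agent's history updates: skip blank input, record errors as replies."""
    if not message.strip():
        return history
    try:
        reply = agent(message)
    except Exception as e:  # surface failures in the chat instead of crashing the UI
        reply = f"Sorry, I encountered an error: {e}"
    history.append({"role": "user", "content": message})
    history.append({"role": "assistant", "content": reply})
    return history
```

Keeping tracing in the decorated handler and this logic in a plain function lets you unit-test the chat flow with a stub agent.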
Recap
You now have:
- A lightweight, controlled news agent
- GPT-OSS via Hugging Face inference
- Tool selection rules
- Langfuse tracing for observability
- A ready-to-use Gradio UI