samspeaks5 committed
Commit d445f2a · verified · Parent: 6bec675

initial commit

.env.example ADDED
@@ -0,0 +1,15 @@
1
+ # API Keys
2
+ # Get your Brave Search API key from https://brave.com/search/api/
3
+ # Free tier: 1 request per minute, 2000 per month
4
+ BRAVE_API_KEY=your_brave_api_key_here
5
+
6
+ # Get your OpenAI API key from https://platform.openai.com/api-keys
7
+ OPENAI_API_KEY=your_openai_api_key_here
8
+
9
+ # Get your Tavily API key from https://tavily.com
10
+ # Free tier: 1000 requests per month
11
+ TAVILY_API_KEY=your_tavily_api_key_here
12
+
13
+ # Optional Configuration
14
+ # Set to True or False to enable/disable detailed logging
15
+ VERBOSE=False
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ assets/assistant_avatar.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,12 +1,146 @@
1
- ---
2
- title: Web Research Agent
3
- emoji: 🌍
4
- colorFrom: pink
5
- colorTo: purple
6
- sdk: gradio
7
- sdk_version: 5.26.0
8
- app_file: app.py
9
- pinned: false
10
- ---
11
-
12
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
+ # Web Research Agent
2
+
3
+ A powerful AI research assistant built with CrewAI that conducts comprehensive web research on any topic, providing factual, cited responses through a multi-agent approach.
4
+
5
+ ## Overview
6
+
7
+ This application uses specialized AI agents working together to:
8
+ 1. Refine search queries for optimal results
9
+ 2. Search the web across multiple search engines
10
+ 3. Analyze and verify content
11
+ 4. Produce well-structured, factual responses with proper citations
12
+
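For a sense of how these steps are driven in code, here is a minimal sketch of calling the research engine directly (condensed from `ResearchEngine.research()` in `research_engine.py`; the example query is illustrative):

```python
from research_engine import ResearchEngine

# verbose=False keeps agent logging quiet; pass verbose=True while debugging.
engine = ResearchEngine(verbose=False)

result = engine.research("What are the latest advancements in artificial intelligence?")
if result["success"]:
    print(result["refined_query"])    # the query after the researcher agent refines it
    print(result["result"])           # the cited, synthesized answer
    print(result["processing_time"])  # seconds spent on the full pipeline
else:
    print(result["error"])
```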
13
+ ## Setup Instructions
14
+
15
+ ### Prerequisites
16
+
17
+ - Python 3.9+ (recommended: Python 3.11)
18
+ - API keys for:
19
+ - OpenAI (required)
20
+ - Brave Search (recommended)
21
+ - Tavily Search (optional)
22
+
23
+ ### Installation
24
+
25
+ 1. Clone the repository and navigate to the project directory:
26
+ ```bash
27
+ git clone https://github.com/yourusername/web-research-agent.git
28
+ cd web-research-agent
29
+ ```
30
+
31
+ 2. Install required dependencies:
32
+ ```bash
33
+ pip install -r requirements.txt
34
+ ```
35
+
36
+ 3. Create a `.env` file in the root directory with your API keys:
37
+ ```
38
+ OPENAI_API_KEY=your_openai_api_key
39
+ BRAVE_API_KEY=your_brave_api_key
40
+ TAVILY_API_KEY=your_tavily_api_key
41
+ VERBOSE=False # Set to True for detailed logging
42
+ ```
43
+
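For reference, the application loads these values at startup with `python-dotenv` (see `load_dotenv()` and `validate_api_keys()` in `app.py`). A minimal sketch of that pattern; the helper below is illustrative rather than the exact code:

```python
import os

from dotenv import load_dotenv

load_dotenv()  # copies key=value pairs from .env into the process environment

def missing_required_keys():
    """Return the names of required API keys that are not set."""
    required = ["OPENAI_API_KEY", "BRAVE_API_KEY"]
    return [name for name in required if not os.getenv(name)]

missing = missing_required_keys()
if missing:
    raise SystemExit(f"Missing required API keys: {', '.join(missing)}")
```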
44
+ ### Running the Application
45
+
46
+ Start the web interface:
47
+ ```bash
48
+ python app.py
49
+ ```
50
+
51
+ The application will be available at http://localhost:7860
52
+
53
+ ## Common Issues & Troubleshooting
54
+
55
+ ### Pydantic/CrewAI Compatibility Issues
56
+
57
+ If you encounter errors like:
58
+ ```
59
+ AttributeError: 'property' object has no attribute 'model_fields'
60
+ ```
61
+
62
+ Try the following fixes:
63
+
64
+ 1. Update to the latest CrewAI version:
65
+ ```bash
66
+ pip install -U crewai crewai-tools
67
+ ```
68
+
69
+ 2. If issues persist, you may need to adjust `tools/rate_limited_tool.py` to match your installed Pydantic version.
70
+
71
+ ### Search API Rate Limits
72
+
73
+ - Brave Search API has a free tier limit of 1 request per minute and 2,000 requests per month
74
+ - The application implements rate limiting to prevent API throttling (a simplified sketch follows this list)
75
+ - Research queries may take several minutes to complete due to these limitations
76
+
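A simplified sketch of the delay-based rate limiting described above; the class name and interval here are illustrative, while the actual implementation lives in `tools/rate_limited_tool.py` (`RateLimitedToolWrapper`):

```python
import time

class SimpleRateLimiter:
    """Enforce a minimum delay between consecutive calls to a wrapped function."""

    def __init__(self, min_interval_seconds: float = 60.0):
        # 60 seconds matches Brave's free-tier limit of 1 request per minute.
        self.min_interval = min_interval_seconds
        self._last_call = 0.0

    def call(self, func, *args, **kwargs):
        wait = self.min_interval - (time.time() - self._last_call)
        if wait > 0:
            time.sleep(wait)  # block until the minimum interval has elapsed
        self._last_call = time.time()
        return func(*args, **kwargs)
```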
77
+ ### Gradio Interface Issues
78
+
79
+ If the interface fails to load or throws errors:
80
+
81
+ 1. Try installing a specific Gradio version:
82
+ ```bash
83
+ pip install gradio==4.26.0
84
+ ```
85
+
86
+ 2. Clear your browser cache to remove cached JavaScript files
87
+
88
+ 3. Run the headless test script as an alternative:
89
+ ```bash
90
+ python test.py "Your research question"
91
+ ```
92
+
93
+ ## Advanced Usage
94
+
95
+ ### Command Line Operation
96
+
97
+ Test the research engine without the web interface:
98
+ ```bash
99
+ python test.py "Your research query here"
100
+ ```
101
+
102
+ ### Environment Variables
103
+
104
+ - `OPENAI_API_KEY`: Required for language model access
105
+ - `BRAVE_API_KEY`: Recommended for web search functionality
106
+ - `TAVILY_API_KEY`: Optional alternative search engine
107
+ - `VERBOSE`: Set to True/False to control logging detail
108
+
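One way `VERBOSE` might be turned into a boolean at runtime; the exact parsing used in this repository is not shown in this commit, so treat this as an assumption:

```python
import os

# Accept common truthy spellings; anything else (including unset) means False.
VERBOSE = os.getenv("VERBOSE", "False").strip().lower() in ("true", "1", "yes")
```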
109
+ ## Deployment
110
+
111
+ This project can be deployed to Hugging Face Spaces for web access.
112
+
113
+ ### Hugging Face Spaces Deployment
114
+
115
+ 1. **Create a new Space on Hugging Face**
116
+ - Go to [Hugging Face Spaces](https://huggingface.co/spaces)
117
+ - Click "Create new Space"
118
+ - Choose a name and select "Gradio" as the SDK
119
+ - Set visibility as needed
120
+
121
+ 2. **Configure Environment Variables**
122
+ - In Space settings, add required API keys as secrets
123
+
124
+ 3. **Deploy Code**
125
+ ```bash
126
+ git clone https://huggingface.co/spaces/your-username/your-space-name
127
+ cd your-space-name
128
+ cp -r /path/to/web-research-agent/* .
129
+ git add .
130
+ git commit -m "Initial deployment"
131
+ git push
132
+ ```
133
+
134
+ ### Security Notes
135
+
136
+ - Never commit your `.env` file or expose API keys
137
+ - Use repository secrets in Hugging Face Spaces
138
+ - Keep sensitive deployments private
139
+
140
+ ## Development Structure
141
+
142
+ - `app.py`: Web interface and session management
143
+ - `research_engine.py`: Core research orchestration logic
144
+ - `agents.py`: Agent definitions and configurations
145
+ - `tools/`: Search and analysis tools
146
+ - `test.py`: Command-line testing utility
agents.py ADDED
@@ -0,0 +1,114 @@
1
+ from typing import List, Dict, Any, Optional
2
+ from crewai import Agent
3
+ from crewai_tools import BraveSearchTool, ScrapeWebsiteTool
4
+ from tools import ContentAnalyzerTool, RateLimitedToolWrapper, TavilySearchTool, SearchRotationTool
5
+
6
+ def create_researcher_agent(llm=None, verbose=True) -> Agent:
7
+ """
8
+ Creates a researcher agent responsible for query refinement and web search.
9
+
10
+ Args:
11
+ llm: Language model to use for the agent
12
+ verbose: Whether to log agent activity
13
+
14
+ Returns:
15
+ Configured researcher agent
16
+ """
17
+ # Initialize search tools
18
+ brave_search_tool = BraveSearchTool(
19
+ n_results=5,
20
+ save_file=False
21
+ )
22
+
23
+ # Initialize Tavily search tool
24
+ # Requires a TAVILY_API_KEY in environment variables
25
+ tavily_search_tool = TavilySearchTool(
26
+ max_results=5,
27
+ search_depth="basic",
28
+ timeout=15 # Increase timeout for more reliable results
29
+ )
30
+
31
+ # Add minimal rate limiting to avoid API throttling
32
+ # Set delay to 0 to disable rate limiting completely
33
+ rate_limited_brave_search = RateLimitedToolWrapper(tool=brave_search_tool, delay=0)
34
+ rate_limited_tavily_search = RateLimitedToolWrapper(tool=tavily_search_tool, delay=0)
35
+
36
+ # Create the search rotation tool
37
+ search_rotation_tool = SearchRotationTool(
38
+ search_tools=[rate_limited_brave_search, rate_limited_tavily_search],
39
+ max_searches_per_query=5 # Limit to 5 searches per query as requested
40
+ )
41
+
42
+ return Agent(
43
+ role="Research Specialist",
44
+ goal="Discover accurate and relevant information from the web",
45
+ backstory=(
46
+ "You are an expert web researcher with a talent for crafting effective search queries "
47
+ "and finding high-quality information on any topic. Your goal is to find the most "
48
+ "relevant and factual information to answer user questions. You have access to multiple "
49
+ "search engines and know how to efficiently use them within the search limits."
50
+ ),
51
+ # Use the search rotation tool
52
+ tools=[search_rotation_tool],
53
+ verbose=verbose,
54
+ allow_delegation=True,
55
+ memory=True,
56
+ llm=llm
57
+ )
58
+
59
+ def create_analyst_agent(llm=None, verbose=True) -> Agent:
60
+ """
61
+ Creates an analyst agent responsible for content analysis and evaluation.
62
+
63
+ Args:
64
+ llm: Language model to use for the agent
65
+ verbose: Whether to log agent activity
66
+
67
+ Returns:
68
+ Configured analyst agent
69
+ """
70
+ # Initialize tools
71
+ scrape_tool = ScrapeWebsiteTool()
72
+ content_analyzer = ContentAnalyzerTool()
73
+
74
+ return Agent(
75
+ role="Content Analyst",
76
+ goal="Analyze web content for relevance, factuality, and quality",
77
+ backstory=(
78
+ "You are a discerning content analyst with a keen eye for detail and a strong "
79
+ "commitment to factual accuracy. You excel at evaluating information and filtering "
80
+ "out irrelevant or potentially misleading content. Your expertise helps ensure that "
81
+ "only the most reliable information is presented."
82
+ ),
83
+ tools=[scrape_tool, content_analyzer],
84
+ verbose=verbose,
85
+ allow_delegation=True,
86
+ memory=True,
87
+ llm=llm
88
+ )
89
+
90
+ def create_writer_agent(llm=None, verbose=True) -> Agent:
91
+ """
92
+ Creates a writer agent responsible for synthesizing information into coherent responses.
93
+
94
+ Args:
95
+ llm: Language model to use for the agent
96
+ verbose: Whether to log agent activity
97
+
98
+ Returns:
99
+ Configured writer agent
100
+ """
101
+ return Agent(
102
+ role="Research Writer",
103
+ goal="Create informative, factual, and well-cited responses to research queries",
104
+ backstory=(
105
+ "You are a skilled writer specializing in creating clear, concise, and informative "
106
+ "responses based on research findings. You have a talent for synthesizing information "
107
+ "from multiple sources and presenting it in a coherent and readable format, always with "
108
+ "proper citations. You prioritize factual accuracy and clarity in your writing."
109
+ ),
110
+ verbose=verbose,
111
+ allow_delegation=True,
112
+ memory=True,
113
+ llm=llm
114
+ )
app.py ADDED
@@ -0,0 +1,351 @@
1
+ import os
2
+ import gradio as gr
3
+ import logging
4
+ import uuid
5
+ import pathlib
6
+ from dotenv import load_dotenv
7
+ from research_engine import ResearchEngine
8
+ import time
9
+ import traceback
10
+
11
+ # Load environment variables
12
+ load_dotenv()
13
+
14
+ # Configure logging
15
+ logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
16
+ logger = logging.getLogger(__name__)
17
+
18
+ # Initialize the research engine with verbose=False for production
19
+ research_engine = None
20
+
21
+ # Dict to store session-specific research engines
22
+ session_engines = {}
23
+
24
+ def validate_api_keys(custom_openai_key=None):
25
+ """Checks if required API keys are set"""
26
+ missing_keys = []
27
+
28
+ if not os.getenv("BRAVE_API_KEY"):
29
+ missing_keys.append("BRAVE_API_KEY")
30
+
31
+ # Check for OpenAI key in either the environment or the custom key provided
32
+ if not custom_openai_key and not os.getenv("OPENAI_API_KEY"):
33
+ missing_keys.append("OPENAI_API_KEY")
34
+
35
+ return missing_keys
36
+
37
+ def get_engine_for_session(session_id, openai_api_key=None):
38
+ """Get or create a research engine for the specific session with optional custom API key"""
39
+ if session_id not in session_engines:
40
+ logger.info(f"Creating new research engine for session {session_id}")
41
+ # Set temporary API key if provided by user
42
+ original_key = None
43
+ if openai_api_key:
44
+ logger.info("Using custom OpenAI API key provided by user")
45
+ original_key = os.environ.get("OPENAI_API_KEY")
46
+ os.environ["OPENAI_API_KEY"] = openai_api_key
47
+
48
+ try:
49
+ session_engines[session_id] = ResearchEngine(verbose=False)
50
+ finally:
51
+ # Restore original key if we changed it
52
+ if original_key is not None:
53
+ os.environ["OPENAI_API_KEY"] = original_key
54
+ elif openai_api_key:
55
+ # If there was no original key, remove the temporary one
56
+ os.environ.pop("OPENAI_API_KEY", None)
57
+
58
+ return session_engines[session_id]
59
+
60
+ def cleanup_session(session_id):
61
+ """Remove a session when it's no longer needed"""
62
+ if session_id in session_engines:
63
+ logger.info(f"Cleaning up session {session_id}")
64
+ del session_engines[session_id]
65
+
66
+ def process_message(message, history, session_id, openai_api_key=None):
67
+ """
68
+ Process user message and update chat history.
69
+
70
+ Args:
71
+ message: User's message
72
+ history: Chat history list
73
+ session_id: Unique identifier for the session
74
+ openai_api_key: Optional custom OpenAI API key
75
+
76
+ Returns:
77
+ Updated history
78
+ """
79
+ # Validate API keys
80
+ missing_keys = validate_api_keys(openai_api_key)
81
+ if missing_keys:
82
+ return history + [
83
+ {"role": "user", "content": message},
84
+ {"role": "assistant", "content": f"Error: Missing required API keys: {', '.join(missing_keys)}. Please set these in your .env file or input your OpenAI API key below."}
85
+ ]
86
+
87
+ # Add user message to history
88
+ history.append({"role": "user", "content": message})
89
+
90
+ try:
91
+ print(f"Starting research for: {message}")
92
+ start_time = time.time()
93
+
94
+ # Get the appropriate engine for this session, passing the API key if provided
95
+ engine = get_engine_for_session(session_id, openai_api_key)
96
+
97
+ # Set the API key for this specific request if provided
98
+ original_key = None
99
+ if openai_api_key:
100
+ original_key = os.environ.get("OPENAI_API_KEY")
101
+ os.environ["OPENAI_API_KEY"] = openai_api_key
102
+
103
+ try:
104
+ # Start the research process
105
+ research_task = engine.research(message)
106
+ finally:
107
+ # Restore original key if we changed it
108
+ if original_key is not None:
109
+ os.environ["OPENAI_API_KEY"] = original_key
110
+ elif openai_api_key:
111
+ # If there was no original key, remove the temporary one
112
+ os.environ.pop("OPENAI_API_KEY", None)
113
+
114
+ # Print the research task output for debugging
115
+ print(f"Research task result type: {type(research_task)}")
116
+ print(f"Research task content: {research_task}")
117
+
118
+ # If we get here, step 1 is complete
119
+ history[-1] = {"role": "user", "content": message}
120
+ history.append({"role": "assistant", "content": f"Researching... this may take a minute or two...\n\n**Step 1/4:** Refining your query..."})
121
+ yield history
122
+
123
+ # We don't actually have real-time progress indication from the engine,
124
+ # so we'll simulate it with a slight delay between steps
125
+ time.sleep(1)
126
+
127
+ history[-1] = {"role": "assistant", "content": f"Researching... this may take a minute or two...\n\n**Step 1/4:** Refining your query... ✓\n**Step 2/4:** Searching the web..."}
128
+ yield history
129
+
130
+ time.sleep(1)
131
+
132
+ history[-1] = {"role": "assistant", "content": f"Researching... this may take a minute or two...\n\n**Step 1/4:** Refining your query... ✓\n**Step 2/4:** Searching the web... ✓\n**Step 3/4:** Analyzing results..."}
133
+ yield history
134
+
135
+ time.sleep(1)
136
+
137
+ history[-1] = {"role": "assistant", "content": f"Researching... this may take a minute or two...\n\n**Step 1/4:** Refining your query... ✓\n**Step 2/4:** Searching the web... ✓\n**Step 3/4:** Analyzing results... ✓\n**Step 4/4:** Synthesizing information..."}
138
+ yield history
139
+
140
+ # Get response from research engine
141
+ response = research_task["result"]
142
+
143
+ end_time = time.time()
144
+ processing_time = end_time - start_time
145
+
146
+ # Add processing time for transparency
147
+ response += f"\n\nResearch completed in {processing_time:.2f} seconds."
148
+
149
+ # Update last message with the full response
150
+ history[-1] = {"role": "assistant", "content": response}
151
+ yield history
152
+ except Exception as e:
153
+ logger.exception("Error processing message")
154
+ error_traceback = traceback.format_exc()
155
+ error_message = f"An error occurred: {str(e)}\n\nTraceback: {error_traceback}"
156
+ history[-1] = {"role": "assistant", "content": error_message}
157
+ yield history
158
+
159
+ # Define a basic theme with minimal customization - more styling in CSS
160
+ custom_theme = gr.themes.Soft(
161
+ primary_hue=gr.themes.colors.indigo,
162
+ secondary_hue=gr.themes.colors.blue,
163
+ neutral_hue=gr.themes.colors.slate,
164
+ )
165
+
166
+ # Gradio versions have different ways of loading CSS, let's ensure compatibility
167
+ css_file_path = pathlib.Path("assets/custom.css")
168
+ if css_file_path.exists():
169
+ with open(css_file_path, 'r') as f:
170
+ css_content = f.read()
171
+ else:
172
+ css_content = "" # Fallback empty CSS if file doesn't exist
173
+
174
+ # Add the CSS as a style tag to ensure it works in all Gradio versions
175
+ css_head = f"""
176
+ <style>
177
+ {css_content}
178
+
179
+ /* Additional styling for API key input */
180
+ .api-settings .api-key-input input {{
181
+ border: 1px solid #ccc;
182
+ border-radius: 8px;
183
+ font-family: monospace;
184
+ letter-spacing: 1px;
185
+ }}
186
+
187
+ .api-settings .api-key-info {{
188
+ font-size: 0.8rem;
189
+ color: #666;
190
+ margin-top: 5px;
191
+ }}
192
+
193
+ .api-settings {{
194
+ margin-bottom: 20px;
195
+ border: 1px solid #eee;
196
+ border-radius: 8px;
197
+ padding: 10px;
198
+ background-color: #f9f9f9;
199
+ }}
200
+ </style>
201
+ """
202
+
203
+ # Create the Gradio interface with multiple CSS loading methods for compatibility
204
+ with gr.Blocks(
205
+ title="Web Research Agent",
206
+ theme=custom_theme,
207
+ css=css_content,
208
+ head=css_head, # Older versions may use this
209
+ ) as app:
210
+ # Create a unique session ID for each user
211
+ session_id = gr.State(lambda: str(uuid.uuid4()))
212
+
213
+ with gr.Row(elem_classes=["container"]):
214
+ with gr.Column():
215
+ with gr.Row(elem_classes=["app-header"]):
216
+ gr.Markdown("""
217
+ <div style="display: flex; align-items: center; justify-content: center;">
218
+ <div style="width: 40px; height: 40px; margin-right: 15px; background: linear-gradient(135deg, #3a7bd5, #00d2ff); border-radius: 10px; display: flex; justify-content: center; align-items: center;">
219
+ <span style="color: white; font-size: 24px; font-weight: bold;">R</span>
220
+ </div>
221
+ <h1 style="margin: 0;">Web Research Agent</h1>
222
+ </div>
223
+ """)
224
+
225
+ gr.Markdown("""
226
+ This intelligent agent utilizes a multi-step process to deliver comprehensive research on any topic.
227
+ Simply enter your question or topic below to receive accurate, well-structured answers with proper citations.
228
+ """, elem_classes=["md-container"])
229
+
230
+ # Missing keys warning
231
+ missing_keys = validate_api_keys()
232
+ if missing_keys:
233
+ gr.Markdown(f"⚠️ **Warning:** Missing required API keys: {', '.join(missing_keys)}. Add these to your .env file.", elem_classes=["warning"])
234
+
235
+ chatbot = gr.Chatbot(
236
+ height=600,
237
+ show_copy_button=True,
238
+ avatar_images=(None, "./assets/assistant_avatar.png"),
239
+ type="messages", # Use the modern messages format instead of tuples
240
+ elem_classes=["chatbot-container"]
241
+ )
242
+
243
+ # API Key input
244
+ with gr.Accordion("API Settings", open=False, elem_classes=["api-settings"]):
245
+ openai_api_key = gr.Textbox(
246
+ label="OpenAI API Key (optional)",
247
+ placeholder="sk-...",
248
+ type="password",
249
+ info="Provide your own OpenAI API key if you don't want to use the system default key.",
250
+ elem_classes=["api-key-input"]
251
+ )
252
+ gr.Markdown("""
253
+ Your API key is only used for your requests and is never stored on our servers.
254
+ It's a safer alternative to adding it to the .env file.
255
+ [Get an API key from OpenAI](https://platform.openai.com/account/api-keys)
256
+ """, elem_classes=["api-key-info"])
257
+
258
+ with gr.Row(elem_classes=["input-container"]):
259
+ msg = gr.Textbox(
260
+ placeholder="Ask me anything...",
261
+ scale=9,
262
+ container=False,
263
+ show_label=False,
264
+ elem_classes=["input-box"]
265
+ )
266
+ submit = gr.Button("Search", scale=1, variant="primary", elem_classes=["search-button"])
267
+
268
+ # Clear button
269
+ clear = gr.Button("Clear Conversation", elem_classes=["clear-button"])
270
+
271
+ # Examples
272
+ with gr.Accordion("Example Questions", open=False, elem_classes=["examples-container"]):
273
+ examples = gr.Examples(
274
+ examples=[
275
+ "What are the latest advancements in artificial intelligence?",
276
+ "Explain the impact of climate change on marine ecosystems",
277
+ "How do mRNA vaccines work?",
278
+ "What are the health benefits of intermittent fasting?",
279
+ "Explain the current state of quantum computing research",
280
+ "What are the main theories about dark matter?",
281
+ "How is blockchain technology being used outside of cryptocurrency?",
282
+ ],
283
+ inputs=msg
284
+ )
285
+
286
+ # Set up event handlers
287
+ submit_click_event = submit.click(
288
+ process_message,
289
+ inputs=[msg, chatbot, session_id, openai_api_key],
290
+ outputs=[chatbot],
291
+ show_progress=True
292
+ )
293
+
294
+ msg_submit_event = msg.submit(
295
+ process_message,
296
+ inputs=[msg, chatbot, session_id, openai_api_key],
297
+ outputs=[chatbot],
298
+ show_progress=True
299
+ )
300
+
301
+ # Clear message input after sending
302
+ submit_click_event.then(lambda: "", None, msg)
303
+ msg_submit_event.then(lambda: "", None, msg)
304
+
305
+ # Clear conversation and reset session
306
+ def clear_conversation_and_session(session_id_value):
307
+ # Clear the session data
308
+ cleanup_session(session_id_value)
309
+ # Generate a new session ID
310
+ new_session_id = str(uuid.uuid4())
311
+ # Return empty history and new session ID
312
+ return [], new_session_id
313
+
314
+ clear.click(
315
+ clear_conversation_and_session,
316
+ inputs=[session_id],
317
+ outputs=[chatbot, session_id]
318
+ )
319
+
320
+ # Citation and tools information
321
+ with gr.Accordion("About This Research Agent", open=False, elem_classes=["footer"]):
322
+ gr.Markdown("""
323
+ ### Research Agent Features
324
+
325
+ This research agent uses a combination of specialized AI agents to provide comprehensive answers:
326
+
327
+ - **Researcher Agent**: Refines queries and searches the web
328
+ - **Analyst Agent**: Evaluates content relevance and factual accuracy
329
+ - **Writer Agent**: Synthesizes information into coherent responses
330
+
331
+ #### Tools Used
332
+ - BraveSearch and Tavily for web searching
333
+ - Content scraping for in-depth information
334
+ - Analysis for relevance and factual verification
335
+
336
+ #### API Keys
337
+ - You can use your own OpenAI API key by entering it in the "API Settings" section
338
+ - Your API key is used only for your requests and is never stored on our servers
339
+ - This lets you control costs and use your preferred API tier
340
+
341
+ All information is provided with proper citations and sources.
342
+
343
+ *Processing may take a minute or two as the agent searches, analyzes, and synthesizes information.*
344
+ """, elem_classes=["md-container"])
345
+
346
+ if __name__ == "__main__":
347
+ # Create assets directory if it doesn't exist
348
+ os.makedirs("assets", exist_ok=True)
349
+
350
+ # Launch the Gradio app
351
+ app.launch()
architecture.md ADDED
@@ -0,0 +1,97 @@
1
+ # Web Research Agent Architecture
2
+
3
+ ```
4
+ ┌──────────────────────────────────────────────────────────────────────────────┐
5
+ │ Gradio Interface │
6
+ └───────────────────────────────────┬──────────────────────────────────────────┘
7
+
8
+
9
+ ┌──────────────────────────────────────────────────────────────────────────────┐
10
+ │ Research Engine │
11
+ │ │
12
+ │ ┌───────────────────────────────────────────────────────────────────────┐ │
13
+ │ │ Conversation History │ │
14
+ │ └───────────────────────────────────────────────────────────────────────┘ │
15
+ │ │
16
+ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
17
+ │ │ Researcher │◄────────────►│ Analyst │◄────────────►│ Writer │ │
18
+ │ │ Agent │ │ Agent │ │ Agent │ │
19
+ │ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
20
+ │ │ │ │ │
21
+ │ ▼ ▼ ▼ │
22
+ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
23
+ │ │ Search │ │ Scrape │ │ Information │ │
24
+ │ │ Rotation │ │ Website Tool│ │ Synthesis │ │
25
+ │ │ Tool │ └──────┬──────┘ └─────────────┘ │
26
+ │ └─────────────┘ │ │
27
+ │ ▼ │
28
+ │ ┌─────────────┐ │
29
+ │ │ Content │ │
30
+ │ │ Analyzer │ │
31
+ │ └─────────────┘ │
32
+ └──────────────────────────────────────────────────────────────────────────────┘
33
+ ```
34
+
35
+ ## Research Flow
36
+
37
+ 1. **User Input**
38
+ - User enters a query in the Gradio interface
39
+ - Query is validated for legitimacy and processed by the system
40
+
41
+ 2. **Query Refinement** (Researcher Agent)
42
+ - Original query is analyzed and refined for optimal search results
43
+ - Ambiguous terms are clarified and search intent is identified
44
+ - Refined query is prepared for web search with improved keywords
45
+
46
+ 3. **Web Search** (Researcher Agent + Search Rotation Tool)
47
+ - Search Rotation Tool executes search using multiple search engines
48
+ - Rate limiting is implemented to avoid API throttling
49
+ - Search is performed with a maximum of 5 searches per query
50
+ - Results are cached for similar queries to improve efficiency
51
+ - Search results are collected with URLs and snippets
52
+
53
+ 4. **Content Scraping** (Analyst Agent + ScrapeWebsiteTool)
54
+ - ScrapeWebsiteTool extracts content from search result URLs
55
+ - HTML content is parsed to extract meaningful text
56
+ - Raw content is prepared for analysis and evaluation
57
+
58
+ 5. **Content Analysis** (Analyst Agent + ContentAnalyzerTool)
59
+ - Content is analyzed for relevance to the query (scores 0-10)
60
+ - Factuality and quality are evaluated (scores 0-10)
61
+ - Irrelevant or low-quality content is filtered out
62
+ - Content is organized by relevance and information value
63
+
64
+ 6. **Response Creation** (Writer Agent)
65
+ - Analyzed content is synthesized into a comprehensive response
66
+ - Information is organized logically with a clear structure
67
+ - Contradictory information is reconciled when present
68
+ - Citations are added in [1], [2] format with proper attribution
69
+ - Source URLs are included for reference and verification
70
+
71
+ 7. **Result Presentation**
72
+ - Final response with citations is displayed to the user
73
+ - Conversation history is updated and maintained per session
74
+ - Results can be saved to file if requested
75
+
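Condensed from `research_engine.py`, the flow above maps onto CrewAI roughly as follows (the task factory functions come from `tasks.py`; the refined query string is a placeholder):

```python
from crewai import Crew

from agents import create_researcher_agent, create_analyst_agent, create_writer_agent
from tasks import (
    create_search_task,
    create_content_scraping_task,
    create_content_analysis_task,
    create_response_writing_task,
)

# One agent per role (steps 2-6 above).
researcher = create_researcher_agent()
analyst = create_analyst_agent()
writer = create_writer_agent()

# Each task consumes the output of the previous one.
refined_query = "refined query produced in step 2"
search = create_search_task(researcher, refined_query)
scrape = create_content_scraping_task(analyst, search)
analyze = create_content_analysis_task(analyst, refined_query, scrape)
write = create_response_writing_task(writer, refined_query, analyze)

crew = Crew(
    agents=[researcher, analyst, writer],
    tasks=[search, scrape, analyze, write],
    process="sequential",  # run the tasks in order, as ResearchEngine.research() does
)
result = crew.kickoff()
```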
76
+ ## System Architecture
77
+
78
+ - **Multi-Agent System**: Three specialized agents work together with distinct roles
79
+ - **Stateless Design**: Each research request is processed independently
80
+ - **Session Management**: User sessions maintain separate conversation contexts
81
+ - **API Integration**: Multiple search APIs with fallback mechanisms
82
+ - **Memory**: All agents maintain context throughout the research process
83
+ - **Tool Abstraction**: Search and analysis tools are modular and interchangeable
84
+ - **Error Handling**: Comprehensive error handling at each processing stage
85
+ - **Rate Limiting**: API calls are rate-limited to prevent throttling
86
+
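Session isolation is handled in `app.py` by keeping one `ResearchEngine` per session id; a condensed sketch of that pattern:

```python
import uuid

from research_engine import ResearchEngine

session_engines = {}  # session_id -> ResearchEngine

def get_engine_for_session(session_id: str) -> ResearchEngine:
    """Create the engine lazily so each user keeps an independent conversation context."""
    if session_id not in session_engines:
        session_engines[session_id] = ResearchEngine(verbose=False)
    return session_engines[session_id]

def cleanup_session(session_id: str) -> None:
    """Drop a session's engine when the conversation is cleared."""
    session_engines.pop(session_id, None)

# app.py generates one id per browser session via gr.State(lambda: str(uuid.uuid4())).
engine = get_engine_for_session(str(uuid.uuid4()))
```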
87
+ ## Technical Implementation
88
+
89
+ - **Frontend**: Gradio web interface with real-time feedback
90
+ - **Backend**: Python-based research engine with modular components
91
+ - **Tools**:
92
+ - Search Rotation Tool (supports multiple search engines)
93
+ - Rate Limited Tool Wrapper (prevents API throttling)
94
+ - Content Analyzer Tool (evaluates relevance and factuality)
95
+ - Scrape Website Tool (extracts content from URLs)
96
+ - **Deployment**: Compatible with Hugging Face Spaces for online access
97
+ - **Caching**: Results are cached to improve performance and reduce API calls
assets/.gitkeep ADDED
@@ -0,0 +1,2 @@
1
+ # This directory is for assets like the assistant avatar
2
+ # You can add an image named assistant_avatar.png here for the chatbot interface
assets/assistant_avatar.png ADDED

Git LFS Details

  • SHA256: 7772041226ef4b5c3197e833693e121f5113a7bf008b454eefae4a8e8401ec4e
  • Pointer size: 132 Bytes
  • Size of remote file: 2.27 MB
assets/custom.css ADDED
@@ -0,0 +1,494 @@
1
+ /* Custom CSS for Web Research Agent */
2
+
3
+ /* Global body styling */
4
+ body {
5
+ background: linear-gradient(to right, #0f2027, #203a43, #2c5364);
6
+ color: #f0f0f0;
7
+ font-family: 'Inter', system-ui, sans-serif;
8
+ }
9
+
10
+ /* Override Gradio container styles */
11
+ .gradio-container {
12
+ max-width: 1200px !important;
13
+ margin: 0 auto !important;
14
+ background-color: transparent !important;
15
+ }
16
+
17
+ /* Button styling overrides */
18
+ .primary-btn {
19
+ background: linear-gradient(90deg, #3a7bd5, #00d2ff) !important;
20
+ color: white !important;
21
+ border: none !important;
22
+ transition: all 0.3s ease !important;
23
+ }
24
+
25
+ .primary-btn:hover {
26
+ transform: translateY(-2px) !important;
27
+ box-shadow: 0 5px 15px rgba(0, 0, 0, 0.1) !important;
28
+ }
29
+
30
+ /* Input field styling overrides */
31
+ textarea, input[type="text"] {
32
+ background: rgba(255, 255, 255, 0.05) !important;
33
+ border: 1px solid rgba(255, 255, 255, 0.1) !important;
34
+ color: white !important;
35
+ }
36
+
37
+ textarea:focus, input[type="text"]:focus {
38
+ border-color: #3a7bd5 !important;
39
+ box-shadow: 0 0 0 2px rgba(58, 123, 213, 0.3) !important;
40
+ }
41
+
42
+ /* Chat bubbles */
43
+ .message-bubble {
44
+ background-color: rgba(42, 46, 53, 0.8) !important;
45
+ border-radius: 12px !important;
46
+ }
47
+
48
+ .user-bubble {
49
+ background-color: rgba(48, 66, 105, 0.8) !important;
50
+ }
51
+
52
+ /* Main container styling */
53
+ .container {
54
+ max-width: 1200px;
55
+ margin: 0 auto;
56
+ padding: 20px;
57
+ }
58
+
59
+ /* Header styling */
60
+ h1.title {
61
+ font-size: 2.5rem;
62
+ font-weight: 700;
63
+ background: linear-gradient(90deg, #3a7bd5, #00d2ff);
64
+ -webkit-background-clip: text; /* Prefix for older WebKit */
65
+ background-clip: text; /* Standard property */
66
+ -webkit-text-fill-color: transparent;
67
+ text-align: center;
68
+ margin-bottom: 0.5rem;
69
+ }
70
+
71
+ /* Chatbot container */
72
+ .chatbot-container {
73
+ border-radius: 12px;
74
+ background: rgba(255, 255, 255, 0.05);
75
+ backdrop-filter: blur(10px);
76
+ box-shadow: 0 8px 32px rgba(0, 0, 0, 0.1);
77
+ margin-bottom: 20px;
78
+ }
79
+
80
+ /* Chat messages */
81
+ .message {
82
+ padding: 15px 20px;
83
+ border-radius: 10px;
84
+ margin-bottom: 10px;
85
+ max-width: 80%;
86
+ }
87
+
88
+ .user-message {
89
+ background-color: #304269;
90
+ color: white;
91
+ align-self: flex-end;
92
+ border-bottom-right-radius: 0;
93
+ }
94
+
95
+ .bot-message {
96
+ background-color: #2a2e35;
97
+ color: #eaeaea;
98
+ align-self: flex-start;
99
+ border-bottom-left-radius: 0;
100
+ }
101
+
102
+ /* Input box styling */
103
+ .input-container {
104
+ display: flex;
105
+ gap: 10px;
106
+ margin-top: 20px;
107
+ }
108
+
109
+ .input-box {
110
+ border-radius: 8px;
111
+ border: 1px solid rgba(255, 255, 255, 0.1);
112
+ background: rgba(255, 255, 255, 0.05);
113
+ padding: 12px 16px;
114
+ font-size: 16px;
115
+ color: white;
116
+ transition: all 0.3s ease;
117
+ }
118
+
119
+ .input-box:focus {
120
+ border-color: #3a7bd5;
121
+ box-shadow: 0 0 0 2px rgba(58, 123, 213, 0.3);
122
+ outline: none;
123
+ }
124
+
125
+ /* Button styling */
126
+ .search-button {
127
+ background: linear-gradient(90deg, #3a7bd5, #00d2ff);
128
+ color: white;
129
+ border: none;
130
+ border-radius: 8px;
131
+ padding: 12px 24px;
132
+ font-weight: 600;
133
+ cursor: pointer;
134
+ transition: all 0.3s ease;
135
+ }
136
+
137
+ .search-button:hover {
138
+ transform: translateY(-2px);
139
+ box-shadow: 0 5px 15px rgba(0, 0, 0, 0.1);
140
+ }
141
+
142
+ .clear-button {
143
+ background: transparent;
144
+ color: #adadad;
145
+ border: 1px solid rgba(255, 255, 255, 0.2);
146
+ border-radius: 8px;
147
+ padding: 8px 16px;
148
+ font-weight: 500;
149
+ cursor: pointer;
150
+ transition: all 0.3s ease;
151
+ }
152
+
153
+ .clear-button:hover {
154
+ background: rgba(255, 255, 255, 0.05);
155
+ color: white;
156
+ }
157
+
158
+ /* Examples section */
159
+ .examples-container {
160
+ margin-top: 20px;
161
+ padding: 15px;
162
+ border-radius: 8px;
163
+ background: rgba(255, 255, 255, 0.03);
164
+ }
165
+
166
+ .examples-container h3 {
167
+ margin-top: 0;
168
+ color: #b8b9bd;
169
+ font-size: 1rem;
170
+ }
171
+
172
+ .example-item {
173
+ padding: 8px 12px;
174
+ background: rgba(58, 123, 213, 0.1);
175
+ border-radius: 6px;
176
+ margin-bottom: 8px;
177
+ cursor: pointer;
178
+ transition: all 0.2s ease;
179
+ }
180
+
181
+ .example-item:hover {
182
+ background: rgba(58, 123, 213, 0.2);
183
+ }
184
+
185
+ /* Loading indicator */
186
+ .loading-indicator {
187
+ display: inline-block;
188
+ margin-left: 10px;
189
+ color: #3a7bd5;
190
+ }
191
+
192
+ /* Citation and source styling */
193
+ .citation {
194
+ font-size: 0.85rem;
195
+ color: #6c757d;
196
+ background-color: rgba(108, 117, 125, 0.1);
197
+ padding: 0 4px;
198
+ border-radius: 3px;
199
+ }
200
+
201
+ .source-list {
202
+ font-size: 0.9rem;
203
+ padding-left: 20px;
204
+ margin-top: 10px;
205
+ color: #b8b9bd;
206
+ }
207
+
208
+ /* Warning messages */
209
+ .warning {
210
+ background-color: rgba(255, 207, 0, 0.1);
211
+ border-left: 4px solid #ffcf00;
212
+ padding: 12px 16px;
213
+ border-radius: 4px;
214
+ margin-bottom: 20px;
215
+ color: #f0f0f0;
216
+ }
217
+
218
+ /* Footer styling */
219
+ .footer {
220
+ margin-top: 30px;
221
+ padding-top: 20px;
222
+ border-top: 1px solid rgba(255, 255, 255, 0.1);
223
+ text-align: center;
224
+ font-size: 0.9rem;
225
+ color: #b8b9bd;
226
+ }
227
+
228
+ /* Markdown content styling */
229
+ .md-container {
230
+ line-height: 1.6;
231
+ }
232
+
233
+ .md-container code {
234
+ background-color: rgba(255, 255, 255, 0.1);
235
+ padding: 2px 5px;
236
+ border-radius: 3px;
237
+ font-family: monospace;
238
+ }
239
+
240
+ .md-container pre {
241
+ background-color: rgba(0, 0, 0, 0.2);
242
+ padding: 15px;
243
+ border-radius: 5px;
244
+ overflow-x: auto;
245
+ }
246
+
247
+ /* Avatar styling */
248
+ .avatar {
249
+ width: 36px;
250
+ height: 36px;
251
+ border-radius: 50%;
252
+ object-fit: cover;
253
+ }
254
+
255
+ /* Dark mode specific adjustments */
256
+ @media (prefers-color-scheme: dark) {
257
+ body {
258
+ background-color: #1a1c23;
259
+ color: #f0f0f0;
260
+ }
261
+
262
+ .input-box {
263
+ background: rgba(255, 255, 255, 0.03);
264
+ }
265
+ }
266
+
267
+ /* Custom scrollbar */
268
+ ::-webkit-scrollbar {
269
+ width: 8px;
270
+ height: 8px;
271
+ }
272
+
273
+ ::-webkit-scrollbar-track {
274
+ background: rgba(255, 255, 255, 0.05);
275
+ }
276
+
277
+ ::-webkit-scrollbar-thumb {
278
+ background: rgba(255, 255, 255, 0.2);
279
+ border-radius: 4px;
280
+ }
281
+
282
+ ::-webkit-scrollbar-thumb:hover {
283
+ background: rgba(255, 255, 255, 0.3);
284
+ }
285
+
286
+ /* Progress indicator styling */
287
+ .progress-step {
288
+ margin: 10px 0;
289
+ padding: 8px 12px;
290
+ border-radius: 8px;
291
+ background-color: rgba(58, 123, 213, 0.1);
292
+ transition: all 0.3s ease;
293
+ }
294
+
295
+ .progress-step.completed {
296
+ background-color: rgba(0, 210, 255, 0.15);
297
+ }
298
+
299
+ .progress-check {
300
+ color: #00d2ff;
301
+ margin-left: 8px;
302
+ }
303
+
304
+ /* Loading animation */
305
+ @keyframes pulse {
306
+ 0% { opacity: 0.6; }
307
+ 50% { opacity: 1; }
308
+ 100% { opacity: 0.6; }
309
+ }
310
+
311
+ .loading-dot {
312
+ display: inline-block;
313
+ width: 8px;
314
+ height: 8px;
315
+ border-radius: 50%;
316
+ background-color: #3a7bd5;
317
+ margin: 0 2px;
318
+ animation: pulse 1.5s infinite;
319
+ }
320
+
321
+ .loading-dot:nth-child(2) {
322
+ animation-delay: 0.2s;
323
+ }
324
+
325
+ .loading-dot:nth-child(3) {
326
+ animation-delay: 0.4s;
327
+ }
328
+
329
+ /* Message content styling - improve readability */
330
+ .message-content {
331
+ line-height: 1.6;
332
+ font-size: 1rem;
333
+ }
334
+
335
+ /* Code blocks in messages */
336
+ .message-content pre {
337
+ background-color: rgba(0, 0, 0, 0.2);
338
+ border-radius: 8px;
339
+ padding: 12px;
340
+ overflow-x: auto;
341
+ font-family: 'Courier New', monospace;
342
+ font-size: 0.9rem;
343
+ }
344
+
345
+ .message-content code {
346
+ background-color: rgba(0, 0, 0, 0.2);
347
+ padding: 2px 4px;
348
+ border-radius: 4px;
349
+ font-family: 'Courier New', monospace;
350
+ font-size: 0.9em;
351
+ }
352
+
353
+ /* Improve citation styling */
354
+ .citation {
355
+ background-color: rgba(58, 123, 213, 0.2);
356
+ padding: 2px 5px;
357
+ border-radius: 4px;
358
+ font-weight: 500;
359
+ color: #b8cff5;
360
+ margin: 0 2px;
361
+ font-size: 0.9em;
362
+ }
363
+
364
+ /* Source list at the end of responses */
365
+ .source-list {
366
+ margin-top: 20px;
367
+ padding-top: 10px;
368
+ border-top: 1px solid rgba(255, 255, 255, 0.1);
369
+ }
370
+
371
+ .source-list ol {
372
+ margin-left: 20px;
373
+ padding-left: 10px;
374
+ }
375
+
376
+ .source-list li {
377
+ margin-bottom: 5px;
378
+ }
379
+
380
+ /* Make links more visible */
381
+ a {
382
+ color: #00d2ff;
383
+ text-decoration: none;
384
+ transition: all 0.2s ease;
385
+ }
386
+
387
+ a:hover {
388
+ text-decoration: underline;
389
+ color: #3a7bd5;
390
+ }
391
+
392
+ /* Add app logo/header styling */
393
+ .app-header {
394
+ display: flex;
395
+ align-items: center;
396
+ justify-content: center;
397
+ margin-bottom: 20px;
398
+ }
399
+
400
+ .app-logo {
401
+ width: 40px;
402
+ height: 40px;
403
+ margin-right: 10px;
404
+ }
405
+
406
+ /* Responsive adjustments */
407
+ @media (max-width: 768px) {
408
+ .container {
409
+ padding: 10px;
410
+ }
411
+
412
+ h1.title {
413
+ font-size: 1.8rem;
414
+ }
415
+
416
+ .chatbot-container {
417
+ height: 70vh;
418
+ }
419
+
420
+ .message {
421
+ max-width: 90%;
422
+ }
423
+ }
424
+
425
+ /* Chatbot container and message styling */
426
+ .gradio-container .prose {
427
+ max-width: 100% !important; /* Override max-width constraint */
428
+ }
429
+
430
+ /* Target the chatbot messages directly */
431
+ .chatbot .message {
432
+ padding: 15px !important;
433
+ border-radius: 12px !important;
434
+ margin-bottom: 12px !important;
435
+ box-shadow: 0 2px 8px rgba(0, 0, 0, 0.1) !important;
436
+ }
437
+
438
+ /* User messages */
439
+ .chatbot .user {
440
+ background-color: #304269 !important;
441
+ color: white !important;
442
+ border-bottom-right-radius: 2px !important;
443
+ }
444
+
445
+ /* Bot messages */
446
+ .chatbot .bot {
447
+ background-color: #2a2e35 !important;
448
+ color: #eaeaea !important;
449
+ border-bottom-left-radius: 2px !important;
450
+ }
451
+
452
+ /* Apply primary gradient to buttons */
453
+ button.primary {
454
+ background: linear-gradient(90deg, #3a7bd5, #00d2ff) !important;
455
+ color: white !important;
456
+ }
457
+
458
+ /* Style the chatbot container */
459
+ .chatbot-container > div {
460
+ border-radius: 12px !important;
461
+ background: rgba(31, 41, 55, 0.4) !important;
462
+ backdrop-filter: blur(10px) !important;
463
+ }
464
+
465
+ /* Fix scrollbar in chat */
466
+ .chatbot ::-webkit-scrollbar {
467
+ width: 8px !important;
468
+ }
469
+
470
+ .chatbot ::-webkit-scrollbar-track {
471
+ background: rgba(255, 255, 255, 0.05) !important;
472
+ }
473
+
474
+ .chatbot ::-webkit-scrollbar-thumb {
475
+ background: rgba(255, 255, 255, 0.2) !important;
476
+ border-radius: 4px !important;
477
+ }
478
+
479
+ /* Style the copy button */
480
+ .copy-button {
481
+ background-color: rgba(58, 123, 213, 0.2) !important;
482
+ color: #b8cff5 !important;
483
+ }
484
+
485
+ /* Fix mobile responsiveness */
486
+ @media (max-width: 640px) {
487
+ .gradio-container {
488
+ padding: 10px !important;
489
+ }
490
+
491
+ .container {
492
+ padding: 10px !important;
493
+ }
494
+ }
requirements.txt ADDED
@@ -0,0 +1,7 @@
1
+ crewai>=0.11.0
2
+ gradio>=3.50.0
3
+ python-dotenv>=1.0.0
4
+ duckduckgo-search>=3.9.0
5
+ beautifulsoup4>=4.12.0
6
+ requests>=2.31.0
7
+ pydantic>=2.0.0
research_engine.py ADDED
@@ -0,0 +1,382 @@
1
+ import os
2
+ import json
3
+ import logging
4
+ import time
5
+ from typing import List, Dict, Any, Optional, Tuple, Union
6
+
7
+ from crewai import Crew
8
+ from crewai.agent import Agent
9
+ from crewai.task import Task
10
+
11
+ from agents import create_researcher_agent, create_analyst_agent, create_writer_agent
12
+ from tasks import (
13
+ create_query_refinement_task,
14
+ create_search_task,
15
+ create_content_scraping_task,
16
+ create_content_analysis_task,
17
+ create_response_writing_task
18
+ )
19
+ from utils import is_valid_query, format_research_results, extract_citations
20
+
21
+ # Configure logging
22
+ logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
23
+ logger = logging.getLogger(__name__)
24
+
25
+ class ResearchEngine:
26
+ """
27
+ Main engine for web research using CrewAI.
28
+ Orchestrates agents and tasks to provide comprehensive research results.
29
+ """
30
+
31
+ def __init__(self, llm=None, verbose=False):
32
+ """
33
+ Initialize the research engine.
34
+
35
+ Args:
36
+ llm: The language model to use for agents
37
+ verbose: Whether to log detailed information
38
+ """
39
+ self.llm = llm
40
+ self.verbose = verbose
41
+
42
+ # Initialize agents
43
+ logger.info("Initializing agents...")
44
+ self.researcher = create_researcher_agent(llm=llm, verbose=verbose)
45
+ self.analyst = create_analyst_agent(llm=llm, verbose=verbose)
46
+ self.writer = create_writer_agent(llm=llm, verbose=verbose)
47
+
48
+ # Chat history for maintaining conversation context
49
+ self.chat_history = []
50
+
51
+ logger.info("Research engine initialized with agents")
52
+
53
+ def _validate_api_keys(self):
54
+ """Validates that required API keys are present"""
55
+ missing_keys = []
56
+
57
+ if not os.getenv("BRAVE_API_KEY"):
58
+ missing_keys.append("BRAVE_API_KEY")
59
+
60
+ if not os.getenv("TAVILY_API_KEY"):
61
+ missing_keys.append("TAVILY_API_KEY")
62
+
63
+ if not os.getenv("OPENAI_API_KEY") and not self.llm:
64
+ missing_keys.append("OPENAI_API_KEY or custom LLM")
65
+
66
+ if missing_keys:
67
+ logger.warning(f"Missing API keys: {', '.join(missing_keys)}")
68
+ if "TAVILY_API_KEY" in missing_keys:
69
+ logger.warning("Tavily API key is missing - search functionality may be limited")
70
+ if "BRAVE_API_KEY" in missing_keys:
71
+ logger.warning("Brave API key is missing - search functionality may be limited")
72
+
73
+ # Only raise error if all search API keys are missing
74
+ if "BRAVE_API_KEY" in missing_keys and "TAVILY_API_KEY" in missing_keys:
75
+ raise ValueError(f"Missing required API keys: {', '.join(missing_keys)}")
76
+ else:
77
+ logger.info("All required API keys are present")
78
+
79
+ def research(self, query: str, output_file=None) -> Dict[str, Any]:
80
+ """
81
+ Perform research on the given query.
82
+
83
+ Args:
84
+ query: The research query
85
+ output_file: Optional file to save the research results
86
+
87
+ Returns:
88
+ Research results
89
+ """
90
+ logger.info(f"Research initiated for query: {query}")
91
+ start_time = time.time() # Initialize the start_time for tracking processing time
92
+
93
+ try:
94
+ self._validate_api_keys()
95
+ logger.info(f"Starting research for query: {query}")
96
+
97
+ # Add the query to chat history
98
+ self.chat_history.append({"role": "user", "content": query})
99
+
100
+ # Step 1: Initialize the crew
101
+ logger.info("Initializing research crew...")
102
+ crew = Crew(
103
+ agents=[self.researcher],
104
+ tasks=[create_query_refinement_task(self.researcher, query)],
105
+ verbose=self.verbose, # Use the instance's verbose setting
106
+ process="sequential"
107
+ )
108
+
109
+ # Step 2: Start the research process
110
+ logger.info("Starting research process...")
111
+ refinement_result = crew.kickoff(inputs={"query": query})
112
+ logger.info(f"Query refinement completed with result type: {type(refinement_result)}")
113
+ logger.debug(f"Refinement result: {refinement_result}")
114
+
115
+ # Extract the refined query
116
+ refined_query = None
117
+ try:
118
+ logger.info(f"Attempting to extract refined query from result type: {type(refinement_result)}")
119
+
120
+ # Handle CrewOutput object (new CrewAI format)
121
+ if hasattr(refinement_result, '__class__') and refinement_result.__class__.__name__ == 'CrewOutput':
122
+ logger.info("Processing CrewOutput format refinement result")
123
+
124
+ # Try to access raw attribute first (contains the raw output)
125
+ if hasattr(refinement_result, 'raw'):
126
+ refined_query = self._extract_query_from_string(refinement_result.raw)
127
+ logger.info(f"Extracted from CrewOutput.raw: {refined_query}")
128
+
129
+ # Try to access as dictionary
130
+ elif hasattr(refinement_result, 'to_dict'):
131
+ crew_dict = refinement_result.to_dict()
132
+ logger.info(f"CrewOutput converted to dict: {crew_dict}")
133
+
134
+ if 'result' in crew_dict:
135
+ refined_query = self._extract_query_from_string(crew_dict['result'])
136
+ logger.info(f"Extracted from CrewOutput dict result: {refined_query}")
137
+
138
+ # Try string representation as last resort
139
+ else:
140
+ crew_str = str(refinement_result)
141
+ refined_query = self._extract_query_from_string(crew_str)
142
+ logger.info(f"Extracted from CrewOutput string representation: {refined_query}")
143
+
144
+ # First try to access it as a dictionary (new CrewAI format)
145
+ elif isinstance(refinement_result, dict):
146
+ logger.info("Processing dictionary format refinement result")
147
+ if "query" in refinement_result:
148
+ refined_query = refinement_result["query"]
149
+ elif "refined_query" in refinement_result:
150
+ refined_query = refinement_result["refined_query"]
151
+ elif "result" in refinement_result and isinstance(refinement_result["result"], str):
152
+ # Try to extract from nested result field
153
+ json_str = refinement_result["result"]
154
+ refined_query = self._extract_query_from_string(json_str)
155
+
156
+ # Then try to access it as a string (old CrewAI format)
157
+ elif isinstance(refinement_result, str):
158
+ logger.info("Processing string format refinement result")
159
+ refined_query = self._extract_query_from_string(refinement_result)
160
+
161
+ else:
162
+ logger.warning(f"Unexpected refinement result format: {type(refinement_result)}")
163
+ # Try to extract information by examining the structure
164
+ try:
165
+ # Try to access common attributes
166
+ if hasattr(refinement_result, "result"):
167
+ result_str = str(getattr(refinement_result, "result"))
168
+ refined_query = self._extract_query_from_string(result_str)
169
+ logger.info(f"Extracted from .result attribute: {refined_query}")
170
+ elif hasattr(refinement_result, "task_output"):
171
+ task_output = getattr(refinement_result, "task_output")
172
+ refined_query = self._extract_query_from_string(str(task_output))
173
+ logger.info(f"Extracted from .task_output attribute: {refined_query}")
174
+ else:
175
+ # Last resort: convert to string and extract
176
+ refined_query = self._extract_query_from_string(str(refinement_result))
177
+ logger.info(f"Extracted from string representation: {refined_query}")
178
+ except Exception as attr_error:
179
+ logger.exception(f"Error trying to extract attributes: {attr_error}")
180
+ refined_query = query # Fall back to original query
181
+
182
+ logger.debug(f"Refinement result: {refinement_result}")
183
+ except Exception as e:
184
+ logger.exception(f"Error extracting refined query: {e}")
185
+ refined_query = query # Fall back to original query on error
186
+
187
+ if not refined_query or refined_query.strip() == "":
188
+ logger.warning("Refined query is empty, using original query")
189
+ refined_query = query
190
+
191
+ logger.info(f"Refined query: {refined_query}")
192
+
193
+ # Step 3: Create tasks for research process
194
+ logger.info("Creating research tasks...")
195
+ search_task = create_search_task(self.researcher, refined_query)
196
+
197
+ scrape_task = create_content_scraping_task(self.analyst, search_task)
198
+
199
+ analyze_task = create_content_analysis_task(self.analyst, refined_query, scrape_task)
200
+
201
+ write_task = create_response_writing_task(self.writer, refined_query, analyze_task)
202
+
203
+ # Step 4: Create a new crew for the research tasks
204
+ logger.info("Initializing main research crew...")
205
+ research_crew = Crew(
206
+ agents=[self.researcher, self.analyst, self.writer],
207
+ tasks=[search_task, scrape_task, analyze_task, write_task],
208
+ verbose=self.verbose, # Use the instance's verbose setting
209
+ process="sequential"
210
+ )
211
+
212
+ # Step 5: Start the research process
213
+ logger.info("Starting main research process...")
214
+ result = research_crew.kickoff()
215
+ logger.info(f"Research completed with result type: {type(result)}")
216
+ logger.debug(f"Research result: {result}")
217
+
218
+ # Extract the result
219
+ final_result = {"query": query, "refined_query": refined_query}
220
+
221
+ # Handle different result types
222
+ if isinstance(result, dict) and "result" in result:
223
+ final_result["result"] = result["result"]
224
+ # Handle CrewOutput object (new CrewAI format)
225
+ elif hasattr(result, '__class__') and result.__class__.__name__ == 'CrewOutput':
226
+ logger.info("Processing CrewOutput format result")
227
+
228
+ # Try to access raw attribute first (contains the raw output)
229
+ if hasattr(result, 'raw'):
230
+ final_result["result"] = result.raw
231
+ logger.info("Extracted result from CrewOutput.raw")
232
+
233
+ # Try to access as dictionary
234
+ elif hasattr(result, 'to_dict'):
235
+ crew_dict = result.to_dict()
236
+ if 'result' in crew_dict:
237
+ final_result["result"] = crew_dict['result']
238
+ logger.info("Extracted result from CrewOutput dict")
239
+
240
+ # Use string representation as last resort
241
+ else:
242
+ final_result["result"] = str(result)
243
+ logger.info("Used string representation of CrewOutput")
244
+ else:
245
+ # For any other type, use the string representation
246
+ final_result["result"] = str(result)
247
+ logger.info(f"Used string representation for result type: {type(result)}")
248
+
249
+ logger.info("Research process completed successfully")
250
+
251
+ # Save to file if requested
252
+ if output_file:
253
+ with open(output_file, 'w', encoding='utf-8') as f:
254
+ json.dump(final_result, f, ensure_ascii=False, indent=2)
255
+
256
+ # Extract citations for easy access (if possible from the final string)
257
+ citations = extract_citations(final_result["result"])
258
+
259
+ # Calculate total processing time
260
+ processing_time = time.time() - start_time
261
+ logger.info(f"Research completed successfully in {processing_time:.2f} seconds")
262
+
263
+ return {
264
+ "result": final_result["result"],
265
+ "success": True,
266
+ "refined_query": refined_query,
267
+ "citations": citations,
268
+ "processing_time": processing_time
269
+ }
270
+ except Exception as e:
271
+ logger.exception(f"Error in research process: {e}")
272
+ return {
273
+ "result": f"I encountered an error while researching your query: {str(e)}",
274
+ "success": False,
275
+ "reason": "research_error",
276
+ "error": str(e)
277
+ }
278
+
279
+ def chat(self, message: str) -> str:
280
+ """
281
+ Handle a chat message, which could be a research query or a follow-up question.
282
+
283
+ Args:
284
+ message: The user's message
285
+
286
+ Returns:
287
+ The assistant's response
288
+ """
289
+ # Treat all messages as new research queries for simplicity
290
+ try:
291
+ research_result = self.research(message)
292
+ return research_result["result"]
293
+ except Exception as e:
294
+ logger.exception(f"Error during research for message: {message}")
295
+ return f"I encountered an error while processing your request: {str(e)}"
296
+
297
+ def clear_history(self):
298
+ """Clear the chat history"""
299
+ self.chat_history = []
300
+
301
+ def _extract_query_from_string(self, text: str) -> str:
302
+ """
303
+ Extract refined query from text string, handling various formats including JSON embedded in strings.
304
+
305
+ Args:
306
+ text: The text to extract the query from
307
+
308
+ Returns:
309
+ The extracted query, the original text as a fallback, or None if the input is empty
310
+ """
311
+ if not text:
312
+ return None
313
+
314
+ # Log the input for debugging
315
+ logger.debug(f"Extracting query from: {text[:200]}...")
316
+
317
+ # Try to parse as JSON first
318
+ try:
319
+ # Check if the entire string is valid JSON
320
+ json_data = json.loads(text)
321
+
322
+ # Check for known keys in the parsed JSON
323
+ if isinstance(json_data, dict):
324
+ if "refined_query" in json_data:
325
+ return json_data["refined_query"]
326
+ elif "query" in json_data:
327
+ return json_data["query"]
328
+ elif "result" in json_data and isinstance(json_data["result"], str):
329
+ # Try to recursively extract from nested result
330
+ return self._extract_query_from_string(json_data["result"])
331
+ except json.JSONDecodeError:
332
+ # Not valid JSON, continue with string parsing
333
+ pass
334
+
335
+ # Look for JSON blocks in the string
336
+ try:
337
+ import re
338
+ # Match both markdown JSON blocks and regular JSON objects
339
+ json_pattern = r'```(?:json)?\s*({[^`]*})```|({[\s\S]*})'
340
+ json_matches = re.findall(json_pattern, text, re.DOTALL)
341
+
342
+ for json_match in json_matches:
343
+ # Handle tuple result from findall with multiple capture groups
344
+ json_str = next((s for s in json_match if s), '')
345
+ try:
346
+ json_data = json.loads(json_str)
347
+ if isinstance(json_data, dict):
348
+ if "refined_query" in json_data:
349
+ return json_data["refined_query"]
350
+ elif "query" in json_data:
351
+ return json_data["query"]
352
+ except Exception:
353
+ continue
354
+ except Exception as e:
355
+ logger.debug(f"Error parsing JSON blocks: {e}")
356
+
357
+ # Check for common patterns in CrewAI output format
358
+ patterns = [
359
+ r'refined query[:\s]+([^\n]+)',
360
+ r'query[:\s]+([^\n]+)',
361
+ r'search(?:ed)? for[:\s]+[\'"]([^\'"]+)[\'"]',
362
+ r'search(?:ing)? for[:\s]+[\'"]([^\'"]+)[\'"]',
363
+ r'research(?:ing)? (?:about|on)[:\s]+[\'"]([^\'"]+)[\'"]',
364
+ r'query is[:\s]+[\'"]([^\'"]+)[\'"]'
365
+ ]
366
+
367
+ for pattern in patterns:
368
+ try:
369
+ match = re.search(pattern, text.lower())
370
+ if match:
371
+ return match.group(1).strip()
372
+ except Exception as e:
373
+ logger.debug(f"Error matching pattern {pattern}: {e}")
374
+
375
+ # Fall back to string parsing methods
376
+ if "refined query:" in text.lower():
377
+ return text.split("refined query:", 1)[1].strip()
378
+ elif "query:" in text.lower():
379
+ return text.split("query:", 1)[1].strip()
380
+
381
+ # If all else fails, return the whole string
382
+ return text
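A minimal usage sketch of the interface defined above. The module path and class name (`agent.WebResearchAgent`) are assumptions, since only the method bodies appear in this part of the diff; the dictionary keys follow the `research()` and `chat()` code shown here.

```python
# Hypothetical usage -- the import path and class name are assumed;
# research()/chat()/clear_history() are the methods shown in the diff above.
from agent import WebResearchAgent  # assumed module and class name

agent = WebResearchAgent()

# Full research call: returns a dict with the answer, citations, and timing.
report = agent.research("What are the latest developments in solid-state batteries?")
if report["success"]:
    print(report["result"])
    print("Citations:", report["citations"])
    print(f"Took {report['processing_time']:.1f}s")

# Chat-style call: returns only the response text.
print(agent.chat("Summarize the main manufacturers involved."))
agent.clear_history()
```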
run_app.py ADDED
@@ -0,0 +1,46 @@
1
+ """
2
+ Run script for the Web Research Agent with error handling
3
+ """
4
+ import os
5
+ import sys
6
+ import traceback
7
+
8
+ # Ensure assets directory exists
9
+ os.makedirs("assets", exist_ok=True)
10
+
11
+ try:
12
+ # Try importing gradio first to check version and availability
13
+ import gradio as gr
14
+ print(f"Using Gradio version: {gr.__version__}")
15
+
16
+ # Then run the main app
17
+ from app import app
18
+
19
+ # Launch the app with debugging enabled
20
+ app.launch(share=False, debug=True) # Enable debug mode to see error traces
21
+
22
+ except ImportError as e:
23
+ print("Error: Missing required packages.")
24
+ print(f"Details: {e}")
25
+ print("\nPlease install the required packages:")
26
+ print("pip install -r requirements.txt")
27
+ sys.exit(1)
28
+
29
+ except Exception as e:
30
+ print(f"Error: {e}")
31
+ print("\nTraceback:")
32
+ traceback.print_exc()
33
+
34
+ # Special handling for common Gradio errors
35
+ if "got an unexpected keyword argument" in str(e):
36
+ print("\nThis appears to be an issue with Gradio version compatibility.")
37
+ print("The app is trying to use features not available in your installed Gradio version.")
38
+ print("\nTry updating Gradio:")
39
+ print("pip install --upgrade gradio")
40
+ elif "CrewOutput" in str(e) or "dict object" in str(e):
41
+ print("\nThis appears to be an issue with CrewAI output format.")
42
+ print("The app is having trouble processing CrewAI outputs.")
43
+ print("\nTry updating CrewAI:")
44
+ print("pip install --upgrade crewai crewai-tools")
45
+
46
+ sys.exit(1)
search_test.py ADDED
@@ -0,0 +1,88 @@
1
+ import os
2
+ import sys
3
+ from dotenv import load_dotenv
4
+ from crewai_tools import BraveSearchTool
5
+ from tools import TavilySearchTool, RateLimitedToolWrapper, SearchRotationTool
6
+
7
+ # Load environment variables
8
+ load_dotenv()
9
+
10
+ def validate_api_keys():
11
+ """Checks if required API keys are set"""
12
+ missing_keys = []
13
+
14
+ if not os.getenv("BRAVE_API_KEY"):
15
+ missing_keys.append("BRAVE_API_KEY")
16
+
17
+ if not os.getenv("TAVILY_API_KEY"):
18
+ missing_keys.append("TAVILY_API_KEY")
19
+
20
+ return missing_keys
21
+
22
+ def main():
23
+ # Check for API keys
24
+ missing_keys = validate_api_keys()
25
+ if missing_keys:
26
+ print(f"Error: Missing required API keys: {', '.join(missing_keys)}")
27
+ print("Please set these in your .env file.")
28
+ sys.exit(1)
29
+
30
+ # Initialize search tools
31
+ brave_search_tool = BraveSearchTool(
32
+ n_results=3,
33
+ save_file=False
34
+ )
35
+
36
+ tavily_search_tool = TavilySearchTool(
37
+ max_results=3,
38
+ search_depth="basic"
39
+ )
40
+
41
+ # Add rate limiting to each search tool
42
+ rate_limited_brave_search = RateLimitedToolWrapper(tool=brave_search_tool, delay=10) # Reduced delay for testing
43
+ rate_limited_tavily_search = RateLimitedToolWrapper(tool=tavily_search_tool, delay=10) # Reduced delay for testing
44
+
45
+ # Create the search rotation tool
46
+ search_rotation_tool = SearchRotationTool(
47
+ search_tools=[rate_limited_brave_search, rate_limited_tavily_search],
48
+ max_searches_per_query=5
49
+ )
50
+
51
+ # Get user query
52
+ if len(sys.argv) > 1:
53
+ query = " ".join(sys.argv[1:])
54
+ else:
55
+ query = input("Enter your search query: ")
56
+
57
+ # Perform searches
58
+ print(f"Searching for: '{query}'")
59
+ print("Will perform up to 5 searches using Brave and Tavily in rotation")
60
+ print("-" * 50)
61
+
62
+ # First search
63
+ result1 = search_rotation_tool.run(query)
64
+ print(result1)
65
+ print("\n" + "-" * 50)
66
+
67
+ # Modified query
68
+ modified_query = f"{query} recent news"
69
+ print(f"Searching for modified query: '{modified_query}'")
70
+
71
+ # Second search
72
+ result2 = search_rotation_tool.run(modified_query)
73
+ print(result2)
74
+ print("\n" + "-" * 50)
75
+
76
+ # Try exceeding the limit with multiple searches for the same query
77
+ print(f"Attempting additional searches for: '{query}'")
78
+
79
+ for i in range(4):
80
+ print(f"\nAttempt {i+1}:")
81
+ result = search_rotation_tool.run(query)
82
+ print(result)
83
+ print("-" * 50)
84
+
85
+ print("\nTest complete!")
86
+
87
+ if __name__ == "__main__":
88
+ main()
tasks.py ADDED
@@ -0,0 +1,157 @@
1
+ from typing import Dict, List, Any
2
+ from crewai import Task
3
+ from crewai import Agent
4
+ from datetime import datetime
5
+ def create_query_refinement_task(researcher_agent: Agent, query: str) -> Task:
6
+ """
7
+ Creates a task for refining the user's query to optimize search results.
8
+
9
+ Args:
10
+ researcher_agent: The researcher agent to perform the task
11
+ query: The original user query
12
+
13
+ Returns:
14
+ Task for query refinement
15
+ """
16
+ return Task(
17
+ description=(
18
+ f"Given the user query: '{query}', refine it to create an effective search query.Today is {datetime.now().strftime('%Y-%m-%d')}"
19
+ f"Consider adding specificity, removing ambiguity, and using precise terms. But don't add anything that's not relevant to the query. i.e if you don't know the meaning of abbriviations then don't try to complete it. "
20
+ f"If the query is invalid (just emojis, random numbers, gibberish, etc.), "
21
+ f"flag it as invalid. Otherwise, return both the original query and your refined version."
22
+ f"Don't add any extra information to the query. Just refine it."
23
+ f"For Technical queries , don't try to make it a question. Just refine it."
24
+ f"I want you to understand the user's query and refine it to be more specific and accurate do not add any extra information to the query which will change the meaning of the query."
25
+ ),
26
+ expected_output=(
27
+ "Return your response in a structured format like this:\n"
28
+ "```json\n"
29
+ "{\n"
30
+ ' "original_query": "original query here",\n'
31
+ ' "refined_query": "improved query here",\n'
32
+ ' "reasoning": "brief explanation of your refinements"\n'
33
+ "}\n"
34
+ "```\n\n"
35
+ "Or if the query is invalid, return:\n"
36
+ "```json\n"
37
+ "{\n"
38
+ ' "is_valid": false,\n'
39
+ ' "reason": "explanation why the query is invalid"\n'
40
+ "}\n"
41
+ "```"
42
+ ),
43
+ agent=researcher_agent
44
+ )
45
+
46
+ def create_search_task(researcher_agent: Agent, query: str) -> Task:
47
+ """
48
+ Creates a task for performing web search with the refined query.
49
+
50
+ Args:
51
+ researcher_agent: The researcher agent to perform the task
52
+ query: The refined query to search
53
+
54
+ Returns:
55
+ Task for web search
56
+ """
57
+ return Task(
58
+ description=(
59
+ f"Using the refined query: '{query}', search the web to find the most relevant "
60
+ f"and reliable information. Return a comprehensive list of search results, "
61
+ f"including titles, snippets, and URLs. Focus on finding high-quality sources."
62
+ ),
63
+ expected_output=(
64
+ "A JSON list of search results containing: "
65
+ "1. Title of the page "
66
+ "2. URL "
67
+ "3. Snippet or description "
68
+ ),
69
+ agent=researcher_agent
70
+ )
71
+
72
+ def create_content_scraping_task(analyst_agent: Agent, search_results: List[Dict[str, Any]]) -> Task:
73
+ """
74
+ Creates a task for scraping content from search result URLs.
75
+
76
+ Args:
77
+ analyst_agent: The analyst agent to perform the task
78
+ search_results: The search results to scrape
79
+
80
+ Returns:
81
+ Task for content scraping
82
+ """
83
+ urls = [result.get("link", "") for result in search_results if "link" in result]
84
+ urls_str = "\n".join(urls)
85
+
86
+ return Task(
87
+ description=(
88
+ f"Scrape the content from these URLs:\n{urls_str}\n\n"
89
+ f"For each URL, extract the main content, focusing on text relevant to the search query. "
90
+ f"Ignore navigation elements, ads, and other irrelevant page components."
91
+ ),
92
+ expected_output=(
93
+ "A JSON dictionary mapping each URL to its scraped content. For each URL, provide: "
94
+ "1. The URL as the key "
95
+ "2. The extracted content as the value"
96
+ ),
97
+ agent=analyst_agent
98
+ )
99
+
100
+ def create_content_analysis_task(analyst_agent: Agent, query: str, scraped_contents: Dict[str, str]) -> Task:
101
+ """
102
+ Creates a task for analyzing and evaluating scraped content.
103
+
104
+ Args:
105
+ analyst_agent: The analyst agent to perform the task
106
+ query: The original or refined query
107
+ scraped_contents: Dict mapping URLs to scraped content
108
+
109
+ Returns:
110
+ Task for content analysis
111
+ """
112
+ return Task(
113
+ description=(
114
+ f"Analyze the relevance and factuality of the scraped content in relation to the query: '{query}'\n\n"
115
+ f"For each piece of content, evaluate: "
116
+ f"1. Relevance to the query (score 0-10) "
117
+ f"2. Factual accuracy (score 0-10) "
118
+ f"3. Filter out low-quality or irrelevant information"
119
+ ),
120
+ expected_output=(
121
+ "A JSON dictionary with analysis for each URL containing: "
122
+ "1. Relevance score (0-10) "
123
+ "2. Factuality score (0-10) "
124
+ "3. Filtered content (removing irrelevant parts) "
125
+ "4. Brief analysis explaining your judgment"
126
+ ),
127
+ agent=analyst_agent
128
+ )
129
+
130
+ def create_response_writing_task(writer_agent: Agent, query: str, analyzed_contents: Dict[str, Dict[str, Any]]) -> Task:
131
+ """
132
+ Creates a task for writing a comprehensive response based on analyzed content.
133
+
134
+ Args:
135
+ writer_agent: The writer agent to perform the task
136
+ query: The original query
137
+ analyzed_contents: Dict mapping URLs to analysis results
138
+
139
+ Returns:
140
+ Task for response writing
141
+ """
142
+ return Task(
143
+ description=(
144
+ f"Write a comprehensive response to the query: '{query}'\n\n"
145
+ f"Use the analyzed content to craft a well-structured, informative response that directly "
146
+ f"answers the user's query. Include proper citations for all information using [1], [2] format. "
147
+ f"Focus on clarity, factual accuracy, and addressing all aspects of the query."
148
+ ),
149
+ expected_output=(
150
+ "A comprehensive response that: "
151
+ "1. Directly answers the user's query "
152
+ "2. Uses information from the provided sources "
153
+ "3. Includes citations in [1], [2] format for all factual information "
154
+ "4. Provides a list of sources at the end"
155
+ ),
156
+ agent=writer_agent
157
+ )
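A rough sketch of how these task factories can be chained into a sequential crew. The researcher agent below is an illustrative placeholder, not the agent configuration used elsewhere in this repository.

```python
# Illustrative wiring only -- the researcher agent below is a placeholder.
from crewai import Agent, Crew, Process
from tasks import create_query_refinement_task, create_search_task

researcher = Agent(
    role="Web Researcher",
    goal="Find accurate, relevant information on the web",
    backstory="An experienced online researcher.",
)

refine = create_query_refinement_task(researcher, "history of solid-state batteries")
search = create_search_task(researcher, "timeline of solid-state battery development")

crew = Crew(agents=[researcher], tasks=[refine, search], process=Process.sequential)
print(crew.kickoff())
```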
tools/__init__.py ADDED
@@ -0,0 +1,11 @@
1
+ from .search_rotation import SearchRotationTool
2
+ from .content_analyzer import ContentAnalyzerTool
3
+ from .rate_limited_tool import RateLimitedToolWrapper
4
+ from .tavily_search import TavilySearchTool
5
+
6
+ __all__ = [
7
+ 'SearchRotationTool',
8
+ 'ContentAnalyzerTool',
9
+ 'RateLimitedToolWrapper',
10
+ 'TavilySearchTool'
11
+ ]
tools/__pycache__/__init__.cpython-311.pyc ADDED
Binary file (522 Bytes).
 
tools/__pycache__/content_analyzer.cpython-311.pyc ADDED
Binary file (4.54 kB).
 
tools/__pycache__/rate_limited_tool.cpython-311.pyc ADDED
Binary file (5.52 kB).
 
tools/__pycache__/search_rotation.cpython-311.pyc ADDED
Binary file (13.6 kB).
 
tools/__pycache__/tavily_search.cpython-311.pyc ADDED
Binary file (7.28 kB).
 
tools/content_analyzer.py ADDED
@@ -0,0 +1,98 @@
1
+ from typing import Optional, Dict, Any, Type
2
+ from crewai.tools import BaseTool
3
+ from pydantic import Field, BaseModel
4
+
5
+ # Define the input schema as a separate class
6
+ class ContentAnalyzerArgs(BaseModel):
7
+ query: str = Field(
8
+ ...,
9
+ description="The search query to compare content against"
10
+ )
11
+ content: str = Field(
12
+ ...,
13
+ description="The content to analyze for relevance and factuality"
14
+ )
15
+
16
+ class ContentAnalyzerTool(BaseTool):
17
+ """
18
+ A tool for analyzing content relevance and factuality.
19
+ This tool uses LLM to judge the relevance and factual accuracy of content
20
+ in relation to a specific query.
21
+ """
22
+
23
+ name: str = Field(
24
+ default="Content Analyzer",
25
+ description="Name of the content analysis tool"
26
+ )
27
+ description: str = Field(
28
+ default=(
29
+ "Use this tool to analyze the relevance and factuality of content "
30
+ "in relation to a specific query. "
31
+ "It helps filter out irrelevant or potentially non-factual information."
32
+ ),
33
+ description="Description of what the content analyzer does"
34
+ )
35
+
36
+ # Define args_schema as a class attribute
37
+ args_schema: Type[BaseModel] = ContentAnalyzerArgs
38
+
39
+ def _run(self, query: str, content: str) -> Dict[str, Any]:
40
+ """
41
+ Analyze the content for relevance and factuality.
42
+
43
+ Args:
44
+ query: The original search query
45
+ content: The content to analyze
46
+
47
+ Returns:
48
+ Dict with analysis results including:
49
+ - relevance_score: A score from 0-10 indicating relevance
50
+ - factuality_score: A score from 0-10 indicating factual reliability
51
+ - filtered_content: The processed content with irrelevant parts removed
52
+ - analysis: Brief explanation of the judgment
53
+ """
54
+ # The actual analysis is performed by the agent's LLM via
55
+ # CrewAI's execution mechanism; the values returned below are
56
+ # placeholders that stand in for the real output
57
+ prompt = f"""
58
+ You are a strict content judge evaluating web search results.
59
+
60
+ QUERY: {query}
61
+ CONTENT: {content}
62
+
63
+ Analyze the content above with these criteria:
64
+ 1. Relevance to the query (score 0-10)
65
+ 2. Factual accuracy and reliability (score 0-10)
66
+ 3. Information quality
67
+
68
+ For content scoring below 5 on relevance, discard it entirely.
69
+ For content with factuality concerns, flag these specifically.
70
+
71
+ PROVIDE YOUR ANALYSIS IN THIS FORMAT:
72
+ {{
73
+ "relevance_score": [0-10],
74
+ "factuality_score": [0-10],
75
+ "filtered_content": "The filtered and cleaned content, removing irrelevant parts",
76
+ "analysis": "Brief explanation of your judgment"
77
+ }}
78
+
79
+ ONLY RETURN THE JSON, nothing else.
80
+ """
81
+
82
+ # This method will be handled by CrewAI's internal mechanism
83
+ # For placeholder purposes during direct testing, we return example data.
84
+ # In a real CrewAI run, the agent's LLM would process the prompt.
85
+ return {
86
+ "relevance_score": 7, # Placeholder
87
+ "factuality_score": 8, # Placeholder
88
+ "filtered_content": content, # Placeholder
89
+ "analysis": "This is a placeholder analysis. The real analysis will be performed during execution."
90
+ }
91
+
92
+ class Config:
93
+ """Pydantic config for the tool"""
94
+ arbitrary_types_allowed = True
95
+
96
+ def run(self, query: str, content: str) -> Dict[str, Any]:
97
+ """Public method to run content analysis"""
98
+ return self._run(query, content)
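A direct-call sketch of the analyzer. As the comments in `_run()` note, calling it outside a CrewAI run only returns the placeholder scores.

```python
# Outside a CrewAI run this returns the placeholder analysis from _run();
# during crew execution the agent's LLM produces the real scores.
from tools import ContentAnalyzerTool

analyzer = ContentAnalyzerTool()
analysis = analyzer.run(
    query="solid-state battery energy density",
    content="Solid-state cells promise higher energy density than lithium-ion packs...",
)
print(analysis["relevance_score"], analysis["factuality_score"])
print(analysis["analysis"])
```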
tools/rate_limited_tool.py ADDED
@@ -0,0 +1,86 @@
1
+ import time
2
+ from typing import Any, Dict, Optional, Type
3
+ from crewai.tools import BaseTool
4
+ from pydantic import BaseModel, Field, model_validator, create_model
5
+ import logging
6
+
7
+ logger = logging.getLogger(__name__)
8
+
9
+ class RateLimitedToolWrapper(BaseTool):
10
+ """
11
+ A wrapper tool that adds an optional time delay after executing another tool.
12
+ Useful for enforcing rate limits on API calls or simply adding a pause.
13
+ It also ensures that arguments are correctly passed to the wrapped tool.
14
+ """
15
+ name: str = Field(
16
+ default="Rate Limited Tool Wrapper",
17
+ description="A tool that wraps another tool to add a delay after execution"
18
+ )
19
+ description: str = Field(
20
+ default="Wraps another tool to add a delay after execution, enforcing rate limits.",
21
+ description="The tool's description that will be passed to the agent"
22
+ )
23
+ tool: BaseTool = Field(
24
+ ...,
25
+ description="The tool to be wrapped with rate limiting"
26
+ )
27
+ delay: float = Field(
28
+ default=0.0,
29
+ description="Delay in seconds to wait after tool execution (0 means no delay)",
30
+ ge=0.0
31
+ )
32
+
33
+ # Create a simple args schema for fallback
34
+ class RateLimitedToolArgs(BaseModel):
35
+ query: str = Field(..., description="The search query to pass to the wrapped tool")
36
+
37
+ def __init__(self, **data):
38
+ # Store the original args_schema if available
39
+ tool = data.get('tool')
40
+
41
+ # Set args_schema directly in data before initialization
42
+ if tool and hasattr(tool, 'args_schema') and tool.args_schema is not None:
43
+ if isinstance(tool.args_schema, type) and issubclass(tool.args_schema, BaseModel):
44
+ data['args_schema'] = tool.args_schema
45
+ else:
46
+ data['args_schema'] = self.RateLimitedToolArgs
47
+ else:
48
+ data['args_schema'] = self.RateLimitedToolArgs
49
+
50
+ super().__init__(**data)
51
+
52
+ def _run(self, query: str) -> str:
53
+ """
54
+ Run the wrapped tool with the query parameter and then pause for the specified delay.
55
+
56
+ Args:
57
+ query: The query string to pass to the wrapped tool.
58
+
59
+ Returns:
60
+ The result from the wrapped tool.
61
+ """
62
+ logger.debug(f"RateLimitedToolWrapper: Running tool '{self.tool.name}' with query: {query}")
63
+
64
+ try:
65
+ # Call the tool's run method with the query
66
+ result = self.tool.run(query)
67
+
68
+ except Exception as e:
69
+ logger.error(f"Exception running wrapped tool '{self.tool.name}': {e}")
70
+ # Fall back to trying the _run method directly if the run method fails
71
+ try:
72
+ if hasattr(self.tool, '_run'):
73
+ logger.warning(f"Falling back to direct _run call for tool '{self.tool.name}'")
74
+ result = self.tool._run(query)
75
+ else:
76
+ raise e
77
+ except Exception as inner_e:
78
+ logger.error(f"Fallback also failed for tool '{self.tool.name}': {inner_e}")
79
+ raise inner_e
80
+
81
+ # Enforce the delay only if greater than 0
82
+ if self.delay > 0:
83
+ logger.info(f"Rate limit enforced: Waiting {self.delay:.2f} seconds after running {self.tool.name}.")
84
+ time.sleep(self.delay)
85
+
86
+ return result
tools/search_rotation.py ADDED
@@ -0,0 +1,246 @@
1
+ import random
2
+ import time
3
+ from typing import List, Dict, Any, Optional, Type
4
+ from crewai.tools import BaseTool
5
+ from pydantic import BaseModel, Field
6
+
7
+ class SearchRotationArgs(BaseModel):
8
+ """Input schema for SearchRotationTool."""
9
+ query: str = Field(..., description="The search query to look up")
10
+
11
+ class SearchRotationTool(BaseTool):
12
+ """
13
+ Tool for rotating between multiple search engines with a limit on searches per query.
14
+
15
+ This tool alternates between different search engines and enforces a maximum
16
+ number of searches per query to manage API usage and costs.
17
+ """
18
+ name: str = Field(
19
+ default="Web Search Rotation",
20
+ description="Search the internet using multiple search engines in rotation"
21
+ )
22
+ description: str = Field(
23
+ default="Use this tool to search for information on the internet using different search engines in rotation.",
24
+ description="Description of the search rotation tool"
25
+ )
26
+
27
+ search_tools: List[BaseTool] = Field(
28
+ default=[],
29
+ description="List of search tools to rotate between"
30
+ )
31
+ max_searches_per_query: int = Field(
32
+ default=5,
33
+ description="Maximum number of searches allowed per query"
34
+ )
35
+ cache_timeout: int = Field(
36
+ default=300, # 5 minutes
37
+ description="How long to cache results for similar queries in seconds"
38
+ )
39
+
40
+ args_schema: Type[BaseModel] = SearchRotationArgs
41
+
42
+ def __init__(self, **data):
43
+ super().__init__(**data)
44
+ if not self.search_tools:
45
+ raise ValueError("At least one search tool must be provided")
46
+ self._search_count = 0
47
+ self._current_search_query = None
48
+ self._last_used_tool = None
49
+ self._cache = {} # Simple cache for recent queries
50
+ self._last_search_time = {} # Track when each tool was last used
51
+
52
+ # Log available search tools
53
+ tool_names = [tool.name for tool in self.search_tools]
54
+ print(f"SearchRotationTool initialized with tools: {', '.join(tool_names)}")
55
+
56
+ def _run(self, query: str) -> str:
57
+ """
58
+ Execute a web search using a rotation of search engines.
59
+
60
+ Args:
61
+ query: The search query to look up
62
+
63
+ Returns:
64
+ String containing the search results
65
+ """
66
+ print(f"SearchRotationTool executing search for: '{query}'")
67
+
68
+ # Check cache first for very similar queries
69
+ for cached_query, (timestamp, result) in list(self._cache.items()):
70
+ # Simple similarity check - if query is very similar to a cached query
71
+ if self._is_similar_query(query, cached_query):
72
+ # Check if cache is still valid
73
+ if time.time() - timestamp < self.cache_timeout:
74
+ print(f"Using cached result for similar query: '{cached_query}'")
75
+ return f"{result}\n\n[Cached result from similar query: '{cached_query}']"
76
+ else:
77
+ # Remove expired cache entries to prevent cache bloat
78
+ print(f"Cache expired for query: '{cached_query}'")
79
+ self._cache.pop(cached_query, None)
80
+
81
+ # Reset counter if this is a new query
82
+ if not self._is_similar_query(self._current_search_query, query):
83
+ print(f"New search query detected. Resetting search count.")
84
+ self._current_search_query = query
85
+ self._search_count = 0
86
+
87
+ # Check if we've reached the search limit
88
+ if self._search_count >= self.max_searches_per_query:
89
+ print(f"Search limit reached ({self._search_count}/{self.max_searches_per_query})")
90
+ return (f"Search limit reached. You've performed {self._search_count} searches "
91
+ f"for this query. Maximum allowed is {self.max_searches_per_query}.")
92
+
93
+ # Select the most appropriate search tool based on usage and delay
94
+ search_tool = self._select_optimal_tool()
95
+ print(f"Selected search tool: {search_tool.name}")
96
+
97
+ # Keep track of which tools we've tried for this specific search attempt
98
+ tried_tools = set()
99
+ max_retry_attempts = min(3, len(self.search_tools))
100
+ retry_count = 0
101
+
102
+ while retry_count < max_retry_attempts:
103
+ tried_tools.add(search_tool.name)
104
+
105
+ try:
106
+ # Execute the search
107
+ print(f"Using Tool: {search_tool.name}")
108
+ start_time = time.time()
109
+ result = search_tool.run(query)
110
+ search_time = time.time() - start_time
111
+
112
+ # Basic validation of result - check if it's empty or error message
113
+ if not result or "error" in result.lower() or len(result.strip()) < 20:
114
+ # Result might be invalid, try another tool if available
115
+ print(f"Invalid or error result from {search_tool.name}. Trying another tool.")
116
+ retry_count += 1
117
+ search_tool = self._select_next_tool(tried_tools)
118
+ if not search_tool: # No more tools to try
119
+ print("All search tools failed. No more tools to try.")
120
+ return "All search tools failed to provide meaningful results for this query."
121
+ continue
122
+
123
+ # Valid result obtained
124
+ print(f"Valid result obtained from {search_tool.name} in {search_time:.2f}s")
125
+
126
+ # Update tracking
127
+ self._last_used_tool = search_tool
128
+ self._last_search_time[search_tool.name] = time.time()
129
+
130
+ # Cache the result
131
+ self._cache[query] = (time.time(), result)
132
+
133
+ # Increment the counter
134
+ self._search_count += 1
135
+ print(f"Search count incremented to {self._search_count}/{self.max_searches_per_query}")
136
+
137
+ # Add usage information
138
+ searches_left = self.max_searches_per_query - self._search_count
139
+ usage_info = f"\n\nSearch performed using {search_tool.name} in {search_time:.2f}s. "
140
+ usage_info += f"Searches used: {self._search_count}/{self.max_searches_per_query}. "
141
+ usage_info += f"Searches remaining: {max(0, searches_left)}."
142
+
143
+ return f"{result}\n{usage_info}"
144
+
145
+ except Exception as e:
146
+ # If this search tool fails, try another one
147
+ print(f"Exception in {search_tool.name}: {str(e)}")
148
+ retry_count += 1
149
+ search_tool = self._select_next_tool(tried_tools)
150
+ if not search_tool: # No more tools to try
151
+ print("All search tools failed with exceptions. No more tools to try.")
152
+ return f"Error searching with all available search engines: {str(e)}"
153
+
154
+ # If we've exhausted our retry attempts
155
+ print(f"Failed after {retry_count} retry attempts")
156
+ return "Failed to get search results after multiple attempts with different search engines."
157
+
158
+ def _select_next_tool(self, tried_tools: set) -> Optional[BaseTool]:
159
+ """Select the next tool that hasn't been tried yet."""
160
+ available_tools = [t for t in self.search_tools if t.name not in tried_tools]
161
+ if not available_tools:
162
+ return None
163
+
164
+ # Sort by last used time (oldest first) if we have that data
165
+ if self._last_search_time:
166
+ available_tools.sort(key=lambda t: self._last_search_time.get(t.name, 0))
167
+
168
+ return available_tools[0] if available_tools else None
169
+
170
+ def _select_optimal_tool(self) -> BaseTool:
171
+ """Select the best tool based on recent usage patterns."""
172
+ current_time = time.time()
173
+
174
+ # If we have no usage history yet, pick a tool at random
175
+ if not self._last_used_tool or not self._last_search_time:
176
+ return random.choice(self.search_tools)
177
+
178
+ # Try to avoid using the same tool twice in a row
179
+ available_tools = [t for t in self.search_tools if t != self._last_used_tool]
180
+
181
+ # If we have multiple tools available, choose the one used least recently
182
+ if available_tools:
183
+ # Sort by last used time (oldest first)
184
+ available_tools.sort(key=lambda t: self._last_search_time.get(t.name, 0))
185
+ return available_tools[0]
186
+
187
+ # If only one tool available, use it
188
+ return self.search_tools[0]
189
+
190
+ def _is_similar_query(self, query1, query2):
191
+ """Check if two queries are similar enough to use cached results."""
192
+ if not query1 or not query2:
193
+ return False
194
+
195
+ # Convert to lowercase and remove common filler words
196
+ q1 = query1.lower()
197
+ q2 = query2.lower()
198
+
199
+ # If the strings are identical
200
+ if q1 == q2:
201
+ return True
202
+
203
+ # Remove common filler words to focus on meaningful terms
204
+ filler_words = {'the', 'a', 'an', 'and', 'or', 'but', 'is', 'are', 'was', 'were',
205
+ 'in', 'on', 'at', 'to', 'for', 'with', 'by', 'about', 'like',
206
+ 'through', 'over', 'before', 'between', 'after', 'since', 'without',
207
+ 'under', 'within', 'along', 'following', 'across', 'behind',
208
+ 'beyond', 'plus', 'except', 'but', 'up', 'down', 'off', 'on', 'me', 'you'}
209
+
210
+ # Clean and tokenize
211
+ def clean_and_tokenize(q):
212
+ # Remove punctuation
213
+ q = ''.join(c for c in q if c.isalnum() or c.isspace())
214
+ # Tokenize
215
+ tokens = q.split()
216
+ # Remove filler words
217
+ return {word for word in tokens if word.lower() not in filler_words and len(word) > 1}
218
+
219
+ words1 = clean_and_tokenize(q1)
220
+ words2 = clean_and_tokenize(q2)
221
+
222
+ # If either query has no significant words after cleaning, they're not similar
223
+ if not words1 or not words2:
224
+ return False
225
+
226
+ # Calculate Jaccard similarity
227
+ intersection = len(words1.intersection(words2))
228
+ union = len(words1.union(words2))
229
+
230
+ # If the queries are short, we require more overlap
231
+ min_words = min(len(words1), len(words2))
232
+ max_words = max(len(words1), len(words2))
233
+
234
+ # For short queries, use strict similarity threshold
235
+ if min_words <= 3:
236
+ # For very short queries, require almost exact match
237
+ return intersection / union > 0.8
238
+ # For normal length queries
239
+ elif min_words <= 6:
240
+ return intersection / union > 0.7
241
+ # For longer queries
242
+ else:
243
+ # Check both Jaccard similarity and absolute intersection size
244
+ # For long queries, having many words in common is important
245
+ absolute_overlap_threshold = min(5, min_words // 2)
246
+ return (intersection / union > 0.6) or (intersection >= absolute_overlap_threshold)
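A small illustration of the similarity heuristic that drives caching and the per-query search counter. Construction mirrors `search_test.py`; the queries are made up, and API keys are assumed to be configured as in the README.

```python
# Assumes BRAVE_API_KEY and TAVILY_API_KEY are set, as in search_test.py.
from crewai_tools import BraveSearchTool
from tools import SearchRotationTool, TavilySearchTool

rotation = SearchRotationTool(
    search_tools=[BraveSearchTool(n_results=3), TavilySearchTool(max_results=3)],
    max_searches_per_query=5,
)

# Same significant words in a different order -> treated as the same query (cache hit).
print(rotation._is_similar_query("history of solid state batteries",
                                 "solid state batteries history"))  # True
# No word overlap -> treated as a new query, so the search counter resets.
print(rotation._is_similar_query("solid state batteries",
                                 "lithium ion recycling"))          # False
```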
tools/tavily_search.py ADDED
@@ -0,0 +1,139 @@
1
+ import os
2
+ import requests
3
+ import time
4
+ import hashlib
5
+ from typing import Dict, Any, Optional, Type
6
+ from crewai.tools import BaseTool
7
+ from pydantic import BaseModel, Field
8
+
9
+ class TavilySearchArgs(BaseModel):
10
+ """Input schema for TavilySearchTool."""
11
+ query: str = Field(..., description="The search query to look up")
12
+
13
+ class TavilySearchTool(BaseTool):
14
+ """
15
+ Tool for performing web searches using the Tavily Search API.
16
+
17
+ This tool sends a search query to Tavily and returns relevant search results.
18
+ """
19
+ name: str = Field(
20
+ default="Tavily Web Search",
21
+ description="Search the internet using Tavily"
22
+ )
23
+ description: str = Field(
24
+ default="Use this tool to search for information on the internet using Tavily Search API.",
25
+ description="Description of the Tavily search tool"
26
+ )
27
+
28
+ api_key: Optional[str] = Field(
29
+ default=None,
30
+ description="Tavily API key. If not provided, will look for TAVILY_API_KEY environment variable"
31
+ )
32
+ search_depth: str = Field(
33
+ default="basic",
34
+ description="The depth of the search, 'basic' or 'advanced'"
35
+ )
36
+ max_results: int = Field(
37
+ default=5,
38
+ description="Maximum number of search results to return (1-10)"
39
+ )
40
+ include_answer: bool = Field(
41
+ default=False,
42
+ description="Whether to include an AI-generated answer in the response"
43
+ )
44
+ timeout: int = Field(
45
+ default=10,
46
+ description="Timeout for the API request in seconds"
47
+ )
48
+
49
+ args_schema: Type[BaseModel] = TavilySearchArgs
50
+
51
+ def __init__(self, **data):
52
+ super().__init__(**data)
53
+ self.api_key = self.api_key or os.getenv("TAVILY_API_KEY")
54
+ if not self.api_key:
55
+ print("WARNING: Tavily API key is missing. The tool will return an error message when used.")
56
+ self._cache = {} # Simple in-memory cache
57
+
58
+ def _run(self, query: str) -> str:
59
+ """
60
+ Execute a web search using Tavily.
61
+
62
+ Args:
63
+ query: The search query to look up
64
+
65
+ Returns:
66
+ String containing the search results
67
+ """
68
+ # Check if API key is missing
69
+ if not self.api_key:
70
+ return (
71
+ "ERROR: Tavily API key is missing. Please set the TAVILY_API_KEY environment variable. "
72
+ "Search cannot be performed without a valid API key."
73
+ )
74
+
75
+ # Check cache first
76
+ cache_key = self._get_cache_key(query)
77
+ if cache_key in self._cache:
78
+ timestamp, result = self._cache[cache_key]
79
+ # Cache valid for 30 minutes
80
+ if time.time() - timestamp < 1800:
81
+ return f"{result}\n\n[Cached Tavily result]"
82
+
83
+ url = "https://api.tavily.com/search"
84
+
85
+ payload = {
86
+ "api_key": self.api_key,
87
+ "query": query,
88
+ "search_depth": self.search_depth,
89
+ "max_results": min(self.max_results, 10), # Ensure we don't exceed API limits
90
+ "include_answer": self.include_answer
91
+ }
92
+
93
+ try:
94
+ response = requests.post(url, json=payload, timeout=self.timeout)
95
+ response.raise_for_status()
96
+ result = response.json()
97
+
98
+ if "results" not in result:
99
+ return f"Error in search: {result.get('error', 'Unknown error')}"
100
+
101
+ # Format the results
102
+ formatted_results = self._format_results(result)
103
+
104
+ # Cache the result
105
+ self._cache[cache_key] = (time.time(), formatted_results)
106
+
107
+ return formatted_results
108
+
109
+ except requests.exceptions.Timeout:
110
+ return "Error: Tavily search request timed out. Please try again later."
111
+ except requests.exceptions.RequestException as e:
112
+ return f"Error during Tavily search: {str(e)}"
113
+
114
+ def _format_results(self, result: Dict[str, Any]) -> str:
115
+ """Format the search results into a readable string."""
116
+ output = []
117
+
118
+ # Add the answer if included
119
+ if "answer" in result and result["answer"]:
120
+ output.append(f"Answer: {result['answer']}\n")
121
+
122
+ # Add search results
123
+ output.append("Search Results:")
124
+
125
+ for i, r in enumerate(result.get("results", []), 1):
126
+ title = r.get("title", "No Title")
127
+ url = r.get("url", "No URL")
128
+ content = r.get("content", "No Content").strip()
129
+
130
+ result_text = f"\n{i}. {title}\n URL: {url}\n Content: {content}\n"
131
+ output.append(result_text)
132
+
133
+ return "\n".join(output)
134
+
135
+ def _get_cache_key(self, query: str) -> str:
136
+ """Generate a cache key for the given query."""
137
+ # Include search parameters in the key
138
+ params_str = f"{query}|{self.search_depth}|{self.max_results}|{self.include_answer}"
139
+ return hashlib.md5(params_str.encode()).hexdigest()
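A standalone call to the Tavily tool, assuming `TAVILY_API_KEY` is set as described in the setup instructions.

```python
# Requires TAVILY_API_KEY in the environment (or pass api_key=... explicitly).
from tools import TavilySearchTool

tavily = TavilySearchTool(max_results=3, search_depth="basic", include_answer=True)
print(tavily.run("current state of solid-state battery manufacturing"))
```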
utils/__init__.py ADDED
@@ -0,0 +1,3 @@
1
+ from .helpers import is_valid_query, format_research_results, extract_citations
2
+
3
+ __all__ = ['is_valid_query', 'format_research_results', 'extract_citations']
utils/__pycache__/__init__.cpython-311.pyc ADDED
Binary file (330 Bytes).
 
utils/__pycache__/helpers.cpython-311.pyc ADDED
Binary file (4.66 kB).
 
utils/helpers.py ADDED
@@ -0,0 +1,120 @@
1
+ import re
2
+ import json
3
+ from typing import Dict, Any, List, Optional
4
+
5
+ def is_valid_query(query: str) -> bool:
6
+ """
7
+ Validates if a search query is legitimate.
8
+
9
+ Args:
10
+ query: The search query to validate
11
+
12
+ Returns:
13
+ Boolean indicating if the query is valid
14
+ """
15
+ # Reject empty queries
16
+ if not query or query.strip() == "":
17
+ return False
18
+
19
+ # Reject single emoji queries
20
+ emoji_pattern = re.compile(
21
+ "["
22
+ "\U0001F600-\U0001F64F" # emoticons
23
+ "\U0001F300-\U0001F5FF" # symbols & pictographs
24
+ "\U0001F680-\U0001F6FF" # transport & map symbols
25
+ "\U0001F700-\U0001F77F" # alchemical symbols
26
+ "\U0001F780-\U0001F7FF" # Geometric Shapes
27
+ "\U0001F800-\U0001F8FF" # Supplemental Arrows-C
28
+ "\U0001F900-\U0001F9FF" # Supplemental Symbols and Pictographs
29
+ "\U0001FA00-\U0001FA6F" # Chess Symbols
30
+ "\U0001FA70-\U0001FAFF" # Symbols and Pictographs Extended-A
31
+ "\U00002702-\U000027B0" # Dingbats
32
+ "\U000024C2-\U0001F251"
33
+ "]+"
34
+ )
35
+
36
+ stripped_query = emoji_pattern.sub(r'', query).strip()
37
+ if not stripped_query and len(query) <= 5: # Single emoji or very short
38
+ return False
39
+
40
+ # Reject random numbers only (at least 5 digits with no context)
41
+ if re.match(r'^\d{5,}$', query.strip()):
42
+ return False
43
+
44
+ # Reject gibberish (no vowels in long string suggests gibberish)
45
+ if len(query) > 10 and not re.search(r'[aeiouAEIOU]', query):
46
+ return False
47
+
48
+ return True
49
+
50
+ def format_research_results(search_results: List[Dict[str, Any]],
51
+ scraped_contents: Dict[str, str],
52
+ analyzed_contents: Dict[str, Dict[str, Any]]) -> str:
53
+ """
54
+ Formats research results into a readable response with citations.
55
+
56
+ Args:
57
+ search_results: The list of search result items
58
+ scraped_contents: Dict mapping URLs to scraped content
59
+ analyzed_contents: Dict mapping URLs to analysis results
60
+
61
+ Returns:
62
+ Formatted response with citations
63
+ """
64
+ response_parts = []
65
+ citations = []
66
+
67
+ # Filter to only include relevant content based on analysis
68
+ relevant_urls = {
69
+ url: data
70
+ for url, data in analyzed_contents.items()
71
+ if data.get("relevance_score", 0) >= 5
72
+ }
73
+
74
+ # No relevant results
75
+ if not relevant_urls:
76
+ return "I couldn't find relevant information for your query. Could you try rephrasing or providing more details?"
77
+
78
+ # Compile the response with relevant information
79
+ for i, (url, data) in enumerate(relevant_urls.items(), 1):
80
+ citations.append(f"[{i}] {url}")
81
+ filtered_content = data.get("filtered_content", "")
82
+
83
+ # Add the content with citation
84
+ if filtered_content:
85
+ response_parts.append(f"{filtered_content} [{i}]")
86
+
87
+ # Combine everything
88
+ response = "\n\n".join(response_parts)
89
+ citation_text = "\n".join(citations)
90
+
91
+ return f"{response}\n\nSources:\n{citation_text}"
92
+
93
+ def extract_citations(text: str) -> List[Dict[str, str]]:
94
+ """
95
+ Extract citations from formatted text.
96
+
97
+ Args:
98
+ text: Text with citation markers like [1], [2], etc.
99
+
100
+ Returns:
101
+ List of citation objects with citation number and referenced text
102
+ """
103
+ citations = []
104
+ citation_pattern = r'\[(\d+)\]'
105
+
106
+ matches = re.finditer(citation_pattern, text)
107
+ for match in matches:
108
+ citation_num = match.group(1)
109
+ # Get the preceding text (limited to reasonable length)
110
+ start_pos = max(0, match.start() - 100)
111
+ cited_text = text[start_pos:match.start()].strip()
112
+ if len(cited_text) == 100: # Truncated
113
+ cited_text = "..." + cited_text
114
+
115
+ citations.append({
116
+ "number": citation_num,
117
+ "text": cited_text
118
+ })
119
+
120
+ return citations
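A small worked example of the helpers above, using made-up inputs to show the expected outputs.

```python
# Made-up inputs; shows the shapes returned by the helper functions.
from utils import is_valid_query, extract_citations

print(is_valid_query("🔥🔥"))         # False: emoji-only query
print(is_valid_query("9731582400"))   # False: bare digits with no context
print(is_valid_query("history of solid-state batteries"))  # True

sample = "Solid-state cells avoid liquid electrolytes [1]. Several pilot lines exist [2]."
for cite in extract_citations(sample):
    print(cite["number"], "->", cite["text"])
```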