initial commit
- .env.example +15 -0
- .gitattributes +1 -0
- README.md +146 -12
- agents.py +114 -0
- app.py +351 -0
- architecture.md +97 -0
- assets/.gitkeep +2 -0
- assets/assistant_avatar.png +3 -0
- assets/custom.css +494 -0
- requirements.txt +7 -0
- research_engine.py +382 -0
- run_app.py +46 -0
- search_test.py +88 -0
- tasks.py +157 -0
- tools/__init__.py +11 -0
- tools/__pycache__/__init__.cpython-311.pyc +0 -0
- tools/__pycache__/content_analyzer.cpython-311.pyc +0 -0
- tools/__pycache__/rate_limited_tool.cpython-311.pyc +0 -0
- tools/__pycache__/search_rotation.cpython-311.pyc +0 -0
- tools/__pycache__/tavily_search.cpython-311.pyc +0 -0
- tools/content_analyzer.py +98 -0
- tools/rate_limited_tool.py +86 -0
- tools/search_rotation.py +246 -0
- tools/tavily_search.py +139 -0
- utils/__init__.py +3 -0
- utils/__pycache__/__init__.cpython-311.pyc +0 -0
- utils/__pycache__/helpers.cpython-311.pyc +0 -0
- utils/helpers.py +120 -0
.env.example
ADDED
@@ -0,0 +1,15 @@
# API Keys
# Get your Brave Search API key from https://brave.com/search/api/
# Free tier: 1 request per minute, 2000 per month
BRAVE_API_KEY=your_brave_api_key_here

# Get your OpenAI API key from https://platform.openai.com/api-keys
OPENAI_API_KEY=your_openai_api_key_here

# Get your Tavily API key from https://tavily.com
# Free tier: 1000 requests per month
TAVILY_API_KEY=your_tavily_api_key_here

# Optional Configuration
# Set to True or False to enable/disable detailed logging
VERBOSE=False
.gitattributes
CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
+ assets/assistant_avatar.png filter=lfs diff=lfs merge=lfs -text
README.md
CHANGED
@@ -1,12 +1,146 @@
# Web Research Agent

A powerful AI research assistant built with CrewAI that conducts comprehensive web research on any topic, providing factual, cited responses through a multi-agent approach.

## Overview

This application uses specialized AI agents working together to:
1. Refine search queries for optimal results
2. Search the web across multiple search engines
3. Analyze and verify content
4. Produce well-structured, factual responses with proper citations

## Setup Instructions

### Prerequisites

- Python 3.9+ (recommended: Python 3.11)
- API keys for:
  - OpenAI (required)
  - Brave Search (recommended)
  - Tavily Search (optional)

### Installation

1. Clone the repository and navigate to the project directory:
```bash
git clone https://github.com/yourusername/web-research-agent.git
cd web-research-agent
```

2. Install required dependencies:
```bash
pip install -r requirements.txt
```

3. Create a `.env` file in the root directory with your API keys:
```
OPENAI_API_KEY=your_openai_api_key
BRAVE_API_KEY=your_brave_api_key
TAVILY_API_KEY=your_tavily_api_key
VERBOSE=False  # Set to True for detailed logging
```

### Running the Application

Start the web interface:
```bash
python app.py
```

The application will be available at http://localhost:7860

## Common Issues & Troubleshooting

### Pydantic/CrewAI Compatibility Issues

If you encounter errors like:
```
AttributeError: 'property' object has no attribute 'model_fields'
```

Try the following fixes:

1. Update to the latest CrewAI version:
```bash
pip install -U crewai crewai-tools
```

2. If issues persist, temporarily modify the `tools/rate_limited_tool.py` file to fix compatibility with Pydantic.

### Search API Rate Limits

- Brave Search API has a free tier limit of 1 request per minute and 2,000 requests per month
- The application implements rate limiting to prevent API throttling (see the sketch below)
- Research queries may take several minutes to complete due to these limitations
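
The idea behind the rate limiting is simple: every search tool is wrapped so that calls are spaced out by a minimum delay. The class below is an illustrative sketch of that delay-based pattern only; it is not the project's `tools/rate_limited_tool.py` implementation, and the name `SimpleRateLimiter` is hypothetical.

```python
import time


class SimpleRateLimiter:
    """Sketch: enforce a minimum delay between calls to a wrapped function."""

    def __init__(self, func, delay_seconds: float = 60.0):
        self.func = func
        # e.g. 60 seconds to respect Brave's 1-request-per-minute free tier
        self.delay_seconds = delay_seconds
        self._last_call = 0.0

    def __call__(self, *args, **kwargs):
        # Sleep just long enough to keep the minimum interval between calls
        elapsed = time.monotonic() - self._last_call
        if elapsed < self.delay_seconds:
            time.sleep(self.delay_seconds - elapsed)
        self._last_call = time.monotonic()
        return self.func(*args, **kwargs)
```

In the actual repository this role is played by `RateLimitedToolWrapper` (see `agents.py`), which wraps the Brave and Tavily tools and accepts a `delay` argument.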

### Gradio Interface Issues

If the interface fails to load or throws errors:

1. Try installing a specific Gradio version:
```bash
pip install gradio==4.26.0
```

2. Clear your browser cache to remove cached JavaScript files

3. Run the headless test script as an alternative:
```bash
python search_test.py "Your research question"
```

## Advanced Usage

### Command Line Operation

Test the research engine without the web interface:
```bash
python search_test.py "Your research query here"
```

### Environment Variables

- `OPENAI_API_KEY`: Required for language model access
- `BRAVE_API_KEY`: Recommended for web search functionality
- `TAVILY_API_KEY`: Optional alternative search engine
- `VERBOSE`: Set to True/False to control logging detail
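
These variables are loaded at startup with `python-dotenv` and read via `os.getenv`, the same pattern `app.py` and `research_engine.py` use. The snippet below is a minimal sketch of that pattern; the exact way `VERBOSE` is parsed is an assumption for illustration.

```python
import os
from dotenv import load_dotenv

# Pull variables from the .env file into the process environment
load_dotenv()

openai_key = os.getenv("OPENAI_API_KEY")   # required
brave_key = os.getenv("BRAVE_API_KEY")     # recommended
tavily_key = os.getenv("TAVILY_API_KEY")   # optional
verbose = os.getenv("VERBOSE", "False").lower() == "true"

if not openai_key:
    raise SystemExit("OPENAI_API_KEY is not set; add it to .env or enter it in API Settings")
```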

## Deployment

This project can be deployed to Hugging Face Spaces for web access.

### Hugging Face Spaces Deployment

1. **Create a new Space on Hugging Face**
   - Go to [Hugging Face Spaces](https://huggingface.co/spaces)
   - Click "Create new Space"
   - Choose a name and select "Gradio" as the SDK
   - Set visibility as needed

2. **Configure Environment Variables**
   - In Space settings, add required API keys as secrets

3. **Deploy Code**
```bash
git clone https://huggingface.co/spaces/your-username/your-space-name
cd your-space-name
cp -r /path/to/web-research-agent/* .
git add .
git commit -m "Initial deployment"
git push
```

### Security Notes

- Never commit your `.env` file or expose API keys
- Use repository secrets in Hugging Face Spaces
- Keep sensitive deployments private

## Development Structure

- `app.py`: Web interface and session management
- `research_engine.py`: Core research orchestration logic
- `agents.py`: Agent definitions and configurations
- `tools/`: Search and analysis tools
- `search_test.py`: Command-line testing utility
agents.py
ADDED
@@ -0,0 +1,114 @@
from typing import List, Dict, Any, Optional
from crewai import Agent
from crewai_tools import BraveSearchTool, ScrapeWebsiteTool
from tools import ContentAnalyzerTool, RateLimitedToolWrapper, TavilySearchTool, SearchRotationTool

def create_researcher_agent(llm=None, verbose=True) -> Agent:
    """
    Creates a researcher agent responsible for query refinement and web search.

    Args:
        llm: Language model to use for the agent
        verbose: Whether to log agent activity

    Returns:
        Configured researcher agent
    """
    # Initialize search tools
    brave_search_tool = BraveSearchTool(
        n_results=5,
        save_file=False
    )

    # Initialize Tavily search tool
    # Requires a TAVILY_API_KEY in environment variables
    tavily_search_tool = TavilySearchTool(
        max_results=5,
        search_depth="basic",
        timeout=15  # Increase timeout for more reliable results
    )

    # Add minimal rate limiting to avoid API throttling
    # Set delay to 0 to disable rate limiting completely
    rate_limited_brave_search = RateLimitedToolWrapper(tool=brave_search_tool, delay=0)
    rate_limited_tavily_search = RateLimitedToolWrapper(tool=tavily_search_tool, delay=0)

    # Create the search rotation tool
    search_rotation_tool = SearchRotationTool(
        search_tools=[rate_limited_brave_search, rate_limited_tavily_search],
        max_searches_per_query=5  # Limit to 5 searches per query as requested
    )

    return Agent(
        role="Research Specialist",
        goal="Discover accurate and relevant information from the web",
        backstory=(
            "You are an expert web researcher with a talent for crafting effective search queries "
            "and finding high-quality information on any topic. Your goal is to find the most "
            "relevant and factual information to answer user questions. You have access to multiple "
            "search engines and know how to efficiently use them within the search limits."
        ),
        # Use the search rotation tool
        tools=[search_rotation_tool],
        verbose=verbose,
        allow_delegation=True,
        memory=True,
        llm=llm
    )

def create_analyst_agent(llm=None, verbose=True) -> Agent:
    """
    Creates an analyst agent responsible for content analysis and evaluation.

    Args:
        llm: Language model to use for the agent
        verbose: Whether to log agent activity

    Returns:
        Configured analyst agent
    """
    # Initialize tools
    scrape_tool = ScrapeWebsiteTool()
    content_analyzer = ContentAnalyzerTool()

    return Agent(
        role="Content Analyst",
        goal="Analyze web content for relevance, factuality, and quality",
        backstory=(
            "You are a discerning content analyst with a keen eye for detail and a strong "
            "commitment to factual accuracy. You excel at evaluating information and filtering "
            "out irrelevant or potentially misleading content. Your expertise helps ensure that "
            "only the most reliable information is presented."
        ),
        tools=[scrape_tool, content_analyzer],
        verbose=verbose,
        allow_delegation=True,
        memory=True,
        llm=llm
    )

def create_writer_agent(llm=None, verbose=True) -> Agent:
    """
    Creates a writer agent responsible for synthesizing information into coherent responses.

    Args:
        llm: Language model to use for the agent
        verbose: Whether to log agent activity

    Returns:
        Configured writer agent
    """
    return Agent(
        role="Research Writer",
        goal="Create informative, factual, and well-cited responses to research queries",
        backstory=(
            "You are a skilled writer specializing in creating clear, concise, and informative "
            "responses based on research findings. You have a talent for synthesizing information "
            "from multiple sources and presenting it in a coherent and readable format, always with "
            "proper citations. You prioritize factual accuracy and clarity in your writing."
        ),
        verbose=verbose,
        allow_delegation=True,
        memory=True,
        llm=llm
    )
app.py
ADDED
@@ -0,0 +1,351 @@
import os
import gradio as gr
import logging
import uuid
import pathlib
from dotenv import load_dotenv
from research_engine import ResearchEngine
import time
import traceback

# Load environment variables
load_dotenv()

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

# Initialize the research engine with verbose=False for production
research_engine = None

# Dict to store session-specific research engines
session_engines = {}

def validate_api_keys(custom_openai_key=None):
    """Checks if required API keys are set"""
    missing_keys = []

    if not os.getenv("BRAVE_API_KEY"):
        missing_keys.append("BRAVE_API_KEY")

    # Check for OpenAI key in either the environment or the custom key provided
    if not custom_openai_key and not os.getenv("OPENAI_API_KEY"):
        missing_keys.append("OPENAI_API_KEY")

    return missing_keys

def get_engine_for_session(session_id, openai_api_key=None):
    """Get or create a research engine for the specific session with optional custom API key"""
    if session_id not in session_engines:
        logger.info(f"Creating new research engine for session {session_id}")
        # Set temporary API key if provided by user
        original_key = None
        if openai_api_key:
            logger.info("Using custom OpenAI API key provided by user")
            original_key = os.environ.get("OPENAI_API_KEY")
            os.environ["OPENAI_API_KEY"] = openai_api_key

        try:
            session_engines[session_id] = ResearchEngine(verbose=False)
        finally:
            # Restore original key if we changed it
            if original_key is not None:
                os.environ["OPENAI_API_KEY"] = original_key
            elif openai_api_key:
                # If there was no original key, remove the temporary one
                os.environ.pop("OPENAI_API_KEY", None)

    return session_engines[session_id]

def cleanup_session(session_id):
    """Remove a session when it's no longer needed"""
    if session_id in session_engines:
        logger.info(f"Cleaning up session {session_id}")
        del session_engines[session_id]

def process_message(message, history, session_id, openai_api_key=None):
    """
    Process user message and update chat history.

    Args:
        message: User's message
        history: Chat history list
        session_id: Unique identifier for the session
        openai_api_key: Optional custom OpenAI API key

    Returns:
        Updated history
    """
    # Validate API keys
    missing_keys = validate_api_keys(openai_api_key)
    if missing_keys:
        # This function is a generator (it yields progress updates below), so the
        # error message must be yielded rather than returned for Gradio to display it
        history = history + [
            {"role": "user", "content": message},
            {"role": "assistant", "content": f"Error: Missing required API keys: {', '.join(missing_keys)}. Please set these in your .env file or input your OpenAI API key below."}
        ]
        yield history
        return

    # Add user message to history
    history.append({"role": "user", "content": message})

    try:
        print(f"Starting research for: {message}")
        start_time = time.time()

        # Get the appropriate engine for this session, passing the API key if provided
        engine = get_engine_for_session(session_id, openai_api_key)

        # Set the API key for this specific request if provided
        original_key = None
        if openai_api_key:
            original_key = os.environ.get("OPENAI_API_KEY")
            os.environ["OPENAI_API_KEY"] = openai_api_key

        try:
            # Start the research process
            research_task = engine.research(message)
        finally:
            # Restore original key if we changed it
            if original_key is not None:
                os.environ["OPENAI_API_KEY"] = original_key
            elif openai_api_key:
                # If there was no original key, remove the temporary one
                os.environ.pop("OPENAI_API_KEY", None)

        # Print the research task output for debugging
        print(f"Research task result type: {type(research_task)}")
        print(f"Research task content: {research_task}")

        # If we get here, step 1 is complete
        history[-1] = {"role": "user", "content": message}
        history.append({"role": "assistant", "content": f"Researching... this may take a minute or two...\n\n**Step 1/4:** Refining your query..."})
        yield history

        # We don't actually have real-time progress indication from the engine,
        # so we'll simulate it with a slight delay between steps
        time.sleep(1)

        history[-1] = {"role": "assistant", "content": f"Researching... this may take a minute or two...\n\n**Step 1/4:** Refining your query... ✓\n**Step 2/4:** Searching the web..."}
        yield history

        time.sleep(1)

        history[-1] = {"role": "assistant", "content": f"Researching... this may take a minute or two...\n\n**Step 1/4:** Refining your query... ✓\n**Step 2/4:** Searching the web... ✓\n**Step 3/4:** Analyzing results..."}
        yield history

        time.sleep(1)

        history[-1] = {"role": "assistant", "content": f"Researching... this may take a minute or two...\n\n**Step 1/4:** Refining your query... ✓\n**Step 2/4:** Searching the web... ✓\n**Step 3/4:** Analyzing results... ✓\n**Step 4/4:** Synthesizing information..."}
        yield history

        # Get response from research engine
        response = research_task["result"]

        end_time = time.time()
        processing_time = end_time - start_time

        # Add processing time for transparency
        response += f"\n\nResearch completed in {processing_time:.2f} seconds."

        # Update last message with the full response
        history[-1] = {"role": "assistant", "content": response}
        yield history
    except Exception as e:
        logger.exception("Error processing message")
        error_traceback = traceback.format_exc()
        error_message = f"An error occurred: {str(e)}\n\nTraceback: {error_traceback}"
        history[-1] = {"role": "assistant", "content": error_message}
        yield history

# Define a basic theme with minimal customization - more styling in CSS
custom_theme = gr.themes.Soft(
    primary_hue=gr.themes.colors.indigo,
    secondary_hue=gr.themes.colors.blue,
    neutral_hue=gr.themes.colors.slate,
)

# Gradio versions have different ways of loading CSS, let's ensure compatibility
css_file_path = pathlib.Path("assets/custom.css")
if css_file_path.exists():
    with open(css_file_path, 'r') as f:
        css_content = f.read()
else:
    css_content = ""  # Fallback empty CSS if file doesn't exist

# Add the CSS as a style tag to ensure it works in all Gradio versions
css_head = f"""
<style>
{css_content}

/* Additional styling for API key input */
.api-settings .api-key-input input {{
    border: 1px solid #ccc;
    border-radius: 8px;
    font-family: monospace;
    letter-spacing: 1px;
}}

.api-settings .api-key-info {{
    font-size: 0.8rem;
    color: #666;
    margin-top: 5px;
}}

.api-settings {{
    margin-bottom: 20px;
    border: 1px solid #eee;
    border-radius: 8px;
    padding: 10px;
    background-color: #f9f9f9;
}}
</style>
"""

# Create the Gradio interface with multiple CSS loading methods for compatibility
with gr.Blocks(
    title="Web Research Agent",
    theme=custom_theme,
    css=css_content,
    head=css_head,  # Older versions may use this
) as app:
    # Create a unique session ID for each user
    session_id = gr.State(lambda: str(uuid.uuid4()))

    with gr.Row(elem_classes=["container"]):
        with gr.Column():
            with gr.Row(elem_classes=["app-header"]):
                gr.Markdown("""
                <div style="display: flex; align-items: center; justify-content: center;">
                    <div style="width: 40px; height: 40px; margin-right: 15px; background: linear-gradient(135deg, #3a7bd5, #00d2ff); border-radius: 10px; display: flex; justify-content: center; align-items: center;">
                        <span style="color: white; font-size: 24px; font-weight: bold;">R</span>
                    </div>
                    <h1 style="margin: 0;">Web Research Agent</h1>
                </div>
                """)

            gr.Markdown("""
            This intelligent agent utilizes a multi-step process to deliver comprehensive research on any topic.
            Simply enter your question or topic below to get comprehensive, accurate information with proper citations.
            """, elem_classes=["md-container"])

            # Missing keys warning
            missing_keys = validate_api_keys()
            if missing_keys:
                gr.Markdown(f"⚠️ **Warning:** Missing required API keys: {', '.join(missing_keys)}. Add these to your .env file.", elem_classes=["warning"])

            chatbot = gr.Chatbot(
                height=600,
                show_copy_button=True,
                avatar_images=(None, "./assets/assistant_avatar.png"),
                type="messages",  # Use the modern messages format instead of tuples
                elem_classes=["chatbot-container"]
            )

            # API Key input
            with gr.Accordion("API Settings", open=False, elem_classes=["api-settings"]):
                openai_api_key = gr.Textbox(
                    label="OpenAI API Key (optional)",
                    placeholder="sk-...",
                    type="password",
                    info="Provide your own OpenAI API key if you don't want to use the system default key.",
                    elem_classes=["api-key-input"]
                )
                gr.Markdown("""
                Your API key is only used for your requests and is never stored on our servers.
                It's a safer alternative to adding it to the .env file.
                [Get an API key from OpenAI](https://platform.openai.com/account/api-keys)
                """, elem_classes=["api-key-info"])

            with gr.Row(elem_classes=["input-container"]):
                msg = gr.Textbox(
                    placeholder="Ask me anything...",
                    scale=9,
                    container=False,
                    show_label=False,
                    elem_classes=["input-box"]
                )
                submit = gr.Button("Search", scale=1, variant="primary", elem_classes=["search-button"], value="search")

            # Clear button
            clear = gr.Button("Clear Conversation", elem_classes=["clear-button"])

            # Examples
            with gr.Accordion("Example Questions", open=False, elem_classes=["examples-container"]):
                examples = gr.Examples(
                    examples=[
                        "What are the latest advancements in artificial intelligence?",
                        "Explain the impact of climate change on marine ecosystems",
                        "How do mRNA vaccines work?",
                        "What are the health benefits of intermittent fasting?",
                        "Explain the current state of quantum computing research",
                        "What are the main theories about dark matter?",
                        "How is blockchain technology being used outside of cryptocurrency?",
                    ],
                    inputs=msg
                )

    # Set up event handlers
    submit_click_event = submit.click(
        process_message,
        inputs=[msg, chatbot, session_id, openai_api_key],
        outputs=[chatbot],
        show_progress=True
    )

    msg_submit_event = msg.submit(
        process_message,
        inputs=[msg, chatbot, session_id, openai_api_key],
        outputs=[chatbot],
        show_progress=True
    )

    # Clear message input after sending
    submit_click_event.then(lambda: "", None, msg)
    msg_submit_event.then(lambda: "", None, msg)

    # Clear conversation and reset session
    def clear_conversation_and_session(session_id_value):
        # Clear the session data
        cleanup_session(session_id_value)
        # Generate a new session ID
        new_session_id = str(uuid.uuid4())
        # Return empty history and new session ID
        return [], new_session_id

    clear.click(
        clear_conversation_and_session,
        inputs=[session_id],
        outputs=[chatbot, session_id]
    )

    # Citation and tools information
    with gr.Accordion("About This Research Agent", open=False, elem_classes=["footer"]):
        gr.Markdown("""
        ### Research Agent Features

        This research agent uses a combination of specialized AI agents to provide comprehensive answers:

        - **Researcher Agent**: Refines queries and searches the web
        - **Analyst Agent**: Evaluates content relevance and factual accuracy
        - **Writer Agent**: Synthesizes information into coherent responses

        #### Tools Used
        - BraveSearch and Tavily for web searching
        - Content scraping for in-depth information
        - Analysis for relevance and factual verification

        #### API Keys
        - You can use your own OpenAI API key by entering it in the "API Settings" section
        - Your API key is used only for your requests and is never stored on our servers
        - This lets you control costs and use your preferred API tier

        All information is provided with proper citations and sources.

        *Processing may take a minute or two as the agent searches, analyzes, and synthesizes information.*
        """, elem_classes=["md-container"])

if __name__ == "__main__":
    # Create assets directory if it doesn't exist
    os.makedirs("assets", exist_ok=True)

    # Launch the Gradio app
    app.launch()
architecture.md
ADDED
@@ -0,0 +1,97 @@
# Web Research Agent Architecture

```
┌──────────────────────────────────────────────────────────────────────────────┐
│                               Gradio Interface                                │
└───────────────────────────────────┬──────────────────────────────────────────┘
                                    │
                                    ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                               Research Engine                                 │
│                                                                                │
│  ┌───────────────────────────────────────────────────────────────────────┐   │
│  │                        Conversation History                          │   │
│  └───────────────────────────────────────────────────────────────────────┘   │
│                                                                                │
│  ┌─────────────┐              ┌─────────────┐              ┌─────────────┐    │
│  │  Researcher │◄────────────►│   Analyst   │◄────────────►│   Writer    │    │
│  │    Agent    │              │    Agent    │              │    Agent    │    │
│  └──────┬──────┘              └──────┬──────┘              └──────┬──────┘    │
│         │                            │                            │           │
│         ▼                            ▼                            ▼           │
│  ┌─────────────┐              ┌─────────────┐              ┌─────────────┐    │
│  │   Search    │              │   Scrape    │              │ Information │    │
│  │  Rotation   │              │ Website Tool│              │  Synthesis  │    │
│  │    Tool     │              └──────┬──────┘              └─────────────┘    │
│  └─────────────┘                     │                                        │
│                                      ▼                                        │
│                               ┌─────────────┐                                 │
│                               │   Content   │                                 │
│                               │  Analyzer   │                                 │
│                               └─────────────┘                                 │
└──────────────────────────────────────────────────────────────────────────────┘
```

## Research Flow

1. **User Input**
   - User enters a query in the Gradio interface
   - Query is validated for legitimacy and processed by the system

2. **Query Refinement** (Researcher Agent)
   - Original query is analyzed and refined for optimal search results
   - Ambiguous terms are clarified and search intent is identified
   - Refined query is prepared for web search with improved keywords

3. **Web Search** (Researcher Agent + Search Rotation Tool)
   - Search Rotation Tool executes search using multiple search engines
   - Rate limiting is implemented to avoid API throttling
   - Search is performed with a maximum of 5 searches per query
   - Results are cached for similar queries to improve efficiency
   - Search results are collected with URLs and snippets

4. **Content Scraping** (Analyst Agent + ScrapeWebsiteTool)
   - ScrapeWebsiteTool extracts content from search result URLs
   - HTML content is parsed to extract meaningful text
   - Raw content is prepared for analysis and evaluation

5. **Content Analysis** (Analyst Agent + ContentAnalyzerTool)
   - Content is analyzed for relevance to the query (scores 0-10)
   - Factuality and quality are evaluated (scores 0-10)
   - Irrelevant or low-quality content is filtered out
   - Content is organized by relevance and information value

6. **Response Creation** (Writer Agent)
   - Analyzed content is synthesized into a comprehensive response
   - Information is organized logically with a clear structure
   - Contradictory information is reconciled when present
   - Citations are added in [1], [2] format with proper attribution
   - Source URLs are included for reference and verification

7. **Result Presentation**
   - Final response with citations is displayed to the user
   - Conversation history is updated and maintained per session
   - Results can be saved to file if requested
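
As a rough orientation, the flow above could be wired up as a single sequential CrewAI crew using the factory functions from `agents.py` and `tasks.py`. This is a minimal sketch only: `research_engine.py` actually drives the steps more granularly (for example, it runs query refinement in its own `Crew` first), and the task-factory signatures other than `create_query_refinement_task(agent, query)` are assumptions here.

```python
# Illustrative sketch, not the actual orchestration in research_engine.py.
from crewai import Crew
from agents import create_researcher_agent, create_analyst_agent, create_writer_agent
from tasks import (
    create_query_refinement_task,
    create_search_task,
    create_content_analysis_task,
    create_response_writing_task,
)

def build_crew(query: str) -> Crew:
    researcher = create_researcher_agent(verbose=False)
    analyst = create_analyst_agent(verbose=False)
    writer = create_writer_agent(verbose=False)

    return Crew(
        agents=[researcher, analyst, writer],
        tasks=[
            create_query_refinement_task(researcher, query),   # signature shown in research_engine.py
            create_search_task(researcher, query),             # assumed signature
            create_content_analysis_task(analyst, query),      # assumed signature
            create_response_writing_task(writer, query),       # assumed signature
        ],
        process="sequential",  # matches the process used in research_engine.py
    )

# Example: result = build_crew("How do mRNA vaccines work?").kickoff(inputs={"query": "How do mRNA vaccines work?"})
```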

## System Architecture

- **Multi-Agent System**: Three specialized agents work together with distinct roles
- **Stateless Design**: Each research request is processed independently
- **Session Management**: User sessions maintain separate conversation contexts
- **API Integration**: Multiple search APIs with fallback mechanisms
- **Memory**: All agents maintain context throughout the research process
- **Tool Abstraction**: Search and analysis tools are modular and interchangeable
- **Error Handling**: Comprehensive error handling at each processing stage
- **Rate Limiting**: API calls are rate-limited to prevent throttling

## Technical Implementation

- **Frontend**: Gradio web interface with real-time feedback
- **Backend**: Python-based research engine with modular components
- **Tools**:
  - Search Rotation Tool (supports multiple search engines)
  - Rate Limited Tool Wrapper (prevents API throttling)
  - Content Analyzer Tool (evaluates relevance and factuality)
  - Scrape Website Tool (extracts content from URLs)
- **Deployment**: Compatible with Hugging Face Spaces for online access
- **Caching**: Results are cached to improve performance and reduce API calls
assets/.gitkeep
ADDED
@@ -0,0 +1,2 @@
# This directory is for assets like the assistant avatar
# You can add an image named assistant_avatar.png here for the chatbot interface
assets/assistant_avatar.png
ADDED
(binary image file tracked with Git LFS)
assets/custom.css
ADDED
@@ -0,0 +1,494 @@
1 |
+
/* Custom CSS for Web Research Agent */
|
2 |
+
|
3 |
+
/* Global body styling */
|
4 |
+
body {
|
5 |
+
background: linear-gradient(to right, #0f2027, #203a43, #2c5364);
|
6 |
+
color: #f0f0f0;
|
7 |
+
font-family: 'Inter', system-ui, sans-serif;
|
8 |
+
}
|
9 |
+
|
10 |
+
/* Override Gradio container styles */
|
11 |
+
.gradio-container {
|
12 |
+
max-width: 1200px !important;
|
13 |
+
margin: 0 auto !important;
|
14 |
+
background-color: transparent !important;
|
15 |
+
}
|
16 |
+
|
17 |
+
/* Button styling overrides */
|
18 |
+
.primary-btn {
|
19 |
+
background: linear-gradient(90deg, #3a7bd5, #00d2ff) !important;
|
20 |
+
color: white !important;
|
21 |
+
border: none !important;
|
22 |
+
transition: all 0.3s ease !important;
|
23 |
+
}
|
24 |
+
|
25 |
+
.primary-btn:hover {
|
26 |
+
transform: translateY(-2px) !important;
|
27 |
+
box-shadow: 0 5px 15px rgba(0, 0, 0, 0.1) !important;
|
28 |
+
}
|
29 |
+
|
30 |
+
/* Input field styling overrides */
|
31 |
+
textarea, input[type="text"] {
|
32 |
+
background: rgba(255, 255, 255, 0.05) !important;
|
33 |
+
border: 1px solid rgba(255, 255, 255, 0.1) !important;
|
34 |
+
color: white !important;
|
35 |
+
}
|
36 |
+
|
37 |
+
textarea:focus, input[type="text"]:focus {
|
38 |
+
border-color: #3a7bd5 !important;
|
39 |
+
box-shadow: 0 0 0 2px rgba(58, 123, 213, 0.3) !important;
|
40 |
+
}
|
41 |
+
|
42 |
+
/* Chat bubbles */
|
43 |
+
.message-bubble {
|
44 |
+
background-color: rgba(42, 46, 53, 0.8) !important;
|
45 |
+
border-radius: 12px !important;
|
46 |
+
}
|
47 |
+
|
48 |
+
.user-bubble {
|
49 |
+
background-color: rgba(48, 66, 105, 0.8) !important;
|
50 |
+
}
|
51 |
+
|
52 |
+
/* Main container styling */
|
53 |
+
.container {
|
54 |
+
max-width: 1200px;
|
55 |
+
margin: 0 auto;
|
56 |
+
padding: 20px;
|
57 |
+
}
|
58 |
+
|
59 |
+
/* Header styling */
|
60 |
+
h1.title {
|
61 |
+
font-size: 2.5rem;
|
62 |
+
font-weight: 700;
|
63 |
+
background: linear-gradient(90deg, #3a7bd5, #00d2ff);
|
64 |
+
-webkit-background-clip: text; /* Prefix for older WebKit */
|
65 |
+
background-clip: text; /* Standard property */
|
66 |
+
-webkit-text-fill-color: transparent;
|
67 |
+
text-align: center;
|
68 |
+
margin-bottom: 0.5rem;
|
69 |
+
}
|
70 |
+
|
71 |
+
/* Chatbot container */
|
72 |
+
.chatbot-container {
|
73 |
+
border-radius: 12px;
|
74 |
+
background: rgba(255, 255, 255, 0.05);
|
75 |
+
backdrop-filter: blur(10px);
|
76 |
+
box-shadow: 0 8px 32px rgba(0, 0, 0, 0.1);
|
77 |
+
margin-bottom: 20px;
|
78 |
+
}
|
79 |
+
|
80 |
+
/* Chat messages */
|
81 |
+
.message {
|
82 |
+
padding: 15px 20px;
|
83 |
+
border-radius: 10px;
|
84 |
+
margin-bottom: 10px;
|
85 |
+
max-width: 80%;
|
86 |
+
}
|
87 |
+
|
88 |
+
.user-message {
|
89 |
+
background-color: #304269;
|
90 |
+
color: white;
|
91 |
+
align-self: flex-end;
|
92 |
+
border-bottom-right-radius: 0;
|
93 |
+
}
|
94 |
+
|
95 |
+
.bot-message {
|
96 |
+
background-color: #2a2e35;
|
97 |
+
color: #eaeaea;
|
98 |
+
align-self: flex-start;
|
99 |
+
border-bottom-left-radius: 0;
|
100 |
+
}
|
101 |
+
|
102 |
+
/* Input box styling */
|
103 |
+
.input-container {
|
104 |
+
display: flex;
|
105 |
+
gap: 10px;
|
106 |
+
margin-top: 20px;
|
107 |
+
}
|
108 |
+
|
109 |
+
.input-box {
|
110 |
+
border-radius: 8px;
|
111 |
+
border: 1px solid rgba(255, 255, 255, 0.1);
|
112 |
+
background: rgba(255, 255, 255, 0.05);
|
113 |
+
padding: 12px 16px;
|
114 |
+
font-size: 16px;
|
115 |
+
color: white;
|
116 |
+
transition: all 0.3s ease;
|
117 |
+
}
|
118 |
+
|
119 |
+
.input-box:focus {
|
120 |
+
border-color: #3a7bd5;
|
121 |
+
box-shadow: 0 0 0 2px rgba(58, 123, 213, 0.3);
|
122 |
+
outline: none;
|
123 |
+
}
|
124 |
+
|
125 |
+
/* Button styling */
|
126 |
+
.search-button {
|
127 |
+
background: linear-gradient(90deg, #3a7bd5, #00d2ff);
|
128 |
+
color: white;
|
129 |
+
border: none;
|
130 |
+
border-radius: 8px;
|
131 |
+
padding: 12px 24px;
|
132 |
+
font-weight: 600;
|
133 |
+
cursor: pointer;
|
134 |
+
transition: all 0.3s ease;
|
135 |
+
}
|
136 |
+
|
137 |
+
.search-button:hover {
|
138 |
+
transform: translateY(-2px);
|
139 |
+
box-shadow: 0 5px 15px rgba(0, 0, 0, 0.1);
|
140 |
+
}
|
141 |
+
|
142 |
+
.clear-button {
|
143 |
+
background: transparent;
|
144 |
+
color: #adadad;
|
145 |
+
border: 1px solid rgba(255, 255, 255, 0.2);
|
146 |
+
border-radius: 8px;
|
147 |
+
padding: 8px 16px;
|
148 |
+
font-weight: 500;
|
149 |
+
cursor: pointer;
|
150 |
+
transition: all 0.3s ease;
|
151 |
+
}
|
152 |
+
|
153 |
+
.clear-button:hover {
|
154 |
+
background: rgba(255, 255, 255, 0.05);
|
155 |
+
color: white;
|
156 |
+
}
|
157 |
+
|
158 |
+
/* Examples section */
|
159 |
+
.examples-container {
|
160 |
+
margin-top: 20px;
|
161 |
+
padding: 15px;
|
162 |
+
border-radius: 8px;
|
163 |
+
background: rgba(255, 255, 255, 0.03);
|
164 |
+
}
|
165 |
+
|
166 |
+
.examples-container h3 {
|
167 |
+
margin-top: 0;
|
168 |
+
color: #b8b9bd;
|
169 |
+
font-size: 1rem;
|
170 |
+
}
|
171 |
+
|
172 |
+
.example-item {
|
173 |
+
padding: 8px 12px;
|
174 |
+
background: rgba(58, 123, 213, 0.1);
|
175 |
+
border-radius: 6px;
|
176 |
+
margin-bottom: 8px;
|
177 |
+
cursor: pointer;
|
178 |
+
transition: all 0.2s ease;
|
179 |
+
}
|
180 |
+
|
181 |
+
.example-item:hover {
|
182 |
+
background: rgba(58, 123, 213, 0.2);
|
183 |
+
}
|
184 |
+
|
185 |
+
/* Loading indicator */
|
186 |
+
.loading-indicator {
|
187 |
+
display: inline-block;
|
188 |
+
margin-left: 10px;
|
189 |
+
color: #3a7bd5;
|
190 |
+
}
|
191 |
+
|
192 |
+
/* Citation and source styling */
|
193 |
+
.citation {
|
194 |
+
font-size: 0.85rem;
|
195 |
+
color: #6c757d;
|
196 |
+
background-color: rgba(108, 117, 125, 0.1);
|
197 |
+
padding: 0 4px;
|
198 |
+
border-radius: 3px;
|
199 |
+
}
|
200 |
+
|
201 |
+
.source-list {
|
202 |
+
font-size: 0.9rem;
|
203 |
+
padding-left: 20px;
|
204 |
+
margin-top: 10px;
|
205 |
+
color: #b8b9bd;
|
206 |
+
}
|
207 |
+
|
208 |
+
/* Warning messages */
|
209 |
+
.warning {
|
210 |
+
background-color: rgba(255, 207, 0, 0.1);
|
211 |
+
border-left: 4px solid #ffcf00;
|
212 |
+
padding: 12px 16px;
|
213 |
+
border-radius: 4px;
|
214 |
+
margin-bottom: 20px;
|
215 |
+
color: #f0f0f0;
|
216 |
+
}
|
217 |
+
|
218 |
+
/* Footer styling */
|
219 |
+
.footer {
|
220 |
+
margin-top: 30px;
|
221 |
+
padding-top: 20px;
|
222 |
+
border-top: 1px solid rgba(255, 255, 255, 0.1);
|
223 |
+
text-align: center;
|
224 |
+
font-size: 0.9rem;
|
225 |
+
color: #b8b9bd;
|
226 |
+
}
|
227 |
+
|
228 |
+
/* Markdown content styling */
|
229 |
+
.md-container {
|
230 |
+
line-height: 1.6;
|
231 |
+
}
|
232 |
+
|
233 |
+
.md-container code {
|
234 |
+
background-color: rgba(255, 255, 255, 0.1);
|
235 |
+
padding: 2px 5px;
|
236 |
+
border-radius: 3px;
|
237 |
+
font-family: monospace;
|
238 |
+
}
|
239 |
+
|
240 |
+
.md-container pre {
|
241 |
+
background-color: rgba(0, 0, 0, 0.2);
|
242 |
+
padding: 15px;
|
243 |
+
border-radius: 5px;
|
244 |
+
overflow-x: auto;
|
245 |
+
}
|
246 |
+
|
247 |
+
/* Avatar styling */
|
248 |
+
.avatar {
|
249 |
+
width: 36px;
|
250 |
+
height: 36px;
|
251 |
+
border-radius: 50%;
|
252 |
+
object-fit: cover;
|
253 |
+
}
|
254 |
+
|
255 |
+
/* Dark mode specific adjustments */
|
256 |
+
@media (prefers-color-scheme: dark) {
|
257 |
+
body {
|
258 |
+
background-color: #1a1c23;
|
259 |
+
color: #f0f0f0;
|
260 |
+
}
|
261 |
+
|
262 |
+
.input-box {
|
263 |
+
background: rgba(255, 255, 255, 0.03);
|
264 |
+
}
|
265 |
+
}
|
266 |
+
|
267 |
+
/* Custom scrollbar */
|
268 |
+
::-webkit-scrollbar {
|
269 |
+
width: 8px;
|
270 |
+
height: 8px;
|
271 |
+
}
|
272 |
+
|
273 |
+
::-webkit-scrollbar-track {
|
274 |
+
background: rgba(255, 255, 255, 0.05);
|
275 |
+
}
|
276 |
+
|
277 |
+
::-webkit-scrollbar-thumb {
|
278 |
+
background: rgba(255, 255, 255, 0.2);
|
279 |
+
border-radius: 4px;
|
280 |
+
}
|
281 |
+
|
282 |
+
::-webkit-scrollbar-thumb:hover {
|
283 |
+
background: rgba(255, 255, 255, 0.3);
|
284 |
+
}
|
285 |
+
|
286 |
+
/* Progress indicator styling */
|
287 |
+
.progress-step {
|
288 |
+
margin: 10px 0;
|
289 |
+
padding: 8px 12px;
|
290 |
+
border-radius: 8px;
|
291 |
+
background-color: rgba(58, 123, 213, 0.1);
|
292 |
+
transition: all 0.3s ease;
|
293 |
+
}
|
294 |
+
|
295 |
+
.progress-step.completed {
|
296 |
+
background-color: rgba(0, 210, 255, 0.15);
|
297 |
+
}
|
298 |
+
|
299 |
+
.progress-check {
|
300 |
+
color: #00d2ff;
|
301 |
+
margin-left: 8px;
|
302 |
+
}
|
303 |
+
|
304 |
+
/* Loading animation */
|
305 |
+
@keyframes pulse {
|
306 |
+
0% { opacity: 0.6; }
|
307 |
+
50% { opacity: 1; }
|
308 |
+
100% { opacity: 0.6; }
|
309 |
+
}
|
310 |
+
|
311 |
+
.loading-dot {
|
312 |
+
display: inline-block;
|
313 |
+
width: 8px;
|
314 |
+
height: 8px;
|
315 |
+
border-radius: 50%;
|
316 |
+
background-color: #3a7bd5;
|
317 |
+
margin: 0 2px;
|
318 |
+
animation: pulse 1.5s infinite;
|
319 |
+
}
|
320 |
+
|
321 |
+
.loading-dot:nth-child(2) {
|
322 |
+
animation-delay: 0.2s;
|
323 |
+
}
|
324 |
+
|
325 |
+
.loading-dot:nth-child(3) {
|
326 |
+
animation-delay: 0.4s;
|
327 |
+
}
|
328 |
+
|
329 |
+
/* Message content styling - improve readability */
|
330 |
+
.message-content {
|
331 |
+
line-height: 1.6;
|
332 |
+
font-size: 1rem;
|
333 |
+
}
|
334 |
+
|
335 |
+
/* Code blocks in messages */
|
336 |
+
.message-content pre {
|
337 |
+
background-color: rgba(0, 0, 0, 0.2);
|
338 |
+
border-radius: 8px;
|
339 |
+
padding: 12px;
|
340 |
+
overflow-x: auto;
|
341 |
+
font-family: 'Courier New', monospace;
|
342 |
+
font-size: 0.9rem;
|
343 |
+
}
|
344 |
+
|
345 |
+
.message-content code {
|
346 |
+
background-color: rgba(0, 0, 0, 0.2);
|
347 |
+
padding: 2px 4px;
|
348 |
+
border-radius: 4px;
|
349 |
+
font-family: 'Courier New', monospace;
|
350 |
+
font-size: 0.9em;
|
351 |
+
}
|
352 |
+
|
353 |
+
/* Improve citation styling */
|
354 |
+
.citation {
|
355 |
+
background-color: rgba(58, 123, 213, 0.2);
|
356 |
+
padding: 2px 5px;
|
357 |
+
border-radius: 4px;
|
358 |
+
font-weight: 500;
|
359 |
+
color: #b8cff5;
|
360 |
+
margin: 0 2px;
|
361 |
+
font-size: 0.9em;
|
362 |
+
}
|
363 |
+
|
364 |
+
/* Source list at the end of responses */
|
365 |
+
.source-list {
|
366 |
+
margin-top: 20px;
|
367 |
+
padding-top: 10px;
|
368 |
+
border-top: 1px solid rgba(255, 255, 255, 0.1);
|
369 |
+
}
|
370 |
+
|
371 |
+
.source-list ol {
|
372 |
+
margin-left: 20px;
|
373 |
+
padding-left: 10px;
|
374 |
+
}
|
375 |
+
|
376 |
+
.source-list li {
|
377 |
+
margin-bottom: 5px;
|
378 |
+
}
|
379 |
+
|
380 |
+
/* Make links more visible */
|
381 |
+
a {
|
382 |
+
color: #00d2ff;
|
383 |
+
text-decoration: none;
|
384 |
+
transition: all 0.2s ease;
|
385 |
+
}
|
386 |
+
|
387 |
+
a:hover {
|
388 |
+
text-decoration: underline;
|
389 |
+
color: #3a7bd5;
|
390 |
+
}
|
391 |
+
|
392 |
+
/* Add app logo/header styling */
|
393 |
+
.app-header {
|
394 |
+
display: flex;
|
395 |
+
align-items: center;
|
396 |
+
justify-content: center;
|
397 |
+
margin-bottom: 20px;
|
398 |
+
}
|
399 |
+
|
400 |
+
.app-logo {
|
401 |
+
width: 40px;
|
402 |
+
height: 40px;
|
403 |
+
margin-right: 10px;
|
404 |
+
}
|
405 |
+
|
406 |
+
/* Responsive adjustments */
|
407 |
+
@media (max-width: 768px) {
|
408 |
+
.container {
|
409 |
+
padding: 10px;
|
410 |
+
}
|
411 |
+
|
412 |
+
h1.title {
|
413 |
+
font-size: 1.8rem;
|
414 |
+
}
|
415 |
+
|
416 |
+
.chatbot-container {
|
417 |
+
height: 70vh;
|
418 |
+
}
|
419 |
+
|
420 |
+
.message {
|
421 |
+
max-width: 90%;
|
422 |
+
}
|
423 |
+
}
|
424 |
+
|
425 |
+
/* Chatbot container and message styling */
|
426 |
+
.gradio-container .prose {
|
427 |
+
max-width: 100% !important; /* Override max-width constraint */
|
428 |
+
}
|
429 |
+
|
430 |
+
/* Target the chatbot messages directly */
|
431 |
+
.chatbot .message {
|
432 |
+
padding: 15px !important;
|
433 |
+
border-radius: 12px !important;
|
434 |
+
margin-bottom: 12px !important;
|
435 |
+
box-shadow: 0 2px 8px rgba(0, 0, 0, 0.1) !important;
|
436 |
+
}
|
437 |
+
|
438 |
+
/* User messages */
|
439 |
+
.chatbot .user {
|
440 |
+
background-color: #304269 !important;
|
441 |
+
color: white !important;
|
442 |
+
border-bottom-right-radius: 2px !important;
|
443 |
+
}
|
444 |
+
|
445 |
+
/* Bot messages */
|
446 |
+
.chatbot .bot {
|
447 |
+
background-color: #2a2e35 !important;
|
448 |
+
color: #eaeaea !important;
|
449 |
+
border-bottom-left-radius: 2px !important;
|
450 |
+
}
|
451 |
+
|
452 |
+
/* Apply primary gradient to buttons */
|
453 |
+
button.primary {
|
454 |
+
background: linear-gradient(90deg, #3a7bd5, #00d2ff) !important;
|
455 |
+
color: white !important;
|
456 |
+
}
|
457 |
+
|
458 |
+
/* Style the chatbot container */
|
459 |
+
.chatbot-container > div {
|
460 |
+
border-radius: 12px !important;
|
461 |
+
background: rgba(31, 41, 55, 0.4) !important;
|
462 |
+
backdrop-filter: blur(10px) !important;
|
463 |
+
}
|
464 |
+
|
465 |
+
/* Fix scrollbar in chat */
|
466 |
+
.chatbot ::-webkit-scrollbar {
|
467 |
+
width: 8px !important;
|
468 |
+
}
|
469 |
+
|
470 |
+
.chatbot ::-webkit-scrollbar-track {
|
471 |
+
background: rgba(255, 255, 255, 0.05) !important;
|
472 |
+
}
|
473 |
+
|
474 |
+
.chatbot ::-webkit-scrollbar-thumb {
|
475 |
+
background: rgba(255, 255, 255, 0.2) !important;
|
476 |
+
border-radius: 4px !important;
|
477 |
+
}
|
478 |
+
|
479 |
+
/* Style the copy button */
|
480 |
+
.copy-button {
|
481 |
+
background-color: rgba(58, 123, 213, 0.2) !important;
|
482 |
+
color: #b8cff5 !important;
|
483 |
+
}
|
484 |
+
|
485 |
+
/* Fix mobile responsiveness */
|
486 |
+
@media (max-width: 640px) {
|
487 |
+
.gradio-container {
|
488 |
+
padding: 10px !important;
|
489 |
+
}
|
490 |
+
|
491 |
+
.container {
|
492 |
+
padding: 10px !important;
|
493 |
+
}
|
494 |
+
}
|
requirements.txt
ADDED
@@ -0,0 +1,7 @@
crewai>=0.11.0
crewai-tools
gradio>=3.50.0
python-dotenv>=1.0.0
duckduckgo-search>=3.9.0
beautifulsoup4>=4.12.0
requests>=2.31.0
pydantic>=2.0.0
research_engine.py
ADDED
@@ -0,0 +1,382 @@
import os
import json
import logging
import time
from typing import List, Dict, Any, Optional, Tuple, Union

from crewai import Crew
from crewai.agent import Agent
from crewai.task import Task

from agents import create_researcher_agent, create_analyst_agent, create_writer_agent
from tasks import (
    create_query_refinement_task,
    create_search_task,
    create_content_scraping_task,
    create_content_analysis_task,
    create_response_writing_task
)
from utils import is_valid_query, format_research_results, extract_citations

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

class ResearchEngine:
    """
    Main engine for web research using CrewAI.
    Orchestrates agents and tasks to provide comprehensive research results.
    """

    def __init__(self, llm=None, verbose=False):
        """
        Initialize the research engine.

        Args:
            llm: The language model to use for agents
            verbose: Whether to log detailed information
        """
        self.llm = llm
        self.verbose = verbose

        # Initialize agents
        logger.info("Initializing agents...")
        self.researcher = create_researcher_agent(llm=llm, verbose=verbose)
        self.analyst = create_analyst_agent(llm=llm, verbose=verbose)
        self.writer = create_writer_agent(llm=llm, verbose=verbose)

        # Chat history for maintaining conversation context
        self.chat_history = []

        logger.info("Research engine initialized with agents")

    def _validate_api_keys(self):
        """Validates that required API keys are present"""
        missing_keys = []

        if not os.getenv("BRAVE_API_KEY"):
            missing_keys.append("BRAVE_API_KEY")

        if not os.getenv("TAVILY_API_KEY"):
            missing_keys.append("TAVILY_API_KEY")

        if not os.getenv("OPENAI_API_KEY") and not self.llm:
            missing_keys.append("OPENAI_API_KEY or custom LLM")

        if missing_keys:
            logger.warning(f"Missing API keys: {', '.join(missing_keys)}")
            if "TAVILY_API_KEY" in missing_keys:
                logger.warning("Tavily API key is missing - search functionality may be limited")
            if "BRAVE_API_KEY" in missing_keys:
                logger.warning("Brave API key is missing - search functionality may be limited")

            # Only raise error if all search API keys are missing
            if "BRAVE_API_KEY" in missing_keys and "TAVILY_API_KEY" in missing_keys:
                raise ValueError(f"Missing required API keys: {', '.join(missing_keys)}")
        else:
            logger.info("All required API keys are present")

    def research(self, query: str, output_file=None) -> Dict[str, Any]:
        """
        Perform research on the given query.

        Args:
            query: The research query
            output_file: Optional file to save the research results

        Returns:
            Research results
        """
        logger.info(f"Research initiated for query: {query}")
        start_time = time.time()  # Initialize the start_time for tracking processing time

        try:
            self._validate_api_keys()
            logger.info(f"Starting research for query: {query}")

            # Add the query to chat history
            self.chat_history.append({"role": "user", "content": query})

            # Step 1: Initialize the crew
            logger.info("Initializing research crew...")
            crew = Crew(
                agents=[self.researcher],
                tasks=[create_query_refinement_task(self.researcher, query)],
                verbose=self.verbose,  # Use the instance's verbose setting
                process="sequential"
            )

            # Step 2: Start the research process
            logger.info("Starting research process...")
            refinement_result = crew.kickoff(inputs={"query": query})
            logger.info(f"Query refinement completed with result type: {type(refinement_result)}")
            logger.debug(f"Refinement result: {refinement_result}")

            # Extract the refined query
            refined_query = None
            try:
                logger.info(f"Attempting to extract refined query from result type: {type(refinement_result)}")

                # Handle CrewOutput object (new CrewAI format)
                if hasattr(refinement_result, '__class__') and refinement_result.__class__.__name__ == 'CrewOutput':
                    logger.info("Processing CrewOutput format refinement result")

                    # Try to access raw attribute first (contains the raw output)
                    if hasattr(refinement_result, 'raw'):
                        refined_query = self._extract_query_from_string(refinement_result.raw)
                        logger.info(f"Extracted from CrewOutput.raw: {refined_query}")

                    # Try to access as dictionary
                    elif hasattr(refinement_result, 'to_dict'):
                        crew_dict = refinement_result.to_dict()
                        logger.info(f"CrewOutput converted to dict: {crew_dict}")

                        if 'result' in crew_dict:
                            refined_query = self._extract_query_from_string(crew_dict['result'])
                            logger.info(f"Extracted from CrewOutput dict result: {refined_query}")

                    # Try string representation as last resort
                    else:
                        crew_str = str(refinement_result)
                        refined_query = self._extract_query_from_string(crew_str)
                        logger.info(f"Extracted from CrewOutput string representation: {refined_query}")

                # First try to access it as a dictionary (new CrewAI format)
                elif isinstance(refinement_result, dict):
                    logger.info("Processing dictionary format refinement result")
                    if "query" in refinement_result:
                        refined_query = refinement_result["query"]
                    elif "refined_query" in refinement_result:
                        refined_query = refinement_result["refined_query"]
                    elif "result" in refinement_result and isinstance(refinement_result["result"], str):
                        # Try to extract from nested result field
                        json_str = refinement_result["result"]
                        refined_query = self._extract_query_from_string(json_str)

                # Then try to access it as a string (old CrewAI format)
                elif isinstance(refinement_result, str):
                    logger.info("Processing string format refinement result")
                    refined_query = self._extract_query_from_string(refinement_result)

                else:
                    logger.warning(f"Unexpected refinement result format: {type(refinement_result)}")
                    # Try to extract information by examining the structure
                    try:
                        # Try to access common attributes
                        if hasattr(refinement_result, "result"):
                            result_str = str(getattr(refinement_result, "result"))
                            refined_query = self._extract_query_from_string(result_str)
                            logger.info(f"Extracted from .result attribute: {refined_query}")
                        elif hasattr(refinement_result, "task_output"):
                            task_output = getattr(refinement_result, "task_output")
                            refined_query = self._extract_query_from_string(str(task_output))
                            logger.info(f"Extracted from .task_output attribute: {refined_query}")
                        else:
                            # Last resort: convert to string and extract
                            refined_query = self._extract_query_from_string(str(refinement_result))
                            logger.info(f"Extracted from string representation: {refined_query}")
                    except Exception as attr_error:
                        logger.exception(f"Error trying to extract attributes: {attr_error}")
                        refined_query = query  # Fall back to original query

                logger.debug(f"Refinement result: {refinement_result}")
|
183 |
+
except Exception as e:
|
184 |
+
logger.exception(f"Error extracting refined query: {e}")
|
185 |
+
refined_query = query # Fall back to original query on error
|
186 |
+
|
187 |
+
if not refined_query or refined_query.strip() == "":
|
188 |
+
logger.warning("Refined query is empty, using original query")
|
189 |
+
refined_query = query
|
190 |
+
|
191 |
+
logger.info(f"Refined query: {refined_query}")
|
192 |
+
|
193 |
+
# Step 3: Create tasks for research process
|
194 |
+
logger.info("Creating research tasks...")
|
195 |
+
search_task = create_search_task(self.researcher, refined_query)
|
196 |
+
|
197 |
+
scrape_task = create_content_scraping_task(self.analyst, search_task)
|
198 |
+
|
199 |
+
analyze_task = create_content_analysis_task(self.analyst, refined_query, scrape_task)
|
200 |
+
|
201 |
+
write_task = create_response_writing_task(self.writer, refined_query, analyze_task)
|
202 |
+
|
203 |
+
# Step 4: Create a new crew for the research tasks
|
204 |
+
logger.info("Initializing main research crew...")
|
205 |
+
research_crew = Crew(
|
206 |
+
agents=[self.researcher, self.analyst, self.writer],
|
207 |
+
tasks=[search_task, scrape_task, analyze_task, write_task],
|
208 |
+
verbose=self.verbose, # Use the instance's verbose setting
|
209 |
+
process="sequential"
|
210 |
+
)
|
211 |
+
|
212 |
+
# Step 5: Start the research process
|
213 |
+
logger.info("Starting main research process...")
|
214 |
+
result = research_crew.kickoff()
|
215 |
+
logger.info(f"Research completed with result type: {type(result)}")
|
216 |
+
logger.debug(f"Research result: {result}")
|
217 |
+
|
218 |
+
# Extract the result
|
219 |
+
final_result = {"query": query, "refined_query": refined_query}
|
220 |
+
|
221 |
+
# Handle different result types
|
222 |
+
if isinstance(result, dict) and "result" in result:
|
223 |
+
final_result["result"] = result["result"]
|
224 |
+
# Handle CrewOutput object (new CrewAI format)
|
225 |
+
elif hasattr(result, '__class__') and result.__class__.__name__ == 'CrewOutput':
|
226 |
+
logger.info("Processing CrewOutput format result")
|
227 |
+
|
228 |
+
# Try to access raw attribute first (contains the raw output)
|
229 |
+
if hasattr(result, 'raw'):
|
230 |
+
final_result["result"] = result.raw
|
231 |
+
logger.info("Extracted result from CrewOutput.raw")
|
232 |
+
|
233 |
+
# Try to access as dictionary
|
234 |
+
elif hasattr(result, 'to_dict'):
|
235 |
+
crew_dict = result.to_dict()
|
236 |
+
if 'result' in crew_dict:
|
237 |
+
final_result["result"] = crew_dict['result']
|
238 |
+
logger.info("Extracted result from CrewOutput dict")
|
239 |
+
|
240 |
+
# Use string representation as last resort
|
241 |
+
else:
|
242 |
+
final_result["result"] = str(result)
|
243 |
+
logger.info("Used string representation of CrewOutput")
|
244 |
+
else:
|
245 |
+
# For any other type, use the string representation
|
246 |
+
final_result["result"] = str(result)
|
247 |
+
logger.info(f"Used string representation for result type: {type(result)}")
|
248 |
+
|
249 |
+
logger.info("Research process completed successfully")
|
250 |
+
|
251 |
+
# Save to file if requested
|
252 |
+
if output_file:
|
253 |
+
with open(output_file, 'w', encoding='utf-8') as f:
|
254 |
+
json.dump(final_result, f, ensure_ascii=False, indent=2)
|
255 |
+
|
256 |
+
# Extract citations for easy access (if possible from the final string)
|
257 |
+
citations = extract_citations(final_result["result"])
|
258 |
+
|
259 |
+
# Calculate total processing time
|
260 |
+
processing_time = time.time() - start_time
|
261 |
+
logger.info(f"Research completed successfully in {processing_time:.2f} seconds")
|
262 |
+
|
263 |
+
return {
|
264 |
+
"result": final_result["result"],
|
265 |
+
"success": True,
|
266 |
+
"refined_query": refined_query,
|
267 |
+
"citations": citations,
|
268 |
+
"processing_time": processing_time
|
269 |
+
}
|
270 |
+
except Exception as e:
|
271 |
+
logger.exception(f"Error in research process: {e}")
|
272 |
+
return {
|
273 |
+
"result": f"I encountered an error while researching your query: {str(e)}",
|
274 |
+
"success": False,
|
275 |
+
"reason": "research_error",
|
276 |
+
"error": str(e)
|
277 |
+
}
|
278 |
+
|
279 |
+
def chat(self, message: str) -> str:
|
280 |
+
"""
|
281 |
+
Handle a chat message, which could be a research query or a follow-up question.
|
282 |
+
|
283 |
+
Args:
|
284 |
+
message: The user's message
|
285 |
+
|
286 |
+
Returns:
|
287 |
+
The assistant's response
|
288 |
+
"""
|
289 |
+
# Treat all messages as new research queries for simplicity
|
290 |
+
try:
|
291 |
+
research_result = self.research(message)
|
292 |
+
return research_result["result"]
|
293 |
+
except Exception as e:
|
294 |
+
logger.exception(f"Error during research for message: {message}")
|
295 |
+
return f"I encountered an error while processing your request: {str(e)}"
|
296 |
+
|
297 |
+
def clear_history(self):
|
298 |
+
"""Clear the chat history"""
|
299 |
+
self.chat_history = []
|
300 |
+
|
301 |
+
def _extract_query_from_string(self, text: str) -> str:
|
302 |
+
"""
|
303 |
+
Extract refined query from text string, handling various formats including JSON embedded in strings.
|
304 |
+
|
305 |
+
Args:
|
306 |
+
text: The text to extract the query from
|
307 |
+
|
308 |
+
Returns:
|
309 |
+
The extracted query or None if not found
|
310 |
+
"""
|
311 |
+
if not text:
|
312 |
+
return None
|
313 |
+
|
314 |
+
# Log the input for debugging
|
315 |
+
logger.debug(f"Extracting query from: {text[:200]}...")
|
316 |
+
|
317 |
+
# Try to parse as JSON first
|
318 |
+
try:
|
319 |
+
# Check if the entire string is valid JSON
|
320 |
+
json_data = json.loads(text)
|
321 |
+
|
322 |
+
# Check for known keys in the parsed JSON
|
323 |
+
if isinstance(json_data, dict):
|
324 |
+
if "refined_query" in json_data:
|
325 |
+
return json_data["refined_query"]
|
326 |
+
elif "query" in json_data:
|
327 |
+
return json_data["query"]
|
328 |
+
elif "result" in json_data and isinstance(json_data["result"], str):
|
329 |
+
# Try to recursively extract from nested result
|
330 |
+
return self._extract_query_from_string(json_data["result"])
|
331 |
+
except json.JSONDecodeError:
|
332 |
+
# Not valid JSON, continue with string parsing
|
333 |
+
pass
|
334 |
+
|
335 |
+
# Look for JSON blocks in the string
|
336 |
+
try:
|
337 |
+
import re
|
338 |
+
# Match both markdown JSON blocks and regular JSON objects
|
339 |
+
json_pattern = r'```(?:json)?\s*({[^`]*})```|({[\s\S]*})'
|
340 |
+
json_matches = re.findall(json_pattern, text, re.DOTALL)
|
341 |
+
|
342 |
+
for json_match in json_matches:
|
343 |
+
# Handle tuple result from findall with multiple capture groups
|
344 |
+
json_str = next((s for s in json_match if s), '')
|
345 |
+
try:
|
346 |
+
json_data = json.loads(json_str)
|
347 |
+
if isinstance(json_data, dict):
|
348 |
+
if "refined_query" in json_data:
|
349 |
+
return json_data["refined_query"]
|
350 |
+
elif "query" in json_data:
|
351 |
+
return json_data["query"]
|
352 |
+
except Exception:
|
353 |
+
continue
|
354 |
+
except Exception as e:
|
355 |
+
logger.debug(f"Error parsing JSON blocks: {e}")
|
356 |
+
|
357 |
+
# Check for common patterns in CrewAI output format
|
358 |
+
patterns = [
|
359 |
+
r'refined query[:\s]+([^\n]+)',
|
360 |
+
r'query[:\s]+([^\n]+)',
|
361 |
+
r'search(?:ed)? for[:\s]+[\'"]([^\'"]+)[\'"]',
|
362 |
+
r'search(?:ing)? for[:\s]+[\'"]([^\'"]+)[\'"]',
|
363 |
+
r'research(?:ing)? (?:about|on)[:\s]+[\'"]([^\'"]+)[\'"]',
|
364 |
+
r'query is[:\s]+[\'"]([^\'"]+)[\'"]'
|
365 |
+
]
|
366 |
+
|
367 |
+
for pattern in patterns:
|
368 |
+
try:
|
369 |
+
match = re.search(pattern, text.lower())
|
370 |
+
if match:
|
371 |
+
return match.group(1).strip()
|
372 |
+
except Exception as e:
|
373 |
+
logger.debug(f"Error matching pattern {pattern}: {e}")
|
374 |
+
|
375 |
+
# Fall back to string parsing methods
|
376 |
+
if "refined query:" in text.lower():
|
377 |
+
return text.split("refined query:", 1)[1].strip()
|
378 |
+
elif "query:" in text.lower():
|
379 |
+
return text.split("query:", 1)[1].strip()
|
380 |
+
|
381 |
+
# If all else fails, return the whole string
|
382 |
+
return text
|
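
A minimal quick-start sketch, not part of this commit: it assumes the keys from .env.example are set and shows the shape of the dictionary that research() returns (result, success, refined_query, citations, processing_time). The query string is a placeholder.

```python
# Hypothetical usage sketch; requires OPENAI_API_KEY plus at least one of
# BRAVE_API_KEY / TAVILY_API_KEY in the environment.
from research_engine import ResearchEngine

engine = ResearchEngine(verbose=False)
report = engine.research("impact of solar storms on GPS accuracy")

if report["success"]:
    print("Refined query:", report["refined_query"])
    print(report["result"])
    print(f"{len(report['citations'])} citations in {report['processing_time']:.1f}s")
else:
    print("Research failed:", report.get("error"))
```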
run_app.py
ADDED
@@ -0,0 +1,46 @@
"""
Run script for the Web Research Agent with error handling
"""
import os
import sys
import traceback

# Ensure assets directory exists
os.makedirs("assets", exist_ok=True)

try:
    # Try importing gradio first to check version and availability
    import gradio as gr
    print(f"Using Gradio version: {gr.__version__}")

    # Then run the main app
    from app import app

    # Launch the app with debugging enabled
    app.launch(share=False, debug=True)  # Enable debug mode to see error traces

except ImportError as e:
    print("Error: Missing required packages.")
    print(f"Details: {e}")
    print("\nPlease install the required packages:")
    print("pip install -r requirements.txt")
    sys.exit(1)

except Exception as e:
    print(f"Error: {e}")
    print("\nTraceback:")
    traceback.print_exc()

    # Special handling for common Gradio errors
    if "got an unexpected keyword argument" in str(e):
        print("\nThis appears to be an issue with Gradio version compatibility.")
        print("The app is trying to use features not available in your installed Gradio version.")
        print("\nTry updating Gradio:")
        print("pip install --upgrade gradio")
    elif "CrewOutput" in str(e) or "dict object" in str(e):
        print("\nThis appears to be an issue with CrewAI output format.")
        print("The app is having trouble processing CrewAI outputs.")
        print("\nTry updating CrewAI:")
        print("pip install --upgrade crewai crewai-tools")

    sys.exit(1)
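
If the Space needs an explicit host and port (common when running inside a container), the launch call could be adjusted as sketched below. server_name and server_port are standard Gradio launch() parameters; the specific values are assumptions, not part of this commit.

```python
# Hypothetical variant of the launch call for container deployments.
from app import app

app.launch(
    share=False,
    debug=False,             # disable debug traces outside local development
    server_name="0.0.0.0",   # listen on all interfaces inside the container
    server_port=7860         # Gradio's default port on Hugging Face Spaces
)
```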
search_test.py
ADDED
@@ -0,0 +1,88 @@
import os
import sys
from dotenv import load_dotenv
from crewai_tools import BraveSearchTool
from tools import TavilySearchTool, RateLimitedToolWrapper, SearchRotationTool

# Load environment variables
load_dotenv()

def validate_api_keys():
    """Checks if required API keys are set"""
    missing_keys = []

    if not os.getenv("BRAVE_API_KEY"):
        missing_keys.append("BRAVE_API_KEY")

    if not os.getenv("TAVILY_API_KEY"):
        missing_keys.append("TAVILY_API_KEY")

    return missing_keys

def main():
    # Check for API keys
    missing_keys = validate_api_keys()
    if missing_keys:
        print(f"Error: Missing required API keys: {', '.join(missing_keys)}")
        print("Please set these in your .env file.")
        sys.exit(1)

    # Initialize search tools
    brave_search_tool = BraveSearchTool(
        n_results=3,
        save_file=False
    )

    tavily_search_tool = TavilySearchTool(
        max_results=3,
        search_depth="basic"
    )

    # Add rate limiting to each search tool
    rate_limited_brave_search = RateLimitedToolWrapper(tool=brave_search_tool, delay=10)  # Reduced delay for testing
    rate_limited_tavily_search = RateLimitedToolWrapper(tool=tavily_search_tool, delay=10)  # Reduced delay for testing

    # Create the search rotation tool
    search_rotation_tool = SearchRotationTool(
        search_tools=[rate_limited_brave_search, rate_limited_tavily_search],
        max_searches_per_query=5
    )

    # Get user query
    if len(sys.argv) > 1:
        query = " ".join(sys.argv[1:])
    else:
        query = input("Enter your search query: ")

    # Perform searches
    print(f"Searching for: '{query}'")
    print("Will perform up to 5 searches using Brave and Tavily in rotation")
    print("-" * 50)

    # First search
    result1 = search_rotation_tool.run(query)
    print(result1)
    print("\n" + "-" * 50)

    # Modified query
    modified_query = f"{query} recent news"
    print(f"Searching for modified query: '{modified_query}'")

    # Second search
    result2 = search_rotation_tool.run(modified_query)
    print(result2)
    print("\n" + "-" * 50)

    # Try exceeding the limit with multiple searches for the same query
    print(f"Attempting additional searches for: '{query}'")

    for i in range(4):
        print(f"\nAttempt {i+1}:")
        result = search_rotation_tool.run(query)
        print(result)
        print("-" * 50)

    print("\nTest complete!")

if __name__ == "__main__":
    main()
tasks.py
ADDED
@@ -0,0 +1,157 @@
from typing import Dict, List, Any
from crewai import Task
from crewai import Agent
from datetime import datetime

def create_query_refinement_task(researcher_agent: Agent, query: str) -> Task:
    """
    Creates a task for refining the user's query to optimize search results.

    Args:
        researcher_agent: The researcher agent to perform the task
        query: The original user query

    Returns:
        Task for query refinement
    """
    return Task(
        description=(
            f"Given the user query: '{query}', refine it to create an effective search query. Today is {datetime.now().strftime('%Y-%m-%d')}. "
            f"Consider adding specificity, removing ambiguity, and using precise terms, but don't add anything that's not relevant to the query "
            f"(i.e. if you don't know the meaning of an abbreviation, don't try to expand it). "
            f"If the query is invalid (just emojis, random numbers, gibberish, etc.), "
            f"flag it as invalid. Otherwise, return both the original query and your refined version. "
            f"Don't add any extra information to the query; just refine it. "
            f"For technical queries, don't try to turn the query into a question. "
            f"Understand the user's intent and make the query more specific and accurate without adding anything that would change its meaning."
        ),
        expected_output=(
            "Return your response in a structured format like this:\n"
            "```json\n"
            "{\n"
            '  "original_query": "original query here",\n'
            '  "refined_query": "improved query here",\n'
            '  "reasoning": "brief explanation of your refinements"\n'
            "}\n"
            "```\n\n"
            "Or if the query is invalid, return:\n"
            "```json\n"
            "{\n"
            '  "is_valid": false,\n'
            '  "reason": "explanation why the query is invalid"\n'
            "}\n"
            "```"
        ),
        agent=researcher_agent
    )

def create_search_task(researcher_agent: Agent, query: str) -> Task:
    """
    Creates a task for performing web search with the refined query.

    Args:
        researcher_agent: The researcher agent to perform the task
        query: The refined query to search

    Returns:
        Task for web search
    """
    return Task(
        description=(
            f"Using the refined query: '{query}', search the web to find the most relevant "
            f"and reliable information. Return a comprehensive list of search results, "
            f"including titles, snippets, and URLs. Focus on finding high-quality sources."
        ),
        expected_output=(
            "A JSON list of search results containing: "
            "1. Title of the page "
            "2. URL "
            "3. Snippet or description"
        ),
        agent=researcher_agent
    )

def create_content_scraping_task(analyst_agent: Agent, search_results: List[Dict[str, Any]]) -> Task:
    """
    Creates a task for scraping content from search result URLs.

    Args:
        analyst_agent: The analyst agent to perform the task
        search_results: The search results to scrape

    Returns:
        Task for content scraping
    """
    urls = [result.get("link", "") for result in search_results if "link" in result]
    urls_str = "\n".join(urls)

    return Task(
        description=(
            f"Scrape the content from these URLs:\n{urls_str}\n\n"
            f"For each URL, extract the main content, focusing on text relevant to the search query. "
            f"Ignore navigation elements, ads, and other irrelevant page components."
        ),
        expected_output=(
            "A JSON dictionary mapping each URL to its scraped content. For each URL, provide: "
            "1. The URL as the key "
            "2. The extracted content as the value"
        ),
        agent=analyst_agent
    )

def create_content_analysis_task(analyst_agent: Agent, query: str, scraped_contents: Dict[str, str]) -> Task:
    """
    Creates a task for analyzing and evaluating scraped content.

    Args:
        analyst_agent: The analyst agent to perform the task
        query: The original or refined query
        scraped_contents: Dict mapping URLs to scraped content

    Returns:
        Task for content analysis
    """
    return Task(
        description=(
            f"Analyze the relevance and factuality of the scraped content in relation to the query: '{query}'\n\n"
            f"For each piece of content, evaluate: "
            f"1. Relevance to the query (score 0-10) "
            f"2. Factual accuracy (score 0-10) "
            f"3. Filter out low-quality or irrelevant information"
        ),
        expected_output=(
            "A JSON dictionary with analysis for each URL containing: "
            "1. Relevance score (0-10) "
            "2. Factuality score (0-10) "
            "3. Filtered content (removing irrelevant parts) "
            "4. Brief analysis explaining your judgment"
        ),
        agent=analyst_agent
    )

def create_response_writing_task(writer_agent: Agent, query: str, analyzed_contents: Dict[str, Dict[str, Any]]) -> Task:
    """
    Creates a task for writing a comprehensive response based on analyzed content.

    Args:
        writer_agent: The writer agent to perform the task
        query: The original query
        analyzed_contents: Dict mapping URLs to analysis results

    Returns:
        Task for response writing
    """
    return Task(
        description=(
            f"Write a comprehensive response to the query: '{query}'\n\n"
            f"Use the analyzed content to craft a well-structured, informative response that directly "
            f"answers the user's query. Include proper citations for all information using [1], [2] format. "
            f"Focus on clarity, factual accuracy, and addressing all aspects of the query."
        ),
        expected_output=(
            "A comprehensive response that: "
            "1. Directly answers the user's query "
            "2. Uses information from the provided sources "
            "3. Includes citations in [1], [2] format for all factual information "
            "4. Provides a list of sources at the end"
        ),
        agent=writer_agent
    )
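
The task factories are meant to be chained: refinement first, then search, scrape, analyze, and write in a sequential crew, which is how research_engine.py uses them. A condensed sketch of that flow is below; the agent factory names are assumed to come from agents.py (they are the ones research_engine.py calls) and the query string is a placeholder.

```python
# Condensed sketch mirroring the chaining in research_engine.py; not a file in this commit.
from crewai import Crew
from agents import create_researcher_agent, create_analyst_agent, create_writer_agent
from tasks import (create_search_task, create_content_scraping_task,
                   create_content_analysis_task, create_response_writing_task)

researcher = create_researcher_agent(llm=None, verbose=False)
analyst = create_analyst_agent(llm=None, verbose=False)
writer = create_writer_agent(llm=None, verbose=False)

refined_query = "history of the Brave Search API"  # placeholder refined query
search_task = create_search_task(researcher, refined_query)
scrape_task = create_content_scraping_task(analyst, search_task)
analyze_task = create_content_analysis_task(analyst, refined_query, scrape_task)
write_task = create_response_writing_task(writer, refined_query, analyze_task)

crew = Crew(
    agents=[researcher, analyst, writer],
    tasks=[search_task, scrape_task, analyze_task, write_task],
    process="sequential"
)
print(crew.kickoff())
```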
tools/__init__.py
ADDED
@@ -0,0 +1,11 @@
from .search_rotation import SearchRotationTool
from .content_analyzer import ContentAnalyzerTool
from .rate_limited_tool import RateLimitedToolWrapper
from .tavily_search import TavilySearchTool

__all__ = [
    'SearchRotationTool',
    'ContentAnalyzerTool',
    'RateLimitedToolWrapper',
    'TavilySearchTool'
]
tools/__pycache__/__init__.cpython-311.pyc
ADDED
Binary file (522 Bytes)
tools/__pycache__/content_analyzer.cpython-311.pyc
ADDED
Binary file (4.54 kB)
tools/__pycache__/rate_limited_tool.cpython-311.pyc
ADDED
Binary file (5.52 kB)
tools/__pycache__/search_rotation.cpython-311.pyc
ADDED
Binary file (13.6 kB)
tools/__pycache__/tavily_search.cpython-311.pyc
ADDED
Binary file (7.28 kB)
tools/content_analyzer.py
ADDED
@@ -0,0 +1,98 @@
from typing import Optional, Dict, Any, Type
from crewai.tools import BaseTool
from pydantic import Field, BaseModel

# Define the input schema as a separate class
class ContentAnalyzerArgs(BaseModel):
    query: str = Field(
        ...,
        description="The search query to compare content against"
    )
    content: str = Field(
        ...,
        description="The content to analyze for relevance and factuality"
    )

class ContentAnalyzerTool(BaseTool):
    """
    A tool for analyzing content relevance and factuality.
    This tool uses an LLM to judge the relevance and factual accuracy of content
    in relation to a specific query.
    """

    name: str = Field(
        default="Content Analyzer",
        description="Name of the content analysis tool"
    )
    description: str = Field(
        default=(
            "Use this tool to analyze the relevance and factuality of content "
            "in relation to a specific query. "
            "It helps filter out irrelevant or potentially non-factual information."
        ),
        description="Description of what the content analyzer does"
    )

    # Define args_schema as a class attribute
    args_schema: Type[BaseModel] = ContentAnalyzerArgs

    def _run(self, query: str, content: str) -> Dict[str, Any]:
        """
        Analyze the content for relevance and factuality.

        Args:
            query: The original search query
            content: The content to analyze

        Returns:
            Dict with analysis results including:
            - relevance_score: A score from 0-10 indicating relevance
            - factuality_score: A score from 0-10 indicating factual reliability
            - filtered_content: The processed content with irrelevant parts removed
            - analysis: Brief explanation of the judgment
        """
        # The actual implementation will use the agent's LLM
        # via CrewAI's mechanism, returning the placeholders
        # for now which will be replaced during execution
        prompt = f"""
        You are a strict content judge evaluating web search results.

        QUERY: {query}
        CONTENT: {content}

        Analyze the content above with these criteria:
        1. Relevance to the query (score 0-10)
        2. Factual accuracy and reliability (score 0-10)
        3. Information quality

        For content scoring below 5 on relevance, discard it entirely.
        For content with factuality concerns, flag these specifically.

        PROVIDE YOUR ANALYSIS IN THIS FORMAT:
        {{
            "relevance_score": [0-10],
            "factuality_score": [0-10],
            "filtered_content": "The filtered and cleaned content, removing irrelevant parts",
            "analysis": "Brief explanation of your judgment"
        }}

        ONLY RETURN THE JSON, nothing else.
        """

        # This method will be handled by CrewAI's internal mechanism
        # For placeholder purposes during direct testing, we return example data.
        # In a real CrewAI run, the agent's LLM would process the prompt.
        return {
            "relevance_score": 7,  # Placeholder
            "factuality_score": 8,  # Placeholder
            "filtered_content": content,  # Placeholder
            "analysis": "This is a placeholder analysis. The real analysis will be performed during execution."
        }

    class Config:
        """Pydantic config for the tool"""
        arbitrary_types_allowed = True

    def run(self, query: str, content: str) -> Dict[str, Any]:
        """Public method to run content analysis"""
        return self._run(query, content)
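
Because _run currently returns placeholder scores (the real judgment is delegated to the agent's LLM at execution time), calling the tool directly just echoes the content back with fixed scores. A small sketch of a direct call, with made-up query and content strings:

```python
# Direct-call sketch; with the placeholder _run this prints the fixed scores 7 and 8.
from tools import ContentAnalyzerTool

analyzer = ContentAnalyzerTool()
analysis = analyzer.run(
    query="effects of caffeine on sleep",
    content="Caffeine has a half-life of roughly five hours in healthy adults..."
)
print(analysis["relevance_score"], analysis["factuality_score"])
print(analysis["analysis"])
```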
tools/rate_limited_tool.py
ADDED
@@ -0,0 +1,86 @@
import time
from typing import Any, Dict, Optional, Type
from crewai.tools import BaseTool
from pydantic import BaseModel, Field, model_validator, create_model
import logging

logger = logging.getLogger(__name__)

class RateLimitedToolWrapper(BaseTool):
    """
    A wrapper tool that adds an optional time delay after executing another tool.
    Useful for enforcing rate limits on API calls or simply adding a pause.
    It also ensures that arguments are correctly passed to the wrapped tool.
    """
    name: str = Field(
        default="Rate Limited Tool Wrapper",
        description="A tool that wraps another tool to add a delay after execution"
    )
    description: str = Field(
        default="Wraps another tool to add a delay after execution, enforcing rate limits.",
        description="The tool's description that will be passed to the agent"
    )
    tool: BaseTool = Field(
        ...,
        description="The tool to be wrapped with rate limiting"
    )
    delay: float = Field(
        default=0.0,
        description="Delay in seconds to wait after tool execution (0 means no delay)",
        ge=0.0
    )

    # Create a simple args schema for fallback
    class RateLimitedToolArgs(BaseModel):
        query: str = Field(..., description="The search query to pass to the wrapped tool")

    def __init__(self, **data):
        # Store the original args_schema if available
        tool = data.get('tool')

        # Set args_schema directly in data before initialization
        if tool and hasattr(tool, 'args_schema') and tool.args_schema is not None:
            if isinstance(tool.args_schema, type) and issubclass(tool.args_schema, BaseModel):
                data['args_schema'] = tool.args_schema
            else:
                data['args_schema'] = self.RateLimitedToolArgs
        else:
            data['args_schema'] = self.RateLimitedToolArgs

        super().__init__(**data)

    def _run(self, query: str) -> str:
        """
        Run the wrapped tool with the query parameter and then pause for the specified delay.

        Args:
            query: The query string to pass to the wrapped tool.

        Returns:
            The result from the wrapped tool.
        """
        logger.debug(f"RateLimitedToolWrapper: Running tool '{self.tool.name}' with query: {query}")

        try:
            # Call the tool's run method with the query
            result = self.tool.run(query)

        except Exception as e:
            logger.error(f"Exception running wrapped tool '{self.tool.name}': {e}")
            # Fall back to trying the _run method directly if the run method fails
            try:
                if hasattr(self.tool, '_run'):
                    logger.warning(f"Falling back to direct _run call for tool '{self.tool.name}'")
                    result = self.tool._run(query)
                else:
                    raise e
            except Exception as inner_e:
                logger.error(f"Fallback also failed for tool '{self.tool.name}': {inner_e}")
                raise inner_e

        # Enforce the delay only if greater than 0
        if self.delay > 0:
            logger.info(f"Rate limit enforced: Waiting {self.delay:.2f} seconds after running {self.tool.name}.")
            time.sleep(self.delay)

        return result
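
The wrapper forwards the wrapped tool's args_schema (or falls back to a plain query schema) and sleeps after each call. A short sketch wrapping the Tavily tool with a 60-second pause, roughly matching the 1-request-per-minute Brave free tier noted in .env.example; the query is a placeholder:

```python
# Sketch: wrap a search tool so every call is followed by a fixed pause.
from tools import TavilySearchTool, RateLimitedToolWrapper

tavily = TavilySearchTool(max_results=3, search_depth="basic")
slow_tavily = RateLimitedToolWrapper(tool=tavily, delay=60)  # 60 s pause after each call

print(slow_tavily.run("latest CrewAI release notes"))  # returns results, then sleeps
```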
tools/search_rotation.py
ADDED
@@ -0,0 +1,246 @@
import random
import time
from typing import List, Dict, Any, Optional, Type
from crewai.tools import BaseTool
from pydantic import BaseModel, Field

class SearchRotationArgs(BaseModel):
    """Input schema for SearchRotationTool."""
    query: str = Field(..., description="The search query to look up")

class SearchRotationTool(BaseTool):
    """
    Tool for rotating between multiple search engines with a limit on searches per query.

    This tool alternates between different search engines and enforces a maximum
    number of searches per query to manage API usage and costs.
    """
    name: str = Field(
        default="Web Search Rotation",
        description="Search the internet using multiple search engines in rotation"
    )
    description: str = Field(
        default="Use this tool to search for information on the internet using different search engines in rotation.",
        description="Description of the search rotation tool"
    )

    search_tools: List[BaseTool] = Field(
        default=[],
        description="List of search tools to rotate between"
    )
    max_searches_per_query: int = Field(
        default=5,
        description="Maximum number of searches allowed per query"
    )
    cache_timeout: int = Field(
        default=300,  # 5 minutes
        description="How long to cache results for similar queries in seconds"
    )

    args_schema: Type[BaseModel] = SearchRotationArgs

    def __init__(self, **data):
        super().__init__(**data)
        if not self.search_tools:
            raise ValueError("At least one search tool must be provided")
        self._search_count = 0
        self._current_search_query = None
        self._last_used_tool = None
        self._cache = {}  # Simple cache for recent queries
        self._last_search_time = {}  # Track when each tool was last used

        # Log available search tools
        tool_names = [tool.name for tool in self.search_tools]
        print(f"SearchRotationTool initialized with tools: {', '.join(tool_names)}")

    def _run(self, query: str) -> str:
        """
        Execute a web search using a rotation of search engines.

        Args:
            query: The search query to look up

        Returns:
            String containing the search results
        """
        print(f"SearchRotationTool executing search for: '{query}'")

        # Check cache first for very similar queries
        for cached_query, (timestamp, result) in list(self._cache.items()):
            # Simple similarity check - if query is very similar to a cached query
            if self._is_similar_query(query, cached_query):
                # Check if cache is still valid
                if time.time() - timestamp < self.cache_timeout:
                    print(f"Using cached result for similar query: '{cached_query}'")
                    return f"{result}\n\n[Cached result from similar query: '{cached_query}']"
                else:
                    # Remove expired cache entries to prevent cache bloat
                    print(f"Cache expired for query: '{cached_query}'")
                    self._cache.pop(cached_query, None)

        # Reset counter if this is a new query
        if not self._is_similar_query(self._current_search_query, query):
            print(f"New search query detected. Resetting search count.")
            self._current_search_query = query
            self._search_count = 0

        # Check if we've reached the search limit
        if self._search_count >= self.max_searches_per_query:
            print(f"Search limit reached ({self._search_count}/{self.max_searches_per_query})")
            return (f"Search limit reached. You've performed {self._search_count} searches "
                    f"for this query. Maximum allowed is {self.max_searches_per_query}.")

        # Select the most appropriate search tool based on usage and delay
        search_tool = self._select_optimal_tool()
        print(f"Selected search tool: {search_tool.name}")

        # Keep track of which tools we've tried for this specific search attempt
        tried_tools = set()
        max_retry_attempts = min(3, len(self.search_tools))
        retry_count = 0

        while retry_count < max_retry_attempts:
            tried_tools.add(search_tool.name)

            try:
                # Execute the search
                print(f"Using Tool: {search_tool.name}")
                start_time = time.time()
                result = search_tool.run(query)
                search_time = time.time() - start_time

                # Basic validation of result - check if it's empty or error message
                if not result or "error" in result.lower() or len(result.strip()) < 20:
                    # Result might be invalid, try another tool if available
                    print(f"Invalid or error result from {search_tool.name}. Trying another tool.")
                    retry_count += 1
                    search_tool = self._select_next_tool(tried_tools)
                    if not search_tool:  # No more tools to try
                        print("All search tools failed. No more tools to try.")
                        return "All search tools failed to provide meaningful results for this query."
                    continue

                # Valid result obtained
                print(f"Valid result obtained from {search_tool.name} in {search_time:.2f}s")

                # Update tracking
                self._last_used_tool = search_tool
                self._last_search_time[search_tool.name] = time.time()

                # Cache the result
                self._cache[query] = (time.time(), result)

                # Increment the counter
                self._search_count += 1
                print(f"Search count incremented to {self._search_count}/{self.max_searches_per_query}")

                # Add usage information
                searches_left = self.max_searches_per_query - self._search_count
                usage_info = f"\n\nSearch performed using {search_tool.name} in {search_time:.2f}s. "
                usage_info += f"Searches used: {self._search_count}/{self.max_searches_per_query}. "
                usage_info += f"Searches remaining: {max(0, searches_left)}."

                return f"{result}\n{usage_info}"

            except Exception as e:
                # If this search tool fails, try another one
                print(f"Exception in {search_tool.name}: {str(e)}")
                retry_count += 1
                search_tool = self._select_next_tool(tried_tools)
                if not search_tool:  # No more tools to try
                    print("All search tools failed with exceptions. No more tools to try.")
                    return f"Error searching with all available search engines: {str(e)}"

        # If we've exhausted our retry attempts
        print(f"Failed after {retry_count} retry attempts")
        return "Failed to get search results after multiple attempts with different search engines."

    def _select_next_tool(self, tried_tools: set) -> Optional[BaseTool]:
        """Select the next tool that hasn't been tried yet."""
        available_tools = [t for t in self.search_tools if t.name not in tried_tools]
        if not available_tools:
            return None

        # Sort by last used time (oldest first) if we have that data
        if self._last_search_time:
            available_tools.sort(key=lambda t: self._last_search_time.get(t.name, 0))

        return available_tools[0] if available_tools else None

    def _select_optimal_tool(self) -> BaseTool:
        """Select the best tool based on recent usage patterns."""
        current_time = time.time()

        # If we have no history or all tools used recently, pick randomly with weights
        if not self._last_used_tool or not self._last_search_time:
            return random.choice(self.search_tools)

        # Try to avoid using the same tool twice in a row
        available_tools = [t for t in self.search_tools if t != self._last_used_tool]

        # If we have multiple tools available, choose the one used least recently
        if available_tools:
            # Sort by last used time (oldest first)
            available_tools.sort(key=lambda t: self._last_search_time.get(t.name, 0))
            return available_tools[0]

        # If only one tool available, use it
        return self.search_tools[0]

    def _is_similar_query(self, query1, query2):
        """Check if two queries are similar enough to use cached results."""
        if not query1 or not query2:
            return False

        # Convert to lowercase and remove common filler words
        q1 = query1.lower()
        q2 = query2.lower()

        # If the strings are identical
        if q1 == q2:
            return True

        # Remove common filler words to focus on meaningful terms
        filler_words = {'the', 'a', 'an', 'and', 'or', 'but', 'is', 'are', 'was', 'were',
                        'in', 'on', 'at', 'to', 'for', 'with', 'by', 'about', 'like',
                        'through', 'over', 'before', 'between', 'after', 'since', 'without',
                        'under', 'within', 'along', 'following', 'across', 'behind',
                        'beyond', 'plus', 'except', 'but', 'up', 'down', 'off', 'on', 'me', 'you'}

        # Clean and tokenize
        def clean_and_tokenize(q):
            # Remove punctuation
            q = ''.join(c for c in q if c.isalnum() or c.isspace())
            # Tokenize
            tokens = q.split()
            # Remove filler words
            return {word for word in tokens if word.lower() not in filler_words and len(word) > 1}

        words1 = clean_and_tokenize(q1)
        words2 = clean_and_tokenize(q2)

        # If either query has no significant words after cleaning, they're not similar
        if not words1 or not words2:
            return False

        # Calculate Jaccard similarity
        intersection = len(words1.intersection(words2))
        union = len(words1.union(words2))

        # If the queries are short, we require more overlap
        min_words = min(len(words1), len(words2))
        max_words = max(len(words1), len(words2))

        # For short queries, use strict similarity threshold
        if min_words <= 3:
            # For very short queries, require almost exact match
            return intersection / union > 0.8
        # For normal length queries
        elif min_words <= 6:
            return intersection / union > 0.7
        # For longer queries
        else:
            # Check both Jaccard similarity and absolute intersection size
            # For long queries, having many words in common is important
            absolute_overlap_threshold = min(5, min_words // 2)
            return (intersection / union > 0.6) or (intersection >= absolute_overlap_threshold)
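
The caching logic keys on the Jaccard overlap of the non-filler words, with stricter thresholds for shorter queries, so reworded variants of the same question are served from the in-memory cache instead of spending another API call. A sketch of that behaviour, assuming a valid TAVILY_API_KEY and using placeholder queries:

```python
# Sketch of the cache behaviour: the second, similarly worded query is answered
# from the cache while the entry is still within cache_timeout.
from tools import TavilySearchTool, SearchRotationTool

rotation = SearchRotationTool(
    search_tools=[TavilySearchTool(max_results=3)],
    max_searches_per_query=5,
    cache_timeout=300,
)

first = rotation.run("python 3.13 new features")
second = rotation.run("new features in python 3.13")  # same significant words
print("[Cached result" in second)                      # True while the entry is fresh
```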
tools/tavily_search.py
ADDED
@@ -0,0 +1,139 @@
import os
import requests
import time
import hashlib
from typing import Dict, Any, Optional, Type
from crewai.tools import BaseTool
from pydantic import BaseModel, Field

class TavilySearchArgs(BaseModel):
    """Input schema for TavilySearchTool."""
    query: str = Field(..., description="The search query to look up")

class TavilySearchTool(BaseTool):
    """
    Tool for performing web searches using the Tavily Search API.

    This tool sends a search query to Tavily and returns relevant search results.
    """
    name: str = Field(
        default="Tavily Web Search",
        description="Search the internet using Tavily"
    )
    description: str = Field(
        default="Use this tool to search for information on the internet using Tavily Search API.",
        description="Description of the Tavily search tool"
    )

    api_key: Optional[str] = Field(
        default=None,
        description="Tavily API key. If not provided, will look for TAVILY_API_KEY environment variable"
    )
    search_depth: str = Field(
        default="basic",
        description="The depth of the search, 'basic' or 'advanced'"
    )
    max_results: int = Field(
        default=5,
        description="Maximum number of search results to return (1-10)"
    )
    include_answer: bool = Field(
        default=False,
        description="Whether to include an AI-generated answer in the response"
    )
    timeout: int = Field(
        default=10,
        description="Timeout for the API request in seconds"
    )

    args_schema: Type[BaseModel] = TavilySearchArgs

    def __init__(self, **data):
        super().__init__(**data)
        self.api_key = self.api_key or os.getenv("TAVILY_API_KEY")
        if not self.api_key:
            print("WARNING: Tavily API key is missing. The tool will return an error message when used.")
        self._cache = {}  # Simple in-memory cache

    def _run(self, query: str) -> str:
        """
        Execute a web search using Tavily.

        Args:
            query: The search query to look up

        Returns:
            String containing the search results
        """
        # Check if API key is missing
        if not self.api_key:
            return (
                "ERROR: Tavily API key is missing. Please set the TAVILY_API_KEY environment variable. "
                "Search cannot be performed without a valid API key."
            )

        # Check cache first
        cache_key = self._get_cache_key(query)
        if cache_key in self._cache:
            timestamp, result = self._cache[cache_key]
            # Cache valid for 30 minutes
            if time.time() - timestamp < 1800:
                return f"{result}\n\n[Cached Tavily result]"

        url = "https://api.tavily.com/search"

        payload = {
            "api_key": self.api_key,
            "query": query,
            "search_depth": self.search_depth,
            "max_results": min(self.max_results, 10),  # Ensure we don't exceed API limits
            "include_answer": self.include_answer
        }

        try:
            response = requests.post(url, json=payload, timeout=self.timeout)
            response.raise_for_status()
            result = response.json()

            if "results" not in result:
                return f"Error in search: {result.get('error', 'Unknown error')}"

            # Format the results
            formatted_results = self._format_results(result)

            # Cache the result
            self._cache[cache_key] = (time.time(), formatted_results)

            return formatted_results

        except requests.exceptions.Timeout:
            return "Error: Tavily search request timed out. Please try again later."
        except requests.exceptions.RequestException as e:
            return f"Error during Tavily search: {str(e)}"

    def _format_results(self, result: Dict[str, Any]) -> str:
        """Format the search results into a readable string."""
        output = []

        # Add the answer if included
        if "answer" in result and result["answer"]:
            output.append(f"Answer: {result['answer']}\n")

        # Add search results
        output.append("Search Results:")

        for i, r in enumerate(result.get("results", []), 1):
            title = r.get("title", "No Title")
            url = r.get("url", "No URL")
            content = r.get("content", "No Content").strip()

            result_text = f"\n{i}. {title}\n   URL: {url}\n   Content: {content}\n"
            output.append(result_text)

        return "\n".join(output)

    def _get_cache_key(self, query: str) -> str:
        """Generate a cache key for the given query."""
        # Include search parameters in the key
        params_str = f"{query}|{self.search_depth}|{self.max_results}|{self.include_answer}"
        return hashlib.md5(params_str.encode()).hexdigest()
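
The tool posts the query to https://api.tavily.com/search and formats each hit as title, URL, and content; with include_answer=True Tavily's short synthesized answer is printed first. A minimal direct-use sketch with a placeholder query:

```python
# Sketch of direct use; requires TAVILY_API_KEY in the environment (see .env.example).
from tools import TavilySearchTool

search = TavilySearchTool(max_results=3, search_depth="basic", include_answer=True)
print(search.run("current version of the CrewAI framework"))
# Repeating the same query within 30 minutes returns the "[Cached Tavily result]" copy.
```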
utils/__init__.py
ADDED
@@ -0,0 +1,3 @@
from .helpers import is_valid_query, format_research_results, extract_citations

__all__ = ['is_valid_query', 'format_research_results', 'extract_citations']
utils/__pycache__/__init__.cpython-311.pyc
ADDED
Binary file (330 Bytes)
utils/__pycache__/helpers.cpython-311.pyc
ADDED
Binary file (4.66 kB)
utils/helpers.py
ADDED
@@ -0,0 +1,120 @@
import re
import json
from typing import Dict, Any, List, Optional

def is_valid_query(query: str) -> bool:
    """
    Validates if a search query is legitimate.

    Args:
        query: The search query to validate

    Returns:
        Boolean indicating if the query is valid
    """
    # Reject empty queries
    if not query or query.strip() == "":
        return False

    # Reject single emoji queries
    emoji_pattern = re.compile(
        "["
        "\U0001F600-\U0001F64F"  # emoticons
        "\U0001F300-\U0001F5FF"  # symbols & pictographs
        "\U0001F680-\U0001F6FF"  # transport & map symbols
        "\U0001F700-\U0001F77F"  # alchemical symbols
        "\U0001F780-\U0001F7FF"  # Geometric Shapes
        "\U0001F800-\U0001F8FF"  # Supplemental Arrows-C
        "\U0001F900-\U0001F9FF"  # Supplemental Symbols and Pictographs
        "\U0001FA00-\U0001FA6F"  # Chess Symbols
        "\U0001FA70-\U0001FAFF"  # Symbols and Pictographs Extended-A
        "\U00002702-\U000027B0"  # Dingbats
        "\U000024C2-\U0001F251"
        "]+"
    )

    stripped_query = emoji_pattern.sub(r'', query).strip()
    if not stripped_query and len(query) <= 5:  # Single emoji or very short
        return False

    # Reject random numbers only (at least 5 digits with no context)
    if re.match(r'^\d{5,}$', query.strip()):
        return False

    # Reject gibberish (no vowels in long string suggests gibberish)
    if len(query) > 10 and not re.search(r'[aeiouAEIOU]', query):
        return False

    return True

def format_research_results(search_results: List[Dict[str, Any]],
                            scraped_contents: Dict[str, str],
                            analyzed_contents: Dict[str, Dict[str, Any]]) -> str:
    """
    Formats research results into a readable response with citations.

    Args:
        search_results: The list of search result items
        scraped_contents: Dict mapping URLs to scraped content
        analyzed_contents: Dict mapping URLs to analysis results

    Returns:
        Formatted response with citations
    """
    response_parts = []
    citations = []

    # Filter to only include relevant content based on analysis
    relevant_urls = {
        url: data
        for url, data in analyzed_contents.items()
        if data.get("relevance_score", 0) >= 5
    }

    # No relevant results
    if not relevant_urls:
        return "I couldn't find relevant information for your query. Could you try rephrasing or providing more details?"

    # Compile the response with relevant information
    for i, (url, data) in enumerate(relevant_urls.items(), 1):
        citations.append(f"[{i}] {url}")
        filtered_content = data.get("filtered_content", "")

        # Add the content with citation
        if filtered_content:
            response_parts.append(f"{filtered_content} [{i}]")

    # Combine everything
    response = "\n\n".join(response_parts)
    citation_text = "\n".join(citations)

    return f"{response}\n\nSources:\n{citation_text}"

def extract_citations(text: str) -> List[Dict[str, str]]:
    """
    Extract citations from formatted text.

    Args:
        text: Text with citation markers like [1], [2], etc.

    Returns:
        List of citation objects with citation number and referenced text
    """
    citations = []
    citation_pattern = r'\[(\d+)\]'

    matches = re.finditer(citation_pattern, text)
    for match in matches:
        citation_num = match.group(1)
        # Get the preceding text (limited to reasonable length)
        start_pos = max(0, match.start() - 100)
        cited_text = text[start_pos:match.start()].strip()
        if len(cited_text) == 100:  # Truncated
            cited_text = "..." + cited_text

        citations.append({
            "number": citation_num,
            "text": cited_text
        })

    return citations
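
The helpers are intentionally simple: is_valid_query applies emoji, digits-only, and no-vowel heuristics, and extract_citations pulls each [n] marker with up to 100 characters of preceding context. A short sketch of their behaviour on made-up inputs:

```python
# Sketch of the helper behaviour on a few sample inputs.
from utils import is_valid_query, extract_citations

print(is_valid_query("benefits of strength training"))  # True
print(is_valid_query("1234567"))                        # False - digits only
print(is_valid_query("qwrtpsdfghjklzxcv"))              # False - long string with no vowels

answer = "Strength training improves bone density [1] and insulin sensitivity [2]."
for cite in extract_citations(answer):
    print(cite["number"], "->", cite["text"][-40:])
```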