Web Research Agent Architecture

┌───────────────────────────────────────────────────────────────────────────────┐
│                               Gradio Interface                                │
└───────────────────────────────────────┬───────────────────────────────────────┘
                                        │
                                        ▼
┌───────────────────────────────────────────────────────────────────────────────┐
│                                Research Engine                                │
│                                                                               │
│   ┌───────────────────────────────────────────────────────────────────────┐   │
│   │                         Conversation History                          │   │
│   └───────────────────────────────────────────────────────────────────────┘   │
│                                                                               │
│   ┌─────────────┐              ┌─────────────┐              ┌─────────────┐   │
│   │  Researcher │◄────────────►│   Analyst   │◄────────────►│    Writer   │   │
│   │    Agent    │              │    Agent    │              │    Agent    │   │
│   └──────┬──────┘              └──────┬──────┘              └──────┬──────┘   │
│          │                            │                            │          │
│          ▼                            ▼                            ▼          │
│   ┌─────────────┐              ┌─────────────┐              ┌─────────────┐   │
│   │ Search      │              │   Scrape    │              │ Information │   │
│   │ Rotation    │              │ Website Tool│              │ Synthesis   │   │
│   │ Tool        │              └──────┬──────┘              └─────────────┘   │
│   └─────────────┘                     │                                       │
│                                       ▼                                       │
│                                ┌─────────────┐                                │
│                                │   Content   │                                │
│                                │  Analyzer   │                                │
│                                └─────────────┘                                │
└───────────────────────────────────────────────────────────────────────────────┘
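The diagram can be read as a pipeline. Below is a minimal, hypothetical model of it, not the project's code. `ResearchEngine`, the `Agent` callable type and the stub agents are illustrative names. The agents also run strictly in sequence here, although the diagram's two-way arrows suggest they may exchange information in both directions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical agent interface: each agent receives the shared state dict
# and returns an updated copy. Real agents and their tools are not modeled.
Agent = Callable[[dict], dict]


@dataclass
class ResearchEngine:
    """Runs Researcher -> Analyst -> Writer in order and keeps per-session history."""

    researcher: Agent
    analyst: Agent
    writer: Agent
    history: Dict[str, List[dict]] = field(default_factory=dict)

    def run(self, session_id: str, query: str) -> str:
        state = {"query": query}
        for agent in (self.researcher, self.analyst, self.writer):
            state = agent(state)
        # Record the exchange in this session's conversation history only.
        self.history.setdefault(session_id, []).append(
            {"query": query, "answer": state["answer"]}
        )
        return state["answer"]


# Stub agents standing in for the real ones (search, scraping, LLM calls omitted).
def researcher(state: dict) -> dict:
    return {**state, "urls": ["https://example.com/a"]}


def analyst(state: dict) -> dict:
    return {**state, "notes": [f"summary of {url}" for url in state["urls"]]}


def writer(state: dict) -> dict:
    return {**state, "answer": f"{state['notes'][0]} [1]"}


engine = ResearchEngine(researcher, analyst, writer)
print(engine.run("session-1", "What is retrieval-augmented generation?"))
# prints: summary of https://example.com/a [1]
```

Keying the history by session ID keeps each user's conversation separate while the agents themselves hold no state between requests.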

Research Flow

  1. User Input

    • User enters a query in the Gradio interface
    • Query is checked for legitimacy, then passed to the research engine
  2. Query Refinement (Researcher Agent)

    • Original query is analyzed and refined to produce better search results
    • Ambiguous terms are clarified and the search intent is identified
    • Refined query is prepared for the web search, with improved keywords
  3. Web Search (Researcher Agent + Search Rotation Tool)

    • Search Rotation Tool runs the search across multiple search engines
    • Rate limiting is applied to avoid API throttling
    • At most 5 searches are performed per query
    • Results are cached so that similar queries can reuse them
    • Search results are collected along with their URLs and snippets
  4. Content Scraping (Analyst Agent + ScrapeWebsiteTool)

    • ScrapeWebsiteTool extracts content from the search result URLs
    • HTML content is parsed to extract meaningful text
    • Raw content is prepared for analysis and evaluation
  5. Content Analysis (Analyst Agent + ContentAnalyzerTool)

    • Content is analyzed for relevance to the query (scores 0-10)
    • Factuality and quality are evaluated (scores 0-10)
    • Irrelevant or low-quality content is filtered out
    • Content is organized by relevance and information value
  6. Response Creation (Writer Agent)

    • Analyzed content is synthesized into a comprehensive response
    • Information is organized logically with a clear structure
    • Contradictory information is reconciled when present
    • Citations are added in [1], [2] format with proper attribution
    • Source URLs are included for reference and verification
  7. Result Presentation

    • Final response with citations is displayed to the user
    • Conversation history is updated and maintained per session
    • Results can be saved to a file on request
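Steps 5 and 6 can be illustrated with a small sketch of score-based filtering followed by [n]-style reference numbering. The two 0-10 scales come from the text. The `ScoredSource` class, the function names and the threshold of 5 on both scores are assumptions, because the document does not say where "low quality" begins.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoredSource:
    url: str
    text: str
    relevance: float  # 0-10 relevance to the query (step 5)
    quality: float    # 0-10 factuality/quality score (step 5)


def filter_and_rank(
    sources: List[ScoredSource], min_relevance: float = 5.0, min_quality: float = 5.0
) -> List[ScoredSource]:
    """Drop sources below either (assumed) threshold, then sort by relevance, then quality."""
    kept = [
        s for s in sources if s.relevance >= min_relevance and s.quality >= min_quality
    ]
    return sorted(kept, key=lambda s: (s.relevance, s.quality), reverse=True)


def reference_list(sources: List[ScoredSource]) -> str:
    """Number the kept sources so the answer can cite them as [1], [2], ..."""
    return "\n".join(f"[{i}] {s.url}" for i, s in enumerate(sources, start=1))


sources = [
    ScoredSource("https://a.example", "...", relevance=9, quality=7),
    ScoredSource("https://b.example", "...", relevance=3, quality=9),
    ScoredSource("https://c.example", "...", relevance=7, quality=8),
]
print(reference_list(filter_and_rank(sources)))
# [1] https://a.example
# [2] https://c.example
```

Numbering happens only after filtering, so citation indices stay contiguous and every [n] in the answer maps to a source that survived analysis.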

System Architecture

  • Multi-Agent System: Three specialized agents (Researcher, Analyst, Writer) work together, each with a distinct role
  • Stateless Design: Each research request is processed independently; only the per-session conversation history carries over between requests
  • Session Management: Each user session keeps its own conversation context
  • API Integration: Multiple search APIs, with fallback to another API when one fails
  • Memory: All agents maintain context throughout a single research run
  • Tool Abstraction: Search and analysis tools are modular and interchangeable
  • Error Handling: Errors are handled at each processing stage
  • Rate Limiting: API calls are rate-limited to prevent throttling
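The API Integration and Rate Limiting points can be combined in a sketch of a search rotation with fallback. This is an assumed design, not the Search Rotation Tool's actual code. Providers are plain Python callables standing in for real search APIs, rotation is simple round-robin, and rate limiting enforces a fixed minimum interval between calls. `RateLimiter`, `SearchRotation` and the interval values are hypothetical.

```python
import time
from typing import Callable, List, Sequence

# Hypothetical provider signature: query -> list of result URLs.
SearchFn = Callable[[str], List[str]]


class RateLimiter:
    """Enforces a minimum interval between calls (simple spacing, not a token bucket)."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last = float("-inf")

    def wait(self) -> None:
        delay = self._last + self.min_interval - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        self._last = time.monotonic()


class SearchRotation:
    """Round-robin over search providers, falling back to the next one on failure."""

    def __init__(self, providers: Sequence[SearchFn], min_interval: float = 1.0) -> None:
        self.providers = list(providers)
        if not self.providers:
            raise ValueError("at least one search provider is required")
        self.limiter = RateLimiter(min_interval)
        self._next = 0

    def search(self, query: str) -> List[str]:
        n = len(self.providers)
        start = self._next
        self._next = (start + 1) % n  # the next query starts with a different provider
        for offset in range(n):
            provider = self.providers[(start + offset) % n]
            self.limiter.wait()
            try:
                results = provider(query)
            except Exception:  # e.g. an HTTP 429 or a network error
                continue  # fall back to the next provider
            if results:
                return results
        return []  # every provider failed or returned nothing


def flaky(query: str) -> List[str]:
    raise ConnectionError("429 Too Many Requests")


def backup(query: str) -> List[str]:
    return [f"https://example.com/search?q={query}"]


rotation = SearchRotation([flaky, backup], min_interval=0.2)
print(rotation.search("gradio"))  # flaky fails, so backup answers
```

A token bucket would allow short bursts; fixed spacing is simpler and is enough to show where throttling protection sits relative to the fallback loop.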

Technical Implementation

  • Frontend: Gradio web interface with real-time feedback
  • Backend: Python-based research engine with modular components
  • Tools:
    • Search Rotation Tool (supports multiple search engines)
    • Rate Limited Tool Wrapper (prevents API throttling)
    • Content Analyzer Tool (evaluates relevance and factuality)
    • Scrape Website Tool (extracts content from URLs)
  • Deployment: Compatible with Hugging Face Spaces for online access
  • Caching: Results are cached to improve performance and reduce API calls
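One way caching for similar queries could work is sketched here. The document does not say how the project decides that two queries are "similar". `normalize_query` therefore assumes that queries differing only in case, punctuation or spacing count as the same. The `SearchCache` name and the TTL values are hypothetical.

```python
import re
import time
from typing import Callable, Dict, List, Tuple


def normalize_query(query: str) -> str:
    """Assumed notion of 'similar': ignore case, punctuation and extra whitespace."""
    return " ".join(re.findall(r"\w+", query.lower()))


class SearchCache:
    """In-memory search cache keyed by normalized query, with a per-entry TTL."""

    def __init__(self, ttl_seconds: float = 3600.0) -> None:
        self.ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, List[str]]] = {}

    def get_or_fetch(self, query: str, fetch: Callable[[str], List[str]]) -> List[str]:
        key = normalize_query(query)
        entry = self._store.get(key)
        if entry is not None and time.monotonic() - entry[0] < self.ttl:
            return entry[1]  # fresh hit: no API call
        results = fetch(query)  # miss or expired: call the search API
        self._store[key] = (time.monotonic(), results)
        return results


api_calls: List[str] = []


def fake_search(query: str) -> List[str]:
    api_calls.append(query)
    return ["https://example.com/rag"]


cache = SearchCache(ttl_seconds=600)
cache.get_or_fetch("What is RAG?", fake_search)
cache.get_or_fetch("  what is   rag ", fake_search)  # same key, served from cache
print(len(api_calls))  # 1
```

The TTL bounds how stale a cached answer can get. A fuzzier similarity measure, such as embedding distance, would raise the hit rate but could return results for a query that only looks alike.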