# Web Research Agent Architecture

```
┌───────────────────────────────────────────────────────────────────────────────┐
│                               Gradio Interface                                │
└───────────────────────────────────┬───────────────────────────────────────────┘
                                    │
                                    ▼
┌───────────────────────────────────────────────────────────────────────────────┐
│                              Research Engine                                  │
│                                                                               │
│   ┌────────────────────────────────────────────────────────────────────────┐  │
│   │                        Conversation History                            │  │
│   └────────────────────────────────────────────────────────────────────────┘  │
│                                                                               │
│   ┌─────────────┐              ┌─────────────┐              ┌─────────────┐   │
│   │  Researcher │◄────────────►│   Analyst   │◄────────────►│    Writer   │   │
│   │    Agent    │              │    Agent    │              │    Agent    │   │
│   └──────┬──────┘              └──────┬──────┘              └──────┬──────┘   │
│          │                            │                            │          │
│          ▼                            ▼                            ▼          │
│   ┌─────────────┐              ┌─────────────┐              ┌─────────────┐   │
│   │ Search      │              │   Scrape    │              │ Information │   │
│   │ Rotation    │              │ Website Tool│              │ Synthesis   │   │
│   │ Tool        │              └──────┬──────┘              └─────────────┘   │
│   └─────────────┘                     │                                       │
│                                       ▼                                       │
│                                ┌─────────────┐                                │
│                                │   Content   │                                │
│                                │  Analyzer   │                                │
│                                └─────────────┘                                │
└───────────────────────────────────────────────────────────────────────────────┘
```

## Research Flow

1. **User Input**
   - User enters a query in the Gradio interface
   - Query is checked for legitimacy, then passed to the research engine

2. **Query Refinement** (Researcher Agent)
   - Original query is analyzed and refined for optimal search results
   - Ambiguous terms are clarified and search intent is identified
   - Refined query is prepared for web search with improved keywords

3. **Web Search** (Researcher Agent + Search Rotation Tool)
   - Search Rotation Tool executes search using multiple search engines
   - Rate limiting is implemented to avoid API throttling
   - At most 5 searches are performed per query
   - Results are cached for similar queries to improve efficiency
   - Search results are collected with URLs and snippets

4. **Content Scraping** (Analyst Agent + ScrapeWebsiteTool)
   - ScrapeWebsiteTool extracts content from search result URLs
   - HTML content is parsed to extract meaningful text
   - Raw content is prepared for analysis and evaluation

5. **Content Analysis** (Analyst Agent + ContentAnalyzerTool)
   - Content is analyzed for relevance to the query (scores 0-10)
   - Factuality and quality are evaluated (scores 0-10)
   - Irrelevant or low-quality content is filtered out
   - Content is organized by relevance and information value

6. **Response Creation** (Writer Agent)
   - Analyzed content is synthesized into a comprehensive response
   - Information is organized logically with a clear structure
   - Contradictory information is reconciled when present
   - Citations are added in [1], [2] format with proper attribution
   - Source URLs are included for reference and verification

7. **Result Presentation**
   - Final response with citations is displayed to the user
   - Conversation history is updated and maintained per session
   - Results can be saved to file if requested
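
Steps 5 and 6 can be illustrated with a minimal sketch: filter scraped pages by their 0-10 scores, order them by relevance, and attach `[1]`, `[2]` citations with source URLs. The `ScoredPage` type, the function names, and the cutoff of 5.0 are assumptions made for illustration; the project's actual data structures and thresholds may differ.

```python
from dataclasses import dataclass


@dataclass
class ScoredPage:
    """One scraped page with hypothetical analyzer scores on a 0-10 scale."""
    url: str
    summary: str
    relevance: float
    factuality: float


def select_sources(pages, min_relevance=5.0, min_factuality=5.0):
    """Drop low-scoring pages and order the rest by relevance, best first.

    The 5.0 cutoffs are illustrative assumptions, not values from the project.
    """
    kept = [
        p for p in pages
        if p.relevance >= min_relevance and p.factuality >= min_factuality
    ]
    return sorted(kept, key=lambda p: p.relevance, reverse=True)


def render_with_citations(pages):
    """Append [1], [2], ... markers to each summary and list the source URLs."""
    body = [f"{p.summary} [{i}]" for i, p in enumerate(pages, start=1)]
    sources = [f"[{i}] {p.url}" for i, p in enumerate(pages, start=1)]
    return "\n".join(body + ["", "Sources:"] + sources)


pages = [
    ScoredPage("https://example.com/a", "Claim from page A.", 8.0, 7.5),
    ScoredPage("https://example.com/b", "Off-topic text.", 2.0, 9.0),
    ScoredPage("https://example.com/c", "Claim from page C.", 9.5, 6.0),
]
report = render_with_citations(select_sources(pages))
print(report)
```

In this sketch the citation number follows the relevance ranking, so the most relevant source is always `[1]`. A real Writer Agent would likely number sources in the order they are first cited in the text.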

## System Architecture

- **Multi-Agent System**: Three specialized agents work together with distinct roles
- **Stateless Design**: Each research request is processed independently; the only state carried between requests is the per-session conversation history
- **Session Management**: User sessions maintain separate conversation contexts
- **API Integration**: Multiple search APIs with fallback mechanisms
- **Memory**: All agents maintain context throughout the research process
- **Tool Abstraction**: Search and analysis tools are modular and interchangeable
- **Error Handling**: Errors are caught and handled at each processing stage
- **Rate Limiting**: API calls are rate-limited to prevent throttling
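
One way the search rotation with fallback could work is sketched below. The class and exception names are hypothetical, the engines are plain callables standing in for real search API clients, and the sketch assumes one `SearchRotation` instance is created per research request so that the cap of 5 searches applies per query.

```python
class SearchBudgetExceeded(RuntimeError):
    """Raised when a research request has used up its search allowance."""


class SearchRotation:
    """Rotate over search backends, falling back when one of them fails."""

    def __init__(self, engines, max_searches=5):
        self.engines = list(engines)      # callables: query -> list of results
        self.max_searches = max_searches  # cap of 5 searches per query
        self.used = 0
        self._next = 0                    # index of the engine to try first

    def search(self, query):
        if self.used >= self.max_searches:
            raise SearchBudgetExceeded(f"limit of {self.max_searches} reached")
        self.used += 1
        errors = []
        # Try every engine at most once, starting from the rotation pointer.
        for offset in range(len(self.engines)):
            index = (self._next + offset) % len(self.engines)
            try:
                results = self.engines[index](query)
            except Exception as exc:  # any backend failure triggers fallback
                errors.append(exc)
                continue
            # Start the next search from the engine after the one that worked.
            self._next = (index + 1) % len(self.engines)
            return results
        raise RuntimeError(f"all search engines failed: {errors}")


def flaky_engine(query):
    """Stand-in for a backend that is currently unavailable."""
    raise ConnectionError("engine unavailable")


def stub_engine(query):
    """Stand-in for a working backend returning a URL and a snippet."""
    return [{"url": "https://example.com", "snippet": f"result for {query}"}]


rotation = SearchRotation([flaky_engine, stub_engine])
results = rotation.search("multi-agent research")
print(results[0]["url"])
```

Counting a search against the budget even when every engine fails is a design choice in this sketch; it keeps a broken backend from causing unbounded retries.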

## Technical Implementation

- **Frontend**: Gradio web interface with real-time feedback
- **Backend**: Python-based research engine with modular components
- **Tools**: 
  - Search Rotation Tool (supports multiple search engines)
  - Rate Limited Tool Wrapper (prevents API throttling)
  - Content Analyzer Tool (evaluates relevance and factuality)
  - Scrape Website Tool (extracts content from URLs)
- **Deployment**: Compatible with Hugging Face Spaces for online access
- **Caching**: Results are cached to improve performance and reduce API calls
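
The rate-limited wrapper and the cache could be combined roughly as follows. This is a sketch under stated assumptions: the class name, the one-second default interval, and the idea that "similar" queries are those that match after lowercasing and whitespace normalization are all illustrative and are not taken from the project's code.

```python
import time


class RateLimitedCachedTool:
    """Wrap a callable so real calls are spaced out and repeat inputs hit a cache."""

    def __init__(self, func, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.func = func
        self.min_interval = min_interval  # seconds between real calls (assumed value)
        self.clock = clock                # injectable so tests need not wait
        self.sleep = sleep
        self._last_call = None
        self._cache = {}

    @staticmethod
    def _key(query):
        # Normalize case and whitespace so trivially different queries share an entry.
        return " ".join(query.lower().split())

    def __call__(self, query):
        key = self._key(query)
        if key in self._cache:
            return self._cache[key]       # cache hit: no API call, no waiting
        if self._last_call is not None:
            wait = self.min_interval - (self.clock() - self._last_call)
            if wait > 0:
                self.sleep(wait)          # pause until the minimum interval has passed
        self._last_call = self.clock()
        result = self.func(query)
        self._cache[key] = result
        return result


calls = []


def fake_search(query):
    """Stand-in for a real search API call; records each invocation."""
    calls.append(query)
    return f"results for {query}"


tool = RateLimitedCachedTool(fake_search, min_interval=0.01)
first = tool("Gradio agents")
second = tool("  gradio   AGENTS ")  # served from the cache
print(first == second, len(calls))
```

Checking the cache before the rate limiter means cached answers return immediately, which is where the reduction in API calls comes from.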