---
title: LangGraph Data Analyst Agent
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: "1.28.0"
app_file: app.py
pinned: false
license: mit
---

# 🤖 LangGraph Data Analyst Agent

A data analyst agent built with LangGraph that analyzes customer support conversations and provides conversation memory, session persistence, and query recommendations.

## 🌟 Features

### Core Functionality
- **Multi-Agent Architecture**: Separate specialized agents for structured and unstructured queries
- **Query Classification**: Automatic routing to appropriate agent based on query type
- **Rich Tool Set**: Comprehensive tools for data analysis and insights

### Advanced Memory & Persistence
- **Session Management**: Persistent conversations across page reloads and browser sessions
- **User Profile Tracking**: Agent learns and remembers user interests and preferences  
- **Conversation History**: Full context retention using LangGraph checkpointers
- **Cross-Session Continuity**: Resume conversations using session IDs

### Intelligent Recommendations
- **Query Suggestions**: AI-powered recommendations based on conversation history
- **Interactive Refinement**: Collaborative query building with the agent
- **Context-Aware**: Suggestions based on user profile and previous interactions

## 🏗️ Architecture

The agent uses LangGraph's multi-agent architecture with the following components:

```
User Query → Classifier → [Structured Agent | Unstructured Agent | Recommender] → Summarizer → Response
                ↓
            Tool Nodes (Dataset Analysis Tools)
```

### Agent Types
1. **Structured Agent**: Handles quantitative queries (statistics, examples, distributions)
2. **Unstructured Agent**: Handles qualitative queries (summaries, insights, patterns)
3. **Query Recommender**: Suggests follow-up questions based on context
4. **Summarizer**: Updates user profile and conversation memory

## 🚀 Setup Instructions

### Prerequisites
- **Python Version**: 3.9 or higher
- **API Key**: OpenAI API key or Nebius API key
- **For Hugging Face Spaces**: Ensure your API key is set as a Space secret

### Installation

1. **Clone the repository**:
```bash
git clone <repository-url>
cd Agents
```

2. **Install dependencies**:
```bash
pip install -r requirements.txt
```

3. **Configure API Key**:

Create a `.env` file in the project root:
```bash
# For OpenAI (recommended)
OPENAI_API_KEY=your_openai_api_key_here

# OR for Nebius
NEBIUS_API_KEY=your_nebius_api_key_here
```

4. **Run the application**:
```bash
streamlit run app.py
```

5. **Access the app**:
Open your browser to `http://localhost:8501`

### Alternative Deployment

#### For Hugging Face Spaces:
1. **Fork or upload this repository to Hugging Face Spaces**
2. **Set your API key as a Space secret:**
   - Go to your Space settings
   - Navigate to "Variables and secrets" 
   - Add a secret named `NEBIUS_API_KEY` or `OPENAI_API_KEY`
   - Enter your API key as the value
3. **The app will start automatically**

#### For other cloud deployment:
```bash
export OPENAI_API_KEY=your_api_key_here
# OR
export NEBIUS_API_KEY=your_api_key_here
```
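
However the key is supplied (a `.env` file, a Space secret, or an exported variable), the app ends up reading it from the environment. A minimal sketch of that lookup follows. The helper name `resolve_api_key` and the OpenAI-first precedence are assumptions for illustration, not taken from `app.py`, and loading the `.env` file itself is not shown.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_api_key(env: Mapping[str, str] = os.environ) -> Optional[Tuple[str, str]]:
    """Return (provider, key) for the first configured provider, or None.

    Hypothetical helper: the precedence order is an assumption.
    """
    candidates = (("openai", "OPENAI_API_KEY"), ("nebius", "NEBIUS_API_KEY"))
    for provider, var_name in candidates:
        # Treat blank or whitespace-only values as "not set".
        key = env.get(var_name, "").strip()
        if key:
            return provider, key
    return None


if __name__ == "__main__":
    found = resolve_api_key()
    print("No API key configured" if found is None else f"Using {found[0]}")
```

Passing the environment in as a mapping keeps the lookup easy to test without touching real secrets.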

## 🎯 Usage Guide

### Query Types

#### Structured Queries (Quantitative Analysis)
- "How many records are in each category?"
- "What are the most common customer issues?"
- "Show me 5 examples of billing problems"
- "Get distribution of intents"

#### Unstructured Queries (Qualitative Analysis)  
- "Summarize the refund category"
- "What patterns do you see in payment issues?"
- "Analyze customer sentiment in billing conversations"
- "What insights can you provide about technical support?"

#### Memory & Recommendations
- "What do you remember about me?"
- "What should I query next?"
- "Advise me what to explore"
- "Recommend follow-up questions"

### Session Management

#### Creating Sessions
- **New Session**: Click "🆕 New Session" to start fresh
- **Auto-Generated**: Each new browser session gets a unique ID

#### Resuming Sessions
1. Copy your session ID from the sidebar (e.g., `a1b2c3d4...`)
2. Enter the full session ID in "Join Existing Session"
3. Click "🔗 Join Session" to resume

#### Cross-Tab Persistence
- Open multiple tabs with the same session ID
- Conversations sync across all tabs
- Memory and user profile persist
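
LangGraph checkpointers key saved state by a `thread_id` passed in the invocation config, which is what lets two tabs with the same session ID see the same conversation. The sketch below shows one way a session ID could be generated and turned into that config. The helper names are hypothetical, and the use of `uuid4` is an assumption about how IDs are produced.

```python
import uuid
from typing import Any, Dict


def new_session_id() -> str:
    """Generate a random 32-character hex session ID (hypothetical helper)."""
    return uuid.uuid4().hex


def thread_config(session_id: str) -> Dict[str, Any]:
    """Build the config dict that ties a graph invocation to one thread."""
    cleaned = session_id.strip()
    if not cleaned:
        raise ValueError("session_id must be non-empty")
    # LangGraph reads the thread ID from config["configurable"]["thread_id"].
    return {"configurable": {"thread_id": cleaned}}


if __name__ == "__main__":
    sid = new_session_id()
    print(sid, thread_config(sid))
```

A compiled graph would receive this dict as the `config` argument on each call, so every call made with the same session ID resumes the same thread.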

## 🧠 Memory System

### User Profile Tracking
The agent automatically tracks:
- **Interests**: Topics and categories you frequently ask about
- **Expertise Level**: Inferred from question complexity (beginner/intermediate/advanced)
- **Preferences**: Analysis style preferences (quantitative vs qualitative)
- **Query History**: Recent questions for context
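
As an illustration of the tracked fields, here is a minimal sketch of a profile record. The class name `UserProfile`, its defaults, and the five-query history limit are assumptions. In the real agent the interests and expertise level are inferred from the conversation, whereas this sketch takes the topics as an argument.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, Set


@dataclass
class UserProfile:
    """Hypothetical profile record mirroring the tracked fields."""

    interests: Set[str] = field(default_factory=set)
    expertise_level: str = "beginner"  # beginner / intermediate / advanced
    preferences: Dict[str, str] = field(default_factory=dict)
    # Keep only the most recent queries so the context stays small.
    query_history: Deque[str] = field(default_factory=lambda: deque(maxlen=5))

    def record_query(self, query: str, topics: Set[str]) -> None:
        """Store a query and merge the topics it touched into interests."""
        self.query_history.append(query)
        self.interests |= topics


if __name__ == "__main__":
    profile = UserProfile()
    profile.record_query("Show me billing examples", {"billing"})
    print(profile)
```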

### Conversation Persistence
- **Thread-based**: Each session has a unique thread ID
- **Checkpoint System**: LangGraph automatically saves state after each interaction
- **Cross-Session**: Resume conversations days or weeks later

### Memory Queries
Ask the agent what it remembers:
```
"What do you remember about me?"
"What are my interests?"
"What have I asked about before?"
```

## 🔧 Testing the Agent

### Basic Functionality Tests

1. **Classification Test**:
```
Query: "How many categories are there?"
Expected: Routes to Structured Agent → Uses get_dataset_stats tool
```

2. **Follow-up Memory Test**:
```
Query 1: "Show me billing examples"
Query 2: "Show me more examples"
Expected: Agent remembers previous context about billing
```

3. **User Profile Test**:
```
Query 1: "I'm interested in refund patterns"
Query 2: "What do you remember about me?"
Expected: Agent mentions interest in refunds
```

4. **Recommendation Test**:
```
Query: "What should I query next?"
Expected: Personalized suggestions based on history
```

### Advanced Feature Tests

1. **Session Persistence**:
   - Ask a question, reload the page
   - Verify conversation history remains
   - Verify user profile persists

2. **Cross-Session Memory**:
   - Note your session ID
   - Close browser completely
   - Reopen and join the same session
   - Verify full conversation and profile restoration

3. **Interactive Recommendations**:
```
User: "Advise me what to query next"
Agent: "Based on your interest in billing, you might want to analyze refund patterns."
User: "I'd rather see examples instead"
Agent: "Then I suggest showing 5 examples of refund requests."
User: "Please do so"
Expected: Agent executes the refined query
```

## 📁 File Structure

```
Agents/
├── README.md                 # This file
├── requirements.txt          # Python dependencies
├── .env                      # API keys (create this)
├── app.py                    # LangGraph Streamlit app
├── langgraph_agent.py        # LangGraph agent implementation
├── agent-memory.ipynb        # Memory example notebook
├── test_agent.py             # Test suite
└── DEPLOYMENT_GUIDE.md       # Original deployment guide
```

## 🛠️ Technical Implementation

### LangGraph Components

**State Management**:
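```python
from typing import Any, Dict, List, Optional, TypedDict

class AgentState(TypedDict):
    messages: List[Any]                        # conversation messages so far
    query_type: Optional[str]                  # set by the classifier
    user_profile: Optional[Dict[str, Any]]     # tracked interests and preferences
    session_context: Optional[Dict[str, Any]]  # session-level context
```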

**Tool Categories**:
- **Structured Tools**: Statistics, distributions, examples, search
- **Unstructured Tools**: Summaries, insights, pattern analysis
- **Memory Tools**: Profile updates, preference tracking

**Graph Flow**:
1. **Classifier**: Determines query type
2. **Agent Selection**: Routes to appropriate specialist
3. **Tool Execution**: Dynamic tool usage based on needs
4. **Memory Update**: Profile and context updates
5. **Response Generation**: Final answer with memory integration
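
The flow above can be sketched in plain Python, without LangGraph, to show how state moves from node to node. Everything in this sketch is a stand-in: the keyword rules replace the LLM-based classifier, the agents are placeholders that only label which branch ran, tool execution is omitted, and none of the names come from `langgraph_agent.py`.

```python
from typing import Any, Callable, Dict, Optional

State = Dict[str, Any]

# Hypothetical keyword rules standing in for the LLM-based classifier.
RECOMMEND_HINTS = ("what should i", "advise", "recommend")
STRUCTURED_HINTS = ("how many", "distribution", "examples", "most common")


def classify(state: State) -> State:
    """Step 1: decide which specialist should handle the query."""
    text = state["query"].lower()
    if any(hint in text for hint in RECOMMEND_HINTS):
        query_type = "recommender"
    elif any(hint in text for hint in STRUCTURED_HINTS):
        query_type = "structured"
    else:
        query_type = "unstructured"
    return {**state, "query_type": query_type}


def make_agent(name: str) -> Callable[[State], State]:
    """Create a placeholder agent that only labels which branch ran."""
    def agent(state: State) -> State:
        return {**state, "answer": f"[{name}] handled: {state['query']}"}
    return agent


AGENTS = {n: make_agent(n) for n in ("structured", "unstructured", "recommender")}


def summarize(state: State) -> State:
    """Memory update: append the query to a running history."""
    return {**state, "history": state.get("history", []) + [state["query"]]}


def run(query: str, previous: Optional[State] = None) -> State:
    """Run one turn: classify, route to an agent, then update memory."""
    state = classify({**(previous or {}), "query": query})
    state = AGENTS[state["query_type"]](state)
    return summarize(state)


if __name__ == "__main__":
    first = run("How many records are in each category?")
    second = run("What should I query next?", first)
    print(second["query_type"], second["history"])
```

In the actual graph, each of these functions would be a node and the dictionary lookup in `run` would be a conditional edge selected by `query_type`.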

### Memory Architecture

- **Checkpointer**: LangGraph's `MemorySaver`, an in-memory checkpointer, for conversation persistence while the app process is running
- **Thread Management**: Unique thread IDs for session isolation
- **Profile Synthesis**: LLM-powered extraction of user characteristics
- **Context Retention**: Full conversation history with temporal awareness

## 🔍 Troubleshooting

### Common Issues

1. **API Key Errors**:
   - Verify the `.env` file exists and contains the correct key
   - Check that the environment variable is set in your deployment
   - Ensure the API key has sufficient credits

2. **Memory Not Persisting**:
   - Verify the session ID remains consistent
   - Check that browser localStorage is not being cleared
   - Ensure the `thread_id` parameter is passed correctly

3. **Dataset Loading Issues**:
   - Check internet connection for Hugging Face datasets
   - Verify datasets library is installed
   - Try clearing Streamlit cache: `streamlit cache clear`

4. **Tool Execution Errors**:
   - Verify all dependencies in requirements.txt are installed
   - Check dataset is properly loaded
   - Review error messages in Streamlit interface

### Debug Mode

Enable debug logging by setting:
```python
import logging
logging.basicConfig(level=logging.DEBUG)
```

## 🎓 Learning Objectives

This implementation demonstrates:

1. **LangGraph Multi-Agent Systems**: Specialized agents for different query types
2. **Memory & Persistence**: Conversation continuity across sessions  
3. **Tool Integration**: Dynamic tool selection and execution
4. **State Management**: Complex state updates and routing
5. **User Experience**: Session management and interactive features

## 🚀 Future Enhancements

Potential improvements:
- **Database Persistence**: Replace MemorySaver with PostgreSQL checkpointer
- **Advanced Analytics**: More sophisticated data analysis tools
- **Export Features**: PDF/CSV report generation
- **User Authentication**: Multi-user support with profiles
- **Real-time Collaboration**: Shared sessions between users

## 📄 License

This project is for educational purposes as part of a data science curriculum.

## 🤝 Contributing

This is an assignment project. For questions or issues, please contact the course instructors.

---

**Built with**: LangGraph, Streamlit, OpenAI/Nebius, Hugging Face Datasets