---
title: LangGraph Data Analyst Agent
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: "1.28.0"
app_file: app.py
pinned: false
license: mit
---

# 🤖 LangGraph Data Analyst Agent

An intelligent data analyst agent built with LangGraph that analyzes customer support conversations with advanced memory, conversation persistence, and query recommendations.

## 🌟 Features

### Core Functionality
- **Multi-Agent Architecture**: Separate specialized agents for structured and unstructured queries
- **Query Classification**: Automatic routing to appropriate agent based on query type
- **Rich Tool Set**: Comprehensive tools for data analysis and insights

### Advanced Memory & Persistence
- **Session Management**: Persistent conversations across page reloads and browser sessions
- **User Profile Tracking**: Agent learns and remembers user interests and preferences  
- **Conversation History**: Full context retention using LangGraph checkpointers
- **Cross-Session Continuity**: Resume conversations using session IDs

### Intelligent Recommendations
- **Query Suggestions**: AI-powered recommendations based on conversation history
- **Interactive Refinement**: Collaborative query building with the agent
- **Context-Aware**: Suggestions based on user profile and previous interactions

## 🏗️ Architecture

The agent uses LangGraph's multi-agent architecture with the following components:

```
User Query → Classifier → [Structured Agent | Unstructured Agent | Recommender] → Summarizer → Response
                ↓
            Tool Nodes (Dataset Analysis Tools)
```

### Agent Types
1. **Structured Agent**: Handles quantitative queries (statistics, examples, distributions)
2. **Unstructured Agent**: Handles qualitative queries (summaries, insights, patterns)
3. **Query Recommender**: Suggests follow-up questions based on context
4. **Summarizer**: Updates user profile and conversation memory
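
The routing step can be sketched in plain Python. This is an illustrative stand-in rather than the app's implementation: `classify_query`, the keyword lists, and the three handler functions are hypothetical, and the real classifier presumably relies on an LLM instead of keyword matching.

```python
from typing import Callable, Dict

# Hypothetical keyword lists; the real classifier is presumably LLM-based.
RECOMMEND_HINTS = ("recommend", "advise", "what should i")
UNSTRUCTURED_HINTS = ("summarize", "pattern", "insight", "sentiment", "analyze")


def classify_query(query: str) -> str:
    """Return 'recommender', 'unstructured', or 'structured' for a query."""
    text = query.lower()
    if any(hint in text for hint in RECOMMEND_HINTS):
        return "recommender"
    if any(hint in text for hint in UNSTRUCTURED_HINTS):
        return "unstructured"
    return "structured"  # default: counts, examples, distributions


def structured_agent(query: str) -> str:
    return f"[structured] {query}"


def unstructured_agent(query: str) -> str:
    return f"[unstructured] {query}"


def recommender(query: str) -> str:
    return f"[recommender] {query}"


# Dispatch table: classifier label -> agent callable.
AGENTS: Dict[str, Callable[[str], str]] = {
    "structured": structured_agent,
    "unstructured": unstructured_agent,
    "recommender": recommender,
}


def route(query: str) -> str:
    """Classify the query, then hand it to the matching agent."""
    return AGENTS[classify_query(query)](query)
```

A dispatch table keeps the routing decision separate from the agents themselves, so adding a new agent type only means adding one label and one entry.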

## 🚀 Setup Instructions

### Prerequisites
- **Python Version**: 3.9 or higher
- **API Key**: OpenAI API key or Nebius API key
- **For Hugging Face Spaces**: Ensure your API key is set as a Space secret

### Installation

1. **Clone the repository**:
```bash
git clone <repository-url>
cd Agents
```

2. **Install dependencies**:
```bash
pip install -r requirements.txt
```

3. **Configure API Key**:

Create a `.env` file in the project root:
```bash
# For OpenAI (recommended)
OPENAI_API_KEY=your_openai_api_key_here

# OR for Nebius
NEBIUS_API_KEY=your_nebius_api_key_here
```

4. **Run the application**:
```bash
streamlit run app.py
```

5. **Access the app**:
Open your browser to `http://localhost:8501`

### Alternative Deployment

#### For Hugging Face Spaces:
1. **Fork or upload this repository to Hugging Face Spaces**
2. **Set your API key as a Space secret:**
   - Go to your Space settings
   - Navigate to "Variables and secrets" 
   - Add a secret named `NEBIUS_API_KEY` or `OPENAI_API_KEY`
   - Enter your API key as the value
3. **The app will start automatically**

#### For other cloud deployment:
```bash
export OPENAI_API_KEY=your_api_key_here
# OR
export NEBIUS_API_KEY=your_api_key_here
```
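
However the key is supplied (`.env`, Space secret, or exported variable), the app ends up reading it from the environment. A minimal sketch of that lookup follows. The function name `resolve_api_key` and the OpenAI-first lookup order are assumptions, and the sketch assumes any `.env` file has already been loaded into the environment (for example with python-dotenv's `load_dotenv()`).

```python
import os
from typing import Mapping, Optional, Tuple

# Variable names come from the setup steps; the lookup order is an assumption.
KEY_VARIABLES = ("OPENAI_API_KEY", "NEBIUS_API_KEY")


def resolve_api_key(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (variable_name, key) for the first non-empty key found."""
    env = os.environ if env is None else env
    for name in KEY_VARIABLES:
        value = env.get(name, "").strip()
        if value:
            return name, value
    raise RuntimeError(
        "No API key found. Set OPENAI_API_KEY or NEBIUS_API_KEY in .env, "
        "as a Space secret, or as an environment variable."
    )
```

Failing early with an explicit message makes a missing key easier to diagnose than an authentication error surfacing later inside an agent call.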

## 🎯 Usage Guide

### Query Types

#### Structured Queries (Quantitative Analysis)
- "How many records are in each category?"
- "What are the most common customer issues?"
- "Show me 5 examples of billing problems"
- "Get distribution of intents"

#### Unstructured Queries (Qualitative Analysis)  
- "Summarize the refund category"
- "What patterns do you see in payment issues?"
- "Analyze customer sentiment in billing conversations"
- "What insights can you provide about technical support?"

#### Memory & Recommendations
- "What do you remember about me?"
- "What should I query next?"
- "Advise me what to explore"
- "Recommend follow-up questions"

### Session Management

#### Creating Sessions
- **New Session**: Click "🆕 New Session" to start fresh
- **Auto-Generated**: Each new browser session gets a unique ID

#### Resuming Sessions
1. Copy your session ID from the sidebar (e.g., `a1b2c3d4...`)
2. Enter the full session ID in "Join Existing Session"
3. Click "🔗 Join Session" to resume

#### Cross-Tab Persistence
- Open multiple tabs with the same session ID
- Conversations sync across all tabs
- Memory and user profile persist
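
The session ID doubles as the thread identifier for the checkpointer. The sketch below shows one way to generate and validate IDs and to build the `{"configurable": {"thread_id": ...}}` config that LangGraph checkpointers key on. The helper names and the use of UUID4 as the ID format are assumptions, not confirmed details of this app.

```python
import uuid
from typing import Any, Dict


def new_session_id() -> str:
    """Generate a random session ID (UUID4 is an assumed format)."""
    return str(uuid.uuid4())


def is_valid_session_id(candidate: str) -> bool:
    """Check that a pasted ID parses as a UUID before joining a session."""
    try:
        uuid.UUID(candidate.strip())
    except ValueError:
        return False
    return True


def thread_config(session_id: str) -> Dict[str, Any]:
    """Build the per-call config that selects a conversation thread."""
    return {"configurable": {"thread_id": session_id}}
```

Validating the pasted ID up front explains why the full session ID is needed when joining: a truncated value such as `a1b2c3d4...` would not identify a thread.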

## 🧠 Memory System

### User Profile Tracking
The agent automatically tracks:
- **Interests**: Topics and categories you frequently ask about
- **Expertise Level**: Inferred from question complexity (beginner/intermediate/advanced)
- **Preferences**: Analysis style preferences (quantitative vs qualitative)
- **Query History**: Recent questions for context
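
As a rough illustration of what such a profile could look like, here is a small sketch. The `UserProfile` class, its field names, and the counting logic are hypothetical; in the app itself the profile is synthesized by an LLM rather than by counting topics.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UserProfile:
    """Hypothetical profile shape; fields mirror the tracked items above."""

    interests: Dict[str, int] = field(default_factory=dict)  # topic -> mentions
    expertise_level: str = "beginner"
    preferences: Dict[str, str] = field(default_factory=dict)
    query_history: List[str] = field(default_factory=list)

    def record_query(self, query: str, topics: List[str], max_history: int = 10) -> None:
        """Store the query and bump the count of each topic it mentions."""
        for topic in topics:
            self.interests[topic] = self.interests.get(topic, 0) + 1
        self.query_history.append(query)
        # Keep only the most recent queries for context.
        self.query_history = self.query_history[-max_history:]

    def top_interests(self, n: int = 3) -> List[str]:
        """Return the n most frequently mentioned topics."""
        ranked = sorted(self.interests.items(), key=lambda item: item[1], reverse=True)
        return [topic for topic, _ in ranked[:n]]
```

Capping the query history keeps the profile small enough to include in every prompt without growing unbounded over a long session.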

### Conversation Persistence
- **Thread-based**: Each session has a unique thread ID
- **Checkpoint System**: LangGraph automatically saves state after each interaction
- **Cross-Session**: Resume conversations days or weeks later
- **In-Memory Storage**: `MemorySaver` keeps checkpoints in process memory, so a saved thread is available only while the app process keeps running

### Memory Queries
Ask the agent what it remembers:
```
"What do you remember about me?"
"What are my interests?"
"What have I asked about before?"
```

## 🔧 Testing the Agent

### Basic Functionality Tests

1. **Classification Test**:
```
Query: "How many categories are there?"
Expected: Routes to Structured Agent → Uses get_dataset_stats tool
```

2. **Follow-up Memory Test**:
```
Query 1: "Show me billing examples"
Query 2: "Show me more examples"
Expected: Agent remembers previous context about billing
```

3. **User Profile Test**:
```
Query 1: "I'm interested in refund patterns"
Query 2: "What do you remember about me?"
Expected: Agent mentions interest in refunds
```

4. **Recommendation Test**:
```
Query: "What should I query next?"
Expected: Personalized suggestions based on history
```

### Advanced Feature Tests

1. **Session Persistence**:
   - Ask a question, reload the page
   - Verify conversation history remains
   - Verify user profile persists

2. **Cross-Session Memory**:
   - Note your session ID
   - Close browser completely
   - Reopen and join the same session
   - Verify full conversation and profile restoration

3. **Interactive Recommendations**:
```
User: "Advise me what to query next"
Agent: "Based on your interest in billing, you might want to analyze refund patterns."
User: "I'd rather see examples instead"
Agent: "Then I suggest showing 5 examples of refund requests."
User: "Please do so"
Expected: Agent executes the refined query
```

## 📁 File Structure

```
Agents/
├── README.md                 # This file
├── requirements.txt          # Python dependencies
├── .env                      # API keys (create this)
├── app.py                    # LangGraph Streamlit app
├── langgraph_agent.py        # LangGraph agent implementation
├── agent-memory.ipynb        # Memory example notebook
├── test_agent.py             # Test suite
└── DEPLOYMENT_GUIDE.md       # Original deployment guide
```

## 🛠️ Technical Implementation

### LangGraph Components

**State Management**:
```python
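from typing import Any, Dict, List, Optional, TypedDict


class AgentState(TypedDict):
    messages: List[Any]                        # conversation messages so far
    query_type: Optional[str]                  # label set by the classifier
    user_profile: Optional[Dict[str, Any]]     # tracked interests and preferences
    session_context: Optional[Dict[str, Any]]  # per-session context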
```

**Tool Categories**:
- **Structured Tools**: Statistics, distributions, examples, search
- **Unstructured Tools**: Summaries, insights, pattern analysis
- **Memory Tools**: Profile updates, preference tracking

**Graph Flow**:
1. **Classifier**: Determines query type
2. **Agent Selection**: Routes to appropriate specialist
3. **Tool Execution**: Dynamic tool usage based on needs
4. **Memory Update**: Profile and context updates
5. **Response Generation**: Final answer with memory integration
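
The flow above can be imitated without LangGraph to show how state moves between steps. In this sketch each node returns only the keys it changes and the runner merges them into the state, which mirrors how LangGraph nodes typically return partial updates. The node bodies and `run_pipeline` are hypothetical placeholders, not the app's code.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Node = Callable[[State], State]


def classifier(state: State) -> State:
    # Placeholder rule; the real classifier is presumably LLM-based.
    last = state["messages"][-1].lower()
    return {"query_type": "unstructured" if "summarize" in last else "structured"}


def specialist(state: State) -> State:
    # Stand-in for agent selection plus tool execution.
    answer = f"({state['query_type']}) answer to: {state['messages'][-1]}"
    return {"messages": state["messages"] + [answer]}


def summarizer(state: State) -> State:
    # Memory update: count how many queries of each type were asked.
    profile = dict(state.get("user_profile") or {})
    profile[state["query_type"]] = profile.get(state["query_type"], 0) + 1
    return {"user_profile": profile}


def run_pipeline(state: State, nodes: List[Node]) -> State:
    """Run nodes in order, merging each partial update into the state."""
    for node in nodes:
        state = {**state, **node(state)}
    return state


initial: State = {
    "messages": ["Summarize the refund category"],
    "query_type": None,
    "user_profile": None,
    "session_context": None,
}
final = run_pipeline(initial, [classifier, specialist, summarizer])
```

Because every node builds new values instead of mutating its input, `initial` is left unchanged, which makes individual steps easy to test in isolation.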

### Memory Architecture

- **Checkpointer**: LangGraph's `MemorySaver` for conversation persistence
- **Thread Management**: Unique thread IDs for session isolation
- **Profile Synthesis**: LLM-powered extraction of user characteristics
- **Context Retention**: Full conversation history with temporal awareness
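
To make the idea of thread-keyed persistence concrete, here is a toy store that saves and restores state per thread ID. It is a deliberately simplified stand-in and not LangGraph's `MemorySaver`; the class and method names are hypothetical.

```python
import copy
from typing import Any, Dict, Optional


class InMemoryCheckpointStore:
    """Toy stand-in for a checkpointer; not LangGraph's MemorySaver."""

    def __init__(self) -> None:
        self._checkpoints: Dict[str, Dict[str, Any]] = {}

    def save(self, thread_id: str, state: Dict[str, Any]) -> None:
        # Deep-copy so later mutations do not alter the saved checkpoint.
        self._checkpoints[thread_id] = copy.deepcopy(state)

    def load(self, thread_id: str) -> Optional[Dict[str, Any]]:
        # Unknown thread IDs yield None, which isolates sessions from each other.
        saved = self._checkpoints.get(thread_id)
        return copy.deepcopy(saved) if saved is not None else None
```

Like any in-memory store, this one loses its contents when the process restarts, which is the motivation for swapping in a database-backed checkpointer.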

## 🔍 Troubleshooting

### Common Issues

1. **API Key Errors**:
   - Verify `.env` file exists and has correct key
   - Check environment variable is set in deployment
   - Ensure API key has sufficient credits

2. **Memory Not Persisting**:
   - Verify that the session ID remains consistent
   - Check that the browser's localStorage is not being cleared
   - Ensure the `thread_id` parameter is passed correctly

3. **Dataset Loading Issues**:
   - Check internet connection for Hugging Face datasets
   - Verify datasets library is installed
   - Try clearing Streamlit cache: `streamlit cache clear`

4. **Tool Execution Errors**:
   - Verify all dependencies in requirements.txt are installed
   - Check dataset is properly loaded
   - Review error messages in Streamlit interface

### Debug Mode

Enable debug logging by setting:
```python
import logging
logging.basicConfig(level=logging.DEBUG)
```

## 🎓 Learning Objectives

This implementation demonstrates:

1. **LangGraph Multi-Agent Systems**: Specialized agents for different query types
2. **Memory & Persistence**: Conversation continuity across sessions  
3. **Tool Integration**: Dynamic tool selection and execution
4. **State Management**: Complex state updates and routing
5. **User Experience**: Session management and interactive features

## 🚀 Future Enhancements

Potential improvements:
- **Database Persistence**: Replace MemorySaver with PostgreSQL checkpointer
- **Advanced Analytics**: More sophisticated data analysis tools
- **Export Features**: PDF/CSV report generation
- **User Authentication**: Multi-user support with profiles
- **Real-time Collaboration**: Shared sessions between users

## 📄 License

This project is released under the MIT license, as declared in the Space metadata. It is intended for educational purposes as part of a data science curriculum.

## 🤝 Contributing

This is an assignment project. For questions or issues, please contact the course instructors.

---

**Built with**: LangGraph, Streamlit, OpenAI/Nebius, Hugging Face Datasets