---
title: LangGraph Data Analyst Agent
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.28.0
app_file: app.py
pinned: false
license: mit
---

🤖 LangGraph Data Analyst Agent

An intelligent data analyst agent built with LangGraph that analyzes customer support conversations with advanced memory, conversation persistence, and query recommendations.

🌟 Features

Core Functionality

  • Multi-Agent Architecture: Separate specialized agents for structured and unstructured queries
  • Query Classification: Automatic routing to appropriate agent based on query type
  • Rich Tool Set: Comprehensive tools for data analysis and insights

Advanced Memory & Persistence

  • Session Management: Persistent conversations across page reloads and browser sessions
  • User Profile Tracking: Agent learns and remembers user interests and preferences
  • Conversation History: Full context retention using LangGraph checkpointers
  • Cross-Session Continuity: Resume conversations using session IDs

Intelligent Recommendations

  • Query Suggestions: AI-powered recommendations based on conversation history
  • Interactive Refinement: Collaborative query building with the agent
  • Context-Aware: Suggestions based on user profile and previous interactions

πŸ—οΈ Architecture

The agent uses LangGraph's multi-agent architecture with the following components:

User Query → Classifier → [Structured Agent | Unstructured Agent | Recommender] → Summarizer → Response
                ↓
            Tool Nodes (Dataset Analysis Tools)

Agent Types

  1. Structured Agent: Handles quantitative queries (statistics, examples, distributions)
  2. Unstructured Agent: Handles qualitative queries (summaries, insights, patterns)
  3. Query Recommender: Suggests follow-up questions based on context
  4. Summarizer: Updates user profile and conversation memory
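The classifier's routing decision can be sketched in plain Python. This is an illustrative keyword-based stand-in (the actual agent classifies queries with an LLM), and the function and label names here are hypothetical:

```python
# Illustrative keyword-based classifier; the real agent uses an LLM.
# Hint lists and label names are assumptions for this sketch.
STRUCTURED_HINTS = ("how many", "count", "distribution", "examples", "show me")
RECOMMEND_HINTS = ("what should i", "recommend", "advise")

def classify_query(query: str) -> str:
    """Route a user query to one of the specialist agents."""
    q = query.lower()
    if any(h in q for h in RECOMMEND_HINTS):
        return "recommender"
    if any(h in q for h in STRUCTURED_HINTS):
        return "structured"
    return "unstructured"
```

An LLM-based classifier replaces the keyword lists with a prompt, but the routing contract (one label in, one specialist out) stays the same.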

🚀 Setup Instructions

Prerequisites

  • Python Version: 3.9 or higher
  • API Key: OpenAI API key or Nebius API key
  • For Hugging Face Spaces: Ensure your API key is set as a Space secret

Installation

  1. Clone the repository:
git clone <repository-url>
cd Agents
  2. Install dependencies:
pip install -r requirements.txt
  3. Configure API Key:

Create a .env file in the project root:

# For OpenAI (recommended)
OPENAI_API_KEY=your_openai_api_key_here

# OR for Nebius
NEBIUS_API_KEY=your_nebius_api_key_here
  4. Run the application:
streamlit run app.py
  5. Access the app: Open your browser to http://localhost:8501

Alternative Deployment

For Hugging Face Spaces:

  1. Fork or upload this repository to Hugging Face Spaces
  2. Set your API key as a Space secret:
    • Go to your Space settings
    • Navigate to "Variables and secrets"
    • Add a secret named NEBIUS_API_KEY or OPENAI_API_KEY
    • Enter your API key as the value
  3. The app will start automatically

For other cloud deployment:

export OPENAI_API_KEY=your_api_key_here
# OR
export NEBIUS_API_KEY=your_api_key_here

🎯 Usage Guide

Query Types

Structured Queries (Quantitative Analysis)

  • "How many records are in each category?"
  • "What are the most common customer issues?"
  • "Show me 5 examples of billing problems"
  • "Get distribution of intents"

Unstructured Queries (Qualitative Analysis)

  • "Summarize the refund category"
  • "What patterns do you see in payment issues?"
  • "Analyze customer sentiment in billing conversations"
  • "What insights can you provide about technical support?"

Memory & Recommendations

  • "What do you remember about me?"
  • "What should I query next?"
  • "Advise me what to explore"
  • "Recommend follow-up questions"

Session Management

Creating Sessions

  • New Session: Click "🆕 New Session" to start fresh
  • Auto-Generated: Each new browser session gets a unique ID

Resuming Sessions

  1. Copy your session ID from the sidebar (e.g., a1b2c3d4...)
  2. Enter the full session ID in "Join Existing Session"
  3. Click "🔗 Join Session" to resume

Cross-Tab Persistence

  • Open multiple tabs with the same session ID
  • Conversations sync across all tabs
  • Memory and user profile persist

🧠 Memory System

User Profile Tracking

The agent automatically tracks:

  • Interests: Topics and categories you frequently ask about
  • Expertise Level: Inferred from question complexity (beginner/intermediate/advanced)
  • Preferences: Analysis style preferences (quantitative vs qualitative)
  • Query History: Recent questions for context
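One way to picture the profile update step is a small merge function over the tracked fields. The app itself synthesizes the profile with an LLM rather than keyword rules, so treat the topic list and field names below as hypothetical:

```python
# Hypothetical rule-based sketch of the profile update step;
# the real agent extracts interests with an LLM, not keyword matching.
def update_profile(profile: dict, query: str) -> dict:
    profile.setdefault("interests", [])
    profile.setdefault("query_history", [])
    for topic in ("billing", "refund", "technical support"):  # assumed topics
        if topic in query.lower() and topic not in profile["interests"]:
            profile["interests"].append(topic)
    # Keep only the most recent queries for context
    profile["query_history"] = (profile["query_history"] + [query])[-10:]
    return profile
```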

Conversation Persistence

  • Thread-based: Each session has a unique thread ID
  • Checkpoint System: LangGraph automatically saves state after each interaction
  • Cross-Session: Resume conversations days or weeks later
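Conceptually, the checkpoint system is a state store keyed by thread ID; LangGraph's MemorySaver provides this behaviour for real graphs. A minimal stand-in (class and method names are invented for illustration) looks like:

```python
import uuid

# Minimal conceptual sketch of thread-keyed checkpointing.
# TinyCheckpointer is an invented stand-in for LangGraph's MemorySaver.
class TinyCheckpointer:
    def __init__(self):
        self._store = {}  # thread_id -> latest state snapshot

    def save(self, thread_id: str, state: dict) -> None:
        self._store[thread_id] = dict(state)

    def load(self, thread_id: str) -> dict:
        # An unknown thread starts from an empty state
        return dict(self._store.get(thread_id, {}))

thread_id = str(uuid.uuid4())  # one unique ID per session
```

Resuming a session is then just loading state under the same thread ID, which is why keeping the session ID consistent matters.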

Memory Queries

Ask the agent what it remembers:

"What do you remember about me?"
"What are my interests?"
"What have I asked about before?"

🔧 Testing the Agent

Basic Functionality Tests

  1. Classification Test:
Query: "How many categories are there?"
Expected: Routes to Structured Agent → Uses get_dataset_stats tool
  2. Follow-up Memory Test:
Query 1: "Show me billing examples"
Query 2: "Show me more examples"
Expected: Agent remembers previous context about billing
  3. User Profile Test:
Query 1: "I'm interested in refund patterns"
Query 2: "What do you remember about me?"
Expected: Agent mentions interest in refunds
  4. Recommendation Test:
Query: "What should I query next?"
Expected: Personalized suggestions based on history

Advanced Feature Tests

  1. Session Persistence:

    • Ask a question, reload the page
    • Verify conversation history remains
    • Verify user profile persists
  2. Cross-Session Memory:

    • Note your session ID
    • Close browser completely
    • Reopen and join the same session
    • Verify full conversation and profile restoration
  3. Interactive Recommendations:

User: "Advise me what to query next"
Agent: "Based on your interest in billing, you might want to analyze refund patterns."
User: "I'd rather see examples instead"
Agent: "Then I suggest showing 5 examples of refund requests."
User: "Please do so"
Expected: Agent executes the refined query

πŸ“ File Structure

Agents/
├── README.md                 # This file
├── requirements.txt          # Python dependencies
├── .env                      # API keys (create this)
├── app.py                    # LangGraph Streamlit app
├── langgraph_agent.py        # LangGraph agent implementation
├── agent-memory.ipynb        # Memory example notebook
├── test_agent.py             # Test suite
└── DEPLOYMENT_GUIDE.md       # Original deployment guide

πŸ› οΈ Technical Implementation

LangGraph Components

State Management:

from typing import Any, Dict, List, Optional
from typing_extensions import TypedDict

class AgentState(TypedDict):
    messages: List[Any]
    query_type: Optional[str]
    user_profile: Optional[Dict[str, Any]]
    session_context: Optional[Dict[str, Any]]

Tool Categories:

  • Structured Tools: Statistics, distributions, examples, search
  • Unstructured Tools: Summaries, insights, pattern analysis
  • Memory Tools: Profile updates, preference tracking
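A structured tool in this spirit is typically a plain function over the dataset. The sketch below shows what a distribution tool like get_dataset_stats might compute; the record schema (a "category" field) is an assumption for illustration:

```python
from collections import Counter

# Sketch of a structured analysis tool; the record schema
# ({"category": ...}) is assumed for this example.
def get_category_distribution(records: list[dict]) -> dict[str, int]:
    """Count how many records fall into each category."""
    return dict(Counter(r["category"] for r in records))
```

In the app, such functions are wrapped as LangGraph tools so the structured agent can call them dynamically.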

Graph Flow:

  1. Classifier: Determines query type
  2. Agent Selection: Routes to appropriate specialist
  3. Tool Execution: Dynamic tool usage based on needs
  4. Memory Update: Profile and context updates
  5. Response Generation: Final answer with memory integration

Memory Architecture

  • Checkpointer: LangGraph's MemorySaver for conversation persistence
  • Thread Management: Unique thread IDs for session isolation
  • Profile Synthesis: LLM-powered extraction of user characteristics
  • Context Retention: Full conversation history with temporal awareness

πŸ” Troubleshooting

Common Issues

  1. API Key Errors:

    • Verify .env file exists and has correct key
    • Check environment variable is set in deployment
    • Ensure API key has sufficient credits
  2. Memory Not Persisting:

    • Verify session ID remains consistent
    • Check browser localStorage not being cleared
    • Ensure thread_id parameter is passed correctly
  3. Dataset Loading Issues:

    • Check internet connection for Hugging Face datasets
    • Verify datasets library is installed
    • Try clearing Streamlit cache: streamlit cache clear
  4. Tool Execution Errors:

    • Verify all dependencies in requirements.txt are installed
    • Check dataset is properly loaded
    • Review error messages in Streamlit interface

Debug Mode

Enable debug logging by setting:

import logging
logging.basicConfig(level=logging.DEBUG)

🎓 Learning Objectives

This implementation demonstrates:

  1. LangGraph Multi-Agent Systems: Specialized agents for different query types
  2. Memory & Persistence: Conversation continuity across sessions
  3. Tool Integration: Dynamic tool selection and execution
  4. State Management: Complex state updates and routing
  5. User Experience: Session management and interactive features

🚀 Future Enhancements

Potential improvements:

  • Database Persistence: Replace MemorySaver with PostgreSQL checkpointer
  • Advanced Analytics: More sophisticated data analysis tools
  • Export Features: PDF/CSV report generation
  • User Authentication: Multi-user support with profiles
  • Real-time Collaboration: Shared sessions between users

📄 License

This project is for educational purposes as part of a data science curriculum.

🤝 Contributing

This is an assignment project. For questions or issues, please contact the course instructors.


Built with: LangGraph, Streamlit, OpenAI/Nebius, Hugging Face Datasets