---
title: LangGraph Data Analyst Agent
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.28.0
app_file: app.py
pinned: false
license: mit
---

🤖 LangGraph Data Analyst Agent

An intelligent data analyst agent built with LangGraph that analyzes customer support conversations with advanced memory, conversation persistence, and query recommendations.

🌟 Features

Core Functionality

  • Multi-Agent Architecture: Separate specialized agents for structured and unstructured queries
  • Query Classification: Automatic routing to appropriate agent based on query type
  • Rich Tool Set: Comprehensive tools for data analysis and insights

Advanced Memory & Persistence

  • Session Management: Persistent conversations across page reloads and browser sessions
  • User Profile Tracking: Agent learns and remembers user interests and preferences
  • Conversation History: Full context retention using LangGraph checkpointers
  • Cross-Session Continuity: Resume conversations using session IDs

Intelligent Recommendations

  • Query Suggestions: AI-powered recommendations based on conversation history
  • Interactive Refinement: Collaborative query building with the agent
  • Context-Aware: Suggestions based on user profile and previous interactions

πŸ—οΈ Architecture

The agent uses LangGraph's multi-agent architecture with the following components:

User Query → Classifier → [Structured Agent | Unstructured Agent | Recommender] → Summarizer → Response
                ↓
            Tool Nodes (Dataset Analysis Tools)

Agent Types

  1. Structured Agent: Handles quantitative queries (statistics, examples, distributions)
  2. Unstructured Agent: Handles qualitative queries (summaries, insights, patterns)
  3. Query Recommender: Suggests follow-up questions based on context
  4. Summarizer: Updates user profile and conversation memory
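The classifier's routing decision can be sketched in plain Python. This is an illustrative keyword-based stand-in (the actual agent classifies queries with an LLM), and the function and label names here are hypothetical:

```python
# Illustrative keyword-based classifier; the real agent uses an LLM.
# Hint lists and label names are assumptions for this sketch.
STRUCTURED_HINTS = ("how many", "count", "distribution", "examples", "show me")
RECOMMEND_HINTS = ("what should i", "recommend", "advise")

def classify_query(query: str) -> str:
    """Route a user query to one of the specialist agents."""
    q = query.lower()
    if any(h in q for h in RECOMMEND_HINTS):
        return "recommender"
    if any(h in q for h in STRUCTURED_HINTS):
        return "structured"
    return "unstructured"
```

An LLM-based classifier replaces the keyword lists with a prompt, but the routing contract (one label in, one specialist out) stays the same.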

🚀 Setup Instructions

Prerequisites

  • Python Version: 3.9 or higher
  • API Key: OpenAI API key or Nebius API key
  • For Hugging Face Spaces: Ensure your API key is set as a Space secret

Installation

  1. Clone the repository:
git clone <repository-url>
cd Agents
  2. Install dependencies:
pip install -r requirements.txt
  3. Configure API Key:

Create a .env file in the project root:

# For OpenAI (recommended)
OPENAI_API_KEY=your_openai_api_key_here

# OR for Nebius
NEBIUS_API_KEY=your_nebius_api_key_here
  4. Run the application:
streamlit run app.py
  5. Access the app: Open your browser to http://localhost:8501

Alternative Deployment

For Hugging Face Spaces:

  1. Fork or upload this repository to Hugging Face Spaces
  2. Set your API key as a Space secret:
    • Go to your Space settings
    • Navigate to "Variables and secrets"
    • Add a secret named NEBIUS_API_KEY or OPENAI_API_KEY
    • Enter your API key as the value
  3. The app will start automatically

For other cloud deployment:

export OPENAI_API_KEY=your_api_key_here
# OR
export NEBIUS_API_KEY=your_api_key_here

🎯 Usage Guide

Query Types

Structured Queries (Quantitative Analysis)

  • "How many records are in each category?"
  • "What are the most common customer issues?"
  • "Show me 5 examples of billing problems"
  • "Get distribution of intents"

Unstructured Queries (Qualitative Analysis)

  • "Summarize the refund category"
  • "What patterns do you see in payment issues?"
  • "Analyze customer sentiment in billing conversations"
  • "What insights can you provide about technical support?"

Memory & Recommendations

  • "What do you remember about me?"
  • "What should I query next?"
  • "Advise me what to explore"
  • "Recommend follow-up questions"

Session Management

Creating Sessions

  • New Session: Click "🆕 New Session" to start fresh
  • Auto-Generated: Each new browser session gets a unique ID

Resuming Sessions

  1. Copy your session ID from the sidebar (e.g., a1b2c3d4...)
  2. Enter the full session ID in "Join Existing Session"
  3. Click "🔗 Join Session" to resume

Cross-Tab Persistence

  • Open multiple tabs with the same session ID
  • Conversations sync across all tabs
  • Memory and user profile persist

🧠 Memory System

User Profile Tracking

The agent automatically tracks:

  • Interests: Topics and categories you frequently ask about
  • Expertise Level: Inferred from question complexity (beginner/intermediate/advanced)
  • Preferences: Analysis style preferences (quantitative vs qualitative)
  • Query History: Recent questions for context
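One way to picture the profile update step is a small merge function over the tracked fields. The app itself synthesizes the profile with an LLM rather than keyword rules, so treat the topic list and field names below as hypothetical:

```python
# Hypothetical rule-based sketch of the profile update step;
# the real agent extracts interests with an LLM, not keyword matching.
def update_profile(profile: dict, query: str) -> dict:
    profile.setdefault("interests", [])
    profile.setdefault("query_history", [])
    for topic in ("billing", "refund", "technical support"):  # assumed topics
        if topic in query.lower() and topic not in profile["interests"]:
            profile["interests"].append(topic)
    # Keep only the most recent queries for context
    profile["query_history"] = (profile["query_history"] + [query])[-10:]
    return profile
```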

Conversation Persistence

  • Thread-based: Each session has a unique thread ID
  • Checkpoint System: LangGraph automatically saves state after each interaction
  • Cross-Session: Resume conversations days or weeks later
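Conceptually, the checkpoint system is a state store keyed by thread ID; LangGraph's MemorySaver provides this behaviour for real graphs. A minimal stand-in (class and method names are invented for illustration) looks like:

```python
import uuid

# Minimal conceptual sketch of thread-keyed checkpointing.
# TinyCheckpointer is an invented stand-in for LangGraph's MemorySaver.
class TinyCheckpointer:
    def __init__(self):
        self._store = {}  # thread_id -> latest state snapshot

    def save(self, thread_id: str, state: dict) -> None:
        self._store[thread_id] = dict(state)

    def load(self, thread_id: str) -> dict:
        # An unknown thread starts from an empty state
        return dict(self._store.get(thread_id, {}))

thread_id = str(uuid.uuid4())  # one unique ID per session
```

Resuming a session is then just loading state under the same thread ID, which is why keeping the session ID consistent matters.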

Memory Queries

Ask the agent what it remembers:

"What do you remember about me?"
"What are my interests?"
"What have I asked about before?"

🔧 Testing the Agent

Basic Functionality Tests

  1. Classification Test:
Query: "How many categories are there?"
Expected: Routes to Structured Agent → Uses get_dataset_stats tool
  2. Follow-up Memory Test:
Query 1: "Show me billing examples"
Query 2: "Show me more examples"
Expected: Agent remembers previous context about billing
  3. User Profile Test:
Query 1: "I'm interested in refund patterns"
Query 2: "What do you remember about me?"
Expected: Agent mentions interest in refunds
  4. Recommendation Test:
Query: "What should I query next?"
Expected: Personalized suggestions based on history

Advanced Feature Tests

  1. Session Persistence:

    • Ask a question, reload the page
    • Verify conversation history remains
    • Verify user profile persists
  2. Cross-Session Memory:

    • Note your session ID
    • Close browser completely
    • Reopen and join the same session
    • Verify full conversation and profile restoration
  3. Interactive Recommendations:

User: "Advise me what to query next"
Agent: "Based on your interest in billing, you might want to analyze refund patterns."
User: "I'd rather see examples instead"
Agent: "Then I suggest showing 5 examples of refund requests."
User: "Please do so"
Expected: Agent executes the refined query

πŸ“ File Structure

Agents/
├── README.md                 # This file
├── requirements.txt          # Python dependencies
├── .env                      # API keys (create this)
├── app.py                    # LangGraph Streamlit app
├── langgraph_agent.py        # LangGraph agent implementation
├── agent-memory.ipynb        # Memory example notebook
├── test_agent.py             # Test suite
└── DEPLOYMENT_GUIDE.md       # Original deployment guide

πŸ› οΈ Technical Implementation

LangGraph Components

State Management:

from typing import Any, Dict, List, Optional
from typing_extensions import TypedDict

class AgentState(TypedDict):
    messages: List[Any]
    query_type: Optional[str]
    user_profile: Optional[Dict[str, Any]]
    session_context: Optional[Dict[str, Any]]

Tool Categories:

  • Structured Tools: Statistics, distributions, examples, search
  • Unstructured Tools: Summaries, insights, pattern analysis
  • Memory Tools: Profile updates, preference tracking
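A structured tool in this spirit is typically a plain function over the dataset. The sketch below shows what a distribution tool like get_dataset_stats might compute; the record schema (a "category" field) is an assumption for illustration:

```python
from collections import Counter

# Sketch of a structured analysis tool; the record schema
# ({"category": ...}) is assumed for this example.
def get_category_distribution(records: list[dict]) -> dict[str, int]:
    """Count how many records fall into each category."""
    return dict(Counter(r["category"] for r in records))
```

In the app, such functions are wrapped as LangGraph tools so the structured agent can call them dynamically.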

Graph Flow:

  1. Classifier: Determines query type
  2. Agent Selection: Routes to appropriate specialist
  3. Tool Execution: Dynamic tool usage based on needs
  4. Memory Update: Profile and context updates
  5. Response Generation: Final answer with memory integration

Memory Architecture

  • Checkpointer: LangGraph's MemorySaver for conversation persistence
  • Thread Management: Unique thread IDs for session isolation
  • Profile Synthesis: LLM-powered extraction of user characteristics
  • Context Retention: Full conversation history with temporal awareness

πŸ” Troubleshooting

Common Issues

  1. API Key Errors:

    • Verify .env file exists and has correct key
    • Check environment variable is set in deployment
    • Ensure API key has sufficient credits
  2. Memory Not Persisting:

    • Verify session ID remains consistent
    • Check browser localStorage not being cleared
    • Ensure thread_id parameter is passed correctly
  3. Dataset Loading Issues:

    • Check internet connection for Hugging Face datasets
    • Verify datasets library is installed
    • Try clearing Streamlit cache: streamlit cache clear
  4. Tool Execution Errors:

    • Verify all dependencies in requirements.txt are installed
    • Check dataset is properly loaded
    • Review error messages in Streamlit interface

Debug Mode

Enable debug logging by setting:

import logging
logging.basicConfig(level=logging.DEBUG)

🎓 Learning Objectives

This implementation demonstrates:

  1. LangGraph Multi-Agent Systems: Specialized agents for different query types
  2. Memory & Persistence: Conversation continuity across sessions
  3. Tool Integration: Dynamic tool selection and execution
  4. State Management: Complex state updates and routing
  5. User Experience: Session management and interactive features

🚀 Future Enhancements

Potential improvements:

  • Database Persistence: Replace MemorySaver with PostgreSQL checkpointer
  • Advanced Analytics: More sophisticated data analysis tools
  • Export Features: PDF/CSV report generation
  • User Authentication: Multi-user support with profiles
  • Real-time Collaboration: Shared sessions between users

📄 License

This project is for educational purposes as part of a data science curriculum.

🤝 Contributing

This is an assignment project. For questions or issues, please contact the course instructors.


Built with: LangGraph, Streamlit, OpenAI/Nebius, Hugging Face Datasets