Space: NTU-Peak-2/Singtel_Use_Case1 (status: Runtime error)

cosmoruler committed on Jul 16

Commit 5269c7e (1 parent: 532a561)

first draft

Files changed (7)

README.md +202 -2
analyze.py +126 -0
app.py +234 -0
config.py +62 -0
requirements.txt +11 -0
test_setup.py +38 -0
upload.py +40 -0

README.md CHANGED

@@ -1,6 +1,6 @@
 ---
-title: Singtel Use Case1
-emoji: 🐠
 colorFrom: purple
 colorTo: green
 sdk: gradio
@@ -9,4 +9,204 @@ app_file: app.py
 pinned: false
 ---
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 ---
+title: AI Data Analysis with SmoLagent
+emoji: 🤖
 colorFrom: purple
 colorTo: green
 sdk: gradio
 pinned: false
 ---
+# AI Data Analysis with SmoLagent
+An intelligent data analysis application that uses SmoLagent for AI-powered insights on CSV data.
+## Features
+- 🤖 **AI-Powered Analysis**: Uses SmoLagent for natural language queries about your data
+- 📊 **Interactive Visualizations**: Correlation heatmaps, distribution plots, and more
+- 📈 **Statistical Analysis**: Comprehensive statistical summaries and insights
+- 🌐 **Web Interface**: User-friendly Gradio interface for easy interaction
+- 🔧 **Flexible**: Works with various LLM models (OpenAI, local models, etc.)
+## Quick Start
+1. **Install Dependencies**
+   ```bash
+   pip install -r requirements.txt
+   ```
+2. **Configure Your Model** (Optional for basic features)
+   - Edit `config.py` to set up your preferred LLM
+   - Supported: OpenAI, Ollama, Hugging Face, and more
+3. **Run the Application**
+   ```bash
+   python app.py
+   ```
+4. **Access the Interface**
+   - Open your browser to the displayed URL (usually http://localhost:7860)
+## Files Overview
+- `app.py` - Main Gradio application with AI analysis features
+- `upload.py` - Data loading and exploration script
+- `analyze.py` - Example script showing SmoLagent usage
+- `config.py` - Configuration file for model setup
+- `test_setup.py` - Checks that the key packages import and that the CSV file can be loaded
+- `requirements.txt` - Python dependencies
+## Model Configuration
+### OpenAI Models
+```python
+from smolagents.models import OpenAIServerModel
+model = OpenAIServerModel(
+    model_id="gpt-4",
+    api_key="your-openai-api-key"
+)
+```
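+### Local Models (Ollama)
+Requires the `litellm` package, which is not listed in `requirements.txt` (for example, `pip install "smolagents[litellm]"`).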
+```python
+from smolagents.models import LiteLLMModel
+model = LiteLLMModel(
+    model_id="ollama/llama2",
+    api_base="http://localhost:11434"
+)
+```
+### Hugging Face Models
+```python
+from smolagents.models import HfApiModel
+model = HfApiModel(
+    model_id="microsoft/DialoGPT-medium",
+    token="your-hf-token"
+)
+```
+## Usage Examples
+### Basic Data Exploration
+```bash
+python upload.py  # Load and explore your CSV data
+```
+### Interactive Analysis
+```bash
+python app.py     # Start the web interface
+```
+### Programmatic Analysis
+```bash
+python analyze.py # Run example analysis scripts
+```
+## Features Available
+### 1. Data Overview Tab
+- Dataset shape and structure
+- Column information and data types
+- Missing value analysis
+- Memory usage statistics
+### 2. Basic Statistics Tab
+- Descriptive statistics for all columns
+- Summary statistics (mean, median, std, etc.)
+- Data distribution insights
+### 3. Visualizations Tab
+- **Correlation Heatmap**: Shows relationships between numerical variables
+- **Distribution Plots**: Histograms for all numerical columns
+### 4. AI Analysis Tab
+- Natural language queries about your data
+- AI-powered insights and recommendations
+- Automated pattern detection
+- Outlier identification
+## Example AI Queries
+Ask SmoLagent questions like:
+- "What are the main trends in this data?"
+- "Find any outliers or anomalies"
+- "Suggest the best features for prediction"
+- "Identify data quality issues"
+- "Perform clustering analysis"
+- "Find seasonal patterns"
+## Data Requirements
+- CSV format
+- Update the file path in `config.py` or `upload.py`
+- Supports various data types (numerical, categorical, datetime)
+## Troubleshooting
+### Common Issues:
+1. **File Not Found Error**
+   - Check the CSV file path in `config.py`
+   - Ensure the file exists and is accessible
+2. **Model Configuration Error**
+   - Verify your API keys in `config.py`
+   - Check model availability and configuration
+3. **Dependency Issues**
+   - Run `pip install -r requirements.txt`
+   - Ensure Python 3.10+ is installed (current Gradio 5 and smolagents releases require it)
+### Getting Help:
+- Check the console output for detailed error messages
+- Verify your model configuration in `config.py`
+- Ensure your CSV file is properly formatted
+## Advanced Usage
+### Custom Analysis Functions
+You can extend the application by adding custom analysis functions:
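+```python
+import pandas as pd
+def custom_analysis(df: pd.DataFrame) -> dict:
+    # Your custom analysis logic here; this placeholder counts missing values per column
+    results = df.isnull().sum().to_dict()
+    return results
+```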
+### Adding New Visualizations
+Add new plotting functions to create additional visualizations:
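+```python
+import matplotlib.pyplot as plt
+def create_custom_plot(df):
+    fig, ax = plt.subplots(figsize=(12, 8))
+    # Your plotting logic here, drawing onto ax
+    return fig  # a gr.Plot component can display a Matplotlib figure directly
+```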
+## Dependencies
+- smolagents - AI agent framework
+- gradio - Web interface
+- pandas - Data manipulation
+- numpy - Numerical computing
+- matplotlib/seaborn - Plotting
+- plotly - Interactive visualizations
+- scikit-learn - Machine learning tools
+## License
+This project is open source and available under the MIT License.
+## Configuration Reference
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
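The README's "Custom Analysis Functions" placeholder and its "Outlier identification" feature could be filled in along the lines below. This is a minimal sketch, not code from this commit: `find_iqr_outliers` is a hypothetical helper, the 1.5 x IQR (Tukey fence) rule is an assumed definition of an outlier, and the `latency_ms` column is made-up sample data.

```python
import pandas as pd


def find_iqr_outliers(df: pd.DataFrame, k: float = 1.5) -> dict:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] for each numeric column.

    Hypothetical helper following the README's custom_analysis(df) pattern.
    """
    results = {}
    for col in df.select_dtypes(include="number").columns:
        series = df[col].dropna()
        q1, q3 = series.quantile(0.25), series.quantile(0.75)
        iqr = q3 - q1
        lower, upper = q1 - k * iqr, q3 + k * iqr
        # Boolean mask of values falling outside the Tukey fences
        mask = (series < lower) | (series > upper)
        results[col] = {
            "lower": float(lower),
            "upper": float(upper),
            "n_outliers": int(mask.sum()),
        }
    return results


if __name__ == "__main__":
    # Made-up sample data: one extreme latency value and a non-numeric column
    sample = pd.DataFrame({"latency_ms": [10, 12, 11, 13, 12, 250], "status": ["ok"] * 6})
    print(find_iqr_outliers(sample))
```

Non-numeric columns are skipped, so the function can be called on the whole DataFrame loaded by the app.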

analyze.py ADDED

	@@ -0,0 +1,126 @@

+"""
+Example script demonstrating SmoLagent data analysis
+===================================================
+This script shows how to use SmoLagent for automated data analysis
+"""
+import pandas as pd
+from smolagents import CodeAgent  # smolagents has no PythonCodeTool; CodeAgent runs Python itself
+import matplotlib.pyplot as plt
+import seaborn as sns
+# Configuration (the CSV path is set in config.py)
+from config import CSV_FILE_PATH
+def simple_data_analysis():
+    """Perform basic data analysis without AI agent"""
+    print("=== LOADING DATA ===")
+    try:
+        df = pd.read_csv(CSV_FILE_PATH)
+        print(f"✅ Data loaded successfully! Shape: {df.shape}")
+    except Exception as e:
+        print(f"❌ Error loading data: {e}")
+        return
+    print("\n=== BASIC INFO ===")
+    print(f"Columns: {list(df.columns)}")
+    print(f"Data types:\n{df.dtypes}")
+    print(f"\nMissing values:\n{df.isnull().sum()}")
+    print("\n=== STATISTICAL SUMMARY ===")
+    print(df.describe())
+    # Create some basic plots
+    numeric_columns = df.select_dtypes(include=['number']).columns
+    if len(numeric_columns) > 0:
+        print(f"\n=== CREATING PLOTS FOR {len(numeric_columns)} NUMERIC COLUMNS ===")
+        # Distribution plots
+        plt.figure(figsize=(15, 10))
+        for i, col in enumerate(numeric_columns[:6]):  # Limit to first 6 columns
+            plt.subplot(2, 3, i+1)
+            df[col].hist(bins=30, alpha=0.7)
+            plt.title(f'Distribution of {col}')
+            plt.xlabel(col)
+            plt.ylabel('Frequency')
+        plt.tight_layout()
+        plt.savefig('distributions.png', dpi=300, bbox_inches='tight')
+        plt.show()
+        print("✅ Distribution plots saved as 'distributions.png'")
+        # Correlation heatmap
+        if len(numeric_columns) > 1:
+            plt.figure(figsize=(12, 8))
+            correlation_matrix = df[numeric_columns].corr()
+            sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', center=0)
+            plt.title('Correlation Heatmap')
+            plt.tight_layout()
+            plt.savefig('correlation_heatmap.png', dpi=300, bbox_inches='tight')
+            plt.show()
+            print("✅ Correlation heatmap saved as 'correlation_heatmap.png'")
+    return df
+def analyze_with_smolagent_example():
+    """Example of how to use SmoLagent (requires model configuration)"""
+    print("\n=== SMOLAGENT ANALYSIS EXAMPLE ===")
+    print("Note: This requires proper model configuration in config.py")
+    # This is a template - you need to configure your model
+    try:
+        # Uncomment and configure based on your model choice:
+        # For OpenAI:
+        # from smolagents.models import OpenAIServerModel
+        # model = OpenAIServerModel(model_id="gpt-4", api_key="your-api-key")
+        # For local Ollama:
+        # from smolagents.models import LiteLLMModel
+        # model = LiteLLMModel(model_id="ollama/llama2", api_base="http://localhost:11434")
+        # Create agent (CodeAgent writes and runs Python itself; no extra Python tool is needed)
+        # agent = CodeAgent(tools=[], model=model, additional_authorized_imports=["pandas", "numpy"])
+        # Load data for analysis
+        df = pd.read_csv(CSV_FILE_PATH)
+        # Example queries you could ask:
+        example_queries = [
+            "Analyze the distribution of numerical columns and identify any outliers",
+            "Find correlations between variables and suggest interesting patterns",
+            "Perform clustering analysis on the data",
+            "Identify trends and seasonality in time-series data",
+            "Suggest data quality improvements",
+        ]
+        print("Example queries you can ask SmoLagent:")
+        for i, query in enumerate(example_queries, 1):
+            print(f"{i}. {query}")
+        print("\nTo use SmoLagent:")
+        print("1. Configure your model in config.py")
+        print("2. Uncomment the model initialization code above")
+        print("3. Run the agent with your queries")
+        # Example usage (commented out until model is configured):
+        # response = agent.run(f"Analyze this dataset: {df.head().to_string()}")
+        # print(f"AI Analysis: {response}")
+    except Exception as e:
+        print(f"SmoLagent setup needed: {e}")
+if __name__ == "__main__":
+    # Run basic analysis
+    df = simple_data_analysis()
+    # Show SmoLagent example
+    analyze_with_smolagent_example()
+    print("\n=== NEXT STEPS ===")
+    print("1. Configure your AI model in config.py")
+    print("2. Run 'python app.py' to start the Gradio interface")
+    print("3. Use the web interface for interactive analysis")
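The commented-out example in `analyze.py` sends only `df.head().to_string()` to the agent. A slightly richer but still bounded prompt context could be built as sketched below; `build_data_context` is a hypothetical helper, and the choice of schema, missing-value counts and the first few rows is an assumption about what is useful to an LLM, not something this commit defines.

```python
import pandas as pd


def build_data_context(df: pd.DataFrame, n_rows: int = 5) -> str:
    """Summarize a DataFrame as plain text for an LLM prompt.

    Hypothetical helper: it bounds the prompt size by sending only the
    schema, missing-value counts and the first n_rows rows.
    """
    lines = [
        f"Shape: {df.shape[0]} rows x {df.shape[1]} columns",
        "Columns and dtypes:",
    ]
    missing = df.isnull().sum()
    for col, dtype in df.dtypes.items():
        lines.append(f"  - {col}: {dtype} (missing: {int(missing[col])})")
    lines.append(f"First {n_rows} rows:")
    lines.append(df.head(n_rows).to_string())
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample data for illustration
    sample = pd.DataFrame({"level": ["INFO", "ERROR", None], "duration": [1.5, 2.0, 3.25]})
    print(build_data_context(sample, n_rows=2))
```

Once a model is configured, the returned string could stand in for `df.head().to_string()` in the commented-out `agent.run(...)` call.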

app.py CHANGED

	@@ -0,0 +1,234 @@

+import gradio as gr
+import pandas as pd
+import numpy as np
+import matplotlib.pyplot as plt
+import seaborn as sns
+import plotly.express as px
+import plotly.graph_objects as go
+from smolagents import CodeAgent, DuckDuckGoSearchTool
+from smolagents.models import OpenAIServerModel
+import io
+import base64
+from PIL import Image
+# The CSV file path is set in config.py
+from config import CSV_FILE_PATH
+class DataAnalysisAgent:
+    def __init__(self):
+        """Initialize the data analysis agent with SmoLagent"""
+        # No agent until a model is configured; analyze_with_smolagent checks for None
+        self.agent = None
+        # Note: You'll need to set up your LLM model here.
+        # CodeAgent writes and runs Python code itself, so no separate Python tool is needed.
+        # DuckDuckGoSearchTool needs an optional search dependency, so it is only
+        # created together with the agent.
+        try:
+            # Replace with your actual model configuration
+            # model = OpenAIServerModel(model_id="gpt-4", api_key="your-api-key")
+            # self.agent = CodeAgent(
+            #     tools=[DuckDuckGoSearchTool()],
+            #     model=model,
+            #     additional_authorized_imports=["pandas", "numpy"],
+            # )
+            pass
+        except Exception:
+            self.agent = None
+        self.df = None
+        self.load_data()
+    def load_data(self):
+        """Load the CSV data"""
+        try:
+            self.df = pd.read_csv(CSV_FILE_PATH)
+            return f"Data loaded successfully! Shape: {self.df.shape}"
+        except Exception as e:
+            return f"Error loading data: {str(e)}"
+    def get_data_overview(self):
+        """Get basic overview of the dataset"""
+        if self.df is None:
+            return "No data loaded"
+        overview = {
+            "shape": self.df.shape,
+            "columns": list(self.df.columns),
+            "dtypes": self.df.dtypes.to_dict(),
+            "missing_values": self.df.isnull().sum().to_dict(),
+            "memory_usage": f"{self.df.memory_usage(deep=True).sum() / 1024**2:.2f} MB"
+        }
+        return overview
+    def generate_basic_stats(self):
+        """Generate basic statistical summary"""
+        if self.df is None:
+            return "No data loaded"
+        return self.df.describe(include='all').to_html()
+    def create_correlation_heatmap(self):
+        """Create correlation heatmap for numerical columns"""
+        if self.df is None:
+            return None
+        numeric_cols = self.df.select_dtypes(include=[np.number]).columns
+        if len(numeric_cols) < 2:
+            return "Not enough numerical columns for correlation analysis"
+        plt.figure(figsize=(12, 8))
+        correlation_matrix = self.df[numeric_cols].corr()
+        sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', center=0)
+        plt.title('Correlation Heatmap')
+        plt.tight_layout()
+        # Save plot to bytes
+        img_buffer = io.BytesIO()
+        plt.savefig(img_buffer, format='png', dpi=300, bbox_inches='tight')
+        img_buffer.seek(0)
+        plt.close()
+        return img_buffer
+    def create_distribution_plots(self):
+        """Create distribution plots for numerical columns"""
+        if self.df is None:
+            return None
+        numeric_cols = self.df.select_dtypes(include=[np.number]).columns
+        if len(numeric_cols) == 0:
+            return "No numerical columns found"
+        n_cols = min(3, len(numeric_cols))
+        n_rows = (len(numeric_cols) + n_cols - 1) // n_cols
+        # squeeze=False always returns a 2-D array of Axes, so flattening works for every grid shape
+        fig, axes = plt.subplots(n_rows, n_cols, figsize=(15, 5*n_rows), squeeze=False)
+        axes = axes.flatten()
+        for i, col in enumerate(numeric_cols):
+            if i < len(axes):
+                self.df[col].hist(bins=30, ax=axes[i], alpha=0.7)
+                axes[i].set_title(f'Distribution of {col}')
+                axes[i].set_xlabel(col)
+                axes[i].set_ylabel('Frequency')
+        # Hide empty subplots
+        for i in range(len(numeric_cols), len(axes)):
+            axes[i].set_visible(False)
+        plt.tight_layout()
+        img_buffer = io.BytesIO()
+        plt.savefig(img_buffer, format='png', dpi=300, bbox_inches='tight')
+        img_buffer.seek(0)
+        plt.close()
+        return img_buffer
+    def analyze_with_smolagent(self, query):
+        """Use SmoLagent to analyze data based on user query"""
+        if self.agent is None:
+            return "SmoLagent not configured. Please set up your LLM model."
+        if self.df is None:
+            return "No data loaded"
+        # Prepare context about the dataset
+        data_context = f"""
+        Dataset shape: {self.df.shape}
+        Columns: {list(self.df.columns)}
+        Data types: {self.df.dtypes.to_dict()}
+        First few rows: {self.df.head().to_string()}
+        """
+        prompt = f"""
+        You have access to a pandas DataFrame named `df` with the following information:
+        {data_context}
+        User query: {query}
+        Please analyze the data and provide insights. Write and execute Python code that works on `df`.
+        """
+        try:
+            # additional_args exposes the full DataFrame to the agent's code as the variable `df`
+            response = self.agent.run(prompt, additional_args={"df": self.df})
+            return response
+        except Exception as e:
+            return f"Error in SmoLagent analysis: {str(e)}"
+# Initialize the agent
+data_agent = DataAnalysisAgent()
+def analyze_data_overview():
+    """Gradio function for data overview"""
+    overview = data_agent.get_data_overview()
+    return str(overview)
+def generate_statistics():
+    """Gradio function for basic statistics"""
+    return data_agent.generate_basic_stats()
+def create_correlation_plot():
+    """Gradio function for correlation heatmap"""
+    img_buffer = data_agent.create_correlation_heatmap()
+    if img_buffer is None or isinstance(img_buffer, str):
+        return None
+    return Image.open(img_buffer)
+def create_distribution_plot():
+    """Gradio function for distribution plots"""
+    img_buffer = data_agent.create_distribution_plots()
+    if img_buffer is None or isinstance(img_buffer, str):
+        return None
+    return Image.open(img_buffer)
+def smolagent_analysis(query):
+    """Gradio function for SmoLagent analysis"""
+    return data_agent.analyze_with_smolagent(query)
+# Create Gradio interface
+with gr.Blocks(title="AI Data Analysis with SmoLagent") as demo:
+    gr.Markdown("# AI Data Analysis Dashboard")
+    gr.Markdown("Analyze your CSV data using AI-powered insights with SmoLagent")
+    with gr.Tab("Data Overview"):
+        gr.Markdown("## Dataset Overview")
+        overview_btn = gr.Button("Get Data Overview")
+        overview_output = gr.Textbox(label="Dataset Information", lines=10)
+        overview_btn.click(analyze_data_overview, outputs=overview_output)
+    with gr.Tab("Basic Statistics"):
+        gr.Markdown("## Statistical Summary")
+        stats_btn = gr.Button("Generate Statistics")
+        stats_output = gr.HTML(label="Statistical Summary")
+        stats_btn.click(generate_statistics, outputs=stats_output)
+    with gr.Tab("Visualizations"):
+        gr.Markdown("## Data Visualizations")
+        with gr.Row():
+            corr_btn = gr.Button("Generate Correlation Heatmap")
+            dist_btn = gr.Button("Generate Distribution Plots")
+        with gr.Row():
+            corr_plot = gr.Image(label="Correlation Heatmap")
+            dist_plot = gr.Image(label="Distribution Plots")
+        corr_btn.click(create_correlation_plot, outputs=corr_plot)
+        dist_btn.click(create_distribution_plot, outputs=dist_plot)
+    with gr.Tab("AI Analysis"):
+        gr.Markdown("## SmoLagent AI Analysis")
+        gr.Markdown("Ask questions about your data and get AI-powered insights")
+        query_input = gr.Textbox(
+            label="Enter your analysis question",
+            placeholder="e.g., 'What are the main trends in this data?' or 'Find outliers and anomalies'",
+            lines=3
+        )
+        analyze_btn = gr.Button("Analyze with AI")
+        ai_output = gr.Textbox(label="AI Analysis Results", lines=15)
+        analyze_btn.click(smolagent_analysis, inputs=query_input, outputs=ai_output)
+if __name__ == "__main__":
+    demo.launch()
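Both plotting methods in `app.py` repeat the same save-to-`BytesIO` steps, and the Gradio wrappers then call `Image.open` on the buffer. One way to factor that into a single step is sketched below; `figure_to_pil` is a hypothetical helper that is not part of this commit, and selecting the `Agg` backend explicitly is an assumption for running on a server without a display.

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend, e.g. on a hosted Space with no display
import matplotlib.pyplot as plt
from PIL import Image


def figure_to_pil(fig, dpi: int = 100) -> Image.Image:
    """Render a Matplotlib figure to a PIL image and close the figure."""
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png", dpi=dpi, bbox_inches="tight")
    plt.close(fig)  # free the figure once it has been rendered
    buffer.seek(0)
    image = Image.open(buffer)
    image.load()  # decode now so the image no longer depends on the buffer
    return image


if __name__ == "__main__":
    fig, ax = plt.subplots(figsize=(4, 3))
    ax.plot([0, 1, 2], [0, 1, 4])
    img = figure_to_pil(fig)
    print(img.size, img.format)
```

Passing the figure explicitly, rather than relying on pyplot's "current figure", also avoids surprises when several Gradio requests draw plots at once.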

config.py ADDED

	@@ -0,0 +1,62 @@

+# Configuration file for SmoLagent setup
+"""
+SmoLagent Configuration Guide
+============================
+To use the AI analysis features, you need to configure a Language Model.
+SmoLagent supports various models including:
+1. OpenAI Models (GPT-3.5, GPT-4)
+2. Local models (Ollama, LMStudio)
+3. Hugging Face models
+4. Other API-compatible models
+Example configurations:
+# For OpenAI:
+from smolagents.models import OpenAIServerModel
+model = OpenAIServerModel(
+    model_id="gpt-4",
+    api_key="your-openai-api-key"
+)
+# For local Ollama:
+from smolagents.models import LiteLLMModel
+model = LiteLLMModel(
+    model_id="ollama/llama2",
+    api_base="http://localhost:11434"
+)
+# For Hugging Face:
+from smolagents.models import HfApiModel
+model = HfApiModel(
+    model_id="microsoft/DialoGPT-medium",
+    token="your-hf-token"
+)
+Instructions:
+1. Choose your preferred model from above
+2. Get the necessary API keys/tokens
+3. Update the model configuration in app.py
+4. Replace the placeholder model initialization with your chosen configuration
+"""
+# CSV file path configuration
+CSV_FILE_PATH = "C:/Users/Cosmo/Desktop/NTU Peak Singtel/outsystems_sample_logs_6months.csv"
+# Model configuration (update with your preferred settings)
+MODEL_CONFIG = {
+    "provider": "openai",  # Change to your preferred provider
+    "model_id": "gpt-4",   # Change to your preferred model
+    "api_key": "your-api-key-here",  # Add your actual API key
+    "api_base": None,      # For local models, set the base URL
+}
+# Analysis settings
+ANALYSIS_SETTINGS = {
+    "max_rows_display": 1000,
+    "plot_style": "seaborn-v0_8",  # Matplotlib >= 3.6 name for the former "seaborn" style
+    "figure_size": (12, 8),
+    "dpi": 300,
+}
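`MODEL_CONFIG` keeps the API key in `config.py`, which is committed to the (public) Space repository. A minimal sketch of falling back to an environment variable when the value is still a placeholder is shown below; `resolve_api_key` and `PLACEHOLDER_KEYS` are hypothetical names, and `OPENAI_API_KEY` is assumed because it is the variable the OpenAI Python SDK reads by default. Hugging Face Spaces expose repository secrets to the app as environment variables.

```python
import os
from collections.abc import Mapping

# Placeholder values used in this repository's config and README (assumed list)
PLACEHOLDER_KEYS = {"", "your-api-key-here", "your-openai-api-key"}


def resolve_api_key(config: Mapping, env: Mapping[str, str] = os.environ,
                    env_var: str = "OPENAI_API_KEY"):
    """Return config["api_key"] unless it is missing or a placeholder,
    in which case fall back to the environment variable env_var."""
    key = config.get("api_key") or ""
    if key in PLACEHOLDER_KEYS:
        key = env.get(env_var, "")
    return key or None
```

The result could then be passed as `api_key=` to `OpenAIServerModel`, so no real key ever needs to be written into the file.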

requirements.txt ADDED

	@@ -0,0 +1,11 @@

+smolagents>=0.3.0
+gradio>=5.37.0
+pandas>=2.0.0
+numpy>=1.24.0
+matplotlib>=3.7.0
+seaborn>=0.12.0
+plotly>=5.15.0
+Pillow>=10.0.0
+scikit-learn>=1.3.0
+openai>=1.0.0
+requests>=2.31.0

test_setup.py ADDED

	@@ -0,0 +1,38 @@

+import sys
+import os
+print("Python version:", sys.version)
+print("Current directory:", os.getcwd())
+try:
+    import pandas as pd
+    print("✅ Pandas imported successfully")
+except ImportError as e:
+    print("❌ Pandas import failed:", e)
+try:
+    import smolagents
+    print("✅ SmoLagents imported successfully")
+except ImportError as e:
+    print("❌ SmoLagents import failed:", e)
+try:
+    import gradio as gr
+    print("✅ Gradio imported successfully")
+except ImportError as e:
+    print("❌ Gradio import failed:", e)
+# Check if CSV file exists
+csv_path = "C:/Users/Cosmo/Desktop/NTU Peak Singtel/outsystems_sample_logs_6months.csv"
+if os.path.exists(csv_path):
+    print(f"✅ CSV file found at: {csv_path}")
+    try:
+        df = pd.read_csv(csv_path)
+        print(f"✅ CSV loaded successfully. Shape: {df.shape}")
+        print(f"Columns: {list(df.columns)}")
+    except Exception as e:
+        print(f"❌ Error loading CSV: {e}")
+else:
+    print(f"❌ CSV file not found at: {csv_path}")
+    print("Please check the file path and ensure the file exists.")
+print("\n🚀 Setup verification complete!")
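The three import checks in `test_setup.py` follow one pattern, so they could be driven by a list of module names. The sketch below is an assumed refactoring, not part of this commit; `check_imports` is a hypothetical helper built on the standard `importlib.import_module`.

```python
import importlib


def check_imports(module_names):
    """Try importing each module; map name -> None on success, else the error text."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
            results[name] = str(exc)
    return results


if __name__ == "__main__":
    for name, error in check_imports(["pandas", "smolagents", "gradio"]).items():
        if error is None:
            print(f"✅ {name} imported successfully")
        else:
            print(f"❌ {name} import failed: {error}")
```

Adding a package to the check then only means adding its module name to the list.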

upload.py ADDED

	@@ -0,0 +1,40 @@

+import pandas as pd
+import os
+# Set this to the path of your CSV file
+csv_file_path = "C:/Users/Cosmo/Desktop/NTU Peak Singtel/outsystems_sample_logs_6months.csv"
+def load_and_explore_data():
+    """Load and explore the CSV data"""
+    try:
+        # Check if file exists
+        if not os.path.exists(csv_file_path):
+            print(f"Error: File not found at {csv_file_path}")
+            return None
+        # Read the CSV file into a DataFrame
+        df = pd.read_csv(csv_file_path)
+        print("=== DATA LOADED SUCCESSFULLY ===")
+        print(f"Dataset shape: {df.shape}")
+        print(f"Columns: {list(df.columns)}")
+        print("\n=== FIRST 5 ROWS ===")
+        print(df.head())
+        print("\n=== DATA TYPES ===")
+        print(df.dtypes)
+        print("\n=== MISSING VALUES ===")
+        print(df.isnull().sum())
+        print("\n=== BASIC STATISTICS ===")
+        print(df.describe())
+        return df
+    except Exception as e:
+        print(f"Error loading data: {str(e)}")
+        return None
+if __name__ == "__main__":
+    df = load_and_explore_data()
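The README says the app supports datetime data and suggests queries such as "Find seasonal patterns", but `pd.read_csv` as called in `upload.py` leaves timestamp columns as plain text. When the column names are known, `pd.read_csv(..., parse_dates=[...])` handles this directly; for unknown log files, a heuristic like the sketch below could be applied after loading. `parse_datetime_columns` is a hypothetical helper, the 0.9 threshold is an arbitrary assumption, and the heuristic can misfire on columns of short numeric strings.

```python
import io

import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def parse_datetime_columns(df: pd.DataFrame, min_ratio: float = 0.9) -> pd.DataFrame:
    """Convert text columns to datetime when at least min_ratio of their
    non-null values parse as dates; other columns are left unchanged."""
    out = df.copy()
    for col in out.columns:
        if not (is_object_dtype(out[col]) or is_string_dtype(out[col])):
            continue  # only text columns are candidates
        non_null = out[col].dropna()
        if non_null.empty:
            continue
        parsed = pd.to_datetime(out[col], errors="coerce")
        # Share of originally non-null values that parsed successfully
        ratio = parsed[non_null.index].notna().mean()
        if ratio >= min_ratio:
            out[col] = parsed
    return out


if __name__ == "__main__":
    # Made-up sample CSV for illustration
    csv = io.StringIO(
        "timestamp,user,duration\n"
        "2024-01-01 10:00:00,alice,1.5\n"
        "2024-01-02 11:30:00,bob,2.0\n"
    )
    print(parse_datetime_columns(pd.read_csv(csv)).dtypes)
```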