metadata
title: DataForge
emoji: 💬
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.0.1
app_file: app.py
pinned: false
license: mit
short_description: CodeAct agent for processing large datasets
tags:
  - agent-demo-track

🎥 Demo Video

Watch DataForge in Action:

DataForge Demo

🎬 Click here to watch the full demo on YouTube


🔍 DataForge - AI Assistant with File Analysis

An intelligent AI assistant that combines conversational chat capabilities with advanced file analysis using CodeAct agents. Built with Gradio, LangChain, and LangGraph.

✨ Features

💬 Chat Assistant

  • Interactive AI chatbot powered by OpenAI GPT-4
  • Customizable system messages and parameters
  • Real-time streaming responses
  • Conversation history support

📁 File Analysis

  • Upload & Analyze: Support for various file formats (.txt, .log, .csv, .json, .xml, .py, .js, .html, .md)
  • Smart Analysis: Automatic file type detection and tailored analysis
  • CodeAct Integration: Uses LangGraph CodeAct agents for deep file analysis
  • Comprehensive Insights: Provides security analysis, performance insights, error detection, and statistical summaries

🚀 Getting Started

Prerequisites

  • Python 3.11+
  • OpenAI API Key

Installation

  1. Create and activate a virtual environment:
uv venv --python 3.11
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  2. Install dependencies:
uv pip install -r requirements.txt
  3. Set up environment variables:
# Create a .env file and add your OpenAI API key
OPENAI_API_KEY=your_openai_api_key_here
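
How the key in .env reaches the application is not described here. As a minimal sketch, assuming a plain KEY=value file, a stdlib-only loader could look like the following. `load_env_file` is a hypothetical helper for illustration, not part of DataForge; many projects use the python-dotenv package for this instead.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=value lines and export them to os.environ.

    Hypothetical helper for illustration; not part of DataForge.
    """
    loaded: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return loaded
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")
        loaded[key] = value
        # Variables already set in the real environment take precedence.
        os.environ.setdefault(key, value)
    return loaded
```

Calling `load_env_file()` before the model client is created would make `os.environ["OPENAI_API_KEY"]` available, provided the file exists in the working directory.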

Running the Application

python app.py

The application starts a Gradio interface at http://localhost:7860 (Gradio's default port).

📊 File Analysis Capabilities

Supported File Types

  • Log files (.log, .txt): Security analysis, performance bottlenecks, error detection
  • Data files (.csv, .json): Data quality assessment, statistical analysis
  • Code files (.py, .js, .html): Structure analysis, best practices review
  • Configuration and documentation files (.xml, .md): Content analysis and recommendations
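
The grouping above amounts to a lookup from file extension to analysis category. The sketch below shows one way to express it. The category names and the `detect_analysis_type` function are assumptions for illustration; the README does not show how DataForge implements file type detection.

```python
from pathlib import Path

# Extensions come from the list above; the category names are illustrative.
ANALYSIS_BY_EXTENSION = {
    ".log": "log",
    ".txt": "log",
    ".csv": "data",
    ".json": "data",
    ".py": "code",
    ".js": "code",
    ".html": "code",
    ".xml": "config",
    ".md": "config",
}


def detect_analysis_type(filename: str) -> str:
    """Return the analysis category for a file name, or 'unsupported'."""
    suffix = Path(filename).suffix.lower()
    return ANALYSIS_BY_EXTENSION.get(suffix, "unsupported")
```

For example, `detect_analysis_type("sample_server.log")` returns `"log"`, and a file with an unlisted extension falls back to `"unsupported"`.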

Analysis Features

  • Security Analysis: Detect threats, suspicious activities, and security patterns
  • Performance Insights: Identify bottlenecks and performance issues
  • Error Analysis: Categorize and analyze errors and warnings
  • Statistical Summary: Basic statistics and data distribution
  • Pattern Recognition: Identify trends and anomalies
  • Actionable Recommendations: Suggested actions based on analysis
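
To make the error analysis and statistical summary more concrete, here is a small sketch of the kind of code an agent might generate for a log file. The log line format, the sample lines, and `summarize_log_levels` are all assumptions made up for this example; they are not taken from DataForge or from sample_server.log.

```python
import re
from collections import Counter

# Assumed line format: "<date> <time> <LEVEL> <message>"; real logs may differ.
LEVEL_PATTERN = re.compile(r"\b(DEBUG|INFO|WARNING|WARN|ERROR|CRITICAL)\b")


def summarize_log_levels(lines: list[str]) -> Counter:
    """Count the first severity level found on each line."""
    counts: Counter = Counter()
    for line in lines:
        match = LEVEL_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up sample lines, used only to exercise the function.
sample = [
    "2024-01-01 10:00:00 INFO Server started",
    "2024-01-01 10:00:05 ERROR Database connection failed",
    "2024-01-01 10:00:06 WARNING Retrying connection",
    "2024-01-01 10:00:07 ERROR Database connection failed",
]
summary = summarize_log_levels(sample)
print(summary.most_common())
```

A per-level count like this is only a starting point; categorizing errors by message or spotting anomalies over time would build on the same parsed lines.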

🧪 Testing

A sample server log file (sample_server.log) is included for testing the file analysis functionality.

🛠️ Technical Architecture

  • Frontend: Gradio for web interface
  • Backend: LangChain for AI orchestration
  • Analysis Engine: LangGraph CodeAct agents with PyodideSandbox
  • File Processing: Custom FileInjectedPyodideSandbox for secure file analysis
  • Model: OpenAI GPT-4 for both chat and analysis
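
In a CodeAct setup, the model writes Python, an execution environment runs it, and the printed output is fed back to the model for the next step. The sketch below illustrates only that execution step, using the standard library. `run_generated_code` is a hypothetical stand-in, and plain `exec()` provides no isolation; DataForge uses PyodideSandbox and its custom FileInjectedPyodideSandbox for this, whose internals are not shown in this README.

```python
import contextlib
import io
from typing import Any


def run_generated_code(
    code: str, variables: dict[str, Any]
) -> tuple[str, dict[str, Any]]:
    """Execute a code string and return (captured output, new variables).

    Illustration only: plain exec() is NOT a sandbox and must not be used
    on untrusted model output.
    """
    namespace = dict(variables)
    original_keys = set(namespace)
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)
        output = buffer.getvalue() or "<code ran, no output printed>"
    except Exception as exc:
        # The error text is returned so the model can correct its code.
        output = f"Error during execution: {exc!r}"
    new_variables = {
        key: value
        for key, value in namespace.items()
        if key not in original_keys and key != "__builtins__"
    }
    return output, new_variables


# Stand-in for code a model might write about an injected file.
file_lines = ["INFO ok", "ERROR disk full", "ERROR timeout"]
generated = "errors = [l for l in file_lines if 'ERROR' in l]\nprint(len(errors))"
output, new_vars = run_generated_code(generated, {"file_lines": file_lines})
```

Returning both the output and any newly defined variables lets later steps reuse earlier results, which is the main reason an agent loop keeps state between code executions.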

📄 License

MIT License