ai-puppy committed · Commit b2ca056 · Parent: b212a72
Commit message: save
Files changed:
- .gitignore +2 -0
- README.md +79 -18
- agent.py +191 -60
- app.py +164 -18
- requirements.txt +1 -0
- sample_server.log +25 -0
.gitignore
CHANGED
@@ -1,2 +1,4 @@
 .DS_Store
 .env
+node_modules/
+__pycache__/
README.md
CHANGED
@@ -1,20 +1,81 @@
----
-title: DataForge
-emoji: 💬
-colorFrom: yellow
-colorTo: purple
-sdk: gradio
-sdk_version: 5.0.1
-app_file: app.py
-pinned: false
-license: mit
-short_description: CodeAct Agent to process large data set
----
-
-An example chatbot using [Gradio](https://gradio.app), [`huggingface_hub`](https://huggingface.co/docs/huggingface_hub/v0.22.2/en/index), and the [Hugging Face Inference API](https://huggingface.co/docs/api-inference/index).
+# 🚀 DataForge - AI Assistant with File Analysis
 
+An intelligent AI assistant that combines conversational chat capabilities with advanced file analysis using CodeAct agents. Built with Gradio, LangChain, and LangGraph.
+
+## ✨ Features
+
+### 💬 Chat Assistant
+- Interactive AI chatbot powered by OpenAI GPT-4
+- Customizable system messages and parameters
+- Real-time streaming responses
+- Conversation history support
+
+### 📁 File Analysis
+- **Upload & Analyze**: Support for various file formats (.txt, .log, .csv, .json, .xml, .py, .js, .html, .md)
+- **Smart Analysis**: Automatic file type detection and tailored analysis
+- **CodeAct Integration**: Uses LangGraph CodeAct agents for deep file analysis
+- **Comprehensive Insights**: Provides security analysis, performance insights, error detection, and statistical summaries
+
+## 🚀 Getting Started
+
+### Prerequisites
+- Python 3.11+
+- OpenAI API Key
+
+### Installation
+
+1. Create and activate virtual environment:
+```bash
 uv venv --python 3.11
-source .venv/bin/activate
-
-
-
+source .venv/bin/activate  # On Windows: .venv\Scripts\activate
+```
+
+2. Install dependencies:
+```bash
+uv pip install -r requirements.txt
+```
+
+3. Set up environment variables:
+```bash
+# Create .env file and add your OpenAI API key
+OPENAI_API_KEY=your_openai_api_key_here
+```
+
+### Running the Application
+```bash
+python app.py
+```
+
+The application will start a Gradio interface accessible at `http://localhost:7860`.
+
+## 📊 File Analysis Capabilities
+
+### Supported File Types
+- **Log files** (.log, .txt): Security analysis, performance bottlenecks, error detection
+- **Data files** (.csv, .json): Data quality assessment, statistical analysis
+- **Code files** (.py, .js, .html): Structure analysis, best practices review
+- **Configuration files** (.xml, .md): Content analysis and recommendations
+
+### Analysis Features
+- **Security Analysis**: Detect threats, suspicious activities, and security patterns
+- **Performance Insights**: Identify bottlenecks and performance issues
+- **Error Analysis**: Categorize and analyze errors and warnings
+- **Statistical Summary**: Basic statistics and data distribution
+- **Pattern Recognition**: Identify trends and anomalies
+- **Actionable Recommendations**: Suggested actions based on the analysis
+
+## 🧪 Testing
+
+A sample server log file (`sample_server.log`) is included for testing the file analysis functionality.
+
+## 🛠️ Technical Architecture
+
+- **Frontend**: Gradio for the web interface
+- **Backend**: LangChain for AI orchestration
+- **Analysis Engine**: LangGraph CodeAct agents with PyodideSandbox
+- **File Processing**: Custom FileInjectedPyodideSandbox for secure file analysis
+- **Model**: OpenAI GPT-4 for both chat and analysis
+
+## 📄 License
+
+MIT License
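Before launching the app, it can be worth confirming that the key from step 3 of the new README is actually visible to the process. A minimal sketch, assuming the `.env` layout described above and the same `python-dotenv` loader that `app.py` and `agent.py` use (the script name `check_env.py` is illustrative, not part of this commit):

```python
# check_env.py - illustrative only; verifies the .env setup described in the README
import os

from dotenv import find_dotenv, load_dotenv  # same loader used in app.py and agent.py

load_dotenv(find_dotenv())  # read OPENAI_API_KEY from the nearest .env file

key = os.getenv("OPENAI_API_KEY")
if key:
    # Avoid printing the secret itself; its length is enough to confirm it loaded.
    print(f"OPENAI_API_KEY loaded ({len(key)} characters)")
else:
    print("OPENAI_API_KEY is not set - create a .env file as shown above")
```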
agent.py
CHANGED
@@ -2,6 +2,8 @@ import asyncio
 import inspect
 import uuid
 import os
+import tempfile
+import shutil
 from typing import Any
 
 from langchain.chat_models import init_chat_model
@@ -15,11 +17,17 @@ load_dotenv(find_dotenv())
 class FileInjectedPyodideSandbox(PyodideSandbox):
     """Custom PyodideSandbox that can inject files into the virtual filesystem."""
 
-    def __init__(self, file_path: str = None, virtual_path: str = "/
-
+    def __init__(self, file_path: str = None, virtual_path: str = "/uploaded_file.log", sessions_dir: str = None, **kwargs):
+        # Create a temporary sessions directory if none provided
+        if sessions_dir is None:
+            sessions_dir = tempfile.mkdtemp(prefix="pyodide_sessions_")
+
+        super().__init__(sessions_dir=sessions_dir, **kwargs)
         self.file_path = file_path
         self.virtual_path = virtual_path
         self._file_injected = False
+        self._temp_sessions_dir = sessions_dir
+        self._created_temp_dir = sessions_dir is None
 
     async def execute(self, code: str, **kwargs):
         # If we have a file to inject, prepend the injection code to the user code
@@ -40,7 +48,7 @@ class FileInjectedPyodideSandbox(PyodideSandbox):
 import base64
 import os
 
-# Decode the
+# Decode the file content from base64
 encoded_content = """{encoded_content}"""
 file_content = base64.b64decode(encoded_content).decode('utf-8')
 
@@ -54,7 +62,7 @@ total_lines = len(log_lines)
 
 print(f"[INJECTION] Successfully created {self.virtual_path} with {{len(file_content)}} characters")
 print(f"[INJECTION] File content available as 'file_content' variable ({{len(file_content)}} chars)")
-print(f"[INJECTION]
+print(f"[INJECTION] Lines available as 'log_lines' variable ({{total_lines}} lines)")
 
 # Verify injection worked
 if os.path.exists("{self.virtual_path}"):
@@ -64,8 +72,8 @@ else:
 
 # Variables now available for analysis:
 # - file_content: raw file content as string
-# - log_lines: list of individual
-# - total_lines: number of lines in the
+# - log_lines: list of individual lines
+# - total_lines: number of lines in the file
 # - File also available at: {self.virtual_path}
 
 # End of injection code
@@ -82,6 +90,19 @@ else:
             return await super().execute(code, **kwargs)
         else:
             return await super().execute(code, **kwargs)
+
+    def cleanup(self):
+        """Clean up temporary directories if we created them."""
+        if self._created_temp_dir and self._temp_sessions_dir and os.path.exists(self._temp_sessions_dir):
+            try:
+                shutil.rmtree(self._temp_sessions_dir)
+                print(f"Cleaned up temporary sessions directory: {self._temp_sessions_dir}")
+            except Exception as e:
+                print(f"Warning: Could not clean up temporary directory {self._temp_sessions_dir}: {e}")
+
+    def __del__(self):
+        """Cleanup when object is destroyed."""
+        self.cleanup()
 
 def create_pyodide_eval_fn(sandbox: PyodideSandbox) -> EvalCoroutine:
     """Create an eval_fn that uses PyodideSandbox.
@@ -160,68 +181,178 @@ def read_file(file_path: str) -> str:
         return file.read()
 
 
-The server logs follow this format:
-YYYY-MM-DD HH:MM:SS [LEVEL] event_type: key=value, key=value, ...
-
-Sample log entries:
-- 2024-01-15 08:23:45 [INFO] user_login: user=john_doe, ip=192.168.1.100, success=true
-- 2024-01-15 08:24:12 [INFO] api_request: endpoint=/api/users, method=GET, user=john_doe, response_time=45ms
-- 2024-01-15 08:27:22 [WARN] failed_login: user=admin, ip=203.45.67.89, attempts=3
-- 2024-01-15 08:38:33 [CRITICAL] security_alert: suspicious_activity, ip=185.234.72.19, pattern=sql_injection_attempt
-- 2024-01-15 08:26:01 [ERROR] database_connection: host=db-primary, error=timeout, duration=30s
-
-DATA SOURCES AVAILABLE:
-- `file_content`: Raw
-- `log_lines`: List of individual
-- `total_lines`: Number of lines in the
-- File path: `/
-
-async def
+def create_analysis_agent(file_path: str, model=None, virtual_path: str = "/uploaded_file.log", sessions_dir: str = None):
+    """
+    Create a CodeAct agent configured for file analysis.
+
+    Args:
+        file_path: Path to the file to analyze
+        model: Language model to use (if None, will initialize default)
+        virtual_path: Virtual path where file will be mounted in sandbox
+        sessions_dir: Directory for PyodideSandbox sessions (if None, will create temp dir)
+
+    Returns:
+        Compiled CodeAct agent ready for analysis
+    """
+    if model is None:
+        model = init_chat_model("gpt-4.1-2025-04-14", model_provider="openai")
+
+    # Create our custom sandbox with file injection capability
+    sandbox = FileInjectedPyodideSandbox(
+        file_path=file_path,
+        virtual_path=virtual_path,
+        sessions_dir=sessions_dir,
+        allow_net=True
+    )
+
+    eval_fn = create_pyodide_eval_fn(sandbox)
+    code_act = create_codeact(model, [], eval_fn)
+    return code_act.compile()
+
+
+def get_default_analysis_query(file_extension: str = None) -> str:
+    """
+    Get a default analysis query based on file type.
+
+    Args:
+        file_extension: File extension (e.g., '.log', '.csv', '.txt')
+
+    Returns:
+        Analysis query string
+    """
+    if file_extension and file_extension.lower() in ['.log', '.txt']:
+        return """
+Analyze this uploaded file and provide comprehensive insights. Follow the example code patterns below for reliable analysis.
+
+ANALYSIS REQUIREMENTS:
+1. **Content Overview** - What type of data/logs this file contains
+2. **Security Analysis** - Identify any security-related events, threats, or suspicious activities
+3. **Performance Insights** - Find bottlenecks, slow operations, or performance issues
+4. **Error Analysis** - Identify and categorize errors, warnings, and critical issues
+5. **Statistical Summary** - Basic statistics (line count, data distribution, time ranges)
+6. **Key Patterns** - Important patterns, trends, or anomalies found
+7. **Recommendations** - Suggested actions based on the analysis
+
+DATA SOURCES AVAILABLE:
+- `file_content`: Raw file content as a string
+- `log_lines`: List of individual lines
+- `total_lines`: Number of lines in the file
+- File path: `/uploaded_file.log`
+
+EXAMPLE CODE PATTERNS TO FOLLOW:
+
+Start with basic analysis, then add specific patterns based on your file type:
+
+1. Import required libraries: re, Counter, defaultdict, datetime
+2. Basic file statistics: total_lines, file_content length, sample lines
+3. Pattern analysis using regex for security, performance, errors
+4. Data extraction and frequency analysis
+5. Clear formatted output with sections
+6. Actionable recommendations
+
+Use these code snippets as templates:
+- Counter() for frequency analysis
+- re.search() and re.findall() for pattern matching
+- enumerate(log_lines, 1) for line-by-line processing
+- defaultdict(list) for grouping findings
+- Clear print statements with section headers
+
+Generate Python code following these patterns. Always include proper error handling, clear output formatting, and actionable insights.
+"""
+    else:
+        return """
+Analyze this uploaded file and provide comprehensive insights. Follow these reliable patterns:
+
+ANALYSIS REQUIREMENTS:
+1. **File Type Analysis** - What type of file this is and its structure
+2. **Content Summary** - Overview of the file contents
+3. **Key Information** - Important data points or patterns found
+4. **Data Quality** - Assessment of data completeness and consistency
+5. **Statistical Analysis** - Basic statistics and data distribution
+6. **Insights & Findings** - Key takeaways from the analysis
+7. **Recommendations** - Suggested next steps or insights
+
+DATA SOURCES AVAILABLE:
+- file_content: Raw file content as a string
+- log_lines: List of individual lines
+- total_lines: Number of lines in the file
+- File path: /uploaded_file.log
+
+RELIABLE CODE PATTERNS:
+1. Start with basic stats: total_lines, len(file_content), file preview
+2. Use Counter() for frequency analysis of patterns
+3. Use re.findall() for extracting structured data like emails, IPs, dates
+4. Analyze line structure and consistency
+5. Calculate data quality metrics
+6. Provide clear sections with === headers ===
+7. End with actionable recommendations
+
+Focus on reliability over complexity. Use simple, proven Python patterns that work consistently.
+
+Generate Python code following these guidelines for robust file analysis.
+"""
+
+
+async def run_file_analysis(file_path: str, query: str = None, model=None) -> str:
+    """
+    Run file analysis using CodeAct agent.
+
+    Args:
+        file_path: Path to the file to analyze
+        query: Analysis query (if None, will use default based on file type)
+        model: Language model to use
+
+    Returns:
+        Analysis results as string
+    """
+    if not os.path.exists(file_path):
+        return f"❌ File not found: {file_path}"
+
+    try:
+        # Create the agent
+        agent = create_analysis_agent(file_path, model)
+
+        # Use default query if none provided
+        if query is None:
+            file_ext = os.path.splitext(file_path)[1]
+            query = get_default_analysis_query(file_ext)
+
+        # Run the analysis
+        result_parts = []
+        async for typ, chunk in agent.astream(
+            {"messages": query},
+            stream_mode=["values", "messages"],
+        ):
+            if typ == "messages":
+                result_parts.append(chunk[0].content)
+            elif typ == "values":
+                if chunk and "messages" in chunk:
+                    final_message = chunk["messages"][-1]
+                    if hasattr(final_message, 'content'):
+                        result_parts.append(f"\n\n**Final Analysis:**\n{final_message.content}")
+
+        return "\n".join(result_parts) if result_parts else "Analysis completed but no output generated."
+
+    except Exception as e:
+        return f"❌ Error analyzing file: {str(e)}"
 
 
+# Example usage and testing
 if __name__ == "__main__":
-    #
+    # This section is for testing only - remove or comment out in production
+    import sys
+
+    if len(sys.argv) > 1:
+        test_file_path = sys.argv[1]
+        print(f"Testing with file: {test_file_path}")
+
+        async def test_analysis():
+            result = await run_file_analysis(test_file_path)
+            print("Analysis Result:")
+            print("=" * 50)
+            print(result)
+
+        asyncio.run(test_analysis())
+    else:
+        print("Usage: python agent.py <file_path>")
+        print("Or import this module and use the functions directly.")
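Beyond the CLI hook in the `__main__` block above, the helpers added to `agent.py` can also be driven programmatically. A minimal sketch based on the `run_file_analysis` signature in this commit; the bundled `sample_server.log` path and the custom query string are illustrative, and a configured OpenAI key is assumed:

```python
# analyze_example.py - illustrative driver for the helpers added to agent.py
import asyncio

from agent import run_file_analysis


async def main() -> None:
    # Uses the bundled sample log; any supported text file path works here.
    report = await run_file_analysis("sample_server.log")
    print(report)

    # A custom query can be passed instead of the file-type default.
    focused = await run_file_analysis(
        "sample_server.log",
        query="Summarize only the CRITICAL security_alert entries.",
    )
    print(focused)


if __name__ == "__main__":
    asyncio.run(main())
```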
app.py
CHANGED
@@ -1,11 +1,16 @@
 import os
 import gradio as gr
+import asyncio
+import tempfile
 from dotenv import find_dotenv, load_dotenv
 from langchain.chat_models import init_chat_model
 from langchain.schema import HumanMessage, SystemMessage
 from langgraph.prebuilt import create_react_agent
 from langsmith import traceable
 
+# Import the CodeAct agent functionality
+from agent import FileInjectedPyodideSandbox, create_pyodide_eval_fn, create_codeact
+
 # Load environment variables
 load_dotenv(find_dotenv())
 
@@ -15,9 +20,14 @@ openai_model = init_chat_model(
     api_key=os.getenv("OPENAI_API_KEY"),
 )
 
-# Create the
+# Create the basic chat agent
 chat_agent = create_react_agent(openai_model, tools=[])
 
+# Initialize CodeAct model for file analysis
+codeact_model = init_chat_model("gpt-4.1-2025-04-14", model_provider="openai")
+
+# Store uploaded file path globally
+uploaded_file_path = None
 
 @traceable
 def respond(
@@ -54,7 +64,6 @@ def respond(
             if "messages" in chunk and chunk["messages"]:
                 latest_message = chunk["messages"][-1]
                 if hasattr(latest_message, 'content'):
-                    # Extract content from the message
                     current_content = latest_message.content
                     if current_content and len(current_content) > len(response_text):
                         response_text = current_content
@@ -67,26 +76,163 @@
     except Exception as e:
         yield f"Error: {str(e)}. Please make sure your OpenAI API key is set correctly."
 
+def handle_file_upload(file):
+    """Handle file upload and store the path globally"""
+    global uploaded_file_path
+    if file is not None:
+        uploaded_file_path = file.name
+        return f"✅ File uploaded successfully: {os.path.basename(file.name)}"
+    else:
+        uploaded_file_path = None
+        return "❌ No file uploaded"
+
+async def analyze_uploaded_file():
+    """Analyze the uploaded file using CodeAct agent"""
+    global uploaded_file_path
+
+    if not uploaded_file_path or not os.path.exists(uploaded_file_path):
+        return "❌ No file uploaded or file not found. Please upload a file first."
+
+    try:
+        # Create sandbox with the uploaded file
+        sandbox = FileInjectedPyodideSandbox(
+            file_path=uploaded_file_path,
+            virtual_path="/uploaded_file.log",
+            sessions_dir=None,  # Will create temp directory automatically
+            allow_net=True
+        )
+
+        eval_fn = create_pyodide_eval_fn(sandbox)
+        code_act = create_codeact(codeact_model, [], eval_fn)
+        agent = code_act.compile()
+
+        # Create analysis query based on file type
+        file_ext = os.path.splitext(uploaded_file_path)[1].lower()
+
+        if file_ext in ['.log', '.txt']:
+            query = """
+Analyze this uploaded file and provide:
+1. **Content Overview** - What type of data/logs this file contains
+2. **Key Patterns** - Important patterns, trends, or anomalies found
+3. **Statistical Summary** - Basic statistics (line count, data distribution, etc.)
+4. **Insights & Findings** - Key takeaways from the analysis
+5. **Recommendations** - Suggested actions based on the analysis
+
+DATA SOURCES AVAILABLE:
+- `file_content`: Raw file content as a string
+- `log_lines`: List of individual lines
+- `total_lines`: Number of lines in the file
+- File path: `/uploaded_file.log` (can be read with open('/uploaded_file.log', 'r'))
 
+Generate Python code to analyze the file and provide comprehensive insights.
 """
+        else:
+            query = f"""
+Analyze this uploaded {file_ext} file and provide:
+1. **File Type Analysis** - What type of file this is and its structure
+2. **Content Summary** - Overview of the file contents
+3. **Key Information** - Important data points or patterns found
+4. **Statistical Analysis** - Basic statistics and data distribution
+5. **Recommendations** - Suggested next steps or insights
+
+DATA SOURCES AVAILABLE:
+- `file_content`: Raw file content as a string
+- `log_lines`: List of individual lines
+- `total_lines`: Number of lines in the file
+- File path: `/uploaded_file.log`
+
+Generate Python code to analyze this file and provide comprehensive insights.
 """
+
+        # Run the analysis
+        result_parts = []
+        async for typ, chunk in agent.astream(
+            {"messages": query},
+            stream_mode=["values", "messages"],
+        ):
+            if typ == "messages":
+                result_parts.append(chunk[0].content)
+            elif typ == "values":
+                if chunk and "messages" in chunk:
+                    final_message = chunk["messages"][-1]
+                    if hasattr(final_message, 'content'):
+                        result_parts.append(f"\n\n**Final Analysis:**\n{final_message.content}")
+
+        return "\n".join(result_parts) if result_parts else "Analysis completed but no output generated."
+
+    except Exception as e:
+        return f"❌ Error analyzing file: {str(e)}"
 
+def run_file_analysis():
+    """Wrapper to run async file analysis in sync context"""
+    return asyncio.run(analyze_uploaded_file())
+
+# Create the Gradio interface
+with gr.Blocks(title="DataForge - AI Assistant with File Analysis") as demo:
+    gr.Markdown("# 🚀 DataForge - AI Assistant with File Analysis")
+    gr.Markdown("Upload files for analysis or chat with the AI assistant.")
+
+    with gr.Tab("💬 Chat Assistant"):
+        chat_interface = gr.ChatInterface(
+            respond,
+            additional_inputs=[
+                gr.Textbox(
+                    value="You are a helpful AI assistant. Be friendly, informative, and concise in your responses.",
+                    label="System message"
+                ),
+                gr.Slider(minimum=1, maximum=2048, value=512, step=1, label="Max new tokens"),
+                gr.Slider(minimum=0.1, maximum=4.0, value=0.7, step=0.1, label="Temperature"),
+                gr.Slider(
+                    minimum=0.1,
+                    maximum=1.0,
+                    value=0.95,
+                    step=0.05,
+                    label="Top-p (nucleus sampling)",
+                ),
+            ],
+            title="Chat with AI Assistant",
+            description="Ask questions or get help with any topic."
+        )
+
+    with gr.Tab("📁 File Analysis"):
+        gr.Markdown("## Upload and Analyze Files")
+        gr.Markdown("Upload log files, text files, or other data files for comprehensive AI-powered analysis.")
+
+        with gr.Row():
+            with gr.Column(scale=1):
+                file_upload = gr.File(
+                    label="Upload File for Analysis",
+                    file_types=[".txt", ".log", ".csv", ".json", ".xml", ".py", ".js", ".html", ".md"],
+                    type="filepath"
+                )
+                upload_status = gr.Textbox(
+                    label="Upload Status",
+                    value="No file uploaded",
+                    interactive=False
+                )
+                analyze_btn = gr.Button("🔍 Analyze File", variant="primary", size="lg")
+
+            with gr.Column(scale=2):
+                analysis_output = gr.Textbox(
+                    label="Analysis Results",
+                    lines=20,
+                    max_lines=30,
+                    placeholder="Upload a file and click 'Analyze File' to see detailed analysis results here...",
+                    interactive=False
+                )
+
+        # Event handlers
+        file_upload.change(
+            fn=handle_file_upload,
+            inputs=[file_upload],
+            outputs=[upload_status]
+        )
+
+        analyze_btn.click(
+            fn=run_file_analysis,
+            inputs=[],
+            outputs=[analysis_output]
+        )
 
 if __name__ == "__main__":
     demo.launch()
requirements.txt
CHANGED
@@ -6,6 +6,7 @@ charset-normalizer
 distro
 dotenv
 e2b-code-interpreter
+gradio
 h11
 httpcore
 httpx
sample_server.log
ADDED
@@ -0,0 +1,25 @@
+2024-01-15 08:23:45 [INFO] user_login: user=john_doe, ip=192.168.1.100, success=true, session_id=abc123
+2024-01-15 08:24:12 [INFO] api_request: endpoint=/api/users, method=GET, user=john_doe, response_time=45ms, status=200
+2024-01-15 08:24:15 [INFO] api_request: endpoint=/api/dashboard, method=GET, user=john_doe, response_time=120ms, status=200
+2024-01-15 08:25:33 [INFO] user_login: user=alice_smith, ip=192.168.1.101, success=true, session_id=def456
+2024-01-15 08:26:01 [ERROR] database_connection: host=db-primary, error=timeout, duration=30s, query=SELECT * FROM users
+2024-01-15 08:26:45 [INFO] api_request: endpoint=/api/products, method=GET, user=alice_smith, response_time=2300ms, status=200
+2024-01-15 08:27:22 [WARN] failed_login: user=admin, ip=203.45.67.89, attempts=3, reason=invalid_password
+2024-01-15 08:28:11 [INFO] user_logout: user=john_doe, session_duration=4m26s, pages_visited=5
+2024-01-15 08:29:33 [CRITICAL] security_alert: suspicious_activity, ip=185.234.72.19, pattern=sql_injection_attempt, endpoint=/api/users
+2024-01-15 08:30:15 [ERROR] api_request: endpoint=/api/orders, method=POST, user=alice_smith, response_time=timeout, status=500, error=database_unavailable
+2024-01-15 08:31:45 [WARN] rate_limit: ip=203.45.67.89, endpoint=/api/login, requests_per_minute=25, limit=10
+2024-01-15 08:32:12 [INFO] user_login: user=bob_wilson, ip=192.168.1.102, success=true, session_id=ghi789
+2024-01-15 08:33:28 [CRITICAL] security_alert: brute_force_attack, ip=203.45.67.89, attempts=15, duration=5m, blocked=true
+2024-01-15 08:34:55 [INFO] api_request: endpoint=/api/reports, method=GET, user=bob_wilson, response_time=850ms, status=200
+2024-01-15 08:35:21 [ERROR] memory_usage: process=web-server, usage=85%, threshold=80%, action=alert_sent
+2024-01-15 08:36:47 [INFO] backup_completed: database=primary, size=2.3GB, duration=45m, status=success
+2024-01-15 08:37:15 [WARN] disk_space: partition=/data, usage=92%, available=800MB, threshold=90%
+2024-01-15 08:38:33 [CRITICAL] security_alert: suspicious_activity, ip=185.234.72.19, pattern=xss_attempt, endpoint=/api/comments
+2024-01-15 08:39:12 [INFO] user_login: user=carol_davis, ip=192.168.1.103, success=true, session_id=jkl012
+2024-01-15 08:40:28 [ERROR] external_api: service=payment_gateway, endpoint=charge, response_time=timeout, status=503, retry_attempt=3
+2024-01-15 08:41:15 [INFO] api_request: endpoint=/api/analytics, method=GET, user=carol_davis, response_time=1200ms, status=200
+2024-01-15 08:42:33 [WARN] slow_query: query=SELECT * FROM orders WHERE date > '2024-01', duration=5.2s, threshold=2s
+2024-01-15 08:43:47 [INFO] cache_hit: key=user_preferences_john_doe, hit_rate=89%, response_time=5ms
+2024-01-15 08:44:12 [CRITICAL] system_alert: cpu_usage=95%, memory_usage=88%, load_average=4.2, action=scaling_triggered
+2024-01-15 08:45:28 [INFO] user_logout: user=alice_smith, session_duration=19m55s, pages_visited=12
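The default analysis prompts in `agent.py` steer the generated code toward `Counter` and `re` passes over `log_lines`. A minimal local sketch of that style of analysis against the sample log above, run outside the sandbox; the regexes simply target the `key=value` format shown and are illustrative rather than part of this commit:

```python
# sample_analysis.py - the Counter/regex pattern the agent prompt describes, applied locally
import re
from collections import Counter

with open("sample_server.log", "r", encoding="utf-8") as fh:
    log_lines = fh.read().splitlines()

total_lines = len(log_lines)

# Frequency of log levels such as INFO, WARN, ERROR, CRITICAL
level_counts = Counter()
for line in log_lines:
    match = re.search(r"\[(\w+)\]", line)
    if match:
        level_counts[match.group(1)] += 1

# Source IPs seen across events (taken from the key=value fields)
ip_counts = Counter(re.findall(r"ip=(\d+\.\d+\.\d+\.\d+)", "\n".join(log_lines)))

print("=== Basic statistics ===")
print(f"Total lines: {total_lines}")
print("=== Log level distribution ===")
for level, count in level_counts.most_common():
    print(f"{level}: {count}")
print("=== Top source IPs ===")
for ip, count in ip_counts.most_common(3):
    print(f"{ip}: {count}")
```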