ai-puppy committed
Commit
b2ca056
·
1 Parent(s): b212a72
Files changed (6)
  1. .gitignore +2 -0
  2. README.md +79 -18
  3. agent.py +191 -60
  4. app.py +164 -18
  5. requirements.txt +1 -0
  6. sample_server.log +25 -0
.gitignore CHANGED
@@ -1,2 +1,4 @@
  .DS_Store
  .env
+ node_modules/
+ __pycache__/
README.md CHANGED
@@ -1,20 +1,81 @@
- ---
- title: DataForge
- emoji: 💬
- colorFrom: yellow
- colorTo: purple
- sdk: gradio
- sdk_version: 5.0.1
- app_file: app.py
- pinned: false
- license: mit
- short_description: CodeAct Agent to process large data set
- ---
-
- An example chatbot using [Gradio](https://gradio.app), [`huggingface_hub`](https://huggingface.co/docs/huggingface_hub/v0.22.2/en/index), and the [Hugging Face Inference API](https://huggingface.co/docs/api-inference/index).
-
+ # 🔍 DataForge - AI Assistant with File Analysis
+
+ An intelligent AI assistant that combines conversational chat capabilities with advanced file analysis using CodeAct agents. Built with Gradio, LangChain, and LangGraph.
+
+ ## ✨ Features
+
+ ### 💬 Chat Assistant
+ - Interactive AI chatbot powered by OpenAI GPT-4
+ - Customizable system messages and parameters
+ - Real-time streaming responses
+ - Conversation history support
+
+ ### 📁 File Analysis
+ - **Upload & Analyze**: Support for various file formats (.txt, .log, .csv, .json, .xml, .py, .js, .html, .md)
+ - **Smart Analysis**: Automatic file type detection and tailored analysis
+ - **CodeAct Integration**: Uses LangGraph CodeAct agents for deep file analysis
+ - **Comprehensive Insights**: Provides security analysis, performance insights, error detection, and statistical summaries
+
+ ## 🚀 Getting Started
+
+ ### Prerequisites
+ - Python 3.11+
+ - OpenAI API key
+
+ ### Installation
+
+ 1. Create and activate a virtual environment:
+ ```bash
  uv venv --python 3.11
- source .venv/bin/activate
- deactivate
- uv pip freeze > requirements.txt
- uv pip install -r requirements.txt
+ source .venv/bin/activate  # On Windows: .venv\Scripts\activate
+ ```
+
+ 2. Install dependencies:
+ ```bash
+ uv pip install -r requirements.txt
+ ```
+
+ 3. Set up environment variables:
+ ```bash
+ # Create a .env file and add your OpenAI API key
+ OPENAI_API_KEY=your_openai_api_key_here
+ ```
+
+ ### Running the Application
+ ```bash
+ python app.py
+ ```
+
+ The application starts a Gradio interface accessible at `http://localhost:7860`.
+
+ ## 📊 File Analysis Capabilities
+
+ ### Supported File Types
+ - **Log files** (.log, .txt): Security analysis, performance bottlenecks, error detection
+ - **Data files** (.csv, .json): Data quality assessment, statistical analysis
+ - **Code files** (.py, .js, .html): Structure analysis, best practices review
+ - **Markup & documentation files** (.xml, .md): Content analysis and recommendations
+
+ ### Analysis Features
+ - **Security Analysis**: Detect threats, suspicious activities, and security patterns
+ - **Performance Insights**: Identify bottlenecks and performance issues
+ - **Error Analysis**: Categorize and analyze errors and warnings
+ - **Statistical Summary**: Basic statistics and data distribution
+ - **Pattern Recognition**: Identify trends and anomalies
+ - **Actionable Recommendations**: Suggested actions based on the analysis
+
+ ## 🧪 Testing
+
+ A sample server log file (`sample_server.log`) is included for testing the file analysis functionality.
+
+ ## 🛠️ Technical Architecture
+
+ - **Frontend**: Gradio for the web interface
+ - **Backend**: LangChain for AI orchestration
+ - **Analysis Engine**: LangGraph CodeAct agents with PyodideSandbox
+ - **File Processing**: Custom FileInjectedPyodideSandbox for secure file analysis
+ - **Model**: OpenAI GPT-4 for both chat and analysis
+
+ ## 📄 License
+
+ MIT License
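The components listed under Technical Architecture map directly onto the helpers added in `agent.py` below. As a rough sketch of how they wire together, assuming the module layout from this commit and an `OPENAI_API_KEY` in `.env` (this mirrors what the new `create_analysis_agent()` does internally rather than adding anything new):

```python
# Sketch only: manual wiring of the analysis pipeline, mirroring create_analysis_agent().
from langchain.chat_models import init_chat_model
from agent import FileInjectedPyodideSandbox, create_pyodide_eval_fn, create_codeact

model = init_chat_model("gpt-4.1-2025-04-14", model_provider="openai")

# Sandbox that injects a host file into the Pyodide virtual filesystem
sandbox = FileInjectedPyodideSandbox(
    file_path="sample_server.log",
    virtual_path="/uploaded_file.log",
    allow_net=True,
)

eval_fn = create_pyodide_eval_fn(sandbox)             # code-execution backend
agent = create_codeact(model, [], eval_fn).compile()  # compiled CodeAct agent
```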
agent.py CHANGED
@@ -2,6 +2,8 @@ import asyncio
  import inspect
  import uuid
  import os
+ import tempfile
+ import shutil
  from typing import Any
 
  from langchain.chat_models import init_chat_model
@@ -15,11 +17,17 @@ load_dotenv(find_dotenv())
  class FileInjectedPyodideSandbox(PyodideSandbox):
      """Custom PyodideSandbox that can inject files into the virtual filesystem."""
 
-     def __init__(self, file_path: str = None, virtual_path: str = "/server.log", **kwargs):
-         super().__init__(**kwargs)
+     def __init__(self, file_path: str = None, virtual_path: str = "/uploaded_file.log", sessions_dir: str = None, **kwargs):
+         # Note whether we own the sessions directory, then create a temporary one if none was provided
+         self._created_temp_dir = sessions_dir is None
+         if self._created_temp_dir:
+             sessions_dir = tempfile.mkdtemp(prefix="pyodide_sessions_")
+
+         super().__init__(sessions_dir=sessions_dir, **kwargs)
          self.file_path = file_path
          self.virtual_path = virtual_path
          self._file_injected = False
+         self._temp_sessions_dir = sessions_dir
 
      async def execute(self, code: str, **kwargs):
          # If we have a file to inject, prepend the injection code to the user code
@@ -40,7 +48,7 @@ class FileInjectedPyodideSandbox(PyodideSandbox):
  import base64
  import os
 
- # Decode the log file content from base64
+ # Decode the file content from base64
  encoded_content = """{encoded_content}"""
  file_content = base64.b64decode(encoded_content).decode('utf-8')
 
@@ -54,7 +62,7 @@ total_lines = len(log_lines)
 
  print(f"[INJECTION] Successfully created {self.virtual_path} with {{len(file_content)}} characters")
  print(f"[INJECTION] File content available as 'file_content' variable ({{len(file_content)}} chars)")
- print(f"[INJECTION] Log lines available as 'log_lines' variable ({{total_lines}} lines)")
+ print(f"[INJECTION] Lines available as 'log_lines' variable ({{total_lines}} lines)")
 
  # Verify injection worked
  if os.path.exists("{self.virtual_path}"):
@@ -64,8 +72,8 @@ else:
 
  # Variables now available for analysis:
  # - file_content: raw file content as string
- # - log_lines: list of individual log lines
- # - total_lines: number of lines in the log
+ # - log_lines: list of individual lines
+ # - total_lines: number of lines in the file
  # - File also available at: {self.virtual_path}
 
  # End of injection code
@@ -82,6 +90,19 @@ else:
              return await super().execute(code, **kwargs)
          else:
              return await super().execute(code, **kwargs)
+
+     def cleanup(self):
+         """Clean up temporary directories if we created them."""
+         if self._created_temp_dir and self._temp_sessions_dir and os.path.exists(self._temp_sessions_dir):
+             try:
+                 shutil.rmtree(self._temp_sessions_dir)
+                 print(f"Cleaned up temporary sessions directory: {self._temp_sessions_dir}")
+             except Exception as e:
+                 print(f"Warning: Could not clean up temporary directory {self._temp_sessions_dir}: {e}")
+
+     def __del__(self):
+         """Cleanup when object is destroyed."""
+         self.cleanup()
 
  def create_pyodide_eval_fn(sandbox: PyodideSandbox) -> EvalCoroutine:
      """Create an eval_fn that uses PyodideSandbox.
@@ -160,68 +181,178 @@ def read_file(file_path: str) -> str:
          return file.read()
 
 
- tools = []
-
- model = init_chat_model("gpt-4.1-2025-04-14", model_provider="openai")
-
- # Specify the log file path
- log_file_path = "/Users/hw/Desktop/codeact_agent/server.log"
-
- # Create our custom sandbox with file injection capability
- sandbox = FileInjectedPyodideSandbox(
-     file_path=log_file_path,
-     virtual_path="/server.log",
-     allow_net=True
- )
-
- eval_fn = create_pyodide_eval_fn(sandbox)
- code_act = create_codeact(model, tools, eval_fn)
- agent = code_act.compile()
-
- query = """
- Analyze these server logs and provide:
- 1. Security threat summary - identify attack patterns, suspicious IPs, and breach attempts
- 2. Performance bottlenecks - find slow endpoints, database issues, and resource constraints
- 3. User behavior analysis - login patterns, most accessed endpoints, session durations
- 4. System health report - error rates, critical alerts, and infrastructure issues
- 5. Recommended actions based on the analysis
-
- LOG FORMAT INFORMATION:
- The server logs follow this format:
- YYYY-MM-DD HH:MM:SS [LEVEL] event_type: key=value, key=value, ...
+ def create_analysis_agent(file_path: str, model=None, virtual_path: str = "/uploaded_file.log", sessions_dir: str = None):
+     """
+     Create a CodeAct agent configured for file analysis.
+
+     Args:
+         file_path: Path to the file to analyze
+         model: Language model to use (if None, a default will be initialized)
+         virtual_path: Virtual path where the file will be mounted in the sandbox
+         sessions_dir: Directory for PyodideSandbox sessions (if None, a temp dir is created)
+
+     Returns:
+         Compiled CodeAct agent ready for analysis
+     """
+     if model is None:
+         model = init_chat_model("gpt-4.1-2025-04-14", model_provider="openai")
+
+     # Create our custom sandbox with file injection capability
+     sandbox = FileInjectedPyodideSandbox(
+         file_path=file_path,
+         virtual_path=virtual_path,
+         sessions_dir=sessions_dir,
+         allow_net=True
+     )
+
+     eval_fn = create_pyodide_eval_fn(sandbox)
+     code_act = create_codeact(model, [], eval_fn)
+     return code_act.compile()
 
- Sample log entries:
- - 2024-01-15 08:23:45 [INFO] user_login: user=john_doe, ip=192.168.1.100, success=true
- - 2024-01-15 08:24:12 [INFO] api_request: endpoint=/api/users, method=GET, user=john_doe, response_time=45ms
- - 2024-01-15 08:27:22 [WARN] failed_login: user=admin, ip=203.45.67.89, attempts=3
- - 2024-01-15 08:38:33 [CRITICAL] security_alert: suspicious_activity, ip=185.234.72.19, pattern=sql_injection_attempt
- - 2024-01-15 08:26:01 [ERROR] database_connection: host=db-primary, error=timeout, duration=30s
 
- Key log levels: INFO, WARN, ERROR, CRITICAL
- Key event types: user_login, user_logout, api_request, failed_login, security_alert, database_connection, etc.
+ def get_default_analysis_query(file_extension: str = None) -> str:
+     """
+     Get a default analysis query based on file type.
+
+     Args:
+         file_extension: File extension (e.g., '.log', '.csv', '.txt')
+
+     Returns:
+         Analysis query string
+     """
+     if file_extension and file_extension.lower() in ['.log', '.txt']:
+         return """
+ Analyze this uploaded file and provide comprehensive insights. Follow the example code patterns below for reliable analysis.
+
+ ANALYSIS REQUIREMENTS:
+ 1. **Content Overview** - What type of data/logs this file contains
+ 2. **Security Analysis** - Identify any security-related events, threats, or suspicious activities
+ 3. **Performance Insights** - Find bottlenecks, slow operations, or performance issues
+ 4. **Error Analysis** - Identify and categorize errors, warnings, and critical issues
+ 5. **Statistical Summary** - Basic statistics (line count, data distribution, time ranges)
+ 6. **Key Patterns** - Important patterns, trends, or anomalies found
+ 7. **Recommendations** - Suggested actions based on the analysis
 
  DATA SOURCES AVAILABLE:
- - `file_content`: Raw log content as a string
- - `log_lines`: List of individual log lines
- - `total_lines`: Number of lines in the log
- - File path: `/server.log` (can be read with open('/server.log', 'r'))
+ - `file_content`: Raw file content as a string
+ - `log_lines`: List of individual lines
+ - `total_lines`: Number of lines in the file
+ - File path: `/uploaded_file.log`
+
+ EXAMPLE CODE PATTERNS TO FOLLOW:
+
+ Start with basic analysis, then add specific patterns based on your file type:
+
+ 1. Import required libraries: re, Counter, defaultdict, datetime
+ 2. Basic file statistics: total_lines, file_content length, sample lines
+ 3. Pattern analysis using regex for security, performance, errors
+ 4. Data extraction and frequency analysis
+ 5. Clear formatted output with sections
+ 6. Actionable recommendations
+
+ Use these code snippets as templates:
+ - Counter() for frequency analysis
+ - re.search() and re.findall() for pattern matching
+ - enumerate(log_lines, 1) for line-by-line processing
+ - defaultdict(list) for grouping findings
+ - Clear print statements with section headers
+
+ Generate Python code following these patterns. Always include proper error handling, clear output formatting, and actionable insights.
+ """
+     else:
+         return """
+ Analyze this uploaded file and provide comprehensive insights. Follow these reliable patterns:
+
+ ANALYSIS REQUIREMENTS:
+ 1. **File Type Analysis** - What type of file this is and its structure
+ 2. **Content Summary** - Overview of the file contents
+ 3. **Key Information** - Important data points or patterns found
+ 4. **Data Quality** - Assessment of data completeness and consistency
+ 5. **Statistical Analysis** - Basic statistics and data distribution
+ 6. **Insights & Findings** - Key takeaways from the analysis
+ 7. **Recommendations** - Suggested next steps or insights
 
- Generate python code and run it in the sandbox to get the analysis.
+ DATA SOURCES AVAILABLE:
+ - file_content: Raw file content as a string
+ - log_lines: List of individual lines
+ - total_lines: Number of lines in the file
+ - File path: /uploaded_file.log
+
+ RELIABLE CODE PATTERNS:
+ 1. Start with basic stats: total_lines, len(file_content), file preview
+ 2. Use Counter() for frequency analysis of patterns
+ 3. Use re.findall() for extracting structured data like emails, IPs, dates
+ 4. Analyze line structure and consistency
+ 5. Calculate data quality metrics
+ 6. Provide clear sections with === headers ===
+ 7. End with actionable recommendations
+
+ Focus on reliability over complexity. Use simple, proven Python patterns that work consistently.
+
+ Generate Python code following these guidelines for robust file analysis.
  """
 
 
- async def run_agent(query: str):
-     # Stream agent outputs
-     async for typ, chunk in agent.astream(
-         {"messages": query},
-         stream_mode=["values", "messages"],
-     ):
-         if typ == "messages":
-             print(chunk[0].content, end="")
-         elif typ == "values":
-             print("\n\n---answer---\n\n", chunk)
+ async def run_file_analysis(file_path: str, query: str = None, model=None) -> str:
+     """
+     Run file analysis using a CodeAct agent.
+
+     Args:
+         file_path: Path to the file to analyze
+         query: Analysis query (if None, a default is chosen based on file type)
+         model: Language model to use
+
+     Returns:
+         Analysis results as a string
+     """
+     if not os.path.exists(file_path):
+         return f"❌ File not found: {file_path}"
+
+     try:
+         # Create the agent
+         agent = create_analysis_agent(file_path, model)
+
+         # Use the default query if none provided
+         if query is None:
+             file_ext = os.path.splitext(file_path)[1]
+             query = get_default_analysis_query(file_ext)
+
+         # Run the analysis
+         result_parts = []
+         async for typ, chunk in agent.astream(
+             {"messages": query},
+             stream_mode=["values", "messages"],
+         ):
+             if typ == "messages":
+                 result_parts.append(chunk[0].content)
+             elif typ == "values":
+                 if chunk and "messages" in chunk:
+                     final_message = chunk["messages"][-1]
+                     if hasattr(final_message, 'content'):
+                         result_parts.append(f"\n\n**Final Analysis:**\n{final_message.content}")
+
+         return "\n".join(result_parts) if result_parts else "Analysis completed but no output generated."
+
+     except Exception as e:
+         return f"❌ Error analyzing file: {str(e)}"
 
 
+ # Example usage and testing
  if __name__ == "__main__":
-     # Run the agent
-     asyncio.run(run_agent(query))
+     # This section is for testing only - remove or comment out in production
+     import sys
+
+     if len(sys.argv) > 1:
+         test_file_path = sys.argv[1]
+         print(f"Testing with file: {test_file_path}")
+
+         async def test_analysis():
+             result = await run_file_analysis(test_file_path)
+             print("Analysis Result:")
+             print("=" * 50)
+             print(result)
+
+         asyncio.run(test_analysis())
+     else:
+         print("Usage: python agent.py <file_path>")
+         print("Or import this module and use the functions directly.")
app.py CHANGED
@@ -1,11 +1,16 @@
  import os
  import gradio as gr
+ import asyncio
+ import tempfile
  from dotenv import find_dotenv, load_dotenv
  from langchain.chat_models import init_chat_model
  from langchain.schema import HumanMessage, SystemMessage
  from langgraph.prebuilt import create_react_agent
  from langsmith import traceable
 
+ # Import the CodeAct agent functionality
+ from agent import FileInjectedPyodideSandbox, create_pyodide_eval_fn, create_codeact
+
  # Load environment variables
  load_dotenv(find_dotenv())
 
@@ -15,9 +20,14 @@ openai_model = init_chat_model(
      api_key=os.getenv("OPENAI_API_KEY"),
  )
 
- # Create the agent (you can add tools here later if needed)
+ # Create the basic chat agent
  chat_agent = create_react_agent(openai_model, tools=[])
 
+ # Initialize the CodeAct model for file analysis
+ codeact_model = init_chat_model("gpt-4.1-2025-04-14", model_provider="openai")
+
+ # Store the uploaded file path globally
+ uploaded_file_path = None
 
  @traceable
  def respond(
@@ -54,7 +64,6 @@ def respond(
              if "messages" in chunk and chunk["messages"]:
                  latest_message = chunk["messages"][-1]
                  if hasattr(latest_message, 'content'):
-                     # Extract content from the message
                      current_content = latest_message.content
                      if current_content and len(current_content) > len(response_text):
                          response_text = current_content
@@ -67,26 +76,163 @@ def respond(
      except Exception as e:
          yield f"Error: {str(e)}. Please make sure your OpenAI API key is set correctly."
 
+ def handle_file_upload(file):
+     """Handle file upload and store the path globally"""
+     global uploaded_file_path
+     if file is not None:
+         uploaded_file_path = file.name
+         return f"✅ File uploaded successfully: {os.path.basename(file.name)}"
+     else:
+         uploaded_file_path = None
+         return "❌ No file uploaded"
+
+ async def analyze_uploaded_file():
+     """Analyze the uploaded file using the CodeAct agent"""
+     global uploaded_file_path
+
+     if not uploaded_file_path or not os.path.exists(uploaded_file_path):
+         return "❌ No file uploaded or file not found. Please upload a file first."
+
+     try:
+         # Create a sandbox with the uploaded file
+         sandbox = FileInjectedPyodideSandbox(
+             file_path=uploaded_file_path,
+             virtual_path="/uploaded_file.log",
+             sessions_dir=None,  # Will create a temp directory automatically
+             allow_net=True
+         )
+
+         eval_fn = create_pyodide_eval_fn(sandbox)
+         code_act = create_codeact(codeact_model, [], eval_fn)
+         agent = code_act.compile()
+
+         # Create the analysis query based on file type
+         file_ext = os.path.splitext(uploaded_file_path)[1].lower()
+
+         if file_ext in ['.log', '.txt']:
+             query = """
+ Analyze this uploaded file and provide:
+ 1. **Content Overview** - What type of data/logs this file contains
+ 2. **Key Patterns** - Important patterns, trends, or anomalies found
+ 3. **Statistical Summary** - Basic statistics (line count, data distribution, etc.)
+ 4. **Insights & Findings** - Key takeaways from the analysis
+ 5. **Recommendations** - Suggested actions based on the analysis
+
+ DATA SOURCES AVAILABLE:
+ - `file_content`: Raw file content as a string
+ - `log_lines`: List of individual lines
+ - `total_lines`: Number of lines in the file
+ - File path: `/uploaded_file.log` (can be read with open('/uploaded_file.log', 'r'))
 
+ Generate Python code to analyze the file and provide comprehensive insights.
  """
- For information on how to customize the ChatInterface, peruse the gradio docs: https://www.gradio.app/docs/chatinterface
+         else:
+             query = f"""
+ Analyze this uploaded {file_ext} file and provide:
+ 1. **File Type Analysis** - What type of file this is and its structure
+ 2. **Content Summary** - Overview of the file contents
+ 3. **Key Information** - Important data points or patterns found
+ 4. **Statistical Analysis** - Basic statistics and data distribution
+ 5. **Recommendations** - Suggested next steps or insights
+
+ DATA SOURCES AVAILABLE:
+ - `file_content`: Raw file content as a string
+ - `log_lines`: List of individual lines
+ - `total_lines`: Number of lines in the file
+ - File path: `/uploaded_file.log`
+
+ Generate Python code to analyze this file and provide comprehensive insights.
  """
- demo = gr.ChatInterface(
-     respond,
-     additional_inputs=[
-         gr.Textbox(value="You are a helpful AI assistant. Be friendly, informative, and concise in your responses.", label="System message"),
-         gr.Slider(minimum=1, maximum=2048, value=512, step=1, label="Max new tokens"),
-         gr.Slider(minimum=0.1, maximum=4.0, value=0.7, step=0.1, label="Temperature"),
-         gr.Slider(
-             minimum=0.1,
-             maximum=1.0,
-             value=0.95,
-             step=0.05,
-             label="Top-p (nucleus sampling)",
-         ),
-     ],
- )
+
+         # Run the analysis
+         result_parts = []
+         async for typ, chunk in agent.astream(
+             {"messages": query},
+             stream_mode=["values", "messages"],
+         ):
+             if typ == "messages":
+                 result_parts.append(chunk[0].content)
+             elif typ == "values":
+                 if chunk and "messages" in chunk:
+                     final_message = chunk["messages"][-1]
+                     if hasattr(final_message, 'content'):
+                         result_parts.append(f"\n\n**Final Analysis:**\n{final_message.content}")
+
+         return "\n".join(result_parts) if result_parts else "Analysis completed but no output generated."
+
+     except Exception as e:
+         return f"❌ Error analyzing file: {str(e)}"
 
+ def run_file_analysis():
+     """Wrapper to run the async file analysis in a sync context"""
+     return asyncio.run(analyze_uploaded_file())
+
+ # Create the Gradio interface
+ with gr.Blocks(title="DataForge - AI Assistant with File Analysis") as demo:
+     gr.Markdown("# 🔍 DataForge - AI Assistant with File Analysis")
+     gr.Markdown("Upload files for analysis or chat with the AI assistant.")
+
+     with gr.Tab("💬 Chat Assistant"):
+         chat_interface = gr.ChatInterface(
+             respond,
+             additional_inputs=[
+                 gr.Textbox(
+                     value="You are a helpful AI assistant. Be friendly, informative, and concise in your responses.",
+                     label="System message"
+                 ),
+                 gr.Slider(minimum=1, maximum=2048, value=512, step=1, label="Max new tokens"),
+                 gr.Slider(minimum=0.1, maximum=4.0, value=0.7, step=0.1, label="Temperature"),
+                 gr.Slider(
+                     minimum=0.1,
+                     maximum=1.0,
+                     value=0.95,
+                     step=0.05,
+                     label="Top-p (nucleus sampling)",
+                 ),
+             ],
+             title="Chat with AI Assistant",
+             description="Ask questions or get help with any topic."
+         )
+
+     with gr.Tab("📁 File Analysis"):
+         gr.Markdown("## Upload and Analyze Files")
+         gr.Markdown("Upload log files, text files, or other data files for comprehensive AI-powered analysis.")
+
+         with gr.Row():
+             with gr.Column(scale=1):
+                 file_upload = gr.File(
+                     label="Upload File for Analysis",
+                     file_types=[".txt", ".log", ".csv", ".json", ".xml", ".py", ".js", ".html", ".md"],
+                     type="filepath"
+                 )
+                 upload_status = gr.Textbox(
+                     label="Upload Status",
+                     value="No file uploaded",
+                     interactive=False
+                 )
+                 analyze_btn = gr.Button("🔍 Analyze File", variant="primary", size="lg")
+
+             with gr.Column(scale=2):
+                 analysis_output = gr.Textbox(
+                     label="Analysis Results",
+                     lines=20,
+                     max_lines=30,
+                     placeholder="Upload a file and click 'Analyze File' to see detailed analysis results here...",
+                     interactive=False
+                 )
+
+     # Event handlers
+     file_upload.change(
+         fn=handle_file_upload,
+         inputs=[file_upload],
+         outputs=[upload_status]
+     )
+
+     analyze_btn.click(
+         fn=run_file_analysis,
+         inputs=[],
+         outputs=[analysis_output]
+     )
 
  if __name__ == "__main__":
      demo.launch()
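Note the bridge pattern behind the analyze button: `run_file_analysis` here is a plain synchronous wrapper (distinct from `agent.run_file_analysis`), and `asyncio.run` spins up a fresh event loop on each click to drive the coroutine. A stripped-down sketch of the same pattern, with a stand-in coroutine instead of the real CodeAct analysis:

```python
# Sketch of the sync-over-async bridge used for the Gradio click handler.
import asyncio

async def analyze_stub() -> str:
    await asyncio.sleep(0.1)  # stand-in for the real analysis coroutine
    return "analysis result"

def run_analysis_stub() -> str:
    # Plain function usable as a Gradio fn=...; a new event loop runs per call
    return asyncio.run(analyze_stub())

print(run_analysis_stub())
```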
requirements.txt CHANGED
@@ -6,6 +6,7 @@ charset-normalizer
  distro
  dotenv
  e2b-code-interpreter
+ gradio
  h11
  httpcore
  httpx
sample_server.log ADDED
@@ -0,0 +1,25 @@
+ 2024-01-15 08:23:45 [INFO] user_login: user=john_doe, ip=192.168.1.100, success=true, session_id=abc123
+ 2024-01-15 08:24:12 [INFO] api_request: endpoint=/api/users, method=GET, user=john_doe, response_time=45ms, status=200
+ 2024-01-15 08:24:15 [INFO] api_request: endpoint=/api/dashboard, method=GET, user=john_doe, response_time=120ms, status=200
+ 2024-01-15 08:25:33 [INFO] user_login: user=alice_smith, ip=192.168.1.101, success=true, session_id=def456
+ 2024-01-15 08:26:01 [ERROR] database_connection: host=db-primary, error=timeout, duration=30s, query=SELECT * FROM users
+ 2024-01-15 08:26:45 [INFO] api_request: endpoint=/api/products, method=GET, user=alice_smith, response_time=2300ms, status=200
+ 2024-01-15 08:27:22 [WARN] failed_login: user=admin, ip=203.45.67.89, attempts=3, reason=invalid_password
+ 2024-01-15 08:28:11 [INFO] user_logout: user=john_doe, session_duration=4m26s, pages_visited=5
+ 2024-01-15 08:29:33 [CRITICAL] security_alert: suspicious_activity, ip=185.234.72.19, pattern=sql_injection_attempt, endpoint=/api/users
+ 2024-01-15 08:30:15 [ERROR] api_request: endpoint=/api/orders, method=POST, user=alice_smith, response_time=timeout, status=500, error=database_unavailable
+ 2024-01-15 08:31:45 [WARN] rate_limit: ip=203.45.67.89, endpoint=/api/login, requests_per_minute=25, limit=10
+ 2024-01-15 08:32:12 [INFO] user_login: user=bob_wilson, ip=192.168.1.102, success=true, session_id=ghi789
+ 2024-01-15 08:33:28 [CRITICAL] security_alert: brute_force_attack, ip=203.45.67.89, attempts=15, duration=5m, blocked=true
+ 2024-01-15 08:34:55 [INFO] api_request: endpoint=/api/reports, method=GET, user=bob_wilson, response_time=850ms, status=200
+ 2024-01-15 08:35:21 [ERROR] memory_usage: process=web-server, usage=85%, threshold=80%, action=alert_sent
+ 2024-01-15 08:36:47 [INFO] backup_completed: database=primary, size=2.3GB, duration=45m, status=success
+ 2024-01-15 08:37:15 [WARN] disk_space: partition=/data, usage=92%, available=800MB, threshold=90%
+ 2024-01-15 08:38:33 [CRITICAL] security_alert: suspicious_activity, ip=185.234.72.19, pattern=xss_attempt, endpoint=/api/comments
+ 2024-01-15 08:39:12 [INFO] user_login: user=carol_davis, ip=192.168.1.103, success=true, session_id=jkl012
+ 2024-01-15 08:40:28 [ERROR] external_api: service=payment_gateway, endpoint=charge, response_time=timeout, status=503, retry_attempt=3
+ 2024-01-15 08:41:15 [INFO] api_request: endpoint=/api/analytics, method=GET, user=carol_davis, response_time=1200ms, status=200
+ 2024-01-15 08:42:33 [WARN] slow_query: query=SELECT * FROM orders WHERE date > '2024-01', duration=5.2s, threshold=2s
+ 2024-01-15 08:43:47 [INFO] cache_hit: key=user_preferences_john_doe, hit_rate=89%, response_time=5ms
+ 2024-01-15 08:44:12 [CRITICAL] system_alert: cpu_usage=95%, memory_usage=88%, load_average=4.2, action=scaling_triggered
+ 2024-01-15 08:45:28 [INFO] user_logout: user=alice_smith, session_duration=19m55s, pages_visited=12
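Every sample entry follows the `YYYY-MM-DD HH:MM:SS [LEVEL] event_type: key=value, ...` shape described in the analysis prompts. A small parsing sketch in the spirit of the suggested `re` + `Counter` patterns (comma-splitting of the key=value tail would be only approximate for free-text values such as the SQL in `slow_query`, so this sticks to levels and event types):

```python
# Sketch: tally log levels and event types in sample_server.log.
import re
from collections import Counter

# Matches: YYYY-MM-DD HH:MM:SS [LEVEL] event_type: key=value, key=value, ...
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<level>\w+)\] (?P<event>\w+): .*$"
)

levels, events = Counter(), Counter()
with open("sample_server.log") as f:
    for line in f:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that do not follow the documented format
        levels[m.group("level")] += 1
        events[m.group("event")] += 1

print(levels)                 # this sample: INFO 13, WARN 4, ERROR 4, CRITICAL 4
print(events.most_common(3))  # api_request, user_login, security_alert lead
```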