Spaces: Aedelon/GAIA_Agent

Delanoe Pirard committed on May 12

Commit 4c353e9 (1 parent: 9d75f65)

Add the Stockfish binary via Git LFS


Files changed (3)

agents/code_agent.py +63 -13
app.py +1 -1
prompts/code_gen_prompt.txt +37 -42

agents/code_agent.py CHANGED

@@ -7,6 +7,9 @@ from llama_index.llms.google_genai import GoogleGenAI
 from llama_index.llms.openai import OpenAI
 from llama_index.tools.code_interpreter import CodeInterpreterToolSpec
 # Setup logging
 logger = logging.getLogger(__name__)
@@ -59,12 +62,14 @@ def generate_python_code(prompt: str) -> str:
     gen_prompt_template = load_prompt_from_file("../prompts/code_gen_prompt.txt", default_gen_prompt_template)
     input_prompt = gen_prompt_template.format(prompt=prompt)
     try:
         llm = OpenAI(
             model=gen_llm_model,
             api_key=gen_api_key,
             reasoning_effort="high",
-            temperature=0.05,
             max_tokens=16384
         )
         logger.info(f"Using code generation LLM: {gen_llm_model}")
@@ -127,22 +132,65 @@ def initialize_code_agent() -> ReActAgent:
         llm = GoogleGenAI(
             api_key=gemini_api_key,
             model=agent_llm_model,
-            temperature=0.05
         )
         logger.info(f"Using agent LLM: {agent_llm_model}")
         # Load system prompt (consider loading from file)
         default_system_prompt = """\
-        You are CodeAgent, a specialist in generating and executing Python code. Your mission:
-        1. **Thought**: Think step-by-step before acting and state your reasoning.
-        2. **Code Generation**: To produce code, call `python_code_generator` with a concise, unambiguous prompt. Review the generated code for correctness and safety.
-        3. **Execution & Testing**: To execute or test code, call `code_interpreter`. Provide the complete code snippet. Analyze its output (stdout, stderr, result) to verify functionality and debug errors.
-        4. **Iteration**: If execution fails or the result is incorrect, analyze the error, think about the fix, generate corrected code using `python_code_generator`, and execute again using `code_interpreter`.
-        5. **Tool Use**: Always adhere strictly to each tool’s input/output format.
-        6. **Final Output**: Once the code works correctly and achieves the goal, output *only* the final functional code or the final execution result, as appropriate for the task.
-        7. **Hand-Off**: If further logical reasoning or verification is needed, delegate to **reasoning_agent**. Otherwise, pass your final output to **planner_agent** for synthesis.
-        """
         system_prompt = load_prompt_from_file("code_agent_system_prompt.txt", default_system_prompt)
         agent = ReActAgent(
@@ -179,7 +227,9 @@ def initialize_code_agent() -> ReActAgent:
                 "- stockfish==3.28.0      : UCI interface to Stockfish chess engine\n"
                 "- sympy>=1.14.0          : Symbolic math, algebra, calculus CAS\n"
                 "- youtube-transcript-api>=1.0.3 : Fetch YouTube video transcripts via API\n"
-                "- yt-dlp>=2025.3.31      : Download videos/playlists from YouTube and other sites\n"
             ),
             # REMOVED: code_execute_fn - Execution is handled by the code_interpreter tool via the agent loop.
             tools=[

 from llama_index.llms.openai import OpenAI
 from llama_index.tools.code_interpreter import CodeInterpreterToolSpec
+import dotenv
+dotenv.load_dotenv()
 # Setup logging
 logger = logging.getLogger(__name__)
     gen_prompt_template = load_prompt_from_file("../prompts/code_gen_prompt.txt", default_gen_prompt_template)
     input_prompt = gen_prompt_template.format(prompt=prompt)
+    logger.debug("Code generation prompt template:\n%s", gen_prompt_template)
     try:
         llm = OpenAI(
             model=gen_llm_model,
             api_key=gen_api_key,
             reasoning_effort="high",
+            temperature=0.0,
             max_tokens=16384
         )
         logger.info(f"Using code generation LLM: {gen_llm_model}")
         llm = GoogleGenAI(
             api_key=gemini_api_key,
             model=agent_llm_model,
+            temperature=0.0
         )
         logger.info(f"Using agent LLM: {agent_llm_model}")
         # Load system prompt (consider loading from file)
         default_system_prompt = """\
+            You are CodeAgent, a specialist in generating and executing Python code. Your mission:
+            1. **Thought**: Think step-by-step before acting and state your reasoning.
+            2. **Code Generation**: To produce code, call `python_code_generator` with a concise, unambiguous prompt. Review the generated code for correctness and safety.
+            3. **Execution & Testing**: To execute or test code, call `code_interpreter`. Provide the complete code snippet. Analyze its output (stdout, stderr, result) to verify functionality and debug errors.
+            4. **Iteration**: If execution fails or the result is incorrect, analyze the error, think about the fix, generate corrected code using `python_code_generator`, and execute again using `code_interpreter`.
+            5. **Tool Use**: Always adhere strictly to each tool’s input/output format.
+            6. **Final Output**: Once the code works correctly and achieves the goal, output *only* the final functional code or the final execution result, as appropriate for the task.
+            7. **Hand-Off**: If further logical reasoning or verification is needed, delegate to **reasoning_agent**. Otherwise, pass your final output to **planner_agent** for synthesis.
+            **Special Instructions for Chess-Related Tasks**:
+            - Prioritize using the Stockfish engine to solve chess problems.
+            - The Stockfish engine executable is located at `./stockfish`, or at the path stored in the `STOCKFISH_PATH` environment variable.
+            - To initialize Stockfish in code, use:
+                from stockfish import Stockfish
+                stockfish = Stockfish(path="./stockfish")
+            - Use `python-chess` to represent boards, generate and validate moves, and parse PGN/FEN.
+            **Available Python Packages**:
+            - beautifulsoup4: HTML/XML parsing and lightweight web scraping
+            - certifi: Mozilla CA bundle for secure TLS/SSL requests
+            - datasets: Hugging Face dataset loading and streaming
+            - dotenv: Load environment variables from .env files
+            - duckdb: In‑process OLAP SQL engine (analytics, Parquet, Arrow)
+            - ffmpeg-python: Wrapper around FFmpeg for audio/video operations
+            - gradio[oauth]: Rapid web‑UI prototyping with optional OAuth
+            - helium: High‑level Selenium / browser automation toolkit
+            - huggingface: Interact with Hugging Face Hub models, datasets, spaces
+            - imageio: Read and write images, GIFs, MP4s, volumes, etc.
+            - matplotlib: 2‑D plotting (figures, axes, annotations)
+            - numpy: N‑dimensional arrays and vectorized math
+            - openai-whisper: Speech‑to‑text transcription
+            - opencv-python: Computer vision, image/video processing
+            - openpyxl: Excel .xlsx read/write, styles, formulas
+            - pandas: DataFrames, time series, CSV/Parquet I/O
+            - pyarrow: Apache Arrow tables, Parquet, Flight RPC
+            - pygame: Simple 2‑D game/graphics engine (SDL based)
+            - python-chess: Chess move generation, PGN/FEN handling, engine UCI integration
+            - requests: HTTP/HTTPS client with sessions and retries
+            - scikit-learn: Machine‑learning algorithms, preprocessing, pipelines
+            - scipy: Scientific computing, optimization, signal processing
+            - seaborn: Statistical visualization on top of matplotlib
+            - sqlalchemy: SQL ORM and core engine for many databases
+            - statsmodels: Econometrics and statistical modeling (GLM, ARIMA)
+            - stockfish: UCI interface to Stockfish chess engine
+            - sympy: Symbolic math, algebra, calculus CAS
+            - youtube-transcript-api: Fetch YouTube video transcripts via API
+            - yt-dlp: Download videos/playlists from YouTube and other sites
+            """
         system_prompt = load_prompt_from_file("code_agent_system_prompt.txt", default_system_prompt)
         agent = ReActAgent(
                 "- stockfish==3.28.0      : UCI interface to Stockfish chess engine\n"
                 "- sympy>=1.14.0          : Symbolic math, algebra, calculus CAS\n"
                 "- youtube-transcript-api>=1.0.3 : Fetch YouTube video transcripts via API\n"
+                "- yt-dlp>=2025.3.31      : Download videos/playlists from YouTube and other sites\n\n"
+                "Additionally, the `stockfish` package enables the agent to solve chess problems by analyzing positions, "
+                "identifying tactical motifs, and calculating optimal move sequences, making it a valuable tool for chess training and analysis."
             ),
             # REMOVED: code_execute_fn - Execution is handled by the code_interpreter tool via the agent loop.
             tools=[
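
The chess instructions added to the system prompt mention both a fixed `./stockfish` location and a `STOCKFISH_PATH` environment variable. The following is a minimal sketch of how generated code might reconcile the two, assuming the environment variable takes precedence when it is set (the commit does not state a precedence). `resolve_stockfish_path` and `make_engine` are hypothetical helper names, not part of the repository.

```python
import os
from typing import Mapping, Optional

DEFAULT_STOCKFISH_PATH = "./stockfish"  # location named in the system prompt


def resolve_stockfish_path(env: Optional[Mapping[str, str]] = None) -> str:
    """Return STOCKFISH_PATH if it is set and non-empty, else the default path.

    Assumption: the environment variable wins over the default location.
    """
    env = os.environ if env is None else env
    return env.get("STOCKFISH_PATH") or DEFAULT_STOCKFISH_PATH


def make_engine():
    """Create a Stockfish wrapper (needs the `stockfish` package and the binary)."""
    # Imported lazily so the path resolver can be used without the package installed.
    from stockfish import Stockfish

    return Stockfish(path=resolve_stockfish_path())
```

Passing a plain dictionary as `env` makes the resolver easy to check without touching the real environment, for example `resolve_stockfish_path({"STOCKFISH_PATH": "/opt/sf/stockfish"})`.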

app.py CHANGED

@@ -387,7 +387,7 @@ async def run_and_submit_all( profile: gr.OAuthProfile | None):
         return "Failed to fetch questions.", None
     # 3. Process Questions
-    # questions_data = [questions_data[6]]
     for item in questions_data:
         answers = await process_question(agent, item, fetch_file_url)
         results_log.append(answers)

         return "Failed to fetch questions.", None
     # 3. Process Questions
+    questions_data = [questions_data[3]]
     for item in questions_data:
         answers = await process_question(agent, item, fetch_file_url)
         results_log.append(answers)
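
This hunk replaces a commented-out debugging filter with an active one, so the loop now processes only the item at index 3 of `questions_data`. The following sketch shows one alternative that keeps the full run as the default and makes single-question mode opt-in. The `DEBUG_QUESTION_INDEX` variable and the `select_questions` helper are hypothetical and do not appear in `app.py`.

```python
import os
from typing import List, Mapping, Optional, Sequence, TypeVar

T = TypeVar("T")


def select_questions(
    questions: Sequence[T], env: Optional[Mapping[str, str]] = None
) -> List[T]:
    """Return all questions, or only one when DEBUG_QUESTION_INDEX is set.

    Hypothetical helper: the variable name is an assumption for illustration.
    """
    env = os.environ if env is None else env
    raw = env.get("DEBUG_QUESTION_INDEX", "").strip()
    if not raw:
        return list(questions)  # normal run: process everything
    index = int(raw)  # a non-numeric value raises ValueError
    return [questions[index]]  # an out-of-range index raises IndexError
```

With this shape, the filter cannot be left on by accident in a commit, because the default path processes every question.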

prompts/code_gen_prompt.txt CHANGED

@@ -8,49 +8,44 @@ You are CodeAgent, a specialist in generating and executing Python code. Your mi
 6. **Final Output**: Once the code works correctly and achieves the goal, output *only* the final functional code or the final execution result, as appropriate for the task.
 7. **Hand-Off**: If further logical reasoning or verification is needed, delegate to **reasoning_agent**. Otherwise, pass your final output to **planner_agent** for synthesis.
-You are also a helpful assistant that writes Python code.
-You will be given a prompt and you must generate Python code based on that prompt.
-You must only generate Python code and nothing else.
-Do not include any explanations or any other text.
-Do not use any markdown.
-Notes:
-    - The generated code may be complex; it is recommended to review and test
-      it before execution.
-    - This function only generates code and does not execute it.
-    - The following Python packages are available in the environment:
-        beautifulsoup4>=4.13.4,
-        certifi>=2025.4.26,
-        datasets>=3.5.1,
-        dotenv>=0.9.9,
-        duckdb>=1.2.2,
-        ffmpeg-python>=0.2.0,
-        gradio[oauth]>=5.28.0,
-        helium>=5.1.1,
-        huggingface>=0.0.1,
-        imageio>=2.37.0,
-        matplotlib>=3.10.1,
-        numpy>=2.2.5,
-        openai-whisper>=20240930,
-        opencv-python>=4.11.0.86,
-        openpyxl>=3.1.5,
-        pandas>=2.2.3,
-        pyarrow>=20.0.0,
-        pygame>=2.6.1,
-        python-chess>=1.999,
-        requests>=2.32.3,
-        scikit-learn>=1.6.1,
-        scipy>=1.15.2,
-        seaborn>=0.13.2,
-        sqlalchemy>=2.0.40,
-        statsmodels>=0.14.4,
-        stockfish==3.28.0,
-        sympy>=1.14.0,
-        youtube-transcript-api>=1.0.3,
-        yt-dlp>=2025.3.31
-If your response exceeds the maximum token limit and cannot be completed in a single reply, please conclude your output with the marker [CONTINUE]. In subsequent interactions, I will prompt you with “continue” to receive the next portion of the response.
-Prompt: {prompt}
-Code:

 6. **Final Output**: Once the code works correctly and achieves the goal, output *only* the final functional code or the final execution result, as appropriate for the task.
 7. **Hand-Off**: If further logical reasoning or verification is needed, delegate to **reasoning_agent**. Otherwise, pass your final output to **planner_agent** for synthesis.
+**Special Instructions for Chess-Related Tasks**:
+- Prioritize using the Stockfish engine to solve chess problems.
+- The Stockfish engine executable is located at `stockfish`, or at the path stored in the `STOCKFISH_PATH` environment variable.
+- To initialize Stockfish in code, use:
+    from stockfish import Stockfish
+    stockfish = Stockfish(path="stockfish")
+- Use `python-chess` to represent boards, generate and validate moves, and parse PGN/FEN.
+**Available Python Packages**:
+- beautifulsoup4: HTML/XML parsing and lightweight web scraping
+- certifi: Mozilla CA bundle for secure TLS/SSL requests
+- datasets: Hugging Face dataset loading and streaming
+- dotenv: Load environment variables from .env files
+- duckdb: In‑process OLAP SQL engine (analytics, Parquet, Arrow)
+- ffmpeg-python: Wrapper around FFmpeg for audio/video operations
+- gradio[oauth]: Rapid web‑UI prototyping with optional OAuth
+- helium: High‑level Selenium / browser automation toolkit
+- huggingface: Interact with Hugging Face Hub models, datasets, spaces
+- imageio: Read and write images, GIFs, MP4s, volumes, etc.
+- matplotlib: 2‑D plotting (figures, axes, annotations)
+- numpy: N‑dimensional arrays and vectorized math
+- openai-whisper: Speech‑to‑text transcription
+- opencv-python: Computer vision, image/video processing
+- openpyxl: Excel .xlsx read/write, styles, formulas
+- pandas: DataFrames, time series, CSV/Parquet I/O
+- pyarrow: Apache Arrow tables, Parquet, Flight RPC
+- pygame: Simple 2‑D game/graphics engine (SDL based)
+- python-chess: Chess move generation, PGN/FEN handling, engine UCI integration
+- requests: HTTP/HTTPS client with sessions and retries
+- scikit-learn: Machine‑learning algorithms, preprocessing, pipelines
+- scipy: Scientific computing, optimization, signal processing
+- seaborn: Statistical visualization on top of matplotlib
+- sqlalchemy: SQL ORM and core engine for many databases
+- statsmodels: Econometrics and statistical modeling (GLM, ARIMA)
+- stockfish: UCI interface to Stockfish chess engine
+- sympy: Symbolic math, algebra, calculus CAS
+- youtube-transcript-api: Fetch YouTube video transcripts via API
+- yt-dlp: Download videos/playlists from YouTube and other sites
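
The removed lines of `prompts/code_gen_prompt.txt` ended with `Prompt: {prompt}`, and no `{prompt}` placeholder is visible among the added lines shown here, while `generate_python_code` still calls `gen_prompt_template.format(prompt=prompt)`. Python's `str.format` silently ignores unused keyword arguments, so a template without the placeholder would drop the user's request. The following is a sketch of a guard, assuming that appending the request at the end of the template is acceptable. `render_gen_prompt` is a hypothetical helper, not a function from the repository.

```python
def render_gen_prompt(template: str, prompt: str) -> str:
    """Fill the {prompt} placeholder, or append the request if it is missing.

    Hypothetical helper: the fallback layout mirrors the removed
    "Prompt: ... / Code:" lines and is an assumption.
    """
    if "{prompt}" in template:
        # Note: other literal braces in the template would need escaping as {{ }}.
        return template.format(prompt=prompt)
    return f"{template.rstrip()}\n\nPrompt: {prompt}\nCode:"
```

For example, `render_gen_prompt("Task: {prompt}", "add two numbers")` fills the placeholder, while a template without one gets the request appended after a blank line.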