Spaces: dkondic/data-analyst

Dacho688 committed on Aug 1, 2024

Commit e89ef0e (1 parent: 49099ea)

App Updates

- improve base prompt
- include an example

Files changed (10)

__pycache__/streaming.cpython-312.pyc +0 -0
__pycache__/test_streaming.cpython-312.pyc +0 -0
__pycache__/test_streaming.cpython-39.pyc +0 -0
app.py +38 -20
figures/classification_report.png +0 -0
figures/confusion_matrix.png +0 -0
figures/fare_sex_boxplot.png +0 -0
requirements.txt +1 -1
test_app.py +0 -134
test_streaming.py +0 -64

__pycache__/streaming.cpython-312.pyc ADDED

Binary file (3.43 kB).

__pycache__/test_streaming.cpython-312.pyc ADDED

Binary file (3.43 kB).

__pycache__/test_streaming.cpython-39.pyc ADDED

Binary file (2.1 kB).

app.py CHANGED

@@ -16,7 +16,7 @@ llm_engine = HfEngine("meta-llama/Meta-Llama-3.1-70B-Instruct")
 agent = ReactCodeAgent(
     tools=[],
     llm_engine=llm_engine,
-    additional_authorized_imports=["numpy", "pandas", "matplotlib", "seaborn","scipy"],
     max_iterations=10,
 )
@@ -24,13 +24,19 @@ base_prompt = """You are an expert full stack data analyst.
 You are given a data file and the data structure below.
 The data file is passed to you as the variable data_file, it is a pandas dataframe, you can use it directly.
 DO NOT try to load data_file, it is already a dataframe pre-loaded in your python interpreter!
-When plotting using matplotlib/seaborn save the figures to the (already existing) folder'./figures/': take care to clear each figure with plt.clf() before doing another plot.
-When filtering pandas dataframe use the iloc.
-When importing packages use this format: from package import module
-For example: from matplotlib import pyplot as plt
-Not: import matplotlib.pyplot as plt
-Use the data file to answer the question or solve a problem given below.
 Structure of the data:
 {structure_notes}
@@ -39,7 +45,7 @@ Question/Problem:
 """
 example_notes="""This data is about the Titanic wreck in 1912.
-The target figure is the survival of passengers, notes by 'Survived'
 pclass: A proxy for socio-economic status (SES)
 1st = Upper
 2nd = Middle
@@ -51,7 +57,9 @@ Spouse = husband, wife (mistresses and fiancés were ignored)
 parch: The dataset defines family relations in this way...
 Parent = mother, father
 Child = daughter, son, stepdaughter, stepson
-Some children travelled only with a nanny, therefore parch=0 for them."""
 def get_images_in_directory(directory):
     image_extensions = {'.png', '.jpg', '.jpeg', '.gif', '.bmp', '.tiff'}
@@ -106,13 +114,22 @@ with gr.Blocks(
         secondary_hue=gr.themes.colors.yellow,
     )
 ) as demo:
-    gr.Markdown("""# Llama-3.1 Data analyst 📊🤔
-Drop a `.csv` file below and ask a question about your data.
-**Llama-3.1-70B will analyze and answer.**""")
-    file_input = gr.File(label="Your file to analyze")
     text_input = gr.Textbox(
-        label="Ask a question about your data?"
     )
     submit = gr.Button("Run", variant="primary")
     chatbot = gr.Chatbot(
@@ -123,11 +140,12 @@ Drop a `.csv` file below and ask a question about your data.
             "https://em-content.zobj.net/source/twitter/53/robot-face_1f916.png",
         ),
     )
-    # gr.Examples(
-    #     examples=[["./example/titanic.csv", example_notes]],
-    #     inputs=[file_input, text_input],
-    #     cache_examples=False
-    # )
     submit.click(interact_with_agent, [file_input, text_input], [chatbot])

 agent = ReactCodeAgent(
     tools=[],
     llm_engine=llm_engine,
+    additional_authorized_imports=["numpy", "pandas", "matplotlib", "seaborn","scipy","sklearn"],
     max_iterations=10,
 )
 You are given a data file and the data structure below.
 The data file is passed to you as the variable data_file, it is a pandas dataframe, you can use it directly.
 DO NOT try to load data_file, it is already a dataframe pre-loaded in your python interpreter!
+When plotting using matplotlib/seaborn save the figures to the (already existing) folder'./figures/': take care to clear
+each figure with plt.clf() before doing another plot.
+When plotting make the plots as pretty as possible given your tools. Same with tables, charts, or anything else.
+In your final answer: summarize your findings and steps taken.
+After each number derive real-world insights, for instance: "Correlation between is_december and boredness is 1.3453, which suggests people are more bored in winter".
+Your final answer should be a long string with at least 4 numbered and detailed parts:
+    1. Summary of Question/Problem
+    2. Summary of Actions
+    3. Summary of Findings
+    4. Potential Next Steps
+Use the data file to answer the question or perform a task below.
 Structure of the data:
 {structure_notes}
 """
 example_notes="""This data is about the Titanic wreck in 1912.
+The target variable is the survival of passengers, noted by 'Survived'
 pclass: A proxy for socio-economic status (SES)
 1st = Upper
 2nd = Middle
 parch: The dataset defines family relations in this way...
 Parent = mother, father
 Child = daughter, son, stepdaughter, stepson
+Some children travelled only with a nanny, therefore parch=0 for them.
+Run a logistic regression."""
 def get_images_in_directory(directory):
     image_extensions = {'.png', '.jpg', '.jpeg', '.gif', '.bmp', '.tiff'}
         secondary_hue=gr.themes.colors.yellow,
     )
 ) as demo:
+    gr.Markdown("""# Data Analyst (ReAct Code Agent) 📊🤔
+**Who am I?**
+I'm your personal Data Analyst built on top of Llama-3.1-70B and the ReAct agent framework.
+I break down your task step-by-step until I reach an answer/solution.
+Along the way I share my thoughts, actions (Python code blobs), and observations.
+I come packed with pandas, numpy, sklearn, matplotlib, seaborn, and more!
+**Instructions**
+1. Drop or upload a `.csv` file below.
+2. Ask a question or give it a task.
+3. Watch Llama-3.1-70B think, act, and observe until it reaches a final answer.
+\n**For an example, click on the example at the bottom of the page to auto-populate.**""")
+    file_input = gr.File(label="Drop/upload a .csv file to analyze")
     text_input = gr.Textbox(
+        label="Ask a question or give it a task."
     )
     submit = gr.Button("Run", variant="primary")
     chatbot = gr.Chatbot(
             "https://em-content.zobj.net/source/twitter/53/robot-face_1f916.png",
         ),
     )
+    gr.Examples(
+        examples=[["./example/titanic.csv", example_notes]],
+        inputs=[file_input, text_input],
+        cache_examples=False,
+        label='Click anywhere below to try this example.'
+    )
     submit.click(interact_with_agent, [file_input, text_input], [chatbot])
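
For readers following the prompt changes in `app.py`: the `interact_with_agent` function shown in the deleted `test_app.py` fills the `{structure_notes}` placeholder with `str.format` and then appends the user's text. A minimal sketch of that assembly step, assuming a shortened stand-in template and a hypothetical helper name `build_prompt` (neither is in the commit):

```python
# Shortened stand-in for base_prompt; the real template in app.py is longer.
BASE_PROMPT = """You are an expert full stack data analyst.
Structure of the data:
{structure_notes}
Question/Problem:
"""


def build_prompt(structure_notes: str, additional_notes: str = "") -> str:
    """Fill the template, then append the user's question or task (if any)."""
    prompt = BASE_PROMPT.format(structure_notes=structure_notes)
    if additional_notes:
        prompt += additional_notes
    return prompt


# Plain strings stand in for the DataFrame summaries used by the app.
prompt = build_prompt(
    "- Columns with dtypes:\nSurvived    int64",
    "Run a logistic regression.",
)
print(prompt)
```

Because the template goes through `str.format`, any literal curly braces added to the base prompt would need to be doubled (`{{` and `}}`) to avoid a formatting error.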

figures/classification_report.png ADDED

figures/confusion_matrix.png ADDED

figures/fare_sex_boxplot.png DELETED

Binary file (9.84 kB)

requirements.txt CHANGED

@@ -1,5 +1,5 @@
 git+https://github.com/huggingface/transformers.git#egg=transformers[agents]
 matplotlib
 seaborn
-scikit-learn
 scipy

 git+https://github.com/huggingface/transformers.git#egg=transformers[agents]
 matplotlib
 seaborn
+sklearn
 scipy
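
A note on the `requirements.txt` change: the module is imported as `sklearn`, but to my knowledge the distribution published on PyPI is `scikit-learn`, and the `sklearn` name there is a deprecated placeholder that may fail to install. The sketch below illustrates keeping the two names apart; the lookup table and the helper `to_requirement` are hypothetical and not part of this repository.

```python
# Hypothetical lookup table: import name -> pip distribution name.
# Only entries where the two names differ need to be listed.
IMPORT_TO_DISTRIBUTION = {
    "sklearn": "scikit-learn",
}


def to_requirement(import_name: str) -> str:
    """Return the pip distribution name for an import name (unchanged if unmapped)."""
    return IMPORT_TO_DISTRIBUTION.get(import_name, import_name)


# Import names as passed to additional_authorized_imports in app.py.
authorized_imports = ["numpy", "pandas", "matplotlib", "seaborn", "scipy", "sklearn"]
requirements = [to_requirement(name) for name in authorized_imports]
print("\n".join(requirements))
```

Under that assumption, `additional_authorized_imports` would keep `"sklearn"` while `requirements.txt` would list `scikit-learn`.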

test_app.py DELETED

@@ -1,134 +0,0 @@
-import os
-import shutil
-import gradio as gr
-from transformers import ReactCodeAgent, HfEngine, Tool
-import pandas as pd
-from gradio import Chatbot
-from test_streaming import stream_to_gradio
-from huggingface_hub import login
-from gradio.data_classes import FileData
-#login(os.getenv("HUGGINGFACEHUB_API_TOKEN"))
-llm_engine = HfEngine("meta-llama/Meta-Llama-3.1-70B-Instruct")
-agent = ReactCodeAgent(
-    tools=[],
-    llm_engine=llm_engine,
-    additional_authorized_imports=["numpy", "pandas", "matplotlib", "seaborn","scipy"],
-    max_iterations=10,
-)
-base_prompt = """You are an expert full stack data analyst.
-You are given a data file and the data structure below.
-The data file is passed to you as the variable data_file, it is a pandas dataframe, you can use it directly.
-DO NOT try to load data_file, it is already a dataframe pre-loaded in your python interpreter!
-When plotting using matplotlib/seaborn save the figures to the (already existing) folder'./figures/': take care to clear each figure with plt.clf() before doing another plot.
-When filtering pandas dataframe use the iloc.
-When importing packages use this format: from package import module
-For example: from matplotlib import pyplot as plt
-Not: import matplotlib.pyplot as plt
-Use the data file to answer the question or solve a problem given below.
-Structure of the data:
-{structure_notes}
-Question/Problem:
-"""
-example_notes="""This data is about the Titanic wreck in 1912.
-The target figure is the survival of passengers, notes by 'Survived'
-pclass: A proxy for socio-economic status (SES)
-1st = Upper
-2nd = Middle
-3rd = Lower
-age: Age is fractional if less than 1. If the age is estimated, is it in the form of xx.5
-sibsp: The dataset defines family relations in this way...
-Sibling = brother, sister, stepbrother, stepsister
-Spouse = husband, wife (mistresses and fiancés were ignored)
-parch: The dataset defines family relations in this way...
-Parent = mother, father
-Child = daughter, son, stepdaughter, stepson
-Some children travelled only with a nanny, therefore parch=0 for them."""
-def get_images_in_directory(directory):
-    image_extensions = {'.png', '.jpg', '.jpeg', '.gif', '.bmp', '.tiff'}
-    image_files = []
-    for root, dirs, files in os.walk(directory):
-        for file in files:
-            if os.path.splitext(file)[1].lower() in image_extensions:
-                image_files.append(os.path.join(root, file))
-    return image_files
-def interact_with_agent(file_input, additional_notes):
-    shutil.rmtree("./figures")
-    os.makedirs("./figures")
-    data_file = pd.read_csv(file_input)
-    data_structure_notes = f"""- Description (output of .describe()):
-    {data_file.describe()}
-    - Columns with dtypes:
-    {data_file.dtypes}"""
-    prompt = base_prompt.format(structure_notes=data_structure_notes)
-    if additional_notes and len(additional_notes) > 0:
-        prompt += additional_notes
-    messages = [gr.ChatMessage(role="user", content=additional_notes)]
-    yield messages + [
-        gr.ChatMessage(role="assistant", content="⏳ _Starting task..._")
-    ]
-    plot_image_paths = {}
-    for msg in stream_to_gradio(agent, prompt, data_file=data_file):
-        messages.append(msg)
-        for image_path in get_images_in_directory("./figures"):
-            if image_path not in plot_image_paths:
-                image_message = gr.ChatMessage(
-                    role="assistant",
-                    content=FileData(path=image_path, mime_type="image/png"),
-                )
-                plot_image_paths[image_path] = True
-                messages.append(image_message)
-        yield messages + [
-            gr.ChatMessage(role="assistant", content="⏳ _Still processing..._")
-        ]
-    yield messages
-with gr.Blocks(
-    theme=gr.themes.Soft(
-        primary_hue=gr.themes.colors.blue,
-        secondary_hue=gr.themes.colors.yellow,
-    )
-) as demo:
-    gr.Markdown("""# Llama-3.1 Data analyst 📊🤔
-Drop a `.csv` file below and ask a question about your data.
-**Llama-3.1-70B will analyze and answer.**""")
-    file_input = gr.File(label="Your file to analyze")
-    text_input = gr.Textbox(
-        label="Ask a question about your data?"
-    )
-    submit = gr.Button("Run", variant="primary")
-    chatbot = gr.Chatbot(
-        label="Data Analyst Agent",
-        type="messages",
-        avatar_images=(
-            None,
-            "https://em-content.zobj.net/source/twitter/53/robot-face_1f916.png",
-        ),
-    )
-    # gr.Examples(
-    #     examples=[["./example/titanic.csv", example_notes]],
-    #     inputs=[file_input, text_input],
-    #     cache_examples=False
-    # )
-    submit.click(interact_with_agent, [file_input, text_input], [chatbot])
-if __name__ == "__main__":
-    demo.launch(server_port=7860)

test_streaming.py DELETED

@@ -1,64 +0,0 @@
-from transformers.agents.agent_types import AgentAudio, AgentImage, AgentText, AgentType
-from transformers.agents import ReactAgent
-def pull_message(step_log: dict):
-    try:
-        from gradio import ChatMessage
-    except ImportError:
-        raise ImportError("Gradio should be installed in order to launch a gradio demo.")
-    if step_log.get("rationale"):
-        yield ChatMessage(role="assistant", content=step_log["rationale"])
-    if step_log.get("tool_call"):
-        used_code = step_log["tool_call"]["tool_name"] == "code interpreter"
-        content = step_log["tool_call"]["tool_arguments"]
-        if used_code:
-            content = f"```py\n{content}\n```"
-        yield ChatMessage(
-            role="assistant",
-            metadata={"title": f"🛠️ Used tool {step_log['tool_call']['tool_name']}"},
-            content=content,
-        )
-    if step_log.get("observation"):
-        yield ChatMessage(role="assistant", content=f"```\n{step_log['observation']}\n```")
-    if step_log.get("error"):
-        yield ChatMessage(
-            role="assistant",
-            content=str(step_log["error"]),
-            metadata={"title": "💥 Error"},
-        )
-def stream_to_gradio(agent: ReactAgent, task: str, **kwargs):
-    """Runs an agent with the given task and streams the messages from the agent as gradio ChatMessages."""
-    try:
-        from gradio import ChatMessage
-    except ImportError:
-        raise ImportError("Gradio should be installed in order to launch a gradio demo.")
-    class Output:
-        output: AgentType | str = None
-    for step_log in agent.run(task, stream=True, **kwargs):
-        if isinstance(step_log, dict):
-            for message in pull_message(step_log):
-                print("message", message)
-                yield message
-    Output.output = step_log
-    if isinstance(Output.output, AgentText):
-        yield ChatMessage(role="assistant", content=f"**Final answer:**\n```\n{Output.output.to_string()}\n```")
-    elif isinstance(Output.output, AgentImage):
-        yield ChatMessage(
-            role="assistant",
-            content={"path": Output.output.to_string(), "mime_type": "image/png"},
-        )
-    elif isinstance(Output.output, AgentAudio):
-        yield ChatMessage(
-            role="assistant",
-            content={"path": Output.output.to_string(), "mime_type": "audio/wav"},
-        )
-    else:
-        yield ChatMessage(role="assistant", content=Output.output)
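
The deleted `pull_message` helper turns one agent step log into a short sequence of chat messages (rationale, tool call, observation, error). A Gradio-free sketch of that mapping, assuming a hypothetical `Message` dataclass in place of `gradio.ChatMessage` and plain-text titles:

```python
from dataclasses import dataclass, field

FENCE = "`" * 3  # Markdown code fence, built here to keep this example readable.


@dataclass
class Message:
    """Hypothetical stand-in for gradio.ChatMessage so the sketch runs without Gradio."""
    role: str
    content: str
    metadata: dict = field(default_factory=dict)


def pull_messages(step_log: dict):
    """Yield one message per populated field of a step log, in display order."""
    if step_log.get("rationale"):
        yield Message("assistant", step_log["rationale"])
    if step_log.get("tool_call"):
        name = step_log["tool_call"]["tool_name"]
        content = str(step_log["tool_call"]["tool_arguments"])
        if name == "code interpreter":
            # Code is wrapped in a Python fence so the chat UI highlights it.
            content = f"{FENCE}py\n{content}\n{FENCE}"
        yield Message("assistant", content, {"title": f"Used tool {name}"})
    if step_log.get("observation"):
        yield Message("assistant", f"{FENCE}\n{step_log['observation']}\n{FENCE}")
    if step_log.get("error"):
        yield Message("assistant", str(step_log["error"]), {"title": "Error"})


# Example step log with the keys the deleted helper reads.
step = {
    "rationale": "Check the columns.",
    "tool_call": {
        "tool_name": "code interpreter",
        "tool_arguments": "print(data_file.columns)",
    },
    "observation": "Index(['Survived'])",
}
messages = list(pull_messages(step))
for message in messages:
    print(message)
```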