from components.sidebar import ssf_sidebar
from constants import DEFAULT_TOOLS
import streamlit as st
import asyncio
import nest_asyncio
from services.agent import (
configure_agent,
display_evaluation_results,
display_output,
evaluate_agent,
run_agent,
)
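
# Patch asyncio so nested event loops are allowed; Streamlit may already be
# running a loop when this script executes.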
nest_asyncio.apply()
# Set page config
st.set_page_config(page_title="Surf Spot Finder", page_icon="🏄", layout="wide")
# Allow a user to resize the sidebar to take up most of the screen to make editing eval cases easier
st.markdown(
    """
<style>
/* When sidebar is expanded, adjust main content */
section[data-testid="stSidebar"][aria-expanded="true"] {
    max-width: 99% !important;
}
</style>
""",
    unsafe_allow_html=True,
)
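
# Sidebar: collect the user's search configuration; the Run button stays
# disabled until the inputs are valid.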
with st.sidebar:
    user_inputs = ssf_sidebar()
    is_valid = user_inputs is not None
    run_button = st.button("Run Agent 🤖", disabled=not is_valid, type="primary")

# Main content
async def main():
    # Handle agent execution button click
    if run_button:
        agent, agent_config = await configure_agent(user_inputs)
        agent_trace, execution_time = await run_agent(agent, agent_config)
        await display_output(agent_trace, execution_time)
        evaluation_result = await evaluate_agent(agent_config, agent_trace)
        await display_evaluation_results(evaluation_result)
    else:
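        # No agent run requested yet: render the landing page that documents the
        # available tools and the custom evaluation options.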
        st.title("🏄 Surf Spot Finder")
        st.markdown(
            "Find the best surfing spots based on your location and preferences! [Github Repo](https://github.com/mozilla-ai/surf-spot-finder)"
        )
        st.info(
            "👋 Configure your search parameters in the sidebar and click Run to start!"
        )

        # Display tools in a more organized way
        st.markdown("### 🛠️ Available Tools")
        st.markdown("""
The AI agent built for this project has a few tools available to help it find the perfect surf spot.
The agent is free to use (or not use) these tools to accomplish the task.
""")
        weather_tools = [
            tool
            for tool in DEFAULT_TOOLS
            if "forecast" in tool.__name__ or "weather" in tool.__name__
        ]
        for tool in weather_tools:
            with st.expander(f"🌤️ {tool.__name__}"):
                st.markdown(tool.__doc__ or "No description available")
        location_tools = [
            tool
            for tool in DEFAULT_TOOLS
            if "lat" in tool.__name__
            or "lon" in tool.__name__
            or "area" in tool.__name__
        ]
        for tool in location_tools:
            with st.expander(f"📍 {tool.__name__}"):
                st.markdown(tool.__doc__ or "No description available")
        web_tools = [
            tool
            for tool in DEFAULT_TOOLS
            if "web" in tool.__name__ or "search" in tool.__name__
        ]
        for tool in web_tools:
            with st.expander(f"🌐 {tool.__name__}"):
                st.markdown(tool.__doc__ or "No description available")
        # Add a check that all tools were listed above
        if len(weather_tools) + len(location_tools) + len(web_tools) != len(
            DEFAULT_TOOLS
        ):
            st.warning(
                "Some tools are not listed. Please check the code for more details."
            )
        # Add Custom Evaluation explanation section
        st.markdown("### 📊 Custom Evaluation")
        st.markdown("""
The Surf Spot Finder includes a powerful evaluation system that allows you to customize how the agent's performance is assessed.
You can find these settings in the sidebar under the "Custom Evaluation" expander.
""")
        with st.expander("Learn more about Custom Evaluation"):
            st.markdown("""
#### What is Custom Evaluation?

The Custom Evaluation feature uses an LLM-as-a-Judge approach to evaluate how well the agent performs its task.
An LLM will be given the complete agent trace (not just the final answer), and will assess the agent's performance based on the criteria you set.

You can customize:

- **Evaluation Model**: Choose which LLM should act as the judge
- **Evaluation Criteria**: Define specific checkpoints that the agent should meet
- **Scoring System**: Assign points to each criterion

#### How to Use Custom Evaluation

1. **Select an Evaluation Model**: Choose which LLM you want to use as the judge
2. **Edit Checkpoints**: Use the data editor to:
    - Add new evaluation criteria
    - Modify existing criteria
    - Adjust point values
    - Remove criteria you don't want to evaluate

#### Example Criteria

You can evaluate things like:

- Tool usage and success
- Order of operations
- Quality of final recommendations
- Response completeness
- Number of steps taken

#### Tips for Creating Good Evaluation Criteria

- Be specific about what you want to evaluate
- Use clear, unambiguous language
- Consider both process (how the agent works) and outcome (what it produces)
- Assign appropriate point values based on importance

The evaluation results will be displayed after each agent run, showing how well the agent met your custom criteria.
""")
if __name__ == "__main__":
    loop = asyncio.new_event_loop()
    loop.run_until_complete(main())