Spaces:
Build error
Build error
File size: 20,188 Bytes
3382f47 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 |
# AutoGPT Forge: Crafting Intelligent Agent Logic

**By Craig Swift & [Ryan Brandt](https://github.com/paperMoose)**
Hey there! Ready for part 3 of our AutoGPT Forge tutorial series? If you missed the earlier parts, catch up here:
- [Getting Started](001_getting_started.md)
- [Blueprint of an Agent](002_blueprint_of_an_agent.md)
Now, let's get hands-on! We'll use an LLM to power our agent and complete a task. The challenge? Making the agent write "Washington" to a .txt file. We won't give it step-by-step instructions—just the task. Let's see our agent in action and watch it figure out the steps on its own!
## Get Your Smart Agent Project Ready
Make sure you've set up your project and created an agent as described in our initial guide. If you skipped that part, [click here](#) to get started. Once you're done, come back, and we'll move forward.
In the image below, you'll see my "SmartAgent" and the agent.py file inside the 'forge' folder. That's where we'll be adding our LLM-based logic. If you're unsure about the project structure or agent functions from our last guide, don't worry. We'll cover the basics as we go!

---
## The Task Lifecycle
The lifecycle of a task, from its creation to execution, is outlined in the agent protocol. In simple terms: a task is initiated, its steps are systematically executed, and it concludes once completed.
Want your agent to perform an action? Start by dispatching a create_task request. This crucial step involves specifying the task details, much like how you'd send a prompt to ChatGPT, using the input field. If you're giving this a shot on your own, the UI is your best friend; it effortlessly handles all the API calls on your behalf.
When the agent gets this, it runs the create_task function. The code `super().create_task(task_request)` takes care of protocol steps. It then logs the task's start. For this guide, you don't need to change this function.
```python
async def create_task(self, task_request: TaskRequestBody) -> Task:
"""
The agent protocol, which is the core of the Forge, works by creating a task and then
executing steps for that task. This method is called when the agent is asked to create
a task.
We are hooking into function to add a custom log message. Though you can do anything you
want here.
"""
task = await super().create_task(task_request)
LOG.info(
f"📦 Task created: {task.task_id} input: {task.input[:40]}{'...' if len(task.input) > 40 else ''}"
)
return task
```
After starting a task, the `execute_step` function runs until all steps are done. Here's a basic view of `execute_step`. I've left out the detailed comments for simplicity, but you'll find them in your project.
```python
async def execute_step(self, task_id: str, step_request: StepRequestBody) -> Step:
# An example that
step = await self.db.create_step(
task_id=task_id, input=step_request, is_last=True
)
self.workspace.write(task_id=task_id, path="output.txt", data=b"Washington D.C")
await self.db.create_artifact(
task_id=task_id,
step_id=step.step_id,
file_name="output.txt",
relative_path="",
agent_created=True,
)
step.output = "Washington D.C"
LOG.info(f"\t✅ Final Step completed: {step.step_id}")
return step
```
Here's the breakdown of the 'write file' process in four steps:
1. **Database Step Creation**: The first stage is all about creating a step within the database, an essential aspect of the agent protocol. You'll observe that while setting up this step, we've flagged it with `is_last=True`. This signals to the agent protocol that no more steps are pending. For the purpose of this guide, let's work under the assumption that our agent will only tackle single-step tasks. However, hang tight for future tutorials, where we'll level up and let the agent determine its completion point.
2. **File Writing**: Next, we pen down "Washington D.C." using the workspace.write function.
3. **Artifact Database Update**: After writing, we record the file in the agent's artifact database.
4. **Step Output & Logging**: Finally, we set the step output to match the file content, log the executed step, and use the step object.
With the 'write file' process clear, let's make our agent smarter and more autonomous. Ready to dive in?
---
## Building the Foundations For Our Smart Agent
First, we need to update the `execute_step()` function. Instead of a fixed solution, it should use the given request.
To do this, we'll fetch the task details using the provided `task_id`:
```python
task = await self.db.get_task(task_id)
```
Next, remember to create a database record and mark it as a single-step task with `is_last=True`:
```python
step = await self.db.create_step(
task_id=task_id, input=step_request, is_last=True
)
```
Your updated `execute_step` function will look like this:
```python
async def execute_step(self, task_id: str, step_request: StepRequestBody) -> Step:
# Get the task details
task = await self.db.get_task(task_id)
# Add a new step to the database
step = await self.db.create_step(
task_id=task_id, input=step_request, is_last=True
)
return step
```
Now that we've set this up, let's move to the next exciting part: The PromptEngine.
---
**The Art of Prompting**

Prompting is like shaping messages for powerful language models like ChatGPT. Since these models respond to input details, creating the right prompt can be a challenge. That's where the **PromptEngine** comes in.
The "PromptEngine" helps you store prompts in text files, specifically in Jinja2 templates. This means you can change the prompts without changing the code. It also lets you adjust prompts for different LLMs. Here's how to use it:
First, add the PromptEngine from the SDK:
```python
from .sdk import PromptEngine
```
In your `execute_step` function, set up the engine for the `gpt-3.5-turbo` LLM:
```python
prompt_engine = PromptEngine("gpt-3.5-turbo")
```
Loading a prompt is straightforward. For instance, loading the `system-format` prompt, which dictates the response format from the LLM, is as easy as:
```python
system_prompt = prompt_engine.load_prompt("system-format")
```
For intricate use cases, like the `task-step` prompt which requires parameters, employ the following method:
```python
# Define the task parameters
task_kwargs = {
"task": task.input,
"abilities": self.abilities.list_abilities_for_prompt(),
}
# Load the task prompt with those parameters
task_prompt = prompt_engine.load_prompt("task-step", **task_kwargs)
```
Delving deeper, let's look at the `task-step` prompt template in `prompts/gpt-3.5-turbo/task-step.j2`:
```jinja
{% extends "techniques/expert.j2" %}
{% block expert %}Planner{% endblock %}
{% block prompt %}
Your task is:
{{ task }}
Ensure to respond in the given format. Always make autonomous decisions, devoid of user guidance. Harness the power of your LLM, opting for straightforward tactics sans any legal entanglements.
{% if constraints %}
## Constraints
Operate under these confines:
{% for constraint in constraints %}
- {{ constraint }}
{% endfor %}
{% endif %}
{% if resources %}
## Resources
Utilize these resources:
{% for resource in resources %}
- {{ resource }}
{% endfor %}
{% endif %}
{% if abilities %}
## Abilities
Summon these abilities:
{% for ability in abilities %}
- {{ ability }}
{% endfor %}
{% endif %}
{% if abilities %}
## Abilities
Use these abilities:
{% for ability in abilities %}
- {{ ability }}
{% endfor %}
{% endif %}
{% if best_practices %}
## Best Practices
{% for best_practice in best_practices %}
- {{ best_practice }}
{% endfor %}
{% endif %}
{% endblock %}
```
This template is modular. It uses the `extends` directive to build on the `expert.j2` template. The different sections like constraints, resources, abilities, and best practices make the prompt dynamic. It guides the LLM in understanding the task and using resources and abilities.
The PromptEngine equips us with a potent tool to converse seamlessly with large language models. By externalizing prompts and using templates, we can ensure that our agent remains agile, adapting to new challenges without a code overhaul. As we march forward, keep this foundation in mind—it's the bedrock of our agent's intelligence.
---
## Engaging with your LLM
To make the most of the LLM, you'll send a series of organized instructions, not just one prompt. Structure your prompts as a list of messages for the LLM. Using the `system_prompt` and `task_prompt` from before, create the `messages` list:
```python
messages = [
{"role": "system", "content": system_prompt},
{"role": "user", "content": task_prompt}
]
```
With the prompt set, send it to the LLM. This step involves foundational code, focusing on the `chat_completion_request`. This function gives the LLM your prompt, and then gets the LLM's output. The other code sets up our request and interprets the feedback:
```python
try:
# Set the parameters for the chat completion
chat_completion_kwargs = {
"messages": messages,
"model": "gpt-3.5-turbo",
}
# Get the LLM's response and interpret it
chat_response = await chat_completion_request(**chat_completion_kwargs)
answer = json.loads(chat_response.choices[0].message.content)
# Log the answer for reference
LOG.info(pprint.pformat(answer))
except json.JSONDecodeError as e:
# Handle JSON decoding errors
LOG.error(f"Can't decode chat response: {chat_response}")
except Exception as e:
# Handle other errors
LOG.error(f"Can't get chat response: {e}")
```
Extracting clear messages from LLM outputs can be complex. Our method is simple and works with GPT-3.5 and GPT-4. Future guides will show more ways to interpret LLM outputs. The goal? To go beyond JSON, as some LLMs work best with other response types. Stay tuned!
---
## Using and Creating Abilities
Abilities are the gears and levers that enable the agent to interact with tasks at hand. Let's unpack the mechanisms behind these abilities and how you can harness, and even extend, them.
In the Forge folder, there's a `actions` folder containing `registry.py`, `finish.py`, and a `file_system` subfolder. You can also add your own abilities here. `registry.py` is the main file for abilities. It contains the `@action` decorator and the `ActionRegister` class. This class actively tracks abilities and outlines their function. The base Agent class includes a default Action register available via `self.abilities`. It looks like this:
```python
self.abilities = ActionRegister(self)
```
The `ActionRegister` has two key methods. `list_abilities_for_prompt` prepares abilities for prompts. `run_action` makes the ability work. An ability is a function with the `@action` decorator. It must have specific parameters, including the agent and `task_id`.
```python
@action(
name="write_file",
description="Write data to a file",
parameters=[
{
"name": "file_path",
"description": "Path to the file",
"type": "string",
"required": True,
},
{
"name": "data",
"description": "Data to write to the file",
"type": "bytes",
"required": True,
},
],
output_type="None",
)
async def write_file(agent, task_id: str, file_path: str, data: bytes) -> None:
pass
```
The `@action` decorator defines the ability's details, like its identity (name), functionality (description), and operational parameters.
## Example of a Custom Ability: Webpage Fetcher
```python
import requests
@action(
name="fetch_webpage",
description="Retrieve the content of a webpage",
parameters=[
{
"name": "url",
"description": "Webpage URL",
"type": "string",
"required": True,
}
],
output_type="string",
)
async def fetch_webpage(agent, task_id: str, url: str) -> str:
response = requests.get(url)
return response.text
```
This ability, `fetch_webpage`, accepts a URL as input and returns the HTML content of the webpage as a string. Custom abilities let you add more features to your agent. They can integrate other tools and libraries to enhance its functions. To make a custom ability, you need to understand the structure and add technical details. With abilities like "fetch_webpage", your agent can handle complex tasks efficiently.
## Running an Ability
Now that you understand abilities and how to create them, let's use them. The last piece is the `execute_step` function. Our goal is to understand the agent's response, find the ability, and use it.
First, we get the ability details from the agent's answer:
```python
# Extract the ability from the answer
ability = answer["ability"]
```
With the ability details, we use it. We call the `run_ability` function:
```python
# Run the ability and get the output
# We don't actually use the output in this example
output = await self.abilities.run_action(
task_id, ability["name"], **ability["args"]
)
```
Here, we’re invoking the specified ability. The task_id ensures continuity, ability['name'] pinpoints the exact function, and the arguments (ability["args"]) provide necessary context.
Finally, we make the step's output show the agent's thinking:
```python
# Set the step output to the "speak" part of the answer
step.output = answer["thoughts"]["speak"]
# Return the completed step
return step
```
And there you have it! Your first Smart Agent, sculpted with precision and purpose, stands ready to take on challenges. The stage is set. It’s showtime!
Here is what your function should look like:
```python
async def execute_step(self, task_id: str, step_request: StepRequestBody) -> Step:
# Firstly we get the task this step is for so we can access the task input
task = await self.db.get_task(task_id)
# Create a new step in the database
step = await self.db.create_step(
task_id=task_id, input=step_request, is_last=True
)
# Log the message
LOG.info(f"\t✅ Final Step completed: {step.step_id} input: {step.input[:19]}")
# Initialize the PromptEngine with the "gpt-3.5-turbo" model
prompt_engine = PromptEngine("gpt-3.5-turbo")
# Load the system and task prompts
system_prompt = prompt_engine.load_prompt("system-format")
# Initialize the messages list with the system prompt
messages = [
{"role": "system", "content": system_prompt},
]
# Define the task parameters
task_kwargs = {
"task": task.input,
"abilities": self.abilities.list_abilities_for_prompt(),
}
# Load the task prompt with the defined task parameters
task_prompt = prompt_engine.load_prompt("task-step", **task_kwargs)
# Append the task prompt to the messages list
messages.append({"role": "user", "content": task_prompt})
try:
# Define the parameters for the chat completion request
chat_completion_kwargs = {
"messages": messages,
"model": "gpt-3.5-turbo",
}
# Make the chat completion request and parse the response
chat_response = await chat_completion_request(**chat_completion_kwargs)
answer = json.loads(chat_response.choices[0].message.content)
# Log the answer for debugging purposes
LOG.info(pprint.pformat(answer))
except json.JSONDecodeError as e:
# Handle JSON decoding errors
LOG.error(f"Unable to decode chat response: {chat_response}")
except Exception as e:
# Handle other exceptions
LOG.error(f"Unable to generate chat response: {e}")
# Extract the ability from the answer
ability = answer["ability"]
# Run the ability and get the output
# We don't actually use the output in this example
output = await self.abilities.run_action(
task_id, ability["name"], **ability["args"]
)
# Set the step output to the "speak" part of the answer
step.output = answer["thoughts"]["speak"]
# Return the completed step
return step
```
## Interacting with your Agent
> ⚠️ Heads up: The UI and benchmark are still in the oven, so they might be a tad glitchy.
With the heavy lifting of crafting our Smart Agent behind us, it’s high time to see it in action. Kick things off by firing up the agent with this command:
```bash
./run agent start SmartAgent.
```
Once your digital playground is all set, your terminal should light up with:
```bash
d8888 888 .d8888b. 8888888b. 88888888888
d88888 888 d88P Y88b 888 Y88b 888
d88P888 888 888 888 888 888 888
d88P 888 888 888 888888 .d88b. 888 888 d88P 888
d88P 888 888 888 888 d88""88b 888 88888 8888888P" 888
d88P 888 888 888 888 888 888 888 888 888 888
d8888888888 Y88b 888 Y88b. Y88..88P Y88b d88P 888 888
d88P 888 "Y88888 "Y888 "Y88P" "Y8888P88 888 888
8888888888
888
888
8888888 .d88b. 888d888 .d88b. .d88b.
888 d88""88b 888P" d88P"88b d8P Y8b
888 888 888 888 888 888 88888888
888 Y88..88P 888 Y88b 888 Y8b.
888 "Y88P" 888 "Y88888 "Y8888
888
Y8b d88P
"Y88P" v0.2.0
[2023-09-27 15:39:07,832] [forge.sdk.agent] [INFO] 📝 Agent server starting on http://localhost:8000
```
1. **Get Started**
- Click the link to access the AutoGPT Agent UI.
2. **Login**
- Log in using your Gmail or Github credentials.
3. **Navigate to Benchmarking**
- Look to the left, and you'll spot a trophy icon. Click it to enter the benchmarking arena.

4. **Select the 'WriteFile' Test**
- Choose the 'WriteFile' test from the available options.
5. **Initiate the Test Suite**
- Hit 'Initiate test suite' to start the benchmarking process.
6. **Monitor in Real-Time**
- Keep your eyes on the right panel as it displays real-time output.
7. **Check the Console**
- For additional information, you can also monitor your console for progress updates and messages.
```bash
📝 📦 Task created: 70518b75-0104-49b0-923e-f607719d042b input: Write the word 'Washington' to a .txt fi...
📝 ✅ Final Step completed: a736c45f-65a5-4c44-a697-f1d6dcd94d5c input: y
```
If you see this, you've done it!
8. **Troubleshooting**
- If you encounter any issues or see cryptic error messages, don't worry. Just hit the retry button. Remember, LLMs are powerful but may occasionally need some guidance.
## Wrap Up
- Stay tuned for our next tutorial, where we'll enhance the agent's capabilities by adding memory!
## Keep Exploring
- Keep experimenting and pushing the boundaries of AI. Happy coding! 🚀
## Wrap Up
In our next tutorial, we’ll further refine this process, enhancing the agent’s capabilities, through the addition of memory!
Until then, keep experimenting and pushing the boundaries of AI. Happy coding! 🚀
|