docs/source/en/examples/rag.md
ADDED
@@ -0,0 +1,206 @@

# Agentic RAG

[[open-in-colab]]

## Introduction to Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation (RAG) combines the power of large language models with external knowledge retrieval to produce more accurate, factual, and contextually relevant responses. At its core, RAG is about "using an LLM to answer a user query, but basing the answer on information retrieved from a knowledge base."

### Why Use RAG?

RAG offers several significant advantages over using vanilla or fine-tuned LLMs:

1. **Factual Grounding**: Reduces hallucinations by anchoring responses in retrieved facts
2. **Domain Specialization**: Provides domain-specific knowledge without model retraining
3. **Knowledge Recency**: Allows access to information beyond the model's training cutoff
4. **Transparency**: Enables citation of sources for generated content
5. **Control**: Offers fine-grained control over what information the model can access

### Limitations of Traditional RAG

Despite its benefits, traditional RAG approaches face several challenges:

- **Single Retrieval Step**: If the initial retrieval results are poor, the final generation will suffer
- **Query-Document Mismatch**: User queries (often questions) may not match well with documents containing answers (often statements)
- **Limited Reasoning**: Simple RAG pipelines don't allow for multi-step reasoning or query refinement
- **Context Window Constraints**: Retrieved documents must fit within the model's context window

## Agentic RAG: A More Powerful Approach

We can overcome these limitations by implementing an **Agentic RAG** system - essentially an agent equipped with retrieval capabilities. This approach transforms RAG from a rigid pipeline into an interactive, reasoning-driven process.

### Key Benefits of Agentic RAG

An agent with retrieval tools can:

1. ✅ **Formulate optimized queries**: The agent can transform user questions into retrieval-friendly queries
2. ✅ **Perform multiple retrievals**: The agent can retrieve information iteratively as needed
3. ✅ **Reason over retrieved content**: The agent can analyze, synthesize, and draw conclusions from multiple sources
4. ✅ **Self-critique and refine**: The agent can evaluate retrieval results and adjust its approach

This approach naturally implements advanced RAG techniques (illustrated in the sketch after this list):
- **Hypothetical Document Embedding (HyDE)**: Instead of using the user query directly, the agent formulates retrieval-optimized queries ([paper reference](https://huggingface.co/papers/2212.10496))
- **Self-Query Refinement**: The agent can analyze initial results and perform follow-up retrievals with refined queries ([technique reference](https://docs.llamaindex.ai/en/stable/examples/evaluation/RetryQuery/))
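
To make these two ideas concrete, here is a minimal, purely illustrative sketch of the kind of query rewriting an agent performs; the strings are invented for this example and no library calls are involved:

```python
user_question = "Why is my model's memory usage so high during fine-tuning?"

# A plain RAG pipeline retrieves with the raw question.
# An agent can instead search with an affirmative, document-like query,
# phrased the way an answer passage would be written:
retrieval_query = "Techniques for reducing memory usage when fine-tuning transformer models"

# If the first results look off-topic, the agent can refine and retry:
refined_query = "Gradient checkpointing and mixed precision to reduce fine-tuning memory"
```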

## Building an Agentic RAG System

Let's build a complete Agentic RAG system step by step. We'll create an agent that can answer questions about the Hugging Face Transformers library by retrieving information from its documentation.

You can follow along with the code snippets below, or check out the full example in the smolagents GitHub repository: [examples/rag.py](https://github.com/huggingface/smolagents/blob/main/examples/rag.py).

### Step 1: Install Required Dependencies

First, we need to install the necessary packages:

```bash
pip install smolagents pandas langchain langchain-community sentence-transformers datasets python-dotenv rank_bm25 --upgrade
```

If you plan to use Hugging Face's Inference API, you'll need to set up your API token:

```python
# Load environment variables (including HF_TOKEN)
from dotenv import load_dotenv
load_dotenv()
```

### Step 2: Prepare the Knowledge Base

We'll use a dataset containing Hugging Face documentation and prepare it for retrieval:

```python
import datasets
from langchain.docstore.document import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.retrievers import BM25Retriever

# Load the Hugging Face documentation dataset
knowledge_base = datasets.load_dataset("m-ric/huggingface_doc", split="train")

# Filter to include only Transformers documentation
knowledge_base = knowledge_base.filter(lambda row: row["source"].startswith("huggingface/transformers"))

# Convert dataset entries to Document objects with metadata
source_docs = [
    Document(page_content=doc["text"], metadata={"source": doc["source"].split("/")[1]})
    for doc in knowledge_base
]

# Split documents into smaller chunks for better retrieval
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,  # Characters per chunk
    chunk_overlap=50,  # Overlap between chunks to maintain context
    add_start_index=True,
    strip_whitespace=True,
    separators=["\n\n", "\n", ".", " ", ""],  # Priority order for splitting
)
docs_processed = text_splitter.split_documents(source_docs)

print(f"Knowledge base prepared with {len(docs_processed)} document chunks")
```
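
Before moving on, it can help to sanity-check the split. This quick peek uses the standard langchain `Document` attributes (`metadata` and `page_content`); the exact printed values will depend on the dataset:

```python
# Inspect one chunk to verify the text and source metadata survived the split
print(docs_processed[0].metadata)  # e.g. {'source': 'transformers', 'start_index': 0}
print(docs_processed[0].page_content[:200])
```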

### Step 3: Create a Retriever Tool

Now we'll create a custom tool that our agent can use to retrieve information from the knowledge base:

```python
from smolagents import Tool

class RetrieverTool(Tool):
    name = "retriever"
    description = "Uses semantic search to retrieve the parts of transformers documentation that could be most relevant to answer your query."
    inputs = {
        "query": {
            "type": "string",
            "description": "The query to perform. This should be semantically close to your target documents. Use the affirmative form rather than a question.",
        }
    }
    output_type = "string"

    def __init__(self, docs, **kwargs):
        super().__init__(**kwargs)
        # Initialize the retriever with our processed documents
        self.retriever = BM25Retriever.from_documents(
            docs, k=10  # Return top 10 most relevant documents
        )

    def forward(self, query: str) -> str:
        """Execute the retrieval based on the provided query."""
        assert isinstance(query, str), "Your search query must be a string"

        # Retrieve relevant documents
        docs = self.retriever.invoke(query)

        # Format the retrieved documents for readability
        return "\nRetrieved documents:\n" + "".join(
            [
                f"\n\n===== Document {str(i)} =====\n" + doc.page_content
                for i, doc in enumerate(docs)
            ]
        )

# Initialize our retriever tool with the processed documents
retriever_tool = RetrieverTool(docs_processed)
```

> [!TIP]
> We're using BM25, a lexical retrieval method, for simplicity and speed. For production systems, you might want to use semantic search with embeddings for better retrieval quality. Check the [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) for high-quality embedding models.
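
As a concrete starting point for that upgrade, here is a minimal sketch of an embedding-based retriever that could replace `BM25Retriever` in the tool above. It assumes the `langchain-community` FAISS integration (you would also need `pip install faiss-cpu`), and the embedding model shown is just one reasonable choice:

```python
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# Embed all chunks once and index them with FAISS
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vector_store = FAISS.from_documents(docs_processed, embeddings)

# Drop-in replacement for self.retriever in RetrieverTool.__init__;
# it exposes the same .invoke(query) interface as BM25Retriever
semantic_retriever = vector_store.as_retriever(search_kwargs={"k": 10})
```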

### Step 4: Create an Advanced Retrieval Agent

Now we'll create an agent that can use our retriever tool to answer questions:

```python
from smolagents import InferenceClientModel, CodeAgent

# Initialize the agent with our retriever tool
agent = CodeAgent(
    tools=[retriever_tool],  # List of tools available to the agent
    model=InferenceClientModel(),  # Default model "Qwen/Qwen2.5-Coder-32B-Instruct"
    max_steps=4,  # Limit the number of reasoning steps
    verbosity_level=2,  # Show detailed agent reasoning
)

# To use a specific model, you can specify it like this:
# model=InferenceClientModel(model_id="meta-llama/Llama-3.3-70B-Instruct")
```

> [!TIP]
> Inference Providers give access to hundreds of models, powered by serverless inference partners. A list of supported providers can be found [here](https://huggingface.co/docs/inference-providers/index).
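
For instance, `InferenceClientModel` accepts a `provider` argument to route the call through a specific partner; the provider name below is just an example, so pick one from the supported-providers list:

```python
# Route inference through a specific partner instead of the default
model = InferenceClientModel(
    model_id="Qwen/Qwen2.5-Coder-32B-Instruct",
    provider="together",  # example provider; see the supported-providers list above
)
```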

### Step 5: Run the Agent to Answer Questions

Let's use our agent to answer a question about Transformers:

```python
# Ask a question that requires retrieving information
question = "For a transformers model training, which is slower, the forward or the backward pass?"

# Run the agent to get an answer
agent_output = agent.run(question)

# Display the final answer
print("\nFinal answer:")
print(agent_output)
```

## Practical Applications of Agentic RAG

Agentic RAG systems can be applied to various use cases:

1. **Technical Documentation Assistance**: Help users navigate complex technical documentation
2. **Research Paper Analysis**: Extract and synthesize information from scientific papers
3. **Legal Document Review**: Find relevant precedents and clauses in legal documents
4. **Customer Support**: Answer questions based on product documentation and knowledge bases
5. **Educational Tutoring**: Provide explanations based on textbooks and learning materials

## Conclusion

Agentic RAG represents a significant advancement over traditional RAG pipelines. By combining the reasoning capabilities of LLM agents with the factual grounding of retrieval systems, we can build more powerful, flexible, and accurate information systems.

The approach we've demonstrated:
- Overcomes the limitations of single-step retrieval
- Enables more natural interactions with knowledge bases
- Provides a framework for continuous improvement through self-critique and query refinement

As you build your own Agentic RAG systems, consider experimenting with different retrieval methods, agent architectures, and knowledge sources to find the optimal configuration for your specific use case.
docs/source/en/examples/text_to_sql.md
ADDED
@@ -0,0 +1,197 @@

# Text-to-SQL

[[open-in-colab]]

In this tutorial, we’ll see how to implement an agent that leverages SQL using `smolagents`.

> Let's start with the golden question: why not keep it simple and use a standard text-to-SQL pipeline?

A standard text-to-SQL pipeline is brittle, since the generated SQL query can be incorrect. Even worse, the query could be wrong yet raise no error, instead silently returning incorrect or useless output.

👉 Instead, an agent system is able to critically inspect outputs and decide if the query needs to be changed or not, thus giving it a huge performance boost.

Let’s build this agent! 💪

Run the line below to install the required dependencies:
```bash
pip install smolagents python-dotenv sqlalchemy --upgrade -q
```
To call Inference Providers, you will need a valid token in your environment variable `HF_TOKEN`.
We use python-dotenv to load it.
```py
from dotenv import load_dotenv
load_dotenv()
```

Then, we set up the SQL environment:
```py
from sqlalchemy import (
    create_engine,
    MetaData,
    Table,
    Column,
    String,
    Integer,
    Float,
    insert,
    inspect,
    text,
)

engine = create_engine("sqlite:///:memory:")
metadata_obj = MetaData()

def insert_rows_into_table(rows, table, engine=engine):
    for row in rows:
        stmt = insert(table).values(**row)
        with engine.begin() as connection:
            connection.execute(stmt)

table_name = "receipts"
receipts = Table(
    table_name,
    metadata_obj,
    Column("receipt_id", Integer, primary_key=True),
    Column("customer_name", String(16), primary_key=True),
    Column("price", Float),
    Column("tip", Float),
)
metadata_obj.create_all(engine)

rows = [
    {"receipt_id": 1, "customer_name": "Alan Payne", "price": 12.06, "tip": 1.20},
    {"receipt_id": 2, "customer_name": "Alex Mason", "price": 23.86, "tip": 0.24},
    {"receipt_id": 3, "customer_name": "Woodrow Wilson", "price": 53.43, "tip": 5.43},
    {"receipt_id": 4, "customer_name": "Margaret James", "price": 21.11, "tip": 1.00},
]
insert_rows_into_table(rows, receipts)
```
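
If you want to confirm the rows landed before wiring up the agent, a quick raw query does it; this reuses the `engine` and the SQLAlchemy `text` helper imported above:

```py
# Sanity check: print everything currently in the receipts table
with engine.connect() as con:
    for row in con.execute(text("SELECT * FROM receipts")):
        print(row)
```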

### Build our agent

Now let’s make our SQL table retrievable by a tool.

The tool’s description attribute will be embedded in the LLM’s prompt by the agent system: it gives the LLM information about how to use the tool. This is where we want to describe the SQL table.

```py
inspector = inspect(engine)
columns_info = [(col["name"], col["type"]) for col in inspector.get_columns("receipts")]

table_description = "Columns:\n" + "\n".join([f"  - {name}: {col_type}" for name, col_type in columns_info])
print(table_description)
```

```text
Columns:
  - receipt_id: INTEGER
  - customer_name: VARCHAR(16)
  - price: FLOAT
  - tip: FLOAT
```

Now let’s build our tool. It needs the following (read [the tool doc](../tutorials/tools) for more detail):
- A docstring with an `Args:` part listing arguments.
- Type hints on both inputs and output.

```py
from smolagents import tool

@tool
def sql_engine(query: str) -> str:
    """
    Allows you to perform SQL queries on the table. Returns a string representation of the result.
    The table is named 'receipts'. Its description is as follows:
        Columns:
        - receipt_id: INTEGER
        - customer_name: VARCHAR(16)
        - price: FLOAT
        - tip: FLOAT

    Args:
        query: The query to perform. This should be correct SQL.
    """
    output = ""
    with engine.connect() as con:
        rows = con.execute(text(query))
        for row in rows:
            output += "\n" + str(row)
    return output
```
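
Tool objects created with `@tool` can be called like functions, so you can smoke-test ours directly before handing it to an agent; the query below is just an example:

```py
# Smoke test: call the tool directly, bypassing the agent
print(sql_engine(query="SELECT customer_name, price FROM receipts ORDER BY price DESC"))
```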

Now let us create an agent that leverages this tool.

We use the `CodeAgent`, which is smolagents’ main agent class: an agent that writes actions in code and can iterate on previous output according to the ReAct framework.

The model is the LLM that powers the agent system. `InferenceClientModel` allows you to call LLMs using HF’s Inference API, either via a serverless or dedicated endpoint, but you could also use any proprietary API.

```py
from smolagents import CodeAgent, InferenceClientModel

agent = CodeAgent(
    tools=[sql_engine],
    model=InferenceClientModel(model_id="meta-llama/Llama-3.1-8B-Instruct"),
)
agent.run("Can you give me the name of the client who got the most expensive receipt?")
```

### Level 2: Table joins

Now let’s make it more challenging! We want our agent to handle joins across multiple tables.

So let’s make a second table recording the names of waiters for each `receipt_id`!

```py
table_name = "waiters"
waiters = Table(
    table_name,
    metadata_obj,
    Column("receipt_id", Integer, primary_key=True),
    Column("waiter_name", String(16), primary_key=True),
)
metadata_obj.create_all(engine)

rows = [
    {"receipt_id": 1, "waiter_name": "Corey Johnson"},
    {"receipt_id": 2, "waiter_name": "Michael Watts"},
    {"receipt_id": 3, "waiter_name": "Michael Watts"},
    {"receipt_id": 4, "waiter_name": "Margaret James"},
]
insert_rows_into_table(rows, waiters)
```
Since we added a table, we update our `sql_engine` tool's description with both tables’ schemas to let the LLM properly leverage information from them.

```py
updated_description = """Allows you to perform SQL queries on the table. Beware that this tool's output is a string representation of the execution output.
It can use the following tables:"""

inspector = inspect(engine)
for table in ["receipts", "waiters"]:
    columns_info = [(col["name"], col["type"]) for col in inspector.get_columns(table)]

    table_description = f"Table '{table}':\n"

    table_description += "Columns:\n" + "\n".join([f"  - {name}: {col_type}" for name, col_type in columns_info])
    updated_description += "\n\n" + table_description

print(updated_description)
```
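
The exact type names come from the SQLAlchemy inspector, but the printed description should look like this:

```text
Allows you to perform SQL queries on the table. Beware that this tool's output is a string representation of the execution output.
It can use the following tables:

Table 'receipts':
Columns:
  - receipt_id: INTEGER
  - customer_name: VARCHAR(16)
  - price: FLOAT
  - tip: FLOAT

Table 'waiters':
Columns:
  - receipt_id: INTEGER
  - waiter_name: VARCHAR(16)
```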

Since this request is a bit harder than the previous one, we’ll switch the LLM engine to use the more powerful [Qwen/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct)!

```py
sql_engine.description = updated_description

agent = CodeAgent(
    tools=[sql_engine],
    model=InferenceClientModel(model_id="Qwen/Qwen2.5-Coder-32B-Instruct"),
)

agent.run("Which waiter got more total money from tips?")
```
It works straight away! The setup was surprisingly simple, wasn’t it?

This example is done! We've touched upon these concepts:
- Building new tools.
- Updating a tool's description.
- How switching to a stronger LLM helps agent reasoning.

✅ Now you can go build this text-to-SQL system you’ve always dreamt of! ✨
docs/source/en/examples/using_different_models.md
ADDED
@@ -0,0 +1,76 @@

# Using different models

[[open-in-colab]]

`smolagents` provides a flexible framework that allows you to use various language models from different providers.
This guide will show you how to use different model types with your agents.

## Available model types

`smolagents` supports several model types out of the box (a couple of them are sketched right after this list):
1. [`InferenceClientModel`]: Uses Hugging Face's Inference API to access models
2. [`TransformersModel`]: Runs models locally using the Transformers library
3. [`VLLMModel`]: Uses vLLM for fast inference with optimized serving
4. [`MLXModel`]: Optimized for Apple Silicon devices using MLX
5. [`LiteLLMModel`]: Provides access to hundreds of LLMs through LiteLLM
6. [`LiteLLMRouterModel`]: Distributes requests among multiple models
7. [`OpenAIServerModel`]: Provides access to any provider that implements an OpenAI-compatible API
8. [`AzureOpenAIServerModel`]: Uses Azure's OpenAI service
9. [`AmazonBedrockServerModel`]: Connects to AWS Bedrock's API
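
All of these follow the same constructor pattern: pass a `model_id` plus any provider-specific options, then hand the instance to your agent. As a minimal sketch (the model IDs are examples, and each class needs its corresponding extra installed, e.g. `smolagents[transformers]` or `smolagents[litellm]`):

```python
from smolagents import TransformersModel, LiteLLMModel

# Run a model locally with the Transformers backend
local_model = TransformersModel(model_id="HuggingFaceTB/SmolLM2-1.7B-Instruct")

# Or reach a hosted model through LiteLLM
# (LiteLLM reads the provider's API key from the environment)
hosted_model = LiteLLMModel(model_id="anthropic/claude-3-5-sonnet-latest")
```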

## Using Google Gemini Models

As explained in [the Google Gemini API documentation](https://ai.google.dev/gemini-api/docs/openai),
Google provides an OpenAI-compatible API for Gemini models, allowing you to use the [`OpenAIServerModel`]
with Gemini models by setting the appropriate base URL.

First, install the required dependencies:
```bash
pip install smolagents[openai]
```

Then, [get a Gemini API key](https://ai.google.dev/gemini-api/docs/api-key) and set it in your code:
```python
GEMINI_API_KEY = "<YOUR-GEMINI-API-KEY>"
```

Now, you can initialize the Gemini model using the `OpenAIServerModel` class,
setting the `api_base` parameter to the Gemini API base URL:
```python
from smolagents import OpenAIServerModel

model = OpenAIServerModel(
    model_id="gemini-2.0-flash",
    # Google Gemini OpenAI-compatible API base URL
    api_base="https://generativelanguage.googleapis.com/v1beta/openai/",
    api_key=GEMINI_API_KEY,
)
```
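
Any model instance can then power an agent exactly like the Hugging Face models do; for example (the task string is arbitrary):

```python
from smolagents import CodeAgent

# The Gemini-backed model plugs into an agent like any other smolagents model
agent = CodeAgent(tools=[], model=model)
agent.run("Explain the difference between a list and a tuple in Python.")
```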

## Using OpenRouter Models

OpenRouter provides access to a wide variety of language models through a unified OpenAI-compatible API.
You can use the [`OpenAIServerModel`] to connect to OpenRouter by setting the appropriate base URL.

First, install the required dependencies:
```bash
pip install smolagents[openai]
```

Then, [get an OpenRouter API key](https://openrouter.ai/keys) and set it in your code:
```python
OPENROUTER_API_KEY = "<YOUR-OPENROUTER-API-KEY>"
```

Now, you can initialize any model available on OpenRouter using the `OpenAIServerModel` class:
```python
from smolagents import OpenAIServerModel

model = OpenAIServerModel(
    # You can use any model ID available on OpenRouter
    model_id="openai/gpt-4o",
    # OpenRouter API base URL
    api_base="https://openrouter.ai/api/v1",
    api_key=OPENROUTER_API_KEY,
)
```
docs/source/en/examples/web_browser.md
ADDED
@@ -0,0 +1,213 @@

# Web Browser Automation with Agents 🤖🌐

[[open-in-colab]]

In this notebook, we'll create an **agent-powered web browser automation system**! This system can navigate websites, interact with elements, and extract information automatically.

The agent will be able to:

- [x] Navigate to web pages
- [x] Click on elements
- [x] Search within pages
- [x] Handle popups and modals
- [x] Extract information

Let's set up this system step by step!

First, run these lines to install the required dependencies:

```bash
pip install smolagents selenium helium pillow -q
```

Let's import our required libraries and set up environment variables:

```python
from io import BytesIO
from time import sleep

import helium
from dotenv import load_dotenv
from PIL import Image
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

from smolagents import CodeAgent, tool
from smolagents.agents import ActionStep

# Load environment variables
load_dotenv()
```

Now let's create our core browser interaction tools that will allow our agent to navigate and interact with web pages:

```python
@tool
def search_item_ctrl_f(text: str, nth_result: int = 1) -> str:
    """
    Searches for text on the current page via Ctrl + F and jumps to the nth occurrence.
    Args:
        text: The text to search for
        nth_result: Which occurrence to jump to (default: 1)
    """
    # These tools use the global `driver`, which is initialized in the next code block
    elements = driver.find_elements(By.XPATH, f"//*[contains(text(), '{text}')]")
    if nth_result > len(elements):
        raise Exception(f"Match n°{nth_result} not found (only {len(elements)} matches found)")
    result = f"Found {len(elements)} matches for '{text}'."
    elem = elements[nth_result - 1]
    driver.execute_script("arguments[0].scrollIntoView(true);", elem)
    result += f"Focused on element {nth_result} of {len(elements)}"
    return result

@tool
def go_back() -> None:
    """Goes back to previous page."""
    driver.back()

@tool
def close_popups() -> str:
    """
    Closes any visible modal or pop-up on the page. Use this to dismiss pop-up windows!
    This does not work on cookie consent banners.
    """
    webdriver.ActionChains(driver).send_keys(Keys.ESCAPE).perform()
```

Let's set up our browser with Chrome and configure screenshot capabilities:

```python
# Configure Chrome options
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--force-device-scale-factor=1")
chrome_options.add_argument("--window-size=1000,1350")
chrome_options.add_argument("--disable-pdf-viewer")
chrome_options.add_argument("--window-position=0,0")

# Initialize the browser
driver = helium.start_chrome(headless=False, options=chrome_options)

# Set up screenshot callback
def save_screenshot(memory_step: ActionStep, agent: CodeAgent) -> None:
    sleep(1.0)  # Let JavaScript animations happen before taking the screenshot
    driver = helium.get_driver()
    current_step = memory_step.step_number
    if driver is not None:
        for previous_memory_step in agent.memory.steps:  # Remove previous screenshots for lean processing
            if isinstance(previous_memory_step, ActionStep) and previous_memory_step.step_number <= current_step - 2:
                previous_memory_step.observations_images = None
        png_bytes = driver.get_screenshot_as_png()
        image = Image.open(BytesIO(png_bytes))
        print(f"Captured a browser screenshot: {image.size} pixels")
        memory_step.observations_images = [image.copy()]  # Create a copy to ensure it persists

    # Update observations with current URL
    url_info = f"Current url: {driver.current_url}"
    memory_step.observations = (
        url_info if memory_step.observations is None else memory_step.observations + "\n" + url_info
    )
```

Now let's create our web automation agent:

```python
from smolagents import InferenceClientModel

# Initialize the model
model_id = "Qwen/Qwen2-VL-72B-Instruct"  # You can change this to your preferred VLM
model = InferenceClientModel(model_id=model_id)

# Create the agent
agent = CodeAgent(
    tools=[go_back, close_popups, search_item_ctrl_f],
    model=model,
    additional_authorized_imports=["helium"],
    step_callbacks=[save_screenshot],
    max_steps=20,
    verbosity_level=2,
)

# Import helium for the agent
agent.python_executor("from helium import *", agent.state)
```

The agent needs instructions on how to use Helium for web automation. Here are the instructions we'll provide:

```python
helium_instructions = """
You can use helium to access websites. Don't bother about the helium driver, it's already managed.
We've already run "from helium import *"
Then you can go to pages!
Code:
```py
go_to('github.com/trending')
```<end_code>

You can directly click clickable elements by inputting the text that appears on them.
Code:
```py
click("Top products")
```<end_code>

If it's a link:
Code:
```py
click(Link("Top products"))
```<end_code>

If you try to interact with an element and it's not found, you'll get a LookupError.
In general stop your action after each button click to see what happens on your screenshot.
Never try to log in to a page.

To scroll up or down, use scroll_down or scroll_up with the number of pixels to scroll as an argument.
Code:
```py
scroll_down(num_pixels=1200)  # This will scroll one viewport down
```<end_code>

When you have pop-ups with a cross icon to close, don't try to click the close icon by finding its element or targeting an 'X' element (this most often fails).
Just use your built-in tool `close_popups` to close them:
Code:
```py
close_popups()
```<end_code>

You can use .exists() to check for the existence of an element. For example:
Code:
```py
if Text('Accept cookies?').exists():
    click('I accept')
```<end_code>
"""
```

Now we can run our agent with a task! Let's try finding information on Wikipedia:

```python
search_request = """
Please navigate to https://en.wikipedia.org/wiki/Chicago and give me a sentence containing the word "1992" that mentions a construction accident.
"""

agent_output = agent.run(search_request + helium_instructions)
print("Final output:")
print(agent_output)
```

You can run different tasks by modifying the request. For example, here's one that tells me whether I should work harder:

```python
github_request = """
I'm trying to find how hard I have to work to get a repo in github.com/trending.
Can you navigate to the profile for the top author of the top trending repo, and give me their total number of commits over the last year?
"""

agent_output = agent.run(github_request + helium_instructions)
print("Final output:")
print(agent_output)
```
208 |
+
|
209 |
+
The system is particularly effective for tasks like:
|
210 |
+
- Data extraction from websites
|
211 |
+
- Web research automation
|
212 |
+
- UI testing and verification
|
213 |
+
- Content monitoring
|