Upload 3 files
docs/source/zh/examples/rag.md
ADDED
@@ -0,0 +1,128 @@
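# Agentic RAG

[[open-in-colab]]

Retrieval-Augmented Generation (RAG) means "using an LLM to answer a user query, but basing the answer on information retrieved from a knowledge base". It has many advantages over a vanilla or fine-tuned LLM: to name a few, it grounds answers in true facts and reduces confabulation, it supplies the LLM with domain-specific knowledge, and it allows fine-grained control over access to the information in the knowledge base.

However, vanilla RAG has limitations. Two stand out:

- It performs only one retrieval step: if the results are bad, the generation will be bad too.
- Semantic similarity is computed with the user query as the reference, which may be suboptimal: for example, the user query is often a question while the document containing the true answer is usually phrased affirmatively, so its similarity score can be lower than that of other source documents phrased as questions, with the risk of missing the relevant information.

We can alleviate these problems by building a RAG agent: very simply, an agent equipped with a retriever tool! This agent will: ✅ formulate the query itself and ✅ re-retrieve if needed.

It is therefore smarter than vanilla RAG: instead of using the user query directly as the reference for semantic search, the agent formulates a reference sentence that can be closer to the target documents, which improves retrieval accuracy, as in [HyDE](https://huggingface.co/papers/2212.10496). The agent can also use the generated snippets and re-retrieve if needed, as in [Self-Query](https://docs.llamaindex.ai/en/stable/examples/evaluation/RetryQuery/).

Let's build this system. 🛠️

Run the following to install the required dependencies:
```bash
!pip install smolagents pandas langchain langchain-community sentence-transformers rank_bm25 --upgrade -q
```

You need a valid token in the `HF_TOKEN` environment variable to call Inference Providers. We use python-dotenv to load it.
```py
from dotenv import load_dotenv
load_dotenv()
```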

We first load a knowledge base on which to perform RAG: this dataset is a compilation of documentation pages for many Hugging Face libraries, stored as markdown. We keep only the documentation for the `transformers` library. We then prepare the knowledge base for the retriever by processing the dataset and splitting it into chunks (this example retrieves with BM25 rather than a vector database). We use [LangChain](https://python.langchain.com/docs/introduction/) for its document-processing and retrieval utilities.
```py
import datasets
from langchain.docstore.document import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.retrievers import BM25Retriever

knowledge_base = datasets.load_dataset("m-ric/huggingface_doc", split="train")
knowledge_base = knowledge_base.filter(lambda row: row["source"].startswith("huggingface/transformers"))

source_docs = [
    Document(page_content=doc["text"], metadata={"source": doc["source"].split("/")[1]})
    for doc in knowledge_base
]

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    add_start_index=True,
    strip_whitespace=True,
    separators=["\n\n", "\n", ".", " ", ""],
)
docs_processed = text_splitter.split_documents(source_docs)
```

The documents are now ready. Let's build our agentic RAG system!
👉 We only need a RetrieverTool that our agent can use to retrieve information from the knowledge base.

Since the tool needs to hold the retriever as an attribute, we cannot simply use the simple tool constructor with the `@tool` decorator: instead we follow the advanced setup highlighted in the [tools tutorial](../tutorials/tools).

```py
from smolagents import Tool

class RetrieverTool(Tool):
    name = "retriever"
    description = "Uses semantic search to retrieve the parts of transformers documentation that could be most relevant to answer your query."
    inputs = {
        "query": {
            "type": "string",
            "description": "The query to perform. This should be semantically close to your target documents. Use the affirmative form rather than a question.",
        }
    }
    output_type = "string"

    def __init__(self, docs, **kwargs):
        super().__init__(**kwargs)
        self.retriever = BM25Retriever.from_documents(
            docs, k=10
        )

    def forward(self, query: str) -> str:
        assert isinstance(query, str), "Your search query must be a string"

        docs = self.retriever.invoke(
            query,
        )
        return "\nRetrieved documents:\n" + "".join(
            [
                f"\n\n===== Document {str(i)} =====\n" + doc.page_content
                for i, doc in enumerate(docs)
            ]
        )

retriever_tool = RetrieverTool(docs_processed)
```
We use BM25, a classic retrieval method, because it is very fast to set up. To improve retrieval accuracy, you could replace BM25 with semantic search over vector representations of the documents: head to the [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) to select a good embedding model.
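
To make the BM25 idea concrete, here is a minimal sketch of Okapi BM25 scoring in pure Python. It is an illustration only: the function name `bm25_scores`, the toy documents, the whitespace tokenizer, and the parameter values `k1=1.5` and `b=0.75` are assumptions, and the IDF variant used here is not necessarily the one implemented by `rank_bm25` or `BM25Retriever`.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document in `docs` against `query` with a basic Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer (assumption)
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Smoothed IDF that stays positive even for very common terms.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            tf = term_freq[term]
            # Term-frequency saturation with document-length normalization.
            norm = tf + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores

toy_docs = [
    "the backward pass computes gradients",
    "tokenizers split text into tokens",
    "the forward pass computes activations",
]
scores = bm25_scores("backward pass gradients", toy_docs)
best = max(range(len(toy_docs)), key=scores.__getitem__)
print(toy_docs[best])
```

Because the score depends only on exact term overlap, a query phrased with different words than the target document scores poorly, which is why the tool description asks the agent for queries that are close to the wording of the documents.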

Now that we have a tool that can retrieve information from the knowledge base, we can easily create an agent that uses this `retriever_tool`! The agent is initialized with the following arguments:
- `tools`: the list of tools the agent can call.
- `model`: the LLM that powers the agent.

Our `model` must be a callable that takes a list of messages as input and returns text. It also needs to accept a `stop_sequences` argument that indicates when to stop generating. For convenience, we directly use the `InferenceClientModel` class provided by the package to get an LLM engine that calls Hugging Face's Inference API.

We use [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) as the LLM engine because:
- It has a long 128k context, which is useful for processing long source documents.
- It is always available for free on HF's Inference API!

_Note:_ The Inference API hosts models based on various criteria, and deployed models may be updated or replaced without prior notice. Learn more [here](https://huggingface.co/docs/api-inference/supported-models).

```py
from smolagents import InferenceClientModel, CodeAgent

agent = CodeAgent(
    tools=[retriever_tool],
    model=InferenceClientModel(model_id="meta-llama/Llama-3.3-70B-Instruct"),
    max_steps=4,
    verbosity_level=2,
)
```

When we initialize the CodeAgent, it is automatically given a default system prompt that tells the LLM engine to proceed step by step and generate tool calls as code snippets, but you can replace this prompt template as needed. Then, when its `.run()` method is called, the agent takes care of calling the LLM engine and executing the tool calls in a loop, until the `final_answer` tool is called with the final answer as its argument.

```py
agent_output = agent.run("For a transformers model training, which is slower, the forward or the backward pass?")

print("Final output:")
print(agent_output)
```
docs/source/zh/examples/text_to_sql.md
ADDED
@@ -0,0 +1,186 @@
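# Text-to-SQL

[[open-in-colab]]

In this tutorial, we'll see how to implement an agent that leverages SQL using `smolagents`.

> Let's start with the classic question: why not keep it simple and use a standard text-to-SQL pipeline?

A standard text-to-SQL pipeline is brittle, since the generated SQL query can be incorrect. Even worse, the query could be incorrect without raising an error, instead returning incorrect or useless results.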
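
As a small illustration of a query that is wrong without raising an error, consider the sketch below. It uses Python's built-in `sqlite3` module and a toy `receipts` table; the table contents and both queries are assumptions made for this example, not the output of any real text-to-SQL model.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE receipts (receipt_id INTEGER, customer_name TEXT, price REAL)")
con.executemany(
    "INSERT INTO receipts VALUES (?, ?, ?)",
    [(1, "Alan Payne", 12.06), (2, "Alex Mason", 23.86), (3, "Woodrow Wilson", 53.43)],
)

# Intended question: who has the most expensive receipt?
# Plausible but wrong: ORDER BY sorts ascending by default, so this returns the cheapest receipt.
wrong = con.execute("SELECT customer_name FROM receipts ORDER BY price LIMIT 1").fetchone()[0]
# Correct: sort in descending order before taking the first row.
right = con.execute("SELECT customer_name FROM receipts ORDER BY price DESC LIMIT 1").fetchone()[0]
print(wrong, "|", right)
```

Both queries execute without complaint, so a pipeline that only checks for SQL errors cannot tell them apart; something has to look at the result and judge whether it answers the question.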

👉 An agent system, by contrast, can critically inspect the output and decide whether the query needs to be changed, which brings a large performance boost.

Let's build this agent! 💪

First, we set up the SQL environment:
```py
from sqlalchemy import (
    create_engine,
    MetaData,
    Table,
    Column,
    String,
    Integer,
    Float,
    insert,
    inspect,
    text,
)

engine = create_engine("sqlite:///:memory:")
metadata_obj = MetaData()

# create receipts SQL table
table_name = "receipts"
receipts = Table(
    table_name,
    metadata_obj,
    Column("receipt_id", Integer, primary_key=True),
    Column("customer_name", String(16), primary_key=True),
    Column("price", Float),
    Column("tip", Float),
)
metadata_obj.create_all(engine)

rows = [
    {"receipt_id": 1, "customer_name": "Alan Payne", "price": 12.06, "tip": 1.20},
    {"receipt_id": 2, "customer_name": "Alex Mason", "price": 23.86, "tip": 0.24},
    {"receipt_id": 3, "customer_name": "Woodrow Wilson", "price": 53.43, "tip": 5.43},
    {"receipt_id": 4, "customer_name": "Margaret James", "price": 21.11, "tip": 1.00},
]
for row in rows:
    stmt = insert(receipts).values(**row)
    with engine.begin() as connection:
        cursor = connection.execute(stmt)
```

### Build our agent

Now let's build an agent that answers questions using SQL queries. The tool's description attribute is embedded in the LLM's prompt by the agent system: it gives the LLM information about how to use the tool. This is where we describe the SQL table.

```py
inspector = inspect(engine)
columns_info = [(col["name"], col["type"]) for col in inspector.get_columns("receipts")]

table_description = "Columns:\n" + "\n".join([f" - {name}: {col_type}" for name, col_type in columns_info])
print(table_description)
```

```text
Columns:
 - receipt_id: INTEGER
 - customer_name: VARCHAR(16)
 - price: FLOAT
 - tip: FLOAT
```

Now let's build our tool. It needs the following (see the [tool documentation](../tutorials/tools) for more details):

- A docstring with an `Args:` section listing the arguments.
- Type hints on both the inputs and the output.

```py
from smolagents import tool

@tool
def sql_engine(query: str) -> str:
    """
    Allows you to perform SQL queries on the table. Returns a string representation of the result.
    The table is named 'receipts'. Its description is as follows:
        Columns:
        - receipt_id: INTEGER
        - customer_name: VARCHAR(16)
        - price: FLOAT
        - tip: FLOAT

    Args:
        query: The query to perform. This should be correct SQL.
    """
    output = ""
    with engine.connect() as con:
        rows = con.execute(text(query))
        for row in rows:
            output += "\n" + str(row)
    return output
```

We now use this tool to create an agent. We use `CodeAgent`, the main agent class of smolagents: an agent that writes its actions in code and can iterate on previous outputs according to the ReAct framework.

The model is the LLM that powers the agent system. `InferenceClientModel` lets you call LLMs using HF's Inference API, via either a Serverless or a Dedicated endpoint, but you could also use any proprietary API.

```py
from smolagents import CodeAgent, InferenceClientModel

agent = CodeAgent(
    tools=[sql_engine],
    model=InferenceClientModel(model_id="meta-llama/Meta-Llama-3.1-8B-Instruct"),
)
agent.run("Can you give me the name of the client who got the most expensive receipt?")
```

### Level 2: Table joins

Now let's make it more challenging! We want our agent to handle joins across multiple tables. So let's create a second table that records the waiter's name for each receipt_id!

```py
table_name = "waiters"
waiters = Table(
    table_name,
    metadata_obj,
    Column("receipt_id", Integer, primary_key=True),
    Column("waiter_name", String(16), primary_key=True),
)
metadata_obj.create_all(engine)

rows = [
    {"receipt_id": 1, "waiter_name": "Corey Johnson"},
    {"receipt_id": 2, "waiter_name": "Michael Watts"},
    {"receipt_id": 3, "waiter_name": "Michael Watts"},
    {"receipt_id": 4, "waiter_name": "Margaret James"},
]
for row in rows:
    stmt = insert(waiters).values(**row)
    with engine.begin() as connection:
        cursor = connection.execute(stmt)
```

Since we changed the tables, we update the description of the `sql_engine` tool so that the LLM can properly use the information from the new table.

```py
updated_description = """Allows you to perform SQL queries on the table. Beware that this tool's output is a string representation of the execution output.
It can use the following tables:"""

inspector = inspect(engine)
for table in ["receipts", "waiters"]:
    columns_info = [(col["name"], col["type"]) for col in inspector.get_columns(table)]

    table_description = f"Table '{table}':\n"

    table_description += "Columns:\n" + "\n".join([f" - {name}: {col_type}" for name, col_type in columns_info])
    updated_description += "\n\n" + table_description

print(updated_description)
```

Since this request is a bit harder than the previous one, we switch the LLM engine to the more powerful [Qwen/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct)!

```py
sql_engine.description = updated_description

agent = CodeAgent(
    tools=[sql_engine],
    model=InferenceClientModel(model_id="Qwen/Qwen2.5-Coder-32B-Instruct"),
)

agent.run("Which waiter got more total money from tips?")
```
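
To check the agent's answer by hand, the sketch below rebuilds the two tables with Python's built-in `sqlite3` module and runs one possible join query. The query is an assumption about what a correct answer could look like; the agent may well produce different SQL that is equally valid.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Same rows as in the tutorial, recreated here so the snippet is self-contained.
con.executescript(
    """
    CREATE TABLE receipts (receipt_id INTEGER, customer_name TEXT, price REAL, tip REAL);
    CREATE TABLE waiters (receipt_id INTEGER, waiter_name TEXT);
    INSERT INTO receipts VALUES
        (1, 'Alan Payne', 12.06, 1.20),
        (2, 'Alex Mason', 23.86, 0.24),
        (3, 'Woodrow Wilson', 53.43, 5.43),
        (4, 'Margaret James', 21.11, 1.00);
    INSERT INTO waiters VALUES
        (1, 'Corey Johnson'),
        (2, 'Michael Watts'),
        (3, 'Michael Watts'),
        (4, 'Margaret James');
    """
)

# Join each receipt to its waiter, then sum tips per waiter, highest total first.
query = """
    SELECT w.waiter_name, SUM(r.tip) AS total_tips
    FROM receipts AS r
    JOIN waiters AS w ON r.receipt_id = w.receipt_id
    GROUP BY w.waiter_name
    ORDER BY total_tips DESC
"""
totals = con.execute(query).fetchall()
top_waiter, top_tips = totals[0]
print(top_waiter, round(top_tips, 2))
```

With the data above, Michael Watts served receipts 2 and 3, so his tips add up to 0.24 + 5.43 = 5.67, which is the answer the agent should reach.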

It works right away! The setup was surprisingly simple, wasn't it?

That's the end of this example! We covered these concepts:

- Building new tools.
- Updating a tool's description.
- Switching to a stronger LLM to help the agent's reasoning.

✅ Now you can go build the text-to-SQL system you've always dreamed of! ✨
docs/source/zh/examples/web_browser.md
ADDED
@@ -0,0 +1,214 @@
# Web Browser Automation with Agents 🤖🌐

[[open-in-colab]]

In this notebook, we'll create an **agent-powered web browser automation system**! This system can navigate websites, interact with page elements, and extract information automatically.

The agent will be able to:

- [x] Navigate to web pages
- [x] Click on elements
- [x] Search within a page
- [x] Handle popups and modals
- [x] Extract information

Let's set up this system step by step!

First, run the following command to install the required dependencies:

```bash
pip install smolagents selenium helium pillow -q
```

Let's import the required libraries and set up environment variables:

```python
from io import BytesIO
from time import sleep

import helium
from dotenv import load_dotenv
from PIL import Image
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

from smolagents import CodeAgent, tool
from smolagents.agents import ActionStep

# Load environment variables
load_dotenv()
```

Now let's create the core browser interaction tools that let our agent navigate and interact with web pages:

```python
@tool
def search_item_ctrl_f(text: str, nth_result: int = 1) -> str:
    """
    Searches for text on the current page via Ctrl + F and jumps to the nth occurrence.
    Args:
        text: The text to search for
        nth_result: Which occurrence to jump to (default: 1)
    """
    elements = driver.find_elements(By.XPATH, f"//*[contains(text(), '{text}')]")
    if nth_result > len(elements):
        raise Exception(f"Match n°{nth_result} not found (only {len(elements)} matches found)")
    result = f"Found {len(elements)} matches for '{text}'."
    elem = elements[nth_result - 1]
    driver.execute_script("arguments[0].scrollIntoView(true);", elem)
    result += f" Focused on element {nth_result} of {len(elements)}"
    return result

@tool
def go_back() -> None:
    """Goes back to previous page."""
    driver.back()

@tool
def close_popups() -> str:
    """
    Closes any visible modal or pop-up on the page. Use this to dismiss pop-up windows!
    This does not work on cookie consent banners.
    """
    webdriver.ActionChains(driver).send_keys(Keys.ESCAPE).perform()
```
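
One caveat of the XPath built in `search_item_ctrl_f`: the search text is pasted between single quotes, so a text that itself contains a single quote (for example "Don't panic") yields an invalid expression. The sketch below shows one way to build a safe XPath 1.0 string literal. The helper names `xpath_literal` and `contains_text_xpath` are hypothetical and are not part of Selenium, Helium, or smolagents.

```python
def xpath_literal(value: str) -> str:
    """Return an XPath 1.0 string literal for `value`, whatever quotes it contains."""
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    # Both quote types are present: XPath 1.0 has no escape syntax, so split on
    # single quotes and glue the pieces back together with concat().
    parts = value.split("'")
    return "concat(" + ", \"'\", ".join(f"'{part}'" for part in parts) + ")"

def contains_text_xpath(text: str) -> str:
    """Build the same kind of 'element containing this text' query used by the tool."""
    return f"//*[contains(text(), {xpath_literal(text)})]"

print(contains_text_xpath("Don't panic"))
```

In the tool, the f-string passed to `driver.find_elements` could then be replaced by `contains_text_xpath(text)`, assuming such a helper is defined alongside it.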

Let's set up the browser with Chrome and configure screenshot capture:

```python
# Configure Chrome options
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--force-device-scale-factor=1")
chrome_options.add_argument("--window-size=1000,1350")
chrome_options.add_argument("--disable-pdf-viewer")
chrome_options.add_argument("--window-position=0,0")

# Initialize the browser
driver = helium.start_chrome(headless=False, options=chrome_options)

# Set up screenshot callback
def save_screenshot(memory_step: ActionStep, agent: CodeAgent) -> None:
    sleep(1.0)  # Let JavaScript animations happen before taking the screenshot
    driver = helium.get_driver()
    current_step = memory_step.step_number
    if driver is not None:
        for previous_memory_step in agent.memory.steps:  # Remove previous screenshots for lean processing
            if isinstance(previous_memory_step, ActionStep) and previous_memory_step.step_number <= current_step - 2:
                previous_memory_step.observations_images = None
        png_bytes = driver.get_screenshot_as_png()
        image = Image.open(BytesIO(png_bytes))
        print(f"Captured a browser screenshot: {image.size} pixels")
        memory_step.observations_images = [image.copy()]  # Create a copy to ensure it persists

        # Update observations with current URL
        url_info = f"Current url: {driver.current_url}"
        memory_step.observations = (
            url_info if memory_step.observations is None else memory_step.observations + "\n" + url_info
        )
```
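
The pruning rule inside `save_screenshot` can be checked in isolation. The sketch below replays it on stub objects: `StubStep` and `prune_old_screenshots` are hypothetical stand-ins written for this illustration (the real callback works on smolagents' `ActionStep` objects), and the screenshot values are placeholder strings rather than images.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StubStep:
    """Hypothetical stand-in for ActionStep, holding only the fields the rule needs."""
    step_number: int
    observations_images: Optional[list] = None

def prune_old_screenshots(steps, current_step):
    """Drop screenshots from every step numbered current_step - 2 or lower."""
    for step in steps:
        if step.step_number <= current_step - 2:
            step.observations_images = None

# Five steps, each holding one placeholder screenshot; step 5 is the current one.
steps = [StubStep(n, [f"screenshot-{n}"]) for n in range(1, 6)]
prune_old_screenshots(steps, current_step=5)
kept = [s.step_number for s in steps if s.observations_images is not None]
print(kept)
```

So at any point the memory holds at most two screenshots, the one from the previous step and the one from the current step, which keeps the prompt sent to the model small.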

Now let's create our web automation agent:

```python
from smolagents import InferenceClientModel

# Initialize the model
model_id = "meta-llama/Llama-3.3-70B-Instruct"  # You can change this to your preferred model
model = InferenceClientModel(model_id=model_id)

# Create the agent
agent = CodeAgent(
    tools=[go_back, close_popups, search_item_ctrl_f],
    model=model,
    additional_authorized_imports=["helium"],
    step_callbacks=[save_screenshot],
    max_steps=20,
    verbosity_level=2,
)

# Import helium for the agent
agent.python_executor("from helium import *", agent.state)
```

The agent needs instructions on how to use Helium for web automation. Here are the instructions we'll provide:

```python
helium_instructions = """
You can use helium to access websites. Don't bother about the helium driver, it's already managed.
We've already ran "from helium import *"
Then you can go to pages!
Code:
```py
go_to('github.com/trending')
```<end_code>

You can directly click clickable elements by inputting the text that appears on them.
Code:
```py
click("Top products")
```<end_code>

If it's a link:
Code:
```py
click(Link("Top products"))
```<end_code>

If you try to interact with an element and it's not found, you'll get a LookupError.
In general stop your action after each button click to see what happens on your screenshot.
Never try to login in a page.

To scroll up or down, use scroll_down or scroll_up with as an argument the number of pixels to scroll from.
Code:
```py
scroll_down(num_pixels=1200) # This will scroll one viewport down
```<end_code>

When you have pop-ups with a cross icon to close, don't try to click the close icon by finding its element or targeting an 'X' element (this most often fails).
Just use your built-in tool `close_popups` to close them:
Code:
```py
close_popups()
```<end_code>

You can use .exists() to check for the existence of an element. For example:
Code:
```py
if Text('Accept cookies?').exists():
    click('I accept')
```<end_code>
"""
```

Now we can run our agent on a task! Let's try finding information on Wikipedia:

```python
search_request = """
Please navigate to https://en.wikipedia.org/wiki/Chicago and give me a sentence containing the word "1992" that mentions a construction accident.
"""

agent_output = agent.run(search_request + helium_instructions)
print("Final output:")
print(agent_output)
```

You can run different tasks by modifying the request. For instance, here is one that helps me decide whether I should work harder:

```python
github_request = """
I'm trying to find how hard I have to work to get a repo in github.com/trending.
Can you navigate to the profile for the top author of the top trending repo, and give me their total number of commits over the last year?
"""

agent_output = agent.run(github_request + helium_instructions)
print("Final output:")
print(agent_output)
```

The system is particularly effective for tasks like:

- Extracting data from websites
- Automating web research
- UI testing and verification
- Content monitoring