Final_Assignment_Template3

Duibonduil committed on Jun 28

Commit bbc0741 (verified)

1 Parent(s): 6908c24

Upload 3 files

Files changed (3)

docs/source/zh/examples/rag.md +128 -0
docs/source/zh/examples/text_to_sql.md +186 -0
docs/source/zh/examples/web_browser.md +214 -0

docs/source/zh/examples/rag.md ADDED

	@@ -0,0 +1,128 @@

+# Agentic RAG
+[[open-in-colab]]
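+Retrieval-Augmented Generation (RAG) is "using an LLM to answer a user query, but basing the answer on information retrieved from a knowledge base". It has many advantages over using a vanilla or fine-tuned LLM: to name a few, it grounds answers in true facts and reduces confabulation, it lets you give the LLM domain-specific knowledge, and it allows fine-grained control over access to the information in the knowledge base.
+But vanilla RAG has limitations, most importantly these two:
+- It performs only one retrieval step: if the results are bad, the generation will be bad in turn.
+- Semantic similarity is computed with the user query as the reference, which can be suboptimal: for instance, the user query is often a question, while the document containing the true answer is usually phrased affirmatively, so its similarity score will be lower than that of other source documents phrased as questions, creating a risk of missing the relevant information.
+We can alleviate these problems by making a RAG agent: very simply, an agent armed with a retriever tool! This agent will:
+✅ formulate the query itself, and ✅ re-retrieve if needed.
+It should therefore be smarter than vanilla RAG: instead of directly using the user query as the reference, the agent formulates a reference sentence that can be closer to the target documents, which improves retrieval accuracy, as in [HyDE](https://huggingface.co/papers/2212.10496). The agent can also use the generated snippets and re-retrieve if needed, as in [Self-Query](https://docs.llamaindex.ai/en/stable/examples/evaluation/RetryQuery/).
+Let's build this system. 🛠️
+Run the line below to install the required dependencies: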
+```bash
+!pip install smolagents pandas langchain langchain-community sentence-transformers datasets python-dotenv rank_bm25 --upgrade -q
+```
+You need a valid token in the `HF_TOKEN` environment variable to call Inference Providers. We use python-dotenv to load it.
+```py
+from dotenv import load_dotenv
+load_dotenv()
+```
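+We first load a knowledge base on which to perform RAG: this dataset is a compilation of the documentation pages of many Hugging Face libraries, stored as markdown. We keep only the documentation for the `transformers` library. We then prepare the knowledge base for the retriever by splitting the documents into chunks that can be indexed (the code below uses a BM25 retriever rather than a vector database). We use [LangChain](https://python.langchain.com/docs/introduction/) for its document and retrieval utilities.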
+```py
+import datasets
+from langchain.docstore.document import Document
+from langchain.text_splitter import RecursiveCharacterTextSplitter
+from langchain_community.retrievers import BM25Retriever
+knowledge_base = datasets.load_dataset("m-ric/huggingface_doc", split="train")
+knowledge_base = knowledge_base.filter(lambda row: row["source"].startswith("huggingface/transformers"))
+source_docs = [
+    Document(page_content=doc["text"], metadata={"source": doc["source"].split("/")[1]})
+    for doc in knowledge_base
+]
+text_splitter = RecursiveCharacterTextSplitter(
+    chunk_size=500,
+    chunk_overlap=50,
+    add_start_index=True,
+    strip_whitespace=True,
+    separators=["\n\n", "\n", ".", " ", ""],
+)
+docs_processed = text_splitter.split_documents(source_docs)
+```
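To make the `chunk_size` and `chunk_overlap` settings above concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplification: `chunk_text` is a hypothetical helper, and unlike `RecursiveCharacterTextSplitter` it ignores separators and cuts purely by character position.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into character windows of chunk_size that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Hypothetical sample document of 1200 characters
sample = "".join(str(i % 10) for i in range(1200))
chunks = chunk_text(sample)
print([len(c) for c in chunks])
```

Each chunk repeats the last 50 characters of the previous one, so a sentence cut at a chunk boundary still appears whole in at least one of the two neighbours when it is short enough.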
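+The documents are now ready. Let's build our agentic RAG system!
+👉 We only need a `RetrieverTool` that our agent can use to retrieve information from the knowledge base.
+Since the retriever has to be stored as an attribute of the tool, we cannot use the simple tool constructor with the `@tool` decorator: instead we follow the advanced setup highlighted in the [tools tutorial](../tutorials/tools).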
+```py
+from smolagents import Tool
+class RetrieverTool(Tool):
+    name = "retriever"
+    description = "Uses semantic search to retrieve the parts of transformers documentation that could be most relevant to answer your query."
+    inputs = {
+        "query": {
+            "type": "string",
+            "description": "The query to perform. This should be semantically close to your target documents. Use the affirmative form rather than a question.",
+        }
+    }
+    output_type = "string"
+    def __init__(self, docs, **kwargs):
+        super().__init__(**kwargs)
+        self.retriever = BM25Retriever.from_documents(
+            docs, k=10
+        )
+    def forward(self, query: str) -> str:
+        assert isinstance(query, str), "Your search query must be a string"
+        docs = self.retriever.invoke(
+            query,
+        )
+        return "\nRetrieved documents:\n" + "".join(
+            [
+                f"\n\n===== Document {str(i)} =====\n" + doc.page_content
+                for i, doc in enumerate(docs)
+            ]
+        )
+retriever_tool = RetrieverTool(docs_processed)
+```
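+BM25 is a classic retrieval method, chosen here because it is very fast to set up. To improve retrieval accuracy, you can replace BM25 with semantic search over vector representations of the documents: the [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) is a good place to pick an embedding model.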
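For intuition about what a BM25 retriever computes, here is a minimal sketch of one common BM25 variant in plain Python. It is an illustration under assumptions, not the `rank_bm25` implementation: `bm25_scores`, the whitespace tokenizer, the IDF smoothing, and the defaults `k1=1.5`, `b=0.75` are all choices made for this example.

```python
import math
from collections import Counter


def bm25_scores(query: str, documents: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with a simple BM25 formula."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(doc) for doc in tokenized) / n_docs
    # Number of documents that contain each term at least once
    doc_freq = Counter(term for doc in tokenized for term in set(doc))
    scores = []
    for doc in tokenized:
        term_counts = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            tf = term_counts.get(term, 0)
            if tf == 0:
                continue
            # Rare terms get a higher weight than terms found in most documents
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates and is normalized by document length
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avg_len))
        scores.append(score)
    return scores


# Hypothetical toy corpus
toy_docs = [
    "the backward pass computes gradients",
    "the forward pass computes activations",
    "tokenizers split text into tokens",
]
scores = bm25_scores("backward pass gradients", toy_docs)
best = max(range(len(toy_docs)), key=scores.__getitem__)
print(toy_docs[best])
```

Because the score depends only on exact token overlap, a query phrased with different words than the target document scores zero, which is the gap that embedding-based search is meant to close.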
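+Now that we have a tool that can retrieve information from the knowledge base, we can easily create an agent that uses this `retriever_tool`! The agent is initialized with these arguments:
+- `tools`: the list of tools the agent will be able to call.
+- `model`: the LLM that powers the agent.
+Our `model` must be a callable that takes a list of messages as input and returns text. It also needs to accept a `stop_sequences` argument that indicates when to stop generating. For convenience, we directly use the `InferenceClientModel` class provided in the package to get an LLM engine that calls Hugging Face's Inference API.
+We use [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) as the LLM engine because:
+- It has a long 128k context, which helps when processing long source documents.
+- It is served for free at all times on HF's Inference API!
+_Note:_ The Inference API hosts models based on various criteria, and deployed models may be updated or replaced without prior notice. Learn more [here](https://huggingface.co/docs/api-inference/supported-models).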
+```py
+from smolagents import InferenceClientModel, CodeAgent
+agent = CodeAgent(
+    tools=[retriever_tool],
+    model=InferenceClientModel(model_id="meta-llama/Llama-3.3-70B-Instruct"),
+    max_steps=4,
+    verbosity_level=2,
+)
+```
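+When we initialize the `CodeAgent`, it is automatically given a default system prompt that tells the LLM engine to proceed step by step and generate tool calls as code snippets, but you can replace this prompt template as needed. Then, when its `.run()` method is called, the agent takes care of calling the LLM engine and executing the tool calls, all in a loop that ends only when the `final_answer` tool is called with the final answer as its argument.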
+```py
+agent_output = agent.run("For a transformers model training, which is slower, the forward or the backward pass?")
+print("Final output:")
+print(agent_output)
+```
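The loop that `.run()` performs can be pictured with a toy sketch. This is not the `smolagents` implementation: `run_agent_loop`, `scripted_model`, and `fake_retriever` are hypothetical stand-ins, and actions are plain dictionaries here, whereas `CodeAgent` has the LLM write Python snippets.

```python
def run_agent_loop(model, tools, task, max_steps=4):
    """Call the model repeatedly, executing tool calls until final_answer is chosen."""
    memory = [f"Task: {task}"]
    for _ in range(max_steps):
        action = model(memory)  # the model decides the next action from the memory
        if action["tool"] == "final_answer":
            return action["argument"]
        observation = tools[action["tool"]](action["argument"])
        memory.append(f"Observation: {observation}")  # fed back on the next step
    return "No final answer within max_steps."


def scripted_model(memory):
    """Stand-in for the LLM: retrieve once, then answer from the last observation."""
    if len(memory) == 1:
        return {"tool": "retriever", "argument": "forward and backward pass speed"}
    return {"tool": "final_answer", "argument": f"Answer based on: {memory[-1]}"}


def fake_retriever(query):
    """Stand-in for the retriever tool."""
    return f"documents matching '{query}'"


answer = run_agent_loop(scripted_model, {"retriever": fake_retriever}, "Which pass is slower?")
print(answer)
```

The `max_steps` cap plays the same role as in the `CodeAgent` call above: it bounds the loop when the model never reaches a final answer.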

docs/source/zh/examples/text_to_sql.md ADDED

	@@ -0,0 +1,186 @@

+# Text-to-SQL
+[[open-in-colab]]
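+In this tutorial, we'll see how to implement an agent that leverages SQL using `smolagents`.
+> Let's start with the classic question: why not keep it simple and use a standard text-to-SQL pipeline?
+A standard text-to-SQL pipeline is brittle, since the generated SQL query can be incorrect. Worse, the query can be incorrect without raising any error, silently returning incorrect or useless results.
+👉 An agent system, by contrast, can inspect the outputs and decide whether the query needs to be changed, which greatly helps performance.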
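A silent failure of this kind is easy to reproduce. The sketch below uses the standard-library `sqlite3` module and a hypothetical helper `first_non_empty`; the two-row table and both queries are made up for illustration. The first query is valid SQL and raises nothing, yet returns no rows because the string comparison is case-sensitive.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE receipts (receipt_id INTEGER, customer_name TEXT, price REAL)")
con.executemany(
    "INSERT INTO receipts VALUES (?, ?, ?)",
    [(1, "Alan Payne", 12.06), (3, "Woodrow Wilson", 53.43)],
)


def first_non_empty(connection, candidate_queries):
    """Run candidate queries in order and return the first one that yields rows."""
    for query in candidate_queries:
        rows = connection.execute(query).fetchall()
        if rows:
            return query, rows
    return None, []


# Valid SQL, no error raised, but the lowercase name matches nothing
silent_failure = "SELECT price FROM receipts WHERE customer_name = 'woodrow wilson'"
# A rewritten query that compares case-insensitively
corrected = "SELECT price FROM receipts WHERE customer_name = 'woodrow wilson' COLLATE NOCASE"

print(con.execute(silent_failure).fetchall())
used_query, rows = first_non_empty(con, [silent_failure, corrected])
print(rows)
```

A fixed pipeline would hand the empty result straight to the user. An agent sees the empty output as an observation and can decide to rewrite the query, which the retry helper imitates in a very crude way.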
+Let's build this agent! 💪
+First, we set up the SQL environment:
+```py
+from sqlalchemy import (
+    create_engine,
+    MetaData,
+    Table,
+    Column,
+    String,
+    Integer,
+    Float,
+    insert,
+    inspect,
+    text,
+)
+engine = create_engine("sqlite:///:memory:")
+metadata_obj = MetaData()
+# create the receipts SQL table
+table_name = "receipts"
+receipts = Table(
+    table_name,
+    metadata_obj,
+    Column("receipt_id", Integer, primary_key=True),
+    Column("customer_name", String(16), primary_key=True),
+    Column("price", Float),
+    Column("tip", Float),
+)
+metadata_obj.create_all(engine)
+rows = [
+    {"receipt_id": 1, "customer_name": "Alan Payne", "price": 12.06, "tip": 1.20},
+    {"receipt_id": 2, "customer_name": "Alex Mason", "price": 23.86, "tip": 0.24},
+    {"receipt_id": 3, "customer_name": "Woodrow Wilson", "price": 53.43, "tip": 5.43},
+    {"receipt_id": 4, "customer_name": "Margaret James", "price": 21.11, "tip": 1.00},
+]
+for row in rows:
+    stmt = insert(receipts).values(**row)
+    with engine.begin() as connection:
+        cursor = connection.execute(stmt)
+```
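+### Building the agent
+Now we build an agent that answers questions using SQL queries. The tool's `description` attribute is embedded in the LLM's prompt by the agent system: it tells the LLM how to use the tool. This is where we describe the SQL table.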
+```py
+inspector = inspect(engine)
+columns_info = [(col["name"], col["type"]) for col in inspector.get_columns("receipts")]
+table_description = "Columns:\n" + "\n".join([f"  - {name}: {col_type}" for name, col_type in columns_info])
+print(table_description)
+```
+```text
+Columns:
+  - receipt_id: INTEGER
+  - customer_name: VARCHAR(16)
+  - price: FLOAT
+  - tip: FLOAT
+```
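+Now let's build our tool. It needs the following (see the [tool documentation](../tutorials/tools) for more detail):
+- A docstring with an `Args:` section listing the arguments.
+- Type hints on both the inputs and the output.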
+```py
+from smolagents import tool
+@tool
+def sql_engine(query: str) -> str:
+    """
+    Allows you to perform SQL queries on the table. Returns a string representation of the result.
+    The table is named 'receipts'. Its description is as follows:
+        Columns:
+        - receipt_id: INTEGER
+        - customer_name: VARCHAR(16)
+        - price: FLOAT
+        - tip: FLOAT
+    Args:
+        query: The query to perform. This should be correct SQL.
+    """
+    output = ""
+    with engine.connect() as con:
+        rows = con.execute(text(query))
+        for row in rows:
+            output += "\n" + str(row)
+    return output
+```
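+Now we create an agent that uses this tool. We use `CodeAgent`, the main agent class in `smolagents`: an agent that writes its actions in code and iterates on previous outputs following the ReAct framework.
+The model is the LLM that powers the agent system. `InferenceClientModel` lets you call LLMs through HF's Inference API, via either a Serverless or a Dedicated endpoint, but you could also use any proprietary API.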
+```py
+from smolagents import CodeAgent, InferenceClientModel
+agent = CodeAgent(
+    tools=[sql_engine],
+    model=InferenceClientModel(model_id="meta-llama/Meta-Llama-3.1-8B-Instruct"),
+)
+agent.run("Can you give me the name of the client who got the most expensive receipt?")
+```
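+### Level 2: Table joins
+Now let's make it more challenging! We want our agent to handle joins across multiple tables. So we create a second table that records the name of the waiter for each receipt_id!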
+```py
+table_name = "waiters"
+# Use a separate variable so the earlier `receipts` table object is not overwritten
+waiters = Table(
+    table_name,
+    metadata_obj,
+    Column("receipt_id", Integer, primary_key=True),
+    Column("waiter_name", String(16), primary_key=True),
+)
+metadata_obj.create_all(engine)
+rows = [
+    {"receipt_id": 1, "waiter_name": "Corey Johnson"},
+    {"receipt_id": 2, "waiter_name": "Michael Watts"},
+    {"receipt_id": 3, "waiter_name": "Michael Watts"},
+    {"receipt_id": 4, "waiter_name": "Margaret James"},
+]
+for row in rows:
+    stmt = insert(waiters).values(**row)
+    with engine.begin() as connection:
+        cursor = connection.execute(stmt)
+```
+Since we changed the tables, we update the description of the `sql_engine` tool so that the LLM can properly use the information from both tables.
+```py
+updated_description = """Allows you to perform SQL queries on the table. Beware that this tool's output is a string representation of the execution output.
+It can use the following tables:"""
+inspector = inspect(engine)
+for table in ["receipts", "waiters"]:
+    columns_info = [(col["name"], col["type"]) for col in inspector.get_columns(table)]
+    table_description = f"Table '{table}':\n"
+    table_description += "Columns:\n" + "\n".join([f"  - {name}: {col_type}" for name, col_type in columns_info])
+    updated_description += "\n\n" + table_description
+print(updated_description)
+```
+Since this request is a bit harder than the previous one, we switch the LLM engine to the more powerful [Qwen/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct)!
+```py
+sql_engine.description = updated_description
+agent = CodeAgent(
+    tools=[sql_engine],
+    model=InferenceClientModel(model_id="Qwen/Qwen2.5-Coder-32B-Instruct"),
+)
+agent.run("Which waiter got more total money from tips?")
+```
+It works right away! The setup was surprisingly simple, wasn't it?
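For reference, the kind of join the agent has to arrive at for the waiter question can be written by hand. The sketch below rebuilds the two tables with the standard-library `sqlite3` module instead of SQLAlchemy, using the same rows as above; the exact query is one possible formulation, not necessarily what the LLM generates.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE receipts (receipt_id INTEGER PRIMARY KEY, customer_name TEXT, price REAL, tip REAL)")
con.execute("CREATE TABLE waiters (receipt_id INTEGER PRIMARY KEY, waiter_name TEXT)")
con.executemany(
    "INSERT INTO receipts VALUES (?, ?, ?, ?)",
    [
        (1, "Alan Payne", 12.06, 1.20),
        (2, "Alex Mason", 23.86, 0.24),
        (3, "Woodrow Wilson", 53.43, 5.43),
        (4, "Margaret James", 21.11, 1.00),
    ],
)
con.executemany(
    "INSERT INTO waiters VALUES (?, ?)",
    [(1, "Corey Johnson"), (2, "Michael Watts"), (3, "Michael Watts"), (4, "Margaret James")],
)

# Join on receipt_id, sum the tips per waiter, highest total first
query = """
SELECT w.waiter_name, SUM(r.tip) AS total_tips
FROM receipts AS r
JOIN waiters AS w ON r.receipt_id = w.receipt_id
GROUP BY w.waiter_name
ORDER BY total_tips DESC
"""
totals = con.execute(query).fetchall()
top_waiter, top_tips = totals[0]
print(top_waiter, round(top_tips, 2))
```

Having a hand-written query like this is a convenient way to check the agent's final answer on small test tables.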
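+That's it for this example! We covered these concepts:
+- Building new tools.
+- Updating a tool's description.
+- Switching to a stronger LLM to help the agent's reasoning.
+✅ Now you can go build the text-to-SQL system you've always dreamt of! ✨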

docs/source/zh/examples/web_browser.md ADDED

	@@ -0,0 +1,214 @@

+# Web Browser Automation with Agents 🤖🌐
+[[open-in-colab]]
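+In this notebook, we'll create an **agent-powered web browser automation system**! This system can navigate websites, interact with page elements, and extract information automatically.
+The agent will be able to:
+- [x] Navigate to web pages
+- [x] Click on elements
+- [x] Search within pages
+- [x] Handle pop-ups and modals
+- [x] Extract information
+Let's set up this system step by step!
+First, run this line to install the required dependencies: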
+```bash
+pip install smolagents selenium helium pillow -q
+```
+Let's import the required libraries and set up environment variables:
+```python
+from io import BytesIO
+from time import sleep
+import helium
+from dotenv import load_dotenv
+from PIL import Image
+from selenium import webdriver
+from selenium.webdriver.common.by import By
+from selenium.webdriver.common.keys import Keys
+from smolagents import CodeAgent, tool
+from smolagents.agents import ActionStep
+# Load environment variables
+load_dotenv()
+```
+Now let's create the core browser interaction tools that let our agent navigate and interact with web pages:
+```python
+@tool
+def search_item_ctrl_f(text: str, nth_result: int = 1) -> str:
+    """
+    Searches for text on the current page via Ctrl + F and jumps to the nth occurrence.
+    Args:
+        text: The text to search for
+        nth_result: Which occurrence to jump to (default: 1)
+    """
+    elements = driver.find_elements(By.XPATH, f"//*[contains(text(), '{text}')]")
+    if nth_result > len(elements):
+        raise Exception(f"Match n°{nth_result} not found (only {len(elements)} matches found)")
+    result = f"Found {len(elements)} matches for '{text}'."
+    elem = elements[nth_result - 1]
+    driver.execute_script("arguments[0].scrollIntoView(true);", elem)
+    result += f"Focused on element {nth_result} of {len(elements)}"
+    return result
+@tool
+def go_back() -> None:
+    """Goes back to previous page."""
+    driver.back()
+@tool
+def close_popups() -> str:
+    """
+    Closes any visible modal or pop-up on the page. Use this to dismiss pop-up windows!
+    This does not work on cookie consent banners.
+    """
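+    webdriver.ActionChains(driver).send_keys(Keys.ESCAPE).perform()
+    # Return a string so the tool matches its declared `-> str` return type
+    return "Sent ESCAPE to dismiss any visible pop-up."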
+```
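One detail of `search_item_ctrl_f` worth noting: the search text is interpolated between single quotes in the XPath expression, so a search string that itself contains a single quote produces an invalid expression. A minimal sketch of a workaround follows, assuming XPath 1.0 string literals (which have no escape character); `xpath_literal` and `build_contains_xpath` are hypothetical helpers, not part of Selenium or `smolagents`.

```python
def xpath_literal(text: str) -> str:
    """Return an XPath 1.0 string literal for text, even when it contains quotes."""
    if "'" not in text:
        return f"'{text}'"
    if '"' not in text:
        return f'"{text}"'
    # Both quote types are present: glue single-quote-free pieces with concat()
    parts = text.split("'")
    return "concat(" + ", \"'\", ".join(f"'{part}'" for part in parts) + ")"


def build_contains_xpath(text: str) -> str:
    """Build the same kind of expression as the tool above, with safe quoting."""
    return f"//*[contains(text(), {xpath_literal(text)})]"


print(build_contains_xpath("Chicago"))
print(build_contains_xpath("city's"))
```

The resulting string could be passed to `driver.find_elements(By.XPATH, ...)` in place of the f-string used in the tool.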
+Let's configure the browser to use Chrome and set up screenshot capture:
+```python
+# Configure Chrome options
+chrome_options = webdriver.ChromeOptions()
+chrome_options.add_argument("--force-device-scale-factor=1")
+chrome_options.add_argument("--window-size=1000,1350")
+chrome_options.add_argument("--disable-pdf-viewer")
+chrome_options.add_argument("--window-position=0,0")
+# Initialize the browser
+driver = helium.start_chrome(headless=False, options=chrome_options)
+# Set up screenshot callback
+def save_screenshot(memory_step: ActionStep, agent: CodeAgent) -> None:
+    sleep(1.0)  # Let JavaScript animations happen before taking the screenshot
+    driver = helium.get_driver()
+    current_step = memory_step.step_number
+    if driver is not None:
+        for previous_memory_step in agent.memory.steps:  # Remove previous screenshots for lean processing
+            if isinstance(previous_memory_step, ActionStep) and previous_memory_step.step_number <= current_step - 2:
+                previous_memory_step.observations_images = None
+        png_bytes = driver.get_screenshot_as_png()
+        image = Image.open(BytesIO(png_bytes))
+        print(f"Captured a browser screenshot: {image.size} pixels")
+        memory_step.observations_images = [image.copy()]  # Create a copy to ensure it persists
+    # Update observations with current URL
+    url_info = f"Current url: {driver.current_url}"
+    memory_step.observations = (
+        url_info if memory_step.observations is None else memory_step.observations + "\n" + url_info
+    )
+```
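The pruning rule inside `save_screenshot` can be checked in isolation. The sketch below replaces `ActionStep` with a hypothetical `FakeStep` dataclass and uses strings instead of images; only the `step_number <= current_step - 2` condition is taken from the callback above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeStep:
    """Hypothetical stand-in for ActionStep with only the fields the rule needs."""
    step_number: int
    observations_images: Optional[list] = None


def prune_old_screenshots(steps: list, current_step: int) -> None:
    """Drop screenshots from every step that is at least two steps old."""
    for step in steps:
        if step.step_number <= current_step - 2:
            step.observations_images = None


steps = [FakeStep(n, [f"screenshot-{n}"]) for n in range(1, 6)]
prune_old_screenshots(steps, current_step=5)
kept = [s.step_number for s in steps if s.observations_images is not None]
print(kept)
```

Only the current step and the one before it keep their screenshot, which keeps the number of images sent to the model per call small.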
+Now let's create the web automation agent:
+```python
+from smolagents import InferenceClientModel
+# Initialize the model
+model_id = "meta-llama/Llama-3.3-70B-Instruct"  # You can change this to your preferred model
+model = InferenceClientModel(model_id=model_id)
+# Create the agent
+agent = CodeAgent(
+    tools=[go_back, close_popups, search_item_ctrl_f],
+    model=model,
+    additional_authorized_imports=["helium"],
+    step_callbacks=[save_screenshot],
+    max_steps=20,
+    verbosity_level=2,
+)
+# Import helium for the agent
+agent.python_executor("from helium import *", agent.state)
+```
+The agent needs instructions on how to use Helium for web automation. Here are the instructions we'll provide:
+```python
+helium_instructions = """
+You can use helium to access websites. Don't bother about the helium driver, it's already managed.
+We've already run "from helium import *"
+Then you can go to pages!
+Code:
+```py
+go_to('github.com/trending')
+```<end_code>
+You can directly click clickable elements by inputting the text that appears on them.
+Code:
+```py
+click("Top products")
+```<end_code>
+If it's a link:
+Code:
+```py
+click(Link("Top products"))
+```<end_code>
+If you try to interact with an element and it's not found, you'll get a LookupError.
+In general stop your action after each button click to see what happens on your screenshot.
+Never try to log in to a page.
+To scroll up or down, use scroll_down or scroll_up with the number of pixels to scroll as the argument.
+Code:
+```py
+scroll_down(num_pixels=1200) # This will scroll one viewport down
+```<end_code>
+When you have pop-ups with a cross icon to close, don't try to click the close icon by finding its element or targeting an 'X' element (this most often fails).
+Just use your built-in tool `close_popups` to close them:
+Code:
+```py
+close_popups()
+```<end_code>
+You can use .exists() to check for the existence of an element. For example:
+Code:
+```py
+if Text('Accept cookies?').exists():
+    click('I accept')
+```<end_code>
+"""
+```
+Now we can run the agent on a task! Let's try finding information on Wikipedia:
+```python
+search_request = """
+Please navigate to https://en.wikipedia.org/wiki/Chicago and give me a sentence containing the word "1992" that mentions a construction accident.
+"""
+agent_output = agent.run(search_request + helium_instructions)
+print("Final output:")
+print(agent_output)
+```
+You can run different tasks by modifying the request. For example, here is one that helps me find out whether I should work harder:
+```python
+github_request = """
+I'm trying to find how hard I have to work to get a repo in github.com/trending.
+Can you navigate to the profile for the top author of the top trending repo, and give me their total number of commits over the last year?
+"""
+agent_output = agent.run(github_request + helium_instructions)
+print("Final output:")
+print(agent_output)
+```
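+The system is particularly effective for tasks like:
+- Data extraction from websites
+- Web research automation
+- UI testing and verification
+- Content monitoring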