{ "cells": [ { "cell_type": "markdown", "id": "89791f21c171372a", "metadata": {}, "source": [ "# Agent\n", "\n", "In this *notebook*, **we will build a simple agent using LangGraph**.\n", "\n", "This notebook is part of the Hugging Face Agents course, a free course that will guide you, from **beginner to expert**, through understanding, using, and building agents.\n", "![Agents course share](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/communication/share.png)\n", "\n", "As we saw in Unit 1, an agent needs the 3 steps introduced in the ReAct architecture:\n", "[ReAct](https://react-lm.github.io/), a general agent architecture.\n", "\n", "* `act` - let the model call specific tools\n", "* `observe` - pass the tool output back to the model\n", "* `reason` - let the model reason about the tool output to decide what to do next (e.g., call another tool or answer directly).\n", "\n", "![Agent](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/LangGraph/Agent.png)" ] }, { "cell_type": "code", "execution_count": null, "id": "bef6c5514bd263ce", "metadata": {}, "outputs": [], "source": [ "%pip install -q -U langchain_openai langchain_core langgraph" ] }, { "cell_type": "code", "execution_count": null, "id": "61d0ed53b26fa5c6", "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "# Please set your own key\n", "os.environ[\"OPENAI_API_KEY\"] = \"sk-xxxxxx\"" ] }, { "cell_type": "code", "execution_count": null, "id": "a4a8bf0d5ac25a37", "metadata": {}, "outputs": [], "source": [ "import base64\n", "from langchain_core.messages import HumanMessage\n", "from langchain_openai import ChatOpenAI\n", "\n", "vision_llm = ChatOpenAI(model=\"gpt-4o\")\n", "\n", "\n", "def extract_text(img_path: str) -> str:\n", "    \"\"\"\n", "    Extract text from an image file 
using a multimodal model.\n", "\n", "    Args:\n", "        img_path: A local image file path (string).\n", "\n", "    Returns:\n", "        A single string containing the text extracted from the image.\n", "    \"\"\"\n", "    all_text = \"\"\n", "    try:\n", "        # Read the image and encode it as base64\n", "        with open(img_path, \"rb\") as image_file:\n", "            image_bytes = image_file.read()\n", "\n", "        image_base64 = base64.b64encode(image_bytes).decode(\"utf-8\")\n", "\n", "        # Build the prompt, including the base64 image data\n", "        message = [\n", "            HumanMessage(\n", "                content=[\n", "                    {\n", "                        \"type\": \"text\",\n", "                        \"text\": (\n", "                            \"Extract all the text from this image. \"\n", "                            \"Return only the extracted text, no explanations.\"\n", "                        ),\n", "                    },\n", "                    {\n", "                        \"type\": \"image_url\",\n", "                        \"image_url\": {\n", "                            \"url\": f\"data:image/png;base64,{image_base64}\"\n", "                        },\n", "                    },\n", "                ]\n", "            )\n", "        ]\n", "\n", "        # Call the vision language model\n", "        response = vision_llm.invoke(message)\n", "\n", "        # Append the extracted text\n", "        all_text += response.content + \"\\n\\n\"\n", "\n", "        return all_text.strip()\n", "    except Exception as e:\n", "        # You can choose to return an empty string or an error message.\n", "        error_msg = f\"Error extracting text: {str(e)}\"\n", "        print(error_msg)\n", "        return \"\"\n", "\n", "\n", "llm = ChatOpenAI(model=\"gpt-4o\")\n", "\n", "\n", "def divide(a: int, b: int) -> float:\n", "    \"\"\"Divide a by b.\"\"\"\n", "    return a / b\n", "\n", "\n", "tools = [\n", "    divide,\n", "    extract_text,\n", "]\n", "llm_with_tools = llm.bind_tools(tools, parallel_tool_calls=False)" ] }, { "cell_type": "markdown", "id": "3e7c17a2e155014e", "metadata": {}, "source": [ "Let's create our LLM and prompt it with the overall desired behaviour of the agent." 
] }, { "cell_type": "code", "execution_count": null, "id": "f31250bc1f61da81", "metadata": {}, "outputs": [], "source": [ "from typing import TypedDict, Annotated, Optional\n", "from langchain_core.messages import AnyMessage\n", "from langgraph.graph.message import add_messages\n", "\n", "\n", "class AgentState(TypedDict):\n", "    # The input document\n", "    input_file: Optional[str]  # Contains the file path (PNG type)\n", "    messages: Annotated[list[AnyMessage], add_messages]" ] }, { "cell_type": "code", "execution_count": null, "id": "3c4a736f9e55afa9", "metadata": {}, "outputs": [], "source": [ "from langchain_core.messages import HumanMessage, SystemMessage\n", "\n", "\n", "def assistant(state: AgentState):\n", "    # System message\n", "    textual_description_of_tool = \"\"\"\n", "extract_text(img_path: str) -> str:\n", "    Extract text from an image file using a multimodal model.\n", "\n", "    Args:\n", "        img_path: A local image file path (string).\n", "\n", "    Returns:\n", "        A single string containing the text extracted from the image.\n", "divide(a: int, b: int) -> float:\n", "    Divide a by b\n", "\"\"\"\n", "    image = state[\"input_file\"]\n", "    sys_msg = SystemMessage(content=f\"You are a helpful agent that can analyse some images and run some computations using the provided tools:\\n{textual_description_of_tool}\\nYou have access to some optional images. 
Currently the loaded image is: {image}\")\n", "\n", "    return {\"messages\": [llm_with_tools.invoke([sys_msg] + state[\"messages\"])], \"input_file\": state[\"input_file\"]}" ] }, { "cell_type": "markdown", "id": "6f1efedd943d8b1d", "metadata": {}, "source": [ "We define a `tools` node with our list of tools.\n", "\n", "The `assistant` node is just our model with the tools bound to it.\n", "\n", "We create a graph with the `assistant` and `tools` nodes.\n", "\n", "We add a `tools_condition` edge, which routes to `End` or to `tools` depending on whether the `assistant` calls a tool.\n", "\n", "Now, we add one new step:\n", "\n", "We connect the `tools` node back to the `assistant`, forming a loop.\n", "\n", "* After the `assistant` node executes, `tools_condition` checks whether the model's output is a tool call.\n", "* If it is, the flow is directed to the `tools` node.\n", "* The `tools` node connects back to `assistant`.\n", "* This loop continues as long as the model decides to call tools.\n", "* If the model's response is not a tool call, the flow is directed to END, terminating the process." 
] }, { "cell_type": "code", "execution_count": null, "id": "e013061de784638a", "metadata": {}, "outputs": [], "source": [ "from langgraph.graph import START, StateGraph\n", "from langgraph.prebuilt import ToolNode, tools_condition\n", "from IPython.display import Image, display\n", "\n", "# Graph\n", "builder = StateGraph(AgentState)\n", "\n", "# Define the nodes: these do the work\n", "builder.add_node(\"assistant\", assistant)\n", "builder.add_node(\"tools\", ToolNode(tools))\n", "\n", "# Define the edges: these determine how control flow moves\n", "builder.add_edge(START, \"assistant\")\n", "builder.add_conditional_edges(\n", "    \"assistant\",\n", "    # If the latest message (result) from the assistant is a tool call -> tools_condition routes to tools\n", "    # If the latest message (result) from the assistant is not a tool call -> tools_condition routes to END\n", "    tools_condition,\n", ")\n", "builder.add_edge(\"tools\", \"assistant\")\n", "react_graph = builder.compile()\n", "\n", "# Show the graph\n", "display(Image(react_graph.get_graph(xray=True).draw_mermaid_png()))" ] }, { "cell_type": "code", "execution_count": null, "id": "d3b0ba5be1a54aad", "metadata": {}, "outputs": [], "source": [ "messages = [HumanMessage(content=\"Divide 6790 by 5\")]\n", "\n", "messages = react_graph.invoke({\"messages\": messages, \"input_file\": None})" ] }, { "cell_type": "code", "execution_count": null, "id": "55eb0f1afd096731", "metadata": {}, "outputs": [], "source": [ "for m in messages['messages']:\n", "    m.pretty_print()" ] }, { "cell_type": "markdown", "id": "e0062c1b99cb4779", "metadata": {}, "source": [ "## Training program\n", "Mr. Wayne left a note with his training program for the week. 
I found a recipe for dinner, left in a note.\n", "\n", "You can find the document [HERE](https://huggingface.co/datasets/agents-course/course-images/blob/main/en/unit2/LangGraph/Batman_training_and_meals.png), so download it and place it in the local folder.\n", "\n", "![Training](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/LangGraph/Batman_training_and_meals.png)" ] }, { "cell_type": "code", "execution_count": null, "id": "2e166ebba82cfd2a", "metadata": {}, "outputs": [], "source": [ "messages = [HumanMessage(content=\"According to the note provided by Mr. Wayne in the provided image, what's the list of items I should buy for the dinner menu?\")]\n", "\n", "messages = react_graph.invoke({\"messages\": messages, \"input_file\": \"Batman_training_and_meals.png\"})" ] }, { "cell_type": "code", "execution_count": null, "id": "5bfd67af70b7dcf3", "metadata": {}, "outputs": [], "source": [ "for m in messages['messages']:\n", "    m.pretty_print()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.7" } }, "nbformat": 4, "nbformat_minor": 5 }