{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "gJXW_DgiSebM" }, "source": [ "# LangGraph and LangSmith - Agentic RAG Powered by LangChain\n", "\n", "In the following notebook we'll complete the following tasks:\n", "\n", "- 🤝 Breakout Room #1:\n", " 1. Install required libraries\n", " 2. Set Environment Variables\n", " 3. Creating our Tool Belt\n", " 4. Creating Our State\n", " 5. Creating and Compiling A Graph!\n", "\n", "- 🤝 Breakout Room #2:\n", " 1. Evaluating the LangGraph Application with LangSmith\n", " 2. Adding Helpfulness Check and \"Loop\" Limits\n", " 3. LangGraph for the \"Patterns\" of GenAI" ] }, { "cell_type": "markdown", "metadata": { "id": "djQ3nRAgoF67" }, "source": [ "# 🤝 Breakout Room #1" ] }, { "cell_type": "markdown", "metadata": { "id": "e7pQDUhUnIo8" }, "source": [ "## Part 1: LangGraph - Building Cyclic Applications with LangChain\n", "\n", "LangGraph is a tool that leverages LangChain Expression Language to build coordinated multi-actor and stateful applications that includes cyclic behaviour.\n", "\n", "### Why Cycles?\n", "\n", "In essence, we can think of a cycle in our graph as a more robust and customizable loop. It allows us to keep our application agent-forward while still giving the powerful functionality of traditional loops.\n", "\n", "Due to the inclusion of cycles over loops, we can also compose rather complex flows through our graph in a much more readable and natural fashion. Effectively allowing us to recreate application flowcharts in code in an almost 1-to-1 fashion.\n", "\n", "### Why LangGraph?\n", "\n", "Beyond the agent-forward approach - we can easily compose and combine traditional \"DAG\" (directed acyclic graph) chains with powerful cyclic behaviour due to the tight integration with LCEL. This means it's a natural extension to LangChain's core offerings!" ] }, { "cell_type": "markdown", "metadata": { "id": "3_fLDElOVoop" }, "source": [ "## Task 1: Dependencies\n", "\n", "We'll first install all our required libraries.\n", "\n", "> NOTE: If you're running this locally - please skip this step." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "KaVwN269EttM", "outputId": "ba50f775-3957-4d88-9a88-43acc6966dda" }, "outputs": [], "source": [ "#!pip install -qU langchain langchain_openai langchain-community langgraph arxiv" ] }, { "cell_type": "markdown", "metadata": { "id": "wujPjGJuoPwg" }, "source": [ "## Task 2: Environment Variables\n", "\n", "We'll want to set both our OpenAI API key and our LangSmith environment variables." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "Jdh8CoVWHRvs", "outputId": "3fa78560-393c-4ee5-b871-9886bf0d70f4" }, "outputs": [], "source": [ "import os\n", "import getpass\n", "\n", "os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "Jkla2fpx28QK", "outputId": "52d7ad22-fcb1-4abe-853b-216c55a12650" }, "outputs": [], "source": [ "os.environ[\"TAVILY_API_KEY\"] = getpass.getpass(\"TAVILY_API_KEY\")" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "Nv0glIDyHmRt", "outputId": "b69df90a-b4e1-4ddb-9de0-882d98b68ab2" }, "outputs": [], "source": [ "from uuid import uuid4\n", "\n", "os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n", "os.environ[\"LANGCHAIN_PROJECT\"] = f\"AIE6 - LangGraph - {uuid4().hex[0:8]}\"\n", "os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass(\"LangSmith API Key: \")" ] }, { "cell_type": "markdown", "metadata": { "id": "sBRyQmEAVzua" }, "source": [ "## Task 3: Creating our Tool Belt\n", "\n", "As is usually the case, we'll want to equip our agent with a toolbelt to help answer questions and add external knowledge.\n", "\n", "There's a tonne of tools in the [LangChain Community Repo](https://github.com/langchain-ai/langchain/tree/master/libs/community/langchain_community/tools) but we'll stick to a couple just so we can observe the cyclic nature of LangGraph in action!\n", "\n", "We'll leverage:\n", "\n", "- [Tavily Search Results](https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/tools/tavily_search/tool.py)\n", "- [Arxiv](https://github.com/langchain-ai/langchain/tree/master/libs/community/langchain_community/tools/arxiv)" ] }, { "cell_type": "markdown", "metadata": { "id": "2k6n_Dob2F46" }, "source": [ "#### 🏗️ Activity #1:\n", "\n", "Please add the tools to use into our toolbelt.\n", "\n", "> NOTE: Each tool in our toolbelt should be a method." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "id": "lAxaSvlfIeOg" }, "outputs": [], "source": [ "from langchain_community.tools.tavily_search import TavilySearchResults\n", "from langchain_community.tools.arxiv.tool import ArxivQueryRun\n", "\n", "tavily_tool = TavilySearchResults(max_results=5)\n", "\n", "tool_belt = [\n", " tavily_tool,\n", " ArxivQueryRun(),\n", "]" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[TavilySearchResults(api_wrapper=TavilySearchAPIWrapper(tavily_api_key=SecretStr('**********'))),\n", " ArxivQueryRun(api_wrapper=ArxivAPIWrapper(arxiv_search=, arxiv_exceptions=(, , ), top_k_results=3, ARXIV_MAX_QUERY_LENGTH=300, continue_on_failure=False, load_max_docs=100, load_all_available_meta=False, doc_content_chars_max=4000))]" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tool_belt" ] }, { "cell_type": "markdown", "metadata": { "id": "VI-C669ZYVI5" }, "source": [ "### Model\n", "\n", "Now we can set-up our model! We'll leverage the familiar OpenAI model suite for this example - but it's not *necessary* to use with LangGraph. 
LangGraph supports any LangChain-compatible chat model - though you might not find success with smaller models - so it's recommended that you stick with:\n", "\n", "- OpenAI's GPT-3.5 and GPT-4\n", "- Anthropic's Claude\n", "- Google's Gemini\n", "\n", "> NOTE: Because we're leveraging the OpenAI function calling API - we'll need to use OpenAI *for this specific example* (or any other service that exposes an OpenAI-style function calling API)." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "id": "QkNS8rNZJs4z" }, "outputs": [], "source": [ "from langchain_openai import ChatOpenAI\n", "\n", "model = ChatOpenAI(model=\"gpt-4o\", temperature=0)" ] }, { "cell_type": "markdown", "metadata": { "id": "Ugkj3GzuZpQv" }, "source": [ "Now that we have our model set up, let's \"put on the tool belt\" - which is to say: we'll bind our LangChain-formatted tools to the model in an OpenAI function-calling format." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "id": "4OdMqFafZ_0V" }, "outputs": [], "source": [ "model = model.bind_tools(tool_belt)" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "RunnableBinding(bound=ChatOpenAI(client=, async_client=, root_client=, root_async_client=, model_name='gpt-4o', temperature=0.0, model_kwargs={}, openai_api_key=SecretStr('**********')), kwargs={'tools': [{'type': 'function', 'function': {'name': 'tavily_search_results_json', 'description': 'A search engine optimized for comprehensive, accurate, and trusted results. Useful for when you need to answer questions about current events. Input should be a search query.', 'parameters': {'properties': {'query': {'description': 'search query to look up', 'type': 'string'}}, 'required': ['query'], 'type': 'object'}}}, {'type': 'function', 'function': {'name': 'arxiv', 'description': 'A wrapper around Arxiv.org Useful for when you need to answer questions about Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance, Statistics, Electrical Engineering, and Economics from scientific articles on arxiv.org. Input should be a search query.', 'parameters': {'properties': {'query': {'description': 'search query to look up', 'type': 'string'}}, 'required': ['query'], 'type': 'object'}}}]}, config={}, config_factories=[])" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model" ] }, { "cell_type": "markdown", "metadata": { "id": "ERzuGo6W18Lr" }, "source": [ "#### ❓ Question #1:\n", "\n", "How does the model determine which tool to use?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Answer\n", " - When tools are bound to a model (as seen in the line `model = model.bind_tools(tool_belt)`), the LLM receives information about each tool's name, description, and required parameters.\n", " - The model analyzes the user's query to understand what information is needed or what task needs to be performed.\n", " - Based on this understanding, the model decides which tool would be most appropriate to use by selecting from the available tools in its context.\n", " - In the function calling framework (like OpenAI's), the model outputs a structured format specifying which tool to call and with what parameters."
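, "\n", "\n", "We can see this selection process directly by invoking the bound model and inspecting what it returns. A minimal sketch (it reuses the `model` we bound above; the query and the example `id` values are illustrative):\n", "\n", "```python\n", "from langchain_core.messages import HumanMessage\n", "\n", "# The bound model returns an AIMessage; if it decided a tool is needed,\n", "# the structured request shows up in its `tool_calls` attribute.\n", "response = model.invoke([HumanMessage(content=\"What's the latest news about LangGraph?\")])\n", "\n", "print(response.tool_calls)\n", "# e.g. [{'name': 'tavily_search_results_json', 'args': {'query': '...'}, 'id': 'call_...', 'type': 'tool_call'}]\n", "```"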
] }, { "cell_type": "markdown", "metadata": { "id": "_296Ub96Z_H8" }, "source": [ "## Task 4: Putting the State in Stateful\n", "\n", "Earlier we used this phrasing:\n", "\n", "`coordinated multi-actor and stateful applications`\n", "\n", "So what does that \"stateful\" mean?\n", "\n", "To put it simply - we want to have some kind of object which we can pass around our application that holds information about what the current situation (state) is. Since our system will be constructed of many parts moving in a coordinated fashion - we want to be able to ensure we have some commonly understood idea of that state.\n", "\n", "LangGraph leverages a `StatefulGraph` which uses an `AgentState` object to pass information between the various nodes of the graph.\n", "\n", "There are more options than what we'll see below - but this `AgentState` object is one that is stored in a `TypedDict` with the key `messages` and the value is a `Sequence` of `BaseMessages` that will be appended to whenever the state changes.\n", "\n", "Let's think about a simple example to help understand exactly what this means (we'll simplify a great deal to try and clearly communicate what state is doing):\n", "\n", "1. We initialize our state object:\n", " - `{\"messages\" : []}`\n", "2. Our user submits a query to our application.\n", " - New State: `HumanMessage(#1)`\n", " - `{\"messages\" : [HumanMessage(#1)}`\n", "3. We pass our state object to an Agent node which is able to read the current state. It will use the last `HumanMessage` as input. It gets some kind of output which it will add to the state.\n", " - New State: `AgentMessage(#1, additional_kwargs {\"function_call\" : \"WebSearchTool\"})`\n", " - `{\"messages\" : [HumanMessage(#1), AgentMessage(#1, ...)]}`\n", "4. We pass our state object to a \"conditional node\" (more on this later) which reads the last state to determine if we need to use a tool - which it can determine properly because of our provided object!" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "id": "mxL9b_NZKUdL" }, "outputs": [], "source": [ "from typing import TypedDict, Annotated\n", "from langgraph.graph.message import add_messages\n", "import operator\n", "from langchain_core.messages import BaseMessage\n", "\n", "class AgentState(TypedDict):\n", " messages: Annotated[list, add_messages]" ] }, { "cell_type": "markdown", "metadata": { "id": "vWsMhfO9grLu" }, "source": [ "## Task 5: It's Graphing Time!\n", "\n", "Now that we have state, and we have tools, and we have an LLM - we can finally start making our graph!\n", "\n", "Let's take a second to refresh ourselves about what a graph is in this context.\n", "\n", "Graphs, also called networks in some circles, are a collection of connected objects.\n", "\n", "The objects in question are typically called nodes, or vertices, and the connections are called edges.\n", "\n", "Let's look at a simple graph.\n", "\n", "![image](https://i.imgur.com/2NFLnIc.png)\n", "\n", "Here, we're using the coloured circles to represent the nodes and the yellow lines to represent the edges. 
In this case, we're looking at a fully connected graph - where each node is connected by an edge to every other node.\n", "\n", "If we were to think about nodes in the context of LangGraph - we would think of a function, or an LCEL runnable.\n", "\n", "If we were to think about edges in the context of LangGraph - we might think of them as \"paths to take\" or \"where to pass our state object next\".\n", "\n", "Let's create some nodes and expand on our diagram.\n", "\n", "> NOTE: Due to the tight integration with LCEL - we can comfortably create our nodes in an async fashion!" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "id": "91flJWtZLUrl" }, "outputs": [], "source": [ "from langgraph.prebuilt import ToolNode\n", "\n", "def call_model(state):\n", " # read the accumulated message history from state\n", " messages = state[\"messages\"]\n", " response = model.invoke(messages)\n", " # return a partial state update - the add_messages reducer appends the response\n", " return {\"messages\" : [response]}\n", "\n", "tool_node = ToolNode(tool_belt)" ] }, { "cell_type": "markdown", "metadata": { "id": "2bwR7MgWj3Wg" }, "source": [ "Now we have two nodes in total:\n", "\n", "- `call_model` - a node that will...well...call the model\n", "- `tool_node` - a node which can call a tool\n", "\n", "Let's add them to our graph! We'll update our diagram along the way to keep track of what this looks like!\n" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "_vF4_lgtmQNo", "outputId": "a4384377-8f7a-415f-be1b-fee6169cb101" }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from langgraph.graph import StateGraph, END\n", "\n", "uncompiled_graph = StateGraph(AgentState)\n", "\n", "uncompiled_graph.add_node(\"agent\", call_model)\n", "uncompiled_graph.add_node(\"action\", tool_node)" ] }, { "cell_type": "markdown", "metadata": { "id": "b8CjRlbVmRpW" }, "source": [ "Let's look at what we have so far:\n", "\n", "![image](https://i.imgur.com/md7inqG.png)" ] }, { "cell_type": "markdown", "metadata": { "id": "uaXHpPeSnOWC" }, "source": [ "Next, we'll add our entrypoint. All our entrypoint does is indicate which node is called first." ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "YGCbaYqRnmiw", "outputId": "5351807c-2ac7-4316-a3a3-878abeacd114" }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "uncompiled_graph.set_entry_point(\"agent\")" ] }, { "cell_type": "markdown", "metadata": { "id": "BUsfGoSpoF9U" }, "source": [ "![image](https://i.imgur.com/wNixpJe.png)" ] }, { "cell_type": "markdown", "metadata": { "id": "0Q_pQgHmoW0M" }, "source": [ "Now we want to build a \"conditional edge\" which will use the output state of a node to determine which path to follow.\n", "\n", "We can help conceptualize this by thinking of our conditional edge as a conditional in a flowchart!\n", "\n", "Notice how our function simply checks whether the last message contains any `tool_calls`.\n", "\n", "Then we create an edge where the origin node is our agent node and our destination node is *either* the action node or the END (finish the graph).\n", "\n", "It's important to highlight that `add_conditional_edges` *can* take a mapping as a third parameter - but when it's omitted, as it is here, the return value of our conditional function is used as the destination directly. 
In this case `should_continue` returns either `\"action\"` - which routes the state object to our action node - or `END`, which finishes the graph." ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "1BZgb81VQf9o", "outputId": "73a07c15-5f0b-40f2-b033-38b57d056dd8" }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def should_continue(state):\n", " last_message = state[\"messages\"][-1]\n", "\n", " # if the model requested a tool call, route to the action node\n", " if last_message.tool_calls:\n", " return \"action\"\n", "\n", " # otherwise we're done - finish the graph\n", " return END\n", "\n", "uncompiled_graph.add_conditional_edges(\n", " \"agent\",\n", " should_continue\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "-Cvhcf4jp0Ce" }, "source": [ "Let's visualize what this looks like.\n", "\n", "![image](https://i.imgur.com/8ZNwKI5.png)" ] }, { "cell_type": "markdown", "metadata": { "id": "yKCjWJCkrJb9" }, "source": [ "Finally, we can add our last edge which will connect our action node to our agent node. This is because we *always* want our action node (which is used to call our tools) to return its output to our agent!" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "UvcgbHf1rIXZ", "outputId": "45d4bdd6-d6bb-4a1d-bb79-cad43c130bf2" }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "uncompiled_graph.add_edge(\"action\", \"agent\")" ] }, { "cell_type": "markdown", "metadata": { "id": "EiWDwBQtrw7Z" }, "source": [ "Let's look at the final visualization.\n", "\n", "![image](https://i.imgur.com/NWO7usO.png)" ] }, { "cell_type": "markdown", "metadata": { "id": "KYqDpErlsCsu" }, "source": [ "All that's left to do now is to compile our workflow - and we're off!" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "id": "zt9-KS8DpzNx" }, "outputs": [], "source": [ "compiled_graph = uncompiled_graph.compile()" ] }, { "cell_type": "markdown", "metadata": { "id": "xhNWIwBL1W4Q" }, "source": [ "#### ❓ Question #2:\n", "\n", "Is there any specific limit to how many times we can cycle?\n", "\n", "If not, how could we impose a limit to the number of cycles?"
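] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before we answer, here's a minimal sketch of what a *structural* limit could look like. It uses a hypothetical `loop_count` state key and threshold - neither of which is part of the graph we built above:\n", "\n", "```python\n", "from typing import TypedDict, Annotated\n", "\n", "from langgraph.graph import END\n", "from langgraph.graph.message import add_messages\n", "\n", "class BoundedAgentState(TypedDict):\n", "    messages: Annotated[list, add_messages]\n", "    loop_count: int  # hypothetical counter, incremented by the agent node each pass\n", "\n", "def should_continue_bounded(state):\n", "    # hard stop: finish the graph once we've cycled too many times\n", "    if state[\"loop_count\"] >= 5:\n", "        return END\n", "\n", "    if state[\"messages\"][-1].tool_calls:\n", "        return \"action\"\n", "\n", "    return END\n", "```\n", "\n", "For this to work, the agent node would also need to return `{\"loop_count\" : state[\"loop_count\"] + 1}` as part of its state update."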
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Answer\n", " - By default Langgraph has 25 super steps limit after which we will see `GraphRecursionError` from langgraph.\n", " - There are couple of ways to impose limit\n", " - Have a conditional edge that specifies a termination condition\n", " - this could be a business case condition\n", " - just a counter\n", " - Setting up `recursion_limit` in the invoke will limit recursion\n", " ```\n", " graph.invoke(inputs, {\"recursion_limit\": 3})\n", "\n", " ```\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "id": "VEYcTShCsPaa" }, "source": [ "## Using Our Graph\n", "\n", "Now that we've created and compiled our graph - we can call it *just as we'd call any other* `Runnable`!\n", "\n", "Let's try out a few examples to see how it fairs:" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "Qn4n37PQRPII", "outputId": "5eeedfae-089d-496e-e71f-071939fa5832" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Receiving update from node: 'agent'\n", "[AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_tsXMo9fY3k63iaeVgvVkRR6k', 'function': {'arguments': '{\"query\":\"current captain of the Winnipeg Jets 2023\"}', 'name': 'tavily_search_results_json'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 27, 'prompt_tokens': 162, 'total_tokens': 189, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-2024-08-06', 'system_fingerprint': 'fp_6b6e24b474', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-005e321a-543b-49aa-9e22-d03ecac04943-0', tool_calls=[{'name': 'tavily_search_results_json', 'args': {'query': 'current captain of the Winnipeg Jets 2023'}, 'id': 'call_tsXMo9fY3k63iaeVgvVkRR6k', 'type': 'tool_call'}], usage_metadata={'input_tokens': 162, 'output_tokens': 27, 'total_tokens': 189, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})]\n", "\n", "\n", "\n", "Receiving update from node: 'action'\n", "[ToolMessage(content='[{\"url\": \"https://www.hockey-reference.com/teams/WPG/2024.html\", \"content\": \"via Sports Logos.net\\\\n\\\\nAbout logos\\\\n\\\\n2023-24\\\\nWinnipeg Jets\\\\nRoster and Statistics\\\\n\\\\nRecord: 52-24-6 (110 points), Finished 2nd in NHL\\xa0Central Division\\\\t\\xa0(Schedule and Results)\\\\n\\\\nCoach: Rick Bowness (52-24-6)\\\\n\\\\nCaptain:\\\\nAdam Lowry\\\\n\\\\nPrimary Arena: Canada Life Centre\\\\n\\\\nGoals For: 259 (15th of 32), Goals Against: 198 (1st of 32)\\\\n\\\\tSRS: 0.69 (4th of 32),\\xa0\\\\n\\\\tSOS: -0.04 (29th of 32)\\\\n\\\\nPlayoffs:Lost First Round (4-1) to Colorado Avalanche\\\\n\\\\nPreseason Odds: Stanley Cup +5000; O/U: 91.5\\\\n\\\\nOn this page: [...] 4 | Josh Morrissey | 28 | D | 5 | 3 | 1 | 4 | -3 | 2 | 1 | 2 | 0 | 0 | 0 | 1 | 0 | 12 | 25.0 | 28 | 122:55 | 24:35 | 0 | 0 | | 12 | 4 | 2 | 2 | \\\\n5 | Brenden Dillon | 33 | D | 3 | 0 | 3 | 3 | 1 | 4 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 1 | 0.0 | 9 | 63:13 | 21:04 | 0 | 0 | | 6 | 12 | 0 | 0 | \\\\n6 | Tyler Toffoli | 31 | C | 5 | 2 | 0 | 2 | -2 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 8 | 25.0 | 18 | 70:23 | 14:05 | 4 | 1 | 80.0 | 2 | 8 | 0 | 1 | [...] 
1 | Mark Scheifele | 30 | C | 5 | 8 | 3 | 11 | 0 | -6 | \\\\n2 | Kyle Connor | 27 | LW | 5 | 9 | 3 | 8 | 0 | -2 | \\\\n3 | Gabriel Vilardi | 24 | C | 5 | 8 | 3 | 9 | 0 | -4 | \\\\n4 | Josh Morrissey | 28 | D | 5 | 8 | 3 | 8 | 0 | -3 | \\\\n5 | Brenden Dillon | 33 | D | 3 | 5 | 0 | 6 | 2 | 1 | \\\\n6 | Nino Niederreiter | 31 | RW | 5 | 2 | 0 | 7 | 0 | -5 | \\\\n7 | Mason Appleton | 28 | C | 5 | 3 | 0 | 7 | 0 | -4 | \\\\n8 | Tyler Toffoli | 31 | C | 5 | 2 | 0 | 4 | 0 | -2 |\"}, {\"url\": \"https://www.nytimes.com/athletic/4855976/2023/09/12/adam-lowry-jets-captain/\", \"content\": \"NHL\\\\n\\\\nJets name Adam Lowry as captain ahead of 2023-24 season: Why Winnipeg chose him\\\\n\\\\nThe Winnipeg Jets finally have a captain again. The team gave forward Adam Lowry the honor Tuesday, making him the third captain in franchise history since the Jets moved from Atlanta. Here’s what you need to know:\\\\n\\\\nWe’ll battle beside him any day 😤 pic.twitter.com/2eeP3tjtGh\\\\n\\\\n— Winnipeg Jets (@NHLJets) September 12, 2023\\\\n\\\\n\\\\n\\\\nThe Athletic’s instant analysis:\\\\n\\\\nWhy Lowry? [...] important, checking role. Still, if you need a superstar to be your team’s captain, Lowry isn’t it. —\\xa0Ates [...] assist in five playoff games.\"}, {\"url\": \"https://www.nhl.com/jets/player/adam-lowry-8476392\", \"content\": \"Adam Lowry was named captain of the Winnipeg Jets on Sept. 12, 2023, and at age 30 was planning to rely on his father, Dave Lowry.\"}, {\"url\": \"https://en.wikipedia.org/wiki/2023%E2%80%9324_Winnipeg_Jets_season\", \"content\": \"2023–24Winnipeg Jets\\\\nDivision | 2ndCentral\\\\nConference | 2ndWestern\\\\n2023–24 record | 52–24–6\\\\nHome record | 27–11–3\\\\nRoad record | 25–13–3\\\\nGoals for | 259\\\\nGoals against | 199\\\\nTeam information\\\\nGeneral manager | Kevin Cheveldayoff\\\\nCoach | Rick Bowness(Oct. 11 – Oct. 23, Nov. 24 – Apr. 30)Scott Arniel(interim, Oct. 23 – Nov. 24)\\\\nCaptain | Adam Lowry\\\\nAlternate captains | Josh MorrisseyMark Scheifele\\\\nArena | Canada Life Centre\\\\nAverage attendance | 13,447[1] [...] vteWinnipeg Jets\\\\nFormerly theAtlanta ThrashersFounded in1999Based inWinnipeg, Manitoba\\\\nFranchise | TeamGeneral managersCoachesPlayersCaptainsDraft picksexpansion draftSeasonsCurrent season\\\\nHistory | RecordsAward winnersBroadcasters\\\\nPersonnel | Owner(s)True North Sports & Entertainment(Mark Chipman, chairman)General managerKevin CheveldayoffHead coachScott ArnielTeam captainAdam LowryCurrent roster\\\\nArena | Canada Life Centre\\\\nAffiliates | AHLManitoba MooseECHLNorfolk Admirals [...] Player | GP | G | A | Pts | +/− | PIM\\\\nMark Scheifele | 74 | 25 | 47 | 72 | 19 | 57\\\\nJosh Morrissey | 81 | 10 | 59 | 69 | 34 | 44\\\\nKyle Connor | 65 | 34 | 27 | 61 | –6 | 6\\\\nNikolaj Ehlers | 82 | 25 | 36 | 61 | 27 | 29\\\\nCole Perfetti | 71 | 19 | 19 | 38 | 13 | 12\\\\nVladislav Namestnikov | 78 | 11 | 26 | 37 | 17 | 37\\\\nGabriel Vilardi | 47 | 22 | 14 | 36 | 11 | 14\\\\nMason Appleton | 82 | 14 | 22 | 36 | 16 | 26\\\\nAdam Lowry | 81 | 12 | 23 | 35 | 17 | 57\\\\nNino Niederreiter | 77 | 18 | 16 | 34 | 12 | 34\"}, {\"url\": \"https://www.nhlpa.com/news/2-36227/jets-make-centre-adam-lowry-new-captain-third-to-wear-c-in-winnipeg-history\", \"content\": \"Lowry is the third Jets captain since the team moved from Atlanta in 2011. 
After going without a captain for the 2022-23 season, Winnipeg chose the rugged centre over alternate captains Josh Morrissey and Mark Scheifele to succeed Blake Wheeler and Andrew Ladd.\\\\n\\\\n“It’s pretty special,” Lowry said at a press conference held at Canada Life Centre. “The FaceTime to my mom and seeing her reaction, and just seeing how proud she was, it was extremely special. [...] \\\\\"It’s an important step and an exciting one. I think Adam really embodies a lot of what the city of Winnipeg is all about — hard work, honest, blue collar, and comes to work each and every day.”\\\\n\\\\nCheveldayoff believes that the Jets\\' newest captain will help usher in a new era.\\\\n\\\\n\\\\\"We believe the time is right for Adam Lowry to assume this role and set the standard for the way the Winnipeg Jets will move forward with new leadership and new purpose.”\"}]', name='tavily_search_results_json', id='a365f21e-d91a-439c-bbf9-e48913578093', tool_call_id='call_tsXMo9fY3k63iaeVgvVkRR6k', artifact={'query': 'current captain of the Winnipeg Jets 2023', 'follow_up_questions': None, 'answer': None, 'images': [], 'results': [{'url': 'https://www.hockey-reference.com/teams/WPG/2024.html', 'title': '2023-24 Winnipeg Jets Roster and Statistics - Hockey-Reference.com', 'content': 'via Sports Logos.net\\n\\nAbout logos\\n\\n2023-24\\nWinnipeg Jets\\nRoster and Statistics\\n\\nRecord: 52-24-6 (110 points), Finished 2nd in NHL\\xa0Central Division\\t\\xa0(Schedule and Results)\\n\\nCoach: Rick Bowness (52-24-6)\\n\\nCaptain:\\nAdam Lowry\\n\\nPrimary Arena: Canada Life Centre\\n\\nGoals For: 259 (15th of 32), Goals Against: 198 (1st of 32)\\n\\tSRS: 0.69 (4th of 32),\\xa0\\n\\tSOS: -0.04 (29th of 32)\\n\\nPlayoffs:Lost First Round (4-1) to Colorado Avalanche\\n\\nPreseason Odds: Stanley Cup +5000; O/U: 91.5\\n\\nOn this page: [...] 4 | Josh Morrissey | 28 | D | 5 | 3 | 1 | 4 | -3 | 2 | 1 | 2 | 0 | 0 | 0 | 1 | 0 | 12 | 25.0 | 28 | 122:55 | 24:35 | 0 | 0 | | 12 | 4 | 2 | 2 | \\n5 | Brenden Dillon | 33 | D | 3 | 0 | 3 | 3 | 1 | 4 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 1 | 0.0 | 9 | 63:13 | 21:04 | 0 | 0 | | 6 | 12 | 0 | 0 | \\n6 | Tyler Toffoli | 31 | C | 5 | 2 | 0 | 2 | -2 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 8 | 25.0 | 18 | 70:23 | 14:05 | 4 | 1 | 80.0 | 2 | 8 | 0 | 1 | [...] 1 | Mark Scheifele | 30 | C | 5 | 8 | 3 | 11 | 0 | -6 | \\n2 | Kyle Connor | 27 | LW | 5 | 9 | 3 | 8 | 0 | -2 | \\n3 | Gabriel Vilardi | 24 | C | 5 | 8 | 3 | 9 | 0 | -4 | \\n4 | Josh Morrissey | 28 | D | 5 | 8 | 3 | 8 | 0 | -3 | \\n5 | Brenden Dillon | 33 | D | 3 | 5 | 0 | 6 | 2 | 1 | \\n6 | Nino Niederreiter | 31 | RW | 5 | 2 | 0 | 7 | 0 | -5 | \\n7 | Mason Appleton | 28 | C | 5 | 3 | 0 | 7 | 0 | -4 | \\n8 | Tyler Toffoli | 31 | C | 5 | 2 | 0 | 4 | 0 | -2 |', 'score': 0.90432173, 'raw_content': None}, {'url': 'https://www.nytimes.com/athletic/4855976/2023/09/12/adam-lowry-jets-captain/', 'title': 'Jets name Adam Lowry as captain ahead of 2023-24 season', 'content': 'NHL\\n\\nJets name Adam Lowry as captain ahead of 2023-24 season: Why Winnipeg chose him\\n\\nThe Winnipeg Jets finally have a captain again. The team gave forward Adam Lowry the honor Tuesday, making him the third captain in franchise history since the Jets moved from Atlanta. Here’s what you need to know:\\n\\nWe’ll battle beside him any day 😤 pic.twitter.com/2eeP3tjtGh\\n\\n— Winnipeg Jets (@NHLJets) September 12, 2023\\n\\n\\n\\nThe Athletic’s instant analysis:\\n\\nWhy Lowry? [...] important, checking role. 
Still, if you need a superstar to be your team’s captain, Lowry isn’t it. —\\xa0Ates [...] assist in five playoff games.', 'score': 0.89501935, 'raw_content': None}, {'url': 'https://www.nhl.com/jets/player/adam-lowry-8476392', 'title': 'Adam Lowry Stats And News | NHL.com', 'content': 'Adam Lowry was named captain of the Winnipeg Jets on Sept. 12, 2023, and at age 30 was planning to rely on his father, Dave Lowry.', 'score': 0.8947989, 'raw_content': None}, {'url': 'https://en.wikipedia.org/wiki/2023%E2%80%9324_Winnipeg_Jets_season', 'title': '2023–24 Winnipeg Jets season - Wikipedia', 'content': '2023–24Winnipeg Jets\\nDivision | 2ndCentral\\nConference | 2ndWestern\\n2023–24 record | 52–24–6\\nHome record | 27–11–3\\nRoad record | 25–13–3\\nGoals for | 259\\nGoals against | 199\\nTeam information\\nGeneral manager | Kevin Cheveldayoff\\nCoach | Rick Bowness(Oct. 11 – Oct. 23, Nov. 24 – Apr. 30)Scott Arniel(interim, Oct. 23 – Nov. 24)\\nCaptain | Adam Lowry\\nAlternate captains | Josh MorrisseyMark Scheifele\\nArena | Canada Life Centre\\nAverage attendance | 13,447[1] [...] vteWinnipeg Jets\\nFormerly theAtlanta ThrashersFounded in1999Based inWinnipeg, Manitoba\\nFranchise | TeamGeneral managersCoachesPlayersCaptainsDraft picksexpansion draftSeasonsCurrent season\\nHistory | RecordsAward winnersBroadcasters\\nPersonnel | Owner(s)True North Sports & Entertainment(Mark Chipman, chairman)General managerKevin CheveldayoffHead coachScott ArnielTeam captainAdam LowryCurrent roster\\nArena | Canada Life Centre\\nAffiliates | AHLManitoba MooseECHLNorfolk Admirals [...] Player | GP | G | A | Pts | +/− | PIM\\nMark Scheifele | 74 | 25 | 47 | 72 | 19 | 57\\nJosh Morrissey | 81 | 10 | 59 | 69 | 34 | 44\\nKyle Connor | 65 | 34 | 27 | 61 | –6 | 6\\nNikolaj Ehlers | 82 | 25 | 36 | 61 | 27 | 29\\nCole Perfetti | 71 | 19 | 19 | 38 | 13 | 12\\nVladislav Namestnikov | 78 | 11 | 26 | 37 | 17 | 37\\nGabriel Vilardi | 47 | 22 | 14 | 36 | 11 | 14\\nMason Appleton | 82 | 14 | 22 | 36 | 16 | 26\\nAdam Lowry | 81 | 12 | 23 | 35 | 17 | 57\\nNino Niederreiter | 77 | 18 | 16 | 34 | 12 | 34', 'score': 0.89053273, 'raw_content': None}, {'url': 'https://www.nhlpa.com/news/2-36227/jets-make-centre-adam-lowry-new-captain-third-to-wear-c-in-winnipeg-history', 'title': \"Jets make centre Adam Lowry new captain, third to wear 'C' in ...\", 'content': 'Lowry is the third Jets captain since the team moved from Atlanta in 2011. After going without a captain for the 2022-23 season, Winnipeg chose the rugged centre over alternate captains Josh Morrissey and Mark Scheifele to succeed Blake Wheeler and Andrew Ladd.\\n\\n“It’s pretty special,” Lowry said at a press conference held at Canada Life Centre. “The FaceTime to my mom and seeing her reaction, and just seeing how proud she was, it was extremely special. [...] \"It’s an important step and an exciting one. I think Adam really embodies a lot of what the city of Winnipeg is all about — hard work, honest, blue collar, and comes to work each and every day.”\\n\\nCheveldayoff believes that the Jets\\' newest captain will help usher in a new era.\\n\\n\"We believe the time is right for Adam Lowry to assume this role and set the standard for the way the Winnipeg Jets will move forward with new leadership and new purpose.”', 'score': 0.8893851, 'raw_content': None}], 'response_time': 2.33})]\n", "\n", "\n", "\n", "Receiving update from node: 'agent'\n", "[AIMessage(content='The current captain of the Winnipeg Jets is Adam Lowry. 
He was named captain on September 12, 2023.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 27, 'prompt_tokens': 2020, 'total_tokens': 2047, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-2024-08-06', 'system_fingerprint': 'fp_6b6e24b474', 'finish_reason': 'stop', 'logprobs': None}, id='run-bd0ab516-98de-4caf-843b-fc2a802b4f15-0', usage_metadata={'input_tokens': 2020, 'output_tokens': 27, 'total_tokens': 2047, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})]\n", "\n", "\n", "\n" ] } ], "source": [ "from langchain_core.messages import HumanMessage\n", "\n", "inputs = {\"messages\" : [HumanMessage(content=\"Who is the current captain of the Winnipeg Jets?\")]}\n", "\n", "async for chunk in compiled_graph.astream(inputs, stream_mode=\"updates\"):\n", " for node, values in chunk.items():\n", " print(f\"Receiving update from node: '{node}'\")\n", " print(values[\"messages\"])\n", " print(\"\\n\\n\")" ] }, { "cell_type": "markdown", "metadata": { "id": "DBHnUtLSscRr" }, "source": [ "Let's look at what happened:\n", "\n", "1. Our state object was populated with our request\n", "2. The state object was passed into our entry point (agent node) and the agent node added an `AIMessage` to the state object and passed it along the conditional edge\n", "3. The conditional edge received the state object, found `tool_calls` on the last message, and sent the state object to the action node\n", "4. The action node executed the requested tool and added its output (a `ToolMessage`) to the state object, then passed it along the edge back to the agent node\n", "5. The agent node added a response to the state object and passed it along the conditional edge\n", "6. The conditional edge received the state object, found no `tool_calls` on the last message, and passed the state object to END where we see it output in the cell above!"
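, "\n", "\n", "As an aside - because the compiled graph is a `Runnable`, streaming is optional. A minimal sketch (reusing the `inputs` dict from the cell above) of grabbing just the final state with a plain `invoke`:\n", "\n", "```python\n", "# invoke runs the graph to completion and returns the final state object\n", "final_state = compiled_graph.invoke(inputs)\n", "\n", "# the last message in the final state holds the agent's answer\n", "print(final_state[\"messages\"][-1].content)\n", "```\n", "\n", "Now let's look at an example that shows multiple tool usage - all with the same flow!"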
] }, { "cell_type": "code", "execution_count": 44, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "afv2BuEsV5JG", "outputId": "ff009536-d281-4a56-c126-9cd245352bfe" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Receiving update from node: 'agent'\n", "[AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_pPveli1xNGa6G9xZEKSFJ4Ux', 'function': {'arguments': '{\"query\":\"QLoRA\"}', 'name': 'arxiv'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 17, 'prompt_tokens': 178, 'total_tokens': 195, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-2024-08-06', 'system_fingerprint': 'fp_6b6e24b474', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-0a66a0cc-3904-4fca-a870-0e3decad4e69-0', tool_calls=[{'name': 'arxiv', 'args': {'query': 'QLoRA'}, 'id': 'call_pPveli1xNGa6G9xZEKSFJ4Ux', 'type': 'tool_call'}], usage_metadata={'input_tokens': 178, 'output_tokens': 17, 'total_tokens': 195, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})]\n", "\n", "\n", "\n", "Receiving update from node: 'action'\n", "Tool Used: arxiv\n", "[ToolMessage(content='Published: 2023-05-23\\nTitle: QLoRA: Efficient Finetuning of Quantized LLMs\\nAuthors: Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer\\nSummary: We present QLoRA, an efficient finetuning approach that reduces memory usage\\nenough to finetune a 65B parameter model on a single 48GB GPU while preserving\\nfull 16-bit finetuning task performance. QLoRA backpropagates gradients through\\na frozen, 4-bit quantized pretrained language model into Low Rank\\nAdapters~(LoRA). Our best model family, which we name Guanaco, outperforms all\\nprevious openly released models on the Vicuna benchmark, reaching 99.3% of the\\nperformance level of ChatGPT while only requiring 24 hours of finetuning on a\\nsingle GPU. QLoRA introduces a number of innovations to save memory without\\nsacrificing performance: (a) 4-bit NormalFloat (NF4), a new data type that is\\ninformation theoretically optimal for normally distributed weights (b) double\\nquantization to reduce the average memory footprint by quantizing the\\nquantization constants, and (c) paged optimziers to manage memory spikes. We\\nuse QLoRA to finetune more than 1,000 models, providing a detailed analysis of\\ninstruction following and chatbot performance across 8 instruction datasets,\\nmultiple model types (LLaMA, T5), and model scales that would be infeasible to\\nrun with regular finetuning (e.g. 33B and 65B parameter models). Our results\\nshow that QLoRA finetuning on a small high-quality dataset leads to\\nstate-of-the-art results, even when using smaller models than the previous\\nSoTA. We provide a detailed analysis of chatbot performance based on both human\\nand GPT-4 evaluations showing that GPT-4 evaluations are a cheap and reasonable\\nalternative to human evaluation. Furthermore, we find that current chatbot\\nbenchmarks are not trustworthy to accurately evaluate the performance levels of\\nchatbots. A lemon-picked analysis demonstrates where Guanaco fails compared to\\nChatGPT. 
We release all of our models and code, including CUDA kernels for\\n4-bit training.\\n\\nPublished: 2024-05-27\\nTitle: Accurate LoRA-Finetuning Quantization of LLMs via Information Retention\\nAuthors: Haotong Qin, Xudong Ma, Xingyu Zheng, Xiaoyang Li, Yang Zhang, Shouda Liu, Jie Luo, Xianglong Liu, Michele Magno\\nSummary: The LoRA-finetuning quantization of LLMs has been extensively studied to\\nobtain accurate yet compact LLMs for deployment on resource-constrained\\nhardware. However, existing methods cause the quantized LLM to severely degrade\\nand even fail to benefit from the finetuning of LoRA. This paper proposes a\\nnovel IR-QLoRA for pushing quantized LLMs with LoRA to be highly accurate\\nthrough information retention. The proposed IR-QLoRA mainly relies on two\\ntechnologies derived from the perspective of unified information: (1)\\nstatistics-based Information Calibration Quantization allows the quantized\\nparameters of LLM to retain original information accurately; (2)\\nfinetuning-based Information Elastic Connection makes LoRA utilizes elastic\\nrepresentation transformation with diverse information. Comprehensive\\nexperiments show that IR-QLoRA can significantly improve accuracy across LLaMA\\nand LLaMA2 families under 2-4 bit-widths, e.g., 4- bit LLaMA-7B achieves 1.4%\\nimprovement on MMLU compared with the state-of-the-art methods. The significant\\nperformance gain requires only a tiny 0.31% additional time consumption,\\nrevealing the satisfactory efficiency of our IR-QLoRA. We highlight that\\nIR-QLoRA enjoys excellent versatility, compatible with various frameworks\\n(e.g., NormalFloat and Integer quantization) and brings general accuracy gains.\\nThe code is available at https://github.com/htqin/ir-qlora.\\n\\nPublished: 2025-02-05\\nTitle: Resource-Efficient & Effective Code Summarization\\nAuthors: Saima Afrin, Joseph Call, Khai-Nguyen Nguyen, Oscar Chaparro, Antonio Mastropaolo\\nSummary: Code Language Models (CLMs) have demonstrated high effectiveness in\\nautomating software engineering tasks such as bug fixing, code generation, and\\ncode documentation. 
This ', name='arxiv', id='4bddbe8d-f983-4bd9-b008-5397ff09a25a', tool_call_id='call_pPveli1xNGa6G9xZEKSFJ4Ux')]\n", "\n", "\n", "\n", "Receiving update from node: 'agent'\n", "[AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_fHCfpl4JD6woJUml6HdsViis', 'function': {'arguments': '{\"query\": \"Tim Dettmers latest tweet\"}', 'name': 'tavily_search_results_json'}, 'type': 'function'}, {'id': 'call_lFMs0M4YuAXrsHuJazIwNiKU', 'function': {'arguments': '{\"query\": \"Artidoro Pagnoni latest tweet\"}', 'name': 'tavily_search_results_json'}, 'type': 'function'}, {'id': 'call_Q1imAngZGaBL2Gr7ezZP32UK', 'function': {'arguments': '{\"query\": \"Ari Holtzman latest tweet\"}', 'name': 'tavily_search_results_json'}, 'type': 'function'}, {'id': 'call_qcArCrzpbZ8cBISYcP3pI9TL', 'function': {'arguments': '{\"query\": \"Luke Zettlemoyer latest tweet\"}', 'name': 'tavily_search_results_json'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 113, 'prompt_tokens': 1148, 'total_tokens': 1261, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-2024-08-06', 'system_fingerprint': 'fp_6b6e24b474', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-595b1c05-af15-41b7-8107-98d922d7806b-0', tool_calls=[{'name': 'tavily_search_results_json', 'args': {'query': 'Tim Dettmers latest tweet'}, 'id': 'call_fHCfpl4JD6woJUml6HdsViis', 'type': 'tool_call'}, {'name': 'tavily_search_results_json', 'args': {'query': 'Artidoro Pagnoni latest tweet'}, 'id': 'call_lFMs0M4YuAXrsHuJazIwNiKU', 'type': 'tool_call'}, {'name': 'tavily_search_results_json', 'args': {'query': 'Ari Holtzman latest tweet'}, 'id': 'call_Q1imAngZGaBL2Gr7ezZP32UK', 'type': 'tool_call'}, {'name': 'tavily_search_results_json', 'args': {'query': 'Luke Zettlemoyer latest tweet'}, 'id': 'call_qcArCrzpbZ8cBISYcP3pI9TL', 'type': 'tool_call'}], usage_metadata={'input_tokens': 1148, 'output_tokens': 113, 'total_tokens': 1261, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})]\n", "\n", "\n", "\n", "Receiving update from node: 'action'\n", "Tool Used: tavily_search_results_json\n", "[ToolMessage(content='[{\"url\": \"https://threadreaderapp.com/user/Tim_Dettmers\", \"content\": \"Oct 8, 2021 • 13 tweets • 5 min read. I am excited to share my latest work: 8-bit optimizers – a replacement for regular optimizers. Faster , 75% less\"}, {\"url\": \"https://timdettmers.com/\", \"content\": \"Filed Under: Academia, PhD Life Tagged With: Advisors, Grad school, PhD\\\\n\\\\nTPUs vs GPUs for Transformers (BERT)\\\\n\\\\n2018-10-17 by Tim Dettmers 26 Comments [...] Filed Under: Deep Learning, Hardware Tagged With: AMD, CPU, High Performance Computing, Matrix Multiplication, Parallel Computing, PCIe Lanes, Sparse Training\\\\n\\\\nLLM.int8() and Emergent Features\\\\n\\\\n2022-08-17 by Tim Dettmers 13 Comments [...] [Read more…] about How to Choose Your Grad School\\\\n\\\\nFiled Under: Academia, PhD Life Tagged With: Advisors, Grad school, PhD\\\\n\\\\nOn Creativity in Academia\\\\n\\\\n2019-09-03 by Tim Dettmers 5 Comments\"}, {\"url\": \"https://twitter.com/Tim_Dettmers\", \"content\": \"Perplexity Deep Research is quite close to OpenAI o3 on the Humanity Last Exam Benchmark despite being an order of magnitude faster and cheaper. 
This is\"}, {\"url\": \"https://developer.nvidia.com/blog/author/tdettmers/\", \"content\": \"He runs a blog about deep learning and takes part in Kaggle data science competitions where he has reached a world rank of 63. Follow @Tim_Dettmers on Twitter\"}, {\"url\": \"https://threadreaderapp.com/thread/1666076553665744896.html\", \"content\": \"Make a small donation by buying us coffee ($5) or help with server cost ($10)\\\\n\\\\nOr Donate anonymously using crypto!\\\\n\\\\nEthereum\\\\n\\\\n0xfe58350B80634f60Fa6Dc149a72b4DFbc17D341E\\\\ncopy\\\\n\\\\nBitcoin\\\\n\\\\n3ATGMxNzCUFzxpMCHL5sWSt4DVtS8UqXpi\\\\ncopy\\\\n\\\\nThank you for your support!\\\\n\\\\nSend Email!\\\\n\\\\nEmail the whole thread instead of just a link! [...] More from @Tim_Dettmers\\\\n\\\\nTim Dettmers\\\\n\\\\nTim Dettmers\\\\n\\\\nTim Dettmers\\\\n\\\\nTim Dettmers\\\\n\\\\nTim Dettmers\\\\n\\\\nTim Dettmers\\\\n\\\\nDid Thread Reader help you today?\\\\n\\\\nSupport us! We are indie developers!\\\\n\\\\nThis site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.\\\\n\\\\nBecome a Premium Member ($3/month or $30/year) and get exclusive features!\\\\n\\\\nBecome Premium\\\\n\\\\nDon\\'t want to be a Premium member but still want to support us?\"}]', name='tavily_search_results_json', id='76e3518a-bc4b-4e3a-af99-c2ba113823af', tool_call_id='call_fHCfpl4JD6woJUml6HdsViis', artifact={'query': 'Tim Dettmers latest tweet', 'follow_up_questions': None, 'answer': None, 'images': [], 'results': [{'url': 'https://threadreaderapp.com/user/Tim_Dettmers', 'title': \"Tim Dettmers's Threads - Thread Reader App\", 'content': 'Oct 8, 2021 • 13 tweets • 5 min read. I am excited to share my latest work: 8-bit optimizers – a replacement for regular optimizers. Faster , 75% less', 'score': 0.73138535, 'raw_content': None}, {'url': 'https://timdettmers.com/', 'title': 'Tim Dettmers — Making deep learning accessible.', 'content': 'Filed Under: Academia, PhD Life Tagged With: Advisors, Grad school, PhD\\n\\nTPUs vs GPUs for Transformers (BERT)\\n\\n2018-10-17 by Tim Dettmers 26 Comments [...] Filed Under: Deep Learning, Hardware Tagged With: AMD, CPU, High Performance Computing, Matrix Multiplication, Parallel Computing, PCIe Lanes, Sparse Training\\n\\nLLM.int8() and Emergent Features\\n\\n2022-08-17 by Tim Dettmers 13 Comments [...] [Read more…] about How to Choose Your Grad School\\n\\nFiled Under: Academia, PhD Life Tagged With: Advisors, Grad school, PhD\\n\\nOn Creativity in Academia\\n\\n2019-09-03 by Tim Dettmers 5 Comments', 'score': 0.6649108, 'raw_content': None}, {'url': 'https://twitter.com/Tim_Dettmers', 'title': 'Tim Dettmers (@Tim_Dettmers) / X', 'content': 'Perplexity Deep Research is quite close to OpenAI o3 on the Humanity Last Exam Benchmark despite being an order of magnitude faster and cheaper. This is', 'score': 0.6147415, 'raw_content': None}, {'url': 'https://developer.nvidia.com/blog/author/tdettmers/', 'title': 'Author: Tim Dettmers | NVIDIA Technical Blog', 'content': 'He runs a blog about deep learning and takes part in Kaggle data science competitions where he has reached a world rank of 63. 
Follow @Tim_Dettmers on Twitter', 'score': 0.58948064, 'raw_content': None}, {'url': 'https://threadreaderapp.com/thread/1666076553665744896.html', 'title': 'Thread by @Tim_Dettmers on Thread Reader App', 'content': \"Make a small donation by buying us coffee ($5) or help with server cost ($10)\\n\\nOr Donate anonymously using crypto!\\n\\nEthereum\\n\\n0xfe58350B80634f60Fa6Dc149a72b4DFbc17D341E\\ncopy\\n\\nBitcoin\\n\\n3ATGMxNzCUFzxpMCHL5sWSt4DVtS8UqXpi\\ncopy\\n\\nThank you for your support!\\n\\nSend Email!\\n\\nEmail the whole thread instead of just a link! [...] More from @Tim_Dettmers\\n\\nTim Dettmers\\n\\nTim Dettmers\\n\\nTim Dettmers\\n\\nTim Dettmers\\n\\nTim Dettmers\\n\\nTim Dettmers\\n\\nDid Thread Reader help you today?\\n\\nSupport us! We are indie developers!\\n\\nThis site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.\\n\\nBecome a Premium Member ($3/month or $30/year) and get exclusive features!\\n\\nBecome Premium\\n\\nDon't want to be a Premium member but still want to support us?\", 'score': 0.5492786, 'raw_content': None}], 'response_time': 2.39}), ToolMessage(content='[{\"url\": \"https://x.com/artidoropagnoni?lang=en\", \"content\": \"Artidoro Pagnoni\\'s posts ... 9h. Meta FAIR is releasing several new research artifacts on our road to advanced machine intelligence (AMI). These latest\"}, {\"url\": \"https://artidoro.github.io/\", \"content\": \"I have recently developed the Byte Latent Transformer, a new architecture that efficiently learns from raw byte data unlocking a new scaling dimension and paving the path towards universal byte models. With QLoRA, I reduced finetuning memory requirements by 15x and showed how to approach ChatGPT 3.5 performance in 24h on a single GPU. [...] Previously, I have also worked on sythetic data augmentation for improved controllability of generation systems, investigated language models’ reasoning and world modeling abilities, and evaluated their factual errors, as well as societal challenge associated with their use. [...] Artidoro Pagnoni\\\\n\\\\nPhD student in NLP at the University of Washington\\\\n\\\\nCiao!\\\\n\\\\nI am a final-year PhD student in Computer Science at the University of Washington, advised by Luke Zettlemoyer, and a visiting researcher at Meta. My research focuses on resource efficiency and improving LLM scaling trends.\"}, {\"url\": \"https://twitter.com/ArtidoroPagnoni/highlights\", \"content\": \"Artidoro Pagnoni\\'s Highlights ... Introducing the Byte Latent Transformer (BLT) – An LLM architecture that scales better than Llama 3 using byte-patches instead\"}, {\"url\": \"https://twitter.com/apagnonipagnoni?lang=ms\", \"content\": \"Don\\'t let climate change deniers derail progress in the global effort to #ActOnClimate: http://ofa.bo/e5FW. Imej. 204. 565. 1K. Artidoro Pagnoni menyiarkan\"}, {\"url\": \"https://x.com/ArtidoroPagnoni/status/1867601413741981804\", \"content\": \"Artidoro Pagnoni · @ArtidoroPagnoni. Introducing the Byte Latent Transformer (BLT) – An LLM architecture that scales better than Llama 3\"}]', name='tavily_search_results_json', id='7dd9c8e7-8a4c-4826-a41a-fcbd99ba7bbf', tool_call_id='call_lFMs0M4YuAXrsHuJazIwNiKU', artifact={'query': 'Artidoro Pagnoni latest tweet', 'follow_up_questions': None, 'answer': None, 'images': [], 'results': [{'url': 'https://x.com/artidoropagnoni?lang=en', 'title': 'Artidoro Pagnoni (@ArtidoroPagnoni) / X', 'content': \"Artidoro Pagnoni's posts ... 9h. 
Meta FAIR is releasing several new research artifacts on our road to advanced machine intelligence (AMI). These latest\", 'score': 0.71137124, 'raw_content': None}, {'url': 'https://artidoro.github.io/', 'title': 'Artidoro Pagnoni: Ciao!', 'content': 'I have recently developed the Byte Latent Transformer, a new architecture that efficiently learns from raw byte data unlocking a new scaling dimension and paving the path towards universal byte models. With QLoRA, I reduced finetuning memory requirements by 15x and showed how to approach ChatGPT 3.5 performance in 24h on a single GPU. [...] Previously, I have also worked on sythetic data augmentation for improved controllability of generation systems, investigated language models’ reasoning and world modeling abilities, and evaluated their factual errors, as well as societal challenge associated with their use. [...] Artidoro Pagnoni\\n\\nPhD student in NLP at the University of Washington\\n\\nCiao!\\n\\nI am a final-year PhD student in Computer Science at the University of Washington, advised by Luke Zettlemoyer, and a visiting researcher at Meta. My research focuses on resource efficiency and improving LLM scaling trends.', 'score': 0.64746445, 'raw_content': None}, {'url': 'https://twitter.com/ArtidoroPagnoni/highlights', 'title': 'Highlights by Artidoro Pagnoni (@ArtidoroPagnoni) / X', 'content': \"Artidoro Pagnoni's Highlights ... Introducing the Byte Latent Transformer (BLT) – An LLM architecture that scales better than Llama 3 using byte-patches instead\", 'score': 0.64719695, 'raw_content': None}, {'url': 'https://twitter.com/apagnonipagnoni?lang=ms', 'title': 'Artidoro Pagnoni (@ApagnoniPagnoni) / X', 'content': \"Don't let climate change deniers derail progress in the global effort to #ActOnClimate: http://ofa.bo/e5FW. Imej. 204. 565. 1K. Artidoro Pagnoni menyiarkan\", 'score': 0.61168414, 'raw_content': None}, {'url': 'https://x.com/ArtidoroPagnoni/status/1867601413741981804', 'title': 'Artidoro Pagnoni - X', 'content': 'Artidoro Pagnoni · @ArtidoroPagnoni. Introducing the Byte Latent Transformer (BLT) – An LLM architecture that scales better than Llama 3', 'score': 0.60274047, 'raw_content': None}], 'response_time': 1.77}), ToolMessage(content='[{\"url\": \"https://x.com/aaholtzman?lang=en\", \"content\": \"In other news, I\\'ll be joining @UChicagoCS and @DSI_UChicago in 2024 as an assistant professor and doing a postdoc @Meta in the meantime! I\\'m at\"}, {\"url\": \"https://twitter.com/universeinanegg/status/1677787350586228738\", \"content\": \"Last but not least, thank you to all folks who we had fantastic discussions with and who gave super insightful feedback!\"}, {\"url\": \"https://x.com/universeinanegg?lang=en\", \"content\": \"Ari Holtzman\\'s posts ... If you want a respite from OpenAI drama, how about joining academia? I\\'m starting Conceptualization Lab, recruiting PhDs & Postdocs! We\"}, {\"url\": \"https://aclanthology.org/2024.lrec-main.986.pdf\", \"content\": \"Rowan Zellers, Ari Holtzman, Hannah Rashkin, Yonatan Bisk, Ali Farhadi, Franziska Roesner, and Yejin Choi. 2019. Defending against neural fake news. In Advances in Neural Information Processing Systems, volume 32. Curran Asso-ciates, Inc.\\\\nXinyi Zhou and Reza Zafarani. 2020. A survey of fake news: Fundamental theories, detection methods, and opportunities. ACM Computing Surveys (CSUR), 53(5):1–40. [...] 
Query-2 arma 3 russia ukraine lang:en -is:retweet 2021-12-21 2022-04-06 100 Query-3 for Other russia ukraine war video lang:en (has:media OR has:geo) -is:retweet 2021-12-21 2022-04-06 50 Query-4 for Other russia ukraine war video lang:en -is:retweet 2021-12-21 2022-04-06 50 Table 6: An example event in the dataset. A part of URL is cropped due to space constraints. [...] 2022-04 2022-05 0.000 0.001 0.002 0.003 0.004 0.005 0.006 Density EN-Covid Event EN12 EN13 EN14 EN15 EN16 EN17 EN18 EN19 EN20 (b) COVID-19 in English 2020-09 2020-11 2021-01 2021-03 2021-05 2021-07 2021-09 2021-11 2022-01 2022-03 2022-05 0.00 0.02 0.04 0.06 0.08 0.10 Density EN-Immigration EN21 EN22 EN23 EN24 EN25 EN26 EN27 EN28 EN29 EN30 (c) Immigration in English 2022-02-01 2022-02-15 2022-03-01 2022-03-15 2022-04-01 2022-04-15 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 Density TR-Ukraine TR01\"}, {\"url\": \"https://twitter.com/universeinanegg/status/1764113353113321929\", \"content\": \"My biggest challenge in life has consistently been: what skills that I don\\'t have a natural talent for are worth honing?\"}]', name='tavily_search_results_json', id='6716d08c-6cc3-43bd-a98f-5658730ea987', tool_call_id='call_Q1imAngZGaBL2Gr7ezZP32UK', artifact={'query': 'Ari Holtzman latest tweet', 'follow_up_questions': None, 'answer': None, 'images': [], 'results': [{'url': 'https://x.com/aaholtzman?lang=en', 'title': 'Ari Holtzman (@aaholtzman) / X', 'content': \"In other news, I'll be joining @UChicagoCS and @DSI_UChicago in 2024 as an assistant professor and doing a postdoc @Meta in the meantime! I'm at\", 'score': 0.64478505, 'raw_content': None}, {'url': 'https://twitter.com/universeinanegg/status/1677787350586228738', 'title': 'Ari Holtzman on X: \"Last but not least, thank you to all folks who we ...', 'content': 'Last but not least, thank you to all folks who we had fantastic discussions with and who gave super insightful feedback!', 'score': 0.5923135, 'raw_content': None}, {'url': 'https://x.com/universeinanegg?lang=en', 'title': 'Ari Holtzman (@universeinanegg) / X', 'content': \"Ari Holtzman's posts ... If you want a respite from OpenAI drama, how about joining academia? I'm starting Conceptualization Lab, recruiting PhDs & Postdocs! We\", 'score': 0.5402697, 'raw_content': None}, {'url': 'https://aclanthology.org/2024.lrec-main.986.pdf', 'title': '[PDF] An Annotated Multi-Event Tweet Dataset for Misinformation Detection', 'content': 'Rowan Zellers, Ari Holtzman, Hannah Rashkin, Yonatan Bisk, Ali Farhadi, Franziska Roesner, and Yejin Choi. 2019. Defending against neural fake news. In Advances in Neural Information Processing Systems, volume 32. Curran Asso-ciates, Inc.\\nXinyi Zhou and Reza Zafarani. 2020. A survey of fake news: Fundamental theories, detection methods, and opportunities. ACM Computing Surveys (CSUR), 53(5):1–40. [...] Query-2 arma 3 russia ukraine lang:en -is:retweet 2021-12-21 2022-04-06 100 Query-3 for Other russia ukraine war video lang:en (has:media OR has:geo) -is:retweet 2021-12-21 2022-04-06 50 Query-4 for Other russia ukraine war video lang:en -is:retweet 2021-12-21 2022-04-06 50 Table 6: An example event in the dataset. A part of URL is cropped due to space constraints. [...] 
2022-04 2022-05 0.000 0.001 0.002 0.003 0.004 0.005 0.006 Density EN-Covid Event EN12 EN13 EN14 EN15 EN16 EN17 EN18 EN19 EN20 (b) COVID-19 in English 2020-09 2020-11 2021-01 2021-03 2021-05 2021-07 2021-09 2021-11 2022-01 2022-03 2022-05 0.00 0.02 0.04 0.06 0.08 0.10 Density EN-Immigration EN21 EN22 EN23 EN24 EN25 EN26 EN27 EN28 EN29 EN30 (c) Immigration in English 2022-02-01 2022-02-15 2022-03-01 2022-03-15 2022-04-01 2022-04-15 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 Density TR-Ukraine TR01', 'score': 0.52597815, 'raw_content': None}, {'url': 'https://twitter.com/universeinanegg/status/1764113353113321929', 'title': 'Ari Holtzman on X: \"My biggest challenge in life has consistently ...', 'content': \"My biggest challenge in life has consistently been: what skills that I don't have a natural talent for are worth honing?\", 'score': 0.48264915, 'raw_content': None}], 'response_time': 1.24}), ToolMessage(content='[{\"url\": \"https://x.com/lukezettlemoyer?lang=en\", \"content\": \"By popular demand (see our GH issues ), we\\'re releasing 1B and 8B weights for our BLT models! We\\'re also hard at work at adding BLT to HF transformers!\"}, {\"url\": \"https://twitter.com/LukeZettlemoyer\", \"content\": \"Come work with me! We are looking to bring on more top talent to our language modeling workstream at Ai2 (@allen_ai ) building the open ecosystem.\"}, {\"url\": \"https://twitter.com/lukezettlemoyer/status/1003662931479941120\", \"content\": \"Come see Julian Michael presenting his work in question answer meaning representations, now at @NAACLHLT ! Image. 3:41 PM · Jun 4, 2018.\"}, {\"url\": \"https://www.linkedin.com/in/luke-zettlemoyer-a0109b226\", \"content\": \"Experience: Facebook AI · Education: Massachusetts Institute of Technology · Location: Seattle · 500+ connections on LinkedIn. View Luke Zettlemoyer\\'s\"}, {\"url\": \"https://arxiv.org/html/2303.12665v2\", \"content\": \"LLMs | LLaMA (reply) | 0.256 | 0.227 | 0.390 | 0.000 | 0.319 | 0.417 | 0.093\\\\n | LLaMA (source & reply) | 0.419 | 0.318 | 0.326 | 0.234 | 0.685 | 0.678 | 0.714\\\\n | Target-dependent subset | Target-independent subset\\\\n | Support | Deny | Query | Comment | Support | Deny | Query | Comment\\\\nMask Source Tweet | 40.3 | 69.9 | 98.7 | 85.7 | 43.0 | 90.8 | 98.7 | 89.0\\\\nShuffle Source Tweet | 54.1 | 82.9 | 93.6 | 90.9 | 57.7 | 94.9 | 97.3 | 83.1\\\\n2.2.1.\\xa0\\xa0\\xa0Experimental Setups [...] italic_i end_POSTSUBSCRIPT , italic_o start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT )):222Performances of these models are lower than the figures reported in their original paper. The reason is that we do not consider the stance of the source tweet towards rumour, mainly belonging to the support class. (1) The winner of the RumourEval 2019 shared task, i.e. BLCU-NLP (Yang et\\xa0al., 2019); (2) BUT-FIT (Fajcik et\\xa0al., 2019), the second place in the 2019 shared task; (3) Hierarchical-BERT (Yu [...] 
Method | RumourEval2019 | RumourEval2017\\\\nPT-HCL | 0.452 | 0.431\\\\nHierarchical-BERT | 0.235 | 0.275\\\\nLLaMA | 0.419 | 0.314\\\\nTarget-oblivious BERTweet | 0.477 | 0.425\\\\nTarget-aware BERTweet | 0.435 | 0.426\\\\nProposed Method | 0.510 | 0.452\\\\nw/o weight | 0.458 | 0.421\\\\nw/o cross-att | 0.438 | 0.417\\\\nw/o weight,cross-att | 0.451 | 0.419\\\\nWe also evaluate our proposed method on target-dependent and -independent subsets.\"}]', name='tavily_search_results_json', id='9868a85d-8259-4ea2-9639-8deb4af131c4', tool_call_id='call_qcArCrzpbZ8cBISYcP3pI9TL', artifact={'query': 'Luke Zettlemoyer latest tweet', 'follow_up_questions': None, 'answer': None, 'images': [], 'results': [{'url': 'https://x.com/lukezettlemoyer?lang=en', 'title': 'Luke Zettlemoyer (@LukeZettlemoyer) / X', 'content': \"By popular demand (see our GH issues ), we're releasing 1B and 8B weights for our BLT models! We're also hard at work at adding BLT to HF transformers!\", 'score': 0.67706895, 'raw_content': None}, {'url': 'https://twitter.com/LukeZettlemoyer', 'title': 'Luke Zettlemoyer (@LukeZettlemoyer) / X', 'content': 'Come work with me! We are looking to bring on more top talent to our language modeling workstream at Ai2 (@allen_ai ) building the open ecosystem.', 'score': 0.65887916, 'raw_content': None}, {'url': 'https://twitter.com/lukezettlemoyer/status/1003662931479941120', 'title': 'Luke Zettlemoyer on X: \"Come see Julian Michael presenting his ...', 'content': 'Come see Julian Michael presenting his work in question answer meaning representations, now at @NAACLHLT ! Image. 3:41 PM · Jun 4, 2018.', 'score': 0.6147415, 'raw_content': None}, {'url': 'https://www.linkedin.com/in/luke-zettlemoyer-a0109b226', 'title': 'Luke Zettlemoyer - Facebook AI | LinkedIn', 'content': \"Experience: Facebook AI · Education: Massachusetts Institute of Technology · Location: Seattle · 500+ connections on LinkedIn. View Luke Zettlemoyer's\", 'score': 0.16083227, 'raw_content': None}, {'url': 'https://arxiv.org/html/2303.12665v2', 'title': 'Can We Identify Stance Without Target Arguments? A Study ... - arXiv', 'content': 'LLMs | LLaMA (reply) | 0.256 | 0.227 | 0.390 | 0.000 | 0.319 | 0.417 | 0.093\\n | LLaMA (source & reply) | 0.419 | 0.318 | 0.326 | 0.234 | 0.685 | 0.678 | 0.714\\n | Target-dependent subset | Target-independent subset\\n | Support | Deny | Query | Comment | Support | Deny | Query | Comment\\nMask Source Tweet | 40.3 | 69.9 | 98.7 | 85.7 | 43.0 | 90.8 | 98.7 | 89.0\\nShuffle Source Tweet | 54.1 | 82.9 | 93.6 | 90.9 | 57.7 | 94.9 | 97.3 | 83.1\\n2.2.1.\\xa0\\xa0\\xa0Experimental Setups [...] italic_i end_POSTSUBSCRIPT , italic_o start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT )):222Performances of these models are lower than the figures reported in their original paper. The reason is that we do not consider the stance of the source tweet towards rumour, mainly belonging to the support class. (1) The winner of the RumourEval 2019 shared task, i.e. BLCU-NLP (Yang et\\xa0al., 2019); (2) BUT-FIT (Fajcik et\\xa0al., 2019), the second place in the 2019 shared task; (3) Hierarchical-BERT (Yu [...] 
Method | RumourEval2019 | RumourEval2017\\nPT-HCL | 0.452 | 0.431\\nHierarchical-BERT | 0.235 | 0.275\\nLLaMA | 0.419 | 0.314\\nTarget-oblivious BERTweet | 0.477 | 0.425\\nTarget-aware BERTweet | 0.435 | 0.426\\nProposed Method | 0.510 | 0.452\\nw/o weight | 0.458 | 0.421\\nw/o cross-att | 0.438 | 0.417\\nw/o weight,cross-att | 0.451 | 0.419\\nWe also evaluate our proposed method on target-dependent and -independent subsets.', 'score': 0.09048623, 'raw_content': None}], 'response_time': 1.41})]\n", "\n", "\n", "\n", "Receiving update from node: 'agent'\n", "[AIMessage(content='Here are the latest tweets or updates from the authors of the QLoRA paper:\\n\\n1. **Tim Dettmers**:\\n - Latest Tweet: \"Perplexity Deep Research is quite close to OpenAI o3 on the Humanity Last Exam Benchmark despite being an order of magnitude faster and cheaper.\" [Source](https://twitter.com/Tim_Dettmers)\\n\\n2. **Artidoro Pagnoni**:\\n - Latest Tweet: \"Introducing the Byte Latent Transformer (BLT) – An LLM architecture that scales better than Llama 3 using byte-patches instead.\" [Source](https://x.com/ArtidoroPagnoni/status/1867601413741981804)\\n\\n3. **Ari Holtzman**:\\n - Latest Tweet: \"My biggest challenge in life has consistently been: what skills that I don\\'t have a natural talent for are worth honing?\" [Source](https://twitter.com/universeinanegg/status/1764113353113321929)\\n\\n4. **Luke Zettlemoyer**:\\n - Latest Tweet: \"By popular demand (see our GH issues), we\\'re releasing 1B and 8B weights for our BLT models! We\\'re also hard at work at adding BLT to HF transformers!\" [Source](https://x.com/lukezettlemoyer?lang=en)\\n\\nThese tweets provide a glimpse into their recent activities and thoughts.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 289, 'prompt_tokens': 3950, 'total_tokens': 4239, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 1152}}, 'model_name': 'gpt-4o-2024-08-06', 'system_fingerprint': 'fp_6b6e24b474', 'finish_reason': 'stop', 'logprobs': None}, id='run-76546539-8c3f-44ec-a559-2f1fceeed5a6-0', usage_metadata={'input_tokens': 3950, 'output_tokens': 289, 'total_tokens': 4239, 'input_token_details': {'audio': 0, 'cache_read': 1152}, 'output_token_details': {'audio': 0, 'reasoning': 0}})]\n", "\n", "\n", "\n" ] } ], "source": [ "inputs = {\"messages\" : [HumanMessage(content=\"Search Arxiv for the QLoRA paper, then search each of the authors to find out their latest Tweet using Tavily!\")]}\n", "\n", "async for chunk in compiled_graph.astream(inputs, stream_mode=\"updates\"):\n", " for node, values in chunk.items():\n", " print(f\"Receiving update from node: '{node}'\")\n", " if node == \"action\":\n", " print(f\"Tool Used: {values['messages'][0].name}\")\n", " print(values[\"messages\"])\n", "\n", " print(\"\\n\\n\")" ] }, { "cell_type": "markdown", "metadata": { "id": "CXzDlZVz1Hnf" }, "source": [ "#### 🏗️ Activity #2:\n", "\n", "Please write out the steps the agent took to arrive at the correct answer." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Answer\n", "\n", "\n", "Here are the steps the agent took to arrive at the correct answer:\n", "\n", "1. **Initial State Setup**: \n", " - The state object was initialized with the user's query to search for the QLoRA paper and find the latest tweets from its authors.\n", "\n", "2. 
**First Agent Node Execution** (1st cycle):\n", " - The agent analyzed the query and decided to first search for information about QLoRA.\n", " - It made a tool call to the \"arxiv\" tool with the query \"QLoRA\" to find relevant academic papers.\n", "\n", "3. **First Tool Node Execution**:\n", " - The \"arxiv\" tool returned information about QLoRA papers, including the original paper \"QLoRA: Efficient Finetuning of Quantized LLMs\" by Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer.\n", " - This information was added to the state object and passed back to the agent node.\n", "\n", "4. **Second Agent Node Execution** (2nd cycle):\n", " - The agent processed the arxiv search results, identified the four authors of the paper.\n", " - It decided to search for their latest tweets using the Tavily search tool.\n", " - It made four consecutive tool calls to \"tavily_search_results_json\" for each author:\n", " - \"Tim Dettmers latest tweet\"\n", " - \"Artidoro Pagnoni latest tweet\"\n", " - \"Ari Holtzman latest tweet\"\n", " - \"Luke Zettlemoyer latest tweet\"\n", "\n", "5. **Second Tool Node Execution**:\n", " - The Tavily search tool executed all four queries and returned results for each author.\n", " - The tool results were added to the state object and passed back to the agent node.\n", "\n", "6. **Third Agent Node Execution** (3rd cycle):\n", " - The agent analyzed the search results for each author.\n", " - It formatted the information into a structured response, including each author's latest tweet and its source URL.\n", " - Since it completed the task and no further tool calls were needed, the conditional edge routed the flow to END.\n", " - The final response was returned as the answer.\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 🤝 Breakout Room #2" ] }, { "cell_type": "markdown", "metadata": { "id": "v7c8-Uyarh1v" }, "source": [ "## Part 1: LangSmith Evaluator" ] }, { "cell_type": "markdown", "metadata": { "id": "pV3XeFOT1Sar" }, "source": [ "### Pre-processing for LangSmith" ] }, { "cell_type": "markdown", "metadata": { "id": "wruQCuzewUuO" }, "source": [ "To do a little bit more preprocessing, let's wrap our LangGraph agent in a simple chain." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "id": "oeXdQgbxwhTv" }, "outputs": [], "source": [ "def convert_inputs(input_object):\n", " return {\"messages\" : [HumanMessage(content=input_object[\"question\"])]}\n", "\n", "def parse_output(input_state):\n", " return input_state[\"messages\"][-1].content\n", "\n", "agent_chain = convert_inputs | compiled_graph | parse_output" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 105 }, "id": "orYxBZXSxJjZ", "outputId": "76be837b-6424-4516-8f63-07fbd8c25bf5" }, "outputs": [ { "data": { "text/plain": [ "\"RAG stands for Retrieval-Augmented Generation. It is a technique used in natural language processing (NLP) that combines retrieval-based methods with generative models to improve the quality and accuracy of generated text. Here's how it works:\\n\\n1. **Retrieval**: The system first retrieves relevant information from a large corpus or database. This step involves searching for documents, passages, or data that are related to the input query or context.\\n\\n2. **Augmentation**: The retrieved information is then used to augment the input to a generative model. 
This means that the generative model has access to additional context or facts that can help it produce more accurate and informative responses.\\n\\n3. **Generation**: Finally, the generative model uses both the original input and the retrieved information to generate a response. This can be in the form of answering questions, completing sentences, or creating more complex text outputs.\\n\\nRAG is particularly useful in scenarios where the generative model alone might not have enough information to produce a high-quality response, such as in open-domain question answering or when dealing with specialized knowledge areas. By leveraging external data sources, RAG can enhance the model's ability to provide accurate and contextually relevant information.\"" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "agent_chain.invoke({\"question\" : \"What is RAG?\"})" ] }, { "cell_type": "markdown", "metadata": { "id": "f9UkCIqkpyZu" }, "source": [ "### Task 1: Creating An Evaluation Dataset\n", "\n", "Just as we saw last week, we'll want to create a dataset to test our Agent's ability to answer questions.\n", "\n", "In order to do this - we'll want to provide some questions and some answers. Let's look at how we can create such a dataset below.\n", "\n", "```python\n", "questions = [\n", " \"What optimizer is used in QLoRA?\",\n", " \"What data type was created in the QLoRA paper?\",\n", " \"What is a Retrieval Augmented Generation system?\",\n", " \"Who authored the QLoRA paper?\",\n", " \"What is the most popular deep learning framework?\",\n", " \"What significant improvements does the LoRA system make?\"\n", "]\n", "\n", "answers = [\n", " {\"must_mention\" : [\"paged\", \"optimizer\"]},\n", " {\"must_mention\" : [\"NF4\", \"NormalFloat\"]},\n", " {\"must_mention\" : [\"ground\", \"context\"]},\n", " {\"must_mention\" : [\"Tim\", \"Dettmers\"]},\n", " {\"must_mention\" : [\"PyTorch\", \"TensorFlow\"]},\n", " {\"must_mention\" : [\"reduce\", \"parameters\"]},\n", "]\n", "```" ] }, { "cell_type": "markdown", "metadata": { "id": "VfMXF2KAsQxs" }, "source": [ "#### 🏗️ Activity #3:\n", "\n", "Please create a dataset in the above format with at least 5 questions." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "id": "CbagRuJop83E" }, "outputs": [], "source": [ "questions = [\n", " \"What optimizer is used in QLoRA?\",\n", " \"What data type was created in the QLoRA paper?\",\n", " \"What is a Retrieval Augmented Generation system?\",\n", " \"Who authored the QLoRA paper?\",\n", " \"What is the most popular deep learning framework?\",\n", " \"What significant improvements does the LoRA system make?\"\n", "]\n", "\n", "answers = [\n", " {\"must_mention\" : [\"paged\", \"optimizer\"]},\n", " {\"must_mention\" : [\"NF4\", \"NormalFloat\"]},\n", " {\"must_mention\" : [\"ground\", \"context\"]},\n", " {\"must_mention\" : [\"Tim\", \"Dettmers\"]},\n", " {\"must_mention\" : [\"PyTorch\", \"TensorFlow\"]},\n", " {\"must_mention\" : [\"reduce\", \"parameters\"]},\n", "]" ] }, { "cell_type": "markdown", "metadata": { "id": "z7QVFuAmsh7L" }, "source": [ "Now we can add our dataset to our LangSmith project using the following code which we saw last Thursday!" 
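] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before we do - one quick aside. The questions and answers above are associated purely by position in two parallel lists, which is easy to break when editing. One guard is to keep each question physically attached to its grading criteria and derive the parallel lists from that. A minimal sketch (the `eval_pairs` structure below is our own convention, not a LangSmith requirement):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Hypothetical structure: keep each question attached to its grading criteria,\n", "# then derive the parallel lists that the dataset upload expects.\n", "eval_pairs = [\n", "    {\"question\": \"What optimizer is used in QLoRA?\", \"must_mention\": [\"paged\", \"optimizer\"]},\n", "    {\"question\": \"Who authored the QLoRA paper?\", \"must_mention\": [\"Tim\", \"Dettmers\"]},\n", "]\n", "\n", "paired_questions = [pair[\"question\"] for pair in eval_pairs]\n", "paired_answers = [{\"must_mention\": pair[\"must_mention\"]} for pair in eval_pairs]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Back to the upload itself:"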
] }, { "cell_type": "code", "execution_count": 20, "metadata": { "id": "RLfrZrgSsn85" }, "outputs": [ { "data": { "text/plain": [ "{'example_ids': ['04160349-02c4-4ba3-b103-a56ff5835e0c',\n", " '693c2c19-7e03-45ce-afb2-7312c3824dde',\n", " '8e7a920e-aaa9-4166-8917-b76fb413e39e',\n", " '1de91a09-0373-4982-bb87-9393b37b8d44',\n", " '1a45d644-6393-4b1b-b242-843982829449',\n", " 'b4217d52-f22c-44c9-a548-99ef8b068b6f'],\n", " 'count': 6}" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from langsmith import Client\n", "\n", "client = Client()\n", "\n", "dataset_name = f\"Retrieval Augmented Generation - Evaluation Dataset - {uuid4().hex[0:8]}\"\n", "\n", "dataset = client.create_dataset(\n", " dataset_name=dataset_name,\n", " description=\"Questions about the QLoRA Paper to Evaluate RAG over the same paper.\"\n", ")\n", "\n", "client.create_examples(\n", " inputs=[{\"question\" : q} for q in questions],\n", " outputs=answers,\n", " dataset_id=dataset.id,\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "ciV73F9Q04w0" }, "source": [ "#### ❓ Question #3:\n", "\n", "How are the correct answers associated with the questions?\n", "\n", "> NOTE: Feel free to indicate if this is problematic or not" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Answer\n", "\n", "\n", "The association between questions and correct answers happens through positional matching in parallel lists. \n", "\n", "This approach has several potential issues:\n", "\n", "1. **Position-dependent mapping**: The association relies solely on list positions - the first question matches the first answer, second question with second answer, and so on. This creates a risk of misalignment if lists are modified independently.\n", "\n", "2. **Fragile maintenance**: Adding or removing questions requires careful index management to maintain correct associations.\n", "\n", "3. **Limited evaluation criteria**: Each answer only contains phrases that \"must be mentioned\" rather than complete reference answers or more nuanced evaluation criteria.\n", "\n", "4. **String matching limitations**: The evaluator uses exact string matching (`all(phrase in prediction for phrase in required)`) which doesn't account for semantic equivalence or paraphrasing.\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "id": "-lRTXUrTtP9Y" }, "source": [ "### Task 2: Adding Evaluators\n", "\n", "Now we can add a custom evaluator to see if our responses contain the expected information.\n", "\n", "We'll be using a fairly naive exact-match process to determine if our response contains specific strings." 
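] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To see the core of this check in isolation - and where it's brittle - here's a tiny, self-contained demo (the `contains_all` helper is illustrative only; the real evaluator is defined next):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Illustrative helper: every required phrase must appear verbatim in the prediction.\n", "def contains_all(prediction: str, required: list[str]) -> bool:\n", "    return all(phrase in prediction for phrase in required)\n", "\n", "# Substring matching passes here (\"optimizer\" is inside \"optimizers\")...\n", "print(contains_all(\"QLoRA uses paged optimizers.\", [\"paged\", \"optimizer\"]))  # True\n", "# ...but a casing difference alone makes the exact match fail.\n", "print(contains_all(\"QLoRA uses Paged Optimizers.\", [\"paged\", \"optimizer\"]))  # False"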
] }, { "cell_type": "code", "execution_count": 21, "metadata": { "id": "QrAUXMFftlAY" }, "outputs": [], "source": [ "from langsmith.evaluation import EvaluationResult, run_evaluator\n", "\n", "@run_evaluator\n", "def must_mention(run, example) -> EvaluationResult:\n", "    # Grab the chain's final string output and this example's required phrases\n", "    prediction = run.outputs.get(\"output\") or \"\"\n", "    required = example.outputs.get(\"must_mention\") or []\n", "    # Score 1 only if every required phrase appears verbatim (case-sensitive)\n", "    score = all(phrase in prediction for phrase in required)\n", "    return EvaluationResult(key=\"must_mention\", score=score)" ] }, { "cell_type": "markdown", "metadata": { "id": "PNtHORUh0jZY" }, "source": [ "#### ❓ Question #4:\n", "\n", "What are some ways you could improve this metric as-is?\n", "\n", "> NOTE: Alternatively you can suggest where gaps exist in this method.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Answer\n", "\n", "The current implementation uses simple, case-sensitive substring matching - exactly the brittleness the small demo above illustrated.\n", "\n", "Here are ways to improve this metric:\n", "\n", "1. **Normalization**: Lowercase and strip punctuation/whitespace on both sides before matching, so trivial formatting differences don't zero out the score.\n", "2. **Semantic Matching**: Replace exact string matching with semantic similarity using embeddings to detect conceptually equivalent answers even when phrased differently.\n", "3. **Partial Credit**: Report the fraction of required phrases found rather than an all-or-nothing boolean.\n", "\n", "The current method has notable gaps: it can't handle paraphrasing, doesn't consider semantic meaning, and may incorrectly score responses that contain the required phrases but are fundamentally incorrect or contradictory.\n" ] }, { "cell_type": "markdown", "metadata": { "id": "r1RJr349zhv7" }, "source": [ "### Task 3: Evaluating\n", "\n", "All that is left to do is evaluate our agent's response!" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 118, "referenced_widgets": [ "efcf57067cf743d8b4ce059a61cbe02e", "53e33aae3b97490c82aec7bbb0d6ebba", "ad84e0e971d3455db2efe7dd0d1f803e", "72adef9b70dd48198b7322b6c5b113cf", "8a61d045ffd44ac58f3f13eb10044836", "041e22a9b5514e36bd4d1dac01d5d398", "886d762f2a7c421382efb5502c6d42a1", "ab91fd625bbd43afbf8c6398193a88d0", "716557ad09874dcb989d75f7c74424cd", "77d4c0ebaae045b58efc4f789c9a2360", "0d622ccc56264fac8fd7508dbdbe6e29" ] }, "id": "p5TeCUUkuGld", "outputId": "2f7d62a2-e78d-447a-d07b-f9e4d500fb79" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "View the evaluation results for experiment: 'RAG Pipeline - Evaluation - fad9-8812cb12' at:\n", "https://smith.langchain.com/o/519bdaca-6663-4536-ab5c-1158fb651454/datasets/18764a3a-aa33-43b1-9bc4-ecf33083b931/compare?selectedSessions=b212c466-1d03-4a18-b456-4d9c19cb9379\n", "\n", "\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "a4dfd2be5a6b487bbc8020710642cc5b", "version_major": 2, "version_minor": 0 }, "text/plain": [ "0it [00:00, ?it/s]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "experiment_results = client.evaluate(\n", "    agent_chain,\n", "    data=dataset_name,\n", "    evaluators=[must_mention],\n", "    experiment_prefix=f\"RAG Pipeline - Evaluation - {uuid4().hex[0:4]}\",\n", "    metadata={\"version\": \"1.0.0\"},\n", ")" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 342 }, "id": "eeEqU7s05Byu", "outputId": "78395075-a05d-4ebd-c798-ed968b935318" }, "outputs": [ { "data": { "text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>inputs.question</th><th>outputs.output</th><th>error</th><th>reference.must_mention</th><th>feedback.must_mention</th><th>execution_time</th><th>example_id</th><th>id</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>What optimizer is used in QLoRA?</td><td>QLoRA uses \"paged optimizers\" to manage memory...</td><td>None</td><td>[paged, optimizer]</td><td>True</td><td>9.613872</td><td>04160349-02c4-4ba3-b103-a56ff5835e0c</td><td>2d8e4582-7d84-42f2-97a4-0348904ec159</td></tr>\n",
"    <tr><th>1</th><td>What is the most popular deep learning framework?</td><td>In 2023, the most popular deep learning framew...</td><td>None</td><td>[PyTorch, TensorFlow]</td><td>True</td><td>5.958620</td><td>1a45d644-6393-4b1b-b242-843982829449</td><td>b041746f-fbc0-46ba-953e-099a9151457f</td></tr>\n",
"    <tr><th>2</th><td>Who authored the QLoRA paper?</td><td>The QLoRA paper titled \"Accurate LoRA-Finetuni...</td><td>None</td><td>[Tim, Dettmers]</td><td>False</td><td>13.921670</td><td>1de91a09-0373-4982-bb87-9393b37b8d44</td><td>ae8f2e00-d479-4f1b-958a-2126d1f36a6d</td></tr>\n",
"    <tr><th>3</th><td>What data type was created in the QLoRA paper?</td><td>The QLoRA paper introduced a new data type cal...</td><td>None</td><td>[NF4, NormalFloat]</td><td>True</td><td>15.764034</td><td>693c2c19-7e03-45ce-afb2-7312c3824dde</td><td>b3992d5f-6663-48a7-bebc-d0429aeaeab4</td></tr>\n",
"    <tr><th>4</th><td>What is a Retrieval Augmented Generation system?</td><td>A Retrieval Augmented Generation (RAG) system ...</td><td>None</td><td>[ground, context]</td><td>False</td><td>2.454875</td><td>8e7a920e-aaa9-4166-8917-b76fb413e39e</td><td>c25d4866-92ac-44d4-929d-5f270aa8c40c</td></tr>\n",
"    <tr><th>5</th><td>What significant improvements does the LoRA sy...</td><td>The LoRA (Low-Rank Adaptation) system has seen...</td><td>None</td><td>[reduce, parameters]</td><td>False</td><td>24.377492</td><td>b4217d52-f22c-44c9-a548-99ef8b068b6f</td><td>dc144e48-4c25-450d-9949-115a5f73de77</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>" ], "text/plain": [ "" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "experiment_results" ] }, { "cell_type": "markdown", "metadata": { "id": "jhTNe4kWrplB" }, "source": [ "## Part 2: LangGraph with Helpfulness" ] }, { "cell_type": "markdown", "metadata": { "id": "w1wKRddbIY_S" }, "source": [ "### Task 3: Adding Helpfulness Check and \"Loop\" Limits\n", "\n", "Now that we've done evaluation - let's see if we can add an extra step where we review the content we've generated to confirm it fully answers the user's query!\n", "\n", "We're going to make a few key adjustments to account for this:\n", "\n", "1. We're going to add an artificial limit on how many \"loops\" the agent can go through - this will help us to avoid the potential situation where we never exit the loop.\n", "2. We'll add to our existing conditional edge to obtain the behaviour we desire." ] }, { "cell_type": "markdown", "metadata": { "id": "npTYJ8ayR5B3" }, "source": [ "First, let's define our state again - since we can check the length of the `messages` list directly, we don't need an additional loop counter in state." ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "id": "-LQ84YhyJG0w" }, "outputs": [], "source": [ "class AgentState(TypedDict):\n", "    messages: Annotated[list, add_messages]" ] }, { "cell_type": "markdown", "metadata": { "id": "sD7EV0HqSQcb" }, "source": [ "Now we can set our graph up! This process will be almost entirely the same - with the inclusion of one additional node/conditional edge!" ] }, { "cell_type": "markdown", "metadata": { "id": "oajBwLkFVi1N" }, "source": [ "#### 🏗️ Activity #5:\n", "\n", "Please write markdown for the following cells to explain what each is doing." ] }, { "cell_type": "markdown", "metadata": { "id": "M6rN7feNVn9f" }, "source": [ "### Setting Up the Helpfulness-Enhanced Graph\n", "\n", "This code initializes a new StateGraph that will incorporate a helpfulness check mechanism. We create the graph with the same AgentState structure as before and add two key nodes:\n", "\n", "1. The \"agent\" node, which uses the call_model function to process the state and generate a response\n", "2. The \"action\" node, which uses tool_node to execute external tools when needed\n", "\n", "This forms the basic structure upon which we'll build our enhanced graph with helpfulness evaluation." ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "6r6XXA5FJbVf", "outputId": "ff713041-e498-4f0f-a875-a03502b87729" }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "graph_with_helpfulness_check = StateGraph(AgentState)\n", "\n", "graph_with_helpfulness_check.add_node(\"agent\", call_model)\n", "graph_with_helpfulness_check.add_node(\"action\", tool_node)" ] }, { "cell_type": "markdown", "metadata": { "id": "XZ22o2mWVrfp" }, "source": [ "### Defining the Entry Point\n", "\n", "Here we set the \"agent\" node as the entry point for our graph."
] }, { "cell_type": "code", "execution_count": 26, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "HNWHwWxuRiLY", "outputId": "295f5a35-ceff-452a-ffb8-c52eada6a816" }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "graph_with_helpfulness_check.set_entry_point(\"agent\")" ] }, { "cell_type": "markdown", "metadata": { "id": "rsXeF6xlaXOZ" }, "source": [ "### Creating the Conditional Routing Function\n", "\n", "This code defines the `tool_call_or_helpful` function, which serves as the decision-making component of our graph. It performs three critical evaluations:\n", "\n", "1. Checks if the last message contains tool calls, routing to the action node if tools are needed\n", "2. Implements a cycle limit to prevent infinite loops, routing to \"end\" once the conversation exceeds 10 messages\n", "3. Uses an LLM-based helpfulness check that compares the initial query to the current response, routing to:\n", "   - \"end\" if the response is deemed helpful (contains \"Y\")\n", "   - \"continue\" to loop back through the agent if more work is needed\n", "\n", "This function adds intelligence to our graph by evaluating response quality." ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "id": "z_Sq3A9SaV1O" }, "outputs": [], "source": [ "from langchain_core.prompts import PromptTemplate\n", "from langchain_core.output_parsers import StrOutputParser\n", "\n", "def tool_call_or_helpful(state):\n", "    last_message = state[\"messages\"][-1]\n", "\n", "    # If the model requested tools, always route to the action node first\n", "    if last_message.tool_calls:\n", "        return \"action\"\n", "\n", "    # Loop limit: once the conversation grows past 10 messages, exit the graph.\n", "    # NOTE: this must return \"end\" (lowercase) to match the conditional edge mapping below.\n", "    if len(state[\"messages\"]) > 10:\n", "        return \"end\"\n", "\n", "    initial_query = state[\"messages\"][0]\n", "    final_response = state[\"messages\"][-1]\n", "\n", "    prompt_template = \"\"\"\\\n", "    Given an initial query and a final response, determine if the final response is extremely helpful or not. Please indicate helpfulness with a 'Y' and unhelpfulness as an 'N'.\n", "\n", "    Initial Query:\n", "    {initial_query}\n", "\n", "    Final Response:\n", "    {final_response}\"\"\"\n", "\n", "    prompt_template = PromptTemplate.from_template(prompt_template)\n", "\n", "    helpfulness_check_model = ChatOpenAI(model=\"gpt-4\")\n", "\n", "    helpfulness_chain = prompt_template | helpfulness_check_model | StrOutputParser()\n", "\n", "    helpfulness_response = helpfulness_chain.invoke({\"initial_query\" : initial_query.content, \"final_response\" : final_response.content})\n", "\n", "    # Naive verdict parse: any capital 'Y' in the grader's reply counts as helpful\n", "    if \"Y\" in helpfulness_response:\n", "        return \"end\"\n", "    else:\n", "        return \"continue\"" ] }, { "cell_type": "markdown", "metadata": { "id": "Fz1u9Vf4SHxJ" }, "source": [ "#### 🏗️ Activity #4:\n", "\n", "Please write what is happening in our `tool_call_or_helpful` function!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Answer\n", "This code defines the `tool_call_or_helpful` function, which serves as the decision-making component of our graph. It performs three critical evaluations:\n", "\n", "1. Checks if the last message contains tool calls, routing to the action node if tools are needed\n", "2. Implements a cycle limit to prevent infinite loops, routing to \"end\" once the conversation exceeds 10 messages\n", "3. Uses an LLM-based helpfulness check that compares the initial query to the current response, routing to:\n", "   - \"end\" if the response is deemed helpful (contains \"Y\")\n", "   - \"continue\" to loop back through the agent if more work is needed\n", "\n", "This function adds intelligence to our graph by evaluating response quality."
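] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One hardening note before we wire this in: `\"Y\" in helpfulness_response` also fires on any stray capital Y elsewhere in a longer reply. A sketch of a stricter verdict parser (our own variant - the graph below still uses the simple check):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sketch: normalize the grader's reply before branching, instead of substring-matching \"Y\".\n", "def parse_helpfulness_verdict(raw_response: str) -> str:\n", "    verdict = raw_response.strip().upper()\n", "    # Only an explicit leading Y counts as helpful; anything else loops back to the agent.\n", "    return \"end\" if verdict.startswith(\"Y\") else \"continue\"\n", "\n", "print(parse_helpfulness_verdict(\" Y \"))                   # end\n", "print(parse_helpfulness_verdict(\"N - missing details\"))   # continue"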
] }, { "cell_type": "markdown", "metadata": { "id": "6BhnBW2YVsJO" }, "source": [ "### Adding Conditional Edges with Helpfulness Routing\n", "\n", "This code connects the decision logic (`tool_call_or_helpful`) to our graph structure by adding conditional edges from the agent node. The routing paths include:\n", "\n", "- \"continue\": Loops back to the agent node for further processing if the response isn't helpful enough\n", "- \"action\": Routes to the action node when tool calls are detected\n", "- \"end\": Terminates the graph execution when a helpful response is generated or the cycle limit is reached\n", "\n", "These conditional edges enable the graph to dynamically determine the optimal path based on response quality and need for tools." ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "aVTKnWMbP_8T", "outputId": "7f729b1f-311c-4084-ceaf-0da437900c85" }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "graph_with_helpfulness_check.add_conditional_edges(\n", " \"agent\",\n", " tool_call_or_helpful,\n", " {\n", " \"continue\" : \"agent\",\n", " \"action\" : \"action\",\n", " \"end\" : END\n", " }\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "ZGDLEWOIVtK0" }, "source": [ "### Connecting Action Output Back to the Agent\n", "\n", "This code adds a direct edge from the \"action\" node back to the \"agent\" node. This creates a critical feedback loop where:\n", "\n", "1. The action node executes tools as requested\n", "2. Results from those tools are automatically passed back to the agent\n", "3. The agent can then incorporate the tool results into its reasoning\n", "\n", "This edge ensures the agent can process and reason about information retrieved from external tools." ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "cbDK2MbuREgU", "outputId": "21a64c20-27a1-4e0e-afde-a639abaa8b55" }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "graph_with_helpfulness_check.add_edge(\"action\", \"agent\")" ] }, { "cell_type": "markdown", "metadata": { "id": "rSI8AOaEVvT-" }, "source": [ "### Compiling the Graph\n", "\n", "This step compiles our graph with all its nodes, edges, and conditional logic into an executable format. The compilation process optimizes the graph for efficient execution and validates that all connections are properly defined. " ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "id": "oQldl8ERQ8lf" }, "outputs": [], "source": [ "agent_with_helpfulness_check = graph_with_helpfulness_check.compile()" ] }, { "cell_type": "markdown", "metadata": { "id": "F67FGCMRVwGz" }, "source": [ "\n", "### Streaming Agent Execution with Helpfulness Evaluation\n", "\n", "This code demonstrates our helpfulness-enhanced agent handling a complex, multi-part query about machine learning concepts and a researcher. The code:\n", "\n", "1. Creates an input with three distinct but related questions about LoRA (a machine learning technique), Tim Dettmers (a researcher), and Attention (a key ML concept)\n", "\n", "2. Uses `astream()` to asynchronously process the input and stream the results, allowing us to observe the agent's decision-making in real-time\n", "\n", "3. 
Prints each update from the graph's nodes as they occur, showing:\n", " - Which node is active (agent or action)\n", " - The content of messages being passed through the state\n", " - The decision points where helpfulness is evaluated\n", "\n", "This visualization helps us understand how the agent dynamically decides when to use tools, when to continue refining its answer, and when it determines its response is sufficiently helpful to terminate the graph execution.\n" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "B3oo8E-PRK1T", "outputId": "f152dea8-96ad-4d29-d8b2-a064c96a8bd3" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Receiving update from node: 'agent'\n", "[AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_EWD24pZEcEOhww4jynzaXaBG', 'function': {'arguments': '{\"query\": \"LoRA machine learning\"}', 'name': 'arxiv'}, 'type': 'function'}, {'id': 'call_JjikNAdpIM8u8r7qpIkvO5nl', 'function': {'arguments': '{\"query\": \"Tim Dettmers\"}', 'name': 'tavily_search_results_json'}, 'type': 'function'}, {'id': 'call_YW20EEjNPAJAWH53y4IRY1sC', 'function': {'arguments': '{\"query\": \"Attention mechanism machine learning\"}', 'name': 'arxiv'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 72, 'prompt_tokens': 177, 'total_tokens': 249, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-2024-08-06', 'system_fingerprint': 'fp_6b6e24b474', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-400b9e62-7af3-4dfe-9205-833b4792f9cd-0', tool_calls=[{'name': 'arxiv', 'args': {'query': 'LoRA machine learning'}, 'id': 'call_EWD24pZEcEOhww4jynzaXaBG', 'type': 'tool_call'}, {'name': 'tavily_search_results_json', 'args': {'query': 'Tim Dettmers'}, 'id': 'call_JjikNAdpIM8u8r7qpIkvO5nl', 'type': 'tool_call'}, {'name': 'arxiv', 'args': {'query': 'Attention mechanism machine learning'}, 'id': 'call_YW20EEjNPAJAWH53y4IRY1sC', 'type': 'tool_call'}], usage_metadata={'input_tokens': 177, 'output_tokens': 72, 'total_tokens': 249, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})]\n", "\n", "\n", "\n", "Receiving update from node: 'action'\n", "[ToolMessage(content=\"Published: 2024-10-28\\nTitle: KD-LoRA: A Hybrid Approach to Efficient Fine-Tuning with LoRA and Knowledge Distillation\\nAuthors: Rambod Azimi, Rishav Rishav, Marek Teichmann, Samira Ebrahimi Kahou\\nSummary: Large language models (LLMs) have demonstrated remarkable performance across\\nvarious downstream tasks. However, the high computational and memory\\nrequirements of LLMs are a major bottleneck. To address this,\\nparameter-efficient fine-tuning (PEFT) methods such as low-rank adaptation\\n(LoRA) have been proposed to reduce computational costs while ensuring minimal\\nloss in performance. Additionally, knowledge distillation (KD) has been a\\npopular choice for obtaining compact student models from teacher models. In\\nthis work, we present KD-LoRA, a novel fine-tuning method that combines LoRA\\nwith KD. Our results demonstrate that KD-LoRA achieves performance comparable\\nto full fine-tuning (FFT) and LoRA while significantly reducing resource\\nrequirements. 
Specifically, KD-LoRA retains 98% of LoRA's performance on the\\nGLUE benchmark, while being 40% more compact. Additionally, KD-LoRA reduces GPU\\nmemory usage by 30% compared to LoRA, while decreasing inference time by 30%\\ncompared to both FFT and LoRA. We evaluate KD-LoRA across three encoder-only\\nmodels: BERT, RoBERTa, and DeBERTaV3. Code is available at\\nhttps://github.com/rambodazimi/KD-LoRA.\\n\\nPublished: 2024-04-07\\nTitle: A Note on LoRA\\nAuthors: Vlad Fomenko, Han Yu, Jongho Lee, Stanley Hsieh, Weizhu Chen\\nSummary: LoRA (Low-Rank Adaptation) has emerged as a preferred method for efficiently\\nadapting Large Language Models (LLMs) with remarkable simplicity and efficacy.\\nThis note extends the original LoRA paper by offering new perspectives that\\nwere not initially discussed and presents a series of insights for deploying\\nLoRA at scale. Without introducing new experiments, we aim to improve the\\nunderstanding and application of LoRA.\\n\\nPublished: 2024-06-18\\nTitle: LoRA-drop: Efficient LoRA Parameter Pruning based on Output Evaluation\\nAuthors: Hongyun Zhou, Xiangyu Lu, Wang Xu, Conghui Zhu, Tiejun Zhao, Muyun Yang\\nSummary: Low-Rank Adaptation (LoRA) is currently the most commonly used\\nParameter-efficient fine-tuning (PEFT) method, it introduces auxiliary\\nparameters for each layer to fine-tune the pre-trained model under limited\\ncomputing resources. However, it still faces resource consumption challenges\\nduring training when scaling up to larger models. Most previous studies have\\ntackled this issue by using pruning techniques, which involve removing LoRA\\nparameters deemed unimportant. Nonetheless, these efforts only analyze LoRA\\nparameter features to evaluate their importance, such as parameter count, size,\\nand gradient. In fact, the output of LoRA (product of LoRA parameter and hidden\\nstate), directly impacts the final results. Preliminary experiments indicate\\nthat a fraction of LoRA elements possesses significantly high output values,\\nsubstantially influencing the layer output. Motivated by the observation, we\\npropose LoRA-drop. Concretely, LoRA-drop evaluates the importance of LoRA based\\non the LoRA output. Then we retain LoRA for important layers and the other\\nlayers share the same LoRA. We conduct abundant experiments with models of\\ndifferent scales on NLU and NLG tasks. Results demonstrate that LoRA-drop can\\nachieve performance comparable to full fine-tuning and LoRA, while retaining\\n50\\\\% of the LoRA parameters on average.\", name='arxiv', id='3bf4176d-fedd-4fff-8213-392d030dfd86', tool_call_id='call_EWD24pZEcEOhww4jynzaXaBG'), ToolMessage(content='[{\"url\": \"https://ai2050.schmidtsciences.org/fellow/tim-dettmers/\", \"content\": \"Tim Dettmers is an Assistant Professor at Carnegie Mellon University and a Research Scientist at the Allen Institute for AI, and his research focuses on making\"}, {\"url\": \"https://developer.nvidia.com/blog/author/tdettmers/\", \"content\": \"Tim Dettmers is a masters student in informatics at the University of Lugano where he works on deep learning research. Before that he studied applied\"}, {\"url\": \"https://www.interconnects.ai/p/tim-dettmers\", \"content\": \"Tim Dettmers does not need an introduction for most people building open-source AI. If you are part of that minority, you’re in for a treat. Tim is the lead developer behind most of the open-source tools for quantization: QLoRA, bitsandbytes, 4 and 8 bit inference, and plenty more. He recently finished his Ph.D. 
at the University of Washington, is now a researcher at the Allen Institute for AI, and is starting as a professor at Carnegie Mellon University in fall of 2025. [...] Nathan Lambert [00:00:00]: Hey, welcome back to Interconnects. This episode is an exciting one. It\\'s with Tim Dettmers.Okay, Tim, welcome to our bootleg in-person AI2 office recording studio we might get barged into, but I\\'m just excited to pick your brain. I mean, we eat lunch together and we talk, and it\\'s obvious that we agree on a lot of things. We have a unique perspective on a lot of things and it challenges normal worldview. So it\\'s good to just share it with people. I\\'ve got this [...] Tim is a joy to talk to. He thinks independently on all the AI issues of today, bringing new perspectives that challenge the status quo. At the same time, he’s sincere and very helpful to work with, working hard to uplift those around him and the academic community. There’s a reason he’s so loved in the open-source AI community. We cover:\\\\n\\\\nGeneral vibes in open-source,\\\\n\\\\nAgents, SWE-Bench, and using open models for tasks like this,\\\\n\\\\nHow to be a GPU poor academic and have huge impact,\"}, {\"url\": \"https://timdettmers.com/about/\", \"content\": \"Skip links\\\\n\\\\nTim Dettmers\\\\n\\\\nMaking deep learning accessible.\\\\n\\\\nHeader Right\\\\n\\\\nBlog Posts Topics\\\\n\\\\nMain navigation\\\\n\\\\nAbout Me\\\\n\\\\nResearch Interests Publications Awards & Honors Service\\\\n\\\\nGoogle Scholar\\\\n\\\\nfirstname.lastname@gmail.com\\\\n\\\\nI am a research scientist at the Allen Institute for Artificial Intelligence (Ai2) and an incoming Assistant Professor at Carnegie Mellon University (CMU). I am the creator and maintainer of bitsandbytes. [...] I have a PhD from University of Washington advised by Luke Zettlemoyer working on efficient deep learning at the intersection between machine learning, natural language processing, and computer systems with a focus on quantization and sparsity. My main research goal is to empower everyone to make AI their own. I do this by making large models accessible through my research (QLoRA, LLM.int8(), k-bit inference scaling laws, Petals, SWARM) and by developing software that makes it easy to use my [...] 
University while working as a software engineer in the automation industry.\"}, {\"url\": \"https://csd.cmu.edu/people/faculty/tim-dettmers\", \"content\": \"Main navigation\\\\n\\\\nBack to Main Menu\\\\n\\\\nAbout\\\\n\\\\t\\\\t\\\\t\\\\t\\\\t\\\\t\\\\t\\\\t\\\\t\\\\t\\\\t\\\\t\\\\tMain page\\\\n\\\\nBack to Main Menu\\\\n\\\\nAdmissions\\\\n\\\\t\\\\t\\\\t\\\\t\\\\t\\\\t\\\\t\\\\t\\\\t\\\\t\\\\t\\\\t\\\\tMain page\\\\n\\\\nBack to Main Menu\\\\n\\\\nAcademics\\\\n\\\\t\\\\t\\\\t\\\\t\\\\t\\\\t\\\\t\\\\t\\\\t\\\\t\\\\t\\\\t\\\\tMain page\\\\n\\\\nBack to Main Menu\\\\n\\\\nPeople\\\\n\\\\t\\\\t\\\\t\\\\t\\\\t\\\\t\\\\t\\\\t\\\\t\\\\t\\\\t\\\\t\\\\tMain page\\\\n\\\\nBack to Main Menu\\\\n\\\\nResearch\\\\n\\\\t\\\\t\\\\t\\\\t\\\\t\\\\t\\\\t\\\\t\\\\t\\\\t\\\\t\\\\t\\\\tMain page\\\\n\\\\nBreadcrumb\\\\n\\\\nTim Dettmers\\\\n\\\\nAssistant Professor\\\\n\\\\nWebsite\\\\n\\\\nGoogle Scholars Link\\\\n\\\\nEmail\\\\n\\\\t\\\\t\\\\t\\\\t\\\\t\\\\n\\\\t\\\\t\\\\t\\\\t\\\\t\\\\n dettmers@cmu.edu\"}]', name='tavily_search_results_json', id='5d335987-7168-4848-9126-8b6ddd001871', tool_call_id='call_JjikNAdpIM8u8r7qpIkvO5nl', artifact={'query': 'Tim Dettmers', 'follow_up_questions': None, 'answer': None, 'images': [], 'results': [{'url': 'https://ai2050.schmidtsciences.org/fellow/tim-dettmers/', 'title': 'Tim Dettmers - AI2050 - Schmidt Sciences', 'content': 'Tim Dettmers is an Assistant Professor at Carnegie Mellon University and a Research Scientist at the Allen Institute for AI, and his research focuses on making', 'score': 0.9067276, 'raw_content': None}, {'url': 'https://developer.nvidia.com/blog/author/tdettmers/', 'title': 'Author: Tim Dettmers | NVIDIA Technical Blog', 'content': 'Tim Dettmers is a masters student in informatics at the University of Lugano where he works on deep learning research. Before that he studied applied', 'score': 0.8795718, 'raw_content': None}, {'url': 'https://www.interconnects.ai/p/tim-dettmers', 'title': 'Interviewing Tim Dettmers on open-source AI - Interconnects', 'content': \"Tim Dettmers does not need an introduction for most people building open-source AI. If you are part of that minority, you’re in for a treat. Tim is the lead developer behind most of the open-source tools for quantization: QLoRA, bitsandbytes, 4 and 8 bit inference, and plenty more. He recently finished his Ph.D. at the University of Washington, is now a researcher at the Allen Institute for AI, and is starting as a professor at Carnegie Mellon University in fall of 2025. [...] Nathan Lambert [00:00:00]: Hey, welcome back to Interconnects. This episode is an exciting one. It's with Tim Dettmers.Okay, Tim, welcome to our bootleg in-person AI2 office recording studio we might get barged into, but I'm just excited to pick your brain. I mean, we eat lunch together and we talk, and it's obvious that we agree on a lot of things. We have a unique perspective on a lot of things and it challenges normal worldview. So it's good to just share it with people. I've got this [...] Tim is a joy to talk to. He thinks independently on all the AI issues of today, bringing new perspectives that challenge the status quo. At the same time, he’s sincere and very helpful to work with, working hard to uplift those around him and the academic community. There’s a reason he’s so loved in the open-source AI community. 
We cover:\\n\\nGeneral vibes in open-source,\\n\\nAgents, SWE-Bench, and using open models for tasks like this,\\n\\nHow to be a GPU poor academic and have huge impact,\", 'score': 0.7591723, 'raw_content': None}, {'url': 'https://timdettmers.com/about/', 'title': 'About Me - Tim Dettmers', 'content': 'Skip links\\n\\nTim Dettmers\\n\\nMaking deep learning accessible.\\n\\nHeader Right\\n\\nBlog Posts Topics\\n\\nMain navigation\\n\\nAbout Me\\n\\nResearch Interests Publications Awards & Honors Service\\n\\nGoogle Scholar\\n\\nfirstname.lastname@gmail.com\\n\\nI am a research scientist at the Allen Institute for Artificial Intelligence (Ai2) and an incoming Assistant Professor at Carnegie Mellon University (CMU). I am the creator and maintainer of bitsandbytes. [...] I have a PhD from University of Washington advised by Luke Zettlemoyer working on efficient deep learning at the intersection between machine learning, natural language processing, and computer systems with a focus on quantization and sparsity. My main research goal is to empower everyone to make AI their own. I do this by making large models accessible through my research (QLoRA, LLM.int8(), k-bit inference scaling laws, Petals, SWARM) and by developing software that makes it easy to use my [...] University while working as a software engineer in the automation industry.', 'score': 0.67037153, 'raw_content': None}, {'url': 'https://csd.cmu.edu/people/faculty/tim-dettmers', 'title': 'Tim Dettmers | Carnegie Mellon University Computer Science ...', 'content': 'Main navigation\\n\\nBack to Main Menu\\n\\nAbout\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\tMain page\\n\\nBack to Main Menu\\n\\nAdmissions\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\tMain page\\n\\nBack to Main Menu\\n\\nAcademics\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\tMain page\\n\\nBack to Main Menu\\n\\nPeople\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\tMain page\\n\\nBack to Main Menu\\n\\nResearch\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\tMain page\\n\\nBreadcrumb\\n\\nTim Dettmers\\n\\nAssistant Professor\\n\\nWebsite\\n\\nGoogle Scholars Link\\n\\nEmail\\n\\t\\t\\t\\t\\t\\n\\t\\t\\t\\t\\t\\n dettmers@cmu.edu', 'score': 0.6580885, 'raw_content': None}], 'response_time': 3.76}), ToolMessage(content='Published: 2022-03-27\\nTitle: A General Survey on Attention Mechanisms in Deep Learning\\nAuthors: Gianni Brauwers, Flavius Frasincar\\nSummary: Attention is an important mechanism that can be employed for a variety of\\ndeep learning models across many different domains and tasks. This survey\\nprovides an overview of the most important attention mechanisms proposed in the\\nliterature. The various attention mechanisms are explained by means of a\\nframework consisting of a general attention model, uniform notation, and a\\ncomprehensive taxonomy of attention mechanisms. Furthermore, the various\\nmeasures for evaluating attention models are reviewed, and methods to\\ncharacterize the structure of attention models based on the proposed framework\\nare discussed. Last, future work in the field of attention models is\\nconsidered.\\n\\nPublished: 2022-07-04\\nTitle: Attention mechanisms for physiological signal deep learning: which attention should we take?\\nAuthors: Seong-A Park, Hyung-Chul Lee, Chul-Woo Jung, Hyun-Lim Yang\\nSummary: Attention mechanisms are widely used to dramatically improve deep learning\\nmodel performance in various fields. However, their general ability to improve\\nthe performance of physiological signal deep learning model is immature. 
In\\nthis study, we experimentally analyze four attention mechanisms (e.g.,\\nsqueeze-and-excitation, non-local, convolutional block attention module, and\\nmulti-head self-attention) and three convolutional neural network (CNN)\\narchitectures (e.g., VGG, ResNet, and Inception) for two representative\\nphysiological signal prediction tasks: the classification for predicting\\nhypotension and the regression for predicting cardiac output (CO). We evaluated\\nmultiple combinations for performance and convergence of physiological signal\\ndeep learning model. Accordingly, the CNN models with the spatial attention\\nmechanism showed the best performance in the classification problem, whereas\\nthe channel attention mechanism achieved the lowest error in the regression\\nproblem. Moreover, the performance and convergence of the CNN models with\\nattention mechanisms were better than stand-alone self-attention models in both\\nproblems. Hence, we verified that convolutional operation and attention\\nmechanisms are complementary and provide faster convergence time, despite the\\nstand-alone self-attention models requiring fewer parameters.\\n\\nPublished: 2018-10-17\\nTitle: An Analysis of Attention Mechanisms: The Case of Word Sense Disambiguation in Neural Machine Translation\\nAuthors: Gongbo Tang, Rico Sennrich, Joakim Nivre\\nSummary: Recent work has shown that the encoder-decoder attention mechanisms in neural\\nmachine translation (NMT) are different from the word alignment in statistical\\nmachine translation. In this paper, we focus on analyzing encoder-decoder\\nattention mechanisms, in the case of word sense disambiguation (WSD) in NMT\\nmodels. We hypothesize that attention mechanisms pay more attention to context\\ntokens when translating ambiguous words. We explore the attention distribution\\npatterns when translating ambiguous nouns. Counter-intuitively, we find that\\nattention mechanisms are likely to distribute more attention to the ambiguous\\nnoun itself rather than context tokens, in comparison to other nouns. We\\nconclude that attention mechanism is not the main mechanism used by NMT models\\nto incorporate contextual information for WSD. The experimental results suggest\\nthat NMT models learn to encode contextual information necessary for WSD in the\\nencoder hidden states. For the attention mechanism in Transformer models, we\\nreveal that the first few layers gradually learn to \"align\" source and target\\ntokens and the last few layers learn to extract features from the related but\\nunaligned context tokens.', name='arxiv', id='ff316a16-9c51-4750-a02f-ec4a2b5be329', tool_call_id='call_YW20EEjNPAJAWH53y4IRY1sC')]\n", "\n", "\n", "\n", "Receiving update from node: 'agent'\n", "[AIMessage(content=\"### LoRA in Machine Learning\\nLoRA, or Low-Rank Adaptation, is a method used for efficiently adapting large language models (LLMs) with minimal computational resources. It introduces auxiliary parameters for each layer to fine-tune pre-trained models, making it a popular parameter-efficient fine-tuning (PEFT) method. LoRA aims to reduce computational costs while maintaining performance. Recent advancements like KD-LoRA combine LoRA with knowledge distillation to further enhance efficiency, reducing GPU memory usage and inference time significantly. 
LoRA-drop is another technique that prunes LoRA parameters based on their output evaluation, retaining performance while reducing parameter count.\\n\\n### Tim Dettmers\\nTim Dettmers is a prominent figure in the AI community, known for his work in making deep learning more accessible. He is an Assistant Professor at Carnegie Mellon University and a Research Scientist at the Allen Institute for AI. Dettmers has contributed significantly to the development of open-source tools for AI, particularly in quantization and efficient deep learning. His research focuses on making large models accessible and includes projects like QLoRA and bitsandbytes. He holds a Ph.D. from the University of Washington, where he worked on efficient deep learning at the intersection of machine learning, natural language processing, and computer systems.\\n\\n### Attention Mechanism in Machine Learning\\nAttention mechanisms are crucial in deep learning models, enhancing performance across various domains and tasks. They allow models to focus on specific parts of the input data, improving the model's ability to handle complex tasks like translation and image recognition. Different types of attention mechanisms, such as multi-head self-attention and convolutional block attention modules, are used depending on the task. These mechanisms help models to better understand context and improve convergence times. In neural machine translation, attention mechanisms help in word sense disambiguation by focusing on context tokens, although they are not the sole mechanism for incorporating contextual information.\", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 385, 'prompt_tokens': 2734, 'total_tokens': 3119, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-2024-08-06', 'system_fingerprint': 'fp_6b6e24b474', 'finish_reason': 'stop', 'logprobs': None}, id='run-e5e650eb-9d36-47a4-b477-a012a0968bf7-0', usage_metadata={'input_tokens': 2734, 'output_tokens': 385, 'total_tokens': 3119, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})]\n", "\n", "\n", "\n" ] } ], "source": [ "inputs = {\"messages\" : [HumanMessage(content=\"Related to machine learning, what is LoRA? Also, who is Tim Dettmers? Also, what is Attention?\")]}\n", "\n", "async for chunk in agent_with_helpfulness_check.astream(inputs, stream_mode=\"updates\"):\n", " for node, values in chunk.items():\n", " print(f\"Receiving update from node: '{node}'\")\n", " print(values[\"messages\"])\n", " print(\"\\n\\n\")" ] }, { "cell_type": "markdown", "metadata": { "id": "yVmZPs6lnpsM" }, "source": [ "### Task 4: LangGraph for the \"Patterns\" of GenAI\n", "\n", "Let's ask our system about the 4 patterns of Generative AI:\n", "\n", "1. Prompt Engineering\n", "2. RAG\n", "3. Fine-tuning\n", "4. 
Agents" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "id": "ZoLl7GlXoae-" }, "outputs": [], "source": [ "patterns = [\"prompt engineering\", \"RAG\", \"fine-tuning\", \"LLM-based agents\"]" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "Zkh0YJuCp3Zl", "outputId": "d847426e-71b3-47e6-b1ae-351a78d68d1e" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "**Prompt Engineering Definition:**\n", "\n", "Prompt engineering is the process of designing and refining input prompts to effectively guide the behavior of AI models. It involves structuring or crafting instructions to produce the best possible output from a generative AI model. This can include phrasing a query, specifying a style, choice of words and grammar, providing relevant context, or describing a character for the AI to mimic. It is a technique used to refine large language models (LLMs) with specific or recommended prompts, and it can be done by anyone using natural language in generators like ChatGPT or DALL-E. [Sources: Coursera, Wikipedia, TechTarget, Stanford UIT, DataCamp]\n", "\n", "**History of Prompt Engineering:**\n", "\n", "Prompt engineering has been around since the early days of natural language processing (NLP) and is closely tied to the development and evolution of NLP and AI systems. The importance of prompting grew as models became more sophisticated. A significant milestone was the release of GPT-3 by OpenAI in 2020, which showcased the potential of large-scale pretrained models. This led to the exploration of crafting effective prompts to control and guide the model's behavior. The field has rapidly grown into its own discipline, especially with the advent of reinforcement learning techniques and the development of models like InstructGPT. [Sources: PromptEngineering4U, K2View, Medium, Cobus Greyling, LarkSuite]\n", "\n", "\n", "\n", "Retrieval-Augmented Generation (RAG) is a relatively new technology that was first proposed in 2020. It gained attention in the field of generative AI following the publication of a paper titled \"Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks\" by Patrick Lewis and a team at Facebook AI Research. RAG combines retrieval mechanisms with generative models to improve the accuracy and relevance of AI-generated content by grounding it in external knowledge sources.\n", "\n", "The concept of RAG builds on earlier developments in information retrieval and neural networks, with its roots tracing back to the 1950s and 1960s. However, the integration of retrieval techniques with generative models became more prominent with the advent of transformer architectures in the 2010s, which enabled significant advancements in language modeling and retrieval accuracy.\n", "\n", "RAG has been embraced by both academic and industry researchers as a way to enhance the capabilities of large language models (LLMs) in handling knowledge-intensive tasks. The technology continues to evolve, with ongoing research exploring modular frameworks and advanced retrieval processes to further improve its effectiveness and scalability.\n", "\n", "\n", "\n", "Fine-tuning in machine learning refers to the process of taking a pre-trained model and further training it on a smaller, targeted dataset. This approach is particularly useful for adapting a model to a specific task or domain after it has been initially trained on a broader dataset. 
Fine-tuning allows for the customization of models to improve their performance on specific tasks without the need to train a model from scratch, which can be resource-intensive.\n", "\n", "The concept of fine-tuning has been around for some time, but it gained significant attention with the rise of deep learning and neural networks. In the neural network era, fine-tuning typically involves taking a pre-trained model, attaching a new layer (or head) for a specific task, and training this new layer while keeping the base model largely unchanged. This method allows the task-specific head to learn to classify, predict, or generate outputs relevant to the new task.\n", "\n", "Fine-tuning became particularly prominent with the advent of transfer learning, where models pre-trained on large datasets (like ImageNet for image recognition tasks) are adapted to new tasks with smaller datasets. This approach has been widely adopted in various fields, including natural language processing and computer vision.\n", "\n", "The exact timeline of when fine-tuning \"broke onto the scene\" is not pinpointed in the search results, but it has been a part of the machine learning toolkit for several years, especially gaining traction with the development of large-scale neural networks and the need for efficient adaptation to specific tasks.\n", "\n", "\n", "\n", "LLM-based agents, or Large Language Model-based agents, are systems that utilize large language models to perform complex tasks by combining reasoning, planning, and memory capabilities. These agents act as a \"brain\" to control a flow of operations needed to complete tasks or user requests, often involving modules like planning, memory, and tool usage.\n", "\n", "The concept of LLM-based agents gained significant attention with the popularization of OpenAI's ChatGPT in 2022. Since then, various methods and techniques have been developed to enhance their utilization and address their limitations. These agents represent a significant advancement in AI, bridging the gap between reasoning and action, and enabling systems to interact with their environment in real-time. 
They are used in areas such as web interaction, software development, and scientific discovery.\n", "\n", "The evolution of LLM-based agents can be traced back to earlier AI agents from the 1960s, but they have significantly advanced with the introduction of LLMs, providing a higher level of generality and capability compared to earlier models.\n", "\n", "\n", "\n" ] } ], "source": [ "for pattern in patterns:\n", " what_is_string = f\"What is {pattern} and when did it break onto the scene?\"\n", " inputs = {\"messages\" : [HumanMessage(content=what_is_string)]}\n", " messages = agent_with_helpfulness_check.invoke(inputs)\n", " print(messages[\"messages\"][-1].content)\n", " print(\"\\n\\n\")" ] }
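, { "cell_type": "markdown", "metadata": {}, "source": [ "As a quick extension - the loop above runs the four pattern queries one at a time. Since compiled LangGraph graphs implement the LangChain `Runnable` interface, we could also fan the same requests out concurrently with `.batch`. The cell below is a minimal sketch of that idea - it assumes `patterns`, `HumanMessage`, and `agent_with_helpfulness_check` from the cells above, and is left unexecuted here." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# A minimal sketch - assumes `patterns`, `HumanMessage`, and\n", "# `agent_with_helpfulness_check` are defined in the cells above.\n", "batched_inputs = [\n", " {\"messages\": [HumanMessage(content=f\"What is {pattern} and when did it break onto the scene?\")]}\n", " for pattern in patterns\n", "]\n", "\n", "# Compiled graphs are Runnables, so `.batch` runs the requests concurrently\n", "# and returns one final state per input, in order.\n", "results = agent_with_helpfulness_check.batch(batched_inputs)\n", "\n", "# Keep just the final answer for each pattern.\n", "answers = {pattern: result[\"messages\"][-1].content for pattern, result in zip(patterns, results)}\n", "\n", "for pattern, answer in answers.items():\n", " print(f\"{pattern}:\\n{answer}\\n\")" ] }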
"_model_name": "ProgressStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "bar_color": null, "description_width": "" } }, "72adef9b70dd48198b7322b6c5b113cf": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HTMLView", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_77d4c0ebaae045b58efc4f789c9a2360", "placeholder": "​", "style": "IPY_MODEL_0d622ccc56264fac8fd7508dbdbe6e29", "value": " 6/? [00:36<00:00,  5.78s/it]" } }, "77d4c0ebaae045b58efc4f789c9a2360": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "886d762f2a7c421382efb5502c6d42a1": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "DescriptionStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "description_width": "" } }, "8a61d045ffd44ac58f3f13eb10044836": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } 
}, "ab91fd625bbd43afbf8c6398193a88d0": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": "20px" } }, "ad84e0e971d3455db2efe7dd0d1f803e": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "FloatProgressModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "FloatProgressModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "ProgressView", "bar_style": "success", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_ab91fd625bbd43afbf8c6398193a88d0", "max": 1, "min": 0, "orientation": "horizontal", "style": "IPY_MODEL_716557ad09874dcb989d75f7c74424cd", "value": 1 } }, "efcf57067cf743d8b4ce059a61cbe02e": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HBoxModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HBoxModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HBoxView", "box_style": "", "children": [ "IPY_MODEL_53e33aae3b97490c82aec7bbb0d6ebba", "IPY_MODEL_ad84e0e971d3455db2efe7dd0d1f803e", "IPY_MODEL_72adef9b70dd48198b7322b6c5b113cf" ], "layout": "IPY_MODEL_8a61d045ffd44ac58f3f13eb10044836" } } } } }, "nbformat": 4, "nbformat_minor": 0 }