# LangGraph and LangSmith - Agentic RAG Powered by LangChain

In this notebook we'll complete the following tasks:

- 🤝 Breakout Room #1:
  1. Install required libraries
  2. Set Environment Variables
  3. Creating our Tool Belt
  4. Creating Our State
  5. Creating and Compiling A Graph!

- 🤝 Breakout Room #2:
  1. Evaluating the LangGraph Application with LangSmith
  2. Adding Helpfulness Check and "Loop" Limits
  3. LangGraph for the "Patterns" of GenAI

# 🤝 Breakout Room #1

## Part 1: LangGraph - Building Cyclic Applications with LangChain

LangGraph is a tool that leverages LangChain Expression Language to build coordinated multi-actor and stateful applications that include cyclic behaviour.

### Why Cycles?

In essence, we can think of a cycle in our graph as a more robust and customizable loop. It allows us to keep our application agent-forward while still giving the powerful functionality of traditional loops.

Because we use cycles rather than loops, we can also compose rather complex flows through our graph in a much more readable and natural fashion, effectively allowing us to recreate application flowcharts in code in an almost 1-to-1 fashion.

### Why LangGraph?

Beyond the agent-forward approach - we can easily compose and combine traditional "DAG" (directed acyclic graph) chains with powerful cyclic behaviour due to the tight integration with LCEL. This means it's a natural extension to LangChain's core offerings!

## Task 1:  Dependencies

We'll first install all our required libraries.

> NOTE: If you're running this locally - please skip this step.

In [1]:
#!pip install -qU langchain langchain_openai langchain-community langgraph arxiv

## Task 2: Environment Variables

We'll want to set both our OpenAI API key and our LangSmith environment variables.

In [2]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

In [3]:
os.environ["TAVILY_API_KEY"] = getpass.getpass("TAVILY_API_KEY")

In [4]:
from uuid import uuid4

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = f"AIE6 - LangGraph - {uuid4().hex[0:8]}"
os.environ["LANGCHAIN_API_KEY"] = getpass.getpass("LangSmith API Key: ")

## Task 3: Creating our Tool Belt

As is usually the case, we'll want to equip our agent with a toolbelt to help answer questions and add external knowledge.

There's a tonne of tools in the [LangChain Community Repo](https://github.com/langchain-ai/langchain/tree/master/libs/community/langchain_community/tools) but we'll stick to a couple just so we can observe the cyclic nature of LangGraph in action!

We'll leverage:

- [Tavily Search Results](https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/tools/tavily_search/tool.py)
- [Arxiv](https://github.com/langchain-ai/langchain/tree/master/libs/community/langchain_community/tools/arxiv)

#### 🏗️ Activity #1:

Please add the tools to use into our toolbelt.

> NOTE: Each tool in our toolbelt should be a method.

In [5]:
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_community.tools.arxiv.tool import ArxivQueryRun

tavily_tool = TavilySearchResults(max_results=5)

tool_belt = [
    tavily_tool,
    ArxivQueryRun(),
]

In [34]:
tool_belt

[TavilySearchResults(api_wrapper=TavilySearchAPIWrapper(tavily_api_key=SecretStr('**********'))),
 ArxivQueryRun(api_wrapper=ArxivAPIWrapper(arxiv_search=<class 'arxiv.Search'>, arxiv_exceptions=(<class 'arxiv.ArxivError'>, <class 'arxiv.UnexpectedEmptyPageError'>, <class 'arxiv.HTTPError'>), top_k_results=3, ARXIV_MAX_QUERY_LENGTH=300, continue_on_failure=False, load_max_docs=100, load_all_available_meta=False, doc_content_chars_max=4000))]

### Model

Now we can set up our model! We'll leverage the familiar OpenAI model suite for this example - but it's not *necessary* to use OpenAI with LangGraph. LangGraph supports all models - though you might not find success with smaller models - so the recommendation is to stick with:

- OpenAI's GPT-3.5 and GPT-4
- Anthropic's Claude
- Google's Gemini

> NOTE: Because we're leveraging the OpenAI function calling API - we'll need to use OpenAI *for this specific example* (or any other service that exposes an OpenAI-style function calling API).

In [6]:
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o", temperature=0)

Now that we have our model set-up, let's "put on the tool belt", which is to say: We'll bind our LangChain formatted tools to the model in an OpenAI function calling format.

In [7]:
model = model.bind_tools(tool_belt)

In [35]:
model

RunnableBinding(bound=ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x119491a90>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x1194a6270>, root_client=<openai.OpenAI object at 0x118e26ba0>, root_async_client=<openai.AsyncOpenAI object at 0x119491be0>, model_name='gpt-4o', temperature=0.0, model_kwargs={}, openai_api_key=SecretStr('**********')), kwargs={'tools': [{'type': 'function', 'function': {'name': 'tavily_search_results_json', 'description': 'A search engine optimized for comprehensive, accurate, and trusted results. Useful for when you need to answer questions about current events. Input should be a search query.', 'parameters': {'properties': {'query': {'description': 'search query to look up', 'type': 'string'}}, 'required': ['query'], 'type': 'object'}}}, {'type': 'function', 'function': {'name': 'arxiv', 'description': 'A wrapper around Arxiv.org Useful for when you need to answer questions

#### ❓ Question #1:

How does the model determine which tool to use?

#### Answer
 - When tools are bound to a model (as seen in the line model = model.bind_tools(tool_belt)), the LLM receives information about each tool's name, description, and required parameters.
 - The model analyzes the user's query to understand what information is needed or what task needs to be performed.
 - Based on this understanding, the model decides which tool would be most appropriate to use by selecting from the available tools in its context.
 - In the function calling framework (like OpenAI's), the model outputs a structured format specifying which tool to call and with what parameters.
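To make the last point concrete, here is a minimal sketch of what happens once the model has emitted a structured tool call. The tool functions, the `TOOLS` registry, and the `dispatch` helper are hypothetical stand-ins written for illustration, not LangChain APIs; only the `{"name": ..., "args": ...}` shape is taken from the `tool_calls` entries on an `AIMessage`.

```python
# Hypothetical stand-ins for real tools; they only echo their input.
def fake_web_search(query: str) -> str:
    return f"web results for: {query}"


def fake_paper_search(query: str) -> str:
    return f"papers matching: {query}"


# The registry maps the tool name the model sees to a callable.
TOOLS = {
    "web_search": fake_web_search,
    "paper_search": fake_paper_search,
}


def dispatch(tool_call: dict) -> str:
    """Look up the requested tool by name and call it with the given args."""
    tool = TOOLS[tool_call["name"]]
    return tool(**tool_call["args"])


# A tool call in the same shape as an entry in `AIMessage.tool_calls`.
example_call = {"name": "paper_search", "args": {"query": "QLoRA"}}
result = dispatch(example_call)
print(result)
```

The model never runs the tool itself: it only names a tool and supplies arguments, and the surrounding application performs the call.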

## Task 4: Putting the State in Stateful

Earlier we used this phrasing:

`coordinated multi-actor and stateful applications`

So what does that "stateful" mean?

To put it simply - we want to have some kind of object which we can pass around our application that holds information about what the current situation (state) is. Since our system will be constructed of many parts moving in a coordinated fashion - we want to be able to ensure we have some commonly understood idea of that state.

LangGraph leverages a `StateGraph` which uses an `AgentState` object to pass information between the various nodes of the graph.

There are more options than what we'll see below - but this `AgentState` object is a `TypedDict` with the key `messages`, whose value is a list of messages. The `add_messages` annotation tells the graph to append new messages to that list whenever a node returns an update, rather than overwriting it.

Let's think about a simple example to help understand exactly what this means (we'll simplify a great deal to try and clearly communicate what state is doing):

1. We initialize our state object:
  - `{"messages" : []}`
2. Our user submits a query to our application.
  - New State: `HumanMessage(#1)`
  - `{"messages" : [HumanMessage(#1)]}`
3. We pass our state object to an Agent node which is able to read the current state. It will use the last `HumanMessage` as input. It gets some kind of output which it will add to the state.
  - New State: `AgentMessage(#1, additional_kwargs {"function_call" : "WebSearchTool"})`
  - `{"messages" : [HumanMessage(#1), AgentMessage(#1, ...)]}`
4. We pass our state object to a "conditional node" (more on this later) which reads the last state to determine if we need to use a tool - which it can determine properly because of our provided object!
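The walkthrough above can be sketched in plain Python. Here `append_messages` and `apply_update` are hypothetical helpers that only mimic the append behaviour; the real `add_messages` reducer does more than plain concatenation, and the strings stand in for actual message objects.

```python
from typing import Any


def append_messages(existing: list, new: list) -> list:
    """Simplified reducer: return a new list with the new messages appended."""
    return existing + new


def apply_update(state: dict[str, Any], update: dict[str, Any]) -> dict[str, Any]:
    """Merge a node's partial update into the state using the reducer."""
    return {"messages": append_messages(state["messages"], update["messages"])}


# Step 1: the initial, empty state.
state = {"messages": []}

# Step 2: the user's query is added.
state = apply_update(state, {"messages": ["HumanMessage(#1)"]})

# Step 3: the agent node returns only its new message, not the whole history.
state = apply_update(state, {"messages": ["AgentMessage(#1)"]})

print(state)
```

Note that each node returns only what is new; the reducer is responsible for combining that with the existing history.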

In [8]:
from typing import TypedDict, Annotated
from langgraph.graph.message import add_messages
import operator
from langchain_core.messages import BaseMessage

class AgentState(TypedDict):
  messages: Annotated[list, add_messages]

## Task 5: It's Graphing Time!

Now that we have state, and we have tools, and we have an LLM - we can finally start making our graph!

Let's take a second to refresh ourselves about what a graph is in this context.

Graphs, also called networks in some circles, are a collection of connected objects.

The objects in question are typically called nodes, or vertices, and the connections are called edges.

Let's look at a simple graph.

![image](https://i.imgur.com/2NFLnIc.png)

Here, we're using the coloured circles to represent the nodes and the yellow lines to represent the edges. In this case, we're looking at a fully connected graph - where each node is connected by an edge to each other node.

If we were to think about nodes in the context of LangGraph - we would think of a function, or an LCEL runnable.

If we were to think about edges in the context of LangGraph - we might think of them as "paths to take" or "where to pass our state object next".

Let's create some nodes and expand on our diagram.

> NOTE: Due to the tight integration with LCEL - we can comfortably create our nodes in an async fashion!

In [37]:
from langgraph.prebuilt import ToolNode

def call_model(state):
  messages = state["messages"]
  response = model.invoke(messages)
  return {"messages" : [response]}

tool_node = ToolNode(tool_belt)

Now we have two total nodes. We have:

- `call_model` is a node that will...well...call the model
- `tool_node` is a node which can call a tool

Let's start adding nodes! We'll update our diagram along the way to keep track of what this looks like!


In [38]:
from langgraph.graph import StateGraph, END

uncompiled_graph = StateGraph(AgentState)

uncompiled_graph.add_node("agent", call_model)
uncompiled_graph.add_node("action", tool_node)

<langgraph.graph.state.StateGraph at 0x11c8647d0>

Let's look at what we have so far:

![image](https://i.imgur.com/md7inqG.png)

Next, we'll add our entrypoint. All our entrypoint does is indicate which node is called first.

In [39]:
uncompiled_graph.set_entry_point("agent")

<langgraph.graph.state.StateGraph at 0x11c8647d0>

![image](https://i.imgur.com/wNixpJe.png)

Now we want to build a "conditional edge" which will use the output state of a node to determine which path to follow.

We can help conceptualize this by thinking of our conditional edge as a conditional in a flowchart!

Notice how our function simply checks whether the last message contains any `tool_calls`.

Then we create an edge where the origin node is our agent node and our destination node is *either* the action node or END (finish the graph).

It's important to highlight that `add_conditional_edges` can also take a dictionary as a third parameter (the mapping), which should be created with the possible outputs of our conditional function in mind. We don't pass one here, so the values `should_continue` returns are used directly as destinations: `"action"` routes to the action node and `END` finishes the graph.
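As a rough illustration of how a routing function's return value can be turned into a destination, with and without a mapping, consider the sketch below. `route`, `resolve_destination`, and `END_MARKER` are hypothetical names used only for this example, and plain dicts stand in for message objects; this is not how LangGraph is implemented internally.

```python
from typing import Optional

END_MARKER = "<end>"  # hypothetical sentinel standing in for LangGraph's END


def route(state: dict) -> str:
    """Return a label describing what should happen next."""
    last_message = state["messages"][-1]
    return "continue" if last_message.get("tool_calls") else "end"


def resolve_destination(label: str, mapping: Optional[dict] = None) -> str:
    """With a mapping, translate the label; without one, the label is the node name."""
    if mapping is None:
        return label
    return mapping[label]


# Every label the routing function can return needs a key in the mapping.
mapping = {"continue": "action", "end": END_MARKER}

needs_tool = {"messages": [{"content": "", "tool_calls": [{"name": "arxiv"}]}]}
finished = {"messages": [{"content": "Here is the answer.", "tool_calls": []}]}

print(resolve_destination(route(needs_tool), mapping))
print(resolve_destination(route(finished), mapping))
```

A label that is missing from the mapping fails the lookup, which is why the mapping should be written with every possible output of the routing function in mind.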

In [40]:
def should_continue(state):
  last_message = state["messages"][-1]

  if last_message.tool_calls:
    return "action"

  return END

uncompiled_graph.add_conditional_edges(
    "agent",
    should_continue
)

<langgraph.graph.state.StateGraph at 0x11c8647d0>

Let's visualize what this looks like.

![image](https://i.imgur.com/8ZNwKI5.png)

Finally, we can add our last edge which will connect our action node to our agent node. This is because we *always* want our action node (which is used to call our tools) to return its output to our agent!

In [41]:
uncompiled_graph.add_edge("action", "agent")

<langgraph.graph.state.StateGraph at 0x11c8647d0>

Let's look at the final visualization.

![image](https://i.imgur.com/NWO7usO.png)

All that's left to do now is to compile our workflow - and we're off!

In [42]:
compiled_graph = uncompiled_graph.compile()

#### ❓ Question #2:

Is there any specific limit to how many times we can cycle?

If not, how could we impose a limit to the number of cycles?

#### Answer
 - By default, LangGraph stops after 25 super-steps and raises a `GraphRecursionError`.
 - There are a couple of ways to impose a limit:
    - Add a conditional edge that checks a termination condition
        - this could be a business-case condition
        - or just a counter
    - Set `recursion_limit` in the config passed to `invoke`
        ```
            graph.invoke(inputs, {"recursion_limit": 3})

        ```
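The counter idea can be sketched without any graph library. In this sketch `StepLimitExceeded`, `run_with_limit`, and `count_to_three` are hypothetical names; the exception only plays the role that `GraphRecursionError` plays in LangGraph.

```python
class StepLimitExceeded(RuntimeError):
    """Hypothetical error, playing the role of LangGraph's GraphRecursionError."""


def run_with_limit(step, state: dict, max_steps: int) -> dict:
    """Call `step` repeatedly until it marks the state done or the budget runs out."""
    for _ in range(max_steps):
        state = step(state)
        if state["done"]:
            return state
    raise StepLimitExceeded(f"no result after {max_steps} steps")


def count_to_three(state: dict) -> dict:
    """Toy step: finishes once the counter reaches 3."""
    count = state["count"] + 1
    return {"count": count, "done": count >= 3}


# Enough budget: the loop finishes normally.
print(run_with_limit(count_to_three, {"count": 0, "done": False}, max_steps=5))

# Too little budget: the limit is hit before the step reports completion.
try:
    run_with_limit(count_to_three, {"count": 0, "done": False}, max_steps=2)
except StepLimitExceeded as err:
    print(f"stopped: {err}")
```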




## Using Our Graph

Now that we've created and compiled our graph - we can call it *just as we'd call any other* `Runnable`!

Let's try out a few examples to see how it fares:

In [43]:
from langchain_core.messages import HumanMessage

inputs = {"messages" : [HumanMessage(content="Who is the current captain of the Winnipeg Jets?")]}

async for chunk in compiled_graph.astream(inputs, stream_mode="updates"):
    for node, values in chunk.items():
        print(f"Receiving update from node: '{node}'")
        print(values["messages"])
        print("\n\n")

Receiving update from node: 'agent'
[AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_tsXMo9fY3k63iaeVgvVkRR6k', 'function': {'arguments': '{"query":"current captain of the Winnipeg Jets 2023"}', 'name': 'tavily_search_results_json'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 27, 'prompt_tokens': 162, 'total_tokens': 189, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-2024-08-06', 'system_fingerprint': 'fp_6b6e24b474', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-005e321a-543b-49aa-9e22-d03ecac04943-0', tool_calls=[{'name': 'tavily_search_results_json', 'args': {'query': 'current captain of the Winnipeg Jets 2023'}, 'id': 'call_tsXMo9fY3k63iaeVgvVkRR6k', 'type': 'tool_call'}], usage_metadata={'input_tokens': 162, 'output_t

Let's look at what happened:

1. Our state object was populated with our request
2. The state object was passed into our entry point (agent node) and the agent node added an `AIMessage` to the state object and passed it along the conditional edge
3. The conditional edge received the state object, found `tool_calls` on the last message, and sent the state object to the action node
4. The action node ran the requested tool, added the tool's result to the state object as a `ToolMessage`, and passed it along the edge to the agent node
5. The agent node added a response to the state object and passed it along the conditional edge
6. The conditional edge received the state object, found no `tool_calls` on the last message, and passed the state object to END where we see it output in the cell above!
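The six steps above can be replayed with a toy loop. Everything here is a stub written for illustration: `fake_model`, `fake_tool`, and `run` are hypothetical, plain dicts stand in for message objects, and no model or search API is called.

```python
def fake_model(messages: list) -> dict:
    """Stub model: ask for a tool once, then answer using the tool result."""
    if not any(m["role"] == "tool" for m in messages):
        call = {"name": "search", "args": {"query": messages[0]["content"]}}
        return {"role": "ai", "content": "", "tool_calls": [call]}
    answer = f"Answer based on: {messages[-1]['content']}"
    return {"role": "ai", "content": answer, "tool_calls": []}


def fake_tool(call: dict) -> dict:
    """Stub tool: echo the query back as a tool message."""
    return {"role": "tool", "content": f"results for {call['args']['query']}"}


def run(question: str) -> tuple:
    """Walk agent -> action -> agent until the agent stops asking for tools."""
    messages = [{"role": "human", "content": question}]
    visited = []
    node = "agent"
    while node != "end":
        visited.append(node)
        if node == "agent":
            messages.append(fake_model(messages))
            # Conditional edge: tool calls present means go to the action node.
            node = "action" if messages[-1]["tool_calls"] else "end"
        else:
            calls = messages[-1]["tool_calls"]
            messages.extend(fake_tool(call) for call in calls)
            # Plain edge: the action node always returns to the agent.
            node = "agent"
    return visited, messages


visited, messages = run("Who is the captain?")
print(visited)
print([m["role"] for m in messages])
```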

Now let's look at an example that uses multiple tools - all with the same flow!

In [44]:
inputs = {"messages" : [HumanMessage(content="Search Arxiv for the QLoRA paper, then search each of the authors to find out their latest Tweet using Tavily!")]}

async for chunk in compiled_graph.astream(inputs, stream_mode="updates"):
    for node, values in chunk.items():
        print(f"Receiving update from node: '{node}'")
        if node == "action":
          print(f"Tool Used: {values['messages'][0].name}")
        print(values["messages"])

        print("\n\n")

Receiving update from node: 'agent'
[AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_pPveli1xNGa6G9xZEKSFJ4Ux', 'function': {'arguments': '{"query":"QLoRA"}', 'name': 'arxiv'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 17, 'prompt_tokens': 178, 'total_tokens': 195, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-2024-08-06', 'system_fingerprint': 'fp_6b6e24b474', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-0a66a0cc-3904-4fca-a870-0e3decad4e69-0', tool_calls=[{'name': 'arxiv', 'args': {'query': 'QLoRA'}, 'id': 'call_pPveli1xNGa6G9xZEKSFJ4Ux', 'type': 'tool_call'}], usage_metadata={'input_tokens': 178, 'output_tokens': 17, 'total_tokens': 195, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'a

#### 🏗️ Activity #2:

Please write out the steps the agent took to arrive at the correct answer.

#### Answer


Here are the steps the agent took to arrive at the correct answer:

1. **Initial State Setup**: 
   - The state object was initialized with the user's query to search for the QLoRA paper and find the latest tweets from its authors.

2. **First Agent Node Execution** (1st cycle):
   - The agent analyzed the query and decided to first search for information about QLoRA.
   - It made a tool call to the "arxiv" tool with the query "QLoRA" to find relevant academic papers.

3. **First Tool Node Execution**:
   - The "arxiv" tool returned information about QLoRA papers, including the original paper "QLoRA: Efficient Finetuning of Quantized LLMs" by Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer.
   - This information was added to the state object and passed back to the agent node.

4. **Second Agent Node Execution** (2nd cycle):
   - The agent processed the arxiv search results and identified the four authors of the paper.
   - It decided to search for their latest tweets using the Tavily search tool.
   - It made four tool calls to "tavily_search_results_json", one per author:
     - "Tim Dettmers latest tweet"
     - "Artidoro Pagnoni latest tweet"
     - "Ari Holtzman latest tweet"
     - "Luke Zettlemoyer latest tweet"

5. **Second Tool Node Execution**:
   - The Tavily search tool executed all four queries and returned results for each author.
   - The tool results were added to the state object and passed back to the agent node.

6. **Third Agent Node Execution** (3rd cycle):
   - The agent analyzed the search results for each author.
   - It formatted the information into a structured response, including each author's latest tweet and its source URL.
   - Since it completed the task and no further tool calls were needed, the conditional edge routed the flow to END.
   - The final response was returned as the answer.




# 🤝 Breakout Room #2

## Part 1: LangSmith Evaluator

### Pre-processing for LangSmith

To do a little bit more preprocessing, let's wrap our LangGraph agent in a simple chain.

In [17]:
def convert_inputs(input_object):
  return {"messages" : [HumanMessage(content=input_object["question"])]}

def parse_output(input_state):
  return input_state["messages"][-1].content

agent_chain = convert_inputs | compiled_graph | parse_output

In [18]:
agent_chain.invoke({"question" : "What is RAG?"})

"RAG stands for Retrieval-Augmented Generation. It is a technique used in natural language processing (NLP) that combines retrieval-based methods with generative models to improve the quality and accuracy of generated text. Here's how it works:\n\n1. **Retrieval**: The system first retrieves relevant information from a large corpus or database. This step involves searching for documents, passages, or data that are related to the input query or context.\n\n2. **Augmentation**: The retrieved information is then used to augment the input to a generative model. This means that the generative model has access to additional context or facts that can help it produce more accurate and informative responses.\n\n3. **Generation**: Finally, the generative model uses both the original input and the retrieved information to generate a response. This can be in the form of answering questions, completing sentences, or creating more complex text outputs.\n\nRAG is particularly useful in scenarios wher

### Task 1: Creating An Evaluation Dataset

Just as we saw last week, we'll want to create a dataset to test our Agent's ability to answer questions.

In order to do this - we'll want to provide some questions and some answers. Let's look at how we can create such a dataset below.

```python
questions = [
    "What optimizer is used in QLoRA?",
    "What data type was created in the QLoRA paper?",
    "What is a Retrieval Augmented Generation system?",
    "Who authored the QLoRA paper?",
    "What is the most popular deep learning framework?",
    "What significant improvements does the LoRA system make?"
]

answers = [
    {"must_mention" : ["paged", "optimizer"]},
    {"must_mention" : ["NF4", "NormalFloat"]},
    {"must_mention" : ["ground", "context"]},
    {"must_mention" : ["Tim", "Dettmers"]},
    {"must_mention" : ["PyTorch", "TensorFlow"]},
    {"must_mention" : ["reduce", "parameters"]},
]
```

#### 🏗️ Activity #3:

Please create a dataset in the above format with at least 5 questions.

In [19]:
questions = [
    "What optimizer is used in QLoRA?",
    "What data type was created in the QLoRA paper?",
    "What is a Retrieval Augmented Generation system?",
    "Who authored the QLoRA paper?",
    "What is the most popular deep learning framework?",
    "What significant improvements does the LoRA system make?"
]

answers = [
    {"must_mention" : ["paged", "optimizer"]},
    {"must_mention" : ["NF4", "NormalFloat"]},
    {"must_mention" : ["ground", "context"]},
    {"must_mention" : ["Tim", "Dettmers"]},
    {"must_mention" : ["PyTorch", "TensorFlow"]},
    {"must_mention" : ["reduce", "parameters"]},
]

Now we can add our dataset to our LangSmith project using the following code which we saw last Thursday!

In [20]:
from langsmith import Client

client = Client()

dataset_name = f"Retrieval Augmented Generation - Evaluation Dataset - {uuid4().hex[0:8]}"

dataset = client.create_dataset(
    dataset_name=dataset_name,
    description="Questions about the QLoRA Paper to Evaluate RAG over the same paper."
)

client.create_examples(
    inputs=[{"question" : q} for q in questions],
    outputs=answers,
    dataset_id=dataset.id,
)

{'example_ids': ['04160349-02c4-4ba3-b103-a56ff5835e0c',
  '693c2c19-7e03-45ce-afb2-7312c3824dde',
  '8e7a920e-aaa9-4166-8917-b76fb413e39e',
  '1de91a09-0373-4982-bb87-9393b37b8d44',
  '1a45d644-6393-4b1b-b242-843982829449',
  'b4217d52-f22c-44c9-a548-99ef8b068b6f'],
 'count': 6}

#### ❓ Question #3:

How are the correct answers associated with the questions?

> NOTE: Feel free to indicate if this is problematic or not

#### Answer


The association between questions and correct answers happens through positional matching in parallel lists. 

This approach has several potential issues:

1. **Position-dependent mapping**: The association relies solely on list positions - the first question matches the first answer, second question with second answer, and so on. This creates a risk of misalignment if lists are modified independently.

2. **Fragile maintenance**: Adding or removing questions requires careful index management to maintain correct associations.

3. **Limited evaluation criteria**: Each answer only contains phrases that "must be mentioned" rather than complete reference answers or more nuanced evaluation criteria.

4. **String matching limitations**: The evaluator uses case-sensitive substring matching (`all(phrase in prediction for phrase in required)`), which doesn't account for capitalization differences, semantic equivalence, or paraphrasing.
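One way to reduce the positional-mapping risk described in the first two points is to pair each question with its answer up front and fail loudly on a length mismatch. The `pair_examples` helper below is a hypothetical sketch, not part of LangSmith; the two sample entries are taken from the dataset above.

```python
questions = [
    "What optimizer is used in QLoRA?",
    "Who authored the QLoRA paper?",
]

answers = [
    {"must_mention": ["paged", "optimizer"]},
    {"must_mention": ["Tim", "Dettmers"]},
]


def pair_examples(questions: list, answers: list) -> list:
    """Combine parallel lists into one record per example, checking lengths first."""
    if len(questions) != len(answers):
        raise ValueError(
            f"{len(questions)} questions but {len(answers)} answers"
        )
    return [
        {"inputs": {"question": q}, "outputs": a}
        for q, a in zip(questions, answers)
    ]


examples = pair_examples(questions, answers)
for example in examples:
    print(example)
```

Keeping the question and its expected mentions in the same record means that inserting or removing one example cannot silently shift the others.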



### Task 2: Adding Evaluators

Now we can add a custom evaluator to see if our responses contain the expected information.

We'll be using a fairly naive substring-match process to determine if our response contains specific strings.

In [21]:
from langsmith.evaluation import EvaluationResult, run_evaluator

@run_evaluator
def must_mention(run, example) -> EvaluationResult:
    prediction = run.outputs.get("output") or ""
    required = example.outputs.get("must_mention") or []
    score = all(phrase in prediction for phrase in required)
    return EvaluationResult(key="must_mention", score=score)

#### ❓ Question #4:

What are some ways you could improve this metric as-is?

> NOTE: Alternatively you can suggest where gaps exist in this method.


#### Answer

The current implementation uses simple, case-sensitive substring matching.

Here are ways to improve this metric:

1. **Semantic Matching**: Replace exact string matching with semantic similarity using embeddings to detect conceptually equivalent answers even when phrased differently.


The current method has notable gaps in that it can't handle paraphrasing, doesn't consider semantic meaning, and may incorrectly score responses that contain the required phrases but are fundamentally incorrect or contradictory.
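Short of full semantic matching, two small changes already address some of these gaps: ignoring case and giving partial credit. The `must_mention_score` function below is a hypothetical sketch in plain Python, not a LangSmith API; it could be called from inside an evaluator like the one above.

```python
def must_mention_score(prediction: str, required: list) -> float:
    """Return the fraction of required phrases found, ignoring case."""
    if not required:
        # Nothing is required, so the prediction trivially passes.
        return 1.0
    text = prediction.casefold()
    hits = sum(1 for phrase in required if phrase.casefold() in text)
    return hits / len(required)


# Capitalization no longer causes a miss.
print(must_mention_score("QLoRA uses Paged Optimizers.", ["paged", "optimizer"]))

# One of two phrases present gives partial credit instead of a flat failure.
print(must_mention_score("Tim wrote it.", ["Tim", "Dettmers"]))
```

This still cannot detect paraphrases or contradictions; it only makes the existing substring check less brittle.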


### Task 3: Evaluating

All that is left to do is evaluate our agent's response!

In [22]:
experiment_results = client.evaluate(
    agent_chain,
    data=dataset_name,
    evaluators=[must_mention],
    experiment_prefix=f"RAG Pipeline - Evaluation - {uuid4().hex[0:4]}",
    metadata={"version": "1.0.0"},
)

View the evaluation results for experiment: 'RAG Pipeline - Evaluation - fad9-8812cb12' at:
https://smith.langchain.com/o/519bdaca-6663-4536-ab5c-1158fb651454/datasets/18764a3a-aa33-43b1-9bc4-ecf33083b931/compare?selectedSessions=b212c466-1d03-4a18-b456-4d9c19cb9379





In [23]:
experiment_results

| | inputs.question | outputs.output | error | reference.must_mention | feedback.must_mention | execution_time | example_id | id |
|---|---|---|---|---|---|---|---|---|
| 0 | What optimizer is used in QLoRA? | QLoRA uses "paged optimizers" to manage memory... | | [paged, optimizer] | True | 9.613872 | 04160349-02c4-4ba3-b103-a56ff5835e0c | 2d8e4582-7d84-42f2-97a4-0348904ec159 |
| 1 | What is the most popular deep learning framework? | In 2023, the most popular deep learning framew... | | [PyTorch, TensorFlow] | True | 5.95862 | 1a45d644-6393-4b1b-b242-843982829449 | b041746f-fbc0-46ba-953e-099a9151457f |
| 2 | Who authored the QLoRA paper? | The QLoRA paper titled "Accurate LoRA-Finetuni... | | [Tim, Dettmers] | False | 13.92167 | 1de91a09-0373-4982-bb87-9393b37b8d44 | ae8f2e00-d479-4f1b-958a-2126d1f36a6d |
| 3 | What data type was created in the QLoRA paper? | The QLoRA paper introduced a new data type cal... | | [NF4, NormalFloat] | True | 15.764034 | 693c2c19-7e03-45ce-afb2-7312c3824dde | b3992d5f-6663-48a7-bebc-d0429aeaeab4 |
| 4 | What is a Retrieval Augmented Generation system? | A Retrieval Augmented Generation (RAG) system ... | | [ground, context] | False | 2.454875 | 8e7a920e-aaa9-4166-8917-b76fb413e39e | c25d4866-92ac-44d4-929d-5f270aa8c40c |
| 5 | What significant improvements does the LoRA sy... | The LoRA (Low-Rank Adaptation) system has seen... | | [reduce, parameters] | False | 24.377492 | b4217d52-f22c-44c9-a548-99ef8b068b6f | dc144e48-4c25-450d-9949-115a5f73de77 |


## Part 2: LangGraph with Helpfulness

### Task 3: Adding Helpfulness Check and "Loop" Limits

Now that we've done evaluation - let's see if we can add an extra step where we review the content we've generated to confirm if it fully answers the user's query!

We're going to make a few key adjustments to account for this:

1. We're going to add an artificial limit on how many "loops" the agent can go through - this will help us to avoid the potential situation where we never exit the loop.
2. We'll add to our existing conditional edge to obtain the behaviour we desire.

First, let's define our state again - we can check the length of the `messages` list in the state object, so we don't need additional state for this.

In [24]:
class AgentState(TypedDict):
  messages: Annotated[list, add_messages]

Now we can set our graph up! This process will be almost entirely the same - with the inclusion of one additional node/conditional edge!

#### 🏗️ Activity #5:

Please write markdown for the following cells to explain what each is doing.

### Setting Up the Helpfulness-Enhanced Graph

This code initializes a new StateGraph that will incorporate a helpfulness check mechanism. We create the graph with the same AgentState structure as before and add two key nodes:

1. The "agent" node, which uses the call_model function to process the state and generate a response
2. The "action" node, which uses tool_node to execute external tools when needed

This forms the basic structure upon which we'll build our enhanced graph with helpfulness evaluation.

In [25]:
graph_with_helpfulness_check = StateGraph(AgentState)

graph_with_helpfulness_check.add_node("agent", call_model)
graph_with_helpfulness_check.add_node("action", tool_node)

<langgraph.graph.state.StateGraph at 0x11c6cccd0>

### Defining the Entry Point

Here we set the "agent" node as the entry point for our graph.

In [26]:
graph_with_helpfulness_check.set_entry_point("agent")

<langgraph.graph.state.StateGraph at 0x11c6cccd0>

### Creating the Conditional Routing Function

This code defines the `tool_call_or_helpful` function, which serves as the decision-making component of our graph. It performs three critical evaluations:

1. Checks if the last message contains tool calls, routing to the action node if tools are needed
2. Implements a cycle limit by ending the run once the state holds more than 10 messages, which prevents infinite loops
3. Uses an LLM-based helpfulness check that compares the initial query to the current response, routing to:
   - "end" if the response is deemed helpful (contains "Y")
   - "continue" to loop back through the agent if more work is needed

This function adds intelligence to our graph by evaluating response quality.

In [27]:
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

def tool_call_or_helpful(state):
  last_message = state["messages"][-1]

  if last_message.tool_calls:
    return "action"

  initial_query = state["messages"][0]
  final_response = state["messages"][-1]

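  # Stop once the conversation grows past 10 messages. The returned label
  # must match a key in the conditional-edge mapping ("end", not "END").
  if len(state["messages"]) > 10:
    return "end"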

  prompt_template = """\
  Given an initial query and a final response, determine if the final response is extremely helpful or not. Please indicate helpfulness with a 'Y' and unhelpfulness as an 'N'.

  Initial Query:
  {initial_query}

  Final Response:
  {final_response}"""

  prompt_template = PromptTemplate.from_template(prompt_template)

  helpfulness_check_model = ChatOpenAI(model="gpt-4")

  helpfulness_chain = prompt_template | helpfulness_check_model | StrOutputParser()

  helpfulness_response = helpfulness_chain.invoke({"initial_query" : initial_query.content, "final_response" : final_response.content})

  if "Y" in helpfulness_response:
    return "end"
  else:
    return "continue"

#### 🏗️ Activity #4:

Please write what is happening in our `tool_call_or_helpful` function!

#### Answer
The `tool_call_or_helpful` function is the routing logic for our graph. It runs three checks, in order:

1. If the last message contains tool calls, it returns `"action"` so the tools are executed.
2. If the state holds more than 10 messages, it ends the run, which caps the number of loops.
3. Otherwise, it asks a separate LLM whether the final response is helpful given the initial query, and returns:
   - `"end"` if the verdict contains "Y"
   - `"continue"` to loop back through the agent if more work is needed

Because the tool-call check comes first, the helpfulness check only runs on responses that contain no tool calls.
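One detail worth noting is that checking `"Y" in helpfulness_response` matches any capital Y anywhere in the text. The sketch below contrasts that with a stricter parse; `is_helpful_loose` and `is_helpful_strict` are hypothetical helpers, and the sample verdict is made up for illustration rather than taken from a real model response.

```python
def is_helpful_loose(response: str) -> bool:
    """The check as written in the routing function: any 'Y' counts."""
    return "Y" in response


def is_helpful_strict(response: str) -> bool:
    """Treat the response as helpful only if it starts with Y (ignoring case and spaces)."""
    return response.strip().upper().startswith("Y")


# A made-up unhelpful verdict that happens to contain a capital Y.
sample = "N - Your answer does not address the question."

print(is_helpful_loose(sample))
print(is_helpful_strict(sample))
```

A stricter parse like this assumes the model leads with its verdict, which the prompt asks for but does not guarantee.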

### Adding Conditional Edges with Helpfulness Routing

This code connects the decision logic (`tool_call_or_helpful`) to our graph structure by adding conditional edges from the agent node. The routing paths include:

- "continue": Loops back to the agent node for further processing if the response isn't helpful enough
- "action": Routes to the action node when tool calls are detected
- "end": Terminates the graph execution when a helpful response is generated or the cycle limit is reached

These conditional edges enable the graph to dynamically determine the optimal path based on response quality and need for tools.

In [28]:
graph_with_helpfulness_check.add_conditional_edges(
    "agent",
    tool_call_or_helpful,
    {
        "continue" : "agent",
        "action" : "action",
        "end" : END
    }
)

<langgraph.graph.state.StateGraph at 0x11c6cccd0>

### Connecting Action Output Back to the Agent

This code adds a direct edge from the "action" node back to the "agent" node. This creates a critical feedback loop where:

1. The action node executes tools as requested
2. Results from those tools are automatically passed back to the agent
3. The agent can then incorporate the tool results into its reasoning

This edge ensures the agent can process and reason about information retrieved from external tools.
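
The feedback loop described above can be simulated without an LLM. The sketch below uses hypothetical stand-ins (`fake_agent`, `fake_action`, and a tool named `lookup`, none of which appear in the notebook) to show the tool result landing in the message list before the agent runs again.

```python
from typing import Dict, List

Message = Dict[str, str]


def fake_agent(messages: List[Message]) -> Message:
    """Stand-in agent: request a tool once, then answer from its result."""
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        return {"role": "ai", "tool": "lookup", "content": ""}
    return {"role": "ai", "content": f"Answer based on: {tool_results[-1]['content']}"}


def fake_action(request: Message) -> Message:
    """Stand-in action node: pretend to execute the requested tool."""
    return {"role": "tool", "content": f"result of {request['tool']}"}


def run_loop(question: str, max_steps: int = 10) -> List[Message]:
    """Alternate agent -> action -> agent until the agent stops calling tools."""
    messages: List[Message] = [{"role": "human", "content": question}]
    for _ in range(max_steps):
        reply = fake_agent(messages)
        messages.append(reply)
        if "tool" not in reply:  # no tool call, so the loop is finished
            break
        messages.append(fake_action(reply))  # mirrors the action -> agent edge
    return messages
```

Calling `run_loop("What is LoRA?")` produces messages in the order human, ai (tool request), tool, ai (final answer), which is the same shape the real graph follows when a single round of tool calls is enough.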

In [29]:
graph_with_helpfulness_check.add_edge("action", "agent")

<langgraph.graph.state.StateGraph at 0x11c6cccd0>

### Compiling the Graph

This step compiles the graph, with all its nodes, edges, and conditional logic, into a runnable object. Calling `compile()` validates the graph's structure (for example, that edges reference defined nodes) and returns an object that supports `invoke`, `stream`, and their async counterparts.

In [30]:
agent_with_helpfulness_check = graph_with_helpfulness_check.compile()


### Streaming Agent Execution with Helpfulness Evaluation

This code demonstrates our helpfulness-enhanced agent handling a complex, multi-part query about machine learning concepts and a researcher. The code:

1. Creates an input with three distinct but related questions about LoRA (a machine learning technique), Tim Dettmers (a researcher), and Attention (a key ML concept)

2. Uses `astream()` to asynchronously process the input and stream the results, allowing us to observe the agent's behavior in real time

3. Prints each update from the graph's nodes as it arrives, showing:
   - Which node produced the update (agent or action)
   - The messages that node added to the state
   - Indirectly, the routing decisions: the conditional edge is not a node, so its choice is visible only as which node runs next (or as the stream ending)

This visualization helps us understand how the agent dynamically decides when to use tools, when to continue refining its answer, and when it determines its response is sufficiently helpful to terminate the graph execution.


In [31]:
inputs = {"messages" : [HumanMessage(content="Related to machine learning, what is LoRA? Also, who is Tim Dettmers? Also, what is Attention?")]}

async for chunk in agent_with_helpfulness_check.astream(inputs, stream_mode="updates"):
    for node, values in chunk.items():
        print(f"Receiving update from node: '{node}'")
        print(values["messages"])
        print("\n\n")

Receiving update from node: 'agent'
[AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_EWD24pZEcEOhww4jynzaXaBG', 'function': {'arguments': '{"query": "LoRA machine learning"}', 'name': 'arxiv'}, 'type': 'function'}, {'id': 'call_JjikNAdpIM8u8r7qpIkvO5nl', 'function': {'arguments': '{"query": "Tim Dettmers"}', 'name': 'tavily_search_results_json'}, 'type': 'function'}, {'id': 'call_YW20EEjNPAJAWH53y4IRY1sC', 'function': {'arguments': '{"query": "Attention mechanism machine learning"}', 'name': 'arxiv'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 72, 'prompt_tokens': 177, 'total_tokens': 249, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-2024-08-06', 'system_fingerprint': 'fp_6b6e24b474', 'finish_reason': 'tool_calls', 'logprobs': None},
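
Top-level `async for` works in a notebook because Jupyter already runs an event loop. In a plain Python script, the same loop has to live inside a coroutine driven by `asyncio.run`. The sketch below shows that shape. `fake_astream` is a hypothetical stand-in that yields dicts keyed by node name, mimicking the shape of `stream_mode="updates"` chunks, so the example runs without an API key.

```python
import asyncio
from typing import AsyncIterator, Dict, List


async def fake_astream() -> AsyncIterator[Dict[str, dict]]:
    """Hypothetical stand-in for agent.astream(inputs, stream_mode="updates")."""
    yield {"agent": {"messages": ["<tool call request>"]}}
    yield {"action": {"messages": ["<tool result>"]}}
    yield {"agent": {"messages": ["<final answer>"]}}


async def collect_updates() -> List[str]:
    """Consume the stream and record which node produced each update."""
    visited: List[str] = []
    async for chunk in fake_astream():
        for node, values in chunk.items():
            print(f"Receiving update from node: '{node}'")
            print(values["messages"])
            visited.append(node)
    return visited


if __name__ == "__main__":
    asyncio.run(collect_updates())
```

To adapt this to the real agent, replace `fake_astream()` with the `astream(...)` call from the cell above; the rest of the loop stays the same.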

### Task 4: LangGraph for the "Patterns" of GenAI

Let's ask our system about the 4 patterns of Generative AI:

1. Prompt Engineering
2. RAG
3. Fine-tuning
4. Agents

In [32]:
patterns = ["prompt engineering", "RAG", "fine-tuning", "LLM-based agents"]

In [33]:
for pattern in patterns:
    # Ask the same question about each pattern.
    what_is_string = f"What is {pattern} and when did it break onto the scene?"
    inputs = {"messages": [HumanMessage(content=what_is_string)]}
    # invoke() returns the final state; its last message is the agent's answer.
    final_state = agent_with_helpfulness_check.invoke(inputs)
    print(final_state["messages"][-1].content)
    print("\n\n")

**Prompt Engineering Definition:**

Prompt engineering is the process of designing and refining input prompts to effectively guide the behavior of AI models. It involves structuring or crafting instructions to produce the best possible output from a generative AI model. This can include phrasing a query, specifying a style, choice of words and grammar, providing relevant context, or describing a character for the AI to mimic. It is a technique used to refine large language models (LLMs) with specific or recommended prompts, and it can be done by anyone using natural language in generators like ChatGPT or DALL-E. [Sources: Coursera, Wikipedia, TechTarget, Stanford UIT, DataCamp]

**History of Prompt Engineering:**

Prompt engineering has been around since the early days of natural language processing (NLP) and is closely tied to the development and evolution of NLP and AI systems. The importance of prompting grew as models became more sophisticated. A significant milestone was the relea