{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/RobertoBarrosoLuque/scout-claims/blob/main/notebooks/2-Exercises.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "0",
      "metadata": {
        "id": "0"
      },
      "source": [
        "# Exercises: Putting the Building Blocks into Practice\n",
        "\n",
        "Welcome to the hands-on portion of the workshop! In these exercises, you will apply the concepts we've learned to solve a few practical problems.\n",
        "\n",
        "**Your goals will be to:**\n",
        "1.  **Extend Function Calling**: Add a new tool for the LLM to use.\n",
        "2.  **Modify Structured Output**: Change a Pydantic schema to extract additional structured information from an image.\n",
        "3.  **Bonus! Use Grammar Mode**: Force the LLM to respond in a highly specific, token-efficient format.\n",
        "\n",
        "Look out for the lines marked \"TODO\" in each cell; those are where you will write your code. Let's get started!"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "e966e0b4",
      "metadata": {
        "id": "e966e0b4"
      },
      "outputs": [],
      "source": [
        "#\n",
        "# SETUP CELL #1: PLEASE RUN THIS BEFORE CONTINUING WITH THE EXERCISES.\n",
        "# RESTART THE RUNTIME AFTER RUNNING THIS CELL IF PROMPTED TO DO SO.\n",
        "#\n",
        "!pip install pydantic requests Pillow python-dotenv"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "eac6208b",
      "metadata": {
        "id": "eac6208b"
      },
      "outputs": [],
      "source": [
        "#\n",
        "# SETUP CELL #2: PLEASE RUN THIS BEFORE CONTINUING WITH THE EXERCISES\n",
        "#\n",
        "import os\n",
        "import io\n",
        "import base64\n",
        "from dotenv import load_dotenv\n",
        "import requests\n",
        "import json\n",
        "load_dotenv()\n",
        "\n",
        "MODEL_ID = \"accounts/fireworks/models/llama4-scout-instruct-basic\"\n",
        "\n",
        "# This pattern is for Google Colab.\n",
        "# If running locally, set the FIREWORKS_API_KEY environment variable.\n",
        "try:\n",
        "    from google.colab import userdata\n",
        "    FIREWORKS_API_KEY = userdata.get('FIREWORKS_API_KEY')\n",
        "except ImportError:\n",
        "    FIREWORKS_API_KEY = os.getenv(\"FIREWORKS_API_KEY\")\n",
        "\n",
        "# Make sure to set your FIREWORKS_API_KEY\n",
        "if not FIREWORKS_API_KEY:\n",
        "    print(\"⚠️  Warning: FIREWORKS_API_KEY not set. The following cells will not run without it.\")\n",
        "\n",
        "# Helper function to prepare images for VLMs.\n",
        "# It is defined here to be available for later exercises.\n",
        "def pil_to_base64_dict(pil_image):\n",
        "    \"\"\"Convert PIL image to the format expected by VLMs\"\"\"\n",
        "    if pil_image is None:\n",
        "        return None\n",
        "\n",
        "    buffered = io.BytesIO()\n",
        "    if pil_image.mode != \"RGB\":\n",
        "        pil_image = pil_image.convert(\"RGB\")\n",
        "\n",
        "    pil_image.save(buffered, format=\"JPEG\")\n",
        "    img_base64 = base64.b64encode(buffered.getvalue()).decode(\"utf-8\")\n",
        "\n",
        "    return {\"image\": pil_image, \"path\": \"uploaded_image.jpg\", \"base64\": img_base64}\n",
        "\n",
        "# Helper function to make api calls with requests\n",
        "def make_api_call(payload, tools=None, model_id=None, base_url=None):\n",
        "    \"\"\"Make API call with requests\"\"\"\n",
        "    # Use defaults if not provided\n",
        "    final_model_id = model_id or MODEL_ID\n",
        "    final_base_url = base_url or \"https://api.fireworks.ai/inference/v1\"\n",
        "\n",
        "    # Add model to payload\n",
        "    payload[\"model\"] = final_model_id\n",
        "\n",
        "    # Add tools if provided\n",
        "    if tools:\n",
        "        payload[\"tools\"] = tools\n",
        "        payload[\"tool_choice\"] = \"auto\"\n",
        "\n",
        "    headers = {\n",
        "        \"Authorization\": f\"Bearer {FIREWORKS_API_KEY}\",\n",
        "        \"Content-Type\": \"application/json\"\n",
        "    }\n",
        "\n",
        "    response = requests.post(\n",
        "        f\"{final_base_url}/chat/completions\",\n",
        "        headers=headers,\n",
        "        json=payload\n",
        "    )\n",
        "\n",
        "    if response.status_code == 200:\n",
        "        return response.json()\n",
        "    else:\n",
        "        raise Exception(f\"API Error: {response.status_code} - {response.text}\")\n",
        "\n",
        "print(\"✅ Setup complete. Helper function and API key are ready.\")"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "09bc4200",
      "metadata": {
        "id": "09bc4200"
      },
      "source": [
        "## Exercise 1: Extending Function Calling\n",
        "\n",
        "[Function calling](https://docs.fireworks.ai/guides/function-calling) allows an LLM to use external tools. Your first task is to give the LLM a new tool.\n",
        "\n",
        "**Goal**: Define a new function called `count_letter` that counts the occurrences of a specific letter in a word. You will then define its schema and make it available to the LLM.\n",
        "\n",
        "**Your Steps:**\n",
        "1.  Define the Python function `count_letter`.\n",
        "2.  Add it to the `available_functions` dictionary.\n",
        "3.  Define its schema and add it to the `tools` list.\n",
        "4.  Write a prompt to test your new function"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "99c48d84",
      "metadata": {
        "id": "99c48d84"
      },
      "outputs": [],
      "source": [
        "###\n",
        "### EXERCISE 1: WRITE YOUR CODE IN THIS CELL\n",
        "###\n",
        "import json\n",
        "\n",
        "# --- Step 1: Define the Python function and the available functions mapping ---\n",
        "\n",
        "# Base function from the previous notebook\n",
        "def get_weather(location: str) -> str:\n",
        "    \"\"\"Get current weather for a location\"\"\"\n",
        "    weather_data = {\"New York\": \"Sunny, 72°F\", \"London\": \"Cloudy, 15°C\", \"Tokyo\": \"Rainy, 20°C\"}\n",
        "    return weather_data.get(location, \"Weather data not available\")\n",
        "\n",
        "# ---TODO Block start---- #\n",
        "# Define a new function `count_letter` that takes a `word` and a `letter`\n",
        "# and returns the number of times the letter appears in the word.\n",
        "def count_letter(): # TODO: Add your function header here\n",
        "    # TODO: Add your function body here\n",
        "    pass\n",
        "# ---TODO Block end---- #\n",
        "\n",
        "available_functions = {\n",
        "    \"get_weather\": get_weather,\n",
        "    # TODO: Add your new function to this dictionary\n",
        "}\n",
        "\n",
        "\n",
        "# --- Step 2: Define the function schemas for the LLM ---\n",
        "\n",
        "# Base tool schema from the previous notebook\n",
        "tools = [\n",
        "    {\n",
        "        \"type\": \"function\",\n",
        "        \"function\": {\n",
        "            \"name\": \"get_weather\",\n",
        "            \"description\": \"Get current weather for a location\",\n",
        "            \"parameters\": {\n",
        "                \"type\": \"object\",\n",
        "                \"properties\": {\n",
        "                    \"location\": {\n",
        "                        \"type\": \"string\",\n",
        "                        \"description\": \"The city name\"\n",
        "                        }\n",
        "                    },\n",
        "                \"required\": [\"location\"]\n",
        "            }\n",
        "        }\n",
        "    },\n",
        "    # TODO: Add the JSON schema for your `count_letter` function here.\n",
        "    # It should have two parameters: \"word\" and \"letter\", both are required strings.\n",
        "]\n",
        "\n",
        "\n",
        "# --- Step 3: Build your input to the LLM ---\n",
        "\n",
        "# Initialize the messages list\n",
        "messages = [\n",
        "    {\n",
        "        \"role\": \"system\",\n",
        "        \"content\": \"You are a helpful assistant. You have access to a couple of tools, use them when needed.\"\n",
        "    },\n",
        "    {\n",
        "        \"role\": \"user\",\n",
        "        \"content\": \"\" #TODO: Add your user prompt here\n",
        "    }\n",
        "]\n",
        "\n",
        "# Create payload\n",
        "payload = {\n",
        "    \"messages\": messages,\n",
        "    \"tools\": tools,\n",
        "    \"model\": \"accounts/fireworks/models/llama4-maverick-instruct-basic\"\n",
        "}\n",
        "\n",
        "# Get response from LLM\n",
        "response = make_api_call(payload=payload)\n",
        "\n",
        "# Check if the model wants to call a tool/function\n",
        "if response[\"choices\"][0][\"message\"][\"tool_calls\"]:\n",
        "    tool_call = response[\"choices\"][0][\"message\"][\"tool_calls\"][0]\n",
        "    function_name = tool_call[\"function\"][\"name\"]\n",
        "    function_args = json.loads(tool_call[\"function\"][\"arguments\"])\n",
        "\n",
        "    print(f\"LLM wants to call: {function_name}\")\n",
        "    print(f\"With arguments: {function_args}\")\n",
        "\n",
        "    # Execute the function\n",
        "    function_response = available_functions[function_name](**function_args)\n",
        "    print(f\"Function result: {function_response}\")\n",
        "\n",
        "    # Add the assistant's tool call to the conversation\n",
        "    messages.append({\n",
        "        \"role\": \"assistant\",\n",
        "        \"content\": \"\",\n",
        "        \"tool_calls\": response[\"choices\"][0][\"message\"][\"tool_calls\"]\n",
        "    })\n",
        "\n",
        "    # Add the function result to the conversation\n",
        "    messages.append({\n",
        "        \"role\": \"tool\",\n",
        "        \"content\": json.dumps(function_response) if isinstance(function_response, dict) else str(function_response)\n",
        "    })\n",
        "\n",
        "    # Create the final payload\n",
        "    final_payload = {\n",
        "    \"messages\": messages,\n",
        "    \"tools\": tools,\n",
        "    \"model\": \"accounts/fireworks/models/llama4-maverick-instruct-basic\"\n",
        "    }\n",
        "\n",
        "    # Get final response from LLM\n",
        "    final_response = make_api_call(payload=payload)\n",
        "\n",
        "    print(f'Final response: {final_response[\"choices\"][0][\"message\"][\"content\"]}')"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "4d198002",
      "metadata": {
        "id": "4d198002"
      },
      "source": [
        "## Exercise 2: Modifying Structured Outputs (JSON Mode)\n",
        "\n",
        "Structured output is critical for building reliable applications. Here, you'll modify an existing schema to extract more information from an image.\n",
        "\n",
        "**Goal**: Update the `IncidentAnalysis` Pydantic model to also extract the `make` and `model` of the vehicle in the image.\n",
        "\n",
        "**Your Steps:**\n",
        "1.  Add the `make` and `model` fields to the `IncidentAnalysis` Pydantic class.\n",
        "2.  Run the VLM call using [JSON mode](https://docs.fireworks.ai/structured-responses/structured-response-formatting) to see the new structured output."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "1dc5d727",
      "metadata": {
        "id": "1dc5d727"
      },
      "outputs": [],
      "source": [
        "###\n",
        "### EXERCISE 2: WRITE YOUR CODE IN THIS CELL\n",
        "###\n",
        "import requests\n",
        "import io\n",
        "from PIL import Image\n",
        "from pydantic import BaseModel, Field\n",
        "from typing import Literal\n",
        "\n",
        "# --- Step 1: Download a sample image ---\n",
        "url = \"https://raw.githubusercontent.com/RobertoBarrosoLuque/scout-claims/main/images/back_rhs_damage.png\"\n",
        "response = requests.get(url)\n",
        "image = Image.open(io.BytesIO(response.content))\n",
        "print(\"Image downloaded.\")\n",
        "\n",
        "\n",
        "# --- Step 2: Define the output schema ---\n",
        "# ---TODO Block start---- #\n",
        "# Add two new string fields to this Pydantic model:\n",
        "# - `make`: To store the make of the car (e.g., \"Ford\")\n",
        "# - `model`: To store the model of the car (e.g., \"Mustang\")\n",
        "class IncidentAnalysis(BaseModel):\n",
        "    description: str = Field(description=\"A description of the damage to the vehicle.\")\n",
        "    location: Literal[\"front-left\", \"front-right\", \"back-left\", \"back-right\", \"front\", \"side\"]\n",
        "    severity: Literal[\"minor\", \"moderate\", \"major\"]\n",
        "    license_plate: str | None = Field(description=\"The license plate of the vehicle, if visible.\")\n",
        "# ---TODO Block end---- #\n",
        "\n",
        "# --- Step 3: Call the VLM with the new schema ---\n",
        "# The 'pil_to_base64_dict' function was defined in the setup cell\n",
        "image_for_llm = pil_to_base64_dict(image)\n",
        "\n",
        "# Create payload\n",
        "prompt = \"Describe the car damage in this image and extract all useful information.\" # TODO: modify the prompt to include the new fields\n",
        "messages=[\n",
        "    {\n",
        "        \"role\": \"user\",\n",
        "        \"content\": [\n",
        "            {\"type\": \"image_url\", \"image_url\": {\"url\": f\"data:image/jpeg;base64,{image_for_llm['base64']}\"}},\n",
        "            {\"type\": \"text\", \"text\": prompt},\n",
        "        ],\n",
        "    }\n",
        "]\n",
        "response_format={\n",
        "    \"type\": \"json_object\",\n",
        "    \"schema\": IncidentAnalysis.model_json_schema(),\n",
        "}\n",
        "\n",
        "payload = {\n",
        "    \"messages\": messages,\n",
        "    \"response_format\": response_format,\n",
        "    \"model\": \"accounts/fireworks/models/llama4-maverick-instruct-basic\"\n",
        "}\n",
        "\n",
        "# Get response from LLM\n",
        "response = make_api_call(payload=payload)\n",
        "\n",
        "\n",
        "result = json.loads(response[\"choices\"][0][\"message\"][\"content\"])\n",
        "print(json.dumps(result, indent=2))"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "8e5a2e3d",
      "metadata": {
        "id": "8e5a2e3d"
      },
      "source": [
        "## Bonus Exercise: Constrained Output with Grammar Mode\n",
        "\n",
        "Sometimes you need the model to respond in a very specific, non-JSON format. This is where [Grammar Mode](https://docs.fireworks.ai/structured-responses/structured-output-grammar-based) excels. It forces the model's output to conform to a strict pattern you define, which can also save output tokens vs. JSON mode and offer even more granular control.\n",
        "\n",
        "**Goal**: Use grammar mode to force the model to output *only* the make and model of the car as a single lowercase string (e.g., \"ford mustang\").\n",
        "\n",
        "**Your Steps:**\n",
        "1.  Define a GBNF grammar string.\n",
        "2.  Call the model using `response_format={\"type\": \"grammar\", \"grammar\": ...}`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "1ea8cec3",
      "metadata": {
        "id": "1ea8cec3"
      },
      "outputs": [],
      "source": [
        "###\n",
        "### BONUS EXERCISE: WRITE YOUR CODE IN THIS CELL\n",
        "###\n",
        "\n",
        "# The 'image' variable and 'pil_to_base64_dict' helper function from previous\n",
        "# cells are used here. Make sure those cells have been run.\n",
        "# This assumes the image from Exercise 2 is still loaded.\n",
        "image_for_llm = pil_to_base64_dict(image)\n",
        "\n",
        "\n",
        "# --- Step 1: Define the GBNF grammar ---\n",
        "# Define a grammar that forces the output to be:\n",
        "# 1. A 'make' (one or more lowercase letters).\n",
        "# 2. Followed by a single space.\n",
        "# 3. Followed by a 'model' (one or more lowercase letters).\n",
        "car_grammar = r'''\n",
        "# TODO: define a grammar that forces the output to satisfy the format specified above (example output: \"ford mustang\")\n",
        "'''\n",
        "\n",
        "# --- Step 2: Define the prompt ---\n",
        "# Update the prompt to ask the model to identify the make and model and to respond only in the format specified above\n",
        "prompt = \"\"  # TODO: write your prompt here\n",
        "\n",
        "\n",
        "# --- Step 3: Call the VLM with grammar mode ---\n",
        "messages=[\n",
        "    {\n",
        "        \"role\": \"user\",\n",
        "        \"content\": [\n",
        "            {\"type\": \"image_url\", \"image_url\": {\"url\": f\"data:image/jpeg;base64,{image_for_llm['base64']}\"}},\n",
        "            {\"type\": \"text\", \"text\": prompt},\n",
        "        ],\n",
        "    }\n",
        "]\n",
        "response_format={\n",
        "    # TODO: define the response format to use the grammar defined above\n",
        "}\n",
        "\n",
        "# Define payload\n",
        "payload = {\n",
        "    \"messages\": messages,\n",
        "    \"response_format\": response_format,\n",
        "    \"model\": \"accounts/fireworks/models/llama4-maverick-instruct-basic\"\n",
        "}\n",
        "\n",
        "# Get response from LLM\n",
        "response = make_api_call(payload=payload)\n",
        "\n",
        "print(f'Constrained output from model: {response[\"choices\"][0][\"message\"][\"content\"]}')"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "provenance": [],
      "include_colab_link": true
    },
    "kernelspec": {
      "display_name": ".venv",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 2
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython2",
      "version": "3.11.13"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 5
}