# Google Colab Version: [Open this notebook in Google Colab](https://colab.research.google.com/github/starfishdata/starfish/blob/main/examples/structured_llm.ipynb)

#### Dependencies 

In [None]:
%pip install starfish-core

In [1]:
## Fix for Jupyter notebooks only: do NOT use in production
## Allows nested event loops, so async code can run inside the notebook's already-running loop, but it can cause problems when mixing sync and async code
## In production, run the code in a standard .py file without this workaround
## See https://github.com/erdewit/nest_asyncio for details
import nest_asyncio
nest_asyncio.apply()

from starfish import StructuredLLM
from starfish.llm.utils import merge_structured_outputs

from pydantic import BaseModel, Field
from typing import List

from starfish.common.env_loader import load_env_file ## Load environment variables from .env file
load_env_file()
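As the comments above say, `nest_asyncio` is a notebook-only workaround. A minimal sketch of the production alternative, assuming a plain `.py` script: put the awaits inside an `async def main()` and start it with `asyncio.run()`. `generate_facts` is a hypothetical placeholder for a call such as `json_llm.run(...)`, stubbed out so the sketch runs without an API key.

```python
import asyncio


async def generate_facts(city_name: str) -> list[dict]:
    # Hypothetical placeholder. In a real script this would be something like:
    #   response = await json_llm.run(city_name=city_name)
    #   return response.data
    await asyncio.sleep(0)
    return [{"question": f"What is {city_name} known for?", "answer": "..."}]


async def main() -> None:
    data = await generate_facts("New York")
    print(data)


if __name__ == "__main__":
    # asyncio.run() creates an event loop, runs main(), and closes the loop,
    # so no nest_asyncio patch is needed in a plain script.
    asyncio.run(main())
```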

In [2]:
# Set up your OpenAI API key if it is not already set
# import os
# os.environ["OPENAI_API_KEY"] = "your_key_here"

# If you don't have an API key, use a local model (Ollama); see the Ollama section below

#### 1. Structured LLM with JSON Schema

In [3]:
# ### Define the Output Structure (JSON Schema)
# Let's start with a simple JSON-like schema using a list of dictionaries.
# Each dictionary specifies a field name and its type; "description" is optional.
json_output_schema = [
 {"name": "question", "type": "str", "description": "The generated question."},
 {"name": "answer", "type": "str", "description": "The corresponding answer."},
]

json_llm = StructuredLLM(
 model_name = "openai/gpt-4o-mini",
 prompt = "Funny facts about city {{city_name}}.",
 output_schema = json_output_schema,
 model_kwargs = {"temperature": 0.7},
)

json_response = await json_llm.run(city_name="New York")

# The response object contains both parsed data and the raw API response.
json_response.data

[{'question': 'Why did the tomato turn red in New York?',
 'answer': "Because it saw the Big Apple and couldn't ketchup with all the excitement!"}]
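Each record in `.data` should carry the fields named in the schema. To make that contract concrete, here is a stdlib sketch of a check you could run on parsed records yourself. `validate_record` is a hypothetical helper, and the type names in `TYPE_MAP` beyond `str` and `int` are assumptions; Starfish does its own parsing and may support a different set of types.

```python
# Hypothetical helper, not part of Starfish: checks one parsed record against
# the list-of-dicts schema format shown above.
TYPE_MAP = {"str": str, "int": int, "float": float, "bool": bool}  # assumed type names


def validate_record(record: dict, schema: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means the record matches."""
    problems = []
    for field in schema:
        name = field["name"]
        expected = TYPE_MAP[field["type"]]
        if name not in record:
            problems.append(f"missing field: {name}")
            continue
        value = record[name]
        # bool is a subclass of int in Python, so reject it explicitly for "int".
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            problems.append(f"{name}: expected {field['type']}, got {type(value).__name__}")
    return problems


json_output_schema = [
    {"name": "question", "type": "str", "description": "The generated question."},
    {"name": "answer", "type": "str", "description": "The corresponding answer."},
]
print(validate_record({"question": "Q?", "answer": "A."}, json_output_schema))  # []
print(validate_record({"question": "Q?"}, json_output_schema))  # ['missing field: answer']
```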

In [4]:
# The raw API response is fully preserved, so you can parse anything else it contains,
# such as function calls, tool calls, or thinking tokens
json_response.raw

ModelResponse(id='chatcmpl-BQGw3FMSjzWOPMRvXmgknN4oozrKK', created=1745601327, model='gpt-4o-mini-2024-07-18', object='chat.completion', system_fingerprint='fp_0392822090', choices=[Choices(finish_reason='stop', index=0, message=Message(content='[\n {\n "question": "Why did the tomato turn red in New York?",\n "answer": "Because it saw the Big Apple and couldn\'t ketchup with all the excitement!"\n }\n]', role='assistant', tool_calls=None, function_call=None, provider_specific_fields={'refusal': None}, annotations=[]))], usage=Usage(completion_tokens=41, prompt_tokens=77, total_tokens=118, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None)), service_tier='default')
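The `.data` list corresponds to the assistant message text inside `.raw`. As an illustration, decoding the `content` string from the response above with the standard `json` module recovers the same records. This sketches the relationship only, not Starfish's own parser, which may do extra cleanup on messier model output. Other fields are reachable the same way; for example, `json_response.raw.usage.total_tokens` is 118 in the response above.

```python
import json

# Message content copied from the raw response above. In the notebook you would
# read it with: content = json_response.raw.choices[0].message.content
content = '[\n {\n "question": "Why did the tomato turn red in New York?",\n "answer": "Because it saw the Big Apple and couldn\'t ketchup with all the excitement!"\n }\n]'

records = json.loads(content)  # a list of dicts, the same shape as json_response.data
print(records[0]["answer"])
```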

#### 2. Structured LLM with Pydantic Schema (Nested)

In [5]:
# ### Define the Output Structure (Pydantic Model)
class Fact(BaseModel):
 question: str = Field(..., description="The factual question generated.")
 answer: str = Field(..., description="The corresponding answer.")
 category: str = Field(..., description="A category for the fact (e.g., History, Geography).")

# You can define a list of these models if you expect multiple results.
class FactsList(BaseModel):
 facts: List[Fact] = Field(..., description="A list of facts.")


# ### Create the StructuredLLM Instance with Pydantic
pydantic_llm = StructuredLLM(
 model_name="openai/gpt-4o-mini",
 # Ask for multiple facts this time
 prompt="Generate distinct facts about {{city}}.",
 # Pass the Pydantic model directly as the schema
 output_schema=FactsList, # Expecting a list of facts wrapped in the FactsList model
 model_kwargs={"temperature": 0.8}
)

pydantic_llm_response = await pydantic_llm.run(city="New York")

pydantic_llm_response.data

[{'facts': [{'question': 'What year did New York City become the capital of the United States?',
 'answer': 'New York City served as the capital of the United States from 1785 to 1790.',
 'category': 'History'}]}]
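Because `FactsList` wraps the facts in a `facts` field, each record in `.data` is a dict holding a list rather than a single fact. If you want one fact per row, you can unwrap it yourself. A stdlib sketch on the output shown above; `flatten_facts` is a hypothetical helper, not a Starfish function.

```python
# Output copied from the cell above: one FactsList-shaped dict per record.
data = [
    {
        "facts": [
            {
                "question": "What year did New York City become the capital of the United States?",
                "answer": "New York City served as the capital of the United States from 1785 to 1790.",
                "category": "History",
            }
        ]
    }
]


def flatten_facts(records: list[dict]) -> list[dict]:
    """Unwrap the `facts` list from every FactsList-shaped record into one flat list."""
    return [fact for record in records for fact in record["facts"]]


# Group the questions by category.
by_category: dict[str, list[str]] = {}
for fact in flatten_facts(data):
    by_category.setdefault(fact["category"], []).append(fact["question"])
print(by_category)
```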

#### 3. Working with Different LLM Providers

Starfish uses LiteLLM under the hood, giving you access to 100+ LLM providers. Here is an example using a custom model provider, Hyperbolic, which offers full-precision models at low cost.

In [6]:

# Set up the relevant API key and base URL in your environment variables
# os.environ["HYPERBOLIC_API_KEY"] = "your_key_here"
# os.environ["HYPERBOLIC_API_BASE"] = "https://api.hyperbolic.xyz/v1"

hyperbolic_llm = StructuredLLM(
 model_name="hyperbolic/deepseek-ai/DeepSeek-V3-0324", 
 prompt="Facts about city {{city_name}}.",
 output_schema=[{"name": "question", "type": "str"}, {"name": "answer", "type": "str"}],
 model_kwargs={"temperature": 0.7},
)

hyperbolic_llm_response = await hyperbolic_llm.run(city_name="New York", num_records=5)
hyperbolic_llm_response.data

[{'question': 'What is the nickname of New York City?',
 'answer': 'The Big Apple'},
 {'question': 'Which iconic statue is located in New York Harbor?',
 'answer': 'The Statue of Liberty'},
 {'question': 'What is the name of the famous theater district in Manhattan?',
 'answer': 'Broadway'},
 {'question': "Which park is considered the 'lungs' of New York City?",
 'answer': 'Central Park'},
 {'question': 'What is the tallest building in New York City as of 2023?',
 'answer': 'One World Trade Center'}]
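The `model_name` strings follow LiteLLM's `provider/model` convention: the part before the first slash selects the provider, and the rest (which may itself contain slashes, as with Hyperbolic's `deepseek-ai/DeepSeek-V3-0324`) is the model id. Below is a small sketch that splits names this way and checks whether a matching key variable is set. Both helpers are hypothetical, and the `<PROVIDER>_API_KEY` naming is an assumption that matches `OPENAI_API_KEY` and `HYPERBOLIC_API_KEY` from this notebook but not every provider (a local Ollama model, for instance, needs no key).

```python
import os


def split_model_name(model_name: str) -> tuple[str, str]:
    """Split 'provider/model-id' at the FIRST slash; the model id may contain more slashes."""
    provider, sep, model_id = model_name.partition("/")
    if not sep or not provider or not model_id:
        raise ValueError(f"expected 'provider/model', got {model_name!r}")
    return provider, model_id


def api_key_var(provider: str) -> str:
    """Assumed naming convention, e.g. 'hyperbolic' -> 'HYPERBOLIC_API_KEY'."""
    return f"{provider.upper()}_API_KEY"


for name in ["openai/gpt-4o-mini", "hyperbolic/deepseek-ai/DeepSeek-V3-0324"]:
    provider, model_id = split_model_name(name)
    key = api_key_var(provider)
    print(f"{provider:<12} {model_id:<32} {key} set: {key in os.environ}")
```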

#### 4. Local LLM using Ollama
Make sure Ollama is installed. Starfish can start the server process and download the model for you.

In [7]:
### Local model
ollama_llm = StructuredLLM(
 # Prefix 'ollama/' specifies the Ollama provider
 model_name="ollama/gemma3:1b",
 prompt="Facts about city {{city_name}}.",
 output_schema=[{"name": "question", "type": "str"}, {"name": "answer", "type": "str"}],
 model_kwargs={"temperature": 0.7},
)

ollama_llm_response = await ollama_llm.run(city_name="New York", num_records=5)
ollama_llm_response.data

2025-04-25 10:15:40 | INFO | Ensuring Ollama model gemma3:1b is ready...
2025-04-25 10:15:40 | INFO | Starting Ollama server...
2025-04-25 10:15:41 | INFO | Ollama server started successfully
2025-04-25 10:15:41 | INFO | Found model gemma3:1b
2025-04-25 10:15:41 | INFO | Model gemma3:1b is already available
2025-04-25 10:15:41 | INFO | Model gemma3:1b is ready, making API call...


[{'question': 'What is the population of New York City?',
 'answer': 'As of 2023, the population of New York City is approximately 8.8 million people.'}]

In [8]:
### Clean up resources: stop the Ollama server
from starfish.llm.backend.ollama_adapter import stop_ollama_server
await stop_ollama_server()

2025-04-25 10:15:54 | INFO | Stopping Ollama server...
2025-04-25 10:15:55 | INFO | Ollama server stopped successfully


True
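Starfish starts and stops the server for you here. If you want to check on your own whether an Ollama server is up (for example, before running a local-model cell), here is a stdlib sketch that probes Ollama's default address, `http://localhost:11434`. `is_ollama_running` is a hypothetical helper; it only confirms that something answers HTTP 200 at that URL, not that a particular model has been downloaded.

```python
import http.client
import urllib.request


def is_ollama_running(base_url: str = "http://localhost:11434", timeout: float = 1.0) -> bool:
    """Return True if an HTTP server answers 200 at base_url (11434 is Ollama's default port)."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, connection refused, and timeouts are all OSError subclasses.
        return False


print(is_ollama_running())
```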

#### 5. Chaining Multiple StructuredLLM Calls

You can easily pipe the output of one LLM call into the prompt of another. This is useful for multi-step reasoning, analysis, or refinement.
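Since `StructuredLLM` prompts are Jinja2 templates (see the Dynamic Prompt section), a list of records passed in as a variable is rendered as its Python string form. The sketch below uses `jinja2.Template` directly to show what a rater-style prompt receives. The prompt text and records are illustrative, and `StructuredLLM` may add its own instructions (such as the output schema) around your template.

```python
from jinja2 import Template

# What a rater-style prompt looks like after a list of records is substituted in.
rater_prompt = Template("Rate the following Q&A pairs.\nPairs: {{generated_pairs}}")
pairs = [
    {"question": "What is the chemical formula for water?", "answer": "H2O"},
    {"question": "What is the smallest unit of life?", "answer": "The cell"},
]
print(rater_prompt.render(generated_pairs=pairs))
```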


In [9]:
# ### Step 1: Generate Initial Facts
generator_llm = StructuredLLM(
 model_name="openai/gpt-4o-mini",
 prompt="Generate question/answer pairs about {{topic}}.",
 output_schema=[
 {"name": "question", "type": "str"},
 {"name": "answer", "type": "str"}
 ],
)

# ### Step 2: Rate the Generated Facts
rater_llm = StructuredLLM(
 model_name="openai/gpt-4o-mini",
 prompt='''Rate the following Q&A pairs based on accuracy and clarity (1-10).
 Pairs: {{generated_pairs}}''',
 output_schema=[
 {"name": "accuracy_rating", "type": "int"},
 {"name": "clarity_rating", "type": "int"}
 ],
 model_kwargs={"temperature": 0.5}
)

## num_records is a reserved keyword argument of StructuredLLM.run(); it defaults to 1
generation_response = await generator_llm.run(topic='Science', num_records=5)
print("Generated Facts:", generation_response.data)

# Note that the first response is used as the input for the second LLM.
# The rater works out that it should return the same number of records
# as the first response (5 in this case).
rating_response = await rater_llm.run(generated_pairs=generation_response.data)
### Each response contains only its own output
print("Ratings:", rating_response.data)


### Merge the two responses index-wise with merge_structured_outputs
print(merge_structured_outputs(generation_response.data, rating_response.data))

Generated Facts: [{'question': 'What is the chemical formula for water?', 'answer': 'The chemical formula for water is H2O.'}, {'question': 'What is the process by which plants convert sunlight into energy?', 'answer': 'The process is called photosynthesis.'}, {'question': "What is the primary gas found in the Earth's atmosphere?", 'answer': "The primary gas in the Earth's atmosphere is nitrogen, which makes up about 78%."}, {'question': "What is Newton's second law of motion?", 'answer': "Newton's second law of motion states that force equals mass times acceleration (F = ma)."}, {'question': 'What is the smallest unit of life?', 'answer': 'The smallest unit of life is the cell.'}]
Ratings: [{'accuracy_rating': 10, 'clarity_rating': 10}, {'accuracy_rating': 10, 'clarity_rating': 10}, {'accuracy_rating': 10, 'clarity_rating': 10}, {'accuracy_rating': 10, 'clarity_rating': 10}, {'accuracy_rating': 10, 'clarity_rating': 10}]
[{'question': 'What is the chemical formula for water?', 'answer
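An index-wise merge pairs the i-th generated record with the i-th rating and combines their fields. Here is a stdlib sketch of that idea; `merge_index_wise` is hypothetical, the sample values are illustrative, and how the real `merge_structured_outputs` handles key collisions or unequal lengths is not covered here. The explicit length check is a deliberate choice: the rater's record count depends on the model following the prompt, so a mismatch should surface as an error rather than as silently misaligned rows.

```python
def merge_index_wise(left: list[dict], right: list[dict]) -> list[dict]:
    """Combine the i-th record of each list into one dict; keys in `right` win on clashes."""
    if len(left) != len(right):
        # The rater's record count comes from the LLM, so fail loudly on a mismatch.
        raise ValueError(f"length mismatch: {len(left)} vs {len(right)}")
    return [{**a, **b} for a, b in zip(left, right)]


generated = [
    {"question": "What is the chemical formula for water?", "answer": "H2O"},
    {"question": "What is the smallest unit of life?", "answer": "The cell"},
]
ratings = [
    {"accuracy_rating": 10, "clarity_rating": 10},
    {"accuracy_rating": 9, "clarity_rating": 10},
]
for row in merge_index_wise(generated, ratings):
    print(row)
```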

#### 6. Dynamic Prompt

`StructuredLLM` uses Jinja2 for prompts, allowing variables and logic.

In [10]:
# ### Create an LLM with a more complex prompt
template_llm = StructuredLLM(
 model_name="openai/gpt-4o-mini",
 prompt='''Generate facts about {{city}}.
 {% if user_context %}
 User background: {{ user_context }}
 {% endif %}''', ### user_context is optional and only used if provided
 output_schema=[{"name": "fact", "type": "str"}]
)

template_response = await template_llm.run(city="New York")
print(template_response.data)


[{'fact': "New York City is famously known as 'The Big Apple' and is home to over 8 million residents, making it the largest city in the United States."}]


In [11]:
template_response = await template_llm.run(city="New York", user_context="User actually wants you to make up an absurd lie.")
print(template_response.data)

[{'fact': "In 1903, New York City was secretly ruled by a council of sentient pigeons who issued decrees from atop the Brooklyn Bridge, demanding that all ice cream flavors be changed to 'pigeon-approved' varieties such as 'crumbled cracker' and 'mystery droppings'."}]
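To see what the model receives in each of the two calls above, you can render an equivalent template directly with Jinja2. A sketch using `jinja2.Template`; the template mirrors `template_llm`'s prompt without the indentation, and it is rendered outside Starfish, so treat the result as a preview rather than the exact final prompt. In Jinja2's default configuration an undefined variable such as a missing `user_context` is falsy, which is why the block disappears when you don't pass it.

```python
from jinja2 import Template

# Same conditional structure as template_llm's prompt above.
prompt = Template(
    "Generate facts about {{city}}.\n"
    "{% if user_context %}\n"
    "User background: {{ user_context }}\n"
    "{% endif %}"
)

print(repr(prompt.render(city="New York")))
print(repr(prompt.render(city="New York", user_context="Loves history.")))
```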


#### 7. Scaling with Data Factory (Brief Mention)
While `StructuredLLM` handles single or chained calls, Starfish's `@data_factory` decorator is designed for massively parallel execution. You can wrap a single call or a multi-step chain in a function decorated with `@data_factory` to process thousands of inputs concurrently and reliably.

See the dedicated examples for `data_factory` usage.
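For intuition about what parallel execution means here, a plain-asyncio sketch of the fan-out pattern: many async calls in flight at once, bounded by a semaphore, with results kept in input order. This is not the `@data_factory` API or its implementation; `fake_llm_call` and `run_many` are hypothetical, and the decorator is described as also handling reliability concerns that this sketch ignores.

```python
import asyncio


async def fake_llm_call(city: str) -> list[dict]:
    """Hypothetical stand-in for `(await some_llm.run(city=city)).data`."""
    await asyncio.sleep(0.01)  # simulate network latency
    return [{"fact": f"A fact about {city}"}]


async def run_many(cities: list[str], max_concurrency: int = 10) -> list[dict]:
    """Run one call per city, at most `max_concurrency` at a time, keeping input order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(city: str) -> list[dict]:
        async with semaphore:
            return await fake_llm_call(city)

    # gather returns results in the same order as its arguments.
    batches = await asyncio.gather(*(run_one(city) for city in cities))
    return [record for batch in batches for record in batch]


if __name__ == "__main__":
    records = asyncio.run(run_many(["New York", "Paris", "Tokyo"], max_concurrency=2))
    print(records)
```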