Models
Smolagents is an experimental API which is subject to change at any time. Results returned by the agents can vary as the APIs or underlying models are prone to change.
To learn more about agents and tools make sure to read the introductory guide. This page contains the API docs for the underlying classes.
Models
Your custom Model
You're free to create and use your own models to power your agent.
You can subclass the base Model class to create a model for your agent.
The main requirement is to override the generate method, which must meet two criteria:
- It follows the messages format (List[Dict[str, str]]) for its input messages, and it returns an object with a .content attribute.
- It stops generating outputs at the sequences passed in the stop_sequences argument.
To define your LLM, make a CustomModel class that inherits from the base Model class.
It should have a generate method that takes a list of messages and returns an object with a .content attribute containing the text. The generate method also needs to accept a stop_sequences argument that indicates when to stop generating.
from huggingface_hub import login, InferenceClient
from smolagents import Model

login("<YOUR_HUGGINGFACEHUB_API_TOKEN>")

model_id = "meta-llama/Llama-3.3-70B-Instruct"
client = InferenceClient(model=model_id)

class CustomModel(Model):
    def generate(self, messages, stop_sequences=["Task"]):
        # Query the model and return the message object, which exposes .content
        response = client.chat_completion(messages, stop=stop_sequences, max_tokens=1024)
        answer = response.choices[0].message
        return answer

custom_model = CustomModel()
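You can then pass this model to an agent like any built-in model. A minimal usage sketch, assuming a CodeAgent with no extra tools (the task string is just an example):
from smolagents import CodeAgent

agent = CodeAgent(tools=[], model=custom_model)
agent.run("What is the capital of France?")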
Additionally, generate can also take a grammar argument. If you specify a grammar upon agent initialization, it will be passed along in the calls to the model, allowing constrained generation that forces properly formatted agent outputs.
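In practice, this means your generate method should accept an optional grammar keyword argument, even if your backend ignores it. A minimal sketch, reusing the client defined above (how grammar is handled here is illustrative, not prescribed by smolagents):
class CustomModel(Model):
    def generate(self, messages, stop_sequences=None, grammar=None, **kwargs):
        # Accept grammar so constrained-generation requests don't fail;
        # a backend that supports constrained decoding could forward it,
        # here it is simply ignored.
        response = client.chat_completion(messages, stop=stop_sequences, max_tokens=1024)
        return response.choices[0].message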
TransformersModel
For convenience, we have added a TransformersModel
that implements the points above by building a local transformers
pipeline for the model_id given at initialization.
from smolagents import TransformersModel
model = TransformersModel(model_id="HuggingFaceTB/SmolLM-135M-Instruct")
print(model([{"role": "user", "content": [{"type": "text", "text": "Ok!"}]}], stop_sequences=["great"]))
>>> What a
You must have transformers and torch installed on your machine. Please run pip install smolagents[transformers] if that's not the case.
[[autodoc]] TransformersModel
InferenceClientModel
The InferenceClientModel
wraps huggingface_hub's InferenceClient for the execution of the LLM. It supports all Inference Providers available on the Hub: Cerebras, Cohere, Fal, Fireworks, HF-Inference, Hyperbolic, Nebius, Novita, Replicate, SambaNova, Together, and more.
from smolagents import InferenceClientModel
messages = [
    {"role": "user", "content": [{"type": "text", "text": "Hello, how are you?"}]}
]
model = InferenceClientModel(provider="novita")
print(model(messages))
>>> Of course! If you change your mind, feel free to reach out. Take care!
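You can also pin a specific model and pass your token explicitly rather than relying on a cached login. A hedged sketch (the model shown is just an example, and the token argument is assumed to be accepted by the constructor):
import os
from smolagents import InferenceClientModel

model = InferenceClientModel(
    model_id="Qwen/Qwen2.5-Coder-32B-Instruct",  # example model hosted on the Hub
    provider="together",                         # any supported Inference Provider
    token=os.environ["HF_TOKEN"],                # assumed: explicit Hub token
)
print(model(messages))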
[[autodoc]] InferenceClientModel
LiteLLMModel
The LiteLLMModel
leverages LiteLLM to support 100+ LLMs from various providers.
You can pass kwargs at model initialization that will then be used on every call; for instance, below we pass temperature and max_tokens.
from smolagents import LiteLLMModel
messages = [
    {"role": "user", "content": [{"type": "text", "text": "Hello, how are you?"}]}
]
model = LiteLLMModel(model_id="anthropic/claude-3-5-sonnet-latest", temperature=0.2, max_tokens=10)
print(model(messages))
[[autodoc]] LiteLLMModel
LiteLLMRouterModel
The LiteLLMRouterModel
is a wrapper around the LiteLLM Router that leverages
advanced routing strategies: load-balancing across multiple deployments, prioritizing critical requests via queueing,
and implementing basic reliability measures such as cooldowns, fallbacks, and exponential backoff retries.
import os
from smolagents import LiteLLMRouterModel

messages = [
    {"role": "user", "content": [{"type": "text", "text": "Hello, how are you?"}]}
]

model = LiteLLMRouterModel(
    model_id="llama-3.3-70b",
    model_list=[
        {
            "model_name": "llama-3.3-70b",
            "litellm_params": {"model": "groq/llama-3.3-70b", "api_key": os.getenv("GROQ_API_KEY")},
        },
        {
            "model_name": "llama-3.3-70b",
            "litellm_params": {"model": "cerebras/llama-3.3-70b", "api_key": os.getenv("CEREBRAS_API_KEY")},
        },
    ],
    client_kwargs={
        "routing_strategy": "simple-shuffle",
    },
)
print(model(messages))
[[autodoc]] LiteLLMRouterModel
OpenAIServerModel
This class lets you call any model served behind an OpenAI-compatible API. Here's how you can set it up (you can customise the api_base url to point to another server):
import os
from smolagents import OpenAIServerModel
model = OpenAIServerModel(
    model_id="gpt-4o",
    api_base="https://api.openai.com/v1",
    api_key=os.environ["OPENAI_API_KEY"],
)
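Because it targets any OpenAI-compatible endpoint, you can point api_base at a self-hosted server instead. A hedged sketch (the URL, model name, and placeholder key below are assumptions about your local setup):
from smolagents import OpenAIServerModel

model = OpenAIServerModel(
    model_id="Qwen/Qwen2.5-Coder-32B-Instruct",  # whatever model your server hosts
    api_base="http://localhost:8000/v1",         # e.g. a local OpenAI-compatible server
    api_key="not-needed",                        # many local servers accept any non-empty key
)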
[[autodoc]] OpenAIServerModel
AzureOpenAIServerModel
AzureOpenAIServerModel
allows you to connect to any Azure OpenAI deployment.
Below you can find an example of how to set it up; note that you can omit the azure_endpoint, api_key, and api_version arguments, provided you've set the corresponding environment variables: AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, and OPENAI_API_VERSION.
Pay attention to the lack of an AZURE_ prefix for OPENAI_API_VERSION; this is due to the way the underlying openai package is designed.
import os
from smolagents import AzureOpenAIServerModel
model = AzureOpenAIServerModel(
    model_id=os.environ.get("AZURE_OPENAI_MODEL"),
    azure_endpoint=os.environ.get("AZURE_OPENAI_ENDPOINT"),
    api_key=os.environ.get("AZURE_OPENAI_API_KEY"),
    api_version=os.environ.get("OPENAI_API_VERSION"),
)
[[autodoc]] AzureOpenAIServerModel
AmazonBedrockServerModel
AmazonBedrockServerModel
helps you connect to Amazon Bedrock and run your agent with any of the available models.
Below is an example setup. This class also offers additional options for customization.
import os
from smolagents import AmazonBedrockServerModel
model = AmazonBedrockServerModel(
    model_id=os.environ.get("AMAZON_BEDROCK_MODEL_ID"),
)
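One such customization is to supply your own preconfigured bedrock-runtime client, for example to control the region and credential resolution. A hedged sketch, assuming the constructor accepts a client argument for this purpose:
import os
import boto3
from smolagents import AmazonBedrockServerModel

# Preconfigured boto3 client; region and credentials are resolved here rather
# than inside the model wrapper.
bedrock_client = boto3.client("bedrock-runtime", region_name="us-east-1")

model = AmazonBedrockServerModel(
    model_id=os.environ.get("AMAZON_BEDROCK_MODEL_ID"),
    client=bedrock_client,  # assumed constructor argument for a preconfigured client
)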
[[autodoc]] AmazonBedrockServerModel
MLXModel
from smolagents import MLXModel
model = MLXModel(model_id="HuggingFaceTB/SmolLM-135M-Instruct")
print(model([{"role": "user", "content": "Ok!"}], stop_sequences=["great"]))
>>> What a
You must have mlx-lm installed on your machine. Please run pip install smolagents[mlx-lm] if that's not the case.
[[autodoc]] MLXModel
VLLMModel
Model to use vLLM for fast LLM inference and serving.
from smolagents import VLLMModel
model = VLLMModel(model_id="HuggingFaceTB/SmolLM-135M-Instruct")
print(model([{"role": "user", "content": "Ok!"}], stop_sequences=["great"]))
You must have vllm installed on your machine. Please run pip install smolagents[vllm] if that's not the case.
[[autodoc]] VLLMModel