# Models
<Tip warning={true}>
Smolagents is an experimental API which is subject to change at any time. Results returned by the agents
can vary as the APIs or underlying models are prone to change.
</Tip>
To learn more about agents and tools, make sure to read the [introductory guide](../index). This page
contains the API docs for the underlying classes.
## Models
### Your custom Model
You're free to create and use your own models to power your agent: subclass the base `Model` class and override its `generate` method so that it meets these two criteria:
1. It follows the [messages format](./chat_templating) (`List[Dict[str, str]]`) for its input `messages`, and it returns an object with a `.content` attribute containing the generated text.
2. It stops generating outputs at the sequences passed in the `stop_sequences` argument.

For example, the `CustomModel` below inherits from `Model` and wraps a call to an Inference API client:
```python
from huggingface_hub import login, InferenceClient

from smolagents import Model

login("<YOUR_HUGGINGFACEHUB_API_TOKEN>")

model_id = "meta-llama/Llama-3.3-70B-Instruct"
client = InferenceClient(model=model_id)


class CustomModel(Model):
    def generate(self, messages, stop_sequences=["Task"]):
        response = client.chat_completion(messages, stop=stop_sequences, max_tokens=1024)
        # The returned message object carries the generated text in `.content`
        answer = response.choices[0].message
        return answer


custom_model = CustomModel()
```
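Once defined, a custom model can power an agent like any built-in one. A minimal sketch (the task string is illustrative):

```python
from smolagents import CodeAgent

# `custom_model` is the instance defined above; any object honoring the
# `generate` contract can be passed as `model`.
agent = CodeAgent(tools=[], model=custom_model)
agent.run("What is the 10th Fibonacci number?")
```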
Additionally, `generate` can take a `grammar` argument. If you specify a `grammar` upon agent initialization, this argument will be passed along to calls to the model with the `grammar` you defined, to allow [constrained generation](https://huggingface.co/docs/text-generation-inference/conceptual/guidance) and force properly formatted agent outputs.
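For instance, the `CustomModel` above could forward the grammar to the Inference API. The sketch below assumes the backend supports constrained decoding through the `response_format` parameter of `InferenceClient.chat_completion`:

```python
class CustomModel(Model):
    def generate(self, messages, stop_sequences=None, grammar=None):
        # Sketch: pass the grammar as `response_format`; whether it is honored
        # depends on the inference backend.
        response = client.chat_completion(
            messages, stop=stop_sequences, max_tokens=1024, response_format=grammar
        )
        return response.choices[0].message
```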
### TransformersModel
For convenience, we have added a `TransformersModel` that implements the points above by building a local `transformers` pipeline for the `model_id` given at initialization.
```python
from smolagents import TransformersModel
model = TransformersModel(model_id="HuggingFaceTB/SmolLM-135M-Instruct")
print(model([{"role": "user", "content": [{"type": "text", "text": "Ok!"}]}], stop_sequences=["great"]))
```
```text
>>> What a
```
> [!TIP]
> You must have `transformers` and `torch` installed on your machine. Please run `pip install smolagents[transformers]` if it's not the case.
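Extra keyword arguments are forwarded to the underlying `transformers` machinery, so you can control device placement and generation length at initialization. A sketch, assuming recent `transformers`/`smolagents` versions:

```python
from smolagents import TransformersModel

model = TransformersModel(
    model_id="HuggingFaceTB/SmolLM-135M-Instruct",
    device_map="auto",   # let transformers pick GPU/CPU placement
    torch_dtype="auto",  # let transformers pick the dtype
    max_new_tokens=256,  # cap the length of each generation
)
```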
[[autodoc]] TransformersModel
### InferenceClientModel
The `InferenceClientModel` wraps huggingface_hub's [InferenceClient](https://huggingface.co/docs/huggingface_hub/main/en/guides/inference) to execute the LLM. It supports all [Inference Providers](https://huggingface.co/docs/inference-providers/index) available on the Hub: Cerebras, Cohere, Fal, Fireworks, HF-Inference, Hyperbolic, Nebius, Novita, Replicate, SambaNova, Together, and more.
```python
from smolagents import InferenceClientModel
messages = [
{"role": "user", "content": [{"type": "text", "text": "Hello, how are you?"}]}
]
model = InferenceClientModel(provider="novita")
print(model(messages))
```
```text
>>> Of course! If you change your mind, feel free to reach out. Take care!
```
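You can also pin a specific model and pass your token explicitly. A sketch; the model shown is just one example of a provider-served model:

```python
import os
from smolagents import InferenceClientModel

model = InferenceClientModel(
    model_id="Qwen/Qwen2.5-Coder-32B-Instruct",  # any model served by the provider
    provider="together",
    token=os.getenv("HF_TOKEN"),  # falls back to your cached login if omitted
)
```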
[[autodoc]] InferenceClientModel
### LiteLLMModel
The `LiteLLMModel` leverages [LiteLLM](https://www.litellm.ai/) to support 100+ LLMs from various providers.
You can pass kwargs upon model initialization that will then be used whenever the model is called; for instance, below we pass `temperature` and `max_tokens`.
```python
from smolagents import LiteLLMModel

messages = [
    {"role": "user", "content": [{"type": "text", "text": "Hello, how are you?"}]}
]

# LiteLLM reads the provider API key from the environment (here, ANTHROPIC_API_KEY)
model = LiteLLMModel(model_id="anthropic/claude-3-5-sonnet-latest", temperature=0.2, max_tokens=10)
print(model(messages))
```
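Because LiteLLM routes on a `provider/model` naming convention, the same class can also target a local server. A sketch, assuming a running Ollama instance with the model already pulled:

```python
from smolagents import LiteLLMModel

model = LiteLLMModel(
    model_id="ollama_chat/llama3.2",    # LiteLLM's prefix for Ollama chat models
    api_base="http://localhost:11434",  # default Ollama endpoint
)
```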
[[autodoc]] LiteLLMModel
### LiteLLMRouterModel
The `LiteLLMRouterModel` is a wrapper around the [LiteLLM Router](https://docs.litellm.ai/docs/routing) that leverages
advanced routing strategies: load-balancing across multiple deployments, prioritizing critical requests via queueing,
and implementing basic reliability measures such as cooldowns, fallbacks, and exponential backoff retries.
```python
import os

from smolagents import LiteLLMRouterModel

messages = [
    {"role": "user", "content": [{"type": "text", "text": "Hello, how are you?"}]}
]

model = LiteLLMRouterModel(
    model_id="llama-3.3-70b",
    model_list=[
        {
            "model_name": "llama-3.3-70b",
            "litellm_params": {"model": "groq/llama-3.3-70b", "api_key": os.getenv("GROQ_API_KEY")},
        },
        {
            "model_name": "llama-3.3-70b",
            "litellm_params": {"model": "cerebras/llama-3.3-70b", "api_key": os.getenv("CEREBRAS_API_KEY")},
        },
    ],
    client_kwargs={
        "routing_strategy": "simple-shuffle",
    },
)
print(model(messages))
```
[[autodoc]] LiteLLMRouterModel
### OpenAIServerModel
This class lets you call any OpenAI-compatible server.
Here's how you can set it up (you can customize the `api_base` URL to point to a different server):
```py
import os
from smolagents import OpenAIServerModel
model = OpenAIServerModel(
model_id="gpt-4o",
api_base="https://api.openai.com/v1",
api_key=os.environ["OPENAI_API_KEY"],
)
```
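Because only the base URL changes, the same class works against any OpenAI-compatible server, for example a local Ollama instance (a sketch; local servers typically ignore the API key):

```py
from smolagents import OpenAIServerModel

model = OpenAIServerModel(
    model_id="llama3.2",                   # model name as known to the server
    api_base="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="not-needed",                  # required by the client, ignored by the server
)
```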
[[autodoc]] OpenAIServerModel
### AzureOpenAIServerModel
`AzureOpenAIServerModel` allows you to connect to any Azure OpenAI deployment.
Below you can find an example of how to set it up. Note that you can omit the `azure_endpoint`, `api_key`, and `api_version` arguments, provided you've set the corresponding environment variables: `AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_API_KEY`, and `OPENAI_API_VERSION`.
Pay attention to the lack of an `AZURE_` prefix for `OPENAI_API_VERSION`; this is due to the way the underlying [openai](https://github.com/openai/openai-python) package is designed.
```py
import os

from smolagents import AzureOpenAIServerModel

model = AzureOpenAIServerModel(
    model_id=os.environ.get("AZURE_OPENAI_MODEL"),
    azure_endpoint=os.environ.get("AZURE_OPENAI_ENDPOINT"),
    api_key=os.environ.get("AZURE_OPENAI_API_KEY"),
    api_version=os.environ.get("OPENAI_API_VERSION"),
)
```
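With those environment variables set, the setup shrinks to a single argument (the deployment name below is a placeholder):

```py
from smolagents import AzureOpenAIServerModel

# Endpoint, key, and API version are read from AZURE_OPENAI_ENDPOINT,
# AZURE_OPENAI_API_KEY, and OPENAI_API_VERSION, as described above.
model = AzureOpenAIServerModel(model_id="my-gpt-4o-deployment")
```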
[[autodoc]] AzureOpenAIServerModel
### AmazonBedrockServerModel
`AmazonBedrockServerModel` helps you connect to Amazon Bedrock and run your agent with any of the available models.
Below is an example setup; a sketch of further customization follows it.
```py
import os

from smolagents import AmazonBedrockServerModel

model = AmazonBedrockServerModel(
    model_id=os.environ.get("AMAZON_BEDROCK_MODEL_ID"),
)
```
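For instance, you may be able to inject a pre-configured boto3 client to control the region or credentials. This is only a sketch: the `client` parameter name and the model id shown are assumptions to verify against the API reference below:

```py
import boto3
from smolagents import AmazonBedrockServerModel

# Hypothetical setup: a boto3 Bedrock runtime client pinned to one region.
bedrock_client = boto3.client("bedrock-runtime", region_name="us-east-1")

model = AmazonBedrockServerModel(
    model_id="anthropic.claude-3-5-sonnet-20240620-v1:0",  # illustrative model id
    client=bedrock_client,
)
```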
[[autodoc]] AmazonBedrockServerModel
### MLXModel
`MLXModel` runs models locally on Apple Silicon using [`mlx-lm`](https://pypi.org/project/mlx-lm/).
```python
from smolagents import MLXModel
model = MLXModel(model_id="HuggingFaceTB/SmolLM-135M-Instruct")
print(model([{"role": "user", "content": [{"type": "text", "text": "Ok!"}]}], stop_sequences=["great"]))
```
```text
>>> What a
```
> [!TIP]
> You must have `mlx-lm` installed on your machine. Please run `pip install smolagents[mlx-lm]` if it's not the case.
[[autodoc]] MLXModel
### VLLMModel
`VLLMModel` uses [vLLM](https://docs.vllm.ai/) for fast LLM inference and serving.
```python
from smolagents import VLLMModel
model = VLLMModel(model_id="HuggingFaceTB/SmolLM-135M-Instruct")
print(model([{"role": "user", "content": [{"type": "text", "text": "Ok!"}]}], stop_sequences=["great"]))
```
> [!TIP]
> You must have `vllm` installed on your machine. Please run `pip install smolagents[vllm]` if it's not the case.
[[autodoc]] VLLMModel