# Models
<Tip warning={true}>
Smolagents is an experimental API which is subject to change at any time. Results returned by the agents
can vary as the APIs or underlying models are prone to change.
</Tip>
To learn more about agents and tools, make sure to read the [introductory guide](../index). This page
contains the API docs for the underlying classes.
## Models
### Your custom Model
You're free to create and use your own models to power your agent.
You can subclass the base `Model` class to create a model for your agent.
The main requirement is to implement a `generate` method that satisfies these two criteria:
1. It follows the [messages format](./chat_templating) (`List[Dict[str, str]]`) for its input `messages`, and it returns an object with a `.content` attribute.
2. It stops generating outputs at the sequences passed in the argument `stop_sequences`.
To define your LLM, you can make a `CustomModel` class that inherits from the base `Model` class.
Its `generate` method takes a list of [messages](./chat_templating) and a `stop_sequences` argument indicating when to stop generating, and returns an object with a `.content` attribute containing the generated text.
```python
from huggingface_hub import login, InferenceClient

from smolagents import Model

login("<YOUR_HUGGINGFACEHUB_API_TOKEN>")

model_id = "meta-llama/Llama-3.3-70B-Instruct"

client = InferenceClient(model=model_id)


class CustomModel(Model):
    def generate(self, messages, stop_sequences=["Task"]):
        response = client.chat_completion(messages, stop=stop_sequences, max_tokens=1024)
        answer = response.choices[0].message
        return answer


custom_model = CustomModel()
```
Additionally, `generate` can also take a `grammar` argument. If you specify a `grammar` when initializing the agent, it will be passed on each call to the model to enable [constrained generation](https://huggingface.co/docs/text-generation-inference/conceptual/guidance), forcing properly formatted agent outputs.
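As an illustrative sketch only (reusing the `client` and `Model` from the snippet above, and assuming the backend exposes constrained decoding through `response_format`), a custom `generate` could accept and forward that `grammar` argument like this:
```python
class CustomModelWithGrammar(Model):
    def generate(self, messages, stop_sequences=["Task"], grammar=None):
        # Forward an agent-level grammar to the backend when one is provided.
        # Mapping `grammar` to `response_format` is an assumption: adapt it to
        # whatever constrained-decoding parameter your inference backend exposes.
        extra = {"response_format": grammar} if grammar is not None else {}
        response = client.chat_completion(messages, stop=stop_sequences, max_tokens=1024, **extra)
        return response.choices[0].message
```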
### TransformersModel
For convenience, we have added a `TransformersModel` that implements the points above by building a local `transformers` pipeline for the `model_id` given at initialization.
```python
from smolagents import TransformersModel
model = TransformersModel(model_id="HuggingFaceTB/SmolLM-135M-Instruct")
print(model([{"role": "user", "content": [{"type": "text", "text": "Ok!"}]}], stop_sequences=["great"]))
```
```text
>>> What a
```
> [!TIP]
> You must have `transformers` and `torch` installed on your machine. If they aren't, please run `pip install smolagents[transformers]`.
[[autodoc]] TransformersModel
### InferenceClientModel
The `InferenceClientModel` wraps huggingface_hub's [InferenceClient](https://huggingface.co/docs/huggingface_hub/main/en/guides/inference) to execute the LLM. It supports all [Inference Providers](https://huggingface.co/docs/inference-providers/index) available on the Hub: Cerebras, Cohere, Fal, Fireworks, HF-Inference, Hyperbolic, Nebius, Novita, Replicate, SambaNova, Together, and more.
```python
from smolagents import InferenceClientModel
messages = [
{"role": "user", "content": [{"type": "text", "text": "Hello, how are you?"}]}
]
model = InferenceClientModel(provider="novita")
print(model(messages))
```
```text
>>> Of course! If you change your mind, feel free to reach out. Take care!
```
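You can also pin a specific Hub model alongside the provider. A short sketch, reusing the `messages` defined above; the `model_id` and `provider` below are only illustrative choices:
```python
from smolagents import InferenceClientModel

# The model_id and provider here are illustrative; any model/provider pair
# available through Inference Providers should work.
model = InferenceClientModel(model_id="Qwen/Qwen2.5-Coder-32B-Instruct", provider="together")
print(model(messages))
```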
[[autodoc]] InferenceClientModel
### LiteLLMModel
The `LiteLLMModel` leverages [LiteLLM](https://www.litellm.ai/) to support 100+ LLMs from various providers.
You can pass kwargs at model initialization that will then be used on every call to the model; for instance, below we pass `temperature`.
```python
from smolagents import LiteLLMModel
messages = [
{"role": "user", "content": [{"type": "text", "text": "Hello, how are you?"}]}
]
model = LiteLLMModel(model_id="anthropic/claude-3-5-sonnet-latest", temperature=0.2, max_tokens=10)
print(model(messages))
```
[[autodoc]] LiteLLMModel
### LiteLLMRouterModel
The `LiteLLMRouterModel` is a wrapper around the [LiteLLM Router](https://docs.litellm.ai/docs/routing) that leverages
advanced routing strategies: load-balancing across multiple deployments, prioritizing critical requests via queueing,
and implementing basic reliability measures such as cooldowns, fallbacks, and exponential backoff retries.
```python
import os

from smolagents import LiteLLMRouterModel

messages = [
    {"role": "user", "content": [{"type": "text", "text": "Hello, how are you?"}]}
]

model = LiteLLMRouterModel(
    model_id="llama-3.3-70b",
    model_list=[
        {
            "model_name": "llama-3.3-70b",
            "litellm_params": {"model": "groq/llama-3.3-70b", "api_key": os.getenv("GROQ_API_KEY")},
        },
        {
            "model_name": "llama-3.3-70b",
            "litellm_params": {"model": "cerebras/llama-3.3-70b", "api_key": os.getenv("CEREBRAS_API_KEY")},
        },
    ],
    client_kwargs={
        "routing_strategy": "simple-shuffle",
    },
)
print(model(messages))
```
[[autodoc]] LiteLLMRouterModel
### OpenAIServerModel
This class lets you call any model served through an OpenAI-compatible API.
Here's how you can set it up (you can customize the `api_base` URL to point to a different server):
```py
import os
from smolagents import OpenAIServerModel
model = OpenAIServerModel(
model_id="gpt-4o",
api_base="https://api.openai.com/v1",
api_key=os.environ["OPENAI_API_KEY"],
)
```
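For example, here is a sketch pointing `api_base` at a self-hosted OpenAI-compatible server; the URL, model name, and API key variable are placeholders, not values the library expects:
```py
import os

from smolagents import OpenAIServerModel

# All values below are placeholders for whatever your own server exposes.
model = OpenAIServerModel(
    model_id="my-local-model",
    api_base="http://localhost:8000/v1",
    api_key=os.environ.get("MY_SERVER_API_KEY", "not-needed"),
)
```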
[[autodoc]] OpenAIServerModel
### AzureOpenAIServerModel
`AzureOpenAIServerModel` allows you to connect to any Azure OpenAI deployment.
Below you can find an example of how to set it up. Note that you can omit the `azure_endpoint`, `api_key`, and `api_version` arguments, provided you've set the corresponding environment variables: `AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_API_KEY`, and `OPENAI_API_VERSION`.
Pay attention to the lack of an `AZURE_` prefix for `OPENAI_API_VERSION`; this is due to the way the underlying [openai](https://github.com/openai/openai-python) package is designed.
```py
import os
from smolagents import AzureOpenAIServerModel
model = AzureOpenAIServerModel(
    model_id=os.environ.get("AZURE_OPENAI_MODEL"),
    azure_endpoint=os.environ.get("AZURE_OPENAI_ENDPOINT"),
    api_key=os.environ.get("AZURE_OPENAI_API_KEY"),
    api_version=os.environ.get("OPENAI_API_VERSION"),
)
```
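If those environment variables are already set, a sketch of the shorter form looks like this (the `AZURE_OPENAI_MODEL` variable holding the deployment name follows the example above):
```py
import os

from smolagents import AzureOpenAIServerModel

# Assumes AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY and OPENAI_API_VERSION
# are set in the environment, so they can be omitted from the constructor.
model = AzureOpenAIServerModel(model_id=os.environ.get("AZURE_OPENAI_MODEL"))
```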
[[autodoc]] AzureOpenAIServerModel
### AmazonBedrockServerModel
`AmazonBedrockServerModel` helps you connect to Amazon Bedrock and run your agent with any of the available models.
Below is an example setup. This class also offers additional options for customization.
```py
import os
from smolagents import AmazonBedrockServerModel
model = AmazonBedrockServerModel(
    model_id=os.environ.get("AMAZON_BEDROCK_MODEL_ID"),
)
```
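As a quick usage sketch, you can then call the model with the same message format used in the other examples on this page:
```py
messages = [
    {"role": "user", "content": [{"type": "text", "text": "Hello, how are you?"}]}
]
print(model(messages))
```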
[[autodoc]] AmazonBedrockServerModel
### MLXModel
Model to run inference locally on Apple silicon via [MLX](https://github.com/ml-explore/mlx), using the `mlx-lm` package.
```python
from smolagents import MLXModel
model = MLXModel(model_id="HuggingFaceTB/SmolLM-135M-Instruct")
print(model([{"role": "user", "content": "Ok!"}], stop_sequences=["great"]))
```
```text
>>> What a
```
> [!TIP]
> You must have `mlx-lm` installed on your machine. If it isn't, please run `pip install smolagents[mlx-lm]`.
[[autodoc]] MLXModel
### VLLMModel
Model to use [vLLM](https://docs.vllm.ai/) for fast LLM inference and serving.
```python
from smolagents import VLLMModel
model = VLLMModel(model_id="HuggingFaceTB/SmolLM-135M-Instruct")
print(model([{"role": "user", "content": "Ok!"}], stop_sequences=["great"]))
```
> [!TIP]
> You must have `vllm` installed on your machine. If it isn't, please run `pip install smolagents[vllm]`.
[[autodoc]] VLLMModel