---
title: Smol VLM 256m Instruct Docker
emoji: π
colorFrom: purple
colorTo: yellow
sdk: docker
pinned: false
short_description: Api endpoint for SMOL VLM 256M
---

# SmolVLM-256M: Vision + Language Inference API

This Space demonstrates how to deploy and serve the **SmolVLM-256M-Instruct** multimodal language model using a Docker-based backend. The API provides OpenAI-style `chat/completions` endpoints for image + text understanding, similar to how ChatGPT Vision works.

An example frontend app is available at https://text-rec-api.glitch.me/

## Docker Setup

This Space uses a custom Dockerfile that downloads and launches the SmolVLM model with vision support using [llama.cpp](https://github.com/ggerganov/llama.cpp).

### Dockerfile

```Dockerfile
FROM ghcr.io/ggml-org/llama.cpp:full

# Install wget
RUN apt update && apt install wget -y

# Download the GGUF model file
RUN wget "https://huggingface.co/ggml-org/SmolVLM-256M-Instruct-GGUF/resolve/main/SmolVLM-256M-Instruct-Q8_0.gguf" -O /smoll.gguf

# Download the mmproj (multimodal projection) file
RUN wget "https://huggingface.co/ggml-org/SmolVLM-256M-Instruct-GGUF/resolve/main/mmproj-SmolVLM-256M-Instruct-Q8_0.gguf" -O /mmproj.gguf

# Start the server on 0.0.0.0:7860; -n caps generation at 512 tokens, -t uses 2 threads
CMD [ "--server", "-m", "/smoll.gguf", "--mmproj", "/mmproj.gguf", "--port", "7860", "--host", "0.0.0.0", "-n", "512", "-t", "2" ]
```
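
Once the container is running, the model can take a little while to load before requests succeed. The following is a minimal sketch of a readiness check, assuming the llama.cpp server's `GET /health` endpoint answers with HTTP 200 once the model is loaded. The base URL `http://localhost:7860` and the helper names `server_is_healthy` and `wait_until_ready` are illustrative assumptions, not part of this Space.

```python
import time
import urllib.error
import urllib.request


def server_is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    url = f"{base_url.rstrip('/')}/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an error status while the model loads.
        return False


def wait_until_ready(base_url, attempts=30, delay=2.0, probe=server_is_healthy):
    """Call the probe until it succeeds or the attempts run out."""
    for _ in range(attempts):
        if probe(base_url):
            return True
        time.sleep(delay)
    return False


# Example (assumed local address; adjust to wherever the container is exposed):
# wait_until_ready("http://localhost:7860")
```

The `probe` parameter exists only so the polling loop can be exercised without a live server.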

## API Usage

The server exposes a `POST /v1/chat/completions` endpoint compatible with the OpenAI API format.

### Request Format

Send a JSON payload structured like this:

```json
{
  "model": "SmolVLM-256M-Instruct",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What is in this image?" },
        {
          "type": "image_url",
          "image_url": {
            "url": "..."
          }
        }
      ]
    }
  ]
}
```
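
As a rough illustration of how a client might build and send this payload, here is a minimal Python sketch using only the standard library. It makes several assumptions that are not stated above: that the image is passed as a base64 `data:` URL (as in the OpenAI API), that the server is reachable at `http://localhost:7860`, and that the reply follows the OpenAI response shape with the text under `choices[0].message.content`. The helper names `build_payload` and `ask` are hypothetical.

```python
import base64
import json
import urllib.request

# Assumed local address; the Dockerfile only fixes the port (7860).
API_URL = "http://localhost:7860/v1/chat/completions"


def build_payload(prompt: str, image_bytes: bytes, mime_type: str = "image/jpeg") -> dict:
    """Build a chat/completions body with one text part and one image part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": "SmolVLM-256M-Instruct",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        # Assumption: the image is sent inline as a base64 data URL.
                        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
                    },
                ],
            }
        ],
    }


def ask(prompt: str, image_bytes: bytes, url: str = API_URL, timeout: float = 120.0) -> str:
    """POST the payload and return the text of the first choice."""
    body = json.dumps(build_payload(prompt, image_bytes)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        reply = json.load(response)
    # Assumption: OpenAI-style response layout.
    return reply["choices"][0]["message"]["content"]


# Example usage (photo.jpg is a placeholder file name):
# with open("photo.jpg", "rb") as f:
#     print(ask("What is in this image?", f.read()))
```

Keeping payload construction separate from the network call makes it easy to inspect the exact JSON before sending it.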