How to use lapa-llm/tokenizer with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="lapa-llm/tokenizer")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("lapa-llm/tokenizer")
model = AutoModelForImageTextToText.from_pretrained("lapa-llm/tokenizer")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

How to use lapa-llm/tokenizer with vLLM:
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "lapa-llm/tokenizer"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "lapa-llm/tokenizer",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
```
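The same OpenAI-compatible endpoint can also be called from Python. A minimal sketch using only the standard library, mirroring the curl request above (the server URL and model name match the serve command; `chat_completion` is an illustrative helper, not part of any library):

```python
import json
import urllib.request

# The same OpenAI-compatible chat-completions payload as the curl call above.
payload = {
    "model": "lapa-llm/tokenizer",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                    },
                },
            ],
        }
    ],
}

def chat_completion(base_url: str = "http://localhost:8000") -> dict:
    """POST the payload to a running vLLM server and return the parsed reply."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Calling `chat_completion()` requires the vLLM server from the previous step to be running on `localhost:8000`.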
How to use lapa-llm/tokenizer with SGLang:

```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "lapa-llm/tokenizer" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "lapa-llm/tokenizer",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
```

Alternatively, start the SGLang server with Docker:

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "lapa-llm/tokenizer" \
  --host 0.0.0.0 \
  --port 30000
```

Then call the server with the same curl request shown above.

How to use lapa-llm/tokenizer with Docker Model Runner:

```shell
docker model run hf.co/lapa-llm/tokenizer
```
By adding more than 80K Ukrainian tokens without removing any English or EU-language tokens, the Lapa tokenizer makes Ukrainian the core language of the multilingual Gemma-3 tokenizer while keeping the vocabulary fixed at its original size of 256K tokens.
More than 16 of the world's most widely used writing systems were analyzed. Roughly four-fifths of the tokens in scripts geographically and culturally distant from Ukraine (for example Bengali, Thai, Chinese, Japanese, and Korean) were pruned.
| Writing system | Tokens removed | Tokens retained |
|---|---|---|
| Han (Chinese) | 16,488 | 4,122 |
| Devanagari (Hindi) | 10,976 | 2,743 |
| Bengali | 7,983 | 1,995 |
| Arabic | 6,730 | 1,682 |
| Hiragana / Katakana (Japanese) | 3,944 | 985 |
| Hangul (Korean) | 3,744 | 935 |
| Tamil | 3,080 | 770 |
| Thai | 1,740 | 435 |
| Malayalam | 1,566 | 391 |
| Telugu | 1,428 | 356 |
| Gujarati | 1,080 | 270 |
| Kannada | 1,016 | 253 |
| Ethiopic | 691 | 172 |
| Hebrew | 670 | 167 |
| Khmer | 481 | 119 |
| Sinhala | 435 | 108 |
| Myanmar | 410 | 102 |
| Lao | 243 | 60 |
| Gurmukhi | 215 | 53 |
| Tibetan | 107 | 26 |
| Oriya | 100 | 25 |
| Cyrillic | 13,398 | 0 |
| Gemma-3 `<unused-*>` | 6,139 | 102 |
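The "roughly four-fifths" pruning ratio can be checked directly against the table; a small illustrative sketch using a few of the rows above:

```python
# (tokens removed, tokens retained) for a few scripts, taken from the table above
scripts = {
    "Han (Chinese)": (16_488, 4_122),
    "Bengali": (7_983, 1_995),
    "Thai": (1_740, 435),
    "Hangul (Korean)": (3_744, 935),
}

for name, (removed, retained) in scripts.items():
    ratio = removed / (removed + retained)  # fraction of the script's tokens pruned
    print(f"{name}: {ratio:.1%} pruned")
```

Each script comes out at about 80% pruned, matching the four-fifths figure.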
Only the tokens listed in the table above were replaced; no tokens from any other writing system were affected.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("lapa-llm/tokenizer")
# Ukrainian for "All the beautiful keep their optimism"
toks = tokenizer("Всі красиві зберігають оптимізм", add_special_tokens=False)
print(len(toks.input_ids))  # only 4 tokens 💪🏻
```
The tokenizer also includes dedicated `<think>` and `</think>` tokens to support a hybrid reasoning approach. Together, these changes significantly speed up tokenization of Ukrainian text.