AI & ML interests
Run open-source LLMs locally across CPUs and GPUs without changing the binary, powered by Rust and Wasm!
Organization Card
Run open-source LLMs and create OpenAI-compatible API services for the Llama2 series of LLMs locally with LlamaEdge!
Give it a try
Run a single command in your command line terminal.
bash <(curl -sSfL 'https://raw.githubusercontent.com/LlamaEdge/LlamaEdge/main/run-llm.sh') --interactive
Follow the on-screen instructions to install the WasmEdge Runtime and download your favorite open-source LLM. Then, choose whether you want to chat with the model via the CLI or via a web UI.
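If you choose the API-server mode, the service speaks the standard OpenAI wire format. A minimal sketch of the workflow (the model file name, the `--prompt-template` value, and the port are assumptions based on common LlamaEdge defaults; check the docs for your model):

```shell
# Launch the OpenAI-compatible server with WasmEdge (file names illustrative):
#   wasmedge --dir .:. --nn-preload default:GGML:AUTO:Llama-2-7b-chat-Q5_K_M.gguf \
#     llama-api-server.wasm --prompt-template llama-2-chat
# Any OpenAI-style client can then talk to it. Build a chat request payload:
PAYLOAD='{"model":"default","messages":[{"role":"user","content":"What is WasmEdge?"}]}'
echo "$PAYLOAD"
# With the server running, send it to the chat-completions route
# (port 8080 is an assumed default):
#   curl -s http://localhost:8080/v1/chat/completions \
#     -H 'Content-Type: application/json' -d "$PAYLOAD"
```

Because the API shape matches OpenAI's, existing SDKs and tools usually work by pointing their base URL at the local server.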
See it in action | GitHub | Docs
Why?
LlamaEdge, powered by Rust and WasmEdge, provides a strong alternative to Python in AI inference.
- Lightweight. The total runtime size is 30MB.
- Fast. Full native speed on GPUs.
- Portable. A single cross-platform binary runs on different CPUs, GPUs, and OSes.
- Secure. Sandboxed and isolated execution on untrusted devices.
- Container-ready. Supported in Docker, containerd, Podman, and Kubernetes.
Learn more
Please visit the LlamaEdge project to learn more.
models (295)
- second-state/Qwen3-Reranker-0.6B-GGUF
- second-state/Seed-OSS-36B-Instruct-GGUF (Text Generation, 36B)
- second-state/embeddinggemma-300m-GGUF (Sentence Similarity, 0.3B)
- second-state/NVIDIA-Nemotron-Nano-9B-v2-GGUF (Text Generation, 9B)
- second-state/Nemotron-Mini-4B-Instruct-GGUF (4B)
- second-state/jina-embeddings-v3-GGUF (0.6B)
- second-state/MiniCPM-V-4-GGUF (Visual Question Answering, 4B)
- second-state/MiniCPM-V-4_5-GGUF (Visual Question Answering, 8B)
- second-state/Qwen3-Coder-30B-A3B-Instruct-GGUF (Text Generation, 31B)
- second-state/gemma-3-270m-it-GGUF (Text Generation, 0.3B)
datasets (0): none public yet