vLLM Semantic Router

https://vllm-semantic-router.com

AI & ML interests

A Mixture-of-Models (MoM) router that understands request intent.

Organization Card

One fabric. Many minds. We're introducing MoM (Mixture of Models), a family of specialized models that power vLLM-SR's intelligent decision-making.

vLLM Semantic Router 👉: project link

Why MoM?

vLLM-SR solves a critical problem: routing each LLM request to the right model at the right time. Not every query needs the same resources; "What's the weather?" shouldn't cost as much as "Analyze this legal contract."
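
To make the idea concrete, here is a minimal sketch of cost-aware routing. Everything in it is hypothetical: the `classify_intent` keyword heuristic stands in for a trained classifier, and the backend names and relative costs are illustrative placeholders, not part of vLLM-SR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str             # hypothetical backend label
    relative_cost: float  # illustrative cost units per request


# Hypothetical backends; a real deployment would list its served models.
BACKENDS = {
    "simple": Backend("small-model", 1.0),
    "complex": Backend("large-model", 20.0),
}

# Toy stand-in for a trained intent classifier.
COMPLEX_HINTS = ("analyze", "contract", "prove", "derive")


def classify_intent(query: str) -> str:
    """Return "complex" if the query contains a hint word, else "simple"."""
    text = query.lower()
    return "complex" if any(hint in text for hint in COMPLEX_HINTS) else "simple"


def route(query: str) -> Backend:
    """Pick the backend that matches the classified intent."""
    return BACKENDS[classify_intent(query)]


if __name__ == "__main__":
    for q in ("What's the weather?", "Analyze this legal contract."):
        print(q, "->", route(q).name)
```

The point of the sketch is only the shape of the decision: classify first, then send cheap queries to a cheap backend and reserve the expensive one for requests that need it.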

MoM System Card

A quick overview of all MoM models:

Category	Model	Size	Architecture	Base Model	Purpose
🧠 Intelligent Routing	mom-brain-flash	Flash	Encoder	ModernBERT	Ultra-fast intent classification
🧠 Intelligent Routing	mom-brain-pro	Pro	Decoder	Qwen3 0.6B	Balanced routing with reasoning
🧠 Intelligent Routing	mom-brain-max	Max	Decoder	Qwen3 1.7B	Maximum accuracy for complex decisions
🔍 Similarity Search	mom-similarity-flash	Flash	Encoder	BERT	Semantic similarity matching
🔒 Prompt Guardian	mom-jailbreak-flash	Flash	Encoder	ModernBERT	Jailbreak/attack detection
🔒 Prompt Guardian	mom-pii-flash	Flash	Encoder	ModernBERT	PII detection & privacy protection
🎯 SLM Experts	mom-expert-math-flash	Flash	Decoder	Qwen3 0.6B	Backend math problem solver
🎯 SLM Experts	mom-expert-science-flash	Flash	Decoder	Qwen3 0.6B	Backend science problem solver
🎯 SLM Experts	mom-expert-social-flash	Flash	Decoder	Qwen3 0.6B	Backend social sciences solver
🎯 SLM Experts	mom-expert-humanities-flash	Flash	Decoder	Qwen3 0.6B	Backend humanities solver
🎯 SLM Experts	mom-expert-law-flash	Flash	Decoder	Qwen3 0.6B	Backend law problem solver
🎯 SLM Experts	mom-expert-generalist-flash	Flash	Decoder	Qwen3 0.6B	Backend generalist solver
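
For readers who want to work with the table programmatically, the rows can be transcribed into a small in-memory catalog. This is only a sketch: the `MomModel` type and the helper functions are hypothetical names introduced here, and the data is copied from the table rather than fetched from any API.

```python
from collections import Counter
from typing import NamedTuple


class MomModel(NamedTuple):
    category: str
    name: str
    size: str
    architecture: str
    base_model: str


# Rows transcribed by hand from the system card table.
CATALOG = [
    MomModel("Intelligent Routing", "mom-brain-flash", "Flash", "Encoder", "ModernBERT"),
    MomModel("Intelligent Routing", "mom-brain-pro", "Pro", "Decoder", "Qwen3 0.6B"),
    MomModel("Intelligent Routing", "mom-brain-max", "Max", "Decoder", "Qwen3 1.7B"),
    MomModel("Similarity Search", "mom-similarity-flash", "Flash", "Encoder", "BERT"),
    MomModel("Prompt Guardian", "mom-jailbreak-flash", "Flash", "Encoder", "ModernBERT"),
    MomModel("Prompt Guardian", "mom-pii-flash", "Flash", "Encoder", "ModernBERT"),
    MomModel("SLM Experts", "mom-expert-math-flash", "Flash", "Decoder", "Qwen3 0.6B"),
    MomModel("SLM Experts", "mom-expert-science-flash", "Flash", "Decoder", "Qwen3 0.6B"),
    MomModel("SLM Experts", "mom-expert-social-flash", "Flash", "Decoder", "Qwen3 0.6B"),
    MomModel("SLM Experts", "mom-expert-humanities-flash", "Flash", "Decoder", "Qwen3 0.6B"),
    MomModel("SLM Experts", "mom-expert-law-flash", "Flash", "Decoder", "Qwen3 0.6B"),
    MomModel("SLM Experts", "mom-expert-generalist-flash", "Flash", "Decoder", "Qwen3 0.6B"),
]


def models_in(category: str) -> list[str]:
    """Return the model names listed under one category."""
    return [m.name for m in CATALOG if m.category == category]


def count_by_architecture() -> Counter:
    """Count how many listed models are encoders vs. decoders."""
    return Counter(m.architecture for m in CATALOG)


if __name__ == "__main__":
    print(models_in("Prompt Guardian"))
    print(dict(count_by_architecture()))
```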

Key Insights:

4 Categories: 3 for routing (Intelligent Routing, Similarity Search, Prompt Guardian) + 1 for backend problem solving (SLM Experts)
ModernBERT (encoder-only) → Sub-10ms latency for high-throughput routing
Qwen3 (decoder-only) → Explainable routing decisions + domain-specific problem solving
Flash models achieve 10,000+ QPS on commodity hardware
SLM Experts are not routers—they are specialized backend models that solve domain-specific problems
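
As a rough sanity check on how the latency and throughput figures relate, Little's law (average in-flight requests = throughput × average latency) gives the concurrency a deployment would need to sustain. The sketch below assumes, purely for illustration, that the 10 ms bound and the 10,000 QPS figure apply to the same deployment; the card does not state this, and the function name is hypothetical.

```python
def in_flight_requests(qps: float, latency_seconds: float) -> float:
    """Little's law: average concurrency = throughput x average latency."""
    if qps < 0 or latency_seconds < 0:
        raise ValueError("qps and latency_seconds must be non-negative")
    return qps * latency_seconds


if __name__ == "__main__":
    # Assumed inputs: 10,000 requests per second at 10 ms per request.
    print(round(in_flight_requests(10_000, 0.010)))
```

Under those assumed inputs, about 100 requests are in flight at any moment, so throughput at that level would have to come from batching or parallel replicas rather than strictly one-at-a-time inference.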
