
llm-semantic-router/qwen3_mmlu_math-reasoner_no_leakage_r32
Text Generation
A Mixture-of-Models (MoM) router that understands request intent.
One fabric. Many minds. We're introducing MoM (Mixture of Models), a family of specialized routing models that power vLLM-SR's intelligent decision-making.
vLLM-SR solves a critical problem: routing each LLM request to the right model at the right time. Not every query needs the same resources; "What's the weather?" shouldn't cost as much as "Analyze this legal contract."
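To make that cost trade-off concrete, here is a minimal Python sketch of cost-aware routing: estimate how demanding a request is, then pick the cheapest backend capable of handling it. It assumes a hypothetical pool of two backends (`small-model`, `large-model`) with made-up costs and capability levels, and uses a toy keyword heuristic in place of a learned intent classifier; none of these names or numbers come from vLLM-SR.

```python
from dataclasses import dataclass


# Hypothetical backend pool: names, costs, and capability levels are
# illustrative placeholders, not vLLM-SR configuration.
@dataclass(frozen=True)
class Backend:
    name: str
    cost_per_1k_tokens: float
    capability: int  # higher means it can handle harder requests


BACKENDS = [
    Backend("small-model", 0.1, 1),
    Backend("large-model", 1.0, 3),
]

# Toy complexity signal standing in for a learned intent classifier.
HARD_MARKERS = ("analyze", "contract", "prove", "derive")


def estimate_complexity(query: str) -> int:
    """Return 3 for requests containing a 'hard' marker, else 1."""
    q = query.lower()
    return 3 if any(marker in q for marker in HARD_MARKERS) else 1


def route(query: str) -> str:
    """Pick the cheapest backend whose capability covers the request."""
    needed = estimate_complexity(query)
    eligible = [b for b in BACKENDS if b.capability >= needed]
    return min(eligible, key=lambda b: b.cost_per_1k_tokens).name


print(route("What's the weather?"))           # small-model
print(route("Analyze this legal contract."))  # large-model
```

Choosing the cheapest *eligible* backend, rather than always the strongest, is what keeps simple queries inexpensive while still escalating harder ones.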
A quick overview of all MoM models:
| Category | Model | Size | Architecture | Base Model | Purpose |
|---|---|---|---|---|---|
| 🧠 Intelligent Routing | mom-brain-flash | Flash | Encoder | ModernBERT | Ultra-fast intent classification |
| | mom-brain-pro | Pro | Decoder | Qwen3 0.6B | Balanced routing with reasoning |
| | mom-brain-max | Max | Decoder | Qwen3 1.7B | Maximum accuracy for complex decisions |
| 🔍 Similarity Search | mom-similarity-flash | Flash | Encoder | BERT | Semantic similarity matching |
| 🔒 Prompt Guardian | mom-jailbreak-flash | Flash | Encoder | ModernBERT | Jailbreak/attack detection |
| | mom-pii-flash | Flash | Encoder | ModernBERT | PII detection & privacy protection |
| 🎯 SLM Experts | mom-expert-math-flash | Flash | Decoder | Qwen3 0.6B | Backend math problem solver |
| | mom-expert-science-flash | Flash | Decoder | Qwen3 0.6B | Backend science problem solver |
| | mom-expert-social-flash | Flash | Decoder | Qwen3 0.6B | Backend social sciences solver |
| | mom-expert-humanities-flash | Flash | Decoder | Qwen3 0.6B | Backend humanities solver |
| | mom-expert-law-flash | Flash | Decoder | Qwen3 0.6B | Backend law problem solver |
| | mom-expert-generalist-flash | Flash | Decoder | Qwen3 0.6B | Backend generalist solver |
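The following Python sketch shows one way the table's components could fit together in a request path: a guard check first, then domain classification, then dispatch to a Flash expert, with the generalist as fallback. The model names come from the table above, but `looks_like_jailbreak`, `classify_domain`, and the keyword lists are hypothetical stand-ins for mom-jailbreak-flash and mom-brain-flash, not their actual interfaces or behavior.

```python
from typing import Optional

# Expert model names taken from the MoM table; everything else is illustrative.
EXPERTS = {
    "math": "mom-expert-math-flash",
    "science": "mom-expert-science-flash",
    "social": "mom-expert-social-flash",
    "humanities": "mom-expert-humanities-flash",
    "law": "mom-expert-law-flash",
}
FALLBACK = "mom-expert-generalist-flash"

# Hypothetical keyword lists; a real router would use a trained classifier.
KEYWORDS = {
    "math": ("integral", "equation", "solve for"),
    "science": ("photosynthesis", "molecule", "gravity"),
    "social": ("economics", "sociology", "election"),
    "humanities": ("poem", "philosophy", "renaissance"),
    "law": ("contract", "statute", "liability"),
}


def looks_like_jailbreak(prompt: str) -> bool:
    """Hypothetical stand-in for mom-jailbreak-flash."""
    return "ignore previous instructions" in prompt.lower()


def classify_domain(prompt: str) -> Optional[str]:
    """Hypothetical stand-in for mom-brain-flash: first matching domain wins."""
    p = prompt.lower()
    for domain, words in KEYWORDS.items():
        if any(w in p for w in words):
            return domain
    return None


def dispatch(prompt: str) -> str:
    """Guard first, then classify, then pick an expert (or the generalist)."""
    if looks_like_jailbreak(prompt):
        return "blocked"
    domain = classify_domain(prompt)
    return EXPERTS.get(domain, FALLBACK) if domain else FALLBACK


print(dispatch("Solve for x in 2x + 3 = 7"))      # mom-expert-math-flash
print(dispatch("Is this contract enforceable?"))  # mom-expert-law-flash
print(dispatch("Hello there"))                    # mom-expert-generalist-flash
```

Running the guard before classification means a flagged prompt never reaches an expert; a real deployment would replace each stub with a call to the corresponding model.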
Key Insights: