AI & ML interests

A Mixture-of-Models (MoM) router that understands request intent.

mom-family

One fabric. Many minds. We're introducing MoM (Mixture of Models)—a family of specialized routing models that power vLLM-SR's intelligent decision-making.

Why MoM?

vLLM-SR solves a critical problem: how to route LLM requests to the right model at the right time. Not every query needs the same resources—"What's the weather?" shouldn't cost as much as "Analyze this legal contract."
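To make the idea concrete, here is a minimal sketch of cost-aware routing. It is not vLLM-SR's implementation: the `Backend` type, the backend names and costs, and the keyword heuristic standing in for a trained intent classifier are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str             # hypothetical backend label, not a real deployment
    cost_per_call: float  # illustrative relative cost, not a measured value


# Hypothetical backends; names and costs are placeholders.
SMALL = Backend("small-model", 1.0)
LARGE = Backend("large-model", 20.0)

# Stand-in for a learned intent classifier: a simple keyword heuristic.
COMPLEX_HINTS = ("analyze", "contract", "prove", "derive")


def classify_intent(query: str) -> str:
    """Return 'complex' or 'simple'. A real router would use a trained model."""
    text = query.lower()
    return "complex" if any(hint in text for hint in COMPLEX_HINTS) else "simple"


def route(query: str) -> Backend:
    """Send complex queries to the larger backend, everything else to the small one."""
    return LARGE if classify_intent(query) == "complex" else SMALL


for q in ("What's the weather?", "Analyze this legal contract."):
    chosen = route(q)
    print(f"{q!r} -> {chosen.name} (relative cost {chosen.cost_per_call})")
```

In this sketch the weather question goes to the cheap backend and the contract analysis goes to the expensive one, which is the trade-off described above.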

MoM System Card

A quick overview of all MoM models:

| Category | Model | Size | Architecture | Base Model | Purpose |
|---|---|---|---|---|---|
| 🧠 Intelligent Routing | mom-brain-flash | Flash | Encoder | ModernBERT | Ultra-fast intent classification |
| 🧠 Intelligent Routing | mom-brain-pro | Pro | Decoder | Qwen3 0.6B | Balanced routing with reasoning |
| 🧠 Intelligent Routing | mom-brain-max | Max | Decoder | Qwen3 1.7B | Maximum accuracy for complex decisions |
| 🔍 Similarity Search | mom-similarity-flash | Flash | Encoder | BERT | Semantic similarity matching |
| 🔒 Prompt Guardian | mom-jailbreak-flash | Flash | Encoder | ModernBERT | Jailbreak/attack detection |
| 🔒 Prompt Guardian | mom-pii-flash | Flash | Encoder | ModernBERT | PII detection & privacy protection |
| 🎯 SLM Experts | mom-expert-math-flash | Flash | Decoder | Qwen3 0.6B | Backend math problem solver |
| 🎯 SLM Experts | mom-expert-science-flash | Flash | Decoder | Qwen3 0.6B | Backend science problem solver |
| 🎯 SLM Experts | mom-expert-social-flash | Flash | Decoder | Qwen3 0.6B | Backend social sciences solver |
| 🎯 SLM Experts | mom-expert-humanities-flash | Flash | Decoder | Qwen3 0.6B | Backend humanities solver |
| 🎯 SLM Experts | mom-expert-law-flash | Flash | Decoder | Qwen3 0.6B | Backend law problem solver |
| 🎯 SLM Experts | mom-expert-generalist-flash | Flash | Decoder | Qwen3 0.6B | Backend generalist solver |

Key Insights:

  • 4 Categories: 3 for routing (Intelligent Routing, Similarity Search, Prompt Guardian) + 1 for backend problem solving (SLM Experts)
  • ModernBERT (encoder-only) → Sub-10ms latency for high-throughput routing
  • Qwen3 (decoder-only) → Explainable routing decisions + domain-specific problem solving
  • Flash models achieve 10,000+ QPS on commodity hardware
  • SLM Experts are not routers—they are specialized backend models that solve domain-specific problems
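The split between routing models and backend experts can be sketched as a small pipeline. This is an illustration under stated assumptions, not the vLLM-SR code path: the stage order (guard first, then domain classification, then expert lookup) is assumed, and `looks_like_jailbreak`, `classify_domain`, and `handle` are hypothetical stubs that use keyword checks in place of the actual models. Only the model names come from the system card table.

```python
from typing import Dict

# Domain -> expert model name, taken from the system card table.
EXPERTS: Dict[str, str] = {
    "math": "mom-expert-math-flash",
    "science": "mom-expert-science-flash",
    "social": "mom-expert-social-flash",
    "humanities": "mom-expert-humanities-flash",
    "law": "mom-expert-law-flash",
}
FALLBACK = "mom-expert-generalist-flash"


def looks_like_jailbreak(prompt: str) -> bool:
    """Stub for a guard model such as mom-jailbreak-flash (keyword check only)."""
    return "ignore previous instructions" in prompt.lower()


def classify_domain(prompt: str) -> str:
    """Stub for a routing model such as mom-brain-flash (keyword check only)."""
    text = prompt.lower()
    if "integral" in text or "equation" in text:
        return "math"
    if "contract" in text or "statute" in text:
        return "law"
    return "general"


def handle(prompt: str) -> str:
    """Return 'blocked' or the name of the expert that would serve the prompt."""
    # Assumed order: the guard runs before any routing decision is made.
    if looks_like_jailbreak(prompt):
        return "blocked"
    # The router only picks a destination; the expert does the solving.
    domain = classify_domain(prompt)
    return EXPERTS.get(domain, FALLBACK)


print(handle("Solve this equation: x + 2 = 5"))
print(handle("Tell me a story"))
```

The routing function never answers the prompt itself. It only returns which expert should, mirroring the point that SLM Experts are backends and not routers.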
