✅ Pre-trained on 119 languages and dialects (36 trillion tokens), with strong translation and instruction-following abilities. (Qwen2.5 was pre-trained on 18 trillion tokens.)
✅ Qwen3 dense models match the performance of larger Qwen2.5 models: Qwen3-1.7B/4B/8B/14B/32B perform like Qwen2.5-3B/7B/14B/32B/72B.
✅ Three-stage pretraining:
• Stage 1: General language learning and knowledge building.
• Stage 2: Reasoning boost with STEM, coding, and logic skills.
• Stage 3: Long-context training.
✅ Built-in MCP (Model Context Protocol) support.
✅ Strong agent skills.
✅ Seamless switching between thinking mode (for hard tasks like math and coding) and non-thinking mode (for fast chatting) inside the chat template.
✅ Better human alignment for creative writing, roleplay, multi-turn conversations, and following detailed instructions.
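The thinking/non-thinking switch can be controlled per turn: Qwen3's chat template accepts an `enable_thinking` flag, and its documentation also describes `/think` and `/no_think` soft switches inside user messages. The helper below is a hypothetical sketch (not the official implementation) of mapping one to the other, assuming the soft switch appears at the end of the user message:

```python
def resolve_thinking_mode(user_message: str, default: bool = True) -> bool:
    """Decide whether thinking mode should be enabled for this turn.

    Hypothetical sketch: maps Qwen3's documented /think and /no_think
    soft switches onto the boolean `enable_thinking` chat-template flag.
    """
    text = user_message.strip()
    if text.endswith("/no_think"):
        return False  # fast chat: skip the reasoning block
    if text.endswith("/think"):
        return True   # hard task: keep the reasoning block
    return default    # no switch present: fall back to the default mode


# The resulting flag would then be passed to the chat template, e.g.
# tokenizer.apply_chat_template(messages, enable_thinking=mode, ...)
print(resolve_thinking_mode("Prove this identity step by step /think"))   # True
print(resolve_thinking_mode("What's the capital of France? /no_think"))  # False
print(resolve_thinking_mode("Hello!"))                                   # True (default)
```

The per-turn switch means a single conversation can mix deep reasoning (math, coding) with quick replies without swapping models.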