DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Released: November 29, 2023
This foundational paper studies scaling laws for large language models, analyzing how a fixed compute budget should be split between model size and training data, and lays the groundwork for the subsequent DeepSeek models.
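To make the compute-allocation idea concrete, here is a minimal Python sketch of a power-law budget split under the common approximation C ≈ 6·N·D; the exponents and the constant k are illustrative placeholders, not the paper's fitted values.

```python
# Illustrative compute-allocation sketch (placeholder constants, not the
# paper's fitted values): given a budget C ~ 6 * N * D training FLOPs,
# a power-law allocation N_opt ∝ C^a, D_opt ∝ C^b with a + b = 1 splits C
# between model size N (parameters) and data size D (tokens).

def optimal_allocation(compute_flops: float, a: float = 0.5, b: float = 0.5):
    """Return (params, tokens) for a budget, assuming C = 6 * N * D."""
    assert abs(a + b - 1.0) < 1e-9, "exponents must sum to 1 so 6*N*D == C"
    k = 1.0  # hypothetical fitted constant; a real fit calibrates this
    n_params = k * compute_flops ** a          # N = k * C^a
    n_tokens = compute_flops ** b / (6.0 * k)  # D = C^b / (6k), so 6*N*D == C
    return n_params, n_tokens

if __name__ == "__main__":
    for c in (1e21, 1e22, 1e23):
        n, d = optimal_allocation(c)
        print(f"C={c:.0e} FLOPs -> ~{n:.2e} params, ~{d:.2e} tokens")
```

With a = b = 0.5, a 10x larger budget buys roughly 3.2x more parameters and 3.2x more tokens; shifting weight between a and b is exactly the model-size-versus-data trade-off the paper investigates.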
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Released: May 2024
Introduces Multi-head Latent Attention (MLA) and the DeepSeekMoE Mixture-of-Experts (MoE) architecture, achieving strong performance while cutting training costs by 42.5% relative to DeepSeek 67B and substantially shrinking the KV cache for more efficient inference.
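As a rough illustration of the routed-experts idea, below is a minimal top-k gated MoE feed-forward layer in PyTorch; it is a generic sketch, not DeepSeek-V2's actual DeepSeekMoE design (which adds fine-grained and shared experts), and it omits MLA entirely.

```python
# Minimal sketch of a top-k routed MoE feed-forward layer (assumes PyTorch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = x.reshape(-1, x.size(-1))                # flatten to (T, d_model)
        logits = self.router(tokens)                      # (T, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)    # pick top-k experts per token
        weights = F.softmax(weights, dim=-1)              # normalize selected scores
        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            mask = (idx == e)                             # (T, k): where expert e was chosen
            if mask.any():
                token_ids, slot = mask.nonzero(as_tuple=True)
                out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(tokens[token_ids])
        return out.reshape_as(x)

x = torch.randn(2, 4, 64)
layer = MoELayer(d_model=64, d_ff=256, n_experts=8, top_k=2)
print(layer(x).shape)  # torch.Size([2, 4, 64])
```

The cost saving comes from sparsity: each token activates only top_k of the n_experts expert FFNs, so total parameters can grow far faster than per-token compute.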
DeepSeek-V3 Technical Report
Released: December 2024
Describes scaling a sparse MoE network to 671 billion total parameters (37 billion activated per token), trained with FP8 mixed-precision and co-designed algorithms, frameworks, and hardware.
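The snippet below sketches a single mixed-precision training step in PyTorch; since FP8 requires specialized kernels (e.g., NVIDIA's Transformer Engine), it uses bfloat16 autocast as a widely available stand-in for the FP8 recipe described in the report.

```python
# Minimal mixed-precision training step (assumes PyTorch). Stand-in for
# DeepSeek-V3's FP8 recipe: forward/backward run in reduced precision
# while master weights stay in FP32.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 128)
y = torch.randint(0, 10, (32,))

# Low-precision compute inside the autocast region; bfloat16 here is a
# stand-in, since true FP8 training needs dedicated kernels.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    loss = loss_fn(model(x), y)
loss.backward()   # gradients flow back into the FP32 master parameters
opt.step()
opt.zero_grad()
print(f"loss: {loss.item():.4f}")
```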