DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Released: November 29, 2023
This foundational paper studies scaling laws for large language models, analyzing how a fixed compute budget should be split between model size and training data, and lays the groundwork for the subsequent DeepSeek models.
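To make the compute-allocation idea concrete, here is a minimal Python sketch of a power-law budget split under the common approximation C ≈ 6·N·D; the exponents and the constant k are illustrative placeholders, not the paper's fitted values.

```python
# Illustrative compute-allocation sketch (placeholder constants, not the
# paper's fitted values): given a budget C ~ 6 * N * D training FLOPs,
# a power-law allocation N_opt ∝ C^a, D_opt ∝ C^b with a + b = 1 splits C
# between model size N (parameters) and data size D (tokens).

def optimal_allocation(compute_flops: float, a: float = 0.5, b: float = 0.5):
    """Return (params, tokens) for a budget, assuming C = 6 * N * D."""
    assert abs(a + b - 1.0) < 1e-9, "exponents must sum to 1 so 6*N*D == C"
    k = 1.0  # hypothetical fitted constant; a real fit calibrates this
    n_params = k * compute_flops ** a          # N = k * C^a
    n_tokens = compute_flops ** b / (6.0 * k)  # D = C^b / (6k), so 6*N*D == C
    return n_params, n_tokens

if __name__ == "__main__":
    for c in (1e21, 1e22, 1e23):
        n, d = optimal_allocation(c)
        print(f"C={c:.0e} FLOPs -> ~{n:.2e} params, ~{d:.2e} tokens")
```

With a = b = 0.5, a 10x larger budget buys roughly 3.2x more parameters and 3.2x more tokens; shifting weight between a and b is exactly the model-size-versus-data trade-off the paper investigates.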
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Released: May 2024
Introduces Multi-head Latent Attention (MLA) and the DeepSeekMoE Mixture-of-Experts (MoE) architecture, achieving strong performance while cutting training costs by 42.5% relative to DeepSeek 67B and substantially shrinking the KV cache for more efficient inference.
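As a rough illustration of the routed-experts idea, below is a minimal top-k gated MoE feed-forward layer in PyTorch; it is a generic sketch, not DeepSeek-V2's actual DeepSeekMoE design (which adds fine-grained and shared experts), and it omits MLA entirely.

```python
# Minimal sketch of a top-k routed MoE feed-forward layer (assumes PyTorch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = x.reshape(-1, x.size(-1))                # flatten to (T, d_model)
        logits = self.router(tokens)                      # (T, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)    # pick top-k experts per token
        weights = F.softmax(weights, dim=-1)              # normalize selected scores
        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            mask = (idx == e)                             # (T, k): where expert e was chosen
            if mask.any():
                token_ids, slot = mask.nonzero(as_tuple=True)
                out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(tokens[token_ids])
        return out.reshape_as(x)

x = torch.randn(2, 4, 64)
layer = MoELayer(d_model=64, d_ff=256, n_experts=8, top_k=2)
print(layer(x).shape)  # torch.Size([2, 4, 64])
```

The cost saving comes from sparsity: each token activates only top_k of the n_experts expert FFNs, so total parameters can grow far faster than per-token compute.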
DeepSeek-V3 Technical Report
Released: December 2024
Describes scaling a sparse MoE network to 671 billion total parameters (37 billion activated per token), trained with FP8 mixed-precision and co-designed algorithms, frameworks, and hardware.
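The snippet below sketches a single mixed-precision training step in PyTorch; since FP8 requires specialized kernels (e.g., NVIDIA's Transformer Engine), it uses bfloat16 autocast as a widely available stand-in for the FP8 recipe described in the report.

```python
# Minimal mixed-precision training step (assumes PyTorch). Stand-in for
# DeepSeek-V3's FP8 recipe: forward/backward run in reduced precision
# while master weights stay in FP32.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 128)
y = torch.randint(0, 10, (32,))

# Low-precision compute inside the autocast region; bfloat16 here is a
# stand-in, since true FP8 training needs dedicated kernels.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    loss = loss_fn(model(x), y)
loss.backward()   # gradients flow back into the FP32 master parameters
opt.step()
opt.zero_grad()
print(f"loss: {loss.item():.4f}")
```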