Update README.md
    	
README.md CHANGED

@@ -12,8 +12,6 @@ library_name: transformers

            Phi-tiny-MoE is a lightweight Mixture of Experts (MoE) model with 3.8B total parameters and 1.1B activated parameters. It is compressed and distilled from the base model shared by [Phi-3.5-MoE](https://huggingface.co/microsoft/Phi-3.5-MoE-instruct) and [GRIN-MoE](https://huggingface.co/microsoft/GRIN-MoE) using the [SlimMoE](https://arxiv.org/pdf/2506.18349) approach, then post-trained via supervised fine-tuning and direct preference optimization for instruction following and safety. The model is trained on Phi-3 synthetic data and filtered public documents, with a focus on high-quality, reasoning-dense content. It is part of the SlimMoE series, which includes a larger variant, [Phi-mini-MoE](https://huggingface.co/microsoft/Phi-mini-MoE-instruct), with 7.6B total and 2.4B activated parameters.
         
-The code can be found at https://github.com/microsoft/MoE-compression.
-
References: <br>
[SlimMoE Paper](https://arxiv.org/pdf/2506.18349) <br>
[Phi-3 Technical Report](https://arxiv.org/abs/2404.14219) <br>
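
For orientation, a minimal usage sketch consistent with the card's `library_name: transformers`; the repo id `microsoft/Phi-tiny-MoE-instruct`, the dtype/device settings, and the chat-template flow below are assumptions for illustration, not details stated in this README change.

```python
# Hedged sketch: standard Hugging Face transformers usage. The repo id below is
# assumed (inferred from the Phi-mini-MoE naming), not taken from this diff.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-tiny-MoE-instruct"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # compact dtype; adjust to your hardware
    device_map="auto",           # requires `accelerate`; spreads weights across devices
    trust_remote_code=True,      # Phi MoE variants may ship custom modeling code
)

# Build a chat-formatted prompt and generate a short reply.
messages = [{"role": "user", "content": "Explain what a Mixture of Experts model is."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```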