---
license: unknown
base_model:
- apple/DiffuCoder-7B-Instruct
tags:
- code
- text-diffusion-model
- diffusion large language model
---

### DiffuCoder-7B-cpGRPO

The DiffuCoder-7B-cpGRPO variant further refines DiffuCoder-7B-Instruct with reinforcement learning via coupled-GRPO.

Training recipe:

- Initialized from DiffuCoder-7B-Instruct, then post-trained with coupled-GRPO on 21K code samples for 1 epoch.
- coupled-GRPO significantly improves DiffuCoder's performance on code generation benchmarks (+4.4% on EvalPlus) and reduces reliance on autoregressive (AR) bias during decoding.
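
As a rough illustration of the "coupled" idea, the paper's abstract describes a sampling scheme that constructs complementary mask noise for the completions used in training. The sketch below is not the released training code: the `[MASK]` placeholder, the helper names `complementary_masks` and `apply_mask`, and the fixed `mask_ratio` are all assumptions made for illustration. It only shows how two masked views of one completion can be built so that every token is masked in exactly one of them.

```python
import random

MASK = "[MASK]"  # placeholder mask symbol, chosen for illustration only


def complementary_masks(length, mask_ratio, rng):
    """Return two boolean masks over `length` positions that are exact complements.

    Hypothetical helper: mask_a hides round(length * mask_ratio) random
    positions, and mask_b hides exactly the positions mask_a leaves visible.
    """
    num_masked = round(length * mask_ratio)
    positions = list(range(length))
    rng.shuffle(positions)
    masked = set(positions[:num_masked])
    mask_a = [i in masked for i in range(length)]
    mask_b = [not m for m in mask_a]
    return mask_a, mask_b


def apply_mask(tokens, mask):
    """Replace each token whose mask entry is True with the MASK placeholder."""
    return [MASK if m else t for t, m in zip(tokens, mask)]


# A toy "completion" split into tokens by hand (no real tokenizer involved).
completion = ["def", "add", "(", "a", ",", "b", ")", ":", "return", "a", "+", "b"]

rng = random.Random(0)
mask_a, mask_b = complementary_masks(len(completion), mask_ratio=0.5, rng=rng)
view_a = apply_mask(completion, mask_a)
view_b = apply_mask(completion, mask_b)

print(view_a)
print(view_b)
```

Because the two masks are complements, the pair of views together covers every token of the completion once as a masked position, which is the property the coupled scheme relies on. How the actual method samples the mask ratio and uses the views in the GRPO objective is described in the paper and repository linked below.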

#### More details and usage examples:

- Paper: [DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation](https://arxiv.org/abs/2506.20639)
- GitHub: https://github.com/apple/ml-diffucoder

#### Acknowledgement

To power this Hugging Face model release, we reuse the modeling architecture and generation utilities from [Dream](https://huggingface.co/Dream-org/Dream-v0-Base-7B).