## Configuration
Run `accelerate config` and answer the questionnaire accordingly. Below is an example YAML config for mixed-precision training using DeepSpeed ZeRO Stage-3 with CPU offloading on 8 GPUs.
<pre>
compute_environment: LOCAL_MACHINE
deepspeed_config:
  gradient_accumulation_steps: 1
  gradient_clipping: 1.0
  offload_optimizer_device: cpu
  offload_param_device: cpu
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
machine_rank: 0
main_training_function: main
megatron_lm_config: {}
mixed_precision: fp16
num_machines: 1
num_processes: 8
rdzv_backend: static
same_network: true
use_cpu: false
</pre>
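To double-check what was saved, `accelerate env` prints the environment details along with the currently active default config:

```
accelerate env
```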
## Code changes and launching
<pre>
from accelerate import Accelerator

+ def main():
      accelerator = Accelerator()

      model, optimizer, training_dataloader, scheduler = accelerator.prepare(
          model, optimizer, training_dataloader, scheduler
      )

      for batch in training_dataloader:
          optimizer.zero_grad()
          inputs, targets = batch
          outputs = model(inputs)
          loss = loss_function(outputs, targets)
          accelerator.backward(loss)
          optimizer.step()
          scheduler.step()

      ...

      generated_tokens = accelerator.unwrap_model(model).generate(
          batch["input_ids"],
          attention_mask=batch["attention_mask"],
          **gen_kwargs,
+         synced_gpus=True,  # required for ZeRO Stage 3
      )

      ...

      accelerator.unwrap_model(model).save_pretrained(
          args.output_dir,
          is_main_process=accelerator.is_main_process,
          save_function=accelerator.save,
+         state_dict=accelerator.get_state_dict(model),  # required for ZeRO Stage 3
      )

      ...

+ if __name__ == "__main__":
+     main()
</pre>
Launching a script using the default accelerate config file looks like the following:
```
accelerate launch {script_name.py} {--arg1} {--arg2} ...
```
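If your config lives somewhere other than the default cache location, you can point `accelerate launch` at it explicitly via the `--config_file` flag (the path below is just a placeholder):

```
accelerate launch --config_file path/to/accelerate_config.yaml {script_name.py} {--arg1} {--arg2} ...
```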
Alternatively, you can use `accelerate launch` with the right config params for multi-GPU training, as shown below:
```
accelerate launch \
  --use_deepspeed \
  --num_processes=8 \
  --mixed_precision=fp16 \
  --zero_stage=3 \
  --gradient_accumulation_steps=1 \
  --gradient_clipping=1 \
  --zero3_init_flag=True \
  --zero3_save_16bit_model=True \
  --offload_optimizer_device=cpu \
  --offload_param_device=cpu \
  {script_name.py} {--arg1} {--arg2} ...
```
## Notes
For the core DeepSpeed features supported via the accelerate config file, no code changes are required for ZeRO Stages 1 and 2. For ZeRO Stage-3, transformers' `generate` function requires `synced_gpus=True` and `save_pretrained` requires the `state_dict` param, because model parameters are sharded across the GPUs.
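If you want one script that runs unchanged across ZeRO stages, one option is to detect the stage from the accelerator's DeepSpeed plugin and pass the Stage-3 specific arguments conditionally. Below is a minimal sketch along those lines; it reuses the `accelerator`, `model`, `batch`, `gen_kwargs` and `args` names from the example above.

<pre>
# Sketch: pass the ZeRO Stage-3 specific arguments only when Stage 3 is active.
ds_plugin = getattr(accelerator.state, "deepspeed_plugin", None)
is_zero3 = ds_plugin is not None and ds_plugin.zero_stage == 3

generated_tokens = accelerator.unwrap_model(model).generate(
    batch["input_ids"],
    attention_mask=batch["attention_mask"],
    **gen_kwargs,
    synced_gpus=is_zero3,  # keeps ranks in sync while parameters are sharded
)

accelerator.unwrap_model(model).save_pretrained(
    args.output_dir,
    is_main_process=accelerator.is_main_process,
    save_function=accelerator.save,
    state_dict=accelerator.get_state_dict(model),  # gathers sharded weights; harmless for Stages 1 and 2
)
</pre>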
For advanced users who want granular control via a DeepSpeed config file, this is supported: you can pass the file's location when running the `accelerate config` command. You can also set most of the fields in the DeepSpeed config file to `auto`, and they are filled in automatically from the arguments of the `accelerate launch` command and the `accelerator.prepare` call, which keeps things simple for users. Please refer to the docs on <a href="https://huggingface.co/docs/accelerate/usage_guides/deepspeed#deepspeed-config-file" target="_blank">DeepSpeed Config File</a>.
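For illustration only, a DeepSpeed config file relying on `auto` values might look like the sketch below; the exact fields depend on your setup, and each `auto` entry is filled in from the `accelerate launch` arguments and the `accelerator.prepare` call.

<pre>
{
    "fp16": {
        "enabled": "auto"
    },
    "zero_optimization": {
        "stage": "auto",
        "offload_optimizer": {
            "device": "auto"
        },
        "offload_param": {
            "device": "auto"
        },
        "stage3_gather_16bit_weights_on_model_save": "auto"
    },
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto"
}
</pre>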
## Further reading
To learn more, check out the related documentation:
- <a href="https://huggingface.co/docs/accelerate/usage_guides/deepspeed" target="_blank">How to use DeepSpeed</a>
- <a href="https://huggingface.co/blog/accelerate-deepspeed" target="_blank">Accelerate Large Model Training using DeepSpeed</a>
- <a href="https://huggingface.co/docs/accelerate/package_reference/deepspeed" target="_blank">DeepSpeed Utilities</a>