flash-attention / adamw-zero.yaml
# @package train.optimizer
_target_: torch.distributed.optim.ZeroRedundancyOptimizer
_recursive_: True
optimizer_class:
  _target_: torch.optim.__getattribute__
  _args_:
    - "AdamW"
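The nested `optimizer_class` entry relies on a small trick: its `_target_` is the module's own `__getattribute__`, so instantiating it with the positional argument `"AdamW"` yields the optimizer class itself rather than an optimizer instance, which is what `ZeroRedundancyOptimizer` expects for its `optimizer_class` argument. The following is a minimal sketch of that lookup, not Hydra's actual implementation. The helper name `resolve_attr` is hypothetical, and `collections.OrderedDict` stands in for `torch.optim.AdamW` only so the snippet runs without PyTorch installed.

```python
import collections
import importlib


def resolve_attr(module_name: str, attr_name: str):
    """Import `module_name` and return the attribute `attr_name`, uncalled.

    Hypothetical helper for illustration; it mirrors the effect of a config
    node with `_target_: <module>.__getattribute__` and `_args_: [<name>]`.
    """
    module = importlib.import_module(module_name)
    # Equivalent to getattr(module, attr_name). Spelling it as a method call
    # makes the lookup itself a callable that can serve as a `_target_`.
    return module.__getattribute__(attr_name)


# Stand-in for resolve_attr("torch.optim", "AdamW"), which would return the
# AdamW class (assuming PyTorch is installed).
cls = resolve_attr("collections", "OrderedDict")
print(cls is collections.OrderedDict)  # prints True
```

Because the result is a class and not an instance, the outer optimizer can construct its own local optimizer on each rank from the parameters and keyword arguments it is given.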