'save_total_limit' not respected
#14 by enricoburi · opened
Hello,
I am fine-tuning this model for a classification task using the Trainer class.
Among the training arguments, I am setting 'save_total_limit' to 1.
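Roughly, my setup looks like this (a simplified sketch; the model id, the dataset variables, and the hyperparameter values are placeholders rather than my exact code):

```python
from transformers import (
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

# Placeholders: model_id stands for the model from this repo, and
# train_ds / eval_ds are my already-tokenized classification datasets.
model_id = "<model-from-this-repo>"
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

training_args = TrainingArguments(
    output_dir="./results",
    save_strategy="epoch",   # write a checkpoint at the end of every epoch
    save_total_limit=1,      # I expect older checkpoints to be deleted, keeping only the latest
    num_train_epochs=3,
    per_device_train_batch_size=16,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
)
trainer.train()
```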
I am using the exact same code I have previously used for 'bert-base-uncased' and 'ModernBERT-base', where everything works as expected.
However, in this case the parameter in question seems to be completely ignored: as training progresses, no checkpoint ever gets deleted. Given the size of the model, this becomes problematic pretty quickly.
Any idea what this might be related to?
I am working in JupyterLab on AWS SageMaker Studio and using transformers 4.51.3.
Keep up the great work!