'save_total_limit' not respected

#14
by enricoburi - opened

Hello,

I am fine-tuning this model for a classification task using the Trainer class.

Among the trainer arguments, I am setting `save_total_limit=1`.
I am using the exact same code I have been using for `bert-base-uncased` and `ModernBERT-base`, with which everything works as expected.
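For reference, here is a minimal sketch of the kind of setup I mean; the model id, output directory, save strategy, and datasets are placeholders rather than my exact code, but `save_total_limit=1` is the setting in question:

```python
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Placeholder values below (model id, output dir, strategy, datasets)
# stand in for my actual configuration.
model_id = "answerdotai/ModernBERT-base"  # one of the models where this works fine
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

training_args = TrainingArguments(
    output_dir="./checkpoints",
    save_strategy="epoch",
    save_total_limit=1,  # I expect only the most recent checkpoint to be kept
)

trainer = Trainer(
    model=model,
    args=training_args,
    # train_dataset=..., eval_dataset=...  (omitted here)
)
# trainer.train()
```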

However, in this case, the parameter in question seems to be completely ignored: as training progresses, no checkpoint is ever deleted. Given the size of the model, this becomes problematic pretty quickly.
Any idea what this might be related to?

I am working in JupyterLab on AWS SageMaker Studio and using transformers 4.51.3.

Keep up the great work!
