
Li Zhang · Andcircle

AI & ML interests

None yet

Recent Activity

upvoted an article 3 days ago
Understanding InstaFlow/Rectified Flow
upvoted an article 7 months ago
Recoloring photos with diffusers
reacted to BramVanroy's post with 👍 about 1 year ago
Does anyone have experience with finetuning Gemma? Even the 2B variant feels more memory-heavy than Mistral 7B. I know that its vocabulary is much larger (250k), but I'm a bit surprised that the max batch size I can get on an A100 80GB is only 2, whereas I could fit 4 with Mistral 7B, even though Gemma is much smaller except for the embedding layer. Both runs used FA, the same sequence length, and the same DeepSpeed ZeRO-3 settings. Oh, and yes, I'm using the most recent hotfix of transformers that solves a memory issue with Gemma and others. Any prior experience you can share, or suggestions to improve throughput?
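For context, the kind of setup the quoted post describes can be sketched with the Transformers Trainer. This is a minimal, illustrative sketch only, not the poster's actual script: the model name google/gemma-2b is real, but the wikitext dataset, the ds_zero3.json config path, and all hyperparameters are assumptions filled in for the example.

```python
# Minimal sketch: full finetuning of Gemma 2B with Flash Attention 2 and
# DeepSpeed ZeRO-3, launched with e.g. `deepspeed train_gemma.py`.
# Dataset, ZeRO-3 config path, and hyperparameters are placeholders.
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "google/gemma-2b"  # the 2B variant discussed in the post

# With ZeRO-3, create TrainingArguments before loading the model so weights
# are sharded across ranks at load time instead of materialized in full.
args = TrainingArguments(
    output_dir="gemma-2b-finetune",
    per_device_train_batch_size=2,   # the max batch size reported on an A100 80GB
    gradient_accumulation_steps=8,
    bf16=True,
    deepspeed="ds_zero3.json",       # hypothetical ZeRO-3 config file
    logging_steps=10,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # "FA" in the post
)

# Placeholder corpus; swap in the real finetuning data.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

# Causal-LM collator copies input_ids into labels so the Trainer has a loss.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```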

Organizations

None yet

Andcircle's datasets

None public yet