
Li Zhang

Andcircle

AI & ML interests

None yet

Recent Activity

upvoted an article 3 days ago
Understanding InstaFlow/Rectified Flow
upvoted an article 7 months ago
Recoloring photos with diffusers
reacted to BramVanroy's post with 👍 about 1 year ago
Does anyone have experience with finetuning Gemma? Even the 2B variant feels more memory-heavy than Mistral 7B. I know that its vocabulary is much larger (250k), but I'm a bit surprised that the max batch size I can fit on an A100 80GB is only 2, whereas I could fit 4 with Mistral 7B, even though Gemma is much smaller except for the embedding layer. Both runs used FlashAttention, the same sequence length, and the same DeepSpeed ZeRO-3 settings. And yes, I'm using the most recent hotfix of transformers that solves a memory issue with Gemma and others. Any prior experience you can share, or suggestions to improve throughput?
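The memory gap described in the post is largely explained by the embedding matrix. A rough back-of-envelope sketch (the ~250k vocabulary figure is from the post; the hidden sizes and 32k Mistral vocabulary are assumptions for illustration, as are the bf16-weights-plus-Adam-fp32-states byte counts):

```python
# Back-of-envelope: why a large vocabulary makes the embedding layer
# dominate finetuning memory. Hidden sizes (Gemma 2B ~2048, Mistral 7B
# ~4096) and the Mistral 32k vocab are assumptions, not from the post.

def embedding_params(vocab_size: int, hidden_size: int) -> int:
    """Parameter count of a token-embedding matrix (vocab x hidden)."""
    return vocab_size * hidden_size

def training_bytes(params: int, weight_bytes: int = 2,
                   optimizer_bytes: int = 12) -> int:
    """Approximate bytes per layer when training: bf16 weights (~2 B/param)
    plus Adam states (fp32 master copy + two moments, ~12 B/param)."""
    return params * (weight_bytes + optimizer_bytes)

gemma_emb = embedding_params(250_000, 2048)    # 512_000_000 params
mistral_emb = embedding_params(32_000, 4096)   # 131_072_000 params

print(f"Gemma embedding:   {gemma_emb / 1e6:.0f}M params, "
      f"~{training_bytes(gemma_emb) / 2**30:.1f} GiB with Adam states")
print(f"Mistral embedding: {mistral_emb / 1e6:.0f}M params, "
      f"~{training_bytes(mistral_emb) / 2**30:.1f} GiB with Adam states")
```

Under these assumptions the embedding alone costs roughly 4x more for Gemma, before activations and logits (which also scale with vocabulary size) are counted; ZeRO-3 shards the optimizer states across GPUs, so the per-GPU hit on a single A100 is smaller but the ratio holds.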

Organizations

None yet

Collections 1

Multi-Modality
  • Emerging Properties in Unified Multimodal Pretraining

    Paper • 2505.14683 • Published May 20 • 130
models 0

None public yet

datasets 0

None public yet