Apply for community grant: Academic project (gpu)


Dear Hugging Face Team,

Emotion-LLaMA is our multimodal large language model for emotion recognition and reasoning. It has been accepted to NeurIPS 2024 and is the first multimodal large model in the field of affective computing.

We previously deployed Emotion-LLaMA on Hugging Face Spaces, but due to limited research funding, we had to suspend the service in early 2025.

The model has received wide recognition from the community, with over 300 stars on GitHub and nearly 2,000 visits to our Hugging Face Space. We are now hoping to bring it back online to further benefit the open science and AI research community.

We would greatly appreciate GPU support; specifically, an Nvidia A10G (small) instance would be sufficient to host our model effectively.

Thank you very much for your support and for enabling open-source innovation.

Best regards,
Zebang Cheng

Hi @ZebangCheng , we've assigned ZeroGPU to this Space. Please check the compatibility and usage sections of this page so your Space can run on ZeroGPU.
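
For reference, the usual ZeroGPU pattern is to decorate each GPU-bound function with the `@spaces.GPU` decorator so hardware is only attached while that function runs. A minimal sketch, assuming a simple Gradio app; the `predict` function and its body are placeholders, not taken from the Emotion-LLaMA code:

```python
import gradio as gr
import spaces  # available in ZeroGPU Spaces


@spaces.GPU  # a GPU is allocated only while this function executes
def predict(prompt: str) -> str:
    # Placeholder for the real inference call; the actual Space would run
    # Emotion-LLaMA generation here.
    return f"Echo: {prompt}"


demo = gr.Interface(fn=predict, inputs="text", outputs="text")
demo.launch()
```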

Hi @hysts ,

Thank you so much for your quick reply and for assigning ZeroGPU to our Space! We truly appreciate it.

While trying to get the Space running, we have consistently encountered an initialization bug that prevents the service from starting up correctly. We have already reviewed the compatibility and usage documentation you pointed us to, but unfortunately the issue persists.

We were wondering if you might have a moment to take a look and see what could be causing this bug?

Thank you again for your time and support!

Best regards,
Zebang Cheng

@ZebangCheng Sure, I'll take a look and let you know if I find something.

Hi @ZebangCheng
I've looked into it and noticed a few things:

  1. This code is problematic. In Spaces, apps are supposed to run on port 7860 (see the sketch after this list).

  2. On a separate note, this might not be needed. Even if login is actually required, setting HF_TOKEN as a secret in your Space settings logs you in automatically, so you don't need to call login() manually; adding the environment variable is enough.

  3. Regarding the gradio version in Spaces, it's controlled by sdk_version in the README.md, not by requirements.txt, so even if you pin a version in requirements.txt, it's overridden by the one in README.md.
    https://huggingface.co/spaces/ZebangCheng/Emotion-LLaMA/blob/fbbaf0cb00765fefc37985cc4001afe12698696f/README.md?code=true#L7
    Also, with the latest gradio==5.34.0, matplotlib no longer appears to be among its dependencies, but your app seems to depend on it, so you need to add matplotlib to requirements.txt.
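
To make points 1 to 3 concrete, here is a minimal sketch of what the relevant pieces could look like; the contents below are illustrative and not copied from the actual Space:

```python
# app.py (sketch)
import os

import gradio as gr

# Point 2: with HF_TOKEN set as a secret in the Space settings, the runtime
# logs in automatically; reading the variable is enough if you need the token
# value directly.
hf_token = os.environ.get("HF_TOKEN")

demo = gr.Interface(fn=lambda text: text, inputs="text", outputs="text")

# Point 1: no custom server_port here; Spaces expects the app on the default
# port 7860.
demo.launch()
```

For point 3, the Gradio version is pinned via `sdk_version` in the README front matter (e.g. `sdk_version: 5.34.0`), and `matplotlib` is listed explicitly in `requirements.txt` since recent Gradio releases no longer pull it in.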

After these changes, the Space can at least launch properly on ZeroGPU.

That said, there still seems to be an issue. When I try to run the example, the following cuBLAS error appears.

cuBLAS API failed with status 15
error detected
A: torch.Size([293, 4096]), B: torch.Size([4096, 4096]), C: (293, 4096); (lda, ldb, ldc): (c_int(9376), c_int(131072), c_int(9376)); (m, n, k): (c_int(293), c_int(4096), c_int(4096))
Exception in thread Thread-11 (model_generate):
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/home/user/app/minigpt4/conversation/conversation.py", line 247, in model_generate
    output = self.model.llama_model.generate(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/peft/peft_model.py", line 580, in generate
    return self.base_model.generate(**kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/transformers/generation/utils.py", line 1572, in generate
    return self.sample(
  File "/usr/local/lib/python3.10/site-packages/transformers/generation/utils.py", line 2619, in sample
    outputs = self(
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/accelerate/hooks.py", line 164, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 688, in forward
    outputs = self.model(
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/accelerate/hooks.py", line 164, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 578, in forward
    layer_outputs = decoder_layer(
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/accelerate/hooks.py", line 164, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 292, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/accelerate/hooks.py", line 164, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 194, in forward
    query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/peft/tuners/lora.py", line 502, in forward
    result = super().forward(x)
  File "/usr/local/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 441, in forward
    out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
  File "/usr/local/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 563, in matmul
    return MatMul8bitLt.apply(A, B, out, bias, state)
  File "/usr/local/lib/python3.10/site-packages/torch/autograd/function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/usr/local/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 401, in forward
    out32, Sout32 = F.igemmlt(C32A, state.CxB, SA, state.SB)
  File "/usr/local/lib/python3.10/site-packages/bitsandbytes/functional.py", line 1792, in igemmlt
    raise Exception('cublasLt ran into an error!')
Exception: cublasLt ran into an error!

I'm not sure what this is, but I think it's probably something you can fix on your side.
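
Not a definitive diagnosis, but the traceback goes through bitsandbytes' 8-bit matmul (MatMul8bitLt via peft), so one thing worth trying is loading the base LLaMA weights without 8-bit quantization to see whether that code path is the problem. A minimal sketch, assuming the model is currently loaded with `load_in_8bit=True`; the checkpoint name is a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM

# Sketch: load the base weights in fp16 instead of bitsandbytes 8-bit, so
# generation avoids the MatMul8bitLt / cublasLt path from the traceback above.
# "meta-llama/Llama-2-7b-hf" is a placeholder for the actual base checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.float16,
    device_map="auto",
    # load_in_8bit=True,  # the 8-bit path implicated in the error above
)
```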

That's it for now. Hope this helps!
