Apply for community grant: Academic project (gpu)
Dear Hugging Face Team,
Emotion-LLaMA is our multimodal large language model developed for emotion recognition and reasoning. It represents a key academic contribution and has been accepted to NeurIPS 2024, making it the first multimodal large model in the field of affective computing.
We previously deployed Emotion-LLaMA on Hugging Face Spaces, but due to limited research funding, we had to suspend the service in early 2025.
The model has received wide recognition from the community, with over 300 stars on GitHub and nearly 2,000 visits to our Hugging Face Space. We are now hoping to bring it back online to further benefit the open science and AI research community.
We would greatly appreciate GPU support—specifically an Nvidia A10G (small) instance would be sufficient to host our model effectively.
Thank you very much for your support and for enabling open-source innovation.
Best regards,
Zebang Cheng
Hi @ZebangCheng , we've assigned ZeroGPU to this Space. Please check the compatibility and usage sections of this page so your Space can run on ZeroGPU.
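For reference, the core ZeroGPU requirement is that all CUDA work happens inside functions decorated with `@spaces.GPU`; here is a minimal sketch (not specific to Emotion-LLaMA) of that pattern:

```python
import spaces
import torch

@spaces.GPU  # on ZeroGPU, CUDA is only attached while this function is running
def predict(prompt: str) -> str:
    # All GPU work (model loading to CUDA, generation, etc.) must happen here.
    x = torch.ones(1, device="cuda")
    return f"{prompt}: ran on {x.device}"
```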
Hi @hysts ,
Thank you so much for your quick reply and for assigning ZeroGPU to our Space! We truly appreciate it.
While trying to get the Space running, we have consistently run into an initialization bug that prevents the service from starting up correctly. We have already reviewed the compatibility and usage documentation you pointed us to, but unfortunately the issue persists.
We were wondering if you might have a moment to take a look and see what could be causing this bug?
Thank you again for your time and support!
Best regards,
Zebang Cheng
Hi @ZebangCheng ,
I've looked into it and noticed a few things:
This code is problematic. In Spaces, apps are supposed to run on port 7860.
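As a minimal sketch (assuming the app currently hard-codes a different port in its launch call), letting Gradio use its default port is enough:

```python
import gradio as gr

with gr.Blocks() as demo:
    gr.Markdown("Emotion-LLaMA demo")  # placeholder UI for illustration

# On Spaces, apps must listen on port 7860, which is Gradio's default,
# so avoid passing a hard-coded server_port here.
demo.launch()
```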
On a separate note, this might not be needed. FYI, even if login is actually needed, if you set `HF_TOKEN` as a secret in your Space Settings, it logs in automatically. So you don't need to call `login()` manually, and just adding the env variable is enough.
Regarding the gradio version in Spaces, it's controlled by `sdk_version` in the `README.md`, not by `requirements.txt`. So even if you set a version in `requirements.txt`, it's overridden by the one in `README.md`.
https://huggingface.co/spaces/ZebangCheng/Emotion-LLaMA/blob/fbbaf0cb00765fefc37985cc4001afe12698696f/README.md?code=true#L7
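As a quick illustration of the `HF_TOKEN` point, a minimal sketch (assuming a recent `huggingface_hub`):

```python
import os
from huggingface_hub import whoami

# With HF_TOKEN added as a secret in the Space Settings, it is exposed as an
# environment variable and huggingface_hub picks it up automatically, so no
# explicit login() call is required anywhere in the app.
if os.environ.get("HF_TOKEN"):
    print("Authenticated as:", whoami()["name"])
else:
    print("HF_TOKEN secret not set; only public repos will be accessible.")
```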
Also, when using the latest `gradio==5.34.0`, it looks like `matplotlib` is no longer among its dependencies, but your app seems to depend on it, so you need to add `matplotlib` to `requirements.txt`.
After these changes, the Space can at least launch properly on ZeroGPU.
That said, there still seems to be an issue. When I try to run the example, the following cuBLAS error appears.
cuBLAS API failed with status 15
error detectedA: torch.Size([293, 4096]), B: torch.Size([4096, 4096]), C: (293, 4096); (lda, ldb, ldc): (c_int(9376), c_int(131072), c_int(9376)); (m, n, k): (c_int(293), c_int(4096), c_int(4096))
Exception in thread Thread-11 (model_generate):
Traceback (most recent call last):
File "/usr/local/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/usr/local/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/home/user/app/minigpt4/conversation/conversation.py", line 247, in model_generate
output = self.model.llama_model.generate(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/peft/peft_model.py", line 580, in generate
return self.base_model.generate(**kwargs)
File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/transformers/generation/utils.py", line 1572, in generate
return self.sample(
File "/usr/local/lib/python3.10/site-packages/transformers/generation/utils.py", line 2619, in sample
outputs = self(
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/accelerate/hooks.py", line 164, in new_forward
output = module._old_forward(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 688, in forward
outputs = self.model(
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/accelerate/hooks.py", line 164, in new_forward
output = module._old_forward(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 578, in forward
layer_outputs = decoder_layer(
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/accelerate/hooks.py", line 164, in new_forward
output = module._old_forward(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 292, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/accelerate/hooks.py", line 164, in new_forward
output = module._old_forward(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 194, in forward
query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/peft/tuners/lora.py", line 502, in forward
result = super().forward(x)
File "/usr/local/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 441, in forward
out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
File "/usr/local/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 563, in matmul
return MatMul8bitLt.apply(A, B, out, bias, state)
File "/usr/local/lib/python3.10/site-packages/torch/autograd/function.py", line 539, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/usr/local/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 401, in forward
out32, Sout32 = F.igemmlt(C32A, state.CxB, SA, state.SB)
File "/usr/local/lib/python3.10/site-packages/bitsandbytes/functional.py", line 1792, in igemmlt
raise Exception('cublasLt ran into an error!')
Exception: cublasLt ran into an error!
I'm not sure what this is, but I think it's probably something you can fix on your side.
That's it for now. Hope this helps!