onekq posted an update Apr 21

It works on llama.cpp.

Here is how you can run it:

llama-server -ngl 999 --host 192.168.1.68 \
  --override-kv glm4.rope.dimension_count=int:64 \
  --override-kv tokenizer.ggml.eos_token_id=int:151336 \
  -m /mnt/nvme0n1/LLM/quantized/GLM-4-9B-0414-Q8_0.gguf
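A minimal sketch of what the command is doing: the two --override-kv flags patch GGUF metadata at load time, forcing the RoPE dimension count and the EOS token id the model expects. The snippet below just assembles that same invocation programmatically so each piece is labeled; the model path and host are the ones from the command above, not universal values.

```python
# Sketch: build the llama-server invocation from its parts.
# Path and host are assumptions taken from the post above.
model = "/mnt/nvme0n1/LLM/quantized/GLM-4-9B-0414-Q8_0.gguf"

# Metadata overrides applied at load time (key=type:value syntax):
overrides = {
    "glm4.rope.dimension_count": "int:64",        # partial RoPE dims for GLM-4
    "tokenizer.ggml.eos_token_id": "int:151336",  # correct EOS token id
}

cmd = ["llama-server", "-ngl", "999", "--host", "192.168.1.68", "-m", model]
for key, value in overrides.items():
    cmd += ["--override-kv", f"{key}={value}"]

print(" ".join(cmd))
```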

Read here why:

Eval bug: GLM-Z1-9B-0414 · Issue #12946 · ggml-org/llama.cpp:
https://github.com/ggml-org/llama.cpp/issues/12946#issuecomment-2803564782


Ah, I see. They have their own architecture.

https://github.com/huggingface/transformers/pull/37388

This will be hard.
