Make model config compatible with Hugging Face MiniMax implementation
This PR updates the model configuration to be compatible with the upstream Hugging Face transformers implementation of the MiniMax architecture, introduced in transformers#35831. With these changes, the model can now be loaded without relying on `trust_remote_code=True`.

This enables easier and safer usage via the standard `AutoModelForCausalLM` interface:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer from the "-hf" repo; revision refs/pr/39 points
# at this PR's config changes, so trust_remote_code is no longer needed.
model = AutoModelForCausalLM.from_pretrained(
    "MiniMaxAI/MiniMax-Text-01-hf",
    revision="refs/pr/39",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
    "MiniMaxAI/MiniMax-Text-01-hf",
    revision="refs/pr/39",
)

prompt = "Hello!"
messages = [
    {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant created by MiniMax based on MiniMax-Text-01 model."}]},
    {"role": "user", "content": [{"type": "text", "text": prompt}]},
]

# Build the prompt with the chat template and generate a reply.
model_inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to("cuda")
generated_ids = model.generate(model_inputs, max_new_tokens=100, do_sample=True)
print(tokenizer.batch_decode(generated_ids)[0])
```
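As a quick sanity check, the config on that revision should also resolve through `AutoConfig` without any custom code (a minimal sketch; it only inspects which config class transformers picks up):

```python
from transformers import AutoConfig

# With the compatible config, this resolves to the native transformers MiniMax
# config class and does not require trust_remote_code=True.
config = AutoConfig.from_pretrained("MiniMaxAI/MiniMax-Text-01-hf", revision="refs/pr/39")
print(type(config))
```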
Hi @geetu040, thanks a lot for opening the PR to add Transformers support! We've hit a couple of config-parameter clashes in the main repo, so we're spinning up a dedicated HF version for this integration. Could you please retarget your PR to MiniMax-Text-01-hf instead? Appreciate your help!
FYI, this is a breaking change for existing vLLM users who loaded this model in the old format: https://github.com/vllm-project/vllm/issues/20198
We may want to keep the old format around and just point HF users to the new "-hf" version.
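If the old format does stay around, existing vLLM users could keep targeting the original repo and pin a pre-change revision, roughly along these lines (a sketch only; the revision value is a placeholder, and whether `trust_remote_code` is still needed depends on how the old-format repo is loaded):

```python
from vllm import LLM, SamplingParams

# Keep using the original MiniMax-Text-01 repo in its old config format.
# The revision pin is a placeholder -- substitute a commit hash from before
# any config change. trust_remote_code=True is an assumption here; drop it
# if the old-format repo loads without custom code in vLLM.
llm = LLM(
    model="MiniMaxAI/MiniMax-Text-01",
    revision="<pre-change-commit>",
    trust_remote_code=True,
)

outputs = llm.generate(["Hello!"], SamplingParams(max_tokens=100))
print(outputs[0].outputs[0].text)
```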