---
library_name: transformers
tags:
  - custom_generate
---

## Description

Test repo for experimenting with calling `generate` from the Hub. It holds a simplified implementation of greedy decoding.
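
For intuition, here is a minimal sketch of greedy decoding, assuming the repo's custom generation code does something along these lines; `greedy_decode` and its parameters are illustrative names, not this repo's actual API:

```py
import torch

@torch.no_grad()
def greedy_decode(model, input_ids, max_new_tokens=20, eos_token_id=None):
    """Illustrative greedy loop: always pick the highest-scoring next token."""
    for _ in range(max_new_tokens):
        logits = model(input_ids=input_ids).logits
        # Greedy step: argmax over the vocabulary at the last position.
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_token], dim=-1)
        if eos_token_id is not None and (next_token == eos_token_id).all():
            break
    return input_ids
```

Because greedy decoding always selects the single most likely next token, the same prompt always yields the same continuation.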

## Base model

[Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct)

## Model compatibility

Most models. More specifically, any transformer LLM/VLM trained for causal language modeling.

## Additional Arguments

- `left_padding` (`int`, *optional*): number of padding tokens to add before the provided input

## Output Type changes

(none)

## Example usage

```py
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct", device_map="auto")

inputs = tokenizer(["The quick brown"], return_tensors="pt").to(model.device)
gen_out = model.generate(
    **inputs,
    left_padding=5,
    custom_generate="transformers-community/custom_generate_example",
    trust_remote_code=True,
)
print(tokenizer.batch_decode(gen_out, skip_special_tokens=True))
```
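
The `left_padding=5` argument above is assumed to be roughly equivalent to prepending five pad tokens to the input by hand, as in this sketch (reusing `tokenizer`, `model`, and `inputs` from the example):

```py
import torch

# Hypothetical manual equivalent of left_padding=5: prepend pad tokens
# (falling back to the EOS token when no pad token is defined).
pad_token_id = tokenizer.pad_token_id or tokenizer.eos_token_id
pad = torch.full(
    (inputs.input_ids.shape[0], 5),
    pad_token_id,
    dtype=inputs.input_ids.dtype,
    device=model.device,
)
padded_input_ids = torch.cat([pad, inputs.input_ids], dim=-1)
# A full implementation would also extend inputs.attention_mask with zeros
# so the model ignores the padding positions.
```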