---
library_name: transformers
pipeline_tag: text-generation
inference: true
widget:
- text: Hello!
  example_title: Hello world
  group: Python
base_model:
- openai/gpt-oss-120b
---
This tiny model is intended for debugging. It is randomly initialized, using a config adapted from [openai/gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b).
Note: the model is stored in BF16; the MXFP4-quantized FFN of the original checkpoint is not used.
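A minimal sanity-check sketch, assuming the checkpoint is loadable locally; the expected values follow from the creation config further down and are expectations, not guarantees:
```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "tiny-random/gpt-oss"
config = AutoConfig.from_pretrained(model_id)
print(config.hidden_size, config.num_hidden_layers)  # expected: 32 2

# torch_dtype="auto" keeps the dtype stored in the checkpoint (BF16 here).
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
print(next(model.parameters()).dtype)  # expected: torch.bfloat16
```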
### Example usage:
- vLLM
```bash
vllm serve tiny-random/gpt-oss
```
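  Once the server is running, it exposes an OpenAI-compatible API. A minimal query sketch, assuming the default local endpoint `http://localhost:8000/v1` (port and client usage are assumptions, not part of this repo):
```python
from openai import OpenAI

# Assumes `vllm serve tiny-random/gpt-oss` is running on the default port 8000.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="tiny-random/gpt-oss",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=16,
)
print(response.choices[0].message.content)
```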
- Transformers
```python
import torch
from transformers import pipeline
model_id = "tiny-random/gpt-oss"
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)
messages = [
    {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
]
outputs = pipe(
    messages,
    max_new_tokens=16,
)
print(outputs[0]["generated_text"][-1])
```
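  An equivalent lower-level sketch without the `pipeline` helper, using the tokenizer's chat template directly (the generation settings here are illustrative):
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiny-random/gpt-oss"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="cuda"
)

messages = [
    {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=16)
# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:]))
```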
### Code to create this repo:
```python
import json
import torch
from huggingface_hub import hf_hub_download
from transformers import (
    AutoConfig,
    AutoModelForCausalLM,
    AutoProcessor,
    AutoTokenizer,
    GenerationConfig,
    GptOssForCausalLM,
    pipeline,
    set_seed,
)
source_model_id = "openai/gpt-oss-120b"
save_folder = "/tmp/tiny-random/gpt-oss"
processor = AutoProcessor.from_pretrained(source_model_id)
processor.save_pretrained(save_folder)
with open(hf_hub_download(source_model_id, filename='config.json', repo_type='model'), 'r') as f:
    config_json = json.load(f)
config_json.update({
    "head_dim": 32,
    "hidden_size": 32,  # required by the Mxfp4GptOssExperts code
    "intermediate_size": 64,
    "layer_types": ["sliding_attention", "full_attention"],
    "num_attention_heads": 2,
    "num_hidden_layers": 2,
    "num_key_value_heads": 1,
    "num_local_experts": 32,
    "tie_word_embeddings": True,
})
quantization_config = config_json['quantization_config']
del config_json['quantization_config']
with open(f"{save_folder}/config.json", "w", encoding='utf-8') as f:
    json.dump(config_json, f, indent=2)
config = AutoConfig.from_pretrained(save_folder)
print(config)
torch.set_default_dtype(torch.bfloat16)
model = AutoModelForCausalLM.from_config(config)
torch.set_default_dtype(torch.float32)
model.generation_config = GenerationConfig.from_pretrained(
    source_model_id, trust_remote_code=True,
)
set_seed(42)
with torch.no_grad():
    for name, p in sorted(model.named_parameters()):
        torch.nn.init.normal_(p, 0, 0.1)
        print(name, p.shape)
model.save_pretrained(save_folder)
# Optional MXFP4 re-quantization (left disabled; this repo ships plain BF16 weights):
from transformers.quantizers.quantizer_mxfp4 import Mxfp4HfQuantizer
# model = AutoModelForCausalLM.from_pretrained(save_folder, trust_remote_code=True, torch_dtype=torch.bfloat16, quantization_config=quantization_config)
# model.save_pretrained(save_folder, safe_serialization=True)
```
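A small follow-up check (not part of the creation script above) that reloads the saved folder and runs one forward pass to confirm the tiny checkpoint is usable; the prompt is illustrative:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

save_folder = "/tmp/tiny-random/gpt-oss"
tokenizer = AutoTokenizer.from_pretrained(save_folder)
model = AutoModelForCausalLM.from_pretrained(save_folder, torch_dtype=torch.bfloat16)

inputs = tokenizer("Hello!", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.shape)  # (1, sequence_length, vocab_size)
```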