---
library_name: transformers
pipeline_tag: text-generation
---

# wasm-32B-Instruct-V1

**wasm-32B-Instruct-V1** is a state-of-the-art instruction-tuned large language model developed by [wasmdashai](https://huggingface.co/wasmdashai). With 32 billion parameters, it is designed to deliver high-quality performance across a wide range of natural language processing and code-related tasks.

## πŸš€ Introduction

`wasm-32B-Instruct-V1` is built for instruction-following tasks and general-purpose reasoning. It uses a transformer architecture optimized for large-scale generation tasks, including:

* 🧠 Code generation and debugging
* πŸ“š Long-context understanding
* πŸ—£οΈ Multi-turn dialogue and reasoning
* πŸ” Privacy-conscious edge deployments (e.g., via WebAssembly)

This model is fine-tuned on diverse instruction datasets and optimized for both human alignment and computational efficiency.

## πŸ—οΈ Model Details

* **Type**: Causal Language Model (Decoder-only)
* **Parameters**: 32 Billion
* **Training**: Pretraining + Instruction Fine-tuning
* **Architecture**: Transformer with:

  * Rotary Position Embeddings (RoPE)
  * SwiGLU activation
  * RMSNorm
  * Attention with QKV bias
* **Context Length**: Up to **32,768** tokens
* **Extended Context Option**: Via `rope_scaling` (supports up to 128K with YaRN)
* **Format**: Hugging Face Transformers-compatible
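
As a quick sanity check, the advertised context length and RoPE settings can be read from the model configuration without downloading the weights. The snippet below is a minimal sketch and assumes the config uses the standard Transformers field names (`max_position_embeddings`, `rope_scaling`):

```python
from transformers import AutoConfig

# Load only the configuration (no weights) to inspect architecture settings.
config = AutoConfig.from_pretrained("wasmdashai/wasm-32B-Instruct-V1")

print(config.model_type)                      # architecture family
print(config.max_position_embeddings)         # native context length (expected: 32768)
print(getattr(config, "rope_scaling", None))  # None unless extended context is enabled
```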

## βš™οΈ Requirements

To use this model, install the latest version of πŸ€— `transformers` (>= 4.37.0 recommended):

```bash
pip install --upgrade transformers
```
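
To confirm the installed version meets the recommendation, a quick check is:

```python
import transformers

# Should print 4.37.0 or newer for the examples in this card.
print(transformers.__version__)
```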

## πŸ§ͺ Quickstart

Here is a minimal example to load the model and generate a response:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "wasmdashai/wasm-32B-Instruct-V1"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

prompt = "Explain the concept of recursion with Python code."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(response)
```
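
Because this is an instruction-tuned model, you may get better results by formatting requests with the tokenizer's chat template, assuming one is bundled with the repository. The following sketch reuses the `tokenizer` and `model` objects created above:

```python
# Build a chat-style prompt; this assumes the tokenizer ships a chat template.
messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Explain the concept of recursion with Python code."},
]

chat_prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # append the assistant turn marker
)

inputs = tokenizer(chat_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, not the prompt itself.
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
)
print(response)
```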

## 🧩 Processing Long Texts

This model supports **context lengths up to 32,768 tokens**. For even longer inputs, you can enable **YaRN** scaling by modifying the `config.json` as follows:

```json
{
  "rope_scaling": {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
  }
}
```

This is ideal for handling documents, logs, or multi-step reasoning tasks that exceed standard limits.
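
If you prefer not to edit `config.json` on disk, the same scaling can typically be applied at load time by overriding the configuration in code. This is a sketch under the assumption that the model honors the standard `rope_scaling` config field shown above:

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_name = "wasmdashai/wasm-32B-Instruct-V1"

# Override the RoPE scaling settings in memory instead of editing config.json.
config = AutoConfig.from_pretrained(model_name)
config.rope_scaling = {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=config,
    torch_dtype="auto",
    device_map="auto",
)
```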

## πŸ“¦ Deployment Notes

We recommend using `vLLM` for efficient deployment, especially with large input lengths or real-time serving needs. Please note:

* `vLLM` currently supports static YaRN only.
* Avoid applying rope scaling unless necessary for long-context tasks, as it may impact performance on short inputs.
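
As an illustration only (not an official serving recipe), a minimal offline-inference sketch with vLLM's Python API might look like the following, assuming vLLM supports this model architecture:

```python
from vllm import LLM, SamplingParams

# max_model_len caps the context per request; lower it if GPU memory is tight.
llm = LLM(model="wasmdashai/wasm-32B-Instruct-V1", max_model_len=32768)

sampling = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=512)

outputs = llm.generate(
    ["Explain the concept of recursion with Python code."],
    sampling,
)
print(outputs[0].outputs[0].text)
```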

## πŸ“¬ Contact

For support, feedback, or collaboration inquiries, please contact:

πŸ“§ **[[email protected]](mailto:[email protected])**

---