---
library_name: transformers
pipeline_tag: text-generation
---
# wasm-32B-Instruct-V1
**wasm-32B-Instruct-V1** is a state-of-the-art instruction-tuned large language model developed by [wasmdashai](https://huggingface.co/wasmdashai). With 32 billion parameters, this model is designed to deliver high-quality performance across a wide range of natural language processing and code-related tasks.
## Introduction
`wasm-32B-Instruct-V1` is built for instruction-following tasks and general-purpose reasoning. It leverages a powerful transformer architecture with optimized performance for large-scale generation tasks including:
* Code generation and debugging
* Long-context understanding
* Multi-turn dialogue and reasoning
* Privacy-conscious edge deployments (e.g., via WebAssembly)
This model is fine-tuned on diverse instruction datasets and optimized for both human alignment and computational efficiency.
## Model Details
* **Type**: Causal Language Model (Decoder-only)
* **Parameters**: 32 Billion
* **Training**: Pretraining + Instruction Fine-tuning
* **Architecture**: Transformer with:
* Rotary Position Embeddings (RoPE)
* SwiGLU activation
* RMSNorm
* Attention with QKV bias
* **Context Length**: Up to **32,768** tokens
* **Extended Context Option**: Via `rope_scaling` (supports up to 128K with YaRN)
* **Format**: Hugging Face Transformers-compatible
## Requirements
To use this model, install the latest version of Hugging Face `transformers` (>= 4.37.0 recommended):
```bash
pip install --upgrade transformers
```
## Quickstart
Here is a minimal example to load the model and generate a response:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
model_name = "wasmdashai/wasm-32B-Instruct-V1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
model_name,
torch_dtype="auto",
device_map="auto"
)
prompt = "Explain the concept of recursion with Python code."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
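Because this is an instruction-tuned model, multi-turn conversations are usually formatted with the tokenizer's chat template rather than a raw prompt. The snippet below is a minimal sketch that assumes the repository ships a chat template usable via `tokenizer.apply_chat_template`; the system prompt and generation settings are placeholders to adjust for your use case.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "wasmdashai/wasm-32B-Instruct-V1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

# Multi-turn conversation in the standard role/content chat format.
messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."},
]

# Render the conversation with the model's chat template (assumed to be bundled
# with the tokenizer) and append the generation prompt for the assistant turn.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
reply = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(reply)
```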
## Processing Long Texts
This model supports **context lengths up to 32,768 tokens**. For even longer inputs, you can enable **YaRN** scaling by modifying the `config.json` as follows:
```json
{
"rope_scaling": {
"type": "yarn",
"factor": 4.0,
"original_max_position_embeddings": 32768
}
}
```
This is ideal for handling documents, logs, or multi-step reasoning tasks that exceed standard limits.
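If you prefer not to edit `config.json` on disk, the same override can typically be applied at load time by modifying the config object. This is a sketch under the assumption that the model's config accepts a `rope_scaling` dictionary in the form shown above:
```python
from transformers import AutoConfig, AutoModelForCausalLM

model_name = "wasmdashai/wasm-32B-Instruct-V1"

# Load the config and attach the YaRN scaling parameters (same values as the
# JSON snippet above); a factor of 4.0 extends the 32K base context toward 128K.
config = AutoConfig.from_pretrained(model_name)
config.rope_scaling = {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=config,
    torch_dtype="auto",
    device_map="auto",
)
```
Only enable scaling when inputs actually exceed 32K tokens, since static YaRN applies the scaling factor to short prompts as well.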
## Deployment Notes
We recommend using `vLLM` for efficient deployment, especially with large input lengths or real-time serving needs. Please note:
* `vLLM` currently supports static YaRN only.
* Avoid applying rope scaling unless necessary for long-context tasks, as it may impact performance on short inputs.
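As a rough illustration of offline serving with vLLM, the sketch below assumes vLLM can load this repository directly; the `max_model_len` value and sampling parameters are placeholders to tune for your hardware and workload.
```python
from vllm import LLM, SamplingParams

# Load the model with vLLM; max_model_len caps the context to control KV-cache memory.
llm = LLM(model="wasmdashai/wasm-32B-Instruct-V1", max_model_len=32768)

sampling = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=512)
outputs = llm.generate(["Explain the concept of recursion with Python code."], sampling)

for output in outputs:
    print(output.outputs[0].text)
```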
## Contact
For support, feedback, or collaboration inquiries, please contact:
**[[email protected]](mailto:[email protected])**
---