---
license: apache-2.0
---

<h1 align="center"> Moxin 7B Instruct </h1>

<p align="center"> <a href="https://github.com/moxin-org/Moxin-LLM">Home Page</a> &nbsp;&nbsp; | &nbsp;&nbsp; <a href="https://arxiv.org/abs/2412.06845">Technical Report</a> &nbsp;&nbsp; | &nbsp;&nbsp; <a href="https://huggingface.co/moxin-org/Moxin-7B-LLM">Base Model</a> &nbsp;&nbsp; | &nbsp;&nbsp; <a href="https://huggingface.co/moxin-org/Moxin-7B-Chat">Chat Model</a> &nbsp;&nbsp; | &nbsp;&nbsp; <a href="https://huggingface.co/moxin-org/Moxin-7B-Instruct">Instruct Model</a> &nbsp;&nbsp; | &nbsp;&nbsp; <a href="https://huggingface.co/moxin-org/Moxin-7B-Reasoning">Reasoning Model</a> &nbsp;&nbsp; | &nbsp;&nbsp; <a href="https://huggingface.co/moxin-org/Moxin-7B-VLM">VLM Model</a> </p>



## Chat Template

The model expects conversations in the following format, where `\n` denotes a newline character:
```
<|system|>\nYou are a helpful AI assistant!\n<|user|>\nHow are you doing?\n<|assistant|>\nThank you for asking! As an AI, I don't have feelings, but I'm functioning normally and ready to assist you. How can I help you today?<|endoftext|>
```
Or, with the newlines expanded:
```
<|system|>
You are a helpful AI assistant!
<|user|>
How are you doing?
<|assistant|>
Thank you for asking! As an AI, I don't have feelings, but I'm functioning normally and ready to assist you. How can I help you today?<|endoftext|>
```
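
To make the layout above concrete, here is a minimal sketch that assembles the same string by hand from a list of messages. The helper name `format_prompt` and its `add_generation_prompt` flag are hypothetical and only for illustration. How several assistant turns are joined in a longer conversation is an assumption, since the example above shows a single exchange. In practice, the chat template shipped with the tokenizer should be treated as the source of truth.

```python
def format_prompt(messages, add_generation_prompt=True):
    """Render chat messages in the <|role|> layout shown above (illustrative only)."""
    parts = []
    for message in messages:
        role, content = message["role"], message["content"]
        if role == "assistant":
            # A completed assistant turn is closed by the end-of-text token.
            parts.append(f"<|assistant|>\n{content}<|endoftext|>")
        else:
            # System and user turns are terminated by a newline.
            parts.append(f"<|{role}|>\n{content}\n")
    if add_generation_prompt:
        # Leave an open assistant tag so the model writes the next reply.
        parts.append("<|assistant|>\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful AI assistant!"},
    {"role": "user", "content": "How are you doing?"},
]
prompt = format_prompt(messages)
print(prompt)
```

With `add_generation_prompt=True`, the string ends in an open `<|assistant|>` tag, which is the point where the model starts generating.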


## Inference

You can use the following code to run inference with the model. 

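```python
import transformers
import torch

model_id = "moxin-org/Moxin-7B-Instruct"

# Load the model in bfloat16 and let Accelerate place it on the available devices.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# Chat-style input: the pipeline applies the model's chat template automatically.
messages = [
    {"role": "system", "content": "You are a helpful AI assistant!"},
    {"role": "user", "content": "How are you doing?"},
]

outputs = pipeline(
    messages,
    max_new_tokens=1024,
)

# "generated_text" holds the full conversation; the last entry is the
# assistant's reply as a {"role": ..., "content": ...} dict.
print(outputs[0]["generated_text"][-1])
```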
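
If only the reply text is needed, rather than the whole message dict, the result can be unpacked as in the sketch below. It runs on a hand-written stand-in for the pipeline output (`mock_outputs`), so no model download is required. The helper name `last_reply` is hypothetical, and the assumed structure is the one the text-generation pipeline returns for chat-style input.

```python
def last_reply(outputs):
    """Return the text of the final message from a chat-style pipeline result."""
    generated = outputs[0]["generated_text"]
    if isinstance(generated, str):
        # Plain-string prompts yield a plain string instead of a message list.
        return generated
    return generated[-1]["content"]


# Hand-written stand-in for what the pipeline returns for chat input.
mock_outputs = [
    {
        "generated_text": [
            {"role": "system", "content": "You are a helpful AI assistant!"},
            {"role": "user", "content": "How are you doing?"},
            {"role": "assistant", "content": "I'm functioning normally."},
        ]
    }
]

reply = last_reply(mock_outputs)
print(reply)
```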