LiteGPT-Instruct

This is a 124M-parameter language model (GPT-2 Small architecture) fine-tuned on the Alpaca dataset for instruction following.

It is part of the "Small Language Model (SLM)" project, trained from scratch on educational data (FineWeb-Edu) and then fine-tuned on instructions.

Model Details

  • Architecture: GPT-2 Small (12 layers, 12 heads, 768 embedding dim); see the configuration sketch below
  • Parameters: ~124 million
  • Context Length: 1024 tokens
  • Training: pre-trained from scratch on FineWeb-Edu, then fine-tuned on the Alpaca instruction dataset
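
The specs above match the stock GPT-2 Small configuration. A minimal sketch of that architecture using Hugging Face transformers (the GPT2Config mapping is an assumption based on the card's numbers; exact pre-training hyperparameters are not listed here):

from transformers import GPT2Config, GPT2LMHeadModel

# Assumed mapping of the card's specs onto a GPT2Config; these values are
# also the library defaults for GPT-2 Small.
config = GPT2Config(
    n_layer=12,        # 12 transformer blocks
    n_head=12,         # 12 attention heads per block
    n_embd=768,        # 768-dimensional embeddings
    n_positions=1024,  # 1024-token context window
)
model = GPT2LMHeadModel(config)  # randomly initialized; load the hub weights for actual use
print(sum(p.numel() for p in model.parameters()) / 1e6)  # ~124 (million parameters)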

Usage

This model requires a specific prompt format to function correctly.

Prompt Template (Alpaca)

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{your_instruction}

### Response:

Python Example

from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load the fine-tuned weights and the standard GPT-2 tokenizer.
model = GPT2LMHeadModel.from_pretrained("koganrath/LiteGPT-Instruct")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Wrap the instruction in the Alpaca prompt template shown above.
instruction = "List three primary colors."
prompt = f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
"""

# Tokenize, generate up to 50 new tokens, and decode the full sequence.
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
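
The decoded output repeats the prompt, so in practice you usually keep only the text after the "### Response:" marker. A short sketch continuing from the example above; the sampling settings are illustrative defaults, not values tuned for this model:

# Sample a response and strip the prompt from the decoded text.
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,                       # sample instead of greedy decoding
    temperature=0.7,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
)
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
response = text.split("### Response:")[-1].strip()
print(response)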

Limitations

  • Size: As a 124M parameter model, its reasoning capabilities are limited compared to larger models (7B+).
  • Hallucinations: It may generate incorrect or nonsensical information.
  • Bias: It inherits biases present in the FineWeb and Alpaca datasets.

Authors

Trained by koganrath as part of the LiteGPT Project.
