---
license: mit
datasets:
- fka/awesome-chatgpt-prompts
language:
- en
metrics:
- character
pipeline_tag: text-generation
library_name: transformers
tags:
- code
---
# Custom GPT from Scratch, Saved in Safetensors
This repository contains a minimal GPT-style Transformer built entirely from scratch in PyTorch and integrated with the Hugging Face `Trainer` for easy training, evaluation, and saving. Unlike fine-tuning, this project does not start from a pre-trained model: the Transformer weights are initialized randomly and trained fully on a small custom dataset.
## Features

- **Custom GPT architecture**: written in pure PyTorch
- **From-scratch training**: no pre-trained weights
- **Hugging Face `Trainer` integration**: training loop, evaluation, and logging
- **Tokenizer compatibility**: uses the GPT-2 tokenizer for convenience (see the snippet after this list)
- **Safetensors format**: safe, portable model checkpointing
- **Tiny dataset**: quick training for learning purposes
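Because the model reuses the stock GPT-2 vocabulary, tokenization works exactly as it does for GPT-2 itself. A quick check, using only standard `transformers` calls (nothing here is specific to this repository):

```python
from transformers import GPT2TokenizerFast

# Load the standard GPT-2 tokenizer; the custom model shares its vocabulary.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

enc = tokenizer("hello world", return_tensors="pt")
print(enc["input_ids"])                      # token IDs the model consumes
print(tokenizer.decode(enc["input_ids"][0])) # round-trips back to text
```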
## How It Works

- **`SimpleGPTConfig`**: stores model hyperparameters
- **`CausalSelfAttention`**: implements causally masked multi-head self-attention
- **`Block`**: a Transformer block with LayerNorm, attention, and a feed-forward network
- **`SimpleGPTLMHeadModel`**: the complete GPT model with a language-modeling head (a condensed sketch of these classes follows this list)
- **Trainer setup**: defines the dataset, tokenizer, data collator, and training arguments
- **Training & saving**: the model is saved as `model.safetensors`
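The class names below match those in `train.py`, but the bodies are an illustrative reconstruction: layer sizes, defaults, and internal details are assumptions, and the repository's actual code may differ.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleGPTConfig:
    """Plain container for model hyperparameters (values here are illustrative)."""
    def __init__(self, vocab_size=50257, n_embd=128, n_head=4, n_layer=2, block_size=64):
        self.vocab_size = vocab_size
        self.n_embd = n_embd
        self.n_head = n_head
        self.n_layer = n_layer
        self.block_size = block_size

class CausalSelfAttention(nn.Module):
    """Multi-head self-attention with a causal mask: tokens attend only leftward."""
    def __init__(self, config):
        super().__init__()
        self.n_head = config.n_head
        self.qkv = nn.Linear(config.n_embd, 3 * config.n_embd)
        self.proj = nn.Linear(config.n_embd, config.n_embd)
        mask = torch.tril(torch.ones(config.block_size, config.block_size))
        self.register_buffer("mask", mask.view(1, 1, config.block_size, config.block_size))

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
        y = (F.softmax(att, dim=-1) @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)

class Block(nn.Module):
    """Pre-norm Transformer block: LayerNorm -> attention, LayerNorm -> MLP."""
    def __init__(self, config):
        super().__init__()
        self.ln1 = nn.LayerNorm(config.n_embd)
        self.attn = CausalSelfAttention(config)
        self.ln2 = nn.LayerNorm(config.n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(config.n_embd, 4 * config.n_embd),
            nn.GELU(),
            nn.Linear(4 * config.n_embd, config.n_embd),
        )

    def forward(self, x):
        x = x + self.attn(self.ln1(x))
        x = x + self.mlp(self.ln2(x))
        return x

class SimpleGPTLMHeadModel(nn.Module):
    """Token + position embeddings, a stack of Blocks, and a language-modeling head."""
    def __init__(self, config):
        super().__init__()
        self.config = config
        self.tok_emb = nn.Embedding(config.vocab_size, config.n_embd)
        self.pos_emb = nn.Embedding(config.block_size, config.n_embd)
        self.blocks = nn.ModuleList(Block(config) for _ in range(config.n_layer))
        self.ln_f = nn.LayerNorm(config.n_embd)
        self.lm_head = nn.Linear(config.n_embd, config.vocab_size, bias=False)

    def forward(self, input_ids, attention_mask=None, labels=None):
        # attention_mask is accepted so Hugging Face data collators can pass
        # batches through unchanged; this minimal sketch ignores it.
        B, T = input_ids.shape
        pos = torch.arange(T, device=input_ids.device)
        x = self.tok_emb(input_ids) + self.pos_emb(pos)
        for block in self.blocks:
            x = block(x)
        logits = self.lm_head(self.ln_f(x))
        loss = None
        if labels is not None:
            # Shift so that position t predicts token t + 1.
            loss = F.cross_entropy(
                logits[:, :-1].reshape(-1, logits.size(-1)),
                labels[:, 1:].reshape(-1),
            )
        return {"loss": loss, "logits": logits}
```

The pre-norm residual layout (LayerNorm before attention and MLP) is the arrangement GPT-2 popularized; it tends to train more stably than post-norm in small models like this one.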
## Getting Started

### 1. Install dependencies

```bash
pip install torch transformers datasets accelerate safetensors
```
### 2. Train the model

```bash
python train.py
```

This trains on a small text dataset and saves the model to `./mini_custom_transformer_safetensors`. A sketch of what `train.py` wires together is shown below.
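Roughly, the `Trainer` setup looks like the following. The dataset, hyperparameters, and values here are illustrative placeholders, so check `train.py` for what the repository actually uses:

```python
from transformers import (
    DataCollatorForLanguageModeling,
    GPT2TokenizerFast,
    Trainer,
    TrainingArguments,
)

# SimpleGPTConfig and SimpleGPTLMHeadModel are defined in train.py
# (sketched in the How It Works section above).

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token

texts = ["hello world", "transformers are fun"]  # placeholder corpus
train_data = [tokenizer(t, truncation=True, max_length=64) for t in texts]

model = SimpleGPTLMHeadModel(SimpleGPTConfig(vocab_size=tokenizer.vocab_size))

args = TrainingArguments(
    output_dir="./mini_custom_transformer_safetensors",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    logging_steps=10,
    save_safetensors=True,  # write model.safetensors rather than a pickle .bin
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_data,
    # mlm=False produces causal-LM labels (the model shifts them internally).
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model()                        # saves model.safetensors to output_dir
tokenizer.save_pretrained(args.output_dir)  # saves tokenizer.json alongside it
```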
## Repository Structure

```
├── train.py                                 # Main training script
├── README.md                                # Project documentation
└── mini_custom_transformer_safetensors/
    ├── config.json
    ├── model.safetensors
    └── tokenizer.json
```
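## Generating Text

After training, the checkpoint can be loaded straight from `model.safetensors`. The sketch below assumes the class names from `train.py` and uses simple greedy decoding; since the model is a plain `nn.Module` with no `generate()` method, the sampling loop is written by hand:

```python
import torch
from safetensors.torch import load_file
from transformers import GPT2TokenizerFast

# SimpleGPTConfig and SimpleGPTLMHeadModel as defined in train.py (sketched above).

tokenizer = GPT2TokenizerFast.from_pretrained("./mini_custom_transformer_safetensors")
model = SimpleGPTLMHeadModel(SimpleGPTConfig(vocab_size=tokenizer.vocab_size))

# load_file reads raw tensors only: no pickle, no arbitrary code execution.
state_dict = load_file("./mini_custom_transformer_safetensors/model.safetensors")
model.load_state_dict(state_dict)
model.eval()

ids = tokenizer("Once upon a time", return_tensors="pt")["input_ids"]
with torch.no_grad():
    for _ in range(20):  # greedy decoding, one token at a time
        logits = model(ids)["logits"]
        next_id = logits[:, -1].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=1)
print(tokenizer.decode(ids[0]))
```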
## Why Safetensors?

- **Security**: avoids the arbitrary-code-execution risk of pickle-based `.bin` files
- **Speed**: faster loading on CPU and GPU
- **Interoperability**: works with Hugging Face models out of the box
## Notes

- This is a learning example, not intended for production-level performance.
- Since it trains from scratch on a tiny dataset, output quality will be limited.
- You can expand the dataset and train longer for better results (see the example after this list).
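For instance, a larger public corpus can be swapped in via the `datasets` library; the dataset name below is just one option, and `tokenizer` is the GPT-2 tokenizer from the snippets above:

```python
from datasets import load_dataset

# WikiText-2 is a small public corpus; any text dataset works the same way.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")

def tokenize(batch):
    # `tokenizer` is the GPT-2 tokenizer loaded earlier in this README.
    return tokenizer(batch["text"], truncation=True, max_length=64)

# Drop empty lines, then tokenize in batches for use as Trainer's train_dataset.
dataset = dataset.filter(lambda x: len(x["text"].strip()) > 0)
dataset = dataset.map(tokenize, batched=True, remove_columns=["text"])
```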
## License

MIT License: feel free to use, modify, and share.