---
license: mit
datasets:
  - fka/awesome-chatgpt-prompts
language:
  - en
metrics:
  - character
base_model:
  - openai/gpt-oss-20b
new_version: openai/gpt-oss-20b
pipeline_tag: text-generation
library_name: transformers
tags:
  - code
---

# 🧠 Custom GPT from Scratch – Saved in Safetensors

This repository contains a minimal GPT-style Transformer built completely from scratch using PyTorch and integrated with the Hugging Face `Trainer` for easy training, evaluation, and saving. Unlike fine-tuning, this project does not start from a pre-trained model: the Transformer weights are initialized randomly and trained entirely on a small custom dataset.

## 📂 Features

- Custom GPT architecture – written in pure PyTorch
- From-scratch training – no pre-trained weights
- Hugging Face `Trainer` integration for the training loop, evaluation, and logging
- Tokenizer compatibility – uses the GPT-2 tokenizer for convenience
- Safetensors format – safe, portable model checkpointing
- Tiny dataset – quick training for learning purposes

## 📜 How it Works

- `SimpleGPTConfig` – stores model hyperparameters
- `CausalSelfAttention` – implements causally masked multi-head self-attention
- `Block` – Transformer block with LayerNorm, attention, and a feed-forward network
- `SimpleGPTLMHeadModel` – complete GPT model with a language modeling head
- Trainer setup – defines the dataset, tokenizer, data collator, and training arguments
- Training & saving – the model is trained and saved as `model.safetensors`

A minimal sketch of these components follows.
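The class names below mirror the list above, but the hyperparameter defaults, layer sizes, and exact module layout are illustrative assumptions rather than the exact contents of `train.py`:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleGPTConfig:
    """Holds model hyperparameters (values here are illustrative)."""
    def __init__(self, vocab_size=50257, n_embd=128, n_head=4, n_layer=2, block_size=64):
        self.vocab_size, self.n_embd = vocab_size, n_embd
        self.n_head, self.n_layer, self.block_size = n_head, n_layer, block_size

class CausalSelfAttention(nn.Module):
    """Multi-head self-attention with a causal (lower-triangular) mask."""
    def __init__(self, config):
        super().__init__()
        self.n_head = config.n_head
        self.qkv = nn.Linear(config.n_embd, 3 * config.n_embd)
        self.proj = nn.Linear(config.n_embd, config.n_embd)
        mask = torch.tril(torch.ones(config.block_size, config.block_size))
        self.register_buffer("mask", mask.view(1, 1, config.block_size, config.block_size))

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # (B, T, C) -> (B, n_head, T, head_dim)
        q, k, v = (t.view(B, T, self.n_head, C // self.n_head).transpose(1, 2) for t in (q, k, v))
        att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))  # block future tokens
        y = (F.softmax(att, dim=-1) @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)

class Block(nn.Module):
    """Transformer block: LayerNorm + attention, LayerNorm + MLP, each with a residual."""
    def __init__(self, config):
        super().__init__()
        self.ln1, self.ln2 = nn.LayerNorm(config.n_embd), nn.LayerNorm(config.n_embd)
        self.attn = CausalSelfAttention(config)
        self.mlp = nn.Sequential(
            nn.Linear(config.n_embd, 4 * config.n_embd),
            nn.GELU(),
            nn.Linear(4 * config.n_embd, config.n_embd),
        )

    def forward(self, x):
        x = x + self.attn(self.ln1(x))
        return x + self.mlp(self.ln2(x))

class SimpleGPTLMHeadModel(nn.Module):
    """Token + position embeddings, a stack of Blocks, and a language-modeling head."""
    def __init__(self, config):
        super().__init__()
        self.config = config
        self.tok_emb = nn.Embedding(config.vocab_size, config.n_embd)
        self.pos_emb = nn.Embedding(config.block_size, config.n_embd)
        self.blocks = nn.ModuleList(Block(config) for _ in range(config.n_layer))
        self.ln_f = nn.LayerNorm(config.n_embd)
        self.lm_head = nn.Linear(config.n_embd, config.vocab_size, bias=False)

    def forward(self, input_ids, labels=None, **kwargs):
        B, T = input_ids.shape
        pos = torch.arange(T, device=input_ids.device)
        x = self.tok_emb(input_ids) + self.pos_emb(pos)
        for block in self.blocks:
            x = block(x)
        logits = self.lm_head(self.ln_f(x))
        loss = None
        if labels is not None:
            # Shift so position t predicts token t+1; -100 labels (padding) are ignored by default.
            loss = F.cross_entropy(
                logits[:, :-1].reshape(-1, logits.size(-1)), labels[:, 1:].reshape(-1)
            )
        return {"loss": loss, "logits": logits}
```

Returning a dict with a `loss` key when `labels` are provided is what lets the Hugging Face `Trainer` drive this plain `nn.Module` without any further glue code.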

## 🚀 Getting Started

### 1️⃣ Install dependencies

```bash
pip install torch transformers datasets accelerate safetensors
```

### 2️⃣ Train the model

```bash
python train.py
```

This will train on a small text dataset and save the model to `./mini_custom_transformer_safetensors`.
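For reference, here is a condensed sketch of the kind of training setup `train.py` puts together. It reuses the `SimpleGPTConfig` and `SimpleGPTLMHeadModel` classes sketched above; the toy corpus, hyperparameters, and `TrainingArguments` values are placeholder assumptions you should adapt:

```python
from datasets import Dataset
from transformers import (DataCollatorForLanguageModeling, GPT2TokenizerFast,
                          Trainer, TrainingArguments)

# SimpleGPTConfig / SimpleGPTLMHeadModel: the custom classes sketched above (defined in train.py).

texts = ["hello world", "a tiny gpt trained from scratch", "safetensors are neat"]  # toy corpus

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=64)

dataset = Dataset.from_dict({"text": texts}).map(tokenize, batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # builds causal-LM labels

model = SimpleGPTLMHeadModel(SimpleGPTConfig(vocab_size=tokenizer.vocab_size))

args = TrainingArguments(
    output_dir="./mini_custom_transformer_safetensors",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    save_safetensors=True,   # write model.safetensors instead of a pickle-based .bin
    report_to="none",
)

trainer = Trainer(model=model, args=args, train_dataset=dataset, data_collator=collator)
trainer.train()
trainer.save_model("./mini_custom_transformer_safetensors")
tokenizer.save_pretrained("./mini_custom_transformer_safetensors")
```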

## 🗂 Repository Structure

```text
├── train.py                   # Main training script
├── README.md                  # Project documentation
└── mini_custom_transformer_safetensors/
    ├── config.json
    ├── model.safetensors
    └── tokenizer.json
```

## 💡 Why Safetensors?

- Security – avoids the arbitrary-code-execution vulnerabilities of pickle-based `.bin` files (see the loading snippet below)
- Speed – faster loading on CPU and GPU
- Interoperability – works with Hugging Face models out of the box
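As a small illustration of the security point (the path assumes the default output directory of this repo), a safetensors checkpoint is read back as a plain dictionary of tensors, so nothing in the file can execute code on load:

```python
from safetensors.torch import load_file

# Load the checkpoint as a plain {name: tensor} dict; unlike torch.load on a
# pickle-based .bin file, this cannot run code embedded in the checkpoint.
state_dict = load_file("mini_custom_transformer_safetensors/model.safetensors")

for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))
```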

## 📌 Notes

- This is a learning example, not intended for production-level performance.
- Because it trains from scratch on a tiny dataset, output quality will be limited.
- You can expand the dataset and train longer for better results.

## 📜 License

MIT License – feel free to use, modify, and share.

## 🔍 Example Inference

After training, you can load `model.safetensors` back into the custom model and generate text right away. The sketch below shows one way to do this; it assumes `train.py` exposes the `SimpleGPTConfig` and `SimpleGPTLMHeadModel` classes (with its training code guarded by `if __name__ == "__main__":` so that importing it does not retrain) and that the checkpoint was saved to the default output directory.
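```python
import torch
from safetensors.torch import load_file
from transformers import GPT2TokenizerFast

# Assumed import: the classes defined in train.py (hypothetical module layout).
from train import SimpleGPTConfig, SimpleGPTLMHeadModel

ckpt_dir = "./mini_custom_transformer_safetensors"
tokenizer = GPT2TokenizerFast.from_pretrained(ckpt_dir)

config = SimpleGPTConfig(vocab_size=tokenizer.vocab_size)  # must match the training config
model = SimpleGPTLMHeadModel(config)
model.load_state_dict(load_file(f"{ckpt_dir}/model.safetensors"))
model.eval()

# Greedy decoding loop: the custom model has no .generate(), so sample token by token.
ids = tokenizer("Hello", return_tensors="pt")["input_ids"]
with torch.no_grad():
    for _ in range(20):
        logits = model(ids)["logits"]                      # (1, T, vocab_size)
        next_id = logits[:, -1].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=1)
        if ids.size(1) >= config.block_size:               # stay within positional embeddings
            break

print(tokenizer.decode(ids[0]))
```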