| library_name: transformers | |
| pipeline_tag: text-classification | |
| tags: | |
| - chess | |
| - classification | |
| - strategic-reasoning | |
| - reproduction | |
| license: mit | |
| language: | |
| - en | |
| datasets: | |
| - lfsm/rook-40m | |
| metrics: | |
| - accuracy | |
| paper: https://laion.ai/notes/rook/ | |
| # ROOK-CLF-9M | |
| A 9M-parameter chess move prediction model that treats move selection as classification, reproducing part of Google DeepMind's ["Grandmaster-Level Chess Without Search"](https://arxiv.org/abs/2402.04494). | |
| ## Model Details | |
| ### Model Description | |
| ROOK-CLF-9M reproduces one specific ablation from the appendix of Ruoss et al. 2024 ["Grandmaster-Level Chess Without Search"](https://arxiv.org/abs/2402.04494): the 9M-parameter configuration trained with behavior cloning (action prediction only). | |
| **What is Reproduced:** | |
| - 9M parameter decoder-only transformer (smallest size from the original paper) | |
| - Behavior cloning objective (action prediction from state) | |
| - Architecture: 8 layers, 8 heads, 256 embedding dimension | |
| **What is Different:** | |
| - Single training objective (behavior cloning only) vs. three objectives in the full paper | |
| - Less compute and fewer training steps than the original | |
| **Overview:** | |
| - **Developed by:** Jonathan Rahn, Jenia Jitsev (LAION/JSC), Qi Sun (Tokyo Tech/Sakana AI) | |
| - **Reproduces:** 9M parameter ablation from Ruoss et al. 2024 Appendix (arXiv:2402.04494) | |
| - **Model type:** LlamaForSequenceClassification | |
| - **Language(s):** Chess notation (FEN format) | |
| - **License:** MIT | |
| - **Finetuned from:** None (trained from scratch) | |
| - **Demo:** [Interactive Demo](https://jorahn.github.io/research/rook-clf-demo/) | |
| - **Repository:** [GitHub](https://github.com/jorahn/rook) | |
| - **Paper:** [LAION Research Note](https://laion.ai/notes/rook/) | |
| ### Model Architecture | |
| - **Parameters:** 9M | |
| - **Layers:** 8 | |
| - **Attention Heads:** 8 | |
| - **Hidden Size:** 256 | |
| - **Context Length:** 78 tokens | |
| - **Vocabulary:** 32-character base; model embedding size padded to 128 | |
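As a rough consistency check on the figures above, the parameter count can be estimated from the layer sizes. This is a back-of-the-envelope sketch, not the authors' accounting. It assumes a standard Llama block (bias-free projections, gated feed-forward network, RMSNorm) with full multi-head attention and a feed-forward width of 1024. The feed-forward width is not stated in this card and is a guess; the 1968-class head is taken from the preprocessing description.

```python
# Sizes stated in the model card
hidden = 256
layers = 8
vocab = 128          # padded embedding size
num_classes = 1968   # move classes

# ASSUMPTION: the feed-forward width is not given in the card
ffn = 1024

embedding = vocab * hidden
attention = 4 * hidden * hidden   # q, k, v, o projections, no biases
mlp = 3 * hidden * ffn            # gate, up, down projections
norms = 2 * hidden                # two RMSNorm weight vectors per layer
per_layer = attention + mlp + norms
head = hidden * num_classes       # bias-free classification head
total = embedding + layers * per_layer + hidden + head  # "+ hidden" is the final norm

print(f"{total:,} parameters")  # 8,929,536 parameters
```

Under these assumptions the estimate lands close to the stated 9M; a different feed-forward width would shift it.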
| ## Uses | |
| ### Direct Use | |
| The model can be used for: | |
| - Chess move prediction from FEN positions | |
| - Chess position analysis | |
| - Educational chess applications | |
| - Research on strategic reasoning in transformers | |
| ### Out-of-Scope Use | |
| The model is not suitable for: | |
| - Tournament-level competitive play | |
| - Real-time chess engines requiring deep search | |
| - Analysis of chess variants or non-standard rules | |
| ## Training Details | |
| ### Training Data | |
| - **Primary dataset:** ChessBench (GDM) 40M positions used for behavior cloning | |
| - **Labels:** Best move per position (UCI). Top‑k candidates used for auxiliary evaluation | |
| ### Training Procedure | |
| #### Preprocessing | |
| 1. **FEN Standardization:** Convert positions to standard FEN notation | |
| 2. **Fixed-Length Encoding:** Pad/truncate to 77 characters | |
| 3. **Tokenization:** Character-level tokenization + [CLS] token (78 total) | |
| 4. **Move Mapping:** Convert UCI moves to classification labels (1968 classes) | |
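The figure of 1968 classes in step 4 is consistent with the number of distinct UCI move strings on an 8x8 board: every queen-like or knight-like displacement between two squares, plus the promotion moves for both colors. The sketch below enumerates them to show where the number comes from. The helper name `all_uci_moves` and the alphabetical label order are illustrative assumptions; the model's actual label indices should be read from its own configuration.

```python
from itertools import product

FILES = "abcdefgh"
RANKS = "12345678"

def all_uci_moves():
    """Enumerate every UCI move string that can occur in standard chess."""
    moves = set()
    squares = list(product(range(8), range(8)))  # (file index, rank index)
    for (f1, r1), (f2, r2) in product(squares, squares):
        df, dr = abs(f2 - f1), abs(r2 - r1)
        if df == 0 and dr == 0:
            continue  # a move must change squares
        rook_like = df == 0 or dr == 0
        bishop_like = df == dr
        knight_like = {df, dr} == {1, 2}
        if rook_like or bishop_like or knight_like:
            moves.add(FILES[f1] + RANKS[r1] + FILES[f2] + RANKS[r2])
    # Promotions: white pawns move 7 -> 8, black pawns 2 -> 1,
    # either straight ahead or capturing onto an adjacent file.
    for from_rank, to_rank in (("7", "8"), ("2", "1")):
        for f1 in range(8):
            for f2 in (f1 - 1, f1, f1 + 1):
                if 0 <= f2 < 8:
                    for piece in "qrbn":
                        moves.add(FILES[f1] + from_rank + FILES[f2] + to_rank + piece)
    return sorted(moves)

UCI_MOVES = all_uci_moves()
# ASSUMPTION: alphabetical order is illustrative, not the model's label order
move_to_label = {move: idx for idx, move in enumerate(UCI_MOVES)}
print(len(UCI_MOVES))  # 1968
```

The total breaks down as 1792 non-promotion moves (896 rook-like, 560 bishop-like, 336 knight-like) plus 176 promotions.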
| #### Training Hyperparameters | |
| - **Framework:** HuggingFace Transformers | |
| - **Hardware:** 2x NVIDIA RTX 4090 | |
| - **Learning Rate:** 4e-4 | |
| - **Batch Size:** 1024 | |
| - **Optimizer:** AdamW | |
| - **Weight Decay:** 0.01 | |
| - **Warmup Steps:** 500 | |
| ## Evaluation | |
| ### Metrics | |
| Reported in the LAION research note: | |
| - **Action accuracy (ChessBench 40M, 195k steps):** 49% | |
| - **BIG-bench Checkmate-in-One:** 57% | |
| ### Benchmarks | |
| - **BIG-bench Checkmate-in-One:** 57% (LAION note) | |
| - **GDM Searchless Chess (ChessBench 40M):** 49% action accuracy (LAION note) | |
| ## Technical Details | |
| ### Tokenization | |
| The model uses a custom tokenization scheme critical for proper inference: | |
| **Step 1: FEN Processing (77 characters fixed)** | |
| ```python | |
| import re | |
| # Original FEN | |
| fen = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1" | |
| # Process FEN to fixed 77-character format: | |
| # 1. Expand numbers to dots (e.g., "8" → "........") | |
| # 2. Remove slashes | |
| # 3. Pad castling to 4 chars, en passant to 2 chars, halfmove to 3 chars, fullmove to 3 chars | |
| def process_fen(fen): | |
|     position, turn, castling, en_passant, halfmove, fullmove = fen.split(" ") | |
|     # Expand empty squares: "8" → "........" | |
|     position = re.sub(r'\d+', lambda m: "." * int(m.group()), position) | |
|     position = position.replace("/", "")  # Remove row separators | |
|     castling = castling.ljust(4, ".")  # Pad to 4 chars | |
|     en_passant = en_passant.ljust(2, ".")  # Pad to 2 chars | |
|     halfmove = halfmove.ljust(2, ".") + "."  # Pad to 3 chars total (for clocks below 100) | |
|     fullmove = fullmove.ljust(3, ".")  # Pad to 3 chars | |
|     return "".join([position, turn, castling, en_passant, halfmove, fullmove]) | |
| # Result: exactly 77 characters (64 + 1 + 4 + 2 + 3 + 3) | |
| processed = process_fen(fen) | |
| # "rnbqkbnrpppppppp................................PPPPPPPPRNBQKBNRwKQkq-.0..1.." | |
| ``` | |
| **Step 2: Add [CLS] token and convert to token IDs** | |
| ```python | |
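| # Append [CLS] as a single symbol, not as five separate characters | |
| symbols = list(processed) + ["[CLS]"]  # 77 characters + 1 special token | |
| # Convert to token IDs (character-level tokenization). | |
| # char_to_id is the model's vocabulary mapping and must contain "[CLS]". | |
| tokens = [char_to_id[s] for s in symbols]  # 78 tokens | |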
| ``` | |
| **Complete example:** | |
| ``` | |
| Input FEN: "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1" | |
| Processed: "rnbqkbnrpppppppp................................PPPPPPPPRNBQKBNRwKQkq-.0..1.." | |
| With [CLS]: "rnbqkbnrpppppppp................................PPPPPPPPRNBQKBNRwKQkq-.0..1..[CLS]" | |
| Token IDs: [13, 11, 3, 12, 10, 3, 11, 13, 15, 15, 15, 15, 15, 15, 15, 15, ...] # 78 tokens | |
| ``` | |
| ### Inference | |
| For in-browser inference, the model is exported to ONNX format: | |
| ```python | |
| # ONNX export for web deployment | |
| import torch | |
| from transformers import AutoModelForSequenceClassification | |
| model = AutoModelForSequenceClassification.from_pretrained("jrahn/ROOK-CLF-9m") | |
| model.eval()  # Export in inference mode | |
| # Random token IDs of shape (batch=1, sequence=78), used only to trace the graph | |
| dummy_input = torch.randint(0, 88, (1, 78)) | |
| torch.onnx.export( | |
|     model, | |
|     dummy_input, | |
|     "rook_clf_9m.onnx", | |
|     input_names=['input_ids'], | |
|     output_names=['logits'], | |
|     dynamic_axes={'input_ids': {0: 'batch_size'}},  # Allow variable batch size | |
| ) | |
| ``` | |
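The exported graph returns raw `logits`, one score per move class. A minimal sketch of turning them into ranked candidate moves is shown below. It is illustrative rather than the demo's actual post-processing: `top_k_moves` and `id_to_move` are hypothetical names, and `id_to_move` stands in for whatever label list ships with the model. The toy example uses four classes instead of 1968.

```python
import math

def top_k_moves(logits, id_to_move, k=3):
    """Return the k highest-scoring (move, probability) pairs via softmax."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return [(id_to_move[i], exps[i] / total) for i in ranked[:k]]

# Toy data: ASSUMED labels and scores, for illustration only
toy_logits = [0.1, 2.0, -1.0, 1.0]
toy_labels = ["a2a3", "e2e4", "h2h4", "d2d4"]

best = top_k_moves(toy_logits, toy_labels, k=2)
print([move for move, _ in best])  # ['e2e4', 'd2d4']
```

Because the model does not check legality itself, a caller may want to discard candidates that are not legal in the current position before playing one.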
| ## Limitations | |
| - **Search-free:** No lookahead or position evaluation beyond a single move | |
| - **Tactical Weakness:** Limited performance on complex tactical sequences | |
| - **Opening Knowledge:** Relies on training data distribution for openings | |
| - **Endgame Performance:** Weaker in theoretical endgames requiring precise calculation | |
| ## Citation | |
| If you use this model, please cite both our work and the original paper: | |
| ```bibtex | |
| @article{rook2024, | |
| title={ROOK: Strategic Reasoning in Chess Without Search}, | |
| author={Rahn, Jonathan and Jitsev, Jenia and Sun, Qi}, | |
| journal={LAION Research Notes}, | |
| year={2024}, | |
| url={https://laion.ai/notes/rook/} | |
| } | |
| @article{ruoss2024grandmaster, | |
| title={Grandmaster-level chess without search}, | |
| author={Ruoss, Anian and Delétang, Grégoire and McAleese, Nell and Genewein, Tim and Weidinger, Laura and Cai, Matteo and Weber, Théophane and Hutter, Marcus and Legg, Shane}, | |
| journal={arXiv preprint arXiv:2402.04494}, | |
| year={2024} | |
| } | |
| ``` | |
| ## Model Card Contact | |
| Jonathan Rahn - [GitHub](https://github.com/jorahn) | [Research Page](https://jorahn.github.io/research/) | |
| ## Metrics Source | |
| LAION research note: https://laion.ai/notes/rook/ | |