---
library_name: transformers
pipeline_tag: text-classification
tags:
- chess
- classification
- strategic-reasoning
- reproduction
license: mit
language:
- en
datasets:
- lfsm/rook-40m
metrics:
- accuracy
paper: https://laion.ai/notes/rook/
---
# ROOK-CLF-9M
A 9M-parameter chess move prediction model that frames move selection as classification, reproducing an ablation from Google DeepMind's ["Grandmaster-Level Chess Without Search"](https://arxiv.org/abs/2402.04494).
## Model Details
### Model Description
ROOK-CLF-9M reproduces one specific ablation from the appendix of Ruoss et al. 2024, ["Grandmaster-Level Chess Without Search"](https://arxiv.org/abs/2402.04494): the 9M-parameter model configuration trained with behavior cloning (action prediction only).
**What is Reproduced:**
- 9M parameter decoder-only transformer (smallest size from the original paper)
- Behavior cloning objective (action prediction from state)
- Architecture: 8 layers, 8 heads, 256 embedding dimension
**What is Different:**
- Single training objective (behavior cloning only) vs. three objectives in the full paper
- Reduced compute/training steps compared to original
**Overview:**
- **Developed by:** Jonathan Rahn, Jenia Jitsev (LAION/JSC), Qi Sun (Tokyo Tech/Sakana AI)
- **Reproduces:** 9M parameter ablation from Ruoss et al. 2024 Appendix (arXiv:2402.04494)
- **Model type:** LlamaForSequenceClassification
- **Language(s):** Chess notation (FEN format)
- **License:** MIT
- **Finetuned from:** None (trained from scratch)
- **Demo:** [Interactive Demo](https://jorahn.github.io/research/rook-clf-demo/)
- **Repository:** [GitHub](https://github.com/jorahn/rook)
- **Paper:** [LAION Research Note](https://laion.ai/notes/rook/)
### Model Architecture
- **Parameters:** 9M
- **Layers:** 8
- **Attention Heads:** 8
- **Hidden Size:** 256
- **Context Length:** 78 tokens
- **Vocabulary:** 32-character base; model embedding size padded to 128
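As a sanity check on the 9M figure, the arithmetic below estimates the parameter count of a Llama-style sequence classifier from the numbers listed above. It is a rough sketch, not the authors' accounting: the MLP width (`intermediate = 1024`) and the use of full multi-head attention without biases are assumptions that this card does not state.

```python
# Rough parameter count for a Llama-style sequence classifier.
hidden = 256          # from the model card
layers = 8            # from the model card
vocab = 128           # padded embedding size from the model card
num_classes = 1968    # move classes from the model card
intermediate = 1024   # ASSUMPTION: MLP width is not stated in the card

# ASSUMPTION: full multi-head attention, no bias terms.
attention = 4 * hidden * hidden      # q, k, v, o projections
mlp = 3 * hidden * intermediate      # gate, up, down projections
norms = 2 * hidden                   # two RMSNorm weight vectors per layer
per_layer = attention + mlp + norms

embedding = vocab * hidden
final_norm = hidden
classifier = hidden * num_classes    # linear classification head without bias

total = layers * per_layer + embedding + final_norm + classifier
print(f"{total:,} parameters")
```

Under these assumptions the total comes to roughly 8.9M, with the transformer layers dominating and the 1968-way classification head contributing about 0.5M.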
## Uses
### Direct Use
The model can be used for:
- Chess move prediction from FEN positions
- Chess position analysis
- Educational chess applications
- Research on strategic reasoning in transformers
### Out-of-Scope Use
The model is not suitable for:
- Tournament-level competitive play
- Real-time chess engines requiring deep search
- Analysis of chess variants or non-standard rules
## Training Details
### Training Data
- **Primary dataset:** ChessBench (Google DeepMind), 40M positions, used for behavior cloning
- **Labels:** Best move per position (UCI). Top‑k candidates used for auxiliary evaluation
### Training Procedure
#### Preprocessing
1. **FEN Standardization:** Convert positions to standard FEN notation
2. **Fixed-Length Encoding:** Pad/truncate to 77 characters
3. **Tokenization:** Character-level tokenization + [CLS] token (78 total)
4. **Move Mapping:** Convert UCI moves to classification labels (1968 classes)
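The 1968 classes in step 4 match the number of distinct UCI move strings that are geometrically possible on a chessboard: every queen-like or knight-like move between two squares, plus the promotion variants. The sketch below enumerates them to show where the number comes from. The sorted ordering and the names `all_uci_moves`, `MOVES`, and `move_to_label` are illustrative assumptions; the label IDs the model actually uses may be ordered differently and should be taken from the model's own configuration.

```python
FILES = "abcdefgh"
RANKS = "12345678"

def square(file_idx, rank_idx):
    """Return the algebraic name of a square, e.g. (4, 1) -> 'e2'."""
    return FILES[file_idx] + RANKS[rank_idx]

def all_uci_moves():
    """Enumerate every geometrically possible UCI move string."""
    moves = set()
    for f in range(8):
        for r in range(8):
            for tf in range(8):
                for tr in range(8):
                    df, dr = abs(tf - f), abs(tr - r)
                    if df == 0 and dr == 0:
                        continue
                    queen_like = df == 0 or dr == 0 or df == dr
                    knight_like = {df, dr} == {1, 2}
                    if queen_like or knight_like:
                        moves.add(square(f, r) + square(tf, tr))
    # Promotions: a pawn steps or captures onto the last rank and
    # becomes a queen, rook, bishop, or knight.
    for f in range(8):
        for tf in (f - 1, f, f + 1):
            if 0 <= tf < 8:
                for piece in "qrbn":
                    moves.add(square(f, 6) + square(tf, 7) + piece)  # white
                    moves.add(square(f, 1) + square(tf, 0) + piece)  # black
    return sorted(moves)

MOVES = all_uci_moves()
# ASSUMPTION: alphabetical label order, for illustration only.
move_to_label = {move: idx for idx, move in enumerate(MOVES)}
print(len(MOVES))
```

The count breaks down into 1792 non-promotion moves (1456 queen-like plus 336 knight-like) and 176 promotion moves.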
#### Training Hyperparameters
- **Framework:** HuggingFace Transformers
- **Hardware:** 2x NVIDIA RTX 4090
- **Learning Rate:** 4e-4
- **Batch Size:** 1024
- **Optimizer:** AdamW
- **Weight Decay:** 0.01
- **Warmup Steps:** 500
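To put these hyperparameters in perspective, the arithmetic below estimates how much data the 195k-step checkpoint reported in the evaluation section has seen. It is a sketch under stated assumptions: that 1024 is the global batch size across both GPUs, that the 195k-step run used the settings listed here, and that warmup is a linear ramp (the schedule shape is not stated in this card).

```python
steps = 195_000               # checkpoint step reported in the evaluation section
global_batch = 1024           # ASSUMPTION: 1024 is the global batch across both GPUs
dataset_positions = 40_000_000
peak_lr = 4e-4
warmup_steps = 500

positions_seen = steps * global_batch
epochs = positions_seen / dataset_positions

def warmup_lr(step):
    """Learning rate during warmup. ASSUMPTION: linear ramp from 0 to peak_lr."""
    return peak_lr * min(1.0, step / warmup_steps)

print(f"{positions_seen:,} positions seen, about {epochs:.2f} passes over the data")
print(f"lr at step 250: {warmup_lr(250):.1e}")
```

Under these assumptions the run covers just under five passes over the 40M positions, and warmup accounts for well under 1% of training.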
## Evaluation
### Metrics
Reported in the LAION research note:
- **Action accuracy (ChessBench 40M, 195k steps):** 49%
- **BIG-bench Checkmate-in-One:** 57%
### Benchmarks
- **BIG-bench Checkmate-in-One:** 57% (LAION note)
- **GDM Searchless Chess (ChessBench 40M):** 49% action accuracy (LAION note)
## Technical Details
### Tokenization
The model uses a custom tokenization scheme critical for proper inference:
**Step 1: FEN Processing (77 characters fixed)**
```python
import re

# Original FEN
fen = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"

# Process FEN to fixed 77-character format:
# 1. Expand numbers to dots (e.g., "8" → "........")
# 2. Remove slashes
# 3. Pad castling to 4 chars, en passant to 2 chars, halfmove to 3 chars, fullmove to 3 chars
def process_fen(fen):
    position, turn, castling, en_passant, halfmove, fullmove = fen.split(" ")
    # Expand empty squares: "8" → "........"
    position = re.sub(r'\d+', lambda m: "." * int(m.group()), position)
    position = position.replace("/", "")     # Remove row separators
    castling = castling.ljust(4, ".")        # Pad to 4 chars
    en_passant = en_passant.ljust(2, ".")    # Pad to 2 chars
    halfmove = halfmove.ljust(2, ".") + "."  # Pad to 3 chars total
    fullmove = fullmove.ljust(3, ".")        # Pad to 3 chars
    return "".join([position, turn, castling, en_passant, halfmove, fullmove])

# Result: exactly 77 characters (64 board + 1 turn + 4 + 2 + 3 + 3)
processed = process_fen(fen)
# "rnbqkbnrpppppppp................................PPPPPPPPRNBQKBNRwKQkq-.0..1.."
```
**Step 2: Add [CLS] token and convert to token IDs**
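```python
# [CLS] is a single classification token, not five separate characters,
# so encode the 77 characters first and then append the [CLS] token ID.
# char_to_id is the model's character-level vocabulary mapping
# (assumed here to store the classification token under the key "[CLS]").
tokens = [char_to_id[c] for c in processed] + [char_to_id["[CLS]"]]
assert len(tokens) == 78  # 77 characters + 1 [CLS] token
```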
**Complete example:**
```
Input FEN: "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
Processed: "rnbqkbnrpppppppp................................PPPPPPPPRNBQKBNRwKQkq-.0..1.."
With [CLS]: "rnbqkbnrpppppppp................................PPPPPPPPRNBQKBNRwKQkq-.0..1..[CLS]"
Token IDs: [13, 11, 3, 12, 10, 3, 11, 13, 15, 15, 15, 15, 15, 15, 15, 15, ...] # 78 tokens
```
### Inference
For in-browser inference, the model is exported to ONNX format:
```python
# ONNX export for web deployment
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("jrahn/ROOK-CLF-9m")
dummy_input = torch.randint(0, 88, (1, 78))

torch.onnx.export(
    model,
    dummy_input,
    "rook_clf_9m.onnx",
    input_names=['input_ids'],
    output_names=['logits'],
    dynamic_axes={'input_ids': {0: 'batch_size'}}
)
```
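Whichever runtime produces the logits, the final step is to turn the 1968 class scores into a move. The following is a minimal, framework-free sketch of that decoding step: a numerically stable softmax followed by a top-k lookup. The function name `top_k_moves` and the four-entry `toy_id2label` map are hypothetical and used for illustration only; in practice the label map would come from the model's configuration.

```python
import math

def top_k_moves(logits, id2label, k=3):
    """Return the k highest-probability (move, probability) pairs."""
    # Numerically stable softmax over the class logits.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by probability, highest first.
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]

# Toy example with a hypothetical 4-class label map
# (the real model has 1968 classes).
toy_id2label = {0: "e2e4", 1: "d2d4", 2: "g1f3", 3: "c2c4"}
toy_logits = [2.0, 1.0, 0.5, -1.0]
best = top_k_moves(toy_logits, toy_id2label, k=2)
print(best)
```

Because the classifier scores every class regardless of the position, the top-ranked move is not guaranteed to be legal; an application may want to check the candidates against the legal moves of the current position before playing one.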
## Limitations
- **Search-free:** No lookahead or position evaluation beyond a single move
- **Tactical Weakness:** Limited performance on complex tactical sequences
- **Opening Knowledge:** Relies on training data distribution for openings
- **Endgame Performance:** Weaker in theoretical endgames requiring precise calculation
## Citation
If you use this model, please cite both our work and the original paper:
```bibtex
@article{rook2024,
title={ROOK: Strategic Reasoning in Chess Without Search},
author={Rahn, Jonathan and Jitsev, Jenia and Sun, Qi},
journal={LAION Research Notes},
year={2024},
url={https://laion.ai/notes/rook/}
}
@article{ruoss2024grandmaster,
title={Grandmaster-level chess without search},
author={Ruoss, Anian and Delétang, Grégoire and McAleese, Nell and Genewein, Tim and Weidinger, Laura and Cai, Matteo and Weber, Théophane and Hutter, Marcus and Legg, Shane},
journal={arXiv preprint arXiv:2402.04494},
year={2024}
}
```
## Model Card Contact
Jonathan Rahn - [GitHub](https://github.com/jorahn) | [Research Page](https://jorahn.github.io/research/)
## Metrics Source
LAION research note: https://laion.ai/notes/rook/