	---
	library_name: transformers
	pipeline_tag: text-classification
	tags:
	- chess
	- classification
	- strategic-reasoning
	- reproduction
	license: mit
	language:
	- en
	datasets:
	- lfsm/rook-40m
	metrics:
	- accuracy
	paper: https://laion.ai/notes/rook/
	---

	# ROOK-CLF-9M

	A 9M parameter chess move prediction model using a classification approach, reproducing Google DeepMind's ["Grandmaster-Level Chess Without Search"](https://arxiv.org/abs/2402.04494).

	## Model Details

	### Model Description

	ROOK-CLF-9M reproduces one specific ablation from the appendix of Ruoss et al. 2024 ["Grandmaster-Level Chess Without Search"](https://arxiv.org/abs/2402.04494): the 9M parameter model configuration trained on behavior cloning (action prediction only).

	What is Reproduced:
	- 9M parameter decoder-only transformer (smallest size from the original paper)
	- Behavior cloning objective (action prediction from state)
	- Architecture: 8 layers, 8 heads, 256 embedding dimension

	What is Different:
	- Single training objective (behavior cloning only) vs. three objectives in the full paper
	- Reduced compute/training steps compared to original

	Overview:
	- Developed by: Jonathan Rahn, Jenia Jitsev (LAION/JSC), Qi Sun (Tokyo Tech/Sakana AI)
	- Reproduces: 9M parameter ablation from Ruoss et al. 2024 Appendix (arXiv:2402.04494)
	- Model type: LlamaForSequenceClassification
	- Language(s): Chess notation (FEN format)
	- License: MIT
	- Finetuned from: None (trained from scratch)
	- Demo: [Interactive Demo](https://jorahn.github.io/research/rook-clf-demo/)
	- Repository: [GitHub](https://github.com/jorahn/rook)
	- Paper: [LAION Research Note](https://laion.ai/notes/rook/)

	### Model Architecture

	- Parameters: 9M
	- Layers: 8
	- Attention Heads: 8
	- Hidden Size: 256
	- Context Length: 78 tokens
	- Vocabulary: 32-character base; model embedding size padded to 128
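As a rough sanity check on the 9M figure, the weights of a Llama-style sequence classifier with the dimensions above can be counted by hand. This is a sketch under stated assumptions: the MLP intermediate size of 1024 (4x the hidden size) is not given in this card, and the count assumes bias-free projections and standard multi-head attention. The function name `llama_classifier_params` is hypothetical.

```python
def llama_classifier_params(hidden=256, layers=8, vocab=128,
                            intermediate=1024, num_labels=1968):
    """Count weights of a Llama-style sequence classifier (no bias terms).

    `intermediate=1024` is an assumption, not a value stated in the card.
    """
    attention = 4 * hidden * hidden      # q, k, v and o projections
    mlp = 3 * hidden * intermediate      # gate, up and down projections
    norms = 2 * hidden                   # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms
    embeddings = vocab * hidden          # token embedding table (padded vocab)
    final_norm = hidden                  # RMSNorm before the head
    head = hidden * num_labels           # classification head over 1968 moves
    return layers * per_layer + embeddings + final_norm + head


total = llama_classifier_params()
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Under these assumptions the count lands just under 9M, with most of the weights in the transformer layers and roughly half a million in the 1968-way classification head.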

	## Uses

	### Direct Use

	The model can be used for:
	- Chess move prediction from FEN positions
	- Chess position analysis
	- Educational chess applications
	- Research on strategic reasoning in transformers

	### Out-of-Scope Use

	The model is not suitable for:
	- Tournament-level competitive play
	- Real-time chess engines requiring deep search
	- Analysis of chess variants or non-standard rules

	## Training Details

	### Training Data

	- Primary dataset: 40M positions from ChessBench (Google DeepMind), used for behavior cloning
	- Labels: Best move per position (UCI). Top‑k candidates used for auxiliary evaluation

	### Training Procedure

	#### Preprocessing

	1. FEN Standardization: Convert positions to standard FEN notation
	2. Fixed-Length Encoding: Pad/truncate to 77 characters
	3. Tokenization: Character-level tokenization + [CLS] token (78 total)
	4. Move Mapping: Convert UCI moves to classification labels (1968 classes)
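The 1968 classes in step 4 match the number of distinct UCI move strings on an 8x8 board: 1792 queen-like and knight-like moves between squares, plus 176 pawn promotions (22 pawn moves onto the last rank per side, times 4 promotion pieces). The sketch below enumerates them. The sorted ordering and the name `move_to_label` are assumptions for illustration; the index assignment actually used by the model may differ and should be read from the model's own configuration.

```python
from itertools import product

FILES = "abcdefgh"
RANKS = "12345678"


def all_uci_moves():
    """Enumerate every UCI move string that can occur in standard chess."""
    moves = set()
    squares = list(product(range(8), range(8)))  # (file, rank) index pairs
    for (f1, r1), (f2, r2) in product(squares, squares):
        df, dr = abs(f1 - f2), abs(r1 - r2)
        if df == 0 and dr == 0:
            continue  # a move must change squares
        base = FILES[f1] + RANKS[r1] + FILES[f2] + RANKS[r2]
        queen_like = df == 0 or dr == 0 or df == dr   # rook and bishop lines
        knight_like = {df, dr} == {1, 2}
        if queen_like or knight_like:
            moves.add(base)
        # Promotions: rank 7 to 8 (white) or rank 2 to 1 (black),
        # moving straight ahead or capturing diagonally.
        if df <= 1 and (r1, r2) in {(6, 7), (1, 0)}:
            for piece in "qrbn":
                moves.add(base + piece)
    return sorted(moves)  # sorted order is an assumption, not the model's


uci_moves = all_uci_moves()
move_to_label = {move: i for i, move in enumerate(uci_moves)}
print(len(uci_moves))
```

Castling needs no special handling here because UCI writes it as a two-square king move (for example `e1g1`), which is already covered by the queen-like moves.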

	#### Training Hyperparameters

	- Framework: HuggingFace Transformers
	- Hardware: 2x NVIDIA RTX 4090
	- Learning Rate: 4e-4
	- Batch Size: 1024
	- Optimizer: AdamW
	- Weight Decay: 0.01
	- Warmup Steps: 500
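To make the hyperparameters concrete, the sketch below computes the learning rate during warmup and the amount of data seen over a full run. It rests on assumptions this card does not confirm: that warmup is linear, that the rate stays constant afterwards (no decay schedule is stated), that 1024 is the global batch size across both GPUs, and that the 195k steps quoted with the accuracy figure refer to this configuration.

```python
PEAK_LR = 4e-4
WARMUP_STEPS = 500
BATCH_SIZE = 1024          # assumed to be the global batch size
TOTAL_STEPS = 195_000      # step count quoted with the accuracy figure
DATASET_SIZE = 40_000_000  # ChessBench 40M positions


def learning_rate(step):
    """Linear warmup to PEAK_LR, then constant (the decay shape is an assumption)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    return PEAK_LR


samples_seen = TOTAL_STEPS * BATCH_SIZE
epochs = samples_seen / DATASET_SIZE
print(f"lr at step 0: {learning_rate(0):.2e}")
print(f"samples seen: {samples_seen:,} (about {epochs:.1f} passes over the data)")
```

Under these assumptions the run covers roughly five passes over the 40M positions.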

	## Evaluation

	### Metrics

	Reported in the LAION research note:

	- Action accuracy (ChessBench 40M, 195k steps): 49%
	- BIG-bench Checkmate-in-One: 57%

	### Benchmarks

	- BIG-bench Checkmate-in-One: 57% (LAION note)
	- GDM Searchless Chess (ChessBench 40M): 49% action accuracy (LAION note)

	## Technical Details

	### Tokenization

	The model uses a custom tokenization scheme, and inputs must be encoded exactly as described here for inference to work:

	Step 1: FEN Processing (77 characters fixed)
	```python
	import re

	# Original FEN
	fen = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"

	# Process FEN to fixed 77-character format:
	# 1. Expand numbers to dots (e.g., "8" → "........")
	# 2. Remove slashes
	# 3. Pad castling to 4 chars, en passant to 2 chars, halfmove to 3 chars, fullmove to 3 chars

	def process_fen(fen):
	    position, turn, castling, en_passant, halfmove, fullmove = fen.split(" ")
	    # Expand empty squares: "8" → "........"
	    position = re.sub(r'\d+', lambda m: "." * int(m.group()), position)
	    position = position.replace("/", "")  # Remove row separators (64 chars remain)
	    castling = castling.ljust(4, ".")  # Pad to 4 chars
	    en_passant = en_passant.ljust(2, ".")  # Pad to 2 chars
	    halfmove = halfmove.ljust(2, ".") + "."  # Pad to 3 chars total
	    fullmove = fullmove.ljust(3, ".")  # Pad to 3 chars
	    return "".join([position, turn, castling, en_passant, halfmove, fullmove])

	# Result: 64 + 1 + 4 + 2 + 3 + 3 = 77 characters
	processed = process_fen(fen)
	# "rnbqkbnrpppppppp................................PPPPPPPPRNBQKBNRwKQkq-.0..1.."
	```

	Step 2: Add [CLS] token and convert to token IDs
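	```python
	# Add the classification token as one vocabulary entry (not five separate characters)
	final_input = list(processed) + ["[CLS]"]  # 77 characters + 1 special token = 78 tokens

	# Convert to token IDs (character-level tokenization; char_to_id is the model's vocabulary)
	tokens = [char_to_id[t] for t in final_input]  # 78 token IDs
	```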

	Complete example:
	```
	Input FEN: "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
	Processed: "rnbqkbnrpppppppp................................PPPPPPPPRNBQKBNRwKQkq-.0..1.."
	With [CLS]: "rnbqkbnrpppppppp................................PPPPPPPPRNBQKBNRwKQkq-.0..1..[CLS]"
	Token IDs: [13, 11, 3, 12, 10, 3, 11, 13, 15, 15, 15, 15, 15, 15, 15, 15, ...] # 78 tokens
	```
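The field padding alone does not guarantee 77 characters for every legal FEN: a halfmove clock of 100 or more takes four characters after padding. The sketch below repeats the FEN processing so that it is self-contained, then enforces the fixed length explicitly. The helper `to_fixed_length` is hypothetical, and padding with `.` and truncating on the right is an assumption; the card only states "Pad/truncate to 77 characters".

```python
import re

FEN_LENGTH = 77  # 64 squares + 1 turn + 4 castling + 2 en passant + 3 + 3 counters


def process_fen(fen):
    """Expand a FEN string into the flat, dot-padded format described above."""
    position, turn, castling, en_passant, halfmove, fullmove = fen.split(" ")
    position = re.sub(r"\d+", lambda m: "." * int(m.group()), position)
    position = position.replace("/", "")
    castling = castling.ljust(4, ".")
    en_passant = en_passant.ljust(2, ".")
    halfmove = halfmove.ljust(2, ".") + "."
    fullmove = fullmove.ljust(3, ".")
    return "".join([position, turn, castling, en_passant, halfmove, fullmove])


def to_fixed_length(processed, length=FEN_LENGTH, pad="."):
    """Pad or right-truncate to the fixed length (truncation side is an assumption)."""
    return processed[:length].ljust(length, pad)


start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
late = "8/8/8/4k3/8/8/4K3/7R w - - 100 150"  # three-digit halfmove clock

for fen in (start, late):
    raw = process_fen(fen)
    fixed = to_fixed_length(raw)
    print(len(raw), len(fixed))
```

Right-truncation in the second case drops the last digit of the fullmove counter, so positions with very long move counters are encoded lossily under this assumption.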

	### Inference

	For in-browser inference, the model is exported to ONNX format:

	```python
	# ONNX export for web deployment
	import torch
	from transformers import AutoModelForSequenceClassification

	model = AutoModelForSequenceClassification.from_pretrained("jrahn/ROOK-CLF-9m")
	dummy_input = torch.randint(0, 88, (1, 78))

	torch.onnx.export(
	    model,
	    dummy_input,
	    "rook_clf_9m.onnx",
	    input_names=['input_ids'],
	    output_names=['logits'],
	    dynamic_axes={'input_ids': {0: 'batch_size'}}
	)
	```
	```
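Whichever runtime produces the `logits` output, turning it into move predictions is a softmax followed by a top-k selection over the 1968 classes. The following is a minimal, dependency-free sketch. The function `top_k_moves` and the four-entry `id2label` table are hypothetical stand-ins; in practice the mapping from class index to UCI move would come from the model's configuration.

```python
import math


def top_k_moves(logits, id2label, k=3):
    """Return the k highest-scoring moves with their softmax probabilities."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract max for stability
    total = sum(exps)
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return [(id2label[i], exps[i] / total) for i in ranked[:k]]


# Toy example with four classes instead of 1968
id2label = {0: "e2e4", 1: "d2d4", 2: "g1f3", 3: "c2c4"}
logits = [2.0, 1.0, 0.5, -1.0]

best = top_k_moves(logits, id2label, k=2)
for move, prob in best:
    print(f"{move}: {prob:.3f}")
```

Because the model scores all move strings regardless of the position, the top prediction is not guaranteed to be legal; checking candidates against a move generator before playing them is a reasonable safeguard.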

	## Limitations

	- Search-free: No lookahead or position evaluation beyond a single move
	- Tactical Weakness: Limited performance on complex tactical sequences
	- Opening Knowledge: Relies on training data distribution for openings
	- Endgame Performance: Weaker in theoretical endgames requiring precise calculation

	## Citation

	If you use this model, please cite both our work and the original paper:

	```bibtex
	@article{rook2024,
	title={ROOK: Strategic Reasoning in Chess Without Search},
	author={Rahn, Jonathan and Jitsev, Jenia and Sun, Qi},
	journal={LAION Research Notes},
	year={2024},
	url={https://laion.ai/notes/rook/}
	}

	@article{ruoss2024grandmaster,
	title={Grandmaster-level chess without search},
	author={Ruoss, Anian and Delétang, Grégoire and McAleese, Nell and Genewein, Tim and Weidinger, Laura and Cai, Matteo and Weber, Théophane and Hutter, Marcus and Legg, Shane},
	journal={arXiv preprint arXiv:2402.04494},
	year={2024}
	}
	```

	## Model Card Contact

	Jonathan Rahn - [GitHub](https://github.com/jorahn) | [Research Page](https://jorahn.github.io/research/)

	## Metrics Source

	LAION research note: https://laion.ai/notes/rook/