Chess AI Through Language Models: Strategic Reasoning Without Search

Jonathan Rahn | AI Lab Lead, Drees & Sommer | GitHub | HuggingFace

Research Overview

This work explores transformer-based strategic reasoning through chess as a testbed, demonstrating that language models can develop sophisticated game-playing capabilities without traditional search algorithms. In collaboration with LAION, we’ve developed a progression of models that challenge fundamental assumptions about how AI systems learn strategic thinking.

The core hypothesis: complex strategic reasoning can emerge from next-token prediction when models are trained on appropriately structured strategic data.
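As a concrete (if simplified) illustration of this framing, the sketch below serializes a position as plain text and asks a causal language model for the continuation. The model id is a placeholder and the `P:` prompt schema is an assumption for illustration, not necessarily the project's exact format.

```python
# Minimal sketch: chess move prediction as next-token prediction.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "gpt2"  # placeholder; the ROOK/RookWorld models are GPT-2-sized checkpoints

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# A position is just text (FEN); the model continues the string.
prompt = "P: rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"

inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(
    **inputs,
    max_new_tokens=16,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:]))
```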


The ROOK Project Evolution

RookWorld-RLVR (Current)

Active development integrating Reinforcement Learning with Verifiable Rewards (RLVR), using Group Relative Policy Optimization (GRPO), to strengthen reasoning capabilities.
  • Repo (Transformers & TRL): jorahn/RookWorld-TRL
  • Repo (PyTorch): jorahn/RookWorld-RLVR
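To illustrate what "verifiable" means here: rewards can be computed programmatically from the rules of chess (via the python-chess library) rather than from a learned reward model. The reward values below are an illustrative shaping scheme, not the project's actual one.

```python
# Sketch of a rule-based, verifiable reward for GRPO-style post-training.
import chess

def move_reward(fen: str, move_uci: str) -> float:
    """Score a generated move for a given position using only the game rules."""
    board = chess.Board(fen)
    try:
        move = chess.Move.from_uci(move_uci.strip())
    except ValueError:
        return -1.0            # unparseable model output
    if move not in board.legal_moves:
        return -1.0            # parseable but illegal in this position
    board.push(move)
    if board.is_checkmate():
        return 1.0             # verifiably optimal outcome
    return 0.1                 # legal move: small positive signal

print(move_reward("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1", "e2e4"))
```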

RookWorld-LM (124M params) - Unified Agent+Environment

Key breakthrough: Unified chess policy and world model in a single transformer architecture.
Post: ROOK: REASONING OVER ORGANIZED KNOWLEDGE

  • Collaboration: Jenia Jitsev (LAION/JSC), Qi Sun (Tokyo Tech/Sakana AI)
  • Multi-task Performance:
    • 32.1% Checkmate-in-One accuracy (vs ChessGPT-Base 26.5%)
    • 99.9% environment simulation accuracy
    • 26.2% overall action accuracy
  • Model: RookWorld-LM 124M
  • Dataset: rookworld_7m
  • Significance: Enables closed-loop self-play without external engines (see the sketch after this list)
  • Interactive Demo: RookWorld Space
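The closed-loop idea is easiest to see in pseudocode: the same transformer serves as both the policy (choose a move) and the environment (predict the next state). Below, `generate` is a hypothetical text-in/text-out wrapper, and the `P:`/`A:` prompt prefixes follow the spirit of the RookWorld schema but may not match it exactly.

```python
# Closed-loop self-play sketch: ONE model plays both roles.
def self_play(generate, start_fen: str, max_plies: int = 40) -> list:
    """Alternate policy and environment prompts until the episode ends."""
    fen, history = start_fen, []
    for _ in range(max_plies):
        move = generate(f"P: {fen}")                 # policy role: choose a move
        next_state = generate(f"A: {fen}+{move}+")   # environment role: next state
        if next_state is None:                       # hypothetical termination signal
            break
        history.append((fen, move))
        fen = next_state
    return history
```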

ROOK-LM (124M params) - Chain-of-Thought Integration

Implementation of Chain-of-Thought reasoning for chess: each sample walks through position analysis → candidate evaluation → move selection, as illustrated below.
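For illustration, a ROOK-style training sample might look like the string below. The field markers and values are loosely modeled on the published examples and are not byte-exact reproductions of the dataset format.

```python
# Illustrative chain-of-thought sample: position -> candidates -> evals -> move.
sample = (
    "P: r1bqkbnr/pppp1ppp/2n5/4p3/2B1P3/5N2/PPPP1PPP/RNBQK2R b KQkq - 3 3 "
    "M: g8f6 f8c5 d7d6 "   # candidate moves (UCI)
    "E: -0.3 -0.2 -0.5 "   # engine evaluations per candidate
    "B: f8c5"              # selected best move
)
```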

  • Dataset: rook_40m (6B tokens, generated on Tsubame 4.0)
  • Architecture: GPT-2 with custom chess tokenization
  • Performance: 22.2% action accuracy with comprehensive reasoning traces
  • Technical Details: LAION Research Note

ROOK-CLF (9M params) - Classification Approach

Reproduction of Google DeepMind’s “Grandmaster-Level Chess Without Search” methodology using a LLaMA-based decoder.
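The classification framing is sketched below: the position (FEN) is encoded as a fixed-length token sequence, and the network predicts a move as a single class over a fixed move vocabulary. A generic transformer stands in for the LLaMA-style decoder, and the sizes (including the 1968-move vocabulary) are illustrative.

```python
# Sketch: searchless chess as single-label classification over moves.
import torch
import torch.nn as nn

NUM_MOVES = 1968                     # fixed UCI move vocabulary (illustrative)
VOCAB, SEQ_LEN, DIM = 128, 80, 256   # char-level FEN tokens, toy dimensions

class MoveClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        layer = nn.TransformerEncoderLayer(DIM, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(DIM, NUM_MOVES)

    def forward(self, fen_tokens):                 # (batch, SEQ_LEN)
        h = self.encoder(self.embed(fen_tokens))   # (batch, SEQ_LEN, DIM)
        return self.head(h[:, -1])                 # logits over the move vocabulary

logits = MoveClassifier()(torch.randint(0, VOCAB, (2, SEQ_LEN)))
print(logits.shape)  # torch.Size([2, 1968])
```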

  • Performance: 49% action accuracy, 57% on Checkmate-in-One
  • Achievement: Demonstrated searchless chess AI feasibility with minimal parameters
  • Model: Available on HuggingFace

YoloChess (2022) - Foundation Work

Initial exploration using BERT-based position evaluation with custom FEN encoders. Established baseline performance and identified key challenges in chess representation for transformer architectures.


Technical Contributions

Novel Architectures

  • Unified world modeling: Simultaneous policy and environment simulation in transformers
  • Strategic tokenization: Custom representations for structured game states (illustrated after this list)
  • Multi-task scaling: Consistent performance improvements with unified training objectives
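A minimal character-level tokenizer for FEN strings, illustrating the kind of custom game-state tokenization described above. This is a sketch of the general idea, not the exact scheme used in the models.

```python
# Character-level FEN tokenization sketch: every FEN symbol gets a stable id.
FEN_ALPHABET = "pnbrqkPNBRQK/12345678acdefgh w-09"
STOI = {ch: i for i, ch in enumerate(FEN_ALPHABET)}

def encode_fen(fen: str, pad_to: int = 80) -> list:
    """Map each FEN character to an id and pad with spaces to a fixed length."""
    ids = [STOI[c] for c in fen]
    return ids + [STOI[" "]] * (pad_to - len(ids))

print(encode_fen("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1")[:12])
```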

Dataset Engineering

  • Large-scale annotation: 40M+ positions annotated with Stockfish 16.1 on supercomputing infrastructure (see the sketch after this list)
  • Multi-format datasets: Support for classification, autoregressive, and multi-task learning
  • Reproducible pipelines: Full data generation code and methodology documentation
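A minimal sketch of the annotation step using python-chess's UCI engine interface: score the top candidate moves for a position with Stockfish. The engine path, search depth, and number of candidates are assumptions; the actual pipeline ran Stockfish 16.1 at scale on HPC infrastructure.

```python
# Annotate one position with engine evaluations for its top-k candidate moves.
import chess
import chess.engine

def annotate(fen: str, engine_path: str = "stockfish", top_k: int = 5):
    """Requires a Stockfish binary on PATH (or pass an explicit path)."""
    board = chess.Board(fen)
    with chess.engine.SimpleEngine.popen_uci(engine_path) as engine:
        infos = engine.analyse(board, chess.engine.Limit(depth=12), multipv=top_k)
        # Each info dict holds a principal variation and a score for one candidate.
        return [(info["pv"][0].uci(),
                 info["score"].white().score(mate_score=10000))
                for info in infos]

print(annotate("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"))
```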

Open Science Impact

All models, datasets, and code are publicly available, contributing to the democratization of strategic AI research.


Research Context

Background spans neuro-informatics (University of Lübeck), games-industry applications, business economics & management (Witten/Herdecke University, IPADE Mexico DF), and AI/ML consulting. Active contributor to the HuggingFace ecosystem (transformers, datasets, evaluate) and to open-source frameworks including keras-rl and custom implementations such as keras-wide-n-deep. Currently at Drees & Sommer, building the AI Lab and exploring applications in construction and real-estate optimization.


Research Implications

The RookWorld results suggest that:

  1. Search-free strategic AI is viable with appropriate training data
  2. Unified architectures can efficiently handle multiple strategic reasoning tasks
  3. Chain-of-thought training improves both performance and interpretability
  4. Language model paradigms apply effectively to structured strategic domains

These findings have implications beyond chess for any domain requiring sequential decision-making under uncertainty.
