Welcome to the official repo for Open-Code-Zero! We pioneered the replication of DeepSeek-R1-Zero's self-reflective reasoning in the pure-code domain, using minimalist RL (no math data!) on a 7B coder model. Key discoveries:
- 🚀 Emergent Long-CoT: Sophisticated self-correction emerges with just 15k problems and ~600 training steps.
- 🧠 System-2 Awakening: Chaotic "quick thinking" gives way to structured, critical analysis as training stabilizes.
- 💻 Coder Advantage: Code-specialized models avoid language-switching instability seen in general LLMs, enabling cleaner reasoning.
- 🫢 First to prove that code-domain LLMs can intrinsically evolve DeepSeek-R1-Zero-style reasoning without math. Dive in for paradigm-shifting examples!
We find that with a simple outcome-based reward (see the sketch below), the model gradually and naturally adopts a more sophisticated reasoning pattern during training. Coming in just a few days! Stay tuned and star our repo if you are interested :)
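In the meantime, here is a minimal sketch of what an outcome-based reward for code can look like: the generated solution is executed against hidden tests and receives a binary score, while the chain of thought itself is never graded. The `<answer>` tag convention, the function name, and the subprocess sandbox below are illustrative assumptions, not the released training code.

```python
import os
import re
import subprocess
import sys
import tempfile


def outcome_reward(completion: str, test_code: str, timeout: float = 5.0) -> float:
    """Binary outcome reward: 1.0 if the extracted solution passes the hidden
    tests, 0.0 otherwise. Only the final outcome is scored, not the reasoning."""
    # Take the last <answer>...</answer> span as the final program (an assumed
    # prompt convention; use whatever answer format your template enforces).
    answers = re.findall(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if not answers:
        return 0.0
    solution = answers[-1]

    # Run the solution together with the tests in a fresh subprocess so a
    # crash or infinite loop cannot take down the training loop.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(solution + "\n\n" + test_code)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path], capture_output=True, timeout=timeout
        )
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0
    finally:
        os.unlink(path)
```

This single scalar is all the policy-gradient trainer ever sees; in our runs, the longer and more self-critical reasoning traces emerge without being rewarded directly.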