Welcome to the official repo for Open-Code-Zero! We pioneered the replication of DeepSeek-R1-Zero's self-reflective reasoning in the pure-code domain, using minimalist RL (no math data!) on a 7B coder model. Key discoveries:
- 🚀 Emergent Long-CoT: Sophisticated self-correction emerges with just 15k problems and ~600 training steps.
- 🧠 System-2 Awakening: Chaotic "quick thinking" gives way to structured, critical analysis as training stabilizes.
- 💻 Coder Advantage: Code-specialized models avoid language-switching instability seen in general LLMs, enabling cleaner reasoning.
- 🫢 First to prove that code-domain LLMs can intrinsically evolve DeepSeek-R1-Zero-style reasoning without math. Dive in for paradigm-shifting examples!
We find that with a simple outcome-based reward (see the sketch below), the model gradually and naturally adopts a more sophisticated reasoning pattern during training. Coming in just a few days! Stay tuned and star our repo if you are interested :)
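In the meantime, here is a minimal sketch of what an outcome-based reward for code can look like: the generated solution is executed against hidden tests and receives a binary score, while the chain of thought itself is never graded. The `<answer>` tag convention, the function name, and the subprocess sandbox below are illustrative assumptions, not the released training code.

```python
import os
import re
import subprocess
import sys
import tempfile


def outcome_reward(completion: str, test_code: str, timeout: float = 5.0) -> float:
    """Binary outcome reward: 1.0 if the extracted solution passes the hidden
    tests, 0.0 otherwise. Only the final outcome is scored, not the reasoning."""
    # Take the last <answer>...</answer> span as the final program (an assumed
    # prompt convention; use whatever answer format your template enforces).
    answers = re.findall(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if not answers:
        return 0.0
    solution = answers[-1]

    # Run the solution together with the tests in a fresh subprocess so a
    # crash or infinite loop cannot take down the training loop.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(solution + "\n\n" + test_code)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path], capture_output=True, timeout=timeout
        )
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0
    finally:
        os.unlink(path)
```

This single scalar is all the policy-gradient trainer ever sees; in our runs, the longer and more self-critical reasoning traces emerge without being rewarded directly.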