Open-Code-Zero

Welcome to the official repo for Open-Code-Zero! We pioneered the replication of DeepSeek-R1-Zero's self-reflective reasoning in pure code, using minimalist RL (no math data!) on a 7B coder model. Key findings:

  • 🚀 Emergent Long-CoT: Achieved sophisticated self-correction with just 15k problems and ~600 training steps.
  • 🧠 System-2 Awakening: Chaotic "quick thinking" gives way to structured, critical analysis as training stabilizes.
  • 💻 Coder Advantage: Code-specialized models avoid language-switching instability seen in general LLMs, enabling cleaner reasoning.
  • 🫢 First to show that code-domain LLMs can intrinsically evolve DeepSeek-R1-Zero-style reasoning without math data. Dive in for the examples!

Training Settings & Code

We find that with a simple outcome-based reward, the model gradually learns to adopt a more sophisticated reasoning pattern during training. The training settings and code are coming in just a few days! Stay tuned and star our repo if you are interested :)
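
Since the training code has not been released yet, the sketch below is only an illustration of what a binary outcome-based reward for code could look like, assuming correctness is judged by executing the model's generated code against unit tests. The function names, code-extraction regex, and test format are hypothetical and not taken from this repo.

```python
# Minimal sketch of a binary outcome-based reward for code RL (illustrative only).
import re
import subprocess
import tempfile


def extract_code(completion: str) -> str | None:
    """Pull the last fenced code block out of a model completion."""
    blocks = re.findall(r"```(?:python)?\n(.*?)```", completion, re.DOTALL)
    return blocks[-1] if blocks else None


def outcome_reward(completion: str, test_code: str, timeout: float = 10.0) -> float:
    """Return 1.0 if the generated code passes the tests, else 0.0."""
    code = extract_code(completion)
    if code is None:
        return 0.0  # no runnable code block found
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n\n" + test_code)  # append assert-style tests after the solution
        path = f.name
    try:
        result = subprocess.run(["python", path], capture_output=True, timeout=timeout)
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0  # treat hangs / slow code as failure
```

A sparse pass/fail signal like this supervises only the final outcome, leaving the reasoning process itself unconstrained, which matches the outcome-based setup described above.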
