Reached 2000+ Elo after 4 hours of supervised training followed by 8 hours of self-play RL
Hardware: AMD MI3000 x8
Features:
- Stockfish distillation into the policy network and value network
- Self-play mechanism that trains the policy and value networks
- Evaluation against Stockfish levels
- Data generation from Stockfish with multi-core processing
- Quantization of the model weights from fp32 to fp16 for running any trained bot
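
The fp32 to fp16 step in the last feature can be illustrated with a minimal sketch. It is not the project's actual code: it uses only Python's standard `struct` module (format `e` is IEEE 754 half precision) instead of a deep learning framework, and the function names and sample weights are hypothetical. It shows that the cast halves storage and reports the rounding error it introduces.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_fp16(x: float) -> float:
    """Round to the nearest IEEE 754 half-precision value.

    struct raises OverflowError for magnitudes outside the fp16 range.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


def cast_weights_to_fp16(weights):
    """Return (fp16_bytes, max_abs_error) for a flat list of weights.

    The weights are first rounded to fp32 so the error is measured
    against what an fp32 checkpoint would actually hold.
    """
    fp32 = [to_fp32(w) for w in weights]
    packed = struct.pack(f"<{len(fp32)}e", *fp32)       # 2 bytes per weight
    restored = struct.unpack(f"<{len(fp32)}e", packed)  # values after the cast
    max_err = max((abs(a - b) for a, b in zip(fp32, restored)), default=0.0)
    return packed, max_err


if __name__ == "__main__":
    weights = [0.1, -0.25, 1.0, 3.14159, 1e-3]  # hypothetical sample values
    packed, max_err = cast_weights_to_fp16(weights)
    print(f"fp32 size: {4 * len(weights)} bytes, fp16 size: {len(packed)} bytes")
    print(f"max abs rounding error: {max_err:.6g}")
```

Values that are exactly representable in fp16 (such as 1.0 or -0.25) survive the cast unchanged; others are rounded to the nearest representable value. Comparing bot strength before and after the cast is a reasonable sanity check.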