This is massive! Thanks for making this PR.
I think before any additional changes, our next step is to establish a great baseline. We should compare our results against the original results; otherwise, it's hard to attest to the quality of our PPG implementation, given that we included divergent changes such as using shared optimizers.
The original paper ran experiments using the hard distribution mode, so we might have to re-run them. I made a fork available at https://github.com/openrlbenchmark/phasic-policy-gradient to run tracked experiments. Unfortunately, I was not able to run them due to insufficient GPU memory. Would you mind giving it a try? The benchmark commands are at https://github.com/openrlbenchmark/phasic-policy-gradient/blob/add-wandb/benchmark.sh
docs/rl-algorithms/ppg.md
Outdated
* Original PPO used orthogonal initialization of only the Policy head and Value heads with scale of 0.01 and 1. respectively.
* For PPG
* All weights are initialized with the default torch initialization (Kaiming Uniform)
* Each layer’s weights are divided by the L2 norm of the weights along the (which axis?), and multiplied by a scale factor.
Please clarify "which axis" here.
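To make the question concrete, here is a minimal sketch of one possible reading, in which the norm is taken per output unit, i.e. over each row of a 2D weight matrix. That axis choice is an assumption for illustration only, and it is exactly what the docs should state. The helper name `normalize_rows` is hypothetical, and plain Python lists stand in for the torch tensors.

```python
import math


def normalize_rows(weights, scale=1.0):
    """Rescale each row of a 2D weight matrix so its L2 norm equals `scale`.

    Assumes the norm is taken per row (one row per output unit) and that
    no row is all zeros; both are assumptions for illustration.
    """
    normed = []
    for row in weights:
        norm = math.sqrt(sum(w * w for w in row))  # L2 norm of this row
        normed.append([w * scale / norm for w in row])
    return normed


# Two rows with different magnitudes end up with the same norm.
weights = [[3.0, 4.0], [6.0, 8.0]]
normed = normalize_rows(weights, scale=0.1)
row_norms = [math.sqrt(sum(w * w for w in row)) for row in normed]
```

Under this reading every row ends up with L2 norm equal to the scale factor regardless of its initial magnitude. Normalizing along a different axis would give different weights, which is why the docs need to name the axis.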
* added nit changes from ppg code
* change observation buffer to uint8
* sample full rollouts
* minor device fix
* update optimizer settings
* add ppg documentation
* update mkdocs
* update images to png for codespell errors
* trigger CI
* Minor format change
* format by running `pre-commit`
* removes trailing space
* Add an extra note
* argument names and documentation changes
* add capture video
* add experiment report
* Update documentation
* Quick css fix
* Update documentation
* Fix documentation for PPO
* Add benchmark commands
* Add benchmark commands
* add metrics section
* Add more docs
* Quick fix on ddpg docs
* Add procgen test cases
* Update CI
* test CI
* test ci
* Update tests
* normalization axis documentation

Co-authored-by: Dipam Chakraborty <dipam@aicrowd.com>
Co-authored-by: Costa Huang <costa.huang@outlook.com>
Description
Types of changes
Checklist:
`pre-commit run --all-files` passes (required).
`mkdocs serve`.
If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See #137 as an example PR.
`--capture-video` flag toggled on (required).
`mkdocs serve`.
(`width=500` and `height=300`).