Match PPG implementation by dipamc · Pull Request #186 · vwxyzjn/cleanrl

dipamc · 2022-05-18T15:57:14Z

Description

Types of changes

Bug fix
New feature
New algorithm
Documentation

Checklist:

I've read the CONTRIBUTION guide (required).
I have ensured pre-commit run --all-files passes (required).
I have updated the documentation and previewed the changes via mkdocs serve.
I have updated the tests accordingly (if applicable).

If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See #137 as an example PR.

vercel · 2022-05-18T15:57:19Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Updated |
| --- | --- | --- | --- |
| cleanrl | ✅ Ready (Inspect) | Visit Preview | May 28, 2022 at 0:09AM (UTC) |

gitpod-io · 2022-05-18T15:57:19Z

vwxyzjn

This is massive! Thanks for making this PR.

I think that before any additional changes, our next step is to establish a great baseline. We should compare our results against the original results; otherwise, it's hard to attest to the quality of our PPG implementation, given that we included divergent changes such as using shared optimizers.

The original paper ran experiments using the hard distribution mode, so we might have to re-run them. I made a fork available at https://github.com/openrlbenchmark/phasic-policy-gradient to run tracked experiments. Unfortunately, I was not able to run them due to insufficient GPU memory... Would you mind giving it a try? The benchmark commands are at https://github.com/openrlbenchmark/phasic-policy-gradient/blob/add-wandb/benchmark.sh

cleanrl/ppg_procgen.py

docs/rl-algorithms/ppg.md

vwxyzjn · 2022-05-27T02:53:30Z

docs/rl-algorithms/ppg.md

+    * Original PPO used orthogonal initialization of only the Policy head and Value heads with scale of 0.01 and 1. respectively.
+    * For PPG
+        * All weights are initialized with the default torch initialization (Kaiming Uniform)
+        * Each layer’s weights are divided by the L2 norm of the weights along the (which axis?), and multiplied by a scale factor.


Please clarify "which axis" here.
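
For readers following this thread, the two initialization schemes described in the quoted docs can be sketched in PyTorch roughly as follows. This is a minimal illustration, not the code from this PR: the helper names, layer sizes, and scale values are hypothetical, and the normalization axis is left as a parameter because the quoted text does not state it. The `norm_dim` values passed at the bottom (each output unit normalized over its input dimensions) are an assumption.

```python
import torch
import torch.nn as nn


def layer_init_orthogonal(layer: nn.Module, std: float = 1.0) -> nn.Module:
    """PPO-style init: orthogonal weights scaled by `std`, zero bias."""
    nn.init.orthogonal_(layer.weight, gain=std)
    nn.init.zeros_(layer.bias)
    return layer


def layer_init_normed(layer: nn.Module, norm_dim, scale: float = 1.0) -> nn.Module:
    """PPG-style init: keep torch's default init, then rescale the weights so
    that their L2 norm taken over `norm_dim` equals `scale`; zero the bias."""
    with torch.no_grad():
        norm = layer.weight.norm(p=2, dim=norm_dim, keepdim=True)
        layer.weight.mul_(scale / norm)
        layer.bias.zero_()
    return layer


# ASSUMPTION: normalize each output unit over its input dimensions, i.e.
# dim=1 for Linear weights (out, in) and dims (1, 2, 3) for Conv2d weights
# (out, in, kH, kW). Sizes and scales below are illustrative only.
policy_head = layer_init_normed(nn.Linear(256, 15), norm_dim=1, scale=0.1)
conv = layer_init_normed(nn.Conv2d(3, 16, 3, padding=1), norm_dim=(1, 2, 3), scale=1.0)

# PPO-style heads for comparison, using the scales quoted above (0.01 and 1).
ppo_policy_head = layer_init_orthogonal(nn.Linear(256, 15), std=0.01)
ppo_value_head = layer_init_orthogonal(nn.Linear(256, 1), std=1.0)
```

Taking `norm_dim` as an argument keeps the axis choice explicit at every call site, which is the point the review comment raises.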

* added nit changes from ppg code
* change observation buffer to uint8
* sample full rollouts
* minor device fix
* update optimizer settings
* add ppg documentation
* update mkdocs
* update images to png for codespell errors
* trigger CI
* Minor format change
* format by running `pre-commit`
* removes trailing space
* Add an extra note
* argument names and documentation changes
* add capture video
* add experiment report
* Update documentation
* Quick css fix
* Update documentation
* Fix documentation for PPO
* Add benchmark commands
* Add benchmark commands
* add metrics section
* Add more docs
* Quick fix on ddpg docs
* Add procgen test cases
* Update CI
* test CI
* test ci
* Update tests
* normalization axis documentation

Co-authored-by: Dipam Chakraborty <dipam@aicrowd.com>
Co-authored-by: Costa Huang <costa.huang@outlook.com>

dipamc and others added 8 commits May 14, 2022 11:13

added nit changes from ppg code

419041d

change observation buffer to uint8

2e1190b

sample full rollouts

86f5be7

minor device fix

beff293

update optimizer settings

4cb85d5

add ppg documentation

d6ee26b

update mkdocs

fea4531

update images to png for codespell errors

20f15da

vercel bot deployed to Preview May 18, 2022 15:57 View deployment

trigger CI

6c3cb05

vercel bot deployed to Preview May 18, 2022 22:08 View deployment

Minor format change

631ab96

vercel bot deployed to Preview May 18, 2022 22:13 View deployment

format by running pre-commit

d961d0f

vercel bot deployed to Preview May 18, 2022 22:16 View deployment

removes trailing space

4cff11d

vercel bot deployed to Preview May 18, 2022 22:17 View deployment

vwxyzjn reviewed May 18, 2022

cleanrl/ppg_procgen.py (outdated, resolved)

cleanrl/ppg_procgen.py (outdated, resolved)

cleanrl/ppg_procgen.py (outdated, resolved)

docs/rl-algorithms/ppg.md (outdated, resolved)

Add an extra note

fb9c832

vercel bot deployed to Preview May 19, 2022 02:14 View deployment

argument names and documentation changes

31bb5c4

vercel bot deployed to Preview May 23, 2022 16:42 View deployment

add capture video

ed66604

vercel bot deployed to Preview May 23, 2022 17:00 View deployment

add experiment report

1610191

vercel bot deployed to Preview May 25, 2022 13:59 View deployment

Merge branch 'master' into ppg-dev

51c6aac

vercel bot deployed to Preview May 27, 2022 02:16 View deployment

Fix documentation for PPO

9c4edf8

vercel bot deployed to Preview May 27, 2022 02:54 View deployment

Add benchmark commands

23cd48e

vercel bot deployed to Preview May 27, 2022 02:56 View deployment

Add benchmark commands

8e4f977

vercel bot deployed to Preview May 27, 2022 02:57 View deployment

add metrics section

72e8cce

vercel bot deployed to Preview May 27, 2022 07:56 View deployment

Add more docs

aa695c1

vercel bot deployed to Preview May 27, 2022 14:47 View deployment

Quick fix on ddpg docs

0564584

vercel bot deployed to Preview May 27, 2022 14:48 View deployment

vwxyzjn added 2 commits May 27, 2022 10:59

Add procgen test cases

a08039e

Update CI

31a175c

vercel bot deployed to Preview May 27, 2022 15:00 View deployment

test CI

f063a7b

vercel bot deployed to Preview May 27, 2022 15:03 View deployment

test ci

60df2c8

vercel bot deployed to Preview May 27, 2022 15:03 View deployment

Update tests

e70c71a

vercel bot deployed to Preview May 27, 2022 17:53 View deployment

normalization axis documentation

6ebaaae

vercel bot deployed to Preview May 28, 2022 12:09 View deployment

vwxyzjn approved these changes May 28, 2022

vwxyzjn merged commit eba6452 into vwxyzjn:master May 28, 2022

This was referenced Jun 20, 2022

Adding Average Reward PPO proposal #210

Closed

Prototype TD3 with JAX #216

Closed

JAX Integration with CleanRL #218

Closed

vwxyzjn merged 32 commits into vwxyzjn:master from dipamc:ppg-dev


2 participants
