SAC Documentation - Benchmarks - Minor code tweaks by dosssman · Pull Request #146 · vwxyzjn/cleanrl

dosssman · 2022-03-23T05:12:08Z

Description

SAC documentation prototype
SAC qf_loss computation: removed the /2 gradient scaling so that .backward() is more aligned with the theory. Instead, qf_loss = (qf1_loss + qf2_loss) / 2 is logged, for meaningful comparison with algorithms that use a single Q-value network.
The same is done in OpenAI Spinning Up, for example.
Added benchmark instructions to run SAC on MuJoCo and PyBullet environments
Added SAC runs for 6 continuous control envs (3 MuJoCo, 3 PyBullet) to cleanrl and corresponding openrlbenchmark reports.
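
The qf_loss change listed above can be sketched without any framework. This is a minimal illustration under stated assumptions, not the code in sac_continuous_action.py: the helper names `mse`, `mse_grad`, and `critic_losses` are hypothetical, and plain Python lists stand in for tensors.

```python
def mse(predictions, targets):
    """Mean squared error between one critic's Q-values and the shared targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def mse_grad(predictions, targets):
    """Analytic gradient of mse() with respect to each prediction."""
    n = len(targets)
    return [2.0 * (p - t) / n for p, t in zip(predictions, targets)]


def critic_losses(q1_values, q2_values, targets):
    """Return (loss to optimize, loss to log) for a pair of critics."""
    qf1_loss = mse(q1_values, targets)
    qf2_loss = mse(q2_values, targets)
    # Optimized quantity: plain sum, so each critic receives the gradient
    # of its own MSE with no extra 1/2 factor.
    backward_loss = qf1_loss + qf2_loss
    # Logged quantity: average of the two critics, on the same scale as the
    # loss of an algorithm with a single Q-network.
    logged_loss = backward_loss / 2
    return backward_loss, logged_loss


backward_loss, logged_loss = critic_losses([1.0, 3.0], [2.0, 1.0], [0.0, 1.0])
print(backward_loss, logged_loss)
```

Because the second critic's loss does not depend on the first critic's outputs, the gradient of the summed loss with respect to `q1_values` is exactly `mse_grad(q1_values, targets)`. Dividing by 2 before the backward pass would halve that gradient, which is the scaling this PR removes.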

Types of changes

Checklist:

I've read the CONTRIBUTION guide (required).
I have ensured pre-commit run --all-files passes (required).
I have updated the documentation and previewed the changes via mkdocs serve.
~~I have updated the tests accordingly (if applicable).~~

If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See #137 as an example PR.

vercel · 2022-03-23T05:12:12Z

This pull request is being automatically deployed with Vercel (learn more).
To see the status of your deployment, click below or on the icon next to each commit.

🔍 Inspect: https://vercel.com/vwxyzjn/cleanrl/rnPXKTft8t6hLfegqbnNkwBpG7W8
✅ Preview: https://cleanrl-git-fork-dosssman-sac-docs-vwxyzjn.vercel.app


dosssman · 2022-03-29T14:08:33Z

Hello there.

While I think this branch should be ready for review, it could not pass the pre-commit check, apparently because of a problem with the black dependency. @vwxyzjn, have you encountered such a problem recently?

isort....................................................................Passed
autoflake................................................................Passed
black....................................................................Failed
- hook id: black
- exit code: 1

Traceback (most recent call last):
  File "/home/d055/.cache/pre-commit/repoverpfvk2/py_env-python3.8/bin/black", line 8, in <module>
    sys.exit(patched_main())
  File "/home/d055/.cache/pre-commit/repoverpfvk2/py_env-python3.8/lib/python3.8/site-packages/black/__init__.py", line 1423, in patched_main
    patch_click()
  File "/home/d055/.cache/pre-commit/repoverpfvk2/py_env-python3.8/lib/python3.8/site-packages/black/__init__.py", line 1409, in patch_click
    from click import _unicodefun
ImportError: cannot import name '_unicodefun' from 'click' (/home/d055/.cache/pre-commit/repoverpfvk2/py_env-python3.8/lib/python3.8/site-packages/click/__init__.py)

codespell................................................................Passed

vwxyzjn · 2022-03-29T15:01:48Z

Looks good on my end:

psf/black#2964

vwxyzjn · 2022-03-29T15:14:19Z

Fixed with psf/black#2964
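
For anyone hitting the same traceback: as understood from psf/black#2964, click 8.1.0 dropped the private `_unicodefun` module that older Black releases import, and Black 22.3.0 is the release that works with it. The sketch below encodes those two version boundaries as an assumption; the helper names are hypothetical and it is only a rough diagnostic, not part of this PR.

```python
from importlib import metadata


def parse_version(text):
    """Turn '22.3.0' into (22, 3, 0), stopping at the first non-numeric part."""
    parts = []
    for part in text.split(".")[:3]:
        if not part.isdigit():
            break
        parts.append(int(part))
    return tuple(parts)


def is_affected(black_version, click_version):
    """Assumed rule: click >= 8.1.0 combined with black < 22.3.0 fails on import."""
    return (
        parse_version(click_version) >= (8, 1, 0)
        and parse_version(black_version) < (22, 3, 0)
    )


def installed_pair_is_affected():
    """Check the current environment; returns None if either package is missing."""
    try:
        return is_affected(metadata.version("black"), metadata.version("click"))
    except metadata.PackageNotFoundError:
        return None


print(installed_pair_is_affected())
```

Note that pre-commit runs each hook in its own cached environment (the `py_env-python3.8` path in the traceback), so the check only reflects the environment it is run in. Bumping the pinned Black revision in the pre-commit configuration is the usual remedy.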

vwxyzjn · 2022-03-29T20:44:13Z

Thank you @dosssman. This is a really high-quality benchmark. I have asked @ikostrikov, who maintains https://github.com/ikostrikov/jaxrl, to help review this PR. Thanks @ikostrikov!

vwxyzjn

One other thing. Would you mind customizing the chart a bit like the other charts in the benchmark? Use CleanRL's sac_continuous_action.py instead of exp_name: sac_continuous_action for the legend. I'd also change the line color to red for consistency.

Everything else looks good :) Feel free to merge once you have a chance to address these.

vwxyzjn · 2022-04-04T20:16:35Z

docs/rl-algorithms/sac.md

+
+## Overview
+
+The Soft Actor-Critic (SAC) algorithm extends the DDPG algorithms by 1) using a stochastic policy, which in theory can express multi-modal optimal policies.


DDPG algorithms > DDPG algorithm

Thanks. Fixed.


dosssman · 2022-04-05T03:46:35Z

Change the legend of the SAC experiments
Change the color of the SAC experiments in the standalone Mujoco and PyBullet reports
Re-export the SAC plots with the red color
Fixed the "DDPG algorithms" typo
Mention global gradient clipping


dosssman · 2022-04-08T06:53:35Z

Oh man, I finally found the last reference for SAC that I was looking for but could not remember: pranz24/pytorch-soft-actor-critic.

vwxyzjn

It looks really good now. Thank you @dosssman and feel free to merge.

dosssman · 2022-04-09T01:36:35Z

Thanks for the review. Merging then.

* Preliminary work on the SAC docs and clenarl openbenchmark
* Updated the instructions for benchmark script runs
* Fixed typos and formatting
* SAC docs added Wandb Iframe for PyBullet and MuJoCo; command line for SAC with fixed $\alpha$
* Finalized complete, sac.md doc draft, added images of learning curves
* Typo and formulation tweaks
* Fixed the autospell detected typo
* Update pre-commit file
* Fix weird github action error: psf/black#2964
* Typo fix
* Follow up on change requests after review.\ntypo fix -- legend and polot color changes -- mentions global gradient clipping
* Fleshed out the reparam. trick for action sampling, removed the TODO, added VAE paper citation
* added pranz24 reference to sac docs
* Added pranz24 reference and licenses

Co-authored-by: Costa Huang <costa.huang@outlook.com>

dosssman added 2 commits March 23, 2022 13:25

Preliminary work on the SAC docs and clenarl openbenchmark

b721759

Updated the instructions for benchmark script runs

8b88bd4

vercel bot deployed to Preview March 23, 2022 05:12 View deployment

Fixed typos and formatting

533a051

vercel bot deployed to Preview March 23, 2022 05:15 View deployment

SAC docs added Wandb Iframe for PyBullet and MuJoCo; command line for SAC with fixed $\alpha$

69f75ce

vercel bot deployed to Preview March 23, 2022 05:21 View deployment

vwxyzjn mentioned this pull request Mar 23, 2022

Refactor documentation #121

Closed

10 tasks

dosssman added 2 commits March 29, 2022 22:45

Finalized complete, sac.md doc draft, added images of learning curves

a7a6fc8

Typo and formulation tweaks

94b5301

vercel bot deployed to Preview March 29, 2022 13:50 View deployment

Fixed the autospell detected typo

c763e18

vercel bot deployed to Preview March 29, 2022 14:08 View deployment

dosssman marked this pull request as ready for review March 29, 2022 14:08

dosssman requested a review from vwxyzjn March 29, 2022 14:08

Update pre-commit file

d311c1f

vercel bot deployed to Preview March 29, 2022 15:01 View deployment

Fix weird github action error: psf/black#2964

6a78a34

vercel bot deployed to Preview March 29, 2022 15:04 View deployment

Typo fix

ab30b20

vercel bot deployed to Preview March 31, 2022 06:16 View deployment

vwxyzjn requested changes Apr 4, 2022


Follow up on change requests after review.\ntypo fix -- legend and polot color changes -- mentions global gradient clipping

456b0c1

vercel bot deployed to Preview April 5, 2022 03:46 View deployment

dosssman requested a review from vwxyzjn April 5, 2022 03:47

Fleshed out the reparam. trick for action sampling, removed the TODO, added VAE paper citation

f68e5cf

vercel bot deployed to Preview April 5, 2022 03:56 View deployment

dosssman added 2 commits April 8, 2022 15:45

Pulled recent changes to master to include sac license

04d2c47

added pranz24 reference to sac docs

5359826

vercel bot deployed to Preview April 8, 2022 06:47 View deployment

Added pranz24 reference and licenses

9b6b158

vercel bot deployed to Preview April 8, 2022 06:50 View deployment

vwxyzjn approved these changes Apr 8, 2022


dosssman merged commit 9428ce6 into vwxyzjn:master Apr 9, 2022

dosssman deleted the sac-docs branch April 9, 2022 01:36

