SAC Documentation - Benchmarks - Minor code tweaks#146
SAC Documentation - Benchmarks - Minor code tweaks#146dosssman merged 15 commits intovwxyzjn:masterfrom
Conversation
|
This pull request is being automatically deployed with Vercel (learn more). 🔍 Inspect: https://vercel.com/vwxyzjn/cleanrl/rnPXKTft8t6hLfegqbnNkwBpG7W8 |
… SAC with fixed $\alpha$
|
Hello there. While I think this branch should be ready for review, it could not pass the pre-commit test due to some problem with the isort....................................................................Passed
autoflake................................................................Passed
black....................................................................Failed
- hook id: black
- exit code: 1
Traceback (most recent call last):
File "/home/d055/.cache/pre-commit/repoverpfvk2/py_env-python3.8/bin/black", line 8, in <module>
sys.exit(patched_main())
File "/home/d055/.cache/pre-commit/repoverpfvk2/py_env-python3.8/lib/python3.8/site-packages/black/__init__.py", line 1423, in patched_main
patch_click()
File "/home/d055/.cache/pre-commit/repoverpfvk2/py_env-python3.8/lib/python3.8/site-packages/black/__init__.py", line 1409, in patch_click
from click import _unicodefun
ImportError: cannot import name '_unicodefun' from 'click' (/home/d055/.cache/pre-commit/repoverpfvk2/py_env-python3.8/lib/python3.8/site-packages/click/__init__.py)
codespell................................................................Passed |
|
Fixed with psf/black#2964 |
|
Thank you @dosssman. This is a really high-quality benchmark. I have asked @ikostrikov, who maintains https://github.com/ikostrikov/jaxrl, to help review this PR. Thanks @ikostrikov! |
vwxyzjn
left a comment
There was a problem hiding this comment.
One other thing. Would you mind customizing the chart a bit like the other charts in the benchmark? Use CleanRL's sac_continuous_action.py instead of exp_name: sac_continuous_action for the legend. I'd also change the line color to red for consistency.
Everything else looks good :) Feel free to merge once you have a chance to address these .
docs/rl-algorithms/sac.md
Outdated
|
|
||
| ## Overview | ||
|
|
||
| The Soft Actor-Critic (SAC) algorithm extends the DDPG algorithms by 1) using a stochastic policy, which in theory can express multi-modal optimal policies. |
…lot color changes -- mentions global gradient clipping
|
… added VAE paper citation
|
Oh man, I finally found the last reference for SAC I was looking for but could not remember: pranz24/pytorch-soft-actor-critic . |
|
Thanks for the review. Merging then. |
* Preliminary work on the SAC docs and clenarl openbenchmark * Updated the instructions for benchmark script runs * Fixed typos and formatting * SAC docs added Wandb Iframe for PyBullet and MuJoCo; command line for SAC with fixed $\alpha$ * Finalized complete, sac.md doc draft, added images of learning curves * Typo and formulation tweaks * Fixed the autospell detected typo * Update pre-commit file * Fix weird github action error: psf/black#2964 * Typo fix * Follow up on change requests after review.\ntypo fix -- legend and polot color changes -- mentions global gradient clipping * Fleshed out the reparam. trick for action sampling, removed the TODO, added VAE paper citation * added pranz24 reference to sac docs * Added pranz24 reference and licenses Co-authored-by: Costa Huang <costa.huang@outlook.com>

Description
qf_losscomputation: removed the/2gradient scaling so that .backward() is more aligned with the theory. Instead, logqf_loss= (qf1_loss + qf2_loss) / 2.` for meaningful comparison with mono Q-value network algorithms.Same is done in OpenAI SpinningUP for example.
Types of changes
Checklist:
pre-commit run --all-filespasses (required).mkdocs serve.I have updated the tests accordingly (if applicable).If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See #137 as an example PR.
--capture-videoflag toggled on (required).mkdocs serve.width=500andheight=300).I have updated the tests accordingly (if applicable).