PPO + JAX + EnvPool + Atari#227

Merged
vwxyzjn merged 37 commits into master from jax-ppo-envpool-atari on Oct 6, 2022

vwxyzjn (Owner) commented Jul 7, 2022

Description

Types of changes

  • Bug fix
  • New feature
  • New algorithm
  • Documentation

Checklist:

  • I've read the CONTRIBUTION guide (required).
  • I have ensured `pre-commit run --all-files` passes (required).
  • I have updated the documentation and previewed the changes via `mkdocs serve`.
  • I have updated the tests accordingly (if applicable).

If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See #137 as an example PR.

  • I have contacted @vwxyzjn to obtain access to the openrlbenchmark W&B team (required).
  • I have tracked applicable experiments in openrlbenchmark/cleanrl with the `--capture-video` flag toggled on (required).
  • I have added additional documentation and previewed the changes via `mkdocs serve`.
    • I have explained note-worthy implementation details.
    • I have explained the logged metrics.
    • I have added links to the original paper and related papers (if applicable).
    • I have added links to the PR related to the algorithm.
    • I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
    • I have added the learning curves (in PNG format with width=500 and height=300).
    • I have added links to the tracked experiments.
  • I have updated the tests accordingly (if applicable).

vercel bot commented Jul 7, 2022

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Updated |
| --- | --- | --- | --- |
| cleanrl | ✅ Ready (Inspect) | Visit Preview | Oct 6, 2022 at 1:23AM (UTC) |

vwxyzjn (Owner, Author) commented Jul 10, 2022

image

The parameter counts match.

@vwxyzjn vwxyzjn changed the title PPO + jax + envpool + atari PPO + JAX + EnvPool + Atari Jul 12, 2022

vwxyzjn (Owner, Author) commented Aug 24, 2022

We got this message. See #227 (comment)
```
NotImplementedError: Got <class 'jaxlib.xla_extension.DeviceArray'>, but numpy array, torch tensor, or caffe2 blob name are expected.
```
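
This message appears to come from the TensorBoard summary writer's input conversion, which accepts NumPy arrays and torch tensors but not JAX device arrays. A minimal sketch of one possible workaround is to move the metric to the host and reduce it to a plain Python float before logging. The helper `to_float` and the stand-in class `FakeDeviceArray` are hypothetical names used only for illustration, and this is not necessarily the fix adopted in this PR.

```python
import numpy as np


class FakeDeviceArray:
    """Hypothetical stand-in for a JAX array: it only exposes __array__."""

    def __init__(self, values):
        self._values = list(values)

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this hook when asked to convert the object.
        return np.array(self._values, dtype=dtype)


def to_float(x):
    """Convert an array-like metric to a plain Python float (mean if not scalar)."""
    host = np.asarray(x)  # materializes a host-side NumPy array via __array__
    return float(host.mean())


losses = FakeDeviceArray([0.5, 1.5, 4.0])
value = to_float(losses)
print(type(value).__name__, value)
```

With a real JAX array in place of the stand-in, the resulting float could then be passed to the writer, for example `writer.add_scalar("losses/value_loss", to_float(v_loss), global_step)`, where the tag and variable names are assumptions.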

vwxyzjn (Owner, Author) commented Oct 5, 2022

Hi @yooceii @kinalmehta, I have addressed most of your concerns. Please let me know if additional tweaks are needed.

kinalmehta (Collaborator) left a comment

All the concerns seem to be addressed.

LGTM

Great Work!!

@vwxyzjn vwxyzjn dismissed yooceii’s stale review October 6, 2022 14:54

concerns addressed

@vwxyzjn vwxyzjn merged commit 42d21bd into master Oct 6, 2022
ludgerpaehler pushed a commit to ludgerpaehler/koopman-rl that referenced this pull request Jan 13, 2026
* PPO + jax + envpool + atari

* fix bug: only report metric when lives are used up

* pre-commit

* quick fix

* Quick refactor

* push changes

* pre-commit and use EnvPool's new API

* update envpool

* update docs

* update ppo benchmark script

* update docs

* use the latest envpool interface

* update envpool to the latest version

* update pyproject.toml

* update lock files

* Quick clarification

* Update docs

* remove non benchmarked script

* update docs

* revert poetry changes

* docs fix

* remove unnecessary code, add docs

* add a note on envpool

* update test cases

* explain `get_action_and_value`

* fix indent

* Fix weird error with `np.mean`. See below:

We got this message. See vwxyzjn#227 (comment)
```
NotImplementedError: Got <class 'jaxlib.xla_extension.DeviceArray'>, but numpy array, torch tensor, or caffe2 blob name are expected.
```

* update docs

* pre-commit

* add note on `charts/avg_episodic_return`

* update reproducibility script

* add note on value function clipping
softwarecore1995 added a commit to softwarecore1995/clean-rl that referenced this pull request Feb 22, 2026
arjunmahesh1 pushed a commit to arjunmahesh1/cleanrl that referenced this pull request Mar 6, 2026
3 participants