
Proper multi-gpu support with PPO#178

Merged
vwxyzjn merged 15 commits into master from new-multi-gpu
May 29, 2022

Conversation

@vwxyzjn
Owner

@vwxyzjn vwxyzjn commented May 4, 2022

Description

This is a follow-up to #162.

Types of changes

  • New algorithm

Checklist:

  • I've read the CONTRIBUTION guide (required).
  • I have ensured pre-commit run --all-files passes (required).
  • I have updated the documentation and previewed the changes via mkdocs serve.
  • I have updated the tests accordingly (if applicable).

If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See #137 as an example PR.

  • I have contacted @vwxyzjn to obtain access to the openrlbenchmark W&B team (required).
  • I have tracked applicable experiments in openrlbenchmark/cleanrl with --capture-video flag toggled on (required).
  • I have added additional documentation and previewed the changes via mkdocs serve.
    • I have explained note-worthy implementation details.
    • I have explained the logged metrics.
    • I have added links to the original paper and related papers (if applicable).
    • I have added links to the PR related to the algorithm.
    • I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
    • I have added the learning curves (in PNG format with width=500 and height=300).
    • I have added links to the tracked experiments.
  • I have updated the tests accordingly (if applicable).

@vercel

vercel bot commented May 4, 2022


Name Status Preview Updated
cleanrl ✅ Ready (Inspect) Visit Preview May 29, 2022 at 3:25PM (UTC)

@gitpod-io

gitpod-io bot commented May 4, 2022

Comment on lines +99 to +101
if capture_video:
    if idx == 0:
        env = gym.wrappers.RecordVideo(env, f"videos/{run_name}")
Collaborator

@yooceii yooceii May 29, 2022


Suggested change
- if capture_video:
-     if idx == 0:
-         env = gym.wrappers.RecordVideo(env, f"videos/{run_name}")
+ if capture_video and idx == 0:
+     env = gym.wrappers.RecordVideo(env, f"videos/{run_name}")

Comment on lines +162 to +163
args.batch_size = int(args.num_envs * args.num_steps)
args.minibatch_size = int(args.batch_size // args.num_minibatches)
Collaborator


Dup to line 89-90?

Owner Author


Feel free to remove lines 89-90
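For context, a quick sketch of the batch-size arithmetic computed in the snippet this thread discusses (the values here are illustrative defaults, not necessarily the script's actual ones):

```python
# Illustrative values; the real defaults live in the script's argparse setup.
num_envs = 8          # parallel environments per process
num_steps = 128       # rollout length per environment
num_minibatches = 4

# Each update consumes one rollout of num_envs * num_steps transitions,
# split into num_minibatches minibatches for the PPO epochs.
batch_size = int(num_envs * num_steps)                # 1024
minibatch_size = int(batch_size // num_minibatches)   # 256
```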

Comment on lines +199 to +202
args.seed += local_rank
random.seed(args.seed)
np.random.seed(args.seed)
torch.manual_seed(args.seed - local_rank)
Collaborator


Why not

Suggested change
- args.seed += local_rank
- random.seed(args.seed)
- np.random.seed(args.seed)
- torch.manual_seed(args.seed - local_rank)
+ torch.manual_seed(args.seed)
+ args.seed += local_rank
+ random.seed(args.seed)
+ np.random.seed(args.seed)

Owner Author


The seeding trick done here ensures the same seed is used to initialize the agent's parameters: see "Adjust seed per process" at https://docs.cleanrl.dev/rl-algorithms/ppo/#implementation-details_6. A more elegant way would be to use an API that broadcasts the agent's parameters from rank 0 to the other ranks, but I haven't found such an API.
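As a hedged sketch of the alternative discussed here: `torch.distributed.broadcast` can copy rank 0's parameters to every other rank after initialization, which would make per-rank seeds safe for the agent's weights. The process-group setup below is a hypothetical single-process demo with the gloo backend, not this script's actual launch path:

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn

def sync_params_from_rank0(model: nn.Module) -> None:
    # Overwrite every rank's parameters with rank 0's copy, so all
    # processes start from identical weights regardless of their seeds.
    for p in model.parameters():
        dist.broadcast(p.data, src=0)

# Hypothetical single-process setup; a real multi-GPU run would launch
# one process per GPU (e.g. via torchrun) with matching rank/world_size.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = nn.Linear(4, 2)
snapshot = [p.detach().clone() for p in model.parameters()]
sync_params_from_rank0(model)  # a no-op with world_size=1, but shows the call
dist.destroy_process_group()
```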

assert isinstance(envs.single_action_space, gym.spaces.Discrete), "only discrete action space is supported"

agent = Agent(envs).to(device)
torch.manual_seed(args.seed)
Collaborator


Dup to line 201?

Owner Author


See comment above.

# TRY NOT TO MODIFY: start the game
global_step = 0
start_time = time.time()
next_obs = torch.Tensor(envs.reset()).to(device)
Collaborator


Suggested change
- next_obs = torch.Tensor(envs.reset()).to(device)
+ next_obs = torch.tensor(envs.reset()).to(device)

per https://discuss.pytorch.org/t/difference-between-torch-tensor-and-torch-tensor/30786/2

Owner Author


Hmm, in the past I have had weird issues with torch.tensor. I'd also rather not change it only in ppo_atari_multigpu.py while leaving the other files as they are. Bottom line: I don't think this would be a huge issue or cause performance differences, but I am happy to change it if evidence shows otherwise :)
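For reference, the difference the linked PyTorch thread describes: torch.Tensor is a constructor that always produces the default dtype (float32), while torch.tensor infers the dtype from its input. A minimal check:

```python
import numpy as np
import torch

arr = np.array([1, 2, 3], dtype=np.float64)

a = torch.Tensor(arr)  # legacy constructor: casts to the default dtype (float32)
b = torch.tensor(arr)  # factory function: keeps the source dtype (float64)

print(a.dtype, b.dtype)  # torch.float32 torch.float64
```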

# TRY NOT TO MODIFY: execute the game and log data.
next_obs, reward, done, info = envs.step(action.cpu().numpy())
rewards[step] = torch.tensor(reward).to(device).view(-1)
next_obs, next_done = torch.Tensor(next_obs).to(device), torch.Tensor(done).to(device)
Collaborator


Same as line 236?

Owner Author


See comment above.

Comment on lines +382 to +384
y_pred, y_true = b_values.cpu().numpy(), b_returns.cpu().numpy()
var_y = np.var(y_true)
explained_var = np.nan if var_y == 0 else 1 - np.var(y_true - y_pred) / var_y
Collaborator


Nit: probably move this under line 387.

Owner Author


This just follows the structure of the other files.
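For readers skimming this thread: the explained-variance metric computed in the snippet above measures how much of the returns' variance the value function accounts for (1.0 is perfect, 0.0 is no better than predicting the mean). A small worked example with made-up numbers:

```python
import numpy as np

# Hypothetical value predictions and empirical returns.
y_pred = np.array([1.0, 2.0, 3.0, 4.0])
y_true = np.array([1.1, 1.9, 3.2, 3.8])

var_y = np.var(y_true)
# Guard against constant returns, where the metric is undefined.
explained_var = np.nan if var_y == 0 else 1 - np.var(y_true - y_pred) / var_y
print(explained_var)  # ~0.978: the value head tracks the returns closely
```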

@yooceii
Collaborator

yooceii commented May 29, 2022

Sorry, I didn't have time to review before merging.

Owner Author

@vwxyzjn vwxyzjn left a comment


Thank you @yooceii for the review! I left a few comments. Feel free to open a PR to fix applicable issues :)


ludgerpaehler pushed a commit to ludgerpaehler/koopman-rl that referenced this pull request Jan 13, 2026
* Add multi-gpu example

* fix pre-commit

* Add documentation and benchmark

* Update documentation

* Quick fix

* Also record world size in the params

* remove trailing space

* revert changes

* Update test cases

* Update test cases

* Fix CI

* Fix tests

* Fix pre-commit

* Fix tests

* Add a note that multi gpu only supported in linux

Labels: none yet
Projects: none yet
2 participants