
Proper multi-gpu support with PPO#178

Merged
vwxyzjn merged 15 commits into master from new-multi-gpu
May 29, 2022

Conversation

@vwxyzjn
Owner

@vwxyzjn vwxyzjn commented May 4, 2022

Description

This is a follow-up to #162.

Types of changes

  • New algorithm

Checklist:

  • I've read the CONTRIBUTION guide (required).
  • I have ensured pre-commit run --all-files passes (required).
  • I have updated the documentation and previewed the changes via mkdocs serve.
  • I have updated the tests accordingly (if applicable).

If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See #137 as an example PR.

  • I have contacted @vwxyzjn to obtain access to the openrlbenchmark W&B team (required).
  • I have tracked applicable experiments in openrlbenchmark/cleanrl with --capture-video flag toggled on (required).
  • I have added additional documentation and previewed the changes via mkdocs serve.
    • I have explained note-worthy implementation details.
    • I have explained the logged metrics.
    • I have added links to the original paper and related papers (if applicable).
    • I have added links to the PR related to the algorithm.
    • I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
    • I have added the learning curves (in PNG format with width=500 and height=300).
    • I have added links to the tracked experiments.
  • I have updated the tests accordingly (if applicable).

@vercel

vercel bot commented May 4, 2022


Name Status Preview Updated
cleanrl ✅ Ready (Inspect) Visit Preview May 29, 2022 at 3:25PM (UTC)

@gitpod-io

gitpod-io bot commented May 4, 2022

Comment on lines +99 to +101
if capture_video:
    if idx == 0:
        env = gym.wrappers.RecordVideo(env, f"videos/{run_name}")
Collaborator

@yooceii yooceii May 29, 2022


Suggested change
- if capture_video:
-     if idx == 0:
-         env = gym.wrappers.RecordVideo(env, f"videos/{run_name}")
+ if capture_video and idx == 0:
+     env = gym.wrappers.RecordVideo(env, f"videos/{run_name}")

Comment on lines +162 to +163
args.batch_size = int(args.num_envs * args.num_steps)
args.minibatch_size = int(args.batch_size // args.num_minibatches)
Collaborator


Dup to line 89-90?

Owner Author


Feel free to remove lines 89-90
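For context, a quick sketch of the batch-size arithmetic computed in the snippet this thread discusses (the values here are illustrative defaults, not necessarily the script's actual ones):

```python
# Illustrative values; the real defaults live in the script's argparse setup.
num_envs = 8          # parallel environments per process
num_steps = 128       # rollout length per environment
num_minibatches = 4

# Each update consumes one rollout of num_envs * num_steps transitions,
# split into num_minibatches minibatches for the PPO epochs.
batch_size = int(num_envs * num_steps)                # 1024
minibatch_size = int(batch_size // num_minibatches)   # 256
```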

Comment on lines +199 to +202
args.seed += local_rank
random.seed(args.seed)
np.random.seed(args.seed)
torch.manual_seed(args.seed - local_rank)
Collaborator


Why not

Suggested change
- args.seed += local_rank
- random.seed(args.seed)
- np.random.seed(args.seed)
- torch.manual_seed(args.seed - local_rank)
+ torch.manual_seed(args.seed)
+ args.seed += local_rank
+ random.seed(args.seed)
+ np.random.seed(args.seed)

Owner Author


The seeding trick done here ensures the same seed is used to initialize the agent's parameters: see "Adjust seed per process" at https://docs.cleanrl.dev/rl-algorithms/ppo/#implementation-details_6. A more elegant way would be to use an API that broadcasts the agent's parameters from rank 0 to the other ranks, but I haven't found such an API.
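As a hedged sketch of the alternative discussed here: `torch.distributed.broadcast` can copy rank 0's parameters to every other rank after initialization, which would make per-rank seeds safe for the agent's weights. The process-group setup below is a hypothetical single-process demo with the gloo backend, not this script's actual launch path:

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn

def sync_params_from_rank0(model: nn.Module) -> None:
    # Overwrite every rank's parameters with rank 0's copy, so all
    # processes start from identical weights regardless of their seeds.
    for p in model.parameters():
        dist.broadcast(p.data, src=0)

# Hypothetical single-process setup; a real multi-GPU run would launch
# one process per GPU (e.g. via torchrun) with matching rank/world_size.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = nn.Linear(4, 2)
snapshot = [p.detach().clone() for p in model.parameters()]
sync_params_from_rank0(model)  # a no-op with world_size=1, but shows the call
dist.destroy_process_group()
```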

assert isinstance(envs.single_action_space, gym.spaces.Discrete), "only discrete action space is supported"

agent = Agent(envs).to(device)
torch.manual_seed(args.seed)
Collaborator


Dup to line 201?

Owner Author


See comment above.

# TRY NOT TO MODIFY: start the game
global_step = 0
start_time = time.time()
next_obs = torch.Tensor(envs.reset()).to(device)
Collaborator


Suggested change
- next_obs = torch.Tensor(envs.reset()).to(device)
+ next_obs = torch.tensor(envs.reset()).to(device)

per https://discuss.pytorch.org/t/difference-between-torch-tensor-and-torch-tensor/30786/2

Owner Author


Hmm, in the past I have had weird issues with torch.tensor. I'd also rather not change it only in ppo_atari_multigpu.py while leaving the other files as they are. Bottom line: I don't think this would be a huge issue or cause performance differences, but I am happy to change it if evidence shows otherwise :)
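For reference, the difference the linked PyTorch thread describes: torch.Tensor is a constructor that always produces the default dtype (float32), while torch.tensor infers the dtype from its input. A minimal check:

```python
import numpy as np
import torch

arr = np.array([1, 2, 3], dtype=np.float64)

a = torch.Tensor(arr)  # legacy constructor: casts to the default dtype (float32)
b = torch.tensor(arr)  # factory function: keeps the source dtype (float64)

print(a.dtype, b.dtype)  # torch.float32 torch.float64
```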

# TRY NOT TO MODIFY: execute the game and log data.
next_obs, reward, done, info = envs.step(action.cpu().numpy())
rewards[step] = torch.tensor(reward).to(device).view(-1)
next_obs, next_done = torch.Tensor(next_obs).to(device), torch.Tensor(done).to(device)
Collaborator


Same as line 236?

Owner Author


See comment above.

Comment on lines +382 to +384
y_pred, y_true = b_values.cpu().numpy(), b_returns.cpu().numpy()
var_y = np.var(y_true)
explained_var = np.nan if var_y == 0 else 1 - np.var(y_true - y_pred) / var_y
Collaborator


Nit: probably move this under line 387.

Owner Author


This just follows the structure of the other files.
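For readers skimming this thread: the explained-variance metric computed in the snippet above measures how much of the returns' variance the value function accounts for (1.0 is perfect, 0.0 is no better than predicting the mean). A small worked example with made-up numbers:

```python
import numpy as np

# Hypothetical value predictions and empirical returns.
y_pred = np.array([1.0, 2.0, 3.0, 4.0])
y_true = np.array([1.1, 1.9, 3.2, 3.8])

var_y = np.var(y_true)
# Guard against constant returns, where the metric is undefined.
explained_var = np.nan if var_y == 0 else 1 - np.var(y_true - y_pred) / var_y
print(explained_var)  # ~0.978: the value head tracks the returns closely
```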

@yooceii
Collaborator

yooceii commented May 29, 2022

Sorry, I didn't have time to review before merging.

Owner Author

@vwxyzjn vwxyzjn left a comment


Thank you @yooceii for the review! I left a few comments. Feel free to open a PR to fix applicable issues :)


ludgerpaehler pushed a commit to ludgerpaehler/koopman-rl that referenced this pull request Jan 13, 2026
* Add multi-gpu example

* fix pre-commit

* Add documentation and benchmark

* Update documentation

* Quick fix

* Also record world size in the params

* remove trailing space

* revert changes

* Update test cases

* Update test cases

* Fix CI

* Fix tests

* Fix pre-commit

* Fix tests

* Add a note that multi gpu only supported in linux

Labels: none yet
Projects: none yet
2 participants