Code under review:

```python
if capture_video:
    if idx == 0:
        env = gym.wrappers.RecordVideo(env, f"videos/{run_name}")
```

Suggested change:

```python
if capture_video and idx == 0:
    env = gym.wrappers.RecordVideo(env, f"videos/{run_name}")
```
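The collapsed condition behaves identically. A minimal, dependency-free sketch (using a hypothetical `RecordVideoStub` in place of `gym.wrappers.RecordVideo`) shows that only the first sub-environment gets wrapped:

```python
class RecordVideoStub:
    """Hypothetical stand-in for gym.wrappers.RecordVideo."""
    def __init__(self, env, video_folder):
        self.env = env
        self.video_folder = video_folder

def make_env(idx, capture_video, run_name):
    """Env factory mirroring the reviewed snippet."""
    def thunk():
        env = object()  # placeholder for gym.make(...)
        # Collapsed condition, as the review suggests:
        if capture_video and idx == 0:
            env = RecordVideoStub(env, f"videos/{run_name}")
        return env
    return thunk

envs = [make_env(i, capture_video=True, run_name="demo")() for i in range(4)]
print([isinstance(e, RecordVideoStub) for e in envs])  # only idx 0 is wrapped
```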
```python
args.batch_size = int(args.num_envs * args.num_steps)
args.minibatch_size = int(args.batch_size // args.num_minibatches)
```

Feel free to remove lines 89-90.
```python
args.seed += local_rank
random.seed(args.seed)
np.random.seed(args.seed)
torch.manual_seed(args.seed - local_rank)
```

Why not:

```python
torch.manual_seed(args.seed)
args.seed += local_rank
random.seed(args.seed)
np.random.seed(args.seed)
```

The seeding trick done here ensures the same seed is used to initialize the agent's parameters: see "Adjust seed per process" https://docs.cleanrl.dev/rl-algorithms/ppo/#implementation-details_6. A more elegant way would be to use an API to broadcast the Agent's parameters from rank 0 to the other ranks, but I haven't found such an API.
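As a dependency-free sketch of that per-process seeding trick (`seed_for_rank` is a hypothetical helper, not CleanRL code, and `numpy` stands in for torch's RNG): every rank seeds its "parameter init" generator with the shared base seed, and its rollout randomness with a rank-offset seed.

```python
import random
import numpy as np

def seed_for_rank(base_seed, local_rank):
    """Sketch of the 'adjust seed per process' trick: the shared base seed
    plays the role of torch.manual_seed (identical agent init on every
    rank), while random/np get rank-offset seeds (different rollouts)."""
    shared_rng = np.random.default_rng(base_seed)       # ~ torch.manual_seed(args.seed)
    per_rank_seed = base_seed + local_rank
    random.seed(per_rank_seed)                          # stdlib RNG, per rank
    rollout_rng = np.random.default_rng(per_rank_seed)  # per-rank exploration noise
    return shared_rng, rollout_rng

# Two ranks: identical "parameter init" draws, different rollout draws.
init0, roll0 = seed_for_rank(1, local_rank=0)
init1, roll1 = seed_for_rank(1, local_rank=1)
print(np.allclose(init0.normal(size=4), init1.normal(size=4)))      # True
print(np.allclose(roll0.normal(size=4), roll1.normal(size=4)))      # False
```

For what it's worth, `torch.distributed.broadcast` can send rank 0's parameter tensors to the other ranks after initialization, which would make the seed offsetting unnecessary for this purpose.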
```python
assert isinstance(envs.single_action_space, gym.spaces.Discrete), "only discrete action space is supported"

agent = Agent(envs).to(device)
torch.manual_seed(args.seed)
```
```python
# TRY NOT TO MODIFY: start the game
global_step = 0
start_time = time.time()
next_obs = torch.Tensor(envs.reset()).to(device)
```

Suggested change:

```python
next_obs = torch.tensor(envs.reset()).to(device)
```

per https://discuss.pytorch.org/t/difference-between-torch-tensor-and-torch-tensor/30786/2

Hmm, in the past I have had weird issues with torch.tensor. I'd also rather not change it only in ppo_atari_multigpu.py and not in the other files. Bottom line, I don't think this would be a huge issue or cause performance differences, but I am happy to change it if evidence shows otherwise :)
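The dtype behavior is the practical difference here: `torch.Tensor` is the constructor of the default float32 tensor type, while `torch.tensor` infers the dtype from the data, so for Atari's uint8 observations the two produce different tensors. A small sketch (assuming torch and numpy are available):

```python
import numpy as np
import torch

obs = np.array([1, 2, 3], dtype=np.uint8)  # e.g. Atari pixel observations

a = torch.Tensor(obs)   # constructor of the default tensor type: always float32
b = torch.tensor(obs)   # infers dtype from the data: stays uint8

print(a.dtype)  # torch.float32
print(b.dtype)  # torch.uint8
```

Feeding uint8 instead of float32 observations to the network may be the kind of "weird issue" mentioned above; `torch.as_tensor(obs).float()` would make the intended conversion explicit.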
```python
# TRY NOT TO MODIFY: execute the game and log data.
next_obs, reward, done, info = envs.step(action.cpu().numpy())
rewards[step] = torch.tensor(reward).to(device).view(-1)
next_obs, next_done = torch.Tensor(next_obs).to(device), torch.Tensor(done).to(device)
```
```python
y_pred, y_true = b_values.cpu().numpy(), b_returns.cpu().numpy()
var_y = np.var(y_true)
explained_var = np.nan if var_y == 0 else 1 - np.var(y_true - y_pred) / var_y
```

nit: probably move this under line 387.

This just follows the structure of the other files.
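For context, the snippet computes the explained variance of the value function's predictions against the empirical returns: 1 means perfect prediction, 0 means no better than predicting the mean return, and negative values mean worse than the mean. A self-contained sketch (the `explained_variance` helper is illustrative, not CleanRL code):

```python
import numpy as np

def explained_variance(y_pred, y_true):
    # 1 - Var[residuals] / Var[targets]; NaN when the targets are constant.
    var_y = np.var(y_true)
    return np.nan if var_y == 0 else 1 - np.var(y_true - y_pred) / var_y

y_true = np.array([1.0, 2.0, 3.0, 4.0])
print(explained_variance(y_true, y_true))                     # 1.0 (perfect)
print(explained_variance(np.full(4, y_true.mean()), y_true))  # 0.0 (mean predictor)
```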
Sorry, I didn't have time to review before merging.
* Add multi-gpu example
* fix pre-commit
* Add documentation and benchmark
* Update documentation
* Quick fix
* Also record world size in the params
* remove trailing space
* revert changes
* Update test cases
* Update test cases
* Fix CI
* Fix tests
* Fix pre-commit
* Fix tests
* Add a note that multi gpu only supported in linux
Description
This is a follow-up on #162.
Types of changes
Checklist:
- `pre-commit run --all-files` passes (required).
- I have updated the documentation and previewed the changes via `mkdocs serve`.
- If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See #137 as an example PR.
- I have run the tracked experiments with the `--capture-video` flag toggled on (required).
- I have updated the learning curves (in PNG format, with `width=500` and `height=300`).