
Allow envs to send a 'needs_reset' signal#356

Merged
muupan merged 30 commits into chainer:master from muupan:continuing-time-limit
Dec 12, 2018

Conversation

@muupan
Member

@muupan muupan commented Nov 16, 2018

This PR enables an env to signal, via the info dict returned by env.step, that it needs a reset without setting done=True.

This functionality is needed to strictly follow DeepMind's training protocol on Atari games, which caps the number of frames per episode regardless of life losses, while the agent still sees episodes that terminate on a life loss.

  • add ContinuingTimeLimit wrapper
  • handle needs_reset signal in both training and evaluation
  • check how it affects DQN on Atari
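
The ContinuingTimeLimit wrapper described above can be sketched roughly as follows. This is a minimal illustration of the idea, not the PR's exact implementation (the gym.Wrapper base class is omitted to keep the sketch dependency-free, and the names are chosen to match the PR's description):

```python
class ContinuingTimeLimit:
    """Cap episode length without setting done=True.

    After max_episode_steps calls to step(), info['needs_reset'] is set
    to True in the returned info dict while done is left untouched, so
    the agent still sees life-loss terminations as episode ends.
    """

    def __init__(self, env, max_episode_steps):
        self.env = env
        self._max_episode_steps = max_episode_steps
        self._elapsed_steps = None

    def step(self, action):
        assert self._elapsed_steps is not None, 'Call reset() before step()'
        obs, reward, done, info = self.env.step(action)
        self._elapsed_steps += 1
        if self._elapsed_steps >= self._max_episode_steps:
            info['needs_reset'] = True
        return obs, reward, done, info

    def reset(self):
        self._elapsed_steps = 0
        return self.env.reset()
```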

@muupan
Member Author

muupan commented Nov 26, 2018

I confirmed that the change introduced by this PR does not affect the scores on Atari.

old: examples/ale/train_dqn_ale.py --eval-interval 1000000 --env {env_id} (before this PR)
new: examples/ale/train_dqn_ale.py --eval-interval 1000000 --max-frames 108000 --env {env_id} (after this PR)

BeamRiderNoFrameskip-v4
BreakoutNoFrameskip-v4
SeaquestNoFrameskip-v4

@muupan muupan changed the title [WIP] Allow envs to send a 'needs_reset' signal Allow envs to send a 'needs_reset' signal Nov 26, 2018
Contributor

@prabhatnagarajan prabhatnagarajan left a comment

I still have to dig deeper in another review.

if done or episode_len == max_episode_len or t == steps:
reset = (episode_len == max_episode_len
or info.get('needs_reset', False))
if done or reset or t == steps:
Contributor

Later in this "if" statement (https://github.com/chainer/chainerrl/pull/356/files#diff-a2caf3ec0e2750a1d16edb375789daa5R81), you reset the environment. Why do you reset the environment if done is true? What if reset = False?

Member Author

@muupan muupan Nov 27, 2018

Even if reset == False, we need to call env.reset() if done == True. In other words, the reset variable can be false when we reset the env due to done == True. It is possible to rename reset as non_done_reset or something, but it would be verbose.
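
The distinction can be condensed into a small helper (a hypothetical sketch based on the quoted diff, not code from the PR):

```python
def needs_env_reset(done, episode_len, max_episode_len, info):
    # env.reset() must be called when the env terminated (done), and
    # also on a non-done reset: hitting the episode-length limit or
    # receiving the 'needs_reset' signal via the info dict.
    reset = (episode_len == max_episode_len
             or info.get('needs_reset', False))
    return done or reset
```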

Contributor

So this is targeted primarily at environments that reset based off of a max-episode-length or via done, and not environments where done is True but you still do not reset the environment?

Member Author

Correct. Currently, ChainerRL assumes that env.reset must be called when done==True.

outdir, global_t, local_t, episode_r)
logger.info('statistics:%s', agent.get_statistics())

# Evalaute the current agent
Contributor

Nit: Evaluate*, not "Evalaute"

@@ -86,9 +88,6 @@ def train_agent_batch(agent, env, steps, outdir, log_interval=None,
# 5. reset the env to start a new episode
Contributor

Should these comments be revised?

Member Author

I think these comments are still correct except that 3-5 are skipped when training is finished. I'll clarify this in the comments.


from gym import spaces

import chainerrl
Contributor

Since you're changing the file, perhaps we should change the header from "This file is a fork from a MIT-licensed project" to "This file adapted from an MIT-licensed project..."

Member Author

"fork" means it already has changes, no?

each episode, except that done=False is returned and that
info['needs_reset'] is set to True when past the limit.

Code that calls env.step is repsonsible for checking the info dict, the
Contributor

"responsible" - typo

@muupan muupan mentioned this pull request Nov 27, 2018
parser.add_argument('--max-episode-len', type=int,
default=30 * 60 * 60 // 4, # 30 minutes with 60/4 fps
help='Maximum number of timesteps for each episode.')
parser.add_argument('--max-frames', type=int,
Contributor

Is train_dqn_ale the only file that should have max_frames? At least atari/train_dqn.py should also incorporate these changes, right? Why did you only change the DQN example?

Member Author

Other examples that use atari_wrappers are affected as well. If the changes to examples/ale/train_dqn_ale.py look ok to you, I can apply the same changes to other examples as well, though I believe they would work without changes.

Contributor

Sounds good. I think we should still make the changes to other examples using atari_wrappers for completeness/consistency.

Contributor

@prabhatnagarajan prabhatnagarajan left a comment

I've read through this PR, and everything seems to be okay. I was slightly confused by some of the tests, but they all seemed fine. This is ready for approval, but please make the following minor changes before merging with master:

  • Address the comments (e.g., typos)
  • Make the changes to train_dqn_ale.py apply to other relevant files

@muupan
Member Author

muupan commented Dec 7, 2018

I addressed the comments and made the same change to other examples.

Contributor

@prabhatnagarajan prabhatnagarajan left a comment

LGTM.

@muupan muupan merged commit ea69e24 into chainer:master Dec 12, 2018
@muupan muupan deleted the continuing-time-limit branch December 12, 2018 06:30
@muupan muupan added this to the v0.6 milestone Feb 26, 2019