
Prioritized replay #44

Merged
muupan merged 41 commits into chainer:master from toslunar:prioritized-replay
Mar 21, 2017
Conversation

@toslunar (Member) commented Mar 3, 2017

Prioritized replay (for DQN)
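For context, prioritized replay (Schaul et al., 2015) samples transition i with probability proportional to p_i^alpha and corrects the resulting bias with importance-sampling weights (N * P(i))^(-beta). A minimal sketch of proportional sampling, independent of this PR's implementation (function name and defaults are illustrative):

```python
import numpy as np

def sample_indices(priorities, batch_size, alpha=0.6, beta=0.4, rng=None):
    """Proportional prioritized sampling (Schaul et al., 2015; illustrative).

    Returns sampled indices and importance-sampling weights that correct
    the bias introduced by non-uniform sampling.
    """
    rng = rng or np.random.default_rng()
    p = np.asarray(priorities, dtype=float) ** alpha
    probs = p / p.sum()
    idx = rng.choice(len(probs), size=batch_size, p=probs)
    # w_i = (N * P(i))^(-beta), normalized by the max for stability
    weights = (len(probs) * probs[idx]) ** (-beta)
    weights /= weights.max()
    return idx, weights
```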

@muupan (Member) commented Mar 3, 2017

Travis CI failed due to style checks. Can you install hacking, apply flake8 and fix warnings?

@toslunar (Member, Author) commented Mar 6, 2017

Fixed the style. Thanks.

@muupan (Member) commented Mar 8, 2017

Thanks for the fixes. Can you add tests for SumTree, PrioritizedBuffer and Prioritized(Episodic)ReplayBuffer?
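For reference, a sum tree stores priorities in the leaves of a binary tree whose internal nodes hold subtree sums, giving O(log n) priority updates and sampling proportional to priority. A minimal array-based sketch, not the chainerrl implementation (all details illustrative):

```python
import random

class SumTree:
    """Minimal array-based sum tree for proportional sampling (illustrative)."""

    def __init__(self, capacity):
        self.capacity = capacity
        # Nodes 1..capacity-1 are internal sums; leaves live at
        # positions capacity..2*capacity-1.
        self.tree = [0.0] * (2 * capacity)

    def __setitem__(self, i, priority):
        pos = i + self.capacity
        self.tree[pos] = priority
        pos //= 2
        while pos >= 1:  # propagate the new sum up to the root
            self.tree[pos] = self.tree[2 * pos] + self.tree[2 * pos + 1]
            pos //= 2

    def sample(self):
        # Walk down from the root, choosing a leaf with probability
        # proportional to its priority.
        s = random.uniform(0.0, self.tree[1])
        pos = 1
        while pos < self.capacity:
            left = 2 * pos
            if s <= self.tree[left]:
                pos = left
            else:
                s -= self.tree[left]
                pos = left + 1
        return pos - self.capacity
```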

for _ in episodes:
errors_out.append(0.0)
errors_out_step = []
# print('----------------------------------------------------')
Member


Can you remove commented-out code like this?

self.data = []
self.priority_tree = SumTree()
self.data_inf = collections.deque()
self.count_used = []
Member

@muupan muupan Mar 8, 2017


What is the purpose of self.count_used?

Member Author


Sorry. count_used was not used.

@muupan muupan mentioned this pull request Mar 9, 2017
@muupan (Member) commented Mar 14, 2017

I tried adding this test case to tests/test_replay_buffer.py to check capacity handling, and it failed: the buffer exceeded the specified capacity. The capacity parameter doesn't seem to work. Can you fix it?

    def test_capacity(self):
        capacity = 10
        rbuf = replay_buffer.PrioritizedReplayBuffer(capacity)
        # Fill the buffer
        for _ in range(capacity):
            trans1 = dict(state=0, action=1, reward=2, next_state=3,
                          next_action=4, is_state_terminal=True)
            rbuf.append(**trans1)
        self.assertEqual(len(rbuf), capacity)

        # Add a new transition
        trans2 = dict(state=1, action=1, reward=2, next_state=3,
                      next_action=4, is_state_terminal=True)
        rbuf.append(**trans2)
        # The size should not change
        self.assertEqual(len(rbuf), capacity)

self.assertEqual(s2[1], trans1)


class PrioritizedReplayBuffer(unittest.TestCase):
Member


Can you rename it to TestPrioritizedReplayBuffer to avoid confusion?

@toslunar (Member, Author)
Fixed the capacity issue.

The capacity argument of PrioritizedBuffer used to limit only len(self.data). Now it limits len(self) (= len(self.data) + len(self.data_inf)).
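The new semantics can be sketched as follows. The class below is a stripped-down illustration using the data/data_inf names from the PR; the eviction logic is hypothetical:

```python
import collections

class PrioritizedBuffer:
    """Sketch of the capacity fix: capacity bounds len(self), not len(self.data)."""

    def __init__(self, capacity=None):
        self.capacity = capacity
        self.data = []                        # transitions with assigned priorities
        self.data_inf = collections.deque()   # new transitions, not yet prioritized

    def __len__(self):
        # The fix: the bound applies to the total of both containers
        return len(self.data) + len(self.data_inf)

    def append(self, value):
        if self.capacity is not None and len(self) >= self.capacity:
            self._evict_one()
        self.data_inf.append(value)

    def _evict_one(self):
        # Illustrative eviction: drop from whichever container has items
        if self.data:
            self.data.pop(0)
        else:
            self.data_inf.popleft()
```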

self.priority_tree[i] = self.priority_tree[n-1]
del self.priority_tree[n-1]
ret = self.data[i]
self.data[i] = self.data.pop()
Member


self.data[i] = self.data.pop()

This would raise an IndexError when i == n - 1, because self.data has only n - 1 elements after the pop.
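A common way to fix this swap-with-last removal is to pop first and special-case an index that pointed at the last slot. A hypothetical helper (not the PR's code):

```python
def remove_at(data, i):
    """Remove data[i] in O(1) by swapping it with the last element.

    Illustrative fix for the i == len(data) - 1 edge case.
    """
    last = data.pop()      # remove the last element first
    if i == len(data):     # i pointed at the last element; nothing to swap back
        return last
    ret = data[i]
    data[i] = last         # move the old last element into slot i
    return ret
```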

@muupan (Member) commented Mar 16, 2017

The example script fails. Can you fix it?

$ python examples/gym/train_dqn_gym.py --prioritized-replay --episodic-replay
Output files are saved in dqn_out/20170316174313282734
INFO:gym.envs.registration:Making new env: Pendulum-v0
INFO:gym.envs.registration:Making new env: Pendulum-v0
DEBUG:chainerrl.agents.dqn:t:0 q:-0.3146612048149109 action_value:QuadraticActionValue greedy_actions:[[-0.89204156]] v:[[-0.3146612]]
...
DEBUG:chainerrl.agents.dqn:t:200 r:-0.00918397948453 a:[-0.41467309]
Saved the agent to dqn_out/20170316174313282734/200_except
Traceback (most recent call last):
  File "examples/gym/train_dqn_gym.py", line 179, in <module>
    main()
  File "examples/gym/train_dqn_gym.py", line 175, in main
    max_episode_len=timestep_limit)
  File "/home/fujita/drill/chainerrl/experiments/train_agent.py", line 124, in train_agent_with_evaluation
    successful_score=successful_score)
  File "/home/fujita/drill/chainerrl/experiments/train_agent.py", line 55, in train_agent
    agent.stop_episode_and_train(obs, r, done=done)
  File "/home/fujita/drill/chainerrl/agents/dqn.py", line 449, in stop_episode_and_train
    self.stop_episode()
  File "/home/fujita/drill/chainerrl/agents/dqn.py", line 456, in stop_episode
    self.replay_buffer.stop_current_episode()
  File "/home/fujita/drill/chainerrl/replay_buffer.py", line 232, in stop_current_episode
    self.episodic_memory.append(self.current_episode)
  File "/home/fujita/drill/chainerrl/misc/prioritized.py", line 18, in append
    if len(self) > self.capacity:
TypeError: unorderable types: int() > NoneType()
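The comparison len(self) > self.capacity fails on Python 3 when no capacity was given, because an int and None are unorderable. A hedged sketch of the usual guard, with hypothetical names:

```python
class EpisodicMemory:
    """Illustrative fix: treat capacity=None as an unbounded buffer."""

    def __init__(self, capacity=None):
        self.capacity = capacity
        self.episodes = []

    def append(self, episode):
        self.episodes.append(episode)
        # `len(self.episodes) > None` raises TypeError on Python 3,
        # so check for None before comparing.
        if self.capacity is not None and len(self.episodes) > self.capacity:
            self.episodes.pop(0)
```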

@coveralls

coveralls commented Mar 16, 2017

Coverage Status

Changes Unknown when pulling 4cec485 on toslunar:prioritized-replay into ** on pfnet:master**.

@coveralls

Coverage Status

Changes Unknown when pulling 620fb17 on toslunar:prioritized-replay into ** on pfnet:master**.

@coveralls

coveralls commented Mar 16, 2017

Coverage Status

Changes Unknown when pulling b6ca346 on toslunar:prioritized-replay into ** on pfnet:master**.


@muupan (Member) commented Mar 21, 2017

Thanks for the fixes! Now it looks good.

@muupan muupan merged commit f4cff28 into chainer:master Mar 21, 2017
@toslunar toslunar deleted the prioritized-replay branch August 13, 2017 21:49