
Prioritized replay #44

Merged
muupan merged 41 commits into chainer:master from toslunar:prioritized-replay
Mar 21, 2017
Conversation

@toslunar (Member) commented Mar 3, 2017

Prioritized replay (for DQN)
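For context, prioritized replay (Schaul et al., 2015) samples transition i with probability proportional to p_i^alpha and corrects the resulting bias with importance-sampling weights (N * P(i))^(-beta). A minimal sketch of proportional sampling, independent of this PR's implementation (function name and defaults are illustrative):

```python
import numpy as np

def sample_indices(priorities, batch_size, alpha=0.6, beta=0.4, rng=None):
    """Proportional prioritized sampling (Schaul et al., 2015; illustrative).

    Returns sampled indices and importance-sampling weights that correct
    the bias introduced by non-uniform sampling.
    """
    rng = rng or np.random.default_rng()
    p = np.asarray(priorities, dtype=float) ** alpha
    probs = p / p.sum()
    idx = rng.choice(len(probs), size=batch_size, p=probs)
    # w_i = (N * P(i))^(-beta), normalized by the max for stability
    weights = (len(probs) * probs[idx]) ** (-beta)
    weights /= weights.max()
    return idx, weights
```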

@muupan (Member) commented Mar 3, 2017

Travis CI failed due to style checks. Can you install hacking, apply flake8 and fix warnings?

@toslunar (Member, Author) commented Mar 6, 2017

Fixed the style. Thanks.

@muupan (Member) commented Mar 8, 2017

Thanks for the fixes. Can you add tests for SumTree, PrioritizedBuffer and Prioritized(Episodic)ReplayBuffer?
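For reference, a sum tree stores priorities in the leaves of a binary tree whose internal nodes hold subtree sums, giving O(log n) priority updates and sampling proportional to priority. A minimal array-based sketch, not the chainerrl implementation (all details illustrative):

```python
import random

class SumTree:
    """Minimal array-based sum tree for proportional sampling (illustrative)."""

    def __init__(self, capacity):
        self.capacity = capacity
        # Nodes 1..capacity-1 are internal sums; leaves live at
        # positions capacity..2*capacity-1.
        self.tree = [0.0] * (2 * capacity)

    def __setitem__(self, i, priority):
        pos = i + self.capacity
        self.tree[pos] = priority
        pos //= 2
        while pos >= 1:  # propagate the new sum up to the root
            self.tree[pos] = self.tree[2 * pos] + self.tree[2 * pos + 1]
            pos //= 2

    def sample(self):
        # Walk down from the root, choosing a leaf with probability
        # proportional to its priority.
        s = random.uniform(0.0, self.tree[1])
        pos = 1
        while pos < self.capacity:
            left = 2 * pos
            if s <= self.tree[left]:
                pos = left
            else:
                s -= self.tree[left]
                pos = left + 1
        return pos - self.capacity
```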

for _ in episodes:
errors_out.append(0.0)
errors_out_step = []
# print('----------------------------------------------------')
Member


Can you remove commented-out code like this?

self.data = []
self.priority_tree = SumTree()
self.data_inf = collections.deque()
self.count_used = []
Member

@muupan muupan Mar 8, 2017


What is the purpose of self.count_used?

Member Author


Sorry. count_used was not used.

@muupan muupan mentioned this pull request Mar 9, 2017
@muupan (Member) commented Mar 14, 2017

I tried adding this test case to tests/test_replay_buffer.py to check capacity handling, and it failed: the buffer exceeded the specified capacity. The capacity parameter doesn't seem to work. Can you fix it?

    def test_capacity(self):
        capacity = 10
        rbuf = replay_buffer.PrioritizedReplayBuffer(capacity)
        # Fill the buffer
        for _ in range(capacity):
            trans1 = dict(state=0, action=1, reward=2, next_state=3,
                          next_action=4, is_state_terminal=True)
            rbuf.append(**trans1)
        self.assertEqual(len(rbuf), capacity)

        # Add a new transition
        trans2 = dict(state=1, action=1, reward=2, next_state=3,
                      next_action=4, is_state_terminal=True)
        rbuf.append(**trans2)
        # The size should not change
        self.assertEqual(len(rbuf), capacity)

self.assertEqual(s2[1], trans1)


class PrioritizedReplayBuffer(unittest.TestCase):
Member


Can you rename it to TestPrioritizedReplayBuffer to avoid confusion?

@toslunar (Member, Author)
Fixed the capacity issue.

The capacity argument of PrioritizedBuffer used to limit only len(self.data). Now it limits len(self) (= len(self.data) + len(self.data_inf)).
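The new semantics can be sketched as follows. The class below is a stripped-down illustration using the data/data_inf names from the PR; the eviction logic is hypothetical:

```python
import collections

class PrioritizedBuffer:
    """Sketch of the capacity fix: capacity bounds len(self), not len(self.data)."""

    def __init__(self, capacity=None):
        self.capacity = capacity
        self.data = []                        # transitions with assigned priorities
        self.data_inf = collections.deque()   # new transitions, not yet prioritized

    def __len__(self):
        # The fix: the bound applies to the total of both containers
        return len(self.data) + len(self.data_inf)

    def append(self, value):
        if self.capacity is not None and len(self) >= self.capacity:
            self._evict_one()
        self.data_inf.append(value)

    def _evict_one(self):
        # Illustrative eviction: drop from whichever container has items
        if self.data:
            self.data.pop(0)
        else:
            self.data_inf.popleft()
```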

self.priority_tree[i] = self.priority_tree[n-1]
del self.priority_tree[n-1]
ret = self.data[i]
self.data[i] = self.data.pop()
Member


self.data[i] = self.data.pop()

This would raise an IndexError when i == n - 1, because self.data has only n - 1 elements after the pop.
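A common way to fix this swap-with-last removal is to pop first and special-case an index that pointed at the last slot. A hypothetical helper (not the PR's code):

```python
def remove_at(data, i):
    """Remove data[i] in O(1) by swapping it with the last element.

    Illustrative fix for the i == len(data) - 1 edge case.
    """
    last = data.pop()      # remove the last element first
    if i == len(data):     # i pointed at the last element; nothing to swap back
        return last
    ret = data[i]
    data[i] = last         # move the old last element into slot i
    return ret
```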

@muupan (Member) commented Mar 16, 2017

The example script fails. Can you fix it?

$ python examples/gym/train_dqn_gym.py --prioritized-replay --episodic-replay
Output files are saved in dqn_out/20170316174313282734
INFO:gym.envs.registration:Making new env: Pendulum-v0
INFO:gym.envs.registration:Making new env: Pendulum-v0
DEBUG:chainerrl.agents.dqn:t:0 q:-0.3146612048149109 action_value:QuadraticActionValue greedy_actions:[[-0.89204156]] v:[[-0.3146612]]
...
DEBUG:chainerrl.agents.dqn:t:200 r:-0.00918397948453 a:[-0.41467309]
Saved the agent to dqn_out/20170316174313282734/200_except
Traceback (most recent call last):
  File "examples/gym/train_dqn_gym.py", line 179, in <module>
    main()
  File "examples/gym/train_dqn_gym.py", line 175, in main
    max_episode_len=timestep_limit)
  File "/home/fujita/drill/chainerrl/experiments/train_agent.py", line 124, in train_agent_with_evaluation
    successful_score=successful_score)
  File "/home/fujita/drill/chainerrl/experiments/train_agent.py", line 55, in train_agent
    agent.stop_episode_and_train(obs, r, done=done)
  File "/home/fujita/drill/chainerrl/agents/dqn.py", line 449, in stop_episode_and_train
    self.stop_episode()
  File "/home/fujita/drill/chainerrl/agents/dqn.py", line 456, in stop_episode
    self.replay_buffer.stop_current_episode()
  File "/home/fujita/drill/chainerrl/replay_buffer.py", line 232, in stop_current_episode
    self.episodic_memory.append(self.current_episode)
  File "/home/fujita/drill/chainerrl/misc/prioritized.py", line 18, in append
    if len(self) > self.capacity:
TypeError: unorderable types: int() > NoneType()
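The comparison len(self) > self.capacity fails on Python 3 when no capacity was given, because an int and None are unorderable. A hedged sketch of the usual guard, with hypothetical names:

```python
class EpisodicMemory:
    """Illustrative fix: treat capacity=None as an unbounded buffer."""

    def __init__(self, capacity=None):
        self.capacity = capacity
        self.episodes = []

    def append(self, episode):
        self.episodes.append(episode)
        # `len(self.episodes) > None` raises TypeError on Python 3,
        # so check for None before comparing.
        if self.capacity is not None and len(self.episodes) > self.capacity:
            self.episodes.pop(0)
```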

@coveralls

coveralls commented Mar 16, 2017

Coverage Status

Changes Unknown when pulling 4cec485 on toslunar:prioritized-replay into ** on pfnet:master**.

@coveralls

Coverage Status

Changes Unknown when pulling 620fb17 on toslunar:prioritized-replay into ** on pfnet:master**.

@coveralls

coveralls commented Mar 16, 2017

Coverage Status

Changes Unknown when pulling b6ca346 on toslunar:prioritized-replay into ** on pfnet:master**.


@muupan (Member) commented Mar 21, 2017

Thanks for the fixes! Now it looks good.

@muupan muupan merged commit f4cff28 into chainer:master Mar 21, 2017
@toslunar toslunar deleted the prioritized-replay branch August 13, 2017 21:49