Mimic the details of prioritized experience replay by muupan · Pull Request #301 · chainer/chainerrl

muupan · 2018-08-31T06:37:50Z

This PR adds changes to mimic the details of the original paper and the authors' implementation.

Absolute TD errors are clipped to [error_min, error_max]. The default is [0, 1].
eps is added to the clipped errors before alpha is applied.
The default value of eps is 0.01. I confirmed by email with the first author that they use 0.01.
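The priority rule described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not ChainerRL's actual code: the helper name priorities_from_errors is hypothetical, and the alpha default of 0.6 is an assumed value chosen for illustration (this PR does not state it). error_min, error_max and eps use the defaults given in this PR. The sketch also checks that clipping |delta| to [0, 1] matches the magnitude of clipping delta to [-1, 1], as in the commit message below.

```python
import numpy as np


def priorities_from_errors(errors, alpha=0.6, eps=0.01,
                           error_min=0.0, error_max=1.0):
    """Hypothetical helper: turn TD errors into sampling priorities.

    Assumption: alpha=0.6 is an illustrative default, not taken from this PR.
    """
    abs_errors = np.abs(np.asarray(errors, dtype=np.float64))
    # Clip the absolute TD error to [error_min, error_max] (default [0, 1]).
    clipped = np.clip(abs_errors, error_min, error_max)
    # eps is added to the clipped error, not to the final priority,
    # so alpha is applied to (clipped + eps).
    return (clipped + eps) ** alpha


deltas = np.array([-3.0, -0.5, 0.0, 0.2, 2.0])

# Clipping |delta| to [0, 1] yields the same magnitude as clipping delta to [-1, 1].
assert np.allclose(np.clip(np.abs(deltas), 0.0, 1.0),
                   np.abs(np.clip(deltas, -1.0, 1.0)))

print(priorities_from_errors(deltas, alpha=1.0))
```

Because eps is added before the exponent, a transition with zero TD error still gets a strictly positive priority. With alpha=0.6 that priority is 0.01 ** 0.6 (about 0.063) rather than the 0.01 you would get by adding eps to the priority after applying alpha.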

I will update the examples to mimic their training settings later.


toslunar

LGTM

muupan added 3 commits June 18, 2018 17:16

Mimic the paper

76ff6cd

epsilon is added to (absolute) errors, not priorities
default epsilon is 0.01
TD errors are clipped to [-1, 1]

Merge branch 'master' into replicate-prioritized-replay

7214d4a

Merge branch 'master' into replicate-prioritized-replay

4d32a21

muupan mentioned this pull request Aug 31, 2018

Tuned DoubleDQN with prioritized experience replay #302

Merged

toslunar approved these changes Aug 31, 2018


toslunar merged commit eb18687 into chainer:master Aug 31, 2018

toslunar added this to the v0.5 milestone Sep 7, 2018

muupan added enhancement no-compat labels Nov 13, 2018
