
Categorical DQN (C51)#249

Merged
toslunar merged 52 commits into chainer:master from muupan:c51
Apr 6, 2018

Conversation

@muupan
Member

@muupan muupan commented Mar 16, 2018

  • Check performance on Atari
  • Add tests
    • tests of C51
    • tests of DistributionalDiscreteActionValue
  • Clean code
    • Implement its own __init__
    • Clarify differences from DQN in the docstring

Merge #248 first.

@muupan
Member Author

muupan commented Mar 28, 2018

Below are the results of examples/ale/train_c51_ale.py {rom} and examples/ale/train_dqn_ale.py {rom} --agent DQN/DoubleDQN/PAL, each with three random seeds. C51 achieves better scores than the other DQN variants across these games.

[Score plots for each game: asterix, beam_rider, breakout, pong, qbert, seaquest, space_invaders]

@muupan muupan changed the title [WIP] C51 [WIP] Categorical DQN (C51) Apr 2, 2018
@muupan
Member Author

muupan commented Apr 2, 2018

It is unclear to me whether the categorical projection should be implemented as Algorithm 1 or as Eq. (7) in the paper. Algorithm 1 seems wrong to me when b_j is an integer, so I handled the case where b_j is an integer separately.
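For reference, the edge case can be sketched as follows. This is a plain-NumPy illustration with a hypothetical helper name, not ChainerRL's actual code: it projects a distribution (atom values y with probabilities y_probs) onto the fixed support z, handling the integer b_j case separately.

```python
import numpy as np

def project_naive(y, y_probs, z):
    # Hypothetical illustration of the categorical projection step:
    # redistribute each atom's mass (y, y_probs) onto the fixed support z.
    n_atoms = len(z)
    v_min, v_max = z[0], z[-1]
    delta_z = (v_max - v_min) / (n_atoms - 1)
    proj = np.zeros(n_atoms)
    for yi, p in zip(y, y_probs):
        bj = (np.clip(yi, v_min, v_max) - v_min) / delta_z
        l, u = int(np.floor(bj)), int(np.ceil(bj))
        if l == u:
            # bj is an integer: Algorithm 1 would split the mass using the
            # weights (u - bj) == 0 and (bj - l) == 0, losing it entirely,
            # so assign all mass to atom l instead.
            proj[l] += p
        else:
            proj[l] += p * (u - bj)
            proj[u] += p * (bj - l)
    return proj
```

With a support z = linspace(0, 10, 11), an atom exactly at 5.0 keeps all of its mass on atom index 5, while an atom at 2.5 splits its mass evenly between indices 2 and 3.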

@muupan muupan changed the title [WIP] Categorical DQN (C51) Categorical DQN (C51) Apr 3, 2018
Member

@toslunar toslunar left a comment


Thanks a lot. I reviewed.

import chainer
from chainer import cuda
from chainer import functions as F

Member


Fix style: remove this empty line

(batch_size, n_atoms).
y_probs (ndarray): Probabilities of atoms whose values are y.
Its shape must be (batch_size, n_atoms).
z (ndarray): Values of atoms before projection after projection. Its
Member


Fix typo: Values of atoms before projection after projection.

for j in range(n_atoms - 1):
    if z[j] < yi <= z[j + 1]:
        proj_probs[b, j] += (z[j + 1] - yi) / delta_z * p
        proj_probs[b, j + 1] += (yi - z[j]) / delta_z * p
Member


Could you use delta_z = z[j + 1] - z[j] in this naive implementation?

scatter_add(
    z_probs.ravel(),
    (l.astype(xp.int32) + offset).ravel(),
    (y_probs * (u - bj)).ravel())
Member


Using (y_probs * (1 - (bj - l))).ravel() instead would eliminate the special treatment of the case l == u.
The authors' use of u = ceil(bj) in the paper seems to me to be no more than a way of avoiding "z[n_atoms] += 0".
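The suggested weighting can be sketched like this. It is a NumPy illustration with hypothetical names (ChainerRL's actual code uses xp and scatter_add); np.add.at plays the role of scatter_add here.

```python
import numpy as np

def project_vectorized(y, y_probs, z):
    # Sketch of the reviewer's suggestion: weight the l-side scatter by
    # 1 - (bj - l) instead of (u - bj). When bj is an integer, this weight
    # is exactly 1 and the u-side weight (bj - l) is 0, so no separate
    # handling of the case l == u is needed.
    n_atoms = len(z)
    v_min, v_max = z[0], z[-1]
    delta_z = (v_max - v_min) / (n_atoms - 1)
    bj = (np.clip(y, v_min, v_max) - v_min) / delta_z
    l = np.floor(bj).astype(np.int32)
    u = np.minimum(l + 1, n_atoms - 1)  # clip at the top edge of the support
    proj = np.zeros(n_atoms)
    np.add.at(proj, l, y_probs * (1.0 - (bj - l)))  # scatter_add analogue
    np.add.at(proj, u, y_probs * (bj - l))
    return proj
```

At the top edge (y == v_max), l is the last atom, the l-side weight is 1 and the u-side weight is 0, so clipping u at n_atoms - 1 only ever adds zero mass.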

Member Author


Wow, your solution is definitely better than mine!

n_atoms = 51
v_max = 500
v_min = 0
z_values = np.linspace(v_min, v_max, num=n_atoms, dtype=np.float32)
Member


Could you consider moving this line into the package? (e.g. pass v_min, v_max, n_atoms to DistributionalFCStateQFunctionWithDiscreteAction.) z_values should be linspace anyway.

"""Compute a loss of categorical DQN."""
y, t = self._compute_y_and_t(exp_batch, gamma)
# minimize the cross entropy
eltwise_loss = -t * F.log(F.clip(y, 1e-10, 1.))
Member


Could you explain why F.clip is here?

Member Author


From earlier experiments, I found clipping was necessary for training CategoricalDQN. Without clipping, some probability values converge to 0, resulting in log(0) -> NaN.

Other unofficial implementations also apply clipping.
https://github.com/Kaixhin/Rainbow/blob/master/agent.py#L85
https://github.com/floringogianu/categorical-dqn/blob/master/policy_improvement/categorical_update.py#L53

Since clipping at 1e-10 worked, I didn't tune the value further. It is possible that larger values result in better performance.
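In NumPy terms, the clipped cross-entropy amounts to something like the following. This is a sketch with a hypothetical function name, using the same 1e-10 floor; the actual loss is computed with Chainer's F.clip and F.log.

```python
import numpy as np

def clipped_cross_entropy(y_probs, target_probs, eps=1e-10):
    # Clip the predicted atom probabilities away from 0 so that log()
    # stays finite even when an atom's probability has collapsed to
    # exactly 0 during training.
    clipped = np.clip(y_probs, eps, 1.0)
    return -np.sum(target_probs * np.log(clipped), axis=-1)
```

Without the clip, any target mass on a zero-probability atom contributes target * log(0) = -inf * nonzero, making the loss (and its gradient) NaN.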

Member Author


I added a comment to explain why.

muupan added 6 commits April 5, 2018 18:24
@muupan
Member Author

muupan commented Apr 5, 2018

Thanks for the review. I fixed them.

Member

@toslunar toslunar left a comment


LGTM

@toslunar toslunar merged commit e3e2c44 into chainer:master Apr 6, 2018
@muupan muupan added this to the v0.4 milestone Jul 23, 2018
