Conversation
This reverts commit 96339e1.
The previous implementation does not work for out-of-bound values.
It is unclear to me whether categorical projection should be implemented as Algorithm 1 or as equation (7) in the paper. Algorithm 1 seems wrong to me when b_j is an integer, so I handled the case when b_j is an integer separately.
chainerrl/action_value.py
Outdated
import chainer
from chainer import cuda
from chainer import functions as F

Fix style: remove this empty line
chainerrl/agents/categorical_dqn.py
Outdated
        (batch_size, n_atoms).
    y_probs (ndarray): Probabilities of atoms whose values are y.
        Its shape must be (batch_size, n_atoms).
    z (ndarray): Values of atoms before projection after projection. Its
Fix typo: Values of atoms before projection after projection.
for j in range(n_atoms - 1):
    if z[j] < yi <= z[j + 1]:
        proj_probs[b, j] += (z[j + 1] - yi) / delta_z * p
        proj_probs[b, j + 1] += (yi - z[j]) / delta_z * p
Could you use delta_z = z[j + 1] - z[j] in this naive implementation?
chainerrl/agents/categorical_dqn.py
Outdated
scatter_add(
    z_probs.ravel(),
    (l.astype(xp.int32) + offset).ravel(),
    (y_probs * (u - bj)).ravel())
Using (y_probs * (1 - (bj - l))).ravel() here could eliminate the special treatment of the case l == u.
The reason the authors of the paper use u = ceil(bj) seems to me to be no more than avoiding "z[n_atoms] += 0".
Wow, your solution is definitely better than mine!
n_atoms = 51
v_max = 500
v_min = 0
z_values = np.linspace(v_min, v_max, num=n_atoms, dtype=np.float32)
Could you consider moving this line into the package? (e.g. pass v_min, v_max, and n_atoms to DistributionalFCStateQFunctionWithDiscreteAction.) z_values should be a linspace anyway.
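A sketch of the suggested interface (the class and parameter names follow the suggestion above but are hypothetical; the actual chainerrl signature may differ): the q-function takes v_min, v_max, and n_atoms and builds the linspace support itself.

```python
import numpy as np

class DistributionalQFunction:
    """Hypothetical sketch of the suggested interface: build z_values
    inside the package from (v_min, v_max, n_atoms) rather than having
    every caller construct the linspace by hand."""

    def __init__(self, v_min, v_max, n_atoms):
        assert n_atoms >= 2 and v_max > v_min
        self.z_values = np.linspace(
            v_min, v_max, num=n_atoms, dtype=np.float32)

q_func = DistributionalQFunction(v_min=0, v_max=500, n_atoms=51)
```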
"""Compute a loss of categorical DQN."""
y, t = self._compute_y_and_t(exp_batch, gamma)
# minimize the cross entropy
eltwise_loss = -t * F.log(F.clip(y, 1e-10, 1.))
Could you explain why F.clip is here?
From earlier experiments, I found that clipping was necessary for training CategoricalDQN. Without clipping, some probability values converge to 0, resulting in log(0) -> NaN.
Other unofficial implementations also apply clipping:
https://github.com/Kaixhin/Rainbow/blob/master/agent.py#L85
https://github.com/floringogianu/categorical-dqn/blob/master/policy_improvement/categorical_update.py#L53
Since clipping by 1e-10 worked, I didn't tune it further. It is possible that larger values would result in better performance.
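The effect of the clip can be sketched in plain NumPy (F.clip and F.log replaced by their NumPy counterparts; illustrative only): when the target puts mass on an atom whose predicted probability has collapsed to 0, the unclipped loss is not finite.

```python
import numpy as np

def eltwise_loss(y, t, eps=1e-10):
    # Cross entropy with predicted probabilities clipped below at eps,
    # mirroring -t * F.log(F.clip(y, 1e-10, 1.)) in plain NumPy.
    return -t * np.log(np.clip(y, eps, 1.0))

t = np.array([1.0, 0.0])  # target distribution
y = np.array([0.0, 1.0])  # predicted probability collapsed to 0 on atom 0
loss = eltwise_loss(y, t)
assert np.isfinite(loss).all()  # without the clip, -1 * log(0) is infinite
```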
I added a comment to explain why.
Thanks for the review. I fixed them. |







- Implement its own `__init__`
- Clarify differences from DQN in the docstring
- Merge #248 first