Commit b5fd039 (merge, 2 parents: fd8259f + b55c9dd)

Merge pull request #246 from chainer/muupan-patch-1

Update the algorithm section of README

File tree: 1 file changed (+4, -1 lines)

README.md

Lines changed: 4 additions & 1 deletion
@@ -39,10 +39,12 @@ For more information, you can refer to [ChainerRL's documentation](http://chaine
 |:----------|:---------------:|:----------------:|:---------------:|:------------------:|
 | DQN (including DoubleDQN etc.) | ✓ | ✓ (NAF) | ✓ | x |
 | DDPG | x | ✓ | ✓ | x |
-| A3C | ✓ | ✓ | ✓ | ✓ |
+| A3C | ✓ | ✓ | ✓ | ✓ |
 | ACER | ✓ | ✓ | ✓ | ✓ |
 | NSQ (N-step Q-learning) | ✓ | ✓ (NAF) | ✓ | ✓ |
 | PCL (Path Consistency Learning) | ✓ | ✓ | ✓ | ✓ |
+| PPO | ✓ | ✓ | x | x |
+| TRPO | ✓ | ✓ | x | x |
 
 Following algorithms have been implemented in ChainerRL:
 - A3C (Asynchronous Advantage Actor-Critic)
@@ -53,6 +55,7 @@ Following algorithms have been implemented in ChainerRL:
 - PGT (Policy Gradient Theorem)
 - PCL (Path Consistency Learning)
 - PPO (Proximal Policy Optimization)
+- TRPO (Trust Region Policy Optimization)
 
 Q-function based algorithms such as DQN can utilize a Normalized Advantage Function (NAF) to tackle continuous-action problems as well as DQN-like discrete output networks.
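The NAF mentioned in the diff context above makes a Q-function usable with continuous actions by constraining the advantage to be quadratic in the action: Q(s, a) = V(s) - ½ (a - μ(s))ᵀ P(s) (a - μ(s)), so the greedy action is just μ(s) in closed form. A minimal NumPy sketch of that evaluation (illustrative only, not ChainerRL's actual implementation; all names here are made up for the example):

```python
import numpy as np

def naf_q_value(v, mu, L, action):
    """Q(s,a) = V(s) - 1/2 (a - mu)^T (L L^T) (a - mu).

    v:      state value V(s), scalar
    mu:     greedy action mu(s), shape (d,)
    L:      lower-triangular matrix with positive diagonal, shape (d, d)
    action: action to evaluate, shape (d,)
    """
    P = L @ L.T                      # positive semi-definite precision matrix
    diff = action - mu
    advantage = -0.5 * diff @ P @ diff
    return v + advantage

# The advantage is zero at mu(s) and negative everywhere else,
# so argmax_a Q(s, a) = mu(s) without any numerical optimization.
v, mu = 1.0, np.array([0.2, -0.3])
L = np.array([[1.0, 0.0],
              [0.5, 2.0]])
assert naf_q_value(v, mu, L, mu) == v                    # A(s, mu) == 0
assert naf_q_value(v, mu, L, np.array([1.0, 1.0])) < v   # other actions worse
```

Parameterizing P as L Lᵀ is what keeps the advantage concave in the action, which is why the greedy action stays closed-form.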