Merge pull request #246 from chainer/muupan-patch-1

toslunar · web-flow · commit b5fd039adaa4 · 2018-03-15T18:01:43.000+09:00
Update the algorithm section of README
diff --git a/README.md b/README.md
@@ -39,10 +39,12 @@ For more information, you can refer to [ChainerRL's documentation](http://chaine
 |:----------|:---------------:|:----------------:|:---------------:|:------------------:|
 | DQN (including DoubleDQN etc.) | ✓ | ✓ (NAF) | ✓ | x |
 | DDPG | x | ✓ | ✓ | x |
-| A3C | ✓ | ✓ | ✓ | ✓ |
+| A3C  | ✓ | ✓ | ✓ | ✓ |
 | ACER | ✓ | ✓ | ✓ | ✓ |
 | NSQ (N-step Q-learning) | ✓ | ✓ (NAF) | ✓ | ✓ |
 | PCL (Path Consistency Learning) | ✓ | ✓ | ✓ | ✓ |
+| PPO  | ✓ | ✓ | x | x |
+| TRPO | ✓ | ✓ | x | x |
 
 Following algorithms have been implemented in ChainerRL:
 - A3C (Asynchronous Advantage Actor-Critic)
@@ -53,6 +55,7 @@ Following algorithms have been implemented in ChainerRL:
 - PGT (Policy Gradient Theorem)
 - PCL (Path Consistency Learning)
 - PPO (Proximal Policy Optimization)
+- TRPO (Trust Region Policy Optimization)
 
 Q-function based algorithms such as DQN can utilize a Normalized Advantage Function (NAF) to tackle continuous-action problems as well as DQN-like discrete output networks.