Commit 315de4e

Merge pull request #469 from prabhatnagarajan/iqn_readme
Adds IQN Results to readme
2 parents f5509b9 + 4cb55d0 commit 315de4e

1 file changed

+94
-63
lines changed

examples/atari/iqn/README.md

@@ -24,68 +24,99 @@ To view the full list of options, either view the code or run the example with t
2424
## Results
2525
These results reflect ChainerRL `v0.6.0`. The ChainerRL score currently consists of a single run. The reported results from the IQN paper are also from a single run. We use the same evaluation protocol used in the IQN paper.
2626

27-
| Game | ChainerRL Score | Reported Scores |
27+
28+
| Results Summary ||
29+
| ------------- |:-------------:|
30+
| Number of seeds | 2 |
31+
| Number of common domains | 54 |
32+
| Number of domains where paper scores higher | 25 |
33+
| Number of domains where ChainerRL scores higher | 27 |
34+
| Number of ties between paper and ChainerRL | 2 |
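
The counts in the summary table can be reproduced with a simple tally over the per-game scores. The following is a minimal sketch, not the script used to generate the table: the function name `summarize` and the four-row sample are illustrative assumptions, with scores copied from the results table. It assumes that a game counts as a common domain only when both scores are available, and that ties are judged on the scores as displayed.

```python
from typing import Dict, Optional, Tuple

# Maps game name -> (ChainerRL score, paper score); None stands for "N/A".
Scores = Dict[str, Tuple[Optional[float], Optional[float]]]


def summarize(scores: Scores) -> Dict[str, int]:
    """Tally wins and ties over games where both scores are available."""
    summary = {"common": 0, "paper_higher": 0, "chainerrl_higher": 0, "ties": 0}
    for chainerrl, paper in scores.values():
        if chainerrl is None or paper is None:
            continue  # not a common domain
        summary["common"] += 1
        if chainerrl > paper:
            summary["chainerrl_higher"] += 1
        elif chainerrl < paper:
            summary["paper_higher"] += 1
        else:
            summary["ties"] += 1
    return summary


# A small sample of rows taken from the results table.
sample: Scores = {
    "AirRaid": (9672.1, None),
    "Alien": (12484.3, 7022.0),
    "Amidar": (2392.3, 2946.0),
    "Freeway": (34.0, 34.0),
}
print(summarize(sample))
# {'common': 3, 'paper_higher': 1, 'chainerrl_higher': 1, 'ties': 1}
```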
35+
36+
37+
| Game | ChainerRL Score | Original Reported Scores |
2838
| ------------- |:-------------:|:-------------:|
29-
| AirRaid | N/A| N/A|
30-
| Alien | N/A| **7022**|
31-
| Amidar | N/A| **2946**|
32-
| Assault | N/A| **29091**|
33-
| Asterix | **464666.66** | 342016|
34-
| Asteroids | N/A| **2898**|
35-
| Atlantis | N/A| **978200**|
36-
| Bank Heist | N/A| **1416**|
37-
| Battlezone | N/A| **42244**|
38-
| Beamrider | 35525.0 | **42776**|
39-
| Berzerk | N/A| **1053**|
40-
| Bowling | N/A| **86.5**|
41-
| Boxing | N/A| **99.8**|
42-
| Breakout | **738.0**| 734|
43-
| Carnival | N/A| N/A|
44-
| Centipede | N/A| **11561**|
45-
| Chopper Command | N/A| **16836**|
46-
| Crazy Climber | N/A| **179082**|
47-
| Demon Attack | N/A| **128580**|
48-
| Double Dunk | N/A| **5.6**|
49-
| Elevator Action | N/A| N/A|
50-
| Enduro | N/A| **2359**|
51-
| Fishing Derby | N/A| **33.8**|
52-
| Freeway | N/A| **34.0**|
53-
| Frostbite | N/A| **4342**|
54-
| Gopher | N/A| **118365**|
55-
| Gravitar | N/A| **911**|
56-
| H.E.R.O. | N/A| **28386**|
57-
| Ice Hockey | N/A| **0.2**|
58-
| James Bond 007 | N/A| **35108**|
59-
| Journey Escape | N/A| N/A|
60-
| Kangaroo | N/A| **15487**|
61-
| Krull | N/A| **10707**|
62-
| Kung-Fu Master | N/A| **73512**|
63-
| Montezuma's Revenge | N/A| **0**|
64-
| Ms. Pac-Man | N/A| **6349**|
65-
| Name This Game | N/A| **22682**|
66-
| Phoenix | N/A| **56599**|
67-
| Pitfall II | N/A| N/A|
68-
| Pitfall! | N/A| **0.0**|
39+
| AirRaid | 9672.1| N/A|
40+
| Alien | **12484.3**| 7022|
41+
| Amidar | 2392.3| **2946**|
42+
| Assault | 24731.9| **29091**|
43+
| Asterix | **454846.7**| 342016|
44+
| Asteroids | **3885.9**| 2898|
45+
| Atlantis | 946912.5| **978200**|
46+
| BankHeist | 1326.3| **1416**|
47+
| BattleZone | **69316.2**| 42244|
48+
| BeamRider | 38111.4| **42776**|
49+
| Berzerk | **138167.9**| 1053|
50+
| Bowling | 84.3| **86.5**|
51+
| Boxing | **99.9**| 99.8|
52+
| Breakout | 658.6| **734**|
53+
| Carnival | 5267.2| N/A|
54+
| Centipede | 11265.2| **11561**|
55+
| ChopperCommand | **43466.9**| 16836|
56+
| CrazyClimber | 178111.6| **179082**|
57+
| DemonAttack | **134637.5**| 128580|
58+
| DoubleDunk | **8.3**| 5.6|
59+
| Enduro | **2363.3**| 2359|
60+
| FishingDerby | **39.3**| 33.8|
61+
| Freeway | **34.0**| **34.0**|
62+
| Frostbite | **8531.3**| 4342|
63+
| Gopher | 116037.5| **118365**|
64+
| Gravitar | **1010.8**| 911|
65+
| Hero | 27639.9| **28386**|
66+
| IceHockey | -0.3| **0.2**|
67+
| Jamesbond | 27959.5| **35108**|
68+
| JourneyEscape | -685.6| N/A|
69+
| Kangaroo | **15517.7**| 15487|
70+
| Krull | 9809.3| **10707**|
71+
| KungFuMaster | **87566.3**| 73512|
72+
| MontezumaRevenge | **0.6**| 0.0|
73+
| MsPacman | 5786.5| **6349**|
74+
| NameThisGame | **23151.3**| 22682|
75+
| Phoenix | **145318.8**| 56599|
76+
| Pitfall | 0.0| N/A|
77+
| Pitfall! | N/A| 0.0|
6978
| Pong | **21.0**| **21.0**|
70-
| Pooyan | N/A| N/A|
71-
| Private Eye | N/A| **200**|
72-
| Qbert | **25971.3**| 25750|
73-
| River Raid | N/A| **17765**|
74-
| Road Runner | N/A| **57900**|
75-
| Robot Tank | N/A| **62.5**|
76-
| Seaquest | **31670.0**| 30140|
77-
| Skiing | N/A| **-9289**|
78-
| Solaris | N/A| **8007**|
79-
| Space Invaders | **43649.0**| 28888|
80-
| Stargunner | N/A| **74677**|
81-
| Tennis | N/A| **23.6**|
82-
| Time Pilot | N/A| **12236**|
83-
| Tutankham | N/A| **293**|
84-
| Up’n Down | N/A| **88148**|
85-
| Venture | N/A| **1318**|
86-
| Video Pinball | N/A| **698045**|
87-
| WizardOfWor | N/A| **31190**|
88-
| YarsRevenge | N/A| **28379**|
89-
| Zaxxon | N/A| **21772**|
90-
91-
79+
| Pooyan | 28041.5| N/A|
80+
| PrivateEye | **289.9**| 200|
81+
| Qbert | 24950.3| **25750**|
82+
| Riverraid | **20716.1**| 17765|
83+
| RoadRunner | **63523.6**| 57900|
84+
| Robotank | **77.1**| 62.5|
85+
| Seaquest | 27045.5| **30140**|
86+
| Skiing | -9354.7| **-9289**|
87+
| Solaris | 7423.3| **8007**|
88+
| SpaceInvaders | 27810.9| **28888**|
89+
| StarGunner | **189208.0**| 74677|
90+
| Tennis | **23.8**| 23.6|
91+
| TimePilot | **12758.3**| 12236|
92+
| Tutankham | **337.4**| 293|
93+
| UpNDown | 83140.0| **88148**|
94+
| Venture | 289.0| **1318**|
95+
| VideoPinball | 664013.5| **698045**|
96+
| WizardOfWor | 20892.8| **31190**|
97+
| YarsRevenge | **30385.0**| 28379|
98+
| Zaxxon | 14754.4| **21772**|
99+
100+
101+
## Evaluation Protocol
102+
Our evaluation protocol is designed to mirror that of the original paper as closely as possible, so that the comparison with our example implementation is fair. The details of our evaluation, which can also be found in the code, are as follows:
103+
104+
- **Evaluation Frequency**: The agent is evaluated after every 1 million frames (250K timesteps). This results in a total of 200 "intermediate" evaluations.
105+
- **Evaluation Phase**: The agent is evaluated for 500K frames (125K timesteps) in each intermediate evaluation.
106+
- **Output**: The output of an intermediate evaluation phase is the mean score of all evaluation episodes completed within the 125K timesteps. Any episode still unfinished when the 125K-timestep evaluation phase ends is discarded.
107+
- **Intermediate Evaluation Episode**:
108+
  - Each evaluation episode is capped at 30 minutes of play, or 108K frames (27K timesteps).
109+
  - Each evaluation episode begins with a random number of no-ops (up to 30), chosen uniformly at random.
110+
- **Reporting**: For each run of our IQN example, we take the best score among the intermediate evaluations as the evaluation for that agent. We then average this over all runs (i.e., seeds) to produce the score reported in the table.
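
The scoring steps above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the code used by the example: the names `intermediate_score`, `run_score`, and `reported_score` are hypothetical, the frame skip of 4 is inferred from the stated ratio of frames to timesteps, and an episode that ends exactly on the budget boundary is assumed to count as completed.

```python
from statistics import mean
from typing import Iterable, Sequence, Tuple

FRAME_SKIP = 4  # assumption: inferred from 1M frames == 250K timesteps
EVAL_INTERVAL_STEPS = 1_000_000 // FRAME_SKIP  # evaluate every 250K timesteps
EVAL_PHASE_STEPS = 500_000 // FRAME_SKIP       # 125K timesteps per evaluation
MAX_EPISODE_STEPS = 108_000 // FRAME_SKIP      # 27K timesteps per episode
MAX_NOOPS = 30                                 # upper bound on initial no-ops


def intermediate_score(
    episodes: Iterable[Tuple[int, float]], budget: int = EVAL_PHASE_STEPS
) -> float:
    """Mean return of the episodes that finish within the timestep budget.

    `episodes` yields (length_in_timesteps, episode_return) in play order.
    The episode in progress when the budget runs out is discarded.
    Raises statistics.StatisticsError if no episode completes.
    """
    used = 0
    completed = []
    for length, episode_return in episodes:
        if used + length > budget:
            break  # unfinished episode: discard it and stop
        used += length
        completed.append(episode_return)
    return mean(completed)


def run_score(intermediate_scores: Sequence[float]) -> float:
    """Score of one run (seed): the best intermediate evaluation."""
    return max(intermediate_scores)


def reported_score(runs: Sequence[Sequence[float]]) -> float:
    """Score reported in the table: per-run best, averaged over runs."""
    return mean(run_score(scores) for scores in runs)


# Five capped-length episodes: only four fit inside the 125K-step budget.
episodes = [(27_000, 10.0), (27_000, 20.0), (27_000, 30.0),
            (27_000, 40.0), (27_000, 99.0)]
print(intermediate_score(episodes))  # 25.0

# Two seeds with three intermediate evaluations each (made-up numbers).
print(reported_score([[1.0, 5.0, 3.0], [2.0, 4.0, 7.0]]))  # 6.0
```

Because the reported number is the best intermediate evaluation of each run rather than the final one, it is an optimistic estimate of the final policy's performance; this matches the protocol described above.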
111+
112+
113+
## Training Times
114+
115+
| Statistic | Domain | Time (in days) |
116+
| ------------- |:-------------:|:-------------:|
117+
| Mean time across all domains | | 9.20622761441 |
118+
| Fastest Domain | YarsRevenge | 8.51702608487 |
119+
| Slowest Domain | Freeway | 10.0143018948 |
120+
121+
122+
