Add Trajectory and Policy Queues #3113

Merged Jan 3, 2020
30 commits
baf86b2
Add advance method and policy queue
Dec 20, 2019
bbcb5a4
Merge branch 'master' into develop-queue
Dec 20, 2019
884b65f
Add trajectory queue
Dec 21, 2019
aae95fb
Remove trainer from agent_processor
Dec 21, 2019
5f0a353
Fix tests
Dec 21, 2019
237c55e
Add timer for process trajectory
Dec 21, 2019
b2acd97
Remove trainer from constructor
Dec 21, 2019
4244d53
Fix remaining tests
Dec 21, 2019
a3e0284
Move some logic to helper function
Dec 23, 2019
5198582
Move stepping logic into advance() function (#3124)
Jan 2, 2020
fd868bf
Make AgentManager separate class
Jan 2, 2020
3b70207
Merge branch 'master' into develop-queue
Jan 3, 2020
c0a3c3c
Fix tests
Jan 3, 2020
95af49c
Fix more tests
Jan 3, 2020
c2f1ae3
Clean up Trainer interface
Jan 3, 2020
0523144
Fix RLTrainer Test
Jan 3, 2020
8725c59
Additional cleanup
Jan 3, 2020
9272df6
queue type cleanup (#3152)
Jan 3, 2020
f1fccef
Fix some typing and make constructor explicit
Jan 3, 2020
8c6378e
Merge branch 'master' of github.com:Unity-Technologies/ml-agents into…
Jan 3, 2020
14a0336
Clean up get_max_steps and remove dead method
Jan 3, 2020
7126882
Update migrating doc
Jan 3, 2020
751968c
Add queue exception and fix issue with get_max_steps
Jan 3, 2020
2e912bd
Disable Pylint for RLTrainer
Jan 3, 2020
ffd9299
Fix AgentManagerQueue in tests
Jan 3, 2020
d13dacd
Merge branch 'master' into develop-queue
Jan 3, 2020
0514f95
Fix stats reporting for curriculum lesson
Jan 3, 2020
b15492d
Revert "Fix stats reporting for curriculum lesson"
Jan 3, 2020
224bc0f
Fix lesson number reporting
Jan 3, 2020
5889e7e
Ignore mismatched brain_names in curricula
Jan 3, 2020
71 changes: 35 additions & 36 deletions config/sac_trainer_config.yaml
@@ -7,15 +7,15 @@ default:
init_entcoef: 1.0
learning_rate: 3.0e-4
learning_rate_schedule: constant
- max_steps: 5.0e4
+ max_steps: 5.0e5
memory_size: 256
normalize: false
num_update: 1
train_interval: 1
num_layers: 2
time_horizon: 64
sequence_length: 64
- summary_freq: 1000
+ summary_freq: 10000
tau: 0.005
use_recurrent: false
vis_encode_type: simple
@@ -28,73 +28,73 @@ FoodCollector:
normalize: false
batch_size: 256
buffer_size: 500000
- max_steps: 1.0e5
+ max_steps: 2.0e6
init_entcoef: 0.05
train_interval: 1

Bouncer:
normalize: true
- max_steps: 5.0e5
+ max_steps: 2.0e7
num_layers: 2
hidden_units: 64
- summary_freq: 1000
+ summary_freq: 20000

PushBlock:
- max_steps: 5.0e4
+ max_steps: 1.5e7
init_entcoef: 0.05
hidden_units: 256
- summary_freq: 2000
+ summary_freq: 60000
time_horizon: 64
num_layers: 2

SmallWallJump:
- max_steps: 1.0e6
+ max_steps: 3e7
hidden_units: 256
- summary_freq: 2000
+ summary_freq: 20000
time_horizon: 128
init_entcoef: 0.1
num_layers: 2
normalize: false

BigWallJump:
- max_steps: 1.0e6
+ max_steps: 3e7
hidden_units: 256
- summary_freq: 2000
+ summary_freq: 20000
time_horizon: 128
num_layers: 2
init_entcoef: 0.1
normalize: false

Striker:
- max_steps: 5.0e5
+ max_steps: 5.0e6
learning_rate: 1e-3
hidden_units: 256
- summary_freq: 2000
+ summary_freq: 20000
time_horizon: 128
init_entcoef: 0.1
num_layers: 2
normalize: false

Goalie:
- max_steps: 5.0e5
+ max_steps: 5.0e6
learning_rate: 1e-3
hidden_units: 256
- summary_freq: 2000
+ summary_freq: 20000
time_horizon: 128
init_entcoef: 0.1
num_layers: 2
normalize: false

Pyramids:
- summary_freq: 2000
+ summary_freq: 30000
time_horizon: 128
batch_size: 128
buffer_init_steps: 10000
buffer_size: 500000
hidden_units: 256
num_layers: 2
init_entcoef: 0.01
- max_steps: 5.0e5
+ max_steps: 1.0e7
sequence_length: 16
tau: 0.01
use_recurrent: false
@@ -115,7 +115,7 @@ VisualPyramids:
hidden_units: 256
buffer_init_steps: 1000
num_layers: 1
- max_steps: 5.0e5
+ max_steps: 1.0e7
buffer_size: 500000
init_entcoef: 0.01
tau: 0.01
@@ -134,21 +134,21 @@ VisualPyramids:
normalize: true
batch_size: 64
buffer_size: 12000
- summary_freq: 1000
+ summary_freq: 12000
time_horizon: 1000
hidden_units: 64
init_entcoef: 0.5

3DBallHard:
normalize: true
batch_size: 256
- summary_freq: 1000
+ summary_freq: 12000
time_horizon: 1000

Tennis:
buffer_size: 500000
normalize: true
- max_steps: 2e5
+ max_steps: 4e6

CrawlerStatic:
normalize: true
@@ -157,8 +157,8 @@ CrawlerStatic:
train_interval: 2
buffer_size: 500000
buffer_init_steps: 2000
- max_steps: 5e5
- summary_freq: 3000
+ max_steps: 5e6
+ summary_freq: 30000
init_entcoef: 1.0
num_layers: 3
hidden_units: 512
@@ -172,10 +172,10 @@ CrawlerDynamic:
time_horizon: 1000
batch_size: 256
buffer_size: 500000
- summary_freq: 3000
+ summary_freq: 30000
train_interval: 2
num_layers: 3
- max_steps: 1e6
+ max_steps: 1e7
hidden_units: 512
reward_signals:
extrinsic:
@@ -187,8 +187,8 @@ Walker:
time_horizon: 1000
batch_size: 256
buffer_size: 500000
- max_steps: 2e6
- summary_freq: 3000
+ max_steps: 2e7
+ summary_freq: 30000
num_layers: 4
train_interval: 2
hidden_units: 512
@@ -202,16 +202,16 @@ Reacher:
time_horizon: 1000
batch_size: 128
buffer_size: 500000
- max_steps: 2e5
- summary_freq: 3000
+ max_steps: 2e7
+ summary_freq: 60000

Hallway:
sequence_length: 32
num_layers: 2
hidden_units: 128
memory_size: 256
init_entcoef: 0.1
- max_steps: 5.0e5
+ max_steps: 1.0e7
summary_freq: 1000
time_horizon: 64
use_recurrent: true
@@ -223,8 +223,7 @@ VisualHallway:
memory_size: 256
gamma: 0.99
batch_size: 64
- max_steps: 5.0e5
- summary_freq: 1000
+ max_steps: 1.0e7
time_horizon: 64
use_recurrent: true

@@ -237,8 +236,8 @@ VisualPushBlock:
gamma: 0.99
buffer_size: 1024
batch_size: 64
- max_steps: 5.0e5
- summary_freq: 1000
+ max_steps: 3.0e6
+ summary_freq: 60000
time_horizon: 64

GridWorld:
@@ -249,8 +248,8 @@ GridWorld:
init_entcoef: 0.5
buffer_init_steps: 1000
buffer_size: 50000
- max_steps: 50000
- summary_freq: 2000
+ max_steps: 500000
+ summary_freq: 20000
time_horizon: 5
reward_signals:
extrinsic:
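Across this diff, max_steps and summary_freq are scaled up by roughly one to two orders of magnitude for most behaviors; together with the "Update migrating doc" commit, this likely reflects steps now being counted per-agent rather than per-environment, so training budgets grow in proportion to the number of agent instances in each scene. As a reference for how a config laid out like this is typically consumed, below is a minimal sketch that resolves one behavior's hyperparameters by merging the default section with the behavior-specific overrides. The loading logic is an assumption for illustration, not the ml-agents implementation.

# Minimal sketch (assumed, not the ml-agents loader): resolve the effective
# hyperparameters for one behavior from a file laid out like
# sac_trainer_config.yaml, i.e. a "default" section plus per-behavior overrides.
import yaml  # PyYAML

def load_trainer_config(path: str, behavior_name: str) -> dict:
    with open(path) as f:
        config = yaml.safe_load(f)
    resolved = dict(config.get("default", {}))      # start from the defaults
    resolved.update(config.get(behavior_name, {}))  # apply behavior overrides
    return resolved

if __name__ == "__main__":
    hallway = load_trainer_config("config/sac_trainer_config.yaml", "Hallway")
    # Hallway overrides max_steps (1.0e7 after this change) while inheriting
    # e.g. learning_rate from the default section. Note that YAML 1.1 only
    # auto-converts exponents with an explicit sign (3.0e-4) to floats, so
    # values such as "1.0e7" may load as strings and need an explicit cast.
    print(float(hallway["max_steps"]), float(hallway["learning_rate"]))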