Merged
31 commits
abc2223
Make --resume an option
Mar 27, 2020
69080a1
Fix issue
Mar 27, 2020
3afb8ac
Make load error an exception
Mar 27, 2020
dfb657b
Make StatsWriter erase existing tb files
Mar 27, 2020
3537c36
Make --force and check for existing models
Mar 27, 2020
b334404
Fix comment
Mar 27, 2020
c2cbef3
Remove directory deletion
Mar 27, 2020
57af197
Deprecate --train
Mar 27, 2020
8b35f15
Add tests
Mar 28, 2020
a45d2e4
Edit docs
Mar 28, 2020
146d6cb
Update changelog and migrating
Mar 28, 2020
48c274e
Don't clear TB on resume
Mar 30, 2020
c60057e
Test tensorboard clearing
Mar 30, 2020
1122ea3
Merge branch 'master' into develop-cliflags
Mar 30, 2020
cf05095
Added warning when cleaning out old events files
Mar 30, 2020
8460213
Merge branch 'develop-cliflags' of github.com:Unity-Technologies/ml-a…
Mar 30, 2020
a334f8b
Merge branch 'master' into develop-cliflags
Mar 30, 2020
bea21de
Fix merge
Mar 30, 2020
6e43ed0
Fix merge
Mar 30, 2020
91ca95d
Merge branch 'develop-cliflags' of github.com:Unity-Technologies/ml-a…
Mar 30, 2020
c59881e
Add initialization functionality to policy
Mar 30, 2020
d36cc74
Add ability to initialize from model to CLI
Mar 30, 2020
e0e07b1
Change docs and changelog
Mar 30, 2020
7f01203
Add tests
Mar 30, 2020
e47c260
Fix issue if init_path is None
Mar 31, 2020
33f5695
Update changelog with PR number
Mar 31, 2020
6f6d6f4
Change name of initialize_ckpt_path
Apr 1, 2020
e43c6e5
Capitalize run ID
Apr 1, 2020
e200918
Fix learn.py test
Apr 1, 2020
af9f7c7
Nicer error when model is found but can't load.
Apr 1, 2020
4353e5f
Added init_path to documentation
Apr 1, 2020
1 change: 1 addition & 0 deletions com.unity.ml-agents/CHANGELOG.md
@@ -12,6 +12,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
- The Jupyter notebooks have been removed from the repository.
- Introduced the `SideChannelUtils` to register, unregister and access side channels.
- `Academy.FloatProperties` was removed, please use `SideChannelUtils.GetSideChannel<FloatPropertiesChannel>()` instead.
- Added ability to start training (initialize model weights) from a previous run ID. (#3710)

### Minor Changes
- Format of console output has changed slightly and now matches the name of the model/summary directory. (#3630, #3616)
8 changes: 8 additions & 0 deletions docs/Training-ML-Agents.md
@@ -113,6 +113,10 @@ If you've already trained a model using the specified `<run-identifier>` and `--
specified, you will not be able to continue with training. Use `--force` to force ML-Agents to
overwrite the existing data.

Alternatively, you might want to start a new training run but _initialize_ it using an already-trained
model. You may want to do this, for instance, if your environment changed and you want
a new model, but the old behavior is still better than random. You can do this by specifying `--initialize-from=<run-identifier>`, where `<run-identifier>` is the old run ID.

### Command Line Training Options

In addition to passing the path of the Unity executable containing your training
@@ -164,6 +168,9 @@ environment, you can set the following command line options when invoking
as the current agents in your scene.
* `--force`: Attempting to train a model with a run-id that has been used before will
throw an error. Use `--force` to force-overwrite this run-id's summary and model data.
* `--initialize-from=<run-identifier>`: Specify an old run-id here to initialize your model from
a previously trained model. Note that the previously saved models _must_ have the same behavior
parameters as your current environment.
* `--no-graphics`: Specify this option to run the Unity executable in
`-batchmode` without initializing the graphics driver. Use this only if your
training doesn't involve visual observations (reading from Pixels). See
@@ -226,6 +233,7 @@ example environments are included in the provided config file.
| train_interval | How often to update the agent. | SAC |
| num_update | Number of mini-batches to update the agent with during each update. | SAC |
| use_recurrent | Train using a recurrent neural network. See [Using Recurrent Neural Networks](Feature-Memory.md). | PPO, SAC |
| init_path | Initialize trainer from a previously saved model. | PPO, SAC |

\*PPO = Proximal Policy Optimization, SAC = Soft Actor-Critic, BC = Behavioral Cloning (Imitation), GAIL = Generative Adversarial Imitation Learning

11 changes: 11 additions & 0 deletions docs/Training-PPO.md
@@ -289,6 +289,17 @@ Default Value: `0` (all)

Typical Range: Approximately equal to PPO's `buffer_size`

### (Optional) Advanced: Initialize Model Path

`init_path` can be specified to initialize your model from a previous run before starting.
Note that the prior run should have used the same trainer configurations as the current run,
and have been saved with the same version of ML-Agents. You should provide the full path
to the folder where the checkpoints were saved, e.g. `./models/{run-id}/{behavior_name}`.

This option is provided in case you want to initialize different behaviors from different runs;
in most cases, it is sufficient to use the `--initialize-from` CLI parameter to initialize
all models from the same run.
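
As a rough illustration of the `./models/{run-id}/{behavior_name}` layout described above, the sketch below builds one `init_path` per behavior. The helper `build_init_path`, the run IDs, and the behavior names are all hypothetical and are not part of ML-Agents; only the path pattern comes from the text.

```python
from typing import Dict


def build_init_path(run_id: str, behavior_name: str, models_dir: str = "./models") -> str:
    """Hypothetical helper: assemble ./models/{run-id}/{behavior_name}."""
    return f"{models_dir}/{run_id}/{behavior_name}"


# Hypothetical setup: two behaviors, each initialized from a different earlier run.
init_paths: Dict[str, str] = {
    "Striker": build_init_path("soccer_run1", "Striker"),
    "Goalie": build_init_path("soccer_run2", "Goalie"),
}

for behavior, path in init_paths.items():
    print(f"{behavior}: init_path = {path}")
```

Each resulting string would then be set as the `init_path` value in that behavior's trainer configuration.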

## Training Statistics

To view training statistics, use TensorBoard. For information on launching and
11 changes: 11 additions & 0 deletions docs/Training-SAC.md
@@ -286,6 +286,17 @@ Typical Range (Continuous): `512` - `5120`

Typical Range (Discrete): `32` - `512`

### (Optional) Advanced: Initialize Model Path

`init_path` can be specified to initialize your model from a previous run before starting.
Note that the prior run should have used the same trainer configurations as the current run,
and have been saved with the same version of ML-Agents. You should provide the full path
to the folder where the checkpoints were saved, e.g. `./models/{run-id}/{behavior_name}`.

This option is provided in case you want to initialize different behaviors from different runs;
in most cases, it is sufficient to use the `--initialize-from` CLI parameter to initialize
all models from the same run.

## Training Statistics

To view training statistics, use TensorBoard. For information on launching and
20 changes: 16 additions & 4 deletions ml-agents/mlagents/trainers/learn.py
@@ -86,13 +86,20 @@ def _create_parser():
default=False,
dest="force",
action="store_true",
-help="Force-overwrite existing models and summaries for a run-id that has been used "
+help="Force-overwrite existing models and summaries for a run ID that has been used "
"before.",
)
argparser.add_argument(
"--run-id",
default="ppo",
-help="The directory name for model and summary statistics",
+help="The run identifier for model and summary statistics.",
)
argparser.add_argument(
"--initialize-from",
metavar="RUN_ID",
default=None,
help="Specify a previously saved run ID from which to initialize the model. "
"This can be used, for instance, to fine-tune an existing model on a new environment. ",
)
argparser.add_argument(
"--save-freq", default=50000, type=int, help="Frequency at which to save model"
@@ -113,7 +120,7 @@ def _create_parser():
dest="inference",
action="store_true",
help="Run in Python inference mode (don't train). Use with --resume to load a model trained with an "
-"existing run-id.",
+"existing run ID.",
)
argparser.add_argument(
"--base-port",
@@ -194,6 +201,7 @@ class RunOptions(NamedTuple):
seed: int = parser.get_default("seed")
env_path: Optional[str] = parser.get_default("env_path")
run_id: str = parser.get_default("run_id")
initialize_from: Optional[str] = parser.get_default("initialize_from")
load_model: bool = parser.get_default("load_model")
resume: bool = parser.get_default("resume")
force: bool = parser.get_default("force")
@@ -268,6 +276,9 @@ def run_training(run_seed: int, options: RunOptions) -> None:
"""
with hierarchical_timer("run_training.setup"):
model_path = f"./models/{options.run_id}"
maybe_init_path = (
f"./models/{options.initialize_from}" if options.initialize_from else None
)
summaries_dir = "./summaries"
port = options.base_port

@@ -281,7 +292,7 @@ def run_training(run_seed: int, options: RunOptions) -> None:
],
)
handle_existing_directories(
-model_path, summaries_dir, options.resume, options.force
+model_path, summaries_dir, options.resume, options.force, maybe_init_path
)
tb_writer = TensorboardWriter(summaries_dir, clear_past_data=not options.resume)
gauge_write = GaugeWriter()
@@ -319,6 +330,7 @@ def run_training(run_seed: int, options: RunOptions) -> None:
not options.inference,
options.resume,
run_seed,
maybe_init_path,
maybe_meta_curriculum,
options.multi_gpu,
)
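
The way the new flag turns into a path can be tried in isolation. The sketch below is a reduced stand-in for the parser in `learn.py`, keeping only `--run-id` and `--initialize-from`. The names `create_parser` and `init_path_for` are illustrative assumptions; the path expression itself mirrors the one added to `run_training`.

```python
import argparse
from typing import Optional


def create_parser() -> argparse.ArgumentParser:
    # Reduced stand-in for learn.py's parser: only the two options discussed here.
    parser = argparse.ArgumentParser()
    parser.add_argument("--run-id", default="ppo")
    parser.add_argument("--initialize-from", metavar="RUN_ID", default=None)
    return parser


def init_path_for(initialize_from: Optional[str]) -> Optional[str]:
    # Same expression as in run_training: no flag (or an empty value) means no init path.
    return f"./models/{initialize_from}" if initialize_from else None


# Simulate a command line that starts run "run2" from the weights of "run1".
args = create_parser().parse_args(["--run-id", "run2", "--initialize-from", "run1"])
model_path = f"./models/{args.run_id}"
maybe_init_path = init_path_for(args.initialize_from)
print(model_path, maybe_init_path)
```

Because the default is `None`, omitting the flag leaves `maybe_init_path` as `None`, which is the value the updated `test_learn.py` expects to be passed through.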
56 changes: 48 additions & 8 deletions ml-agents/mlagents/trainers/policy/tf_policy.py
@@ -63,6 +63,7 @@ def __init__(self, seed, brain, trainer_parameters, load=False):
if self.use_continuous_act:
self.num_branches = self.brain.vector_action_space_size[0]
self.model_path = trainer_parameters["model_path"]
self.initialize_path = trainer_parameters.get("init_path", None)
self.keep_checkpoints = trainer_parameters.get("keep_checkpoints", 5)
self.graph = tf.Graph()
self.sess = tf.Session(
@@ -109,23 +110,52 @@ def _initialize_graph(self):
init = tf.global_variables_initializer()
self.sess.run(init)

-def _load_graph(self):
+def _load_graph(self, model_path: str, reset_global_steps: bool = False) -> None:
with self.graph.as_default():
self.saver = tf.train.Saver(max_to_keep=self.keep_checkpoints)
-logger.info("Loading Model for brain {}".format(self.brain.brain_name))
-ckpt = tf.train.get_checkpoint_state(self.model_path)
+logger.info(
+"Loading model for brain {} from {}.".format(
+self.brain.brain_name, model_path
+)
+)
+ckpt = tf.train.get_checkpoint_state(model_path)
if ckpt is None:
raise UnityPolicyException(
"The model {0} could not be loaded. Make "
"sure you specified the right "
-"--run-id. and that the previous run you are resuming from had the same "
-"behavior names.".format(self.model_path)
+"--run-id and that the previous run you are loading from had the same "
+"behavior names.".format(model_path)
)
try:
self.saver.restore(self.sess, ckpt.model_checkpoint_path)
except tf.errors.NotFoundError:
raise UnityPolicyException(
"The model {0} was found but could not be loaded. Make "
"sure the model is from the same version of ML-Agents, has the same behavior parameters, "
"and is using the same trainer configuration as the current run.".format(
model_path
)
)
if reset_global_steps:
# Reset the restored global step so that the log message below holds.
self._set_step(0)
logger.info(
"Starting training from step 0 and saving to {}.".format(
self.model_path
)
)
else:
logger.info(
"Resuming training from step {}.".format(self.get_current_step())
)
-self.saver.restore(self.sess, ckpt.model_checkpoint_path)

def initialize_or_load(self):
-if self.load:
-self._load_graph()
+# If there is an initialize path, load from that. Else, load from the set model path.
+# If load is set to True, don't reset steps to 0. Else, do. This allows a user to,
+# e.g., resume from an initialize path.
+reset_steps = not self.load
+if self.initialize_path is not None:
+self._load_graph(self.initialize_path, reset_global_steps=reset_steps)
+elif self.load:
+self._load_graph(self.model_path, reset_global_steps=reset_steps)
else:
self._initialize_graph()
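
The branching in `initialize_or_load` can be summarized as a small decision function. This is a sketch without TensorFlow: `choose_load_action` is a hypothetical name, and it only reports which path would be loaded and whether the step counter would be reset, based on the logic shown in the hunk above.

```python
from typing import Optional, Tuple


def choose_load_action(
    initialize_path: Optional[str], model_path: str, load: bool
) -> Optional[Tuple[str, bool]]:
    """Return (path to load from, reset_global_steps), or None for a fresh graph."""
    reset_steps = not load
    if initialize_path is not None:
        # An init path takes priority over the run's own model path.
        return initialize_path, reset_steps
    if load:
        # Plain resume: load from the current run and keep its step count.
        return model_path, reset_steps
    # Neither option set: initialize variables from scratch.
    return None


print(choose_load_action("./models/run1", "./models/run2", load=False))
print(choose_load_action(None, "./models/run2", load=True))
print(choose_load_action(None, "./models/run2", load=False))
```

One consequence of this ordering is that when both an init path and `load` are set, the weights still come from the init path, but the step count is kept rather than reset.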

@@ -295,6 +325,16 @@ def get_current_step(self):
step = self.sess.run(self.global_step)
return step

def _set_step(self, step: int) -> int:
"""
Sets current model step to step without creating additional ops.
:param step: Step to set the current model step to.
:return: The step the model was set to.
"""
current_step = self.get_current_step()
# Increment a positive or negative number of steps.
return self.increment_step(step - current_step)
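
The trick in `_set_step` is to reuse the existing increment operation with a signed delta instead of adding a new assignment op to the graph. A toy counter, with no TensorFlow and a hypothetical class name `StepCounter`, shows the same arithmetic:

```python
class StepCounter:
    """Toy stand-in for the policy's global step; not the real TFPolicy."""

    def __init__(self, step: int = 0) -> None:
        self._step = step

    def get_current_step(self) -> int:
        return self._step

    def increment_step(self, n_steps: int) -> int:
        # Accepts negative values, which is what makes set_step possible.
        self._step += n_steps
        return self._step

    def set_step(self, step: int) -> int:
        # Move to the target by incrementing with the (possibly negative) difference.
        return self.increment_step(step - self.get_current_step())


counter = StepCounter(2000)
print(counter.set_step(0))
print(counter.set_step(500))
```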

def increment_step(self, n_steps):
"""
Increments model step.
2 changes: 1 addition & 1 deletion ml-agents/mlagents/trainers/tests/test_learn.py
@@ -53,7 +53,7 @@ def test_run_training(
None,
)
handle_dir_mock.assert_called_once_with(
-"./models/ppo", "./summaries", False, False
+"./models/ppo", "./summaries", False, False, None
)
StatsReporter.writers.clear() # make sure there aren't any writers as added by learn.py

49 changes: 47 additions & 2 deletions ml-agents/mlagents/trainers/tests/test_nn_policy.py
@@ -1,4 +1,6 @@
import pytest
import os
from typing import Dict, Any

import numpy as np
from mlagents.tf_utils import tf
@@ -54,7 +56,14 @@ def dummy_config():
NUM_AGENTS = 12


-def create_policy_mock(dummy_config, use_rnn, use_discrete, use_visual):
+def create_policy_mock(
+dummy_config: Dict[str, Any],
+use_rnn: bool = False,
+use_discrete: bool = True,
+use_visual: bool = False,
+load: bool = False,
+seed: int = 0,
+) -> NNPolicy:
mock_brain = mb.setup_mock_brain(
use_discrete,
use_visual,
@@ -66,10 +75,46 @@ def create_policy_mock(dummy_config, use_rnn, use_discrete, use_visual):
trainer_parameters = dummy_config
trainer_parameters["keep_checkpoints"] = 3
trainer_parameters["use_recurrent"] = use_rnn
-policy = NNPolicy(0, mock_brain, trainer_parameters, False, False)
+policy = NNPolicy(seed, mock_brain, trainer_parameters, False, load)
return policy


def test_load_save(dummy_config, tmp_path):
path1 = os.path.join(tmp_path, "runid1")
path2 = os.path.join(tmp_path, "runid2")
trainer_params = dummy_config
trainer_params["model_path"] = path1
policy = create_policy_mock(trainer_params)
policy.initialize_or_load()
policy.save_model(2000)

assert len(os.listdir(tmp_path)) > 0

# Try load from this path
policy2 = create_policy_mock(trainer_params, load=True, seed=1)
policy2.initialize_or_load()
_compare_two_policies(policy, policy2)

# Try initialize from path 1
trainer_params["model_path"] = path2
trainer_params["init_path"] = path1
policy3 = create_policy_mock(trainer_params, load=False, seed=2)
policy3.initialize_or_load()

_compare_two_policies(policy2, policy3)


def _compare_two_policies(policy1: NNPolicy, policy2: NNPolicy) -> None:
"""
Make sure two policies have the same output for the same input.
"""
step = mb.create_batchedstep_from_brainparams(policy1.brain, num_agents=1)
run_out1 = policy1.evaluate(step, list(step.agent_id))
run_out2 = policy2.evaluate(step, list(step.agent_id))

np.testing.assert_array_equal(run_out2["log_probs"], run_out1["log_probs"])


@pytest.mark.parametrize("discrete", [True, False], ids=["discrete", "continuous"])
@pytest.mark.parametrize("visual", [True, False], ids=["visual", "vector"])
@pytest.mark.parametrize("rnn", [True, False], ids=["rnn", "no_rnn"])
12 changes: 12 additions & 0 deletions ml-agents/mlagents/trainers/tests/test_trainer_util.py
@@ -357,3 +357,15 @@ def test_existing_directories(tmp_path):
trainer_util.handle_existing_directories(model_path, summary_path, True, False)
# Test try to train w/ force - should work
trainer_util.handle_existing_directories(model_path, summary_path, False, True)

# Test initialize option
init_path = os.path.join(tmp_path, "runid2")
with pytest.raises(UnityTrainerException):
trainer_util.handle_existing_directories(
model_path, summary_path, False, True, init_path
)
os.mkdir(init_path)
# Should pass since the directory exists now.
trainer_util.handle_existing_directories(
model_path, summary_path, False, True, init_path
)
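
The implementation of `handle_existing_directories` is not visible in this part of the diff, so the following is only a guess at checks that would be consistent with the test above. `check_directories` and `TrainerSetupError` are hypothetical stand-ins (the real code raises `UnityTrainerException`), and the summary directory is left out for brevity.

```python
import os
from typing import Optional


class TrainerSetupError(Exception):
    """Hypothetical stand-in for UnityTrainerException."""


def check_directories(
    model_path: str, resume: bool, force: bool, init_path: Optional[str] = None
) -> None:
    model_exists = os.path.isdir(model_path)
    if model_exists and not (resume or force):
        # A run ID that was used before needs an explicit --resume or --force.
        raise TrainerSetupError(f"Previous data found at {model_path}.")
    if not model_exists and resume:
        # Nothing to resume from.
        raise TrainerSetupError(f"No previous run found at {model_path}.")
    if init_path is not None and not os.path.isdir(init_path):
        # --initialize-from must point at a run that actually exists.
        raise TrainerSetupError(f"Cannot initialize from missing run {init_path}.")
```

Checking the init path up front means a mistyped run ID fails before the environment is launched, rather than later when the policy tries to load the checkpoint.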