[refactor] CLI changes #3705

Merged: 20 commits on Mar 30, 2020
3 changes: 3 additions & 0 deletions com.unity.ml-agents/CHANGELOG.md
@@ -7,6 +7,8 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.

## [Unreleased]
### Major Changes
- The `--load` and `--train` command-line flags have been deprecated. Training now happens by default;
use `--resume` to resume training from a previous run. (#3705)
- The Jupyter notebooks have been removed from the repository.
- Introduced the `SideChannelUtils` to register, unregister and access side channels.
- `Academy.FloatProperties` was removed, please use `SideChannelUtils.GetSideChannel<FloatPropertiesChannel>()` instead.
@@ -23,6 +25,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
- Environment subprocesses now close immediately on timeout or wrong API version. (#3679)
- Fixed an issue in the gym wrapper that would raise an exception if an Agent called EndEpisode multiple times in the same step. (#3700)
- Fixed an issue where exceptions from environments provided a returncode of 0. (#3680)
- Running `mlagents-learn` with the same `--run-id` twice will no longer overwrite the existing files. (#3705)
- Fixed an issue where logging output was not visible; logging levels are now set consistently (#3703).

## [0.15.0-preview] - 2020-03-18
19 changes: 11 additions & 8 deletions docs/Getting-Started.md
@@ -197,19 +197,17 @@ which accepts arguments used to configure both training and inference phases.
2. Navigate to the folder where you cloned the ML-Agents toolkit repository.
**Note**: If you followed the default [installation](Installation.md), then
you should be able to run `mlagents-learn` from any directory.
3. Run `mlagents-learn <trainer-config-path> --run-id=<run-identifier> --train`
3. Run `mlagents-learn <trainer-config-path> --run-id=<run-identifier>`
where:
- `<trainer-config-path>` is the relative or absolute filepath of the
trainer configuration. The defaults used by example environments included
in `MLAgentsSDK` can be found in `config/trainer_config.yaml`.
- `<run-identifier>` is a string used to separate the results of different
training runs
- `--train` tells `mlagents-learn` to run a training session (rather
than inference)
training runs. Make sure to use one that hasn't been used already!
4. If you cloned the ML-Agents repo, then you can simply run

```sh
mlagents-learn config/trainer_config.yaml --run-id=firstRun --train
mlagents-learn config/trainer_config.yaml --run-id=firstRun
```

5. When the message _"Start training by pressing the Play button in the Unity
@@ -219,7 +217,6 @@ which accepts arguments used to configure both training and inference phases.
**Note**: If you're using Anaconda, don't forget to activate the ml-agents
environment first.

The `--train` flag tells the ML-Agents toolkit to run in training mode.
The `--time-scale=100` sets the `Time.TimeScale` value in Unity.

**Note**: You can train using an executable rather than the Editor. To do so,
@@ -330,8 +327,14 @@ Either wait for the training process to close the window or press Ctrl+C at the
command-line prompt. If you close the window manually, the `.nn` file
containing the trained model is not exported into the ml-agents folder.

You can press Ctrl+C to stop the training, and your trained model will be at
`models/<run-identifier>/<behavior_name>.nn` where
If you've quit the training early using Ctrl+C and want to resume training, run the
same command again, appending the `--resume` flag:

```sh
mlagents-learn config/trainer_config.yaml --run-id=firstRun --resume
```

Your trained model will be at `models/<run-identifier>/<behavior_name>.nn` where
`<behavior_name>` is the name of the `Behavior Name` of the agents corresponding to the model.
(**Note:** There is a known bug on Windows that causes the saving of the model to
fail when you early terminate the training, it's recommended to wait until Step
2 changes: 1 addition & 1 deletion docs/Learning-Environment-Create-New.md
@@ -418,7 +418,7 @@ in this simple environment, speeds up training.
To train in the editor, run the following Python command from a Terminal or Console
window before pressing play:

mlagents-learn config/config.yaml --run-id=RollerBall-1 --train
mlagents-learn config/config.yaml --run-id=RollerBall-1

(where `config.yaml` is a copy of `trainer_config.yaml` that you have edited
to change the `batch_size` and `buffer_size` hyperparameters for your trainer.)
8 changes: 3 additions & 5 deletions docs/Learning-Environment-Executable.md
@@ -76,27 +76,25 @@ env = UnityEnvironment(file_name=<env_name>)
followed the default [installation](Installation.md), then navigate to the
`ml-agents/` folder.
3. Run
`mlagents-learn <trainer-config-file> --env=<env_name> --run-id=<run-identifier> --train`
`mlagents-learn <trainer-config-file> --env=<env_name> --run-id=<run-identifier>`
Where:
* `<trainer-config-file>` is the file path of the trainer configuration yaml
* `<env_name>` is the name and path to the executable you exported from Unity
(without extension)
* `<run-identifier>` is a string used to separate the results of different
training runs
* And the `--train` tells `mlagents-learn` to run a training session (rather
than inference)

For example, if you are training with a 3DBall executable you exported to the
directory where you installed the ML-Agents Toolkit, run:

```sh
mlagents-learn ../config/trainer_config.yaml --env=3DBall --run-id=firstRun --train
mlagents-learn ../config/trainer_config.yaml --env=3DBall --run-id=firstRun
```

And you should see something like

```console
ml-agents$ mlagents-learn config/trainer_config.yaml --env=3DBall --run-id=first-run --train
ml-agents$ mlagents-learn config/trainer_config.yaml --env=3DBall --run-id=first-run


▄▄▄▓▓▓▓
8 changes: 7 additions & 1 deletion docs/Migrating.md
@@ -10,6 +10,13 @@ The versions can be found in
## Migrating from 0.15 to latest

### Important changes
* The `--load` and `--train` command-line flags have been deprecated and replaced with `--resume` and `--inference`.
* Running with the same `--run-id` twice will now throw an error.

### Steps to Migrate
* Replace the `--load` flag with `--resume` when calling `mlagents-learn`, and don't use the `--train` flag as training
will happen by default. To run without training, use `--inference`.
* To force-overwrite files from a pre-existing run, add the `--force` command-line flag.
* The Jupyter notebooks have been removed from the repository.
* `Academy.FloatProperties` was removed.
* `Academy.RegisterSideChannel` and `Academy.UnregisterSideChannel` were removed.
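The CLI portion of these migration steps can be sketched as a small helper (hypothetical, for illustration only; it is not part of the ml-agents package):

```python
def migrate_cli_args(args):
    """Rewrite pre-change mlagents-learn flags to their post-#3705 equivalents.

    Hypothetical helper for illustration; not part of ml-agents itself.
    """
    migrated = []
    for arg in args:
        if arg == "--train":
            continue  # training is now the default, so the flag is dropped
        elif arg == "--load":
            migrated.append("--resume")  # --load was replaced by --resume
        else:
            migrated.append(arg)
    return migrated


old = ["config/trainer_config.yaml", "--run-id=firstRun", "--train", "--load"]
print(migrate_cli_args(old))  # → ['config/trainer_config.yaml', '--run-id=firstRun', '--resume']
```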
@@ -19,7 +26,6 @@ The versions can be found in
* Replace `Academy.RegisterSideChannel` with `SideChannelUtils.RegisterSideChannel()`.
* Replace `Academy.UnregisterSideChannel` with `SideChannelUtils.UnregisterSideChannel`.


## Migrating from 0.14 to 0.15

### Important changes
2 changes: 1 addition & 1 deletion docs/Training-Curriculum-Learning.md
@@ -110,7 +110,7 @@ for our curricula and PPO will train using Curriculum Learning. For example,
to train agents in the Wall Jump environment with curriculum learning, we can run:

```sh
mlagents-learn config/trainer_config.yaml --curriculum=config/curricula/wall_jump.yaml --run-id=wall-jump-curriculum --train
mlagents-learn config/trainer_config.yaml --curriculum=config/curricula/wall_jump.yaml --run-id=wall-jump-curriculum
```

We can then keep track of the current lessons and progress via TensorBoard.
2 changes: 1 addition & 1 deletion docs/Training-Environment-Parameter-Randomization.md
@@ -165,7 +165,7 @@ sampling setup, we would run

```sh
mlagents-learn config/trainer_config.yaml --sampler=config/3dball_randomize.yaml
--run-id=3D-Ball-randomize --train
--run-id=3D-Ball-randomize
```

We can observe progress and metrics via TensorBoard.
38 changes: 29 additions & 9 deletions docs/Training-ML-Agents.md
@@ -43,7 +43,7 @@ training options.
The basic command for training is:

```sh
mlagents-learn <trainer-config-file> --env=<env_name> --run-id=<run-identifier> --train
mlagents-learn <trainer-config-file> --env=<env_name> --run-id=<run-identifier>
```

where
@@ -68,7 +68,7 @@ contains agents ready to train. To perform the training:
environment you built in step 1:

```sh
mlagents-learn config/trainer_config.yaml --env=../../projects/Cats/CatsOnBicycles.app --run-id=cob_1 --train
mlagents-learn config/trainer_config.yaml --env=../../projects/Cats/CatsOnBicycles.app --run-id=cob_1
```

During a training session, the training program prints out and saves updates at
@@ -92,9 +92,27 @@ under the assigned run-id — in the cats example, the path to the model would be
`models/cob_1/CatsOnBicycles_cob_1.nn`.

While this example used the default training hyperparameters, you can edit the
[training_config.yaml file](#training-config-file) with a text editor to set
[trainer_config.yaml file](#training-config-file) with a text editor to set
different values.

To interrupt training and save the current progress, hit Ctrl+C once and wait for the
model to be saved out.

### Loading an Existing Model

If you've quit training early using Ctrl+C, you can resume the training run by running
`mlagents-learn` again, specifying the same `<run-identifier>` and appending the `--resume` flag
to the command.

You can also use this mode to run inference with an already-trained model in Python.
Append both the `--resume` and `--inference` flags to do this. Note that if you want to run
inference in Unity, you should use the
[Unity Inference Engine](Getting-started#Running-a-pre-trained-model).

If you've already trained a model using the specified `<run-identifier>` and `--resume` is not
specified, you will not be able to continue with training. Use `--force` to force ML-Agents to
overwrite the existing data.
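A minimal Python sketch of these run-id rules follows. It is an assumption about how the PR's `handle_existing_directories` helper behaves; the real function's signature, location, and error messages may differ:

```python
import os

def check_run_directories(model_path: str, resume: bool, force: bool) -> None:
    """Sketch of the --resume/--force collision rules described above.

    Assumption: mirrors trainer_util.handle_existing_directories in spirit;
    not the actual ml-agents implementation.
    """
    exists = os.path.isdir(model_path)
    if resume and not exists:
        # --resume needs a previous run to load from
        raise FileNotFoundError(f"--resume was given, but {model_path} does not exist")
    if exists and not resume and not force:
        # A fresh run may not silently clobber an existing run-id's data
        raise RuntimeError(
            f"{model_path} already exists; pass --resume to continue training "
            "or --force to overwrite it"
        )
```

A fresh run-id passes silently; reusing one requires an explicit `--resume` or `--force`.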

### Command Line Training Options

In addition to passing the path of the Unity executable containing your training
@@ -115,7 +133,7 @@ environment, you can set the following command line options when invoking
training. Defaults to 0.
* `--num-envs=<n>`: Specifies the number of concurrent Unity environment instances to
collect experiences from when training. Defaults to 1.
* `--run-id=<path>`: Specifies an identifier for each training run. This
* `--run-id=<run-identifier>`: Specifies an identifier for each training run. This
identifier is used to name the subdirectories in which the trained model and
summary statistics are saved as well as the saved model itself. The default id
is "ppo". If you use TensorBoard to view the training statistics, always set a
@@ -137,13 +155,15 @@ environment, you can set the following command line options when invoking
will use the port `(base_port + worker_id)`, where the `worker_id` is sequential IDs
given to each instance from 0 to `num_envs - 1`. Default is 5005. __Note:__ When
training using the Editor rather than an executable, the base port will be ignored.
* `--train`: Specifies whether to train model or only run in inference mode.
When training, **always** use the `--train` option.
* `--load`: If set, the training code loads an already trained model to
* `--inference`: Specifies whether to only run in inference mode. Omit to train the model.
To load an existing model, specify a run-id and combine with `--resume`.
* `--resume`: If set, the training code loads an already trained model to
initialize the neural network before training. The learning code looks for the
model in `models/<run-id>/` (which is also where it saves models at the end of
training). When not set (the default), the neural network weights are randomly
initialized and an existing model is not loaded.
training). This option only works when the models exist, and have the same behavior names
as the current agents in your scene.
* `--force`: Attempting to train a model with a run-id that has been used before will
throw an error. Use `--force` to force-overwrite this run-id's summary and model data.
* `--no-graphics`: Specify this option to run the Unity executable in
`-batchmode` and doesn't initialize the graphics driver. Use this only if your
training doesn't involve visual observations (reading from Pixels). See
61 changes: 53 additions & 8 deletions ml-agents/mlagents/trainers/learn.py
@@ -12,7 +12,11 @@
from mlagents import tf_utils
from mlagents.trainers.trainer_controller import TrainerController
from mlagents.trainers.meta_curriculum import MetaCurriculum
from mlagents.trainers.trainer_util import load_config, TrainerFactory
from mlagents.trainers.trainer_util import (
load_config,
TrainerFactory,
handle_existing_directories,
)
from mlagents.trainers.stats import (
TensorboardWriter,
CSVWriter,
@@ -68,7 +72,22 @@ def _create_parser():
default=False,
dest="load_model",
action="store_true",
help="Whether to load the model or randomly initialize",
help=argparse.SUPPRESS, # Deprecated but still usable for now.
)
argparser.add_argument(
"--resume",
default=False,
dest="resume",
action="store_true",
help="Resumes training from a checkpoint. Specify a --run-id to use this option.",
)
argparser.add_argument(
"--force",
default=False,
dest="force",
action="store_true",
help="Force-overwrite existing models and summaries for a run-id that has been used "
"before.",
)
argparser.add_argument(
"--run-id",
@@ -86,7 +105,15 @@ def _create_parser():
default=False,
dest="train_model",
action="store_true",
help="Whether to train model, or only run inference",
help=argparse.SUPPRESS,
)
argparser.add_argument(
"--inference",
default=False,
dest="inference",
action="store_true",
help="Run in Python inference mode (don't train). Use with --resume to load a model trained with an "
"existing run-id.",
)
argparser.add_argument(
"--base-port",
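The flag handling in this hunk can be exercised in a standalone sketch. The flag names match the diff, but the parser below is a simplified stand-in, not the real `_create_parser`:

```python
import argparse

# Deprecated flags stay parseable but are hidden from --help via
# argparse.SUPPRESS; --load later folds into --resume.
parser = argparse.ArgumentParser(prog="mlagents-learn")
parser.add_argument("--train", action="store_true", dest="train_model", help=argparse.SUPPRESS)
parser.add_argument("--load", action="store_true", dest="load_model", help=argparse.SUPPRESS)
parser.add_argument("--resume", action="store_true")
parser.add_argument("--force", action="store_true")
parser.add_argument("--inference", action="store_true")

args = parser.parse_args(["--load"])
resume = args.resume or args.load_model  # deprecated --load still implies resuming
print(resume)  # → True
```

Because `store_true` defaults to `False`, omitting every flag yields the new default behavior: train, don't resume, don't overwrite.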
@@ -168,7 +195,10 @@ class RunOptions(NamedTuple):
env_path: Optional[str] = parser.get_default("env_path")
run_id: str = parser.get_default("run_id")
load_model: bool = parser.get_default("load_model")
resume: bool = parser.get_default("resume")
force: bool = parser.get_default("force")
train_model: bool = parser.get_default("train_model")
inference: bool = parser.get_default("inference")
save_freq: int = parser.get_default("save_freq")
keep_checkpoints: int = parser.get_default("keep_checkpoints")
base_port: int = parser.get_default("base_port")
@@ -205,7 +235,8 @@ def from_argparse(args: argparse.Namespace) -> "RunOptions":
argparse_args["sampler_config"] = load_config(
argparse_args["sampler_file_path"]
)

# Keep deprecated --load working, TODO: remove
argparse_args["resume"] = argparse_args["resume"] or argparse_args["load_model"]
# Since argparse accepts file paths in the config options which don't exist in CommandLineOptions,
# these keys will need to be deleted to use the **/splat operator below.
argparse_args.pop("sampler_file_path")
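The `pop` calls above exist because the argparse results are `**`-splatted into the `RunOptions` NamedTuple, which rejects unknown fields. A reduced sketch of that pattern (field set trimmed for illustration):

```python
import argparse
from typing import NamedTuple

class RunOptions(NamedTuple):
    run_id: str = "ppo"
    resume: bool = False

# argparse carries extra keys (here, a config file path) that RunOptions
# does not declare, so they must be popped before splatting.
ns = argparse.Namespace(run_id="firstRun", resume=False, sampler_file_path=None)
d = vars(ns)
d.pop("sampler_file_path")
opts = RunOptions(**d)
print(opts.run_id)  # → firstRun
```

Without the `pop`, `RunOptions(**d)` would raise a `TypeError` for the unexpected keyword argument.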
@@ -249,7 +280,10 @@ def run_training(run_seed: int, options: RunOptions) -> None:
"Environment/Episode Length",
],
)
tb_writer = TensorboardWriter(summaries_dir)
handle_existing_directories(
model_path, summaries_dir, options.resume, options.force
)
tb_writer = TensorboardWriter(summaries_dir, clear_past_data=not options.resume)
gauge_write = GaugeWriter()
console_writer = ConsoleWriter()
StatsReporter.add_writer(tb_writer)
@@ -282,8 +316,8 @@ def run_training(run_seed: int, options: RunOptions) -> None:
options.run_id,
model_path,
options.keep_checkpoints,
options.train_model,
options.load_model,
not options.inference,
options.resume,
run_seed,
maybe_meta_curriculum,
options.multi_gpu,
@@ -296,7 +330,7 @@ def run_training(run_seed: int, options: RunOptions) -> None:
options.run_id,
options.save_freq,
maybe_meta_curriculum,
options.train_model,
not options.inference,
run_seed,
sampler_manager,
resampling_interval,
@@ -424,6 +458,17 @@ def run_cli(options: RunOptions) -> None:
logger.debug("Configuration for this run:")
logger.debug(json.dumps(options._asdict(), indent=4))

# Options deprecation warnings
if options.load_model:
logger.warning(
"The --load option has been deprecated. Please use the --resume option instead."
)
if options.train_model:
logger.warning(
"The --train option has been deprecated. Train mode is now the default. Use "
"--inference to run in inference mode."
)

run_seed = options.seed
if options.cpu:
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
7 changes: 4 additions & 3 deletions ml-agents/mlagents/trainers/policy/tf_policy.py
@@ -115,10 +115,11 @@ def _load_graph(self):
logger.info("Loading Model for brain {}".format(self.brain.brain_name))
ckpt = tf.train.get_checkpoint_state(self.model_path)
if ckpt is None:
logger.info(
"The model {0} could not be found. Make "
raise UnityPolicyException(
"The model {0} could not be loaded. Make "
"sure you specified the right "
"--run-id".format(self.model_path)
"--run-id, and that the previous run you are resuming from had the same "
"behavior names.".format(self.model_path)
)
self.saver.restore(self.sess, ckpt.model_checkpoint_path)

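The behavior change in this hunk can be sketched without TensorFlow. The `UnityPolicyException` below is a stand-in class, not the real import from `mlagents.trainers.exception`:

```python
class UnityPolicyException(Exception):
    """Stand-in for ml-agents' UnityPolicyException, for illustration."""

def load_checkpoint(ckpt_state, model_path):
    # A missing checkpoint is now a hard error rather than a logged
    # message, so a wrong --run-id fails fast instead of training from
    # an uninitialized network.
    if ckpt_state is None:
        raise UnityPolicyException(
            f"The model {model_path} could not be loaded. Make sure you "
            "specified the right --run-id, and that the previous run you are "
            "resuming from had the same behavior names."
        )
    return ckpt_state
```

In the real code, `ckpt_state` comes from `tf.train.get_checkpoint_state(self.model_path)`, which returns `None` when no checkpoint exists at that path.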