This project will no longer be maintained by Intel.
Intel has ceased development of and contributions to this project, including, but not limited to, maintenance, bug fixes, new releases, and updates.
Intel no longer accepts patches to this project.
If you have an ongoing need to use this project, are interested in independently developing it, or would like to maintain patches for the open source software community, please create your own fork of this project.
This repository contains the code used to obtain the experimental results in the paper *Modeling and Optimization Trade-off in Meta-Learning* by Gao and Sener (NeurIPS 2020).
It is based on the `full_code` branch of the ProMP repository.
The code is written in Python 3. The linear regression experiment requires only NumPy, while the reinforcement learning experiments additionally require TensorFlow and the MuJoCo physics engine. Some of the reinforcement learning environments are included in this repository; the rest come from MetaWorld.
Please follow the installation instructions provided by the ProMP repository and the MetaWorld repository. For the latter, use the `api-rework` branch for compatibility (it is already referenced in `requirements.txt`).
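The shell commands below sketch one possible setup on top of those instructions; the virtual-environment name and the purely pip-based install are assumptions, so defer to the ProMP and MetaWorld guides for the authoritative steps (in particular, MuJoCo must be installed separately).

```bash
# Minimal sketch of one possible setup (assumed, not the official instructions).
# MuJoCo itself must be installed separately, following the ProMP repository's guide.
python3 -m venv metalearning-env        # hypothetical environment name
source metalearning-env/bin/activate
pip install --upgrade pip
pip install -r requirements.txt         # includes MetaWorld's api-rework branch, as noted above
```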
Execute

```
python3 linear_regression/run_experiment.py --p 1 --beta 2 --seed 1
```

The figures can then be found in the folder `p-1_beta-2_seed-1/figures`.
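To reproduce the linear regression results over several random seeds, the command above can simply be looped in the shell. The sketch below is illustrative: the seed range and the output-folder pattern for other seeds are assumptions based on the single example above.

```bash
# Hedged sketch: sweep the random seed while keeping p and beta fixed.
# The seed range 1-5 is an assumption; adjust it to the runs you need.
for seed in 1 2 3 4 5; do
    python3 linear_regression/run_experiment.py --p 1 --beta 2 --seed "$seed"
done
# Each run presumably writes its figures to p-1_beta-2_seed-<seed>/figures.
```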
To create all the executable scripts that we need to run, execute

```
python3 experiments/benchmark/run.py
```

They will be found in the folder `scripts`.
The training scripts are of the form `algorithm_environment_mode_seed.sh`, and the testing scripts are of the form `test_algorithm_environment_mode_seed_checkpoint.sh` (see the example after this list).

- `algorithm` is replaced by `ppo` (DRS+PPO), `promp` (ProMP), `trpo` (DRS+TRPO), or `trpomaml` (TRPO-MAML).
- `environment` and `mode` are replaced by one of the following pairs:
  - `walker` and `params-interpolate` (Walker2DRandParams)
  - `walker` and `goal-interpolate` (Walker2DRandVel)
  - `cheetah` and `goal-interpolate` (HalfCheetahRandVel)
  - `hopper` and `params-interpolate` (HopperRandParams)
  - `metaworld` and `ml1-push` (ML1-Push)
  - `metaworld` and `ml1-reach` (ML1-Reach)
  - `metaworld` and `ml10` (ML10)
  - `metaworld` and `ml45` (ML45)
- `seed`, the random seed, is replaced by an integer from 1 to 5.
- `checkpoint`, which selects one of the policies saved at various stages of training, is replaced by an integer from 0 to 20.
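As a concrete illustration of the naming scheme, the snippet below runs one training configuration and then evaluates each of its checkpoints. Launching the generated `.sh` files with `bash` is an assumption about how they are meant to be invoked.

```bash
# Hypothetical example following the naming convention above:
# TRPO-MAML on Walker2DRandParams with random seed 1.
bash scripts/trpomaml_walker_params-interpolate_1.sh

# After training, test each stored checkpoint (0-20) for the same configuration.
for checkpoint in $(seq 0 20); do
    bash scripts/test_trpomaml_walker_params-interpolate_1_${checkpoint}.sh
done
```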
After all runs are finished, the figures can be created by executing

```
python3 experiments/benchmark/summary.py
```

They will be found in the folder `results`.
We would like to thank Charles Packer for help during the creation of the code for the reinforcement learning experiments.
To cite this repository in your research, please reference the following paper:
Katelyn Gao and Ozan Sener. Modeling and Optimization Trade-off in Meta-Learning. arXiv preprint arXiv:2010.12916 (2020).
```bibtex
@misc{GaoSener2020,
  author = {Katelyn Gao and Ozan Sener},
  title  = {Modeling and Optimization Trade-off in Meta-Learning},
  year   = {2020},
  eprint = {arXiv:2010.12916},
}
```