Replication package for the paper Complex Model Transformations by Reinforcement Learning with Uncertain Human Guidance (accepted at MODELS 2025).
Model-driven engineering problems often require complex model transformations (MTs), i.e., MTs that are chained in extensive sequences. Pertinent examples of such problems include model synchronization, automated model repair, and design space exploration. Manually developing complex MTs is an error-prone and often infeasible process. Reinforcement learning (RL) is an apt way to alleviate these issues. In RL, an autonomous agent explores the state space through trial and error to identify beneficial sequences of actions, such as MTs. However, RL methods exhibit performance issues in complex problems. In these situations, human guidance can be of high utility. In this paper, we present an approach and technical framework for developing complex MT sequences through RL, guided by potentially uncertain human advice. Our framework allows user-defined MTs to be mapped onto RL primitives, and executes them as RL programs to find optimal MT sequences. Our evaluation shows that human guidance, even if uncertain, substantially improves RL performance, and results in more efficient development of complex MTs. Through a sensible trade-off between the certainty and timeliness of human advice, our method takes a firm step towards machine learning-driven human-in-the-loop engineering methods.
- Content description
- Reproduction of analysis
- Reproduction of experimental data
- Experiment setup
- Results
- 01-advice: Contains all the experimental artifacts and visualizations used in our experiments (map, advice files, and advice visualized on the map).
- 02-data: Contains the experimental data produced in accordance with the Experiment setup (a loading sketch follows this list).
  - randomRewardData.csv: Cumulative rewards of a random-walk agent.
  - unadvisedRewardData.csv: Cumulative rewards of an unadvised (but not random) agent.
  - allRewardData.csv: Cumulative rewards of an agent advised by information about every state.
  - holesAndGoalRewardData.csv: Cumulative rewards of an agent advised by information about terminating states (negative termination and positive termination, i.e., the goal).
  - human10RewardData.csv: Cumulative rewards of an agent advised by a single human advisor about 10% of the states.
  - human5RewardData.csv: Cumulative rewards of an agent advised by a single human advisor about 5% of the states.
  - coop10SequentialRewardData.csv: Cumulative rewards of an agent advised by two cooperating human advisors (one located at the top left, one at the bottom right) who each advise about 10% of the states.
  - coop10ParallelRewardData.csv: Cumulative rewards of an agent advised by two cooperating human advisors (one located at the top right, one at the bottom left) who each advise about 10% of the states.
  - coop5SequentialRewardData.csv: Cumulative rewards of an agent advised by two cooperating human advisors (one located at the top left, one at the bottom right) who each advise about 5% of the states.
  - coop5ParallelRewardData.csv: Cumulative rewards of an agent advised by two cooperating human advisors (one located at the top right, one at the bottom left) who each advise about 5% of the states.
- 03-analysis: Contains the Python analysis scripts that produce the results in the 04-results folder.
- 04-results: Contains the plots and statistical significance values used in the publication.
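For a quick look at any of the reward files in 02-data, a minimal pandas sketch is shown below. The column layout is an assumption (one row per episode, cumulative reward in the last column); inspect the printed headers and adjust if they differ.

```python
# Quick inspection of one of the reward files in 02-data.
# Assumption (not guaranteed by this package): one row per training episode,
# with the cumulative reward in the last column. Check the printed headers
# and adapt the column selection if the layout differs.
import pandas as pd

rewards = pd.read_csv("02-data/human10RewardData.csv")
print(rewards.head())                   # show the actual column names
print(rewards.iloc[:, -1].describe())   # summary statistics of the rewards
```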
- Install the required Python packages by running pip install -r .\03-analysis\requirements.txt from the root folder.
- For the charts, run python .\03-analysis\plotting.py from the root folder and follow the instructions. Results will be generated into 04-results in two formats, in the respective pdf and png subfolders.
- For the significance tests, run python .\03-analysis\t_test.py > 04-results/significance/results.txt from the root folder. Results will be generated into 04-results/significance in a textual tabular format. (A minimal stand-alone sketch of such a comparison follows the note below.)

NOTE: The above steps have been tested with python>=3.8 && python<=3.13.
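The shipped t_test.py is the authoritative analysis. The snippet below is only a minimal, stand-alone illustration of how two reward series could be compared with SciPy; Welch's t-test and the per-episode CSV layout are assumptions and may differ from the script's actual configuration.

```python
# Illustrative significance test between two reward series (not t_test.py).
# Assumptions: each CSV stores the cumulative reward per episode in its last
# column; Welch's t-test (unequal variances) is used as an example comparison.
import pandas as pd
from scipy import stats

unadvised = pd.read_csv("02-data/unadvisedRewardData.csv").iloc[:, -1]
advised = pd.read_csv("02-data/human10RewardData.csv").iloc[:, -1]

t_stat, p_value = stats.ttest_ind(advised, unadvised, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.3g}")
```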
For the following steps, refer to the tool's official GitHub repository.
- Download Eclipse Modeling Tools, Version: 2025-06 (4.36.0) from the Eclipse Foundation site.
- Install Eclipse Xtend, Version: 2.39.0 either through the Marketplace or from the Xtend site.
- Install Viatra, Version: 2.9.1 either through the Marketplace or from the Viatra site.
- Clone the tool's official GitHub repository.
- Import the contents of the (1) plugins, (2) examples, and (3) tests folders into the running Eclipse instance.
- Generate the RL model and edit code using the genmodel in /plugins/ca.mcmaster.ssm.rl4mt.metamodel/models:
  - Open rl.genmodel, right-click the root node, and select Generate Model Code.
  - Right-click the root node and select Generate Edit Code.
- Generate the Lake model and edit code using the genmodel in /examples/ca.mcmaster.ssm.rl4mt.examples.lake.metamodel/models:
  - Open lake.genmodel, right-click the root node, and select Generate Model Code.
  - Right-click the root node and select Generate Edit Code.
Data can be obtained by running experiments encoded in unit tests. Unit tests are parameterized with human advice found in the 01-advice
folder of this replication package.
To locate the unit tests, navigate to https://github.com/ssm-lab/rl4mt/tree/main/tests/ca.mcmaster.ssm.rl4mt.examples.lake.tests/src/ca/mcmaster/ssm/rl4mt/examples/lake/tests
in the tool's official GitHub repository.
Repeat these steps for each experiment file.
- Right-click the file name.
- Go to Run As and click Run Configurations.
- Select JUnit Plug-in Test and create a new configuration. Optionally, name this configuration after the experiment file.
- Under the Test tab, select Run a single test, and under Test class, select the experiment file.
- Click on the Arguments tab.
  - Program arguments: -os ${target.os} -ws ${target.ws} -arch ${target.arch} -nl ${target.nl} -consoleLog -nosplash
  - VM arguments: -Xms512m -Xmx4096m
- Click on the Main tab. Under Program to Run, select Run an application and choose [No Application] - Headless Mode.

Note: Headless mode is preferred.
NOTE: The following steps take a long time (about half an hour each, depending on the hardware) to compute.
- Run LakeTestRandom.xtend
- Rename rewardData.csv to randomRewardData.csv
- Run LakeTestUnadvised.xtend
- Rename rewardData.csv to unadvisedRewardData.csv (a rename helper is sketched below; it applies to all subsequent experiments as well)
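Renaming can of course be done by hand; the helper below is a small optional sketch. SOURCE_DIR is an assumption: the actual location of rewardData.csv depends on your Eclipse run configuration, and the renamed files are collected in 02-data here only by convention of this package.

```python
# Optional helper for archiving a freshly produced rewardData.csv under its
# experiment-specific name. SOURCE_DIR is an assumption: point it at wherever
# your Eclipse run configuration writes rewardData.csv.
import shutil
from pathlib import Path

SOURCE_DIR = Path(".")        # adjust to the test's working directory
TARGET_DIR = Path("02-data")  # destination used by this replication package

def keep_result(experiment_name: str) -> None:
    """Copy rewardData.csv to <experiment_name>RewardData.csv in 02-data."""
    src = SOURCE_DIR / "rewardData.csv"
    dst = TARGET_DIR / f"{experiment_name}RewardData.csv"
    shutil.copy2(src, dst)

# Example after running LakeTestRandom.xtend:
# keep_result("random")   # -> 02-data/randomRewardData.csv
```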
In this experiment, a single oracle advisor gives advice about every tile.
- In LakeTestSingleAdvisor.xtend, on line 233, change the SingleExperimentMode to ALL:
  runAdvisedAgentSingleAdvisor(SingleExperimentMode.ALL)
- Save and run LakeTestSingleAdvisor.xtend
- Rename rewardData.csv to allRewardData.csv
In this experiment, a single oracle advisor gives advice about hole tiles and the goal tile (about 20% of the problem space).
- In LakeTestSingleAdvisor.xtend, on line 233, change the SingleExperimentMode to HOLES_AND_GOAL:
  runAdvisedAgentSingleAdvisor(SingleExperimentMode.HOLES_AND_GOAL)
- Save and run LakeTestSingleAdvisor.xtend
- Rename rewardData.csv to holesAndGoalRewardData.csv
In this experiment, a single human advisor gives advice about 10% of the problem space.
- In LakeTestSingleAdvisor.xtend, on line 233, change the SingleExperimentMode to HUMAN10:
  runAdvisedAgentSingleAdvisor(SingleExperimentMode.HUMAN10)
- Save and run LakeTestSingleAdvisor.xtend
- Rename rewardData.csv to human10RewardData.csv
In this experiment, a single human advisor gives advice about 5% of the problem space.
- In LakeTestSingleAdvisor.xtend, on line 233, change the SingleExperimentMode to HUMAN5:
  runAdvisedAgentSingleAdvisor(SingleExperimentMode.HUMAN5)
- Save and run LakeTestSingleAdvisor.xtend
- Rename rewardData.csv to human5RewardData.csv
NOTE: The following data is only briefly mentioned in the paper, but not presented in detail due to the page limit.
In this experiment, two human advisors give advice about 10% of the problem space each. The advisors are located in the top-left corner (start) and the bottom-right corner (goal) and give advice about their local environment. Therefore, the agent is first guided by the first advisor's input and later by the second advisor's input, i.e., guidance is sequential.
- In LakeTestCoop.xtend, on line 333, change the CoopExperimentMode to SEQUENTIAL_10:
  runAdvisedAgentCoop(CoopExperimentMode.SEQUENTIAL_10)
- Save and run LakeTestCoop.xtend
- Rename rewardData.csv to coop10SequentialRewardData.csv
In this experiment, two human advisors give advice about 10% of the problem space each. The advisors are located in the bottom-left and the top-right corner and give advice about their local environment. Therefore, the agent is sometimes guided by the first advisor's input and sometimes by the second advisor's input, i.e., guidance is parallel.
- In LakeTestCoop.xtend, on line 333, change the CoopExperimentMode to PARALLEL_10:
  runAdvisedAgentCoop(CoopExperimentMode.PARALLEL_10)
- Save and run LakeTestCoop.xtend
- Rename rewardData.csv to coop10ParallelRewardData.csv
In this experiment, two human advisors give advice about 5% of the problem space each. The advisors are located in the top-left corner (start) and the bottom-right corner (goal) and give advice about their local environment. Therefore, the agent is first guided by the first advisor's input and later by the second advisor's input, i.e., guidance is sequential.
- In LakeTestCoop.xtend, on line 333, change the CoopExperimentMode to SEQUENTIAL_5:
  runAdvisedAgentCoop(CoopExperimentMode.SEQUENTIAL_5)
- Save and run LakeTestCoop.xtend
- Rename rewardData.csv to coop5SequentialRewardData.csv
In this experiment, two human advisors give advice about 5% of the problem space each. The advisors are located in the bottom-left and the top-right corner and give advice about their local environment. Therefore, the agent is sometimes guided by the first advisor's input and sometimes by the second advisor's input, i.e., guidance is parallel.
- In LakeTestCoop.xtend, on line 333, change the CoopExperimentMode to PARALLEL_5:
  runAdvisedAgentCoop(CoopExperimentMode.PARALLEL_5)
- Save and run LakeTestCoop.xtend
- Rename rewardData.csv to coop5ParallelRewardData.csv
The map used in the experiments is provided, together with the advice visualized on it, in the 01-advice folder.
| Parameter | Value |
|---|---|
| RL method | Discrete policy gradient |
| Learning rate (α) | 0.9 |
| Discount factor (γ) | 1.0 |
| Number of episodes | 10000 |
| SL fusion operator | BCF |
| State-action space | 12x12x4 |
| Evaluated agent | {Random, Unadvised, Advised} |
| Source of advice | {Oracle, Single human, Cooperating humans} |
| Advice quota – Oracle | {100% ("All"), 20% ("Holes&Goal")} |
| Advice quota – Single human | {10%, 5%} |
| Advice quota – Cooperating humans | {10% each, 5% each} |
| Uncertainty – Oracle and Single human | {0.2k ∣ k ∈ {0, 1, 2, 3, 4}} |
| Uncertainty – Cooperating humans | 2D Manhattan distance |
| Cooperative advice type | {Sequential cooperation, Parallel cooperation} |
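For orientation only: with the learning rate and discount factor above, a textbook discrete (REINFORCE-style) policy-gradient update takes the form below. This is a generic sketch for readers unfamiliar with the method, not necessarily the exact update implemented in the tool; consult the paper and source code for the authoritative formulation.

$$
\theta \leftarrow \theta + \alpha \, G_t \, \nabla_\theta \log \pi_\theta(a_t \mid s_t),
\qquad
G_t = \sum_{k=t+1}^{T} \gamma^{\,k-t-1} r_k,
\qquad
\alpha = 0.9,\; \gamma = 1.0 .
$$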
Cumulative rewards by advice uncertainty (u), for the oracle and the single human advisor at different advice quotas:

| u | Oracle, 100% | Oracle, 20% | Single human, 10% | Single human, 5% |
|---|---|---|---|---|
| 0.0 | 9900.100 | 9914.900 | 9768.000 | 8051.300 |
| 0.2 | 9685.900 | 8948.933 | 8538.266 | 5287.833 |
| 0.4 | 7974.066 | 5216.433 | 6121.033 | 2134.966 |
| 0.6 | 5094.333 | 2177.633 | 3488.700 | 2246.733 |
| 0.8 | 1502.500 | 523.633 | 1126.300 | 1108.666 |
Cumulative rewards for the cooperating human advisors, by cooperation type and advice quota:

| Cooperation | 10% | 5% |
|---|---|---|
| Sequential | 8078.366 | 5037.066 |
| Parallel | 5429.466 | 4130.666 |