Replication package

for the paper Complex Model Transformations by Reinforcement Learning with Uncertain Human Guidance (accepted at MODELS 2025).

License: MIT

About

Model-driven engineering problems often require complex model transformations (MTs), i.e., MTs that are chained in extensive sequences. Pertinent examples of such problems include model synchronization, automated model repair, and design space exploration. Manually developing complex MTs is an error-prone and often infeasible process. Reinforcement learning (RL) is an apt way to alleviate these issues. In RL, an autonomous agent explores the state space through trial and error to identify beneficial sequences of actions, such as MTs. However, RL methods exhibit performance issues in complex problems. In these situations, human guidance can be of high utility. In this paper, we present an approach and technical framework for developing complex MT sequences through RL, guided by potentially uncertain human advice. Our framework allows user-defined MTs to be mapped onto RL primitives, and executes them as RL programs to find optimal MT sequences. Our evaluation shows that human guidance, even if uncertain, substantially improves RL performance, and results in more efficient development of complex MTs. Through a sensible trade-off between the certainty and timeliness of human advice, our method takes a firm step towards machine learning-driven human-in-the-loop engineering methods.

Table of contents

  • Content description
  • Reproduction of analysis
  • Reproduction of experimental data
  • Experiment setup
  • Results

Content description

  • 01-advice - Contains all the experimental artifacts and visualizations used in our experiments (map, advice files, and advice visualized on the map)
  • 02-data - Contains experimental data produced in accordance with the Experiment setup described below
    • randomRewardData.csv - Cumulative rewards of a random-walk agent
    • unadvisedRewardData.csv - Cumulative rewards of an unadvised (but not random) agent
    • allRewardData.csv - Cumulative rewards of an agent advised by information about every state
    • holesAndGoalRewardData.csv - Cumulative rewards of an agent advised by information about terminating states (negative termination and positive termination, i.e., goal)
    • human10RewardData.csv - Cumulative rewards of an agent advised by a single human advisor about 10% of the states
    • human5RewardData.csv - Cumulative rewards of an agent advised by a single human advisor about 5% of the states
    • coop10SequentialRewardData.csv - Cumulative rewards of an agent advised by two cooperating human advisors (one located at top left, one located at bottom right) who each advise about 10% of the states
    • coop10ParallelRewardData.csv - Cumulative rewards of an agent advised by two cooperating human advisors (one located at top right, one located at bottom left) who each advise about 10% of the states
    • coop5SequentialRewardData.csv - Cumulative rewards of an agent advised by two cooperating human advisors (one located at top left, one located at bottom right) who each advise about 5% of the states
    • coop5ParallelRewardData.csv - Cumulative rewards of an agent advised by two cooperating human advisors (one located at top right, one located at bottom left) who each advise about 5% of the states
  • 03-analysis - Contains Python analysis scripts to obtain the results in the 04-results folder
  • 04-results - Contains the plots and statistical significance values that are used in the publication

Reproduction of analysis

  • Install the required Python packages by running pip install -r .\03-analysis\requirements.txt from the root folder.
  • For the charts, run python .\03-analysis\plotting.py from the root folder and follow the instructions. Results will be generated into 04-results in two formats, in the respective pdf and png subfolders.
  • For the significance tests, run python .\03-analysis\t_test.py > 04-results/significance/results.txt from the root folder. Results will be generated into 04-results/significance in a textual tabular format. (An illustrative sketch of this comparison appears after the note below.)

NOTE: The above steps have been tested with Python versions 3.8 through 3.13.
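For orientation, the sketch below shows the kind of comparison such a significance test performs: a Welch's t-test between the reward samples of two agents. It is an assumption-laden illustration, not the package's script: the column name reward is a hypothetical stand-in, and the actual procedure is defined in 03-analysis/t_test.py.

```python
# Illustrative only: the actual procedure is defined in 03-analysis/t_test.py.
# Assumes each CSV holds one cumulative-reward sample per run; the column
# name "reward" is a hypothetical stand-in for whatever the files actually use.
import pandas as pd
from scipy import stats

advised = pd.read_csv("02-data/human10RewardData.csv")["reward"]
unadvised = pd.read_csv("02-data/unadvisedRewardData.csv")["reward"]

t, p = stats.ttest_ind(advised, unadvised, equal_var=False)  # Welch's t-test
print(f"t = {t:.3f}, p = {p:.3g}")
```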

Reproduction of experimental data

Setting up Eclipse

For the following steps, refer to the tool's official GitHub repository.

  1. Download Eclipse Modeling Tools, Version: 2025-06 (4.36.0) from the Eclipse Foundation site.
  2. Install Eclipse Xtend, Version: 2.39.0 either through the Marketplace or from the Xtend site.
  3. Install Viatra, Version: 2.9.1 either through the Marketplace or from the Viatra site.
  4. Clone the tool’s official GitHub repository.
  5. Import the contents of the (1) plugins, (2) examples, and (3) tests folders into the running Eclipse instance.
  6. Generate the RL model and edit code using the genmodel in /plugins/ca.mcmaster.ssm.rl4mt.metamodel/models.
    • Open rl.genmodel, right-click the root node, and select Generate Model Code.
    • Right-click the root node again and select Generate Edit Code.
  7. Generate the Lake model and edit code using the genmodel in /examples/ca.mcmaster.ssm.rl4mt.examples.lake.metamodel/models.
    • Open lake.genmodel, right-click the root node, and select Generate Model Code.
    • Right-click the root node again and select Generate Edit Code.

Obtaining experimental data

Data can be obtained by running experiments encoded in unit tests. Unit tests are parameterized with human advice found in the 01-advice folder of this replication package.

To locate the unit tests, navigate to https://github.com/ssm-lab/rl4mt/tree/main/tests/ca.mcmaster.ssm.rl4mt.examples.lake.tests/src/ca/mcmaster/ssm/rl4mt/examples/lake/tests in the tool's official GitHub repository.

Run configurations

Repeat these steps for each experiment file.

  1. Right-click the file name.
  2. Go to Run as and click Run configurations.
  3. Select JUnit Plug-in Test and create a new configuration. Optionally, name the configuration after the experiment file.
  4. Under the Test tab, select Run a single test, and under Test class, select the experiment file.
  5. Click on the Arguments tab.
    • Program arguments: -os ${target.os} -ws ${target.ws} -arch ${target.arch} -nl ${target.nl} -consoleLog -nosplash.
    • VM arguments: -Xms512m -Xmx4096m.

NOTE: Headless mode is preferred.

  • Click on the Main tab.
  • Under Program to Run, select Run an application and choose [No Application] - Headless Mode.

NOTE: Each of the following experiments takes a long time to run (about half an hour, depending on the hardware).

Random Agent

  1. Run LakeTestRandom.xtend
  2. Rename rewardData.csv to randomRewardData.csv
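Every experiment run, including all of those below, writes its output to the same rewardData.csv, so the file must be renamed before the next run. A hypothetical Python helper (not part of the package) that performs the renaming:

```python
# Hypothetical convenience helper; not part of the replication package.
from pathlib import Path

def stash_rewards(experiment, workdir="."):
    """Rename the freshly produced rewardData.csv after an experiment run."""
    src = Path(workdir) / "rewardData.csv"
    dst = Path(workdir) / f"{experiment}RewardData.csv"
    src.rename(dst)  # raises FileNotFoundError if the run produced no output
    return dst

# Example: after running LakeTestRandom.xtend,
# stash_rewards("random") yields randomRewardData.csv.
```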

Unadvised Agent

  1. Run LakeTestUnadvised.xtend
  2. Rename rewardData.csv to unadvisedRewardData.csv

Oracle - 100% advice quota

In this experiment, a single oracle advisor gives advice about every tile.

  1. In LakeTestSingleAdvisor.xtend, on line 233 change the SingleExperimentMode to ALL
    • runAdvisedAgentSingleAdvisor(SingleExperimentMode.ALL)
  2. Save and run LakeTestSingleAdvisor.xtend
  3. Rename rewardData.csv to allRewardData.csv

Oracle - 20% advice quota

In this experiment, a single oracle advisor gives advice about hole tiles and the goal tile (about 20% of the problem space).

  1. In LakeTestSingleAdvisor.xtend, on line 233 change the SingleExperimentMode to HOLES_AND_GOAL
    • runAdvisedAgentSingleAdvisor(SingleExperimentMode.HOLES_AND_GOAL)
  2. Save and run LakeTestSingleAdvisor.xtend
  3. Rename rewardData.csv to holesAndGoalRewardData.csv

Single human - 10% advice quota

In this experiment, a single human advisor gives advice about 10% of the problem space.

  1. In LakeTestSingleAdvisor.xtend, on line 233 change the SingleExperimentMode to HUMAN10
    • runAdvisedAgentSingleAdvisor(SingleExperimentMode.HUMAN10)
  2. Save and run LakeTestSingleAdvisor.xtend
  3. Rename rewardData.csv to human10RewardData.csv

Single human - 5% advice quota

In this experiment, a single human advisor gives advice about 5% of the problem space.

  1. In LakeTestSingleAdvisor.xtend, on line 233 change the SingleExperimentMode to HUMAN5
    • runAdvisedAgentSingleAdvisor(SingleExperimentMode.HUMAN5)
  2. Save and run LakeTestSingleAdvisor.xtend
  3. Rename rewardData.csv to human5RewardData.csv

NOTE: The following data is only briefly mentioned in the paper, but not presented in detail due to the page limit.

Two cooperating humans - 10% advice quota each (total 20%) - Sequential guidance

In this experiment, two human advisors give advice about 10% of the problem space each. The advisors are located in the top-left corner (start) and the bottom-right corner (goal) and give advice about their local environment. Therefore, the agent is first guided by the first advisor's input and later by the second advisor's input, i.e., guidance is sequential.

  1. In LakeTestCoop.xtend, on line 333 change the CoopExperimentMode to SEQUENTIAL_10
    • runAdvisedAgentCoop(CoopExperimentMode.SEQUENTIAL_10)
  2. Save and run LakeTestCoop.xtend
  3. Rename rewardData.csv to coop10SequentialRewardData.csv

Two cooperating humans - 10% advice quota each (total 20%) - Parallel guidance

In this experiment, two human advisors give advice about 10% of the problem space each. The advisors are located in the bottom-left and the top-right corners and give advice about their local environment. Therefore, the agent is sometimes guided by the first advisor's input and sometimes by the second advisor's input, i.e., guidance is parallel.

  1. In LakeTestCoop.xtend, on line 333 change the CoopExperimentMode to PARALLEL_10
    • runAdvisedAgentCoop(CoopExperimentMode.PARALLEL_10)
  2. Save and run LakeTestCoop.xtend
  3. Rename rewardData.csv to coop10ParallelRewardData.csv

Two cooperating humans - 5% advice quota each (total 10%) - Sequential guidance

In this experiment, two human advisors give advice about 5% of the problem space each. The advisors are located in the top-left corner (start) and the bottom-right corner (goal) and give advice about their local environment. Therefore, the agent is first guided by the first advisor's input and later by the second advisor's input, i.e., guidance is sequential.

  1. In LakeTestCoop.xtend, on line 333 change the CoopExperimentMode to SEQUENTIAL_5
    • runAdvisedAgentCoop(CoopExperimentMode.SEQUENTIAL_5)
  2. Save and run LakeTestCoop.xtend
  3. Rename rewardData.csv to coop5SequentialRewardData.csv

Two cooperating humans - 5% advice quota each (total 10%) - Parallel guidance

In this experiment, two human advisors give advice about 5% of the problem space each. The advisors are located in the bottom-left and the top-right corners and give advice about their local environment. Therefore, the agent is sometimes guided by the first advisor's input and sometimes by the second advisor's input, i.e., guidance is parallel. (A sketch of the distance-based advice uncertainty used in these cooperative settings follows the steps below.)

  1. In LakeTestCoop.xtend, on line 333 change the CoopExperimentMode to PARALLEL_5
    • runAdvisedAgentCoop(CoopExperimentMode.PARALLEL_5)
  2. Save and run LakeTestCoop.xtend
  3. Rename rewardData.csv to coop5ParallelRewardData.csv
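In the cooperative settings above, an advisor's uncertainty about a tile grows with the advisor's 2D Manhattan distance from that tile (see the settings table below). The following is a hypothetical illustration only; the linear scale factor is an assumption, and the exact scaling used by the package is defined in its Xtend implementation.

```python
# Hypothetical illustration of distance-scaled advice uncertainty; the exact
# scaling used by the package is defined in its Xtend implementation.
def advice_uncertainty(advisor, tile, scale=0.05):
    """Uncertainty grows with the 2D Manhattan distance between advisor and tile."""
    distance = abs(advisor[0] - tile[0]) + abs(advisor[1] - tile[1])
    return min(1.0, scale * distance)

# An advisor in the top-left corner is most certain about nearby tiles:
print(advice_uncertainty((0, 0), (0, 1)))    # 0.05 (adjacent tile)
print(advice_uncertainty((0, 0), (11, 11)))  # 1.0  (fully uncertain far away)
```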

Experiment setup

Problem

The map used in the experiments is a 12x12 grid with a start tile in the top-left corner, hole tiles, and a goal tile in the bottom-right corner; the map figure and its advice overlays are available in the 01-advice folder.

Settings and hyperparameters

| Parameter | Value |
|---|---|
| RL method | Discrete policy gradient |
| Learning rate ($\alpha$) | 0.9 |
| Discount factor ($\gamma$) | 1.0 |
| Number of episodes | 10000 |
| SL fusion operator | BCF |
| State-action space | 12x12x4 |
| Evaluated agent | {Random, Unadvised, Advised} |
| Source of advice | {Oracle, Single human, Cooperating humans} |
| Advice quota – Oracle | {100% ("All"), 20% ("Holes&Goal")} |
| Advice quota – Single human | {10%, 5%} |
| Advice quota – Cooperating humans | {10% each, 5% each} |
| Uncertainty – Oracle and Single human | {0.2k ∣ k $\in$ 0..4} |
| Uncertainty – Cooperating humans | 2D Manhattan distance |
| Cooperative advice type | {Sequential cooperation, Parallel cooperation} |
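For orientation, the following is a minimal, self-contained Python sketch of a tabular discrete policy gradient (a REINFORCE-style update with a softmax policy) configured with the hyperparameters above. It is an illustration only: the actual experiments are implemented in Xtend on EMF models, and the grid layout, rewards, and step logic below are hypothetical stand-ins; advice injection and BCF fusion are omitted.

```python
import numpy as np

# Hypothetical 12x12 lake with a start, holes, and a goal; the actual map,
# rewards, advice injection, and BCF fusion live in the Xtend implementation
# and are NOT reproduced here.
N, ACTIONS = 12, 4                                  # 12x12 states, 4 moves
ALPHA, GAMMA, EPISODES = 0.9, 1.0, 10_000           # values from the table above
GOAL, HOLES = (11, 11), {(3, 4), (7, 2), (9, 9)}    # illustrative layout only
MOVES = [(-1, 0), (0, 1), (1, 0), (0, -1)]          # up, right, down, left

theta = np.zeros((N, N, ACTIONS))                   # softmax policy parameters

def policy(s):
    """Softmax action distribution for state s."""
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

def step(s, a):
    """Move on the grid; terminate on holes (negative) or the goal (positive)."""
    r, c = s[0] + MOVES[a][0], s[1] + MOVES[a][1]
    nxt = (min(max(r, 0), N - 1), min(max(c, 0), N - 1))
    if nxt == GOAL:
        return nxt, 1.0, True
    if nxt in HOLES:
        return nxt, -1.0, True
    return nxt, 0.0, False

rng = np.random.default_rng(0)
for _ in range(EPISODES):
    s, traj, done = (0, 0), [], False
    while not done and len(traj) < 4 * N * N:       # cap episode length
        a = rng.choice(ACTIONS, p=policy(s))
        nxt, r, done = step(s, a)
        traj.append((s, a, r))
        s = nxt
    G = 0.0
    for s_t, a_t, r_t in reversed(traj):            # REINFORCE-style update
        G = r_t + GAMMA * G
        grad = -policy(s_t)                         # gradient of log softmax
        grad[a_t] += 1.0
        theta[s_t] += ALPHA * G * grad
```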

Results

Oracle and single human

Cumulative rewards by advice uncertainty ($u$):

| $u$ | Oracle 100% | Oracle 20% | Single human 10% | Single human 5% |
|---|---|---|---|---|
| 0.0 | 9900.100 | 9914.900 | 9768.000 | 8051.300 |
| 0.2 | 9685.900 | 8948.933 | 8538.266 | 5287.833 |
| 0.4 | 7974.066 | 5216.433 | 6121.033 | 2134.966 |
| 0.6 | 5094.333 | 2177.633 | 3488.700 | 2246.733 |
| 0.8 | 1502.500 | 523.633 | 1126.300 | 1108.666 |

Oracle - 100% advice quota

Figures: Oracle 100% reward curves on linear and log scales (see 04-results).

Oracle - 20% advice quota

Figures: Oracle 20% reward curves on linear and log scales (see 04-results).

Single human - 10% advice quota

Figures: Single human 10% reward curves on linear and log scales (see 04-results).

Single human - 5% advice quota

Figures: Single human 5% reward curves on linear and log scales (see 04-results).

Two cooperating humans

Cumulative rewards by advice quota:

| Guidance | 10% each | 5% each |
|---|---|---|
| Sequential | 8078.366 | 5037.066 |
| Parallel | 5429.466 | 4130.666 |

Two cooperating humans - 10% advice quota each (total 20%)

Figures: reward curves for two cooperating humans with 10% advice quota each, on linear and log scales (see 04-results).

Two cooperating humans - 5% advice quota each (total 10%)

Figures: reward curves for two cooperating humans with 5% advice quota each, on linear and log scales (see 04-results).
