- Extract the solutions using the following structure: `.../task/group/user/sub.cpp`
  - From terry you can use `util/extract_solutions.py folder/with/zips/ target/folder/ --all --source-only --ignore0` (from `algorithm-ninja/terry`)
  - From cms you can use `util/from_cms.py folder/with/solutions/ target/folder/ --ignore0` (you can use `--same-school` to check only between solutions of the same school)
  - Ignoring the solutions that score 0 will make much cleaner results
- Prepare a folder with the templates given to the contestants, grouped by task (e.g. `.../task/template1.cpp`).
  - You can use `util/get_templates.sh round_folder` for CMS.
- Put the ranking of the contest in a text file:
  - Each line must contain a single word: the id of the user, the same id used as the `user` folder name in step 1
  - The ranking must be sorted starting from the top-ranked user
- Compile `starplag` by issuing `make`
- Run `./build/main path/to/task/ path/to/templates/of/task/ path/to/ranking.txt cutoff path/to/target/folder/`
  - The `cutoff` should be the limit of the ranking where you are interested in finding plagiarism (e.g. top 200): only pairs of submissions where at least one is from the top `cutoff` users are considered
  - Only pairs of users within the same group are checked
  - `path/to/target/folder/` will contain the results of the execution as well as the snapshots of the computation, used in case of a crash
- After the execution ends, a file named `total` is created inside the target folder
  - The first line contains the number of processed users, the number of matches `H` found before the cutoff (limited to 500) and the number of matches `L` found after the cutoff (limited to 500)
  - The next `H` lines contain the match information for the top part of the ranking
  - The next `L` lines contain the match information for the rest of the ranking
- To ease the manual checking of those matches you can use `./manual_check.py path/to/total path/to/cache`
  - The first parameter is the path to the `total` file generated by the previous step
  - The second parameter is a cache file to save partial results
  - The script will create a file `output.tsv` with the list of copied solutions
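
If you want to inspect the `total` file without `manual_check.py`, the layout described above (a header line followed by `H` and then `L` match lines) can be split with a few lines of Python. This is only a minimal sketch: the names `TotalReport` and `parse_total` are hypothetical and not part of starplag, the header is assumed to hold three whitespace-separated integers, and each match line is kept as an opaque string because its internal format is not described here.

```python
from typing import List, NamedTuple


class TotalReport(NamedTuple):
    """Hypothetical container for the three parts of a `total` file."""
    processed_users: int
    top_matches: List[str]   # the H lines: matches found before the cutoff
    rest_matches: List[str]  # the L lines: matches found after the cutoff


def parse_total(text: str) -> TotalReport:
    """Split the contents of a `total` file into header, H lines and L lines.

    Assumption: the first line is "<processed users> <H> <L>", separated by
    whitespace. Match lines are returned unparsed.
    """
    lines = text.splitlines()
    if not lines:
        raise ValueError("empty total file")
    # Unpacking raises ValueError if the header does not hold three integers.
    users, high, low = (int(token) for token in lines[0].split())
    body = lines[1:]
    if len(body) < high + low:
        raise ValueError("fewer match lines than the header announces")
    return TotalReport(users, body[:high], body[high:high + low])
```

For example, `parse_total(open("path/to/total").read()).top_matches` would give the match lines for the top part of the ranking.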
For the main tool you need a C++17 compiler with `std::filesystem` support.
For `manual_check.py` you need `python3` and `tmux`, and you have to run `make` before using it.
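
To estimate how many pairs a given `cutoff` will produce before running the tool, the selection rule from the run step can be sketched in Python. This is an illustration of the rule as described in this README, not starplag's actual implementation: `read_ranking` and `candidate_pairs` are hypothetical helpers, the `group_of` mapping (user id to group name) is something you would build yourself from the `.../task/group/user/` folders, and the sketch works on users rather than on individual submissions.

```python
from itertools import combinations
from typing import Dict, Iterator, List, Tuple


def read_ranking(text: str) -> List[str]:
    """Return user ids in ranking order, expecting one id per line."""
    ranking = []
    for line in text.splitlines():
        words = line.split()
        if not words:
            continue  # tolerate blank lines
        if len(words) != 1:
            raise ValueError(f"expected a single id per line, got: {line!r}")
        ranking.append(words[0])
    return ranking


def candidate_pairs(
    ranking: List[str], group_of: Dict[str, str], cutoff: int
) -> Iterator[Tuple[str, str]]:
    """Yield user pairs in the same group with at least one top-`cutoff` user."""
    top = set(ranking[:cutoff])
    for a, b in combinations(ranking, 2):
        # Users missing from group_of are skipped (an assumption of this sketch).
        same_group = a in group_of and group_of.get(a) == group_of.get(b)
        if same_group and (a in top or b in top):
            yield a, b
```

With a ranking of `alice`, `bob`, `carol`, `dave`, where only `dave` is in a different group, a `cutoff` of 1 yields the pairs `(alice, bob)` and `(alice, carol)`; raising it to 2 adds `(bob, carol)`.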