
Commit f84cae4

Merge pull request #696 from asp8200/md5_check_of_test_output
Adding md5-sums to the test-yml-files
2 parents 15ac00d + 7749eb7 commit f84cae4

21 files changed (+1481 / -0 lines)

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -16,6 +16,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - [#679](https://github.com/nf-core/sarek/pull/679) - Back to `dev`
 - [#685](https://github.com/nf-core/sarek/pull/685) - Updating the nf-core modules used by Sarek.
 - [#691](https://github.com/nf-core/sarek/pull/691) - To run the same pytest as before locally, use `PROFILE=docker`
+- [#696](https://github.com/nf-core/sarek/pull/696) - Adding check of md5-sums in CI-tests.
 
 ### Fixed
 
```
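The CHANGELOG entry summarizes the change: the CI tests now check md5 sums of output files. As a rough illustration of what an `md5sum:` entry asserts, the sketch below hashes a file in chunks and compares the digest with the expected value. It assumes the YAML files are consumed by pytest-workflow (the `name`/`command`/`tags`/`files` layout suggests so); `file_md5` and `check_md5` are hypothetical helpers, not that tool's internals.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks
    so that large outputs (e.g. reference indices) need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_md5(path: Path, expected: str) -> bool:
    """True if the file's MD5 matches the expected hex digest (case-insensitive)."""
    return file_md5(path) == expected.strip().lower()


if __name__ == "__main__":
    # Hypothetical usage: hash a small throwaway file.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.txt"
        sample.write_bytes(b"")
        print(file_md5(sample))  # d41d8cd98f00b204e9800998ecf8427e (MD5 of empty input)
```

Comparing digests is all-or-nothing: any byte difference, including an embedded timestamp or path, fails the check, which is why some outputs below carry a comment instead of a checksum.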
tests/test_aligner.yml

Lines changed: 78 additions & 0 deletions
```diff
@@ -6,40 +6,73 @@
     - preprocessing
   files:
     - path: results/csv/markduplicates.csv
+      md5sum: 0d6120bb99e92f6810343270711ca53e
     - path: results/csv/markduplicates_no_table.csv
+      md5sum: 2a2d3d4842befd4def39156463859ee3
     - path: results/csv/recalibrated.csv
+      md5sum: 42628ec994c16f565e5407b40a9c1ac3
     - path: results/multiqc
     - path: results/preprocessing/markduplicates/test/test.md.cram
+      # binary changing on reruns
     - path: results/preprocessing/markduplicates/test/test.md.cram.crai
+      # binary changing on reruns
     - path: results/preprocessing/recal_table/test/test.recal.table
+      md5sum: 4ac774bf5f1157e77426fd82f5ac0fbe
     - path: results/preprocessing/recalibrated/test/test.recal.cram
+      # binary changing on reruns
     - path: results/preprocessing/recalibrated/test/test.recal.cram.crai
+      # binary changing on reruns
     - path: results/reference/bwamem2/genome.fasta.0123
+      md5sum: d73300d44f733bcdb7c988fc3ff3e3e9
     - path: results/reference/bwamem2/genome.fasta.amb
+      md5sum: 1891c1de381b3a96d4e72f590fde20c1
     - path: results/reference/bwamem2/genome.fasta.ann
+      md5sum: 2df4aa2d7580639fa0fcdbcad5e2e969
     - path: results/reference/bwamem2/genome.fasta.bwt.2bit.64
+      md5sum: cd4bdf496eab05228a50c45ee43c1ed0
     - path: results/reference/bwamem2/genome.fasta.pac
+      md5sum: 8569fbdb2c98c6fb16dfa73d8eacb070
     - path: results/reference/dbsnp/dbsnp_146.hg38.vcf.gz.tbi
+      md5sum: 628232d0c870f2dbf73c3e81aff7b4b4
     - path: results/reference/dict/genome.dict
+      md5sum: 2433fe2ba31257337bf4c4bd4cb8da15
     - path: results/reference/fai/genome.fasta.fai
+      md5sum: 3520cd30e1b100e55f578db9c855f685
     - path: results/reference/intervals/chr22_1-40001.bed
+      md5sum: 87a15eb9c2ff20ccd5cd8735a28708f7
     - path: results/reference/intervals/chr22_1-40001.bed.gz
+      md5sum: d3341fa28986c40b24fcc10a079dbb80
     - path: results/reference/intervals/genome.bed
+      md5sum: a87dc7d20ebca626f65cc16ff6c97a3e
     - path: results/reference/known_indels/mills_and_1000G.indels.vcf.gz.tbi
+      md5sum: 1bb7ab8f22eb798efd796439d3b29b7a
     - path: results/reports/fastqc/test-test_L1
     - path: results/reports/markduplicates/test/test.md.metrics
+      contains: ["test 8547 767 84 523391 3882 0 0 0.385081", "1.0 767 767"]
     - path: results/reports/mosdepth/test/test.md.mosdepth.global.dist.txt
+      md5sum: 76fa71922a3f748e507c2364c531dfcb
     - path: results/reports/mosdepth/test/test.md.mosdepth.region.dist.txt
+      md5sum: abc5df85e302b79985627888870882da
     - path: results/reports/mosdepth/test/test.md.mosdepth.summary.txt
+      md5sum: d536456436eb275159b8c6af83213d80
     - path: results/reports/mosdepth/test/test.md.regions.bed.gz
+      md5sum: 38fe39894abe62e38f8ac214cba64f2b
     - path: results/reports/mosdepth/test/test.md.regions.bed.gz.csi
+      md5sum: b1c2a861f64e20a94108a6de3b76c582
     - path: results/reports/mosdepth/test/test.recal.mosdepth.global.dist.txt
+      md5sum: 76fa71922a3f748e507c2364c531dfcb
     - path: results/reports/mosdepth/test/test.recal.mosdepth.region.dist.txt
+      md5sum: abc5df85e302b79985627888870882da
     - path: results/reports/mosdepth/test/test.recal.mosdepth.summary.txt
+      md5sum: d536456436eb275159b8c6af83213d80
     - path: results/reports/mosdepth/test/test.recal.regions.bed.gz
+      md5sum: 38fe39894abe62e38f8ac214cba64f2b
     - path: results/reports/mosdepth/test/test.recal.regions.bed.gz.csi
+      md5sum: b1c2a861f64e20a94108a6de3b76c582
     - path: results/reports/samtools/test/test.md.cram.stats
+      md5sum: dcf70bbcfb92e01027978f28d2035d78
     - path: results/reports/samtools/test/test.recal.cram.stats
+      md5sum: 5528d952f5dc74a39f28e27165bf96be
 - name: Run dragmap
   command: nextflow run main.nf -profile test,docker --aligner dragmap --save_reference
   tags:
```
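The `contains:` entry for `test.md.metrics` pins a whole MarkDuplicates metrics row. Assuming the column order of Picard's `DuplicationMetrics` (LIBRARY, UNPAIRED_READS_EXAMINED, READ_PAIRS_EXAMINED, SECONDARY_OR_SUPPLEMENTARY_RDS, UNMAPPED_READS, UNPAIRED_READ_DUPLICATES, READ_PAIR_DUPLICATES, READ_PAIR_OPTICAL_DUPLICATES, PERCENT_DUPLICATION, ...), the PERCENT_DUPLICATION field can be recomputed from the counts in the same row. This gives a quick sanity check that a pinned row was copied correctly. The helper names are hypothetical.

```python
import math


def percent_duplication(unpaired_examined: int, pairs_examined: int,
                        unpaired_dups: int, pair_dups: int) -> float:
    """Duplicate fraction as defined for Picard's PERCENT_DUPLICATION:
    (UNPAIRED_READ_DUPLICATES + 2 * READ_PAIR_DUPLICATES)
    / (UNPAIRED_READS_EXAMINED + 2 * READ_PAIRS_EXAMINED)."""
    reads_examined = unpaired_examined + 2 * pairs_examined
    if reads_examined == 0:
        return 0.0
    return (unpaired_dups + 2 * pair_dups) / reads_examined


def metrics_row_consistent(row: str, tol: float = 5e-7) -> bool:
    """Check that the PERCENT_DUPLICATION field of a whitespace-separated
    metrics row agrees with the counts in the same row.

    Assumed column order: LIBRARY, UNPAIRED_READS_EXAMINED, READ_PAIRS_EXAMINED,
    SECONDARY_OR_SUPPLEMENTARY_RDS, UNMAPPED_READS, UNPAIRED_READ_DUPLICATES,
    READ_PAIR_DUPLICATES, READ_PAIR_OPTICAL_DUPLICATES, PERCENT_DUPLICATION, ...
    The default tolerance covers the six-decimal rounding of the reported value."""
    fields = row.split()
    unpaired, pairs = int(fields[1]), int(fields[2])
    unpaired_dups, pair_dups = int(fields[5]), int(fields[6])
    reported = float(fields[8])
    computed = percent_duplication(unpaired, pairs, unpaired_dups, pair_dups)
    return math.isclose(computed, reported, abs_tol=tol)


if __name__ == "__main__":
    print(metrics_row_consistent("test 8547 767 84 523391 3882 0 0 0.385081"))  # True
```

For the bwa-mem2 row this works out to (3882 + 2 * 0) / (8547 + 2 * 767) = 3882 / 10081 ≈ 0.385081, matching the pinned value; the same check holds for the rows pinned in the dragmap and BAM-remap tests.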
```diff
@@ -48,40 +81,85 @@
     - preprocessing
   files:
     - path: results/csv/markduplicates.csv
+      md5sum: 0d6120bb99e92f6810343270711ca53e
     - path: results/csv/markduplicates_no_table.csv
+      md5sum: 2a2d3d4842befd4def39156463859ee3
     - path: results/csv/recalibrated.csv
+      md5sum: 42628ec994c16f565e5407b40a9c1ac3
     - path: results/multiqc
     - path: results/preprocessing/markduplicates/test/test.md.cram
+      # binary changing on reruns
     - path: results/preprocessing/markduplicates/test/test.md.cram.crai
+      # binary changing on reruns
     - path: results/preprocessing/recal_table/test/test.recal.table
+      md5sum: 75ba4376a17ca69c5134153302f82e92
     - path: results/preprocessing/recalibrated/test/test.recal.cram
+      # binary changing on reruns
     - path: results/preprocessing/recalibrated/test/test.recal.cram.crai
+      # binary changing on reruns
     - path: results/reference/dbsnp/dbsnp_146.hg38.vcf.gz.tbi
+      md5sum: 628232d0c870f2dbf73c3e81aff7b4b4
     - path: results/reference/dict/genome.dict
+      md5sum: 2433fe2ba31257337bf4c4bd4cb8da15
     - path: results/reference/dragmap/hash_table.cfg
+      # hash_table.cfg contains many strings which we could test for - which do we want to test?
+      contains:
+        [
+          "reference_sequences = 1",
+          "reference_len = 368640",
+          "reference_len_raw = 40001",
+          "reference_len_not_n = 40001",
+          "reference_alt_seed = 204800",
+        ]
     - path: results/reference/dragmap/hash_table.cfg.bin
+      # binary changing on reruns
     - path: results/reference/dragmap/hash_table.cmp
+      md5sum: 1caab4ffc89f81ace615a2e813295cf4
     - path: results/reference/dragmap/hash_table_stats.txt
+      # hash_table_stats.txt contains many string which we could test for - which do we want to test?
+      contains: ["A bases: 10934", "C bases: 8612", "G bases: 8608", "T bases: 11847"]
     - path: results/reference/dragmap/ref_index.bin
+      md5sum: dbb5c7d26b974e0ac338024fe4535044
     - path: results/reference/dragmap/reference.bin
+      md5sum: be67b80ee48aa96b383fd72f1ccfefea
     - path: results/reference/dragmap/repeat_mask.bin
+      md5sum: 294939f1f80aa7f4a70b9b537e4c0f21
     - path: results/reference/dragmap/str_table.bin
+      md5sum: 45f7818c4a10fdeed04db7a34b5f9ff1
     - path: results/reference/fai/genome.fasta.fai
+      md5sum: 3520cd30e1b100e55f578db9c855f685
     - path: results/reference/intervals/chr22_1-40001.bed
+      md5sum: 87a15eb9c2ff20ccd5cd8735a28708f7
     - path: results/reference/intervals/chr22_1-40001.bed.gz
+      md5sum: d3341fa28986c40b24fcc10a079dbb80
     - path: results/reference/intervals/genome.bed
+      md5sum: a87dc7d20ebca626f65cc16ff6c97a3e
     - path: results/reference/known_indels/mills_and_1000G.indels.vcf.gz.tbi
+      md5sum: 1bb7ab8f22eb798efd796439d3b29b7a
     - path: results/reports/fastqc/test-test_L1
     - path: results/reports/markduplicates/test/test.md.metrics
+      contains: ["LB0 13607 543 161 518779 6410 0 0 0.436262"]
     - path: results/reports/mosdepth/test/test.md.mosdepth.global.dist.txt
+      md5sum: be1a800868fc1ce26711654525224e59
     - path: results/reports/mosdepth/test/test.md.mosdepth.region.dist.txt
+      md5sum: 2a3f0fab66518ef0786235470f1f28d0
     - path: results/reports/mosdepth/test/test.md.mosdepth.summary.txt
+      md5sum: d38ab9b0e0e551dc22919304929dd71c
     - path: results/reports/mosdepth/test/test.md.regions.bed.gz
+      md5sum: 0d92f4c698a6476ccaf798aa31a557bc
     - path: results/reports/mosdepth/test/test.md.regions.bed.gz.csi
+      md5sum: d5f1c9389ecf52ba839e834780a94549
     - path: results/reports/mosdepth/test/test.recal.mosdepth.global.dist.txt
+      md5sum: be1a800868fc1ce26711654525224e59
     - path: results/reports/mosdepth/test/test.recal.mosdepth.region.dist.txt
+      md5sum: 2a3f0fab66518ef0786235470f1f28d0
     - path: results/reports/mosdepth/test/test.recal.mosdepth.summary.txt
+      md5sum: d38ab9b0e0e551dc22919304929dd71c
     - path: results/reports/mosdepth/test/test.recal.regions.bed.gz
+      md5sum: 0d92f4c698a6476ccaf798aa31a557bc
     - path: results/reports/mosdepth/test/test.recal.regions.bed.gz.csi
+      md5sum: d5f1c9389ecf52ba839e834780a94549
     - path: results/reports/samtools/test/test.md.cram.stats
+      md5sum: f2ae8b531aa1fb2fbffe9a92e4c81493
     - path: results/reports/samtools/test/test.recal.cram.stats
+      md5sum: f7bab59db4fb8ab49eea71b668d351d5
```

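The comment on `hash_table.cfg` leaves open which of its many strings to pin. The pinned strings all have a `key = value` shape, so one option (hypothetical, not part of this PR) is to parse that shape and compare selected keys. This tolerates spacing differences around `=` and reports exactly which key drifted. The file format here is inferred from the pinned strings only, and `parse_key_values`, `mismatches` and `EXPECTED_HASH_TABLE_CFG` are illustrative names.

```python
from typing import Dict, Optional, Tuple


def parse_key_values(text: str) -> Dict[str, str]:
    """Parse 'key = value' lines into a dict; lines without '=' are ignored."""
    pairs: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            pairs[key.strip()] = value.strip()
    return pairs


def mismatches(text: str, expected: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {key: (expected, actual)} for each expected key that is absent
    (actual is None) or whose value differs; an empty dict means all matched."""
    found = parse_key_values(text)
    return {key: (value, found.get(key))
            for key, value in expected.items()
            if found.get(key) != value}


# Hypothetical expectations mirroring the strings pinned for hash_table.cfg.
EXPECTED_HASH_TABLE_CFG = {
    "reference_sequences": "1",
    "reference_len": "368640",
    "reference_len_raw": "40001",
    "reference_len_not_n": "40001",
    "reference_alt_seed": "204800",
}
```

The trade-off against plain `contains:` strings is a little extra code in exchange for failure messages that name the offending key and its actual value.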
tests/test_annotation.yml

Lines changed: 61 additions & 0 deletions
```diff
@@ -5,20 +5,52 @@
     - snpeff
   files:
     - path: results/annotation/test/test_snpEff.ann.vcf.gz
+      md5sum: 01f24fdd76f73eefd695beea7b3d3d8e
     - path: results/annotation/test/test_snpEff.ann.vcf.gz.tbi
+      md5sum: 51e418d9be9bb33f1d4123493b15b6c9
     - path: results/multiqc
     - path: results/reports/snpeff/test/snpEff_summary.html
+      # snpEff_summary.html changes md5sums on reruns.
+      contains: ["<b> Genome total length </b>", "<td> 100,286,402 </td>", "<td> MT192765.1 </td>"]
     - path: results/reports/snpeff/test/test_snpEff.csv
+      # test_snpEff.csv changes md5sums on reruns.
+      contains:
+        [
+          "Values , 50,100",
+          "Count , 1,8",
+          "Reference , 0",
+          "Het , 1",
+          "Hom , 8",
+          "Missing , 0",
+          "MT192765.1, Position,0,1",
+          "MT192765.1,Count,0,0",
+        ]
     - path: results/reports/snpeff/test/test_snpEff.genes.txt
+      md5sum: 130536bf0237d7f3f746d32aaa32840a
 - name: Run VEP
   command: nextflow run main.nf -profile test,annotation --tools vep --skip_tools multiqc
   tags:
     - annotation
     - vep
   files:
     - path: results/annotation/test/test_VEP.ann.vcf.gz
+      # binary changes md5sums on reruns.
     - path: results/annotation/test/test_VEP.ann.vcf.gz.tbi
+      md5sum: 4cb176febbc8c26d717a6c6e67b9c905
     - path: results/reports/EnsemblVEP/test/test_VEP.summary.html
+      # test_VEP.summary.html changes md5sums on reruns.
+      contains:
+        [
+          "<tr><td>Input file</td><td>test.vcf.gz</td></tr><tr><td>Output file</td><td>test_VEP.ann.vcf</td></tr>",
+          "General statistics",
+          "Lines of input read",
+          "Variants processed",
+          "Variants filtered out",
+          "Novel / existing variants",
+          "Overlapped genes",
+          "Overlapped transcripts",
+          "Overlapped regulatory features",
+        ]
 - name: Run snpEff followed by VEP
   command: nextflow run main.nf -profile test,annotation --tools merge --skip_tools multiqc
   tags:
```
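For the snpEff and VEP reports, whose md5 sums change between reruns, the YAML pins substrings with `contains:` instead. A minimal sketch of that check follows. It scans the file line by line, so each pinned string must fit on a single line; whether the real test runner also matches per line is an assumption worth confirming. `missing_strings` is a hypothetical name.

```python
import tempfile
from pathlib import Path
from typing import List


def missing_strings(path: Path, expected: List[str]) -> List[str]:
    """Return, in their original order, the expected strings that do not
    occur within any single line of the file. An empty list means the check passed."""
    remaining = list(expected)
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            remaining = [s for s in remaining if s not in line]
            if not remaining:
                break  # everything found; no need to read the rest
    return remaining


if __name__ == "__main__":
    # Synthetic report content, written only to demonstrate the check.
    with tempfile.TemporaryDirectory() as tmp:
        report = Path(tmp) / "summary.html"
        report.write_text("<h2>General statistics</h2>\n<td>Variants processed</td>\n", encoding="utf-8")
        print(missing_strings(report, ["General statistics", "Overlapped genes"]))
        # ['Overlapped genes']
```

Pinning section titles such as "General statistics" keeps the check stable across reruns, at the cost of not verifying the numbers under those titles.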
```diff
@@ -28,8 +60,23 @@
     - vep
   files:
     - path: results/annotation/test/test_snpEff_VEP.ann.vcf.gz
+      # binary changes md5sums on reruns.
     - path: results/annotation/test/test_snpEff_VEP.ann.vcf.gz.tbi
+      md5sum: 4cb176febbc8c26d717a6c6e67b9c905
     - path: results/reports/EnsemblVEP/test/test_snpEff_VEP.summary.html
+      # test_snpEff_VEP.summary.html changes md5sums on reruns.
+      contains:
+        [
+          "<tr><td>Input file</td><td>test_snpEff.ann.vcf.gz</td></tr><tr><td>Output file</td><td>test_snpEff_VEP.ann.vcf</td></tr>",
+          "General statistics",
+          "Lines of input read",
+          "Variants processed",
+          "Variants filtered out",
+          "Novel / existing variants",
+          "Overlapped genes",
+          "Overlapped transcripts",
+          "Overlapped regulatory features",
+        ]
     - path: results/annotation/test/test_snpEff.ann.vcf.gz
       should_exist: false
     - path: results/annotation/test/test_snpEff.ann.vcf.gz.tbi
```
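The merge test also asserts that the intermediate snpEff VCF is not published (`should_exist: false`). The sketch below evaluates one `files:` entry using the three keys that appear in these YAML files (`md5sum`, `contains`, `should_exist`), treating a missing `should_exist` as true. It is an illustration under those assumptions, not pytest-workflow's implementation; `check_file_spec` is a hypothetical name.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def check_file_spec(root: Path, spec: Dict[str, Any]) -> List[str]:
    """Evaluate one `files:` entry relative to `root`.

    Keys mirror the YAML: path, should_exist (assumed to default to True),
    md5sum and contains. Returns failure messages; [] means the entry passed."""
    rel = spec["path"]
    path = root / rel
    if not spec.get("should_exist", True):
        return [f"{rel}: exists but should not"] if path.exists() else []
    if not path.exists():
        return [f"{rel}: missing"]
    failures: List[str] = []
    if "md5sum" in spec:
        actual = hashlib.md5(path.read_bytes()).hexdigest()
        if actual != spec["md5sum"]:
            failures.append(f"{rel}: md5sum {actual} != {spec['md5sum']}")
    needles = list(spec.get("contains", []))
    if needles:
        # Same per-line matching as a plain substring search on each line.
        lines = path.read_text(errors="replace").splitlines()
        failures.extend(f"{rel}: does not contain {n!r}" for n in needles
                        if not any(n in line for line in lines))
    return failures


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "results").mkdir()
        (root / "results" / "kept.txt").write_text("General statistics\n")
        specs = [
            {"path": "results/kept.txt", "contains": ["General statistics"]},
            {"path": "results/dropped.vcf.gz", "should_exist": False},
        ]
        for spec in specs:
            print(spec["path"], check_file_spec(root, spec) or "ok")
```

An entry with only a `path` (such as `results/multiqc`) just asserts existence, which also works for directories.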
```diff
@@ -55,22 +102,36 @@
     - vep
   files:
     - path: results/annotation/test/test_VEP.ann.vcf.gz
+      # binary changes md5sums on reruns.
     - path: results/annotation/test/test_VEP.ann.vcf.gz.tbi
+      md5sum: 4cb176febbc8c26d717a6c6e67b9c905
     - path: results/annotation/test/test_snpEff.ann.vcf.gz
+      md5sum: 01f24fdd76f73eefd695beea7b3d3d8e
     - path: results/annotation/test/test_snpEff.ann.vcf.gz.tbi
+      md5sum: 51e418d9be9bb33f1d4123493b15b6c9
     - path: results/annotation/test/test_snpEff_VEP.ann.vcf.gz
+      # binary changes md5sums on reruns.
     - path: results/annotation/test/test_snpEff_VEP.ann.vcf.gz.tbi
+      md5sum: 4cb176febbc8c26d717a6c6e67b9c905
     - path: results/reports/EnsemblVEP/test/test_VEP.summary.html
+      # text-based file changes md5sums on reruns.
     - path: results/reports/EnsemblVEP/test/test_snpEff_VEP.summary.html
+      # text-based file changes md5sums on reruns.
     - path: results/reports/snpeff/test/snpEff_summary.html
+      # text-based file changes md5sums on reruns.
     - path: results/reports/snpeff/test/test_snpEff.csv
+      # text-based file changes md5sums on reruns.
     - path: results/reports/snpeff/test/test_snpEff.genes.txt
+      md5sum: 130536bf0237d7f3f746d32aaa32840a
 - name: Run VEP with fasta
   command: nextflow run main.nf -profile test,annotation --tools vep --vep_include_fasta --skip_tools multiqc
   tags:
     - annotation
     - vep
   files:
     - path: results/annotation/test/test_VEP.ann.vcf.gz
+      # binary changes md5sums on reruns.
     - path: results/annotation/test/test_VEP.ann.vcf.gz.tbi
+      md5sum: 4cb176febbc8c26d717a6c6e67b9c905
     - path: results/reports/EnsemblVEP/test/test_VEP.summary.html
+      # text-based file changes md5sums on reruns.
```

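Several annotated `*.ann.vcf.gz` files are marked as changing md5 sums on reruns, while their `.tbi` indices keep a fixed checksum. A plausible cause (unverified) is run-specific meta-information in the VCF header, such as a timestamp written by the annotation tool. One hypothetical way to pin such files, not used in this PR, is to hash the decompressed content while skipping `##` meta lines, as sketched here. It assumes the records themselves are deterministic and does not apply to the CRAM/BAM outputs; `vcf_body_md5` is an illustrative name.

```python
import gzip
import hashlib
import tempfile
from pathlib import Path


def vcf_body_md5(path: Path) -> str:
    """MD5 over the decompressed lines of a (b)gzipped VCF, skipping '##'
    meta-information lines; the '#CHROM' header line and all records are hashed."""
    digest = hashlib.md5()
    # gzip.open reads multi-member streams, so BGZF files (a series of gzip
    # members) decompress like ordinary .gz files.
    with gzip.open(path, "rb") as handle:
        for line in handle:
            if not line.startswith(b"##"):
                digest.update(line)
    return digest.hexdigest()


if __name__ == "__main__":
    # Two synthetic VCFs that differ only in a (hypothetical) run-time meta line.
    body = b"#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\nMT192765.1\t10\t.\tA\tG\t.\tPASS\t.\n"
    with tempfile.TemporaryDirectory() as tmp:
        first, second = Path(tmp) / "first.vcf.gz", Path(tmp) / "second.vcf.gz"
        for target, stamp in ((first, b"##run_time=1\n"), (second, b"##run_time=2\n")):
            with gzip.open(target, "wb") as handle:
                handle.write(stamp + body)
        print(vcf_body_md5(first) == vcf_body_md5(second))  # True
```

Because `##` lines are dropped wholesale, a header change that matters (for example a different reference or annotation database version) would also go unnoticed; a stricter variant could drop only the specific meta lines known to vary.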
tests/test_bam_remap.yml

Lines changed: 36 additions & 0 deletions
```diff
@@ -4,40 +4,76 @@
     - alignment_to_fastq
   files:
     - path: results/cat/test-1_1.merged.fastq.gz
+      md5sum: 27b1dd4720d589cda1f33028798e859b
     - path: results/cat/test-1_2.merged.fastq.gz
+      md5sum: 2bbac774fffd1a9df53f9ab2fc2b86ab
     - path: results/collate/test-1.mapped_1.fq.gz
+      md5sum: 992b824d00359782db5240eee42d5f06
     - path: results/collate/test-1.mapped_2.fq.gz
+      md5sum: 118bff0ec11c9cc0427a7db21bdebc9c
     - path: results/collate/test-1.mapped_other.fq.gz
+      md5sum: 709872fc2910431b1e8b7074bfe38c67
     - path: results/collate/test-1.mapped_singleton.fq.gz
+      md5sum: 709872fc2910431b1e8b7074bfe38c67
     - path: results/collate/test-1.unmapped_1.fq.gz
+      md5sum: b79faf89e96948ea52f3ca41bee7de9a
     - path: results/collate/test-1.unmapped_2.fq.gz
+      md5sum: 8e18a94bfd77739e184856ac95d5b26a
     - path: results/collate/test-1.unmapped_other.fq.gz
+      md5sum: 709872fc2910431b1e8b7074bfe38c67
     - path: results/collate/test-1.unmapped_singleton.fq.gz
+      md5sum: 709872fc2910431b1e8b7074bfe38c67
     - path: results/csv/markduplicates.csv
+      md5sum: 0d6120bb99e92f6810343270711ca53e
     - path: results/csv/markduplicates_no_table.csv
+      md5sum: 2a2d3d4842befd4def39156463859ee3
     - path: results/csv/recalibrated.csv
+      md5sum: 42628ec994c16f565e5407b40a9c1ac3
     - path: results/multiqc
     - path: results/preprocessing/markduplicates/test/test.md.cram
+      # binary changes md5sums on reruns.
     - path: results/preprocessing/markduplicates/test/test.md.cram.crai
+      # binary changes md5sums on reruns.
     - path: results/preprocessing/recal_table/test/test.recal.table
+      md5sum: 9c0517ffdc5d30a5c73b9f7df1ff3060
     - path: results/preprocessing/recalibrated/test/test.recal.cram
+      # binary changes md5sums on reruns.
     - path: results/preprocessing/recalibrated/test/test.recal.cram.crai
+      # binary changes md5sums on reruns.
     - path: results/reports/fastqc/test-1
     - path: results/reports/markduplicates/test/test.md.metrics
+      contains: ["test 0 2820 2 2 0 828 0 0.293617 3807", "1.0 0.999986 1178 1178", "2.0 1.47674 800 800", "100.0 1.911145 0 0"]
     - path: results/reports/mosdepth/test/test.md.mosdepth.global.dist.txt
+      md5sum: 9cb9b181119256ed17a77dcf44d58285
     - path: results/reports/mosdepth/test/test.md.mosdepth.region.dist.txt
+      md5sum: 75e1ce7e55af51f4985fa91654a5ea2d
     - path: results/reports/mosdepth/test/test.md.mosdepth.summary.txt
+      md5sum: dbe376360e437c89190139ef0ae6769a
     - path: results/reports/mosdepth/test/test.md.regions.bed.gz
+      md5sum: d9b53915d473710ff0260a0ff694fd32
     - path: results/reports/mosdepth/test/test.md.regions.bed.gz.csi
+      md5sum: d0713716f63ac573f4a3385733e9a537
     - path: results/reports/mosdepth/test/test.recal.mosdepth.global.dist.txt
+      md5sum: 9cb9b181119256ed17a77dcf44d58285
     - path: results/reports/mosdepth/test/test.recal.mosdepth.region.dist.txt
+      md5sum: 75e1ce7e55af51f4985fa91654a5ea2d
     - path: results/reports/mosdepth/test/test.recal.mosdepth.summary.txt
+      md5sum: dbe376360e437c89190139ef0ae6769a
     - path: results/reports/mosdepth/test/test.recal.regions.bed.gz
+      md5sum: d9b53915d473710ff0260a0ff694fd32
     - path: results/reports/mosdepth/test/test.recal.regions.bed.gz.csi
+      md5sum: d0713716f63ac573f4a3385733e9a537
     - path: results/reports/samtools/test/test.md.cram.stats
+      md5sum: 5201890d36c1dd127b930373b6e823e5
     - path: results/reports/samtools/test/test.recal.cram.stats
+      md5sum: bb2fc6118a1404c45f9e828600df8fb1
     - path: results/samtools/test-1.bam
+      # binary changes md5sums on reruns.
     - path: results/samtools/test-1.map_map.bam
+      md5sum: e1d347ccaec52f690c0313047fecf7e6
     - path: results/samtools/test-1.map_unmap.bam
+      md5sum: 0be5ce27b94e047a1437596a91560982
     - path: results/samtools/test-1.unmap_map.bam
+      md5sum: 53423525e9bf327c60916aded73ba8a6
     - path: results/samtools/test-1.unmap_unmap.bam
+      md5sum: 60a80b7e380e228555b8d90990e1c788
```

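Writing dozens of `md5sum:` lines by hand is error-prone. A hypothetical helper like the one below could generate candidate entries from a `results/` directory, emitting the rerun comment instead of a checksum for files listed in `skip`. Its output also makes shared checksums easy to spot, such as the identical `.md` and `.recal` mosdepth outputs and the four collate files with the same md5 in the list above. The generated lines would still need to be nested under the relevant test's `files:` key; `md5_entries` is an illustrative name.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Iterable, List


def md5_entries(results_dir: Path, skip: Iterable[str] = ()) -> List[str]:
    """Build '- path:' / 'md5sum:' lines for every file under `results_dir`,
    sorted by path. Paths in `skip` (outputs known to change between reruns)
    get a comment instead of a checksum."""
    skip_set = set(skip)
    lines: List[str] = []
    for path in sorted(p for p in results_dir.rglob("*") if p.is_file()):
        # Report paths relative to the launch directory, e.g. "results/csv/x.csv".
        rel = (Path(results_dir.name) / path.relative_to(results_dir)).as_posix()
        lines.append(f"- path: {rel}")
        if rel in skip_set:
            lines.append("  # binary changes md5sums on reruns.")
        else:
            lines.append(f"  md5sum: {hashlib.md5(path.read_bytes()).hexdigest()}")
    return lines


if __name__ == "__main__":
    # Demo on a throwaway tree; in practice results_dir would be the pipeline's results/.
    with tempfile.TemporaryDirectory() as tmp:
        results = Path(tmp) / "results"
        (results / "csv").mkdir(parents=True)
        (results / "csv" / "empty.csv").write_bytes(b"")
        (results / "run.bam").write_bytes(b"\x00")
        print("\n".join(md5_entries(results, skip={"results/run.bam"})))
```

Generated checksums should be confirmed stable by running the pipeline twice before committing them; any file whose hash differs between the two runs belongs in `skip`.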