-
Notifications
You must be signed in to change notification settings - Fork 526
Description
Description of the bug
When running sarek 3.0.1 with Mutect2 on tumor-normal paired WES data in uBAM format, the pipeline will crash with an error message saying:
"""
A USER ERROR has occurred: Bad input: Sample $name is not in BAM header: [...]
"""
When running the same command on tumor-normal paired WES in fastq format, the pipeline will run just fine.
Command used
ref_dir=/path/to/refs
vep_dir=${ref_dir}/vep106.1
NXF_SINGULARITY_CACHEDIR=/path/to/singularity_cache
export NXF_SINGULARITY_CACHEDIR=$NXF_SINGULARITY_CACHEDIR
nextflow run /my/local/sarek-3.0.1/workflow/ \
-with-trace trace_complete.txt \
-with-timeline timeline_complete.html \
-with-dag dag_complete.pdf \
--tools 'mutect2' \
--input 't-n_wes_ubams.csv' \
--wes \
--genome GATK.GRCh38 \
--igenomes_base ${ref_dir} \
--vep_cache ${vep_dir} \
--max_time '240.h' \
--max_memory '32.GB' \
--max_cpus 16 \
-profile singularity > complete.log 2>&1
Findings
We checked the code and it indeed looks for {meta.patient}_{meta.normal_id} in the CRAM/BAM header, but our CRAM header only matched the {meta.normal_id}.
Line 1111 in ad2b34f
| "--f1r2-tar-gz ${task.ext.prefix}.f1r2.tar.gz --normal-sample ${meta.patient}_${meta.normal_id}" } |
When we modified the code so that it would only look for {meta.normal_id}, the code would run until the end without any issues. We did a rerun starting from fastq files and there the original pipeline did not crash. We proceeded to look into the read_group generation and there we found the issue:
Starting from fastq assigns {row.patient}_{row.sample} to "SM" in the read group,
Line 1196 in 5bb160d
| def read_group = "\"@RG\\tID:${flowcell}.${row.sample}.${row.lane}\\t${CN}PU:${row.lane}\\tSM:${row.patient}_${row.sample}\\tLB:${row.sample}\\tDS:${params.fasta}\\tPL:${params.seq_platform}\"" |
while Starting from bam assigns {row.sample} to "SM".
Line 1219 in 5bb160d
| def read_group = "\"@RG\\tID:${row.sample}_${row.lane}\\t${CN}PU:${row.lane}\\tSM:${row.sample}\\tLB:${row.sample}\\tPL:${params.seq_platform}\"" |
We think that this is a bug, since Mutect2 variant calling would always crash when looking for the {meta.patient}_{meta.normal_id} in the {row.sample} BAM header of uBAM-derived sequencing data.
We think the solution would be to update the "\tSM:${row.sample}" to "\tSM:${row.patient}_${row.sample}" in line 1219 of sarek/workflows/sarek.nf (However, we did not test this yet).
For now people are able to circumvent this bug in tumor-normal mutect2 runs, by starting from fastq reads.