Skip to content

A USER ERROR has occurred: Bad input: Sample $name is not in BAM header: [...] #732

@petanska

Description

@petanska

Description of the bug

When running sarek 3.0.1 with Mutect2 on tumor-normal paired WES data in uBAM format, the pipeline will crash with an error message saying:
"""
A USER ERROR has occurred: Bad input: Sample $name is not in BAM header: [...]
"""

When running the same command on tumor-normal paired WES in fastq format, the pipeline will run just fine.

Command used

ref_dir=/path/to/refs
vep_dir=${ref_dir}/vep106.1

NXF_SINGULARITY_CACHEDIR=/path/to/singularity_cache
export NXF_SINGULARITY_CACHEDIR=$NXF_SINGULARITY_CACHEDIR
nextflow run /my/local/sarek-3.0.1/workflow/ \
	-with-trace trace_complete.txt \
	-with-timeline timeline_complete.html \
	-with-dag dag_complete.pdf \
	--tools 'mutect2' \
	--input 't-n_wes_ubams.csv' \
	--wes \
	--genome GATK.GRCh38 \
	--igenomes_base ${ref_dir} \
	--vep_cache ${vep_dir} \
	--max_time '240.h' \
	--max_memory '32.GB' \
	--max_cpus 16 \
	-profile singularity > complete.log 2>&1

Findings

We checked the code and it indeed looks for {meta.patient}_{meta.normal_id} in the CRAM/BAM header, but our CRAM header only matched the {meta.normal_id}.

"--f1r2-tar-gz ${task.ext.prefix}.f1r2.tar.gz --normal-sample ${meta.patient}_${meta.normal_id}" }

When we modified the code so that it would only look for {meta.normal_id}, the code would run until the end without any issues. We did a rerun starting from fastq files and there the original pipeline did not crash. We proceeded to look into the read_group generation and there we found the issue:
Starting from fastq assigns {row.patient}_{row.sample} to "SM" in the read group,
def read_group = "\"@RG\\tID:${flowcell}.${row.sample}.${row.lane}\\t${CN}PU:${row.lane}\\tSM:${row.patient}_${row.sample}\\tLB:${row.sample}\\tDS:${params.fasta}\\tPL:${params.seq_platform}\""

while Starting from bam assigns {row.sample} to "SM".
def read_group = "\"@RG\\tID:${row.sample}_${row.lane}\\t${CN}PU:${row.lane}\\tSM:${row.sample}\\tLB:${row.sample}\\tPL:${params.seq_platform}\""

We think that this is a bug, since Mutect2 variant calling would always crash when looking for the {meta.patient}_{meta.normal_id} in the {row.sample} BAM header of uBAM-derived sequencing data.

We think the solution would be to update the "\tSM:${row.sample}" to "\tSM:${row.patient}_${row.sample}" in line 1219 of sarek/workflows/sarek.nf (However, we did not test this yet).

For now people are able to circumvent this bug in tumor-normal mutect2 runs, by starting from fastq reads.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions