A USER ERROR has occurred: Bad input: Sample $name is not in BAM header: [...]

# Description of the bug
When running sarek 3.0.1 with Mutect2 on tumor-normal paired WES data in uBAM format, the pipeline will crash with an error message saying: 
"""
A USER ERROR has occurred: Bad input: Sample $name is not in BAM header: [...]
"""

When running the same command on tumor-normal paired WES in fastq format, the pipeline will run just fine.

# Command used
```
ref_dir=/path/to/refs
vep_dir=${ref_dir}/vep106.1

NXF_SINGULARITY_CACHEDIR=/path/to/singularity_cache
export NXF_SINGULARITY_CACHEDIR=$NXF_SINGULARITY_CACHEDIR
nextflow run /my/local/sarek-3.0.1/workflow/ \
	-with-trace trace_complete.txt \
	-with-timeline timeline_complete.html \
	-with-dag dag_complete.pdf \
	--tools 'mutect2' \
	--input 't-n_wes_ubams.csv' \
	--wes \
	--genome GATK.GRCh38 \
	--igenomes_base ${ref_dir} \
	--vep_cache ${vep_dir} \
	--max_time '240.h' \
	--max_memory '32.GB' \
	--max_cpus 16 \
	-profile singularity > complete.log 2>&1
```

# Findings
We checked the code and it indeed looks for {meta.patient}\_{meta.normal_id} in the CRAM/BAM header, but our CRAM header only matched the {meta.normal_id}.
https://github.com/nf-core/sarek/blob/ad2b34f39fead34d7a09051e67506229e827e892/conf/modules.config#L1111
When we modified the code so that it would only look for {meta.normal_id}, the code would run until the end without any issues. We did a rerun starting from fastq files and there the original pipeline did not crash. We proceeded to look into the read_group generation and there we found the issue:
Starting from fastq assigns {row.patient}\_{row.sample} to "SM" in the read group,
https://github.com/nf-core/sarek/blob/5bb160ddb2cee50c811585e450b16cba13c95a02/workflows/sarek.nf#L1196
while Starting from bam assigns {row.sample} to "SM".
https://github.com/nf-core/sarek/blob/5bb160ddb2cee50c811585e450b16cba13c95a02/workflows/sarek.nf#L1219
We think that this is a bug, since Mutect2 variant calling would always crash when looking for the {meta.patient}\_{meta.normal_id}  in the {row.sample} BAM header of uBAM-derived sequencing data.

We think the solution would be to update the "\\tSM:${row.sample}" to "\\tSM:${row.patient}_${row.sample}" in line **1219** of **sarek/workflows/sarek.nf**  (However, we did not test this yet).

For now people are able to circumvent this bug in tumor-normal mutect2 runs, by starting from fastq reads.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

A USER ERROR has occurred: Bad input: Sample $name is not in BAM header: [...] #732

Description of the bug

Command used

Findings

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

A USER ERROR has occurred: Bad input: Sample $name is not in BAM header: [...] #732

Description

Description of the bug

Command used

Findings

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions