Skip to content

Latest commit

 

History

History
36 lines (25 loc) · 2.35 KB

File metadata and controls

36 lines (25 loc) · 2.35 KB

Victorian Cancer Bioinformatics Symposium - online, 2020-10-23

Sarek, a reproducible and portable workflow for analysis of matching tumor-normal NGS data

Maxime Garcia [1], Szilveszter Juhos [1], Teresita Díaz de Ståhl [1], Markus Mayrhofer [2], Johanna Sandgren [1], Björn Nystedt [2], Monica Nistér [1]

[1] Dept. of Oncology Pathology, The Swedish Childhood Tumor Biobank (Barntumörbanken, BTB); Karolinska Institutet [2] Dept. of Cell and Molecular Biology; National Bioinformatics Infrastructure Sweden, Science for Life Laboratory; Uppsala University

Introduction

High throughput sequencing for precision medicine is a routine method. Numerous tools have to be used, and analysis is time consuming. We propose Sarek, an open-source container based bioinformatics workflow for germline or tumor/normal pairs (can include matched relapses), written in Nextflow, to process WGS, whole-exome or gene-panel samples.

Methods

Sarek is part of nf-core, a collection of high quality peer-reviewed workflows; supported environments are Docker, Singularity and Conda, enabling version tracking and reproducibility. It is designed with flexible environments in mind: local fat node, HTC cluster or cloud environment like AWS. Several model organism references are available (including Human GRCh37 and GRCh38). Sarek is based on GATK best practices to prepare short-read data. The pipeline then reports germline and somatic SNVs and SVs (HaplotypeCaller, Strelka, Mutect2, Manta and TIDDIT). CNVs, purity and ploidy is estimated with ASCAT and Control-FREEC. At the end of the analysis the resulting VCF files can be annotated by SNPEff and/or VEP to facilitate further downstream processing. Furthermore, a broad set of QC metrics is reported as a final step of the workflow with MultiQC. Additional software can be included as new modules.

Results

From FASTQs to annotated VCFs it takes four days for a paired 90X/90X WGS-sample on a 48 cores node, with the complete set of tools. Processing can be sped-up with the optional use of Sentieon (C). Sarek is used in production at the National Genomics Infrastructure Sweden for germline and cancer samples for the Swedish Childhood Tumor Biobank and other research groups.

Conclusion

Sarek is an easy-to-use tool for germline or cancer NGS samples, to be downloaded from nf-co.re/sarek under MIT license.