GATK4 first round without MuTect1 and indel realignment #607
maxulysse merged 41 commits into SciLifeLab:master from szilvajuhos:master
…ow error. Maybe NXF bug?
annotate.nf
vcfNotToAnnotate = Channel.create()

if (annotateVCF == []) {
  // by default we annotate both germline and somatic results that we can find in the VariantCalling directory
In fact, we annotate all available VCFs by default, so it's not really a question of germline vs. somatic, but rather of which tools were run.
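A minimal Python sketch of "annotate everything we can find, grouped by tool", assuming each caller writes its VCFs to its own subdirectory (VariantCalling/<tool>/). That layout and the function name collect_vcfs_by_tool are hypothetical and do not describe Sarek's actual annotate.nf logic:

```python
from collections import defaultdict
from pathlib import Path


def collect_vcfs_by_tool(root):
    """Group every VCF under `root` by the name of its parent directory.

    Assumes a hypothetical layout root/<tool>/<file>.vcf[.gz], so the parent
    directory name is taken to be the tool that produced the file.
    """
    by_tool = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        # Keep plain and bgzipped VCFs; skip directories and other files.
        if path.is_file() and (path.name.endswith(".vcf") or path.name.endswith(".vcf.gz")):
            by_tool[path.parent.name].append(path.name)
    return dict(by_tool)
```

Each tool then forms one group. Running or skipping a caller only changes which groups appear, with no germline/somatic special-casing.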
containers/sarek/Dockerfile
LABEL \
  author="Maxime Garcia" \
  authors="Maxime.Garcia@scilifelab.se, Szilveszter.Juhos@scilifelab.se" \
containers/sarek/Dockerfile
  authors="Maxime.Garcia@scilifelab.se, Szilveszter.Juhos@scilifelab.se" \
  description="Image with tools used in Sarek" \
  maintainer="maxime.garcia@scilifelab.se"
  maintainers="Maxime.Garcia@scilifelab.se, Szilveszter.Juhos@scilifelab.se"
If I remember correctly, we can use whichever labels we want, but maintainer should stay as it is, because it is a port of the deprecated MAINTAINER instruction.
I think we can use:
maintainer="Maxime Garcia <maxime.garcia@scilifelab.se>, Szilveszter Juhos <Szilveszter.Juhos@scilifelab.se>"
containers/vcfanno/to_build
docker build -t szilvajuhos/sarek-vcfanno:latest .
docker images
docker push szilvajuhos/sarek-vcfanno:latest
singularity pull docker://szilvajuhos/sarek-vcfanno:latest
Do we really need this script in this repo?
No, of course not
germlineVC.nf
  -L ${intervalBed} \
  --dbsnp ${dbsnp} \
  -O ${intervalBed.baseName}_${idSample}.g.vcf \
  --emit-ref-confidence GVCF
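To make the per-interval GVCF call easier to follow, here is a Python sketch that assembles it as an argument list, which could be passed to subprocess.run. It assumes the excerpted tool is GATK4 HaplotypeCaller, since the tool name is not shown above, and that -R/-I carry the reference and BAM. The helper name and its parameters are hypothetical. Path.stem stands in for Nextflow's baseName when building the output name:

```python
from pathlib import Path


def haplotypecaller_gvcf_cmd(reference, bam, interval_bed, dbsnp, id_sample, executable="gatk"):
    """Build a GVCF-mode HaplotypeCaller command for one interval (hypothetical helper).

    `executable` defaults to "gatk" (GATK >= 4.0.4.0); earlier 4.0.x releases
    shipped the wrapper as "gatk-launch".
    """
    # Mirrors ${intervalBed.baseName}_${idSample}.g.vcf from the excerpt.
    out_vcf = f"{Path(interval_bed).stem}_{id_sample}.g.vcf"
    return [
        executable, "HaplotypeCaller",
        "-R", str(reference),
        "-I", str(bam),
        "-L", str(interval_bed),
        "--dbsnp", str(dbsnp),
        "-O", out_vcf,
        "--emit-ref-confidence", "GVCF",
    ]
```

For example, haplotypecaller_gvcf_cmd("ref.fasta", "sample.bam", "chr1_1-1000.bed", "dbsnp.vcf.gz", "S1") writes chr1_1-1000_S1.g.vcf.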
lib/QC.groovy
static def getVersionGATK() {
  """
  echo "GATK version"\$(java -jar \$GATK_HOME/GenomeAnalysisTK.jar --version 2>&1) > v_gatk.txt
  gatk-launch ApplyBQSR --help 2>&1 | awk -F/ '/java/{for(i=1;i<=NF;i++){if(\$i~/gatk4/){sub("gatk4-","",\$i);print \$i>"v_gatk.txt"}}}'
I think we could do the regex in the Python script instead of here; it would make more sense.
gatk-launch is provided by GATK, so I do not want to fiddle with it. On the other hand, it would be nice if it had a --version option :/
I'll check whether the new GATK has something similar.
My feeling is that this is still the easiest way to get the version :/
I was thinking more of removing the awk part and doing the regex in the Python script.
I see. Sure, we can refactor it later for the rest of the software too.
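The awk one-liner keeps only lines that mention java. It splits each one on "/", finds the path component containing gatk4, and strips the "gatk4-" prefix. Here is a minimal Python sketch of moving that logic into the Python side, as suggested in this thread. It assumes the help output contains a conda-style install path such as .../gatk4-4.0.3.0-0/..., and both that path and the function name are assumptions. Like the awk version, it keeps any trailing build suffix:

```python
import re

# Captures what follows "gatk4-" up to the next "/" or whitespace,
# e.g. "gatk4-4.0.3.0-0" -> "4.0.3.0-0".
GATK4_DIR = re.compile(r"gatk4-([^/\s]+)")


def gatk_version_from_help(help_text):
    """Return the GATK4 version found in `gatk-launch <Tool> --help` output, or None.

    Like the awk filter /java/, only lines mentioning "java" are inspected.
    """
    for line in help_text.splitlines():
        if "java" in line:
            match = GATK4_DIR.search(line)
            if match:
                return match.group(1)
    return None
```

With a made-up line such as "java -jar /opt/conda/share/gatk4-4.0.3.0-0/gatk-package-4.0.3.0-local.jar ApplyBQSR --help", this returns "4.0.3.0-0". If a later GATK release offers a version flag, the parsing could shrink accordingly.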
Quite impressive work.
containers/sarek/environment.yml
  - conda-forge::openjdk=8.0.144 # Needed for FastQC docker - see bioconda/bioconda-recipes#5026
  - fastqc=0.11.7
  - freebayes=1.2.0
  - gatk4=4.0.3.0
You can use 4.0.4.0; the executable is back to being gatk instead of gatk-launch.
Fine, I will change the name in the processes as well. In fact, 4.0.6.0 is also available.
containers/sarek/environment.yml
  - fastqc=0.11.7
  - freebayes=1.2.0
  - gatk4=4.0.3.0
  - htslib=1.7
You should use 1.8 here.
If I remember correctly, htslib, bcftools and samtools can all use the same version.
OK, done, but I will double-check, since I have a feeling that 1.8 has compatibility issues.
OK, strange, but good to know if you can confirm that
1.9 is already out! You could skip 1.8 entirely...
https://github.com/samtools/samtools/releases/
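The comments above note that htslib, bcftools and samtools can share one version number. Here is a small Python sketch that flags mismatched pins in a conda dependency list. It only handles simple "name=version" specs with an optional "channel::" prefix and an inline "#" comment, not full conda match specs. The function names and example pins are hypothetical:

```python
def parse_pins(dependencies):
    """Map package name -> pinned version for simple '[channel::]name=version' specs."""
    pins = {}
    for spec in dependencies:
        spec = spec.split("#", 1)[0].strip()   # drop an inline comment
        if "=" not in spec:
            continue                            # unpinned entry, nothing to compare
        name, version = spec.split("=", 1)
        name = name.split("::")[-1]             # drop an optional channel prefix
        pins[name] = version
    return pins


def mismatched_htslib_family(dependencies, family=("htslib", "samtools", "bcftools")):
    """Return {name: version} for the family members if their pins disagree, else {}."""
    pins = parse_pins(dependencies)
    found = {name: pins[name] for name in family if name in pins}
    return found if len(set(found.values())) > 1 else {}
```

For instance, ["htslib=1.8", "samtools=1.8", "bcftools=1.7"] is reported as a mismatch, while an environment pinning all three to the same release returns an empty dict.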
containers/sarek/environment.yml
  - gatk4=4.0.3.0
  - htslib=1.7
  - igvtools=2.3.93
  - manta=1.3.0
Since we're updating, we can try 1.4.0.
genome_base = params.genome == 'GRCh37' ? '/sw/data/uppnex/ToolBox/ReferenceAssemblies/hg38make/bundle/2.8/b37' : params.genome == 'GRCh38' ? '/sw/data/uppnex/ToolBox/hg38bundle' : 'References/smallGRCh37'
singleCPUMem = 8.GB
totalMemory = 104.GB // change to 240 on irma
totalMemory = 92.GB // change to 240 on irma
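The nested ternary above maps params.genome to a reference directory and falls back to the small test bundle for any other value. Here is the same mapping written as a lookup table, sketched in Python only to make the branches explicit. The paths are copied from the config. The function name is hypothetical, and this is not how Sarek itself resolves genomes:

```python
# Paths copied from the config excerpt.
GENOME_BASE = {
    "GRCh37": "/sw/data/uppnex/ToolBox/ReferenceAssemblies/hg38make/bundle/2.8/b37",
    "GRCh38": "/sw/data/uppnex/ToolBox/hg38bundle",
}
# Any other genome value falls through to this, as in the final ternary branch.
DEFAULT_GENOME_BASE = "References/smallGRCh37"


def genome_base(genome):
    """Resolve the reference directory for a genome name (hypothetical helper)."""
    return GENOME_BASE.get(genome, DEFAULT_GENOME_BASE)
```

A table keeps each supported genome on its own line, so adding one does not deepen the chain of conditionals.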
build.sh
COPY environment.yml /
RUN conda env update -n root -f /environment.yml && conda clean -a
ENV PATH /opt/conda/bin:$PATH
We can get rid of this last line:
https://gitter.im/nf-core/Lobby?at=5b59f41bd2f0934551d30d5d
I tried, but it did not work as expected, so I prefer to leave it as it is for now and improve it when needed.
We will keep the ENV line for now.
containers/sarek/environment.yml
@@ -0,0 +1,24 @@
# You can use this file to create a conda environment for this pipeline:
# conda env create -f environment.yml
name: sarek-core
We should specify a version here, so I would go for sarek-core-dev or sarek-core-2.1.
Can we just leave it as sarek?
Also have a look at the new container structure; I am trying to follow the nf-core guidelines. alleleCount and ASCAT need new Bioconda recipes, but most of the other tools, including GATK4, igvtools, etc., are in a single collated (relatively big) container.