building a reproducible workow with snakemake and singularity€¦ · wolfgang resch - hpc @ nih -...
TRANSCRIPT
![Page 1: Building a reproducible workow with Snakemake and Singularity€¦ · Wolfgang Resch - HPC @ NIH - staff@hpc.nih.gov Building a reproducible workow with Snakemake and Singularity](https://reader034.vdocuments.site/reader034/viewer/2022042309/5ed6cad9777e4e4f012b27f4/html5/thumbnails/1.jpg)
Setupbw$ sinteractive -c12 --mem=24g --gres=lscratch:20...node$ module load singularity snakemake hisatnode$ cd /data/$USERnode$ git clone https://github.com/NIH-HPC/snakemake-class.gitnode$ cd snakemake-classnode$ ./setup.sh...+-------------------------------------------------+| || Class materials have been set up successfully || |+-------------------------------------------------+
![Page 2: Building a reproducible workow with Snakemake and Singularity€¦ · Wolfgang Resch - HPC @ NIH - staff@hpc.nih.gov Building a reproducible workow with Snakemake and Singularity](https://reader034.vdocuments.site/reader034/viewer/2022042309/5ed6cad9777e4e4f012b27f4/html5/thumbnails/2.jpg)
Wolfgang Resch - HPC @ NIH - [email protected]
Building a reproducibleworkflow with Snakemake
and Singularity
![Page 3: Building a reproducible workow with Snakemake and Singularity€¦ · Wolfgang Resch - HPC @ NIH - staff@hpc.nih.gov Building a reproducible workow with Snakemake and Singularity](https://reader034.vdocuments.site/reader034/viewer/2022042309/5ed6cad9777e4e4f012b27f4/html5/thumbnails/3.jpg)
Slides adapted from
Johannes Koester
http://slides.com/johanneskoester/snakemake-tutorial-2016#/
![Page 4: Building a reproducible workow with Snakemake and Singularity€¦ · Wolfgang Resch - HPC @ NIH - staff@hpc.nih.gov Building a reproducible workow with Snakemake and Singularity](https://reader034.vdocuments.site/reader034/viewer/2022042309/5ed6cad9777e4e4f012b27f4/html5/thumbnails/4.jpg)
data
results
![Page 5: Building a reproducible workow with Snakemake and Singularity€¦ · Wolfgang Resch - HPC @ NIH - staff@hpc.nih.gov Building a reproducible workow with Snakemake and Singularity](https://reader034.vdocuments.site/reader034/viewer/2022042309/5ed6cad9777e4e4f012b27f4/html5/thumbnails/5.jpg)
data
results
data
results
data
results
data
results
data
results
merged resultsresults results results results results
![Page 6: Building a reproducible workow with Snakemake and Singularity€¦ · Wolfgang Resch - HPC @ NIH - staff@hpc.nih.gov Building a reproducible workow with Snakemake and Singularity](https://reader034.vdocuments.site/reader034/viewer/2022042309/5ed6cad9777e4e4f012b27f4/html5/thumbnails/6.jpg)
data
results
data
results
data
results
data
results
data
results
merged resultsresults results results results results
scalability
automatically execute steps in parallelminimize redundant computation when adding/changing data,or resuming interrupted workflows
![Page 7: Building a reproducible workow with Snakemake and Singularity€¦ · Wolfgang Resch - HPC @ NIH - staff@hpc.nih.gov Building a reproducible workow with Snakemake and Singularity](https://reader034.vdocuments.site/reader034/viewer/2022042309/5ed6cad9777e4e4f012b27f4/html5/thumbnails/7.jpg)
data
results
data
results
data
results
data
results
data
results
merged resultsresults results results results results
document tools, versions, parameters, algorithmsexecute automatically
repr
oduc
ibili
ty
![Page 8: Building a reproducible workow with Snakemake and Singularity€¦ · Wolfgang Resch - HPC @ NIH - staff@hpc.nih.gov Building a reproducible workow with Snakemake and Singularity](https://reader034.vdocuments.site/reader034/viewer/2022042309/5ed6cad9777e4e4f012b27f4/html5/thumbnails/8.jpg)
There Are Many Workflow Tools
make, ninja, scons, waf, ruffus, jug, Rake, bpipe, BigDataScript, toil, nextflow, paver, bcbio-nextgen, snakemake,wdl, cwl, Galaxy, KNIME,Taverna, Partek flow, DNAnexus, SevenBridges,Basespace
https://github.com/pditommaso/awesome-pipeline
![Page 9: Building a reproducible workow with Snakemake and Singularity€¦ · Wolfgang Resch - HPC @ NIH - staff@hpc.nih.gov Building a reproducible workow with Snakemake and Singularity](https://reader034.vdocuments.site/reader034/viewer/2022042309/5ed6cad9777e4e4f012b27f4/html5/thumbnails/9.jpg)
Snakemake
biowulf users: ~100
![Page 10: Building a reproducible workow with Snakemake and Singularity€¦ · Wolfgang Resch - HPC @ NIH - staff@hpc.nih.gov Building a reproducible workow with Snakemake and Singularity](https://reader034.vdocuments.site/reader034/viewer/2022042309/5ed6cad9777e4e4f012b27f4/html5/thumbnails/10.jpg)
data
results
rule qc: input: "seq/{sample}.fastq.gz" output: "qc/{sample}.qc" script: "scripts/myscript.py"
rule aln: input: "seq/{sample}.fastq.gz" output: bam = "aln/{sample}.bam", bai = "aln/{sample}.bai" shell: """ hisat2 -x /ref/genome -U {input}\ | samtools sort > {output.bam} samtools index {output.bam} """
Rules
![Page 11: Building a reproducible workow with Snakemake and Singularity€¦ · Wolfgang Resch - HPC @ NIH - staff@hpc.nih.gov Building a reproducible workow with Snakemake and Singularity](https://reader034.vdocuments.site/reader034/viewer/2022042309/5ed6cad9777e4e4f012b27f4/html5/thumbnails/11.jpg)
rule aln: input: "seq/{sample}.fastq.gz" output: bam = "aln/{sample}.bam", bai = "aln/{sample}.bai" shell: """ hisat2 -x /ref/genome -U {input}\ | samtools sort > {output.bam} samtools index {output.bam} """
name
recipe
formalizedinput/output
refer to input and output in recipe
Rules
![Page 12: Building a reproducible workow with Snakemake and Singularity€¦ · Wolfgang Resch - HPC @ NIH - staff@hpc.nih.gov Building a reproducible workow with Snakemake and Singularity](https://reader034.vdocuments.site/reader034/viewer/2022042309/5ed6cad9777e4e4f012b27f4/html5/thumbnails/12.jpg)
rule aln: input: "seq/{sample}.fastq.gz" output: bam = "aln/{sample}.bam", bai = "aln/{sample}.bai" shell: """ hisat2 -x /ref/genome -U {input}\ | samtools sort > {output.bam} samtools index {output.bam} """
wildcards generalize rules
Rules
![Page 13: Building a reproducible workow with Snakemake and Singularity€¦ · Wolfgang Resch - HPC @ NIH - staff@hpc.nih.gov Building a reproducible workow with Snakemake and Singularity](https://reader034.vdocuments.site/reader034/viewer/2022042309/5ed6cad9777e4e4f012b27f4/html5/thumbnails/13.jpg)
rule aln: input: "seq/{sample}.fastq.gz" output: bam = "aln/{sample}.bam", bai = "aln/{sample}.bai" shell: """ hisat2 -x /ref/genome -U {input}\ | samtools sort > {output.bam} samtools index {output.bam} """
Rules
input and outputcan be single files
or lists
referred to by name
![Page 14: Building a reproducible workow with Snakemake and Singularity€¦ · Wolfgang Resch - HPC @ NIH - staff@hpc.nih.gov Building a reproducible workow with Snakemake and Singularity](https://reader034.vdocuments.site/reader034/viewer/2022042309/5ed6cad9777e4e4f012b27f4/html5/thumbnails/14.jpg)
rule aln: input: "seq/{sample}.fastq.gz" output: "aln/{sample}.bam", "aln/{sample}.bai" shell: """ hisat2 -x /ref/genome -U {input}\ | samtools sort > {output[0]} samtools index {output[0]} """
Rules
input and outputcan be single files
or lists
referred to by index
![Page 15: Building a reproducible workow with Snakemake and Singularity€¦ · Wolfgang Resch - HPC @ NIH - staff@hpc.nih.gov Building a reproducible workow with Snakemake and Singularity](https://reader034.vdocuments.site/reader034/viewer/2022042309/5ed6cad9777e4e4f012b27f4/html5/thumbnails/15.jpg)
rule aln: input: "seq/{sample}.fastq.gz" output: "aln/{sample}.bam", "aln/{sample}.bai" conda: "envs/aln.yml" shell: """ hisat2 -x /ref/genome -U {input}\ | samtools sort > {output[0]} samtools index {output[0]} """
Rules
reproducibleenvironment
![Page 16: Building a reproducible workow with Snakemake and Singularity€¦ · Wolfgang Resch - HPC @ NIH - staff@hpc.nih.gov Building a reproducible workow with Snakemake and Singularity](https://reader034.vdocuments.site/reader034/viewer/2022042309/5ed6cad9777e4e4f012b27f4/html5/thumbnails/16.jpg)
rule aln: input: "seq/{sample}.fastq.gz" output: "aln/{sample}.bam", "aln/{sample}.bai" singularity: "shub://NIH-HPC/snakemake-class" shell: """ hisat2 -x /ref/genome -U {input}\ | samtools sort > {output[0]} samtools index {output[0]} """
Rules
reproducibleenvironment
![Page 17: Building a reproducible workow with Snakemake and Singularity€¦ · Wolfgang Resch - HPC @ NIH - staff@hpc.nih.gov Building a reproducible workow with Snakemake and Singularity](https://reader034.vdocuments.site/reader034/viewer/2022042309/5ed6cad9777e4e4f012b27f4/html5/thumbnails/17.jpg)
Dependencies are implicitand 'backwards'
data
results
rule a: input: "start/{sample}.txt" output: "mid/{sample}.txt" shell: "sort {input} > {output}"
rule b: input: "mid/{sample}.txt" output: "final/{sample}.summary" shell: "uniq -c {input} > {output}"
![Page 18: Building a reproducible workow with Snakemake and Singularity€¦ · Wolfgang Resch - HPC @ NIH - staff@hpc.nih.gov Building a reproducible workow with Snakemake and Singularity](https://reader034.vdocuments.site/reader034/viewer/2022042309/5ed6cad9777e4e4f012b27f4/html5/thumbnails/18.jpg)
Dependencies are implicitand 'backwards'
data
results
rule a: input: "start/{sample}.txt" output: "mid/{sample}.txt" shell: "sort {input} > {output}"
rule b: input: "mid/{sample}.txt" output: "final/{sample}.summary" shell: "uniq -c {input} > {output}"
$ snakemake final/ABC.summary
![Page 19: Building a reproducible workow with Snakemake and Singularity€¦ · Wolfgang Resch - HPC @ NIH - staff@hpc.nih.gov Building a reproducible workow with Snakemake and Singularity](https://reader034.vdocuments.site/reader034/viewer/2022042309/5ed6cad9777e4e4f012b27f4/html5/thumbnails/19.jpg)
Dependencies are implicitand 'backwards'
data
results
rule a: input: "start/{sample}.txt" output: "mid/{sample}.txt" shell: "sort {input} > {output}"
rule b: input: "mid/{sample}.txt" output: "final/{sample}.summary" shell: "uniq -c {input} > {output}"
$ snakemake final/ABC.summary
![Page 20: Building a reproducible workow with Snakemake and Singularity€¦ · Wolfgang Resch - HPC @ NIH - staff@hpc.nih.gov Building a reproducible workow with Snakemake and Singularity](https://reader034.vdocuments.site/reader034/viewer/2022042309/5ed6cad9777e4e4f012b27f4/html5/thumbnails/20.jpg)
Dependencies are implicitand 'backwards'
data
results
rule a: input: "start/{sample}.txt" output: "mid/{sample}.txt" shell: "sort {input} > {output}"
rule b: input: "mid/{sample}.txt" output: "final/{sample}.summary" shell: "uniq -c {input} > {output}"
$ snakemake final/ABC.summary
ABC
![Page 21: Building a reproducible workow with Snakemake and Singularity€¦ · Wolfgang Resch - HPC @ NIH - staff@hpc.nih.gov Building a reproducible workow with Snakemake and Singularity](https://reader034.vdocuments.site/reader034/viewer/2022042309/5ed6cad9777e4e4f012b27f4/html5/thumbnails/21.jpg)
Dependencies are implicitand 'backwards'
data
results
rule a: input: "start/{sample}.txt" output: "mid/{sample}.txt" shell: "sort {input} > {output}"
rule b: input: "mid/{sample}.txt" output: "final/{sample}.summary" shell: "uniq -c {input} > {output}"
$ snakemake final/ABC.summary
ABCABC
![Page 22: Building a reproducible workow with Snakemake and Singularity€¦ · Wolfgang Resch - HPC @ NIH - staff@hpc.nih.gov Building a reproducible workow with Snakemake and Singularity](https://reader034.vdocuments.site/reader034/viewer/2022042309/5ed6cad9777e4e4f012b27f4/html5/thumbnails/22.jpg)
Dependencies are implicitand 'backwards'
data
results
rule a: input: "start/{sample}.txt" output: "mid/{sample}.txt" shell: "sort {input} > {output}"
rule b: input: "mid/{sample}.txt" output: "final/{sample}.summary" shell: "uniq -c {input} > {output}"
$ snakemake final/ABC.summary
ABCABC
![Page 23: Building a reproducible workow with Snakemake and Singularity€¦ · Wolfgang Resch - HPC @ NIH - staff@hpc.nih.gov Building a reproducible workow with Snakemake and Singularity](https://reader034.vdocuments.site/reader034/viewer/2022042309/5ed6cad9777e4e4f012b27f4/html5/thumbnails/23.jpg)
Dependencies are implicitand 'backwards'
data
results
rule a: input: "start/{sample}.txt" output: "mid/{sample}.txt" shell: "sort {input} > {output}"
rule b: input: "mid/{sample}.txt" output: "final/{sample}.summary" shell: "uniq -c {input} > {output}"
$ snakemake final/ABC.summary
ABC
ABCABC
![Page 24: Building a reproducible workow with Snakemake and Singularity€¦ · Wolfgang Resch - HPC @ NIH - staff@hpc.nih.gov Building a reproducible workow with Snakemake and Singularity](https://reader034.vdocuments.site/reader034/viewer/2022042309/5ed6cad9777e4e4f012b27f4/html5/thumbnails/24.jpg)
Dependencies are implicitand 'backwards'
data
results
rule a: input: "start/{sample}.txt" output: "mid/{sample}.txt" shell: "sort {input} > {output}"
rule b: input: "mid/{sample}.txt" output: "final/{sample}.summary" shell: "uniq -c {input} > {output}"
$ snakemake final/ABC.summary
ABCABC
ABCABC
![Page 25: Building a reproducible workow with Snakemake and Singularity€¦ · Wolfgang Resch - HPC @ NIH - staff@hpc.nih.gov Building a reproducible workow with Snakemake and Singularity](https://reader034.vdocuments.site/reader034/viewer/2022042309/5ed6cad9777e4e4f012b27f4/html5/thumbnails/25.jpg)
http://slides.com/johanneskoester/snakemake-tutorial-2016#/
https://snakemake.readthedocs.io/en/stable/
https://bitbucket.org/snakemake/snakemake/overview
https://github.com/leipzig/SandwichesWithSnakemake
https://molb7621.github.io/workshop/Classes/snakemake-tutorial.html
http://blog.byronjsmith.com/snakemake-analysis.html
![Page 26: Building a reproducible workow with Snakemake and Singularity€¦ · Wolfgang Resch - HPC @ NIH - staff@hpc.nih.gov Building a reproducible workow with Snakemake and Singularity](https://reader034.vdocuments.site/reader034/viewer/2022042309/5ed6cad9777e4e4f012b27f4/html5/thumbnails/26.jpg)
reads[fastq]
report[html]
transcriptome[fa]
transcriptome index[salmon index]
salmon[quant.sf]
alignment[bam]
counts[count.tsv]
merged data[rdata]
alignment qc[rseqc out]
salmon
R
rseqc
hisat2
featureCounts
fastqc
salmon
R