![Page 1: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/1.jpg)
Genomic Computation at Scale with Serverless, StackStorm, and Docker
SC17, 14 Nov 2017
Dmitri Zimine, Fellow @ Extreme Networks, @dzimine
Image by Miki Yoshihito, Creative Commons license
![Page 2: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/2.jpg)
Genomic Sequencing and Annotation
ACGTGACCGGTACTGGTAACGTACACCTACGTGACCGGTACTGGTAACGTACGCCTACGTGACCGGTACTGGTAACGTATACACGTGACCGGTACTGGTAACGTACACCTACGTGACCGGTACTGCTGGTAACGTATACCTCT...
Sequencer
Sequenced Genome
DNA Sample
Annotated Sequence
Compute in silico
![Page 3: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/3.jpg)
So that…
Source: http://www.yourgenome.org
![Page 4: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/4.jpg)
Victor Solovyev, Partner
Leading scientist in computational biology
Victor Solovyev is a leading scientist in computational biology. His experience is a good mixture of academic positions, including Professor at Royal Holloway and KAUST, and various industry roles. His research on bioinformatics and genomic computation has been published in Nature, Science, and Genome Research, and is highly cited.
As Chief Sci. Officer at Softberry, he leads software development for biomedical data analysis and research in computational biology. Softberry software products were used in over 2,000 research publications in 2016 alone. The Fgenesh program has been cited in ~3,200, Bprom in ~800, and the Fgenesb pipeline in ~500 scientific publications.
![Page 5: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/5.jpg)
Fgenesb pipeline: some previous results
![Page 6: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/6.jpg)
![Page 7: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/7.jpg)
GAaaS – Genomic Annotation as a Service
Challenges:
• Offer annotation pipelines online
• Use the cloud for large, elastic capacity
• Handle scale: spiky workload
• Do it economically
![Page 8: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/8.jpg)
Agenda
• Problem & Solution: domain demands, technology selection & serverless, toolchain, solution overview
• Show & Tell: demo
• Discussion: lessons learned, what to keep & what to refactor, the path forward
![Page 9: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/9.jpg)
Typical genomic annotation pipeline (highly parallelizable)
• Prediction of genes and proteins: fgenesb
• Search for similar proteins in databases: Blast (NR), KOALA (KEGG)
• Compilation and presentation of results: GCView
• Data sizes: 50-100 Gb; 1 Mb-3 Gb
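The slide marks this pipeline as highly parallelizable. Below is a minimal sketch of one common way to exploit that, assuming the input sequence can be cut into overlapping windows that are analyzed independently. The window size, the overlap, and the `analyze` callback are illustrative assumptions, not the actual interface of fgenesb or Blast. Each result keeps its start offset so downstream steps can map hits back to coordinates in the original sequence.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def split_sequence(seq: str, chunk_size: int, overlap: int) -> List[Tuple[int, str]]:
    """Cut seq into windows of at most chunk_size bases that share `overlap` bases.

    The overlap (an assumption of this sketch) guarantees that any feature no
    longer than the overlap lies entirely inside at least one window.
    """
    if overlap < 0 or chunk_size <= overlap:
        raise ValueError("need 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: List[Tuple[int, str]] = []
    start = 0
    while True:
        chunks.append((start, seq[start:start + chunk_size]))
        if start + chunk_size >= len(seq):
            break
        start += step
    return chunks


def analyze_in_parallel(seq: str, analyze: Callable[[str], T],
                        chunk_size: int, overlap: int) -> List[Tuple[int, T]]:
    """Run `analyze` on every window concurrently; results keep window order."""
    chunks = split_sequence(seq, chunk_size, overlap)
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(analyze, [chunk for _, chunk in chunks]))
    return [(start, result) for (start, _), result in zip(chunks, results)]


def gc_count(chunk: str) -> int:
    """Toy stand-in for a real analysis step."""
    return chunk.count("G") + chunk.count("C")


if __name__ == "__main__":
    print(analyze_in_parallel("ACGTACGTAC", gc_count, chunk_size=4, overlap=1))
```

In a real deployment each window would become a separate function invocation rather than a thread, but the split/map/reassemble shape stays the same.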
![Page 10: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/10.jpg)
Annotation Pipelines
A basic exome pipeline delivering called variants from raw sequence could consist of as few as 12 steps, most of which can be run in parallel, but a real analysis will typically involve several additional downstream steps and complex report generation.
Source: Brief Bioinform bbw020. DOI: https://doi.org/10.1093/bib/bbw020
![Page 11: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/11.jpg)
Properties:
• Steps:
  • jobs/functions
  • run times may be hours or days
  • diverse (a.k.a. “don’t run on the same box”)
• Workflow orchestration:
  • logical patterns: splits, parallels, joins
  • data flow: upstream results -> downstream inputs
• Scale dimensions: spiky load
  • low volume of requests
  • very high compute demand per request
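The orchestration properties listed above (splits, parallel branches, joins, and upstream results feeding downstream inputs) can be illustrated with a small in-process runner. This is a sketch under stated assumptions, not StackStorm's workflow engine: each task declares the names of its upstream inputs, every task whose inputs are ready runs concurrently, and the runner waits for the whole wave before scheduling the next. The task names in the example are hypothetical stand-ins for the pipeline steps.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Tuple

# A task is (names of upstream results it consumes, function to call).
Task = Tuple[List[str], Callable[..., Any]]


def run_pipeline(tasks: Dict[str, Task], inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Execute a dependency graph of tasks, passing results downstream."""
    results: Dict[str, Any] = dict(inputs)
    pending = dict(tasks)
    with ThreadPoolExecutor() as pool:
        while pending:
            # A task is ready once all of its upstream results exist.
            ready = [name for name, (deps, _) in pending.items()
                     if all(dep in results for dep in deps)]
            if not ready:
                raise ValueError(f"unsatisfiable dependencies: {sorted(pending)}")
            # Split / parallel: submit every ready task at once.
            futures = {
                name: pool.submit(pending[name][1], *[results[d] for d in pending[name][0]])
                for name in ready
            }
            # Join: collect the whole wave before looking for new ready tasks.
            for name, future in futures.items():
                results[name] = future.result()
                del pending[name]
    return results


if __name__ == "__main__":
    pipeline = {  # hypothetical steps mirroring the annotation pipeline
        "genes": (["sequence"], lambda s: f"genes({s})"),
        "blast": (["genes"], lambda g: f"blast({g})"),
        "koala": (["genes"], lambda g: f"koala({g})"),
        "report": (["blast", "koala"], lambda b, k: f"report({b}, {k})"),
    }
    print(run_pipeline(pipeline, {"sequence": "ACGT"})["report"])
```

Waiting for a whole wave is the simplest join strategy; a production engine would start each task as soon as its own inputs are ready.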
![Page 12: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/12.jpg)
Serverless
![Page 13: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/13.jpg)
What is Serverless?
Authoritative: Mike Roberts on martinfowler.com, https://goo.gl/bTfgfU
My summary:
• Function, not service: “down when done”
• Scale: elastic, infinite, transparent to the developer
• Pay-per-use consumption model
![Page 14: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/14.jpg)
Serverless fits!
Typical Serverless requirements:
• “Functions”, not “servers”, down when done
• Elastic scale: handle spiky workload pattern
• BYOC*: package algorithms into containers
• Launch on a variety of events
*) BYOC – Bring Your Own Code (see the serverless compute manifesto, https://goo.gl/q9HsXB)
![Page 15: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/15.jpg)
Serverless fits, but…
Typical Serverless requirements:
• Elastic scale: handle spiky workload pattern
• “Functions”, not “servers”, down when done
• BYOC*: package programs into containers, run everywhere
• Launch on a variety of events
Additional requirements:
• Long running times: hours
• Pipeline orchestration: execution logic and data passing
• Local dev environment: consistent and convenient
![Page 16: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/16.jpg)
Why not <…>
AWS Lambda? 5-minute limit, but jobs run for hours and days.
Azure? No native support for Functions in Docker containers*.
OpenWhisk? Lacks a powerful workflow to orchestrate pipelines (only sequences).
*) At the time of selection. I will cover “what has changed” in Discussion.
![Page 17: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/17.jpg)
DIY
![Page 18: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/18.jpg)
![Page 19: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/19.jpg)
Tool Chain
• Terraform provisions infrastructure on AWS (WIP); Vagrant provides local dev infrastructure.
• Ansible deploys and configures software on the infrastructure.
• Docker containerizes functions and pushes them to a local Docker Registry.
• StackStorm orchestrates pipeline executions, invokes Swarm to run functions, and dynamically scales Swarm on load.
![Page 20: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/20.jpg)
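StackStorm, in 1 minute
Actions, Sensors, Workflows, Rules
IT Domains: Config mgmt, Storage, Networking, Containers, Cloud Infra, Monitoring, Ops Support
[Diagram: sensors watch the IT domains and emit triggers; rules match triggers and call actions and workflows.]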
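To make the sensor, trigger, rule, and action flow concrete, here is a toy in-process model. It is an illustrative sketch only: real StackStorm rules are declared in YAML and executed by StackStorm's own services, and the trigger type `sequence.uploaded` and the action below are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Payload = Dict[str, Any]


@dataclass
class Trigger:
    """An event emitted by a sensor (for example, a new sequence was uploaded)."""
    type: str
    payload: Payload


@dataclass
class Rule:
    """If a trigger of `trigger_type` satisfies `criteria`, call `action`."""
    trigger_type: str
    criteria: Callable[[Payload], bool]
    action: Callable[[Payload], Any]


class RuleEngine:
    def __init__(self) -> None:
        self.rules: List[Rule] = []

    def register(self, rule: Rule) -> None:
        self.rules.append(rule)

    def dispatch(self, trigger: Trigger) -> List[Any]:
        """Called by a sensor; runs the action of every matching rule."""
        return [
            rule.action(trigger.payload)
            for rule in self.rules
            if rule.trigger_type == trigger.type and rule.criteria(trigger.payload)
        ]


if __name__ == "__main__":
    engine = RuleEngine()
    engine.register(Rule(
        trigger_type="sequence.uploaded",  # hypothetical trigger type
        criteria=lambda p: p["file"].endswith(".fa"),
        action=lambda p: f"start annotation pipeline for {p['file']}",
    ))
    print(engine.dispatch(Trigger("sequence.uploaded", {"file": "sample.fa"})))
```

Keeping criteria separate from actions is what lets the same action (here, launching a pipeline) be reused by many rules listening to different events.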
![Page 21: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/21.jpg)
©2017 Extreme Networks, Inc. All rights reserved
StackStorm is like … AWS Lambda + Step Functions
Actions, Sensors, Workflows, Rules
Open source, for DIY Serverless
![Page 22: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/22.jpg)
Three Sides to the Serverless Story
• DevOps: provides infrastructure.
• Developer: packs algorithms in containers, defines pipelines.
• End User: submits sequence, gets results, fast and cheap.
![Page 23: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/23.jpg)
1. DevOps: deploys serverless solution
[Diagram: StackStorm, a local Registry, other infra, and a Swarm Controller manage several Swarm Workers, each running functions f(x). Shared volumes are mounted on every node: /share read-write, /data read-only. StackStorm launches functions ($ function) and scales the worker pool.]
![Page 25: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/25.jpg)
2. Developer: creates functions, defines pipeline
1. Create functions (BYOC), pack them into Docker images, and push them to the local Registry.
2. Define pipelines as StackStorm workflows.
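Step 1 leaves each function's interface up to the developer. A minimal sketch of one possible contract, assuming (hypothetically) that every function container receives an input path and an output path under the shared /share volume as command-line arguments. GC content per FASTA record stands in for a real algorithm such as fgenesb. A script like this would then be packaged with a Dockerfile, built with `docker build -t <registry>/<name> .`, and pushed with `docker push`.

```python
"""Hypothetical function entrypoint: `python gc_function.py IN.fa OUT.tsv`."""
import sys
from pathlib import Path
from typing import Dict


def parse_fasta(text: str) -> Dict[str, str]:
    """Return {record name: sequence} from FASTA text."""
    records: Dict[str, str] = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the header text as the name; fall back to a positional one.
            name = line[1:].strip() or f"record{len(records) + 1}"
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def gc_content(seq: str) -> float:
    """Fraction of G and C bases; 0.0 for an empty sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def main(input_path: str, output_path: str) -> None:
    records = parse_fasta(Path(input_path).read_text())
    out = Path(output_path)
    out.parent.mkdir(parents=True, exist_ok=True)  # e.g. a per-job dir under /share
    with out.open("w") as fh:
        for name, seq in records.items():
            fh.write(f"{name}\t{len(seq)}\t{gc_content(seq):.3f}\n")


if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])
```

Passing paths rather than data keeps large genomes out of the orchestrator: StackStorm only has to hand file locations from one step to the next.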
![Page 26: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/26.jpg)
3. User submits data, System runs pipeline & produces results
1. User sends sequence data.
2. StackStorm runs the workflow and schedules functions as jobs on Swarm.
3. The Swarm controller schedules services on Swarm workers.
4. Docker pulls the functions’ images from the Registry.
5. Functions run in containers and produce data.
6. StackStorm sends results back to the user.
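Because functions run for hours, the workflow step that launches one has to wait for it to finish instead of expecting a quick response. A hedged sketch of that wait loop: `get_state` is a hypothetical callable returning the current task state (for example, parsed from `docker service ps`), and the state names assume Docker Swarm task states such as `running`, `complete`, and `failed`. The clock and sleep functions are injectable so the loop can be tested without actually waiting.

```python
import time
from typing import Callable

TERMINAL_OK = {"complete"}
TERMINAL_FAIL = {"failed", "rejected", "shutdown", "orphaned"}


class JobFailed(RuntimeError):
    """Raised when the function's task ends in a failure state."""


def wait_for_completion(get_state: Callable[[], str], timeout_s: float,
                        poll_s: float = 5.0,
                        sleep: Callable[[float], None] = time.sleep,
                        clock: Callable[[], float] = time.monotonic) -> str:
    """Poll until the task completes, fails, or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state in TERMINAL_OK:
            return state
        if state in TERMINAL_FAIL:
            raise JobFailed(f"job ended in state {state!r}")
        if clock() >= deadline:
            raise TimeoutError(f"job still {state!r} after {timeout_s}s")
        sleep(poll_s)


if __name__ == "__main__":
    states = iter(["pending", "running", "complete"])  # simulated status feed
    print(wait_for_completion(lambda: next(states), timeout_s=6 * 3600, poll_s=0.1))
```

For multi-hour jobs, a generous timeout and a coarse poll interval keep the orchestrator's load negligible.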
![Page 27: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/27.jpg)
Show & Tell, PART 1
Genomic annotation pipeline with StackStorm, Docker, and Docker Swarm
![Page 28: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/28.jpg)
![Page 29: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/29.jpg)
Scale: dynamically, on load
[Diagram: the same deployment as on the DevOps slide (StackStorm, Registry, Swarm Controller, shared volumes share :rw and data :ro); a new Swarm Worker joins as more functions f(x) are scheduled.]
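The slides do not show the scaling policy itself. A minimal sketch of one simple rule, assuming each worker node has a fixed number of function slots and the cluster size is bounded: size the pool so every queued or running function has a slot, clamped to [min_workers, max_workers]. All parameter names are hypothetical. On AWS the result could be applied, for example, by setting an Auto Scaling group's desired capacity, with new nodes joining the Swarm on boot.

```python
import math


def desired_workers(queued: int, running: int, slots_per_worker: int,
                    min_workers: int, max_workers: int) -> int:
    """Workers needed so every queued or running function has a slot."""
    if slots_per_worker <= 0:
        raise ValueError("slots_per_worker must be positive")
    if min_workers > max_workers:
        raise ValueError("min_workers must not exceed max_workers")
    needed = math.ceil((queued + running) / slots_per_worker)
    return max(min_workers, min(max_workers, needed))


def scaling_delta(current: int, queued: int, running: int, slots_per_worker: int,
                  min_workers: int, max_workers: int, allow_scale_down: bool) -> int:
    """Nodes to add (positive) or remove (negative).

    Scale-down is gated by `allow_scale_down` (e.g. after a cooldown), so a short
    lull between pipeline steps does not tear down nodes that will be needed
    again. Nodes still running functions should be drained before removal.
    """
    target = desired_workers(queued, running, slots_per_worker, min_workers, max_workers)
    if target < current and not allow_scale_down:
        return 0
    return target - current


if __name__ == "__main__":
    print(scaling_delta(current=1, queued=10, running=2, slots_per_worker=4,
                        min_workers=1, max_workers=5, allow_scale_down=False))
```

Scaling up eagerly but down lazily suits the spiky, low-volume, high-compute load described earlier.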
![Page 30: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/30.jpg)
Show & Tell, PART 2
Dynamically scaling a Swarm cluster on AWS, based on workload
![Page 31: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/31.jpg)
![Page 32: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/32.jpg)
Agenda
• Problem & Solution: domain demands, technology selection & serverless, toolchain, solution overview
• Show & Tell: demo
• Discussion: lessons learned, what to keep & what to refactor, the path forward
![Page 33: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/33.jpg)
![Page 34: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/34.jpg)
Serverless hype accelerates
25+ frameworks… but no turn-key fit yet
![Page 35: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/35.jpg)
Kubernetes Won the Container Arms Race
now with a built-in AWS autoscaler.
![Page 36: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/36.jpg)
Azure Introduced Container Instances
no messing with VMs, per-second billing.
![Page 37: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/37.jpg)
We are outpaced by technology
![Page 38: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/38.jpg)
So What?
![Page 39: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/39.jpg)
Path Forward: Options
Option 1: Kubernetes
• Use the Kubernetes pack from StackStorm Exchange
• Utilize k8s “run to completion” Jobs
• Deploy on AWS, with minikube for local development
• Leverage the AWS autoscaler for elastic capacity
StackStorm handles the pipeline workflow and calls k8s Jobs. The app developer experience stays the same.
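Option 1 relies on Kubernetes Jobs, which run pods to completion. A sketch of the Job object StackStorm (or any client) might submit for one function, built as a plain Python dict and serialized to JSON, which `kubectl apply -f` accepts. The fields used (`apiVersion: batch/v1`, `restartPolicy: Never`, `backoffLimit`, `activeDeadlineSeconds`, PVC-backed volumes) are standard Job and Pod fields; the job name, image, claim names, and the two-day default deadline are hypothetical, and this is not the interface of the StackStorm Kubernetes pack.

```python
import json
from typing import Any, Dict, List


def job_manifest(name: str, image: str, command: List[str],
                 deadline_s: int = 48 * 3600) -> Dict[str, Any]:
    """Build a batch/v1 Job that runs one pipeline function to completion."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 0,                    # do not retry a failed run
            "activeDeadlineSeconds": deadline_s,  # hard cap for long-running jobs
            "template": {
                "spec": {
                    "restartPolicy": "Never",     # run to completion, then stop
                    "containers": [{
                        "name": "function",
                        "image": image,
                        "command": command,
                        "volumeMounts": [
                            {"name": "share", "mountPath": "/share"},
                            {"name": "data", "mountPath": "/data", "readOnly": True},
                        ],
                    }],
                    "volumes": [
                        # Hypothetical PersistentVolumeClaims for the shared storage.
                        {"name": "share",
                         "persistentVolumeClaim": {"claimName": "genomics-share"}},
                        {"name": "data",
                         "persistentVolumeClaim": {"claimName": "genomics-data",
                                                   "readOnly": True}},
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    manifest = job_manifest("fgenesb-job-42", "registry.example:5000/fgenesb:latest",
                            ["fgenesb", "/share/job-42/input.fa"])
    print(json.dumps(manifest, indent=2))  # save as job.json, then: kubectl apply -f job.json
```

The /share (read-write) and /data (read-only) split carries over unchanged from the Swarm deployment, which is what keeps the app developer experience the same.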
![Page 40: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/40.jpg)
Path Forward: Options
Option 2: Azure
• Use Azure’s “self-orchestration” option with StackStorm
• Azure provides containers on demand (no VMs!)
• Per-container, per-second billing
StackStorm handles the pipeline workflow and calls Azure containers. The app developer experience stays the same.
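Per-second billing matters here because the load is spiky: a pool sized for the peak sits idle between runs, while on-demand containers are billed only for the seconds they run. A back-of-the-envelope sketch of that comparison, assuming containers are billed per vCPU-second and per GB-second of memory. The rates are placeholders to be filled in from the provider's current price list, and the numbers in the usage example are made up for illustration only.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Run:
    seconds: float    # wall-clock duration of one function container
    vcpus: float
    memory_gb: float


def on_demand_cost(runs: Iterable[Run], vcpu_second_rate: float,
                   gb_second_rate: float) -> float:
    """Pay only while containers run: sum of per-second resource charges."""
    return math.fsum(
        r.seconds * (r.vcpus * vcpu_second_rate + r.memory_gb * gb_second_rate)
        for r in runs
    )


def fixed_pool_cost(nodes: int, hours: float, node_hour_rate: float) -> float:
    """Always-on pool sized for the peak, billed whether busy or idle."""
    return nodes * hours * node_hour_rate


if __name__ == "__main__":
    # Made-up rates and workload, for illustration only.
    burst = [Run(seconds=3 * 3600, vcpus=4, memory_gb=8)] * 10  # ten 3-hour jobs
    print(on_demand_cost(burst, vcpu_second_rate=1e-5, gb_second_rate=1e-6))
    print(fixed_pool_cost(nodes=10, hours=24 * 30, node_hour_rate=0.2))
```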
![Page 41: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/41.jpg)
Path forward: Change to Azure Container Instances
1. User sends sequence data.
2. StackStorm runs the workflow and schedules functions as containers on Azure.
3. Azure schedules container instances.
4. Docker pulls the functions’ images from the Registry.
5. Functions run in containers and produce data.
6. StackStorm sends results back to the user.
[Diagram components: StackStorm, Registry, Azure Container Service, Azure Container Instances running functions f(x).]
![Page 42: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/42.jpg)
![Page 43: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/43.jpg)
![Page 44: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/44.jpg)
StackStorm event-driven automation allows you to get your solution up and running quickly, so you can deliver business value fast, experiment, and innovate. Once you have it just right, you can build a more permanent version with microservices.
Actions, Sensors, Workflows, Rules
![Page 45: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/45.jpg)
StackStorm is an innovation platform where we can build solutions, experiment, and learn while delivering business value, before moving the implementation to dedicated services.
![Page 46: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/46.jpg)
StackStorm Open Source Platform
Brocade Workflow Composer (StackStorm Enterprise Edition)
Network Automation
StackStorm Exchange Community
Security, Assisted Networking
![Page 47: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/47.jpg)
Come and see! SC17 Exhibition, Booth #519
![Page 48: Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm](https://reader031.vdocuments.site/reader031/viewer/2022021923/5a6d2d467f8b9af8418b4f1b/html5/thumbnails/48.jpg)
Dmitri Zimine, Extreme Networks, @dzimine, http://github.com/dzimine/serverless-swarm
@Stack_Storm, http://github.com/StackStorm/st2
Thank You!