Using Abstractions to Scale Up Applications to Campus Grids
DESCRIPTION
Using Abstractions to Scale Up Applications to Campus Grids. Douglas Thain, University of Notre Dame, 28 April 2009. Outline: What is a Campus Grid? Challenges of Using Campus Grids. Solution: Abstractions. Examples and Applications — All-Pairs: Biometrics; Wavefront: Economics; Assembly: Genomics.
Using Abstractions to Scale Up Applications to Campus Grids
Douglas Thain
University of Notre Dame
28 April 2009
Outline
What is a Campus Grid?
Challenges of Using Campus Grids.
Solution: Abstractions
Examples and Applications
– All-Pairs: Biometrics
– Wavefront: Economics
– Assembly: Genomics
Combining Abstractions Together
What is a Campus Grid?
A campus grid is an aggregation of all available computing power found in an institution:
– Idle cycles from desktop machines.
– Unused cycles from dedicated clusters.
Examples of campus grids:
– 600 CPUs at the University of Notre Dame
– 2,000 CPUs at the University of Wisconsin
– 13,000 CPUs at Purdue University
Provides robust batch queueing on a complex distributed system.
Resource owners control consumption:
– “Only run jobs on this machine at night.”
– “Prefer biology jobs over physics jobs.”
End users express needs:
– “Only run this job where RAM > 2GB.”
– “Prefer to run on machines…”
http://www.cs.wisc.edu/condor
The Assembly Language of Campus Grids
User Interface:
– N x { run program X with files F and G }
System Properties:
– Wildly varying resource availability.
– Heterogeneous resources.
– Unpredictable preemption.
Effect on Applications:
– Jobs can’t run for too long...
– But they can’t run too quickly, either!
– Use file I/O for inter-process communication.
– Bad choices cause chaos on the network and heartburn for system administrators.
I have 10,000 iris images acquired in my research lab. I want to reduce each one to a feature space, and then compare all of them to each other. I want to spend my time doing science, not struggling with computers.
I have a laptop.
I own a few machines. I can get cycles from ND and Purdue.
Now What?
Observation
In a given field of study, a single person may repeat the same pattern of work many times, making slight changes to the data and algorithms.
If we knew in advance the intended pattern, then we could do a better job of mapping a complex application to a complex system.
Abstractions for Distributed Computing
Abstraction: a declarative specification of the computation and data of a workload.
A restricted pattern, not meant to be a general-purpose programming language.
Uses data structures instead of files.
Provides users with a bright path.
Regular structure makes it tractable to model and predict performance.
All-Pairs Abstraction
AllPairs( set A, set B, function F )
returns matrix M where
M[i][j] = F( A[i], B[j] ) for all i,j
[Figure: the AllPairs(A,B,F) matrix, with rows A1…An, columns B1…Bn, and F applied at every cell.]
Moretti, Bulosan, Flynn, Thain, “All-Pairs: An Abstraction…”, IPDPS 2008
allpairs A B F.exe
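The definition above is easy to state serially. This minimal Python sketch illustrates the semantics only; the names are hypothetical, not the actual tool’s API, and the real system distributes the F invocations across the grid:

```python
# A minimal, serial sketch of the All-Pairs semantics. A and B are
# plain lists and F is any Python callable standing in for F.exe.

def all_pairs(A, B, F):
    """Return matrix M where M[i][j] = F(A[i], B[j]) for all i, j."""
    return [[F(a, b) for b in B] for a in A]

# Toy similarity function standing in for an image comparator.
M = all_pairs([1, 2, 3], [1, 2], lambda a, b: 1.0 if a == b else 0.0)
# M == [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
```

The grid implementation’s whole job is to evaluate exactly this nested loop, but with the cells spread over hundreds of unreliable machines.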
Example Application
Goal: Design robust face comparison function.
[Figure: F applied to two image pairs, producing similarity scores 0.05 and 0.97.]
Similarity Matrix Construction
[Figure: upper-triangular similarity matrix produced by F:
1  .8  .1  0   0   .1
   1   0   .1  .1  0
       1   0   .1  .3
           1   0   0
               1   .1
                   1 ]
Current Workload: 4000 images, 256 KB each, 10s per F (five days)
Future Workload: 60000 images, 1 MB each, 1s per F (three months)
Non-Expert User on a Campus Grid
Try 1: Each F is a batch job.
Failure: Dispatch latency >> F runtime.
Try 2: Each row is a batch job.
Failure: Too many small ops on FS.
Try 3: Bundle all files into one package.
Failure: Everyone loads 1GB at once.
Try 4: User gives up and attempts to solve an easier or smaller problem.
All-Pairs Abstraction
AllPairs( set A, set B, function F )
returns matrix M where
M[i][j] = F( A[i], B[j] ) for all i,j
% allpairs compare.exe adir bdir
Distribute Data Via Spanning Tree
An Interesting Twist
Send the absolute minimum amount of data needed to each of N nodes from a central server:
– Each job must run on exactly 1 node.
– Data distribution time: O( D sqrt(N) )
Send all data to all N nodes via spanning tree distribution:
– Any job can run on any node.
– Data distribution time: O( D log(N) )
It is both faster and more robust to send all data to all nodes via spanning tree.
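The two distribution costs can be checked numerically. This is an illustrative model only: D is the time to ship the whole dataset once, and the unit constants are assumptions, not measurements:

```python
import math

# Illustrative model of the two data-distribution strategies.

def direct_time(D, N):
    # Central server sends each of N nodes its own minimal slice;
    # for the partitioned workload the total grows as D * sqrt(N).
    return D * math.sqrt(N)

def spanning_tree_time(D, N):
    # Every node that already holds the data forwards it onward,
    # doubling coverage each round: about log2(N) rounds of cost D.
    return D * math.log2(N)

# At 1024 nodes the spanning tree is already ~3x cheaper.
cheaper = spanning_tree_time(1.0, 1024) < direct_time(1.0, 1024)
```

The crossover favors the tree well before campus-grid scale, which is why sending everything everywhere wins despite moving more bytes.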
Choose the Right # of CPUs
What’s the right metric?
Speedup?
– Seq Runtime / Parallel Runtime
Parallel Efficiency?
– Speedup / N CPUs
Neither works, because the number of CPUs varies over time and between runs.
Better Choice: Cost Efficiency
– Work Completed / Resources Consumed
– Cars: Miles / Gallon
– Planes: Person-Miles / Gallon
– Results / CPU-hours
– Results / $$$
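Cost efficiency is trivial to compute, which is part of its appeal. A small sketch with hypothetical workload numbers:

```python
# Cost efficiency: work completed per resource consumed. Unlike
# speedup, it stays meaningful when the CPU count varies over time
# and between runs. The numbers below are hypothetical.

def cost_efficiency(results_completed, cpu_hours):
    return results_completed / cpu_hours

run_a = cost_efficiency(10_000, 500)  # 20.0 results per CPU-hour
run_b = cost_efficiency(10_000, 800)  # 12.5 results per CPU-hour
# run_a used its (varying) pool more efficiently, and we never had
# to pin down how many CPUs either run actually held at any instant.
```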
Wavefront Abstraction
Wavefront( R[x,0], R[0,y], F(x,y,d) )
[Figure: a 5x5 Wavefront grid. The boundary row R[x,0] and column R[0,y] are given; each interior cell is F(x, y, d), where x is its left neighbor, y its lower neighbor, and d its diagonal neighbor. Cells fill in along anti-diagonals.]
% wavefront func.exe infile outfile
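A serial sketch of the Wavefront recurrence makes the dependence structure explicit; the names are illustrative, not the tool’s API:

```python
# Serial sketch of Wavefront: given boundary row R[x][0] and column
# R[0][y], each interior cell depends on its left (x), lower (y),
# and diagonal (d) neighbors. On a grid, every cell along an
# anti-diagonal could run concurrently.

def wavefront(row0, col0, F):
    n = len(row0)
    R = [[None] * n for _ in range(n)]
    for i in range(n):
        R[i][0] = row0[i]
        R[0][i] = col0[i]
    for x in range(1, n):
        for y in range(1, n):
            R[x][y] = F(R[x - 1][y], R[x][y - 1], R[x - 1][y - 1])
    return R

# Toy F: with all-ones boundaries and addition, R[2][2] == 13.
R = wavefront([1, 1, 1], [1, 1, 1], lambda x, y, d: x + y + d)
```

Note the dependency chain: cell (x,y) cannot start until three earlier cells finish, which is exactly why dispatch latency dominates performance, as the next slide discusses.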
The Performance Problem
Dispatch latency really matters: a delay in one cell holds up all of its children.
If we dispatch larger sub-problems:
– Concurrency on each node increases.
– Distributed concurrency decreases.
If we dispatch smaller sub-problems:
– Concurrency on each node decreases.
– Spend more time waiting for jobs to be dispatched.
So, model the system to choose the block size.
And, build a fast-dispatch execution system.
Wavefront( R[x,0], R[0,y], F(x,y,d) )
[Figure: the same 5x5 Wavefront grid, dispatched with Block Size = 2, so each task computes a 2x2 block of cells.]
Model of 1000x1000 Wavefront
[Figure: the wavefront master keeps a queue of tasks and records tasks done; a work queue dispatches to 100s of workers at Notre Dame, Purdue, and Wisconsin. Each worker runs the protocol: put F.exe, put in.txt, exec F.exe <in.txt >out.txt, get out.txt.]
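The per-task protocol in the figure (put the program, put the input, execute with redirection, get the output) can be sketched locally. This sandbox-in-a-tempdir version is an illustration of the four steps, not the actual work queue implementation, and it assumes a Unix-like system:

```python
import os
import shutil
import subprocess
import tempfile

# Sketch of the worker-side protocol: put F.exe, put in.txt,
# exec F.exe <in.txt >out.txt, get out.txt. The real work queue
# performs these steps over the network; here the "worker sandbox"
# is just a local temporary directory.

def run_task(executable, input_bytes):
    sandbox = tempfile.mkdtemp(prefix="worker-")
    try:
        exe = shutil.copy(executable, sandbox)        # put F.exe
        in_path = os.path.join(sandbox, "in.txt")     # put in.txt
        out_path = os.path.join(sandbox, "out.txt")
        with open(in_path, "wb") as f:
            f.write(input_bytes)
        with open(in_path, "rb") as i, open(out_path, "wb") as o:
            subprocess.run([exe], stdin=i, stdout=o, check=True)
        with open(out_path, "rb") as f:               # get out.txt
            return f.read()
    finally:
        shutil.rmtree(sandbox)                        # clean sandbox

# Example on a Unix system, with /bin/cat standing in for F.exe:
# run_task("/bin/cat", b"hello") returns b"hello"
```

Keeping the task self-contained in a sandbox is what lets any worker run any task, which in turn makes preemption and retry cheap.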
500x500 Wavefront on ~200 CPUs
Wavefront on a 200-CPU Cluster
Wavefront on a 32-Core CPU
The Genome Assembly ProblemThe Genome Assembly Problem
AGTCGATCGATCGATAATCGATCCTAGCTAGCTACGA
AGTCGATCGATCGAT
AGCTAGCTACGA TCGATAATCGATCCTAGCTA
Chemical Sequencing
Computational Assembly
AGTCGATCGATCGAT
AGCTAGCTACGA TCGATAATCGATCCTAGCTA
Millions of “reads”100s bytes long.
Sample Genomes

Genome               Reads   Data    Pairs   Sequential Time
A. gambiae scaffold  101K    80MB    738K    12 hours
A. gambiae complete  180K    1.4GB   12M     6 days
S. bicolor           7.9M    5.7GB   84M     30 days
Assemble( set S, Test(), Align(), Assm() )
[Figure: the assembly pipeline. Sequence data (reads such as AGCCTGCATTA…, CATTAACGAAC…, GACTGACTAGC…, TGACCGATAAA…) flows into Test, which emits candidate pairs (“0 is similar to 1”, “1 is similar to 3”, “1 is similar to 4”); Align turns the candidate pairs into a list of alignments; Assm produces the assembled sequence. The stages are variously CPU bound, I/O bound, and RAM bound.]
[Figure: detail of a single worker. The align master keeps a queue of tasks and records tasks done; the work queue dispatches to 100s of workers at Notre Dame, Purdue, and Wisconsin. Each worker runs: put align.exe, put in.txt, exec align.exe <in.txt >out.txt, get out.txt.]
Distributed Genome AssemblyDistributed Genome Assembly
test
assemble
Small Genome (101K reads)Small Genome (101K reads)
Medium Genome (180K reads)Medium Genome (180K reads)
Large Genome (7.9M)Large Genome (7.9M)
From Workstation to Grid
What’s the Upshot?
We can do full-scale assemblies as a routine matter on existing conventional machines.
Our solution is faster (wall-clock time) than the next fastest assembler run on a 1024-node BG/L.
You could almost certainly do better with a dedicated cluster and a fast interconnect, but such systems are not universally available.
Our solution opens up research in assembly to labs with “NASCAR” instead of “Formula One” hardware.
What Other Abstractions Might Be Useful?
Map( set S, F(s) )
Explore( F(x), x: [a…b] )
Minimize( F(x), delta )
Minimax( state s, A(s), B(s) )
Search( state s, F(s), IsTerminal(s) )
Query( properties ) -> set of objects
FluidFlow( V[x,y,z], F(v), delta )
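Map is the simplest entry on this menu, and a serial sketch is a one-liner; on a campus grid each F(s) would become its own task (the name below is illustrative):

```python
# Map( set S, F(s) ): apply F independently to every member of S.
# Independence of the F(s) calls is what makes it trivially
# distributable across unreliable workers.

def map_abstraction(S, F):
    return [F(s) for s in S]

squares = map_abstraction([1, 2, 3, 4], lambda s: s * s)
# squares == [1, 4, 9, 16]
```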
How do we connect multiple abstractions together?
Need a meta-language, perhaps with its own atomic operations for simple tasks.
Need to manage (possibly large) intermediate storage between operations.
Need to handle data type conversions between almost-compatible components.
Need type reporting and error checking to avoid expensive errors.
If abstractions are feasible to model, then it may be feasible to model entire programs.
Connecting Abstractions in BXGrid
[Figure: a repository of iris images tagged by subject (S1, S2, S3) and eye color (L/R, brown or blue). Select filters records by eye color, Transform applies F to each selected image, and AllPairs compares all the results, feeding a similarity matrix and ROC curve.]
S = Select( color=“brown” )
B = Transform( S,F )
M = AllPairs( A, B, F )
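The three calls above compose: each abstraction consumes the previous one’s output, and the (possibly large) intermediate sets must be stored between stages. A toy serial sketch, with records and functions as hypothetical stand-ins for the BXGrid repository:

```python
# Sketch of the Select -> Transform -> AllPairs chain.

def select(records, predicate):
    return [r for r in records if predicate(r)]

def transform(S, F):
    return [F(s) for s in S]

def all_pairs(A, B, F):
    return [[F(a, b) for b in B] for a in A]

records = [
    {"id": 1, "color": "brown", "iris": [3, 1]},
    {"id": 2, "color": "blue",  "iris": [0, 2]},
    {"id": 3, "color": "brown", "iris": [1, 1]},
]

S = select(records, lambda r: r["color"] == "brown")
B = transform(S, lambda r: r["iris"])                 # feature vectors
M = all_pairs(B, B, lambda a, b: sum(x * y for x, y in zip(a, b)))
# M == [[10, 4], [4, 2]]
```

Note that S and B are materialized between stages; at repository scale those intermediates are exactly the storage-management problem the previous slide raises.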
Bui, Thomas, Kelly, Lyon, Flynn, Thain, “BXGrid: A Repository and Experimental Abstraction…”, poster at IEEE eScience 2008
Implementing Abstractions
S = Select( color=“brown” )
B = Transform( S,F )
M = AllPairs( A, B, F )
[Figure: how each stage maps to hardware. Select runs against a relational database (2x); Transform runs on an active storage cluster (16x); AllPairs runs on the Condor pool (500x CPUs).]
Largest Combination So Far
Complete Select/Transform/All-Pairs biometric experiment on 58,396 irises from the Face Recognition Grand Challenge.
To our knowledge, the largest experiment ever run on publicly available data.
Competing biometric research relies on samples of 100-1000 images, which can miss important population effects.
Reduced computation time from 800 days to 10 days, making it feasible to repeat multiple times for a graduate thesis.
Abstractions Redux
Campus grids provide enormous computing power, but are very challenging to use effectively.
An abstraction provides a robust, scalable solution to a category of problems.
Multiple abstractions can be chained together to solve very large problems.
Could a menu of abstractions cover a significant fraction of the application mix?
Acknowledgments
Cooperative Computing Lab
– http://www.cse.nd.edu/~ccl
Grad Students
– Chris Moretti
– Hoang Bui
– Li Yu
– Mike Olson
– Michael Albrecht
Faculty
– Patrick Flynn
– Nitesh Chawla
– Kenneth Judd
– Scott Emrich
NSF Grants CCF-0621434, CNS-0643229
Undergrads
– Mike Kelly
– Rory Carmichael
– Mark Pasquier
– Christopher Lyon
– Jared Bulosan