Parallel Programming in R - Revolutions
Post on 02-Feb-2017
Parallel Computing in R
BioC2009, Seattle, July 2009
Who are REvolution Computing?

REvolution Computing:
– Is a commercial open-source company, founded in 2007
– Provides services and products based on R
• The "Red Hat"® for R
– Produces free and subscription-based high-performance, enhanced distributions of R
– Offers support, training, validation and other services around R
– Has expertise in high-performance and distributed computing
– Is a financial and technical contributor to the R community
– Has operations in New Haven, Seattle, and San Francisco
The People of REvolution
• Martin Schultz, Chief Scientific Officer (Arthur K. Watson Professor of Computer Science, Yale University; founder of Scientific Computing Associates; research in algorithm design, parallel programming environments and architectures)
• David Smith, Director of Community & R blogger (co-author of An Introduction to R, ESS)
• Bryan Lewis, Ambassador of Cool (aka Director of Systems Engineering; applied math interests in numerical analysis of inverse problems; former CEO of Rocketcalc)
• Danese Cooper, Open Source Diva (board of directors, Open Source Initiative; member, Apache Software Foundation; advisory board, Mozilla.org; previously senior director, open source strategies at Intel and Sun)
• Steve Weston, Senior Research Scientist, Director of Engineering (REvolution and Scientific Computing Associates; development of NetWorkSpaces – parallel programming with R, Python, Ruby, and Matlab – Network Linda, Paradise, and Piranha)
• Jay Emerson, Dept. of Statistics, Yale University (author of the bigmemory package)
What is REvolution R?

• REvolution R is the free distribution of R
– Optimized for speed
– Uses multiple CPUs/cores for performance
– For Windows and MacOS (soon: Ubuntu)
– Support via community forums
• REvolution R Enterprise is our enhanced, subscription-only distribution of R
– Telephone/email support from real R experts
– Suitable for use in regulated/validated environments
– Includes proprietary ParallelR packages for reliable distributed computing with R, on clusters or in the cloud
– Supported on 64-bit Windows, Linux
Supporting the R Community

We are an open source company supporting the R community:
• Benefactor of the R Foundation
• Financial supporter of R conferences and user groups
• New functionality developed in core R to be contributed under GPL
• 64-bit Windows support
• Step-debugging support
• R Evangelism

"Revolutions" blog: blog.revolution-computing.com – daily news about R, statistics, and open source
In today's lab:

• Introduction to Parallel Processing
• Multi-Threaded Processing
– Computing on the GPU
• Iterators
• The foreach loop
• Using multiple cores: SMP
• Cluster Computing
• Multi-Stratum parallelism
• Q&A / Exercises
Getting Started

• R 2.10.x, and 2.9.x on Windows:
install.packages("foreach", type="source")
install.packages("iterators", type="source")
• R 2.9.x, Mac/Linux only:
install.packages("doMC")
require(doMC)
• Windows/Mac:
– Install REvolution R Enterprise 2.0 (R 2.7.2)
require(doNWS)
Introduction to Parallel Processing
With an aside to High-Performance Computing

HPC often means efficiently exploiting specialized hardware

Images copyright Cray, Xilinx, NVIDIA, from upper-left, clockwise.
What is High-Performance Computing (HPC)?
Image from ncbr.sdsc.edu
What is High-Performance Computing (HPC)?

• These days, HPC is frequently associated with COTS* cluster computing and with SIMD vectorization and pipelining (GPUs)
  *Commodity, off the shelf
• New: cloud computing
• HPC is often concerned with multi-processing (parallel processing), the coordination of multiple, simultaneously running (sub)programs
– Threads
– Processes
– Clusters

Image Copyright Lawrence Livermore National Lab
What is High-Performance Computing (HPC)?

HPC often involves effectively managing huge datasets
– Parallel filesystems (GPFS, PVFS2, Lustre, GFS2, S3, …)
– Parallel data operations (map-reduce)
– Working with high-performance databases
– The bigmemory package in R

Image Copyright HP (a multi-petabyte storage system)
What is High-Performance Computing (HPC)?

A Taxonomy of Parallel Processing
• Multi-node/cluster/cloud computing (heavyweight processes)
– Memory distributed across the network
– Examples: foreach, SNOW, Rmpi, batch processing
• Multi-core/multi-processor computing (heavyweight processes)
– SMP: Symmetric Multi-Processing
– Independent memory in shared space
– Naturally scales to multi-node processing
– Examples: multicore (Windows/Unix), foreach
• Multi-threaded processing (lightweight processes)
– Usually shared memory
– Harder to scale out across networks
– Examples: threaded linear-algebra libraries for R (ATLAS, MKL); GPU processors (CUDA/NVIDIA; Ct/Intel)
Multi-Threaded Processing

What is threaded programming?
• A thread is a kind of process that shares address space with its parent process
• Created, destroyed, managed and synchronized in C code
– POSIX threads
– OpenMP threads
• Fast, but difficult to program
– Easy to overwrite variables
– Need to worry about synchronization
Exploiting threads with R

• R links to BLAS (Basic Linear Algebra Subprograms) libraries for efficient vector/matrix operations
• Linux: need to compile and link with a threaded BLAS (ATLAS)
• Windows/Mac: REvolution R is linked to the Intel MKL libraries, and uses as many threads as cores
– Many higher-level operations are optimized as well
• MacOS: the CRAN binary uses vecLib BLAS – threaded, pretty good performance
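As a quick sketch (not from the slides), you can gauge whether your R is using a threaded BLAS by timing a large matrix cross-product, which executes inside the BLAS; the matrix size here is an arbitrary choice for illustration. Watch your CPU monitor while it runs:

```r
# Time a BLAS-bound operation; with a threaded BLAS (MKL, ATLAS,
# vecLib) this should keep all cores busy.
n <- 1000
A <- matrix(rnorm(n * n), n, n)
elapsed <- system.time(B <- crossprod(A))["elapsed"]  # t(A) %*% A, in BLAS
cat("crossprod of a", n, "x", n, "matrix:", elapsed, "seconds\n")
```

Comparing the elapsed time against a reference (single-threaded) BLAS on the same machine shows the multi-threading benefit directly.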
REvolution R SVD Performance

Example: data matrix 150,000 x 500, fast.svd
Quad-core Intel Core 2 CPU, Windows Vista 64-bit workstation
[Chart: REvolution R performance, multi-threaded processing]
GPU Programming

What is a GPU?
• Processing chip (or card) dedicated to fast floating-point operations
– Originally for 3-D graphics calculations
• Highly parallel: 100s of processors on a single chip, capable of running 1000s of threads
• Usually includes dedicated high-speed RAM, accessible only by the GPU
– Need to transfer data in/out
• Programmed directly using custom C dialects/compilers
• >90% of new desktops/laptops have an integrated GPU
GeForce 8800 GT

• Launched Oct 29, 2007
• 512MB of 256-bit memory
• 128 processors
• 512 simultaneous threads
• <$200
• Download the NVIDIA CUDA Tools:
– http://www.nvidia.com/object/cuda_home.html
• Tutorial:
– http://www.ddj.com/architect/207200659
Performance Comparison

Method                      Time (seconds)
Base R convolve function    9.89
AMD ACML                    6.29
FFTW (8 threads)            3.75
CUDA on GeForce 8800 GT     1.88 (single precision)
Convolve 2 vectors of length 2^22 (60MB of data)
Quad dual-core processor / GPU
Introducing Iterators

Thought Experiment: Drawing Cards
• You are the teacher of 10 grade-school pupils.
• Class project: draw each of the 52 playing cards as a poster.
• Each child has supplies of poster paper and crayons, but requires a reference card to copy.
• How to organize the pupils?
Iterators

> require(iterators)

• Generalized loop variable
• Value need not be atomic
– Row of a matrix
– Random dataset
– Chunk of a data file
– Record from a database
• Create with: iter
• Get values with: nextElem
• Used as an indexing argument with foreach
Iterators are memory friendly

• Allow data to be split into manageable pieces on the fly
• Helps alleviate problems with processing large data structures
• Pieces can be processed in parallel
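As an illustration (not from the slides), the iterators package also provides idiv, which yields manageable chunk sizes on the fly without materializing the chunks themselves:

```r
library(iterators)

# Split 10 units of work into 3 roughly equal chunks;
# each call to nextElem returns the next chunk size.
it <- idiv(10, chunks = 3)
sizes <- c(nextElem(it), nextElem(it), nextElem(it))
print(sizes)  # chunk sizes summing to 10
```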
Iterators act as adaptors

• Allow your data to be processed by foreach without being converted
• Can iterate over matrices and data frames by row or by column:

it <- iter(Boston, by="row")
nextElem(it)
Numeric Iterator

> i <- iter(1:3)
> nextElem(i)
[1] 1
> nextElem(i)
[1] 2
> nextElem(i)
[1] 3
> nextElem(i)
Error: StopIteration
Long sequences

> i <- icount(1e9)
> nextElem(i)
[1] 1
> nextElem(i)
[1] 2
> nextElem(i)
[1] 3
> nextElem(i)
[1] 4
> nextElem(i)
[1] 5
Matrix dimensions

> M <- matrix(1:25, ncol=5)
> r <- iter(M, by="row")
> nextElem(r)
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    6   11   16   21
> nextElem(r)
     [,1] [,2] [,3] [,4] [,5]
[1,]    2    7   12   17   22
> nextElem(r)
     [,1] [,2] [,3] [,4] [,5]
[1,]    3    8   13   18   23
Data File

> rec <- iread.table("MSFT.csv", sep=",", header=TRUE, row.names=NULL)
> nextElem(rec)
  MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume MSFT.Adjusted
1     29.91     30.25     29.4      29.86    76935100         28.73
> nextElem(rec)
  MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume MSFT.Adjusted
1      29.7     29.97    29.44      29.81    45774500         28.68
> nextElem(rec)
  MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume MSFT.Adjusted
1     29.63     29.75    29.45      29.64    44607200         28.52
> nextElem(rec)
  MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume MSFT.Adjusted
1     29.65      30.1    29.53      29.93    50220200         28.80
Database

> library(RSQLite)
> m <- dbDriver('SQLite')
> con <- dbConnect(m, dbname="arrests")
> it <- iquery(con, 'select * from USArrests', n=10)
> nextElem(it)
            Murder Assault UrbanPop Rape
Alabama       13.2     236       58 21.2
Alaska        10.0     263       48 44.5
Arizona        8.1     294       80 31.0
Arkansas       8.8     190       50 19.5
California     9.0     276       91 40.6
Colorado       7.9     204       78 38.7
Connecticut    3.3     110       77 11.1
Delaware       5.9     238       72 15.8
Florida       15.4     335       80 31.9
Georgia       17.4     211       60 25.8
Infinite & Irregular sequences

iprime <- function() {
  lastPrime <- 1
  nextEl <- function() {
    lastPrime <<- as.numeric(nextprime(lastPrime))
    lastPrime
  }
  it <- list(nextElem=nextEl)
  class(it) <- c('abstractiter', 'iter')
  it
}

> require(gmp)
> p <- iprime()
> nextElem(p)
[1] 2
> nextElem(p)
[1] 3
Looping with foreach

foreach (var=iterator) %dopar% { statements }

• Evaluates statements until the iterator terminates
• statements will reference the variable var
• Values of the { … } block are collected into a list
• Runs sequentially by default (or force with %do%)
> foreach (j=1:4) %dopar% sqrt(j)
[[1]]
[1] 1

[[2]]
[1] 1.414214

[[3]]
[1] 1.732051

[[4]]
[1] 2

Warning message:
executing %dopar% sequentially: no parallel backend registered
Combining Results

> foreach(j=1:4, .combine=c) %dopar% sqrt(j)
[1] 1.000000 1.414214 1.732051 2.000000

> foreach(j=1:4, .combine='+', .inorder=FALSE) %dopar% sqrt(j)
[1] 6.146264

When the order of evaluation is unimportant, use .inorder=FALSE
Referencing global variables

> z <- 2
> f <- function(x) sqrt(x + z)
> foreach (j=1:4, .combine='+') %dopar% f(j)
[1] 8.417609

foreach automatically inspects the code and ensures that unbound objects are propagated to the evaluation environment
Nested foreach execution

• foreach operations can be nested using the %:% operator
• Allows parallel execution across multiple loop levels, "unrolling" the inner loops

foreach(i=1:3, .combine=cbind) %:%
  foreach(j=1:3, .combine=c) %dopar%
    (i + j)
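Run with %do% (so no registered backend is needed), a nested loop like the one above yields a 3 x 3 matrix whose entry in row j, column i is i + j: the inner loop builds each column with c, and %:% combines the columns with cbind.

```r
library(foreach)

# %do% runs the nested loops sequentially; %:% "unrolls" them
m <- foreach(i=1:3, .combine=cbind) %:%
  foreach(j=1:3, .combine=c) %do% (i + j)
print(m)  # 3 x 3 matrix with m[j, i] == i + j
```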
Speeding up code with foreach

SMP Processing

Quick review of parallel and distributed computing in R
• NetWorkSpaces (package nws; SMP, distributed)
– GPL, also commercially supported by REvolution Computing
– Very cross-platform; distributed shared-memory paradigm
– Fault-tolerant
• Multicore (package multicore; SMP only)
– Linux/MacOS (requires POSIX)
– Uses fork to create new R processes
• Rmpi (package Rmpi; SMP, distributed)
– Fine-grained control allows very high-performance calculations
– Can be tricky to configure
– Limited Windows and heterogeneous cluster support
• SNOW (package snow; SMP, distributed*)
– Limited Windows support (*single machine only)
– Meta-package: supports MPI, sockets, NWS, PVM
Parallel backends for foreach

• %dopar% behaviour depends on the currently "registered" parallel backend
• Modular parallel backends:
• registerDoSEQ (default)
• registerDoNWS (NetWorkSpaces)
• registerDoMC (multicore; MacOS/Linux)
– From Terminal/ESS only! (The R.app GUI will crash.)
• registerDoSNOW
• registerDoRMPI
Getting Started: Multi-core Processing

• R 2.10.x – wait until the official release
• R 2.9.x:
require(doMC)
registerDoMC(cores=2)
• REvolution R Enterprise:
require(doNWS)
s <- sleigh(workerCount=2)
registerDoNWS()
A simple simulation:

birthday <- function(n) {
  ntests <- 1000
  pop <- 1:365
  anydup <- function(i)
    any(duplicated(sample(pop, n, replace=TRUE)))
  sum(sapply(seq(ntests), anydup)) / ntests
}

x <- foreach (j=1:100) %dopar% birthday(j)
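For reference, a self-contained variant of this simulation (assumptions here: ntests reduced to 200 and only three group sizes so it runs quickly, and %do% so no registered backend is needed):

```r
library(foreach)

# Estimate P(at least one shared birthday) in a group of size n
birthday <- function(n, ntests = 200) {
  pop <- 1:365
  anydup <- function(i) any(duplicated(sample(pop, n, replace = TRUE)))
  sum(sapply(seq(ntests), anydup)) / ntests
}

set.seed(42)
# estimated probabilities for groups of 5, 23, and 60 people
x <- foreach(j = c(5, 23, 60), .combine = c) %do% birthday(j)
print(x)
```

The estimate rises steeply with group size, passing 50% around a group of 23 — the classic birthday paradox.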
Birthday Example – timings

Dual-core 2.4GHz Intel MacBook:

system.time({
  x <- foreach (j=1:100) %dopar% birthday(j)
})  # Elapsed

Backend                       Time (s)
registerDoSEQ()               41s
registerDoMC()  # 2 cores     28s
registerDoNWS() # 2 workers   26s (*)
Birthday Simulation: Multicore/NWS

> x <- foreach (j=1:100) %dopar% birthday(j)
> plot(1:100, unlist(x), type="l")
Using clusters with NetWorkSpaces

Setting Up a Cluster
1. Identify machines to form the nodes of the cluster
– Easiest with Linux/MacOS
– Possible with Windows
2. Select a server machine
– OK for this one to be on Windows
3. Make sure passwordless ssh is enabled on each worker node
– ssh nodename Revo --version should work
Setting up a cluster, part 2
4. Log into the server, start REvolution R
5. Create a sleigh:

require(doNWS)
s <- sleigh(nodeList=c(
  rep("localhost",2),
  rep("thor",8),
  rep("loki",4)),
  launch=sshcmd)
registerDoNWS(s)

6. Use foreach as before
7. (optional) Use joinSleigh to add new nodes
Parallel Random Forest

# a simple parallel random forest
library(randomForest)
x <- matrix(runif(500), 100)
y <- gl(2, 50)
wc <- 2
n <- ceiling(1000 / wc)
registerDoNWS(s)
foreach(ntree=rep(n, wc), .combine=combine,
        .packages='randomForest') %dopar%
  randomForest(x, y, ntree=ntree)

• Easier: randomShrubberyNWS()
Converting existing code

• Convert these loops to foreach:
– for: make the body return the iteration value, and use .combine
– apply: use iter(X, by="row") and .combine
• Or iapply(X, 1)
– lapply: use iter(mylist)
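A minimal sketch of the apply conversion (not from the slides; %do% is used so it runs without a registered parallel backend):

```r
library(foreach)
library(iterators)

X <- matrix(1:6, nrow = 2)

# sequential: row sums via apply
a <- apply(X, 1, sum)

# foreach equivalent: iterate over rows, combine results with c
b <- foreach(r = iter(X, by = "row"), .combine = c) %do% sum(r)

stopifnot(all(a == b))
```

Once a parallel backend is registered, changing %do% to %dopar% parallelizes the loop without any other modification.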
ppiStats Example

• Sequential:
bpMats1 <- lapply(bpList, function(x) {
  bpMatrix(x, symMat = TRUE, homodimer = FALSE, baitAsPrey = FALSE,
           unWeighted = FALSE, onlyRecip = FALSE, baitsOnly = FALSE)
})

• Parallel:
bpMats1 <- foreach(x=iter(bpList), .packages = "ppiStats") %dopar% {
  bpMatrix(x, symMat = TRUE, homodimer = FALSE, baitAsPrey = FALSE,
           unWeighted = FALSE, onlyRecip = FALSE, baitsOnly = FALSE)
}
ppiStats Example

• Sequential:
bpGraphs <- lapply(bpMats1, function(x) {
  genBPGraph(x, directed = TRUE, bp = FALSE)
})

• Parallel:
bpGraphs <- foreach(x=iter(bpMats1),
                    .packages = "ppiStats") %dopar% {
  genBPGraph(x, directed = TRUE, bp = FALSE)
}
Exercise

• Find other "embarrassingly parallel" BioConductor examples, and convert them to parallel with foreach.
Multi-Stratum Parallelism

foreach (iterator) %dopar% { tasks }

[Diagram: an outer foreach distributes tasks across the CLUSTER stratum; on each node an inner foreach runs its tasks on the SMP stratum.]

An example of explicit multi-stratum parallelism
Multi-stratum template: NWS/Multicore

require("doNWS")
require("foreach")
require("doMC")

s <- sleigh(nodeList=c(rep("localhost",2), rep("bladeserver",8)))
registerDoNWS(s)

foreach (iterator_i, .packages=c("foreach", "doMC")) %dopar% {
  registerDoMC()
  foreach (iterator_j) %dopar% {
    tasks…
  }
}
Pitfalls to avoid

• Sequential vs. Parallel Programming
• Random Number Generation
– library(sprngNWS)
– sleigh(workerCount=8, rngType='sprngLFG')
• Node failure
• Cosmic rays
Conclusions

• Parallel computing is easy!
• Write loops with foreach / %dopar%
– Works fine in a single-processor environment
– Third-party users can register backends for multiprocessor or cluster processing
– Speed benefits without modifying code
• Easy performance gains on modern laptops/desktops
• Expand to clusters for meaty jobs
Thank You!

• David Smith – david@revolution-computing.com, @revodavid
• REvolution Computing – www.revolution-computing.com
• Revolutions, the R blog – blog.revolution-computing.com
• Downloads:
– Slides: http://tinyurl.com/R-Bioc-slides
– Script: http://tinyurl.com/R-Bioc-script