parallel programing in r - revolutions

Post on 02-Feb-2017

222 Views

Category:

Documents

1 Downloads

Preview:

Click to see full reader

TRANSCRIPT

ParallelCompu,nginR

BioC2009,Sea7le,July2009

WhoareREvolu,onCompu,ng?

REvolu,onCompu,ng:–  Isacommercialopen‐sourcecompany,foundedin2007–  ProvidesservicesandproductsbasedonR

•  The“RedHat”®forR–  Producesfreeandsubscrip,on‐basedhigh‐performance,enhanceddistribu,onsofR

–  Offerssupport,training,valida,onandotherservicesaroundR

–  Hasexper,seinhigh‐performanceanddistributedcompu,ng

–  IsafinancialandtechnicalcontributortotheRcommunity–  Hasopera,onsinNewHaven,Sea7le,andSanFrancisco

The People of REvolution

•  Mar$nSchultz,ChiefScien,ficOfficer(ArthurKWatsonProfessorofComputerScience,YaleUniversity;founderofScien,ficCompu,ngAssociates;researchinalgorithmdesign,parallelprogrammingenvironmentsandarchitectures)

•  DavidSmith,DirectorofCommunity&Rblogger(co‐authorofAnIntroduc+ontoR,ESS)

•  BryanLewis,AmbassadorofCool(akaDirectorofSystemsEngineering;appliedmathinterestsinnumericalanalysisofinverseproblems;formerCEOofRocketcalc)

•  DaneseCooper,OpenSourceDiva(boardofdirectors,OpenSourceIni,a,ve;member,ApacheSo`wareFounda,on;advisoryboard,Mozilla.org;previouslyseniordirector,opensourcestrategiesatIntelandSun)

•  SteveWeston,SeniorResearchScien,st,DirectorofEngineering(REvolu,onandScien,ficCompu,ngAssociates;developmentofNetWorkSpaces–parallelprogrammingwithR,Python,Ruby,andMatlab–NetworkLinda,Paradise,andPiranha).

•  JayEmerson,DeptSta,s,cs,YaleUniversity(authorofbigmemorypackage)

WhatisREvolu,onR?

•  REvolu,onRisthefreedistribu,onofR–  Op,mizedforspeed–  Usesmul,pleCPUs/coresforperformance–  ForWindowsandMacOS(soon:Ubuntu)–  Supportviacommunityforums

•  REvolu,onREnterpriseisourenhanced,subscrip,on‐onlydistribu,onofR–  Telephone/emailsupportfromrealRexperts–  Suitableforuseinregulated/validatedenvironments–  IncludesproprietaryParallelRpackagesforreliabledistributedcompu,ngwithR

•  onclustersorinthecloud–  Supportedon64‐bitWindows,Linux

Suppor,ngtheRCommunity

Weareanopensourcecompanysuppor,ngtheRcommunity:

•  BenefactorofRFounda$on•  FinancialsupporterofRconferencesandusergroups•  Newfunc$onalitydevelopedincoreRtocontributedunderGPL

•  64‐bitWindowssupport•  Step‐debuggingsupport

•  REvangelism

“Revolu$ons”Blog:blog.revolu,on‐compu,ng.comDailyNewsaboutR,sta1s1cs,andopen‐source

5

Intoday’slab:

•  Introduc,ontoParallelProcessing•  Mul,‐ThreadedProcessing

– Compu,ngontheGPU

•  Iterators•  Theforeachloop•  Usingmul,plecores:SMP

•  ClusterCompu,ng

•  Mul,‐Stratumparallelism

•  Q&A/Exercises6

GemngStarted

•  R2.10.x,and2.9.xonWindows:install.packages("foreach",type="source")

install.packages("iterators",type="source")

•  R2.9.x,Mac/Linuxonly:install.packages("doMC")

require(doMC)

•  Windows/Mac:–  InstallREvolu,onREnterprise2.0(R2.7.2)require(doNWS)

7

8

Introduc,ontoParallelProcessing

WithanasidetoHigh‐PerformanceCompu,ng

HPCo`enmeansefficientlyexploi,ngspecializedhardware

ImagescopyrightCray,Xlinix,NVIDIAfromupper‐left,clockwise.

WhatisHigh‐PerformanceCompu,ng(HPC)?

Imagefromncbr.sdsc.edu

WhatisHigh‐PerformanceCompu,ng(HPC)?

•  Thesedays,HPCisfrequentlyassociatedwithCOTS*clustercompu,ngandwithSIMDvectoriza,onandpipelining(GPUs)

*Commodity,offtheshelf

•  New:cloudcompu,ng

•  HPCiso`enconcernedwithmul,‐processing(parallel

processing),thecoördina,onof

mul,ple,simultaneously

running(sub)programs

–  Threads

–  Processes

–  Clusters

ImageCopyrightLawrenceLivermoreNationalLab

WhatisHigh‐PerformanceCompu,ng(HPC)?

HPCo`eninvolveseffec,velymanaginghugedatasets

–  Parallelfilesystems(GPFS,PVFS2,Lustre,GFS2,S3…)

–  Paralleldataopera,ons(map‐reduce)

–  Workingwithhigh‐performancedatabases

–  bigmemorypackageinR

ImageCopyrightHP(amulti‐petabytestoragesystem)

WhatisHigh‐PerformanceCompu,ng(HPC)?

ATaxonomyofParallelProcessing

•  Mul$‐node/cluster/cloudcompu$ng(heavyweightprocesses)–  Memorydistributedacrossnetwork–  Examples:foreach,SNOW,Rmpi,batchprocessing

•  Mul$‐core/mul$‐processorcompu$ng(heavyweightprocesses)–  SMP:SymmetricMul,‐Processing–  Independentmemoryinsharedspace–  Naturallyscalestomul,‐nodeprocessing–  Examples:mul,core(Windows/Unix),foreach

•  Mul$‐threadedprocessing(lightweightprocesses)–  Usuallysharedmemory–  Hardertoscaleoutacrossnetworks–  Examples:threadedlinear‐algebralibrariesforR(ATLAS,MKL);GPU

processors(CUDA/NVIDIA;ct/INTEL)

13

14

Mul,‐ThreadedProcessing

Whatisthreadedprogramming?

•  Athreadisakindofprocessthatsharesaddressspacewithitsparentprocess

•  Created,destroyed,managedandsynchronizedinCcode– POSIXthreads– OpenMPthreads

•  Fast,butdifficulttoprogram– Easytooverwritevariables– Needtoworryaboutsynchroniza,on

15

Exploi,ngthreadswithR

•  RlinkstoBLAS(BasicLinearAlgebraSubprograms)librariesforefficientvector/matrixopera,ons

•  Linux:NeedtocompileandlinkwiththreadedBLAS(ATLAS)

•  Windows/Mac:REvolu,onRlinkedtoIntelMKLlibraries,usesasmanythreadsascores– Manyhigher‐levelopera,onsop,mizedaswell

•  MacOS:CRANbinaryusesveclibBLAS–  threaded,pre7ygoodperformance

16

REvolution R SVD Performance

Exampledatamatrix150,000x500fast.svd

Quad‐coreIntelCore2CPU,WindowsVista64‐bitWorkstation

Revolu,onRPerformance

Mul,‐ThreadedProcessing

17

18

GPUProgramming

WhatisaGPU?

•  Dedicatedprocessingchip(orcard)dedicatedtofastfloa,ng‐pointopera,ons– Originallyfor3‐Dgraphicscalcula,ons

•  Highlyparallel:100’sofprocessorsonasinglechip,capableofrunning1000’softhreads

•  Usuallyincludesdedicatedhigh‐speedRAM,accessibleonlybyGPU– Needtotransferdatain/out

•  ProgrammeddirectlyusingcustomCdialect/compilers

•  >90%ofnewdesktops/laptopshaveanintegratedGPU

19

GeForce8800GT

•  LaunchedOct29,2007•  512Mbof256‐bitmemory•  128processors•  512simultaneousthreads•  <$200

•  DownloadNVIDIACUDATools:– h7p://www.nvidia.com/object/cuda_home.html

•  Tutorial– h7p://www.ddj.com/architect/207200659

20

PerformanceComparison

Method Time(seconds)

BaseRconvolvefunc,on 9.89

AMDACML 6.29

FFTW(8threads) 3.75

CUDAonGeForce8800GT 1.88(singleprecision)

21

  Convolve2vectorsoflength2^22  60Mbofdata

  Quaddual‐coreprocessor/GPU

22

IntroducingIterators

ThoughtExperiment:DrawingCards

•  Youaretheteacherof10grade‐schoolpupils.•  Classproject:draweachofthe52playingcardsasaposter.

•  Eachchildhassuppliesofposterpaperandcrayons,butrequiresareferencecardtocopy.

•  Howtoorganizethepupils?

23

Iterators

> require(iterators) •  Generalizedloopvariable•  Valueneednotbeatomic

– Rowofamatrix– Randomdataset– Chunkofadatafile– Recordfromadatabase

•  Createwith:iter •  Getvalueswith:nextElem •  Usedasindexingargumentwithforeach

Iteratorsarememoryfriendly

•  Allowdatatobesplitintomanageablepiecesonthefly

•  Helpsalleviateproblemswithprocessinglargedatastructures

•  Piecescanbeprocessedinparallel

Iteratorsactasadaptors

•  Allowsyourdatatobeprocessedbyforeachwithoutbeingconverted

•  Caniterateovermatricesanddataframesbyroworbycolumn:

it <- iter(Boston, by="row") nextElem(it)

NumericIterator

> i <- iter(1:3) > nextElem(i) [1] 1

> nextElem(i) [1] 2 > nextElem(i) [1] 3 > nextElem(i) Error: StopIteration

Longsequences

> i <- icount(1e9) > nextElem(i) [1] 1 > nextElem(i) [1] 2 > nextElem(i) [1] 3 > nextElem(i) [1] 4 > nextElem(i) [1] 5

Matrixdimensions

> M <- matrix(1:25,ncol=5) > r <- iter(M,by="row") > nextElem(r) [,1] [,2] [,3] [,4] [,5] [1,] 1 6 11 16 21 > nextElem(r) [,1] [,2] [,3] [,4] [,5] [1,] 2 7 12 17 22 > nextElem(r) [,1] [,2] [,3] [,4] [,5] [1,] 3 8 13 18 23

DataFile

> rec <- iread.table("MSFT.csv",sep=",", header=T, row.names=NULL) > nextElem(rec) MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume MSFT.Adjusted 1 29.91 30.25 29.4 29.86 76935100 28.73 > nextElem(rec) MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume MSFT.Adjusted 1 29.7 29.97 29.44 29.81 45774500 28.68 > nextElem(rec) MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume MSFT.Adjusted 1 29.63 29.75 29.45 29.64 44607200 28.52 > nextElem(rec) MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume MSFT.Adjusted 1 29.65 30.1 29.53 29.93 50220200 28.8

30

Database

> library(RSQLite) > m <- dbDriver('SQLite') > con <- dbConnect(m, dbname="arrests") > it <- iquery(con, 'select * from USArrests', n=10) > nextElem(it) Murder Assault UrbanPop Rape Alabama 13.2 236 58 21.2 Alaska 10.0 263 48 44.5 Arizona 8.1 294 80 31.0 Arkansas 8.8 190 50 19.5 California 9.0 276 91 40.6 Colorado 7.9 204 78 38.7 Connecticut 3.3 110 77 11.1 Delaware 5.9 238 72 15.8 Florida 15.4 335 80 31.9 Georgia 17.4 211 60 25.8

31

Infinite&Irregularsequences

iprime <- function() { lastPrime <- 1 nextEl <- function() { lastPrime <<- as.numeric(nextprime(lastPrime)) lastPrime } it <- list(nextElem=nextEl) class(it) <- c('abstractiter','iter') it}

> require(gmp) > p <- iprime() > nextElem(p) [1] 2 > nextElem(p) [1] 3

33

Loopingwithforeach

Loopingwithforeach

foreach (var=iterator) %dopar% { statements }

  Evaluatestatementsuntiliteratorterminates  statementswillreferencevariablevar   Valuesof{ … }blockcollectedintoalist

  Runssequentially(bydefault)(orforcewith%do% )

34

> foreach (j=1:4) %dopar% sqrt (j)

[[1]] [1] 1

[[2]] [1] 1.414214

[[3]] [1] 1.732051

[[4]] [1] 2

Warning message: executing %dopar% sequentially: no parallel backend registered

35

CombiningResults> foreach(j=1:4, .combine=c) %dopar% sqrt(j) [1] 1.000000 1.414214 1.732051 2.000000

> foreach(j=1:4, .combine='+’, .inorder=FALSE) %dopar% sqrt(j)

[1] 6.146264

  Whenorderofevaluationisunimportant,use.inorder=FALSE

36

Referencingglobalvariables> z <- 2 > f <- function (x) sqrt (x + z)

> foreach (j=1:4, .combine='+') %dopar% f(j)

[1] 8.417609

  foreachautomaticallyinspectscodeandensuresunboundobjectsarepropagatedtotheevaluationenvironment

37

Nestedforeachexecu,on

•  foreach opera,onscanbenestedusing%:%operator

•  Allowsparallelexecu,onacrossmul,pleloopslevels,“unrolling”theinnerloops

foreach(i=1:3, .combine=cbind) %:% foreach(j=1:3, .combine=c) %dopar% (i + j)

39

Speedingupcodewithforeach

SMPProcessing

Quickreviewofparallelanddistributedcompu,nginR

•  NetWorkSpaces(packagenws;SMP,distributed)–  GPL,alsocommerciallysupportedbyREvolu,onCompu,ng–  Verycross‐pla�orm,distributedshared‐memoryparadigm–  Fault‐tolerant

•  Mul,Core(packagemulticore;SMPonly)–  Linux/MacOS(requiresPOSIX)–  UsesforktocreatenewRprocesses

•  Rmpi(packageRmpi;SMP,distributed)–  Fine‐grainedcontrolallowsveryhigh‐performancecalcula,ons–  Canbetrickytoconfigure–  LimitedWindowsandheterogeneousclustersupport

•  SNOW(packagesnow;SMP,distributed*)–  LimitedWindowssupport(*singlemachineonly)–  Meta‐package:supportsMPI,sockets,NWS,PVM

40

Parallelbackendsforforeach

•  %dopar%behaviourdependsoncurrent“registered”parallelbackend

•  Modularparallelbackends

•  registerDoSEQ(default)•  registerDoNWS(NetWorkSpaces)•  registerDoMC(mul,core,MacOS/Windows)

•  FromTerminal/ESSonly!(R.appGUIwillcrash.)•  registerDoSNOW•  registerDoRMPI

41

GemngStarted:Mul,‐coreProcessing

•  R2.10.x–waitun,lofficialrelease•  R2.9.x

require(doMC)

registerDoMC(cores=2)

•  REvolu,onREnterpriserequire(doNWS)

s <- sleigh(workerCount=2) registerDoNWS()

42

Asimplesimulation:

birthday <- function(n) { ntests <- 1000 pop <- 1:365 anydup <- function(i) any(duplicated( sample(pop, n, replace=TRUE)))

sum(sapply(seq(ntests), anydup)) / ntests }

x <- foreach (j=1:100) %dopar% birthday (j)

43

BirthdayExample‐,mings

Backend Time(s)registerDoSEQ() 41sregisterDoMC()#2cores 28sregisterDoNWS()#2workers 26s(*)

44

Dual‐core2.4GHzIntelMacBook:

system.time{ x <- foreach (j=1:100) %dopar% birthday (j) } # Elapsed

BirthdaySimula,on:Mul,core/NWS

> x <- foreach (j=1:100) %dopar% birthday (j) > plot(1:100, unlist(x),type="l")

46

UsingclusterswithNetworkSpaces

SemngUpaCluster

1.  Iden,fymachinestoformnodesoncluster–  EasiestwithLinux/MacOS

–  PossiblewithWindows

2.  Selectaservermachine–  OKforthisonetobeonWindows

3.  Makesurepasswordlesssshenabledoneachworkernode

–  ssh nodename Revo --versionshouldwork

47

Semngupacluster,part2

4.  Logintoserver,startREvolu,onR5.  Createasleigh

require(doNWS) s <- sleigh(nodeList=c( rep("localhost",2), rep("thor",8), rep("loki",4)), launch=sshcmd) registerDoNWS(s)

6.  Useforeachasbefore7.  (op,onal)usejoinSleightoaddnewnodes

48

ParallelRandomForest

# a simple parallel random forest

library(randomForest) x <- matrix(runif(500), 100)

y <- gl(2, 50) wc <- 2

n <- ceiling(1000 / wc)

registerDoNWS(s) foreach(ntree=rep(n, wc), .combine=combine,

.packages='randomForest') %dopar% randomForest(x, y, ntree=ntree)

•  Easier: randomShrubberyNWS()

49

Conver,ngexis,ngcode

•  Converttheseloopstoforeach:– for:makebodyreturnitera,onvalueand.combine

– apply:useiter(X, by="row”)and.combine •  Oriapply(X,1)

– lapply:useiter(mylist)

50

ppiStatsExample

•  Sequen,al:bpMats1 <- lapply(bpList, function(x) { bpMatrix(x, symMat = TRUE, homodimer = FALSE, baitAsPrey = FALSE, unWeighted = FALSE, onlyRecip = FALSE, baitsOnly = FALSE) })

•  Parallel:bpMats1 <- foreach(x=iter(bpList), .packages = "ppiStats") %dopar% { bpMatrix(x, sysMat = TRUE, homodimer = FALSE, baitAsPrey = FALSE, unWeighted = FALSE, onlyRecip = FALSE, baitsOnly = FALSE) }

51

ppiStatsExample

•  Sequen,al:bpGraphs <- lapply(bpMats1, function(x) {

genBPGraph(x, directed = TRUE, bp = FALSE)

})

•  Parallel:bpGraph <- foreach(x=iter(bpMat1),

.packages = "ppiStats") %dopar% {

genBPGraph(x, directed = TRUE, bp = FALSE) }

52

Excercise

•  Findother“embarassinglyparallel”BioConductorexamples,andconverttoparallelwithforeach.

53

54

Mul,‐StratumParallelism

foreach(iterator)%dopar%{tasks}

foreach…

task task

foreach…

task task

CLUSTER

SMP

Anexampleofexplicitmulti‐stratum||ism

55

require ("doNWS") require ("foreach") require ("doMC")

s <- sleigh(nodelist=c(rep("localhost",2), rep("bladeserver",8)) registerDoNWS(s)

foreach (iterator_i, .packages=c("foreach", "doMC"))%dopar% { registerDoMC() foreach (iteratorj_) %dopar% { tasks… } }

Mul,‐stratumtemplate:NWS/Mul,core

56

Pi�allstoavoid

•  Sequen,alvsParallelProgramming•  RandomNumberGenera,on

–  library(sprngNWS) –  sleigh(workerCount=8, rngType=‘sprngLFG’)

•  Nodefailure•  CosmicRays

57

Conclusions

•  Parallelcompu,ngiseasy!•  Writeloopswithforeach/%dopar%

– Worksfineinasingle‐processorenvironment

– Third‐partyuserscanregisterbackendsformul,processororclusterprocessing

– Speedbenefitswithoutmodifyingcode

•  Easyperformancegainsonmodernlaptops/desktops

•  Expandtoclustersformeatyjobs

58

ThankYou!

•  DavidSmith–  david@revolu,on‐compu,ng.com,@revodavid

•  REvolu,onCompu,ng– www.revolu,on‐compu,ng.com

•  Revolu+ons,theRblog– blog.revolu,on‐compu,ng.com

•  Downloads:– Slides:h7p://,nyurl.com/R‐Bioc‐slides– Script:h7p://,nyurl.com/R‐Bioc‐script

top related