Source: TDDA69/lectures/2015/09_concurrent.pdf
TDDA69 Data and Program Structure
Parallel and Distributed Computing
Cyrille Berger

List of lectures
1. Introduction and Functional Programming
2. Imperative Programming and Data Structures
3. Environment
4. Evaluation
5. Object Oriented Programming
6. Macros and decorators
7. Virtual Machines and Bytecode
8. Garbage Collection and Native Code
9. Parallel and Distributed Computing
10. Logic Programming
11. Summary

Lecture goal
- Learn about the concepts and the challenges of distributed computing
- The impact of distributed programming on programming languages and implementations

Lecture content
- Parallel Programming
  - Multithreaded Programming
    - The States Problems and Solutions
    - Atomic actions
    - Language and Interpreter Design Considerations
  - Single Instruction, Multiple Threads Programming
- Distributed programming
  - Message Passing
  - MapReduce

Concurrent computing
- In concurrent computing, several computations are executed at the same time
- In parallel computing, all computation units have access to shared memory (for instance in a single process)
- In distributed computing, computation units communicate through message passing

Benefits of concurrent computing
- Faster
- Responsiveness
  - Interactive applications can perform two tasks at the same time: rendering, spell checking...
- Availability of services
  - Load balancing between servers
- Controllability
  - Tasks requiring certain preconditions can suspend and wait until the preconditions hold, then resume execution transparently.

Disadvantages of concurrent computing
- Concurrency is hard to implement properly
- Safety
  - Easy to corrupt [...]
- Deadlock
  - Tasks can wait indefinitely for each other
- Non-[...]
- Not always faster!
  - The memory bandwidth and CPU cache is [...]

Concurrent computing programming
Four basic approaches to computing:
- Sequential programming: no concurrency
- Declarative concurrency: streams in a functional language
- Message passing: with active objects, used in distributed computing
- Atomic actions: on a shared memory, used in parallel computing

Stream Programming in Functional Programming
- No global state
- Functions only act on their input; they are reentrant
- Functions can then be executed in parallel
  - As long as they do not depend on the output of another function
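
As a minimal sketch of this idea (the function `square` and the input list are illustrative assumptions, not from the slides), a pure function can be applied to independent inputs in parallel and gives the same results as sequential evaluation:

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    # Pure function: the result depends only on the argument,
    # and nothing outside the function is read or modified.
    return x * x


inputs = [1, 2, 3, 4, 5]

# Sequential evaluation.
sequential = [square(x) for x in inputs]

# Parallel evaluation: each call is independent, so the calls may run
# in any order; map() still returns the results in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(square, inputs))

print(sequential == parallel)
```

If `square` depended on the output of another call, the calls could no longer be scheduled independently.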

Parallel Programming
In parallel computing, several computations are executed at the same time and have access to shared memory
[Figure: three computation units sharing one memory]

SIMD, SIMT, SMT (1/2)
- SIMD: Single Instruction, Multiple Data
  - Elements of a short vector (4 to 8 elements) are processed in parallel
- SIMT: Single Instruction, Multiple Threads
  - The same instruction is executed by multiple threads (from 128 to 3048 or more in the future)
- SMT: Simultaneous Multithreading
  - General purpose, different instructions are executed by different threads

SIMD, SIMT, SMT (2/2)
SIMD:
    PUSH [1,2,3,4]
    PUSH [4,5,6,7]
    VEC_ADD_4
SIMT:
    execute([1,2,3,4], [4,5,6,7], lambda a, b, ti: a[ti] = a[ti] + max(b[ti], 5))
SMT:
    a = [1,2,3,4]
    b = [4,5,6,7]
    ...
    Thread.new(lambda: a = a + b)
    Thread.new(lambda: c = c * b)

Why the need for the different models?
- Flexibility: SMT > SIMT > SIMD
  - Less flexibility gives higher performance
  - Unless the lack of flexibility prevents accomplishing the task
- Performance: SIMD > SIMT > SMT

Multithreaded Programming

Single threaded vs Multithreaded

Multithreaded Programming Model
- Start with a single root thread
- Fork: to create concurrently executing threads
- Join: to synchronize threads
- Threads communicate through shared memory
- Threads execute asynchronously
- They may or may not execute on different processors
[Figure: main forks into sub0 ... subn, which join back into main; main forks into sub0 ... subn again, which join back into main]

A multithreaded example
    thread1 = new Thread(function() { /* do some computation */ });
    thread2 = new Thread(function() { /* do some computation */ });
    thread1.start();
    thread2.start();
    thread1.join();
    thread2.join();
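
The same fork/join pattern can be sketched with Python's `threading` module. The `compute` function and the `results` list are illustrative assumptions; the slide leaves the computation unspecified.

```python
import threading

results = [None, None]


def compute(index, n):
    # Each thread writes to its own slot, so no lock is needed here.
    results[index] = sum(range(n))


# Fork: create and start two threads from the root thread.
thread1 = threading.Thread(target=compute, args=(0, 1000))
thread2 = threading.Thread(target=compute, args=(1, 2000))
thread1.start()
thread2.start()

# Join: the root thread waits until both threads have finished.
thread1.join()
thread2.join()

print(results)
```

Reading `results` is only safe after both `join()` calls have returned.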

The States Problems and Solutions

Global States and multi-threading
Example:
    var a = 0;
    thread1 = new Thread(function() { a = a + 1; });
    thread2 = new Thread(function() { a = a + 1; });
    thread1.start();
    thread2.start();
What is the value of a? (It can be 1 or 2, depending on how the threads interleave.)
This is called a race condition
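
Real races are timing dependent and hard to reproduce, so the sketch below replays two possible interleavings by hand instead of using real threads. It assumes that `a = a + 1` splits into a read step and a write step; the `run` helper and the schedule format are illustrative, not from the slides.

```python
def run(schedule):
    """Replay a = a + 1 for two simulated threads in the given step order."""
    a = 0
    local = {}  # value each simulated thread has read
    for thread, step in schedule:
        if step == "read":
            local[thread] = a
        else:  # "write"
            a = local[thread] + 1
    return a


# Thread 1 finishes before thread 2 starts: both increments are kept.
serial = run([(1, "read"), (1, "write"), (2, "read"), (2, "write")])

# Both threads read the old value before either writes: one update is lost.
interleaved = run([(1, "read"), (2, "read"), (1, "write"), (2, "write")])

print(serial, interleaved)
```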

Atomic actions

Mutex
- Mutex is short for mutual exclusion
- It is a technique to prevent two threads from accessing a shared resource at the same time
Example:
    var a = 0;
    var m = new Mutex();
    thread1 = new Thread(function() { m.lock(); a = a + 1; m.unlock(); });
    thread2 = new Thread(function() { m.lock(); a = a + 1; m.unlock(); });
    thread1.start();
    thread2.start();
Now a = 2
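
A minimal Python sketch of the mutex example, using `threading.Lock` as the mutex. The loop count and the `state` dictionary are assumptions chosen so that a lost update would be likely without the lock.

```python
import threading

state = {"a": 0}
m = threading.Lock()


def increment(times):
    for _ in range(times):
        # The with statement acquires the lock and always releases it,
        # so the read-modify-write below cannot interleave.
        with m:
            state["a"] = state["a"] + 1


threads = [threading.Thread(target=increment, args=(10000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(state["a"])
```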

Dependency
Example:
    var a = 1;
    var m = new Mutex();
    thread1 = new Thread(function() { m.lock(); a = a + 1; m.unlock(); });
    thread2 = new Thread(function() { m.lock(); a = a * 3; m.unlock(); });
    thread1.start();
    thread2.start();
What is the value of a? 4 or 6?

Condition variable
A condition variable is a set of threads waiting for a certain condition
Example:
    var a = 1;
    var m = new Mutex();
    var cv = new ConditionVariable();
    thread1 = new Thread(function() { m.lock(); a = a + 1; cv.notify(); m.unlock(); });
    thread2 = new Thread(function() { cv.wait(); m.lock(); a = a * 3; m.unlock(); });
    thread1.start();
    thread2.start();
Result: a = 6
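
The slide's simplified example assumes that thread2 is already waiting when thread1 notifies. The sketch below, using Python's `threading.Condition`, adds an explicit flag (`incremented`, an assumption not present in the slide) so the ordering holds whichever thread runs first.

```python
import threading

state = {"a": 1, "incremented": False}
cv = threading.Condition()  # a condition variable with its own lock


def add_one():
    with cv:
        state["a"] = state["a"] + 1
        state["incremented"] = True
        cv.notify()


def times_three():
    with cv:
        # wait_for() re-checks the predicate, so it also works when
        # add_one() has already run before this thread starts waiting.
        cv.wait_for(lambda: state["incremented"])
        state["a"] = state["a"] * 3


thread1 = threading.Thread(target=add_one)
thread2 = threading.Thread(target=times_three)
thread2.start()
thread1.start()
thread1.join()
thread2.join()

print(state["a"])
```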

Deadlock
What might happen:
    var a = 0; var b = 2;
    var ma = new Mutex(); var mb = new Mutex();
    thread1 = new Thread(function() { ma.lock(); mb.lock(); b = b - 1; a = a - 1; ma.unlock(); mb.unlock(); });
    thread2 = new Thread(function() { mb.lock(); ma.lock(); b = b - 1; a = a + b; mb.unlock(); ma.unlock(); });
    thread1.start();
    thread2.start();
thread1 waits for mb, thread2 waits for ma
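
One common way to avoid this deadlock, which the slides do not show, is to make every thread acquire the locks in the same order. A minimal sketch, with `threading.Lock` standing in for the mutexes:

```python
import threading

state = {"a": 0, "b": 2}
ma = threading.Lock()
mb = threading.Lock()


def task1():
    # Both tasks acquire ma first, then mb, so neither can hold one
    # lock while waiting for a thread that holds the other.
    with ma:
        with mb:
            state["b"] = state["b"] - 1
            state["a"] = state["a"] - 1


def task2():
    with ma:
        with mb:
            state["b"] = state["b"] - 1
            state["a"] = state["a"] + state["b"]


thread1 = threading.Thread(target=task1)
thread2 = threading.Thread(target=task2)
thread1.start()
thread2.start()
thread1.join()
thread2.join()

print(state)
```

The final value of `a` still depends on which task runs first, but the program always terminates.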

Advantages of atomic actions
- Very efficient
- Less overhead, faster than message passing

Disadvantages of atomic actions
- Blocking
  - Meaning some threads have to wait
- Small overhead
- Deadlock
- A low-priority thread can block a high-priority thread
- A common source of programming errors

Language and Interpreter Design Considerations

Common mistakes
- Forget to unlock a mutex
- Race condition
- Deadlocks
- Granularity issues: too much locking will kill the performance
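
Forget to unlock a mutex
Most programming languages have either:
- A guard object that will unlock a mutex upon destruction
- A synchronization statement
    import threading

    some_rlock = threading.RLock()
    # The lock is released automatically when the block exits.
    with some_rlock:
        print("some_rlock is locked while this executes")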

Race condition
Can we detect potential race conditions during compilation?
In the Rust programming language:
- Objects are owned by a specific thread
- Types can be marked with the Send trait to indicate that the object can be moved between threads
- Types can be marked with the Sync trait to indicate that the object can be accessed by multiple threads safely
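
Safe Shared Mutable State in Rust (1/3)
    let mut data = vec![1u32, 2, 3];
    for j in 0..2 {
        // `data` is moved into the first closure, so the second
        // iteration has nothing left to capture.
        thread::spawn(move || {
            for i in 0..2 { data[i] += 1; }
        });
    }
Gives an error: "capture of moved value: `data`"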

Safe Shared Mutable State in Rust (2/3)
    let data = Mutex::new(vec![1u32, 2, 3]);
    for j in 0..2 {
        // The guard returned by lock() is what gets moved into the closure.
        let mut data = data.lock().unwrap();
        thread::spawn(move || {
            for i in 0..2 { data[i] += 1; }
        });
    }
Gives an error: MutexGuard does not have the Send trait
Meaning we cannot move data to the thread
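
Safe Shared Mutable State in Rust (3/3)
    use std::sync::{Arc, Mutex};
    use std::thread;

    // The vector is wrapped in a Mutex so that lock() is available,
    // and in an Arc so that each thread can own a handle to it.
    let data = Arc::new(Mutex::new(vec![1u32, 2, 3]));
    for j in 0..2 {
        let data = data.clone();
        thread::spawn(move || {
            let mut data = data.lock().unwrap();
            for i in 0..2 { data[i] += 1; }
        });
    }
Arc has the Sync trait (here Arc<Mutex<...>> is both Send and Sync), so clones of it can be moved to the threads.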

Single Instruction, Multiple Threads Programming
With SIMT, the same instructions are executed by multiple threads on different registers

Single instruction, multiple flow paths (1/2)
- Using a masking system, it is possible to support if/else blocks
  - Threads always execute the instructions of both parts of the if/else block
    data = [-2, 0, 1, -1, 2], data2 = [...]
    function f(thread_id, data, data2) {
      if (data[thread_id] < 0) {
        data[thread_id] = data[thread_id] - data2[thread_id];
      } else if (data[thread_id] > 0) {
        data[thread_id] = data[thread_id] + data2[thread_id];
      }
    }
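
The masking idea can be simulated in plain Python: every "thread" steps through both branch bodies, and a per-thread mask decides whether the step has any effect. This is only a sequential model of what the hardware does, and the values in `data2` are assumptions since the slide elides them.

```python
data = [-2, 0, 1, -1, 2]
data2 = [10, 20, 30, 40, 50]  # assumed values; the slide elides them

n = len(data)

# Step 1: every thread evaluates both conditions, producing two masks.
# The else-if mask is only set where the if mask is not.
mask_if = [data[t] < 0 for t in range(n)]
mask_elif = [(not mask_if[t]) and data[t] > 0 for t in range(n)]

# Step 2: all threads step through the "if" body; masked-off threads idle.
for t in range(n):
    if mask_if[t]:
        data[t] = data[t] - data2[t]

# Step 3: all threads step through the "else if" body under the other mask.
for t in range(n):
    if mask_elif[t]:
        data[t] = data[t] + data2[t]

print(data)
```

The thread holding 0 is masked off in both steps, which illustrates the cost: it spends time in both bodies without doing useful work.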

Single instruction, multiple flow paths (2/2)
- Benefits:
  - Multiple flows are needed in many algorithms
- Drawbacks:
  - Only one flow path is executed at a time, non-running threads must wait
  - Randomized memory access
    - Elements of a vector are not accessed sequentially

Programming Language Design for SIMT
- OpenCL and CUDA are the most common
  - Very low level, C/C++ derivatives
- General purpose programming languages are not suitable
- Some work has been done to write in Python for CUDA
    @jit(argtypes=[float32[:], float32[:], float32[:]], target='gpu')
    def add_matrix(A, B, C):
        A[cuda.threadIdx.x] = B[cuda.threadIdx.x] + C[cuda.threadIdx.x]
  - With limitations on the standard functions that can be called

Distributed programming

Distributed Programming (1/4)
In distributed computing, several computations are executed at the same time and communicate through message passing
[Figure: three computation units, each with its own memory]

Distributed programming (2/4)
- A distributed computing application consists of multiple programs running on multiple computers that together coordinate to perform some task.
- Computation is performed in parallel by many computers.
- Information can be restricted to certain computers.
- Redundancy and geographic diversity improve reliability.

Distributed programming (3/4)
Characteristics of distributed computing:
- Computers are independent: they do not share memory.
- Coordination is enabled by messages passed across a network.

Distributed programming (4/4)
- Individual programs have differentiating roles.
- Distributed computing for large-scale data processing:
  - Databases respond to queries over a network.
  - Data sets can be partitioned across multiple machines.

Message Passing
- Messages are (usually) passed through sockets
- Messages are exchanged synchronously or asynchronously
- Communication can be centralized or peer-to-peer
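
A minimal sketch of synchronous message passing over sockets. It uses `socket.socketpair()` and a thread to stand in for two networked programs, and the `worker` function and the message contents are illustrative assumptions. A single `recv` is enough here only because the messages are very short.

```python
import socket
import threading

# socketpair() returns two already-connected sockets; here they stand in
# for the two ends of a network connection.
left, right = socket.socketpair()


def worker(conn):
    # Receive one request and send back a reply.
    request = conn.recv(1024)
    conn.sendall(b"echo: " + request)
    conn.close()


t = threading.Thread(target=worker, args=(right,))
t.start()

left.sendall(b"hello")
reply = left.recv(1024)  # blocks until the reply arrives (synchronous)
t.join()
left.close()

print(reply)
```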
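
Python's Global Interpreter Lock
- CPython can only interpret one single thread at a given time
- The lock is released when:
  - The current thread is blocking for I/O
  - Every 100 interpreter ticks (Python 2; since Python 3.2 the switch is time-based, every 5 ms by default)
- True multithreading (threads running Python code in parallel) is not possible with CPython's Global Interpreter Lock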

Python's Multiprocessing module
- The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads
- It implements transparent message passing, allowing Python objects to be exchanged between processes

Python's Message Passing (1/2)
Example of message passing
    from multiprocessing import Process

    def f(name):
        print('hello', name)

    if __name__ == '__main__':
        # Run f('bob') in a separate process and wait for it to finish.
        p = Process(target=f, args=('bob',))
        p.start()
        p.join()
Output
    hello bob

Python's Message Passing (2/2)
Example of message passing with pipes
    from multiprocessing import Process, Pipe

    def f(conn):
        # The child sends one Python object through its end of the pipe.
        conn.send([42, None, 'hello'])
        conn.close()

    if __name__ == '__main__':
        parent_conn, child_conn = Pipe()
        p = Process(target=f, args=(child_conn,))
        p.start()
        print(parent_conn.recv())
        p.join()
Output
    [42, None, 'hello']
Transparent message passing is possible thanks to serialization

Serialization
A serialized object is an object represented as a sequence of bytes that includes the object's data, its type and the types of data stored in the object.

pickle
- In Python, serialization is done with the pickle module
  - It can serialize user-defined classes
    - The class definition must be available before deserialization
  - Works with different versions of Python
  - By default, uses an ASCII [...]
- It can serialize:
  - Basic types: booleans, numbers, [...]
  - Containers: tuples, lists, sets and dictionaries (of picklable objects)
  - Top level functions and classes (only the [...]
  - Objects where __dict__ or __getstate__() are picklable
- Example: pickle.loads(pickle.dumps(10))
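
A slightly longer round trip than the one-liner above, as a sketch. The `message` value is the list sent through the pipe in the earlier example, and `protocol=0` selects pickle's older text-based format.

```python
import pickle

message = [42, None, "hello"]

payload = pickle.dumps(message)   # serialize to a bytes object
restored = pickle.loads(payload)  # rebuild an equal, but distinct, object

# Protocol 0 is the older text-based format.
text_payload = pickle.dumps(message, protocol=0)

print(type(payload).__name__, restored == message, restored is message)
```

The restored list compares equal to the original but is a new object, which is what happens when an object is sent to another process.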

Shared memory
Memory can be shared between Python processes with a Value or Array.
    from multiprocessing import Process, Value, Array

    def f(n, a):
        # Both objects live in shared memory, so the parent sees the changes.
        n.value = 3.1415927
        for i in range(len(a)):
            a[i] = -a[i]

    if __name__ == '__main__':
        num = Value('d', 0.0)
        arr = Array('i', range(10))
        p = Process(target=f, args=(num, arr))
        p.start()
        p.join()
        print(num.value)
        print(arr[:])
And of course, you would need to use a mutex to avoid race conditions

MapReduce

Big Data Processing (1/2)
MapReduce is a framework for batch processing of big data.
- Framework: a system used by programmers to build applications
- Batch processing: all the data is available at the outset, and results are not used until processing completes
- Big data: used to describe data sets so large and comprehensive that they can reveal facts about a whole population, usually from statistical analysis

Big Data Processing (2/2)
The MapReduce [...]
- Data sets are too big to be analyzed by one machine
- Using multiple machines has the same complications, regardless of the application/analysis
- Pure functions enable an abstraction barrier between data processing logic and coordinating a distributed application

MapReduce Evaluation Model (1/2)
Map phase: apply a mapper function to all inputs, emitting intermediate key-value pairs
- The mapper takes an iterable value containing inputs, such as lines of text
- The mapper yields zero or more key-value pairs for each input

MapReduce Evaluation Model (2/2)
Reduce phase: for each intermediate key, apply a reducer function to accumulate all values associated with that key
- The reducer takes an iterable value containing intermediate key-value pairs
- All pairs with the same key appear consecutively
- The reducer yields zero or more values, each associated with that intermediate key
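
The two phases can be sketched on a single machine with a word count. The names `mapper`, `reducer` and `run_local` are illustrative and not part of any MapReduce framework, and sorting stands in for the framework's shuffle step.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    # Yield zero or more (key, value) pairs for each input line.
    for line in lines:
        for word in line.split():
            yield (word, 1)


def reducer(pairs):
    # All pairs passed in share the same key; yield one accumulated value.
    pairs = list(pairs)
    key = pairs[0][0]
    yield (key, sum(value for _, value in pairs))


def run_local(lines):
    # Sorting stands in for the shuffle step: it makes pairs with the
    # same key appear consecutively, as the reducer expects.
    intermediate = sorted(mapper(lines), key=itemgetter(0))
    output = []
    for _, group in groupby(intermediate, key=itemgetter(0)):
        output.extend(reducer(group))
    return output


counts = run_local(["a b a", "b c"])
print(counts)
```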

MapReduce Execution Model (1/2)

MapReduce Execution Model (2/2)

MapReduce example
From a 1.1 billion people database (facebook?), we want to know the average number of friends per age
In SQL:
    SELECT age, AVG(friends) FROM users GROUP BY age
In MapReduce, the total set of users is split into different users_set
    function map(users_set) {
      for (user in users_set) {
        send(user.age, user.friends.size);
      }
    }
The keys are shuffled and assigned to reducers
    function reduce(age, friends) {
      var r = 0;
      for (friend in friends) {
        r += friend;
      }
      send(age, r / friends.size);
    }
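
A runnable single-machine sketch of the same computation in Python. The user records, the two-way split and the function names are made-up assumptions for illustration, and a dictionary of lists plays the role of the shuffle.

```python
from collections import defaultdict

# Hypothetical records: (age, number of friends).
users = [(20, 10), (20, 30), (35, 5), (35, 15), (35, 10)]


def map_users(users_set):
    # Emit one (age, friend_count) pair per user.
    for age, friend_count in users_set:
        yield (age, friend_count)


def reduce_age(age, friend_counts):
    # Average the friend counts collected for one age.
    return (age, sum(friend_counts) / len(friend_counts))


# Split the users into two sets, as if held by two machines.
splits = [users[:2], users[2:]]

# Shuffle: group the emitted values by key.
grouped = defaultdict(list)
for users_set in splits:
    for age, friend_count in map_users(users_set):
        grouped[age].append(friend_count)

averages = dict(reduce_age(age, counts) for age, counts in grouped.items())
print(averages)
```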

MapReduce Assumptions
Constraints on the mapper and reducer:
- The mapper must be equivalent to applying a deterministic pure function to each input independently
- The reducer must be equivalent to applying a deterministic pure function to the sequence of values for each key
Benefits of functional programming:
- When a program contains only pure functions, call expressions can be evaluated in any order, lazily, and in parallel
- Referential transparency: a call expression can be replaced by its value (or vice versa) without changing the program
In MapReduce, these functional programming ideas allow:
- Consistent results, however computation is [...]
- Re-computation and caching of results, as [...]

MapReduce Benefits
- Fault tolerance: a machine or hard drive might crash
  - The MapReduce framework automatically re-runs failed tasks
- Speed: some machines might be slow because they are overloaded
  - The framework can run multiple copies of a task and keep the result of the one that finishes first
- Network locality: data transfer is expensive
  - The framework tries to schedule map tasks on the machines that hold the data to be processed
- Monitoring: will my job finish before dinner?!?
  - The framework provides a web-based interface describing jobs
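
The "run multiple copies and keep the first result" idea under Speed can be sketched with `concurrent.futures`. This is an illustration on one machine, not how a MapReduce framework is implemented. The `task` function and the delays, which simulate a slow and a fast machine, are assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait


def task(delay):
    # The task is pure, so every copy returns the same value;
    # only the (simulated) machine speed differs.
    time.sleep(delay)
    return sum(range(100))


with ThreadPoolExecutor(max_workers=2) as pool:
    # Launch two copies of the same task on a "slow" and a "fast" worker.
    copies = [pool.submit(task, 0.5), pool.submit(task, 0.01)]
    done, pending = wait(copies, return_when=FIRST_COMPLETED)
    result = next(iter(done)).result()

print(result)
```

Because the task is a deterministic pure function, it does not matter which copy finishes first: the result is the same.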

Summary
- Parallel programming
- Multi-threading and how to help reduce programmer error
- Distributed programming and MapReduce