Loop Scheduling in OpenMP · 2017-11-24
Loop Scheduling in OpenMP
Vivek Kale
University of Southern California / Information Sciences Institute
Overview
• Loop Scheduling in OpenMP.
  • A primer of a loop construct.
  • Definitions for schedules for OpenMP loops.
• A proposal for user-defined loop schedule for OpenMP.
  • Need to allow for rapid development of novel loop scheduling strategies.
  • We suggest giving users of OpenMP applications control of the loop scheduling strategy to do so.
  • We call the scheme user-defined loop scheduling and propose the scheme as an addition to OpenMP.
OpenMP loops: A primer
• OpenMP provides a loop construct that specifies that the iterations of one or more associated loops will be executed in parallel by threads in the team in the context of their implicit tasks.1
    #pragma omp for [clause[ [,] clause] ... ]
    for (int i=0; i<100; i++) {}
• Loop needs to be in canonical form.
• The clause can be one or more of the following: private(…), firstprivate(…), lastprivate(…), linear(…), reduction(…), schedule(…), collapse(...), ordered[…], nowait, allocate(…)
• We focus on the clause schedule(…) in this talk.
1: OpenMP Technical Report 6. November 2017. http://www.openmp.org/press-release/openmp-tr6/
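A minimal sketch of the loop construct with a schedule clause is given here. The function name scale and the choice of schedule(static) are illustrative assumptions, not taken from the talk. If the code is compiled without OpenMP support, the pragma is ignored and the loop runs serially with the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Multiply each element of x by alpha. The iterations are independent,
   so they can be divided among the threads of the team. */
void scale(double *x, int n, double alpha)
{
    assert(x != NULL && n >= 0);
    /* Canonical-form loop: integer index, loop-invariant bound, unit stride. */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++) {
        x[i] *= alpha;
    }
}
```

The combined parallel for construct is used so that the sketch does not depend on an enclosing parallel region.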
A Schedule for an OpenMP loop
    #pragma omp parallel for schedule([modifier [, modifier]:]kind[, chunk_size])
• A schedule in OpenMP is:
  • a specification of how iterations of associated loops are divided into contiguous non-empty subsets
    • We call each of the contiguous non-empty subsets a chunk
  • and how these chunks are distributed to threads of the team.1
• The size of a chunk, denoted as chunk_size, must be a positive integer.
1: OpenMP Technical Report 6. November 2017. http://www.openmp.org/press-release/openmp-tr6/
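To make the notion of chunks concrete, the sketch below computes how schedule(static, chunk_size) maps iterations to threads: chunks are handed out round-robin in thread-number order, and the last chunk may be smaller than chunk_size. The helper names static_owner and num_chunks are hypothetical and only serve the illustration.

```c
#include <assert.h>

/* Thread that executes logical iteration i under schedule(static, chunk_size):
   chunk number (i / chunk_size) is assigned round-robin by thread number. */
int static_owner(int i, int chunk_size, int num_threads)
{
    assert(i >= 0 && chunk_size > 0 && num_threads > 0);
    return (i / chunk_size) % num_threads;
}

/* Number of chunks that n iterations are divided into (the last may be short). */
int num_chunks(int n, int chunk_size)
{
    assert(n >= 0 && chunk_size > 0);
    return (n + chunk_size - 1) / chunk_size;
}
```

For example, with chunk_size 4 and 3 threads, iterations 0 to 3 go to thread 0, iterations 4 to 7 to thread 1, iterations 8 to 11 to thread 2, and iteration 12 wraps around to thread 0.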
The Kind of a Schedule
• A schedule kind is passed to an OpenMP loop’s schedule clause:
  • it provides a hint for how iterations of the corresponding OpenMP loop should be assigned to threads in the team of the OpenMP region surrounding the loop.
• Five kinds of schedules for an OpenMP loop1:
  • static
  • dynamic
  • guided
  • auto
  • runtime
• The OpenMP implementation and/or runtime defines how to assign chunks to threads of a team, given the kind of schedule specified as a hint.
1: OpenMP Technical Report 6. November 2017. http://www.openmp.org/press-release/openmp-tr6/
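The idea behind the dynamic kind can be sketched by hand: threads repeatedly claim the next unassigned chunk from a shared counter. This is only an illustration of the concept, not how any particular OpenMP runtime implements schedule(dynamic); the function name fill_squares_dynamic is hypothetical, and the sketch assumes n plus the total claimed overshoot fits in an int.

```c
#include <assert.h>

/* Store i*i in out[i] for i in [0, n), handing out chunks of chunk_size
   iterations on demand from a shared counter. */
void fill_squares_dynamic(long *out, int n, int chunk_size)
{
    assert(out != 0 && n >= 0 && chunk_size > 0);
    int next = 0;                      /* first unassigned iteration (shared) */
    #pragma omp parallel
    {
        for (;;) {
            int start;
            /* Atomically read the counter and advance it by one chunk. */
            #pragma omp atomic capture
            { start = next; next += chunk_size; }
            if (start >= n) break;     /* no work left for this thread */
            int end = (n - start < chunk_size) ? n : start + chunk_size;
            for (int i = start; i < end; i++)
                out[i] = (long)i * i;
        }
    }
}
```

Each iteration is claimed by exactly one thread, so the result does not depend on the number of threads; compiled without OpenMP, a single thread claims every chunk in order.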
Modifiers of the Clause Schedule
• simd: the chunk_size is rounded up to a multiple of the simd width.1
• monotonic: If a thread executed iteration i, then the thread must execute iterations larger than i subsequently.1
• nonmonotonic: Execution order not subject to the monotonic restriction.1
1: OpenMP Technical Report 6. November 2017. http://www.openmp.org/press-release/openmp-tr6/
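The effect of the simd modifier on the chunk size can be written as a one-line rounding rule, new_chunk_size = ceil(chunk_size / simd_width) * simd_width. The sketch below assumes simd_width is known; in practice it is implementation defined, and the helper name simd_chunk_size is hypothetical.

```c
#include <assert.h>

/* Chunk size after applying the simd modifier: chunk_size rounded up
   to the next multiple of simd_width. */
int simd_chunk_size(int chunk_size, int simd_width)
{
    assert(chunk_size > 0 && simd_width > 0);
    return ((chunk_size + simd_width - 1) / simd_width) * simd_width;
}
```

In source code the modifiers precede the kind, for example schedule(simd:static, 5) or schedule(nonmonotonic:dynamic, 4).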
Need Novel Loop Scheduling Schemes in OpenMP
• Supercomputer architectures and applications are changing.
  • Large number of cores per node.
  • Speed variability across cores.
  • Complex dynamic behavior in applications themselves.
• So, we need new methods of distributing an application’s computational work to a node’s cores1, specifically to schedule an application’s parallelized loop’s iterations to cores.
• Such methods need to
  • Ensure data locality and reduce synchronization overhead while maintaining load balance2.
  • Be aware of inter-node parallelism handled by libraries such as MPICH3.
  • Adapt during an application’s execution.
1: R. D. Blumofe and C. E. Leiserson. Scheduling Multithreaded Computations by Work Stealing. Journal of the ACM 46(5):720–748, 1999.
2: S. Donfack, L. Grigori, W. D. Gropp, and V. Kale. Hybrid Static/Dynamic Scheduling for Already Optimized Dense Matrix Factorizations. In IEEE International Parallel and Distributed Processing Symposium, IPDPS 2012, Shanghai, China, 2012.
3: E. Lusk, N. Doss, and A. Skjellum. A High-performance, Portable Implementation of the Message Passing Interface Standard. Parallel Computing, 22:789–828, 1996.
Utility of Novel Strategies Shown
• The utility of novel strategies is demonstrated in published work by V. Kale et al.1,2 and others.
• For example, a mixed static/dynamic scheduling strategy with an adjustable static fraction.
  • Motivation: to limit the overhead of dynamic scheduling, while handling imbalances, such as those due to noise.
1: S. Donfack, L. Grigori, W. D. Gropp, and V. Kale. Hybrid Static/Dynamic Scheduling to Improve Performance of Already Optimized Dense Matrix Factorizations, IPDPS 2012.
2: V. Kale, S. Donfack, L. Grigori, and W. D. Gropp. Lightweight Scheduling for Balancing the Tradeoff Between Load Balance and Locality. 2014.
Figure: CALU using static scheduling (top) and fd = 0.1 (bottom) with 2-level block layout, run on an AMD Opteron 16-core node.
Figure: Diagram of static (top) and mixed static/dynamic scheduling (bottom), where fd is the dynamic fraction.
[Diagram not reproduced: "Scheduling CALU’s Task Dependency Graph". Static scheduling: good locality of data (+), ignores OS jitter (-). Axes: threads 1 to 4 against time. Labels in the diagram: MPI, slack, noise, amplification, Tp, (1 − fd) · Tp. White: idle time.]
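The split implied by the dynamic fraction fd can be sketched as follows: the first (1 − fd) · n iterations are divided statically among the threads, and the remaining iterations are left to be scheduled dynamically. The even block split and the names Range, static_count and static_range are assumptions made for the illustration; the cited papers may partition the work differently.

```c
#include <assert.h>

typedef struct { int start, end; } Range;   /* half-open interval [start, end) */

/* Number of iterations scheduled statically when a fraction fd is dynamic
   (truncated toward zero). */
int static_count(int n, double fd)
{
    assert(n >= 0 && fd >= 0.0 && fd <= 1.0);
    return (int)((1.0 - fd) * n);
}

/* Static range of thread tid: an even block split of the static iterations. */
Range static_range(int n, double fd, int tid, int num_threads)
{
    assert(num_threads > 0 && tid >= 0 && tid < num_threads);
    int ns = static_count(n, fd);
    Range r;
    r.start = (int)((long long)ns * tid / num_threads);
    r.end   = (int)((long long)ns * (tid + 1) / num_threads);
    return r;
}
```

With n = 100, fd = 0.25 and 4 threads, iterations 0 to 74 are static (thread 0 gets 0 to 17, thread 3 gets 56 to 74) and iterations 75 to 99 remain for dynamic scheduling.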
Need to Support a User-defined Schedule in OpenMP
• The practice of burying all of the scheduling strategy inside an OpenMP implementation, with little visibility in application code beyond the use of keywords such as ’dynamic’ or ’guided’, isn’t adequate for this purpose.
• OpenMP implementations, e.g., GCC’s libgomp1 and LLVM’s OpenMP library2, are difficult to change, which can hinder development of loop scheduling strategies at a rapid pace.
1. Documentation for GCC’s OpenMP library. https://gcc.gnu.org/onlinedocs/libgomp/ .
2. Documentation for LLVM’s OpenMP library. http://openmp.llvm.org/Reference.pdf/ .
Reasons for User-defined Schedules
• Flexibility.
  • Given the variety of OpenMP implementations, having a standardized way of defining a user-level strategy provides flexibility to implement scheduling strategies for OpenMP programs easily and effectively.
• Emergence of Threaded Runtime Systems.
  • Emergence of threaded libraries such as Argobots1 and QuickThreads2 argues in favor of a flexible specification of scheduling strategies also.
• Note that keywords auto and runtime aren’t adequate.
  • Specifying auto or runtime schedules isn’t sufficient because they don’t allow for user-level scheduling.
1. S. Seo, A. Amer, P. Balaji, C. Bordage, G. Bosilca, A. Brooks, A. Castello, D. Genet, T. Herault, P. Jindal, L. Kale, S. Krishnamoorthy, J. Lifflander, H. Lu, E. Meneses, M. Snir, Y. Sun, and P. H. Beckman. Argobots: A lightweight threading tasking framework. 2016.
2. D. Keppel. Tools and techniques for building fast portable threads packages. Technical Report UWCSE 93-05-06, University of Washington Department of Computer Science and Engineering, May 1993.
Scheduling Code of libgomp
• The scheduling code in libgomp, for example, supports the static, dynamic, and guided schedules naturally.
• However, its code structure can’t accommodate the number and sophistication of the strategies that we would like to explore.
→ Adding a user-defined schedule into OpenMP libraries:
  • is possible with effort for a given library.
  • must be done differently for each library.
Specification of User-defined Scheduling Scheme
• We aim to specify a user-defined scheduling scheme within the OpenMP standard1.
• The scheme should accommodate an arbitrary user-defined scheduler.
• These are the elements required to define a scheduler:
  • Scheduler-specific data structures.
  • History record.
  • Specification of scheduling behavior of threads.
1. OpenMP Application Programming Interface. November 2015.
Potential Scheduling Data Structures
• The strategies should be allowed to use a subset or combination of:
  • Shared data structures.
  • Low-overhead steal-queues.
  • Exclusive queues meant for each thread.
  • Shared queues from which multiple threads can dequeue work, where each work item represents a chunk of loop iterations.
History Tracking from Prior Invocations
1. To facilitate the ability to learn from recent history, e.g., values of slack in MPI communication1,2 from previous outer iterations, the scheduling scheme should allow for passing a call-site specific history-tracking object3 to the scheduler.
2. Examples of history information, i.e., attributes to track via history objects:
  • Previous values of dynamic fraction.
  • Iteration-to-core affinities.
  • Runtime performance profiles.
1. A. Faraj, P. Patarasuk, and X. Yuan. A Study of Process Arrival Patterns for MPI Collective Operations. In Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, SC ’06, Tampa, FL, USA, 2006. ACM.
2. B. Rountree, D. K. Lowenthal, B. R. de Supinski, M. Schulz, V. W. Freeh, and T. Bletsch. Adagio: Making DVS Practical for Complex HPC Applications. In Proceedings of the 23rd International Conference on Supercomputing, ICS ’09, pages 460–469, Yorktown Heights, NY, USA. 2009. ACM.
3. V. Kale, T. Gamblin, T. Hoefler, B. R. de Supinski, and W. D. Gropp. Slack-conscious Lightweight Loop Scheduling for Scaling Past the Noise Amplification Problem. Poster. SC 2012.
Example: Library to Support Staggered Scheduling
• In earlier work1, a loop scheduling library that supports a “staggered” scheduling strategy was implemented.
• It was implemented within an OpenMP parallel region by enclosing the loop body with macros FORALL_BEGIN() and FORALL_END() with appropriate parameters.
  1. The scheduling strategy’s name and associated parameters are specified in the parameters of the macro.
  2. Both macros invoke library functions corresponding to the strategy’s name as specified in the macro call. The macro’s parameters are passed to the user-defined scheduler functions.
    static LoopTimeRecord* ltr;
    double fd = 0.3;
    #pragma omp parallel
    {
        int start = 0, end = 0; // declared inside the region so that each thread has its own copy
        int tid = omp_get_thread_num();
        int numThrds = omp_get_num_threads();
        FORALL_BEGIN(sds, tid, numThrds, 0, n, start, end, fd)
        for (int i = start; i < end; i++)
            c[i] += a[i]*b[i];
        FORALL_END(sds, tid, start, end)
    }
1: V. Kale, A. P. Randles, V. Kale, and W. D. Gropp. Locality-Optimized Scheduling for Improved Load Balancing on SMPs. In Proceedings of the 21st European MPI Users’ Group Meeting Conference on Recent Advances in the Message Passing Interface, volume 0, pages 1063–1074. Association for Computing Machinery, 2014.
FORALL_BEGIN(strat, …) macro expansion:
    strat_##LoopStart(.. &start, &end, ..);
    do (loop not done) {
FORALL_END(strat, …) macro expansion:
    } while (strat_##LoopNext(… &start, &end, ..));
    barrier;
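A compilable sketch of this macro pattern is given here under stated assumptions. The strategy rr (fixed-size chunks assigned round-robin), its function signatures, the constant RR_CHUNK, and the macro parameter lists are all hypothetical; they are not the interface of the library from the earlier work. Token pasting is written as strat##_LoopStart so that the strategy name is substituted, and the barrier is provided by the end of the parallel region.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so that the sketch also builds without OpenMP. */
static int omp_get_thread_num(void)  { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

enum { RR_CHUNK = 4 };   /* chunk size of the illustrative strategy */

/* Hypothetical strategy "rr": give the calling thread its first chunk. */
static void rr_LoopStart(int tid, int nthr, int lo, int hi, int *start, int *end)
{
    (void)nthr;
    *start = lo + tid * RR_CHUNK;
    if (*start > hi) *start = hi;                       /* empty range */
    *end = (hi - *start < RR_CHUNK) ? hi : *start + RR_CHUNK;
}

/* Advance to the thread's next chunk; returns 0 when none is left. */
static int rr_LoopNext(int tid, int nthr, int hi, int *start, int *end)
{
    (void)tid;
    *start += nthr * RR_CHUNK;
    if (*start >= hi) return 0;
    *end = (hi - *start < RR_CHUNK) ? hi : *start + RR_CHUNK;
    return 1;
}

#define FORALL_BEGIN(strat, tid, nthr, lo, hi, start, end) \
    strat##_LoopStart(tid, nthr, lo, hi, &(start), &(end)); do {
#define FORALL_END(strat, tid, nthr, hi, start, end) \
    } while (strat##_LoopNext(tid, nthr, hi, &(start), &(end)));

/* c[i] += a[i]*b[i], with iterations handed out by the "rr" strategy. */
void vec_multiply_add(double *c, const double *a, const double *b, int n)
{
    assert(n >= 0);
    #pragma omp parallel
    {
        int tid  = omp_get_thread_num();
        int nthr = omp_get_num_threads();
        int start, end;                  /* private to each thread */
        FORALL_BEGIN(rr, tid, nthr, 0, n, start, end)
        for (int i = start; i < end; i++)
            c[i] += a[i] * b[i];
        FORALL_END(rr, tid, nthr, n, start, end)
    }   /* implicit barrier at the end of the parallel region */
}
```

Because this strategy keeps no shared state, it needs no locking; a strategy with a shared queue would have to synchronize inside its LoopNext function.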
Proposal for User-defined Schedule in OpenMP
• We propose a user-defined scheduling scheme that is an adaptation of the above macro-based scheme.
  • Overcomes limitations of simple macro-based scheme.
• The proposed API used for a simple code is illustrated below.
    double dynamicFraction = 0.3;
    static LoopTimeRecord* ltr; // for history.
    int chunkSize = 4;
    #pragma omp parallel for schedule(user, staggered:chunkSize:dynamicFraction:ltr)
    for (int i = 0; i < n; i++)
        c[i] = a[i]*b[i];
• The first parameter of the clause ’schedule’ specifies a new schedule kind user.
• The second parameter specifies the scheduling strategy’s name staggered, optionally with the strategy-specific parameters.
Implementation of User-defined Schedule
• When a user specifies the schedule kind user and a strategy named X, they need to link a library that defines the functions X_loopStart(), X_loopNext() and X_init().
• X_init() allows a user-level scheduler to allocate and initialize its data structures that are to be used commonly across parallel loops that use X.
• The functions X_loopStart() and X_loopNext() determine a loop’s indices that a thread should work on, based on the parameter values for the scheduling strategy and of the loop.
    Every thread executes when starting a new loop:
        …
        X_loopStart();
        do X_loopNext(); until done; // done flag is set by the user-defined scheduler functions
• As long as one is allowed to define these functions, one can implement a user-defined scheduler.
• Every thread should call X_loopNext() repeatedly.
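One way a runtime could look up and drive the three functions of a strategy named X is a table of function pointers keyed by the strategy name. The sketch below is an assumption for illustration only: the Scheduler struct, the chunked strategy, the registry, and the single-threaded driver run_loop are hypothetical and are not part of the proposal or of any OpenMP implementation.

```c
#include <assert.h>
#include <string.h>

typedef struct {
    const char *name;                                    /* strategy name X   */
    void (*init)(void);                                  /* X_init            */
    void (*loop_start)(int lo, int hi, int *s, int *e);  /* X_loopStart       */
    int  (*loop_next)(int hi, int *s, int *e);           /* X_loopNext: 0 = done */
} Scheduler;

enum { CHUNK = 4 };

static void chunked_init(void) { /* nothing to allocate in this sketch */ }

static void chunked_loopStart(int lo, int hi, int *s, int *e)
{
    *s = lo;
    *e = (hi - lo < CHUNK) ? hi : lo + CHUNK;
}

static int chunked_loopNext(int hi, int *s, int *e)
{
    if (*e >= hi) return 0;            /* all iterations handed out */
    *s = *e;
    *e = (hi - *s < CHUNK) ? hi : *s + CHUNK;
    return 1;
}

static const Scheduler registry[] = {
    { "chunked", chunked_init, chunked_loopStart, chunked_loopNext },
};

/* Find a strategy by name; returns NULL if it is not registered. */
const Scheduler *find_scheduler(const char *name)
{
    for (size_t k = 0; k < sizeof registry / sizeof registry[0]; k++)
        if (strcmp(registry[k].name, name) == 0) return &registry[k];
    return NULL;
}

/* Single-threaded driver: loop_start, then loop_next until done.
   Returns the sum of the iteration indices it visited. */
long run_loop(const Scheduler *s, int lo, int hi)
{
    assert(s != NULL && lo <= hi);
    int start, end;
    long visited = 0;
    s->loop_start(lo, hi, &start, &end);
    do {
        for (int i = start; i < end; i++) visited += i;   /* loop body stand-in */
    } while (s->loop_next(hi, &start, &end));
    return visited;
}
```

A caller would invoke the init function once before the first loop that uses the strategy, then call run_loop for each loop; in a threaded setting every thread would run the same start/next sequence.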
Software Architecture for User-defined Schedule
myApplication.c:
    #include <mpi.h>
    #include <omp-mod.h>
    int main() {
        double dynamicFraction = 0.3;
        static LoopTimeRecord* ltr; // for history.
        int chunkSize = 4;
        while (timestep < 1000) {
            #pragma omp parallel for schedule(user, staggered:chunkSize:dynamicFraction:ltr)
            for (int i = 0; i < n; i++)
                c[i] = a[i]*b[i];
        }
    }
omp-mod.h:
    #include <userDefSched.h>
    #define FORALL_BEGIN(strat, …) strat_##LoopStart(.. &start, &end, ..); do (loop not done) {
    #define FORALL_END(strat, …) } while (strat_##LoopNext(… &start, &end, ..)); barrier();
    switch (clause) { ‘static’: ‘dynamic’: ‘guided’: ‘auto’: ‘runtime’: ‘user’: }
    strat_##LoopStart(.. &start, &end, ..); {}
User-defined scheduler library:
    sd_Init():
        - allocate shared data structures for loops that use strategy sd;
    sd_LoopStart():
        if I am the master thread,
            - set up a data structure loopParams, along with a lock;
            - signal other threads to start;
        else
            - wait for the signal from master thread;
        use the shared loopParams data structure to calculate my static iterations and execute loop_body for that range;
    sd_LoopNext():
        lock loopParams;
        extract a chunk to work on;
        unlock loopParams;
        if (no chunk available)
            wait for barrier;
            done = 1;
        else
            execute loop_body for the extracted chunk;
Static/Dynamic Scheduling Using a Shared Data Structure
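The sd_LoopStart() and sd_LoopNext() pseudocode for this strategy can be sketched in C as follows. The sketch makes several assumptions: the struct fields of LoopParams, the function name sd_multiply_add, and the use of a named critical section as the lock are all illustrative, and the master-thread signal is replaced by setting up loopParams before the parallel region begins.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so that the sketch also builds without OpenMP. */
static int omp_get_thread_num(void)  { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

typedef struct {
    int n;           /* total number of iterations                */
    int static_end;  /* iterations [0, static_end) are static     */
    int next;        /* next unassigned dynamic iteration         */
    int chunk;       /* size of each dynamic chunk                */
} LoopParams;

/* c[i] += a[i]*b[i] using mixed static/dynamic scheduling with dynamic
   fraction fd and dynamic chunks taken from a shared, locked structure. */
void sd_multiply_add(double *c, const double *a, const double *b,
                     int n, double fd, int chunk)
{
    assert(n >= 0 && chunk > 0 && fd >= 0.0 && fd <= 1.0);
    LoopParams lp;                       /* shared by all threads */
    lp.n = n;
    lp.static_end = (int)((1.0 - fd) * n);
    lp.next = lp.static_end;
    lp.chunk = chunk;

    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        int nthr = omp_get_num_threads();

        /* sd_LoopStart: execute my block of the static iterations. */
        int s = (int)((long long)lp.static_end * tid / nthr);
        int e = (int)((long long)lp.static_end * (tid + 1) / nthr);
        for (int i = s; i < e; i++) c[i] += a[i] * b[i];

        /* sd_LoopNext: extract chunks under a lock until none is left. */
        for (;;) {
            #pragma omp critical(sd_loop_params)
            {
                s = lp.next;
                e = (lp.n - s < lp.chunk) ? lp.n : s + lp.chunk;
                lp.next = e;
            }
            if (s >= lp.n) break;        /* no chunk available: done */
            for (int i = s; i < e; i++) c[i] += a[i] * b[i];
        }
    }   /* implicit barrier at the end of the parallel region */
}
```

The single lock is simple but serializes chunk extraction, which is the overhead that the steal-queue variants aim to reduce.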
An Alternative Impl. of Static/Dynamic Strategy Using Steal Queues for Dynamic Iterations
    sd2_LoopStart():
        if I am the master thread
            - set up a data structure loopParams with loop parameters;
            - enqueue a single entry corresponding to dynamic range of iterations into thread 0’s steal queue;
            - signal other threads to start;
        else
            - wait for the signal from master thread;
        - use the shared data structure to calculate my static iterations and execute loopBody for that range;
    sd2_LoopNext():
        - nextRange = myQueue.dequeue();
        if (nextRange == NULL) nextRange = steal(random_neighbor);
        if (nextRange != NULL)
            L = nextRange->low; U = nextRange->high;
            if ((U - L) > Threshold)
                - split the range in 2, enqueue them in my steal queue;
            else
                - execute loop body for iterations L:U;
                - update count of iterations completed to set the “done” flag when done;
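The range-splitting step of sd2_LoopNext() can be sketched in isolation. The sketch below is single-threaded and omits stealing and all synchronization, which a real steal queue would need; the types Range and RangeQueue, the fixed capacity, and the function loop_next_step are hypothetical names introduced for the illustration.

```c
#include <assert.h>

typedef struct { int low, high; } Range;       /* half-open interval [low, high) */

enum { QUEUE_CAP = 64 };
typedef struct { Range items[QUEUE_CAP]; int size; } RangeQueue;

static void enqueue(RangeQueue *q, Range r)
{
    assert(q->size < QUEUE_CAP);
    q->items[q->size++] = r;
}

/* Returns 1 and stores a range in *out, or 0 if the queue is empty. */
static int dequeue(RangeQueue *q, Range *out)
{
    if (q->size == 0) return 0;
    *out = q->items[--q->size];
    return 1;
}

/* One LoopNext step for the queue's owner: split a large range in two,
   or execute a small one. Returns the number of iterations executed.
   hits[i] is incremented as a stand-in for the loop body. */
int loop_next_step(RangeQueue *q, int threshold, int *hits)
{
    assert(threshold >= 1);
    Range r;
    if (!dequeue(q, &r)) return 0;     /* a real scheduler would try to steal here */
    if (r.high - r.low > threshold) {
        int mid = r.low + (r.high - r.low) / 2;
        Range left = { r.low, mid }, right = { mid, r.high };
        enqueue(q, right);
        enqueue(q, left);
        return 0;
    }
    for (int i = r.low; i < r.high; i++) hits[i]++;
    return r.high - r.low;
}
```

Starting from a single entry for the whole dynamic range, repeated steps split it into pieces no larger than the threshold; in the threaded version, the pieces left in the queue are what idle threads steal.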
An Implementation of Staggered Static/Dynamic Strategy
    staggered_LoopStart():
        if I am the master thread
            - set up a data structure loopParams;
            - signal other threads to start;
        else
            - wait for the signal from master thread;
        - enqueue entries corresponding to chunks of my dynamic range of iterations into my thread’s steal queue;
        - calculate my static iterations and execute loopBody for that range;
    staggered_LoopNext():
        nextRange = myQueue.dequeue();
        if (nextRange == NULL) nextRange = steal(random_neighbor);
        if (nextRange != NULL)
            - L = nextRange->low; U = nextRange->high;
            - execute loop body for my iterations L:U;
            - update count of iterations completed to set the “done” flag when done;
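The iteration layout that staggered_LoopStart() works from can be sketched as a pure index computation. The sketch assumes that each thread owns one contiguous block of the iteration space and that the static part of the block comes before its dynamic part; that ordering, the struct StaggeredBlock, and the helper names are assumptions for illustration rather than details confirmed by the slides.

```c
#include <assert.h>

typedef struct {
    int static_lo, static_hi;   /* [static_lo, static_hi): executed by the owner   */
    int dyn_lo, dyn_hi;         /* [dyn_lo, dyn_hi): enqueued, tentatively owned   */
} StaggeredBlock;

/* Block of thread tid when n iterations are split evenly among num_threads
   threads and a fraction fd of each block is left dynamic. */
StaggeredBlock staggered_block(int n, int num_threads, int tid, double fd)
{
    assert(n >= 0 && num_threads > 0 && tid >= 0 && tid < num_threads);
    assert(fd >= 0.0 && fd <= 1.0);
    int lo = (int)((long long)n * tid / num_threads);
    int hi = (int)((long long)n * (tid + 1) / num_threads);
    int ns = (int)((1.0 - fd) * (hi - lo));      /* static iterations in the block */
    StaggeredBlock b = { lo, lo + ns, lo + ns, hi };
    return b;
}

/* Number of chunk entries the thread enqueues into its own steal queue. */
int dynamic_chunks(StaggeredBlock b, int chunk_size)
{
    assert(chunk_size > 0);
    return (b.dyn_hi - b.dyn_lo + chunk_size - 1) / chunk_size;
}
```

With n = 64, 4 threads and fd = 0.25, thread 1 executes iterations 16 to 27 statically and enqueues iterations 28 to 31, which another thread may steal if thread 1 falls behind.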
...tion. The name of the scheduling strategy and its associated parameters are specified in the parameters of the macro function. The parameters passed to the macro function are passed to the scheduler functions defined by the user. This macro-based approach is of limited utility because of its inability to use the compiler support for features such as reductions. The approach we propose eliminates these limitations and leads to concise code.
Figure 1: Diagram of the iteration space of a threaded computation region showing loop iterations distributed to threads when using the technique of staggered mixed static/dynamic scheduling. In the figure, the smaller, red colored, rectangles correspond to dynamic iterations tentatively assigned to a specific thread. [Image not reproduced: cells labeled T0 to T3 along an axis of increasing loop iteration number.]
    double static_fraction = 0.5; double constraint = 0.2; int chunk_size = 4;
    // reduction(+:sum) is needed so that concurrent updates of sum do not race
    #pragma omp parallel for reduction(+:sum) schedule(user, statdynstaggered:chunk_size:static_fraction:constraint)
    for (int i=0; i<n; i++)
        sum += a[i]*b[i];
Figure 2: An illustration of the use of the user-defined scheduler in a dot product code parallelized with OpenMP.
We propose the user-defined scheduling scheme to be an adaptation of the scheme of using macro-based functions. The schedule kind ’user’ and the name of the scheduling strategy with the strategy’s parameter value(s), e.g., the name ’staggered’ and the parameter values for the static fraction, chunk size and constraint, are specified as parameters of the OpenMP ’schedule’ clause. Figure 2 illustrates the proposed API for using a user-defined schedule through the API’s use in the dot product code. The user-defined scheduler named ’staggered’, specified in the second parameter of the schedule clause, defines how the work of a loop in the code is enqueued into queues or data structures and the strategy of dequeuing (or obtaining) a chunk of loop iterations to execute next using the scheduling strategy illustrated in Figure 1.
When the user specifies a schedule clause with kind ’user’ and a
Discussion of API’s details
• We have sketched above a syntax for the API for user-level schedulers.
  • However, we expect that the API’s details will be worked out with the community’s consensus.
• Note that there is a precedent for adding user-defined functions in the OpenMP standard.
  • The combiner function in user-defined reductions.
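For reference, the existing precedent looks like this in code: a user-defined reduction is declared with declare reduction, whose combiner expression merges two partial results. The reduction identifier maxabs, the helper mag, and the function max_abs_element are illustrative names; only the directive syntax itself comes from OpenMP. Compiled without OpenMP, the pragmas are ignored and the loop runs serially with the same result.

```c
#include <assert.h>

/* Magnitude of v, used by the combiner below. */
static double mag(double v) { return v < 0.0 ? -v : v; }

/* User-defined reduction "maxabs": the combiner keeps whichever partial
   result has the larger magnitude; private copies start at 0.0. */
#pragma omp declare reduction(maxabs : double : \
        omp_out = (mag(omp_in) > mag(omp_out) ? omp_in : omp_out)) \
        initializer(omp_priv = 0.0)

/* Element of x with the largest magnitude (0.0 for an empty array). */
double max_abs_element(const double *x, int n)
{
    assert(x != 0 && n >= 0);
    double best = 0.0;
    #pragma omp parallel for reduction(maxabs : best)
    for (int i = 0; i < n; i++) {
        if (mag(x[i]) > mag(best)) best = x[i];
    }
    return best;
}
```

The combiner plays the same role for reductions that the proposed X_loopStart() and X_loopNext() functions would play for schedules: user code that the implementation calls at well-defined points.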
Summary
• Need for experimentation with sophisticated loop scheduling strategies.
• OpenMP community should discuss how to allow flexible specification of such strategies in a user’s code and how to design a user-level scheduler library so that it can be portably used with any conforming OpenMP implementation.
• Supporting user-defined schedulers in this way will facilitate rapid development of scheduling strategies.
• I hope the experts will discuss these ideas at this conference.
Acknowledgements: University of Southern California / ISI’s Technical Computing New Directions program.