Revisiting Resource Partitioning for Multi-core Chips: Integration of Shared Resource Partitioning on a Commercial RTOS. 21 Apr. 2017. PAK, EUNJI, Senior researcher, ETRI (Electronics and Telecommunications Research Institute). CMAAS’2017


Page 1: Revisiting Resource Partitioning for Multi-core Chips (source PDF: rtsl-edge.cs.illinois.edu/CMAAS17/media/talk_1.pdf)

Revisiting Resource Partitioning for Multi-core Chips: Integration of Shared Resource Partitioning on a Commercial RTOS

21 Apr. 2017

PAK, EUNJI

Senior researcher, ETRI (Electronics and Telecommunications Research Institute)

CMAAS’2017

Page 2: Agenda
• Qplus-AIR, a commercial RTOS
• Comprehensive shared resource partitioning implementation on Qplus-AIR

Page 3: Qplus-AIR

ARINC 653 compliant RTOS
Certifiable for DO-178B Level A

Page 4: Introduction to Qplus-AIR
• Qplus-AIR
  - Developed by ETRI for safety-critical systems (2010~2012)
  - Main operating system for the IFCC (Integrated Flight Control Computer) of a UAV (Unmanned Aerial Vehicle), KAI
    - Integrates MC (Mission Control), FC (Flight Control), and C&C (Communications and Commands) in the IFCC
  - ARINC 653 compliant RTOS*
    - Robust partitioning among applications
    - Spatial and temporal
    - Prevents cross-application influence and error propagation among applications
    - Easy integration of multiple applications with different degrees of criticality

* Airlines Electronic Engineering Committee, Avionics Application Software Standard Interface, ARINC Specification 653 Part 1, 2006

Page 5: Introduction to Qplus-AIR
• Qplus-AIR
  - Certifiable package for DO-178B Level A
  - Lightweight ARINC 653 support: kernel-level implementation
  - Support for multicore platforms (2014~)
• RTWORKS
  - A commercial version of Qplus-AIR
  - Managed by RTST (2013~), ETRI’s spin-off company
    - Started with 4 developers and now has 11 OS developers
  - Work on AUTOSAR (automotive industry standard) and ISO 26262 ASIL D is in progress
• ETRI focuses on research issues while RTST focuses on commercialization

Page 6: Application Examples
• Safety-critical industrial applications
  - Integrated flight control computer of an unmanned aerial vehicle, 2010~2012
  - Tiltrotor flight control computer, 2012
  - Nuclear power plant control system, 2013
  - HUMS (Health and Usage Monitoring System) for helicopters, 2013~2016
  - Subway screen-door control system, 2016 (exported to Brazil)
  - Communication system of self-propelled guns, 2017~
  - (project) Autonomous driving car, 2015~

Page 7: Comprehensive shared resource partitioning implementation on Qplus-AIR

Page 8: Contents
• Introduction
• HW platform: P4080
• Comprehensive resource partitioning implementation
  - Memory bus bandwidth partitioning
  - DRAM bank partitioning
  - Shared cache partitioning: set-based / way-based
• Combining all the techniques on Qplus-AIR
• Evaluations
• Conclusions & Future Work

Page 9: Introduction [1/2]
• Robust partitioning among applications (partitions)
  - Qplus-AIR supports spatial and temporal partitioning
  - Ensures independent execution of multiple applications with various safety-criticality levels
• Robust partitioning may no longer be valid on multicore
  - Multiple cores share hardware resources such as cache or memory
  - Concurrently executing applications affect each other due to contention on shared resources
    - Major source of timing variability
    - Pessimistic WCET estimation → overprovisioning of hardware resources and low system utilization
  - In safety-critical systems, we had to turn off all but one core

Page 10: Introduction [2/2]
• We must deal with resource contention properly
  - The WCET of tasks stays guaranteed and tightly bounded
  - Especially for safety-critical applications that require certification
• Requirement of inter-core interference mitigation
  - “The applicant has identified the interference channels that could permit interference to affect the software applications hosted on the MCP cores, and has verified the applicant’s chosen means of mitigation of the interference.” (FAA CAST (Certification Authorities Software Team), CAST-32A Position Paper*)
• Comprehensive shared resource partitioning implementation on an ARINC 653 compliant RTOS
  - Integrate a number of resource partitioning schemes, each of which targets a different shared hardware resource, on Qplus-AIR
  - Unique challenges due to the fact that the RTOS did not support Linux-like dynamic paging

* Certification Authorities Software Team, Position Paper CAST-32A: Multi-core Processors, 2016.

Page 11: HW platform, P4080 [1/2]
• P4080 architecture*
  - Eight PowerPC e500mc cores
  - Each core has a private 32KB-I/32KB-D L1 cache and a 128KB L2 cache
  - Two 32-way 1MB L3 caches with cache-line interleaving
  - Two memory controllers for two 2GB DDR DIMM modules (each DIMM module has 16 DRAM banks)
  - CoreNet coherency fabric: interconnects the cores and other SoC modules; a high-bandwidth switch that supports several concurrent transactions

[Figure: P4080 block diagram. PowerPC e500mc cores (L1 caches, L2$, CoreNet interface) attach to the CoreNet fabric together with two L3 caches, each behind a DDR controller and DIMM module, and peripherals (DUART, GPIO, FMan, BMan, QMan, ...)]

* P4080 QorIQ Integrated Processor Hardware Specifications, Feb 2014.
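The cache figures above can be sanity-checked with a little arithmetic. The sketch below is illustrative only: the 64 B line size is taken from the evaluation slides (one block = 64 B), and the helper names are made up for this example.

```python
KB = 1024
MB = 1024 * KB
LINE_SIZE = 64  # bytes per cache block (the evaluation slides use 64 B blocks)


def num_sets(cache_bytes: int, ways: int, line_bytes: int = LINE_SIZE) -> int:
    """Number of sets = capacity / (associativity * line size)."""
    return cache_bytes // (ways * line_bytes)


# One L3 instance as described on the slide: 1 MB, 32-way.
l3_sets = num_sets(1 * MB, 32)
l3_index_bits = l3_sets.bit_length() - 1  # log2 for a power of two
l3_way_bytes = (1 * MB) // 32             # capacity of a single way

print(f"L3 sets per cache: {l3_sets}")
print(f"L3 set-index bits: {l3_index_bits}")
print(f"L3 way size: {l3_way_bytes // KB} KB")
```

Under these assumptions each L3 has 512 sets (9 index bits) and 32 KB per way, which agrees with the 32 KB-per-way figure quoted later for way-based partitioning.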

Page 12: HW platform, P4080 [2/2]
• Partitioning support of recent PowerPC processors*

[Excerpt: Hardware Support for Robust Partitioning in Freescale QorIQ Multicore SoCs (P4080 and derivatives), Rev. 0, Freescale Semiconductor, p. 10. Section: Overall partitioning model. Figure 4. Example of a Partitioned System]

In this model, there are four distinct partitions, each running on two cores. The main memory is divided into several physical regions:

• Private
• Shared between partitions; accessible at user level
• Shared among partitions; restricted to hypervisor level

This mapping is enforced by the cores’ MMUs accessible only at the hypervisor level. System peripherals (PCIe and sRIO) in this example are not shared -- each is allocated to a partition usage. As such, the hypervisor is able to restrict their DMA-accessible memory range to some part of the memory region assigned to the partition through the MMU.

The shared internal memory (CPC) is partially partitioned, which provides two partition-specific sub-ranges.

NOTE: This CPC allocation can be done per-way. Each way is configured to work either as a cache or as a fixed-address sRAM.

1.7 Hypervisors
Several hypervisor technologies are proposed for the P4080 to address different purposes.

RTOS suppliers, such as GreenHills, SysGo and WindRiver, have developed their own hypervisor technology with particular focus on safety and robust partitioning.

* Hardware Support for Robust Partitioning in Freescale QorIQ Multicore SoCs (P4080 and derivatives)

Main memory is divided into several physical regions
• Private
• Shared between partitions; accessible at user level
• Shared among partitions; restricted to hypervisor level

This mapping is enforced by the cores’ MMUs

System peripherals are not shared
• The hypervisor is able to restrict their DMA-accessible memory range to some part of the memory region (through the MMU)

CPC is partitioned
• Way partition (32KB per way)

Each core is allocated to a partition

Restrict the coherency overhead
• Disable the coherency: prevents snoop overhead
• Specify a group participating in coherency

Page 13: Resource partitioning mechanisms
• 1. Memory bus (interconnect) bandwidth partitioning
• 2. Memory bank partitioning
• Shared cache partitioning
  - 3. Set-based cache partitioning with page coloring
  - 4. Way-based cache partitioning with the support of P4080 hardware
• Combine all the techniques and integrate them on Qplus-AIR
• Paging
  - Memory bank partitioning and set-based cache partitioning assume that the OS supports Linux-like paging
  - Paging implementation in Qplus-AIR

Page 14: Resource partitioning mechanisms: Memory bus bandwidth regulator [1/2]
• Bus bandwidth regulator*
  - Limit the bandwidth usage per core
    1) Set the memory bus bandwidth budget
    2) Count the number of requests sent to the memory bus
    3) Generate an interrupt
    4) Throttle the requests from core 1

[Figure: four panels, one per step, each showing Core 1 and Core 2 attached to the memory bus (CoreNet fabric) with per-core request counters against a budget of 10 (e.g. 10/10 for core 1 and 3/10 for core 2)]

* H. Yun, G. Yao, R. Pellizzoni, M. Caccamo, and L. Sha. Memory bandwidth management for efficient performance isolation in multi-core platforms. IEEE Transactions on Computers, 65:562–576, 2015.

Page 15: Resource partitioning mechanisms: Memory bus bandwidth regulator [2/2]
• Implementation
  - Set up the budget and configure the hardware to generate an interrupt when a core exhausts its budget
    - Configure the performance monitoring control registers and performance monitoring counters (PMCs)
  - The OS scheduler throttles further execution on that core
    - Implement an interrupt handler for the interrupt that the PMC generates
    - The scheduler de-schedules the tasks on the core
• Period of bandwidth regulator execution
  - If it is too short, the overhead becomes excessive; if it is too long, predictability worsens
  - The default period in our implementation is 5 ms
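The budget, count, interrupt, and throttle steps can be illustrated with a small software model. This is a sketch under stated assumptions, not the Qplus-AIR implementation: the class and method names are hypothetical, the counter overflow interrupt is modelled as a flag, and the periodic timer (5 ms by default on the slides) is modelled as an explicit call.

```python
from dataclasses import dataclass


@dataclass
class CoreRegulator:
    """Per-core memory bandwidth budget for one regulation period (illustrative model)."""
    budget: int             # memory requests allowed per period
    used: int = 0           # requests counted so far in this period
    throttled: bool = False

    def on_memory_request(self) -> bool:
        """Count one request. Returns False if the core is currently throttled."""
        if self.throttled:
            return False
        self.used += 1
        if self.used >= self.budget:
            # Stands in for the PMC overflow interrupt whose handler
            # asks the scheduler to de-schedule tasks on this core.
            self.throttled = True
        return True

    def on_period_start(self) -> None:
        """Called once per regulation period: replenish the budget."""
        self.used = 0
        self.throttled = False


core1 = CoreRegulator(budget=10)
served = sum(core1.on_memory_request() for _ in range(15))
print(f"served {served} of 15 requests, throttled={core1.throttled}")
core1.on_period_start()
print(f"after replenish: throttled={core1.throttled}")
```

In this model a core that reaches 10/10 is stopped until the next period, while a core at 3/10 keeps running, which mirrors the two cores in the figure.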

Page 16: Resource partitioning mechanisms: Bank-aware memory allocation
• DRAM bank-aware memory allocation*
  - Manages memory allocation in such a way that no application shares its memory bank with applications running on other cores

[Figure: 1) an application requests memory; 2) the OS allocates physical memory. Application 1 (core 1) and Application 2 (core 2) each map their virtual memory through the page table (virtual-to-physical address translation, HW MMU) to physical memory in separate DRAM banks (Bank 1, Bank 2)]

[Figure: P4080 memory address mapping over address bits 31..0, marking the bit ranges used for banks, L3 cache sets, and L2 cache sets; labeled bit positions: 6, 7, 12, 14, 16, 18]

* H. Yun, R. Mancuso, Z.-P. Wu, and R. Pellizzoni. PALLOC: DRAM bank-aware memory allocator for performance isolation on multicore platforms. In RTAS, 2014.
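A minimal sketch of bank-aware page allocation follows. The bank-select bit positions used here (`BANK_SHIFT`, `BANK_BITS`) are hypothetical placeholders, since the slide's address-map figure did not survive extraction; the allocator class is likewise an illustration, not PALLOC or the Qplus-AIR allocator.

```python
PAGE_SHIFT = 12  # 4 KB pages
BANK_SHIFT = 14  # HYPOTHETICAL: lowest bank-select address bit
BANK_BITS = 4    # HYPOTHETICAL: 16 banks


def bank_of(paddr: int) -> int:
    """Extract the (assumed) DRAM bank index from a physical address."""
    return (paddr >> BANK_SHIFT) & ((1 << BANK_BITS) - 1)


class BankAwareAllocator:
    """Hands out page frames only from the banks assigned to the requesting core."""

    def __init__(self, base: int, size: int, core_banks: dict):
        self.core_banks = core_banks
        self.free = {}  # bank index -> list of free frame addresses
        for paddr in range(base, base + size, 1 << PAGE_SHIFT):
            self.free.setdefault(bank_of(paddr), []).append(paddr)

    def alloc_page(self, core: int) -> int:
        for bank in self.core_banks[core]:
            frames = self.free.get(bank)
            if frames:
                return frames.pop(0)
        raise MemoryError(f"no free frame in banks of core {core}")


alloc = BankAwareAllocator(base=0, size=1 << 20,
                           core_banks={1: [0, 1], 2: [2, 3]})
pages_core1 = [alloc.alloc_page(1) for _ in range(8)]
pages_core2 = [alloc.alloc_page(2) for _ in range(8)]
print(sorted({bank_of(p) for p in pages_core1}),
      sorted({bank_of(p) for p in pages_core2}))
```

Because the bank bits lie above the 4 KB page offset in this sketch, the OS can steer each core to its own banks purely through its choice of physical frames.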

Page 17: Resource partitioning mechanisms: Set-based cache partitioning [1/2]
• Set-based partitioning via page coloring*
  - Allocation of physical memory considering the cache set location
  - $\text{number of colors} = \dfrac{\text{cache size}}{\text{page size} \times \text{cache associativity}}$

[Figure: 1) an application requests memory; 2) the RTOS allocates physical memory. Application 1 (core 1) and Application 2 (core 2) are mapped to disjoint regions of the cache]

[Figure: address bits 31..0 with labeled positions 7, 12, 16; the L3 cache set bits overlap the physical page number, and the overlapping bits are the colors]

* R. Mancuso, R. Dudko, E. Betti, M. Cesati, M. Caccamo, and R. Pellizzoni. Real-time cache management framework for multi-core architectures. In RTAS, 2013.
* M. Chisholm, B. C. Ward, N. Kim, and J. H. Anderson. Cache sharing and isolation tradeoffs in multicore mixed-criticality systems. In RTSS, 2015.
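The color-count formula can be worked through for the P4080 numbers given earlier. This is a sketch with one explicit assumption: the two 1 MB, 32-way L3 caches with cache-line interleaving are treated as a single 2 MB, 32-way cache. The function names are hypothetical.

```python
KB = 1024
MB = 1024 * KB
PAGE_SIZE = 4 * KB
PAGE_SHIFT = 12  # log2(4 KB)


def number_of_colors(cache_size: int, page_size: int, associativity: int) -> int:
    """number of colors = cache size / (page size * cache associativity)."""
    return cache_size // (page_size * associativity)


# ASSUMPTION: two interleaved 1 MB L3 caches counted as one 2 MB, 32-way cache.
colors = number_of_colors(2 * MB, PAGE_SIZE, 32)


def page_color(paddr: int) -> int:
    """Color = the low bits of the physical page number that also index the cache."""
    return (paddr >> PAGE_SHIFT) & (colors - 1)


print(f"colors: {colors}")
print(f"color of 0x3000: {page_color(0x3000)}")
print(f"color of 0x10000: {page_color(0x10000)}")
```

Under that assumption the result is 16 colors, i.e. four address bits, which is consistent with the [15:12] bit range discussed on the next slide.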

Page 18: Resource partitioning mechanisms: Set-based cache partitioning [2/2]
• Implementation
  - Manipulates the virtual-to-physical address mapping: allocates disjoint cache sets to each core
  - Of the address bits [15:7] that form the cache set index, it exploits bits [15:12], which overlap with the physical page number on the P4080
• L2 co-partitioning & restrictions of set-based partitioning
  - Co-partitioning of the L2 cache
    - The L3 cache set is determined by bits [15:12] and the L2 cache set by bits [13:6]
    - Using bits [13:12] has the side effect of co-partitioning the L2 cache
  - Only bits [15:14] can be used for L3 cache set partitioning
    - The number of cache partitions is limited to 4
    - If applied to 8 cores, some cache sets are inevitably shared by 2 cores

[Figure: P4080 memory address mapping over address bits 31..0, marking the bit ranges used for banks, L3 cache sets, and L2 cache sets; labeled bit positions: 6, 7, 12, 14, 16, 18]
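The limit of four partitions follows directly from the bit ranges quoted above. The sketch below redoes that reasoning with sets of bit positions; the 4 KB page size (offset bits [11:0]) is the usual page-coloring assumption, and the variable names are illustrative.

```python
def bits(hi: int, lo: int) -> set:
    """Set of address bit positions in the inclusive range [hi:lo]."""
    return set(range(lo, hi + 1))


PAGE_OFFSET = bits(11, 0)  # 4 KB pages: the OS cannot choose these bits
L3_SET_INDEX = bits(15, 7)
L2_SET_INDEX = bits(13, 6)

# Bits the OS can steer through page allocation and that also select an L3 set.
l3_color_bits = L3_SET_INDEX - PAGE_OFFSET
# Of those, the bits that also select an L2 set (using them co-partitions L2).
l2_overlap = l3_color_bits & L2_SET_INDEX
# Bits that partition L3 without touching the private L2.
l3_only_bits = l3_color_bits - L2_SET_INDEX

partitions = 2 ** len(l3_only_bits)
cores = 8
cores_per_partition = -(-cores // partitions)  # ceiling division

print(sorted(l3_color_bits), sorted(l2_overlap), sorted(l3_only_bits))
print(f"L3 partitions: {partitions}, cores sharing a partition: {cores_per_partition}")
```

With only bits 14 and 15 left, there are four L3 partitions, so an eight-core configuration ends up with two cores per partition.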

Page 19: Resource partitioning mechanisms: Way-based cache partitioning [1/2]
• Way-based partitioning with hardware-level support
  - Configure main memory with multiple distinct partitions
    - For each partition, register the (memory range, target, and partition ID) in the LAW (Local Access Window) register
  - Partition the L3 cache and allocate disjoint cache ways to each core
    - Configure the L3 cache (CPC) related registers: transactions from the specified partition can allocate blocks in the designated cache ways
    - E.g., transactions from ‘partition 1’ allocate blocks in ‘ways 0, 1, 2, 3’

[Figure: four e6500 cores, each with an L1 cache and MMU, share a 2MB banked L2 cache and connect through the CoreNet Coherency Fabric. Local Access Windows map physical memory (DDR3 DRAM) into Part. 1 to Part. 4 plus a shared region; the CPC configuration register divides the CPC (L3 cache) into Part. 1 to Part. 4]

Page 20: Resource partitioning mechanisms: Way-based cache partitioning [2/2]
• Relaxed restrictions on the number of cache partitions
  - With set-based cache partitioning, the number of cache partitions is restricted to at most four
  - The P4080 supports cache partitioning at per-way granularity, with each way providing 32KB
    - The L3 cache is 32-way and can be partitioned into 32 parts
• Limitations of way-based cache partitioning
  - Way-based cache partitioning cannot be used with set-based cache partitioning or memory bank partitioning
    - Conflicting requirements on memory allocation
    - Sequential vs. interleaving
    - May be relevant to all other PowerPC chip models
  - Cache way locking would allow integration
    - Most ARM processors support cache way locking
    - The PowerPC e500mc processor supports cache locking at block granularity

[Figure: sequential memory layout (Part. 1 (core 1) followed by Part. 2 (core 2)) vs. interleaved layout (Part. 1, Part. 2, Part. 1, Part. 2, Part. 1, Part. 2)]
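Allocating disjoint ways can be sketched as handing out non-overlapping bit masks over the 32 ways. The mask encoding and function name below are hypothetical and do not reflect the actual CPC register layout; the sketch only illustrates the bookkeeping.

```python
NUM_WAYS = 32     # L3 (CPC) associativity on the P4080, per the slides
WAY_SIZE_KB = 32  # capacity of one way, per the slides


def allocate_ways(ways_per_partition: dict) -> dict:
    """Assign each partition a contiguous, disjoint bit mask of cache ways.

    Bit i of a mask set means way i belongs to that partition
    (an illustrative encoding, not the hardware register format).
    """
    masks = {}
    next_way = 0
    for partition, count in ways_per_partition.items():
        if count <= 0 or next_way + count > NUM_WAYS:
            raise ValueError(f"cannot give {count} ways to partition {partition}")
        masks[partition] = ((1 << count) - 1) << next_way
        next_way += count
    return masks


masks = allocate_ways({1: 4, 2: 4, 3: 8, 4: 16})
for partition, mask in masks.items():
    size_kb = bin(mask).count("1") * WAY_SIZE_KB
    print(f"partition {partition}: mask={mask:#010x}, {size_kb} KB")
```

Partition 1 receives ways 0 to 3 here, matching the example on the previous slide, and up to 32 single-way partitions are possible.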

Page 21: Implementation issues: From the perspective of an RTOS [1/4]
• Challenges: paging
  - Page coloring assumes that the OS manages memory with fixed-size pages (normally 4KB)
  - Qplus-AIR deliberately avoids paging because timing predictability worsens when a TLB miss occurs in a paging scheme
• Memory management of Qplus-AIR
  - Managed with variable-sized pages rather than fixed 4KB pages
    - Kernel data/code, partition regions
    - Manages each region as one large page: 1 TLB entry for each region
  - The OS locks the entry in the TLB: forces all the mapping data to stay in the TLB
  - The size of memory for each application is configured by developers
  - The MMU is used to prevent cross-application memory accesses

[Figure: memory layout with the regions Kernel data, Kernel code, Partition 2, Partition 1, and Partition 3; example region sizes of 16MB and 64MB]

Page 22: Implementation issues: From the perspective of an RTOS [2/4]
• Memory management in P4080
  - Two levels of MMU
    - Hardware-managed L1 MMU
    - Software-managed L2 MMU
  - Each MMU consists of
    - TLB for variable-sized pages (VSP), 11 different page sizes (4KB~4GB)
    - TLB for 4KB fixed-size pages (FSP)
    - TLB locking for variable-sized pages
• Modified memory management of Qplus-AIR
  - To support page coloring, which is used to implement memory bank partitioning and set-based cache partitioning
  - Manage the application’s memory regions with 4KB granularity
  - Management of kernel regions was unchanged, to preserve the performance predictability of kernel execution

[ref.] PowerPC e500mc core reference manual

Page 23: Implementation issues: From the perspective of an RTOS [3/4]
• Overhead of paging
  - ‘Latency’ benchmark with varying data size and access pattern
    - Sequential access and random access of a linked list
  - Measure the average memory access latency

[Figure: paging overhead (sequential access). Average memory latency vs. data size (KB, 0 to 10000); series: no paging, paging]

[Figure: paging overhead (random access). Average memory latency vs. data size (KB, 0 to 10000); series: no paging, paging]

Up to 6% overhead when data size > 2MB

[note] TLB hit ratio = 98.43%; the L2 TLB has 512 entries

Up to 197% overhead when data size > 2MB
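One arithmetic observation about the 2 MB threshold, offered as a possible reading rather than the authors' stated analysis: a 512-entry TLB of 4 KB pages covers exactly 2 MB. The sketch below computes that reach and compares the TLB entries needed for a region under 4 KB paging versus a single variable-sized page; the names are illustrative.

```python
KB = 1024
MB = 1024 * KB

L2_TLB_ENTRIES = 512  # from the slide note
PAGE_4K = 4 * KB


def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes of memory that can be mapped at once by `entries` TLB entries."""
    return entries * page_size


def entries_needed(region_size: int, page_size: int) -> int:
    """TLB entries needed to map a region (ceiling division)."""
    return -(-region_size // page_size)


reach = tlb_reach(L2_TLB_ENTRIES, PAGE_4K)
print(f"4 KB-page reach of a 512-entry TLB: {reach // MB} MB")
print(f"16 MB region, 4 KB pages: {entries_needed(16 * MB, PAGE_4K)} entries")
print(f"16 MB region, one 16 MB page: {entries_needed(16 * MB, 16 * MB)} entry")
```

This also illustrates why the original Qplus-AIR design, with one locked large-page entry per region, never misses in the TLB, whereas 4 KB paging of the same region needs far more entries than the TLB holds.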

Page 24: Implementation issues: From the perspective of an RTOS [4/4]
• Analysis of overhead
  - The degradation is due to the MMU architecture of the e500mc core
    - L1 instruction and data TLBs and an L2 unified TLB
    - The L1 MMU is controlled as an inclusive cache of the L2 MMU
    - In the PowerPC e6500 core, the L1 and L2 MMUs are not inclusive
• Requirements for predictable paging
  - Some studies have focused on predictable paging*
  - COTS hardware provides means for implementing predictable paging: software-managed TLB or TLB locking

[Figure: L1 I-TLB, L1 D-TLB, and L2 TLB contents (TLB entries for code and for data) as the data size increases. Data entries evict (replace) instruction TLB entries in the L2 TLB; the corresponding L1 I-TLB entries are invalidated (inclusion property), causing an L1 I-TLB miss even if the code size is within the L1 I-TLB coverage]

* D. Hardy and I. Puaut. Predictable code and data paging for real time systems. In ECRTS, 2008.
* T. Ishikawa, T. Kato, S. Honda, and H. Takada. Investigation and improvement on the impact of TLB misses in real-time systems. In OSPERT, 2013.

Page 25: Resource partitioning mechanisms: Integration of partitioning schemes
• Four techniques with paging
  - Memory bus partitioning (RP_BUS), memory bank partitioning (RP_BANK), set-based cache partitioning (RP_$SET), and way-based cache partitioning (RP_$WAY)
• Integration of memory bus, memory bank, and set-based and way-based cache partitioning mechanisms
  - Note that way-based cache partitioning cannot be integrated with memory bank partitioning or set-based cache partitioning
• Possible integration options
  - Integration option #1: RP_BUS, RP_BANK, and RP_$SET
    - Restrictions on the number of available cache partitions
  - Integration option #2: RP_BUS and RP_$WAY
    - Contention on memory banks is unavoidable
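The compatibility rule above can be written down as a small configuration check. The function and constant names are hypothetical; the rule itself (way-based partitioning excludes bank and set-based partitioning) is the one stated on the slide.

```python
# Pairs of schemes that the slides say cannot be combined.
INCOMPATIBLE = {
    frozenset({"RP_$WAY", "RP_BANK"}),
    frozenset({"RP_$WAY", "RP_$SET"}),
}


def is_valid_integration(schemes) -> bool:
    """True if no incompatible pair of schemes is enabled together."""
    enabled = set(schemes)
    return not any(pair <= enabled for pair in INCOMPATIBLE)


option_1 = {"RP_BUS", "RP_BANK", "RP_$SET"}
option_2 = {"RP_BUS", "RP_$WAY"}
everything = option_1 | option_2

for name, cfg in [("option #1", option_1), ("option #2", option_2), ("all four", everything)]:
    print(f"{name}: {'ok' if is_valid_integration(cfg) else 'conflict'}")
```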

Page 26: Evaluations [1/5]
• Evaluation setup
  - Hardware platform
    - P4080 with 4 or 8 of its 8 cores active
  - Software platform
    - Qplus-AIR
  - Synthetic benchmarks
    - Latency: traverses a linked list, performing a read/write operation on each node; memory requests are made one at a time
    - Bandwidth: accesses memory in sequence with no data dependency between consecutive accesses; the CPU generates multiple memory requests in parallel, maximizing the memory-level parallelism (MLP) available in the memory system
  - Metric
    - Average memory access latency (ns): time to read/write one block (64B)
    - Average latency is normalized to the best case without resource contention
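The access pattern of the Latency benchmark (pointer chasing through a linked list, sequential or random) can be sketched as follows. This is an illustration of the pattern only: the real benchmark runs natively on the RTOS, and timing a Python loop says nothing about DRAM latency. All names are hypothetical.

```python
import random


def build_chain(n: int, sequential: bool, seed: int = 0):
    """Build a single cycle over n nodes as a 'next index' array.

    sequential=True links node i to i+1; otherwise nodes are linked in a
    shuffled order, so each hop lands at an unpredictable location.
    """
    order = list(range(n))
    if not sequential:
        random.Random(seed).shuffle(order)
    nxt = [0] * n
    for i in range(n):
        nxt[order[i]] = order[(i + 1) % n]
    return nxt, order[0]


def traverse(nxt, start: int, steps: int) -> int:
    """Follow the chain; each load depends on the previous one (one request at a time)."""
    idx = start
    for _ in range(steps):
        idx = nxt[idx]
    return idx


nxt, start = build_chain(1000, sequential=False)
end = traverse(nxt, start, 1000)
print(f"returned to start after one full lap: {end == start}")
```

The data dependency between hops is what makes this a latency test; the Bandwidth benchmark removes that dependency so that many requests can be in flight at once.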

Page 27: Evaluations [2/5]
• Evaluation setup
  - Two benchmark mixes
    - 4-core MIX
      - Causes contention on all the memory resources, to evaluate each partitioning mechanism and the integrated one
    - 8-core MIX
      - To show the limitation of set-based cache partitioning
  - Data size configuration

4-core MIX
  Core 1        Latency (512KB)
  Core 2, 3     Bandwidth (4MB)
  Core 4        Bandwidth (32MB)

8-core MIX
  Core 1, 2        Latency (512KB)
  Core 3, 4, 5, 6  Bandwidth (4MB)
  Core 7, 8        Bandwidth (32MB)

Data sizes
  Data size     Description                               Example (platform: 2MB LLC on a 4-core CPU)             Cache (LLC) hit rate
  LLC           Size of LLC divided by number of cores    2MB / 4 cores = 512KB                                   100%
  DRAM/small    Twice the size of LLC                     2MB × 2 = 4MB                                           0%
  DRAM/large    Significantly larger than LLC             Much larger than 2MB (32MB in our experimental setup)   0%

Page 28: Evaluations [3/5]

[Figure: normalized performance of core 1 to core 4 for configurations (a) to (e); 1 is the performance without interference]

         (a)    (b)    (c)    (d)    (e)
core 1   0.41   0.55   0.97   0.97   1.00
core 2   0.49   0.57   0.62   0.78   1.00
core 3   0.50   0.57   0.62   0.79   1.00
core 4   0.93   0.87   0.87   0.85   1.00

(a) WORST  (b) RP_BANK  (c) RP_BANK + RP_$SET  (d) RP_BANK + RP_$SET + RP_BUS  (e) BEST

• 4-core MIX, integration option #1
  - RP_BANK, RP_$SET, and RP_BUS
  - (b) RP_BANK: all the cores are able to access banks in parallel
  - (c) Adding RP_$SET ensures 512KB of L3 cache for the Latency (LLC) application running on core 1 (56% improvement compared to the worst case)
    - Moreover, core 1 issues fewer accesses to main memory, which helps performance on the other cores
  - (d) Adding RP_BUS: performance when all the techniques are put together
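The "56% improvement" quoted for core 1 can be checked against the table. The sketch below reads it as a gain of 56 points of normalized performance (0.97 minus 0.41); that reading is an assumption that happens to match the numbers, and the helper name is hypothetical.

```python
CONFIGS = ["WORST", "RP_BANK", "RP_BANK+RP_$SET", "RP_BANK+RP_$SET+RP_BUS", "BEST"]

# Normalized performance from the 4-core MIX table (1.00 = no interference).
RESULTS = {
    "core1": [0.41, 0.55, 0.97, 0.97, 1.00],
    "core2": [0.49, 0.57, 0.62, 0.78, 1.00],
    "core3": [0.50, 0.57, 0.62, 0.79, 1.00],
    "core4": [0.93, 0.87, 0.87, 0.85, 1.00],
}


def gain_points(core: str, config: str, baseline: str = "WORST") -> int:
    """Gain over the baseline, in points of normalized performance."""
    row = RESULTS[core]
    delta = row[CONFIGS.index(config)] - row[CONFIGS.index(baseline)]
    return round(delta * 100)


for core in RESULTS:
    print(core, gain_points(core, "RP_BANK+RP_$SET"),
          gain_points(core, "RP_BANK+RP_$SET+RP_BUS"))
```

The same table shows core 4 losing a few points once partitioning is enabled, so the gains are not uniform across the mix.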

Page 29: Evaluations [4/5]
• 4-core MIX, integration option #2
  - RP_$WAY and RP_BUS
  - RP_BANK is inapplicable
    - In this benchmark, memory accesses are not concentrated on one bank, since RP_$WAY allocates memory to each core sequentially
    - However, a worst case could arise depending on task behavior
  - RP_$WAY vs. RP_$SET
    - Paging overhead on the RTOS degrades performance
    - 3%, 16%, 17%, and 13% for the applications on core 1, 2, 3, and 4, respectively

[Figure: normalized performance of core 1 to core 4 for configurations (a) to (d); a second panel shows the (c) RP_BANK + RP_$SET bars; 1 is the performance without interference]

         (a)    (b)    (c)    (d)
core 1   0.41   1.00   1.00   1.00
core 2   0.49   0.78   0.91   1.00
core 3   0.50   0.79   0.91   1.00
core 4   0.93   1.01   0.89   1.00

(a) WORST  (b) RP_$WAY  (c) RP_$WAY + RP_BUS  (d) BEST

Page 30: Evaluations [5/5]
• 8-core MIX, integration options #1 & #2
  - Restrictions on the number of possible cache partitions
    - RP_$SET: 4 partitions; RP_$WAY: 32 partitions on the P4080 platform
    - Performance of Latency (LLC) is about 64% and 88% with RP_$SET and RP_$WAY, respectively
  - Overhead of paging
    - Compare the performance in (b) and (d), or (c) and (e)

[Figure: normalized performance of core 1 to core 8 for configurations (a) to (f); 1 is the performance without interference]

         (a)    (b)    (c)    (d)    (e)    (f)
core 1   0.37   0.64   0.64   0.88   0.87   1.00
core 2   0.37   0.64   0.63   0.88   0.86   1.00
core 3   0.30   0.42   0.54   0.52   0.71   1.00
core 4   0.30   0.42   0.54   0.52   0.71   1.00
core 5   0.30   0.42   0.54   0.53   0.71   1.00
core 6   0.30   0.42   0.54   0.53   0.71   1.00
core 7   0.82   0.75   0.74   0.94   0.79   1.00
core 8   0.82   0.74   0.73   0.94   0.79   1.00

(a) WORST  (b) RP_BANK + RP_$SET  (c) RP_BANK + RP_$SET + RP_BUS  (d) RP_$WAY  (e) RP_$WAY + RP_BUS  (f) BEST

Page 31: Conclusions & Future Work
• Conclusions
  - Qplus-AIR, an ARINC 653 compliant RTOS
  - Comprehensive shared resource partitioning implementation on an ARINC 653 compliant RTOS, Qplus-AIR
    - Issues in implementing and combining multiple resource partitioning mechanisms
    - The unique challenges we encountered because the RTOS did not support Linux-like dynamic paging
• Future Work
  - Predictable paging
  - Evaluation with real-world applications

Page 33: References [1/2]
[1] Airlines Electronic Engineering Committee, Avionics Application Software Standard Interface, ARINC Specification 653 Part 1, 2006.
[2] BIOS and kernel developer’s guide for AMD family 15h processors, March 2012.
[3] ARM Cortex-A53 Technical Reference Manual, 2014.
[4] P4080 QorIQ Integrated Processor Hardware Specifications, Feb 2014.
[5] Certification Authorities Software Team, Position Paper CAST-32A: Multi-core Processors, 2016.
[6] QorIQ T2080 Reference Manual, 2016.
[7] M. Chisholm, B. C. Ward, N. Kim, and J. H. Anderson. Cache sharing and isolation tradeoffs in multicore mixed-criticality systems. In RTSS, 2015.
[8] J. Flodin, K. Lampka, and W. Yi. Dynamic budgeting for settling DRAM contention of co-running hard and soft real-time tasks. In SIES, 2014.
[9] D. Hardy and I. Puaut. Predictable code and data paging for real time systems. In ECRTS, 2008.
[10] T. Ishikawa, T. Kato, S. Honda, and H. Takada. Investigation and improvement on the impact of TLB misses in real-time systems. In OSPERT, 2013.
[11] H. Kim, A. Kandhalu, and R. Rajkumar. A coordinated approach for practical OS-level cache management in multi-core real-time systems. In ECRTS, 2013.
[12] T. Kim, D. Son, C. Shin, S. Park, D. Lim, H. Lee, B. Kim, and C. Lim. Qplus-AIR: A DO-178B certifiable ARINC 653 RTOS. In The 8th ISET, 2013.

Page 34: References [2/2]
[13] R. Mancuso, R. Dudko, E. Betti, M. Cesati, M. Caccamo, and R. Pellizzoni. Real-time cache management framework for multi-core architectures. In RTAS, 2013.
[14] M. D. Bennett and N. C. Audsley. Predictable and efficient virtual addressing for safety-critical real-time systems. In ECRTS, 2001.
[15] J. Nowotsch and M. Paulitsch. Leveraging multi-core computing architectures in avionics. In EDCC, 2012.
[16] J. Nowotsch, M. Paulitsch, D. Buhler, H. Theiling, S. Wegener, and M. Schmidt. Multi-core interference-sensitive WCET analysis leveraging runtime resource capacity enforcement. In ECRTS, 2014.
[17] S. A. Panchamukhi and F. Mueller. Providing task isolation via TLB coloring. In RTAS, 2015.
[18] M. K. Qureshi and Y. N. Patt. Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In MICRO, 2006.
[19] R. E. Kessler and M. D. Hill. Page placement algorithms for large real-indexed caches. In ACM Trans. on Comp. Sys., 1992.
[20] L. Sha, M. Caccamo, R. Mancuso, J.-E. Kim, and M.-K. Yoon. Single core equivalent virtual machines for hard real-time computing on multicore processors, white paper. 2014.
[21] N. Suzuki, H. Kim, D. de Niz, B. Anderson, L. Wrage, M. Klein, and R. Rajkumar. Coordinated bank and cache coloring for temporal protection of memory accesses. In ICCSE, 2013.
[22] H. Yun, R. Mancuso, Z.-P. Wu, and R. Pellizzoni. PALLOC: DRAM bank-aware memory allocator for performance isolation on multicore platforms. In RTAS, 2014.
[23] H. Yun, G. Yao, R. Pellizzoni, M. Caccamo, and L. Sha. Memory bandwidth management for efficient performance isolation in multi-core platforms. IEEE Transactions on Computers, 65:562–576, 2015.