virtual memory-fixed - ccs.neu.edu · • a simple mechanism for address translation • maps a...

103
CS 5600 Computer Systems Lecture 7: Virtual Memory

Upload: vunguyet

Post on 26-Aug-2018

227 views

Category:

Documents


0 download

TRANSCRIPT

CS5600ComputerSystems

Lecture7:VirtualMemory

• MotivationandGoals• BaseandBounds• Segmentation• PageTables• TLB• Multi-levelPageTables• SwapSpace

2

MainMemory

• Mainmemoryisconceptuallyverysimple– Codesitsinmemory– Dataiseitheronastackoraheap– Everythinggetsaccessedviapointers– Datacanbewrittentoorreadfromlongtermstorage

• Memoryisasimpleandobviousdevice– SowhyismemorymanagementoneofthemostcomplexfeaturesinmodernOSes?

3

ProtectionandIsolation

• Physicalmemorydoesnotofferprotection orisolation

40x00000000

0xFFFFFFFFKernelMemory

Process1w/SecretData

PhysicalMemory

EvilProcessI’minyourprocess,stealing

yourdata;)

Ohsorry,Ididn’tmeantooverwriteyourtask_structs ;)

CompilationandProgramLoading

• Compiledprogramsincludefixedpointeraddresses

• Example:000FE4D8<foo>:…000FE21A: pusheax000FE21D: pushebx000FE21F: call0x000FE4D8• Problem:whatiftheprogramisnotloadedatcorrespondingaddress?

50x00000000

0xFFFFFFFFKernelMemory

Process1

PhysicalMemory

Process2

Addr offoo():0x000FE4D8

Addr offoo():0x0DEB49A3

PhysicalMemoryhasLimitedSize

• RAMischeap,butnotascheapassolidstateorcloudstorage

• WhathappenswhenyourunoutofRAM?

60x00000000

0xFFFFFFFFKernelMemory

Process1

Process2

Process3

Process4

Process5

Physicalvs.VirtualMemory• Clearly,physicalmemoryhaslimitations– Noprotectionorisolation– Fixedpointeraddresses– Limitedsize– Etc.

• Virtualizationcansolvetheseproblems!– Aswellasenableadditional,coolfeatures

7

AToyExample• Whatdowemeanbyvirtualmemory?– Processesusevirtual (orlogical)addresses– Virtualaddressesaretranslatedtophysicaladdresses

0x0000

0xFFFF KernelMemory

Process1

PhysicalMemory(Reality)

Process’ViewofVirtualMemory

0x0000

0xFFFF

Process2

Process3

Process1

Allthememorybelongstome!

IammasterofallIsurvey!

PhysicalAddress

VirtualAddress

MagicalAddressTranslationBlackBox

ImplementingAddressTranslation• Inasystemwithvirtualmemory,eachmemoryaccessmustbetranslated

• CantheOSperformaddresstranslation?– Onlyifprogramsareinterpreted

• Modernsystemshavehardwaresupportthatfacilitatesaddresstranslation– ImplementedintheMemoryManagementUnit(MMU)oftheCPU

– CooperateswiththeOStotranslatevirtualaddressesintophysicaladdresses

9

VirtualMemoryImplementations• TherearemanywaystoimplementanMMU– Baseandboundregisters– Segmentation– Pagetables– Multi-levelpagetables

• Wewilldiscusseachoftheseapproaches– Howdoesitwork?–Whatfeaturesdoesitoffer?–Whatarethelimitations?

10

Old,simple,limitedfunctionality

Modern,complex,lotsoffunctionality

GoalsofVirtualMemory• Transparency– Processesareunawareofvirtualization

• Protectionandisolation• Flexiblememoryplacement– OSshouldbeabletomovethingsaroundinmemory

• Sharedmemoryandmemorymappedfiles– Efficientinterprocess communication– Sharedcodesegments,i.e.dynamiclibraries

• Dynamicmemoryallocation– Growheapsandstacksondemand,noneedtopre-allocatelargeblocksofemptymemory

• Supportforsparseaddressspaces• Demand-basedpaging– Createtheillusionofnear-infinitememory

11

• MotivationandGoals• BaseandBounds• Segmentation• PageTables• TLB• Multi-levelPageTables• SwapSpace

12

BaseandBoundsRegisters• Asimplemechanismforaddresstranslation• Mapsacontiguousvirtualaddressregiontoacontiguousphysicaladdressregion

130x0000

0xFFFF KernelMemory

Process1

PhysicalMemory

0x00FF

0x10FFProcess1

Process’ViewofVirtualMemory

0x0001

0x1001

Register Value

EIP 0x0023

ESP 0x0F76

BASE 0x00FF

BOUND 0x1000

BaseandBoundsExample

14

0x0000

0xFFFF KernelMemory

Process1

PhysicalMemory

0x00FF

0x10FFProcess1

Process’ViewofVirtualMemory

0x0001

0x1001

Register Value

EIP 0x0023

ESP 0x0F76

BASE 0x00FF

BOUND 0x1000

0x0023mov eax,[esp]

1)Fetchinstruction0x0023+0x00FF=0x0122

2)Translatememoryaccess0x0F76+0x00FF=0x1075

3)Movevaluetoregister[0x1075]à eax

1

21

2

ProtectionandIsolation

15

0x0000

0xFFFF KernelMemory

Process1

PhysicalMemory

0x00FF

0x10FF

Process1

Process’ViewofVirtualMemory

0x0001

0x1001

Register Value

EIP 0x0023

ESP 0x0F76

BASE 0x00FF

BOUND 0x1000

0x0023mov eax,[0x4234]

1)Fetchinstruction0x0023+0x00FF=0x0122

2)Translatememoryaccess0x4234+0x00FF=0x43330x4333>0x10FF

(BASE+BOUND)RaiseProtectionException!

1

2

1

2

ImplementationDetails

• BASEandBOUNDareprotectedregisters– OnlycodeinRing0maymodifyBASEandBOUND– Preventsprocessesfrommodifyingtheirownsandbox

• EachCPUhasoneBASEandoneBOUNDregister– JustlikeESP,EIP,EAX,etc…– Thus,BASEandBOUNDmustbesavedarestoredduringcontextswitching

16

BaseandBoundPseudocode1. PhysAddr =VirtualAddress +BASE2. if (PhysAddr >=BASE+BOUND)3. RaiseException(PROTECTION_FAULT)4. Register=AccessMemory(PhysAddr)

17

• Simplehardwareimplementation• Simpletomanageeachprocess’virtualspace

• Processescanbeloadedatarbitraryfixedaddresses

• Offersprotectionandisolation• Offersflexibleplacementofdatainmemory

AdvantagesofBaseandBound

180x00000000

0xFFFFFFFFKernelMemory

Process1

PhysicalMemory

Process2

I’mloadedataddress0x00AF

No,I’mloadedataddress0x00AF

PreviousBASEà 0x00FFNewBASEà 0x10A0

LimitationsofBaseandBound• Processescanoverwritetheirowncode– Processesaren’tprotectedfromthemselves

• Nosharingofmemory– Code(read-only)ismixedinwithdata(read/write)

• Processmemorycannotgrowdynamically– Mayleadtointernalfragmentation

190x00000000

0xFFFFFFFFKernelMemory

/bin/bash

PhysicalMemory

/bin/bash

Code

Code

Data

DataCodeisduplicatedinmemory:(

InternalFragmentation• BOUNDdeterminesthemaxamountofmemoryavailabletoaprocess

• Howmuchmemorydoweallocate?– Emptyspaceleadstointernalfragmentation

• Whatifwedon’tallocateenough?– IncreasingBOUNDaftertheprocessisrunningdoesn’thelp

20

PhysicalMemory

Code

Heap

Stack

Wastedspace=internalfragmentation

Heap

Stack

IncreasingBOUNDdoesn’tmovethestackawayfromtheheap

• MotivationandGoals• BaseandBounds• Segmentation• PageTables• TLB• Multi-levelPageTables• SwapSpace

21

TowardsSegmentedMemory• HavingasingleBASEandasingleBOUNDmeanscode,stack,andheapareallinonememoryregion– Leadstointernalfragmentation– Preventsdynamicallygrowingthestackandheap

• Segmentation isageneralizationofthebaseandboundsapproach– Giveeachprocessseveralpairsofbase/bounds• Mayormaynotbestoredindedicatedregisters

– Eachpairdefinesasegment– Eachsegmentcanbemovedorresizedindependently

22

SegmentationDetails• Thecodeanddataofaprocessgetsplitintoseveralsegments– 3segmentsiscommon:code,heap,andstack– Somearchitecturessupport>3segmentsperprocess

• Eachprocessviewsitssegmentsasacontiguousregionofmemory– Butinphysicalmemory,thesegmentscanbeplacedinarbitrarylocations

• Question:givenavirtualaddress,howdoestheCPUdeterminewhichsegmentisbeingaddressed?

23

SegmentsandOffsets• Keyidea:splitvirtualaddressesintoasegmentindexandanoffset

• Example:supposewehave14-bitaddresses– Top2bitsarethesegment– Bottom12bitsaretheoffset

• 4possiblesegmentsperprocess– 00,01,10,11

24

13 12 11 10 9 8 7 6 5 4 3 2 1 0

Segment Offset

SeparationofResponsibility• TheOSmanagessegmentsandtheirindexes– Createssegmentsfornewprocessesinfreephysicalmemory

– Buildsatablemappingsegmentsindexestobaseaddressesandbounds

– Swapsoutthetablesandsegmentregistersduringcontextswitches

– Freessegmentsfromphysicalmemory

• TheCPUtranslatesvirtualaddressestophysicaladdressesondemand– Usesthesegmentregisters/segmenttablesbuiltbytheOS

25

SegmentationExample

26

0x0000

0x3FFF

Process’ViewofVirtualMemory

Code

Segment Index Base Bound

CS (Code) 00 0x0020 0x0100

HS (Heap) 01 0xB000 0x0100

SS (Stack) 10 0x0400 0x0100

0x0023mov eax,[esp]1)Fetchinstruction0x0023(EIP)- 00000000100011

0x0020(CS)+0x0023=0x00432)Translatememoryaccess0x2015(ESP)– 10000000010101

0x0400(SS)+ 0x0015=0x04150x1000

0x0000

0xFFFFKernelMemory

PhysicalMemory

Code

Heap

Stack

Code

Heap

Stack

0x0020

0xB1000xB000

0x01200x04000x0500

0x2000

0x3000

Heap

Stack

SegmentationPseudocode1. //gettop2bitsof14-bitVA2. Segment=(VirtualAddress &SEG_MASK)>>SEG_SHIFT3. //nowgetoffset4. Offset=VirtualAddress &OFFSET_MASK5. if (Offset>=Bounds[Segment])6. RaiseException(PROTECTION_FAULT)7. else8. PhysAddr =Base[Segment]+Offset9. Register=AccessMemory(PhysAddr)

27

MoreonSegments• Inthepreviousexample,weusea14-bitaddressspacewith2bitsreservedforthesegmentindex– Thislimitsusto4segmentsperprocess– Eachsegmentis212 =4KBinsize

• Realsegmentationsystemstendtohave1. Morebitsforthesegmentsindex(16-bitsforx86)2. Morebitsfortheoffset(16-bitsforx86)

• However,segmentsarecourse-grained– Limitednumberofsegmentsperprocess(typically~4)

28

SegmentPermissions• ManyCPUs(includingx86)supportpermissionsonsegments– Read,write,andexecutable

• Disallowedoperationstriggeranexception– E.g.Tryingtowritetothecodesegment

29

0x0000

0x3FFF

Process1’sViewofVirtualMemory

Code

Index Base Bound Permissions

00 0x0020 0x0100 RX

01 0xB000 0x0100 RW

10 0x0400 0x0100 RW11 0xE500 0x100 R

0x1000

0x2000

0x3000

Heap

Stack

.rodata

x86Segments• Intel80286introducedsegmentedmemory– CS– codesegmentregister– SS– stacksegmentregister– DS– datasegmentregister– ES,FS,GS– extrasegmentregisters

• In16-bit(realmode)x86assembly,segment:offsetnotationiscommonmov [ds:eax],42 //move42tothedatasegment,offset

//bythevalueineaxmov [esp],23 //usestheSSsegmentbydefault

30

x86SegmentsToday• Segmentregistersandtheirassociatedfunctionalitystillexistintoday’sx86CPUs

• However,the80386introducedpagetables– ModernOSes “disable”segmentation– TheLinuxkernelsetsupfoursegmentsduringbootup

31

SegmentName Description Base Bound Ring

KERNEL_CS Kernelcode 0 4GB 0

KERNEL_DS Kerneldata 0 4GB 0

USER_CS Usercode 0 4GB 3

USER_DS Userdata 0 4GB 3

Pagesareusedtovirtualizememory,notsegments

Usedtolabelpageswithprotectionlevels

WhatisaSegmentationFault?• Ifyoutrytoread/writememoryoutsideasegmentassignedtoyourprocess

• Examples:– char buf[5];

strcpy(buf,“HelloWorld”);return 0;//whydoesitseg faultwhenyoureturn?

• Today“segmentationfault”isananachronism– Allmodernsystemsusepagetables,notsegments

32

SharedMemory

33

0x0000

0x3FFF

Process1’sViewofVirtualMemory

Code

Index Base Bound

00 0x0020 0x0100

01 0xB000 0x0100

10 0x0400 0x0100

11 0xE500 0x0300

0x1000

0x2000

0x3000

Heap

Stack

0x0000

0x3FFF

Process2’sViewofVirtualMemory

0x1000

0x2000

0x3000

Code

Heap

Stack

SharedData

SharedData

Index Base Bound

00 0x0020 0x0100

01 0xC000 0x0100

10 0x0600 0x0100

11 0xE500 0x0300

Same00and01physicalsegments

Different01and10physicalsegments

AdvantagesofSegmentation• Alltheadvantagesofbaseandbound• Bettersupportforsparseaddressspaces– Code,heap,andstackareinseparatesegments– Segmentsizesarevariable– Preventsinternalfragmentation

• Supportssharedmemory• Persegmentpermissions– Preventsoverwritingcode,orexecutingdata

34

ExternalFragmentation• Problem:variablesizesegmentscanleadtoexternalfragmentation– Memorygetsbrokenintorandomsize,non-contiguouspieces

• Example:thereisenoughfreememorytostartanewprocess– Butthememoryisfragmented:(

• Compactioncanfixtheproblem– Butitisextremelyexpensive

35

KernelMemory

PhysicalMemory

Code

Heap

Stack

Code

Heap

Stack

Heap

Stack

Code

Code

HeapStack

Code

Heap

Stack

• MotivationandGoals• BaseandBounds• Segmentation• PageTables• TLB• Multi-levelPageTables• SwapSpace

36

TowardsPagedMemory• Segmentsimproveonbaseandbound,buttheystillaren’tgranularenough– Segmentsleadtoexternalfragmentation

• Thepagedmemorymodelisageneralizationofthesegmentedmemorymodel– Physicalmemoryisdividedupintophysicalpages(a.k.a.frames)offixedsizes

– Codeanddataexistinvirtualpages– Atablemapsvirtualpagesà physicalpages(frames)

37

ToyExample• Supposewehavea64-bytevirtualaddressspace– Letsspecify16bytesperpage

• Howmanybitsdovirtualaddressesneedtobeinthissystem?– 26 =64bytes,thus6bitaddresses

Page00

64

16

32

48

Page1

Page2

Page3

VirtualMemory

38

• Howmanybitsofthevirtualaddressareneededtoselectthephysicalpage?– 64bytes/16bytesperpage=4pages– 22 =4,thus2bitstoselectthepage

5 4 3 2 1 0

VirtualPage# Offset

ToyExample,Continued

Page00

64

16

32

48

Page1

Page2

Page3

VirtualMemory

Page00

64

16

32

48

Page1

Page2

Page3

PhysicalMemory

Page4

128

96

112

Page5

Page6

Page7

80

VirtualPage# PhysicalPage#

00(0) 010(2)

01(1) 111(7)

10(2) 100(4)

11(3) 001(1)

mov eax,[21]

Translation21– 010101

117– 1110101

ConcreteExample• Assumea32-bitvirtualandphysicaladdressspace– Fixthepagesizeat4KB(4096bytes,212)

• Howmanytotalpageswilltherebe?– 232 /212 =1048576 (220)

• Howmanybitsofavirtualaddressareneededtoselectthephysicalpage?– 20bits(sincethereare1048576totalpages)

• Assumethateachpagetableentryis4byteslarge– Howbigwillthepagetablebe?– 1048586*4bytes=4MBofspace

40

• Eachprocessneedsitsownpagetable• 100processes=400MBofpagetables

ConcreteExample,Continued• Process1requires:

– 2 KBforcode(1page)– 7KBforstack(2pages)– 12KBforheap(3pages)

41

0

232

Process1’sViewofVirtualMemory

Code

Heap

Stack

HeapHeap

Stack

VPN PFN Valid?

0…i - 1 whatever 0

i d 1

i +1…j– 1 whatever 0

j b 1

j+1 f 1

j+2 e 1

j+3…k- 1 whatever 0

k a 1

k+1 c 1

Pagei

PagejPagej+1Pagej+2

PagekPagek+1

0

230KernelMemory

PhysicalMemory

Code

Heap

Stack

Heap

Heap

Stack

Pagea

PagebPagec

PagedPagee

Pagef

Thevastmajorityofeachprocess’page

tableisempty,i.e.thetableissparse

Heap

Heap

Pagej+3

Pageg

PageTableImplementation• TheOScreatesthepagetableforeachprocess– Pagetablesaretypicallystoredinkernelmemory– OSstoresapointertothepagetableinaspecialregisterintheCPU(CR3registerinx86)

– Oncontextswitch,theOSswapsthepointerfortheoldprocessestableforthenewprocessestable

• TheCPUusesthepagetabletotranslatevirtualaddressesintophysicaladdresses

42

x86PageTableEntry• Onx86,pagetableentries(PTE)are4bytes

43

31- 12 11- 9 8 7 6 5 4 3 2 1 0

PageFrameNumber (PFN) Unused G PAT D A PCD PWT U/S W P

• Bitsrelatedtopermissions– W– writablebit– isthepagewritable,orread-only?– U/S– user/supervisorbit– canuser-modeprocessesaccessthispage?

• Hardwarecachingrelatedbits:G,PAT,PCD,PWT• Bitsrelatedtoswapping– P– presentbit– isthispageinphysicalmemory?– A– accessedbit– hasthispagebeenreadrecently?– D– dirtybit– hasthispagebeenwrittenrecently?

Wewillrevisittheselaterinthelecture…

PageTablePseudocode1. //ExtracttheVPNfromthevirtualaddress2. VPN=(VirtualAddress &VPN_MASK)>>SHIFT3. //Formtheaddressofthepage-tableentry(PTE)4. PTEAddr =PTBR+(VPN*sizeof(PTE))5. //FetchthePTE6. PTE=AccessMemory(PTEAddr)7. if (PTE.Valid ==False)//Checkifprocesscanaccessthepage8. RaiseException(SEGMENTATION_FAULT)9. elseif(CanAccess(PTE.ProtectBits)==False)10. RaiseException(PROTECTION_FAULT)11. //AccessisOK:formphysicaladdressandfetchit12. offset=VirtualAddress &OFFSET_MASK13. PhysAddr =(PTE.PFN<<PFN_SHIFT)|offset14. Register=AccessMemory(PhysAddr)

44

TricksWithPermissionsandSharedPages

• Recallhowfork()isimplemented– OScreatesacopyofallpagescontrolledbytheparent

• fork()isaslooooow operation– Copyingallthatmemorytakesalooooong time

• Canweimprovetheefficiencyoffork()?– Yes,ifwearecleverwithsharedpagesandpermissions!

45

Copy-on-Write• Keyidea:ratherthancopyalloftheparentspages,createanewpagetableforthechildthatmapstoalloftheparentspages– Markallofthepagesasread-only– Ifparentorchildwritestoapage,aprotectionexceptionwillbetriggered

– TheOScatchestheexception,makesacopyofthetargetpage,thenrestartsthewriteoperation

• Thus,allunmodifieddataisshared– Onlypagesthatarewrittentogetcopied,ondemand

46

Copy-on-WriteExample

47

0

230KernelMemory

PhysicalMemory

Code

Heap

StackPagea

Paged

Pagef

Function VPN PFN Writable?

Code i d 0

Heap j b 1

Stack k a 1

ParentsPageTable

Function VPN PFN Writable?

Code i d 0

Heap j b 0

Stack k a 0

ChildsPageTable

0

0

ProtectionException

Stack

Pagem1

1

Stackm

Zero-on-Reference

• Howmuchphysicalmemorydoweneedtoallocatefortheheapofanewprocess?– Zerobytes!

• Whenaprocesstouchestheheap– SegmentationfaultintoOSkernel– Kernelallocatessomememory– Zeros thememory• Avoidaccidentallyleakinginformation!

– Restarttheprocess

48

AdvantagesofPageTables• Alltheadvantagesofsegmentation• Evenbettersupportforsparseaddressspaces– Eachpageisrelativelysmall– Fine-grainedpageallocationstoeachprocess– Preventsinternalfragmentation

• Allpagesarethesamesize– Eachtokeeptrackoffreememory(say,withabitmap)– Preventsexternalfragmentation

• Persegmentpermissions– Preventsoverwritingcode,orexecutingdata

49

ProblemsWithPageTables• Pagetablesarehuge– Ona32-bitmachinewith4KBpages,eachprocess’tableis4MB

– Ona64-bitmachinewith4KBpages,thereare240entriespertableà 240*4bytes=4TB

– Andthevastmajorityofentriesareempty/invalid!• Pagetableindirectionaddssignificantoverheadtoallmemoryaccesses

50

PageTablesareSlow0x1024mov [edi +eax *4], 0x00x1028inc eax0x102Ccmp eax,0x03E80x1030jne 0x1024

• Howmanymemoryaccessesoccurduringeachiterationoftheloop?– 4instructionsarereadfrommemory– [edi +eax *4]writestoonelocationinmemory– 5pagetablelookups

• Eachmemoryaccessmustbetranslated• …andthepagetablesthemselvesareinmemory

• Naïvepagetableimplementationdoublesmemoryaccessoverhead 51

• MotivationandGoals• BaseandBounds• Segmentation• PageTables• TLB• Multi-levelPageTables• SwapSpace

52

Problem:PageTableSpeed• Pagetablesgiveusagreatdealofflexibilityandgranularitytoimplementvirtualmemory

• However,pagetablesarelarge,thustheymustgoinRAM(asopposedtoinaCPUregister)– Eachvirtualmemoryaccessmustbetranslated– Eachtranslationrequiresatablelookupinmemory– Thus,memoryoverheadisdoubled

• Howcanweusepagetableswithoutthismemorylookupoverhead?

53

Caching• Keyidea:cachepagetableentriesdirectlyintheCPU’sMMU– TranslationLookaside Buffer(TLB)– Shouldbecalledaddresstranslationcache

• TLBstoresrecentlyusedPTEs– SubsequentrequestsforthesamevirtualpagecanbefilledfromtheTLBcache

• Directlyaddressesspeedissueofpagetables– On-dieCPUcacheisvery,veryfast– TranslationsthathittheTLBdon’tneedtobelookedupfromthepagetableinmemory

54

ExampleTLBEntry

• VPN&PFN– virtualandphysicalpages• G– isthispageglobal(i.e.accessiblebyallprocesses)?

• ASID– addressspaceID• D– dirtybit– hasthispagebeenwrittenrecently?• V– validbit– isthisentryintheTLBvalid?• C– cachecoherencybits– formulti-coresystems

55

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

ASIDGVirtualPageNumber(VPN)

VDCPhysicalFrameNumber (PFN)

Moreonthislater…

TLBControlFlowPsuedocode1. VPN=(VirtualAddress&VPN_MASK)>>SHIFT2. (Success,TlbEntry)=TLB_Lookup(VPN)3. if (Success==True)//TLBHit4. if (CanAccess(TlbEntry.ProtectBits)==True)5. Offset=VirtualAddress&OFFSET_MASK6. PhysAddr =(TlbEntry.PFN<<SHIFT)|Offset7. AccessMemory(PhysAddr)8. else9. RaiseException(PROTECTION_FAULT)10. else //TLBMiss11. PTEAddr =PTBR+(VPN*sizeof(PTE))12. PTE=AccessMemory(PTEAddr)13. if (PTE.Valid ==False)14. RaiseException(SEGMENTATION_FAULT)15. elseif(CanAccess(PTE.ProtectBits)==False)16. RaiseException(PROTECTION_FAULT)17. TLB_Insert(VPN,PTE.PFN,PTE.ProtectBits)18. RetryInstruction() 56

Loadthepagetableentry

frommemory,addittotheTLB,andretry

Makesurewehave

permission,then

proceed

FastPath

SlowPath

ReadinganArray(noTLB)• Supposewehavea10KBarrayofintegers– Assume4KBpages

• WithnoTLB,howmanymemoryaccessesarerequiredtoreadthewholearray?– 10KB/4=2560integersinthearray– Eachrequiresonepagetablelookup,onememoryread– 5120reads,plusmorefortheinstructionsthemselves

57

ReadinganArray(withTLB)• Sameexample,nowwithTLB– 10KBintegerarray– 4KBpages– AssumetheTLBstartsoffcold(i.e.empty)

• Howmanymemoryaccessestoreadthearray?– 2560toreadtheintegers– 3pagetablelookups– 2563totalreads– TLBhitrate:96%

58

Process1’sViewofVirtualMemory

ArrayData

ArrayDataArrayData

PagejPagej+1Pagej+2

VPN PFN

j a

j+1 b

j+2 c

TLB

Locality• TLB,likeanycache,iseffectivebecauseoflocality– Spatiallocality:ifyouaccessmemoryaddressx,itislikelyyouwillaccessx+1 soon• Mostofthetime,x andx+1 areinthesamepage

– Temporallocality:ifyouaccessmemoryaddressx,itislikelyyouwillaccessx againsoon• Thepagecontainingx willstillbeintheTLB,hopefully

59

BeCarefulWithCaching• Recall:TLBentrieshaveanASID(addressspaceID)field.Whatisthisfor?– Here’sahint:thinkaboutcontextswitching

60

VPN PFN

i d

j b

k a

Process1’sPageTable

VPN PFN

i r

j u

k s

Process2PageTableVPN PFN

i d

J b

k a

TLB

VPNsarethesame,butPFNmappingshavechanged!

• Problem:TLBentriesmaynotbevalidafteracontextswitch

PotentialSolutions1. CleartheTLB(markallentriesasinvalid)after

eachcontextswitch– Works,butforceseachprocesstostartwithacold

cache– Onlysolutiononx86(until~2008)

2. AssociateanASID(addressspaceID)witheachprocess– ASIDisjustlikeaprocessIDinthekernel– CPUcancomparetheASIDoftheactiveprocessto

theASIDstoredineachTLBentry– Iftheydon’tmatch,theTLBentryisinvalid 61

ReplacementPolicies• OnmanyCPUs(likex86),theTLBismanagedbythehardware

• Problem:spaceintheTLBislimited(usuallyKB)– OncetheTLBfillsup,howdoestheCPUdecidewhatentriestoreplace(evict)?

• Typicalreplacementpolicies:– FIFO:easytoimplement,butcertainaccesspatternsresultinworst-caseTLBhitrates

– Random:easytoimplement,fair,butsuboptimalhitrates

– LRU (LeastRecentlyUsed):algorithmtypicallyusedinpractice 62

Hardwarevs.SoftwareManagement• Thusfar,discussionhasfocusedonhardwaremanagedTLBs(e.g.x86)

PTE=AccessMemory(PTEAddr)TLB_Insert(VPN,PTE.PFN,PTE.ProtectBits)

– CPUdictatesthepagetableformat,readspagetableentriesfrommemory

– CPUmanagesallTLBentries• However,softwaremanagedTLBsarealsopossible(e.g.MIPSandSPARC)

63

SoftwareManagedTLBPseudocode1. VPN=(VirtualAddress &VPN_MASK)>>SHIFT2. (Success,TlbEntry)=TLB_Lookup(VPN)3. if (Success==True)//TLBHit4. if (CanAccess(TlbEntry.ProtectBits)==True)5. Offset=VirtualAddress &OFFSET_MASK6. PhysAddr =(TlbEntry.PFN<<SHIFT)|Offset7. Register=AccessMemory(PhysAddr)8. else9. RaiseException(PROTECTION_FAULT)10. else //TLBMiss11. RaiseException(TLB_MISS)

64

Thehardwaredoesnot:1. Trytoreadthepagetable2. Add/removeentriesfromtheTLB

ImplementingSoftwareTLBs• Keydifferencesvs.hardwaremanagedTLBs– CPUdoesn’tinsertentriesintotheTLB– CPUhasnoabilitytoreadpagetablesfrommemory

• OnTLBmiss,theOSmusthandletheexception– Locatethecorrectpagetableentryinmemory– InsertthePTEintotheTLB(evictifnecessary)– TelltheCPUtoretrythepreviousinstruction

• Note:TLBmanagementinstructionsareprivileged– OnlythekernelcanmodifytheTLB

65

ComparingHardwareandSoftwareTLBsHardwareTLB

• Advantages– Lessworkforkerneldevelopers,

CPUdoesalotofworkforyou

• Disadvantages– Pagetabledatastructureformat

mustconformtohardwarespecification

– LimitedabilitytomodifytheCPUsTLBreplacementpolicies

SoftwareTLB• Advantages

– Nopredefineddatastructureforthepagetable

– OSisfreetoimplementnovelTLBreplacementpolicies

• Disadvantages– Moreworkforkerneldevelopers– BewareinfiniteTLBmisses!

• OSes pagefaulthandlermustalwaysbepresentintheTLB

66

GreaterflexibilityEasiertoprogram

TLBSummary• TLBsaddresstheslowdownassociatedwithpagetables– FrequentlyusedpagetableentriesarecachedintheCPU

– PreventsrepeatedlookupsforPTEsinmainmemory• Reducethespeedoverheadofpagetablesbyanorderofmagnitudeormore– Cachingworksverywellinthisparticularscenario– Lotsofspatialandtemporallocality

67

• MotivationandGoals• BaseandBounds• Segmentation• PageTables• TLB• Multi-levelPageTables• SwapSpace

68

Problem:PageTableSize• Atthispoint,wehavesolvedtheTLBspeedissue• However,recallthatpagestablesarelargeandsparse– Example:32-bitsystemwith4KBpages– Eachpagetableis4MB– Mostentriesareinvalid,i.e.thespaceiswasted

• Howcanwereducethesizeofthepagetables?– Manypossiblesolutions– Multi-layerpagetablesaremostcommon(x86)

69

SimpleSolution:BiggerPages• Supposeweincreasethesizeofpages– Example:32-bitsystem,4MBpages– 232 /222 =1024pagesperprocess– 1024*4bytesperpage=4KBpagetables

• Whatisthedrawback?– Increasedinternalfragmentation– Howmanyprogramsactuallyhave4MBofcode,4MBofstack,and4MBofheapdata?

70

AlternateDataStructures• Thusfar,we’veassumedlinearpagetables– i.e.anarrayofpagetableentries

• Whatifweswitchtoanalternatedatastructure?– Hashtable– Red-blacktree

• Whyisswitchingdatastructuresnotalwaysfeasible?– CanbedoneiftheTLBissoftwaremanaged– IftheTLBishardwaremanaged,thentheOSmustusethepagetableformatspecifiedbytheCPU

71

InvertedPageTables• Ourcurrentdiscussionfocusesontablesthatmapvirtualpagestophysicalpages

• Whatifweflipthetable:mapphysicalpagestovirtualpages?– Sincethereisonlyonephysicalmemory,weonlyneedoneinvertedpagetable!

72

VPN PFN

i d

j b

k a

VPN PFN

i r

j u

k s

VPN PFN

i r

j u

k s

PFN VPN

i d

j b

k a

Standardpagetables:oneperprocess

Invertedpagetables:onepersystem

TraditionalTables InvertedTable

Normalvs.InvertedPageTables• Advantageofinvertedpagetable– Onlyonetableforthewholesystem

• Disadvantages– Lookupsaremorecomputationallyexpensive

– Howtoimplementsharedmemory?

73

VPN PFN

i d

j b

k a

PFN VPN

i d

j b

k a

VPNservesasanindexintothearray,thusO(1)lookuptime

TraditionalTable InvertedTable

TablemustbescannedtolocateagivenVPN,thusO(n)lookuptime

Multi-LevelPageTables• Keyidea:splitthelinearpagetableintoatreeofsub-tables– Benefit:branchesofthetreethatareempty(i.e.donotcontainvalidpages)canbepruned

• Multi-levelpagetablesareaspace/timetradeoff– Pruningreducesthesizeofthetable(savesspace)– But,nowthetreemustbetraversedtotranslatevirtualaddresses(increasedaccesstime)

• Techniqueusedbymodernx86CPUs– 32-bit:two-leveltables– 64-bit:four-leveltables 74

Multi-LevelTableToyExample• Imagineasmall,16KBaddressspace– 64-bytepages,14-bitvirtualaddresses,8bitsfortheVPNand6fortheoffset

• Howmanyentriesdoesalinearpagetableneed?– 28 =256entries

750

214

Process1’sViewofVirtualMemory

Stack

Heap

Code Page0

Page4

Page255 Assume3pagesoutof

256totalpagesareinuse

FromLineartoTwo-levelsTables• Howdoyouturnalineartableintoamulti-leveltable?– Breakthelineartableupintopage-sizeunits

• 256tableentries,eachis4byteslarge– 256*4bytes=1KBlinearpagetables

• Given64-bytepages,a1KBlineartablecanbedividedinto1664-bytetables– Eachsub-tableholds16pagetableentries

76

13 12 11 10 9 8 7 6 5 4 3 2 1 0

VirtualPage# OffsetPageDirectoryIndex(TableLevel1)

PageTableIndex(TableLevel2)

77

VPN PFN Valid?

00000000 a 1

... 0

00000100 b 1

… 0

11111111 c 1

LinearPageTable

0

214

Process1’sViewofVirtualMemory

Stack

Heap

Code Page0

Page4

Page255

13 12 11 10 9 8 7 6 5 4 3 2 1 0

OffsetVirtualPageNumber

253tablesentriesareempty,space

iswasted:(

0

214

Process1’sViewofVirtualMemory

Stack

Heap

Code Page0

Page4

Page255

13 12 11 10 9 8 7 6 5 4 3 2 1 0

OffsetPageDirectoryIndex

PageTableIndex

Index PFN Valid?

0000 a 1

… 0

0100 b 1

… 0

PageTable0000

Index PFN Valid?

0000 0

… 0

1111 c 1

PageTable1111

Index Valid?

0000 1

0001 0

0010 0

… 0

1111 1

PageDirectory

Emptysub-tablesdon’tneedtobe

allocated:)

32-bitx86Two-LevelPageTables

79

31 24 23 16 15 8 7 0

10-bitsPDIndex

10-bitsPTIndex

12-bitsOffset

PhysicalMemory

CR3Register

PageDirectory

PageTables

64-bitx86Four-LevelPageTables

80

31 24 23 16 15 8 7 0

9-bitsPD1Index

9-bitsPTIndex

12-bitsOffset

PhysicalMemory

CR3Register

PageDirectory3

PageDirectories2

63 56 55 48 47 40 39 32

9-bitsPD2Index

9-bitsPD3Index

PageDirectories1

PageTables

Don’tForgettheTLB• Multi-levelpageslookcomplicated– Andtheyare,butonlywhenyouhavetotraversethem

• TheTLBstillstoresVPNà PFNmappings– TLBhitsavoidreading/traversingthetablesatall

81

Multi-LevelPageTableSummary• Reasonablyeffectivetechniqueforshrinkingthesizeofpagetables– Implementedbyx86

• Canonicalexampleofaspace/timetradeoff– TraversingmanylevelsoftableindirectionisslowerthanusingtheVPNasanindexintoalineartable

– But,lineartableswastealotofspace

82

• MotivationandGoals• BaseandBounds• Segmentation• PageTables• TLB• Multi-levelPageTables• SwapSpace

83

StatusCheck• Atthispoint,wehaveafull-featuredvirtualmemorysystem– Transparent,supportsprotectionandisolation– Fast(viaTLBs)– Spaceefficient(viamulti-leveltables)

• Arewedone?– No!

• Whatifwecompletelyrunoutofphysicalmemory?– Canvirtualizationhelp?

84

SwapSpace• Keyidea:takeframesfromphysicalmemoryandswap(write)themtodisk– Thisfreesupspaceforothercodeanddata

• Loaddatafromswapbackintomemoryon-demand– Ifaprocessattemptstoaccessapagethathasbeenswappedout…

– Apage-faultoccursandtheinstructionpauses– TheOScanswaptheframebackin,insertitintothepagetable,andrestarttheinstruction

85

SwappingExample

86

• Supposememoryisfull• Theuseropensanewprogram

• Swapoutidlepagestodisk

• Iftheidlepagesareaccessed,pagethembackin

0x0000

0xFFFFKernelMemory

Process1

Process2

Process3

Process4

Process5

Active

Active

Active

Idle

HardDrive

AllModernOSes SupportSwapping• OnLinux,youcreateaswappartition alongwithyournormalext3/4filesystem– Swappedpagesarestoredinthisseparatepartition

• Windows

87

This image cannot currently be displayed.

ImplementingSwap1. Datastructuresareneededtotrackthemapping

betweenpagesinmemoryandpagesondisk2. Meta-dataaboutmemorypagesmustbekept–Whenshouldpagesbeevicted(swappedtodisk)?– Howdoyouchoosewhichpagetoevict?

3. ThefunctionalityoftheOSes pagefaulthandlermustbemodified

88

x86PageTableEntry,Again• Onx86,pagetableentries(PTE)are4bytes

89

31- 12 11- 9 8 7 6 5 4 3 2 1 0

PageFrameNumber (PFN) Unused G PAT D A PCD PWT U/S W P

• P– presentbit– isthispageinphysicalmemory?– OSsetsorclearsthepresentbitbasedonitsswappingdecisions• 1meansthepageisinphysicalmemory• 0meansthepageisvalid,buthasbeenswappedtodisk

– Attemptstoaccessaninvalidpageor apagethatisn’tpresenttriggerapagefault

HandlingPageFaults• Thusfar,wehaveviewedpagefaultsasbugs– i.e.whenaprocesstriestoaccessaninvalidpointer– TheOSkillstheprocessthatgeneratepagefaults

• However,nowhandlingpagefaultsismorecomplicated– IfthePTEisinvalid,theOSstillkillstheprocess– IfthePTEisvalid,butpresent=0,then

1. TheOSswapsthepagebackintomemory2. TheOSupdatesthePTE3. TheOSinstructstheCPUtoretrythelastinstruction

90

PageFaultPseudocode1. VPN=(VirtualAddress&VPN_MASK)>>SHIFT2. (Success,TlbEntry)=TLB_Lookup(VPN)3. if (Success==True)//TLBHit4. if(CanAccess(TlbEntry.ProtectBits)==True)5. Offset=VirtualAddress&OFFSET_MASK6. PhysAddr =(TlbEntry.PFN<<SHIFT)|Offset7. Register=AccessMemory(PhysAddr)8. else RaiseException(PROTECTION_FAULT)9. else //TLBMiss10. PTEAddr =PTBR+(VPN*sizeof(PTE))11. PTE=AccessMemory(PTEAddr)12. if(PTE.Valid ==False)RaiseException(SEGMENTATION_FAULT)13. if (CanAccess(PTE.ProtectBits)==False)14. RaiseException(PROTECTION_FAULT)15. if(PTE.Present ==True)//assuminghardware-managedTLB16. TLB_Insert(VPN,PTE.PFN,PTE.ProtectBits)17. RetryInstruction()18. elseif (PTE.Present ==False)RaiseException(PAGE_FAULT) 91

WhenShouldtheOSEvictPages?• Memoryisfinite,sowhenshouldpagesbeswapped?

• On-demandapproach– Ifapageneedstobecreatedandnofreepagesexist,swapapagetodisk

• Proactiveapproach– MostOSes trytomaintainasmallpooloffreepages– Implementahighwatermark– Oncephysicalmemoryutilizationcrossesthehighwatermark,abackgroundprocessstartsswappingoutpages 92

WhatPagesShouldbeEvicted?• Knownasthepage-replacementpolicy• Whatistheoptimalevictionstrategy?– Evictthepagethatwillbeaccessedfurthestinthefuture– Provablyresultsinthemaximumcachehitrate– Unfortunately,impossibletoimplementinpractice

• Practicalstrategiesforselectingwhichpagetoswaptodisk– FIFO– Random– LRU(Leastrecentlyused)

• SamefundamentalalgorithmsasinTLBeviction 93

ExamplesofOptimalandLRU

Optimal(FurthestintheFuture)Access Hit/Miss? Evict CacheState

0 Miss 0

1 Miss 0,1

2 Miss 0,1,2

0 Hit 0,1,2

1 Hit 0,1,2

3 Miss 2 0,1,3

0 Hit 0,1, 3

3 Hit 0,1,3

1 Hit 0,1,3

2 Miss 3 0,1, 2

0 Hit 0,1,2

LRUAccess Hit/Miss? Evict CacheState

0 Miss 0

1 Miss 0,1

2 Miss 0,1,2

0 Hit 0,1,2

1 Hit 0,1,2

3 Miss 2 0,1,3

0 Hit 0,1, 3

3 Hit 0,1,3

1 Hit 0,1,3

2 Miss 0 1,2,3

0 Miss 3 0,1,2

Assumethecachecanstore3pages

95

Whenmemoryaccessesarerandom,itsimpossibletobesmartaboutcaching

• Allmemoryaccessesareto100%randompages

96

LRUdoesabetterjobofkeeping“hot”pagesinRAMthanFIFOorrandom

• 80%ofmemoryaccessesarefor20%ofpages

97

• Theprocesssequentiallyaccessesonememoryaddressin50pages,thenloops

• WhenthecachesizeisC<50,LRUevictspageXwhenpageX+C isread

• Thus,pagesarenotinthecacheduringthenextiterationoftheloop

WhenC>=50,allpagesarecached,thushitrate=100%

ImplementingHistoricalAlgorithms• LRUhashighcachehitratesinmostcases…• …buthowdoweknowwhichpageshavebeenrecentlyused?

• Strategy1:recordeachaccesstothepagetable– Problem:addsadditionaloverheadtopagetablelookups

• Strategy2:approximateLRUwithhelpfromthehardware

98

x86PageTableEntry,Again• Onx86,pagetableentries(PTE)are4bytes

99

31- 12 11- 9 8 7 6 5 4 3 2 1 0

PageFrameNumber (PFN) Unused G PAT D A PCD PWT U/S W P

• Bitsrelatedtoswapping– A– accessedbit– hasthispagebeenreadrecently?– D– dirtybit– hasthispagebeenwrittenrecently?– TheMMUsetstheaccessedbitwhenitreadsaPTE– TheMMUsetsthedirtybitwhenitwritestothepagereferencedinthePTE

– TheOSmaycleartheseflagsasitwishes

ApproximatingLRU• Theaccessedanddirtybitstelluswhichpageshavebeenrecentlyaccessed

• But,LRUisstilldifficulttoimplement– Oneviction,LRUneedstoscanallPTEstodeterminewhichhavenotbeenused

– ButtherearemillionsofPTEs!• IsthereacleverwaytoapproximateLRUwithoutscanningallPTEs?– Yes!

100

TheClockAlgorithm• ImaginethatallPTEsarearrangedinacircularlist• TheclockhandpointstosomePTEPinthelist

101

function clock_algo() {

start = P;

do {

if (P.accessed == 0) {

evict(P);

return;

}

P.accessed = 0;

P = P.next;

} while (P != start);

evict_random_page();

}

IncorporatingtheDirtyBit• Moremodernpageevictionalgorithmsalsotakethedirtybitintoaccount

• Forexample:supposeyoumustevictapage,andallpageshavebeenaccessed– Somepagesareread-only(likecode)– Somepageshavebeenwrittentoo(i.e.theyaredirty)

• Evictthenon-dirtypagesfirst– Insomecases,youdon’thavetoswapthemtodisk!– Example:codeisalreadyonthedisk,simplyreloadit

• Dirtypagesmustalwaysbewrittentodisk– Thus,theyaremoreexpensivetoswap

102

RAMasaCache• RAMcanbeviewedasahigh-speedcacheforyourlarge-but-slowspinningdiskstorage– YouhaveGBofprogramsanddata– OnlyasubsetcanfitinRAMatanygiventime

• Ideally,youwantthemostimportantthingstoberesidentinthecache(RAM)– Code/datathatbecomelessimportantcanbeevictedbacktothedisk

103