• Motivation and Goals
• Base and Bounds
• Segmentation
• Page Tables
• TLB
• Multi-level Page Tables
• Swap Space

Main Memory
• Main memory is conceptually very simple
  – Code sits in memory
  – Data is either on a stack or a heap
  – Everything gets accessed via pointers
  – Data can be written to or read from long-term storage
• Memory is a simple and obvious device
  – So why is memory management one of the most complex features in modern OSes?

Protection and Isolation
• Physical memory does not offer protection or isolation
[Figure: physical memory from 0x00000000 to 0xFFFFFFFF holding kernel memory and Process 1 with secret data. An evil process says "I'm in your process, stealing your data ;)" and "Oh sorry, I didn't mean to overwrite your task_structs ;)"]

Compilation and Program Loading
• Compiled programs include fixed pointer addresses
• Example:
    000FE4D8 <foo>:
    …
    000FE21A: push eax
    000FE21D: push ebx
    000FE21F: call 0x000FE4D8
• Problem: what if the program is not loaded at the corresponding address?
[Figure: physical memory from 0x00000000 to 0xFFFFFFFF holding kernel memory, Process 1, and Process 2. Labels: "Addr of foo(): 0x000FE4D8" and "Addr of foo(): 0x0DEB49A3"]

Physical Memory has Limited Size
• RAM is cheap, but not as cheap as solid state or cloud storage
• What happens when you run out of RAM?
[Figure: physical memory from 0x00000000 to 0xFFFFFFFF filled by kernel memory and Processes 1 through 5]

Physical vs. Virtual Memory
• Clearly, physical memory has limitations
  – No protection or isolation
  – Fixed pointer addresses
  – Limited size
  – Etc.
• Virtualization can solve these problems!
  – As well as enable additional, cool features

A Toy Example
• What do we mean by virtual memory?
  – Processes use virtual (or logical) addresses
  – Virtual addresses are translated to physical addresses
[Figure: physical memory (reality), 0x0000 to 0xFFFF, holds kernel memory and Processes 1, 2, and 3. In its own view of virtual memory (0x0000 to 0xFFFF), Process 1 sees only itself: "All the memory belongs to me! I am master of all I survey!" A "Magical Address Translation Black Box" turns each virtual address into a physical address.]

Implementing Address Translation
• In a system with virtual memory, each memory access must be translated
• Can the OS perform address translation?
  – Only if programs are interpreted
• Modern systems have hardware support that facilitates address translation
  – Implemented in the Memory Management Unit (MMU) of the CPU
  – Cooperates with the OS to translate virtual addresses into physical addresses

Virtual Memory Implementations
• There are many ways to implement an MMU
  – Base and bound registers
  – Segmentation
  – Page tables
  – Multi-level page tables
  (Ordered from old, simple, limited functionality to modern, complex, lots of functionality)
• We will discuss each of these approaches
  – How does it work?
  – What features does it offer?
  – What are the limitations?

Goals of Virtual Memory
• Transparency
  – Processes are unaware of virtualization
• Protection and isolation
• Flexible memory placement
  – OS should be able to move things around in memory
• Shared memory and memory mapped files
  – Efficient interprocess communication
  – Shared code segments, i.e. dynamic libraries
• Dynamic memory allocation
  – Grow heaps and stacks on demand, no need to pre-allocate large blocks of empty memory
• Support for sparse address spaces
• Demand-based paging
  – Create the illusion of near-infinite memory

Base and Bounds Registers
• A simple mechanism for address translation
• Maps a contiguous virtual address region to a contiguous physical address region
[Figure: physical memory from 0x0000 to 0xFFFF with kernel memory at the top and Process 1 occupying 0x00FF to 0x10FF. The process' view of virtual memory is labeled 0x0001 to 0x1001.]

Register  Value
EIP       0x0023
ESP       0x0F76
BASE      0x00FF
BOUND     0x1000

Base and Bounds Example
[Figure: Process 1 occupies physical memory 0x00FF to 0x10FF; EIP = 0x0023, ESP = 0x0F76, BASE = 0x00FF, BOUND = 0x1000]
    0x0023  mov eax, [esp]
1) Fetch instruction: 0x0023 + 0x00FF = 0x0122
2) Translate memory access: 0x0F76 + 0x00FF = 0x1075
3) Move value to register: [0x1075] → eax

Protection and Isolation
[Figure: Process 1 occupies physical memory 0x00FF to 0x10FF; EIP = 0x0023, ESP = 0x0F76, BASE = 0x00FF, BOUND = 0x1000]
    0x0023  mov eax, [0x4234]
1) Fetch instruction: 0x0023 + 0x00FF = 0x0122
2) Translate memory access: 0x4234 + 0x00FF = 0x4333
   0x4333 > 0x10FF (BASE + BOUND), so raise a protection exception!

Implementation Details
• BASE and BOUND are protected registers
  – Only code in Ring 0 may modify BASE and BOUND
  – Prevents processes from modifying their own sandbox
• Each CPU has one BASE and one BOUND register
  – Just like ESP, EIP, EAX, etc.
  – Thus, BASE and BOUND must be saved and restored during context switching

Base and Bound Pseudocode
    PhysAddr = VirtualAddress + BASE
    if (PhysAddr >= BASE + BOUND)
        RaiseException(PROTECTION_FAULT)
    Register = AccessMemory(PhysAddr)
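
The pseudocode can be tried out directly. The following is a minimal Python sketch, not a model of any real MMU: the names `translate` and `ProtectionFault` are made up for illustration, and the register values are the ones from the example slides (BASE = 0x00FF, BOUND = 0x1000).

```python
class ProtectionFault(Exception):
    """Hypothetical exception standing in for the hardware protection fault."""


def translate(virtual_address: int, base: int, bound: int) -> int:
    """Base-and-bounds translation: add BASE, then check against BASE + BOUND."""
    physical_address = virtual_address + base
    if physical_address >= base + bound:
        raise ProtectionFault(hex(virtual_address))
    return physical_address


# Values taken from the slides' example registers.
BASE, BOUND = 0x00FF, 0x1000
print(hex(translate(0x0023, BASE, BOUND)))  # instruction fetch
print(hex(translate(0x0F76, BASE, BOUND)))  # stack access through ESP
try:
    translate(0x4234, BASE, BOUND)          # outside the process' region
except ProtectionFault as fault:
    print("protection fault at", fault)
```

Note that the check compares the translated address against BASE + BOUND, which is equivalent to checking the virtual address against BOUND alone.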

Advantages of Base and Bound
• Simple hardware implementation
• Simple to manage each process' virtual space
• Processes can be loaded at arbitrary fixed addresses
• Offers protection and isolation
• Offers flexible placement of data in memory
[Figure: physical memory from 0x00000000 to 0xFFFFFFFF with kernel memory, Process 1, and Process 2. Process 1: "I'm loaded at address 0x00AF". Process 2: "No, I'm loaded at address 0x00AF". Previous BASE → 0x00FF, new BASE → 0x10A0.]

Limitations of Base and Bound
• Processes can overwrite their own code
  – Processes aren't protected from themselves
• No sharing of memory
  – Code (read-only) is mixed in with data (read/write)
• Process memory cannot grow dynamically
  – May lead to internal fragmentation
[Figure: physical memory from 0x00000000 to 0xFFFFFFFF holding kernel memory and two /bin/bash processes, each with its own code and data. Code is duplicated in memory :(]

Internal Fragmentation
• BOUND determines the max amount of memory available to a process
• How much memory do we allocate?
  – Empty space leads to internal fragmentation
• What if we don't allocate enough?
  – Increasing BOUND after the process is running doesn't help
[Figure: a process region in physical memory with code, heap, and stack. The gap between heap and stack is wasted space = internal fragmentation. Increasing BOUND doesn't move the stack away from the heap.]

Towards Segmented Memory
• Having a single BASE and a single BOUND means code, stack, and heap are all in one memory region
  – Leads to internal fragmentation
  – Prevents dynamically growing the stack and heap
• Segmentation is a generalization of the base and bounds approach
  – Give each process several pairs of base/bounds
    • May or may not be stored in dedicated registers
  – Each pair defines a segment
  – Each segment can be moved or resized independently

Segmentation Details
• The code and data of a process get split into several segments
  – 3 segments is common: code, heap, and stack
  – Some architectures support more than 3 segments per process
• Each process views its segments as a contiguous region of memory
  – But in physical memory, the segments can be placed in arbitrary locations
• Question: given a virtual address, how does the CPU determine which segment is being addressed?

Segments and Offsets
• Key idea: split virtual addresses into a segment index and an offset
• Example: suppose we have 14-bit addresses
  – Top 2 bits are the segment
  – Bottom 12 bits are the offset
• 4 possible segments per process
  – 00, 01, 10, 11

Bits   13 12    11 10 9 8 7 6 5 4 3 2 1 0
Field  Segment  Offset

Separation of Responsibility
• The OS manages segments and their indexes
  – Creates segments for new processes in free physical memory
  – Builds a table mapping segment indexes to base addresses and bounds
  – Swaps out the tables and segment registers during context switches
  – Frees segments from physical memory
• The CPU translates virtual addresses to physical addresses on demand
  – Uses the segment registers/segment tables built by the OS

Segmentation Example

Segment     Index  Base    Bound
CS (Code)   00     0x0020  0x0100
HS (Heap)   01     0xB000  0x0100
SS (Stack)  10     0x0400  0x0100

    0x0023  mov eax, [esp]
1) Fetch instruction: 0x0023 (EIP) = 00 000000100011
   0x0020 (CS) + 0x0023 = 0x0043
2) Translate memory access: 0x2015 (ESP) = 10 000000010101
   0x0400 (SS) + 0x0015 = 0x0415
[Figure: the process' view of virtual memory (0x0000 to 0x3FFF) has code at 0x0000, heap at 0x1000, and stack at 0x2000. In physical memory (0x0000 to 0xFFFF, kernel memory at the top), code occupies 0x0020 to 0x0120, stack 0x0400 to 0x0500, and heap 0xB000 to 0xB100.]

Segmentation Pseudocode
    // get top 2 bits of 14-bit VA
    Segment = (VirtualAddress & SEG_MASK) >> SEG_SHIFT
    // now get offset
    Offset = VirtualAddress & OFFSET_MASK
    if (Offset >= Bounds[Segment])
        RaiseException(PROTECTION_FAULT)
    else
        PhysAddr = Base[Segment] + Offset
        Register = AccessMemory(PhysAddr)
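
To check the worked example against the pseudocode, here is a small Python sketch under stated assumptions: the mask and shift constants are chosen for the 14-bit layout above, the segment table uses the base/bound values from the example, and `translate` and `ProtectionFault` are illustrative names rather than any real API.

```python
SEG_SHIFT = 12          # offset is the bottom 12 bits
SEG_MASK = 0x3000       # top 2 bits of a 14-bit address
OFFSET_MASK = 0x0FFF


class ProtectionFault(Exception):
    """Hypothetical stand-in for the hardware exception."""


# Segment index -> (base, bound), copied from the example table.
SEGMENT_TABLE = {
    0b00: (0x0020, 0x0100),  # code
    0b01: (0xB000, 0x0100),  # heap
    0b10: (0x0400, 0x0100),  # stack
}


def translate(virtual_address: int, table=SEGMENT_TABLE) -> int:
    segment = (virtual_address & SEG_MASK) >> SEG_SHIFT
    offset = virtual_address & OFFSET_MASK
    if segment not in table:
        # Assumption: an unused segment index is treated as a fault.
        raise ProtectionFault(hex(virtual_address))
    base, bound = table[segment]
    if offset >= bound:
        raise ProtectionFault(hex(virtual_address))
    return base + offset


print(hex(translate(0x0023)))  # instruction fetch through the code segment
print(hex(translate(0x2015)))  # stack access through the stack segment
```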

More on Segments
• In the previous example, we use a 14-bit address space with 2 bits reserved for the segment index
  – This limits us to 4 segments per process
  – Each segment is at most 2^12 bytes = 4 KB in size
• Real segmentation systems tend to have
  1. More bits for the segment index (16 bits for x86)
  2. More bits for the offset (16 bits for x86)
• However, segments are coarse-grained
  – Limited number of segments per process (typically ~4)

Segment Permissions
• Many CPUs (including x86) support permissions on segments
  – Read, write, and executable
• Disallowed operations trigger an exception
  – E.g. trying to write to the code segment

Index  Base    Bound   Permissions
00     0x0020  0x0100  RX
01     0xB000  0x0100  RW
10     0x0400  0x0100  RW
11     0xE500  0x0100  R
[Figure: Process 1's view of virtual memory (0x0000 to 0x3FFF): code at 0x0000, heap at 0x1000, stack at 0x2000, .rodata at 0x3000]

x86 Segments
• Segmented memory on x86 dates back to the Intel 8086; the 80286 added protected-mode segments
  – CS – code segment register
  – SS – stack segment register
  – DS – data segment register
  – ES, FS, GS – extra segment registers
• In 16-bit (real mode) x86 assembly, segment:offset notation is common
    mov [ds:eax], 42   // move 42 to the data segment, offset by the value in eax
    mov [esp], 23      // uses the SS segment by default

x86 Segments Today
• Segment registers and their associated functionality still exist in today's x86 CPUs
• However, the 80386 introduced page tables
  – Modern OSes "disable" segmentation
  – The Linux kernel sets up four segments during bootup

Segment Name  Description  Base  Bound  Ring
KERNEL_CS     Kernel code  0     4 GB   0
KERNEL_DS     Kernel data  0     4 GB   0
USER_CS       User code    0     4 GB   3
USER_DS       User data    0     4 GB   3

  – Base and bound cover everything: pages are used to virtualize memory, not segments
  – The ring is used to label pages with protection levels

What is a Segmentation Fault?
• It is what happens if you try to read/write memory outside a segment assigned to your process
• Example:
    char buf[5];
    strcpy(buf, "Hello World");
    return 0; // why does it seg fault when you return?
• Today "segmentation fault" is an anachronism
  – All modern systems use page tables, not segments

Shared Memory

Process 1's segment table
Index  Base    Bound
00     0x0020  0x0100
01     0xB000  0x0100
10     0x0400  0x0100
11     0xE500  0x0300

Process 2's segment table
Index  Base    Bound
00     0x0020  0x0100
01     0xC000  0x0100
10     0x0600  0x0100
11     0xE500  0x0300

• Same 00 and 11 physical segments (code and shared data)
• Different 01 and 10 physical segments (heap and stack)
[Figure: each process' view of virtual memory (0x0000 to 0x3FFF) has code at 0x0000, heap at 0x1000, stack at 0x2000, and shared data at 0x3000]

Advantages of Segmentation
• All the advantages of base and bound
• Better support for sparse address spaces
  – Code, heap, and stack are in separate segments
  – Segment sizes are variable
  – Prevents internal fragmentation
• Supports shared memory
• Per segment permissions
  – Prevents overwriting code, or executing data

External Fragmentation
• Problem: variable size segments can lead to external fragmentation
  – Memory gets broken into random size, non-contiguous pieces
• Example: there is enough free memory to start a new process
  – But the memory is fragmented :(
• Compaction can fix the problem
  – But it is extremely expensive
[Figure: physical memory with kernel memory and the code, heap, and stack segments of several processes scattered among free gaps; a new process' code, heap, and stack do not fit until the segments are compacted]

Towards Paged Memory
• Segments improve on base and bound, but they still aren't granular enough
  – Segments lead to external fragmentation
• The paged memory model is a generalization of the segmented memory model
  – Physical memory is divided up into physical pages (a.k.a. frames) of fixed size
  – Code and data exist in virtual pages
  – A table maps virtual pages → physical pages (frames)

Toy Example
• Suppose we have a 64-byte virtual address space
  – Let's specify 16 bytes per page
• How many bits do virtual addresses need to be in this system?
  – 2^6 = 64 bytes, thus 6-bit addresses
• How many bits of the virtual address are needed to select the physical page?
  – 64 bytes / 16 bytes per page = 4 pages
  – 2^2 = 4, thus 2 bits to select the page

Bits   5 4             3 2 1 0
Field  Virtual Page #  Offset
[Figure: virtual memory, bytes 0 to 64, divided into Page 0 (from 0), Page 1 (from 16), Page 2 (from 32), and Page 3 (from 48)]

Toy Example, Continued
[Figure: virtual memory holds Pages 0 to 3 (bytes 0 to 64); physical memory holds Pages 0 to 7 (bytes 0 to 128)]

Virtual Page #  Physical Page #
00 (0)          010 (2)
01 (1)          111 (7)
10 (2)          100 (4)
11 (3)          001 (1)

    mov eax, [21]
Translation: 21 = 01 0101 (virtual page 01, offset 0101)
            117 = 111 0101 (physical page 111, offset 0101)
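
The toy translation can be reproduced in a few lines. This is a minimal sketch assuming the 4-entry mapping shown in the table above; `PAGE_TABLE` and `translate` are illustrative names, and there is no validity or permission checking yet.

```python
OFFSET_BITS = 4                    # 16-byte pages -> 4 offset bits
OFFSET_MASK = (1 << OFFSET_BITS) - 1

# Virtual page number -> physical page number, copied from the toy table.
PAGE_TABLE = {0b00: 0b010, 0b01: 0b111, 0b10: 0b100, 0b11: 0b001}


def translate(virtual_address: int) -> int:
    vpn = virtual_address >> OFFSET_BITS       # top 2 bits of the 6-bit address
    offset = virtual_address & OFFSET_MASK     # bottom 4 bits pass through unchanged
    return (PAGE_TABLE[vpn] << OFFSET_BITS) | offset


print(translate(21))  # virtual page 1, offset 5 -> physical page 7, offset 5
```

Only the page number is replaced; the offset is copied unchanged, which is why page sizes are powers of two.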

Concrete Example
• Assume a 32-bit virtual and physical address space
  – Fix the page size at 4 KB (4096 bytes, 2^12)
• How many total pages will there be?
  – 2^32 / 2^12 = 1048576 (2^20)
• How many bits of a virtual address are needed to select the physical page?
  – 20 bits (since there are 1048576 total pages)
• Assume that each page table entry is 4 bytes large
  – How big will the page table be?
  – 1048576 * 4 bytes = 4 MB of space
• Each process needs its own page table
• 100 processes = 400 MB of page tables

Concrete Example, Continued
• Process 1 requires:
  – 2 KB for code (1 page)
  – 7 KB for stack (2 pages)
  – 12 KB for heap (3 pages)

VPN          PFN       Valid?
0 … i-1      whatever  0
i            d         1
i+1 … j-1    whatever  0
j            b         1
j+1          f         1
j+2          e         1
j+3 … k-1    whatever  0
k            a         1
k+1          c         1

• The vast majority of each process' page table is empty, i.e. the table is sparse
[Figure: Process 1's view of virtual memory (0 to 2^32) has code in page i, heap in pages j, j+1, and j+2, and stack in pages k and k+1. Physical memory (0 to 2^30, with kernel memory) holds them in pages a through f. Further labels: Page j+3, Page g.]

Page Table Implementation
• The OS creates the page table for each process
  – Page tables are typically stored in kernel memory
  – OS stores a pointer to the page table in a special register in the CPU (CR3 register in x86)
  – On context switch, the OS swaps the pointer to the old process's table for the pointer to the new process's table
• The CPU uses the page table to translate virtual addresses into physical addresses

x86 Page Table Entry
• On x86, page table entries (PTE) are 4 bytes

Bits   31-12                    11-9    8  7    6  5  4    3    2    1  0
Field  Page Frame Number (PFN)  Unused  G  PAT  D  A  PCD  PWT  U/S  W  P

• Bits related to permissions
  – W – writable bit – is the page writable, or read-only?
  – U/S – user/supervisor bit – can user-mode processes access this page?
• Hardware caching related bits: G, PAT, PCD, PWT
• Bits related to swapping (we will revisit these later in the lecture…)
  – P – present bit – is this page in physical memory?
  – A – accessed bit – has this page been read recently?
  – D – dirty bit – has this page been written recently?
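
To make the bit layout concrete, the following Python sketch unpacks a 32-bit PTE into the fields discussed on this slide. It assumes the bit positions shown in the layout above (P = bit 0, W = bit 1, U/S = bit 2, A = bit 5, D = bit 6, PFN = bits 31-12); the `PTE` tuple and `decode_pte` are hypothetical names, and the caching bits are ignored.

```python
from typing import NamedTuple


class PTE(NamedTuple):
    pfn: int
    present: bool
    writable: bool
    user: bool
    accessed: bool
    dirty: bool


def decode_pte(raw: int) -> PTE:
    """Split a 32-bit page table entry into its PFN and flag bits."""
    raw &= 0xFFFFFFFF
    return PTE(
        pfn=raw >> 12,                 # bits 31-12
        present=bool(raw & (1 << 0)),  # P
        writable=bool(raw & (1 << 1)), # W
        user=bool(raw & (1 << 2)),     # U/S
        accessed=bool(raw & (1 << 5)), # A
        dirty=bool(raw & (1 << 6)),    # D
    )


# An arbitrary example value, not taken from any real system.
print(decode_pte(0x12345067))
```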

Page Table Pseudocode
    // Extract the VPN from the virtual address
    VPN = (VirtualAddress & VPN_MASK) >> SHIFT
    // Form the address of the page-table entry (PTE)
    PTEAddr = PTBR + (VPN * sizeof(PTE))
    // Fetch the PTE
    PTE = AccessMemory(PTEAddr)
    // Check if process can access the page
    if (PTE.Valid == False)
        RaiseException(SEGMENTATION_FAULT)
    else if (CanAccess(PTE.ProtectBits) == False)
        RaiseException(PROTECTION_FAULT)
    // Access is OK: form physical address and fetch it
    offset = VirtualAddress & OFFSET_MASK
    PhysAddr = (PTE.PFN << PFN_SHIFT) | offset
    Register = AccessMemory(PhysAddr)

Tricks With Permissions and Shared Pages
• Recall how fork() is implemented
  – OS creates a copy of all pages controlled by the parent
• fork() is a slooooow operation
  – Copying all that memory takes a looooong time
• Can we improve the efficiency of fork()?
  – Yes, if we are clever with shared pages and permissions!

Copy-on-Write
• Key idea: rather than copy all of the parent's pages, create a new page table for the child that maps to all of the parent's pages
  – Mark all of the pages as read-only
  – If parent or child writes to a page, a protection exception will be triggered
  – The OS catches the exception, makes a copy of the target page, then restarts the write operation
• Thus, all unmodified data is shared
  – Only pages that are written to get copied, on demand
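
The copy-on-write idea can be simulated in user space. The sketch below is a toy model, not how any kernel implements fork(): `Page`, `AddressSpace`, and the reference count are assumptions made for illustration, every mapped page is assumed to be writable before the fork, and the "protection exception" is just a branch inside `write`.

```python
class Page:
    """A physical page plus a count of how many page tables map it."""

    def __init__(self, data: bytes):
        self.data = bytearray(data)
        self.refs = 1


class AddressSpace:
    def __init__(self):
        self.table = {}  # vpn -> [page, writable]

    def map(self, vpn: int, data: bytes) -> None:
        self.table[vpn] = [Page(data), True]

    def fork(self) -> "AddressSpace":
        """Share every page with the child and mark both mappings read-only."""
        child = AddressSpace()
        for vpn, entry in self.table.items():
            entry[1] = False
            entry[0].refs += 1
            child.table[vpn] = [entry[0], False]
        return child

    def read(self, vpn: int, index: int) -> int:
        return self.table[vpn][0].data[index]

    def write(self, vpn: int, index: int, value: int) -> None:
        entry = self.table[vpn]
        if not entry[1]:
            # "Protection exception": copy the page only if it is still shared.
            if entry[0].refs > 1:
                entry[0].refs -= 1
                entry[0] = Page(bytes(entry[0].data))
            entry[1] = True
        entry[0].data[index] = value  # restart the write


parent = AddressSpace()
parent.map(0, b"abc")
child = parent.fork()
child.write(0, 0, ord("X"))
print(chr(parent.read(0, 0)), chr(child.read(0, 0)))
```

After the child's write, the two address spaces point at different pages for VPN 0, while any page that was never written would still be shared.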
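
Copy-on-Write Example

Parent's Page Table
Function  VPN  PFN  Writable?
Code      i    d    0
Heap      j    b    1
Stack     k    a    1

Child's Page Table
Function  VPN  PFN  Writable?
Code      i    d    0
Heap      j    b    0
Stack     k    a    0

[Figure: physical memory (0 to 2^30, with kernel memory) holds code in page d, heap in page b, and stack in page a. After the fork, the parent's heap and stack entries are also set to writable = 0. A write to the stack raises a protection exception; the stack is copied to a new page m, the writer's stack entry is updated to PFN m, and the stack entries are marked writable = 1 again.]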

Zero-on-Reference
• How much physical memory do we need to allocate for the heap of a new process?
  – Zero bytes!
• When a process touches the heap
  – Segmentation fault into OS kernel
  – Kernel allocates some memory
  – Zeros the memory
    • Avoid accidentally leaking information!
  – Restart the process

Advantages of Page Tables
• All the advantages of segmentation
• Even better support for sparse address spaces
  – Each page is relatively small
  – Fine-grained page allocations to each process
  – Prevents internal fragmentation
• All pages are the same size
  – Easy to keep track of free memory (say, with a bitmap)
  – Prevents external fragmentation
• Per-page permissions
  – Prevents overwriting code, or executing data

Problems With Page Tables
• Page tables are huge
  – On a 32-bit machine with 4 KB pages, each process' table is 4 MB
  – On a 64-bit machine with 4 KB pages, there are 2^52 entries per table → 2^52 * 4 bytes = 16 PB
  – And the vast majority of entries are empty/invalid!
• Page table indirection adds significant overhead to all memory accesses

Page Tables are Slow
    0x1024  mov [edi + eax * 4], 0x0
    0x1028  inc eax
    0x102C  cmp eax, 0x03E8
    0x1030  jne 0x1024
• How many memory accesses occur during each iteration of the loop?
  – 4 instructions are read from memory
  – [edi + eax * 4] writes to one location in memory
  – 5 page table lookups
• Each memory access must be translated
• …and the page tables themselves are in memory
• A naïve page table implementation doubles memory access overhead

Problem: Page Table Speed
• Page tables give us a great deal of flexibility and granularity to implement virtual memory
• However, page tables are large, thus they must go in RAM (as opposed to in a CPU register)
  – Each virtual memory access must be translated
  – Each translation requires a table lookup in memory
  – Thus, memory overhead is doubled
• How can we use page tables without this memory lookup overhead?

Caching
• Key idea: cache page table entries directly in the CPU's MMU
  – Translation Lookaside Buffer (TLB)
  – Should be called an address translation cache
• TLB stores recently used PTEs
  – Subsequent requests for the same virtual page can be filled from the TLB cache
• Directly addresses the speed issue of page tables
  – On-die CPU cache is very, very fast
  – Translations that hit the TLB don't need to be looked up from the page table in memory

Example TLB Entry
• VPN & PFN – virtual and physical pages
• G – is this page global (i.e. accessible by all processes)?
• ASID – address space ID (more on this later…)
• D – dirty bit – has this page been written recently?
• V – valid bit – is this entry in the TLB valid?
• C – cache coherency bits – for multi-core systems
[Figure: a TLB entry drawn as two 32-bit rows (bits 31 to 0). One row holds the Virtual Page Number (VPN), G, and ASID fields; the other holds the Physical Frame Number (PFN), C, D, and V fields.]

TLB Control Flow Pseudocode
    VPN = (VirtualAddress & VPN_MASK) >> SHIFT
    (Success, TlbEntry) = TLB_Lookup(VPN)
    if (Success == True) // TLB Hit
        if (CanAccess(TlbEntry.ProtectBits) == True)
            Offset = VirtualAddress & OFFSET_MASK
            PhysAddr = (TlbEntry.PFN << SHIFT) | Offset
            AccessMemory(PhysAddr)
        else
            RaiseException(PROTECTION_FAULT)
    else // TLB Miss
        PTEAddr = PTBR + (VPN * sizeof(PTE))
        PTE = AccessMemory(PTEAddr)
        if (PTE.Valid == False)
            RaiseException(SEGMENTATION_FAULT)
        else if (CanAccess(PTE.ProtectBits) == False)
            RaiseException(PROTECTION_FAULT)
        TLB_Insert(VPN, PTE.PFN, PTE.ProtectBits)
        RetryInstruction()
• Fast path (TLB hit): make sure we have permission, then proceed
• Slow path (TLB miss): load the page table entry from memory, add it to the TLB, and retry

Reading an Array (no TLB)
• Suppose we have a 10 KB array of integers
  – Assume 4 KB pages
• With no TLB, how many memory accesses are required to read the whole array?
  – 10 KB / 4 bytes = 2560 integers in the array
  – Each requires one page table lookup, one memory read
  – 5120 reads, plus more for the instructions themselves

Reading an Array (with TLB)
• Same example, now with TLB
  – 10 KB integer array
  – 4 KB pages
  – Assume the TLB starts off cold (i.e. empty)
• How many memory accesses to read the array?
  – 2560 to read the integers
  – 3 page table lookups
  – 2563 total reads
  – TLB hit rate: 2557 / 2560 ≈ 99.9%
[Figure: in Process 1's view of virtual memory the array data spans pages j, j+1, and j+2; the TLB ends up holding the VPN → PFN mappings j → a, j+1 → b, j+2 → c]
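
The access counts for the array example can be checked with a short simulation. This sketch assumes a page-aligned array, 4-byte integers, and a TLB large enough that nothing is evicted; `read_array` is an illustrative name, and only data accesses are counted, not instruction fetches.

```python
PAGE_SIZE = 4096   # 4 KB pages
INT_SIZE = 4       # bytes per integer (assumption)


def read_array(array_bytes: int, base_address: int = 0):
    """Return (tlb_hits, tlb_misses) for one sequential pass over the array."""
    tlb = set()    # cold TLB: the set of cached virtual page numbers
    hits = misses = 0
    for address in range(base_address, base_address + array_bytes, INT_SIZE):
        vpn = address // PAGE_SIZE
        if vpn in tlb:
            hits += 1
        else:
            misses += 1      # one page table lookup in memory
            tlb.add(vpn)
    return hits, misses


hits, misses = read_array(10 * 1024)
print(hits, misses)  # translations served by the TLB vs. page table lookups
```

Each miss costs one extra memory read for the page table lookup, so the total number of reads is the number of integers plus the number of misses.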

Locality
• TLB, like any cache, is effective because of locality
  – Spatial locality: if you access memory address x, it is likely you will access x+1 soon
    • Most of the time, x and x+1 are in the same page
  – Temporal locality: if you access memory address x, it is likely you will access x again soon
    • The page containing x will still be in the TLB, hopefully

Be Careful With Caching
• Recall: TLB entries have an ASID (address space ID) field. What is this for?
  – Here's a hint: think about context switching

Process 1's Page Table
VPN  PFN
i    d
j    b
k    a

Process 2's Page Table
VPN  PFN
i    r
j    u
k    s

TLB
VPN  PFN
i    d
j    b
k    a

• VPNs are the same, but PFN mappings have changed!
• Problem: TLB entries may not be valid after a context switch

Potential Solutions
1. Clear the TLB (mark all entries as invalid) after each context switch
  – Works, but forces each process to start with a cold cache
  – Only solution on x86 (until ~2008)
2. Associate an ASID (address space ID) with each process
  – ASID is just like a process ID in the kernel
  – CPU can compare the ASID of the active process to the ASID stored in each TLB entry
  – If they don't match, the TLB entry is invalid
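
One way to picture the ASID approach is a TLB whose lookup key includes the address space ID. The sketch below is a software model only: the `TLB` class and the integer VPN/PFN values are assumptions for illustration, and real hardware compares the ASID field of each entry rather than using a dictionary.

```python
class TLB:
    """Toy TLB keyed by (ASID, VPN), so entries from different processes coexist."""

    def __init__(self):
        self.entries = {}  # (asid, vpn) -> pfn

    def insert(self, asid: int, vpn: int, pfn: int) -> None:
        self.entries[(asid, vpn)] = pfn

    def lookup(self, asid: int, vpn: int):
        # A mismatching ASID behaves like a miss: returns None.
        return self.entries.get((asid, vpn))


tlb = TLB()
tlb.insert(asid=1, vpn=7, pfn=0xD)   # cached while process 1 was running
print(tlb.lookup(asid=2, vpn=7))     # process 2 uses the same VPN: miss
tlb.insert(asid=2, vpn=7, pfn=0x4)
print(tlb.lookup(asid=1, vpn=7))     # process 1's entry survived the switch
```

Compared with flushing on every context switch, this lets a process find its old entries still cached when it is scheduled again.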

Replacement Policies
• On many CPUs (like x86), the TLB is managed by the hardware
• Problem: space in the TLB is limited (usually KB)
  – Once the TLB fills up, how does the CPU decide what entries to replace (evict)?
• Typical replacement policies:
  – FIFO: easy to implement, but certain access patterns result in worst-case TLB hit rates
  – Random: easy to implement, fair, but suboptimal hit rates
  – LRU (Least Recently Used): algorithm typically used in practice

Hardware vs. Software Management
• Thus far, discussion has focused on hardware managed TLBs (e.g. x86)
    PTE = AccessMemory(PTEAddr)
    TLB_Insert(VPN, PTE.PFN, PTE.ProtectBits)
  – CPU dictates the page table format, reads page table entries from memory
  – CPU manages all TLB entries
• However, software managed TLBs are also possible (e.g. MIPS and SPARC)

Software Managed TLB Pseudocode
    VPN = (VirtualAddress & VPN_MASK) >> SHIFT
    (Success, TlbEntry) = TLB_Lookup(VPN)
    if (Success == True) // TLB Hit
        if (CanAccess(TlbEntry.ProtectBits) == True)
            Offset = VirtualAddress & OFFSET_MASK
            PhysAddr = (TlbEntry.PFN << SHIFT) | Offset
            Register = AccessMemory(PhysAddr)
        else
            RaiseException(PROTECTION_FAULT)
    else // TLB Miss
        RaiseException(TLB_MISS)
• The hardware does not:
  1. Try to read the page table
  2. Add/remove entries from the TLB

Implementing Software TLBs
• Key differences vs. hardware managed TLBs
  – CPU doesn't insert entries into the TLB
  – CPU has no ability to read page tables from memory
• On TLB miss, the OS must handle the exception
  – Locate the correct page table entry in memory
  – Insert the PTE into the TLB (evict if necessary)
  – Tell the CPU to retry the previous instruction
• Note: TLB management instructions are privileged
  – Only the kernel can modify the TLB

Comparing Hardware and Software TLBs
Hardware TLB (easier to program)
• Advantages
  – Less work for kernel developers, CPU does a lot of work for you
• Disadvantages
  – Page table data structure format must conform to hardware specification
  – Limited ability to modify the CPU's TLB replacement policies
Software TLB (greater flexibility)
• Advantages
  – No predefined data structure for the page table
  – OS is free to implement novel TLB replacement policies
• Disadvantages
  – More work for kernel developers
  – Beware infinite TLB misses!
    • The OS's page fault handler must always be present in the TLB

TLB Summary
• TLBs address the slowdown associated with page tables
  – Frequently used page table entries are cached in the CPU
  – Prevents repeated lookups for PTEs in main memory
• Reduce the speed overhead of page tables by an order of magnitude or more
  – Caching works very well in this particular scenario
  – Lots of spatial and temporal locality

Problem: Page Table Size
• At this point, we have solved the page table speed issue (with the TLB)
• However, recall that page tables are large and sparse
  – Example: 32-bit system with 4 KB pages
  – Each page table is 4 MB
  – Most entries are invalid, i.e. the space is wasted
• How can we reduce the size of the page tables?
  – Many possible solutions
  – Multi-level page tables are most common (x86)

Simple Solution: Bigger Pages
• Suppose we increase the size of pages
  – Example: 32-bit system, 4 MB pages
  – 2^32 / 2^22 = 1024 pages per process
  – 1024 * 4 bytes per entry = 4 KB page tables
• What is the drawback?
  – Increased internal fragmentation
  – How many programs actually have 4 MB of code, 4 MB of stack, and 4 MB of heap data?

Alternate Data Structures
• Thus far, we've assumed linear page tables
  – i.e. an array of page table entries
• What if we switch to an alternate data structure?
  – Hash table
  – Red-black tree
• Why is switching data structures not always feasible?
  – Can be done if the TLB is software managed
  – If the TLB is hardware managed, then the OS must use the page table format specified by the CPU

Inverted Page Tables
• Our current discussion focuses on tables that map virtual pages to physical pages
• What if we flip the table: map physical pages to virtual pages?
  – Since there is only one physical memory, we only need one inverted page table!
[Figure: traditional tables, one per process, each with VPN and PFN columns (e.g. i → d, j → b, k → a in one process and i → r, j → u, k → s in another), next to a single inverted table with PFN and VPN columns. Standard page tables: one per process. Inverted page tables: one per system.]

Normal vs. Inverted Page Tables
• Advantage of inverted page table
  – Only one table for the whole system
• Disadvantages
  – Lookups are more computationally expensive
    • Traditional table: the VPN serves as an index into the array, thus O(1) lookup time
    • Inverted table: the table must be scanned to locate a given VPN, thus O(n) lookup time
  – How to implement shared memory?

Multi-Level Page Tables
• Key idea: split the linear page table into a tree of sub-tables
  – Benefit: branches of the tree that are empty (i.e. do not contain valid pages) can be pruned
• Multi-level page tables are a space/time tradeoff
  – Pruning reduces the size of the table (saves space)
  – But, now the tree must be traversed to translate virtual addresses (increased access time)
• Technique used by modern x86 CPUs
  – 32-bit: two-level tables
  – 64-bit: four-level tables

Multi-Level Table Toy Example
• Imagine a small, 16 KB address space
  – 64-byte pages, 14-bit virtual addresses, 8 bits for the VPN and 6 for the offset
• How many entries does a linear page table need?
  – 2^8 = 256 entries
• Assume 3 pages out of 256 total pages are in use
[Figure: Process 1's view of virtual memory (0 to 2^14) with code in Page 0, heap in Page 4, and stack in Page 255]

From Linear to Two-Level Tables
• How do you turn a linear table into a multi-level table?
  – Break the linear table up into page-size units
• 256 table entries, each is 4 bytes large
  – 256 * 4 bytes = 1 KB linear page table
• Given 64-byte pages, a 1 KB linear table can be divided into 16 64-byte tables
  – Each sub-table holds 16 page table entries

Bits   13 12 11 10                             9 8 7 6                           5 4 3 2 1 0
Field  Page Directory Index (Table Level 1)   Page Table Index (Table Level 2)   Offset
       (bits 13 to 6 together form the Virtual Page #)

Linear Page Table (address bits 13 to 6: Virtual Page Number, bits 5 to 0: Offset)
VPN       PFN  Valid?
00000000  a    1
…              0
00000100  b    1
…              0
11111111  c    1
253 table entries are empty, space is wasted :(

Two-Level Page Table (address bits 13 to 10: Page Directory Index, bits 9 to 6: Page Table Index, bits 5 to 0: Offset)

Page Directory
Index  Valid?
0000   1
0001   0
0010   0
…      0
1111   1

Page Table 0000
Index  PFN  Valid?
0000   a    1
…           0
0100   b    1
…           0

Page Table 1111
Index  PFN  Valid?
0000        0
…           0
1111   c    1

Empty sub-tables don't need to be allocated :)
[Figure: Process 1's view of virtual memory (0 to 2^14) with code in Page 0, heap in Page 4, and stack in Page 255]
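
A two-level walk for this toy layout might look like the sketch below. It assumes the 4-bit directory index, 4-bit table index, and 6-bit offset from the example; the frame numbers `PFN_A`, `PFN_B`, and `PFN_C` are arbitrary placeholders for the slide's symbolic frames a, b, and c, and nested dictionaries stand in for the page directory and its sub-tables.

```python
OFFSET_BITS = 6      # 64-byte pages
PT_INDEX_BITS = 4    # 16 entries per sub-table


class PageFault(Exception):
    """Hypothetical stand-in for an access to an unmapped page."""


# Placeholder frame numbers for the slide's frames a, b, c (assumed values).
PFN_A, PFN_B, PFN_C = 0x0A, 0x0B, 0x0C

# Only two of the sixteen possible sub-tables exist; the rest are pruned.
PAGE_DIRECTORY = {
    0b0000: {0b0000: PFN_A, 0b0100: PFN_B},  # code (page 0), heap (page 4)
    0b1111: {0b1111: PFN_C},                 # stack (page 255)
}


def translate(virtual_address: int) -> int:
    offset = virtual_address & ((1 << OFFSET_BITS) - 1)
    vpn = virtual_address >> OFFSET_BITS
    pd_index = vpn >> PT_INDEX_BITS              # top 4 bits of the VPN
    pt_index = vpn & ((1 << PT_INDEX_BITS) - 1)  # bottom 4 bits of the VPN
    page_table = PAGE_DIRECTORY.get(pd_index)
    if page_table is None or pt_index not in page_table:
        raise PageFault(hex(virtual_address))
    return (page_table[pt_index] << OFFSET_BITS) | offset


print(hex(translate(4 * 64 + 3)))  # byte 3 of virtual page 4 (heap)
```

With array-based tables, this layout needs one 16-entry directory plus two 16-entry sub-tables, 48 entries in total instead of 256, at the cost of one extra lookup per translation.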
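
32-bit x86 Two-Level Page Tables
• 32-bit virtual address (bits 31 to 0), from the most significant bits down:
  – 10-bit PD Index
  – 10-bit PT Index
  – 12-bit Offset
[Figure: the CR3 register points to the page directory in physical memory; the PD index selects a page table, and the PT index selects the entry for the page]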
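
64-bit x86 Four-Level Page Tables
• 64-bit virtual address (bits 63 to 0); the low 48 bits are split, from the most significant down, into:
  – 9-bit PD3 Index
  – 9-bit PD2 Index
  – 9-bit PD1 Index
  – 9-bit PT Index
  – 12-bit Offset
[Figure: the CR3 register points to Page Directory 3 in physical memory, which leads to Page Directories 2, then Page Directories 1, then the Page Tables]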

Don't Forget the TLB
• Multi-level page tables look complicated
  – And they are, but only when you have to traverse them
• The TLB still stores VPN → PFN mappings
  – TLB hits avoid reading/traversing the tables at all

Multi-Level Page Table Summary
• Reasonably effective technique for shrinking the size of page tables
  – Implemented by x86
• Canonical example of a space/time tradeoff
  – Traversing many levels of table indirection is slower than using the VPN as an index into a linear table
  – But, linear tables waste a lot of space

Status Check
• At this point, we have a full-featured virtual memory system
  – Transparent, supports protection and isolation
  – Fast (via TLBs)
  – Space efficient (via multi-level tables)
• Are we done?
  – No!
• What if we completely run out of physical memory?
  – Can virtualization help?

Swap Space
• Key idea: take frames from physical memory and swap (write) them to disk
  – This frees up space for other code and data
• Load data from swap back into memory on demand
  – If a process attempts to access a page that has been swapped out…
  – A page fault occurs and the instruction pauses
  – The OS can swap the frame back in, insert it into the page table, and restart the instruction

Swapping Example
• Suppose memory is full
• The user opens a new program
• Swap out idle pages to disk
• If the idle pages are accessed, page them back in
[Figure: physical memory from 0x0000 to 0xFFFF holds kernel memory and Processes 1 through 5; some are marked active and one idle. The idle process' pages are moved to the hard drive.]

All Modern OSes Support Swapping
• On Linux, you create a swap partition along with your normal ext3/4 filesystem
  – Swapped pages are stored in this separate partition
• Windows

Implementing Swap
1. Data structures are needed to track the mapping between pages in memory and pages on disk
2. Meta-data about memory pages must be kept
  – When should pages be evicted (swapped to disk)?
  – How do you choose which page to evict?
3. The functionality of the OS's page fault handler must be modified

x86 Page Table Entry, Again
• On x86, page table entries (PTE) are 4 bytes

Bits   31-12                    11-9    8  7    6  5  4    3    2    1  0
Field  Page Frame Number (PFN)  Unused  G  PAT  D  A  PCD  PWT  U/S  W  P

• P – present bit – is this page in physical memory?
  – OS sets or clears the present bit based on its swapping decisions
    • 1 means the page is in physical memory
    • 0 means the page is valid, but has been swapped to disk
  – Attempts to access an invalid page or a page that isn't present trigger a page fault

Handling Page Faults
• Thus far, we have viewed page faults as bugs
  – i.e. when a process tries to access an invalid pointer
  – The OS kills processes that generate page faults
• However, now handling page faults is more complicated
  – If the PTE is invalid, the OS still kills the process
  – If the PTE is valid, but present = 0, then
    1. The OS swaps the page back into memory
    2. The OS updates the PTE
    3. The OS instructs the CPU to retry the last instruction

Page Fault Pseudocode
    VPN = (VirtualAddress & VPN_MASK) >> SHIFT
    (Success, TlbEntry) = TLB_Lookup(VPN)
    if (Success == True) // TLB Hit
        if (CanAccess(TlbEntry.ProtectBits) == True)
            Offset = VirtualAddress & OFFSET_MASK
            PhysAddr = (TlbEntry.PFN << SHIFT) | Offset
            Register = AccessMemory(PhysAddr)
        else RaiseException(PROTECTION_FAULT)
    else // TLB Miss
        PTEAddr = PTBR + (VPN * sizeof(PTE))
        PTE = AccessMemory(PTEAddr)
        if (PTE.Valid == False) RaiseException(SEGMENTATION_FAULT)
        if (CanAccess(PTE.ProtectBits) == False)
            RaiseException(PROTECTION_FAULT)
        if (PTE.Present == True) // assuming hardware-managed TLB
            TLB_Insert(VPN, PTE.PFN, PTE.ProtectBits)
            RetryInstruction()
        else if (PTE.Present == False) RaiseException(PAGE_FAULT)

When Should the OS Evict Pages?
• Memory is finite, so when should pages be swapped?
• On-demand approach
  – If a page needs to be created and no free pages exist, swap a page to disk
• Proactive approach
  – Most OSes try to maintain a small pool of free pages
  – Implement a high watermark
  – Once physical memory utilization crosses the high watermark, a background process starts swapping out pages

What Pages Should be Evicted?
• Known as the page-replacement policy
• What is the optimal eviction strategy?
  – Evict the page that will be accessed furthest in the future
  – Provably results in the maximum cache hit rate
  – Unfortunately, impossible to implement in practice
• Practical strategies for selecting which page to swap to disk
  – FIFO
  – Random
  – LRU (Least recently used)
• Same fundamental algorithms as in TLB eviction

Examples of Optimal and LRU
Assume the cache can store 3 pages

Optimal (Furthest in the Future)
Access  Hit/Miss?  Evict  Cache State
0       Miss              0
1       Miss              0, 1
2       Miss              0, 1, 2
0       Hit               0, 1, 2
1       Hit               0, 1, 2
3       Miss       2      0, 1, 3
0       Hit               0, 1, 3
3       Hit               0, 1, 3
1       Hit               0, 1, 3
2       Miss       3      0, 1, 2
0       Hit               0, 1, 2

LRU
Access  Hit/Miss?  Evict  Cache State
0       Miss              0
1       Miss              0, 1
2       Miss              0, 1, 2
0       Hit               0, 1, 2
1       Hit               0, 1, 2
3       Miss       2      0, 1, 3
0       Hit               0, 1, 3
3       Hit               0, 1, 3
1       Hit               0, 1, 3
2       Miss       0      1, 2, 3
0       Miss       3      0, 1, 2
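
Both tables can be regenerated with a short simulation. This is a sketch under the same assumption as the example (a cache of 3 pages); `simulate_lru` and `simulate_optimal` are illustrative names, and the optimal policy breaks ties between pages that are never used again arbitrarily.

```python
from collections import OrderedDict


def simulate_lru(trace, capacity):
    """Count hits when evicting the least recently used page."""
    cache = OrderedDict()           # ordered from least to most recently used
    hits = 0
    for page in trace:
        if page in cache:
            hits += 1
            cache.move_to_end(page)
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)   # drop the least recently used page
            cache[page] = True
    return hits


def simulate_optimal(trace, capacity):
    """Count hits when evicting the page whose next use is furthest away."""
    cache = set()
    hits = 0
    for i, page in enumerate(trace):
        if page in cache:
            hits += 1
            continue
        if len(cache) >= capacity:
            future = trace[i + 1:]

            def next_use(candidate):
                # Pages never used again sort after every page that is.
                return future.index(candidate) if candidate in future else len(future)

            cache.remove(max(cache, key=next_use))
        cache.add(page)
    return hits


TRACE = [0, 1, 2, 0, 1, 3, 0, 3, 1, 2, 0]   # access sequence from the tables
print("optimal hits:", simulate_optimal(TRACE, 3))
print("LRU hits:", simulate_lru(TRACE, 3))
```

The optimal policy needs the future of the trace, which is exactly why it cannot be implemented in a real OS and serves only as a yardstick.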

Random workload
• All memory accesses are to 100% random pages
• When memory accesses are random, it's impossible to be smart about caching

Looping workload
• The process sequentially accesses one memory address in 50 pages, then loops
• When the cache size is C < 50, LRU evicts page X when page X+C is read
• Thus, pages are not in the cache during the next iteration of the loop
• When C >= 50, all pages are cached, thus hit rate = 100%
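
The looping workload is easy to reproduce. The sketch below assumes 10 passes over 50 pages and a plain LRU cache; `lru_hit_rate` is an illustrative helper, and the measured rate includes the cold misses of the first pass, so it approaches 100% only as the number of passes grows.

```python
from collections import OrderedDict


def lru_hit_rate(trace, capacity):
    """Fraction of accesses that hit in an LRU cache of the given capacity."""
    cache = OrderedDict()
    hits = 0
    for page in trace:
        if page in cache:
            hits += 1
            cache.move_to_end(page)
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)   # evict the least recently used page
            cache[page] = True
    return hits / len(trace)


loop = list(range(50)) * 10        # touch pages 0..49 in order, ten times
print(lru_hit_rate(loop, 49))      # one page too small: every access misses
print(lru_hit_rate(loop, 50))      # everything fits after the first pass
```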

Implementing Historical Algorithms
• LRU has high cache hit rates in most cases…
• …but how do we know which pages have been recently used?
• Strategy 1: record each access to the page table
  – Problem: adds additional overhead to page table lookups
• Strategy 2: approximate LRU with help from the hardware

x86 Page Table Entry, Again
• On x86, page table entries (PTE) are 4 bytes

Bits   31-12                    11-9    8  7    6  5  4    3    2    1  0
Field  Page Frame Number (PFN)  Unused  G  PAT  D  A  PCD  PWT  U/S  W  P

• Bits related to swapping
  – A – accessed bit – has this page been read recently?
  – D – dirty bit – has this page been written recently?
  – The MMU sets the accessed bit when it reads a PTE
  – The MMU sets the dirty bit when it writes to the page referenced in the PTE
  – The OS may clear these flags as it wishes

Approximating LRU
• The accessed and dirty bits tell us which pages have been recently accessed
• But, LRU is still difficult to implement
  – On eviction, LRU needs to scan all PTEs to determine which have not been used
  – But there are millions of PTEs!
• Is there a clever way to approximate LRU without scanning all PTEs?
  – Yes!
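
The Clock Algorithm
• Imagine that all PTEs are arranged in a circular list
• The clock hand points to some PTE P in the list

function clock_algo() {
    start = P;                  // remember where the sweep began
    do {
        if (P.accessed == 0) {  // not used since the last sweep: evict it
            evict(P);
            return;
        }
        P.accessed = 0;         // give the page a second chance
        P = P.next;             // advance the clock hand
    } while (P != start);
    evict_random_page();        // every page was recently accessed
}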
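
For experimentation, the clock sweep can be written as a runnable Python function. This is a sketch, not the slide's exact algorithm: `Frame` and `clock_evict` are hypothetical names, the circular list is modeled as a Python list with a wrapping index, and when every frame has been accessed the sketch evicts the frame under the hand (whose bit has just been cleared) instead of a random page.

```python
class Frame:
    """One entry in the circular list, carrying only the accessed bit."""

    def __init__(self, name: str, accessed: bool):
        self.name = name
        self.accessed = accessed


def clock_evict(frames, hand: int):
    """Return (victim_index, new_hand_position) after one clock sweep."""
    n = len(frames)
    for _ in range(n):
        frame = frames[hand]
        if not frame.accessed:
            return hand, (hand + 1) % n   # victim found; hand moves past it
        frame.accessed = False            # second chance: clear and move on
        hand = (hand + 1) % n
    # Full sweep: all bits were set and are now clear. Fall back to the
    # frame under the hand (the slide uses a random page here instead).
    return hand, (hand + 1) % n


frames = [Frame("A", True), Frame("B", True), Frame("C", False), Frame("D", True)]
victim, hand = clock_evict(frames, hand=0)
print(frames[victim].name, hand)
```

Frames skipped during the sweep lose their accessed bit, so they become eviction candidates on the next sweep unless they are touched again in the meantime.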

Incorporating the Dirty Bit
• More modern page eviction algorithms also take the dirty bit into account
• For example: suppose you must evict a page, and all pages have been accessed
  – Some pages are read-only (like code)
  – Some pages have been written to (i.e. they are dirty)
• Evict the non-dirty pages first
  – In some cases, you don't have to swap them to disk!
  – Example: code is already on the disk, simply reload it
• Dirty pages must always be written to disk
  – Thus, they are more expensive to swap