Lecture 01: Introduction
CSCE 513 Computer Architecture, Fall 2018
Department of Computer Science and Engineering
Yonghong Yan
[email protected]
cse.sc.edu/~yanyh
Copyright and Acknowledgement
• Lots of the slides were adapted from lecture notes of the two textbooks, with copyright of the publisher or the original authors, including Elsevier Inc, Morgan Kaufmann, David A. Patterson and John L. Hennessy.
• Some slides were adapted from the following courses:
  – UC Berkeley course “Computer Science 252: Graduate Computer Architecture” of David E. Culler, Copyright 2005 UCB
    • http://people.eecs.berkeley.edu/~culler/courses/cs252-s05/
  – Great Ideas in Computer Architecture (Machine Structures) by Randy Katz and Bernhard Boser
    • http://inst.eecs.berkeley.edu/~cs61c/fa16/
• I also refer to the following courses and lecture notes when preparing materials for this course:
  – Computer Science 152: Computer Architecture and Engineering, Spring 2016 by Dr. George Michelogiannakis from UC Berkeley
    • http://www-inst.eecs.berkeley.edu/~cs152/sp16/
  – Computer Science 252: Graduate Computer Architecture, Fall 2015 by Prof. Krste Asanović from UC Berkeley
    • http://www-inst.eecs.berkeley.edu/~cs252/fa15/
  – Computer Science S250: VLSI Systems Design, Spring 2016 by Prof. John Wawrzynek from UC Berkeley
    • http://www-inst.eecs.berkeley.edu/~cs250/sp16/
  – Computer System Architecture, Fall 2005 by Dr. Joel Emer and Prof. Arvind from MIT
    • http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-823-computer-system-architecture-fall-2005/
  – Synthesis Lectures on Computer Architecture
    • http://www.morganclaypool.com/toc/cac/1/1
• The uses of the materials (source code, slides, documents and videos) of this course are for educational purposes only and should be used only in conjunction with the textbook. Derivatives of the materials must acknowledge the copyright notices of this and the originals. Permission for commercial purposes should be obtained from the original copyright holder and the successive copyright holders including myself.
Contents
• Computer components
• Computer architectures and great ideas in computer architectures
• Performance
Generation of Computers
https://solarrenovate.com/the-evolution-of-computers/
New School Computer (#1)
Personal Mobile Devices
New School “Computer” (#2)
Classes of Computers
• Personal Mobile Device (PMD)
  – e.g. smart phones, tablet computers
  – Emphasis on energy efficiency and real-time
• Desktop Computing
  – Emphasis on price-performance
• Servers
  – Emphasis on availability, scalability, throughput
• Clusters / Warehouse Scale Computers
  – Used for “Software as a Service (SaaS)”
  – Emphasis on availability and price-performance
  – Sub-class: Supercomputers, emphasis: floating-point performance and fast internal networks
• Internet of Things / Embedded Computers
  – Emphasis: price
Notes by the Pioneers
• “I think there is a world market for maybe five computers.”
  – Thomas Watson, chairman of IBM, 1943.
• “There is no reason for any individual to have a computer in their home”
  – Ken Olsen, president and founder of Digital Equipment Corporation, 1977.
• “640K [of memory] ought to be enough for anybody.”
  – Bill Gates, chairman of Microsoft, 1981.
Components of a Computer
• Same components for all kinds of computer
  – Desktop, server, embedded
• Two core parts
  – Processor and memory
• Input/output includes
  – User-interface devices
    • Display, keyboard, mouse
  – Storage devices
    • Hard disk, CD/DVD, flash
  – Network adapters
    • For communicating with other computers
Inside the Processor (CPU)
• Functional units: perform computations
• Datapath: wires for moving data
• Control logic: sequences datapath, memory, and operations
• Cache memory
  – Small fast SRAM memory for immediate access to data
Apple A5
A Safe Place for Data
• Volatile main memory
  – Loses instructions and data when powered off
• Non-volatile secondary memory
  – Magnetic disk
  – Flash memory
  – Optical disk (CDROM, DVD)
Contents
• Computer components
• Computer architectures and great ideas in computer architectures
• Performance
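What is “Computer Architecture”?
[Figure: layers of a computer system, with the Instruction Set Architecture at the boundary between software and hardware.]
• Software: Applications, Compiler, Operating System, Firmware
• Instruction Set Architecture
• Hardware: Instr. Set Proc., I/O system, Datapath & Control, Digital Design, Circuit Design, Layout & fab, Semiconductor Materials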
The Instruction Set: a Critical Interface
[Figure: the instruction set as the interface between software and hardware]
• Properties of a good abstraction
  – Lasts through many generations (portability)
  – Used in many different ways (generality)
  – Provides convenient functionality to higher levels
  – Permits an efficient implementation at lower levels
Elements of an ISA
• Set of machine-recognized data types
  – bytes, words, integers, floating point, strings, ...
• Operations performed on those data types
  – Add, sub, mul, div, xor, move, ...
• Programmable storage
  – regs, PC, memory
• Methods of identifying and obtaining data referenced by instructions (addressing modes)
  – Literal, reg., absolute, relative, reg + offset, ...
• Format (encoding) of the instructions
  – Opcode, operand fields, ...
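To make the "format (encoding)" element concrete, here is a minimal sketch that packs and unpacks the 32-bit MIPS I-type layout (6-bit opcode, two 5-bit register fields, 16-bit immediate). The function names are made up for this illustration and are not from the course materials; the field widths, the `lw` opcode 0x23, and the register numbers follow the standard MIPS32 convention.

```python
def encode_itype(opcode, rs, rt, imm):
    """Pack a MIPS I-type instruction: opcode(6) | rs(5) | rt(5) | imm(16)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def decode_itype(word):
    """Split a 32-bit word back into its I-type fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "imm": word & 0xFFFF,
    }


LW = 0x23        # opcode of "load word"
T0, R2 = 8, 2    # $t0 is register 8; $2 is register 2

# lw $t0, 0($2): base register goes in rs, destination in rt, offset in imm
word = encode_itype(LW, rs=R2, rt=T0, imm=0)
print(f"{word:032b}")
print(decode_itype(word))
```

The same bit pattern is "just a number" until the decoder assigns meaning to each field, which is what the opcode and operand fields of an instruction format are for.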
Computer Architecture
How things are put together in design and implementation
• Capabilities & performance characteristics of principal functional units
  – (e.g., registers, ALU, shifters, logic units, ...)
• Ways in which these components are interconnected
• Information flows between components
• Logic and means by which such information flow is controlled
• Choreography of FUs to realize the ISA
Great Ideas in Computer Architectures
1. Design for Moore’s Law
2. Use abstraction to simplify design
3. Make the common case fast
4. Performance via parallelism
5. Performance via pipelining
6. Performance via prediction
7. Hierarchy of memories
8. Dependability via redundancy
Great Idea: “Moore’s Law”
Gordon Moore, co-founder of Intel
• 1965: since the integrated circuit was invented, the number of transistors/inch² in these circuits roughly doubled every year; this trend would continue for the foreseeable future
• 1975: revised: circuit complexity doubles every two years
Image credit: Intel
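The difference between the 1965 and 1975 statements is easy to underestimate. As a small worked example (the ten-year horizon and the function name are arbitrary choices for illustration), the growth factor after a given number of years is 2 raised to years divided by the doubling period:

```python
def growth_factor(years, doubling_period_years):
    """Multiplicative growth after `years` if the quantity doubles every period."""
    return 2 ** (years / doubling_period_years)


for period in (1, 2):
    factor = growth_factor(10, period)
    print(f"doubling every {period} yr: x{factor:g} after 10 years")
```

Under these assumptions, doubling every year gives about a thousandfold increase over a decade, while doubling every two years gives 32-fold.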
Moore’s Law trends
• More transistors = ↑ opportunities for exploiting parallelism in the instruction level (ILP)
  – Pipeline, superscalar, VLIW (Very Long Instruction Word), SIMD (Single Instruction Multiple Data) or vector, speculation, branch prediction
• General path of scaling
  – Wider instruction issue, longer pipeline
  – More speculation
  – More and larger registers and cache
• Increasing circuit density ~= increasing frequency ~= increasing performance
• Transparent to users
  – An easy job of getting better performance: buying faster processors (higher frequency)
• We have enjoyed this free lunch for several decades, however (TBC)…
Great Idea: Pipeline
Fundamental Execution Cycle
• Instruction Fetch: obtain instruction from program storage
• Instruction Decode: determine required actions and instruction size
• Operand Fetch: locate and obtain operand data
• Execute: compute result value or status
• Result Store: deposit results in storage for later use
• Next Instruction: determine successor instruction
[Figure: a processor (regs, F.U.s) connected to memory (program, data); the connection between them is labeled the von Neumann bottleneck]
Pipelined Instruction Execution
[Figure: four instructions in program order (vertical axis: instr. order), each passing through the stages Ifetch, Reg, ALU, DMem, Reg and each starting one clock cycle after the previous one; horizontal axis: time in clock cycles, labeled Cycle 1 to Cycle 7]
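The overlap in the diagram can be sketched with a few lines of arithmetic. This is an idealized model that assumes one cycle per stage and no stalls or hazards; the function names are invented for this sketch, and only the five stage names come from the slide.

```python
STAGES = ["Ifetch", "Reg", "ALU", "DMem", "Reg"]


def unpipelined_cycles(n_instr, n_stages):
    """Each instruction runs through all stages before the next one starts."""
    return n_instr * n_stages


def pipelined_cycles(n_instr, n_stages):
    """Ideal pipeline: fill it once, then complete one instruction per cycle."""
    return n_stages + (n_instr - 1)


def schedule(n_instr, stages):
    """Row i lists the stage that instruction i occupies in each clock cycle."""
    total = pipelined_cycles(n_instr, len(stages))
    rows = []
    for i in range(n_instr):
        row = [""] * total
        for s, name in enumerate(stages):
            row[i + s] = name  # instruction i enters the pipeline in cycle i + 1
        rows.append(row)
    return rows


for row in schedule(4, STAGES):
    print(" ".join(f"{cell:7}" for cell in row))
print(unpipelined_cycles(4, len(STAGES)), "cycles unpipelined vs",
      pipelined_cycles(4, len(STAGES)), "cycles pipelined")
```

As the number of instructions grows, the ratio of the two cycle counts approaches the number of stages, which is the ideal speedup; real pipelines fall short of it because of hazards.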
Great Idea: Abstraction (Levels of Representation/Interpretation)
• High Level Language Program (e.g., C)
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
  ↓ Compiler
• Assembly Language Program (e.g., MIPS)
    lw $t0, 0($2)
    lw $t1, 4($2)
    sw $t1, 0($2)
    sw $t0, 4($2)
  ↓ Assembler
• Machine Language Program (MIPS)
    0000 1001 1100 0110 1010 1111 0101 1000
    1010 1111 0101 1000 0000 1001 1100 0110
    1100 0110 1010 1111 0101 1000 0000 1001
    0101 1000 0000 1001 1100 0110 1010 1111
  ↓ Machine Interpretation
• Hardware Architecture Description (e.g., block diagrams)
  ↓ Architecture Implementation
• Logic Circuit Description (Circuit Schematic Diagrams)
Anything can be represented as a number, i.e., data or instructions
The Memory Abstraction
• Association of <name, value> pairs
  – typically named as byte addresses
  – often values aligned on multiples of size
• Sequence of Reads and Writes
• Write binds a value to an address
  – Left value
• Read of addr returns most recently written value bound to that address
  – Right value
[Figure: memory interface signals: address (name), command (R/W), data (W), data (R), done]
Example: int a = b;
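A minimal sketch of this abstraction, assuming nothing beyond the bullets above: memory is a mapping from addresses to values, a write binds a value to an address, and a read returns the most recently written value. The class name and the two addresses are hypothetical, chosen only to act out `int a = b;`.

```python
class Memory:
    """Memory as an association of <address, value> pairs."""

    def __init__(self):
        self._cells = {}

    def write(self, addr, value):
        """Bind a value to an address (the left-value side of an assignment)."""
        self._cells[addr] = value

    def read(self, addr):
        """Return the most recently written value (the right-value side).

        Reading an address that was never written raises KeyError here;
        real hardware would return whatever bits happen to be stored.
        """
        return self._cells[addr]


ADDR_A, ADDR_B = 0x1000, 0x1004   # hypothetical addresses of a and b

mem = Memory()
mem.write(ADDR_B, 7)                  # b = 7
mem.write(ADDR_A, mem.read(ADDR_B))   # int a = b;  read b, then write a
mem.write(ADDR_B, 9)                  # a later write to b does not change a
print(mem.read(ADDR_A), mem.read(ADDR_B))
```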
Great Idea: Memory Hierarchy
Levels of the Memory Hierarchy (capacity, access time, cost)
• CPU Registers: 100s Bytes, << 1s ns
• Cache: 10s-100s K Bytes, ~1 ns, $1s/MByte
• Main Memory: M Bytes, 100 ns - 300 ns, < $1/MByte
• Disk: 10s G Bytes, 10 ms (10,000,000 ns), $0.001/MByte
• Tape: infinite, sec-min, $0.0014/MByte
Staging transfer unit between adjacent levels
• Registers ↔ Cache: instr. operands, 1-8 bytes, managed by prog./compiler
• Cache ↔ Memory: blocks, 8-128 bytes, managed by cache cntl
• Memory ↔ Disk: pages, 512-4K bytes, managed by OS
• Disk ↔ Tape: files, Mbytes, managed by user/operator
Upper levels are faster; lower levels are larger.
Processor-DRAM Memory Gap (latency)
[Figure: performance (log scale, 1 to 1000) versus time (1980 to 2000).]
• CPU (µProc, “Moore’s Law”): 60%/yr. (2X/1.5 yr)
• DRAM: 9%/yr. (2X/10 yrs)
• Processor-Memory Performance Gap: grows 50% / year
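The "about 50% per year" gap follows from compounding the two growth rates quoted on the slide. A short calculation, with helper names invented for this sketch:

```python
import math


def doubling_time(annual_rate):
    """Years needed to double at a fixed compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_rate)


cpu_rate, dram_rate = 0.60, 0.09   # annual improvement rates from the slide

# The gap is the ratio of the two curves, so its yearly growth is a ratio too.
gap_rate = (1 + cpu_rate) / (1 + dram_rate) - 1

print(f"gap grows about {gap_rate:.0%} per year")
print(f"CPU doubles every {doubling_time(cpu_rate):.1f} years")
print(f"DRAM doubles every {doubling_time(dram_rate):.1f} years")
```

The ratio works out to roughly 47% per year, which the slide rounds to 50%. Compounding 9% per year gives a doubling time nearer 8 years than the round "10 yrs" label, so the figure's annotations are best read as approximations.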
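Jim Gray’s Storage Latency Analogy: How Far Away is the Data?
[Figure: access time in ns for each storage level, with the distance it would correspond to]
• Registers: 1 ns, My Head (1 min)
• On Chip Cache: 2 ns, This Room
• On Board Cache: 10 ns, This Campus (10 min)
• Main Memory: 100 ns, Charleston (2 hr)
• Disk: 10^6 ns, Pluto (2 Years)
• Tape / Optical Robot: 10^9 ns, Andromeda (2,000 Years)
Jim Gray, Turing Award; B.S. Cal 1966, Ph.D. Cal 1969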
The Principle of Locality
• The Principle of Locality:
  – Programs access a relatively small portion of the address space at any instant of time.
• Two Different Types of Locality:
  – Temporal Locality (Locality in Time): If an item is referenced, it will tend to be referenced again soon (e.g., loops, reuse)
  – Spatial Locality (Locality in Space): If an item is referenced, items close by tend to be referenced soon (e.g., straight-line code, array access)
• Last 30 years, HW relied on locality for speed
[Figure: processor (P), cache ($), memory (MEM)]
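One way to see why hardware can rely on locality is to run a few access patterns through a toy cache model. The sketch below simulates a direct-mapped cache; the 64 lines of 64 bytes and the three access patterns are arbitrary assumptions for illustration, not parameters of any real processor.

```python
def hit_rate(addresses, num_lines=64, line_bytes=64):
    """Fraction of accesses that hit in a direct-mapped cache."""
    resident = [None] * num_lines      # block number held by each cache line
    hits = 0
    for addr in addresses:
        block = addr // line_bytes     # which memory block the byte lives in
        index = block % num_lines      # the one line that block may occupy
        if resident[index] == block:
            hits += 1
        else:
            resident[index] = block    # miss: fetch the block, evict the old one
    return hits / len(addresses)


N = 4096
sequential = [4 * i for i in range(N)]     # walk an array of 4-byte elements
looped = [4 * (i % 16) for i in range(N)]  # keep reusing the same 16 elements
strided = [4096 * i for i in range(N)]     # every access touches a new block

print("sequential (spatial locality): ", hit_rate(sequential))
print("looped     (temporal locality):", hit_rate(looped))
print("strided    (no locality):      ", hit_rate(strided))
```

In this model the sequential walk misses once per 64-byte line and hits on the other 15 elements, the loop misses only on its first access, and the large stride never hits.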
Great Idea: Parallelism
Parallelism
• Classes of parallelism in applications:
  – Data-Level Parallelism (DLP)
  – Task-Level Parallelism (TLP)
• Classes of architectural parallelism:
  – Instruction-Level Parallelism (ILP)
  – Vector architectures / Graphic Processor Units (GPUs)
  – Thread-Level Parallelism
  – Heterogeneity
Computer Architecture Topics
[Figure: topic map. Area labels: Instruction Set Architecture; Pipelining and Instruction Level Parallelism; Memory Hierarchy; Input/Output and Storage; VLSI; Network Communication; Other Processors. Topics shown:]
• Pipelining, Hazard Resolution, Superscalar, Reordering, Prediction, Speculation, Vector, Dynamic Compilation
• Addressing, Protection, Exception Handling
• L1 Cache, L2/L3 Cache, DRAM
• Coherence, Bandwidth, Latency
• Disks, WORM, Tape
• Emerging Technologies, Interleaving, Bus protocols
• RAID
Why is Architecture Exciting Today?
[Figure annotation: CPU Speed Flat]
Single Processor Performance
[Figure annotations: RISC; Move to multi-processor]
Problems of Traditional ILP Scaling
• Fundamental circuit limitations [1]
  – delays ⇑ as issue queues ⇑ and multi-port register files ⇑
  – increasing delays limit performance returns from wider issue
• Limited amount of instruction-level parallelism [1]
  – inefficient for codes with difficult-to-predict branches
• Power and heat stall clock frequencies
[1] The case for a single-chip multiprocessor, K. Olukotun, B. Nayfeh, L. Hammond, K. Wilson, and K. Chang, ASPLOS-VII, 1996.
ILP impacts
Simulations of 8-issue Superscalar
Power/Heat Density Limits Frequency
• Some fundamental physical limits are being reached
Recent Multicore Processors
Recent Manycore GPU processors

An Overview of the GK110 Kepler Architecture
Kepler GK110 was built first and foremost for Tesla, and its goal was to be the highest performing parallel computing microprocessor in the world. GK110 not only greatly exceeds the raw compute horsepower delivered by Fermi, but it does so efficiently, consuming significantly less power and generating much less heat output.
A full Kepler GK110 implementation includes 15 SMX units and six 64-bit memory controllers. Different products will use different configurations of GK110. For example, some products may deploy 13 or 14 SMXs.
Key features of the architecture that will be discussed below in more depth include:
• The new SMX processor architecture
• An enhanced memory subsystem, offering additional caching capabilities, more bandwidth at each level of the hierarchy, and a fully redesigned and substantially faster DRAM I/O implementation.
• Hardware support throughout the design to enable new programming model capabilities
[Figure: Kepler GK110 full chip block diagram]

Streaming Multiprocessor (SMX) Architecture
Kepler GK110’s new SMX introduces several architectural innovations that make it not only the most powerful multiprocessor we’ve built, but also the most programmable and power efficient.
SMX: 192 single-precision CUDA cores, 64 double-precision units, 32 special function units (SFU), and 32 load/store units (LD/ST).

Kepler Memory Subsystem / L1, L2, ECC
Kepler’s memory hierarchy is organized similarly to Fermi. The Kepler architecture supports a unified memory request path for loads and stores, with an L1 cache per SMX multiprocessor. Kepler GK110 also enables compiler-directed use of an additional new cache for read-only data, as described below.

64 KB Configurable Shared Memory and L1 Cache
In the Kepler GK110 architecture, as in the previous generation Fermi architecture, each SMX has 64 KB of on-chip memory that can be configured as 48 KB of shared memory with 16 KB of L1 cache, or as 16 KB of shared memory with 48 KB of L1 cache. Kepler now allows for additional flexibility in configuring the allocation of shared memory and L1 cache by permitting a 32 KB / 32 KB split between shared memory and L1 cache. To support the increased throughput of each SMX unit, the shared memory bandwidth for 64b and larger load operations is also doubled compared to the Fermi SM, to 256B per core clock.

48 KB Read-Only Data Cache
In addition to the L1 cache, Kepler introduces a 48 KB cache for data that is known to be read-only for the duration of the function. In the Fermi generation, this cache was accessible only by the Texture unit. Expert programmers often found it advantageous to load data through this path explicitly by mapping their data as textures, but this approach had many limitations.

• ~5k cores
Current Trends in Architecture
• Leveraging Instruction-Level parallelism (ILP) is near an end
  – Single processor performance improvement ended in 2003
• New models for performance:
  – Data-level parallelism (DLP)
  – Thread-level parallelism (TLP)
• Exciting topics and challenges
  – Heterogeneity
  – Domain specific architectures
  – Software and hardware co-design
  – Agile development
• DARPA Picks Its First Set of Winners in Electronics Resurgence Initiative, July 2018
  • https://spectrum.ieee.org/tech-talk/semiconductors/design/darpa-picks-its-first-set-of-winners-in-electronics-resurgence-initiative.amp.html
• Video: https://www.acm.org/hennessy-patterson-turing-lecture
• Short summary
  – https://www.hpcwire.com/2018/04/17/hennessy-patterson-a-new-golden-age-for-computer-architecture/
Exercise: Inspect ISA for sum
• Sum example
  – https://passlab.github.io/CSCE513/exercises/sum
• Check
  – sum_full.s
  – sum_riscv.s
  – sum_x86.s
• Generate and execute
  – gcc -save-temps sum.c -o sum
  – ./sum 102400
• For how to compile and run a Linux program
  – https://passlab.github.io/CSCE513/notes/lecture01_LinuxCProgramming.pdf
• Other system commands:
  – cat /proc/cpuinfo to show the CPU and #cores
  – top command to show system usage and memory
Machine for Development and Experiment
• Linux machines in Swearingen 1D43 and 3D22
  – All CSCE students by default have access to these machines using their standard login credentials
    • Let me know if you, CSCE or not, cannot access
  – Remote access is also available via SSH over port 222. Naming schema is as follows:
    • l-1d43-01.cse.sc.edu through l-1d43-26.cse.sc.edu
    • l-3d22-01.cse.sc.edu through l-3d22-20.cse.sc.edu
• Restricted to 2GB of data in their home folder (~/).
  – For more space, create a directory in /scratch on the login machine; however, that data is not shared and it will only be available on that specific machine.
PuTTY SSH Connection on Windows
[Screenshot: PuTTY session with host name l-1d43-08.cse.sc.edu and port 222]
SSH Connection from Linux/Mac OS X Terminal
-X for enabling X-windows forwarding so you can use the graphics display on your computer. For Mac OS X, you need to have X server software installed, e.g. XQuartz (https://www.xquartz.org/) is the one I use.