CS 2630 Computer Organization
homepage.cs.uiowa.edu/~bdmyers/cs2630_sp17/public/...
Post on 13-Jun-2020
CS 2630 Computer Organization
What did we accomplish in 15 weeks?
Brandon Myers
University of Iowa
Why take 2630?
• The esoteric answer: Computer Science graduates should have an appreciation for how real computers work
• But really…
• 1. It will be up to you to design our new computer systems... computer architects have been panicking for nearly a decade and they are not calming down
• 2. Even if you vow to never, ever, EVER do anything except applications programming... at some point you will have to measure a system you’ve built: performance (latency & throughput), energy usage, reliability, ... To understand how to measure/interpret/improve your system, you need to understand more of the computer
required slide from day 1
App
High-level language (e.g., C, Java)
Instruction set architecture (e.g., MIPS)
Compiler
Operating system (e.g., Linux, Windows)
Memory system, I/O system, Processor
Datapath & Control
Digital logic
Circuits
Devices (e.g., transistors)
Physics
lw $t0, 4($s0)
addi $t0, $t0, 10
sw $t0, 8($s0)
You learned how to write assembly code in HW2 (usually the compiler does the work for us)
rug: don’t need to write assembly code for a particular architecture. Instead write portable Java/Python/C code
bumps: some C code isn’t portable; some programmers write snippets of assembly code when the compiler doesn’t do the best thing
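What the three instructions do can be traced in a few lines of Python. This is a minimal sketch, not a MIPS simulator: the initial register and memory values, the dictionary-based memory, and the name `run_snippet` are assumptions made for illustration.

```python
def run_snippet(regs, mem):
    """Trace lw/addi/sw; mem maps the byte address of each word to its value."""
    regs = dict(regs)  # work on copies so the caller's state is untouched
    mem = dict(mem)
    # lw $t0, 4($s0): load the word at address $s0 + 4 into $t0
    regs["$t0"] = mem[regs["$s0"] + 4]
    # addi $t0, $t0, 10: add the immediate 10
    # (real addi traps on signed overflow; this sketch just wraps to 32 bits)
    regs["$t0"] = (regs["$t0"] + 10) & 0xFFFFFFFF
    # sw $t0, 8($s0): store $t0 into the word at address $s0 + 8
    mem[regs["$s0"] + 8] = regs["$t0"]
    return regs, mem


# Hypothetical starting state: $s0 points at 0x1000, the word at 0x1004 holds 32.
regs, mem = run_snippet({"$s0": 0x1000, "$t0": 0}, {0x1004: 32, 0x1008: 0})
print(regs["$t0"], mem[0x1008])
```

With this starting state, the word at 0x1004 is loaded, incremented by 10, and written to 0x1008.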
10001110000010000000000000000100
00100001000010000000000000001010
10101110000010000000000000001000
lw $t0, 4($s0)
addi $t0, $t0, 10
sw $t0, 8($s0)
Project 1 – MiniMA the MIPS assembler
rug: we can write MIPS programs in a language made of human-readable characters, use pseudoinstructions, refer to labels even though the machine reads binary numbers
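The translation from the three assembly lines to the 96 bits can be reproduced by hand with the MIPS I-type layout: opcode (6 bits), rs (5), rt (5), immediate (16). The sketch below is not MiniMA; it covers only the registers and opcodes that appear on the slide, and the names `REGS`, `OPCODES`, and `encode_i_type` are made up for illustration.

```python
REGS = {"$t0": 8, "$s0": 16}  # only the registers used on the slide
OPCODES = {"lw": 0b100011, "addi": 0b001000, "sw": 0b101011}


def encode_i_type(op, rs, rt, imm):
    """Pack opcode(6) | rs(5) | rt(5) | immediate(16) into one 32-bit word."""
    return (OPCODES[op] << 26) | (REGS[rs] << 21) | (REGS[rt] << 16) | (imm & 0xFFFF)


program = [
    encode_i_type("lw", "$s0", "$t0", 4),     # lw   $t0, 4($s0)
    encode_i_type("addi", "$t0", "$t0", 10),  # addi $t0, $t0, 10
    encode_i_type("sw", "$s0", "$t0", 8),     # sw   $t0, 8($s0)
]

# Concatenate the words as 32-bit binary strings, as on the slide.
bits = "".join(f"{word:032b}" for word in program)
for word in program:
    print(f"{word:032b}  0x{word:08X}")
```

For loads and stores the base register goes in the rs field and the data register in rt, which is why `$s0` comes first in the `lw` and `sw` calls.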
[Figure: object files of machine code are linked with the libraries math.a and stdlib.a into the executable browser.exe]
rug: linker allows us to write our programs modularly
PEER INSTRUCTION (actually, a survey)
a = strongly agree, b = agree, c = neutral, d = disagree, e = strongly disagree
1. I understand the relationship between bits, numbers, and information.
2. I understand the stored program concept
3. I understand the role of the instruction set architecture in a computer
4. I understand why abstractions are essential for building complex systems
5. I understand why the digital abstraction is important
6. I understand why the synchronous abstraction is important
7. I understand the tradeoffs in the memory hierarchy
8. I understand how problems can be decomposed into a datapath and a control
9. I appreciate the layers of the computing stack and why they may need to change in the near future.
[Figure: the machine code of browser.exe]
In Project 2-2 you played the role of the Loader by loading your hex image into the Instruction memory
rug: our program has the illusion of having access to the entire address space (e.g. all 2^32 bytes) of the computer
Project 2-2: you designed, built, and tested a processor that runs assembled MIPS programs (datapath and control)
rug: machine code for the MIPS architecture ought to run on any MIPS processor, regardless of its design (its microarchitecture)
bumps: choices about the architecture sometimes are based on assumptions about the microarchitecture (e.g., MIPS branch delay slot).
Project 2-1, HW4: you built components (like register files and finite state machines) from sequential logic
rug: we can build a complex system out of basic components. Synchronous abstraction allows us to not have to worry about interfaces between components
Digital logic
Circuits
Instruction set architecture (e.g., MIPS)
GPU instruction set
neural network structure and weights
You don’t have to build the 5-stage pipelined MIPS processor
http://people.cs.pitt.edu/~cho/cs2410/papers/yeager-micromag96.pdf
MIPS R10000 (out-of-order superscalar)
you don’t have to build a MIPS processor
systolic array and control
every component is made of logic gates; you learned how to build logic circuits in HW3
logic gate made of pMOS and nMOS transistors arranged in a CMOS configuration
rug: It is a bit easier just to think of gates that are functions as opposed to transistors attaching output to Vsource or Vground
rug: CMOS ensures every gate has a pure 0 or 1 output! This idea is the digital abstraction that lets the layers above compose two electrical circuits without worrying about how they affect each other.
layout engineer’s view of a NOR gate
https://commons.wikimedia.org/wiki/File:NOR_gate_layout.png
rug: When building a functional digital logic circuit, no need to worry about how it is arranged on the silicon
nMOS cross section
https://commons.wikimedia.org/wiki/File:MOSFET_functioning_body.svg
rug: device engineers provide layout engineers with “design rules”. If they obey the required spacing between components then the transistors will work
OFF ON
rug: When operating a transistor in the saturation regime, it looks like an electrical switch between voltages GND and VDD. Part of supporting the digital abstraction.
The Creation of Adam
The programmable computer
blatantly stealing a tradition from my Computer Organization instructor
this slide set to Handel’s Hallelujah chorus
SOFTWARE
HARDWARE
It’s not enough to just build software these days
Project Catapult: custom hardware running part of Bing search
computer vision processors running Google’s augmented reality platform Tango
“holographic” processor for Microsoft’s augmented reality platform HoloLens
Sparc M7 chip is built specifically for accelerating database queries
all in production, not just research
Google TPU is built for machine learning
Life beyond Logisim?
• Logisim’s main mode of input is schematic entry
• Much digital logic design uses hardware description languages (HDL) like Verilog (look up Verilog in your textbook index)
• HDL is not much different than what you did, except it is textual instead of graphical
• Typically have powerful compilers that make development easier than using Logisim, e.g., write a statement like
case (ALUCtrl)
  0: R = X + Y;
  1: R = X - Y;
  …
endcase
And you get an ALU!
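The same "select the operation by control code" idea can be mimicked in software. A minimal Python sketch, assuming a 32-bit datapath and only the two operations shown on the slide; the function name `alu`, the mask, and the error for unlisted codes are assumptions for illustration, not part of any HDL.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit datapath width


def alu(alu_ctrl, x, y):
    """Return R for the given control code, wrapped to 32 bits."""
    if alu_ctrl == 0:
        return (x + y) & MASK32  # 0: R = X + Y
    if alu_ctrl == 1:
        return (x - y) & MASK32  # 1: R = X - Y
    # The slide elides the remaining cases, so this sketch rejects them.
    raise ValueError(f"unsupported ALUCtrl value: {alu_ctrl}")


print(alu(0, 2, 3))
print(hex(alu(1, 2, 3)))
```

Unlike this function, which runs one branch at a time, the hardware generated from an HDL case statement typically computes the candidate results in parallel and uses a multiplexer to pick one.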
Did we really build a real processor?
• Yes! You implemented much of the MIPS Instruction Set Architecture and I/O. Your Project 2-2 could run Linux (at 4 kHz clock frequency) given a bootloader program and Linux compiled for MIPS.
No, I mean like real hardware
If we use a hardware compiler, we could turn your Logisim files (look inside; it’s just some XML listing a bunch of components and wires) into an FPGA design or standard-cell VLSI design
more to learn about how to deal with the details of these design flows, but you have a good starting point
Administrivia
• Final Exam
• Friday, 3-5pm in here!
• open notes/book, no electronics
• reminder: practice materials on ICON announcement
The insights you brought to the course: CA Topics
What courses next?
• CS:3620 Operating Systems
• CS:3210 Programming Languages and Tools (in C++)
• CS:3640 Introduction to Networks and Their Applications
• CS:3820 Programming Language Concepts
• CS:4640 Computer Security
• CS:4700 High Performance and Parallel Computing
• CS:5610 High Performance Computer Architecture
• CS:4980 Topics in CS (Compiler Construction on Raspberry Pi)
• CS:4980 Topics in CS (ask for a computer architecture course!)
What’s to learn next: operating systems
Questions we didn’t get to answer fully in CS 2630: Operating systems
• how do multiple programs share the computer?
  • 2-64 processors
  • 1 network interface
  • 1 memory
  • 1 keyboard, mouse, screen
  • 100’s of running programs
• how do you keep programs isolated from each other or one program from consuming all resources?
• how do you implement syscalls?
• how do you load the OS code into memory when you power on the computer?
What’s to learn next: computer architecture
• the role of parallelism in microarchitectures
• every implementation and its effect on performance...
$$\frac{\text{seconds}}{\text{program}} = \frac{\text{seconds}}{\text{cycle}} \times \frac{\text{cycles}}{\text{instruction}} \times \frac{\text{instructions}}{\text{program}}$$
...and cost and energy
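The performance equation can be evaluated directly once the three factors are known. A small sketch with made-up numbers: the instruction count, CPI, and clock rate below are hypothetical, and `seconds_per_program` is a name chosen for illustration.

```python
def seconds_per_program(instructions, cycles_per_instruction, clock_hz):
    """seconds/program = seconds/cycle * cycles/instruction * instructions/program."""
    seconds_per_cycle = 1.0 / clock_hz
    return seconds_per_cycle * cycles_per_instruction * instructions


# Hypothetical workload: 1 billion instructions, CPI of 1.5, 2 GHz clock.
baseline = seconds_per_program(1_000_000_000, 1.5, 2_000_000_000)

# The same workload on a design that lowers CPI to 1.0 at the same clock rate.
improved = seconds_per_program(1_000_000_000, 1.0, 2_000_000_000)

print(baseline, improved)
```

Each factor is owned by a different layer: the clock period by the circuit and microarchitecture, CPI by the microarchitecture, and the instruction count by the compiler and the instruction set.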
Parallelism in architectures
pipelining
vector
superscalar
dataflow
and others... multicore, VLIW, multithreading, ...
Vector machines
found in...
• early supercomputers
• Intel AVX
• GPUs
SIMD: single instruction, multiple data
Superscalar machines
Replicate resources, e.g.,
• two decoders, 2-wide instruction cache read port: fetch two instructions at a time
• two ALUs: execute two instructions at a time
• more register file write ports: write back two registers in one cycle
found in... most CPUs in servers and smartphones
superscalar + pipelining
superscalar
Dataflow machines
a processor needs to get the instruction and the input data to the same physical place at the same time (known as “dataflow locality”)
Dataflow machines have a bunch of execution units of various kinds; the data “flows” through the operators
Challenges?
What’s to learn next: parallel computing
Metric for performance comparison: Time
My program runs in 100 seconds
If I “parallelize it” on 10 processors, I see that it runs in 12 seconds
What is the speedup?
Tserial / Tparallel = 100/12 = 8.33X
Predicting parallel running time (Tpar) from serial running time
My program runs in Tser = 100 seconds
If I “parallelize it” on 10 processors, how fast will it run (i.e., what is Tpar)?
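$$\frac{T_{\text{original}}}{T_{\text{improved}}} = \frac{1}{(1 - r) + \frac{r}{s}}$$
$$T_{\text{improved}} = T_{\text{original}} \cdot \left( (1 - r) \cdot 1 + r \cdot \frac{1}{s} \right)$$
In this form, it is called Amdahl’s law: says your speedup is limited by how much of the program is improved (e.g., parallelized)
r = fraction of program that is able to be improved
s = speedup when applying the improvement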
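Amdahl’s law is easy to experiment with numerically. The sketch below uses the 100-second program on 10 processors from the earlier slides; the value r = 0.9 is an assumed example, and the function names are chosen for illustration. The last function inverts the formula to estimate r from a measured time, which is a rearrangement of the same equation rather than something stated on the slides.

```python
def improved_time(t_original, r, s):
    """Amdahl's law: T_improved = T_original * ((1 - r) + r / s)."""
    return t_original * ((1 - r) + r / s)


def speedup(t_original, t_improved):
    """Speedup = T_original / T_improved."""
    return t_original / t_improved


def improvable_fraction(t_original, t_improved, s):
    """Solve Amdahl's law for r given measured times (requires s != 1)."""
    return (1 - t_improved / t_original) / (1 - 1 / s)


# Assumed: 90% of the program parallelizes perfectly across 10 processors.
t_par = improved_time(100, r=0.9, s=10)
print(t_par, speedup(100, t_par))

# Measured: 12 seconds on 10 processors. What r does that imply?
r_measured = improvable_fraction(100, 12, s=10)
print(speedup(100, 12), r_measured)
```

Even with s = 10, a program with r = 0.9 speeds up by well under 10X, because the serial 10% is untouched by the improvement.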
https://en.wikipedia.org/wiki/Amdahl's_law#/media/File:AmdahlsLaw.svg
Amdahl’s law applied to parallelization
[Figure: Amdahl’s law plot with labels r and s]
A sequential abstract machine model you already know
• RAM: random access memory
• just like any other computational step, accessing memory has a cost of 1
[Figure: a processor attached to a memory (RAM)]
One of the foundational parallel machine models: Parallel Random Access Machine (PRAM)
• All processors are attached to a shared memory
• Memory access takes 1 step
• More realistic variants of PRAM incur greater cost for “conflicting” memory accesses
• used very often for understanding the speedup limits of parallel algorithms; not very realistic
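One way to see how PRAM is used to reason about speedup limits is to count steps for a simple algorithm. The sketch below simulates a tree-style sum on an idealized PRAM with as many processors as needed; it runs sequentially in Python and only counts the parallel steps. The function name `pram_sum` and the choice of summation as the example are assumptions for illustration.

```python
def pram_sum(values):
    """Simulate a tree reduction on an idealized PRAM.

    Returns (total, parallel_steps). Each pass of the outer loop is one
    parallel step: every addition in it touches distinct memory cells, so
    with enough processors they could all happen at once.
    """
    cells = list(values)  # the shared memory
    n = len(cells)
    steps = 0
    stride = 1
    while stride < n:
        # Processor i adds cell i + stride into cell i; no two processors
        # read or write the same cell within a step.
        for i in range(0, n - stride, 2 * stride):
            cells[i] += cells[i + stride]
        stride *= 2
        steps += 1
    total = cells[0] if cells else 0
    return total, steps


total, steps = pram_sum(range(1, 9))
print(total, steps)
```

Summing 8 values takes 7 additions either way, but only 3 parallel steps here, so the step count grows with log2(n) instead of n.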
One of the foundational parallel machine models: Bulk synchronous parallel (BSP)
https://en.wikipedia.org/wiki/Bulk_synchronous_parallel
[Figure: a BSP superstep, labeled w_1, w_2, w_p, l, and h_i]
(see blackboard notes)
this abstract machine does not support as many algorithms as CTA, but it is simpler
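The blackboard notes are not part of this transcript, so the following is only a sketch of the standard textbook BSP cost for one superstep: the longest local computation, plus the communication cost, plus the barrier cost l. The per-message cost `g` is a parameter of the standard model that does not appear in the slide labels; it and the function name are assumptions here.

```python
def bsp_superstep_cost(work, messages, g, l):
    """Cost of one superstep: max_i(w_i) + g * max_i(h_i) + l.

    work[i]     = local computation w_i on processor i
    messages[i] = h_i, the most messages processor i sends or receives
    g           = cost per message (assumed model parameter)
    l           = cost of the barrier synchronization
    """
    return max(work) + g * max(messages) + l


# Hypothetical superstep on 3 processors.
cost = bsp_superstep_cost(work=[10, 12, 8], messages=[3, 5, 2], g=2, l=4)
print(cost)
```

Because every processor waits at the barrier, the slowest processor and the busiest communicator set the cost of the whole superstep.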
The future of CS 2630
• Please stay in touch!
• Tell others how awesome CS 2630 is!
• Sign up to be an approved tutor! https://cs.uiowa.edu/resources/approved-tutors
• CS 2630 moving to TILE classrooms in Fall
• Replacing some lectures with lab assignments
• allow us to better support learning all the tools, get more time analyzing and designing programs and circuits
• very speculatively: future opportunity for lab assistants (help students but do not grade work)