Lecture 1 1
CS 352H: Computer Systems Architecture
Lecture 1: What is Computer Architecture and why should I care?
Professor Emmett Witchel
University of Texas at Austin
witchel@cs.utexas.edu
Goals
• Understand the “how” and “why” of computer system organization
  – Instruction Set Architecture
  – System Organization (processor, memory, I/O)
  – Microarchitecture
  – Virtualization
• Learn methods of evaluating performance
  – Metrics & benchmarks
• Learn how to make systems go fast
  – Pipelining, caching
  – Parallelism (ILP, DLP, TLP)
  – Application specific architectures (graphics, signal proc.)
• Preview of where architecture is heading
Logistics
Lectures: T/Th 12:30-2:00pm, PAI 3.14
Instructor: Prof. Emmett Witchel, W 1:15-2:15
TA: Shalini Sahoo, MW 11:30-1:00pm, PAI 5.38 Desk 1
Grading: see web page
Texts: Patterson & Hennessy, Computer Organization and Design (Fourth Edition), including CD. Revised Fourth Edition preferred, not required.
CS352H Online
URL: www.cs.utexas.edu/users/witchel/CS352H
I will occasionally email you via Blackboard and at your registered email address. I expect this channel to be reliable and timely.
Discussion group: via Blackboard login at courses.utexas.edu
  – General, Homeworks, Project
Computer Architecture Seminar Series:
www.cs.utexas.edu/users/cart/arch
Assignment for Next Tuesday
• Turn in student survey forms, if you want
• Read the Moore paper (see webpage)
  – Write a review of 1/2-1 page (see syllabus)
  – Review should include
    • Summary of content of paper
    • Your observations on the most interesting/important aspects
    • Your observations on its relevance today
  – Be prepared to discuss on Tuesday in class
Discussion
• Are you interested in taking this course?
• One question about computer science
• One question about computer architecture
CS352H Fall 2007
Arch vs. µarch
[Figure: levels of abstraction, top to bottom, with the example shown beside each level]
• Specification: compute the fibonacci sequence
• Program: for(i=2; i<100; i++) { a[i] = a[i-1]+a[i-2]; }
• ISA (Instruction Set Architecture): load r1, a[i]; add r2, r2, r1;
• microArchitecture: registers
• Logic
• Transistors
• Physics/Chemistry
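The program-level example from the figure can be made runnable. The sketch below uses Python rather than the slide's C-like notation, and it assumes the seed values a[0] = 0 and a[1] = 1, which the slide does not show. The function name `fibonacci_table` is made up for illustration.

```python
def fibonacci_table(n=100):
    """Fill a[0..n-1] using the slide's recurrence a[i] = a[i-1] + a[i-2]."""
    a = [0] * n
    # Seed values are an assumption; the slide only shows the loop.
    if n > 1:
        a[1] = 1
    for i in range(2, n):
        a[i] = a[i - 1] + a[i - 2]
    return a


a = fibonacci_table()
print(a[:10])
```

Python integers do not overflow, so all 100 entries are exact. The same loop over 64-bit integers in C would overflow before it finishes, which is the kind of detail that lives below the "Program" level of the stack.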
CS352H Topics
• Technology Trends
• Instruction set architectures
• Pipelining
• Modern pipelined architectures
  – Dynamic ILP machines
  – Static ILP machines
• Cache memory systems
• Virtual memory
• Multiprocessors
• Computer system implementation
Making This Class Work For You
• Plus and minus grades
• Clickers
What is Computer Architecture?
[Figure: the Computer Architect balances Technology and Applications through Interfaces, Machine Organization, and Measurement & Evaluation. Diagram labels: ISA, API, Link, I/O Chan, Regs, IR]
Technology Constraints
• Yearly improvement
  – Semiconductor technology
    • 60% more devices per chip (doubles every 18 months)
    • 15% faster devices (doubles every 5 years)
    • Slower wires
  – Magnetic Disks
    • 60% increase in density
  – Circuit boards
    • 5% increase in wire density
  – Cables
    • no change
[Figure: chip photos by year (1989, 1992, 1995, 1998, 2002, 2006) and process size (1000nm, 800nm, 350nm, 250nm, 130nm, 90nm). >100x more devices since 1989; 10x faster devices]
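The doubling periods quoted for semiconductor technology follow from compound growth: at a yearly rate r, the doubling time is ln 2 / ln(1 + r). A small sketch to check the slide's numbers (the function name is made up for illustration):

```python
import math


def doubling_time_years(annual_growth):
    """Years needed to double at a fixed compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


# 60% more devices per chip each year
print(round(doubling_time_years(0.60) * 12, 1), "months")
# 15% faster devices each year
print(round(doubling_time_years(0.15), 1), "years")
```

The results come out close to 18 months and 5 years, matching the parenthetical figures in the list above.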
Changing Technology leads to Changing Architecture
• 1970s
  – multi-chip CPUs
  – semiconductor memory very expensive
  – microcoded control
  – complex instruction sets (good code density)
• 1980s
  – single-chip CPUs, on-chip RAM feasible
  – simple, hard-wired control
  – simple instruction sets
  – small on-chip caches
• 1990s
  – lots of transistors
  – complex control to exploit instruction-level parallelism
• 2000s
  – even more transistors
  – Power wall
  – Transition to CMPs
  – Multi-level caches
• 2010s
  – Embedded vs. Desktop vs. Data center (cloud)
  – New storage (PCM, flash)
  – Simpler cores and lots of them
  – Optimizing for power
Intel 4004 - 1971
• The first microprocessor
• 2,300 transistors
• 108 kHz
• 10µm process
Some Recent Chips!
Intel Pentium IV
• 42 million transistors
• 4GHz
• 0.13µm process
• Could fit ~15,000 4004s on this chip!
NVidia - GeForce 6800
• 222 million transistors
• 400MHz
• 0.13µm process
Intel Itanium II (Montecito)
• 1.7 billion transistors
• 1.6 GHz
• 90nm process
IBM Cell
• 8 vector processors + 1 PPC
• 4 GHz
• 90nm process
Intel’s net revenue was around $35 billion a year for most of the aughts
R&D about $5 billion a year
Any Architecture You Want (as long as it is x86)
Application Constraints
• Applications drive machine ‘balance’
  – Numerical simulations
    • floating-point performance
    • main memory bandwidth
  – Transaction processing
    • I/Os per second
    • integer CPU performance
  – Decision support
    • I/O bandwidth
  – Embedded control
    • I/O timing, power
  – Media processing
    • low-precision ‘pixel’ arithmetic
Application-Driven Architectures
• General purpose - good performance on “all” programs
  – x86 family, ARM, PowerPC, etc.
• Application specificity can focus on:
  – Types of concurrency available
  – Domain of deployment (server, handheld, desktop)
• Today - overview of graphics processors
  – Interface (instruction set architecture - ISA)
  – Processor organization
  – Concurrent elements
Apple’s iPad/iPhone4 Powered by A4 Chip
• A4 is a modified ARM Cortex running at 1GHz
  – Integrated processor, graphics, memory controller
• Among other claims, ARM says the processor gets a near "25 percent processing power boost, even at same processor speed, from the use of a new instruction pipelining system."
  – We will cover pipelining in this class.
• Claim: 10 hours of 1024x768 video at 25W
• Let’s look at the Freescale i.MX51
Performance: Latency and Throughput
• Latency: time to complete an operation
• Throughput: work completed per unit time
• Consider plumbing
  – Low latency: turn on faucet and water comes out
  – High bandwidth: lots of water (e.g., to fill a pool)
• What is “High speed Internet?”
  – Low latency: needed for interactive gaming
  – High bandwidth: needed for downloading large files
  – Marketing departments like to conflate latency and bandwidth…
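One way to see why the two should not be conflated is a first-order model in which transfer time = latency + size / bandwidth. This model and the two link configurations below are illustrative assumptions, not figures from the lecture.

```python
def transfer_time_s(size_bytes, latency_s, bandwidth_bytes_per_s):
    """First-order model: fixed startup latency plus time to stream the data."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


# Two hypothetical links (numbers invented for illustration).
low_latency = dict(latency_s=0.010, bandwidth_bytes_per_s=1e6)
high_bandwidth = dict(latency_s=0.200, bandwidth_bytes_per_s=100e6)

for label, size in (("game packet, 100 B", 100), ("large file, 1 GB", 1e9)):
    a = transfer_time_s(size, **low_latency)
    b = transfer_time_s(size, **high_bandwidth)
    print(f"{label}: low-latency link {a:.4f} s, high-bandwidth link {b:.4f} s")
```

Under these assumptions the small packet is dominated by latency and the large file by bandwidth, so each link wins one of the two cases.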
Relationship between Latency and Throughput
• Latency and bandwidth are only loosely coupled
  – Henry Ford: assembly lines increase bandwidth without reducing latency
• My factory takes 1 day to make a Model-T Ford.
  – But I can start building a new car every 10 minutes
  – At 24 hrs/day, I can make 24 * 6 = 144 cars per day
  – A special order for 1 green car still takes 1 day
  – Throughput is increased, but latency is not.
• Latency reduction is difficult
• Often, one can buy bandwidth
  – E.g., more memory chips, more disks, more computers
  – Big server farms (e.g., Google) are high bandwidth
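The Model-T arithmetic can be written out as a short sketch. The function name and the "no overlap" baseline are illustrative assumptions; the 1-day latency and 10-minute start interval come from the slide.

```python
def throughput_per_day(latency_days, start_interval_min=None):
    """Cars finished per day in steady state."""
    minutes_per_day = 24 * 60
    if start_interval_min is None:
        # No overlap: a new car starts only when the previous one is done.
        return 1 / latency_days
    # Assembly line: a new car starts (and one finishes) every interval.
    return minutes_per_day / start_interval_min


print(throughput_per_day(1))       # one car at a time
print(throughput_per_day(1, 10))   # assembly line, new car every 10 minutes
```

In both calls the latency argument stays at 1 day. Only the rate at which new cars are started changes, which is the same idea pipelining applies to instructions.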
What is cloud computing?
• Cloud computing is where dynamically scalable and often virtualized resources are provided as a service over the Internet (thanks, Wikipedia!)
• Infrastructure as a service (IaaS)
  – Amazon’s EC2 (elastic compute cloud)
• Platform as a service (PaaS)
  – Google Gears
  – Microsoft Azure
• Software as a service (SaaS)
  – Gmail
  – Facebook
  – Flickr
Thanks, James Hamilton, Amazon
Graphics has dedicated chip in PCs
[Figure: PC block diagram. The CPU (Intel “Kentsfield” quad core, QX6700, 65nm, two dies, 8MB L2$; 582 million transistors) connects to the Memory Controller Chip (“North Bridge”). The North Bridge connects to Memory, to the Graphics Processor (GeForce 8800, 90nm; 681 million transistors) over AGP or PCIe, and to the Input/Output Glue Chip (“South Bridge”), which handles Disk, Keyboard, PCIe, etc.]
GPU/CPU Performance comparison
[Figure: GFLOPS comparison chart for NVIDIA GPUs, with IBM Cell and Core 2 Quad marked for reference. Y axis: GFLOPS]
G80 = GeForce 8800 GTX
G71 = GeForce 7900 GTX
G70 = GeForce 7800 GTX
NV40 = GeForce 6800 Ultra
NV35 = GeForce FX 5950 Ultra
NV30 = GeForce FX 5800
* IBM Cell ~200 GFLOPS
* Core 2 Quad 3GHz, 96 GFLOPS
Source: NVIDIA (except Cell and Core 2 Quad)
Why a dedicated processing chip?
• 1) Specialization – becoming less important with time
• 2) Parallelism – becoming more important
Graphics processors are the only highly-parallel processors in every desktop machine.
128 “processors” * 2 FLOPS @ 1.35 GHz
You can program them!
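The "128 processors * 2 FLOPS @ 1.35 GHz" line is a peak-rate estimate: units times floating-point operations per cycle times clock rate. A minimal sketch follows; the function name is made up, and the 8 FLOPs per cycle per core used for the CPU is an assumption chosen because it reproduces the 96 GFLOPS quoted for the 3GHz Core 2 Quad.

```python
def peak_gflops(units, flops_per_cycle, clock_ghz):
    """Peak rate = parallel units x FLOPs per unit per cycle x clock (GHz)."""
    return units * flops_per_cycle * clock_ghz


# GeForce 8800: figures taken from the slide.
print(peak_gflops(128, 2, 1.35))
# Core 2 Quad at 3 GHz: 4 cores, assuming 8 single-precision FLOPs/cycle/core.
print(peak_gflops(4, 8, 3.0))
```

These are theoretical peaks under the stated assumptions. Sustained performance on real programs is normally lower.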
Graphics requires programmability
Every application does something a bit different.
Example Cg “shader” program (invoked like a “callback” function):
void normalmapped(float2 normalMapTexCoord : TEXCOORD0,
                  …
                  out float4 color : COLOR,
                  uniform float ambient,
                  …)
{
  float3 normalTex, …;
  normalTex = tex2D(normalMap, normalMapTexCoord).xyz;
  …
  diffuse = saturate(dot(normal, normLightDir));
  …
  color = Kd * (ambient + diffuse) +
          Ks * pow(specular, specularExponent);
}
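For readers without a Cg toolchain, the lighting arithmetic visible in the shader can be mimicked on the CPU. This Python sketch is only an analogue: it treats Kd and Ks as scalars, takes `specular` as an input because the slide elides how it is computed, and defines `saturate` as a clamp to [0, 1].

```python
def saturate(x):
    """Clamp x to the range [0, 1]."""
    return max(0.0, min(1.0, x))


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def shade(normal, light_dir, kd, ks, ambient, specular, specular_exponent):
    """Scalar version of: Kd * (ambient + diffuse) + Ks * pow(specular, exp)."""
    diffuse = saturate(dot(normal, light_dir))
    return kd * (ambient + diffuse) + ks * specular ** specular_exponent


# Surface facing the light vs. facing away (all values are illustrative).
print(shade((0, 0, 1), (0, 0, 1), 0.5, 0.5, 0.1, 1.0, 8))
print(shade((0, 0, 1), (0, 0, -1), 0.5, 0.5, 0.1, 0.0, 8))
```

On a GPU this function body runs once per pixel, with many pixels processed in parallel.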
GeForce 8800
Next Time
• Performance evaluation
• Basic computer organization
• How chips are made
• Start in on instruction set review/overview
• Always check web page for assignments