
Lecture 1 1

CS 352H: Computer Systems Architecture

Lecture 1: What is Computer Architecture and why should I care?

Professor Emmett Witchel

University of Texas at Austin

witchel@cs.utexas.edu


• Understand the “how” and “why” of computer system organization
  – Instruction Set Architecture
  – System Organization (processor, memory, I/O)
  – Microarchitecture
  – Virtualization

• Learn methods of evaluating performance
  – Metrics & benchmarks

• Learn how to make systems go fast
  – Pipelining, caching
  – Parallelism (ILP, DLP, TLP)
  – Application-specific architectures (graphics, signal processing)

• Preview of where architecture is heading


Logistics

Lectures: T/Th 12:30-2:00pm, PAI 3.14
Instructor: Prof. Emmett Witchel, W 1:15-2:15
TA: Shalini Sahoo, MW 11:30-1:00pm, PAI 5.38 Desk 1

Grading: see web page

Texts: Patterson & Hennessy, Computer Organization and Design (Fourth Edition), including CD. Revised Fourth Edition preferred, not required.


CS352H Online

URL: www.cs.utexas.edu/users/witchel/CS352H

I will occasionally email you via Blackboard and at your registered email address. I expect this channel to be reliable and timely.

Discussion group: via Blackboard (log in at courses.utexas.edu)
– General, Homeworks, Project

Computer Architecture Seminar Series:

www.cs.utexas.edu/users/cart/arch


Assignment for Next Tuesday

• Turn in student survey forms, if you want
• Read the Moore paper (see webpage)
  – Write a review of 1/2 to 1 page (see syllabus)
  – Review should include
    • Summary of content of paper
    • Your observations on the most interesting/important aspects
    • Your observations on its relevance today
  – Be prepared to discuss on Tuesday in class

Discussion

• Are you interested in taking this course?
• One question about computer science
• One question about computer architecture

CS352H Fall 2007


Arch vs. µarch

Levels of abstraction (figure), with example annotations:
• Specification: compute the Fibonacci sequence
• Program: for(i=2; i<100; i++) { a[i] = a[i-1]+a[i-2]; }
• ISA (Instruction Set Architecture): load r1, a[i]; add r2, r2, r1;
• Microarchitecture: registers
• Transistors
• Physics/Chemistry
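To make these abstraction levels concrete, here is a minimal C sketch. It assumes the seed values a[0] = 0 and a[1] = 1, which the slide does not show. `fib_program` is the program-level loop. `fib_lowered` is a hypothetical hand-lowering of the same loop into load/add/store steps on register-like locals `r1`, `r2`, `r3`. It only imitates the ISA level in C, and a real compiler's output would differ. The bound is 90 rather than 100 because Fibonacci numbers past F(93) no longer fit in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Number of terms; the slide loops to i < 100, but F(94) overflows
   uint64_t, so this sketch stops at 90. */
#define N 90

/* "Program" level: the slide's loop (seed values are an assumption). */
void fib_program(uint64_t a[N]) {
    a[0] = 0;
    a[1] = 1;
    for (int i = 2; i < N; i++) {
        a[i] = a[i - 1] + a[i - 2];
    }
}

/* Hypothetical "ISA-like" level: the same loop written as explicit
   loads into register-like locals, one add, and a store.
   This is an illustration in C, not real machine code. */
void fib_lowered(uint64_t a[N]) {
    a[0] = 0;
    a[1] = 1;
    for (int i = 2; i < N; i++) {
        uint64_t r1 = a[i - 1];   /* load  r1, a[i-1] */
        uint64_t r2 = a[i - 2];   /* load  r2, a[i-2] */
        uint64_t r3 = r1 + r2;    /* add   r3, r1, r2 */
        a[i] = r3;                /* store r3, a[i]   */
    }
}
```

Both functions fill the same array. The point is that one specification can be expressed at several levels, and the microarchitecture underneath decides how those loads and adds actually execute.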


CS352H Topics

• Technology Trends
• Instruction set architectures
• Pipelining
• Modern pipelined architectures
  – Dynamic ILP machines
  – Static ILP machines
• Cache memory systems
• Virtual memory
• Multiprocessors
• Computer system implementation

Making This Class Work For You

• Plus and minus grades
• Clickers


What is Computer Architecture?

[Figure: the Computer Architect, linked to Technology, Applications, Interfaces, Machine Organization, and Measurement & Evaluation]


Technology Constraints

• Yearly improvement
  – Semiconductor technology
    • 60% more devices per chip (doubles every 18 months)
    • 15% faster devices (doubles every 5 years)
    • Slower wires
  – Magnetic disks
    • 60% increase in density
  – Circuit boards
    • 5% increase in wire density
  – Cables
    • no change

>100x more devices since 1989; 10x faster devices

[Figure: process generations 1000nm, 350nm, 250nm, 130nm, 90nm]


Changing Technology leads to Changing Architecture

• 1970s
  – multi-chip CPUs
  – semiconductor memory very expensive
  – microcoded control
  – complex instruction sets (good code density)

• 1980s
  – single-chip CPUs, on-chip RAM feasible
  – simple, hard-wired control
  – simple instruction sets
  – small on-chip caches

• 1990s
  – lots of transistors
  – complex control to exploit instruction-level parallelism

• 2000s
  – even more transistors
  – Power wall
  – Transition to CMPs (chip multiprocessors)
  – Multi-level caches

• 2010s
  – Embedded vs. desktop vs. data center (cloud)
  – New storage (PCM (phase-change memory), flash)
  – Simpler cores and lots of them
  – Optimizing for power


Intel 4004 - 1971

• The first microprocessor

• 2,300 transistors
• 108 kHz
• 10 µm process


Some Recent Chips!

Intel Pentium IV

• 42 million transistors
• 4GHz
• 0.13 µm process
• Could fit ~15,000 4004s on this chip!

NVidia GeForce 6800
• 222 million transistors
• 400MHz
• 0.13 µm process

Intel Itanium II (Montecito)
• 1.7 billion transistors
• 1.6 GHz
• 90nm process

IBM Cell
• 8 vector processors + 1 PPC
• 4 GHz
• 90nm process

Intel’s net revenue was around $35 billion a year for most of the 2000s; R&D spending was about $5 billion a year.


Any Architecture You Want (as long as it is x86)


Application Constraints

• Applications drive machine ‘balance’
  – Numerical simulations
    • floating-point performance
    • main memory bandwidth
  – Transaction processing
    • I/Os per second
    • integer CPU performance
  – Decision support
    • I/O bandwidth
  – Embedded control
    • I/O timing, power
  – Media processing
    • low-precision ‘pixel’ arithmetic


Application-Driven Architectures

• General purpose: good performance on “all” programs
  – x86 family, ARM, PowerPC, etc.

• Application specificity can focus on:
  – Types of concurrency available
  – Domain of deployment (server, handheld, desktop)

• Today: overview of graphics processors
  – Interface (instruction set architecture, ISA)
  – Processor organization
  – Concurrent elements

Apple’s iPad/iPhone 4 Powered by A4 Chip

• The A4 is a modified ARM Cortex core running at 1GHz
  – Integrated processor, graphics, memory controller

• Among other claims, ARM says the processor gets a nearly "25 percent processing power boost, even at same processor speed, from the use of a new instruction pipelining system."
  – We will cover pipelining in this class.

• Claim: 10 hours of 1024x768 video from a 25 Wh battery
• Let’s look at the Freescale i.MX51
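To see what the battery-life claim implies, divide energy by time. This sketch reads the 25 figure as battery energy in watt-hours, which is our assumption. The function name is hypothetical.

```c
#include <assert.h>

/* Average power (watts) a device may draw to run for `hours` on a
   battery holding `watt_hours` of energy: P = E / t.
   Assumption: the slide's 25 figure is battery energy in Wh. */
double average_power_watts(double watt_hours, double hours) {
    assert(hours > 0.0);
    return watt_hours / hours;
}
```

`average_power_watts(25.0, 10.0)` gives 2.5 W. That is the average power budget for the whole device during video playback, which is why optimizing for power matters so much in this market.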


Performance: Latency and Throughput

• Latency: time to complete an operation
• Throughput: work completed per unit time
• Consider plumbing
  – Low latency: turn on the faucet and water comes out
  – High bandwidth: lots of water (e.g., to fill a pool)

• What is “high-speed Internet”?
  – Low latency: needed for interactive gaming
  – High bandwidth: needed for downloading large files
  – Marketing departments like to conflate latency and bandwidth…

Relationship between Latency and Throughput

• Latency and bandwidth are only loosely coupled
  – Henry Ford: assembly lines increase bandwidth without reducing latency

• My factory takes 1 day to make a Model T Ford.
  – But I can start building a new car every 10 minutes
  – At 24 hrs/day, I can make 24 * 6 = 144 cars per day
  – A special order for 1 green car still takes 1 day
  – Throughput is increased, but latency is not.

• Latency reduction is difficult
• Often, one can buy bandwidth
  – E.g., more memory chips, more disks, more computers
  – Big server farms (e.g., Google) are high bandwidth
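The Model T example can be written down directly. In this hypothetical C model, each car spends `latency_min` minutes on the line and a new car starts every `interval_min` minutes. Throughput depends only on the interval, while a single special order still sees the full latency.

```c
#include <assert.h>

#define MINUTES_PER_DAY (24 * 60)

/* Hypothetical model of the slide's assembly line. */
struct line {
    int latency_min;   /* time to build one car, start to finish */
    int interval_min;  /* time between successive car starts     */
};

/* Steady-state throughput: one finished car per start interval. */
int cars_per_day(struct line l) {
    assert(l.interval_min > 0);
    return MINUTES_PER_DAY / l.interval_min;
}

/* Cars under construction at once: latency / interval. */
int cars_in_progress(struct line l) {
    assert(l.interval_min > 0);
    return l.latency_min / l.interval_min;
}

/* A single special order still waits the full latency. */
int special_order_minutes(struct line l) {
    return l.latency_min;
}
```

With `{24 * 60, 10}`, `cars_per_day` returns 144 and `cars_in_progress` returns 144. Halving the interval to 5 minutes doubles throughput to 288 cars per day, while `special_order_minutes` stays at 1440. Pipelined processors make the same trade.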

What is cloud computing?

• Cloud computing is where dynamically scalable and often virtualized resources are provided as a service over the Internet (thanks, Wikipedia!)

• Infrastructure as a service (IaaS)
  – Amazon’s EC2 (Elastic Compute Cloud)

• Platform as a service (PaaS)
  – Google App Engine
  – Microsoft Azure

• Software as a service (SaaS)
  – Gmail
  – Facebook
  – Flickr

Thanks, James Hamilton, Amazon


Graphics has dedicated chip in PCs

[Figure: PC organization with a dedicated graphics chip]
• CPU: Intel “Kentsfield” quad core (QX6700, 65nm, two dies, 8MB L2$), 582 million transistors
• Memory Controller Chip (“North Bridge”), with Memory
• Graphics Processor: GeForce 8800 (90nm), 681 million transistors, with its own Memory, attached via AGP or PCIe
• Input/Output Glue Chip (“South Bridge”): Disk, Keyboard, PCIe, etc.


GPU/CPU Performance comparison

[Chart: GPU vs. CPU performance]

G80 = GeForce 8800 GTX
NV40 = GeForce 6800 Ultra
NV35 = GeForce FX 5950 Ultra
NV30 = GeForce FX 5800

Source: NVIDIA (except Cell and Core 2 Quad)
* IBM Cell: ~200 GFLOPS
* Core 2 Quad, 3GHz: 96 GFLOPS


Why a dedicated processing chip?

• 1) Specialization – becoming less important with time
• 2) Parallelism – becoming more important

Graphics processors are the only highly parallel processors in every desktop machine.

128 “processors” * 2 FLOPS @ 1.35 GHz

You can program them!
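The “128 processors * 2 FLOPS @ 1.35 GHz” line is a peak-throughput calculation. Here is a one-function C sketch that makes the arithmetic explicit; the name is illustrative. The result is a theoretical ceiling, not measured performance.

```c
#include <assert.h>

/* Peak GFLOPS = processing units x FLOPs per unit per cycle x clock (GHz).
   A theoretical upper bound; real programs rarely sustain it. */
double peak_gflops(int units, int flops_per_cycle, double clock_ghz) {
    assert(units > 0 && flops_per_cycle > 0 && clock_ghz > 0.0);
    return units * flops_per_cycle * clock_ghz;
}
```

`peak_gflops(128, 2, 1.35)` is about 345.6 GFLOPS. That is roughly 3.6 times the 96 GFLOPS that the comparison chart lists for a 3GHz Core 2 Quad.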


Graphics requires programmability

Example Cg “shader” program (invoked like a “callback” function):

void normalmapped(float2 normalMapTexCoord : TEXCOORD0,
                  …
                  out float4 color : COLOR,
                  uniform float ambient,
                  …)
{
    float3 normalTex, …;
    normalTex = tex2D(normalMap, normalMapTexCoord).xyz;
    …
    diffuse = saturate(dot(normal, normLightDir));
    …
    color = Kd * (ambient + diffuse) +
            Ks * pow(specular, specularExponent);
}

Every application does something a bit different.


GeForce 8800


Next Time

• Performance evaluation
• Basic computer organization
• How chips are made
• Start in on instruction set review/overview

• Always check web page for assignments
