performance in the age of trustworthy computing fileben zorn, pprc caches hide many cycles of...

45
Ben Zorn, PPRC Performance in the Age of Trustworthy Computing Ben Zorn PPRC Microsoft Research

Upload: buikhuong

Post on 23-Apr-2018

216 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

Performance in the

Age of Trustworthy Computing

Ben Zorn

PPRC

Microsoft Research

Page 2: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

Trustworthy Computing (TwC)

“Six months ago, I sent a call-to-action to Microsoft's

50,000 employees, outlining what I believe is the

highest priority for the company and for our industry

over the next decade: building a Trustworthy

Computing environment for customers that is as

reliable as the electricity that powers our homes and

businesses today.”

Bill Gates

Executive Memo, 7/18/02

(emphasis mine)

Trustworthy = secure, reliable, available, private, etc.

Page 3: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

TwC Research on the Rise

0

2

4

6

8

10

12P

ap

ers

Pu

bli

shed

1999 2000 2001 2002 2003

Year

PLDI Papers by Type

Mem Mgmt

Optimization

Analysis

Security, Reliability+

Page 4: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

Proebsting’s Law and other Doubts

Moore’s Law states roughly that advances in hardware double computing power every 18 months

“Compiler Advances Double Computing Power Every 18 Years”- Todd Proebsting, Microsoft Research

“Perhaps this means Programming Language Research should be concentrating on something other than optimizations. Perhaps programmer productivity is a more fruitful arena.”http://research.microsoft.com/~toddpro/papers/law.htm

Other doubts about performance and optimization research

“Is Code Optimization Research Relevant?”Bill Pugh, U. Maryland

“Systems Software Research is Irrelevant” Rob Pike, Bell Labs

Page 5: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

Exponential Growth is Hard to Beat…

Relative CPU Performance (log scale)

1

10

100

1000

10000

100000

1000000

Yea

r

1981

1983

1985

1987

1989

1991

1993

1995

1997

1999

2001

2003

2005

2007

2009

Year

Re

lati

ve

Pe

rfo

rma

nc

e

CPU

Data from Computer Architecture: A Quantitative Analysis (3rd ed.) by Hennessy and Patterson

Page 6: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

Performance is Dead,

Long Live Performance!

A revolution is happening, but…

Performance is not a solved problem

Outline for rest of talk

The Memory Wall and Efforts to Climb It

Memory latency

Optimizing layout to reduce disk I/O

Challenges and Opportunities of Managed Code

Concurrency (I wish I had time…)

Page 7: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

Revisiting Moore’s Law…CPU / Memory Performance Gap (log scale)

1

10

100

1000

10000

100000

1000000

Year

1981

1983

1985

1987

1989

1991

1993

1995

1997

1999

2001

2003

2005

2007

2009

Year

Re

lati

ve

Pe

rfo

rma

nc

e

CPU Memory

Increases:

Memory = 1.07x / year

CPU = 1.55x / year (since 1986)

Data from Computer Architecture: A Quantitative Analysis (3rd ed.) by Hennessy and Patterson

Page 8: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

Caches Hide Many Cycles of Latency

0

200

400

600

800

1000

1200

1400

1600

Pe

ak

In

st.

Fe

tch

es d

uri

ng

DR

AM

Acce

ss

Pentium

Processor

66MHz

Pentium-Pro

Processor

200MHz

Pentium III

Processor

1100MHz

Pentium 4

Processor

2000 MHz

Future CPUs

Data from Dileep Bhandarkar, Intel Architect, PACT 2002 Keynote Address

“Parallelism in Mainstream Enterprise Platforms of the Future”

Page 9: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

A Case Study – Optimizing Working Set

Relative cost of I/O is enormous

40,000,000+ cycles per page fault

Much user-perceived latency is disk-related

Overview

PPRC and our approach

Improving code locality via reordering with profiles

Results

Process considerations

Work of Hoi Vo’s Binary Technologies (BiT) group

Page 10: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

What is PPRC?

PPRC – Programmer Productivity Research Center

Amitabh Srivastava, Director

Focus on improving software development process

Areas: performance, correctness, compilation, tools

Approach

Build flexible infrastructure on which to layer tools, research

Build strong interactions with product teams by focused

solutions

Used knowledge of important problems to drive infrastructure

and further research

Successes

Vulcan – binary instrumentation

PREfix – static analysis for error detection

Page 11: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

Code Does Matter

Desktop Application Working Sets

0

1000

2000

3000

4000

5000

6000

7000

Word XP Internet

Explorer

Excel XP

Application

4 K

Byte

Pag

es

Other Data

Mapped

Teb

Stack

Heap

Code/SData

Other System

PTE

Page 12: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

Improving Code Locality

Basic idea – use profiles to direct code

placement

Separate hot/cold functions, basic-blocks

Impact at page level, cache level

Static data can be placed with code where used

Profile methodology

Separate instrumented build to gather profiles

Requires mechanisms to integrate profiles from

different scenarios, weight them

Impact on build process cannot be ignored

Page 13: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

Function F

Function Reordering

Function A

Function C

Function D

Function E

Function B

Function F

Function A

Function D

Function B

Function C

Function E

Page Boundary

Page 14: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

Results

Foxpro 8.0 Working-Set

0

50

100

150

200

250

300

350

400

0 5 10 15 20

Time

Pag

e F

au

lts

Function Reordering

Original Order

Page 15: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

Function Separation

BB 1

BB 2

BB 3

BB 4

BB 5

BB 6

BB 1

BB 2

BB 4

BB 6

BB 3

BB 5

Live Code

Page 16: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

Results

Foxpro 8.0 Working-Set

0

20

40

60

80

100

120

0 5 10 15 20

Time

Pag

e F

au

lts

Function Reordering

Func Ord + Separation

Page 17: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

Making It Work for Real

Must be well integrated into build process Different for every major group

Vulcan technology key to widespread adoption

Time budget “Compile -> Profile -> Opt” process rarely fits within

time constraint

Profile rarely matches the same build

Multiple platform support

Serviceability Debuggers work after code reordering

Patching

Page 18: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

Managing Profile Data

Organizing scenarios

Startup important in reducing delay

Phases associated with typical uses (print, spell

check, etc.)

Stale profile data

Collecting new profiles takes lots of time

Delaying the build cycle is unacceptable

Solution: profile propagation via binary matching

Most profile data remains similar between builds

Page 19: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

Data Locality Research

Data presents additional challenges

Trishul Chilimbi – Daedalus Project

Goal – identify opportunities to improve data locality

and exploit

Contributions

Hot data streams data abstraction (PLDI’01)

Bursty-tracing measurement approach (FDDO’01)

Prefetching hot data streams (PLDI’02)

Runtime Analysis and Design (RAD) group http://research.microsoft.com/rad

http://research.microsoft.com/~trishulc/Daedalus.htm

Page 20: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

So What about TwC?

Question: What software technology is likely

to have the most impact on computing in the

next 5-10 years?

My answer: Managed code

Page 21: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

What is Managed Code?

Managed code* =

Code executed by the Common Language

Runtime (CLR)

Provides metadata to allow the CLR to

Locate methods encoded in assembly modules

Dynamic loading with interface type checking

Store and retrieve security information

Implement a security model

Handle exceptions

Walk the program stack

Garbage collect the heap

* As defined by .Net Glossary

back

Page 22: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

Impact of Managed Code

Performance implications

Pointers = abstraction (less direct control)

GC has global properties

Runtime metadata continually present, referenced

Large, feature-filled class frameworks

Page 23: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

Shift in Platform

Should most software be managed?

Historically, transitions from asm to C, C to C++,

and now C++ to Java / C#

Transition to Java / C# in progress but stalled

Where does most Java code run? Why?

Just a matter of time or technology?

Should most interfaces be managed?

Class libraries a start – what about OS APIs?

These are not hypothetical questions

Page 24: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

Managed Code on the Client

Managed code research is mature…

Many Java implementation papers since 1995

SPECJVM benchmarks in widely used, cited

New GC research after 40+ years!

However

Increasing client-side managed code

Client-side performance issues less understood

Opportunities for research + product impact

Page 25: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

CLR Platform Research Opportunities

C# / CLR / .Net available, used on clients

Caveat: in transition 1.0 -> 1.1 -> Whidbey (1.2)

Sizeable applications written

HeadTrax (see next slides)

FxCop, clrprofiler (download from gotdotnet.com)

Rich profiling API exists in CLR, Windows

Hook calls, returns, allocations

Easy integration with Windows perfmon APIs, tools

clrprofiler written in C#, sources available

Page 26: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

The HeadTrax Experience Report HeadTrax study (Ovidiu Platon, July 2003)

Multi-tier internal MS app manages HR information

Client / server - focus on client experience

Client configuration: 128 Mb, 1 GHz CPU

Implementation Client written in C# with .Net Framework 1.1

Network interaction via web services and database APIs

Security important – strongly signed binaries, encryption

Preliminary numbers (startup) Cold start 23 seconds

Warm start 10 seconds

Report available at: http://gotdotnet.com/

Page 27: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

How they Improved Performance

Changes performed

Made web service calls asynchronous

Cache data locally

Lazy instantiation of proxies

Show UI before populating

Results: cold 23 -> 10 secs, warm 10 -> 8 secs

Changes proposed

Merge assemblies, DLLs

Merge threads

Use thread pool

Page 28: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

What We can Learn from This

10 seconds is still a long time to wait 1500 16+ Kb chunks read from disk at 6 ms / seek

Logical and physical organization are at odds E.g., 21 assemblies, 50 DLLs for 1 app

Databases figured this out long ago

Determining “correct” granularity is tough

What choices do systems provide? How easy to use?

Performance at odds with logical and physical isolation

XML serialization uses reflection, C# compiler

Eclipse faces many similar issues

Pre-JIT is important (what is it?)

Page 29: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

What is Pre-JIT (aka Ngen)?

Pre-JIT is ahead-of-time compilation

Generates high-quality native code

Reduces runtime checking required across interfaces

Opportunities for placement of code and static data

Ngen represents one choice in design space

Full runtime solutions not proven (esp. on client)

Best solution employs thoughtful integration of Compiler, load time, runtime organization and optimization

Any solution requires care in widespread deployment

Page 30: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

Longhorn on the Horizon

MS Longhorn (OS after XP)

Details given in Oct 2003 (PDC conference)

Large components written in managed code WinFS – transactional file system

Avalon – managed UI + shell

Web Services

Managed APIs

Longhorn emphasis…

Increases availability of interesting managed apps

Increases potential impact of performance solutions

Page 31: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

Managed Code Challenges

New overheads

I/O, Memory, CPU beyond SPECJVM issues

Complex mental model

Biggest performance improvements involve

human intervention

Managed code abstraction creates new developer

challenges

Page 32: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

I/O Overhead

Substantial overhead at startup and ongoing

Code, metadata, static data all important

Static nature enhances optimization opportunities

Disk and OS interaction cannot be ignored

HeadTrax warm start times highly variable

How useful is I/O data without a disk model?

OS / PL communities should get together on this

Who is considering placement on the disk?

Should startup be a 1st class research focus?

Why isn’t it now?

Page 33: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

Memory Overhead

Memory footprint has broad implications

GC is only one aspect

Who is looking at / solving other problems?

What’s the memory cost of runtime ops?

How much space does JIT compiler, metadata, GC tables,

etc. take up?

What’s overall performance impact of footprint on client?

How to balance small program units versus memory

fragmentation?

Current pressure to merge units

Tools needed to expose issues and optimize

Page 34: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

CPU Overhead

Significant sources of CPU overhead

GC – thankfully, lots of research here

CPU overhead not currently on critical path for client

Exceptions – not as exceptional as one might

expect

Managed / unmanaged interface

Security model

Runtime checking

Page 35: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

What a Developer has to Think About

GC gotcha’s from Rico Mariani (April 2003)

Too many allocations

Too large allocations

Too many pointers (high connectivity)

Too many roots

Too many writes (esp. to older objects)

Too many almost long-lived objects

Reasoning about lifespans and promotions

Finalization

What tool support does a dev need or have?

Page 36: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

Thoughts about the Future…

Performance space is getting trickier

Memory latency is bad, getting worse

Prediction, placement, compression only go so far

Chip design favors chip multiprocessors

Pentium 4 – 2 HW threads, Prescott 4? HW threads

Power 4 – 2 processor, Power 5 – 2 processors

w/ 2 threads each

Intel “core hopping” to balance temperature hot spots!

Design is and should be a research option

Page 37: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

Where Could Managed Code Go?

How suitable for defining large-grain abstractions?

CLR has assemblies, Java has MJ, what else?

How suitable for defining OS?

Several Java attempts, any serious contenders?

Valuable exercise or waste of time?

Existing support for concurrency

Threads just too hard to get right? Alternative?

Better models for isolation and robustness?

How do we get there?

Page 38: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

Summary

TwC (reliability, security) an important focus

Systems can and will get better

Performance challenges remain

Can always trade performance for other qualities

Memory latency threatens Moore’s Law

I/O performance a major challenge, underinvestigated

Increasing investment in managed code

Developer experience is still immature

Current research misses important challenges

Page 39: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

Things to be aware of…

Phoenix research compiler infrastructure

Intended to be the basis of commercial compiler + research vehicle

Infrastructure for analysis, optimization at multiple compilation stages

Rotor (SSCLI) continues to be developed

Tracking Whidbey design changes

Increased awareness of performance requirements for research use

Second RFP funded

Page 40: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

Additional Resources CLR Performance Info

http://gotdotnet.com/team/clr/about_clr_performance.aspx

Includes white papers, clrprofiler tool

FxCop

http://gotdotnet.com/team/fxcop/ PPRC

http://research.microsoft.com/pprc

Application info: http://research.microsoft.com/pprc/pprc-recruiting-2004.htm

Phoenix http://research.microsoft.com/phoenix

Rotor

http://research.microsoft.com/collaboration/university/europe/rfp/rotor/

http://sscli.net

Page 41: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

More things to be aware of…

PPRC now has link to Windows Org. Amitabh now Windows VP of Development

PPRC Groups Advanced Compiler Technology (ACT) – David Tarditi

Binary Technologies (BiT) – Hoi Vo

Runtime Analysis and Design (RAD) – Trishul Chilimbi

Reliability – G.S. Rana

Static Program Analysis (SPA) – Manuvir Das

Software Productivity Tools (SPT) – Sriram Rajamani

Testing, Measurement, and Verification (TMV) – Tom Ball

Applications for interns, fulltime hires requested by Feb 15, 2004

Page 42: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

Something to think about…

1

CPU

2

CPU

4

CPU

8

CPU

16

CPU …

Chip Multiprocessors are real

Today:IBM dual processor Power4HP dual processor PA-8800

2004:IBM / Sony “Cell” processor(speculated to have 4-16processors on a chip)

2010 ?The potential for these chipsis enormous!

Time is running out! Thank you…

back

Page 43: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

SPEC2000: eon vs mcf

Instr

ucti

on

s/s

ec =

1/(

CP

I*c

yc

le t

ime

)

P3

550 MHz

P4

2.7 GHz3.5

years

eon = 4.8x

speedup (57% /yr)

mcf = 1.9x

speedup (20% /yr)

Data gathered and reported by Trishul Chilimbi

Page 44: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

eon / mcf Differences

0%

25%

50%

75%

100%

252.eon 181.mcf MS app

No

rma

lize

d p

roce

sso

r cycle

s

Busy Other stall Memory stall

Data gathered and reported by Trishul Chilimbi

Page 45: Performance in the Age of Trustworthy Computing fileBen Zorn, PPRC Caches Hide Many Cycles of Latency 0 200 400 600 800 1000 1200 1400 1600 g ess Pentium Processor 66MHz Pentium-Pro

Ben Zorn, PPRC

FxCop – a Short Introduction

Managed app available on the Web

Checks conformance rules for .Net assemblies

(think “lint” for CLR)

Easy to make it do a lot of work

Presents performance challenges

Startup, memory footprint, CPU overhead

Keeps GC busy as well!

Lots of strings

Easy to get, I’m happy to demo + tools