Understanding and Optimizing the Performance of Internet-based Systems Ben Zorn Performance Monitoring and Analysis Group Programmer Productivity Research Center (PPRC) Microsoft Research



Understanding and Optimizing the Performance of Internet-based Systems

Ben Zorn
Performance Monitoring and Analysis Group
Programmer Productivity Research Center (PPRC)
Microsoft Research


Who “We” Are

• Performance Monitoring and Analysis Group
  - Part of PPRC (directed by Amitabh Srivastava)
  - Developers, testers, and researchers
  - Recently formed with emphasis on .NET systems
• Approach
  - Provide solutions to MS product teams through ideas, technologies, tools, and prototypes
  - Actively participate in the external research community through papers, leadership in the professional community, and grants


My Dad’s View of an Internet System

“There’s a little person inside.” - George Zorn

“Any sufficiently advanced technology is indistinguishable from magic” - Arthur C. Clarke


Outline

• Introduction and motivation
  - “Powers-of-ten” drill down
• A framework to attack the problem
• Specific examples from a case study: Optimizing the memory hierarchy


Why Performance is Interesting and Hard

• Context
  - Internet systems are probably the most complicated artifact ever created by man
  - They are currently immature
• Improvements possible in 3 areas
  - Functionality, correctness, and performance
• My focus is on performance
  - Efficiency is a central theme of the Internet revolution
  - Easier / cheaper to get to information, give information, and make informed decisions


Distributed System “Powers of Ten”

• Inspired by the film “Powers of Ten” by Charles and Ray Eames
  - They looked at 38 orders of magnitude (local galactic group down to proton in a nucleus)
• We’ll drill down into computer abstractions
  - Consider different logical abstraction layers


Back to My Dad…

What really goes on in there?


Network “Cloud” View

[Figure: network diagram. Labels: Dad, client, modem link, ISP, Internet, MSN server.]

- distinct roles
- differentiated components
- Less than 7 items to remember
- We seem to “get it”


Expanded Cloud View

[Figure: expanded network diagram. Labels: Dad, client, ISP, DNS resolution, router, streaming media, MSN, server cluster.]


Inside the Web Site

[Figure: Web site internals. Labels: IP “director”, Web servers (front ends), database servers (back ends), interconnection topology.]


Inside a Web Server

[Figure: Web server internals. Labels: operating system, device driver (shown twice), network protocol stack, Web server program, filter / parse request, get static page, Extension API, server extension, generate HTML, DB API, DB server.]


Inside the Server Extension

[Figure: flowchart of the server extension. Labels: enter; call checkData; data valid? (T / F); call SQL API; return.]

proc checkData
  …
  load rx, addr 36
  use rx
  …
  load rx, addr 110
  use rx
  …
  return


Inside the Memory Hierarchy

[Figure: memory hierarchy. Labels: CPU, load, L1 cache, L2 cache, Main Memory, Virtual Memory (Disk).]


Inside the CPU

Diagram courtesy of Artur Klauser



A Sea of Gates

Photo of a Pentium die
Image from the Computer Info Center, http://bwrc.eecs.berkeley.edu/CIC/

???

So what does Dad think about all this?

What can he or anybody do about performance?


Outline

• Introduction and motivation
• A framework to attack the problem
  - Resource management and optimization
  - Data collection, analysis, and action
• Specific examples from a case study: Optimizing the memory hierarchy


Information is Essential

• Optimization is really resource allocation
• Allocation requires good decision making
  - Time / space trade-off
  - Where should data be stored, cached?
• Challenges
  - What information do we need?
  - How do we get it?
  - How do we manage it?
  - What does it mean?


What Information Can We Get?

• Tag “interesting” events (like FedEx tracks packages)

• Associate time/resources with events

• Accumulate and analyze data

[Figure labels: Event repository; time, id]
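The tagging idea can be sketched in a few lines of Python. This is only an illustration: the `EventRepository` class, its method names, the event ids, and the millisecond timestamps are all invented here and do not come from any tool named in the talk.

```python
class EventRepository:
    """Hypothetical in-memory store of (time, id, stage) records."""

    def __init__(self):
        self.records = []  # list of (timestamp_ms, event_id, stage)

    def tag(self, timestamp_ms, event_id, stage):
        # Record that the event `event_id` reached `stage` at `timestamp_ms`.
        self.records.append((timestamp_ms, event_id, stage))

    def elapsed_by_id(self):
        # Associate time with each event: latest timestamp minus earliest.
        first, last = {}, {}
        for timestamp_ms, event_id, _stage in self.records:
            first[event_id] = min(first.get(event_id, timestamp_ms), timestamp_ms)
            last[event_id] = max(last.get(event_id, timestamp_ms), timestamp_ms)
        return {event_id: last[event_id] - first[event_id] for event_id in first}


repo = EventRepository()
repo.tag(0, "req-1", "received")      # made-up events and times
repo.tag(5, "req-2", "received")
repo.tag(12, "req-2", "replied")
repo.tag(42, "req-1", "replied")
elapsed = repo.elapsed_by_id()
print(elapsed)
```

Keeping the id on every record is what lets later analysis join records that were written far apart in time, much as a package tracking number does.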


Information Management is Essential

• It is easy to gather too much data
  - Our capacity to generate data follows Moore’s Law
• Data without context is less valuable
  - How do we relate data gathered to problems experienced?
  - Systems change (new builds daily)
• Our abstractions are immature, current approaches are ad hoc
• Data mining is a large potential opportunity


Example: Netmon Monitoring Tool

• Netmon provides info about network packets
• Netmon has a rich, extensible architecture (parsing, reporting)
• Netmon provides data, but management, analysis, etc. have to be layered on top of it

Example output:
00000060 00 01 00 5F 00 00 5C 00 5C 00 52 00 45 00 44 00 ..._..\.\.R.E.D.
00000070 2D 00 44 00 43 00 2D 00 32 00 37 00 2E 00 52 00 -.D.C.-.2.7...R.
00000080 45 00 44 00 4D 00 4F 00 4E 00 44 00 2E 00 43 00 E.D.M.O.N.D...C.
00000090 4F 00 52 00 50 00 2E 00 4D 00 49 00 43 00 52 00 O.R.P...M.I.C.R.
000000A0 4F 00 53 00 4F 00 46 00 54 00 2E 00 43 00 4F 00 O.S.O.F.T...C.O.
000000B0 4D 00 5C 00 49 00 50 00 43 00 24 00 00 00 3F 3F M.\.I.P.C.$...??
000000C0 3F 3F 3F 00 ???.
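To illustrate what "layering analysis on top" of raw packet data can mean, here is a small Python sketch that turns the hex bytes of the example output back into readable text. It assumes the payload is a NUL-terminated UTF-16LE string starting 6 bytes into the excerpt; that offset is read off the dump by eye, and the helper names are invented for this sketch rather than taken from Netmon.

```python
HEX_DUMP = """\
00000060 00 01 00 5F 00 00 5C 00 5C 00 52 00 45 00 44 00
00000070 2D 00 44 00 43 00 2D 00 32 00 37 00 2E 00 52 00
00000080 45 00 44 00 4D 00 4F 00 4E 00 44 00 2E 00 43 00
00000090 4F 00 52 00 50 00 2E 00 4D 00 49 00 43 00 52 00
000000A0 4F 00 53 00 4F 00 46 00 54 00 2E 00 43 00 4F 00
000000B0 4D 00 5C 00 49 00 50 00 43 00 24 00 00 00 3F 3F
000000C0 3F 3F 3F 00
"""


def parse_hex_dump(text):
    """Return the raw bytes of a dump whose lines are '<offset> <hex bytes...>'."""
    data = bytearray()
    for line in text.splitlines():
        fields = line.split()
        data.extend(int(field, 16) for field in fields[1:])  # skip the offset column
    return bytes(data)


def utf16_string_at(data, start):
    """Decode a NUL-terminated UTF-16LE string that begins at byte `start`."""
    end = start
    while end + 1 < len(data) and data[end:end + 2] != b"\x00\x00":
        end += 2  # advance one 16-bit code unit at a time
    return data[start:end].decode("utf-16-le")


payload = parse_hex_dump(HEX_DUMP)
# Assumption: the string starts 6 bytes into this excerpt (offset 0x66).
share_path = utf16_string_at(payload, 6)
print(share_path)
```

Even this tiny step (bytes to a name a person recognizes) is the kind of work the monitoring tool leaves to whatever sits above it.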


Conceptual Monitoring Framework

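[Figure: block diagram of monitoring components connected by an Event Bus. Category labels: Sensors, Management, Analysis, Tools, Actuators. Component labels: Path Trace, HW Perf Counters, Network Trace, Store, Filter, Intrusion Alerting, Leak Detector, Site Monitor, Cluster Detection, Reboot System, Weekly Report.]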


Outline

• Introduction and motivation
• A framework to attack the problem
• Specific examples from a case study: Optimizing the memory hierarchy
  - Hardware performance counters
  - Vulcan: binary transformation infrastructure
  - Daedalus: data locality optimization


Parts of the Big Picture

• Many different groups are working from similar frameworks
  - Commercial efforts (e.g., Windows WMI)
  - Many research efforts (e.g., Internet-scale caching)
• I will focus on the lowest levels (CPU arch.)
  - Hardware can generate 100 million events/sec.
  - Data collection, reduction are significant problems
  - Concretely illustrates different parts of approach: data gathering, data reduction, abstraction


Optimizing the Memory Hierarchy

Level                  Size           Unit of transfer (UOT)  Access time
CPU (load)             -              1 word                  -
L1 cache               64 Kbytes      32 bytes                1-4 cycles
L2 cache               1 Mbyte        32 bytes                10-20 cycles
Main Memory            100 Mbytes     32 bytes                100 cycles
Virtual Memory (Disk)  50,000 Mbytes  8192 bytes              1,000,000 cycles

UOT = Unit of Transfer
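A short worked calculation shows why the slow levels matter even when they are rarely touched. The cycle counts below are midpoints of the ranges on the slide; the hit rates are invented purely for illustration, and the model assumes a load costs only the access time of the level that finally satisfies it.

```python
# (level name, access cost in cycles): midpoints of the ranges on the slide.
LEVELS = [
    ("L1 cache", 2.5),
    ("L2 cache", 15),
    ("Main Memory", 100),
    ("Virtual Memory (Disk)", 1_000_000),
]

# Assumed fraction of loads reaching each level that are satisfied there.
# These numbers are made up; the last level always hits.
HIT_RATES = [0.95, 0.90, 0.9999, 1.0]


def expected_load_cycles(levels, hit_rates):
    """Expected cycles per load under the simple model described above."""
    expected = 0.0
    reach = 1.0  # probability that a load gets as far as the current level
    for (_name, cycles), hit in zip(levels, hit_rates):
        expected += reach * hit * cycles
        reach *= 1.0 - hit
    return expected


average = expected_load_cycles(LEVELS, HIT_RATES)
print(f"{average:.2f} cycles per load on average")
```

With these assumed rates only one load in two million goes to disk, yet that case contributes about as much to the average as all main-memory accesses combined.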


Finding a Memory Problem

• Problem
  - Some loads take a long time, but which ones?
• Solution
  - Hardware vendors provide performance counters
  - Counters can be read, also interrupt processor
  - Causing interrupts at costly operations allows them to be recorded

proc Foo
  …
  load rx, addr 36
  use rx
  …
  load rx, addr 110
  use rx
  …
  return

[Annotations on the code: “This load takes too long”; “This use of rx is what stalls”]
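Once interrupts at costly operations are being recorded, finding the slow load is a counting problem. The sketch below assumes each interrupt yields the program counter of the instruction that was executing; the sample values are made up, and real counter interfaces differ by processor.

```python
from collections import Counter

# Hypothetical samples: one program-counter value per counter-overflow interrupt.
samples = [
    "Foo+0x24", "Foo+0x10", "Foo+0x24", "Bar+0x08",
    "Foo+0x24", "Foo+0x24", "Foo+0x10", "Foo+0x24",
]


def costly_sites(pc_samples, top=3):
    """Rank instruction addresses by how often they were interrupted."""
    return Counter(pc_samples).most_common(top)


ranking = costly_sites(samples)
for pc, hits in ranking:
    print(f"{pc}: {hits} of {len(samples)} samples")
```

The address with the most samples is the first candidate for "this load takes too long"; mapping it back to a source line is a separate step not shown here.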


Exposing Performance Information

[Figure: memory hierarchy (CPU, load, L1 cache, L2 cache, Main Memory, Virtual Memory (Disk)) annotated with performance counters C1 and C2 (L1 hits, L2 misses). Counter values shown: 257; 1,346; 15,304. Load addresses shown: addr 36, addr 110, addr 60, addr 116.]


Extracting More Information

• New Problem
  - Why was I calling procedure Foo?
  - What fraction of the total time did I spend in Foo?
• Solution: Binary transformation
  - Program API to transform binary code
  - Calls to arbitrary routines can be “spliced” in
  - PPRC Vulcan infrastructure [Srivastava et al. ‘00]
      X86, IA64 binaries
      Instrumentation can be added on-the-fly


Example Transformation

As code is executing, transform:

This:
proc Foo
  …
  load rx, addr 36
  use rx
  …
  load rx, addr 110
  use rx
  …
  return

To this:
proc Foo
  call probe_enter_Foo()
  …
  load rx, addr 36
  use rx
  …
  load rx, addr 110
  use rx
  …
  call probe_exit_Foo()
  return
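The same enter/exit idea can be imitated at the source level in Python, which may help make the transformation concrete. This is an analogy only: Vulcan rewrites x86 and IA64 binaries, whereas the `instrument` wrapper, the `stats` table, and the sample procedure `foo` below are inventions of this sketch.

```python
import functools
import time

stats = {}  # procedure name -> {"calls": int, "seconds": float}


def probe_enter(name):
    # Count the call and remember when the procedure was entered.
    entry = stats.setdefault(name, {"calls": 0, "seconds": 0.0})
    entry["calls"] += 1
    return time.perf_counter()


def probe_exit(name, started):
    # Charge the elapsed time to the procedure.
    stats[name]["seconds"] += time.perf_counter() - started


def instrument(func):
    """Wrap `func` so probes run on entry and exit without editing its body."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        started = probe_enter(func.__name__)
        try:
            return func(*args, **kwargs)
        finally:
            probe_exit(func.__name__, started)
    return wrapper


def foo(values):
    return sum(v * v for v in values)


foo = instrument(foo)  # splice the probes in after the procedure was written
result = foo(range(1000))
print(stats["foo"]["calls"], "call(s),", stats["foo"]["seconds"], "seconds in foo")
```

Dividing the seconds recorded for one procedure by the total run time answers "what fraction of the total time did I spend in Foo?"; recording the caller inside `probe_enter` would answer "why was I calling it?".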


How Vulcan Works

• Hierarchical interface to structure of binary
    foreach procedure…
      foreach basic block
        foreach instruct…
• Calls to arbitrary functions can be inserted anywhere

[Figure: tree view of a binary. Program contains proc Foo and proc Bar; each procedure contains Block 1 and Block 2; instruction labels: load, use, move, shift, mult, add. Inserted calls: call probe_enter_Foo, call probe_exit_Foo.]
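The nested traversal can be sketched on a toy representation of a program. Nothing here is Vulcan's actual API: the dictionary layout, the `insert_probes` function, and the choice to place the exit probe before every `return` are assumptions made for illustration.

```python
# Toy program: procedure name -> list of basic blocks -> list of instructions.
program = {
    "Foo": [
        ["load rx, addr 36", "use rx"],
        ["load rx, addr 110", "use rx", "return"],
    ],
    "Bar": [
        ["move", "shift"],
        ["mult", "add"],
    ],
}


def insert_probes(program, target):
    """Return a copy of `program` with enter/exit probe calls added to `target`."""
    rewritten = {}
    for proc_name, blocks in program.items():            # foreach procedure
        new_blocks = []
        for index, block in enumerate(blocks):            # foreach basic block
            new_block = []
            if proc_name == target and index == 0:
                new_block.append(f"call probe_enter_{proc_name}()")
            for instruction in block:                     # foreach instruction
                if proc_name == target and instruction == "return":
                    new_block.append(f"call probe_exit_{proc_name}()")
                new_block.append(instruction)
            new_blocks.append(new_block)
        rewritten[proc_name] = new_blocks
    return rewritten


instrumented = insert_probes(program, "Foo")
for block in instrumented["Foo"]:
    print(block)
```

Because the rewrite walks the structure rather than the source text, the same loop could insert a call at any procedure, block, or instruction boundary.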


Vulcan Tricks

• Optimization Example: Instruction Scheduling
  - If “load rx” takes 100 cycles, find useful work to do between load and use
• Other Vulcan uses:
  - Code obfuscation
  - Binary matching
  - Software watermarking
  - Software testing tools: coverage, fault injection

proc Foo
  …
  load rx, addr 36
    [100 cycle delay: useful work not dependent on rx inserted here]
  use rx
  …
  load rx, addr 110
  use rx
  …
  return


New Abstractions are Central to Success

• Problem
  - How to reorganize data for better locality?
• Context
  - Code reorganization is well understood because code structures are static
  - But… OO data structures are dynamic
• Solution
  - New abstraction: sequences of “hot” objects
  - Daedalus Project [Chilimbi PLDI ’01]


Revisiting the Memory Hierarchy

[Figure: memory hierarchy (CPU, L1 cache, L2 cache, Main Memory, Virtual Memory (Disk)). Main Memory holds objects A through H; the other levels hold unlabeled objects.]

Goal: place “hot” objects closer to CPU

Constraint: assume UOT = 2 objects

Load sequence: A F B C A F E E A F B C…
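The effect of layout on this load sequence can be checked with a small simulation. The two layouts and the one-unit LRU cache are assumptions of this sketch (the slide does not specify them); only the load sequence and the rule that two adjacent objects travel together come from the slide.

```python
from collections import OrderedDict

LOAD_SEQUENCE = "A F B C A F E E A F B C".split()


def count_misses(layout, loads, cache_units=1):
    """Count transfers needed to serve `loads` for a given object layout.

    `layout` lists objects in address order; each consecutive pair shares one
    transfer unit (UOT = 2 objects), so loading either object brings in both.
    The cache holds `cache_units` units and evicts the least recently used.
    """
    unit_of = {obj: position // 2 for position, obj in enumerate(layout)}
    cache = OrderedDict()  # unit id -> None, least recently used first
    misses = 0
    for obj in loads:
        unit = unit_of[obj]
        if unit in cache:
            cache.move_to_end(unit)        # hit: mark as most recently used
        else:
            misses += 1
            cache[unit] = None
            if len(cache) > cache_units:
                cache.popitem(last=False)  # evict the least recently used unit
    return misses


alphabetical = list("ABCDEFGH")  # assumed starting layout: (A,B) (C,D) (E,F) (G,H)
hot_together = list("AFBCEDGH")  # assumed improved layout: (A,F) (B,C) (E,D) (G,H)

before = count_misses(alphabetical, LOAD_SEQUENCE)
after = count_misses(hot_together, LOAD_SEQUENCE)
print(before, "misses before,", after, "misses after")
```

Placing A next to F and B next to C means the second object of each hot pair arrives for free with the first, which is the intuition behind grouping "hot" objects.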


Potential for Performance Improvement

[Chart: normalized miss rate (axis 0 to 100). Labels: Base, Perfect Optimization.]


Daedalus Project

Goal: Analyze and exploit data locality

• Analyze locality
  - Represent very large streams of references (SEQUITUR algorithm [Nevill-Manning, Witten ‘97])
  - Define new abstractions (hot data streams)
• Exploit locality
  - Build customized heap allocators (malloc/new)
  - Insert prefetching instructions (PIII, etc. support)
  - Data restructuring tools


SEQUITUR (Example)

Input: aaabac aaabac aaabac aaabac aaabad aaabad aaabad aaabad aaabad aa

SEQUITUR grammar:
  S -> BBDDCaa
  A -> aaabac
  B -> AA
  C -> aaabad
  D -> CC

[Figure: DAG representation of the grammar, with nodes S, A, B, C, D over the terminals a, b, c, d.]
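A quick check in Python confirms that the grammar on this slide regenerates the input and shows how much smaller it is. The sketch only expands the grammar; it does not implement SEQUITUR's construction of the grammar, and the `expand` helper is named here for illustration.

```python
# Grammar exactly as given on the slide; upper-case letters are rule names.
GRAMMAR = {
    "S": "BBDDCaa",
    "A": "aaabac",
    "B": "AA",
    "C": "aaabad",
    "D": "CC",
}


def expand(symbol, grammar):
    """Recursively replace rule names with their bodies until only terminals remain."""
    if symbol not in grammar:
        return symbol  # terminal: a, b, c, or d
    return "".join(expand(part, grammar) for part in grammar[symbol])


original = "aaabac" * 4 + "aaabad" * 5 + "aa"
regenerated = expand("S", GRAMMAR)
grammar_size = sum(len(body) for body in GRAMMAR.values())

print(regenerated == original)
print(len(original), "input symbols vs", grammar_size, "grammar symbols")
```

The saving grows with repetition: doubling the number of repeats adds only a rule or two to the grammar, which is what makes the representation attractive for very long reference traces.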


Locality Analysis

[Figure: locality analysis pipeline. Labels: Program; Execute; Program data reference trace; SEQUITUR; Whole Program Streams (grammar DAG with nodes S, A, B, C over a, b, c, d); Pruning; Hot Program Streams (pruned DAG); Hot data stream analyses; Hot Data Streams.]
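To give a feel for what a "hot" stream is, the sketch below scans a short reference trace for repeated subsequences. It is a simplified stand-in: it counts substrings directly instead of working from a SEQUITUR grammar, and the heat measure (length times number of occurrences), the length limits, and the threshold are assumptions of this sketch.

```python
from collections import Counter


def hot_streams(trace, min_length=2, max_length=4, min_heat=6):
    """Return {subsequence: heat} for repeated subsequences of `trace`.

    Heat is taken here to be length * number of occurrences (assumed metric).
    Occurrences may overlap, which is acceptable for a sketch.
    """
    counts = Counter()
    for length in range(min_length, max_length + 1):
        for start in range(len(trace) - length + 1):
            counts[tuple(trace[start:start + length])] += 1
    return {
        seq: len(seq) * seen
        for seq, seen in counts.items()
        if seen > 1 and len(seq) * seen >= min_heat
    }


trace = "A F B C A F E E A F B C".split()  # made-up object reference trace
hot = hot_streams(trace)
hottest = max(hot, key=hot.get)
print(hottest, hot[hottest])
```

On this trace the sequence A F B C comes out hottest, suggesting those four objects are worth placing or prefetching together.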


Daedalus Highlights

• Data reference representations
  - 100 to 10,000 times smaller than data reference trace
• Data restructuring recommendations
  - Improved execution time of several programs by 8-15% with small header file modifications
• Custom heap allocators
  - Automatically reduced working set size by up to 40% and TLB misses by up to 90%
• In-progress
  - Automatic prefetching, smart copying garbage collection, scalability optimizations, dynamic on-line optimizations


What’s This Got to Do with the Internet?

• Approach remains the same
  - Record interesting behavior (e.g. network packets)
  - Reduce large data volumes: compression, summarization, presenting differences, etc.
  - Find interesting patterns that correspond to performance (security, correctness) issues
  - Display information using visualizations / abstractions that match the problem domain
• An easier problem?
  - It will take time to know for sure…


Summary

• What’s my Dad to think?
  - Internet Systems are usable today…but extremely complex
  - Ability to understand existing systems is immature
  - Technology still rapidly changing, following Moore’s Law curve
• But…
  - Microsoft’s .NET initiative sets the stage for our opportunity and challenges
  - Our approach is pragmatic, effective


More Information

• Related Resources
  - MSR, PPRC, PMA: http://research.microsoft.com/pprc/pma.asp
• Vulcan
  - Srivastava et al., “Binary Transformation in a Distributed Environment”, MSR Technical Report
  - Srivastava, “Emerging Opportunities for Binary Tools”, Keynote Talk, WBT 2000, October 2000.
• Daedalus Project
  - http://research.microsoft.com/users/trishulc/Daedalus.htm
  - Chilimbi, “Efficient Representations and Abstractions for Quantifying and Exploiting Data Reference Locality”, PLDI 2001, June 2001.

Contact me: [email protected]


Backup Slides


The Process of Optimizing Performance

Suppose performance is poor…

• Where’s the bottleneck? Who is at fault? How to find out?
• What tool to use? How to use? How to understand?
• Will my effort be worth it?
• I’m happy now…but what about next time?


A Framework for Monitoring Systems

• Goals
  - Collect data at all system levels
  - Approximate continuous monitoring closely
• Component Classes
  - Sensors (gathering data)
  - Management (communicate, summarize, store)
  - Analysis (recognizing patterns and relationships)
  - Tools (human feedback)
  - Actuators (take action directly)
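How the component classes might fit together can be sketched with a tiny in-process event bus. Everything named below (`EventBus`, the topic strings, the leak rule of three rising samples, the readings) is hypothetical and chosen only to show one sensor, one store, one analysis, and one actuator cooperating.

```python
class EventBus:
    """Hypothetical in-process bus: sensors publish, other components subscribe."""

    def __init__(self):
        self.subscribers = {}  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, payload):
        for callback in self.subscribers.get(topic, []):
            callback(payload)


bus = EventBus()
store = []    # management: keep every sample
actions = []  # actuator: record the action instead of really taking one

bus.subscribe("memory_mb", store.append)  # the store sees each sample first


def leak_detector(_sample_mb):
    # Analysis: flag a possible leak when usage rose across the last three samples.
    recent = store[-3:]
    if len(recent) == 3 and recent[0] < recent[1] < recent[2]:
        bus.publish("alert", "possible leak")


bus.subscribe("memory_mb", leak_detector)
bus.subscribe("alert", actions.append)

for sample in [100, 100, 120, 150]:  # sensor: made-up memory readings
    bus.publish("memory_mb", sample)

print(store, actions)
```

Routing everything through the bus keeps the pieces independent: a reporting tool or another actuator could subscribe to the same topics without any change to the sensor.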


Reference Skew (Code Vs. Data)

[Chart: Addr / PC reference skew. X axis: % of addr / load-store PC (0 to 10). Y axis: % of program data references (0 to 120). Series: twolf, perlbmk, eon, mcf, sqlserver, each plotted as -Addr and -PC.]