performance instrumentation beyond what you do now

Post on 16-Jun-2015

Category:

Technology

TRANSCRIPT

1

Performance Instrumentation: beyond what you do now

Cary Millsap
cary.millsap@method-r.com

Percona Performance Conference
Santa Clara, California
9:00a–9:55a Thursday 23 April 2009

2

Introductions

3

Cary Millsap carymillsap.blogspot.com cary_millsap

4

1986

1989

1999

2008

Software Developer and Performance Analyst

5

6

Method R Corporation
http://method-r.com

7

What we do at Method R Corporation…

• Write code for you
• Troubleshoot performance problems
• Teach you how to do what we do
• Write software tools that make your work easier

8

Thinking clearly about performance

9

Performance is HARD

10

“Our users say that everything is slow, but I don’t know where to begin.”

11

“Our users are complaining, but all our dials are green.”

12

A story.

13

In the beginning...

(1989: Oracle 6.0.26)

14

“Tuning” was…

15

bstat.sql
...
estat.sql
report.txt

16

V$DB_OBJECT_CACHE

V$FILESTAT

V$LATCH

V$LIBRARYCACHE

V$LOCK

V$OPEN_CURSOR

V$PARAMETER

V$PROCESS

V$ROLLSTAT

V$ROWCACHE

V$SESSION

V$SESSTAT

V$SQL

V$SQLTEXT

V$TIMER

V$TRANSACTION

V$WAITSTAT

V$SESS_IO

V$SYSSTAT

V$FIXED_VIEW_DEFINITION

ps

sar

vmstat

iostat

netstat

pstat

nfsstat

17

People looked for “bad numbers.”

18

Inefficiencies.

19

But how can you know what causes a specific task to be slow?

20

21

It's latches

It's I/O

It's always I/O

It's bad SQL

It's always bad SQL

There's not enough memory

There's never enough memory

22

My problem…

23

How can you possibly know that?

24

Reminded me of…

25

vailroger.googlepages.com/orionconstellation

26

You do see it...

Right?

27

vailroger.googlepages.com/orionconstellation

28

But who says that is what you have to see?

29

30

Why not?

31

Performance is hard.

32

A good pilot makes it look easy.

—Van R. Millsap, 1936–2004

33

Performance is EASY

34

How?

35

It’s the user’s experience that matters.

36

37

A user’s performance experience consists of two elements…

38

1. a task
2. time

39

Task

40

The things we used to “computerize”… tasks.

http://olathe.lib.ks.us/images/Image/Computer%20User.jpg

41

A task is a business unit of work.

• Post to the General Ledger
• Enter an order
• Look up a book by author

42

Tasks can nest.

• Print Addresses is a task
• Print Address #42 is a (sub)task
• Often, a program is a task
• Often, a tiny part of a program is a task

[Diagram: Posting, with PO, AP, AR, …, FA beneath it]

43

Tasks are it.

Business people don’t care about the “system” except through execution of the tasks that make up their business.

44

Tasks are it.

Tasks are what system owners care about.

45

Time

46

Performance is about time.

47

How fast: “Daddy, can your car go 500 miles?”

He meant “500 miles per hour.”

To talk about performance (speed), you have to talk about time.

48

Two ways to measure performance…

49

tasks per time (that’s throughput)

time per task (that’s response time)

50

Throughput and response time…

• Throughput (X)
  – The tasks-per-time way
  – Number of task executions completed in a given duration
    • “orders/second”

• Response time (R)
  – The time-per-task way
  – Elapsed duration of an execution of a given task
    • “seconds/order”
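
As a minimal sketch of the two definitions above, the following Python computes both measures from the same set of task executions. The timestamps are invented for illustration, and the function names are hypothetical; the example assumes a single worker running executions back to back.

```python
from statistics import mean

# Hypothetical measurements: (start, end) timestamps in seconds for
# executions of one task, run back to back by a single worker.
executions = [(0.0, 0.5), (0.5, 1.0), (1.0, 1.5), (1.5, 2.0)]


def response_times(execs):
    """Time per task: the elapsed duration of each execution (R)."""
    return [end - start for start, end in execs]


def throughput(execs):
    """Tasks per time: completed executions divided by the observed interval (X)."""
    window = max(end for _, end in execs) - min(start for start, _ in execs)
    return len(execs) / window


r = mean(response_times(executions))  # seconds per task
x = throughput(executions)            # tasks per second
print(f"R = {r:.3f} s/task, X = {x:.3f} tasks/s")
```

Both numbers come from the same raw data; they only divide it differently.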

51

X = 1/R

(kind of)

52

Average throughput is the inverse of average response time.

X = 1,000 txn/sec?

Then R = (1 sec)/(1,000 txn) = .001 sec/txn

But…

53

…Adding load to create higher throughput changes response time.

54

…Which leads to a whole ’nother conversation I’d love to have with you some other time.

55

Sequence Diagram

56

A simple way to view response time is with a UML sequence diagram.

[Sequence diagram; label: R_A]

http://www.websequencediagrams.com

57

More complicated systems have nested levels of suppliers and consumers.

[Sequence diagram; labels: R_A, R_B]

http://www.websequencediagrams.com

58

The tiers represent the way your system is constructed.

[Sequence diagram; label: R_User]

http://www.websequencediagrams.com

59

This sequence diagram shows the complicated interactions among consumers and suppliers.

[Sequence diagram; label: R_User]

http://www.websequencediagrams.com

60

The sequence diagram is a good conceptual tool.

61

But when you need to analyze thousands of calls, you need something else.

62

Profile

63

A profile is a complete account of a task’s response time.

Response time (seconds)   # Calls   R/call (seconds)   Call name
    0.769    50.3%          5,003           0.000154   unaccounted-for between dbcalls
    0.393    25.7%          5,010           0.000078   SQL*Net message from client
    0.381    24.9%          5,013           0.000076   CPU service, execute calls
    0.090     5.9%             11           0.008194   CPU service, prepare calls
    0.027     1.8%              1           0.027396   log file sync
    0.008     0.5%          5,010           0.000002   SQL*Net message to client
    0.000     0.0%              9           0.000000   CPU service, fetch calls
   –0.138    –9.1%          5,031          –0.000028   unaccounted-for within dbcalls
    1.530   100.0%                                     Total
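
A table with this shape can be built from call-by-call records. The following is only a sketch under assumptions: the `(name, seconds)` record format, the `flat_profile` function, and the sample data are all hypothetical, not the tool that produced the profile above.

```python
from collections import defaultdict


def flat_profile(calls):
    """Aggregate (name, seconds) call records into rows sorted by descending R.

    Each row is (seconds, percent of total, call count, seconds per call, name).
    """
    seconds_by_name = defaultdict(float)
    count_by_name = defaultdict(int)
    for name, seconds in calls:
        seconds_by_name[name] += seconds
        count_by_name[name] += 1

    total = sum(seconds_by_name.values())
    rows = []
    for name, secs in seconds_by_name.items():
        pct = 100.0 * secs / total if total else 0.0
        rows.append((secs, pct, count_by_name[name], secs / count_by_name[name], name))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows, total


# Hypothetical call-by-call records for one task execution.
calls = [("execute", 0.30), ("fetch", 0.10), ("execute", 0.50), ("log file sync", 0.10)]

rows, total = flat_profile(calls)
for secs, pct, n, per_call, name in rows:
    print(f"{secs:8.3f} {pct:6.1f}% {n:7,d} {per_call:10.6f}  {name}")
print(f"{total:8.3f} {100.0:6.1f}%  Total")
```

Because every record is assigned to exactly one row, the rows sum to the total: the account neither misses time nor counts it twice.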

64

You’ve done this before, if you’ve ever used…

gcc -pg …; gprof …
java -prof …; java ProfilerViewer …
perl -d:DProf …; dprofpp …
dbms_monitor.session_trace_enable(…); p5prof …

65

Profile

• Full account of response time
  – Spanning (sum ≮ R)
  – Non-overlapping (sum ≯ R)
• Sorted by descending R
• Useful dimension
  – Flat profile
  – Call graph
• Contributions as %R
• Duration per call
  – Mean, minimum, maximum, …
  – Skew
• Drill-down
  – Individual call level of detail
  – Maybe even deeper

66

Response Time

67

To optimize throughput, you must analyze response time.

68

(Proof)

You cannot optimize X for a task that’s inefficient.

You cannot measure a task’s efficiency without measuring its R.

Therefore, to optimize X, you must first analyze R.

69

The universal experience of programmers who have been using measurement tools has been that their intuitive guesses fail.

—Donald Knuth

70

(Programmers aren’t very good at guessing where their code spends time.)

71

To optimize performance (throughput or response time), people need profiles.

72

Performance is EASY

73

Performance is easy if you can stop guessing where your code is slow.

74

When you have profiles for task response times, performance problems cannot hide from you.

75

Some surprising things I’ve learned by measuring R…

76

Disk I/O is often less important than people think.

http://carymillsap.blogspot.com/2009/04/cary-on-joel-on-ssd.html

77

Common performance problems:

CPU

Network I/O

Software serialization

78

The point…

79

Your problems have nothing to do with experiences I’ve had.

So measure.

80

Finding what you need to see

81

How are you supposed to create these profiles?

82

You have to insist on seeing where time goes for any task you think is important.

83

To drill down, you need call-by-call data.

(NOT data about aggregations of calls.)
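
To illustrate why aggregations are not enough, here is a small sketch with invented durations: two executions whose aggregated figures are identical, although only one of them has a single dominant call. The data and function names are hypothetical.

```python
from statistics import mean

# Hypothetical per-call durations (seconds) for two executions of one task.
uniform = [0.010] * 100            # every call takes about the same time
skewed = [0.001] * 99 + [0.901]    # one call dominates the response time


def aggregate(durations):
    """What an aggregated view reports: call count, total, and mean (rounded)."""
    return len(durations), round(sum(durations), 6), round(mean(durations), 6)


def slowest(durations, n=1):
    """Drill-down that needs call-by-call data: (call index, seconds) of the n slowest calls."""
    ranked = sorted(enumerate(durations), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


print(aggregate(uniform), aggregate(skewed))  # the two aggregates are indistinguishable
print(slowest(skewed))                        # the individual calls reveal the outlier
```

Only the per-call records can tell you which call to look at.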

84

In Oracle, we do it with a feature called extended SQL tracing.

• For Developers: Making Friends with the Oracle Database for Fast, Scalable Applications
  – Cary Millsap
  http://method-r.com/downloads/doc_details/10-for-developers-making-friends-with-the-oracle-database-cary-millsap

• Optimizing Oracle Performance
  – Cary Millsap with Jeff Holt

85

The stuff you need…

86

Feature (attribute)       Oracle             MySQL   App tier
Task identification       y
Call-by-call coverage     98%+
DB call begin sequence    partly derivable
DB call begin time        partly derivable
DB call end time          y
DB call context info      y
OS call begin sequence    partly derivable
OS call begin time        derivable
OS call end time          y
OS call context info      y
Call SQL context          y
Call CPU (sys mode)       -
Call CPU (usr mode)       -
Call CPU (total)          y
SQL execution plans       y

87

Recap

88

Here’s what I hope you take away today…

89

Performance is about time and tasks.

90

If you’re interested in performance, then read Goldratt’s The Goal.

91

Don’t guess; you’re probably wrong.

Measure response time before you optimize anything.

Insist on it.

92

Performance is easy (and fun!) when code measures its own time and tasks.
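
One way application code might measure its own time and tasks is sketched below. `TaskTimer` is a hypothetical helper written for this illustration, not an existing library; it is single-threaded and records elapsed time only.

```python
import time
from contextlib import contextmanager


class TaskTimer:
    """Records one (task path, elapsed seconds) entry per task execution."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._stack = []    # names of the tasks currently executing, outermost first
        self.records = []   # completed executions as (path, seconds)

    @contextmanager
    def task(self, name):
        """Time the enclosed block and record it under its nested task path."""
        self._stack.append(name)
        path = "/".join(self._stack)
        start = self._clock()
        try:
            yield
        finally:
            # Record even if the task raised, so failed executions are still accounted for.
            self.records.append((path, self._clock() - start))
            self._stack.pop()


timer = TaskTimer()
with timer.task("Print Addresses"):
    for n in range(3):
        with timer.task(f"Print Address #{n}"):
            sum(range(10_000))  # stand-in for real work

for path, seconds in timer.records:
    print(f"{seconds:.6f}  {path}")
```

Each record names the task and carries its own duration, so the output is call-by-call data that can be rolled up into a profile later, with subtasks nested under the task that ran them.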

93
