
Post on 21-Dec-2015


Krisztián Flautner - [email protected] Monitoring for Interactive performance and Power Reduction 1

Automatic Monitoring for Interactive Performance and Power Reduction

Krisztián Flautner [email protected]

Overview

• A mechanism for quantifying the user experience.
– Metric: response time.
– Automatic, no user program modifications required.
– Run-time feedback to the kernel.

• Multiprocessing to improve response times.

• Slow down processor to save energy when response times are fast enough.

Research contributions

• A metric (TLP) and a portable methodology for quantifying the amount of concurrency in a multiprocessor system.

• An automatic technique for detecting execution episodes that directly impact the user-perceived response times of interactive applications.

• Quantifying how much multiprocessing improves the responsiveness of interactive applications.

• An automatic mechanism for setting the optimum performance level of processors that support dynamic voltage scaling.

Response time

The time it takes for the computer to respond to user-initiated events.

• Faster is not always better.
– Fundamental limit to what is perceptible to humans.
• Movies: 20-30 frames per second.
• Perceptual causality: 50ms-100ms.
• Dragging objects on screen: 200ms.
• Non-continuous operation: 1-2sec.

The goal is to run fast enough to meet the perception threshold; there is no point in running any faster.

Episode classification

• Interactive episodes
– When the user is waiting for the computer to respond.
• Periodic episodes
– Producer (e.g. MP3 player).
– Consumer (e.g. sound daemon).

A utilization trace

Each horizontal quantum is one millisecond; its height corresponds to the utilization in that quantum.

Episode classification

Interactive (Acrobat Reader), Producer (MP3 playback), and Consumer (esd sound daemon) episodes.

Mouse movement

X server updates screen every ~10ms. Update takes ~0.25ms.

Interactive episodes

Interactive episodes can include idle time

Waiting for data from the network during a run of Netscape. Page rendering starts after 250ms.

Finding interactive episodes

• One way: mouse click indicates start, long idle time indicates end.
– Not always accurate.
– Not all episodes are initiated by mouse click.
– Latency in finding the ends of episodes.
• Our approach: track inter-task communication.
– Accurate.
– Finds all interactive episodes.
– No latency.
– No program modifications required.

Tracking interactive episodes

• Start of an interactive episode:
– X server sends a message to another task.
• During interactive episode:
– Keep track of communicating tasks (episode’s task set).
– Compute desired metrics.
• Conditions for ending the episode (applied to tasks in the episode’s task set):
– No tasks are executing.
– Data written by the tasks have been consumed.
– No task was preempted the last time it ran.
– No tasks are blocked on I/O.
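The end-of-episode test above can be sketched as a predicate over the episode's task set. This is a minimal illustration, not the kernel implementation: the class names, field names, and the idea of counting unconsumed writes are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class TaskState:
    """Per-task flags a monitor might keep (all field names are hypothetical)."""
    executing: bool = False
    unconsumed_writes: int = 0        # data written but not yet read by a peer
    preempted_last_run: bool = False
    blocked_on_io: bool = False


@dataclass
class Episode:
    task_set: dict = field(default_factory=dict)   # task id -> TaskState

    def add_task(self, tid):
        # Called when a task in the set communicates with another task.
        return self.task_set.setdefault(tid, TaskState())

    def has_ended(self):
        # All four end conditions from the slide must hold for every task in the set.
        return all(
            not t.executing
            and t.unconsumed_writes == 0
            and not t.preempted_last_run
            and not t.blocked_on_io
            for t in self.task_set.values()
        )


episode = Episode()
app = episode.add_task("app")
app.executing = True
print(episode.has_ended())   # the task is still running, so the episode continues
```

Because the check only reads a few flags per task, it could in principle be re-evaluated on every scheduling or I/O event, which is consistent with the "no latency" claim.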

Communication between tasks

[Figure: two diagrams of numbered tasks on CPU 0 and CPU 1, annotated with R and W markers. The task numbers and layout are not recoverable from the transcript.]

Does multiprocessing improve interactive performance?

Metrics: Response time, thread-level parallelism (TLP).
• Response time: duration of interactive episode.
• Machine is idle when all processors are idle.
• TLP: machine utilization when machine is not idle.

Results relevant to SMT, CMP processors.

Workloads: interactive desktop applications.

OS: Linux 2.3.99-pre3, Mandrake 7.1, glibc 2.1.3, XFree86 3.3.6.

Hardware: Dell Precision 410 Workstation: dual Pentium II 450MHz, 512MB RAM, Matrox Millennium II AGP 4MB.
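A minimal sketch of how TLP could be computed from a sampled trace. It assumes TLP is the average number of busy processors over the samples in which the machine is not idle; that reading is an assumption, chosen because it matches the definition above and yields values between 1 and the processor count. The function name and trace format are hypothetical.

```python
def thread_level_parallelism(busy_counts):
    """busy_counts: one entry per sample, the number of busy CPUs in that sample."""
    non_idle = [n for n in busy_counts if n > 0]
    if not non_idle:
        raise ValueError("machine was idle for the whole trace")
    # Average number of busy CPUs, ignoring samples where all CPUs are idle.
    return sum(non_idle) / len(non_idle)


# 100 samples on a dual-processor: 80 idle, 15 with one CPU busy, 5 with both busy.
trace = [0] * 80 + [1] * 15 + [2] * 5
print(thread_level_parallelism(trace))   # 25 busy-CPU samples / 20 non-idle samples
print(sum(trace) / (2 * len(trace)))     # plain machine utilization, idle time included
```

The second number shows why plain machine utilization is a poor measure here: it is dominated by idle time and says little about how much concurrency exists while work is actually being done.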

Why use TLP?

[Figure: chart with a 0%-100% axis showing machine utilization and idle time across benchmarks, with automated benchmark runs and "realistic" benchmark runs marked.]

Machine utilization only quantifies concurrency if there is no idle time during execution.

Initial results

• Surveyed >50 desktop applications
– BeOS, Linux and Windows NT.
• Lots of threads, but limited concurrency.
– Multimedia, web: 1.2~1.4.
– TLP is workload dependent. Photoshop: 1.23-2.36 TLP.
– Java apps similar to Windows apps.
• Lots of idle time (often >80% of execution time).
• 4 processor machine is overkill (for apps other than make -j and parallel MPEG player).

Does TLP translate into improved response times?

Workloads and TLP results

                                           Dual processor               Uniprocessor
Benchmark    Version   Description         TLPie    TLPrun   Idlerun    Idlerun
Acroread     4.0       PDF viewer          1.20     1.19     88%        87%
FrameMaker   5.5.6b    Document editor     1.35     1.33     93%        93%
Ghostview    3.5.8     PS and PDF viewer   1.42     1.39     84%        84%
GIMP         1.1.22    Image editor        1.26     1.24     88%        84%
Netscape     4.7       Web browser         1.34     1.28     90%        89%
Xemacs       21.1p8    Text editor         1.26     1.21     93%        92%
Average                                    1.31     1.27     89%        88%

Methodology

• All benchmarks run by a human
– Non-intrusive automation is difficult.
• Repeated runs of the same workload are not identical.
– Inexact repeat of mouse movement.
– Different amounts of idle times between episodes.
– Background activity.
• Average results of seven runs in each configuration.
– Mouse clicks used to synchronize traces.
– TLP identical, response time variance <3%.

Response time improvement over uniprocessor

Benchmark    TLPie   Response-time (TR) improvement
Acroread     1.20    15%
FrameMaker   1.35    22%
Ghostview    1.42    34%
GIMP         1.26    19%
Netscape     1.34    21%
Average      1.32    22%

Very little idle time (<1%) during interactive episodes.
Max. possible response-time improvement is 50% on a dual-processor.
TR improvement = 1 - TR(DP) / TR(UP) (expected to be close to 1 - 1 / TLP)
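The expected relation between TLP and response-time improvement can be checked against the table with a few lines of Python. The numbers are copied from the table; the function and variable names are only illustrative.

```python
def expected_improvement(tlp):
    # Expected response-time improvement on a multiprocessor: 1 - 1 / TLP.
    return 1.0 - 1.0 / tlp


# (TLPie, measured TR improvement) per benchmark, as listed in the table.
results = {
    "Acroread": (1.20, 0.15),
    "FrameMaker": (1.35, 0.22),
    "Ghostview": (1.42, 0.34),
    "GIMP": (1.26, 0.19),
    "Netscape": (1.34, 0.21),
}

for name, (tlp, measured) in results.items():
    print(f"{name:10s} expected {expected_improvement(tlp):.1%}  measured {measured:.0%}")
```

A TLP of 2 on a dual-processor gives the 50% upper bound quoted above.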

Background activity: MP3 playback

                                        No MP3   MP3
Avg. TR improvement on dual-processor   22%      29%
TLPie                                   1.31     1.36
TLPrun                                  1.27     1.23

                                        Uniprocessor   Dual-processor
Avg. TR increase due to MP3 playback    14%            4%

Time above the perception threshold

[Figure: time above the perception threshold (0%-100%) versus perception threshold (50ms-300ms) for Acrobat Reader, FrameMaker, Ghostview, GIMP, and Netscape.]

Time above the perception threshold is given as a percentage of time spent in all interactive episodes. Data is from the uniprocessor runs.

Characteristics of Interactive Episodes

• Many interactive episodes are already fast enough.
• More will be imperceptible in the near future.
– The work done within today's 200ms perception threshold is estimated to take 50ms three years from now.
• Faster is not necessarily better.
– Human perception has finite resolution.

Slow down the processor!

Why bother?

[Figure: maximum power (Watts, log scale 1-100) of Intel processors: 386, 486, Pentium, Pentium MMX, Pentium Pro, and Pentium II, followed by a question mark. Source: Intel.]

Higher performance = increased power consumption.

Power Density!

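[Figure: power density (Watts/cm², log scale 1-1000) with reference levels for a hot plate, a nuclear reactor, a rocket nozzle, and the Sun's surface (marked with a question mark). Source: Intel.]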

Dynamic Voltage Scaling

• Voltage is proportional to the frequency.

• Reduce frequency (and corresponding voltage) to match performance demands.

• Since reduced frequency implies increased execution time, energy is proportional to voltage².

Power = Capacitance × voltage² × frequency

Energy ~ voltage²
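The step from the power equation to the energy relation can be written out as a short derivation. The symbol N (the number of cycles a task needs, assumed fixed regardless of clock speed) is introduced here for illustration and is not on the slide.

```latex
\begin{align*}
P &= C\,V^{2} f, \qquad t = \frac{N}{f} \\
E &= P\,t = C\,V^{2} f \cdot \frac{N}{f} = C\,V^{2} N \;\propto\; V^{2}
\end{align*}
```

Lowering the frequency alone leaves the energy for a fixed amount of work unchanged, because the task runs proportionally longer. The saving comes from the lower voltage that the lower frequency permits.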

Processors supporting DVS

                  lpARM     Intel SA-1100   Transmeta Crusoe 5600   Intel XScale   Intel XScale Demo
Min. frequency    8MHz      59MHz           500MHz                  150MHz         150MHz
Min. voltage      1.1V      0.79V           1.2V                    0.75V          0.75V
Min. power        1.8mW     106mW           ~1W                     40mW           40mW
Max. frequency    100MHz    251MHz          700MHz                  800MHz         1000MHz
Max. voltage      3.3V      1.65V           1.6V                    1.5V           1.75V
Max. power        220mW     964mW           ~2W                     900mW          1.45W
Process           0.6       0.35            0.18                    0.18           0.18
Max/min energy    9         4.4             1.8                     4              5.4
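The "Max/min energy" row follows from the voltage rows if energy per unit of work scales with voltage squared. The sketch below recomputes it; the dictionary and function names are illustrative, and the voltages are taken from the table.

```python
# (minimum voltage, maximum voltage) in volts, as listed in the table.
voltages = {
    "lpARM": (1.1, 3.3),
    "Intel SA-1100": (0.79, 1.65),
    "Transmeta Crusoe 5600": (1.2, 1.6),
    "Intel XScale": (0.75, 1.5),
    "Intel XScale Demo": (0.75, 1.75),
}


def max_min_energy_ratio(v_min, v_max):
    # Energy per operation is proportional to voltage squared.
    return (v_max / v_min) ** 2


for name, (v_min, v_max) in voltages.items():
    print(f"{name:22s} {max_min_energy_ratio(v_min, v_max):.1f}")
```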

Some recent desktop processors

             Intel Pentium IV   Intel Pentium III   AMD Athlon Model 4   MPC 7450
Core         1.4GHz @ 1.7V      500MHz @ 1.35V      650MHz @ 1.75V       533MHz @ 1.8V
                                733MHz @ 1.65V      1.2GHz @ 1.75V       667MHz @ 1.8V
I/O          400MHz             100MHz, 133MHz      200MHz, 266MHz       133MHz
                                3.3V                1.6V                 1.8V-2.5V
Process      0.18               0.18                0.18                 0.18
Max. Power   66.3W              12W                 38W                  17W
                                19.1W               66W                  19.1W

Small performance reduction = big energy savings

20% performance reduction = 32% energy reduction
40% performance reduction = 55% energy reduction

[Figure: voltage (V, 0-2) and energy factor (0-1) versus frequency (0-1200 MHz). Graph based on Intel XScale data.]

The key: performance-setting algorithm

• Use episode detection and classification.
– Interactive episodes.
– Periodic episodes (producer and consumer).
• Performance-setting on a per episode basis.
• Stretch episodes to their deadlines.
– Interactive episode: perception threshold.
– Stretch producer to consumer.

No modification of existing programs needed.
Works with irregular processor utilization and multiprogramming.

Producer and consumer episodes

• Example: MP3 playback through esd sound daemon.
• Monitor communications to/from sound daemon.
• Distance between producer and consumer episodes determines necessary performance level.

[Figure: MP3 player, sound daemon, and HW sound device.]

Cumulative interactive episode length distribution: FrameMaker

[Figure: cumulative number and cumulative time of interactive episodes (0%-100%) versus episode length (sec, log scale 1e-05 to 1), with 10ms and 50ms marked and regions labelled "Minimum performance level sufficient" and "Max. performance".]

Cumulative interactive episode length distribution: Xemacs

[Figure: cumulative number and cumulative time of interactive episodes (0%-100%) versus episode length (sec, log scale 1e-05 to 1), with 10ms and 50ms marked and regions labelled "Minimum performance level sufficient" and "Max. performance".]

Performance-setting strategy for interactive episodes

• Predict the performance factor that would be correct most of the time (not for most events).
– Based on past optimal performance factors.
• Limit worst case impact on response time.
• No need to predict episode length.
– Performance factors have smaller range.

Performance-setting for interactive episodes

At the beginning of the episode
• Wait 5ms before transition to ignore short episodes.
• Switch to predicted performance level.

During the episode
• If episode duration reaches PanicThreshold, switch to maximum performance.

At the end of the episode
• Estimate full performance episode duration.
• Compute optimum performance level for past episode.
• Compute new prediction based on optimum settings.

PanicThreshold = PerceptionThreshold × (1 + PerformanceFactor)

Predicted PerformanceFactor is the average of past optimum settings, weighted by the corresponding episode lengths.
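A minimal sketch of two of the steps above: the PanicThreshold formula and the end-of-episode estimate of the full-performance duration. The slide does not say how that estimate is made; the version here assumes work done scales linearly with the performance factor, and the function names and the segment representation are hypothetical.

```python
def panic_threshold(perception_threshold, performance_factor):
    # PanicThreshold = PerceptionThreshold * (1 + PerformanceFactor), as on the slide.
    return perception_threshold * (1.0 + performance_factor)


def full_speed_duration(segments):
    """Estimate how long an episode would have taken at maximum performance.

    segments: (seconds, performance_factor) pairs covering the episode.
    Assumption: running for s seconds at factor pf does s * pf seconds of full-speed work.
    """
    return sum(seconds * pf for seconds, pf in segments)


# 200ms perception threshold, predicted performance factor 0.5.
limit = panic_threshold(0.200, 0.5)
print(limit)

# The episode ran at factor 0.5 until the panic threshold, then 100ms at full speed.
print(full_speed_duration([(limit, 0.5), (0.100, 1.0)]))
```

The estimated full-speed duration is the input needed to compute the optimum performance level for the finished episode.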

Performance-setting algorithm

Periodic activity detected
• Enter period-sampling mode.
• Switch to maximum performance.
• Establish base performance level.
• Exit period-sampling mode.

Start of interactive episode
• If not in period-sampling mode, apply interactive episode performance-setting policy.

End of interactive episode
• Update interactive episode statistics.
• Switch to base performance level, if there is periodic activity on the machine.
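The three event cases above can be sketched as a small state machine. This is an illustration under assumptions: the class and method names are hypothetical, the interactive policy is passed in as a callable, and how the base performance level is measured during period sampling is left to the caller because the slide does not specify it.

```python
MAX_LEVEL = 1.0


class PerformanceSetter:
    """Event handlers mirroring the slide's three cases (all names hypothetical)."""

    def __init__(self, predict_interactive_level):
        self.predict = predict_interactive_level   # callable returning a performance factor
        self.level = MAX_LEVEL
        self.base_level = None        # None means no periodic activity has been seen
        self.sampling = False
        self.episode_durations = []   # stand-in for the interactive episode statistics

    def on_periodic_activity_detected(self):
        self.sampling = True          # enter period-sampling mode
        self.level = MAX_LEVEL        # run at maximum performance while sampling

    def on_base_level_established(self, base_level):
        # The level to use right after sampling is not stated on the slide;
        # this sketch leaves the current level unchanged.
        self.base_level = base_level
        self.sampling = False         # exit period-sampling mode

    def on_interactive_start(self):
        if not self.sampling:
            self.level = self.predict()   # apply the interactive policy

    def on_interactive_end(self, full_speed_duration):
        self.episode_durations.append(full_speed_duration)
        if self.base_level is not None:   # periodic activity exists on the machine
            self.level = self.base_level


setter = PerformanceSetter(lambda: 0.4)
setter.on_periodic_activity_detected()
setter.on_base_level_established(0.3)
setter.on_interactive_start()
setter.on_interactive_end(0.02)
print(setter.level)   # back at the base level needed by the periodic activity
```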

Advantages

• Automatic.
• Impact on response time is quantifiable.
– Performance can be adapted to the user’s preference.
• Works well in the presence of multiprogramming.
• Irregular processor utilization is not a problem.
• Implementation requires very little state.
– Weighted average: two counters.
– Rescale to adapt to temporal variations.

Existing interval-based schemes:
• No feedback about service quality.
• Only work well if processor utilization is regular.
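The "two counters" remark can be illustrated with a length-weighted average of past optimum performance factors. The rescaling rule shown (halve both counters once the accumulated length passes a limit) is one plausible way to let recent episodes count for more; the rule, the parameter values, and all names are assumptions made for this sketch.

```python
class WeightedPredictor:
    """Average of past optimum factors, weighted by episode length, in two counters."""

    def __init__(self, initial_pf=1.0, rescale_limit=10.0, rescale_by=0.5):
        self.weighted_sum = 0.0    # counter 1: sum of optimum_pf * episode_length
        self.total_length = 0.0    # counter 2: sum of episode lengths
        self.initial_pf = initial_pf
        self.rescale_limit = rescale_limit
        self.rescale_by = rescale_by

    def update(self, optimum_pf, episode_length):
        self.weighted_sum += optimum_pf * episode_length
        self.total_length += episode_length
        if self.total_length > self.rescale_limit:
            # Scaling both counters keeps the current average but shrinks the
            # weight of history relative to episodes that arrive later.
            self.weighted_sum *= self.rescale_by
            self.total_length *= self.rescale_by

    def predict(self):
        if self.total_length == 0.0:
            return self.initial_pf
        return self.weighted_sum / self.total_length


predictor = WeightedPredictor()
predictor.update(optimum_pf=0.2, episode_length=0.1)
predictor.update(optimum_pf=0.8, episode_length=0.3)
print(predictor.predict())   # the longer episode dominates the prediction
```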

Performance-setting during the Acrobat Reader benchmark (200ms p.t.)

[Figure: performance factor (0-1) versus time (sec) during the benchmark run.]

Transitions to maximum performance level are due to reaching the PanicThreshold

Performance-setting during the Acrobat Reader + MP3 benchmark (200ms p.t.)

[Figure: performance factor (0-1) versus time (sec, 0-20) during the benchmark run.]

Transitions due to PanicThreshold

Full performance for periodic activity.

Hardware assumptions

Minimum performance: 150MHz @ 0.75V
Maximum performance: 1000MHz @ 1.75V
PLL resynch time (stalls execution): 0.02ms
Voltage transition time: 1ms

Assumptions based on Intel XScale.

We assume that the processor switches to sleep mode when it is not executing an episode.

Energy factors (no MP3)

[Figure: energy factor (0%-100%) versus perception threshold (50ms-300ms) for Acroread, FrameMaker, Ghostview, GIMP, Netscape, and Xemacs.]

Energy factors with MP3 playback

[Figure: energy factor (0%-100%) versus perception threshold (50ms-300ms) for Acroread, FrameMaker, Ghostview, GIMP, Netscape, and Xemacs.]

Changes in cumulative episode lengths as the result of performance scaling (Xemacs, 50ms p.t.)

[Figure: cumulative percentage of time (0%-100%) versus episode length (sec, log scale 1e-05 to 1), before and after performance scaling, with 10ms and 50ms marked.]

Desired improvements

• Processor parameters are good enough.
– Faster voltage transitions would help a little.
– As peak performance gets higher, lower minimum performance is desirable.
• More sophisticated prediction algorithms.
– Distinguish between episode instances, not just episode types.

Conclusions

• Multiprocessing can significantly improve response times.
– Measured 15%-38% improvement (out of possible 50%)!
• Many interactive episodes are already fast enough.
– More will be fast enough in the near future.
– Use Dynamic Voltage Scaling to save energy.
• Episode classification based on inter-task communication.
– Fast, accurate, no user program modifications required.
• Performance-setting based on episode classification.
– Works well with multiprogramming, irregular processor utilization.
– Ensures high quality interactive performance.
– Significant energy savings (10%-80%).

Future work

• Evaluate our algorithms on real hardware.
– Processors are slowly becoming available.
– Impact on interactive performance.
• An API to specify episodes.
– Light-weight: specify hints, not complete information.
– Works in concert with existing detection mechanism.
• Apply episode detection to other problems.
– Scheduler: can real-time deadlines be detected automatically?

fin.

The performance gap

[Figure: desired performance and available performance (log scale 1-100000) versus time (0-9 years), with four annotations:]
(A) Available performance starts accommodating requirements.
(B) All performance requirements are met.
(C) Slowest available performance exceeds minimum requirements.
(D) Available performance is higher than required.

Applicability to other environments

Technique exploits information from existing design patterns.

On Linux with X windows:
• Communication through sockets, pipes, signals.
• Well-known tasks: X server, sound daemon, etc.
• Select syscall used for asynchronous I/O.
• Use of blocking system calls in dedicated threads.

Other systems:
• Adapt to that system’s design patterns and IPC mechanisms.

Computing the performance factor for interactive episodes

PF1 = PFmin
PF2 = full-speed episode duration / perception threshold
PF3 = 1

[Figure: performance factor (from PFmin to 1) versus full-speed episode duration (sec), with the minimum-performance threshold and the perception threshold marked on the duration axis.]
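The three cases can be read as one piecewise function of the full-speed episode duration. In the sketch below, the mapping of PF1, PF2, and PF3 to the regions below, between, and above the two thresholds is inferred from the figure labels. Placing the minimum-performance threshold at PFmin × perception threshold is an assumption that makes the function continuous. The function name and the example PFmin of 0.15 (150MHz out of 1000MHz) are illustrative.

```python
def optimum_performance_factor(full_speed_duration, perception_threshold, pf_min):
    if not 0.0 < pf_min <= 1.0:
        raise ValueError("pf_min must be in (0, 1]")
    if perception_threshold <= 0.0:
        raise ValueError("perception_threshold must be positive")

    # Assumed position of the minimum-performance threshold: the duration at
    # which stretching to the perception threshold needs exactly pf_min.
    minimum_performance_threshold = pf_min * perception_threshold

    if full_speed_duration <= minimum_performance_threshold:
        return pf_min                                       # PF1: slowest setting suffices
    if full_speed_duration < perception_threshold:
        return full_speed_duration / perception_threshold   # PF2: stretch to the deadline
    return 1.0                                              # PF3: already too long, run flat out


# 200ms perception threshold, minimum performance factor 0.15.
for duration in (0.010, 0.100, 0.300):
    print(duration, optimum_performance_factor(duration, 0.200, 0.15))
```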

Performance scaling

[Figure: two performance-versus-time plots, A and B, shown against a deadline.]

Energy-delay (no MP3)

[Figure: energy factor versus increase of perceptible interactive episode lengths.]

Energy-delay (MP3)

[Figure: energy factor versus increase of perceptible interactive episode lengths.]