Performance Problems You Can Fix: A Dynamic Analysis of Memoization Opportunities

Luca Della Toffola – ETH Zurich
Michael Pradel – TU Darmstadt
Thomas R. Gross – ETH Zurich

October 30th, 2015 - OOPSLA15


MemoizeIt

• Dynamic analysis
• Memoization opportunities
• Automatic
• 9 new real-world memoization opportunities

Apache POI – Issue 55611

Performance issue

// Method of class DateUtil
public boolean isADateFormat(int idx, String format) {
    StringBuilder sb = new StringBuilder(format.length());
    // Iterate over the input format; sb is still empty at this point
    for (int i = 0; i < format.length(); i++) {
        // Modify format and write to sb
    }
    String f = sb.toString();
    // Process f using date pattern matching
    return date_ptrn.matcher(f).matches();
}

Java profiler: ranked 10 (189), 4000 calls

Java profiler: no additional bottleneck info

Research tools: symptoms are not there*

• No nested loops
• No memory bloat

* [Nistor, ICSE13], [Xu, OOPSLA12]

Observation: many calls have the same input and output values!

Input (parameters + accessed fields)    Output (returned value)
0, “m/d/yy”                             true
0, “m/d/yy”                             true
0, “m/d/yy”                             true
1, “h:mm”                               false
1, “h:mm”                               false

Memoization?

Purity analysis? Too conservative! The method has side effects.

Ignore side effects!

MemoizeIt: 1st ranked method!

MemoizeIt finds calls with the same input and output values.

Memoization!

// Single-entry cache stored in instance fields
boolean cache_value;
int cache_key1;
String cache_key2;

public boolean isADateFormatSlow(int idx, String format) {
    // Slow isADateFormat code
}

public boolean isADateFormat(int idx, String format) {
    // cache_key2 is null before the first call, so compare from format
    if (cache_key1 == idx && format.equals(cache_key2)) {
        return cache_value;
    }
    // Update cache keys and value
    cache_key1 = idx;
    cache_key2 = format;
    cache_value = isADateFormatSlow(idx, format);
    return cache_value;
}

Single-entry instance cache

Up to 25% speed-up!

MemoizeIt – Contributions

1. Automatic analysis to find memoization opportunities
2. Suggest fix configurations for candidate methods

Challenge: inputs can be complex objects on the heap

boolean DateUtil.isADateFormat(int idx, MyClass format)

MemoizeIt = Memoization + Iterative

MemoizeIt

Program + profiling input

CPU-time profiling

Filtering of methods:

1. Number of executions
2. Average execution time
3. Relative execution time

Result: initial method candidates
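The three filter criteria could be applied to CPU-time profiling data roughly as follows. This is only a sketch: the slides name the criteria but give no thresholds, so `minCalls`, `minAvgMillis`, and `minRelative`, as well as the `MethodStats` and `CandidateFilter` classes, are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical record of CPU-time profiling data for one method. */
final class MethodStats {
    final String name;
    final long calls;          // number of executions
    final double totalMillis;  // total time spent in the method

    MethodStats(String name, long calls, double totalMillis) {
        this.name = name;
        this.calls = calls;
        this.totalMillis = totalMillis;
    }
}

/** Hypothetical filter that keeps methods passing all three criteria. */
final class CandidateFilter {
    static List<String> filter(List<MethodStats> stats, double programMillis,
                               long minCalls, double minAvgMillis, double minRelative) {
        List<String> candidates = new ArrayList<>();
        for (MethodStats s : stats) {
            if (s.calls == 0) {
                continue; // never executed, nothing to memoize
            }
            double average = s.totalMillis / s.calls;        // average execution time
            double relative = s.totalMillis / programMillis; // share of program time
            if (s.calls >= minCalls && average >= minAvgMillis && relative >= minRelative) {
                candidates.add(s.name);
            }
        }
        return candidates;
    }
}
```

A method called only once, or one whose calls are too cheap to matter, is dropped before the more expensive input-output profiling starts.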


Input-Output Profiling

Input: parameters + accessed fields
Output: returned value
Together they form an input-output tuple (T).

1. For each call of a candidate method
2. Trace method input-output values
3. Select method candidates

Example: boolean DateUtil.isADateFormat(int idx, String format)

Input            Output    Tuple
0, “m/d/yy”      true      T1
0, “m/d/yy”      true      T1
0, “m/d/yy”      true      T1
1, “h:mm”        false     T2
1, “h:mm”        false     T2

multiplicity(T1) = 3
multiplicity(T2) = 2

Repeated input-output tuples → memoization
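The multiplicity count from this slide can be sketched as a map from tuples to counters. The class `TupleProfile` and its method names are hypothetical and not taken from the MemoizeIt implementation; the sketch assumes inputs and outputs have value-based `equals` and `hashCode`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that counts how often each input-output tuple occurs. */
final class TupleProfile {
    // Key: inputs followed by the output; value: multiplicity of that tuple.
    private final Map<List<Object>, Integer> multiplicity = new HashMap<>();

    private static List<Object> tuple(List<?> inputs, Object output) {
        List<Object> t = new ArrayList<>(inputs);
        t.add(output);
        return t;
    }

    /** Records one call of the profiled method. */
    void record(List<?> inputs, Object output) {
        multiplicity.merge(tuple(inputs, output), 1, Integer::sum);
    }

    /** Returns how many times this exact tuple was recorded. */
    int multiplicityOf(List<?> inputs, Object output) {
        return multiplicity.getOrDefault(tuple(inputs, output), 0);
    }

    /** True if at least one tuple was observed more than once. */
    boolean hasRepeatedTuples() {
        for (int count : multiplicity.values()) {
            if (count > 1) {
                return true;
            }
        }
        return false;
    }
}
```

Recording the five calls from the table yields multiplicity 3 for (0, “m/d/yy”, true) and 2 for (1, “h:mm”, false).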

Challenge – Complex Objects

boolean DateUtil.isADateFormat(int idx, MyClass format)

[Figure: two MyClass objects, one with x: 45, y: 1, z: B (with field a), the other with x: 45, y: 0, z: B (with field a). equals?]

Structural and content equivalence

flat(object) = (MyClass1, [45, 1, (B1, [...])])

[Figure: MyClass object with x: 45, y: 1, z: B (with field a)]

[Figure: the MyClass object and the heap it references]

Can’t keep everything!

[Figure: heap references ref1 and ref2 to two MyClass objects (x: 45, y: 0, z: B and x: 45, y: 1, z: B, each B with field a), compared at depth = 1 and depth = 2. equals?]

Exhaustive traversal is expensive!
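A depth-bounded version of flat(object) might look like the sketch below. It assumes reflection over declared instance fields of application classes, ignores superclass fields and array contents, and treats boxed primitives and strings as leaf values. `Flattener`, the constructors, and the `int` type of field `a` are hypothetical; only the class and field names `MyClass`, `B`, `x`, `y`, `z`, and `a` come from the slides.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

/** Example classes named after the slides; constructors are assumptions. */
final class B {
    int a;
    B(int a) { this.a = a; }
}

final class MyClass {
    int x;
    int y;
    B z;
    MyClass(int x, int y, B z) { this.x = x; this.y = y; this.z = z; }
}

/** Hypothetical depth-bounded flattening of an object graph. */
final class Flattener {
    /** Returns [className, [field values...]], following references up to maxDepth levels. */
    static Object flat(Object obj, int maxDepth) {
        if (obj == null || isLeaf(obj)) {
            return obj;
        }
        List<Object> fields = new ArrayList<>();
        if (maxDepth > 0) {
            for (Field f : obj.getClass().getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                    continue;
                }
                try {
                    f.setAccessible(true);
                    fields.add(flat(f.get(obj), maxDepth - 1));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        List<Object> result = new ArrayList<>();
        result.add(obj.getClass().getSimpleName());
        result.add(fields); // left empty once the depth bound is reached
        return result;
    }

    private static boolean isLeaf(Object o) {
        return o instanceof Number || o instanceof Boolean
                || o instanceof Character || o instanceof String;
    }
}
```

With this sketch, two MyClass objects that differ only in the field a of their B object flatten to the same value at depth 1 and to different values at depth 2, which is why a shallow comparison can overestimate how often inputs repeat.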

Solution – Iterative Profiling

Iterative approach can analyze programs with complex structures

MemoizeIt

1. CPU-time profiling of the program with the profiling input yields the initial method candidates.
2. Input-output profiling, starting at depth d = 1.
3. Filter method candidates.
4. If max depth || time limit: exit() the loop. Otherwise continue with the new candidates, depth++, and return to step 2.
5. Candidates ranking
6. Fix suggestions
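The loop in this diagram can be sketched as follows. The profiling step itself is abstracted into a predicate `stillCandidate(method, depth)`, which is a hypothetical stand-in for running input-output profiling at a given depth. The early exit on an empty candidate list is an added assumption, not something shown on the slide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

/** Hypothetical driver for the iterative profiling loop. */
final class IterativeProfiler {
    static List<String> run(List<String> initialCandidates,
                            BiPredicate<String, Integer> stillCandidate,
                            int maxDepth, long timeLimitMillis) {
        long deadline = System.currentTimeMillis() + timeLimitMillis;
        List<String> candidates = new ArrayList<>(initialCandidates);
        int depth = 1; // d = 1
        while (true) {
            // Input-output profiling at the current depth, then filtering.
            List<String> next = new ArrayList<>();
            for (String method : candidates) {
                if (stillCandidate.test(method, depth)) {
                    next.add(method);
                }
            }
            candidates = next;
            // if max depth || time limit: exit
            if (depth >= maxDepth
                    || System.currentTimeMillis() >= deadline
                    || candidates.isEmpty()) {
                return candidates;
            }
            depth++; // profile only the surviving candidates more deeply
        }
    }
}
```

Because methods that stop looking memoizable are dropped early, the more expensive deep traversals are only paid for the shrinking set of remaining candidates.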

Ranking of Candidates

Ranking based on:

1. Estimated saved time
2. Estimated hit-ratio

Result: ranked candidate methods
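The slides do not give the formulas behind the two estimates. One plausible reading, assuming an unbounded cache in which every repetition of a tuple after its first occurrence is a hit, is sketched below. `RankingEstimates` and both formulas are assumptions made for illustration, not the tool's definitions.

```java
import java.util.Collection;

/** Hypothetical estimates used to rank candidate methods. */
final class RankingEstimates {
    /** Share of calls that an unbounded cache would answer, given tuple multiplicities. */
    static double hitRatio(Collection<Integer> multiplicities) {
        long calls = 0;
        long hits = 0;
        for (int m : multiplicities) {
            calls += m;
            hits += m - 1; // the first occurrence of each tuple is a miss
        }
        return calls == 0 ? 0.0 : (double) hits / calls;
    }

    /** Time saved if every hit avoided an average-cost call. */
    static double savedTime(double totalMethodTime, double hitRatio) {
        return totalMethodTime * hitRatio;
    }
}
```

For the multiplicities 3 and 2 from the isADateFormat example, this gives 3 hits out of 5 calls.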

Fix Suggestions

Optimal cache configuration for the ranked candidate methods

Suggests a configuration among:

• Single-entry instance cache
• Single-entry global cache
• Multi-entry instance cache
• Multi-entry global cache

+ need for invalidation
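In contrast to the single-entry instance cache shown earlier, a multi-entry global cache could be sketched as a static map with an explicit invalidation hook. `DateFormatCache`, `slowCalls`, and the body of `isADateFormatSlow` are placeholders for illustration and do not reproduce Apache POI's logic. The sketch is not thread-safe.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical multi-entry global cache around a slow method. */
final class DateFormatCache {
    // Global: one static map shared by all callers. Multi-entry: one slot per input.
    private static final Map<List<Object>, Boolean> CACHE = new HashMap<>();

    static int slowCalls = 0; // counts cache misses, for demonstration only

    /** Placeholder for the slow computation; NOT the real POI implementation. */
    static boolean isADateFormatSlow(int idx, String format) {
        slowCalls++;
        return format.indexOf('d') >= 0 || format.indexOf('y') >= 0;
    }

    static boolean isADateFormat(int idx, String format) {
        List<Object> key = Arrays.<Object>asList(idx, format);
        Boolean cached = CACHE.get(key);
        if (cached != null) {
            return cached;
        }
        boolean value = isADateFormatSlow(idx, format);
        CACHE.put(key, value);
        return value;
    }

    /** Invalidation hook: call when state the method depends on changes. */
    static void invalidate() {
        CACHE.clear();
    }
}
```

A single-entry cache needs less memory and no hashing but only helps when identical calls arrive back to back; a multi-entry cache also pays off when several distinct inputs alternate.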

Experimental Setup

Program              Description
DaCapo 2006 MR2      antlr, bloat, chart, fop, luindex, pmd
Checkstyle 5.6       Source-code style checker
Soot ae0cec69c0      Static program analysis / manipulation
Apache Tika 1.3      Content analysis toolkit
Apache POI 3.9       MS Office documents manipulation

Evaluation – Research Question

Is MemoizeIt effective at finding new memoization opportunities?

1. Manually select realistic input
2. Execute MemoizeIt
3. Manually inspect methods
4. Implement MemoizeIt’s suggestions

Timeout for profiling: 1 hour

Evaluation – Results

Is MemoizeIt effective at finding new memoization opportunities?

• 31 memoization opportunities
• 9 new opportunities in DaCapo-antlr, DaCapo-bloat, DaCapo-fop, Soot, Apache-Tika, Apache-POI, Checkstyle
• 1 duplicate method in Apache-Tika, Apache-POI

Evaluation – Results

Program              Small workload [speed-up]    Large workload [speed-up]
DaCapo-antlr         1.04 ± 0.03                  1.05 ± 0.02
DaCapo-bloat         1.08 ± 0.03                  -
DaCapo-fop           1.05 ± 0.01                  NA
Checkstyle           -                            9.95 ± 0.10
Soot                 1.27 ± 0.03                  12.93 ± 0.05
Apache-Tika Excel    -                            1.25 ± 0.02
Apache-Tika Jar      1.09 ± 0.01                  1.12 ± 0.02
Apache-POI (1)       1.11 ± 0.01                  1.92 ± 0.01
Apache-POI (2)       1.07 ± 0.01                  1.12 ± 0.01

Evaluation – Research Question

Is the iterative or exhaustive approach more efficient?

Evaluation – Results

Program              Iterative time [minutes]    Exhaustive time [minutes]
DaCapo-antlr         timeout                     timeout
DaCapo-bloat         timeout                     timeout
DaCapo-chart         2                           2
DaCapo-fop           18                          timeout
DaCapo-luindex       32                          timeout
DaCapo-pmd           timeout                     timeout
Checkstyle           6                           22
Soot                 timeout                     timeout
Apache-Tika Excel    58                          56
Apache-Tika Jar      41                          35
Apache-POI           23                          37

Iterative wins: DaCapo-fop, DaCapo-luindex, Checkstyle, Apache-POI

Exhaustive wins: Apache-Tika Excel, Apache-Tika Jar

Related Work

Performance problems
• Detecting: [Xu, OOPSLA12], [Zaparanuks, PLDI12]
• Understanding: [Song, OOPSLA14], [Yu, ASPLOS14]
• Fixing: [Nistor, ICSE15]

Compiler optimizations: [Ding, CGO04], [Costa, CGO13], [St-Amour, OOPSLA12]

Incremental computations: [Pugh, POPL89]

Other caching techniques: [Ma, WWW15]

Conclusions

Profiling of memoization opportunities
• New real-world opportunities
• Relevant speed-ups
• Iterative strategy beneficial

Suggests cache configurations
• Suggestions easy to implement

Artifact evaluated
• https://github.com/lucadt/memoizeit