Guoqing Xu, Atanas Rountev
Ohio State University
Oct 9th, 2008
Presented by Eun Jung Park


Page 2

Example of memory leak/dangling pointer in C/C++

How about in Java?
◦ GC (Garbage Collector) will handle this!

Then what is the memory leak problem in Java?

    #include <stdlib.h>

    int *pi;

    void foo() {
        pi = (int*) malloc(8*sizeof(int)); // oops, memory leak of 4 ints
        // use pi
        free(pi); // foo() is done with pi
    }

    int main(void) {
        pi = (int*) malloc(4*sizeof(int));
        foo();
        pi[0] = 10; // oops, pi is now a dangling pointer
        return 0;
    }

The example above is from http://www.ibm.com/developerworks/rational/library/05/0816_GuptaPalanki/index.html

Page 3

What is a Java memory leak?
◦ Object references that are no longer needed are unnecessarily maintained, so the GC cannot reclaim the objects they point to.

Why is a Java memory leak bad?
◦ It can degrade performance.
◦ It can eventually exhaust memory and crash the program.
◦ It is difficult to find.

    // myVector is a long-lived java.util.Vector field
    public void slowlyLeakingVector(int iter, int count) {
        for (int i=0; i<iter; i++) {
            for (int n=0; n<count; n++) {
                myVector.add(Integer.toString(n+i));
            }
            for (int n=count-1; n>0; n--) {
                // Oops, it should be n>=0
                myVector.removeElementAt(n);
            }
        }
    }

The example above is from http://www.ibm.com/developerworks/rational/library/05/0816_GuptaPalanki/index.html
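To make the effect of the off-by-one bug concrete, here is a minimal runnable sketch of the example. The class name `SlowLeakDemo`, the `size()` helper and the chosen arguments are assumptions added for illustration; only the method body comes from the slide.

```java
import java.util.Vector;

// Runnable wrapper around the slide's example. The removal loop stops at n > 0,
// so each outer iteration removes only count-1 of the count elements it added.
class SlowLeakDemo {
    // Assumption: myVector is a long-lived field, as the slide's snippet implies.
    private final Vector<String> myVector = new Vector<>();

    void slowlyLeakingVector(int iter, int count) {
        for (int i = 0; i < iter; i++) {
            for (int n = 0; n < count; n++) {
                myVector.add(Integer.toString(n + i));
            }
            // Bug kept on purpose: the condition should be n >= 0.
            for (int n = count - 1; n > 0; n--) {
                myVector.removeElementAt(n);
            }
        }
    }

    // Hypothetical helper: number of elements still referenced by the vector.
    int size() {
        return myVector.size();
    }

    public static void main(String[] args) {
        SlowLeakDemo demo = new SlowLeakDemo();
        demo.slowlyLeakingVector(1000, 10);
        System.out.println("elements still referenced: " + demo.size());
    }
}
```

The vector grows by one element per outer iteration, and the GC cannot reclaim those strings because the vector still references them.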

Page 4

Static methods using compiler or code analysis
◦ Not precise: they usually cannot precisely identify the unnecessary references.
◦ Not scalable: they do not work well for large applications.

Dynamic methods using fine-grained runtime information about individual objects, based on a single metric: memory contribution or staleness contribution
◦ Not precise: they take a from-symptom-to-cause approach, which makes it difficult to locate the source of the leak and leads to imprecise leak reports (possible false positives).
◦ Hard to interpret and not sufficient for programmers: the output is complex and imprecise, and it does not give programmers enough information to locate the bug.

Page 5

Dynamic method with container-based heap tracking
◦ Instead of a from-symptom-to-cause approach: track only containers to directly identify the source of the leak.
◦ Instead of a single metric: compute a heuristic confidence value for each container based on the combination of
    overall memory consumption
    each container’s memory consumption
    each container’s staleness contribution

What is the definition of a container? An abstract data type (ADT) with a set of data elements and three basic operations: ADD, GET, and REMOVE (e.g., hash table, graphical element).

Why are containers suspicious? Containers cause many memory leaks in Java!

Page 6

Contribution 1: Computing a Confidence Value

Contribution 2: Java Memory Leak Detection

Contribution 3: Implementation

Contribution 4: Empirical Evaluation

Page 7

Define Memory Leak Symptom

Define Memory Leak Free

Choose Non Memory-Leak-Free Containers

Calculate Memory Contribution

Calculate Staleness Contribution

Put them together! Calculate Leaking Confidence

Page 8

A program written in a garbage-collected language has a memory leak symptom within [$\tau_s$, $\tau_e$] if
◦ (1) The memory consumption $m_{gc_i}$ at each moment $\tau_{gc_i}$ immediately after a gc-event in the region satisfies $m_s \le m_{gc_i} \le m_e$, and
◦ (2) There exists a subsequence of gc-events, $ss = (\tau_{gc_1}, \tau_{gc_2}, \ldots, \tau_{gc_n})$, along which the memory consumption at each gc-event keeps growing.

How to define $\tau_s$ and $\tau_e$?
◦ $\tau_e$, offline: the end of the program or the out-of-memory error.
◦ $\tau_e$, online: user-defined. The gc-events serve as checkpoints.
◦ $\tau_s$: choose the smallest user-defined ratio to get the longest region and a more precise analysis.

This helps to identify the appropriate time region to analyze.
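The two conditions of the leak symptom definition can be sketched as a check over the memory readings taken right after each gc-event. This is only an illustration under stated assumptions: the class and method names are hypothetical, the first and last readings are taken as the memory consumption at the start and end of the region, and the `minGrowing` threshold (minimum length of the growing subsequence) is a parameter introduced here, not something the slides specify.

```java
// Sketch of the leak-symptom test over post-GC memory readings (in bytes).
class LeakSymptomCheck {
    // Length of the longest strictly increasing subsequence (simple O(n^2) DP).
    static int longestGrowingSubsequence(long[] m) {
        int best = 0;
        int[] len = new int[m.length];
        for (int i = 0; i < m.length; i++) {
            len[i] = 1;
            for (int j = 0; j < i; j++) {
                if (m[j] < m[i] && len[j] + 1 > len[i]) {
                    len[i] = len[j] + 1;
                }
            }
            best = Math.max(best, len[i]);
        }
        return best;
    }

    // m[0] is the reading at the region start, m[m.length - 1] at the region end.
    static boolean hasLeakSymptom(long[] m, int minGrowing) {
        if (m.length < 2) {
            return false;
        }
        long ms = m[0];
        long me = m[m.length - 1];
        for (long mi : m) {
            if (mi < ms || mi > me) {
                return false; // condition (1) violated
            }
        }
        // Condition (2): a long enough subsequence with growing consumption.
        return longestGrowingSubsequence(m) >= minGrowing;
    }

    public static void main(String[] args) {
        long[] readings = {100, 120, 110, 150, 140, 200}; // made-up values
        System.out.println(hasLeakSymptom(readings, 3));
    }
}
```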

Page 9

A container is memory-leak-free if
(1) at the end of the leak region, the number of elements in it is 0, and
(2) all elements added were removed and garbage collected within the leak region. This means that # of ADD = # of REMOVE.

Why do we need this definition? Containers that are not memory-leak-free will "possibly" contribute to the memory leak symptom and are considered for further evaluation.

We choose the containers that are not memory-leak-free, and we are ready to go to the next step!
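A rough way to picture the memory-leak-free test is a pair of counters per container. The sketch below is a simplification with hypothetical names: it only compares the number of ADD and REMOVE operations and does not verify that the removed elements were actually garbage collected, which the definition also requires.

```java
// Per-container operation counters for the leak region (hypothetical helper).
class ContainerUsage {
    private long adds;
    private long removes;

    void onAdd() {
        adds++;
    }

    void onRemove() {
        removes++;
    }

    // Elements still held at the end of the leak region.
    long liveElements() {
        return adds - removes;
    }

    // Simplified check: # of ADD == # of REMOVE, so no element remains.
    boolean isLeakFree() {
        return liveElements() == 0;
    }

    public static void main(String[] args) {
        ContainerUsage usage = new ContainerUsage();
        usage.onAdd();
        usage.onAdd();
        usage.onRemove();
        // One element was never removed, so this container stays a candidate.
        System.out.println("leak free: " + usage.isLeakFree());
    }
}
```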

Page 10

A memory time graph is used to capture a container's memory footprint.
◦ x-axis: the relative time of program execution at $\tau_i$, i.e. $\tau_i / \tau_e$
◦ y-axis: the relative memory consumption of the container at $\tau_i$, i.e. (memory consumption of all objects reachable from the container at $\tau_i$) / (total memory consumption of the program at $\tau_i$)
◦ Starting point: $\tau_0 / \tau_e$, where $\tau_0 = \max(\tau_s, \text{allocation time of container})$
◦ Ending point: $\tau_1 / \tau_e$, where $\tau_1 = \min(\tau_e, \text{deallocation time of container})$

A container’s memory contribution (MC) is defined as the area covered by the memory consumption curve in the graph.
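Since the memory contribution is an area under a sampled curve, one plausible way to approximate it is the trapezoidal rule over the sampled points. The sketch below assumes that approach; the slides only say that the graph and the MC value are approximated, so the method, the class name and the parameters are illustrative assumptions. It reads the y-axis as the memory reachable from the container divided by the total memory in use at the same moment.

```java
// Approximates MC as the area under the memory time graph (trapezoidal rule).
class MemoryContribution {
    // times[i]          : sample time tau_i (same unit as tauE), ascending
    // containerBytes[i] : memory reachable from the container at tau_i
    // totalBytes[i]     : total memory in use at tau_i (assumed normalizer)
    // All three arrays must have the same length.
    static double compute(double[] times, double[] containerBytes,
                          double[] totalBytes, double tauE) {
        double area = 0.0;
        for (int i = 1; i < times.length; i++) {
            double x0 = times[i - 1] / tauE;
            double x1 = times[i] / tauE;
            double y0 = containerBytes[i - 1] / totalBytes[i - 1];
            double y1 = containerBytes[i] / totalBytes[i];
            area += (x1 - x0) * (y0 + y1) / 2.0;
        }
        return area;
    }

    public static void main(String[] args) {
        // Made-up samples: the container grows from 0% to 50% of the heap.
        double[] times = {0, 50, 100};
        double[] container = {0, 25, 50};
        double[] total = {100, 100, 100};
        System.out.println("MC = " + compute(times, container, total, 100));
    }
}
```

Because both axes are relative, the result stays between 0 and 1, which is what the later combination with the staleness contribution relies on.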

Page 11

Staleness: the time since the object's last use.

How to calculate staleness? It is the time difference between $\tau_1$ and $\tau_2$, i.e. $\tau_2 - \tau_1$, where
◦ $\tau_2$: the moment the element was removed from a container.
◦ $\tau_1$: the moment the element was added to a container or retrieved from a container.
◦ Condition: no retrieval of the element between $\tau_1$ and $\tau_2$.
◦ If $\tau_1 < \tau_s$?
◦ If $\tau_2 < \tau_s$?
◦ If an element is never removed from a container?

How to calculate the staleness contribution (SC)? When a container holds $n$ elements $o_1, \ldots, o_n$,

$$SC = \frac{\left(\sum_{i=1}^{n} staleness(o_i)\right) / n}{\tau_e - \tau_s}$$
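The staleness contribution can be sketched as the average element staleness divided by the length of the leak region. The slide leaves its three edge-case questions open, so the handling below is an assumption made only to keep the example total: a last use before the region start is clamped to the region start, and an element that is never removed is treated as removed at the region end. The class name and the NaN convention are also hypothetical.

```java
// Sketch: SC = (average staleness of the elements) / (tauE - tauS).
class StalenessContribution {
    // lastUse[i] : time of the last ADD or GET of element i (tau_1)
    // removed[i] : time of its REMOVE (tau_2), or Double.NaN if never removed
    static double compute(double[] lastUse, double[] removed,
                          double tauS, double tauE) {
        int n = lastUse.length;
        if (n == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            // Assumption: clamp a last use before the region to the region start.
            double t1 = Math.max(lastUse[i], tauS);
            // Assumption: a never-removed element counts as stale until tauE.
            double t2 = Double.isNaN(removed[i]) ? tauE : removed[i];
            sum += Math.max(0.0, t2 - t1);
        }
        return (sum / n) / (tauE - tauS);
    }

    public static void main(String[] args) {
        // Made-up timestamps for three elements in a region [0, 100].
        double[] lastUse = {10, 20, 50};
        double[] removed = {30, Double.NaN, 100};
        System.out.println("SC = " + compute(lastUse, removed, 0, 100));
    }
}
```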

Page 12

Combining MC and SC, we get the Leaking Confidence (LC), defined as

$$LC = SC \times MC^{1 - SC}$$

◦ Why is LC an exponential function of SC? SC is more important than MC in determining a memory leak.

Desirable properties
◦ $MC = 0$ and $SC \in [0,1] \Rightarrow LC = 0$
◦ $SC = 0$ and $MC \in [0,1] \Rightarrow LC = 0$
◦ $SC = 1$ and $MC \in [0,1] \Rightarrow LC = 1$
◦ $MC = 1$ and $SC \in [0,1] \Rightarrow LC = SC$
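The formula is small enough to check numerically. The sketch below assumes the reading LC = SC × MC^(1 − SC) with both inputs in [0, 1]; the class name and the sample values are illustrative only. One corner is worth noting: for SC = 1 and MC = 0, Java's `Math.pow(0.0, 0.0)` returns 1.0, so this code yields LC = 1 there.

```java
// Leaking confidence from staleness contribution (sc) and memory contribution (mc).
class LeakingConfidence {
    static double compute(double sc, double mc) {
        return sc * Math.pow(mc, 1.0 - sc);
    }

    public static void main(String[] args) {
        // The listed properties, evaluated on sample inputs.
        System.out.println(compute(0.5, 0.0)); // MC = 0
        System.out.println(compute(0.0, 0.7)); // SC = 0
        System.out.println(compute(1.0, 0.3)); // SC = 1
        System.out.println(compute(0.4, 1.0)); // MC = 1

        // SC weighs more than MC: swapping the two values changes the ranking.
        System.out.println(compute(0.8, 0.2) > compute(0.2, 0.8));
    }
}
```

With SC in the exponent, a container that is very stale but small still scores higher than one that is large but actively used.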

Page 13

[Diagram: tool workflow]
Container Modeling → Code Instrumentation → Profiling → Data Analysis
Code Instrumentation produces: instrumented code with glue classes
Data Analysis steps: leak symptom, leak free, non-leak-free containers, MC, SC, LC
Data Analysis produces: information on potentially leaking containers, leaking call sites

Page 14

For each container type, there is a corresponding glue class
◦ Provided for all types in the Java collections framework.
◦ User annotation is required for user-defined containers.

The glue methods call the profiling library and pass
◦ For the instrumentation step: the call site ID
◦ For SC computation:
    the container object
    the element object
    the number of elements in the container before the operation is performed
    the operation type
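To give a feel for what a glue method might look like, here is a hypothetical sketch for `java.util.Map`. Nothing in it is the tool's actual API: `GlueSketch`, `Profiler`, `HashMapGlue` and the event format are invented for illustration, and the "profiling library" here only appends a string to a list. For simplicity, every `put` is recorded as an ADD, even when it replaces an existing key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class GlueSketch {
    enum Op { ADD, GET, REMOVE }

    // Hypothetical stand-in for the profiling library: it only records events.
    static final class Profiler {
        static final List<String> events = new ArrayList<>();

        static void record(int callSiteId, Object container, Object element,
                           int sizeBefore, Op op) {
            // A real profiler would keep references or tags for both objects.
            events.add(callSiteId + ":" + op + ":" + sizeBefore);
        }
    }

    // Hypothetical glue class: wraps container calls and reports them.
    static final class HashMapGlue {
        static <K, V> V put(Map<K, V> map, K key, V value, int callSiteId) {
            int sizeBefore = map.size();
            V old = map.put(key, value);
            Profiler.record(callSiteId, map, value, sizeBefore, Op.ADD);
            return old;
        }

        static <K, V> V get(Map<K, V> map, Object key, int callSiteId) {
            int sizeBefore = map.size();
            V value = map.get(key);
            Profiler.record(callSiteId, map, value, sizeBefore, Op.GET);
            return value;
        }

        static <K, V> V remove(Map<K, V> map, Object key, int callSiteId) {
            int sizeBefore = map.size();
            V removed = map.remove(key);
            Profiler.record(callSiteId, map, removed, sizeBefore, Op.REMOVE);
            return removed;
        }
    }

    public static void main(String[] args) {
        Map<Integer, String> orders = new HashMap<>();
        HashMapGlue.put(orders, 1, "first order", 101);
        HashMapGlue.get(orders, 1, 102);
        System.out.println(Profiler.events);
    }
}
```

In an instrumented program, the original `map.put(...)` call site would be rewritten to go through such a glue method with its own call site ID.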

Page 15

The Soot analysis framework is used.

Calls to the corresponding glue methods are inserted before and/or after each call site.

Code is inserted after a container is allocated in order to track its allocation time.

Escape analysis: thread-local and method-local containers are not instrumented, since their lifetime is limited to their allocating methods.

Page 16

Perform profiling with JVMTI
◦ Data for MC values
◦ Data for SC values

How does JVMTI help with profiling?
◦ It activates an object graph traversal thread.
◦ It calculates the deallocation time of a tagged container.
◦ It activates a dumping thread, so that profiling data does not pile up in memory and hurt performance.

Page 17

When we reach the end of the leak region, the tool starts offline analysis to

◦ Determine leaking region

◦ Approximate the memory time graph and MC value

◦ Compute SC

Page 18

For the elements in a container, the tool calculates the average staleness for each call site.

The tool reports to programmers (testers)
◦ potentially leaking containers, sorted by LC value
◦ potentially leaking call sites, sorted by average staleness within each container
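The report ordering can be pictured as a plain sort by score. In the sketch below, the `Row` type, the names and the scores are all made up for illustration, and descending order (most suspicious first) is an assumption; the slides only say that the lists are sorted.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class LeakReportSketch {
    // One report row: a container or call site and its score.
    static final class Row {
        final String name;
        final double score; // LC for containers, average staleness for call sites

        Row(String name, double score) {
            this.name = name;
            this.score = score;
        }
    }

    // Returns a sorted copy, highest score first (assumed ordering).
    static List<Row> mostSuspiciousFirst(List<Row> rows) {
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparingDouble((Row r) -> r.score).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<Row> containers = new ArrayList<>();
        containers.add(new Row("containerA", 0.12)); // made-up scores
        containers.add(new Row("containerB", 0.87));
        containers.add(new Row("containerC", 0.45));
        for (Row r : mostSuspiciousFirst(containers)) {
            System.out.println(r.name + " LC=" + r.score);
        }
    }
}
```

The same helper would apply to the call sites of one container, with the average staleness as the score.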

Page 19

Hardware platform: 2.4 GHz dual-core, 2 GB RAM

Three memory leak bugs
◦ Two are from the Sun Bug Repository
◦ One is from SPECjbb

Method
◦ Check how successfully the tool can locate a memory leak bug at three different sampling/dumping rates (1/15gc, 1/50gc, 1/85gc)
◦ Check overhead and performance by measuring the instrumentation overhead, the runtime with different heap sizes at different sampling rates, and the overhead of using the tool.

What do they want to show here?
◦ The tool achieves high precision in reporting the causes of memory leak bugs, with overhead that is acceptable for practical use!

Page 20

1. Enough information for programmers to locate bugs.

2. Sampling rate: 1/15gc and 1/50gc are better than 1/85gc. 1/50gc gives the best tradeoff between performance and precision.

JDK bug #6209673

JDK bug #6559589

Image = (VolatileImage)volatileMap.get(config)

addElement(weakWindow)

Page 21

1. Requires a user-defined container glue class. Without it, the tool couldn’t locate the memory leak bug.

2. After modeling, it found the correct location of the memory leak bug.

3. 1/50gc showed the best performance and precision.

orderTable.put(anOrder.getId(), anOrder)

Page 22

Static overhead
- # of call sites instrumented
- Static overhead of tools

Dynamic overhead
- # of gc-events and runtime with the default vs. a large heap size, at two different sampling rates

Overhead of using this tool

1. Applying escape analysis reduces the number of instrumented call sites
2. At the same sampling rate, a larger initial heap size gives a shorter running time
3. Decreasing the sampling rate reduces the runtime overhead

Page 23

How do they differ from existing dynamic methods?
◦ Instead of focusing on arbitrary objects, they focus only on containers, the main contributors to memory leak problems.
◦ They consider the combination of MC and SC, not a single metric.
◦ They locate a bug more precisely.
◦ Programmers or testers only need to learn how to add a glue class and can then use the tool easily, instead of learning how to interpret complex output from existing tools.

Contributions
◦ Contribution 1: Computing a Confidence Value
◦ Contribution 2: Java Memory Leak Detection
◦ Contribution 3: Implementation
◦ Contribution 4: Empirical Evaluation

Page 24

How about overhead?
◦ Optimization is needed to reduce overhead
◦ Using JikesRVM to avoid JVMTI

Automated tool
◦ Automate the mapping from container methods to the ADT operations

Alternative definitions of LC for more precise information

More context information about containers and call sites that can be useful for programmers.