Java GC - pause tuning

Posted on 20-Jan-2015

DESCRIPTION

English version of the presentation we gave at Devoxx FR 2012. In-depth analysis of how the Java garbage collector works and how to minimise pauses in your application.

TRANSCRIPT

Everything you ever wanted to know about GC pauses*

* but were afraid to ask

Death by pauses

Tuesday, July 10, 12

Agenda

1. Introduction

2. Crime Scene Investigation

3. JVM Memory management systems and tools

4. Putting it together

The Crime Scene: PG 13*

* Parents strongly cautioned: typed language, dead objects and verbose logs may not be suitable for scripting language fans

B2C e-commerce platform (Apache, Tomcat, Oracle)

• 12+ servers
• 10 different webapps
• 50+ JVMs (Oracle JDK6)
• > 30000 sessions
• 250-400 req/s
• Variance is high

... an unusual victim...

• Product catalog modeled as a graph
• 100% custom implementation
• 100% on-heap (no SQL except for initial load)
• In-place update by AtomicReference.set()
• Caching aggressively is not possible
• Large number of request-scoped objects
• Many WS into backoffice systems = latency
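As an illustration of the "in-place update by AtomicReference.set()" idea, here is a minimal sketch. The `CatalogNode` and `Product` classes are hypothetical names invented for this example; only the use of `AtomicReference.set()` comes from the slide.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical immutable payload: an update replaces the whole object.
final class Product {
    final String sku;
    final long priceCents;

    Product(String sku, long priceCents) {
        this.sku = sku;
        this.priceCents = priceCents;
    }
}

// Hypothetical graph node: readers never lock, writers swap the reference.
final class CatalogNode {
    private final AtomicReference<Product> current;

    CatalogNode(Product initial) {
        this.current = new AtomicReference<Product>(initial);
    }

    Product get() {
        return current.get();
    }

    // In-place update: the previous Product becomes garbage
    // as soon as no in-flight request still references it.
    void update(Product replacement) {
        current.set(replacement);
    }
}

class CatalogDemo {
    public static void main(String[] args) {
        CatalogNode node = new CatalogNode(new Product("A-1", 1999));
        node.update(new Product("A-1", 1799));
        System.out.println(node.get().priceCents);
    }
}
```

In a design like this, every update of a long-lived node tends to leave a dead object behind in the Old generation, which is one reason such a catalog puts pressure on the collector.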

Throughput vs. Latency

Interactive e-commerce app: low latency is the top priority!

The Crime Scene

[Four charts over time: JDBC connections, requests/s, active threads, HTTP executor queue size]

The evidence

[Chart: heap size in MB over 1 hour]

Can’t see anything: let’s zoom out!

[Chart: heap size in MB over 24 hours]

[Charts over 1 hour: time spent in GC (%), scale 0-100, and heap size in MB]

The usual suspects...

• OutOfMemory Heap
• OutOfMemory PermGen
• Long GC pauses

➡ under high load = immediate death

Death by pauses

Why do we need this GC thing again?

“Many concurrent algorithms are very easy to write with a GC and totally hard (to down right impossible) using explicit free.”

Cliff Click

Fine, we just need to tune the JVM, right?...

POP QUIZ! Number of command-line flags*?

• less than 100 flags
• 100 <= X < 200
• 200 <= X < 300
• 300 <= X < 400
• 400 <= X < 500
• 500 <= X < 600
• 600 <= X < 700

664 flags!

* Oracle JVM 1.6.0_31 x86_64 server

Memory in the JVM

Heap: application objects
• Young / New: Eden, S0, S1
• Old / Tenured

Permanent (PermGen): class metadata, interned Strings, etc.
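The layout above can be observed on a running JVM through the standard `java.lang.management` API. The sketch below simply lists the memory pools; the `MemoryPools` class name is made up for this example, and the pool names it prints depend on the JVM version and on the selected collector (for instance, PermGen no longer exists from Java 8 on, where Metaspace replaces it).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

class MemoryPools {
    // One line per memory pool: name, type (heap or non-heap), used and max size.
    static List<String> describe() {
        List<String> lines = new ArrayList<String>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            long usedMb = usage.getUsed() / (1024 * 1024);
            long max = usage.getMax(); // -1 when the maximum is undefined
            String maxText = max < 0 ? "n/a" : (max / (1024 * 1024)) + " MB";
            lines.add(pool.getName() + " [" + pool.getType() + "] used="
                    + usedMb + " MB, max=" + maxText);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describe()) {
            System.out.println(line);
        }
    }
}
```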

The Garbage Collector is generational

[Animation over Eden, Survivor 0, Survivor 1 and Old]

• Allocation: new objects are created in Eden
• Eden 100% full = GC!
• Objects are either live or unreferenced
• Copy: live objects are copied to a Survivor space
• Reset: Eden is emptied
• Allocation again, Eden 100% full = GC!
• Copy: live objects from Eden and from the used Survivor are copied to the other Survivor
• Reset: survivors are now Generation 1, Generation 2, ...
• Allocation, 100% = GC!, Copy
• Promotion: objects that survived enough collections move to Old
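The cycle animated on these slides (allocation, Eden full, copy, reset, promotion) can be mimicked with a toy model. This is only a sketch under strong assumptions: `YoungGenToy` is a made-up class, liveness is a flag set by hand instead of being discovered by tracing from roots, capacities are counted in objects, and Survivor overflow is ignored. It is not how HotSpot is implemented.

```java
import java.util.ArrayList;
import java.util.List;

class YoungGenToy {
    static final class Obj {
        int age;             // number of young collections survived
        boolean live = true; // set to false by the caller to simulate "unreferenced"
    }

    final int edenCapacity;
    final int tenuringThreshold;
    final List<Obj> eden = new ArrayList<Obj>();
    List<Obj> survivor = new ArrayList<Obj>(); // the Survivor space currently in use
    final List<Obj> old = new ArrayList<Obj>();
    int youngCollections;

    YoungGenToy(int edenCapacity, int tenuringThreshold) {
        this.edenCapacity = edenCapacity;
        this.tenuringThreshold = tenuringThreshold;
    }

    Obj allocate() {
        if (eden.size() == edenCapacity) {
            youngCollect(); // Eden 100% full = GC
        }
        Obj o = new Obj();
        eden.add(o);
        return o;
    }

    void youngCollect() {
        youngCollections++;
        List<Obj> target = new ArrayList<Obj>(); // the empty Survivor space
        copyLive(eden, target);
        copyLive(survivor, target);
        eden.clear();      // reset
        survivor = target; // the two Survivor spaces swap roles
    }

    private void copyLive(List<Obj> from, List<Obj> to) {
        for (Obj o : from) {
            if (!o.live) {
                continue; // unreferenced objects are simply not copied
            }
            o.age++;
            if (o.age >= tenuringThreshold) {
                old.add(o); // promotion
            } else {
                to.add(o);  // copy
            }
        }
    }

    public static void main(String[] args) {
        YoungGenToy heap = new YoungGenToy(4, 2);
        heap.allocate(); // one long-lived object
        for (int i = 0; i < 8; i++) {
            heap.allocate().live = false; // short-lived garbage
        }
        System.out.println("young GCs=" + heap.youngCollections + ", old=" + heap.old.size());
    }
}
```

Even in this simplified form, the model shows why short-lived objects cost almost nothing to collect: only the survivors are touched.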

Old

[Animation: the Old generation fills up]

• “Almost full” = GC!
• Compaction (optional)

Garbage Collectors

• Generational
• Stop the world!
• Throughput or Concurrent

GC characteristics

Old \ Young    Serial        Parallel
Serial         Serial        Parallel
Parallel       N/A           ParallelOld
Concurrent     CMS Serial    CMS

The Parallel implementations actually differ for each variant

CMS is the right choice

[Bar chart: average test duration (s), scale 0-1000, for Serial, Parallel, ParallelOld, CMS and CMS Serial; values shown: 937, 871, 846, 852, 917]

Tools: CLI

jps, jhat, jmap, jstack, jstat

$ jstat -gcutil PID
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
  0.00  40.88  58.41  18.34  66.65   2729  316.538    46    6.820  323.358
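The `jstat -gcutil` sample above already allows some arithmetic: dividing the accumulated young GC time (YGCT) by the young GC count (YGC) gives an average young pause. The sketch below does that for the line shown on the slide. The `JstatGcutil` class is made up for this example and assumes the JDK 6 column layout shown above; later JDKs print different columns.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class JstatGcutil {
    // Column order of `jstat -gcutil` as shown on the slide (JDK 6 layout).
    static final String[] COLUMNS =
            {"S0", "S1", "E", "O", "P", "YGC", "YGCT", "FGC", "FGCT", "GCT"};

    static Map<String, Double> parse(String dataLine) {
        String[] fields = dataLine.trim().split("\\s+");
        if (fields.length != COLUMNS.length) {
            throw new IllegalArgumentException("expected " + COLUMNS.length + " columns");
        }
        Map<String, Double> values = new LinkedHashMap<String, Double>();
        for (int i = 0; i < COLUMNS.length; i++) {
            values.put(COLUMNS[i], Double.parseDouble(fields[i]));
        }
        return values;
    }

    // Average duration of one collection, in seconds.
    static double averageSeconds(double totalSeconds, double count) {
        return count == 0 ? 0.0 : totalSeconds / count;
    }

    public static void main(String[] args) {
        Map<String, Double> v =
                parse("0.00 40.88 58.41 18.34 66.65 2729 316.538 46 6.820 323.358");
        System.out.println("avg young GC (s): " + averageSeconds(v.get("YGCT"), v.get("YGC")));
        System.out.println("avg full GC (s):  " + averageSeconds(v.get("FGCT"), v.get("FGC")));
    }
}
```

For the sample line this gives roughly 0.116 s per young collection. An average hides the outliers, which is why the GC log is still needed to find the long pauses.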

Tools: GUIs

Tools: GUIs (2)

• Any profiler
• During development
• For autopsies! HeapDumpOnOutOfMemoryError, HeapDumpPath

verbose:gc

[Annotated GC log excerpt]

Stop the world!

MBeans

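The same counters are available in-process through the platform MBeans. The following sketch reads the standard `GarbageCollectorMXBean`s and derives a rough "time spent in GC" percentage; the `GcMBeans` class name is made up, and depending on the collector the accumulated time is not necessarily pure stop-the-world time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcMBeans {
    // Accumulated collection time in ms over all collectors; undefined (-1) values are skipped.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // Share of the JVM uptime reported as GC time, in percent.
    static double gcTimePercent() {
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptimeMillis <= 0 ? 0.0 : 100.0 * totalGcMillis() / uptimeMillis;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + " ms");
        }
        System.out.println("GC time: " + gcTimePercent() + " % of uptime");
    }
}
```

The collector names printed by `main` also reveal which Young/Old pairing the JVM is actually running with.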

OK, so we can measure... temperature!

But... a single temperature measurement is not enough to diagnose anything!

We must archive all measurements to know the baseline!

Credit: http://www.lhup.edu/mkhalequ/fieldtrip/geos253.htm

Therefore we must persist all measurements!

• JMX + jmxtrans

• RRD

• Graphite

• etc.

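As one possible way to persist measurements, a value can be pushed to Graphite using its plaintext protocol (one line per data point: path, value, Unix timestamp in seconds). This is a minimal sketch: the `GraphiteLine` class, the metric path and the host name are hypothetical, and port 2003 is Graphite's usual plaintext port, which may differ in a given installation.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;

class GraphiteLine {
    // Graphite plaintext protocol: "<metric path> <value> <unix timestamp in seconds>\n"
    static String format(String path, long value, long epochSeconds) {
        return path + " " + value + " " + epochSeconds + "\n";
    }

    // Sends one already formatted line; host and port are supplied by the caller.
    static void send(String host, int port, String line) throws IOException {
        Socket socket = new Socket(host, port);
        try {
            OutputStream out = socket.getOutputStream();
            out.write(line.getBytes("US-ASCII"));
            out.flush();
        } finally {
            socket.close();
        }
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis() / 1000;
        // Hypothetical metric name; printed instead of sent so the demo needs no server.
        System.out.print(format("jvm.app01.gc.young.millis", 116, now));
        // send("graphite.example.org", 2003, ...) would push it to a real server.
    }
}
```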

Operating the (many) switches only makes sense...

Credit: http://www.our-energy.com

...if we can measure/compare the effects!

[Chart: cputime, before vs. after]

Putting it together

We want to minimize the GC pauses

• Young (ParNew)
• Old (CMS-initial-mark + CMS-remark)

JVM vs. Tomcat vs. Application (code)

1. Code

• Tuning the JVM cannot compensate for bad code
• Rules of thumb
• Immutability = object reuse = fewer allocations *
• Move code invariants out of tight loops
• Know the characteristics of your data structures & frameworks (java.util, Guava, Hibernate, etc.)
• Mind the gap: data structure overhead can kill you!

* But... pooling can be counter-productive!
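A small illustration of the "move invariants out of tight loops" rule, chosen for this write-up rather than taken from the talk: compiling a regular expression inside a loop allocates a new `Pattern` on every iteration, while hoisting it allocates one. The `LoopInvariant` class and its method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class LoopInvariant {
    // Allocates a new Pattern (and its internal structures) on every iteration.
    static int countMatchesSlow(List<String> skus, String regex) {
        int count = 0;
        for (String sku : skus) {
            if (Pattern.compile(regex).matcher(sku).matches()) {
                count++;
            }
        }
        return count;
    }

    // Same result: the invariant Pattern is built once, outside the loop.
    static int countMatchesFast(List<String> skus, String regex) {
        Pattern pattern = Pattern.compile(regex);
        int count = 0;
        for (String sku : skus) {
            if (pattern.matcher(sku).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> skus = Arrays.asList("A-1", "B-2", "A-3");
        System.out.println(countMatchesFast(skus, "A-\\d"));
    }
}
```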

Example: HashMap

[Diagram: a HashMap referencing an Entry[16] table, one Entry, its key and its value]

• HashMap: 48 bytes
• Entry[16]: 80 bytes
• Entry: 32 bytes

Overhead = 160 bytes!

• SingletonMap (40 bytes)
• initialCapacity + loadFactor
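The two alternatives named on the slide can be sketched as follows. `SmallMaps` and its method names are made up for this example; the byte figures on the slide depend on the JVM and its settings, so they are not reproduced in the code.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class SmallMaps {
    // Exactly one entry that never changes: no hash table and no Entry object needed.
    static Map<String, String> single(String key, String value) {
        return Collections.singletonMap(key, value);
    }

    // Known number of entries: size the table up front instead of taking the default of 16.
    static Map<String, String> sized(int expectedEntries) {
        float loadFactor = 0.75f;
        int initialCapacity = (int) Math.ceil(expectedEntries / loadFactor);
        return new HashMap<String, String>(initialCapacity, loadFactor);
    }

    public static void main(String[] args) {
        Map<String, String> one = single("color", "red");
        Map<String, String> few = sized(3);
        few.put("a", "1");
        few.put("b", "2");
        few.put("c", "3");
        System.out.println(one + " " + few.size());
    }
}
```

Note that the map returned by `Collections.singletonMap` is immutable, so it only fits data that is never modified after creation.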

Fewer allocations...

[Chart: Young GCs per second]

... saves CPU

[Chart: CPU load]

2. Tomcat

• Pooling
• JSP tags: enablePooling in web/webdefault.xml
• -Dorg.apache.jasper.runtime.JspFactoryImpl.USE_POOL=false
• Careful with buffers and their reuse
• -Dorg.apache.jasper.runtime.BodyContentImpl.LIMIT_BUFFER=true
• JSP usage is a factor in PermGen requirements
• Test & Measure, always!

Warning: pooling may lead to Old fragmentation!

3. The JVM

[Two charts of heap size over time, annotated “Frequent GC” and “pause > 1s!”]

The heap

• -Xms: start size
• -Xmx: max size

Young vs Old

-XX:NewSize, -XX:MaxNewSize, -XX:NewRatio

Young:
• Objects < RequestScope (objects that do not outlive a request)

Old:
• “Working Set”
• Caches, object pools
• HttpSession, average lifespan objects

First mistake: setting the Young too small

• Young fills up quickly = many Young GCs
• Objects promoted to Tenured too fast = many Old GCs

Second mistake: setting the Young too large

• Young GC pauses increase

Tuning Young

• Default NewRatio=8 with -server on Intel = too small for a webapp with non-trivial load!
• Increase Young slowly and measure the effects!
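To make the NewRatio figure concrete: NewRatio is the Old-to-Young size ratio, so Young gets 1/(NewRatio + 1) of the Young + Old heap. The sketch below only does this arithmetic; `YoungSizing` is a made-up helper, the 2 GB heap is an assumed example value, and the JVM aligns the real sizes.

```java
class YoungSizing {
    // NewRatio = Old / Young, hence Young = heap / (NewRatio + 1).
    static long youngBytes(long heapBytes, int newRatio) {
        return heapBytes / (newRatio + 1);
    }

    public static void main(String[] args) {
        long heap = 2048L * 1024 * 1024; // assumed -Xmx2g, for illustration only
        long mb = 1024 * 1024;
        // NewRatio=8: Young is one ninth of the heap.
        System.out.println("NewRatio=8 -> " + youngBytes(heap, 8) / mb + " MB");
        // NewRatio=2: Young is one third of the heap.
        System.out.println("NewRatio=2 -> " + youngBytes(heap, 2) / mb + " MB");
    }
}
```

With the assumed 2 GB heap, NewRatio=8 leaves roughly 227 MB for Eden and both Survivor spaces together, which shows how quickly a busy webapp can fill it.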

Old: Mind the Gaps (fragmentation)!

JDK6 < u22:

ParNew (promotion failure size = 15595) (promotion failed)

Old generation: ideal shape

[Chart]

Old generation: real life

[Chart]

Old generation: ideal vs. real

[Chart, annotated: rate increases]

Things to watch for

• Traffic/Load variance
• Traffic increases => memory pressure increases
• CMS requires some headroom to operate properly
• Several phases are concurrent, i.e. they run at the same time as new objects are allocated

(concurrent mode failure): 2165740K->1284261K(2228224K), 8.9411250 secs
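The log fragment above can be turned into numbers with a few lines of code. The sketch below assumes exactly the fragment format shown on the slide (`before->after(capacity), secs`); real GC log lines carry more fields and vary with JVM version and flags, and the `ConcurrentModeFailure` class is made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConcurrentModeFailure {
    // Matches "<before>K-><after>K(<capacity>K), <seconds> secs" as shown on the slide.
    private static final Pattern SIZES =
            Pattern.compile("(\\d+)K->(\\d+)K\\((\\d+)K\\), ([0-9.]+) secs");

    final long beforeK;
    final long afterK;
    final long capacityK;
    final double pauseSeconds;

    private ConcurrentModeFailure(long beforeK, long afterK, long capacityK, double pauseSeconds) {
        this.beforeK = beforeK;
        this.afterK = afterK;
        this.capacityK = capacityK;
        this.pauseSeconds = pauseSeconds;
    }

    // Returns null when the line is not a concurrent mode failure in the expected format.
    static ConcurrentModeFailure parse(String line) {
        if (!line.contains("concurrent mode failure")) {
            return null;
        }
        Matcher m = SIZES.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new ConcurrentModeFailure(Long.parseLong(m.group(1)), Long.parseLong(m.group(2)),
                Long.parseLong(m.group(3)), Double.parseDouble(m.group(4)));
    }

    double occupancyBeforePercent() {
        return 100.0 * beforeK / capacityK;
    }

    long freedK() {
        return beforeK - afterK;
    }

    public static void main(String[] args) {
        ConcurrentModeFailure f = parse(
                "(concurrent mode failure): 2165740K->1284261K(2228224K), 8.9411250 secs");
        System.out.println("occupancy before: " + f.occupancyBeforePercent() + " %");
        System.out.println("freed: " + f.freedK() + "K in " + f.pauseSeconds + " s");
    }
}
```

For the slide's line, occupancy was a little over 97% when the failure occurred and the stop-the-world collection took almost 9 seconds.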

Giving CMS some room to operate

• CMSInitiatingOccupancyFraction = 92%: this is the default...
• We really need 75-80%
• UseCMSInitiatingOccupancyOnly to force the JVM to only consider this criterion
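The effect of the occupancy threshold is easy to quantify: it determines how much free space is left in Old when a CMS cycle starts, and the application keeps allocating into that space while the concurrent phases run. A minimal sketch, with a made-up `CmsHeadroom` helper and an assumed Old size of 2,000,000K:

```java
class CmsHeadroom {
    // Free space left in Old, in K, at the moment a CMS cycle is initiated.
    static long headroomK(long oldCapacityK, int initiatingOccupancyPercent) {
        return oldCapacityK * (100 - initiatingOccupancyPercent) / 100;
    }

    public static void main(String[] args) {
        long oldK = 2000000L; // assumed Old generation size, for illustration only
        // Default threshold quoted on the slide.
        System.out.println("92% -> " + headroomK(oldK, 92) + "K free");
        // e.g. -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
        System.out.println("75% -> " + headroomK(oldK, 75) + "K free");
    }
}
```

Under this assumption, lowering the threshold from 92% to 75% roughly triples the room available for promotions during a concurrent cycle (160,000K versus 500,000K).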

CMS initial-mark

[Chart]

CMS initial-mark (cumulative)

• Median: -83%
• Top 99%: -79%

CMS remark

[Chart]

CMS remark (cumulative)

• Top 90%: -56%

But... I still see pauses!

• RMI triggers explicit GC regularly
• Invokes System.gc()
• Explicit GC = Full GC (Serial) = 4-8s stop-the-world pause!
• DisableExplicitGC + CMSClassUnloadingEnabled
• ExplicitGCInvokesConcurrentAndUnloadsClasses

Complete GC comparison

What’s next?

• Survivors tuning (S0 & S1)
• Size, ratio vs. Eden, max generation
• G1
• Principles and operations are radically different!
• Other JVMs: JRockit, Azul, IBM
• Check tuning validity after every code change!
• Measure, measure, measure!

Questions?