Trust but verify
A year with Cassandra and the hunt for native memory JVM leaks
Chris Burroughs
Clearspring
2011-08-22
Chris Burroughs (Clearspring) Trust but verify 2011-08-22 1 / 34
Table of Contents
1 Introduction
2 Cassandra at Clearspring
3 Some Definitions
4 Time-line of the Hunt
5 Conclusions
1 Introduction
Hello!
Chris Burroughs
Active in the Apache Cassandra and (incubating) Kafka communities
A few mostly minor tickets: 1966, 2082, 2551
http://www.meetup.com/Cassandra-DC-Meetup/
We are hiring
http://www.clearspring.com/about/careers
What is this talk about?
Some of what we learned after using Cassandra for a year.
In particular, what we learned while struggling with unbounded resident set size (RES) growth. Most of this applies to any JVM program.
I’ve tried to explain things in the order that makes sense, not chronologically in the order we figured them out. (But feel free to ask questions.)
Disclaimers
I have come out of this with a generally positive view of Cassandra, even though getting there sucked.
This is mostly about what I learned; to the extent that there were “discoveries”, they were made by others.
2 Cassandra at Clearspring
Sharecounter
Capacity planning conundrum: the counter will account for somewhere between 0 and 100% of views within ? days? weeks? months?
Primary considerations: proven, incremental, horizontal scalability; tolerance to individual node failures.
Tangent: Counters
We did not use CASSANDRA-1072 counters.
(We will probably use them in the future, depending on the results of SSTable compression tests.)
3 Some Definitions
JVM: Cocoon
JVM, on heap: the “normal” place for allocation. You can set a max size of n bytes (-Xmx).
  - The max heap size seems to get a reasonable amount of respect from the JVM.
  - But the heap can fragment and take up more than n bytes. This is difficult to detect.
JVM, off heap: Give me some bytes! You can either use DirectByteBuffers yourself, or, more likely, use a library that does (NIO).
JVM, permgen: Classes and stuff like that.
Other: Hotspot is a C++ program. It can use native memory for whatever it needs to do (thread stacks, JIT-compiled code, GC data structures).
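One way to see the JVM's own view of these areas is the standard java.lang.management API. The sketch below (the MemoryAreas class name is ours; Java 7+ syntax assumed) lists every memory pool with its type. Heap pools are the ones -Xmx bounds; non-heap pools include the permanent generation (Metaspace on Java 8 and later) and the JIT code cache. Direct buffers and Hotspot's own native allocations appear in neither list, which is exactly why they are easy to lose track of.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

public class MemoryAreas {
    /** One line per JVM memory pool: type (HEAP or NON_HEAP), name, committed bytes. */
    public static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();  // null if the pool is no longer valid
            long committed = (usage == null) ? -1 : usage.getCommitted();
            // name() yields "HEAP"/"NON_HEAP"; MemoryType.toString() would yield "Heap memory"
            lines.add(pool.getType().name() + " " + pool.getName() + " committed=" + committed);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describePools()) {
            System.out.println(line);
        }
    }
}
```

Pool names differ between garbage collectors and JVM versions, so match on the type rather than on specific names.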
Linux: Harsh Reality
Resident set size: the least bad measure of how much memory a process is using.
mmap(2): resident pages of mmap-ed files are counted as part of your PID’s RSS. This reduces visibility (have fun with pmap and friends), but may be faster.
Linux does not care about your nice heap abstractions; to the kernel, the JVM is just another process.
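To check what the kernel thinks, rather than what the JVM reports, read /proc/&lt;pid&gt;/status. The sketch below (hypothetical ProcStatus class; Linux only) parses its "kB" fields. VmRSS is what top shows as RES. Kernels 4.5 and later also split RSS into RssAnon, RssFile, and RssShmem; RssFile is where resident pages of mmap-ed files land, so it helps separate mmap from anonymous growth. RHEL5-era kernels lack that split, in which case pmap -x is the fallback.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

public class ProcStatus {
    /** Parses the "Name:   1234 kB" lines of /proc/<pid>/status into name -> kB. */
    public static Map<String, Long> parseKb(String statusText) {
        Map<String, Long> kb = new HashMap<>();
        for (String line : statusText.split("\n")) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;
            }
            String[] parts = line.substring(colon + 1).trim().split("\\s+");
            if (parts.length == 2 && parts[1].equals("kB")) {  // skip non-size fields
                kb.put(line.substring(0, colon), Long.parseLong(parts[0]));
            }
        }
        return kb;
    }

    public static void main(String[] args) throws IOException {
        Path status = Paths.get("/proc/self/status");
        if (!Files.exists(status)) {
            System.out.println("no /proc/self/status (not Linux?)");
            return;
        }
        Map<String, Long> kb = parseKb(new String(Files.readAllBytes(status), StandardCharsets.UTF_8));
        // VmRSS is top's RES. RssAnon/RssFile exist only on kernels 4.5+ (null otherwise);
        // RssFile covers resident file-backed pages, including mmap-ed data files.
        System.out.println("VmRSS=" + kb.get("VmRSS") + " kB, RssAnon=" + kb.get("RssAnon")
                + " kB, RssFile=" + kb.get("RssFile") + " kB");
    }
}
```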
Wizard is Out Of Mana
JVM: OutOfMemoryError → nice log messages with a clue to what happened.
Linux: the kernel needs more memory → the OOM killer kills processes until it’s satisfied, and the victim gets no chance to log anything. Check dmesg.
4 Time-line of the Hunt
First test stack failure
A node in the test stack dies on 2010-10-10 at 3:15pm.
Around this time there was a large and unexplained increase in CPU utilization.
$ dmesg | grep -i oom
syslogd invoked oom-killer: gfp_mask=0x200d2, order=0, oomkilladj=0
java invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0
That was weird; decrease the max heap size and forget about it.
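The dmesg | grep -i oom check can also be automated, for example in a monitoring agent. This sketch (hypothetical OomKillerLines class) extracts which process invoked the OOM killer and which process was actually killed. The two can differ: above, syslogd triggered one run, while the victim is whichever process the kernel's heuristics score highest, usually the one with the largest memory footprint, which on a Cassandra box is the JVM. The exact wording of the "Killed process" line varies between kernel versions, so treat that pattern as an assumption to adjust.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OomKillerLines {
    // "java invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0"
    private static final Pattern INVOKER = Pattern.compile("(\\S+) invoked oom-killer:");
    // e.g. "Killed process 1234 (java)"; surrounding text differs by kernel version
    private static final Pattern VICTIM = Pattern.compile("Killed process (\\d+) \\(([^)]+)\\)");

    /** Names of processes whose allocation triggered the OOM killer, in log order. */
    public static List<String> invokers(List<String> kernelLog) {
        return collect(kernelLog, INVOKER, 1);
    }

    /** Names of processes the OOM killer actually killed, in log order. */
    public static List<String> victims(List<String> kernelLog) {
        return collect(kernelLog, VICTIM, 2);
    }

    private static List<String> collect(List<String> lines, Pattern pattern, int group) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            Matcher m = pattern.matcher(line);
            if (m.find()) {
                out.add(m.group(group));
            }
        }
        return out;
    }
}
```

Feeding it the lines of dmesg output and alerting on any "java" victim turns a silent kill into a page.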
About a month later . . .
All production servers die within an hour or so of each other.
On failures
We often model as if failures are uncorrelated.
This isn’t really true for hardware (e.g. disks of the same model), but it definitely is not true for software.
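To make the difference concrete, suppose (purely illustrative numbers) each of n replicas fails within some window with probability p. If failures are independent, losing all of them takes n separate failures; a software bug that every node shares, triggered by the same uptime and load, behaves more like a single event:

```latex
P_{\text{independent}}(\text{all } n \text{ fail}) = p^{n},
\qquad
P_{\text{fully correlated}}(\text{all } n \text{ fail}) = p
```

With p = 0.01 and n = 3, that is 10^{-6} versus 10^{-2}: replication buys almost nothing against a bug that every node runs.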
Monitoring!
We get a graph like this:
More Monitoring!
Start rolling restarts every few weeks.
January: reduced cached mem; resident set size growth
Armed with graphs we started posting on cassandra-users.
1 Hotspot version (we upgraded, no difference)
2 permgen? (nope, checked that)
3 mmap? (nope, disabled that a long time ago)
4 swap? (Not currently swapping)
5 Heap Fragmentation (Well that’s interesting, have fun with jemalloc)
This smelled like a JVM/glibc/kernel bug, but we are faced with the fact that it only occurs when we are running Cassandra.
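A check that narrows this kind of hunt: compare RSS with everything the JVM admits to having committed. The sketch below (hypothetical RssGap class; Linux and Java 7+ assumed) subtracts committed heap, committed non-heap (permgen or Metaspace, code cache), and the direct buffer pool from VmRSS. Thread stacks, GC bookkeeping, and malloc arenas are not in any JVM counter, so some gap is normal; the signal is a gap that keeps growing while heap usage stays flat.

```java
import java.io.IOException;
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RssGap {
    private static final Pattern VM_RSS = Pattern.compile("VmRSS:\\s+(\\d+) kB");

    /** VmRSS in bytes from /proc/<pid>/status text, or -1 if the field is absent. */
    public static long vmRssBytes(String statusText) {
        Matcher m = VM_RSS.matcher(statusText);
        return m.find() ? Long.parseLong(m.group(1)) * 1024L : -1L;
    }

    /** Committed heap + committed non-heap + memory held by direct ByteBuffers. */
    public static long jvmAccountedBytes() {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        long total = mem.getHeapMemoryUsage().getCommitted()
                + mem.getNonHeapMemoryUsage().getCommitted();
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if (pool.getName().equals("direct")) {
                total += pool.getMemoryUsed();
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path status = Paths.get("/proc/self/status");
        if (!Files.exists(status)) {
            System.out.println("no /proc/self/status (not Linux?)");
            return;
        }
        long rss = vmRssBytes(new String(Files.readAllBytes(status), StandardCharsets.UTF_8));
        long accounted = jvmAccountedBytes();
        // Committed but never-touched heap is not resident, so the gap can start out negative.
        System.out.printf("RSS=%d MiB accounted=%d MiB gap=%d MiB%n",
                rss / (1024 * 1024), accounted / (1024 * 1024), (rss - accounted) / (1024 * 1024));
    }
}
```

Graphing the gap over days, next to heap usage, separates "the heap grew" from "something outside the heap grew".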
Tangent: Rolling restarts and caches
Refresher on caches:
key cache: Caches the on-disk (SSTable index) position of keys
row cache: Caches entire rows
Also, the OS page cache
Cassandra can persist the entire key cache, and can persist the row keys for the row cache, but not the rows themselves.
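A toy model of that asymmetry (hypothetical KeyOnlyCache class, not Cassandra's code): only the keys are written out, so a restarted node has to re-read every saved row from disk before the cache is warm again. The cost of a restart therefore grows with the number of saved keys, not with how cheap they were to save.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Toy row cache whose keys, but not values, survive a restart. */
public class KeyOnlyCache {
    private final Map<String, String> rows = new LinkedHashMap<>();
    private final Function<String, String> readRowFromDisk;  // stands in for an SSTable read
    private int diskReads = 0;

    public KeyOnlyCache(Function<String, String> readRowFromDisk) {
        this.readRowFromDisk = readRowFromDisk;
    }

    public String get(String key) {
        String row = rows.get(key);
        if (row == null) {  // miss: go to disk and cache the result
            row = readRowFromDisk.apply(key);
            diskReads++;
            rows.put(key, row);
        }
        return row;
    }

    /** What is saved at shutdown: the keys only. */
    public List<String> savedKeys() {
        return new ArrayList<>(rows.keySet());
    }

    /** At startup, every saved key costs one disk read before the cache is warm. */
    public void warm(List<String> keys) {
        for (String key : keys) {
            get(key);
        }
    }

    public int diskReads() {
        return diskReads;
    }
}
```

In a real node the reads in warm() compete with live traffic, which is what makes a large saved key set expensive.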
Tangent: Rolling restarts and caches, shooting ourselves in the foot
Before cache saving:
Size the row cache to get the best hit rate vs. heap size trade-off.
Restart node.
Node can’t handle reads and drops messages for a while. Not safe to restart another one until it stops.
After:
Size the row cache to get the best hit rate vs. heap size trade-off.
Persist row cache keys.
Restart node.
Wait half an hour for all rows to be read; the node now has a pile of hinted handoffs to deal with.
Tangent: Rolling restarts and caches, right answer
Options:
1 Something hacky to save row values along with row keys and be inconsistent.
2 Something hacky to save a random set of row keys and hope that helps.
3 Modify CLHM (ConcurrentLinkedHashMap, which backs Cassandra’s caches) to allow traversal in hotness order.
4 Recognize that this is a sign you need more capacity.
Tangent: Rolling restarts and caches, CASSANDRA-1966
Ben Manes (author of CLHM), via a Google Alert:
This example it would be a fair usage and justification of ordered iteration. Its a trivial change, but its an enhancement I’ve avoided eagerly performing until a project considers it a worthwhile feature.
Cassandra 1.0 will have a “row cache keys to save” option.
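Ordered iteration is what makes a "keys to save" limit useful: you want the N hottest keys, not N arbitrary ones. The JDK's LinkedHashMap already supports access order, so a single-threaded sketch looks like the one below (hypothetical HotKeysLru class; this is not CLHM, which is built for concurrent access).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Bounded LRU cache that can list its keys from hottest to coldest. */
public class HotKeysLru<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public HotKeysLru(int capacity) {
        super(16, 0.75f, true);  // accessOrder=true: iteration runs least- to most-recently used
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;  // evict the least-recently-used entry
    }

    /** Up to n keys, most recently used first: the ones worth saving before a restart. */
    public List<K> hottestKeys(int n) {
        List<K> coldToHot = new ArrayList<>(keySet());
        List<K> hottest = new ArrayList<>();
        for (int i = coldToHot.size() - 1; i >= 0 && hottest.size() < n; i--) {
            hottest.add(coldToHot.get(i));
        }
        return hottest;
    }
}
```

Saving only hottestKeys(n) bounds the warm-up reads after a restart while keeping the entries most likely to be hit.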
CASSANDRA-2654
CASSANDRA-2654: Work around native heap leak in sun.nio.ch.Util affecting IncomingTcpConnection
Java bug #6210541
Deep in the bowels of Java NIO is a weak references cache to direct byte buffers.
That’s a painfully broken design.
CASSANDRA-2654 works around it. But this isn’t really a “leak”,since eventually a full GC should clean them up.
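The mechanism, as implemented in the JDK's sun.nio.ch internals: a heap ByteBuffer written to (or read from) a socket or file channel is first copied through a temporary direct buffer at least as large as the request, and that buffer is cached per thread. One generic way to bound it, sketched below under the assumption that your code controls the write loop (hypothetical ChunkedWrites class, not the CASSANDRA-2654 patch), is to hand the channel small slices. Later JDKs also added a jdk.nio.maxCachedBufferSize system property that caps the size of buffers kept in that cache.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

public class ChunkedWrites {
    // Hypothetical cap: each channel write then needs a temporary buffer of at most 64 KiB.
    static final int MAX_CHUNK = 64 * 1024;

    /** Writes all remaining bytes of src, at most MAX_CHUNK bytes per channel write. */
    public static void writeFully(WritableByteChannel ch, ByteBuffer src) throws IOException {
        while (src.hasRemaining()) {
            ByteBuffer slice = src.duplicate();  // shares content, independent position/limit
            slice.limit(slice.position() + Math.min(MAX_CHUNK, slice.remaining()));
            int written = ch.write(slice);  // assumes a blocking channel, so written > 0
            src.position(src.position() + written);
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        ByteBuffer message = ByteBuffer.allocate(10 * 1024 * 1024);  // 10 MiB heap buffer
        writeFully(Channels.newChannel(sink), message);
        System.out.println("wrote " + sink.size() + " bytes");
    }
}
```

The ByteArrayOutputStream channel in main is only a stand-in to exercise the loop; it does not go through the temporary-buffer cache, which applies to channels such as SocketChannel and FileChannel.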
More attempts
Tried to audit the use of DirectByteBuffers
  - using -XX:MaxDirectMemorySize
Opened a ticket with Oracle.
  - Has not gone anywhere yet.
Survey on the user list
  - No pattern among kernel, OS, Hotspot, or other software versions.
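On Java 7 and later this audit can lean on the standard BufferPoolMXBean, which reports the live count and total capacity of direct (and mapped) buffers. A minimal sketch (hypothetical DirectBufferAudit class) that could be logged periodically is below. Note that -XX:MaxDirectMemorySize only limits memory reserved through direct ByteBuffers; native memory allocated by Hotspot itself or by JNI code is outside its reach.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class DirectBufferAudit {
    /** One line per NIO buffer pool ("direct", "mapped", ...): count, capacity, memory used. */
    public static List<String> snapshot() {
        List<String> lines = new ArrayList<>();
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            lines.add(String.format("%s: count=%d capacity=%d used=%d",
                    pool.getName(), pool.getCount(), pool.getTotalCapacity(), pool.getMemoryUsed()));
        }
        return lines;
    }

    /** Number of direct buffers not yet freed, or -1 if the pool is not exposed. */
    public static long directCount() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if (pool.getName().equals("direct")) {
                return pool.getCount();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        ByteBuffer held = ByteBuffer.allocateDirect(1 << 20);  // 1 MiB outside the Java heap
        // With -XX:MaxDirectMemorySize set, exceeding the limit in allocateDirect throws
        // OutOfMemoryError inside the JVM instead of growing RSS unnoticed.
        for (String line : snapshot()) {
            System.out.println(line);
        }
        System.out.println("holding " + held.capacity() + " bytes");
    }
}
```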
Hark, a Tweet!
http://twitter.com/#!/kimchy/status/90861039930970113
Java Bug 7066129
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

public class TestMemoryLeak {
    public static void main(String[] args) throws Exception {
        while (true) {
            List<GarbageCollectorMXBean> gcMxBeans = ManagementFactory.getGarbageCollectorMXBeans();
            for (GarbageCollectorMXBean gcMxBean : gcMxBeans) {
                // On affected JVMs each call leaks native memory: RSS grows while the heap stays flat.
                ((com.sun.management.GarbageCollectorMXBean) gcMxBean).getLastGcInfo();
            }
        }
    }
}
CASSANDRA-2868
Several people verified that disabling the GCInspector (which calls GarbageCollectorMXBean#getLastGcInfo) keeps RSS from increasing.
There is a patch that tries to get similar data through another set of methods.
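getLastGcInfo lives in the com.sun.management extension. The standard java.lang.management.GarbageCollectorMXBean also exposes cumulative getCollectionCount() and getCollectionTime(), which are enough to report how many collections ran and how long they took per polling interval. A sketch of that approach (hypothetical GcDeltaPoller class; not necessarily what the CASSANDRA-2868 patch does):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.Map;

/** Polls cumulative GC counters and reports the change since the previous poll. */
public class GcDeltaPoller {
    private final Map<String, long[]> previous = new HashMap<>();

    /** Collector name -> {collections since last poll, milliseconds since last poll}. */
    public Map<String, long[]> poll() {
        Map<String, long[]> deltas = new HashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();  // -1 if the collector does not report it
            long timeMs = gc.getCollectionTime();  // approximate accumulated time in ms
            if (count < 0 || timeMs < 0) {
                continue;
            }
            long[] prev = previous.getOrDefault(gc.getName(), new long[] {0L, 0L});
            deltas.put(gc.getName(), new long[] {count - prev[0], timeMs - prev[1]});
            previous.put(gc.getName(), new long[] {count, timeMs});
        }
        return deltas;
    }

    public static void main(String[] args) throws InterruptedException {
        GcDeltaPoller poller = new GcDeltaPoller();
        for (int i = 0; i < 3; i++) {
            for (Map.Entry<String, long[]> e : poller.poll().entrySet()) {
                System.out.printf("%s: +%d collections, +%d ms%n",
                        e.getKey(), e.getValue()[0], e.getValue()[1]);
            }
            Thread.sleep(1000);
        }
    }
}
```

This loses per-collection detail such as before/after pool sizes, which is the trade-off for staying off getLastGcInfo.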
5 Conclusions
Conclusions
Happy to be spending less time with Cassandra for a while.
There are bugs in Hotspot, your file system, RHEL5, and everything else you think is infallible.
I think page cache management is the open question right now.
Thoughts on Upcoming Cassandra changes
Very excited about the alternative SSTable format in CASSANDRA-674 and friends (type-specific data compression, compressed index, row cache as row+filter, etc.)
Once burned, twice shy: terrified of off-heap data structures, but it looks like we didn’t go down that path after all (CASSANDRA-2252).
Questions?