debugging ruby with mongodb

Post on 27-Apr-2015


Debugging Ruby with MongoDB

Aman Gupta (@tmm1)

Ruby developers know...

Ruby loves eating RAM

ruby allocates memory from the OS

memory is broken up into slots

each slot holds one ruby object

a linked list called the ‘freelist’ points to all the empty slots on the ruby heap

when you need an object, it’s pulled off the freelist

if the freelist is empty, GC is run

GC finds non-reachable objects and adds them to the freelist

if the freelist is still empty (all slots were in use), another heap is allocated

all the slots on the new heap are added to the freelist
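The allocation steps above can be sketched as a toy model in plain Ruby. This is not MRI's actual implementation: `ToyHeap`, `SLOTS_PER_HEAP`, and the `live` list (a stand-in for marking from real roots) are all hypothetical names chosen for illustration, and the model assumes no stored value is `nil`.

```ruby
# Toy model of the slot/freelist scheme (illustration only, not MRI's code).
class ToyHeap
  SLOTS_PER_HEAP = 4 # hypothetical size, kept tiny for the demo

  attr_reader :heaps_allocated, :gc_runs

  def initialize
    @slots = []          # every slot on every heap; nil means empty
    @freelist = []       # indexes of empty slots
    @heaps_allocated = 0
    @gc_runs = 0
    add_heap
  end

  # live: slot indexes the caller still references (stand-in for GC roots)
  def allocate(value, live)
    if @freelist.empty?
      collect(live)                  # freelist empty: run GC
      add_heap if @freelist.empty?   # still empty: allocate another heap
    end
    index = @freelist.pop            # pull a slot off the freelist
    @slots[index] = value
    index
  end

  private

  # Add a new heap and put all of its slots on the freelist.
  def add_heap
    start = @slots.size
    @slots.concat(Array.new(SLOTS_PER_HEAP))
    @freelist.concat((start...start + SLOTS_PER_HEAP).to_a)
    @heaps_allocated += 1
  end

  # Return every occupied slot that is no longer referenced to the freelist.
  def collect(live)
    @gc_runs += 1
    @slots.each_index do |i|
      next if @slots[i].nil? || live.include?(i)
      @slots[i] = nil
      @freelist << i
    end
  end
end

heap = ToyHeap.new
live = []
4.times { |n| live << heap.allocate("obj#{n}", live) } # fills the only heap
live.shift(2)                          # drop two references
live << heap.allocate("obj4", live)    # freelist empty: GC reclaims two slots
live << heap.allocate("obj5", live)    # uses the second reclaimed slot
live << heap.allocate("obj6", live)    # GC frees nothing, so a new heap is added
```

In this run the first GC pass finds two free slots, while the second finds none and forces a second heap, which is the "all slots were in use" case.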

turns out, Ruby’s GC is also one of the reasons it can be so slow

Matz’ Ruby Interpreter (MRI 1.8) has a conservative, stop-the-world, mark-and-sweep garbage collector

• conservative: the VM hands out raw pointers to ruby objects

• stop the world: no ruby code can execute during GC

• mark and sweep: mark all objects in use, sweep away unmarked objects

more objects = longer GC

longer GC = less time to run your ruby code

fewer objects = better performance

improve performance

1. remove unnecessary object allocations

object allocations are not free

2. avoid leaked references

not really memory ‘leaks’: you’re holding a reference to an object you no longer need. GC sees the reference, so it keeps the object around
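For point 1, a rough way to see that allocations add up is to count them. The sketch below assumes a modern MRI (2.2 or later), where `GC.stat(:total_allocated_objects)` is available; it is not something MRI 1.8 offers. The `allocations` helper and the `PIECE` constant are hypothetical names.

```ruby
PIECE = "x".freeze # frozen so the literal itself is not re-allocated each time

# Count how many objects are allocated while the block runs.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# String#+ builds a brand-new String on every iteration.
with_plus = allocations do
  s = String.new
  1000.times { s += PIECE }
end

# String#<< appends in place and reuses the same object.
with_append = allocations do
  s = String.new
  1000.times { s << PIECE }
end

puts "+=: #{with_plus} objects, <<: #{with_append} objects"
```

Both loops build the same string, but the `+=` version leaves about a thousand short-lived objects behind for the GC to find and sweep.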

the GC follows references recursively, so a reference to classA will ‘leak’ all these objects
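A minimal sketch of why one reference keeps a whole graph alive: walk the references from a root and collect everything reachable. The object names in `refs` are made up for illustration, and the walk uses an explicit stack rather than recursion.

```ruby
require 'set'

# Return the ids of every object reachable from root.
# refs maps an object id to the ids it references.
def reachable_from(root, refs)
  seen = Set.new
  stack = [root]
  until stack.empty?
    id = stack.pop
    next unless seen.add?(id)       # add? returns nil if already visited
    stack.concat(refs.fetch(id, []))
  end
  seen
end

refs = {
  "classA"    => ["methods", "constants"],
  "methods"   => ["node1", "node2"],
  "constants" => ["string1"],
  "orphan"    => ["string2"]        # nothing points at this one
}

kept = reachable_from("classA", refs)
puts kept.to_a.sort.inspect
```

Everything in `kept` survives GC as long as something still references `classA`; only `orphan` and `string2` would be swept.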

let’s build a debugger

• step 1: collect data

• list of all ruby objects in memory

• step 2: analyze data

• group by type

• group by file/line

version 1: collect data

• simple patch to ruby VM (300 lines of C)

• http://gist.github.com/73674

• simple text based output format

0x154750 @ -e:1 is OBJECT of type: T
0x15476c @ -e:1 is HASH which has data
0x154788 @ -e:1 is ARRAY of len: 0
0x1547c0 @ -e:1 is STRING (SHARED) len: 2 and val: hi
0x1547dc @ -e:1 is STRING len: 1 and val: T
0x154814 @ -e:1 is CLASS named: T inherits from Object
0x154a98 @ -e:1 is STRING len: 2 and val: hi
0x154b40 @ -e:1 is OBJECT of type: Range

version 1: analyze data

$ wc -l /tmp/ruby.heap

 1571529 /tmp/ruby.heap

$ cat /tmp/ruby.heap | awk '{ print $3 }' | sort | uniq -c | sort -g | tail -1

 236840 memcached/memcached.rb:316

$ grep "memcached.rb:316" /tmp/ruby.heap | awk '{ print $5 }' | sort | uniq -c | sort -g | tail -5

   10948 ARRAY
   20355 OBJECT
   30744 DATA
   64952 HASH
  123290 STRING
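The same counting can be sketched in Ruby instead of awk/sort/uniq. The sample below reuses the eight heap-dump lines shown earlier rather than the real /tmp/ruby.heap; `tally_field` is a hypothetical helper, and the field positions (third field is file:line, fifth is the type) follow the awk commands above.

```ruby
HEAP_DUMP = <<~DUMP
  0x154750 @ -e:1 is OBJECT of type: T
  0x15476c @ -e:1 is HASH which has data
  0x154788 @ -e:1 is ARRAY of len: 0
  0x1547c0 @ -e:1 is STRING (SHARED) len: 2 and val: hi
  0x1547dc @ -e:1 is STRING len: 1 and val: T
  0x154814 @ -e:1 is CLASS named: T inherits from Object
  0x154a98 @ -e:1 is STRING len: 2 and val: hi
  0x154b40 @ -e:1 is OBJECT of type: Range
DUMP

# Count occurrences of one whitespace-separated field (0-based index),
# sorted by count ascending like `sort | uniq -c | sort -g`.
def tally_field(dump, index)
  counts = Hash.new(0)
  dump.each_line do |line|
    field = line.split[index]
    counts[field] += 1 if field
  end
  counts.sort_by { |_, n| n }
end

by_location = tally_field(HEAP_DUMP, 2) # awk '{ print $3 }'
by_type     = tally_field(HEAP_DUMP, 4) # awk '{ print $5 }'

puts by_location.last.inspect
puts by_type.last.inspect
```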

version 1

• it works!

• but...

• must patch and rebuild ruby binary

• no information about references between objects

• limited analysis via shell scripting

version 2 goals

• better data format

• simple: one line of text per object

• expressive: include all details about object contents and references

• easy to use: easy to generate from C code & easy to consume from various scripting languages

version 2 is memprof

• no patches to ruby necessary

• gem install memprof

• require 'memprof'

• Memprof.dump_all("/tmp/app.json")

• C extension for MRI ruby VM

• http://github.com/ice799/memprof

• uses libyajl to dump out all ruby objects as json

strings

Memprof.dump { "hello" + "world" }

{ "_id": "0x19c610",
  "file": "file.rb", "line": 2,
  "type": "string", "class": "0x1ba7f0", "class_name": "String",
  "length": 10, "data": "helloworld" }

memory address of object

file and line where string was created

address of the class “String”

length and contents of this string instance

arrays

Memprof.dump { [ 1, :b, 2.2, "d" ] }

{ "_id": "0x19c5c0",
  "class": "0x1b0d18", "class_name": "Array",
  "length": 4, "data": [ 1, ":b", "0x19c750", "0x19c598" ] }

integers and symbols are stored in the array itself

floats and strings are separate ruby objects

hashes

Memprof.dump { { :a => 1, "b" => 2.2 } }

{ "_id": "0x19c598",
  "type": "hash", "class": "0x1af170", "class_name": "Hash",
  "default": null,
  "length": 2, "data": [ [ ":a", 1 ], [ "0xc728", "0xc750" ] ] }

hash entries as key/value pairs

no default proc

classes

Memprof.dump {
  class Hello
    @@var = 1
    Const = 2
    def world() end
  end
}

{ "_id": "0x19c408",
  "type": "class", "name": "Hello", "super": "0x1bfa48", "super_name": "Object",
  "ivars": { "@@var": 1, "Const": 2 }, "methods": { "world": "0x19c318" } }

class variables and constants are stored in the instance variable table

superclass object reference

references to method objects
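Because the dump is one JSON object per line, it is easy to consume from a script. The sketch below uses only Ruby's standard `json` library on two records copied from the slides. The `references` helper is hypothetical, and it assumes that any string that looks like a hex address is a reference, which would misfire on a string whose contents happen to look like an address.

```ruby
require 'json'

# Two records from the slides, one JSON object per line.
DUMP_LINES = <<~DUMP
  {"_id":"0x19c610","file":"file.rb","line":2,"type":"string","class":"0x1ba7f0","class_name":"String","length":10,"data":"helloworld"}
  {"_id":"0x19c5c0","class":"0x1b0d18","class_name":"Array","length":4,"data":[1,":b","0x19c750","0x19c598"]}
DUMP

objects = DUMP_LINES.each_line.map { |line| JSON.parse(line) }

# Index objects by memory address for quick lookup.
by_id = objects.each_with_object({}) { |obj, index| index[obj["_id"]] = obj }

# Addresses of other objects held in this object's data field.
# Assumption: strings shaped like "0x..." are references, not contents.
def references(obj)
  Array(obj["data"]).flatten.grep(/\A0x\h+\z/)
end

puts by_id["0x19c610"]["data"]
puts references(by_id["0x19c5c0"]).inspect
```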

version 2: memprof.com

a web-based heap visualizer and leak analyzer

built on...

$ mongoimport -d memprof -c rails --file /tmp/app.json

$ mongo memprof

let’s run some queries.

how many objects?

> db.rails.count()
809816

• ruby scripts create a lot of objects

• usually not a problem, but...

• MRI has a naïve stop-the-world mark/sweep GC

• fewer objects = faster GC = better performance

what types of objects?

> db.rails.distinct('type')
['array', 'bignum', 'class', 'float', 'hash', 'module', 'node', 'object', 'regexp', 'string', ...]

mongodb: distinct

• distinct('type')
list of types of objects

• distinct('file')
list of source files

• distinct('class_name')
list of instance class names

• optionally filter first

• distinct('name', {type:"class"})
names of all defined classes

improve performance with indexes

> db.rails.ensureIndex({'type':1})

> db.rails.ensureIndex({'file':1}, {background:true})

mongodb: ensureIndex

• add an index on a field (if it doesn’t exist yet)

• improve performance of queries against common fields: type, class_name, super, file

• can index embedded field names

• ensureIndex('methods.add')

• find({'methods.add':{$exists:true}})
find classes that define the method add

how many objs per type?

> db.rails.group({
    initial: {count: 0},
    key: {type: true},
    cond: {},
    reduce: function(obj, out) { out.count++ }
  }).sort(function(a, b) { return a.count - b.count })

group on type

increment count for each obj

sort results

[ ..., {type: 'array', count: 7621}, {type: 'string', count: 69139}, {type: 'node', count: 365285} ]

lots of nodes

• nodes represent ruby code

• stored like any other ruby object

• makes ruby completely dynamic

mongodb: group

• cond: query to filter objects before grouping

• key: field(s) to group on

• initial: initial values for each group’s results

• reduce: aggregation function

mongodb: group

• by type or class
key: {type:1}
key: {class_name:1}

• by file & line
key: {file:1, line:1}

• by type in a specific file
cond: {file:"app.rb"}, key: {file:1, line:1}

• by length of strings in a specific file
cond: {file:"app.rb", type:'string'}, key: {length:1}
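To make the four group arguments concrete, here is a small in-memory imitation in Ruby. It is only a sketch of the idea, not MongoDB's implementation: the `group` method, the equality-only `cond` matching, and the sample `docs` are all assumptions made for illustration.

```ruby
# In-memory imitation of group: filter with cond, bucket by key,
# seed each bucket with initial, then fold every doc in with reduce.
def group(docs, key:, initial:, cond: {}, &reduce)
  groups = {}
  docs.each do |doc|
    next unless cond.all? { |field, value| doc[field] == value }
    group_key = key.map { |field| [field, doc[field]] }.to_h
    out = (groups[group_key] ||= group_key.merge(initial))
    reduce.call(doc, out)
  end
  groups.values
end

docs = [
  { type: "string", file: "app.rb", length: 2 },
  { type: "string", file: "app.rb", length: 5 },
  { type: "array",  file: "app.rb", length: 0 },
  { type: "string", file: "lib.rb", length: 2 }
]

# how many objs per type, sorted by count
per_type = group(docs, key: [:type], initial: { count: 0 }) { |_doc, out|
  out[:count] += 1
}.sort_by { |g| g[:count] }

# same question, restricted to one file via cond
in_app = group(docs, key: [:type], cond: { file: "app.rb" },
               initial: { count: 0 }) { |_doc, out| out[:count] += 1 }

puts per_type.inspect
puts in_app.inspect
```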

what subclasses String?

> db.rails.find({super_name:"String"}, {name:1})

{name: "ActiveSupport::SafeBuffer"}
{name: "ActiveSupport::StringInquirer"}
{name: "SQLite3::Blob"}
{name: "ActiveModel::Name"}
{name: "Arel::Attribute::Expressions"}
{name: "ActiveSupport::JSON::Variable"}

select only name field

mongodb: find

• find({type:'string'})
all strings

• find({type:{$ne:'string'}})
everything except strings

• find({type:'string'}, {data:1})
only select string’s data field

the largest objects?

> db.rails.find(
    {type: {$in:['string','array','hash']}},
    {type:1, length:1}
  ).sort({length:-1}).limit(3)

{type: "string", length: 2308}
{type: "string", length: 1454}
{type: "string", length: 1238}

mongodb: sort, limit/skip

• sort({length:-1, file:1})
sort by length desc, file asc

• limit(10)
first 10 results

• skip(10).limit(10)
second 10 results

when were objs created?

• useful to look at objects over time

• each obj has a timestamp of when it was created

• find minimum time, call it start_time

• create buckets for every minute of execution since start

• place objects into buckets

> db.rails.mapReduce(function(){
    var secs = this.time - start_time;
    var mins_since_start = Math.floor(secs / 60);
    emit(mins_since_start, 1);
  }, function(key, vals){
    for(var i=0,sum=0; i<vals.length; sum += vals[i++]);
    return sum;
  }, {
    scope: { start_time: db.rails.find().sort({time:1}).limit(1)[0].time }
  })
{result:"tmp.mr_1272615772_3"}

start_time = min(time)

mongodb: mapReduce

• arguments

• map: function that emits one or more key/value pairs for each object (available as this)

• reduce: function to return aggregate result, given key and list of values

• scope: global variables to set for funcs

• results

• stored in a temporary collection (tmp.mr_1272615772_3)

when were objs created?

> db.tmp.mr_1272615772_3.count()
12

script was running for 12 minutes

> db.tmp.mr_1272615772_3.find().sort({value:-1}).limit(1)
{_id: 8, value: 41231}

41k objects created 8 minutes after start
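The bucketing idea (one bucket per minute since the earliest timestamp) can be sketched in a few lines of Ruby. The timestamps below are invented, and the sketch assumes times are in seconds; `objects_per_minute` is a hypothetical helper, not part of memprof or MongoDB.

```ruby
# Bucket creation times (in seconds) by whole minutes since the earliest one.
def objects_per_minute(times)
  start_time = times.min
  times.each_with_object(Hash.new(0)) do |time, buckets|
    minute = ((time - start_time) / 60).floor
    buckets[minute] += 1
  end
end

times = [1000, 1010, 1059, 1060, 1185, 1200] # made-up sample timestamps
buckets = objects_per_minute(times)

# Busiest minute, like sort({value:-1}).limit(1)
busiest_minute, count = buckets.max_by { |_, n| n }

puts buckets.inspect
puts "minute #{busiest_minute}: #{count} objects"
```

Note that the minute is computed by dividing by 60 and rounding down; taking the remainder instead would give the second within the minute, not the minute itself.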


references to this object?

ary = ["a","b","c"]

ary references "a"

"b" referenced by ary

• ruby makes it easy to “leak” references

• an object will stay around until all references to it are gone

• more objects = longer GC = bad performance

• must find references to fix leaks

references to this object?

• db.rails_refs.insert({_id:"0xary", refs:["0xa","0xb","0xc"]})
create references lookup table

• db.rails_refs.ensureIndex({refs:1})
add ‘multikey’ index to refs array

• db.rails_refs.find({refs:"0xa"})
efficiently look up all objs holding a ref to 0xa

mongodb: multikeys

• indexes on array values create a ‘multikey’ index

• classic example: nested array of tags

• find({tags: "ruby"})
find objs where obj.tags includes "ruby"
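What a multikey index on refs buys you is a reverse lookup: from a referenced address to every object holding it. A minimal in-memory sketch of that idea in Ruby follows. `build_referrers` is a hypothetical helper, and the "0xother" entry is made up to show two holders of the same object.

```ruby
# Invert a holder => [referenced ids] table into referenced id => [holders],
# which is the lookup a multikey index on refs makes cheap.
def build_referrers(refs_table)
  referrers = Hash.new { |hash, id| hash[id] = [] }
  refs_table.each do |holder, refs|
    refs.each { |target| referrers[target] << holder }
  end
  referrers
end

refs_table = {
  "0xary"   => ["0xa", "0xb", "0xc"],
  "0xother" => ["0xa"]            # made-up second holder of 0xa
}

referrers = build_referrers(refs_table)

# Equivalent in spirit to db.rails_refs.find({refs:"0xa"})
puts referrers.fetch("0xa", []).inspect
```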


plugging a leak in rails3

• in dev mode, rails3 is leaking 10mb per request

let’s use memprof to find it!

# in environment.rb
require `gem which memprof/signal`.strip

send the app some requests so it leaks

$ ab -c 1 -n 30 http://localhost:3000/

tell memprof to dump out the entire heap to json

$ memprof --pid <pid> --name <dump name> --key <api key>

2519 classes

30 copies of TestController

mongo query for all TestController classes

details for one copy of TestController

find references to object

holding references to all controllers

“leak” is on line 178

• In development mode, Rails reloads all your application code on every request

• ActionView::Partials::PartialRenderer is caching partials used by each controller as an optimization

• But... it ends up holding a reference to every single reloaded version of those controllers
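The shape of this leak can be reproduced in a few lines of plain Ruby. This is only an illustration of the pattern, not the Rails code or the actual fix: `PARTIAL_CACHE`, `reload_controller`, and `handle_request` are hypothetical names, and `Class.new` stands in for reloading application code.

```ruby
PARTIAL_CACHE = {} # hypothetical long-lived cache keyed by controller class

# Stand-in for code reloading: every call yields a brand-new class object.
def reload_controller
  Class.new
end

# Each request reloads the controller and caches a partial for it.
def handle_request(cache)
  controller = reload_controller
  cache[controller] ||= "compiled partial"
  controller
end

30.times { handle_request(PARTIAL_CACHE) }
leaked = PARTIAL_CACHE.size # one entry per reloaded class, none collectable

# One possible remedy (an assumption, not the real patch):
# drop cached entries whenever the code is reloaded.
fixed_cache = {}
30.times do
  fixed_cache.clear
  handle_request(fixed_cache)
end

puts "leaky cache: #{leaked} classes, cleared cache: #{fixed_cache.size}"
```

Because the cache key is the class object itself, every reloaded copy stays reachable through the cache, which matches the 30 copies of TestController seen after 30 requests.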

Questions?

Aman Gupta (@tmm1)
