Python and Ruby VMs
Post on 06-May-2015
Why should you care about Ruby
● Opscode Chef
● Puppet
● VMware Cloud Foundry
● Red Hat OpenShift
● Redmine
Matz's Ruby Implementation (MRI) / Yet another Ruby VM (YARV) outline
● Memory management
  ○ Automatic, full-heap mark-sweep GC
● Execution model
  ○ Bytecode interpretation (stack machine) from 1.9 (YARV)
  ○ Direct AST interpretation before 1.9 (MRI)
● Concurrency
  ○ Multi-threaded, one active interpreter thread at a time
  ○ Green threads before 1.9 (MRI), OS-level threads in 1.9 (YARV)
● Method calls
  ○ Late binding: the method is looked up by name in the class dict
Typical interpreter execution model
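[Diagram: a script is parsed into an AST (an If node with branches a=1 and a=2); bytecode generation turns the AST into an instruction list (instruction a, b, c, ...) with a pointer to the currently executed instruction; interpreter thread stacks and the heap hold the runtime state]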
GIL ownership diagram
[Diagram: timelines for Thread 1 and Thread 2, each alternating between interpreting, IO and waiting; the GIL state row shows the lock owned by Thread 1, then by Thread 2, then free, then owned by Thread 1 again]
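The ownership pattern in the diagram can be observed from a script. The deck later notes that CPython uses the same "one active interpreter thread at a time" model, so the sketch below is written in Python. It assumes that `time.sleep` is an acceptable stand-in for a blocking IO call (the GIL is released while a thread sleeps); the function names are made up for this illustration.

```python
import threading
import time


def blocking_io(delay):
    # Stand-in for a blocking IO call: the thread gives up the GIL while it waits.
    time.sleep(delay)


def run_concurrently(delay=0.3, workers=2):
    """Start several IO-bound threads and return the total wall-clock time."""
    threads = [threading.Thread(target=blocking_io, args=(delay,))
               for _ in range(workers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


elapsed = run_concurrently()
print("two 0.3 s waits took %.2f s in total" % elapsed)
```

The two waits overlap, so the total stays close to a single delay rather than the sum of both. Threads that spend their time interpreting bytecode would not overlap this way, because only the GIL owner can run.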
MRI memory allocation diagram
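[Diagram: object pool 1 and object pool 2, each with its own free list (free list 1, free list 2); RArray data and RString data are stored in separate heap blocks outside the pools]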
MRI memory allocation
● Every Ruby object is allocated on the heap (even local variables)
● SLAB-like allocation for Ruby objects
  ○ A C union is used, hence all objects are of the same size (40 bytes)
  ○ Unlike a typical SLAB allocator, there is only one object size to store
● RString, RArray, RHash, etc. hold a pointer to an external memory block containing the actual contents
MRI memory allocation (continued)
● External memory block for string or array is allocated using plain malloc
● String content can be shared between several objects (copy on write)
● 1.9 changes: small strings (23 bytes or less) are embedded into RString structure rather than allocated externally
MRI GC
● If there is no free slot for an object, GC is run
  ○ If there is still no free slot, a new slab (pool) is allocated
    ■ Unlike in Java, GC is triggered not only when the whole heap is used up
● Stop-the-world mark-sweep GC
  ○ Unlike Java or .NET, there are no generations
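The allocation policy above (take a slot from the free list, run GC when the list is empty, add a pool only if GC freed nothing) can be sketched as a toy model. This is not MRI's C code: it is written in Python for brevity, and `Slot`, `ToyHeap` and the pool size are hypothetical names and values chosen for the illustration.

```python
class Slot:
    """One fixed-size object slot (hypothetical stand-in for MRI's C union)."""

    def __init__(self):
        self.in_use = False
        self.marked = False
        self.refs = []          # outgoing references to other slots


class ToyHeap:
    def __init__(self, pool_size=4):
        self.pool_size = pool_size
        self.pools = []         # every pool holds slots of one single size
        self.free_list = []     # slots available for allocation
        self.roots = []         # slots reachable directly, e.g. from thread stacks
        self.gc_runs = 0
        self._add_pool()

    def _add_pool(self):
        pool = [Slot() for _ in range(self.pool_size)]
        self.pools.append(pool)
        self.free_list.extend(pool)

    def _mark(self, slot):
        stack = [slot]
        while stack:
            current = stack.pop()
            if not current.marked:
                current.marked = True
                stack.extend(current.refs)

    def collect(self):
        """Stop-the-world mark-sweep over the whole heap (no generations)."""
        self.gc_runs += 1
        for root in self.roots:
            self._mark(root)
        for pool in self.pools:
            for slot in pool:
                if slot.in_use and not slot.marked:
                    slot.in_use = False      # dead object: return slot to free list
                    slot.refs = []
                    self.free_list.append(slot)
                slot.marked = False          # reset mark for the next cycle

    def allocate(self):
        if not self.free_list:
            self.collect()                   # first try to reclaim dead objects
        if not self.free_list:
            self._add_pool()                 # still full: grow the heap by one pool
        slot = self.free_list.pop()
        slot.in_use = True
        return slot


heap = ToyHeap(pool_size=4)
heap.roots.append(heap.allocate())           # one live object
for _ in range(3):
    heap.allocate()                          # three objects nobody references
heap.allocate()                              # pool is full: GC runs and frees 3 slots
print(heap.gc_runs, len(heap.pools))
```

In this run the fifth allocation triggers one collection and the heap stays at a single pool. If all four objects had been rooted, the same allocation would have added a second pool instead.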
MRI GC (continued)
● 1.9.3 changes: lazy sweep GC
  ○ "In Lazy sweeping, each invocation of the object allocation sweeps the heap until it finds an appropriate free object"
    ■ i.e. allocation just searches for an object marked as dead instead of building free lists
● 2.0 changes
  ○ Instead of setting the FL_MARK flag on live objects, marks are kept in an external bitmap
    ■ This avoids excessive copying of memory regions in forked processes, because marking no longer writes to the objects themselves
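Both changes can be shown together in a toy model: mark bits live in a separate list rather than inside the objects, and allocation advances a sweep cursor only until it reaches the first dead slot. This is a simplified sketch in Python, not MRI's implementation; `LazySweepHeap` is a hypothetical name, and objects here hold no references to each other, so only roots count as live.

```python
class LazySweepHeap:
    def __init__(self, size):
        self.size = size
        self.marks = [False] * size   # external mark bitmap, one bit per slot
        self.cursor = size            # sweep position; size means "nothing left to sweep"
        self.roots = set()            # indices of live slots

    def mark(self):
        # Simplification: objects hold no references, so only roots are live.
        self.marks = [i in self.roots for i in range(self.size)]
        self.cursor = 0

    def allocate(self):
        """Sweep forward only until one free slot is found."""
        if self.cursor >= self.size:
            self.mark()               # previous sweep finished: start a new cycle
        while self.cursor < self.size:
            i = self.cursor
            self.cursor += 1
            if not self.marks[i]:     # unmarked means dead or never used: reuse it
                return i
        raise MemoryError("toy heap is full")


heap = LazySweepHeap(4)
first_round = [heap.allocate() for _ in range(4)]
heap.roots.add(0)                     # only slot 0 stays referenced
reused = heap.allocate()              # new cycle: skips slot 0, reuses slot 1
print(first_round, reused, heap.cursor)
```

After the last call the cursor sits at 2, so slots 2 and 3 have not been inspected yet; they are examined only when later allocations need them.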
Real world Ruby usage stories
● Twitter switch from Ruby to Scala: http://www.artima.com/scalazine/articles/twitter_on_scala.html
● Iron.io switch from Ruby to Go: http://blog.iron.io/2013/03/how-we-went-from-30-servers-to-2-go.html
MRI Links
● Threads in Ruby discussion: http://stackoverflow.com/questions/56087/does-ruby-have-real-multithreading
● MRI GC slides: http://timetobleed.com/garbage-collection-slides-from-la-ruby-conference/
CPython VM outline
● Memory management
  ○ Automatic, reference counting
● Execution model
  ○ Bytecode interpretation (stack machine)
  ○ Maps, lists, tuples are created and managed by bytecode instructions
● Concurrency
  ○ Multi-threaded, one active interpreter thread at a time
● Method calls
  ○ Late binding: the method is looked up by name in the class dict
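Two of the points above are easy to check from a script. The sketch below uses the standard `dis` module to list the instructions of a small function, then replaces a method in a class dict to show that lookup happens at call time. The function and class names are made up for the example, and exact instruction sets vary between CPython versions.

```python
import dis


def build(a, b):
    # A list, a tuple and a dict built from non-constant values.
    return [a, b], (a, b), {a: b}


# Names of the bytecode instructions the compiler generated for build().
opnames = {ins.opname for ins in dis.get_instructions(build)}
print(sorted(name for name in opnames if name.startswith("BUILD_")))


class Greeter:
    def hello(self):
        return "hello"


g = Greeter()
first = g.hello()

# Replace the entry in the class dict; the existing instance sees the change
# because the method is searched for by name on every call.
Greeter.hello = lambda self: "bonjour"
second = g.hello()
print(first, second, "hello" in Greeter.__dict__)
```

Calling `dis.dis(build)` prints the full listing, including the stack loads that precede each `BUILD_*` instruction.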
Python GC
● CPython uses reference counting to track whether an object is still in use
  ○ Python uses a global interpreter lock to avoid synchronization on each reference operation
● Cyclic references
  ○ Example: l = []; l.append(l); del l
  ○ Cyclic references are only possible for "container" objects
● The GC for cyclic references has been included since version 2.2 and is enabled by default
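The difference between the two mechanisms can be observed with weak references. The sketch below repeats the slide's `l.append(l)` example with a `list` subclass, since plain lists cannot be weakly referenced; `TrackedList` is a name invented for this demonstration. The cycle collector is switched off during the demo so that it cannot run on its own.

```python
import gc
import weakref


class TrackedList(list):
    """A list subclass, used only because its instances support weak references."""


gc.disable()                      # keep the cycle collector from running by itself
try:
    plain = TrackedList()
    plain_ref = weakref.ref(plain)
    del plain                     # refcount drops to zero: freed immediately
    freed_by_refcount = plain_ref() is None

    l = TrackedList()
    l.append(l)                   # the list now references itself
    cycle_ref = weakref.ref(l)
    del l                         # refcount stays at 1 because of the self-reference
    survives_refcount = cycle_ref() is not None

    gc.collect()                  # the cycle detector finds and frees it
    freed_by_gc = cycle_ref() is None
finally:
    gc.enable()

print(freed_by_refcount, survives_refcount, freed_by_gc)
```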
Search for cyclic references in CPython (generations)
● The GC classifies objects into three generations depending on how many collection sweeps they have survived
  ○ New objects are placed in the youngest generation (generation 0)
  ○ If an object survives a collection, it is moved into the next older generation
  ○ Since generation 2 is the oldest generation, objects in that generation remain there after a collection
Search for cyclic references in CPython (activation)
● When the number of allocations minus the number of deallocations exceeds the first threshold (see gc.get_threshold()), a collection starts
  ○ Initially only generation 0 is examined
  ○ If generation 0 has been examined more than the second threshold number of times since generation 1 was last examined, then generation 1 is examined as well
  ○ The third threshold controls the number of collections of generation 1 before generation 2 is collected
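The thresholds and the allocation counter are exposed through the standard `gc` module. The sketch below reads them and shows the generation 0 counter growing as container objects are allocated. Automatic collection is disabled while counting so that the counter is not reset mid-demo. The actual threshold values depend on the interpreter version, so none are assumed here.

```python
import gc

# Three thresholds, one per generation, as described above.
thresholds = gc.get_threshold()
print("thresholds:", thresholds)

gc.disable()                      # no automatic collection while we count
try:
    gc.collect()                  # start from a freshly collected state
    before = gc.get_count()[0]    # generation 0 counter: allocations minus deallocations
    containers = [[] for _ in range(50)]   # 50 new container objects, kept alive
    after = gc.get_count()[0]
finally:
    gc.enable()

print("generation 0 counter: %d -> %d" % (before, after))
```

`gc.set_threshold()` changes these values; passing 0 as the first threshold disables automatic collection.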
Objects with __del__ method in reference cycle
● Which __del__ method should be called first for two objects in a cycle?
  ○ After calling the first finalizer, the object cannot be freed, as the second finalizer may still access it
● Cycles that are referenced from objects with finalizers are added to a global list of uncollectable garbage (gc.garbage)
  ○ The program can access the global list and free the cycles in a way that makes sense for the application
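A minimal sketch of the "free cycles yourself" approach follows. The slide describes the behavior of the Python 2 interpreter that the linked documentation covers. Since Python 3.4 (PEP 442) the collector can finalize such cycles itself, so on a current interpreter `gc.garbage` normally stays empty and the cleanup loop has nothing to do. `Resource` and `release_uncollectable` are hypothetical names for this illustration.

```python
import gc

finalized = []                        # records which finalizers have run


class Resource:
    def __init__(self, name):
        self.name = name
        self.peer = None

    def __del__(self):
        finalized.append(self.name)


def release_uncollectable():
    """Break cycles the collector gave up on, then empty gc.garbage."""
    for obj in list(gc.garbage):
        if isinstance(obj, Resource):
            obj.peer = None           # application-specific way to break the cycle
    del gc.garbage[:]


a, b = Resource("a"), Resource("b")
a.peer, b.peer = b, a                 # two objects with __del__ in one cycle
del a, b

gc.collect()                          # old interpreters: both objects land in gc.garbage
release_uncollectable()               # once the cycle is broken, refcounting frees them
print(sorted(finalized))
```

Either way both finalizers end up being called: by the collector on Python 3.4 and later, or by reference counting after the cycle has been broken by hand.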
CPython links
● Python GC description: http://arctrix.com/nas/python/gc/
● GC module documentation: http://docs.python.org/2/library/gc.html
● Python method call description: http://css.dzone.com/articles/python-internals-how-callables-0