Transcript
Page 1: Introduction to Garbage Collection

Introduction to Garbage Collection

高國棟

Page 2: Introduction to Garbage Collection

Previous talks

● 2013/04: talk at taipei.py on the implementation of pdb. Slides: http://www.slideshare.net/ya790026/recoverpdb

● 2013/05: talk at pyconf.tw on reading the CPython source code. Slides: http://www.slideshare.net/ya790026/c-python-23247730

● 2013/08: talk at taipei.py on how Python executes code. Slides: http://www.slideshare.net/ya790026/python-27854881

Page 3: Introduction to Garbage Collection

Garbage Collection

● memory leak, dangling pointer

● Reference count

● Mark and sweep

Page 4: Introduction to Garbage Collection

[Figure: a memory leak compared with a dangling pointer, where the memory being pointed to has already been freed.]

Page 5: Introduction to Garbage Collection

Reference Counting

● A reference count is maintained for each object on the heap.

● When an object is first created and a reference to it is assigned to a variable, the object's reference count is set to one.

Page 6: Introduction to Garbage Collection

Reference Counting

● When any other variable is assigned a reference to that object, the object's count is incremented.

● When a reference to an object goes out of scope or is assigned a new value, the object's count is decremented.

Page 7: Introduction to Garbage Collection

a = 5000

    a -> ob_ival: 5000, ob_refcnt: 1

a = 5000
b = a

    a, b -> ob_ival: 5000, ob_refcnt: 2

a = 5000
b = a
a = 3000

    a -> ob_ival: 3000, ob_refcnt: 1
    b -> ob_ival: 5000, ob_refcnt: 1

Page 8: Introduction to Garbage Collection

a = 5000
b = a
a = 3000
b = 4000

    a -> ob_ival: 3000, ob_refcnt: 1
    b -> ob_ival: 4000, ob_refcnt: 1
    (no name) -> ob_ival: 5000, ob_refcnt: 0
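The counting shown in these diagrams can be observed on CPython with `sys.getrefcount`. The following is a minimal sketch that is not part of the original slides: `Node` and `refcount_deltas` are hypothetical names, and a plain object stands in for the integer 5000 because the interpreter may hold extra internal references to integer constants. Only differences between counts are reported, since the absolute value can include temporary references created by the call itself.

```python
import sys


class Node:
    """A plain heap object whose reference count we can observe (hypothetical)."""


def refcount_deltas():
    a = Node()
    base = sys.getrefcount(a)          # baseline; may include temporary references
    b = a                              # a second name now refers to the same object
    after_bind = sys.getrefcount(a)
    b = None                           # rebinding b drops that reference again
    after_rebind = sys.getrefcount(a)
    # Report changes relative to the baseline rather than absolute counts.
    return after_bind - base, after_rebind - base


deltas = refcount_deltas()
print(deltas)
```

On CPython the two deltas are expected to be 1 and 0: binding `b` increments the count, and rebinding `b` decrements it.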

Page 9: Introduction to Garbage Collection

Reference Counting

Advantage: suitable for real-time environments where the program can't be interrupted for very long.

Disadvantage: reference counting does not detect cycles.

Page 10: Introduction to Garbage Collection

a = []
b = []
a.append(b)
b.append(a)

[Figure: the lists named a and b, each referring to the other.]

a = []
b = []
a.append(b)
b.append(a)
a = None
b = None
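A minimal sketch of why this cycle defeats pure reference counting, assuming CPython. It is not from the original slides: `Node` and `cycle_demo` are hypothetical names, and `Node` instances replace the lists because plain lists cannot be weakly referenced. A weak reference is used as a probe to check whether the object still exists.

```python
import gc
import weakref


class Node:
    """Stand-in for the lists on the slide (hypothetical helper class)."""


def cycle_demo():
    was_enabled = gc.isenabled()
    gc.disable()                       # keep the automatic cycle collector out of the way
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a        # a -> b and b -> a form a reference cycle
        probe = weakref.ref(a)         # observes the object without keeping it alive
        a = b = None                   # drop both names; each count stays at 1
        alive_without_gc = probe() is not None
        gc.collect()                   # run the cycle collector explicitly
        alive_after_gc = probe() is not None
        return alive_without_gc, alive_after_gc
    finally:
        if was_enabled:
            gc.enable()


result = cycle_demo()
print(result)
```

The first value is expected to be True (reference counting alone never frees the pair) and the second False (the cycle collector reclaims it).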

Page 11: Introduction to Garbage Collection

Mark and sweep

1. Find the root objects of the system. These are things like the global environment (like the __main__ module in Python) and objects on the stack.

2. Search from these objects and find all objects reachable from them. These objects are all "alive".

3. Free all other objects.
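The three steps above can be sketched over a toy heap. This is an illustration under stated assumptions, not how any real collector stores objects: the heap is modelled as a dict mapping an object name to the names it references, and `mark_and_sweep`, `heap`, and the object names are all hypothetical.

```python
def mark_and_sweep(heap, roots):
    """Toy collector. heap maps object name -> list of referenced names."""
    # Mark phase: everything reachable from the roots is alive.
    marked = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name in marked:
            continue
        marked.add(name)
        stack.extend(heap[name])
    # Sweep phase: free every object that was not marked.
    for name in list(heap):
        if name not in marked:
            del heap[name]
    return marked


# "c" and "d" reference each other but are unreachable from the root.
heap = {"main": ["a"], "a": ["b"], "b": [], "c": ["d"], "d": ["c"]}
alive = mark_and_sweep(heap, roots=["main"])
print(sorted(heap))
```

Unlike reference counting, the unreachable cycle between "c" and "d" is freed, because liveness is decided by reachability from the roots.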

Page 12: Introduction to Garbage Collection

Two-Color Mark & Sweep

[Figure: state diagram with the states White and Black and the transitions New, Mark, Sweep, and Free.]

Page 13: Introduction to Garbage Collection

Two-Color Mark & Sweep

● The algorithm is non-incremental (atomic collection).

Page 14: Introduction to Garbage Collection

Tri-Color Incremental Mark & Sweep

● Initially, the grey set holds all objects that are directly reachable from root references; the objects referenced by grey objects have not been scanned yet.

● The white set is the set of objects that are candidates for having their memory recycled.

● The black set is the set of objects that can cheaply be proven to have no references to objects in the white set.

Page 15: Introduction to Garbage Collection

[Figure: state diagram with the states White, Grey, and Black and the transitions New, Mark, AfterCheck, Barrier Forward, Barrier Backward, Sweep, and Free.]

Page 16: Introduction to Garbage Collection

Tri-Color Incremental Mark & Sweep

● When there are no more objects in the grey set, then all the objects remaining in the white set have been demonstrated not to be reachable, and the storage occupied by them can be reclaimed.
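A minimal sketch of the tri-color bookkeeping, using the same toy heap model (a dict from object name to referenced names). Everything here is hypothetical, including `tricolor_collect`. The loop runs to completion in one go and has no write barrier, so it only shows how objects move between the sets; a real incremental collector would interleave these steps with the running program.

```python
def tricolor_collect(heap, roots):
    """Toy tri-color marking. Returns (black, white) once the grey set is empty."""
    grey = set(roots)                  # reachable, but children not scanned yet
    white = set(heap) - grey           # candidates for reclamation
    black = set()                      # scanned; holds no references to white objects
    while grey:
        obj = grey.pop()
        for child in heap[obj]:
            if child in white:         # first time we reach this object
                white.discard(child)
                grey.add(child)
        black.add(obj)                 # all children handled, so obj turns black
    return black, white


tri_heap = {"root": ["a"], "a": ["b"], "b": ["a"], "x": ["y"], "y": ["x"]}
black, white = tricolor_collect(tri_heap, roots=["root"])
print(sorted(black), sorted(white))
```

When the loop ends, the white set contains exactly the objects that were never reached, which is the condition the slide describes for reclaiming them.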

Page 17: Introduction to Garbage Collection

Generational Collectors

1. Most objects created by most programs have very short lives.

2. Most programs create some objects that have very long lifetimes. A major source of inefficiency in simple copying collectors is that they spend much of their time copying the same long-lived objects again and again.

Page 18: Introduction to Garbage Collection

External memory fragmentation

● Free memory is separated into small blocks and is interspersed by allocated memory.

● Although free storage is available, it is unusable because it is divided into pieces that are too small individually to satisfy the demands of the application.

Page 19: Introduction to Garbage Collection

External memory fragmentation

    a b c d

del b

    a _ c d

del d

    a _ c _

We can't create a variable that needs four contiguous blocks.
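A small sketch of the situation in the diagram. The sizes are an assumption made for illustration: each of a, b, c, and d is taken to occupy two blocks of an eight-block heap, which the slide does not state. `largest_free_run` is a hypothetical helper.

```python
def largest_free_run(memory):
    """Length of the longest run of consecutive free (None) blocks."""
    best = run = 0
    for cell in memory:
        run = run + 1 if cell is None else 0
        best = max(best, run)
    return best


# Assumed layout: four variables, two blocks each.
memory = ["a", "a", "b", "b", "c", "c", "d", "d"]

# del b; del d
memory = [None if cell in ("b", "d") else cell for cell in memory]

total_free = memory.count(None)
largest = largest_free_run(memory)
print(total_free, largest)
```

Four blocks are free in total, but the longest contiguous run is only two, so a request for four contiguous blocks cannot be satisfied.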

Page 20: Introduction to Garbage Collection

Compacting and copying

● Move objects on the fly to reduce heap fragmentation

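Page 21: Introduction to Garbage Collection

[Figure: variables a and b referring to an object directly, compared with a and b referring to it through a table of object handles.]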

Page 22: Introduction to Garbage Collection

Stop and copy

● The heap is divided into two regions.

● Only one of the two regions is used at any time.

● Objects are allocated from one of the regions until all the space in that region has been exhausted.

● Find the live objects and copy them to the other region.

● Memory will be allocated from the new heap region until it too runs out of space.
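The copying step can be sketched in the style of Cheney's algorithm, which is one common way to implement it and is named here as an assumption, since the slides do not say which variant they mean. The model is hypothetical: each region is a list of objects, and each object is a list of indices of the objects it references.

```python
def stop_and_copy(from_space, roots):
    """Copy live objects into a fresh region. Returns (to_space, new_roots)."""
    forward = {}                       # old index -> new index (forwarding table)
    to_space = []

    def copy(old):
        if old not in forward:
            forward[old] = len(to_space)
            # Fields still hold old indices until the scan below fixes them.
            to_space.append(list(from_space[old]))
        return forward[old]

    new_roots = [copy(r) for r in roots]
    scan = 0
    while scan < len(to_space):        # objects past `scan` are copied but unscanned
        to_space[scan] = [copy(old) for old in to_space[scan]]
        scan += 1
    return to_space, new_roots


# Object 4 -> 0 -> 1 is live; objects 2 and 3 form an unreachable cycle.
from_space = [[1], [], [3], [2], [0]]
to_space, new_roots = stop_and_copy(from_space, roots=[4])
print(to_space, new_roots)
```

The live objects end up packed next to each other in the new region, which is why copying collectors also remove fragmentation.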

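Page 23: Introduction to Garbage Collection

[Figure: stop and copy. One region holds allocated and free space while the other is unused; live objects are copied into the unused region, which then holds the allocated and free space while the first region becomes unused.]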

Page 24: Introduction to Garbage Collection

Python garbage collection

● Python uses both reference counting and "mark and sweep".

● "Mark and sweep" applies only to containers, in order to resolve reference cycles.

● Containers are objects such as lists, dicts, and instances.

● Python's mark and sweep differs from the traditional approach: because C extensions exist, it is hard to have a common root object.
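Which objects count as containers for the cycle collector can be checked on CPython with `gc.is_tracked`. This is a small sketch, not from the original slides; `Point` and the `tracked` dict are hypothetical names used only for the demonstration.

```python
import gc


class Point:
    """An ordinary user-defined class (hypothetical)."""


tracked = {
    "list": gc.is_tracked([]),           # can hold references to other objects
    "instance": gc.is_tracked(Point()),  # instances can hold references too
    "int": gc.is_tracked(5000),          # atomic: cannot take part in a cycle
    "str": gc.is_tracked("text"),        # atomic as well
}
print(tracked)
```

Objects that cannot reference other objects are left to reference counting alone, so the cycle collector only has to examine the tracked ones.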

Page 25: Introduction to Garbage Collection

Python mark and sweep

1. For each container object, set gc_refs equal to the object's reference count.

2. For each container object, find which container objects it references and decrement the referenced container's gc_refs field.

Page 26: Introduction to Garbage Collection

Python mark and sweep

3. All container objects that now have a gc_refs field greater than zero are referenced from outside the set of container objects. We cannot free these objects so we move them to a different set.

4. Any objects referenced from the objects moved also cannot be freed. We move them and all the objects reachable from them too.

Page 27: Introduction to Garbage Collection

Python mark and sweep

5. Objects left in our original set are referenced only by objects within that set (i.e., they are inaccessible from Python and are garbage). We can now go about freeing these objects.
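The five steps can be simulated on a toy model. This is a sketch under stated assumptions and not CPython's actual implementation: `refcounts` stands for each container's ob_refcnt, `edges` lists which containers each container references, and `find_garbage` is a hypothetical name.

```python
def find_garbage(refcounts, edges):
    """Toy version of steps 1 to 5. Returns the set of unreachable containers."""
    # Step 1: copy each reference count into gc_refs.
    gc_refs = dict(refcounts)
    # Step 2: subtract references that come from other containers.
    for targets in edges.values():
        for dst in targets:
            gc_refs[dst] -= 1
    # Step 3: anything still above zero is referenced from outside the set.
    reachable = {obj for obj, count in gc_refs.items() if count > 0}
    # Step 4: everything reachable from those objects must be kept as well.
    stack = list(reachable)
    while stack:
        obj = stack.pop()
        for dst in edges[obj]:
            if dst not in reachable:
                reachable.add(dst)
                stack.append(dst)
    # Step 5: whatever is left is referenced only from inside the set.
    return set(gc_refs) - reachable


# a <-> b is an orphaned cycle; c is held by a variable and references d.
refcounts = {"a": 1, "b": 1, "c": 1, "d": 1}
edges = {"a": ["b"], "b": ["a"], "c": ["d"], "d": []}
garbage = find_garbage(refcounts, edges)
print(sorted(garbage))
```

Note that "d" drops to a gc_refs of zero in step 2 but is rescued in step 4 because it is reachable from "c", which is why a zero count is only tentative evidence of garbage.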

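Page 28: Introduction to Garbage Collection

[Figure: gc_refs of two container objects after each step.]

Step 1: gc_refs: 1, gc_refs: 1

Step 2: gc_refs: 1, gc_refs: 0

Step 3: gc_refs: 1, gc_refs: 0 (GC_TENTATIVELY_UNREACHABLE)

Page 29: Introduction to Garbage Collection

Step 4: gc_refs: 1, gc_refs: 1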

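Page 30: Introduction to Garbage Collection

[Figure: gc_refs of two container objects after each step.]

Step 1: gc_refs: 1, gc_refs: 1

Step 2: gc_refs: 0, gc_refs: 0

Step 3: gc_refs: 0, gc_refs: 0 (GC_TENTATIVELY_UNREACHABLE)

Page 31: Introduction to Garbage Collection

Step 4: gc_refs: 0, gc_refs: 0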

Page 32: Introduction to Garbage Collection

Java Reference

● Strong reference

● SoftReference

● WeakReference

● PhantomReference

Page 33: Introduction to Garbage Collection

Soft Reference

● The garbage collector may reclaim the memory occupied by a softly reachable object.

● It is useful for caches.

Page 34: Introduction to Garbage Collection

Weak Reference

● The garbage collector must reclaim the memory occupied by a weakly reachable object.

● Useful for canonicalizing mappings.

Page 35: Introduction to Garbage Collection

Phantom Reference

● Similar to a weak reference.

● Whereas the garbage collector enqueues soft and weak reference objects when their referents are leaving the relevant reachability state, it enqueues phantom references when the referents are entering the relevant state.

● Establish more flexible pre-mortem cleanup policies than are possible with finalizers.

Page 36: Introduction to Garbage Collection

Python Reference

● Strong reference

● Weak reference

weakref.ref(object[, callback])
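A short sketch of `weakref.ref` with a callback. `Resource`, `events`, and `on_finalize` are hypothetical names introduced for the example. The explicit `gc.collect()` is only there so the result does not depend on the object being freed the instant its last strong reference disappears, which is what CPython's reference counting normally does.

```python
import gc
import weakref


class Resource:
    """Any object that supports weak references (hypothetical)."""


events = []


def on_finalize(reference):
    # Called with the weak reference object once the referent is gone.
    events.append("finalized")


obj = Resource()
ref = weakref.ref(obj, on_finalize)

same_object = ref() is obj             # calling the weakref returns the referent
del obj                                # drop the only strong reference
gc.collect()
dead = ref() is None                   # a dead weakref returns None
print(same_object, dead, events)
```

The weak reference lets code observe the object without keeping it alive, which is what makes it suitable for caches and lookup tables.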

Page 37: Introduction to Garbage Collection

Python gc interface

gc.enable()
gc.disable()
gc.isenabled()
gc.collect([generation])
gc.set_threshold(threshold0[, threshold1[, threshold2]])
gc.get_count()
gc.get_threshold()
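A minimal sketch of reading and changing the thresholds with the functions listed above. The values 1000 and 15 are arbitrary and chosen only for illustration; the original settings are restored at the end.

```python
import gc

original = gc.get_threshold()          # tuple of three thresholds
counts = gc.get_count()                # current collection counts, also three values

gc.set_threshold(1000, 15)             # arbitrary example values for threshold0/1
updated = gc.get_threshold()

gc.set_threshold(*original)            # put the original settings back
print(original, counts, updated)
```

Raising threshold0 makes automatic collections less frequent, and lowering it makes them more frequent.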

Page 38: Introduction to Garbage Collection

Python gc interface

gc.set_debug(flags)

gc.get_referrers(*objs)
gc.get_referents(*objs)

gc.garbage
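A small sketch of the two introspection calls. `inner` and `outer` are hypothetical names: `gc.get_referents` reports what an object points to, and `gc.get_referrers` reports which tracked objects point to it.

```python
import gc

inner = [1, 2, 3]
outer = {"payload": inner}

referents = gc.get_referents(outer)    # objects that outer refers to
referrers = gc.get_referrers(inner)    # tracked objects that refer to inner

outer_points_to_inner = any(obj is inner for obj in referents)
inner_is_held_by_outer = any(obj is outer for obj in referrers)
print(outer_points_to_inner, inner_is_held_by_outer)
```

The referrers list usually contains more than `outer`, for example the namespace in which `inner` is defined, so it is best searched by identity rather than compared as a whole.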

Page 39: Introduction to Garbage Collection

In [1]: import gc
In [2]: gc.set_debug(gc.DEBUG_STATS)
In [3]: gc.collect()
gc: collecting generation 2...
gc: objects in each generation: 159 2655 7538
gc: done, 10 unreachable, 0 uncollectable, 0.0123s elapsed.

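Page 40: Introduction to Garbage Collection

>>> class Finalizable:
...     def __del__(self): pass
...
>>> a = Finalizable()
>>> b = Finalizable()
>>> a.x = b
>>> b.x = a
>>> del a
>>> del b
>>> import gc
>>> gc.collect()
>>> gc.garbage

Before Python 3.4, objects in a cycle that define __del__ cannot be freed by the collector, so both instances end up in gc.garbage.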

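Page 41: Introduction to Garbage Collection

● memory-bound

  ○ Consider lowering the threshold, trading time for space.

● cpu-bound

  ○ Consider raising the threshold, trading space for time.

  ○ Do not raise it too far, or each gc pass will take too long.

  ○ In code that requires low latency, gc can be temporarily disabled.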
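One way to disable gc temporarily around a latency-sensitive section is a small context manager. This is a sketch, not something from the original slides: `gc_paused` is a hypothetical helper built on `gc.disable()` and `gc.enable()`. Reference counting keeps working while the cycle collector is paused, so only cyclic garbage accumulates.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Pause the cycle collector and restore its previous state afterwards."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:                # do not enable gc if the caller had it off
            gc.enable()


gc.enable()
with gc_paused():
    enabled_inside = gc.isenabled()    # the collector is off inside the block
enabled_after = gc.isenabled()         # and back on afterwards
print(enabled_inside, enabled_after)
```

Restoring the previous state instead of always calling `gc.enable()` keeps the helper safe to nest or to use in programs that run with gc disabled.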

Page 42: Introduction to Garbage Collection

Conclusion

● Python's gc algorithm is interesting.

● Python's memory management can reduce memory fragmentation, but gc cannot solve the problem of external memory fragmentation.

● Python's gc is atomic.

Page 44: Introduction to Garbage Collection

PyConf is recruiting venue staff

Page 45: Introduction to Garbage Collection

Thank you

