page reclaim


Post on 30-Jun-2015







An investigation into the basics of Linux's page reclaim function.


  • 1. Linux, @siburu, 2014/7/27 (Sun)

2. 1.

3. What's a Page Frame
  • page frame = a page-sized, page-aligned piece of RAM
  • struct page = a kernel structure that corresponds one-to-one to a page frame
  • mem_map
      • The unique array of struct page's that covers all RAM the kernel manages.
      • But in a CONFIG_SPARSEMEM environment:
          • There's no unique mem_map.
          • Instead, there's a list of 2MB-sized arrays of struct page's.
          • You must use __pfn_to_page(), __page_to_pfn() or wrappers of them.

4. What's NUMA
  • NUMA (Non-Uniform Memory Access)
      • The system is comprised of nodes.
      • Each node is defined by a set of CPUs and one physical memory range.
      • Memory access latency differs depending on the source and destination nodes.
  • NUMA configuration
      • ACPI provides the NUMA configuration:
          • SRAT (Static Resource Affinity Table): which CPUs and memory range are contained in which NUMA node?
          • SLIT (System Locality Information Table): how far is one NUMA node from another?

5. What's a Memory Zone
  • Physical memory is separated by address range:
      • ZONE_DMA: [...]

[...] flags &= ~PF_MEMALLOC
  • get_page_from_freelist
  • drain_all_pages
  • get_page_from_freelist

14. pfmemalloc_watermark_ok()
  • ARGS
      • pgdat (type: struct pglist_data)
  • RETURN
      • type: bool
      • node's free_pages > 0.5 * node's min_wmark
  • DESC
      • Checks whether the node (its zones) is OK with respect to the min watermark.
      • If false, the node's kswapd is woken up.
      • The node's direct reclaimers then wait on kswapd.

15. do_try_to_free_pages()
  • Core function for page reclaim, which is called from 3 different paths:
      • try_to_free_pages(): global reclaim path via __alloc_pages_nodemask()
      • try_to_free_mem_cgroup_pages(): per-memcg reclaim path
          • Right before per-memcg slab allocation
          • Right before per-memcg file page allocation
          • Right before per-memcg anon page allocation
          • Right before per-memcg swapin allocation
      • shrink_all_memory(): hibernation path
  • Arguments: (1) struct zonelist *zonelist (2) struct scan_control *sc

16. struct scan_control

    struct scan_control {
        unsigned long nr_scanned;
        unsigned long nr_reclaimed;
        unsigned long nr_to_reclaim;

        int swappiness; // 0..100

        struct mem_cgroup *target_mem_cgroup;

        nodemask_t *nodemask;
    };

17. do_try_to_free_pages
  • Calls shrink_zones() repeatedly.
  • Calls wakeup_flusher_threads() once the pages scanned by shrink_zones() exceed 1.5 times scan_control::nr_to_reclaim (the flusher threads are per bdi).

18. shrink_zones()
  1. for_each_zone_zonelist_nodemask:
      1. mem_cgroup_soft_limit_reclaim
          • while mem_cgroup_largest_soft_limit_node:
              • mem_cgroup_soft_reclaim
          • Before shrink_zone, reclaims from the memcgs in the zone that exceed their soft limit the most.
      2. shrink_zone
          • foreach mem_cgroup_iter:
              • shrink_lruvec
          • For global reclaim, the iteration starts from the root memcg.
  2. shrink_slab

19. shrink_lruvec()
  • The per-zone page freer.
  1. get_scan_count
  2. while pages remain to be scanned:
      • shrink_list(LRU_INACTIVE_ANON)
      • shrink_list(LRU_ACTIVE_ANON)
      • shrink_list(LRU_INACTIVE_FILE)
      • shrink_list(LRU_ACTIVE_FILE)
  3. if the INACTIVE list is too small:
      • shrink_active_list

20. shrink_list()
  • Calls shrink_{active or inactive}_list; an active list is shrunk only while it is larger than the inactive one.
  1. if ACTIVE:
      • if size of lru(ACTIVE) > size of lru(INACTIVE):
          • shrink_active_list
  2. else:
      • shrink_inactive_list

21. shrink_{active,inactive}_list
  • shrink_active_list()
      1. Traverse the pages in an active list.
      2. Find inactive pages in the list and move them to an inactive list.
  • shrink_inactive_list()
      • foreach page:
          1. page_mapped(page) => try_to_unmap(page)
          2. if PageDirty(page) => pageout(page)

22. Pages moved to inactive
  • !laptop_mode
      • Pages on the active LRU list are moved to inactive.
  • laptop_mode
      • Only clean pages on the active LRU list are moved to inactive.

23. try_to_unmap()
  • Unmaps a specified page from all corresponding mappings.
  1. Set up struct rmap_walk_control.
  2. rmap_walk_{file, anon, or ksm}
      • An rmap walk iterates over the VMAs and unmaps the page from each of them.
      • A. file: traverse the address_space::i_mmap tree
      • B. anon: traverse the anon_vma tree
      • C. ksm: traverse all merged anon_vma trees
          • Each operation is similar to that for anon.

24. A. rmap_walk_file
  • [Diagram: page -> address_space (inode) -> i_mmap (type: rb_root) -> vma (x4) -> pgtbl (x4) -> unmap]

25. B. rmap_walk_anon
  • [Diagram: page -> anon_vma -> rb_root (type: rb_root) -> vma (x4) -> pgtbl (x4) -> unmap]

26. C. rmap_walk_ksm
  • [Diagram: page -> stable_node -> hlist -> anon_vma (several) -> vma -> pgtbl]

27. 2.2 Daemon Reclaim (KSwapD)
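As a bridge from direct reclaim to the daemon path, the comparison made by pfmemalloc_watermark_ok() on slide 14 can be sketched in plain userspace C. This is only an illustration under stated assumptions: toy_zone, toy_node and toy_pfmemalloc_watermark_ok are hypothetical names invented here, the kswapd wake-up is modelled as a boolean flag, and details of the real kernel function (which zones are counted, wait queues) are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for struct zone and struct pglist_data. */
struct toy_zone {
    unsigned long free_pages; /* pages currently free in this zone */
    unsigned long min_wmark;  /* the zone's min watermark, in pages */
};

struct toy_node {
    const struct toy_zone *zones;
    size_t nr_zones;
    bool kswapd_woken;        /* set when the check decides kswapd must run */
};

/*
 * Returns true when the node's free pages exceed half of the node's
 * summed min watermarks; otherwise flags kswapd as woken and returns false.
 */
bool toy_pfmemalloc_watermark_ok(struct toy_node *node)
{
    unsigned long free_pages = 0;
    unsigned long reserve = 0;

    assert(node != NULL);

    for (size_t i = 0; i < node->nr_zones; i++) {
        free_pages += node->zones[i].free_pages;
        reserve += node->zones[i].min_wmark;
    }

    /* Assumption: with no reserve there is nothing to compare against. */
    if (reserve == 0)
        return true;

    bool ok = free_pages > reserve / 2;
    if (!ok)
        node->kswapd_woken = true; /* stands in for waking the node's kswapd */
    return ok;
}
```

With two zones holding 10 and 5 free pages against min watermarks of 40 and 60, the node has 15 free pages against a threshold of 50, so the check fails and the flag is set.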
28. kswapd
  • Processing overview
      1. Wake up
      2. balance_pgdat()
      3. Sleep
  • balance_pgdat()
      • Works until all zones of the pgdat are at or over their high watermark.
      • Reclaim function: kswapd_shrink_zone()
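The "work until all zones are at or over the high watermark" rule can be illustrated with a small userspace loop. This is a sketch under simplifying assumptions, not the kernel's balance_pgdat(): toy_zone, toy_shrink_zone, toy_zone_balanced and toy_balance_pgdat are hypothetical names, the stand-in reclaim function always succeeds in freeing a fixed batch of pages, and scan priorities and give-up conditions are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified zone: only what the loop below needs. */
struct toy_zone {
    unsigned long free_pages;
    unsigned long high_wmark;
};

/* Stand-in for kswapd_shrink_zone(): pretends to free exactly `batch` pages. */
static unsigned long toy_shrink_zone(struct toy_zone *z, unsigned long batch)
{
    z->free_pages += batch;
    return batch;
}

/* A zone is balanced once it is at or over its high watermark. */
static bool toy_zone_balanced(const struct toy_zone *z)
{
    return z->free_pages >= z->high_wmark;
}

/*
 * Keeps passing over the zones, reclaiming from each unbalanced one,
 * until every zone is balanced. Returns the total pages reclaimed.
 */
unsigned long toy_balance_pgdat(struct toy_zone *zones, size_t nr_zones,
                                unsigned long batch)
{
    unsigned long reclaimed = 0;
    bool all_balanced;

    assert(batch > 0); /* a zero batch would never make progress */

    do {
        all_balanced = true;
        for (size_t i = 0; i < nr_zones; i++) {
            if (toy_zone_balanced(&zones[i]))
                continue;
            reclaimed += toy_shrink_zone(&zones[i], batch);
            if (!toy_zone_balanced(&zones[i]))
                all_balanced = false;
        }
    } while (!all_balanced);

    return reclaimed;
}
```

A zone that is already over its high watermark is skipped entirely, which mirrors the idea that kswapd goes back to sleep once nothing is left to balance.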