buffer caches

Upload: genica

Post on 05-Feb-2016

Page 1: Buffer Caches

Digital UNIX Internals II 4 - 1 Buffer Caches

Buffer Caches

Chapter Four

Page 2: Buffer Caches

File System I/O Using a Cache

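[Figure: two user processes, each with its own buffer, reach on-disk data through the kernel's in-memory cache. One path is labeled read/write and passes through a kernel buffer; the other is labeled mmap.]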

Page 3: Buffer Caches

Process Reading One Byte

[Figure: a user process issues read( ... ,1); the request crosses into the kernel, which holds the data in a buffer.]
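A minimal user-space sketch of the call on this slide, assuming a POSIX system. Each read() asks for a single byte; the helper name count_single_byte_reads is hypothetical. The slide suggests the kernel satisfies such requests from a buffer it holds, so repeated one-byte reads need not each go to disk.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <unistd.h>

/*
 * Read the file behind `fd` one byte at a time, as in read( ... ,1),
 * and return the number of bytes seen, or -1 on error.
 * (Hypothetical helper, for illustration only.)
 */
static long count_single_byte_reads(int fd)
{
    char c;
    long n = 0;
    ssize_t r;

    assert(fd >= 0);
    while ((r = read(fd, &c, 1)) == 1)
        n++;                 /* one system call per byte */
    return r < 0 ? -1 : n;   /* r == 0 means end of file */
}
```

Every iteration is a separate system call, which is why the cost of such a loop depends on the kernel already holding the data.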

Page 4: Buffer Caches

File System Caches and I/O

• Read-ahead
– When a file system notices a file being read sequentially, it can order the physical read of the next block(s) before the application actually requests them.

• Write-behind
– Data blocks do not have to be immediately written to disk. File systems can cluster together writes to contiguous disk blocks to improve performance.
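A minimal sketch of how sequential reading might be noticed, assuming the file system keeps the last block read for each file. The names ra_state, ra_init, ra_next_blocks and RA_MAX are hypothetical and are not taken from Digital UNIX.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-file read-ahead state; not a Digital UNIX structure. */
struct ra_state {
    long last_blk;   /* last logical block read, -1 if none yet */
    int  nseq;       /* length of the current sequential run */
};

#define RA_MAX 4     /* assumed cap on blocks prefetched per read */

/* Initialise the state before the first read of a file. */
static void ra_init(struct ra_state *s)
{
    s->last_blk = -1;
    s->nseq = 0;
}

/*
 * Record a read of logical block `blk` and return how many of the
 * following blocks could be scheduled ahead of demand. A read that
 * follows the previous block extends the run; any other read resets it.
 */
static int ra_next_blocks(struct ra_state *s, long blk)
{
    assert(s != NULL);
    if (s->last_blk >= 0 && blk == s->last_blk + 1)
        s->nseq++;
    else
        s->nseq = 0;
    s->last_blk = blk;
    return s->nseq < RA_MAX ? s->nseq : RA_MAX;
}
```

The prefetch window grows with the length of the run, so a single stray read costs nothing extra.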

Page 5: Buffer Caches

File System Caches in Digital UNIX

• (Traditional BSD UNIX) Buffer Cache
– From BSD
– Fixed pool of physical memory

• Unified Buffer Cache
– Similar to SunOS and SVR4
– Flexible pool of physical memory
– Supports memory mapping

Page 6: Buffer Caches

Example: UFS uses both

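[Figure: two vnodes, each with the fields v_type, v_object, v_cleanblkhd and v_dirtyblkhd. The directory vnode (v_type = VDIR) is drawn with buf structures. The regular-file vnode (v_type = VREG) is drawn with a vm_object (vm_vp_object) whose fields are vo_vp, vo_cleanpl, vo_cleanwpl, vo_dirtywpl and ob_memq, leading to vm_page structures.]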

Page 7: Buffer Caches

Traditional Buffer Cache

• Pool of Memory
– Allocated at boot time
– Shared with no other subsystem or allocator

• Buffer Structures
– Links into access hash chain, LRU and same vnode lists
– Device containing buffer
– Pointer to vnode
– Logical block in vnode
– Pointer to routine called when I/O is done

• Linked lists of Buffers
– Hash chain bucket, LOCKED, LRU, AGE and EMPTY lists

Page 8: Buffer Caches

struct buf

[Figure: layout of struct buf and the structures it links to.
Fields: b_flags; b_forw, b_back; av_forw, av_back; b_blockf, b_blockb; b_bufsize; b_bcount; b_dev; b_error; b_un; b_lblkno, b_blkno; b_resid; b_proc; b_hash_chain; b_iodone(); b_pagelist; b_vp, b_rvp; b_rcred, b_wcred; b_dirtyoff, b_dirtyend; b_iocomplete; b_lock; driver fields.
Links shown: hash list (buf to buf), queue (buf to buf), vnode buffer list (buf to buf), the buffer itself, proc, vm_page, vnode, ucred (credentials) and the head of the hash list.]

Page 9: Buffer Caches

Buffer Cache Lists

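[Figure: bufhash is an array of bufhd hash heads, each heading a chain of buf structures. The buf structures are also linked onto four free lists: bfreelist[0] (LOCKED), bfreelist[1] (LRU), bfreelist[2] (AGE) and bfreelist[3] (EMPTY). Each buf refers to its buffer memory pages.]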

Page 10: Buffer Caches

To Find a Buffer

1. Calculate hash index using disk block number (b_blkno) and vnode (b_vp) (see BUFHASH macro in /sys/include/sys/buf.h).

2. Index into the hash list.

3. Follow hash pointer to buf structure in queue.

4. Identify the correct buf structure using vnode and block numbers.

5. If no match, follow hash pointer (b_forw) to next buf structure in queue.

6. If you get to the end of the list (wraps back to beginning) without finding the buf structure, it does not exist; allocate a new one from the free list.
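The six steps above can be sketched as follows. This is a simplified illustration, not the kernel's code: struct buf is cut down to the fields the search needs, struct vnode is a placeholder, and NHASH, buf_hash(), bufhash_init(), buf_insert() and buf_lookup() are hypothetical names. The real BUFHASH macro may compute its index differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NHASH 64   /* assumed number of hash buckets (power of two) */

struct vnode { int v_id; };   /* placeholder; only its address matters here */

/* Reduced stand-in for struct buf: only what the search uses. */
struct buf {
    struct buf   *b_forw, *b_back;  /* hash chain links */
    struct vnode *b_vp;             /* vnode the block belongs to */
    long          b_blkno;          /* block number used for hashing */
};

/* Each bucket head is itself a list node; an empty chain points to itself. */
static struct buf bufhash[NHASH];

static void bufhash_init(void)
{
    for (int i = 0; i < NHASH; i++)
        bufhash[i].b_forw = bufhash[i].b_back = &bufhash[i];
}

/* Step 1: illustrative hash on vnode and block number (not the real BUFHASH). */
static unsigned buf_hash(const struct vnode *vp, long blkno)
{
    return (unsigned)(((uintptr_t)vp >> 8) + (unsigned long)blkno) & (NHASH - 1);
}

/* Insert a buffer at the front of its hash chain. */
static void buf_insert(struct buf *bp)
{
    struct buf *head = &bufhash[buf_hash(bp->b_vp, bp->b_blkno)];
    bp->b_forw = head->b_forw;
    bp->b_back = head;
    head->b_forw->b_back = bp;
    head->b_forw = bp;
}

/* Steps 2-6: follow b_forw until a match or until the chain wraps to its head. */
static struct buf *buf_lookup(struct vnode *vp, long blkno)
{
    struct buf *head = &bufhash[buf_hash(vp, blkno)];
    for (struct buf *bp = head->b_forw; bp != head; bp = bp->b_forw)
        if (bp->b_vp == vp && bp->b_blkno == blkno)
            return bp;
    return NULL;   /* not cached: the caller would take a buf from the free list */
}
```

Making the bucket head a list node keeps the end-of-chain test to a single pointer comparison, which matches the "wraps back to beginning" wording in step 6.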

Page 11: Buffer Caches

Getting a Buffer

[Figure: call diagram involving bread(), getblk(), getnewbuf(), allocbuf() and VOP_STRATEGY().]

Page 12: Buffer Caches

UBC - Unified Buffer Cache (1)

• Motivation
– File Systems and Virtual Memory (Process Management) compete for physical memory.
– UBC unifies previously separate pools of physical memory.
– Available memory can be used by File Systems (UBC) or VM on a first-come, first-served basis.
– VM can memory map a file using the same memory object as UBC.

• Utilizes memory from the available pool
– vm_page_queue_free
– vm_page_array

Page 13: Buffer Caches

Unified Buffer Cache (2)

• Uses memory objects of type OT_UBC
– includes a pointer to a vnode
– associates cached pages with a specific file
– accessed by
  • a file system looking for cached data
  • memory management on pagefault for an mmap'd file

• Utilizes lists:
– vm_page_buckets to find vm_pages belonging to an object
– ubc_lru to time order when pages were cached

Page 14: Buffer Caches

UBC Memory Object (OT_UBC)

[Figure: layout of struct vm_ubc_object.
Common object fields: ob_ref_count, ob_res_count, ob_size, ob_resident_pages, ob_flags, ob_memq, <lock>, ob_ops = u_anon_oop, ob_type.
UBC-specific fields: vu_cleanpl, vu_cleanwpl, vu_dirtywpl, vu_ops, vu_vfp, vu_wirecnt, vu_object, vu_nsequential, vu_loffset, vu_stamp, vu_seglock, vu_seglist, vu_pshared, vu_freelists.
Structures referenced: vm_object_ops, vm_page, vfs_ubcops.]

Page 15: Buffer Caches

UBC LRU Page Queue

• Least recently used list of UBC pages
– One per memory affinity domain

• vm_mads[N].md_ubc.ubc_lru

• Each is a struct vm_page
– vm_page -> vm_ubc_object -> vnode

• For each vnode's VM object,
– clean page list
– clean wired page list
– dirty page list
– dirty wired page list

Page 16: Buffer Caches

UBC Routines (1)

Routine Function

ubc_object_allocate() Allocates a vm_ubc_object if the vnode is a regular type and one has not already been allocated.

ubc_object_free() Frees the vm_ubc_object when the vnode is about to be reused.

ubc_page_lookup() Looks up the page at the specified offset and specified vm_vp_object.

ubc_incore() Looks for resident pages in the specified range.

ubc_page_alloc() Allocates a page or returns a found page in the page hash list.

ubc_page_release() Releases a page to the UBC LRU list or system memory if possible.

Page 17: Buffer Caches

UBC Routines (2)

Routine Function

ubc_lookup() Performs a hash search lookup on the page at the specified offset. If found, removes the page from the ubc_lru list and holds it.

ubc_page_dirty() Transitions a page from the vnode's clean page list to its dirty page list.

ubc_msync() Called for mmap to free all clean pages and write all dirty pages.

ubc_invalidate() Invalidates some (or all) resident pages for a vnode.

ubc_flush_dirty() Starts I/O on all dirty pages for a vnode. Does not wait for I/O completion if flag B_ASYNC is used.

Page 18: Buffer Caches

UBC Routines (3)

Routine Function

ubc_dirty_kluster() Creates a list of sorted pages for a vnode. Assumes pages are scheduled for writing.

ubc_bufalloc() Allocates a buf structure.

ubc_sync_iodone() Waits for synchronous I/O transfer to complete, then frees buf and pages.

ubc_async_iodone_lwc() Called as LWC when asynchronous I/O transfer completes.

Page 19: Buffer Caches

File System and VM Routines

[Figure: layered view of the read/write and mmap paths.
System call: read(), write()
VFS: VOP_READ, VOP_WRITE
File system: ufs_read(), ufs_write(), with uiomove()
UBC resident page management: ufs_getpage() returns a VM page
mmap: page fault handler
I/O]

Page 20: Buffer Caches

Finding a UBC page from a file system

VOP_READ(vnode, ...)

ufs_read(vnode, ...)

ufs_getpage(vnode, ...)

ufs_getapage(vnode,...)

ubc_lookup(vnode, ...)

vm_page_lookup(mem_obj, ..)

Page 21: Buffer Caches

Limiting UBC

• ubc_dirty_thread
– Calls ubc_memory_flushdirty
  • Launders excessive dirty pages via calls to FSOP_PUTPAGE()

• vm_pageout thread (pageout daemon)
– Runs vm_pageout_loop()
– When the number of free pages is low and UBC has borrowed too many pages,
  • UBC pages are reclaimed off ubc_lru

• If no free pages, vm_page_alloc() may also come to ubc_lru.

Page 22: Buffer Caches

ubc_memory_purge() Flow

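[Flowchart. The arrows did not survive extraction; the order below is the most likely reading of the boxes.
1. Start.
2. Get a page from ubc_lru.
3. Referenced bit on? If yes, turn the bit off, move the page to the tail of ubc_lru, and return to step 2.
4. Dirty? If yes, move the page from the vm_vp_object dirty list to the clean list and write the page out (VOP_PUTPAGE()) asynchronously.
5. If the page is neither referenced nor dirty, free the page.
6. Freed enough? If no, return to step 2. If yes, stop.]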
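One reading of this flowchart is a second-chance scan of ubc_lru: referenced pages are spared once, dirty pages are laundered, and clean unreferenced pages are freed until enough have been reclaimed. The sketch below models that reading in plain C. struct page, struct lru, lru_append(), lru_pop() and purge() are hypothetical names, the max_scan bound is added only so the model always terminates, and returning a laundered page to the tail of the queue is an assumption; the kernel's handling after the asynchronous write is not shown on the slide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced page record; not the real struct vm_page. */
struct page {
    struct page *next;   /* next page on the LRU queue */
    bool referenced;     /* stand-in for the pmap referenced bit */
    bool dirty;          /* page is on its object's dirty list */
    bool io_started;     /* set when an asynchronous write is queued */
    bool freed;          /* set when the page is handed back */
};

struct lru { struct page *head, *tail; };

/* Add a page at the tail (most recently used end) of the queue. */
static void lru_append(struct lru *q, struct page *p)
{
    p->next = NULL;
    if (q->tail) q->tail->next = p; else q->head = p;
    q->tail = p;
}

/* Remove and return the page at the head (least recently used end). */
static struct page *lru_pop(struct lru *q)
{
    struct page *p = q->head;
    if (p) {
        q->head = p->next;
        if (!q->head) q->tail = NULL;
        p->next = NULL;
    }
    return p;
}

/*
 * Scan from the head of the queue until `want` pages are freed or
 * `max_scan` pages have been examined. Returns the number freed.
 */
static int purge(struct lru *q, int want, int max_scan)
{
    int freed = 0;
    while (freed < want && max_scan-- > 0) {
        struct page *p = lru_pop(q);
        if (!p)
            break;
        if (p->referenced) {        /* second chance: clear bit, move to tail */
            p->referenced = false;
            lru_append(q, p);
        } else if (p->dirty) {      /* launder: dirty list -> clean list, async write */
            p->dirty = false;
            p->io_started = true;
            lru_append(q, p);       /* assumption: revisit on a later pass */
        } else {
            p->freed = true;        /* clean and unreferenced: free it */
            freed++;
        }
    }
    return freed;
}
```

With a queue of one referenced page, one dirty page and two clean pages, a request to free two pages spares the first, starts a write on the second and frees the last two.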

Page 23: Buffer Caches

Limiting the Amount of Dirty Data in UBC

• UBC limits the percentage of its cached data that is modified
– improves performance by spreading out the I/O load
– minimizes loss of data if the system crashes

• Managed by a separate kernel daemon thread

Page 24: Buffer Caches

ubc_dirty_thread_loop() Flow

[Flowchart. The arrows did not survive extraction. Boxes: Start; Sleep on timer; Too many dirty pages? (Yes/No); Get ubc_lru page; Dirty? (Yes/No); Move page from vm_vp_object dirty list to clean list; Write the page out (VOP_PUTPAGE()) asynchronously; Remove page from ubc_lru; Too many dirty pages? (Yes/No).]

Page 25: Buffer Caches

UBC Parameters and Thresholds (1)

Field Description

ubc_pages Count of UBC pages.

ubc_minpages Smallest number of pages UBC will shrink to. ubc_minpages = (vm_managed_pages * ubc_minpercent)/100 where ubc_minpercent is tunable (Default =10).

ubc_maxpages Upper limit of size of UBC. ubc_maxpages = (vm_managed_pages * ubc_maxpercent)/100 where ubc_maxpercent is tunable (Default = 100).

ubc_lru_count Number of pages on the UBC LRU queue.

ubc_dirty_limit Determines if UBC should flush and free dirty pages. ubc_dirty_limit=MAX(ubc_min_dirtypages, ((vm_tune_value(ubcdirtypercent) * ubc_pages)/100)) where ubcdirtypercent is tunable (Default =10).
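The formulas on this slide can be checked with a few lines of C. The sketch below only restates the arithmetic; struct ubc_tunables and the function names are hypothetical wrappers, and the value used for ubc_min_dirtypages in any call is an assumption because the slide does not give one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bundle of the tunables named on the slide. */
struct ubc_tunables {
    long ubc_minpercent;      /* slide default: 10 */
    long ubc_maxpercent;      /* slide default: 100 */
    long ubcdirtypercent;     /* slide default: 10 */
    long ubc_min_dirtypages;  /* floor for the dirty limit; value not given on the slide */
};

/* ubc_minpages = (vm_managed_pages * ubc_minpercent) / 100 */
static long calc_ubc_minpages(long vm_managed_pages, const struct ubc_tunables *t)
{
    assert(t != NULL);
    return (vm_managed_pages * t->ubc_minpercent) / 100;
}

/* ubc_maxpages = (vm_managed_pages * ubc_maxpercent) / 100 */
static long calc_ubc_maxpages(long vm_managed_pages, const struct ubc_tunables *t)
{
    assert(t != NULL);
    return (vm_managed_pages * t->ubc_maxpercent) / 100;
}

/* ubc_dirty_limit = MAX(ubc_min_dirtypages, (ubcdirtypercent * ubc_pages) / 100) */
static long calc_ubc_dirty_limit(long ubc_pages, const struct ubc_tunables *t)
{
    assert(t != NULL);
    long pct = (t->ubcdirtypercent * ubc_pages) / 100;
    return pct > t->ubc_min_dirtypages ? pct : t->ubc_min_dirtypages;
}
```

For example, with 131072 managed pages and the slide defaults, the minimum works out to 13107 pages and the maximum to all 131072. The floor only matters while the UBC is small: with an assumed floor of 256 pages, a 1000-page cache gets a dirty limit of 256 rather than 100.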

Page 26: Buffer Caches

UBC Parameters and Thresholds (2)

Field Description

ubc_dirty_pages UBC pages currently dirty; tracked by system.

ubc_borrowlimit Number of pages UBC can have. If ubc_pages > ubc_borrowlimit then UBC is asked to free pages. ubc_borrowlimit = (ubc_borrowpercent * vm_managed_pages)/100 where ubc_borrowpercent is 10 by default.

vm_perf.vpf_ubchit Rate of UBC pages transitioning to the tail of the UBC LRU list because a pmap_is_referenced returned TRUE.

vm_perf.vpf_ubcalloc Rate of UBC page allocation

vm_perf.vpf_ubcpagepushes Rate of pages being evicted from the UBC because of memory reclamation activity.

vm_free_count Current count of free pages.
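The borrow limit test described in this table reduces to one comparison. A minimal sketch, with hypothetical function names and the page counts passed in as plain arguments:

```c
#include <assert.h>
#include <stdbool.h>

/* ubc_borrowlimit = (ubc_borrowpercent * vm_managed_pages) / 100 */
static long calc_ubc_borrowlimit(long vm_managed_pages, long ubc_borrowpercent)
{
    assert(ubc_borrowpercent >= 0 && ubc_borrowpercent <= 100);
    return (ubc_borrowpercent * vm_managed_pages) / 100;
}

/* True when ubc_pages exceeds the borrow limit, i.e. UBC would be asked to free pages. */
static bool ubc_over_borrowlimit(long ubc_pages, long vm_managed_pages,
                                 long ubc_borrowpercent)
{
    return ubc_pages > calc_ubc_borrowlimit(vm_managed_pages, ubc_borrowpercent);
}
```

The comparison is strict, so a cache sitting exactly at the limit is left alone.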

Page 27: Buffer Caches

Source Reference (1 of 4)

Buf Cache

• kernel/sys/buf.h
– definition of struct buf

• kernel/vfs/vfs_bio.c
– bfreelist[], bufhash and buf routines (bread() etc.)

Page 28: Buffer Caches

Source Reference (2 of 4)

UBC

• kernel/vm/vm_page.h
– definitions of vm_page, vm_page_array

• kernel/vm/vm_resident.c
– definition of vm_page_bucket hashing array

• kernel/vfs/vfs_ubc.c
– definition of ubc lru list

• kernel/vm/vm_ubc.h
– definition of vm_ubc_object

• kernel/vfs/vfs_ubc.c
– implementation of the UBC interface routines

Page 29: Buffer Caches

Source Reference (3 of 4)

Reading Data From a UBC Cached UFS File

• kernel/ufs/ufs_vnops.c
– ufs_read(), ufs_getpage(), ufs_getapage()

• kernel/vfs/vfs_ubc.c
– ubc_lookup()

• kernel/vm/vm_resident.c
– vm_page_lookup()

Page 30: Buffer Caches

Source Reference (4 of 4)

Pagefaulting on a UBC MMAPed Page

• kernel/arch/alpha/locore.s
– XentMM

• kernel/arch/alpha/trap.c
– trap()

• kernel/vm/vm_fault.c
– vm_fault()

• kernel/vm/vm_umap.c
– u_map_fault()

• kernel/vm/u_mape_vp.c
– u_vp_fault()