TRANSCRIPT
CPS110: I/O and file systems
Landon Cox
April 8, 2008
Virtual/physical interfaces
[Figure: Applications on top of the OS on top of the Hardware]
Multiple updates and reliability
Reliability is only an issue in file systems
- Don't care about losing an address space after a crash
- Your files shouldn't disappear after a crash
- Files should be permanent
Multi-step updates cause problems
- Can crash in the middle
Multi-step updates
Transfer $100 from Melissa's account to mine
1. Deduct $100 from Melissa's account
2. Add $100 to my account
Crash between 1 and 2, and we lose $100
Multiple updates and reliability
Seems obvious, right? No modern OS would make this mistake, right?
Video evidence suggests otherwise
- Directory with 3 files
- Want to move them to an external drive
- Drive "fails" during the move
- Don't want to lose data due to the failure
Roll film …
The lesson?
1. Building an OS is hard.
2. OSes are software, not religions (one is not morally superior to another).
3. "Fanboys are stupid, but you are not." (Anil Dash, dashes.com/anil)
Multi-step updates
Back to business … Move a file from one directory to another
1. Delete from the old directory
2. Add to the new directory
Crash between 1 and 2, and we lose the file
("/home/lpcox/names" → "/home/chase/names")
Multi-step updates
Create an empty new file
1. Point the directory to the new file header
2. Create the new file header
What happens if we crash between 1 and 2?
- Directory will point to an uninitialized header
- Kernel will crash if you try to access it
How do we fix this? Re-order the writes.
Multi-step updates
Create an empty new file
1. Create the new file header
2. Point the directory to the new file header
What happens if we crash between 1 and 2?
- File doesn't exist
- File system won't point to garbage
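A minimal user-level sketch of this ordering in C, using two ordinary files (header.blk and dir.blk, invented names) as stand-ins for the header and directory blocks; fsync() plays the role of the "this write is on disk before we continue" barrier:

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write data to path and force it to "disk" before returning. */
static void ordered_write(const char *path, const char *data) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return;
    if (write(fd, data, strlen(data)) < 0)
        perror("write");
    fsync(fd);   /* barrier: durable before the next step begins */
    close(fd);
}

int main(void) {
    /* Safe order: the header exists before anything points to it.
     * A crash between the two calls leaves an unreachable header,
     * never a directory entry that points at garbage. */
    ordered_write("header.blk", "file header: size=0, blocks=[]\n");
    ordered_write("dir.blk", "names -> header.blk\n");
    return 0;
}
```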
Multi-step updates
What if we also have to update a map of free blocks?
1. Create the new file header
2. Point the directory to the new file header
3. Update the free block map
Does this work?
- Bad if we crash between 2 and 3
- Free block map will still think the new file header is free
Multi-step updates
What if we also have to update a map of free blocks?
1. Create the new file header
2. Update the free block map
3. Point the directory to the new file header
Does this work?
- Better, but still bad if we crash between 2 and 3
- Leads to a disk block leak
- Could scan the disk after a crash to recompute the free map
- Older versions of Unix and Windows do this (now we have journaling file systems …)
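A toy sketch of that recovery scan, assuming a made-up 8-block disk: reachability is whatever a post-crash walk of the file system tree found, and the free map is simply recomputed from it, so leaked blocks are reclaimed no matter what the stale map said:

```c
#include <stdio.h>

#define NBLOCKS 8

int main(void) {
    /* 1 = block referenced by some file header or directory,
     * as discovered by a post-crash walk of the file system tree
     * (the values here are made up for illustration). */
    int reachable[NBLOCKS] = { 1, 1, 0, 1, 0, 0, 1, 0 };
    int free_map[NBLOCKS];

    /* A block is free exactly when nothing reachable points to it. */
    for (int b = 0; b < NBLOCKS; b++)
        free_map[b] = !reachable[b];

    for (int b = 0; b < NBLOCKS; b++)
        printf("block %d: %s\n", b, free_map[b] ? "free" : "in use");
    return 0;
}
```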
Careful ordering is limited
Transfer $100 from Melissa's account to mine
1. Deduct $100 from Melissa's account
2. Add $100 to my account
Crash between 1 and 2, and we lose $100
Could reverse the ordering
1. Add $100 to my account
2. Deduct $100 from Melissa's account
Crash between 1 and 2, and we gain $100
What does this remind you of?
Atomic actions
A lot like pre-emptions in a critical section
- Race conditions allow threads to see data in an inconsistent state
- Crashes allow the OS to see the file system in an inconsistent state
Want to make actions atomic
- All operations are applied or none are
Atomic actions
With threads, we built larger atomic operations (lock, wait, signal) using atomic hardware operations (test-and-set, interrupt enable/disable)
Same idea for persistent storage: transactions
Transactions
Fundamental to databases (except MySQL, until recently)
Several important properties: "ACID" (atomicity, consistency, isolation, durability)
We only care about atomicity (all or nothing)
BEGIN
  disk write 1
  …
  disk write n
END
Ending the transaction is called "committing" it
Transactions
Basic atomic unit provided by the hardware
- Writing a single disk sector/block (not actually atomic, but close)
How do we make sequences of updates atomic?
Two styles: shadowing and logging
Transactions: shadowing
Easy to explain, not widely used
1. Update a copy version
2. Switch the pointer to the new version
Each individual step is atomic
- Step 2 commits the transaction
Why doesn't anyone do this? It doubles the storage overhead.
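At user level, the same two-step pattern survives as the classic write-temp-then-rename trick. A sketch with invented file names, where POSIX rename() plays the role of the atomic pointer switch:

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    const char *data = "new version of the file\n";

    /* 1. Update a copy (the shadow), leaving the original untouched. */
    int fd = open("data.tmp", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return 1;
    if (write(fd, data, strlen(data)) < 0)
        perror("write");
    fsync(fd);   /* the shadow must be durable before the switch */
    close(fd);

    /* 2. Switch the pointer: rename() atomically replaces "data",
     * so a reader sees either the old version or the new one,
     * never a half-written mix.  This step commits the update. */
    if (rename("data.tmp", "data") < 0)
        perror("rename");
    return 0;
}
```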
Transactions: logging
1. Begin transaction
2. Append info about modifications to a log
3. Append "commit" to the log to end the transaction
4. Write new data to the normal database
A single-sector write commits the transaction (step 3)
Invariant: append new data to the log before applying it to the database
Called "write-ahead logging"
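A minimal sketch of the write path, assuming an invented text record format ("W block data" followed by "COMMIT") and a log file named fs.log:

```c
#include <stdio.h>
#include <unistd.h>

int main(void) {
    FILE *log = fopen("fs.log", "a");
    if (!log)
        return 1;

    /* 1-2. Append the intended modifications to the log first
     * (write-ahead: nothing touches the database yet). */
    fprintf(log, "W 17 new-file-header\n");
    fprintf(log, "W 42 new-directory-entry\n");
    fflush(log);
    fsync(fileno(log));   /* records durable before the commit */

    /* 3. One small appended record commits the whole transaction. */
    fprintf(log, "COMMIT\n");
    fflush(log);
    fsync(fileno(log));
    fclose(log);

    /* 4. Only now is it safe to write the new data to its home
     * location (the normal file system blocks). */
    return 0;
}
```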
Transactions: logging
(Same four steps as above.)
What if we crash between steps 3 and 4 (committed, but not yet applied to the database)?
- On reboot, reapply committed updates in log order
Transactions: logging
(Same four steps as above.)
What if we crash before the "commit" record reaches the log?
- On reboot, discard uncommitted updates
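A matching recovery sketch over the same invented log format: records followed by a COMMIT are reapplied in log order, while a trailing run with no COMMIT is simply dropped:

```c
#include <stdio.h>
#include <string.h>

#define MAXPENDING 64

int main(void) {
    FILE *log = fopen("fs.log", "r");
    if (!log)
        return 1;

    char line[128], pending[MAXPENDING][128];
    int npending = 0;

    while (fgets(line, sizeof line, log)) {
        if (strncmp(line, "COMMIT", 6) == 0) {
            /* Committed: reapply the buffered writes in log order. */
            for (int i = 0; i < npending; i++)
                printf("redo: %s", pending[i]);
            npending = 0;
        } else if (npending < MAXPENDING) {
            strcpy(pending[npending++], line);  /* buffer until commit */
        }
    }
    /* Records still buffered here were never committed: discard them. */
    fclose(log);
    return 0;
}
```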
Transactions
Most file systems use transactions to modify meta-data
Why not use them for data?
- Related updates are program-specific
- Would have to modify programs
- OS doesn't know how to group updates
Virtual/physical interfaces
[Figure: Applications on top of the OS on top of the Hardware]
Naming and directories
How do we locate a file?
- Need to tell the OS which file header
- Use a symbolic name or click on an icon
Could also describe the contents
- Like Google Desktop and Mac Spotlight
- Naming in databases works this way
Name translation
User-specified file name → on-disk location
- Lots of possible data structures (hash table, list of pairs, tree, etc.)
Once you have the header, the rest is easy
[Figure: the name "/home/lpcox/" goes through the FS translation data (a directory) to reach the file's on-disk location]
Directories
Directory: contains a mapping for a set of files
- Name → file header's disk block #
- Often just a table of <file name, disk block #> pairs
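A sketch of such a directory as a flat table with linear-scan lookup; the entry names and block numbers are made up:

```c
#include <stdio.h>
#include <string.h>

/* One directory entry: a file name and its header's disk block #. */
struct dir_entry {
    char name[28];
    int header_block;
};

static struct dir_entry dir[] = {
    { "names",  117 },
    { "grades", 204 },
};

/* Linear scan of the table; returns -1 if the name is absent. */
static int dir_lookup(const char *name) {
    for (size_t i = 0; i < sizeof dir / sizeof dir[0]; i++)
        if (strcmp(dir[i].name, name) == 0)
            return dir[i].header_block;
    return -1;
}

int main(void) {
    printf("names -> header at block %d\n", dir_lookup("names"));
    return 0;
}
```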
Directories
Stored on disk along with the actual files
Disk stores lots of kinds of data
- End-user data (i.e., data blocks)
- Meta-data that describes end-user data (inodes, directories, indirect blocks)
Can often treat files and directories the same
- Both can use the same storage structure (e.g., multi-level indexed files)
- Allows directories to be larger than one block
Directories
Directories can point to each other
- Name → file header's disk block #
- Name → directory header's disk block #
Can users read/write directories like files?
- OS has no control over the content of data blocks
- OS must control the content of meta-data blocks
Why? The OS interprets meta-data, so it must be well-formatted
Directories
Users still have to modify directories (e.g., create and delete files)
How do we control these updates? Where else have we seen this issue?
- Use a narrow interface (e.g., createfile())
- Like using system calls to modify kernel state
Directories
Typically a hierarchical structure
- A directory has mappings to files and directories
Example: /lpcox/cps110/names
- / is the root directory
- /lpcox is a directory within the / directory
- /lpcox/cps110 is a directory within the /lpcox directory
How many disk I/Os?
Read the first byte of /lpcox/cps110/names?
1. Read file header for / (at a fixed spot on disk)
2. Read first data block for /
3. Read file header for /lpcox
4. Read first data block for /lpcox
5. Read file header for /lpcox/cps110
6. Read first data block for /lpcox/cps110
7. Read file header for /lpcox/cps110/names
8. Read first data block for /lpcox/cps110/names
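A sketch that narrates this walk; read_header() and read_data() are hypothetical stand-ins that just count disk reads, two per path component:

```c
#include <stdio.h>
#include <string.h>

static int ios = 0;

/* Stand-ins for real disk reads: they only count and narrate I/Os. */
static void read_header(const char *what) {
    printf("%d. read file header for %s\n", ++ios, what);
}
static void read_data(const char *what) {
    printf("%d. read first data block for %s\n", ++ios, what);
}

int main(void) {
    char path[] = "/lpcox/cps110/names";

    read_header("/");   /* root header lives at a fixed disk address */
    read_data("/");     /* root's data block names its children */

    /* Each component costs two more reads: its header, then its data. */
    for (char *c = strtok(path, "/"); c; c = strtok(NULL, "/")) {
        read_header(c);
        read_data(c);
    }
    printf("total: %d disk I/Os\n", ios);   /* 8 for this path */
    return 0;
}
```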
How many disk I/Os?
Caching is the only way to make this efficient
- If the file header block # of /lpcox/cps110 is cached, we can skip steps 1-4
Current working directory
- Allows users to specify file names instead of full paths
- Allows the system to avoid walking from the root on each access
- Eliminates steps 1-4
Mounting multiple disks
Can easily combine multiple disks
Basic idea:
- Each disk has a file system (a / directory)
- An entry in one directory can point to the root of another FS
- This is called a "mount point"
For example, on my machine crocus
- /bin is part of one local disk
- /tmp is part of another local disk
- /afs is part of the distributed file system AFS
Mounting multiple disks
Requires a new mapping type
- Name → file header's disk block #
- Name → directory header's disk block #
- Name → device name, file system type
Use drivers to walk multiple file systems
- Windows: disks visible under MyComputer/{C,D,E}
- Unix: disks visible anywhere in the tree
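A sketch of the new mapping as a small mount table matched by path prefix; the entries mirror the crocus example above, and the device names are illustrative:

```c
#include <stdio.h>
#include <string.h>

/* Path prefix -> device; longest prefixes listed first so "/"
 * only matches as a last resort. */
struct mount {
    const char *prefix;
    const char *device;
};

static struct mount mounts[] = {
    { "/afs", "AFS (distributed file system)" },
    { "/tmp", "local disk 2" },
    { "/",    "local disk 1" },
};

static const char *resolve(const char *path) {
    for (size_t i = 0; i < sizeof mounts / sizeof mounts[0]; i++)
        if (strncmp(path, mounts[i].prefix,
                    strlen(mounts[i].prefix)) == 0)
            return mounts[i].device;
    return "unknown";
}

int main(void) {
    printf("/bin/ls  -> %s\n", resolve("/bin/ls"));
    printf("/afs/cs  -> %s\n", resolve("/afs/cs"));
    printf("/tmp/foo -> %s\n", resolve("/tmp/foo"));
    return 0;
}
```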
Course administration
Project 3: out today, due in two weeks
- Amre is the expert on these
- I can help, but he is the one in charge
Discussion sections
- Will focus on Project 3
- Will be hard to do P3 without them
Virtual/physical interfaces
[Figure: Applications on top of the OS on top of the Hardware]
File caching
File systems have lots of data structures
- Data blocks
- Meta-data blocks: directories, file headers, indirect blocks, bitmap of free blocks
Accessing all of these on disk is really slow
File caching
Caching is the main thing that improves performance
- Random I/O is much slower than sequential I/O
- Accessing disk is much slower than accessing memory
Should the file cache be in virtual or physical memory?
- Should put it in physical memory
- Could put it in virtual memory, but it might get paged out
- Worst case: each file is on disk twice
Could also use memory-mapped files
- This is what Windows does
Memory-mapped files
Basic idea: use the VM system to cache files
- Map file content into the virtual address space
- Set the backing store of the region to the file
- Can now access the file using load/store
When memory is paged out
- Updates go back to the file instead of swap space
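A minimal example using the real POSIX mmap() interface; the file name "data" is an assumed existing, non-empty file:

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("data", O_RDWR);
    if (fd < 0)
        return 1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0)
        return 1;

    /* Map the file; the file itself is the region's backing store. */
    char *p = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return 1;

    putchar(p[0]);   /* load: may page-fault the file contents in */
    p[0] = 'X';      /* store: dirties the page */

    /* Dirty pages are written back to the file, not to swap space. */
    munmap(p, st.st_size);
    close(fd);
    return 0;
}
```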
File Read in Windows NT
[Figure: a user-mode application calls Read(file,b). In kernel mode, the I/O Manager sends an IRP(file,b) to the FS Driver, which asks the Cache Manager to map <A,file> and copy A->b. If that copy page-faults on A, the VM Manager looks up <A,file> and issues NonCachedRead(file), which goes back through the I/O Manager, FS Driver, and Disk Driver to perform the disk I/O, copying file->A.]
File caching issues
Normal design considerations for caches
- Cache size, block size, replacement policy, etc.
Write-back or write-through?
- Write-through: writes go to disk immediately (slow, but loss of power won't lose updates)
- Write-back: writes are delayed for a period (fast, but loss of power can lose updates)
Most systems use a 30-second write-back delay
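A toy sketch of write-back behavior with an invented in-memory cache structure: repeated writes to the same block are absorbed, and nothing reaches "disk" until the flush (which a real system runs every ~30 seconds):

```c
#include <stdio.h>
#include <string.h>

#define NBLOCKS 4

/* In-memory copy of each block plus a dirty bit. */
static struct {
    char data[32];
    int dirty;
} cache[NBLOCKS];

/* Writes hit only the cache: fast, no disk I/O yet. */
static void cache_write(int b, const char *data) {
    strncpy(cache[b].data, data, sizeof cache[b].data - 1);
    cache[b].dirty = 1;
}

/* In a real system this runs on a periodic timer. */
static void flush(void) {
    for (int b = 0; b < NBLOCKS; b++)
        if (cache[b].dirty) {
            printf("write block %d to disk: %s\n", b, cache[b].data);
            cache[b].dirty = 0;
        }
}

int main(void) {
    cache_write(2, "hello");
    cache_write(2, "hello again");   /* absorbed: only one disk write */
    flush();                         /* a crash before this loses data */
    return 0;
}
```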
Distributed file systems
Distributed file system
- Data stored on many different computers
- One unified view of all the data
- Same interface to access local files and remote files
Distributed file systems
Examples: AFS, NFS, Samba, the web(?)
Why use them?
- Easier to share data among people
- Uniform view of files from different machines
- Easier to manage (backup, etc.)
Basic implementation: classic client/server model
[Figure: several clients talking to one server]
Going over the network for data makes performance bad. Cache at the client to improve it.
Caching
I'm sitting at crocus and want to access /usr/project/courses/cps110/bin/submit110, which is stored at linux.cs.duke.edu
What should happen to the file?
- Transfer the sole copy from server to client?
- Make a copy of the file instead (replication)
Caching
What happens if I modify the file?
- Other clients' copies will be out of date
All copies must be kept consistent
1. Invalidate all other copies
2. Update all other copies
Exam analogy: all exams must be the same
- Could have everyone update their copy (broadcast)
- Could make everyone throw out their copy (invalidation)
Caching
When do I give up my copy? (invalidation)
- Only when someone else modifies it
Cached file states
[Figure: state machine for a client's cached copy of a file, with states Invalid, Shared, and Exclusive. "I read the file" takes Invalid to Shared; "I write the file" takes the copy to Exclusive; "someone else writes the file" takes it back to Invalid.]
What else from 110 does this remind you of?
Reader-writer locks: allow parallel reads of shared data, but only one writer.
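A sketch of the same state machine as code, encoding the three states and the transitions shown in the diagram:

```c
#include <stdio.h>

enum state { INVALID, SHARED, EXCLUSIVE };
static const char *name[] = { "Invalid", "Shared", "Exclusive" };

/* My read needs a valid copy: Invalid becomes Shared. */
static enum state my_read(enum state s) {
    return s == INVALID ? SHARED : s;
}

/* My write makes my copy the only up-to-date one: Exclusive. */
static enum state my_write(enum state s) {
    (void)s;
    return EXCLUSIVE;
}

/* Someone else's write makes my copy stale: Invalid. */
static enum state other_write(enum state s) {
    (void)s;
    return INVALID;
}

int main(void) {
    enum state s = INVALID;
    s = my_read(s);      printf("after my read:           %s\n", name[s]);
    s = my_write(s);     printf("after my write:          %s\n", name[s]);
    s = other_write(s);  printf("after someone's write:   %s\n", name[s]);
    return 0;
}
```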