files and file systems 1 cs502 spring 2006 files and file systems cs502 – operating systems spring...

39
Files and File Systems 1 CS502 Spring 2006 Files and File Systems CS502 – Operating Systems Spring 2006

Post on 20-Dec-2015

230 views

Category:

Documents


3 download

TRANSCRIPT

Files and File Systems 1CS502 Spring 2006

Files and File Systems

CS502 – Operating Systems

Spring 2006

Files and File Systems 2CS502 Spring 2006

File – an abstraction

• A (potentially) large amount of information or data that lives a (potentially) very long time

• Often much larger than the memory of the computer• Often much longer than any computation

• (Usually) organized as a linear array of bytes or blocks

• Internal structure is imposed by application• (Occasionally) blocks may be variable length

• (Often) requiring concurrent access by multiple processes

• Even by processes on different machines!

Files and File Systems 3CS502 Spring 2006

File Systems and Disks

• User view– File is a named persistent collection of data

• OS & file system view– File is collection of disk blocks

– File System maps file names and offsets to disk blocks

• Outline for this section– Files

– Directories

– Implementation of Files

– Implementation of Directories

Files and File Systems 4CS502 Spring 2006

Fundamental ambiguity

• Is the file the “container of the information” or the “information” itself?

• Almost all systems confuse the two.

• Almost all people confuse the two.

Files and File Systems 5CS502 Spring 2006

Example – Suppose that you e-mail me a document

• Later, how do either of us know that we are using the same version of the document?

• Windows/Outlook/Exchange:• Time-stamp is a pretty good indication that they are• Time-stamps preserved on copy, drag and drop,

transmission via e-mail, etc.

• Unix• By default, time-stamps not preserved on copy, ftp,

e-mail, etc.• Time-stamp associated with container, not with

information

Files and File Systems 6CS502 Spring 2006

Rule of Thumb

• Almost always, application (and user) thinks in terms of the information

• Most systems think in terms of containers

Professional Guidance: Be aware of the distinction, even when the system is not

Files and File Systems 7CS502 Spring 2006

Attributes of Files

• Name:– Although the name may not

be what you think it is!

• Type:– May be encoded in the

name (e.g., .cpp, .txt)

• Dates:– Creation, updated, last

accessed, etc.

– (Usually) associated with container

• Size:– Length in number of bytes;

occasionally rounded up

• Protection:– Owner, group, etc.

– Authority to read, update, extend, etc.

• Locks:– For managing concurrent

access

• …

Files and File Systems 8CS502 Spring 2006

File Metadata

• Attributes of the file maintained by the file system– Separate from file itself– Possibly attached to the file– Used by OS in variety of ways

Files and File Systems 9CS502 Spring 2006

File Types

Files and File Systems 10CS502 Spring 2006

Non-attribute of files – Location

• Example 1:mv ~lauer/project1.doc ~cs502/public_html/S06

• Example 2:– System moves file from disk block 10,000 to

disk block 20,000– System restores a file from backup

Files and File Systems 11CS502 Spring 2006

Attribute anomaly – Unix File Naming

ln ~lauer/project1.doc ~cs502/public_html/S06

ln ~lauer/project1.doc NewProject1.doc

• Unix hard links allow one file to have more than one name and/or location– The real name of a Unix file is its i-node – an internal

name known only to the OS

• Hard links are rarely used in modern Unix practice– Usually safe to regard last element of path as name

Files and File Systems 12CS502 Spring 2006

Operations on Files

• Open, Close• Gain or relinquish access to a file• OS returns a file handle – an internal data structure letting it

cache internal information needed for efficient file access

• Read, Write, Truncate• Read: return a sequence of n bytes from file• Write: replace n bytes in file, and/or append to end• Truncate: throw away all but the first n bytes of file

• Seek, Tell• Seek: reposition file pointer for subsequent reads and writes• Tell: get current file pointer

• Create, Delete:• Conjure up a new file; or blow away an existing one

Files and File Systems 13CS502 Spring 2006

File – a very powerful abstraction

• Documents, code

• Databases• Very large, possibly spanning multiple disks

• Streams• Input, output, keyboard, display

• Pipes

• Virtual memory backing store

• …

Files and File Systems 14CS502 Spring 2006

File Access Methods

• Sequential access– Read all bytes/records from the beginning

– Cannot jump around, could possibly rewind or back up

– Convenient when medium was magnetic tape

• Random access– Bytes/records read in any order

– Essential for data base systems

– Read can be …• move file pointer (seek), then read or …

• read and then move file marker

Files and File Systems 15CS502 Spring 2006

File Access Methods (continued)

• Keyed or indexed access – access items in file based on the contents of an (part of an) item in the file– Provided in older commercial operating

systems (IBM ISAM)– (Today) usually handled separately by database

applications in modern systems

Files and File Systems 16CS502 Spring 2006

Directory – A Special Kind of File

• A tool for users & applications to organize and find files

• User-friendly names

• Names that are meaningful over long periods of time

• The data structure for OS to locate files (i.e., containers) on disk

Files and File Systems 17CS502 Spring 2006

Directory structures

• Single level– One directory per system, one entry pointing to each file– Small, single-user or single-use systems

• PDA, cell phone, etc.

• Two-level– Single “master” directory per system– Each entry points to one single-level directory per user– Uncommon in modern operating systems

• Hierarchical– Any directory entry may point to

• Another directory• Individual file

– Used in most modern operating systems

Files and File Systems 18CS502 Spring 2006

Directory Organization – Hierarchical

Files and File Systems 19CS502 Spring 2006

Directory Considerations

• Efficiency – locating a file quickly.

• Naming – convenient to users.• Separate users can use same name for separate files.

• The same file can have different names for different users.

• Names need only be unique within a directory

• Grouping – logical grouping of files by properties

• e.g., all Java programs, all games, …

Files and File Systems 20CS502 Spring 2006

More on Hierarchical Directories

• Most systems support idea of current (working) directory– Absolute names – fully qualified from root of file system

• /usr/group/foo.c

– Relative names – specified with respect to working directory• foo.c

– A special name – the working directory itself• “.”

• Modified Hierarchical – Acyclic Graph (no loops) and General Graph– Allow directories and files to have multiple names– Links are file names (directory entries) that point to existing

(source) files

Files and File Systems 21CS502 Spring 2006

Links

• Hard links: bi-directional relationship between file names and file – A hard link is directory entry that points to a source file’s metadata– Metadata counts the number of hard links (including the source

file) – link reference count – Link reference count is decremented when a hard link is deleted– File data is deleted and space freed when the link reference count

is 0

• Symbolic (soft) links: uni-directional relationship between a file name and the file– Directory entry points to new metadata– Metadata points to the source file– If the source file is deleted, the link pointer is invalid

Files and File Systems 22CS502 Spring 2006

Path Name Translation

• Assume that I want to open “/home/lauer/foo.c”fd = open(“/home/lauer/foo.c”, O_RDWR);

• File system does the following– Opens directory “/” – the root directory is in a known place on disk– Search root directory for the directory home and get its location– Open home and search for the directory lauer and get its location– Open lauer and search for the file foo.c and get its location– Open the file foo.c– Note that the process needs the appropriate permissions

• Some file systems spend a lot of time walking down directory paths– This is why open calls are separate from other file operations– File System attempts to cache prefix lookups to speed up common

searches

Files and File Systems 23CS502 Spring 2006

Directory Operations

• Create:• Make a new directory

• Add, Delete entry:• Invoked by file create & destroy, directory create & destroy

• Find, List:• Search or enumerate directory entries

• Rename:• Change name of an entry without changing anything else about it

• Link, Unlink:• Add or remove entry pointing to another entry elsewhere• Introduces possibility of loops in directory graph

• Destroy:• Removes directory; must be empty

Files and File Systems 24CS502 Spring 2006

More on Directories

• Fundamental mechanism for interpreting file names in an operating system

• Orphan: a file not named in any directory• Cannot be opened by any application (or even OS)• (Often) does not even have name!

• Tools• FSCK – check & repair file system, find orphans• Delete_on_close – when number of links reaches zero

• Special directory entry: “..” parent in hierarchy• Essential for maintaining integrity of directory system• Useful for relative naming

Files and File Systems 25CS502 Spring 2006

Implementation of Files

• Map file abstraction to physical disk blocks

• Some goals– Efficient in time, space, use of disk resources– Fast enough for application requirements– Effective for a wide variety of file sizes

• Many small files (< 1 page)

• Huge files (100’s of gigabytes, terabytes, spanning disks)

• Everything in between

Files and File Systems 26CS502 Spring 2006

File Allocation Schemes

• Contiguous– Blocks of file stored in consecutive disk sectors– Directory points to first entry, others derived by math

• Linked– Blocks of file scatter across disk, as linked list– Directory points to first entry, follow link to find others

• Indexed– Separate index block contains pointers to file blocks– Directory points to index block, lookup all blocks

Files and File Systems 27CS502 Spring 2006

Contiguous Allocation

• Ideal for large, static files– Databases, fixed system structures, OS code– CD-ROM, DVD

• Simple address calculation– Block address sector address

• Fast multi-block reads and writes– Minimize seeks between blocks

• Prone to fragmentation when …• Files come and go• Files change size

– Similar to unpaged virtual memory

Files and File Systems 28CS502 Spring 2006

Bad Block Management –Contiguous Allocation

• Bad blocks must be concealed• Foul up the block-to-sector calculation

• Methods• Spare sectors in each track, remapped by formatting• Look-aside list of bad sectors

– Check each sector request against hash table– If present, substitute a replacement sector behind the

scene

• Handling• Disk controller, invisible to OS• Lower levels of OS file system, invisible to appl.

Files and File Systems 29CS502 Spring 2006

Contiguous Allocation – Extents

• Extent: a contiguously allocated subset of a file• Directory entry contains

– (For file with one extent) the extent itself– (For file with multiple extents) pointer to an extent

block describing multiple extents

• Advantages– Speed, ease of address calculation of contiguous file– Avoids (some of) the fragmentation issues– Can be extended to support files across multiple disks

• Disadvantages– Degenerates to indexed allocation in Unix-like systems

Files and File Systems 30CS502 Spring 2006

Linked Allocation

• Blocks scattered across disk– Each block contains

pointer to next block

– Directory points to first and last blocks

– Sector header block:• Pointer to next block

• ID and block number of file

Files and File Systems 31CS502 Spring 2006

Linked Allocation

• Advantages– No space fragmentation– Easy to create, extend files– Ideal for lots of small files

• Disadvantages– Lots of disk arm movement– Space taken up by links– Sequential access only!

Files and File Systems 32CS502 Spring 2006

Variation on Linked Allocation – File Allocation Table

• Instead of a link oneach block, put alllinks in one table– the FAT (File Allocation Table)– fixed place on disk

• One entry per physical block in disk– Directory points to first & last

blocks of file– Each block points to next block

(or EOF)

Files and File Systems 33CS502 Spring 2006

FAT File Systems

• Advantages– Advantages of Linked File System– FAT can be cached in memory– Searchable at CPU speeds, pseudo-random access

• Disadvantages– Limited size, not suitable for very large disks– FAT cache describes entire disk, not just open files!– Not fast enough for large databases

• Used in MS-DOS, early Windows systems

Files and File Systems 34CS502 Spring 2006

Disk Defragmentation

• Re-organize blocks in disk so that file is (mostly) contiguous

• Linked or FAT organization preserved

• Minimizes disk arm movement during sequential accesses

Files and File Systems 35CS502 Spring 2006

Bad Block Management –Linked and FAT Systems

• In OS:– format all sectors of disk• Don’t reserve any spare sectors

• Allocate bad blocks to a hidden file for the purpose

• If a block becomes bad, append to the hidden file

• Advantages• Very simple

• No look-aside or sector remapping needed

• Totally transparent without any hidden mechanism

Files and File Systems 36CS502 Spring 2006

Indexed Allocation

• i-node:– Part of file metadata– Data structure lists the

sector address of each block of file

• Advantages– True random access– Only i-nodes of open

files need to be cached– Supports small and

large files

Files and File Systems 37CS502 Spring 2006

Unix i-nodes – Part of File Metadata

• Direct blocks:– Pointers to first n

sectors

• Single indirect table:– Extra block containing

pointers to blocks n+1 .. n+m

• Double indirect table:– Extra block containing

single indirect blocks

• …

Files and File Systems 38CS502 Spring 2006

Indexed Allocation

• Access to every block of file is via i-node

• Bad block management– Same as Linked/FAT systems

• Disadvantage– Not as fast a contiguous allocation for large

databases

Files and File Systems 39CS502 Spring 2006

Next time

• Implementing Directories

• Other “disk” systems– CD-ROM– RAID (Redundant Array of Inexpensive Disks)

• Stable Storage

• Log File Systems