Transactions and Reliability
File system components
Disk management
Naming
Reliability
What are the reliability issues in file systems?
Security
File system reliability issues
File systems cache heavily to achieve higher performance, but machines crash all the time. How do we keep the file system consistent after a machine crashes?
Disks can crash. How do we keep the data even when a disk breaks?
Caching issue
File systems have lots of metadata: free blocks, directories, file headers, indirect blocks
Metadata is heavily cached for performance
Modifications to the metadata may not be on disk right away
Problem
System crashes: all cached data are lost!
The OS needs to ensure that the file system does not reach an inconsistent state
Example: move a file between directories
1. Remove the file from the old directory
2. Add the file to the new directory
What happens when a crash occurs in the middle?
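A minimal sketch of the question above, assuming directories are modeled as Python dictionaries from file name to file header number. The `Crash` exception and the `crash_between_steps` flag are hypothetical stand-ins for a real machine crash.

```python
class Crash(Exception):
    """Stands in for a machine crash (hypothetical, for illustration)."""


def move_file(name, old_dir, new_dir, crash_between_steps=False):
    """Move `name` using two separate metadata updates."""
    header = old_dir.pop(name)       # step 1: remove from the old directory
    if crash_between_steps:
        raise Crash("power lost after step 1")
    new_dir[name] = header           # step 2: add to the new directory


old_dir = {"report.txt": 7}   # file name -> file header number
new_dir = {}

try:
    move_file("report.txt", old_dir, new_dir, crash_between_steps=True)
except Crash:
    pass

# The file's blocks still exist, but no directory names it any more.
print("report.txt" in old_dir, "report.txt" in new_dir)
```

In this sketch the crash leaves the file in neither directory. Doing the two steps in the opposite order would instead leave it in both directories after a crash.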
UNIX File System (Ad Hoc Failure-Recovery)
Metadata handling: uses a synchronous write-through caching policy
A call to update metadata does not return until the changes are propagated to disk
Updates are ordered
When crashes occur, run fsck to repair in-progress operations
Some Examples of Metadata Handling
Undo effects not yet visible to users
If a new file is created, but not yet added to the directory--Delete the file
Continue effects that are visible to users
If file blocks are already allocated, but not recorded in the bitmap--Update the bitmap
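The two repair rules above can be sketched on a toy in-memory model. This is not how fsck is implemented; the `repair` function and its three data structures are assumptions made for illustration, and the model tracks only file data blocks.

```python
def repair(files, directory, bitmap):
    """Apply the two repair rules to toy in-memory metadata.

    files:     file number -> list of block numbers the file uses
    directory: file name -> file number
    bitmap:    set of block numbers marked as allocated
    """
    named = set(directory.values())

    # Rule 1: undo effects not yet visible to users.
    # A file that no directory entry names is deleted.
    for number in list(files):
        if number not in named:
            del files[number]

    # Rule 2: continue effects that are visible to users.
    # Rebuild the bitmap so every block of a surviving file is recorded.
    bitmap.clear()
    for blocks in files.values():
        bitmap.update(blocks)


files = {1: [10, 11], 2: [12]}    # file 2 was created but never named
directory = {"a.txt": 1}
bitmap = {10, 12}                 # block 11 was allocated but not recorded
repair(files, directory, bitmap)
print(files, sorted(bitmap))
```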
UFS User Data Handling
Uses a write-back policy
Modified blocks are written to disk at 30-second intervals--Unless a user issues the sync system call
Data updates are not ordered
In many cases, consistent metadata is good enough
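A minimal sketch of a write-back policy, assuming the disk is modeled as a dictionary from block number to contents. The `WriteBackCache` class is hypothetical; a real system would call `sync` from a periodic timer, while here it is called explicitly.

```python
class WriteBackCache:
    def __init__(self, disk):
        self.disk = disk      # block number -> data; stands in for the disk
        self.cache = {}       # in-memory copies of blocks
        self.dirty = set()    # blocks modified since the last sync

    def write(self, block, data):
        self.cache[block] = data
        self.dirty.add(block)          # returns without touching the disk

    def sync(self):
        # Set iteration order is not guaranteed: data updates are not ordered.
        for block in self.dirty:
            self.disk[block] = self.cache[block]
        self.dirty.clear()


disk = {}
cache = WriteBackCache(disk)
cache.write(3, b"new contents")
before_sync = 3 in disk       # False: a crash now would lose the write
cache.sync()
after_sync = 3 in disk        # True: the block has reached the disk
print(before_sync, after_sync)
```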
Current solution: the Transaction Approach (Journaling)
A transaction groups operations as a unit, with the following characteristics:
Atomic: all operations in a transaction either happen or they do not (no partial operations)
Serializable: transactions appear to happen one after the other
Durable: once a transaction happens, it is recoverable and can survive crashes
More on Transactions
A transaction is not done until it is committed
Once committed, a transaction is durable
If a transaction fails to complete, it must roll back as if it did not happen at all
Transaction Implementation (One Thread)
Example: money transfer
Begin transaction
x = x - $100;
y = y + $100;
Commit
Common implementations use a log: a journal that is never erased
A file system uses a write-ahead log to track all transactions: information is written to the log before it is written to its final location on disk
Once the changes to accounts x and y are in the log, the log is committed to disk in a single write
The actual changes to those accounts are made later
Transaction Illustrated
x = 1; y = 1;
x = 0; y = 2;
begin transaction
old x: 1
old y: 1
new x: 0
new y: 2
commit
Commit the log to disk before updating the actual values on disk
Transaction Steps
1. Mark the beginning of the transaction
2. Log the changes in account x
3. Log the changes in account y
4. Commit
5. Modify account x on disk
6. Modify account y on disk
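The six steps can be sketched as follows, assuming the log is a Python list and the disk is a dictionary of account values. The `transfer` function and the record layout are illustrative assumptions, not a real journal format.

```python
def transfer(log, disk, amount):
    """Move `amount` from account x to account y using a write-ahead log."""
    new_x = disk["x"] - amount
    new_y = disk["y"] + amount

    log.append(("begin",))                          # 1. mark the beginning
    log.append(("update", "x", disk["x"], new_x))   # 2. log old and new x
    log.append(("update", "y", disk["y"], new_y))   # 3. log old and new y
    log.append(("commit",))                         # 4. commit

    disk["x"] = new_x                               # 5. modify x on disk
    disk["y"] = new_y                               # 6. modify y on disk


disk = {"x": 1, "y": 1}
log = []
transfer(log, disk, 1)
print(log)
print(disk)
```

With x = 1 and y = 1 to start, the log holds the same old and new values as the illustrated transaction, and the accounts change only after the commit record is in the log.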
Scenarios of Crashes
If a crash occurs after the commit: replay the log to update the accounts
If a crash occurs before the commit: roll back and discard the transaction
A crash cannot occur during the commit: commit is built as an atomic operation, e.g., writing a single sector on disk
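A sketch of recovery under these scenarios, assuming a list-based log of hypothetical `("begin",)`, `("update", account, old, new)` and `("commit",)` records, and assuming accounts on disk are modified only after the commit record is written, so an uncommitted transaction has nothing on disk to undo.

```python
def recover(log, disk):
    """Bring `disk` to a consistent state after a crash, using the log."""
    pending = []
    for record in log:
        if record[0] == "begin":
            pending = []
        elif record[0] == "update":
            pending.append(record)
        elif record[0] == "commit":
            # Committed: replay. Writing a new value again is harmless
            # if it already reached the disk before the crash.
            for _, account, _old, new in pending:
                disk[account] = new
            pending = []
    # Updates still in `pending` have no commit record: discard them.
    return disk


committed = [("begin",), ("update", "x", 1, 0), ("update", "y", 1, 2), ("commit",)]
uncommitted = committed[:3]

# Crash after the commit, when only x had reached the disk.
after_commit = recover(committed, {"x": 0, "y": 1})
# Crash before the commit: the disk was never touched.
before_commit = recover(uncommitted, {"x": 1, "y": 1})
print(after_commit, before_commit)
```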
Two-Phase Locking (Multiple Threads)
Logging alone is not enough to prevent multiple transactions from trashing one another (not serializable)
Solution: two-phase locking
1. Acquire all locks
2. Perform updates and release all locks
Thread A cannot see thread B’s changes until thread B commits and releases locks
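A minimal sketch of the two phases using Python threads. The account names, amounts, and the `transfer` function are hypothetical, and acquiring the locks in sorted order is an added assumption to avoid deadlock, not something the slides specify.

```python
import threading

accounts = {"x": 100, "y": 100}
locks = {"x": threading.Lock(), "y": threading.Lock()}


def transfer(src, dst, amount):
    names = sorted([src, dst])       # fixed order (assumption) avoids deadlock
    # Phase 1: acquire all locks before touching any account.
    for name in names:
        locks[name].acquire()
    try:
        accounts[src] -= amount
        accounts[dst] += amount
    finally:
        # Phase 2: updates are done; release all locks.
        for name in reversed(names):
            locks[name].release()


threads = [threading.Thread(target=transfer, args=("x", "y", 1)) for _ in range(50)]
threads += [threading.Thread(target=transfer, args=("y", "x", 1)) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(accounts)
```

Because each transfer holds both locks for its whole update, no thread can observe one account changed and the other not.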
Transactions in File Systems
Many recent file systems use write-ahead logging: Windows NT, Solaris, ext3 (Linux), etc.
+ Eliminates running fsck after a crash
+ Write-ahead logging provides reliability
- All modifications need to be written twice
Log-Structured File System (LFS)
If logging is so great, why don’t we treat everything as log entries?
Log-structured file system
Everything is a log entry (file headers, directories, data blocks)
Write the log only once
Use version stamps to distinguish between old and new entries
More on LFS
New log entries are always appended to the end of the existing log
All writes are sequential
Seeks occur only during reads
Not so bad due to temporal locality and caching
Problem: need to create more contiguous space all the time
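A toy sketch of the idea, assuming the log is a Python list and version stamps are a simple counter. The `Log` class and its `clean` method are hypothetical; a real LFS cleaner works on disk segments and is considerably more involved.

```python
class Log:
    def __init__(self):
        self.entries = []    # the log; writes only ever append to it
        self.version = 0

    def write(self, name, data):
        self.version += 1
        self.entries.append((self.version, name, data))   # sequential write

    def read(self, name):
        # The entry with the highest version stamp is the current one.
        latest = None
        for version, entry_name, data in self.entries:
            if entry_name == name and (latest is None or version > latest[0]):
                latest = (version, data)
        return None if latest is None else latest[1]

    def clean(self):
        """Drop superseded entries to create free space."""
        newest = {}
        for version, name, data in self.entries:
            newest[name] = (version, name, data)   # later entries win
        self.entries = sorted(newest.values())


log = Log()
log.write("a", "v1")
log.write("b", "v1")
log.write("a", "v2")          # supersedes the first entry for "a"
size_before = len(log.entries)
log.clean()
size_after = len(log.entries)
print(log.read("a"), size_before, size_after)
```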
RAID: dealing with disk crashes
RAID: redundant array of independent disks
Standard way of organizing disks and classifying the reliability of multi-disk systems
General methods: data duplication, parity, and error-correcting codes (ECC)
Mirrored Disks (RAID Level 1)
Writes go to both disks
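+ Reliability is doubled (the data survives the loss of either disk)
+ Read access is faster (either disk can serve a read)
- Write access is slower (both disks must be written)
- Expensive and inefficient (half of the capacity holds copies)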
Memory-Style ECC (RAID Level 2)
Some disks in the array are used to hold ECC
Using Hamming codes as the ECC, correcting a one-bit error in a 4-bit data word requires 3 redundant bits
+ More efficient than mirroring
+ Can correct, not just detect, errors
- Still fairly inefficient, e.g., 4 data disks require 3 ECC disks
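The 4-data-bit, 3-redundant-bit case corresponds to the standard Hamming(7,4) code. The sketch below works on lists of bits rather than on disks; the function names `encode` and `correct` are illustrative.

```python
def encode(d):
    """d is a list of 4 data bits; returns 7 bits for positions 1..7."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4      # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4      # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4      # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def correct(word):
    """Fix at most one flipped bit and return the 4 data bits."""
    w = list(word)
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]
    s3 = w[3] ^ w[4] ^ w[5] ^ w[6]
    position = s1 + 2 * s2 + 4 * s3   # 0 means no error detected
    if position:
        w[position - 1] ^= 1          # flip the faulty bit back
    return [w[2], w[4], w[5], w[6]]


data = [1, 0, 1, 1]
word = encode(data)
word[4] ^= 1                          # simulate a one-bit error
print(correct(word) == data)
```

The three check bits together name the position of the flipped bit, which is why the code can correct the error and not just detect it.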
Bit-Interleaved Parity (RAID Level 3)
One disk in the array stores parity for the other disks
Enough to correct the error when the disk controller tells which disk has failed
+ More efficient than Levels 1 and 2
- Parity disk doesn’t add bandwidth
Parity Method
Disk 1: 1001
Disk 2: 0101
Disk 3: 1000
Parity: 0100 (even parity: in each bit position, the number of 1’s across the disks and the parity is even)
How to recover disk 2?
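One way to answer the question: even parity is the bitwise XOR of the data, so XOR-ing the surviving disks with the parity yields the missing disk. A short sketch using the values above (the helper name `parity` is illustrative):

```python
from functools import reduce


def parity(blocks):
    """Bitwise XOR of all blocks, i.e. even parity per bit position."""
    return reduce(lambda a, b: a ^ b, blocks)


disk1, disk2, disk3 = 0b1001, 0b0101, 0b1000
p = parity([disk1, disk2, disk3])

# Disk 2 fails: rebuild it from the surviving disks and the parity.
recovered = parity([disk1, disk3, p])
print(format(p, "04b"), format(recovered, "04b"))   # prints "0100 0101"
```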
Block-Interleaved Parity (RAID Level 4)
+ More efficient data access than level 3
- Parity disk can be a bottleneck
- Every write needs to write the parity disk
- Small writes require 4 I/Os
1. Read the old block
2. Read the old parity
3. Write the new block
4. Write the new parity
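The four I/Os of a small write can be sketched as below, assuming each disk is a Python list of integer blocks and parity is bitwise XOR. The `small_write` function is hypothetical. The new parity is computed from the old block and old parity alone, so the other data disks are never read.

```python
def small_write(data_disk, parity_disk, block, new_data):
    old_data = data_disk[block]          # I/O 1: read the old block
    old_parity = parity_disk[block]      # I/O 2: read the old parity
    data_disk[block] = new_data          # I/O 3: write the new block
    # I/O 4: flip exactly the parity bits where the data changed.
    parity_disk[block] = old_parity ^ old_data ^ new_data


d1, d2, d3 = [0b1001], [0b0101], [0b1000]
par = [0b0100]
small_write(d2, par, 0, 0b1111)
print(format(par[0], "04b"))
```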
Block-Interleaved Distributed-Parity (RAID Level 5)
Sort of the most general level of RAID
Spreads the parity out over all disks
+ No parity disk bottleneck
+ All disks contribute read bandwidth
- Requires 4 I/Os for small writes
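A sketch of how parity can be spread over all disks. The rotation used in `parity_disk` is one possible placement chosen for illustration; actual arrays use various layouts.

```python
def parity_disk(stripe, num_disks):
    """Disk holding the parity block of a stripe (one possible rotation)."""
    return (num_disks - 1 - stripe) % num_disks


def layout(num_stripes, num_disks):
    """One row per stripe: 'P' marks the parity block, 'D' a data block."""
    rows = []
    for stripe in range(num_stripes):
        p = parity_disk(stripe, num_disks)
        rows.append(["P" if d == p else "D" for d in range(num_disks)])
    return rows


for row in layout(4, 4):
    print(" ".join(row))
```

Each stripe places its parity on a different disk, so parity writes are shared among all disks and no single disk becomes the bottleneck.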