Chapter 5: Record Storage & Primary File Organizations

Upload: elaina-burnley

Posted on 11-Dec-2015

TRANSCRIPT

Page 1

Chapter 5: Record Storage & Primary File Organizations

Page 2

Storage

• There are two general types of storage media used with computers:
– Primary Storage - all storage media that can be operated on directly by the CPU (RAM, L1 and L2 cache memory)
– Secondary Storage - hard drives, CDs, and tape

Page 3

Memory Hierarchies & Storage Devices

• The memory hierarchy is organized by speed of access. That speed comes with a price tag: cost varies inversely with access time. Like cars, the faster the memory access, the more it costs.

Page 4

Primary Storage Level of Memory

• The primary storage level of memory is generally made up of 3 levels:
– L1 cache, which is located on the CPU
– L2 cache, which is located near the CPU
– Main memory, the RAM figure that is often referred to in computer advertisements

Page 5

Secondary Storage Level of Memory

• The secondary storage level of memory may be made up of 4 levels:
– Flash memory (EEPROM)
– Hard drives
– CD-ROMs
– Tape

Page 6

Figure 5.1

Page 7

Terms Used in the Hardware Description of Hard Drives

• Capacity - The number of bytes it can store.

• Single-sided vs. Double-sided - States if the disk/platter is written on one or both sides.

• Disk Pack - A collection of disks/platters that are assembled together into a pack.

• Track - A circular band of small width on a disk surface. A disk surface will have many tracks.

Page 8

Terms Used in the Hardware Description of Hard Drives

• Sector - A segment or arc of a track.

• Block - A division of a track into equal-sized portions by the operating system.

• Interblock Gaps - Fixed-size segments that separate the blocks.

• Read/Write Head - Actually reads/writes the information to and from the disk.

Page 9

Terms Used in the Hardware Description of Hard Drives

• Cylinder - The set of tracks with the same diameter across the disk surfaces of a disk pack.

Page 10

Figure 5.2

Page 11

Terms Used in Measuring Disk Operations

• Seek Time (s) - The time it takes to position the read/write head over the desired track. It will be given in any problem that requires it.

• Rotational Delay (rd) - The average amount of time it takes the desired block to rotate into position under the read/write head: rd = (1/2) × (1/p) min, where p is the rotational speed of the disk in rpm.

Page 12

Terms Used in Measuring Disk Operations

• Transfer Rate (tr) - The rate at which information can be transferred to or from the disk: tr = (track size)/(1/p min).

• Block Transfer Time (btt) - The time it takes to transfer the data once the read/write head has been positioned: btt = B/tr msec, where B is the block size in bytes.

Page 13

Terms Used in Measuring Disk Operations

• Bulk Transfer Rate (btr) - The rate at which multiple blocks can be read/written to contiguous blocks: btr = (B/(B+G)) × tr bytes/msec, where G is the interblock gap size in bytes.

• Rewrite Time (Trw) - The time it takes, after a block is read, to write that same block back to the disk; i.e., the time for one revolution.

Page 14

Computing Times

• Given:
– Seek time (s) = 10 msec
– Rotational speed = 3600 rpm
– Track size = 50 KB
– Block size (B) = 512 bytes
– Interblock gap (G) = 128 bytes

Page 15

Problems for Disk Operations

• Compute the average time it takes to transfer 1 block on this system.

• Compute the average time it takes to transfer 20 non-contiguous blocks that are located on the same track.

• Compute the average time it takes to transfer 20 contiguous blocks.
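The three problems above can be worked out directly from the chapter's formulas. This is a minimal sketch, assuming 1 KB = 1024 bytes and that each non-contiguous block on the same track costs a full average rotational delay:

```python
# Disk timing worked example for the given parameters.
s = 10.0                    # seek time, msec
rpm = 3600                  # rotational speed
rev = 60_000 / rpm          # time for one revolution: ~16.67 msec
rd = rev / 2                # average rotational delay: ~8.33 msec
track = 50 * 1024           # track size in bytes (assuming 1 KB = 1024 bytes)
B, G = 512, 128             # block size and interblock gap, in bytes

tr = track / rev            # transfer rate: 3072 bytes/msec
btt = B / tr                # block transfer time: ~0.167 msec
btr = (B / (B + G)) * tr    # bulk transfer rate: 2457.6 bytes/msec

one_block = s + rd + btt                # seek + delay + transfer: 18.5 msec
noncontig_20 = s + 20 * (rd + btt)      # re-position for every block: 180.0 msec
contig_20 = s + rd + 20 * B / btr       # one seek, one delay, bulk rate: 22.5 msec
print(one_block, noncontig_20, contig_20)
```

The contrast between 180 msec and 22.5 msec is the point of the exercise: contiguous placement pays the seek and rotational delay only once.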

Page 16

Parallelizing Disk Access Using RAID

• RAID - Stands for Redundant Arrays of Inexpensive Disks or Redundant Arrays of Independent Disks.

• RAIDs are used to provide increased reliability, increased performance or both.

Page 17

RAID Levels

• Level 0 - has no redundancy and the best write performance but its read performance is not as good as level 1.

• Level 1 - uses mirrored disks which provide redundancy and improved read performance.

• Level 2 - provides redundancy using Hamming Codes

Page 18

RAID Levels

• Level 3 - uses a single parity disk.

• Level 4 and 5 - use block-level data striping with level 5 distributing the data across all the disks.

• Level 6 - uses the P + Q redundancy scheme making use of the Reed-Soloman codes to protect against the failure of 2 Disks.
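The single-parity idea behind levels 3 through 5 can be illustrated with XOR; the data bytes below are made up for illustration:

```python
# Single-parity reconstruction (the idea behind RAID levels 3-5).
# Hypothetical data bytes striped across three disks, plus a parity byte:
d1, d2, d3 = 0b1010, 0b0110, 0b1100
parity = d1 ^ d2 ^ d3        # stored on the parity disk

# If disk 2 fails, its contents are the XOR of everything that survives:
recovered = d1 ^ d3 ^ parity
print(recovered == d2)       # True
```

Because XOR is its own inverse, any single lost disk can be rebuilt from the survivors; protecting against two failures requires the P + Q scheme of level 6.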

Page 19

Figure 5.4

Page 20

Fig 5.5

Page 21

Fig 5.6

Page 22

Records

• Record is the term used to refer to a number of related values or items. Each value or item is stored in a field of a specific data type.

• Records may be of either fixed or variable lengths.

Page 23

Variable Length Records in Files

• There are several reasons records of the same record type may be of variable length:
– Variable-length fields
– Repeating fields

• For efficiency reasons different record types may be clustered in a file.

Page 24

Fig 5.7

Page 25

Spanned vs. Unspanned Records

• When the records of a file are stored on disk, they are placed in blocks of a fixed size, which rarely matches the record size. So when the record size is smaller than the block size and the block size is not a multiple of the record size, a decision must be made: store each record entirely in one block and leave unused space (unspanned), or allow a record to be split across two blocks (spanned).
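The unused space in the unspanned case is easy to quantify with the blocking factor; the 75-byte record size here is hypothetical:

```python
B = 512            # block size in bytes
R = 75             # record size in bytes (hypothetical)

# Unspanned organization stores whole records only, so the blocking
# factor (records per block) is the integer part of B / R.
bfr = B // R                 # 6 records fit in each block
wasted = B - bfr * R         # 62 bytes left unused in every block
print(bfr, wasted)
```

A spanned organization would use those 62 bytes for the start of the next record and continue it in the following block, at the cost of records crossing block boundaries.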

Page 26

Fig 5.8

Page 27

File Operations

• Files may be stored either in contiguous blocks or by linking the blocks together. There are advantages and disadvantages to both methods.

• Operations on files can be grouped into two types: retrieval and update. Retrieval involves only a read, while an update involves reading, modifying, and writing.

Page 28

File Structure

• Heap (Pile) Files

• Hash (Direct) Files

• Ordered (Sorted) Files

• B - Trees

Page 29

• Once the data has been brought into memory, an instruction can access it in 0.00000004 seconds on a machine running at 25 MIPS. The disparity between memory access time and disk access time is enormous: we can perform 625,000 instructions in the time it takes to read/write one disk page.

• To put this in human terms: you are typing a letter for your boss and find a word you cannot make out, so you leave him a voice-mail message. Since you were told to do nothing else, you patiently wait for his reply, doing nothing! Unfortunately, he just went on vacation and does not get your message for 3 weeks.

• This is similar to the computer waiting 0.025 seconds to get the needed data into memory from a disk read.

Page 30

Heap (Pile) Files (Unordered)

• Insertions - Very efficient

• Search - Very inefficient (Linear Search)

• Deletion - Very inefficient
– Lazy deletion

• Problems?

• When are they Used?

Page 31

Ordered (Sorted) Files of Records

• Records are stored based on the value contained in one of their fields called the ordering field.

• If the ordering field is also a key field, then the field is better described as an ordering key.

Page 32

Advantages of Ordered Files

• Reading of the records in order of the ordering field is extremely efficient.

• Finding the next record is fast.

• Finding records based on a query of the ordering field is efficient. (binary search).

• Binary search may be done on the blocks as well.
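The two-level search implied above (binary search on blocks, then within a block) might be sketched as follows; the block contents are made-up ordering-key values:

```python
import bisect

def find(blocks, key):
    """Binary-search sorted blocks by their largest key, then inside one block."""
    block_max = [b[-1] for b in blocks]        # highest ordering-key per block
    i = bisect.bisect_left(block_max, key)     # first block that could hold key
    if i == len(blocks):
        return None
    j = bisect.bisect_left(blocks[i], key)     # binary search within the block
    if j < len(blocks[i]) and blocks[i][j] == key:
        return (i, j)                          # (block number, offset in block)
    return None

blocks = [[1, 3, 5], [7, 9, 11], [13, 15]]     # a tiny hypothetical ordered file
print(find(blocks, 9))    # (1, 1)
print(find(blocks, 4))    # None
```

Only one block needs to be read from disk once the block-level search has narrowed the candidates, which is why the block-level binary search matters more than the in-block one.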

Page 33

Disadvantages of Ordered Files

• Searches on non-ordering fields are inefficient.

• Insertion and deletion of records are very expensive.

• Solutions to these problems?

Page 34

Hashing Techniques

• A record's placement is determined by the value in its hash field. A hash (randomizing) function is applied to this value to yield the address of the disk block where the record is stored. For most records, we need only a single block access to retrieve the record.

Page 35

Internal Hashing

• Internal Hashing is implemented as a hash table through the use of an array of records. (In memory)

• The array has an index range of 0 to M-1, and a function that transforms the hash field value into an integer between 0 and M-1 is used. A common one is h(K) = K mod M.

Page 36

Internal Hashing (con’t)

• Collisions occur when a hash field value of a record being inserted hashes to an address that already contains a different record.

• The process of finding another position for this record is called collision resolution.

Page 37

Collision Resolution

• Open Addressing- Places the record to be inserted in the first available position subsequent to the hash address.

• Chaining - A pointer field is added to each record location. When an overflow occurs this pointer is set to point to overflow blocks making a linked list.

Page 38

Collision Resolution (con’t)

• Multiple hashing - If an overflow occurs a second hash function is used to find a new location. If that location is also filled either another hash function is applied or open addressing is used.
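Open addressing, the first of the schemes above, can be sketched in a few lines; the table size M = 7 and the keys are made up for illustration:

```python
M = 7                   # table size (a prime, per the usual advice for mod hashing)
table = [None] * M

def h(k):
    return k % M        # h(K) = K mod M

def insert(k):
    # Open addressing: scan forward from the hash address for a free slot.
    for probe in range(M):
        i = (h(k) + probe) % M
        if table[i] is None:
            table[i] = k
            return i
    raise RuntimeError("table full")

def search(k):
    for probe in range(M):
        i = (h(k) + probe) % M
        if table[i] is None:    # an empty slot means the key is absent
            return None
        if table[i] == k:
            return i
    return None

insert(10)              # h(10) = 3, stored at slot 3
insert(17)              # h(17) = 3 -> collision, stored at slot 4
print(search(17))       # 4
```

Note that the early exit in `search` assumes no deletions have punched holes in probe sequences; with lazy deletion, searches must skip over deletion markers instead of stopping at them.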

Page 39

Fig 5.10 Page 140

Page 40

Goals of the Hash Function

• The goals of a good hash function are to distribute the records uniformly over the address space while minimizing collisions, so as to avoid wasting space.

• Research has shown:
– A fill ratio of 70% to 90% works best.
– When using a mod function, M should be a prime number.

Page 41

External Hashing for Disk Files

• External hashing makes use of buckets, each of which can hold multiple records.

• A bucket is either a block or a cluster of contiguous blocks.

• The hash function maps a key into a relative bucket number, rather than an absolute block address for the bucket.

Page 42

Types of External Hashing

• Using a fixed address space is called static hashing.

• Dynamically changing address space:
– Extendible hashing*
– Linear hashing**

* With a Directory

** Without a Directory

Page 43

Static Hashing

• Under Static Hashing a fixed number of buckets (M) is allocated.

• Based on the hash value, a bucket number is determined in the block directory array, which yields the block address.

• If n records fit into each block, this method allows up to n × M records to be stored.

Page 44

Fig 5.11 Page 143

Page 45

Fig 5.12 Page 144

Page 46

Extendible Hashing

• In Extendible Hashing, a directory is maintained as an array of 2^d bucket addresses, where d, called the global depth of the directory, is the number of high-order (leftmost) bits of the hash value that are used. However, there does NOT have to be a DISTINCT bucket for each directory entry.

• A local depth d’ is stored with each bucket to indicate the number of bits used for that bucket.

Page 47

Figure 5.13 Page 146

Page 48

Overflow (Bucket Splitting)

• When an overflow occurs in a bucket, that bucket is split: a new bucket is dynamically allocated and the contents of the old bucket are redistributed between the old and new buckets based on the increased local depth d'+1 of both these buckets.

Page 49

Overflow (Bucket Splitting)

• Now the new bucket’s address must be added to the directory.

• If the overflow occurred in a bucket whose current local depth d’ is less than or equal to the global depth d adjust the directory entries accordingly. (No change in the directory size is made.)

Page 50

Overflow (Bucket Splitting)

• If the overflow occurred in a bucket whose current local depth d’ is now greater than the global depth d you must increase the global depth accordingly.

• This results in a doubling of the directory size for each time d is increased by 1 and appropriate adjustment of the entries.
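The splitting and directory-doubling procedure described above can be sketched compactly. This sketch is illustrative, not the chapter's exact scheme: it uses the low-order bits of the hash rather than the high-order bits (which makes directory doubling a simple copy), and a tiny bucket capacity of 2 records:

```python
class Bucket:
    def __init__(self, depth, capacity=2):
        self.depth = depth              # local depth d'
        self.capacity = capacity
        self.items = []

class ExtendibleHash:
    def __init__(self, capacity=2):
        self.global_depth = 1           # global depth d
        self.capacity = capacity
        self.directory = [Bucket(1, capacity), Bucket(1, capacity)]

    def _index(self, key):
        # Low-order bits of the hash select the directory entry
        # (the chapter's description uses high-order bits instead).
        return hash(key) & ((1 << self.global_depth) - 1)

    def lookup(self, key):
        return key in self.directory[self._index(key)].items

    def insert(self, key):
        bucket = self.directory[self._index(key)]
        if len(bucket.items) < bucket.capacity:
            bucket.items.append(key)
        else:
            self._split(bucket)         # overflow: split, then retry
            self.insert(key)

    def _split(self, bucket):
        if bucket.depth == self.global_depth:
            self.directory += self.directory     # double the directory
            self.global_depth += 1
        bucket.depth += 1                        # both buckets get depth d'+1
        new_bucket = Bucket(bucket.depth, self.capacity)
        high_bit = 1 << (bucket.depth - 1)
        for i, b in enumerate(self.directory):
            if b is bucket and i & high_bit:     # half the entries move over
                self.directory[i] = new_bucket
        old, bucket.items = bucket.items, []
        for k in old:                            # redistribute the records
            self.directory[self._index(k)].items.append(k)
```

Note how `_split` only doubles the directory when the overflowing bucket's local depth already equals the global depth; otherwise it just repoints half of the existing entries, matching the two cases on the preceding slides.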

Page 51

Slide showing how buckets are split under Extendible Hashing.

Page 52

Shrinking Extendible Hashing Files

• The generally used principle for shrinking extendible hashing files is that the directory may shrink when d > d' for all buckets after a deletion occurs.

• Buckets may be combined when each of the buckets to be combined is less than half full and they have the same bit pattern except for the d'-th bit, e.g., d' = 3 and the bit patterns 110 and 111.

Page 53

Linear Hashing

• Linear Hashing allows the hash file to expand and shrink its number of buckets dynamically without needing a directory.

• It starts with M buckets numbered 0 to M-1 and uses the mod hash function

h_i(K) = K mod M

as the initial hash function.

Page 54

Linear Hashing (Con’t)

• Overflow is handled by maintaining an individual overflow chain for each bucket.

• It works by methodically splitting the original buckets, starting with bucket 0: the contents of bucket 0 are redistributed between bucket 0 and bucket M (the new bucket) using a secondary hash function

h_i+1(K) = K mod 2M

Page 55

Linear Hashing (Con’t)

• This splitting of buckets is done in order (0, 1, …, M-1) REGARDLESS of which bucket the collision occurred in. To keep track of the next bucket to be split we use n; after bucket 0 is split, n is incremented to 1.

• When a record hashes to a bucket number less than n, we use the secondary hash function to determine which of the two buckets it belongs in.

Page 56

Linear Hashing (Con't)

• When all of the original M buckets have been split, we have 2M buckets and n = M.

• We then reset M to 2M and n to 0, and change our secondary hash function to be our primary hash function.

• Shrinking of the file is done based on the load factor, using the reverse of splitting.
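The whole scheme above can be sketched as follows. This is illustrative only: real implementations usually trigger splits from a load-factor threshold, whereas this sketch splits bucket n whenever any bucket overflows its capacity, and it models overflow chains simply as extra entries in a bucket's list:

```python
class LinearHash:
    def __init__(self, m=4, capacity=2):
        self.m = m                     # M for the current round
        self.n = 0                     # next bucket to be split
        self.capacity = capacity      # records per bucket before overflow
        self.buckets = [[] for _ in range(m)]

    def _addr(self, key):
        a = key % self.m               # h_i(K) = K mod M
        if a < self.n:                 # bucket already split this round:
            a = key % (2 * self.m)     # use h_i+1(K) = K mod 2M
        return a

    def insert(self, key):
        bucket = self.buckets[self._addr(key)]
        bucket.append(key)             # overflow keys just chain in the list
        if len(bucket) > self.capacity:
            self._split_next()         # split bucket n, whichever overflowed

    def _split_next(self):
        self.buckets.append([])        # the new bucket, number n + M
        old, self.buckets[self.n] = self.buckets[self.n], []
        self.n += 1
        for k in old:                  # redistribute with h_i+1
            self.buckets[k % (2 * self.m)].append(k)
        if self.n == self.m:           # round complete: 2M buckets exist
            self.m *= 2                # h_i+1 becomes the primary function
            self.n = 0                 # and a new round can begin

    def lookup(self, key):
        return key in self.buckets[self._addr(key)]
```

The key property is that no directory is needed: `_addr` decides between h_i and h_i+1 purely by comparing the bucket number with the split pointer n.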

Page 57

Slide showing how to split using linear hashing.