Processing Data in External Storage

CS 302 - Data Structures

Mehmet H Gunes

Modified from authors’ slides

Contents

• A Look at External Storage

• Sorting Data in an External File

• External Dictionaries

Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013

A Look at External Storage

• External storage exists after program execution

• Generally, there is more external storage than internal memory

• Direct access files are essential for external dictionaries

• A file contains records that are organized into blocks

A Look at External Storage

• A file partitioned into blocks of records

A Look at External Storage

• Direct access input and output involves blocks instead of records

• A buffer stores data temporarily

A Look at External Storage

• Updating a portion of a record within a block
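Block-at-a-time access and the update pictured on this slide can be sketched as follows. This is only an illustration under stated assumptions: the `Record` layout, the capacity `RECORDS_PER_BLOCK`, and the helper names `readBlock`, `writeBlock`, and `updateSalary` are hypothetical and do not come from the slides. The point is that changing one field of one record still costs a whole-block read and a whole-block write.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <iostream>
#include <sstream>

// Hypothetical fixed-size record; the slides do not specify a layout.
struct Record {
    int key;
    int salary;
};

constexpr std::size_t RECORDS_PER_BLOCK = 4;   // assumed block capacity
constexpr std::size_t BLOCK_BYTES = RECORDS_PER_BLOCK * sizeof(Record);
using Block = std::array<Record, RECORDS_PER_BLOCK>;   // in-memory buffer

// Read block number blockNum (0-based) from the file into the buffer.
bool readBlock(std::istream& file, std::size_t blockNum, Block& buffer) {
    file.clear();
    file.seekg(static_cast<std::streamoff>(blockNum * BLOCK_BYTES), std::ios::beg);
    file.read(reinterpret_cast<char*>(buffer.data()),
              static_cast<std::streamsize>(BLOCK_BYTES));
    return static_cast<bool>(file);
}

// Write the buffer to block number blockNum of the file.
bool writeBlock(std::ostream& file, std::size_t blockNum, const Block& buffer) {
    file.clear();
    file.seekp(static_cast<std::streamoff>(blockNum * BLOCK_BYTES), std::ios::beg);
    file.write(reinterpret_cast<const char*>(buffer.data()),
               static_cast<std::streamsize>(BLOCK_BYTES));
    return static_cast<bool>(file);
}

// Update one field of one record: read the whole block into the buffer,
// change the record there, then write the whole block back.
bool updateSalary(std::iostream& file, std::size_t blockNum,
                  std::size_t slot, int newSalary) {
    Block buffer{};
    if (slot >= RECORDS_PER_BLOCK || !readBlock(file, blockNum, buffer)) {
        return false;
    }
    buffer[slot].salary = newSalary;
    return writeBlock(file, blockNum, buffer);
}
```

The functions take standard stream references, so the same sketch should work with a `std::fstream` opened in binary mode or with a `std::stringstream` standing in for a file.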

A Look at External Storage

• Time required to read or write a block of data
– Longer than the time required to operate on the block’s data
– Implication: reduce the number of required block accesses

• File access time is the dominant factor in an algorithm’s efficiency

Sorting Data in an External File

Example Problem

• File contains 1,600 employee records

• Records are to be sorted by Social Security number

• Each block contains 100 records

• Program can access only enough internal memory to manipulate 3 blocks at a time

Sorting Data in an External File

• (a) Sixteen sorted runs, one block each, in file F1; (b) Eight sorted runs, two blocks each, in file F2

Sorting Data in an External File

• (c) Four sorted runs, four blocks each, in file F1; (d) Two sorted runs, eight blocks each, in file F2
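The run counts in these figures follow from simple arithmetic: 1,600 records at 100 per block give 16 one-block runs, and each two-way merge pass halves the number of runs. A small sketch of that calculation, with hypothetical helper names `blockCount` and `mergePasses`:

```cpp
#include <cassert>
#include <cstddef>

// Number of blocks needed to hold the records (rounded up).
std::size_t blockCount(std::size_t records, std::size_t recordsPerBlock) {
    return (records + recordsPerBlock - 1) / recordsPerBlock;
}

// Number of two-way merge passes needed to combine `runs` sorted runs
// into a single run. Each pass halves the run count (rounding up).
std::size_t mergePasses(std::size_t runs) {
    std::size_t passes = 0;
    while (runs > 1) {
        runs = (runs + 1) / 2;
        ++passes;
    }
    return passes;
}
```

For the example problem, `blockCount(1600, 100)` is 16 and `mergePasses(16)` is 4, matching the sequence 16, 8, 4, 2, 1 of run counts shown in parts (a) through (d) plus the final merge.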

Sorting Data in an External File

• (a) Merging single blocks; (b) merging long runs

Sorting Data in an External File

• Algorithm for merging arbitrary-sized sorted runs
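The slide's algorithm is not reproduced in this transcript, so the following is only a sketch of one way to merge two sorted runs of arbitrary length using three block-sized buffers (two for input, one for output). A `std::vector<int>` of keys stands in for an external file, and the names `File`, `RunReader`, and `mergeRuns` are assumptions made for illustration. `blockSize` is assumed to be greater than zero.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using File = std::vector<int>;   // stand-in for an external file of keys

// Buffered reader over records [pos, end) of a file, one block at a time.
struct RunReader {
    const File& file;
    std::size_t pos, end, blockSize;
    std::vector<int> buffer;     // the in-memory input buffer
    std::size_t next = 0;        // next unread slot in the buffer

    RunReader(const File& f, std::size_t b, std::size_t e, std::size_t bs)
        : file(f), pos(b), end(e), blockSize(bs) { refill(); }

    void refill() {              // simulates one block read
        std::size_t n = std::min(blockSize, end - pos);
        auto first = file.begin() + static_cast<std::ptrdiff_t>(pos);
        buffer.assign(first, first + static_cast<std::ptrdiff_t>(n));
        pos += n;
        next = 0;
    }
    bool empty() const { return next == buffer.size(); }
    int peek() const { return buffer[next]; }
    int pop() {
        int value = buffer[next++];
        if (next == buffer.size() && pos < end) refill();
        return value;
    }
};

// Merge sorted runs [b1, e1) and [b2, e2) of `in`, appending the result to `out`.
void mergeRuns(const File& in, std::size_t b1, std::size_t e1,
               std::size_t b2, std::size_t e2,
               std::size_t blockSize, File& out) {
    RunReader r1(in, b1, e1, blockSize);
    RunReader r2(in, b2, e2, blockSize);
    std::vector<int> outBuf;     // the in-memory output buffer
    auto emit = [&](int value) {
        outBuf.push_back(value);
        if (outBuf.size() == blockSize) {   // buffer full: one block write
            out.insert(out.end(), outBuf.begin(), outBuf.end());
            outBuf.clear();
        }
    };
    while (!r1.empty() && !r2.empty()) {
        emit(r1.peek() <= r2.peek() ? r1.pop() : r2.pop());
    }
    while (!r1.empty()) emit(r1.pop());
    while (!r2.empty()) emit(r2.pop());
    out.insert(out.end(), outBuf.begin(), outBuf.end());   // flush a partial block
}
```

Because each reader refills its buffer only when the buffer is exhausted, the runs can be any number of blocks long while memory use stays at three blocks.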

External Dictionaries

• Records in order by search key

• Algorithm to traverse file in sorted order

• Retrieval could be done with binary search
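A binary search over a sorted external file works on block numbers rather than individual records: read the middle block, then decide from its first and last keys which half to keep. A minimal sketch, assuming every block is non-empty and that a vector of key vectors stands in for the file (`SortedFile` and `findBlock` are hypothetical names):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Block = std::vector<int>;          // sorted keys held in one block
using SortedFile = std::vector<Block>;   // blocks in ascending key order

// Returns the number of the block containing key, or -1 if the key is absent.
// blockReads is incremented once per simulated block access.
long findBlock(const SortedFile& file, int key, int& blockReads) {
    std::size_t low = 0;
    std::size_t high = file.size();      // search the block range [low, high)
    while (low < high) {
        std::size_t mid = low + (high - low) / 2;
        const Block& buffer = file[mid]; // simulates reading block mid
        ++blockReads;
        if (key < buffer.front()) {
            high = mid;                  // key can only be in an earlier block
        } else if (key > buffer.back()) {
            low = mid + 1;               // key can only be in a later block
        } else {
            // The key falls within this block's range; search the buffer.
            bool found = std::binary_search(buffer.begin(), buffer.end(), key);
            return found ? static_cast<long>(mid) : -1;
        }
    }
    return -1;
}
```

The number of block accesses grows with the logarithm of the number of blocks, which is what matters when file access time dominates.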

External Dictionaries

• Shifting across block boundaries

Indexing an External File

Benefits of indexing with a catalog

1. Index record will be much smaller than a data record

2. Do not need to maintain the data file in any particular order; insert new records in any convenient location

3. Maintain several indexes simultaneously

Indexing an External File

• A data file with an index

Indexing an External File

• A data file with a sorted index file
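One way to picture a sorted index is as a list of small (key, block number) records kept in key order, so that a lookup searches the index and then reads a single data block. The sketch below assumes that layout; `IndexRecord` and `lookup` are hypothetical names, and the index is held in a vector purely for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical index record: a search key plus the number of the
// data-file block that holds the full record.
struct IndexRecord {
    int key;
    std::size_t blockNum;
};

// Looks up key in an index sorted by key. On success, stores the data
// block number in blockNum and returns true.
bool lookup(const std::vector<IndexRecord>& index, int key, std::size_t& blockNum) {
    auto it = std::lower_bound(
        index.begin(), index.end(), key,
        [](const IndexRecord& rec, int k) { return rec.key < k; });
    if (it == index.end() || it->key != key) {
        return false;
    }
    blockNum = it->blockNum;
    return true;
}
```

Note that the data file itself can stay unordered: only the index needs to be sorted by key.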

External Hashing

• Similar to the internal scheme described in Chapter 18

• Hash the index file instead of data file

• Hash table: contains a pointer to the beginning of the chain of items that hash into that location

External Hashing

• A hashed index file

External Hashing

• A single block with a pointer

External Hashing

Insertion under external hashing

1. Insert the data record into the data file

2. Insert a corresponding index record into the index file

Removal under external hashing

1. Search index file for corresponding index record

2. Remove data record from the data file
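The insertion and removal steps above can be sketched with in-memory stand-ins for the two files. Everything here is an assumption made for illustration: the class name `HashedFile`, the modulo hash function, and the use of vectors for the data blocks and for the chains of index records. The sketch also deletes the index record on removal, which the steps above do not spell out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct DataRecord { int key; std::string payload; };
struct IndexRecord { int key; std::size_t blockNum; };

class HashedFile {
public:
    HashedFile(std::size_t tableSize, std::size_t recordsPerBlock)
        : table_(tableSize), perBlock_(recordsPerBlock) {}

    void insert(const DataRecord& rec) {
        // 1. Insert the data record at a convenient location: the last
        //    data block, or a new block if that one is full.
        if (data_.empty() || data_.back().size() >= perBlock_) data_.emplace_back();
        data_.back().push_back(rec);
        // 2. Insert a corresponding index record into the hashed index.
        table_[hash(rec.key)].push_back({rec.key, data_.size() - 1});
    }

    bool remove(int key) {
        std::vector<IndexRecord>& chain = table_[hash(key)];
        for (std::size_t i = 0; i < chain.size(); ++i) {
            if (chain[i].key != key) continue;
            // 1. Found the index record; 2. remove the data record from its block.
            std::vector<DataRecord>& block = data_[chain[i].blockNum];
            for (std::size_t j = 0; j < block.size(); ++j) {
                if (block[j].key == key) {
                    block.erase(block.begin() + static_cast<std::ptrdiff_t>(j));
                    break;
                }
            }
            chain.erase(chain.begin() + static_cast<std::ptrdiff_t>(i));
            return true;
        }
        return false;
    }

    // Returns a pointer to the record with the given key, or nullptr.
    const DataRecord* find(int key) const {
        for (const IndexRecord& ir : table_[hash(key)]) {
            if (ir.key != key) continue;
            for (const DataRecord& rec : data_[ir.blockNum]) {
                if (rec.key == key) return &rec;
            }
        }
        return nullptr;
    }

private:
    std::size_t hash(int key) const {   // assumed hash function
        return static_cast<std::size_t>(key) % table_.size();
    }
    std::vector<std::vector<IndexRecord>> table_;   // chains of index records
    std::vector<std::vector<DataRecord>> data_;     // data file as blocks
    std::size_t perBlock_;
};
```

In a real external implementation each chain would be a linked sequence of index blocks on disk, so following a chain costs one block access per block in it.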

B-Trees

• (a) Blocks organized into a 2-3 tree;

B-Trees

• (b) a single node of the 2-3 tree

B-Trees

• (a) A node with two children; (b) a node with three children; (c) a node with m children

B-Trees

• (a) A full tree whose internal nodes have five children; (b) the format of a single node

B-Trees

• A B-tree of degree 5
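A node of a B-tree index can be pictured as one block holding up to m - 1 index records and up to m child block numbers. The sketch below assumes that format and shows a search that reads one index block per level; the names `Node`, `IndexFile`, and `search` are hypothetical, and a vector of nodes stands in for the index file, with -1 meaning "not found".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct IndexRecord {
    int key;
    long dataBlock;                     // block of the data file holding the record
};

struct Node {
    std::vector<IndexRecord> records;   // sorted by key, at most m - 1 entries
    std::vector<long> children;         // block numbers of children; empty for a leaf,
                                        // otherwise records.size() + 1 entries
};

using IndexFile = std::vector<Node>;    // node i is stored in index block i

// Returns the data block number for key, or -1 if the key is not indexed.
// blockReads is incremented once per index block accessed.
long search(const IndexFile& index, long root, int key, int& blockReads) {
    long current = root;
    while (current != -1) {
        const Node& node = index[static_cast<std::size_t>(current)];  // one block read
        ++blockReads;
        std::size_t i = 0;
        while (i < node.records.size() && key > node.records[i].key) {
            ++i;
        }
        if (i < node.records.size() && node.records[i].key == key) {
            return node.records[i].dataBlock;
        }
        // Descend into the subtree between records i - 1 and i, if any.
        current = node.children.empty() ? -1 : node.children[i];
    }
    return -1;
}
```

Since each node has many children, the tree is short, and the number of block accesses per search stays small even for large files.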

B-Trees

Insertion into a B-tree

1. Insert the data record into the data file

2. Insert a corresponding index record into the index file

B-Trees

• (a through d) The steps for inserting 55

B-Trees

• (e) splitting the root

B-Trees

Removal from a B-tree

1. Locate the index record in the index file

2. Remove the data record from the data file

B-Trees

• (a through e) The steps for removing 73

B-Trees

• (f) removing the root

Traversals

• Accessing only the search key of each record, but not the data file

Traversals

• Accessing data file also
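The difference between the two traversals is mostly a difference in block accesses. The sketch below assumes a B-tree index in which each node is one block (the `Node`, `IndexFile`, and `traverse` names are hypothetical) and simply counts reads: a keys-only traversal reads each index block once, while a traversal that also fetches the data costs one extra data-block read per record in this simple model.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct IndexRecord {
    int key;
    long dataBlock;                     // block of the data file holding the record
};

struct Node {
    std::vector<IndexRecord> records;   // sorted by key
    std::vector<long> children;         // empty for a leaf, else records.size() + 1 entries
};

using IndexFile = std::vector<Node>;    // node i is stored in index block i

// In-order traversal of the index rooted at block `root`.
// Appends the keys in sorted order and counts simulated block reads.
// If readData is true, each record also costs one data-block read.
void traverse(const IndexFile& index, long root, bool readData,
              std::vector<int>& keysInOrder, int& blockReads) {
    if (root == -1) return;
    const Node& node = index[static_cast<std::size_t>(root)];   // one index block read
    ++blockReads;
    for (std::size_t i = 0; i < node.records.size(); ++i) {
        if (!node.children.empty()) {
            traverse(index, node.children[i], readData, keysInOrder, blockReads);
        }
        keysInOrder.push_back(node.records[i].key);
        if (readData) {
            ++blockReads;               // fetch the data block for this record
        }
    }
    if (!node.children.empty()) {
        traverse(index, node.children.back(), readData, keysInOrder, blockReads);
    }
}
```

The recursion keeps one node per tree level in memory at a time, so the sketch assumes internal memory can hold as many blocks as the tree has levels.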

Multiple Indexing

• Multiple index files