processing data in external storage cs 302 - data structures mehmet h gunes modified from authors’...
TRANSCRIPT
Processing Data in External Storage
CS 302 - Data StructuresMehmet H Gunes
Modified from authors’ slides
Contents
• A Look at External Storage• Sorting Data in an External File• External Dictionaries
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
A Look at External Storage
• External storage exists after program execution
• Generally, there is more external storage than internal memory
• Direct access files are essential for external dictionaries
• A file contains records that are organized into blocks
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
A Look at External Storage
• A file partitioned into blocks of records
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
A Look at External Storage
• Direct access input and output involves blocks instead of records
• A buffer stores data temporarily
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
A Look at External Storage
• Updating a portion of a record within a block
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
A Look at External Storage
• Time required to read or write a block of data– Longer than the time required to operate on the
block’s data– Implication: reduce the number of required block
accesses
• File access time dominant factor in an algorithm’s efficiency
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
Sorting Data in an External File
Example Problem•Fille contains 1,600 employee records•Sorted by Social Security number•Each block contains 100 records•Program can access only enough internal memory to manipulate 3 blocks at a time
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
Sorting Data in an External File
• (a) Sixteen sorted runs, one block each, in file F1 ; (b) Eight sorted runs, two blocks each, in file F2 ;
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
Sorting Data in an External File
• (c) Four sorted runs, four blocks each, in file F1 ; (d) Two sorted runs, eight blocks each, in file F2
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
Sorting Data in an External File
• (a) Merging single blocks; (b) merging long runs
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
Sorting Data in an External File
• Algorithm for merging arbitrary-sized sorted runs
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
External Dictionaries
• Records in order by search key• Algorithm to traverse file in sorted order
• Retrieval could be done with binary search
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
External Dictionaries
• Shifting across block boundaries
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
Indexing an External File
Benefits of indexing with catalog1.Index record will be much smaller than a data record2.Do not need to maintain the data file in any particular order, insert new records in any convenient location3.Maintain several indexes simultaneously
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
Indexing an External File
• A data file with an index
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
Indexing an External File
• A data file with a sorted index file
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
External Hashing
• Similar to the internal scheme described in Chapter 18
• Hash the index file instead of data file
• Hash table — contains a pointer to beginning of chain of items that hash into that location
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
External Hashing
• A hashed index file
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
External Hashing
• A single block with a pointer
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
External Hashing
Insertion under external hashing1.Insert the data record into the data file2.Insert a corresponding index record into the index file
Removal under external hashing1.Search index file for corresponding index record2.Remove data record from the data file
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
B-Trees
• (a) Blocks organized into a 2-3 tree;
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
B-Trees
• (b) a single node of the 2-3 tree
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
B-Trees
• (a) A node with two children; (b) a node with three children; (c) a node with m children
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
B-Trees
• (a) A full tree whose internal nodes have five children; (b) the format of a single node
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
B-Trees
• A B-tree of degree 5
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
B-Trees
Insertion into a B-tree
1.Insert the data record into the data file2.Insert a corresponding index record into the index file
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
B-Trees
• (a through d) The steps for inserting 55
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
B-Trees
• (a through d) The steps for inserting 55
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
B-Trees
• (e) splitting the root
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
B-Trees
Removal from a B-tree
1.Locate the index record in the index file2.Remove the data record from the data file
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
B-Trees
• (a through e) The steps for removing 73 ;
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
B-Trees
• (a through e) The steps for removing 73 ;
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
B-Trees
• (a through e) The steps for removing 73 ;
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
B-Trees
• (a through e) The steps for removing 73 ;
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
B-Trees
• (f) removing the root;
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
Traversals
• Accessing only the search key of each record, but not the data file
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
Traversals
• Accessing data file also
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013