14. files - data structures using c++ by varsha patil
TRANSCRIPT
1Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
14 Files
2Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
OBJECTIVES
The purpose of standard data organization methods
Various file organizations and their application-specific suitability such as sequential file, indexed sequential file, and direct access file
The advantages and disadvantages of file organizations
3Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
INTRODUCTION Records that hold information about similar items
of data are usually grouped together into ‘file’ A file is a collection of records where each record
consists of one or more fields
4Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
External storage devices are mainly used for the following:
Overlay or backup of programs during execution
Storage of programs for future use Storage of information in files
External storage device
5Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
EXTERNAL STORAGE DEVICES
The most common external storage devices in order of their use, and study :
magnetic tapes, drums, and disk drives
When we organize data in ‘file’ data structure, the data is non-volatile, which means data will reside on storage after data processing is over
6Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
Magnetic Tape A tape is made up of a plastic material
coated with a ferrite substance that is easily magnetized
The physical appearance of the tape is similar to the tape used for sound recording.
A limitation of magnetic tape devices is that records must be processed in the order in which they reside on the tape
7Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
Magnetic Drums A magnetic drum is a metal cylinder, from 10 to
36 inches in diameter, which has an outside surface coated with a magnetic recording material
The cylindrical surface of the drum is divided into a number of parallel bands called tracks
The tracks are further divided into either sectors or blocks, depending on the nature of the drum
The sector or block is the smallest addressable unit
8Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
Magnetic Disks The magnetic disk is a direct access
storage device, which has become more widely used than the magnetic drum, mainly because of its lower cost
9Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
FILE ORGANIZATION
The proper arrangement of records within a file is called as file organization
10Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
FILE ORGANIZATIONDifferent Schemes of File Organizations
Sequential file Direct or random access file Indexed sequential file Multi-Indexed file
11Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
Sequential file In sequential file, records are stored
in the sequential order of their entry This is the simplest kind of data
organization
12Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
Direct or random access file
Though we search records using key, we still need to know the address of the record to retrieve it directly
The file organization that supports Files such access is called as direct or random file organization
13Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
Indexed sequential file
File—Records are stored sequentially but the index file is prepared for accessing the record directly
An index file contains records ordered by a record key
The record key uniquely identifies the record and determines the sequence in which it is accessed with respect to other records
14Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
Multi-Indexed file In multi-indexed file, the data file is associated
with one or more logically separated index files Inverted files and multilist files are examples
of multiindexed files
15Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
The factors that affect file organization are mainly the following:
Storage device Type of query Number of keys Mode of retrieval/update of
record
Factors Affecting File Organization
16Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
Speed Operations Capacity Size Security
Factors Involved in Selecting File Organization
17Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
File I/O Classes Primitive Functions Files Binary and Text File
FILES USING C++
18Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
The I/O system of C++ contains a set of classes that define the file handling methods
They are ifstream, ofstream, and fstream These classes are included in ‘fstream.h’
header file ifstream—This class provides input
operations ofstream—This class provides output operations
fstream—This class provides both input and output operations
File I/O Classes
19Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
There are several ways of reading (or writing) the text from (or to) a file
But all of them have a common approach as follows
Open the file Read (or write) the data Close the file
Primitive Functions
20Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
Binary and Text File Binary and Text File
The binary file consists of binary data It can store text, graphics, sound data in binary
format The binary files cannot be read directly
Text File The text file contains the plain ASCII characters It contains text data which is marked by ‘end of
line’ at the end of each record This end of record marks help easily to perform
operations such as read and write Text file cannot store graphical data
21Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
Sequential File Organization
The order of the records is fixed Within each block, the records are in sequence A sequential file stores records in the order they are
entered New records always appear at the end of the file Primitive Operations Features of Sequential File Organization Drawbacks of Sequential File Organization
22Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
Primitive Operations
Open—This opens the file and sets the file pointer to immediately before the first record
Read-next—This returns the next record to the user. If no record is present, then EOF condition will be set
Close—This closes the file and terminates the access to the file
Write-next—File pointers are set to next of last record and write the record to the file
23Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
Primitive Operations
Write-next—File pointers are set to next of last record and write the record to the file
EOF—If EOF condition occurs, it returns true, otherwise it returns false
Search—Search for the record with a given key
Update—Current record is written at the same position with updated values
24Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
Drawbacks of Sequential File Organization
Insertion and deletion of records in in-between positions huge data movement
Accessing any record requires a pass through all the preceding records, which is time consuming. Therefore, searching a record also takes more time.
Needs reorganization of file from time to time. If too many records are deleted logically, then the file must be reorganized to free the space occupied by unwanted records
25Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
Direct Access File Organization
Files that have been designed to make direct record retrieval as easy and efficiently as possible is known as directly organized files
Direct access files are of great use for immediate access to large amounts of information
They are often used in accessing large databases
26Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
Direct Access File Organization
Beginning of file
Current position
End of file
0
1
2
SEEK_SET
SEEK_CUR
SEEK_END
Origin Value Macro Name
27Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
File position indicators
0
Origin
Value Macro Name
Beginning of file
Current position
End of file
0
1
2
SEEK_SET
SEEK_CUR
SEEK_END
28Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
File position indicators
0
Origin
Value Macro Name
Beginning of file
Current position
End of file
0
1
2
SEEK_SET
SEEK_CUR
SEEK_END
29Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
INDEX SEQUENTIALFILE ORGANIZATION
A file that is loaded in key sequence but can be accessed directly by use of one or more indices is known as an indexed sequential file
A sequential data file that is indexed is called as indexed sequential file
A solution to improve speed of retrieving target is index sequential file
An indexed file contains records ordered by a record key
Each record contains a field that contains the record key
30Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
Advantages Accessing any record is more efficient than
sequential file organization Large amount of data can be stored using this
type of file organization
Disadvantages Often more than one indices are needed which
occupies large storage area
31Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
Types of Indexes(CONTD…) Primary index
It is an index ordered in the same way as the data file, which is sequentially ordered according to a key
The indexing field is equal to this key Secondary index
An index that is defined on a non-ordering field of the data file
In this case, the indexing field need not contain unique values
32Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
Types of Indexes
Clustering indexes A data file can associate with utmost one
primary index and several secondary indexes The single-level indexing structure is the
simplest one where a file, whose records are pairs, contains a key and a pointer
The single-level indexing structure is the simplest one where a file, whose records are pairs, contains a key and a pointer
33Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
Structure of Index Sequential File
Index file consists of three areas: A primary storage area—This includes some
unused space to allow for additions made in data A separate index or indexes—Each query will
reference this index first; it will redirect query to part of data file in which the target record is saved
An overflow area—This is optional separate overflow area
34Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
Characteristics of Indexed Sequential
File Records are stored sequentially but the index
file is prepared for accessing the record directly Records can be accessed randomly File has records and also the indexMagnetic tape is not suitable for index
sequential storage Index is the address of physical storage of a
record When randomly very few are required/accessed,
then index sequential is better Faster access methodAddition overhead is to maintain indexIndex sequential files are popularly used in many
applications like digital library
35Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
LINKED ORGANIZATION
In linked organization, the physical sequence of records is different from the logical sequence of records
The next logical record is obtained by following a link value from the present record
Multilist Files Coral Rings Inverted Files Cellular Partitions
36Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
To make searching easy, several indexes are maintained as per primary key and secondary keys, one index per key
The record may be present in different lists as per key
Multilist Files
37Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
Consider the following file of office staff
Multilist Files
Staff ID Occupation Salary Record 106 Clerk 5000 A150 Account 4000 B360 clerk 3000 C400 Account 3500 D700 Clerk 2000 E
38Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
We can maintain index on the staff ID
We can group staff ID with range 101–300, 301–600, 601–900, etc
Now all the records with staff ID in the same range will be linked together as shown in Figure
Multilist Files
Sample multilist file
Staff range Link
101-300
301-600
601-900
rec A
rec C
rec E
rec B
rec D
39Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
Coral Rings In this doubly linked multilist structure is
used. Each list is circular list with headnode
A C E Alink Clerk Blink S Blink
Sample doubly linked list
40Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
Coral Rings ‘Alink’ field is used to link all records with
same key value ‘Blink’ is used for some records back
pointer and for others it is pointer to head node
Owing to these back pointers, deletion is easy without going to start.
Indexes are maintained as per multilists
41Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
Inverted Files
With the same key value are linked together and links are kept in each record
But in the inverted files, the link information is kept in the index itself
42Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
The indices for fully inverted file are shown in Figure
Occupation index 106 A
150 B
360 C
400 D
700 E
Accountant B,DClerk A,C,E
Salary Index2000 E4000 B,C,D6000 A
43Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
Inverted Files The inversion process is associated with the
information of inverted list In inverted files, the record is accessed in two
steps. First, the indices are searched to obtain a list of required records and then second, records are retrieved using these lists
The number of disk accesses required is equal to the number of records being retrieved plus the number to process the indices
In inverted files, only the index structures are important
44Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
Cellular Partitions To decrease file search time, the storage
media may be divided into cells A cell may be an entire disk or a cylinder.
Lists are localized to lie within a cell If a cylinder is used as a cell, then all records
on the same cylinder may be accessed without moving the read/write heads
We divide multilists organized on several different cylinders into several small lists which are stored on the same cylinder
45Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
Multilist structure with cellular partitioning
Position Student ID T. ID Link1 100 A2 200 B Null3 300 C Null4 400 A Null1 500 D2 600 B Null3 700 A Null4 800 D Null1 900 E Null2 1000 C Null3 1100 D Null4 1200 A Null
Primary key Secondary key
46Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
End of Chapter 14 …!