14. Files - Data Structures Using C++ by Varsha Patil

Oxford University Press © 2012 Data Structures Using C++ by Dr Varsha Patil 1 14 Files

Upload: widespreadpromotion

Post on 12-Jan-2017


Oxford University Press © 2012

Data Structures Using C++ by Dr Varsha Patil

14 Files

OBJECTIVES

The purpose of standard data organization methods

Various file organizations and their application-specific suitability such as sequential file, indexed sequential file, and direct access file

The advantages and disadvantages of file organizations

INTRODUCTION

Records that hold information about similar items of data are usually grouped together into a ‘file’

A file is a collection of records where each record consists of one or more fields

External storage device

External storage devices are mainly used for the following:

Overlay or backup of programs during execution

Storage of programs for future use

Storage of information in files

EXTERNAL STORAGE DEVICES

The most common external storage devices, in order of their use and study, are magnetic tapes, drums, and disk drives

When data is organized in a ‘file’ data structure, it is non-volatile, which means it remains on storage after data processing is over

Magnetic Tape

A tape is made up of a plastic material coated with a ferrite substance that is easily magnetized

The physical appearance of the tape is similar to the tape used for sound recording.

A limitation of magnetic tape devices is that records must be processed in the order in which they reside on the tape

Magnetic Drums

A magnetic drum is a metal cylinder, from 10 to 36 inches in diameter, which has an outside surface coated with a magnetic recording material

The cylindrical surface of the drum is divided into a number of parallel bands called tracks

The tracks are further divided into either sectors or blocks, depending on the nature of the drum

The sector or block is the smallest addressable unit

Magnetic Disks

The magnetic disk is a direct access storage device, which has become more widely used than the magnetic drum, mainly because of its lower cost

FILE ORGANIZATION

The proper arrangement of records within a file is called file organization

Different Schemes of File Organizations

Sequential file

Direct or random access file

Indexed sequential file

Multi-indexed file

Sequential file

In a sequential file, records are stored in the sequential order of their entry

This is the simplest kind of data organization

Direct or random access file

Though we search records using key, we still need to know the address of the record to retrieve it directly

The file organization that supports such access is called direct or random file organization

Indexed sequential file

Records are stored sequentially, but an index file is prepared for accessing a record directly

An index file contains records ordered by a record key

The record key uniquely identifies the record and determines the sequence in which it is accessed with respect to other records

Multi-Indexed file

In a multi-indexed file, the data file is associated with one or more logically separated index files

Inverted files and multilist files are examples of multi-indexed files

Factors Affecting File Organization

The factors that affect file organization are mainly the following:

Storage device

Type of query

Number of keys

Mode of retrieval/update of record

Factors Involved in Selecting File Organization

Speed

Operations

Capacity

Size

Security

FILES USING C++

File I/O Classes

Primitive Functions

Binary and Text File

File I/O Classes

The I/O system of C++ contains a set of classes that define the file handling methods

They are ifstream, ofstream, and fstream

These classes are declared in the <fstream> header file (‘fstream.h’ on pre-standard compilers)

ifstream: this class provides input operations

ofstream: this class provides output operations

fstream: this class provides both input and output operations
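As a rough illustration of the three classes, the sketch below writes a line with ofstream, reads it back with ifstream, and then uses a single fstream for both directions. The function names and the idea of passing the file path as a parameter are assumptions made for this example, not something taken from the slides.

```cpp
#include <fstream>
#include <string>

// Writes one line with ofstream, then reads it back with ifstream.
std::string write_then_read(const std::string& path, const std::string& line) {
    {
        std::ofstream out(path);       // output only; an existing file is truncated
        out << line << '\n';
    }                                  // the destructor closes the file
    std::ifstream in(path);            // input only
    std::string result;
    std::getline(in, result);
    return result;
}

// fstream handles both directions on one open file: append a line,
// then move back to the beginning and return the first line.
std::string append_and_reread(const std::string& path, const std::string& extra) {
    std::fstream io(path, std::ios::in | std::ios::out | std::ios::app);
    io << extra << '\n';
    io.seekg(0);                       // reposition before switching to reading
    std::string first;
    std::getline(io, first);
    return first;
}
```

Calling write_then_read first and append_and_reread afterwards on the same path returns the originally written line both times, since the appended text lands after it.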

Primitive Functions

There are several ways of reading (or writing) the text from (or to) a file

But all of them have a common approach as follows

Open the file

Read (or write) the data

Close the file
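The open, read (or write), close pattern can be sketched as follows. The helper names write_lines and read_lines are hypothetical; the stream calls themselves (open, is_open, getline, close) are standard library members.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Writing: open the file, write the data, close the file.
bool write_lines(const std::string& path, const std::vector<std::string>& lines) {
    std::ofstream out;
    out.open(path);                          // 1. open
    if (!out.is_open()) return false;
    for (const std::string& s : lines)
        out << s << '\n';                    // 2. write
    out.close();                             // 3. close
    return true;
}

// Reading: open the file, read the data line by line, close the file.
std::vector<std::string> read_lines(const std::string& path) {
    std::vector<std::string> lines;
    std::ifstream in;
    in.open(path);                           // 1. open
    if (!in.is_open()) return lines;         // empty result if the file is missing
    std::string s;
    while (std::getline(in, s))              // 2. read
        lines.push_back(s);
    in.close();                              // 3. close
    return lines;
}
```

The explicit close() calls are shown only to mirror the three steps; a stream also closes its file when it goes out of scope.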

Binary and Text File

Binary File

The binary file consists of binary data

It can store text, graphics, and sound data in binary format

Binary files cannot be read directly as text

Text File

The text file contains plain ASCII characters

It contains text data in which each record is marked by an ‘end of line’ at its end

These end-of-record marks make operations such as read and write easy to perform

A text file cannot store graphical data
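To make the difference concrete, the sketch below stores the same integer once as text and once as raw bytes. The function names are illustrative, and the 32-bit integer type is an assumption chosen so that the binary size is predictable.

```cpp
#include <cstdint>
#include <fstream>
#include <string>

// Text file: the value is stored as its decimal digits, one character each.
void write_text(const std::string& path, std::int32_t value) {
    std::ofstream out(path);
    out << value;
}

// Binary file: the value is stored as the raw bytes it occupies in memory.
void write_binary(const std::string& path, std::int32_t value) {
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char*>(&value), sizeof value);
}

// Reads the raw bytes back into an integer.
std::int32_t read_binary(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::int32_t value = 0;
    in.read(reinterpret_cast<char*>(&value), sizeof value);
    return value;
}

// Size of a file in bytes, found by opening it positioned at its end.
long file_size(const std::string& path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    return static_cast<long>(in.tellg());
}
```

Writing 12345 this way gives a five-byte text file (one byte per digit) and a four-byte binary file whose contents are not meaningful when opened in a text editor.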

Sequential File Organization

The order of the records is fixed

Within each block, the records are in sequence

A sequential file stores records in the order they are entered

New records always appear at the end of the file

Primitive Operations

Features of Sequential File Organization

Drawbacks of Sequential File Organization

Primitive Operations

Open: this opens the file and sets the file pointer to immediately before the first record

Read-next: this returns the next record to the user. If no record is present, then the EOF condition will be set

Close: this closes the file and terminates the access to the file

Write-next: the file pointer is set just past the last record, and the record is written there

EOF: if the EOF condition occurs, it returns true, otherwise it returns false

Search: search for the record with a given key

Update: the current record is written at the same position with updated values
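The primitive operations listed above could be wrapped in a small class along the following lines. This is only a sketch under stated assumptions: the fixed-length Record layout, the class name SequentialFile, and the helper make_record are all hypothetical, and the file is kept in binary form so that every record has the same size.

```cpp
#include <cassert>
#include <cstring>
#include <fstream>
#include <string>

// Hypothetical fixed-length record; the fields are illustrative only.
struct Record {
    int  key;
    char name[20];
};

Record make_record(int key, const char* name) {
    Record r{};                                       // zero-fills both fields
    r.key = key;
    std::strncpy(r.name, name, sizeof r.name - 1);    // keeps a trailing '\0'
    return r;
}

class SequentialFile {
public:
    // Open: creates the file if needed and positions before the first record.
    explicit SequentialFile(const std::string& path) {
        { std::ofstream create(path, std::ios::binary | std::ios::app); }
        file_.open(path, std::ios::in | std::ios::out | std::ios::binary);
    }

    // Moves back to just before the first record.
    void rewind() { file_.clear(); file_.seekg(0, std::ios::beg); }

    // Read-next: returns false and sets the EOF condition if no record is left.
    bool read_next(Record& r) {
        file_.read(reinterpret_cast<char*>(&r), sizeof r);
        return file_.gcount() == static_cast<std::streamsize>(sizeof r);
    }

    // Write-next: positions just past the last record and writes there.
    void write_next(const Record& r) {
        assert(file_.is_open());
        file_.clear();
        file_.seekp(0, std::ios::end);
        file_.write(reinterpret_cast<const char*>(&r), sizeof r);
        file_.flush();
    }

    // EOF: true once a read has run past the last record.
    bool eof() const { return file_.eof(); }

    // Search: scans from the first record until the key is found.
    bool search(int key, Record& out) {
        rewind();
        while (read_next(out))
            if (out.key == key) return true;
        return false;
    }

    // Update: overwrites the record most recently returned by a successful
    // read_next or search, at the same position.
    bool update(const Record& r) {
        file_.clear();
        file_.seekp(-static_cast<std::streamoff>(sizeof r), std::ios::cur);
        file_.write(reinterpret_cast<const char*>(&r), sizeof r);
        file_.flush();
        return static_cast<bool>(file_);
    }

    // Close: ends access to the file.
    void close() { file_.close(); }

private:
    std::fstream file_;
};
```

Note that search has to pass through every preceding record, which is exactly the cost that the drawbacks of sequential organization refer to.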

Drawbacks of Sequential File Organization

Insertion and deletion of records at in-between positions cause huge data movement

Accessing any record requires a pass through all the preceding records, which is time consuming. Therefore, searching a record also takes more time.

Needs reorganization of file from time to time. If too many records are deleted logically, then the file must be reorganized to free the space occupied by unwanted records
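Reorganization after logical deletion can be sketched as copying only the live records into a fresh file. The Slot layout with a deleted flag and the function name reorganize are assumptions made for this example; the slides do not prescribe how deleted records are marked.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical record with a logical-deletion flag (non-zero means deleted).
struct Slot {
    int key;
    int deleted;
};

// Copies the records that are not marked deleted into a temporary file,
// then replaces the original with it. Returns the number of records kept.
int reorganize(const std::string& path) {
    const std::string tmp = path + ".tmp";
    int kept = 0;
    {
        std::ifstream in(path, std::ios::binary);
        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        Slot s;
        while (in.read(reinterpret_cast<char*>(&s), sizeof s)) {
            if (s.deleted) continue;          // drop logically deleted records
            out.write(reinterpret_cast<const char*>(&s), sizeof s);
            ++kept;
        }
    }                                         // both streams are closed here
    std::remove(path.c_str());
    std::rename(tmp.c_str(), path.c_str());
    return kept;
}
```

The whole file is rewritten in one pass, which is why this is done from time to time rather than on every deletion.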

Direct Access File Organization

Files designed to make direct record retrieval as easy and efficient as possible are known as directly organized files

Direct access files are of great use for immediate access to large amounts of information

They are often used in accessing large databases

Direct Access File Organization

File position indicators

Origin              Value   Macro Name
Beginning of file   0       SEEK_SET
Current position    1       SEEK_CUR
End of file         2       SEEK_END
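The macros in the table are the origin arguments of the C library function fseek, which is available in C++ through <cstdio>. The sketch below uses them to jump straight to a fixed-length record; the Account layout and the function names are hypothetical. The iostream counterparts are std::ios::beg, std::ios::cur, and std::ios::end, used with seekg and seekp.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical fixed-length record.
struct Account {
    int id;
    int balance;
};

// Writes `count` records to a new file.
bool write_records(const char* path, const Account* recs, std::size_t count) {
    std::FILE* fp = std::fopen(path, "wb");
    if (!fp) return false;
    bool ok = std::fwrite(recs, sizeof(Account), count, fp) == count;
    std::fclose(fp);
    return ok;
}

// Reads record number n (0-based) directly: the byte offset is measured
// from the beginning of the file (SEEK_SET).
bool read_record(const char* path, long n, Account& out) {
    std::FILE* fp = std::fopen(path, "rb");
    if (!fp) return false;
    bool ok = std::fseek(fp, n * static_cast<long>(sizeof(Account)), SEEK_SET) == 0
           && std::fread(&out, sizeof out, 1, fp) == 1;
    std::fclose(fp);
    return ok;
}

// Reads the last record: the offset is negative, measured from the end
// of the file (SEEK_END).
bool read_last(const char* path, Account& out) {
    std::FILE* fp = std::fopen(path, "rb");
    if (!fp) return false;
    bool ok = std::fseek(fp, -static_cast<long>(sizeof(Account)), SEEK_END) == 0
           && std::fread(&out, sizeof out, 1, fp) == 1;
    std::fclose(fp);
    return ok;
}

// Number of records, from the file length found by seeking to the end.
long record_count(const char* path) {
    std::FILE* fp = std::fopen(path, "rb");
    if (!fp) return -1;
    std::fseek(fp, 0, SEEK_END);
    long bytes = std::ftell(fp);
    std::fclose(fp);
    return bytes / static_cast<long>(sizeof(Account));
}
```

SEEK_CUR works the same way with an offset relative to the current position, for example to skip over one record while scanning.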

INDEXED SEQUENTIAL FILE ORGANIZATION

A file that is loaded in key sequence but can be accessed directly by use of one or more indices is known as an indexed sequential file

A sequential data file that is indexed is called an indexed sequential file

An indexed sequential file is one solution for improving the speed of retrieving a target record

An indexed file contains records ordered by a record key

Each record contains a field that contains the record key
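One way to picture this is a single-level index that maps each record key to the byte offset of its record in the data file. The sketch below keeps the index in memory as a std::map, which is an assumption for brevity; the Item layout and the function names are likewise hypothetical.

```cpp
#include <fstream>
#include <map>
#include <string>

// Hypothetical fixed-length record; `key` is the record key.
struct Item {
    int key;
    int qty;
};

// Builds the index with one sequential pass: key -> byte offset of the record.
std::map<int, std::streamoff> build_index(const std::string& path) {
    std::map<int, std::streamoff> index;
    std::ifstream in(path, std::ios::binary);
    Item it;
    std::streamoff pos = 0;
    while (in.read(reinterpret_cast<char*>(&it), sizeof it)) {
        index[it.key] = pos;
        pos += static_cast<std::streamoff>(sizeof it);
    }
    return index;
}

// Direct access: look the key up in the index, then seek straight to the record.
bool fetch(const std::string& path, const std::map<int, std::streamoff>& index,
           int key, Item& out) {
    std::map<int, std::streamoff>::const_iterator hit = index.find(key);
    if (hit == index.end()) return false;
    std::ifstream in(path, std::ios::binary);
    in.seekg(hit->second, std::ios::beg);
    return static_cast<bool>(in.read(reinterpret_cast<char*>(&out), sizeof out));
}
```

The data file can still be read from start to end in key order, while fetch avoids the pass through all preceding records.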

Advantages

Accessing any record is more efficient than in sequential file organization

A large amount of data can be stored using this type of file organization

Disadvantages

Often more than one index is needed, which occupies a large storage area

Types of Indexes (CONTD…)

Primary index

It is an index ordered in the same way as the data file, which is sequentially ordered according to a key

The indexing field is equal to this key

Secondary index

An index that is defined on a non-ordering field of the data file

In this case, the indexing field need not contain unique values

Types of Indexes

Clustering indexes

A data file can be associated with at most one primary index and several secondary indexes

The single-level indexing structure is the simplest one: it is a file whose records are pairs, each containing a key and a pointer

Structure of Index Sequential File

An index sequential file consists of three areas:

A primary storage area: this includes some unused space to allow for additions made to the data

A separate index or indexes: each query will reference this index first; it will redirect the query to the part of the data file in which the target record is saved

An overflow area: this is an optional separate overflow area

Characteristics of Indexed Sequential File

Records are stored sequentially, but an index file is prepared for accessing a record directly

Records can be accessed randomly

The file has records and also the index

Magnetic tape is not suitable for index sequential storage

The index is the address of the physical storage of a record

When only a few records are required or accessed at random, index sequential organization is better

It is a faster access method

The additional overhead is maintaining the index

Index sequential files are popularly used in many applications, such as digital libraries

LINKED ORGANIZATION

In linked organization, the physical sequence of records is different from the logical sequence of records

The next logical record is obtained by following a link value from the present record

Multilist Files

Coral Rings

Inverted Files

Cellular Partitions

Multilist Files

To make searching easy, several indexes are maintained as per primary key and secondary keys, one index per key

A record may be present in different lists, one per key

Multilist Files

Consider the following file of office staff

Staff ID   Occupation   Salary   Record
106        Clerk        5000     A
150        Accountant   4000     B
360        Clerk        3000     C
400        Accountant   3500     D
700        Clerk        2000     E

Multilist Files

We can maintain an index on the staff ID

We can group staff IDs into ranges such as 101–300, 301–600, 601–900, etc.

All the records with a staff ID in the same range are then linked together, as shown in the figure

Sample multilist file

Staff range   Link
101-300       rec A -> rec B
301-600       rec C -> rec D
601-900       rec E
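The linking by staff ID range can be sketched in memory as follows, using the office staff records from the earlier slide. Storing the link as a vector position, the tail field used while building, and all of the names are assumptions for illustration; in a real file the link would be a record address.

```cpp
#include <string>
#include <vector>

// One staff record plus a link to the next record in the same ID range.
struct Staff {
    int id;
    std::string occupation;
    int salary;
    char label;   // record name used on the slides (A to E)
    int next;     // position of the next record in the same range, -1 at the end
};

// One index entry: an ID range and the position of the first record in its list.
struct RangeEntry {
    int low;
    int high;
    int head;     // -1 while the list is empty
    int tail;     // last record linked so far (only needed while building)
};

std::vector<Staff> sample_staff() {
    return {
        {106, "Clerk",      5000, 'A', -1},
        {150, "Accountant", 4000, 'B', -1},
        {360, "Clerk",      3000, 'C', -1},
        {400, "Accountant", 3500, 'D', -1},
        {700, "Clerk",      2000, 'E', -1},
    };
}

// Threads every record onto the list of the range that contains its staff ID.
std::vector<RangeEntry> build_multilist(std::vector<Staff>& file) {
    std::vector<RangeEntry> index = {
        {101, 300, -1, -1}, {301, 600, -1, -1}, {601, 900, -1, -1}};
    for (int i = 0; i < static_cast<int>(file.size()); ++i) {
        file[i].next = -1;
        for (RangeEntry& e : index) {
            if (file[i].id < e.low || file[i].id > e.high) continue;
            if (e.head == -1) e.head = i;        // first record of this range
            else file[e.tail].next = i;          // link from the previous record
            e.tail = i;
            break;
        }
    }
    return index;
}

// Follows the links of one range and returns the record labels in list order.
std::string follow(const std::vector<Staff>& file, const RangeEntry& e) {
    std::string labels;
    for (int i = e.head; i != -1; i = file[i].next) labels += file[i].label;
    return labels;
}
```

Following the three ranges yields the lists A, B and C, D and E, matching the sample multilist file above.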

Coral Rings

In this organization, a doubly linked multilist structure is used

Each list is a circular list with a head node

Sample doubly linked list (figure: the Clerk list with records A, C, and E, connected through Alink and Blink fields)

Coral Rings

The ‘Alink’ field is used to link all records with the same key value

‘Blink’ is a back pointer for some records, and for others it is a pointer to the head node

Owing to these back pointers, deletion is easy without going to the start

Indexes are maintained as per multilists

Inverted Files

In multilist files, records with the same key value are linked together and the links are kept in each record

But in the inverted files, the link information is kept in the index itself

The indices for a fully inverted file are shown in the figure

Staff ID index
106   A
150   B
360   C
400   D
700   E

Occupation index
Accountant   B, D
Clerk        A, C, E

Salary index
2000   E
4000   B, C, D
6000   A

Inverted Files

The inversion process is associated with the information of the inverted list

In inverted files, a record is accessed in two steps: first, the indices are searched to obtain a list of required records; second, the records are retrieved using these lists

The number of disk accesses required is equal to the number of records being retrieved plus the number needed to process the indices

In inverted files, only the index structures are important
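The two-step access can be sketched with an occupation index kept entirely outside the records. The in-memory vector standing in for the data file, the Employee layout, and the function names are assumptions for illustration.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// A record carries no link fields; all link information lives in the index.
struct Employee {
    int id;
    std::string occupation;
    int salary;
};

// Inverted index: key value -> positions of every record holding that value.
using InvertedIndex = std::map<std::string, std::vector<std::size_t>>;

InvertedIndex invert_by_occupation(const std::vector<Employee>& file) {
    InvertedIndex index;
    for (std::size_t pos = 0; pos < file.size(); ++pos)
        index[file[pos].occupation].push_back(pos);
    return index;
}

// Step 1: search the index to obtain the list of required records.
// Step 2: retrieve exactly those records using the list.
std::vector<Employee> find_by_occupation(const std::vector<Employee>& file,
                                         const InvertedIndex& index,
                                         const std::string& occupation) {
    std::vector<Employee> result;
    InvertedIndex::const_iterator hit = index.find(occupation);
    if (hit == index.end()) return result;       // no record has this value
    for (std::size_t pos : hit->second)
        result.push_back(file[pos]);
    return result;
}
```

Only the records named in the index entry are touched in step 2, which is where the count of one access per retrieved record plus the index accesses comes from.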

Cellular Partitions

To decrease file search time, the storage media may be divided into cells

A cell may be an entire disk or a cylinder

Lists are localized to lie within a cell

If a cylinder is used as a cell, then all records on the same cylinder may be accessed without moving the read/write heads

A multilist that spans several different cylinders is divided into several small lists, each of which is stored on a single cylinder

Multilist structure with cellular partitioning

Student ID is the primary key; T. ID is the secondary key

Position   Student ID   T. ID   Link
1          100          A
2          200          B       Null
3          300          C       Null
4          400          A       Null
1          500          D
2          600          B       Null
3          700          A       Null
4          800          D       Null
1          900          E       Null
2          1000         C       Null
3          1100         D       Null
4          1200         A       Null

End of Chapter 14 …!