Extendible Hashing Report



DESCRIPTION

The idea behind the project is to demonstrate all possible operations on the records stored in a file. The first part shows the importance of buffering in this process. The IOBuffer class hierarchy is used to implement buffers for record storage; operations such as addition of a record, deletion of a record, search for records, display of all records, etc., are performed on the file. The buffer classes have the Pack and Unpack operations implemented. The second part of the project takes up the specific technique of hashing, which is used to improve performance by reducing disk seeks to a minimum. The hash driver creates a hashed index for the student record file created in Part I of the project. Further, the dynamic hash driver uses the concept of extendible hashing to create buckets and to support doubling of the directory, recursive collapse, redistribution and try-combine operations. The project focuses on the use of object-oriented concepts of C++ such as class hierarchy, inheritance, code reuse, virtual functions, etc. The project makes extensive use of C++ file streams for easier development with greater efficiency. The GUI (Graphical User Interface) further enhances the project by providing a friendly environment; data validation and informing the user with suitable error messages are also ensured.



Extendible Hashing Vinayak Hegde Nandikal

INTRODUCTION

A File structure is a combination of representations for data in files and of operations for

accessing the data. A File structure application allows us to read, write, and modify data. It might

also support finding the data that matches some search criteria or reading through the data in

some particular order. An improvement in file structure design may make an application

hundreds of times faster. The details of representation of data and the implementation of the

operations determine the efficiency of the file structure for a particular application.

The fundamental operations of file systems are: open, create, close, read, write, and seek.

Each of these operations involves the creation or use of a link between a physical file stored on a

secondary device and a logical file that represents a program’s more abstract view of the same

file. When the program describes an operation using the logical file name, the equivalent

physical operation gets performed on the corresponding physical file.

Disks are very slow compared to memory. On the other hand, disks provide enormous

capacity at much less cost than memory. They also keep the information stored on them when

they are turned off. The tension between a disk’s relatively slow access time and its enormous,

nonvolatile capacity is the driving force behind file structure design. Good file structure design

will give us access to all the capacity without making our applications spend a lot of time waiting

for the disk. A tremendous variety in the types of data and in the needs of applications makes file

structure design very important.

The problems that researchers struggle with reflect the same issues that one confronts in

addressing any substantial file design problem. Working through the approaches to major file

design issues shows one a lot about how to approach new design problems. Goals of research

and development in file structures are:

Get the information with one access to the disk.

Structures that allow us to find the target information with as few accesses as possible.

File structures that group information so that we get everything we need with only one trip to the disk.

Dept of ISE 2007-08


SECTION 1

REQUIREMENTS SPECIFICATIONS

Requirements for Part 1:

In part 1, we are required to create a student record file. The record consists of the

following fields:

1. University Serial Number

2. Name

3. Address

4. Semester

5. Branch

There should be methods to initialize and assign a record. Also, we should be able to add

a new record, delete a record and modify a record. The number of fields is fixed, but the lengths of the fields are variable.

Requirements for Part 2:

In the second part, we need to develop a hashed index for the student record file

developed in Part 1. The key for the index is the student USN (University Serial Number). We

need to hash the keys and then store the key-reference pairs for further access. Once we develop

a hashed index, this index is used for the retrieval of records.

We need to provide the following functionalities:

1. Add a record.

2. Delete a record.

3. Modify a record.

Also, we need to demonstrate the doubling of the directory size and the space utilization

of the buckets.


Hardware Requirements:

PROCESSOR : Pentium Processor

PRIMARY MEMORY : 64 MB and above.

SECONDARY MEMORY : 1 GB and above.

Software Requirements:

PLATFORM: Microsoft Windows

COMPILER: Turbo C++

LANGUAGE USED: OOP with C++

External Interfaces

User interface GUI (Graphical User Interface) is provided.


SECTION 2

INTRODUCTION TO FILE STRUCTURES

Different Types of Access Methods

The different types of access methods in file structures are:

Indexing

Cosequential processing model

AVL trees

B-trees

B+ trees

Hashing

Indexing:

Indexing is a way of structuring a file so that records can be found by key. This is an alternative to sorting. Unlike sorting, indexing permits us to perform binary searches for keys in

variable-length record files. If the index can be held in memory, record addition, deletion, and

retrieval can be done much more quickly with an indexed, entry-sequenced file than with a

sorted file. Indexes can do much more than merely improve on access time: they can provide us

with new capabilities that are inconceivable with access methods based on sorted data records.

The most exciting new capability involves the use of multiple secondary indexes.

Cosequential processing model:

The cosequential processing model can be applied to problems that involve operations

such as matching and merging (and combination of these) on two or more sorted input files. In

its most complete form, the model depends on certain assumptions about the data in the input

files. Given these assumptions, we can describe the processing components of the model and

define pure virtual functions that represent those components.

Cosequential operations involve the coordinated processing of two or more sequential

lists to produce a single output list. Sometimes the processing results in a merging, or union, of

the items in the input lists; sometimes the goal is a matching or intersection, of the items in the


lists; and other times the operation is a combination of matching and merging. These kinds of

operations on sequential lists are the basis of a great deal of file processing.

AVL trees:

It is a self-adjusting binary tree structure. An AVL tree is height-balanced: the allowed difference between the heights of any two subtrees is one.

The important feature of an AVL tree is:

By setting a maximum allowable difference in the height of any two sub trees,

AVL trees guarantee a minimum level of performance in searching.

B-trees:

B-trees are multilevel indexes that solve the problem of linear cost of insertion and

deletion. This is what makes B-trees so good, and why they are now the standard way to

represent indexes. The solution is twofold. First, don't require that the index records be full. Second, when a record overflows, split it into two records, each half full. Deletion takes the similar strategy of merging two records into a single record when necessary.

B+ trees:

The disadvantage of the B-tree is that the file cannot be accessed sequentially with efficiency.

Adding a linked list structure at the bottom level of B-tree solved this problem. The combination

of B-tree and sequential linked list gave rise to B+ trees.

Hashing:

It is a good way of retrieving records in one access for files that do not change greatly with time, but it does not work well with volatile, dynamic files. A hash function is like a black box that produces an address every time a key is dropped in. Hashing is like indexing in that it involves associating a key with a relative record address.
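The black-box idea can be sketched in a few lines; the fold-and-mod function below is an illustrative assumption, not the project's own hash function (which appears in the Source Code section):

```cpp
#include <cstring>

// Hashing in one picture: a key goes in, a relative record address
// comes out. ToyHash is an illustrative stand-in, not the project's code.
int ToyHash(const char *key, int tableSize)
{
    int sum = 0;
    for (int j = 0; key[j] != '\0'; j++)
        sum = (sum + 100 * key[j]) % 19937;  // fold characters; 19937 is prime
    return sum % tableSize;                  // address in [0, tableSize)
}
```

The same key always yields the same address, and every address falls within the file's address space.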


SECTION 3

WHY C++?

Object-oriented toolkit:

Making file structures usable in application development requires turning this conceptual toolkit into application programming interfaces: collections of data types and operations that can be used in applications. We have chosen to employ an object-oriented approach in which data types and operators are presented in a unified fashion as class definitions.

C++ is used in the design of the file structure. C++ is an object-oriented programming language. Object-oriented programming supports the integration of data contents and behavior into a single design. A C++ class definition contains both data and function members and allows programmers to control precisely the manipulation of objects. These classes are also an extensive presentation of the features of C++. These features include:

Class Definition

Constructors

Public and private sections

Operator overloading

These features enhance the programmer's ability to control the behavior of objects.


SECTION 4

PROJECT PART I

Problem Definition:

Design a class called student. Each object of this class represents information about a

single student. Members should be included for student USN (University Serial Number), Name,

Address, Semester, Branch, etc. Methods should be included for initialization, assignment and modification of values. Provide methods to write the member values to the output stream, suitably formatted. Add methods to store objects as records in a file and to load the objects from the file using buffering; design a suitable IOBuffer class hierarchy. Add Pack and Unpack methods to class student. For all the mini projects, assume a fixed-field, variable-length record with delimiter record structure for the data file.

Specification And Design:

The part 1 of the project deals with creating a student record file. The record consists of

the following fields as data members.

1. University Serial Number.---->USN

2. Name ---->name

3. Address ---->addr

4. Branch ---->brch

5. Semester. ---->sem

We have provided the following member functions for the operations on the file.

1. Creating a record ---->insert()

2. Assigning a record. ---->assign()

3. Searching a record ---->search()

4. Deleting a record. ---->delet()

5. Modifying a record. ---->modify()

6. Displaying a record ---->display()


insert() function is used to insert the record of one student at a time.

(Flowchart: Start -> Accept USN -> USN duplicate? If yes, Stop; if no, Accept Data -> Store the data in data.dat -> Stop.)

assign() function is used to assign default values to the data members. Here we assign the NULL value to all data members as the default.

search() function is used to search for a record based on the key value (USN).

(Flowchart: Start -> Accept USN -> Read a record from data.dat and unpack the USN -> Compare the USN with the key entered -> on a match, display the record; on a mismatch, read the next record until EOF -> Stop.)


delet() function is used to delete a student's record based on the key value.

(Flowchart: Start -> Accept USN -> Read a record from data.dat and unpack the USN -> Compare the USN with the key entered -> on a match, place a * at the beginning of the record to mark it deleted; on a mismatch, read the next record until EOF -> Stop.)

modify() function is used to modify the record based on the key field entered.

(Flowchart: Start -> Accept USN -> Read a record from data.dat and unpack the USN -> Compare the USN with the key entered -> on a match, accept new values from the user and store the newly accepted data on disk; on a mismatch, read the next record until EOF -> Stop.)

display() function is used to display the records in the file.

(Flowchart: Start -> Read and display each record -> repeat until EOF -> Stop.)


Algorithm for Part 1:

The steps of insertion are as follows:

Accept the USN from the user

Check for duplication; if a duplicate, display an error, else continue.

Accept the data from user and check for constraints.

By making use of pack() function, pack the data and put it on the buffer.

By making use of write() function, write the packed data from buffer to disk.

The steps of searching are as follows:

Accept the USN from user.

By making use of read() function, read the records from the disk to buffer.

By making use of the unpack() function, unpack only the key and compare it with the key the user has entered. If it matches, unpack the whole record and display it.

If the match does not occur, go to next record until end of file.

The steps of deletion are as follows:

Accept the key value from the user.

Read the record to the buffer using read().

Unpack the USN from the buffer to RAM and compare the USN with the key entered.

If it matches, use tombstones to indicate the record has been deleted.

If it does not match, go to the next record until end of file.
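The tombstone idea in these steps can be sketched with an in-memory record list; the record layout (delimited fields, USN first) and the helper below are assumptions for illustration, not the project's code:

```cpp
#include <string>
#include <vector>

// Tombstone deletion sketch: rather than shifting records, deletion
// overwrites the first byte of the matching record with '*'.
// Assumed layout: each record is a '|'-delimited string, USN first.
bool TombstoneDelete(std::vector<std::string> &records, const std::string &usn)
{
    for (size_t i = 0; i < records.size(); i++) {
        // skip records already marked deleted
        if (!records[i].empty() && records[i][0] == '*') continue;
        // the key is everything up to the first '|' delimiter
        if (records[i].substr(0, records[i].find('|')) == usn) {
            records[i][0] = '*';   // tombstone: mark in place, don't move
            return true;
        }
    }
    return false;                  // key not found before EOF
}
```

A tombstoned record stays in the file and is skipped by later scans, which is why a separate reclamation or compaction pass is eventually needed.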

The steps of modification are as follows:

Accept the key value from the user.

Read the record to the buffer using read().

Unpack the key field from the buffer to RAM and compare it with the key entered.

If it matches,

Accept the new value from the user.

Write the packed data from the buffer to the disk.

If the key doesn't match, check the next record; repeat until EOF, then display an error message.


Steps for displaying the records are as follows:

Read the first set of records from the disk to the buffer.

Unpack the records in the buffer and put them into RAM.

Display the record and repeat until the end of file.

We have provided the following buffer operations.

read()-from file to buffer

write()-from buffer to file

pack()-from RAM to buffer

unpack()-from buffer to RAM

Figure: Pack(), Unpack(), Read() and Write() operations (RAM <-> buffer <-> file).
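The pack/unpack round trip can be sketched as follows; DelimPack and DelimUnpack are illustrative stand-ins for the project's buffer classes, assuming '|' as the field delimiter:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Pack: RAM -> buffer. Joins the record's fields with the '|' delimiter.
std::string DelimPack(const std::vector<std::string> &fields)
{
    std::string buffer;
    for (size_t i = 0; i < fields.size(); i++) {
        if (i) buffer += '|';      // delimiter separates consecutive fields
        buffer += fields[i];
    }
    return buffer;
}

// Unpack: buffer -> RAM. Splits the buffer back into fields at each '|'.
std::vector<std::string> DelimUnpack(const std::string &buffer)
{
    std::vector<std::string> fields;
    std::stringstream ss(buffer);
    std::string field;
    while (std::getline(ss, field, '|'))
        fields.push_back(field);
    return fields;
}
```

Unpacking what was packed recovers the original fields, which is the invariant the IOBuffer hierarchy must preserve.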

Analysis And Design of Buffer Hierarchy:

The read and write file operations need a buffer, which is developed using a hierarchy of

classes. The highest class in the hierarchy is the class IOBuffer. Since we know the number of

fields and since the lengths of the fields are variable, we use the Delimited Text Buffer class.

Here, we write the length of the record first and then the record itself. The fields are separated

using a delimiter. There are methods that pack the fields into the buffer and there are methods

that unpack the fields from the buffer. The access to the records of the file is sequential. We

also provide for addition of records and deletion of records. The fields of records can be

assigned a specific value and records can also be modified. In general we have the following

hierarchy:

IOBUFFER
  VARIABLE LENGTH BUFFER and FIXED LENGTH BUFFER
    DELIMITED FIELD BUFFER, LENGTH FIELD BUFFER and FIXED FIELD BUFFER

The hierarchy is shown in the diagram.

Figure: Buffer Class Hierarchy

The field packing and unpacking operations, in their various forms, can be encapsulated into C++ classes. The three field representation strategies (delimited, length-based and fixed-length) are implemented in different classes. Class IOBuffer does not include any implementation methods: it is an abstract class, and hence no object of it can be declared. All the necessary read, write, pack and unpack operations are provided in classes down the hierarchy.

Inheritance allows related classes to share members. We use this powerful mechanism provided by C++ for buffering. Object-oriented design of the classes guarantees that operations on objects are performed correctly.

(Figure content: IOBUFFER holds the character array for the buffer; VARIABLE LENGTH BUFFER and FIXED LENGTH BUFFER add the read and write operations; DELIMITED FIELD BUFFER, LENGTH FIELD BUFFER and FIXED FIELD BUFFER add the pack and unpack operations.)


SECTION 5

PROJECT PART II

Problem Definition:

Develop a hashed index of the student record file with the USN as the key. Write a driver

program to create a hashed file from an existing student record file. Demonstrate the recursive

collapse of directory over more than one level.

1. Demonstrate doubling of the directory size

2. Display the space utilization for buckets and directory size.

Specification And Design:

The second part of the project deals with providing O(1) access to the records of the file. For this, we need to develop an index to the file. The USN is used as the key. To provide O(1) access we need to hash the index. There are two approaches to hashing:

1. Static hashing

2. Dynamic hashing.

Static hashing is very good for files which do not change frequently. But real files change frequently, and the performance of static hashing deteriorates.

Dynamic hashing copes with this problem. In this approach, we hash the key and use only a part of the hashed address. This is called the "use more as we need more" approach.

We also use what are called "buckets". Buckets are nothing but containers of key-reference pairs. All the keys in a bucket have the same starting address. Once a bucket is full, we split the bucket into two and distribute the keys among the buckets. To keep track of the


buckets, we develop another structure, a DIRECTORY. A directory maintains an array of the

bucket locations.

Thus, we hash a key and get a part of the hashed address depending on the population of

the records. Then we use this part of the hashed address as an index into the array of buckets and

find its location. We then directly seek to that location and get the record.
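The lookup path just described can be sketched with an in-memory directory; the depth values and bucket addresses below are illustrative, not taken from the project:

```cpp
#include <vector>

// "Use more as we need more" in miniature: only the low `depth` bits of
// the hashed key index the directory; the directory cell holds the
// bucket address to seek to on disk.
int DirIndex(int hashVal, int depth)
{
    return hashVal & ((1 << depth) - 1);   // keep `depth` low-order bits
}

int BucketFor(const std::vector<int> &directory, int hashVal, int depth)
{
    return directory[DirIndex(hashVal, depth)];  // seek target for the record
}
```

When the file grows and the directory doubles, `depth` increases by one and one more bit of the same hash value comes into play; the hash itself never has to be recomputed.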

The main design issue here is whether we provide static hashing, which uses a prespecified size of address space, or dynamic hashing. Dynamic hashing is very useful for files that change frequently.

We have decided to implement extendible hashing, which uses a part of the hashed address depending on the size of the file. This is called the use-more-as-you-need-more approach. We do not hash the data file itself. Instead, we only hash the index. The index consists of key-record address pairs.

Buckets are used to resolve the collision problem. Here one address can hold more than one

record or index entry. We also use Directories to keep track of the buckets. The bucket consists

of key-reference pairs. This means that the buffer class that needs to be used is fixed length

buffer. We keep the addresses of the buckets in memory using arrays.

Buckets are filled with key-reference pairs as and when the data records are inserted.

When a bucket gets filled, the bucket is split into two and the records are redistributed. This

means that we are using more of the hashed address as and when the file size increases. Also,

we keep track of deletions. A deletion may trigger the collapse of the directory, as fewer buckets will be needed. Thus the hashing technique becomes truly dynamic.

Structure of the Project

The project is basically required to do any operation based on hashing the primary key USN. Hence it all begins by hashing the key into a valid address. The address points to the directory entry. The directory consists of addresses of buckets. The bucket in turn contains the address of the record in the STUDENT.DAT file.


The diagram below shows what our project does. The general steps are:

A given key is hashed to a directory address.

The directory cell contains the address of the bucket.

The bucket contains the address of the record in the student file.

Figure: Structure of the hashed index (KEY -> HASH -> DIRECTORY -> BUCKET -> STUDENT FILE).

Creating the addresses:

The MakeAddress function extracts a portion of the full hashed address. This function also reverses the order of the bits in the hashed address, making the lowest-order bit of the hash address the highest-order bit of the value used in extendible hashing, because the least significant bits of integer hash values tend to have more variation than the high-order bits.

The Hash function returns an integer hash value for the key, in a 15-bit range.

Splitting in Buckets:

Method SPLIT of class Bucket divides keys between an existing bucket and a new

bucket. If necessary, it doubles the size of the directory to accommodate the new bucket.



Directory and Bucket Operations:

The INSERT method first searches for the key. SEARCH arranges for the CurrentBucket

member to contain the proper bucket for the key. The FIND method determines where the key

would be if it were in the structure.

Method DoubleSize() and InsertBucket():

The Insert method manages record addition. If the key already exists, Insert returns immediately. If the key does not exist, Insert calls Bucket::Insert for the bucket into which the

key is to be added. If the bucket is full, Bucket::Insert calls Split to handle the task of splitting

the bucket. If the directory needs to be larger, Split calls method Directory::DoubleSize to double

the directory size.

Finding Buddy Buckets:

The method works by checking to see whether it is possible for there to be a buddy

bucket. The next test compares the number of bits used by the bucket with the number of bits

used in the directory address space. A pair of buddy buckets is a set of buckets that are

immediately descendents of the same node in the tries. This method retUSNs a buddy bucket or -

1 if none found.
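Because buddies descend from the same trie node, their directory addresses differ only in the lowest bit, so the computation reduces to an XOR guarded by the two tests described above. This is a sketch of the idea, not the project's exact method:

```cpp
// Sketch of the buddy-bucket test: a bucket can only have a buddy when
// the directory has grown past a single cell and the bucket uses as
// many address bits as the directory itself.
int FindBuddySketch(int dirDepth, int bucketDepth, int sharedAddress)
{
    if (dirDepth == 0) return -1;           // one-cell directory: no buddy
    if (bucketDepth < dirDepth) return -1;  // bucket spans several cells
    return sharedAddress ^ 1;               // flip the low-order bit
}
```

Flipping the bit is symmetric: each bucket of the pair names the other as its buddy.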

Collapsing the Directory:

Method Directory::Collapse begins by making sure that we are not at the lower limit of directory size. By treating the special case of a directory with a single cell here, at the start of the function, we simplify subsequent processing: with the exception of this case, all directory sizes are evenly divisible by 2. The test to see whether the directory can be collapsed consists of examining each pair of directory cells to see if they point to different buckets. As soon as we find such a pair, we know we cannot collapse the directory, and the method returns.

Deletion operations:


We first find the key to be deleted. If we cannot find it, we return failure; if it is found, we call Bucket::Remove to remove the key from the bucket and return the value reported back from that method.

Space utilization:

It is defined as the ratio of the actual number of records to the total number of records that could be stored in the allocated space. The expected average utilization is 69%. Space utilization can be calculated using the formula:

Utilization = r / (b * N)

where r is the number of records, b is the block size, and N is the number of blocks.
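Applying the formula is a one-liner; the example numbers below are illustrative and happen to land near the expected 69% figure:

```cpp
// Space utilization: records stored divided by record slots allocated,
// i.e. r / (b * N) from the formula above.
double Utilization(int numRecords, int blockSize, int numBlocks)
{
    return (double)numRecords / (blockSize * numBlocks);
}
```

For instance, 22 records in 8 buckets of 4 slots each gives 22/32 = 0.6875, close to the 69% average.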

Source Code


int MakeAddress (char * key, int depth)
{
    int retval = 0;
    int mask = 1;
    int hashVal = Hash(key);
    for (int j = 0; j < depth; j++)
    {
        retval = retval << 1;
        int lowbit = hashVal & mask;
        retval = retval | lowbit;
        hashVal = hashVal >> 1;
    }
    return retval;
}

int Hash (char * key)
{
    int sum = 0;
    int len = strlen(key);
    if (len % 2 == 1) len++; // make len even
    for (int j = 0; j < len; j += 2)
        sum = (sum + 100 * key[j] + key[j+1]) % 19937;
    return sum;
}

class Bucket : public TextIndex
{
protected:
    Bucket (Directory & dir, int maxKeys = defaultMaxKeys);
    int Insert (char * key, int recAddr);
    int Remove (char * key);
    Bucket * Split ();
    int NewRange (int & newStart, int & newEnd);
    int Redistribute (Bucket & newBucket);
    int FindBuddy (); // find the bucket that is the buddy of this one
    int TryCombine (); // attempt to combine buckets
    int Combine (Bucket * buddy, int buddyIndex);
    int Depth;
    int BucketAddr;
    ostream & Print (ostream &);
    friend class Directory;
    friend class BucketBuffer;
};

class BucketBuffer : public TextIndexBuffer
{
public:
    BucketBuffer (int keySize, int maxKeys);
    int Pack (const Bucket & bucket);
    int Unpack (Bucket & bucket);
};

class Directory
{
public:
    Directory (int maxBucketKeys = -1);
    ~Directory ();
    int Open (char * name);
    int Create (char * name);
    int Close ();
    int Insert (char * key, int recAddr);
    int Remove (char * key);
    int Search (char * key); // return RecAddr for key
    int ReSize (void);
    int Reduction (void);
    void spaceutil (char * myfile);
    ostream & Print (ostream & stream);
protected:
    int Depth;
    int NumCells;
    int * BucketAddr;
    int DoubleSize ();
    int Collapse ();
    int InsertBucket (int bucketAddr, int first, int last);
    int RemoveBucket (int bucketIndex, int depth);
    int Find (char * key);
    int StoreBucket (Bucket * bucket);
    int LoadBucket (Bucket * bucket, int bucketAddr);
    int MaxBucketKeys;
    BufferFile * DirectoryFile;
    LengthFieldBuffer * DirectoryBuffer;
    Bucket * CurrentBucket;
    BucketBuffer * theBucketBuffer; // buffer for buckets
    BufferFile * BucketFile;
    int Pack () const;
    int Unpack ();
    Bucket * PrintBucket;
    friend class Bucket;
};

int Directory::Insert (char * key, int recAddr)
{
    int found = Search (key);
    if (found == -1) return CurrentBucket->Insert(key, recAddr);
    return 0; // key already in directory
}

int Directory::Search (char * key)
{
    int bucketAddr = Find(key);
    LoadBucket (CurrentBucket, bucketAddr);
    return CurrentBucket->Search(key);
}

Bucket * Bucket::Split ()
{
    int newStart, newEnd;
    if (Depth == Dir.Depth || Dir.NumCells == 1)
    {
        doublesizetrue = 1;
        Dir.DoubleSize();
    }
    Bucket * newBucket = new Bucket (Dir, MaxKeys);
    Dir.StoreBucket (newBucket);
    NewRange (newStart, newEnd);
    Dir.InsertBucket (newBucket->BucketAddr, newStart, newEnd);
    Depth++;
    newBucket->Depth = Depth;
    Redistribute (*newBucket);
    Dir.StoreBucket (this);
    Dir.StoreBucket (newBucket);
    return newBucket;
}

int Directory::DoubleSize ()
{
    int newSize = 2 * NumCells;
    int * newBucketAddr = new int[newSize];
    for (int i = 0; i < NumCells; i++)
    {
        newBucketAddr[2*i] = BucketAddr[i];
        newBucketAddr[2*i+1] = BucketAddr[i];
    }
    delete [] BucketAddr;
    BucketAddr = newBucketAddr;
    Depth++;
    NumCells = newSize;
    return 1;
}

int Bucket::FindBuddy ()
{
    if (Dir.Depth == 0) return -1;
    if (Depth < Dir.Depth) return -1;
    int sharedAddress = MakeAddress(Keys[0], Depth);
    return sharedAddress ^ 1;
}

int Directory::Collapse ()
{
    if (Depth == 0) return 0;
    for (int i = 0; i < NumCells; i += 2)
        if (BucketAddr[i] != BucketAddr[i+1])
            return 0;
    int newSize = NumCells / 2;
    int * newAddrs = new int[newSize];
    for (int j = 0; j < newSize; j++)
        newAddrs[j] = BucketAddr[j*2];
    delete [] BucketAddr;
    BucketAddr = newAddrs;
    Depth--;
    collapsetrue = 1;
    NumCells = newSize;
    return 1;
}

int Bucket::TryCombine ()
{
    int result;
    int buddyIndex = FindBuddy ();
    if (buddyIndex == -1) return 0;
    int buddyAddr = Dir.BucketAddr[buddyIndex];
    Bucket * buddyBucket = new Bucket (Dir, MaxKeys);
    Dir.LoadBucket (buddyBucket, buddyAddr);
    if (NumKeys + buddyBucket->NumKeys > MaxKeys) return 0;
    Combine (buddyBucket, buddyIndex);
    result = Dir.Collapse ();
    if (result) TryCombine ();
    return 1;
}

int Bucket::Remove (char * key)
{
    int result = TextIndex::Remove (key);
    if (!result) return 0;
    TryCombine ();
    Dir.StoreBucket (this);
    return 1;
}

int Directory::Remove (char * key)
{
    int bucketAddr = Find(key);
    LoadBucket (CurrentBucket, bucketAddr);
    return CurrentBucket->Remove (key);
}

void Directory::spaceutil (char * myfile)
{
    fstream file (myfile, ios::in);
    float numrecs = 0, util;
    char ch;
    while (1)
    {
        file >> ch;
        if (file.fail())
            break;
        else if (ch == '#') // '#' marks the start of a record
            numrecs++;
    }
    file.close();
    int cnt = 1;
    for (int i = 0; i < NumCells-1; i++) // counts number of buckets
    {
        if (BucketAddr[i+1] == BucketAddr[i])
            continue;
        cnt++;
    }
    util = (numrecs / (cnt * 4)) * 100; // utilization = r / (bN), b = 4
    cout << "\nRECORDS IN THE FILE = " << numrecs << "\n";
    cout << "\n\nBUCKETS USED BY THE RECORDS = " << cnt;
    cout << "\n\n\nDIRECTORY SIZE IS = " << NumCells;
    cout << "\n\n\nUTILIZATION OF SPACE = " << util << "%\n\n";
    // for the directory
    float x;
    x = pow (numrecs, 1.25);
    x = x * 0.98;
    cout << "\nUTILIZATION OF SPACE BY THE DIRECTORY = " << x << " bytes";
}

void Insert(char *myfile)

{

Student s;

char str[30];

setcolor(BLACK);

settextstyle(2,0,5);

outtextxy(230,100,"ENTER USN NUMBER :");

strget(420,100,s.Usn,10);

strupr(s.Usn);

int res = Dir.Search(s.Usn);

24Dept of ISE 2007-08

Page 25: Extendible Hashing Report

Extendible Hashing Vinayak Hegde Nandikal

    if (res != -1)
    {
        outtextxy(400, 400, "This reg-no already exists!!!");
        outtextxy(400, 410, "Press Any Key....");
        getch();
        return;
    }
    if (strlen(s.Usn) == 0)   // strcmp against NULL is undefined; test for an empty key
    {
        outtextxy(400, 400, "Enter a Valid Key!!!\a");
        getch();
        return;
    }
    // USN pattern: digit, two letters, two digits, two letters, three digits
    if (!isdigit(s.Usn[0]) || !isalpha(s.Usn[1]) || !isalpha(s.Usn[2]) ||
        !isdigit(s.Usn[3]) || !isdigit(s.Usn[4]) || !isalpha(s.Usn[5]) ||
        !isalpha(s.Usn[6]) || !isdigit(s.Usn[7]) || !isdigit(s.Usn[8]) ||
        !isdigit(s.Usn[9]))
    {
        outtextxy(400, 400, "Enter a Valid Key!!!\a");
        getch();
        return;
    }

NAME:
    outtextxy(230, 120, "ENTER NAME :");
    strget(420, 120, s.Name, 20);
    strupr(s.Name);
    int re = Dir.Search(s.Name);
    if (re != -1)
    {
        outtextxy(400, 220, "Name Duplication..!!!");
        getch();
    }
    if (strlen(s.Name) == 0)
    {
        outtextxy(400, 400, "Enter a Valid NAME!!!\a");
        getch();
    }
    int alphaOk = 1;                           // isalpha takes a character, not a char*
    for (int j = 0; s.Name[j] != '\0'; j++)
        if (!isalpha(s.Name[j]) && s.Name[j] != ' ')
        {
            alphaOk = 0;
            break;
        }
    if (!alphaOk)
    {
        outtextxy(400, 220, "Name contains non-alphabetic characters!!");
        outtextxy(400, 240, "Re-enter NAME");
        getch();
        goto NAME;
    }

    outtextxy(230, 140, "ENTER ADDRESS :");
    strget(420, 140, s.Address, 30);
    strupr(s.Address);
    outtextxy(230, 160, "ENTER SEMESTER :");
    strget(420, 160, s.Semester, 2);
    strupr(s.Semester);
    if (atoi(s.Semester) < 1 || atoi(s.Semester) > 8)
    {
        outtextxy(400, 400, "Invalid Semester!!!\a");
        getch();
        return;
    }
    outtextxy(230, 180, "ENTER BRANCH :");
    strget(420, 180, s.Branch, 5);
    strupr(s.Branch);
    int flag = 0;
    for (int i = 0; i < 16; i++)   // Brlist holds the 16 valid branch codes
        if (strcmp(s.Branch, s.Brlist[i]) == 0)
        {
            flag = 1;
            break;
        }
    if (flag == 0)

    {
        outtextxy(400, 400, "Invalid Branch!!!\a");
        getch();
        return;
    }
    outtextxy(230, 200, "ENTER COLLEGE :");
    strget(420, 200, s.College, 10);
    strupr(s.College);
    int recaddr = s.Append(myfile);
    Dir.Insert(s.Usn, recaddr);
    outtextxy(400, 400, "Record Successfully Appended.");
    getch();
    if (doublesizetrue)   // set when this insertion forced the directory to double
    {
        closegraph();
        clrscr();
        cprintf("The Directory Has Doubled");
        doublesizetrue = 0;
        Dir.Print(cout);
    }
}

void deleterecord (char *myfile)
{
    Student s;
    settextstyle(2, 0, 5);
    outtextxy(50, 50, "ENTER USN NUMBER : ");
    strget(200, 50, s.Usn, 10);
    strupr(s.Usn);
    int addr = Dir.Search(s.Usn);
    if (addr == -1)

    {
        outtextxy(300, 300, "THE RECORD DOES NOT EXIST");
        getch();
        return;
    }
    fstream ofile(myfile, ios::in | ios::out);
    ofile.seekp(addr, ios::beg);
    ofile.write("*", 1);   // mark the record deleted; compaction reclaims the space
    ofile.close();
    Dir.Remove(s.Usn);
    outtextxy(200, 400, "THE RECORD IS DELETED SUCCESSFULLY");
    compaction();
    getch();
}

void display(char *myfile)
{
    Student s;
    setcolor(BLACK);
    settextstyle(2, 0, 5);
    outtextxy(50, 50, "ENTER USN NUMBER : ");
    strget(200, 50, s.Usn, 10);
    strupr(s.Usn);
    int addr;
    if ((addr = Dir.Search(s.Usn)) == -1)
    {
        outtextxy(300, 300, "Record not found!");
        outtextxy(300, 320, "Press Any Key..");
        getch();
        return;
    }
    DelimFieldBuffer::SetDefaultDelim('|');
    DelimFieldBuffer Buff;

    fstream file(myfile, ios::in);
    Buff.DRead(file, addr);   // direct read of the record at its indexed address
    s.Unpack(Buff);
    char str[100];
    sprintf(str, "USN NO : %s", s.Usn);
    outtextxy(100, 100, str);
    sprintf(str, "NAME : %s", s.Name);
    outtextxy(100, 120, str);
    sprintf(str, "ADDRESS : %s", s.Address);
    outtextxy(100, 140, str);
    sprintf(str, "SEMESTER : %s", s.Semester);
    outtextxy(100, 160, str);
    sprintf(str, "BRANCH : %s", s.Branch);
    outtextxy(100, 180, str);
    sprintf(str, "COLLEGE : %s", s.College);
    outtextxy(100, 200, str);
    file.close();
}

SECTION 6

GUI DESIGN


SECTION 7

SNAPSHOTS


MAIN MENU


RECORD INSERTION

DISPLAYING ALL RECORDS


RECORD MODIFICATION

DISPLAYING A RECORD


SPACE UTILIZATION

DIRECTORY DISPLAY


SECTION 8

CONCLUSION AND FUTURE ENHANCEMENTS

Conclusion:

Hashing is a way of structuring a file so that records can be found by applying a hash
function that transforms a key into an address. This address is then used as the basis for the
insertion and retrieval of records. More than one record can hash to the same address; this
phenomenon is called a collision. Extendible hashing provides O(1) access performance, since
there are no overflow chains: a full bucket is split and the directory extended instead. These
access times are truly independent of the size of the file.

Future Enhancement:

Instead of the given STUDENT class, the project could handle a generic class that accepts
a class name as a parameter, so the same code serves different applications. Another class,
BUFFERFILE, could be added; it would hold a handle to the base class of the buffer class
hierarchy, i.e. IOBUFFER, and a handle to the file, allowing simultaneous manipulation of
buffer and file and supporting a purer form of object orientation.

Some of the possible improvements and new features that can be included are:

- Improved user interface with commercial-level enhancements.
- Support for remote administration of the system.
- Support for simultaneous access and modification of the student file from different systems.
- Improved free-space management for data files.
- Implementation of other addressing techniques, in addition to the present hashing technique, to analyze performance issues.

BIBLIOGRAPHY

1) FILE STRUCTURES: AN OBJECT ORIENTED APPROACH WITH C++ - MICHAEL J. FOLK, BILL ZOELLICK, GREG RICCARDI
2) LET US C++ - YASHVANTH KANETKAR
3) THE COMPLETE REFERENCE C++ - HERBERT SCHILDT
