Indexing
By: Arnold Mesa
Indexing
You can think of an index to a file as being like the catalogue of a library.
There are two kinds of indices:
Ordered Indices - sorted ordering of the values.
Hash Indices - a uniform distribution of values across a range of buckets. The distribution is based on a hash function.
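To make the contrast concrete, here is a minimal sketch of both kinds of index over the hotel names used later in these slides. The bucket count and the character-sum hash function are assumptions chosen only for illustration; they are not taken from the slides.

```python
import bisect

# Search-key values borrowed from the hotel example in these slides.
keys = ["Hotel Sofitel", "Hilton", "Westin", "Marriot", "The Ritz"]

# Ordered index: the keys are kept sorted, so lookups and range scans
# can use binary search.
ordered_index = sorted(keys)

def ordered_contains(key):
    pos = bisect.bisect_left(ordered_index, key)
    return pos < len(ordered_index) and ordered_index[pos] == key

def ordered_range(low, high):
    """Return every key k with low <= k <= high, in sorted order."""
    start = bisect.bisect_left(ordered_index, low)
    end = bisect.bisect_right(ordered_index, high)
    return ordered_index[start:end]

# Hash index: a hash function assigns each key to one of NUM_BUCKETS buckets.
NUM_BUCKETS = 4  # hypothetical bucket count

def bucket_of(key):
    # Illustrative hash function (sum of character codes), an assumption.
    return sum(ord(c) for c in key) % NUM_BUCKETS

hash_index = [[] for _ in range(NUM_BUCKETS)]
for k in keys:
    hash_index[bucket_of(k)].append(k)

def hash_contains(key):
    # Only the one bucket the key hashes to has to be inspected.
    return key in hash_index[bucket_of(key)]
```

The ordered index can answer range queries such as `ordered_range("H", "N")`, while the hash index only supports exact-match lookups because neighbouring keys land in unrelated buckets.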
Key Concepts
Access Types - types of access that are supported efficiently
Access Time - time it takes to access a particular data item
Insertion Time - time it takes to insert a data item
Deletion Time - time it takes to delete a data item
Space Overhead - additional space occupied by an index structure
There are two kinds of ordered indices:
– Dense Index - An index record appears for every search-key value in the file. The index record contains the search-key value and a pointer to the first data record with that value. The remaining records with the same search-key value are stored sequentially after the first record.
– Sparse Index - An index record appears for only some of the search-key values, so there are fewer index records. Each index record contains a search-key value and a pointer to the first data record with that value, as with the dense index.
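The two definitions can be sketched in a few lines. The records are the hotel records from the slides; the field layout `(id, hotel, room)`, the function names, and the "keep every second entry" policy for the sparse index are assumptions made for this sketch.

```python
# Hotel records from the slides, as (id, hotel, room) tuples.
# Assumption: all records for one hotel are stored next to each other.
records = [
    (234, "Hotel Sofitel", "A-212"),
    (321, "Hilton", "B-321"),
    (389, "Hilton", "C-002"),
    (396, "Hilton", "A-322"),
    (112, "Westin", "C-034"),
    (253, "Westin", "B-219"),
    (501, "Marriot", "B-069"),
    (532, "Marriot", "C-304"),
    (221, "The Ritz", "A-007"),
]

def build_dense_index(records):
    """One entry per search-key value: hotel -> position of its first record."""
    index = {}
    for pos, (_, hotel, _) in enumerate(records):
        index.setdefault(hotel, pos)  # keep only the first position seen
    return index

def build_sparse_index(records, every=2):
    """Keep only every `every`-th entry of the dense index (hypothetical policy)."""
    dense_entries = list(build_dense_index(records).items())
    return dict(dense_entries[::every])

dense = build_dense_index(records)
sparse = build_sparse_index(records)
```

With these records the dense index has five entries and the sparse index has three (Hotel Sofitel, Westin, The Ritz), so the sparse index takes less space at the cost of some sequential scanning during a lookup.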
Dense Index
Index entries (one per hotel): Hotel Sofitel, Hilton, Westin, Marriot, The Ritz
Data file:
234 Hotel Sofitel A-212
321 Hilton B-321
389 Hilton C-002
396 Hilton A-322
112 Westin C-034
253 Westin B-219
501 Marriot B-069
532 Marriot C-304
221 The Ritz A-007
Sparse Index
Index entries (only some hotels): Hotel Sofitel, Westin, The Ritz
Data file:
234 Hotel Sofitel A-212
321 Hilton B-321
389 Hilton C-002
396 Hilton A-322
112 Westin C-034
253 Westin B-219
501 Marriot B-069
532 Marriot C-304
221 The Ritz A-007
Suppose we want to find the Marriot #532...
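One way to picture this lookup: find the last sparse-index entry whose key is not greater than the one we want, follow its pointer, and scan the file sequentially from there. The sketch below makes two assumptions that are not in the slides: the file is re-sorted by hotel name (a sparse index needs the file ordered on the search key, and the slide lists the hotels in a different order), and the sparse index keeps every second hotel. The name `find` is hypothetical.

```python
import bisect

# Hotel records from the slides as (id, hotel, room); field names are assumptions.
# Assumption: the file is kept sorted on hotel name, so it is re-sorted here.
records = sorted(
    [
        (234, "Hotel Sofitel", "A-212"),
        (321, "Hilton", "B-321"),
        (389, "Hilton", "C-002"),
        (396, "Hilton", "A-322"),
        (112, "Westin", "C-034"),
        (253, "Westin", "B-219"),
        (501, "Marriot", "B-069"),
        (532, "Marriot", "C-304"),
        (221, "The Ritz", "A-007"),
    ],
    key=lambda rec: rec[1],
)

# Position of the first record for each hotel, in file order.
first_pos = {}
for pos, rec in enumerate(records):
    first_pos.setdefault(rec[1], pos)

# Sparse index: every second hotel (hypothetical policy).
entries = list(first_pos.items())[::2]
sparse_keys = [key for key, _ in entries]
sparse_ptrs = [ptr for _, ptr in entries]

def find(hotel, record_id):
    """Return (record, records_scanned); record is None if nothing matches."""
    # Last index entry whose key is <= the hotel we are looking for.
    slot = bisect.bisect_right(sparse_keys, hotel) - 1
    if slot < 0:
        return None, 0
    scanned = 0
    for rec in records[sparse_ptrs[slot]:]:
        scanned += 1
        if rec[1] > hotel:
            break  # we have passed the place where the hotel would be stored
        if rec[1] == hotel and rec[0] == record_id:
            return rec, scanned
    return None, scanned
```

Under these assumptions `find("Marriot", 532)` starts at the Marriot index entry and reads two records, while `find("The Ritz", 221)` has no index entry of its own, so it starts from the Marriot entry and scans three records.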
Efficiency Issues
Even if we use a sparse index, the index itself may become too large for efficient processing.
If an index is small enough to be kept in main memory, the search time is low.
If the index is so large that it must be kept on disk, a search may require several disk block reads.
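A rough back-of-the-envelope sketch of that cost, assuming the index is searched with a binary search over its disk blocks. The entry count and block capacity in the usage note are made-up numbers for illustration only.

```python
import math

def index_blocks(num_entries, entries_per_block):
    """Number of disk blocks needed to hold the index entries."""
    return math.ceil(num_entries / entries_per_block)

def binary_search_block_reads(num_blocks):
    """Worst-case block reads for a binary search over num_blocks blocks.

    For num_blocks >= 1 this is floor(log2(num_blocks)) + 1, which is what
    int.bit_length() returns.
    """
    return num_blocks.bit_length()
```

For example, a hypothetical index of 1,000,000 entries with 100 entries per block occupies `index_blocks(1_000_000, 100)` = 10,000 blocks, and a binary search over them may read up to `binary_search_block_reads(10_000)` = 14 blocks.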
How to deal ...
With a large index, we should construct a sparse index on the primary index itself, giving a two-level index.
Outer index (sparse, built on the inner index): Hotel Sofitel, Marriot
Inner index: Hotel Sofitel, Hilton, Westin, Marriot, The Ritz
Data file:
234 Hotel Sofitel A-212
321 Hilton B-321
389 Hilton C-002
396 Hilton A-322
112 Westin C-034
253 Westin B-219
501 Marriot B-069
532 Marriot C-304
221 The Ritz A-007
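A two-level lookup can be sketched as follows. This is an illustration under assumptions, not the slides' exact layout: the inner index is sorted by hotel name, its record positions refer to a file assumed to be sorted the same way, and `FANOUT` (inner entries per index block) is a hypothetical parameter.

```python
import bisect

# Inner (primary) index: one entry per hotel, sorted by name, holding the
# position of that hotel's first record in a file sorted by hotel name.
inner = [
    ("Hilton", 0),
    ("Hotel Sofitel", 3),
    ("Marriot", 4),
    ("The Ritz", 6),
    ("Westin", 7),
]

FANOUT = 2  # hypothetical number of inner entries per index block

# Outer index: sparse, one entry for the first key of each inner block.
outer = [(inner[i][0], i) for i in range(0, len(inner), FANOUT)]
outer_keys = [key for key, _ in outer]

def lookup(hotel):
    """Return the position of the first record for `hotel`, or None."""
    # Step 1: use the small outer index to pick one block of the inner index.
    slot = bisect.bisect_right(outer_keys, hotel) - 1
    if slot < 0:
        return None
    start = outer[slot][1]
    # Step 2: search only that block of the inner index.
    for key, record_pos in inner[start:start + FANOUT]:
        if key == hotel:
            return record_pos
    return None
```

Only the outer index has to fit in memory; each lookup then touches a single block of the inner index before going to the data file.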
Is this looking familiar? Remember B+-trees:
– B+-trees are said to be of order m, a number of the designer's choosing.
– Each internal node (other than the root) has between ⌈m/2⌉ and m children.
– All data is stored at the leaf level.
– All leaves are at the same depth.
Example?
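As a small illustration, the sketch below checks two B+-tree properties on a toy tree: the number of children per node and the equal depth of all leaves. It assumes the usual definition in which a non-root internal node of an order-m tree has between ⌈m/2⌉ and m children and the root may have as few as two. The dictionary node layout and all function names are hypothetical.

```python
import math

def child_bounds(m):
    """(min, max) children for a non-root internal node of an order-m B+-tree."""
    return math.ceil(m / 2), m

def leaf_depths(node, depth=0):
    """Collect the depth of every leaf; a leaf is a node with no children."""
    if not node["children"]:
        return [depth]
    depths = []
    for child in node["children"]:
        depths.extend(leaf_depths(child, depth + 1))
    return depths

def is_balanced(root):
    """True when all leaves sit at the same depth."""
    return len(set(leaf_depths(root))) == 1

def children_ok(node, m, is_root=True):
    """True when every internal node has an allowed number of children."""
    if not node["children"]:
        return True
    low, high = child_bounds(m)
    if is_root:
        low = 2  # assumption: the root may have as few as two children
    count = len(node["children"])
    return low <= count <= high and all(
        children_ok(child, m, is_root=False) for child in node["children"]
    )

def leaf(keys):
    return {"keys": keys, "children": []}

# Toy tree: the root separates two leaves that hold the hotel names.
root = {
    "keys": ["Marriot"],
    "children": [
        leaf(["Hilton", "Hotel Sofitel"]),
        leaf(["Marriot", "The Ritz", "Westin"]),
    ],
}
```

For this toy tree both `is_balanced(root)` and `children_ok(root, 4)` hold, and `child_bounds(4)` gives `(2, 4)`.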