B202 Hashing

Uploaded by hemant-kumar, 13-Apr-2015

Page 1: B202 Hashing

Module 2: Data Distribution, Hashing, and Index Access

After completing this module, you will be able to:

Describe the data distribution form and method.

Describe Hashing.

Describe Primary Index hash mapping.

Describe the reconfiguration process.

Describe a Block Layout.

Describe File System Read Access.

Page 2: B202 Hashing

Data Distribution

[Figure: Data distribution in Teradata. Records from the client arrive from the host in random sequence (2, 32, 67, 12, 90, 6, 54, 75, 18, 25, 80, 41). The Parsing Engine(s) convert them (EBCDIC/ASCII to ASCII) and hash them. The rows are then distributed over the Message Passing Layer, formatted, and stored on AMP 1 through AMP 4.]

Data distribution is dependent on the hash value of the primary index.

Page 3: B202 Hashing

Hashing

• The Hashing Algorithm creates a fixed-length value from an input string of any length.

• Input to the algorithm is the Primary Index (PI) value of a row.

• The output from the algorithm is the Row Hash.

– A 32-bit binary value.

– The logical storage location of the row.

– Used to identify the AMP of the row.

– Table ID + Row Hash is used to locate the Cylinder and Data Block.

– Used for distribution, placement, and retrieval of the row.

• Row Hash uniqueness depends directly on PI uniqueness.

• Good data distribution depends directly on Row Hash uniqueness.

• The algorithm produces random, but consistent, Row Hashes.

• The same PI value and data type combination always hash identically.

• Rows with the same Row Hash will always go to the same AMP.

• Different PI values rarely produce the same Row Hash (Collisions).
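The properties above can be illustrated with a stand-in hash: the output has a fixed length, the hash is deterministic, and identical PI values always produce identical Row Hashes. The sketch below uses zlib.crc32 only as a placeholder. It is not Teradata's hashing algorithm, and `toy_row_hash` is a hypothetical name.

```python
import zlib

def toy_row_hash(pi_value: str) -> int:
    """Stand-in for the Hashing Algorithm: any-length input, 32-bit output.

    zlib.crc32 is only a placeholder; it is NOT Teradata's algorithm.
    """
    return zlib.crc32(pi_value.encode("utf-8")) & 0xFFFFFFFF

for pi in ["A", "Teradata", "a considerably longer primary index value"]:
    row_hash = toy_row_hash(pi)
    # Every Row Hash fits in 32 bits, so it always prints as 8 hex digits.
    print(f"{pi!r:45} -> {row_hash:08X}")

# The same PI value always yields the same Row Hash. NUPI duplicates therefore
# share a Row Hash and always land on the same AMP.
assert toy_row_hash("Teradata") == toy_row_hash("Teradata")
```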

Page 4: B202 Hashing

Hash Related Expressions

• The SQL hash functions are:

– HASHROW (column(s))

– HASHBUCKET (hashrow)

– HASHAMP (hashbucket)

– HASHBAKAMP (hashbucket)

• Example 1:

SELECT HASHROW ('Teradata') AS "Hash Value"

,HASHBUCKET (HASHROW ('Teradata')) AS "Bucket Num"

,HASHAMP (HASHBUCKET (HASHROW ('Teradata'))) AS "AMP Num"

,HASHBAKAMP (HASHBUCKET (HASHROW ('Teradata'))) AS "AMP Fallback Num" ;

Hash Value Bucket Num AMP Num AMP Fallback Num

F66DE2DC 63085 2 3

• Example 2:

SELECT HASHROW ('Teradata') AS "Hash Value 1"

,HASHROW ('Teradata ') AS "Hash Value 2"

,HASHROW (' Teradata') AS "Hash Value 3" ;

Hash Value 1 Hash Value 2 Hash Value 3

F66DE2DC F66DE2DC 53F30AB4

A trailing blank does not change the hash value, but a leading blank does.
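In Example 1, the bucket number 63085 is exactly the first 16 bits of the row hash F66DE2DC (0xF66D = 63085), which is the DSW described later in this module. The sketch below reproduces that arithmetic. The AMP numbers 2 and 3 cannot be derived without the system's hash maps, so the sketch does not compute them. It also models the Example 2 observation by stripping trailing blanks before applying a placeholder hash (zlib.crc32, not Teradata's algorithm). Both helper names are hypothetical.

```python
import zlib

def bucket_of(row_hash: int) -> int:
    """Bucket number (DSW) = first 16 bits of the 32-bit Row Hash."""
    return (row_hash >> 16) & 0xFFFF

def toy_char_hash(value: str) -> int:
    """Placeholder hash for character data (zlib.crc32, NOT Teradata's algorithm).

    Trailing blanks are removed first, mirroring Example 2, where
    'Teradata' and 'Teradata ' hash alike but ' Teradata' does not.
    """
    return zlib.crc32(value.rstrip(" ").encode("utf-8")) & 0xFFFFFFFF

print(bucket_of(0xF66DE2DC))  # 63085, the "Bucket Num" from Example 1
print(toy_char_hash("Teradata") == toy_char_hash("Teradata "))  # True
print(toy_char_hash("Teradata") == toy_char_hash(" Teradata"))  # leading blank kept, so inputs differ
```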

Page 5: B202 Hashing

Hashing – Numeric Data Types

• The Hashing Algorithm hashes the same numeric value in different numeric data types to the same hash value.

• The following data types hash the same:

– BYTEINT, SMALLINT, INTEGER, DECIMAL(x,0), DATE

Example:

CREATE TABLE tableA
(c1_bint BYTEINT
,c2_sint SMALLINT
,c3_int INTEGER
,c4_dec DECIMAL(8,0)
,c5_dec2 DECIMAL(8,2)
,c6_float FLOAT
,c7_char CHAR(10))
UNIQUE PRIMARY INDEX (c1_bint, c2_sint);

INSERT INTO tableA (5, 5, 5, 5, 5, 5, '5');

SELECT HASHROW (c1_bint) AS "Hash Byteint"
,HASHROW (c2_sint) AS "Hash Smallint"
,HASHROW (c3_int) AS "Hash Integer"
,HASHROW (c4_dec) AS "Hash Dec80"
,HASHROW (c5_dec2) AS "Hash Dec82"
,HASHROW (c6_float) AS "Hash Float"
,HASHROW (c7_char) AS "Hash Char"
FROM tableA;

Output from SELECT:

Hash Byteint    609D1715
Hash Smallint   609D1715
Hash Integer    609D1715
Hash Dec80      609D1715
Hash Dec82      BD810459
Hash Float      E40FE360
Hash Char       334EC17C
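One way to picture why BYTEINT, SMALLINT, INTEGER, and DECIMAL(8,0) agree while DECIMAL(8,2) and FLOAT differ is to assume that each value is normalized to a canonical byte string before hashing. The sketch below is a hypothetical model built only to match the pattern in the output above. The function name and byte layouts are assumptions, and the model does not cover DATE or CHAR.

```python
import struct
from decimal import Decimal

INTEGER_FAMILY = {"BYTEINT", "SMALLINT", "INTEGER"}

def hash_input_bytes(value, type_name: str, scale: int = 0) -> bytes:
    """Hypothetical pre-hash normalization (an assumption, not Teradata's code)."""
    if type_name in INTEGER_FAMILY or (type_name == "DECIMAL" and scale == 0):
        # Same numeric value -> same bytes, whatever the declared width.
        return int(value).to_bytes(8, "little", signed=True)
    if type_name == "DECIMAL":
        # A scaled decimal keeps its unscaled integer and scale: 5 at scale 2 -> 500, 2.
        unscaled = int(Decimal(value).scaleb(scale))
        return unscaled.to_bytes(8, "little", signed=True) + bytes([scale])
    if type_name == "FLOAT":
        return struct.pack("<d", float(value))
    raise ValueError(f"type not modeled in this sketch: {type_name}")

columns = [("BYTEINT", 0), ("SMALLINT", 0), ("INTEGER", 0),
           ("DECIMAL", 0), ("DECIMAL", 2), ("FLOAT", 0)]
for type_name, scale in columns:
    print(type_name, scale, hash_input_bytes(5, type_name, scale).hex())
```

Under this model, any hash function applied to these bytes reproduces the equal and unequal groups seen in the SELECT output.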

Page 6: B202 Hashing

Multi-Column Hashing

• The Hashing Algorithm uses multiplication and addition to create the hash value for a multi-column index.

• Assume PI = (A, B)

[Hash(A) * Hash(B)] + [Hash(A) + Hash(B)] = [Hash(B) * Hash(A)] + [Hash(B) + Hash(A)]

• Example: A PI of (3, 5) will hash the same as a PI of (5, 3) if c1 and c2 are equivalent data types.

CREATE TABLE tableB (c1_int INTEGER ,c2_dec DECIMAL(8,0))

UNIQUE PRIMARY INDEX (c1_int, c2_dec);

INSERT INTO tableB (5, 3);
INSERT INTO tableB (3, 5);

SELECT c1_int AS c1
,c2_dec AS c2
,HASHROW (c1_int) AS "Hash c1"
,HASHROW (c2_dec) AS "Hash c2"
,HASHROW (c1_int, c2_dec) AS "Hash c1c2"
FROM tableB;

*** Query completed. 2 rows found. 5 columns returned.

c1  c2  Hash c1   Hash c2   Hash c1c2
5   3   609D1715  6D27DAA6  6C964A82
3   5   6D27DAA6  609D1715  6C964A82

These two rows will hash the same and will produce a hash synonym.
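The symmetry follows directly from the formula: multiplication and addition are both commutative, so swapping A and B cannot change the result. The sketch below transcribes the slide's simplified formula. The 32-bit truncation is an assumption, and the result is not expected to reproduce the actual HASHROW value 6C964A82, because the slide describes the algorithm only in outline. `combine_two` is a hypothetical name.

```python
MASK32 = 0xFFFFFFFF

def combine_two(hash_a: int, hash_b: int) -> int:
    """Slide formula: [Hash(A) * Hash(B)] + [Hash(A) + Hash(B)], kept to 32 bits (assumed)."""
    return (hash_a * hash_b + hash_a + hash_b) & MASK32

hash_5 = 0x609D1715  # HASHROW of 5 (INTEGER or DECIMAL(8,0)) from the output above
hash_3 = 0x6D27DAA6  # HASHROW of 3

pi_5_3 = combine_two(hash_5, hash_3)
pi_3_5 = combine_two(hash_3, hash_5)
print(f"{pi_5_3:08X} {pi_3_5:08X}")  # identical: the formula is symmetric in A and B
```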

Page 7: B202 Hashing

Multi-Column Hashing (cont.)

• A PI of (3, 5) will hash differently than a PI of (5, 3) if c1 and c2 are not equivalent data types.

• Example:

CREATE TABLE tableB (c1_int INTEGER ,c2_dec DECIMAL(8,2))

UNIQUE PRIMARY INDEX (c1_int, c2_dec);

INSERT INTO tableB (5, 3);
INSERT INTO tableB (3, 5);

SELECT c1_int AS c1
,c2_dec AS c2
,HASHROW (c1_int) AS "Hash c1"
,HASHROW (c2_dec) AS "Hash c2"
,HASHROW (c1_int, c2_dec) AS "Hash c1c2"
FROM tableB;

*** Query completed. 2 rows found. 5 columns returned.

c1  c2    Hash c1   Hash c2   Hash c1c2
5   3.00  609D1715  A4E56902  0E452DAE
3   5.00  6D27DAA6  BD810459  336B8C96

These two rows will not hash the same and probably will not produce a hash synonym.

Page 8: B202 Hashing

Additional Hash Examples

• A NULL value for numeric data types is treated as 0.

• Upper and lower case characters hash the same.

Example:

CREATE TABLE tableD
(c1_int INTEGER
,c2_int INTEGER
,c3_char CHAR(4)
,c4_char CHAR(4))
UNIQUE PRIMARY INDEX (c1_int, c2_int);

INSERT INTO tableD (0, NULL, 'EDUC', 'Educ');

SELECT HASHROW (c1_int) AS "Hash c1"
,HASHROW (c2_int) AS "Hash c2"
,HASHROW (c3_char) AS "Hash c3"
,HASHROW (c4_char) AS "Hash c4"
FROM tableD;

Result:

Hash c1       Hash c2          Hash c3            Hash c4
00000000      00000000         34D30C52           34D30C52
(Hash of 0)   (Hash of NULL)   (Hash of 'EDUC')   (Hash of 'Educ')
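The two rules can be expressed as a normalization step that runs before hashing. This is a hypothetical sketch. Upper-casing is just one way to make 'EDUC' and 'Educ' compare equal and is not a statement of how Teradata implements it. `normalize_for_hash` is an illustrative name.

```python
def normalize_for_hash(value, is_numeric: bool):
    """Hypothetical pre-hash normalization matching the two rules on this slide."""
    if value is None and is_numeric:
        return 0                 # a numeric NULL hashes like 0
    if isinstance(value, str):
        return value.upper()     # 'EDUC' and 'Educ' hash the same
    return value

assert normalize_for_hash(None, is_numeric=True) == normalize_for_hash(0, is_numeric=True)
assert normalize_for_hash("EDUC", is_numeric=False) == normalize_for_hash("Educ", is_numeric=False)
```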

Page 9: B202 Hashing

Using Hash Functions to View Distribution

• Hash Functions can be used to calculate the impact of NUPI duplicates and synonyms for a PI.

SELECT HASHROW (Last_Name, First_Name) AS "Hash Value"
,COUNT(*)
FROM customer
GROUP BY 1
ORDER BY 2 DESC;

SELECT HASHAMP (HASHBUCKET (HASHROW (Last_Name, First_Name))) AS "AMP #"
,COUNT(*)
FROM customer
GROUP BY 1
ORDER BY 2 DESC;

Hash Value  Count(*)
2D7975A8    12
14840BD7    7
(Output cut due to length)
E7A4D910    1
AAD4DC80    1

The largest number of NUPI duplicates or synonyms is 12.

AMP #  Count(*)
7      929
6      916
4      899
5      891
2      864
3      864
1      833
0      821

AMP #7 has the largest number of rows.
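Both queries count rows per group. The sketch below does the same thing in memory. `rows_per_group` is a hypothetical helper; in a real system the grouping key is the HASHROW or HASHAMP expression rather than a Python function. The usage lines take the per-AMP counts from the output above, find the busiest AMP, and compare it with the average.

```python
from collections import Counter
from typing import Callable, Iterable, Tuple

Row = Tuple[str, str]  # (Last_Name, First_Name)

def rows_per_group(rows: Iterable[Row], key: Callable[[Row], object]) -> list:
    """In-memory analogue of SELECT key, COUNT(*) ... GROUP BY 1 ORDER BY 2 DESC."""
    return Counter(key(row) for row in rows).most_common()

# Per-AMP counts copied from the second query's output.
amp_counts = {7: 929, 6: 916, 4: 899, 5: 891, 2: 864, 3: 864, 1: 833, 0: 821}
busiest_amp, busiest_rows = max(amp_counts.items(), key=lambda item: item[1])
average = sum(amp_counts.values()) / len(amp_counts)
print(busiest_amp, busiest_rows, round(average, 1))  # 7 929 877.1
```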

Page 10: B202 Hashing

Primary Index Hash Mapping

[Figure: The Primary Index value for a row goes through the Hashing Algorithm, which produces a 32-bit Row Hash. The DSW (first 16 bits) selects an entry in the memory-resident Hash Map (65,536 entries). The Message Passing Layer (PDE and BYNET) then delivers the request to the AMP named in that entry (AMP0 through AMP9 in this example).]

DSW - Destination Selection Word

Page 11: B202 Hashing

Hash Maps

• Hash Maps are the mechanism for determining which AMP gets a row.

• There are four (4) Hash Maps on every TPA node.

• The Hash Maps are loaded into PDE memory space of each TPA node when PDE software boots.

• Each Hash Map is an array of 65,536 entries and is approximately 128 KB in size.

• The Communications Layer Interface checks all incoming messages against the designated Hash Map.

• For a PI or USI operation, only the AMP whose number appears in the referenced Hash Map entry is interrupted.

[Figure: The four Hash Maps used by the Communications Layer Interface (PE, AMP): Current Configuration Primary, Current Configuration Fallback, Reconfiguration Primary, and Reconfiguration Fallback.]
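The 128 KB figure follows from the entry count if each entry holds an AMP number in 2 bytes: 65,536 × 2 bytes = 131,072 bytes = 128 KB. The 2-byte entry width is an assumption inferred from that arithmetic. The model below is hypothetical, map names included, and only mirrors the four maps listed on this slide.

```python
from array import array

ENTRIES = 65_536  # one entry per possible 16-bit DSW value

# Hypothetical model of the four maps: each entry is an AMP number stored as
# an unsigned 16-bit integer (typecode "H").
MAP_NAMES = (
    "current_primary",
    "current_fallback",
    "reconfig_primary",
    "reconfig_fallback",
)
hash_maps = {name: array("H", [0]) * ENTRIES for name in MAP_NAMES}

one_map = hash_maps["current_primary"]
size_bytes = len(one_map) * one_map.itemsize
print(len(hash_maps), size_bytes, size_bytes // 1024)  # 4 131072 128 (itemsize 2)
```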

Page 12: B202 Hashing

Primary Hash Map

Row Hash (32 bits) = DSW (first 16 bits) + remaining 16 bits

• The first 16 bits of a Row Hash are the Destination Selection Word (DSW).

• The DSW points to one map and one entry within that map.

• The referenced Hash Map entry identifies the AMP for the row hash.

PRIMARY HASH MAP

       0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
000   15 14 15 15 13 14 12 14 13 15 15 12 11 12 13 14
001   13 14 14 10 15 08 11 11 15 09 10 12 09 09 10 13
002   10 10 13 14 11 11 12 12 11 11 14 12 13 14 12 12
003   15 15 13 14 06 08 13 14 13 13 14 14 07 08 15 07
004   15 04 05 07 09 06 09 07 15 15 03 08 15 15 02 06
005   01 00 05 04 08 10 10 05 08 08 06 09 07 06 05 11

Note: This partial hash map is associated with a 16 AMP system. The row label gives the first three hex digits of the DSW, and the column gives the last one.

Page 13: B202 Hashing

Hash Maps for Different Systems

Row Hash (32 bits) = DSW (first 16 bits) + remaining 16 bits

Assume a row hash of 0023 1AB2:

8 AMP system – AMP 05
16 AMP system – AMP 14

PRIMARY HASH MAP – 8 AMP System

       0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
000   07 06 07 06 07 04 05 06 05 05 06 06 07 07 03 04
001   07 07 02 04 01 00 05 04 03 02 03 05 01 00 02 06
002   01 00 05 05 03 02 04 03 01 00 06 02 04 04 01 00
003   07 06 03 03 06 06 02 02 01 00 01 00 07 07 05 07
004   04 04 05 07 05 06 07 07 03 02 03 04 01 00 02 06
005   01 00 05 04 03 02 06 05 01 00 06 05 07 06 05 07

PRIMARY HASH MAP – 16 AMP System

       0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
000   15 14 15 15 13 14 12 14 13 15 15 12 11 12 13 14
001   13 14 14 10 15 08 11 11 15 09 10 12 09 09 10 13
002   10 10 13 14 11 11 12 12 11 11 14 12 13 14 12 12
003   15 15 13 14 06 08 13 14 13 13 14 14 07 08 15 07
004   15 04 05 07 09 06 09 07 15 15 03 08 15 15 02 06
005   01 00 05 04 08 10 10 05 08 08 06 09 07 06 05 11
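The worked example can be checked mechanically. The sketch transcribes rows 000 to 005 of both partial maps and indexes them with the DSW. `amp_for_row_hash` is a hypothetical helper. A real Hash Map has all 65,536 entries rather than the 96 shown here, so this lookup only works for DSW values below 0x0060.

```python
# Rows 000-005 of the two partial Primary Hash Maps on this slide.
# Entry index = DSW value, so row "002", column "3" is DSW 0x0023.
PRIMARY_8_AMP = [int(x) for x in """
07 06 07 06 07 04 05 06 05 05 06 06 07 07 03 04
07 07 02 04 01 00 05 04 03 02 03 05 01 00 02 06
01 00 05 05 03 02 04 03 01 00 06 02 04 04 01 00
07 06 03 03 06 06 02 02 01 00 01 00 07 07 05 07
04 04 05 07 05 06 07 07 03 02 03 04 01 00 02 06
01 00 05 04 03 02 06 05 01 00 06 05 07 06 05 07
""".split()]

PRIMARY_16_AMP = [int(x) for x in """
15 14 15 15 13 14 12 14 13 15 15 12 11 12 13 14
13 14 14 10 15 08 11 11 15 09 10 12 09 09 10 13
10 10 13 14 11 11 12 12 11 11 14 12 13 14 12 12
15 15 13 14 06 08 13 14 13 13 14 14 07 08 15 07
15 04 05 07 09 06 09 07 15 15 03 08 15 15 02 06
01 00 05 04 08 10 10 05 08 08 06 09 07 06 05 11
""".split()]

def amp_for_row_hash(row_hash: int, hash_map: list[int]) -> int:
    dsw = row_hash >> 16   # first 16 bits of the 32-bit Row Hash
    return hash_map[dsw]   # these partial maps only cover DSW 0x0000-0x005F

print(amp_for_row_hash(0x00231AB2, PRIMARY_8_AMP))   # 5
print(amp_for_row_hash(0x00231AB2, PRIMARY_16_AMP))  # 14
```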

Page 14: B202 Hashing

Fallback Hash Map

Row Hash (32 bits) = DSW (first 16 bits) + remaining 16 bits

Note: 16 AMP system with 2-AMP clusters

Assume a row hash of 0023 1AB2:

Primary AMP – 14
Fallback AMP – 06

PRIMARY HASH MAP – 16 AMP System

       0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
000   15 14 15 15 13 14 12 14 13 15 15 12 11 12 13 14
001   13 14 14 10 15 08 11 11 15 09 10 12 09 09 10 13
002   10 10 13 14 11 11 12 12 11 11 14 12 13 14 12 12
003   15 15 13 14 06 08 13 14 13 13 14 14 07 08 15 07
004   15 04 05 07 09 06 09 07 15 15 03 08 15 15 02 06
005   01 00 05 04 08 10 10 05 08 08 06 09 07 06 05 11

FALLBACK HASH MAP – 16 AMP System

       0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
000   07 06 07 07 05 06 04 06 05 07 07 04 03 04 05 06
001   05 06 06 02 07 00 03 03 07 01 02 04 01 01 02 05
002   02 02 05 06 03 03 04 04 03 03 06 04 05 06 04 04
003   07 07 05 06 14 00 05 06 05 05 06 06 15 00 07 15
004   07 12 13 15 01 14 01 15 07 07 11 00 07 07 10 14
005   09 08 13 12 00 02 02 13 00 00 14 01 15 14 13 03
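Looking up the same DSW in both 16-AMP maps gives the primary and fallback AMPs. The sketch transcribes rows 000 to 005 of both maps; `primary_and_fallback` is a hypothetical helper. The pairing check at the end describes these six rows only (each AMP n is matched with AMP n + 8 modulo 16). It is not a general rule for how fallback clusters are assigned.

```python
PRIMARY_16_AMP = [int(x) for x in """
15 14 15 15 13 14 12 14 13 15 15 12 11 12 13 14
13 14 14 10 15 08 11 11 15 09 10 12 09 09 10 13
10 10 13 14 11 11 12 12 11 11 14 12 13 14 12 12
15 15 13 14 06 08 13 14 13 13 14 14 07 08 15 07
15 04 05 07 09 06 09 07 15 15 03 08 15 15 02 06
01 00 05 04 08 10 10 05 08 08 06 09 07 06 05 11
""".split()]

FALLBACK_16_AMP = [int(x) for x in """
07 06 07 07 05 06 04 06 05 07 07 04 03 04 05 06
05 06 06 02 07 00 03 03 07 01 02 04 01 01 02 05
02 02 05 06 03 03 04 04 03 03 06 04 05 06 04 04
07 07 05 06 14 00 05 06 05 05 06 06 15 00 07 15
07 12 13 15 01 14 01 15 07 07 11 00 07 07 10 14
09 08 13 12 00 02 02 13 00 00 14 01 15 14 13 03
""".split()]

def primary_and_fallback(row_hash: int) -> tuple[int, int]:
    dsw = row_hash >> 16  # the same DSW indexes both maps
    return PRIMARY_16_AMP[dsw], FALLBACK_16_AMP[dsw]

print(primary_and_fallback(0x00231AB2))  # (14, 6)

# In these rows, AMP n is always paired with AMP (n + 8) % 16, so the
# fallback copy never lands on the AMP that holds the primary copy.
print(all(fb == (p + 8) % 16 for p, fb in zip(PRIMARY_16_AMP, FALLBACK_16_AMP)))  # True
```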

Page 15: B202 Hashing

Reconfiguration

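[Figure: 65,536 Hash Map entries. Before reconfiguration, each of the 4 existing AMPs owns 16,384 entries and the 2 new AMPs are empty. After reconfiguration, the 6 AMPs own 10,923, 10,923, 10,923, 10,923, 10,922, and 10,922 entries.]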

• The system creates new Hash Maps to accommodate the new configuration.

• Old and new maps are compared.

• Each AMP reads its rows, and moves only those that hash to a new AMP.

• It is not necessary to offload and reload data due to a reconfiguration.

$$\text{Percentage of rows moved to new AMPs} = \frac{\text{Number of new AMPs}}{\text{Sum of old and new AMPs}} = \frac{2}{6} = \frac{1}{3} \approx 33.3\%$$
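The 33.3% figure can be reproduced by rebalancing a 4-AMP map over 6 AMPs while moving as few entries as possible. This is a hypothetical sketch: existing AMPs keep entries up to their new target and hand the surplus to the new AMPs, and Teradata's actual assignment may differ. The share of rows that moves equals the share of entries that change owner only if rows are spread evenly across buckets.

```python
from collections import Counter

BUCKETS = 65_536

def reconfigure(old_map: list[int], old_amps: int, new_amps: int) -> list[int]:
    """Rebalance hash map entries over old + new AMPs, moving as few as possible."""
    total = old_amps + new_amps
    base, extra = divmod(len(old_map), total)
    # Leftover entries go to the lowest-numbered AMPs (the existing ones here).
    target = {amp: base + (1 if amp < extra else 0) for amp in range(total)}
    receivers = iter([amp for amp in range(old_amps, total) for _ in range(target[amp])])
    owned = Counter(old_map)
    new_map = list(old_map)
    for bucket, amp in enumerate(old_map):
        if owned[amp] > target[amp]:          # this AMP still owns too many entries
            new_map[bucket] = next(receivers)
            owned[amp] -= 1
    return new_map

old = [bucket % 4 for bucket in range(BUCKETS)]  # 4 AMPs x 16,384 entries
new = reconfigure(old, old_amps=4, new_amps=2)
moved = sum(1 for before, after in zip(old, new) if before != after)
print(sorted(Counter(new).values(), reverse=True))  # [10923, 10923, 10923, 10923, 10922, 10922]
print(moved, f"{moved / BUCKETS:.1%}")              # 21844 33.3%
```

Entries that remain on an existing AMP never move between existing AMPs, which matches the slide's point that each AMP moves only the rows that now hash to a new AMP.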

Page 16: B202 Hashing

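Row Retrieval via PI Value – Overview

SELECT * FROM tablename WHERE primaryindex = value(s);

[Figure: The Parsing Engine receives the SQL request, and the Parser applies the Hashing Algorithm to the index value. The request carries the 48-bit Table ID, the 32-bit Row Hash, and the index value, and the Message Passing Layer routes it using the DSW. On the AMP, the File System resolves it to a Logical Block Identifier and a Logical Row Identifier to reach the Data Block on the Vdisk.]

Only the AMP whose number appears in the referenced Hash Map entry is interrupted.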

Page 17: B202 Hashing

Names and Object IDs

[Figure: DBC.Next (1 row) holds the NEXT DATABASE ID, the NEXT TVM ID, and 4 other counters.]

• Each Database, User, Profile, and Role is assigned a globally unique numeric ID.

• Each Table, View, Macro, Trigger, Stored Procedure, Join Index, and Hash Index is assigned a globally unique numeric ID.

• Each Column is assigned a numeric ID that is unique within its Table ID.

• Each Index is assigned a numeric ID that is unique within its Table ID.

• The Data Dictionary (DD) keeps track of all SQL names and their numeric IDs.

• The PE’s RESOLVER uses the DD to verify names and convert them to IDs.

• The AMPs use the numeric IDs supplied by the RESOLVER.

Page 18: B202 Hashing

Table ID

The Unique Value for Tables, Views, Macros, Triggers, and Stored Procedures comes from the DBC.Next dictionary table.

The Unique Value also defines the type of table:
• Normal data table
• Permanent journal
• Global Temporary
• Spool file

The Sub-table ID identifies the part of a table the system is looking at (values shown in decimal):
• Table Header: 0
• Primary data copy: 1024
• Fallback data copy: 2048
• First secondary index, primary copy: 1028
• First secondary index, fallback copy: 2052
• Second secondary index, primary copy: 1032
• Second secondary index, fallback copy: 2056
• Third secondary index, primary copy: 1036, and so on.

Table ID plus Row ID makes every row in the system unique.

Examples shown in this manual use the Unique Value to represent the entire Table ID.

Table ID = UNIQUE VALUE (32 bits) + SUB-TABLE ID (16 bits)
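The Sub-table IDs listed above follow a regular pattern: primary data is 1024, fallback is 2048, and each secondary index adds 4. The Table ID is a 32-bit Unique Value plus a 16-bit Sub-table ID, 48 bits in all. The sketch below encodes both. Placing the Unique Value in the high-order bits is an assumption about layout, the helper names are hypothetical, and the Unique Value 100 is an arbitrary example.

```python
SUBTABLE_HEADER = 0
SUBTABLE_PRIMARY = 1024
SUBTABLE_FALLBACK = 2048

def secondary_index_subtable(index_number: int, fallback: bool = False) -> int:
    """Sub-table ID of the n-th secondary index (n = 1, 2, 3, ...), per the listed pattern."""
    base = SUBTABLE_FALLBACK if fallback else SUBTABLE_PRIMARY
    return base + 4 * index_number

def table_id(unique_value: int, subtable_id: int) -> int:
    """Pack a 32-bit Unique Value and a 16-bit Sub-table ID into 48 bits (layout assumed)."""
    if not (0 <= unique_value < 2**32 and 0 <= subtable_id < 2**16):
        raise ValueError("field out of range")
    return (unique_value << 16) | subtable_id

print([secondary_index_subtable(n) for n in (1, 2, 3)])              # [1028, 1032, 1036]
print([secondary_index_subtable(n, fallback=True) for n in (1, 2)])  # [2052, 2056]
print(f"{table_id(100, SUBTABLE_FALLBACK):012X}")                    # 000000640800
```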

Page 19: B202 Hashing

Row ID

On INSERT, the system stores both the data values and the Row ID.

ROW ID = ROW HASH and UNIQUENESS VALUE

Row Hash
• Row Hash is based on the Primary Index value.
• Multiple rows in a table could have the same Row Hash.
• NUPI duplicates and hash synonyms have the same Row Hash.

Uniqueness Value
• The system creates a numeric 32-bit Uniqueness Value.
• The first row for a Row Hash has a Uniqueness Value of 1.
• Additional rows have ascending Uniqueness Values.
• Row IDs determine sort sequence within a Data Block.
• Row IDs support Secondary Index performance.
• The Row ID makes every row within a table uniquely identifiable.

Duplicate Rows
• Row ID uniqueness does not imply data uniqueness.
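The Uniqueness Value rules can be sketched as a per-hash counter. This is a hypothetical model: the first row for a Row Hash gets 1 and each later row gets the next higher value. Deletes and value reuse are not modeled, and `RowIdAllocator` is an illustrative name.

```python
from collections import defaultdict

class RowIdAllocator:
    """Hands out Row IDs (Row Hash, Uniqueness Value) as rows are inserted."""

    def __init__(self) -> None:
        self._last_uniqueness: defaultdict[int, int] = defaultdict(int)

    def next_row_id(self, row_hash: int) -> tuple[int, int]:
        self._last_uniqueness[row_hash] += 1
        return row_hash, self._last_uniqueness[row_hash]

alloc = RowIdAllocator()
# Identical rows (same PI value, so same Row Hash) still receive different Row IDs.
row_ids = [alloc.next_row_id(h) for h in (1000, 999, 1000, 1000)]
print(row_ids)          # [(1000, 1), (999, 1), (1000, 2), (1000, 3)]
print(sorted(row_ids))  # [(999, 1), (1000, 1), (1000, 2), (1000, 3)] -- Row ID order in a block
```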

Page 20: B202 Hashing

AMP File System – Locating a Row via PI

• The AMP accesses its Master Index (always memory-resident).

• An entry in the Master Index identifies a Cylinder #, and the AMP then accesses that cylinder's Cylinder Index (frequently memory-resident).

• An entry in the Cylinder Index identifies the Data Block.

• The Data Block is the physical I/O unit and may or may not be memory resident.

• A search of the Data Block locates the row(s).

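[Figure: The PE sends the request (Table ID, Row Hash, PI value) to an AMP via the Message Passing Layer (PDE & BYNET). In AMP memory, the Master Index points to a Cylinder Index (CI, accessed in FSG Cache). The CI points to a Data Block on the Vdisk (also accessed in FSG Cache), and that block contains the row.]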

Page 21: B202 Hashing

Teradata File System Overview

[Figure: The Master Index, resident in AMP memory, holds a list of CIDs, each pointing to a cylinder on the VDisk (a cylinder is 3872 sectors in this illustration). Each cylinder starts with a Cylinder Index. One Cylinder Index holds SRD - A and SRD - B, whose DBDs (A1, A2, B1, B2) point to Data Blocks A1, A2, B1, and B2. Another holds SRD - B, whose DBDs (B3, B4, B5) point to Data Blocks B3, B4, and B5.]

Page 22: B202 Hashing

Master Index Format

• Memory resident structure specific to each AMP.

• Contains Cylinder Index Descriptors (CID) - one for each allocated Cylinder.

• Each CID identifies the lowest Table ID / Part# / Row ID and the last Table ID / Part# / Row Hash for a cylinder.

• Range of Table ID / Part# / Row IDs does not overlap with any other cylinder.

• The CIDs are kept in sorted order.

V2R5 Notes:

• The Master Index and Cylinder Index entries are 4 bytes larger to include partition numbers, which support Partitioned Primary Index (PPI) tables.

• For non-partitioned (NPPI) tables, the partition number is 0, and the Master and Cylinder Index entries use 0 as the partition number.

[Figure: The Master Index (CID 1, CID 2, CID 3, ..., CID n) points to the Cylinder Index (CI) at the start of each cylinder on the Vdisk. Cylinder 0, Segment 0 holds the FIB, which contains the Free Cylinder List.]

Page 23: B202 Hashing

Cylinder Index Format

• Located at the beginning of each cylinder.

• There is one SRD (Subtable Reference Descriptor) for each subtable that has data blocks on the cylinder.

• Each SRD references a set of DBD(s). A DBD is a Data Block Descriptor.

• One DBD per data block - identifies location and first Part# / Row ID and the last Part # / Row Hash within a block.

• FSE - Free Segment Entry - identifies free sectors.

[Figure: A cylinder on the VDisk begins with its Cylinder Index, which holds SRD A (with DBD A1 and DBD A2), SRD B (with DBD B1 and DBD B2), and FSEs. Each DBD points to its data block (A1, A2, B1, B2) on the cylinder, and each FSE points to a range of free sectors.]

V2R5 Note:

Cylinder Index entries are 4 bytes larger to include partition numbers.

Page 24: B202 Hashing

Data Block Layout

• Contains rows with the same Table ID.

• Contains rows within range of Row IDs of associated DBD entry and the range of Row IDs does not overlap with any other data block.

• Logically sorted list of rows.

• Variable Block Sizes:

– V2R2 (and before): 1 to 63 sectors (approx. 32 KB)

– V2R3 and V2R4: 1 to 127 sectors (approx. 64 KB)

– V2R4.1, V2R5.0, and V2R5.1: 1 to 255 sectors (approx. 128 KB)

• A Row has maximum size of 64,256 bytes with releases V2R3 through V2R5.1.

[Figure: Data block layout: a 36-byte header, Rows 1 to 4, the Row Reference Array (entries labeled -3, -2, -1, 0), and a 2-byte trailer.]
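The approximate sizes in the Variable Block Sizes list are the sector limits multiplied by 512 bytes per sector. That sector size is implied by the later statement that 1 to 255 sectors spans 512 bytes to 127.5 KB. The exact maxima fall slightly below the rounded figures.

```python
SECTOR_BYTES = 512  # 1 disk sector = 512 bytes

MAX_SECTORS = {
    "V2R2 and before": 63,
    "V2R3 and V2R4": 127,
    "V2R4.1, V2R5.0, V2R5.1": 255,
}

max_block_kb = {release: sectors * SECTOR_BYTES / 1024
                for release, sectors in MAX_SECTORS.items()}
for release, kb in max_block_kb.items():
    print(f"{release:24} {kb} KB")  # prints 31.5, 63.5 and 127.5 KB
```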

Page 25: B202 Hashing

General Row Layout

[Figure: General row layout for V2R4.1 and V2R5 tables with NPPI: Row Length (2 bytes); the Row ID, made up of the Row Hash (4 bytes) and the Uniqueness Value (4 bytes); Additional Overhead (2 or more bytes); Column Data Values (variable); and a 2-byte Row Reference Array entry.]

• The Primary Index value determines the Row Hash.

• The system generates the Uniqueness Value.

• NPPI – Non-Partitioned Primary Index (typical Teradata primary index)

• For an NPPI table, the Row ID will be unique for every row in a table (for both SET and MULTISET).

• Rows in a table may vary in length. The maximum row length is 64,256 bytes (or 62.75 KB).

• In V2R5, if the Primary Index is not partitioned, then the row is implicitly assumed to be in partition #0.

• Partitioned Primary Indexes will be covered in another module.

Page 26: B202 Hashing

Example of Locating a Row – Master Index

SELECT * FROM employee WHERE empno = 3755;

Table ID  Part #  Row Hash  empno
100       0       1000      3755

Master Index (the matching entry leads to the Cylinder Index)

Lowest                       Highest                      Pdisk and    Free Cylinder List
Table ID  Part #  Row ID     Table ID  Part #  Row Hash   Cylinder #   Pdisk 0   Pdisk 1
:         :       :          :         :       :          :            :         :
078       0       58234, 2   095       0       72194      204          124       761
098       0       00107, 1   100       0       00676      037          125       780
100       0       00773, 3   100       0       01361      169          168       895
100       0       01361, 2   100       0       02884      777          170       896
100       0       02937, 1   100       0       03602      802          183       914
100       0       03662, 1   100       0       03999      117          189       935
100       0       04123, 2   100       0       05888      888          201       941
100       0       05974, 1   100       0       07328      753          217       1012
100       0       07353, 1   120       0       00469      477          220       1234
123       1       00343, 2   123       2       01864      529          347       1375
123       2       06923, 1   123       3       00231      943          702       1520
:         :       :          :         :       :          :            :         :

What cylinder would have Table ID = 100, Row Hash = 00598?

Part # = Partition Number (V2R5)
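Because the CIDs are sorted and their ranges do not overlap, a binary search can find the cylinder. The sketch below applies Python's `bisect` module to the Master Index excerpt and treats the slide's 5-digit row hashes as plain integers. The `CID` layout and `find_cylinder` are hypothetical. The function returns the first candidate cylinder only, because rows for one Row Hash (01361 in the excerpt) can continue onto the next cylinder.

```python
from bisect import bisect_left
from typing import NamedTuple, Optional

class CID(NamedTuple):
    low: tuple[int, int, int, int]  # lowest (Table ID, Part #, Row Hash, Uniqueness)
    high: tuple[int, int, int]      # highest (Table ID, Part #, Row Hash)
    cylinder: int

# The Master Index excerpt (free-cylinder columns omitted), in sorted order.
MASTER_INDEX = [
    CID((78, 0, 58234, 2), (95, 0, 72194), 204),
    CID((98, 0, 107, 1), (100, 0, 676), 37),
    CID((100, 0, 773, 3), (100, 0, 1361), 169),
    CID((100, 0, 1361, 2), (100, 0, 2884), 777),
    CID((100, 0, 2937, 1), (100, 0, 3602), 802),
    CID((100, 0, 3662, 1), (100, 0, 3999), 117),
    CID((100, 0, 4123, 2), (100, 0, 5888), 888),
    CID((100, 0, 5974, 1), (100, 0, 7328), 753),
    CID((100, 0, 7353, 1), (120, 0, 469), 477),
    CID((123, 1, 343, 2), (123, 2, 1864), 529),
    CID((123, 2, 6923, 1), (123, 3, 231), 943),
]
HIGH_KEYS = [cid.high for cid in MASTER_INDEX]

def find_cylinder(table_id: int, part: int, row_hash: int) -> Optional[int]:
    """Binary-search the sorted CIDs for the first cylinder whose range can hold the key."""
    key = (table_id, part, row_hash)
    i = bisect_left(HIGH_KEYS, key)  # first CID whose highest key >= search key
    if i == len(MASTER_INDEX) or MASTER_INDEX[i].low[:3] > key:
        return None                  # key falls between cylinders: no such rows
    return MASTER_INDEX[i].cylinder

print(find_cylinder(100, 0, 1000))  # 169 (empno 3755)
print(find_cylinder(100, 0, 598))   # 37, the answer to the question above
```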

Page 27: B202 Hashing

Example of Locating a Row – Cylinder Index

SELECT * FROM employee WHERE empno = 3755;

Table ID  Part #  Row Hash  empno
100       0       1000      3755

Cylinder Index - Cylinder #169

SRDs     Table ID   First DBD Offset   DBD Count
SRD #1   100        FFFF               12

         Lowest              Highest             Start    Sector   Row
DBDs     Part #  Row ID      Part #  Row Hash    Sector   Count    Count
:        :       :           :       :           :        :        :
DBD #4   0       00867, 2    0       00902       1010     4        5
DBD #5   0       00938, 1    0       00996       0093     7        10
DBD #6   0       00998, 1    0       01010       0789     6        8
DBD #7   0       01010, 3    0       01177       0525     3        4
DBD #8   0       01185, 2    0       01258       0056     5        6
DBD #9   0       01290, 1    0       01333       1138     5        6
:        :       :           :       :           :        :        :

Free Block List (Free Sector Entries)

Start Sector   Sector Count
:              :
0270           3
0301           5
0349           5
0470           4
0481           6
0550           5
:              :

This example assumes that only 1 table ID has rows on this cylinder and the table is not partitioned.

Part # = Partition Number (V2R5)
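The same binary-search idea applies inside the Cylinder Index. The sketch below covers only DBDs #4 to #9 from the excerpt. The table is not partitioned, so the search key is just the Row Hash. The `DBD` type and `find_block` are hypothetical names. A hash below 867 returns None here only because DBDs #1 to #3 are not shown.

```python
from bisect import bisect_left
from typing import NamedTuple, Optional

class DBD(NamedTuple):
    low_row_id: tuple[int, int]  # (Row Hash, Uniqueness Value)
    high_row_hash: int
    start_sector: int
    sector_count: int
    row_count: int

# DBDs #4 to #9 for Cylinder #169 (Part # is always 0 here).
DBDS = [
    DBD((867, 2), 902, 1010, 4, 5),
    DBD((938, 1), 996, 93, 7, 10),
    DBD((998, 1), 1010, 789, 6, 8),
    DBD((1010, 3), 1177, 525, 3, 4),
    DBD((1185, 2), 1258, 56, 5, 6),
    DBD((1290, 1), 1333, 1138, 5, 6),
]
HIGHS = [d.high_row_hash for d in DBDS]

def find_block(row_hash: int) -> Optional[tuple[int, int]]:
    """Return (start sector, sector count) of the first block that may hold row_hash."""
    i = bisect_left(HIGHS, row_hash)
    if i == len(DBDS) or DBDS[i].low_row_id[0] > row_hash:
        return None
    return DBDS[i].start_sector, DBDS[i].sector_count

print(find_block(1000))  # (789, 6): sectors 789 to 794
```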

Page 28: B202 Hashing

Example of Locating a Row – Data Block

[Figure: A data block occupying sectors 789 through 794, with a 36-byte header, Rows 1 through 8 in the row heap, the Row Reference Array, and a 2-byte trailer.]

• A block is the physical I/O unit.

• The block header contains the Table ID (6 bytes).

• Only rows for the same table reside in the same data block.

• Rows are not split across block boundaries.

• Blocks within a table vary in size. The system adjusts block sizes dynamically.

• Blocks may be from 512 bytes to 127.5 KB (1 to 255 disk sectors). With V2R3 and V2R4.0, the maximum block size is 127 sectors (63.5 KB).

• Data blocks are not chained together.

• Row Reference Array pointers are stored (sorted) in reverse sequence based on Row ID within the block.

Page 29: B202 Hashing

Accessing the Row within the Data Block

• Within the data block, the Row Reference Array is used to locate the first row with a matching Row Hash value within the block.

• The Primary Index data value is used as a row qualifier to eliminate synonyms.

Search key: Index Value 3755, Row Hash 1000

Data Block (sectors 789 to 794)

Row Hash  Uniq  Index Value  Data Columns
998       1     4219         Row data
999       1     2968         Row data
999       2     6324         Row data
1000      1     1006         Row data
1000      2     3755         Row data
1002      1     6838         Row data
1008      1     8825         Row data
1010      1     0250         Row data

SELECT * FROM employee WHERE employee_number = 3755;
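The two steps on this slide can be sketched over the block contents above. First find the first row with the matching Row Hash, then compare the PI value to skip synonyms. In this sketch a binary search over a sorted Python list stands in for the Row Reference Array. The `Row` type and `find_row` helper are hypothetical.

```python
from bisect import bisect_left
from typing import NamedTuple, Optional

class Row(NamedTuple):
    row_hash: int
    uniqueness: int
    index_value: int  # the Primary Index value

# Rows of the data block at sectors 789-794, in Row ID order.
BLOCK = [
    Row(998, 1, 4219), Row(999, 1, 2968), Row(999, 2, 6324), Row(1000, 1, 1006),
    Row(1000, 2, 3755), Row(1002, 1, 6838), Row(1008, 1, 8825), Row(1010, 1, 250),
]

def find_row(row_hash: int, pi_value: int) -> Optional[Row]:
    # Binary search for the first row with this Row Hash.
    i = bisect_left(BLOCK, (row_hash,))
    # Scan the rows that share the hash; the PI value filters out hash synonyms.
    while i < len(BLOCK) and BLOCK[i].row_hash == row_hash:
        if BLOCK[i].index_value == pi_value:
            return BLOCK[i]
        i += 1
    return None

print(find_row(1000, 3755))  # Row(row_hash=1000, uniqueness=2, index_value=3755)
```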

Page 30: B202 Hashing

AMP Read I/O Summary

The Master Index is always memory resident.

The AMP reads the Cylinder Index if not memory resident.

The AMP reads the Data Block if not memory resident.

• AMP memory, cache size, and locality of reference determine whether either of these steps requires physical I/O.

• Often, the Cylinder Index is memory resident and a Unique Primary Index retrieval requires only one (1) I/O.

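[Figure: The request (Table ID, Row Hash, PI value) arrives through the Message Passing Layer. In AMP memory, the Master Index points to the Cylinder Index (CI, accessed in FSG Cache), which points to the Data Block (accessed in FSG Cache) on the Vdisk that contains the row.]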

Page 31: B202 Hashing

Review Questions

1. The Row Hash for a PI value of 824 is the same for the data types of INTEGER and DECIMAL(18,0). True or False. _______

2. The first 16 bits of the Row Hash are referred to as the _________ or the _______ _________ .

3. The Hash Map consists of 65,536 entries which identify an _____ number for the Row Hash.

4. The Current Configuration ___________ Hash Map is used to identify the AMP on which to locate or store a row based on its PI value.

5. The ____________ utility is used to redistribute rows to a new system configuration with more AMPs.

6. The Unique Value of a Table ID comes from the dictionary table named DBC.________ .

7. The Row ID consists of the _______ ________ and the __________ _____ .

8. The _______ _______ contains a Cylinder Index Descriptor (CID) for each allocated Cylinder.

9. The _______ _______ contains an entry for each data block in the cylinder.

10. The ____ __________ ________ consists of a set of 2-byte pointers to the data rows in the data block.

11. For Teradata V2R5.0, the maximum block size is approximately _______ and the maximum row size is approximately _______ .

12. The Primary Index data value is used as a row qualifier to eliminate hash _____________ .

Page 32: B202 Hashing

Module 2: Review Question Answers

1. The Row Hash for a PI value of 824 is the same for the data types of INTEGER and DECIMAL(18,0). True or False. True

2. The first 16 bits of the Row Hash are referred to as the DSW or the bucket number.

3. The Hash Map consists of 65,536 entries which identify an AMP number for the Row Hash.

4. The Current Configuration Primary Hash Map is used to identify the AMP on which to locate or store a row based on its PI value.

5. The Reconfig utility is used to redistribute rows to a new system configuration with more AMPs.

6. The Unique Value of a Table ID comes from the dictionary table named DBC.Next.

7. The Row ID consists of the Row Hash and the Uniqueness Value.

8. The Master Index contains a Cylinder Index Descriptor (CID) for each allocated Cylinder.

9. The Cylinder Index contains an entry for each data block in the cylinder.

10. The Row Reference Array consists of a set of 2-byte pointers to the data rows in the data block.

11. For Teradata V2R5.0, the maximum block size is approximately 128 KB and the maximum row size is approximately 64 KB.

12. The Primary Index data value is used as a row qualifier to eliminate hash synonyms or collisions.