the internals


Upload: heribertus-bramundito

Post on 04-Jul-2015


DESCRIPTION

A brief description of Oracle internal structures: table, index, etc.

TRANSCRIPT

Page 1: The internals

The Internals (part 1)

Table of Contents

Background
Heap Table
    First Level Bitmap
    Second Level Bitmap
    Extent Information
    Data Blocks
    Data Types
        Varchar and Char
        Number
        Date and Timestamp
B-Tree Index
    Unique Index
    Non-Unique Index
    Composite Index
    Function-Based Index
    Local and Global Index
Bitmap Index
IOT
Row Migration
    The Symptom
    How Things are Working?
    The Impact
Row Chaining
References
What’s Next?


Background

“When you love someone - you'll do anything

you'll do all the crazy things that you can't explain”

Yeah…, those are a few lines from Bryan Adams’ “When You Love Someone”. In the same way, when I like something, I want to understand how it works or how it is stored internally. So, while reading a few articles and Jonathan Lewis’s book about Oracle internal structure, I decided to do an exercise on Oracle internal structures: table, index, undo, redo, etc. I ran these exercises against Oracle 10.2.0.5 on a Windows box with ASSM (the most common setting in current production environments). The objectives of this exercise are:

1. Understand the structure of tables, B-tree and bitmap indexes, undo segments and redo records

2. How Oracle stores data for several data types

3. How Oracle builds the result of a query involving undo information

4. DML operations

5. Impact of the move/shrink space commands

6. Other symptoms: row migration, deadlock, snapshot too old

Heap Table

This is the most popular table type in Oracle, and in other RDBMSs as well. It stores data in an unordered way using a first-fit algorithm. Using a simple “ALTER SYSTEM DUMP” command, we can dump the structure of any segment in the datafile. The same method works for undo segments as well.

ALTER SYSTEM DUMP DATAFILE 4 BLOCK MIN 123 BLOCK MAX 130;

To observe the structure of a heap table, I have created 2 tables: EMPTY and ONE_ROW. The script that creates those 2 tables is attached below, together with the trace file for this exercise.

exercise_01.sql

The first dump file comes from the empty table (0 records), while the second comes from a table with a few records in it. Because its PCTFREE is 99, those few records are enough to force the table to 2 extents. In general, the structure of a heap table looks like this:


In ASSM, Oracle does not use freelists and freelist groups. Instead it introduces a new mechanism called BMB (Bitmap Managed Block) to manage free space in the blocks. BMB has a structure like a B-tree index (root, branch and leaf blocks): the free space information for the blocks is kept in the leaf structure (called the First Level Bitmap, L1), while L2 and L3 contain the addresses of the level below. For example, L2 contains the addresses of L1 blocks and L3 contains the addresses of L2 blocks. During this entire exercise, however, I did not see any segment in my test cases with a Third Level Bitmap, L3 (maybe due to the size of the tables). Besides the data block address, Oracle also keeps 4 bits of information that indicate the available space in each block, as follows:

BINARY CODE   DECIMAL CODE   DESCRIPTION
0000          0              Unformatted
0001          1              FULL
0010          2              0-25% free
0011          3              25-50% free
0100          4              50-75% free
0101          5              75-100% free
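The status codes in the table above connect directly to the nf1-nf4 summary that the First Level Bitmap dump prints. The following minimal Python sketch assumes that the 4-bit codes have already been read from a dump's free block map into a list. The names `BLOCK_STATUS` and `freeness_summary` are hypothetical, and the nf1..nf4 naming follows the legend used later in this section (codes 2 to 5).

```python
from collections import Counter
from typing import Dict, Iterable

# 4-bit block status codes kept by ASSM (see the table above)
BLOCK_STATUS = {
    0b0000: "Unformatted",
    0b0001: "FULL",
    0b0010: "0-25% free",
    0b0011: "25-50% free",
    0b0100: "50-75% free",
    0b0101: "75-100% free",
}


def freeness_summary(codes: Iterable[int]) -> Dict[str, int]:
    """Count blocks per status, naming the 'free' buckets like nf1..nf4 in the dump.

    Codes 2..5 map to nf1..nf4. This sketch reports FULL and unformatted
    blocks under separate keys.
    """
    counts = Counter(codes)
    summary = {f"nf{code - 1}": counts[code] for code in range(2, 6)}
    summary["full"] = counts[1]
    summary["unformatted"] = counts[0]
    return summary


# Hypothetical free block map: 8 blocks 75-100% free and 5 FULL blocks
codes = [5] * 8 + [1] * 5
print(BLOCK_STATUS[5])
print(freeness_summary(codes))
```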

First Level Bitmap

In this section we can see information such as:

1. How many unformatted/formatted blocks are in the segment

2. The HWM and the Second Level Bitmap address

3. Free block statistics and DBA (data block address) range(s)

4. The transaction ID of the locker (if any)

[Figure: First Level Bitmap block dump, annotated with: # unformatted blocks; free block map; extent information; HWM address; # blocks available for storing data; second level bitmap address; object ID and locker XID; summary of free blocks (nf1: 0-25% free, nf2: 25-50% free, nf3: 50-75% free, nf4: 75-100% free)]

From the block header information above, we can easily see that this table has only 1 extent, and the extent size is 8 blocks, from 0x010055b9 to 0x010055c0. These numbers are DBAs (data block addresses) in hexadecimal format, prefixed with 0x. We can convert them into a file number and block number using the DBMS_UTILITY package. 0x010055b9 is 16799161 in decimal, while 0x010055c0 is 16799168.

The first 3 blocks are used for metadata: the First Level Bitmap block, the Second Level Bitmap block and the extent header information (the segment header). Since the table is empty, Oracle formats only these first 3 blocks and leaves the other 5 blocks unformatted. The HWM is 0x010055bc, which is block number 3 counting from 0 within the extent (0x55bc - 0x55b9 = 3, the first block after the 3 metadata blocks). Since there are no rows in the table, “#blocks below” the HWM shows 0.
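You can reproduce the DBA arithmetic above outside the database. A DBA is a 32-bit value: the top 10 bits hold the relative file number and the low 22 bits hold the block number. Inside Oracle, DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE and DBMS_UTILITY.DATA_BLOCK_ADDRESS_BLOCK perform this split. The Python sketch below applies it to the addresses from the EMPTY dump. The `split_dba` name is hypothetical.

```python
from typing import Tuple

FILE_SHIFT = 22                        # block number occupies the low 22 bits
BLOCK_MASK = (1 << FILE_SHIFT) - 1


def split_dba(dba: int) -> Tuple[int, int]:
    """Split a data block address into (relative file number, block number)."""
    return dba >> FILE_SHIFT, dba & BLOCK_MASK


first = split_dba(0x010055B9)          # first block of the extent
last = split_dba(0x010055C0)           # last block of the extent
hwm = split_dba(0x010055BC)            # HWM reported in the dump

print(0x010055B9, first)               # 16799161 (4, 21945)
print(0x010055C0, last)                # 16799168 (4, 21952)
print("HWM offset in extent:", hwm[1] - first[1])   # 3
```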

[Figure: First Level Bitmap dump of ONE_ROW: 2 extents; 8 blocks are 75-100% free and 5 blocks are FULL]

Above is the dump output of the ONE_ROW table, which has 2 extents. Interestingly, when we insert a row (even a single row), Oracle formats all blocks in the extent (unformatted: 0). The freeness status also shows nf4 = 8, which matches the free block map at the bottom of this dump section. This part reminded me of the output of the DBMS_SPACE.SPACE_USAGE procedure (the output below is taken from oCheck). Now I understand why this procedure is quite fast regardless of the table size: it reads the free space information from the bitmap blocks instead of scanning the whole table.

Second Level Bitmap

Below is the output for the EMPTY and ONE_ROW tables. It contains the address of each BMB L1 along with an indication of its available space. For both tables, the available space indicator shows that most of the blocks are 75-100% free (Free: 5).

For comparison, I have created another table with more than 1 L1 entry (it has 3). Please find the script and dump output below for details. The first 2 L1 entries for this table have an available space indicator equal to 1 (meaning most of their blocks are FULL), and each of them covers 2 extents. The last entry shows 5 (75-100% free) and covers only 1 extent. Let’s see the complete representation of the Second Level Bitmap along with the First Level Bitmap of this table.

[Figure: Second Level Bitmap dump, annotated with: DBA of the next section (extent information); available space indicator; DBA of each L1; object ID]

exercise_02.sql

Extent Information

Again, I will use the output from BIG_ONE (the output of ONE_ROW can be seen in the dump file attached above) and just read the dump to check the result. There are 5 extents with 8 blocks each, so the segment has 40 blocks (shown by the green circle). Oracle keeps the detailed BMB information here, in the segment header. In ASSM, there are 2 types of HWM: the High HWM and the Low HWM. All blocks below the Low HWM are usable (they have been formatted), which matches what the HWM means in MSSM. Blocks above the High HWM have not been formatted yet, and blocks between the Low and High HWM may or may not be formatted.

[Figure: dump annotations: #L1 with free space; #L1 with full space]

[Figure: BIG_ONE segment header dump, annotated with: HWM information; the part where Oracle keeps detailed BMB information to track and manage free space; DBA of L2; extent map; how to interpret the auxiliary map information]

Data Blocks

The next structure in the table segment is the data block. Let’s use the ONE_ROW table to see the structure. In general, a data block has 3 parts: header information (which includes the ITL entries), the row directory and the table’s rows. Oracle shows the Object ID in hexadecimal format; I am not sure why.

The ITL entries can be used to find which transaction is locking a row and where its Undo Block Address (UBA) is. We will see all those relations in the next part. From the output above, we can see that all three rows are locked by the first transaction in the ITL (shown by the purple circle).

[Figure: ONE_ROW data block dump, annotated with: object ID; ITL entry (Xid: transaction ID, Uba: undo block address, Flag of DML, number of locks (Lck), SCN information); DBA of L1; row directory; table’s rows]

tsiz: table size
hsiz: header size
ntab: number of tables
nrow: number of rows
fsbo: free space begin offset
fseo: free space end offset
avsp/tosp: available/total free space (calculated as fseo - fsbo)

Data Types

In this section we are going to see how Oracle stores data in the block. I will cover only a few data types: Varchar, Char, Number and Date. The exercise below shows how Oracle stores the data in the data block. If the goal is only to see how Oracle stores the data, we can use the DUMP function instead of dumping blocks with the “ALTER SYSTEM DUMP” command, because it is faster and easier.

most_type.LST

Varchar and Char

Oracle uses ASCII codes to store both the Varchar and Char data types (assuming an ASCII-based database character set): 65 is the ASCII code for A, 66 is B, etc. Since Char is a fixed-width data type, Oracle right-pads the data with spaces (ASCII code 32). The dump shows the stored bytes in hexadecimal, so we need to convert each value into decimal before applying the CHR function.
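The conversion from hex to decimal to a character can be sketched in Python. The sketch assumes an ASCII-compatible, single-byte character set and takes the bytes as the space-separated hex string that a block dump shows. The function name and the sample values (a CHAR(5) column and a VARCHAR2 column, both holding 'AB') are hypothetical.

```python
def decode_char_bytes(hex_bytes: str) -> str:
    """Turn dump bytes such as '41 42 20' into text, one byte at a time.

    int(h, 16) is the hex-to-decimal step, and chr() plays the role of
    Oracle's CHR function (valid for an ASCII-compatible character set).
    """
    return "".join(chr(int(h, 16)) for h in hex_bytes.split())


char_value = decode_char_bytes("41 42 20 20 20")   # CHAR(5) holding 'AB'
varchar_value = decode_char_bytes("41 42")         # VARCHAR2 holding 'AB'

print(repr(char_value))      # 'AB   '  (right-padded with spaces, code 32)
print(repr(varchar_value))   # 'AB'
```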


Number

Oracle stores positive numbers, negative numbers and zero in different ways. I will cover only the Number data type (not Float, Double, etc.). For the Number data type, Oracle generally follows these rules:

1. The first byte holds the exponent

2. The second byte holds the Integer part

3. The last byte is the negative sign if its value is 0x66 (102)

4. The remaining bytes hold the Decimal part

5. The dump shows all bytes in hexadecimal (0x). Oracle splits the number into pairs of decimal digits, aligned on the decimal point, and stores each pair in one byte

6. The real value of each Integer and Decimal byte (points 2 and 4) is the stored value minus 1

7. For a positive number, the exponent is [value] - [0xC1 (193)]

8. For a negative number:

a. the exponent is [0x3E (62)] - [value]

b. each data byte is [0x66 (102)] - [value], and then minus 1 as in point 6

9. The resulting exponent has to be multiplied by 2 to get the power of 10 (because each byte holds 2 decimal digits)

10. For zero, Oracle stores only 0x80 (128), without exponent or negative sign bytes

1, stored as c1 02:
(0xc1 - 0xc1) * 2 = 0
0x02 - 1 = 0x01 = 1
Final = 10^0 * 1 = 1

10.23, stored as c1 0b 18:
(0xc1 - 0xc1) * 2 = 0
0x0b - 1 = 0x0a = 10
0x18 - 1 = 0x17 = 23
Final = 10^0 * 10.23 = 10.23

-10.23, stored as 3e 5b 4e 66 (the last byte is 0x66, so this is a negative number):
(0x3e - 0x3e) * 2 = 0
0x66 - 0x5b - 1 = 0x0a = 10
0x66 - 0x4e - 1 = 0x17 = 23
Final = -1 * 10^0 * 10.23 = -10.23

1.223, stored as c1 02 17 1f:
(0xc1 - 0xc1) * 2 = 0
0x02 - 1 = 0x01 = 1
0x17 - 1 = 0x16 = 22
0x1f - 1 = 0x1e = 30
Final = 10^0 * 1.2230 = 1.223
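The rules and worked examples above can be combined into one small decoder. The Python sketch below is based only on the rules in this section: the exponent bias 0xC1/0x3E, digit bytes stored as value + 1, negative digits taken from 0x66, the trailing 0x66 on negatives, and 0x80 for zero. The function name is hypothetical, and the sketch covers only the Number type. Instead of multiplying the exponent by 2 and using powers of 10, it works with the base-100 exponent directly, which gives the same result.

```python
from decimal import Decimal


def decode_oracle_number(hex_bytes: str) -> Decimal:
    """Decode the bytes of an Oracle NUMBER, given as a hex string like 'c1 0b 18'."""
    raw = bytes.fromhex(hex_bytes)
    if raw == b"\x80":                       # zero is a single 0x80 byte
        return Decimal(0)

    if raw[0] >= 0x80:                       # positive: exponent = byte - 0xC1
        sign = 1
        exponent = raw[0] - 0xC1
        digits = [b - 1 for b in raw[1:]]    # each digit byte is stored as value + 1
    else:                                    # negative: exponent = 0x3E - byte
        sign = -1
        exponent = 0x3E - raw[0]
        body = raw[1:]
        if body and body[-1] == 0x66:        # drop the trailing 0x66 sign byte
            body = body[:-1]
        digits = [0x66 - b - 1 for b in body]

    # Each digit is a pair of decimal digits (0..99), i.e. one base-100 digit.
    # The first one sits at 100**exponent, which is 10**(exponent * 2).
    value = sum(Decimal(d) * Decimal(100) ** (exponent - i)
                for i, d in enumerate(digits))
    return sign * value


for stored in ("c1 02", "c1 0b 18", "3e 5b 4e 66", "c1 02 17 1f", "80"):
    print(stored, "->", decode_oracle_number(stored))
```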


Now that we understand how Oracle stores the Number data type, aren’t you curious why Oracle uses 0xC1 and 0x3E as special exponent values for positive and negative numbers respectively? Why does Oracle use 0x66 as the negative sign? Why not some other number?

These are my best guesses so far:

0xC1 = 193

The maximum value of each byte is 0xFF (255), so the maximum exponent for a positive number is 255 - 193 = 62. According to the rules above, this has to be multiplied by 2, so 62 * 2 = 124. Since the Integer part can have 2 digits (maximum value decimal 99), there is 1 more digit to add, for a total of 124 + 1 = 125.

It means the maximum positive value is 9.9999 * 10^125

0x3E = 62

0x3E is 62 in decimal, a very nice coincidence, right? According to the rules, it has to be multiplied by 2, so 62 * 2 = 124, plus 1 more from the Integer part of the number. For a negative value, the minimum value is 10, so the final figure is 124 + 1 = 125. It means the minimum negative value is -1 * 10^125

0x66 = 102

0x66 is 102 in decimal. According to the rules above, the real digit is X (the negative sign byte) - stored value - 1. Since the maximum possible stored value is 100, X must be 100 + 1 + 1 (to avoid a result of 0).

So, Oracle uses the numbers above as the range of the Number data type itself (from -1 * 10^125 to 9.9 * 10^125). The other numeric data types (Float, Double, etc.) probably have different “special numbers”, but this is enough for me at this stage. As for why Oracle uses 0x80 for storing 0, I am not interested enough to find out the reason.

Date and Timestamp

The difference between the Timestamp and Date data types is that Date stores the time part only down to whole seconds (midnight, i.e. zero, when no time is given), while Timestamp also stores fractional seconds. These are the rules for these 2 data types:

1. All bytes are shown in decimal format (the default output of the DUMP function)

2. For the Century and Year parts, Oracle adds 100 before storing them, so subtract 100 from the stored value to get the real one (for example, 120 and 115 mean the year 2015). The reason is to support BC and AD dates (please read Thomas Kyte’s book: Expert Oracle Database Architecture)

3. For the Month and Day parts, Oracle stores the values as they are

4. For Hour, Minute and Second, we need to subtract 1 to get the real value
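The four date rules can be checked with a small decoder. This Python sketch assumes that the input is the list of decimal byte values that DUMP prints for a DATE column. It handles AD dates only, and for a Timestamp it decodes only the first 7 bytes and ignores the fractional-second bytes. The function name and the second sample value are hypothetical.

```python
from datetime import datetime
from typing import Sequence


def decode_oracle_date(raw: Sequence[int]) -> datetime:
    """Decode the first 7 bytes of an Oracle DATE/TIMESTAMP (decimal, as DUMP shows).

    Only AD dates are handled (century and year are stored with +100).
    """
    century, year, month, day, hour, minute, second = raw[:7]
    return datetime(
        (century - 100) * 100 + (year - 100),   # remove the +100 offset
        month,                                  # stored as is
        day,                                    # stored as is
        hour - 1,                               # hour/minute/second are stored +1
        minute - 1,
        second - 1,
    )


print(decode_oracle_date([120, 115, 7, 4, 1, 1, 1]))     # 2015-07-04 00:00:00
print(decode_oracle_date([120, 115, 7, 4, 15, 31, 6]))   # 2015-07-04 14:30:05
```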

B-Tree Index

Moving on to indexes, the first thing to observe is of course the B-tree index, since this is the most popular index type in the database world. First we are going to see how Oracle stores unique and non-unique indexes, and after that we will see how it works for a function-based index.


Oracle also uses BMB to track and manage free space in index segments, so I will not repeat the explanation of the First Level Bitmap and Second Level Bitmap. Instead, I will go directly to the Segment Header (Extent Control Header). An index segment has 2 types of structures: branch blocks and leaf blocks.

Unique Index

I have created a unique index on the ONE_ROW (ID) column with PCTFREE 98 to expand the size of the index.

index_unique.LST

[Figure: branch block dump of the unique index, annotated with: object ID; ITL entry (Xid: transaction ID, Uba: undo block address, Flag of DML, number of locks (Lck), SCN information); DBA of L1; address to leaf block; index key]

kdxcolev: index level (0 = leaf block; 1 or higher = branch block)
kdxcolok: denotes whether a structural block transaction is occurring
kdxcoopc: internal operation code
kdxconco: index column count
kdxcosdc: count of index structural changes involving the block
kdxconro: number of index entries (does not include the kdxbrlmc pointer)
kdxcofbo: free space begin offset
kdxcofeo: free space end offset
kdxcoavs: available free space (calculated as kdxcofeo - kdxcofbo)
kdxbrlmc: leftmost child pointer (address of the block for keys lower than the first entry)
kdxbrsno: last index entry to be modified
kdxbrbksz: size of usable block space

Above is the index dump output for the branch block. Each row in the branch block has only 1 column, which holds the index key; the address of the leaf block is kept in the row header. Now let’s take a look at the leaf block.

It is clear now that Oracle keeps linked-list information in the leaf block (kdxlenxt and kdxleprv), and this information is used to move from one leaf block to the next.
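The kdxlenxt/kdxleprv chain lets a range scan move across leaf blocks without going back up to the branch level. The following toy Python model makes several assumptions: the block addresses, keys and the `LeafBlock`/`range_scan`/`descending_keys` names are all hypothetical, and the descent from the root to the first leaf block is skipped, so the scan simply starts from a given leaf.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LeafBlock:
    keys: List[int]                 # index keys in this leaf, already sorted
    nxt: Optional[str] = None       # plays the role of kdxlenxt
    prv: Optional[str] = None       # plays the role of kdxleprv


def range_scan(leaves: Dict[str, LeafBlock], start: str, lo: int, hi: int) -> List[int]:
    """Collect keys in [lo, hi], starting at leaf `start` and following next pointers."""
    found: List[int] = []
    addr: Optional[str] = start
    while addr is not None:
        leaf = leaves[addr]
        for key in leaf.keys:
            if key > hi:            # keys are ordered, so we can stop here
                return found
            if key >= lo:
                found.append(key)
        addr = leaf.nxt             # hop to the next leaf block
    return found


def descending_keys(leaves: Dict[str, LeafBlock], start: str) -> List[int]:
    """Walk backwards through the previous pointers, as a descending scan would."""
    out: List[int] = []
    addr: Optional[str] = start
    while addr is not None:
        leaf = leaves[addr]
        out.extend(reversed(leaf.keys))
        addr = leaf.prv
    return out


# Three hypothetical leaf blocks linked in both directions
leaves = {
    "A": LeafBlock([1, 2, 3], nxt="B"),
    "B": LeafBlock([4, 5, 6], nxt="C", prv="A"),
    "C": LeafBlock([7, 8], prv="B"),
}
print(range_scan(leaves, "A", 3, 7))    # [3, 4, 5, 6, 7]
print(descending_keys(leaves, "C"))     # [8, 7, 6, 5, 4, 3, 2, 1]
```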

In a unique index, Oracle stores the ROWID in the row header. The ROWID contains the relative file number, block number and row number, and it points to the respective row in the table. It is shown in hexadecimal format. There are 6 bytes in the ROWID: the first 4 bytes are the DBA of the table block (relative file number in the top 10 bits, block number in the low 22 bits), and the last 2 bytes are the row number. For block numbers below 65536, this means the first 2 bytes hold the file number and the next 2 bytes hold the block number. To break down the ROWID, first convert the values into decimal and follow the rules below (e.g. 01 0055 c400 00):

0x 01 00 = 256. The relative file number sits in the top 10 bits of these 16 bits, so we divide by 64 (2^6): 256 / 64 = 4

0x 55 c4 = 21956 block number

0x 00 00 = 0 row number

The results match the query below.

[Figure: leaf block dump of the unique index, annotated with: object ID; DBA of L1; index value; ROWID]

kdxlespl: bytes of uncommitted data at the time of a block split that have been cleaned out
kdxlende: number of deleted entries
kdxlenxt: pointer to the next leaf block in the index structure
kdxleprv: pointer to the previous leaf block in the index structure
kdxlebksz: usable block space
kdxledsz: size of data in row header
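You can reproduce the ROWID breakdown above with a few shifts. This Python sketch treats the first 4 bytes as a DBA (file number in the top 10 bits, block number in the low 22 bits) and the last 2 bytes as the row number. Shifting the whole DBA right by 22 is the same as dividing the first 2 bytes by 64. The function name is hypothetical.

```python
from typing import Tuple


def decode_index_rowid(hex_bytes: str) -> Tuple[int, int, int]:
    """Decode a 6-byte ROWID from an index dump into (file, block, row)."""
    raw = bytes.fromhex(hex_bytes)
    dba = int.from_bytes(raw[0:4], "big")   # data block address of the table block
    row = int.from_bytes(raw[4:6], "big")   # row number inside that block
    # top 10 bits of the DBA: relative file number; low 22 bits: block number
    return dba >> 22, dba & 0x3FFFFF, row


print(decode_index_rowid("01 0055 c400 00"))   # (4, 21956, 0)
```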

Now that we understand how Oracle stores index keys and ROWIDs in branch and leaf blocks, let’s draw the index structure in a different way. The trace files below are used as the source (this index has a root – branch – leaf structure). Since the index key comes from a numeric column, we can read it with the same rules we used for Number values in the data block.

Finally, we can draw the index structure in a “tree” form as below (hence the name B-tree).


Non-Unique Index

I am going to use the same ONE_ROW table, add one column (ID2) and insert a few duplicate values into ID2. The goal is to see how Oracle handles duplicate data in the branch and leaf blocks of a non-unique index.

index_unique_add_duplicate_value.LST

First let’s take a look at the branch block. For a non-unique index, Oracle adds 1 extra column to keep each entry unique. That column holds one of 2 types of information. If there is only 1 row for the index key, Oracle stores “TERM” in the new column. If there is more than 1 row for an index key (see index key c1 03, which stores ID2 = 2), Oracle stores ROWID information, and perhaps uses “TERM” as well. Oracle seems to use “TERM” for the row with the lowest ROWID, but I guess that doesn’t mean anything. For row#3 in the branch block, Oracle doesn’t store the complete ROWID (there is no row number). Maybe this is part of an internal algorithm to keep the additional column (the ROWID column) as short as possible.

[Figure: branch and leaf block dumps of the non-unique index, annotated with: index value; ROWID; index value with multiple rows; index value with 1 row]

In the leaf block, Oracle uses a different approach to store the ROWID. In a unique index, Oracle keeps the ROWID in the row header, while in a non-unique index, Oracle adds one new column (shown as col 1;) to store the ROWID. This keeps each index entry unique, for exactly the same reason as in the branch block.
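You can model the effect of the extra ROWID column by sorting (key, ROWID) pairs. In this Python sketch, tuples stand in for index entries and the 6-byte ROWIDs are hypothetical. The sketch shows only why duplicate keys stop being duplicate entries, ordered by ROWID within the same key. It does not reproduce Oracle's leaf layout or the “TERM” marker, and the function name is hypothetical.

```python
from typing import List, Tuple

Entry = Tuple[int, bytes]   # (index key, ROWID)


def build_nonunique_entries(rows: List[Entry]) -> List[Entry]:
    """Order entries by key, then by ROWID, as if the ROWID were a 2nd key column."""
    return sorted(rows)


# Hypothetical rows: key 2 appears twice
rows = [
    (2, bytes.fromhex("01 00 55 d1 00 02")),
    (1, bytes.fromhex("01 00 55 d1 00 00")),
    (2, bytes.fromhex("01 00 55 d1 00 01")),
    (3, bytes.fromhex("01 00 55 d1 00 03")),
]
entries = build_nonunique_entries(rows)
for key, rowid in entries:
    print(key, rowid.hex(" "))
```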

Composite Index

A composite index is an index with more than 1 column in the index key. In this section we are going to observe a composite unique index. Let’s create the table and index as follows and capture the branch and leaf block dumps.

[Figure: branch block and leaf block dumps of the composite index; an annotation on one ROWID asks: “It should be 01 00 55 d1 00 00, right?”]

As expected, this index has 2 columns (ID and ID2), and since this is a unique index, the ROWID is kept in the row header.

We know that NULL values are not indexed in a single-column index. What about NULL values in a composite index? Let’s observe the behavior by creating a small table with a single-column index and a composite index and populating it with a few rows.

The output shows that a composite index skips an entry only when all the columns that are part of the index are NULL.
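The observed rule, “no B-tree entry only if every key column is NULL”, can be written as a one-line predicate. In this Python sketch, `None` stands in for NULL. The TINY rows below are hypothetical: only their NULL pattern matches the test (the 4th row has both X and Y NULL), and the function names are hypothetical too.

```python
from typing import Optional, Sequence, Tuple

Row = Tuple[Optional[int], Optional[str]]   # (X, Y)


def has_index_entry(key_values: Sequence[object]) -> bool:
    """B-tree rule observed above: skip the entry only if every key column is NULL."""
    return any(v is not None for v in key_values)


def count_entries(rows: Sequence[Row], columns: Sequence[int]) -> int:
    """Count index entries for an index on the given column positions."""
    return sum(has_index_entry([row[c] for c in columns]) for row in rows)


# Hypothetical TINY rows: only the 4th row has both X and Y NULL
tiny = [(1, "a"), (2, None), (None, "c"), (None, None)]
print(count_entries(tiny, [0]))      # index on (X)    -> 2
print(count_entries(tiny, [0, 1]))   # index on (X, Y) -> 3
```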

Function-Based Index

How about a function-based index? Does Oracle store the original value or the result of the function? In this section we are going to create a function-based index using the LOWER function as below:

[Figure: output of the NULL test. TINY_1IDX has only 2 index entries: C1 02 for X = 1 and C1 03 for X = 2. TINY_2IDX has only 3 index entries; no entry is created for the 4th row (X = NULL and Y = NULL).]

A function-based index is stored using a B-tree structure, and Oracle stores the result of the function instead of the column’s value.

Local and Global Index

It is also interesting to see how Oracle stores local and global indexes on a partitioned table. For this purpose, I have created a small partitioned table, PART, from the ONE_ROW table. Then I created a non-unique local index on the ID column and a global index on the ID2 column.

part.sql

For the local index, Oracle stores the index key the same way it does for an ordinary index (on a non-partitioned table). There is nothing special or different in the structure. This is the capture of partition P10 of the PART table.

[Figure: leaf block dump; the key bytes 0x61 = 97, 0x62 = 98 and 0x63 = 99 are the ASCII codes for a, b and c]

The global index shows an interesting result. Instead of storing only the ROWID, Oracle also stores the object_id (or maybe the data_object_id) of the partition along with the index key.

Let’s take 2 examples from the orange parts above and break down the information.

00 00 d6 de 01 00 56 a4 00 07

o 0x 00 07 = 7 row number
o 0x 56 a4 = 22180 block number
o 0x 01 00 = 256 / 64 = 4 relative file number
o 0x d6 de (or 0x 00 00 d6 de) = 55006 object_id/data_object_id of partition P20

00 00 d6 df 01 00 56 e4 00 00

o 0x 00 00 = 0 row number
o 0x 56 e4 = 22244 block number
o 0x 01 00 = 256 / 64 = 4 relative file number
o 0x d6 df (or 0x 00 00 d6 df) = 55007 object_id/data_object_id of partition PX
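The same breakdown can be automated for the 10-byte entries of the global index. This Python sketch assumes the layout read off the examples above: 4 bytes of partition object ID, then the 4-byte DBA (file number in the top 10 bits, block number in the low 22 bits), then a 2-byte row number. The function name is hypothetical.

```python
from typing import Tuple


def decode_global_index_rowid(hex_bytes: str) -> Tuple[int, int, int, int]:
    """Decode a 10-byte ROWID from a global index into (object id, file, block, row)."""
    raw = bytes.fromhex(hex_bytes)
    object_id = int.from_bytes(raw[0:4], "big")   # partition's (data) object id
    dba = int.from_bytes(raw[4:8], "big")         # data block address
    row = int.from_bytes(raw[8:10], "big")        # row number in that block
    return object_id, dba >> 22, dba & 0x3FFFFF, row


print(decode_global_index_rowid("00 00 d6 de 01 00 56 a4 00 07"))  # (55006, 4, 22180, 7)
print(decode_global_index_rowid("00 00 d6 df 01 00 56 e4 00 00"))  # (55007, 4, 22244, 0)
```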

[Figure: global index leaf block dump, annotated with the ROWID and the object ID of the partition]

Bitmap Index

A bitmap index is another option for indexing a table. Usually (not always, but in most cases), a bitmap index is used on columns with low cardinality (few distinct values in the column). The most famous example is a bitmap index on a SEX column, where we have only 2 values: female and male (not sure if someone will require another entry, such as “half male” or “half female”).

While NULL keys are not indexed in a B-tree index (when all indexed columns are NULL), they are indexed in a bitmap index. For every index key, Oracle creates a bitmap to track where the data is; the more rows there are, the bigger the bitmap. Let’s create a small table with a huge PCTFREE so that the table gets more than 1 extent.

bitmap.sql

[Figure: bitmap index leaf block dump, annotated with: Begin ROWID; End ROWID; bitmap information; NULL is indexed]

To identify the table’s rows, Oracle doesn’t store each exact ROWID but 2 ROWIDs: the Begin ROWID and the End ROWID (a kind of ROWID range). To see what changes in the bitmap column when we have more rows, let’s add another 3 rows to the table.

But how does Oracle convert the bitmap into ROWIDs, and vice versa? Grrr, I don’t know yet; I am still trying to find out how this works.
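The exact encoding is still unknown, but the general idea of “one bitmap per key over a ROWID range” can be modeled conceptually. In this Python sketch, bit i of a key’s bitmap stands for the i-th row slot between the Begin ROWID and the End ROWID. This is an uncompressed toy model, not Oracle’s on-disk format (real bitmap index entries are compressed). The function names and sample values are hypothetical. As in a real bitmap index, NULL (`None` here) gets its own bitmap.

```python
from typing import Dict, List, Optional, Sequence


def build_bitmaps(values: Sequence[Optional[str]]) -> Dict[Optional[str], str]:
    """One bit string per distinct value; bit i is 1 if row slot i holds that value."""
    bitmaps: Dict[Optional[str], List[str]] = {}
    for slot, value in enumerate(values):
        bits = bitmaps.setdefault(value, ["0"] * len(values))
        bits[slot] = "1"
    return {value: "".join(bits) for value, bits in bitmaps.items()}


def slots_for(bitmap: str) -> List[int]:
    """Reverse direction: which row slots (offsets from the Begin ROWID) match."""
    return [slot for slot, bit in enumerate(bitmap) if bit == "1"]


column = ["M", "F", None, "M", "F"]   # hypothetical column values, row by row
bitmaps = build_bitmaps(column)
print(bitmaps)                         # {'M': '10010', 'F': '01001', None: '00100'}
print(slots_for(bitmaps["M"]))         # [0, 3]
```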

IOT

An IOT (Index Organized Table) is a special table type in Oracle that is created and maintained using a B-tree structure (root, branch and leaf blocks). Data is stored in order, based on the primary key columns. Oracle generates a system name for the segment, something like “SYS_IOT_TOP_<object_id>”. The following is an example of the branch and leaf blocks of an IOT without an overflow segment.

iot.sql


We can also store the overflow columns (columns that are not part of the primary key) in a separate overflow segment, optionally in another tablespace. In this case, in addition to the “SYS_IOT_TOP_<object_id>” segment, Oracle creates one more segment named with the pattern “SYS_IOT_OVER_<object_id>”. When we create an overflow segment, Oracle stores that data using a heap table structure and creates a pointer in the B-tree structure that points to the table structure. The relation between the IOT and its overflow segment is very similar to the relation between a heap table and a B-tree index. The pointer is stored using “DBA.ROWNO” notation, the same notation we will see with row migration or row chaining in table segments.

[Figure: IOT branch and leaf block dumps. The branch block structure is exactly the same as in an ordinary B-tree index. The leaf block has no ROWID information, since the table data is stored together within the index leaf block; the green rectangle marks the index part and the purple rectangle the table part.]

Row Migration

The Symptom

When a database block does not have enough free space to hold a row (for example, when a user updates a row with bigger data), Oracle moves that row into another block and creates a pointer that joins the two locations. This symptom is called “Row Migration”. Row migration has a negative impact on performance because it requires additional consistent gets to retrieve the data, regardless of the access path (full table scan or index scan). If the block still has enough room for the data, Oracle instead moves the row to another part of the same block (only the offset of that row changes). Row migration has no impact on the structure of the index (if the table has one), so it is independent of the index; the only structure impacted is the table’s data block.
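The decision described above (relocate the row inside its block when there is room, otherwise migrate it and leave a pointer) can be sketched with a toy Python model. Everything here is an assumption for illustration: `Block`, `update_row` and the 100-byte `HEADER` are hypothetical, row data is placed from the end of the block towards the header, and PCTFREE, row headers and the reuse of space freed by old row versions are ignored, so the counts will not match a real block dump.

```python
# Toy model of a data block, NOT Oracle's real block format. Row data is placed
# from the end of the block towards the header, so offsets shrink as rows are added.


class Block:
    HEADER = 100  # hypothetical bytes reserved for block and ITL headers

    def __init__(self, dba, size=8192):
        self.dba = dba
        self.top = size  # lowest offset used so far
        self.rows = {}   # rowno -> {"offset", "data", optional "nrid" / "hrid"}

    def free(self):
        return self.top - self.HEADER

    def place(self, rowno, data, **pointers):
        self.top -= len(data)
        self.rows[rowno] = {"offset": self.top, "data": data, **pointers}


def update_row(block, rowno, new_data, spare_blocks):
    """Grow a row: relocate it inside its own block if there is room,
    otherwise migrate it and leave only an nrid pointer behind."""
    if len(new_data) <= block.free():
        block.place(rowno, new_data)  # same block, new (lower) offset
        return "moved within block"
    target = next(b for b in spare_blocks if b.free() >= len(new_data))
    new_rowno = len(target.rows)
    # The migrated piece remembers its home slot through hrid.
    target.place(new_rowno, new_data, hrid=(block.dba, rowno))
    # The home slot keeps only a forward pointer (nrid) and no data.
    old_offset = block.rows[rowno]["offset"]
    block.rows[rowno] = {"offset": old_offset, "data": "", "nrid": (target.dba, new_rowno)}
    return "migrated"


home = Block(0x0100570D)
for n in range(20):
    home.place(n, "x")                        # 20 one-character rows
print(update_row(home, 0, "y" * 20, []))      # moved within block
spare = [Block(0x0100570E), Block(0x0100570F)]
results = [update_row(home, n, "z" * 1000, spare) for n in range(20)]
print(results.count("migrated"))              # 12 under these assumptions
```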

Below is an illustration of row migration.

IOT part

In the table part (purple rectangle), Oracle stores the pointer to the overflow segment using “DBA.ROWNO” notation (red rectangle).

Overflow segment, stored using a heap table structure


How Things are Working?

I will demonstrate how this symptom happens and how Oracle creates the pointer for the migrated row. Below are the complete steps to reproduce the symptom.

row_migration.sql

row_migration_trace.zip

First, I created a small table with only 2 columns and populated it with only 20 rows. Note the second column: instead of inserting 1,000 characters (the maximum length of column Y), I inserted only a single character (a pattern we frequently see in applications). This packs all those rows into a single block.

Before demonstrating row migration, let’s update a single row with a bigger value for column Y using the update statement below. Since there is enough room in the current block, Oracle only moves the row to another offset within the block, not to another block. You can see in the following figure that Oracle moves row 0 from offset 0x1ef8 to 0x1ed8.


Now let’s update the whole table with a bigger value (set column Y to 1,000 characters) and check the result.

Before update

After update


In the picture above, Oracle only keeps pointer information in the original location (there is no data information), which means that Oracle has moved all the data from the old location to the new location. We can identify this symptom by monitoring the “table fetch continued row” session statistic. In the figure below we can see there are 15 row migrations when we select from the table. Defragmenting the table (Alter Table Move, Shrink Space, CTAS or Export/Import) is a sensible solution for this problem, but only a temporary one, because row migration is related to the application and the table design.

Row 0 is moved from 0x0100570d.0 to 0x0100570e.0

In the original row (0x0100570d.0), Oracle uses nrid to locate the new row’s location (0x0100570e.0), and in the new location, Oracle uses hrid to point back to the original location.

In this case, Oracle doesn’t use the ROWID format in the pointer but the “DBA.ROWNO” notation. In the example above, 0x0100570d is the DBA part while 0 is the row number.

Only pointer information (nrid) is left in the original location (purple rectangle); all data has been moved to the new location.
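Decoding the DBA part of such a pointer is simple bit arithmetic. The sketch below assumes the smallfile tablespace layout, in which the upper 10 bits of the 32-bit DBA hold the relative file number and the lower 22 bits hold the block number (bigfile tablespaces use all 32 bits for the block number). Inside the database, `DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE` and `DBMS_UTILITY.DATA_BLOCK_ADDRESS_BLOCK` perform the same split. The helper names `split_dba` and `parse_pointer` are hypothetical.

```python
def split_dba(dba: int) -> tuple[int, int]:
    """Split a DBA into (relative file number, block number).
    Assumes the smallfile layout: top 10 bits = file, low 22 bits = block."""
    return dba >> 22, dba & 0x3FFFFF


def parse_pointer(ptr: str) -> tuple[int, int, int]:
    """Parse a 'DBA.ROWNO' pointer such as '0x0100570d.0' into
    (relative file number, block number, row number)."""
    dba_text, rowno_text = ptr.split(".")
    file_no, block_no = split_dba(int(dba_text, 16))  # int() accepts the 0x prefix
    return file_no, block_no, int(rowno_text)


print(parse_pointer("0x0100570d.0"))  # (4, 22285, 0)
print(parse_pointer("0x0100570e.0"))  # (4, 22286, 0)
```

Under this assumption, the original slot and the migrated piece in the example sit in adjacent blocks (22285 and 22286) of the same file.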


The Impact

As mentioned previously, row migration increases the number of consistent gets (and probably physical reads as well) during an index scan or full table scan. We can observe this behavior by simply enabling autotrace (SQL*Plus) to get the statistics, or by turning on event 10200 to dump the consistent gets. Below are the results from event 10200 for both the index scan and the full table scan.

For both the index scan and the full table scan, row migration increases the consistent gets compared to a table without row migration. In the index scan example, Oracle requires 2 consistent gets for the table with row migration (only 1 for the table without row migration). In the full table scan example, Oracle requires 20 consistent gets for the table with row migration, while the table without row migration requires only 3.
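Why the index scan needs one extra get can be sketched as a rowid fetch that follows the nrid pointer. This toy model is an assumption-laden illustration: `Stats` and `fetch_by_rowid` are hypothetical, the table is a dictionary keyed by (DBA, row number), and only table-block visits are counted, mirroring the 1 versus 2 gets in the index scan example. It does not model the full table scan case.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    table_block_gets: int = 0       # stand-in for consistent gets on table blocks
    continued_row_fetches: int = 0  # stand-in for "table fetch continued row"


def fetch_by_rowid(table, rowid, stats):
    """Fetch a row by (dba, rowno), following nrid if the row was migrated."""
    piece = table[rowid]
    stats.table_block_gets += 1
    if "nrid" in piece:  # the home slot holds only a forward pointer
        stats.continued_row_fetches += 1
        piece = table[piece["nrid"]]
        stats.table_block_gets += 1
    return piece["data"]


# Hypothetical contents modelled on the dump: row 0 migrated, row 1 not.
table = {
    (0x0100570D, 0): {"nrid": (0x0100570E, 0)},
    (0x0100570D, 1): {"data": "x"},
    (0x0100570E, 0): {"data": "y" * 1000, "hrid": (0x0100570D, 0)},
}
plain, migrated = Stats(), Stats()
fetch_by_rowid(table, (0x0100570D, 1), plain)
fetch_by_rowid(table, (0x0100570D, 0), migrated)
print(plain.table_block_gets, migrated.table_block_gets)  # 1 2
print(migrated.continued_row_fetches)                     # 1
```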

Index Scan Without Row Migration

Full Table Scan Without Row Migration

Index Scan With Row Migration


Full Table Scan With Row Migration

Row Chaining

Row chaining happens when a single block is not large enough to hold one row, because the row has too many columns or the columns are too wide. Consider the example below: if we update all 3 columns (B, C and D) with the maximum 4,000 characters, the total row size will be more than 12,000 bytes. With the default block size (8k), at least 2 blocks are required to hold the row.

row_chained.sql


Original row

After the update, the row is split across 3 blocks.

We don’t see hrid information in the new location, and Oracle didn’t split any column across blocks: each column value is stored whole within a single block to avoid splitting the column’s data.


From the picture above we can see that Oracle requires 3 blocks to hold the row, because Oracle does not split a column across blocks. Columns A and B are stored in block 0x01005755, column C in 0x01005757 and column D in 0x01005756. To identify row chaining, we can use the same session statistic, “table fetch continued row”.
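The block count can be reasoned about with a small packing model. This is a sketch under assumptions, not Oracle's actual chaining algorithm: column A is assumed to take 3 bytes, each block is assumed to offer 7,000 usable bytes (the real figure depends on block overhead and PCTFREE), and columns are packed greedily in column order without ever splitting a column, as observed above. The function names are hypothetical.

```python
import math


def min_blocks(col_sizes, usable):
    """Lower bound on blocks if row pieces could be split anywhere."""
    return math.ceil(sum(col_sizes.values()) / usable)


def pack_without_splitting(col_sizes, usable):
    """Greedy packing in column order: start a new block whenever the next
    whole column does not fit. Returns the column names held by each block.
    (A column larger than `usable` is not handled specially in this sketch.)"""
    blocks, current, used = [], [], 0
    for col, size in col_sizes.items():
        if current and used + size > usable:
            blocks.append(current)
            current, used = [], 0
        current.append(col)
        used += size
    if current:
        blocks.append(current)
    return blocks


cols = {"A": 3, "B": 4000, "C": 4000, "D": 4000}  # A's size is an assumption
print(min_blocks(cols, 7000))              # 2
print(pack_without_splitting(cols, 7000))  # [['A', 'B'], ['C'], ['D']]
```

With these assumptions the greedy packing reproduces the observed grouping (A and B together, C and D alone), while the lower bound for freely splittable pieces is only 2. With a larger assumed capacity, for example 8,000 bytes, C and D would fit together and the model would use only 2 blocks, so the result is sensitive to the usable space per block.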


What’s Next?

In part 2, I will try to cover the following items, so that we can see the complete picture of how the internals work.

Undo and Redo

Transaction

Consistent Read

A few other things: deadlocks, snapshot too old, etc.

-heri-