Pages and Extents

Upload: rahul-yerrabelli

Posted on 15-Apr-2016

Page 1: Pages and Extents

SQL Server: Understanding the Data Page Structure

We all know very well that SQL Server stores data in 8 KB pages and that the page is the basic unit of I/O for SQL Server operations. There are different types of pages, such as data, GAM, SGAM, etc. In this post, let us try to understand the structure of data pages. SQL Server uses different types of pages to store different types of data, such as data, index data, BLOB, etc. SQL Server stores data records in data pages. Data records are the rows of a heap or of the leaf level of a clustered index.

A data page consists of three sections: the page header, the actual data and the row offset array. A schematic diagram of a data page looks like the one below.


Before going into details, let us see how this looks internally in SQL Server. Let us create a table and insert some records into it.

CREATE DATABASE MyDb
GO
USE MyDb
GO
CREATE TABLE Customer
(
   FirstName CHAR(200),
   LastName  CHAR(300),
   Email     CHAR(200),
   DOB       DATE
)
GO
INSERT INTO Customer VALUES('William','James','[email protected]','1982-01-20')
INSERT INTO Customer VALUES('Jade','Victor','[email protected]','1985-08-12')
INSERT INTO Customer VALUES('Jonas','hector','[email protected]','1980-10-02')
INSERT INTO Customer VALUES('William1','James','[email protected]','1982-01-20')
INSERT INTO Customer VALUES('Jade1','Victor','[email protected]','1985-08-12')
INSERT INTO Customer VALUES('Jonas1','hector','[email protected]','1980-10-02')
INSERT INTO Customer VALUES('William2','James','[email protected]','1982-01-20')
INSERT INTO Customer VALUES('Jade2','Victor','[email protected]','1985-08-12')
INSERT INTO Customer VALUES('Jonas2','hector','[email protected]','1980-10-02')
INSERT INTO Customer VALUES('William3','James','[email protected]','1982-01-20')
GO

Now we need to find the pages allocated to this table. For that, we have to use the undocumented command DBCC IND. Its syntax is:

DBCC IND ( { 'dbname' | dbid }, { 'objname' | objid }, { nonclustered indid | 1 | 0 | -1 | -2 } );

where:
nonclustered indid = ID of a non-clustered index
1 = clustered index ID
0 = in-row data pages and in-row IAM pages of a heap
-1 = all pages of all indexes, including LOB (large object) pages and row-overflow pages
-2 = all IAM pages

Run the following command from SSMS:

DBCC IND('mydb','customer',-1)


The output looks like the picture below:

You can see two rows, one with page type 10 and the other with page type 1. Page type 10 is an IAM page; we will talk about the different page types in a different post. Page type 1 is a data page, and its page ID is 114.

Now, to see the row data stored in that page, we have to use the DBCC PAGE command. Its syntax is:

dbcc page ( {'dbname' | dbid}, filenum, pagenum [, printopt={0|1|2|3} ]);

printopt:
0 - print just the page header
1 - page header plus per-row hex dumps and a dump of the page slot array
2 - page header plus whole page hex dump
3 - page header plus detailed per-row interpretation

By default, the output of DBCC PAGE is sent to the error log. To get the output in the current connection, we have to enable trace flag 3604. You can also add WITH TABLERESULTS to DBCC PAGE to get the output in table format. Run the following commands to get the row data stored in the data page:

DBCC TRACEON(3604)
GO
DBCC page('mydb',1,114,3)

The output has four sections. The first section, BUFFER, describes the in-memory buffer holding the page, and we are not interested in it here. The next section is the page header, which is a fixed 96 bytes in size; the page header is the same size for all pages. The page header section looks like the picture below.


To learn more about these fields, see http://www.sqlskills.com/BLOGS/PAUL/post/Inside-the-Storage-Engine-Anatomy-of-a-page.aspx

The next section contains the slots, where the actual data is stored. I have removed some hex dumps to make it clearer. Each record is stored in a slot: slot 0 holds the first record on the page, slot 1 the second record, and so on. However, the slots are not necessarily in the same physical order as the data on the page. You can see from the image below that the record size is 710 bytes. Of these, 703 bytes are fixed-length data and 7 bytes are row overhead. We will discuss the record structure and row overhead in a different post.
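The 703 + 7 = 710 split can be reproduced with a little arithmetic. The Python sketch below makes two assumptions. First, it uses the commonly described layout of a record with only fixed-length columns: 2 bytes of status bits, a 2-byte offset to the end of the fixed-length data, a 2-byte column count and a null bitmap of one bit per column. Second, it assumes a DATE column is stored in 3 bytes. The function name fixed_length_record_size is made up for illustration.

```python
import math


def fixed_length_record_size(column_sizes):
    """Return (data_bytes, overhead_bytes, total_bytes) for a record whose
    columns are all fixed-length (assumed layout, see the lead-in)."""
    data_bytes = sum(column_sizes)
    null_bitmap = math.ceil(len(column_sizes) / 8)  # one bit per column, rounded up
    # assumed: 2 bytes status bits + 2 bytes end-of-fixed-data offset + 2 bytes column count
    overhead = 2 + 2 + 2 + null_bitmap
    return data_bytes, overhead, data_bytes + overhead


# Customer: CHAR(200), CHAR(300), CHAR(200), DATE (assumed to take 3 bytes)
print(fixed_length_record_size([200, 300, 200, 3]))  # (703, 7, 710)
```

With four columns, the null bitmap fits in one byte, which gives the 7 bytes of overhead seen in the DBCC PAGE output.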


The last section of a page is the row offset table. To see it at the end of the output, we should run DBCC PAGE with option 1:

DBCC page('mydb',1,114,1)

The row offset table looks like the picture below, and it should be read from bottom to top. Each slot entry is a two-byte pointer holding the offset of the record within the page. In our example we have ten records, and the offset table has ten entries. The first record points to byte 96, just after the page header, although the first record does not have to start at byte 96. The offset table helps to manage the records in a page, and each record needs 2 bytes of storage in the page for its offset array entry.

Consider a non-clustered index over a heap. Each non-clustered index row contains a physical pointer back to the heap row it maps to. This physical pointer has the form [file:page:slot]. The matching heap row is found by reading the page, going to the slot number in the slot array to find the record's offset in the page, and then reading the record at that offset. If we need to insert a record between existing records, the entire page does not have to be restructured; only the offset table needs to be rearranged.
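To make the slot-array mechanics concrete, here is a toy Python model of a page. It is not SQL Server's real on-disk format. The class ToyPage and the helper fetch_by_rid are hypothetical, records are fixed-length, and offsets are stored as little-endian 2-byte integers. It illustrates only the two ideas above: following a [file:page:slot] locator through the slot array, and inserting a record "between" others by reordering slot entries instead of moving records.

```python
PAGE_SIZE = 8192
HEADER_SIZE = 96
SLOT_ENTRY_SIZE = 2


class ToyPage:
    """Toy data page (a model, not SQL Server's real format).

    Records are written after the 96-byte header; 2-byte slot entries are
    written backwards from the end of the page, slot 0 in the last two bytes.
    """

    def __init__(self, record_size):
        self.buf = bytearray(PAGE_SIZE)
        self.record_size = record_size  # fixed-length records only, to keep it short
        self.next_free = HEADER_SIZE    # where the next record is written
        self.slot_count = 0

    def _slot_pos(self, slot):
        return PAGE_SIZE - SLOT_ENTRY_SIZE * (slot + 1)

    def offsets(self):
        """Record offsets in slot order, i.e. the row offset table."""
        result = []
        for slot in range(self.slot_count):
            pos = self._slot_pos(slot)
            result.append(int.from_bytes(self.buf[pos:pos + SLOT_ENTRY_SIZE], "little"))
        return result

    def free_space(self):
        return PAGE_SIZE - self.next_free - SLOT_ENTRY_SIZE * self.slot_count

    def insert(self, record, slot=None):
        """Append the record's bytes; a 'between' insert only reorders slot entries."""
        if len(record) != self.record_size:
            raise ValueError("wrong record size")
        if self.record_size + SLOT_ENTRY_SIZE > self.free_space():
            raise ValueError("page is full")
        offset = self.next_free
        self.buf[offset:offset + self.record_size] = record
        self.next_free += self.record_size
        offsets = self.offsets()
        offsets.insert(len(offsets) if slot is None else slot, offset)
        for s, off in enumerate(offsets):  # rewrite the slot array; no record moves
            pos = self._slot_pos(s)
            self.buf[pos:pos + SLOT_ENTRY_SIZE] = off.to_bytes(SLOT_ENTRY_SIZE, "little")
        self.slot_count = len(offsets)

    def read(self, slot):
        offset = self.offsets()[slot]
        return bytes(self.buf[offset:offset + self.record_size])


def fetch_by_rid(pages, rid):
    """Follow a (file, page, slot) locator: find the page, look up the slot, read the record."""
    file_id, page_id, slot = rid
    return pages[(file_id, page_id)].read(slot)


page = ToyPage(record_size=710)
for i in range(10):
    page.insert(bytes([i]) * 710)
print(page.offsets()[:2], page.free_space())  # [96, 806] 976
page.insert(b"X" * 710, slot=1)                # logically between old slots 0 and 1
print(page.offsets()[:3])                      # [96, 7196, 806]
print(fetch_by_rid({(1, 114): page}, (1, 114, 2)) == bytes([1]) * 710)  # True
```

The new record is physically written after the existing ten (offset 7196), yet it becomes slot 1 because only the slot entries were shifted.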

In our case, if you look at the page header, the free space is 976 bytes, which is equal to:

(8 * 1024) - 96 - (10 * 703) - (10 * 7) - (10 * 2) = 976

where:
8 * 1024 = total number of bytes in the page
96 = size of the page header
10 * 703 = number of records * combined size of the four columns in the table
10 * 7 = number of records * row overhead
10 * 2 = number of records * size in bytes of a row offset table entry
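The same calculation as a small Python function. It assumes every record on the page has the same fixed length and the 7-byte overhead of this example; page_free_space is a hypothetical helper name.

```python
PAGE_SIZE = 8 * 1024      # bytes in a page
HEADER_SIZE = 96          # page header
ROW_OVERHEAD = 7          # per-record overhead in this example
SLOT_ENTRY_SIZE = 2       # one row offset table entry per record


def page_free_space(record_count, column_bytes):
    """Free bytes on a data page holding record_count identical fixed-length records."""
    per_record = column_bytes + ROW_OVERHEAD + SLOT_ENTRY_SIZE
    return PAGE_SIZE - HEADER_SIZE - record_count * per_record


print(page_free_space(10, 703))  # 976
```

976 bytes is more than the 712 bytes one more Customer record needs (703 + 7 + 2), so this page still has room for an eleventh row.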


Now that we have seen the structure of the page, let us summarize. A page is 8 KB in size, that is, 8192 bytes. Of these, 96 bytes are used for the page header, which has a fixed size for all data pages. After the header, data records are stored in slots. The maximum length of a data record is 8060 bytes, and this 8060 includes the 7-byte row overhead. So a record can hold at most 8053 bytes of data. The following CREATE TABLE statement will therefore fail:

CREATE TABLE Maxsize
(
   id  CHAR(8000) NOT NULL,
   id1 CHAR(54)   NOT NULL
)

Msg 1701, Level 16, State 1, Line 1
Creating or altering table 'Maxsize' failed because the minimum row size would be 8061, including 7 bytes of internal overhead. This exceeds the maximum allowable table row size of 8060 bytes.

The remaining 36 bytes (8192 - 96 - 8060) are reserved for the slot array entry and for a possible forwarding-row back pointer (10 bytes). This does not mean that a page can hold only 18 (36/2) records. The slot array grows from the bottom of the page toward the top as records are added. If the records are small, more records fit in a page and the offset table takes up more space.
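The limits in this summary can be checked with a few lines of Python. This is a sketch. row_fits assumes fixed-length columns only, with the 7-byte overhead used above. max_rows_per_page is a space-only upper bound that ignores any other limits the engine applies, so treat it as an estimate. Both names are illustrative.

```python
PAGE_SIZE = 8192
HEADER_SIZE = 96
MAX_ROW_SIZE = 8060   # maximum record length, row overhead included
ROW_OVERHEAD = 7      # assumed: fixed-length columns only, at most 8 columns
SLOT_ENTRY_SIZE = 2

# Bytes of a page that a single maximum-size record can never use.
RESERVED = PAGE_SIZE - HEADER_SIZE - MAX_ROW_SIZE


def row_fits(fixed_column_sizes):
    """True if the minimum row size (data + overhead) is within the 8060-byte limit."""
    return sum(fixed_column_sizes) + ROW_OVERHEAD <= MAX_ROW_SIZE


def max_rows_per_page(record_size):
    """Space-only upper bound: each record also costs a 2-byte slot entry."""
    return (PAGE_SIZE - HEADER_SIZE) // (record_size + SLOT_ENTRY_SIZE)


print(RESERVED)                # 36
print(row_fits([8000, 54]))    # False: 8061 bytes, like the Maxsize table
print(row_fits([8000, 53]))    # True: exactly 8060 bytes
print(max_rows_per_page(710))  # 11
```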

SQL Server: Understanding GAM and SGAM Pages

We know that SQL Server stores data in 8 KB pages. An extent is made up of 8 physically contiguous pages. When we create a database, the data files are logically divided into pages and extents. Later, when user objects are created, pages are allocated to them to store data. GAM (Global Allocation Map) and SGAM (Shared Global Allocation Map) pages are used to track space allocation in SQL Server. In this post, let us discuss space allocation in SQL Server and how GAM and SGAM pages help with it.

In SQL Server there are two types of extents:

Uniform extent: an extent owned by a single user object. All 8 pages of the extent can be used only by the owning object.

Mixed extent: an extent shared by multiple user objects. Each of the eight pages in the extent can be owned by a different object, so a single mixed extent can serve up to 8 different objects.

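To optimize space allocation, SQL Server does not allocate pages from uniform extents to a table or index whose size is less than 8 pages. (This is the default behaviour up to SQL Server 2014. From SQL Server 2016 onward, user databases allocate uniform extents from the start unless the MIXED_PAGE_ALLOCATION database option is set to ON.) Let us try a sample:

USE Mydb
GO
CREATE TABLE TestSpaceAllocation
(
   Name CHAR(8000)
)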


GO
INSERT INTO TestSpaceAllocation VALUES('John')
GO 26 --Insert 26 records
DBCC IND('MyDb','TestSpaceAllocation',1)

For the usage of DBCC IND, refer to the earlier post.

The output looks like the following:


From the output, it is clear that the first 8 pages are not from a single extent. There is a gap between page 187 and page 211, and the following 8 pages are physically contiguous (their 8 page numbers are sequential). When looking at fragmentation levels in your environment, you might have noticed small tables with high fragmentation. This fragmentation does not go down even if you rebuild the index, because the first eight pages are allocated from mixed extents. Refer to the post Measuring Fragmentation to learn more about fragmentation.
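To spot this pattern in a DBCC IND page list, you can group the page IDs by extent. Because an extent is 8 contiguous pages starting at a page number divisible by 8, page p belongs to the extent that starts at p - p % 8. The Python helpers below are hypothetical, and the page list in the example is made up to mimic the shape described above; it is not taken from the actual output.

```python
from collections import defaultdict


def group_by_extent(page_ids):
    """Map each extent's first page number to the listed pages that fall inside it."""
    extents = defaultdict(list)
    for p in sorted(page_ids):
        extents[p - p % 8].append(p)
    return dict(extents)


def full_extents(page_ids):
    """Extents whose 8 pages all appear in the list, as in a fully used uniform extent."""
    return [start for start, pages in group_by_extent(page_ids).items() if len(pages) == 8]


# Hypothetical page list: 8 scattered pages followed by 8 sequential ones.
pages = [101, 103, 109, 115, 118, 121, 126, 130] + list(range(136, 144))
print(len(group_by_extent(pages)))  # 6
print(full_extents(pages))          # [136]
```

The eight scattered pages touch five different extents, while the sequential run fills exactly one.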

SQL Server allocates pages for new tables and indexes from mixed extents. Once a table grows beyond 8 pages, SQL Server allocates pages from uniform extents. When a table or index needs more space to accommodate new or modified data, SQL Server has to allocate a page to it. If the table or index has fewer than 8 pages, SQL Server has to locate a page in a mixed extent. Once it has 8 or more pages, SQL Server has to locate a page in a uniform extent. SQL Server uses two types of pages to optimize this allocation process.

GAM (Global Allocation Map): GAM pages record which extents have been allocated for any use. The GAM has one bit for every extent. If the bit is 1, the corresponding extent is free; if the bit is 0, the extent is in use as a uniform or mixed extent. A GAM page holds information about around 64,000 extents. That is, a GAM page covers (64000 x 8 x 8) / 1024 = 4000 MB approximately (extents x pages per extent x KB per page). In short, a data file of 7 GB will have two GAM pages.

SGAM (Shared Global Allocation Map): SGAM pages record which extents are currently being used as mixed extents and also have at least one unused page. The SGAM has one bit for every extent. If the bit is 1, the corresponding extent is used as a mixed extent and has at least one free page to allocate. If the bit is 0, the extent is either not used as a mixed extent, or it is a mixed extent whose pages are all in use. Like a GAM page, an SGAM page holds information about around 64,000 extents, that is, (64000 x 8 x 8) / 1024 = 4000 MB. In short, a data file of 7 GB will have two SGAM pages.
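The coverage arithmetic for both map types, as a quick Python check using the approximate figure of 64,000 extents per GAM or SGAM page from the text (the function names are illustrative):

```python
import math

EXTENTS_PER_MAP_PAGE = 64000  # approximate figure used in this post
PAGES_PER_EXTENT = 8
PAGE_SIZE_KB = 8


def mb_covered_per_map_page():
    return EXTENTS_PER_MAP_PAGE * PAGES_PER_EXTENT * PAGE_SIZE_KB / 1024


def map_pages_needed(file_size_mb):
    """GAM pages needed for a data file; the SGAM count is the same."""
    return math.ceil(file_size_mb / mb_covered_per_map_page())


print(mb_covered_per_map_page())    # 4000.0
print(map_pages_needed(7 * 1024))   # 2
```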


GAM and SGAM pages help the Database Engine with extent management. To allocate an extent, the Database Engine searches the GAM page for a 1 bit and sets it to 0. If the extent is being allocated as a mixed extent, it also sets the corresponding bit in the SGAM page to 1. If it is being allocated as a uniform extent, the corresponding SGAM bit does not need to change. To find a mixed extent with free pages, the Database Engine searches the SGAM page for a 1 bit. If there is no free extent, the data file is full. To deallocate an extent, the Database Engine sets the corresponding GAM bit to 1 and the SGAM bit to 0.
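The rules above can be simulated with two bit lists. This is a toy Python model, not engine code, and ExtentMaps and next_allocation_kind are hypothetical names. The per-extent page counter stands in for tracking that SQL Server actually does elsewhere, in PFS and IAM pages. next_allocation_kind encodes the pre-2016 default "first 8 pages from mixed extents" rule described earlier.

```python
class ExtentMaps:
    """Toy GAM/SGAM bitmaps with one bit per extent (a model, not engine code)."""

    def __init__(self, extent_count):
        self.gam = [1] * extent_count         # 1 = extent is free
        self.sgam = [0] * extent_count        # 1 = mixed extent with a free page
        self.used_pages = [0] * extent_count  # stand-in for PFS/IAM tracking

    def _take_free_extent(self):
        if 1 not in self.gam:                 # search the GAM for a 1 bit
            raise RuntimeError("no free extent: the data file is full")
        ext = self.gam.index(1)
        self.gam[ext] = 0
        return ext

    def allocate_uniform_extent(self):
        return self._take_free_extent()       # SGAM bit is left at 0

    def allocate_mixed_page(self):
        if 1 in self.sgam:                    # reuse a mixed extent with a free page
            ext = self.sgam.index(1)
        else:                                 # otherwise start a new mixed extent
            ext = self._take_free_extent()
            self.sgam[ext] = 1
        self.used_pages[ext] += 1
        if self.used_pages[ext] == 8:         # all 8 pages used: clear the SGAM bit
            self.sgam[ext] = 0
        return ext

    def deallocate_extent(self, ext):
        self.gam[ext], self.sgam[ext], self.used_pages[ext] = 1, 0, 0


def next_allocation_kind(pages_owned):
    """Pre-2016 default: an object's first 8 pages come from mixed extents."""
    return "mixed" if pages_owned < 8 else "uniform"


maps = ExtentMaps(extent_count=4)
print([maps.allocate_mixed_page() for _ in range(9)])  # [0, 0, 0, 0, 0, 0, 0, 0, 1]
print(maps.gam, maps.sgam)                             # [0, 0, 1, 1] [0, 1, 0, 0]
print(maps.allocate_uniform_extent())                  # 2
```

The ninth mixed-page request opens a new mixed extent because the first one has used all 8 pages and its SGAM bit was cleared.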

In any data file, the third page (page 2) is the GAM page and the fourth page (page 3) is the SGAM page. The first page (page 0) is the file header, and the second page (page 1) is the PFS (Page Free Space) page. We can see the GAM and SGAM pages using the DBCC PAGE command (refer to the earlier post for its usage):

DBCC TRACEON(3604)
GO
DBCC page('adventureworks2008',1,2,3)

The last part of the output is:


The first line says that all extents from the extent starting at page 0 to the extent starting at page 22400 are allocated. That means pages 0 to 22407 are part of allocated extents. The second line says that all extents from the extent starting at page 22408 to the extent starting at page 22416 are not allocated. That means pages 22408 to 22423 are part of extents that are not allocated. The third line says that the extent starting at page 22424 is allocated. That means pages 22424 to 22431 are part of an allocated extent. Let us run DBCC PAGE for one allocated page (22400) and one unallocated page (22408):

DBCC page('adventureworks2008',1,22400,1)
DBCC page('adventureworks2008',1,22408,1)

After the page header, the allocation status section shows the GAM page to which the page belongs and the status of the extent as ALLOCATED. For page 22408, it shows the same GAM page, but the status is NOT ALLOCATED.
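The range arithmetic in the output above can be written as two small Python helpers, both with hypothetical names. extent_range_to_pages turns a "first extent - last extent" line into a page range and an extent count. gam_page_for assumes an exact GAM interval of 63,904 extents (511,232 pages), which is the commonly cited precise value behind the "around 64000 extents" figure. Treat that constant as an assumption.

```python
PAGES_PER_EXTENT = 8
GAM_INTERVAL_PAGES = 511232   # assumed: 63,904 extents x 8 pages per GAM interval


def extent_range_to_pages(first_extent_start, last_extent_start):
    """Expand a 'first extent - last extent' range into (first page, last page, extents)."""
    last_page = last_extent_start + PAGES_PER_EXTENT - 1
    extent_count = (last_extent_start - first_extent_start) // PAGES_PER_EXTENT + 1
    return first_extent_start, last_page, extent_count


def gam_page_for(page_id):
    """GAM page covering page_id: page 2 in the first interval, otherwise the
    first page of the interval (under the assumed interval size)."""
    interval_start = page_id // GAM_INTERVAL_PAGES * GAM_INTERVAL_PAGES
    return 2 if interval_start == 0 else interval_start


print(extent_range_to_pages(0, 22400))      # (0, 22407, 2801)
print(extent_range_to_pages(22408, 22416))  # (22408, 22423, 2)
print(extent_range_to_pages(22424, 22424))  # (22424, 22431, 1)
print(gam_page_for(22400), gam_page_for(22408))  # 2 2
```

Pages 22400 and 22408 fall in the same GAM interval, which matches both DBCC PAGE outputs reporting the same GAM page.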

Let us look at the SGAM page:

DBCC page('adventureworks2008',1,3,3)

The last part of the output looks like the following.


The first line says that the extents from the extent starting at page 0 to the extent starting at page 11752 are NOT ALLOCATED in the SGAM. This means each of these extents is either not allocated at all, a uniform extent, or a mixed extent with no free pages. The second line says that the extent starting at page 11760 is a mixed extent with at least one free page.
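Reading the GAM and SGAM outputs together pins down an extent's state, since an SGAM bit of 0 on its own is ambiguous. Below is a minimal Python lookup of the bit combinations described in this post. extent_state is an illustrative name, and the GAM=1/SGAM=1 combination is treated as invalid because the rules above never produce it.

```python
EXTENT_STATES = {
    (1, 0): "free (not allocated)",
    (0, 0): "uniform extent, or mixed extent with no free pages",
    (0, 1): "mixed extent with at least one free page",
}


def extent_state(gam_bit, sgam_bit):
    """Interpret one extent's (GAM, SGAM) bit pair."""
    try:
        return EXTENT_STATES[(gam_bit, sgam_bit)]
    except KeyError:
        raise ValueError("GAM=1 with SGAM=1 is not produced by these rules") from None


print(extent_state(0, 1))  # mixed extent with at least one free page
```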