
SQL Server 2005 Physical Database Architecture

All the contents of this file are based on the information published in MSDN, TechNet and SQL Server 2005 Online Tutorials.

Introduction

SQL Server 2005 is a set of components that work together to meet the data storage and analysis needs of the largest Web sites and enterprise data processing systems. The following components work together to manage data effectively in SQL Server 2005.

Physical Database Architecture: Describes the logical components defined in SQL Server databases and how they are physically implemented in database files.

Relational Database Engine Architecture: Describes the features of the database engine that make it efficient at processing large numbers of concurrent requests for data from many users.

Physical Database Architecture

The physical database architecture covers the following areas:

Pages and Extents
Physical Database Files and Filegroups
Space Allocation and Reuse
Table and Index Architecture
Transaction Log Architecture

Pages and Extents

The fundamental unit of data storage in SQL Server is the page. The disk space allocated to a data file (.mdf or .ndf) in a database is logically divided into pages numbered contiguously from 0 to n. Disk I/O operations are performed at the page level; that is, SQL Server reads or writes whole data pages.

Extents are a collection of eight physically contiguous pages and are used to efficiently manage the pages. All pages are stored in extents.

Pages

In SQL Server, the page size is 8 KB. This means SQL Server databases have 128 pages per megabyte. Each page begins with a 96-byte header that is used to store system information about the page. This information includes the page number, the page type, the amount of free space on the page, and the allocation unit ID of the object that owns the page.
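As a rough sketch of how this header can be inspected, the undocumented and unsupported DBCC PAGE command dumps the header fields of an individual page (the database name MyDB and page 1:1 below are only placeholders):

DBCC TRACEON (3604);          -- send DBCC output to the current client session
DBCC PAGE ('MyDB', 1, 1, 0);  -- database, file ID 1, page 1, print option 0 = header only
GO

The header dump includes fields such as the page type and the free byte count described above. Because DBCC PAGE is undocumented, its output format can change between releases.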

The following table shows the page types used in the data files of a SQL Server database, each followed by its contents.

Data: Data rows with all data, except text, ntext, image, nvarchar(max), varchar(max), varbinary(max), and xml data, when text in row is set to ON.

Index: Index entries.

Text/Image: Large object data types: text, ntext, image, nvarchar(max), varchar(max), varbinary(max), and xml data. Variable length columns when the data row exceeds 8 KB: varchar, nvarchar, varbinary, and sql_variant.

Global Allocation Map, Shared Global Allocation Map: Information about whether extents are allocated.

Page Free Space: Information about page allocation and free space available on pages.

Index Allocation Map: Information about extents used by a table or index per allocation unit.

Bulk Changed Map: Information about extents modified by bulk operations since the last BACKUP LOG statement per allocation unit.

Differential Changed Map: Information about extents that have changed since the last BACKUP DATABASE statement per allocation unit.

Note: Log files do not contain pages; they contain a series of log records.

Data rows are put on the page serially, starting immediately after the header. A row offset table starts at the end of the page, and the row offset table contains one entry for each row on the page. Each entry records how far the first byte of the row is from the start of the page. The entries in the row offset table are in reverse sequence from the sequence of the rows on the page.

Large Row Support

Rows cannot span pages in SQL Server 2005; however, portions of a row may be moved off the row's page so that the row can actually be very large. The maximum amount of data and overhead that is contained in a single row on a page is 8,060 bytes. However, this does not include the data stored in the Text/Image page type. In SQL Server 2005, this restriction is relaxed for tables that contain varchar, nvarchar, varbinary, or sql_variant columns. When the total row size of all fixed and variable columns in a table exceeds the 8,060-byte limitation, SQL Server dynamically moves one or more variable length columns to pages in the ROW_OVERFLOW_DATA allocation unit, starting with the column with the largest width. This is done whenever an insert or update operation increases the total size of the row beyond the 8,060-byte limit. When a column is moved to a page in the ROW_OVERFLOW_DATA allocation unit, a 24-byte pointer to it is maintained on the original page in the IN_ROW_DATA allocation unit. If a subsequent operation reduces the row size, SQL Server dynamically moves the columns back to the original data page.
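To make the row-overflow behavior concrete, the following sketch (all object names are hypothetical) creates a table whose two varchar columns together exceed 8,060 bytes and then checks which allocation units hold pages for it:

CREATE TABLE dbo.RowOverflowDemo
  ( id int PRIMARY KEY,
    c1 varchar(5000),
    c2 varchar(5000) );

-- The combined row size (about 10,000 bytes) exceeds the 8,060-byte limit,
-- so one of the variable length columns is pushed off-row.
INSERT INTO dbo.RowOverflowDemo VALUES (1, REPLICATE('a', 5000), REPLICATE('b', 5000));

SELECT au.type_desc, au.total_pages
FROM sys.allocation_units AS au
    JOIN sys.partitions AS p ON au.container_id = p.partition_id
WHERE p.object_id = OBJECT_ID('dbo.RowOverflowDemo');

The result should list an IN_ROW_DATA unit and a ROW_OVERFLOW_DATA unit with at least one page allocated.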


Extents

Extents are the basic unit in which space is managed. An extent is eight physically contiguous pages, or 64 KB. This means SQL Server databases have 16 extents per megabyte.

To make its space allocation efficient, SQL Server does not allocate whole extents to tables with small amounts of data. SQL Server has two types of extents:

Uniform extents are owned by a single object; all eight pages in the extent can only be used by the owning object.

Mixed extents are shared by up to eight objects. Each of the eight pages in the extent can be owned by a different object.

A new table or index is generally allocated pages from mixed extents. When the table or index grows to the point that it has eight pages, it then switches to use uniform extents for subsequent allocations. If you create an index on an existing table that has enough rows to generate eight pages in the index, all allocations to the index are in uniform extents.
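The switch from mixed to uniform extents is not exposed directly by the documented catalog views, but its effect can be observed indirectly by watching the space reserved for a growing table. The following sketch (the table name and row count are illustrative only) fills a table past eight pages and reports its size:

CREATE TABLE dbo.ExtentDemo
  ( id int IDENTITY PRIMARY KEY,
    filler char(4000) NOT NULL DEFAULT 'x' );  -- two rows fit per 8 KB page

DECLARE @i int;
SET @i = 0;
WHILE @i < 100
BEGIN
    INSERT INTO dbo.ExtentDemo DEFAULT VALUES;  -- roughly 50 data pages after the loop
    SET @i = @i + 1;
END;

EXEC sp_spaceused 'dbo.ExtentDemo';

Per the description above, the first eight pages are typically allocated from mixed extents; once the table exceeds eight pages, subsequent allocations come from uniform extents, which is why the reserved space then tends to grow an extent (64 KB) at a time.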

Physical Database Files and Filegroups

SQL Server 2005 maps a database over a set of operating-system files. Data and log information are never mixed in the same file, and individual files are used only by one database. Filegroups are named collections of files and are used to help with data placement and administrative tasks such as backup and restore operations.

Database Files

SQL Server 2005 databases have three types of files:

Primary data files: The primary data file is the starting point of the database and points to the other files in the database. Every database has one primary data file. The recommended file name extension for primary data files is .mdf.

Secondary data files: Secondary data files make up all the data files other than the primary data file. Some databases may not have any secondary data files, while others have several. The recommended file name extension for secondary data files is .ndf.

Log files: Log files hold all the log information that is used to recover the database. There must be at least one log file for each database, although there can be more than one. The recommended file name extension for log files is .ldf.


SQL Server 2005 does not enforce the .mdf, .ndf, and .ldf file name extensions, but these extensions help you identify the different kinds of files and their use.

In SQL Server 2005, the locations of all the files in a database are recorded in the primary file of the database and in the master database. The Database Engine uses the file location information from the master database most of the time. However, the Database Engine uses the file location information from the primary file to initialize the file location entries in the master database in the following situations:

When attaching a database using the CREATE DATABASE statement with either the FOR ATTACH or FOR ATTACH_REBUILD_LOG options.

When upgrading from SQL Server version 2000 or version 7.0 to SQL Server 2005.

When restoring the master database.

Logical and Physical File Names

SQL Server 2005 files have two names:

logical_file_name: The logical_file_name is the name used to refer to the physical file in all Transact-SQL statements. The logical file name must comply with the rules for SQL Server identifiers and must be unique among the logical file names in the database.

os_file_name: The os_file_name is the name of the physical file, including the directory path. It must follow the rules for operating system file names.

The following illustration shows examples of the logical file names and the physical file names of a database created on a default instance of SQL Server 2005:
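Because the illustration is not reproduced here, the same information can be retrieved with a query. As a sketch (MyDB is the database created in the later example; any database works), sys.database_files lists both names for every file in the current database:

USE MyDB;
SELECT name AS logical_file_name,
    physical_name AS os_file_name,
    type_desc,
    size * 8 / 1024 AS size_mb   -- size is reported in 8 KB pages
FROM sys.database_files;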


SQL Server data and log files can be put on either FAT or NTFS file systems. NTFS is recommended because of its security features. Read/write data filegroups and log files cannot be placed on an NTFS compressed file system. Only read-only databases and read-only secondary filegroups can be put on an NTFS compressed file system.

When multiple instances of SQL Server are run on a single computer, each instance receives a different default directory to hold the files for the databases created in the instance.

Data File Pages

Pages in a SQL Server 2005 data file are numbered sequentially, starting with zero (0) for the first page in the file. Each file in a database has a unique file ID number. To uniquely identify a page in a database, both the file ID and the page number are required. The following example shows the page numbers in a database that has a 4-MB primary data file and a 1-MB secondary data file.


The first page in each file is a file header page that contains information about the attributes of the file. Several of the other pages at the start of the file also contain system information, such as allocation maps. One of the system pages stored in both the primary data file and the first log file is a database boot page that contains information about the attributes of the database.

File Size

SQL Server 2005 files can grow automatically from their originally specified size. When you define a file, you can specify a specific growth increment. Every time the file is filled, it increases its size by the growth increment. If there are multiple files in a filegroup, they will not autogrow until all the files are full. Growth then occurs in a round-robin fashion.

Each file can also have a maximum size specified. If a maximum size is not specified, the file can continue to grow until it has used all available space on the disk. This feature is especially useful when SQL Server is used as a database embedded in an application where the user does not have convenient access to a system administrator. The user can let the files autogrow as required to reduce the administrative burden of monitoring free space in the database and manually allocating additional space.
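A size, maximum size, and growth increment can be set when a file is created or changed later. The following sketch (the logical file name MyDB_Primary and the specific values are illustrative) adjusts these settings for an existing data file:

ALTER DATABASE MyDB
MODIFY FILE
  ( NAME = MyDB_Primary,
    SIZE = 20MB,          -- new current size; can only be increased
    MAXSIZE = 100MB,      -- cap on automatic growth
    FILEGROWTH = 10MB );  -- growth increment used when the file fills

Specifying MAXSIZE = UNLIMITED would instead let the file grow until the disk is full, as described above.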

Database Filegroups

Database objects and files can be grouped together in filegroups for allocation and administration purposes. There are two types of filegroups:

Primary: The primary filegroup contains the primary data file and any other files not specifically assigned to another filegroup. All pages for the system tables are allocated in the primary filegroup.

User-defined: User-defined filegroups are any filegroups that are specified by using the FILEGROUP keyword in a CREATE DATABASE or ALTER DATABASE statement.

Log files are never part of a filegroup. Log space is managed separately from data space.

No file can be a member of more than one filegroup. Tables, indexes, and large object data can be associated with a specified filegroup. In this case, all their pages will be allocated in that filegroup, or the tables and indexes can be partitioned. The data of partitioned tables and indexes is divided into units, each of which can be placed in a separate filegroup in a database. For more information about partitioned tables and indexes, see Partitioned Tables and Indexes.

One filegroup in each database is designated the default filegroup. When a table or index is created without specifying a filegroup, it is assumed all pages will be allocated from the default filegroup. Only one filegroup at a time can be the default filegroup. Members of the db_owner fixed database role can switch the default filegroup from one filegroup to another. If no default filegroup is specified, the primary filegroup is the default filegroup.

File and Filegroup Example

The following example creates a database on an instance of SQL Server. The database has a primary data file, a user-defined filegroup, and a log file. The primary data file is in the primary filegroup and the user-defined filegroup has two secondary data files. An ALTER DATABASE statement makes the user-defined filegroup the default. A table is then created specifying the user-defined filegroup.

USE master;
GO
-- Create the database with the default data
-- filegroup and a log file. Specify the
-- growth increment and the max size for the
-- primary data file.
CREATE DATABASE MyDB
ON PRIMARY
  ( NAME = 'MyDB_Primary',
    FILENAME = 'c:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\data\MyDB_Prm.mdf',
    SIZE = 4MB,
    MAXSIZE = 10MB,
    FILEGROWTH = 1MB),
FILEGROUP MyDB_FG1
  ( NAME = 'MyDB_FG1_Dat1',
    FILENAME = 'c:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\data\MyDB_FG1_1.ndf',
    SIZE = 1MB,
    MAXSIZE = 10MB,
    FILEGROWTH = 1MB),
  ( NAME = 'MyDB_FG1_Dat2',
    FILENAME = 'c:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\data\MyDB_FG1_2.ndf',
    SIZE = 1MB,
    MAXSIZE = 10MB,
    FILEGROWTH = 1MB)
LOG ON
  ( NAME = 'MyDB_log',
    FILENAME = 'c:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\data\MyDB.ldf',
    SIZE = 1MB,
    MAXSIZE = 10MB,
    FILEGROWTH = 1MB);
GO
ALTER DATABASE MyDB
MODIFY FILEGROUP MyDB_FG1 DEFAULT;
GO

-- Create a table in the user-defined filegroup.
USE MyDB;
CREATE TABLE MyTable
  ( cola int PRIMARY KEY,
    colb char(8) )
ON MyDB_FG1;
GO

The following illustration summarizes the results of the previous example.


Space Allocation and Reuse

SQL Server 2005 is effective at quickly allocating pages to objects and reusing space that is made available by deleted rows. These operations are internal to the system and use data structures that are not visible to users. However, these processes and structures are still occasionally referenced in SQL Server messages.

This section is an overview of the space allocation algorithms and the data structures. It also provides users and administrators with the knowledge they require to understand the references to the terms in the messages generated by SQL Server.

Managing Extent Allocations and Free Space

The SQL Server 2005 data structures that manage extent allocations and track free space have a relatively simple structure. This has the following benefits:

The free space information is densely packed, so relatively few pages contain this information. This increases speed by reducing the number of disk reads that are required to retrieve allocation information. It also increases the chance that the allocation pages will remain in memory and not require additional reads.

Most of the allocation information is not chained together. This simplifies the maintenance of the allocation information.

Each page allocation or deallocation can be performed quickly. This decreases the contention between concurrent tasks having to allocate or deallocate pages.


SQL Server uses two types of allocation maps to record the allocation of extents:

Global Allocation Map (GAM): GAM pages record which extents have been allocated. Each GAM covers 64,000 extents, or almost 4 GB of data. The GAM has one bit for each extent in the interval it covers. If the bit is 1, the extent is free; if the bit is 0, the extent is allocated.

Shared Global Allocation Map (SGAM): SGAM pages record which extents are currently being used as mixed extents and also have at least one unused page. Each SGAM covers 64,000 extents, or almost 4 GB of data. The SGAM has one bit for each extent in the interval it covers. If the bit is 1, the extent is being used as a mixed extent and has a free page. If the bit is 0, the extent is not being used as a mixed extent, or it is a mixed extent and all its pages are being used.

Each extent has the following bit patterns set in the GAM and SGAM, based on its current use.

Current use of extent                  GAM bit setting  SGAM bit setting
Free, not being used                   1                0
Uniform extent, or full mixed extent   0                0
Mixed extent with free pages           0                1

This results in simple extent management algorithms. To allocate a uniform extent, the Database Engine searches the GAM for a 1 bit and sets it to 0. To find a mixed extent with free pages, the Database Engine searches the SGAM for a 1 bit. To allocate a mixed extent, the Database Engine searches the GAM for a 1 bit, sets it to 0, and then also sets the corresponding bit in the SGAM to 1. To deallocate an extent, the Database Engine makes sure that the GAM bit is set to 1 and the SGAM bit is set to 0. The algorithms that are actually used internally by the Database Engine are more sophisticated than what is described in this topic, because the Database Engine distributes data evenly in a database. However, even the real algorithms are simplified by not having to manage chains of extent allocation information.

Tracking Free Space

Page Free Space (PFS) pages record the allocation status of each page, whether an individual page has been allocated, and the amount of free space on each page. The PFS has one byte for each page, recording whether the page is allocated, and if so, whether it is empty, 1 to 50 percent full, 51 to 80 percent full, 81 to 95 percent full, or 96 to 100 percent full.

After an extent has been allocated to an object, the Database Engine uses the PFS pages to record which pages in the extent are allocated or free. This information is used when the Database Engine has to allocate a new page. The amount of free space in a page is only maintained for heap and Text/Image pages. It is used when the Database Engine has to find a page with free space available to hold a newly inserted row. Indexes do not require that the page free space be tracked, because the point at which to insert a new row is set by the index key values.

A PFS page is the first page after the file header page in a data file (page number 1). This is followed by a GAM page (page number 2), and then an SGAM page (page number 3). There is another PFS page approximately 8,000 pages after the first PFS page, another GAM page 64,000 extents after the first GAM page, and another SGAM page 64,000 extents after the first SGAM page. The following illustration shows the sequence of pages used by the Database Engine to allocate and manage extents.


Managing Space Used by Objects

An Index Allocation Map (IAM) page maps the extents in a 4-GB part of a database file used by an allocation unit. An allocation unit is one of three types:

IN_ROW_DATA: Holds a partition of a heap or index.

LOB_DATA: Holds large object (LOB) data types, such as xml, varbinary(max), and varchar(max).

ROW_OVERFLOW_DATA: Holds variable length data stored in varchar, nvarchar, varbinary, or sql_variant columns that exceed the 8,060-byte row size limit.

Each partition of a heap or index contains at least an IN_ROW_DATA allocation unit. It may also contain a LOB_DATA or ROW_OVERFLOW_DATA allocation unit, depending on the heap or index schema. For more information about allocation units, see Table and Index Organization.

An IAM page covers a 4-GB range in a file and is the same coverage as a GAM or SGAM page. If the allocation unit contains extents from more than one file, or more than one 4-GB range of a file, there will be multiple IAM pages linked in an IAM chain. Therefore, each allocation unit has at least one IAM page for each file on which it has extents. There may also be more than one IAM page on a file, if the range of the extents on the file allocated to the allocation unit exceeds the range that a single IAM page can record.

IAM pages are allocated as required for each allocation unit and are located randomly in the file. The system view, sys.system_internals_allocation_units, points to the first IAM page for an allocation unit. All the IAM pages for that allocation unit are linked in a chain.

The sys.system_internals_allocation_units system view is for internal use only and is subject to change. Compatibility is not guaranteed.
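As a sketch of how these anchors can be viewed (using the AdventureWorks sample database, as in the later example), the following query lists the first IAM page and root page recorded for each allocation unit of a table:

USE AdventureWorks;
SELECT o.name AS table_name,
    p.index_id,
    au.type_desc AS allocation_type,
    au.first_iam_page,   -- 6-byte page pointer to the first IAM page in the chain
    au.root_page         -- 6-byte page pointer to the root of the B-tree (for indexes)
FROM sys.system_internals_allocation_units AS au
    JOIN sys.partitions AS p ON au.container_id = p.partition_id
    JOIN sys.objects AS o ON p.object_id = o.object_id
WHERE o.name = N'DatabaseLog';

The page pointers are returned as binary(6) values that encode a file ID and page number; as noted above, this view is undocumented and may change.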


An IAM page has a header that indicates the starting extent of the range of extents mapped by the IAM page. The IAM page also has a large bitmap in which each bit represents one extent. The first bit in the map represents the first extent in the range, the second bit represents the second extent, and so on. If a bit is 0, the extent it represents is not allocated to the allocation unit owning the IAM. If the bit is 1, the extent it represents is allocated to the allocation unit owning the IAM page.

When the Database Engine has to insert a new row and no space is available in the current page, it uses the IAM and PFS pages to find a page to allocate, or, for a heap or a Text/Image page, a page with sufficient space to hold the row. The Database Engine uses the IAM pages to find the extents allocated to the allocation unit. For each extent, the Database Engine searches the PFS pages to see if there is a page that can be used. Each IAM and PFS page covers a large number of data pages, so there are few IAM and PFS pages in a database. This means that the IAM and PFS pages are generally in memory in the SQL Server buffer pool, so they can be searched quickly. For indexes, the insertion point of a new row is set by the index key. In this case, the search process previously described does not occur.

The Database Engine allocates a new extent to an allocation unit only when it cannot quickly find a page in an existing extent with sufficient space to hold the row being inserted. The Database Engine allocates extents from those available in the filegroup using a proportional fill algorithm. If a filegroup has two files and one has twice the free space of the other, two pages will be allocated from the file with more free space for every one page allocated from the other file. This means that every file in a filegroup should have a similar percentage of space used.

Tracking Modified Extents

SQL Server 2005 uses two internal data structures to track extents modified by bulk copy operations and extents modified since the last full backup. These data structures greatly speed up differential backups. They also speed up the logging of bulk copy operations when a database is using the bulk-logged recovery model. Like the Global Allocation Map (GAM) and Shared Global Allocation Map (SGAM) pages, these structures are bitmaps in which each bit represents a single extent.

Differential Changed Map (DCM): This tracks the extents that have changed since the last BACKUP DATABASE statement. If the bit for an extent is 1, the extent has been modified since the last BACKUP DATABASE statement. If the bit is 0, the extent has not been modified.

Differential backups read just the DCM pages to determine which extents have been modified. This greatly reduces the number of pages that a differential backup must scan. The length of time that a differential backup runs is proportional to the number of extents modified since the last BACKUP DATABASE statement and not the overall size of the database.
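As a brief sketch of the statements involved (the database name and file paths are placeholders), a differential backup is simply a BACKUP DATABASE statement with the DIFFERENTIAL option, and it relies on the DCM pages described above:

-- Full backup: resets the DCM tracking.
BACKUP DATABASE MyDB TO DISK = 'C:\Backups\MyDB_full.bak';

-- Later, a differential backup copies only the extents whose DCM bit is 1.
BACKUP DATABASE MyDB TO DISK = 'C:\Backups\MyDB_diff.bak' WITH DIFFERENTIAL;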

Bulk Changed Map (BCM): This tracks the extents that have been modified by bulk logged operations since the last BACKUP LOG statement. If the bit for an extent is 1, the extent has been modified by a bulk logged operation after the last BACKUP LOG statement. If the bit is 0, the extent has not been modified by bulk logged operations.

Although BCM pages appear in all databases, they are only relevant when the database is using the bulk-logged recovery model. In this recovery model, when a BACKUP LOG is performed, the backup process scans the BCMs for extents that have been modified. It then includes those extents in the log backup.

This lets the bulk logged operations be recovered if the database is restored from a database backup and a sequence of transaction log backups. BCM pages are not relevant in a database that is using the simple recovery model, because no bulk logged operations are logged. They are not relevant in a database that is using the full recovery model, because that recovery model treats bulk logged operations as fully logged operations.
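The following sketch (object names and paths are placeholders) shows the scenario in which BCM pages matter: a bulk operation under the bulk-logged recovery model followed by a log backup that picks up the modified extents:

ALTER DATABASE MyDB SET RECOVERY BULK_LOGGED;
GO
USE MyDB;
-- A minimally logged bulk operation; SELECT ... INTO is one example.
SELECT * INTO dbo.BigCopy FROM dbo.SourceTable;
GO
-- The log backup scans the BCM pages and includes the extents changed by the bulk operation.
BACKUP LOG MyDB TO DISK = 'C:\Backups\MyDB_log.trn';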

The interval between DCM pages and BCM pages is the same as the interval between GAM and SGAM pages: 64,000 extents. The DCM and BCM pages are located after the GAM and SGAM pages in a physical file.

Table and Index Architecture

Objects in a SQL Server 2005 database are stored as a collection of 8-KB pages. This section describes the way the pages for tables and indexes are organized, stored, and accessed.

Table and Index Organization

The following illustration shows the organization of a table. A table is contained in one or more partitions and each partition contains data rows in either a heap or a clustered index structure. The pages of the heap or clustered index are managed in one or more allocation units, depending on the column types in the data rows.


Partitions

In SQL Server 2005, table and index pages are contained in one or more partitions. A partition is a user-defined unit of data organization. By default, a table or index has only one partition that contains all the table or index pages. The partition resides in a single filegroup. A table or index with a single partition is equivalent to the organizational structure of tables and indexes in earlier versions of SQL Server.

When a table or index uses multiple partitions, the data is partitioned horizontally so that groups of rows are mapped into individual partitions, based on a specified column. The partitions can be put on one or more filegroups in the database. The table or index is treated as a single logical entity when queries or updates are performed on the data.

To view the partitions used by a table or index, use the sys.partitions (Transact-SQL) catalog view.
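As a sketch of a table that uses multiple partitions (the function, scheme, and table names are illustrative), the following statements partition rows on an int column and then inspect the result through sys.partitions:

-- Rows with col1 < 100, 100-199, and >= 200 go to three separate partitions.
CREATE PARTITION FUNCTION pf_DemoRange (int)
AS RANGE RIGHT FOR VALUES (100, 200);

-- All partitions are placed on the PRIMARY filegroup here; each could use its own filegroup.
CREATE PARTITION SCHEME ps_DemoRange
AS PARTITION pf_DemoRange ALL TO ([PRIMARY]);

CREATE TABLE dbo.PartitionDemo
  ( col1 int NOT NULL,
    col2 char(8) )
ON ps_DemoRange (col1);

SELECT partition_number, rows
FROM sys.partitions
WHERE object_id = OBJECT_ID('dbo.PartitionDemo');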

Clustered Tables, Heaps, and Indexes

SQL Server 2005 tables use one of two methods to organize their data pages within a partition:

Clustered tables are tables that have a clustered index. The data rows are stored in order based on the clustered index key. The clustered index is implemented as a B-tree index structure that supports fast retrieval of the rows, based on their clustered index key values. The pages in each level of the index, including the data pages in the leaf level, are linked in a doubly-linked list. However, navigation from one level to another is performed by using key values. For more information, see Clustered Index Structures.

Heaps are tables that have no clustered index. The data rows are not stored in any particular order, and there is no particular order to the sequence of the data pages. The data pages are not linked in a linked list.

Indexed views have the same storage structure as clustered tables.


When a heap or a clustered table has multiple partitions, each partition has a heap or B-tree structure that contains the group of rows for that specific partition. For example, if a clustered table has four partitions, there are four B-trees; one in each partition.

Nonclustered Indexes

Nonclustered indexes have a B-tree index structure similar to the one in clustered indexes. The difference is that nonclustered indexes do not affect the order of the data rows. The leaf level contains index rows. Each index row contains the nonclustered key value, a row locator, and any included (nonkey) columns. The locator points to the data row that has the key value.

XML Indexes

One primary and several secondary XML indexes can be created on each xml column in a table. An XML index is a shredded and persisted representation of the XML binary large objects (BLOBs) in the xml data type column. XML indexes are stored as internal tables. To view information about XML indexes, use the sys.xml_indexes or sys.internal_tables catalog views.
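A short sketch (table and index names are illustrative) of creating a primary XML index and one secondary XML index, then listing them:

CREATE TABLE dbo.XmlDemo
  ( id int PRIMARY KEY,   -- a clustered primary key is required before a primary XML index can be created
    doc xml );

CREATE PRIMARY XML INDEX PXML_XmlDemo_doc
ON dbo.XmlDemo (doc);

-- A secondary XML index is built over the primary XML index.
CREATE XML INDEX SXML_XmlDemo_doc_path
ON dbo.XmlDemo (doc)
USING XML INDEX PXML_XmlDemo_doc FOR PATH;

SELECT name, secondary_type_desc
FROM sys.xml_indexes
WHERE object_id = OBJECT_ID('dbo.XmlDemo');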

Allocation Units

An allocation unit is a collection of pages within a heap or B-tree used to manage data based on their page type. The following list shows the types of allocation units used to manage data in tables and indexes.

IN_ROW_DATA: Used to manage data or index rows that contain all data, except large object (LOB) data. Pages are of type Data or Index.

LOB_DATA: Used to manage large object data stored in one or more of these data types: text, ntext, image, xml, varchar(max), nvarchar(max), varbinary(max), or CLR user-defined types (CLR UDT). Pages are of type Text/Image.

ROW_OVERFLOW_DATA: Used to manage variable length data stored in varchar, nvarchar, varbinary, or sql_variant columns that exceed the 8,060-byte row size limit. Pages are of type Data.

A heap or B-tree can have only one allocation unit of each type in a specific partition. To view the table or index allocation unit information, use the sys.allocation_units catalog view.

IN_ROW_DATA Allocation Unit

For every partition used by a table (heap or clustered table), index, or indexed view, there is one IN_ROW_DATA allocation unit that is made up of a collection of data pages. This allocation unit also contains additional collections of pages to implement each nonclustered and XML index defined for the table or view. The page collections in each partition of a table, index, or indexed view are anchored by page pointers in the sys.system_internals_allocation_units system view.

The sys.system_internals_allocation_units system view is for internal use only and is subject to change. Compatibility is not guaranteed.

Each table, index, and indexed view partition has a row in sys.system_internals_allocation_units uniquely identified by a container ID (container_id). The container ID has a one-to-one mapping to the partition_id in the sys.partitions catalog view, which maintains the relationship between the table, index, or indexed view data stored in a partition and the allocation units used to manage the data within the partition.

The allocation of pages to a table, index, or an indexed view partition is managed by a chain of IAM pages. The column first_iam_page in sys.system_internals_allocation_units points to the first IAM page in the chain of IAM pages managing the space allocated to the table, index, or the indexed view in the IN_ROW_DATA allocation unit.

sys.partitions returns a row for each partition in a table or index.

A heap has a row in sys.partitions with index_id = 0. The first_iam_page column in sys.system_internals_allocation_units points to the IAM chain for the collection of heap data pages in the specified partition. The server uses the IAM pages to find the pages in the data page collection, because they are not linked.

A clustered index on a table or a view has a row in sys.partitions with index_id = 1. The root_page column in sys.system_internals_allocation_units points to the top of the clustered index B-tree in the specified partition. The server uses the index B-tree to find the data pages in the partition.

Each nonclustered index created for a table or a view has a row in sys.partitions with index_id > 1. The root_page column in sys.system_internals_allocation_units points to the top of the nonclustered index B-tree in the specified partition.

Each table that has at least one LOB column also has a row in sys.partitions with index_id > 250.

The first_iam_page column points to the chain of IAM pages that manage the pages in the LOB_DATA allocation unit.

ROW_OVERFLOW_DATA Allocation Unit

For every partition used by a table (heap or clustered table), index, or indexed view, there is one ROW_OVERFLOW_DATA allocation unit. This allocation unit contains zero (0) pages until a data row with variable length columns (varchar, nvarchar, varbinary, or sql_variant) in the IN_ROW_DATA allocation unit exceeds the 8,060-byte row size limit. When the size limitation is reached, SQL Server moves the column with the largest width from that row to a page in the ROW_OVERFLOW_DATA allocation unit. A 24-byte pointer to this off-row data is maintained on the original page.

Text/Image pages in the ROW_OVERFLOW_DATA allocation unit are managed in the same way pages in the LOB_DATA allocation unit are managed. That is, the Text/Image pages are managed by a chain of IAM pages.

LOB_DATA Allocation Unit

When a table or index has one or more LOB data types, one LOB_DATA allocation unit per partition is allocated to manage the storage of that data. The LOB data types include text, ntext, image, xml, varchar(max), nvarchar(max), varbinary(max), and CLR user-defined types.

Partition and Allocation Unit Example


The following example returns partition and allocation unit data for two tables: DatabaseLog, a heap with LOB data and no nonclustered indexes, and Currency, a clustered table with no LOB data and one nonclustered index. Both tables have a single partition.

USE AdventureWorks;
GO
SELECT o.name AS table_name,
    p.index_id,
    i.name AS index_name,
    au.type_desc AS allocation_type,
    au.data_pages,
    partition_number
FROM sys.allocation_units AS au
    JOIN sys.partitions AS p ON au.container_id = p.partition_id
    JOIN sys.objects AS o ON p.object_id = o.object_id
    JOIN sys.indexes AS i ON p.index_id = i.index_id AND i.object_id = p.object_id
WHERE o.name = N'DatabaseLog' OR o.name = N'Currency'
ORDER BY o.name, p.index_id;

Here is the result set. Notice that the DatabaseLog table uses all three allocation unit types, because it contains both data and Text/Image page types. The Currency table does not have LOB data, but does have the allocation unit required to manage data pages. If the Currency table is later modified to include a LOB data type column, a LOB_DATA allocation unit is created to manage that data.

table_name   index_id  index_name                allocation_type    data_pages  partition_number
-----------  --------  ------------------------  -----------------  ----------  ----------------
Currency     1         PK_Currency_CurrencyCode  IN_ROW_DATA        1           1
Currency     3         AK_Currency_Name          IN_ROW_DATA        1           1
DatabaseLog  0         NULL                      IN_ROW_DATA        160         1
DatabaseLog  0         NULL                      ROW_OVERFLOW_DATA  0           1
DatabaseLog  0         NULL                      LOB_DATA           49          1

(5 row(s) affected)

Heap Structures

A heap is a table without a clustered index. Heaps have one row in sys.partitions, with index_id = 0, for each partition used by the heap. By default, a heap has a single partition. When a heap has multiple partitions, each partition has a heap structure that contains the data for that specific partition. For example, if a heap has four partitions, there are four heap structures; one in each partition.

Depending on the data types in the heap, each heap structure will have one or more allocation units to store and manage the data for a specific partition. At a minimum, each heap will have one IN_ROW_DATA allocation unit per partition. The heap will also have one LOB_DATA allocation unit per partition, if it contains large object (LOB) columns. It will also have one ROW_OVERFLOW_DATA allocation unit per partition, if it contains variable length columns that exceed the 8,060 byte row size limit.

The column first_iam_page in the sys.system_internals_allocation_units system view points to the first IAM page in the chain of IAM pages that manage the space allocated to the heap in a specific partition. SQL Server 2005 uses the IAM pages to move through the heap. The data pages and the rows within them are not in any specific order and are not linked. The only logical connection between data pages is the information recorded in the IAM pages.

The sys.system_internals_allocation_units system view is for internal use only and is subject to change. Compatibility is not guaranteed.

Table scans or serial reads of a heap can be performed by scanning the IAM pages to find the extents that are holding pages for the heap. Because the IAM represents extents in the same order that they exist in the data files, serial heap scans progress sequentially through each file. Using the IAM pages to set the scan sequence also means that rows from the heap are not typically returned in the order in which they were inserted.

The following illustration shows how the SQL Server Database Engine uses IAM pages to retrieve data rows in a single partition heap.

Clustered Index Structures

In SQL Server, indexes are organized as B-trees. Each page in an index B-tree is called an index node. The top node of the B-tree is called the root node. The bottom level of nodes in the index is called the leaf nodes. Any index levels between the root and the leaf nodes are collectively known as intermediate levels. In a clustered index, the leaf nodes contain the data pages of the underlying table. The root and intermediate level nodes contain index pages holding index rows. Each index row contains a key value and a pointer to either an intermediate level page in the B-tree, or a data row in the leaf level of the index. The pages in each level of the index are linked in a doubly-linked list.

Clustered indexes have one row in sys.partitions, with index_id = 1 for each partition used by the index. By default, a clustered index has a single partition. When a clustered index has multiple partitions, each partition has a B-tree structure that contains the data for that specific partition. For example, if a clustered index has four partitions, there are four B-tree structures; one in each partition.

Depending on the data types in the clustered index, each clustered index structure will have one or more allocation units in which to store and manage the data for a specific partition. At a minimum, each clustered index will have one IN_ROW_DATA allocation unit per partition. The clustered index will also have one LOB_DATA allocation unit per partition if it contains large object (LOB) columns. It will also have one ROW_OVERFLOW_DATA allocation unit per partition if it contains variable length columns that exceed the 8,060 byte row size limit. For more information about allocation units, see Table and Index Organization.

The pages in the data chain and the rows in them are ordered on the value of the clustered index key. All inserts are made at the point where the key value in the inserted row fits in the ordering sequence among existing rows. The page collections for the B-tree are anchored by page pointers in the sys.system_internals_allocation_units system view.

The sys.system_internals_allocation_units system view is for internal use only and is subject to change. Compatibility is not guaranteed.

For a clustered index, the root_page column in sys.system_internals_allocation_units points to the top of the clustered index for a specific partition. SQL Server moves down the index to find the row corresponding to a clustered index key. To find a range of keys, SQL Server moves through the index to find the starting key value in the range and then scans through the data pages using the previous or next pointers. To find the first page in the chain of data pages, SQL Server follows the leftmost pointers from the root node of the index.

This illustration shows the structure of a clustered index in a single partition.

Nonclustered Index Structures

Nonclustered indexes have the same B-tree structure as clustered indexes, except for the following significant differences:

The data rows of the underlying table are not sorted and stored in order based on their nonclustered keys.


The leaf layer of a nonclustered index is made up of index pages instead of data pages.

Nonclustered indexes can be defined on a table or view with a clustered index or a heap. Each index row in the nonclustered index contains the nonclustered key value and a row locator. This locator points to the data row in the clustered index or heap having the key value.

The row locators in nonclustered index rows are either a pointer to a row or are a clustered index key for a row, as described in the following:

If the table is a heap, which means it does not have a clustered index, the row locator is a pointer to the row. The pointer is built from the file identifier (ID), page number, and number of the row on the page. The whole pointer is known as a Row ID (RID).

If the table has a clustered index, or the index is on an indexed view, the row locator is the clustered index key for the row. If the clustered index is not a unique index, SQL Server 2005 makes any duplicate keys unique by adding an internally generated value called a uniqueifier. This four-byte value is not visible to users. It is only added when required to make the clustered key unique for use in nonclustered indexes. SQL Server retrieves the data row by searching the clustered index using the clustered index key stored in the leaf row of the nonclustered index.

Nonclustered indexes have one row in sys.partitions with index_id > 1 for each partition used by the index. By default, a nonclustered index has a single partition. When a nonclustered index has multiple partitions, each partition has a B-tree structure that contains the index rows for that specific partition. For example, if a nonclustered index has four partitions, there are four B-tree structures, with one in each partition.

Depending on the data types in the nonclustered index, each nonclustered index structure will have one or more allocation units in which to store and manage the data for a specific partition. At a minimum, each nonclustered index will have one IN_ROW_DATA allocation unit per partition that stores the index B-tree pages. The nonclustered index will also have one LOB_DATA allocation unit per partition if it contains large object (LOB) columns. Additionally, it will have one ROW_OVERFLOW_DATA allocation unit per partition if it contains variable length columns that exceed the 8,060-byte row size limit. For more information about allocation units, see Table and Index Organization. The page collections for the B-tree are anchored by root_page pointers in the sys.system_internals_allocation_units system view.

The sys.system_internals_allocation_units system view is for internal use only and is subject to change. Compatibility is not guaranteed.


Included Column Indexes

In SQL Server 2005, the functionality of nonclustered indexes can be extended by adding included columns, called nonkey columns, to the leaf level of the index. While the key columns are stored at all levels of the nonclustered index, nonkey columns are stored only at the leaf level.
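As a short sketch (table and column names are hypothetical), an included-column index keeps col2 and col3 only in the leaf level while col1 remains the key at every level:

CREATE NONCLUSTERED INDEX IX_OrdersDemo_col1
ON dbo.OrdersDemo (col1)
INCLUDE (col2, col3);

A query that filters on col1 and returns only col1, col2, and col3 can then be answered entirely from the index leaf level, without reading the underlying data pages.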

Transaction Log Architecture

Every SQL Server 2005 database has a transaction log that records all the transactions and the database modifications made by each transaction. The transaction log is a critical component of any database. This section contains the architectural information required to understand how the transaction log is used to guarantee the data integrity of the database and how it is used for data recovery.

Transaction Log Fundamentals

Every SQL Server 2005 database has a transaction log that records all transactions and the database modifications made by each transaction. The transaction log is a critical component of the database and, in the case of a system failure, can be the only source of recent data. It should never be deleted or moved unless the consequences of doing that are fully understood.

Operations Supported by the Transaction Log

The transaction log supports the following operations:


Recovery of individual transactions: If an application issues a ROLLBACK statement, or if the Database Engine detects an error such as the loss of communication with a client, the log records are used to roll back the modifications made by an incomplete transaction.

Recovery of all incomplete transactions when SQL Server is started: If a server that is running SQL Server fails, the databases may be left in a state where some modifications were never written from the buffer cache to the data files, and there may be some modifications from incomplete transactions in the data files. When an instance of SQL Server is started, it runs a recovery of each database. Every modification recorded in the log that may not have been written to the data files is rolled forward. Every incomplete transaction found in the transaction log is then rolled back to make sure the integrity of the database is preserved.

Rolling a restored database, file, filegroup, or page forward to the point of failure: After a hardware loss or disk failure affecting the database files, you can restore the database to the point of failure. You first restore the last full backup and the last full differential backup, and then restore the subsequent sequence of transaction log backups to the point of failure. As you restore each log backup, the Database Engine reapplies all the modifications recorded in the log to roll forward all the transactions. When the last log backup is restored, the Database Engine then uses the log information to roll back all transactions that were not complete at that point. (A restore sequence of this kind is sketched after this list.)

Supporting transactional replication: The Log Reader Agent monitors the transaction log of each database configured for transactional replication and copies the transactions marked for replication from the transaction log into the distribution database. For more information, see How Transactional Replication Works.

Supporting standby server solutions: The standby-server solutions, database mirroring and log shipping, rely heavily on the transaction log. In a log shipping scenario, the primary server sends the active transaction log of the primary database to one or more destinations. Each secondary server restores the log to its local secondary database. For more information, see Understanding Log Shipping.

In a database mirroring scenario, every update to a database, the principal database, is immediately reproduced in a separate, full copy of the database, the mirror database. The principal server instance sends each log record immediately to the mirror server instance which applies the incoming log records to the mirror database, continually rolling it forward. For more information, see Overview of Database Mirroring.
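The following sketch shows the restore-to-point-of-failure sequence referred to above (the database name, backup file paths, and the presence of a differential backup are all placeholder assumptions):

-- Back up the tail of the log so the most recent transactions are not lost.
BACKUP LOG MyDB TO DISK = 'C:\Backups\MyDB_tail.trn' WITH NO_TRUNCATE;

-- Restore the last full backup, leaving the database ready for more restores.
RESTORE DATABASE MyDB FROM DISK = 'C:\Backups\MyDB_full.bak' WITH NORECOVERY;

-- Restore the last differential backup, if one exists.
RESTORE DATABASE MyDB FROM DISK = 'C:\Backups\MyDB_diff.bak' WITH NORECOVERY;

-- Restore the subsequent log backups in sequence, recovering on the last one.
RESTORE LOG MyDB FROM DISK = 'C:\Backups\MyDB_log1.trn' WITH NORECOVERY;
RESTORE LOG MyDB FROM DISK = 'C:\Backups\MyDB_tail.trn' WITH RECOVERY;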

Transaction Log Characteristics

Following are the characteristics of the SQL Server Database Engine transaction log:

The transaction log is implemented as a separate file or set of files in the database. The log cache is managed separately from the buffer cache for data pages, which results in simple, fast, and robust code within the Database Engine.

The format of log records and pages is not constrained to follow the format of data pages.

The transaction log can be implemented in several files. The files can be defined to expand automatically by setting the FILEGROWTH value for the log. This reduces the potential of running out of space in the transaction log, while at the same time reducing administrative overhead. For more information, see ALTER DATABASE (Transact-SQL).


The mechanism to reuse the space within the log files is quick and has minimal effect on transaction throughput.

Transaction Log Logical Architecture

The SQL Server 2005 transaction log operates logically as if the transaction log is a string of log records. Each log record is identified by a log sequence number (LSN). Each new log record is written to the logical end of the log with an LSN that is higher than the LSN of the record before it.

Log records are stored in a serial sequence as they are created. Each log record contains the ID of the transaction that it belongs to. For each transaction, all log records associated with the transaction are individually linked in a chain using backward pointers that speed the rollback of the transaction.

Log records for data modifications record either the logical operation performed or they record the before and after images of the modified data. The before image is a copy of the data before the operation is performed; the after image is a copy of the data after the operation has been performed. The steps to recover an operation depend on the type of log record:

Logical operation logged: To roll the logical operation forward, the operation is performed again. To roll the logical operation back, the reverse logical operation is performed.

Before and after image logged: To roll the operation forward, the after image is applied. To roll the operation back, the before image is applied.

Many types of operations are recorded in the transaction log. These operations include:

The start and end of each transaction.

Every data modification (insert, update, or delete). This includes changes to system tables made by system stored procedures or data definition language (DDL) statements.

Every extent and page allocation or deallocation.

Creating or dropping a table or index.

Rollback operations are also logged. Each transaction reserves space on the transaction log to make sure that enough log space exists to support a rollback that is caused by either an explicit ROLLBACK statement or an error. The amount of space reserved depends on the operations performed in the transaction, but generally it is equal to the amount of space used to log each operation. This reserved space is freed when the transaction is completed.

The section of the log file from the first log record that must be present for a successful database-wide rollback to the last-written log record is called the active part of the log, or the active log. This is the section of the log required to do a full recovery of the database. No part of the active log can ever be truncated.

Transaction Log Physical Architecture


The transaction log in a database maps over one or more physical files. Conceptually, the log file is a string of log records. Physically, the sequence of log records is stored efficiently in the set of physical files that implement the transaction log.

The SQL Server Database Engine divides each physical log file internally into a number of virtual log files. Virtual log files have no fixed size, and there is no fixed number of virtual log files for a physical log file. The Database Engine chooses the size of the virtual log files dynamically while it is creating or extending log files. The Database Engine tries to maintain a small number of virtual files. The size of the virtual files after a log file has been extended is the sum of the size of the existing log and the size of the new file increment. The size or number of virtual log files cannot be configured or set by administrators.

The only time virtual log files affect system performance is if the log files are defined by small size and growth_increment values. If these log files grow to a large size because of many small increments, they will have lots of virtual log files. This can slow down database startup and also log backup and restore operations. We recommend that you assign log files a size value close to the final size required, and also have a relatively large growth_increment value.
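A minimal sketch of that recommendation (the logical name MyDB_log and the sizes are placeholders): pre-size the log close to its expected final size and give it a large growth increment so it is built from relatively few virtual log files.

ALTER DATABASE MyDB
MODIFY FILE
  ( NAME = MyDB_log,
    SIZE = 500MB,          -- close to the expected final size
    FILEGROWTH = 100MB );  -- relatively large growth increment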

The transaction log is a wrap-around file. For example, consider a database with one physical log file divided into four virtual log files. When the database is created, the logical log file begins at the start of the physical log file. New log records are added at the end of the logical log and expand toward the end of the physical log. Log records in the virtual logs that appear in front of the minimum recovery log sequence number (MinLSN) are deleted, as truncation operations occur. The transaction log in the example database would look similar to the one in the following illustration.

When the end of the logical log reaches the end of the physical log file, the new log records wrap around to the start of the physical log file.

This cycle repeats endlessly, as long as the end of the logical log never reaches the beginning of the logical log. If the old log records are truncated frequently enough to always leave sufficient room for all the new log records created through the next checkpoint, the log never fills. However, if the end of the logical log does reach the start of the logical log, one of two things occurs:


If the FILEGROWTH setting is enabled for the log and space is available on the disk, the file is extended by the amount specified in growth_increment and the new log records are added to the extension. For more information about the FILEGROWTH setting, see ALTER DATABASE (Transact-SQL).

If the FILEGROWTH setting is not enabled, or the disk that is holding the log file has less free space than the amount specified in growth_increment, a 9002 error is generated.

If the log contains multiple physical log files, the logical log will move through all the physical log files before it wraps back to the start of the first physical log file.

Write-Ahead Transaction Log

SQL Server 2005 uses a write-ahead log (WAL). A write-ahead log guarantees that no data modifications are written to disk before the associated log record is written to disk. This maintains the ACID properties for a transaction.

To understand how the write-ahead log works, it is important for you to know how modified data is written to disk. SQL Server maintains a buffer cache into which it reads data pages when data must be retrieved. Data modifications are not made directly to disk, but are made to the copy of the page in the buffer cache. The modification is not written to disk until a checkpoint occurs in the database, or the modification must be written to disk so the buffer can be used to hold a new page. Writing a modified data page from the buffer cache to disk is called flushing the page. A page modified in the cache, but not yet written to disk is called a dirty page.

At the time a modification is made to a page in the buffer, a log record is built in the log cache that records the modification. This log record must be written to disk before the associated dirty page is flushed from the buffer cache to disk. If the dirty page is flushed before the log record is written, the dirty page will create a modification on the disk that cannot be rolled back if the server fails before the log record is written to disk. SQL Server has logic that prevents a dirty page from being flushed before the associated log record is written. Log records are written to disk when the transactions are committed.

Checkpoints and the Active Portion of the Log

Checkpoints flush dirty data pages from the buffer cache of the current database to disk. This minimizes the active portion of the log that must be processed during a full recovery of a database. During a full recovery, two types of actions are performed:

The log records of modifications not flushed to disk before the system stopped are rolled forward.

All modifications associated with incomplete transactions, such as transactions for which there is no COMMIT or ROLLBACK log record, are rolled back.

Checkpoint Operation

A checkpoint performs the following processes in the current database:

Writes a record to the log file marking the start of the checkpoint.

Stores information recorded for the checkpoint in a chain of checkpoint log records.


One piece of information recorded in the checkpoint records is the LSN of the first log record that must be present for a successful database-wide rollback. This LSN is called the Minimum Recovery LSN (MinLSN) and is the minimum of the:

LSN of the start of the checkpoint.

LSN of the start of the oldest active transaction.

LSN of the start of the oldest replication transaction that has not yet been delivered to the distribution database.

Another piece of information recorded in the checkpoint records is a list of all the active transactions that have modified the database.

Marks for reuse the space that precedes the MinLSN, if the database uses the simple recovery model.

Writes all dirty log and data pages to disk.

Writes a record marking the end of the checkpoint to the log file.

Writes the LSN of the start of this chain to the database boot page.

Activities That Cause a Checkpoint

Checkpoints occur in the following situations:

A CHECKPOINT statement is explicitly executed. A checkpoint occurs in the current database for the connection.

A minimally logged operation is performed in the database; for example, a bulk-copy operation is performed on a database that is using the bulk-logged recovery model.

Database files have been added or removed by using ALTER DATABASE.

The database is changed to the simple recovery model; a checkpoint occurs as part of the log truncation process that takes place during this operation.

An instance of SQL Server is stopped by a SHUTDOWN statement or by stopping the SQL Server (MSSQLSERVER) service. Either will checkpoint each database in the instance of SQL Server.

An instance of SQL Server periodically generates automatic checkpoints in each database to reduce the time that the instance would take to recover the database.

A database backup is taken.

An activity requiring a database shutdown is performed. For example, AUTO_CLOSE is ON and the last user connection to the database is closed, or a database option change is made that requires a restart of the database.

Automatic Checkpoints

The SQL Server Database Engine generates automatic checkpoints. The interval between automatic checkpoints is based on the amount of log space used and the time elapsed since the last checkpoint. The time interval between automatic checkpoints can be highly variable and long if few modifications are made in the database. Automatic checkpoints can also occur frequently if lots of data is modified.

The interval between automatic checkpoints is calculated for all the databases on a server instance from the recovery interval server configuration option. This option specifies the maximum time the Database Engine should use to recover a database during a system restart. The Database Engine estimates how many log records it can process in the recovery interval during a recovery operation.
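The recovery interval option can be inspected and changed with sp_configure; it is an advanced option, so 'show advanced options' must be enabled first. The value of 5 minutes below is only illustrative.

    -- Set the server-wide recovery interval (in minutes); the value shown is illustrative.
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'recovery interval', 5;
    RECONFIGURE;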

The interval between automatic checkpoints also depends on the recovery model:

If the database is using either the full or bulk-logged recovery model, an automatic checkpoint is generated whenever the number of log records reaches the number the database engine estimates it can process during the time specified in the recovery interval option.

If the database is using the simple recovery model, an automatic checkpoint is generated whenever the number of log records reaches the lesser of these two values:

The log becomes 70 percent full.

The number of log records reaches the number the Database Engine estimates it can process during the time specified in the recovery interval option.

Automatic checkpoints truncate the unused section of the transaction log if the database is using the simple recovery model. However, if the database is using the full or bulk-logged recovery models, the log is not truncated by automatic checkpoints.

The CHECKPOINT statement now provides an optional checkpoint_duration argument that specifies the requested period of time, in seconds, for checkpoints to finish.
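For example, the following statement asks the Database Engine to spread the checkpoint work over roughly ten seconds; the duration is a request, not a guarantee.

    -- Request that this manual checkpoint complete in about 10 seconds.
    CHECKPOINT 10;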

Active Log

The section of the log file from the MinLSN to the last-written log record is called the active portion of the log, or the active log. This is the section of the log required to do a full recovery of the database. No part of the active log can ever be truncated; all log truncation must occur in the parts of the log that precede the MinLSN.

The following figure shows a simplified version of the end of a transaction log with two active transactions. Checkpoint records have been compacted to a single record.

LSN 148 is the last record in the transaction log. At the time that the recorded checkpoint at LSN 147 was processed, Tran 1 had been committed and Tran 2 was the only active transaction. That makes the first log record for Tran 2 the oldest log record for a transaction active at the time of the last checkpoint. This makes LSN 142, the Begin transaction record for Tran 2, the MinLSN.

Long-Running Transactions

The active log must include every part of all uncommitted transactions. An application that starts a transaction and does not commit it or roll it back prevents the Database Engine from advancing the MinLSN. This can cause two types of problems:

If the system is shut down after the transaction has performed many uncommitted modifications, the recovery phase of the subsequent restart can take much longer than the time specified in the recovery interval option.

The log might grow very large, because the log cannot be truncated past the MinLSN. This occurs even if the database is using the simple recovery model where the transaction log is generally truncated on each automatic checkpoint.

Replication Transactions

The Log Reader Agent monitors the transaction log of each database configured for transactional replication and copies the transactions marked for replication from the transaction log into the distribution database. The active log must contain all transactions that are marked for replication, but that have not yet been delivered to the distribution database. If these transactions are not replicated in a timely manner, they can prevent the truncation of the log.

Truncating the Transaction Log

If log records were never deleted from the transaction log, the logical log would grow until it filled all the available space on the disks holding the physical log files. To reduce the size of the logical log, the transaction log is periodically truncated. In very early versions of SQL Server, truncating the log meant physically deleting the log records that were no longer needed for recovering or restoring a database. However, in recent versions, the truncation process just marks for reuse the space that was used by the old log records. The log records in this space are eventually overwritten by new log records.

Truncation does not reduce the size of a physical log file. Instead, it reduces the size of the logical log file and frees disk space for reuse.
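One way to observe this distinction is DBCC SQLPERF(LOGSPACE), which reports the physical size of each database's log and the percentage of that space currently in use by the logical log; after truncation the physical size stays the same while the percentage in use drops.

    -- Report log file size (MB) and the percentage of log space in use, per database.
    DBCC SQLPERF(LOGSPACE);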

The active portion of the transaction log, the active log, can never be truncated. The active portion of the log is the part of the log that is used to recover the database and must always be present in the database. The record at the start of the active portion of the log is identified by the minimum recovery log sequence number (MinLSN). The log records before the MinLSN are needed only to maintain a sequence of transaction log backups.

The recovery model selected for a database determines how much of the transaction log in front of the MinLSN must be retained in the database, as shown in the following:

In the simple recovery model, a sequence of transaction log backups is not being maintained. All log records before the MinLSN can be truncated at any time, except while a BACKUP statement is being processed.

In the full and bulk-logged recovery models, a sequence of transaction log backups is being maintained. The part of the logical log in front of the MinLSN cannot be truncated until the transaction log has been backed up. A sketch of switching between recovery models follows.
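In the sketch below, MyDB is a hypothetical database name.

    -- Simple recovery model: the log is truncated at checkpoints.
    ALTER DATABASE MyDB SET RECOVERY SIMPLE;

    -- Full recovery model: log records are retained until they are backed up.
    ALTER DATABASE MyDB SET RECOVERY FULL;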

Operations That Truncate the Log

Log truncation occurs at these points:

After a BACKUP LOG statement is completed, unless it specifies NO_TRUNCATE.

Every time a checkpoint is processed, if the database is using the simple recovery model. This includes both explicit checkpoints that result from a CHECKPOINT statement and implicit checkpoints that are generated by the system. The exception is that the log is not truncated if the checkpoint occurs while a BACKUP statement is still active. A sketch of both cases appears after this list.
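The following sketch shows both cases; the database name MyDB and the backup path are hypothetical.

    -- Full or bulk-logged recovery model: the log is truncated after a log backup.
    BACKUP LOG MyDB TO DISK = N'D:\Backups\MyDB_log.trn';

    -- Simple recovery model: truncation happens when a checkpoint is processed,
    -- for example an explicit one.
    CHECKPOINT;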

Log Truncation Example

Transaction logs are divided internally into sections called virtual log files. Virtual log files are the unit of space that can be reused. When a transaction log is truncated, the log records in front of the virtual log file containing the MinLSN are overwritten as new log records are generated.

This illustration shows a transaction log that has four virtual log files. The log has not been truncated since the database was created. The logical log starts at the front of the first virtual log file, and the part of virtual log file 4 beyond the end of the logical log has never been used.

This illustration shows how the log appears after truncation. The space before the start of the virtual log that contains the MinLSN record has been marked for reuse.

Shrinking the Transaction Log

The size of the log file is physically reduced in the following situations:

A DBCC SHRINKDATABASE statement is executed.

A DBCC SHRINKFILE statement referencing a log file is executed.

An autoshrink operation occurs.

Shrinking a log depends on first truncating the log. Log truncation does not reduce the size of a physical log file. However, it does reduce the size of the logical log and marks as inactive the virtual logs that do not hold any part of the logical log. A log shrink operation removes enough inactive virtual logs to reduce the log file to the requested size.

The unit of the size reduction is a virtual log file. For example, if you have a 600-MB log file that has been divided into six 100-MB virtual logs, the size of the log file can only be reduced in 100-MB increments. The file size can be reduced to sizes such as 500 MB or 400 MB, but the file cannot be reduced to sizes such as 433 MB or 525 MB.

The size of the virtual log file is chosen dynamically by the database engine when log files are created or extended.

Virtual log files that hold part of the logical log cannot be freed. If all the virtual log files in a log file hold parts of the logical log, the file cannot be shrunk until a truncation marks as inactive one or more of the virtual logs at the end of the physical log.
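A common way to see the virtual log files in the current log, and which of them are still part of the logical log, is the undocumented DBCC LOGINFO command: in its output a Status of 2 marks a virtual log file as active and 0 marks it as reusable. MyDB is a hypothetical database name.

    -- List the virtual log files of the current database's transaction log.
    -- (Undocumented command; the output layout can change between versions.)
    USE MyDB;
    DBCC LOGINFO;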

When any file is shrunk, the space freed must come from the end of the file. When a transaction log file is shrunk, enough virtual log files from the end of the log file are freed to reduce the log to the size requested by the user. The target_size specified by the user is rounded to the next highest virtual log file boundary. For example, if a user specifies a target_size of 325 MB for our sample 600-MB file that contains 100-MB virtual log files, the last two virtual log files are removed and the new file size is 400 MB.

A DBCC SHRINKDATABASE or DBCC SHRINKFILE operation immediately tries to shrink the physical log file to the requested size:

If no part of the logical log in the virtual log files extends beyond the target_size mark, the virtual log files that come after the target_size mark are freed and the successful DBCC statement is completed with no messages.

If part of the logical log in the virtual logs does extend beyond the target_size mark, the SQL Server Database Engine frees as much space as possible and issues an informational message. The message tells you what actions you have to perform to remove the logical log from the virtual logs at the end of the file. After you perform this action, you can then reissue the DBCC statement to free the remaining space.

For example, assume that a 600-MB log file that contains six virtual log files has a logical log that starts in virtual log 3 and ends in virtual log 4 when you run a DBCC SHRINKFILE statement with a target_size of 275 MB:

Virtual log files 5 and 6 are freed immediately, because they do not contain any part of the logical log. However, to meet the specified target_size, virtual log file 4 should also be freed, but it cannot be, because it holds the end portion of the logical log. After freeing virtual log files 5 and 6, the Database Engine fills the remaining part of virtual log file 4 with dummy records. This forces the end of the logical log to the end of virtual log file 4, so that new log records wrap around and are written to virtual log file 1. In most systems, all transactions starting in virtual log file 4 will be committed within seconds. This means that the entire active portion of the log moves to virtual log file 1. The log file now looks similar to this:

The DBCC SHRINKFILE statement also issues an informational message that states that it could not free all the space requested and that you can run a BACKUP LOG statement to free the remaining space.

After the active portion of the log moves to virtual log file 1, a BACKUP LOG statement will truncate the entire logical log that is in virtual log file 4:

Because virtual log file 4 no longer holds any portion of the logical log, you can now run the same DBCC SHRINKFILE statement with a target_size of 275 MB. Virtual log file 4 will then be freed and the size of the physical log file will be reduced to the size you requested.
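Put together, the sequence described above might look like the following sketch; the database name MyDB, the logical log file name MyDB_log, and the backup path are hypothetical.

    USE MyDB;
    DBCC SHRINKFILE (MyDB_log, 275);   -- first attempt: only virtual log files 5 and 6 are freed
    BACKUP LOG MyDB TO DISK = N'D:\Backups\MyDB_log.trn';   -- truncates the logical log left in virtual log file 4
    DBCC SHRINKFILE (MyDB_log, 275);   -- second attempt: virtual log file 4 is freed and the file shrinks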