
Active Directory: Troubleshooting

Student Workbook

Version 2.0

Microsoft | Services © 2006 Microsoft Corporation WorkshopPLUS

Microsoft Confidential

Information in this document, including URL and other Internet Web site references, is subject to change without notice. Unless otherwise noted, the example companies, organizations, products, domain names, e-mail addresses, logos, people, places, and events depicted herein are fictitious, and no association with any real company, organization, product, domain name, e-mail address, logo, person, place or event is intended or should be inferred. Complying with all applicable copyright laws is the responsibility of the user. These materials are intended for distribution to and use only by Microsoft Premier Customers. Use or distribution of these materials by any other persons is prohibited without the express written permission of Microsoft Corporation. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

©2006 Microsoft Corporation. All rights reserved.

Microsoft, Active Directory, Windows, Windows NT, and Windows Server are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.

The names of actual companies and products mentioned herein may be the trademarks of their respective owners.

Table of Contents

INCOMING ASSESSMENT

MODULE OVERVIEW

SECTION 1: DIRECTORY STANDARDS
  Directory Standards
  LDAP Information Model
  LDAP Information Model (cont’d)
  LDAP Naming Model
  Active Directory LDAP Compliance

SECTION 2: ACTIVE DIRECTORY ARCHITECTURE
  Active Directory Architecture
  Active Directory Schema
  Active Directory Data Schema
  Active Data Store Physical Structure
  Active Directory Partitions
  Directory Partitions Hierarchy
  Global Catalog
  Global Catalog Physical Structure
  Global Catalog Searches
  Operations Master Roles
  Domain-wide Operations Master Roles
  Operations Master Placement

MODULE SUMMARY

MODULE OVERVIEW

SECTION 1: ACTIVE DIRECTORY AND DNS
  Active Directory and DNS
  Domain Controller Registration
  Domain Controller Registration
  Netlogon.DNS
  Active Directory Integrated DNS
  Replication of DNS Data
  Domain Controller Locator
  Domain Controller Locator
  Windows 2003 Domain Controller Entries
  Optimizing DC Record Registration
  Optimizing DC Record Registration (cont’d)
  Optimizing DC Record Registration (cont’d)

SECTION 2: AGING AND SCAVENGING
  Aging and Scavenging
  Planning Scavenging
  Aging and Scavenging Parameters
  Record Life Span
  Scavenging Algorithm
  Scavenging Considerations

SECTION 3: DNS AND APPLICATION PARTITIONS
  DNS Application Partitions
  DNS Application Partition Creation and Enlistment
  Delays Associated with Populating Zone Data
  DNS Partition Replication Scope

SECTION 4: DNS AND TOOLS AND TROUBLESHOOTING
  DNS Troubleshooting
  IPCONFIG and PING
  NSLOOKUP
  NETDIAG
  DNSCMD
  DNS Event Log
  Event and Debug Logging
  DNSLINT
  DCDIAG /test:DNS

MODULE SUMMARY

MODULE OVERVIEW

SECTION 1: LOGON FAILURES
  Logon Process
  Domain Controller Locator Process
  Domain Controller Detection
  Finding a Domain Controller in the Closest Site (1)
  Finding a Domain Controller in the Closest Site (2)
  Using a Domain Controller Outside of Client Site
  Client Logon and Firewalls
  Global Catalog
  Global Catalog Server Requirement
  Global Catalog Server Availability Requirement
  Universal Group Membership Caching
  Account Lockout Settings
  Domain Controller Behavior
  Lockout Sources
  Common Causes of Logon Failures
  Logon Failure due to Token Size
  Other Logon Failures
  Pre-Windows 2000 Compatible Access

SECTION 2: LOGON FAILURE TROUBLESHOOTING TOOLS
  Demo: Kerbtray
  Klist
  Kerberos Registry Keys
  EventCombMT
  Auditing Account Logons
  Netlogon Logging
  Account Lockout Status
  Other Account Lockout Tools

MODULE OVERVIEW

SECTION 1: ACTIVE DIRECTORY REPLICATION MODEL
  Replication Model Physical Structure
  Changes to Attributes
  Change Notification
  Change Notification Between Sites
  Originating Updates
  Tracking Replicated Updates
  Update Sequence Numbers (USNs)
  Object Creation
  Replication Request Filtering
  Up-to-Dateness Vector
  High-Watermark
  Multimaster Conflict Resolution Policy
  Multimaster Conflict Resolution Policy (cont’d)
  Replication of Linked and Nonlinked Attributes
  Replication of Deletions
  Lingering Objects
  Lingering Object Removal
  AD Replication on a Restored Domain Controller

SECTION 2: ACTIVE DIRECTORY REPLICATION TOPOLOGY
  Goals of Replication Topology
  Replication Topology Physical Structure Example
  Replication Topology Physical Structure Example
  Topology-Related Components
  Site Link Settings and Their Effects on Intersite Replication
  Site Link Transitivity (cont’d)
  Urgent Replication

SECTION 3: REPLICATION TOOLS AND SETTINGS
  Repadmin
  Replmon
  Domain Controller Diagnostic Tool (DCDIAG)
  Events and Registry Entries
  Network Ports Used by Active Directory Replication

MODULE OVERVIEW

SECTION 1: FILE REPLICATION SERVICE
  Introduction to File Replication Service
  Basic FRS Operation
  Replication
  FRS Concepts Overview
  Recommended Configuration
  Managing FRS
  NTFS Junction Points in SYSVOL
  Intersite vs. Intrasite Replication for SYSVOL
  FRS and Active Directory
  FRS Polling Intervals
  FRS Tables and Logs
  FRS Logs
  File and Folder Filters
  Version Vector Join (VVJoin)
  NTFS Change Journal

SECTION 2: COMMON FRS PROBLEMS
  Journal Wrap Errors
  Backlog Files
  Name Collisions
  Excessive Replication and Sharing Violations
  Solving Replication Conflicts
  Restoring Replicated Files
  Pre-staging files for FRS

SECTION 3: TROUBLESHOOTING FRS
  Overview of Tools
  Sonar
  Ultrasound
  Ultrasound Experience
  Ultrasound Reporting Pack
  FRS MOM Management Pack Using Ultrasound
  FRSDiag
  Event Log Monitoring
  DFS Replication (DFSr)
  DFS Replication Benefits and Improvements

MODULE SUMMARY

MODULE OVERVIEW

SECTION 1: GROUP POLICY CONCEPTS
  Active Directory Group Policy
  Active Directory Integration of Group Policy
  GPO Storage in Active Directory
  Group Policy Container Characteristics
  GPC Characteristics (cont’d)
  Group Policy Template Characteristics (GPT)
  GPO Synchronization
  Client-Side Components and Processes
  Application of Group Policy
  CSE Operation
  Group Policy Processing Rules
  Targeting GPOs and Security Filtering
  Group Policy Loopback Mode
  Loopback Mode Example
  Group Policy History

SECTION 2: GPO TOOLS AND TROUBLESHOOTING
  Group Policy Management Console
  Resultant Set of Policies
  RSoP Tools
  Replication Convergence
  GPOTool
  GPO Refresh
  GPUpdate
  Network Connectivity and Slow Links
  DCGPOFIX
  User Environment Debug Logging
  Troubleshooting Review

OUTGOING ASSESSMENT

ACTION PLANNING

Incoming Assessment

This WorkshopPLUS course includes two 25-minute quizzes – an Incoming Assessment (at the start of the workshop) and an Outgoing Assessment (on the last day of the workshop).

So, you’re probably thinking: “Why are they giving me a test during the first hour of this workshop? That is not a very nice way to start the workshop.”

We do this because the Assessments provide key data:

• The Incoming Assessment baselines knowledge.
• The Outgoing Assessment measures knowledge transfer.

So, you might be asking yourself: “Key data for whom? What’s in it for me? What’s in it for Microsoft?” Well, there are benefits for you, benefits for your management, and benefits for the WorkshopPLUS program. Read on…

Benefits to you:

• You get an opportunity to see how much you’ve learned – a measure of improvement.
    • Students are not always aware of how much they’ve learned.
    • Students are happily surprised – even amazed – at how much they learn and how much their scores improve.
• You finish the workshop feeling really good because:
    • You know that your hard work was worth it.
    • You feel more confident than ever in your ability to perform well on the job.

• The subject matter experts who created this assessment believe that it covers the key points that all students should learn from this workshop. On the last day, after the Outgoing Assessment, the Trainer will review each question and answer, making sure that you understand all the key concepts.

• Note: Your results are anonymous. Your Assessment form has a field for Student Number… but no place for recording your name. (More on anonymity, below…)

Benefits to your management:

• They see positive results that make them confident that the training was worthwhile.
• The positive results help to justify the costs of training and perhaps make it possible for them to ask for an increase in training budget.

Benefits to the WorkshopPLUS program:

• We obtain data about the quality and value of the workshop.
• We analyze the results to see whether there are problems with:
    • The wording of the questions or the multiple-choice answers.
    • The content in the manual and the labs.
    • The Trainer’s knowledge and teaching ability.

Privacy / Anonymity

Perhaps you’re asking yourself: “Who’s going to see my results?” There’s no need to worry because:

• You record only your student number on the Assessment form…not your name.
• Some time after the workshop, the scores from the class will be entered into a database. The person entering the scores will not know who took a given Assessment because the forms have only Student Numbers on them. In addition, the Student Numbers will not be entered: instead, a made-up code number will be entered. Assessment forms will then meet with secure and environment-friendly total destruction.

Module 1: Active Directory Foundational Concepts





Module Overview


Introduction This module discusses Microsoft® Active Directory® and its foundations. We will examine the physical and logical aspects of Active Directory and the services it provides. You will also learn about operations master roles and the global catalog.

Objectives After completing this module, you will be able to:

• Review the basics of LDAP Directory Services.
• Understand Physical and Logical Active Directory Structures.
• Identify and Describe the FSMO roles.
• Understand the role of the Global Catalog.
• Describe the different Active Directory Partitions.
• Understand the data store files.




Section 1: Directory Standards


Introduction In this section you will be introduced to industry standards for directories. Both X.500 and the Lightweight Directory Access Protocol (LDAP) will be discussed, along with Microsoft’s Active Directory implementation and interoperability capabilities.

Objectives After completing this section, you will be able to:

• Understand LDAP’s role in directory services.
• Understand the basic LDAP structure and syntax.
• Describe Microsoft’s LDAP Compliance.




Directory Standards

Open Standards

• X.500: Introduced in 1988 by ISO and ITU. Included the concepts of DSAs, the DIB, and the DIT. Object classes and attributes are defined in a schema.
• LDAP: Met the need for a lightweight alternative. A streamlined directory standard. An API set for directory service application development.

Directories—public or private resource lists containing names, locations, and other identifying information—are essential tools, often taken for granted in our daily activities. Typically these directories provide information about people, places, or organizations as part of an overall solution. For example, a telephone is virtually useless without a directory to match names with telephone numbers. Historically, most directories were available only in printed form.

As the computer revolution forged ahead, printed directories gave way to electronic counterparts. Many application providers capitalized on the directory concept, offering proprietary versions that extended their applications’ functionality. Network operating systems also provided directories, typically housing user and device information. Unfortunately, these first generation directories were often developed with little or no concern for interoperability. Isolated and specific in function, they performed admirably. However, it was obvious that directories needed to interact within a larger network ecosystem. This idea grew into the definition of the X.500 standard.

Directory Foundation: X.500 In 1988, the International Organization for Standardization (ISO) and the International Telecommunication Union (ITU) introduced the X.500 standard. X.500 defines the protocols and the information model for an application- and network-platform-agnostic directory service. As a distributed directory, based on hierarchically named information objects, the X.500 specifications characterized a directory that users and applications could browse or search.




The X.500 paradigm includes one or more Directory System Agents (DSAs)—directory servers—each of which holds a portion of the Directory Information Base (DIB). The DIB contains named information objects assembled in a tree structure—defined by a Directory Information Tree (DIT)—and each entry has an associated set of attributes. Every attribute has a pre-defined type and one or more associated values. Object classes, containing mandatory and optional attributes, are defined within a directory schema. Users communicate with an X.500 DSA using the Directory Access Protocol (DAP), while the Directory System Protocol (DSP) controls interaction between two or more DSAs.

LDAP: The Need for a Lightweight Alternative Understanding the need for a streamlined directory standard, several implementers proposed a lightweight alternative for connecting to X.500 directories. The Lightweight Directory Access Protocol (LDAP) is a directory service protocol that runs directly over the TCP/IP stack. The information model (both for data and namespaces) of LDAP is similar to that of the X.500 OSI directory service, but with fewer features and lower resource requirements than X.500. Unlike most other Internet protocols, LDAP has an associated API that simplifies writing Internet directory service applications. The LDAP API is applicable to directory management and browser applications that do not have directory service support as their primary function. LDAP cannot create directories or specify how a directory service operates.




LDAP Information Model

• Information is organized in a hierarchical tree structure.
• The schema provides the blueprint (template) for the directory.
• Objects are defined by classes and attributes: classes are categories of objects; attributes are characteristics of an object.
• OIDs guarantee that schema entries are unique.

LDAP Information Model The LDAP information model describes the structure of information in a directory and organizes objects into a hierarchical tree structure. The implementation of the model is called the schema, which is a set of objects that defines the structure and content of every object that can be created in a directory service.

Classes and Attributes Classes and attributes are defined in the schema by classSchema objects (object classes) and attributeSchema objects (object attributes), as follows:

• Object classes are categories of objects that can be created in the directory. For example, users, computers, and printers are classes of objects. Every object in the directory is created as an instance of some class according to the definition that is stored in the classSchema object for the respective class.

• Object attributes are the characteristics of the object. (These characteristics are also called properties). An attribute can hold a value or values that represent some property of the object. For example, given name, surname, and e-mail address are attributes of every object of the user class, and their values can be created only as character strings. The schema specifies the attributes that are required to have values and the attributes that can have values as an option. In Active Directory, only attributes that have values assigned to them actually use storage space in the database.




• Each attribute has a syntax that is specified in the schema that determines the kind of values that are allowed in the attribute. Examples of attribute syntaxes are Unicode string, binary, and integer.

New object classes and attributes can be added to the schema, and existing objects can be modified by adding or modifying classSchema and attributeSchema objects.

Child classes inherit attributes from their parent classes. Therefore, each class builds on the attribute set of its parent class. The position in the directory tree of one object relative to another is also defined in the schema.
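The relationship between classes, required and optional attributes, and inheritance can be sketched in a few lines of Python. This is a toy model only: the `ClassDef` type, the `validate` helper, and the shortened class chain (top, person, user) are illustrative assumptions and do not reflect how Active Directory stores classSchema objects.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ClassDef:
    """Toy stand-in for a classSchema object (hypothetical, simplified)."""
    name: str
    parent: Optional["ClassDef"] = None
    must_contain: set = field(default_factory=set)   # required attributes
    may_contain: set = field(default_factory=set)    # optional attributes

    def all_required(self) -> set:
        # A child class builds on the attribute set of its parent class.
        inherited = self.parent.all_required() if self.parent else set()
        return inherited | self.must_contain

    def all_allowed(self) -> set:
        inherited = self.parent.all_allowed() if self.parent else set()
        return inherited | self.must_contain | self.may_contain


def validate(cls: ClassDef, entry: dict) -> list:
    """Return a list of problems; an empty list means the entry fits the class."""
    present = set(entry)
    problems = [f"missing required attribute: {a}"
                for a in sorted(cls.all_required() - present)]
    problems += [f"attribute not allowed: {a}"
                 for a in sorted(present - cls.all_allowed())]
    return problems


# Simplified, illustrative class chain.
top = ClassDef("top", must_contain={"objectClass"})
person = ClassDef("person", parent=top, must_contain={"cn"}, may_contain={"sn"})
user = ClassDef("user", parent=person, may_contain={"givenName", "mail"})
```

Calling `validate(user, {"cn": "Kim Akers"})` reports the missing `objectClass` attribute, because `user` inherits that requirement from `top` in this sketch.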

Object Identifiers Object identifiers (also known as OIDs) are hierarchical, dotted-decimal numeric values that uniquely identify entries in a data model. Object identifiers are found in OSI applications, X.500 directories, Simple Network Management Protocol (SNMP), and other applications in which uniqueness is required. Object identifiers are based on a tree structure in which a designated issuing authority (such as the ISO) allocates a branch of the tree to a subauthority, which in turn can allocate sub-branches. The Active Directory schema identifies the object identifier for each class, attribute, and syntax.
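As a rough illustration of the tree structure behind object identifiers, the following Python sketch parses dotted-decimal values and tests whether one identifier lies under a branch allocated to an authority. The function names are hypothetical; 1.2.840.113556 is the branch assigned to Microsoft, and the longer value is used here simply as a sample identifier under that branch.

```python
def parse_oid(text: str) -> tuple:
    """Turn '1.2.840' into (1, 2, 840); reject anything that is not dotted-decimal."""
    parts = text.split(".")
    if not all(p.isdigit() for p in parts):
        raise ValueError(f"not a dotted-decimal OID: {text!r}")
    return tuple(int(p) for p in parts)


def is_under(oid: str, branch: str) -> bool:
    """True when oid is a strict descendant of branch in the OID tree."""
    o, b = parse_oid(oid), parse_oid(branch)
    return len(o) > len(b) and o[:len(b)] == b


MICROSOFT_BRANCH = "1.2.840.113556"
SAMPLE_OID = "1.2.840.113556.1.5.9"   # sample value under the Microsoft branch
```

Comparing numeric components rather than raw strings matters: `1.2.840.1135560` shares a string prefix with the Microsoft branch but is not under it.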




LDAP Information Model (con’t)

• Leaf objects: have no child objects.
• Container objects: contain leaf objects and other containers.
• rootDSE: the top of the logical namespace for a directory server; contains information about the directory server.

Leaf Objects and Container Objects Objects in Active Directory are either leaf objects or container objects. A leaf object is an object that has no child objects. The term “container” refers to one of two things:

• An object of the container structural class
• An object that has child objects

In the schema, a structural class defines objects that can be created as instances of the class in Active Directory. To have child objects, an object must be an instance of a class that is defined by the schema as being a possible superior of those child objects.

rootDSE An LDAP directory service is also referred to as a DSA. In Active Directory, servers that host DSAs are domain controllers.

At the root of an LDAP directory tree is a DSA-specific entry (DSE), the rootDSE, which is not part of any directory partition. The rootDSE represents the top of the logical namespace for one domain controller. The rootDSE attributes contain information about the directory server, including its capabilities and configuration.




There is only one root for a given DSA, but the information that is stored in the root is specific to the domain controller to which you are connected. Among other things, the attributes of the rootDSE identify the following key information:

• The directory partitions (the domain, schema, and configuration directory partitions) that are specific to one domain controller

• The forest root domain directory partition

In this way, the rootDSE provides a “table of contents” for a given domain controller.
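The "table of contents" idea can be illustrated by reading a handful of rootDSE attributes. The sketch below works on a hand-written sample dictionary rather than a live server; the attribute names are ones Active Directory publishes on the rootDSE, but the values, the host name, and the `table_of_contents` helper are made up for illustration.

```python
# Sample rootDSE attributes as a client might see them (values are invented).
SAMPLE_ROOT_DSE = {
    "dnsHostName": "dc1.corp.contoso.com",
    "defaultNamingContext": "DC=corp,DC=contoso,DC=com",
    "configurationNamingContext": "CN=Configuration,DC=contoso,DC=com",
    "schemaNamingContext": "CN=Schema,CN=Configuration,DC=contoso,DC=com",
    "rootDomainNamingContext": "DC=contoso,DC=com",
}


def table_of_contents(root_dse: dict) -> dict:
    """Summarize which directory partitions this domain controller reports."""
    return {
        "server": root_dse["dnsHostName"],
        "domain partition": root_dse["defaultNamingContext"],
        "configuration partition": root_dse["configurationNamingContext"],
        "schema partition": root_dse["schemaNamingContext"],
        "forest root domain": root_dse["rootDomainNamingContext"],
    }


for label, value in table_of_contents(SAMPLE_ROOT_DSE).items():
    print(f"{label:24} {value}")
```

Against a real domain controller, the same attributes would be obtained with a base-scope LDAP search on the empty distinguished name.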




LDAP Naming Model

• Distinguished name: cn=Kim Akers,ou=department,dc=corp,dc=contoso,dc=com
• Relative distinguished name: cn=Kim Akers
• Naming attributes: common name = cn; organizational unit = ou; domain component = dc
• Object identity: globally unique identifier (GUID)
• DNS-to-LDAP distinguished name mapping

The LDAP naming model describes how information is organized and referenced in an LDAP directory. LDAP requires that names of directory objects be formed according to RFC 1779, “A String Representation of Distinguished Names,” and RFC 2247, “Using Domains in LDAP/X.500 Distinguished Names.”

Distinguished Name Every object in Active Directory has a distinguished name (also known as DN). A distinguished name uniquely identifies an object by using the name of the object, plus the names of the container objects and domains that contain the object. Therefore, the distinguished name identifies the object as well as its location in a tree. The distinguished name is unambiguous (that is, it identifies one object only) and unique (that is, no other object in the directory has this name). It contains enough information for an LDAP client to retrieve the object’s information from the directory.

For example, a user named Kim Akers works in the support department of a company as a promotions coordinator. The user account is created in an OU that stores the accounts for support department employees. The root domain of the company is contoso.com, and the local domain is corp.contoso.com. The distinguished name for this user object is:

cn=Kim Akers,ou=support,ou=departments,dc=corp,dc=contoso,dc=com




Relative Distinguished Name The relative distinguished name (also known as the RDN) of an object is the part of the distinguished name that is an attribute of the object itself — the part of the object name that identifies this object as unique within a container. For the distinguished name example in the previous paragraph:

cn=Kim Akers,ou=support,ou=departments,dc=corp,dc=contoso,dc=com

the relative distinguished name is:

cn=Kim Akers

Figure 1, below, illustrates the relative distinguished names that make up the distinguished name of the user object Kim Akers.

Figure 1: Relative Distinguished Names That Make Up a Distinguished Name
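The way a distinguished name breaks down into relative distinguished names can also be shown with a small Python sketch. The `split_dn` and `rdn_pairs` helpers are hypothetical, and the parser is deliberately simplified: it honors backslash-escaped commas but does not handle quoted values or multi-valued RDNs.

```python
def split_dn(dn: str) -> list:
    """Split a DN on unescaped commas; a backslash escapes the next character."""
    parts, current, escaped = [], [], False
    for ch in dn:
        if escaped:
            current.append(ch)
            escaped = False
        elif ch == "\\":
            current.append(ch)
            escaped = True
        elif ch == ",":
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def rdn_pairs(dn: str) -> list:
    """Return (attribute_type, value) for each RDN, leftmost (most specific) first."""
    return [tuple(part.split("=", 1)) for part in split_dn(dn)]


DN = "cn=Kim Akers,ou=support,ou=departments,dc=corp,dc=contoso,dc=com"
for attribute_type, value in rdn_pairs(DN):
    print(f"{attribute_type:3} {value}")
```

The first element returned is the object's own relative distinguished name; the remaining elements name its containers and domains, from the nearest parent up to the root.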

The maximum length that is allowed for a relative distinguished name is 255 characters, but attributes have specific limits that are imposed by the directory schema. For example, in the case of the common name (cn), which is the attribute type that is often used for naming the relative distinguished name, the maximum number of characters that is allowed is 64.




Active Directory relative distinguished names are unique within a container; that is, Active Directory does not permit two objects with the same relative distinguished name under the same parent container. However, two objects can have identical relative distinguished names but still be unique in the directory because, within their respective parent containers, their distinguished names are not the same.

For example, the object:

cn=Kim Akers,ou=departments,dc=corp,dc=contoso,dc=com

is recognized by LDAP as being different from:

cn=Kim Akers,ou=marketing,dc=corp,dc=contoso,dc=com

The relative distinguished name for each object is stored in the Active Directory database. Each object in the directory contains a reference to the parent of the object. An LDAP operation can construct the entire distinguished name by following these references to the root.
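A minimal sketch of that construction follows, assuming a flat table in which each row stores only its own relative distinguished name and a reference to its parent row. The row numbers and the `build_dn` helper are invented for illustration and do not describe the actual layout of the Active Directory database.

```python
# Hypothetical flat table: row id -> (RDN, parent row id, or None for the root).
ROWS = {
    1: ("dc=com", None),
    2: ("dc=contoso", 1),
    3: ("dc=corp", 2),
    4: ("ou=departments", 3),
    5: ("ou=support", 4),
    6: ("cn=Kim Akers", 5),
}


def build_dn(rows: dict, row_id: int) -> str:
    """Follow parent references up to the root, collecting RDNs along the way."""
    rdns = []
    current = row_id
    while current is not None:
        rdn, parent = rows[current]
        rdns.append(rdn)
        current = parent
    return ",".join(rdns)


print(build_dn(ROWS, 6))
```

Because only the RDN and the parent reference are stored per row, moving a container in this model means changing one parent reference; the distinguished names of everything beneath it change without touching those rows.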

Naming Attributes Each portion of the distinguished name is expressed as attribute_type=value. The attribute type that is used to describe the object’s relative distinguished name (in the Kim Akers example, cn) is called the naming attribute. In Active Directory, instances of classes have a default mandatory naming attribute that is defined in the schema. For example, part of the definition of the class user is the attribute cn (Common-Name) as the naming attribute. Therefore, the relative distinguished name for user Kim Akers is expressed as cn=Kim Akers.

Classes that do not define a naming attribute inherit the naming attribute from their parent class. If you create a new class in the Active Directory schema (that is, if you create a new classSchema object), you can use the optional rDNAttID attribute to specify the naming attribute for the class.

Table 1, below, shows the naming attributes that are used in Active Directory.

Object Class | Naming Attribute Display Name | Naming Attribute LDAP Name
user | Common-Name | cn
organizationalUnit | Organizational-Unit-Name | ou
domain | Domain-Component | dc

Table 1: Default Active Directory Naming Attributes

Other naming attributes that are described in RFC 2253, “Lightweight Directory Access Protocol (v3): UTF-8 String Representation of Distinguished Names,” such as o= for organization name and c= for country/region name, are not used in Active Directory.




The use of distinguished names, relative distinguished names, and naming attributes is required when you are programming for LDAP and using ADSI or other scripting or programming languages. Active Directory tools, such as Active Directory Users and Computers, do not require you to enter such values, nor do they display these values. However, LDAP editors, such as ADSI Edit in Support Tools, require input and display output in the LDAP distinguished name format.

Using different naming attributes for users to avoid naming collisions

To ensure data integrity, Active Directory requires that relative distinguished names be unique in a container. By default, the user class uses Common-Name (cn) as the naming attribute, which ties the test for uniqueness to the user name. The combination of these two restrictions can result in naming collision problems in large deployments. For example, a very large company might want to create user accounts in the same OU where, as a result of the high incidence of certain common names, many user objects have identical first and last names and, therefore, identical relative distinguished names. In this scenario, it is helpful to be able to use a different naming attribute that guarantees uniqueness, such as an employee ID that is created by the human resources department.

Object Identity In addition to its distinguished name, every object in Active Directory has an identifier that is unique. This identifier is called the globally unique identifier (GUID). A GUID is a unique 128-bit number that is assigned by the DSA when the object is created. Objects might be moved or renamed within a forest, but their GUID never changes. The GUID is stored in an attribute, objectGUID, which is present on every object. The objectGUID attribute is protected so that it cannot be altered or removed. When you store a reference to an Active Directory object in an external store (for example, in a database, such as Microsoft SQL Server), you should use the objectGUID value to represent the object uniquely.
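When an application stores the objectGUID value externally, it usually has to convert between the 16 raw bytes and the familiar string form. The sketch below assumes the raw value uses the Windows GUID byte layout (the first three fields little-endian), which Python's standard `uuid` module models with `bytes_le`; the sample GUID and the helper name are invented for illustration.

```python
import uuid


def guid_string(raw: bytes) -> str:
    """Convert 16 raw GUID bytes (Windows layout, an assumption here) to string form."""
    if len(raw) != 16:
        raise ValueError("a GUID is exactly 128 bits (16 bytes)")
    return str(uuid.UUID(bytes_le=raw))


# Invented sample value; a real one would come from an object's objectGUID attribute.
sample = uuid.UUID("6b29fc40-ca47-1067-b31d-00dd010662da")
raw = sample.bytes_le

print(raw.hex())         # byte order as stored
print(guid_string(raw))  # string form suitable as an external key
```

Storing the string form (or the raw bytes) as the key in the external store keeps the reference valid even after the object is moved or renamed, since only the distinguished name changes.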

DNS-to-LDAP Distinguished Name Mapping Although DNS domain names match Active Directory domain names, they are not the same thing. Active Directory names have a different format, which is required by LDAP to identify directory objects. Therefore, DNS domain names are mapped to Active Directory domain names (and back again) as described in RFC 2247.

All access to Active Directory is carried out through LDAP, and every object in Active Directory has an LDAP distinguished name. An algorithm automatically provides an LDAP distinguished name for each DNS domain name.
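The mapping described in RFC 2247 is mechanical: each DNS label becomes one dc= component, in the same order. A minimal Python sketch of both directions follows; the function names are hypothetical, and the reverse function accepts only names made up entirely of dc components.

```python
def dns_to_dn(dns_name: str) -> str:
    """Map a DNS domain name to an LDAP distinguished name, one dc= per label."""
    labels = dns_name.strip(".").split(".")
    if not all(labels):
        raise ValueError(f"empty label in DNS name: {dns_name!r}")
    return ",".join(f"dc={label}" for label in labels)


def dn_to_dns(dn: str) -> str:
    """Map a DN consisting only of dc= components back to a DNS domain name."""
    labels = []
    for rdn in dn.split(","):
        attribute_type, _, value = rdn.strip().partition("=")
        if attribute_type.lower() != "dc" or not value:
            raise ValueError(f"not a domain component: {rdn!r}")
        labels.append(value)
    return ".".join(labels)


print(dns_to_dn("corp.contoso.com"))
print(dn_to_dns("dc=corp,dc=contoso,dc=com"))
```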




LDAP Compliance

• Applications may be either directory-aware (capable of reading an LDAP directory) or directory-enabled (capable of reading and performing other defined LDAP operations on a directory).
• Compliance with standards does not ensure interoperability.

Applications may be either directory-aware—capable of reading an LDAP directory—or directory-enabled—capable of reading and performing other defined LDAP operations on a directory. Implementations should be considered LDAP-compliant if proposed standards are followed in achieving an application’s desired level of LDAP functionality. Compliance with standards, however, does not ensure interoperability. Standards can lack sufficient clarity, even after formalization, leading to varying guideline interpretations.

Directory server vendors must therefore balance conformance to the defined standards with interoperability among implementations of those standards. In addition, the process of standard formalization is an ongoing effort, which compounds the difficulty of achieving full compliance. For example, LDAPv2 has been formalized with a defined set of RFCs; LDAPv3, however, is a Proposed Standard and is, theoretically, still a work in progress as it moves toward Internet Standard status. Finally, a vendor’s compliance statement should be viewed as a definition of its ability to implement a known set of RFCs, accurately interpret those RFCs to ensure interoperability, and provide a framework capable of incorporating new RFCs.




Active Directory LDAP Compliance

• Active Directory supports baseline RFC compliance.
• Active Directory extends beyond the baseline RFCs.
• Active Directory supports native LDAP calls.

Windows Server 2003 Although the LDAP compliance guidelines proposed by the Directory Interoperability Forum signify conformance at a base level, organizations demand directory servers that are capable of providing robust network and application services. This is one of the driving forces behind Microsoft’s support of virtually all IETF-recognized LDAP components. Although LDAP compliance should be considered a work in progress until full IETF standardization is complete, Microsoft’s current LDAP compliance in Windows Server 2003 includes support of the RFCs described in Table 2, below.

RFC | Core LDAP Requirements (RFC 3377) | RFC Status | Timeline
2251 | Lightweight Directory Access Protocol (v3) | Proposed | Windows 2000
2252 | Lightweight Directory Access Protocol (v3): Attribute Syntax Definitions | |
2253 | Lightweight Directory Access Protocol (v3): UTF-8 String Representation of Distinguished Names | |
2254 | The String Representation of LDAP Search Filters | |
2255 | The LDAP URL Format | Proposed | Windows 2000
2256 | A Summary of the X.500(96) User Schema for use with LDAPv3 | |
2829 | Authentication Methods for LDAP | Proposed | Windows 2000
2830 | Lightweight Directory Access Protocol (v3): Extension for Transport Layer Security | |

RFC | Additional LDAP RFC Support | RFC Status | Timeline
2696 | LDAP Control Extension for Simple Paged Results Manipulation | |
2247 | Using Domains in LDAP/X.500 Distinguished Names | |
2589 | LDAP Protocol (v3): Extensions for Dynamic Directory Services | Proposed | Windows Server 2003
2798 | Definition of the inetOrgPerson LDAP Object Class | |
2831 | Using Digest Authentication as an SASL Mechanism | |
2891 | LDAP Control Extension for Server-Side Sorting of Search Results | |


Table 2: LDAP RFCs supported by Active Directory

inetOrgPerson As noted previously, the Windows Server 2003 Active Directory schema includes the full definition of the inetOrgPerson LDAP object class (RFC 2798) and provides complete manageability of objects of that class.

Native LDAP Calls Active Directory fully supports native LDAP calls through support of the LDAP API (RFC 1823)—it is actually the primary access method for the majority of Windows’ directory operations. Microsoft merely provides ADSI as an abstraction layer to simplify directory development and promote directory interoperability.

It is important to note that Active Directory does have some functionality that is not exposed by the LDAP protocol because the functionality extends beyond the LDAP model. For instance, LDAP applications bind to specific directory servers via the servers’ Domain Name System (DNS) names, whereas Active Directory applications have the option of employing a distributed locator service to find a nearby replica of a given directory partition. This option does not force independent software vendors (ISVs) to rewrite their applications, because they can bind to a specific LDAP server, as usual.




However, if an ISV wants to use the Active Directory distributed locator service to reduce the customer cost of managing an application, the ISV can call the DsGetDcName API and then use LDAP. Alternatively, the ISV may use ADSI, which includes both DsGetDcName and LDAP. The choice is entirely up to the ISV.

Active Directory fully supports writing to the Service Principal Name via the Windows Server 2003 and Windows 2000 Server LDAP APIs.




Section 2: Active Directory Architecture


Introduction In this section you will be introduced to the key architectural concepts of Active Directory.

Objectives After completing this section, you will be able to:

• Understand Forests, Domains, and Organizational Units.
• Locate Active Directory on disk.
• Understand directory partitions.
• Identify and describe a Global Catalog Server.
• Explain Flexible Single Master Operations (FSMOs).




Active Directory Architecture

• Secure
• Hierarchical
• Four parts: AD objects, DNS support for AD, the schema, and the data store

You can define some components for structure and storage in Active Directory, while others are defined by the system and cannot be modified.

• Forests, domains, and OUs: Components that constitute the logical structure of Active Directory. You define them during the Active Directory installation.

• DNS support for Active Directory: Includes components that are used to locate domain controllers and that use DNS naming schemes. Each domain in a forest must adhere to DNS naming schemes, and domains are organized in a root and subordinate domain hierarchy.

• The schema: A single component that exists inside the directory. The schema contains definitions of the objects that are used to store information in the directory. More detail on the schema is in the next section.

• The data store: Consists of three layers of components. The first layer provides the interfaces that clients need to access the directory; the second layer provides the services that perform the operations that are associated with reading data from and writing data to the directory database; the third layer is the database itself, which exists as a single file on the hard disk of each domain controller.




Active Directory Schema

• Object definition
• Individual copy on each DC
• Governs what comes in and goes out of AD
• Enforces data integrity
• Uniformity


Active Directory Schema Everything that is stored in Active Directory is stored in an object. A definition for every type of object is stored in the schema. The definitions themselves consist of two types of objects: class objects and attribute objects. Classes define groups of attributes that are used to describe common objects. New object definitions are created by combining various class objects and attribute objects to make new combinations that contain the necessary attributes to meet the storage requirements of the new object type. The two main types of object definitions that are stored in the Active Directory schema are described in Table 3, below.

Component: classSchema objects
Description: classSchema objects are object definitions that are stored in the schema, and they are used to define classes. classSchema objects define groups of attributes that have something in common. For example, an object that is used to store a user account needs to store the user’s logon name, first name, last name, and password. It is possible to create a user class that has a logon name attribute, a first name attribute, a last name attribute, and a password attribute. Anytime a new user account is created, the directory uses the user class as the definition, and every user account object that is created uses those attributes. classSchema objects can be nested to create more complex objects.

Component: attributeSchema objects
Description: attributeSchema objects define the individual attributes of a single object. For example, a user account object has a number of attributes that are used to store and define various pieces of data that are related to a user account, such as a logon name attribute and a password attribute. Each of these attributes also has its own attributes that specify the type of data that it stores, the syntax of the data that it stores, and whether or not the attribute is required or optional. The directory service uses attributeSchema objects to store data and verify that the stored data is valid.

Table 3: Schema Components




Active Directory Data Store

• Interfaces: LDAP, REPL (replication), MAPI, SAM
• Services: DSA, database layer, ESE


The Active Directory data store is implemented on every domain controller in the forest. The data store consists of components that store and retrieve data inside the directory. The components of the Active Directory data store are as follows:

• Interfaces (LDAP, REPL, MAPI, SAM): The data store interfaces provide a way for directory clients and other directory servers to communicate with the data store.

• DSA (Ntdsa.dll): The DSA (which runs as Ntdsa.dll on each domain controller) provides the interfaces through which directory clients and other directory servers gain access to the directory database. In addition, the DSA enforces directory semantics, maintains the schema, guarantees object identity, and enforces data types on attributes.

• Database layer: The database layer is an API that resides in Ntdsa.dll and provides an interface between applications and the directory database, to protect the database from direct interaction with applications. Calls from applications are never made directly to the database; they go through the database layer. In addition, because the directory database is flat, with no hierarchical namespace, the database layer provides the database with an abstraction of an object hierarchy.




• ESE (Esent.dll): The Extensible Storage Engine (ESE), which runs as Esent.dll, communicates directly with individual records in the directory database on the basis of an object’s relative distinguished name attribute.

• Database files: The data store stores directory information in a single database file. In addition, the data store uses log files, to which it temporarily writes uncommitted transactions.




Active Directory Data Store Physical Structure

Data store components:

• NTDS.DIT: The Active Directory storage file. Maintains three tables: the data table, link table, and security descriptor table.
• EDB.LOG: The current transaction log. All transactions are created here before being committed to NTDS.DIT.
• EDB****.LOG: Logs that are complete and committed to NTDS.DIT.
• EDB.CHK: The checkpoint file (JET) used to identify committed vs. uncommitted transactions.
• RES1.LOG and RES2.LOG: Reserved space for EDB.LOG. Each file is 10 MB.

Active Directory Database Active Directory data is stored in the Ntds.dit database file. The Active Directory database (Ntds.dit) contains three internal tables, the data table, link table, and SD table, which are described in the following sections.

Two copies of Ntds.dit are present in separate default locations on a domain controller, systemroot\NTDS and systemroot\System32:

• Systemroot\NTDS\Ntds.dit: Stores the database that is in use on a domain controller. It contains the values for the domain and a replica of the values for the forest (the Configuration container data).

• Systemroot\System32\Ntds.dit: The distribution copy of the default directory that is used when you install Active Directory on a server running Windows Server 2003 to create a domain controller. Because this file is available, you can run the Active Directory Installation Wizard without having to use the server operating system CD-ROM.

Components of the Active Directory data store are as follows:

• NTDS.DIT: The physical database file in which all directory data is stored. This file consists of three internal tables: the data table, link table, and security descriptor (SD) table.




• EDB.LOG: The log file into which directory transactions are written before they are committed to the database file.

• EDB.CHK: The file that is used to track the point up to which transactions in the log file have been committed.

• RES1.LOG and RES2.LOG: The files that are used to reserve space for additional log files if EDB.LOG becomes full.
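How the log file, the checkpoint file, and the database file cooperate can be illustrated with a deliberately tiny write-ahead logging model. This is a conceptual sketch only, not how ESE is implemented: the `TinyStore` class and its in-memory fields are hypothetical stand-ins for EDB.LOG, EDB.CHK, and NTDS.DIT.

```python
class TinyStore:
    """Toy write-ahead log: changes are logged first, applied to the database later."""

    def __init__(self):
        self.log = []         # stands in for EDB.LOG (ordered list of changes)
        self.checkpoint = 0   # stands in for EDB.CHK (log entries before this index are applied)
        self.database = {}    # stands in for NTDS.DIT

    def write(self, key, value):
        # A change is recorded in the log before it reaches the database.
        self.log.append((key, value))

    def apply_pending(self):
        # Apply every logged change past the checkpoint, then advance the checkpoint.
        # The same routine serves normal operation and recovery after a failure.
        for key, value in self.log[self.checkpoint:]:
            self.database[key] = value
        self.checkpoint = len(self.log)


store = TinyStore()
store.write("cn", "Kim Akers")
store.apply_pending()                    # first change is now in the database
store.write("mail", "kim@contoso.com")   # logged, but not yet in the database
print(store.database, store.checkpoint)

store.apply_pending()                    # "recovery": replay the log from the checkpoint
print(store.database, store.checkpoint)
```

The checkpoint is what keeps recovery cheap in this model: only log entries after it need to be replayed, because everything before it is already known to be in the database.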




Active Directory Partitions

• Partition = naming context
• Forest-wide vs. domain-wide partitions
• Domain-wide: domain-specific data; full copy on each DC for a given domain
• Forest-wide: stored in two partitions (schema and configuration); full copy on each DC in the forest; schema is read-only except on the schema master FSMO; global catalog
• Application partitions: data needed for applications that is not prudent to store in the domain partition; Windows Server 2003 stores DNS data in application partitions

In order to scale to tens of millions of objects, the Active Directory data store is logically partitioned in such a way that each domain controller does not store the entire directory. To accomplish logical partitioning, the data is categorized according to a naming scheme: object names group objects into logical categories, so that the objects can be managed and replicated appropriately. In Active Directory, the largest of these logical categories is called a directory partition.

Every domain controller holds at least one directory partition that stores domain data, such as users, groups, and OUs. Every domain controller also stores two non-domain directory partitions that store forest-wide data, which includes the schema and configuration data.
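As a rough sketch, the three directory partitions held by an ordinary domain controller can be derived from two DNS names: the name of its own domain and the name of the forest root domain. The `partitions_for` helper is hypothetical, and the sketch ignores global catalog and application partitions.

```python
def partitions_for(domain_dns: str, forest_root_dns: str) -> dict:
    """Name the domain, configuration, and schema partitions for one domain controller."""
    def to_dn(dns_name: str) -> str:
        return ",".join(f"DC={label}" for label in dns_name.split("."))

    forest_root_dn = to_dn(forest_root_dns)
    return {
        "domain": to_dn(domain_dns),                                  # domain-wide data
        "configuration": f"CN=Configuration,{forest_root_dn}",       # forest-wide data
        "schema": f"CN=Schema,CN=Configuration,{forest_root_dn}",    # forest-wide data
    }


for name, dn in partitions_for("corp.contoso.com", "contoso.com").items():
    print(f"{name:14} {dn}")
```

Two domain controllers in different domains of the same forest would differ only in the first entry; the configuration and schema partition names are the same throughout the forest.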

The data store holds data for a single forest. Although there is a single directory, some directory data is distributed within domains, while other data is distributed throughout the forest, without regard for domain boundaries. In Windows Server 2003, data can also be distributed to domain controllers, according to applications that use the data, where the scope of distribution is set by the application. The three types of data that are stored in the Active Directory data store are:

• Domain-wide data: Domain-specific data is stored in a domain directory partition. A full, writable replica of the domain directory partition is replicated to every domain controller in the domain.




• Forest-wide data: Forest-wide data is stored in two directory partitions: the configuration directory partition and the schema directory partition. The Configuration container is the topmost object of the configuration directory partition; the Schema container is the topmost object of the schema directory partition. A full, writable replica of the configuration directory partition is replicated to every domain controller in the forest. A read-only replica of the schema directory partition is replicated to every domain controller in the forest. The schema is writable only on the domain controller that holds the schema operations master role, and writing to the schema requires first adding a registry entry on the schema master. Schema updates replicate to every domain controller in the forest. In addition to a full, writable replica of a single domain (the domain for which the domain controller is authoritative), domain controllers that are designated as global catalog servers also store partial, read-only replicas of every other domain directory partition in the forest. (The read-only replicas in the global catalog are “partial” because they store only some of the attributes for each object.) A domain controller that is a global catalog server can be queried to find any object in the forest.

• Application data: Applications can use a new type of directory partition in Windows Server 2003 Active Directory, called an application directory partition, to store application-specific data that has a scope of interest that is smaller than the entire forest or domain. This data can be characterized as either changing frequently (dynamic) or having a short useful lifetime (volatile). For example, Windows Server 2003 DNS can use application directory partitions to store dynamically updated DNS zone data on only those domain controllers that are DNS servers, rather than on all domain controllers in the domain, as is required for Windows 2000 Active Directory-integrated zones.
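The three kinds of data described above can be summarized as a rule for which replicas a given domain controller holds. The following Python sketch is illustrative only: the DomainController class, its field names, and the example domain names are hypothetical and are not part of any Active Directory API.

```python
from dataclasses import dataclass, field


@dataclass
class DomainController:
    """Hypothetical model of one domain controller (not a real AD object)."""
    name: str
    domain: str                       # DNS name of the DC's own domain
    is_global_catalog: bool = False
    is_schema_master: bool = False
    app_partitions: list = field(default_factory=list)


def replicas_held(dc, forest_domains):
    """Return (partition, replica type) pairs stored by one domain controller."""
    replicas = [
        (f"domain:{dc.domain}", "full, writable"),
        ("configuration", "full, writable"),
        # Only the schema master accepts writes to the schema.
        ("schema", "full, writable" if dc.is_schema_master else "full, read-only"),
    ]
    if dc.is_global_catalog:
        # A global catalog server adds a partial replica of every other domain.
        for domain in forest_domains:
            if domain != dc.domain:
                replicas.append((f"domain:{domain}", "partial, read-only"))
    # Application partitions are held only where the application placed them.
    for partition in dc.app_partitions:
        replicas.append((f"application:{partition}", "application-scoped replica"))
    return replicas


if __name__ == "__main__":
    gc = DomainController("DC1", "child.example.com", is_global_catalog=True)
    for partition, kind in replicas_held(gc, ["example.com", "child.example.com"]):
        print(partition, "->", kind)
```

A domain controller that is not a global catalog server and hosts no application partitions always yields exactly three entries in this model.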




Directory Partitions Hierarchy

RootDSE
• Top of Namespace
• Domain Controller Specific

Cross-Reference Objects
• Refers to a specific Partition
• Name and Location of each directory Partition

Forest Root Domain
• First Domain in the Forest

Directory Partitions
• Config, Schema, Domain

Directory Partition Hierarchy

In Active Directory, a directory partition is a portion of the directory namespace. Each directory partition contains a subtree of the directory objects in the directory tree. The same directory partition can be stored as a replica on many domain controllers, and the replicas are updated through directory replication.

There is an important distinction between the physical storage of a directory partition and its logical position in the directory tree. Physically, all objects are stored in a single database table, regardless of the directory partition to which they are assigned because of their object names. Logically, the head of a directory partition appears in the naming hierarchy as the topmost object; that is, each of the Domain container, the Configuration container, and the Schema container has a distinguished name that identifies its position in the hierarchy.

Every domain controller stores a replica of a domain directory partition, the configuration directory partition, and the schema partition.

Although the schema directory partition is replicated to every domain controller in the forest, it can be updated only on the domain controller that holds the schema operations master role.

Figure 2, below, is a conceptual diagram of the directory tree hierarchy, including the directory root (rootDSE) and the default directory partitions below the directory root. The rootDSE represents the top of the logical namespace for one domain controller and, as such, it represents the top of the LDAP search tree. There is only one root for a given directory, but the information that is stored in the root is specific to the domain controller to which you connect.

In any Active Directory forest, the first domain directory partition that is created in the forest (the forest root domain), the configuration directory partition, and the schema directory partition always form the hierarchy.

Additional directory partitions can exist, in the form of application directory partitions, but these partitions are not stored, by default, on every domain controller.

Directory Partition Cross-Reference Objects

When Active Directory is installed to create the first domain controller in a new forest, the three directory partitions that are shown in Figure 2 are created on the domain controller. At this time, a cross-reference object (class crossRef) is created for each directory partition in the Partitions container in the configuration directory partition (CN=partitions,CN=configuration,DC=forestRootDomain). Creation of each subsequent directory partition in the forest, either by installing Active Directory to create a new domain or by creating a new application directory partition on an existing domain controller, initiates the creation of an associated cross-reference object in the Partitions container.

Note: You can also manually create a cross-reference object for an application directory partition using NTDSUTIL.EXE.

A cross-reference object identifies the name and server location of each directory partition in the forest. The replication system uses this information to identify servers that store the same directory partitions. LDAP queries use cross-reference objects to create referrals to different domains.
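As a rough illustration of how cross-reference data can map a distinguished name to the partition that holds it, the sketch below picks the cross-reference whose partition name is the longest suffix of the requested name. The nCName and dnsRoot keys mirror attributes of crossRef objects; the sample entries, the function names, and the simplified name parsing (no escaped commas) are assumptions made for this example, and the real referral logic in Active Directory is more involved.

```python
def parse_dn(dn):
    """Split a distinguished name into lowercase components.

    Simplification: escaped commas inside values are not handled.
    """
    return [part.strip().lower() for part in dn.split(",")]


def find_partition(dn, cross_refs):
    """Return the cross-reference whose nCName is the longest suffix of dn."""
    target = parse_dn(dn)
    best = None
    best_len = 0
    for ref in cross_refs:
        head = parse_dn(ref["nCName"])
        is_suffix = len(head) <= len(target) and target[-len(head):] == head
        if is_suffix and len(head) > best_len:
            best, best_len = ref, len(head)
    return best


# Hypothetical contents of the Partitions container for a two-domain forest.
CROSS_REFS = [
    {"nCName": "DC=example,DC=com", "dnsRoot": "example.com"},
    {"nCName": "CN=Configuration,DC=example,DC=com", "dnsRoot": "example.com"},
    {"nCName": "CN=Schema,CN=Configuration,DC=example,DC=com", "dnsRoot": "example.com"},
    {"nCName": "DC=child,DC=example,DC=com", "dnsRoot": "child.example.com"},
]

if __name__ == "__main__":
    ref = find_partition("CN=Ann,OU=Sales,DC=child,DC=example,DC=com", CROSS_REFS)
    print(ref["dnsRoot"])
```

A name that matches no cross-reference returns None in this model, which corresponds to a name that lies outside the forest.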




Forest Root Domain

Because the forest root domain is the first domain that is created in a forest, it is the root name in the domain namespace hierarchy. In naming only, the topmost object of the configuration directory partition (the Configuration container) is the child of the forest root domain object in the hierarchy. The LDAP distinguished name of the Configuration container (CN=configuration,DC=forestRootDomain) reflects this naming hierarchy, which links the configuration directory partition to the forest.

Similarly, the topmost object in the schema directory partition—the Schema container—is the child of the Configuration container. The distinguished name of the Schema container (CN=schema,CN=configuration,DC=forestRootDomain) links the schema to the forest.
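The naming relationship described above can be sketched in a few lines of Python. The function names and the example DNS name are hypothetical; the sketch only shows how the distinguished names of the Configuration, Schema, and Partitions containers are derived from the forest root domain name.

```python
def domain_dn(dns_name):
    """Convert a DNS domain name into a DC= distinguished name."""
    return ",".join(f"DC={label}" for label in dns_name.split("."))


def configuration_dn(forest_root_dns):
    """The Configuration container is named as a child of the forest root domain."""
    return f"CN=Configuration,{domain_dn(forest_root_dns)}"


def schema_dn(forest_root_dns):
    """The Schema container is named as a child of the Configuration container."""
    return f"CN=Schema,{configuration_dn(forest_root_dns)}"


def partitions_dn(forest_root_dns):
    """The Partitions container holds the cross-reference objects."""
    return f"CN=Partitions,{configuration_dn(forest_root_dns)}"


if __name__ == "__main__":
    print(schema_dn("corp.example.com"))
```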

Directory Partitions

The Active Directory data store holds three default directory partitions:

• The configuration directory partition

• The schema directory partition

• The domain directory partition

Optionally, the data store may also hold one or more application directory partitions.

Configuration Directory Partition

The configuration directory partition is created when the first domain of a forest is created during the installation of Active Directory. Thereafter, it is replicated to every new domain controller that is added to the forest. The configuration directory partition holds information of global interest, for example, the default configuration and policy information for all instances of a given service in the forest.

Schema Directory Partition

The Active Directory schema is stored in the Schema container in the schema directory partition. The schema consists of a set of object classes, attributes, and syntaxes. It also defines rules that ensure that objects are created and modified with consistency. Active Directory contains a default set of classes and attributes that cannot be modified. However, if you have Schema Administrators credentials, and if schema modification is enabled for the domain controller, you can extend the schema by adding new attributes and classes to represent application-specific classes. These changes are targeted at the domain controller that is the schema master for the forest. Only the schema master stores a writable copy of the schema.

Domain Directory Partition

When you create a new domain, a domain directory partition is created in Active Directory as an instance of the class domainDns. A cross-reference object is added for the domain in the Partitions container to advertise the domain’s location in the directory.




Global Catalog

GC Searches
• Default setting on first installed Domain DC
• Provides search capabilities to other Domains

Stored in Own Directory Partitions
• Partial attribute set
  • Most commonly searched objects
  • isMemberOfPartialAttributeSet
  • Additional attributes possible if necessary
• Read only

Architecture
• LDAP 3268
• Direct MAPI Interface

Global Catalog

The global catalog is a distributed data repository that contains a searchable, partial representation of every object in every domain in a multidomain Active Directory forest. The global catalog is stored on domain controllers that have been designated as global catalog servers and is distributed through multimaster replication. Searches that are directed to the global catalog are faster, because they do not involve referrals to different domain controllers.

In addition to configuration and schema directory partition replicas, every domain controller in a Windows Server 2003 forest stores a full, writable replica of a single domain directory partition. Therefore, a domain controller can locate only those objects in its own domain. Locating an object in a different domain would require the user or application to provide the domain of the requested object.

The global catalog provides the ability to locate objects from any domain without having to know the domain name. A global catalog server is a domain controller that, in addition to its full, writable domain directory partition replica, also stores a partial, read-only replica of all other domain directory partitions in the forest. The additional domain directory partitions are partial because only a limited set of attributes is included for each object. By including only those attributes that are most used for searching, every object in every domain in even the largest forest can be represented in the database of a single global catalog server.




Global Catalog Architecture

Global catalog server architecture differs from non-global-catalog server architecture in its use of the nonstandard LDAP port 3268, which directs queries to the global catalog. Queries over this port are formed in the same way as any other LDAP query, but Active Directory varies the search behavior according to the port that is used. Queries over port 3268 target the global catalog directory partitions (including the read-only domain directory partitions and the one writable domain directory partition for which the server is authoritative). Queries over port 389 target only the writable domain, configuration, application, and schema directory partition replicas stored by the global catalog server in its role as a domain controller. In addition, domain controllers use the proprietary replication interface when they contact global catalog servers to retrieve universal group membership during client logons.

Search clients include Exchange Address Book clients, which use the client MAPI provider Emsabp32.dll to look up e-mail addresses in the global catalog. The client-side MAPI provider communicates with the server through the proprietary Name Service Provider Interface (NSPI) RPC interface.




Global Catalog Physical Structure

Stored in NTDS.DIT

One partition per domain

Partial Attribute Set

Global Catalog Physical Structure

Active Directory is a distributed directory service in which data is stored as replicas on multiple domain controllers to provide a virtual database that maintains consistency through Active Directory replication. Domain controllers provide the domain-wide distribution of directory data. Global catalog servers provide the forest-wide distribution of directory data in a multidomain forest.

Global Catalog Partial Attribute Set

In its role as a domain controller, a global catalog server stores one domain directory partition that has writable objects with a full complement of writable attributes. In its role as global catalog server, it also stores the objects of all other domain directory partitions in a multidomain forest as read-only objects with a partial set of attributes. The set of attributes that are marked for inclusion in the global catalog is called the partial attribute set (PAS). An attribute is marked for inclusion in the PAS as part of its schema definition.

Objects in the schema that define an attribute are attributeSchema objects, which themselves have an attribute isMemberOfPartialAttributeSet. If the value of that attribute is TRUE, the attribute is replicated to the global catalog. The replication topology for the global catalog is generated automatically by the Knowledge Consistency Checker (KCC), a built-in process that implements a replication topology that is guaranteed to deliver the contents of every directory partition to every global catalog server.

The attributes that are replicated to the global catalog, by default, include a base set that has been defined by Microsoft as the attributes that are most likely to be used in searches.




Administrators can use the Microsoft Management Console (MMC) Active Directory Schema snap-in to specify additional attributes to meet the needs of their installations. In the Active Directory Schema snap-in, you can select the Replicate this attribute to the global catalog check box to designate an attributeSchema object as a member of the PAS; this sets the value of the isMemberOfPartialAttributeSet attribute to TRUE.
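The effect of the partial attribute set can be pictured as a filter applied to each object before it is stored in a partial replica. The sketch below is a simplified model under stated assumptions: the ATTRIBUTE_SCHEMA dictionary stands in for the isMemberOfPartialAttributeSet flag on attributeSchema objects, the flag values are illustrative rather than the actual default set, and the function names are hypothetical.

```python
# attribute name -> value of isMemberOfPartialAttributeSet (illustrative flags only)
ATTRIBUTE_SCHEMA = {
    "cn": True,
    "sAMAccountName": True,
    "mail": True,
    "homeDirectory": False,
    "logonHours": False,
}


def mark_for_global_catalog(schema, attribute):
    """Return a copy of the schema with the attribute added to the PAS.

    Models selecting the check box in the Active Directory Schema snap-in.
    """
    updated = dict(schema)
    updated[attribute] = True
    return updated


def partial_replica(obj, schema):
    """Keep only the attributes that are members of the partial attribute set."""
    return {name: value for name, value in obj.items() if schema.get(name, False)}


if __name__ == "__main__":
    user = {"cn": "Ann", "mail": "ann@example.com", "homeDirectory": r"\\fs1\ann"}
    print(partial_replica(user, ATTRIBUTE_SCHEMA))
    extended = mark_for_global_catalog(ATTRIBUTE_SCHEMA, "homeDirectory")
    print(partial_replica(user, extended))
```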

The physical representation of global catalog data is the same as the representation of data in all domain controllers: the Ntds.dit database stores object attributes in a single file. On a domain controller that is not a global catalog server, the Ntds.dit file contains a full, writable replica of every object in one domain directory partition for its own domain, along with the writable configuration and schema directory partitions.

Figure 3, below, shows the physical representations of the global catalog as a forest-wide resource that is distributed as a database on global catalog servers.

Figure 3: Global Catalog Physical Structure




As shown in Figure 3, a global catalog server stores a replica of its own domain (full and writable) and partial, read-only replicas of all other domains in the forest. All directory partitions on a global catalog server, whether full or partial, are stored in the directory database file (Ntds.dit) on that server. That is, there is not a separate storage area for global catalog attributes; they are treated as additional information in the directory database of the global catalog server.

Table 4 describes the physical components of the Global Catalog Server.

Physical Component: Description

Active Directory forest: The set of domains that make up the Active Directory logical structure and that are searchable in the global catalog.

Domain controller: The server that stores one full, writable domain directory partition along with the forest-wide configuration and schema directory partitions. Global catalog servers are always domain controllers.

Global catalog server: The domain controller that stores one full, writable domain along with the forest-wide configuration and schema directory partitions, as well as a partial, read-only replica of all other domains in the forest.

Ntds.dit: The database file that stores replicas of the Active Directory objects held by any domain controller, including global catalog servers.

Table 4: Physical components of a GC




Global Catalog Searches

LDAP search ports
• 3268
• 3269

Search criteria
• Specify GC ports
• Select entire directory in search scope
• Non-local objects

Global Catalog Searches

The location of an object in Active Directory is provided by the distinguished name of the object, which includes the full path to a replica of the object, culminating in the directory partition that holds the object. However, the user or application does not always know the distinguished name of the target object, or even the domain of the object. To locate objects without knowing the domains in which they are located, the most commonly used attributes of the object are replicated to the global catalog. By using these object attributes and directing the search to the global catalog, requesters can find objects of interest without having to know their directory locations. For example, to locate a printer, you can search according to the name of the building in which the printer is located. To locate a person, you can search on the name of the person. To locate all people who are managed by someone, you search on the manager’s name.

LDAP Search Ports

Active Directory uses LDAP as its access protocol. LDAP search requests can be sent and received by Active Directory on port 389 (the default LDAP access port) and port 3268 (the default global catalog port). LDAP traffic that is secured with SSL uses ports 636 and 3269, respectively. In this discussion, search behavior that applies to ports 389 and 3268 also applies to LDAP queries over ports 636 and 3269, respectively.




When a search request is sent to port 389, the search is conducted on a single domain directory partition. If the object is not found in that domain or in the schema or configuration directory partitions, the domain controller refers the request to a domain controller in the domain that is indicated in the distinguished name of the object.

When a search request is sent to port 3268, the search includes all directory partitions in the forest—that is, the search is processed by a global catalog server. If the request specifies attributes that are part of the PAS, the global catalog can return results for objects in any domain without generating a referral to a domain controller in a different domain. Only global catalog servers receive LDAP requests through port 3268. Certain LDAP client applications are programmed to use port 3268. Even if the data that satisfies a search request is available on a regular domain controller, if the application launching the search uses port 3268, the search goes to a global catalog server.
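The port behavior described above can be condensed into two small helper functions. This is a minimal sketch: the function names are hypothetical, and 636 is used as the standard port for LDAP over SSL.

```python
def ldap_port(global_catalog=False, ssl=False):
    """Pick the TCP port for an LDAP search request."""
    if global_catalog:
        return 3269 if ssl else 3268
    return 636 if ssl else 389


def search_scope(port):
    """Describe which replicas a search over the given port covers."""
    if port in (3268, 3269):
        return "all directory partitions in the forest (global catalog)"
    if port in (389, 636):
        return "a single domain directory partition, with referrals to other domains"
    raise ValueError(f"not an Active Directory LDAP port: {port}")


if __name__ == "__main__":
    for gc in (False, True):
        for ssl in (False, True):
            port = ldap_port(global_catalog=gc, ssl=ssl)
            print(port, "->", search_scope(port))
```

An application that always connects to the port returned by ldap_port(global_catalog=True) is served by a global catalog server even when a regular domain controller could have answered the query.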

Search Criteria That Target the Global Catalog

Searches are directed to a global catalog server under the following conditions:

• You specify port 3268 or 3269 in an LDAP search tool.

• You select Entire Directory in a search-scope list in an Active Directory snap-in or search tool, such as Active Directory Users and Computers, or the Search command on the Start menu.

• You write the distinguished name as an attribute value, where the distinguished name represents a non-local object. For example, if you are adding a member to a group, and the member that you are adding is from a different domain, a global catalog server verifies that the user object represented by the distinguished name exists and obtains its Globally Unique Identifier (GUID).

Characteristics of a Global Catalog Search

The following characteristics differentiate a global catalog search from a standard LDAP search:

• Global catalog queries are directed to port 3268, which explicitly indicates that global catalog semantics are required. By default, ordinary LDAP searches are received through port 389. If you bind to port 389, even if you bind to a global catalog server, your search includes a single domain directory partition. If you bind to port 3268, your search includes all directory partitions in the forest. If the server you attempt to bind to over port 3268 is not a global catalog server, the server refuses the bind.

• Global catalog searches can specify a non-instantiated search base, indicated as "com" or " " (blank search base).

• Global catalog searches cross directory partition boundaries. The extent of the standard LDAP search is the directory partition.




• Global catalog searches do not return subordinate referrals. If you use port 3268 to request an attribute that is not in the global catalog, you do not receive a referral to it. Subordinate referrals are LDAP responses; when you query over port 3268, you receive global catalog responses, which are based solely on the contents of the global catalog. If you query the same server by using port 389, you receive referrals for objects that are in the forest but whose attributes are not referenced in the global catalog.

Note: A referral to a directory that is external to Active Directory can be returned by the global catalog, if a base-level search for an external directory is submitted, and if the distinguished name of the external directory uses the domain component (dc=) naming attribute. This referral is returned according to the ability of Active Directory to construct a Domain Name System (DNS) name from the domain components of the distinguished name, and it is not based on the presence of any cross-reference object. The same referral is returned by using the LDAP port; it is not specific to the global catalog.




Operations Master Roles

5 Roles
• Provide control over conflicting updates
• Automatically assigned

Forest Wide Roles
• Schema Master – Schema Admins
• Domain Naming Master – Enterprise Admins

Domain Wide Roles
• PDC Emulator – Domain Admins
• RID Master – Domain Admins
• Infrastructure Master – Domain Admins

Operations Master Roles

Active Directory defines five operations master roles: the schema master, the domain naming master, the relative identifier (RID) master, the primary domain controller (PDC) emulator, and the infrastructure master. The domain controllers that hold operations master roles are designated to perform specific tasks, to ensure consistency and to eliminate the potential for conflicting entries in the Active Directory database.

Active Directory is a multimaster-enabled database, which provides the flexibility of allowing changes to occur at any domain controller in the forest. However, because it is multimaster-enabled, it can also allow conflicting updates that can potentially lead to problems, when data is replicated throughout the domain or forest.

The general approach to resolving Active Directory replication conflicts is to order all update operations (Add, Modify, Move, and Delete) by assigning a globally unique stamp to the originating update. Each replicated attribute value (or multivalue) is stamped during the originating update, and this stamp is replicated with the value. The stamp that is applied during an originating write consists of a version number, a time stamp indicating when the originating write occurred, and the name of the originating domain controller. Conflicts are resolved by comparing the version number. If two stamps have the same version number, the originating time almost always breaks the tie. In the extremely rare event that the same attribute is updated on two different domain controllers during the same second, the originating domain controller breaks the tie in an arbitrary fashion.
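The ordering rule described above (version number first, then originating time, then originating domain controller) maps naturally onto tuple comparison. The sketch below is a simplified model: the Stamp type and its field names are hypothetical, times are whole seconds, and comparing domain controller identifiers as strings is only a stand-in for the arbitrary final tie-breaker.

```python
from typing import NamedTuple


class Stamp(NamedTuple):
    """Hypothetical replication stamp; fields are compared in this order."""
    version: int
    originating_time: int   # seconds since some epoch
    originating_dc: str     # identifier of the originating domain controller


def winning_stamp(a, b):
    """Return the stamp that wins a replication conflict.

    Tuples compare field by field, so the higher version wins; on equal
    versions the later time wins; on equal times the DC identifier decides.
    """
    return max(a, b)


if __name__ == "__main__":
    first = Stamp(version=3, originating_time=1000, originating_dc="DC1")
    second = Stamp(version=3, originating_time=1005, originating_dc="DC2")
    print(winning_stamp(first, second))
```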




Although this resolution method is acceptable, some changes are too difficult to resolve by using the stamp of the originating update. In such cases, it is best to prevent the conflict from occurring, rather than to try to resolve it after it has occurred.

For certain changes, such as adding a domain to a forest, removing a domain from a forest, or changing a password, Active Directory performs the update in a single-master fashion to prevent conflicting updates from occurring. In a single-master update model, only one domain controller in the entire directory is allowed to process the update. This is similar to the role of a Windows NT PDC, in which the PDC is responsible for processing all updates in a given domain.

Active Directory extends the single-master model to include multiple roles that are responsible for different types of updates. Active Directory also provides the ability to transfer an operations master role to another domain controller.

By designating a single domain controller to manage specific tasks, Active Directory enhances your ability to avoid conflicts in the directory, to ensure consistency of the schema, and to add a domain to, or remove a domain from, a forest. Operations masters also maintain interaction between Windows Server 2003, Windows 2000 Server, and earlier versions of Windows operating systems, and they maintain consistent group-to-user references across domains.

The operations master roles are assigned automatically. The two forest-level roles are assigned to the first domain controller created in a forest, and the three domain-level roles are assigned to the first domain controller created in each domain.




Forest-wide Operations Master Roles

Schema Master
• All Schema changes governed here
• Responsible for pushing changes to other DCs

Domain Naming Master
• Performs Domain Add/Delete requests
• Manages crossRef objects in Partitions container
• Prepares Domain Rename Operations

Forest-wide Operations Master Roles

The schema master and domain naming master are forest-wide roles, meaning that there is only one schema master and one domain naming master in the entire forest.

Schema Master

The schema master is responsible for performing updates to the Active Directory schema. The schema master is the only domain controller that can perform write operations to the directory schema. Those schema updates are replicated from the schema master to all other domain controllers in the forest. Having only one schema master for each forest prevents any conflicts that would result if two or more domain controllers attempt to concurrently update the schema.

Domain Naming Master

The domain naming master manages the addition and removal of all domains and directory partitions, regardless of domain, in the forest hierarchy. The domain naming master role holder must be available in order to perform the following actions:

• Add new domains or application directory partitions to the forest.

• Remove existing domains or application directory partitions from the forest.

• Add replicas of existing application directory partitions to additional domain controllers.

• Add or remove cross-reference objects to or from external directories.

• Prepare the forest for a domain rename operation.




Domain-wide Operations Master Roles

PDC Emulator
• Down-level NT PDC
• Processes Password Changes
• Authoritative Time Source for Domain

RID Master
• Relative Identifier Allocator
• Every object created needs a RID
• RID conflicts result in duplicate SIDs

Infrastructure Master
• Updates object references for objects in its Domain from other Domains
• Changes include
  • Inter and Intra Domain moves
  • Object deletion

Domain-wide Operations Master Roles

The other operations master roles are domain-wide roles, meaning that each domain in a forest has its own RID master, PDC emulator, and infrastructure master.

RID Master

The relative identifier, or RID, operations master allocates blocks of RIDs to each domain controller in the domain. Whenever a domain controller creates a new security principal, such as a user, group, or computer object, it assigns a unique security identifier (SID) to the object. This SID consists of a domain SID, which is the same for all security principals created in the domain, and a RID, which uniquely identifies each security principal created in the domain.
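The following sketch models why a single RID master prevents duplicate SIDs: each domain controller draws RIDs only from blocks that the master hands out, and no two blocks overlap. The class names, the starting RID of 1000, the block size of 500, and the example domain SID are all assumptions made for illustration. A real domain controller also requests a new block before its current one is exhausted, which this model omits.

```python
class RidMaster:
    """Hands out non-overlapping blocks of relative identifiers."""

    def __init__(self, first_rid=1000, block_size=500):
        self._next = first_rid          # assumed starting point
        self._block_size = block_size   # assumed block size

    def allocate_block(self):
        start = self._next
        self._next += self._block_size
        return range(start, start + self._block_size)


class DomainController:
    """Creates SIDs from the domain SID plus a RID taken from its local pool."""

    def __init__(self, domain_sid, rid_master):
        self._domain_sid = domain_sid
        self._rid_master = rid_master
        self._pool = iter(())           # empty until the first allocation

    def new_sid(self):
        rid = next(self._pool, None)
        if rid is None:
            # Local pool is empty: ask the RID master for another block.
            self._pool = iter(self._rid_master.allocate_block())
            rid = next(self._pool)
        return f"{self._domain_sid}-{rid}"


if __name__ == "__main__":
    master = RidMaster()
    domain_sid = "S-1-5-21-1111111111-2222222222-3333333333"  # made-up value
    dc1 = DomainController(domain_sid, master)
    dc2 = DomainController(domain_sid, master)
    print(dc1.new_sid(), dc2.new_sid())
```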

PDC Emulator

The PDC emulator operations master acts as a Windows NT PDC in domains that contain Windows NT backup domain controllers (BDCs) or client computers operating without Active Directory client software. In addition, the PDC emulator processes password changes from clients and replicates the updates to the Windows NT BDCs. Even after all domain controllers are upgraded to Windows 2000 Server or Windows Server 2003, the PDC emulator receives preferential replication of password changes performed by other domain controllers in the domain.

If a logon authentication fails at another domain controller, because a bad password is used, that domain controller forwards the authentication request to the PDC emulator, before rejecting the logon attempt.
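The forwarding behavior described above can be sketched as a two-step check. This is a toy model under stated assumptions: the function name is hypothetical, and plain strings stand in for the password data that a real domain controller would compare.

```python
def authenticate(user, password, local_passwords, pdc_passwords):
    """Check a password locally, then retry at the PDC emulator on failure.

    The PDC emulator receives password changes preferentially, so it may
    already hold a new password that has not yet replicated to this DC.
    """
    if local_passwords.get(user) == password:
        return True
    # Local check failed: forward the request before rejecting the logon.
    return pdc_passwords.get(user) == password


if __name__ == "__main__":
    local_dc = {"ann": "old-secret"}      # change has not replicated here yet
    pdc_emulator = {"ann": "new-secret"}  # already has the new password
    print(authenticate("ann", "new-secret", local_dc, pdc_emulator))
    print(authenticate("ann", "wrong", local_dc, pdc_emulator))
```

In this model, a user who has just changed a password can log on through any domain controller, because the stale local copy is never the final authority for a rejection.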




Infrastructure Master

The infrastructure operations master is responsible for updating object references in its domain that point to objects in other domains. The infrastructure master updates object references locally and uses replication to bring all other replicas of the domain up to date. The object reference contains the object’s GUID, distinguished name, and possibly a SID. The distinguished name and SID on the object reference are periodically updated to reflect changes made to the actual object. These changes include moves within and between domains, as well as the deletion of the object. If the infrastructure master is unavailable, updates to object references are delayed until it comes back online.




Operations Master Placement

Leave the two forest-wide roles on a domain controller that is in the forest root domain

Place the two forest-wide roles on a global catalog server

Place all three domain-wide roles on the same domain controller

Never separate PDC and RID unless absolutely necessary

Place the domain-wide roles on a higher performance domain controller

Adjust the workload of the operations master role holder, if necessary

Operations Master Placement

Because operations masters are critical to the long-term performance of the directory, they must be available to all domain controllers and desktop clients that require their services. Careful placement of your operations masters becomes more important as you add more domains and sites to build your forest.

By improperly placing operations master role holders, you might prevent clients that are running Windows NT Workstation 4.0, Windows 95, or Windows 98 without the Active Directory client installed from changing their passwords, or you might be unable to add domains and new objects, such as users and groups. You might also be unable to make changes to the schema. In addition, name changes might not appear properly, within group memberships that are displayed in the user interface.

As your environment changes, you must avoid the problems associated with improperly placed operations master role holders. Eventually, you might need to reassign the roles to other domain controllers.

Although you can assign the operations master roles to any domain controller, follow these guidelines to minimize administrative overhead and ensure the performance of Active Directory:

• Leave the two forest-wide roles on a domain controller that is in the forest root domain.

• Place the two forest-wide roles on a global catalog server.

• Place all three domain-wide roles on the same domain controller.

• In a forest that contains multiple domains, do not place the domain-wide roles on a global catalog server, unless all domain controllers in the domain are also global catalog servers.

• Place the domain-wide roles on a higher performance domain controller.

• Adjust the workload of the operations master role holder, if necessary.
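The first four guidelines above are mechanical enough to check against an inventory of role holders. The sketch below is one possible way to express them; the RoleHolder type, the role names, and the warning texts are hypothetical, and the performance and workload guidelines are not modeled.

```python
from dataclasses import dataclass

FOREST_ROLES = frozenset({"schema", "domain_naming"})
DOMAIN_ROLES = frozenset({"pdc", "rid", "infrastructure"})


@dataclass(frozen=True)
class RoleHolder:
    """Hypothetical inventory record for one domain controller."""
    name: str
    domain: str
    is_gc: bool
    roles: frozenset = frozenset()


def placement_warnings(dcs, forest_root):
    """Return a list of guideline violations for the given domain controllers."""
    warnings = []
    domains = sorted({dc.domain for dc in dcs})
    multi_domain = len(domains) > 1
    for dc in dcs:
        if dc.roles & FOREST_ROLES:
            if dc.domain != forest_root:
                warnings.append(f"{dc.name}: forest-wide role outside the forest root domain")
            if not dc.is_gc:
                warnings.append(f"{dc.name}: forest-wide role on a non-global-catalog server")
    for domain in domains:
        members = [dc for dc in dcs if dc.domain == domain]
        holders = [dc for dc in members if dc.roles & DOMAIN_ROLES]
        if len(holders) > 1:
            warnings.append(f"{domain}: domain-wide roles are split across domain controllers")
        all_gc = all(dc.is_gc for dc in members)
        for dc in holders:
            if multi_domain and dc.is_gc and not all_gc:
                warnings.append(f"{dc.name}: domain-wide roles on a global catalog server")
    return warnings


if __name__ == "__main__":
    inventory = [
        RoleHolder("DC1", "example.com", True, frozenset({"schema", "domain_naming"})),
        RoleHolder("DC2", "example.com", False, DOMAIN_ROLES),
        RoleHolder("DC3", "child.example.com", True, DOMAIN_ROLES),
        RoleHolder("DC4", "child.example.com", False),
    ]
    for warning in placement_warnings(inventory, "example.com"):
        print(warning)
```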

Active Directory Replication

Operations masters replicate changes made on them throughout the domain or forest, depending on whether they hold domain roles or forest roles. Active Directory replication must be working properly, in order for the other domain controllers to receive these changes.

Domain Name System (DNS)

Active Directory requires that DNS is properly designed and deployed, so that domain controllers can correctly resolve DNS names of replication partners. If DNS is not working properly, operations masters cannot be contacted to perform their specific domain or forest functions.

Security

User rights for designating operations master roles can be set for groups or users in a forest. This allows you to limit or add to the group of default users that can change operations master role holders in a forest or domain. The following user rights are required to change operations master role holders:

• The Change Schema Master right is required to transfer or seize the schema master. By default, only members of the Schema Administrators group are assigned this right.

• The Change Domain Master right is required to transfer or seize the domain naming master role. By default, only members of the Enterprise Administrators group are assigned this right.

• The Change PDC right is required to transfer or seize the PDC emulator role. By default, only members of the Domain Administrators group are assigned this right.

• The Change Infrastructure Master right is required to transfer or seize the infrastructure master. By default, only members of the Domain Administrators group are assigned this right.

• The Change RID Master right is required to transfer or seize the RID master role. By default, only members of the Domain Administrators group are assigned this right.
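As a quick reference, the default assignments listed above can be captured in a small lookup table. The following Python sketch is illustrative only; the dictionary layout and the helper name roles_changeable_by are assumptions made for this example and are not part of any Microsoft tool.

```python
# Role -> (required user right, group that holds the right by default),
# transcribed from the list above.
ROLE_RIGHTS = {
    "schema master": ("Change Schema Master", "Schema Administrators"),
    "domain naming master": ("Change Domain Master", "Enterprise Administrators"),
    "PDC emulator": ("Change PDC", "Domain Administrators"),
    "infrastructure master": ("Change Infrastructure Master", "Domain Administrators"),
    "RID master": ("Change RID Master", "Domain Administrators"),
}


def roles_changeable_by(group):
    """Return the roles that members of `group` can transfer or seize by default."""
    return sorted(
        role
        for role, (_right, default_group) in ROLE_RIGHTS.items()
        if default_group == group
    )


print(roles_changeable_by("Domain Administrators"))
```

If the rights have been delegated to other groups in your forest, the table would need to be adjusted to match.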




Module Summary

• Understanding LDAP and Directory Services is critical to successfully managing and troubleshooting AD

• Active Directory is logically contained within multiple “partitions”

• Operations Masters perform specific tasks in an AD forest/domain, and consideration should be given to their placement




Module 2: Active Directory® and Domain Name System






Version 1.0




Module Overview

• Section 1 – Active Directory and DNS

• Section 2 – Aging and Scavenging

• Section 3 – DNS and Application Partitions

• Section 4 – DNS Tools and Troubleshooting

Introduction Microsoft® Active Directory® directory service is dependent on Domain Name System (DNS) in order to function properly. This lesson will review the DNS requirements and the different types of DNS records in Active Directory.


• Explain the relationship between Active Directory and DNS.

• Identify the DNS requirements so that Active Directory will function properly.

• Understand aging and scavenging and the effect on domain controller locator records.

• Explain application partitions for DNS.

• Review troubleshooting tools for DNS.


Section 1: Active Directory and DNS

• AD and DNS

• DC Registration

• AD-Integrated DNS

• DC Locator and DNS Entries

• Optimizing DC Record Location

Introduction DNS is critical to the functioning of Active Directory. Domain controller location, authentication and replication all rely on a solid DNS infrastructure. Understanding how DNS and Active Directory work together is critical to troubleshooting and resolving issues.


• Explain the relationship between Active Directory and DNS.

• Identify the DNS requirements to allow Active Directory to function properly.

Recommended Reading Windows Server 2003 Deployment Planning Guide, DNS and Active Directory:




Active Directory and DNS

• Name resolution service

• Domain controller location

• Active Directory issues can be caused by DNS misconfiguration or other DNS problems

Name Resolution Service Active Directory relies on DNS name resolution to perform key functions:

• Locate/Connect to domain controllers for authentication and replication

• Locate/Connect to servers

• Active Directory and File Replication Service (FRS) replication

Domain Name System All Microsoft Windows Server™ 2003 domain controllers must use DNS as their locator service. During the DCPROMO process, you may see either of the following two error messages if DNS is not configured and functioning properly:

The domain “<Domain Name>” cannot be contacted. Select a different domain. (If this domain was recently created, its name may not yet be registered with the domain naming service.)

or

The following error occurred validating the name “<Domain Name>”.

The specified domain either does not exist or could not be contacted.

Note: Misconfigured DNS settings and entries are often the underlying cause of issues in the Windows Server 2003 Active Directory and networking environments.


Domain Controller Registration

• The Net Logon service on domain controllers registers service (SRV) resource records

• The SRV record: RFC 2782

• Locating LDAP servers using SRV: draft-ietf-ldapext-locate-*.txt

• SRV record format: <service>.<protocol>.<domain> IN SRV <priority> <weight> <port> <host>

• Example: _ldap._tcp.dc._msdcs.contoso.com. IN SRV 10 100 389 RootDC.contoso.com.


Domain Controller Registration Each domain controller (DC) registers its address with DNS using the standard DNS dynamic update. After Active Directory has been installed during DC creation, the Net Logon service dynamically creates records in the DNS database that are used to locate the server.

The records that are dynamically registered in DNS are called service records (SRV), and allow servers to be located by service type (in this example, LDAP (lightweight directory access protocol)) and protocols (for example, TCP and User Datagram Protocol [UDP]). In addition to registering LDAP-specific SRV records, Net Logon also registers Kerberos version 5 authentication protocol–specific SRV records to enable locating servers that run the Kerberos Key Distribution Center (KDC) service.

The name of an SRV record identifies the service that it advertises and the domain or forest that it serves. These are some of the SRV records registered by the Net Logon service running on a DC:

_ldap._tcp.DnsDomainName - Allows a client to locate a server that is running the LDAP service in the domain named by DnsDomainName. For example, _ldap._tcp.contoso.com.

_ldap._tcp.SiteName._sites.DnsDomainName - Allows a client to locate a server that is running the LDAP service in the domain named in DnsDomainName in the site named by SiteName. For example, _ldap._tcp.east._sites.contoso.com.




_ldap._tcp.dc._msdcs.DnsDomainName - Allows a client to locate a DC of the domain named by DnsDomainName. All Windows Server 2003–based DCs register this SRV record.

_ldap._tcp.SiteName._sites.dc._msdcs.DnsDomainName - Allows a client to locate a DC for the domain named by DnsDomainName and in the site named by SiteName.

_ldap._tcp.pdc._msdcs.DnsDomainName - Allows a client to locate the server that is acting as the primary domain controller (PDC) in the mixed-mode domain named in DnsDomainName.

_ldap._tcp.gc._msdcs.DnsForestName - Allows a client to locate a global catalog server for this forest.

_ldap._tcp.SiteName._sites.gc._msdcs.DnsForestName - Allows a client to locate a global catalog server for this forest in the site named in SiteName.

_gc._tcp.DnsForestName - Allows a client to locate a global catalog server for this forest. The server is not necessarily a DC.
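The naming patterns above are mechanical, so they are easy to generate when you need a checklist of names to query. The following Python sketch builds the names for one domain controller. The function name and the is_pdc and is_gc switches are assumptions made for this illustration; they reflect that the pdc record applies to the PDC role holder and the gc records to global catalog servers.

```python
def locator_srv_names(domain, forest, site, is_pdc=False, is_gc=False):
    """Return the SRV owner names from the list above for one domain controller."""
    names = [
        f"_ldap._tcp.{domain}",
        f"_ldap._tcp.{site}._sites.{domain}",
        f"_ldap._tcp.dc._msdcs.{domain}",
        f"_ldap._tcp.{site}._sites.dc._msdcs.{domain}",
    ]
    if is_pdc:
        # Registered only by the server acting as the PDC.
        names.append(f"_ldap._tcp.pdc._msdcs.{domain}")
    if is_gc:
        # Global catalog records are keyed by the forest name, not the domain name.
        names += [
            f"_ldap._tcp.gc._msdcs.{forest}",
            f"_ldap._tcp.{site}._sites.gc._msdcs.{forest}",
            f"_gc._tcp.{forest}",
        ]
    return names


for name in locator_srv_names("contoso.com", "contoso.com", "east", is_pdc=True, is_gc=True):
    print(name)
```

Each printed name can then be checked with a DNS query of type SRV.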

The Net Logon service also registers records when the domain controller is restarted and when the Net Logon service starts. The Net Logon service sends DNS dynamic update queries for its SRV records, A records, and CNAME records every hour to ensure that the DNS server always has these records registered. As described in Request for Comments (RFC) 2136, dynamic update is a recent addition to the DNS standard. It defines a protocol for updating a DNS server with new or changed records dynamically.

Every Windows Server 2003–based DC also dynamically registers a single host resource record (an A resource record), which contains the name of the domain (DnsDomainName) in which the DC exists, and the IP address of the DC. The A resource record makes it possible for clients that do not recognize SRV records to locate a DC by means of a generic host lookup.



[Slide graphic: a domain controller (rootdc.contoso.com) adds the following records to the contoso.com zone]

_ldap._tcp.dc._msdcs.contoso.com. IN SRV 10 100 389 dc-01.contoso.com.

dc-01.contoso.com. IN A <IP address>

5e50847e-247d-4977-841a-e6fcd80462e9._msdcs.contoso.com. IN CNAME <rootdc.contoso.com.>


Domain Controller Registration Example The following example illustrates the combined information that is contained in A resource records and SRV resource records. A domain controller named ROOTDC in contoso.com has an IP address of 157.54.160.14. It registers the following A records and SRV records with DNS:

rootdc.contoso.com A 157.54.160.14

_ldap._tcp.contoso.com SRV 0 0 389 rootdc.contoso.com

_kerberos._tcp.contoso.com SRV 0 0 88 rootdc.contoso.com

_ldap._tcp.dc._msdcs.contoso.com SRV 0 0 389 rootdc.contoso.com

_kerberos._tcp.dc._msdcs.contoso.com SRV 0 0 88 rootdc.contoso.com
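To see how the A record and the SRV records combine, the following Python sketch parses the simplified record lines from this example and resolves a service name to a host, an IP address, and a port. It works on text only and does not query DNS; the function names and the Srv tuple are assumptions made for this illustration.

```python
from collections import namedtuple

Srv = namedtuple("Srv", "name priority weight port target")

RECORDS = """
rootdc.contoso.com A 157.54.160.14
_ldap._tcp.contoso.com SRV 0 0 389 rootdc.contoso.com
_kerberos._tcp.contoso.com SRV 0 0 88 rootdc.contoso.com
_ldap._tcp.dc._msdcs.contoso.com SRV 0 0 389 rootdc.contoso.com
_kerberos._tcp.dc._msdcs.contoso.com SRV 0 0 88 rootdc.contoso.com
"""


def parse_records(text):
    """Split simplified 'name TYPE data' lines into A records and SRV records."""
    hosts, services = {}, []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[1] == "A":
            hosts[fields[0].lower()] = fields[2]
        elif len(fields) == 6 and fields[1] == "SRV":
            name, _type, priority, weight, port, target = fields
            services.append(
                Srv(name.lower(), int(priority), int(weight), int(port), target.lower())
            )
    return hosts, services


def locate(service_name, hosts, services):
    """Return (host, ip, port) for every SRV record that matches service_name."""
    wanted = service_name.lower()
    return [(s.target, hosts.get(s.target), s.port) for s in services if s.name == wanted]


hosts, services = parse_records(RECORDS)
print(locate("_kerberos._tcp.contoso.com", hosts, services))
```

The SRV record supplies the host name and port, and the A record supplies the address. If either is missing, the client cannot reach the service.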




Netlogon.DNS

Netlogon.DNS As noted previously, the Net Logon service registers certain SRV, CNAME, and A resource records every hour, even if some or all of these records are correctly registered in DNS. The list of records that the Net Logon service tries to register is stored in the %systemroot%\System32\Config\Netlogon.dns file. This log file lists the records that are required to be registered for this domain controller. The Net Logon service does not control registrations that it performs on a per-adapter basis.


Figure 1. NetLogon.DNS log file

If a domain controller is failing to register its SRV records in DNS, verify that its SRV records are listed in its Netlogon.dns file. See Figure 1, above.

If they are not, verify that the following registry value, UseDynamicDns, is set to 1.

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters\UseDynamicDns

Data type: REG_DWORD

Range: 0 - 1

Default value: 1

Reference: For more information, please see “How to enable or disable DNS updates in Windows 2000 and in Windows Server 2003” at http://support.microsoft.com/?id=246804.
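The Netlogon.dns check described above can be scripted when many domain controllers need to be reviewed. The following Python sketch compares a list of expected record names against the text of a Netlogon.dns file. It assumes that each line of the file begins with the owner name of a record, optionally ending in a dot; the sample text and the function names are hypothetical.

```python
def registered_names(netlogon_dns_text):
    """Collect the owner name (first field) of every non-empty line."""
    names = set()
    for line in netlogon_dns_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0].rstrip(".").lower())
    return names


def missing_records(expected_names, netlogon_dns_text):
    """Return the expected names that do not appear in the file text."""
    present = registered_names(netlogon_dns_text)
    return [n for n in expected_names if n.rstrip(".").lower() not in present]


# Hypothetical file content, for illustration only.
SAMPLE = """
contoso.com. 600 IN A 157.54.160.14
_ldap._tcp.contoso.com. 600 IN SRV 0 100 389 rootdc.contoso.com.
_ldap._tcp.dc._msdcs.contoso.com. 600 IN SRV 0 100 389 rootdc.contoso.com.
"""

expected = [
    "_ldap._tcp.contoso.com",
    "_ldap._tcp.dc._msdcs.contoso.com",
    "_kerberos._tcp.contoso.com",
]
print(missing_records(expected, SAMPLE))
```

On a real domain controller, you would read the text from the Netlogon.dns file instead of the sample string. Any name that the function reports as missing is a candidate for the UseDynamicDns check.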




Active Directory Integrated DNS

• Benefits of integrating DNS in Active Directory

– Replication by Active Directory

• Per-property replication

• Multiple master (eliminates single point of failure)

• Secure updates

– Support for zone transfers (AXFR/IXFR)

– DNS zones stored in Active Directory

• Active Directory-integrated DNS must run on a domain controller

In addition to supporting a conventional way of maintaining and replicating DNS zone files, the implementation of DNS in Windows Server 2003 has the option of using the Active Directory services as the data storage and replication engine. This approach provides the following benefits:

• DNS replication will be performed by the Active Directory service, so there is no need to support a separate replication topology for DNS servers.

• Active Directory service replication provides per-property replication granularity.

• Active Directory supports the ability to secure updates to the zone.

• A primary DNS server is eliminated as a single point of failure. Original DNS replication is single-master; it relies on a primary DNS server to update all the secondary servers. Unlike original DNS replication, Active Directory service replication is multi-master; an update can be made to any domain controller, and the change will be propagated to other domain controllers. In this way, if DNS is integrated into Active Directory service, the replication engine will always synchronize the DNS zone information.

Thus, Active Directory service integration significantly simplifies the administration of a DNS namespace. At the same time, standard zone transfer to other servers (non-Windows Server 2003 DNS servers and previous versions of the Microsoft DNS servers) is still supported.

Only DNS servers running on domain controllers can load Active Directory integrated zones.


Replication of DNS Data

[Slide graphic: two DNS servers running on domain controllers share an Active Directory-integrated DNS zone; both act as “primary” for the zone]

1) Receive update

2) Write to the Active Directory

3) Active Directory replicates

4) Read the Active Directory

Active Directory supports multi-master replication, or replication in which any domain controller can send or receive updates of information stored in Active Directory. Replication processing is performed on a per-property basis, meaning only relevant changes are propagated. Replication processing differs from DNS full zone transfers, in which the entire zone is propagated. Replication processing also differs from incremental zone transfers, in which the server transfers all changes made since the last change. With Active Directory replication, however, only the final result of all changes to a record is sent.
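The difference between an incremental zone transfer and per-property replication can be shown with a small model. The following Python sketch is a simplified illustration, not a description of the actual replication protocol; the change-log format and the function names are assumptions made for this example.

```python
def incremental_transfer(changes):
    """Incremental zone transfer model: every change since the last transfer is sent."""
    return list(changes)


def per_property_replication(changes):
    """Active Directory model: only the final value of each changed property is sent."""
    final = {}
    for record, prop, value in changes:
        final[(record, prop)] = value  # later changes overwrite earlier ones
    return [(record, prop, value) for (record, prop), value in final.items()]


# One host changes its address three times; another changes once.
changes = [
    ("host1", "A", "10.0.0.1"),
    ("host1", "A", "10.0.0.2"),
    ("host1", "A", "10.0.0.3"),
    ("host2", "A", "10.0.0.9"),
]

print(len(incremental_transfer(changes)), "changes sent by incremental transfer")
print(per_property_replication(changes))
```

In this model, the incremental transfer carries all four changes, while per-property replication carries only the last address of each host.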

When you store a primary zone in Active Directory, the zone information is replicated to all domain controllers within the Active Directory domain. Every DNS server running on a domain controller is then authoritative for that zone and can update it.




Domain Controller Locator

Clients query for SRV records to locate domain controllers

• Function of the Net Logon service

• The IP/DNS locator is used and appends an appropriate string to the front of the domain name

• Some records are site specific

• Example: _ldap._tcp.redmond._sites.dc._msdcs.contoso.com. IN SRV 10 100 389 rootdc.contoso.com.


The IP/DNS-compatible locator is used if the domain name passed to DsGetDcName is a DNS-compatible name. The Net Logon service on the client looks up the name in DNS (by calling DnsQuery) after it appends an appropriate string to the front of the domain name. The DNS service supports a query for determining the set of domain controllers. If the client site name is known, the client DNS query specifies the site. DNS returns the IP addresses of domain controllers that match the DNS query. The client Net Logon service sends an LDAP UDP message to one or more of the domain controllers that have been returned by DNS in order to determine whether any of the specified domain controllers are running and support the specified domain.

The locator process will be discussed more in the “Client Logon” module.



[Slide graphic: sites Paris and London, domain controller rootdc.contoso.com, DNS server dns.contoso.com, and the contoso.com zone, with a DNS response that contains the records below]

_ldap._tcp.paris._sites.dc._msdcs.contoso.com. IN SRV 10 100 389 rootdc.contoso.com.

rootdc.contoso.com. IN A <IP address>


When the appropriate SRV records and A records are in place, a DNS lookup of _ldap._tcp.dc._msdcs.contoso.com returns the names and addresses of all domain controllers in the domain.

_ldap._tcp.DnsDomainName _ldap._tcp.DnsDomainName allows a client to locate a server that is running the LDAP service in the domain named by DnsDomainName. The server is not necessarily a domain controller; the only assumption that can be made about the server is that it supports the LDAP application programming interface (API). All Windows Server 2003–based domain controllers register this SRV record (for example, _ldap._tcp.contoso.com.). See the graphic in the slide above.




Windows 2003 Domain Controller Entries

• Locating replication partners

• _msdcs zone and replication

• Testing name resolution

Dc_<guid>_msdcs.DnsForestName

Locating Replication Partners A domain controller locates a replication partner by using DNS to look up the partner according to the globally unique identifier (GUID) of the NT Directory Service (NTDS) Settings object (class nTDSDSA), which uniquely identifies the domain controller. The NTDS Settings object represents the directory system agent (DSA) on the domain controller. Its GUID is guaranteed to find the correct server, even if its name has been changed.

The GUID of the NTDS Settings object is stored in the objectGUID attribute. The DSA GUID is created when Active Directory is installed on the domain controller, and is destroyed only if Active Directory is removed from the domain controller to create a member server.

The Active Directory database also has a GUID, which the DSA uses to identify the specific versions of the database when a database has been restored. This GUID is stored in the invocationId attribute on the nTDSDSA NTDS Settings object. During the system state restore of a domain controller, the Active Directory database is assigned a new invocationId, retiring the old one. This allows replication partners to treat the restored domain controller as a new domain controller for determining which object should be considered for replication.

As part of the DNS registration process, the Net Logon service on every domain controller registers a canonical name (CNAME) resource record. It is constructed using the DSA GUID and maps to the DNS fully-qualified domain name (FQDN). The format of CNAME records is as follows:

DsaGuid._msdcs.DNSForestName

To locate a replication partner, a domain controller uses the DSA GUID of the NTDS Settings object of the partner to query the DNS for the CNAME record. DNS responds by returning both the CNAME resource record and the A resource record, which contains the IP address of the target domain controller. In addition, the domain controller uses information in the CNAME resource record to authenticate to the replication partner. Therefore, by using the CNAME and A resource record data, the domain controller can initiate replication.
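The two-step lookup described above (GUID-based CNAME, then A record) can be modeled with two dictionaries. The following Python sketch is illustrative only and performs no DNS queries; the function names and the dictionary-based "zone" are assumptions, while the GUID, host name, and address come from the examples in this module.

```python
def partner_cname(dsa_guid, forest):
    """Build the CNAME owner name in the form DsaGuid._msdcs.DnsForestName."""
    return f"{dsa_guid}._msdcs.{forest}".lower()


def resolve_partner(dsa_guid, forest, cnames, hosts):
    """Follow CNAME then A; return (fqdn, ip), or None if either record is missing."""
    fqdn = cnames.get(partner_cname(dsa_guid, forest))
    if fqdn is None:
        return None  # the partner never registered its CNAME, or it was removed
    ip = hosts.get(fqdn)
    if ip is None:
        return None  # the CNAME exists but points to a host with no A record
    return fqdn, ip


cnames = {
    "5e50847e-247d-4977-841a-e6fcd80462e9._msdcs.contoso.com": "rootdc.contoso.com",
}
hosts = {"rootdc.contoso.com": "157.54.160.14"}

print(resolve_partner("5e50847e-247d-4977-841a-e6fcd80462e9", "contoso.com", cnames, hosts))
```

A None result at either step corresponds to a replication partner that cannot be located, which is why both records should be checked when troubleshooting.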

_msdcs Zone and Replication The _msdcs.DnsForestName DNS zone contains a number of forest-wide service (SRV) resource records that are used to locate special servers, such as domain controllers and global catalog servers, and to facilitate replication. In the context of this discussion, it is important to note that if an authoritative DNS server for the _msdcs.DnsForestName zone is unavailable, replication between domain controllers cannot occur. To ensure the availability of this zone, do one of the following.

• Create a secondary replication topology to ensure that the _msdcs.DnsForestName zone is replicated to every DNS server in the forest. This is the default behavior for a new Windows Server 2003 domain and forest.

• If you are using Active Directory-integrated DNS, use the application directory partition containing this zone that is automatically created and replicated to all domain controllers in the forest that are DNS servers.




NTDS Settings Properties Dialog Box

C:\>ping 5e50847e-247d-4977-841a-e6fcd80462e9._msdcs.contoso.com

Pinging rootdc.contoso.com [157.54.160.14] with 32 bytes of data:

Reply from 157.54.160.14: bytes=32 time<1ms TTL=128




PING statistics for 157.54.160.14:

Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),

Approximate round trip times in milli-seconds:

Minimum = 0ms, Maximum = 0ms, Average = 0ms


Optimizing DC Record Registration

• In large hub-and-spoke environments, DC location must never proceed to another branch office, only to the central hub

• A registry key and GPO can control which SRV records get registered

• Modify servers in branch offices to limit their registrations

• Leave hub servers on default settings

In the branch office scenario, it is important for clients who cannot find a domain controller in their own site to find a domain controller in their hub site, but never a domain controller in another branch or hub. In many deployments, clients from one branch cannot connect to machines in another branch, because the network is not fully routed (for example, one-way dial-up lines are used). Even if connectivity is possible, however, it is still undesirable to initiate network connections between branches. Such network traffic would always go through the hub site; therefore, it is better to restrict the traffic to branch-to-hub only.

To avoid the situation where clients in one branch contact a domain controller in another branch, the Net Logon service on all branch office domain controllers must be configured to publish only site-specific locator records, but not generic domain controller locator records. The result is that only the hub domain controllers publish the generic locator records in addition to their site-specific records.
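The publishing rule for branch offices can be expressed as a simple filter. The following Python sketch treats any record name that contains a ._sites. label as site-specific and everything else as generic. That test, and the function names, are assumptions made for this illustration; the sketch does not change any registration settings.

```python
def is_site_specific(name):
    """Site-specific locator names contain a '<SiteName>._sites.' component."""
    return "._sites." in name.lower()


def records_to_publish(names, branch_office):
    """Branch DCs publish only site-specific records; hub DCs publish everything."""
    if branch_office:
        return [n for n in names if is_site_specific(n)]
    return list(names)


names = [
    "_ldap._tcp.contoso.com",
    "_ldap._tcp.branch1._sites.contoso.com",
    "_ldap._tcp.dc._msdcs.contoso.com",
    "_ldap._tcp.branch1._sites.dc._msdcs.contoso.com",
]

print(records_to_publish(names, branch_office=True))
```

With this rule, a client that falls back to the generic records finds only hub domain controllers.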

Net Logon Registry Editing To prevent Net Logon on a domain controller from attempting dynamic updates of certain DNS records, use Regedt32.exe to configure the following registry value:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters

Registry value: DnsAvoidRegisterRecords

Data type: REG_MULTI_SZ




In this value, specify the list of mnemonics corresponding to the DNS records that should not be registered by this domain controller. For information about specific registry settings, see Chapter 2, “Name Space Planning in the Branch Office Configuration Planning Guide”.

Group Policy Configuration To change the value of this entry, you can use the Group Policy Object Editor (Gpedit.msc). The corresponding policy is located in Administrative Templates\System\Net Logon\DC Locator DNS Records.


Optimizing DC Record Registration (cont’d)

• Disabling autosite coverage

• Autosite coverage allows clients to find DCs even if none are local to the site

• Disable AutoSiteCoverage on ALL of the domain controllers

• Do not register generic records, just site-specific records

• See KB article 267855 for more information


Autosite coverage is discussed in detail in the Client Authentication module. It is reviewed here to show the impact on DNS.

A domain controller may register site-specific DC locator DNS SRV resource records for other sites that do not contain a DC in the same role (for example, a DC that hosts the same domain or that is a global catalog server), provided that the domain controller's own site is the closest site to them. This ensures that clients locate the nearest DC when no DC is located in the client's site.

If you have a number of DC-less sites, you may choose to disable autosite coverage to better control which domain controllers register in the site.

When the value of HKLM\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters AutoSiteCoverage (type=dword) is 1, the system can add sites that do not have domain controllers to this domain controller's coverage area.

Value Meaning

0 The system cannot add sites to the coverage area of this domain controller.

1 The system can add sites to the coverage area of this domain controller.




The sites added to the domain controller's coverage are stored in memory, and a new list is assembled each time the Net Logon service starts or when Netlogon is notified of the site object changes. While Net Logon runs, it updates this list at an interval specified by the value of the entry DnsRefreshInterval.



NS server registration

• DNS servers register an NS record when loading a zone from AD

• This creates many NS records and increases traffic during recursion in the branch office scenario

• Disable registration with a registry key (DisableNSRecordsAutoCreation) or DNSCMD


If you want to specify a list of DNS servers that can add corresponding NS (Name Server) records to a specified zone, choose one DNS server and then run Dnscmd.exe with the /AllowNSRecordsAutoCreation switch:

To set a list of TCP/IP addresses of DNS servers that have permission to automatically create NS records for a zone, use the dnscmd servername /config zonename /AllowNSRecordsAutoCreation IPList command. For example: Dnscmd NS1 /config zonename.com /AllowNSRecordsAutoCreation 10.1.1.1 10.5.4.2.

Note: Run this command on only one DNS server. Active Directory replication propagates the changes to all DNS servers that are running on DCs in the same domain.

In an environment in which the majority of the DNS DCs for a domain are located in branch offices and a few are located in a central location, you may want to use the DNSCMD command described earlier in this section to set the IPList to include only the centrally located DNS DCs. If you do so, only the centrally located DNS DCs add their respective NS records to the Active Directory domain zone.




If you want to prevent specific DNS servers from adding NS records for themselves to any Active Directory-integrated DNS zone, use Registry Editor (REGEDT32.EXE) to configure the following registry value on each affected DNS server:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\DNS\Parameters

Registry value: DisableNSRecordsAutoCreation

Data type: REG_DWORD

Data range: 0x0 | 0x1

Default value: 0x0

This value affects all Active Directory-integrated DNS zones. The values have the following meanings:

With a value of 0, the DNS server automatically creates NS records for all Active Directory-integrated DNS zones unless a zone that is hosted by the server contains the AllowNSRecordsAutoCreation attribute. This is the default value.

With a value of 1, the DNS server does not automatically create NS records for any Active Directory-integrated DNS zone, regardless of the AllowNSRecordsAutoCreation configuration in the Active Directory-integrated DNS zones.
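The interaction between the server-wide registry value and the per-zone list can be summarized as a decision function. The following Python sketch is one reading of the rules above, not the DNS server's actual code. It assumes that when a zone carries an AllowNSRecordsAutoCreation list, only the servers whose addresses appear in that list create NS records; the function and parameter names are hypothetical.

```python
def creates_ns_record(disable_value, server_ip, allow_list=None):
    """Decide whether a DNS server adds its own NS record to an AD-integrated zone.

    disable_value: DisableNSRecordsAutoCreation on the server (0 or 1).
    allow_list:    IP addresses from the zone's AllowNSRecordsAutoCreation
                   setting, or None if the zone has no such setting.
    """
    if disable_value == 1:
        return False  # the registry value overrides any zone configuration
    if allow_list is None:
        return True   # default behavior: every DNS server adds its NS record
    return server_ip in allow_list


central = ["10.1.1.1", "10.5.4.2"]
print(creates_ns_record(0, "10.1.1.1", central))    # central server, listed
print(creates_ns_record(0, "10.9.9.9", central))    # branch server, not listed
print(creates_ns_record(1, "10.1.1.1", central))    # registry value set to 1
```

The registry value is set per server and affects all zones, while the list is set per zone and replicates with the zone.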


Section 2: Aging and Scavenging

• Planning Scavenging

• Parameters

• Record Lifespan

• Scavenging Algorithm

• Considerations

Introduction In this lesson, we will review aging and scavenging of DNS records. Aging and scavenging allows for automatic removal of stale resource records.


• Plan to implement aging and scavenging of resource records.

• Understand parameters for configuring aging and scavenging.

• Consider the implications and possible problems of aging and scavenging.

Related Topics Covered in This Lesson • Active Directory Replication




Aging and Scavenging

• Prevents stale records from “corrupting” DNS data

• May be set on a per-server, per-zone, or per-record basis

• Set using the DNS MMC or DNSCMD.EXE

With dynamic update, records are automatically added to the zone when computers and domain controllers are added. However, in some cases, they are not automatically deleted. For example, if a computer registers its own A resource record and is improperly disconnected from the network, the A resource record might not be deleted. If your network has many mobile users, this can happen frequently.

Having many stale resource records presents a few different problems. Stale resource records take up space on the server, and a server might use a stale resource record to answer a query. As a result, DNS server performance suffers.

To solve these problems, the Windows 2003 DNS server can “scavenge” stale records; that is, it can search the database for records that have aged and delete them. Administrators can control aging and scavenging by specifying the following:

• Which servers can scavenge zones

• Which zones can be scavenged

• Which records must be scavenged if they become stale

The DNS server uses an algorithm to ensure it does not accidentally scavenge a record that must remain, provided that you configure all the parameters correctly. By default, the scavenging feature is off.

Caution: By default, the scavenging mechanism is disabled. Do not enable it unless you are absolutely certain that you understand all the parameters. Otherwise, you might accidentally configure the server to delete records that it should retain. If a name is accidentally deleted, not only do users fail to resolve queries for that name, but also, any user can create that name and then take ownership of it, even on zones configured for secure dynamic update.

You can manually enable or disable aging and scavenging on a per-server, per-zone, or per-record basis. You can also enable aging for sets of records by using the command line tool DNSCMD.EXE. (For information about DNSCMD.EXE, see Windows 2003 Support Tools Help. For information about installing and using the Windows 2003 Support Tools and Support Tools Help, see the file Sreadme.doc in the directory \Support\Tools on the Windows 2003 operating system CD.) Keep in mind that if you enable scavenging on a record that is not a dynamic update record, the record will be deleted if it is not periodically refreshed, and you must recreate the record if it is still needed.

If scavenging is disabled on a standard zone and you enable scavenging, the server does not scavenge records that existed before you enabled scavenging. The server does not scavenge those records even if you convert the zone to an Active Directory–integrated zone first. To enable scavenging of such records, use the AgeAllRecords command in DNSCMD.EXE.




Planning Scavenging

• Which servers?

• Which zones?

– AD-integrated

– Standard primary

– The server does not scavenge records that existed before scavenging was turned on

• Which records?

– If you enable scavenging on a record that is not dynamically updated, the record will be deleted when it is not periodically refreshed

– At that point, any computer may register the record

To determine which servers, zones, and records will be aged and scavenged, it is important to look at several key factors.

When aging and scavenging is set on a server, zone, or record, a timestamp is written with the record when it is written or refreshed. If the zone is AD-integrated, the record and the timestamp are then replicated to all other DCs. If the zone is not AD-integrated, the timestamp is written to the zone file.

Note: This does not affect replication with secondary replication partners. However, the format of the zone file does change to accommodate the timestamp. This change alters the format in such a way that the zone file cannot be copied to a non-Windows–based secondary.


Aging and Scavenging Parameters

• Server properties (default)

– Enable scavenging of stale records

– Scavenging interval

– No-refresh interval (default setting for all zones)

– Refresh interval (default setting for all zones)

• Zone properties

– Can be defined separately; otherwise, the zone inherits the default settings on the server

– Enabling on the zone allows the timestamp to be written, but records are not scavenged unless scavenging is enabled on the server

Scavenging can be enabled on DNS Servers and zones to remove stale resource records. Intervals define how long a record will be retained in DNS without being dynamically updated.

Enabling aging and scavenging on the zone, without enabling it on the server, causes a timestamp to be written when a record is created or updated. However, records will not be scavenged until aging and scavenging are also enabled at the server level. This can be a valuable technique for timestamping all records in preparation for scavenging, without actually removing any records.




Record Life Span

Each record is compared to current server time on the basis of the following sum to determine whether the record should be removed:

Record time stamp + No-refresh interval for zone + Refresh interval for zone

If the value of this sum is greater than current server time, no action is taken and the record continues to age in the zone.

If the value of this sum is less than current server time, the record is deleted both from any zone data currently loaded in server memory and also from the applicable DnsZone object store in Active Directory.
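The comparison above is simple date arithmetic. The following Python sketch works through it with example values. The seven-day intervals and the dates are chosen only for illustration, and the function names are assumptions made for this example.

```python
from datetime import datetime, timedelta


def expiry_time(timestamp, no_refresh, refresh):
    """Record time stamp + no-refresh interval + refresh interval."""
    return timestamp + no_refresh + refresh


def should_remove(timestamp, no_refresh, refresh, now):
    """True when the sum is less than current server time."""
    return expiry_time(timestamp, no_refresh, refresh) < now


# Example values only: seven-day no-refresh and refresh intervals.
no_refresh = timedelta(days=7)
refresh = timedelta(days=7)
stamped = datetime(2006, 1, 1)

print(expiry_time(stamped, no_refresh, refresh))                          # 14 days after the time stamp
print(should_remove(stamped, no_refresh, refresh, datetime(2006, 1, 10)))  # still aging
print(should_remove(stamped, no_refresh, refresh, datetime(2006, 1, 16)))  # eligible for removal
```

A record that is refreshed receives a new time stamp, which moves its expiry time forward by the same arithmetic.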


Scavenging Algorithm

When scavenging is enabled, zones will be evaluated automatically provided the following conditions are met:

• Scavenging on the server is enabled
• Scavenging on the zone is enabled
• Dynamic update is enabled on the zone

Scavenging period begins again when:
• Parameter is enabled
• Zone is loaded
• Zone is resumed

Scavenging Algorithm

• StartScavenging is equal to the time that one of the preceding events occurs plus the amount of time specified in the refresh interval for the zone. This prevents a problem that can occur if the client is unable to refresh records because the zone isn’t available—for example, if the zone is paused or the server is not working. If that happens and the server does not use StartScavenging, the server could scavenge the zone before the client has a chance to update the record.

• When the server scavenges a zone, it examines all the records in the zone one by one. If the timestamp is not zero, and the current time is later than the time specified in the timestamp for the record plus the no-refresh and refresh intervals for the zone, it deletes the record. All other records are unaffected by the scavenging procedure.
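The two points above can be modeled as follows. This is a simplified sketch rather than the server's actual code: the names `start_scavenging` and `scavenge` are hypothetical, and `None` stands in for a zero timestamp.

```python
from datetime import datetime, timedelta

def start_scavenging(event_time, refresh):
    """Earliest time the zone may be scavenged after an enable, load, or resume event."""
    return event_time + refresh

def scavenge(records, no_refresh, refresh, now, earliest):
    """Return the records that survive one scavenging pass.

    records maps a record name to its timestamp; None models a zero timestamp.
    """
    if now < earliest:
        return dict(records)  # too early: the zone is not scavenged at all
    kept = {}
    for name, stamp in records.items():
        if stamp is not None and now > stamp + no_refresh + refresh:
            continue  # stale record is deleted
        kept[name] = stamp
    return kept

week = timedelta(days=7)  # illustrative interval, not a recommendation
records = {"static": None, "old": datetime(2006, 1, 1), "fresh": datetime(2006, 1, 20)}
earliest = start_scavenging(datetime(2006, 1, 10), week)  # 17 January 2006

print(sorted(scavenge(records, week, week, datetime(2006, 1, 16), earliest)))
# ['fresh', 'old', 'static']
print(sorted(scavenge(records, week, week, datetime(2006, 1, 21), earliest)))
# ['fresh', 'static']
```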




Scavenging Considerations

How often are records being refreshed?
• Refresh interval > refresh period
• Services like Netlogon, DHCP

Impact on replication traffic

Scavenging Considerations

To ensure that no records are deleted before the dynamic update client has time to refresh them, the refresh interval must be greater than the refresh period for each record subjected to scavenging within a zone.

Many different services might refresh records at different intervals; for example, Netlogon refreshes records once an hour, cluster servers generally refresh records every 15 to 20 minutes, DHCP servers refresh records at renewal of IP address leases, and Windows 2003–based computers refresh their A and PTR (pointer) resource records every 24 hours.
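A quick way to reason about the rule "refresh interval > refresh period" is to compare a proposed refresh interval against the refresh period of every service that registers records in the zone. The sketch below uses the periods quoted above; the DHCP renewal period is a hypothetical value, because it depends on the configured lease duration.

```python
from datetime import timedelta

def too_short(refresh_interval, refresh_periods):
    """Return the services whose refresh period is not shorter than the refresh interval."""
    return sorted(name for name, period in refresh_periods.items()
                  if period >= refresh_interval)

periods = {
    "Netlogon": timedelta(hours=1),
    "cluster": timedelta(minutes=20),
    "host A/PTR": timedelta(hours=24),
    "DHCP lease renewal": timedelta(days=4),  # hypothetical renewal period
}

print(too_short(timedelta(days=7), periods))  # []
print(too_short(timedelta(days=2), periods))  # ['DHCP lease renewal']
```

An empty result means every listed service refreshes its records before the refresh interval expires.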

Usually, the DHCP service requires the longest refresh interval of all services. If you are using the Windows 2003 DHCP service, you can use the default scavenging and aging values. If you are using another DHCP server, you might need to modify the defaults.

The longer you make the no-refresh and refresh intervals, the longer stale records remain. Therefore, you might want to make those intervals as short as is reasonable. However, if you make the no-refresh interval too short, you might cause unnecessary replication by Active Directory.


Section 3: DNS and Application Partitions

• Creation of DNS partitions
• Partition creation and enlistment
• Delays with populating zone data
• DNS partition replication scope

Section 3: DNS and Application Partitions

Introduction This lesson introduces the Domain Name System (DNS) application partitions that may be created on Windows Server 2003.

Objectives After completing this lesson, you will be able to do the following:

• Explain the purpose of the new default DNS application partitions.
• Describe how and under what circumstances the default DNS application partitions are created.




DNS Application Partitions

Limitations with Microsoft Windows 2000 storage of DNS zones
• Uses domain partition
• Replicates to all domain controllers in the domain, and not to domain controllers in child domains
• Replicates to the global catalog

DNS storage in application partitions
• Forest-wide replication (ForestDNSZones)
• Domain-wide replication (DomainDNSZones)
• Custom replication scope
• Does not replicate to the global catalog server

Changing the scope of a zone
• Zones vs. partitions
• Partition removal

DNS Application Partitions

Windows 2000 Limitations Windows 2000 can use Active Directory to store DNS zone information. The scope of replication for DNS data in Windows 2000 Active Directory is the domain partition. This has the following three major limitations.

The replication scope of the DNS zone data is the domain. If a DNS server is located on a child domain DC, zone data in the parent domain, including the _MSDCS zone, cannot be replicated using Active Directory Replication.

The DNS zone data is replicated to all DCs in the domain whether or not all DCs have the DNS Server service installed. This creates additional unnecessary data replication traffic.

Zone data is replicated to the global catalog. Although the zone data in the global catalog is never actually used, the Windows 2000 mechanism replicates zone data to the global catalog.

Windows Server 2003 addresses all of these issues by moving the DNS zone data to application partitions. By default, Windows Server 2003 creates two application partitions for replicating zone data when the first Windows Server 2003 domain controller is added to a forest or any domain in a forest. These application partitions are ForestDNSZones and DomainDNSZones.


ForestDNSZones The ForestDNSZones application partition is created on the first DNS server (Windows Server 2003 DC) in the forest. Whenever a DNS server is installed in the forest, it automatically joins the replication scope of the ForestDNSZones partition. This process occurs in the background when the DNS service starts for the first time.

This means that all domain controllers/DNS servers in a forest will replicate the contents of the ForestDNSZones partition. For this reason, the number of zones hosted should be limited.

By default, DNS creates the _msdcs.<forestroot> zone in the ForestDNSZones partition. This zone contains CNAME records for all of the domain controllers in the forest. The domain controllers use the CNAME records to find other domain controllers for the purpose of replicating Active Directory data. The forest-wide replication of this zone helps ensure that all domain controllers will be able to locate other domain controllers when replicating Active Directory data.

Location: The forest DNS zone is located in MicrosoftDNS.ForestDnsZones.<forestroot>.

Permissions: Forest Root Domain Administrators

DomainDNSZones The DomainDNSZones application partition is created on the first DNS server in the forest along with ForestDNSZones. A new domain-wide partition is also created in each child domain when a Windows Server 2003 DNS server is brought up for the first time in a child domain. Just as with the forest-wide partitions, whenever a DNS server is installed in the domain, it automatically joins the replication scope of the DomainDNSZones partition. This process appears seamless to the administrator and occurs in the background when the DNS service starts for the first time.

Location: The domain DNS zone is located in MicrosoftDNS.DomainDnsZones.<domainroot>

Permissions: Domain Administrators and Enterprise Administrators

Custom Replication Scope It is also possible to configure a custom replication scope for DNS zones. To create custom replication scope, first create a custom application partition and include the desired DCs in the replication scope (discussed in detail later in this lesson in “Manually Creating Default DNS Partitions”).

Once the new application partition is created, the administrator can designate the partition as the storage location for the zone using the DNS management snap-in. (See section below.)




Changing the Replication Scope of a Zone An administrator can choose the replication scope of the zones using the DNS snap-in (pictured below). This is where an administrator can choose forest-wide, domain-wide, Windows 2000 style and custom scope. In the example, a pre-created application partition exists in the forest and has been selected as the custom scope.

Change Zone Replication Scope Dialog Box

Windows 2000 Style Zones Windows 2000 stores its DNS zones in the domain partition. When a zone is moved from a Windows 2000 style zone to an application partition, the records are recreated in the application partition and then deleted from the domain partition. The space will not be completely reclaimed until the records are “garbage collected.”

Warning: Because the zone data is deleted from the domain partition, any remaining Windows 2000 DNS servers will no longer host the zone. Ensure that all Windows 2000 DNS servers are no longer hosting a zone before changing the replication scope.

Zones vs. Partitions It is important to distinguish between the partition, which is the storage location, and the zone, which is the data stored in the partition. When a zone is moved out of one scope into another, the data stored in the partition is moved, not the partition itself. Additionally, the DC is not removed from the replica set. Changing a zone only affects the zone, not the replication or membership of the partitions. Think of the partition as the replicated container and the zone as the content that is being replicated.


For example, suppose that an administrator at a remote location decides to reduce DNS replication traffic caused by zones being updated in the ForestDNSZones partition. The intention is to remove the local domain controller from the replication of forest-wide zones. The administrator opens the snap-in and changes the zone's replication scope to a custom scope containing only the DCs at the site.

The actual effect of this is to move the zones out of the forest-wide partition for all DCs in the forest. The partition still exists and is still replicating to the DCs, but the zones have been removed from all the other DNS servers.
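The distinction can be illustrated with a small model. This is a conceptual sketch only: the partition name `SiteCustom`, the DC names, and the `move_zone` helper are hypothetical and do not correspond to any tool or API.

```python
# Hypothetical model: each partition has a replica set (DCs) and a set of zones.
partitions = {
    "ForestDnsZones": {"replicas": {"DC1", "DC2", "DC3"}, "zones": {"_msdcs.a.com"}},
    "SiteCustom": {"replicas": {"DC3"}, "zones": set()},
}

def move_zone(parts, zone, source, target):
    """Move a zone between partitions; the replica sets are left untouched."""
    parts[source]["zones"].remove(zone)
    parts[target]["zones"].add(zone)

move_zone(partitions, "_msdcs.a.com", "ForestDnsZones", "SiteCustom")

# The forest-wide partition still replicates to every DC, but it is now empty,
# so DC1 and DC2 no longer hold the zone at all.
print(sorted(partitions["ForestDnsZones"]["replicas"]))  # ['DC1', 'DC2', 'DC3']
print(partitions["ForestDnsZones"]["zones"])             # set()
print(partitions["SiteCustom"]["zones"])                 # {'_msdcs.a.com'}
```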

Note: It is possible to remove a DNS server from the domain or forest-wide replication of DNS partitions with NTDSUTIL. Using domain management, remove the NC replica. This is discussed later in the topic “Manually Creating Default DNS Partitions.”

Removing DNS Does Not De-enlist the Domain Controller When DNS is removed, the DC does not stop participating in the replication of the DNS partitions. If desired, the administrator must manually remove partitions using NTDSUTIL.




DNS Application Partition Creation and Enlistment

DNS Application Partition Creation and Enlistment

First Domain Controller in the Forest The default DNS application partitions are created on the first domain controller in the forest (running DNS) and on the first domain controller in each domain. The following process occurs on the first domain controller.

1. During DCPROMO, the DNS servers are tested to determine whether the zone for the domain exists and accepts dynamic updates. If the zone exists and accepts updates, DCPROMO reports success and continues. If not, DCPROMO proceeds to step 2.

2. DCPROMO prompts the user to choose one of the following:

a. Address the problem and try again.

b. Install and configure DNS on this domain controller.

c. Ignore the error and continue with promotion.

3. If the user chooses step 2b, DNS is installed at the end of DCPROMO.

4. The domain zone is created as a standard primary zone, which is not integrated with Active Directory. The zone is stored in a standard DNS zone file.

5. The _msdcs.<forest root> zone is created as a standard primary zone, not integrated with Active Directory. The zone is stored in a standard DNS zone file.

6. The _msdcs.<forest root> zone is added as a delegated zone under the forest root zone, with the local domain controller name listed as the name server. The name server format is not an FQDN, and includes a dot at the end (for example, mydc.). At this point, the domain controller has not rebooted and Active Directory has not been started, so the application partitions have not been created.

7. Both zones are flagged for conversion from zone file storage to application partition storage.

8. Reverse lookup zones are not created.

9. The domain controller reboots and Active Directory starts for the first time.

10. The DNS service starts and determines whether the ForestDNSZones and DomainDNSZones partitions exist.

11. If these partitions do not exist, the DNS service creates ForestDNSZones with the name ForestDnsZones.<ForestRoot>, and DomainDNSZones with the name DomainDnsZones.<DomainRoot>.

12. Next, the DC/DNS server adds itself to the partitions (enlists in the replication of the partitions) by adding itself to the partitions' cross-reference (crossRef) objects. This step is performed using Local System credentials (Local System is a member of Enterprise Domain Controllers).

For upgrades, the forward lookup zone for the domain remains unchanged (whether Active Directory integrated or Standard Primary). Only the partitions are created in these steps.

For new installations with DNS configured by DCPromo, the DNS service automatically moves the zones to the new application partitions. (See the following section on Zone Conversion.)

The same process occurs for any additional DCs in the forest. However, the process will abort if the partitions are found to exist already. Zones will then be added to the existing partitions.

Note: Both DNS partitions are created, even if the first DC is pointing to another DNS server. Although the partitions are created, they are not populated and do not contain any zones.

Zone Conversion On the first domain controller in the forest, with DNS configured by DCPROMO, the domain zone and delegated _msdcs zone are initially stored in DNS zone files. After the first reboot, the application partitions can be created and the zones moved from zone files to become zones integrated with Active Directory. The following process moves the zones after the first reboot.




The zone is flagged to be moved to the target application partition by writing the registry value “DcPromoConvert” (dword) with a value of 1 or 2. A value of 1 moves the zone to the domain partition, and a value of 2 moves the zone to the forest partition.

Zones that have “DcPromoConvert” in their registry information are moved to the specified partition when the DNS server restarts.

During the DNS service restart, the values “DcPromoConvert” and “DatabaseFile” are removed, and the values “DirectoryPartition=<partition name>” and “DsIntegrated=1” are added.
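The change to a zone's registry values can be modeled with a plain dictionary. This sketch does not read or write the registry. It assumes that "domain partition" and "forest partition" here mean DomainDnsZones.<DomainRoot> and ForestDnsZones.<ForestRoot>, and that the partition name is stored in that form; the zone file name is illustrative.

```python
# Assumption: flag 1 targets DomainDnsZones, flag 2 targets ForestDnsZones.
PARTITION_BY_FLAG = {1: "DomainDnsZones", 2: "ForestDnsZones"}

def convert_zone(values, domain_root, forest_root):
    """Return a zone's registry values as they would look after the DNS restart."""
    flag = values.get("DcPromoConvert")
    if flag not in PARTITION_BY_FLAG:
        return dict(values)  # zone is not flagged for conversion
    root = domain_root if flag == 1 else forest_root
    converted = {name: data for name, data in values.items()
                 if name not in ("DcPromoConvert", "DatabaseFile")}
    converted["DirectoryPartition"] = f"{PARTITION_BY_FLAG[flag]}.{root}"
    converted["DsIntegrated"] = 1
    return converted

before = {"DatabaseFile": "a.com.dns", "DcPromoConvert": 1}  # illustrative values
print(convert_zone(before, "a.com", "a.com"))
# {'DirectoryPartition': 'DomainDnsZones.a.com', 'DsIntegrated': 1}
```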

Installation of DNS Service on Domain Controllers The DNS service can be added after the domain controller is up and running. The following describes the enlistment behavior after installing DNS.

The DNS service checks for the presence of a DNS zone integrated with Active Directory as well as the DNS partitions (ForestDNSZones and DomainDNSZones).

If any of these exist, the DNS service enlists in them.

For the Active Directory-integrated zone, the zone information is already in the domain partition and is just added to the DNS server’s list of zones.

For the default partitions, the DNS server uses the local system account and adds itself to the replica set for both partitions. The zone data is not available to DNS until a replication link is constructed and the zone information is replicated. An additional delay occurs as the DNS service periodically polls the data store and adds the newly-found zones.

If the partitions do not exist, the DNS server will create them and join the replica set. If, for some reason, the partitions are not automatically created (for example, the domain-naming master is offline or running Windows 2000), the administrator can manually create the partitions using DNSMGMT.MSC, NTDSUTIL, or DNSCMD.EXE. (For more details, refer to the section “Manually Creating Default DNS Partitions”.)


Delays Associated with Populating Zone Data

• DNS service on domain controller joins replica set
• Addition to replica set is written to partition
• KCC builds inbound connection objects and replication links
• Data in application partitions is replicated (subject to inter-site replication schedules)
• DNS service polls updates from Active Directory at periodic intervals

Manually creating default DNS partitions

Delays Associated with Populating Zone Data

A number of steps are involved in the process of a DNS server joining a replica set and making the zone’s records available to clients. Each step has an associated delay before the zone is fully functional or populated. The following steps are associated with a DNS server joining an existing replica set.

DNS Service joins the domain controller to the replica set. When the DNS service starts, it determines whether the DC is a member of the replica sets for the forest and domain DNS partitions. If not, it adds the DC to both replica sets.

The addition to the replica set is written to the partition.

KCC adds inbound connection objects and replication links. The KCC is responsible for building the replication topology for all partitions hosted on DCs. The KCC runs every 15 minutes. Once the KCC runs, inbound connection objects and replication links are built. (Hint: the REPADMIN /KCC command may be used to start the KCC manually.)

Data in the application partition is replicated (subject to inter-site schedule). After the replication links are built, the data needs to be replicated. If the DC’s replication partner is at the same site, replication will start almost immediately. If the DC’s partner is at a different site, replication will begin at the next inter-site replication interval defined on the site-link or connection object.




The DNS Server service periodically polls the data store and pulls updates. When the service starts, it loads all DNS zones stored in the application directory partitions that are locally available. After the DNS server has started and loaded the zone from the application directory partitions, the server continues to pull updates from Active Directory once every five minutes. The DNS server also searches for the new zones in the application directory partitions every time it pulls updates from Active Directory.

Finally, DNS records become available to clients on completing these steps.
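As a rough worked example, the fixed intervals named in these steps can be added up to estimate how long a newly enlisted DNS server might wait before it serves the zone. This is a back-of-the-envelope sketch: it ignores the time the replication itself takes and the time for the replica set change to reach the replication partner, and the 180-minute inter-site interval is an illustrative value.

```python
KCC_INTERVAL_MIN = 15      # the KCC runs every 15 minutes
DNS_POLL_INTERVAL_MIN = 5  # DNS pulls updates from Active Directory every five minutes

def worst_case_delay(intersite_interval_min=0):
    """Upper bound in minutes for the scheduled waits described above."""
    return KCC_INTERVAL_MIN + intersite_interval_min + DNS_POLL_INTERVAL_MIN

print(worst_case_delay())     # 20  (partner in the same site)
print(worst_case_delay(180))  # 200 (partner in another site, 180-minute interval)
```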

Manually Creating Default DNS Partitions To create the DNS application partitions, an administrator can right-click the DNS server listed in the DNS Manager and choose the option “Create Default Application Directory Partitions.” Alternatively, an administrator can create these partitions by using the DNSCMD.EXE support tool as follows:

dnscmd ServerName /CreateBuiltinDirectoryPartitions {/Domain|/Forest|/AllDomains}

Value Description

DNSCMD Specifies the name of the command-line tool.

ServerName Required. Specifies the DNS host name of the DNS server. You can also type the IP address of the DNS server. To specify the DNS server on the local computer, you can also type a period (.).

/CreateBuiltinDirectoryPartitions Required. Creates a default application directory partition.

{/Domain|/Forest|/AllDomains} Required. Specifies which default application directory partition to create. Do one of the following:

To create a default domain-wide DNS application directory partition for the Active Directory domain where the specified DNS server is located, type /DOMAIN.

To create a default forest-wide DNS application directory partition for the Active Directory forest where the specified DNS server is located, type /FOREST.


To create a default domain-wide DNS application directory partition on a DNS server in each domain in the Active Directory forest where the user running this command is logged on, type /ALLDOMAINS.

The ServerName parameter is ignored for /ALLDOMAINS. The computer on which this command is run must be joined to a domain in the forest where you want to create all of the default domain-wide application directory partitions.
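When scripting this step, the command line can be assembled from the syntax shown above. The following sketch only builds the argument list and does not run DNSCMD; the helper name `build_command` and the lowercase scope keys are assumptions made for the example.

```python
# Map a friendly scope name to the switch documented in the syntax above.
SCOPES = {"domain": "/Domain", "forest": "/Forest", "alldomains": "/AllDomains"}

def build_command(server, scope):
    """Build the dnscmd argument list for creating default DNS partitions."""
    if scope not in SCOPES:
        raise ValueError(f"unknown scope: {scope}")
    return ["dnscmd", server, "/CreateBuiltinDirectoryPartitions", SCOPES[scope]]

# "." addresses the DNS server on the local computer.
print(" ".join(build_command(".", "forest")))
# dnscmd . /CreateBuiltinDirectoryPartitions /Forest
```

A list of arguments in this form could be passed to a process launcher such as `subprocess.run` on a computer where the Windows Support Tools are installed.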




DNS Partition Replication Scope

[Figure: replication scope of the directory partitions across domain A.COM (DC1-A, DC2-A, DC3-A) and domain B.A.COM (DC1-B, DC2-B, DC3-B). Legend: Domain A.COM (and Windows 2000 Active Directory Integrated Zones); Domain B.A.COM (and Windows 2000 Active Directory Integrated Zones); Schema and Config (combined); ForestDNSZones; DomainDNSZones (a.com); DomainDNSZones (b.a.com). Labels: Global Catalog and DNS; DNS; link for global catalog from b.a.com and DNS records.]

DNS Partition Replication Scope

Replication Scope Example The graphic above shows the replication scope of the normal partitions, as well as the application partitions used for DNS zone data.

Schema and Configuration Partitions The schema and configuration partitions are replicated to all domain controllers in the forest. For simplicity, both partitions are represented in the graphic by a single replication link. However, they actually have separate replication links.

Domain Partition Just as with Windows 2000, the domain partition is only replicated to domain controllers within the domain and to global catalog servers. In this example, DC3-B has a one-way replication link to the only global catalog in the domain, which is located on domain controller DC3-A. Note that the Windows 2000 style of DNS storage is in the domain partition. In this case, any zones stored in A.COM are replicated to all three DCs. Likewise, Windows 2000–style Active Directory–integrated zones are stored on all DCs for B.A.COM as well as replicated to the global catalog server.


Windows Server 2003 DNS Application Partitions and Replication Scope When DNS is installed on the first domain controller in the forest, either through DCPROMO or afterwards, two partitions are created and hosted locally.

ForestDNSZones This is the partition that will be replicated to all DNS servers (running on domain controllers) in the forest. In the example above, this partition is replicated to DC2-A, DC3-A, DC1-B and DC2-B, as these are all the DNS servers in the forest. The default zone stored in this partition is _msdcs.<forestroot> as it is required by all DCs for locating replication partners. Because this partition is replicated to all DNS servers (running on DCs) in the forest, care should be taken when deciding to place a zone in this partition. Zones with a high rate of change or zones that are not needed forest-wide should not be located in this partition.

DomainDNSZones This is the partition that will be replicated to all DNS servers (running on domain controllers) in the domain. An additional application partition is created for each domain when the first DNS server is installed on a DC in that domain. In the example, the DomainDNSZones partition for A.COM is hosted and replicated only on DC2-A and DC3-A. In B.A.COM, the partition is hosted and replicated only on DC1-B and DC2-B, because these are the only two DCs that actually need the zone data. The default for Windows Server 2003 is to host the zone for the current domain in this partition.




Section 4: DNS Tools and Troubleshooting

• Tools
• Debug logging
• Event logging

Section 4: DNS Tools and Troubleshooting

Introduction DNS troubleshooting requires an understanding of the technology, networking, and the interaction with Active Directory. A number of tools are available for testing and confirming the DNS infrastructure.

Objectives After completing this section, you will be able to use various tools to troubleshoot DNS issues.


DNS Troubleshooting

• IPCONFIG
• PING
• NSLOOKUP
• NETDIAG
• DNSCMD
• DNS Event Log
• DNS Debug Logging
• DNSLINT
• DCDIAG

DNS Troubleshooting

A number of tools are available for troubleshooting DNS. Simple configuration and network connectivity are usually tested first, using IPCONFIG and PING, progressing to more advanced diagnostics like DNSLINT and DCDIAG.

These tools are addressed in the following sections.




IPCONFIG and PING

IPCONFIG provides basic configuration info
• /all to see client parameters
• /flushdns to clear cache
• /registerdns to update records in DNS

PING can test Layer 3 connectivity
• PING by address
• PING by name
• PING -a (reverse lookup)

IPCONFIG and PING

IPCONFIG is used to provide basic configuration data such as IP address, default gateway, and DNS server. Use IPCONFIG /REGISTERDNS to force the re-registration of A and PTR records.

PING is used to test simple network connectivity.


NSLOOKUP

NSLOOKUP
• Provides query testing of DNS servers
• Verify DNS registration for DC: _ldap._tcp.dc._msdcs.<AD Domain>
• Simple tasks with NSLOOKUP
• Non-interactive mode
• Interactive mode

NSLOOKUP

NSLOOKUP is a tool for testing and troubleshooting DNS servers through query testing. NSLOOKUP provides two modes—non-interactive and interactive.

Non-interactive Mode is used to view a single piece of data, for example:

nslookup <opt> <name> <server>

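For example: nslookup -type=srv _ldap._tcp.dc._msdcs.domain.com

where <opt> is typically -type=<record type> (record type = MX, A, NS, SOA, SRV, and so on). The ls -t <record type> <domain> form is an interactive-mode command.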

Interactive Mode allows for returning series of records and can test elements such as zone transfers. Run NSLookup with no parameters to start interactive mode.

• SET ALL shows the current defaults.
• ls -d <domain name> simulates a zone transfer.
• SET <option> presets an option for subsequent queries.
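To make the DC registration check repeatable, the record name and the non-interactive command line can be generated rather than typed by hand. The sketch below only builds strings and does not run NSLOOKUP; the helper names and the domain `domain.com` are illustrative.

```python
def dc_locator_name(ad_domain):
    """Name of the SRV record set that domain controllers register for LDAP."""
    return f"_ldap._tcp.dc._msdcs.{ad_domain}"

def nslookup_command(name, record_type, server=None):
    """Build a non-interactive nslookup command: nslookup <opt> <name> <server>."""
    command = ["nslookup", f"-type={record_type}", name]
    if server:
        command.append(server)  # optional DNS server to query
    return command

print(" ".join(nslookup_command(dc_locator_name("domain.com"), "srv")))
# nslookup -type=srv _ldap._tcp.dc._msdcs.domain.com
```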




NETDIAG

• NETDIAG /debug (run on DCs)
• NETDIAG /test:DNS
• NETDIAG /Fix

NETDIAG is a Resource Kit command-line utility. From a command prompt, type these commands in the directory where NETDIAG is installed.

See Q219289

NETDIAG

If the computer is a domain controller, NETDIAG verifies all the DNS entries in the Netlogon.DNS file to determine if they are correct and updates the appropriate entries if there is a problem.


DNSCMD

• DNSCMD is a command-line tool designed to assist local and remote administration of the DNS environment.
• Allows the administrator to view the configuration parameters of DNS servers, zones, and resource records.

DNSCMD

DNSCMD is a command line tool found in the Windows Support Tools for managing DNS servers. It can be used to script batch files, change configuration parameters, and view server, zone, and resource record information.

Examples of management and troubleshooting commands with DNSCMD follow. For more information on any of the commands, type DNSCMD /? at the command prompt or view the Windows Support Tools help files.

DNSCMD Commands:

• Dnscmd ageallrecords
• Dnscmd clearcache
• Dnscmd config
• Dnscmd createbuiltindirectorypartitions
• Dnscmd createdirectorypartition
• Dnscmd deletedirectorypartition
• Dnscmd directorypartitioninfo
• Dnscmd enlistdirectorypartition
• Dnscmd enumdirectorypartitions
• Dnscmd enumrecords
• Dnscmd enumzones
• Dnscmd info
• Dnscmd nodelete
• Dnscmd recordadd
• Dnscmd recorddelete
• Dnscmd resetforwarders
• Dnscmd resetlistenaddresses
• Dnscmd startscavenging
• Dnscmd statistics
• Dnscmd unenlistdirectorypartition
• Dnscmd writebackfiles
• Dnscmd zoneadd
• Dnscmd zonechangedirectorypartition
• Dnscmd zonedelete
• Dnscmd zoneexport
• Dnscmd zoneinfo
• Dnscmd zonepause
• Dnscmd zoneprint
• Dnscmd zoneresettype
• Dnscmd zonerefresh
• Dnscmd zonereload
• Dnscmd zoneresetmasters
• Dnscmd zoneresetscavengeservers
• Dnscmd zoneresetsecondaries
• Dnscmd zoneresume
• Dnscmd zoneupdatefromds
• Dnscmd zonewriteback


DNS Event Log

Event Viewer can be an excellent place to begin troubleshooting

DNS Event Log

The DNS Event Log is used to check for errors in DNS as well as for monitoring and troubleshooting.

Some critical errors to review include:

EventID Description

140 The DNS server could not initialize the Remote Procedure Call (RPC) service. If it is not running, start the RPC service or reboot the computer. For specific error code, see the Record Data page on the Event Viewer.

In order for DNS to run, the Remote Procedure Call (RPC) service must be running on the DNS server.

Verify that the Remote Procedure Call (RPC) service has been started.

Open Administrative Tools, and double-click Services.

If the service has been started, try restarting the server.

If the error continues, remove and reinstall the RPC Configuration service by using the Services tab network connection in Network and Dial-up Connections in Control Panel.




403 The DNS server could not create a Transmission Control Protocol (TCP) socket. Restart the DNS server or reboot the computer. For the specific error code, see the Record Data page.

The Wsock32.dll might be incompatible with a third-party TCP/IP stack. This problem can also occur if the TCP/IP protocol is not bound to the network adapter.

If you are using a third-party TCP/IP protocol, verify that the protocol is compatible with the Wsock32.dll.

Check the bindings of the protocol stack. It is a good idea to have TCP/IP bound at the top of the stack. If the error continues, remove and reinstall the TCP/IP protocol, and then try again.

Open Control Panel, and then double-click Network and Dial-up Connections.

Right-click the connection, and then click Properties.

Verify that the bindings for all protocols to network adapters are enabled and that no broken connections exist in the stack.

407 DNS server could not bind the main datagram socket. The data is the error.

This error can occur if there is a mismatch between the configured IP address in the Advanced IP Addressing dialog box and the addresses listed in the Server Properties dialog box for the DNS server. This problem can also occur if the TCP/IP protocol is not bound to the network adapter.

Verify that the TCP/IP addresses configured in the Advanced IP Addressing dialog box match those configured in the Server Properties dialog box in DNS Manager:

Open Control Panel, and double-click Network.

Click the Protocols tab, and click TCP/IP Protocol in the Network Protocols list.

Click Properties, and then click Advanced.

Match the IP addresses to those displayed in the DNS server Properties dialog box:

In DNS Manager, right-click the DNS server name, and then click Properties.


Compare the IP addresses with those from the Advanced IP Addressing dialog box. If there are no IP addresses configured in the Advanced IP Addressing dialog box or on the Interfaces tab of the Server Properties dialog box, enter the IP address of your network adapter. Use the IPCONFIG /ALL command to obtain your IP address.

Check the binding of the TCP/IP protocol to the network adapter:

Open Control Panel, and double-click Network.

Click the Bindings tab.

Verify that the bindings for all protocols to network adapters are enabled and that no broken connections exist in the stack.

408 DNS server could not open socket for address [IP address of server].

The DNS server could not open a socket with the current TCP/IP and DNS service configurations.

Verify that this is a valid IP address on this machine.

If the IP is not valid:

• Use the Interfaces dialog under Server Properties in the DNS Manager to remove it from the list of IP interfaces.

• Stop and restart the DNS server. (If this was the only IP interface on this machine, the DNS server may not have started as a result of this error. In that case, remove the DNS\PARAMETERS\LISTENADDRESS value in the services section of the registry and restart.)

If the IP is valid:

• Verify that no other application (for example, another DNS server) is running that would attempt to use the DNS port.

4000 The DNS server was unable to open Active Directory.

The DNS server is configured to obtain and use information from the directory for this zone and is unable to load the zone without it.

Check that Active Directory is functioning properly and reload the zone.

4001 The DNS server was unable to open zone domain name in Active Directory. This DNS server is configured to obtain and use information from the directory for this zone and is unable to load the zone without it.




Check that Active Directory is functioning properly and reload the zone.

4004 The DNS server is configured to use information obtained from Active Directory for this zone and is unable to load the zone without it.

Check that Active Directory is functioning properly and repeat enumeration of the zone.

4007 The DNS server was unable to open zone <zone name> in Active Directory from the application directory partition <partition name>. This DNS server is configured to obtain and use information from the directory for this zone and is unable to load the zone without it. Check that Active Directory is functioning properly and reload the zone. The event data is the error code.

4016 The DNS server timed out attempting an Active Directory service operation on <distinguished name>. Check Active Directory to see that it is functioning properly. The event data contains the error.

For more information on DNS Event log messages, see the following knowledge base articles:

DNS event messages 1 through 1614 in Windows Server 2003

http://support.microsoft.com/kb/884114/

DNS event messages 1616 through 6702 in Windows Server 2003

http://support.microsoft.com/kb/842006/


Event and Debug Logging

Event and Debug Logging Tabs

The GUI has been updated to make it much easier to configure DNS logging for troubleshooting purposes.

Enable filtering based on the IP address To provide additional filtering of the packets to be logged (that is, packets sent from specific IP addresses to the DNS server, or from the DNS server to specific IP addresses), select the Filter packets by IP address check box and click Filter to display the Filter dialog box. If this check box is cleared, the DNS server logs all packets regardless of IP address.

Event Logging Tab Administrators can control the level of event logging for Windows DNS. This makes it easier to keep unnecessary “warnings” from appearing in the event log. It also makes it possible to suppress all DNS errors when the administrator expects known events to recur over a long period.




DNSLINT

• Download from Microsoft.com

• Command-line utility

• Can run numerous tests, including verifying that AD replication records are in place and tracing lame delegation

• Creates reports of results in HTML format

• Review the documentation downloaded with the tool, or KB 321045

DNSLINT is a Microsoft Windows utility that helps to diagnose common DNS name resolution issues. DNSLINT aids in verifying DNS records and provides an output in HTML format.

• DNSLINT /d diagnoses potential causes of "lame delegation" and other related DNS problems.

• DNSLINT /ad verifies DNS records specifically used for Active Directory replication.

DNSLINT /ad /s Running this command will verify that the records required for Active Directory replication are present on all DNS servers hosting _msdcs.<forestrootdnsdomain>.com. Recall that all domain controllers register a GUID CNAME record that must be resolvable for replication to occur. This command will enumerate all DCs in your forest by querying the DC you specify, then contacting a DNS server that is authoritative for the _msdcs.<forestrootdnsdomain>.com to ensure that for each DC, the CNAME can be resolved to both the FQDN and the corresponding IP address.

DNSLint Report

System Date: Tue Oct 31 16:26:26 2006


Command run:

dnslint /ad <DC IP ADDR> /s <DNS SERVER IP>

Root of Active Directory Forest:

contoso.com

Active Directory Forest Replication GUIDs Found:

DC: rootdc

GUID: 5e50847e-247d-4977-841a-e6fcd80462e9

Total GUIDs found: 1

--------------------------------------------------------------------------------

The following 1 DNS servers were checked for records related to AD forest replication:

DNS server: User Specified DNS Server

IP Address: 11.1.1.10

UDP port 53 responding to queries: YES

TCP port 53 responding to queries: Not tested

Answering authoritatively for domain: Unknown

SOA record data from server:

Authoritative name server: rootdc.contoso.com

Hostmaster: hostmaster.contoso.com

Zone serial number: 23




Zone expires in: 1.00 day(s)

Refresh period: 900 seconds

Retry delay: 600 seconds

Default (minimum) TTL: 3600 seconds

Additional authoritative (NS) records from server:

rootdc1.contoso.com 11.1.1.10

Alias (CNAME) and glue (A) records for forest GUIDs from server:

CNAME: 5e50847e-247d-4977-841a-e6fcd80462e9._msdcs.contoso.com

Alias: rootdc.contoso.com

Glue: 11.1.1.10

Total number of CNAME records found on this server: 1

Total number of CNAME records missing on this server: 0

Total number of glue (A) records this server could not find: 0
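The alias names in the report above follow the pattern <GUID>._msdcs.<forest root domain>. As a minimal sketch, the name for a given DC can be assembled in Python so that it can be compared against the report or passed to nslookup. The function name dsa_guid_cname is hypothetical and is not part of DNSLINT; the GUID and domain are the sample values from the report.

```python
import uuid


def dsa_guid_cname(dsa_guid: str, forest_root: str) -> str:
    """Return the alias name <guid>._msdcs.<forest root> for one DC.

    Hypothetical helper for illustration; not part of DNSLINT.
    """
    # uuid.UUID rejects malformed GUIDs and normalises the text
    # to the lower-case hyphenated form used in the report.
    guid = str(uuid.UUID(dsa_guid))
    return f"{guid}._msdcs.{forest_root.strip('.').lower()}"


cname = dsa_guid_cname("5E50847E-247D-4977-841A-E6FCD80462E9", "contoso.com")
print(cname)
```

The resulting name is the one that must resolve, first to the DC's FQDN and then to its IP address, for replication to work.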

DNSLINT /D CONTOSO.COM Running DNSLINT /D CONTOSO.COM will verify the DNS records for the domain called contoso.com. DNSLINT will connect to www.internic.net and determine the IP addresses of the DNS servers that are supposed to be authoritative for contoso.com. It will then contact each DNS server in the list and document the various DNS records that each server has regarding the domain. It adds new authoritative DNS servers to the list as they are found, and queries them accordingly.

After DNSLINT has collected all of the DNS record data, it processes the data and generates a report in HTML format. The default name of the report is dnslint.htm and is created in the current directory from where DNSLINT was executed. The user can specify the name and location of the report.


DCDIAG /test:DNS

• Powerful command to verify the overall health of DNS with respect to Active Directory

• Commands can be run individually against a single domain controller, or against all domain controllers in the enterprise

• Verify domain controller registration of A, SRV, and CNAME records

• Verify DNS server configuration (forwarders, delegations, root hints, etc.)

• Gathers basic DNS resolver configuration data such as MAC address, IP address, and default gateway

The Windows Server 2003 SP1 version of DCDIAG includes several DNS-related tests which can be run individually, or all at once. These tests may be performed on one or all DCs in an enterprise. When the tests have completed, DCDIAG presents a summary of the results, along with detailed information for each DC tested. The data below contains the configurations which may trigger DCDIAG to report warnings or errors for each of the DNS sub-tests:

Connectivity Test

• Mandatory test which executes automatically before any other DCDIAG test is executed.

• Determines whether DCs are registered in DNS, can be pinged, and have LDAP/RPC connectivity.

• If the connectivity test fails on a given DC, no other tests are run against that domain controller.

Basic DNS Test (/DnsBasic)

• Confirms that the following essential services are running and available on Domain Controllers tested by DCDIAG:

• DNS client service

• NETLOGON service

• KDC service

• DNS Server Service (if DNS is installed on the DC).

• Confirms network connectivity for each DC by verifying that DNS servers on all adapters are reachable.

• Confirms that the A record of each DC is registered on at least one of the DNS servers configured on the client.

• If a Domain Controller is running the DNS Server service, confirms that the Active Directory domain zone and SOA record for the Active Directory domain zone are present.

• Checks if the root (.) zone is present.

Forwarder test (/DNSFORWARDERS)

• Only runs if the DC being tested is running the Microsoft DNS Server service.

• Determines whether recursion is enabled.

• If forwarders or root hints are configured, the forwarder test confirms that all forwarders or root hints on the DNS server are functioning, and also confirms that the _ldap._tcp.<Forest root domain> DC Locator record is resolved. (Resolution of the _ldap._tcp.<Forest root domain> DC Locator record is not attempted for forwarders or root hints configured on the forest root DC.)

Delegation test (/DNSDELEGATION)

• Only runs if the DC being tested is running the Microsoft DNS Server service.

• Confirms that the delegated name server is a functioning DNS Server.

• Checks for broken delegations by ensuring that all NS records in the Active Directory domain zone in which the target DC resides have corresponding glue A records.

Dynamic Update Test (/DNSDYNAMICUPDATE)

• Confirms that the Active Directory domain zone is configured for secure dynamic updates and performs registration of a test record (_dcdiag_test_record). (The test record is subsequently deleted.)

Record Registration Test (/DNSRECORDREGISTRATION)

• Verifies the registration of all essential DC Locator records on all DNS Servers configured on each adapter of the DCs: CNAME GUID, A, LDAP SRV, GC SRV, PDC SRV.

Sample Output

Domain Controller Diagnosis

Performing initial setup:

* Verifying that the local machine rootdc, is a DC.

* Connecting to directory service on server rootdc.


* Collecting site info.

* Identifying all servers.

* Identifying all NC cross-refs.

* Found 1 DC(s). Testing 1 of them.

Done gathering initial info.

Doing initial required tests

Testing server: Default-First-Site-Name\rootdc

Starting test: Connectivity

* Active Directory LDAP Services Check

* Active Directory RPC Services Check

......................... rootdc passed test Connectivity

Doing primary tests

Testing server: Default-First-Site-Name\rootdc

Test omitted by user request: Replications

Test omitted by user request: Topology

Test omitted by user request: CutoffServers

Test omitted by user request: NCSecDesc

Test omitted by user request: NetLogons

Test omitted by user request: Advertising

Test omitted by user request: KnowsOfRoleHolders

Test omitted by user request: RidManager

Test omitted by user request: MachineAccount

Test omitted by user request: Services

Test omitted by user request: OutboundSecureChannels

Test omitted by user request: ObjectsReplicated

Test omitted by user request: frssysvol




Test omitted by user request: frsevent

Test omitted by user request: kccevent

Test omitted by user request: systemlog

Test omitted by user request: VerifyReplicas

Test omitted by user request: VerifyReferences

Test omitted by user request: VerifyEnterpriseReferences

Test omitted by user request: CheckSecurityError

DNS Tests are running and not hung. Please wait a few minutes...

Running partition tests on : ForestDnsZones

Test omitted by user request: CrossRefValidation

Test omitted by user request: CheckSDRefDom

Running partition tests on : DomainDnsZones



Running partition tests on : Schema



Running partition tests on : Configuration



Running enterprise tests on : contoso.com

Test omitted by user request: Intersite

Test omitted by user request: FsmoCheck

Starting test: DNS

Test results for domain controllers:


DC: rootdc.contoso.com

Domain: contoso.com

TEST: Authentication (Auth)

Authentication test: Successfully completed

TEST: Basic (Basc)

Microsoft(R) Windows(R) Server 2003, Enterprise Edition (Service Pack level: 1.0) is supported

NETLOGON service is running

kdc service is running

DNSCACHE service is running

DNS service is running

DC is a DNS server

Network adapters information:

Adapter [00000001] Intel(R) PRO/100 VM Network Connection:

MAC address is 00:0B:CD:64:C2:FB

IP address is static

IP address: 157.54.160.14

DNS servers:

157.54.160.14 (rootdc.contoso.com.) [Valid]

The A record for this DC was found

The SOA record for the Active Directory zone was found

The Active Directory zone on this DC/DNS server was found (primary)

Root zone on this DC/DNS server was not found

TEST: Forwarders/Root hints (Forw)

Recursion is enabled

Forwarders Information:

209.133.38.7 (<name unavailable>) [Valid]




TEST: Delegations (Del)

Delegation information for the zone: contoso.com.

Delegated domain name: _msdcs.contoso.com.

DNS server: rootdc.contoso.com. IP:157.54.160.14 [Valid]

Delegated domain name: test1.contoso.com.

Error: DNS server: test1.test.contoso.com. IP:11.1.1.50 [Broken delegation]

TEST: Dynamic update (Dyn)

Dynamic update is enabled on the zone contoso.com.

Test record _dcdiag_test_record added successfully in zone contoso.com.

Test record _dcdiag_test_record deleted successfully in zone contoso.com.

TEST: Records registration (RReg)

Network Adapter [00000001] Intel(R) PRO/100 VM Network Connection:

Matching A record found at DNS server 157.54.160.14:

rootdc.contoso.com

Matching CNAME record found at DNS server 157.54.160.14:

5e50847e-247d-4977-841a-e6fcd80462e9._msdcs.contoso.com

Matching DC SRV record found at DNS server 157.54.160.14:

_ldap._tcp.dc._msdcs.contoso.com

Matching GC SRV record found at DNS server 157.54.160.14:

_ldap._tcp.gc._msdcs.contoso.com

Matching PDC SRV record found at DNS server 157.54.160.14:

_ldap._tcp.pdc._msdcs.contoso.com


Summary of test results for DNS servers used by the above domain controllers:

DNS server: 11.1.1.50 (test1.test.contoso.com.)

1 test failure on this DNS server

This is not a valid DNS server. PTR record query for the 1.0.0.127.in-addr.arpa. failed on the DNS server 11.1.1.50

[Error details: 1460 (Type: Win32 - Description: This operation returned because the timeout period expired.)]

Delegation is broken for the domain test1.contoso.com. on the DNS server 11.1.1.50

[Error details: 1460 (Type: Win32 - Description: This operation returned because the timeout period expired.) - Delegation is broken for the domain test1.contoso.com. on the DNS server 11.1.1.50]

DNS server: 157.54.160.14 (rootdc.contoso.com.)

All tests passed on this DNS server

This is a valid DNS server.

Name resolution is funtional. _ldap._tcp SRV record for the forest root domain is registered

Delegation to the domain _msdcs.contoso.com. is operational

DNS server: 209.133.38.7 (<name unavailable>)

All tests passed on this DNS server

This is a valid DNS server.

Summary of DNS test results:

Auth Basc Forw Del Dyn RReg Ext

________________________________________________________________

Domain: contoso.com

rootdc PASS PASS PASS FAIL PASS PASS n/a

......................... contoso.com failed test DNS
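When many domain controllers are tested, the summary table at the end of the output is the quickest place to spot failures. The following is a minimal sketch, assuming the header and row layout shown in the sample output above, that maps each FAIL entry back to its column. The function name failed_dns_tests is hypothetical; DCDIAG itself offers no such helper.

```python
from typing import Dict, List


def failed_dns_tests(header: str, row: str) -> Dict[str, List[str]]:
    """Map a DC name to the summary columns that report FAIL.

    Assumes the whitespace-separated layout of the sample output:
    a header of column names and one row per DC, name first.
    """
    columns = header.split()
    name, *results = row.split()
    failed = [col for col, res in zip(columns, results) if res == "FAIL"]
    return {name: failed}


header = "Auth Basc Forw Del Dyn RReg Ext"
row = "rootdc PASS PASS PASS FAIL PASS PASS n/a"
summary = failed_dns_tests(header, row)
print(summary)
```

For the sample row, only the Del (delegation) column is reported, which matches the broken delegation for test1.contoso.com listed earlier in the output.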




Module Summary

• DNS is critical to the functioning of Active Directory for authentication, replication, and domain controller locator services.

• Aging and Scavenging parameters must be approached carefully, but when implemented correctly, can prevent the problem of stale resource records in DNS.

• DNS Application partitions provide greater flexibility for the storage and replication of DNS zone data.

• Many troubleshooting tools exist to enable better control and monitoring of the DNS infrastructure.




Module 3: Client Logon












Module Overview


Introduction This lesson examines the principal causes of failure to log on to a Microsoft® Active Directory® service domain, along with how Microsoft clients attempt to locate services in Active Directory.

• Describe the logon process.

• Describe the domain controller location mechanism.

• Describe the Global Catalog and why it is required for logon.

• Utilize basic tools to determine the cause of common logon failures.




Section 1: Logon Failures


Introduction This lesson examines the principal causes of failure to log on to a Microsoft Active Directory service domain.

Objectives After completing this lesson, you will be able to:

• Describe the logon process.

• Describe the domain controller locator mechanism.

• Describe the Global Catalog and the requirement for it during logon.

• Describe the main causes of logon failure.

Related Topics Covered in This Lesson

• Domain Name System (DNS)

• Kerberos

• Active Directory replication

• Password policy

Recommended Reading

• Public Key Infrastructure for Windows Server 2003

• Kerberos Authentication Technical Reference

• Logon and Authentication Technologies (Windows Server 2003 Technical Reference)




Logon Process

[Figure: Logon Process. Components shown: Winlogon (SAS, GINA), LSA, Kerberos SSP, KDC, local SAM, and access token; messages AS_REQ, AS_REP, TGS_REQ, and TGS_REP; steps numbered 1 through 6.]

Exactly how the logon process works depends on how the computer is configured. With standard configurations of Microsoft Windows®, interactive users log on with a password. In another optional configuration of Microsoft Windows 2000® and later, users can log on with a smart card. Although the basic process is the same for both configurations, there are some differences.

When a user logs on to the network with a domain user and computer account, he or she begins by pressing the key combination Ctrl+Alt+Del, which is the Secure Attention Sequence (SAS) on computers with a standard configuration.

In response to the SAS, Winlogon switches to the logon desktop and dispatches to a DLL called the Graphical Identification and Authentication (GINA), a component loaded in Winlogon's process. GINA is responsible for collecting the logon data from the user, packaging it in a data structure, and sending it to the Local Security Authority (LSA) for verification. Third parties can develop replacement GINAs, but in this case Winlogon has loaded the standard component (Msgina.dll) supplied with the Windows operating system. MSGINA displays the standard logon dialog box.

When a user types his/her name and password, and clicks OK (or presses ENTER), MSGINA returns the logon information to Winlogon. Winlogon then sends the information to the Microsoft Windows NT® Local Security Authority (LSA) for validation by calling LsaLogonUser.




Upon receiving a data structure with a user’s logon data, the LSA immediately converts the plaintext password to a secret key by passing it through a one-way hashing function. It saves the result in the credentials cache, where the hashed password can be retrieved when it is needed for encryption and decryption.

To validate a user’s logon information and set up a logon session on the computer, the LSA must obtain the following:

• A ticket-granting ticket (TGT) that is good for admission to the ticket-granting service.

• A session ticket that is good for admission to the computer.

The LSA gets these tickets by working through the Kerberos Security Support Provider (SSP), which exchanges messages directly with the domain's Key Distribution Center (KDC).

The messages follow this sequence:

The LSA sends a KRB_AS_REQ message to the KDC's authentication service in the domain.

The message includes:

• The user's principal name.

• The name of the account domain.

• Preauthentication data encrypted with the secret key derived from the user's password.

The KDC's authentication service replies with a KRB_AS_REP message.

The message includes:

• A session key for the user to share with the KDC, encrypted with the secret key derived from the user’s password.

• A TGT for the KDC in the domain, encrypted with the KDC's secret key. The TGT includes a session key for the KDC to share with the user and authorization data for the user.

The authorization data includes the security identifier (SID) for the account, SIDs for security groups in the domain that include the user, and SIDs for universal groups in the enterprise that include either the user account or one of their domain groups.




The LSA sends a KRB_TGS_REQ message to the KDC's ticket-granting service in the domain.

The message includes:

• The name of the destination computer.

• The name of the destination computer's domain.

• The user’s TGT.

• An authenticator encrypted with the session key the user shares with the KDC.

The KDC replies with a KRB_TGS_REP message.

The message includes:

• A session key for the user to share with his/her local computer encrypted with the session key the user shares with the KDC.

• A session ticket to the computer encrypted with the secret key the computer shares with the KDC.

The session ticket includes a session key for the computer to share with the user and authorization data copied from the user’s TGT.

Upon receipt of the user session ticket, the LSA decrypts it with the computer's secret key and extracts the authorization data. It then queries the local Security Accounts Manager (SAM) database to determine whether the user is a member of any security groups local to the computer and whether the user has been given any additional user rights on the local computer. It adds any SIDs returned by this query to the list taken from the ticket's authorization data. The entire list is then used to build an access token. A handle to the access token is then returned to Winlogon, along with the identifier for the user’s logon session and confirmation that the logon information is valid.

Winlogon creates a window station and several desktop objects for the user, attaches the user’s access token, and starts the shell process the user will use to interact with the computer. Any application process started by the user during his/her logon session subsequently inherits this access token.
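The way the LSA assembles the SID list for the access token can be sketched as follows. This is an illustration only, not the LSA's actual implementation: the AccessToken class and build_access_token function are hypothetical, and the domain SIDs are invented sample values (S-1-5-32-545 is the well-known SID of the built-in Users group).

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class AccessToken:
    """Simplified stand-in for an access token: user SID plus group SIDs."""
    user_sid: str
    group_sids: Tuple[str, ...]


def build_access_token(user_sid: str,
                       ticket_group_sids: Iterable[str],
                       local_group_sids: Iterable[str]) -> AccessToken:
    """Combine SIDs from the ticket's authorization data with local SAM SIDs."""
    # dict.fromkeys keeps first-seen order and drops duplicate SIDs.
    merged = dict.fromkeys([*ticket_group_sids, *local_group_sids])
    return AccessToken(user_sid, tuple(merged))


# Invented sample SIDs for illustration.
token = build_access_token(
    user_sid="S-1-5-21-1111-2222-3333-1105",
    ticket_group_sids=["S-1-5-21-1111-2222-3333-513"],    # from the session ticket
    local_group_sids=["S-1-5-32-545", "S-1-5-32-545"],    # from the local SAM query
)
print(token.group_sids)
```

Because the token is built once at logon, group membership changes made afterward are not reflected until the user logs on again and a new token is built.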




Domain Controller Locator Process

• Client initiates process

• Client collects information for NETLOGON

• NETLOGON calls locator: IP/DNS compatible, or Microsoft Windows NT 4.0 compatible

• NETLOGON “pings” the names returned

• Each domain controller responds to the ping

• NETLOGON returns first successful response

• NETLOGON caches this information

Each Active Directory-based domain controller registers its DNS records on the DNS server and registers its NetBIOS name by using a transport-specific mechanism (for example, WINS). Therefore, a DNS client locates a domain controller by querying DNS, and a NetBIOS client locates a domain controller by querying the appropriate transport-specific name service. Because the code for the Windows IP/DNS-compatible Locator and the Microsoft Windows NT version 4.0–compatible Locator is shared, both DNS clients and NetBIOS clients are supported.

The following sequence describes how the Locator is able to find a domain controller:

On the client (the computer locating the domain controller), the Locator is initiated as a remote procedure call (RPC) to the local Net Logon service. The Locator application programming interface (API) (DsGetDcName) is implemented by the Net Logon service.

The client collects the information that is needed to select a domain controller and passes the information to the Net Logon service by using the DsGetDcName API, which is discussed in a later topic.




The Net Logon service on the client uses the collected information to look up a domain controller for the specified domain in one of two ways:

For a DNS name, Net Logon queries DNS by using the IP/DNS compatible Locator—that is, DsGetDcName calls the DnsQuery API to read the Service Resource (SRV) records and A records from DNS, after it appends an appropriate string to the front of the domain name that specifies the SRV record.

A workstation that is logging on to an Active Directory domain queries DNS for SRV records in the general form:

_service._protocol.DnsDomainName

Active Directory servers offer the LDAP service over the TCP protocol; therefore, clients find an LDAP server by querying DNS for a record of the form:

_ldap._tcp.DnsDomainName

For a NetBIOS name, Net Logon performs domain controller discovery by using the Windows NT 4.0–compatible Locator, that is, by using the transport-specific mechanism (for example, WINS).

Note: In Windows NT 4.0 and earlier, “discovery” is a process for locating a domain controller for authentication in either the primary domain or a trusted domain.

The Net Logon service sends a datagram to (that is, pings) the computers that registered the name. For NetBIOS domain names, the datagram is implemented as a mailslot message. For DNS domain names, the datagram is implemented as a Lightweight Directory Access Protocol (LDAP) User Datagram Protocol (UDP) search. (UDP is the connectionless datagram transport protocol that is part of the TCP/IP protocol suite. TCP is a connection-oriented transport protocol.)

Note: UDP allows an application on one computer to send a datagram to an application on another computer. UDP includes a protocol port number, which allows the sender to distinguish among multiple destinations (applications) on the remote computer.

Each available domain controller responds to the datagram to indicate that it is currently operational and returns the information to DsGetDcName.

The Net Logon service returns the information to the client from the domain controller that responds first.




The Net Logon service caches the domain controller information so that subsequent requests need not repeat the discovery process. Caching this information encourages consistent use of the same domain controller and, thus, a consistent view of Active Directory.

When a client logs on or joins the network, the client must be able to locate a domain controller. The client sends a DNS Lookup query to DNS to find domain controllers, preferably in the client's own subnet. Therefore, clients find a domain controller by querying DNS for a record of the form:

_LDAP._TCP.dc._msdcs.domainname
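The record names used by the Locator can be assembled mechanically from the service, the protocol, and the DNS domain name. The following is a minimal sketch of that naming rule; the function srv_query_name and its dc_only parameter are hypothetical names chosen for illustration and are not part of DsGetDcName.

```python
def srv_query_name(service: str, protocol: str, dns_domain: str,
                   dc_only: bool = False) -> str:
    """Build an SRV owner name of the form _service._protocol.DnsDomainName.

    With dc_only=True, the dc._msdcs labels are inserted so that the
    name matches records registered by domain controllers only.
    """
    labels = [f"_{service}", f"_{protocol}"]
    if dc_only:
        labels += ["dc", "_msdcs"]
    # Drop a trailing root dot so the labels join cleanly.
    labels.append(dns_domain.strip("."))
    return ".".join(labels)


print(srv_query_name("ldap", "tcp", "contoso.com"))
print(srv_query_name("ldap", "tcp", "contoso.com", dc_only=True))
```

DNS names are not case-sensitive, so _LDAP._TCP and _ldap._tcp refer to the same record.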

After the client locates a domain controller, the client establishes communication by using LDAP to gain access to Active Directory. As part of that negotiation, the domain controller identifies which site the client is in, based on the IP subnet of that client. If the client is communicating with a domain controller that is not in the closest (most optimal) site, the domain controller returns the name of the client's site.

If the client has already tried to find domain controllers in that site (for example, when the client sends a DNS Lookup query to DNS to find domain controllers in the client's own subnet), the client uses the domain controller that is not optimal. Otherwise, the client performs a site-specific DNS lookup again by using the name of the optimal site. The domain controller uses some of the directory service information for identifying sites and subnets.

After the client locates a domain controller, the domain controller entry is cached. If the domain controller is not in the optimal site, the client flushes the cache after 15 minutes and discards the cache entry. The client then attempts to find an optimal domain controller in its own site.
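The caching rule described above can be sketched as a small cache that discards an entry for a domain controller outside the client's site once 15 minutes have passed. This is a simplified model, not the Net Logon implementation: the class and method names are hypothetical, and the sketch assumes that an entry for a domain controller in the client's own site is simply kept.

```python
from dataclasses import dataclass
from typing import Optional

NON_OPTIMAL_TTL_SECONDS = 15 * 60  # 15 minutes, as stated in the text


@dataclass
class CachedDc:
    name: str
    in_client_site: bool
    cached_at: float  # seconds, on whatever clock the caller supplies


class DcCache:
    """Hypothetical single-entry cache for the located domain controller."""

    def __init__(self) -> None:
        self._entry: Optional[CachedDc] = None

    def store(self, name: str, in_client_site: bool, now: float) -> None:
        self._entry = CachedDc(name, in_client_site, now)

    def lookup(self, now: float) -> Optional[str]:
        """Return the cached DC name, or None if rediscovery is needed."""
        entry = self._entry
        if entry is None:
            return None
        expired = now - entry.cached_at >= NON_OPTIMAL_TTL_SECONDS
        if not entry.in_client_site and expired:
            # Non-optimal DC: flush so the client searches its own site again.
            self._entry = None
            return None
        return entry.name


cache = DcCache()
cache.store("dc-remote.contoso.com", in_client_site=False, now=0.0)
print(cache.lookup(now=60.0))    # still cached after one minute
print(cache.lookup(now=900.0))   # flushed after 15 minutes
```

Passing the current time in as a parameter keeps the sketch easy to test without waiting for real time to elapse.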

After the client has established a communications path to the domain controller, the client can establish its logon and authentication credentials and, if necessary for Windows-based computers, set up a secure channel. The client then is ready to perform normal queries and search for information against the directory.

The client establishes an LDAP connection to a domain controller to log on. The logon process uses Security Accounts Manager (SAM). Because the communications path uses the LDAP interface and the client is authenticated by a domain controller, the client account is verified and passed through SAM to the directory service agent, then to the database layer, and finally to the database in the extensible storage engine (ESE).

To troubleshoot the domain locator process:

Check Event Viewer to see whether the event logs contain any error information. On both the client and the server, check the System log for failures during the logon process. Also, check the Directory Service logs on the server and the DNS logs on the DNS server.




Check the IP configuration by running the ipconfig /all command at a command prompt. Verify that the configuration is correct for the network.

Use the Ping utility to verify network connectivity and name resolution. Ping both the IP address and the server name.

In Microsoft Windows XP and later, use the Network Diagnostics tool in Help and Support, under "Use Tools to view your computer information and diagnose problems", to determine whether the network components are correctly installed and working properly. Network Diagnostics also runs some tests and reports information about the network configuration that can be helpful.

In Windows 2000, the network diagnostics tools are only available at the command line, for example, netdiag /v and netdiag /fix.

Use the nltest /dsgetdc:<domainname> command to verify that a domain controller can be located for a specific domain.

Use the NSLookup tool to verify that DNS entries are correctly registered in DNS. Verify that the server host records and GUID CNAME records can be resolved.

For example, to verify record registration, use the following commands:

nslookup <server_name>.<child_of_root_domain>.<root_domain>.com

nslookup guid._msdcs.<root_domain>.com

If either of these commands does not succeed, use one of the following methods to reregister records with DNS:

To force host record registration, type ipconfig /registerdns at a command prompt.

To force domain controller service registration, stop and then restart the Net Logon service.

To verify appropriate LDAP connectivity, use the Ldp.exe tool to connect and bind to the domain controller.

If you suspect that a particular domain controller has problems, turn on Netlogon debug logging. Use the NLTest utility by typing nltest /dbflag:2080ffff at a command prompt. The information is logged in the Debug folder in the Netlogon.log file.

If you still have not isolated the problem, use Network Monitor to monitor network traffic between the client and the domain controller.
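The command-line checks in the steps above can be collected into a small script so that they are run in the same order every time. The following is a sketch under stated assumptions: the function names are hypothetical, the sample host, address, and GUID values are illustrative, and run_checks only works on a Windows computer where ipconfig, ping, nltest, and nslookup are on the path.

```python
import subprocess
from typing import List, Tuple


def locator_check_commands(domain: str, dc_fqdn: str, dc_ip: str,
                           dsa_guid: str, forest_root: str) -> List[List[str]]:
    """Return the checks from the steps above as argument lists, in order."""
    return [
        ["ipconfig", "/all"],                               # IP configuration
        ["ping", dc_ip],                                    # connectivity
        ["ping", dc_fqdn],                                  # name resolution
        ["nltest", f"/dsgetdc:{domain}"],                   # DC location
        ["nslookup", dc_fqdn],                              # host record
        ["nslookup", f"{dsa_guid}._msdcs.{forest_root}"],   # GUID record
    ]


def run_checks(commands: List[List[str]]) -> List[Tuple[str, int]]:
    """Run each command and return (command line, exit code) pairs.

    Raises FileNotFoundError if a tool is not installed.
    """
    results = []
    for cmd in commands:
        completed = subprocess.run(cmd, capture_output=True, text=True)
        results.append((" ".join(cmd), completed.returncode))
    return results


# Illustrative sample values; nothing is executed here.
commands = locator_check_commands(
    domain="contoso.com",
    dc_fqdn="rootdc.contoso.com",
    dc_ip="157.54.160.14",
    dsa_guid="5e50847e-247d-4977-841a-e6fcd80462e9",
    forest_root="contoso.com",
)
for cmd in commands:
    print(" ".join(cmd))
```

Keeping command construction separate from execution makes it possible to review the list before anything is run against a production domain controller.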




Domain Controller Detection

[Figure: Domain Controller Detection. Computers and DCs grouped into Site 1, Site 2, and Site 3, spanning Domain 1 and Domain 2.]

A client computer needs to get the latest configuration status during each startup phase. Therefore, it has to locate at least one controller in its domain.

In a Windows 2000 or later domain, each controller is also an LDAP server. To retrieve a list of available controllers, the client can query the DNS for SRV resource records with the name _ldap._tcp.dc._msdcs.DnsDomainName.

The following frames show an example of this.

Frame  Source      Destination  Protocol  Description
1      Client      DNS Server   DNS       0x1:Std Qry for _ldap._tcp.dc._msdcs.contoso.com. of type Srv Loc on class INET addr.
2      DNS Server  Client       DNS       0x1:Std Qry Resp. for _ldap._tcp.dc._msdcs.contoso.com. of type Srv Loc on class INET addr.

The Windows domain is an administrative boundary, which is independent from the structure of a given network. The computers in a given environment can be grouped into sites. A site in Active Directory is defined as a set of IP subnets connected by fast, reliable connectivity. As a rule of thumb, networks with LAN speed or better are considered fast networks for the purposes of defining a site. A domain can span multiple sites and multiple domains can cover a site.




The locator service of a client attempts to find the closest site during the startup process, and stores the name of the site in the registry. If a client knows which site it belongs to, it can send a query for controllers in its site to the DNS server. The format of such a query is:

_ldap._tcp.Default-First-Site-Name._sites.dc._msdcs.DnsDomainName

The following frames show an example of this.

Frame  Source      Destination  Protocol  Description
1      Client      DNS Server   DNS       0x1:Std Qry for _ldap._tcp.Default-First-Site-Name._sites.dc._msdcs.contoso.com. of type Srv Loc on class INET addr.
2      DNS Server  Client       DNS       0x1:Std Qry Resp. for _ldap._tcp.Default-First-Site-Name._sites.dc._msdcs.contoso.com. of type Srv Loc on class INET addr.

The DNS query above shows the client looking for the LDAP service in the site named "Default-First-Site-Name". Default-First-Site-Name is the name given to the first site, which is created automatically when Active Directory is installed.
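The site-specific and domain-wide query names differ only in the site labels. As a minimal sketch, the names a client might try can be generated as follows. The function dc_srv_queries is hypothetical, and the ordering (site-specific name first, domain-wide name as a fallback) is an assumption made for illustration rather than a statement of the Locator's exact behavior.

```python
from typing import List, Optional


def dc_srv_queries(dns_domain: str, site: Optional[str] = None) -> List[str]:
    """Return candidate SRV names for finding a domain controller.

    If the client knows its site, the site-specific name comes first;
    the domain-wide name is always included last.
    """
    domain = dns_domain.strip(".")
    queries = []
    if site:
        queries.append(f"_ldap._tcp.{site}._sites.dc._msdcs.{domain}")
    queries.append(f"_ldap._tcp.dc._msdcs.{domain}")
    return queries


for name in dc_srv_queries("contoso.com", site="Default-First-Site-Name"):
    print(name)
```

Either name can be checked by hand with nslookup by setting the query type to SRV.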

The domain controllers registered for a site can be viewed with the Microsoft Management Console DNS snap-in. If it is possible for the DNS server to locate the requested information, it sends back a list of all known domain controllers in the site.

DNS: Answer section: _ldap._tcp.Site2._sites.dc._msdcs.contoso.com. of type Srv Loc on class INET addr.(2 records present)
DNS: Resource Record: rootdc.contoso.com. of type Host Addr on class INET addr.
DNS: Resource Record: Rootdc01.contoso.com. of type Host Addr on class INET addr.

The client picks one of the returned controllers at random for further communication. It does not distinguish between local and remote subnets, because it considers every member of its site to be reasonably close to the client.

As already mentioned, the site concept makes it possible to influence controller selection. After retrieving a domain controller, the client uses LDAP queries to determine whether that controller is the closest one.




Frame Source Destination Protocol Description

1 Client Server LDAP ProtocolOp: SearchRequest (3)

2 Server Client LDAP ProtocolOp: SearchResponse (4)

3 Client Server LDAP ProtocolOp: SearchRequest (3)


In the query, the client requires a match for attributes such as:

• DNS domain name

• Host name

• Domain globally unique identifier (GUID)

• Domain SID

If the controller has exactly this information in its Active Directory database, it passes back information about itself such as:

• DomainControllerName

• DomainControllerAddress

• DomainControllerAddressType

• DomainGUID

• DomainName

• DNSForestName

• DCSiteName

• ClientSiteName

The most important information for the client is the site name. The hex dump of the response from the server will contain only one site name if the client is a member of the controller's site:

00000: 00 A0 C9 F1 A0 00 00 01 02 33 BF E7 08 00 45 00 . Éñ ....3¿ç..E.
00010: 00 D4 E9 90 00 00 80 11 00 00 0A 00 00 16 0A 00 .Ôé...€.........
00020: 00 18 01 85 04 04 00 C0 B6 57 30 84 00 00 00 9C ...…...À¶W0„...œ
00030: 02 01 02 64 84 00 00 00 93 04 00 30 84 00 00 00 ...d„..."..0„...
00040: 8B 30 84 00 00 00 85 04 08 6E 65 74 6C 6F 67 6F ‹0„...…..netlogo
00050: 6E 31 84 00 00 00 75 04 73 17 00 00 00 FD 01 00 n1„...u.s....ý..
00060: 00 48 44 82 88 4E 79 85 47 A8 CA 16 1D 55 23 B2 .HD‚ˆNy…G¨Ê..U#²
00070: E0 06 64 63 63 6C 61 62 05 6C 6F 63 61 6C 00 C0 à.contoso.com.À.
00080: 18 08 64 63 63 6C 61 62 32 32 C0 18 06 44 43 43 .rootdcÀ..CONTOS
00090: 4C 41 42 00 08 44 43 43 4C 41 42 32 32 00 09 44 O..ROOTDC..ROOTD
000A0: 43 43 4C 41 42 32 34 24 00 17 44 65 66 61 75 6C C01$..Default-Fi
000B0: 74 2D 46 69 72 73 74 2D 53 69 74 65 2D 4E 61 6D rst-Site-Name.ÀP
000C0: 65 00 C0 50 05 00 00 00 FF FF FF FF 30 84 00 00 ....ÿÿÿÿ0„.......
000D0: 00 10 02 01 02 65 84 00 00 00 07 0A 01 00 04 00 .....e„.........
000E0: 04 00 ..

If the client is communicating with a controller that is not in the client's site, the controller will also pass back the name of the client's proper site:

00000: 00 20 78 E0 AA 2B 00 20 78 01 80 69 08 00 45 00 . xàª+. x.€i..E.
00010: 00 C9 FD A8 00 00 7F 11 28 64 0A 00 00 16 0B 00 .Éý¨...(d......
00020: 00 02 01 85 04 03 00 B5 C8 55 30 84 00 00 00 91 ...…...μÈU0„...'
00030: 02 01 01 64 84 00 00 00 88 04 00 30 84 00 00 00 ...d„...ˆ..0„...
00040: 80 30 84 00 00 00 7A 04 08 6E 65 74 6C 6F 67 6F €0„...z..netlogo
00050: 6E 31 84 00 00 00 6A 04 68 17 00 00 00 7D 01 00 n1„...j.h....}..
00060: 00 48 44 82 88 4E 79 85 47 A8 CA 16 1D 55 23 B2 .HD‚ˆNy…G¨Ê..U#²
00070: E0 06 64 63 63 6C 61 62 05 6C 6F 63 61 6C 00 C0 à.contoso.com.À.
00080: 18 08 64 63 63 6C 61 62 32 32 C0 18 06 44 43 43 ..rootdcÀ..CONTO
00090: 4C 41 42 00 08 44 43 43 4C 41 42 32 32 00 0B 44 SO..ROOTDC..ROOT
000A0: 43 43 52 4F 55 54 45 52 32 24 00 05 53 69 74 65 DC01$..Site2...
000B0: 32 00 05 53 69 74 65 31 00 05 00 00 00 FF FF FF Site1.....ÿÿÿÿ0
000C0: FF 30 84 00 00 00 10 02 01 01 65 84 00 00 00 07 „.......e„.....
000D0: 0A 01 00 04 00 04 00 .......

In this case, the client sends another query to the DNS server asking for the list of controllers in its proper site. The following table shows an example of this. The client first looks for a domain controller in Site2; after the LDAP searches, which verify items such as the DNS domain name and domain SID, it switches to Site1.


Frame Source Destination Protocol Description

1 Client DNS Server DNS 0x1:Std Qry for _ldap._tcp.Site2._sites.dc._msdcs.contoso.com.

2 DNS Server Client DNS 0x1:Std Qry Resp. for _ldap._tcp.Site2._sites.dc._msdcs.contoso.com.

3 Client DC Server LDAP ProtocolOp: SearchRequest (3)

4 DC Server Client LDAP ProtocolOp: SearchResponse (4)

5 Client DC Server LDAP ProtocolOp: SearchRequest (3)

6 DC Server Client LDAP ProtocolOp: SearchResponse (4)

7 Client DNS Server DNS 0x2:Std Qry for _ldap._tcp.Site1._sites.dc._msdcs.contoso.com.

8 DNS Server Client DNS 0x2:Std Qry Resp. for _ldap._tcp.Site1._sites.dc._msdcs.contoso.com.

It is not necessary to have a domain controller in each site. Each domain controller checks all sites in a forest and the replication cost. A domain controller registers itself in any site that does not have a domain controller for its domain and for which its site has the lowest cost connection. This process is also known as automatic site coverage. As a result, a client in a site without a domain controller uses a domain controller from the site that has the lowest-cost connection to the client's own site.




The default location process of the closest domain controller consists of several network packets and creates around 2,000 bytes of traffic.


1 Client Server ARP_RARP ICMP Echo: From 10.00.00.24 To 10.00.00.22

2 Server Client ARP_RARP ICMP Echo Reply: To 10.00.00.24 From 10.00.00.22 10.0.0.22 10.0.0.24

3 Client Server DNS 0x1:Std Qry for _ldap._tcp.Default-First-Site-Name._sites.dc._msdcs.contoso.com.

4 Client Server DNS 0x2:Std Qry for _ldap._tcp.Default-First-Site-Name._sites.dc._msdcs.contoso.com.

5 Server Client DNS 0x1:Std Qry Resp. for _ldap._tcp.Default-First-Site-Name._sites.dc._msdcs.contoso.com.

6 Server Client DNS 0x2:Std Qry Resp. for _ldap._tcp.Default-First-Site-Name._sites.dc._msdcs.contoso.com.





11 Client Server ARP_RARP ICMP Echo: From 10.00.00.24 To 10.00.00.22

12 Client Server ARP_RARP ICMP Echo Reply: To 10.00.00.24 From 10.00.00.22 10.0.0.22 10.0.0.24




Finding a Domain Controller in the Closest Site (1)

Search for a site-specific DNS record before searching for a DNS record that is not site-specific

Active Directory site and subnet objects

Objects for entire forest are stored in the configuration container

Mapping IP addresses to site names


During a search for a domain controller, the Locator attempts to find a domain controller in the site closest to the client. When the domain that is being sought is an Active Directory domain, the domain controller uses the information stored in Active Directory to determine the closest site. When the domain being sought is a Windows NT 4.0 domain, domain controller discovery occurs when the client starts and uses the first domain controller that it finds.

Each Active Directory–based domain controller registers DNS records that indicate the site where the domain controller is located. The site name (the relative distinguished name of the site object in Active Directory) is registered in several records so that the various roles the domain controller might perform, such as Global Catalog or Kerberos server, can be associated with the domain controller's site. When DNS is used, the Locator searches first for a site-specific DNS record before it begins to search for a DNS record that is not site-specific (thereby preferentially locating a domain controller in that site).

A client computer stores its own site information in the registry, but the computer is not necessarily located physically in the site associated with its IP address. For example, a portable computer that was moved to a new location contacts a domain controller in its home site, which is not the site to which the computer is currently connected. In this situation, the domain controller looks up the client site on the basis of the client IP address by comparing the address with the sites that are identified in Active Directory, and returns the name of the site that is closest to the client. The client then updates the information in the registry.

The domain controller stores site information for the entire forest in the Configuration container. The domain controller uses the site information to check the IP address of the client computer against the list of subnets in the forest. In this way, the domain controller ascertains the name of the site in which the client is assumed to be located or the site that is the closest match, and returns this information to the client.

Active Directory Site and Subnet Objects

In Active Directory, a site is defined by a site object in the cn=Sites,cn=Configuration,dc=ForestRootDomain container. A subnet is an addressed segment within a site and is represented by an object in the cn=Subnets,cn=Sites,cn=Configuration,dc=ForestRootDomain container.

The site in which a domain controller is located is identified in the Configuration container by the domain controller object that is located within the cn=Servers container beneath the site object for a particular site. A domain controller can identify the site of a client by using the subnet object in the Sites container. Each subnet object has a siteObject property ("attribute") that links it to a site object; the value of the siteObject property is the distinguished name of the site object. This link enables a domain controller to identify clients that have an IP address in the specified subnet as being in the specified site.

Subnet names in Active Directory take the form "network/bits masked" (for example, the subnet object 172.16.72.0/22 has a subnet of 172.16.72.0 and a 22-bit subnet mask). If this subnet had a siteObject property value that contained the distinguished name of the East site object, all IP addresses in the 172.16.72.0/22 subnet would be considered to be in the East site. The siteObject property is a single value, which implies that a single subnet maps to a single site. However, multiple subnet objects can be linked to the same site object. The directory administrator manually creates subnet objects and, hence, the siteObject property value.

The Configuration container (including all of the site and subnet objects in it) is replicated to all domain controllers in the forest. Therefore, any domain controller in the forest can identify the site in which a client is located, compare it to the site in which the domain controller is located, and indicate to the client whether that domain controller's site is the closest site to the client.
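The subnet-to-site link described above can be pictured as a lookup table keyed by subnet. The following is a minimal sketch using Python's standard `ipaddress` module, not the actual Net Logon data structure. The function names and the "West" site are hypothetical, and the sketch assumes that "closest match" means the matching subnet with the longest prefix.

```python
import ipaddress


def build_subnet_table(subnet_to_site):
    """Turn {"network/bits": site_name} into a list of (network, site) pairs."""
    return [(ipaddress.ip_network(subnet), site)
            for subnet, site in subnet_to_site.items()]


def site_for_client(table, client_ip):
    """Return the site of the most specific subnet containing client_ip.

    Returns None when no subnet object matches the address.
    Assumption: the most specific match is the one with the longest prefix.
    """
    address = ipaddress.ip_address(client_ip)
    matches = [(network, site) for network, site in table if address in network]
    if not matches:
        return None
    network, site = max(matches, key=lambda match: match[0].prefixlen)
    return site


# Each subnet maps to exactly one site; several subnets may share a site.
table = build_subnet_table({
    "172.16.72.0/22": "East",
    "172.16.0.0/16": "West",   # hypothetical broader subnet
})
print(site_for_client(table, "172.16.73.5"))
print(site_for_client(table, "172.16.1.1"))
print(site_for_client(table, "10.0.0.1"))
```

Because a dictionary key can hold only one value, the table also reflects the single-valued siteObject property: one subnet, one site.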

Mapping IP Addresses to Site Names

During Net Logon startup, the Net Logon service on each domain controller enumerates the site objects in the Configuration container. Net Logon on each domain controller is also notified of any changes made to the site objects. Net Logon uses the site information to build an in-memory structure that is used to map IP addresses to site names.

When a client that is searching for a domain controller receives the list of domain controller IP addresses from DNS, the client begins querying the domain controllers in turn to find out which domain controller is available and appropriate. Active Directory intercepts the query, which contains the IP address of the client, and passes it to Net Logon on the domain controller. Net Logon looks up the client IP address in its subnet-to-site mapping table by finding the subnet object that most closely matches the client IP address and then returns the following information:

• The name of the site in which the client is located, or the site that most closely matches the client IP address.

• The name of the site in which the current domain controller is located.

• A bit that indicates whether the found domain controller is located (bit is set) or not located (bit is not set) in the site closest to the client.

The domain controller returns the information to the client. The response also contains various other pieces of information that describe the domain controller. The client inspects the information to determine whether to try to find a better domain controller. The decision is made as follows:

• If the returned domain controller is in the closest site (the returned bit is set), the client uses this domain controller.

• If the client has already tried to find a domain controller in the site in which the domain controller claims the client is located, the client uses this domain controller.

• If the domain controller is not in the closest site, the client updates its site information and sends a new DNS query to find a new domain controller in the site. If the second query is successful, the new domain controller is used. If the second query fails, the original domain controller is used.
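The decision rules above can be sketched as a small function. This is an illustrative model, not the Locator's code: the `DcResponse` type, its field names, and `next_step` are hypothetical. The handling of a missing client site follows the "Clients with No Apparent Site" behavior described later in this section.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass
class DcResponse:
    """Hypothetical summary of the fields the client inspects."""
    dc_name: str
    dc_site: str
    client_site: Optional[str]   # None models a NULL site name
    in_closest_site: bool        # the "closest site" bit


def next_step(response: DcResponse, sites_tried: Set[str]) -> Tuple[str, str]:
    """Return ("use", dc_name) or ("requery", site_name)."""
    if response.in_closest_site:
        return ("use", response.dc_name)
    if response.client_site is None:
        # No subnet matched the client address: keep the returned DC.
        return ("use", response.dc_name)
    if response.client_site in sites_tried:
        # The client already searched the site the DC reports.
        return ("use", response.dc_name)
    # Update the site and send a new DNS query for that site.
    return ("requery", response.client_site)


reply = DcResponse("rootdc.contoso.com", "Site2", "Site1", in_closest_site=False)
print(next_step(reply, sites_tried={"Site2"}))
```

If the follow-up query for the reported site fails, the caller would fall back to the domain controller from the first response, as the last rule states.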

If the domain that is being queried by a computer is the same as the domain to which the computer is joined, the site in which the computer resides (as reported by a domain controller) is stored in the computer registry. The client stores this site name in the DynamicSiteName registry entry.

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters




Therefore, the DsGetSiteName API returns the site in which the computer is located. Never change dynamically determined values. To override the dynamic site name, add the SiteName entry with the REG_SZ data type in the above key. When a value is present for the SiteName entry, the DynamicSiteName entry is not used.
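The precedence between the two registry entries can be sketched as follows. This is a simplified model that takes the values of the Parameters key as a plain dictionary rather than reading the registry; the function name `effective_site_name` is hypothetical.

```python
def effective_site_name(parameters):
    """Return the site name a client would use, given the Parameters values.

    `parameters` is a dict standing in for the registry key; a configured
    SiteName entry overrides the dynamically determined DynamicSiteName.
    """
    static_site = parameters.get("SiteName")
    if static_site:
        return static_site
    return parameters.get("DynamicSiteName")


print(effective_site_name({"DynamicSiteName": "Site1"}))
print(effective_site_name({"DynamicSiteName": "Site1", "SiteName": "Site2"}))
```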

If the domain being located is the same as the domain to which the computer is joined and the computer has not physically moved to a different site since the last query, the dynamically determined site name in the registry is the actual site in which the computer is located. As such, the client finds a domain controller in the correct site without having to retry the operation. If the site name in the registry is not the current site of the computer (for example, if the computer is portable), the domain controller location process serves to update the site information in the registry.





Automatic site coverage

Determining site coverage on the basis of cost

Site coverage algorithm

Cache time-out and closest site

Clients with no apparent site


Automatic Site Coverage

There is not necessarily a domain controller in every site. For various reasons, it is possible that no domain controller exists for a particular domain at the local site. By default, each domain controller checks all sites in the forest and then checks the replication cost matrix. A domain controller advertises itself (registers a site-related SRV record in DNS) in any site that does not have a domain controller for that domain and for which its site has the lowest-cost connections. This process ensures that every site has a domain controller that is defined by default for every domain in the forest, even if a site does not contain a domain controller for that domain. The domain controllers that are published in DNS are those from the closest site (as defined by the replication topology).

For example, given one domain and three sites, a domain controller for that domain might be located in two of the sites, but there might be no domain controller for the domain in the third site. Placing a domain controller for the domain in the third site might be too expensive in terms of replication cost or latency. To ensure that a domain controller can be located in the site closest to a client computer, if not the same site, Active Directory automatically attempts to register a domain controller in every site. The algorithm that is used to accomplish automatic site coverage determines how one site can "cover" another site when no domain controller exists in the second site.




Determining Site Coverage on the Basis of Cost

Given one domain and sites A, B, and C, site A has no domain controllers for the domain. If a client in site A attempts to locate a domain controller, which domain controller should be returned? The answer depends on which site covers site A for the domain. Site coverage is determined according to site-link costs, and domain controllers register themselves in sites accordingly.

In the example, a site link exists between site A and both of the other sites—that is, the connections between domain controllers in site A, site B, and site C are configured for replication over site links in Active Directory Sites and Services. Costs are associated with site links based on the expense of transferring data over the connections. The administrator uses the speed of the connection between sites to assign a cost to the communication link, and replication uses the cost to establish the least expensive route for replication traffic.

Site A and site B are connected by site link AB. Site A and site C are connected by site link AC, with the following costs:

• Site link AB cost = 50.

• Site link AC cost = 100.

The link between site A and site C has a much higher cost than the link between site A and site B. The administrator configured this cost based on the expensive ISDN line that connects site A and site C, and the administrator would prefer that resources in site B be used when possible. The site-coverage algorithm ensures that a domain controller in site B registers itself as a domain controller for site A. In this way, clients in site A that are looking for a domain controller find one from site B, instead of possibly finding one from site C.

Site Coverage Algorithm

During registration of SRV records in DNS, the following algorithm is used to determine which domain controllers register site SRV records that designate them as preferred domain controllers in sites that do not have a specific domain represented.

For every domain controller in the forest, follow this procedure:

1. Build a list of target sites: sites that have no domain controllers for this domain (the domain of the current domain controller).

2. Build a list of candidate sites: sites that have domain controllers for this domain.

3. For every target site, follow these steps:

a. Build a list of candidate sites of which this domain is a member. (If none, do nothing.)

b. Of these, build a list of sites that have the lowest site-link cost to the target site. (If none, do nothing.)

c. If more than one, break ties (reduce this list to one candidate site) by choosing the site with the largest number of domain controllers.

d. If more than one, break ties by choosing the site that is first alphabetically.

e. Register target-site-specific SRV records for the domain controllers for this domain in the selected site.
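The selection steps for a single target site can be sketched in a few lines. This is a simplified illustration, not the Net Logon implementation: the function name `covering_site` and its parameters are hypothetical, and the sketch assumes that only direct site links are considered, whereas the real algorithm works from the replication cost matrix.

```python
def covering_site(target_site, dc_counts, link_costs):
    """Pick the candidate site that should cover target_site.

    dc_counts:  {site: number of DCs for the domain} for candidate sites
    link_costs: {frozenset({site_a, site_b}): cost} for direct site links
    Returns None when no candidate site is linked to the target.
    """
    costs = {}
    for site, count in dc_counts.items():
        if count <= 0 or site == target_site:
            continue
        cost = link_costs.get(frozenset((site, target_site)))
        if cost is not None:
            costs[site] = cost
    if not costs:
        return None  # "If none, do nothing."

    lowest = min(costs.values())
    tied = [site for site, cost in costs.items() if cost == lowest]
    # Tie-breakers: most domain controllers first, then alphabetical order.
    tied.sort(key=lambda site: (-dc_counts[site], site.lower()))
    return tied[0]


# The example from the text: link AB costs 50, link AC costs 100.
link_costs = {frozenset(("A", "B")): 50, frozenset(("A", "C")): 100}
print(covering_site("A", {"B": 2, "C": 3}, link_costs))  # B
```

The domain controllers in the returned site are the ones that would register the target-site-specific SRV records.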

Cache Time-out and Closest Site

If a domain member computer requests a domain controller while all domain controllers in its site are offline, the Locator necessarily returns a domain controller in a different site. The location of this domain controller is stored in the client cache. The cache lifetime is controlled by the CloseSiteTimeout entry in the registry.

In addition, the domain controller performs authentication, and a secure channel is set up. On subsequent location attempts, the lifetime of the cache and the lifetime of the secure channel are secondary to the location of a domain controller in the closest site.

If the domain controller that is stored in the client cache is not in a site that is close to the client, Net Logon attempts to find a close domain controller when either of the following events occurs:

• An interactive logon process uses pass-through authentication on the secure channel.

• The value in the CloseSiteTimeout registry entry has elapsed since the last attempt, and any other attempt is made to use the secure channel (for example, pass-through authentication of network logons).

Thus, Net Logon attempts to find a close domain controller only on demand. The default value of the CloseSiteTimeout period is 15 minutes; the maximum value is 49 days, and the minimum value is 60 seconds. The implications of this setting are that if the time-out value is too large, a client never tries to find a close domain controller if there is not one available at startup. If the value of this setting is too small, secure channel traffic is unnecessarily slowed down by discovery attempts.
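The on-demand retry rule can be sketched as follows. This is an illustrative model only: the function names are hypothetical, time is passed in as plain seconds, and clamping an out-of-range value to the documented minimum and maximum is an assumption rather than confirmed behavior.

```python
DEFAULT_CLOSE_SITE_TIMEOUT = 15 * 60        # 15 minutes, in seconds
MIN_CLOSE_SITE_TIMEOUT = 60                 # 60 seconds
MAX_CLOSE_SITE_TIMEOUT = 49 * 24 * 60 * 60  # 49 days, in seconds


def clamp_timeout(seconds):
    """Keep a configured value within the documented range (assumption)."""
    return max(MIN_CLOSE_SITE_TIMEOUT, min(MAX_CLOSE_SITE_TIMEOUT, seconds))


def should_rediscover(cached_dc_is_close, interactive_logon,
                      seconds_since_last_attempt,
                      timeout=DEFAULT_CLOSE_SITE_TIMEOUT):
    """Decide whether a secure channel use should trigger a new search."""
    if cached_dc_is_close:
        return False   # nothing better to find
    if interactive_logon:
        return True    # interactive logons always retry
    return seconds_since_last_attempt >= clamp_timeout(timeout)


print(should_rediscover(False, False, seconds_since_last_attempt=20 * 60))
```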

Clients with No Apparent Site

Sometimes the client pings a domain controller and the client IP address cannot be found in the subnet-to-site mapping table. In this case, the domain controller returns a NULL site name, and the client uses the returned domain controller.




Using a Domain Controller Outside of Client Site

Group policy objects linked to site

Storage is in domain partition

Recommended for site based settings (e.g. proxy servers)

Application of Group Policy

Authenticating DC not guaranteed to be the one that policy is obtained from (but will be most of the time)

KB 831201 can change this behavior

DNS registration is not optimized

Using a Domain Controller Outside of Client Site

While the Active Directory infrastructure is set up so that a client always attempts to use the domain controller that is most efficient based on the topology, there are some situations where this may not occur during a logon. For example, a client may attempt to locate a domain controller for logon in its own site, but the domain controllers there are not responding (network issue, servers down, etc.). In this case the client may end up using a non-optimal domain controller. Below are some scenarios where this can occur.

• Application of site-linked Group Policy objects

• Application of Group Policies

• DNS registrations not optimized (branch office scenarios)

Application of Site-linked Group Policy Objects

All Group Policy information is stored in two places: the Group Policy Container (GPC) and the Group Policy Template (GPT). The GPC is stored in the System folder of the domain naming context, while the GPT is stored in Sysvol on each DC; therefore, this data replicates only to DCs in the same domain. Because sites can contain domain controllers, servers, and workstations from any domain in the forest, each may have to apply a site policy that is stored on a DC from a domain that has no DC in the computer's site. Also, note that only Enterprise Admins can create site-based GPOs, and most likely this will be done in the forest root domain. Because the root domain will, as a general rule, have fewer DCs than the other domains, the likelihood of having to cross site boundaries to obtain the policy is increased.

Note: Site policies can be used effectively if set up appropriately. For example, a collection of clients or servers in a site may need to use certain settings (ex: Proxy server addresses) that are different than those of other sites. Creating a GPO linked to a site is a great way to manage these settings as long as it is managed properly.

Application of Group Policies

It was widely believed that the domain controller that authenticates a user or computer is also the one from which Group Policy objects are obtained. When a user or computer enumerates Active Directory for Group Policy, it obtains the location of the GPT, which is stored in Sysvol. The location is a DFS-style path that is covered by all domain controllers in the domain (ex: \\contoso.com\sysvol\contoso.com\policies\{61b354.....). This path is not a physical location but a DFS path that the client must resolve to an actual target that contains the data. To do so, the client requests a DFS referral for the names of the servers that cover \\contoso.com\sysvol.

The response from the server lists the names of the servers that cover this namespace, based on the client's site information. If more than one DC is located in the client's site, the DC that authenticated the client may not be at the top of the referral list returned to the client, and another DC may be used to obtain the policy data. To ensure that the domain controller that authenticates the user is also the one from which Group Policy is obtained, install the following update:

831201 An update for Windows Server 2003 and Windows 2000 Server makes it possible to put the logon server at the top of the DFS referrals list

http://support.microsoft.com/default.aspx?scid=kb;EN-US;831201

DNS Registrations not optimized

In the branch office scenario, it is important that clients who cannot find a domain controller in their own site find a domain controller in their hub site, but never a domain controller in another branch or hub. In many deployments, clients from one branch cannot connect to machines in another branch, because the network is not fully routed (for example, one-way dial-up lines are used). Even if connectivity is possible, however, it is still undesirable to initiate network connections between branches. Such network traffic would always go through the hub site; therefore, it is better to restrict the traffic to branch-to-hub only.




To avoid the situation where clients in one branch contact a domain controller in another branch, the Net Logon service on all branch office domain controllers must be configured to publish only site-specific locator records but not generic domain controller locator records. The result is that only the hub domain controllers publish the generic locator records in addition to their site-specific records. Clients that cannot find a domain controller in their own site will now only find generic domain controller locator records for hub domain controllers.

267855 Problems with Many Domain Controllers with Active Directory Integrated DNS Zones

http://support.microsoft.com/default.aspx?scid=kb;EN-US;267855




Client Logon and Firewalls

Can be done but requires a lot of changes to the firewall configuration

If this configuration is required suggest using:

IPSEC

VPN/Tunneling

Ports required for Client logon

Client logon and Firewalls

In some configurations, a client may be situated on one side of a firewall, while the domain controller(s) reside on the other side. In this case, for a successful logon to occur (ex: client authentication, application of Group Policies, etc.) configuration changes to the firewall will be required. The necessary ports to be opened on the firewall are listed in the following table:

Application Protocol                       Protocol       Port(s)
Global Catalog Server                      TCP            3269
Global Catalog Server                      TCP            3268
LDAP                                       TCP and UDP    389
LDAP SSL                                   TCP and UDP    636
IPSec ISAKMP                               UDP            500
NAT-T                                      UDP            4500
RPC (EPM)                                  TCP            135
RPC (Randomly allocated high TCP ports)    TCP            1024 - 65535
NetBIOS Datagram Service                   UDP            138
NetBIOS Name Resolution                    UDP            137
NetBIOS Session Service                    TCP            139
SMB                                        TCP            445
DNS                                        TCP and UDP    53
DCOM                                       TCP and UDP    1024 - 65534
ICMP (ping)                                ICMP           n/a (ICMP does not use ports)
Kerberos                                   TCP and UDP    88

As one can see, opening this many ports may not be desirable in certain configurations and may reduce the overall effectiveness of the firewall. Therefore, a better solution for both the client and security would be to utilize IPSEC or create a VPN/Tunnel the clients can use to connect to the domain controllers.
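Before changing a firewall, it can help to check which of the fixed TCP ports from the table are reachable from the client side. The following is a minimal sketch using Python's standard `socket` module. The dictionary name, function names, and the host name in the usage comment are hypothetical. The sketch covers only fixed TCP ports, so UDP services, ICMP, and the dynamic RPC range are outside its scope.

```python
import socket

# Fixed TCP ports taken from the table above (UDP and dynamic ranges omitted).
LOGON_TCP_PORTS = {
    "DNS": 53,
    "Kerberos": 88,
    "RPC (EPM)": 135,
    "NetBIOS Session Service": 139,
    "LDAP": 389,
    "SMB": 445,
    "LDAP SSL": 636,
    "Global Catalog Server (3268)": 3268,
    "Global Catalog Server (3269)": 3269,
}


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe(host, ports=LOGON_TCP_PORTS, timeout=2.0):
    """Return {service name: reachable?} for every port in the mapping."""
    return {name: tcp_port_open(host, port, timeout)
            for name, port in ports.items()}


# Example usage (the host name is a placeholder):
# for name, reachable in probe("rootdc.contoso.com").items():
#     print(f"{name}: {'open' if reachable else 'blocked or closed'}")
```

A failed connection does not distinguish between a firewall block and a service that is not running, so the result is only a first indication.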




Global Catalog

First Domain Controller in forest becomes a global catalog

Global catalog servers can be any domain controller from any domain

Required during logon

Administrator

Universal group membership Caching (first time users denied)

Cached credentials (access to each network resource must be validated individually)

No cached credentials or group memberships (logon denied)

Number of Cached Credentials Logons can be controlled via GPO

Global Catalog

The first domain controller that is installed for a given forest is automatically selected to be a Global Catalog server. The global catalog server replicates a copy of all objects from every domain in the forest but only contains a read-only subset of each object's attributes. The attributes that are replicated are those that will be used in most common queries. Beyond the first domain controller of the forest, it is an administrative action to make other domain controllers in the forest global catalog servers.

Global catalog servers can be domain controllers from any domain. When authentication occurs, the domain controller that is authenticating the user's log-on request needs to locate a global catalog in order to construct the universal groups to which that user belongs. In the event that there is only one domain in the forest, all domain controllers contain the same data and thus each domain controller is equivalent to a global catalog. If the domain controller handling the user logon request is also a global catalog, there is no need to remote the request to another global catalog. There is no requirement that the global catalog selected to service the request be a member of the domain to which the authenticating domain controller belongs.

If a global catalog server cannot be located by the domain controller during this process:

• If the user is an administrator, Windows 2000 and later allow the logon to take place without the domain controller contacting a global catalog. This is a special case applying to that account alone (as opposed to anyone who is a member of administrators, enterprise administrators, domain administrators, and so forth). The account is distinguished by its RID (0x1F4 or 500 decimal). This is for the purpose of logging on to a domain controller to set up a global catalog if none is available or cannot be brought up in time.

• If a domain controller in a Windows 2000 native-mode domain cannot contact a Global Catalog server when a user attempts to log on, the domain controller refuses the logon request. This requirement is managed differently by domain controllers that are running Microsoft Windows Server™ 2003, which can cache universal group memberships. When Universal Group Membership Caching is enabled on the NTDS Site Settings object for a site that contains domain controllers running Windows Server 2003, the global catalog is required for universal group enumeration only the first time the user logs on, and the membership is cached on the domain controller thereafter.

• If cached credentials exist for the user on the local computer, the user is logged on with those credentials. Access to network resources must be validated on an individual basis. Administrators can create a Group Policy to deny cached credential logon should they decide that resources should be denied to a user when a global catalog cannot be contacted.

• If cached credentials do not exist on the local computer and universal groups are not cached on the authenticating domain controller, all users except the built-in administrator are denied logon.
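The cases above can be summarized as a small decision function. This is an illustrative sketch, not the logon implementation: the function and parameter names are hypothetical, and the order in which the cases are checked is an assumption made for readability.

```python
def logon_outcome_without_gc(is_builtin_administrator,
                             universal_groups_cached_on_dc,
                             cached_credentials_on_client):
    """Outcome of a logon attempt when no global catalog can be located."""
    if is_builtin_administrator:
        # Only the built-in account with RID 500, not every admin group member.
        return "allowed: built-in Administrator"
    if universal_groups_cached_on_dc:
        # Universal Group Membership Caching; not available at first logon.
        return "allowed: cached universal group membership"
    if cached_credentials_on_client:
        # Access to each network resource is still validated individually.
        return "allowed: cached credentials"
    return "denied"


print(logon_outcome_without_gc(False, False, False))  # denied
```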

For these reasons, it is important to note that a user getting logged on with cached credentials or being denied logon would be the result of the domain controller failing to find any global catalog. Because all global catalogs contain the same information (with the replication latency exception), any global catalog can be used. If a local global catalog happens to be offline, a remote global catalog can and will be used. For performance reasons and bandwidth efficiency, it may be beneficial to host a local global catalog in each site. Clients and domain controllers prefer communicating with a global catalog in the local site before using a remote global catalog in another site.

This process takes place at every logon and is consistent with down-level behavior. If the user is added to or removed from a group, the change is not reflected until the user logs off and back on.

Limiting the Number of Cached Credential Logons

As long as a user has logged on to a domain-joined system at least once and been authenticated by a domain controller, he or she can continue to log on to the local computer using cached credentials if a domain controller is not available.

Note: This does not imply that users who have been deleted from the directory can still log on to systems with their cached credentials. If a DC is available and the password is no longer valid (or the account does not exist), the user will be denied logon.




When a user logs on to a Windows 2000, XP, or 2003 system, a local profile is created and the user credentials are stored locally. If a domain controller is not available during an interactive logon (ex: a user takes his/her laptop home and logs on interactively), the locally stored credentials are used to authenticate the user and grant access to the local system. However, as noted in the previous section, any access to domain resources still requires authentication (ex: the user connects to the corporate network over a VPN and accesses a file server).

In some cases you may want to limit the number of logons a user can perform with cached credentials on the workstations. This security setting can be controlled via GPO and is located as follows:

Computer Configuration > Windows Settings > Security Settings > Local Policies > Security Options > “Interactive Logon: Number of previous logons to cache (in case a domain controller is not available)”

This setting is defined using the number of logons you wish to allow using cached credentials. The policy is not defined in the GPO by default. It accepts values from 0 to 50, and 10 is the default value in effect when the policy is left undefined.




Global Catalog Server Requirement

• Contains universal group membership for the forest
• During logon, no need to communicate with every domain in the forest
• Native mode
• Mixed mode
• Down-level clients
• User principal name
• Search requests
• Adding a global catalog
• Occupancy

Logon

A universal group is a security group that is available in Windows 2000 native mode in a Windows 2000 domain, and at the Windows 2000 native and Windows Server 2003 domain functional levels in a Windows Server 2003 domain. Universal groups can have members from any domain in the forest. The membership for all universal groups cannot be stored on every domain controller because each domain controller stores objects for only one domain. A user may be in a universal group in any domain, and finding these groups could be time consuming. For this reason, only global catalog servers, which store every object in the forest, can enumerate the membership of a universal group.

To ensure that universal group membership is assessed for each authenticated user, a global catalog server is required for logging on to domains that can use universal groups. So instead of communicating with every domain in the forest to enumerate the universal groups from each, the member list of each universal group is replicated to global catalog servers, making it easier for a domain controller to query one location for all universal groups of which the user is a member.

As part of the logon process, the user is identified to the system and the LSA constructs a security token. The user and his/her group membership SIDs are added to the security token. The global catalog enumerates the membership of universal groups. The memberships of other types of groups (global and domain local) are not enumerated by the global catalog; only the group object name is listed. Membership enumeration of global groups and domain local groups is the responsibility of the resource domain controller. For replication, this arrangement means that the replication of global and domain local group memberships is not required by global catalog servers, which significantly reduces replication traffic.

In a Windows 2000 Native-mode domain, the Key Distribution Center (KDC) on the domain controller authenticating the user's logon request is responsible for adding the SIDs for global groups from the user's logon domain, locating and communicating with the global catalog to enumerate the universal groups of which the user is a member, and adding the SIDs of those groups to the user's token. If the domain in which the computer resides is in Native mode, any domain local groups from that domain of which the user is a member are added to the token. Lastly, any local groups from the local computer of which the user is a member are added to the token.

In a Mixed-mode domain, universal groups cannot be created. If a Windows 2000 or later computer is located in a down-level or Mixed-mode domain, different behavior occurs. Other domains may be in Native mode and universal groups may have been created that contain the user as a member. The domain controller authenticating the logon request adds the SIDs of the global groups of which the user is a member to the user's token, and the local computer adds SIDs for groups of which the user is a member on the local computer as appropriate. When an attempt to use resources in another domain occurs, the computer hosting the resource contacts a domain controller for that domain, which adds the SIDs of the groups local to that domain (which may include universal groups) of which the user is a member to the user's token.

Down-level clients do not perform this operation at logon and are unaffected. Note that computers are security principals and can be affected in the same way. An enumeration of the groups to which the computer belongs is also performed at computer startup.

User Principal Name and Global Catalog Logon Support

User principal names are user names that can be used when a user is logging on to an Active Directory domain as an alternative to the traditional SAM account name. The user principal name format (<UserName>@<DNSDomainName>) is resolved by the Global Catalog server.

Search Requests and the Global Catalog

The Global Catalog can be used to locate objects in any domain without a referral to a different server. When a search request is sent to port 389 (the default LDAP port), the search is conducted on a single directory partition. If the object is not found in that directory partition (and is not in the schema or configuration directory partitions), the request is referred to a domain controller in a different domain that is assumed to contain the requested object, on the basis of the distinguished name that is presented in the search request.

When a search request is sent to port 3268 (the default Global Catalog port), the search includes all directory partitions in the forest; that is, the search is processed by a Global Catalog server. If the request specifies attributes that are part of the Global Catalog attribute set, the Global Catalog can return results for objects in any domain without generating a referral to a domain controller in a different domain.
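The port choice described above can be sketched in a few lines of Python. This is an illustrative model only: the host name `dc1.example.com` and the contents of `EXAMPLE_GC_ATTRIBUTES` are hypothetical placeholders, not the actual Global Catalog attribute set of any forest.

```python
LDAP_PORT = 389   # default LDAP port: search covers a single directory partition
GC_PORT = 3268    # default Global Catalog port: search covers all partitions in the forest

# Hypothetical stand-in for the Global Catalog attribute set (illustration only).
EXAMPLE_GC_ATTRIBUTES = {"cn", "mail", "userPrincipalName"}


def search_endpoint(host: str, forest_wide: bool) -> str:
    """Return the LDAP URL prefix a client would target for a search."""
    port = GC_PORT if forest_wide else LDAP_PORT
    return f"ldap://{host}:{port}"


def gc_can_answer(requested_attributes, gc_attributes=EXAMPLE_GC_ATTRIBUTES) -> bool:
    """True if every requested attribute is held in the Global Catalog."""
    return set(requested_attributes) <= set(gc_attributes)


print(search_endpoint("dc1.example.com", forest_wide=False))
print(search_endpoint("dc1.example.com", forest_wide=True))
print(gc_can_answer(["cn", "mail"]))
```

A search that needs an attribute outside the Global Catalog attribute set would have to go to port 389 on a domain controller that holds the object's domain partition.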

Adding a Global Catalog

When creating a new global catalog server, the promotion process can be delayed by several conditions, including the following:

• The Knowledge Consistency Checker (KCC) could not reach a source domain controller from which to replicate a directory partition.

• Replication cannot begin until the scheduled time.

• Replication of the partition is in progress but has not yet completed. This event might be logged if the partition is very large. In addition, the replication priority queue prioritizes addition of new directory partitions at a lower priority than incremental replication of existing partitions.

• The source domain controller for a directory partition has failed or is unavailable due to network problems.

By default, a global catalog server is not considered "promoted" until all read-only directory partitions have been fully replicated to the new global catalog server. A domain controller checks every 30 minutes to see whether it has received all read-only directory partitions that are required to be present before the server advertises itself in DNS as a global catalog server.

A global catalog server receives all read-only replicas from other domains through replication according to the occupancy level, which is a registry setting on the domain controller. On domain controllers running Windows Server 2003, the occupancy level of 6 requires that all replicas have been added and synchronized on the global catalog server before the server is advertised in DNS. Lower occupancy levels specify varying levels of replication completeness, including advertising in DNS when all read-only replicas of only those domains represented in the domain controller’s site are synchronized. Windows 2000 domain controllers running SP2 and earlier have an occupancy level of 4 that requires only the replicas of domains in the site. Windows 2000 domain controllers running SP3 and later have an occupancy level that requires full synchronization of all read-only replicas.

When conditions preclude the successful synchronization of the new global catalog server, it is possible to force advertisement of the global catalog server and then remove it. Until the global catalog server is successfully advertised, it is not possible to remove it. For this reason, when successful promotion is impossible, use the registry to override the delay so that the incomplete global catalog can be removed from the domain controller.

HKLM\System\CurrentControlSet\Services\NTDS\Parameters

Value name: Global Catalog Partition Occupancy

Data type: REG_DWORD




When the administrator changes the global catalog status of a computer, the computer does not actually begin changing its role until the Knowledge Consistency Checker (KCC) next runs. For this reason, users may observe that they can still log on despite the absence of a global catalog. The administrator can expedite the change by running the KCC manually using a variety of tools (Repadmin, Replmon, the Active Directory Sites and Services snap-in, and so forth).




Global Catalog Server Availability Requirement

[Figure: Site A, containing domain controller A, and Site B, containing the global catalog, connected by a WAN link.]

• Normally the WAN link between A and B is required for each logon.
• Disable this requirement on domain controller A using the “IgnoreGCFailures” registry key.

Placement of Global Catalog servers in remote sites is usually desired to improve performance in user logon time, searches, and other actions requiring communication with Global Catalog servers, and to reduce wide area network (WAN) traffic. However, to reduce administrative intervention, hardware requirements, and other related overhead, in some situations organizations may not want to locate a Global Catalog server at a remote site. This is especially relevant in environments that have a large number of sites that could experience substantially increased hardware costs when the size of the sites may not justify that hardware and administration. The problem, as noted earlier in this document, is that logons require the domain controller authenticating the user to contact a Global Catalog server to determine if the user is a member of any universal groups. So if the remote office does not have a Global Catalog server, and a Global Catalog server cannot be contacted (for various reasons), the user's logon request may not work (based on the rules stated earlier).

To eliminate the need for a Global Catalog server at a site and to avoid potential denial of user logon requests, the following registry key is provided to perform logons if a Global Catalog server is not available:

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Lsa\IgnoreGCFailures




This key needs to be set on the domain controller that performs the initial authentication of the user. Note that setting this key bypasses a security check and creates potential security vulnerabilities if universal groups are also used.

Important: If this key is enabled, universal groups should not be used. The key turns off enumeration of universal groups, so universal group SIDs are not added to the user's token. If a user is a member of a universal group that is denied access to a resource, the deny entry never matches and the user could gain access to the resource.

There is nothing in Active Directory that prohibits the definition of universal groups if this registry key is enabled. It is the responsibility of the administrator to ensure that universal groups are not used if this feature is used.
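The risk can be illustrated with a small Python model. It is a simplified sketch, not the Windows access-check algorithm: the SID strings are made-up placeholders, `build_token` and `has_access` are hypothetical helpers, and the model only covers the case where no global catalog can be reached.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ace:
    sid: str
    allow: bool  # True = allow entry, False = deny entry


def build_token(user_sid, global_sids, universal_sids, gc_available, ignore_gc_failures):
    """Return the set of SIDs in the logon token, or None if logon is refused."""
    token = {user_sid, *global_sids}
    if gc_available:
        token.update(universal_sids)   # normal case: universal groups are enumerated
    elif not ignore_gc_failures:
        return None                    # no global catalog and no override
    # With the override and no global catalog, universal group SIDs are left out.
    return token


def has_access(token, acl):
    """Simplified check: any matching deny entry wins over allow entries."""
    if any(not ace.allow and ace.sid in token for ace in acl):
        return False
    return any(ace.allow and ace.sid in token for ace in acl)


# Placeholder SIDs: the user is in a global group that is allowed access
# and in a universal group that is explicitly denied access.
acl = [Ace("S-UG-Contractors", allow=False), Ace("S-GG-Staff", allow=True)]
normal = build_token("S-U-1", {"S-GG-Staff"}, {"S-UG-Contractors"}, True, True)
degraded = build_token("S-U-1", {"S-GG-Staff"}, {"S-UG-Contractors"}, False, True)
print(has_access(normal, acl), has_access(degraded, acl))
```

In the degraded case the deny entry for the universal group has nothing to match against, which is the exposure the Important note describes.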

Advantages of using a global catalog

• Eliminates the need to contact each domain to enumerate universal groups, which can have multi-domain membership and visibility.

• Improves performance when searching for Active Directory objects, especially when those objects may exist in different domains.

• If a global catalog is located in each site, then a WAN link connecting the sites may only need to be used periodically to replicate objects from different domains. These objects then become available to all users in that site, whereas each access to objects in Windows NT 4.0 in different domains would necessitate using the WAN link.

Disadvantages of using a global catalog

• You may need to configure additional network bandwidth to your global catalogs so that they can communicate with relevant domain controllers.

• Global catalogs in a multi-domain environment require additional hardware such as disk space and processing power.




Universal Group Membership Caching

• Group membership cached on domain controller after first logon
• Universal and global groups
• Local and domain local
• Tuning group membership caching

Due to available network bandwidth and server hardware limitations, it may not be practical to have a global catalog in smaller branch office locations. For these sites, domain controllers can be deployed running Windows Server 2003, which can store universal group membership information without being configured as a global catalog.

Information is stored once this option is enabled and a user logs on for the first time. The domain controller obtains the universal group membership for that user from a global catalog. Once the universal group membership information is obtained, it is cached on the domain controller for that site indefinitely and is periodically refreshed. The next time that user attempts to log on, the authenticating domain controller running Windows Server 2003 will obtain the universal group membership information from its local cache without the need to contact a global catalog.

By default, the universal group membership information contained in the cache of each domain controller will be refreshed every eight hours. To refresh the cache, domain controllers running Windows Server 2003 will send a universal group membership confirmation request to a designated global catalog. Up to 500 universal group memberships can be updated at once. Universal group membership caching can be enabled using Active Directory Sites and Services. Universal group membership caching is site-specific and requires that all domain controllers running Windows Server 2003 be located in that site to participate.

It may be necessary to continue using a global catalog in branch office locations if an application in a site is sending global catalog queries to port 3268. Universal group membership caching does not intercept calls made to port 3268.




Note: The term “cache” as it applies to this topic can be misleading. The membership is stored in a non-volatile Active Directory value. The “cached” memberships that are written to the value will not be lost as a result of a reboot or a power outage.

Global Groups Cached

In addition to the universal groups, a user’s global groups are also cached. However, local groups on member servers in other domains, and domain local groups in the user’s domain are not cached. When group information is returned from the global catalog, the membership contains both group types. The two group types are not separated, and both types are entered in the cache. This can lead to confusing behavior when an administrator modifies the global group membership for a user and expects the change to be seen immediately. Even if the change is made on the domain controller that validates the user, the membership in the cache will be used instead of the membership on the user attribute. As a result, it can take up to eight hours (by default) before group membership changes are realized at sites where “no global catalog logon” is enabled.

Tuning Group Membership Caching

Group membership caching behavior can be tuned with several registry values. All are located in the registry of each domain controller.

Key: HKLM\SYSTEM\CurrentControlSet\Services\NTDS\Parameters

Data type: REG_DWORD

Value: Cached Membership Site Stickiness (minutes)
Default: 180 days
Description: This setting determines the maximum time a user’s cache will continue to be refreshed automatically without logging on at this site. When this time expires, the user is removed from the list of users who are updated during the refresh cycle. The user’s site affinity value is updated at a period of half of this value only when a user logs on at the site.

Value: Cached Membership Staleness (minutes)
Default: 1 week
Description: This defines the valid lifetime of the cached memberships.

Value: Cached Membership Refresh Interval (minutes)
Default: 8 hours
Description: This defines the length of time between refreshes.

Value: Cached Membership Refresh Limit
Default: 500
Description: This defines the maximum number of users per domain controller that are refreshed per refresh cycle.
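Because three of the value names end in "(minutes)" while the defaults are quoted in days, weeks, and hours, a quick conversion helps when entering data. The sketch below assumes, based on the value names alone, that the registry data is expressed in minutes; verify this against current Microsoft documentation before changing a production domain controller.

```python
MINUTES_PER_HOUR = 60
MINUTES_PER_DAY = 24 * MINUTES_PER_HOUR

# Documented defaults converted to minutes (assumes the data unit is minutes).
defaults_in_minutes = {
    "Cached Membership Site Stickiness (minutes)": 180 * MINUTES_PER_DAY,
    "Cached Membership Staleness (minutes)": 7 * MINUTES_PER_DAY,
    "Cached Membership Refresh Interval (minutes)": 8 * MINUTES_PER_HOUR,
}

# Site affinity is updated at half the stickiness period when the user logs on.
site_affinity_update_minutes = defaults_in_minutes[
    "Cached Membership Site Stickiness (minutes)"
] // 2

for name, minutes in defaults_in_minutes.items():
    print(f"{name}: {minutes}")
print(f"Site affinity update period: {site_affinity_update_minutes}")
```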




Account Lockout Settings

• Account lockout policy: account lockout threshold, account lockout duration, reset account lockout counter after (ObservationWindow), enforce user logon restrictions
• Account lockout properties: badPasswordTime (non-replicated), badPwdCount (non-replicated), ntPwdHistory
• Can be a potential DoS

Account Lockout Policy

It may be necessary to establish an account lockout policy in addition to establishing a password security policy. Account lockout policies protect environments against brute-force or dictionary attacks. Given enough tries, even complex passwords can be guessed. Account lockout policies reduce the number of guesses that an attacker can make.

It is best to establish an account lockout policy that is restrictive enough to prevent attacks, while still allowing for the occasional user error. An account lockout policy that is too strict might increase the number of support calls in an organization as users type their passwords incorrectly and are mistakenly locked out.

Creating an account lockout policy involves setting the following options in the Default Domain Group Policy object.

Account lockout threshold

The account lockout threshold limits the number of times that anyone can attempt to log on to a computer from a remote location. This prevents attackers from trying all possible passwords over the network. This setting is disabled by default in the Default Domain Group Policy object. It is possible to turn it on by setting the value to a number within the accepted range of 1 through 999. Set the value high enough to ensure that occasional errors do not result in account lockout.

Note that this setting does not apply to attempts to log on at the console of a locked workstation or to attempts to unlock a screensaver. Locked workstations cannot be forced to run password-cracking programs.

Account lockout duration

The account lockout duration determines how long, in minutes, an account that has exceeded the account lockout threshold remains locked before it is automatically unlocked. Valid settings range from 0 through 99,999 minutes, or about 10 weeks. When the value is set to 0, an administrator must manually unlock the account. Because account lockout policies are designed to protect against brute-force attacks, setting even a low value for the account lockout duration reduces the number of possible attacks considerably. Setting a high value for the account lockout duration can increase help desk calls when legitimate users are mistakenly locked out, and aside from indicating that an attack was attempted, provides little additional protection.

By default, this policy is not defined, because it is only applicable when an account lockout threshold is specified.

Reset account lockout counter after (ObservationWindow)

This setting determines the number of minutes that must elapse after a failed logon attempt before the counter is reset to 0 bad logon attempts. The range is 1 through 99,999 minutes. This value must be less than or equal to the account lockout duration.
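How the threshold, duration, and observation window interact can be sketched as a small simulation. This is a simplified model under stated assumptions, not the Windows implementation: time is an integer count of minutes, the account locks when the counter reaches the threshold, a threshold of 0 is taken to mean that lockout is disabled, and the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LockoutPolicy:
    threshold: int        # bad attempts that lock the account (0 = lockout disabled)
    duration_min: int     # minutes the account stays locked (0 = until an admin unlocks it)
    observation_min: int  # "Reset account lockout counter after", in minutes

    def __post_init__(self):
        # The reset window must not exceed a finite lockout duration.
        if self.duration_min > 0 and self.observation_min > self.duration_min:
            raise ValueError("observation window must be <= lockout duration")


class Account:
    def __init__(self, policy: LockoutPolicy):
        self.policy = policy
        self.bad_count = 0
        self.last_bad: Optional[int] = None
        self.locked_at: Optional[int] = None

    def is_locked(self, now: int) -> bool:
        if self.locked_at is None:
            return False
        if self.policy.duration_min == 0:
            return True  # stays locked until manually unlocked
        return now - self.locked_at < self.policy.duration_min

    def bad_logon(self, now: int) -> None:
        if self.is_locked(now):
            return
        if self.locked_at is not None:
            # The lockout has expired: clear it and start counting afresh.
            self.locked_at = None
            self.bad_count = 0
        elif self.last_bad is not None and now - self.last_bad >= self.policy.observation_min:
            self.bad_count = 0  # observation window elapsed since the last failure
        self.bad_count += 1
        self.last_bad = now
        if self.policy.threshold and self.bad_count >= self.policy.threshold:
            self.locked_at = now


policy = LockoutPolicy(threshold=3, duration_min=30, observation_min=30)
acct = Account(policy)
for minute in (0, 1, 2):
    acct.bad_logon(minute)
print(acct.is_locked(10), acct.is_locked(32))
```

Three failures within the window lock the account at minute 2; it is still locked at minute 10 and unlocked again once 30 minutes have passed.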

Enforce user logon restrictions

When this option is enabled, the KDC validates every request for a session ticket by examining the user rights policy on the target computer. The user requesting the session ticket must be assigned the Log on locally policy (if the requested service is running on the same computer) or the Access this computer from the network policy (if the requested service is on a remote computer) to receive a session ticket. This option also serves as a means to ensure that the requesting account is still valid. Verification is optional because the extra step takes time and might slow network access to services. However, when the check is not performed, changes to account rights or accounts that were disabled between the time the initial ticket was issued and the time a service ticket is requested do not take effect.

By default, the policy is enabled in the Default Domain Group Policy object. If the policy is disabled, this check is not performed. For greater security in an environment in which user accounts change frequently, enable this setting. For faster performance, particularly in a more stable user account environment, disable this setting.

Account Lockout Properties

Various account lockout properties are stored as attributes of an account.

badPasswordTime (Non-replicated)

Specifies the last time the user, computer, or service account submitted a password that did not match the password on the authenticating domain controller. This value is stored as a large integer that represents the number of 100-nanosecond intervals elapsed since 00:00:00, January 1, 1601 (the FILETIME data structure). This property is stored locally on each domain controller in the domain. A value of 0 means that the last bad password time is unknown.




For an accurate value for the user's last bad password time in the domain, each domain controller in the domain must be queried and the largest value should be used.

badPwdCount (Non-replicated)

Specifies the number of times that the user, computer, or service account tried to log on to the account using an incorrect password. This property is maintained separately on each domain controller in the domain except for the primary domain controller (PDC) of the accounts domain, which maintains the total number of bad password attempts. A value of 0 indicates that the value is unknown. For an accurate value for the user's total bad password attempts in the domain, each domain controller in the domain must be queried and the sum of the values should be used.
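The aggregation rules above (take the largest badPasswordTime, sum the badPwdCount values) can be sketched as follows. A FILETIME value counts 100-nanosecond intervals since January 1, 1601 (UTC). The per-domain-controller readings are hypothetical sample data, and how the values are actually retrieved from each domain controller is outside this sketch.

```python
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime: int):
    """Convert a FILETIME value (100-ns ticks since 1601-01-01 UTC) to a datetime."""
    if filetime == 0:
        return None  # 0 means the last bad password time is unknown
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


def summarize(per_dc):
    """per_dc maps a DC name to (badPasswordTime, badPwdCount) read from that DC."""
    latest = max(time for time, _ in per_dc.values())
    total = sum(count for _, count in per_dc.values())
    return filetime_to_datetime(latest), total


# Hypothetical readings from three domain controllers.
readings = {
    "DC1": (116444736000000000, 2),
    "DC2": (0, 0),
    "DC3": (116444736600000000, 1),
}
last_bad, total_bad = summarize(readings)
print(last_bad, total_bad)
```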

The table below provides recommended combinations of account lockout and password policy settings for various security configurations. Each configuration includes differing degrees of cost in terms of downtime for the user whose account is locked out, and support time servicing the account.

Security Category | Threshold | Observation Window | Lockout Duration | Password History | Max Password Age | Min Password Age | Min Password Length | Complexity
Low | - | - | - | 3 | 42 | 0 | 0 | Disabled
Medium | 10 | 30 | 30 | 24 | 42 | 1 | 7 | Enabled
High | 10 | 30 | Infinite/0 | 24 | 42 | 1 | 8 | Enabled




Domain Controller Behavior

• Immediate replication (shortcut)
• Urgent replication (normal but now!)
• Single user object “On Demand” replication
• Password history check

[Figure: two diagrams, each showing a client, an application server, a domain controller, and the PDC. One illustrates NTLM authentication chained to the PDC; the other illustrates Kerberos authentication chained to the PDC.]

When a domain controller that is not the PDC fails an authentication because the password is incorrect or expired, because the account is flagged to change its password at next logon, or because the account is locked out, the logon is retried (chained) to the PDC operations master. In this way, the authenticating domain controller asks the PDC for a second opinion on whether the password is current.

The request for authentication is repeated on the PDC operations master to verify that the password is correct. If the PDC emulator rejects the bad password, then both the authenticating domain controller and the PDC emulator increment the badPwdCount for that user object. The PDC is the authority on the user’s password validity. As long as that user, application, or service continues sending wrong credentials to the authenticating domain controller, the process of chaining bad passwords to the PDC continues until the threshold value for bad logon attempts is reached (if set in policy) and the user’s account is locked out.
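The chaining behavior can be modeled in a few lines. This is a deliberately simplified sketch: passwords are compared as plain strings, the `DomainController` class and `authenticate` function are hypothetical, and only the bad-password case is modeled.

```python
class DomainController:
    def __init__(self, name, password):
        self.name = name
        self.password = password  # this replica's copy of the user's password
        self.bad_pwd_count = 0    # badPwdCount is kept per DC (non-replicated)


def authenticate(dc, pdc, supplied, threshold):
    """Return 'ok', 'denied', or 'locked' for one logon attempt against dc."""
    if supplied == dc.password:
        return "ok"
    # Local check failed: chain the request to the PDC for a second opinion.
    if dc is not pdc and supplied == pdc.password:
        return "ok"  # the PDC holds a newer password that has not replicated yet
    # The PDC also rejects the password: both counters are incremented.
    dc.bad_pwd_count += 1
    if dc is not pdc:
        pdc.bad_pwd_count += 1
    if threshold and pdc.bad_pwd_count >= threshold:
        return "locked"
    return "denied"


dc1 = DomainController("DC1", password="old-secret")
pdc = DomainController("PDC", password="new-secret")
first = authenticate(dc1, pdc, "new-secret", threshold=3)
results = [authenticate(dc1, pdc, "wrong", threshold=3) for _ in range(3)]
print(first, results, dc1.bad_pwd_count, pdc.bad_pwd_count)
```

The first logon succeeds through chaining even though DC1 still holds the old password; the three wrong attempts raise the counter on both domain controllers until the threshold is reached.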

Authentication requests arrive at the domain controller differently depending on the authentication protocol being used.

When using NTLM to access an application server such as a file server, it is the file server that validates the user’s domain credentials. To achieve this, it passes the authentication request to its authenticating domain controller. If the user account or password is incorrect or does not match the copy held by that replica domain controller, the replica domain controller chains the authentication request to the PDC or PDC emulator for validation. If the password does not match the one on the PDC, both the authenticating domain controller and the PDC increment their badPwdCount. This process continues until both domain controllers’ badPwdCount for that user account reaches the account lockout threshold value that is defined in Group Policy.

When using Kerberos to authenticate, the client tries to access resources on an application server. However, the client must first contact a KDC to acquire a session ticket for the application server. If the client TGT has expired, the client will have to reauthenticate with the KDC. If a bad password is sent to the KDC, the KDC will chain the authentication request to the PDC for validation. If the PDC rejects the password, then the badPwdCount on both domain controllers is incremented. This process is repeated each time a bad password is sent until the account becomes locked out.

Immediate Replication

When a password is changed, it is "pushed" to the primary domain controller (PDC). "Pushed" means that the password is sent over NETLOGON's secure channel to the PDC. Specifically, the domain controller receiving the password change makes a remote procedure call (RPC) to the PDC, which indicates the user and the user’s new password. The PDC then sets this value locally. This push mechanism, also known as “immediate replication,” is independent of Active Directory replication.

Urgent Replication

Urgent replication (used on password changes, for example) consists of the domain controller processing the originating update sending an urgent notification to its intrasite replication partners indicating that there are changes available for them to pull. Normally, such a notification following the originating update would be sent after a certain interval governed by the two “Notification Delay” related per-domain controller registry key values. The default “Notification Delay” in Windows 2000 is five minutes (it is 15 seconds in Windows Server 2003). However, for urgent changes like a password change, these delay parameters are ignored and an urgent notification is sent instead.

Even though the replication partner domain controller may make a request for changes immediately in response to such a notification, the changes are still replicated in a single replication stream (that is, there is no out-of-band stream by which the password changes are replicated ahead of other pending changes made prior to the password change). Only the notification sent to partner domain controllers is urgent, but the subsequent pulling of changes by the partner domain controller will pull all changes from the source domain controller up to that time, and not selectively for the change for which the urgent notification was sent.

When an administrator (or a delegated user) unlocks an account, manually sets password expiration on a user account (by selecting the User Must Change Password at Next Logon check box), or resets the password on an account, these attributes are immediately replicated to the primary domain controller (PDC emulator), and then urgently replicated to other domain controllers in that site. By default, urgent replication does not occur across site boundaries. Therefore, it is highly recommended that administrators make manual user account resets and password changes on a domain controller that resides in the same local site as the user.

Single User Object “On Demand” Replication

In Windows 2000 pre-Service Pack 4, the following may occur:

An administrator resets a password and sets “User must change password at next logon” on a domain controller in site A (so the user is given a new password but forced to change it at first logon).

If the user logs on with that new password in site B, the logon succeeds (due to the chaining during authentication), but the subsequent enforced password change fails because domain controllers in site B do not know the new password (owing to replication latency).

The change in SP4 for Windows 2000 mitigates the problem by implementing an "on demand" replication scheme, which works as follows. On a domain controller, when an authentication succeeds due to PDC authentication chaining, an asynchronous request is made to the PDC to replicate one single object (the user object whose authentication just succeeded due to PDC chaining). The idea is that the PDC has the most up-to-date password, and the authenticating domain controller should request it as soon as it learns that the PDC holds a more up-to-date version.

Password History Check

Windows 2000 SP4 and Windows Server 2003 introduce a password history check. Before Windows increments a badPwdCount, it checks the failed password against the password history of that user. If the password matches one of the last two entries in the password history, the badPwdCount is not incremented, for either the NTLM or the Kerberos protocol. This change reduces the number of lockouts that occur because of user or application error.
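The history check can be expressed as a one-line predicate. The sketch assumes the history is available as a list of previous passwords ordered from most recent to oldest and compares plain strings for readability; the function name and sample passwords are hypothetical.

```python
def should_increment_bad_pwd_count(failed_password, history):
    """Return False when the failed password matches one of the two most
    recent history entries, so that a stale password does not count."""
    return failed_password not in history[:2]


# Hypothetical history, most recent previous password first.
history = ["Spring2006!", "Winter2005!", "Autumn2005!"]
print(should_increment_bad_pwd_count("Winter2005!", history))  # recent old password
print(should_increment_bad_pwd_count("Autumn2005!", history))  # older than two entries
print(should_increment_bad_pwd_count("typo", history))         # never a valid password
```

An application that keeps retrying with the password the user changed yesterday therefore fails to authenticate but does not push the account toward lockout.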




Lockout Sources

• Applications
• Service accounts
• Bad password threshold set too low
• User logging on to multiple computers
• Scheduled tasks
• Persistent drive mappings
• Active Directory replication
• Disconnected terminal server sessions
• Account lockout for remote connections
• Internet Information Services (IIS)
• Microsoft Exchange/Microsoft Outlook

To avoid unnecessary lockouts, the administrator should check each computer on which a lockout occurs for the following problems.

Applications

Many applications will cache credentials or keep active threads with credentials after a change in password, resulting in the old password continuing to be used.

Service Accounts

Service Account passwords are cached by Service Control Manager (SCM) on member computers and domain controllers in the forest. Resetting the password for a service account without resetting the password in SCM will cause account lockouts of the service account. Look for a pattern in Netlogon and event logs from individual clients as they retry logon authentication using the previous password.

Bad Password Threshold Set Too Low

This is probably the most common configuration issue. Many organizations have the setting at three or five attempts. By keeping this value too low, erroneous lockouts will take place. The recommended value for this policy setting is 10.

User Logging on to Multiple Computers

If a user is concurrently logged on to multiple computers, the threads of network applications running on those computers may run in the context of that locally logged on user when accessing resources in the domain. If this user changes his/her password on one of the computers, applications running on the other computers will still use the original password. As those applications authenticate when accessing network resources, the old password is still being used, and the user’s account becomes locked. When changing the password, log off from all consoles including Terminal Service sessions, change the password from a single console, and log off there as well.

Scheduled Tasks Scheduled processes may have been configured to start using credentials that have since expired.

Persistent Drive Mappings Persistent drives may have been mapped using credentials that have since expired. The simplest way to ensure current credentials are used is to cancel and re-establish the mapping. Persistent Net Use shares are often the cause of users locking themselves out accidentally. When explicit credentials are entered while connecting to a share, the credential is not persistent unless it is explicitly saved in Stored User Names and Passwords, whereas the mapping is consistent. Every time the user logs off, logs on, or reboots, Windows attempts to restore the connection, and the authentication attempt fails because there are no stored credentials. This increments the badPwdCount attribute. To avoid this problem, configure Net Use not to make connections persistent. To do this, type net use /persistent:no at a command prompt.

Active Directory Replication User properties need to replicate between domain controllers. Any delays in the replication can result in the password on a domain controller not being current. It is important to verify that proper Active Directory replication is taking place within the domain.

Disconnected Terminal Server Sessions Disconnected sessions may be running a process that is using credentials or a mapped drive. A disconnected session can have the exact same effect as a user with multiple interactive logons. The only difference is the source of the lockout comes from a Terminal Server.

Service Accounts By default, many computer services are configured to start and log on using the “Local System” account. However, a service logon account can be manually configured to log on using a specific user account and password. If a service is configured to start with a specific user account and that user later changes his or her password, the service logon property must be updated with the new password, or the service may lock out that user's account.

Account Lockout for Remote Connections The Active Directory account lockout feature that is discussed in this section is independent of the account lockout feature for remote connections.




Reference: For more information, see the following Knowledge Base article: 310302 “HOW TO: Configure Remote Access Client Account Lockout in Windows 2000.”

Internet Information Services By default, Internet Information Services (IIS) has a token-caching mechanism in place in which user account authentication information is cached locally on the server running IIS. If lockouts are specific to users accessing Microsoft Exchange mailboxes via Microsoft Office Outlook® Web Access and IIS, resetting the token cache on IIS may resolve the lockout problem.

Reference: For more information, see the following Knowledge Base article: 173658 “Mailbox Access via OWA Depends on IIS Token Cache.”

Microsoft Exchange and Microsoft Outlook Account Lockout Outlook clients may have multiple bindings that they use which may impact a low “Account Lockout Threshold” setting. To reduce the number of attempts, remove some of the bindings located in the following registry setting or increase the threshold.

HKLM\SOFTWARE\Microsoft\Exchange\Exchange Provider

Reference: For more information, see the following Knowledge Base article: 163576 “Changing the RPC Binding Order.”




Common Causes of Logon Failures

• Error messages appearing during logon
• Secure channel
• Name resolution
• User rights
• GINA
• CrashOnAuditFail


"The system cannot log you on to this domain because the system's computer account in its primary domain is missing or the password on that account is incorrect.”

Use Netdom or Nltest to verify/reset the secure channel between the computer and the domain controller.

Alternately, you can also remove the computer from the domain and re-add it if it is a member server or workstation.

Verify security settings on the client and domain controllers:

• LM compatibility • SMB signing • Strong secure channel key settings

Perform a network trace, at a minimum capturing the client-side traffic. Verify that the domain controller being contacted is valid and functional. Look for other abnormalities. Compare the capture with known good logon traffic.

"The system could not log you on. Make sure your User name and domain are correct, and then type your password again. Letters in passwords must be typed using the correct case. Make sure that Caps Lock is not accidentally on."

Verify Name Resolution




• Windows NT 4.0 domain or Non-Kerberos trust using NetBIOS domain name: Test client logon by adding appropriate entries to the LMHOSTS file.

• Active Directory domain: Verify DNS records with tools such as nslookup. Verify the user has the "Access this computer from the network" user right defined on the domain controllers.

Verify Sysvol/Netlogon shares exist on target domain controller.

Verify time synchronization between client and DC.

Check client for firewall applications—for example Surf Control, BlackIce, and Zone Alarm.

"The local policy of this system does not permit you to logon interactively"

Grant the desired user the right to logon locally. This can be done remotely with NTRights.exe, or modify Group Policy.

Check for misconfigured policy using the Winlogon.log.

Reference: For more information, see the following Knowledge Base article: 245422 "How to Enable Logging for Security Configuration Client Processing."

Check for viruses, Trojans, MiRC, and so forth. Check scheduled tasks and Run keys.

No error - after applying policy the user is returned back to the ctrl-alt-del screen.

Access the registry: (You can do this remotely, via parallel install, or via Windows PE.) Verify that you have a correct path for the UserInit value in:

HKLM\Software\Microsoft\Windows NT\CurrentVersion\Winlogon\

Check for/Remove any third-party GINAs by verifying that the GinaDLL value is either not visible, or is set to MSGina.dll, and then reboot.

“Your account is configured to prevent you from using this workstation. Please try another workstation.”

Log on to the computer with an account with local administrator privileges.

Check the Registry for CrashOnAuditFail value:

HKLM\System\CurrentControlSet\Control\LSA

If the value is missing or set to 0: Check the affected domain user's properties for workstation logon restrictions.

If the value is set to 1 (policy enabled): Check the affected domain user's properties for workstation logon restrictions.

If the value is set to 2 (policy enabled and in error state): While logged on as an administrator, correct the condition that put the computer in an error state, modify the CrashOnAuditFail value, and then reboot the system.




Logon Failure due to Token Size

• If we can’t build a token, then logon fails
• Default token size of 12000 (bytes)
• Can be bloated by the following:
  • Client is a member of a large number of groups (either directly or transitively)
  • SID History
• Improved in Windows 2000 SP4 and later operating systems
  • Token size not increased but more efficient storage
  • Maximum token size increased to 65535 bytes
• Updated NTDSUTIL can assist in troubleshooting
• Tokensz.exe tool can display token sizes


An access token is created when a security principal (ex: user) logs on to a computer or attempts to access a resource. The access token contains information about the identity and privileges associated with the security principal. Every process has a token that describes the security context of the principal’s account associated with the process.

Every security principal such as a user, group, computer, or domain controller is issued a unique Security Identifier (SID) when the object is created. An example of a SID is shown below:

S-1-5-21-1177238915-1614895754-839522115-1016
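The structure of the example SID above can be illustrated with a short sketch. It splits the string form into a revision, an identifier authority, and a list of sub-authorities, and treats the last sub-authority as the relative identifier (RID). The class and function names are hypothetical, and the sketch handles only the decimal string form shown here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ParsedSid:
    revision: int
    identifier_authority: int
    sub_authorities: List[int]

    @property
    def relative_id(self) -> int:
        # The final sub-authority identifies the account within its domain.
        return self.sub_authorities[-1]


def parse_sid(sid: str) -> ParsedSid:
    """Parse a SID in string form, for example S-1-5-21-...-1016."""
    parts = sid.split("-")
    # Require "S", a revision, an authority, and at least one sub-authority.
    if len(parts) < 4 or parts[0] != "S":
        raise ValueError(f"not a SID string: {sid!r}")
    revision = int(parts[1])
    authority = int(parts[2])
    sub_authorities = [int(p) for p in parts[3:]]
    return ParsedSid(revision, authority, sub_authorities)


sid = parse_sid("S-1-5-21-1177238915-1614895754-839522115-1016")
print(sid.revision, sid.identifier_authority, sid.relative_id)
```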

This SID is used as an identifier for the security principal, regardless of whether it is a user, computer account, or group. It is this identifier that is placed in the access token when the token is generated by Lsass during a logon. The access token includes not only the SID of the security principal itself but also the SIDs of any groups of which it is a member. An access token is created when a user logs on in the following manner:

When a user logs on interactively or tries to make a network connection to a computer running Windows, the user’s logon credentials are authenticated. If the authentication succeeds, the logon process returns a SID for the user and a list of SIDs for the user’s security group membership.




The Local Security Authority (LSA) on the computer uses this information to create an access token that includes the SIDs returned by the logon process. The token also includes a list of privileges assigned by the local security policy to the user and the user’s security groups. The LSA uses a process called token evaluation to determine which security groups to include in the token.

Several factors can affect the outcome of the token evaluation process, including the following:

• Whether the token is issued for logon purposes or for resource access.
• The groups of which the principal is a member, including direct and transitive memberships.
• The types of groups involved.

There are two types of groups in Active Directory: distribution groups and security groups. Distribution groups are not included in the principal's token, but all security groups are included. All group scopes (universal, global, domain local, machine local, and built-in) are included in the token evaluation.

• The functional level (for Windows Server 2003) or the domain mode (for Windows 2000 Server).

The token evaluation process evaluates groups recursively. For example, if User A is a member of Group 1 and Group 1 is a member of Group 2, then a token generated for User A contains SIDs representing both Group 1 and Group 2. In native mode and higher domains, universal, global, and domain local groups are all evaluated recursively. Universal security groups do not exist in mixed mode domains.
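The recursive evaluation described above can be sketched as a graph traversal. This is an illustration of the idea only, not the LSA's actual algorithm: the member_of dictionary is a hypothetical stand-in for directory lookups, and group type, scope, and domain mode are ignored.

```python
from typing import Dict, Set


def expand_groups(principal: str, member_of: Dict[str, Set[str]]) -> Set[str]:
    """Return every group the principal belongs to, directly or transitively.

    member_of maps a principal or group name to the groups it is a direct
    member of (hypothetical data structure for illustration).
    """
    seen: Set[str] = set()
    stack = list(member_of.get(principal, ()))
    while stack:
        group = stack.pop()
        if group in seen:
            continue  # already evaluated; also protects against nesting loops
        seen.add(group)
        stack.extend(member_of.get(group, ()))
    return seen


# The example from the text: User A -> Group 1 -> Group 2.
member_of = {"User A": {"Group 1"}, "Group 1": {"Group 2"}}
print(sorted(expand_groups("User A", member_of)))
```

Each name in the result corresponds to one group SID that would be a candidate for inclusion in the token.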

Default Token Sizes In the initial release of Windows 2000 the token size was set to 8,000 bytes; however, in Windows 2000 SP2 and Windows Server 2003, the size was increased to 12,000 bytes with a maximum size of 65,535 bytes. To adjust the token size on a system, use the following registry value:

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Lsa\Kerberos\Parameters

Value: MaxTokenSize

Data Type: REG_DWORD

Default Value: 12000

In most cases this value does not have to be adjusted due to the way SIDs are stored in the token. However, you can use the following formula to determine whether the current token size is sufficient:




TokenSize = 1200 + 40d + 8s

Where:

• d = The number of domain local groups of which a user is a member plus the number of universal groups outside the user's account domain plus the number of groups represented in security ID (SID) history.

• s = The number of security global groups of which a user is a member plus the number of universal groups in a user's account domain.

• 1200 = The estimated value for ticket overhead. This value can vary depending on factors such as DNS domain name length, client name, and other factors.

If the value you calculate for Tokensize using the above formula is less than 12,000 bytes, then modification of the MaxTokenSize Registry entry is not required.
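The formula above can be applied with a short helper. This is a minimal sketch: the function names are hypothetical, the overhead defaults to the estimated 1200 bytes from the formula, and the group counts in the usage lines are made-up values for illustration.

```python
DEFAULT_MAX_TOKEN_SIZE = 12000  # default MaxTokenSize stated in the text


def estimate_token_size(d: int, s: int, overhead: int = 1200) -> int:
    """Apply TokenSize = 1200 + 40d + 8s.

    d: domain local groups + universal groups outside the account domain
       + groups represented in SID history
    s: security global groups + universal groups in the account domain
    """
    return overhead + 40 * d + 8 * s


def needs_larger_max_token_size(d: int, s: int,
                                max_token_size: int = DEFAULT_MAX_TOKEN_SIZE) -> bool:
    # No change is required while the estimate stays below the configured size.
    return estimate_token_size(d, s) >= max_token_size


print(estimate_token_size(d=50, s=100))           # 1200 + 2000 + 800
print(needs_larger_max_token_size(d=50, s=100))
print(needs_larger_max_token_size(d=250, s=200))  # 1200 + 10000 + 1600
```

Because each entry counted in d costs five times as much as an entry counted in s, domain local groups and SID history dominate the estimate.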

How the Access Token Limit Is Reached When a user logs on and authentication is successful, the logon process returns a SID for the user and a list of SIDs for the user’s security groups and these comprise the access token. SID history can add additional SIDs to the token. The SIDs in an access token include:

• The security principal's SID, including SIDs from the SID history of the principal.
• The SID from each domain local group of which the principal is directly or transitively a member, for the domain of the workstation or resource.
• The SID for each global group of which the principal is directly or transitively a member, including SIDs from the SID history of the group.
• The SID for each universal group of which the principal is directly or transitively a member, including SIDs from the SID history of the group.
• The SID for each built-in group of which the principal is directly or transitively a member.
• The SID for each local group of which the principal is directly or transitively a member.

Due to a system limitation, the field that contains the SIDs of the principal's group memberships in the access token can contain a maximum of 1,024 SIDs. If there are more than 1,024 SIDs in the principal's access token, the Local Security Authority (LSA) cannot create an access token for the principal during the logon attempt. If this happens, the principal cannot log on or access resources.




In environments that use SID history, each security principal can have two or more SIDs. An additional SID is optionally added to the sIDHistory attribute when a security principal is migrated. Since groups, as well as users, can have SID history, the token of a migrated user with migrated groups can potentially have double the number of SIDs compared to a user that is not migrated.
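The effect of SID history on the 1,024-SID limit can be sketched as a simple count. This is an illustrative model only: it assumes every SID history entry counts toward the same limit as the group SIDs, the function name is hypothetical, and the SIDs in the usage lines are made-up values.

```python
from typing import Iterable

MAX_TOKEN_SIDS = 1024  # limit stated in the text


def exceeds_sid_limit(group_sids: Iterable[str],
                      sid_history: Iterable[str] = ()) -> bool:
    """Return True if the combined, de-duplicated SIDs exceed the limit."""
    unique = set(group_sids) | set(sid_history)
    return len(unique) > MAX_TOKEN_SIDS


# A user in 600 migrated groups, each carrying one SID from the old domain.
current = [f"S-1-5-21-1-2-3-{rid}" for rid in range(1000, 1600)]
history = [f"S-1-5-21-4-5-6-{rid}" for rid in range(1000, 1600)]

print(exceeds_sid_limit(current))           # 600 SIDs
print(exceeds_sid_limit(current, history))  # 1200 SIDs
```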

Note: To reduce the token size of migrated users, ensure that your migration plans include security translation and retirement of the sIDHistory attribute, when possible.

Symptoms of an Access Token Limitation Problem When a user or a computer tries to log on or access a resource, the system exhibits different symptoms if the access token limitation is reached. For users, the symptoms depend on the source of the limitation. For computers, the symptoms depend on the role of the computer, specifically, domain controllers.

User Account Symptoms A user might experience one of the following symptoms if the access token limitation is reached:

• Scenario 1: A user is denied logon to a computer that is located in the same domain as the user.

A user who is not able to log on due to the access token limitation receives the following error message:

In this case, the user, the user's computer, the account domain controller, and global catalog server contribute SIDs in the token evaluation process. The groups that can potentially contribute to the cause of the access token limitation problem are domain local, global, or universal groups.

• Scenario 2: A user cannot access a resource located in the account domain. The user receives an error message similar to the message for logon denial. In this case, in addition to the user and user's computer, the resource server, domain controller, and global catalog server contribute SIDs. If the user can log on successfully, machine local groups on the resource server are most likely causing the access token limitation problem.

• Scenario 3: A user cannot access a resource in another domain. In this case, in addition to the user and user's computer, the resource server, the domain controller in the account domain, the domain controller in the resource domain, and the global catalog server contribute SIDs in the token evaluation process. If the user can log on, the domain local groups in the resource domain, or local machine groups on the resource server might be causing the problem.

Performing a Preliminary Analysis You can run the Group Membership Evaluation task of the Ntdsutil.exe tool for a user, group, computer, or domain controller. Before you run the tool, begin your problem analysis by determining the scope of the affected principals. You can start the problem analysis by answering the following questions:

• Is it indeed a problem caused by exceeding the access token limitation?
• Is this happening to one security principal or many?
• If there are multiple principals experiencing this problem, what are the common patterns among them?
• Do all affected principals belong to the same domain?
• Do all principals belong to the same forest, but different domains?
• Do all principals belong to a common business unit, organizational unit (OU), location, or other logical grouping?

Answering these questions can help direct your effort in locating the source of the access token limitation problem. For more information on creating and analyzing the reports, consult the document “Addressing Problems Due to Access Token Limitation” at the following link:

http://www.microsoft.com/downloads/details.aspx?familyid=22dd9251-0781-42e6-9346-89d577a3e74a&displaylang=en

Tokensz: Kerberos Token Size Tool The Tokensz tool can calculate the size of the token and help determine whether Kerberos errors stem from reaching the maximum token size. The tool simulates an authentication request and reports the size of the resulting Kerberos token along with the maximum supported size for the token.

Kerberos Token size (tokensz.exe) is a command line tool that you can use to view the maximum Kerberos token size for a given account. For more information and where to download the tool please consult the “Troubleshooting Kerberos Errors” paper on Microsoft TechNet center:

http://www.microsoft.com/technet/prodtechnol/windowsserver2003/technologies/security/tkerberr.mspx#E2HAC




Other Logon Failures

• Logon failures due to replication
  • Time delays
  • Duplicate accounts
• Time and date are not synchronized
• Smart Card logon is required
• Protocol bindings
• Digest authentication


Logon Failures Due to Replication Time delays in Active Directory multi-master replication play a critical role in logon, and make it difficult to identify the domain controller to which changes will be directed. Something as simple as creating a user account—and not waiting or forcing replication to occur—may cause logon failures. Errors can occur if the logon is directed to a domain controller where the new account has not been replicated.

Another related problem is the creation of two duplicate accounts at the same time. Because no two objects can have the same name, one object will be renamed using the object’s GUID. For example, if the user account named “Alice” was created simultaneously on two domain controllers, they would both appear as “Alice” in the Active Directory Users and Computers snap-in until replication took place.

Reference: For more information, see Active Directory Replication later in this workbook.

Time and Date Are Not Synchronized When attempting to log on the network, users may receive the following error message:

The system cannot log you on due to the following error: There is a time difference between the Client and Server. Please try again or consult your system administrator.

This behavior can occur if the time or date is not synchronized between the user’s computer and the domain to which he or she is attempting to log on. If the client computer's time or date is not synchronized with the authenticating domain controller, Kerberos validation does not succeed. This occurs because of the variation in the time stamps between the client and server. Because Kerberos is the only form of logon authentication between two Active Directory-based computers, the logon does not succeed.

To resolve this issue, log on to the computer locally using an account with administrative privileges, and set the date, time, and time zone to match those on the domain controller that validates the logon.
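The comparison behind this failure can be sketched as follows. The 5-minute tolerance is the default maximum clock skew in Windows Kerberos policy; that figure is not stated in this section and can be changed through policy. The function name is hypothetical, and both times are assumed to be expressed in the same time base (for example, UTC).

```python
from datetime import datetime, timedelta

DEFAULT_MAX_SKEW = timedelta(minutes=5)  # assumption: default policy value


def clock_skew_ok(client_time: datetime, dc_time: datetime,
                  max_skew: timedelta = DEFAULT_MAX_SKEW) -> bool:
    """Return True if the client and DC clocks are within the tolerance."""
    return abs(client_time - dc_time) <= max_skew


dc = datetime(2006, 1, 1, 12, 0, 0)
print(clock_skew_ok(datetime(2006, 1, 1, 12, 3, 0), dc))  # 3 minutes apart
print(clock_skew_ok(datetime(2006, 1, 1, 13, 0, 0), dc))  # 1 hour apart
```

A one-hour difference, such as one caused by a wrong time zone or daylight saving setting, falls well outside the tolerance even though the displayed local times may look plausible.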

Smart Card Logon Is Required When attempting to log on with a password, users may receive the following error message:

Your account has been disabled. Please see your system administrator.

This behavior can occur if the account is configured to allow only smart card logons but a user attempts to log on with a password. Each user account object contains a “User must logon using a smart card” option. If this option is selected and a logon is attempted without using a smart card, the error message listed above appears even though the account is not actually disabled.

It is not possible to log on without using a smart card until the administrator removes this restriction from the user account.

Because of certain protocol limitations, a more accurate error message (such as “Smart card is required”) cannot be returned to the client. Instead, an error message indicating that the account is disabled is displayed. In environments in which smart cards are used, verify that this option has not been enabled, or that the smart card is being used to log on.

Protocol Bindings The order of protocol bindings and the use of client redirectors are still factors in terms of a successful logon. Each setting can provide its own unique type of failure. Also, because there is support for working “Offline” in Windows 2000 and later, the connectoid for the LAN connection can be temporarily disabled, which will also cause network logon failures similar to unplugging the network cable.

Digest Authentication Digest authentication in IIS addresses many of the weaknesses of Basic authentication. The password is not in clear text when Digest authentication is used. In addition, Digest authentication can work through proxy servers, unlike integrated Windows authentication. In some configurations user accounts must be configured to have the “Save password as encrypted clear text” option enabled. This is an option on each user object in the Active Directory. Setting this option requires the password to be reset or re-entered.




Pre-Windows 2000 Compatible Access

• Anonymous access
• Legacy applications
• DCpromo option
• Pre-Microsoft Windows 2000 compatible
  • Anonymous (Microsoft Windows Server 2003)
  • Everyone (Windows 2000 & Windows Server 2003)


To maximize security, by default, Active Directory does not allow accounts logged on with Anonymous access the ability to view group memberships and other user and group information. Windows NT 4.0 did allow this degree of access. Several existing applications, including Microsoft BackOffice® applications like Microsoft SQL Server™, as well as some third-party applications, depend on this type of access to function correctly.

To allow administrators to choose between the stronger security provided by Active Directory and the ability to continue to use the security required for legacy applications, Windows 2000 and later includes the “Builtin” local security group “Pre-Windows 2000 Compatible Access.” In Windows 2000, adding or removing the special group Everyone as a member of this group and then rebooting the domain controllers in that domain allows the network to operate either with pre-Windows 2000 security levels or with the greater security provided by Active Directory. In Windows Server 2003 it is necessary to add both the Everyone group and Anonymous account.

To provide a clean and simple upgrade path from Windows NT, the Active Directory Installation wizard offers the choice between Permissions compatible with pre-Windows 2000 servers, which provides the security level compatible with some pre-Windows 2000 applications and Permissions compatible only with Windows 2000 or Windows Server 2003.




If Permissions compatible only with Windows 2000 or Windows Server 2003 is chosen while promoting a domain controller, and applications are not functioning correctly, try resolving the problem by adding the special group Everyone and the Anonymous account to the Pre-Windows 2000 Compatible Access security group and rebooting the domain controllers in the domain. Once the upgrade to applications compatible with Windows 2000 or later is completed, administrators should return to the more secure configuration by removing Everyone and Anonymous from the Pre-Windows 2000 Compatible Access security group and rebooting the domain controllers in the affected domain.




Section 2: Logon Failure Troubleshooting Tools

• Understand the benefits of the secondary logon service.
• Configure the Kerberos Service from the Registry.
• Utilize the Kerberos management tools.
• Utilize Account Lockout troubleshooting tools.


Introduction There are a number of tools for troubleshooting logon failures.


• Understand the benefits of the secondary logon service.
• Configure the Kerberos Service from the Registry.
• Utilize the Kerberos management tools.
• Utilize Account Lockout troubleshooting tools.

Related Topics Covered in This Lesson
• Authentication
• Kerberos
• Account Lockout

Recommended Reading
• Account Lockout training




Demo: Kerbtray

• GUI tool to display local ticket information
• Can “list” and “purge” tickets

Kerbtray

Kerberos Tray is a graphical user interface (GUI) tool that displays some local ticket information for an Active Directory-based computer.

The KerbTray icon will be located in the status area of your desktop and can be used to view and purge the ticket cache. Positioning the mouse cursor over the KerbTray icon will display the time left on the initial TGT before it expires. The icon will also change in the last hour of life before the Local Security Authority (LSA) renews the ticket.

Note: The initial TGT is the ticket you received when you first logged onto the Active Directory domain with the account.

Using KerbTray Double-clicking will bring up a list of tickets obtained since logon. Right-clicking the icon will bring up a menu. Selecting List Tickets will display the same dialog as a double click.

List Tickets lists all tickets you have obtained since logon.

Purge Tickets will destroy all tickets that you have cached. New tickets are acquired the next time Kerberos services are used.




The KerbTray dialog box comprises the following sections:

• The top section lists the name of your Kerberos client principal associated with your domain account.

• The scrolling list contains domains and tickets for services that have been used since logon that are still active. Select an item here, and its properties are displayed in the remaining sections of the dialog.

• The middle section lists the service principal. This name is the target principal name for the selected ticket from the domain list.

• The bottom section is a set of property pages (Names, Times, Flags, and Encryption types) that describe attributes of the ticket selected in the scrolling list. Only unexpired tickets show attributes.

Names Tab Option Description

Client name Requestor of the ticket. In most cases this is your client principal name.

Service name Canonical name of the account principal for the service. This is the same as the samAccountName property in the directory for that account. A ticket-granting ticket (TGT) is a ticket for the key distribution center (KDC) service. The "initial" TGT is the TGT that you got when you logged on for the domain with your account. The service name for a TGT is krbtgt.

Target name Service name for which the ticket was requested. This is the name of a servicePrincipalName property on an account in the directory.

Times Tab Option Description

Start time Time from which the ticket is valid.

End time Time until which the ticket is valid. Once a ticket is past this time, it can no longer be used to authenticate to a service.

Renew until If the ticket is a renewable ticket, then this is the maximum lifetime of the ticket. In order to continue using a ticket it must be renewed. Tickets must be renewed before both the End time and Renew until times expire.
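The relationship between the three times above can be sketched as a small check. This is an illustration of the rules as described in the table, not KerbTray's or the KDC's implementation; the function name is hypothetical, and the dates in the usage lines are made-up values.

```python
from datetime import datetime
from typing import Optional, Tuple


def ticket_status(now: datetime, start: datetime, end: datetime,
                  renew_until: Optional[datetime] = None) -> Tuple[bool, bool]:
    """Return (usable, renewable) for a ticket at the given moment.

    usable:    now lies between Start time and End time.
    renewable: the ticket is still usable and Renew until has not passed.
               Pass renew_until=None for a non-renewable ticket.
    """
    usable = start <= now <= end
    renewable = usable and renew_until is not None and now <= renew_until
    return usable, renewable


start = datetime(2006, 1, 1, 8, 0)
end = datetime(2006, 1, 1, 18, 0)
renew_until = datetime(2006, 1, 8, 8, 0)

print(ticket_status(datetime(2006, 1, 1, 12, 0), start, end, renew_until))
print(ticket_status(datetime(2006, 1, 1, 19, 0), start, end, renew_until))
```

The second call shows why renewal must happen before the End time: once the ticket has expired it can no longer be renewed, even though the Renew until time is still in the future.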

Encryption types Tab Option Description

Ticket Encryption Type Encryption type used to encrypt the Kerberos ticket.

Key Encryption Type Encryption type the enclosed session key will be used with.




The following Kerberos ticket flags may be set:

Flags tab Option Description

Forwardable This flag allows for authentication forwarding without requiring the user to enter a password again.

Forwarded This flag is set by the ticket-granting service (TGS) when a client presents a ticket with the FORWARDABLE flag set and requests it be set by specifying the FORWARDED key distribution center (KDC) option and supplying a set of addresses for the new ticket. It is also set in all tickets issued based on tickets with the FORWARDED flag set.

Proxiable This flag allows a client to pass a proxy to a server to perform a remote request on its behalf. When set, this flag tells the ticket-granting service (TGS) that it can issue a new ticket, but not a ticket-granting ticket (TGT), with a different network address based on this ticket.

Proxy This flag is set in a ticket by the ticket-granting service (TGS) when it issues a proxy ticket. Application servers may check this flag and require additional authentication from the agent presenting the proxy in order to provide an audit trail.

May Postdate This flag must be set in a ticket-granting ticket (TGT) in order to issue a postdated ticket based on the presented ticket.

Postdated This flag indicates a ticket has been postdated. Postdated tickets provide a way to obtain these tickets from the key distribution center (KDC) at job submission time, but leave them "dormant" until they are activated and validated by a further request of the KDC. When the KDC issues a POSTDATED ticket, it will also be marked as INVALID, so that the application client must present the ticket to the KDC to be validated before use.

Invalid This flag indicates the ticket is invalid (not valid). A postdated ticket will usually be issued in this form. Invalid tickets must be validated by the key distribution center (KDC) before use. Tickets are presented to the KDC in a ticket-granting server (TGS) request with the VALIDATE option specified. The KDC will only validate tickets after their start time has passed.

Initial This flag indicates the ticket was issued using the Authentication Service protocol and not issued based on a ticket-granting ticket (TGT).

Renewable This flag allows the ticket holder to maintain a valid ticket for long periods of time. Renewable tickets have two "expiration times": the first is when the current instance of the ticket expires, and the second is the latest permissible value for an individual expiration time.

HW Authenticated This flag provides additional information about the initial authentication, regardless of whether the current ticket was issued directly, in which case INITIAL will also be set, or issued on the basis of a ticket-granting ticket (TGT), in which case the INITIAL flag is clear.

Pre-authenticated This flag provides additional information about the initial authentication, regardless of whether the current ticket was issued directly, in which case INITIAL will also be set, or issued on the basis of a ticket-granting ticket (TGT), in which case the INITIAL flag is clear.




OK as delegate This flag indicates that the server (not the client) specified in the ticket has been determined by policy of the realm to be a suitable recipient of delegation. Windows 2000 will only forward the user's credentials to services that are "OK as delegate".

File Required • Kerbtray.exe

Source • Resource Kit

Reference: For more information, see RFC-1510, The Kerberos Network Authentication Service (v5).




Klist

• Command line tool for viewing and deleting Kerberos tickets
• klist [-?] [tickets | tgt | purge]

Klist

Klist Kerberos List is a command-line tool that enables you to view and delete Kerberos tickets granted to the current logon session. To use this tool and see any tickets, the computer must be joined to an Active Directory domain and you must be logged on with a domain account.

Kerberos List from a client shows:

• TGT to a Windows Kerberos KDC.
• TGT to Ksserver on UNIX.

Warning! Deleting Kerberos tickets can disable functionality for the current logon session.

Kerberos List Syntax

klist [-?] [tickets | tgt | purge]

Where:

-?

Displays command-line help.

tickets




Lists the currently-cached tickets of services that you have authenticated to since logon. Displays the following attributes of all cached tickets:

Option Description

Server Server and domain for the ticket.

KerbTicket Encryption Type Encryption type used to encrypt the Kerberos ticket.

End Time Time the ticket becomes no longer valid. Once a ticket is past this time, it can no longer be used to authenticate to a service.

Renew Time If the ticket is a renewable ticket (see TicketFlags below), then this is the maximum lifetime of the ticket. To continue using this ticket, it must be renewed before the End Time. It can be renewed as long as the current time is before both the End Time and the RenewUntil time.

tgt

Lists the initial Kerberos ticket-granting-ticket (TGT). Displays the following attributes of the currently-cached ticket:

Option Description

ServiceName A TGT (ticket-granting-ticket) is a ticket for the KDC service. The service name for a TGT is krbtgt.

TargetName Service name the ticket was requested for. This is the name of a servicePrincipalName property on an account in the directory.

FullServiceName Canonical name of the account principal for the service.

DomainName The domain name of the service.

TargetDomainName For a cross-realm ticket, this is the realm in which the ticket is good instead of the issuing realm.

AltTargetDomainName The name supplied to InitializeSecurityContext that generated this ticket, usually an SPN.

TicketFlags Kerberos ticket flags set on the current ticket in hexadecimal. The KerbTray tool displays these flags visually in the Flags tab.

KeyExpirationTime The key expiration time from the KDC reply.

Start time Time the ticket becomes valid.

End Time Time the ticket becomes no longer valid. Once a ticket is past this time, it can no longer be used to authenticate to a service.

RenewUntil If the ticket is a renewable ticket (see TicketFlags), then this is the maximum lifetime of the ticket. In order to continue using a ticket it must be renewed. Tickets must be renewed before both the End Time and RenewUntil times expire.

TimeSkew The reported time difference between the client computer and the server computer for a ticket.
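The TicketFlags value is easier to read once it is decoded into flag names. The following is a minimal sketch, assuming the flag bit positions defined in RFC 1510 (bit 0 is the most significant bit of the 32-bit field); the function name decode_ticket_flags is illustrative and is not part of Klist.

```python
# Kerberos ticket flag bits as laid out in RFC 1510. Bit 0 is the most
# significant bit of the 32-bit flags field, so "forwardable" (bit 1)
# corresponds to 0x40000000.
TICKET_FLAGS = {
    0x40000000: "forwardable",
    0x20000000: "forwarded",
    0x10000000: "proxiable",
    0x08000000: "proxy",
    0x04000000: "may-postdate",
    0x02000000: "postdated",
    0x01000000: "invalid",
    0x00800000: "renewable",
    0x00400000: "initial",
    0x00200000: "pre-authent",
    0x00100000: "hw-authent",
}


def decode_ticket_flags(value):
    """Return the names of the flags that are set in a TicketFlags value."""
    return [name for bit, name in TICKET_FLAGS.items() if value & bit]


# Hypothetical helper name; 0x40e00000 is the value used in the klist tgt example.
print(decode_ticket_flags(0x40E00000))
```

A ticket whose decoded flags do not include renewable cannot be renewed, so its RenewUntil value is not meaningful.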

purge




Purge allows you to delete a specific ticket. Purge will destroy all tickets that you have cached, so use this with caution. It might stop you from being able to authenticate to resources. If this happens you will have to log off and log on again.

Kerberos List Examples

List details about the current ticket. This is useful to determine if you have a valid ticket to the krbtgt/[email protected] principal.

klist tgt

Cached TGT:

ServiceName: krbtgt
TargetName: krbtgt
FullServiceName: imauser
DomainName: CONTOSO.COM
TargetDomainName: CONTOSO.COM
AltTargetDomainName: CONTOSO.COM
TicketFlags: 0x40e00000
KeyExpirationTime: 0/256/1 1:00:29928
StartTime: 10/6/1999 7:43:02
EndTime: 11/5/1999 7:43:02
RenewUntil: 12/5/1999 7:43:02
TimeSkew: 12/5/1999 7:43:02

List all currently cached tickets.

klist tickets

Cached Tickets: (4)

Server: krbtgt/ [email protected]
KerbTicket Encryption Type: RSADSI RC4-HMAC(NT)
End Time: 11/5/1999 7:43:02
Renew Time: 12/5/1999 7:43:02

Server: krbtgt/ [email protected]
KerbTicket Encryption Type: RSADSI RC4-HMAC(NT)
End Time: 11/2/1999 15:31:40
Renew Time: 12/3/1999 15:31:38

Server: [email protected]
KerbTicket Encryption Type: RSADSI RC4-HMAC(NT)
End Time: 11/4/1999 5:09:08
Renew Time: 12/4/1999 8:28:53

Server: [email protected]
KerbTicket Encryption Type: RSADSI RC4-HMAC(NT)
End Time: 11/4/1999 5:09:08
Renew Time: 12/4/1999 8:28:53
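When many tickets are cached, it can help to check the End Time and Renew Time values programmatically. The following is a rough sketch that parses text in the layout of the klist tickets listing above, one "Name: value" pair per line. The server names in SAMPLE and the function names are hypothetical, and the date format is assumed to be month/day/year as in the example.

```python
from datetime import datetime

TIME_FORMAT = "%m/%d/%Y %H:%M:%S"  # assumed from the example output

# Hypothetical sample in the same layout as the klist tickets listing.
SAMPLE = """\
Server: krbtgt/EXAMPLE.COM@EXAMPLE.COM
KerbTicket Encryption Type: RSADSI RC4-HMAC(NT)
End Time: 11/5/1999 7:43:02
Renew Time: 12/5/1999 7:43:02
Server: host/server1.example.com@EXAMPLE.COM
KerbTicket Encryption Type: RSADSI RC4-HMAC(NT)
End Time: 11/4/1999 5:09:08
Renew Time: 12/4/1999 8:28:53
"""


def parse_tickets(text):
    """Split the listing into one dictionary per 'Server:' entry."""
    tickets = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Server":
            tickets.append({"Server": value})
        elif tickets and key in ("End Time", "Renew Time"):
            tickets[-1][key] = datetime.strptime(value, TIME_FORMAT)
        elif tickets:
            tickets[-1][key] = value
    return tickets


def is_usable(ticket, now):
    """A ticket can be used only before its End Time."""
    return now < ticket["End Time"]


def can_renew(ticket, now):
    """Renewal must happen before both the End Time and the Renew Time."""
    return now < ticket["End Time"] and now < ticket["Renew Time"]


tickets = parse_tickets(SAMPLE)
check_time = datetime(1999, 11, 4, 12, 0, 0)
for ticket in tickets:
    print(ticket["Server"], is_usable(ticket, check_time))
```

The renewable check here only compares times; whether a ticket is renewable at all depends on its TicketFlags.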

File Required • Klist.exe





Kerberos Registry Keys

Can be created in Group Policy or locally

HKLM\SYSTEM\CurrentControlSet\Control\Lsa\Kerberos\Parameters

Kerberos Registry Keys

This section describes registry entries related to Kerberos. In most cases, it is not necessary to change these values in the registry. Rather, they can be modified globally by using the domain Group Policy object. When testing or troubleshooting, however, it may be advisable not to make global changes to Kerberos parameters; the entries below provide the detail needed to change a setting locally while you try to resolve a problem. During the boot process, these registry entries are read and stored in global variables for use by Windows and Kerberos-aware programs.

All of the values are contained within the following registry key:

HKLM\SYSTEM\CurrentControlSet\Control\Lsa\Kerberos\Parameters

Value: SkewTime


Default Data: 5 (minutes)

This is the skew time in minutes. This is the time difference that is tolerated between one computer and the computer that you are trying to authenticate to. If you are using a checked build, the default is two hours.
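The skew check amounts to comparing two clocks against a tolerance. The following is a small sketch of that comparison; the function name within_skew is hypothetical, and treating a difference of exactly the tolerance as acceptable is an assumption.

```python
from datetime import datetime, timedelta


def within_skew(client_time, server_time, skew_minutes=5):
    """Return True if the two clocks differ by no more than the tolerance."""
    return abs(client_time - server_time) <= timedelta(minutes=skew_minutes)


server = datetime(2006, 3, 1, 9, 0, 0)
print(within_skew(datetime(2006, 3, 1, 9, 3, 0), server))   # 3 minutes apart
print(within_skew(datetime(2006, 3, 1, 8, 50, 0), server))  # 10 minutes apart
```

The comparison is symmetric, so it does not matter whether the client clock is ahead of or behind the server clock.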

Value: LogLevel





Default Data: 0

When set to 1, all Kerberos errors are logged in the system event log.

Value: MaxPacketSize


Default Data: 2,000 (bytes)

The maximum packet size, in bytes, that the operating system attempts to send by using User Datagram Protocol (UDP). When the packet size is larger than this value, TCP is used.

Value: StartupTime


Default Data: 120 (seconds)

The time to wait (the specified number of seconds) for the KDC to start before giving up.

Value: KdcWaitTime



The value passed to Winsock as a time out for selecting a response from a KDC.

Value: KdcBackoffTime



The value that is added to KERB_KDC_CALL_TIMEOUT for each successive call to a KDC in case of a retry.

Value: KdcSendRetries


Default Data: 3

The number of retry attempts a client will make in order to contact a KDC.

Value: DefaultEncryptionType





Default Data: KERB_ETYPE_RC4_HMAC_NT

The default encryption type for preauthentication. KERB_ETYPE_RC4_HMAC_OLD is the other possible value.

Value: UseSidCache

Data Type: REG_BOOL

Default Data: False

A flag that decides whether SIDs are used instead of names. SID lookups are faster for SAM at the server end.

Value: FarKdcTimeout


Default Data: 10 (minutes)

This time-out value is used to invalidate a domain controller that is in the domain controller cache for the Kerberos clients for domain controllers that are not in the same site as the client. KerbGlobalFarKdcTimeout saves this value as a TimeStamp (10000000 * 60 * number of minutes).
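The TimeStamp conversion above can be checked with a short calculation. This sketch only restates the formula given in the text (10000000 units per second, which implies 100-nanosecond units); the function name is hypothetical.

```python
UNITS_PER_SECOND = 10_000_000  # 10000000 units per second, so each unit is 100 ns


def minutes_to_timestamp(minutes):
    """Apply the formula from the text: 10000000 * 60 * number of minutes."""
    return UNITS_PER_SECOND * 60 * minutes


print(minutes_to_timestamp(10))  # default FarKdcTimeout of 10 minutes
```

For the default of 10 minutes the stored value is 6,000,000,000 units.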

Value: StronglyEncryptDatagram

Data Type: REG_BOOL

Default Data: False

A flag that decides whether to use 128-bit encryption for datagram packets.

Value: MaxReferralCount


Default Data: 6

This is the count of how many KDC referrals a client follows before giving up.

Reference: For more information, see the Knowledge Base article 837361 Kerberos protocol registry entries and KDC configuration keys in Windows Server 2003.




EventCombMT

EventCombMT is a multithreaded tool that will parse event logs from many servers at the same time, spawning a separate thread of execution for each server that is included in the search criteria. The tool allows you to:

• Define either a single Event ID or multiple Event IDs to search for. You can include a single event ID or multiple event IDs separated by spaces.

• Define a range of Event IDs to search for. The endpoints are inclusive. For example, if you want to search for all events between and including Event ID 528 and Event ID 540, you would define the range as 528 > ID < 540. This feature is useful because most applications that write to the event log use a sequential range of events.

• Limit the search to specific event logs. You can choose to search the system, application, and security logs. If executed locally at a domain controller, you can also choose to search FRS, DNS, and Active Directory logs.

• Limit the search to specific event message types. You can choose to limit the search to error, informational, warning, success audit, failure audit, or success events.

• Limit the search to specific event sources. You can choose to limit the search to events from a specific event source.




• Search for specific text within an event description. With each event, you can search for specific text. This is useful if you are trying to track specific users or groups.

Note: You cannot include search logic, such as AND, OR, or NOT, in the specific text. In addition, do not delimit text with quotes.

• Define specific time intervals to scan back from the current date and time. This allows you to limit your search to events in the past week, day, or month.
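The search criteria listed above combine as a set of filters that an event must pass. The following is a sketch of that filtering logic over in-memory records. It does not use or reproduce EventCombMT itself; the Event class, the matches function, and the sample events are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    event_id: int
    log: str          # for example "System", "Application", "Security"
    event_type: str   # for example "Error", "Warning", "Success Audit"
    source: str
    time: datetime
    description: str


def matches(event, ids=(), id_range=None, logs=(), types=(),
            source=None, text=None, since=None):
    """Return True if the event passes every filter that was supplied."""
    if ids or id_range:
        in_ids = event.event_id in ids
        # Range endpoints are inclusive.
        in_range = (id_range is not None
                    and id_range[0] <= event.event_id <= id_range[1])
        if not (in_ids or in_range):
            return False
    if logs and event.log not in logs:
        return False
    if types and event.event_type not in types:
        return False
    if source and event.source != source:
        return False
    # Plain substring match only: no AND, OR, or NOT logic.
    if text and text.lower() not in event.description.lower():
        return False
    if since and event.time < since:
        return False
    return True


now = datetime(2006, 1, 15, 12, 0, 0)
events = [
    Event(528, "Security", "Success Audit", "Security",
          now - timedelta(hours=1), "Successful Logon: User Name: alice"),
    Event(540, "Security", "Success Audit", "Security",
          now - timedelta(days=2), "Successful Network Logon: User Name: bob"),
    Event(5807, "System", "Warning", "NETLOGON",
          now - timedelta(hours=3), "Connections from clients with undefined sites"),
]

# Events 528 through 540 (inclusive) from the past day.
hits = [e.event_id for e in events
        if matches(e, id_range=(528, 540), since=now - timedelta(days=1))]
print(hits)
```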




Auditing Account Logons

Audit Account Logon Events
• Validated by a domain controller
• Appear in the Security log of the DC
• E.g. Event 672

Audit Logon Events
• Appear in the local Security log of a workstation or member server and in the Security log of a DC
• E.g. Event 528

Event ID 528 contains the logon type
• 2 = Interactive console (Windows 2000 also logs this type for Terminal Services logons)
• 3 = Network
• 10 = Remote Interactive (TS logon for XP and W2k3)

Audit Account Logon

When auditing logons, there are two Auditing policies you need to enable:

• Audit Account Logon Events
• Audit Logon Events

Using these audit policies on domain controllers, you can build a baseline of account logon behavior, which can be useful in security investigations. However, the two audit settings (Account Logon Events and Logon Events) differ, and you need to distinguish between them in order to use them successfully.

Account Logon Events When a user logs on to a domain, the logon is processed at a domain controller that can validate the account. When auditing Account Logon Events, the domain controller that authenticated the account is where the event will be logged in the Security log.

For example, a user logs on to a Windows XP workstation joined to the child.contoso.com domain, but the account used for the logon is from the contoso.com domain. In this scenario, the account logon event appears not on the domain controllers in the child.contoso.com domain but on the domain controller in contoso.com that authenticated the account.

Also note that only the domain controller that authenticates the account logs an event in its security log. In environments with a large number of domain controllers, it may be hard to determine which DC authenticated the account. For this reason, it is highly recommended that all security event logs be consolidated to a central location to be analyzed.

To enable Auditing for Account Logon Events open the appropriate policy and expand the tree as follows:

Computer Configuration\Windows Settings\Security Settings\Local Policies\Audit Policy\

Check the box labeled Define these policy settings and check the box for Success and Failure. Listed below are some common events when auditing account logon events:

Account Logon Events Description

672 An authentication service (AS) ticket was successfully issued and validated.

673 A ticket-granting service (TGS) ticket was granted.

674 A security principal renewed an AS ticket or TGS ticket.

675 Preauthentication failed. This event is generated on a Key Distribution Center (KDC) when a user types in an incorrect password.

676 Authentication ticket request failed. This event is not generated in Windows XP or in the Windows Server 2003 family.

677 A TGS ticket was not granted. This event is not generated in Windows XP or in the Windows Server 2003 family.

678 An account was successfully mapped to a domain account.

681 Logon failure. A domain account logon was attempted. This event is not generated in Windows XP or in the Windows Server 2003 family.

682 A user has reconnected to a disconnected terminal server session.

683 A user disconnected a terminal server session without logging off.

Audit Logon Events Where account logon events are generated when a domain user account is authenticated on a domain controller, audit logon events are generated for a user logging on to or off from a computer. For example, a user logging on to a Windows XP or Windows 2003 domain member system will generate logon events in the security log of the local workstation or member server, not the domain controller that authenticated the account. While this is a very simple example it does illustrate the difference between the two types of auditing.

In most cases, both the Audit account logon events and Audit logon events policies are enabled in the security policies on domain controllers and member servers. In this case, when a user logs on interactively to a workstation, you see both account logon events and logon events in the security log of a domain controller, because the user must retrieve logon scripts and Group Policy. To do this, the user connects to the domain controller and performs a network logon rather than an interactive logon.




Note: The most common logon event (for successful logons) will be the event 528, which will list the logon type (ex: 2 = Interactive/console, 3 = Network, etc.). Please consult the table below for the logon type.

To enable Auditing for logon events open the appropriate policy and expand the tree as follows:

Computer Configuration\Windows Settings\Security Settings\Local Policies\Audit Policy\

Check the box labeled Define these policy settings and check the box for Success and Failure. Listed below are some common events when auditing logon events:

Logon Events Description

528 A user successfully logged on to a computer. For information about the type of logon, see the Logon Types table below.

529 Logon failure. A logon attempt was made with an unknown user name or a known user name with a bad password.

530 Logon failure. The user account tried to log on outside of the allowed time.

531 Logon failure. A logon attempt was made using a disabled account.

532 Logon failure. A logon attempt was made using an expired account.

533 Logon failure. A logon attempt was made by a user who is not allowed to log on at this computer.

534 Logon failure. The user attempted to log on with a type that is not allowed.

535 Logon failure. The password for the specified account has expired.

536 Logon failure. The Net Logon service is not active.

537 Logon failure. The logon attempt failed for other reasons. Note: In some cases, the reason for the logon failure may not be known.

538 The logoff process was completed for a user.

539 Logon failure. The account was locked out at the time the logon attempt was made.

540 A user successfully logged on to a network.

541 Main mode Internet Key Exchange (IKE) authentication was completed between the local computer and the listed peer identity (establishing a security association), or quick mode has established a data channel.

542 A data channel was terminated.

543 Main mode was terminated. Note: This might occur as a result of the time limit on the security association expiring (the default is eight hours), policy changes, or peer termination.

544 Main mode authentication failed because the peer did not provide a valid certificate or the signature was not validated.

545 Main mode authentication failed because of a Kerberos failure or a password that is not valid.

546 IKE security association establishment failed because the peer sent a proposal that is not valid. A packet was received that contained data that is not valid.

547 A failure occurred during an IKE handshake.





548 Logon failure. The security ID (SID) from a trusted domain does not match the account domain SID of the client.

549 Logon failure. All SIDs corresponding to untrusted namespaces were filtered out during an authentication across forests.

550 Notification message that could indicate a possible denial-of-service attack.

551 A user initiated the logoff process.

552 A user successfully logged on to a computer using explicit credentials while already logged on as a different user.

682 A user has reconnected to a disconnected terminal server session.

683 A user disconnected a terminal server session without logging off. Note: This event is generated when a user is connected to a terminal server session over the network. It appears on the terminal server.

As previously noted, the most common logon event is the event ID 528. In the text of the event it will show a value for Logon Type. This value can help you determine if the logon is a console logon, network, or a service starting. The table below shows the logon types and their descriptions.

Logon type Logon title Description

2 Interactive A user logged on to this computer.

3 Network A user or computer logged on to this computer from the network.

4 Batch Batch logon type is used by batch servers, where processes may be executing on behalf of a user without their direct intervention.

5 Service A service was started by the Service Control Manager.

7 Unlock This workstation was unlocked.

8 NetworkCleartext A user logged on to this computer from the network. The user's password was passed to the authentication package in its unhashed form. The built-in authentication packages all hash credentials before sending them across the network. The credentials do not traverse the network in plaintext (also called cleartext).

9 NewCredentials A caller cloned its current token and specified new credentials for outbound connections. The new logon session has the same local identity, but uses different credentials for other network connections.

10 RemoteInteractive A user logged on to this computer remotely using Terminal Services or Remote Desktop.

11 CachedInteractive A user logged on to this computer with network credentials that were stored locally on the computer. The domain controller was not contacted to verify the credentials.
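The logon type table can be applied to the text of an event automatically. The following is a sketch that looks for a "Logon Type:" field in an event description and maps the number to the title from the table above. The sample description is abbreviated and hypothetical, and the exact field label is an assumption about how the event text is rendered.

```python
import re

# Logon type numbers and titles from the table above.
LOGON_TYPES = {
    2: "Interactive",
    3: "Network",
    4: "Batch",
    5: "Service",
    7: "Unlock",
    8: "NetworkCleartext",
    9: "NewCredentials",
    10: "RemoteInteractive",
    11: "CachedInteractive",
}


def logon_type_from_description(description):
    """Return (number, title) for the Logon Type field, or None if absent."""
    match = re.search(r"Logon Type:\s*(\d+)", description)
    if match is None:
        return None
    number = int(match.group(1))
    return number, LOGON_TYPES.get(number, "Unknown")


# Abbreviated, hypothetical event 528 description.
sample = "Successful Logon: User Name: jsmith Domain: CONTOSO Logon Type: 10 Logon Process: User32"
print(logon_type_from_description(sample))
```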




Netlogon Logging

Capture Netlogon and NTLM events for troubleshooting
• NLTest /dbflag:2080FFFF

Netlogon.log
• Netlogon log codes
• Netlogon log logon types
• Good source for debugging 5807 events (clients logging on from non-defined subnets)

Netlogon Logging

Netlogon logging is used to capture Netlogon and NTLM events for troubleshooting and diagnostics. It is enabled on the PDC emulator and any other domain controllers that are involved in the authentication that requires troubleshooting. Nltest.exe can be used to turn on debug logging of the Netlogon process. Running Nltest.exe with the /dbflag switch creates a Netlogon.log file in the %Windir%\Debug folder. You can use the following tables to generate the value for the /dbflag switch.

The following defines the 0000000F bits.

NL_INIT 0x00000001 Initialization

NL_MISC 0x00000002 Misc debug

NL_LOGON 0x00000004 Logon processing

NL_SYNC 0x00000008 Synchronization and replication

The following defines the 000000F0 bits.

NL_MAILSLOT 0x00000010 Mailslot messages

NL_PULSE 0x00000020 Pulse processing

reserved 0x00000040 Reserved

Reserved 0x00000080 Reserved





NL_CRITICAL 0x00000100 Only important errors

NL_SESSION_SETUP 0x00000200 Trusted Domain maintenance

reserved 0x00000400 Reserved

NL_PACK 0x00000800 Pack/Unpack of sync messages


NL_SERVER_SESS 0x00001000 Server session maintenance

NL_CHANGELOG 0x00002000 Change Log references



Very verbose bits

The following defines the 000F0000 bits.

NL_PULSE_MORE 0x00040000 Verbose pulse processing

NL_SESSION_MORE 0x00080000 Verbose session management

The following defines the 00F00000 bits.

NL_REPL_TIME 0x00100000 replication timing output

NL_REPL_OBJ_TIME 0x00200000 replication objects get/set timing output

NL_ENCRYPT 0x00400000 debug encrypt and decrypt across net

NL_SYNC_MORE 0x00800000 additional replication dbgprint


NL_PACK_VERBOSE 0x01000000 Verbose Pack-Unpack

NL_MAILSLOT_TEXT 0x02000000 Verbose Mailslot messages

NL_CHALLENGE_RES 0x04000000 challenge response debug

NL_NETLIB 0x08000000 Netlogon portion of Netlib

The following is defined only when DONT_REQUIRE_ACCOUNT is set. Its value falls within the 000F0000 bits.

#ifdef DONT_REQUIRE_ACCOUNT

#define NL_DONT_REQUIRE_ACCOUNT 0x00020000 // Do not require account on domain controller discovery

#endif // DONT_REQUIRE_ACCOUNT

Control bits.

The following defines the F0000000 bits.

NL_INHIBIT_CANCEL 0x10000000 Do not cancel API calls

NL_TIMESTAMP 0x20000000 TimeStamp each output line

NL_ONECHANGE_REPL 0x40000000 Only replicate one change per call

NL_BREAKPOINT 0x80000000 Enter debugger on startup

Some common combinations of bits used with the dbflag switch include:

/dbflag:0x2000FFFF =

NL_INIT 0x00000001 // Initialization
NL_MISC 0x00000002 // Misc debug
NL_LOGON 0x00000004 // Logon processing
NL_SYNC 0x00000008 // Synchronization and replication
NL_MAILSLOT 0x00000010 // Mailslot messages
NL_PULSE 0x00000020 // Pulse processing
NL_CRITICAL 0x00000100 // Only real important errors
NL_SESSION_SETUP 0x00000200 // Trusted Domain maintenance
NL_PACK 0x00000800 // Pack/Unpack of sync messages
NL_SERVER_SESS 0x00001000 // Server session maintenance
NL_CHANGELOG 0x00002000 // Change Log references
NL_TIMESTAMP 0x20000000 // TimeStamp each output line

/dbflag:0x2080FFFF =

NL_INIT 0x00000001 // Initialization
NL_MISC 0x00000002 // Misc debug
NL_LOGON 0x00000004 // Logon processing
NL_SYNC 0x00000008 // Synchronization and replication
NL_MAILSLOT 0x00000010 // Mailslot messages
NL_PULSE 0x00000020 // Pulse processing
NL_CRITICAL 0x00000100 // Only real important errors
NL_SESSION_SETUP 0x00000200 // Trusted Domain maintenance
NL_PACK 0x00000800 // Pack/Unpack of sync messages
NL_SERVER_SESS 0x00001000 // Server session maintenance
NL_CHANGELOG 0x00002000 // Change Log references
NL_SYNC_MORE 0x00800000 // additional replication dbgprint
NL_TIMESTAMP 0x20000000 // TimeStamp each output line

/dbflag:0x24000000 =

NL_CHALLENGE_RES 0x04000000 // challenge response debug
NL_TIMESTAMP 0x20000000 // TimeStamp each output line
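A /dbflag value is the bitwise OR of the individual flags in the tables above, so it can be decoded or built with a few lines of code. The following is a sketch using the flag names and values listed in this section; the function names are hypothetical, and reserved bits are reported as a leftover mask rather than named.

```python
# Flag values taken from the tables in this section.
NETLOGON_FLAGS = {
    0x00000001: "NL_INIT",
    0x00000002: "NL_MISC",
    0x00000004: "NL_LOGON",
    0x00000008: "NL_SYNC",
    0x00000010: "NL_MAILSLOT",
    0x00000020: "NL_PULSE",
    0x00000100: "NL_CRITICAL",
    0x00000200: "NL_SESSION_SETUP",
    0x00000800: "NL_PACK",
    0x00001000: "NL_SERVER_SESS",
    0x00002000: "NL_CHANGELOG",
    0x00040000: "NL_PULSE_MORE",
    0x00080000: "NL_SESSION_MORE",
    0x00100000: "NL_REPL_TIME",
    0x00200000: "NL_REPL_OBJ_TIME",
    0x00400000: "NL_ENCRYPT",
    0x00800000: "NL_SYNC_MORE",
    0x01000000: "NL_PACK_VERBOSE",
    0x02000000: "NL_MAILSLOT_TEXT",
    0x04000000: "NL_CHALLENGE_RES",
    0x08000000: "NL_NETLIB",
    0x10000000: "NL_INHIBIT_CANCEL",
    0x20000000: "NL_TIMESTAMP",
    0x40000000: "NL_ONECHANGE_REPL",
    0x80000000: "NL_BREAKPOINT",
}


def decode_dbflag(value):
    """Return (names of known flags that are set, mask of remaining bits)."""
    names = [name for bit, name in NETLOGON_FLAGS.items() if value & bit]
    known = sum(bit for bit in NETLOGON_FLAGS if value & bit)
    return names, value & ~known


def compose_dbflag(*names):
    """Combine flag names into a single /dbflag value."""
    by_name = {name: bit for bit, name in NETLOGON_FLAGS.items()}
    value = 0
    for name in names:
        value |= by_name[name]
    return value


names, leftover = decode_dbflag(0x2080FFFF)
print(names)
print(hex(leftover))  # reserved or unlisted bits that are also set
print(hex(compose_dbflag("NL_CHALLENGE_RES", "NL_TIMESTAMP")))
```

The leftover mask is nonzero for 0x2080FFFF because 0xFFFF also sets bits that the tables mark as reserved or do not list.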

Once troubleshooting is complete, use /dbflag:0 to switch off logging. Once Netlogon.log is 20 megabytes (MB) in size, it is renamed Netlogon.bak, and a new Netlogon.log is created with the newest Netlogon data. Once that Netlogon.log reaches 20 MB, Netlogon.bak is truncated, and the current Netlogon.log is moved to Netlogon.bak. Because of this process, the total disk space that is consumed by Netlogon logging is 40 MB.

Netlogon Log Codes There are several different codes that might exist with each entry in the Netlogon log file.

Code Description

0x0 Successful Logon

0xC0000064 The specified user does not exist.

0xC000006A The value provided as the current password is not correct.

0xC000006C Password Policy not met.

0xC000006D The attempted logon is invalid due to a bad user name or other authentication information.

0xC000006E User account restriction has prevented successful Logon.

0xC000006F The user account has time restrictions and may not be logged onto at this time.

0xC0000070 The user is restricted and may not log on from the source workstation.

0xC0000071 The user account's password has expired.

0xC0000072 The referenced account is currently disabled.

0xC000009A Insufficient system resources.

0xC0000193 The user's account has expired.

0xC0000224 User must change his password before he logs on the first time.

0xC0000234 The user account has been automatically locked.
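When reviewing many log entries, a lookup table saves repeated trips to the code list. The following is a minimal sketch that maps the codes from the table above to short descriptions; the function name is hypothetical, and any code not in the table is reported as unknown rather than guessed.

```python
# Codes and descriptions condensed from the table above.
NETLOGON_STATUS = {
    0x00000000: "Successful logon",
    0xC0000064: "The specified user does not exist",
    0xC000006A: "The value provided as the current password is not correct",
    0xC000006C: "Password policy not met",
    0xC000006D: "The attempted logon is invalid",
    0xC000006E: "User account restriction has prevented successful logon",
    0xC000006F: "The user account has time restrictions",
    0xC0000070: "The user may not log on from the source workstation",
    0xC0000071: "The user account's password has expired",
    0xC0000072: "The referenced account is currently disabled",
    0xC000009A: "Insufficient system resources",
    0xC0000193: "The user's account has expired",
    0xC0000224: "User must change password before logging on the first time",
    0xC0000234: "The user account has been automatically locked",
}


def describe_status(code):
    """Accept an int or a hex string such as '0xC000006A'."""
    if isinstance(code, str):
        code = int(code, 16)
    return NETLOGON_STATUS.get(code, "Unknown status code")


print(describe_status("0xC0000234"))
print(describe_status(0xC000006A))
```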

Netlogon Log Logon Types There are several different types of authentication that you may find in the Netlogon.log file.

Type Description

Interactive = 2 Interactively logged on (locally or remotely)

Network Accessing system via network

Service Service started by service controller

Transitive Interactive Interactive logon through transitive trusts and secure channel (introduced in Windows 2000)

Transitive Network Network logon through transitive trusts and secure channel (introduced in Windows 2000)

Transitive Service Service logon through transitive trusts and secure channel (introduced in Windows 2000)

Generic Pass through of the credentials using the Netlogon secure channel

Debugging 5807 Events During the deployment of Active Directory, the IT staff defined a list of subnets and placed it in Active Directory (Active Directory Sites and Services). Therefore, when clients request logon or resources from a domain controller, the most suitable domain controller is determined by their subnet-to-site mapping. Every domain controller has access to this information and can quickly determine whether it is the optimal domain controller for a particular client.

If a client attempts to log on or locate a domain controller but its subnet has not been defined in Active Directory (ex: a new floor or wing of a building is brought online and the subnet has not yet been defined in Active Directory Sites and Services), the domain controller being queried by the client will be utilized. In addition, the following event is logged in the System event log of the domain controller authenticating the client:

Event ID: 5807
Source: NETLOGON
User: N/A
Computer: ComputerName
Description:

During the past number hours there have been number connections to this Domain Controller from client machines whose IP addresses don't map to any of the existing sites in the enterprise. Those clients, therefore, have undefined sites and may connect to any Domain Controller including those that are in far distant locations from the clients. A client's site is determined by the mapping of its subnet to one of the existing sites. To move the above clients to one of the sites, please consider creating subnet object(s) covering the above IP addresses with mapping to one of the existing sites. The names and IP addresses of the clients in question have been logged on this computer in the following log file '%SystemRoot%\debug\netlogon.log' and, potentially, in the log file '%SystemRoot%\debug\netlogon.bak' created if the former log becomes full. The log(s) may contain additional unrelated debugging information. To filter out the needed information, please search for lines which contain text 'NO_CLIENT_SITE:'. The first word after this string is the client name and the second word is the client IP address. The maximum size of the log(s) is controlled by the following registry DWORD value 'HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters\LogFileMaxSize'; the default is 20000000 bytes. The current maximum size is 20000000 bytes. To set a different maximum size, create the above registry value and set the desired maximum size in bytes.

Because these clients will be using any available domain controller for authentication rather than the one(s) that are more optimal (ex: in their site), you should enable the netlogon logging on the Domain controller(s) that are experiencing the 5807 events. In the log files you will be looking for log entries such as the examples shown below:

07/22 10:02:32 netbios_Domain_Name: NO_CLIENT_SITE: Client_Name Client_IPaddress
07/22 10:02:32 netbios_Domain_Name: NO_CLIENT_SITE: Client_Name Client_IPaddress
07/22 10:03:07 netbios_Domain_Name: NO_CLIENT_SITE: Client_Name Client_IPaddress
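Extracting the NO_CLIENT_SITE entries and grouping the clients by subnet gives a quick list of candidate subnet objects to create. The following is a sketch that relies only on the rule stated in the event text: the first word after 'NO_CLIENT_SITE:' is the client name and the second is the IP address. The client names, addresses, and the /24 grouping are hypothetical; use the prefix length that matches your own network design.

```python
import ipaddress
from collections import defaultdict

MARKER = "NO_CLIENT_SITE:"


def parse_no_client_site(lines):
    """Return a set of (client_name, ip_address) pairs found in the log lines."""
    clients = set()
    for line in lines:
        _, found, rest = line.partition(MARKER)
        if not found:
            continue
        words = rest.split()
        if len(words) < 2:
            continue
        try:
            address = ipaddress.ip_address(words[1])
        except ValueError:
            continue  # skip lines where the second word is not an IP address
        clients.add((words[0], address))
    return clients


def group_by_subnet(clients, prefix=24):
    """Group client names by the subnet that contains their address."""
    groups = defaultdict(set)
    for name, address in clients:
        network = ipaddress.ip_network(f"{address}/{prefix}", strict=False)
        groups[network].add(name)
    return dict(groups)


# Hypothetical log lines in the format shown above.
log_lines = [
    "07/22 10:02:32 CONTOSO: NO_CLIENT_SITE: WKS01 10.1.2.15",
    "07/22 10:02:32 CONTOSO: NO_CLIENT_SITE: WKS02 10.1.2.77",
    "07/22 10:03:07 CONTOSO: NO_CLIENT_SITE: WKS09 10.1.9.4",
    "07/22 10:03:09 CONTOSO: some unrelated debugging line",
]

subnets = group_by_subnet(parse_no_client_site(log_lines))
for network, names in sorted(subnets.items()):
    print(network, sorted(names))
```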




Account Lockout Status

Identify all domain controllers in a domain and query the lockout status of a particular account

Includes:
• Bad Pwd Count
• Last Bad Password
• PWD Last Set
• Lockout Time
• Orig Lock

Account Lockout Status

The Lockoutstatus tool allows administrators to identify all the domain controllers in a domain and query the lockout status of a particular account. This includes:

• Bad Pwd Count – Indicates the number of bad attempts on each individual domain controller. This value tells you which domain controllers were involved in the account lockout.

• Last Bad Password – Indicates the time of the last attempt to log on with a bad password.

• PWD Last Set – The value of the last good password or when it was last unlocked.

• Lockout Time – Displays the time when the account was locked out.

• Orig Lock – The domain controller that actually locked the account (did the originating write to the LockoutTime attribute for that user).

Files required • Lockoutstatus.exe





Other Account Lockout Tools

• Alockout.dll
• Aloinfo.exe
• Acctinfo.dll

Other Account Lockout Tools

ALOCKOUT Alockout.dll is a logging tool that may help pinpoint the exact application or process that is sending bad credentials in an account lockout case. The Appinit.reg script is used to initialize the DLL. The tool attaches itself to various APIs that make calls to LogonUser, and then reports information about what is making those calls to a text file called Alockout.txt in winnt\debug. The events are timestamped so that they can be matched to events in the Netlogon logs and/or security logs regarding the lockout.

In most account lockout cases, the logging tool is placed on client computers. Netlogon logging and/or security auditing will have pinpointed the exact computer(s) from which wrong credentials are being sent (locking out the user's account). The tool can then be installed on that computer in an effort to catch and log the specific process that is sending the wrong credentials.

Note: Installing these tools on Exchange servers can cause problems with the Store. Before using the tools on Exchange servers, search for specific issues on http://support.microsoft.com.

Steps to configure Alockout.dll logging: There are two separate versions, one for Windows 2000 and one for Windows XP. See Readme.txt in the .zip package for installation instructions. To set up Alockout logging, follow these steps:

1. Copy Alockout.dll to the \system32 directory on the computer on which the account lockout occurs.

2. Run the Appinit.reg script to add the DLL to the AppInit_DLLs registry value.

3. Restart the computer.

4. Wait for an account to lock out on that computer.

5. The output file Alockout.txt will be created in the \winnt\debug directory.

6. To discover the process that is causing the lockouts, match the event timestamps in Alockout.txt with the timestamps in the Netlogon logs and security events.
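The timestamp matching in the last step can be sketched as a comparison within a small time window. The layout of Alockout.txt is not described here, so this sketch assumes both sources have already been reduced to (timestamp, text) pairs; the sample entries, the function name, and the 5-second window are all hypothetical.

```python
from datetime import datetime, timedelta


def correlate(process_entries, auth_failures, window=timedelta(seconds=5)):
    """Pair each authentication failure with process entries logged near it."""
    pairs = []
    for failure_time, failure_text in auth_failures:
        for entry_time, entry_text in process_entries:
            if abs(entry_time - failure_time) <= window:
                pairs.append((failure_time, entry_text, failure_text))
    return pairs


# Hypothetical, pre-parsed entries from the client-side log.
process_entries = [
    (datetime(2006, 5, 2, 8, 15, 1), "backupagent.exe called LogonUser"),
    (datetime(2006, 5, 2, 9, 40, 30), "outlook.exe called LogonUser"),
]

# Hypothetical, pre-parsed failures from Netlogon logs or security events.
auth_failures = [
    (datetime(2006, 5, 2, 8, 15, 3), "0xC000006A for CONTOSO\\jsmith"),
]

for failure_time, entry_text, failure_text in correlate(process_entries, auth_failures):
    print(failure_time, entry_text, failure_text)
```

Clock differences between the client and the domain controller widen the window you need, so time synchronization should be checked first.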

ALOINFO.EXE If account lockouts occur most frequently after a user is forced to change his or her password (password expiration policy in place), it may be helpful to know exactly which users’ passwords are about to expire. Aloinfo.exe can be used to dump all user account names along with their password age. This will allow proactive setup with the Alockout.dll logging and other account lockout/logging tools on those users’ computers prior to their changing their passwords (and getting locked out).

Usage:

aloinfo [/stored] || [/expires && /server:<server>]

This will dump password ages for all domain user accounts:

C:\>aloinfo /expires /server:<DCname>

This lists the startup account information for all local services and the mapped drives of the logged-on user:

C:\>aloinfo /stored /server:<machinename>

ACCTINFO.DLL ACCTINFO.DLL is used to add new property pages to user objects in Active Directory Users and Computers to help isolate and troubleshoot account lockouts and to change a user’s password on a domain controller in that user’s site.

To use this extension:

1. Copy Acctinfo.dll to your \system32 directory.

2. Run regsvr32 acctinfo.dll.




Module 4: Active Directory® Replication












Module Overview

Introduction The Microsoft® Active Directory® directory service replication model encompasses the manner in which changes are propagated and tracked among domain controllers.


• Explain key replication concepts.
• Describe features, such as application partitions and functional levels.
• Describe the process by which an object is replicated between domain controllers.
• Describe the purpose of the Knowledge Consistency Checker (KCC).
• Explain variations in the replication process.

Related Topics Covered in This Lesson
• DNS

Recommended Reading
• Deployment Guide Resource Kit
• Distributed Systems Guide Resource Kit




Section 1: Active Directory Replication Model

Architecture and Physical Structures

Notifying Partners of Changes

Replicating Updates to the Directory

Section 1: Active Directory Replication Model

Introduction This lesson examines Active Directory replication.

Objectives After completing this lesson, you will be able to:

• Describe the replication model.
• Understand the KCC.
• Review replication between sites.

Related Topics Covered in This Lesson
• Domain Name System (DNS)
• Kerberos
• Active Directory replication
• Password policy




Replication Model Physical Structure

Multimaster
• All DCs can make changes

Pull replication
• Partners pull changes from source DC

Store and forward
• Prevents one DC from needing to update all others

State-based replication
• Resolves conflicts

Replication Model Physical Structure

To globally distribute the directory service, the Active Directory replication model incorporates the following components:

Multimaster Replication All domain controllers (DCs) accept Lightweight Directory Access Protocol (LDAP) requests for changes to attributes of Active Directory objects for which they are authoritative, subject to the security constraints that are in place. Each originating update is replicated to one or more other domain controllers that record it as a replicated update.

This provides a high degree of fault tolerance, eliminating the dependency on a single domain controller to maintain directory operations.

Pull Replication When an update occurs on a domain controller, it notifies its replication partner. The partner domain controller responds by requesting (pulling) the changes from the source domain controller.

By pulling, rather than pushing changes, the DCs are able to filter out changes that might not be needed, thereby reducing unnecessary network traffic. This process is explained in more detail later in this module.




Store-and-Forward Replication Replication is store-and-forward, meaning that changes move sequentially through a set of connected domain controllers that host directory partition replicas. Domain controllers store changes received from replication partners and forward those changes to other domain controllers. This means the originating domain controller for each change does not need to transfer those changes to every other domain controller that requires the changes.

State-Based Replication Active Directory replication is driven by the difference between the current state (the current values of all attributes) of the directory partition replica on the source and its state on the destination domain controllers. This state includes metadata that is used to resolve conflicts and to avoid sending the full replica on each replication cycle.




Directory Partition Replicas

[Figure: Directory partition replicas stored in NTDS.DIT on a domain controller in Domain A. Writable partitions: Schema (attributes and classes; forest-wide), Config (configuration of the forest; forest-wide), Domain A (users, computers, GPOs; domain-wide), and App Partition (DNS, custom data; domain- or forest-wide). Read-only partitions: partial replica sets (GC) for Domain B and Domain C, holding some attributes of all objects.]

When a change is made to an object in a directory partition, the value of the changed attribute or attributes must be updated on all domain controllers that store a replica of the same directory partition. Domain controllers communicate data updates automatically, through Active Directory replication. Their communication about updates is always specific to a single directory partition at a time.

Active Directory data is logically partitioned so that all domain controllers in the forest do not store all objects in the directory. Active Directory objects are instances of schema-defined classes, which consist of named sets of attributes. Schema definitions determine whether an attribute can be administratively changed. Attributes that cannot be changed are never updated and, therefore, never replicated. However, most Active Directory objects have attribute values that can be updated.

Different categories of data are stored in replicas of different directory partitions, as follows:

Schema data

• Every domain controller stores one writable schema partition that stores schema definitions for the forest. Although the schema directory partition is writable, schema updates are allowed on only the domain controller that holds the role of schema operations master.




Configuration data

• Every domain controller stores one writable configuration directory partition that stores forest-wide data that controls site and replication operations.

• Configuration data includes information, such as the Active Directory site structure and the list of domains, that is of interest to all DCs in the forest.

Domain-specific data that is stored in domain directory partitions:

• Every domain controller stores one writable domain directory partition. The contents include information such as users and computers in a domain, the organizational unit structure, etc.

• A domain controller that is a global catalog server stores one writable domain directory partition and a partial, read-only replica of every other domain in the forest. Global catalog read-only replicas contain a partial set of attributes for every object in the domain.

Application data

• Domain controllers that are running Windows® Server® 2003 can store directory partitions that store application data. Application directory partition replicas can be replicated to any set of domain controllers in a forest, irrespective of domain.

• The most common type of data stored in application partitions is Domain Name System (DNS) data.




Changes to Attributes

• Changes occur on one DC, replicate to all others
• Consistent and predictable process
• Changes occur at the attribute level
• All available changes are sent during cycle
• Replication not dependent on time sync*

Effect of Schema Changes on Replication
• Attribute definitions are stored in attributeSchema objects
• Changes to attributeSchema objects block other replication until the schema changes are performed


Active Directory updates originate on one domain controller (originating updates), and the same update is subsequently made on other domain controllers (replicated updates), during the replication process.

Object update behavior is consistent and predictable: when a set of changes is made to a specific directory partition replica, those changes will be propagated to all other domain controllers that store replicas of the directory partition. How soon the changes are applied depends on the distance between the domain controllers and whether the change must be sent to other sites.

The following key points are central to understanding the behavior of Active Directory updates:

• Changes occur at the attribute level; only the changed attribute value, not the entire object, is replicated.

• At the time of replication, only the current value of an attribute that has changed is replicated. If an attribute value has changed multiple times between replication cycles (for example, between scheduled occurrences of intersite replication), only the current value is replicated.

• The smallest change that can be replicated in Windows 2000 Active Directory is an entire attribute; even if the attribute is linked and multivalued, all values replicate as a single change. The smallest change that can be replicated in Windows Server 2003 Active Directory is a separate value in a multivalued attribute that is linked. This Windows Server 2003 feature is called linked-value replication; it will be covered in more detail later in this module.

• An attribute is available for replication as soon as it is written.

• Originating updates to a single object are written to the database in the same transaction, so partially written objects are not possible, and a consistent view of the object is maintained.

• After a replication cycle is initiated, all available changes to a directory partition on the source domain controller are sent to the destination domain controller, including changes that occur while the replication cycle is in progress.

• For replicated updates to large numbers of values in linked multivalued attributes, such as the member attribute of a group, updates are not always guaranteed to be applied in the same transaction. In this case, the updates are guaranteed to be applied in one or more subsequent transactions in the same replication cycle (all updates from one source are applied at the destination).

• Conflict resolution is effective without depending on clock synchronization. Other criteria, such as version numbers, are used to resolve conflicts; they will be discussed later in this module.

Note: Keep in mind that, while the replication mechanism itself is not dependent on successful time synchronization between DCs, the DCs use Kerberos V5 authentication for security when communicating with each other, and that does require that the time services on domain controllers are synchronized.

Effect of Schema Changes on Replication Attribute definitions are stored in attributeSchema objects in the schema directory partition. Changes to attributeSchema objects block other replication until the schema changes are performed. During replication of any directory partition other than the schema directory partition, the replication system first checks to see whether the schema versions of the source and the destination domain controllers are in agreement. If the versions are not the same, the replication of the other directory partition is rescheduled until the schema directory partition is synchronized.
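The schema check described above can be pictured with a short sketch. This is only an illustration of the ordering rule, not the replication engine's actual logic; the function name, the partition labels, and the version numbers used in the example calls are assumptions made for the example.

```python
def plan_replication(partition, source_schema_version, dest_schema_version):
    """Return the ordered steps a destination DC would take for one partition.

    Hypothetical helper: models only the rule that a schema mismatch
    postpones replication of every non-schema partition.
    """
    if partition == "schema":
        return ["replicate schema"]
    if source_schema_version != dest_schema_version:
        # Versions disagree: synchronize the schema first, then retry.
        return ["replicate schema", "replicate " + partition]
    return ["replicate " + partition]


# Illustrative version numbers only.
print(plan_replication("domain", source_schema_version=30, dest_schema_version=13))
print(plan_replication("domain", source_schema_version=30, dest_schema_version=30))
```

In the first call the domain partition is replicated only after the schema partition; in the second call the versions agree, so the domain partition is replicated directly.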

Prior to upgrading a domain controller from Windows 2000 Server to Windows Server 2003, you must update the schema to be compatible with Windows Server 2003. When you run Adprep.exe, the Windows Server 2003 schema is installed in the forest. This process upgrades the schema on each Windows 2000–based domain controller. Thereafter, you can begin upgrading domain controllers to Windows Server 2003.




Change Notification

• Initial notification: 15 seconds
• Subsequent notification: 3 seconds
• Default notification values: upgrade vs. clean install
• Storage of intrasite notification delay values: Partitions container (Windows Server 2003), registry (Windows 2000)


Replication within a site occurs as a response to changes. On its NTDS Settings object, the source domain controller stores a repsTo attribute that lists all servers in the same site that pull replication from it. When a change occurs on a source domain controller, it notifies its destination replication partner, prompting the destination domain controller to request the changes from the source domain controller. The source domain controller either responds to the change request with a replication operation or places the request in a queue, if requests are already pending. Replication occurs, one request at a time, until all requests in the queue are processed.

When a change occurs on a domain controller within a site, two configurable intervals determine the delay between the change and subsequent events:

Initial notification: Initial notification is the length of time between the change to an attribute on a DC and the notification of that change to the first partner. This interval serves to stagger network traffic caused by replication. When a domain controller makes a change (originating or replicated) to a directory partition, it starts the timer for the initial notification interval; when the timer expires, the domain controller notifies all of its replication partners (for that directory partition and within its site) that it has changes. The default value for initial notification is 15 seconds.

Subsequent notification: Subsequent notification is the length of time between notification of the first replication partner and notification of each subsequent partner. A domain controller does not notify all of its replication partners at one time. By delaying between notifications, the domain controller spreads out the load of responding to replication requests from its partners. The default delay between notifications is three seconds.
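As a rough illustration of how the two intervals combine, the following sketch computes when each intrasite partner would be notified after a change. The function name and partner names are hypothetical; the defaults are the 15-second and 3-second values given above.

```python
def notification_times(partners, initial_delay=15, subsequent_delay=3):
    """Map each partner to the number of seconds after the change
    at which it is notified (first partner after the initial delay,
    each later partner one subsequent delay after the previous one)."""
    return {
        partner: initial_delay + index * subsequent_delay
        for index, partner in enumerate(partners)
    }


# Three hypothetical partners in the same site, default intervals.
print(notification_times(["DC2", "DC3", "DC4"]))
# Same partners with the 300-second and 30-second intervals.
print(notification_times(["DC2", "DC3", "DC4"], initial_delay=300, subsequent_delay=30))
```

With the defaults, the three partners are notified 15, 18, and 21 seconds after the change.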

Default Notification Values The default values for the initial and subsequent notification delay intervals depend on the version of the operating system, the upgrade path, and the forest functional level.

The default initial notification delay is 15 seconds, and the subsequent notification delay is three seconds, on a domain controller under any of the following conditions:

• The forest functional level is Windows Server 2003, and the default initial notification delay value was in effect on the domain controller, if it was upgraded from Windows 2000. If non-default values are set on a domain controller that is upgraded from Windows 2000 to Windows Server 2003, the non-default value is retained.

• The domain controller has been created from a fresh (not upgraded) installation of Windows Server 2003 and promoted into a Windows 2000 or Windows Server 2003 forest.

• The domain controller has been upgraded directly from Windows NT 4.0 to Windows Server 2003.

Initial notification delay is 300 seconds, and subsequent notification delay is 30 seconds, under either of the following conditions:

• The domain controller is running Windows 2000 Server.

• The domain controller has been upgraded from Windows 2000 to Windows Server 2003, and the forest functional level is Windows 2000.

Storage of Intrasite Notification Delay Values On a domain controller that is running Windows Server 2003, intrasite notification delay values are specific to each directory partition and are stored in two attributes of the cross-reference object for each directory partition, located in the Partitions container, within the configuration directory partition, as follows:

• The value for initial change notification delay is stored in the msDS-Replication-Notify-First-DSA-Delay attribute.

• The value for subsequent notification delay is stored in the msDS-Replication-Notify-Subsequent-DSA-Delay attribute.

Although the attribute values are present on all domain controllers that are running Windows Server 2003, the default values of 15 seconds for initial notification delay and 3 seconds for subsequent notification delay are in effect only under the conditions described earlier.




On domain controllers that are running Windows 2000, notification delay values are stored in registry entries on each domain controller:

The value for the delay before the first change notification is stored in:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Parameters

Replicator notify pause after modify (secs)

The value for the delay before each subsequent change notification is stored in:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Parameters

Replicator notify pause between DSAs (secs)

Notification Delay Values and Their Application by Domain Controllers To accommodate both the registry and cross-reference object locations of notification delay information, the process that is used to determine which change notification delay values to apply favors the settings in the registry, if any exist. When replication partners send notification of changes, notification delay values are checked according to the operating systems that are running on the partners, as follows:

Windows Server 2003 only:

• Check the registry for the presence of initial and subsequent notification delay values, and use those values if they exist.

• If no registry values exist, check the cross-reference object for the directory partition to which the change has occurred. If values are set, use those values.

• Otherwise, use the default values of 15 seconds for initial notification delay and 3 seconds for subsequent notification delay.

Upgraded Windows Server 2003 and Windows 2000 Server:

• Check the registry for the presence of registry values. If a value is set for an entry, use the value that is set for all relevant directory partitions.

• If no value is set on a registry entry (the default is in effect), use the default values of 300 seconds for initial notification delay and 30 seconds for subsequent notification delay.
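The precedence rules above can be summarized in a small sketch. It is a simplification under stated assumptions: the function and parameter names are hypothetical, each source of values is treated as an all-or-nothing pair of (initial, subsequent) seconds, and `windows_2003_only` stands for the "Windows Server 2003 only" case in the list above.

```python
from typing import Optional, Tuple

Delays = Tuple[int, int]  # (initial delay, subsequent delay), in seconds


def effective_delays(registry: Optional[Delays],
                     cross_ref: Optional[Delays],
                     windows_2003_only: bool) -> Delays:
    """Pick the notification delays to apply, favoring registry settings."""
    if registry is not None:
        return registry          # registry values always win when present
    if windows_2003_only:
        if cross_ref is not None:
            return cross_ref     # per-partition values on the cross-reference object
        return (15, 3)           # Windows Server 2003 defaults
    return (300, 30)             # upgraded Windows Server 2003 or Windows 2000 defaults


print(effective_delays(None, None, windows_2003_only=True))       # no overrides
print(effective_delays(None, (30, 5), windows_2003_only=True))    # cross-reference values
print(effective_delays((60, 10), (30, 5), windows_2003_only=True))  # registry wins
print(effective_delays(None, (30, 5), windows_2003_only=False))   # cross-reference ignored
```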




Change Notification Between Sites

Site link Replication intervals are used between sites

15 minutes is lowest possible interval

Change Notification Between Sites speeds process

ADSIEdit used to set values


Change Notification between Sites By default, the change notification intervals discussed here apply only to replication partners in the same site. Because domain controllers in the same site are assumed to be better connected than domain controllers in different sites, replication between them takes place more often.

Domain controllers in different sites will replicate with each other based on the site link replication intervals and site link schedules in place, among other factors, with the lowest possible replication interval that can be set on a site link being 15 minutes. This means the replication latency of the forest (length of time for a change to replicate from one DC in the forest to all DCs across all other sites) can be quite high.

It is possible to remedy this situation and have replication partners in different sites use the change notification process as well. This means that for the purposes of replication, those partners will behave as if they were in the same site, and use the change notification intervals that are appropriate for them, based on the criteria discussed previously.

If you do not use manually created connection objects for intersite replication, you can implement change notification between the sites by modifying the options attribute on the site link object. To enable change notification on a site link:

In ADSI Edit, expand the Configuration Container icon, and then expand CN=Configuration,DC=ForestRootDomainName and CN=Sites.

Expand the CN=Inter-Site Transports container, and then click CN=IP.




In the details pane, right-click the site link object whose options attribute you want to change, and then click Properties.

In the Select a property to view box, click options.

If the Value(s) box displays <not set>, in the Edit Attribute box, type 1 for the value (bit 0=1).

If the Value(s) box contains a value, you must derive the new value by using a Boolean BITWISE-OR calculation of the existing value and the value that enables the replication change you are making, and then convert that value to an integer. Therefore, if a value is set, convert the integer value to a binary value and OR that value with the value 0001. Then convert the results back to an integer, and type the value in the Edit Attribute box.

For example, if the existing decimal value is 4, that value is equal to 0100 in the binary system. The value that enables change notification is 1, or 0001 in binary. The OR operation combines 0 OR 0 = 0, 0 OR 1 = 1, 1 OR 0 = 1, 1 OR 1 = 1. Therefore, the following OR calculation computes the binary value:

0100 (existing value)

0001 (value that enables change notification)

0101 (adds enable change notification to the existing setting)

The binary value 0101 converts to the decimal value 5. For information about binary calculations and converting binary values to decimal values, see Windows 2000 Server Help.

Click Set, and then click OK.
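The bitwise-OR calculation in the procedure above can also be checked with a few lines of code. This is a minimal sketch: the constant and function names are chosen for the example, and `None` stands in for an options attribute that displays <not set>.

```python
CHANGE_NOTIFICATION_BIT = 0b0001  # bit 0, the value that enables change notification


def enable_change_notification(options):
    """Return the new options value with bit 0 set, preserving other bits."""
    current = 0 if options is None else options
    return current | CHANGE_NOTIFICATION_BIT


print(enable_change_notification(None))               # attribute not set
print(enable_change_notification(4))                  # existing value 4 (binary 0100)
print(format(enable_change_notification(4), "04b"))   # same result shown in binary
```

Because OR only sets bits, applying the function to a value that already has bit 0 set leaves the value unchanged.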




Originating Updates

Originating Updates: Initiating Changes
• Add an object to the directory
• Modify (add, delete, or replace) attribute values of an object in the directory
• Move an object by changing the name or parent of the object
• Delete an object from the directory


Originating Updates: Initiating Changes As an LDAP directory service, Active Directory supports the following four types of update requests:

• Add an object to the directory.
• Modify (add, delete, or replace) attribute values of an object in the directory.
• Move an object by changing the name or parent of the object.
• Delete an object from the directory.

Each LDAP request generates a separate write transaction. The LDAP directory service processes each write request as an atomic transaction; that is, the transaction is either completed in full or not applied at all. The practical limit to the number of values that can be written in one LDAP transaction is approximately 5,000 values added, modified, or deleted at the same time.

A write request either commits, and all its effects are durable, or it fails before completion and has no effect. A write request that commits is called an originating update. The absolute success or failure of an update applies, even for requests, such as add or modify, that might affect several attributes of a single object. In this case, if one attribute update fails, they all fail, and the object is not updated.




When an update that originates on one domain controller is replicated to another domain controller, the update on the non-originating domain controller is called a replicated update and it can be distinguished from an originating update.

An originating update enforces schema restrictions (allowable parent object types for an object, mandatory and optional attributes for an object, syntax for an attribute) according to the schema that exists on the domain controller at the moment of the update.

Originating Add An Add request makes a new object with a unique objectGUID attribute. The values of all replicated attributes that are set by the Add request are stamped Version = 1.

The Add request fails immediately, if the parent object does not exist, or if the originating domain controller does not contain a writable replica of the parent object’s directory partition.

Originating Modify All Modify operations replace the current value of an attribute with a new value. A modify request can specify one of the following:

• That an attribute be deleted from the object. Attribute deletion is best thought of as replacing the attribute value with NULL. The NULL value occupies no storage of its own, but it does carry a stamp, as does any value that is stored as a directory attribute.

• That a value be added to the current value of an attribute, as when modifying an attribute that can have multiple values. The effect is to replace the current values with the current values plus the added value.

For each attribute in the request, a Modify request compares the new value in the request with the existing value. If the values are the same, the request to modify that attribute is ignored. If the resulting Modify request does not change any attributes of the object, the entire request is ignored.

Otherwise, a Modify request computes a stamp in the metadata for each new replicated attribute value, by reading the version from the existing value (version=0 for an attribute that has never been written) and then adding 1 to this value. The Modify request replaces the old stamp values with new stamp values.
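The version handling of an originating Modify can be sketched as follows. This is a simplified model, not the directory's implementation: the function name and dictionaries are hypothetical, only the version element of the stamp is tracked, and an attribute deletion is represented by the value `None`.

```python
def apply_modify(values, versions, changes):
    """Apply an originating Modify to one object.

    values:   {attribute: current value}
    versions: {attribute: version in the attribute's stamp}
    changes:  {attribute: requested value, or None to delete}
    Returns the list of attributes that were actually changed.
    """
    changed = []
    for attr, new_value in changes.items():
        if values.get(attr) == new_value:
            continue  # same value: this part of the request is ignored
        values[attr] = new_value
        # Version 0 is assumed for an attribute that has never been written.
        versions[attr] = versions.get(attr, 0) + 1
        changed.append(attr)
    return changed


values = {"title": "Engineer"}
versions = {"title": 1}
print(apply_modify(values, versions, {"title": "Engineer", "department": "IT"}))
print(versions)
```

In this run the unchanged title is skipped and keeps version 1, while the newly written department attribute is stamped with version 1.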

Originating Move A Move request is, essentially, a special Modify request for a single attribute, the name attribute. The operation proceeds as described for the Modify request.




Originating Delete A Delete request is essentially a special Modify request that performs the following series of operations:

Sets the isDeleted attribute to TRUE.

Marks the object as a tombstone, which is an object that has been deleted but not fully removed from the directory.

Changes the relative distinguished name to a value that is otherwise impossible (cannot be set by an LDAP application).

Strips all attributes that are not needed by Active Directory. A few key attributes, including objectGUID, objectSid, distinguishedName, nTSecurityDescriptor, and uSNChanged, are preserved on the tombstone.

Moves the tombstone to the Deleted Objects container, which is a hidden container within the directory partition.
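The following sketch walks through the same series of operations on a dictionary that stands in for a directory object. It is illustrative only: the function name is hypothetical, the preserved set contains just the attributes named above, and the renamed relative distinguished name uses a made-up placeholder format rather than the format Active Directory actually generates.

```python
PRESERVED = {"objectGUID", "objectSid", "distinguishedName",
             "nTSecurityDescriptor", "uSNChanged"}


def to_tombstone(obj, partition_dn):
    """Return a tombstone for obj, modeled as a plain dictionary."""
    # Strip every attribute that is not in the preserved set.
    tombstone = {name: value for name, value in obj.items() if name in PRESERVED}
    tombstone["isDeleted"] = True
    # Placeholder RDN: a stand-in for the otherwise impossible name.
    placeholder_rdn = "CN=deleted-" + obj["objectGUID"]
    # Move the object to the hidden Deleted Objects container.
    tombstone["distinguishedName"] = (
        placeholder_rdn + ",CN=Deleted Objects," + partition_dn
    )
    return tombstone


user = {
    "objectGUID": "1111-aaaa",
    "objectSid": "S-1-5-21-0-0-0-1105",
    "distinguishedName": "CN=Pat,OU=Sales,DC=example,DC=com",
    "telephoneNumber": "555-0100",
}
print(to_tombstone(user, "DC=example,DC=com"))
```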




Tracking Replicated Updates

• Need to avoid endless replication
• Relies on USNs
• Replication metadata includes: version number (number of changes), then timestamp, then originating DC GUID
• Last writer wins


A replicated update is performed on one domain controller when it receives replication of an originating update that was performed at another domain controller. There is not necessarily a one-to-one correspondence between originating and replicated updates. A single replicated update might reflect a set of originating updates (even updates originating at different domain controllers) to the same object.

For example, the manager of a user object can be changed at one domain controller at the same time the address of the same user is changed at another domain controller. A third domain controller might receive these changes to the user object separately and, in turn, replicate the changes to a fourth domain controller in a single replicated update.

To avoid endless replication of the same update and reapplication of an update that is received from different replication partners, a domain controller must be able to recognize replicated updates that it has already received, as opposed to those that it has not. Some directory services use timestamps to determine what changes need to be propagated, on the basis of preserving the last write. But keeping time closely synchronized in a large network is difficult. When the latest time of a directory write is the only means of determining which of two changes is recorded and replicated, skewed time on a domain controller can result in data loss or directory corruption.

Active Directory replication does not depend primarily on time to determine what changes need to be propagated. Instead it uses update sequence numbers (USNs) that are assigned by a counter that is local to each domain controller. Because these USN counters are local, it is easy to ensure that they are reliable and that they never run backward (that is, they cannot decrease in value).

When a conflict occurs, instead of using timestamps as the primary mechanism to determine what updates are preserved, Active Directory uses volatility (version number) as the first element of the per-attribute stamps that are compared during conflict resolution. The second element is a timestamp. Therefore, if an attribute is updated once on domain controller A and once on domain controller B, the last writer’s update is preserved. But, if the attribute is updated on domain controller A, then on domain controller B, and then again on domain controller A, the update of domain controller A is preserved, even if the clock of domain controller B is set forward from that of domain controller A. With Active Directory, clock skew can never prevent a value from being overwritten.
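The comparison order described above (version first, then timestamp, then originating DC GUID) can be modeled as a tuple comparison. This is a sketch under simplifying assumptions: the type and function names are hypothetical, timestamps are plain integers, and GUIDs are compared as ordinary strings.

```python
from typing import NamedTuple


class Stamp(NamedTuple):
    version: int              # number of originating writes to the attribute
    originating_time: int     # stand-in timestamp, in seconds
    originating_dc_guid: str  # final tie-breaker


def resolve(a: Stamp, b: Stamp) -> Stamp:
    """Return the stamp whose value is preserved.

    Tuples compare element by element, so the timestamp matters only
    when the versions are equal, and the GUID only when both are equal.
    """
    return max(a, b)


# One update on each DC: same version, so the later write wins.
print(resolve(Stamp(2, 100, "guid-a"), Stamp(2, 160, "guid-b")))

# A updates twice, B once with a clock set far ahead: A still wins on version.
print(resolve(Stamp(3, 200, "guid-a"), Stamp(2, 9999, "guid-b")))
```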

Server Object GUID (DSA GUID) and Server Database GUID (Invocation ID) The server object that represents a domain controller, in the Sites container of the configuration directory partition, has a globally unique identifier (GUID) that identifies it to the replication system as a domain controller. This Directory System Agent (DSA) GUID is used in USNs to track originating updates. It is also used by domain controllers to locate replication partners. The DSA GUID is the GUID of the NTDS Settings object (class nTDSDSA), which is a child object of the server object. Its value is stored in the objectGUID attribute of the NTDS Settings object.

The DSA GUID is created when Active Directory is initially installed on the domain controller, and is destroyed only if Active Directory is removed from the domain controller. The DSA GUID ensures that the DSA remains recognizable when a domain controller is renamed. The DSA GUID is not affected by the Active Directory restore process.

The Active Directory database has its own GUID that the DSA uses to identify the database instance (version of the database). The database GUID is stored in the invocationId attribute on the NTDS Settings object. Unlike the DSA GUID, which never changes for the lifetime of the domain controller, the invocation ID is changed during an Active Directory restore process, to ensure replication consistency.

On domain controllers that are running Windows Server 2003, the invocation ID also changes when an application directory partition is removed from or added to the domain controller.




Update Sequence Numbers (USNs)

Used for:
• Tracking replicated updates
• Determining which changes to replicate
• Preventing endless replication
• Simplifying recovery after a failure

USN counters:
• Local to each DC
• 64-bit counter (you won’t run out)
• Maintained as highestCommittedUSN attribute on rootDSE


The current USN is a 64-bit counter that is maintained by each Active Directory domain controller as the highestCommittedUSN attribute on the rootDSE object. At the start of each update transaction (originating or replicated) on a domain controller, the domain controller increments its current USN, and associates this new value with the update request.

Note: The rootDSE (DSA-specific Entry) represents the top of the logical namespace for one domain controller. RootDSE has no hierarchical name or schema class, but it does have a set of attributes that identify the contents of a given domain controller.

The current USN value is stored on an updated object as follows.

• Local USN: The USN for the update is stored in the metadata of each attribute that is changed by the update, as the local USN of that attribute (originating and replicated writes). As the name implies, this value is local to the domain controller on which the change occurs. It is possible to use the Repadmin command-line tool to view the local USN. Type repadmin /showobjmeta DCLIST <object_DN> at a command prompt, and view the column labeled “Loc. USN” in the output.




• uSNChanged: The maximum local USN, among all of an object’s attributes, is stored as the object’s uSNChanged attribute (originating and replicated writes). The uSNChanged attribute is indexed, which allows objects to be enumerated efficiently in the order of their most recent attribute write. This value can be examined using LDP or ADSIEDIT.

Note: When the forest functionality is Windows Server 2003 or Windows Server 2003 interim, discrete values of linked multivalued attributes can be updated individually. In this case, there is a uSNChanged attribute associated with each link, in addition to the uSNChanged attribute associated with each object. Therefore, updates to individual values of linked multivalued attributes do not affect the local USN; they affect only the uSNChanged attribute on the object.

• Originating USN: For an originating write only, the update’s USN value is stored with each updated attribute, as the originating USN of that attribute. Unlike the local USN and uSNChanged, the originating USN travels with the attribute’s value, as it replicates. To see the originating USN, type repadmin /showobjmeta DCLIST <object_DN> at a command prompt, and view the column labeled “Org.USN” in the output.




Object Creation

Replication-Related Data on DC1 When a User Object is Created


The following series of diagrams illustrates the replication-related data for a single object and one of its attributes, as it goes from creation through replication.

Figure 1 shows the replication-related data for the user object when it is first created on domain controller DC1. Before the user object is created, the current USN for the domain controller is 4710. When the object is created, the local USN of 4711 is assigned to each attribute of the user object, and the current USN for the domain controller increments from 4710 to 4711. Because the object has not yet changed, the value of its uSNChanged attribute is the same as its uSNCreated attribute, 4711.

Figure 1. Replication-Related Data on DC1 When a User Object is Created




Figure 2 shows the change to the destination domain controller when the new user object is replicated. The object is created as a replicated update on DC2. Notice that the per-attribute originating USN and stamp (version, originating time, originating DC) are replicated from DC1 to DC2, but the per-attribute local USN and the per-object uSNChanged are unique to DC2.

Figure 2. Replication-Related Data on DC2 When a New User Object is Replicated From DC1

The following information is transferred in the metadata of an updated attribute value from the source domain controller to the destination domain controller:

• The originating USN value for the updated attribute, which is the USN assigned by the domain controller on which the update was made.

• The stamp, which is used to resolve conflicts.

Figure 3 illustrates the change in the replicated object on DC2 when someone changes the password (the userPassword property in the diagram) of the object on that domain controller. By this time, the current USN on DC2 has increased from 1746 to 2001. The update request changes the password and increments the current USN to 2002 on DC2. The request also sets the password attribute’s originating USN and local USN to 2002 and creates a new stamp for the password value. The version number of this password’s stamp is 2, which is one version number higher than the version of the previous password.




Figure 3. Replication-Related Data on DC2 After the User Password Value Has Been Changed on DC2

Finally, in Figure 4, the changed password is now replicated back to the original domain controller, whose current USN has increased to 5039. The replicated update increments the current USN of DC1 to 5040. The per-attribute originating USN and stamp (version, originating time, originating DC) are replicated from DC2 to DC1, and the per-attribute local USN and per-object uSNChanged values are set to 5040.

Figure 4. Replication-Related Data on DC1 After the Password Change Has Replicated to DC1

Note: Many operations, including the creation of a user object, increment the USN counters by more than one. However, for the purposes of this example, the slides have simplified this process and show the USNs incrementing by one.
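The sequence in Figures 1 through 4 can be traced with a small model. The following Python sketch is illustrative only: the class and field names (DomainController, AttributeMetadata, ObjectReplica) are hypothetical and do not correspond to Active Directory structures. It assumes, as the slides do, that every update increments the USN by one, and that the current USN on DC2 was 1745 before the object replicated in (inferred from the value 1746 mentioned for Figure 3).

```python
from dataclasses import dataclass, field


@dataclass
class AttributeMetadata:
    local_usn: int          # assigned by the DC that stores this replica
    originating_usn: int    # assigned by the DC where the write originated
    version: int            # stamp: version
    originating_time: str   # stamp: originating time (an opaque label here)
    originating_dc: str     # stamp: originating DC


@dataclass
class ObjectReplica:
    usn_created: int
    usn_changed: int
    attributes: dict = field(default_factory=dict)


class DomainController:
    """Hypothetical model of one DC's USN counter and object replicas."""

    def __init__(self, name, current_usn):
        self.name = name
        self.current_usn = current_usn
        self.objects = {}

    def _next_usn(self):
        self.current_usn += 1
        return self.current_usn

    def originating_create(self, obj_name, attr_names, time):
        usn = self._next_usn()
        obj = ObjectReplica(usn_created=usn, usn_changed=usn)
        for attr in attr_names:
            obj.attributes[attr] = AttributeMetadata(usn, usn, 1, time, self.name)
        self.objects[obj_name] = obj

    def originating_write(self, obj_name, attr, time):
        usn = self._next_usn()
        obj = self.objects[obj_name]
        old = obj.attributes[attr]
        # A new stamp is created; the version is one higher than before.
        obj.attributes[attr] = AttributeMetadata(usn, usn, old.version + 1, time, self.name)
        obj.usn_changed = usn

    def replicated_update(self, source, obj_name, attr_names):
        usn = self._next_usn()
        obj = self.objects.get(obj_name)
        if obj is None:
            obj = ObjectReplica(usn_created=usn, usn_changed=usn)
            self.objects[obj_name] = obj
        for attr in attr_names:
            src = source.objects[obj_name].attributes[attr]
            # Originating USN and stamp are copied; the local USN is local.
            obj.attributes[attr] = AttributeMetadata(
                usn, src.originating_usn, src.version,
                src.originating_time, src.originating_dc)
        obj.usn_changed = usn


dc1 = DomainController("DC1", 4710)
dc2 = DomainController("DC2", 1745)                      # assumed starting value
dc1.originating_create("user", ["userPassword"], "t1")   # Figure 1
dc2.replicated_update(dc1, "user", ["userPassword"])     # Figure 2
dc2.current_usn = 2001                                   # unrelated activity on DC2
dc2.originating_write("user", "userPassword", "t2")      # Figure 3
dc1.current_usn = 5039                                   # unrelated activity on DC1
dc1.replicated_update(dc2, "user", ["userPassword"])     # Figure 4
```

After the last step, the password attribute on DC1 carries the originating USN and stamp written on DC2, while its local USN and the object's uSNChanged reflect DC1's own counter.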




Replication Request Filtering

Up-To-Dateness Vector and High Watermark

Complementary Filter Mechanisms

Work Together to Decrease Replication Latency

Prevents endless looping

Replication Request Filtering

Destination domain controllers use the originating USN to track changes they have received from other domain controllers with which they replicate. When requesting changes from a source domain controller, the destination informs the source of the updates it has already received, so that the source never replicates changes that the destination does not need.

Two values are used by source and destination domain controllers to filter updates when the destination requests changes from the source replication partner:

• Up-to-dateness vector: The current status of the latest originating updates to occur on all domain controllers that store a replica of a specific directory partition.

• High-watermark (direct up-to-dateness vector): The latest originating update to a specific directory partition that has been received by a destination, from a specific source replication partner, during the current replication cycle.




Up-to-Dateness Vector

For Tracking Originating Updates

Determines Attributes to Send for Replication

For each DC that holds a full replica of the partition, it holds:

• Database GUID (invocation ID) of Source DC

• Highest-Originating-USN from Source DC

• Timestamp from last successful replication (Windows Server 2003)

If the destination already has an up-to-date value, the source domain controller does not send the update

Up-To-Dateness Vector

The up-to-dateness vector is a value that the destination domain controller maintains, for tracking the originating updates that are received from all source domain controllers. When a destination domain controller requests changes for a directory partition, it provides its up-to-dateness vector to the source domain controller. The source domain controller uses this value to reduce the set of attributes that it sends to the destination domain controller.

The up-to-dateness vector contains an entry for each domain controller that holds a full replica of the directory partition. The up-to-dateness vector values include the database GUID (invocation ID) of the source domain controller and the highest originating write (based on the USN) received from the source domain controller. If the up-to-dateness entry that corresponds to source domain controller X contains the USN n, the destination domain controller guarantees that it holds all updates to a specific directory partition that originated at domain controller X and that have an originating USN value of less than or equal to n.

If the destination already has an up-to-date value, the source domain controller does not send that attribute. If the source has no attributes to send for an object, it sends no information at all about that object.




At the completion of a successful replication cycle between two replication partners, the source domain controller returns its up-to-dateness vector to the destination, including the highest originating USN on the source domain controller. The destination merges this information into its up-to-dateness vector. In this way, the destination tracks the latest originating update it has received from each partner, as well as the status of every other domain controller that stores a replica of the directory partition.
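The merge step can be pictured as keeping, for each invocation ID, the larger of the two USN values. The sketch below is a simplified illustration under that assumption; the function name merge_utd and the dictionary representation are hypothetical, and the timestamp field is omitted.

```python
def merge_utd(destination_utd, source_utd, source_id, source_highest_usn):
    """Return a new vector holding the per-DC maximum of both vectors.

    Vectors are modelled as {invocation_id: highest_originating_usn}.
    """
    merged = dict(destination_utd)
    for invocation_id, usn in source_utd.items():
        merged[invocation_id] = max(merged.get(invocation_id, 0), usn)
    # The source also reports its own highest originating USN.
    merged[source_id] = max(merged.get(source_id, 0), source_highest_usn)
    return merged


example = merge_utd({"A": 2, "C": 7}, {"C": 5, "D": 4}, "A", 3)
```

In this example the destination learns about DC D indirectly, keeps its own more recent knowledge of DC C, and advances its entry for the source, DC A.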

Timestamp on Up-To-Dateness Vector

Windows Server 2003 adds a new field to the up-to-dateness vector (UTD), in which it records the last time the local DC completed a successful replication cycle with the partner domain controller. The replication cycle may have occurred directly (direct replication partner) or indirectly (transitive replication partner). The timestamp is updated whether or not the local domain controller actually received any changes from the partners.

The timestamp is recorded on all Windows Server 2003 domain controllers, even if the partner is running Windows 2000. The timestamp is recorded at all domain and forest functional levels.

Note: The timestamps are only updated at the end of a complete and successful replication cycle. In the case of a long sync or long full sync, the timestamp might not be updated right away, although changes are flowing.

You can view the up-to-dateness vector in the output of the repadmin /showvector command. Adding the /latency switch shows the replication latencies within the forest. Because the data is recorded on all domain controllers that host the partition, the replication health of Active Directory can be assessed quickly and non-replicating domain controllers can be identified. In addition, four new replication events are evaluated each time the Active Directory KCC runs, and any that are logged provide summary information about replication failures. The threshold for the first error can be configured in the following registry entry. The default is 24 hours.

HKLM\System\CurrentControlSet\Services\NTDS\Parameters

Replicator latency error interval (hours)

Event Messages

Four new event messages use the timestamp to identify non-replicating domain controllers, and each is triggered by a particular problem scenario. The event messages and brief descriptions of the problem scenarios are listed in Table 1.




Event ID: NTDS Replication 1862
Description: Non-replicating domain controllers in other sites
Details: This is a warning, indicating that the local domain controller has not replicated with one or more domain controllers at a different site within the latency threshold of 24 hours (configurable). This should be expected if site link schedules prevent replication for more than 24 hours. If that is the case, the warning threshold should be increased to match the site link schedule.

Event ID: NTDS Replication 1863
Description: Non-replicating domain controllers in the local site and other sites
Details: This is similar to the 1862 event (described above), but it also includes non-replicating domain controllers in the local site. This is a warning, indicating that the local domain controller has not replicated with another domain controller in the local site, or at a different site, within the latency threshold of 24 hours (configurable). This should be expected for inter-site partners with closed schedules. For intra-site partners there are no schedules to observe, so all partner domain controllers should be up to date. If this is recorded, use Repadmin and DCDiag to troubleshoot the affected partner.

Event ID: NTDS Replication 1864
Description: Summary of domain controllers in the local site not replicating
Details: This is a summary message of domain controllers from the local site that are no longer replicating. The totals are broken out by length of time. Additionally, the number of domain controllers that have not replicated within the tombstone lifetime is reported, along with the current tombstone lifetime setting for the forest.

Event ID: NTDS Replication 2042
Description: Domain controllers that have not replicated beyond the tombstone lifetime
Details: When a domain controller that has not replicated within the tombstone lifetime attempts to replicate, it is blocked and this event is reported.

Table 1: Replication event log entries.
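The threshold comparison behind these warnings can be illustrated with simple date arithmetic. The following sketch is not how Active Directory implements the check; the function name stale_partners and the dictionary of timestamps are hypothetical, and the 24-hour default is taken from the text above.

```python
from datetime import datetime, timedelta


def stale_partners(utd_timestamps, now, threshold_hours=24):
    """Return the DCs whose last successful replication exceeds the threshold.

    utd_timestamps maps a DC name to the time of the last successful
    replication cycle, as recorded in the up-to-dateness vector.
    """
    limit = timedelta(hours=threshold_hours)
    return sorted(dc for dc, last in utd_timestamps.items() if now - last > limit)


now = datetime(2006, 1, 10, 12, 0)
timestamps = {
    "DC2": datetime(2006, 1, 10, 9, 0),   # 3 hours ago
    "DC3": datetime(2006, 1, 8, 12, 0),   # 48 hours ago
}
late = stale_partners(timestamps, now)
```

Raising threshold_hours, as the 1862 description suggests for restrictive site link schedules, removes DC3 from the result once the threshold is longer than its 48-hour gap.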

Multiple Paths without Redundant Replication

Multiple replication paths can exist between a pair of domain controllers. Multiple paths provide fault tolerance and can reduce latency. However, when multiple paths exist, you might expect that the same change will be sent along each path to a specific domain controller or that a change might replicate in an endless loop. Active Directory prevents these potential problems with multiple paths, by using the up-to-dateness vector. The ability to eliminate redundancy is called “propagation dampening.”




The following is an example of how replication ordinarily occurs:

1. DC A updates a password attribute. In this example, the originating USN of the attribute is set to 3.

2. Destination DC, B, requests changes from source DC A, and sends its high-watermark and up-to-dateness vector to DC A.

3. According to the high-watermark that was passed by DC B, source DC A examines one or more objects, one of which contains the changed password. When DC A encounters the changed password attribute, it proceeds as follows:

   a. First, DC A finds that the originating directory system agent (DSA) of the password attribute is DC A.

   b. Therefore, DC A reads the up-to-dateness vector supplied by DC B and finds that DC B is guaranteed to be up-to-date with updates that originated at DC A and that have an originating USN of less than or equal to 2.

   c. DC A then finds that the originating USN of the password attribute is 3.

   d. Because 3 is greater than 2, DC A sends the changed password attribute to DC B.

To illustrate propagation dampening, suppose that DC B had already received the password update from DC C, which had received it from DC A. In this case, the entry in the up-to-dateness vector of DC B, for DC A, would contain the USN value 3, not 2. Therefore, DC A would not send the changed password to DC B.
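The decision that DC A makes in this example reduces to a single comparison. The sketch below uses a hypothetical function, should_send, and models the up-to-dateness vector as a dictionary keyed by originating DC; it is a simplified illustration, not the replication engine's actual logic.

```python
def should_send(originating_dc, originating_usn, destination_utd):
    """Send the attribute only if the destination has not already seen it.

    destination_utd maps an originating DC to the highest originating USN
    that the destination guarantees it already holds from that DC.
    """
    return originating_usn > destination_utd.get(originating_dc, 0)


# Ordinary case: DC B is up to date with DC A only through USN 2.
ordinary = should_send("A", 3, {"A": 2})

# Propagation dampening: DC B already received the update through DC C,
# so its entry for DC A is 3 and the attribute is not sent again.
dampened = should_send("A", 3, {"A": 3})
```

The first call evaluates to True and the second to False, which matches the two outcomes described above.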




High-Watermark

Determines Objects to Consider for Replication

Table on each domain controller that contains:

• Entries for direct replication partners

• Highest known USN from those partners (uSNChanged)

Destination provides value to source

Source filters changes that do not need to be sent

High-Watermark

The high-watermark, or direct up-to-dateness vector, is a value that the destination domain controller maintains during replication to keep track of the most recent attribute change that it has received, from a specific source domain controller, for an object in a specific directory partition. When sending changes to a destination domain controller, the source domain controller provides the changes in increasing order of uSNChanged. Although the uSNChanged values from the source domain controller are not stored on objects at the destination domain controller, the destination domain controller keeps track of the uSNChanged value of the most recent object that was successfully updated from the source domain controller, for a specific directory partition. This USN is called the destination’s high-watermark, with respect to the directory partition and the source domain controller.

When requesting changes during a replication cycle, the destination provides the high-watermark value with each request to the source domain controller, which, in turn, uses this value to filter the objects that it considers for continuing replication to the destination. If the uSNChanged value of an object on the source domain controller is less than or equal to the high-watermark value of the destination domain controller, the object update has already been received by the destination domain controller, and it is, therefore, not replicated. The high-watermark serves to decrease the CPU time and the number of disk I/O operations that would otherwise be required.
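The high-watermark filter can be sketched as a selection over uSNChanged values. The function name objects_to_consider and the dictionary of objects are hypothetical; the sketch only illustrates the comparison and the increasing-order delivery described above.

```python
def objects_to_consider(source_objects, high_watermark):
    """Return object names whose uSNChanged exceeds the high-watermark.

    source_objects maps an object name to its uSNChanged value on the
    source DC. Results are ordered by increasing uSNChanged.
    """
    candidates = [(usn, name) for name, usn in source_objects.items()
                  if usn > high_watermark]
    return [name for usn, name in sorted(candidates)]


pending = objects_to_consider({"a": 10, "b": 25, "c": 18}, high_watermark=15)
```

Here object a is skipped because its uSNChanged (10) is not greater than the destination's high-watermark (15); objects c and b are considered, in that order.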




The up-to-dateness vector and the high-watermark are complementary filter mechanisms that work together to decrease replication latency. Whereas the high-watermark prevents irrelevant objects from being considered by the source domain controller, with respect to a single destination, the up-to-dateness vector helps the source domain controller to filter irrelevant attributes (and entire objects, if all attributes are filtered) on the basis of the relationships between all sources of originating updates and a single destination.

For a specific directory partition, a domain controller maintains a high-watermark value for only those domain controllers from which it requests changes, but it maintains an up-to-dateness vector entry for every domain controller that has ever performed an originating update, which is, typically, every domain controller that holds a full replica of the directory partition.




Multimaster Conflict Resolution Policy

Conflict Resolution Stamp:

• Version Number, incremented for each Originating Write

• Originating Time of the Originating Write

• DC that Performed the Originating Write

Version Numbers First

Time Stamps

Multimaster Conflict Resolution Policy

Conflict Resolution Stamp

The stamp that is applied during an originating write has the following three components:

• The version is a number that is incremented for each originating write. The version of the first originating write is 1. The version of each successive originating write is increased by 1.

• The originating time is the time of the originating write, to a one-second resolution, according to the system clock of the domain controller that performed the write.

• The originating DC is the DSA GUID of the domain controller that performed the originating write.

When stamps are compared, the version number is the most significant attribute, followed by the originating time, and then the originating DC. If two stamps have the same version, the originating time almost always breaks the tie. In the extremely rare event that the same attribute is updated on two different domain controllers during the same second, the originating DC breaks the tie, in an arbitrary fashion.

Two different originating writes of a specific attribute of a particular object cannot assign the same stamp, because each originating write advances the version at a specified originating domain controller. The originating time does not contribute to uniqueness. Replicated writes cannot decrease the version, because values with smaller versions lose during conflict resolution. You can see all three components of the stamp in the output of the repadmin /showobjmeta command.
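Because the three components are compared in order of significance, the comparison behaves like a lexicographic comparison of a three-part tuple. The sketch below is illustrative: the Stamp type is hypothetical, times are plain integers in seconds, and comparing DSA GUIDs as strings is an assumption made only to give the tie-breaker a defined order.

```python
from typing import NamedTuple


class Stamp(NamedTuple):
    version: int            # most significant
    originating_time: int   # seconds; breaks ties between equal versions
    originating_dc: str     # DSA GUID as a string; final tie-breaker


def winning_stamp(a, b):
    """Return the larger stamp; tuples compare field by field, in order."""
    return max(a, b)


by_version = winning_stamp(Stamp(2, 100, "dc-x"), Stamp(1, 999, "dc-y"))
by_time = winning_stamp(Stamp(2, 100, "dc-x"), Stamp(2, 101, "dc-a"))
by_dc = winning_stamp(Stamp(2, 100, "dc-x"), Stamp(2, 100, "dc-y"))
```

A higher version wins even when its originating time is earlier, which is why a skewed clock on one domain controller does not by itself override a later write that has a higher version.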




Multimaster Conflict Resolution Policy (cont’d)

Types of Conflicts:

• Attribute value conflict

• Add or Move under deleted parent, Delete non-leaf object

• Relative distinguished name conflict

Multimaster Conflict Resolution Policy (cont’d)

Given the semantics of LDAP directories, multimaster updates can create several possible types of conflicts:

• Attribute value conflict: A Modify operation sets the value of an attribute. Concurrently, at another domain controller, a Modify operation sets the value of the same attribute to a different value. After resolution: The attribute value at all domain controllers is the value with the larger stamp.

• Add or Move under deleted parent, Delete non-leaf object: An Add or Move operation makes an object a child of a parent object. Concurrently, at another domain controller, a Delete operation deletes the parent object. After resolution: At all domain controllers, the parent object is deleted and the child object is a child of the special LostAndFound container in the directory partition. Stamps are not used in the resolution of this conflict.

• Relative distinguished name (RDN) conflict: An Add or Move operation names a child object below a parent object. Concurrently, at another domain controller, an Add or Move operation names a different child of the same parent with the same child name, resulting in two child objects with identical RDN values below the same parent object. After resolution: The child object whose naming attribute has the larger stamp keeps its given name. The child object whose relative distinguished name attribute (for example, CN for most objects, OU for organizational units, DC for domain components) has the smaller stamp is named by the following convention: At all domain controllers, a system-assigned value that is unique to the conflicting name and cannot conflict with any client-assigned value is assigned to the child object. For example, if the relative distinguished name of a child object was "CN=ABC" before conflict resolution, its relative distinguished name after resolution is "CN=ABC*CNF:<GUID>", where “*” represents a reserved character, “CNF” is a constant that indicates a conflict resolution, and “<GUID>” represents a printable representation of the objectGUID attribute value.
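The naming convention for the losing object can be expressed as a short string operation. In the sketch below, the function name conflict_rdn is hypothetical, and the "*" placeholder stands in for the reserved character exactly as it does in the text; the actual reserved character is not reproduced here.

```python
RESERVED = "*"  # placeholder for the reserved character, as in the text


def conflict_rdn(rdn, object_guid):
    """Build the renamed RDN for the object that lost the conflict.

    rdn is a string such as "CN=ABC"; object_guid is a printable form of
    the object's objectGUID value.
    """
    return f"{rdn}{RESERVED}CNF:{object_guid}"


renamed = conflict_rdn("CN=ABC", "<GUID>")
```

Because the objectGUID is unique to each object, two objects that collide on the same name receive different names after resolution.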




Replication of Linked and Nonlinked Attributes

Linked Attributes

Nonlinked Attributes

Group Membership and Linked-Value Replication

Group Membership Replication in Windows 2000 Forests

Group Membership Replication in Windows 2003 Forests

Replication of Linked and Nonlinked Attributes

Attributes are replicated differently, depending on whether they are linked or nonlinked. Understanding the differences in the ways these attributes operate is helpful in understanding their effect on replication.

Linked Attributes

A linked attribute represents an inter-object, distinguished-name reference. Linked attributes occur in pairs consisting of a forward link and backward link (back-link). The forward link is the linked attribute on the source object (for example, the member attribute on the group object), while the backward link is the linked attribute on the target object (for example, the memberOf attribute on the user object). A back-link value, on any object instance, consists of the distinguished names of all the objects that have the object’s distinguished name set in their corresponding forward link. For example, manager and directReports are a pair of linked attributes in which manager is the forward link and directReports is the back-link. If Bill is Joe’s manager, and the distinguished name of Bill’s user object is stored in the manager attribute of Joe’s user object, then the distinguished name of Joe’s user object appears in the directReports attribute of Bill’s user object.

A linked attribute can have either single or multiple values. For example, the manager attribute identifies the distinguished name of a single manager of the object or objects that are managed. The directReports attribute of a user object can have multiple values of user names.




The relationships between linked attributes are stored in a separate table in the directory database as link pairs. The matching pair of Link IDs (defined as any two numbers 2N and 2N+1) ties the attributes together. For example, the member attribute has a link ID of 2 and the memberOf attribute has a link ID of 3. Because the member and the memberOf attributes are linked in the database and indexed for searching, the directory can be examined for all records in which the link pair is member/memberOf and the memberOf attribute identifies the group. For example, one could ask, “What user objects have group X as a value in their memberOf attribute?”
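The 2N and 2N+1 pairing means that the forward link carries the even ID and its back-link carries the next odd ID. The helper names below are hypothetical; the sketch only illustrates the arithmetic.

```python
def is_forward_link(link_id):
    """Forward links use even link IDs (2N)."""
    return link_id % 2 == 0


def back_link_id(forward_link_id):
    """Return the back-link ID (2N+1) paired with a forward link ID (2N)."""
    if not is_forward_link(forward_link_id):
        raise ValueError("forward link IDs are even")
    return forward_link_id + 1


member_link_id = 2                                  # from the example above
member_of_link_id = back_link_id(member_link_id)
```

With the values from the text, the member attribute (link ID 2) pairs with memberOf (link ID 3).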

Attributes are marked in the schema as being linked. Attributes with the distinguished name syntax Object(DS-DN), Object(DN-String), or Object(DN-Binary) can be linked, but not all such attributes are linked.

Nonlinked Attributes

Nonlinked, distinguished-name attributes reference other objects in the same way that linked attributes do, except that their behavior is different when an object that is referred to is deleted, as described in “Replication of Deletion Updates” later in this section. In addition, nonlinked, multivalued attributes have an approximate limit of 1,200 values (increased from the Windows 2000 Server limit of approximately 800 values). This limit is based on an approximate maximum page size of 8 kilobytes (KB). For attributes of this maximum size, there are no storage or replication drawbacks or limitations.

Group Membership and Linked-Value Replication

The replication of linked, multivalued attributes is especially important for group objects. Potentially, the linked, multivalued member attribute of a group object can have thousands of values. Linked-value replication, in Windows Server 2003, enables individual values to replicate separately. Linked-value replication requires a forest functional level of Windows Server 2003 or Windows Server 2003 interim. When it is in effect, linked-value replication solves the problem of replication delays that are caused by the inability to write an entire member attribute in a single database transaction. Linked-value replication also makes restoring group membership back-links possible, when a user or group object is authoritatively restored.

Group Membership Replication in Windows 2000 Forests

Linked-value replication is not available in Windows 2000 Server forests. Because an originating update must be written in a single database transaction, and because the practical limit for a single transaction is 5,000 values, membership of more than 5,000 values is not supported in the Windows 2000 Server Active Directory. A group of this size represents a limitation, both in terms of the database write operation that is required to record a change to an attribute of that size and the transfer of that much data over the network.




These conditions have the following impacts on replication, most notably for group and distribution list objects:

• Lost changes: If values of the same multivalued attribute are updated on two different domain controllers during a period of replication latency, the most recently changed replica of the attribute, with all its multiple values, is replicated, and any earlier changes are lost. Changes to the separate values are not merged.

Note: Because all changes to an object must be written in the same database transaction, multiple changes to a single group object can take a relatively long time to be written. This increases the likelihood of another change occurring to the same object, prior to the completion of the original write.

• Excessive network bandwidth consumption: For example, when one member is added to a group of 3,000 members, the member attribute with all 3,001 values is transmitted between domain controllers. Transmitting all values in order to apply a change to only the updated value or values is an inefficient use of network resources.

These limitations are effectively removed in a forest that has a functional level of Windows Server 2003 or Windows Server 2003 interim. At these levels, linked-value replication accommodates replication of individually updated member values.

Group Membership Replication in Windows Server 2003 Forests

In a Windows Server 2003 forest that has a forest functional level of Windows Server 2003 or Windows Server 2003 interim, linked-value replication provides the following benefits:

• Removed likelihood of losing entire sets of changes to the same group membership made on different domain controllers.

• Greatly reduced likelihood of update collisions, in which the same member value is changed on different domain controllers at the same time, and one update is lost.

• Network efficiency is improved by transmitting only updated values and not the entire set of attribute values, which can include many thousands of values.
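The difference between whole-attribute replication and linked-value replication can be shown with a toy model. In the sketch below, the function names and data layout are hypothetical, stamps are simplified to (version, time, DC) tuples, and the per-value metadata is far simpler than what Active Directory actually stores.

```python
def whole_attribute_result(replica_a, replica_b):
    """Windows 2000 behavior: the attribute with the larger stamp wins whole.

    Each replica is (stamp, set_of_members).
    """
    return max(replica_a, replica_b, key=lambda replica: replica[0])[1]


def linked_value_result(replica_a, replica_b):
    """Linked-value behavior: each member value is resolved on its own.

    Each replica maps member -> (stamp, present).
    """
    merged = {}
    for member in replica_a.keys() | replica_b.keys():
        candidates = [r[member] for r in (replica_a, replica_b) if member in r]
        merged[member] = max(candidates)
    return {m for m, (stamp, present) in merged.items() if present}


# The group starts with alice. DC A adds bob; DC B adds carol soon after.
w2k = whole_attribute_result(
    ((2, 10, "A"), {"alice", "bob"}),
    ((2, 20, "B"), {"alice", "carol"}),
)
lvr = linked_value_result(
    {"alice": ((1, 0, "A"), True), "bob": ((1, 10, "A"), True)},
    {"alice": ((1, 0, "A"), True), "carol": ((1, 20, "B"), True)},
)
```

In the whole-attribute case the addition of bob is lost, because DC B's copy of the attribute has the larger stamp. With per-value resolution both additions survive.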

Although replication of many thousands of individual membership updates can be accommodated in a Windows Server 2003 forest, LDAP writes have a practical limit of approximately 5,000 updates in a single transaction. Because originating updates are required to complete in a single transaction, this practical limit of approximately 5,000 updates to a single object is recommended.

Note: Only originating updates must be applied in the same database transaction. Replicated updates can be applied in more than one database transaction.




Replication of Deletions

Tombstones and the Deleted Objects Container

• Objects turned into Tombstones when deleted

• isDeleted attribute=1

• Deleted Objects Container in each partition

Tombstone Lifetime

• 60 days by default on Windows 2000, upgrades and 2003 Server RTM

• 180 on new forests created with 2003 Server SP1

Removed during garbage collection

Replication of Deletions

Object deletions are replicated by replicating tombstones. After an object is deleted, but before it is removed from the directory, object references that formerly pointed to the object now refer to the deleted object’s tombstone. The isDeleted attribute, which has a value of TRUE when an object is a tombstone, indicates the object deletion to other domain controllers. Deleted objects are stored in the Deleted Objects hidden container. Every directory partition has a Deleted Objects container.

By default, tombstones have a lifetime of 60 days (180 days in a forest that was created on a server running Windows Server 2003 with SP1), after which they are permanently removed from the directory database, through a process called garbage collection.

The tombstone lifetime can be changed, but it is important to ensure that the tombstone lifetime is larger than the worst possible replication latency for any directory partition, so that a tombstone cannot be deleted before it has replicated to every directory partition replica. In addition, Active Directory does not allow data to be restored from a backup image that is older than the tombstone lifetime.
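The two time limits described above can be illustrated with date arithmetic. The function names below are hypothetical, and the sketch assumes simple whole-day comparisons; garbage collection in Active Directory runs periodically rather than at the exact moment a tombstone expires.

```python
from datetime import datetime, timedelta


def eligible_for_garbage_collection(deleted_at, now, tombstone_lifetime_days=60):
    """A tombstone can be removed once it is older than the lifetime."""
    return now - deleted_at > timedelta(days=tombstone_lifetime_days)


def backup_is_restorable(backup_taken_at, now, tombstone_lifetime_days=60):
    """A backup older than the tombstone lifetime cannot be restored."""
    return now - backup_taken_at <= timedelta(days=tombstone_lifetime_days)


deleted = datetime(2006, 1, 1)
today = datetime(2006, 3, 15)   # 73 days later
expired_60 = eligible_for_garbage_collection(deleted, today)
expired_180 = eligible_for_garbage_collection(deleted, today, 180)
```

With the 60-day default the tombstone is already eligible for removal; in a forest that uses the 180-day default it is retained.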

Note: A tombstone is invisible to normal LDAP searches. However, a tombstone is visible to searches that use the special LDAP control 1.2.840.113556.1.4.417.




Lingering Objects

Deletions on Nonreplicating Domain Controllers

Backup Latency Interval

• Half of Tombstone Lifetime by default

• Event 2089 recorded if last backup exceeds threshold

Replication Consistency Settings

• Strict – prevents lingering object replication

• Loose – allows lingering object replication

Lingering Objects

Deletions on Nonreplicating Domain Controllers

If a domain controller fails to replicate for a number of days that exceeds the tombstone lifetime, replicas of objects that have been deleted from a writable partition might remain in that domain controller’s directory. Because the tombstones of the deleted objects are permanently removed from the directory at the end of the tombstone lifetime, a domain controller that fails to replicate changes for tombstoned objects never deletes them.

This condition can occur for a variety of reasons, including:

• Prolonged misconfigurations.

• Prolonged errors in name resolution, authentication, or the replication engine that block inbound replication.

• Turning on a domain controller that has been offline for longer than the tombstone lifetime.

• Advancing system time or reducing tombstone lifetime values, in an attempt to accelerate garbage collection, before end-to-end replication has taken place for all directory partitions in the forest.

The condition of outdated objects can also occur when hardware or software problems render the domain controller unreachable. Regardless of the reason, a deleted object can remain on a domain controller any time the domain controller goes offline before receiving a deletion and remains offline for longer than the tombstone lifetime of that deletion.

These outdated objects, called lingering objects, create inconsistency in the directory. If a change is made to an outdated object on the reconnected domain controller, it is possible for the object to be recreated in the directory, under certain conditions. To avoid this situation, replication of an outdated object is prohibited, by default, in newly created Windows Server 2003 forests.

Backup Latency Interval

On domain controllers running Windows Server 2003 with SP1, a new NTDS Replication event provides a warning to administrators when a domain controller has not been backed up. Event ID 2089 provides the backup status of each directory partition that a domain controller stores, including application directory partitions. Specifically, event ID 2089 is logged in the Directory Service event log when partitions in the Active Directory forest are not backed up within a backup latency interval. The value for the backup latency interval is stored as a REG_DWORD value, in the Backup Latency Threshold (days) entry, in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Parameters.

By default, the value of Backup Latency Threshold is half the value of the tombstone lifetime of the forest. If, halfway through the tombstone lifetime, a directory partition has not been backed up, event ID 2089 is logged in the Directory Service event log and continues daily, until the directory partition is backed up.
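The default threshold and the resulting warning condition can be sketched as follows. The function names are hypothetical, and the use of integer division for "half" is an assumption; the sketch does not read the registry or the event log.

```python
def default_backup_latency_threshold(tombstone_lifetime_days):
    """Default threshold in days: half the tombstone lifetime (assumed floor)."""
    return tombstone_lifetime_days // 2


def should_log_event_2089(days_since_backup, tombstone_lifetime_days,
                          threshold_days=None):
    """True when a partition has gone longer than the threshold without backup."""
    if threshold_days is None:
        threshold_days = default_backup_latency_threshold(tombstone_lifetime_days)
    return days_since_backup > threshold_days


threshold_60 = default_backup_latency_threshold(60)
threshold_180 = default_backup_latency_threshold(180)
warn = should_log_event_2089(days_since_backup=45, tombstone_lifetime_days=60)
```

A partition last backed up 45 days ago triggers the warning in a forest with a 60-day tombstone lifetime, but not in one with a 180-day lifetime.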

This event serves as a warning to administrators and monitoring applications, to make sure that domain controllers are backed up before the tombstone lifetime expires. However, it is recommended that you perform backups at a much higher frequency than the default value of Backup Latency Threshold.

Replication Consistency Setting

If the attributes on a lingering object never change, the object is never considered for replication. However, if an attribute changes, the attribute is considered for outbound replication. Because the destination domain controller does not hold the object for the attribute that is being replicated, an update cannot be performed. The way this condition is resolved depends on the replication consistency setting on the domain controller.

A registry setting, on domain controllers that are running Windows Server 2003 or Windows 2000 Server with SP3, provides a consistency value that determines whether a domain controller replicates and reanimates an updated object that has been deleted from all other replicas, or whether replication of such objects is blocked. The default settings on domain controllers that are running Windows 2000 Server with SP3 are different from the default settings on domain controllers that are running Windows Server 2003.




Strict Replication Consistency

To avoid problems with reanimating objects that have been deleted, a domain controller that is running Windows Server 2003 in a newly created (not upgraded) Windows Server 2003 forest blocks inbound replication, by default, when it receives an update to an object that it does not have.

Note: Active Directory replication uses update tracking to differentiate between replicating a newly created object and updating an attribute for an existing object. Replication of a lingering object is an attempt to update an attribute or attributes of an object that the destination domain controller cannot update, because the object does not exist.

Replication is halted in the directory partition for the object, until the lingering object is removed from the source domain controller or the strict replication consistency setting is disabled. For information about how lingering objects are removed, see “Lingering Object Removal” later in this section.

Loose Replication Consistency

When strict replication consistency is disabled, the effect is called loose consistency. By using loose consistency, the destination domain controller detects that it does not have the object for the attribute that is being replicated. The destination domain controller requests the entire object from the source partner, and, thereby, reanimates the object in its copy of the directory. The same process repeats on all domain controllers that do not have a copy of the object.
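The two behaviors can be contrasted in a small decision function. The function name and the returned strings are hypothetical labels chosen for illustration; they are not messages produced by Active Directory.

```python
def handle_inbound_update(local_object_guids, object_guid, strict_consistency):
    """Decide what a destination DC does with an attribute update.

    local_object_guids is the set of object GUIDs the destination holds.
    """
    if object_guid in local_object_guids:
        return "apply update"
    if strict_consistency:
        # The object is unknown locally: treat it as a lingering object.
        return "halt inbound replication for the partition"
    # Loose consistency: fetch the whole object, which reanimates it.
    return "request entire object and reanimate"


held = {"guid-1", "guid-2"}
known = handle_inbound_update(held, "guid-1", strict_consistency=True)
strict = handle_inbound_update(held, "guid-9", strict_consistency=True)
loose = handle_inbound_update(held, "guid-9", strict_consistency=False)
```

The setting matters only when the destination does not hold the object; updates to objects that exist on both sides are applied the same way under either setting.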

This mechanism can be used to cause lingering objects to be reanimated across the entire forest. If a lingering object is discovered, and its presence is intended, then perform an update to the object. As long as replication consistency is set to loose (strict replication consistency is disabled) on all domain controllers, the object will be reanimated as it replicates around the forest.

Loose replication consistency is the default setting for domain controllers that are running Windows 2000 Server with SP3 or later. The Windows 2000 Server default is not changed by upgrading to Windows Server 2003; strict replication consistency remains disabled, and replication is allowed to proceed. Keeping the Windows 2000 Server setting is required to ensure that the upgraded domain and forest are consistent with Windows 2000 Server functionality. You must change the setting manually following the upgrade.

Storage for Consistency Setting

The setting for replication consistency is in the registry, on each domain controller.

The value for the consistency setting is stored in the Strict Replication Consistency entry, in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Parameters.




The values are as follows:

• Value: 1 (Set to 0 to disable)

• Default: 1 (enabled) in a new Windows Server 2003 forest, otherwise 0.

• Data type: REG_DWORD

Note: Having Strict Replication Consistency set to 0 or unset is equivalent to the Windows 2000 Server setting applied by the Correct Missing Objects registry entry. However, the semantics for Correct Missing Objects are the opposite of Strict Replication Consistency: Correct Missing Objects=1 is equivalent to Strict Replication Consistency=0 or unset.
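The relationship between the two registry entries can be sketched in a few lines of Python. This is a minimal illustration, not a tool that reads the registry: the function names are hypothetical, `None` stands for an entry that is not set, and the mapping of Correct Missing Objects=0 to strict behavior is inferred from the "opposite semantics" statement in the note above.

```python
from typing import Optional


def replication_consistency(strict_value: Optional[int]) -> str:
    """Interpret the Strict Replication Consistency value.

    None models an entry that is not set; per the text, unset and 0
    both mean loose consistency, and 1 means strict consistency.
    """
    return "strict" if strict_value == 1 else "loose"


def equivalent_strict_value(correct_missing_objects: int) -> int:
    """Translate a Correct Missing Objects value (Windows 2000 Server)
    into the equivalent Strict Replication Consistency value.

    The semantics are inverted: Correct Missing Objects=1 corresponds
    to Strict Replication Consistency=0. The reverse direction
    (0 -> 1) is an inference, not stated in the text.
    """
    return 0 if correct_missing_objects == 1 else 1
```

For example, `replication_consistency(equivalent_strict_value(1))` yields `"loose"`, which matches the note: a Windows 2000 Server domain controller with Correct Missing Objects=1 behaves like one with strict replication consistency disabled.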




Lingering Object Removal

• Detecting Lingering Objects: Event ID 1988

• Repadmin /removelingeringobjects: Use /advisory_mode first!


On domain controllers that are running Windows Server 2003, you can use Repadmin to analyze and remove lingering objects from a domain controller that you know or suspect has not replicated within the tombstone lifetime.

If strict replication consistency is in effect, and replication fails on the destination domain controller, event ID 1988 is logged in the Directory Service event log, on the destination domain controller. Event ID 1988 indicates that the local domain controller has attempted to replicate an object from the source domain controller, and the object is not present on the local domain controller because it might have been deleted, and its tombstone might already have been garbage-collected. Event ID 1988 provides the GUID-based distinguished name of the source domain controller, as well as the distinguished name and GUID of the outdated object. Replication of the directory partition containing the outdated object does not continue with the source domain controller, until the situation has been resolved.

In this event, you can use an up-to-date domain controller as the authority against which to compare the objects on the source replication partner that is suspected of harboring lingering objects. This domain controller acts as the authoritative directory replica, to reveal outdated objects in the suspect directory database on the destination.




In the Repadmin command-line arguments that remove lingering objects, the roles of the source and the destination are switched. The repadmin /removelingeringobjects command compares the directories of two domain controllers:

• A “source” domain controller that you designate as the authoritative reference server.

• A “destination” domain controller, which is the source replication partner that has attempted to replicate an outdated object.

• The advisory mode argument allows you to view the results of the command before you take action to remove any objects from the directory.

To use the repadmin /removelingeringobjects command, both source and destination domain controllers must be running Windows Server 2003.

The command REPADMIN /REMOVELINGERINGOBJECTS has the following syntax:

/removelingeringobjects <Dest_DC_LIST> <Source DC GUID> <NC> [/ADVISORY_MODE]

where

• <Dest_DC_List> is the DNS or NetBIOS name of one or more domain controllers that you suspect are harboring lingering objects. <DC_List> provides the ability to target specific domain controllers, such as all domain controllers in a site, all global catalog servers, or domain controllers that hold specific operations master roles. To see the syntax for DC_List, type REPADMIN /LISTHELP at the command prompt.

• <Source DC GUID> is the GUID you obtained by running repadmin /showrepl against the source domain controller that you are using as the authoritative server.

• <NC> is the distinguished name of the directory partition that contains the lingering object.

• [/ADVISORY_MODE] is an optional switch that specifies that no deletions are performed on the destination domain controller; lingering objects are only displayed (logged). Using this switch before allowing Repadmin to remove any objects is recommended.

To use this command, you must first obtain the GUID of the authoritative source domain controller by running repadmin /showrepl <source_server>, where <source_server> is the name of the domain controller that has a writable copy of the directory partition that will serve as the authoritative replica. The output of this command provides the DC GUID that the /removelingeringobjects command requires to identify the authoritative source.
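The argument order described above can be captured in a small helper that assembles the command line and defaults to advisory mode, so that a removal run has to be requested explicitly. This is a sketch only: the function name is hypothetical, and the server name, GUID, and partition in the usage example are made-up placeholders. The helper builds the argument list but does not run Repadmin.

```python
import uuid


def build_remove_lingering_objects(dest_dc_list, source_dc_guid,
                                   naming_context, advisory_mode=True):
    """Assemble the repadmin /removelingeringobjects argument list.

    dest_dc_list   -- domain controller(s) suspected of holding lingering objects
    source_dc_guid -- GUID of the authoritative reference server, as reported
                      by repadmin /showrepl
    naming_context -- distinguished name of the directory partition
    advisory_mode  -- when True (the default here), append /advisory_mode so
                      that objects are only reported, not deleted
    """
    # Reject a malformed GUID early; uuid.UUID raises ValueError.
    uuid.UUID(source_dc_guid)
    command = ["repadmin", "/removelingeringobjects",
               dest_dc_list, source_dc_guid, naming_context]
    if advisory_mode:
        command.append("/advisory_mode")
    return command


# Hypothetical values, for illustration only.
example = build_remove_lingering_objects(
    "dc2.contoso.com",
    "a1b2c3d4-1111-2222-3333-444455556666",
    "DC=contoso,DC=com",
)
```

Joining `example` with spaces gives a command of the form shown in the syntax line above, ending in `/advisory_mode`. Passing `advisory_mode=False` omits the switch, which corresponds to the run that actually removes objects.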




RemoveLingeringObjects Implementation

When you run repadmin /removelingeringobjects, the tool performs the following steps to compare the directories of the source and destination domain controllers and log (or remove) any found lingering objects:

• Checks to ensure that the directory partition and the source domain controller are valid.

• Verifies that the user has the DS-Replication-Manage-Topology extended right, on the directory partition container object specified in <NC>. This extended right is required to verify object state between two domain controllers. Members of the Domain Admins group have this right, by default.

• Ensures that both source and destination use the same objects for comparison, by merging the up-to-dateness vectors to filter out any objects that have not replicated from the source to the destination or from the destination to the source. This check rules out a lingering object on the destination, if the destination has not received the tombstone from the source, and vice versa. Any such nonreplicated objects are removed from the comparison.

• Creates the list of object GUIDs for each domain controller that will be compared. The tool examines the metadata of each object and uses the merged up-to-dateness vector to determine whether the object should be present on both the source and the destination.

• For each GUID that is in the list for the destination, determines if it is in the list of GUIDs for the source.

• If a GUID is not found on the source, the object is identified as outdated on the destination, and it is either displayed or deleted on the destination server. If advisory mode has been specified, the GUID is displayed only.
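The comparison steps above can be sketched as set operations. The model below is deliberately simplified and is not Repadmin's actual implementation: each object is reduced to its GUID plus the originating domain controller and USN of its creation, an up-to-dateness vector is a plain dictionary from originating domain controller to the highest USN seen, and "merging" is assumed to mean taking the lower of the two values so that only updates both sides have seen are compared. All names are hypothetical.

```python
def merged_vector(utd_a, utd_b):
    """Keep, per originating DC, the highest USN that BOTH sides have seen."""
    return {dsa: min(utd_a[dsa], utd_b[dsa]) for dsa in utd_a.keys() & utd_b.keys()}


def comparable_guids(objects, merged):
    """Return GUIDs whose originating update is covered by the merged vector.

    objects maps GUID -> (originating_dsa, originating_usn).
    Objects that one side has not yet received are filtered out.
    """
    return {guid for guid, (dsa, usn) in objects.items()
            if usn <= merged.get(dsa, -1)}


def find_lingering(source_objects, source_utd, dest_objects, dest_utd):
    """Return GUIDs present on the destination but absent from the
    authoritative source, restricted to objects both sides should hold."""
    merged = merged_vector(source_utd, dest_utd)
    source_guids = comparable_guids(source_objects, merged)
    dest_guids = comparable_guids(dest_objects, merged)
    return sorted(dest_guids - source_guids)
```

In advisory mode the returned GUIDs would only be displayed; otherwise each would be deleted on the destination. An object created recently on the destination that has not yet replicated to the source is excluded by the merged vector, so it is not mistaken for a lingering object.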




AD Replication on a Restored Domain Controller

• Non-authoritative Restore

• Authoritative Restore

• Authoritative Restore and Group Membership

• 2000 vs. 2003

• Restoring Back-links

• NTDSUTIL in Windows Server 2003


All domain controllers must be backed up, routinely, to ensure directory integrity. In case of failure on a domain controller, the backup media can be used to restore the domain controller to its state at the time of the backup. When a domain controller is restored from a backup, it can then be brought up to date by normal replication.

There are two general methods for restoring Active Directory from backup media, each of which has different replication consequences:

• Nonauthoritative restore: Replication brings the domain controller up to date, from its state at the time of backup, including updating deletions that have occurred since the time of the backup.

• Authoritative restore: Objects that were deleted can be reinstated.

Nonauthoritative Restore

Nonauthoritative restore is the default method of performing a restore of Active Directory. It is used in the majority of restore situations, such as domain controller hard disk failure. Nonauthoritative restore is performed by using the backup tool that you used to create the backup file. A nonauthoritative restore returns Active Directory on the domain controller to a state that is consistent with the state at the time that the backup was performed. When the domain controller restarts following the restore process, it requests changes from its replication partners. Through the normal replication process, the restored domain controller receives any directory changes that have occurred since the time of the backup.




Because the restore process does not restore any previously deleted data to Active Directory, it is described as nonauthoritative.

Authoritative Restore

The primary purpose of an authoritative restore is to reinstate objects that were deleted from Active Directory. To reinstate objects that were intentionally or accidentally deleted, a nonauthoritative restore must be completed and followed by an authoritative restore. Authoritative restore is performed by using Ntdsutil.exe.

The nonauthoritative restore process cannot reinstate deleted objects from a backup image, because the backup media that was used to restore the domain controller contains an image of Active Directory that was created before the objects were deleted. In this case, the deletions would simply be replicated from the up-to-date replication partner and applied by the restored domain controller. To reinstate a deleted object, an authoritative restore is required.

The authoritative restore process works as follows:

• The domain controller is restarted in Directory Services Restore Mode, and a nonauthoritative restore of Active Directory is performed, using backup media that was created before the object was deleted.

• Following the nonauthoritative restore, but prior to restarting, the object metadata is altered by Ntdsutil.exe, so that it has a higher version number than any other possible version of the object (by default, the version number is increased by 100,000). The effect is to render the object or objects as authoritative and reinstate them in Active Directory.

The authoritative restore process does not affect objects that were created after the backup was created.
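Why raising the version number makes the restored object win can be illustrated with a small model of replication conflict resolution. The sketch assumes the usual Active Directory ordering, in which the higher version number wins and the timestamp and originating domain controller are used only as tie-breakers. The class and function names are hypothetical, and a single stamp stands in for the per-attribute metadata that Ntdsutil.exe actually modifies.

```python
from dataclasses import dataclass

VERSION_INCREASE = 100_000  # default increase described in the text


@dataclass(frozen=True)
class Stamp:
    """Simplified replication metadata for one attribute."""
    version: int
    timestamp: int          # seconds, for illustration only
    originating_dsa: str    # identifier of the DC that made the change


def mark_authoritative(stamp, restoring_dsa, now):
    """Model what authoritative restore does to a restored attribute."""
    return Stamp(stamp.version + VERSION_INCREASE, now, restoring_dsa)


def winner(a, b):
    """Pick the stamp that replication would keep: version first,
    then timestamp, then originating DC as the final tie-breaker."""
    return max(a, b, key=lambda s: (s.version, s.timestamp, s.originating_dsa))
```

Suppose the backup holds an attribute at version 3 and the deletion on the replication partners raised it to version 4. After a plain nonauthoritative restore, version 4 wins and the deletion is applied again. After `mark_authoritative`, the restored copy carries version 100,003 and replicates outward as the winning value.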

Note: Only the domain and configuration directory partitions can be marked as authoritative. The schema directory partition cannot be authoritatively restored.

Restoring Back-links for Authoritatively Restored Objects

When authoritative restore is performed on domain controllers running Windows 2000 Server, the procedure recovers objects that have been deleted, but it does not restore the back-links for any objects that have linked attributes. The effect of not restoring back-links for the restored objects is particularly problematic for group memberships, which must be restored manually.

In a forest that has a forest functional level of Windows Server 2003 or Windows Server 2003 interim, the procedure for performing authoritative restore automatically restores back-links for multivalued, linked attributes. For example, the member attribute of each group to which a restored user object belongs is updated. This restoration applies to only those links that were created after the functional level was raised. For example, if you added a user to a group before raising the forest functional level, the user’s membership in that group will not be restored if you delete the user and then authoritatively restore the user. Automatic restore of back-links requires the raised forest functional level, because link restoration is made possible by linked-value replication.

Restoring Back-links Created Before Linked-Value Replication

An updated version of Ntdsutil that is included with Windows Server 2003 SP1 makes it possible to also restore back-links that were created before implementation of linked-value replication. On domain controllers that have this updated version of Ntdsutil, the authoritative restore option generates an LDAP Data Interchange Format (LDIF) file that can be used to restore any back-links that are not restored automatically. In addition, Ntdsutil generates a text file that you can use to create an LDIF file for restoring back-links for groups in other domains. The LDIF file can be used to restore back-links on domain controllers running Windows 2000 Server or Windows Server 2003, and it does not depend on forest functional level. This method also resolves the problem of links not being restored when linked user and group objects are authoritatively restored together and the restored group object replicates out before the restored user object. For more information about restoring back-links, see "Managing Active Directory Backup and Restore" in the Active Directory Operations Guide.




Section 2: Active Directory Replication Topology

Knowledge Consistency Checker:

• Goals of KCC

• Architecture and Physical Structures

• Topology Related Components

• Replication Between Sites


Active Directory implements a replication topology that takes advantage of the network speeds within sites, which are ideally configured to be equivalent to local area network (LAN) connectivity (network speed of 10 megabits per second [Mbps] or higher). The replication topology also minimizes the use of potentially slow or expensive wide area network (WAN) links between sites.

When you create a site object in Active Directory, you associate one or more Internet Protocol (IP) subnets with the site. Each domain controller in a forest is associated with an Active Directory site. A client workstation is associated with a site according to its IP address; that is, each IP address maps to one subnet, which, in turn, maps to one site.
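The address-to-subnet-to-site mapping can be sketched with Python's standard `ipaddress` module. The subnets and site names below are hypothetical, and the rule that the most specific (longest-prefix) subnet wins when subnets overlap is an assumption added for the sketch; the text above states only that each IP address maps to one subnet.

```python
import ipaddress


def site_for_address(address, subnet_to_site):
    """Return the site whose subnet contains the address, or None.

    subnet_to_site maps subnet strings in CIDR notation to site names.
    If several subnets contain the address, the one with the longest
    prefix is chosen (an assumption of this sketch).
    """
    ip = ipaddress.ip_address(address)
    best_site = None
    best_prefix = -1
    for subnet, site in subnet_to_site.items():
        network = ipaddress.ip_network(subnet)
        if ip in network and network.prefixlen > best_prefix:
            best_site = site
            best_prefix = network.prefixlen
    return best_site


# Hypothetical subnet objects and the sites they are associated with.
SUBNETS = {
    "10.1.0.0/16": "Seattle",
    "10.1.5.0/24": "Redmond",
    "10.2.0.0/16": "Boston",
}
```

With these values, a client at 10.1.5.20 maps to Redmond, a client at 10.1.9.9 maps to Seattle, and a client at 192.168.0.1 has no matching subnet and therefore no site.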

Active Directory uses sites to:

• Optimize replication for speed and bandwidth consumption between domain controllers.

• Locate the closest domain controller for client logon, services, and directory searches.

• Direct a Distributed File System (DFS) client to the server that is hosting the requested data within the site.

• Replicate the system volume (SYSVOL), a collection of folders in the file system that exists on each domain controller in a domain and is required for implementation of Group Policy.




The ideal environment for replication topology generation is a forest that has a forest functional level of Windows Server 2003. In this case, replication topology generation is faster and can accommodate more sites and domains than when the forest has a forest functional level of Windows 2000. When at least one domain controller in each site is running Windows Server 2003, more domain controllers in each site can be used to replicate changes between sites than when all domain controllers are running Windows 2000 Server.

Replication topology generation requires the following conditions:

• A DNS infrastructure that manages the name resolution for domain controllers in the forest. Active Directory–integrated DNS is assumed, wherein DNS zone data is stored in Active Directory and is replicated to all domain controllers that are DNS servers.

• All physical locations that are represented as site objects in Active Directory have LAN connectivity.

• IP connectivity is available between each site and all sites that host operations master roles in the same forest.

• Domain controllers meet the hardware requirements for Microsoft Windows Server 2003, Standard Edition; Windows Server 2003, Enterprise Edition; and Windows Server 2003, Datacenter Edition.

• The appropriate number of domain controllers is deployed for each domain that is represented in each site.

This section covers the replication components that create the replication topology and describes how they work together; it also examines the mechanisms and rationale for routing replication traffic between domain controllers both in the same site and in different sites.




Goals of Replication Topology

• Connect every directory partition replica that must be replicated

• Control replication latency and cost

• Route replication between sites


By default, the replication topology is managed automatically and optimizes existing connections. However, manual connections, created by an administrator, are not modified or optimized.

Connect Directory Partition Replicas

The total replication topology is actually composed of several underlying topologies, one for each directory partition. In the case of the schema and configuration directory partitions, a single topology is created. The underlying topologies are merged, to form the minimum number of connections that are required to replicate each directory partition between all domain controllers that store replicas. Where the connections for directory partitions are identical between domain controllers (for example, when two domain controllers store the same domain directory partition), a single connection can be used for replication of updates to the domain, schema, and configuration directory partitions.

A separate replication topology is also created for application directory partitions. However, in the same manner as schema and configuration directory partitions, application directory partitions can use the same topology as domain directory partitions. When application and domain directory partitions are common to the source and destination domain controllers, the KCC does not create a separate connection for the application directory partition.

A separate topology is not created for the partial replicas that are stored on global catalog servers. The connections that are needed by a global catalog server to replicate each partial replica of a domain are part of the topology that is created for each domain.




The routes for the following directory partitions or combinations of directory partitions are aggregated to arrive at the overall topology:

• Configuration and schema within a site

• Each writable domain directory partition within a site

• Each application directory partition within a site

• Global catalog read-only, partial domain directory partitions within a site

• Configuration and schema between sites

• Each writable domain directory partition between sites

• Each application directory partition between sites

• Global catalog read-only, partial domain directory partitions between sites

Replication transport protocols determine the manner in which replication data is transferred over the network media. Your network environment and server configuration dictate the transports that you can use.

Control Replication Latency and Cost

Replication latency is inherent in a multimaster directory service. A period of replication latency begins when a directory update occurs on an originating domain controller and ends when replication of the change is received on the last domain controller in the forest that requires the change. Generally, the latency that is inherent in a WAN link is relative to a combination of the speed of the connection and the available bandwidth. Replication cost is an administrative value that can be used to indicate the latency that is associated with different replication routes between sites. A lower-cost route is preferred by the Intersite Topology Generator (ISTG) when it is generating the replication topology.

Site topology is the topology as represented by the physical network: the LANs and WANs that connect domain controllers in a forest. The replication topology is built to use the site topology. The site topology is represented in Active Directory by site objects and site link objects. These objects influence Active Directory replication to achieve the best balance between replication speed and the cost of bandwidth utilization, by distinguishing between replication that occurs within a site and replication that must span sites. When the KCC creates replication connections between domain controllers to generate the replication topology, it creates more connections between domain controllers in the same site than between domain controllers in different sites. The results are lower replication latency within a site and less replication bandwidth utilization between sites.

Within sites, replication is optimized for speed as follows:

• Connections between domain controllers in the same site are always arranged in a ring, with possible additional connections to reduce latency.




• Replication within a site is triggered by a change notification mechanism when an update occurs, moderated by a short, configurable delay (because groups of updates frequently occur together).

• Data is sent uncompressed, and, thus, without the processing overhead of data compression.

Between sites, replication is optimized for minimal bandwidth usage (cost) as follows:

• Replication data is compressed to minimize bandwidth consumption over WAN links.

• Store-and-forward replication makes efficient use of WAN links, because each update crosses an expensive link only once.

• Replication occurs at intervals that you can schedule, so that use of expensive WAN links is managed.

• The intersite topology is a layering of spanning trees (one intersite connection between any two sites for each directory partition) and generally does not contain redundant connections.

Route Replication between Sites

The KCC uses the information in Active Directory to identify the least-cost routes for replication between sites. If a domain controller is unavailable at the time the replication topology is created, making replication through that site impossible, the next least-cost route is used. This rerouting is automatic when site links are bridged (transitive), which is the default setting.

Replication is automatically routed around network failures and offline domain controllers.




KCC Architecture and Processes

KCC Functions Locally on Each DC

• Runs every 15 minutes

• Computes topology and creates/deletes connection objects

• Connection object defines incoming replication from partner

• Creates Inbound Connections Only

Topology concepts and components

• Configuration NC and schema NC share same topology

• Each domain and application directory NC has its own topology

• Global catalog utilizes domain NCs

ISTG responsible for inter-site connections


The replication topology is generated by the KCC, a replication component that runs as an application on every domain controller and communicates through the distributed Active Directory database. The KCC functions locally by reading, creating, and deleting Active Directory data. Specifically, the KCC reads configuration data and reads and writes connection objects. The KCC also writes local, nonreplicated attribute values that indicate the replication partners from which to request replication.

For most of its operation, the KCC that runs on one domain controller does not communicate directly with the KCC on any other domain controller. Rather, all KCCs use the knowledge of the common, global data that is stored in the configuration directory partition, as input to the topology generation algorithm, to converge on the same view of the replication topology.

Each KCC uses its in-memory view of the topology to create inbound connections locally, manifesting only those results that apply to it. The KCC only communicates with other KCCs to make remote procedure call (RPC) requests for replication error information. The KCC uses the error information to identify gaps in the replication topology. Requests for replication error information occur only between domain controllers in the same site.

Note: The KCC uses only RPC to communicate with the directory service. The KCC does not use Lightweight Directory Access Protocol (LDAP).




Intersite Topology Generator

One domain controller in each site is selected as the Intersite Topology Generator. To enable replication across site links, the ISTG automatically designates one or more servers to perform site-to-site replication. These servers are called bridgehead servers. A bridgehead is a point at which a connection leaves or enters a site.

The ISTG creates a view of the replication topology for all sites, including existing connection objects between all domain controllers that are acting as bridgehead servers. The ISTG then creates inbound connection objects for those servers in its site that it determines will act as bridgehead servers and for which connection objects do not already exist. Thus, the scope of operation for the KCC is the local server only, and the scope of operation for the ISTG is a single site.

Figure 5 shows the KCC architecture on servers in the same forest in two different sites.

Figure 5. KCC Architecture




The architecture and process components in the preceding diagram are described in Table 2.


Component: Description

Knowledge Consistency Checker (KCC): The application running on each domain controller that communicates directly with the Ntdsa.dll to read and write replication objects.

Directory System Agent (DSA): The directory service component that runs as Ntdsa.dll on each domain controller, providing the interfaces through which services and processes, such as the KCC, gain access to the directory database.

Extensible Storage Engine (ESE): The directory service component that runs as Esent.dll. ESE manages the tables of records, each with one or more columns. The tables of records comprise the directory database.

Remote procedure call (RPC): The Directory Replication Service (Drsuapi) RPC protocol, used to communicate replication status and topology to a domain controller. The KCC also uses this protocol to communicate with other KCCs to request error information when building the replication topology.

Intersite Topology Generator (ISTG): The single KCC in a site that manages intersite connection objects for the site.

Table 2: KCC Architecture and Process Components

The four servers in the preceding diagram create identical views of the servers in their sites and generate connection objects on the basis of the current state of Active Directory data in the configuration directory partition. In addition to creating its view of the servers in its site, the KCC that operates as the ISTG in each site also creates a view of all servers in all sites in the forest. From this view, the ISTG determines the connections to create on the bridgehead servers in its own site.

Note: A connection requires two endpoints: one for the destination domain controller and one for the source domain controller. Domain controllers creating an intrasite topology always use themselves as the destination end point and must consider only the endpoint for the source domain controller. The ISTG, however, must identify both endpoints in order to create connection objects between two other servers.

Thus, the KCC creates two types of topology: intrasite and intersite. Within a site, the KCC creates a ring topology by using all servers in the site. To create the intersite topology, the ISTG in each site uses a view of all bridgehead servers in all sites in the forest. Figure 6 shows a high-level generalization of the view of both an intrasite ring topology that the KCC sees and the view of the intersite topology that the ISTG sees. Lines between domain controllers within a site represent inbound and outbound connections between the servers. The lines between sites represent configured site links. Bridgehead servers are represented as BH.
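The intrasite ring can be sketched as follows. Because every KCC reads the same configuration data, each domain controller can compute the same ordering independently and then create only its own inbound connections. The sketch assumes that servers are ordered by GUID and that each server takes both ring neighbors as inbound partners; it ignores the additional connections that reduce latency. The function name and the short server identifiers are hypothetical.

```python
def ring_inbound_partners(server_guids):
    """Return, for each server, the ring neighbors it replicates from.

    Servers are sorted so that every domain controller derives the same
    ring from the same input without talking to the others.
    """
    ordered = sorted(server_guids)
    count = len(ordered)
    partners = {}
    for index, server in enumerate(ordered):
        neighbors = {ordered[(index - 1) % count], ordered[(index + 1) % count]}
        neighbors.discard(server)  # a lone server has no partner
        partners[server] = sorted(neighbors)
    return partners
```

For four servers, each one ends up with two inbound partners, and the result does not depend on the order in which the servers are listed. A domain controller running this logic would act only on its own entry, which matches the statement that the KCC's scope of operation is the local server.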




Figure 6: KCC and ISTG Views of Intrasite and Intersite Topology




Replication Topology Physical Structure Example

The Active Directory replication topology can use many different components. Some components are required; others are not required but are available for optimization. The diagram above illustrates most replication topology components and their places in a sample Active Directory multisite and multidomain forest. The depiction of the intersite topology that uses multiple bridgehead servers for each domain assumes that at least one domain controller in each site is running Windows Server 2003. All components of this diagram and their interactions are explained in detail later in this section.

In the diagram above, all servers are domain controllers. They independently use global knowledge of configuration data to generate one-way, inbound connection objects. The KCCs in a site collectively create an intrasite topology for all domain controllers within that site. The ISTGs from all sites collectively create an intersite topology. Within sites, one-way arrows indicate the inbound connections by which each domain controller replicates changes from its partner in the ring. For intersite replication, one-way arrows represent inbound connections that the ISTG of each site creates from bridgehead servers (BH) for the same domain in other sites that share a site link (or from a global catalog server [GC] acting as a bridgehead, if the domain is not present in the site). Domains are indicated as D1, D2, D3, and D4.

Each site in the diagram represents a physical LAN in the network, and each LAN is represented as a site object in Active Directory. Heavy solid lines between sites indicate WAN links over which two-way replication can occur and each WAN link is represented in Active Directory as a site link object. Site link objects allow connections to be created between bridgehead servers within each site that is connected by the site link.




Not shown in the diagram is that where TCP/IP WAN links are available, replication between sites uses the RPC replication transport. RPC is always used within sites. The site link between Site A and Site D uses the SMTP protocol for the replication transport to replicate the configuration and schema directory partitions and global catalog partial, read-only directory partitions. Although SMTP transport cannot be used to replicate writable domain directory partitions, this transport is required because a TCP/IP connection is not available between Site A and Site D. This configuration is acceptable for replication because Site D does not host domain controllers for any domains that must be replicated over the site link A-D.

By default, site links A-B and A-C are transitive (bridged), which means that replication of domain D2 is possible between Site B and Site C, although no site link connects the two sites. The cost values on site links A-B and A-C are site link settings that determine the routing preference for replication, which is based on the aggregated cost of available site links. The cost of a direct connection between Site C and Site B is the sum of costs on site links A-B and A-C. So replication between Site B and Site C is automatically routed through Site A, to avoid the more expensive, transitive route. Connections are created between Site B and Site C only if replication through Site A becomes impossible due to network or bridgehead server conditions.
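The aggregated-cost calculation can be sketched as a shortest-path search over site links. This is an illustration of least-cost routing in general (Dijkstra's algorithm), not the KCC's actual spanning-tree computation. The cost values are hypothetical, and all site links are assumed to be transitive.

```python
import heapq


def least_cost_route(site_links, start, goal):
    """Return (total_cost, path) for the cheapest route between two sites.

    site_links maps a pair of site names to the cost of the site link
    between them. Links are treated as usable in both directions.
    Returns None if the sites are not connected.
    """
    graph = {}
    for (site_a, site_b), cost in site_links.items():
        graph.setdefault(site_a, []).append((site_b, cost))
        graph.setdefault(site_b, []).append((site_a, cost))

    queue = [(0, start, [start])]
    visited = set()
    while queue:
        cost, site, path = heapq.heappop(queue)
        if site == goal:
            return cost, path
        if site in visited:
            continue
        visited.add(site)
        for neighbor, link_cost in graph.get(site, []):
            if neighbor not in visited:
                heapq.heappush(queue, (cost + link_cost, neighbor, path + [neighbor]))
    return None


# Hypothetical costs for the two site links discussed above.
SITE_LINKS = {("A", "B"): 100, ("A", "C"): 200}
```

With these values, the route from Site B to Site C passes through Site A at a cost of 300, the sum of the costs on site links A-B and A-C, which is the figure the text assigns to a direct B-C connection.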




Topology-Related Components

• Connections

• Servers

• NTDS settings

• Sites

• Subnets

• Site links

• Bridgeheads

• Site link bridges

• Cross-reference


Active Directory uses information stored in the forest-wide configuration directory partition to establish and implement the replication topology. Several configuration objects define the components that are required by replication. The KCC uses these and other objects and their properties, to create and manage the connections by which the directory transfers updates, and to specify one or more domain controllers from which a particular server requests changes.

Connections

A connection object (class nTDSConnection) defines a one-way, inbound route from one domain controller (the source) to another domain controller (the destination). The KCC uses information in cross-reference objects to create the appropriate connection objects, which enable domain controllers that store the same directory partitions to replicate with each other. The KCC creates connections for every server object in the Sites container that has an NTDS Settings object.

The connection object is a child of the replication destination’s NTDS Settings object, and its fromServer attribute references the replication source domain controller. In other words, the connection object represents the inbound half of a connection. The connection object contains a replication schedule and specifies a replication transport. For intersite connections, the connection object schedule is derived from the site link schedule. A connection is unidirectional; a bidirectional replication connection is represented as two connection objects under two different NTDS Settings objects.
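The parent-child relationship can be modeled with two small classes. This is a sketch of the data shape only: the class names are hypothetical stand-ins for the nTDSDSA and nTDSConnection classes, and only the fromServer reference and the transport are modeled.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Connection:
    """One-way, inbound connection; stored under the DESTINATION server."""
    from_server: str        # models the fromServer attribute (the source)
    transport: str = "RPC"


@dataclass
class NtdsSettings:
    """Container for a server's inbound connection objects."""
    server: str
    connections: list = field(default_factory=list)

    def inbound_sources(self):
        """Names of the servers this domain controller pulls changes from."""
        return [connection.from_server for connection in self.connections]


def connect_both_ways(first, second):
    """Two-way replication needs two connection objects, one under
    each server's NTDS Settings object."""
    first.connections.append(Connection(from_server=second.server))
    second.connections.append(Connection(from_server=first.server))
```

After `connect_both_ways(dc1, dc2)`, each NTDS Settings object holds exactly one connection whose `from_server` names the other domain controller, which is why a single connection object never describes outbound replication.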




Connection objects are created in two ways:

• Automatically, by the KCC

• Manually, by a directory administrator using Active Directory Sites and Services, ADSI Edit, or scripts

Servers When a domain controller is promoted into Active Directory, the installation process creates a server object in the Servers container within the site to which the IP address of the domain controller maps. There is one server object for each domain controller in the site. A server object is distinct from the computer object that represents the computer as a security principal. These objects are in separate directory partitions and have separate GUIDs. The computer object represents the domain controller in the domain directory partition; the server object represents the domain controller in the configuration directory partition. The server object contains a reference to the associated computer object.

NTDS Settings Object The NTDS Settings object (class nTDSDSA) represents an instance of Active Directory on that server and distinguishes a domain controller from a member server in the domain. When Active Directory is removed from a server, its NTDS Settings object is deleted from Active Directory, but its server object remains, because the server object might contain objects other than NTDS Settings objects. For a specific server object, the NTDS Settings object contains the individual connection objects that represent the inbound connections from other domain controllers, in the forest, that are currently available to send changes to this domain controller.

Sites A site should represent a region of uniformly good network access, which can be interpreted as being generally equivalent to local area network (LAN) connectivity. LAN connectivity assumes high, inexpensive bandwidth that allows similar and reliable network performance, regardless of which two computers in the site are communicating. This quality of connectivity does not indicate that all servers in the site must be on the same network segment or that hop counts between all servers must be identical. Rather, it can be interpreted as the measure by which you know that if a large amount of data needed to be copied from one server to another, it would not matter to you which servers were involved. If you find that you are concerned about such situations, you might consider creating another site.

Subnets Computers on TCP/IP networks are assigned to sites based on their location in a subnet or a set of subnets. Subnets group computers in a way that identifies their physical proximity on the network. Subnet information is used during the process of domain controller location to find a domain controller in the same site as the computer that is logging on. This information also is used during Active Directory replication, to determine the best routes between domain controllers.
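The mapping from an IP address to a site can be illustrated with a minimal sketch. The subnet prefixes and site names below are hypothetical, and the rule that the most specific (longest-prefix) subnet wins is stated here as an assumption rather than taken from this workbook.

```python
import ipaddress

# Hypothetical subnet-to-site table; prefixes and site names are illustrative only.
SUBNET_TO_SITE = {
    "10.1.0.0/16": "Seattle",
    "10.1.5.0/24": "Portland",
    "10.2.0.0/16": "Boston",
}

def site_for_address(address, subnet_to_site=SUBNET_TO_SITE):
    """Return the site of the most specific subnet that contains address, or None."""
    ip = ipaddress.ip_address(address)
    best = None  # (network, site) of the longest matching prefix seen so far
    for prefix, site in subnet_to_site.items():
        network = ipaddress.ip_network(prefix)
        if ip in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, site)
    return best[1] if best else None

print(site_for_address("10.1.5.20"))    # Portland (the /24 is more specific than the /16)
print(site_for_address("192.168.0.1"))  # None (no subnet object covers this address)
```

An address that matches no subnet returns None in this sketch, which corresponds to a computer whose site cannot be determined from subnet objects.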




Site Links For replication to occur between two sites, a link must be established between the sites. Only the first site link is generated automatically; others can be created in Active Directory Sites and Services. Unless a site link is in place, the KCC cannot create connections automatically between computers in the two sites, and replication between the sites cannot take place. The Active Directory Sites and Services user interface guarantees that every site is placed in at least one site link. A site link can contain more than two sites, in which case, all the sites are treated as equally well connected.

Bridgehead Servers A bridgehead is a point at which a connection leaves or enters a site. To communicate across site links, the KCC automatically designates one or more servers, called bridgehead servers, in each site to perform site-to-site replication. Bridgeheads are discussed in greater detail in the next section.

Site Link Bridges If your IP network is composed of IP segments that are not fully routed, you can disable Bridge all site links (discussed in more detail in a coming section) for the IP transport. In this case, all IP site links are considered to be nontransitive, and you can create and configure site link bridge objects to model the actual routing behavior of your network. A site link bridge has the effect of providing routing for a disjoint network (networks that are separate and unaware of each other). When you add site links to a site link bridge, all site links within the bridge can route transitively.

Site link bridge objects are used by the KCC only when the Bridge all site links setting is disabled. Otherwise, site link bridge objects are ignored. Bridging site links manually is generally only recommended for large branch office deployments. For more information about using manual site link bridging, see the Windows Server 2003 Active Directory Branch Office Deployment Guide.

Cross-Reference Objects Cross-reference objects (class crossRef) store the location of directory partitions in CN=partitions,CN=configuration,DC=ForestRootDomain. The contents of the Partitions container are not visible in Active Directory Sites and Services, but can be viewed by using Adsiedit.msc, and viewing the Configuration directory partition.

Active Directory replication uses cross-reference objects to locate the domain controllers that store each directory partition. A cross-reference object is created during Active Directory installation to identify each new directory partition that is added to the forest. Cross-reference objects store the identity (nCName) and location (dNSRoot) of each directory partition.




Replication between Sites


• Bridgehead Servers
o Replicate between sites
o At least one bridgehead per partition per site

• Bridgehead Server Selection
o Automatically selected by default
o Manually created connection objects

• Preferred Bridgehead Servers
o Increases administrative overhead
o Misconfiguration decreases fault tolerance

Intersite replication of configuration and schema changes is always required when more than one site is configured in a forest, and replication between sites of domain-specific updates is required when domain controllers for a domain are located in more than one site. Replication between sites is accomplished by bridgehead servers, which replicate changes according to site link settings.

Bridgehead Servers When domain controllers for the same domain are located in different sites, at least one bridgehead server per directory partition and one bridgehead server per (IP or SMTP) transport replicate changes from one site to a bridgehead server in another site. A single bridgehead server can serve multiple partitions for each transport and for multiple transports. Replication within the site allows updates to flow between the bridgehead servers and the other domain controllers in the site. Bridgehead servers help to ensure that the data replicated across WAN links is not stale or redundant.

Any server that has a connection object with a source server in another site is acting as a destination bridgehead. Any server that is acting as a source for a connection to another site acts as a source bridgehead.
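The two definitions above can be expressed as a small sketch. The Connection tuple is a simplified stand-in for an nTDSConnection object, and the server and site names are hypothetical.

```python
from collections import namedtuple

# Simplified stand-in for a connection object: destination is the server whose
# NTDS Settings object holds the connection, from_server is the replication source.
Connection = namedtuple("Connection", "from_server destination")

def find_bridgeheads(connections, site_of):
    """Return (source_bridgeheads, destination_bridgeheads) as sets of server names."""
    sources, destinations = set(), set()
    for conn in connections:
        # A connection whose two ends are in different sites is an intersite connection.
        if site_of[conn.from_server] != site_of[conn.destination]:
            sources.add(conn.from_server)
            destinations.add(conn.destination)
    return sources, destinations

# Hypothetical topology: DC1 and DC2 in Seattle, DC3 in Boston.
site_of = {"DC1": "Seattle", "DC2": "Seattle", "DC3": "Boston"}
connections = [
    Connection("DC1", "DC2"),  # intrasite
    Connection("DC2", "DC1"),  # intrasite
    Connection("DC3", "DC1"),  # intersite, Boston to Seattle
    Connection("DC1", "DC3"),  # intersite, Seattle to Boston
]
sources, destinations = find_bridgeheads(connections, site_of)
print(sorted(sources), sorted(destinations))  # ['DC1', 'DC3'] ['DC1', 'DC3']
```

DC2 appears in neither set because all of its connections stay within the Seattle site.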




Note: You can identify a KCC-selected bridgehead server in Active Directory Sites and Services by viewing connection objects for the server (select the NTDS Settings object below the server object). If there are connections from servers in a different site or sites, the server represented by the selected NTDS Settings object is a bridgehead server. If you have Windows Support Tools installed, you can see all bridgehead servers by using the command repadmin /bridgeheads.

KCC selection of bridgehead servers guarantees the identification of bridgehead servers that are capable of replicating all directory partitions that are needed in the site, including partial global catalog partitions. By default, bridgehead servers are selected automatically by the KCC, on the domain controller that holds the ISTG role in each site. If you want to identify the domain controllers that can act as bridgehead servers, you can designate preferred bridgehead servers, from which the ISTG selects all bridgehead servers. Alternatively, if the ISTG is not used to generate the intersite topology, you can create manual intersite connection objects on domain controllers to designate bridgehead servers.

In sites that have at least one domain controller that is running Windows Server 2003, the ISTG can select bridgehead servers from all eligible domain controllers, for each directory partition that is represented in the site. For example, if three domain controllers in a site store replicas of the same domain, and domain controllers for this domain are also located in three or more other sites, the ISTG can spread the inbound connection objects from those sites among all three domain controllers, including those that are running Windows 2000 Server.

In Windows 2000 forests, a single bridgehead server per directory partition and per transport is designated as the bridgehead server that is responsible for intersite replication of that directory partition. Therefore, for the preceding example, only one of the three domain controllers would be designated by the ISTG as a bridgehead server for the domain, and the connection objects from all of the other sites would be created on that single bridgehead server. In large hub sites, a single domain controller might not be able to adequately respond to the volume of replication requests from, perhaps, thousands of branch sites.

Bridgehead Server Selection Bridgehead servers can be selected in the following ways:

• Automatically, by the ISTG from all domain controllers in the site

• Automatically, by the ISTG from all domain controllers that are identified as preferred bridgehead servers, if any preferred bridgehead servers are assigned. Preferred bridgehead servers must be assigned manually

• Manually, by creating a connection object on a domain controller in one site from a domain controller in a different site




By default, when at least one domain controller in a site is running Windows Server 2003, regardless of forest functional level, any domain controller that hosts a domain in the site is a candidate to be an ISTG-selected bridgehead server. If preferred bridgehead servers are selected, candidates are limited to this list. The connections from remote servers are distributed among the available candidate bridgehead servers in each site. The selection of multiple bridgehead servers per domain and per transport is new in Windows Server 2003. The ISTG uses an algorithm to evaluate the list of domain controllers in the site that can replicate each directory partition. This algorithm is improved in Windows Server 2003, to randomly select multiple bridgehead servers per directory partition and per transport. In sites containing only domain controllers that are running Windows 2000 Server, the ISTG selects only one bridgehead server per directory partition and one bridgehead server per transport.

When bridgehead servers are selected by the ISTG, the ISTG ensures that each directory partition in the site that has a replica in any other site can be replicated to and from that site or sites. Therefore, if a single domain controller hosts the only replica of a domain in a specific site and the domain has domain controllers in another site or other sites, that domain controller must be a bridgehead server in its site. In addition, that domain controller must be able to connect to a bridgehead server in the other site that also hosts the same domain directory partition.

Note: If a site has a global catalog server but does not contain at least one domain controller of every domain in the forest, then at least one bridgehead server must be a global catalog server.

Preferred Bridgehead Servers Because bridgehead servers must be able to accommodate more replication traffic than non-bridgehead servers, you might want to control the particular servers that have this responsibility. To specify servers that the ISTG can designate as bridgeheads, you can select domain controllers in the site that you want the ISTG to always consider as preferred bridgehead servers for the specified transport. These servers are used exclusively to replicate changes collected from the site to any other site over that transport. Designating preferred bridgehead servers also serves to exclude those domain controllers that, for reasons of capability, you do not want to be used as bridgehead servers.

Depending on the available transports, the directory partitions that must be replicated, and the availability of global catalog servers, multiple bridgehead servers might be required to replicate full and partial copies of directory data from one site to another.

The ISTG recognizes preferred bridgehead servers by reading the bridgeheadTransportList attribute of the server object. When this attribute has a value that is the distinguished name of the transport container that the server uses for intersite replication (IP or SMTP), the KCC treats the server as a preferred bridgehead server. For example, the value for the IP transport is CN=IP,CN=Inter-Site Transports,CN=Sites,CN=Configuration,DC=ForestRootDomainName. You can use Active Directory Sites and Services to designate a preferred bridgehead server by opening the server object properties and placing either the IP or SMTP transport into the preferred list, which adds the respective transport distinguished name to the bridgeheadTransportList attribute of the server.
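As an illustration of the distinguished name format described above, the following sketch builds the transport container DN from a forest root DNS name. The function name and the example domain contoso.com are hypothetical; only the DN layout comes from the text.

```python
def transport_dn(forest_root_dns_name, transport="IP"):
    """Build the DN of an intersite transport container (IP or SMTP)."""
    if transport not in ("IP", "SMTP"):
        raise ValueError("transport must be 'IP' or 'SMTP'")
    # Each label of the DNS name becomes one DC= component of the DN.
    domain_dn = ",".join("DC=" + label for label in forest_root_dns_name.split("."))
    return "CN={},CN=Inter-Site Transports,CN=Sites,CN=Configuration,{}".format(
        transport, domain_dn
    )

print(transport_dn("contoso.com"))
# CN=IP,CN=Inter-Site Transports,CN=Sites,CN=Configuration,DC=contoso,DC=com
```

A server whose bridgeheadTransportList contains such a value is treated as a preferred bridgehead for that transport.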

The bridgeheadServerListBl attribute of the transport container object is a back-link attribute of the bridgeheadTransportList attribute of the server object. If the bridgeheadServerListBl attribute contains the distinguished name of at least one server in a site, then the KCC uses only preferred bridgehead servers to replicate changes for that site, according to the following rules:

• If at least one server is designated as a preferred bridgehead server, updates to the domain directory partition hosted by that server can be replicated only from a preferred bridgehead server. If, at the time of replication, no preferred bridgehead server is available for that directory partition, replication of that directory partition does not occur.

• If any bridgehead servers are designated, but no domain controller is designated as a preferred bridgehead server for a specific directory partition that has replicas in another site or sites, the KCC selects a domain controller to act as the bridgehead server, if one is available that can replicate the directory partition to the other site or sites.

Therefore, to use preferred bridgehead servers effectively, be sure to:

• Assign two or more bridgehead servers for each of the following:

o Any domain directory partition that has a replica in any other site

o Any application directory partition that has a replica in another site

o The schema and configuration directory partitions (one bridgehead server replicates both) if no domains in the site have replicas in other sites

• If the site has a global catalog server, select the global catalog server as one of the preferred bridgehead servers.
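The two selection rules for preferred bridgehead servers, and the reason for assigning more than one, can be sketched as follows. This is a simplified model, not the KCC implementation: the dictionary layout, server names, and partition names are all hypothetical.

```python
def bridgehead_candidates(servers, partition):
    """Return names of servers that may act as bridgehead for partition.

    servers: list of dicts with keys name, partitions (set of partition names),
    preferred (bool), and available (bool).
    """
    hosts = [s for s in servers if partition in s["partitions"]]
    preferred_hosts = [s for s in hosts if s["preferred"]]
    if preferred_hosts:
        # Rule 1: a preferred bridgehead hosts this partition, so only preferred
        # servers are eligible, even if none of them is currently available.
        pool = preferred_hosts
    else:
        # Rule 2: no preferred bridgehead hosts this partition, so any available
        # domain controller that hosts it can be selected.
        pool = hosts
    return [s["name"] for s in pool if s["available"]]

servers = [
    {"name": "DC1", "partitions": {"D1"}, "preferred": True, "available": False},
    {"name": "DC2", "partitions": {"D1", "D2"}, "preferred": False, "available": True},
]
print(bridgehead_candidates(servers, "D1"))  # [] (the only preferred bridgehead for D1 is down)
print(bridgehead_candidates(servers, "D2"))  # ['DC2']
```

In this example, DC2 could replicate D1 but is not eligible, so replication of D1 stops while DC1 is unavailable. A second preferred bridgehead for D1 would avoid that outcome.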




Site Link Settings and Their Effects on Intersite Replication

• Site Link Settings
o Sites to be connected
o Schedules
o Replication interval
o Site link cost

A site can be connected to other sites by any number of site links. For example, a hub site has site links to each of its branch sites. Each site that contains a domain controller in a multisite directory must be connected to at least one other site by at least one site link; otherwise, it cannot replicate with domain controllers in any other site.

With these objects in place replication can occur according to the settings on the site link. In Active Directory Sites and Services, the site link Properties->General tab contains the following options for configuring site links to control the replication topology:

• A list of two or more sites to be connected

• Schedule

• Replication Interval

• Cost

Site Link Schedule Replication using the RPC transport between sites is scheduled. The schedule specifies one or many time periods during which replication can occur. For example, you might schedule a site link for a dial-up line to be available during off-peak hours (when telephone rates are low) and unavailable during high-cost regular business hours. The schedule attribute of the site link object specifies the availability of the site link. The default setting is that replication is always available.




Note: The Ignore schedules setting on the IP container is equivalent to replication being always available. If Ignore schedules is selected, replication occurs at the designated intervals but ignores any schedule.

If replication goes through multiple site links, there must be at least one common time period (overlap) during which replication is available; otherwise, the connection is treated as not available. For example, if site link AB has a schedule of 18:00 hours to 24:00 hours and site link BC has a schedule of 17:00 hours to 20:00 hours, the resulting overlap is 18:00 hours through 20:00 hours, which is the intersection of the schedules for site link AB and site link BC. During the time in which the schedules overlap, replication can occur from site A to site C even if a domain controller in the intermediate site B is not available. If the schedules do not overlap, replication from the intermediate site to the distant site continues when the next replication schedule opens on the respective site link.
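The overlap calculation in the AB and BC example can be written as a short sketch. It assumes each site link schedule is a single contiguous window within one day, expressed as (start hour, end hour); real schedules can contain several windows per day.

```python
def schedule_overlap(*windows):
    """Intersect (start_hour, end_hour) windows; return None if there is no overlap."""
    start = max(w[0] for w in windows)  # latest opening time
    end = min(w[1] for w in windows)    # earliest closing time
    return (start, end) if start < end else None

site_link_ab = (18, 24)  # 18:00 to 24:00
site_link_bc = (17, 20)  # 17:00 to 20:00
print(schedule_overlap(site_link_ab, site_link_bc))  # (18, 20)
```

A result of None corresponds to the case in which the transitive connection is treated as not available.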

Note: Cost considerations also affect whether connections are created. However, if the site link schedules do not overlap, the cost is irrelevant.

Scheduling across time zones When scheduling replication across time zones, consider the time difference, to ensure that replication does not interfere with peak production times in the destination site.

Domain controllers store time in Coordinated Universal Time (UTC). When viewed through the Active Directory Sites and Services snap-in, time settings in site link object schedules are displayed according to the local time of the computer on which the snap-in is being run, but replication occurs according to UTC.

For example, suppose Seattle adheres to Pacific Standard Time (PST) and Japan adheres to Japan Standard Time (JST), which is 17 hours later. If a schedule is set on a domain controller in Seattle, and the site link on which the schedule is set connects Seattle and Tokyo, the actual time of replication in Tokyo is 17 hours later.

If the schedule is set to begin replication at 10:00 PM PST in Seattle, the conversion can be computed as follows:

• Convert 10:00 PM PST to 22:00 PST military time.

• Add 8 hours to arrive at 06:00 UTC, the following day.

• Add 9 hours to arrive at 15:00 JST.

• 15:00 JST converts to 3:00 PM.

Thus, when replication begins at 10:00 o’clock at night in Seattle, it is occurring in Tokyo at 3:00 o’clock in the afternoon the following day. By scheduling replication a few hours later in Seattle, you can avoid replication occurring during working hours in Japan.
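The same conversion can be checked with a few lines of Python. This sketch uses fixed UTC offsets for PST (UTC-8) and JST (UTC+9), so it ignores daylight saving time in Seattle; the calendar date is arbitrary.

```python
from datetime import datetime, timedelta, timezone

PST = timezone(timedelta(hours=-8), "PST")  # fixed offset, no daylight saving
JST = timezone(timedelta(hours=9), "JST")

def seattle_to_tokyo(local_start):
    """Convert a PST start time to (UTC time, JST time)."""
    utc_time = local_start.astimezone(timezone.utc)
    return utc_time, utc_time.astimezone(JST)

start = datetime(2006, 1, 9, 22, 0, tzinfo=PST)  # 10:00 PM PST on an arbitrary date
utc_time, tokyo_time = seattle_to_tokyo(start)
print(utc_time.strftime("%Y-%m-%d %H:%M %Z"))    # 2006-01-10 06:00 UTC
print(tokyo_time.strftime("%Y-%m-%d %H:%M %Z"))  # 2006-01-10 15:00 JST
```

While daylight saving time is in effect in Seattle the offset is UTC-7, and the Tokyo time shifts by one hour accordingly.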




Schedule implementation The times that you can set in the Schedule setting on the site link are in one-hour increments. For example, you can schedule replication to occur between 00:00 hours and 01:00 hours, between 01:00 hours and 02:00 hours, and so forth. However, each block in the actual connection schedule is 15 minutes. For this reason, when you set a schedule of 01:00 hours to 02:00 hours, you can assume that replication is queued at some point between 01:00 hours and 01:14:59 hours.

Note: RPC synchronous inbound replication is serialized so that, if the server is busy replicating this directory partition from another source, replication from a different source does not begin until the first synchronization is complete. SMTP asynchronous replication is processed serially by order of arrival, with multiple replication requests queued simultaneously.

Specifically, a replication event is queued at time t + n, where t is the replication interval that is applied across the schedule and n is a pseudo-random number from 1 minute through 15 minutes. For example, if the site link indicates that replication can occur from 02:00 hours through 07:00 hours, and the replication interval is 2 hours (120 minutes), t is 02:00 hours, 04:00 hours, and 06:00 hours. A replication event is queued on the destination domain controller between 02:00 hours and 02:14:59 hours, and another replication event is queued between 04:00 hours and 04:14:59 hours. Assuming that the first replication event that was queued is complete, another replication event is queued between 06:00 hours and 06:14:59 hours. If the synchronization took longer than two hours, the second synchronization would be ignored because an event is already in the queue.
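The timing in this example can be sketched as follows. Times are expressed as minutes after midnight. The offset range of 0 through 14 whole minutes is an assumption chosen to match the 02:00 to 02:14:59 window quoted above; the actual pseudo-random selection is not described in this workbook.

```python
import random

def interval_start_times(window_start, window_end, interval_minutes):
    """Return each interval boundary t, in minutes after midnight, inside the window."""
    return list(range(window_start, window_end, interval_minutes))

def queue_time(t, rng=random):
    """Return t plus a pseudo-random offset within the first 15-minute block."""
    return t + rng.randrange(0, 15)  # assumed range: 0 through 14 whole minutes

def hhmm(minutes):
    """Format minutes after midnight as HH:MM."""
    return "{:02d}:{:02d}".format(minutes // 60, minutes % 60)

# Schedule 02:00 to 07:00 with a replication interval of 120 minutes.
starts = interval_start_times(2 * 60, 7 * 60, 120)
print([hhmm(t) for t in starts])  # ['02:00', '04:00', '06:00']
print([hhmm(queue_time(t)) for t in starts])  # varies per run, each within 15 minutes of t
```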

Replication can extend beyond the end of the schedule. A period of replication latency that starts before the end of the schedule runs until completion, even if the period is still running when the schedule no longer allows replication to be available.

Note: The replication queue is shared with other events, and the time at which replication takes place is approximate. Duplicate replication events are not queued for the same directory partition and transport.

Connection object schedule Each connection object has a schedule that controls when (during what hours) and how frequently (how many times per hour) replication can occur:

• None (no replication)

• Once per hour (the default setting)

• Twice per hour

• Four times per hour




The connection object schedule and interval are derived from one of two locations, depending on whether the connection is intrasite or intersite:

• Intrasite connections inherit a default schedule from the schedule attribute of the NTDS Site Settings object. By default, this schedule is always available and has an interval of one hour.

• Intersite connections inherit the schedule and interval from the site link.

Note: You do not need to configure the connection object schedule unless you are creating a manual intersite replication topology that does not use the KCC automatic connection objects.

Although intrasite replication is prompted by changes, intrasite connection objects inherit a default schedule so that replication occurs periodically, regardless of whether change notification has been received. The connection object schedule ensures that intrasite replication occurs if a notification message is lost, or if notification does not take place because the network experiences problems or a domain controller becomes unavailable. The NTDS Site Settings schedule has a minimum replication interval of 15 minutes. This minimum replication interval is not configurable and is the smallest interval that is possible for both intrasite and intersite replication (on a connection object or a site link, respectively).

For intersite replication, the schedule is configured on the site link object, but the connection object schedule actually determines replication; that is, the connection object schedule for an intersite connection is derived from the site link schedule, which is applied through the connection object schedule. Scheduled replication occurs independently of change notification.

The KCC uses a two-step process to compute the schedule of an intersite connection.

1. The schedules of the site links traversed by a connection are merged together.

2. This merged schedule is modified so that it is available only at certain periods. The length of those periods is equal to the maximum replication interval of the site links traversed by this connection.

By using Active Directory Sites and Services, you can manually revise the schedule on a connection object, but such an override is effective only for administrator-owned connection objects.

Replication Interval For each site link object, you can specify a value for the replication interval (frequency), which determines how often replication occurs over the site link during the time that the schedule allows. For example, if the schedule allows replication between 02:00 hours and 04:00 hours, and the replication interval is set for 30 minutes, replication can occur up to four times during the scheduled time.




The default replication interval is 180 minutes, or 3 hours. When the KCC creates a connection between a domain controller in one site and a domain controller in another site, the replication interval of the connection is the maximum interval along the minimum-cost path of site link objects, from one end of the connection to the other.

Interaction of Replication Schedule and Interval When multiple site links are required to complete replication for all sites, the replication interval settings on all of those site links combine to affect the entire length of the connection between sites. In addition, when schedules on each site link are not identical, replication can occur only when the schedules overlap.

Suppose that site A and site B have site link AB, and site B and site C have site link BC. When a domain controller in site A replicates with a domain controller in site C, it can do so only as often as the maximum interval that is set for site link AB and site link BC allows. Table 3 shows the site link settings that determine how often and during what times replication can occur between domain controllers in site A, site B, and site C.

Site Link    Replication Interval    Schedule
AB           30 minutes              12:00 hours to 04:00 hours
BC           60 minutes              01:00 hours to 05:00 hours

Table 3: Replication Interval and Schedule Settings for Two Site Links

Given these settings, a domain controller in site A can replicate with a domain controller in site B according to the AB site link schedule and interval, which is once every 30 minutes between the hours of 12:00 and 04:00. However, assuming that there is no site link AC, a domain controller in site A can replicate with a domain controller in site C between the hours of 01:00 and 04:00, which is where the schedules on the two site links intersect. Within that time span, they can replicate once every 60 minutes, which is the greater of the two replication intervals.
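The result for Table 3 can be reproduced with a simplified reading of the two-step schedule computation: intersect the available hours of every site link on the path, then apply the largest replication interval. The sketch models each schedule as a set of whole hours and assumes that "12:00 hours" for site link AB means midnight (00:00); both choices are assumptions made for illustration.

```python
def effective_settings(site_links):
    """Combine site links given as (available_hours, interval_minutes) pairs.

    Returns (sorted list of hours available on every link, largest interval).
    """
    hours = set.intersection(*(set(h) for h, _ in site_links))
    interval = max(i for _, i in site_links)
    return sorted(hours), interval

link_ab = (range(0, 4), 30)  # 00:00 to 04:00 (assumed reading), every 30 minutes
link_bc = (range(1, 5), 60)  # 01:00 to 05:00, every 60 minutes
hours, interval = effective_settings([link_ab, link_bc])
print(hours, interval)  # [1, 2, 3] 60
```

The hours 1, 2, and 3 correspond to the window from 01:00 to 04:00, during which site A and site C can replicate once every 60 minutes.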

Site Link Cost The ISTG uses the cost settings on site links to determine the route of replication among three or more sites that replicate the same directory partition. The default cost value on a site link object is 100. You can assign lower or higher cost values to site links to favor inexpensive connections or expensive connections. Certain applications and services, such as domain controller Locator and DFS, also use site link cost information to locate nearest resources. For example, site link cost can be used to determine which domain controller is contacted by clients located in a site that does not include a domain controller for the specified domain. The client contacts the domain controller in a different site according to the site link that has the lowest cost assigned to it.

Cost is usually assigned, not only on the basis of the total bandwidth of the link, but also on the availability, latency, and monetary cost of the link. For example, a 128 Kbps permanent link might be assigned a lower cost than a dial-up 128 Kbps dual ISDN link, because the dial-up ISDN link has a replication latency-producing delay that occurs as the links are being established or removed. Furthermore, in this example, the permanent link might have a fixed monthly cost, whereas the ISDN line is charged according to actual usage. Because the company is paying up front for the permanent link, the administrator might assign a lower cost to the permanent link to avoid the extra monetary cost of the ISDN connections.

The method used by the ISTG to determine the least-cost path from each site to every other site, for each directory partition, is more efficient when the forest has a functional level of Windows Server 2003 than it is at other levels.




Site Link Transitivity

• Transitivity and Automatic Site Link Bridging
o “Bridge All Site Links” is the default setting

• Transitivity and Rerouting
o KCC automatically routes around failures

• Disabling Bridge All Site Links
o Sometimes needed in very large environments
o Reduces KCC redundancy
o Can cause DFS client issues


Transitivity and Automatic Site Link Bridging By default, site links are transitive, or bridged. If site A has a common site link with site B, and site B also has a common site link with Site C, and the two site links are bridged, domain controllers in site A can replicate directly with domain controllers in site C, under certain conditions, even though there is no site link between site A and site C. In other words, the effect of bridged site links is that replication between sites in the bridge is transitive.

The setting that implements automatic site link bridges is Bridge all site links (BASL), which is found in Active Directory Sites and Services, in the properties of the IP or SMTP intersite transport containers. The default bridging of site links occurs automatically, and no directory object represents the default bridge. Therefore, in the common case of a fully routed IP network, you do not need to manually create any site link bridge objects.

Transitivity and Rerouting For a set of bridged site links, where replication schedules in the respective site links overlap (replication is available on the site links during the same time period), connection objects can be automatically created, if they are needed, between sites that do not have site links that connect them directly. All site links for a specific transport implicitly belong to a single site link bridge for that transport.




Site link transitivity enables the KCC to reroute replication when necessary. In Figure 7, below, a domain controller that can replicate the domain is not available in Seattle. In this case, because the site links are transitive (bridged) and the schedules on the two site links allow replication at the same time, the KCC can reroute replication by creating connections between DC3 in Portland and DC2 in Boston. Connections between domain controllers in Portland and Boston might also be created when a domain controller in Portland is a global catalog server, but no global catalog server exists in the Seattle site, and the Boston site hosts a domain that is not present in the Seattle site. In this case, connections can be created between Portland and Boston to replicate the global catalog partial, read-only replica.

Note: Overlapping schedules are required for site link transitivity, even when Bridge all site links is enabled. In Figure 7, below, if the site link schedules for the SB and PS site links do not overlap, no connections are possible between Boston and Portland.

Figure 7: Transitive Replication When Site Links Are Bridged, Schedules Overlap, and Replication Must Be Rerouted

In Figure 7, creating a third site link to connect the Boston and Portland sites is unnecessary and counterproductive because of the way that the KCC uses cost to route replication. In the configuration that is shown, the KCC uses cost to choose either the route between Portland and Seattle or the route between Portland and Boston. If you wanted the KCC to use the route between Portland and Boston, you would create a site link between Portland and Boston instead of the site link between Portland and Seattle.

Disabling Bridge All Site Links By default, all site links are transitive and it is recommended to keep transitivity enabled by not changing the default value of BASL (enabled by default). However, in very large networks, transitive site links can be an issue because the KCC considers every possible connection in the bridged network and selects only one. As a result, some DCs may experience high CPU overhead resulting from generating a large transitive replication topology. Therefore, in a Windows 2000 forest that has a very large network, or a Windows Server 2003 forest that consists of an extremely large hub-and-spoke topology, you can reduce KCC-related CPU utilization and run time by turning off BASL and creating manual site link bridges only where they are required.

Turning off BASL might affect the ability of DFS clients to locate DFS servers in the closest site. An ISTG that is running Windows Server 2003 relies on the BASL setting being turned on to generate the intersite cost matrix that DFS requires for its site-costing functionality. When the forest functional level is Windows Server 2003 or Windows Server 2003 interim, an ISTG that is running Windows Server 2003 with SP1 can accommodate the DFS requirements with BASL turned off.

You can use a site option to turn off automatic site link bridging for KCC operation without hampering the ability of DFS to use Intersite Messaging to calculate the cost matrix. This site option is set by running the command repadmin /siteoptions W2K3_BRIDGES_REQUIRED. This option is applied to the NTDS Site Settings object (CN=NTDS Site Settings,CN=SiteName,CN=Sites,CN=Configuration,DC=ForestRootDomain). When this method is used to disable automatic site link bridging (as opposed to turning off BASL), default Intersite Messaging options enable the site-costing calculation to occur for DFS.

The site option on the NTDS Site Settings object can be set on any domain controller, but it does not take effect until replication of the change reaches the ISTG role holder for the site.




Site Link Transitivity (cont'd)

• Site Link Cost and Routing

• Site Link Changes and Replication Path: change must replicate to the ISTG of each site; KCC must run on each ISTG

• Bridging Site Links Manually: BASL must be disabled; not usually recommended

Site Link Cost and Routing

When site links are bridged, the cost of replication, from a domain controller at one end of the bridge to a domain controller at the other end, is the sum of the costs on each of the intervening site links. So, if a domain controller in an interim site stores the directory partition that is being replicated, the KCC will route replication to the domain controller in the interim site rather than to the more distant site. The domain controller in the more distant site, in turn, receives replication from the interim site. If the schedules of the two site links overlap, this replication occurs in the same period of replication latency.

Figure 8, below, illustrates a situation in which two site links, connecting three sites that host the same domain, are bridged automatically (BASL is enabled). The aggregated cost of directly replicating between Portland and Boston illustrates the reason that the KCC routes replication from Portland to Seattle and from Seattle to Boston in a store-and-forward manner. Given the choice between replicating at a cost of 4 from Seattle or a cost of 7 from Boston, the ISTG in Portland chooses the lower cost and creates the connection object on DC3 from DC1 in Seattle.




Figure 8: Transitive Replication When Site Links Are Bridged, Schedules Overlap, and Replication Must Be Rerouted

In Figure 8, if DC3 in Portland needs to replicate a directory partition that is hosted on DC2 in Boston, but not by any domain controller in Seattle, or if the directory partition is hosted in Seattle but the Seattle site cannot be reached, the ISTG creates the connection object from DC2 to DC3.

Note: If Bridge all site links is disabled, a connection is never created between Boston and Portland, regardless of schedule overlap, unless you manually create a site link bridge.
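The cost-based choice described for Figure 8 can be sketched in a few lines of Python. This is an illustrative model only, not the KCC algorithm: the function names are hypothetical, and the Seattle-Boston link cost of 3 is an assumption inferred from the stated totals (4 from Seattle, 7 from Boston).

```python
# Site link costs for the Figure 8 scenario. The Seattle-Boston cost of 3 is
# an assumption inferred from the totals given in the text (4 and 7).
SITE_LINK_COSTS = {
    frozenset({"Portland", "Seattle"}): 4,
    frozenset({"Seattle", "Boston"}): 3,
}


def path_cost(path, bridged=True):
    """Sum the site link costs along a path of site names.

    Returns None if a hop has no site link, or if the path needs
    transitivity (more than one hop) while bridging is disabled.
    """
    hops = list(zip(path, path[1:]))
    if len(hops) > 1 and not bridged:
        return None
    total = 0
    for a, b in hops:
        cost = SITE_LINK_COSTS.get(frozenset({a, b}))
        if cost is None:
            return None
        total += cost
    return total


def choose_source(candidates, bridged=True):
    """Pick the lowest-cost source site.

    candidates maps each site that hosts the partition to the path of
    sites from the destination site to that source site.
    """
    costs = {site: path_cost(path, bridged) for site, path in candidates.items()}
    reachable = {site: c for site, c in costs.items() if c is not None}
    return min(reachable, key=reachable.get) if reachable else None


if __name__ == "__main__":
    to_boston = ["Portland", "Seattle", "Boston"]
    # Partition hosted in both Seattle and Boston: Seattle wins on cost.
    print(choose_source({"Seattle": ["Portland", "Seattle"], "Boston": to_boston}))
    # Partition hosted only in Boston: the bridged path is used.
    print(choose_source({"Boston": to_boston}))
    # Same case with bridging disabled: no connection is possible.
    print(choose_source({"Boston": to_boston}, bridged=False))
```

The three calls correspond to the three cases in the text: the normal store-and-forward route through Seattle, the rerouted connection to Boston, and the case in the note where bridging is disabled.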

Site Link Changes and Replication Path

The path that replication takes between sites is computed from the information that is stored in the properties of the site link objects. When a change is made to a site link setting, the following events must occur before the change takes effect:

• The site link change must replicate to the ISTG of each site by using the previous replication topology.

• The KCC must run on each ISTG.

As the path of connections is transitively figured through a set of site links, the attributes (settings) of the site link objects are combined along the path as follows:

• Costs are added together

• The replication interval is the maximum of the intervals that are set for the site links along the path

• The options, if any are set, are computed by using the AND operation

Note: Options are the values of the options attribute on the site link object. The value of this attribute determines special behavior of the site link, such as reciprocal replication and intersite change notification.




Thus the site link schedule is the overlap of all of the schedules of the subpaths. If none of the schedules overlap, the path is not used.
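The combination rules above (add the costs, take the maximum interval, AND the options, intersect the schedules) can be illustrated with a small Python sketch. The SiteLink class and combine_path function are hypothetical names. The schedule is simplified to a set of hours of the day, which is an assumption made for readability and not the format in which a site link actually stores its schedule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteLink:
    cost: int
    interval_minutes: int
    options: int          # bit flags from the options attribute
    schedule: frozenset   # simplified: hours (0-23) when replication is allowed


def combine_path(links):
    """Combine the settings of the site links along one transitive path.

    Expects at least one link. Returns None when the schedules share no
    common window, because such a path is not used.
    """
    cost = sum(link.cost for link in links)
    interval = max(link.interval_minutes for link in links)
    options = links[0].options
    schedule = links[0].schedule
    for link in links[1:]:
        options &= link.options      # options are combined with AND
        schedule &= link.schedule    # only the overlapping window remains
    if not schedule:
        return None
    return SiteLink(cost, interval, options, schedule)


if __name__ == "__main__":
    # Hypothetical values for two bridged site links.
    ps = SiteLink(cost=4, interval_minutes=180, options=0b01,
                  schedule=frozenset(range(0, 6)))
    sb = SiteLink(cost=3, interval_minutes=60, options=0b11,
                  schedule=frozenset(range(4, 10)))
    print(combine_path([ps, sb]))
```

With these sample values the combined path has a cost of 7, an interval of 180 minutes, only the option bit that both links share, and a replication window limited to the two hours in which both schedules are open.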

Bridging Site Links Manually

If your IP network is composed of IP segments that are not fully routed, you can disable Bridge all site links for the IP transport. In this case, all IP site links are considered to be nontransitive, and you can create and configure site link bridge objects to model the actual routing behavior of your network. A site link bridge has the effect of providing routing for a disjoint network (networks that are separate and unaware of each other). When you add site links to a site link bridge, all site links within the bridge can route transitively.

As stated earlier, disabling BASL and bridging site links manually is generally only recommended for large branch office deployments.




[Slide diagram: Global Catalog Replication. Labels: global catalog for Domain B; global catalog for Domain C; domain controller for Domain A; directory partitions Domain A, Domain B, Domain C, Configuration, and Schema.]

Global Catalog Replication

A global catalog server is a domain controller that stores information about all objects in the forest, so that applications can search Active Directory without having to be referred to the specific domain controllers that store the requested data. As all domain controllers do, a global catalog server stores full, writable replicas of the schema and configuration directory partitions, and a full, writable replica of the domain directory partition for which the domain controller is authoritative. In addition, a global catalog server stores partial, read-only replicas of all other domain directory partitions in the forest. When an attributeSchema object has the isMemberOfPartialAttributeSet attribute set to TRUE, the attribute is replicated to all global catalog servers in the forest, in addition to the corresponding directory partition replicas on all authoritative domain controllers.

Global catalog servers can speed Active Directory searches and facilitate logons (a requirement for Active Directory) on the one hand, and can create increased replication traffic on the other. Whether you enable a global catalog server in a site will depend on the needs of the users and applications, and on the speed and availability of connections to other sites.

The first domain controller in the forest is designated automatically as a global catalog server. When additional sites are created, use Active Directory Sites and Services to enable a global catalog server for that site.

Before a domain controller advertises itself as a global catalog server in DNS, the entire global catalog must be replicated to the server. This process involves replication of a partial replica of every domain object in the forest, for every domain other than the domain for which the new global catalog server is authoritative. How long this process takes depends on how many domains the forest contains, and on the relative locations of domain controllers. If there are multiple domains, and source domain controllers are located only in distant sites, the process will take longer than if all domains are in the same site or in only a few sites. When replication must occur between sites to create the global catalog, the site link schedule determines when replication can occur.

After a domain controller has been designated as a global catalog server, the KCC updates the topology, and replication of the global catalog partial directory partitions to the new global catalog server proceeds, after the KCC performs a topology check. When the KCC runs, it checks to see whether the global catalog option is selected for any domain controllers, and it creates the replication topology accordingly. The KCC configures the newly selected global catalog server to be the destination server for a read-only replica of each domain directory partition in the forest that the server does not already hold as a writeable copy. The KCC on the global catalog server must be able to reach a server that will be the source of each read-only directory partition.

When the KCC locates an available source domain controller, it creates an inbound connection on the new global catalog server, and replication of that read-only partition takes place. If the source is within the site, replication begins immediately. If the source is in a different site, replication begins at the next scheduled replication window. Replication of all objects in the partial directory partition must complete, successfully, before the directory partition is considered to be present on the global catalog server.

When all directory partitions are present, the domain controller sets its rootDSE attribute isGlobalCatalogReady to TRUE, and the Net Logon service on the domain controller registers global-catalog-specific service resource records (SRV) in DNS. At this point, the global catalog is considered to be available.

Global catalog servers request updates from a source domain controller for each domain directory partition in the forest (they generate inbound connection objects from those domain controllers). The source domain controller for replication of a given directory partition to a global catalog server can be either a normal domain controller or another global catalog server. As is true for all domain controllers, a global catalog server uses a single topology to replicate the schema and configuration directory partitions, and it uses a separate topology for each domain directory partition.

Replication of Changes to the Global Catalog Partial Attribute Set

The default set of attributes that is replicated to the global catalog is identified by the schema. These attributes are referred to as the "partial attribute set" because they provide a replica of every object in the directory, but only those attributes that are most likely to be used for searches. An attribute can be added to the partial attribute set by editing the isMemberOfPartialAttributeSet value on the respective attributeSchema object. If the value is set to TRUE, the attribute is replicated to the global catalog. When such a schema change is made, replication of the change occurs differently on global catalog servers depending on whether they are running Windows 2000, Windows Server 2003, or a combination of both, as follows:

• If both servers are running Windows 2000: The global catalog server initiates a full synchronization of all partial, read-only domain directory partition replicas, in order to become up-to-date with the extended replica image on other domain controllers. If the partial directory partition replica can be synchronized over an RPC connection, the domain controller attempts a full synchronization over the RPC connection before it uses an SMTP connection. If full synchronization is completed, the up-to-dateness vector that it creates optimizes later full synchronization on other connections.

• If both servers are running Windows Server 2003: Only the changed attributes are replicated to global catalog servers that are running Windows Server 2003. There is no replication impact.

• If one server is running Windows 2000, and the other Windows Server 2003: If a global catalog server that is running Windows Server 2003 replicates the change to a global catalog server that is running Windows 2000, the Windows Server 2003 server reverts to Windows 2000 behavior.

Although interruption of service does not occur, this replication causes higher bandwidth consumption than is required for usual day-to-day replication. The resulting bandwidth consumption for each global catalog server is equivalent to that caused by promoting a regular domain controller to the role of global catalog server.

If you set the isMemberOfPartialAttributeSet value to FALSE in the schema, the attribute is removed from the global catalog immediately after the next replication cycle. This deletion operation does not involve replication, but is handled locally. This behavior is the same on global catalog servers running Windows Server 2003 and Windows 2000.




Urgent Replication

• Account lockout

• Changing the account lockout policy

• Changing the domain password policy

• Changing a local security authority (LSA) secret

• Changing the password on a domain controller account

• Changing the RID master role owner

Certain important security and time-sensitive changes to objects and settings trigger replication immediately, overriding existing change notification and schedule settings. Urgent replication is implemented by immediately notifying replication partners, over RPC, that changes have occurred on a source domain controller. Urgent replication uses regular change notification between destination and source domain controller pairs that otherwise use change notification, but notification is sent immediately, in response to urgent events, instead of after the default period of 15 seconds (or 300 seconds on domain controllers that are running Windows 2000).

Events That Trigger Urgent Replication

Urgent Active Directory replication is always triggered by certain events on all domain controllers within the same site. When change notification has been enabled between sites, these triggering events also replicate immediately between sites.

Immediate notification between domain controllers in the same site is prompted by the following:

• An account lockout, which a domain controller performs to prohibit a user from logging on after a certain number of failed attempts

• Changing the account lockout policy

• Changing the domain password policy

• Changing a Local Security Authority (LSA) secret, which is a secure form in which private data is stored by the LSA (for example, the password for a trust relationship)

• Changing the password on a domain controller account

• Changing the RID master role owner, which is the single domain controller in a domain that assigns relative identifiers to all domain controllers in that domain

In a mixed Windows NT 4.0 and Active Directory domain there are a number of additional events that trigger urgent replication. For more information, see "Urgent Replication Triggers in Windows 2000," Knowledge Base article 232690.
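The difference between urgent and regular change notification can be summarized in a short Python sketch. The event names and the function are hypothetical labels chosen for illustration; the delay values (0, 15, and 300 seconds) come from the text above. Applying the same default delay between sites once change notification is enabled is an assumption of this sketch.

```python
# Hypothetical labels for the urgent-replication triggers listed above.
URGENT_EVENTS = frozenset({
    "account_lockout",
    "account_lockout_policy_change",
    "domain_password_policy_change",
    "lsa_secret_change",
    "dc_account_password_change",
    "rid_master_role_owner_change",
})

DEFAULT_DELAY_SECONDS = 15        # default change notification delay
WINDOWS_2000_DELAY_SECONDS = 300  # delay on Windows 2000 domain controllers


def notification_delay(event, same_site, intersite_notification=False,
                       windows_2000=False):
    """Return the seconds before partners are notified of a change.

    Returns None when no change notification applies, that is, between
    sites that rely on the site link schedule only.
    """
    if not same_site and not intersite_notification:
        return None
    if event in URGENT_EVENTS:
        return 0
    return WINDOWS_2000_DELAY_SECONDS if windows_2000 else DEFAULT_DELAY_SECONDS


if __name__ == "__main__":
    print(notification_delay("account_lockout", same_site=True))
    print(notification_delay("description_change", same_site=True))
    print(notification_delay("account_lockout", same_site=False))
```

The last call shows the case that most often surprises administrators: an urgent event does not cross a site boundary immediately unless change notification has been enabled between the sites.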

Urgent Replication of Account Lockout Changes

Account lockouts are urgently replicated to the primary domain controller (PDC) emulator and are then urgently replicated to the following:

• Domain controllers in the same domain that are located in the same site as the PDC emulator.

• Domain controllers in the same domain that are located in the same site as the domain controller that handled the account lockout.

• Domain controllers in the same domain that are located in sites that have been configured to allow change notification between sites (and, therefore, urgent replication) with the site that contains the PDC emulator or with the site where the account lockout was handled. These sites include any site that is included in the same site link as the site that contains the PDC emulator or in the same site link as the site that contains the domain controller that handled the account lockout.

In addition, when authentication fails at a domain controller other than the PDC emulator, the authentication is retried at the PDC emulator. For this reason, the PDC emulator locks the account before the domain controller that handled the failed-password attempt does, if the bad-password-attempt threshold is reached.
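As a rough model of the three rules above, the following Python sketch computes which domain controllers in a domain would receive an account lockout urgently. All names, including the Denver site, are hypothetical, and a site link is modeled simply as a set of site names with change notification enabled.

```python
def lockout_urgent_targets(dc_sites, pdc_emulator, handling_dc, notify_links):
    """Return the DCs that receive an account lockout by urgent replication.

    dc_sites:      dict of DC name -> site name (all DCs in one domain)
    pdc_emulator:  name of the PDC emulator
    handling_dc:   name of the DC that handled the account lockout
    notify_links:  iterable of sets of site names; each set is one site
                   link that has change notification enabled
    """
    key_sites = {dc_sites[pdc_emulator], dc_sites[handling_dc]}
    target_sites = set(key_sites)
    for link in notify_links:
        # A link extends urgent replication only if it contains the PDC
        # emulator's site or the site where the lockout was handled.
        if key_sites & set(link):
            target_sites |= set(link)
    return sorted(dc for dc, site in dc_sites.items()
                  if site in target_sites and dc != handling_dc)


if __name__ == "__main__":
    dcs = {"DC1": "Seattle", "DC2": "Boston", "DC3": "Portland",
           "DC4": "Portland", "DC5": "Denver"}
    print(lockout_urgent_targets(dcs, "DC1", "DC3", [{"Seattle", "Boston"}]))
```

In this sample, DC5 in Denver is left out because its site shares no notification-enabled site link with either the PDC emulator's site or the site where the lockout was handled; it receives the change at the next scheduled replication.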




Section 3: Replication Tools and Settings

• Repadmin

• Replmon

• DCDiag

• NTDSUtil

• Events and Registry Entries

• Network Ports and Protocols

Introduction

This section covers some of the tools and settings administrators can use to troubleshoot, configure, and monitor replication.

• Use repadmin, Replmon and DCDIAG to troubleshoot replication problems.

• Switch on additional event logging for directory service events.

• Identify network ports and protocols used by Active Directory replication.




Repadmin

• Repadmin /replsummary

• Very common troubleshooting tool

• Windows Server 2003 Support Tool

• View the replication on DCs

Repadmin is used to view the replication information on domain controllers. You can determine the last successful replication of all directory partitions, identify inbound and outbound replication partners, identify the current bridgehead servers, view object metadata and up-to-dateness vectors, and generally manage Active Directory replication topology. You can use Repadmin to force replication of an entire directory partition or of a single object. You can also list domain controllers in a site.

Repadmin.exe can also be used for monitoring the relative health of an Active Directory forest. The operations replsummary, showrepl, showrepl /csv, and showvector /latency can be used to check for replication problems.

Repadmin is extended in Windows Server 2003 to enable commands to target sets of domain controllers. For example, you can target all domain controllers in a site or domain, or all domain controllers that are global catalog servers. In Windows 2000 Server, Repadmin can report information about only one domain controller at a time.

Repadmin Terminology

The following terminology is used in discussing Repadmin syntax:

• NamingContext refers to the directory partitions that Active Directory comprises. Naming contexts include the three Read/Write naming contexts (domain, schema, and configuration) and the optional Read-only naming contexts that are present on domain controllers that are global catalog servers. A naming context can also be an application directory partition. A naming context is specified as a distinguished name, which indicates its hierarchical relationship to the forest root domain (for example, dc=child1,dc=contoso,dc=com).

• GUID refers to the 128-bit number used to uniquely identify objects stored in the directory (for example, fa1a9e6e-2e14-11d2-aa9b-bbfc0a30094c). The GUID is sometimes referred to in the syntax line as a universally unique identifier (UUID). For the purposes of Repadmin, these two terms are synonymous.

• DN is an X.500 distinguished name (for example, CN=Server1,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=contoso,DC=com).

Repadmin uses the following general syntax:

repadmin Operation Parameters [/rpc] [/ldap] [/u:Domain\User] [/pw:{Password | *}]

Examples Consider a network with the following domains and servers:

• contoso.com domain, with two domain controllers: Rootdns (which is also a global catalog) and Rootdc01. The distinguished name of the domain is DC=contoso,DC=com.

• c1.contoso.com, a child domain of contoso.com, with one domain controller: Server1 (which is also a global catalog). The distinguished name of the domain is DC=c1,DC=contoso,DC=com.

Example 1: Display the replication partners of a server

The following example uses the showrepl operation of Repadmin to display the replication partners of Rootdns. This command is also used to find the objectGUID and InvocationID for a server for use with other operations.

No parameters are required for the showrepl operation. A remote connection is assumed; therefore, the server name (DC in the syntax) is included.

Type the following at the command prompt:

repadmin /showrepl rootdns.contoso.com

Press Enter and the following output is displayed:

repadmin /showrepl rootdns.contoso.com

Building7a\rootdns

DC Options : IS_GC

Site Options: (none)

DC object GUID : 405db077-1e28-4825-b225-c5bb9af6f50b

DC invocationID: 405db077-1e28-4825-b225-c5bb9af6f50b




==== INBOUND NEIGHBORS ======================================

CN=Schema,CN=Configuration,Dc=contoso,dc=com

Building7b\rootdc01 via RPC

objectGuid: e55c6c75-75bb-485a-a0d3-020a44c3afe7

last attempt @ 2002-09-09 12:25:35 was successful.

CN=Configuration,Dc=contoso,dc=com




Dc=contoso,dc=com




Example 2: Force a replication event between two replication partners

The following example uses the replicate operation of Repadmin to force the domain directory partition for contoso.com on Rootdns to replicate from the domain directory partition on Rootdc01. This makes Rootdc01 the source and Rootdns the destination server.

The required parameters for the replicate operation are the name of the server that will receive changes (DestDC in the syntax), the server that will send the changes (the source domain controller, given here by name rather than by objectGUID), and the name of the directory partition (NamingContext in the syntax).

Type the following at the command prompt:

repadmin /replicate rootdns.contoso.com rootdc01.contoso.com dc=contoso,dc=com

Press Enter and the following output is displayed:

Rootdns.contoso.com

Sync from rootdc01.contoso.com to rootdns.contoso.com completed successfully.

Example 3: Force a replication event for a specified directory partition with all of its replication partners

The following example uses the syncall operation of Repadmin to force the domain directory partition for contoso.com on Rootdns to replicate with all of its replication partners.

The required parameter for the syncall operation is the server name (DC in the syntax). The name of the directory partition (NamingContext in the syntax) that will be synchronized is also included in this example. If this name is not included, then all directory partitions are synchronized.

Type the following at the command prompt:

repadmin /syncall rootdns.contoso.com dc=contoso,dc=com

Press Enter and the following output is displayed:

repadmin /syncall rootdns.contoso.com dc=contoso,dc=com

Syncing partition: dc=contoso,dc=com

CALLBACK MESSAGE: The following replication is in progress:

From: fea22f1d-a456-4f70-aa06-bedbda29e7eb._msdcs.contoso.com

To : 5c02bcaf-86d9-4bed-811e-d17a5cebf8bb._msdcs.contoso.com

CALLBACK MESSAGE: The following replication completed successfully:

From: fea22f1d-a456-4f70-aa06-bedbda29e7eb._msdcs.contoso.com

To : 5c02bcaf-86d9-4bed-811e-d17a5cebf8bb._msdcs.contoso.com

CALLBACK MESSAGE: SyncAll Finished.

SyncAll terminated with no errors.

Example 4: Display the highest Update Sequence Number on a server

The following example uses the showutdvec operation of Repadmin to show the highest USNs for a specified directory partition on each replication partner. In this example, there are only two replication partners and the directory partition is the domain directory partition for the contoso.com domain.

The only required parameter for the showutdvec operation is the DN of the directory partition (NamingContext in the syntax). A server name (DC_LIST in the syntax) is also included; in this example it is a period (.), which targets the local server, as the first line of the output confirms.

Type the following at the command prompt:

repadmin /showutdvec . dc=contoso,dc=com

Press Enter and the following output is displayed:

repadmin running command /showutdvec against server localhost

Caching GUIDs.

..

Building7b\rootdns @ USN 295458 @ Time 2002-09-09 19:33:42

Building7b\rootdc01 @ USN 338194 @ Time 2002-09-09 19:38:16




Example 5: Show the replication status of a forest

The following example uses the replsummary operation to show a summary of the replication status for all the domain controllers in the Contoso.com forest.

Type the following at the command prompt:

repadmin /replsummary

Press Enter and the following output is displayed:

repadmin /replsummary

Replication Summary Start Time: 2002-09-18 14:54:49

Beginning data collection for replication summary, this may take awhile:

Source DC largest delta fails/total %% error

CLT-DC-01 54m:57s 0 / 9 0

DC-05 41m:23s 0 / 175 0

DC-06 55m:08s 0 / 66 0

DC-07 09m:29s 0 / 97 0

DC-08 18h:05m:02s 56 / 56 100 (1722) The RPC server is unavailable.

DC-09 56m:47s 0 / 12 0

DC-10 55m:10s 0 / 13 0

DC-11 56m:48s 0 / 46 0

DC-12 57m:09s 0 / 34 0

TK-DC-28 08m:01s 1 / 161 0 (8461) The replication operation was preempted.

TK-DC-29 55m:10s 0 / 115 0

Experienced the following operational errors trying to retrieve replication information:

58 - dc-08.contoso.com
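When the summary covers many domain controllers, it can help to filter the output for rows that report failures. The following Python sketch parses text shaped like the sample output above. The regular expression and the function name are assumptions based only on that sample, so the layout should be checked against the output of your own version of Repadmin before relying on it.

```python
import re

# One summary row: name, largest delta, "fails / total", percentage, and an
# optional "(code) message" tail. Layout assumed from the sample output above.
ROW = re.compile(
    r"^(?P<dc>\S+)\s+(?P<delta>\S+)\s+(?P<fails>\d+)\s*/\s*(?P<total>\d+)"
    r"\s+(?P<pct>\d+)(?:\s+\((?P<code>\d+)\)\s+(?P<message>.*))?$"
)

SAMPLE = """\
Source DC largest delta fails/total %% error
CLT-DC-01 54m:57s 0 / 9 0
DC-08 18h:05m:02s 56 / 56 100 (1722) The RPC server is unavailable.
TK-DC-28 08m:01s 1 / 161 0 (8461) The replication operation was preempted.
TK-DC-29 55m:10s 0 / 115 0
"""


def failing_rows(summary_text):
    """Return one dict per summary row that reports at least one failure."""
    rows = []
    for line in summary_text.splitlines():
        match = ROW.match(line.strip())
        if not match or int(match.group("fails")) == 0:
            continue  # header, blank line, or a healthy domain controller
        rows.append({
            "dc": match.group("dc"),
            "largest_delta": match.group("delta"),
            "fails": int(match.group("fails")),
            "total": int(match.group("total")),
            "code": int(match.group("code")) if match.group("code") else None,
            "message": match.group("message"),
        })
    return rows


if __name__ == "__main__":
    for row in failing_rows(SAMPLE):
        print(row["dc"], row["fails"], "of", row["total"], row["code"], row["message"])
```

For the sample rows, only DC-08 and TK-DC-28 are reported. DC-08, with every attempt failing and error 1722, is the one to investigate first.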

File Required

• Repadmin.exe

Source

• Support Tools




Replmon

• GUI Tool for monitoring replication

• Similar tasks to repadmin

This GUI tool enables administrators to view the low-level status of Active Directory replication, force synchronization between domain controllers, view the topology in a graphical format, and monitor the status and performance of domain controller replication.

You can use Replmon to do the following:

• See when a replication partner fails

• View the properties of directory replication partners

• Generate status reports that include direct and transitive replication partners, and detail a record of changes

• Poll replication partners and generate individual histories of successful and failed replication events

• Force replication

• Trigger the Knowledge Consistency Checker (KCC) to recalculate the replication topology

• Display changes that have not yet replicated from a given replication partner

• Display the metadata of an Active Directory object's attributes




Domain Controller Diagnostic Tool (DCDIAG)

• Analyzes DC State

• Helps identify abnormal behavior

Dcdiag.exe

The Dcdiag.exe utility analyzes the state of domain controllers in a forest or enterprise and reports any problems, to assist in troubleshooting. As a reporting program, DCDiag encapsulates detailed knowledge of how to identify abnormal behavior in the system.

Tests run by default:

• Connectivity: Tests whether domain controllers are DNS registered, can be pinged, and have LDAP/RPC connectivity. This test cannot be skipped.

• Replications: Checks for timely replication and any replication errors between domain controllers.

• NCSecDesc: Checks that the security descriptors on the naming context heads have appropriate permissions for replication.

• NetLogons: Checks that the appropriate logon privileges exist to allow replication to proceed.

• Advertising: Checks whether each domain controller is advertising itself in the roles it should be capable of. This test fails if the Netlogon Service has stopped or failed to start.

• KnowsOfRoleHolders: Checks whether the domain controller can contact the servers that hold the five operations master roles (also known as flexible single master operations or FSMO roles).

• Intersite: Checks for failures that would prevent or temporarily hold up intersite replication and tries to predict how long it will take before the KCC is able to recover. Results of this test are often not valid, especially in atypical site or KCC configurations or at the Windows Server 2003 forest functional level.




• FSMOCheck: Checks that the domain controller can contact a KDC, a Time Server, a Preferred Time Server, a PDC, and a Global Catalog server. This test does not test any of the servers for operations master roles.

• RidManager: Checks whether the RID master is accessible and whether it contains the proper information.

• MachineAccount: Checks whether the machine account has properly registered and the services are advertised. Use /RecreateMachineAccount to attempt a repair if the local machine account is missing. Use /FixMachineAccount if the machine account flags are incorrect.

• Services: Checks whether the appropriate domain controller services are running.

• ObjectsReplicated: Checks that Machine Account and DSA objects have replicated. Use /objectdn:dn with /n:nc to specify an additional object to check.

• Frssysvol: This test checks that the file replication system (FRS) SYSVOL is ready.

• Frsevent: This test checks to see if there are errors in the file replication system. Failing replication of the SYSVOL share can cause policy problems.

• Kccevent: This test checks that the Knowledge Consistency Checker is completing without errors.

• Systemlog: This test checks that the system is running without errors.

• CheckSDRefDom: This test checks that all application directory partitions have appropriate security descriptor reference domains.

• CrossRefValidation: This test verifies the validity of cross-references.

• VerifyReferences: This test verifies that certain system references are intact for the FRS and Replication infrastructure.

• /skip:Test

Tests not run by default:

Topology: Checks that the KCC has generated a fully connected topology for all domain controllers.

CheckSecurityError: On domain controllers running Windows Server 2003 with SP1, reports on the overall health of replication with respect to Active Directory security. May be performed against one or all domain controllers in an enterprise. When the test has completed, DCDiag presents a summary of the results, along with detailed information for each domain controller tested and the diagnosis of security errors that are encountered. The following argument is optional: /ReplSource:SourceDomainController, to check the ability to create a replication link between a real or potential source domain controller (SourceDomainController) and the local domain controller.




CutoffServers: Checks for any server that is not receiving replications because its partners are down.

DNS: New in Windows Server 2003 SP1. Includes six optional DNS-related tests, as well as the Connectivity test, which runs by default. The tests can be run individually or all at once. The tests include the following:

• /DnsBasic, to confirm that essential services are running and available, necessary resource records are registered, and domain and root zones are present.

• /DnsForwarders, to determine whether recursion is enabled and whether any configured forwarders or root hints are functioning.

• /DnsDelegation, to confirm that the delegated name server is functioning and to check for broken delegations.

• /DnsDynamicUpdate, to verify that the Active Directory domain zone is configured for secure dynamic updates and to perform registration of a test record.

• /DnsRecordRegistration, to test the registration of all essential DC Locator records.

• /DnsResolveExtName, to verify basic resolution of either an intranet or Internet name.

OutboundSecureChannels: Checks that secure channels exist from all of the domain controllers in the domain to the domains specified by /testdomain. The /nositerestriction parameter prevents the test from being limited to the domain controllers in the site.

VerifyReplicas: This test verifies that all application directory partitions are fully instantiated on all replica servers.

VerifyEnterpriseReferences: This test verifies that certain system references are intact for the FRS and Replication infrastructure across all objects in the enterprise on each domain controller.

File Required

• Dcdiag.exe

Source

• Support Tools




Events and Registry Entries

• Variable Event Logging Levels

• Use max values of 1 or 2, not higher

In Windows Server 2003, significant improvements have been made to Directory Services event logging, including KCC event messages such as those in Table 4, below. An example is the Windows 2000 event 1311 message, which is logged because of insufficient connectivity in the domain or forest. In Windows Server 2003, the affected partition is now listed to help determine the root cause. In addition, new messages specify which Active Directory site is affected by connectivity issues and identify sites that are not included in site links.

Event ID Description

1311 Lists Partition that is unable to Replicate from this site

1789 Site Uncovered by any Site Link

1865 Lists the Sites unable to complete the spanning tree

1925 Unable to Build Replication Link

1308 Routing Around domain controller failure

1566 All domain controllers in given site are unable to replicate

1567 Preferred Bridgehead server defined but unable to replicate all partitions in the site

1864 Summary of Non-Replicating domain controllers

Table 4: Replication-related event ID messages




Logging Levels The KCC, like all subsystems in Active Directory, has a variable event logging level. By default, only the most important events are logged. You can increase the level of detail in the event log by modifying the value in the Replication Events entry in the following key.

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Diagnostics

Increasing the level of detail can be used to better understand the behavior of the KCC in different situations. However, a logging level value greater than 2 generally results in excessive logging that degrades the performance of the component. Increasing the logging level can be useful for troubleshooting problems, but it is not recommended for normal operation.
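The logging-level change described above can be sketched as a small helper that builds, but does not run, a reg.exe command. This is a minimal sketch, not a supported procedure: the value name "5 Replication Events" is an assumption about how the Replication Events entry is stored under the Diagnostics key, and the function name is hypothetical. The helper refuses levels above 2, following the guidance above.

```python
# Sketch: build (but do not execute) a reg.exe command that sets an
# NTDS diagnostic logging level. Nothing here touches the registry.
NTDS_DIAGNOSTICS_KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\NTDS\Diagnostics"

# Assumption: the "Replication Events" entry uses this value name.
REPLICATION_EVENTS_VALUE = "5 Replication Events"

# The text recommends values of 1 or 2, not higher.
MAX_RECOMMENDED_LEVEL = 2


def build_logging_command(level, value_name=REPLICATION_EVENTS_VALUE):
    """Return a reg.exe command line that sets the given logging level."""
    if not isinstance(level, int) or level < 0:
        raise ValueError("level must be a non-negative integer")
    if level > MAX_RECOMMENDED_LEVEL:
        raise ValueError("levels above 2 generally cause excessive logging")
    return (
        f'reg add "{NTDS_DIAGNOSTICS_KEY}" '
        f'/v "{value_name}" /t REG_DWORD /d {level} /f'
    )


if __name__ == "__main__":
    print(build_logging_command(2))  # raise the level while troubleshooting
    print(build_logging_command(0))  # return to the default afterwards
```

Generating the command as text keeps the example safe to run anywhere; review the resulting line before you run it on a domain controller, and set the level back to 0 when troubleshooting is finished.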




Network Ports Used by Active Directory Replication

RPC endpoint mapper: 135
Ephemeral ports chosen over 1024

LDAP: 389
LDAP over SSL: 636
LDAP to GCs: 3268
Kerberos: 88
DNS: 53
SMB over IP: 445

Note: Ports listed above are specific to replication only.

Network Ports Used by Active Directory Replication

By default, RPC-based replication uses dynamic port mapping. When connecting to an RPC endpoint during Active Directory replication, the RPC runtime on the client contacts the RPC endpoint mapper on the server at a well-known port (port 135). The client queries the RPC endpoint mapper on this port to determine which port has been assigned for Active Directory replication on the server. This query occurs whether the port assignment is dynamic (the default) or fixed, so the client never needs to know in advance which port to use for Active Directory replication.

Note: An endpoint comprises the protocol, local address, and port address

In addition to the RPC endpoint mapper on port 135 and the RPC port that it assigns, the other ports that are required for replication are listed in Table 5, below:

Port Assignments for Active Directory Replication

Service Name                         UDP    TCP
LDAP                                 389    389
LDAP (Secure Sockets Layer [SSL])    -      636
LDAP (global catalog)                -      3268
Kerberos                             88     88
DNS                                  53     53
SMB over IP                          445    445

Table 5: Ports required by Active Directory
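When replication fails across a firewall, a quick reachability check of the TCP ports listed above can narrow the problem. The following is a minimal sketch using only the Python standard library. The function names and the host name dc01.example.com are hypothetical. The sketch does not test the UDP ports or the RPC port that the endpoint mapper assigns at run time.

```python
import socket

# TCP ports taken from the table above, plus the RPC endpoint mapper.
REPLICATION_TCP_PORTS = {
    "RPC endpoint mapper": 135,
    "LDAP": 389,
    "LDAP over SSL": 636,
    "LDAP (global catalog)": 3268,
    "Kerberos": 88,
    "DNS": 53,
    "SMB over IP": 445,
}


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution errors.
        return False


def check_replication_ports(host, ports=REPLICATION_TCP_PORTS, timeout=2.0):
    """Return a dict mapping each service name to its reachability."""
    return {
        name: tcp_port_open(host, port, timeout)
        for name, port in ports.items()
    }


if __name__ == "__main__":
    # "dc01.example.com" is a placeholder; substitute a real domain controller.
    for name, reachable in check_replication_ports("dc01.example.com").items():
        status = "open" if reachable else "unreachable"
        print(f"{name:<24}{status}")
```

A successful TCP connection shows only that the port is reachable from the computer that runs the check. It does not prove that the service behind the port is healthy.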




Module 5: Microsoft File Replication Service









Module Overview

Describe how FRS works.
Describe how FRS interoperates with DFS.
Describe the contents and importance of Sysvol.
Explain the internal operations of FRS.
Identify and fix problems with FRS replication.
Identify and use the correct tool to proactively monitor FRS replication.
Troubleshoot most common FRS problems.

Module Overview

Introduction Microsoft® File Replication Service (FRS) is used primarily to replicate policies and logon scripts. It can also be used to replicate data for Distributed File System (DFS). FRS can copy and maintain shared files and folders on multiple servers simultaneously.

Objectives After completing this section, you will be able to:
• Describe how FRS works.
• Describe how FRS interoperates with DFS.
• Describe the contents and importance of Sysvol.
• Explain the internal operations of FRS.
• Identify and fix problems with FRS replication.
• Identify and use the correct tool to proactively monitor FRS replication.
• Troubleshoot most common FRS problems.




Recommended Reading
• Deployment Guide
• Distributed Systems Guide
• Deployment, monitoring, and troubleshooting of the Microsoft Windows® 2000 File Replication Service using the SONAR, TOPCHK, CONSTAT and IOLOGSUM tools




Section 1: File Replication Service

Describe how FRS works
Explain the internal operations of FRS
Recommended Configuration

Section 1: File Replication Service Concepts

Introduction Microsoft FRS is used primarily to replicate policies and logon scripts. It can also be used to replicate data for DFS. FRS can copy and maintain shared files and folders on multiple servers simultaneously.


• Describe how FRS works.
• Describe how FRS interoperates with Distributed File System.
• Explain the internal operations of FRS.




Introduction to File Replication Service

[Figure: replica set diagram. Labels: C:\, Templates, Word, Excel, Server2, Replica Set, New Copy]

Introduction to File Replication Service

File Replication Service (FRS) is a multithreaded replication engine that replaces the LMRepl service that is used in Microsoft Windows NT®. The multithreading allows FRS to replicate different files between different computers simultaneously.

FRS replicates files in sequential order according to when they are closed, but the order of completion is determined by file size and link speed. FRS replicates only whole files. Therefore, even if you change only a single byte in a file, the entire file is replicated.

FRS can provide multiple distribution paths between the members in a replica set for the replication of SYSVOL and DFS folders. Therefore, if one replica is down, data will flow using a different route. Event times and version numbers are associated with replicated files to prevent replication conflicts.

FRS expands on the functionality provided by LMRepl with the following enhancements:

• Multimaster replication of files and folders to allow updates to occur independently on any server in the domain.

• Configurable schedules for replicating DFS and SYSVOL content between sites.

• Automatic replication of folder and file attributes including access control lists (ACLs).

FRS is automatically installed on Active Directory domain controllers and configured to start automatically. For member servers, the service start value is initially set to manual.

Although Active Directory replication and File Replication Service are separate mechanisms, they are conceptually similar. Therefore, it can be useful to read about directory replication when you are learning about FRS. For information about directory replication, see “Active Directory Replication” in this course.

Key Terms Replication. The process of copying data, from one computer to another, that converges to an identical data set over time. Replication enhances availability and file sharing by duplicating shared files.

Replica. A member of a replica set that contains a copy of a shared folder or file.

Replica set. Two or more copies of a shared folder that participate in replication. Each copy must be located on a different computer.

Initial master. The first member in a replica set and the starting point for automatic replication. The files and folders on the initial master are the source for all new replicas.

File event time. The time at which a file system modification is made to a file or directory. This might not be the same as the file create or last write time. For example, restoring a file from a backup tape preserves the file create and last write times but the file event time is the time when the actual file restoration was performed.




Basic FRS Operation

1. File closed
2. Entry written to change journal
3. FRS monitors journal and compares file to exclusion filter
4. File placed in aging cache
5. Change order created and inbound log updated
6. File copied to staging area on A
7. Outbound log updated
8. Change notification sent to B
9. Inbound log updated + Ack sent to A
10. File copied to staging area on B
11. File constructed and moved to final destination area

Computer A

Computer B

Basic FRS Operation

The concepts of inbound and outbound partners in a replica set are important to FRS operations. The fundamental elements in a replica set are members and connection objects. The connection objects are unidirectional replication connections between the members of a replica set. A change flows in the direction of the connection between two replication partners. To replicate changes in both directions, a pair of connection objects is necessary.

For example, two computers, A and B, are configured as a replica set and replication is enabled. If a file has changed on Computer A and needs to replicate to Computer B, then Computer A is the inbound replication partner for Computer B, and Computer B is the outbound replication partner for Computer A. If a change occurs on Computer B that needs to be replicated to Computer A, a second connection is needed. In this case, Computer B is the inbound partner for Computer A, and Computer A is the outbound partner for Computer B.
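The relationship between unidirectional connection objects and partner roles can be illustrated with a short sketch. It follows the convention used for the FRS logs, where change orders arrive from inbound partners and are sent to outbound partners. The function name and the tuple representation of a connection object are illustrative assumptions, not FRS data structures.

```python
from collections import defaultdict


def partners_from_connections(connections):
    """Derive partner roles from unidirectional connections.

    connections: iterable of (source, destination) pairs, one per
    connection object; changes flow from source to destination.
    Returns (inbound, outbound), where inbound[m] is the set of members
    that m receives changes from and outbound[m] is the set of members
    that m sends changes to.
    """
    inbound = defaultdict(set)
    outbound = defaultdict(set)
    for source, destination in connections:
        outbound[source].add(destination)  # destination is source's outbound partner
        inbound[destination].add(source)   # source is destination's inbound partner
    return dict(inbound), dict(outbound)


if __name__ == "__main__":
    # Two connection objects give replication in both directions.
    inbound, outbound = partners_from_connections([("A", "B"), ("B", "A")])
    print("inbound partners:", inbound)
    print("outbound partners:", outbound)
```

With only the connection ("A", "B"), Computer B has an inbound partner but no outbound partner, so changes made on B never reach A. This is why two-way replication needs a pair of connection objects.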

The replication of files involves the following steps:

When a user makes a change and closes a file belonging to a replica set, NTFS makes an entry in the NTFS change journal.

The NTFS change journal records changes, such as file creations, deletions, and modifications, made to all files on an NTFS volume. The default journal size is 128 MB, and the journal persists across restarts and crashes. For this reason, changes made while FRS is stopped or has failed are not lost. When FRS is restarted, it reads the changes recorded in the NTFS change journal and resumes replication.




FRS monitors the NTFS change journal for changes that apply to replicated shares. Only files that have been closed are checked for changes. File and folder filters are applied against changes in the folders of interest, notably domain DFS and SYSVOL replica sets.

The aging cache, a three-second delay designed to catch additional changes to a file, expires. This prevents a file from being replicated when the file is undergoing rapid updates.

Computer A records a change order in the inbound log. It also creates an entry in the ID table so that a recovery can take place if a crash occurs.

The inbound log contains change orders arriving from all inbound partners. The change orders are logged in the order that they arrive. Each change order contains information about a change made to a file or folder on a replica member, such as the name of the file or the time it was changed, which is used to construct a message about the change.

A copy of the changed file is constructed in a local Staging Directory.

The Staging Directory is an area where modified files are stored temporarily prior to being propagated to other replication partners. FRS encapsulates the data and attributes associated with a replicated file (or directory) object in a staging file. When the staging file has been generated on the originating computer, FRS compresses it. This saves space in the staging file and causes less data to be replicated between members. It also ensures that the file data can be supplied to partners regardless of any file activity that might prevent access to the original file.

Computer A updates the outbound log.

The outbound log contains change orders generated for a specific outbound partner. The changes can originate locally or come from an inbound partner. These change orders are eventually sent to all outbound partners.

Computer A sends a change notification to Computer B.

If it decides to accept the change order, Computer B asks for the modified file. Computer B writes to its inbound log and ID table.

Computer B copies the staging file to its Staging Directory. It then writes to its outbound log so that other outbound partners can pick up the change.

Propagated files are stored temporarily in the Staging Directory prior to being installed locally on the partner. This is done so that users do not see a file locked for an extended period of time while FRS is moving the file over a slow or congested link. In addition, if the link fails in the middle of the transfer, users do not see a partial file.




The altered file is constructed in a preinstallation area and moved to its final location on Computer B.
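The aging cache step in the sequence above can be modeled with a short simulation. This is a simplified sketch and not the FRS implementation: it assumes that each new close of a file restarts the three-second delay, and the function name and event format are hypothetical.

```python
AGING_DELAY_SECONDS = 3.0  # the three-second delay described in the text


def coalesce_close_events(events, delay=AGING_DELAY_SECONDS):
    """Fold rapid closes of the same file into single change orders.

    events: iterable of (close_time_seconds, path) tuples.
    Returns a sorted list of (change_order_time, path) tuples.
    """
    pending = {}  # path -> time of the most recent close that is still aging
    change_orders = []
    for close_time, path in sorted(events):
        # Release every pending file whose delay expired before this event.
        for pending_path, last_close in list(pending.items()):
            if close_time - last_close >= delay:
                change_orders.append((last_close + delay, pending_path))
                del pending[pending_path]
        # Assumption: a new close starts (or restarts) the delay for the file.
        pending[path] = close_time
    # Anything still aging is released once its delay runs out.
    for pending_path, last_close in pending.items():
        change_orders.append((last_close + delay, pending_path))
    return sorted(change_orders)


if __name__ == "__main__":
    closes = [
        (0.0, "a.txt"), (1.0, "a.txt"), (2.5, "a.txt"),  # rapid updates
        (1.5, "b.txt"),
        (10.0, "a.txt"),                                 # a later, separate update
    ]
    for when, path in coalesce_close_events(closes):
        print(f"t={when:>5.1f}s  change order for {path}")
```

In this model the three rapid closes of a.txt produce one change order, so five close events result in three change orders in total.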

To provide secure communications, authenticated remote procedure call (RPC) with Kerberos encryption is used over TCP/IP as the protocol between members for replication. This means that replicated data traveling between members in a replica set is always encrypted.




Replication

FRS replicates SYSVOL and NETLOGON
• SYSVOL is a key part of the infrastructure. A DC does not advertise as a DC until SYSVOL is replicated
• SYSVOL contains System policies, Group Policy settings, User logon and logoff scripts
• FRS objects are created by DCPROMO
• FRS uses KCC or manual connection objects, topology, and schedule
• Two-way replication required

FRS also replicates DFS shares
• Domain DFS only
• Exclusions
• FRS objects created by DFS administrator tool

Replication

SYSVOL Replication SYSVOL is a shared folder that is built by the Active Directory Installation Wizard during the installation of Active Directory. When the first domain controller in a domain is created, the default policy objects are built from templates and stored in the SYSVOL folder. After the Active Directory installation is complete and the computer has been restarted, SYSVOL is ready. FRS then signals the Net Logon service that in turn shares the SYSVOL folder and publishes the computer as a domain controller. When creating an additional domain controller in the domain, FRS must seed (populate with information replicated from another computer) the SYSVOL folder. The computer is not advertised as a domain controller and SYSVOL is not shared until the seeding is complete.

The SYSVOL share contains many files that need to be available and synchronized between domain controllers in a domain or forest, including:

• System policies
• Group Policy settings for domain members running Microsoft Windows 2000 or later
• User logon and logoff scripts

For example, the default folder structure contains the following folders used by network clients for policy and script information:

%SystemRoot%\SYSVOL\sysvol\domain_name\Policies

%SystemRoot%\SYSVOL\sysvol\domain_name\scripts




When adding, removing, or modifying the contents in the SYSVOL share, those changes are replicated to the SYSVOL shares on all other domain controllers in the domain.

When it replicates SYSVOL content, FRS uses the same connection objects, created by the Knowledge Consistency Checker (KCC), that Active Directory replication uses. Because the connection objects are the same, the schedule and topology for intersite replication are the same for FRS and Active Directory. Like Active Directory replication, FRS compresses all replicated content between sites. However, unlike Active Directory replication, FRS also compresses replicated content within a site.

Note: Compression was not available in Windows 2000 FRS until Service Pack 2.

Regular DFS Replication Unlike SYSVOL replication, which is enabled by default, replication for DFS roots and links must be explicitly enabled using the DFS administrative console. Only domain DFS can use FRS. Stand-alone DFS does not support automatic file replication.

You can enable replication of files and folders between computers by using the DFS administrative console. The replication policy can be different for each root and link in the DFS namespace. To enable replication, at least two root targets or two link targets must be configured.

Replication cannot be enabled for the following:

• A shared folder on a computer on which FRS is not installed.
• A shared folder that is not on the version of NTFS used in Windows 2000 and Microsoft Windows Server® 2003. A shared folder on a FAT file system will not replicate.
• A shared folder that uses a cluster name in its path name.
• A shared folder on a computer that does not belong to an Active Directory domain.
• A shared folder on a computer in a domain which is inaccessible to the user that is currently logged on.




FRS Concepts Overview

FRS Configuration held in Active Directory
• Members, Subscribers, Connection objects, filters
• FRS relies on AD replication of this configuration information
• AD Objects determine partners, topology and schedule

FRS monitors the NTFS USN Journal
• Efficiently tracks changes made to NTFS volumes

FRS Staging Directory and Staging Files
• ‘Packaged’ form of the file to be replicated
• Pre-install

FRS Database
• Holds and records configuration data, a list of existing, incoming and outgoing files and directories

FRS Concepts Overview

FRS settings are kept inside the Active Directory database, and Active Directory replication must be successful in order for the required information to reside on all domain controllers in the same domain. For FRS to function properly, certain critical objects (as well as their attributes and parent containers) must exist in Active Directory. These objects determine configurations like file filters, replication partners, replication topology and schedule.

FRS monitors the NTFS volume change journal to determine whether changes have occurred to replicated content. Changed files are then "packaged" and copied to the Staging Folder, so that they can be transferred while the original file is still accessible.

All configuration data, the list of existing files and folders and their properties, and the replication data are stored in a Jet database file known as the FRS database.

All of these concepts are covered in detail in the next slides.




Recommended Configuration

Microsoft recommends the following FRS TESTED limits:
• A maximum file size of 20 gigabytes (GB)
• A maximum of 64 GB of data
• A maximum of 500,000 files under the replica root
• A maximum of 1,000,000 simultaneous change orders
• A maximum of 150 replica sets per computer
• A maximum of 1,000 replica members

Restrict the size of the FRS Jet Database to 8 terabytes (TB) or less. Database size is related to the number of folders and files that are in the replica tree, not to the disk space that the data uses.

For extremely large data sets (500,000+ files or 64 GB+ disk size)
• Use the Robocopy.exe Resource Kit tool to copy data.

Recommended Configurations

These are the limits that Microsoft has tested and recommends for FRS. They cover replication content and topology, the size of the FRS Jet database, the staging area, and the update sequence number (USN) journal. (These limits were tested with standard DFS volumes in mind. It is unlikely that you will reach them with the contents of Sysvol.)

Content and data limits • A maximum file size of 20 gigabytes (GB).

• A maximum of 64 GB of data.

• A maximum of 500,000 files under the replica root.

• A maximum of 1,000,000 simultaneous change orders.

Topology limits • A maximum of 150 replica sets per computer.

• A maximum of 1,000 replica members.

FRS Jet Database Microsoft recommends that you restrict the size of the FRS Jet database to 8 terabytes or less. The size of the FRS Jet database is not related to the disk space requirements of the data in the replica tree, but to the number of folders and files that are in the replica tree. You must also consider that FRS keeps a record of deleted files and folders for 60 days and also records all outbound change orders for seven days. This means that the size of the FRS Jet database will increase as more files are replicated.

Staging area The Staging folder temporarily stores modified files before they are propagated to other replication partners. The FRS encapsulates the data and attributes that are associated with a replicated file object or folder object in a staging file. The FRS needs enough staging area space on both upstream and downstream computers to replicate files. For Windows 2000 Service Pack 2 (SP2) and in later versions of Windows 2000, when a staging file has been generated on the originating computer, the FRS compresses the file. This saves space in the staging file and causes less data to be replicated between members. It also makes sure that the file data can be supplied to partners regardless of what file activity might prevent access to the original file. The staging area on both domain controllers must be as large as the largest file that you want to replicate. It must also be sufficiently large to store any other files pending replication. The staging area size limits in Windows 2000 Server and Windows Server 2003 are:

• Default size: 660 megabytes (MB)

• Minimum size: 10 MB

• Maximum size: 2 terabytes

USN journal The update sequence number (USN) journal provides a persistent log of all changes that are made to files in a volume. As files, directories, and other NTFS file system objects are added, deleted, and modified, NTFS enters records in the USN change journal, one for each volume on the computer. In Windows 2000 Server Service Pack 4 and later, the default size of the USN Journal is 512 MB. Microsoft recommends that you increase the default size by 128 MB for every 100,000 files and folders.
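The sizing guidance for the USN journal can be expressed as a small calculation. This is a minimal sketch: the function name is hypothetical, and rounding up to the next full block of 100,000 files and folders is an assumption, because the recommendation does not say how to treat a partial block.

```python
import math

DEFAULT_JOURNAL_MB = 512       # default in Windows 2000 Server SP4 and later
INCREMENT_MB = 128             # recommended increase per block of files and folders
FILES_PER_INCREMENT = 100_000  # block size from the recommendation


def recommended_usn_journal_mb(file_and_folder_count):
    """Return a suggested USN journal size in MB for a replica tree."""
    if file_and_folder_count < 0:
        raise ValueError("count must not be negative")
    # Assumption: a partial block of 100,000 counts as a full block.
    increments = math.ceil(file_and_folder_count / FILES_PER_INCREMENT)
    return DEFAULT_JOURNAL_MB + increments * INCREMENT_MB


if __name__ == "__main__":
    for count in (0, 100_000, 250_000, 500_000):
        size = recommended_usn_journal_mb(count)
        print(f"{count:>7} files and folders: {size} MB")
```

Under these assumptions, a replica tree of 250,000 files and folders needs three increments, which gives 512 + 3 × 128 = 896 MB.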

Other issues Microsoft recommends that you evaluate your FRS configuration with caution. As the number of files and data size increase, you may experience scenarios that affect computer and network performance.

Extremely large data sets For data sets of more than 500,000 files or 64 GB of disk space, Microsoft recommends that you use the Robocopy.exe Resource Kit tool to copy data.




Morphed directories Morphed directories are folders and files that have been replicated to other servers and are exact copies of each other, but FRS cannot determine the most recent folder. Because FRS cannot determine the most recent folder, it creates a duplicate folder.

Non-authoritative restores A non-authoritative restore synchronizes an out-of-date domain controller with an up-to-date source. You must stop the NTFRS service and set the startup to manual on the outdated domain controller before you initiate the non-authoritative restore. In scenarios with multiple domain controllers, this can affect network performance for a significant length of time.

Network latency You may experience network latency as the number of replicated files and the replicated data size increase.

For more information about tested limits and FRS recommendations, read the article below.

Reference: For more information, see the following Knowledge Base article: 840675 "Configuration and operational recommendations for the File Replication service in Windows Server 2003 and Windows 2000 Server."




Managing FRS

Distributing disk usage
• For best performance, place FRS Logs and Staging Folder on separate disks, especially on HUB servers.

Maintaining the staging directory
• Largest file to be replicated
  Default = 660 MB / Minimum = 10 MB / Maximum = 2 TB
• Service pack revision
  Compression
• Staging area free space
  "Production rate" - how much change needs to be replicated
  "Consumption rate" - ability of downstream computers to accept files

Managing FRS

To keep an FRS implementation running smoothly, there are some best practices to be aware of. Understanding ways to manage and restore replicated files helps maintain the best replication performance across your network, while successfully monitoring FRS allows errors to be found and troubleshot.

Distributing Disk Usage To distribute disk traffic, store the FRS logs on a separate disk from the Staging Directory, the working directory, and the content that is to be replicated. (The working directory contains the Ntfrs.jdb file.) This is very important when gathering FRS log data at a high severity level. In fact, locating the FRS logs and the Staging Directory on a separate disk drive or partition from that of the operating system or drive containing replicated content allows for the best replication performance because it distributes disk input/output (I/O). In a hub-and-spoke topology, this is especially recommended for the hub servers.

Reference: For more information, see the following Knowledge Base article: 221093 "HOW TO: Relocate the NTFRS Jet Database and Log Files."

Maintaining the Staging Directory The staging space limit governs the maximum amount of disk space that FRS can use to hold staging files. It is important to make sure that the Staging Directory for replicated shares is large enough to hold the staging files. The minimum size for the Staging Directory is 10 MB and the maximum size is 2 terabytes. However, the default maximum size set in the registry is 660 MB.

In Windows 2000 SP2 or earlier, if the staging area becomes full, FRS will pause replication until space can be recovered by replicating one or more staging files to all outbound partners. Therefore, you should use a generous estimate for staging area size.

Beginning with Windows 2000 SP3, an updated staging file management algorithm is used to delete staging files that have not recently been used. When FRS tries unsuccessfully to allocate space for a staging file (because either there is not enough space or because the amount of space in use has reached 90 percent of the staging space limit), it will begin to delete staging files. Staged files are deleted (in the order of the longest time since the last access) until the amount of space in use has dropped below 60 percent of the staging space limit. The staging files will be regenerated when requested by replication partners. Although no longer critical, it is still recommended to use generous estimates when determining staging area size in order to prevent disk/CPU performance from being consumed by repeated staging and deleting of files.
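The SP3 staging cleanup behavior described above can be sketched as follows. This is an illustrative model, not FRS code. The StagingFile type and the function name are hypothetical; the 90 percent and 60 percent thresholds and the least-recently-accessed deletion order come from the description above.

```python
from dataclasses import dataclass


@dataclass
class StagingFile:
    name: str
    size_kb: int
    last_access: float  # larger value means more recently accessed


HIGH_WATER_PERCENT = 90  # cleanup starts at this share of the limit
LOW_WATER_PERCENT = 60   # cleanup stops once usage drops below this share


def reclaim_staging_space(files, limit_kb, new_file_kb):
    """Model the cleanup that runs when a staging allocation fails.

    Returns (kept, deleted). Nothing is deleted unless the new file does
    not fit or usage has already reached the high-water mark.
    """
    used = sum(f.size_kb for f in files)
    no_room = used + new_file_kb > limit_kb
    at_high_water = used * 100 >= HIGH_WATER_PERCENT * limit_kb
    if not (no_room or at_high_water):
        return list(files), []

    # Delete in order of longest time since last access.
    kept = sorted(files, key=lambda f: f.last_access)
    deleted = []
    while kept and used * 100 >= LOW_WATER_PERCENT * limit_kb:
        victim = kept.pop(0)
        used -= victim.size_kb
        deleted.append(victim)
    return kept, deleted


if __name__ == "__main__":
    staged = [
        StagingFile("a.stg", 300, last_access=1.0),
        StagingFile("b.stg", 300, last_access=2.0),
        StagingFile("c.stg", 300, last_access=3.0),
    ]
    kept, deleted = reclaim_staging_space(staged, limit_kb=1000, new_file_kb=50)
    print("kept:", [f.name for f in kept])
    print("deleted:", [f.name for f in deleted])
```

The gap between the two thresholds is what makes a generous staging limit worthwhile: every deleted staging file that a partner still needs must be generated again, which costs disk and CPU time.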

To adjust the size of the staging space, change the Staging Space Limit in KB value in the following registry key:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NtFrs\Parameters

Minimum: 10 MB

Maximum: 2 TB
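Because the registry value is expressed in kilobytes, it helps to convert and range-check a planned staging size before you enter it. The helper below is a minimal sketch. The function name is hypothetical, and it assumes binary units (1 MB = 1,024 KB, 1 TB = 1,024 × 1,024 MB).

```python
KB_PER_MB = 1024                  # assumption: binary units
MIN_STAGING_MB = 10               # minimum staging space limit
MAX_STAGING_MB = 2 * 1024 * 1024  # 2 TB expressed in MB
DEFAULT_STAGING_MB = 660          # default staging space limit


def staging_limit_kb(size_mb):
    """Convert a staging size in MB to the KB figure for the registry value."""
    if not MIN_STAGING_MB <= size_mb <= MAX_STAGING_MB:
        raise ValueError(
            f"staging size must be between {MIN_STAGING_MB} MB "
            f"and {MAX_STAGING_MB} MB"
        )
    return size_mb * KB_PER_MB


if __name__ == "__main__":
    print("default:", staging_limit_kb(DEFAULT_STAGING_MB), "KB")
    print("4 GB:   ", staging_limit_kb(4 * 1024), "KB")
```

Under these assumptions the default of 660 MB corresponds to 675,840 KB.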

To avoid an interruption to FRS replication when the Staging Directory is full, construct replica connections that have comparable bandwidth for all outbound partners. It is also a good idea to balance the bandwidth for inbound and outbound connections.

Locating the Staging Directory on a partition or disk drive that does not affect the operating system or other critical operations provides the best replication performance.

Determining the correct staging size for a given dataset

The ratio of staging area size to data set size depends on the following factors.

Factor: Largest file to be replicated
Description: The largest file that FRS can replicate is determined by the staging area size on both the upstream and downstream computers. Therefore, the largest possible file that FRS can replicate is 2 terabytes, when the staging area size has been set to this maximum value.

Factor: If using Windows 2000 SP2 or later
Description: Windows 2000 SP2 and later compress the data in the staging area. Some file types (text, some binaries, and documents) are more compressible than others (for example, compressed archives and multimedia files).

Factor: If using Windows 2000 SP3 or later
Description: If using Windows 2000 SP2 or earlier, FRS stops replicating if the staging area runs out of free space. This means that if a replica set member goes offline for an extended period of time, it can block replication on an upstream member because the staging area fills up. Therefore, use a generous estimate for staging area size. Windows 2000 SP3 and later have an updated staging file management algorithm. On these systems, when FRS tries to allocate space for a staging file and is not successful (because either there is not enough space or because the amount of space in use has reached 90 percent of the staging space limit parameter), FRS starts to delete staging files. Staged files are deleted (in the order of the longest time since the last access) until the amount of space in use has dropped below 60 percent of the staging space limit parameter. Consequently, it is not as critical to use as generous an estimate for staging area size as it was for pre-SP3 systems, but it is still advised to do so in order to prevent disk/CPU performance being consumed by repeatedly staging and deleting files.

Factor: Production rate (how much change needs to be replicated)
Description: FRS replicates whole files that have been changed. Therefore, the rate of change is sum (sizes of files modified), not sum (size of changes to files). On systems earlier than Windows 2000 SP2, there is also the issue of multiple changes to the same file. FRS can enter a file into the staging area multiple times, once for each time it was written and closed (but note that the FRS 'aging cache' prevents more than one change order and staging file from being generated within three seconds).

Factor: Consumption rate (ability of downstream computers to accept files)
Description: The staging area space for a file is eventually released when all outbound partners have received the staged file. The ability of downstream partners to accept files is a key factor in determining staging area size. Sub-factors include:
• Replication schedule between partners: How long must the files wait for a chance to be replicated?
• Availability of partners: Issues like planned or unplanned downtime can cause backlogs. An outbound partner that has not connected for a while can cause a lot of staging space to be required.
• Bandwidth available between partners: How long will it take to replicate the files?
• Number of downstream partners: The staging space that FRS needs is determined by the slowest partner.




Other rules to remember include:

• The quality of monitoring of replication backlogs is an important issue. If replication backlogs are not carefully monitored, then the staging area can be exhausted (with Windows 2000 Service Pack 2) or “churn” and consume disk/CPU usage (with Windows 2000 Service Pack 3 or later).

• If using SP2 or earlier, “unnecessary” replication can be provoked by antivirus or file system policy.

• When adding a new member, FRS on the upstream partner needs to generate special “directed” staging files that will be used to replicate only to the new member. FRS throttles staging space usage in this scenario, but it requires additional staging space to support up to 128 additional outstanding staging files per new downstream partner during this process. The amount of space this uses depends on the size of the files currently awaiting replication. In the worst case, it would be the 128 largest files in the replica set. Note that the SP2 compression and the SP3 least recently used (LRU) algorithm also apply to these staging files, which eases demands on staging space.
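The worst case for directed staging files can be estimated from the sizes of the files in the replica set. The sketch below is a rough upper bound under stated assumptions: it ignores staging compression and LRU cleanup, it assumes every new downstream partner needs its own set of directed staging files, and the function name is hypothetical.

```python
import heapq

MAX_DIRECTED_STAGING_FILES = 128  # outstanding directed staging files per new partner


def worst_case_directed_staging_kb(file_sizes_kb, new_partners=1):
    """Estimate the extra staging space needed while new members are added.

    file_sizes_kb: sizes, in KB, of the files in the replica set.
    Returns the combined size of the 128 largest files, multiplied by the
    number of new downstream partners being seeded at the same time.
    """
    if new_partners < 0:
        raise ValueError("new_partners must not be negative")
    largest = heapq.nlargest(MAX_DIRECTED_STAGING_FILES, file_sizes_kb)
    return sum(largest) * new_partners


if __name__ == "__main__":
    sizes_kb = list(range(1, 201))  # 200 hypothetical files, 1 KB to 200 KB
    print(worst_case_directed_staging_kb(sizes_kb), "KB for one new partner")
    print(worst_case_directed_staging_kb(sizes_kb, 2), "KB for two new partners")
```

Compare the result with the free space that remains under the staging space limit before you add several members at the same time.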




NTFS Junction Points in SYSVOL

Windows NT file system (NTFS) version 5.0 is required for FRS because junctions are used within the SYSVOL folder. An underlying "reparse point" allows NTFS to transparently remap an operation to the target object.

Use LINKD to recreate the junction points


The System Volume (SYSVOL) is a shared directory that stores the server copy of the domain's public files that must be shared for common access and replication throughout a domain. The SYSVOL folder on a domain controller contains the following items:

• Net Logon shares. These typically host logon scripts and policy objects for network client computers.

• User logon scripts for domains where the administrator uses Active Directory Users and Computers.

• Windows Group Policy.

• File replication service (FRS) staging folder and files that must be available and synchronized between domain controllers.

• File system junctions.

File system junctions are used extensively in the SYSVOL structure and are a feature of NTFS file system 3.0. You must be aware of the existence of junction points and how they operate so that you can avoid data loss or corruption that may occur if you modify the SYSVOL structure.

SYSVOL uses junction points to manage a single instance store. Junction points are also referred to as reparse points (directory junctions and volume mount points). A junction point is a physical location on a hard disk that points to data that is located elsewhere on your hard disk or on another storage device. Junction points are created when you create a mounted drive.




In a single instance store, the physical files only exist one time on the file system. However, in SYSVOL, the physical files are located in the following locations:

• Sysvol\Domain

-or-

• Sysvol\Staging\Domain

The additional folder structures are reparse points that redirect file input/output to the original locations. The following table lists the folders in Sysvol that contain junction points and the locations to which these junction points resolve:

SYSVOL Folder                           Contents
SYSVOL\Domain                           Policies and NETLOGON
SYSVOL\Staging                          Staging folder
SYSVOL\Staging areas\<DnsDomainName>    <Junction> to SYSVOL\Staging
SYSVOL\Sysvol\<DnsDomainName>           <Junction> to SYSVOL\Domain

This configuration maintains data consistency by making sure that a single instance of the data set exists. Additionally, this configuration permits more than one access point for the data set. For example, having both SYSVOL\Domain and SYSVOL\Sysvol\contoso.com allows for redundancy but does not allow for duplicate files. Junctions graft the namespace (any bounded area in which a specific name can be resolved) of the destination file system location to an NTFS volume. An underlying reparse point permits NTFS to transparently remap an operation to the destination object. As a result, if you modify the data in the SYSVOL structure, changes occur directly on these physical files.

Additionally, if you perform a cut-and-paste or copy-and-paste operation on folders in the SYSVOL structure that contain junction points, the operation acts on the junction point information. It is recommended that you avoid cut-and-paste and copy-and-paste operations on the SYSVOL structure, especially when you perform the paste operation on the same server. Such an operation creates a copy of the junction point information only; it does not create a copy of the actual data. If you modify any of the files that appear in that folder, you modify the source files directly.

Also, do not modify the SYSVOL structure without understanding the behavior of junction points and how these points affect Active Directory in your enterprise. This recommendation applies to backup and restore operations of the SYSVOL structure. By default, if you back up SYSVOL by using NTBackup.exe, the backup file includes a backup of the folder's junction point information. If you restore a SYSVOL structure from a backup file to a different location on the same server, do not restore the junction point information. To exclude it, use the advanced restore options.

Note: Under Windows Server 2003, if you copy %systemroot%\SYSVOL, you do not copy the junction points. However, under Windows 2000, if you copy %systemroot%\SYSVOL, you do copy the junction points.

Junction points can be managed and re-created by using a resource kit tool called LINKD. The following article describes how to use the LINKD tool to re-create SYSVOL junction points.

Reference: For more information, see the following Knowledge Base article: 315457 "How to rebuild the SYSVOL tree and its content in a domain."

The following articles describe SYSVOL and Junction Points in detail.

Reference: For more information, see the following Knowledge Base articles: 324175 "Best Practices for Sysvol Maintenance" and 186750 "Usage of NTFS 5.0 Junctions in the Sysvol Folder."




Intersite vs. Intrasite Replication for SYSVOL

Scheduled replication: site link replication interval and cost (for example, 15 min or 180 min)

Automatic replication: change notification (immediate to all partners)

FRS replicates SYSVOL using the same intrasite connection objects and schedule built by the KCC for Active Directory replication. The connection object schedule is an attribute associated with each connection object. The connection object schedule contains a 7x24 array of bytes, one byte for each hour in a seven-day week. The low four bits of each byte are used to indicate the number of times replication is attempted in that hour. The upper four bits of each byte are reserved for future use. Intrasite SYSVOL replication occurs once per hour by default, unless changes are made that trigger replication. When information in the SYSVOL folder is modified, FRS replicates the information immediately to all replication partners.

The example connection object schedule below would trigger both SYSVOL and Active Directory replication. Insight about the replication schedule can be gathered using ntfrsutl as this command displays a 24-hour by seven-day schedule in Coordinated Universal Time (UTC) time beginning on Sunday. An example of this could be:

Day 1: 000000000000000000000000 SUN
Day 2: 000000005555555555000000 MON
Day 3: 111111111111111111111111 TUE
Day 4: 000000000000000000000000 WED
Day 5: 100010001000100010001000 THU
Day 6: 100100100100100100100100 FRI
Day 7: FFFFFFFFFFFFFFFFFFFFFFFF SAT




The array of hexadecimal digits shows one digit per hour in a week. The number of bits set in each hex digit indicates the number of times a replication cycle is initiated in that hour, where:

F = replication 4x per hour
5 = replication 2x per hour
1 = replication 1x per hour
0 = no replication scheduled
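The bit-counting rule can be checked with a short script. This is only an illustrative sketch: the function names are made up for this example, and the rows are copied from the sample schedule rather than read from a live connection object.

```python
def replications_per_hour(hex_digit: str) -> int:
    """Count the bits set in one schedule hex digit (0-F)."""
    return bin(int(hex_digit, 16)).count("1")


def decode_day(row: str) -> list[int]:
    """Return the number of replication cycles for each hour in one day row."""
    return [replications_per_hour(digit) for digit in row]


# Rows copied from the sample schedule (one hex digit per hour, UTC).
schedule = {
    "MON": "000000005555555555000000",
    "TUE": "111111111111111111111111",
    "THU": "100010001000100010001000",
}

for day, row in schedule.items():
    counts = decode_day(row)
    print(day, "hours with replication:", sum(1 for c in counts if c),
          "cycles per day:", sum(counts))
```

Summing the per-hour counts gives a quick view of how busy each day is, which can help when comparing schedules between connection objects.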

Intersite replication for SYSVOL is determined by the same Active Directory connection object schedule. However, intersite replication taking place over a slow wide area network (WAN) link can consume valuable bandwidth during peak business hours. For this reason, administrators might want to modify the replication schedule for intersite replication to take place after peak hours. Doing so involves overriding the topology and schedule set on connection objects by the Intersite Topology Generator (ISTG). You can override the intersite KCC by modifying the default replication schedule for SYSVOL.

Scheduling Replication for DFS Replicas

For DFS replica sets, FRS uses the connection objects, topology, and schedule built by the DFS administrative snap-in. However, a schedule for a DFS replica set can be assigned to the connection object or to the replica set itself. Generally, it is preferable to change the schedule on the connection object because this will override a schedule assigned to a replica set. However, assigning a schedule to the replica set might be more appropriate for a replica set with a large number of replicas. For example, it would be a tedious process to configure the schedules on all of the connection objects for a replica set with more than 100 members.

You schedule replication for a DFS replica set by making replication either available or unavailable for each time window. FRS starts replicating to the outbound partners when the window opens and stops replicating when the window closes, even if not all of the files have been replicated. This allows organizations to schedule replication for times when network bandwidth is available. For example, if a user dumps a 1-GB file into a replica set, it might be desirable to schedule replication during nonpeak hours.




FRS and Active Directory

[Slide diagram. Objects shown: File Replication Service container, NTFRS Settings object, NTFRS Replica Set object(s), NTFRS Member object(s), Computer object, NTFRS Subscriptions container, NTFRS Subscriber object. The MemberRef attribute identifies the replica sets this computer “subscribes” to. The ComputerRef attribute points to the computer object representing the member’s machine.]

For replication to occur, FRS requires information from Active Directory. SYSVOL replication occurs only between domain controllers; however, both domain controllers and member servers can participate in DFS replication. Therefore, both domain controllers that are members of DFS replica sets and domain controllers that are contacted by DFS member servers must contain the required objects.

Active Directory replication must be successful in order for the required information to reside on all domain controllers in the same domain. For FRS to function properly, certain critical objects (as well as their attributes and parent containers) must exist in Active Directory.

FRS Member Object

A member object is a replica set’s link to the computer object. Each member object has one or more connection objects that specify the inbound partners for the member.

For SYSVOL replica sets, the member objects are created once the reboot during Active Directory installation is completed.

The member objects can be viewed in Active Directory Users and Computers by turning on Advanced Features from the View menu and examining the system folder.

SYSVOL

The table below describes the location of the different types of objects created by the file replication service for the system volume and shows a pair of computers.




DN                             Object Class
DC=Contoso, DC=COM             Root Domain NC
CN=SYSTEM                      Container
CN=File Replication Service    nTFRSSettings
CN=Domain System Volume        nTFRSReplicaSet
CN=ROOTDNS                     nTFRSMember
CN=ROOTDC01                    nTFRSMember

The FRS replica set object (nTFRSReplicaSet) identifies the replica set name and functions as the parent container for member computers of the replica set. It has attributes that describe the file and folder filters (FrsFileFilter and FrsDirectoryFilter).

A member object (nTFRSMember) exists for each computer that is a member of a DFS or SYSVOL FRS replica set. The member object has a reference to the distinguished name (DN) of the member’s computer object. For members of SYSVOL replica sets, there is an additional Server_Reference attribute that points to the DN of the NTDS-Settings object for the server. This attribute allows FRS to access the replication topology. Without it pointing to a valid NTDS-Settings object, replication will not occur from that particular computer. Potential causes of failure include domain controllers being retired without being demoted and administrators deleting the NTDS-Settings object or removing servers using NTDSUTIL.

Missing serverReference attributes can be repaired by using LDP or ADSIEDIT to reset the value to the DN of the server’s NTDS settings object in the configuration Naming Context. Once repaired, FRS will pick up the change at the next poll of the directory.

DFS Replica Sets

Like SYSVOL, member objects exist for each computer that is a member of a DFS replica set. There are two important distinctions:

• The Server_Reference attribute is NULL by default.

• Connection objects and associated schedule reside under the member objects in the domain NC rather than the configuration NC.

DN                             Object Class
DC=Contoso, DC=COM             Root Domain NC
CN=SYSTEM                      Container
CN=File Replication Service    nTFRSSettings
CN=DFS Volumes                 nTFRSSettings
CN=DFSRoot                     nTFRSSettings
CN=Apps                        nTFRSReplicaSet
CN=Student1                    nTFRSMember
CN=Student2                    nTFRSConnection
CN=Student2                    nTFRSMember
CN=Student1                    nTFRSConnection

FRS Subscriber Objects

FRS Subscriber Objects tell the File Replication Service to which replica set the computer belongs. The subscriber object also contains a reference to the FRS member object, which FRS reads to determine whether the computer is still a member of the replica set and to obtain a list of inbound partner connections. FRS polls Active Directory periodically in order to identify changes to any of these objects, and it updates its configuration as needed.

FRS Connection Object

Connection objects represent a physical connection to a data store. For SYSVOL, FRS uses connection objects created by the KCC or connection objects created manually through the Active Directory Sites and Services administrative console, scripts, or a manual topology generation tool. The KCC is run periodically on Active Directory domain controllers to optimize and adjust the topology for failed computers or lost connections. When FRS is used to replicate DFS shares, connection objects are created using the DFS administrative console.

Connection Object Schedule

Each connection object has a schedule attribute that defines when the inbound replication partner will replicate changes. FRS replicates SYSVOL using the same connection object topology and schedule created by the KCC for Active Directory replication. For DFS replica sets, FRS uses the connection object topology and schedule created by DFS.




FRS Polling Intervals

• FRS configuration is held in Active Directory.

• Active Directory replication propagates FRS configurations.

• When FRS starts up, it first polls Active Directory for configuration changes that might have been made while the computer was offline.

• Then FRS determines the replica set subscribers, or inbound and outbound partners, for each replica set.

• FRS polls Active Directory at regular intervals for configuration changes.

• If no computer object exists, replication will not take place. The service will continue to poll until a computer object has been created.

When FRS starts up, it first polls Active Directory for configuration changes that might have been made while the computer was offline. Then FRS determines the replica set subscribers, or inbound and outbound partners, for each replica set. For this reason, it is important to keep FRS running on domain controllers and DFS replica members to ensure that these configuration changes are made.

When a computer is online, FRS polls Active Directory at regular intervals for configuration changes that affect its partner relationships. The objects polled are the computer objects and subscriber objects. If no computer object exists, replication will not take place. The service will continue to poll until a computer object has been created.

The default polling interval for a replica member server is determined by one of two registry values: DS Polling Long Interval in Minutes or DS Polling Short Interval in Minutes. When no DFS configuration changes are occurring in Active Directory, the polling interval is determined by the value of DS Polling Long Interval in Minutes, which defaults to 60 minutes. When configuration information changes, the polling interval is reset to the value of DS Polling Short Interval in Minutes, which defaults to 5 minutes. Once the configuration has stabilized, the polling interval is reset back to DS Polling Long Interval in Minutes.

The local computer will reset the polling interval automatically when certain events take place.




Events that reset the polling interval include:

• Adding a replica.

• Deleting a replica.

• Adding a connection.

• Deleting a connection.

• Changing a schedule.

• Changing a file or folder filter.

It is possible to alter the default settings for these polling intervals. However, the DS Polling Long Interval in Minutes and the DS Polling Short Interval in Minutes registry values are not visible in the registry by default and must be manually added before these settings can be changed.

HKLM\System\CurrentControlSet\Services\NtFrs\Parameters

Value Name: DS Polling Long Interval in Minutes

Value Type : DWORD

Max : 35000

Min : 1

Default : 60

HKLM\System\CurrentControlSet\Services\NtFrs\Parameters

Value Name: DS Polling Short Interval in Minutes

Value Type : DWORD

Max : 35000

Min : 1

Default : 5

If there have been no configuration changes in the directory after the completion of eight short polling intervals, FRS will automatically begin polling according to the DS Polling Long Interval In Minutes default value of 60 minutes, unless another value has been specified in the registry.

FRS running on a domain controller will always poll according to the default value of the DS Polling Short Interval In Minutes unless another value has been specified in the registry.
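The interval selection rules described in this section can be summarized in a small model. This is a sketch of the documented behavior only, not FRS code; the function name and the `quiet_short_polls` counter are hypothetical, and the defaults are the registry defaults listed above.

```python
SHORT_DEFAULT = 5             # DS Polling Short Interval in Minutes (default)
LONG_DEFAULT = 60             # DS Polling Long Interval in Minutes (default)
SHORT_POLLS_BEFORE_LONG = 8   # short polls without changes before switching


def next_poll_interval(is_domain_controller: bool,
                       quiet_short_polls: int,
                       short: int = SHORT_DEFAULT,
                       long: int = LONG_DEFAULT) -> int:
    """Return the minutes until the next directory poll.

    quiet_short_polls is a hypothetical counter: the number of consecutive
    short polling intervals completed without a configuration change.
    """
    if is_domain_controller:
        # Domain controllers always use the short interval.
        return short
    if quiet_short_polls >= SHORT_POLLS_BEFORE_LONG:
        # Configuration has stabilized on a member server.
        return long
    return short


print(next_poll_interval(is_domain_controller=True, quiet_short_polls=20))
print(next_poll_interval(is_domain_controller=False, quiet_short_polls=8))
print(next_poll_interval(is_domain_controller=False, quiet_short_polls=2))
```

With the default values, a member server that has just seen a configuration change would take about 40 minutes (eight 5-minute polls) to fall back to the long interval.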

Ntfrsutl.exe can also be used to view or modify directory services (DS) polling intervals.




FRS Tables and Logs

Connection table        Inbound/outbound partner connection
Version vector table    Measures how up-to-date a replica is compared to another replica
ID table                Lists files in the replica set
Inbound log             Pending change orders to be processed
Outbound log            Pending change orders to be sent to outbound partners

FRS transactions are stored in a Microsoft Jet database whose default location is %SystemRoot%\Ntfrs\Jet\Ntfrs.jdb. Each computer hosting a replica has a set of three tables and two logs stored in the Ntfrs.jdb file. These include:

• A connection table. This contains one record per inbound/outbound partner connection.

• A version vector table. This is a table that measures how up-to-date a replica is compared to another replica. Each replica member in a replica set is assigned a number. When an inbound partner joins a replica set for the first time, its number is added to the version vector. This process is referred to as a version vector join. A version vector join also occurs when FRS replicates or when the outbound log wraps.

• An ID table. This lists all files in the replica set of which FRS is aware. Data stored in the ID table includes a globally unique identifier (GUID), file name ID, parent file ID, file object ID, parent object ID, version number, and event time.

• An inbound log. This stores pending change orders to be processed. As entries are processed, acknowledgments are sent to the inbound partners. Data stored in the inbound log includes the change order's GUID, file name, object ID, parent object ID, version number, and event time. During a planned shutdown, all new change orders to come in after the last update are written to the inbound log. If an unplanned shutdown or network interruption occurs, the inbound partner resends all the change orders in its outbound log for which acknowledgments have not yet been received.




• An outbound log. This stores pending change orders to be sent to outbound partners. Change orders remain in the outbound log until all outbound partners receive and acknowledge the change. Data stored in the outbound log is the same as that stored in the inbound log. Also in the outbound log is the leading (next change) and trailing (last acknowledged) index for each partner. The outbound log or logs can become quite large, particularly when replicas are down, links between replicas are slow, replication hours are restricted, or a large number of changes occur. For example, if one of four replicas is down, snapshots of the file image and log entries are maintained until this server becomes available. Any one of these circumstances can cause the outbound log to grow or become full.

The NTFS USN journal logs all changes made to files or directories on an NTFS volume. When the journal becomes full, it will wrap, causing logged information to be deleted. When the changes are finally sent, the inbound partner sends all changes in log file order.

If an inbound partner performs a non-authoritative restore, or trims its outbound log because the outbound partner has been offline for so long that excessive space is being taken up in the staging directory, the last acknowledged change order for a replica partner is overwritten, and a complete synchronization must be completed for the replica. This involves the outbound partner sending its version vector with the changes it has received to the inbound partner. The inbound partner checks its ID table by using this state to determine what changes occurred afterward and sends them.

In case of a system failure, changes are always logged in the Jet database for recovery purposes before any disk files are moved.




FRS Logs

• Log Settings: Debug Log Files, Debug Log Severity, Debug Maximum Log Messages

• Analyzing Log Records

FRS creates text-based logs in the %SystemRoot%\Debug directory to help debug problems. The Ntfrsapi.log file contains events that take place during promotion and demotion, namely the creation of the subkeys in the NTFRS registry subkey.

To observe a particular event, take a snapshot of the log files as close to the occurrence of the event as possible. Save the log files in a different location so they can be examined afterward.

Log Settings

The Ntfrs log files store transaction and event detail in sequentially numbered files: Ntfrs_0001 through Ntfrs_0005. Transactions and events are written to the log with the highest sequence number in existence at that time. The characteristics of the log files are determined by the values of several registry entries in the following subkey:

HKLM\SYSTEM\CurrentControlSet\Services\NtFrs\Parameters

After the number of logs specified by the value of the Debug Log Files registry entry has been filled, the lowest-numbered log is deleted and the sequence number in each remaining log file name is decreased by one (n becomes n–1) to make room for a new log file.

Log detail is controlled by the value of the Debug Log Severity registry entry, ranging from 0 to 5, with 5 providing the most detail. Log size is determined by the value of the Debug Maximum Log Messages registry entry. The default value of 20,000 lines for Debug Maximum Log Messages results in a roughly 2 MB log file, for a total of 10 MB of logs (Debug Log Files * [Debug Maximum Log Messages * 120]). Setting Debug Maximum Log Messages to 50,000 results in a 5 MB log file and 25 MB of total log space with default settings.

To change the quantity, size, or level of detail of FRS log files, edit the values of the registry entries. Before you increase either the size or the quantity of log files, make sure that sufficient disk space is available. In general, budget 1 MB for each 10,000 messages.
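Before changing these registry values, the disk space needed can be estimated from the budgeting rule above. The sketch below assumes roughly 100 bytes per message (1 MB per 10,000 messages); that figure is a planning approximation taken from the rule of thumb, not a measured value, and the function name is invented for this example.

```python
BYTES_PER_MESSAGE = 100  # assumption: 1 MB budgeted for each 10,000 messages


def estimate_log_space_mb(log_files: int, max_messages: int) -> float:
    """Estimate total FRS debug log space in MB (1 MB = 1,000,000 bytes).

    log_files    -> value planned for Debug Log Files
    max_messages -> value planned for Debug Maximum Log Messages
    """
    return log_files * max_messages * BYTES_PER_MESSAGE / 1_000_000


# Default settings: 5 log files of 20,000 messages each.
print(estimate_log_space_mb(5, 20_000))
# Larger log files with the default number of logs.
print(estimate_log_space_mb(5, 50_000))
# Many log files, as when capturing an intermittent event.
print(estimate_log_space_mb(50, 20_000))
```

The first two calls reproduce the 10 MB and 25 MB totals quoted in the text; the third shows that raising the number of log files to 50 at the default size would need on the order of 100 MB.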

To capture a random or intermittent event, expand FRS logging capability. For example, increase the number of log files to 50 and then archive the files when they become full. This accumulates the history needed to respond to overnight queries from users.

Depending on the problem that is being investigated, it might be necessary to review logs on multiple member computers. System clocks must be synchronized so that events can be correlated between replication partners.

Finally, the recovery setting for the FRS service in service control manager (SCM) can be critical to locating and keeping important log events on the system. If the service is asserting but SCM is configured to automatically start FRS upon error, enough log traffic might be generated to cause events in Ntfrs_0005.log to decrease and be deleted from the drive. Stop the service on both the inbound and outbound replicas close to the time when an error occurs, and then copy the logs to a safe place.

If debugging replication problems is not a priority, disable logging to reduce disk traffic.

Analyzing Log Records

Solving problems with NTFRS logs requires ensuring that the value of the Debug Log Severity registry entry is set high enough to capture the events needed to identify the problem. Severity settings range from 0 to 5 and are cumulative, meaning that a setting of 4 also includes log events with a severity of 0 to 3. Error logs hold a severity setting of 0 and are always displayed. Setting the severity level at 5 will cause every action to be logged and can make finding more important information difficult. The default log setting is 2.

To identify errors, warning messages, and milestone events in the log files more easily, filter out less important information. FRS log content is designed to be filtered easily. All log records have a date and time stamp and an identifying string that consists of a letter or pair of letters enclosed between two colons. For example, a log entry that has :U: as an identifying string includes information related to the NTFS journal. A log entry containing information about directory service polling will have an identifying string of :DS:. A tracking log entry has an identifying string of :T: and summarizes a change order that has finished or is in the process of updating a given file or directory.

Tracking log entries can be a helpful way to identify and understand problems that can occur during the change order process. A tracking log entry will tell you what files have been changed and where the change originated.
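Filtering by identifying string is easy to script. The sketch below assumes the record layout used in this workbook's samples (a time stamp, then the identifying string surrounded by spaces); the helper names are invented for this example, and the :DS: and :U: sample records are hypothetical placeholders rather than real log output.

```python
import re

# One or two letters between colons, with whitespace on both sides,
# for example " :T: " or " :DS: ". Colons inside the time stamp are not
# matched because they are surrounded by digits.
TAG = re.compile(r"\s:([A-Za-z]{1,2}):\s")


def record_tag(line):
    """Return the identifying string of a log record, or None if absent."""
    match = TAG.search(line)
    return match.group(1) if match else None


def filter_by_tag(lines, wanted):
    """Keep only the records whose identifying string equals `wanted`."""
    return [line for line in lines if record_tag(line) == wanted]


sample = [
    "7/31-08:40:08 :T: CoG: d42cda60 CxtG: 000001b7 [RemCo ] Name: test_file",
    "7/31-08:40:09 :DS: hypothetical directory service polling record",
    "7/31-08:40:10 :U: hypothetical NTFS journal record",
]

for line in filter_by_tag(sample, "T"):
    print(line)
```

In practice the list of lines would come from reading one of the Ntfrs log files that were copied to a safe location.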

The following tracking log entry describes a remote change order that is creating a new file called test_file in Replica-A. The version number is zero.




7/31-08:40:08 :T: CoG: d42cda60 CxtG: 000001b7 [RemCo ] Name: test_file
7/31-08:40:08 :T: EventTime: Mon Jul 31, 2000 08:40:04 Ver: 0
7/31-08:40:08 :T: FileG: ceff96a6-5c9f-433a-989c841454a1593b FID: 61a70000 0000036c
7/31-08:40:08 :T: ParentG: 1a89f4e1-a0c0-43e4-aedbe869f767f372 Size: 00000000 00000008
7/31-08:40:08 :T: OrigG: 2eea81b4-f92d-4941-9f269d4bbdd7ea05 Attr: 00000020
7/31-08:40:08 :T: LocnCmd: Create State: IBCO_COMMIT_STARTED ReplicaName: Replica-A (1)
7/31-08:40:08 :T: CoFlags: 0000040c [Content Locn NewFile ]
7/31-08:40:08 :T: UsnReason: 00000002 [DatExt ]

The individual fields that comprise the tracking log entry are described below. In some cases, only the first DWORD of a GUID is actually displayed in the log.

Identifier Description

:T: Identifying string.

CoG: Change order GUID - Uniquely identifies a create/delete/rename/ modify action for a file.

CxtG: Connection GUID - Identifies the connection object in the topology connecting an upstream computer to the computer that delivered this change order.

[ ] – RemCo Identifies a remote change order.

[ ] – RemCo, Abort Identifies a remote change order that was aborted.

[ ] – LclCo Identifies a local change order.

[ ] – LclCo, Abort Identifies a local change order that was aborted.

Name: File name.

EventTime: Time on the originating member at which the change was performed.

Ver: Version number of the file. Increases by one each time a local change order is created.

FileG: File GUID - Uniquely identifies the file or directory and is used as the NTFS object ID on the file or directory. The corresponding file/directory on each replica member has the same File GUID.

FID: File ID - The NTFS volume-specific file ID (File Reference Number).

ParentG: Parent GUID - The GUID of the parent directory that contains this file or directory.

Size: The approximate size of the file or directory noted in hexadecimal.

OrigG: Originator GUID - The GUID associated with the member of the replica set that originated this update.

Attr: File attributes - The attribute flags for the file/directory.

LocnCmd: Location command - One of Create, Delete, NoCmd, MoveDir; indicating that the file is being created, deleted, updated, or is changing parent directories.

State: The change order state - One of IBCO_STAGING_RETRY, IBCO_FETCH_RETRY, IBCO_INSTALL_RETRY, IBCO_COMMIT_STARTED; indicating that the change order is being retried later because of insufficient staging space, inability to complete the fetch of the staging file, or inability to install the change to the file. Finished change orders have a state of IBCO_COMMIT_STARTED.

ReplicaName: The name of the replica set containing this file or directory.

CoFlags: Change order flags:
    Abort - Set when CO is being aborted.
    VVAct - Set when VV activate request is made.
    Content - Valid content command.
    Locn - Valid location command.
    LclCo - CO is locally generated.
    Retry - CO needs to retry.
    InstallInc - Local install not completed.
    Refresh - CO is an upstream-originated file refresh request.
    OofOrd - Do not check/update version vector.
    NewFile - If CO fails, delete IDTable entry.
    DirectedCo - This CO is directed to a single connection.
    DemandRef - CO is a downstream demand for refresh.
    VVjoinToOri - CO is from vvjoin to originator.
    MorphGen - CO generated as part of name morph resolution.
    MoveinGen - This CO was generated as part of a sub-dir MOVEIN.
    OidReset - All CO did was reset the object identifier back to FRS defined value.
    CmpresStage - The stage file for this CO is compressed.

UsnReason: Flags set in the NTFS change log describing modifications to the file:
    Close - Change log close record.
    Create - File or directory was created.
    Delete - File or directory was deleted.
    RenNew - File or directory was renamed.
    DatOvrWrt - Main file data stream was overwritten.
    DatExt - Main file data stream was extended.
    DatTrunc - Main file data stream was truncated.
    Info - Basic info change (attrib, last write time, and so forth).
    Oid - Object ID change.
    StreamNam - Alternate data stream name change.
    StrmOvrWrt - Alternate data stream was overwritten.
    StrmExt - Alternate data stream was extended.
    StrmTrunc - Alternate data stream was truncated.
    EAChg - Extended file attribute was changed.
    Security - File access permissions changed.
    IndexableChg - File change requires reindexing.
    Hlink - Hard link change.
    CompressChg - File compression attribute changed.
    EncryptChg - File encryption changed.
    Reparse - Reparse point changed.
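When many tracking entries have to be compared, it can help to turn each entry into a dictionary keyed by the identifiers in the table above. The following is a minimal sketch that assumes the "Identifier: value" layout of the sample entry in this section; the function names are invented for this example, and real logs may contain variations that this simple pattern does not handle.

```python
import re

# A field starts with an alphabetic identifier followed by a colon and space.
FIELD = re.compile(r"\b([A-Za-z]+):\s+")

ENTRY_LINES = [
    "7/31-08:40:08 :T: CoG: d42cda60 CxtG: 000001b7 [RemCo ] Name: test_file",
    "7/31-08:40:08 :T: EventTime: Mon Jul 31, 2000 08:40:04 Ver: 0",
    "7/31-08:40:08 :T: LocnCmd: Create State: IBCO_COMMIT_STARTED ReplicaName: Replica-A (1)",
    "7/31-08:40:08 :T: CoFlags: 0000040c [Content Locn NewFile ]",
]


def parse_tracking_line(line):
    """Split one ':T:' record into {identifier: value}."""
    _, _, body = line.partition(" :T: ")
    matches = list(FIELD.finditer(body))
    fields = {}
    for i, match in enumerate(matches):
        # A value runs up to the start of the next identifier (or line end).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(body)
        fields[match.group(1)] = body[match.end():end].strip()
    return fields


def parse_tracking_entry(lines):
    """Merge the records that make up one tracking log entry."""
    entry = {}
    for line in lines:
        entry.update(parse_tracking_line(line))
    return entry


def split_tags(value):
    """Separate a value from its bracketed tags, e.g. '0000040c [Content ]'."""
    head, _, rest = value.partition("[")
    return head.strip(), rest.rstrip("] ").split()


entry = parse_tracking_entry(ENTRY_LINES)
print(entry["Name"], entry["Ver"], entry["State"])
print(split_tags(entry["CoFlags"]))
```

Values that carry bracketed tags, such as CxtG and CoFlags, keep the brackets in the parsed value; `split_tags` separates the number from the tag names when that is needed.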




The following tracking log entry describes a local change order (a change order originating on the computer where the log was produced) that is updating the same file, test_file. The version number is now 1. Notice that the originator GUID is different from that of the tracking log entry above. The File GUID and parent GUID of both log entries are the same for both change orders because the same file is involved and it has not changed parent directories.

7/31-08:56:55 :T: CoG: cd55ad6f CxtG: 37b12c93 [LclCo ] Name: test_file
7/31-08:56:55 :T: EventTime: Mon Jul 31, 2000 08:56:52 Ver: 1
7/31-08:56:55 :T: FileG: ceff96a6-5c9f-433a-989c841454a1593b FID: 61a70000 0000036c
7/31-08:56:55 :T: ParentG: 1a89f4e1-a0c0-43e4-aedbe869f767f372 Size: 00000000 00000200
7/31-08:56:55 :T: OrigG: 8f759ded-e611-43c4-be05c10138dfdea4 Attr: 00000020
7/31-08:56:55 :T: LocnCmd: NoCmd State: IBCO_COMMIT_STARTED ReplicaName: Replica-A (1)
7/31-08:56:55 :T: CoFlags: 00000024 [Content LclCo ]
7/31-08:56:55 :T: UsnReason: 00000002 [DatExt ]

When searching for specific log entries, a good practice is to start at the bottom of the last log file and work your way up. Focus on keywords such as install, success, and fail. If an error is not found, start at the bottom of the previous log (Ntfrs_0005, then Ntfrs_0004, and so on). Use the findstr command to isolate errors in the log files as follows:

findstr /in ":SO: invalid abort error warn fail" ntfrs_*.log > err.tmp

For a more concise log file, you might want to filter out the following:

findstr /v "IO_PEND_ERROR_SUCCESS FrsErrorSuccess" err.tmp > error.tmp

Depending on the context, some errors (such as "jet attach db – 1811. Db not found") can be ignored because the Ntfrs.jdb file does not exist the first time that FRS starts. Expect to see this error immediately after the Active Directory Installation wizard runs or after you delete the Ntfrs.jdb file manually, until the service creates the file.

Sharing violations, designated by the SHARING_VIOLATION status code, occur when a user or process has a lock on a file or when FRS is attempting to apply an update to a file. A persistent SHARING_VIOLATION might indicate that a file is locked open by a user or computer process and will not replicate. FRS will retry the update until it succeeds. Because FRS tracks only closed files, locked files and directories do not replicate. The net files command might be helpful in identifying files that are locked open or in use. You might also see communication-related failures, such as an unsuccessful attempt to make an RPC call to a member computer that is down or off the network. These can be ignored.

If failure errors are encountered, look at the thread number and follow up all events in the log that have matching thread identifiers until you see the associated change order. However, because FRS is multithreaded, different threads can execute the different stages of change order processing. In this case you will need to search the logs based on other criteria, such as change order GUID, file name, file GUID, or File ID, in order to extract the records related to a given file change.

All change orders are assigned a GUID and all corresponding files and directories have the same file GUID on each replica set member. This information can be used to determine why a file on Computer A has not replicated to a second or third replica by locating the Change order GUID (CoG) or File GUID number in the Ntfrs_00n.log files on the originating server and then searching for the same GUID in the logs on the second and third replicas.
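The cross-server search described above can be sketched as follows. The function names and the in-memory dictionary of log lines are hypothetical; in practice the lines would be read from the Ntfrs_00n.log files copied from each member.

```python
def trace_guid(logs_by_server, guid):
    """Map each server name to the log lines that mention the GUID."""
    needle = guid.lower()
    return {
        server: [line for line in lines if needle in line.lower()]
        for server, lines in logs_by_server.items()
    }


def servers_missing_guid(logs_by_server, guid):
    """List servers whose logs never mention the GUID."""
    traced = trace_guid(logs_by_server, guid)
    return sorted(server for server, hits in traced.items() if not hits)


if __name__ == "__main__":
    logs = {
        "Computer A": [
            "7/31-08:56:55 :T: CoG: cd55ad6f CxtG: 37b12c93 [LclCo ] Name: test_file",
        ],
        "Computer B": ["7/31-08:57:10 :T: unrelated entry"],
    }
    print(servers_missing_guid(logs, "cd55ad6f"))
```

A server that appears in the "missing" list is a reasonable place to start looking for the point where the change order stopped.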




File and Folder Filters

Configuration stored in the Domain Container
• NTFRS-Replica-Set class object
• FrsFileFilter and FrsDirectoryFilter System attributes

By default, files and folders excluded from replication:
• File names starting with a tilde (~) character
• Files with ‘.bak’ or ‘.tmp’ extensions
• EFS encrypted files
• NTFS mount points

Affects only new files, and not the ones already present

File and Folder Filters

File and folder filters are maintained for each FRS replica set, including SYSVOL and domain DFS with FRS replication enabled. They are kept in the Active Directory Domain Container and can be accessed using ADSI Edit, Active Directory Users and Computers, or DFS Microsoft Management Console (MMC) snap-in.

By default, the following files and folders are excluded from FRS replication:
• File names starting with a tilde (~) character
• Files with .bak or .tmp extensions
• NTFS mount points
• Files encrypted with EFS
• All reparse points except RSS and SIS

Note that FRS might need to periodically read every file in the replica set to send the file contents to another computer. This causes FRS to recall all files that Remote Storage has sent to secondary storage, which might take a long time (hours or days). If you use tape for your secondary storage, remember FRS recalls files in directory order rather than media order, so the excessive number of tape seeks performed by FRS could ruin the tapes and cause data loss.

Filters act as exclusion filters only for new files and folders added to a replica set. They have no effect on existing files in the replica set. For example, if you change the existing file filter from "*.tmp, *.bak" to "*.old, *.bak", FRS does not go through the replica set and exclude all files that match *.old, nor does it go through the replica set and begin to replicate all files that match *.tmp. After the filter change, new files added to the replica set matching *.old are not replicated. New files added to the replica set matching *.tmp are replicated.

In addition, any pre-existing file in the replica set that matched the old file filters (such as Test.tmp, created when the old filter was in force) is not automatically replicated when the filter changes. You must explicitly modify such files before they begin replicating. Likewise, you must explicitly delete any pre-existing files in the replica set that match *.old. Until that happens, changes to those files continue to replicate.

These rules apply in the same manner to the directory exclusion filter. If a directory is excluded, all subdirectories and files under that directory are also excluded.

These rules are designed to protect a system from user error. For example, if a filter is accidentally changed to exclude files matching *.doc, FRS does not go through and delete every Microsoft Word file in the replica set. Similarly, if *.tmp is unintentionally omitted from the filter, FRS does not go through each replica and begin replicating every temporary file that it finds.
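The filter behavior for newly added files can be illustrated with a small sketch. It assumes the filter is a comma-separated list of wildcard patterns, as in the "*.tmp, *.bak" example, and that matching is case-insensitive; the function names are hypothetical and this is not how FRS itself is implemented.

```python
from fnmatch import fnmatch


def parse_filter(filter_string):
    """Split a filter such as '*.tmp, *.bak' into individual patterns."""
    return [part.strip() for part in filter_string.split(",") if part.strip()]


def is_excluded(file_name, filter_string):
    """Return True if a NEW file with this name would match the filter."""
    name = file_name.lower()
    return any(fnmatch(name, pattern.lower()) for pattern in parse_filter(filter_string))


if __name__ == "__main__":
    old_filter = "*.tmp, *.bak"
    new_filter = "*.old, *.bak"
    for name in ("Test.tmp", "notes.old", "report.doc"):
        print(name, is_excluded(name, old_filter), is_excluded(name, new_filter))
```

The check models only the decision made when a file is first added. Files already in the replica set keep whatever state they had before the filter changed.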




Version Vector Join (VVJoin)

• A process in which every file in SYSVOL must have an MD5 hash calculated and compared with its replication partner.
• VVJoins are typically caused by a DC establishing a connection with a new partner. This could occur because its normal replication partner becomes unavailable or if a D2 is performed.
• A VVJoin can be a CPU-intensive process for a DC to perform.
• Too many simultaneous VVJoins on a busy bridgehead could cause FRS replication failures.
• As a baseline, avoid removing more than 10 Replication Connection Objects per DC, as 10 VVJoins will take place when new connections are created.

Version Vector Join (VVJoin)

A Version Vector Join (VVJoin) is the process in which a downstream partner joins with an upstream partner for the first time. In a VVJoin, every file in SYSVOL must have an MD5 hash calculated and compared with its replication partner. VVJoins are typically caused by a DC establishing a connection with a new partner. This could occur because its normal replication partner becomes unavailable or if a D2 is performed.

When a new DC joins the domain, a “version vector” is created and distributed from the new DC to each of the other DCs in the domain, to make sure each of the replication partners has the right version of the SYSVOL data. In Windows 2000 Server, the new DC pulls the entire SYSVOL tree from every DC in the domain at the same time, in parallel. This is a serialized process in Windows Server 2003 and Windows 2000 Server SP3+. The new DC will do a Version Vector Join (VVJoin) during promotion. Then, after completion, it will contact other DCs in the domain, one at a time, for changes. If the source DC is up to date, the VVJoin is still done to the others, but no replication takes place.

A VVJoin can be a CPU-intensive process for a DC to perform, and too many simultaneous VVJoins on a busy bridgehead could cause FRS replication failures. Therefore, as a baseline, avoid removing more than 10 Replication Connection Objects per DC, because 10 VVJoins will take place when the connections are recreated.




NTFS Change Journal

• Used by FRS to track changes
• Available on NTFS 5.0 formatted volumes
• Journal exists per volume (C:\, D:\, etc.)
• NTFS Journal size varies by OS version and SP
  Windows 2000 Server: SP1 and SP2: 32 MB, SP3: 128 MB, SP4: 512 MB
  Windows Server 2003: RTM: 128 MB, SP1: 512 MB
• To determine the NTFS Journal size, use ‘fsutil’
• Recommendation is to increase the Journal size by 128 MB for every 100,000 files on disk

NTFS Change Journal

FRS works only with Windows 2000 or Windows Server 2003 NTFS formatted volumes because it relies on the NTFS change journal to provide a persistent (that is, logged) record of files that have changed on a member computer. Files are replicated only after they have been modified and closed. Files that are locked by their owners are not replicated until they are unlocked.

The NTFS Change Journal is a log file that NTFS maintains and which describes the nature of changes that have occurred on the file system. NTFS updates this log transactionally and so it is kept in sync with the file system state, even in the case of a power failure or crash. As a result, FRS does not lose track of a changed file even if the system shuts down abruptly. The Change Journal has a bounded maximum size—if it exceeds the defined size, then NTFS discards a number of the older records in order to keep the journal within the defined size limits.

FRS uses this mechanism to track changes on the file tree being replicated. If items are discarded from the NTFS Change Journal before FRS has processed them, then FRS loses track of the file system state, a condition known as Journal Wrap. In this case, the computer might need to undergo the non-authoritative restore process (also known as the D2 process).




As a rule of thumb, the NTFS Change Journal for an NTFS volume should be sized at 128 MB per 100,000 files being managed by FRS on that NTFS volume. Note that:

• Multiple FRS replication trees may exist on the same NTFS volume. Thus, the size of the journal should be decided based upon the total number of files managed by FRS on that volume

• Other files may also be stored on the same NTFS volume, and file operations on these files will be entered in the Change Journal, even though they are ignored by FRS. A rule of thumb in this case is to allow an extra 8 MB per 100,000 such files, but this is only approximate because it depends upon how much activity occurs on those files.

• For best results, put FRS-related shares on their own NTFS volume (or volumes) so that the journal cannot be affected by other file activity.
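The sizing rule of thumb above can be expressed as a small calculation. This is a sketch only: the function name is hypothetical, and rounding up to a whole megabyte is an assumption made for convenience.

```python
import math


def recommended_journal_mb(frs_files, other_files=0):
    """Estimate a Change Journal size in MB for one NTFS volume.

    frs_files   -- total files managed by FRS on the volume (all replica trees)
    other_files -- files on the same volume that FRS does not manage
    """
    # 128 MB per 100,000 FRS-managed files, plus 8 MB per 100,000 other files.
    total = (128 * frs_files + 8 * other_files) / 100_000
    return math.ceil(total)


if __name__ == "__main__":
    print(recommended_journal_mb(250_000))
    print(recommended_journal_mb(250_000, other_files=100_000))
```

Because the 8 MB allowance depends heavily on how active the other files are, treat the second figure as a starting point rather than a limit.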

The following article describes how to alter the size of the NTFS Change Journal.

Reference: For more information, see the following Knowledge Base article: 221111 "Description of FRS Entries in the Registry."




Section 2: Common FRS Problems

• Understand the most common FRS issues
• Troubleshoot FRS problems

Section 2: Common FRS Problems

Introduction

This section covers the most common problems that affect FRS and how you can detect and troubleshoot these errors.


• Understand the most common FRS issues. • Troubleshoot FRS problems.




Journal Wrap Errors

Journal wrap errors occur if a sufficient number of changes occur too fast for FRS to keep up, or while FRS is turned off:
• During startup, the last change that FRS recorded during shutdown no longer exists in the USN Change Journal.
• Changes might have occurred to files while the service was turned off, and no record of the change exists in the USN journal. To guard against data inconsistency, FRS asserts into a journal wrap state.

Why it happens
• Administrators may stop the FRS service for long periods
• Error conditions may cause the FRS service to shut down
• In large replica sets, it might happen during an authoritative restore (D4)

How to recover from a Journal Wrap State
• Check for Event ID 13568 in the FRS Event Log. The affected replica member will need to be reinitialized with a non-authoritative restore (BURFLAGS=D2).

Journal Wrap Errors

The USN journal is a log of fixed size that records all changes that take place on NTFS 5.0-formatted partitions. NTFRS monitors the NTFS USN journal file for closed files in FRS-replicated directories as long as FRS is running. Journal wrap errors occur if a sufficient number of changes take place while FRS is turned off such that the last USN change that FRS recorded during shutdown no longer exists in the USN journal during startup. The risk is that changes to files and folders in FRS-replicated trees took place faster than the service could process them, or while the service was turned off, and no record of those changes exists in the USN journal. To guard against data inconsistency, FRS asserts into a journal wrap state.

To perform maintenance on FRS replica set members, administrators may stop the FRS service for long periods of time, not realizing the potential impact. In addition, error conditions may cause the FRS service to shut down, resulting in a journal wrap error. In extremely large replica sets, replica members may encounter the "journal_wrap_error" during an authoritative restore (BURFLAGS=D4).

To recover, the affected replica member will need to be reinitialized with a non-authoritative restore (BURFLAGS=D2), in which it synchronizes files from an existing inbound partner. This reinitialization can be time consuming for large replica sets.




Note: Before Windows 2000 SP4, different service pack and hotfix revisions could cause differences in FRS behavior, and some documentation may still reference these differences. The minimum supported revision level of the FRS binaries as of this writing is Windows 2000 SP4.

Appropriate options to reduce journal wrap errors include:

• Place the FRS-replicated content on less busy volumes.
• Keep the FRS service running.
• Avoid making changes to FRS-replicated content while the service is turned off.
• Increase the USN journal size.

FRS is a service that needs to be running at all times on Windows domain controllers and members of FRS-replicated DFS sets. Increasing the USN journal size, and thus the number of changes that it can hold before the journal "wraps," decreases the possibility that a USN journal wrap will occur. The USN journal size can be changed by setting the following registry key:

HKLM\System\CCS\Services\NTFRS\Parameters\"Ntfs Journal size in MB" (REG_DWORD)

This setting applies to all volumes that are hosting an FRS replica tree. You have to stop and then restart the NTFRS service for increases to the USN journal size to take effect. Decreases to the USN journal size can only be made by reformatting all volumes that contain FRS-replicated content. The number of changes that a given USN journal file can hold can be estimated with the following formula:

journal size/((60 bytes + (length of file name)) * 2)

The number "2" in this formula stems from two journal entries for each file change: 1 for open and 1 for close. Divide the journal size by the size per change to determine the approximate number of changes that can occur before the journal wrap error is encountered. Assuming 8.3 filenames, this maps to approximately 200,000 files and/or directories for a 32-MB journal file. The number of changes would be less if long file names were used.
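The estimate above can be reproduced with a few lines of code. One assumption is needed: the formula does not say how the file name length is measured, so this sketch counts 2 bytes per character, which is the choice that reproduces the figure of roughly 200,000 changes for a 32-MB journal with 8.3 names (12 characters). The function name is hypothetical.

```python
def estimated_changes(journal_bytes, name_chars, bytes_per_char=2):
    """Approximate number of file changes a USN journal can hold.

    Applies journal size / ((60 bytes + file name length) * 2), where the
    factor of 2 covers the open and close records for each change.
    bytes_per_char=2 is an assumption, not something the formula states.
    """
    bytes_per_change = (60 + name_chars * bytes_per_char) * 2
    return journal_bytes // bytes_per_change


if __name__ == "__main__":
    journal = 32 * 1024 * 1024  # a 32-MB journal
    print(estimated_changes(journal, name_chars=12))  # 8.3 file names
    print(estimated_changes(journal, name_chars=40))  # longer file names
```

Running the second case shows how quickly capacity drops once long file names are in use.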

The following articles describe how to recover from a Journal Wrap state.

Reference: For more information, see the following Knowledge Base articles: 290762 "Using the BurFlags registry key to reinitialize File Replication service replica sets" and 292438 "Troubleshooting journal_wrap errors on Sysvol and DFS replica sets."




Backlog Files

• Backlogs refer to pending change orders.
• Backlogs are on a per-connection basis.
• They consist of files waiting to be replicated, deletion orders, or VVJoins, and they are expected when changes are made. Excessive backlogs or backlogs that never reduce in size are abnormal and need to be investigated.
• A DC with excessive backlogs typically does not indicate a problem with itself but its replication partners.
• The backlogs will remain until its replication partners successfully replicate in the changes. Otherwise, it indicates that at least one replication partner is unreachable or failing to replicate via FRS (such as it being in journal wrap).

Backlog Files

Backlogs refer to pending change orders, on a per-connection basis. Change orders can consist of files waiting to be replicated, deletion orders, or VVJoins. Backlogs are files stored in the Staging folder waiting for the replication partners to pull them. They are expected when changes are made. Excessive backlogs or backlogs that never reduce in size are abnormal and need to be investigated.

A DC with excessive backlogs typically does not indicate a problem with itself but its replication partners. The backlogs will remain until its replication partners successfully replicate in the changes. Large or prolonged backlogs often indicate that at least one replication partner is unreachable or failing to replicate via FRS (such as it being in journal wrap).

Excessive backlogs can be a local problem when the backlogs are created unintentionally. This can be caused by antivirus or disk defragmentation software that modifies files as it scans or manipulates them.

Bridgehead servers will tend to have the most backlogs since they have the most replication partners. The Primary Domain Controller Emulator (PDCe) may also have more backlogs than average, since, by default, changes to Group Policy Objects (GPOs) originate from that DC.




Name Collisions

Name collisions occur when directories with the same name are created on different replicas. FRS detects the name conflict during replication. Since only one copy can keep the original name, the last writer wins and a morphed name is created:
• NTFRS_xxxxxxxx (where xxxxxxxx = 8 random hex digits)

Causes of Name Collision
• The same directory is modified on different members at the same time.
• Someone creates a folder in SYSVOL and then manually copies it to other DCs instead of waiting for FRS to do so.
• An authoritative restore (D4) is performed on a replica set member and the service is still running on one or more downstream partners.

Dealing With Morphed Directories
• Make a safe copy of the tree for recovery.
• If the healthy directory contains all files, delete the morphed directory.
• If the morphed directory contains desired files, delete the empty "good" directory and rename the morphed directory.
• If the morphed directory contains a subset of good data, construct a good tree.

Name Collisions

If two or more users create files with the same file name on different replica set members, a collision occurs when they attempt to replicate those files to other members. When two files or directories are created with the same name, only one copy can be kept. A name collision occurs when the second file replicates to a member after the first file with the same name has already arrived. FRS detects that a name collision has occurred and applies the 'last writer wins' algorithm to the files. The most recently updated file is kept while the other file is deleted. The deletion is propagated to the other replica members and the most recent file is installed.

When a name collision occurs with a directory rather than a file, the “last writer wins” algorithm is applied and the most recently updated directory is given a new non-conflicting name, called a morphed name, while the other directory keeps its original name. The renamed directory is propagated to the other replica members and all copies of the directory are given the new name. It is then up to an administrator to delete the unwanted directory tree.

If there is a directory created with an unusual name that contains the same files that are contained in a directory with the correct name, you must delete the wrongly named directory on each replica member.

If a conflicted directory can be identified that contains all the files that it is supposed to have and the non-conflicted directory is empty, delete the empty, non-conflicted directory and rename the conflicted directory, giving it the non-conflicted directory's name.




If various conflicted directories contain some good data, copy the data of interest to either the non-conflicted directory or the most complete conflicted directory, and create a new, complete, non-conflicted directory.

Name collisions happen frequently when FRS is used to replicate DFS root target information. In addition to replicating the DFS root target information, FRS also attempts to replicate any directories created by DFS. Because DFS creates the same directory on each root target, FRS detects a name collision when it begins the replication process. FRS resolves the collision by renaming each collided directory. The renamed directories show up with morphed names. For example, you might see a directory with a name such as Linkname_ntfrs_1fab4343.

It is recommended that the directories be removed when the DFS link is added. The only workaround for this problem is to not use FRS to replicate root target information.

Name collisions can also occur when a non-authoritative restore is performed when the File Replication service on the inbound partners was not turned off. If this should happen, eliminate the wrongly named directories after making sure that a duplicate directory exists with the same files.




Excessive Replication and Sharing Violations

Virus Scans or File System Policies (ACLs) against SYSVOL
• Cause constant changes on large FRS-replicated directories
• May result in the replication of MB or even GB worth of files
• Issue happens periodically and changes may take place on more than one member
• Number and rate of modified files needing replication often become unsustainable

Avoiding Excessive Replication by Antivirus Utilities
• Configure the list of folders that are targeted and excluded by AV
• 822158 - Virus scanning recommendations on a Windows 2000 or on a Windows Server 2003 domain controller

• Files locked by an application can’t be changed by FRS replication
• Sharing Violations can be caused by Antivirus, Defrag or Backup Tools

Excessive Replication and Sharing Violations

Excessive Replication

FRS was updated in Windows 2000 SP3 to detect and suppress duplicate updates. This conserves network bandwidth. Event 13567 is logged to inform the administrator that this suppression has occurred.

More specifically, this event is logged when FRS detects that 15 identical updates were made to FRS-replicated files within a one-hour period and this condition has occurred over three consecutive hours. The duplicate changes might have occurred on a single file 15 times, or 15 unique files one time each, or any combination between those two.

Identical updates are changes to existing files where the MD5 checksum for the last change is identical to the previous change to that same file. Duplicate changes are common when administrators or programs make repetitive changes to FRS-replicated files, which may result in excessive replication or backlogs that delay data consistency between replication partners. The thresholds above apply only to event logging; FRS suppression itself works continuously. Suppression does not apply to folders, so frequent changes to folders may affect FRS replication and staging even when suppression is turned on.

In short, excessive replication is defined as an average of 15 or more identical updates to files every hour for three consecutive hours. Please read the following article for detailed troubleshooting information:

Reference: For more information, see the following Knowledge Base article: 315045 "FRS Event 13567 Is Recorded in the File Replication Service Event Log After You Install Service Pack 3."
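The event 13567 threshold described above can be illustrated with a short sketch. The function names are hypothetical and the hourly counts are assumed to have been collected already. This only models the stated rule (15 or more identical updates per hour for three consecutive hours) and is not a description of FRS internals.

```python
import hashlib


def is_identical_update(previous_content, new_content):
    """An update is 'identical' when the MD5 of the new content matches the old."""
    return hashlib.md5(previous_content).digest() == hashlib.md5(new_content).digest()


def is_excessive(hourly_identical_updates, threshold=15, hours=3):
    """Return True if the threshold is met in `hours` consecutive hourly counts."""
    run = 0
    for count in hourly_identical_updates:
        # Extend the run while the threshold is met; otherwise start over.
        run = run + 1 if count >= threshold else 0
        if run >= hours:
            return True
    return False


if __name__ == "__main__":
    print(is_identical_update(b"same bytes", b"same bytes"))
    print(is_excessive([3, 15, 20, 16]))
    print(is_excessive([15, 14, 15, 15]))
```

A rewrite that leaves the bytes unchanged, as some scanning tools produce, counts as an identical update under this definition.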

Known causes of excessive replication

• File System Group Policies on SYSVOL: A File System GPO can be applied to set certain NTFS permissions to files or folders. It is not recommended to set this policy on SYSVOL or any DFS share that is replicated by FRS, since it triggers a change in the file and therefore a replication change order. You can use the GetReportsForGPO.wsf script (delivered with the Group Policy Management Console (GPMC) scripts and normally located under C:\Program Files\GPMC\Scripts) to dump all group policies and the settings that are applied to the domain controllers.

Group Policies on a DC get applied every five minutes if there is a change, and every 16 hours regardless of whether the policy is new or already existing. So at the very least, every 16 hours, a full SYSVOL replication could be initiated from every DC.

• Antivirus Software: Some antivirus software will rewrite the security descriptor of a file, which FRS detects as a valid change. Most antivirus vendors have updated their software, so this should not be an issue with the latest versions of the software.

To test if this is a problem, run a scan against SYSVOL, examine the ntfrs log files in the debug directory and check for the following “Skip local CO for.” If this is present, the antivirus software is not causing a problem. A change order that generates replication will show LCOLO. You can also use FRSDIAG and review the FRSDIAG.txt file.

Reference: For more information, see the following Knowledge Base article: 284947 "Antivirus programs may modify security descriptors and cause excessive replication of FRS data in SYSVOL and DFS."

• Disk Defragmenter: Some disk defragmenter programs can cause FRS replication because they change file properties. Therefore, it is not recommended to defragment SYSVOL or any DFS directories that are replicated by FRS.

Reference: For more information, see the following Knowledge Base article: 282791 "FRS: Disk Defragmentation Causes Excessive FRS Replication Traffic."




To determine if any of the issues above is causing excessive replication, you can:

• Examine the domain controller event logs and look for 13567 events, or use Sonar/Ultrasound.

• Configure FRS debug registry values on the target domain controller for the FRS log files.

Set the following values under HKLM\System\CCS\Services\NTFRS\Parameters (see Q221111), and restart FRS after making the registry changes.

Debug Log Files (REG_DWORD): 0 minimum, no maximum, 5 default. Set this to between 20 and 50.

Debug Log Severity (REG_DWORD): 0 minimum, 5 maximum, 4 default (if SP2 or later is installed, the default value is 2). Set this to 4.

Debug Maximum Log Messages (REG_DWORD): no minimum, no maximum, 10000 default. Set this to 20000.

• Use FRSDIAG to set registry keys on the target machine. Create a FRSDIAG report and review the FRSDiag.txt output. FRSDIAG checks for suspicious outlog entries.

Reference: For more information, see the following Knowledge Base article: 221111 "Description of FRS Entries in the Registry."
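As an illustration, the debug values listed above could be applied with the reg add command. The sketch below only builds the command lines and does not run them. It assumes that CCS in the key path stands for CurrentControlSet and picks 50 from the suggested range of 20 to 50 for Debug Log Files; the function and constant names are hypothetical.

```python
# Assumption: "CCS" in the workbook's key path abbreviates CurrentControlSet.
KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\NtFrs\Parameters"

# Value names and settings taken from the list above (all REG_DWORD).
DEBUG_VALUES = {
    "Debug Log Files": 50,
    "Debug Log Severity": 4,
    "Debug Maximum Log Messages": 20000,
}


def reg_add_commands(values=DEBUG_VALUES, key=KEY):
    """Build one 'reg add' command line per registry value."""
    return [
        f'reg add "{key}" /v "{name}" /t REG_DWORD /d {data} /f'
        for name, data in values.items()
    ]


if __name__ == "__main__":
    for command in reg_add_commands():
        print(command)
```

Review the printed commands before running them on a domain controller, and remember that FRS must be restarted for the new values to take effect.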

Sharing Violations

Sharing violations can occur if other sources have open handles to a file that needs to be replicated. Typically, programs that can instigate sharing violations are:

• Antivirus programs
• Disk-optimization tools
• File system policies that repeatedly apply access control list (ACL) changes
• A user profile or personal data that is constantly in use and is placed on the replica set
• Any other type of data that is held open for long periods by an end user, a program, or a process

Windows 2000 Server SP4 and higher logs a 13573 event in the FRS log when a sharing violation has occurred and includes which file is being held open.

Please read the following articles for detailed troubleshooting information:

Reference: For more information, see the following Knowledge Base articles: 822300 "FRS Encounters "ERROR_SHARING_VIOLATION" Errors When It Tries to Replicate Data That Is Still in Use" and 816493 "How to Configure the File Replication Service to Allow Fewer Sharing Violations That Block Replication."




The Install Override Feature

If enabled (which it will be with the latest FRS releases), Install Override tells FRS to attempt to rename an opened target file out of the way in order to allow installation of a new updated version of the file. For example, an open .exe or .dll file would be treated this way.

Normally (i.e., when FALSE) FRS will wait until it can open the target with write access. Install Override only works if FRS can open the file for rename. This requires DELETE access to the file so if the target file is currently open with a sharing mode that denies DELETE access to other opens, then FRS will not be able to install the updated version until the file is closed. Note that Install Override only applies to files, not folders.

Use FrsFlags.vbs to toggle the feature on or off. This is included in the Windows Server 2003 Resource Kit.

For more information about programs compatible with FRS, consult the following article.

Reference: For more information, see the following Knowledge Base article: 815263 "Antivirus, backup, and disk optimization programs that are compatible with the File Replication Service."




Solving Replication Conflicts

[Flowchart: FRS conflict resolution. Event time lag: if the timestamp on the source is more than 30 minutes newer than the destination, accept the change; if it is more than 30 minutes older, reject the change. If the change is within 30 minutes, check the version number: Source# > Destination# accepts the change and Source# < Destination# rejects it. If Source# = Destination#, the absolute event time, file size, and originating GUID are compared in turn.]

Solving Replication Conflicts

Multimaster replication allows any member of a replica set to propagate changes made to replicated files and folders to any other member in the set. There are no primary/secondary or master/subordinate relationships between members. When a replicated file is changed and closed, FRS propagates that change to other members in the replica set as determined by the connection object topology. Those members decide whether to accept or reject the change based on the event time and version number of a file.

The File Replication Service generates an error and fails to replicate if the system clocks of the replication partners differ by more than 30 minutes in either direction.

A version number is the numerical value FRS uses to track changes that are made to a replicated file. It is assigned by a counting mechanism. When a changed file is closed, the version number on the file increases by one and an event time is noted in the tracking log. By default, FRS will check the version number associated with a file every 30 minutes. The following examples illustrate how event times and version numbers are used in replication.

A replica set is comprised of Computer A and Computer B. If File X on Computer A is updated and then closed, FRS notifies Computer B of the change.

• If the event time of the change to File X on Computer A is within the default 30-minute window, FRS will then check the version number of the file. If the version number of File X on Computer A is greater than the version number of File X on Computer B, the change is accepted and the file is updated. If the version number of File X on Computer A is less than the version number of File X on Computer B, the change is rejected.

• If the version numbers of both files are equal, the event time is checked again, this time without the 30-minute window. In other words, if the event time associated with File X on Computer A is later than the event time associated with File X on Computer B, the change is accepted and the file on Computer B is updated. If the event time associated with File X on Computer A is earlier than the event time associated with File X on Computer B, the change is rejected.

• If the version numbers and event times match, then the change which contains the larger file size wins. Note the event times are stored internally as a 64-bit number.

• If the version numbers, event times, and the file size match, then the file from the replica with the largest originator GUID is accepted.
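The decision sequence above can be sketched as a single comparison function. This is an illustrative model, not FRS code: the FileVersion type and accept_change function are hypothetical, the handling of changes outside the 30-minute window follows the slide for this topic (a source more than 30 minutes newer is accepted, more than 30 minutes older is rejected), and comparing originator GUIDs as integers is an assumption about what "largest" means.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from uuid import UUID

WINDOW = timedelta(minutes=30)  # default event time window


@dataclass(frozen=True)
class FileVersion:
    event_time: datetime
    version: int
    size: int
    originator: UUID


def accept_change(incoming, local):
    """Return True if the incoming change should replace the local file."""
    lag = incoming.event_time - local.event_time
    # Outside the 30-minute window, the event time alone decides.
    if lag > WINDOW:
        return True
    if lag < -WINDOW:
        return False
    # Inside the window, the higher version number wins.
    if incoming.version != local.version:
        return incoming.version > local.version
    # Equal versions: compare event times again, without the window.
    if incoming.event_time != local.event_time:
        return incoming.event_time > local.event_time
    # Equal event times: the larger file wins.
    if incoming.size != local.size:
        return incoming.size > local.size
    # Final tie-breaker: the largest originator GUID (integer order assumed).
    return incoming.originator.int > local.originator.int


if __name__ == "__main__":
    t0 = datetime(2000, 7, 31, 8, 56, 52)
    local = FileVersion(t0, 1, 512, UUID(int=1))
    newer = FileVersion(t0 + timedelta(minutes=5), 2, 512, UUID(int=1))
    print(accept_change(newer, local))
```

Laying the checks out in order makes it easier to see why a file with more updates can replace one with a slightly later timestamp when both changes fall inside the window.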

FRS uses a "last most frequent writer wins" algorithm, which means that usually the most recent update to a file or folder in a replica set becomes the version of the file or folder that replicates to the other members of the replica set. The circumstance where an older file replaces a newer one occurs when more updates have happened on replica A than on replica B before they have had a chance to replicate. FRS does not merge changes. Rather, a version of a particular file overwrites all other versions. For this reason, FRS is best suited for replicating files that are updated infrequently, such as product specifications, software distribution points, and Web content.

Files containing information that is updated more frequently must accommodate two scenarios: concurrent users and replication latency.

The following is an example of a concurrent user scenario: User A and User B open the same 100-page document on different replicas. User A adds 100 pages to the document and saves it. User B deletes 80 pages and then saves the same document. The 20-page document saved last by User B becomes the authoritative file because User B's change was the most recent. User A's changes are lost. To avoid this type of problem, User A and User B need to coordinate their updates to the document so that one of them updates it first and the other waits for the replicated changes to appear on the local server before making further changes.

FRS cannot enforce file-sharing restrictions or file locking between two users who are working on the same file on two different replica set members. Restricted file sharing is the process of allowing one user at a time to access a file and blocking certain types of access to all other users. This can only be done if both users are trying to access the same file on the same replica. The reason for this is that a replica set might contain hundreds of members and, because all members might not be connected to the network at any given time, it is not possible to enforce file-sharing restrictions or file-locking restrictions across all members.




The following are examples of replication latency scenarios: A user makes a change to a DFS share. Assume that the replication schedule specifies that replication takes place only at night. This means that updates originating on replicas in one site during the day are not available on replicas in other sites until the replication window opens in the evening.

One way for a user to deny access to other users in order to prevent changes from being made to his or her files or folders is to store those files in a home directory. Home directories are created during setup in the C:\Documents and Settings\username folder. A user can apply permissions to the home directory specifying the type of access given to other users. This is ideal for individuals who do not want other users manipulating their files. However, replication latency can also affect home directories, especially if the user travels.

For example, a user traveling on business from Paris to Berlin makes a change to a file in his or her home directory on a replica located on a server in Berlin at 1 p.m. The replication schedule for the home directory specifies that replication takes place at 6 p.m. When the user returns to Paris at 4 p.m. and opens the file on a replica located in Paris, it does not reflect the changes that the user made in Berlin. The user has, in a sense, beaten his or her changes home.

One way to avoid problems with replication latency is to set shorter replication schedules so that replication will occur more frequently.
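The effect of the schedule on latency can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the function names are hypothetical, the schedule is modeled as a list of daily start times, and a change is assumed to be visible on the remote replica as soon as the next replication window opens (transfer time is ignored).

```python
from datetime import datetime, time, timedelta
from typing import Sequence


def next_replication(change: datetime, schedule: Sequence[time]) -> datetime:
    """Return the first scheduled replication start at or after `change`."""
    if not schedule:
        raise ValueError("schedule must contain at least one start time")
    for day_offset in (0, 1):  # today, then tomorrow
        day = change.date() + timedelta(days=day_offset)
        for start in sorted(schedule):
            candidate = datetime.combine(day, start)
            if candidate >= change:
                return candidate
    raise AssertionError("unreachable: tomorrow's first window is always later")


def is_visible(read_at: datetime, change: datetime, schedule: Sequence[time]) -> bool:
    """True if a change made at `change` has replicated by `read_at`."""
    return read_at >= next_replication(change, schedule)


# The Paris/Berlin example: change at 1 p.m., read at 4 p.m.
change = datetime(2006, 1, 2, 13, 0)
read_at = datetime(2006, 1, 2, 16, 0)
print(is_visible(read_at, change, [time(18, 0)]))                # 6 p.m. only
print(is_visible(read_at, change, [time(h) for h in range(24)])) # hourly
```

With the single 6 p.m. window the 4 p.m. read misses the change; with an hourly schedule the same read sees it.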




Restoring Replicated Files

Non-Authoritative Restore (BURFLAGS = D2)

• One member of the replica set is lost or damaged.

• Stop the NTFRS service, set BURFLAGS = D2 in the registry, and start the NTFRS service.

• FRS renames the current SYSVOL content to a "pre-existing" folder and then synchronizes with its inbound partner, comparing file IDs and MD5 checksums. If they do not match, a new copy is taken from the partner and the old file is left in the "pre-existing" folder.

• Check DCs for "pre-existing" folders after a D2 process. Event ID 13516 is logged when D2 finishes.

Authoritative Restore (BURFLAGS = D4)

• The entire replica set is lost or damaged. Be careful when using it!

• One machine is set to D4 (authoritative); all others are set to D2 (non-authoritative).

Primary Restore (System State Restore)

• Reinitializes the first member of a replica set.

Replicated files and folders can be backed up like any other share. It is a good idea to keep a backup copy of SYSVOL or a DFS configuration in the event that disaster recovery becomes necessary. However, the backup utility does not distinguish between types of data (replicated or otherwise) when backing up or restoring files and folders. If FRS replicates the restored files, it is important to make sure that the files contain the latest or most valid data.

In the context of FRS, a backup utility is used to back up replicated files. However, restoring replicated files does not have to involve a backup utility. It can be done by changing the BurFlags (backup/restore) value in the registry to reinitialize an FRS replica member. The common recovery methods for FRS are non-authoritative or authoritative restores on failed FRS replica members.

A non-authoritative restore occurs when only one member in a replica set is lost, for example, from the failure of a disk drive. You can restore its contents from a backup tape, and then let FRS restore the files that have changed since the backup tape was made. This minimizes network traffic by not restoring static files.

An authoritative restore occurs when the entire replica set or every copy of the SYSVOL share has been corrupted or deleted. In this situation, it is best to restore only one replica from the backup tape. The restored files and folders can then be replicated to the other members in the replica set.

It is also possible to perform a primary restore of FRS.




Non-Authoritative Restore Process

When a file is backed up, the version number of the file is not retained. However, if you back up the file from a replica set, the file object ID will be saved along with the other file attributes. It is the file object ID that guides the non-authoritative restore process.

If an individual replica is lost, you can use a non-authoritative restore to create a new replica member. FRS can remain running during a non-authoritative restore because the failed member is removed from the replica set, and, therefore, none of the restored files is replicated.

When FRS notices that the configuration has changed, it begins its initialization sequence to add the new member. The first step is to move the newly restored files from the new replica's root directory to a temporary directory for pre-existing files. FRS then connects with an inbound partner and requests information about every file in the replica share. The inbound partner supplies information such as the version number and file object ID for each file and folder. Using the file object ID, FRS locates the file or folder that was restored from tape and does a checksum-based comparison of the file contents with the inbound partner. If the checksums match, FRS places the restored file into the replica share. If the checksums do not match, FRS requests the file from the inbound partner. Files generally are not copied from the inbound partner unless the backup data is out of date.

At the conclusion of the non-authoritative restore process, the file content and the version information on the new member matches the content on the inbound partner. The files supplied from the backup tape are used only if they have file object IDs and their content matches the content of the corresponding file held by the inbound partner. This is especially valuable if the two members are linked by a low-speed network connection.

To perform a non-authoritative restore of a DFS replica

1. Remove the failed member from the replica set.

2. Disable replication on the host server.

3. Repair the faulty member. For example, replace a disk drive that has failed.

4. Add the member back as a new replica. Do not specify it as an initial master when you enable replication unless you want to do an authoritative restore.

Note that initial master is only relevant when this member is the only member (that is, the first member) in the replica set. In this case, FRS enumerates the replica share and preloads its database with information about each file and folder. In addition, FRS assigns file object IDs to every file and folder in the replica share. If this member is not the only member in the replica share, FRS always treats the addition of a new member as non-authoritative.

Optionally, you can force a non-authoritative restore of the data in the Sysvol folder by following the steps outlined below. If you start FRS with the BurFlags registry value set to D2, FRS performs a full synchronization of files and folders from a direct or transitive replication partner that is hosting the authoritative copy of files and folders in the replica set. When you start FRS with the BurFlags registry entry set to D2, the configuration is generally referred to as a non-authoritative restore, even though no restore of the system state occurred. Think of the D2 setting as rebuilding the FRS part of the replica domain controller as if the domain controller were new.

Note: You have to stop the NT File Replication Service (NTFRS) service, and then set the startup type for NTFRS to manual on the domain controller where you want to perform the non-authoritative restore. This prevents the service from starting unintentionally while this operation is performed.

1. Start a command prompt. To do this, click Start, click Run, type cmd, and then click OK.

2. At the command prompt, type net stop ntfrs, and then press ENTER.

3. Click Start, click Run, type services.msc, and then click OK.

4. In the Services snap-in, double-click File Replication, click Manual under Startup Type, click Apply, and then click OK.

5. Click Start, click Run, type regedit, and then click OK.

6. Locate and then click the BurFlags value under the following registry key:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NtFrs\Parameters\Backup/Restore\Process at Startup

7. If the BurFlags value that is mentioned in step 6 does not exist, create it. To do this, click Edit, click New, click DWORD Value, type BurFlags, and then click OK.

8. In the right pane, right-click BurFlags, click Modify, type d2 in the Value data box, and then click OK.

9. Locate and then expand the following registry key:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NtFrs\Parameters\Sysvol Seeding\Domain System Volume (Sysvol share)

10. On the Edit menu, click New, click String Value, type Replica Set Parent, and then click OK.

11. In the right pane, right-click Replica Set Parent, click Modify, type the name of a domain controller that has the Sysvol data that you want to replicate in the Value data box, and then click OK.

12. Quit Registry Editor.

13. At a command prompt, type net start ntfrs, and then press ENTER.

14. Click Start, click Run, type services.msc, and then click OK.




15. In the Services snap-in, double-click File Replication, click Automatic under Startup Type, click Apply, and then click OK.
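For reference, the same sequence can be written down as a list of command lines. The Python sketch below only builds and prints the commands; it does not run them. The `net stop` and `net start` commands come from the steps above, while the `sc config` and `reg add` lines are an assumed command-line equivalent of the Services snap-in and Registry Editor steps and are not part of the original procedure. The function name and the domain controller name `DC01` are hypothetical.

```python
from typing import List

BURFLAGS_D2 = 0xD2  # non-authoritative restore

BURFLAGS_KEY = (r"HKLM\SYSTEM\CurrentControlSet\Services\NtFrs\Parameters"
                r"\Backup/Restore\Process at Startup")
SEEDING_KEY = (r"HKLM\SYSTEM\CurrentControlSet\Services\NtFrs\Parameters"
               r"\Sysvol Seeding\Domain System Volume (Sysvol share)")


def d2_restore_commands(parent_dc: str) -> List[str]:
    """Return the command lines for a D2 restore, in the order of steps 1-15."""
    return [
        "net stop ntfrs",                 # step 2
        "sc config ntfrs start= demand",  # steps 3-4 (assumed equivalent)
        # steps 5-8: set BurFlags to D2 (assumed equivalent of regedit)
        f'reg add "{BURFLAGS_KEY}" /v BurFlags /t REG_DWORD '
        f'/d 0x{BURFLAGS_D2:X} /f',
        # steps 9-11: name the domain controller to replicate from
        f'reg add "{SEEDING_KEY}" /v "Replica Set Parent" /t REG_SZ '
        f'/d "{parent_dc}" /f',
        "net start ntfrs",                # step 13
        "sc config ntfrs start= auto",    # steps 14-15 (assumed equivalent)
    ]


if __name__ == "__main__":
    for command in d2_restore_commands("DC01"):  # DC01 is a placeholder name
        print(command)
```

Printing the commands first makes it possible to review them before anything is changed on a domain controller.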

Authoritative Restore Process

With an authoritative restore process, the restored files and folders are given the newest version number. This means that the replicated share that was just restored is automatically replicated to other members in the replica set. Perform an authoritative restore only if an entire replica set becomes corrupted.

To perform an authoritative restore of SYSVOL

1. Restore the data from tape to an alternate location.

2. Perform a non-authoritative restore.

3. After the SYSVOL share is published, copy the restored data from the alternate location into the actual location.

Alternatively, you can use the BurFlags registry value to perform an authoritative restore of SYSVOL. Note that Microsoft does not recommend this procedure because of the impact it may have. If you start FRS with the BurFlags registry value set to D4, FRS initially treats the files and folders on its local copy of the SYSVOL tree as authoritative for the replica set. Only one member of an FRS replica set should be initialized with the D4 setting.

When you start the FRS with the Burflags registry entry set to D4, the configuration is generally referred to as an “authoritative restore” for the contents of an FRS replica set, even though no actual restore of the system state occurred. Think of the D4 setting as rebuilding the FRS part of the first domain controller in a new domain. If you set Burflags to D4 on a single domain controller and set Burflags to D2 on all other domain controllers in that domain, you can rebuild the SYSVOL tree in that domain. This bulk rebuild process is known as a hub, branch, or bulk FRS restart.
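The rule that exactly one member gets D4 and every other member gets D2 can be made explicit in a short Python sketch. The function name and the domain controller names are hypothetical; the sketch only computes the BurFlags value for each member and does not touch any registry.

```python
from typing import Dict, Sequence

BURFLAGS_D2 = 0xD2  # non-authoritative
BURFLAGS_D4 = 0xD4  # authoritative


def burflags_plan(domain_controllers: Sequence[str],
                  authoritative: str) -> Dict[str, int]:
    """Assign D4 to the one authoritative member and D2 to all others."""
    if len(set(domain_controllers)) != len(domain_controllers):
        raise ValueError("domain controller names must be unique")
    if authoritative not in domain_controllers:
        raise ValueError("authoritative member must be part of the replica set")
    return {
        dc: BURFLAGS_D4 if dc == authoritative else BURFLAGS_D2
        for dc in domain_controllers
    }


# Placeholder names: DC1 holds the copy of SYSVOL that should win.
for dc, flag in burflags_plan(["DC1", "DC2", "DC3"], "DC1").items():
    print(f"{dc}: BurFlags = {flag:X}")
```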

Primary Restore Process

A primary restore preinitializes the first member of a replica set by using the initial files found in the replica tree.

Use extreme caution when performing a primary restore. If a primary restore is done on a member of a replica set when other members in the replica set are still active, FRS will preinitialize the database on the member being restored using the current set of files found in the replica tree, not the backed up information. As a result, a complete set of files and folders will be sent out with a new set of object IDs that all look different to the other members of the replica set. This generates name-morph conflicts for the directories and causes FRS to generate a duplicate replica tree, with one set of directory trees having the normal name and the other set having the normal name with the suffix NTFRS_xxxxxxxx, which creates a tremendous inconvenience for network administrators.




To perform a primary restore of FRS

1. On the ntbackup Restore tab, select the backup set that you want to restore.

2. Click Start Restore, and then click the Advanced button.

3. On the Advanced Restore Options dialog box, select When restoring replicated data sets, mark the restored data as the primary data for all replica sets.

When the computer restarts, the FRS replica sets on the computer will undergo a primary restore.

Restoring Files on a Domain Controller

Restoration on a domain controller is FRS-aware. This means that the backup utility recognizes that it is working with replicated data. Therefore, when using the backup utility a non-authoritative restore is always performed on domain controllers unless specified otherwise. This is because it is assumed that there is another replica in the domain with valid data.

In a non-authoritative restore, the version of the file currently residing on the other replica set members always takes precedence. The other servers therefore overwrite the restored data, so a non-authoritative restore cannot be used to roll the replica set back to the backed-up content. Performing an authoritative restore assigns the newest version number to the replicated data to guarantee its replication to other servers.

Use the primary restore method if—and only if—the domain controller on which you are performing the restore is the last member of the replica set. This switch is intended for disaster recovery cases when the whole replica set is lost. Setting a member as initial master when it has other members from which to synchronize can result in name collision.

Restoring Files on a Member Server

Restoration on a member server is always authoritative because it is not FRS-aware. In other words, it assumes that there are no other copies of the restored files on other servers. As a result, the replica being restored replicates its data to other members of the replica set.

Note that an authoritative restore is simply the restoration of a file onto a member that is actively replicating files. It does not produce a mirror image of the backup tape content in the replica share. Any new files that were created in the replica share after the backup tape was created are not deleted from the replica share. To perform a true authoritative restore of a replica share so that it mirrors the content from the backup tape, the user must first delete all files in the replica share and then restore the data from the backup tape.

Reference: For more information, see the following Knowledge Base articles: 290762, "Using the BurFlags registry key to reinitialize File Replication Service replica sets," and 840674, "How to force a non-authoritative restore of the data in the Sysvol folder on a domain controller in Windows 2000 Server and in Windows Server 2003."




Pre-staging files for FRS

[Slide diagram. Labels: Take a backup (NTBackup); MD5 hash generated; Restore backup with MD5 hashes; Add replication partner; Files don't travel the network.]

FRS supports pre-staging, a feature that allows new members that join a replica set to be populated from a backup of an existing member prior to joining the replica set.

Pre-staging a replica set is useful when there is a large number of files that have not changed and it is not necessary to replicate them to other members of the replica set. Rather than populating a low-bandwidth wire with unnecessary files, it is possible to pre-stage the replica set by making a backup of the outbound partner and restoring the backup onto an inbound partner at another site. Then add the inbound partner to the replica set, enable FRS replication, and allow the inbound partner to synchronize.

To pre-stage a replica

1. Set up at least two DFS alternates, such as \\Server1\Apps and \\Server2\Apps.

2. Enable replication between two replica members, such as \\Server1 and \\Server2. It is possible to designate any server as primary, but the replicated folders must be empty when the computers are added to the DFS/FRS replica set.

3. Copy the files destined for the replica set into the replicated \\Server1\Apps folder. Because \\Server1 has at least one outbound partner (\\Server2), when you copy a file into \\Server1, it causes FRS to generate a staging file and a change order is sent to \\Server2. An MD5 (a hash algorithm) checksum is computed during the staging file generation and the result is saved in the IDTable on \\Server1 and in the change order sent to \\Server2. When \\Server2 processes this change order it saves the MD5 checksum in the IDTable on \\Server2. This process is the only way an MD5 checksum is saved in the IDTable and the use of the MD5 is necessary to avoid overhead when new members are added later. When step 3 is finished, the replicated files should exist on both \\Server1 and \\Server2, and both IDTables should have MD5 checksums for each file and folder.

4. Use Windows NTBackup or a third-party equivalent to back up the contents of the replica tree from either \\Server1 or \\Server2. Windows NTBackup saves and restores the object identification (ID) attribute associated with each file and folder. Neither the Windows nor the Microsoft MS-DOS® copy commands preserve this information when files are copied from \\Server1 to \\Server2. This object ID must be restored with the files when new members are added later.

5. On \\Server3 and all future replica members, restore the backup to the \\Server3\Apps replicated folder (using the Restore files to an alternate location option) before you add the computer to the replica set.

6. Enable replication to \\Server3\Apps. FRS on \\Server3 then moves all files from the target folder to the pre-existing folder and initiates a full synchronization (also referred to as a version vector join operation) from all computers from which \\Server3 has inbound NTDS connection objects. In the case of DFS replica sets with the full mesh topology preferred by the Windows 2000 DFS snap-in, these can include all servers participating in the replica set, such as \\Server1 and \\Server2. The Windows Server 2003 release of the DFS snap-in supports more optimal topologies, including a custom option. The key requirement in this situation is that \\Server3 has inbound connections from an upstream partner (\\Server1 and \\Server2 in this case) whose IDTable contains MD5 checksums for the files contained in the replica sets of interest.

FRS on \\Server1 enumerates all the files and folders in its IDTable and sends directed (that is, single target) change orders to \\Server3. Because the IDTable has an MD5 checksum, it is included in the change order. As \\Server3 processes these change orders, it takes the object ID for the file or folder from the change order and attempts to locate the corresponding file in the pre-existing folder. If the server locates the file, it recomputes the MD5 checksum on the content of that file and compares the result to the MD5 checksum it received in the change order. If they match, it uses the pre-existing file instead of attempting to obtain the file from \\Server1. If \\Server3 does not find the file, or if the MD5 checksum does not match, the server obtains the file from \\Server1. Any change to the file content, such as to the access control lists, data streams, or attributes, can cause an MD5 mismatch, in which case the file is obtained from \\Server1 or another upstream partner.

Meanwhile, FRS on \\Server2 (and all other upstream partners of the new or reinitialized replica member) is performing the same process as \\Server1. \\Server3 processes a change order for a given file or folder from either \\Server1 or \\Server2, whichever arrives first. The other change order is ignored. When all replication activity has settled out, the IDTables on all three servers have identical MD5 checksums and identical file content in the replicated folder. Repeat steps 5 and 6 to add additional servers to the replica set.
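The decision that the new member makes for each change order in step 6 can be sketched in Python. This is an illustration under simplifying assumptions, not FRS code: the `ChangeOrder` and `NewMember` types are hypothetical, the pre-existing folder is modeled as a dictionary from object ID to file bytes, and the MD5 checksum here covers only the file bytes (the text above notes that the real checksum is also sensitive to access control lists, data streams, and attributes).

```python
import hashlib
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ChangeOrder:
    """Hypothetical directed change order sent by an upstream partner."""
    object_id: str  # file object ID restored from backup
    md5: bytes      # checksum taken from the partner's IDTable
    source: str     # upstream partner that sent the change order


def md5_of(content: bytes) -> bytes:
    return hashlib.md5(content).digest()


class NewMember:
    """Models the new member deciding whether to reuse pre-existing files."""

    def __init__(self, pre_existing: Dict[str, bytes]) -> None:
        self.pre_existing = dict(pre_existing)  # object ID -> file content
        self.processed: Dict[str, str] = {}     # object ID -> outcome

    def process(self, order: ChangeOrder) -> str:
        # Only the first change order for a given file is processed.
        if order.object_id in self.processed:
            return "ignored"
        content = self.pre_existing.get(order.object_id)
        if content is not None and md5_of(content) == order.md5:
            outcome = "reused"   # pre-existing file is used; nothing is copied
        else:
            outcome = "fetched"  # missing or mismatched; obtain from partner
        self.processed[order.object_id] = outcome
        return outcome


member = NewMember({"id-1": b"unchanged", "id-2": b"stale copy"})
print(member.process(ChangeOrder("id-1", md5_of(b"unchanged"), "Server1")))
print(member.process(ChangeOrder("id-1", md5_of(b"unchanged"), "Server2")))
print(member.process(ChangeOrder("id-2", md5_of(b"newer copy"), "Server2")))
```

Only files whose outcome is "fetched" cross the network, which is the saving that pre-staging is meant to provide.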




Section 3: Troubleshooting FRS

Introduction

There are several tools and ways to monitor and solve FRS problems, from continuous and proactive monitoring to snapshot and drill down troubleshooting. This section will cover the main tools and how they are positioned regarding monitoring/troubleshooting FRS.

• Understand the tools available to troubleshoot FRS.

• Identify problems with FRS and FRS replication.

• Identify tools to proactively monitor FRS replication.

• Know what to do when FRS stops replicating.




Overview of Tools

Monitoring vs. Troubleshooting

• Sonar: FRS events, FRS performance counters, FRS RPC

• Ultrasound: FRS events, performance counters, FRS topology, file system, FRS RPC

• FRSDiag: FRS events, FRS debug logs, FRS RPC, DS events, AD replication

[Slide diagram: tools positioned between continuous monitoring and snapshot troubleshooting. Tools shown: Sonar, Ultrasound, NtfrsUtl, FRSDiag, FRS MOM Mgmt Pack + Bridge to Ultrasound.]

There are several tools available to monitor and troubleshoot FRS, starting from a very proactive perspective to reactive troubleshooting.

Microsoft recommends FRS monitoring so you can predict problems and avoid downtime. Tools like Sonar, Ultrasound and the FRS MOM Management Pack can help you accomplish this task, while also helping troubleshoot issues. Tools like Ntfrsutl and FRSDiag on the other hand are very useful to capture data for further and detailed troubleshooting of FRS issues.

In the next slides we will discuss each tool in detail.




Sonar

• Simple dashboard

• View an FRS environment and look for sources of problems

• Especially useful in large environments which have languished with no periodic monitoring

• Useful for checking: FRS service state, journal wrap, backlog, sharing violation indicators

• Can run on any machine with .NET Framework v1.1

• Available for download from the Microsoft Web site

Sonar is a tool designed for monitoring key statistics about FRS members in a replica set, and is available for download from the Microsoft Web site.

An administrator can use SONAR to easily watch key statistics on a replica set, so they may monitor for traffic levels, backlogs, free space, and other issues. SONAR allows definition of filters that define rules for which rows to display, and also allows definition of column sets that can be viewed. SONAR does not modify any settings on the computers which you monitor. It just passively reads information.

SONAR can collect status information from FRS running on either Windows 2000 (all service packs), or Windows Server 2003.

Note that the SONAR display is member-oriented. It tries to roll up statistics into a per-member view, instead of a per-connection view. This is by design, to provide a way of monitoring a set at a high level for members that are in trouble. However, the actual trouble may be connection-specific. So troubleshooting often involves first finding a member that requires attention using SONAR, and then drilling into the connection-specific issues.

When SONAR is started with no command line options, it allows a query to be defined. The administrator can choose a domain, and then a replica set within the domain. They can also choose to just view the hub computers in the domain. In this case, SONAR queries the FRS topology in Active Directory and then limits the view to just those computers with a larger than average number of connections. Note that members can be explicitly added or removed later.
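The "hub computers" filter described above amounts to keeping the members whose connection count is above the average. The Python sketch below shows that idea only. How Sonar actually counts connections is not described here, so the input (a mapping from member name to number of connections), the function name, and the member names are all assumptions.

```python
from statistics import mean
from typing import Dict, List


def hub_members(connection_counts: Dict[str, int]) -> List[str]:
    """Return members with a larger than average number of connections."""
    if not connection_counts:
        return []
    average = mean(connection_counts.values())
    return sorted(name for name, count in connection_counts.items()
                  if count > average)


# Placeholder topology: one hub with many connections and three branches.
print(hub_members({"HUB1": 10, "BRANCH1": 1, "BRANCH2": 1, "BRANCH3": 2}))
```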




Once the replica set is selected (or an existing query is loaded), SONAR displays replica member status in a grid view. The user should then click Refresh All to collect the data from the member systems.

Files Required

• Sonar.exe

• Ntfrsapi.dll (on local computers)

• The Microsoft .NET Framework Common Language Runtime

Source

Resource Kit




Ultrasound

• Ultrasound provides in-depth health analysis

• Easy-to-use red, yellow, and green lights

• Monitors on a replica set, member, and connection basis

• Available for download from the Microsoft Web site

Three key components to Ultrasound:

• Agent: WMI provider to collect FRS-specific information (Windows 2000 SP2+)

• Database: Controller service that collects and analyses health status; state is stored in a SQL/MSDE database

• Console: User interface to view data and perform admin tasks

Features:

• Alerts, alert grouping, resolution, continuously running monitoring service

• Propagation test files to troubleshoot replication issues and monitor progress

• Computed metrics: number of backlog files on a per-connection basis, journal space, etc.

• Email notification, basic history and reporting functionality, topology changes

Ultrasound is a monitoring and troubleshooting tool for the FRS. Ultrasound is a powerful tool to measure the health of FRS replica sets by providing health ratings and historical information about replica sets. Ultrasound also allows administrators to monitor the progress of replication and detect problems that can cause replication to become backlogged or stopped.

Ultrasound is a follow-on project to Sonar, but Sonar will continue to be used by customers who need FRS status without deploying the Ultrasound infrastructure.

Ultrasound works by installing Microsoft Windows Management Instrumentation (WMI) providers on replica members in an organization. These providers gather FRS status information, which is polled and gathered by the Ultrasound controller. The controller is the service component of Ultrasound that collects data about monitored replica sets, pushes the information into the database, and analyzes the data to look for problems or other issues that require administrator notification or intervention. By using the user interface portion of Ultrasound, known as the console, administrators can configure Ultrasound to alert them via email of serious problems and use an incident log in Ultrasound to keep track of changes or tasks they performed in response to alerts.




Three key components to the Ultrasound infrastructure exist.

• WMI provider that is installed on FRS systems (supports down to Windows 2000 SP2 systems).

• Controller service in C++ that collects and analyses health status; state stored in SQL/MSDE database.

• UI (console) in C#/WinForms to view data and perform administrator tasks.

Source

http://www.microsoft.com/frs

System Requirements

Supported Operating Systems: Windows 2000 Service Pack 3; Windows Server 2003; Windows Server 2003 Service Pack 1; Windows XP

Hardware Requirements

• Video adapter: The Ultrasound console (the user interface component of Ultrasound) is designed for viewing in 1024x768 or higher resolution.

• Hard disk space: Hard disk space requirements vary according to the Ultrasound component to be installed.

• Microsoft Data Engine (MSDE): 100 MB. MSDE is installed separately from Ultrasound.

• Initial size of Ultrasound database: 5 MB. After FRS monitoring begins, the size of the database depends on the number and size of monitored replica sets. MSDE has a 2 GB database size limit.

• Console: 2.5 MB.

• Controller: 1.5 MB.

• Provider: 400 KB.

Operating System and Service Pack Requirements for FRS Replica Members

To reduce the likelihood of FRS errors, it is recommended that all replica members run the current service packs and post-service pack releases for either Windows 2000 Server or Windows Server 2003. The minimum supported operating system for FRS servers monitored by Ultrasound is Windows 2000 Server with SP2 and the hotfix described in article 322141, "Ntfrs.exe Does Not Clean Up the Staging Folders on Members with No Outbound Partners in Windows 2000." Later service pack releases, however, offer significant FRS improvements and bug fixes. Therefore, we recommend that all replica set servers run one of the following:

• Windows 2000 Server with SP3 and the post-Service Pack 3 release of Ntfrs.exe, described in article 815473, "File Replication Service Does Not Log Errors on Sharing Violations."




• Windows 2000 Server with SP4.

• Windows Server 2003. Install the pre-Service Pack 1 release of Ntfrs.exe as described in article 823230.

Ultrasound Installation Requirements

Ultrasound controller can be installed on Windows 2000 SP3 and above, Windows Server 2003, and Windows XP. Ultrasound cannot be installed on Windows Vista™ or on prerelease versions of Windows Server Longhorn. The Ultrasound controller runs only on 32-bit operating systems.

The system running the Ultrasound console must have the following software installed:

• The .NET Framework version 1.1. This is included in Windows Server 2003, or you can download it. The .NET Framework 1.1 can be installed on Windows 2000 (with SP2 recommended) and Windows XP Professional.

• Microsoft Data Access Components (MDAC) 2.6. This is included in Windows Server 2003, or you can install it. Note that MDAC is automatically upgraded to version 2.6 when you install MSDE.

Ultrasound requires a database server to store FRS data. You must install the database server separately before you install the controller. You can download and install Microsoft SQL Server 2000 Desktop Engine (MSDE), which is available for free, or you can use SQL Server 2000 with SP3a. At this time, SQL Server 2005 and SQL Server Express Edition 2005 are not supported due to an incompatibility. The workaround is to install Ultrasound on SQL Server 2000 and then upgrade to SQL Server 2005.




Ultrasound Experience

The Ultrasound console is the Windows-based UI used to access Ultrasound. The console enables the user to visualize all the data stored in the database, in addition to setting the configuration options for the controller and providers. To simplify some of the details, the console won’t talk to the controller directly. Instead, the database will be used as the path of communication.

The Ultrasound console is implemented in C#, and will require the .NET Framework (which includes the C# Runtime) to execute. The setup package for the console will include the .NET Framework redistributable.

Feature Overview

Multiple consoles can be run simultaneously.

The security model does not force users to be a domain administrator for simple usage.

The console deploys the providers to the FRS servers.

Alerts, Notifications, and Logs

As data flows through the system, alerts and notifications are used to warn the user of potential problems, or of changes in the state of Ultrasound. Logs are used by administrators to record management and troubleshooting observations or activities.

There are two types of alerts within the system:

• Informational Alerts: Alerts fired by the Ultrasound or Scalpel to pass information about an internal state change.




• Query Alerts: Alerts defined by a SQL query, and used by administrators to warn of potential problems.

Alerts are internal to Ultrasound. If the user wants to be notified of an alert through some external channel, such as mail or the Event Log, then a notification needs to be set up for that alert.

Alerts and notifications provide the user with the ability to be updated quickly when potential problems arise. However, a single set of problems can cause a very large number of alerts and notifications to be fired. This is often referred to as an event storm. To handle this, Ultrasound has a mechanism to suppress duplicate alerts.




Ultrasound Reporting Pack

The Ultrasound Reporting Pack is composed of sample Web pages and SQL scripts that produce reports on propagation test operations, FRS health, and some metrics.

As an alternative to using the Ultrasound console, when all you need is a quick overview of your system's health, we suggest you create or use a simple Web front end over the Ultrasound DB. This system could meet your reporting needs, without adding a considerable infrastructure to support.

The scripts contain hard-coded SQL queries, with wrappers to render results in HTML. While simple, these scripts have already been adopted by some groups to provide a quick overview of FRS health, propagation tests, etc. The setup requires an IIS server with ASP enabled, network connectivity to the Ultrasound SQL server, and about 30 KB of disk space.

Installation

Download the archive file UltrasoundWebReporting.zip from the Microsoft Web site to the machine that hosts IIS. Create a directory, which will store the Ultrasound Web reporting virtual directory, and extract the archive there. Further instructions are available inside the zip file.




FRS MOM Management Pack Using Ultrasound

• The MP acts as a bridge between Ultrasound and MOM
• Provides health information about FRS through MOM
• Scripts executed on the Ultrasound controller
• MOM agent runs the script as Local System
• FRS replica set, member, connection health

The FRS Management Pack for MOM monitors and collects FRS information from an Ultrasound database at regular intervals. It also reports health data for replica sets, members, and connections while diagnosing problems with the Ultrasound controller service. With embedded expertise, this management pack allows you to react to costly service outages.

Together with Ultrasound, the FRS Management Pack provides a mechanism for monitoring the Microsoft Windows File Replication Service running on Windows 2000 Service Pack 3 (SP3) or later.

The FRS Management Pack uses a MOM agent on the Ultrasound controller computer to analyze the Ultrasound database. MOM applies the FRS Management Pack rules to the Ultrasound Controller to determine the health status of individual replica sets, members, and connections.

The FRS Management Pack forwards the health information from Ultrasound to MOM. However, you should use the Ultrasound console for further detailed information about FRS health and to modify the rules and settings that define the health of FRS objects.




FRSDiag

Suitable tool when the problem has been narrowed to a few servers

Collects detailed information:
• Event Log(s)
• Ntfrsutl Output(s)
• FRS Debug Log(s)
• FRS Registry Dump
• SYSVOL Dump
• Repadmin /showreps and /showconn

Performs several tests on the data collected to detect common known problems

Features
• Creates a .CAB with all files for Product Support
• Requires .NET Framework
• Is bandwidth intensive; run only when detailed logs are needed for further debugging

FRSDiag provides a graphical interface to diagnose and troubleshoot problems with FRS. It can gather snapshot data of the service and perform automated tests against it, which can then be compiled into an overview of potential problems.

This tool automatically checks for known FRS problems, including excessive replication, sharing violation, missing parent bug, Active Directory replication broken, missing connection objects, missing serverReference, disabled connection object, and improperly set FileFilters and DirectoryFilters.

It can collect the same information as Health_chk, and allows you to turn off collection of information you do not need. It also parses Event Logs and Error Scan logs, which can be corrupted because of new QFEs, and it automatically compresses the collected information into a CAB file.

Information It Grabs by Default

• Event Log(s)
• ntfrsutl Output(s)
• FRS Debug Log(s)
• FRS Registry Dump
• SYSVOL Dump
• Repadmin /showreps and /showconn




Further Information It Creates

• FRS Debug Logs Errorscan
• FRSDiag.txt -- Contains results of tests.
• FRSDiag_Log.txt -- Contains information about which tests were requested.
• Connstat.txt -- Parses out the sets.txt into a more readable format.
• IDTable.txt -- Parses out the IDTable into a more readable format with full file paths.
• .CAB with everything compressed in it.




Event Log Monitoring

FRS + Proactive Monitoring + Troubleshooting Tools

Recipe for healthy replicas.


FRS event logs are a key source of monitoring information. The following table summarizes the main FRS event log entries that should be monitored on a regular basis, and this section describes each event in more detail and the actions required in each case.

Event ID | Priority | Summary | Actions required
13508 | (depends) | Unable to RPC to a replication partner | Wait for 13509. If no 13509 follows within 4 hours (rule of thumb), then investigate.
13509 | (none) | Able to RPC to a replication partner | No action. This indicates the 13508 wait is over.
13511 | P1 | FRS database is out of disk space | Make more space available for the FRS database.
13512 | (none) | Enabled disk cache detected | Typically no action required.
13522 | SP2: P1; SP3: P3 | Staging area full | On Windows 2000 before SP3, requires administrator intervention to clear the replication backlog. In Windows 2000 SP3 and later, automatic processes trim the size of the staging area and treat it like a cache. If this event is logged repeatedly, administrators should consider increasing the staging space limit as an optimization.
13526 | P1 | SID cannot be determined from the distinguished name | Restart FRS on that system.
13548 | P1 | System clocks are too far apart | Correct clocks on one or more replica members.
13557 | P1 | Duplicate connections configured | Delete the unnecessary connection object.
13567 | P2 | Excessive replication detected and suppressed | Investigate what is causing excessive replication work; in the meantime, the FRS server is using CPU and disk resources to damp the replication traffic by comparing files to previously sent copies.
13568 | SP2: P2; SP3: P1 | Journal wrap | Depends on FRS version. See details below.
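The table above can be kept as a small lookup for triage scripts. The following Python sketch is illustrative only: the dictionary layout, the `triage` function name, and the "sp2"/"sp3" keys are assumptions made for this example, and the priorities and actions are transcribed from the table.

```python
# Priorities and actions transcribed from the event table above.
# The data layout and function name are hypothetical, not part of any FRS tool.
FRS_EVENTS = {
    13508: ("depends", "Wait for 13509; investigate if none follows within 4 hours."),
    13509: ("none", "No action."),
    13511: ("P1", "Make more space available for the FRS database."),
    13512: ("none", "Typically no action required."),
    13522: ({"sp2": "P1", "sp3": "P3"}, "Clear backlog (pre-SP3) or review staging space."),
    13526: ("P1", "Restart FRS on that system."),
    13548: ("P1", "Correct clocks on one or more replica members."),
    13557: ("P1", "Delete the unnecessary connection object."),
    13567: ("P2", "Investigate what is causing excessive replication."),
    13568: ({"sp2": "P2", "sp3": "P1"}, "Depends on FRS version."),
}


def triage(event_id, sp3_or_later=True):
    """Return (priority, action) for a listed FRS event ID, or None if unlisted."""
    entry = FRS_EVENTS.get(event_id)
    if entry is None:
        return None
    priority, action = entry
    if isinstance(priority, dict):
        # Two events carry a different priority depending on the service pack.
        priority = priority["sp3" if sp3_or_later else "sp2"]
    return priority, action
```

For example, `triage(13568)` returns the P1 priority that applies on SP3 and later, while `triage(13568, sp3_or_later=False)` returns P2.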

Troubleshooting FRS Events 13508 without FRS Event 13509

Event 13508 in the FRS log is a warning that the FRS service has been unable to complete the RPC connection to a specific replication partner. It indicates that FRS is having trouble enabling replication with that partner and will keep trying to establish the connection.

A single event ID 13508 does not mean anything is broken or not working. Simply look for event ID 13509 to ensure the problem was resolved. Based on the time between event IDs 13508 and 13509, you can determine if there is a real problem that needs to be addressed.

Note: If FRS is stopped after a 13508 event and later started at a time when the communication issue has been resolved, no 13509 event is entered in the event log. In that case, an event indicating that FRS has started, with no subsequent 13508 message, indicates that replication connections are being made correctly.

Because FRS servers gather their replication topology information from their closest Active Directory domain controller, there is also an expected case where a replica partner in another site will not be aware of the replica set until the topology information has been replicated to domain controllers in that site. When the topology information finally reaches that remote domain controller, the FRS partner in that site will be able to participate in the replica set, and FRS event ID 13509 will be logged. In addition, FRS polls the topology in Active Directory at defined intervals. These delays and schedules, especially in topologies with multiple hops, can delay propagation of the FRS replication topology.
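The "13508 followed by 13509" rule can be checked mechanically once the events have been exported. The sketch below is a minimal illustration in Python, assuming the events are already available as (timestamp, event ID, partner) tuples sorted by time; the function name and tuple layout are hypothetical, and the four-hour grace period is the rule of thumb from the table. It does not handle the FRS restart case described in the note above.

```python
from datetime import datetime, timedelta


def unresolved_13508(events, now, grace=timedelta(hours=4)):
    """Return partners with a 13508 older than `grace` and no later 13509.

    `events` is an iterable of (timestamp, event_id, partner) tuples,
    assumed to be sorted by timestamp.
    """
    pending = {}
    for timestamp, event_id, partner in events:
        if event_id == 13508:
            # Remember the earliest unresolved 13508 for each partner.
            pending.setdefault(partner, timestamp)
        elif event_id == 13509:
            # A 13509 means the wait for this partner is over.
            pending.pop(partner, None)
    return sorted(p for p, ts in pending.items() if now - ts > grace)
```

A partner is reported only after the grace period has elapsed, so a single recent 13508 does not raise a false alarm.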




Procedures for Troubleshooting FRS Event 13508 without Event 13509:

1. Examine the 13508 event in the FRS Event Log to determine which machine FRS has been unable to communicate with.

2. Determine whether the remote computer is working properly and verify that FRS is running on it. A good way to do this is to execute 'NTFRSUTL VERSION <FQDN_of_remote_DC_name>' from the computer logging the 13508 event.

3. If this fails, check network connectivity by pinging <FQDN_of_remote_DC_name> from the computer logging the 13508 event. If the ping fails, troubleshoot as a DNS or TCP/IP issue. If it succeeds, confirm that the FRS service is started on the remote domain controller.

4. Determine whether FRS has ever been able to communicate with the remote computer by looking for 13509 in the event log and review recent change management to networking, firewalls, DNS configuration, and Active Directory infrastructure to see if there is a correlation.

5. Determine whether there is anything between the two computers that is capable of blocking RPC traffic, such as a firewall or router.

6. Confirm that Active Directory replication is working.
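Steps 3 and 5 of the procedure above separate name resolution failures from blocked RPC traffic. The following Python sketch shows one way to make that distinction from the computer logging the 13508 event. It assumes TCP port 135 (the RPC endpoint mapper) as the probe target; the function name and return strings are hypothetical. A successful probe shows only that the endpoint mapper is reachable, not that the dynamically assigned RPC port used by FRS is open.

```python
import socket


def probe_partner(fqdn, port=135, timeout=3.0):
    """Classify connectivity as 'dns_failure', 'tcp_failure', or 'reachable'."""
    try:
        candidates = socket.getaddrinfo(fqdn, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # The name could not be resolved: troubleshoot DNS first.
        return "dns_failure"
    for family, socktype, proto, _canonname, address in candidates:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(address)
                return "reachable"
        except OSError:
            # Refused, timed out, or filtered: try the next address, if any.
            continue
    return "tcp_failure"
```

A "tcp_failure" result with working name resolution points toward a firewall, router, or stopped service rather than DNS, which matches the order of checks in the procedure.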

Troubleshooting FRS Event 13511

FRS event ID 13511 is logged when the FRS database is out of disk space.

To correct this situation, free up some disk space on the volume containing the FRS database. If this is not possible, then consider moving the database to a larger volume with more free space.

Troubleshooting FRS Event 13526

FRS event ID 13526 is logged when a domain controller becomes unreachable.

Event ID: 13526

The file replication service cannot replicate d:\Systemroot\sysvol\domain with the computer DC1 because the computer SID cannot be determined from the distinguished name "cn=dc1,ou=domain controller,dc=contoso,dc=com".

The file Replication Service will try later.

SYSVOL and DFS content are not being replicated.

For additional domain controllers, FRS replication failures can prevent the sharing of netlogon and sysvol shares, the application of policy, and the machine registering itself in the Active Directory as a domain controller.

For existing computers, files and folders in SYSVOL or DFS replica set are inconsistent between replica members.




This problem occurs because FRS polls Active Directory at regular intervals to read FRS configuration information. During the polling, an operation is performed to resolve the security identifier (SID) of an FRS replication partner. The binding handle might become invalid if the bound domain controller becomes unreachable over the network or restarts in a single polling interval (the default is five minutes).

To resolve this issue, restart FRS on the computer logging the error message.

Troubleshooting FRS Event 13548

FRS event ID 13548 is logged when the clocks of two replica partners have diverged.

Event ID: 13548

The File Replication Service is unable to replicate with its partner computer because the difference in clock times is outside the range of plus or minus 30 minutes.

The detected time difference is: XX minutes

This error could be caused by the selection of an incorrect time zone on the local computer or its replication partner.

Check that the time zone and system clock are correctly set on both computers. They must be within 30 minutes of each other, but preferably much closer.
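The time zone cause mentioned in the event text is easy to miss because both machines can display the same wall-clock time. The Python sketch below illustrates this with timezone-aware timestamps. The 30-minute tolerance comes from the event text; the function names are hypothetical, and how the two timestamps are obtained from each computer is left open.

```python
from datetime import datetime, timedelta, timezone

MAX_SKEW = timedelta(minutes=30)  # tolerance quoted in event 13548


def clock_skew(local_time, partner_time):
    """Absolute difference between two timezone-aware timestamps."""
    return abs(local_time - partner_time)


def within_frs_tolerance(local_time, partner_time, max_skew=MAX_SKEW):
    """True when the two clocks are no more than `max_skew` apart."""
    return clock_skew(local_time, partner_time) <= max_skew


# Same wall-clock reading, but the partner is set to a zone one hour ahead,
# so the two clocks are actually one hour apart.
local = datetime(2006, 5, 1, 12, 0, tzinfo=timezone.utc)
partner = datetime(2006, 5, 1, 12, 0, tzinfo=timezone(timedelta(hours=1)))
```

Here `within_frs_tolerance(local, partner)` is false even though both machines show 12:00, which is the situation the event text warns about.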

Troubleshooting FRS Event 13522

FRS needs adequate staging area space on both upstream and downstream computers in order to replicate files.

On Windows 2000 before SP3, event 13522 indicates that the FRS service has paused because the staging area is full. Replication will resume if staging space becomes available or if the staging space limit is increased.

On Windows 2000 before SP3, clear the replication backlog. There are four common reasons why the staging area may fill up:

• One or more downstream partners are not accepting changes. This could be a temporary condition due to the schedule being turned off and FRS waiting for it to open, or a permanent state because the service is turned off, or the downstream partner is in an error state.

• The rate of change in files exceeds the rate at which FRS can process them.

• There are no obvious changes made but the staging area is filling up anyway. This is typically the "excessive replication" issue described in "Troubleshooting FRS Event 13567."

• A parent directory for large numbers of changes is failing to replicate. So, all changes underneath it are blocked.




Often, there is a combination of these problems. That is, changes cannot be replicated and the number of change orders does not map to the number of changes made.

Troubleshooting FRS Event 13557

FRS event ID 13557 is logged when duplicate connections are detected between two members:

Event ID: 13557

The File Replication Service has detected a duplicate connection object between this computer "<Computer 1>" and a computer named "<Computer 2>".

This was detected for the following replica set:

"DOMAIN SYSTEM VOLUME (SYSVOL SHARE)"

To resolve this problem, it is necessary to delete duplicate connection objects between the direct replication partners that are noted in the event text.

Troubleshooting FRS Event 13567

Event 13567 in the FRS event log is generated on Windows 2000 Service Pack 3 and later systems when unnecessary file change activity is detected.

Unnecessary file change activity means that a file has been written by some users or applications, but no change is actually made to the file. FRS detects that the file has not changed, and maintains a count of how often this happens. If the condition is detected more than 15 times per hour during a three hour period, the FRS service logs the 13567 event.

Such events should be investigated to find the application or user that is modifying file content.

Reference: For more information, see the following Knowledge Base article: 315045 "FRS Event 13567 Is Recorded in the File Replication Service Event Log."

Troubleshooting FRS Event 13568

FRS event ID 13568 contains the following message:

The File Replication Service has detected that the replica set "1" is in JRNL_WRAP_ERROR.

If FRS processing falls behind the NTFS update sequence numbers (USN) journal, and if USN journal information that FRS needed has been discarded, then FRS enters a "journal wrap" condition. FRS then needs to rebuild its current replication state with respect to the file system and other replication partners.

Each file change on the NTFS volume occupies approximately 100 bytes in this journal (possibly more, depending on filename size). As a rule of thumb, the NTFS USN journal for an NTFS volume should be sized at 128 MB per 100,000 files being managed by FRS on that NTFS volume.




Prior to Windows 2000 SP3, the default journal size was 32 MB and the maximum journal size was 128 MB. In Windows 2000 SP3 and later, the default journal size is 128 MB, and the maximum journal size is 10 GB.

The journal size may be configured with a registry key, but keep in mind that once you increase it you should not lower it again since this will cause a “journal wrap.”
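The sizing rule of thumb above can be written as a short calculation. The Python sketch below is an illustration under stated assumptions: rounding up to whole 128 MB steps, treating 1 MB as 1,048,576 bytes, and the function names are choices made for this example. The 10 GB cap is the maximum quoted for Windows 2000 SP3 and later.

```python
MB = 1024 * 1024  # assumption: 1 MB counted as 1,048,576 bytes


def recommended_usn_journal_mb(file_count, max_mb=10 * 1024):
    """128 MB per 100,000 files managed by FRS, capped at the 10 GB maximum."""
    steps = max(1, -(-file_count // 100_000))  # ceiling division, at least one step
    return min(steps * 128, max_mb)


def changes_before_wrap(journal_mb, bytes_per_change=100):
    """Rough number of file changes a journal holds at about 100 bytes each."""
    return journal_mb * MB // bytes_per_change
```

For a volume with 250,000 files under FRS management, the sketch suggests a 384 MB journal. Because longer file names use more than 100 bytes per record, the second function gives an upper estimate rather than a guarantee.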

Journal Wrap Conditions

FRS can encounter journal wrap conditions in the following cases:

• Many files are added at once to a replica tree while FRS is very busy, starting up, or not running.

• On a server that is being used for authoritative restore, or as the primary for a new replica partner, there is a lot of file activity at the start of this process, and this can consume USN journal records. Following the rule of thumb mentioned above is sufficient to avoid this condition.

• An NTFS file system needs to be processed with Chkdsk and Chkdsk corrects the file system structure. In this case, NTFS creates a new NTFS USN journal for the volume or deletes the corrupt entries from the end of the journal.

• The NTFS USN journal is deleted or reduced in size.

• The NTFRS service is in an error state that prevents it from processing changes in the USN journal.

If FRS is experiencing journal wrap errors on a particular server, it cannot replicate files until the condition has been cleared. To continue replication, the administrator must stop FRS on that server and perform a non-authoritative restore of the data so that the system may synchronize with its replication partners. Note the following:

• Windows 2000 SP1 cannot perform this process automatically.
• In Windows 2000 SP2, FRS performs this process automatically.
• In Windows 2000 SP3 and later, FRS does not perform this process automatically.

The reason for this change was that the restore was typically being performed at times that were not planned by administrators. There is a registry setting available that allows FRS to perform the automatic non-authoritative restore, just as in Windows 2000 SP2. To enable it, set the "Enable journal wrap automatic restore" registry value to 1 under the following registry key: HKLM\System\Ccs\Services\Ntfrs\Parameters. However, Microsoft recommends leaving this as a manual process.

In addition to the event log entries listed above, the table in the following document lists other event types that might appear in relation to FRS.

Reference: For more information, see "Deployment, Monitoring and Troubleshooting of the Windows 2000 File Replication Service Using the SONAR, TOPCHK, CONSTAT and IOLOGSUM Tools" (Troubleshooting_frs.doc), included with the version of Sonar.exe that is available from the Free Tool Downloads link at http://www.microsoft.com/windows/reskits/webresources.




DFS Replication (DFSr)

• DFS Replication is the successor to the File Replication service (FRS)
• It's a new state-based, multi-master replication engine.
• Supports replication scheduling and bandwidth throttling.
• Already available on Windows Server 2003 R2 (only for DFS)
• DFS Replication will be available for Active Directory (SYSVOL) with Windows Server Codename "Longhorn"
• Enabled automatically at Windows Server "Longhorn" domain functional level
• Automated migration to DFS Replication, so SYSVOL can get all benefits and improvements
• DFS Replication uses a new compression algorithm known as Remote Differential Compression (RDC).
• RDC is a "diff over the wire" protocol, used to efficiently update files over a limited-bandwidth network. RDC detects insertions, removals, and re-arrangements of data in files, enabling DFS Replication to replicate only the deltas (changes).

The Distributed File System solution in Windows Server 2003 R2 helps administrators address new challenges by providing two technologies, DFS Namespaces and DFS Replication, which, when used together, offer simplified, fault-tolerant access to files, load sharing, and WAN-friendly replication.

• DFS Namespaces, formerly known as Distributed File System, allows administrators to group shared folders located on different servers and present them to users as a virtual tree of folders known as a namespace. A namespace provides numerous benefits, including increased availability of data, load sharing, and simplified data migration.

• DFS Replication, the successor to the FRS introduced in the Microsoft Windows 2000 Server operating systems, is a new state-based, multimaster replication engine that supports replication scheduling and bandwidth throttling. DFS Replication uses a new compression algorithm known as Remote Differential Compression (RDC). RDC is a “diff over the wire” protocol that can be used to efficiently update files over a limited-bandwidth network. RDC detects insertions, removals, rearrangements of data in files, enabling DFS Replication to replicate only the deltas (changes) when files are updated. DFS Replication is already available on Windows Server 2003 R2 for DFS namespaces.

DFS Replication will not be available for SYSVOL replication if you install Windows Server 2003 R2 on a domain controller. However, DFSr will be available on Windows Server codename "Longhorn" domain controllers and will be enabled automatically when the domain functional level is raised to the "Longhorn" level. SYSVOL migration to DFSr will be automated so that the domain receives all the benefits and improvements provided by DFSr.

Unlike FRS, which replicates the entire file even if only one byte is changed, DFS Replication uses RDC to replicate only the differences (or changes) between the members. This allows branch offices with slow WAN connections to participate in replication using minimal bandwidth.

RDC is especially efficient when small changes to large files are made. For example, a change to a 2-MB PowerPoint® presentation can result in only 60 KB being sent across the network, a 97 percent savings in bytes transferred. We recently ran a test on a mix of 780 Office files (.doc, .ppt, and .xls) replicating from a source server to a target server using DFS Replication with RDC. The target server had version X of the files and the source server had version X+, and the two versions differed with significant edits. The percent savings in bytes transferred was on average 50 percent and significantly better for large files.
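The 97 percent figure in the PowerPoint example follows directly from the two sizes quoted. The short Python sketch below works through the arithmetic; the function name is hypothetical and 1 KB is taken as 1,024 bytes.

```python
KB = 1024  # assumption: binary kilobytes


def percent_saved(full_bytes, sent_bytes):
    """Percentage of bytes not sent compared with transferring the whole file."""
    return 100.0 * (1 - sent_bytes / full_bytes)


# A 2-MB file for which only 60 KB crosses the network.
example = percent_saved(2 * 1024 * KB, 60 * KB)
```

The result is about 97.07, which rounds to the 97 percent savings stated above.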

Note that RDC might not be beneficial below a certain file size threshold and on high-speed LANs where network bandwidth is not limited. DFS Replication additionally compresses the bytes over the wire for further efficiency of transfer.




DFS Replication Benefits and Improvements

• Simplified process for replicating folders to the same set of servers: replication groups and replicated folders
• Differential replication of changes to files: RDC is especially efficient when small changes to large files are made. DFS Replication additionally compresses the bytes over the wire for efficiency
• Efficient and scalable replication algorithm
• Flexible scheduling and bandwidth throttling
• Supported in stand-alone and domain namespaces and on individual folders
• Self-healing after USN journal wraps and database corruption
• Support for seeding or pre-staging new servers
• New management tools
• Built-in health metrics and diagnostic events: WMI providers can report journal wraps, database loss, insufficient disk space

DFS Replication provides substantial improvements over the FRS. The benefits of using DFS Replication, as well as notable improvements over FRS, are described below.

Simplified process for replicating discrete folders to the same set of servers

The process of setting up replicated folders is simplified in Windows Server 2003 R2 by the introduction of replication groups and replicated folders. A replication group is a set of servers, known as members, that participates in the replication of one or more replicated folders. A replicated folder is a folder that is kept synchronized on each member. As data changes in each replicated folder, the changes are replicated across connections between the members. The connections between all members form the replication topology.

Creating multiple replicated folders in a single replication group simplifies the process of deploying replicated folders because the topology, schedule, and bandwidth throttling for the replication group are applied to each replicated folder. To deploy additional replicated folders, administrators use a brief wizard to define the local path and permissions for each new replicated folder. Each replicated folder also has its own settings, such as file and subfolder filters, so that administrators can filter out different files and subfolders for each replicated folder. In addition, you can configure replicated folders so that they are replicated to a subset of the members of a replication group.

The replicated folders stored on each member can be located on different volumes in the member, and the replicated folders do not need to be shared folders or part of a namespace, though the DFS Management snap-in makes it easy to share replicated folders and optionally publish them in an existing namespace.

Differential replication of changes to files

Thanks to RDC, DFS Replication replicates only the differences (or changes) between the two servers. As a result, bandwidth use during replication is minimized, an important consideration for branch offices that use low-bandwidth WAN connections to the hub office.
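The general idea of sending only differences can be illustrated with a toy example. The Python sketch below is a deliberately simplified stand-in, not the RDC algorithm: it assumes fixed-size chunks and SHA-256 digests, and all names are hypothetical. The receiver advertises the digests of the chunks it already holds, and the sender transmits literal bytes only for chunks the receiver lacks.

```python
import hashlib

CHUNK = 4  # tiny chunk size so the example is easy to follow


def chunks(data, size=CHUNK):
    """Split bytes into fixed-size pieces (the last piece may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def signatures(data, size=CHUNK):
    """Digests of every chunk the receiver already holds."""
    return {hashlib.sha256(c).digest() for c in chunks(data, size)}


def make_delta(new_data, receiver_signatures, size=CHUNK):
    """Describe new_data as references to known chunks plus literal bytes."""
    delta = []
    for c in chunks(new_data, size):
        digest = hashlib.sha256(c).digest()
        if digest in receiver_signatures:
            delta.append(("ref", digest))   # receiver already has these bytes
        else:
            delta.append(("data", c))       # only these bytes cross the wire
    return delta


def apply_delta(delta, old_data, size=CHUNK):
    """Rebuild the new file on the receiver from its old copy and the delta."""
    known = {hashlib.sha256(c).digest(): c for c in chunks(old_data, size)}
    return b"".join(known[v] if kind == "ref" else v for kind, v in delta)
```

With `old = b"AAAABBBBCCCCDDDD"` and `new = b"AAAAXXXXCCCCDDDD"`, the delta carries four literal bytes instead of sixteen. Fixed-size chunks cope poorly with insertions, because every later chunk boundary shifts; handling insertions, removals, and rearrangements, as the text describes for RDC, requires a more sophisticated chunking scheme than this sketch uses.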

Efficient and scalable replication

When two members of a replication group begin to synchronize with each other, they use an efficient algorithm for determining which files need to be replicated. The amount of metadata exchanged is minimal, and because the synchronization is state-based instead of event-based, the possibility of sending changes unnecessarily (due to the order the changes occur) is eliminated.

The introduction of state-based synchronization, along with RDC, allows DFS Replication to support replicating more files to more members than FRS. The scalability figures tested are as follows:

• Each server can be a member of up to 16 replication groups.
• Each replication group can contain up to 16 replicated folders.
• Each server can have up to 100 connections (for example, 50 incoming connections and 50 outgoing connections).
• On each server, the number of replication groups multiplied by the number of replicated folders multiplied by the number of connections must be kept to 200 or fewer.
• A replication group can contain up to 100 members.
• A volume can contain up to 500,000 replicated files.
• The largest tested file size is 4 GB. DFS Replication does not have any scalability issues with large files. When RDC is enabled, downloads can be resumed from where they were interrupted.
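A planned deployment can be compared against the tested figures listed above with a few lines of code. The Python sketch below uses the numbers exactly as quoted in this workbook; the dictionary keys, the function name, and the reading of "replicated folders" as folders per replication group are assumptions made for illustration.

```python
# Tested figures as quoted in the list above.
TESTED_LIMITS = {
    "groups_per_server": 16,
    "folders_per_group": 16,
    "connections_per_server": 100,
    "groups_x_folders_x_connections": 200,
    "members_per_group": 100,
    "files_per_volume": 500_000,
}


def limit_violations(groups, folders_per_group, connections,
                     members_per_group=0, files_per_volume=0):
    """Return the names of any tested limits that the given design exceeds."""
    observed = {
        "groups_per_server": groups,
        "folders_per_group": folders_per_group,
        "connections_per_server": connections,
        "groups_x_folders_x_connections": groups * folders_per_group * connections,
        "members_per_group": members_per_group,
        "files_per_volume": files_per_volume,
    }
    return [name for name, value in observed.items() if value > TESTED_LIMITS[name]]
```

The product rule is usually the binding constraint: a server in 4 groups of 10 folders with 10 connections passes every individual limit but gives a product of 400, so `limit_violations(4, 10, 10)` reports the combined limit.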

Flexible scheduling and bandwidth throttling

DFS Replication supports replication scheduling and bandwidth throttling in 15-minute increments during a seven-day period. When specifying a replication interval, administrators choose the start and stop times as well as the bandwidth to use during that interval. The settings for bandwidth usage range from 16 Kbps to 512 Mbps, as well as full (unlimited) bandwidth. Administrators can configure a default schedule and bandwidth that applies to all connections between members and optionally create a custom schedule and bandwidth for individual connections.
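A seven-day schedule in 15-minute increments amounts to 672 slots, each carrying a bandwidth setting. The Python sketch below models that grid as a plain list. It is an illustration only: the slot numbering, the "full" marker, the function names, and the conversion of 512 Mbps to 512,000 Kbps are assumptions and do not describe how DFS Replication stores its schedule.

```python
SLOTS_PER_DAY = 24 * 4               # four 15-minute slots per hour
SLOTS_PER_WEEK = 7 * SLOTS_PER_DAY   # 672 slots in a seven-day schedule
MIN_KBPS, MAX_KBPS = 16, 512_000     # 16 Kbps to 512 Mbps (1 Mbps taken as 1,000 Kbps)
FULL = "full"                        # hypothetical marker for unlimited bandwidth


def slot_index(day, hour, minute):
    """Map a day (0-6), hour (0-23), and quarter-hour minute to a slot number."""
    if not (0 <= day < 7 and 0 <= hour < 24 and minute in (0, 15, 30, 45)):
        raise ValueError("times must fall on a 15-minute boundary within the week")
    return day * SLOTS_PER_DAY + hour * 4 + minute // 15


def build_schedule(default, windows):
    """Return a 672-entry list; each window is (start_slot, stop_slot, bandwidth)."""
    schedule = [default] * SLOTS_PER_WEEK
    for start, stop, bandwidth in windows:
        if bandwidth != FULL and not MIN_KBPS <= bandwidth <= MAX_KBPS:
            raise ValueError("bandwidth outside the 16 Kbps to 512 Mbps range")
        for slot in range(start, stop):  # the stop slot itself is excluded
            schedule[slot] = bandwidth
    return schedule
```

For example, a default of full bandwidth with a 256 Kbps window from 09:00 to 17:00 on day 0 throttles replication during working hours and leaves the rest of the week unrestricted.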




Supported in stand-alone and domain-based namespaces and on individual folders

Unlike FRS, which is only supported in domain-based namespaces, DFS Replication can be used in both stand-alone and domain-based namespaces, as well as for folders that are not part of any namespace.

Self-healing after USN journal wraps and database corruption

DFS Replication provides "self-healing" for USN journal wraps and Jet database corruption. Although replication temporarily stops during this healing process, the service recovers without any administrator intervention. To repair itself, DFS Replication scans the file system and re-creates the DFS Replication database. The database must then be synchronized with a database on another member. During the synchronization process, the amount of metadata sent across the network is dictated by the number of files under the replicated folder's local path and the size of the metadata to be sent per file. The metadata size for a file is two times the length of the file name plus approximately 144 bytes. Additional RPC and TCP overhead results in a roughly 5 percent overhead. Therefore, in the worst case, for 1 million files in the database with an average file name size of 50 bytes, approximately 194 MB of metadata is sent across the network.
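The metadata estimate above can be reproduced with a short calculation. The Python sketch below assumes that the 50-byte average file name corresponds to 25 characters stored at two bytes each, which is an interpretation rather than something the text states. Under that reading, the per-file size is 194 bytes and 1 million files give the quoted 194 MB before the roughly 5 percent RPC and TCP overhead is added. The function names are hypothetical, and 1 MB is counted as 1,000,000 bytes.

```python
def metadata_bytes_per_file(name_chars):
    """Two times the file name length plus approximately 144 bytes."""
    return 2 * name_chars + 144


def sync_metadata_bytes(file_count, avg_name_chars, overhead_percent=5):
    """Return (bytes before overhead, bytes including RPC and TCP overhead)."""
    base = file_count * metadata_bytes_per_file(avg_name_chars)
    with_overhead = base * (100 + overhead_percent) // 100
    return base, with_overhead


# 1 million files whose names average 25 characters (50 bytes at 2 bytes each).
base, with_overhead = sync_metadata_bytes(1_000_000, 25)
```

Under these assumptions `base` is 194,000,000 bytes and `with_overhead` is 203,700,000 bytes, so the 194 MB figure appears to exclude the overhead.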

Support for seeding or pre-staging new servers

Before adding a new server to a replication group, administrators can prestage the replicated folders on the server by either copying the data to the server or restoring a backup. As described earlier, the synchronization process is very efficient in terms of bandwidth usage and metadata exchanged, resulting in minimal WAN traffic during the initial synchronization.

New management tools

Administrators can use the DFS Management snap-in to configure both DFS Namespaces and DFS Replication. The snap-in provides integration between the two Distributed File System components so that administrators can:

• Select an existing folder in a namespace and configure DFS Replication on the folder targets (shared folders) associated with the folder.

• Add a replicated folder to an existing namespace.

Built-in health metrics and diagnostic events

DFS Replication provides built-in Windows Management Instrumentation (WMI) providers for monitoring the health of DFS Replication. For example, the WMI providers can report USN journal wraps, database loss, insufficient disk space, network connectivity issues, sharing violations, excessive replication, and clock skew between members. These events are also reported in the DFS Replication event log, which is used exclusively for storing events related to replication. A Microsoft Operations Manager (MOM) management pack for monitoring DFS Replication is also available.




Module Summary

DFS Namespace
• FRS does the replication (Sysvol, Netlogon)

Check FRS Replication daily
• Settings stored in AD – depends on AD replication

Common Problems
• Journal Wrap, Large Backlog Files, Name Collisions (Morphed Folders), Excessive Replication, Sharing Violations, VVJoins

Tools
• Sonar, Ultrasound, FRSDiag, MOM, FRS Event Log, NTFRSUtl

FRS replicates information among all DFS members, so information is available and the same across all servers. DFS and FRS are used in Active Directory to store the SYSVOL structure, which is a DFS Namespace replicated to all domain controllers in a domain. It is very important to monitor FRS replication because all GPOs are stored there. FRS settings are stored inside the Active Directory Database, so Active Directory replication health is also important.

Most Common FRS Problems

The following common FRS problems were discussed in this module, as well as the steps to identify and address them:

• Journal Wrap
• Large Backlog Files
• Name Collisions (Morphed Folders)
• Excessive Replication
• Sharing Violations
• VVJoins




FRS Tools

Several tools are available to monitor and troubleshoot FRS, from snapshot problem resolution to proactive monitoring:

• Ultrasound
• Microsoft Operations Manager (MOM)
• Sonar
• FRSDiag
• FRS Event Log
• Ntfrsutl




Module 6: Group Policy Concepts and Troubleshooting












Module Overview

Introduction

Group Policy is an infrastructure used to deliver and apply one or more desired configurations or policy settings to a set of targeted users and computers within an Active Directory environment. This infrastructure consists of both client and server components, and because of factors such as the large number of policy settings available, the interaction between multiple policies, and inheritance options, Group Policy design can be complex.


• Describe the architecture dependencies of Group Policy in Active Directory.
• Name and describe the different components of a Group Policy Object (GPO).
• Name possible problems that can arise with GPOs, and describe the tools used in troubleshooting.




Section 1: Group Policy Concepts

Introduction In Microsoft® Active Directory®, Group Policy is used to define configurations for groups of users and computers. You can create a specific desktop configuration for a particular group of users or computers by using the Group Policy Microsoft Management Console (MMC) snap-in.


• Identify the components of Group Policy.
• Identify where Group Policy components are stored.

Related Topics Covered in This Lesson • Group Policy Management Console




Active Directory Group Policy

Provides Directory-based Configuration Management

Contained in Group Policy Objects (GPOs)

Consists of Both Client-side and Server-side Components

Can control: Software deployment, File, Registry, Security, Scripts

Active Directory Group Policy

Active Directory Group Policy Active Directory Group Policy provides directory-based desktop configuration management, enabling administrators to deliver and apply one or more desired configurations or policy settings to a set of targeted users and computers within an Active Directory environment. Through Group Policy IT administrators can implement standard computing environments for sets of users and computers, including settings for:

• Registry entries
• Security options
• Certificates
• Software
• Permissions on folders, files, registry keys, and services
• Running scripts, including Startup and Shutdown scripts and Logon and Logoff scripts

This infrastructure consists of both client-side and server-side components. On the client side, a Group Policy engine and multiple client-side extensions (CSEs) are responsible for writing specific policy settings on target client computers.

On the server side, policies are stored both as directory objects and in files in the sysvol directory on the domain controllers (DCs). Taken together these two components make up the virtual objects that are referred to as Group Policy Objects (GPOs).




Group Policy settings that are based on administrative templates are not persistent between sessions; they do not tattoo the registry. Many, but not all, Group Policy settings are undone, and default registry settings are restored, when a user logs off or the system is rebooted. GPO registry settings are written to the following two secure registry locations, and then removed when a Group Policy object no longer applies:

\Software\Policies

\Software\Microsoft\Windows\CurrentVersion\Policies

Note: Group Policy settings that are configured outside of the Administrative Templates section will remain in (will tattoo) the system, even when the policy that implemented the change no longer applies to the system.




Active Directory Integration of Group Policy

Group Policy Areas of Dependency

• Group Policy Objects and Active Directory Object Hierarchy
• GPO storage in Active Directory
• Group Policy replication dependencies
• GPO application based on account’s location and ACLs

(Diagram: Site, Domain, OU)

Active Directory Integration of Group Policy

The Group Policy mechanism is integrated with and relies on several Active Directory elements:

Active Directory objects: One part of a GPO consists of objects stored in Active Directory. These objects can be found in the policies container in the domain Naming Context of each domain in the forest.

Active Directory structure: Group Policy depends on Active Directory as the targeting framework that allows you to link GPOs to specific Active Directory containers, such as sites, domains, or organizational unit(s) (OUs). The location of the user or computer account and the GPOs linked to the site, domain, or OU(s) in which those accounts reside, determine the policies that are applied to the user or computer.

Active Directory security: In addition to using the Active Directory structure to target GPOs, a user or computer account must be present in the Access Control List (ACL) of the GPO and have correct permissions assigned (Read and Apply Group Policy), in order for the GPO to apply to that user or computer. Permissions on Group Policy objects, and the container objects to which they are linked, also determine which individuals are able to manage or administer Group Policy.

Active Directory replication: Because GPOs are implemented as both Active Directory objects and file system objects stored in Sysvol, Group Policy relies on both Active Directory and File Replication Service (FRS) for SYSVOL replication, to ensure that Group Policy remains consistent across the Active Directory environment.




GPO Storage in Active Directory

Group Policy Objects are viewed, created, and managed via GPMC and Group Policy Editor.

“Behind the curtains” each GPO consists of two data elements:

• Group Policy Container (GPC), stored in Active Directory
• Group Policy Template (GPT), stored in file system of DCs

Example GPO “Default Domain Policy”:

• GPC: CN=Policies,CN=System,DC=contoso,DC=com
• GPT: \\contoso.com\SYSVOL\Contoso.com\Policies

GPO Storage In Active Directory

GPO Storage in Active Directory Understanding where GPOs are stored and how they are structured can help you troubleshoot problems you might encounter when you implement Group Policy. Although Group Policy objects are created and managed through the MMC user interface as single, individual objects, behind the curtains each GPO consists of two distinct components that are stored in separate locations:

• The Group Policy container (GPC) - Stored in Domain > System > Policies
• The Group Policy template (GPT) - Stored in the Universal Naming Convention (UNC) path \\<DNS Domain Name>\SYSVOL\<DNS Domain Name>\Policies

It is useful to think of any GPO as having an Active Directory portion (the GPC) and a file system portion (the GPT). Key attributes of these objects link these two data elements together to define a single GPO.

When troubleshooting Group Policy issues, it is important to understand this concept, to be familiar with the locations of both sets of data, to understand what kind of information is contained in each, and to understand how the information is used in applying Group Policy to a user or computer account.




Group Policy Container Characteristics

• AD Object stored in Domain > System > Policies container
• DN: CN={<GUID>},CN=Policies,CN=System,DC=contoso,DC=com
• Named by GUID rather than friendly name
• Attributes contain “metadata” relative to the GPO: Version, Display Name, etc.
• Replicated to other domain controllers via Active Directory Replication

Group Policy Container Characteristics (GPC)

Group Policy Container A Group Policy Container represents the GPO in Active Directory, and it stores its properties, which can include both computer and user Group Policy information. The Policies container is the default location of the GPCs. The path to the Policies container, in Lightweight Directory Access Protocol (LDAP) syntax, is CN=Policies,CN=System,DC=Domain_Name,DC=Domain_Name, where the DC=Domain_Name components are the labels of the domain's DNS name (its fully qualified domain name, or FQDN), one DC= component per label.

For example, here is the LDAP path to the Default Domain Policy GPC in the contoso.com domain:

CN={31B2F340-016D-11D2-945F-00C04FB984F9},CN=Policies,CN=System,DC=contoso,DC=com

In this example, CN={31B2F340-016D-11D2-945F-00C04FB984F9} is the Group Policy Container for the Default Domain Policy. This GPC lives in the Policies container, which is in the System container.

The Group Policy container has attributes that help describe how to deploy GPOs to the domain, OUs, and sites within the domain, and has a link to the file system component of a GPO, which is the Group Policy template.
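As a rough illustration of how the two halves of a GPO are addressed, the following Python sketch builds the GPC distinguished name and the GPT UNC path from a DNS domain name and the GPO's GUID-style name. The function names are hypothetical and exist only for this example; the path layouts are the ones given in the text.

```python
def domain_to_dn(dns_domain: str) -> str:
    """Turn a DNS domain name such as 'contoso.com' into 'DC=contoso,DC=com'."""
    return ",".join(f"DC={label}" for label in dns_domain.split("."))


def gpc_dn(dns_domain: str, gpo_name: str) -> str:
    """Distinguished name of the Group Policy container (Active Directory part)."""
    return f"CN={gpo_name},CN=Policies,CN=System,{domain_to_dn(dns_domain)}"


def gpt_unc_path(dns_domain: str, gpo_name: str) -> str:
    """UNC path of the Group Policy template folder (SYSVOL part)."""
    # Joining with two leading empty strings yields the leading '\\' of a UNC path.
    return "\\".join(["", "", dns_domain, "SYSVOL", dns_domain, "Policies", gpo_name])


default_domain_policy = "{31B2F340-016D-11D2-945F-00C04FB984F9}"
print(gpc_dn("contoso.com", default_domain_policy))
print(gpt_unc_path("contoso.com", default_domain_policy))
```

Both values are derived from the same GUID-style name, which is why a GPC can be matched to its GPT folder by name alone.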




Some of the information in a Group Policy container includes:

• Version information: Ensures that the information is synchronized with the Group Policy template information.

• Status information: Indicates whether the user or computer portion of the GPO is enabled or disabled.

• List of components: Lists extensions that have settings in the GPO. These attributes are gPCMachineExtensionNames and gPCUserExtensionNames.

• File system path: Specifies the UNC path to the Sysvol folder. This attribute is gPCFileSysPath.

• Functionality version: Gives the version of the tool that created the GPO. This attribute is gPCFunctionalityVersion.

• WMI filter: Contains the distinguished name of the Windows Management Instrumentation (WMI) filter. This attribute is gPCWQLFilter.

As you can see, the GPC is an object that contains general information about, rather than specific policy settings for, the GPO.

There are three identifiers that are common to every GPC:

• displayName: the name assigned to the GPO, as displayed in GPEdit.

• objectGUID: the actual Active Directory object Globally Unique Identifier (GUID) of the GPO.

• Name: a unique numeric GUID that identifies the folder under SYSVOL/Policies in which the GPT (the file system portion of the GPO) can be found. This GUID name is a globally unique identifier, but it is not related to the object GUID of the GPO. It is used to uniquely identify the storage location of the associated GPT in the file system. It is used in path names that refer to the SYSVOL/Policies directory that contains the GPT.




GPC Characteristics (cont’d)

• GroupPolicyContainer Subcontainers
• Group Policy Container-Related Attributes of Domain, Site, and OU Containers: gPLink, gPOptions
• Managing Group Policy Links for a Site, Domain, or OU

GPC Characteristics (cont’d)

GroupPolicyContainer Subcontainers Within each GroupPolicyContainer there are a series of subcontainers, the first levels of which are User and Machine. These two containers are used to separate some User-specific and Computer-specific Group Policy components.

GPC-Related Attributes on Domain, Site, and OU Containers Domain, site, and OU container objects contain two optional Group Policy container-related attributes, gPLink and gPOptions. The gPLink property contains the prioritized list of GPOs, and the gPOptions property contains the Block Policy Inheritance setting.

The gPLink attribute holds a list of all GPOs that are linked to the container, and a number for each listed Group Policy container that represents the Enforced (previously called No Override) and Disabled option settings. The list appears in priority order, from lowest to highest priority GPO.

Note: The gPLink attribute of the Site, Domain, or OU points to the GPC, and the GPC points to the UNC path of the GPT.
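To make the structure of a gPLink value more concrete, here is a minimal Python sketch that splits one into its individual links. It assumes the value is a sequence of bracketed entries of the form [LDAP://<GPC DN>;<number>], and that the number is a bit field in which 1 means the link is disabled and 2 means it is enforced. Neither detail is spelled out in the text above, so treat both as assumptions to verify against your own directory. The names GpoLink and parse_gplink are hypothetical.

```python
import re
from typing import NamedTuple


class GpoLink(NamedTuple):
    gpc_dn: str      # distinguished name of the linked Group Policy container
    disabled: bool   # assumed: bit value 1 = link disabled
    enforced: bool   # assumed: bit value 2 = Enforced (No Override)


# Assumed entry layout: [LDAP://<distinguished name>;<options number>]
_LINK_RE = re.compile(r"\[LDAP://([^;\]]+);(\d+)\]", re.IGNORECASE)


def parse_gplink(value: str) -> list[GpoLink]:
    """Return the links in stored order (lowest to highest priority)."""
    links = []
    for dn, options in _LINK_RE.findall(value):
        bits = int(options)
        links.append(GpoLink(dn, bool(bits & 1), bool(bits & 2)))
    return links


sample = (
    "[LDAP://cn={AAAA},cn=policies,cn=system,DC=contoso,DC=com;0]"
    "[LDAP://cn={BBBB},cn=policies,cn=system,DC=contoso,DC=com;2]"
)
for link in reversed(parse_gplink(sample)):  # highest priority first
    print(link)
```

The {AAAA} and {BBBB} names are placeholders, not real GPO GUIDs.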

The gPOptions attribute holds an integer value that indicates whether the Block Policy Inheritance option of a domain or OU is enabled (1) or disabled (0).




Managing Group Policy Links for a Site, Domain, or OU To manage GPO links to a site, domain, or OU, you must have read and write access to the gPLink and gPOptions properties. By default, Domain Admins have this permission for domains and organizational units, and only Enterprise Admins and Domain Admins of the forest root domain can manage links to sites. Active Directory supports security settings on a per-property basis. This means that a non-administrator can be delegated read and write access to specific properties. In that case, if non-administrators have read and write access to the gPLink and gPOptions properties, they can manage the list of GPOs linked to the site, domain, or OU for which they have access.




Group Policy Template Characteristics (GPT)

• Stored in GUID-named folder under SYSVOL\Policies
• Folder name GUID matches GPC “name” GUID
• Location stored as “gPCFileSysPath” attribute of GPC object. gPCFileSysPath: \\contoso.com\sysvol\contoso.com\Policies\{31B2F340-016D-11D2-945F-00C04FB984F9}
• Relies on FRS replication
• Contains:

o .ADM files - Display available registry settings in Group Policy Editor

o .POL files - Store selected registry settings

o CSE-specific data - Different data formats per CSE

Group Policy Template Characteristics (GPT)

The majority of Group Policy settings are stored in the file systems of the domain controllers. This part of each GPO is known as the Group Policy Template. The GroupPolicyContainer object for each GPO has an attribute, gPCFileSysPath, which contains the UNC path to its related Group Policy template.

All Group Policy templates for a domain are stored in the \\domain_name\Sysvol\domain_name\Policies folder, where domain_name is the FQDN of the domain. The Group Policy template, for the most part, stores the actual data for the policy extensions. For example, it would store the Security Settings .inf file, the Administrative Template-based policy settings .adm and .pol files, the applications available for the Group Policy Software installation extension, and, potentially, scripts.

The Gpt.ini File The Gpt.ini file contains the GPO version number of the GPT, and is located at the root of each Group Policy template.

An example of its contents would be:

[General]

Version=65539

Normally, this is identical to the versionNumber attribute of the corresponding GroupPolicyContainer object. It is encoded in the same way: as the decimal representation of a 4-byte (32-bit) number, the upper two bytes of which contain the GPO user settings version, and the lower two bytes of which contain the computer settings version.




In this example, the decimal value 65539 is equal to 10003 hexadecimal (1 × 65536 + 3), giving a user settings version of 1 and a computer settings version of 3.
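The decoding can be sketched in a few lines of Python. This is only an illustration of the arithmetic described here; the helper names are hypothetical, and the Gpt.ini text is passed in as a string rather than read from SYSVOL.

```python
import configparser


def read_gpt_ini_version(ini_text: str) -> int:
    """Extract the Version value from the [General] section of Gpt.ini text."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return parser.getint("General", "Version")


def split_gpo_version(version: int) -> tuple[int, int]:
    """Return (user settings version, computer settings version)."""
    user = (version >> 16) & 0xFFFF   # upper two bytes
    computer = version & 0xFFFF       # lower two bytes
    return user, computer


version = read_gpt_ini_version("[General]\nVersion=65539\n")
print(hex(version))                # 0x10003
print(split_gpo_version(version))  # (1, 3)
```

The same split applies to the version number stored on the Group Policy container, because both values use the same encoding.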

The CSEs will check the version number to see if the client is out of date from the last processing of policy settings or if the currently applied policy settings (cached policies) are up to date. If the cached version is different from the version in the Group Policy template or Group Policy container, then policy settings will be reprocessed.

Group Policy Template Subfolders The Group Policy template folder contains the following subfolders:

• Machine: Includes a Registry.pol file that contains the registry settings to be applied to computers. When a computer initializes, this Registry.pol file is downloaded and applied to the HKEY_LOCAL_MACHINE portion of the registry. Depending on the contents of the GPO, the Machine folder can contain the following subfolders:

o Scripts\Startup: Contains the scripts that are to run when the computer starts up.

o Scripts\Shutdown: Contains the scripts that are to run when the computer shuts down.

o Applications: Contains the advertisement files (.aas files) used by the Windows installer.

o Microsoft\Windows NT\Secedit: Contains the Gpttmpl.inf file, which includes the default security configuration settings for a Windows Server 2003 domain controller.

o Adm: Contains all of the .adm files for the GPO.

• User: Includes a Registry.pol file that contains the registry settings to be applied to users. When a user logs on to a computer, this Registry.pol file is downloaded and applied to the HKEY_CURRENT_USER portion of the registry. Depending on the contents of the GPO, the User folder can contain the following subfolders:

o Applications: Contains the advertisement files (.aas files) used by the Windows installer.

o Documents and Settings: Contains the Fdeploy.ini file, which includes status information about the Folder Redirection options for the current user’s special folders.

o Microsoft\RemoteInstall: Contains the OSCfilter.ini file, which holds user options for operating system installation through Remote Installation Services.

o Microsoft\IEAK: Contains settings for the Internet Explorer Maintenance snap-in.

o Scripts\Logon: Contains all the user logon scripts and related files for this GPO.




o Scripts\Logoff: Contains all the user logoff scripts and related files for this GPO.

The User and Machine folders are created at install time, and the other folders are created, as needed, when policy is set.

Group Policy Object Editor Use of SYSVOL Each policy setting that is changed in a GPO causes at least two files to be rewritten, the GPT.ini and the file that is holding the changed setting. Making many changes to a GPO can cause a lot of network traffic, as Sysvol replicates these changes. This congestion should only occur on a local area network, where Sysvol replication occurs frequently. Across wide area network links, the inter-site replication schedule will cause these changes to be amalgamated into a smaller amount of traffic (for example, four changes to the Registry.pol file will result in only a single file replication).




GPO Synchronization

GPT and GPC should be in sync for GPO to be applied consistently:

• GPC stores version number in “versionNumber” attribute
• GPT stores version number in GPT.INI file under SYSVOL\<DNSDOMAIN>\Policies\GUID_Named_Folder
• Version numbers also cached locally on client
• Version numbers incremented when GPO is edited
• Policy refresh occurs at reboot, logon, or configured background refresh interval
• If version number same, typically no update; however, each CSE has its own configurable refresh option

GPO Synchronization

GPC and GPT Synchronization Because the GPC and GPT rely on different mechanisms to replicate their data among Domain Controllers, replication and synchronization status of this data is a key consideration in troubleshooting. Some key points to remember are:

• The GPC is an Active Directory object, dependent upon Active Directory Replication. The current version number is stored in the VersionNumber attribute.

• The GPT is a File System object, stored in the SYSVOL tree and dependent upon FRS for replication. The version number is stored in the GPT.INI file.

There are several potential problem situations in Active Directory and in a networked environment that might prevent GPO edits from being written consistently to both the GPC and GPT, and that might cause version numbers to get out of synch. There is always some latency in both FRS and Active Directory replication, but if inconsistent version numbers persist on a particular Domain Controller, it could indicate that there is a problem with one of those replication mechanisms.

The version numbers are updated whenever an Administrator edits a GPO. If Active Directory replication and FRS are functioning correctly, changes based on the Administrator’s edits will be written to both the GPC and GPT, and the version numbers of each will be updated and then replicated.
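One way to reason about synchronization problems is to compare, for a single GPO, the GPC and GPT version numbers seen on each domain controller. The sketch below assumes those numbers have already been collected by some other means; the function name and the wording of the report are hypothetical. It flags controllers whose two numbers differ or lag behind the highest version seen anywhere.

```python
def check_gpo_sync(observations: dict[str, tuple[int, int]]) -> dict[str, list[str]]:
    """observations maps a DC name to (GPC version, GPT version) for one GPO."""
    newest = max(max(pair) for pair in observations.values())
    report = {}
    for dc, (gpc, gpt) in sorted(observations.items()):
        problems = []
        if gpc != gpt:
            problems.append("GPC and GPT versions differ")
        if gpc < newest:
            problems.append("GPC behind newest version (check Active Directory replication)")
        if gpt < newest:
            problems.append("GPT behind newest version (check FRS/SYSVOL replication)")
        if problems:
            report[dc] = problems
    return report


observed = {"DC1": (65539, 65539), "DC2": (65539, 65538)}
for dc, problems in check_gpo_sync(observed).items():
    print(dc, problems)
```

Because some replication latency is normal, a single mismatch is not conclusive; a mismatch that persists across repeated checks is the signal worth investigating.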




As a rule, Policy Refresh occurs at reboot, logon, or at the configured background refresh interval. If no edits have been made to the GPO since the last refresh, the version number will remain the same.

Note: Before Windows 2000 Service Pack 2 was released, the version numbers in the GPT and GPC needed to be in sync or the client would not apply the policy. This is no longer the case.




Client-Side Components and Processes

• Group Policy Engine and CSEs
• Userenv
• Client-Side Extensions

Client-Side Components and Processes

Group Policy Engine The Group Policy engine is the infrastructure that processes Group Policy components, including server-side extensions and client-side extensions. The Group Policy engine is embedded in userenv.dll, which runs inside of Winlogon.exe, as shown in Figure 1:

Figure 1: Group Policy Engine Architecture and CSE Components




Group Policy Engine Processes Client-Side Extensions Client-side extensions are those components running on the client system that process and apply the Group Policy settings to that system. There are a number of extensions that are preinstalled in Windows Server 2003. Other Microsoft applications and third party application vendors can also write and install additional extensions to implement Group Policy management of these applications.

Group Policy Engine Architecture and CSE Components

• Group Policy Engine: The framework that handles functionalities across CSEs; the Group Policy engine runs inside userenv.dll.

• Winlogon.exe: A component of the Windows operating system that provides interactive logon support, Winlogon is the service in which the Group Policy engine runs. Winlogon is the only system component that actively interacts with the Group Policy engine.

• userenv.dll: Runs inside Winlogon and contains the Group Policy engine and the Administrative Templates extension.

• gptext.dll: Used to configure Scripts, IP Security, QoS Packet Scheduler, and Wireless settings.

• fdeploy.dll: Used to configure folder redirection.

• scecli.dll: Used to configure security settings.

• iedkcs32.dll: Used to manage various Internet Explorer settings.

• appmgmts.dll: Used to configure software installation settings.

• dskquota.dll: Used for setting disk quotas.

Table 1: Group Policy Engine Architecture and CSE Components




Application of Group Policy

UserEnv process locates and applies GPOs:

• Searches hierarchy of “containing objects” from location of User Account to domain root
• “Container Object Hierarchy” determines which GPOs are “inherited” and applied to User or Computer Accounts
• Security and WMI Filtering evaluated
• “Block Inheritance” and “no override” flags are considered

Determining which CSEs to call

Application of Group Policy

How Group Policy is Applied The userenv process on the client carries out the function of locating and applying GPOs at startup, logon, and at the configured Policy Refresh Interval. There are two primary milestones that the Group Policy engine uses for GPO processing:

• Creation of the list of GPOs targeted at the user or computer.
• Invoking the relevant CSEs to process those policy settings within the GPO list that are relevant to them.

The following steps are required to reach the first milestone in GPO processing, GPO list creation:

1. Query the Active Directory for the gPLink and gPOptions properties in the site and domain hierarchies to which the user or computer object belongs.

2. Query the Active Directory for the GroupPolicyContainer objects referenced in the gPLink properties.

3. Evaluate security filtering to determine whether or not the user or computer has the Apply Group Policy access permission to the GPO (permissions will be discussed below).

4. Evaluate the WMI query against the WMI repository on the client computer to determine whether or not the computer meets the query requirements.
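The filtering in steps 3 and 4 can be pictured with a small Python sketch. It is a simplified model, not the engine's actual implementation: the CandidateGpo class is hypothetical, permissions are reduced to two booleans, and the WMI filter is represented by an ordinary function evaluated against a dictionary of facts about the client.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CandidateGpo:
    name: str
    can_read: bool    # account has Read permission on the GPO
    can_apply: bool   # account has Apply Group Policy permission on the GPO
    wmi_filter: Optional[Callable[[dict], bool]] = None  # stand-in for a WMI query


def build_gpo_list(candidates: list[CandidateGpo], client_facts: dict) -> list[str]:
    """Keep only the GPOs that pass security filtering and WMI filtering."""
    result = []
    for gpo in candidates:
        if not (gpo.can_read and gpo.can_apply):
            continue  # step 3: security filtering
        if gpo.wmi_filter is not None and not gpo.wmi_filter(client_facts):
            continue  # step 4: WMI filter did not match this client
        result.append(gpo.name)
    return result


candidates = [
    CandidateGpo("Default Domain Policy", True, True),
    CandidateGpo("Admins Only", True, False),
    CandidateGpo("XP Desktops", True, True, lambda facts: facts.get("os") == "Windows XP"),
]
print(build_gpo_list(candidates, {"os": "Windows XP"}))
```

A GPO without a WMI filter passes step 4 automatically, which matches the common case where no filter is linked.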




Once the GPO list is created, the Group Policy engine and the CSEs work together to process the settings in the Group Policy template. The steps that are required to determine which CSEs to call are:

1. Retrieve the list of CSEs registered with Winlogon.

2. Check to see whether it is appropriate to run a particular CSE (for example, whether background processing or slow link processing is enabled for the extension).

3. Check the CSE history against the list of Applied GPOs. GPOs with new version numbers and GPOs that have settings relevant to the CSE (that is, they have the CSE extension GUID in the Group Policy container gPCUserExtensionNames or gPCMachineExtensionNames attributes) are added to the Changed GPO List. GPOs that are no longer in the Applied GPO List are added to the Deleted GPO List.

4. Check to see whether the appropriate CSE should be processing policy settings for the user or the computer.

5. Check the version number listed in the GPO against its recorded version history in the registry to determine whether the GPO needs reprocessing.

The version number that is stored in the Gpt.ini allows the CSEs to check whether or not the client is out of date from the last processing of policy settings or if the currently applied policy settings (cached policies) are up to date. If the cached version is different from the version in the Group Policy template or the Group Policy container, then policy settings will be reprocessed.

If all of the version numbers are unchanged, the MaxNoGPOListChanges interval might have expired; if so, the CSE processes policy settings, without regard to an unchanged version number.

Steps 3 through 5 are repeated by each CSE for all GPOs in the GPO list. After one CSE is done, the next CSE that needs to run repeats the entire process.
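The changed and deleted lists described above can be sketched as a comparison between what a CSE processed last time and what applies now. This is a simplified model with hypothetical names; the force flag stands in for any condition that makes the CSE process settings despite unchanged version numbers (for example, an expired MaxNoGPOListChanges interval).

```python
def plan_cse_work(
    history: dict[str, int],
    applied: dict[str, int],
    force: bool = False,
) -> tuple[list[str], list[str]]:
    """Return (changed GPOs, deleted GPOs) for one client-side extension.

    history: GPO name -> version this CSE processed last time.
    applied: GPO name -> current version, for GPOs with settings relevant to this CSE.
    """
    changed = [name for name, version in applied.items()
               if force or history.get(name) != version]
    deleted = [name for name in history if name not in applied]
    return changed, deleted


history = {"Baseline": 65537, "Desktop": 131074, "Old Kiosk": 65536}
applied = {"Baseline": 65537, "Desktop": 131075, "New Lab": 65536}
print(plan_cse_work(history, applied))
print(plan_cse_work(history, applied, force=True))
```

In the first call only the edited GPO and the newly applicable GPO are reprocessed; in the second, every applicable GPO is.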

Group Policy updates are dynamic and occur at specific intervals. If there have been no changes to Group Policy, the client computer still forces a refresh of the security policy settings every 16 hours plus a randomized delay of up to 30 minutes.




CSE Operation

CSEs called by Winlogon during:

• Computer startup
• User logon
• Refresh intervals

CSE Policy options:

• Allow processing across a slow network connection
• Do not apply during periodic background processing
• Process even if the Group Policy Objects have not changed

Administrative templates processed first

CSE Operation

Client-Side Extension Operation CSEs are called by the Winlogon process at computer startup, at user logon, and at Group Policy refresh intervals. CSEs are registered with Winlogon in the registry. This registration information includes a DLL and a DLL entry point (function call), by which the CSE processing can be initiated. The Winlogon process uses these to trigger Group Policy processing.

Each extension can opt not to perform processing at any of these points (for example, by setting the Do not apply during periodic background processing option).

Preferences and Policy Configuration For each of the Group Policy Client-side extensions, there is a GPO setting that can be used to manipulate its behavior. These can be found with the Group Policy Object Editor in the following location:

Computer Configuration\Administrative Templates\System\Group Policy

The computer policy options are:

• Allow processing across a slow network connection: When a Client-side extension registers itself with the operating system, it sets values in the registry, specifying whether it should be called when policy is being applied across a slow link. Some extensions move large amounts of data, so processing across a slow link can affect performance (for example, consider the time involved in installing a large application file across a 28.8 Kbps modem line).




Note: The values that the Client-side extension puts in the registry are considered to be preferences. However, if an administrator decides that the Client-side extension should run across a slow link regardless of the amount of data, the administrator can enable this policy.

• Do not apply during periodic background processing: Computer policy is applied at boot time, and then, in the background, approximately every 90 to 120 minutes thereafter. User policy is applied at user logon, and then approximately every 90 minutes after that. Some extensions can process policy only during the initial run, because it is risky to do it in the background. For example, Software Installation application upgrades are installed during the initial run and are not installed in the background. If installation were done in the background, a user could be running an application while the application is uninstalled and a new version of the application is installed. The application could also have a shared component that is in use by another application. Both of these situations could prevent the installation from completing successfully. The Do not apply during periodic background processing option gives the administrator the ability to override this logic and force the extension to either run or not run, in the background. Domain controller policy refreshes every five minutes.

• Process even if the Group Policy Objects have not changed: By default, if the GPOs on the server have not changed, the system will not continually reapply them to the client (since the client should already have all the settings). However, users might be able to change some policies and settings, if they are administrators of their computers. In this case, it might make sense to reapply these settings during logon or during the periodic refresh cycle, to get the computer back to the desired state. For example, assume that you have used Group Policy to define a specific set of security options for a file. Then the user (with administrative privileges) logs on and changes the options. The Group Policy administrator might want to set the policy to process Group Policy, even if the GPOs have not changed, so that the security is reapplied at every boot. This also applies to applications. Group Policy installs an application, but the user can remove the application or delete the icon. The Process even if the Group Policy Objects have not changed option gives the administrator the ability to restore the application at the next user logon.

Order of Extension Processing Administrative Templates policy settings are always processed first. Other extensions are processed in an indeterminate order.




Group Policy Processing Rules

GPOs processed in the following order:

1. Local GPO
2. GPOs linked to the client’s site
3. GPOs linked to the client’s domain
4. GPOs linked to the user/computer’s OU

• Link Order
• Enforcing GPOs
• Blocking Inheritance

Group Policy Processing Rules

GPOs that apply to a user or a computer do not all have the same precedence. Settings that are applied later can override settings that are applied earlier.

Local Policy (if present) is applied first. Next, userenv searches the Active Directory container object hierarchy for linked GPOs, starting at the outermost level, and searching down the directory tree. The result is that Group Policy settings are processed in the following order:

1. Local Group Policy object: Each computer has exactly one Group Policy object that is stored locally. The Local GPO is processed for both computer and user Group Policy settings.

2. Site: Any GPOs that have been linked to the site that the computer belongs to are processed next. Processing is in the order that is specified by the administrator on the Linked Group Policy Objects tab for the site, in the Group Policy Management Console (GPMC). The GPO with the lowest link order is processed last, and, therefore, has the highest precedence.

3. Domain: Processing of multiple domain-linked GPOs is in the order specified by the administrator on the Linked Group Policy Objects tab for the domain, in GPMC. The GPO with the lowest link order is processed last, and, therefore, has the highest precedence.




4. Organizational units: GPOs that are linked to the organizational unit that is highest in the Active Directory hierarchy are processed first, then GPOs that are linked to its child organizational unit are processed, and so on. Finally, the GPOs that are linked to the organizational unit that contains the user or computer are processed.

This order means that the local GPO is processed first, and GPOs that are linked to the organizational unit of which the computer or user is a direct member are processed last; this overwrites the settings in the earlier GPOs if there are conflicts. (If there are no conflicts, the earlier and later settings are merely aggregated.)

This also means that a policy could be set at the domain level, reversed at the OU level, reversed again at the child OU level, and so forth.

Conflict resolution between policies applies to individual settings, not to entire GPOs. It could easily happen that one setting in a GPO encounters a conflict but all other settings in that GPO are applied.

Where there are conflicts in settings between these GPOs, the last GPO applied wins.
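The "last GPO applied wins" rule, applied per setting rather than per GPO, can be illustrated with a short Python sketch. The GPO names and settings are invented for the example, and Enforce and Block Inheritance are deliberately left out of this simple model.

```python
def resolve_settings(gpos_in_processing_order):
    """Merge settings per key; a later GPO overwrites an earlier one on conflict.

    gpos_in_processing_order: list of (GPO name, {setting: value}) pairs,
    ordered Local, Site, Domain, then OUs from highest parent to nearest.
    Returns {setting: (value, name of the GPO that supplied it)}.
    """
    effective = {}
    for gpo_name, settings in gpos_in_processing_order:
        for key, value in settings.items():
            effective[key] = (value, gpo_name)  # the later writer wins
    return effective


order = [
    ("Local GPO", {"wallpaper": "local.bmp", "screensaver": "off"}),
    ("Site GPO", {"proxy": "proxy.site.example"}),
    ("Domain GPO", {"wallpaper": "corp.bmp"}),
    ("OU GPO", {"wallpaper": "sales.bmp"}),
]
for setting, (value, source) in resolve_settings(order).items():
    print(f"{setting} = {value} (from {source})")
```

Only the conflicting wallpaper setting is overridden; the non-conflicting settings from the Local and Site GPOs are simply aggregated into the result.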

Note: GPO precedence, inheritance, and related scope of management issues are common causes of unexpected GPO values applying to clients. Therefore it is critical to understand the order in which GPOs are applied and how conflicts between settings are resolved.

Other processing rules that can affect the final outcome of the policy include:

• Link Order: At the level of each organizational unit in the Active Directory hierarchy, one, many, or no GPOs can be linked. If several GPOs are linked to an organizational unit, their processing is in the order that is specified by the administrator on the Linked Group Policy Objects tab for the organizational unit in GPMC. The GPO with the lowest link order is processed last, and, therefore, has the highest precedence.

• Link Disabled: If a Group Policy link is disabled, the GPO will not apply to users or computers within the container to which that GPO is linked.

• Enforce: The Enforce setting is a property of the link between an Active Directory container and a GPO. It is used to force that GPO to all Active Directory objects within a container, no matter how deeply they are nested. The settings within a GPO that is enforced override other settings that would prevail because they are applied later. If there are conflicting settings in GPOs that are enforced at two levels of the hierarchy, the setting enforced furthest from the client prevails. This is a reversal of the usual rule, in which the setting from the nearest-linked GPO would prevail.




• Block Inheritance: The Block Inheritance setting applies to an entire Active Directory container. It blocks the inheritance of all GPOs, except those for which the link from the parent Active Directory object to the GPO has the Enforce setting enabled. Administrators who have set Block Inheritance on their domain or OU can still make explicit links to GPOs elsewhere in the domain, including to GPOs that might otherwise be inherited. (Domains do not inherit GPOs from parent domains.) When Block Inheritance is applied at a domain level, it blocks GPOs that are linked to sites.
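The combined effect of these four rules can be sketched as follows. This is a simplified model under stated assumptions: the `Link` and `Container` classes and all GPO names are hypothetical, sites and the local GPO are left out, and the containers are passed from the top of the hierarchy down to the container that holds the user or computer.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    gpo: str
    order: int              # 1 = highest precedence within its container
    enabled: bool = True
    enforced: bool = False


@dataclass
class Container:
    name: str
    links: list = field(default_factory=list)
    block_inheritance: bool = False


def processing_order(containers):
    """Return GPO names in processing order; the last one has highest precedence.

    containers: top of the hierarchy first, the client's own container last.
    """
    normal, enforced = [], []
    for depth, container in enumerate(containers):
        # Block Inheritance on any container closer to the client blocks this one.
        blocked = any(c.block_inheritance for c in containers[depth + 1:])
        # Within a container, the link with order 1 is processed last.
        for link in sorted(container.links, key=lambda l: l.order, reverse=True):
            if not link.enabled:
                continue                      # Link Disabled: never applies
            if link.enforced:
                enforced.append((depth, link.gpo))
            elif not blocked:
                normal.append(link.gpo)
    # Enforced links win over everything else, and the enforced link furthest
    # from the client wins over nearer ones, so it is placed last (stable sort).
    enforced.sort(key=lambda item: item[0], reverse=True)
    return normal + [gpo for _, gpo in enforced]


hierarchy = [
    Container("contoso.com", [Link("Domain Baseline", 1, enforced=True),
                              Link("Domain Desktop", 2)]),
    Container("Departments", [Link("Dept Policy", 1),
                              Link("Old Policy", 2, enabled=False)]),
    Container("Support", [Link("Support Policy", 1)], block_inheritance=True),
]
print(processing_order(hierarchy))
```

With Block Inheritance set on the Support OU, only the OU's own GPO and the enforced domain GPO remain, and the enforced GPO is processed last.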

Regardless of whether it is part of a domain or is a stand-alone machine, every computer has a single Local GPO that is always processed. The Local GPO can’t be blocked by domain-based GPOs. However, settings in domain-based GPOs always take precedence, because they are processed after the Local GPO.

If you suspect that inheritance or other processing rules are producing unwanted results, the following procedures may help to resolve this issue:

To view or change precedence order of GPOs:

1. Open GPMC and click any site, domain, or organizational unit node.

2. Click the Group Policy Inheritance tab, and examine the precedence order of the GPOs. Within each domain, site, and organizational unit, the link order controls when links are applied.

3. To change the precedence of a link, you can change the link order, moving each link up or down in the list to the appropriate location. The link with the higher order (with 1 being the highest order) has the higher precedence for a given site, domain, or organizational unit. For example, if you add six GPO links and later decide that you want the last one that you added to have highest precedence, you can move the GPO link to the top of the list.

To check GPO links:

1. In GPMC, select the GPO you are troubleshooting, and then click the Scope tab. You will see a list of containers that are linked to the GPO and the status of those links.

2. To change the status of a link, click the Details tab, and then, in GPO Status, choose an option. You can enable all settings, disable only computer settings, or disable only user settings.




Targeting GPOs and Security Filtering

Primarily through GPO links to site, domain, OU

Security Filtering

• Read and Apply Group Policy (AGP) permissions required

• By default, all GPOs give the Authenticated Users group Read and Apply

• Permissions can limit scope

WMI Filtering

• If the statement is true, then the GPO is processed

• Available only on Windows XP and later clients

Targeting GPOs and Security Filtering

Security Filtering The site, domain, and OU links from a GPO are used as the primary targeting principle for defining the computers and users that should receive a GPO. As discussed in the previous sections, the Group Policy engine also checks whether that user or computer has both Read and Apply Group Policy (AGP) permissions on the GPO, either explicitly or through group membership.

Note: By default, all GPOs have Read and Apply Group Policy Allowed for the Authenticated Users group. Therefore, the default behavior is for every GPO to apply to every Authenticated User.

Security filtering (and WMI filtering, discussed below) can be used to further refine the users and computers that will receive and apply the settings in a GPO. You can limit the scope to a specific set of users, groups, or computers within the organizational unit, domain, or site, or even narrow the scope of a GPO, so that it applies only to a single group, user, or computer. Security filtering determines whether to apply the GPO as a whole; it cannot be used selectively, on different settings, within a GPO.

These permissions can be changed through the Group Policy Management Console, which manages these permissions as a single unit and displays the security filtering for the GPO on the GPO Scope tab. In GPMC, groups, users, and computers can be added or removed as security filters for each GPO.




How Security Filtering is Processed Before processing a GPO, the Group Policy engine checks the Access Control List associated with the GPO. The ACL must contain an access control entry (ACE) that gives the user or computer both Apply Group Policy and Read permissions, in order for the Group Policy engine to add the GPO to the processing list.

In addition, if an ACE on a GPO denies the Apply Group Policy or Read permission, the Group Policy engine does not add the GPO to its list of GPOs to process.

Note: In general, Deny ACEs should be avoided, because you can achieve the same results by granting or not granting Allow permissions.
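The security filtering check described above can be sketched as a small function. This is an illustration only: the ACL is modeled as a list of hypothetical `(trustee, ace_type, permission)` tuples rather than a real security descriptor, and the group names are invented.

```python
READ = "Read"
APPLY = "Apply Group Policy"


def passes_security_filter(aces, principals):
    """Decide whether a GPO is added to the processing list.

    aces: iterable of (trustee, ace_type, permission), ace_type 'allow' or 'deny'.
    principals: the user or computer plus every group it belongs to.
    """
    allowed = set()
    for trustee, ace_type, permission in aces:
        if trustee not in principals or permission not in (READ, APPLY):
            continue
        if ace_type == "deny":
            return False          # a Deny on Read or Apply excludes the GPO
        allowed.add(permission)
    # Both permissions must be allowed, directly or through group membership.
    return {READ, APPLY} <= allowed


user = {"Kimakers", "Authenticated Users", "Support Staff"}

default_acl = [("Authenticated Users", "allow", READ),
               ("Authenticated Users", "allow", APPLY)]
filtered_acl = [("Authenticated Users", "allow", READ),
                ("Kiosk Admins", "allow", READ),
                ("Kiosk Admins", "allow", APPLY)]
denied_acl = default_acl + [("Support Staff", "deny", APPLY)]

print(passes_security_filter(default_acl, user))   # default ACL
print(passes_security_filter(filtered_acl, user))  # Read only, no Apply
print(passes_security_filter(denied_acl, user))    # explicit Deny
```

The second ACL shows the approach the note recommends: the GPO is filtered simply by not granting Apply Group Policy, with no Deny ACE needed.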

WMI Filtering If, after security filtering, appropriate permissions are granted to the GPO, it is added to the list of GPOs to download. Upon download, the Group Policy engine reads the gPCWQLFilter attribute in the Group Policy container, to determine whether a WMI filter is applied to the GPO. If it is, the WMI filter, which contains one or more WQL statements, is evaluated. If the statement evaluates to true, then the GPO is processed. There are tradeoffs in using WMI filters, because they can increase the amount of time it takes to process policy, especially if the filter to be evaluated takes a long time to process.

WMI Filtering Scenarios Sample uses of WMI filters include:

• Services: Computers on which Dynamic Host Configuration Protocol (DHCP) is turned on.

• Registry: Computers that have this registry key populated.

• Hardware inventory: Computers that have a Pentium III processor.

• Software inventory: Computers that have Visual Studio .NET installed.

• Hardware configuration: Computers that have network interface cards (NICs) on interrupt level 3.

• Software configuration: Computers that have multi-casting turned on.

• Associations: Computers that have any services dependent on Systems Network Architecture (SNA) service.

Client support for WMI filters exists only on Windows XP, Windows Server 2003, and later operating systems. Windows 2000 clients will ignore any WMI filter and the GPO is always applied, regardless of the WMI filter. WMI filters are only available in domains that have at least one Windows Server 2003 domain controller.
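The decision a client makes about a WMI-filtered GPO can be sketched as below. The sketch does not query WMI; it takes the outcome of each WQL query as a boolean. The example query string is illustrative only, and treating a filter as passed only when every query returns a result is an assumption of this sketch.

```python
# Illustrative WQL text of the kind a filter might contain (not evaluated here).
EXAMPLE_QUERY = "SELECT * FROM Win32_NetworkAdapterConfiguration WHERE DHCPEnabled = TRUE"


def gpo_passes_wmi_filter(client_supports_wmi_filters, query_results):
    """Return True if the GPO should be processed.

    client_supports_wmi_filters: False for Windows 2000 clients.
    query_results: one boolean per WQL query in the filter, True when the
    query returned at least one object on the client.
    """
    if not client_supports_wmi_filters:
        # Windows 2000 ignores the filter, so the GPO is always applied.
        return True
    # Assumption: every query in the filter must evaluate to true.
    return all(query_results)


print(gpo_passes_wmi_filter(True, [True]))
print(gpo_passes_wmi_filter(True, [True, False]))
print(gpo_passes_wmi_filter(False, [False]))
```

The last call is the case to watch for in mixed environments: a filter intended to exclude a machine has no effect if that machine runs Windows 2000.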




Group Policy Loopback Mode

Default GPO Application Behavior:

• Policies are applied based on location of user account, rather than location of computer account

Loopback Policy:

• Applied to computer account

• Policies are applied based on location of computer account, rather than user account

• Useful on multiple-user or task-oriented machines: kiosks, Terminal Servers, labs, etc.

Options:

• Replace: GPOs linked to computer account’s “container hierarchy” are used instead of GPOs linked to user account container hierarchy

• Merge: GPOs from both computer account and user account container hierarchies are applied. If specific policies are in conflict, computer account’s GPOs prevail

Group Policy Loopback Mode

User Group Policy Loopback Processing Mode Loopback processing is a way to enforce a set of user settings at a computer, regardless of who logs on to that computer. Typically, user settings are applied based on the location in Active Directory of the user object (that is, the OU that the user is in). If loopback processing is set for a computer, the user settings for anyone logged on to that computer are (partially or fully) dependent on the location of the computer object in Active Directory.

The behavior depends on the mode of loopback processing. In Replace mode, only the user settings defined in GPOs applied to the computer are used. In Merge mode, user settings from GPOs that would normally apply to the user are used, provided they do not conflict with user settings in GPOs that apply to the computer.

Security filters can affect the way loopback processing is applied. Even when the GPOs associated with the computer are used to define user settings, the user’s credentials, not the computer’s credentials, are validated against the GPO’s security filter. Therefore, the user’s credentials determine whether the GPO should be applied.

To determine whether loopback processing is in effect, generate a Resultant Set of Policy (RSoP) report and look on the Settings tab for the User Group Policy loopback processing mode setting, under Computer Configuration\Administrative Templates\System\Group Policy.




Loopback Mode Example

If loopback is enabled, when user Kimakers logs on to computer \\Lobby-01:

• User settings apply from GPOs linked to the Kiosk Boxes OU, Workstations OU, Support OU, etc.

• Replace and Merge options govern resultant policy

[Diagram: Contoso.com domain with the Departments, Support, SouthWest, SWUsers, Workstations, and Kiosk Boxes OUs; user Kimakers; computer \\Lobby-01]

Loopback Mode Example

In the example above, without loopback enabled, Kimakers would apply User Configuration settings from GPOs linked to: Contoso.com, Departments OU, Support OU, and SWUsers OU. The computer would apply Computer Configuration settings from GPOs linked to: Contoso.com, Departments OU, Support OU, Workstations OU, and Kiosk Boxes OU.

If loopback is enabled in merge mode, Kimakers would apply User Configuration settings from GPOs linked to: Contoso.com, Departments OU, Support OU, and SWUsers OU, then from: Contoso.com, Departments OU, Support OU, Workstations OU, and Kiosk Boxes OU. Because GPOs processed later take precedence, the GPOs from the computer's hierarchy prevail in a conflict, barring any use of the Enforce (formerly No Override) and Block Inheritance options.

If loopback is enabled in replace mode, Kimakers would apply User configuration settings from GPOs linked to: Contoso.com, Departments OU, Support OU, Workstations OU, and Kiosk Boxes OU.
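The three cases in this example can be expressed as a short sketch. The container lists are taken from the example above; the function name and the `loopback` parameter values are hypothetical, and Enforce, Block Inheritance, and security filtering are ignored.

```python
USER_PATH = ["Contoso.com", "Departments OU", "Support OU", "SWUsers OU"]
COMPUTER_PATH = ["Contoso.com", "Departments OU", "Support OU",
                 "Workstations OU", "Kiosk Boxes OU"]


def user_settings_sources(user_path, computer_path, loopback=None):
    """Return the containers whose GPOs supply User Configuration settings,
    in processing order (later entries take precedence).

    loopback: None (disabled), "merge", or "replace".
    """
    if loopback is None:
        return list(user_path)
    if loopback == "merge":
        # User hierarchy first, then computer hierarchy, so the computer's
        # GPOs are processed last and prevail in a conflict.
        return list(user_path) + list(computer_path)
    if loopback == "replace":
        return list(computer_path)
    raise ValueError(f"unknown loopback mode: {loopback!r}")


for mode in (None, "merge", "replace"):
    print(mode, "->", user_settings_sources(USER_PATH, COMPUTER_PATH, mode))
```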




Logon Optimizations

• Optimization is off during the user’s first logon to a computer

• Optimization is always off when:

    • User has a roaming profile

    • User has a home directory

    • Logon script on the user object is present

• Folder redirection and software installation require a synchronous application of policy to apply:

    • With optimization on, it may take two logons for policy settings to apply

Logon Optimizations

By default, Microsoft Windows XP does not wait for the network to be fully initialized at startup and logon. Any existing users who are logging on are logged on using cached credentials, which results in shorter logon times. Because the computer does not wait for the network to be fully started, Group Policy is applied in the background (asynchronously) once the network becomes available. Table 2 below compares the way policy is processed for different operating systems.

Operating System       Boot             Logon            Policy Refresh

Windows 2000           Synchronously    Synchronously    Asynchronously

Windows XP Pro         Asynchronously   Asynchronously   Asynchronously

Windows Server 2003    Synchronously    Synchronously    Asynchronously

Table 2: Default Policy Processing for Client Computers

The boot time is the time it takes before a user sees the logon (Ctrl-Alt-Delete) screen. Logon time is the time it takes before a user can begin working on the computer.

Asynchronous processing in Windows XP Professional Edition enables faster boot and logon times, compared to synchronous processing in Windows 2000, in which users must wait for all their policies to apply, before they can begin a computer session. However, all Group Policy settings are still processed, in full, whenever a user first logs on to a computer.




Some GPO settings can take up to three logons to become effective Because background refresh is the default behavior in Windows XP, some policy extensions, such as Software Installation and Folder Redirection, may require as many as three logons to apply changes.

This behavior exists because Software Installation and Folder Redirection policies cannot apply during an asynchronous or background application of policy. These extensions can only apply when they are processed synchronously.

Here is a sample scenario showing the way in which policies are applied:

• An administrator deploys a software package to User A.

• User A logs on but receives a background (asynchronous) application of policy.

• Because the policy application was asynchronous, the software that was set to be installed cannot be installed at logon. Instead, the computer is tagged, indicating that software needs to be installed.

• The next time the user logs on, the computer logs the user on synchronously to allow the software package to be installed. (This is the same behavior as Windows 2000.) This results in one extra logon for the software to be installed.

In the case of Advanced folder redirection, because policy is evaluated based on security group membership, three logons will be required: the first logon, to update the cached user object (and security group membership); the second logon, for policy to detect the change in security group membership and require a foreground policy application; and the third logon, to actually apply folder redirection policy in the foreground.

Changes to some user object properties may take two logons to become effective When Fast Logon Optimization is enabled, all user logons are cached. The user’s logon information is updated after logon, which means that changes to user object properties, such as adding a roaming profile path, home directory, or user object logon script, will not be detected until the second logon. At the second logon, the system detects that the user has a Roaming User Profile, HOMEDIR, or user object logon script, and disables the Fast Logon Optimization for that user. (However, the user’s computer could still experience fast boot.)

Reverting to Windows 2000 logon processing If it is necessary to guarantee the application of Folder Redirection, Software Installation, or roaming user profile settings in just one logon or boot cycle of the computer, administrators can enable the setting Always wait for the network at computer startup and logon, which is located in the Group Policy snap-in at Computer Configuration\Administrative Templates\System\Logon.
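The conditions under which a Windows XP logon is optimized can be summarized in a small decision function. This is a sketch of the rules stated in this topic, with hypothetical parameter names; it does not inspect a real system.

```python
def logon_is_optimized(first_logon, roaming_profile, home_directory,
                       user_logon_script, always_wait_for_network=False):
    """Return True if Fast Logon Optimization applies (asynchronous logon)."""
    if always_wait_for_network:
        # "Always wait for the network at computer startup and logon" is
        # enabled, so the client reverts to Windows 2000 style processing.
        return False
    if first_logon:
        # Policy is processed in full the first time a user logs on.
        return False
    if roaming_profile or home_directory or user_logon_script:
        # Any of these user object properties turns the optimization off.
        return False
    return True


print(logon_is_optimized(first_logon=False, roaming_profile=False,
                         home_directory=False, user_logon_script=False))
print(logon_is_optimized(first_logon=False, roaming_profile=True,
                         home_directory=False, user_logon_script=False))
```

When the function returns True, extensions such as Software Installation and Folder Redirection are the ones that may need an additional logon to take effect.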

Reference: For more information, see the following Knowledge Base article: 305293 “Description of the Windows XP Professional Fast Logon Optimization.”




Group Policy History

• Information about each GPO that is read and applied

• Stored in registry

• Sub Keys: DisplayName, DSPath, FileSysPath, GPOLink, GPOName, lParam, Options, Version

Group Policy History

As GPOs are read and applied, information about each of them is written to the registry on the client computer. This information includes the Group Policy extensions that applied policy, the order in which the GPOs were applied, version data, and the options that were defined for each GPO. The version data is also used to determine whether changes have been made to the GPO since the last time policy was applied.

In the registry, the history of the application of GPOs is broken down by Group Policy extension.

For Group Policy objects applied to the local computer:

• HKLM\Software\Microsoft\Windows\CurrentVersion\Group Policy\History

For Group Policy objects applied to the currently logged on user:

• HKCU\Software\Microsoft\Windows\CurrentVersion\Group Policy\History

Underneath each of the keys that represent installed Group Policy extensions, there will be keys for each of the Group Policy objects that have been applied. Each of these is assigned a number that corresponds to the order in which it was applied. The first GPO is given the number 0 and, as other GPOs are applied, the value assigned to the key is incremented. The registry values that may be used are:

• DisplayName: DisplayName is the friendly name of the Group Policy object, as displayed in the GPMC and Group Policy Editor.




• DSPath: DSPath is the distinguished name (DN) of the path to the Group Policy object stored in Active Directory. For example: LDAP://CN=Machine,CN={GUID of GPO},CN=Policies,CN=System,DC=<Domain>.

This attribute will not be present for local Group Policy objects because these are not stored in the Active Directory.

• FileSysPath: FileSysPath is the path to the Group Policy template (GPT), or file-based policy, contained in the Group Policy. If this is a GPO from the domain, the path will be a UNC path to the SYSVOL share on the domain controllers. If this is a local Group Policy object, this will be a local path that points to the structure beginning with the path:

%SystemRoot%\system32\GroupPolicy

• GPOLink: The GPOLink value identifies the scope to which the Group Policy object was applied, therefore affecting the computer or user. The following values are valid:

0= No link information

1= The GPO is linked to a machine (local)

2= The GPO is linked to a site

3= The GPO is linked to a domain

4= The GPO is linked to an organizational unit

• GPOName: The GPOName value contains the name of the GPO, as it is referenced. For local Group Policy objects (those stored on the computer itself), this name will be the friendly name of the GPO. For Group Policy objects stored in the Active Directory, this will be the GUID of the GPO.

• lParam: The lParam value is used to perform various functions on GPOs. Group Policy extensions can customize this value.

• Options: The Options value represents the options selected by the administrator to configure the Group Policy object link. These include such things as the option to disable the Group Policy object or to force the settings defined in the GPO on subcontainers.

• Version: The Version registry value specifies the version number of the GPO when it was last applied. The number is used to determine if the GPO has changed since it was last applied.
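The history values can be read and decoded with a short script. This is a minimal sketch: `describe_history_entry` only interprets a dictionary of the values listed above, and `read_history` is a hypothetical helper that uses Python's standard `winreg` module and therefore runs only on Windows. The layout it assumes (extension key, then numbered GPO keys, then values) follows the description in this topic.

```python
GPO_LINK_SCOPE = {
    0: "No link information",
    1: "Machine (local)",
    2: "Site",
    3: "Domain",
    4: "Organizational unit",
}

HISTORY_PATH = r"Software\Microsoft\Windows\CurrentVersion\Group Policy\History"


def describe_history_entry(values):
    """Summarize one GPO history key from its registry values."""
    scope = GPO_LINK_SCOPE.get(values.get("GPOLink"), "Unknown")
    name = values.get("DisplayName", values.get("GPOName", "?"))
    return f"{name}: linked at {scope}, version {values.get('Version')}"


def read_history(current_user=False):
    """Return (extension_guid, index, values) tuples. Windows only."""
    import winreg  # imported here so the rest of the sketch runs anywhere

    hive = winreg.HKEY_CURRENT_USER if current_user else winreg.HKEY_LOCAL_MACHINE
    entries = []
    with winreg.OpenKey(hive, HISTORY_PATH) as history:
        for i in range(winreg.QueryInfoKey(history)[0]):
            extension = winreg.EnumKey(history, i)
            with winreg.OpenKey(history, extension) as ext_key:
                for j in range(winreg.QueryInfoKey(ext_key)[0]):
                    index = winreg.EnumKey(ext_key, j)
                    with winreg.OpenKey(ext_key, index) as gpo_key:
                        values = {}
                        for k in range(winreg.QueryInfoKey(gpo_key)[1]):
                            value_name, data, _ = winreg.EnumValue(gpo_key, k)
                            values[value_name] = data
                        entries.append((extension, index, values))
    return entries


sample = {"DisplayName": "Default Domain Policy", "GPOLink": 3, "Version": 65537}
print(describe_history_entry(sample))
```

On a Windows client, `for ext, idx, vals in read_history(): print(ext, idx, describe_history_entry(vals))` lists the computer-side history; pass `current_user=True` for the HKCU history.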




Section 2: GPO Tools and Troubleshooting

• GPMC

• RSoP

• Convergence and GPOTool

• Refresh and GPUpdate

• Userenv Logging

• Other Issues and Scenarios

Section 2: GPO Tools and Troubleshooting

Introduction There are a number of tools to help with troubleshooting Group Policy. Many of these tools are new to Microsoft Windows Server 2003.


• Analyze Group Policy settings that have been applied to users and computers.

• Check that domain controller Group Policy settings are functioning.

• Use advanced troubleshooting tools to record the details of policies being processed.

Related Topics Covered in This Lesson Active Directory Replication

File Replication Service (FRS)

Recommended Reading Troubleshooting Group Policy Problems, in the Group Policy Operations Guide on TechNet

Windows Group Policy Guide (from the Windows Server 2003 Resource Kit).

Group Policy Technical Reference on TechNet




Group Policy Management Console

A unified graphical user interface (GUI) for GPO administration

Backup/restore of GPOs

Import/export and copy/paste of GPOs and WMI filters

Simplified management of Group Policy-related security.

HTML reporting for GPO settings and RSoP data

Scripting of Group Policy-related tasks (but not settings within a GPO)

Group Policy Management Console

The Group Policy Management Console (GPMC) simplifies the management of Group Policy by making it easier to understand, deploy, manage, and troubleshoot Group Policy implementations. GPMC also enables automation of Group Policy operations via scripting. GPMC can be used to manage Windows Server 2003-based and Windows 2000–based Group Policy implementations. Key enhancements delivered via GPMC include:

• A unified graphical user interface (GUI) that makes Group Policy much easier to use.

• Backup/restore of Group Policy objects (GPOs).

• Import/export and copy/paste of GPOs and Windows Management Instrumentation (WMI) filters.

• Simplified management of Group Policy-related security.

• HTML reporting for GPO settings and Resultant Set of Policy (RSoP) data.

• Scripting of Group Policy related tasks that are exposed within this tool (not scripting of settings within a GPO).

Prior to the availability of GPMC, administrators were required to use several Microsoft tools to manage Group Policy. GPMC integrates the existing Group Policy functionality exposed in these tools into a single, unified console, along with the new capabilities listed above.




Resultant Set of Policies

Tracks final set of processed policy settings

RSoP Architecture

• Logging Mode

• Planning Mode

Supported platforms – XP and Server 2003

Resultant Set of Policies

In Windows XP and Windows Server 2003, a mechanism called Resultant Set of Policy (RSoP) allows you to track the final set of processed policy settings, and can also be used to track problems with the core Group Policy processing.

RSoP Architecture Resultant Set of Policy (RSoP) uses WMI to determine how policy settings are applied to users and computers. RSoP has two modes: logging mode and planning mode.

• Logging mode: Determines the resultant effect of policy settings that have been applied to an existing user and computer, based on a site, domain, and OU. Logging mode is available on Windows XP and later operating systems.

• Planning mode: Simulates the resulting effect of policy settings that are applied to a user and computer. Planning mode requires a Windows Server 2003 computer as a domain controller.

Windows 2000 Clients The RSoP data is only generated on Windows XP or Windows Server 2003. In Windows 2000, this same data is not as readily available, and will take more effort to find. However, the causes of a GPO application failure are often the same on Windows 2000 clients as they are on Windows XP and Windows Server 2003 clients, and so will still be relevant in troubleshooting. Even if the domain controllers are running Windows 2000, it is possible to use the RSoP tools from a domain-joined Windows XP workstation.




RSoP Tools

A primary resource for troubleshooting

Generating RSoP data

• GPMC

• Help and support

Report contents

GPResult

RSoP Tools

Generating RSoP Data For Group Policy to work, the administrator must ensure that:

• The underlying infrastructure is in place to support delivery of GPOs to the client.

• The user and computer are appropriately targeted to receive the intended GPOs.

• Group Policy processing puts the correct GPOs into effect.

A Group Policy Results report is the primary resource for troubleshooting, so when investigating a problem, the administrator, where possible, should generate an RSoP report for the user and computer combination that is encountering the problem. The sections of the report contain the information you can often use to find the cause of the problem, and can help you find answers to the following three basic questions to assist in troubleshooting:

• Was the GPO applied to the client? The Summary tab shows this information.

• Is the policy setting listed in GPMC Results? The Settings tab shows this information.

• Is the GPO listed as Denied in GPO Results? The Summary tab shows this information.

Note: This information is taken from several sources, including the “Fixing Core Group Policy problems” topic on Microsoft’s TechNet website. Those TechNet articles and white papers use flowcharts with this information that can also help guide you to the root cause of a problem.




The GPMC includes RSoP features integrated directly into the tool, and so is the recommended tool for gathering RSoP data. In GPMC, RSoP logging mode is referred to as Group Policy Results; planning mode is referred to as Group Policy Modeling.

Other methods of gathering RSoP data include:

• The Resultant Set of Policy snap-in and wizard can be run to gather RSoP logging data by using the Advanced System Information - Policy tool in the Help and Support Center.

• The GPresult command-line tool generates policy settings data that appears at the command line, or the output can be logged to a file for later analysis.

Operating RSoP in Logging Mode In environments in which all machines are Windows XP or higher, Logging Mode is most often used during troubleshooting, as this will report on what has actually been applied at the client. The data that is presented is the actual resultant set of policy data obtained from the target computer. It is not possible to get Group Policy Results data for a Windows 2000 computer. (However, with Group Policy Modeling, you can simulate the RSoP data.)

To generate an RSoP report with GPMC:

1. Open Group Policy Management.

2. In the console tree, double-click the forest in which you want to create a Group Policy Results query, right-click Group Policy Results, and then click Group Policy Results Wizard.

3. In the Group Policy Results Wizard, click Next, and then enter the appropriate information.

4. After completing the wizard, click Finish.

When you choose the logging mode of operation from the second wizard screen that appears when you are loading the Resultant Set of Policy snap-in, you must choose whether you want to log computer policy settings, user policy settings, or both.

If you choose to log computer policy settings, you can extract information about the local computer or a remote computer. If you want to log access information about a remote computer, you must have WMI remote access privilege to the remote computer’s WMI repository.

When logging user policy settings, you can choose to either log the current user or another user. To log another user’s policy settings, that user must have logged on to the computer that is serving as the target of the RSoP logging data.




RSoP Reports Each Group Policy Results query is represented by a node in the tree view under the Group Policy Results container. Each node has three tabs:

• Summary: This is analogous to the information shown for the corresponding tab on a Group Policy Modeling node. In particular, this page shows the component status for the various Group Policy extensions. This information tells you whether there were any issues with a particular extension; it is a good place to begin troubleshooting.

• Settings: This is analogous to the information shown for the corresponding tab on a Group Policy Modeling node.

• Events: This tab shows all policy-related events from the target computer. Note that to gather this data, the user performing the query must have access to remotely view the event log. By default, this access is granted to all users on Windows XP, but not on Windows Server 2003. This data is useful for troubleshooting Group Policy issues. For example if the summary report indicates that a particular Group Policy component failed to process, you might be able to uncover the reason by looking for errors and warnings in the event log.

GPresult The GPresult command-line tool runs in logging mode. Just as the Resultant Set of Policy snap-in and wizard do, this tool depends on the RSoP namespace, classes, and methods in the WMI repository to display policy settings data. The advantage of using this tool is that policy settings are displayed at the command line or logged to a file. There is no need to launch a snap-in or run a wizard. By placing GPresult commands in batch files, you can quickly and repeatedly extract logging mode data.

Running GPresult without any parameters returns logging mode data about the current user and computer. However, with appropriate access to remote WMI repositories, you can run this command to extract RSoP data from other computers on the network.

Note: GPresult was a Windows 2000 Resource Kit tool, but since Windows XP shipped, it has been included as part of the standard Windows XP installation and has been enhanced to query RSoP data.
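Collecting GPresult output repeatedly can also be scripted. The sketch below builds a GPresult command line and saves the output to a file. The helper names are hypothetical, and the switches shown (/S, /USER, /SCOPE, /V) are those of the Windows XP and Windows Server 2003 version of the tool as best understood here; confirm them with gpresult /? on the target system before relying on them.

```python
import subprocess


def gpresult_command(computer=None, user=None, scope=None, verbose=False):
    """Build the argument list for a GPresult call (no parameters = local data)."""
    command = ["gpresult"]
    if computer:
        command += ["/S", computer]        # remote computer to query
    if user:
        command += ["/USER", user]         # user whose RSoP data is wanted
    if scope:
        scope = scope.upper()
        if scope not in ("USER", "COMPUTER"):
            raise ValueError("scope must be 'user' or 'computer'")
        command += ["/SCOPE", scope]
    if verbose:
        command.append("/V")
    return command


def save_gpresult(path, **options):
    """Run GPresult and write its output to a file. Windows only."""
    completed = subprocess.run(gpresult_command(**options),
                               capture_output=True, text=True, check=True)
    with open(path, "w", encoding="utf-8") as report:
        report.write(completed.stdout)


print(gpresult_command(computer="Lobby-01", user="Kimakers", verbose=True))
```

Querying a remote computer this way still requires the remote WMI access described above.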




Replication Convergence

• Group Policy depends on both Active Directory replication and FRS

• Changes must be propagated to all domain controllers

• GPOTool can check for consistency

Replication Convergence

Troubleshooting After a change has been made on one domain controller, there can be a lag time before the change is replicated to all other domain controllers.

Until changes to a GPO have been replicated to the domain controller a client is accessing, that client will receive the earlier version of the GPO during Group Policy refresh. If you suspect both replication and Group Policy refresh issues, address the replication issue first. Then refresh Group Policy at the client.

Changes to the OU memberships of computers and users also need to be replicated, before they can be reflected in Group Policy application at the client.

In general, it is best to use the same domain controller for all GPO editing or to agree to a process—such as delegated administration of GPOs—to minimize the likelihood of the same GPO being edited on different domain controllers. If changes are made to the same GPO at two different domain controllers, the last change wins. Also, if you delegate control of a specific GPO to a user group, members of that group might be unable to perform the delegated tasks until the permissions have been replicated to their domain controller.

There are several options for troubleshooting replication issues:

• The Group Policy container and Group Policy template are each assigned version numbers that are incremented when the GPO is modified. Use Gpotool.exe to verify that the versions are synchronized.




• Use Event Viewer to examine the Directory Service event log on the domain controller. Active Directory replication errors will appear with source=KCC.

• Use Event Viewer to examine the File Replication Service event log on the domain controller. FRS errors will appear with source=NTFRS.

• Verify that the SYSVOL share exists on the domain controller. You should be able to find \\domain_controller_name\SYSVOL, where domain_controller_name is the fully qualified domain name (not the NetBIOS name) of the domain controller.

• To troubleshoot Active Directory replication issues, use Replmon.exe and the other Active Directory support tools that ship with Windows Server 2003. These are listed under “Active Directory support tools” in the Help and Support Center for Windows Server 2003.

• You can use Gpotool.exe to identify problems related to domain controller health, including Active Directory replication and FRS issues.

• To troubleshoot file replication issues, check the status of the Distributed File System (DFS) links and targets as described under “To check status of a DFS root, DFS link, or target” in the Help and Support Center for Windows Server 2003. Group Policy requires DFS.

• You can use the Sonar.exe tool to check the health of the SYSVOL share.




GPOTool

Checks GPO consistency across DCs

Compares Version Numbers

Part of Windows Server 2003 Resource Kit

GPOTool

GPOTool.EXE GPOTool is a command-line tool that allows administrators to check Group Policy object (GPO) stability and to monitor policy replication. GPOTool can browse GPOs and check for GPO consistency within and across domains. This tool also displays information about GPOs, including properties that cannot be accessed through Group Policy Object Editor.

GPOTool reads mandatory and optional directory services properties (version, friendly name, extension GUIDs, and SYSVOL data [Gpt.ini]), compares directory services and SYSVOL version numbers, and performs other consistency checks.

A command-line option can be set to search GPOs based on friendly name or GUID. A partial match is also supported for both name and GUID.

By default, all available domain controllers in the domain will be used; this can be overridden by using the /dc: parameter.
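The core version check can be illustrated with a short sketch that compares the version number held in the directory with the one in Gpt.ini. This is not GPOTool's implementation. The function names are hypothetical, and the split of the version number into a user part (upper 16 bits) and a computer part (lower 16 bits) is not stated in this workbook, so treat it as an assumption.

```python
import configparser


def split_version(version):
    """Return (user_version, computer_version) from a combined version number.

    Assumption: upper 16 bits count user-side edits, lower 16 bits count
    computer-side edits.
    """
    return version >> 16, version & 0xFFFF


def gpt_version(gpt_ini_text):
    """Read the Version value from the [General] section of Gpt.ini text."""
    parser = configparser.ConfigParser()
    parser.read_string(gpt_ini_text)
    return parser.getint("General", "Version")


def versions_in_sync(directory_version, gpt_ini_text):
    """True if the Group Policy container and template versions match."""
    return directory_version == gpt_version(gpt_ini_text)


sample_gpt_ini = "[General]\nVersion=65539\n"

print(split_version(gpt_version(sample_gpt_ini)))
print(versions_in_sync(65539, sample_gpt_ini))
print(versions_in_sync(65540, sample_gpt_ini))
```

A mismatch on one domain controller usually means that either Active Directory replication or FRS has not yet delivered the latest change to that controller.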

File Required GPOTool.exe




GPO Refresh

At computer startup, Group Policy is refreshed and computer settings are applied

User settings are applied when a user logs on

Refresh intervals: DCs every 5 minutes; all other computers every 90 to 120 minutes

GPUpdate

Group Policy Refresh

Group Policy refresh refers to the retrieval of GPOs by a client. During Group Policy refresh, the client contacts an available domain controller to see if any of the GPOs have changed. If any GPOs have changed, the domain controller provides a list of all the appropriate GPOs, regardless of whether their version numbers have actually changed.

By default, GPOs are processed by CSEs at the computer only if the version number of at least one GPO has changed on the domain controller that the computer is accessing. You can use policy settings to change this behavior. (Some CSEs process unchanged GPOs if the user’s group membership has changed.)

Group Policy is refreshed, and computer and user settings are applied, in the following instances:

• User settings are applied when a user logs on.

• At computer startup, Group Policy is refreshed, and computer settings are applied.

• When GPUpdate is run at the client computer (secedit /refreshpolicy on Windows 2000 clients).

• At the refresh interval, if one is configured at that computer. By default, domain controllers are refreshed every 5 minutes, and all other computers are refreshed every 90 to 120 minutes (90 minutes, plus a random offset of up to 30 minutes).
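The default background refresh timing can be sketched as follows. This is an illustration of the arithmetic only, not the Group Policy engine's code; the function name and the uniform distribution of the random offset are assumptions.

```python
import random


def next_refresh_minutes(is_domain_controller: bool, rng: random.Random) -> float:
    """Return the minutes until the next background refresh under the defaults.

    Assumption: the random offset is drawn uniformly from 0 to 30 minutes.
    """
    if is_domain_controller:
        # Domain controllers refresh every 5 minutes by default.
        return 5.0
    # All other computers: 90 minutes plus a random offset of up to 30 minutes.
    return 90.0 + rng.uniform(0.0, 30.0)


if __name__ == "__main__":
    rng = random.Random()
    print("DC:", next_refresh_minutes(True, rng))
    print("Member computer:", round(next_refresh_minutes(False, rng), 1))
```

The random offset spreads client requests over time so that all computers do not contact a domain controller at the same moment, which is why a change can take up to two hours to appear at a client even when replication has finished.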

Troubleshooting

• Replication and Group Policy refresh can both manifest as lag-time issues: the system is working properly, but changes have not yet appeared at the client. Ensure that changes to the policy have replicated to the DC the client is accessing.

• Force a policy refresh by having the user log off and log on (user settings only), restart the computer, or use GPUpdate.

• To see the last time the GPOs from the computer’s OU were processed, look on the Summary tab of the Group Policy Results report in GPMC, under Computer Configuration Summary, and then under General.

• To see the last time the GPOs from the user’s OU were processed, look on the Summary tab of the Group Policy Results report in GPMC, under User Configuration Summary, and then under General.

• To collect Group Policy refresh information from clients and store that information at a central location, use Gpmonitor.exe. This tool is included in the Windows Server 2003 Deployment Kit.

Note: Remember that some types of settings, such as Folder Redirection, Roaming Profiles, and Software Installation, can be applied only during logon. If these settings are received when Group Policy is refreshed, the settings are evaluated, but they are not applied until the next time the user logs on.

If the computer is running Windows XP, and these settings first reach the computer during logon, they might not be applied until the next time the user logs on. For some extensions, it might take two or three logons for the settings to be applied.

If a user or computer’s OU is modified, simply refreshing policy will not suffice to apply the updated policies to the user or computer. The security principal’s context will need to be updated, which requires a logoff/logon for the user, a reboot for the computer, or the standard scheduled session update.




GPUpdate

Can force policy updates

Switches: /Target:{Computer | User}, /Force, /Wait, /Logoff, /Boot

GPUpdate

After making changes to group policies, you might want the changes to be applied immediately, without waiting for the default update interval or without restarting the computer. To make this update, at a command prompt, run the Gpupdate.exe utility. (GPUpdate replaces the /refreshpolicy switch in the command-line tool, Secedit.exe, in Microsoft Windows 2000.)

The switches that can be used with GPUpdate include:

/Target:{Computer | User} This switch can be used to specify that only user or computer policy settings are updated. If no switch is used, by default both user and computer policy settings are updated.

/Force This switch reapplies all policy settings. By default, only the policy settings that have changed are applied.

/Wait:{value} This switch sets the number of seconds that the command waits for policy processing to finish. The default value is 600 seconds. The value 0 means do not wait. The value -1 means wait indefinitely. When the time limit is exceeded, the command prompt returns, but the policy processing continues.

/Logoff This switch can cause a session to log off from the computer after the Group Policy settings have been updated. This behavior is required for those Group Policy client-side extensions that do not process policy on a background update cycle, but do process policy when a user logs on to the computer. Two examples are the user-targeted Software Installation and Folder Redirection features. This switch has no effect if no extension that requires a logoff has been called.

/Boot This switch can cause a restart of a computer after the Group Policy settings are updated. This behavior is required for those Group Policy client extensions that do not process policy on a background update cycle, but are able to process policy at Startup. An example of this behavior is observed with the computer-targeted Software Installation feature. This switch does not have an effect if extensions have not been called that require a restart of your computer.
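The switches above can be combined on one command line. The following sketch only assembles the argument list; the function name build_gpupdate_args and its parameters are hypothetical, and the sketch does not run Gpupdate.exe.

```python
from typing import List, Optional


def build_gpupdate_args(
    target: Optional[str] = None,   # "Computer" or "User"; None updates both
    force: bool = False,
    wait: Optional[int] = None,     # seconds; 0 = do not wait, -1 = wait indefinitely
    logoff: bool = False,
    boot: bool = False,
) -> List[str]:
    """Assemble a gpupdate command line from the documented switches."""
    args = ["gpupdate"]
    if target is not None:
        if target not in ("Computer", "User"):
            raise ValueError("target must be 'Computer' or 'User'")
        args.append(f"/Target:{target}")
    if force:
        args.append("/Force")
    if wait is not None:
        if wait < -1:
            raise ValueError("wait must be -1, 0, or a positive number of seconds")
        args.append(f"/Wait:{wait}")
    if logoff:
        args.append("/Logoff")
    if boot:
        args.append("/Boot")
    return args


if __name__ == "__main__":
    # Reapply all user settings and return to the prompt immediately.
    print(" ".join(build_gpupdate_args(target="User", force=True, wait=0)))
```

On a Windows client, the resulting list could be passed to a process launcher such as Python's subprocess.run; that step is left out here so the sketch stays free of side effects.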




Network Connectivity and Slow Links

GPOs cannot be delivered without connectivity: DNS, ICMP

Slow links: 500 kilobits per second or less by default; Security settings and Administrative Template always applied; scripts may time out

Network Connectivity and Slow Links

Network Connectivity Issues Obviously, Group Policy cannot be delivered to clients who are not connected to the network. In this case, the user can log on with cached credentials, and the last set of policies that the computer received will be applied. This is relevant to a user who logs on to a corporate network through a virtual private network (VPN) connection. In this scenario, the usual application of Group Policy does not occur, because the user is already logged on to the computer before the VPN connection is established. One way to ensure that the normal Group Policy processing occurs at logon is by using the option to connect to a remote network through the initial logon prompt.

Network connectivity can also be the root cause of replication problems.

Troubleshooting Basic Connectivity

• Ping: Check system event logs on the client computer (look for failed access attempts). You can also use the ping or netdiag commands to test network connectivity.

• Check DNS: Ping the computer using the NetBIOS name. Ping the computer again using the fully qualified domain name of the target computer. If the first ping works but the second does not, then there is probably a DNS problem. Use Netdiag.exe to research the problem further.

• Check ICMP: Internet Control Message Protocol (ICMP) is used to detect a slow link when the client initially connects to a domain controller, and, therefore, is required for Group Policy. ICMP must also be enabled if a firewall is in use. By default, the packet size used for slow link detection is 2048 bytes. Routers and firewalls must also support this packet size, to ensure that slow link detection can succeed. You can verify that ICMP is not working by looking at the userenv log, which would indicate that pinging the computer fails.
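The two-ping DNS check above reduces to a small decision table. The sketch below encodes it; the function name and the wording of the results are hypothetical, and only the "NetBIOS works, FQDN fails" case comes directly from the text. The other three branches are reasonable readings, labeled as assumptions in the comments.

```python
def diagnose_name_resolution(netbios_ping_ok: bool, fqdn_ping_ok: bool) -> str:
    """Interpret the result of pinging a computer by NetBIOS name and by FQDN."""
    if netbios_ping_ok and fqdn_ping_ok:
        # Assumption: both names answering means name resolution is not the problem.
        return "both names answer: basic connectivity and name resolution look fine"
    if netbios_ping_ok and not fqdn_ping_ok:
        # From the workbook: first ping works, second does not.
        return "probable DNS problem: investigate further with Netdiag.exe"
    if fqdn_ping_ok:
        # Assumption: DNS works, so look at NetBIOS name resolution instead.
        return "DNS resolves: check NetBIOS name resolution separately"
    # Assumption: nothing answers, so the issue is below name resolution.
    return "neither name answers: check connectivity and whether ICMP is blocked"


if __name__ == "__main__":
    print(diagnose_name_resolution(netbios_ping_ok=True, fqdn_ping_ok=False))
```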

Troubleshooting Slow Links By default, Group Policy defines a slow link as 500 kilobits per second or less. You can change this setting on a per-policy basis in the computer configuration, the user configuration, or both. The setting is in the Administrative Templates; look under System, and then under Group Policy.

• When the computer is connected to the network over a slow link, Security settings and Administrative Template settings are always applied.

• By default, Software Installation, scripts, and Folder Redirection settings are not applied over a slow link.

• Group Policy is not processed if the user connects to the network over a slow link with cached credentials. To ensure that Group Policy is applied over a slow link, the user must select the Logon using dialup connection check box while using the Logon dialog box.

• Even if Group Policy settings are configured to run scripts over slow links, the scripts might be executed so slowly that they exceed the configured time-out period. In this case the script will fail to complete, and a UserInit event will be posted.
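The default slow-link behavior described in this list can be summarized in a short sketch. It models only the rules stated above; the names, the set-based representation, and the allow_over_slow_link parameter (standing in for a policy that enables an extension over slow links) are hypothetical.

```python
from typing import FrozenSet

SLOW_LINK_THRESHOLD_KBPS = 500  # default threshold from the workbook

# Extensions the workbook says are always applied over a slow link.
ALWAYS_APPLIED = frozenset({"Security", "Administrative Templates"})
# Extensions the workbook says are not applied over a slow link by default.
SKIPPED_BY_DEFAULT = frozenset({"Software Installation", "Scripts", "Folder Redirection"})


def is_slow_link(measured_kbps: float, threshold_kbps: float = SLOW_LINK_THRESHOLD_KBPS) -> bool:
    """A link is slow when its measured speed is at or below the threshold."""
    return measured_kbps <= threshold_kbps


def extension_runs(
    extension: str,
    measured_kbps: float,
    allow_over_slow_link: FrozenSet[str] = frozenset(),
) -> bool:
    """Return True if the named extension would process policy on this link."""
    if not is_slow_link(measured_kbps):
        return True
    if extension in ALWAYS_APPLIED:
        return True
    if extension in SKIPPED_BY_DEFAULT:
        # Hypothetical override: policy configured to allow it over slow links.
        return extension in allow_over_slow_link
    raise ValueError(f"extension not covered by this sketch: {extension}")


if __name__ == "__main__":
    for name in ("Security", "Scripts", "Folder Redirection"):
        print(name, extension_runs(name, measured_kbps=128))
```

Even when an extension is allowed over a slow link in this model, the time-out caveat in the last bullet still applies: permission to run does not guarantee that a script finishes.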




DCGPOFIX

Restores Default GPOs to a pristine state

Creates settings based on Dcpromo operations

Default Domain and Default Domain Controllers GPOs

For Disaster Recovery Only

DCGPOFIX

In the case in which either or both of the default GPOs are deleted, the Dcgpofix tool can re-create the two default Group Policy objects (GPOs) and create their settings, based on the operations that are performed only during Dcpromo. When you run Dcgpofix, you will lose any changes that have been made to these Group Policy objects since installation, including those made by Exchange Server setup or Systems Management Server (SMS), and any other custom changes.

Note: The Dcgpofix tool is a disaster recovery tool that will restore your environment to a functional state, only. It should not be used as a replacement for a backup strategy using GPMC, but only when a GPO backup for the Default Domain Policy and Default Domain Controller Policy does not exist. Microsoft recommends that as soon as you run Dcgpofix, you review the security settings in these GPOs and manually adjust the security settings to suit your requirements.

Syntax

dcgpofix [/ignoreschema] [/target: {domain | dc | both}]

Parameters

• /ignoreschema This switch is optional. By default, Dcgpofix checks the Active Directory schema version number to ensure compatibility between the version of Dcgpofix you are using and the Active Directory schema configuration. If the versions are not compatible, Dcgpofix.exe will not run. The /ignoreschema switch makes the tool ignore the Active Directory schema version number, but is not recommended.

• /target: {domain | dc | both} This switch is optional. Specifies which default GPO to restore: the Default Domain Policy (domain), the Default Domain Controller Policy (dc), or both. If you do not specify /target, Dcgpofix uses both by default.

Dcgpofix.exe is located in the C:\Windows\System32 folder. You must be a domain or enterprise administrator to use this tool and it will only run on servers running the Windows Server 2003 family.
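As an illustration of the syntax, the sketch below assembles a Dcgpofix command line and rejects values outside the documented set. The function name and parameters are hypothetical, and the sketch deliberately does not execute the tool, because running Dcgpofix discards all changes made to the default GPOs.

```python
from typing import List, Optional

VALID_TARGETS = ("domain", "dc", "both")


def build_dcgpofix_args(target: Optional[str] = None, ignore_schema: bool = False) -> List[str]:
    """Assemble a dcgpofix command line; omitting target leaves the default (both)."""
    args = ["dcgpofix"]
    if ignore_schema:
        # Not recommended: skips the schema version compatibility check.
        args.append("/ignoreschema")
    if target is not None:
        if target not in VALID_TARGETS:
            raise ValueError(f"target must be one of {VALID_TARGETS}")
        args.append(f"/target:{target}")
    return args


if __name__ == "__main__":
    # Restore only the default GPO for domain controllers.
    print(" ".join(build_dcgpofix_args(target="dc")))
```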

The original documentation for the Dcgpofix.exe tool incorrectly indicates that the Dcgpofix tool will restore security settings in the Default Domain Controller Policy to the same state that they were in immediately after Dcpromo successfully completed. This is not the case.

Table 3, below, lists differences in security settings in the Default Domain Controller Policy, after you run the Dcgpofix tool, and the security settings in a new installation of Windows Server 2003, after you run Dcpromo. Microsoft recommends that you adjust these security settings to match the requirements in your environment, after you run the Dcgpofix tool.

Setting in Default Domain Controller Policy | Value after running DCPromo on cleanly installed Windows Server 2003 system | Value after running DCGPOFIX
Audit Account Management | Success | No Auditing
Audit Directory Service Access | Success | No Auditing
Audit Policy Change | Success | No Auditing
Audit System Events | Success | No Auditing
Create Global Objects | Not defined | SERVICE, Administrators
Deny access to computer from network | SUPPORT_388945a0 | (Empty)
Deny logon locally | SUPPORT_388945a0 | (Empty)
Impersonate a client after authentication | Not defined | SERVICE, Administrators
Load and unload device drivers | Administrators, Print Operators | Administrators
Log on as a batch job | LOCAL SERVICE, SUPPORT_388945a0 | (Empty)
Log on as a service | NETWORK SERVICE | (Empty)
Shut down the system | Administrators, Backup Operators, Server Operators, Print Operators | Account Operators, Administrators, Backup Operators, Server Operators, Print Operators

Table 3: Differences in Default Domain Controller Policy after running Dcgpofix vs. a new installation of Windows Server 2003 after Dcpromo




The following settings will change after you run the Dcgpofix tool:

• AuditAccountManage

• AuditDSAccess

• AuditPolicyChange

• AuditSystemEvents

• SeCreateGlobalPrivilege

• SeImpersonatePrivilege

• SeLoadDriverPrivilege

• SeShutdownPrivilege

Based on configuration options, the following settings may also change:

• SeBatchLogonRight (only LOCAL SERVICE, not the SUPPORT_388945a0 account)

• SeServiceLogonRight




User Environment Debug Logging

Userenv.log: collect GPOs using distinguished name; process GPOs to identify client-side extensions required; process each extension to alter user or computer settings

User Environment Debug Logging

Because userenv tracks the Group Policy engine and registry-based Group Policy, it is the most frequently used log file for Group Policy troubleshooting. Userenv is especially useful in a Windows 2000 environment, because you don’t have the benefit of using Resultant Set of Policy (RSoP). Most of the questions that RSoP answers are in the userenv log.

To generate userenv.log you need to first enable verbose logging.

1. Log on to the client computer as the administrator, and run Regedit.

2. Locate the following key: HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\Winlogon.

3. Right-click Winlogon, select New, and then click DWORD Value.

4. Enter the following name for the DWORD Value: UserEnvDebugLevel.

5. Enter 30002 as the hexadecimal value. This enables verbose logging to userenv.log, located in the %windir%\debug\usermode directory.

6. Run “gpupdate /force” to ensure a full listing of total Group Policy processing.
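For repeated use, the registry change in steps 2 through 5 can be captured as a registry file instead of being typed into Regedit. The sketch below only generates the file text; the function name is hypothetical, and importing the resulting .reg file on the client is an alternative to the manual steps, not something this workbook prescribes.

```python
WINLOGON_KEY = r"HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\Winlogon"


def userenv_debug_reg_text(level: int = 0x30002) -> str:
    """Return .reg file text that sets UserEnvDebugLevel to the given DWORD value."""
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        f"[{WINLOGON_KEY}]\n"
        # DWORD values in .reg files are written as eight hexadecimal digits.
        f'"UserEnvDebugLevel"=dword:{level:08x}\n'
    )


if __name__ == "__main__":
    # Print the text; save it with a .reg extension to use it.
    print(userenv_debug_reg_text())
```

Test any such file on a lab computer first, and remember to remove or reset the value after troubleshooting, because verbose logging adds overhead to every policy refresh.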




Note: The userenv logs entries pertaining to profiles, Group Policy core processing, and registry (.adm) processing on the client. The entries pertaining to profiles are intermingled with the Group Policy entries and not easily distinguished from them.

There are three main parts of the userenv.log file that are relevant for troubleshooting Group Policy. Each of these three parts is created independently for the user and computer account.

• The first section locates the account in the Active Directory. Using the user or computer distinguished name, it creates a list of GPOs that are linked to the site, domain, and OU.

• Once the list of GPOs is collected, each GPO is examined for extensions that are required to process the policy settings. Any failures that occurred because the policy is disabled, filtered, or empty are shown here, in the order that they are processed. This section is especially useful for troubleshooting failures in which a policy was not applied.

• After this section is completed, each extension appears and makes the changes to the environment. Included here are details, such as registry values that are altered, along with a timestamp for each alteration that is useful for troubleshooting slow application of GPOs.




Troubleshooting Review

Scope of Management, GPO Inheritance, Group Policy Refresh, Security Filtering, WMI Filtering, Disabled Link, Replication

Asynchronous Application of GPOs, Client-Side Extension Issue, Loopback Processing, Lack of OS Support, Inaccessible or Empty GPOs

Troubleshooting Review

As the information in this module illustrates, there are many components and dependencies involved in the successful application of Group Policy in Active Directory, and a failure or incorrect configuration in any of those areas can cause problems. A summary of potential causes of GPO problems is listed below:

Scope of Management One of the most common causes of a GPO not being applied to a user or computer is that the GPO is not linked to a site, domain, or OU of which the computer or user is a member. GPOs are delivered to clients based on the site, domain, and OU memberships of the computer and the logged-on user. Group memberships are only used to further restrict applications of the GPO.

GPO Inheritance Although GPOs have been applied, and the correct policy setting is listed, Group Policy inheritance might result in an unexpected GPO winning a conflict and providing a different value from the one that is expected. In an RSoP report, the settings are nested by source and type; click Show on the nested rows to expose the settings, and then look at the Winning GPO column to discover which GPO defines the value for the policy setting.
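How one GPO "wins" a conflict can be shown with a simplified model. The sketch assumes the default processing order (site, then domain, then OU, with later GPOs overwriting earlier ones) and deliberately ignores local policy, Enforced links, and Block Inheritance; the GPO names, setting names, and function name are made up for illustration.

```python
from typing import Dict, List, Tuple

# Each entry is (GPO name, {setting name: value}), listed in processing order.
GpoList = List[Tuple[str, Dict[str, str]]]


def resolve_winners(gpos_in_processing_order: GpoList) -> Dict[str, Tuple[str, str]]:
    """Return {setting: (winning value, winning GPO)} using last-writer-wins."""
    winners: Dict[str, Tuple[str, str]] = {}
    for gpo_name, settings in gpos_in_processing_order:
        for setting, value in settings.items():
            # A GPO processed later overwrites the value from an earlier one.
            winners[setting] = (value, gpo_name)
    return winners


if __name__ == "__main__":
    # Hypothetical GPOs linked at the site, domain, and OU levels.
    applied = [
        ("Site Policy", {"ScreenSaverTimeout": "900"}),
        ("Domain Policy", {"ScreenSaverTimeout": "600", "HideRunCommand": "Enabled"}),
        ("Sales OU Policy", {"ScreenSaverTimeout": "300"}),
    ]
    for setting, (value, gpo) in sorted(resolve_winners(applied).items()):
        print(f"{setting} = {value} (winning GPO: {gpo})")
```

In this model the OU-level GPO supplies the screen saver value, which mirrors what the Winning GPO column of an RSoP report would show for the same links.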

Group Policy Refresh If Group Policy refresh has not occurred since a GPO was modified and replicated, a newly added setting will not be applied. After the changes to the GPO have been replicated to the client’s domain controller, they need to be downloaded by the client. This occurs during Group Policy refresh. You can either wait for a background refresh or force the refresh by running GPUpdate, by logging off/on (for user configuration), or by restarting the computer (for computer configuration).

Security Filtering The user or computer does not have the permissions required for the GPO. The required permissions are Read and Apply Group Policy. Alternatively, a GPO might be associated with a Deny ACE, which overrides any other permissions granted to the user or computer. GPMC and RSoP reports should be consulted in troubleshooting these issues.
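The security filtering rule can be expressed as a small check. This is a simplified model, not the Windows access check: the Ace class, the gpo_applies function, and the group names are hypothetical, and the model treats any matching Deny entry as overriding all Allow entries, as the paragraph above describes.

```python
from dataclasses import dataclass
from typing import Iterable, Set

REQUIRED = frozenset({"Read", "Apply Group Policy"})


@dataclass(frozen=True)
class Ace:
    trustee: str      # user, computer, or group the entry applies to
    permission: str   # "Read" or "Apply Group Policy"
    allow: bool       # True for an Allow entry, False for a Deny entry


def gpo_applies(principal_and_groups: Set[str], aces: Iterable[Ace]) -> bool:
    """Return True if the principal passes security filtering for the GPO."""
    relevant = [
        ace for ace in aces
        if ace.trustee in principal_and_groups and ace.permission in REQUIRED
    ]
    # A Deny entry on either required permission overrides any Allow entries.
    if any(not ace.allow for ace in relevant):
        return False
    granted = {ace.permission for ace in relevant if ace.allow}
    # Both Read and Apply Group Policy must be granted.
    return REQUIRED <= granted


if __name__ == "__main__":
    aces = [
        Ace("Authenticated Users", "Read", True),
        Ace("Authenticated Users", "Apply Group Policy", True),
        Ace("Contractors", "Apply Group Policy", False),
    ]
    print(gpo_applies({"alice", "Authenticated Users"}, aces))
    print(gpo_applies({"bob", "Authenticated Users", "Contractors"}, aces))
```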

WMI Filtering A WMI filter applied to a GPO is, essentially, a Boolean (true/false) decision as to whether the entire GPO should be applied to the client computer. The filter is evaluated at the client when the GPO is applied. Based on the result of the embedded WMI Query Language (WQL) query, the GPO is either applied or skipped. GPMC and RSoP reports should be consulted in troubleshooting these issues.

Disabled Link There is a link to the GPO from a site, domain, or OU in the hierarchy of the user or computer, but that link has been explicitly disabled. You can quickly scan the navigation pane of GPMC for disabled links, to see if this is the case.

Replication After a setting is added to a GPO, that change must be replicated throughout the network. If the setting is specified in the GPO, but is not listed in the Group Policy Results report on the client, or, if you expected the winning GPO to supply a value for the setting other than the value that was actually applied, it might be that the setting was recently added to the GPO, but the change has not yet been replicated to the domain controller that supplied the GPO to the client. Check the version numbers of the policies in GPMC, and possibly use GPOTool to check for consistency in versions across DCs and across the GPC and GPT.

Asynchronous Application of Group Policy A computer running Windows XP performs startup and logon tasks asynchronously, thereby improving startup and logon performance. For example, a user does not wait for the network to be fully initialized before logging on. Provided that a user has logged on previously, a computer running Windows XP uses cached credentials. This is necessary because the network might not be fully initialized.

If the problem is with a setting that can only be applied during startup or logon, it might have been detected during asynchronous Group Policy processing, for example, as part of a Group Policy refresh, or during the asynchronous processing used for logon optimization in Windows XP.




Client-Side Extension Issue After the core Group Policy engine has completed initial processing of the GPOs, it passes specific settings to CSEs to process. If the setting is listed, but the value is wrong, or the behavior on the client does not reflect the setting value, the failure might have occurred after this setting was passed to a CSE to process. For example, even if a Folder Redirection setting has been successfully passed to the Folder Redirection CSE, the CSE might not be able to complete processing for the setting. Check the Events tab in an RSoP report, or scan the userenv.log for problems.

Loopback Processing Loopback processing is a way to enforce a set of user settings at a computer, regardless of who logs on at that computer. If not used properly, Loopback can cause unpredictable results.

To determine whether loopback processing is in effect, look for the User Group Policy loopback processing mode setting, on the Settings tab of the report, under Computer Configuration\Administrative Templates\System\Group Policy.

Lack of Operating System Support Some policy settings are supported on only certain operating systems or require a minimum service pack to be applied. (For example, Software Restriction Policies are available only on Windows XP and later clients.) When a GPO delivers a policy setting to a client computer that does not support that setting, the operating system ignores the setting.

Inaccessible GPO There is a link to the GPO, but the GPO cannot be accessed. There are several possible reasons for this:

• The permissions on the GPO or on folders in the path to the Group Policy template are insufficient for it to be accessed and read. If this situation occurs, the Component Status section of the Group Policy Results report will indicate Failure for the component Group Policy Infrastructure.

• The GPO might have been deleted, but the link to it remains for some reason (such as replication lag).

• Network connectivity problems might prevent access to the GPO. Group Policy requires a reliable networking infrastructure to ensure appropriate communication between the client computer and a domain controller. This includes TCP/IP, DNS, and other dependent technologies.

• The client is unable to contact any domain controller.

Empty GPO A GPO will not be read if it has no settings, and it will be listed in an RSoP report as “Access Denied.” This occurs when an administrator has configured a GPO and linked to it, but has not set any policy settings within the GPO.

Outgoing Assessment

We are nearing the end of this WorkshopPLUS course, and it’s time for the Outgoing Assessment. (You of course remember the Incoming Assessment!)

For a refresher on why the Incoming Assessment and Outgoing Assessments are so important – and what the benefits are to you, to your management, and to Microsoft – turn to the “Incoming Assessment” section near the front of this Workbook. In case you don’t look at the “Incoming Assessment” section now, we want to be sure that you are clear about the benefits to you of the Outgoing Assessment.

Benefits to you:

• You get an opportunity to see how much you’ve learned – a measure of improvement.

  • Students are not always aware of how much they’ve learned.

  • Students are happily surprised – even amazed – at how much they learn and how much their scores improve.

• You finish the workshop feeling really good because:

  • You know that your hard work was worth it.

  • You feel more confident than ever in your ability to perform well on the job.

• The subject matter experts who created this assessment believe that it covers the key points that all students should learn from this workshop. On the last day, after the Outgoing Assessment, the Trainer will review each question and answer, making sure that you understand all the key concepts.

Note: Your results are anonymous. Your Assessment form has a field for Student Number but no place for recording your name.

Privacy / Anonymity

We want to again mention the steps being taken to ensure the privacy and anonymity of your results:

• You record only your Student Number on the Assessment form, not your name.

• Some time after the workshop, the scores from the class will be entered into a database. The person entering the scores will not know who took a given Assessment because the forms have only Student Numbers on them. In addition, the Student Numbers will not be entered: instead, a made-up code number will be entered. Assessment forms will then be securely destroyed in an environmentally friendly way.

• No one will see or have access to your individual Assessment scores – not your manager, not others in your company, and not any Microsoft-employed Technical Account Managers, Engagement Managers, or Support Professionals.

• Only aggregated class-average results might be shared with your management.

Note: Copies of your Action Plans will be forwarded to your Technical Account Manager (TAM) who will follow up with your IT management.

Action Planning In this session, you will apply the knowledge and skills learned during this WorkshopPLUS course to create actionable plans tailored specifically, by you, for your workplace.

Specifically, you will:

• Identify situations and/or opportunities that will benefit from your enhanced knowledge and skill. For example:

  • Problem situations that need to be addressed.

  • Proactive opportunities that could improve operations in your IT environment.

• Create actionable plans to improve identified situations and/or execute on proactive opportunities. For each plan, you will work through these steps:

  • Define the situation and/or opportunity.

  • Create a realistic, achievable, and detailed plan of action.

  • Identify all needed skills (in addition to your own).

  • Identify any risks to the plan and/or dependencies.

  • Define success metrics.

  • Estimate hours required to complete the plan.

  • Estimate the completion schedule.

Your final Action Plans will be documented using the WorkshopPLUS Action Plan Worksheet, which you will take back to the workplace for execution.

Note: Copies of your Action Plans will be forwarded to your Technical Account Manager (TAM) who will follow up with your IT management.

Action Plan Worksheet

Released: January 2006 Copyright © 2006 by Microsoft Corporation Microsoft Confidential

WorkshopPLUS Title: Active Directory: Troubleshooting

Instructor: John Doe

Plan #: 1 of 5

Name and Company: Contoso

TAM: John Smith

Date: 2 July 2006

Existing Situation or Proactive Opportunity

• Group policy not applying as expected

• Some get policy, some do not.

• Sometimes get it when they shouldn’t and vice versa

Action Plan

1. Test or retest in lab to try to reproduce

2. RSOP report to check for incorrect configurations

3. Contact test users to help isolate problem

4. Turn on userenv

5. Remove and redeploy policy

6. Review OU and delegation structure

7. AD and FRS - GPOTool

Skills Needed

• GPO troubleshooting skills

• Networking

Success Metrics

All users apply policy successfully.

Risks to Plan and/or Dependencies

• Depend on desktop personnel, Helpdesk, Users

• User disruptions – risk

• Change to policy could worsen the problem

Estimated Hours: 9 hours

Estimated Completion Date: 1-2 days

Definitions on Back



Definitions

Existing Situation or Proactive Opportunity Transfer the text from the same column in the Prioritization Worksheet. Elaborate on the situation or opportunity as needed.

Action Plan What specific and detailed actions must be executed to address this situation or opportunity?

Skills Needed What types of people-resources are required to accomplish the Action Plan? Do you need a DBA, Messaging, or Windows expert (etc.)?

Success Metrics How will you measure successful execution of the Action Plan? Are there specific indicators? For example, will this plan:

• Reduce helpdesk calls by a specific percentage or number (30% or 100/month)?

• Reduce unplanned downtime for a specific service by an estimated percentage or time unit (30% or 12 hours/month)?

• Reduce the amount of time (minutes/hours) it takes an administrator to perform a specific repetitive task? If so, how much per month?

• Increase production of <widgets> by a certain percentage or number? If so, how much per week/month?

• Unblock or accelerate a specific effort/project that will deliver new functionality or enhance worker productivity?

Risks to Plan and/or Dependencies What could derail this plan? Are there any external dependencies that could hinder successful execution?

Estimated Hours How many labor-hours will it take to successfully accomplish the plan?

Estimated Completion Date By what future date will this Action Plan be completed?





Existing Situation or Proactive Opportunity

• User logons are failing when the WAN links go offline.

• Once users use cached logons, they cannot access local resources (file server and printers).

• Remote locations do not have GCs due to slow WAN links and concerns about size of domains.

• 65 sites

• 45 locations with 128K lines + 10 locations with 56K lines

• 45 DCs in remote offices

• 4 domains

• 1 GB DIT per domain; GC = 3 GB

Action Plan

1. Evaluate the speeds of the WAN links and determine the impact of adding additional amounts of replication and database size.

2. Promote DCs on faster links to GCs.

3. Increase bandwidth for slower sites.

Skills Needed

• Network Engineers

• Decision-makers for the WANs

• AD Engineer to promote DCs

Success Metrics

Redundancy provided for WAN down situations.

Risks to Plan and/or Dependencies

• WAN link saturation

• DCs needing reboot after GC promotion.

• Exchange Servers and clients required reboots.

Estimated Hours: 320 hours

Estimated Completion Date: Estimated 90 days to implement

Definitions on Back






Existing Situation or Proactive Opportunity

• Users experience mixed results accessing resources for 1-2 hours after helpdesk ticket changes are completed (Group Membership)

• Replication is delayed 1-2 hours between offices and changes take a long time to happen. Group membership gets overwritten by simultaneous admins.

Action Plan

1. Change replication intervals to 15 minutes.

2. Evaluate change based replication for faster sites.

3. Upgrade to 2003 FFL to enable LVR.

Skills Needed

• AD Engineer

• Network Saturation Reporting

Success Metrics

Changes happen within 30 minutes of ticket close (HelpDesk)

Risks to Plan and/or Dependencies

• Over usage of WAN links.

• Impact to users.

• Initial change delays.

Estimated Hours: 40 hours

Estimated Completion Date: 3 weeks

Definitions on Back






Existing Situation or Proactive Opportunity

• Improve login and address book lookup performance at remote sites

• Give clients the ability to login at remote sites and view the exchange address book even when the WAN link is down.

Action Plan

Make all DCs become GCs through a manual process

Skills Needed

• AD

• Infrastructure

Success Metrics

Clients at remote sites can login while WAN is down or there is a reduction of WAN bandwidth.

Risks to Plan and/or Dependencies

Ensure this is not going to cause any problems once the GC is enabled for DCs in sites that have slow links due to GC replication traffic. This may need to be done during off peak hours.

Estimated Hours: 2 hours

Estimated Completion Date: 3 weeks

Definitions on Back



Existing Situation or Proactive Opportunity

• DNS not working.

• DC records are not consistently available.

• AD is reliant on solid DNS infrastructure, but NetID has been problematic and managed by another group who does not want to work well with the AD administrators.

Action Plan

GOAL: To migrate from NetID to MS DNS with AD integrated zones on all DCs in the forest. Tentative steps would include:

1. Uninstall NetID on 1 DC in domain

2. Pull secondaries to MS DNS forests

3. Connect to AD integrated

4. Uninstall NetID and install MS DNS

5. Configure forwarders

Repeat for each domain

Skills Needed

• AD

• DNS

• NetID

Success Metrics

• All NetID servers removed from the environment.

• All DNS records properly registered for all DCs and clients and replicating completely to all DNS servers.

• AD no longer experiences unexpected stops in replication.

Risks to Plan and/or Dependencies

Dependencies:

• Start with root domain, then child domains, then grandchild domains.

Risks:

• Migration might not work. (low, if properly tested)

• AD Replication might break. (extremely low, but needs to be listed)

• Migration should be performed after-hours local to the machines being migrated.

• Migration should be performed by Enterprise Admins in the US and not the local subsidiary’s admins. (Consistency and skill set required)

Estimated Hours

• Plan: 12-20 hours

• Test: 20 hours

• Implement: 30 hours

Estimated Completion Date: 3 months

Definitions on Back


