Transcript Slide 1

Advanced Active Directory Design and Troubleshooting
Ed Whittington
Principal Software Engineer
HP Business Critical Call Center
Oct. 06, 2002
Topics
Troubleshooting Basics
Troubleshooting Tools
DNS Troubleshooting
Troubleshooting Replication
Troubleshooting DCPromo
Troubleshooting FRS Replication and DFS
Troubleshooting Group Policy
Troubleshooting in .NET
Troubleshooting Basics
Basic Troubleshooting Steps
Define the problem (make sure there is one)
• What’s failing?
– Client authentication and security
– Group policy application
– Replication
– Name resolution
– Errors and warnings in event logs
– FRS/DFS
– Application
• How is the problem reproduced?
• One or multiple machines?
Narrow the variables
Basic Troubleshooting Steps
MPSReports_DS (from HP or Microsoft)
Get the Log files
• Event logs
– http://www.eventid.net
• %windir%\debug\usermode\Userenv.log
• %windir%\debug\DCPromo*.log
Turn on Verbose Logging
Run NetDiag, DCDiag (verbose)
Get status report from Replication Monitor.
Basic Troubleshooting Steps
Check DNS.
• Resolver on ALL computers.
• Name Server Properties (forwarding, etc.).
• Monitoring tab – test name resolution.
• Nslookup, ping to test name resolution.
• Ping SRV records.
Check Replication.
• Force replication.
• Identify who isn’t replicating to whom.
• Outbound vs. inbound.
Basic Troubleshooting Steps
If all else fails, try demoting.
• Really cleans up a lot of problems, if the problem is isolated to one DC.
• If replication isn’t working, demotion won’t work.
• Reinstall to remove the AD, then clean up AD:
– Ntdsutil to remove server object.
– Delete server object from Sites & Services.
– Delete FRS server object from System container.
Can manually demote a DC.
Manual Demotion of a DC
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ProductOptions
• ProductType =
– ServerNT (when the computer is a Member Server)
– LanManNT (when the computer is a Domain Controller)
Change from LanManNT to ServerNT
It’s now a “dirty” member server
Clean server objects from the AD (Ntdsutil)
Clean up the disk and Registry
1. Create new Forward Lookup Zone – Bogus.com
2. Run DCpromo – create new forest for Bogus.com
3. Demote and eliminate Bogus.com
4. Wait for Replication
5. Promote back into domain – use same name if desired
Tool in Windows .NET
Troubleshooting Tools
Gathering Information
Netdiag.exe
/v - verbose – always turn this on.
/l - log – writes netdiag.log to default directory.
/d:DomainName – finds a DC in the specified domain.
/test: - runs only specified tests.
/skip: - skips specified tests.
Can’t execute remotely.
C:\>netdiag /v /l
Netdiag.exe
Domain Controller Discovery
Bindings, IP address, Default Gateway tests
DNS tests
NBTstat and WINS ping
Netstat
Route
Trust
Kerberos
Dcdiag.exe
DCdiag /v
Domain controller functions of netdiag
More domain-specific
FSMO roles
Connectivity
Replication
Domain controller locator
Intersite “health”
Topology integrity
Nltest.exe
/server:servername – Sets default server
/dsgetdc:domainname – Calls the DsGetDcName API [ /gc /timeserv /ldap ]
/dclist:domainname – Lists DCs in domain
/parentdomain – Lists parent domain
/dsgetsite – Lists site of server
/dsgetsitecov – Lists sites the DC “covers”
/dcname:domainname – Lists PDC for domain
/dcpromo – Tests potential success of DCPromo
/whowill:domain user – Returns name of DC that will authenticate user
Netdom.exe
/join
/add
/reset
/resetpwd
/query FSMO
/trust
NTDSUtil
• Built-in utility.
• Directly accesses Active Directory.
• Authoritative Restore.
– Can restore an older version of the AD and force it on all DCs to correct a variety of problems.
– Entire AD or a single subtree.
– Can’t restore the schema.
• FSMO Roles.
– List, Transfer, Seize roles.
– Better than UI – can manipulate all roles in forest and all domains from one utility.
NTDSUtil
Metadata Cleanup
– Delete orphaned objects.
– Servers
– Domains
– The UI can and will lie to you! Don’t trust it.
Useful tool for listing contents of the AD
– Sites, domains, servers, FSMO role holders.
– Domains in site.
– Servers in domain, servers in site.
Q216364, Q216498, Q230306
Gpresult.exe
Run on client
Returns:
• Security group membership
• User and Computer policy info
• GPOs applied to each
• Registry settings set in the GPO
• Client-side extensions set
– Scripts applied
Remember
• Policy is cached – reboot / login to clear
• Note which server authenticated the logon
– Environment variable %LOGONSERVER%
Much Improved in .NET!
GPOtool.exe
Run on domain controller.
Returns:
• Analysis of all GPOs in domain.
• GUID and friendly name of all GPOs.
• DS and Sysvol versions.
• Errors encountered.
Good group policy troubleshooting tool.
May take a long time to process (depends on the number of GPOs).
ADSIedit.exe
GUI much like Users & Computers snap-in /Advanced features.
Graphical view of AD.
Like LDP.exe but:
• Easier to browse.
• Can modify attribute values
Don’t confuse with Users & Computers!
LDP.exe
Takes time to set up:
• Connect
• Bind
• View – Tree
• Enter DN to start (blank for default)
Exposes attributes quickly, easy to see.
Faster than ADSIedit – no GUI to traverse.
LDAP searches.
Can delete and modify, but not as easy as ADSIedit.
Can execute remotely.
DCPromo.log, DCPromoui.log
Located in %systemroot%\debug.
Logged every time dcpromo runs.
DCPromo.log
• Shorter.
• Appended (read bottom up).
DCPromoUI.log and DCPromoUI.xxxx.log
• Results of what is seen in the UI – longer.
• Find: results of DsGetDcName, DNS query, time service sync, authentication, replication, site info.
• Error (0x0) = success – no error.
Error reporting different – read both logs.
Userenv.log
Located: %systemroot%\debug\usermode
User environment info:
• Group policy (registry)
• Client side extensions
– Scripts
– Security
Increase verbose logging (Q221833)
Take time – read and study and you may be surprised at what you can find!
Additional User Mode Logs
Client-side extensions
• Registry: see Q216357
HKLM\Software\Microsoft\Windows NT\CurrentVersion\Winlogon\GPExtensions
• Errors created in %windir%\debug\usermode
– Named after the .dll
– Scripts = Gptext.dll = gptext.log
– Folder Redirection = fdeploy.dll = fdeploy.log
– Security = scecli.dll = winlogon.log
– Q245422
– Produced automatically on error (except winlogon.log)
– Check User Mode directory for these files
• Invaluable in debugging. Use them!
Client Side Extensions (registry)
Windows .NET Troubleshooting Tools
Remote Desktop Resource Redirection
Client Resources Available when using Terminal Services Remote Desktop
• File System – Local drives and Network drives on Local Machine available on
Remote machine
• Audio – Audio streams such as .wav and .mp3 files can be played through the client
sound system.
• Port – Applications have access to the serial and parallel ports
• Printer – The default local or network printer on the client becomes the default printing device for the Remote Desktop.
• Clipboard – The Remote Desktop and client computer share a clipboard
• Terminal Services Virtual Channel Application Programming Interfaces (APIs)
are provided to extend client resource redirection for custom applications.
WMI
Computer management
Active Directory
• Provider: MicrosoftActiveDirectory
• Classes:
– Replication - See replprov.mof %windir%\system32
Trust health
• Provider: MicrosoftHealthMonitor
• Classes: see system32\wbem\trusthm.mof
DNS
• Provider: MicrosoftDNS
• Classes: system32\wbem\dnsprov.mof
Cluster
• MSCluster
Also look in CIM Studio in MSDN
WMIC Sample Commands
Look in %windir%\system32\wbem *.mof files for names of providers,
classes, etc.
Active Directory
• Provider: MicrosoftActiveDirectory
• wmic /namespace:\\root\microsoftactivedirectory PATH msad_replneighbor
(shows replication partners)
• wmic /namespace:\\root\rsop\user path RSOP_GPO
(lists GPOs with User settings)
Admin Tool Improvements
Users and Computers snap-in
• Drag and drop.
• Multi-select and edit user objects.
• Heavily revised object picker.
Users and Computers, Sites and Services, DNS Snap-ins
• Saved queries.
• Viewing Saved DS, DNS, FRS eventlogs on non-DCs!
.NET Adminpak (only on XP)
Command Line Tools
GPresult
• Enhanced reporting
DCDiag
• dcdiag /test:DCPromo
Repadmin – enhanced reporting
Netdom computername – for DC rename
Others
Shipped on
• Service Pack 2 CD (install manually)
• .NET Server, AdvSvr CD
Windows .NET Improvement to NTDSUtil
Change the Offline (DS Repair Mode) Password While Online!
NTDSUtil
• Set DSRM Password (main menu)
Increases server uptime, which in Win2K was limited by the password change interval.
• (Had to reboot to DS Repair mode to change.)
• Q223301 (Win2K limit)
Cool error message!
Setting password failed.
WIN32 Error Code: 0x6ba
Error Message: The RPC server is unavailable.
See Microsoft Knowledge Base article Q271641 at
http://support.microsoft.com for more information.
Errors in Windows .NET
Kinder, Gentler and Report to Microsoft
Active Directory Load Balancing Tool
Does the job for branch office deployments.
• The KCC chooses the bridgehead server (BHS) for connection objects and tends to choose the same one.
• The tool spreads the load to other DCs in the site (that hold that NC).
• The ADLB tool modifies the hub DC’s replication schedules to spread replication out over time.
• Generates a log, like Replmon’s status log.
• For deployments with hundreds of branch offices all replicating to a single hub.
• No benefit to sites with only one DC per domain.
Future: Graphical Replication Monitoring Tool
Very much like ‘Age of Directories’
Ability to make configuration changes
Not in .NET - maybe Longhorn or Blackcomb?
Troubleshooting DNS
DNS Resolver Configuration
Win2K clients and servers point to the Win2K DNS name server that is SOA for their zone.
• Don’t point to the ISP or other internal name servers (even as “additional”).
• Keep it simple.
Win2K Name Servers forward to ISP or internal name server
hosting registered domain.
DNS Name Server Configuration Basics
Dynamic updates = Yes.
Active Directory Integrated Zone
• Select one “Primary”
• All other ADI Primary NS point to it for DNS
• Win2k Name Servers can:
• Forward to ISP or Internal NS.
• Use root hints (or modify root hints).
• Reverse Lookup Zones NOT required
• Needed only for tools - NSLookup
ADI Primary and Standard Secondary mixed zone
Only a DC can host an ADI primary zone
Member Servers can host Secondary zone
• Synchronize from an ADI Primary
[Figure: mixed zone with three ADI Primary name servers and two standard Secondary name servers]
DNS Case Study
[Figure: forwarding vs. secondary zones among corp.net, na.corp.net, sa.corp.net and eu.corp.net]
With Conditional Forwarding Feature in Windows .NET Server…
[Figure: corp.net, na.corp.net, sa.corp.net and eu.corp.net; finding na.corp.net]
Problem: SRV records only in Root domain
[Figure: location of SRV records (PDC, GC, Cname) in w2k.net, with corp.com, NA.w2k.net and EU.w2k.net; legend: zone transfer, forwarder]
Solution: Delegate _msdcs zone
[Figure: location of SRV records (PDC, GC, Cname, _tcp, _udp, _sites) in the delegated _msdcs zone of w2k.net, with corp.com, NA.w2k.net and EU.w2k.net; legend: delegation, forwarder]
DNS Hotfix
Symptom: Replication breaks
Configuration: Using Secondary Zones for root _msdcs at child domains.
Problem: Serial Number of Secondary zone is higher than the
primary – zone transfers stop.
Hotfix Q304653
• The Serial Number Is Decremented in DNS When You Reboot
• Solved in .NET
DNS Troubleshooting Basics
Check DNS event log (and others).
Check Location of DNS servers.
• Usually want Name Server in remote sites.
• Check population of SRV records.
• _msdcs; _tcp; _udp; _sites
• Need Kerberos, LDAP records for each DC.
• Correct address, etc.
• Can delete, repopulate by restarting netlogon.
• Check Delegations – correct names, IP.
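The SRV-population check can be made systematic. A minimal sketch: it builds a subset of the DC Locator names a DC is expected to register (the complete list a DC attempts is written to %systemroot%\system32\config\netlogon.dns) and reports which are missing from names you have pulled from the zone by nslookup, a zone export, or similar. The function names are hypothetical, the domain, forest, site and GUID passed in are placeholders, and records such as _kpasswd or the PDC record are deliberately left out.

```python
from typing import Iterable, List


def expected_locator_names(domain: str, forest: str, site: str, dsa_guid: str,
                           is_gc: bool = False) -> List[str]:
    """Subset of DC Locator DNS names a DC in `domain` and `site` should register."""
    names = [
        f"_ldap._tcp.{domain}",
        f"_ldap._tcp.{site}._sites.{domain}",
        f"_ldap._tcp.dc._msdcs.{domain}",
        f"_ldap._tcp.{site}._sites.dc._msdcs.{domain}",
        f"_kerberos._tcp.{domain}",
        f"_kerberos._udp.{domain}",
        f"_kerberos._tcp.{site}._sites.{domain}",
        f"_kerberos._tcp.dc._msdcs.{domain}",
        f"_kerberos._tcp.{site}._sites.dc._msdcs.{domain}",
        # Alias (CNAME) keyed by the DC's server GUID; lives only in the forest root _msdcs zone.
        f"{dsa_guid}._msdcs.{forest}",
    ]
    if is_gc:
        # Global catalog records are registered under the forest root name.
        names += [
            f"_gc._tcp.{forest}",
            f"_gc._tcp.{site}._sites.{forest}",
            f"_ldap._tcp.gc._msdcs.{forest}",
            f"_ldap._tcp.{site}._sites.gc._msdcs.{forest}",
        ]
    return names


def missing_records(expected: Iterable[str], registered: Iterable[str]) -> List[str]:
    """Expected names not present in `registered` (case-insensitive, trailing dot ignored)."""
    have = {name.lower().rstrip(".") for name in registered}
    return [name for name in expected if name.lower() not in have]
```

Anything `missing_records` returns for a DC is a candidate for the "delete, repopulate by restarting netlogon" step above.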
DNS Troubleshooting Basics
Use of Active Directory Integrated (ADI) zones.
• Put standard secondary zones on member servers.
• Can clear problems by switching to standard primary.
Ping DC by its alias (CNAME) record in the root _msdcs zone:
ping <guid>._msdcs.compaq.com
Clear the server cache.
• Negative Caching problems.
Test – Server Properties – Monitoring tab.
Test – Ping names, NSLookup.
Troubleshooting AD Replication
Replication Troubleshooting Tools
Event logs – Directory Services, System
Sites and Services snap-in
Age of Directories (AOD) – HP
Replication Monitor
Aelita Event Admin
NetPro Directory Analyzer
Command Line (Support Tools & Res Kit)
DCdiag, Netdiag
Repadmin.exe
Event Logs for Replication Troubleshooting
Directory Services Log
• 5778 - Subnets not mapped.
– Will break client’s “site awareness.”
• 1311 - serious - Not enough connectivity.
– Connectivity, traffic issue.
– Sites with DCs and no site links.
– Site topology incorrectly defined.
• DNS Lookup failure.
• 1772 – RPC Server is unavailable.
– Physical connectivity.
– DNS.
Event Logs for Replication Troubleshooting
System Log
• Netlogon errors
– Authentication
– Trusts
– Secure channel
• w32Time errors
– Kerberos authentication required for replication
– DCs must be no more than five minutes out of sync.
– Watch time zones!
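The five-minute rule and the time-zone warning come down to comparing clocks in UTC. A minimal sketch, assuming you already have both clock readings as timezone-aware Python datetimes (collecting them, for example with `net time`, is left out); the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Kerberos tolerates at most five minutes of clock difference (per the slide).
MAX_SKEW = timedelta(minutes=5)


def within_kerberos_skew(a: datetime, b: datetime, max_skew: timedelta = MAX_SKEW) -> bool:
    """True if two clock readings differ by no more than max_skew.

    Both datetimes must be timezone-aware. Subtracting aware datetimes compares
    them in UTC, so a DC in another time zone is judged by its real offset,
    not by how different its wall clock looks.
    """
    if a.tzinfo is None or b.tzinfo is None:
        raise ValueError("pass timezone-aware datetimes")
    return abs(a - b) <= max_skew
```

The case to watch is a DC that shows the same wall-clock time but has the wrong time zone: as far as Kerberos is concerned, it is hours out of sync.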
Sites and Services Snap-in
Check for duplicate connection objects.
• KCC generating >1 connection between 2 DCs.
• Delete all connections and select “check replication topology”
option to regenerate them.
• If they come back, find out why.
– Usually a DNS problem.
• Breaks FRS and AD replication.
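Once the connection objects are listed as (replication source, destination) pairs, spotting duplicates is a counting problem. A sketch, assuming you have exported the pairs by hand from Sites and Services or from a tool; the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]


def duplicate_connections(connections: Iterable[Pair]) -> Dict[Pair, int]:
    """Return {(source, destination): count} for every DC pair with more than one connection object.

    `connections` holds one (source_dc, destination_dc) pair per connection object;
    names are compared case-insensitively.
    """
    counts = Counter((src.upper(), dst.upper()) for src, dst in connections)
    return {pair: n for pair, n in counts.items() if n > 1}
```

Direction matters: a DC1→DC2 connection and a DC2→DC1 connection are a normal pair, not duplicates, so the sketch keys on the ordered pair.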
Sites and Services Snap-in
Check for sites with no DCs…
• OK to have a site with no servers if you plan it that way.
• If there should be a server in that site, find it and move it there.
Make sure all subnets are mapped to correct sites.
• Keep up on IP addressing changes.
Sites and Services Snap-in
Make sure site links are correct.
• Link correct sites per design (need a drawing).
• Cost, schedule, replication frequency.
Force replication between DCs.
• All connections are inbound.
• Use “check replication topology.”
• Create new site, user named for the DC.
– Checks Configuration NC and Domain NC.
– Force Replication Between Replication Partners.
– On DC1 from DC2 and on DC2 from DC1.
Sites and Services Snap-in
• Validate inbound, outbound replication on all DCs.
– Create new site, user named for the DC.
– Checks Configuration NC and Domain NC.
– Wait for replication (don’t force it).
– Check each DC for copy of these users, sites.
[Figure: DC1, DC2 and DC3, each holding a test User and Site named for DC1, DC2 and DC3]
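The validation matrix can be checked mechanically. In this sketch (function and argument names are hypothetical), `seen` records which test objects each DC currently holds; every missing entry names a replication path that is not working yet. Run it once for the test users (Domain NC) and once for the test sites (Configuration NC).

```python
from typing import Dict, List, Set, Tuple


def replication_gaps(seen: Dict[str, Set[str]], dcs: List[str]) -> List[Tuple[str, str]]:
    """Return (source_dc, target_dc) pairs where target_dc lacks the test object named for source_dc.

    seen[target] is the set of test-object names (one object per DC, named for the
    DC it was created on) currently visible on `target`.
    """
    gaps = []
    for target in dcs:
        have = seen.get(target, set())
        for source in dcs:
            if source not in have:
                gaps.append((source, target))
    return gaps
```

For example, if DC2 holds the DC1 and DC2 objects but not DC3’s, the result contains ("DC3", "DC2"): changes made on DC3 have not reached DC2.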
Check Cname DNS Records
In root _msdcs zone (only), an alias (CNAME) record maps the DC’s server GUID to its FQDN.
• Only one record per DC.
– Delete duplicates.
• Match GUID in alias record to GUID reported by Repadmin /showreps.
• If in doubt, delete the DC’s alias record(s) and restart Netlogon on the broken DC to re-register.
Age Of Directories Tool - Demo
If interested, contact me.
Replication Monitor
Status report (replication health report)
List of all GCs, BHS, Trusts
List of all replication errors on all DCs in domain
Changes not replicated
Replication partners
Force push/pull replication
Meta-data
Group Policy Object status
FSMO validation
Inbound connections (including reason)
Replication Monitor
Command-Line Utilities
RepAdmin
• In Support Tools.
• Perhaps the most useful tool for troubleshooting replication.
• /showreps - lists inbound, outbound connections.
– The only tool that lists outbound connections.
– Lists Server GUID (used for replication).
– Lists successful replication messages.
– Lists replication errors.
– Lists Replication partner used to replicate every naming context – inbound and outbound.
NTDS Diagnostic Logging
HKLM\system\CCS\Services\NTDS\diagnostics
• Set value = 0-5
– 0 = off 5=very verbose
– Start with 3
– Reported in Event log
• Important Values
1 Knowledge Consistency Checker
13 Name Resolution
5 Replication Events
8 Directory Access
9 Internal Processing
18 Global Catalog
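Rather than editing each value by hand, the levels can be written to a .reg file, reviewed, and imported. A sketch, assuming the value names listed above (for example "5 Replication Events") and REG_DWORD data; the function name is hypothetical, so check the value names against the Diagnostics key on your DC before importing, and set the levels back to 0 when you are done.

```python
from typing import Dict

NTDS_DIAG_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Diagnostics"


def ntds_diag_reg(levels: Dict[str, int]) -> str:
    """Return .reg file text setting NTDS diagnostic logging levels (0 = off, 5 = very verbose)."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{NTDS_DIAG_KEY}]"]
    for value_name, level in levels.items():
        if not 0 <= level <= 5:
            raise ValueError(f"{value_name}: level must be 0-5, got {level}")
        # REG_DWORD data is written as eight hex digits in .reg syntax.
        lines.append(f'"{value_name}"=dword:{level:08x}')
    return "\r\n".join(lines) + "\r\n"
```

For example, `ntds_diag_reg({"5 Replication Events": 3, "9 Internal Processing": 3})` produces the settings you would use before hunting for an orphaned object’s GUID in the event log.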
Things that break Replication
(or indicate that it’s broken)
Duplicate connection objects
Orphaned objects
• Esp. DC objects, caused by a DC being removed from the domain
without successful DCPromo.
• Garbage Collection initiated manually before all DCs and GCs
are fully replicated.
• Reported in event logs.
Things that break Replication
(or indicate that it’s broken)
DC unavailable
• Down
• Name Resolution
• Network problem
DNS misconfigured
• TCP/IP addresses change
– Delegation
– Client resolver configuration (including name servers)
– DHCP scope configuration for DNS registration
• Failure to Contact a DNS server (for SRV records)
Things that break Replication
(or indicate that it’s broken)
KCC doesn’t do its job
• Routes around inaccessible DCs by creating duplicate connection
objects.
• When DCs come back on line, KCC should clean up the duplicate
connection objects.
– Usually doesn’t…
– Causes replication errors.
– Events in the DS Log.
– Need to clean them up manually.
Lingering Object Behavior
Basics
Scenarios
Object Deletions
Deleted objects turn into tombstones
• Tombstones replicated to other DCs
• This is how replication partners learn that an object was deleted
Tombstones purged from local database after tombstone lifetime has expired
• AD: 60 days, adjustable (2 days minimum)
• Sysvol: 60 days
If tombstone does not replicate to a DC, object deletion is not replicated
• Object not deleted on this DC
• Object is now a Lingering Object
• Can be on DC or GC
Rule: tombstone lifetime =
• Max time DC can be disconnected
• Max lifetime of Backup tape
Lingering Objects – Scenarios
Deleted object re-appears on all domain controllers in a domain and on all GCs
Deleted account does not disappear from Exchange GAL
Object was moved between domains and disconnected GC is brought online
Replication error on GC when new object is created
• Lingering object still holds attribute where uniqueness is enforced
(samAccountName)
• Exchange cannot create mailbox because object already exists
Why does this happen?
DCs disconnected for more than tombstone lifetime
• Left in storage room for long time
• Replication failures
– e.g., bridgehead servers overloaded, no monitoring in place
• WAN connections down for a long time
• Tombstone lifetime abuse
– “Somebody” changed time on a DC to garbage collect an object
– Tombstone lifetime was changed to garbage collect objects on single servers
Can this be avoided?
• YES, monitor KCC topology and replication
• Do not set tombstone lifetime to less than 60 days
• DCs offline > tombstone lifetime must be re-promoted
Lingering Objects
Strict vs. Loose Replication Behavior
Replication Behavior
• Defines how a DC reacts when an update for an object is replicated in and the object does not exist on the DC
Loose Behavior
• DC requests full copy from replication source
• Logs event ID: 1388
Strict Behavior
• DC stops replication from offending replication source
• Logs error code 8240 (ERROR_DS_NO_SUCH_OBJECT) embedded in event ID 1084
• Requires logging level 1
Behavior can be set via registry key
• HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\NTDS\Parameters, value “Strict Replication Consistency”
• Introduced in Q314282
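The difference between the two modes is what happens to a single inbound update. A toy model with the event IDs and error code taken from the slide; `Outcome` and the function name are hypothetical, and nothing here talks to a real DC.

```python
from dataclasses import dataclass

ERROR_DS_NO_SUCH_OBJECT = 8240  # error code embedded in event 1084 under strict behavior


@dataclass(frozen=True)
class Outcome:
    action: str
    event_id: int
    error_code: int = 0


def on_update_for_missing_object(strict: bool) -> Outcome:
    """What a DC does when an inbound update refers to an object it does not hold."""
    if strict:
        # Strict: refuse the update and stop replicating from this source,
        # which makes the lingering object visible instead of spreading it.
        return Outcome("stop replication from this source", 1084, ERROR_DS_NO_SUCH_OBJECT)
    # Loose: ask the source for a full copy of the object, which re-creates it locally.
    return Outcome("request full copy from replication source", 1388)
```

The loose branch is one way a deleted object comes back on every DC: the full copy re-creates it on DCs that had already removed it.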
Deleting Lingering Objects
If found on a DC
• In loose behavior: Delete the object via users and computers
• In strict behavior: Follow procedures outlined in Q314282
On GC (in read-only NC)
• Object cannot be changed or deleted on GC
• Solution 1: Delete object on writeable replica (if possible)
• Solution 2: Use ldp to delete the object on the GC
– Support to remove lingering objects from GC added in Q314282
– Follow procedures outlined in Q314282
You might have to set loose behavior temporarily
Best Practice Recommendations
DC has not replicated for more than 60 days
• Tombstone lifetime default (60 days)
– Do not replicate, re-install OS
• Tombstone lifetime adjusted to > 60 days
– 60 days < time DC disconnected < tombstone lifetime
– Re-connect DC, restore sysvol
– Time DC disconnected > tombstone lifetime
– Do not replicate, re-install OS
If you have to disconnect a DC
• Make sure that it replicates successfully before you take it off-line
New deployments
• Add registry key to enforce strict replication behavior at DC OS installation time
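The disconnected-DC rules above reduce to two comparisons. A sketch that encodes them directly; the function name is hypothetical and the 60-day Sysvol figure is the one the slides give.

```python
SYSVOL_TOMBSTONE_DAYS = 60  # per the slides, the Sysvol (FRS) tombstone lifetime is 60 days


def reconnect_advice(days_disconnected: int, tombstone_lifetime_days: int = 60) -> str:
    """Apply the best-practice rules to a DC that has not replicated for a while."""
    if days_disconnected > tombstone_lifetime_days:
        # Past the AD tombstone lifetime: deletions may have been purged everywhere else.
        return "do not replicate; reinstall the OS"
    if days_disconnected > SYSVOL_TOMBSTONE_DAYS:
        # Only reachable when the AD tombstone lifetime was raised above 60 days.
        return "reconnect the DC, restore Sysvol"
    return "reconnect the DC"
```

The same comparison covers backup media: a backup tape, or Install From Media source, older than the tombstone lifetime should not be used.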
More Best Practice Recommendations
Existing deployments
• Default setting: Loose replication (even on SP3)
• Goal: Get to strict mode ASAP
• Set registry key to strict mode on all DCs
• Watch event logs on DCs
– If you get many replication errors on single DCs, re-promote DC
– For small number of replication errors, clean-up the DC
– Delete lingering objects if necessary
– Follow procedures outlined in Q314282
• If you were monitoring…
– Then don’t worry, you won’t see any replication errors
Don’t lower tombstone lifetime to less than 60 days
Monitor!
Lingering Object Fix
Q317097 (good instructions)
HKLM\System\CurrentControlSet\Services\NTDS\Parameters…
• Add Value Name = Correct Missing Object
• Data Type =REG_DWORD
• Value = 1 (tight), 0 (loose)
Allows or Restricts AD replication when lingering objects are discovered.
• Tight when you want to know.
• Loose to inventory and remove the objects.
Value Level Replication
WNT: Object Replication
• change to attribute or value
W2K: Attribute level replication
• Better than NT (more efficient)
• Change to attribute replicates attribute
• Change to value replicates attribute
• Problem: Multi-Valued Attributes
– Group = Attribute
– Member = Value
– Change Member = replicate attribute with all members
– Impacts network traffic
– Limit (per Microsoft) of 5,000 users/group
.NET: Value Level Replication
• Replicates values – not attributes
• Eliminates 5,000 user/group limit
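Rough arithmetic shows why the multi-valued attribute problem matters. Assuming a group is built by adding members one at a time and each change replicates before the next (a simplification: real replication batches changes), attribute-level replication sends 1 + 2 + ... + n member values in total, while value-level replication sends n. The function name is hypothetical.

```python
def member_values_replicated(n_members: int, value_level: bool) -> int:
    """Total member values sent when a group is built up one member at a time.

    Simplifying assumption: every addition replicates before the next one is made.
    """
    if value_level:
        return n_members  # .NET: each change sends only the added value
    # Win2K: the k-th addition resends the whole member attribute (k values),
    # so the total is 1 + 2 + ... + n.
    return n_members * (n_members + 1) // 2
```

For a 5,000-member group this is 12,502,500 values under attribute-level replication versus 5,000 under value-level replication.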
Domain Limit
There is a limit of about 800 child domains to a single parent
Child domains are unlinked, multi-valued attribute – stored in the crossref attribute of the domain object
Jet database limits the data that can be stored. No way to patch – must change Jet
“Might” be improved in Longhorn (not Whistler)
Domain Limit
One customer got to 900 domains
• Replication failed
• Authentication failed
• Mission critical application failed
Temporary Repair
• Demote all domains in reverse order of creation to return to 800
• Fixed Replication
Solution
• Redesigned and redeployed to a single domain
DCPromo Troubleshooting
DCPromo Basics
First Test of:
• DNS registration and resolution.
• LDAP query and response.
• Kerberos authentication.
• Active Directory replication.
• FRS replication.
• Application of group policy.
Validation and Flow …
• Chapter 2, Active Directory Data Storage in the Windows 2000
Resource Kit
DCPromo Logs
%windir%\debug
• Dcpromo.log
• Dcpromoui.log
• Dcpromoui.xxx.log
Set verbosity on dcpromoui.log
• HKLM\Software\Microsoft\Windows\CurrentVersion\AdminDebug
• Values: DCpromo and DCPromoui
• Data
– 380001 = Default
– 0xFF003 – full file and debugger logging output
– 0xFF001 – maximum detail to DCPromoui.log
DCPromo Phases
Initialization
• UI Input
• DNS Name resolution
• LDAP Query/resp
• Kerberos Authentication
AD Replication
FRS Replication
Wrap Up
• Apply policy
• Upgrade Trusts
• Publish new DC in the DS
Initialization Phase
Authorization error
• Enterprise Admin required to create new domain (or to remove the last one).
• Domain Admin required to add replica DC (or demote a replica).
Can’t find DNS with Dynamic Updates.
• Prompt to let DCPromo configure DNS.
– Creating domain.
– Answer NO!
Replicas, Child – must find DNS server to locate a “sourcing DC.”
Errors Creating the Computer Account
Need privileges to create the account.
First creates the account, puts it in domain/computers container.
Then moves it to the Domain Controllers OU.
Source DC identified in DCPromo logs.
DCPromo Initialization Checklist
Privileges required
• Enterprise Admin if creating new domain.
• Domain Admin if creating a replica.
System time configured properly
• Kerberos requires sync within five minutes.
• All parent, child domain DCs.
Sufficient free disk space.
• ~850 MB
Domain Naming Master FSMO required if creating new domain.
DCPromo Initialization Checklist
Everyone or Enterprise DC group has “Access this computer from network”
Enterprise DC group rights:
• Manage Replication Topology.
• Replicating Directory Changes.
• Replication Synchronization.
Sourcing DC
• Security policy applied.
• Enable computer and user accounts to be trusted for delegation.
DCPromo Initialization Checklist
Target DC has valid Kerberos tickets.
• Kerbtray.exe utility from Resource Kit.
GC must be contacted.
• Nltest /dsgetdc:compaq.com /GC
Able to contact a functional existing DC.
• Uses UDP (watch for firewall issues).
– Can use TCP but it’s a Microsoft Secret!
• Use Ping, NLTest, Nslookup to find a DC.
If Source DC not Reachable...
See if one responds.
• Ping FQDN of domain (Ping compaq.com).
• NLTest /dsgetdc:compaq.com /ds
– Other: /gc /pdc /timeserv
• Check Site mapping for this computer.
– Nltest /server:<name> /dsgetsite
Check Dcpromoui.log to see source.
Force DCPromo to use a specific source
• Q224390
• Turn off Netlogon on other DCs.
Join the Server to the domain then DCPromo.
Info to Collect for Debug
Netdiag /v
• Problem DC
• Source DC (see dcpromo.log)
DCDiag /v
• Source DC
Replication working? (other DC in site)
AD & FRS Replication Phases
Initially inbound connection created to replicate from source DC.
• Machine acct (DC1$) moved to DC OU.
– UserAccountControl Attribute set
– 4096 (1000 hex) = Workstation/Server
– 532480 (82000 hex) = DC
– Account is moved.
• Error: DC1$ not found, access denied, etc.
– Credentials of account running Dcpromo
– Source must have computer object.
– Source must have security policy applied to itself.
– Q250874
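The two userAccountControl values are bit fields: 532480 is 0x2000 (server trust account, i.e. a DC) plus 0x80000 (trusted for delegation). A small decoder using the documented UF_ flag values; the function names are hypothetical.

```python
from typing import List

# Documented userAccountControl bits behind the two values on the slide.
UF_WORKSTATION_TRUST_ACCOUNT = 0x1000   # 4096
UF_SERVER_TRUST_ACCOUNT = 0x2000        # 8192: a domain controller
UF_TRUSTED_FOR_DELEGATION = 0x80000     # 524288

FLAG_NAMES = {
    UF_WORKSTATION_TRUST_ACCOUNT: "WORKSTATION_TRUST_ACCOUNT",
    UF_SERVER_TRUST_ACCOUNT: "SERVER_TRUST_ACCOUNT",
    UF_TRUSTED_FOR_DELEGATION: "TRUSTED_FOR_DELEGATION",
}


def decode_uac(value: int) -> List[str]:
    """Names of the flags above that are set in a userAccountControl value."""
    return [name for bit, name in FLAG_NAMES.items() if value & bit]


def is_dc_account(value: int) -> bool:
    """True if the server-trust (domain controller) bit is set."""
    return bool(value & UF_SERVER_TRUST_ACCOUNT)
```

If the source DC’s copy of DC1$ still decodes to WORKSTATION_TRUST_ACCOUNT, the new DC’s change has not replicated back to it yet.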
AD & FRS Replication Phases
After first reboot…
• Outbound connection created.
• AD changes for new DC replicated to source.
– Including UserAccountControl attribute.
– Server (Replication) object.
– Replicated to other DCs.
• Sysvol is populated (policies copied to new DC).
• Sysvol and Netlogon Shares created.
Troubleshooting Missing Sysvol, Netlogon Shares
Outbound connection failed
• Look in Sites and Services or Repadmin
• UserAccountControl still 4096 on source
[Q257338] – Good but …
• Build manual “outbound” connection
• Force KCC to “Check Replication Topology”
• Check UDP traffic if in a remote site.
Missing Sysvol and Netlogon Shares
Create replication “links” manually then force replication:
• Repadmin /add (adds outbound link)
• Repadmin /sync (forces replication)
Can’t create them manually. When replication is fixed, they’ll get created.
Tracking Down a GUID
Problem: GUID referenced in event log. What is it?
Solution: (Q216359)
• LDP – search for the GUID
• Search.vbs in Support tools
Orphaned Object (will kill replication)
• Turn up NTDS diagnostic logging
– Internal processing
– Replication
• Find object (GUID) in event logs
• Delete it via LDP
DCPromo Improvements in Windows .NET
Install From Media (IFM)
Source Replica AD from Media in DCPromo
• GCs or DCs (Replica only).
• No initial replication from a DC.
– Faster (no searching for a DC).
– Less network impact (No full sync on the WAN).
– Easy branch office installation.
• After initial load, replicates changes.
• Network connectivity still required.
• Unattended Answer File Support:
– ReplicateFromMedia
– ReplicationSourcePath
Install From Media (IFM)
Unattended Answer File Support
• ReplicateFromMedia
• ReplicationSourcePath
Media must be local drive.
Media useful life < 60 days.
How? Use Backup Files/Media
• Create first DC in domain.
• Back up DC.
• Restore to Media (local disk, CD, …).
• C:\>dcpromo /adv
• Wizard produces an additional screen…
DCPromo Answer File
See Q223757
[Unattended]
Unattendmode=fullunattended
[DCINSTALL]
UserName=administrator
Password=Password3
UserDomain=corp.net
DatabasePath=c:\windows\ntds
LogPath=c:\windows\ntds
SYSVOLPath=c:\windows\sysvol
SafeModeAdminPassword=Password2
CriticalReplicationOnly
SiteName=Seattle
ReplicaOrNewDomain=Replica
ReplicaDomainDNSName=corp.net
; Leave ReplicationSourceDC blank for IFM
ReplicationSourceDC=
ReplicateFromMedia=yes
ReplicationSourcePath=e:\DSrestore
RebootOnSuccess=yes
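The IFM-specific keys can be generated and sanity-checked with Python’s standard configparser. A sketch: the function names are hypothetical, the checks encode only the rules on these slides (ReplicationSourceDC blank, source path present and on a local drive), and credentials, database/log/Sysvol paths and the safe-mode password are omitted to keep it short.

```python
import configparser
import io
from typing import List


def ifm_answer_file(domain: str, site: str, media_path: str) -> str:
    """Build answer-file text for an Install From Media replica DC (sketch only)."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str  # preserve key case; configparser lowercases keys by default
    cfg["Unattended"] = {"UnattendMode": "FullUnattended"}
    cfg["DCInstall"] = {
        "ReplicaOrNewDomain": "Replica",
        "ReplicaDomainDNSName": domain,
        "SiteName": site,
        "ReplicationSourceDC": "",            # leave blank for IFM
        "ReplicateFromMedia": "yes",
        "ReplicationSourcePath": media_path,  # restored backup on a local drive
        "RebootOnSuccess": "yes",
    }
    buf = io.StringIO()
    cfg.write(buf, space_around_delimiters=False)
    return buf.getvalue()


def ifm_problems(text: str) -> List[str]:
    """Flag IFM settings that contradict the rules on these slides."""
    cfg = configparser.ConfigParser(interpolation=None, allow_no_value=True)
    cfg.read_string(text)  # default optionxform: key lookups are case-insensitive
    section = next((s for s in cfg.sections() if s.lower() == "dcinstall"), None)
    if section is None:
        return ["no [DCInstall] section"]
    dc = cfg[section]
    if (dc.get("ReplicateFromMedia") or "").lower() != "yes":
        return []  # not an IFM install; nothing to check
    problems = []
    if dc.get("ReplicationSourceDC"):
        problems.append("ReplicationSourceDC should be blank for IFM")
    path = dc.get("ReplicationSourcePath") or ""
    if not path:
        problems.append("ReplicationSourcePath is missing")
    elif path.startswith("\\\\"):
        problems.append("ReplicationSourcePath must be on a local drive, not a UNC share")
    return problems
```

`allow_no_value=True` lets the checker read files like the sample above, where a key such as CriticalReplicationOnly appears without a value.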
File Replication Service (FRS) Basics
FRS Background
File Replication Service
• Replicates file system portion of policy
• Optional replication engine for DFS
Concepts
Challenges
• Journal wraps
• Staging File backlog
• Reconciliation / Morphed Directories
Concepts
Objects in DS
• Members, Subscribers, Conn. objects, filters
• Depends on AD replication
• Determines partners and schedule
NTFS USN Journal
• Used by FRS to track changes to NTFS volumes
Staging File and Directory
• Rename safe
• Compression support
Database
• Record of incoming, outgoing & existing files
File Replication Service (FRS)
Replaces NT 3.X\4.0 LMREPL service
Replicates SYSTEM Policy, Group Policy, DFS
• Group policy templates
• Ntconfig.pol & logon scripts for down-level clients
– NETLOGON Share
• DFS share contents
Multi-threaded replication engine
• Replicate different files to different computers simultaneously.
Terminology
• Computer A and B replicate DFS+SYSVOL
• B is computer A’s outbound partner
• A is B’s inbound partner.
• A is B’s “upstream” partner
• Changes flow “downstream” to B.
[Figure: Computer A (upstream; B’s inbound partner) replicates to Computer B (downstream; A’s outbound partner)]
Basic Operation
[Figure: FRS basic operation between DC1 and DC2]
1. Change created on DC1 (GPO)
2. GPO
3. Temp file moved to staging directory
4. Notify replication partners (replicas) of changes
Partners (DC2) pull changes from DC1
File and Folder Filters
Excluded from FRS Replication:
• Computer specific EFS files/folders
• File names beginning with ~
• Files with .bak or .tmp extensions
• NTFS Mount Points
• Reparse points
Configurable for DFS shares
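When a file you expect to replicate never shows up on partners, first check whether the default filter skips it. A sketch covering only the name-based exclusions listed above; the function name is hypothetical, and EFS files, mount points and reparse points are not modelled because they cannot be recognised from a name.

```python
from pathlib import PureWindowsPath

# Name-based default exclusions from the slide; matching is case-insensitive.
EXCLUDED_PREFIXES = ("~",)
EXCLUDED_EXTENSIONS = (".bak", ".tmp")


def skipped_by_default_filter(path: str) -> bool:
    """True if the default FRS file filter would skip this file, judging by its name alone.

    EFS-encrypted files, NTFS mount points and reparse points are also excluded,
    but that cannot be decided from a name, so they are not modelled here.
    """
    name = PureWindowsPath(path).name.lower()
    return name.startswith(EXCLUDED_PREFIXES) or name.endswith(EXCLUDED_EXTENSIONS)
```

Only the last path component is tested, so a file such as notes.tmp.txt is not caught by the .tmp rule.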
The Replication Process
[Figure: on DC1, GPO change; AD object version updated; paths \winnt\sysvol\sysvol\compaq.com\policies, \winnt\sysvol\staging\domain and \winnt\sysvol\staging areas\compaq.com; notify partners]
The Replication Process
[Figure: DC2 pulls from DC1; paths \winnt\sysvol\sysvol\DO_NOT_REMOVE_ntfrs_PreInstall_Domain and \winnt\sysvol\sysvol\compaq.com\policies; Sysvol version of the GPO’s GPT.ini updated]
FRS Replication
Observe File Replication Process
• Edit a group policy – modify and save it.
• Copy of changed file goes to the staging and staging areas directories.
• Copied to staging/staging areas directories on other DCs.
• Moved to sysvol\sysvol directory on the DC.
• Group policy file is updated.
Distributed File System (DFS)
DFS Basics
Domain-based (Win2K) vs Standalone (NT)
Root
• Must be on a DC.
• Contains PKT.
• DFS service.
Replica
• PKT from DC, stored locally.
• DC or Member Server.
FRS Replicates Data between DCs
• Member servers DFS replicate data to share via DFS service.
Site Aware (clients locate “closest” DFS Replica)
The DFS Replication Process
[Figure: DFS replication process: DC1 root (DFS service); replicas on DC2, SVR1 and SVR2, each holding the data; FRS]
DFS Troubleshooting
Symptom: Shared folders not in sync.
Make sure the DFS service is started on all servers and DCs.
Make sure AD Replication is working.
Make sure FRS is working.
DFSUtil.exe.
Watch for applications that keep files open.
• Anti-virus.
• Defragmenters.
FRS Troubleshooting Techniques
Basics
Remember…
• You MUST install latest service pack and hot fix.
– Post SP2 (SP3) Hot fix Q307319
– Don’t go any further until this is installed.
• FRS is “multi-master”: it replicates changes (and problems) quickly. Turn off the FRS service to get control.
• FRS depends on AD Replication, which depends on DNS.
Diagnostic Tools
Event Viewer: FRS log, DS Log
NTFRSutl.exe
• outlog – outbound logs
• inlog – inbound logs
• ds – directory service
NTFRSxxx.log in \winnt\debug
NTFRS Health Check utility
• HP, Microsoft
Netdiag, DCDiag
AD replication tools
FRS Replication
What happens if it breaks?
• Changes not replicated to all DCs, resulting in inconsistent Sysvol contents
• Group policy gets out of sync and may not get applied.
– GPOTool: Version mismatch
• Logon scripts don’t get applied.
• DFS shares out of sync.
FRS Replication
How to tell if it’s broken
• Events in FRS log
– Event 1000, 1001 in app log every five minutes.
• Files backed up in staging areas
– Get size of staging directories (MB).
– Get date of oldest file (how long it has been broken).
• Group Policy not applied (new changes)
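The two staging-area measurements (total size and age of the oldest file) are easy to script. A sketch with hypothetical function names: the staging path you pass in is an assumption (on a default Win2K DC, something like C:\winnt\sysvol\staging\domain), and each file’s modification time stands in for when it was staged. A healthy staging area is empty.

```python
import time
from pathlib import Path
from typing import Dict, Iterable, Optional, Tuple


def summarize_staging(entries: Iterable[Tuple[int, float]], now: float) -> Dict[str, float]:
    """entries: one (size_bytes, mtime_epoch_seconds) pair per staged file."""
    entries = list(entries)
    if not entries:
        return {"files": 0, "size_mb": 0.0, "oldest_days": 0.0}
    size_mb = sum(size for size, _ in entries) / (1024 * 1024)
    oldest = min(mtime for _, mtime in entries)
    return {
        "files": len(entries),
        "size_mb": round(size_mb, 2),
        # Age of the oldest staged file: roughly how long replication has been backed up.
        "oldest_days": round((now - oldest) / 86400, 1),
    }


def staging_backlog(staging_dir: str, now: Optional[float] = None) -> Dict[str, float]:
    """Walk a staging directory and summarize what is sitting in it."""
    stats = [p.stat() for p in Path(staging_dir).rglob("*") if p.is_file()]
    current = time.time() if now is None else now
    return summarize_staging(((s.st_size, s.st_mtime) for s in stats), current)
```

Keeping the arithmetic in `summarize_staging` separate from the directory walk makes it easy to feed in numbers gathered another way, for example from a remote DC’s share.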
Replication Problems
Ensure DNS is working.
• DNS Lookup Failures in events (description).
• Ping, Nslookup to resolve names.
– Domain name
– DC, Server names
Ensure AD Replication is working.
• Create New Objects and see if they replicate.
• Repadmin /showreps and /showconn
• DS Event Log
• DCDiag
Replication Problems
Staging Areas should have no files
• Common FRS problem.
• Check size of dir, date of files.
Ensure FRS is working.
• Create text file on each DC, named for the DC.
• Put it in \winnt\sysvol\sysvol\<domain name>.
• Every DC should have a copy of every other DC’s text file.
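Once the SYSVOL folder on each DC has been listed, the text-file test can be scored mechanically. A minimal sketch, assuming each DC's test file is named `<DCNAME>.txt` (a hypothetical convention) and that the file names per DC have already been collected:

```python
def missing_canaries(seen_by_dc):
    """Report which DCs are missing which test files.

    seen_by_dc maps a DC name to the set of file names found in its
    \\winnt\\sysvol\\sysvol\\<domain name> folder. Each DC is assumed to have
    created a test file named "<DCNAME>.txt".
    """
    expected = {f"{dc}.txt" for dc in seen_by_dc}
    report = {}
    for dc, files in seen_by_dc.items():
        missing = sorted(expected - set(files))
        if missing:
            report[dc] = missing
    return report


seen = {
    "DC1": {"DC1.txt", "DC2.txt", "DC3.txt"},
    "DC2": {"DC1.txt", "DC2.txt"},
    "DC3": {"DC3.txt"},
}
print(missing_canaries(seen))  # {'DC2': ['DC3.txt'], 'DC3': ['DC1.txt', 'DC2.txt']}
```

Read the result in both directions. A DC missing many files points at its inbound replication; a file missing on many DCs points at outbound replication from the DC that created it.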
Replication Problems
FRS Event Log
• 13508 – Normal…but watch them
• 13509 – success after having 13508s
• 13514 – When Sysvol share not created “FRS preventing
computer from becoming a DC”
• 13553, 13554 – FRS successfully added computer to replica set
(DCPromo successful)
• 13557 – Duplicate Connection Objects
• 13522 – Staging area full Q264822
• Lots of KB Articles: Search for “FRS and Event”
Interpreting the Logs NTFRS_000x.log
\WINNT\DEBUG
Identify errors, warning messages and milestone events in the log
files
Very difficult to interpret
NTFRSutl.exe
Ntfrsutl inlog = Lists inbound log
Ntfrsutl outlog = Lists outbound log
Ntfrsutl sets = Lists replica sets
Ntfrsutl DS = FRS’s view of the DS
Can execute remotely:
Ntfrsutl sets DC1
Group Policy Troubleshooting
Group Policy Troubleshooting Basics
Policy isn’t getting applied
• Set something easy – Admin Templates
– User Settings: Log off/on
– Computer Settings: Reboot
• Client-side extensions act as separate policies – debug separately from Admin
Templates
– Folder Redirection
– Scripts
– Disk Quotas
– Security
– IE Branding
– EFS Recovery
– IPSec
– Application Management
Group Policy Troubleshooting Basics
Policy applied, but settings not effective.
• Userenv.log (verbose) Q221833
• Set Diagnostic logging Q186454
HKLM\Software\Microsoft\Windows NT\CurrentVersion\Diagnostics
Value: RunDiagnosticLoggingGroupPolicy
Value Type: REG_DWORD
Value Data: 3
(Values 0-5; 0 = off)
– Change One setting in GPO
– Logoff/on or reboot
– Verbose info in Application log
– Lists all registry settings applied to user
– Turn it off afterward – fills the event log fast!
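Editing this key by hand on each machine is error-prone. A minimal sketch that builds a .reg file for the Q186454 setting above; the helper name `reg_dword_file` is hypothetical, while the key, value name and data come from the slide. Calling it with data 0 produces the matching file to turn logging off again.

```python
def reg_dword_file(key_path, values):
    """Return the text of a .reg file that sets REG_DWORD values under key_path."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key_path}]"]
    for name, data in values.items():
        # REG_DWORD data is written as eight lowercase hex digits.
        lines.append(f'"{name}"=dword:{data:08x}')
    return "\r\n".join(lines) + "\r\n"


DIAG_KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Diagnostics"

# Level 3 turns on verbose Group Policy diagnostic logging; 0 turns it off.
text = reg_dword_file(DIAG_KEY, {"RunDiagnosticLoggingGroupPolicy": 3})
print(text)
```

To save it, something like `open("gp-diag.reg", "w", encoding="utf-16", newline="").write(text)` should work: regedit's own exports in this format are UTF-16, and `newline=""` keeps the explicit CRLF line endings intact.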
Gpresult.exe
Resource Kit command-line utility.
Reports applied policy for user, computer.
• DN
• Security groups
Verbose mode – gpresult /v
• Registry settings
• Computer: Client-side extensions.
WATCH:
• Logon server.
• Cached policy on client may mask solution.
• Refresh policy – make sure it’s applied.
GPOtool
Resource Kit command-line utility.
Run on DC only.
• Version Comparison: AD vs. Sysvol.
– AD version set immediately on change.
– Sysvol version set after FRS Replication.
• Friendly name/GUID association
Policy {08FAB736-9628-41D5-B5A8-37A0F98D7E43}
Policy OK
Details:
------------------------------------------------------------
DC: Qtest-DC2.qtest.cpqcorp.net
Friendly name: Folder Redirection Policy
Solving Version Mismatch
Small mismatch is normal.
• After change until FRS Replication completes.
• Be patient – see if it resolves.
Big mismatch is bad.
• Prevents application of policy.
• Unreplicated changes.
• Manually set the Sysvol (FRS) version equal to the AD version.
– %windir%\sysvol\sysvol\<domain>\policies\{guid}\gpt.ini
– Will lose changes.
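GPOtool does this comparison for you, but the underlying check is simple. A minimal sketch, assuming you already have the AD version (the versionNumber attribute of the policy's groupPolicyContainer object) and the text of the matching gpt.ini. The function names are hypothetical. A GPO version packs the user-settings version into the high 16 bits and the computer-settings version into the low 16 bits, so the sketch compares each half separately.

```python
import configparser


def split_version(version):
    """Split a GPO version into (user version, computer version)."""
    return version >> 16, version & 0xFFFF


def sysvol_version(gpt_ini_text):
    """Read the Version value from the text of a gpt.ini file."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(gpt_ini_text)
    return parser.getint("General", "Version")


def version_gap(ad_version, gpt_ini_text):
    """Return (user gap, computer gap) as AD version minus Sysvol version."""
    ad_user, ad_computer = split_version(ad_version)
    sv_user, sv_computer = split_version(sysvol_version(gpt_ini_text))
    return ad_user - sv_user, ad_computer - sv_computer


gpt_ini = "[General]\nVersion=65539\n"   # user version 1, computer version 3
print(version_gap(0x20005, gpt_ini))       # (1, 2): Sysvol is behind AD
```

A gap that drops to (0, 0) once FRS catches up is the normal small mismatch. A gap that persists is the big mismatch described above.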
Resetting Default Domain Policy or Default DC Policy
These policies always have the same GUIDs.
• Default Domain: {31B2F340-016D-11D2-945F-00C04FB984F9}
• Default DC: {6AC1786C-016F-11D2-945F-00C04FB984F9}
If changes have made a mess, restore the defaults.
To restore security defaults only, import the BasicDC.inf template (Q258595).
If settings are hosed, copy an original version of the policy folder to
winnt\sysvol\sysvol\<domain>\policies.
• Copying policies is supported only for these two cases.
• Other policies will have different GUIDs.
• You can’t copy other policies from one forest to another for debugging.
How to copy the Default Domain and Default DC policy
1. Get a copy of a clean, default policy folder.
– Restore the policy folder (GUID) from backup.
– Or create a new domain and copy the GUID folder from that machine.
– Don’t zip it.
2. Delete the existing policy.
3. Wait for replication.
4. Copy the new policy folder to winnt\sysvol\sysvol\<domain>\policies.
5. Wait for replication.
6. Run GPOtool to make sure it shows up on all DCs.
Unable to Edit Group Policy
Group Policy is edited on the PDC by default.
If the PDC is not available:
• Dialog: Change on any DC, current DC or not.
• Error: Unable to contact Domain (no DC).
Solution: Transfer or seize the PDC role to another DC.
You can set policy to NOT use the PDC… Don’t!
Using Userenv.log to solve Group Policy problems
Turn on Verbose Logging Q221833
Interpreting Group Policy information in Userenv.log
Debugging Logon Scripts
(script doesn’t apply)
Configure the script via the Group Policy snap-in.
Make sure policy is applied.
• Set a desktop setting.
• Use Gpresult /v.
• Enable verbose logging for Userenv.log.
Turn on “Run logon scripts visible.”
Create a simple logon script as a .bat file to make sure it’s not the script failing.
Example: Using Userenv.log to find script errors.
Can’t find FSMO Role Holder
Problem: Operation trying to contact a FSMO role holder – PDC
Emulator or…?
• Can ping by name – seems to be ok
• Operation can’t find it
Solution:
• Find out who has that role:
netdom query fsmo
(returns a quick list)
• Transfer the role to a local DC
Group Policy Refresh Anomaly
Users complain of an intermittent 5-25 second “hang” in any
application (Outlook, Word, 3rd-party apps). Keystrokes are
buffered, so they can continue to work.
Noticed a direct correlation between the 1704 events (GP refresh)
and the “hang”.
Changing the refresh interval via Group Policy changed the frequency of
the “hang”.
Group Policy Refresh Anomaly
Cause: SceCli reapplies Group Policy every 16 hrs (default) even if no GPO changes
have occurred (every 5 minutes on DCs).
• Broadcasts WM_SETTINGCHANGE to all top-level windows
• Wakes up sleeping processes, causing massive paging in/out of memory and
the resulting hangs
• More pronounced on “slower” computers
Solution: Configure the policy refresh interval in Group Policy so refresh occurs
every 12 hrs, at midnight/noon, so users don’t notice it.
Account Lockout
Background
Finding locked out user accounts
Client Bugs and Fixes
Server Bugs and Fixes
Resolution and Futures
Lockout Reasons & Options
Prevents spoofing or hijacking of accounts
Optional event logging in Audit Policy
Account Lockout Options
• Timed lockout
– Account enabled after admin defined time
• Hard lockout
– Account disabled until reset by admin
• Lockout policy defined in group policy
– Single lockout and password policy per domain
– Location: default domain policy
Account Lockout on DCs
Each DC records the number of bad password attempts
BDCs check the PDC for the latest password
All bad password attempts are seen by the PDC
• PDC is always first to lock out an account
• PDC urgently replicates lockout when threshold is reached
• Bad password attempts are not replicated by DCs
BadPasswordCount reset to 0 on first good password
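The per-account behavior the PDC enforces can be modeled in a few lines. This is a toy sketch that assumes one counter per account and times in minutes. The class and function names are hypothetical, and the real policy's "reset account lockout counter after" observation window is left out for brevity. In the real policy a hard lockout is a lockout duration of 0; the sketch represents it as None.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LockoutPolicy:
    threshold: int               # bad attempts that trigger lockout
    duration_min: Optional[int]  # None = hard lockout (duration 0 in the real policy)


@dataclass
class AccountState:
    bad_count: int = 0
    locked_at: Optional[float] = None  # minute the lockout started


def logon_attempt(state, policy, good_password, now_min):
    """Apply one logon attempt as the PDC sees it; return True if it succeeds."""
    if state.locked_at is not None:
        expired = (policy.duration_min is not None
                   and now_min - state.locked_at >= policy.duration_min)
        if not expired:
            return False           # locked out, even with the right password
        state.locked_at = None     # timed lockout has run out
        state.bad_count = 0
    if good_password:
        state.bad_count = 0        # BadPasswordCount reset on first good password
        return True
    state.bad_count += 1
    if state.bad_count >= policy.threshold:
        state.locked_at = now_min  # the PDC would replicate this urgently
    return False


policy = LockoutPolicy(threshold=3, duration_min=30)
account = AccountState()
for minute in (0, 1, 2):
    logon_attempt(account, policy, good_password=False, now_min=minute)
print(logon_attempt(account, policy, True, 10))  # False: still locked out
print(logon_attempt(account, policy, True, 40))  # True: 30-minute lockout expired
```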
PDC chaining operations
If BDC fails authentication with:
• STATUS_WRONG_PASSWORD
• STATUS_PASSWORD_EXPIRED
• STATUS_PASSWORD_MUST_CHANGE
• STATUS_ACCOUNT_LOCKED_OUT
Referred to as “BadPasswordStatus”
BDC chains authentication to PDC
• Returns the PDC status if it is success or one of the statuses listed above
• Otherwise, ignores the PDC status and uses the local status
Exceptions to PDC chaining
• AvoidPDCOnWan enabled and PDC in remote site (Q225511)
• 10 “BadPasswordStatus” events logged in 10 minutes
– NegativeCache enhancement Q263821
– Cache reset after good password entered
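The chaining rules above amount to a small decision function. This sketch works under stated assumptions: status codes are plain strings, `ask_pdc` stands in for the chained authentication call, and the negative cache (Q263821) is reduced to a single boolean rather than a count of 10 events in 10 minutes. All names are hypothetical.

```python
BAD_PASSWORD_STATUS = {
    "STATUS_WRONG_PASSWORD",
    "STATUS_PASSWORD_EXPIRED",
    "STATUS_PASSWORD_MUST_CHANGE",
    "STATUS_ACCOUNT_LOCKED_OUT",
}


def bdc_authenticate(local_status, ask_pdc, avoid_pdc_on_wan=False,
                     pdc_is_remote=False, negative_cache_hit=False):
    """Return the status a BDC reports after applying the PDC chaining rules.

    ask_pdc is a callable that performs the chained authentication and
    returns the PDC's status; it is only called when chaining applies.
    """
    if local_status not in BAD_PASSWORD_STATUS:
        return local_status          # success or an unrelated failure: no chaining
    if avoid_pdc_on_wan and pdc_is_remote:
        return local_status          # exception: AvoidPDCOnWan, PDC in a remote site
    if negative_cache_hit:
        return local_status          # exception: negative cache is active
    pdc_status = ask_pdc()
    if pdc_status == "STATUS_SUCCESS" or pdc_status in BAD_PASSWORD_STATUS:
        return pdc_status
    return local_status              # ignore any other PDC result


# The password was just changed on the PDC, so the chained logon succeeds.
print(bdc_authenticate("STATUS_WRONG_PASSWORD", lambda: "STATUS_SUCCESS"))
```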
Troubleshooting account lockouts
Your goal: Answer the 4 W’s
• Who, Where, When and Why
Environment setup
• Enable Auditing in domain policy
– Account Logon Events – Failure
– Account Management – Success
– Logon Events – Failure
– Security event log on DCs: 10K events, overwrite as needed
• Enable Netlogon logging (NTLM clients)
– NLTEST /DBFLAG:2080FFFF (no reboot)
• Enable Kerberos logging
– Q262177: Kerberos logging (Kerberos clients)
Account Lockout – Where
DC Resources
• NTLM Clients
– Search DC & CLIENT NETLOGON.LOG for lockouts
– 0xC000006A = bad passwords
– 0xC0000234 = account lockout
• NTLM + Kerberos Clients
– Search DS event logs
– See Q230254, Q299475, Q273499 and Q301677 for descriptions
– 644: NTLM + Kerberos lockout event
– 675: Kerberos bad password
– 681: NTLM bad password
– 529: Failed logon
– 531: Account disabled
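Searching NETLOGON.LOG by hand for the two status codes gets tedious across many DCs and clients. A minimal sketch that tallies bad-password and lockout results per user and client machine. It assumes SamLogon lines shaped like `... logon of DOMAIN\user from CLIENT [(via SERVER)] Returns 0xC000006A`. Check that format against your own logs before relying on it, and treat the sample lines as illustrative only. The names are hypothetical.

```python
import re
from collections import Counter

# NETLOGON.LOG status codes from the slide.
STATUS_NAMES = {"0xC000006A": "bad password", "0xC0000234": "account lockout"}

# Assumed SamLogon line shape; verify against your own logs.
SAMLOGON_RE = re.compile(
    r"logon of (?P<user>\S+) from (?P<client>\S+)"
    r"(?: \(via (?P<via>\S+)\))? Returns (?P<status>0x[0-9A-Fa-f]{8})"
)


def lockout_sources(lines):
    """Count bad-password and lockout results per (user, client machine, result)."""
    counts = Counter()
    for line in lines:
        match = SAMLOGON_RE.search(line)
        if not match:
            continue
        status = "0x" + match.group("status")[2:].upper()
        if status in STATUS_NAMES:
            key = (match.group("user").lower(), match.group("client").upper(),
                   STATUS_NAMES[status])
            counts[key] += 1
    return counts


sample = [  # illustrative lines, not real log output
    r"03/03 10:17:39 [LOGON] CORP: SamLogon: Network logon of CORP\jdoe from WKS01 Returns 0xC000006A",
    r"03/03 10:17:40 [LOGON] CORP: SamLogon: Network logon of CORP\jdoe from WKS01 Returns 0xC000006A",
    r"03/03 10:17:41 [LOGON] CORP: SamLogon: Network logon of CORP\jdoe from WKS01 (via SRV2) Returns 0xC0000234",
    r"03/03 10:18:02 [LOGON] CORP: SamLogon: Network logon of CORP\asmith from WKS07 Returns 0x0",
]
for (user, client, result), n in lockout_sources(sample).most_common():
    print(f"{user} from {client}: {n} x {result}")
```

The client column answers the "Where" question directly: it names the machine to inspect for stale credentials.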
Tools
• EVENTCOMB
• AL.EXE
• NETMON.EXE
EVENTCOMB
AL.EXE
Account Lockout: Why
Attack, “Pilot Error” or Bug
• Wrong password entered, misconfigured service account
Scenario
• Account type: user, computer or service account
• Lockout trigger? (logon, drive access, following password change)
Drill Down: Look at TOD (time of day), pattern & frequency
• Process-related lockouts
– Structured pattern
– Logged when users not present
– Look for: common services, applications, client configuration
• User-related lockouts
– Random pattern
– Fewer events logged
– Look at: shortcuts, mapped drives, logon scripts, applications
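Whether a lockout is process-related or user-related comes down to how regular the failures are. One possible heuristic, which is this sketch's own assumption and does not come from the slide, is the coefficient of variation of the gaps between failure events: near zero for a service retrying on a timer, large for a person. The 0.25 threshold is arbitrary and should be tuned against real data. The function name is hypothetical.

```python
from statistics import mean, pstdev


def lockout_pattern(event_times_min, cv_threshold=0.25):
    """Classify failed-logon times (minutes) as 'process' (regular) or 'user' (irregular).

    Heuristic: coefficient of variation (std dev / mean) of the gaps between events.
    """
    times = sorted(event_times_min)
    if len(times) < 3:
        return "insufficient data"
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    average = mean(gaps)
    if average == 0:
        return "process"  # a burst of simultaneous failures looks automated
    cv = pstdev(gaps) / average
    return "process" if cv <= cv_threshold else "user"


print(lockout_pattern([0, 30, 60, 90, 120]))   # service retrying every 30 minutes
print(lockout_pattern([0, 2, 47, 51, 190]))    # irregular, person-driven failures
```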
Account Lockout – Client
Win9X
• Q278558: Access denied to a mapped drive after disconnect
• Q272594: Client can't log on after logoff without reboot
• Q293793: VREDIR loses file tracking structures
• Q271496: One unsuccessful logon attempt triggers lockout (1:3)
– Net use + dsgetdc + logon attempt.
• Q266772: Logon fails if Unicode string password passed to NTLM SSPI
DS Client on Windows 95, Windows 98, Windows 98 Second Edition
• DS Client MUST be installed before any hotfixes!
– Q301344, Q283261
– DS Client lets Win98 account lockout fixes work on Win95
Win2K
• Q275508: User locked out when accessing home dir after changing password
• Hotfix or SP2
Windows XP
• None
Account Lockout: Server Fixes
Read server side KB articles
• Q287639: Win9x Clients Locked Out after unlock
– MSV1 package does password check against BDC with old password during 2nd
phase of logon
• Q278299: Bad p/w count not reset to 0 (ntlm)
– Original hotfix had regression. Confirm latest version deployed.
• Q263821: Bad p/w count not reset to 0 (kerb)
• Q292573: DSA.MSC and ADSI may not use same DC to WinSERaid:16662
(post SP2 hotfix)
Resolution
• Windows 2000 DCs: Install SP2 + Q314282
– Same QFE as the lingering object fix and other good DC fixes
• Service Pack 3
PDC FSMO Load Reduction
Windows 2000 domains are much larger than their NT 4 predecessors
• e.g., > 50,000 clients
NT 4 and Win9X clients still deployed and target PDC only for updates
Windows 2000 / XP clients use Windows 2000 DCs in mixed mode domains
(Q284937)
Older applications select PDC only rather than any DC
Applications may enumerate whole domain (NT 4 usrmgr, srvmgr)
Result: PDC gets more load
Symptoms of Overload
High CPU utilization for long period
• Greater than 70%
• High average disk queue
– Disk queue > number spindles
• Timeout of requests
– Password changes
Steps to Optimize PDC
Optimize hardware and software
Hide PDC from DNS clients
Implement WINS optimizations
Block down-level enumeration
PDC in dummy site
Optimize Hardware & Software
Run Windows 2000 Advanced Server with the /3GB switch
• Enables ESE cache of 1.5 GB
4-processor server is optimal
2 GB RAM
Disk
• RAID 1 set for OS and Page File
• RAID 1 set for Log Files
• RAID 0+1 for NTDS.DIT and sysvol
Run only core DC services
Hiding Techniques (DNS)
Lower PDC SRV Priority
• Reduce chance of DS aware clients selecting PDC before other DCs
• HKLM\System\CurrentControlSet\Services\Netlogon\Parameters\LdapSrvPriority=1000
• Data type: Reg_DWORD
PDC only Site
• Clients will use it only as last resort
• Create a site-link to real site
Disable AutoSiteCoverage on PDC
• HKLM\System\CurrentControlSet\Services\Netlogon\Parameters\AutoSiteCoverage=0
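DNS SRV selection follows RFC 2782. Clients use the records with the lowest priority value, spread load among those records by weight, and fall back to higher values only when those targets fail. DCs register with priority 0 and weight 100 by default, so raising the PDC's LdapSrvPriority to 1000 puts it behind every other DC. Below is a simplified sketch of that selection. It has no failover loop, and `pick_dc` is a hypothetical name.

```python
import random


def pick_dc(srv_records, rng=random):
    """Pick a target from (target, priority, weight) SRV records, RFC 2782 style.

    Only records with the lowest priority value are candidates; among them the
    choice is random, weighted by weight. Failover to the next priority is not modeled.
    """
    best = min(priority for _target, priority, _weight in srv_records)
    candidates = [(t, w) for t, p, w in srv_records if p == best]
    total = sum(w for _t, w in candidates)
    if total == 0:
        return rng.choice(candidates)[0]  # simplification when all weights are 0
    point = rng.uniform(0, total)
    running = 0
    for target, weight in candidates:
        running += weight
        if point <= running:
            return target
    return candidates[-1][0]  # guard against floating-point edge cases


records = [
    ("dc1.example.com", 0, 100),     # default DC registration: priority 0, weight 100
    ("dc2.example.com", 0, 100),
    ("pdc.example.com", 1000, 100),  # PDC after LdapSrvPriority=1000
]
print(pick_dc(records))
```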
Hiding Techniques (WINS)
Down-level clients locate DCs through 1C queries
WINS always adds PDC first in 1C list
Remove PDC from top of list (SP2) Q269424
– HKLM\System\CCS\Services\WINS\Parameters
– Value name: Add1Bto1CQueries
– Data type: Reg_DWORD
– Value data: 0 = disabled, 1 = Enabled (default)
Randomize 1C list for general load balancing
– HKLM\System\CCS\Services\WINS\Parameters
– Value name: Randomize1cList
– Data type: Reg_DWORD
– Value data: 0 = disabled, 1 = Enabled
– Q231305 (NT4 SP4 and later)
Block Enumeration
Old (non-DS-enabled) applications often call SAM APIs to
enumerate the entire domain
Hard to control
Block unauthorized users from seeing more than 100 objects per
call
• New access control right determines access
• HKLM\System\CCS\Control\Lsa\SamDoExtendedEnumerationAccessCheck=1
• Q268339
Misc. – Server Applications
Server-based applications can create frequent changes in the directory
• Agent-based systems
– Create and delete accounts
– Grant accounts rights in the domain
Changes create replication
• AD replication for frequent group changes
• FRS replication for policy changes
Apply SMS hot fixes
• Q311127, Q278345
• Read articles, configuration necessary
Distributed Link Tracking
Purpose
• Used to track moves of linked files across volumes and servers (shell
shortcuts)
• Uses AD objects to track files and volumes
Objects stored in DS
• linkTrackVolEntry object for each NTFS volume in the domain
• linkTrackOMTEntry created for each linked item that is moved
• Clients query service when a shell shortcut or OLE link can’t be resolved
Clients refresh links every 30 days
DCs scavenge objects older than 90 days
Distributed Link Tracking
DLT is an optional service
• Enabled by default
Typically not included in DS capacity planning
Best Practices
• Disable on all DCs
– Reduces AD replication traffic
– Reduces AD database size
• Use Group Policy to disable DLT server service on DCs
• Remove objects from DS
– Use staggered approach
• Q312403
DC/GC Promotion Consideration
DC Promotion / Demotion
Process to cleanup after failed promotion
GC Promotion
GC Demotion
DC Promotion / Demotion
Create proper sites beforehand
Failed promotion or removing server
• Manually clean out metadata from any failed attempt
– When replacing a failed DC
– When DCPROMO has failed
• To clean metadata
– Use NTDSUTIL
– Remove FRS member / subscriber objects
– Remove machine account in domain
• Allow replication to all DCs before promoting again
GC Promotion
First GC in site may go online before all partitions are replicated
• Default: GC will advertise after all partitions in site replicate
• Exchange may use GC before ready
• Mail may bounce
Best Practice
• Stop Netlogon
• Mark DC as GC
• Use repadmin to monitor success
• Start Netlogon once all NCs have replicated
SP3 will wait for all partitions to replicate before advertising
GC Demotion
GC removal requires time for object removal
The KCC removes 500 objects per default 15 min cycle
Best Practice
• Monitor for event 1069 to record progress
• Forced GC removal when needed (Q297935)
– Remove each partition with repadmin
– repadmin /delete DC=globalit,DC=unity,DC=com %destgc% /nosource
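At the slide's rate of 500 objects per 15-minute cycle, the KCC removes about 48,000 objects a day. A partition with 1,000,000 objects therefore takes roughly 21 days to disappear from a GC, which is why forced removal is sometimes needed. A small estimator that uses the slide's default numbers (hypothetical function name):

```python
import math


def gc_removal_estimate(objects, per_cycle=500, cycle_minutes=15):
    """Return (KCC cycles, days) needed to remove a partition's objects from a GC."""
    cycles = math.ceil(objects / per_cycle)
    days = cycles * cycle_minutes / (60 * 24)
    return cycles, days


cycles, days = gc_removal_estimate(1_000_000)
print(f"{cycles} cycles, about {days:.1f} days")  # 2000 cycles, about 20.8 days
```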
Container Inheritable ACEs
ACE that applies to either all objects or objects of a specific class in a container
• Example: Delegate right to reset user passwords in one OU
Security Descriptor propagation copies ACE to all objects
• Makes access check very fast
– All information is on directory object
• Even class-specific ACEs are copied to all objects
– Example: ACE used to delegate right to reset user passwords also copied to computer and
container objects
Increases object size – database size
• Increase proportional to size of subtree
– If set on domain root: Highest impact
– If set on OU: Lower impact (depends on number of objects in OU)
• Low impact if set on schema or configuration container
SD propagation is asynchronous
• Takes time to propagate (e.g., 3 hours in 50,000 user domain)
Container Inheritable ACEs
Best Practices
Don’t add container inheritable ACEs to domain root
Add on OUs as appropriate
• Best Practice Documentation recommends OUs for
– Users
– Groups
– Computers
• Container inheritable ACEs on these OUs have small impact only
Watch SD propagator events
• SD propagation running: 1257 (Level 2)
• SD propagation report (objects touched): 1258 (Level 2)
• SD propagation terminated abnormally: 1262 (Level 0)
Always leave sufficient disk space on database partition
• 20% of database size, at least 500 MB
• Monitor!
Test ACL changes in lab or pilot domain to bracket size increase
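The disk-space rule is a simple maximum. For example, a 10,000 MB database needs 2,000 MB free, while any database under 2,500 MB falls back to the 500 MB floor. A minimal sketch (hypothetical function names):

```python
def required_free_mb(db_size_mb):
    """Free space to keep on the NTDS.DIT partition: 20% of the database, at least 500 MB."""
    return max(db_size_mb / 5, 500)  # 20% is one fifth


def has_enough_space(db_size_mb, free_mb):
    return free_mb >= required_free_mb(db_size_mb)


print(required_free_mb(10_000))  # 2000.0
print(required_free_mb(1_500))   # 500
```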
Container Inheritable ACEs
The Future
Windows .NET will have single-instance store for Security Descriptors
• Objects have links to security descriptors
• If container inheritable ACE changes, only one SD changes
– No impact on disk size
Does not require .NET only forest
• SD propagation happens on local DC
• Transparent to other DCs
• Feature available immediately
Monitor SD prop events after upgrading a DC
• SD propagator will build single instance store after the domain controller boots .NET
for the first time
Database will shrink after OS upgrade
• Need to defragment the database offline to see the change
Forest Recovery
Imagine the unthinkable
• All domain controllers crash and won’t reboot
• Data corruption replicates through the forest
• Schema becomes unavailable
• Somebody made changes to the schema that prevent standard applications
from installing
• Malicious administrator performs irreparable damage to the schema that
replicates through the forest
• You lose your root domain
• You win the lottery
So far, this has never happened
• But you want to be prepared
Forest Recovery
Rolling back in time
[Timeline diagram: backups are taken at regular intervals; after a catastrophic event and identification of its root cause, the forest is restored from a backup taken before the root cause, and changes made since that backup are lost.]
Forest Business Recovery
High Level Steps
Shut down all domain controllers in the forest
In each domain
• Restore one DC from good backup tape
• Re-install OS on all other domain controllers
• Re-promote all other domain controllers
Start with root domain first
Forest Recovery
Shut down all DCs
Restore one DC per domain (off-network)
Disable GC service
Break replication
Seize FSMO roles
Increase RID pool by 100,000
Bring restored DCs back on the network
Enable GC on at least one root DC
Forest Recovery
Re-install OS on all other DCs
Promote all other DCs
Enable GC service as needed
Move FSMOs as needed
Forest Recovery
Detailed steps available very soon in
white paper on microsoft.com
• Best Practice for Recovering your
Active Directory Forest
FRS Concepts revisited
Objects in DS
• Members, Subscribers, Conn. objects, filters
• Depends on AD replication
• Determines partners and schedule
NTFS USN Journal
• Used by FRS to track changes to NTFS volumes
Staging File and Directory
• Rename safe
• Compression support
Database
• Record of incoming, outgoing & existing files
FRS Replication Operation
[Diagram: FRS replication between two servers’ NTFS drives. Originating side: a file is created or modified; FRS learns of the change from the NTFS “USN change journal”, filters out unwanted files, waits 3 s in the aging cache, writes an entry in the FRS ID table, builds a staging file, writes the OB (outbound) log and sends a change order to the partner. Receiving side: requests the change, copies the file to its staging directory, copies it into the pre-install area, renames and moves it to its final location, writes to its inbound and ID logs, and writes to its OB log for other replicas.]
Journal Wraps / Staging backlog
NTFS USN Journal is a fixed-size log of file changes
• FRS Service must run to keep up with these changes
• The last change (∆) recorded in the FRS DB must still exist in the NTFS journal
– If not, FRS cannot know all changes. This is called a ‘journal wrap’.
• Resolution
– Keep Service running (especially during bulk modifications)
– Increase size of USN journal (automatic in SP3 rollup)
Staging File backlog
• Before SP3, staging files stored until all direct partners receive the staged files
– Associated with connections
• Common causes of backlogs:
– Offline downstream partners
– Full SYNCS by Administrators or applications
– Antivirus, disk optimizers, file system policy
• Sharing violations / Move-In problems
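The journal-wrap condition can be illustrated with a toy model: a fixed-size journal that keeps only the most recent records, plus a check that the last change FRS recorded is still present. This is a simplified sketch with hypothetical names. It uses consecutive integer USNs, which real NTFS USNs are not, and it only shows why stopping the service during bulk changes leads to a wrap.

```python
from collections import deque


class ToyUsnJournal:
    """Fixed-size change journal that keeps only the newest `capacity` records.

    Real NTFS USNs are byte offsets, not consecutive integers; consecutive
    integers keep the illustration simple.
    """

    def __init__(self, capacity):
        self.records = deque(maxlen=capacity)  # oldest records fall off the front
        self.next_usn = 1

    def record_change(self, path):
        self.records.append((self.next_usn, path))
        self.next_usn += 1

    def first_usn(self):
        return self.records[0][0] if self.records else self.next_usn


def journal_wrapped(journal, last_usn_in_frs_db):
    """Per the slide's rule: wrapped if FRS's last recorded change is no longer in the journal."""
    return journal.first_usn() > last_usn_in_frs_db


journal = ToyUsnJournal(capacity=4)
for name in ("a.txt", "b.txt", "c.txt"):
    journal.record_change(name)
last_seen = 3                      # FRS processed everything, then the service stopped
for name in ("d.txt", "e.txt"):
    journal.record_change(name)
print(journal_wrapped(journal, last_seen))  # False: USN 3 is still in the journal
for name in ("f.txt", "g.txt"):
    journal.record_change(name)
print(journal_wrapped(journal, last_seen))  # True: USN 3 has been overwritten
```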
Reconciliation & Morphed Directories
Files: Last-writer wins
• All change orders have event times (UTC)
• Event time of CO compared to ID Table
– Event times more than 30 minutes apart: last writer wins
– Event times within 30 minutes: highest version wins
Folders: Last-writer wins
• Conflicting change gets morphed name
– Preserves files associated with directory
– First-writer wins for name conflicts of folders
• Causes
– BURFLAGS abuse
– Conflicting creates on replication failure
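The file reconciliation rule can be written as a comparison between two change orders. Here is a minimal sketch using a hypothetical ChangeOrder type. The final tie-break, used when versions are equal within the 30-minute window, is an assumption and is not stated on the slide.

```python
from dataclasses import dataclass

THIRTY_MINUTES = 30 * 60  # seconds


@dataclass
class ChangeOrder:
    event_time: int  # UTC, seconds since some epoch
    version: int
    origin: str


def file_winner(current, incoming):
    """Choose which change order wins for a file, following the slide's rule."""
    if abs(incoming.event_time - current.event_time) > THIRTY_MINUTES:
        # Far apart in time: last writer wins.
        return incoming if incoming.event_time > current.event_time else current
    if incoming.version != current.version:
        # Close together in time: highest version wins.
        return incoming if incoming.version > current.version else current
    # Same version within the window: tie-break on event time (assumption).
    return incoming if incoming.event_time > current.event_time else current


installed = ChangeOrder(event_time=0, version=5, origin="DC1")
print(file_winner(installed, ChangeOrder(3600, 1, "DC2")).origin)  # DC2: an hour later
print(file_winner(installed, ChangeOrder(600, 1, "DC2")).origin)   # DC1: higher version
```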
FRS Enhancements (Q319473)
QFE roll-up of coming Service Pack 3 changes
Increases NTFS USN journal: 128 MB
Dynamic staging file relocation
LRU staging files deleted: 60 / 90 rule
Staging files for offline partners deleted
SYSTEM = Full Control / NTFS bug
Duplicate changes not sent on wire + event
Office XP (Excel) data deletion fix
Topology Enhancements
DFSGUI from .NET Server
• Runs on XP clients in Windows 2000 domains
• Available on microsoft.com now: Q304718
New topology options
• Full Mesh, Ring, Simple Hub & Spoke
• Custom Topologies
• Connection Tuning
– Enable / disable individual connections
– Change orders are associated with connections
– Disabling connections deletes associated backlog
Connection Priority (may pull this)
• Bit on options attribute of connection object
• Defines partners used during initial / recovery sync
– High: “Must” source all connections in class
– Medium: Source from at least 1 connection in class
– Low: “best effort” sync
FRS best practices
Run Q307319 + new NTFS.SYS
Keep service running
• Avoids journal wraps
Join empty replica sets
Don’t place DFS targets on OS partition
DFS: enable replication on child links
• Targets can be taken offline
• Incremental sourcing & advertisement of data
• Replica set specific burflags
Properly size staging dir
• 128 largest files + 50% or 650 MB minimum
Don’t delete files from staging directory
• Change orders, # of VV joins, file size
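One reading of the staging sizing rule is to take the 128 largest files in the replica set, add 50% to their total, and never go below 650 MB. For example, 200 files of 10 MB each give 128 × 10 MB × 1.5 = 1,920 MB. A minimal sketch under that interpretation (the function name is hypothetical); the file sizes could be collected with `os.walk` over the replicated tree.

```python
import heapq


def staging_size_mb(file_sizes_bytes, floor_mb=650):
    """Suggested staging size: 128 largest files plus 50%, never below 650 MB."""
    largest = heapq.nlargest(128, file_sizes_bytes)
    suggested_mb = sum(largest) * 1.5 / (1024 * 1024)
    return max(suggested_mb, floor_mb)


ten_mb = 10 * 1024 * 1024
print(staging_size_mb([ten_mb] * 200))  # 1920.0
print(staging_size_mb([4096] * 1000))   # 650
```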
FRS best practices
Topology management
• No full mesh
• SYSVOL: requires 1 in / outbound CO
Forceful deletion of FRS members
• Delete member and subscriber objects
Tools
NTFRSUTL
• NTFRSUTL DS
– Repadmin /showconn for FRS
– DS Object inventory + topology review
• NTFRSUTL SETS
– Repadmin /showreps for FRS
– Sync status of downstream partners
• NTFRSUTL INLOG | OUTLOG: IDTABLE
– Inbound + outbound changes + tree inventory
Debug Logs: %systemroot%\debug\ntfrs_*.log
• Two way conversation between partners
Summary
All deployments should run SP2
Deploy SP3 when available
Q314282 provides roll-up fix for many issues
• Lingering objects
• Account lockouts
• PDC overload situations
Monitor Active Directory
New Documentation
Available on microsoft.com
• Best Practices for Active Directory Delegation
– http://www.microsoft.com/windows2000/techinfo/planning/activedirectory/addeladmin.asp
Coming soon
• Active Directory Monitoring Guidelines and Key Indicators
• Active Directory Forest Recovery
Eventcomb
– http://download.microsoft.com/download/win2000adserv/secops/RTM/NT5/EN-US/SecOps.exe