Mark Russinovich
Technical Fellow
Microsoft Corporation
Session Code: WCL303
About Me
Technical Fellow, Microsoft
Co-founder and chief software
architect of Winternals Software
Co-author of Windows Internals 4th
and 5th edition and Inside Windows
2000 3rd edition with David Solomon
Author of TechNet Sysinternals
Home of blog and forums
Contributing Editor TechNet
Magazine, Windows IT Pro Magazine
Ph.D. in Computer Engineering
Outline
Introduction
Sluggish Performance
Application Hangs
Error Messages
Application Crashes
Blue Screens
Case of the Unexplained…
This is the 2009 version of the “case of the
unexplained” talk series
2007 & 2008 versions covered different cases
Can view webcast on Sysinternals->Mark’s webcasts
Based on real case studies
Some of these have been written up on my blog
Troubleshooting
Most applications do a poor job of reporting
unexpected errors
Locked, missing or corrupt files
Missing or corrupt registry data
Permissions problems
Errors manifest in several different ways
Misleading error messages
Crashes or hangs
Purpose of Talk
Show you how to solve these classes of
problems by peering beneath the surface
Interpreting file and registry activity
Interpreting call stacks
You’ll learn tools and techniques to help you
solve seemingly unsolvable problems
Tools We’ll Use
Sysinternals: www.microsoft.com/technet/sysinternals
Process Explorer – process/thread viewer
Process Monitor – file/registry/process/thread tracing
Autoruns – displays all autostart locations
SigCheck – shows file version information
PsExec – execute processes remotely or in the system account
Pslist – list process information
Strings – dumps printable strings in any file
ADInsight – real time LDAP (Active Directory) monitor
Zoomit – presentation tool I’m using
Microsoft downloads:
Kernrate – sample-based system profiler
Visual Studio: Spy++ - Window analysis utility
Debugging Tools for Windows: Windbg application and kernel debugger:
www.microsoft.com/whdc/devtools/debugging/Windbg
Outline
Sluggish Performance
Application Hangs
Error Messages
Application Crashes
Blue Screens
The Case of the Slow Outlook Attachment
User would see CPU burst and Outlook would
hang for 15+ seconds whenever they received
an attachment:
Process Monitor
Process Monitor is a real-time file, registry, process and thread monitor
It requires Windows 2000 SP4 w/Update Rollup 1, XP SP2 or higher, Server 2003
SP1 or higher, Vista, or Server 2008 (including 64-bit versions of Windows)
It replaces Filemon and Regmon, but you can use Filemon and Regmon on older
operating systems
Enhancements over Filemon/Regmon include:
More advanced filtering
Operation call stacks
Boot-time logging
Data mining views
Process tree to see short-lived processes
When in doubt, run Process Monitor!
It will often show you the cause for error messages
It often tells you what is causing sluggish performance
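A minimal sketch of mining a saved trace for failures, assuming the trace was exported from Process Monitor as CSV with the default column headers ("Process Name", "Operation", "Path", "Result"). The sample rows, the `failed_operations` name and the `ignore` parameter are made up for illustration.

```python
import csv
import io


def failed_operations(csv_text, ignore=("SUCCESS",)):
    """Return (process, operation, path, result) for rows whose Result is not ignored."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        (r["Process Name"], r["Operation"], r["Path"], r["Result"])
        for r in rows
        if r["Result"] not in ignore
    ]


# Tiny made-up trace in the assumed column layout (not from a real capture).
SAMPLE = r'''"Time of Day","Process Name","PID","Operation","Path","Result","Detail"
"9:00:01.0000000 AM","mmc.exe","1234","RegOpenKey","HKCU\Software\Example","SUCCESS",""
"9:00:01.0000100 AM","mmc.exe","1234","RegOpenKey","HKCU\Software\Example\Locked","ACCESS DENIED",""
'''

failures = failed_operations(SAMPLE)
```

Many non-SUCCESS results in a real trace are benign (for example, probes for optional keys), so a list like this is a starting point for investigation rather than a diagnosis.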
The Case of the Slow Outlook
Attachment (Continued)
Process Monitor trace of next received
attachment implicated antivirus:
The Case of the Slow Outlook
Attachment: Solved
Searched web for confirmation:
Checked AV settings, found the problematic option, and
disabled scanning:
Process Explorer
Process Explorer is a Task Manager replacement
You can literally replace Task Manager with Options>Replace Task Manager
Use hide-when-minimized to always have it handy
Hover the mouse to see a tooltip showing the process
consuming the most CPU
Open System Information graph to see CPU usage
history
Graphs are time stamped with hover showing biggest
consumer at point in time
Also includes other activity such as I/O, kernel memory
limits
The Case of the Periodic VMWare Freezes
Noticed CPU peg every 10 seconds and the
desktop freeze when running VMWare
Saw in the Process Explorer System Information
graph that it was the System process:
Processes and Threads
A process represents an instance of a running program
Address space
Resources (e.g., open handles)
Security profile (token)
A thread is an execution context within a process
Unit of scheduling (threads run, processes don’t run)
All threads in a process share the same per-process address space
The System process is the default home for kernel mode system
threads
Functions in OS and some drivers that need to run as real threads
E.g., need to run concurrently with other system activity, wait on timers,
perform background “housekeeping” work
Other host processes: svchost, Iexplore, mmc, dllhost
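The point that all threads in a process share one address space can be illustrated with a small sketch. This is a user-mode Python illustration only, not a view of kernel-mode system threads; the names `shared` and `worker` are made up.

```python
import threading

shared = []               # one object, visible to every thread in this process
lock = threading.Lock()   # serializes access to the shared list


def worker(name):
    # Each thread is a separate execution context, but it writes to the same memory.
    with lock:
        shared.append(name)


threads = [threading.Thread(target=worker, args=(f"thread-{i}",)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

result = sorted(shared)
```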
Viewing Threads
Task Manager doesn’t show thread
details within a process
Process Explorer does on “Threads”
tab
Displays thread details such as ID,
CPU usage, start time, state, priority
Start address is where the thread
began running (not where it is now)
Click Module to get details on
module containing thread start
address
Thread Start Functions and Symbol Information
Process Explorer can map the addresses within a
module to the names of functions
This can help identify which component within a process is
responsible for CPU usage
Requires symbol information:
Download the latest Debugging Tools for Windows from
Microsoft (free)
Configure Process Explorer’s symbol engine:
Use dbghelp.dll from the Debugging Tools
Point at the Microsoft public symbol server (or internal symbol
server if you have access)
Can configure multiple symbol paths separated by “;”
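A sketch of how a symbol path with several entries might be assembled, assuming the `srv*<local cache>*<server URL>` form understood by dbghelp.dll. The cache and extra directory names are placeholders, and `symbol_path` is a hypothetical helper.

```python
def symbol_path(local_cache, servers, extra_dirs=()):
    """Join symbol-server entries and plain directories into one ';'-separated path."""
    entries = [f"srv*{local_cache}*{url}" for url in servers]
    entries.extend(extra_dirs)
    return ";".join(entries)


# Microsoft public symbol server; the local directories are placeholders.
PUBLIC_SERVER = "https://msdl.microsoft.com/download/symbols"
path = symbol_path(r"C:\Symbols", [PUBLIC_SERVER], extra_dirs=[r"C:\MySymbols"])
```

The resulting string is what would be pasted into the symbol path field of the tool's symbol configuration dialog.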
The Case of the Periodic VMWare
Freezes: Solved
Opened Threads tab for System process and paused
after a spike:
Ftser2k was XM Radio USB/Serial driver
Stopping it didn’t remove spikes
Http.sys is IIS kernel-mode cache driver
Went to device manager and showed hidden devices
Stopped http.sys and hangs went away
Didn’t care about dependent services
The Case of the Runaway Internet
Explorer
Noticed a CPU spike and hovered over Process Explorer
to see culprit:
That was unexpected, because I had just installed Adobe
Acrobat Reader and exited Internet Explorer
IE’s window wasn’t visible, but it was still in the process list
The Case of the Runaway Internet
Explorer: Investigation
The thread had a generic start address:
Required deeper investigation…
Call Stacks
Sometimes a thread start
address doesn’t tell you
what a thread is doing
The stack might provide
a hint:
The stack is a per-thread
region of memory that
records a history of
function nesting
The bottom frame
(Function 3) is where the
thread will continue
executing
Function 1
Function 2
Function 3
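The same nesting can be reproduced in a few lines. This sketch uses Python's own stack rather than a Windows thread stack; the function names mirror the diagram.

```python
import traceback


def function_1():
    return function_2()


def function_2():
    return function_3()


def function_3():
    # Capture the current thread's stack; entries are ordered oldest call first.
    stack = traceback.extract_stack()
    return [frame.name for frame in stack]


frames = function_1()
# A debugger lists the most recent call first, so reverse the last three entries.
innermost_first = list(reversed(frames[-3:]))
```

Here `innermost_first` starts with `function_3`, the frame where execution resumes, and ends with `function_1`, the oldest of the three calls.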
Viewing Call Stacks
Click Stack on the Threads tab to
view a thread’s call stack
Lists functions in reverse
chronological order
Note that start address on
Threads tab is different than first
function shown in stack
This is because all threads created
by Windows programs start in a
library function in Kernel32.dll
which calls the programmed start
address
The Case of the Runaway Internet
Explorer: Stack Investigation
I double-clicked on the thread to see its stack:
The Case of the Runaway Internet Explorer:
What is GP.OCX?
Opened DLL view to see DLL’s version information:
DLL Search Online didn’t return any useful results
The Case of the Runaway Internet
Explorer: Solved
Searched for NOS Microsystems:
Conclusion: Adobe uses gp.ocx, which had hit an
infinite-loop bug
Terminated IE process to stop CPU usage
Outline
Sluggish Performance
Application Hangs
Error Messages
Application Crashes
Blue Screens
The Case of the Logon Script Hangs
Multiple users complained that logon would take three minutes
Investigation revealed that all complaints were from Dell Precision 670
workstations
But only some of the 670 workstations were affected
User configured Process Explorer to run during logon and saw
Lisa Client consuming CPU:
Lisa Client was custom logon application that checked system for
installed applications
Lisa Client CPU then went idle for several minutes, then exited and
system would start acting normally
The Case of the Logon Script Hangs
(Continued)
User captured a Process Monitor trace after
manually running Lisa Client
Saw three-minute delay correspond to device error:
Details column showed
IOCTL_SCSI_PASS_THROUGH
Captured trace on working system and looked
for IOCTL_SCSI_PASS_THROUGH operation
No device error and no delay:
The Case of the Logon Script Hangs:
Solved
Device error led user to look at disks:
Working systems had Fujitsu disks
Systems with hangs had Seagate
Solution:
Temporary: wrote WMI script that queried disk
type and would not launch Lisa Client on Seagate
systems
Final: Application developers changed Lisa Client to
avoid performing problematic command
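The temporary workaround can be sketched as a launch guard. The original script is not shown in the talk, so everything here is an assumption: the model strings would come from WMI (for example the Model property of Win32_DiskDrive), and how the affected disks were recognized is not stated, so `blocked_markers` and the sample strings are hypothetical.

```python
def should_launch(disk_models, blocked_markers):
    """Return False if any disk model contains a blocked marker (case-insensitive)."""
    lowered = [model.lower() for model in disk_models]
    return not any(
        marker.lower() in model
        for model in lowered
        for marker in blocked_markers
    )


# Hypothetical model strings; a real script would read them from WMI.
ok_system = should_launch(["EXAMPLE-DISK-A 100"], blocked_markers=["vendor-b"])
affected_system = should_launch(["VENDOR-B 250"], blocked_markers=["vendor-b"])
```

A logon script would start the client only when the function returns True.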
Outline
Sluggish Performance
Application Hangs
Error Messages
Application Crashes
Blue Screens
Undocumented Settings
The Case of the MMC Startup Failure
User would get an error every time they started an MMC snapin:
The Case of the MMC Startup Failure:
Solved
Ran Process Monitor and saw an Access Denied
error on an IE registry key:
Checked permissions and Administrators had no
access
Solution: added full-access for Administrators
and MMC started successfully
The Case of the Favorite that Wouldn’t
Save
User tried to change the URL for one of his IE
favorites:
Trying to save a new favorite resulted in a
similar error:
The Case of the Favorite that Wouldn’t
Save: Solved
Captured a Process Monitor trace:
AccessChk showed that folder was Medium Integrity
(IE requires Low):
Fixed integrity with Icacls and problem solved
The Case of the Persistent Executable
Noticed that opening volumes in Explorer was really slow
Volume context menu indicated presence of Autorun.inf
The Case of the Persistent Executable
(Continued)
Files reappeared after deleting, so monitored activity with
Process Monitor
File was recreated by Explorer, so looked at stack
Viewing Autostarts
Use Autoruns to see what’s configured to start when the system
boots and you login
Windows MsConfig shows only a subset of the defined autostart locations
MsConfig doesn’t show as much information
The Case of the Persistent Executable
(Solved)
Process Explorer DLL search showed that
amvo.dll loaded into Explorer and all its children
Found amv0.exe and used Autoruns to delete it
from the system Run key
Outline
Sluggish Performance
Application Hangs
Error Messages
Application Crashes
Blue Screens
Application Crashes
In most cases, there’s nothing you can do about
application crashes
They are caused by a bug in the program
Only the developer can fix a bug
However, the crash may be caused by
misconfiguration or an extension (a plugin)
Monitor the application’s crash with Process
Monitor if it’s reproducible
Look for extensions in the crash file with Windbg
Finding the Crash Dump
On pre-Vista systems, finding the dump file is
easy:
Attaching to the Dying Process
Vista doesn’t save crash dumps for most crashes
Only if Microsoft requests a dump for study and you send it in
When a crash occurs, don’t dismiss the crash dialog:
Launch Windbg and attach to the
process
You can save a dump with the .dump
command
Identifying the Crashed Process
On Vista, the process name might not be enough to
identify the instance that’s crashed:
To determine the PID of the crashed instance, look at
WerFault’s command line:
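A sketch of pulling the PID out of that command line, assuming the crashed process's PID follows a `-p` switch. The sample command line and the `crashed_pid` helper are illustrative, not copied from a real system.

```python
import re


def crashed_pid(werfault_command_line):
    """Return the integer after a '-p' switch, or None if there is none."""
    match = re.search(r"(?:^|\s)-p\s+(\d+)", werfault_command_line)
    return int(match.group(1)) if match else None


# Made-up example of the assumed command-line shape.
pid = crashed_pid(r"C:\Windows\system32\WerFault.exe -u -p 4242 -s 1234")
```

The returned PID identifies which instance to attach to in Windbg.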
Enabling Dump Archiving on Vista and
Windows Server 2008
Or you can configure Vista SP1 and Windows
Server 2008 to always generate and save a
dump file
Create a key named:
HKLM\Software\Microsoft\Windows\Windows Error Reporting\LocalDumps
Dumps go to %LOCALAPPDATA%\CrashDumps
Override with a DumpFolder value
(REG_EXPAND_SZ)
Limit dump history with a DumpCount value
(DWORD)
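The settings above could be scripted. This sketch only builds `reg add` command lines for review and does not run them; `local_dumps_commands` is a hypothetical helper, and the folder and count values are examples.

```python
KEY = r"HKLM\Software\Microsoft\Windows\Windows Error Reporting\LocalDumps"


def local_dumps_commands(dump_folder=None, dump_count=None):
    """Build reg.exe argument lists that create the key and set optional values."""
    commands = [["reg", "add", KEY, "/f"]]  # creating the key enables dump archiving
    if dump_folder is not None:
        commands.append(["reg", "add", KEY, "/v", "DumpFolder",
                         "/t", "REG_EXPAND_SZ", "/d", dump_folder, "/f"])
    if dump_count is not None:
        commands.append(["reg", "add", KEY, "/v", "DumpCount",
                         "/t", "REG_DWORD", "/d", str(dump_count), "/f"])
    return commands


# Example values only: a custom folder and a history of five dumps.
commands = local_dumps_commands(dump_folder=r"C:\Dumps", dump_count=5)
```

Each list can be passed to `subprocess.run` from an elevated prompt on the target machine.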
Analyzing a Crash
Basic crash dump analysis is easy and it might tell you the cause
Requires Windbg and symbol configuration
Once the dump is loaded, find the faulting thread
The debugger might identify it
If the debugger doesn’t, examine each thread stack looking for “fault”,
“exception”, or “error” names
Examine the stack of the faulting thread to look for third-party
plugins
If you suspect an extension:
Check for a new version
Uninstall it if the problem persists
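The manual scan for “fault”, “exception”, or “error” names can be sketched as a text search. This assumes the thread stacks were already copied out of the debugger as lists of frame names; the thread IDs and the `plugin` module are made up.

```python
KEYWORDS = ("fault", "exception", "error")


def suspicious_threads(stacks):
    """Map thread id -> frames whose names contain one of the keywords."""
    hits = {}
    for thread_id, frames in stacks.items():
        matched = [f for f in frames if any(k in f.lower() for k in KEYWORDS)]
        if matched:
            hits[thread_id] = matched
    return hits


# Illustrative stacks: thread 1 is idle, thread 2 is in exception handling.
stacks = {
    1: ["ntdll!NtWaitForSingleObject", "kernel32!WaitForSingleObject"],
    2: ["kernel32!UnhandledExceptionFilter", "ntdll!KiUserExceptionDispatcher",
        "plugin!DoWork"],
}
hits = suspicious_threads(stacks)
```

Frames below the matched ones on the flagged thread (here the hypothetical `plugin!DoWork`) are where to look for a third-party module.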
The Case of the Explorer Context Menu
Crash
Explorer would randomly crash when the user right-clicked on a file
Attached to process and executed !analyze -v:
Didn’t know what muangys.dll was and because
module was unloaded, Windbg provided no
information
The Case of the Explorer Context Menu
Crash (Cont)
Ran Process Explorer and looked at Explorer DLL
view to find muangys.dll:
File had no version information, but Strings
identified the company and application:
The Case of the Explorer Context Menu
Crash: Solved
Was part of icon-editing software, which the
developer relied upon
No newer version
Solution: disable shell extension with Autoruns
Outline
Sluggish Performance
Application Hangs
Error Messages
Application Crashes
Blue Screens
Crashes and Hangs
Windows has various components that run in Kernel Mode, the
highest privilege mode of the OS
OS components: Ntoskrnl.exe, Hal.dll
Drivers: Ntfs.sys, Tcpip.sys, device drivers
Kernel-mode components are privileged extensions to the OS
that have to adhere to various rules
Not accessing invalid memory
Accessing memory at the right “Interrupt Request Level”
Not causing resource deadlocks
When a kernel-mode component performs an illegal operation,
Windows crashes (blue screens)
Crashing helps preserve the integrity of user data
A resource deadlock can hang the system
Online Crash Analysis
When you reboot after a crash, Windows offers to upload it to Microsoft Online Crash
Analysis (OCA)
Automated server generates a thumbprint of the crash and uses it as a key in a database
If the database has an entry, the user is told the cause and directed at a fix
Basic Crash Dump Analysis
Many times OCA doesn’t know the cause:
Basic crash dump analysis is easy and it might tell you
the cause
Requires Windbg and symbol configuration
Dump files are in either:
\Windows\Memory.dmp: Vista and servers
\Windows\Minidump: Windows 2000 Pro and Windows XP
The Case of the Crashed Phone Call
Laptop crashed during a Skype VOIP call
User reconnected and system crashed again
Minidump file pointed at Intel wireless driver:
The Case of the Crashed Phone Call
(Cont)
Looked at file properties to determine what
device the driver was for:
Found device in Device Manager:
The Case of the Crashed Phone Call
(Cont)
Right-clicked and checked Windows Update for
newer driver:
Need to check OEM site, so had to find version
number
The Case of the Crashed Phone Call:
Solved
OEM site had older version:
Intel site had newer one:
Installed and crashes stopped
Summary and More Information
A few basic tools and techniques can solve seemingly impossible
problems
I learn by always trying to determine the root cause
Resources:
Webcasts of two previous “Case of the Unexplained” talks
Sysinternals->Mark’s Webcasts
Sysinternals Video Library: in-depth dive on tools and troubleshooting
My blog
Windows Internals: understand the way the OS works
If you’ve solved one, send me a description, screenshots and log
files!
I’ll send you a signed copy of Windows Internals
Resources
www.microsoft.com/teched: Sessions On-Demand & Community
www.microsoft.com/learning: Microsoft Certification & Training Resources
http://microsoft.com/technet: Resources for IT Professionals
http://microsoft.com/msdn: Resources for Developers
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should
not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS,
IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.