Unit OS 6: Windows I/O Processing

Unit OS6: Device Management
6.3. Windows I/O Processing
Windows Operating System Internals - by David A. Solomon and Mark E. Russinovich with Andreas Polze
Copyright Notice
© 2000-2005 David A. Solomon and Mark Russinovich
These materials are part of the Windows Operating
System Internals Curriculum Development Kit,
developed by David A. Solomon and Mark E.
Russinovich with Andreas Polze
Microsoft has licensed these materials from David
Solomon Expert Seminars, Inc. for distribution to
academic organizations solely for use in academic
environments (and not for commercial use)
2
Roadmap for Section 6.3
Driver and Device Objects
I/O Request Packets (IRP) Processing
Driver Layering and Filtering
Plug-and-Play (PnP) and Power Manager
Operation
Monitoring I/O Activity with Filemon
3
Driver Object
A driver object represents a loaded driver
Names are visible in the Object Manager
namespace under \Driver
A driver fills in its driver object with pointers to its I/O
functions, e.g. open, read, write
When you get the “One or More Drivers Failed to
Start” message, it’s because the Service Control
Manager didn’t find one or more driver objects in the
\Driver directory for drivers that should have
started
4
Device Objects
A device object represents an instance of a
device
Device objects are linked in a list off the driver
object
A driver creates device objects to represent the
interface to the logical device, so each generally
has a unique name visible under \Device
Device objects point back at the Driver object
5
Driver and Device Objects
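[Figure: TCP/IP driver and device objects. The \TCPIP driver object holds a dispatch table whose Open, Read and Write entries point to the Open(…), Read(…) and Write(…) routines in the loaded driver image. The device objects \Device\TCP, \Device\UDP and \Device\IP are linked off the driver object.]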
6
File Objects
A file object represents an open instance of a device (files on a volume are
virtual devices)
Applications and drivers “open” devices by name
The name is parsed by the Object Manager
When an open succeeds the object manager creates a file object to
represent the open instance of the device and a file handle in the
process handle table
A file object links to the device object of the “device” which is
opened
File objects store additional information
File offset for sequential access
File open characteristics (e.g. delete-on-close)
File name
Accesses granted for convenience
7
I/O Request Packets
System services and drivers allocate I/O request packets to describe I/O
A request packet contains:
File object at which I/O is directed
I/O characteristics (e.g. synchronous, non-buffered)
Byte offset
Length
Buffer location
The I/O Manager locates the driver to which to hand the IRP by following
the links:
File Object
Device Object
Driver Object
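The chain of links above can be sketched as a toy model. This is not kernel code: the class names, the string function codes and the toy_write routine are all hypothetical stand-ins chosen for illustration, and real IRPs and dispatch tables carry far more state.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DriverObject:
    """Toy stand-in for a driver object: a name plus a dispatch table."""
    name: str
    dispatch_table: Dict[str, Callable] = field(default_factory=dict)
    devices: List["DeviceObject"] = field(default_factory=list)

    def create_device(self, name: str) -> "DeviceObject":
        device = DeviceObject(name=name, driver=self)
        self.devices.append(device)  # device objects are linked off the driver object
        return device


@dataclass
class DeviceObject:
    name: str
    driver: DriverObject  # device objects point back at the driver object


@dataclass
class FileObject:
    device: DeviceObject  # the open instance links to its device object
    offset: int = 0       # file offset for sequential access


@dataclass
class Irp:
    function: str          # hypothetical function code, e.g. "write"
    file_object: FileObject
    length: int
    buffer: bytes = b""


def send_irp(irp: Irp):
    """Follow file object -> device object -> driver object -> dispatch routine."""
    device = irp.file_object.device
    driver = device.driver
    routine = driver.dispatch_table[irp.function]
    return routine(device, irp)


def toy_write(device: DeviceObject, irp: Irp) -> str:
    """Hypothetical dispatch routine: advance the offset and report what happened."""
    irp.file_object.offset += irp.length
    return f"{device.driver.name} wrote {irp.length} bytes to {device.name}"


tcpip = DriverObject(name="TCPIP", dispatch_table={"write": toy_write})
tcp = tcpip.create_device(r"\Device\TCP")
handle = FileObject(device=tcp)
result = send_irp(Irp("write", handle, length=5, buffer=b"hello"))
print(result)
```

The point of the sketch is only that the caller never names the driver: the routine is found by walking from the file object.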
8
Putting it Together: Request Flow
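[Figure: request flow. A user-mode process calls DeviceIoControl, which crosses into kernel mode as NtDeviceIoControlFile. The handle is looked up in the process handle table to find the file object, which links to the device object and then to the driver object. An IRP is built and passed through the driver object's dispatch table to DispatchDeviceControl( DeviceObject, Irp ) in the driver code.]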
9
I/O Request Packet
1) An application writes a file to the printer, passing a handle to the file object
2) The I/O manager creates an IRP and initializes the first stack location
3) The I/O manager uses the driver object to locate the WRITE dispatch routine and calls it, passing the IRP
[Figure: the request travels from the environment subsystem or DLL in user mode, through system services, to the I/O manager in kernel mode. The IRP consists of an IRP header and an IRP stack location holding the WRITE parameters. The file object links to the device object and the driver object. The device driver contains dispatch routine(s), a Start I/O routine, an ISR and a DPC routine.]
10
IRP data
IRP consists of two parts:
Fixed portion (header):
Type and size of the request
Whether request is synchronous or asynchronous
Pointer to buffer for buffered I/O
State information (changes with progress of the request)
One or more stack locations:
Function code
Function-specific parameters
Pointer to caller’s file object
While active, IRPs are stored in a thread-specific queue
I/O system may free any outstanding IRPs if thread terminates
11
I/O Processing: synchronous I/O to a single-layered driver
1. The I/O request passes through a subsystem DLL
2. The subsystem DLL calls the I/O manager’s NtWriteFile() service
3. The I/O manager sends the request in the form of an IRP to the driver (a device
driver)
4. The driver starts the I/O operation
5. When the device completes the operation and interrupts the CPU, the
device driver services the interrupt
6. The I/O manager completes the I/O request
12
Completing an I/O request
Servicing an interrupt:
ISR schedules a Deferred Procedure Call (DPC) and dismisses the interrupt
DPC routine starts next I/O request and completes interrupt servicing
May call completion routine of higher-level driver
I/O completion:
Record the outcome of the operation in an I/O status block
Return data to the calling thread – by queuing a kernel-mode
Asynchronous Procedure Call (APC)
APC executes in context of calling thread; copies data; frees IRP;
sets calling thread to signaled state
I/O is now considered complete; waiting threads are released
13
Flow of Interrupts
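[Figure: flow of interrupts. Peripheral device controllers raise interrupt lines (0, 2, 3, … n) on the CPU interrupt controller. The CPU uses its interrupt service table to reach the interrupt object, which holds the dispatch code, the ISR address and a spin lock. KiInterruptDispatch raises IRQL, grabs the spinlock, calls the driver ISR, then drops the spinlock and lowers IRQL. The driver ISR reads from the device, acknowledges the interrupt and requests a DPC.]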
14
Servicing an Interrupt:
Deferred Procedure Calls (DPCs)
Used to defer processing from higher (device) interrupt level to a lower
(dispatch) level
Also used for quantum end and timer expiration
Driver (usually ISR) queues request
One queue per CPU. DPCs are normally queued to the current processor, but
can be targeted to other CPUs
Executes specified procedure at dispatch IRQL (or “dispatch level”, also “DPC
level”) when all higher-IRQL work (interrupts) completed
Maximum times recommended: ISR: 10 usec, DPC: 25 usec
See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx
[Figure: a DPC queue. The queue head links a chain of DPC objects.]
15
Delivering a DPC
1. Timer expires, kernel queues DPC that will release all waiting threads; kernel requests a software interrupt
2. DPC interrupt occurs when IRQL drops below dispatch/DPC level
3. After DPC interrupt, control transfers to thread dispatcher
4. Dispatcher executes each DPC routine in DPC queue
DPC routines can’t assume what process address space is currently mapped
DPC routines can call kernel functions but can’t call system services, generate page faults, or create or wait on objects
[Figure: the interrupt dispatch table ordered by IRQL, from high through power failure, dispatch/DPC and APC down to low. DPC objects wait in the DPC queue until the dispatcher runs them.]
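The rule that queued DPCs run only once IRQL drops below dispatch/DPC level can be illustrated with a small simulation. Everything here is a toy: ToyCpu, finish_io and the numeric value chosen for DEVICE_LEVEL are assumptions made for the sketch, not kernel definitions.

```python
from collections import deque

# Illustrative IRQL values; DEVICE_LEVEL is an arbitrary stand-in for a device IRQL.
PASSIVE_LEVEL, APC_LEVEL, DISPATCH_LEVEL, DEVICE_LEVEL = 0, 1, 2, 5


class ToyCpu:
    """One simulated processor with its own DPC queue."""

    def __init__(self):
        self.irql = PASSIVE_LEVEL
        self.dpc_queue = deque()
        self.log = []

    def queue_dpc(self, routine, *args):
        self.dpc_queue.append((routine, args))

    def lower_irql(self, new_irql):
        self.irql = new_irql
        # The DPC "interrupt" fires only when IRQL drops below dispatch level.
        if new_irql < DISPATCH_LEVEL and self.dpc_queue:
            self.drain_dpcs(resume_at=new_irql)

    def drain_dpcs(self, resume_at):
        self.irql = DISPATCH_LEVEL
        while self.dpc_queue:
            routine, args = self.dpc_queue.popleft()
            routine(self, *args)
        self.irql = resume_at

    def device_interrupt(self, name):
        previous = self.irql
        self.irql = DEVICE_LEVEL          # ISR runs at device IRQL
        self.log.append(f"ISR {name} at IRQL {self.irql}")
        self.queue_dpc(finish_io, name)   # defer the rest of the work
        self.lower_irql(previous)         # dismiss the interrupt


def finish_io(cpu, name):
    """Hypothetical DPC routine: just records the level it ran at."""
    cpu.log.append(f"DPC {name} at IRQL {cpu.irql}")


cpu = ToyCpu()
cpu.device_interrupt("disk")
print(cpu.log)

busy = ToyCpu()
busy.irql = DISPATCH_LEVEL        # already at dispatch level when the interrupt arrives
busy.device_interrupt("net")
pending = len(busy.dpc_queue)     # the DPC stays queued
busy.lower_irql(PASSIVE_LEVEL)    # now it runs
```

In the second case the DPC remains queued until the processor leaves dispatch level, which is the behavior step 2 describes.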
16
I/O Completion:
Asynchronous Procedure Calls (APCs)
Execute code in context of a particular user thread
APC routines can acquire resources (objects), incur page faults,
call system services
APC queue is thread-specific
User mode & kernel mode APCs
Permission required for user mode APCs
Executive uses APCs to complete work in thread space
Wait for asynchronous I/O operation
Emulate delivery of POSIX signals
Make a thread suspend or terminate itself (env. subsystems)
APCs are delivered when thread is in alertable wait state
WaitForMultipleObjectsEx(), SleepEx()
17
Asynchronous Procedure Calls
(APCs)
Special kernel APCs
Run in kernel mode, at IRQL 1
Always deliverable unless thread is already at IRQL 1 or above
Used for I/O completion reporting from “arbitrary thread context”
Kernel-mode interface is linkable, but not documented
“Ordinary” kernel APCs
Always deliverable if at IRQL 0, unless explicitly disabled
(disable with KeEnterCriticalRegion)
User mode APCs
Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also,
QueueUserAPC
Only deliverable when thread is in “alertable wait”
[Figure: a thread object with two lists of APC objects, one kernel-mode (K) and one user-mode (U).]
18
Driver Layering and Filtering
To divide functionality across drivers, provide added value, etc.
Only the lowest layer talks to the I/O hardware
“Filter drivers” attach their devices to other devices
They see all requests first and can manipulate them
Example filter drivers:
File system filter driver
Bus filter driver
[Figure: a user-mode process issues a request through system services to the I/O manager in kernel mode. The IRP passes down the stack: file system driver, volume manager driver, disk driver.]
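A layered stack with a filter attached on top can be sketched as follows. The ToyDevice class, the dictionary used as an IRP and the uppercase_filter routine are hypothetical, chosen only to show that a filter sees the request first and can change it before the lower layers do.

```python
class ToyDevice:
    """A device in a toy stack; 'lower' is the device this one is attached to."""

    def __init__(self, name, handler, lower=None):
        self.name = name
        self.handler = handler
        self.lower = lower


def send_down(device, irp):
    """Hand the request to a device and record that it saw the request."""
    irp["path"].append(device.name)
    return device.handler(device, irp)


def disk_driver(device, irp):
    # Only the lowest layer would talk to the hardware; here it just completes.
    irp["status"] = "success"
    return irp


def pass_through(device, irp):
    return send_down(device.lower, irp)


def uppercase_filter(device, irp):
    # Hypothetical filter: manipulates write data before passing the request on.
    if irp["function"] == "write":
        irp["data"] = irp["data"].upper()
    return send_down(device.lower, irp)


disk = ToyDevice("disk", disk_driver)
volume = ToyDevice("volume manager", pass_through, lower=disk)
fs = ToyDevice("file system", pass_through, lower=volume)
flt = ToyDevice("file system filter", uppercase_filter, lower=fs)

irp = {"function": "write", "data": "hello", "path": []}
done = send_down(flt, irp)
print(done["path"], done["data"], done["status"])
```

Requests are always sent to the top of the stack, so attaching a filter requires no change to the drivers beneath it.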
19
Driver Filtering:
Volume Shadow Copy
New to XP/Server 2003
Addresses the “backup open files” problem
Volumes can be “snapshotted”
Allows “hot backup” (including open files)
Applications can tie in with mechanism to ensure consistent
snapshots
Database servers flush transactions
Windows components
such as the Registry
flush data files
Different snapshot providers can implement different snapshot mechanisms:
Copy-on-write
Mirroring
Volsnap is the built-in provider:
•Built into Windows XP/Server 2003
•Implements copy-on-write snapshots
•Saves volume changes in files on the volume
•Uses defrag API to determine where the file is and where paging file is to avoid tracking their changes
20
Volume Snapshots
1. Backup application requests shadow copy
2. Writers told to freeze activity
3. Providers asked to create volume shadow copies
4. Writers told to resume (“thaw”) activity
5. Backup application saves data from volume shadow copies
[Figure: the Volume Shadow Copy Service sits between the backup application, the writers (Oracle, SQL) and the providers. The providers shown are the Volume Shadow Copy Driver (volsnap.sys) and a mirror provider.]
Volsnap.sys
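[Figure: Volsnap.sys sits below the file system driver. The application uses volume C: and the backup application uses the shadow volume of C:. Labeled paths: backup read of sector c, application read of sector c, and all reads of sector d. Sector groups shown: a, b, c (labeled Snapshot) and a, d, b, …, c.]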
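The copy-on-write idea can be shown with a toy volume. This is a sketch of the general technique, not of how volsnap.sys is implemented: the ToyVolume class and its methods are hypothetical, and sectors are modeled as dictionary entries.

```python
class ToyVolume:
    """Fixed-size toy volume with a single copy-on-write snapshot."""

    def __init__(self, sectors):
        self.sectors = dict(sectors)
        self.snapshot_diff = None  # sector -> contents at snapshot time

    def take_snapshot(self):
        # Nothing is copied up front; the difference area starts empty.
        self.snapshot_diff = {}

    def write(self, sector, data):
        # Before the first post-snapshot write to a sector, save its old contents.
        if self.snapshot_diff is not None and sector not in self.snapshot_diff:
            self.snapshot_diff[sector] = self.sectors[sector]
        self.sectors[sector] = data

    def read(self, sector):
        """Application read: always sees the live volume."""
        return self.sectors[sector]

    def read_snapshot(self, sector):
        """Backup read: saved copy if the sector changed, else the live sector."""
        if self.snapshot_diff is None:
            raise RuntimeError("no snapshot has been taken")
        if sector in self.snapshot_diff:
            return self.snapshot_diff[sector]
        return self.sectors[sector]


vol = ToyVolume({"a": "a0", "b": "b0", "c": "c0", "d": "d0"})
vol.take_snapshot()
vol.write("c", "c1")
vol.write("c", "c2")  # only the first write after the snapshot saves a copy

print(vol.read("c"), vol.read_snapshot("c"), vol.read_snapshot("d"))
```

Unchanged sectors are never copied, which is why a snapshot is cheap to create and grows only with the amount of data modified afterwards.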
22
Shadow Copies of Shared Folders
When enabled, Server
2003 uses shadow copy
to periodically create
snapshots of volumes
Schedule and space used
are configurable
23
Shadow Copies of Shared Folders
Shadow copies are only
exposed as network shares
Clients may install an
Explorer extension that
integrates with the file server
and lets them:
View the state of folders and
files within a snapshot
Roll back individual folders and
files to a snapshot
24
The PnP Manager
In NT 4.0, each device driver was responsible for
enumerating all supported busses in search of
devices it supports
As of Windows 2000, the PnP Manager has bus
drivers enumerate their busses and inform it of
present devices
If the device driver for a device is not already
present on the system, the PnP Manager in the
kernel informs the user-mode PnP Manager to
start the Hardware Wizard
25
The PnP Manager
Once a device driver is located, the PnP Manager determines if the driver is signed
If the driver is not signed, the system’s driver signing policy determines whether or not the driver is installed
After loading a driver, the PnP Manager calls the driver’s AddDevice entry point
The driver informs the PnP Manager of the device’s resource requirements
The PnP Manager reconfigures other devices to accommodate the new device
26
The PnP Manager
Enumeration is recursive, and directed by bus drivers
Bus drivers identify devices on a bus
As busses and devices are registered, a device tree is constructed,
and filled in with devices
[Figure: device tree. Nodes shown: Root, ACPI, PCI, USB, Battery, Disk, Video, Keyboard.]
27
Resource Arbitration
Devices require system hardware resources to function (e.g. IRQs,
I/O ports)
The PnP Manager keeps track of hardware resource assignments
If a device requires a resource that’s already been assigned, the
PnP Manager tries to reassign resources in order to accommodate
Example:
1. Device 1 can use IRQ 5 or IRQ 6
2. PnP Manager assigns it IRQ 5
3. Device 2 can only use IRQ 5
4. PnP Manager reassigns Device 1 IRQ 6
5. PnP Manager assigns Device 2 IRQ 5
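The example above can be reproduced with a small search over possible assignments. This is only an illustration of the outcome: the source does not describe the PnP Manager's actual arbitration algorithm, and the arbitrate function below is a hypothetical brute-force backtracking search.

```python
def arbitrate(requirements):
    """Find a conflict-free IRQ assignment.

    requirements maps each device name to the list of IRQs it can use,
    in order of preference. Returns a dict device -> IRQ, or None if no
    conflict-free assignment exists.
    """
    devices = list(requirements)

    def assign(index, taken):
        if index == len(devices):
            return {}
        device = devices[index]
        for irq in requirements[device]:
            if irq in taken:
                continue
            rest = assign(index + 1, taken | {irq})
            if rest is not None:
                return {device: irq, **rest}
        return None  # dead end: the caller tries its next choice

    return assign(0, frozenset())


# Steps 1-2: Device 1 alone gets its first choice.
before = arbitrate({"Device 1": [5, 6]})

# Steps 3-5: Device 2 arrives and can only use IRQ 5, so Device 1 moves to IRQ 6.
after = arbitrate({"Device 1": [5, 6], "Device 2": [5]})

# Two devices that both insist on IRQ 5 cannot be accommodated.
conflict = arbitrate({"Device A": [5], "Device B": [5]})

print(before, after, conflict)
```

Re-running the search when a device arrives is what reassigns Device 1 in this sketch; a real arbiter would also have to stop and restart the affected devices.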
28
Plug and Play (PnP) State Transitions
PnP manager recognizes hardware, allocates resources, loads
driver, notifies about config. changes
[Figure: device Plug and Play state transitions. States: Not started, Started, Pending stop, Stopped, Pending remove, Surprise remove, Removed. Transitions are labeled with the commands Start-device, Query-stop, Stop, Query-remove, Remove and Surprise-remove.]
29
The Power Manager
A system must have an ACPI-compliant BIOS for full compatibility (APM
gives limited power support)
A number of factors guide the Power Manager’s decision to change power
state:
System activity level
System battery level
Shutdown, hibernate, or sleep requests from
applications
User actions, such as pressing the power button
Control Panel power settings
The system can go into low power modes, but it requires the cooperation of every
device driver; applications can provide their input as well
30
The Power Manager
There are different system power states:
On
Everything is fully on
Standby
Intermediate states
Lower standby states must consume less power than higher ones
Hibernating
Save memory to disk in a file called hiberfil.sys in the root directory
of the system volume
Off
All devices are off
Device drivers manage their own power level
Only a driver knows the capabilities of its device
Some devices only have “on” and “off”, others have intermediate states
Drivers can control their own power independently of system power
Display can dim, disk spin down, etc.
31
Power Manager
Based on the Advanced Configuration and Power Interface (ACPI)
System Power-State Definitions
State | Power Consumption | Software Resumption | HW Latency
S0 (fully on) | Maximum | Not applicable | None
S1 (sleeping) | Less than S0, more than S2 | System resumes where it left off (returns to S0) | Less than 2 sec.
S2 (sleeping) | Less than S1, more than S3 | System resumes where it left off (returns to S0) | 2 or more sec.
S3 (sleeping) | Less than S2, processor is off | System resumes where it left off (returns to S0) | Same as S2
S4 (sleeping) | Trickle current to power button and wake circuitry | System restarts from hibernate file and resumes where it left off (returns to S0) | Long and undefined
S5 (fully off) | Trickle current to power button | System boot | Long and undefined
32
Troubleshooting I/O Activity
Filemon can be a great help in understanding and troubleshooting I/O
problems
Two basic techniques:
Go to the end of the log and look backwards to where the problem occurred or is
evident, focusing on the last things done
Compare a good log with a bad log
Often comparing the I/O activity of a failing process with one that works
may point to the problem
Have to first massage log file to remove data that differs run to run
Delete first 3 columns (they are always different: line #, time, process
id)
Easy to do with Excel by deleting columns
Then compare with FC (built in tool) or Windiff (Resource Kit)
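The same massage-then-compare step can be scripted instead of done in Excel. The sketch below assumes a saved log is tab-separated with the three volatile columns first; that layout, the function names and the sample lines are assumptions for illustration, so check them against a real saved log before relying on them.

```python
import difflib


def strip_volatile_columns(lines, drop=3, sep="\t"):
    """Drop the first columns (line #, time, process id), which differ run to run."""
    cleaned = []
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        cleaned.append(sep.join(fields[drop:]))
    return cleaned


def compare_logs(good_lines, bad_lines):
    """Return only the lines that differ between a good log and a bad log."""
    good = strip_volatile_columns(good_lines)
    bad = strip_volatile_columns(bad_lines)
    diff = difflib.unified_diff(good, bad, "good", "bad", n=0, lineterm="")
    return [
        entry for entry in diff
        if entry.startswith(("+", "-")) and not entry.startswith(("+++", "---"))
    ]


# Hypothetical sample lines in the assumed layout.
good = [
    "1\t10:00:01\tapp.exe:100\tOPEN\tC:\\data\\a.cfg\tSUCCESS",
    "2\t10:00:02\tapp.exe:100\tREAD\tC:\\data\\a.cfg\tSUCCESS",
]
bad = [
    "1\t11:30:07\tapp.exe:212\tOPEN\tC:\\data\\a.cfg\tSUCCESS",
    "2\t11:30:08\tapp.exe:212\tREAD\tC:\\data\\a.cfg\tEND OF FILE",
]

for entry in compare_logs(good, bad):
    print(entry)
```

Lines prefixed with "-" appear only in the good log and lines prefixed with "+" only in the bad one, so the first "+" line is usually the place to start reading.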
33
Filemon
# - operation number
Process: image name + process id
Request: internal I/O request code
Result: return code from I/O operation
Other: flags passed on I/O request
34
Using Filemon
Start/stop logging (Ctrl+E)
Clear display (Ctrl+X)
Open Explorer window to folder containing file:
Double click on a line does this
Find – finds text within window
Save to log file
Advanced mode
Network option
35
What Filemon Monitors
By default Filemon traces all file I/O to:
Local non-removable media
Network shares
Stores all output in listview
Can exhaust virtual memory in long
runs
You can limit captured data with history
depth
You can limit what is monitored:
What volumes to watch in Volumes menu
What paths and processes to watch in Filter dialog
What operations to watch in Filter dialog (reads,
writes, successes and errors)
36
Filemon Filtering and Highlighting
Include and exclude filters are substring matches against
the process and path columns
Exclude overrides include filter
Be careful that you don’t exclude potentially useful data
Capture everything and save the log
Then apply filters (you can always reload the log)
Highlight matches all columns
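The include and exclude rules can be written down as a small predicate. This is a sketch of the semantics described above, not Filemon's code: the function name is hypothetical, and both the case-insensitive matching and the treatment of an empty include list as "include everything" are assumptions.

```python
def passes_filter(process, path, include=(), exclude=()):
    """Decide whether a log line would be shown.

    Terms are substring matches against the process and path columns.
    An exclude match always wins over an include match.
    """
    fields = (process.lower(), path.lower())

    def matches(term):
        return any(term.lower() in value for value in fields)

    if any(matches(term) for term in exclude):
        return False
    if not include:  # assumption: no include terms means include everything
        return True
    return any(matches(term) for term in include)


shown = passes_filter("winword.exe:1234", r"C:\Docs\report.doc", include=["winword"])
hidden = passes_filter("winword.exe:1234", r"C:\Docs\report.doc",
                       include=["winword"], exclude=[".doc"])
other = passes_filter("excel.exe:88", r"C:\Docs\book.xls", include=["winword"])

print(shown, hidden, other)
```

Because an exclude term silently drops matching lines, the advice above to capture everything first and filter afterwards avoids losing data that later turns out to matter.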
37
Basic vs Advanced Mode
Basic mode massages output to be sysadmin-friendly and targets common troubleshooting
Things you don’t see in Basic mode:
Raw I/O request names
Various internal file system operations
Activity in the System process
Page file I/O
Filemon file system activity
38
Understanding Disk Activity
Use Filemon to see why your hard disk is crunching
Process performance counters show I/O activity, but not to
where
System performance counters show which disks are being hit,
but not which files or which process
Filemon pinpoints which file(s) are being accessed, by whom,
and how frequently
You can also use Filemon on a server to determine which
file(s) are being accessed most frequently
Import into Excel and make a pie chart by file name or operation
type
Move heavy-access files to a different disk on a different
controller
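Counting accesses per file can also be done with a short script rather than a spreadsheet. The sketch assumes a tab-separated log in which the path is the fifth column; that column position, the function name and the sample lines are assumptions for illustration.

```python
from collections import Counter


def access_counts(lines, path_column=4, sep="\t"):
    """Count how often each path appears in a saved log."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        if len(fields) > path_column:  # skip short or malformed lines
            counts[fields[path_column]] += 1
    return counts


# Hypothetical sample lines: #, time, process, request, path, result.
sample = [
    "\t".join(["1", "10:00:01", "sqlservr.exe:400", "READ", r"D:\db\orders.mdf", "SUCCESS"]),
    "\t".join(["2", "10:00:01", "sqlservr.exe:400", "READ", r"D:\db\orders.mdf", "SUCCESS"]),
    "\t".join(["3", "10:00:02", "sqlservr.exe:400", "WRITE", r"D:\db\orders.ldf", "SUCCESS"]),
    "truncated line",
]

counts = access_counts(sample)
busiest, hits = counts.most_common(1)[0]
print(busiest, hits)
```

Passing path_column=3 under the same assumed layout would tally by operation type instead of by file name.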
39
Polling and File Change Notification
Many applications respond to file and directory changes
A poorly written application will “poll” for changes
A well-written application will request notification by the system
of changes
Polling for changes causes performance degradation
Context switches including TLB flush
Cache invalidation
Physical memory usage
CPU usage
Alternative: file change notification
When you run Filemon on an idle system you should
only see bursty system background activity
Polling is visible as periodic accesses to the same files and
directories
File change notification is visible as directory queries that have
no result
40
Example: Word Crash
While typing in the document Word XP
would intermittently close without any error
message
To troubleshoot, Filemon was run on the user’s
system
Set the history depth to 10,000
Asked user to send Filemon log when Word
exited
41
Solution: Word Crash
Working backwards, the first “strange” or
unexplainable behavior is the constant reads
past end of file to MSSP3ES.LEX
User looked up what .LEX file was
Related to Word proofing tools
Uninstalled and reinstalled proofing tools & problem
went away
42
Example: Useless Excel Error
Message
Excel reports an error “Unable to read
file” when starting
43
Solution: Useless Excel Error Message
Filemon trace shows Excel reading file in
XLStart folder
All Office apps autoload files in their start folders
Should have reported:
Name and location of file
Reason why it didn’t like it
44
Further Reading
Mark E. Russinovich and David A. Solomon,
Microsoft Windows Internals, 4th Edition,
Microsoft Press, 2004.
I/O Processing (from p. 561)
The Plug and Play (PnP) Manager (from p. 590)
The Power Manager (from p. 607)
Troubleshooting File System Problems (from p. 711)
45
Source Code References
Windows Research Kernel sources
\base\ntos\io – I/O Manager
\base\ntos\inc\io.h – additional structure/type
definitions
\base\ntos\verifier – Driver Verifier
\base\ntos\inc\verifier.h – additional structure/type
definitions
46