: Writing Windows WDM Device Drivers

Test Driver Code

Test Driver Code

As explained in Chapter 6, a driver writer has to add DebugPrint.c[31] and DebugPrint.h source files to their driver project to support DebugPrint calls. These routines ensure that the trace statement output is sent to the DebugPrint driver. It should eventually be possible to put all the code in DebugPrint.c into a static library or DLL that is linked with test drivers.

The main job performed by the DebugPrint code in each test driver is to write events to the DebugPrint driver. The kernel provides several routines to call other drivers, including ZwCreateFile, ZwWriteFile, and ZwClose. The documentation for these functions says that these functions can only be called at PASSIVE_LEVEL IRQL. This means that they cannot be called directly from some sorts of driver code.

In my initial design, I ignored this issue. However, I soon ran into another problem that was more difficult to track down. Eventually, I worked out that calls to ZwWriteFile have to be made in the same process context as ZwCreateFile. The DDK documentation did not make this point prominently. In my initial design, I used ZwCreateFile in the test driver DriverEntry routine. However, the print routines using ZwWriteFile could naturally be called in dispatch routines, which are usually not running in the same process context.

Thread and process contexts are not normally a problem for most device drivers. When you process a normal user request, the IRP has all the information you need. The fact that you may be running in an arbitrary thread context does not effect the job you have to do. The Zw file access functions are one case when the process context is significant. If you use "neither" Buffered I/O nor Direct I/O or use METHOD_NEITHER IOCTLs, the thread context is also important.

System Threads

The solution to both these problems is to use a system thread. A driver can create a system thread (or threads) that runs in kernel mode. This thread runs at PASSIVE_LEVEL IRQL. If the DebugPrint system thread makes all the Zw calls, the "same process" problem is fixed. A doubly-linked list is used to store the events for processing by the system thread. Inserting events into this list can be done safely at IRQL levels up to and including DISPATCH_LEVEL This means that most types of driver code can generate DebugPrint trace output.

The DebugPrintInit routine calls PsCreateSystemThread (at PASSIVE_LEVEL IRQL) to create its system thread as shown in the partial code in Listing 14.1. PsCreateSystemThread is passed the name of the function to run and a context to pass to it. The DebugPrint system thread function is defined as follows.

void DebugPrintSystemThread(IN PVOID Context)

DebugPrintInit passes just a NULL context to the thread function. Some drivers may wish to create one thread per device, and so will usually pass a pointer to the device extension as the context. A DebugPrint test driver has just one system thread for all its devices.

The DebugPrint system thread does not need a high priority. Therefore, DebugPrintSystemThread calls KeSetPriorityThread straightaway to set its priority to the lowest real-time priority. It uses KeGetCurrentThread to get a pointer to its own thread object.

If PsCreateSystemThread succeeds, it returns a handle to the thread. Later, I shall show that a driver can wait for certain objects to be set. It can wait for the completion of a system thread if it has a pointer to the system thread object, not a handle. Use ObReferenceObjectByHandle to retrieve the thread object pointer. If this succeeds, call ZwClose to close the thread handle. Technically speaking, the call to ZwClose reduces the reference count to the thread handle. When the thread completes, its handle reference count will be decremented to zero.

A system thread must terminate itself using PsTerminateSystemThread. I am not sure what happens if a system thread function simply returns without calling PsTerminateSystemThread. The main driver code cannot force a system thread to terminate. For this reason, a global Boolean variable called ExitNow is used. This is set true when the driver wants the system thread to exit.

DebugPrintClose waits for the ThreadExiting event to be set into the signalled state by the thread as it exits. This ensures that the thread has completed by the time DebugPrintClose exits. DebugPrintClose is commonly called from a driver unload routine. All driver code must have stopped running when the unload routine exits, as the driver address space may soon be deleted. Make sure that any other call-backs are disabled, e.g., interrupt handlers, Deferred Procedure Calls, and timers.

Listing 14.1 DebugPrint test driver thread and event handling

KEVENT ThreadEvent;
KEVENT ThreadExiting;
PVOID ThreadObjectPointer=NULL;
void DebugPrintInit(char* _DriverName) {
//
ExitNow = false;
KeInitializeEvent(&ThreadEvent, SynchronizationEvent, FALSE);
KeInitia1izeEvent(&ThreadExiting, SynchronizationEvent, FALSE);
HANDLE threadHandle;
status = PsCreateSystemThread(&threadHandle, THREAD_ALL_ACCESS, NULL, NULL,NULL, DebugPrintSystemThread, NULL);
if (!NT_SUCCESS(status)) return;
status = ObReferenceObjectByHandle(threadHandle, THREAD_ALL_ACCESS, NULL, KernelMode, &ThreadObjectPointer, NULL);
if (NT_SUCCESS(status)) ZwClose(threadHandle);
//
}
void DebugPrintClose() {
//
ExitNow = true;
KeSetEvent(&ThreadEvent, 0, FALSE);
KeWaitForSingleObject(&ThreadExiting, Executive, KernelMode, FALSE, NULL);
//
}
void DebugPrintSystemThread(IN PVOID Context) {
// Lower thread priority
KeSetPriorityThread(KeGetCurrentThread(), LOW_REALTIME_PRIORITY);
// Make One second relative timeout
LARGE_INTEGER OneSecondTimeout;
OneSecondTimeout.QuadPart = 1i64 * 1000000i64 * 10i64;
// Loop waiting for events or ExitNow
while (true) {
KeWaitForSingleObject(&ThreadEvent, Executive, KernelMode, FALSE, &0neSecondTimeout);
// Process events
//
if (ExitNow) break;
}
// Tidy up
if (ThreadObjectPointer!=NULL) {
ObDereferenceObject(&ThreadObjectPointer);
ThreadObjectPointer = NULL;
}
DebugPrintStarted = FALSE;
ClearEvents();
KeSetEvent(&ThreadExiting, 0, FALSE);
PsTerminateSystemThread(STATUS_SUCCESS);
}

System Worker Threads

An alternative technique is available if you want to perform occasional short tasks at PASSIVE_LEVEL IRQL. However, this system worker thread method was not suitable for the DebugPrint software because the thread context may change between calls.

To use the system worker thread method, first allocate a WORK_QUEUE_ITEM structure from nonpaged memory. Call ExInitializeWorkItem passing this pointer, a callback routine, and a context for it. When you want your function to be run, call ExQueueWorkItem. In due course, your callback routine is called at PASSIVE_LEVEL in the context of a system thread. Do not forget to free the WORK_QUEUE_ITEM structure memory when finished with it. System worker threads have a lower priority than system threads running at the lowest real time priority, but higher than most user mode threads.

The WdmIo and PHDIo drivers, described in Chapters 15-18, show how to use a system worker thread.

In W2000, it is recommended that you use the IoAllocateWorkItem, IoQueueWorkItem, IoFreeWorkItem functions instead.

Events

The main DebugPrint test driver code sets the ExitNow Boolean to true when it wants its system thread to terminate. However, it is not a good idea for the system thread to spin continuously waiting for this value to become true.

Instead, a kernel event called ThreadEvent signals when to check ExitNow. A KEVENT must be defined for the event in nonpaged memory. Kernel events are very similar to their user mode Win32 cousins.

Listing 14.1 shows how ThreadEvent is initialized using KeInitializeEvent, at PASSIVE_LEVEL IRQL. Two types of events are supported, SynchronizationEvent and NotificationEvent. The last parameter sets the initial state of the event, which is nonsignalled in this case.

When set, a Synchronization event only releases one waiting thread before reverting to the nonsignalled state. A Notification event stays signalled until explicitly reset.

DebugPrintClose uses KeSetEvent to set an event into the signalled state, after setting ExitNow to true. The third parameter to KeSetEvent specifies whether you are going to call one of the KeWait routines straightaway. If not, you can call KeSetEvent at any IRQL up to and including DISPATCH_LEVEL. If waiting, you must be running at PASSIVE_LEVEL.

If you need to put an event into the nonsignalled state, call KeClearEvent or call KeResetEvent to determine the previous event state. You can use KeReadStateEvent to read the event state. All these routines can be called at DISPATCH_LEVEL or lower.

For NT and W2000 drivers you can use IoCreateNotificationEvent and IoCreateSynchronizationEvent to share an event between two or more drivers.

Synchronization

A thread running at PASSIVE_LEVEL can synchronize with other activities by waiting for dispatcher objects such as events, Mutex objects, and semaphores. You can wait for timer and thread objects. Finally, you can also wait on file objects if they have been opened in ZwCreateFile for overlapped I/O.

Although driver dispatch routines run at PASSIVE_LEVEL, they should not wait on kernel dispatcher objects, other than with a zero time-out. You can wait for inherently synchronous operations to complete using nonzero time-outs. Plug and Play handlers can wait on dispatcher objects. For example, the ForwardIrpAndWait routine described in Chapter 9 uses an event to signal when lower drivers have finished processing an IRP.

A thread waits for dispatcher objects to become signalled using KeWaitForSingleObject or KeWaitForMultipleObjects, which are similar to the Win32 equivalents. As Table 14.1 shows, KeWaitForSingleObject waits on just one dispatcher object, or until a time-out has expired. A negative timeout value is used for relative periods, as a LARGE_INTEGER in units of 100 nanoseconds. The DebugPrint system thread calls KeWaitForSingleObject with a relativetime-out of one second. Positive time-out values represent an absolute system time, in 100-nanosecond units since January 1, 1601[32] in the GMT time zone.

The KeWaitForMultipleObjects routine works in a similar way, except that you can pass an array of dispatcher objects. You can opt to wait for just one of the objects to become signalled, or all of them.

Table 14.1 KeWaitForSingleObject function

NTSTATUS KeWaitForSingleObject (IRQL==PASSIVE_LEVEL) or at DISPATCH_LEVEL if a zero time-out is given
Parameter Description
IN PVOID Object Pointer to dispatcher object
IN KWAIT_REASON WaitReason Usually Executive for drivers, but can be UserRequest if running for user in a user thread.
IN KPROCESSOR_MODE WaitMode Kernel Mode for drivers
IN BOOLEAN Alertable FALSE for drivers
IN PLARGE_INTEGER Timeout NULL for an infinite time-out. Negative time-outs are relative. Positive time-outs are absolute.
Returns STATUS_SUCCESS STATUS_TIMEOUT

Mutex Objects

A Mutex is a mutual exclusion dispatcher object that can only be owned by one thread at a time. Mutexes are sometimes called "mutants." Initialize a KMUTEX object in nonpaged memory using KeInitializeMutex; the Level parameter is used to ensure that multiprocessor Windows 2000 systems can acquire multiple Mutexes safely.

A Mutex object is in the signalled state when it is available. A thread requests ownership using one of the KeWaitFor routines. If two or more threads are waiting for a Mutex, only one thread will wake up and become its owner. Call KeReleaseMutex to release ownership.

If you already own a Mutex and ask for it again, the KeWaitFor routine will return immediately. An internal counter is incremented, so call KeReleaseMutex once for each time you requested ownership of the Mutex.

The kernel causes a bugcheck if you do not release a Mutex before returning control to the I/O Manager. KeInitializeMutex and KeReleaseMutex must be called at PASSIVE_LEVEL You can also inspect the Mutex state using KeReadStateMutex at an IRQL up to and including DISPATCH_LEVEL

A Fast Mutex is a variation on an ordinary Mutex that is faster because it does not permit multiple ownership requests. An Executive Resource is another similar synchronization object, available in W2000 only. See the DDK documentation for more details of these objects.

Semaphores

A semaphore is a dispatcher object that maintains a count. Call KeInitializeSemaphore at PASSIVE_LEVEL IRQL to initialize a KSEMAPHORE object in nonpaged memory. You must specify maximum and initial counts.

A semaphore is nonsignalled when zero and signalled with any count greater than zero. A thread that calls one of the KeWaitFor routines and finds a signalled semaphore will decrement its count and the thread will proceed. If a semaphore's count is 2 and three threads simultaneously attempt to wait for the semaphore, only two will proceed. The semaphore count ends up as 0 with one thread still waiting.

Call KeReleaseSemaphore, at DISPATCH_LEVEL or lower, to add a value to a semaphore count. You can read the semaphore count at any IRQL using KeReadStateSemaphore.

Timer, Thread, and File Objects

A timer is a dispatcher object that becomes signalled when its timer expires. A file object becomes signalled when an overlapped I/O operation has completed. The file must have been opened in ZwCreateFile with the DesiredAccess SYNCHRONIZE flag set. You can also wait for thread completion.

DebugPrint System Thread Function

Listing 14.1 shows that the DebugPrint system thread for drivers under test primarily consists of a loop that waits for the ExitNow flag to become true or for trace events to arrive.

At the top of this main loop, the thread function calls KeWaitForSingleObject to wait for the ThreadEvent to become signalled. As stated previously, DebugPrintClose sets the ExitNow flag to true and sets the ThreadEvent into the signalled state. The thread function is released; if it finds ExitNow true, it exits its main loop, tidies up, and terminates.

The call to KeWaitForSingleObject includes a one-second time-out. This is used to let the thread function look for and process trace events in the EventList buffer, as described in the next sections.

Generating Trace Events

The two formatted print functions, DebugPrint and DebugPrint2, eventually call DebugPrintMsg. I will not go into the details of how the formatted prints work. You can work it out for yourself by looking at the code in the Print and DebugSprintf routines in DebugPrint.c. The only point to note is that the DebugPrint routines can accept a variable number of arguments. I have assumed successfully so far that the va_list macros defined in stdarg.h work satisfactorily in kernel mode drivers.

Listing 14.2 shows how DebugPrintMsg builds a trace event and puts it in a DEBUGPRINT_EVENT structure allocated from nonpaged memory. The DEBUGPRINT_EVENT structure is added into the EventList doubly-linked list. DebugPrintMsg is passed a NULL-terminated ANSI message string.

The event data consists of the following three items:

the current system time in GMT,

the driver name (specified in DebugPrintInit), and

the message.

DebugPrintMsg first gets the current system time in GMT using KeQuerySystemTime. It converts the number of 100-nanosecond intervals since January 1, 1601 into a more recognizable TIME_FIELDS structure using RtlTimeToTimeFields. The ExSystemTimeToLocalTime function (which converts from GMT to local time) is only available in W2000, so it is not used here. The time is converted to the local timezone in the DebugPrint Monitor application.

It would reduce the event structure size if the LARGE_INTEGER output from KeQuerySystemTime were stored directly. However, there is no equivalent of the RtlTimeToTimeFields routine in Win32, so the event structure holds the time as time fields.

DebugPrintMsg now works out the size of the event data (i.e., the three previous data items, including the strings' terminating NULLs). It then determines the size of the DEBUGPRINT_EVENT structure that envelops the event data. It allocates some nonpaged memory for this structure and fills it in. It then uses ExInterlockedInsertTailList to insert the DEBUGPRINT_EVENT structure at the end of EventList.

Listing 14.2 DebugPrint test driver DebugPrintMsg function

void DebugPrintMsg(char* Msg) {
if (!DebugPrintStarted) return;
// Get current time
LARGE_INTEGER Now, NowLocal;
KeQuerySystemTime(&Now);
TIME_FIELDS NowTF;
RtlTimeToTimeFields(&Now, &NowTF);
// Get size of Msg and complete event
USHORT MsgLen = ANSIstrlen(Msg)+1;
ULONG EventDataLen = sizeof(TIME_FIELDS) + DriverNameLen + MsgLen;
ULONG len = sizeof(LIST_ENTRY)+sizeof(ULONG)+EventDataLen;
// Allocate event buffer
PDEBUGPRINT_EVENT pEvent = (PDEBUGPRINT_EVENT)ExAllocatePool(NonPagedPool.len);
if (pEvent!=NULL) {
PUCHAR buffer = (PUCHAR)pEvent->EventData;
// Copy event info to buffer
RtlCopyMemory(buffer, &NowTF, sizeof(TIME_FIELDS));
buffer += sizeof(TIME_FIELDS);
RtlCopyMemory( buffer, DriverName, DriverNameLen);
buffer += DriverNameLen;
RtlCopyMemory(buffer, Msg, MsgLen);
// Insert event into event list for processing by system thread
pEvent->len = EventDataLen;
ExInterlockedInsertTailList(&EventList, &pEvent->ListEntry, &EventListLock);
}
}

Linked Lists

Doubly-Linked Lists

A doubly-linked list is a slightly complicated beast to use safely. First, you need to declare a LIST_ENTRY structure in nonpaged memory for the list head. Drivers that need one list per device declare the list head in the device extension. However, the DebugPrint test driver code declares just its EventList variable as a global, as it is available to all devices.

LIST_ENTRY EventList;

Next, define a structure that you want to put in your doubly-linked list. Include a LIST_ ENTRY field in this structure to provide the links in both directions of the list. The DebugPrint structure is called DEBUGPRINT_EVENT. EventData is a variable length field, as it is not always 1-byte long. The Len field gives its length.

typedef struct _DEBUGPRINT_EVENT {
LIST_ENTRY ListEntry;
ULONG Len;
UCHAR EventData[1];
} DEBUGPRINT_EVENT, *PDEBUGPRINT_EVENT;

Initialize a doubly-linked list using InitializeListHead, passing a pointer to the list head variable. You can now insert DEBUGPRINT_EVENT structures at the head or tail of the list using the InsertHeadList and InsertTailList routines. The corresponding RemoveHeadList and RemoveTailList routines remove entries from the list[33]. Find out if the list is empty first using IsListEmpty.

All well and good. However, it is important that attempts to access the list are carried out safely so that the links are not corrupted in a multiprocessor environment. The kernel provides interlocked versions of the add and remove routines that use a spin lock to guard access to the link structure. The DebugPrint test driver code uses a spin lock called EventListLock and initializes it as normal.

KeInitializeSpinLock(&EventListLock);
InitializeListHead(&EventList);

Listing 14.2 shows how to use one of the interlocked linked list routines, ExInterlockedInsertTailList. It is passed pointers to the list head, the LIST_ENTRY field in your structure, and the spin lock.

Listing 14.3 shows an extract from the DebugPrint test driver system thread function. This is the code that is run every second to see if any events have been produced by calls to DebugPrintMsg.

Listing 14.3 DebugPrint test driver system thread event processing

// Loop until all available events have been removed
while(true) {
PLIST_ENTRY pListEntry = ExInterlockedRemoveHeadList(&EventList, &EventListLock);
if (pListEntry==NULL) break;
// Get event as DEBUGPRINT_EVENT
PDEBUGPRINT_EVENT pEvent = CONTAINING_RECORD(pListEntry, DEBUGPRINT_EVENT, ListEntry);
// Get length of event data
ULONG EventDataLen = pEvent->Len;
// Send event to DebugPrint
NTSTATUS status = ZwWriteFile(DebugPrintDeviceHandle, NULL, NULL, NULL,
&IoStatus, pEvent->EventData, EventDataLen, &ByteOffset, NULL);
// Ignore error returns
// Free our event buffer
ExFreePool(pEvent);
}

The code loops until all the events in EventList have been removed and sent to the DebugPrint driver. It removes the first entry from the doubly-linked list using ExInterlockedRemoveHeadList, passing pointers to the list head and the guarding spin lock. The return value is NULL if there is nothing left in the list.

ExInterlockedRemoveHeadList returns a pointer to the ListEntry field in the DEBUGPRINT_ EVENT structure. What is really needed is not this, but a pointer to the DEBUGPRINT_EVENT structure itself. For this particular structure, a simple cast would suffice. However, there is a way to deal correctly with the general case in which the LIST_ENTRY variable is not at the beginning of the structure. The system header files provide the appropriate macro, CONTAINING_RECORD. Pass the LIST_ENTRY pointer, the data type of your structure and the name of its LIST_ENTRY field. The returned value is the pointer to the DEBUGPRINT_EVENT structure.

Having got the correct event pointer in pEvent, the system thread extracts the length of the event data and writes the event data itself to the DebugPrint driver using ZwWriteFile. Finally, it frees the memory that was allocated for the DEBUGPRINT_EVENT structure, before checking to see if any more events are available.

The ClearEvents routine is called to clear any remaining events when the system thread finishes or DebugPrintClose is called. ClearEvents removes any events using ExInterlockedRemoveHeadList and frees the event memory.

Singly-Linked Lists

There are kernel functions for singly-linked lists, which are really stacks. Declare a list head as a variable of type SINGLE_LIST_ENTRY. Initialize it by setting its Next field to NULL. PushEntryList puts an entry onto the front of the list, while PopEntryList removes an entry from the front of the list. You must use the same technique as before to get the correct pointer from a popped entry: include a SINGLE_LIST_ENTRY field in the structure you put on the list and use CONTAINING_RECORD.

There are also interlocked versions of these routines, using a spin lock as before to ensure accesses are carried out safely.

Device Queues

A device queue is an enhanced form of a doubly-linked list, usually used for storing IRPs. These are covered in glorious detail in Chapter 16.


Final Pieces

Let's draw together the final pieces of the DebugPrint code for test drivers.

DebugPrintInit makes a copy of the driver name passed to it. Why does it bother to do this? Well, initially it did not. However, the call to DebugPrintInit usually comes from a DriverEntry routine. DriverEntry code is often marked as being discardable once the driver has been initialized. This meant that the code with the driver name string might be discarded. I found that this did indeed happen and so the original pointer referred to invalid memory. Watch out for this problem in your drivers.

Listing 14.4 shows how the system thread opens a connection to the DebugPrint driver using ZwCreateFile. It opens a connection to a DebugPrint driver device called DevicePHDDebugPrint. The DebugPrint driver also uses a driver interface so that user mode programs, such as DebugPrint Monitor, can find this device.

While drivers can use Plug and Play Notification to find other drivers that support a device interface, this is a complicated approach. Instead, the one and only DebugPrint device is given a kernel name of PHDDebugPrint. Calls to ZwCreateFile do not use Win32 symbolic link names for devices. Instead, as in this example, they must use the full kernel device name.

The filename must be specified as a UNICODE_STRING. RtlInitUnicodeString is used to initialize the DebugPrintName variable. The filename is set into an OBJECT_ATTRIBUTES structure using InitializeObjectAttributes. A pointer to this structure is finally passed to ZwCreateFile. Like its Win32 equivalent, you must specify access and share parameters to ZwCreateFile. There are a host of other options, so consult the documentation for full details. Finally, you get a file HANDLE to use in further I/O requests.

As mentioned earlier, subsequent calls to ZwReadFile, ZwWriteFile, ZwQueryInformationFile, and ZwClose must take place in the same thread context as the call to ZwCreateFile. All these file I/O routines must be called at PASSIVE_LEVEL.

A typical call to ZwWriteFile is illustrated in Listing 14.3. As well as the file handle, simply specify the data pointer and a transfer count. The completion details can be found in an IO_STATUS_BLOCK structure. Specify a file pointer byte offset.

The DebugPrint test driver code eventually closes the file handle using ZwClose just before it terminates.

The system thread code tried to open its connection to the DebugPrint driver for 5 minutes. This ensures that the DebugPrint driver has started during system startup.

Listing 14.4 DebugPrint test driver system thread file opening

// Make appropriate ObjectAttributes for ZwCreateFile
UNICODE_STRING DebugPrintName;
RtlInitUnicodeString(&DebugPrintName, _"DevicePHDDebugPrint");
OBJECT_ATTRIBUTES ObjectAttributes;
InitializeObjectAttributes(&ObjectAttributes, &DebugPrintName, OBJ_CASE_INSENSITIVE, NULL, NULL);
// Open handle to DebugPrint device
IO_STATUS_BLOCK IoStatus;
HANDLE DebugPrintDeviceHandle = NULL;
NTSTATUS status = ZwCreateFile(&DebugPrintDeviceHandle,
GENERIC_READ | GENERIC_WRITE,
&ObjectAttributes, &IoStatus,
0, // alloc size = none
FILE_ATTRIBUTE_NORMAL,
FILE_SHARE_READ|FILE_SHARE_WRITE,
FILE_OPEN,
0,
NULL, // eabuffer
0); // ealength
if (!NT_SUCCESS(status) || DebugPrintDeviceHandle==NULL) goto exit1;


: 0.268. /Cache: 3 / 0