A couple of weeks ago I wrote a script called BSODCheck.vbs. The script remotely checks whether a machine has created a new memory.dmp file. A new file indicates that the machine has blue screened and crashed.
The script is intended to be run periodically. It compares its results against those of the previous run to tell you whether a new dump has been created since then. After using this script for a few weeks, I noticed that some of our servers were crashing for no apparent reason.
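To illustrate the idea (this is not the BSODCheck.vbs source, which is VBScript), here is a minimal Python sketch of the same comparison. It assumes each dump file is reachable through a path you supply, for example an administrative share such as \\server01\c$\WINDOWS\MEMORY.DMP, where "server01" is a made-up host name. The function name and the JSON state file are also my own inventions for this example.

```python
import json
import os
from pathlib import Path


def find_new_dumps(dump_paths, state_file):
    """Return hosts whose dump file is newer than the one seen on the last run.

    dump_paths maps a host name to the path of its memory.dmp file.
    state_file is a small JSON file that remembers the last seen timestamps.
    """
    state_path = Path(state_file)
    previous = json.loads(state_path.read_text()) if state_path.exists() else {}

    current = dict(previous)  # keep old entries for hosts we cannot reach now
    new_dumps = []
    for host, dump in dump_paths.items():
        try:
            mtime = os.path.getmtime(dump)
        except OSError:
            # No dump file, or the host is unreachable: nothing to report.
            continue
        if mtime > previous.get(host, 0):
            new_dumps.append(host)
        current[host] = mtime

    state_path.write_text(json.dumps(current))
    return new_dumps
```

On the very first run every existing dump is reported as new, because there is no earlier result to compare against.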
The first step to resolving this issue was to take a look at the dump files themselves. To do that I needed to copy the dump files to a local directory and examine them with a utility called WinDbg. WinDbg is part of the Debugging Tools for Windows, which are available free of charge from Microsoft.
After downloading and installing the debugging tools, I needed to configure WinDbg to use the correct symbol files. This is done by clicking “File -> Symbol File Path …” and then entering the following text in the “Symbol path:” box.
SRV*c:\symbols*http://msdl.microsoft.com/download/symbols
Now that this is configured I can open the memory.dmp files that I copied from the machines earlier. That is done by clicking “File -> Open Crash Dump …” and browsing to the location of the memory.dmp file.
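If you have many dumps to go through, the same setup can be scripted instead of clicked through. The debugging tools also include a console debugger, kd.exe, which accepts -y for the symbol path, -z for the dump file, and -c for the commands to run on startup. The sketch below is only an illustration: it assumes kd.exe is on the PATH, the function names are mine, and the default command is the analysis command used in the next step followed by q to quit.

```python
import subprocess

# Same symbol path as entered in the WinDbg dialog above.
SYMBOL_PATH = r"SRV*c:\symbols*http://msdl.microsoft.com/download/symbols"


def build_debugger_command(dump_file, debugger="kd.exe", commands="!analyze -v; q"):
    """Build the argument list for a non-interactive debugger run."""
    return [debugger, "-y", SYMBOL_PATH, "-z", dump_file, "-c", commands]


def analyze_dump(dump_file):
    """Run the debugger on one dump file and return its text output.

    Assumes kd.exe from the debugging tools can be found on the PATH.
    """
    result = subprocess.run(
        build_debugger_command(dump_file),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```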
After the file has been opened I can analyze the crash with the “!analyze -v” command. That command revealed the following crash dump analysis information:
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

RDR_FILE_SYSTEM (27)
If you see RxExceptionFilter on the stack then the 2nd and 3rd parameters are the
exception record and context record. Do a .cxr on the 3rd parameter and then kb to
obtain a more informative stack trace.
The high 16 bits of the first parameter is the RDBSS bugcheck code, which is defined
as follows:
RDBSS_BUG_CHECK_CACHESUP = 0xca550000,
RDBSS_BUG_CHECK_CLEANUP = 0xc1ee0000,
RDBSS_BUG_CHECK_CLOSE = 0xc10e0000,
RDBSS_BUG_CHECK_NTEXCEPT = 0xbaad0000,
Arguments:
Arg1: baad0080
Arg2: f78de840
Arg3: f78de53c
Arg4: b9ebf79d

Debugging Details:
------------------

EXCEPTION_RECORD: f78de840 -- (.exr fffffffff78de840)
ExceptionAddress: b9ebf79d (SYMEVENT!SYMEvent_GetVMDataPtr+0x00004dbd)
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 00000000
Parameter[1]: 00000028
Attempt to read from address 00000028

CONTEXT: f78de53c -- (.cxr fffffffff78de53c)
eax=88ff58e8 ebx=00000000 ecx=00000004 edx=00000000 esi=89f67cf0 edi=b6f4cfa9
eip=b9ebf79d esp=f78de908 ebp=f78de974 iopl=0 nv up ei pl nz na po nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010202
SYMEVENT!SYMEvent_GetVMDataPtr+0x4dbd:
b9ebf79d 8a5124 mov dl,byte ptr [ecx+24h] ds:0023:00000028=??
Resetting default scope

PROCESS_NAME: System

CURRENT_IRQL: 0

ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at "0x%08lx" referenced memory at "0x%08lx". The memory could not be "%s".

READ_ADDRESS: 00000028

BUGCHECK_STR: 0x27

DEFAULT_BUCKET_ID: NULL_CLASS_PTR_DEREFERENCE

LAST_CONTROL_TRANSFER: from 8080be67 to b9ebf79d

STACK_TEXT:
WARNING: Stack unwind information not available. Following frames may be wrong.
f78de974 8080be67 88ff58e8 89f67cf0 8a113c00 SYMEVENT!SYMEvent_GetVMDataPtr+0x4dbd
f78de9c0 8080db05 8a0d5aa8 00000000 89f67c00 nt!IopCompleteUnloadOrDelete+0xce
f78de9d8 b9ebd268 89f67c00 8a113c80 b9ec9c28 nt!IoDeleteDevice+0x81
f78de9e4 b9ec9c28 00000000 b9ec56d8 00000001 SYMEVENT!SYMEvent_GetVMDataPtr+0x2888
f78de9ec b9ec56d8 00000001 8a113c80 89f67cf0 SYMEVENT!EventObjectCreate+0xb98
00000000 00000000 00000000 00000000 00000000 SYMEVENT!SYMEvent_GetSubTask+0x18c8

FOLLOWUP_IP:
SYMEVENT!SYMEvent_GetVMDataPtr+4dbd
b9ebf79d 8a5124 mov dl,byte ptr [ecx+24h]

SYMBOL_STACK_INDEX: 0

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: SYMEVENT

IMAGE_NAME: SYMEVENT.SYS

DEBUG_FLR_IMAGE_TIMESTAMP: 42ddb287

SYMBOL_NAME: SYMEVENT!SYMEvent_GetVMDataPtr+4dbd

STACK_COMMAND: .cxr 0xfffffffff78de53c ; kb

FAILURE_BUCKET_ID: 0x27_SYMEVENT!SYMEvent_GetVMDataPtr+4dbd

BUCKET_ID: 0x27_SYMEVENT!SYMEvent_GetVMDataPtr+4dbd

Followup: MachineOwner
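The output says that the high 16 bits of the first argument select the RDBSS bugcheck code. As a small worked example, the Python snippet below splits Arg1 (baad0080) into its two halves and looks the high half up in the table printed by the debugger. The function name is my own. I return the low half as-is without interpreting it, since the output above does not say what it means.

```python
# High 16 bits of each RDBSS bugcheck code, taken from the !analyze output.
RDBSS_BUGCHECK_CODES = {
    0xCA55: "RDBSS_BUG_CHECK_CACHESUP",
    0xC1EE: "RDBSS_BUG_CHECK_CLEANUP",
    0xC10E: "RDBSS_BUG_CHECK_CLOSE",
    0xBAAD: "RDBSS_BUG_CHECK_NTEXCEPT",
}


def decode_rdbss_arg1(arg1):
    """Split Arg1 into (bugcheck code name, low 16 bits)."""
    high = (arg1 >> 16) & 0xFFFF
    low = arg1 & 0xFFFF
    return RDBSS_BUGCHECK_CODES.get(high, "unknown"), low


print(decode_rdbss_arg1(0xBAAD0080))  # ('RDBSS_BUG_CHECK_NTEXCEPT', 128)

# The faulting instruction reads from [ecx+24h] with ecx=00000004,
# which is the READ_ADDRESS reported in the dump.
print(hex(0x00000004 + 0x24))  # 0x28
```

So this crash is the NTEXCEPT flavor of the bugcheck: an exception (here the access violation at address 00000028) was raised while the redirector code was running.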
When we look at crash dumps we can keep a few things in mind. Even though Windows has a reputation for being unstable, most of the time the cause of a crash is not related to Microsoft code at all. So in the vast majority of cases we can exclude code written by Microsoft as the likely culprit.
In this particular dump analysis we can see that the module that faulted was SYMEVENT.SYS. This file is a driver created by Symantec that is used to scan files for viruses.
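When there are several dumps to compare, it helps to pull just these fields out of the saved analysis text. Here is a minimal Python sketch, assuming the output was saved as plain text. The function name and the default list of fields are my own choices, not anything WinDbg provides.

```python
import re


def extract_fields(analysis_text,
                   names=("BUGCHECK_STR", "DEFAULT_BUCKET_ID",
                          "MODULE_NAME", "IMAGE_NAME")):
    """Return a dict of selected 'NAME: value' fields from !analyze -v output."""
    fields = {}
    for name in names:
        # \b keeps BUCKET_ID from matching inside DEFAULT_BUCKET_ID and so on.
        match = re.search(rf"\b{re.escape(name)}:\s*(\S+)", analysis_text)
        if match:
            fields[name] = match.group(1)
    return fields


sample = """BUGCHECK_STR: 0x27
DEFAULT_BUCKET_ID: NULL_CLASS_PTR_DEREFERENCE
MODULE_NAME: SYMEVENT
IMAGE_NAME: SYMEVENT.SYS
"""
print(extract_fields(sample)["IMAGE_NAME"])  # SYMEVENT.SYS
```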
1: kd> lmvm SYMEVENT
start end module name
b9afa000 b9b16cc0 SYMEVENT (export symbols) SYMEVENT.SYS
Loaded symbol image file: SYMEVENT.SYS
Image path: \??\C:\Program Files\Symantec\SYMEVENT.SYS
Image name: SYMEVENT.SYS
Timestamp: Tue Jul 19 21:10:15 2005 (42DDB287)
CheckSum: 00024467
ImageSize: 0001CCC0
Translations: 0000.04b0 0000.04e0 0409.04b0 0409.04e0
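The hexadecimal value in parentheses after the timestamp is the driver's build time stored as seconds since January 1, 1970 (UTC). It is the same value that appears as DEBUG_FLR_IMAGE_TIMESTAMP in the analysis. Decoding it is a quick way to see how old a driver is. The helper name in this short Python example is my own:

```python
from datetime import datetime, timezone


def link_timestamp_to_utc(hex_stamp):
    """Convert a hex image timestamp (seconds since 1970) to a UTC datetime."""
    return datetime.fromtimestamp(int(hex_stamp, 16), tz=timezone.utc)


print(link_timestamp_to_utc("42DDB287").isoformat())  # 2005-07-20T02:10:15+00:00
```

WinDbg prints the same moment in the local time of the machine doing the analysis, which is why the listing shows Tue Jul 19 21:10:15 2005, five hours behind UTC.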
I also checked the event log of this machine and it revealed the following stop error:
After Googling for this particular combination of stop error and the faulting module, I found the following link explaining the issue in some detail.
From the article:
Situation:
You install Symantec AntiVirus on a computer that runs Windows 2003/XP/2000/NT. After the installation, the computer unexpectedly restarts or encounters a blue screen with a STOP message similar to the following:
STOP 0x0000007f (0x00000008, 0x00000000, 0x00000000, 0x00000000)
UNEXPECTED_KERNEL_MODE_TRAP
You may see the following message in the Event Log: “Event ID: 1005. Source: SAVRT: Symantec AntiVirus Auto-Protect could not scan file <path><filename> for viruses due to low kernel stack.”
A common configuration for this situation is a Windows 2000 Server with Terminal Services in Remote Administration Mode with a combination of any of the following applications: Symantec AntiVirus Corporate Edition, St. Bernard Open File Manager, Quota Manager, Legato RepliStor, or other “filter drivers” that register with the Kernel Stack.
Solution:
This problem occurs because there is a limited amount of kernel space available for kernel drivers. If the operating system runs out of kernel space, then the computer displays a blue screen error message.
To fix this problem, do all of the procedures and all of the steps within each procedure. Do the procedures and steps in the order in which they appear.