

Chapter 14
Demystifying The “Blue Screen”

  Understanding A Core Dump
  Understanding Kernel Mode Error Messages
  Using A Memory Dump File

If you believe in Murphy’s law that “if anything can go wrong, it will,” then you already understand why Microsoft includes the ability to create a core dump in its operating systems. A core dump is that mysterious blue screen with white characters that you see when Windows NT has a system failure. Many of us in the industry refer to this screen as the “Blue Screen Of Death” (BSOD). We like to inject a little humor into the situation, because, if you can’t laugh at the problem, you are just going to get an ulcer. Believe me, I’ve been there many times. A core dump includes specific information about a problem that Windows NT is unable to deal with on its own.

A core dump is primarily used by three types of people. The first is the Windows NT device driver developer. This person writes kernel mode device drivers for Windows NT and usually has a very intimate knowledge of the kernel. The developer generally spends quite a bit of time debugging the device driver. The second is usually a professional support technician. This person spends a good deal of time trying to determine the cause of a problem to prevent future occurrences of the same problem. Most times, this person is a Microsoft employee who has been called in to help solve the problem of a recurring system crash on a server. The third type of person who uses a core dump is you. You might not get deeply involved in dissecting a core dump every time, but you’ve probably seen the blue screen often enough to get a feel for some of the possible errors that can occur on your system. Then again, you might want to get more involved with using a core dump to determine the cause of a problem but lack the knowledge of how to accomplish this goal. That’s where I come into the picture. I’m going to try to help you gain a better understanding of the process and what you can do to determine the root cause of a problem.

Sometimes, a problem just takes a bit of common sense. For example, let’s say you just installed new software or hardware and the computer dumped a blue screen in your face after you rebooted. In this case, you could be fairly certain that the new software or hardware is the cause of the failure. You might not know whether the failure was directly or indirectly caused by the new software or hardware, but you would know it happened because you introduced something new into the system. Most times, you’ll just choose the Last Known Good option at system startup, and the server will once more boot without a problem. To determine the cause of the problem, however, you really need to understand more about the information that a core dump provides.

Understanding A Core Dump

If you are fortunate, you will never encounter a core dump. Realistically, though, you should be prepared for one. When a core dump occurs, it means Windows NT encountered an error it could not handle. If you enabled the Recovery option to write debugging information to a dump file, a file called MEMORY.DMP will also be placed in the SystemRoot directory. This dump file will be the same size as the physical memory installed in your system and will contain a complete dump of the system RAM at the time the error occurred. If you have a server with 64MB of RAM, your MEMORY.DMP file will be 64MB in size. You will explore the options for reviewing a dump file later in this chapter, in the section entitled “Dissecting A Memory Dump File.” For now, let’s concentrate on core dumps.
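
Because the dump file is as large as your physical RAM, it pays to confirm ahead of time that the drive holding the SystemRoot directory has room for it. The following is a minimal Python sketch of that arithmetic, not a Microsoft tool. The function names are hypothetical, and the sketch assumes the dump is exactly the size of installed RAM, as described above.

```python
import os
import shutil

MB = 1024 * 1024  # bytes per megabyte


def expected_dump_size_bytes(physical_ram_mb: int) -> int:
    """Size of a complete dump, assuming it equals installed RAM."""
    return physical_ram_mb * MB


def has_room_for_dump(physical_ram_mb: int, free_bytes: int) -> bool:
    """True if free_bytes can hold a complete dump of the given RAM size."""
    return free_bytes >= expected_dump_size_bytes(physical_ram_mb)


if __name__ == "__main__":
    # Fall back to the current directory when SystemRoot is not defined
    # (for example, when trying the sketch on a non-Windows machine).
    target = os.environ.get("SystemRoot", ".")
    free = shutil.disk_usage(target).free
    print("Room for a 64MB dump:", has_room_for_dump(64, free))
```

For the 64MB server in the example, expected_dump_size_bytes(64) works out to 67,108,864 bytes, so the drive needs at least that much free space.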


Setting Your Recovery Options

You specify your recovery options through the System applet in the Control Panel. The recovery options appear on the Startup/Shutdown property sheet. Within this property sheet, you can specify the following settings:

  Write an event to the system log—Specifies to insert a copy of the stop message, not the complete dump, into the system event log when a stop message is encountered.
  Send an administrative alert—Specifies to send an alert to your administrator that a stop message was encountered along with a copy of the stop message. To use this option, the Alerter and Messenger services must be running on both computers.
  Write debugging information to—Specifies to write a dump file to the specified file in the specified location.
  Overwrite any existing file—Specifies to write a new dump file over a previous version, if it exists.
  Automatically reboot—Specifies to reboot the computer after writing the dump file or, if no dump file was selected to be written, shortly after the core dump screen has been displayed.
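
If you prefer to inspect these settings from a script, the five options above can be modeled as a small data structure. The sketch below is only an illustration. It assumes the settings are stored as values under the CrashControl registry key. Those value names are not taken from this chapter, so verify them on your own system before relying on them. The class and function names are hypothetical, and the read function works only on a Windows machine with Python installed.

```python
from dataclasses import dataclass

# Assumed location of the recovery settings under HKEY_LOCAL_MACHINE.
CRASH_CONTROL_KEY = r"SYSTEM\CurrentControlSet\Control\CrashControl"


@dataclass
class RecoveryOptions:
    """Hypothetical model of the five Recovery settings."""
    log_event: bool = True       # Write an event to the system log
    send_alert: bool = True      # Send an administrative alert
    write_dump: bool = True      # Write debugging information to a file
    dump_file: str = r"%SystemRoot%\MEMORY.DMP"
    overwrite: bool = True       # Overwrite any existing file
    auto_reboot: bool = True     # Automatically reboot

    def to_registry_values(self) -> dict:
        """Map each option to its assumed registry value name."""
        return {
            "LogEvent": int(self.log_event),
            "SendAlert": int(self.send_alert),
            "CrashDumpEnabled": int(self.write_dump),
            "DumpFile": self.dump_file,
            "Overwrite": int(self.overwrite),
            "AutoReboot": int(self.auto_reboot),
        }


def read_recovery_options() -> dict:
    """Read the current values from the registry (Windows only)."""
    import winreg  # imported here so the rest of the sketch loads anywhere

    result = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CRASH_CONTROL_KEY) as key:
        for name in RecoveryOptions().to_registry_values():
            try:
                result[name], _type = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                result[name] = None  # value not present on this system
    return result
```

Comparing the output of read_recovery_options() against RecoveryOptions().to_registry_values() is one way to spot a computer where an option has been switched off.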

By default, these options are enabled when you install Windows NT Server. On a Windows NT Workstation, these options are disabled. For what it is worth, I always recommend enabling the preceding options on all computers, regardless of whether they are workstations or servers. When a stop error occurs, you will want to know about it simply because a stop error shouldn’t occur normally. If it does occur, you will want to know why, so you can prevent the error from recurring.


