Chapter 8 - General Troubleshooting
This chapter identifies tools that are available in Windows NT to help you troubleshoot problems. It contains information about troubleshooting hardware problems and about using information in the Registry to determine why services are not working correctly. It also includes an example of troubleshooting with information from the Registry.
Careful record keeping is essential to successful troubleshooting. You should have records of your network layout, cabling, previous problems and their solutions, dates of installation of hardware and software, and so on, all readily accessible.
Many problems can be avoided with routine virus checks. Be sure to check for viruses before installing or upgrading Windows NT on a computer that is already in use.
This chapter points you to other chapters that contain troubleshooting help, explains a troubleshooting methodology, provides an overview of Windows NT tools, describes how to solve hardware problems, and explains how to identify which services or drivers are working.
Sources of Troubleshooting Information
In addition to the troubleshooting tools that are described in this chapter, there are several other sources of troubleshooting information:
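This section discusses approaches for solving a problem and presents an example of a troubleshooting scenario. There are three parts to this methodology: isolating the problem, identifying whether it works in other situations, and defining an action plan.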
Isolating the Problem
First, try to isolate the problem. What, precisely, is not working correctly? Try to pin down exactly what you expect to happen and how that differs from what is actually happening.
For example, if your computer does not complete startup, you need to identify how far it gets and write down any error messages. On an x86-based computer, if the system BIOS displays an error such as Missing operating system when you start your computer, the problem is very different from one in which startup fails after the boot loader (NTLDR) starts. You know that NTLDR has started when you see the message
NTDETECT V1.0 Checking Hardware . . .
Another way to isolate the problem is to find out whether related programs or functions work correctly on this computer. If they do, what are the differences between what works and what does not?
Identifying Whether It Works in Other Situations
Has what you are trying to do ever worked on this computer before? If so, something might have changed that affects it. Have you changed hardware or installed new software? Has somebody else been using the computer, and could that person have made changes you do not know about?
If this program or functionality has never worked on this computer, compare the setup and configuration on this computer with the same program on another computer to identify differences.
As an example, identical 624 MB IDE disks are installed on two different x86-based computers. On one computer, 609 MB are available after creating partitions. On the other computer, only 504 MB are available. If you look at the messages that the system BIOS displays when starting up the two computers, you may see that the computer with 609 MB has a newer BIOS than the other computer. You would need to upgrade the other computer's system BIOS, or obtain a third-party translation utility that enables the computer to access the entire disk.
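The 504 MB figure in this example matches the well-known ceiling on disk capacity when the BIOS addresses the disk without geometry translation. The Python sketch below computes that ceiling from the classic limits of 1024 cylinders, 16 heads, and 63 sectors per track with 512-byte sectors; these figures are stated here as assumptions, not read from the disks in the example.

```python
# Minimal sketch: the capacity ceiling of untranslated CHS (cylinder/head/sector)
# addressing. The geometry limits below are assumptions stated in the lead-in.
BYTES_PER_SECTOR = 512
MAX_CYLINDERS = 1024
MAX_HEADS = 16
MAX_SECTORS_PER_TRACK = 63
MB = 2**20  # 1 MB taken as 1,048,576 bytes


def chs_limit_bytes() -> int:
    """Largest capacity reachable without BIOS geometry translation."""
    return MAX_CYLINDERS * MAX_HEADS * MAX_SECTORS_PER_TRACK * BYTES_PER_SECTOR


def visible_mb(disk_bytes: int, bios_translates: bool) -> float:
    """Capacity the operating system can use, in MB."""
    usable = disk_bytes if bios_translates else min(disk_bytes, chs_limit_bytes())
    return usable / MB


if __name__ == "__main__":
    print(chs_limit_bytes())       # 528482304
    print(chs_limit_bytes() / MB)  # 504.0
```

With translation (from a newer BIOS or a third-party utility), the whole disk is visible; without it, usable capacity stops at 504 MB regardless of the disk's size.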
Defining an Action Plan
Try to identify all of the variables that could affect the problem. As you troubleshoot the problem, try to change only one of these variables at a time. Keep records of what you do and the effect of each action.
It helps to develop your plan on paper. Decide which steps you will take and what you will do based on the result of each step. Then perform the steps in order, following your plan.
If you see a result for which you have no plan:
Here is a scenario that shows how this approach was applied to an actual problem. A user was trying to upgrade a home computer to a newer version of Windows NT 4.0 (before the final product was available). The user was about halfway through copying files from the CD to the hard disk when a message appeared saying that a file could not be copied. This is how the user isolated the cause of the problem.
The user had successfully installed earlier versions of the software on this computer. Since the last upgrade of Windows NT, the CD-ROM drive had been used to install another program with no problems.
Nothing had been changed on the computer since the last upgrade except the installation of that other program, which should have no relationship to the problem. Other people were able to install the same version of Windows NT from CD on similar computers.
The user noticed that the CD-ROM drive made noises like it was spinning faster and then slower just before the error message.
These are the steps that the user took to identify and recover from the problem.
Step 1. Check the event log to see whether any errors are logged. The CD-ROM device was reporting bad blocks on the CD, so Windows NT knew that there were problems.
Step 2. Inspect the CD for dust or scratches. There were no obvious problems on the CD, and the user previously had no problems using the CD.
Step 3. Copy files from the CD manually rather than running Windows NT Setup. The file that caused the error copied fine, but other files on the CD could not be copied.
Step 4. Get another CD of the same version and try to install Windows NT from it, in case the problem is with the CD itself. Windows NT Setup failed on the same file with both CDs, and copying files manually failed on the same files.
Step 5. Install software from other CDs that have worked on this computer before. The user noted that some worked and some did not, and that the ones that failed had more data on them than the ones that installed successfully. Because data is recorded on a CD starting at the innermost track, this pointed to a problem reading data on the later (outer) tracks. The drive varies the disc's spin rate depending on whether it is reading inner or outer tracks, so something might be wrong with how the motor synchronizes its spin rate.
Step 6. Look inside the CD-ROM drive for signs of dust or hair that might interfere with proper operation at one end of the read head's range of motion. A hair was found stuck to the read head.
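The spin-rate changes noted in Step 5 follow from how many CD-ROM drives read discs: at constant linear velocity (CLV), so the disc must turn faster when the read head is near the center and slower near the edge. This Python sketch illustrates the relationship; the 1.2 m/s linear velocity and the 25 mm to 58 mm radii are illustrative assumptions, not measurements from this drive.

```python
import math

# Illustrative sketch of constant linear velocity (CLV) reading.
# Assumed values: 1.2 m/s linear velocity; data area from about 25 mm to 58 mm radius.
LINEAR_VELOCITY_M_S = 1.2


def rpm_at_radius(radius_mm: float, velocity_m_s: float = LINEAR_VELOCITY_M_S) -> float:
    """Rotations per minute needed to keep the track moving past the head at velocity_m_s."""
    if radius_mm <= 0:
        raise ValueError("radius must be positive")
    circumference_m = 2 * math.pi * radius_mm / 1000
    return velocity_m_s / circumference_m * 60


if __name__ == "__main__":
    for radius in (25, 40, 58):
        print(f"radius {radius} mm: about {rpm_at_radius(radius):.0f} rpm")
```

Under these assumptions the disc turns a little over twice as fast at the inner edge as at the outer edge, so a disc with more data drives the motor through a wider range of speeds. That is consistent with failures that appeared only on the fuller CDs.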
Using Troubleshooting Tools
This section provides a brief overview of the troubleshooting tools that are available on the Windows NT Server product CD and the Windows NT Server Resource Kit CD.
Windows NT Tools
These tools are installed when you install Windows NT Server:
Windows NT Server Resource Kit Tools
The Windows NT Server Resource Kit contains many tools that can be used for troubleshooting. For information about all of the tools available in the Windows NT Server Resource Kit, refer to the online Resource Kit Tools Help (Rktools.hlp) and double-click each of the tools groups from the Contents page.
Troubleshooting Hardware Problems
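There are three Microsoft products that you can use to help troubleshoot hardware problems: the Hardware Compatibility List (HCL), the Windows NT Hardware Detection Tool (NTHQ), and the Windows NT Diagnostics administrative tool.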
Using the Hardware Compatibility List (HCL)
The most common cause of hardware problems is the use of hardware that is not listed on the Hardware Compatibility List (HCL). The HCL included in the Windows NT Server Resource Kit lists the hardware components that have been tested and have passed compatibility testing with Windows NT version 4.0. It is especially important for you to refer to the HCL if you plan to use any modems, tape backup units, and SCSI adapters.
The latest HCL is available on:
To avoid problems, make sure that you are using a device make and model that is listed on the HCL. If several models from one manufacturer are included in the HCL, only those models are supported; a slightly different model might cause problems. Where special criteria are required for a model to be supported (for example, if a particular driver version is required), this information is given in a footnote in the HCL. The HCL is updated as additional hardware is tested, and new device drivers and other system components are added to it. The updated list and software are available through the electronic services listed at the end of the HCL.
Using the Windows NT Hardware Detection Tool (NTHQ)
NTHQ is an MS-DOS-based program. The next procedure describes how to run the program.
To run NTHQ
The file Readme.txt on the floppy disk contains details about NTHQ. You can see the same information by clicking the Help button on the NTHQ screen.
These are the three ways that NTHQ is most often used:
Using the Windows NT Diagnostics Administrative Tool
You can use this program to display Registry information in an easily readable format. The Windows NT Diagnostics Administrative Tool enables you to:
To run Windows NT Diagnostics
The information that you can view is organized into nine tabs. The next screen shot shows the kind of information that you see when you click the System tab.
These are the tabs that you can select in Windows NT Diagnostics:
Other Approaches to Troubleshooting Hardware Problems
If your hardware components are listed on the HCL, and you are still having problems, check that the physical connections are secure.
If you are using a SCSI device, check its termination. If you are having problems that could be due to incorrect termination, open the computer case and check the termination again, even if you are sure it is correct. Use active rather than passive terminators whenever possible.
Note Terminators are used to provide the correct impedance at the end of a cable. If the impedance is too high or too low, internal signal reflections can take place. These echoes represent noise on the cable, and can corrupt subsequent signals, which can result in degraded performance or data loss.
Passive terminators are resistors whose value matches the characteristic impedance of the cable. Active terminators use slightly more sophisticated electronics to better maintain the correct impedance needed to eliminate signal reflection.
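The impedance mismatch described in the note is usually quantified with the standard transmission-line reflection coefficient, gamma = (Z_load - Z_0) / (Z_load + Z_0), where Z_0 is the cable's characteristic impedance and Z_load is the terminating impedance. The Python sketch below evaluates it; the 90-ohm cable impedance and the terminator values are hypothetical numbers chosen for illustration, not SCSI specification values.

```python
# Sketch: reflection coefficient at the end of a cable.
#   gamma = (z_load - z0) / (z_load + z0)
# 0 means a perfect match (no echo); +1 is an open end, -1 a short circuit.
# All impedance values used below are hypothetical.


def reflection_coefficient(z_load_ohms: float, z0_ohms: float) -> float:
    if z0_ohms <= 0 or z_load_ohms < 0:
        raise ValueError("z0 must be positive and z_load non-negative")
    return (z_load_ohms - z0_ohms) / (z_load_ohms + z0_ohms)


if __name__ == "__main__":
    cable_z0 = 90.0  # hypothetical characteristic impedance of the cable
    for terminator in (90.0, 110.0, 150.0):
        gamma = reflection_coefficient(terminator, cable_z0)
        print(f"{terminator:5.0f} ohm terminator: reflection coefficient {gamma:+.2f}")
```

A terminator that stays closer to Z_0 keeps gamma near zero, which is the advantage the text attributes to active terminators.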
Verify that the SCSI cables are not longer than they need to be. If a two-foot cable is long enough to connect the device to the controller, do not use a three-foot cable just because you have one available. The acceptable lengths vary depending on such factors as: whether you are using basic SCSI, SCSI-2, wide SCSI, ultra-wide SCSI, differential SCSI; the quality of the termination; and the quality of the devices being used. Consult your hardware documentation for this information.
Check your hardware configuration. I/O and interrupt conflicts that went unnoticed under another operating system must be resolved when you switch to Windows NT. Likewise, you must pay much closer attention to CMOS and EISA configuration parameters when using Windows NT.
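A resource conflict comes down to two devices claiming the same setting. The Python sketch below groups a hypothetical device table by IRQ or I/O base address and reports any value claimed more than once. Where the real settings come from (jumpers, CMOS or EISA configuration utilities, or Windows NT Diagnostics) is outside the sketch.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical device settings, for illustration only.
DEVICES = {
    "sound card": {"irq": 5, "io": 0x220},
    "network adapter": {"irq": 5, "io": 0x300},
    "SCSI adapter": {"irq": 11, "io": 0x330},
}


def find_conflicts(devices: dict[str, dict[str, int]], resource: str) -> dict[int, list[str]]:
    """Return {value: [devices]} for every value of `resource` claimed by more than one device."""
    claims: dict[int, list[str]] = defaultdict(list)
    for name, settings in devices.items():
        if resource in settings:
            claims[settings[resource]].append(name)
    return {value: sorted(names) for value, names in claims.items() if len(names) > 1}


if __name__ == "__main__":
    for resource in ("irq", "io"):
        print(resource, find_conflicts(DEVICES, resource))
```

This checks only exact matches; I/O ranges that overlap without sharing a base address would need a range comparison.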
The Knowledge Base is a good source of information for hardware problems. There are several articles about memory problems, memory parity errors, SCSI problems, and other hardware information in the Knowledge Base.
If your computer crashes randomly and inconsistently, you might have memory problems. On x86-based computers, you can use the /maxmem switch in your Boot.ini file to troubleshoot memory problems. Chapter 6, "Troubleshooting Startup and Disk Problems," contains more information about the /maxmem switch and video problems.
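As an illustration of the /maxmem approach, the Python sketch below appends a /MAXMEM=n switch (n in megabytes) to an operating-system entry of the kind found in the [operating systems] section of Boot.ini, replacing any /MAXMEM switch already present. The ARC path and description in the example entry are hypothetical. Limiting memory this way lets you check whether the crashes stop when Windows NT no longer uses the memory above the limit.

```python
import re


def with_maxmem(entry: str, megabytes: int) -> str:
    """Return a Boot.ini OS entry carrying a single /MAXMEM=<megabytes> switch."""
    if megabytes <= 0:
        raise ValueError("megabytes must be positive")
    # Remove any existing switch; match case-insensitively because the switch
    # is written both ways (/maxmem and /MAXMEM).
    cleaned = re.sub(r"\s*/maxmem=\d+", "", entry, flags=re.IGNORECASE)
    return f"{cleaned} /MAXMEM={megabytes}"


# Hypothetical entry from the [operating systems] section.
ENTRY = r'multi(0)disk(0)rdisk(0)partition(1)\WINNT="Windows NT Server Version 4.00"'

if __name__ == "__main__":
    print(with_maxmem(ENTRY, 16))
```

Keep a backup copy of Boot.ini before editing it, and remove the switch when you have finished testing.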
Troubleshooting Using HKEY_LOCAL_MACHINE
Problems can often be traced to services, device drivers, or startup control data. The Registry key HKEY_LOCAL_MACHINE contains this configuration information, so it is a good place to look when solving these types of problems. You can use either of two Registry editors to look at information in the Registry: Regedt32.exe and Regedit.exe.
Most of the examples in this section use Regedt32.exe. You see the following screen when you run Regedt32.exe:
The following table briefly describes the Registry keys.
The HARDWARE and SYSTEM keys are the most useful for troubleshooting.
Note Do not change information in the Registry when you are using it for troubleshooting. Instead, use the options in Control Panel, such as Services, Devices, Network, and SCSI Adapters, to change Registry information.
The Registry information and examples in this section are for a Windows NT Server computer that uses the TCP/IP network protocol and gets its IP address from a DHCP server. If your computer has a different configuration, or has third-party device drivers or services installed, its Registry will contain different information.
This key describes the physical hardware in the computer. Because the data in the HARDWARE key is stored in binary form, the best way to view it is with Windows NT Diagnostics, one of the programs in the Administrative Tools (Common) program group. For more information about the program, see the section titled "Using the Windows NT Diagnostics Administrative Tool," presented earlier in this chapter.
For more information about the HKEY_LOCAL_MACHINE \HARDWARE key, see Chapter 23 in the Windows NT Workstation Resource Guide, "Overview of the Windows NT Registry."
The HKEY_LOCAL_MACHINE \SYSTEM key contains information that controls system startup, device driver loading, Windows NT services, and operating system behavior. All startup-related data that must be stored (rather than computed during startup) is saved in the SYSTEM key. This screen shot shows the SYSTEM key and its subkeys.
The most important troubleshooting information in the Registry key HKEY_LOCAL_MACHINE \SYSTEM is in the control sets. A control set contains system configuration information, such as which device drivers and services to load and start. There are at least two control sets, and sometimes more, depending on how often you change system settings or have problems with the settings you choose. The preceding screen shot shows the following control sets:
The Registry subkey HKEY_LOCAL_MACHINE \SYSTEM \Select identifies how the control sets are used, and determines which control set is used at startup. This subkey contains the following value entries:
The next screen shot shows the value entries for the Select subkey.
Note The Registry editors each display the Registry in a similar way. The window on the left contains the key and subkey names. The window on the right contains value entries. In the preceding screen shot, one value entry is Current : REG_DWORD : 0x1. In this example, Current is the name, REG_DWORD is the data type, and 0x1 is the value. These terms will be used in the rest of this section.
The values for the value entries in the Select subkey identify which control set is Current, Default, Failed, and LastKnownGood. For example, a value of 0x1 indicates that you should look at ControlSet001 to find the information.
In the preceding screen shot, Current and Default are both 0x1. Failed is 0, and LastKnownGood is 0x2.
Therefore, ControlSet001 is the Current and the Default control set. ControlSet001 will be the one modified if you make any changes by using options in Control Panel. ControlSet001 will be used for the Default control set the next time you start the computer.
ControlSet002 is the LastKnownGood control set. If you decide to use the Last Known Good control set to start the computer, Windows NT will use ControlSet002.
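The mapping just described, from a Select value to a ControlSetnnn subkey, can be sketched in Python. The values are hard-coded from the example screen shot rather than read from a live Registry (on Windows, Python's standard winreg module could read them), and treating 0 as "no control set", as for Failed here, is this sketch's interpretation.

```python
from typing import Optional

# Value entries from HKEY_LOCAL_MACHINE\SYSTEM\Select, copied from the example above.
SELECT = {"Current": 0x1, "Default": 0x1, "Failed": 0x0, "LastKnownGood": 0x2}


def control_set_name(number: int) -> Optional[str]:
    """Map a Select value to a control set subkey name; 0 is treated as 'none'."""
    if number < 0:
        raise ValueError("Select values are non-negative")
    return f"ControlSet{number:03d}" if number else None


if __name__ == "__main__":
    for entry, value in SELECT.items():
        print(f"{entry:<14} {value:#x} -> {control_set_name(value) or '(none)'}")
```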
For more information about the use of the control sets, see:
Finding Service and Device Dependencies
This section describes using information in the Control and Services subkeys to troubleshoot problems with your computer. The next screen shot shows the CurrentControlSet and its subkeys.
When you install Windows NT, it creates the Control and Services subkeys for each control set in HKEY_LOCAL_MACHINE \SYSTEM. Some information, such as which services are part of which group, and the order in which to load the groups, is the same for all Windows NT computers. Other information, such as which devices and services to load when you start your computer, is based on the hardware installed on your computer and the network software that you select for installation.
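The group load order mentioned here is defined by the ServiceGroupOrder subkey of the Control subkey, described below. A Python sketch of loading drivers group by group follows. The shortened group list and the driver names are hypothetical; keeping input order within a group and placing drivers with no listed group last are simplifying assumptions, not a description of how Windows NT orders drivers inside a group.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical, shortened ServiceGroupOrder list (the real list is longer).
GROUP_ORDER = ["SCSI miniport", "Primary disk", "Filter", "File system"]

# Hypothetical drivers as (name, Group value); None means the driver has no Group value.
DRIVERS = [
    ("ExampleFilter", "Filter"),
    ("ExampleUngrouped", None),
    ("ExampleDisk", "Primary disk"),
    ("ExampleMiniport", "SCSI miniport"),
]


def load_order(drivers: list[tuple[str, Optional[str]]], group_order: list[str]) -> list[str]:
    """Order drivers by their group's position in group_order (group names compared case-insensitively)."""
    rank = {group.casefold(): position for position, group in enumerate(group_order)}
    unlisted = len(group_order)  # assumption: drivers outside the listed groups come last
    # sorted() is stable, so drivers in the same group keep their input order.
    ordered = sorted(drivers, key=lambda driver: rank.get((driver[1] or "").casefold(), unlisted))
    return [name for name, _group in ordered]


if __name__ == "__main__":
    print(load_order(DRIVERS, GROUP_ORDER))
```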
Each control set has four subkeys:
You can see the order in which device drivers should be loaded and initialized by viewing the Registry subkey HKEY_LOCAL_MACHINE \SYSTEM \CurrentControlSet \Control \ServiceGroupOrder. Individual drivers that are members of a service group are loaded in this order:
"Service Groups," presented later in this chapter, lists drivers that are in each group.
The Registry subkey HKEY_LOCAL_MACHINE \SYSTEM \CurrentControlSet \Services \ServiceName, where ServiceName is the name of a service, controls how that service is loaded. This section describes some of the value entries for this subkey and explains their values. The next screen shot shows the subkey HKEY_LOCAL_MACHINE \SYSTEM \CurrentControlSet \Services \LanmanWorkstation and its value entries.
Figure 8.1. The Registry subkey HKEY_LOCAL_MACHINE \SYSTEM \CurrentControlSet \Services \LanmanWorkstation
When a subkey has a value for the DependOnGroup value entry, at least one service from the group must be loaded before this service is loaded. This table shows services that have a value for DependOnGroup. The LanmanWorkstation service, shown in Figure 8.1, has a value for the DependOnGroup value entry.
This value entry identifies specific services that must be loaded before this service is loaded. The "Troubleshooting Example," presented later in this chapter, shows how you can use information in the DependOnService value entry to determine which services need to be started.
This table lists the services on the example computer that have a value for DependOnService.
By knowing the dependencies, you can troubleshoot a problem more effectively. For example, if you stop the Workstation service, the Alerter, Messenger, and Net Logon services are also stopped, because they are dependent upon the Workstation service. If an error occurs when you try to start the Workstation service, any of the files that are part of Workstation service could be missing or corrupt. This is also why, if you start one of the services that depend on Workstation service, the Service Control Manager will automatically start the Workstation service if it is not already running.
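Both directions of this reasoning, which services stop along with a given service and which services must be started before it, are walks over the DependOnService data. The following Python sketch shows them on a small table built only from the relationships named above; it illustrates the logic and is not how the Service Control Manager is implemented.

```python
from __future__ import annotations

from collections import deque

# DependOnService relationships named in the text, keyed by display name.
# Real entries use service subkey names (for example, LanmanWorkstation),
# and this table may be incomplete.
DEPEND_ON_SERVICE = {
    "Workstation": [],
    "Alerter": ["Workstation"],
    "Messenger": ["Workstation"],
    "Net Logon": ["Workstation"],
}


def dependents(service: str, deps: dict[str, list[str]]) -> set[str]:
    """Services that depend on `service` directly or indirectly (stopped along with it)."""
    needed_by: dict[str, list[str]] = {}
    for svc, prerequisites in deps.items():
        for pre in prerequisites:
            needed_by.setdefault(pre, []).append(svc)
    found: set[str] = set()
    queue = deque([service])
    while queue:
        for svc in needed_by.get(queue.popleft(), []):
            if svc not in found:
                found.add(svc)
                queue.append(svc)
    return found


def start_order(service: str, deps: dict[str, list[str]]) -> list[str]:
    """Prerequisites first, then `service` (depth-first; cycles are not reported)."""
    order: list[str] = []
    seen: set[str] = set()

    def visit(svc: str) -> None:
        if svc in seen:
            return
        seen.add(svc)
        for pre in deps.get(svc, []):
            visit(pre)
        order.append(svc)

    visit(service)
    return order


if __name__ == "__main__":
    print(sorted(dependents("Workstation", DEPEND_ON_SERVICE)))
    print(start_order("Messenger", DEPEND_ON_SERVICE))
```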
This value entry controls whether an error during the startup of this driver will cause the system to switch to the LastKnownGood control set. If the value is 0 (ignore, no error is reported) or 1 (normal, error reported), startup proceeds. If the value is 2 (severe) or 3 (critical), an error is reported and LastKnownGood control set will be used.
The ErrorControl value for LanmanWorkstation is 0x1, which indicates that if there was an error starting LanmanWorkstation, an error would be logged in the event log, but Windows NT would complete startup.
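A lookup table is enough to capture the ErrorControl rule. This Python sketch encodes the four values exactly as this section describes them; it does not model what happens when startup is already using the LastKnownGood control set, so treat it as a reading aid rather than a complete specification.

```python
# ErrorControl values and outcomes as described in this section (simplified).
ERROR_CONTROL = {
    0x0: ("Ignore", "no error is reported; startup continues"),
    0x1: ("Normal", "an error is reported; startup continues"),
    0x2: ("Severe", "an error is reported; the LastKnownGood control set is used"),
    0x3: ("Critical", "an error is reported; the LastKnownGood control set is used"),
}


def on_start_failure(error_control: int) -> str:
    """Describe what happens if a driver or service with this ErrorControl value fails to start."""
    if error_control not in ERROR_CONTROL:
        raise ValueError(f"unexpected ErrorControl value {error_control:#x}")
    level, outcome = ERROR_CONTROL[error_control]
    return f"{level}: {outcome}"


if __name__ == "__main__":
    print(on_start_failure(0x1))  # the value shown for LanmanWorkstation
```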
This value entry identifies the path and file name of the driver. You can use My Computer or Windows NT Explorer to verify the existence of the named file. The ImagePath for LanmanWorkstation is %SystemRoot%\system32\services.exe.
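Before you can check that the file exists, any environment-variable reference such as %SystemRoot% has to be expanded. The Python sketch below performs that expansion; C:\WINNT is a hypothetical installation folder, and the sketch handles only %NAME% references.

```python
from __future__ import annotations

import re


def expand_image_path(image_path: str, env: dict[str, str]) -> str:
    """Replace %NAME% references with values from env; names are matched case-insensitively.

    Unknown names are left unchanged. Other ImagePath forms are not handled.
    """
    lookup = {name.upper(): value for name, value in env.items()}
    return re.sub(r"%([^%]+)%", lambda m: lookup.get(m.group(1).upper(), m.group(0)), image_path)


if __name__ == "__main__":
    # Hypothetical installation folder; on a real computer use the actual %SystemRoot%.
    path = expand_image_path(r"%SystemRoot%\system32\services.exe", {"SystemRoot": r"C:\WINNT"})
    print(path)  # C:\WINNT\system32\services.exe
    # On the Windows NT computer itself, os.path.isfile(path) would confirm the file exists.
```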
This value entry determines when services are loaded during system startup. If a service is not starting, you need to know when and how it should be starting. Then look for the services that should have been loaded prior to this service. The values are described as follows:
The Type value entry helps you know where the service fits in the architecture. These are its possible values:
Many of the services that have a Type value of 0x20 are part of Services.exe. For example, if your network protocol is TCP/IP and you are configured to use a DHCP server to get IP addresses, these services with a Type value of 0x20 are in Services.exe:
These services are part of Netdde.exe:
Many device drivers are arranged in groups to make startup easier. When device drivers are being loaded, Windows NT loads the groups in the order defined by ServiceGroupOrder. The next table shows which drivers are in each group.
This section describes using information in the DependOnGroup and DependOnService to find the cause of the following error message that you see after you log on.
You can use the Event Viewer to see which services or drivers did not start.
To run Event Viewer
The event log shows the following entries:
Sometimes, as the preceding System Log screen shot shows, several events are logged at approximately the same time. The newest event is listed at the top. Usually, if you look at the oldest event in such a group, you will find the reason that all of the events were logged. In the preceding example, the fourth entry from the top was the first one logged at 1:41:24. Double-clicking it displays this event detail.
However, the Registry contains no subkey HKEY_LOCAL_MACHINE \SYSTEM \CurrentControlSet \Services \Workstation. You can use either of two methods to find the right subkey.
You can use Regedit.exe to find the name anywhere in the control set.
To use Regedit.exe to find the Workstation service
If you think that the service name is part of the key name, you can use the Windows NT Registry Editor.
To use Regedt32.exe to find the Workstation service
Both Registry editors find a match on the subkey HKEY_LOCAL_MACHINE \SYSTEM \CurrentControlSet \Services \LanmanWorkstation. The DisplayName value entry contains the name that you see when you use the Services icon in Control Panel, or the Services tab in the Windows NT Diagnostics administrative tool, to view information about services.
Therefore, this is the subkey you are searching for. Its Start value is 0x4, which means that the service is disabled. It should be 0x2, which means that the service starts automatically when you start Windows NT.
In this example, the Workstation service had been disabled deliberately, by using the Services icon in Control Panel to set its Startup Type to Disabled, and the computer was then restarted to see what would happen.
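The Start and Type codes can be decoded with a pair of lookup tables, as in this Python sketch. The numeric meanings are the standard Windows NT service-configuration values (0x2 Automatic and 0x4 Disabled match the text); the Type value 0x20 in the example call is an assumption, based on LanmanWorkstation's ImagePath pointing at Services.exe.

```python
# Start and Type values for service and driver subkeys.
START = {0x0: "Boot", 0x1: "System", 0x2: "Automatic", 0x3: "Manual", 0x4: "Disabled"}
TYPE = {
    0x1: "kernel device driver",
    0x2: "file system driver",
    0x4: "adapter",
    0x8: "recognizer driver",
    0x10: "Win32 service in its own process",
    0x20: "Win32 service sharing a process",
}
INTERACTIVE = 0x100  # flag that can be combined with the Win32 service types


def describe(start: int, type_: int) -> str:
    """Readable summary of a subkey's Start and Type values."""
    kind = TYPE.get(type_ & ~INTERACTIVE, f"unknown type {type_:#x}")
    if type_ & INTERACTIVE:
        kind += " (interacts with the desktop)"
    return f"Start={START.get(start, f'unknown ({start:#x})')}; Type={kind}"


if __name__ == "__main__":
    print(describe(0x4, 0x20))  # LanmanWorkstation after it was disabled
```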
But what about the other errors that are in the event log? If you double-click each of the first three entries, you find the following descriptions:
The Messenger service depends on the Workstation service which failed to start because of the following error: The specified service is disabled and cannot be started.
The Computer Browser service depends on the TCP/IP NetBIOS Helper service which failed to start because of the following error: The dependency group or service failed to start.
The TCP/IP NetBIOS Helper service depends on the NetworkProvider group and no member of this group started.
Changing the LanmanWorkstation service to start automatically will solve the problem with the Messenger service failing to start.
The Computer Browser and TCP/IP NetBIOS Helper errors are both the result of no member of the NetworkProvider group starting. How do you find which services are in the NetworkProvider group? Regedt32.exe has no option for searching value data, so you can use Regedit.exe to find the NetworkProvider group.
To use Regedit.exe to find the NetworkProvider group
The only subkey that has a Group value of NetworkProvider is LanmanWorkstation. Changing LanmanWorkstation to start automatically will also solve these problems.
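Both searches in this example, by DisplayName and by Group data, together with the DependOnGroup rule that at least one member of the group must start, can be sketched in Python over an exported copy of the Services subkey. The data below is hypothetical and trimmed; LmHosts is used as the subkey name for the TCP/IP NetBIOS Helper service, and only dependencies named in the text are included.

```python
from __future__ import annotations

# Hypothetical, trimmed view of HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services:
# subkey name -> a few of its value entries.
SERVICES = {
    "LanmanWorkstation": {"DisplayName": "Workstation", "Group": "NetworkProvider", "Start": 0x4},
    "LmHosts": {"DisplayName": "TCP/IP NetBIOS Helper", "DependOnGroup": ["NetworkProvider"], "Start": 0x2},
    "Messenger": {"DisplayName": "Messenger", "DependOnService": ["LanmanWorkstation"], "Start": 0x2},
}


def find_subkeys(services: dict[str, dict[str, object]], value_name: str, data: str) -> list[str]:
    """Subkeys whose value entry `value_name` equals `data` (case-insensitive).

    Multi-string values (lists) match if any element matches.
    """
    wanted = data.casefold()
    hits = []
    for subkey, values in services.items():
        entry = values.get(value_name)
        items = entry if isinstance(entry, list) else [entry]
        if any(isinstance(item, str) and item.casefold() == wanted for item in items):
            hits.append(subkey)
    return sorted(hits)


def group_dependency_met(services: dict[str, dict[str, object]], group: str, running: set[str]) -> bool:
    """DependOnGroup rule: at least one member of `group` must be running."""
    return any(member in running for member in find_subkeys(services, "Group", group))


if __name__ == "__main__":
    print(find_subkeys(SERVICES, "DisplayName", "Workstation"))
    print(find_subkeys(SERVICES, "Group", "NetworkProvider"))
    print(group_dependency_met(SERVICES, "NetworkProvider", running=set()))
```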
Identifying a Service or Driver That Doesn't Start
Some services are configured to start automatically on Windows NT. The specific services depend on your computer configuration, and which network services and protocols you are using.
You can use the Services option in Control Panel to see which services should have started automatically and which ones did start. For example, the next screen shot was taken when the Workstation service was disabled.
You can see that TCP/IP NetBIOS Helper is configured to start automatically, but it did not start. The section "Troubleshooting Example," presented earlier in this chapter, describes why it did not start.
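The comparison you make on the Services screen, between services set to start automatically and services that are actually running, amounts to a set difference. A Python sketch over hypothetical data:

```python
from __future__ import annotations

AUTOMATIC = 0x2  # Start value for services that start automatically

# Hypothetical snapshot: display name -> Start value, and the services that started.
START_VALUES = {
    "Workstation": 0x4,
    "TCP/IP NetBIOS Helper": 0x2,
    "Messenger": 0x2,
    "Server": 0x2,
    "Alerter": 0x3,
}
RUNNING = {"Server"}


def failed_autostarts(start_values: dict[str, int], running: set[str]) -> list[str]:
    """Services configured to start automatically that are not running."""
    return sorted(name for name, start in start_values.items() if start == AUTOMATIC and name not in running)


if __name__ == "__main__":
    print(failed_autostarts(START_VALUES, RUNNING))
```

A disabled service such as Workstation in this snapshot does not appear in the result, which is why the cause of the TCP/IP NetBIOS Helper failure still has to be traced through its dependencies.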
Sometimes, if a file that is needed to load or run Windows NT becomes corrupt or is deleted, the system displays a message about a problem with the file. You might also get information logged in the event log. You can use the message or the information in the event log to find the problem.
Not all executables or dynamic-link libraries report missing or corrupt files, and the symptoms of a missing file can be unpredictable. What do you do if there is no indication of an error, but you think some component did not start correctly?
You can check to see if all the Windows NT system files exist and appear to be uncorrupted. Symptoms of file corruption include a file being an unusual size (for example, zero bytes or larger than its original size), or having a date or time that does not match the Windows NT installation date or dates on service packs that you have installed. You can compare files in your %systemroot%\System32 folder and subfolders with files in these folders on another computer that has the same Windows NT version and service packs installed.
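The size checks in this paragraph can be scripted. This Python sketch assumes you can reach both the suspect %systemroot%\System32 folder and the same folder on a reference computer with the same version and service packs (for example, through a network share). It compares sizes only and leaves date checks to you.

```python
from __future__ import annotations

from pathlib import Path


def file_sizes(folder: str | Path) -> dict[str, int]:
    """Map each file's path relative to `folder` (lower-cased, because Windows NT
    file names are not case-sensitive) to its size in bytes."""
    root = Path(folder)
    return {
        str(path.relative_to(root)).lower(): path.stat().st_size
        for path in root.rglob("*")
        if path.is_file()
    }


def suspicious_files(local: dict[str, int], reference: dict[str, int]) -> dict[str, str]:
    """Flag zero-byte files, size differences from the reference, and files missing locally."""
    findings: dict[str, str] = {}
    for name, size in local.items():
        if size == 0:
            findings[name] = "zero bytes"
        elif name in reference and size != reference[name]:
            findings[name] = f"{size} bytes; reference copy has {reference[name]}"
    for name in reference.keys() - local.keys():
        findings[name] = "missing"
    return dict(sorted(findings.items()))
```

For example, suspicious_files(file_sizes(r"C:\WINNT\system32"), file_sizes(r"\\refserver\c$\WINNT\system32")) would compare the two folders; both paths, including the server name refserver, are hypothetical. A size difference is a reason to look more closely, not proof of corruption.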
If you think that you might be having a problem with a Windows NT system file, you can run Windows NT Setup and repair the problem by using the Verify Windows NT system files option.
If you can log on to your computer, you can use the Drivers utility on the Windows NT Server Resource Kit CD to display information about all the drivers that were loaded. If you have previously printed the output from the Drivers utility (by redirecting the output to a printer or a file), you can compare the previous output with one that you produce when you think you might be having problems with drivers not loading. Another method of determining if there are drivers missing from the list is to run the Drivers utility on a similar computer and compare the results.
The following table is a description of the output from the Drivers utility. The most important field is ModuleName, which is the name of the component.
To get a hard copy of the output from the Drivers utility, enter drivers >filename at the command prompt, and then print the file. The next example shows some of the output from a Drivers report.
ModuleName      Code   Data  Bss   Paged   Init  LinkDate
------------------------------------------------------------------------
ntoskrnl.exe  265472  39040    0  432128  76800  Mon Apr 15 16:28:07 1996
hal.dll        19904   2272    0    8992  10784  Wed Apr 10 10:24:22 1996
Atdisk.sys     12352     64    0       0  10368  Wed Apr 10 10:30:24 1996
Disk.sys        2304      0    0    7648   1504  Wed Apr 10 10:31:18 1996
CLASS2.SYS      6112      0    0    1472   1024  Thu Apr 11 09:21:58 1996
Ftdisk.sys     22880     32    0    1504   2048  Fri Apr 12 12:00:30 1996
Diskperf.sys    2048      0    0       0    768  Wed Apr 10 10:30:17 1996
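Comparing a saved Drivers report with a new one, as suggested earlier, is a set difference on the ModuleName column. The Python sketch below assumes the layout shown in the sample output (a module name followed by the Code, Data, Bss, Paged, and Init sizes); other versions of the utility may format their output differently.

```python
from __future__ import annotations


def module_names(report: str) -> set[str]:
    """Module names from Drivers output, lower-cased; header and separator lines are skipped."""
    names: set[str] = set()
    for line in report.splitlines():
        fields = line.split()
        # A data row is a module name followed by five numeric size columns.
        if len(fields) >= 6 and all(field.isdigit() for field in fields[1:6]):
            names.add(fields[0].lower())
    return names


def missing_modules(baseline: str, current: str) -> list[str]:
    """Modules present in the baseline report but absent from the current one."""
    return sorted(module_names(baseline) - module_names(current))
```

For example, with two saved reports (the file names here are hypothetical), missing_modules(Path("drivers_before.txt").read_text(), Path("drivers_now.txt").read_text()) lists the drivers that loaded before but not now.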