Code for this article: Aug98Bugslayer.exe (75KB)
John Robbins is a software engineer at NuMega Technologies Inc. who specializes in debuggers. He can be reached at firstname.lastname@example.org.
You've probably heard the proverb, "An ounce of prevention is worth a pound of cure." When it comes to handling crash problems in your code, the proverb should be: "A couple of key lines of code can keep your customers using your application so you can keep your job." Well, I guess that might not be as pithy and memorable as the original, but at least my proverb mentions code.
Since I am not going to make a million bucks writing proverbs, I had better stick to what I need to talk about here in the Bugslayer column. This month I'll cover exception handlers and unhandled exception filters or crash handlers. If you have been doing any C++ programming at all, you have probably already dealt with exception handlers. Crash handlers are those routines that can gain control right before the application shows that nice fault dialog that drives your users crazy. While the exception handlers are C++-specific, the crash handlers work with both C++ and Visual Basic®-based code.
To help you make your applications more robust, I will show you how to apply some of the assistance the operating system and the compiler offer. Additionally, if used judiciously, these ideas allow you to gather more information when your app does crash. This lets you solve potential problems faster. I will start out with a brief primer on exception and crash handling as the basis for some of the concepts that I will discuss later. I will also discuss the reusable code that I wrote for this column, which you can use in your exception and crash handlers. Finally, I will deal with some of the issues that have surfaced about the IMAGEHLP symbol engine that I first presented with the CrashFinder application in the April 1998 Bugslayer column.
Structured Exception Is On First, C++ Exception Is On Third
The toughest thing about getting up to speed on exception handling is that there are two main types in C/C++ programming: Structured Exception Handling (SEH) and C++ exception handling. Many people talk about both types as if they are the same; however, they are two distinctly different types. What makes it confusing is that they can be combined. I'll quickly cover their differences and similarities. I'll also describe how to combine exception handler types to avoid some of the problem issues altogether.
SEH is provided by the operating system and it deals directly with crashes like access violations. SEH is language-independent, but it is usually implemented in C and C++ programs with the __try, __except, and __finally keywords. The idea is to set your code inside a __try block, and then to determine how to handle the exception in the __except block (also called an exception handler). The __finally block ensures that a section of code will always be executed when control leaves its __try block. Figure 1 shows a typical function with SEH. The __except block almost looks like a function call, but the parentheses specify the return value for a special routine called an exception filter. The code in Figure 1 specifies EXCEPTION_EXECUTE_HANDLER, which indicates that the code in the __except block must be executed every time any exception occurs inside the __try block. The exception filter allows you to determine if the exception handler should be executed or not. You can make the exception filter code as simple or as complicated as you like.
Executing the exception handler is sometimes called unwinding the exception. If you were writing a math package, for example, you would handle divide by zero attempts and return NaN (not a number). When this fatal error occurs, your exception filter can call the special GetExceptionCode function (which can be called only in an exception filter or inside the __except block itself) to determine the exception value.
The following code shows the exception filter determining the exception type. If the exception is a divide by zero, then the exception handler will be executed. If it is any other exception, EXCEPTION_CONTINUE_SEARCH tells the operating system to keep searching and try the next __except block up the call chain.
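    double IntegerDivide ( long x , long y )
    {
        double dbRet ;
        __try
        {
            dbRet = x / y ;  // Integer divide: faults when y is 0.
        }
        __except ( EXCEPTION_INT_DIVIDE_BY_ZERO == GetExceptionCode ( ) ?
                   EXCEPTION_EXECUTE_HANDLER : EXCEPTION_CONTINUE_SEARCH )
        {
            dbRet = NaN ;    // NaN is the math package's own constant.
        }
        return ( dbRet ) ;
    }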
If your exception filter requires more complexity, you can even call one of your own functions as the exception filter, as long as it returns one of the filter values that tells the system how to proceed. In addition to the special GetExceptionCode function, the GetExceptionInformation function can also be called in the exception filter area. GetExceptionInformation returns a pointer to an EXCEPTION_POINTERS structure that completely describes the reason for a crash and the state of the CPU at that time. You may have guessed that this will come in handy later in the column.
SEH is not limited to just handling crashes. You can also create your own exceptions with the RaiseException API call. While most people do not use RaiseException, it can offer you a way to quickly leave deeply nested conditional statements in your code. While I personally do not program this way, it is cleaner than the old setjmp and longjmp runtime functions.
Before you jump in and start using SEH indiscriminately, there are two limitations that you should keep in mind. The first is minor: your error codes are limited to a single unsigned integer. The second limitation is a little more serious: SEH does not mix well with C++ programming. When SEH unwinds out of a function, it does not call any of the C++ object destructors for objects created on the stack. Since C++ objects can do all sorts of things in their constructors, this can lead to many problems. However, if you look at the code in this column, you will see that I use SEH heavily when I can.
If you would like to learn more about SEH, I recommend two references in addition to perusing the Microsoft® Developer Network (MSDN). The best overview of SEH is in Jeffrey Richter's Advanced Windows, Third Edition (Microsoft Press, 1997). If you are curious about the actual SEH implementation, check out Matt Pietrek's article, "A Crash Course on the Depths of Win32 Structured Exception Handling" (MSJ, January 1997).
Because C++ exception handling is a part of the C++ language specification, it is probably more familiar to most programmers than SEH. The keywords for C++ exception handling are try and catch. The throw keyword allows you to initiate an exception unwind. While SEH is limited to just a single unsigned integer, a C++ exception catch can handle any variable type, including classes. If you design your error handling to derive off a common error class, then you can handle just about any error you need to in your code. This is exactly what MFC does with its CException class. Figure 2 shows C++ exception handling in action with an MFC CFile class read.
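To make the common-error-class idea concrete, here is a minimal sketch in portable C++. The names CAppError, CFileError, ReadLength, and TryRead are hypothetical and are not part of MFC or of this month's source code; they only illustrate how a single catch of the base class can cover every derived error type.

```cpp
#include <string>

// Hypothetical common base class for all application errors.
class CAppError
{
public:
    explicit CAppError(const std::string& strWhat) : m_strWhat(strWhat) {}
    virtual ~CAppError() {}
    const std::string& What() const { return m_strWhat; }
private:
    std::string m_strWhat;
};

// Hypothetical derived error that describes a failed file read.
class CFileError : public CAppError
{
public:
    explicit CFileError(const std::string& strFile)
        : CAppError("cannot read " + strFile) {}
};

// Throws a derived error for an empty name; otherwise returns the
// length of the name as a stand-in for real work.
int ReadLength(const std::string& strFile)
{
    if (strFile.empty())
        throw CFileError("<unnamed>");
    return static_cast<int>(strFile.size());
}

// One catch of the base class handles every derived error type.
std::string TryRead(const std::string& strFile)
{
    try
    {
        ReadLength(strFile);
        return "ok";
    }
    catch (const CAppError& e)
    {
        return e.What();
    }
}
```

Calling TryRead with a normal name takes the success path, while an empty name ends up in the catch block with the message built by CFileError.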
There are a few things to keep in mind with C++ exception handling. First, it does not handle your program crashes automatically. However, I will show how that can be done later. Second, C++ exception processing is not free. The compiler does a great deal of work setting up and removing the try and catch blocks even if you never throw any exceptions, which matters if you are working on extremely performance-sensitive code. While these cases are rare, it is something to note. Third, C++ exception handling is not turned on by default, so you must compile with the /GX switch. If you are new to C++ exception handling, MSDN is a great place to start learning about it.
As I promised earlier, there is a way to combine both SEH and C++ exceptions so that you only have to use one type. The C runtime library function _set_se_translator lets you set a translator function that will be called when a structured exception happens so that you can throw a C++ exception. This powerful function is one of those hidden gems. The following code snippet shows all that a translator function must do:
    void SEHToCPPException ( UINT uiEx ,
                             EXCEPTION_POINTERS * pExp )
    {
        // Rethrow the structured exception as a C++ exception.
        throw CSEHException ( uiEx , pExp ) ;
    }
The first parameter is the SEH code returned through a call to GetExceptionCode. The second parameter is the exception state from a call to GetExceptionInformation.
When translating a hard SEH crash into a C++ exception, handle the crash only if you expect that a crash is possible. For example, if you allow users to extend your application with DLLs, you can wrap the calls to the DLL with try...catch blocks to handle the crash. However, in the course of normal processing, you should end the application when you get a hard SEH crash. In one of my own programs, I accidentally handled an access violation instead of just crashing. As a result, instead of leaving the user's data alone, I proceeded to wipe out her data files.
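The following is a rough sketch of that advice, not the code that ships with this column. The CSEHException layout, InstallSEHTranslator, CallExtension, and the two sample entry points are all assumptions made for illustration. The Windows-only part uses _set_se_translator from eh.h and has to be built with /EHa; FaultingExtension simply throws the exception directly so the wrapper can be exercised without a real access violation.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <eh.h>
#endif

// Hypothetical CSEHException: carries the SEH code and the raw
// exception state (an EXCEPTION_POINTERS* on Windows).
class CSEHException
{
public:
    CSEHException(unsigned int uiCode, const void* pInfo)
        : m_uiCode(uiCode), m_pInfo(pInfo) {}
    unsigned int Code() const { return m_uiCode; }
    const void*  Info() const { return m_pInfo; }
private:
    unsigned int m_uiCode;
    const void*  m_pInfo;
};

#ifdef _WIN32
// Translator with the signature that _set_se_translator expects.
static void SEHToCPPException(unsigned int uiEx, EXCEPTION_POINTERS* pExp)
{
    throw CSEHException(uiEx, pExp);
}

// Installs the translator for the calling thread (build with /EHa).
inline void InstallSEHTranslator()
{
    _set_se_translator(SEHToCPPException);
}
#endif

typedef void (*EXTENSIONPROC)();

// Calls an untrusted extension entry point. Returns true if it ran to
// completion; otherwise stores the SEH code and returns false so the
// caller can disable the extension instead of carrying on blindly.
bool CallExtension(EXTENSIONPROC pfnEntry, unsigned int& uiCode)
{
    try
    {
        pfnEntry();
        return true;
    }
    catch (const CSEHException& e)
    {
        uiCode = e.Code();
        return false;
    }
}

// Stand-ins for extension entry points (assumptions for this sketch).
void WellBehavedExtension() {}
void FaultingExtension()
{
    // 0xC0000005 is the access violation code.
    throw CSEHException(0xC0000005u, 0);
}
```

Only the call into the extension is wrapped. Everything outside CallExtension still crashes normally, which is the behavior you want for faults you did not plan for.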
Be careful about handling hard SEH exceptions as C++ exceptions because the process is in an unstable state. You can go ahead and show dialogs and write out crash information to a file as part of your handling. However, you need to be cognizant that the stack might be blown, so you might not have the room to call many functions. Since the exception code is passed to your translator function, you need to check it and degrade gracefully if there is insufficient stack space.
When I was putting this column together, I thought I would do a simple CSEHException class and a CMFCSEHException class derived from CException as part of the code. I decided to leave that as an exercise for the reader because both classes are rather simple and the really interesting code is in translating the EXCEPTION_POINTERS information into human-readable form! Before I jump into that code, I want to cover the basics of crash handlers.
Captain, She's Gonna Blow!
In C++ code, you have the opportunity to handle those cases where you think that you might have a hard crash. But as we all know, crashes never happen where you expect them. Unfortunately, when your users see your crashes, they just see a dialog that gives them some hexadecimal gibberish and gives you almost no information to figure out the problem. It would be nice to be able to show your own dialog that tells the user in plain language what the problem is and gather enough information so you can easily figure out what happened. Through the magic of the SetUnhandledExceptionFilter API, you can easily do this. I have always referred to these handlers as crash handlers. Amazingly, this functionality has been in Win32® since Windows NT® 3.5, but it is almost undocumented. In the January 1998 MSDN, there are only nine places that even mention this function.
From my experience, crash handlers have excellent bugslaying capabilities. In one project that I worked on, when a crash occurred, I put up a dialog that explained that there was a crash and directed the user to call our technical support number. I logged all the information I could about the crash into a file, which included the state of the user's system. I also iterated the program's main objects so I could report down to the class level what objects were active and what they contained. I was logging almost too much information about the state of the program. With a crash report, I had a 90 percent chance of duplicating the user's problem. If that is not proactive bugslaying, I don't know what is!
Needless to say, I find SetUnhandledExceptionFilter quite powerful. If you look at its name, SetUnhandledExceptionFilter, you can probably start to guess what the function does. The one parameter to SetUnhandledExceptionFilter is a pointer to a function that is called in the final __except for the application. This function returns the same values that any exception filter would return: EXCEPTION_EXECUTE_HANDLER, EXCEPTION_CONTINUE_EXECUTION, or EXCEPTION_CONTINUE_SEARCH. You can do anything you want in the filter function, but as I warned earlier in the C++ _set_se_translator discussion, the application is probably unstable. To be on the safe side, you might want to avoid any C runtime library calls, as well as MFC. If you write your exception filter function in Visual Basic, you should be extra careful about what you access from the Visual Basic runtime. While I am obligated to mention these warnings, the vast majority of your crashes will be access violations, so you should not have any problems if you write a complete crash handling system in your function.
Your exception filter also gets a pointer to an EXCEPTION_POINTERS structure. Later, I will present several routines that translate this for you. Since each company has different crash handler needs, I will let you write your own.
There are a couple of issues to remember when using SetUnhandledExceptionFilter. The first is that any exception filter that you set cannot be debugged. This is a known bug. Knowledge Base article Q173652 says that under a debugger the process wide filter is not called. This can be a bit of a pain, but in a C++ program you can just use your function in a regular SEH exception filter to debug it. If you look at the CH_TEST.CPP test program, which is part of this month's source code (Aug98Bugslayer.exe), this is exactly what I did to debug it. An alternative is to use a kernel debugger like WinDBG to get around this limitation.
Another issue is that calling SetUnhandledExceptionFilter is a process global operation. If you build the coolest crash handler in the world for your OLE control and the container crashes, even if it's not your fault, your crash handler will be executed. While you might think this could keep you from using SetUnhandledExceptionFilter, I have some code that might help you out.
Handle Only This
I wrote some simple functions to limit a crash handler to a specific module or modules (see Figure 3). I placed the code in the reusable BugslayerUtil.DLL, which you can find in Aug98Bugslayer.exe.
The basic idea for limiting the crash handler is that I set an unhandled exception filter. When it is called, I check the module it came from. If it is from one of the modules requested, I call the exception handler, but if it is from a module outside those requested, I call the previous exception filter I replaced. By calling the replaced one, multiple modules could use the crash handling API I defined without stepping on each other.
To set your filter function, simply call SetCrashHandlerFilter. Internally, SetCrashHandlerFilter saves your filter function to a static value and calls SetUnhandledExceptionFilter to set the real exception filter, CrashHandlerExceptionFilter. If you do not add any modules that limit the exception filtering, CrashHandlerExceptionFilter will always call your exception filter no matter which module had the hard crash. It is best if you set your call to SetCrashHandlerFilter as soon as you can and make sure that you call it again with a NULL filter function right before you unload.
Adding a module to limit crash handling is done through the AddCrashHandlerLimitModule function. All you need to pass to this function is the HMODULE for the module in question. If you have multiple modules that you want to limit crash handling to, just call AddCrashHandlerLimitModule for each one. The array of module handles used for limiting is allocated from the main process heap.
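The dispatch decision described above can be modeled in a few lines of portable C++. This is only a sketch of the logic and not the code from Figure 3: MODULEHANDLE, CRASHFILTER, CrashHandlerState, and DispatchCrash are hypothetical names, the filter signature is simplified to take the faulting module directly, and returning "continue search" when there is no previous filter is my assumption.

```cpp
#include <algorithm>
#include <vector>

// Filter return values, matching the Win32 definitions.
const long kExecuteHandler = 1;   // EXCEPTION_EXECUTE_HANDLER
const long kContinueSearch = 0;   // EXCEPTION_CONTINUE_SEARCH

typedef const void* MODULEHANDLE;                        // stands in for HMODULE
typedef long (*CRASHFILTER)(MODULEHANDLE hFaultModule);  // simplified filter signature

struct CrashHandlerState
{
    CRASHFILTER pfnUser;               // filter supplied by the caller
    CRASHFILTER pfnPrevious;           // filter that was replaced, may be null
    std::vector<MODULEHANDLE> limits;  // empty means "handle every module"
};

// Calls the user's filter only when the fault came from a requested
// module (or when no limits were set); otherwise chains to the filter
// that was installed before this one.
long DispatchCrash(const CrashHandlerState& state, MODULEHANDLE hFaultModule)
{
    bool bOurs = state.limits.empty() ||
        std::find(state.limits.begin(), state.limits.end(), hFaultModule)
            != state.limits.end();

    if (bOurs && state.pfnUser)
        return state.pfnUser(hFaultModule);
    if (state.pfnPrevious)
        return state.pfnPrevious(hFaultModule);
    return kContinueSearch;   // assumption: nobody left to ask
}

// Sample filters for exercising the model.
long UserFilter(MODULEHANDLE)     { return kExecuteHandler; }
long PreviousFilter(MODULEHANDLE) { return kContinueSearch; }
```

Because a fault outside the limit list is handed to the previous filter rather than swallowed, several modules in one process can each install their own limited handler without stepping on each other.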
As you look at the various functions in Figure 3, you will see that I do not make any C runtime library calls at all. Since the crash handler routines are called in extraordinary situations, I cannot rely on the runtime being in a stable state. To clean up any memory that I allocated, I use the automatic static class trick that I first discussed in the October 1997 Bugslayer column. I also provide a couple of functions that allow you to get the limit module count and a copy of the array: GetLimitModuleCount and GetLimitModulesArray, respectively. I will leave it up to you to write a RemoveCrashHandlerLimitModule function.
Now that you have written your exception handlers and crash handlers, it's time to talk about those EXCEPTION_POINTERS structures each gets passed. Since this is where all the interesting information about the crash is stored, I wanted to develop a set of functions that you can call to translate the information into human-readable form. With these functions, all you need to concentrate on is the display of information to the user in a manner that's appropriate for your particular application. All of these functions are in Figure 3.
I tried to keep the functions as simple as possible. All you need to do is to pass in the EXCEPTION_POINTERS structures. Each function returns a pointer to a constant string that holds the text. If you looked at the code, you might have noticed that each function has a corresponding function whose name ends in "VB". When I put these functions together I did not realize that Visual Basic couldn't handle a string returned from a function; it can only deal with a string as a parameter. Therefore, to use these functions from Visual Basic, you must pass in your own string buffer. Since the EXCEPTION_POINTERS-handling functions will be called in crash situations, I set them up to use a static buffer in CrashHandler.cpp. When using these functions from Visual Basic, declare a global string variable and Dim it early in the program so the memory is available.
The GetRegisterString function simply returns the formatted register string. The GetFaultReason function is a little more interesting in that it returns a complete description of the problem. The returned string shows the process, the exception reason, the module that caused the exception, the address of the exception, and (if symbol information is available) the function, source, and line where the crash occurred.
CH_TESTS.EXE caused a EXCEPTION_ACCESS_VIOLATION in module
CH_TESTS.EXE at 001B:004010FB, Baz()+64 bytes,
CH_Tests.cpp, line 56+3 bytes
The most interesting functions are GetFirstStackTraceString and GetNextStackTraceString. These functions, as their names indicate, let you walk the stack. Like the FindFirstFile and FindNextFile APIs, you can call GetFirstStackTraceString and then continue to call GetNextStackTraceString until it returns FALSE to walk the entire stack. In addition to the EXCEPTION_POINTERS structure, these functions take a flag option parameter that lets you control the amount of information that you want to see in the resulting string. The following string shows all the options turned on.
001B:004018AA (0x00000001 0x008C0F90 0x008C0200 0x77F8FE94)
CH_TESTS.EXE, main()+1857 bytes, CH_Tests.cpp,
line 341+7 bytes
The values in parentheses are the possible parameters to the function. Figure 4 shows the options flags and what each will include in the output string.
To see these functions in action, I included two sample test programs. The first, CH_TEST, is a C/C++ example. The second program, CrashTest, is a Visual Basic-based example. Between these two programs, you should get a pretty good idea of how to use all of the functions I've presented.
The implementation of these functions is rather straightforward and consists mainly of string buffer manipulations. For all the symbol information, I use the IMAGEHLP.DLL symbol engine that I discussed in the April 1998 Bugslayer column. The interesting part of the implementation takes place at the end of Figure 3. When I first started testing the functions, I noticed that the source and line information for an address would appear the first time that I requested it, but that it never appeared on subsequent lookups at the same address. Several astute readers had mentioned that they had seen the same thing with the CrashFinder application, but we were never able to see what was unique about the situation. It appeared that there was a bug in the SymGetLineFromAddr function. It only finds the source and line information once for an address when looking up the information in PDB symbols. It seems to work correctly for C7 and COFF symbols.
To work around this bug, reader Iain Coulthard figured out that SymGetLineFromAddr only seemed to find the source lines for addresses that are at the start of a line (in other words, addresses with a displacement of zero bytes). Iain found that if you look backwards from the original address until you find the nearest address that has a zero displacement, the source line lookup works. After finding the zero-displacement address, just subtract the found address from the original address to compute the displacement. Iain's solution was much quicker than mine; the only way I found to make the SymGetLineFromAddr function work was to completely shut down and restart the symbol engine on each SymGetLineFromAddr call.
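The backward search is easy to model without the symbol engine. In the sketch below, LineTable and LookupLineExact are hypothetical stand-ins for IMAGEHLP.DLL that deliberately mimic the bug by succeeding only at the start of a line; LookupLineWithWorkaround and its maxBack limit are likewise my own illustration, not the InternalSymGetLineFromAddr code.

```cpp
#include <map>

typedef unsigned int ADDRESS;
typedef std::map<ADDRESS, int> LineTable;   // line start address -> line number

// Mimics the buggy lookup: succeeds only when addr is exactly the
// start of a source line (a displacement of zero).
bool LookupLineExact(const LineTable& table, ADDRESS addr, int& iLine)
{
    LineTable::const_iterator it = table.find(addr);
    if (it == table.end())
        return false;
    iLine = it->second;
    return true;
}

// Walks backwards one byte at a time from addr until the exact lookup
// succeeds, giving up after maxBack bytes. The displacement is the
// original address minus the address that was found.
bool LookupLineWithWorkaround(const LineTable& table, ADDRESS addr,
                              ADDRESS maxBack, int& iLine, ADDRESS& disp)
{
    for (ADDRESS back = 0; ; ++back)
    {
        if (LookupLineExact(table, addr - back, iLine))
        {
            disp = back;
            return true;
        }
        if (back == maxBack || back == addr)
            break;   // search limit or address zero reached
    }
    return false;
}
```

With a table that places line 56 at 0x004010F8, a lookup of 0x004010FB walks back three bytes and reports line 56 with a displacement of 3.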
I put Iain's workaround in the InternalSymGetLineFromAddr function so the work was not scattered across the source. InternalSymGetLineFromAddr also takes care of the case where an older version of IMAGEHLP.DLL that does not support SymGetLineFromAddr is on the system. If you plan on using SymGetLineFromAddr for a debugger, you might want to wait until a fixed version of IMAGEHLP.DLL has been released. If you do not want my workaround, undefine WORK_AROUND_SRCLINE_BUG when building BugslayerUtil.DLL.
I hope I have been able to show you the power of exception handlers and crash handlers. If used properly, they can save you from full crashes. If you do crash, you can maximize the information that helps you solve the problem quickly. You might want to consider building your release builds with COFF symbol information. This will add to your binary's size, but using the code I presented you can get some excellent free source and line information during a crash.
Many people have asked me where to find the version of IMAGEHLP.DLL that supports source and line lookup. It is on the November 1997 (or later) Platform SDK in the \MSSDK\bin\Win95\i386 directory. Although it is in the Win95 directory, it works perfectly well with Windows NT. Additionally, it does not look like IMAGEHLP.DLL is redistributable, so I am unable to email copies to those that need it. However, the entire Platform SDK is downloadable from http://msdn.microsoft.com/downloads/sdks/platform/platform.asp if you do not have the MSDN CD. For further information on redistributing IMAGEHLP.DLL and other files, check out Redist.txt in the \MSSDK\license directory on the Platform SDK.
There was a big bug in the April 1998 Bugslayer column! I mentioned that building your release builds with full PDB symbols would only add 1KB to the size of your binary. I failed to mention that turning on the /DEBUG flag for LINK.EXE also turns on the /OPT:NOREF flag. This means that all the unreferenced functions (COMDATS) will be included as well. Consequently, this can quickly jack up the size of your binary. Therefore, when linking with /DEBUG, make sure to specify /OPT:REF to force only referenced functions to show up in the resultant image. /OPT:REF will also turn off incremental linking, but you never want to have incremental linking turned on for release builds because it will add a ton of padding to the binary and waste all sorts of space. Incremental linking should only be used in debug builds.
Finally, the size of the PDB record can be a bit bigger than I thought it could be. I was under the impression that it started with the NB10 at the end of the binary. Evidently, there is some header information before it that looks like offset information into the PDB preceding the NB10. This portion can vary in size depending on the size of the binary, but not enough to stop you from building your release builds with full PDB files.
Got a debugging tip? Send it to email@example.com so you can bask in your two minutes and thirteen seconds of fame as you help your fellow developers!
Tip #11 One simple thing I've learned to do is to invalidate that which has been deallocated. For example, if you delete a pointer, set that pointer to NULL immediately afterward. If you close a handle, set that handle to NULL (or INVALID_HANDLE_VALUE) immediately afterward. This is especially true when these are members of a class. By setting a pointer to NULL, you prevent double delete calls. delete NULL is valid. (Thanks to Sam Blackburn, firstname.lastname@example.org.)
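A small sketch of the tip for a class member follows. CBufferOwner is a hypothetical class invented for this example; the point is only that Release can safely be called more than once because the pointer is reset right after the delete.

```cpp
#include <cstddef>

// Hypothetical owner of a heap buffer.
class CBufferOwner
{
public:
    CBufferOwner() : m_pBuf(new char[16]) {}
    ~CBufferOwner() { Release(); }

    // Safe to call repeatedly: deleting a null pointer does nothing,
    // and the member is invalidated immediately after the delete.
    void Release()
    {
        delete [] m_pBuf;
        m_pBuf = NULL;
    }

    bool HasBuffer() const { return m_pBuf != NULL; }

private:
    // Not copyable, so two objects never own the same buffer.
    CBufferOwner(const CBufferOwner&);
    CBufferOwner& operator=(const CBufferOwner&);

    char* m_pBuf;
};
```

If Release did not reset m_pBuf, the destructor would delete the same buffer a second time whenever a caller had already released it.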
Tip #12 In the April 1998 column, I presented a tip about automatically initializing structures that require a size field to be filled out. Reader Simon Salter (email@example.com) offered an even better way to accomplish this using C++ templates:
    template <typename T>
    class SWindowStruct : public T
    {
    public :
        SWindowStruct ( )
        {   memset ( this , 0 , sizeof ( T ) ) ;
            this->cbSize = sizeof ( T ) ; }
    } ;
Using this class, you just need to declare a structure like the following and it is taken care of automatically:
SWindowStruct<REBARBANDINFO> stRBBI ;
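For readers who want to try the idea outside of Windows, here is a self-contained sketch. DEMOINFO is a hypothetical structure standing in for something like REBARBANDINFO; the only requirement is a plain structure with a cbSize member. This version zeroes just the base-class portion and qualifies cbSize with this-> so that it also compiles under standard-conforming compilers, which is my adjustment rather than part of Simon's tip.

```cpp
#include <cstring>

// Hypothetical stand-in for a Win32 structure with a size field.
struct DEMOINFO
{
    unsigned int cbSize;
    unsigned int fMask;
    int          cx;
};

// Zero-fills the wrapped structure and sets its size field, so every
// instance starts out in a known state.
template <typename T>
class SWindowStruct : public T
{
public:
    SWindowStruct()
    {
        std::memset(static_cast<T*>(this), 0, sizeof(T));
        this->cbSize = sizeof(T);
    }
};
```

Declaring SWindowStruct<DEMOINFO> stInfo ; then yields a structure whose cbSize is already sizeof(DEMOINFO) and whose other fields are zero.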
Have a tricky issue dealing with bugs? Send your questions or bug slaying tips via email to John Robbins: firstname.lastname@example.org