Code for this article: BugSlayer1297.exe (170KB)
John Robbins is a software engineer at NuMega Technologies Inc. who specializes in debuggers. He can be reached at email@example.com.
These days it's common for an application to have a front end done in one language, three or four components done in other languages, and yet another part that is running in a different process or on a completely different machine. Trying to track down a bug that occurs between the VBScript-based Active Server Page and an object that it calls on another machine can be a hair-raising experience. Unfortunately, there are no debuggers yet that will allow you to single-step across machines, or even do something seemingly simple like stepping from VBScript into a C++ control. Debugging modern applications definitely isn't easy.
Help is on the way! In this column, I will present a tool, TraceSrv, that at least allows you to easily add trace statements to all the parts of your applications, no matter what machine they reside on, and allows the output to be viewed all in the same place. While not as useful as the ultimate debugger (where you'd be able to single-step everything, everywhere), at least you will have a fighting chance of tracking down problems in your multilanguage, multiprocess, and multimachine development. As all good Bugslayers know, trace statements are right up there with ASSERT macros in the Good Things list. You can never have enough of either.
In presenting TraceSrv, I will start out with a set of requirements and the underlying technology decisions, and then highlight the design and development. Most of the code for TraceSrv is not rocket science. I originally thought that TraceSrv would be rather easy to develop, but I spent a great deal of time filling many holes in my knowledge, as well as working around bugs in some tools. I hope I can save you some of the frustrations that I had to deal with.
First, let's go over my design objectives for TraceSrv. (All of you start all of your projects with the complete written specification of what you are doing, right? Check.) Here are the requirements that I started with:
- You must be able to use TraceSrv with most common programming languages including, at a minimum, C++, Visual Basic®, Delphi, Visual Basic for Applications, Java, and Visual Basic Script.
- It must be very simple to use inside a programming language.
- TraceSrv must always be running so any application can connect to it at any time.
- The trace statements of a program on multiple machines should go to the same place.
- The trace statements should be viewable on multiple machines at the same time by trace viewer applications.
- There should be some trace statement options processing:
  - Prefix the trace statement with the time the trace statement was received.
  - Prefix the trace statement with the number of the trace statement.
  - Prefix the process ID of the process that sent the trace statement.
  - Append a carriage return and line feed, if needed.
  - Send the trace statement through to a kernel debugger where the TraceSrv process is running.
- If one of the TraceSrv options is changed, all of the currently active viewers should be notified so that all viewers, even on other machines, are in sync with the current options.
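The option processing in the requirements can be sketched as one small formatting function. This is only an illustration in portable C++, not the TraceSrv source: the TraceOptions struct, the FormatTrace name, the parameter list, and the exact prefix layout are all assumptions made for the example.

```cpp
#include <cassert>
#include <string>

// Hypothetical option flags mirroring the requirements list. The names are
// illustrative and do not come from the TraceSrv source.
struct TraceOptions
{
    bool showItemNum   = false;  // prefix the running statement number
    bool showTime      = false;  // prefix the time the statement was received
    bool showProcessId = false;  // prefix the sender's process ID
    bool addCRLF       = false;  // append "\r\n" if it is not already there
};

// Builds the final output line from the raw trace text. The caller passes in
// the timestamp text, statement number, and process ID so the function makes
// no operating system calls.
std::wstring FormatTrace(const TraceOptions& opts,
                         const std::wstring& text,
                         const std::wstring& timeStamp,
                         unsigned long itemNum,
                         unsigned long processId)
{
    std::wstring out;
    if (opts.showItemNum)
        out += std::to_wstring(itemNum) + L" ";
    if (opts.showTime)
        out += L"[" + timeStamp + L"] ";
    if (opts.showProcessId)
        out += L"[" + std::to_wstring(processId) + L"] ";
    out += text;

    if (opts.addCRLF)
    {
        // Only append the line ending when the sender did not supply one.
        const std::wstring crlf = L"\r\n";
        const bool hasCRLF = out.size() >= crlf.size() &&
            out.compare(out.size() - crlf.size(), crlf.size(), crlf) == 0;
        if (!hasCRLF)
            out += crlf;
    }
    return out;
}
```

Building the whole line in one pass keeps the "if needed" rule for the carriage return and line feed in a single place, which is where a duplicated line ending would otherwise creep in.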
Since TraceSrv is still under development, all of the code in this month's column has been tested only on Windows NT®. However, since Windows® 95 Distributed COM (DCOM) works just like Windows NT DCOM, it should be just a matter of getting the registry set and using DCOMCNFG.EXE to get the machine information specified.
At first glance, the requirements for TraceSrv look kind of daunting because of things like network development and multilanguage programming. I thought I could address multilanguage issues with a simple DLL that anyone could load with a simple API. Since I am primarily a systems programmer, my ignorance of VBScript or Java was getting in my way. Particularly when I looked at VBScript, I realized that no matter how much hacking I did, I wasn't going to get VBScript to call a DLL directly. I finally started getting a clue when I saw that VBScript supported CreateObject; I just needed a COM object, and VBScript would be able to use it just fine. Since COM works in almost all languages, I decided to make TraceSrv a simple COM object.
COM made the network programming problem go away fairly easily. You basically get DCOM for free. The "running all the time" problem is solved by DCOM because you can have your DCOM servers running as Win32® services. The object is always ready if you use an automatic start service.
My first brush with DCOM services in the Windows NT 4.0 alpha days was rather scary. Not only did you have to write the service (not the easiest thing in the world), you also had to do all sorts of weird stuff with COM to get it all hooked up. Fortunately, my copy of Visual Studio 97 finally arrived. The Active Template Library (ATL) 2.1 with Visual C++® 5.0 handles all of the grunge work and even provides a wizard to help generate the code! Once that was settled, I needed to define the interface for TraceSrv.
The TraceSrv Interface
TraceSrv.idl (see Figure 1) is the main interface for TraceSrv. Basically, I use the Trace method of the ITrace interface to have a trace statement sent to TraceSrv. To hit the broadest number of languages, I decided to set the string type passed as a BSTR.
To write a trace statement viewer, all you need to do is handle the events from the ITraceEvent interface. Its properties, which match the requirements above, are on the ITrace interface in case an application using TraceSrv might want to change them. When a TraceSrv property is changed, it generates an event that a trace viewer should handle. While I probably should have used IPropertyNotifySink to do the property change notification, I could never get it to work in ATL. Since there were so few properties, it was simpler to just have change notifications for each of them. The TraceView program I'll go over later shows how to handle each event TraceSrv generates.
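The notification model described here, where every connected viewer hears about each trace and each property change, can be sketched without COM. The TraceSink and TraceHub types below are hypothetical stand-ins for a viewer's ITraceEvent sink and for the TraceSrv object; the method names are assumptions, and a single ShowTime flag stands in for the full set of properties.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a trace viewer's event sink. The real viewer
// handles ITraceEvent events; two plain callbacks keep the sketch portable.
struct TraceSink
{
    std::function<void(const std::wstring&)> onTrace;           // new trace text
    std::function<void(bool)>                onShowTimeChanged;  // option changed
};

// Hypothetical server object: one shared instance fans every trace and every
// option change out to all connected sinks.
class TraceHub
{
public:
    void Advise(const TraceSink& sink) { m_sinks.push_back(sink); }

    void Trace(const std::wstring& text)
    {
        for (const TraceSink& sink : m_sinks)
            if (sink.onTrace)
                sink.onTrace(text);
    }

    void SetShowTime(bool value)
    {
        if (value == m_showTime)
            return;                      // nothing changed, so no event
        m_showTime = value;
        for (const TraceSink& sink : m_sinks)
            if (sink.onShowTimeChanged)
                sink.onShowTimeChanged(value);
    }

    bool GetShowTime() const { return m_showTime; }

private:
    std::vector<TraceSink> m_sinks;
    bool m_showTime = false;
};
```

One change event per property, as in this sketch, mirrors the choice described above of giving each property its own notification instead of a single generic one.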
When I first developed TraceSrv, I got the event handling hooked up just fine in C++, but I could not get Visual Basic to see the events. After I had gotten nowhere for a while, the July 1997 issue of the Microsoft Developer Network showed up on my doorstep, and the article "Events: From an ATL Server to a Visual Basic Sink," by Robert Coleridge, spelled out exactly how to do it. In a nutshell, Visual Basic does not generate vtables for event sinks, so all events must be fired through IDispatch. Now that you see how relatively simple the interface is, it's time to dive into the code.
The TraceSrv Code
After I had the ATL project wizard crank out a DCOM service, it seemed like almost 90 percent of the code was there. The only parts that I really had to code were the actual TraceSrv interface and handlers. Most of the code that I did is in Trace.h and Trace.cpp (see Figure 2).
Overall, everything was pretty easy. For the IConnectionPointContainer hookup, I followed both Coleridge's example and the source code from the ATL DrawSrv sample. Getting the IConnectionPoint code was even easier as the ATL Proxy Generator can create the whole code file for you.
I paid careful attention to the BSTR string processing. Since I could think of scenarios where the trace statements would be coming in fast and furious, I wanted to make sure the string handling was as fast as possible. The CTrace::ProcessTrace function in Trace.cpp does a lot of string manipulation, especially considering the different items that can be placed on the front and the end of the final string output by TraceSrv. I had originally used the CComBSTR class for the string manipulation. But when I started stepping through the code and looking at what it did, I noticed that, for almost every method and operator in the class, it does memory allocation or deallocation each time with the SysXXXString functions. While this is the proper and safe thing to do, it could lead to some real performance problems in programs like TraceSrv that do a good deal of string manipulation.
To speed up the string processing, I wrote a simple class called CFastBSTR that handles the BSTR manipulation directly. The class is in FastBSTR.h (see Figure 3). Its sole job is to allocate a single buffer for the data and to play games with the leading size DWORD in the GetStringByteLength function. Some people might feel that it is better to stick with the exact rules on BSTRs, but I felt that performance is important. You could easily change the code in CFastBSTR to use the SysXXXString functions if my gyrations make you uncomfortable.
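For readers who have not looked inside a BSTR: it is a pointer to 16-bit characters that is preceded in memory by a four-byte length prefix holding the byte count, not counting the null terminator. The sketch below imitates that layout in portable C++ with a single preallocated buffer. It is not the CFastBSTR class from Figure 3; the LengthPrefixedString name and its methods are hypothetical, and real COM code would hand out memory from the SysAllocString family.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Portable illustration of the BSTR layout: a 4-byte length prefix (byte
// count, terminator excluded), then 16-bit characters, then a 16-bit null.
class LengthPrefixedString
{
public:
    // One allocation up front: prefix + capacity + terminator.
    explicit LengthPrefixedString(std::size_t capacityChars)
        : m_buf(kPrefixUnits + capacityChars + 1, u'\0'),
          m_capacity(capacityChars)
    {
    }

    // Appends in place and rewrites the prefix. Nothing is reallocated; the
    // call fails if the text would not fit in the preallocated buffer.
    bool Append(const char16_t* text, std::size_t count)
    {
        const std::size_t used = ByteLength() / sizeof(char16_t);
        if (used + count > m_capacity)
            return false;
        char16_t* chars = m_buf.data() + kPrefixUnits;
        std::copy(text, text + count, chars + used);
        chars[used + count] = u'\0';
        const std::uint32_t bytes =
            static_cast<std::uint32_t>((used + count) * sizeof(char16_t));
        std::memcpy(m_buf.data(), &bytes, sizeof(bytes));
        return true;
    }

    // Reads the length prefix, the value SysStringByteLen reports for a BSTR.
    std::uint32_t ByteLength() const
    {
        std::uint32_t bytes = 0;
        std::memcpy(&bytes, m_buf.data(), sizeof(bytes));
        return bytes;
    }

    // The pointer a BSTR consumer would see: just past the prefix.
    const char16_t* Chars() const { return m_buf.data() + kPrefixUnits; }

private:
    // The prefix occupies two 16-bit units at the front of the buffer.
    static constexpr std::size_t kPrefixUnits =
        sizeof(std::uint32_t) / sizeof(char16_t);

    std::vector<char16_t> m_buf;
    std::size_t m_capacity;
};
```

The point of the layout is that the prefix and the characters must always agree. A consumer that trusts a wrong prefix reads past the end of the string, which is why hand-maintaining it trades safety for speed.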
There are only two other items worth pointing out about the code. The first is that I used the Memory Dumper and Validator code from my article, "Introducing the Bugslayer: Annihilating Bugs in an Application Near You" (MSJ, October 1997). The only two classes that I could use it on are CTrace and CFastBSTR. I included the prebuilt libraries in the source code for this month's column in the .\LIB\Intel and .\LIB\Alpha directories.
The second item I need to point out is that the build is always Unicode. Since TraceSrv is designed to be a Win32 DCOM service that only runs on Windows NT, I felt that there was no reason to slow it down with the ANSI translation problems. In case you don't know, Windows NT uses Unicode for everything internally. In non-Unicode (single-byte or Far Eastern multibyte) builds of your program, a whole bunch of string processing and memory allocation and deallocation is done for each call into the operating system that requires a string buffer.
This can really slow down your program, and since I went to all the trouble to get fast BSTR handling, I did not want to slow down TraceSrv any more. My decision to make TraceSrv a full Unicode program has little effect when you use it for your trace statements. All the Visual Basic dialects use BSTRs internally so you don't have to do any work at all. If you are programming in C++, you already deal with the fact that OLE and ActiveX are Unicode, so the only issue is that you might need to convert the strings you're sending to TraceSrv to BSTRs.
Running TraceSrv and DCOMCNFG
Now that you have seen a little of the code, I want to cover what happens after you build TraceSrv and want to use it. The Visual C++ 5.0-based project that comes with the source code for this column is basically the one that the Application Wizard generated, so the last step of the build is to register TraceSrv. The registration portions are all part of the ATL code that you get for free, but TraceSrv is only registered as a local server EXE. TraceSrv won't run as a Win32 service unless you specify the service command-line option. While I could make the service registration part of the build, I chose not to because debugging a Win32 service without a kernel debugger like SoftICE is not simple. Also, if you are in the middle of a fix-compile-debug cycle, it's a real pain to have to shell out to an MS-DOS® box and run "net stop tracesrv" just to get the build to work. After you have done sufficient debugging and testing with TraceSrv running as a local server, you can register it and run it as a service.
You do not have to run TraceSrv as a service to allow access to it across the network. This is convenient for debugging because all you need to do is start TraceSrv in the debugger and you can watch things connect and debug when appropriate. What worked best for me was to run both the client and TraceSrv under debuggers on the respective machines; when one hits a breakpoint, break on the other one to avoid any possible timeout problems. I always compiled the Visual Basic portions down to native code and used the Visual C++ debugger. This ensured that the Visual Basic client was stopped dead in the debugger since the Visual Basic IDE debugger is actually the client that does the connecting, not the application being debugged.
When you want to use TraceSrv across the network, you need to run the DCOMCNFG.EXE program to get the proper information set in the registry. The first thing you need to do is get the default DCOM properties set up for your machine. Since you could leave yourself exposed to some serious security problems, you might want to check with your network administrators before deploying TraceSrv in a company environment. If you have a small network and are King of the Domain like I am, Figure 4 contains the settings that worked best for me on all the machines.
After you have registered TraceSrv (either as part of the build or with the service command-line option), start DCOMCNFG.EXE, select the Trace Class, and click the Properties button. I only changed the Location property page. If you want to run TraceSrv only on the local machine, check "Run application on this computer" and leave all the options blank. If you want to run TraceSrv only on another machine, check "Run application on the following computer" and specify the server. (Note that DCOMCNFG will let you put the current computer name in the box, but then it won't create the server.) If you want to run locally and remotely, check both "Run application on this computer" and "Run application on the following computer." To avoid lots of headaches, double-check that all the options on the Trace Class security page are set to use the defaults.
For the most part, you should not have to change the settings in DCOMCNFG.EXE. It's rather interesting to set different security and identity options to see what effect they have in starting and connecting to TraceSrv. If you get into a situation where you can no longer start TraceSrv, simply run TraceSrv with the -UnRegServer command-line option; this cleans out the registry so you can start fresh again. Automatic registration and unregistration are nice features of ATL.
Now that you have seen enough about TraceSrv to know how to build it, use it, and get it set up, you probably think that this is the end of the column. Originally, I thought so too, but some really nasty bugs showed up when I started using TraceSrv. Most were mine, but a couple were in the tools. So let's go slay some bugs.
The Initial Spate of Bugs
The first problem I encountered occurred after I got TraceSrv up and running and connected to it from multiple client processes. My design requirements state that all clients will use the same instance of TraceSrv. When I was testing, each process that was connecting was getting its own copy of the ITrace interface, so there was no way that a trace viewer would ever see the output from multiple processes.
This stumped me a bit because I didn't think it would be that hard to make a single-instance interface. After fumbling around for a day, I was ready to override IClassFactory::CreateInstance and force it to always return a single ITrace interface. While this was not the correct thing to do, at least it would have allowed only one instance. Fortunately, while poking through the ATL code I ran across the CComClassFactorySingleton class, which the documentation says is used to create a single instance, exactly what I needed. This is handled by the DECLARE_CLASSFACTORY_SINGLETON(CTrace) macro in Trace.cpp. I attribute this bug to my ignorance of ATL.
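The difference between the default class factory behavior and the singleton behavior can be shown with a small portable sketch. None of this is ATL code: the Trace, PerClientFactory, and SingletonFactory classes are hypothetical stand-ins, and the sketch ignores threading and COM reference counting.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the CTrace object that clients connect to.
class Trace
{
public:
    int statementCount = 0;
};

// Default behavior: every CreateInstance call builds a new object, so two
// clients never see each other's trace statements.
class PerClientFactory
{
public:
    std::shared_ptr<Trace> CreateInstance()
    {
        return std::make_shared<Trace>();
    }
};

// Singleton behavior: the first call builds the object and every later call
// returns that same object, so all clients share one Trace.
class SingletonFactory
{
public:
    std::shared_ptr<Trace> CreateInstance()
    {
        if (!m_instance)
            m_instance = std::make_shared<Trace>();
        return m_instance;
    }

private:
    std::shared_ptr<Trace> m_instance;
};
```

With the per-client factory, a viewer attached to one object can only ever see statements sent through that object, which matches the symptom described above.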
After getting a single instance of TraceSrv, I noticed that the CComBSTR was doing all those allocations and deallocations on almost every method call. I started developing the CFastBSTR class, and all of a sudden none of my trace events were getting sent to trace viewer applications. When I stepped through the CTrace::ProcessTrace function, everything was fine with the string memory up to the ATL Proxy Generator-generated CProxyDITraceEvent::Fire_TraceEvent function, but the trace event was never received by the trace viewer. While I did not think there could be anything wrong with the generated code, I stepped into the code and found a very big surprise.
The generated code for Fire_TraceEvent simply sets up the VARIANTARG parameter and calls IDispatch::Invoke. The only problem is that IDispatch::Invoke was returning exactly why the data was not getting over to the trace viewer, but the generated code never checked the return value, which happened to be 0xC0000005 (an access violation). What happened was that I had messed up the prefixed size value in the CFastBSTR class, but since the generated code didn't check the return value, I had no way of knowing it. While this is not a bug in the ATL Proxy Generator, I wish the authors had applied their bug slaying techniques and put an ASSERT macro on the return value. The moral of the story is to not trust generated code until you have looked it over. To fix the generated code, I renamed the ATL Proxy Generator file CPTraceSrv.h to CorrectProxyTraceSrv.h and added return value checking on any calls to IDispatch::Invoke.
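The shape of that fix, checking what each call returns instead of discarding it, can be sketched in portable C++. This is not the contents of CorrectProxyTraceSrv.h: HResult is a stand-in for HRESULT, InvokeFn is a hypothetical stand-in for a call to IDispatch::Invoke on one connected sink, and FireTraceEvent is an assumed name.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Stand-in for HRESULT: a 32-bit value whose high bit signals failure.
using HResult = std::int32_t;

inline bool Failed(HResult hr) { return hr < 0; }

// Hypothetical stand-in for invoking the event on one connected sink.
using InvokeFn = std::function<HResult(const wchar_t*)>;

// Fires the event at every sink and reports the first failure instead of
// silently dropping it. Later sinks are still called after a failure.
HResult FireTraceEvent(const std::vector<InvokeFn>& sinks, const wchar_t* text)
{
    HResult firstFailure = 0;
    for (const InvokeFn& invoke : sinks)
    {
        const HResult hr = invoke(text);
        if (Failed(hr) && !Failed(firstFailure))
            firstFailure = hr;
    }
    return firstFailure;
}
```

A debug build could additionally assert on each failing result at the call site, so the problem stops in the debugger at the moment it happens.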
After fixing the CFastBSTR size prefix problem, I took a look at the 48 bytes of memory leaks that kept being reported whenever I ran TraceSrv. The memory leaks had to be in memory allocated inside ATL, because the leak reports were not from the memory dumpers I wrote for the two classes using the MemDumperValidator library. I tracked these down to lines 1605 and 1607 of ATLIMPL.CPP. In CSecurityDescriptor::GetTokenSids, two PSID variables are malloced; when I followed the call stack and looked at each place the values were used, I found that the memory is never freed. However, this is not technically a memory leak. The PSIDs that are allocated are passed into the API calls SetSecurityDescriptorOwner and SetSecurityDescriptorGroup. The SIDs passed into these functions are referenced, but not copied, so they must remain allocated through the life of the program.
The next problem I found will, unfortunately, cause you problems if you do not use the Visual C++ IDE to build TraceSrv and try to build using just the TraceSrv.mak file. I swap between Intel and DEC Alpha machines when doing my development; when I first moved the project from the Alpha back to the Intel machine, I found a bug in Visual C++. No matter what I set as the current project in the IDE on either machine, the TraceSrv.mak makefile is always saved with a default configuration of "TraceSrv Win32 Alpha Unicode Debug." To build an Intel debug version from the command line, you will need to specify the entire CFG value like so:
nmake /nologo /f TraceSrv.mak CFG="TraceSrv - Win32 Unicode Debug"
If you do not specify the complete CFG value for what you are trying to build, you will get unknown command-line options to CL.EXE, and LINK.EXE will report the wrong machine type.
After finally fixing or working around all of the initial bugs, TraceSrv was up and running fairly well. Now, before I can wrap up, I need to cover TraceView, Win32 security, and the VBScript sample.
TraceView and the Security Dance
While TraceSrv by itself is pretty useful, it really helps to have a viewer to see the trace statements. I wrote TraceView in Visual Basic because it was rather simple to do, and I needed to learn more about Visual Basic anyway. When you look at the source code for TraceView, you should not see much that hasn't been done before.
I tried to make TraceView a little more useful than a plain edit control by giving it a toolbar, a status bar, window position saving and restoring, file saving capabilities, forward and backward searching, and allowing the window to stay on top. For internationalization ease, I keep all of the strings in a resource file instead of hardcoding them. While I won't discuss the resource string loading, I'll just mention that I modified the generated LoadResStrings function (which I renamed LoadFormResStrings) to make it a little more helpful when finding an item that does not have the tag property filled out.
When I first started using TraceView it worked great. But when I started testing all the different ways to connect TraceView to TraceSrv, I had some problems. If TraceView and TraceSrv were on the same machine, TraceView could connect to TraceSrv if it ran as a service or as a local server. TraceView could also connect properly if TraceSrv ran as a local server on another machine using DCOM. When I tried to have TraceView connect to TraceSrv running as a DCOM service on another machine, it would always fail, giving me the Visual Basic error message "Run-time error -2147023071 (80070721) Automation Error." I looked up the error value in WINERROR.H; the ID is RPC_S_SEC_PKG_ERROR, "A security package specific error occurred."
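Looking up a value like this is easier once it is split into its parts. The decimal number -2147023071 is the signed form of 0x80070721, and an HRESULT packs a failure bit, a facility, and a 16-bit code into one 32-bit value. A minimal sketch of the decoding follows; the HResultParts struct and the DecodeHResult name are made up for the example, and the masks follow the standard HRESULT layout.

```cpp
#include <cassert>
#include <cstdint>

// The three fields of interest in a 32-bit HRESULT.
struct HResultParts
{
    bool     failure;   // bit 31: set when the call failed
    unsigned facility;  // bits 16 and up: which subsystem produced the code
    unsigned code;      // low 16 bits: the error code within that facility
};

// Splits a signed HRESULT, as Visual Basic displays it, into its fields.
HResultParts DecodeHResult(std::int32_t hr)
{
    const std::uint32_t bits = static_cast<std::uint32_t>(hr);
    HResultParts parts;
    parts.failure  = (bits & 0x80000000u) != 0;
    parts.facility = (bits >> 16) & 0x1FFFu;
    parts.code     = bits & 0xFFFFu;
    return parts;
}
```

For -2147023071 the facility comes out as 7 (FACILITY_WIN32) and the code as 1825 (0x721), which is the Win32 error RPC_S_SEC_PKG_ERROR. When the facility is 7, the low 16 bits are an ordinary Win32 error number that can be looked up directly in WINERROR.H.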
I had never seen this error ID before. When I searched the Microsoft Developer Network for it, all I got back was that it was in WINERROR.H and that it was listed in the system error appendixes. After poking at this for several days, I found that I could only get a Visual Basic-based program to connect to the remote TraceSrv service if I did not use the WithEvents keyword in its declaration. If I used the WithEvents keyword, I would always get the RPC_S_SEC_PKG_ERROR error. This had me pretty stumped until a friend pointed out that I did not have the security for the service set correctly.
When I stepped back and walked through what happened, it started to make sense. The WithEvents keyword is setting up an IConnectionPoint interface that the server will use to call into the client (in essence, a callback). This means the server must have the correct security permissions to call back into the client. When running on the same machine, this worked just fine because TraceSrv, whether started as a local server or as a service, runs under the same user identification or is trusted. Running TraceSrv on the remote machine as a local server and TraceView on another machine worked because I was lucky. On both Windows NT Workstation machines without a domain controller, I was logged in as John with the same password. According to Knowledge Base article Q158508, "COM Security Frequently Asked Questions," Windows NT Workstation "falls back to a 'matching account names and passwords' mode. If you use the same ASCII names on the two machines running Windows NT Workstation and the accounts have the same passwords, then DCOM and other [Windows] NT security (such as filesystem) should work as though you were really logged on to the two machines with the same account." When I logged into the remote machine as Bob, started TraceSrv as a local server, and tried to connect TraceView on the client machine logged in as John, I got the RPC_S_SEC_PKG_ERROR error. My test case, running TraceSrv as a local server on a remote machine, did not take into account all the permutations for connections.
Getting a remote local server started with proper security is fairly easy (just log in as a user that has network permissions), but a Win32 service takes a little more work. By default, Win32 services have no security credentials, so TraceSrv caused a security error whenever it tried to do anything with the IConnectionPoint interface it was passed. What I needed was a way to have the client tell DCOM the security level it will allow for its own interfaces. This is handled through the CoInitializeSecurity function call, which should be called immediately after your application calls CoInitialize. In TraceView, which is written in Visual Basic, calling CoInitializeSecurity will not work. If you try calling CoInitializeSecurity as the first thing in your Sub Main, it will return the error code 0x80010119 (RPC_E_TOO_LATE), which means "Security must be initialized before any interfaces are marshalled or unmarshalled. It cannot be changed once initialized." As you can see, Visual Basic is happily marshalling away long before your code ever gets called.
There are two ways to get around this little Visual Basic roadblock. The first is to fire up DCOMCNFG and set the Default Authentication Level (on the Default Properties Page) to None. While this might be fine for my little sealed network at home, it is not the best solution in a real development shop. My second method is a little more secure and appropriate: on the machine that you will run TraceSrv on, register TraceSrv as a service, start up the Control Panel, and open the Services icon. Move to the TraceSrv entry and click the Startup button. Under Log On As, select the This Account radio button and type in the user name and password for the account. This allows the service to get the security it needs from a known account on the network. As the "COM Security Frequently Asked Questions" article points out, "Localsystem is a very privileged account locally.... However, it has no network privileges and cannot leave the machine via any [Windows] NT-secured mechanism, including file system, named pipes, DCOM, or secure RPC." Once I got the service starting under the proper account, TraceView worked fine.
If you are working with a domain server, you might want to consider creating a specific account that you can use just for starting things like TraceSrv. For example, if you have a Build account that your build machines use to send mail, you might want to use that.
VBScript Versus SECURITY.DLL
After getting TraceView up and running, I figured I was finished with TraceSrv. All I had to do was write VBScript and Visual Basic for Applications-based tests. One of the key requirements of writing bug-free code is to actually execute it, and since I promised everyone at the beginning of the column that TraceSrv would work with all languages, I wanted to make sure it did. The two tests are distributed with the source code for the column in the \VBA and \VBScript subdirectories. The Visual Basic for Applications sample is a Word 97 document, and the VBScript sample is an HTML page for Internet Explorer (IE).
I tested the Visual Basic for Applications and VBScript code by connecting to TraceSrv in each of the different ways to run TraceSrv. Everything worked well, except that I found a known problem in IE 3.02: if IE is connecting to TraceSrv running on another machine, either as a local server or as a DCOM service, it crashes in the FreeCredentialsHandle function of SECURITY.DLL. This is true for both Intel and Alpha versions of IE. The bug shows up only when exiting the whole IE process, not when changing pages or at any time during use. This bug has been fixed with IE 4.0, which works fine with TraceSrv.
While I ran into a few problems getting TraceSrv up and running, I hope that you can learn from the mistakes I made and the bugs I had to work around. I learned most about the security features in Windows NT. While they can, at first, appear pretty restrictive, they are actually quite good; once I figured out how to work with them instead of against them, I started to see some interesting tool possibilities.
TraceSrv is a tool that will help you track down the bugs in your multilanguage, multiprocess, and multimachine programs. Look at ways that you could extend both TraceSrv and TraceView to make them more useful. One of the big weaknesses of TraceSrv is that, while it works great for brand-new development, it is not that useful for existing projects that already use OutputDebugString. My next column will demonstrate how to retrofit your existing projects with almost no fuss at all.
Slay Those Bugs!
Continuing from my inaugural Bugslayer article (MSJ, October 1997), here are some more debugging tips. Send your tips in and I'll include them in future columns.
Tip 3: Always use /W4 and /WX with CL.EXE. Why track down bugs yourself when the Visual C++ compiler will find many of them for you, simply by compiling your code? Always set the warning level to 4 so you can catch excellent problems like using uninitialized stack variables, unused variables, and signed/unsigned mismatches. The /WX switch makes the compiler treat all warnings as errors so nothing slips through the cracks. These two command-line switches have caught thousands of bugs for me.
There's only one drawback: many of the standard Windows and OLE headers do not compile without warnings. In these cases you can use the #pragma warning directive around those files to turn off warnings and then turn them back on. I have header files called WarningsOff.h and WarningsOn.h that I use in many of my projects to bracket the standard files so they compile.
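The bracketing idea looks roughly like the sketch below. It is not the contents of WarningsOff.h and WarningsOn.h, which are not shown here. It assumes a Visual C++ compiler that supports the push and pop forms of #pragma warning; with a compiler that lacks them, the same effect takes individual disable and default directives. The vector header is only a stand-in for a header that warns at /W4, and CountItems is a made-up function.

```cpp
#include <cassert>
#include <cstddef>

// Lower the warning level just for the noisy header, so a /W4 /WX build
// does not fail on code you do not own.
#ifdef _MSC_VER
#pragma warning(push, 3)   // save the current state, drop to level 3
#endif

#include <vector>          // stand-in for a header that warns at /W4

#ifdef _MSC_VER
#pragma warning(pop)       // restore the project's own warning state
#endif

// From here on, the project's code is compiled at full warning level again.
inline std::size_t CountItems(const std::vector<int>& items)
{
    return items.size();
}
```

Keeping the two halves in a pair of small headers means every source file brackets system headers the same way, and the list of tolerated warnings lives in one place.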
Tip 4: Use /GF to combat really strange memory bugs. The other day at work, we had a case where there was a super-bizarre memory corruption. We eventually tracked it down to someone accidentally using a static string buffer
char * szBuff = "Static String";
and passing that string to a function that writes to the buffer. If you use the /GF switch on CL.EXE, it places all the static strings into a read-only section of the binary. If any part of the program tries to write to one of the static buffers, it immediately causes an access violation. If we had used /GF on all of our code, this bug would have been caught immediately. (Thanks to Matt Pietrek.)
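The safe counterpart to the pattern in Tip 4 is to give the function a real writable buffer. The sketch below is an illustration only; MakeUpper is a made-up function standing in for "a function that writes to the buffer."

```cpp
#include <cassert>
#include <cstring>

// Made-up example of a function that writes into the caller's buffer.
void MakeUpper(char* buf)
{
    for (; *buf != '\0'; ++buf)
    {
        if (*buf >= 'a' && *buf <= 'z')
            *buf = static_cast<char>(*buf - 'a' + 'A');
    }
}

// Safe:   char buf[] = "Static String";  // an array initialized from the
//                                        // literal, so it is writable
// Unsafe: pointing a char* at the literal itself and passing it to
//         MakeUpper; the write lands in the literal's own storage, which
//         is exactly the bug /GF turns into an access violation.
```

Declaring pointers to literals as const char * lets the compiler reject the unsafe call at build time, before /GF ever has to catch it at run time.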
Have a tricky issue dealing with bugs? Send your questions or bug slaying tips via email to John Robbins: firstname.lastname@example.org