Matt Pietrek is the author of Windows 95 System Programming Secrets (IDG Books, 1995). He works at NuMega Technologies Inc., and can be reached at email@example.com.
When I started working with OLE recently, something that jumped out at me was just how much stuff is kept in the registry. It turned out that many of my registry entries were from old programs, long since deleted. The problem is especially bad for developers, as we tend to run a lot of programs once or twice for testing, then delete them. In an ideal world, there shouldn't be dead registry entries, as every program would have an uninstall program to clean up after itself. The reality is that most programs (especially small utilities) don't bother with uninstall programs.
One day, it occurred to me that finding dead registry entries shouldn't be that hard. I figured I could write a quick 'n dirty program to examine every registry value and see if it contains a filename. Then I could check to see if that filename exists and report all the registry entries that reference nonexistent filenames. Of course, nothing is ever that simple. I ran into numerous hurdles as I progressed. Nonetheless, I persevered and came up with the CLEANREG program.
Before I go any further, let me be the first to admit that CLEANREG isn't a cure-all. It doesn't find every dead registry entry. What constitutes a "dead" registry entry is highly subjective. CLEANREG also requires intelligent decision making on your part. As a result, you can seriously hose your registry if you do something dumb like deleting the HKEY_LOCAL_MACHINE tree. Of course, you can do the same thing with REGEDIT, so you're no worse off.
Before I wrote CLEANREG, I checked around for similar programs. The only thing I found was the REGCLEAN program that comes with Visual Basic®. REGCLEAN focuses on OLE inconsistencies in the registry and doesn't look for missing filenames in the general case. My CLEANREG program totally ignores OLE issues and only catches the subset of registry entries that contain file information.
The CLEANREG code is shown in Figure 1. In the following description, I'll assume that you're familiar with registry keys and values and how they're accessed programmatically. CLEANREG has two components. The code for registry scanning and file-existence checking is in CLEANREG.CPP. This module is mostly isolated from the other component, the user interface, which is in CLNREGUI.CPP. Figure 2 shows what the CLEANREG user interface looks like. I'll refer to this figure later on when I describe the fun I had putting together the user interface.
Figure 2 CLEANREG
I'll start my show of horrors with the registry manipulation and file-related code in CLEANREG.CPP. Figure 3 shows condensed call trees for the module. The SearchRegistryForMissingFiles function is what the user interface calls to kick off a complete registry scan. The function is just two calls to ScanRegNode, one to look at the HKEY_CURRENT_USER registry branch and one to look at the HKEY_LOCAL_MACHINE branch.
Figure 3 Search and Delete Call Trees for CLEANREG
As you may recall, between Windows NT® and Windows® 95 there are seven predefined registry keys. Why does CLEANREG only examine two of them? I don't scan HKEY_CLASSES_ROOT because this branch is just a subkey equivalent to HKEY_LOCAL_MACHINE\SOFTWARE\Classes. I don't search HKEY_USERS because it could contain information for accounts other than the currently logged-on user. Instead, I scan HKEY_CURRENT_USER, which is a subkey of HKEY_USERS. I also don't scan the performance keys (HKEY_PERFORMANCE_DATA on Windows NT and HKEY_DYN_DATA on Windows 95) because these keys aren't likely to contain filenames and their values are constantly changing.
ScanRegNode is the workhorse of CLEANREG.CPP. It uses boilerplate code for recursively enumerating through all of the values of a key and all of its subkeys. As it reads in each value, ScanRegNode determines if it's a REG_SZ value (a string, as opposed to, say, a REG_DWORD). Each REG_SZ value is passed to CheckForFilename, which does its best to extract a filename at the beginning of the string. If CheckForFilename returns TRUE, ScanRegNode passes the filename (along with the current fully qualified registry path) to CheckForExistence. CheckForExistence decides if this is a possible dead filename.
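The recursive walk can be sketched in portable C++ against an in-memory model. This is not the CLEANREG source: RegKey, ValueVisitor, and ScanNode are hypothetical names, the model holds only string values, and a real scan would enumerate values and subkeys through the Win32 registry API instead of vectors.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory stand-in for a registry key (not a Win32 type).
struct RegKey {
    std::string name;
    std::vector<std::pair<std::string, std::string>> stringValues;  // value name, data
    std::vector<RegKey> subkeys;
};

// Receives the fully qualified key path, the value name, and the string data.
using ValueVisitor = std::function<void(const std::string& keyPath,
                                        const std::string& valueName,
                                        const std::string& data)>;

// Hands every string value of `key` to `visit`, then recurses into each subkey,
// extending the path with a backslash and the subkey name.
void ScanNode(const RegKey& key, const std::string& path,
              const ValueVisitor& visit) {
    for (const auto& value : key.stringValues) {
        visit(path, value.first, value.second);
    }
    for (const RegKey& subkey : key.subkeys) {
        ScanNode(subkey, path + "\\" + subkey.name, visit);
    }
}
```

The visitor is where a filename check and an existence check would plug in. Passing the path down the recursion means each report already carries the complete registry path.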
As I was writing the CLEANREG program, the first major hurdle I ran into was in the CheckForFilename function. It turns out that applications are remarkably inconsistent in storing filenames and command lines in the registry. Even worse, the expanded set of legal characters in long filenames makes it well-nigh impossible to differentiate a filename from a command line. The Win32® SDK documentation even states this in Knowledge Base article Q108233. Until I wrote CLEANREG, I had no idea of just what a mess long filenames can be to deal with. For example,
"foo -p .exe"
can be the legal name of a program. How you're supposed to tell this apart from a program called "foo" that takes the command line
"-p .exe"
is beyond me. Some of you might be thinking, so what, long filenames can be delimited with double quotes. While this is true, the strings in the registry don't all follow this convention. One value may include quotes where needed, another won't. A third value may contain quotes in the string, but the quotes surround an entire command line, including arguments, rather than just a filename.
The result of this filename mess is that my function CheckForFilename does a reasonable but far from flawless job of extracting a fully qualified file or path name from the beginning of the string passed to it. The brief synopsis of the algorithm is to come up with the shortest possible sequence of legal filename characters at the start of a string. The one exception is that, if the string starts with a double quote, the function assumes there will be another double quote marking the end of the filename. If so, CheckForFilename removes the quotes from the filename it returns.
After dealing with the quote issue, CheckForFilename scans the string for the first nonlegal filename character and truncates the filename there. I consider the forward slash (/) to be an illegal character, even though the SDK says it can be a directory separator that's equivalent to \. Since I couldn't get any program to accept directory paths with a /, I considered it an illegal filename character.
The next step for CheckForFilename is to see if the current working string is fully qualified. My cheesy solution here was to look for the substring :\ at character 2 in the working string. The :\ is the delimiter between the drive letter and the root directory. This means that CheckForFilename won't pick up on relative filenames or filenames without a complete path. This works out fine for the purposes of CLEANREG, since you can't check for the existence of a file without knowing exactly where the file is in the first place. CLEANREG does not catch UNC names that start with \\. I leave it as an exercise for readers to modify CLEANREG to do this if they want.
At this point, CheckForFilename has what could be a legal filename, but many of the strings that make it to this point are actually command lines with arguments or parameters to be filled in. For this reason, CheckForFilename can optionally search for characters such as a hyphen (only if preceded by a space) and a hat (^) that are rarely used in real filenames. If CheckForFilename locates such a character, it truncates the filename string there.
Whenever CheckForFilename truncates the working string, it truncates back to the first character that is not a space. I determined with a little experimentation that I couldn't create a filename with spaces at the end. If you try to create a file called "foo ", it shows up as "foo" on disk. My rule about legal filenames was whether "copy con filename" could create the file.
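The steps described above can be put together in a short sketch. It is a portable approximation, not the CLEANREG source: ExtractFilename and its helpers are hypothetical names, the exact set of illegal characters is an assumption, and rejecting a string whose opening quote has no partner is my own choice where the description leaves the case open.

```cpp
#include <string>

// Characters this sketch treats as illegal in a path (an assumed set). The
// forward slash is included to match the decision described above; the colon
// is handled separately.
static bool IsIllegalPathChar(char c) {
    const std::string illegal = "\"*/<>?|";
    return static_cast<unsigned char>(c) < 32 ||
           illegal.find(c) != std::string::npos;
}

static void TrimTrailingSpaces(std::string& s) {
    while (!s.empty() && s.back() == ' ') s.pop_back();
}

// Tries to pull a drive-qualified filename from the start of `value`.
// `beRealistic` stands in for the optional "unlikely character" pass.
bool ExtractFilename(const std::string& value, bool beRealistic,
                     std::string& filename) {
    std::string work = value;

    // A leading quote needs a partner; keep only what lies between them.
    if (!work.empty() && work[0] == '"') {
        const std::size_t close = work.find('"', 1);
        if (close == std::string::npos) return false;  // assumption: reject
        work = work.substr(1, close - 1);
    }

    // Truncate at the first illegal character. A colon is accepted only as
    // the second character, where it follows the drive letter.
    for (std::size_t i = 0; i < work.size(); ++i) {
        const bool strayColon = (work[i] == ':' && i != 1);
        if (strayColon || IsIllegalPathChar(work[i])) {
            work.erase(i);
            break;
        }
    }
    TrimTrailingSpaces(work);

    // Fully qualified means ":\" in the second and third positions.
    if (work.size() < 3 || work.compare(1, 2, ":\\") != 0) return false;

    // Optionally cut at " -" or "^", which usually signal arguments.
    if (beRealistic) {
        std::size_t cut = work.find(" -");
        const std::size_t hat = work.find('^');
        if (hat < cut) cut = hat;  // npos is the largest value, so the earlier hit wins
        if (cut != std::string::npos) {
            work.erase(cut);
            TrimTrailingSpaces(work);
        }
    }

    if (work.size() < 3) return false;
    filename = work;
    return true;
}
```

With beRealistic set, the string C:\tools\foo.exe -p %1 comes back as C:\tools\foo.exe; with it cleared, the whole string is returned as a legal (if unlikely) filename. Relative names and UNC paths are rejected by the ":\" test.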
If CheckForFilename thinks it's found a filename, the CheckForExistence function determines if the file exists. If the file doesn't exist, the function informs the user interface of the filename and the corresponding registry path where the filename was found. My first pass at the CheckForExistence function was dirt simple: just call GetFileAttributes. If GetFileAttributes returns 0xFFFFFFFF (that is, -1), the file doesn't exist. This is a quick, newer alternative to the obsolete OpenFile (OF_EXIST...) technique.
There were a couple of problems with this simple approach. The first was the undesired intrusion of numerous critical error dialog boxes telling me that there was no disk in drive A: (my floppy). It turns out that many filename strings in the registry refer to the floppy the program was installed from. Another problem was that a whole slew of nonexistent files had paths to my CD-ROM drive. Since asking the user to put every one of their CD-ROMs into the drive wasn't an option, another compromise needed to be made.
I ultimately decided to make CheckForExistence only report nonexistent filenames from hard drives and network drives. This killed both of the above problems. I implemented this by extracting the first three characters of the filename (the root directory of the drive), passing them to GetDriveType. Only if GetDriveType reports that the drive is a fixed or network drive will CheckForExistence continue on and call GetFileAttributes.
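The drive-type gate can be illustrated with a small portable sketch. DriveKind, FileProbe, and IsReportableMissingFile are hypothetical names; the two callbacks stand in for GetDriveType and GetFileAttributes so that the logic can be exercised without touching a real disk.

```cpp
#include <functional>
#include <string>

// Drive categories, loosely mirroring the kinds of answer GetDriveType gives.
enum class DriveKind { Unknown, Removable, Fixed, Network, CdRom };

// Injected lookups (an assumption of this sketch). On Windows they would wrap
// GetDriveType and GetFileAttributes respectively.
struct FileProbe {
    std::function<DriveKind(const std::string& root)> driveKind;
    std::function<bool(const std::string& path)> exists;
};

// Returns true only for a missing file on a fixed or network drive. Paths on
// other drive types are skipped before any existence check is attempted.
bool IsReportableMissingFile(const std::string& path, const FileProbe& probe) {
    if (path.size() < 3) return false;
    const std::string root = path.substr(0, 3);  // for example "C:\"
    const DriveKind kind = probe.driveKind(root);
    if (kind != DriveKind::Fixed && kind != DriveKind::Network) {
        return false;  // floppy, CD-ROM, or unknown: never probe the file
    }
    return !probe.exists(path);
}
```

Checking the drive type first is what keeps the existence probe away from an empty floppy or CD-ROM drive, so the order of the two calls matters.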
The remaining routines in CLEANREG.CPP are for deleting registry keys or values. DeleteRegistryPath is the top-level routine that splits the input string into appropriate tokens and determines whether a registry value or registry key was specified. If it's a value to be deleted, the function calls the RegDeleteValue Win32 API. If it's an entire key, then all subkeys below that key need to be deleted.
In an ideal world, deleting a registry key and all of its children would be as simple as calling RegDeleteKey. Indeed, on Windows 95 this is what happens. Unfortunately, on Windows NT you must manually delete all the subkeys of a key before you can delete the original key. I encapsulated all this code in the RecursiveRegDeleteKey routine. RecursiveRegDeleteKey uses recursion to navigate all the way down to the bottom keys, delete them, and then work its way back up the tree. Why the Windows NT version of RegDeleteKey doesn't have this option is beyond me.
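The bottom-up deletion can be sketched against an in-memory tree. KeyNode, DeleteLeafKey, and RecursiveDeleteKey are hypothetical names for this illustration; DeleteLeafKey imitates the Windows NT behavior described above by refusing to remove a key that still has subkeys.

```cpp
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for a registry key.
struct KeyNode {
    std::string name;
    std::vector<KeyNode> subkeys;
};

// Imitates the restrictive delete: succeeds only if the named child exists
// and has no subkeys of its own.
bool DeleteLeafKey(KeyNode& parent, const std::string& name) {
    for (auto it = parent.subkeys.begin(); it != parent.subkeys.end(); ++it) {
        if (it->name == name) {
            if (!it->subkeys.empty()) return false;
            parent.subkeys.erase(it);
            return true;
        }
    }
    return false;
}

// Depth-first: empty the target's subtree, then delete the target itself.
bool RecursiveDeleteKey(KeyNode& parent, const std::string& name) {
    for (KeyNode& child : parent.subkeys) {
        if (child.name != name) continue;
        while (!child.subkeys.empty()) {
            // Copy the name: the node that owns it is about to be erased.
            const std::string grandchild = child.subkeys.back().name;
            if (!RecursiveDeleteKey(child, grandchild)) return false;
        }
        return DeleteLeafKey(parent, name);  // child is now a leaf
    }
    return false;  // no child with that name
}
```

Because each level finishes removing its descendants before removing itself, the restrictive delete only ever sees leaf keys.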
That's it for the CLEANREG.CPP code. Note that there's no user interface code. In fact, my early versions of CLEANREG were command-line oriented, with a simple main function that called SearchRegistryForMissingFiles. CheckForExistence wrote the names of any nonexistent files to stdout. But command-line programs aren't in vogue, so the next step was to put a GUI on top of CLEANREG.
The CLEANREG User Interface
The information that CLEANREG needs to convey to the user is quite simple: a list of non-existent filenames and the registry paths where they were found. A very simple user interface would simply contain a list box with the filename and the corresponding registry path. You could then select a filename, hit Delete, and the value would cease to exist.
One problem with this simple approach is that the filenames might be several subkeys below the primary key for a program. For instance, consider the filename F:\MSOffice\Office\bdrec.dll, which is located in the registry path
In this case, I want to remove the entire class reference from the registry (starting with the long GUID portion in the curly braces). Deleting just the InprocServer32 node would leave a registered CLSID without any information about its server. Thus, what I needed to do was give users a complete registry path and let them decide where to prune.
If you've used REGEDIT, you know that it uses a TreeView control to display all the registry nodes. By selecting a particular node and hitting Delete in REGEDIT, you can delete a key and all of its children. This works well, so I adopted this model for CLEANREG. As you can see in Figure 2, the top list box lists all the nonexistent filenames that CLEANREG found in the registry. As each list box entry is highlighted, the TreeView expands to display the corresponding registry path. You can then click on the desired node in the TreeView and hit Delete to remove either the key or value (depending on what you've highlighted).
A key difference between the CLEANREG UI and REGEDIT is that I was lazy and didn't do a split pane window with registry keys on the left side and the associated values on the right. Instead, registry values appear just as additional subitems in the TreeView.
An actual registry value is stored in the TreeView in the form [value: xxx], where xxx is the name of the registry value. The actual string contents of the REG_SZ value (yes, it's confusing to keep all this straight!) appear as yet another subitem beneath the [value: xxx] item. If either a [value: xxx] or its subitem is selected when you hit Delete, CLEANREG removes the value from the registry. If anything else is selected, CLEANREG treats the selected node as a registry key, and deletes the key and all of its child keys and values.
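The key-versus-value decision can be sketched as a small text test on the selected item and its parent. ParseValueItem, DeleteTarget, and ClassifySelection are hypothetical names, and deciding purely from the item text is an assumption of this sketch, not necessarily how CLEANREG tracks it.

```cpp
#include <string>

// Extracts xxx from "[value: xxx]". Returns false if `text` is not in that form.
bool ParseValueItem(const std::string& text, std::string& valueName) {
    const std::string prefix = "[value: ";
    if (text.size() < prefix.size() + 1) return false;
    if (text.compare(0, prefix.size(), prefix) != 0 || text.back() != ']') {
        return false;
    }
    valueName = text.substr(prefix.size(), text.size() - prefix.size() - 1);
    return true;
}

enum class DeleteTarget { Key, Value };

// A selection refers to a value if either the selected item or its parent is
// a "[value: xxx]" item; anything else is treated as a key.
DeleteTarget ClassifySelection(const std::string& selectedText,
                               const std::string& parentText,
                               std::string& valueName) {
    if (ParseValueItem(selectedText, valueName) ||
        ParseValueItem(parentText, valueName)) {
        return DeleteTarget::Value;
    }
    return DeleteTarget::Key;
}
```

A text-based test like this would misfire on a key literally named in the [value: xxx] form, so a real implementation might prefer to store a flag with each item.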
The remaining elements of the CLEANREG UI are fairly self-explanatory. The Update after delete checkbox (which is enabled by default) controls whether CLEANREG rescans the registry after each deletion; clearing it lets you delete multiple registry entries without forcing a complete registry rescan each time. The Sane filenames checkbox reflects the setting of the g_fBeRealistic global variable. This variable tells CheckForFilename (described earlier) if it should report legal but unlikely filenames. Each time you toggle the Sane filenames button, CLEANREG does a complete update, so this button is a quick way to refresh the dialog's contents. Pressing the F5 key will also refresh, just like Windows Explorer or File Manager.
The sources for implementing the CLEANREG UI are entirely within CLNREGUI.CPP. The code is all pretty straightforward, except for the TreeView control code. Doing serious work with TreeViews was the other unanticipated adventure when writing CLEANREG. I'll tell all shortly.
The AddItemToUI function is the way the registry-scanning code communicates with the user interface. Conceptually, AddItemToUI's work is simple enough: put the nonexistent filename in the list box, and put the registry path key nodes into TreeView items. I used list box item data (the LB_SETITEMDATA message) to hook up the list box to the TreeView. This means that when you select a list box item, the appropriate TreeView item expands. For each list box entry, the code stores the corresponding HTREEITEM for the TreeView. (If you're unfamiliar with TreeView controls, an HTREEITEM corresponds to exactly one selectable node within the control.)
The first snag I ran into with TreeView controls was that there's no way to build a nested series of HTREEITEMs. You can only create one HTREEITEM at a time. Thus, I needed to take a \-delimited registry path string and break it into individual key names. For each key name, the code creates an HTREEITEM (as long as an HTREEITEM with that name doesn't already exist). If there are additional subkey names in the registry path, the process is repeated until there's an HTREEITEM for each key name in the registry path. This task lends itself to recursion, so I wrote a single routine, GetHTIForRegPath, to do it. Given an input HTREEITEM and a registry key path, it creates an HTREEITEM for the first key name in the path. The routine then recurses to handle any remaining subkeys.
A snag that came up when writing GetHTIForRegPath was that I couldn't just extract the first key name in the registry path and create an HTREEITEM for it. Why not? There might already be an HTREEITEM with that name, so I'd be creating duplicate TreeView nodes. Thus, GetHTIForRegPath has to first check if an HTREEITEM with the given name already exists, and only create a new HTREEITEM if there isn't one already. This turned out to be more difficult than I expected.
For starters, the TreeView control offers no searching capabilities (not even in the new TreeView from the new Common Controls DLL, which ships with Microsoft® Internet Explorer 3.0). Given a string, you can't have the TreeView return an HTREEITEM for a child node with a matching name. Put another way, there's no equivalent to the list box LB_FINDSTRING message. Instead, you have to enumerate through all of the child HTREEITEMs yourself, querying each one in turn for its name. (In case you're wondering, MFC wouldn't have helped here. The MFC TreeView classes don't appear to encapsulate any functionality that you can't get with the standard TreeView messages in COMMCTRL.H.)
After resigning myself to searching through every child node myself, I hit a real head banger in the documentation. Enumerating through the child nodes of an HTREEITEM is a first/next affair using the TreeView_GetNextItem function. One of the parameters to TreeView_GetNextItem is an HTREEITEM. Another parameter specifies the relationship between the passed-in HTREEITEM and the HTREEITEM you're asking for. To enumerate all the child items in a TreeView control, you pass TVGN_CHILD the first time and TVGN_NEXT in subsequent calls. The problem is, the SDK documentation states that for TVGN_CHILD, the HTREEITEM must be NULL. Obviously this is complete nonsense. Without knowing which HTREEITEM is the parent, how can you enumerate its children?
Operating on the assumption that the documentation was confused, I tried passing a nonzero HTREEITEM. Since I wanted to start the enumeration at the root of the TreeView, I passed in TVI_ROOT. Big mistake! CLEANREG promptly blew up inside COMCTL32.DLL. A little poking around in a debugger led me to figure out that HTREEITEMs are really pointers. TVI_ROOT, which is defined as ((HTREEITEM)0xFFFF0000), obviously isn't a pointer. I finally got around this problem by creating a dummy root node in the TreeView called My Registry.
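The find-or-create recursion behind GetHTIForRegPath can be modeled without a TreeView at all. In this sketch, TreeNode, FindChild, and GetNodeForRegPath are hypothetical names; a plain struct plays the part of an HTREEITEM, and a linear search over the children replaces the first/next enumeration through the control.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one TreeView item.
struct TreeNode {
    std::string text;
    std::vector<std::unique_ptr<TreeNode>> children;
};

// Linear search of the direct children, since there is no lookup by name.
TreeNode* FindChild(TreeNode& parent, const std::string& text) {
    for (auto& child : parent.children) {
        if (child->text == text) return child.get();
    }
    return nullptr;
}

// Returns the node for the last key in `regPath`, creating any missing nodes
// on the way down and reusing the ones that already exist.
TreeNode* GetNodeForRegPath(TreeNode& parent, const std::string& regPath) {
    if (regPath.empty()) return &parent;
    const std::size_t slash = regPath.find('\\');
    const std::string first = regPath.substr(0, slash);

    TreeNode* node = FindChild(parent, first);
    if (node == nullptr) {
        std::unique_ptr<TreeNode> fresh(new TreeNode);
        fresh->text = first;
        node = fresh.get();
        parent.children.push_back(std::move(fresh));
    }
    if (slash == std::string::npos) return node;       // last key in the path
    return GetNodeForRegPath(*node, regPath.substr(slash + 1));
}
```

Starting every call from a single dummy root node, as with My Registry above, means two paths that share a prefix share the same nodes for that prefix.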
So much for entering nodes into TreeView controls. The other challenge was to get the data back out. That is, when you click on a node and hit Delete, CLEANREG needs to create a \-delimited registry path for the selected HTREEITEM. This is the string that CLEANREG passes to DeleteRegistryPath in CLEANREG.CPP. Since I already had a routine called GetHTIForRegPath to put registry paths into the TreeView, I wrote a mirror-image routine, GetRegPathForHTI, to do the opposite. Given an HTREEITEM, GetRegPathForHTI walks up the HTREEITEM hierarchy and builds a \-delimited path as it goes. Incidentally, both GetHTIForRegPath and GetRegPathForHTI could be used as a basis for writing general-purpose TreeView routines for other programs.
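The reverse direction is a walk up the parent links. In this sketch, PathNode and GetRegPathForNode are hypothetical names, each node stores a pointer to its parent (a real TreeView would be asked for the parent item instead), and the dummy root is left out of the resulting path. Handling of [value: xxx] items is omitted.

```cpp
#include <string>

// Hypothetical stand-in for one TreeView item with a link to its parent.
struct PathNode {
    std::string text;
    const PathNode* parent;  // nullptr for the dummy root
};

// Walks from `node` up to, but not including, the root, prepending each
// node's text so the finished string reads from the top of the tree down.
std::string GetRegPathForNode(const PathNode* node) {
    std::string path;
    while (node != nullptr && node->parent != nullptr) {
        path = path.empty() ? node->text : node->text + "\\" + path;
        node = node->parent;
    }
    return path;
}
```

Stopping before the root keeps the display-only root label out of the path that is handed to the deletion code.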
Some Final Notes on CLEANREG
If you peruse the CLEANREG sources, you'll see that I wrote the code to be Unicode-aware. By passing "UNICODE=1" to NMAKE when building CLEANREG, you'll get a Unicode version of the program. You might consider doing this if you only run Windows NT, as you'll get a slight speed gain. After all, Windows NT uses Unicode internally and for all of its programs, so why shouldn't other utility programs? Making the code Unicode-ready was a little more work, but it uncovered some bugs that I wouldn't have found otherwise. Since CLEANREG is designed to run on both Windows NT and Windows 95, the EXE that I've included with the sources is the ANSI-compiled version.
Earlier, I mentioned that CLEANREG was far from perfect. Run it yourself and you'll see what I mean. For instance, many of the nonexistent files are from the recent file lists that many programs (such as Microsoft® Word and the Visual C++ IDE) keep in the registry. While it usually doesn't hurt anything to delete these entries, it doesn't buy you much. If you continue using the program that put the entries there, they'll be replaced by other filenames.
Another thing to be aware of is that many install programs fill out a complete set of registry entries, even if you didn't install all of the program's options. For instance, Microsoft Office for Windows 95 adds in numerous entries for its Binder program even if you don't install the Binder. Whether or not it's OK to delete these entries is a judgment call that depends on the individual program.
As currently implemented, CLEANREG can report command lines like "C:\FOO.EXE BAR.TXT" as files. After all, they're legal filenames. This is where your judgment is needed. Just because CLEANREG puts a filename in its list box doesn't mean that it's safe to delete the associated registry entry. Likewise, nothing in CLEANREG prevents you from picking a branch that's too close to the root of the registry and deleting it. CLEANREG prompts you before deleting an item to make sure that you really want to do so. If you say OK, you'll get what you asked for.
The moral is this: CLEANREG is a tool for assisting you in finding potentially dead registry entries. It's certainly not infallible. Make sure you understand what you're doing. Back up your registry before running any program that deletes registry entries. And please, use it with caution just as you would with REGEDIT.
Have a question about programming in Windows? Send it to Matt at firstname.lastname@example.org
From the September 1996 issue of Microsoft Systems Journal.