Systems Management Server Recovery Planning

Published: January 20, 2000
On This Page
Overview
Planning for Recovery
Implementing Failure Reduction Strategies
Backing Up Data
Recovery
Best Practices
Poor Practices
Recovery Tools

This article is intended for any Microsoft® Systems Management Server 2.0 (SMS) administrator who is responsible for keeping SMS data safe and for maximizing the uptime of SMS operations or who needs to return an SMS site to usual operations after it has failed.

Warning: Client-server applications recovery in general—and SMS recovery in particular—is very different and much more complex than typical application or operating-system data recovery. You must become familiar with the concepts and procedures in this document, or you will not successfully recover from a site failure. Even under ideal conditions and with the best preparation, recovering a site after it has failed is difficult, tedious, and filled with opportunities for error—errors that can cause irrecoverable loss. Time that you invest now in reading and preparing is very valuable when you are later recovering a site from failure.

Overview

Definition of Site Recovery

A Systems Management Server site recovery occurs whenever you install an SMS site with a site code or site server name that was previously used in that SMS hierarchy. Repairing and resynchronizing data are the core tasks of a site recovery, and they are required to prevent interrupting operations or corrupting data. Resynchronizing serial numbers is the most critical task when you are reinstalling a site. You must always perform this task when reinstalling a site.

If you are reinstalling a site, you must do it as part of a site recovery operation. The other SMS sites in the hierarchy expect a reinstalled site to be in its former state. To prevent operations problems in the hierarchy, you must treat a reinstallation as a recovery.

Note: Not understanding or misunderstanding these recovery requirements is the most common cause of unsuccessful site recoveries.

Concepts

Recovering SMS is significantly different from recovering other client-server applications because of the following issues:

Distributed data. The complete data set for an SMS site is stored in several different locations, some of which might not be on the site server. Without a complete data set, full recovery is impossible.

Distributed tasks. SMS uses multiple instances of processes to carry out tasks, and it uses multiple storage locations for task instructions and data, which can become orphaned by a site failure. Orphaned data must be cleaned up after a site failure, and it is a bigger problem in hierarchies than in stand-alone sites. The smallest problem is wasted disk space. The greatest problem is mixed versions of data and tasks, depending on what was happening at the time of failure and how old the backed up data is.

Non-transaction-based tasks. SMS uses transactions to ensure that series of operations are successfully completed. If a transaction is not successfully completed, SMS rolls back the actions. Not all SMS tasks are based on transactions, so you must clean up partially completed tasks after a site failure to resume usual operations. Non-transaction tasks are also an issue with the Backup SMS Site Server task; you must shut down the site during backup, or the backup itself will contain relational integrity problems.

Serial numbers. SMS depends on sequentially issued serial numbers to track objects and tasks. Because backed-up data is old, it is out of synchronization with the rest of the site and hierarchy. Therefore, serial numbers must be resynchronized. (This applies even to stand-alone sites, because the site uses serial numbers in its client relationships.) Depending on the component, if there is a serial number problem, either all new data is rejected, or all new data creates duplicate objects that conflict with the original objects.

Multiple accounts for site operations. SMS security is compartmentalized to limit potential damage if the security of a single account is breached; a particular process uses a particular account with limited privileges. This means that SMS processes—especially SMS client processes—risk being locked out if mistakes are made during a recovery operation. All accounts and passwords must exactly match before and after recovery.

Object-level security. SMS depends on object-level security—not share security—to protect its registry and file data. Except for the SMSLOGON share, all SMS shares grant the Everyone group full permission to the share. You must restore explicit permissions after a site failure to protect sensitive data, but low rights processes must still have the correct read, execute, and write access necessary to carry out their tasks. There are more than 1,000 registry keys and directories, each with its own access control list, each list with multiple access control entries, and each entry with multiple properties. Because there are no defined zones where a group of keys or directories all have the same privileges, every single property is unique.

Regained control of clients and secondary sites. A site failure isolates clients and secondary sites of the failed site. Even if no backup is available, recovering a failed site is valuable for regaining control of the local SMS clients and secondary sites without having to clean up and rebuild all the clients or secondary sites that were children of the failed site.
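The serial number issue described above can be illustrated with a toy model. The following sketch is not SMS code: the `Receiver` class and its accept rule are assumptions made purely for illustration. It models a receiver that accepts only serial numbers higher than the last one it processed, and shows what happens when a site restored from an old backup resumes numbering from the backed-up value.

```python
# Toy model only: real SMS serial-number handling varies by component.
class Receiver:
    """Hypothetical receiver that rejects any serial number it has already passed."""

    def __init__(self):
        self.last_serial = 0

    def accept(self, serial):
        if serial <= self.last_serial:
            return False  # stale or repeated serial number: the data is rejected
        self.last_serial = serial
        return True


def send_batch(receiver, first_serial, count):
    """Send `count` items numbered from `first_serial`; return accept/reject results."""
    return [receiver.accept(first_serial + i) for i in range(count)]


parent = Receiver()
send_batch(parent, 1, 10)        # the site issued serial numbers 1..10 before failing

restored_next = 6                # the backup was taken after serial number 5
rejected = send_batch(parent, restored_next, 3)           # restored site reuses 6, 7, 8

resynced = send_batch(parent, parent.last_serial + 1, 3)  # after resynchronizing
print(rejected, resynced)
```

In this model the restored site's data is rejected until its next serial number is moved past the last value the receiver has seen, which is the purpose of resynchronization. A component that does not reject stale numbers would instead create duplicate objects.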

Recovery Cycle

The following diagram shows a complete SMS site recovery cycle:

Figure: SMS Site Recovery Cycle

Skills Required for SMS Maintenance and Recovery Procedures

Successfully completing SMS maintenance and recovery procedures requires the administrator to be comfortable with performing detailed manual tasks on local and remote computers. Sometimes these tasks use Resource Kit tools for the operating system, Systems Management Server, and Microsoft SQL Server™. In many cases, the administrator is directly manipulating critical data on live sites.

These tasks can include:

Configuring the operating system, working with the file system, and editing the registry.

Configuring operating system security, including accounts, services, shares, trusts, permissions, rights, organizational units, domains, forests, and policy templates.

Configuring SQL Server, restoring databases, and updating SQL Server tables with queries.

Warning: Failure to complete the procedures successfully can cause site operations to fail, corrupt site data, or even corrupt data at multiple sites. If there is any doubt that the skill level of the SMS administrator is sufficient, a more experienced administrator should supervise the maintenance and recovery procedures. If a skilled administrator is unavailable, product support services should be called to guide the administrator through the procedures until the administrator is proficient enough to manage the operation alone.

Planning for Recovery

You cannot prevent failure, but you can prepare for it. The way to prepare for the failure of an SMS site system (any computer in the service of the SMS site) is to gather all the configuration data needed to rebuild the site system exactly as it was before the failure.

Microsoft supplies two batch files that gather some of the information you need: Machinfo.bat for all site systems and SMSSQLinfo.bat for SMS site database systems. By default, these batch files run each time the Backup SMS Site Server task runs, but you can also run them manually. These batch files write data to various output files, but if a disk crashes or other magnetic media fail, these output files are not accessible. Therefore, you are encouraged to back up these output files or record their data on the configuration worksheets provided below.

Practices That Simplify Recovery

There are two general practices that help simplify recovery:

Client connection account management. Before the site fails, create an additional client connection account using a name different from the default of SMSClient_<SiteCode>.

To avoid locking out clients, never change the password of an SMS client connection account. This is essentially what happens when you run the fresh site install during recovery, if the old site was using the default account name. Instead, create new SMS client connection accounts with new passwords before site failure. After the new account information is propagated to all domain controllers, client access points (CAPs), and clients, you can change or delete the old accounts.

In Microsoft Windows NT® User Manager for Domains, if account lockout is enabled and any single SMS client uses an invalid password to try to access an account, that account is locked out when the number of bad logon attempts is met. This has implications for SMS clients because of the various client connection accounts they use. For example, an SMS client that has been offline for a long time can cause a lockout because all the passwords of its client connection accounts might have expired. When the SMS client attempts to return online with an old, invalid account password, it causes that client connection account to be locked out.

For more information about account lockouts, see the Systems Management Server 2.0 Security Essentials article.

Server connection account management. When you set up a site, a default server connection account is created with the format of SMSServer_<SiteCode>, and a random password is assigned to it. When you run the fresh site install during recovery, a different password is generated. You must run a site reset to propagate the new server connection account password to all remote site systems in the site. If there are many logon servers, it can take a very long time to run the cycle to update all the logon points. In this case consider using the SMSAccountSetup.ini file or setting up command-line parameters to specify a password for the server connection account. This password can be documented, and then reused during site recovery, to avoid having to run site reset to re-enable access to the site server for the remote site systems.

Data Collection Needed for Recovery: Machinfo.bat

Some of the information needed to fill out the site system configuration worksheet is provided by Machinfo.bat when you run it on each site system being backed up. You can run Machinfo.bat manually, but the Backup SMS Site Server task runs it automatically. Machinfo.bat writes its output to the following default files:

On site servers: \%SITE_SERVER_DEST%\SMSbkSiteConfigNT*.txt

On SMS site database servers: \%SITE_DB_SERVER_DEST%\SMSbkSQLConfigNT*.txt

Machinfo.bat requires that the following Windows NT Resource Kit tools be in the path:

Now.exe

Srvinfo.exe

Tlist.exe

Machinfo.bat: Detail

Syntax

\SMS\bin\i386\machinfo.bat SiteSystem Folder\ FilePrefix

SiteSystem

The site system for which information is being gathered. Run Machinfo.bat on this computer.

Folder\

The folder where the output files are written. This can be an absolute path (C:\SMS\bin\i386\temp\), a relative path (..\temp\), or a UNC path (\\server\share\), but it must end with a backslash.

FilePrefix

The characters that compose the start of each output file’s name. There are two output files:

FilePrefixData.txt Contains information provided by Srvinfo -D, Net View, Srvinfo -NS, Net Share, Ipconfig, and Tlist.

FilePrefixWinMSD.txt Contains information provided by Winmsd.exe.

Example

machinfo.bat Serv1 \\Serv1\Public\ Serv1_

Security

The security context in which Machinfo.bat is running must have administrative rights on SiteSystem and its domain, and it must be able to write files to Folder.

Limitations

Although partition information is output from Machinfo.bat, Machinfo.bat does not record the drive location of shares on remote computers; you must collect this information manually if it is needed.
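The Machinfo.bat syntax above can be checked before the batch file is run. The following is a minimal sketch, assuming a wrapper script is acceptable in your environment; the helper names `machinfo_command` and `machinfo_outputs` are hypothetical and are not part of SMS. It enforces the trailing-backslash rule for Folder and lists the two output file names that the run is expected to produce.

```python
# Sketch only: builds the argument list for Machinfo.bat as described above.
# The helper names and the default script path argument are illustrative assumptions.
def machinfo_command(site_system, folder, file_prefix,
                     script="\\SMS\\bin\\i386\\machinfo.bat"):
    if not folder.endswith("\\"):
        raise ValueError("Folder must end with a backslash")
    return [script, site_system, folder, file_prefix]


def machinfo_outputs(folder, file_prefix):
    """Full names of the two output files that Machinfo.bat is documented to write."""
    return [folder + file_prefix + suffix for suffix in ("Data.txt", "WinMSD.txt")]


folder = "\\\\Serv1\\Public\\"   # the UNC path \\Serv1\Public\ from the example
cmd = machinfo_command("Serv1", folder, "Serv1_")
outputs = machinfo_outputs(folder, "Serv1_")
print(" ".join(cmd))
print(outputs)
```

Recording the expected output names alongside the command makes it easier to confirm afterward that both files were written and copied to safe storage.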

Data Collection Needed for Recovery: Smssqlinfo.bat

Some of the information needed to fill out the site database system configuration worksheet is provided by Smssqlinfo.bat when you run it on each SMS site database system and each software metering database system being backed up. You can run Smssqlinfo.bat manually, but the Backup SMS Site Server task runs it automatically. Smssqlinfo.bat writes its output to the following default file:

\%SITE_DB_SERVER_DEST%\SMSbkSQLConfigSQL*.txt

Smssqlinfo.bat requires that the SQL Server utility Isql.exe be in the path and also that the SQL script Smssqlinfo.sql be in the same folder as Smssqlinfo.bat (which it will be unless it was moved).

Smssqlinfo.bat: Detail

Syntax

\SMS\bin\i386\smssqlinfo.bat SiteSystem DBname Folder\ FilePrefix

SiteSystem

The SMS site database system or software metering database system for which information is being gathered. Run Smssqlinfo.bat on this computer.

DBname

The SQL Server database name. You can find this by running SQL Server Enterprise Manager and looking in the Databases folder:

Console Root -> Microsoft SQL Servers -> SQL Server Group -> SiteSystem -> Databases

Folder\

The folder where the output files are written. This can be an absolute path (C:\SMS\bin\i386\temp\), a relative path (..\temp\), or a UNC path (\\server\share\), but it must end with a backslash.

FilePrefix

The characters that compose the start of each output file's name. There are four output files:

FilePrefixData.txt Contains information about SQL Server configuration.

FilePrefixdboption.txt Contains information about SQL Server options.

FilePrefixhelpdb.txt Contains information about the database, such as size, owner, and log files.

FilePrefixrevdatabase.txt Contains a SQL Server script that you must use to recreate the ridge segment structure in a SQL Server 6.5 database prior to restoring a database backup. Because SQL Server 7.0 does not use a ridge segment structure, you do not have to recreate it.

Examples

smssqlinfo.bat Serv2 SMS \\Serv2\Public\ Serv2_
smssqlinfo.bat Serv3 SMS_LICDB ..\Temp\ Serv3_

Security

The security context in which Smssqlinfo.bat is running must have administrative rights on SiteSystem and its domain, and it must be able to write files to Folder.
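After Smssqlinfo.bat runs, it can be useful to confirm that all four output files exist before relying on them for recovery. The following is a minimal sketch of such a check; the function name `missing_smssqlinfo_outputs` is hypothetical, and the demonstration uses a temporary folder with placeholder files rather than real Smssqlinfo.bat output.

```python
import os
import tempfile

# Suffixes taken from the list of Smssqlinfo.bat output files above.
SMSSQLINFO_SUFFIXES = ("Data.txt", "dboption.txt", "helpdb.txt", "revdatabase.txt")


def missing_smssqlinfo_outputs(folder, file_prefix):
    """Return the expected output file names that are not present in `folder`."""
    expected = [file_prefix + suffix for suffix in SMSSQLINFO_SUFFIXES]
    return [name for name in expected
            if not os.path.isfile(os.path.join(folder, name))]


# Demonstration with placeholder files: only three of the four outputs exist.
with tempfile.TemporaryDirectory() as demo_folder:
    for suffix in SMSSQLINFO_SUFFIXES[:3]:
        open(os.path.join(demo_folder, "Serv2_" + suffix), "w").close()
    demo_missing = missing_smssqlinfo_outputs(demo_folder, "Serv2_")

print(demo_missing)
```

An empty result means that every expected file is present; it does not verify the contents of the files.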

Configuration Worksheets

To correctly recover a site system, you must have all server configuration data available. Recovery could fail if there are any mismatches between the original server configuration and the recovered server configuration, and without this data, it might be difficult to figure out why the recovery failed. After a site system fails, it might no longer be possible to obtain this information from the failed system, so it is prudent to gather server configuration data for each SMS site system before it fails.

The following worksheets contain data an administrator should have to ensure the successful recovery of an SMS site system. Make sure to record enough information on custom configurations so that you can rebuild the site exactly after a failure. You can copy and print these worksheets to maintain a separate worksheet for each site system. There are four worksheets for configuration data:
Site System Configuration Worksheet
Site Database System Configuration Worksheet
Setup and Site Options Configuration Worksheet
Accounts and Passwords Configuration Worksheet

Implementing Failure Reduction Strategies

Much has been written about preventing computers from failing, and each company has its own prevention strategies based on its business needs. It is beyond the scope of this paper to cover all the tactics that can be employed. However, thinking about failure reduction is an important part of the recovery cycle, and it's to your advantage to do all you can to prevent failures in the first place and to learn what you can from a failure so you can reduce the chances of future failures.

Replacing Site Server Hardware Before It Fails

Replacing site server hardware before it fails is a key step in preventing site failure. As soon as there are any signs of unreliable behavior on site servers, swap out the old hardware for new hardware. Use the procedures in Moving Servers Between Domains.

Providing Redundancy for Help Desk

Guaranteeing 24x7 support for help desk is a critical requirement for large organizations. There are two approaches to ensuring continuous operation of remote tools in an SMS site. You can implement one or both of the following solutions:

Use a top-level central site.

Use a read-only replica of the live SMS site database.

In both cases, help desk regularly uses a default site and the backup sits idle, so you should check every day to verify that the backup can be successfully used to support remote control sessions.

Using a Top-Level Central Site

There are three points to bear in mind with regard to using a top-level central site:

Because remote control directly accesses a client across a network, any site that has a hardware inventory for the client allows remote control access if there is network connectivity to the client from that site. Therefore, if you have a top-level central site as a parent above the central site, and all hardware inventory goes to the central site, either site will support remote control to all clients in the database that have network connectivity.

To reduce delays in propagating data down the hierarchy and to ensure that help desk is using the most up-to-date inventory as soon as possible, all management and help desk tasks should be done from the central site.

To ensure that help desk can quickly switch to the top-level central site, SMS Administrator consoles should be configured for both sites, but they should use the central site by default.

Using a Read-Only Replica of the Live SMS Site Database

You can set up a read-only replica site that contains a copy of the live SMS site database. Help desk can then use the database copy on the replica site as a backup for remote tools support if the live site goes down. This provides true redundancy for any site in the hierarchy, which even clustering does not provide. You can create a highly stable and redundant system by incorporating the following into your plans for a replica site:

The SMS site database for the site server should be on a different computer than the copy of the SMS site database for the replica site.

The live site and the replica site should be on different uninterruptible power supplies, hubs, and routers.

Two or more replica sites can be created and then located in different geographic regions in case the network infrastructure fails.

Keep in mind that the replica site is not a fully functioning site, because it is used only for help desk remote tools support. The SMS Provider is the only site component that is used for the replica site. It is used to gain access to the SMS site database.

Any changes made to site configuration or software distribution on the replica site are lost the next time the database is updated from the live site. If naming the site "replica" and educating the administrators isn't enough, the security for the SMS Administrator console for the replica site can be set so that the account used by the help desk personnel has read-only permissions. This guarantees that administrators can't inadvertently make changes to the replica site and then wonder why their changes were lost and never implemented.

Setting Up the Replica Site
There are four steps to setting up the replica site:

1. Install a site called "Replica Site for Help Desk Backup" or another name of your choice. This site should have network connectivity that is equivalent to that of the live site, which it is backing up, so that it has equivalent support for help desk and remote control.

2. Install the replica site as a stand-alone site with minimum options. Consider using a dedicated instance of SQL Server to avoid load conflicts between other applications and help desk, which requires a fast, real-time response.

3. Wait until setup has been completed and the replica site has finished initializing. To determine whether setup has been completed and the replica site is initialized, verify that the Clicomp.box directory has been created on the client access point and that the SMS Administrator console can load and display site properties.

4. Stop and disable the site services: SMS_SITE_COMPONENT_MANAGER, SMS_EXECUTIVE, and SMS_SQL_MONITOR. This is absolutely necessary to guarantee that the replica site never interferes with the data in the live central site database.
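The final step above, stopping and disabling the three site services, can be scripted so that no service is missed. The following is a minimal sketch that only prints the commands. It assumes that the `net stop` command and the `sc.exe` utility are available on the replica site server; this paper does not name a tool for the step, so treat the choice of commands as an assumption.

```python
# Sketch only: prints the commands instead of running them.
# Assumption: "net stop" and sc.exe are available on the replica site server.
REPLICA_SERVICES = ("SMS_SITE_COMPONENT_MANAGER", "SMS_EXECUTIVE", "SMS_SQL_MONITOR")


def stop_and_disable_commands(services=REPLICA_SERVICES):
    """Return one stop command and one disable command for each service."""
    commands = []
    for name in services:
        commands.append(["net", "stop", name])
        commands.append(["sc", "config", name, "start=", "disabled"])
    return commands


replica_commands = stop_and_disable_commands()
for command in replica_commands:
    print(" ".join(command))
```

Review the printed commands and run them on the replica site server only, never on the live site server.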

Replicating the Live SMS Site Database
Each site has a task that can back up an SMS site database. However, it is recommended that you automate the task through SQL Server scheduled maintenance, so that the entire replication process is automated. Here are the necessary steps:

1. Back up mission-critical sites every 24 hours to minimize the loss of data. To obtain a valid site server backup, you must stop the site before performing the backup, which ensures that the backup will be an accurate snapshot of the site server.

2. Use the database backup from the site server backup to update the replica site. It is best to use this backup, rather than run another backup for the replica site, to avoid loading the SQL Server for the live site.

3. Use the SMS site database replication from the live site to perform the database restore to the SQL Server in order to replace the replica SMS site database. You can set up replication as an automated SQL Server task that is scheduled a couple of hours after the site backup finishes. This step ensures that there is no conflict in the schedule and that the newest data is loaded on the replica site.

Pointing the SMS Administrator Console at the Replica Site
If there is a large help desk load and help desk can accept working with slightly out-of-date data, the SMS Administrator consoles for help desk can point to the replica site by default. Ideally, however, the help desk SMS Administrator consoles should point to the live site by default so that they work with up-to-date data. In either case, the SMS Administrator consoles should be configured for both database locations so that they can be switched over quickly in case of failure.

Providing Redundancy for Client Operations

Partitioning Site Operations Is Built into SMS
You must understand how site operations are partitioned so that you know how to provide redundancy for client operations. SMS 2.0 was designed to avoid direct interaction between the site server and clients. The one exception to this is Client Configuration Manager, which runs only on the site server and directly touches clients. It was moved from client access points (CAPs) to the site server to increase the security of an SMS site. In all other cases, a combination of SMS_SITE_COMPONENT_MANAGER and SMS_EXECUTIVE manage the site systems, and the site systems interact with the SMS clients. The administrator has the option of running a CAP, distribution point, logon point, and software metering server on the site server, but there is no requirement to do so. SMS supports multiple instances of all of these site systems.

Continuing SMS Client Operations After Site System Failures
The two basic failure scenarios related to site systems are:

The site server goes down.

A remote site system goes down.

Consider a site design that has three CAPs: one on the site server and two on other computers. If any one of the CAPs goes down, two-thirds of the capacity to support clients is still running. From the client's perspective, it doesn't matter if the CAP is on the site server or not. In such a situation, from the perspective of the SMS client, service has continued uninterrupted, even if the site server goes down. This logic applies to all types of site systems. The client can still:

Log on and be discovered.

Log on after a prolonged downtime and update its files.

Run inventory and drop inventory and status files on the CAP.

Process assigned software packages on schedule.

Meter applications running on it.

Receive help desk support, including remote control, if redundancy is implemented for help desk.

If the site server fails, backlogs of files build up on the site systems, and they won't get updates. Therefore, it is important to recover the site server as soon as possible.

If a remote site system fails and can't be repaired quickly, another computer that is running can be added to the site to replace the lost resources. It is important to do this quickly: Although a site can continue functioning with two-thirds of the usual resources for some type of site system, it isn't likely to perform well with only one-third of the usual level of resources.
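The capacity figures in this scenario follow from simple arithmetic. The following is a minimal sketch, assuming that every site system of a given type carries an equal share of the client load; that equal-share assumption and the function name `remaining_capacity` are illustrative, not a description of how SMS balances load.

```python
from fractions import Fraction


def remaining_capacity(total_systems, failed_systems):
    """Fraction of capacity left, assuming each site system carries an equal share."""
    if total_systems <= 0 or not 0 <= failed_systems <= total_systems:
        raise ValueError("failed_systems must be between 0 and total_systems")
    return Fraction(total_systems - failed_systems, total_systems)


one_cap_down = remaining_capacity(3, 1)    # one of three CAPs has failed
two_caps_down = remaining_capacity(3, 2)   # two of three CAPs have failed
print(one_cap_down, two_caps_down)
```

With three CAPs, one failure leaves two-thirds of the capacity and a second failure leaves one-third, which is why a failed remote site system should be replaced before another one goes down.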

Staging Software Distribution to Continue Without a Site Server
If software distribution is staged weeks into the future, it can run on schedule even if the site server is down for weeks. Packages can be distributed to the appropriate distribution points at the appropriate sites, and advertisements and assignments can be created and distributed with appropriate start times in the future. All this can continue running while the site server is down. Until the site server comes up again, no status from the clients is processed. However, if a work-stopping problem occurs, help desk can still troubleshoot deployment issues if redundancy is set up for help desk support.

Using Multiple Remote Instances of Site Systems
To avoid performance problems and delays or slowdowns on client computers, use multiple remote instances of site systems so that even at peak periods of use, if a site system goes down, there is enough capacity in the remaining site systems to adequately support the clients until the site system can be brought back up again.
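One way to apply this guidance is to size each type of site system so that the peak load is still covered after a failure. The following is a minimal sketch of that calculation. The client counts are hypothetical, and the per-system capacity must come from your own measurements; SMS does not supply this formula.

```python
import math


def site_systems_needed(peak_clients, clients_per_system, tolerated_failures=1):
    """Site systems required to carry the peak load with `tolerated_failures` down."""
    if clients_per_system <= 0:
        raise ValueError("clients_per_system must be positive")
    base = math.ceil(peak_clients / clients_per_system)
    return base + tolerated_failures


# Hypothetical figures: 5,000 clients at peak, 2,000 clients per site system.
needed = site_systems_needed(5000, 2000)
print(needed)
```

In this example, three site systems are needed to carry the peak load, so a fourth is added to cover the loss of any one of them.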

Investing in Multiple Site Systems and Reliable Site Servers
Keep in mind that only the site server can create site systems. If the site server is down when a remote site system goes down, a new one cannot be built until the site server comes up again. Remote site systems can be backed up, but that entails a large amount of regular overhead. Rather than backing up site systems, it is generally a better investment to ensure that the site server:

Is running with high-quality reliable hardware.

Has either a stockpile of replacement parts available, or an identical clone to which a backup can be restored.

Is backed up frequently.

Has trained staff who can perform live-site recoveries, so they can always bring the site server up again quickly and reliably.

After the Site Server Is Back Online
All the discovery records, inventory, status, and software metering information that is created while the site server is down is stored on the remote site systems waiting for the site server to come back online. As soon as the site server is back online, the backlog of client information is processed, and if appropriate, forwarded up the hierarchy.

Backing Up Data

Backing up data stored on computers is critical to recovering them from failures, and backing up SMS data is critical to recovering a failed site. You can schedule SMS to have all unique data backed up periodically, so that you can completely restore any site system that fails.

Your SMS Backup Strategy

Because each SMS site hierarchy can be very different, Microsoft cannot supply you with a comprehensive SMS backup strategy. There are several questions you should answer before you decide on a particular SMS backup strategy for your SMS sites:

Are your other SMS task schedules compatible with your schedule for the Backup SMS Site Server task? Tasks that take a long time to finish, such as database maintenance and client package installation, should not be interrupted, if possible. Schedule these tasks so that they finish before the next Backup SMS Site Server task starts.

Do you want your backup to omit some data in order to reduce the time the site is shut down? In some cases, you might want to trade backup completeness for backup speed or for minimizing site disruption during the backup or both. If so, do you have a restoration strategy to compensate for losing data that was omitted in the backup?

Does the SMS backup involve process-to-process network communication? Does the communication take place over a slow link? How reliable is the link? It is best to configure the backup to use only high-speed reliable links.
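The first question above, whether long-running tasks finish before the Backup SMS Site Server task starts, can be checked with a small calculation. The following is a minimal sketch; the task names, start times, and durations are hypothetical and should be replaced with values observed at your site.

```python
from datetime import datetime, timedelta


def finishes_before_backup(task_start, task_duration, backup_start):
    """True if the task is expected to end at or before the backup start time."""
    return task_start + task_duration <= backup_start


backup_start = datetime(2000, 1, 21, 2, 0)   # hypothetical 02:00 backup start

# Hypothetical tasks: (start time, expected duration).
db_maintenance = (datetime(2000, 1, 20, 22, 0), timedelta(hours=3))
package_install = (datetime(2000, 1, 20, 23, 30), timedelta(hours=4))

for name, (start, duration) in (("database maintenance", db_maintenance),
                                ("client package installation", package_install)):
    ok = finishes_before_backup(start, duration, backup_start)
    print(name, "finishes in time" if ok else "conflicts with backup")
```

In this example, database maintenance ends at 01:00 and does not conflict, but client package installation would still be running at 02:00 and should be rescheduled.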

Performing Recommended Tasks Before Site Backup
Run the following database consistency check tasks before performing a site backup to reduce the risk of a corrupted backup:

dbcc checkdb

dbcc checkcatalog

dbcc newalloc (SQL Server 6.5 only)

dbcc textalloc (SQL Server 6.5 only)
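The version-specific checks listed above can be collected into a single list of statements for a scheduled job. The following is a minimal sketch that only builds the statement list. It assumes that each check runs against the current database after a `USE` statement, which is an assumption that should be confirmed against the DBCC documentation for your version of SQL Server. The function name `pre_backup_checks` is hypothetical.

```python
def pre_backup_checks(database, sql_server_version):
    """Return the consistency-check statements to run before a site backup.

    Assumption: each DBCC statement is run without arguments against the
    database selected by the leading USE statement.
    """
    statements = ["USE " + database, "dbcc checkdb", "dbcc checkcatalog"]
    if sql_server_version == "6.5":
        # These two checks are listed above as applying to SQL Server 6.5 only.
        statements += ["dbcc newalloc", "dbcc textalloc"]
    return statements


checks_65 = pre_backup_checks("SMS", "6.5")
checks_70 = pre_backup_checks("SMS", "7.0")
print(checks_65)
print(checks_70)
```

The statements can then be run with the query tool of your choice and the output reviewed for errors before the backup starts.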

Backing Up the Whole Hierarchy
It is important to back up the whole hierarchy, but there is no advantage to backing up all the sites in the hierarchy simultaneously. You can back up and restore each site independently of the other sites.

Backup Utility Options
You can back up SMS by using the built-in automated site backup task, the backup utility in the operating system, or third party backup utilities.

If you choose to use any backup other than the automated site backup task, you must ensure that the backup cycle performs the same steps in the same order as those listed in the SMS backup control file, or there is a risk that the backup will not be valid.

It is critical that the site backup is a snapshot of all data and that all the processes that access the data are stopped. If such processes are not stopped, partially completed tasks cannot be synchronized with each other. This can cause problems after the site recovery. Therefore, the only supported site backup is one that is made as a snapshot of all data at a time when all processes that might access data are stopped.

To have a valid backup, you must carry out the same tasks in the same order as in the SMS backup control file. The simplest and most reliable plan is to run the Backup SMS Site Server task and use other backup applications to save the backups to tape.

Backup and Primary Sites

The SMS_SITE_BACKUP service performs primary site backups. This service is enabled and run according to a schedule you set for the Backup SMS Site Server task using the SMS Administrator console in Database Maintenance, Tasks:

Systems Management Server
	SMS site database (site code - site name)
		Site Hierarchy
			site code - site name
				Site Settings
					Database Maintenance
						Tasks

Note: By default, SMS_SITE_BACKUP is not enabled. Before you enable SMS_SITE_BACKUP, read this section to become familiar with SMS backup issues and tradeoffs.

SMS_SITE_BACKUP backs up the SMS site database, the software metering database, the site server's SMS and NAL registries, and the \SMS directory tree on the site server. SMS_SITE_BACKUP does not back up data at other sites or site systems.

The behavior of SMS_SITE_BACKUP is controlled by the SMS backup control file (on site servers in \SMS\Inboxes\Smsbkup.box\Smsbkup.ctl); edit this file to change what is backed up and what is not backed up. (For information about the SMS backup control file, see the SMS Online Help topic About the SMS Backup Control File.) However, before you make changes to this file, read this paper to become familiar with SMS_SITE_BACKUP issues.

There are three key things to keep in mind about SMS_SITE_BACKUP:

SMS_SITE_BACKUP stops all SMS services on SMS site servers and SMS site database servers while backup occurs. Therefore, you must consider the size of your hierarchy, the number of SMS packages it has, and your network throughput. For example, if your hierarchy is so busy that the SMS services that distribute packages run for 22 hours a day, then a backup task that runs for 3 hours a day will cause package distributions to get behind a little more each day and never catch up.

It is better to back up less data more often than to back up more data less often. Much of the SMS data is replicated to site systems (client access points, distribution points, and logon points). Although restoration is quicker if you back up all these site systems, it is not necessary to back them up, because SMS site servers can propagate data back to any site systems that fail.

SMS_SITE_BACKUP is designed primarily to back up only the site server and SMS site database server in an SMS site. Although you can use SMS_SITE_BACKUP to back up other data on other computers, such backups are subject to risks and complications that are not associated with backing up only the site server and SMS site database server.
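
The 22-hour example in the first point above can be checked with simple arithmetic. The following Python sketch is illustrative only: the function name is hypothetical, and it assumes that distribution work cannot proceed at all while the backup runs, as the text describes.

```python
def daily_backlog_hours(work_hours_per_day: float, backup_hours_per_day: float) -> float:
    """Hours of SMS work left undone each day when services stop during backup."""
    # Services are stopped for the whole backup, so only the rest of the day is usable.
    hours_available = 24.0 - backup_hours_per_day
    return max(0.0, work_hours_per_day - hours_available)


# Example from the text: 22 hours of package distribution and a 3-hour backup.
backlog = daily_backlog_hours(22, 3)
print(backlog)      # 1.0 hour of work carried over each day
print(backlog * 7)  # 7.0 hours behind after one week
```

A result of zero means the backup fits into the idle time of the site; any positive result grows without limit, which is the "never catch up" case.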

See the SMS Online Help in the SMS Administrator console for information about enabling, scheduling, and troubleshooting the Backup SMS Site Server task and for information about editing the SMS backup control file.

What Is Backed Up by Default
The default SMS backup control file backs up the following data:

The SMS site database and software metering database are backed up. The default SMS backup control file is configured for an SMS site in which the software metering database and the SMS site database are part of the same instance of SQL Server. (This is the default configuration created by SMS Setup.) If you move your software metering database to an instance of SQL Server separate from the SMS site database, you must uncomment lines in the SMS backup control file that start with "METERING" so that the server with your software metering database gets backed up along with the software metering database itself.

SQL Server configuration data is backed up. This includes all data for the SMS site databases and software metering databases: Master DB, MSDB, and Model DB.

The SMS directory tree on each site server is backed up. This includes all files in the SMS directory tree on each site server, but it does not include files on any other site systems.

All SMS and NAL registry keys are backed up.

What Is Not Backed Up by Default
The default SMS backup control file does not back up the data listed below. However, you can add commands to the backup control file's Tasks section to back up any files on the site server, or you can use non-SMS tools to back up these files.

SMS files on site systems other than the site server directory tree are not backed up. If you have several site systems, it might be more efficient to omit backing them up because their files exist on the site server, which is being backed up. Therefore, if a restoration ever becomes necessary because a system fails, the site server automatically propagates these files back to each site system within 24 hours.

Ideally, every SMS site has more than one computer performing each site system role, so if a client's default server is unavailable, the client searches its list for another site system with that role, and then uses that site system while the default site system is being restored.

Package source files are not backed up. For example, \Smspkg*.

Any SMS-related files moved from their default locations are not backed up. For example, custom MOF files and Health Monitor files.

Any SMS-related files never stored in the \SMS directory tree are not backed up. For example, Crystal Info for SMS and custom Y2K database files.

SMS accounts are not backed up. To ensure that SMS account data is safe:

Have a backup domain controller in each Windows NT domain where SMS accounts are defined in case the primary domain controller fails.

Periodically back up all domain controllers in domains where SMS accounts are defined.

Write down and save any changes made to SMS accounts and rights to shared directories so that you can make those changes again after a failure.

Custom SMS Administrator consoles or Network Monitor files are not backed up. Keep these files under the \SMS directory so they are automatically backed up.

SMS clients are not backed up. Ideally, the SMS data on clients should be backed up regularly; otherwise, assigned programs that had already run when a client failed can run again when the failed client is restored. SMS client data is stored in the system directory, so if that directory is backed up, the SMS client data is backed up.

Note: The client cannot be backed up without the risk of corrupting the client data on the disk or in the backup.

Other Data That Is Not Backed Up
Although tasks can be added to the SMS backup control file so that it backs up anything, it is safer and more convenient to place files somewhere under the \SMS directory to ensure that they are backed up each time SMS_SITE_BACKUP runs.

Alternatively, some data can be backed up by non-SMS backup tools, which reduces the time that SMS_SITE_BACKUP runs. This minimizes site downtime and a backlog of SMS tasks when the site resumes operating.

Scheduling SMS_SITE_BACKUP
SMS_SITE_BACKUP must run frequently enough to avoid a large difference between the data in the backup and the current data at the time of a failure. For example, if several SMS tasks are performed each hour, then SMS_SITE_BACKUP should run daily. If several SMS tasks are performed each day, backing up once or twice per week is acceptable.

SMS_SITE_BACKUP must run when the site server and SMS site database will not be disturbed. To avoid the risk of corrupting data, there should be no access to the SMS site server or the SMS site database while SMS_SITE_BACKUP runs. This means no packages can be sent, no SMS Administrator consoles can access the site, no Crystal Info reports can be run, and no data can be processed. Clients, however, will continue to have uninterrupted access to client access points, distribution points, and logon points.

Coordinating SMS_SITE_BACKUP with Other Backups
SMS_SITE_BACKUP writes to a location that can be backed up by another backup tool as part of your overall corporate backup plan. Therefore, be sure to coordinate these two activities so that SMS_SITE_BACKUP does not run at the same time as your corporate backup; otherwise, the SMS files will not be copied by your corporate backup tool, or SMS_SITE_BACKUP will not be able to write to those files while they are being copied by your corporate backup tool.
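
One way to confirm that the two schedules do not collide is to compare their time windows. The sketch below is a minimal illustration, assuming both windows fall within the same day and do not cross midnight; the function name and the example times are hypothetical.

```python
from datetime import time


def windows_overlap(start_a: time, end_a: time, start_b: time, end_b: time) -> bool:
    """Return True if two same-day time windows [start, end) overlap."""
    return start_a < end_b and start_b < end_a


# Hypothetical schedules: SMS backup 01:00-04:00, corporate backup 03:30-06:00.
print(windows_overlap(time(1, 0), time(4, 0), time(3, 30), time(6, 0)))  # True

# Moving the corporate backup to 04:30 removes the conflict.
print(windows_overlap(time(1, 0), time(4, 0), time(4, 30), time(6, 0)))  # False
```

Because the duration of SMS_SITE_BACKUP grows with the site, leave a margin between the expected end of one window and the start of the other.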

SMS_SITE_BACKUP overwrites any previous SMS_SITE_BACKUP files. Therefore, if you want to save multiple copies of SMS backups, your corporate backup tool must copy them (or you must copy them to a different directory) at least as often as SMS_SITE_BACKUP runs.
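
If you choose to copy the backup to a different directory yourself, a small script can keep each copy under its own timestamped folder. The following Python sketch is one possible approach, not an SMS feature: it assumes Python is available on the server, and the function name, folder prefix, and example paths are hypothetical.

```python
import shutil
from datetime import datetime
from pathlib import Path


def archive_backup(backup_dir: str, archive_root: str) -> Path:
    """Copy the latest backup output to a new timestamped folder and return its path."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    target = Path(archive_root) / f"smsbackup_{stamp}"
    # copytree creates the target folder and fails if it already exists,
    # so an earlier archive is never overwritten.
    shutil.copytree(backup_dir, target)
    return target


# Hypothetical usage, run after SMS_SITE_BACKUP has finished:
# archive_backup(r"D:\SMSBackup", r"E:\SMSBackupArchive")
```

Run the copy only after SMS_SITE_BACKUP has stopped, and at least as often as the backup itself runs, so that no backup is overwritten before it is archived.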

Running SMS_SITE_BACKUP Now
After the schedule for the backup is configured in the SMS Administrator console, it can take up to 24 hours for the schedule change to take effect. A backup can be started immediately, at any time, by starting the SMS_SITE_BACKUP service on the site server. When the backup cycle is completed, the service automatically stops itself. This technique can be useful to run the first backup right away or to run an unscheduled backup before doing unexpected hardware maintenance on the server.
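
Starting the service can also be scripted. The sketch below wraps the standard Windows net start command in Python; it assumes that it runs on the site server itself, under an account with rights to start services, and that Python is installed there. The function names are hypothetical.

```python
import subprocess

SERVICE_NAME = "SMS_SITE_BACKUP"  # service name as given in this article


def start_backup_command() -> list:
    """Build the command line that starts the backup service."""
    return ["net", "start", SERVICE_NAME]


def run_backup_now() -> int:
    """Start the service on the local site server and return the exit code of net."""
    return subprocess.run(start_backup_command()).returncode


print(" ".join(start_backup_command()))  # net start SMS_SITE_BACKUP
```

The same command can be typed directly at a command prompt on the site server; the wrapper is useful only if the unscheduled backup is part of a larger maintenance script.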

Upgrade Overwrites Backup Control File
If you are using a customized backup control file, save a copy of it before you upgrade. The file SMS\inboxes\smsbkup.box\smsbkup.ctl is overwritten during the upgrade. After upgrading your site, use the saved copy as a guide to update the new file.

There are significant differences between the backup control file (Smsbkup.ctl) for SMS 2.0 SP1 and SMS 2.0 SP2. The backup control file used for SMS 2.0 SP1 cannot be used for SMS 2.0 SP2, and it is overwritten during the upgrade process. If you have customized the backup control file, you must reproduce the customizations in the new version of the file.
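
When you reproduce customizations, a line-by-line comparison of the saved file and the new file shows what changed. The following Python sketch uses the standard difflib module; the function name and file labels are hypothetical, and the sample lines are placeholders rather than real control file syntax.

```python
import difflib


def control_file_diff(old_text: str, new_text: str) -> list:
    """Return unified-diff lines between the saved and the new control file text."""
    return list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile="smsbkup.ctl.saved",
        tofile="smsbkup.ctl",
        lineterm="",
    ))


# Placeholder content only; these are not real control file tasks.
saved = "standard task\ncustom task added by the administrator\n"
fresh = "standard task\n"
for line in control_file_diff(saved, fresh):
    print(line)
```

Lines that start with a minus sign exist only in the saved file and are candidates for customizations to carry forward. Because the file format differs between SP1 and SP2, review each one rather than copying it across unchanged.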

The SP1 version of the backup control file was based on the features that Express Setup installed, and it generated spurious errors if the site had been installed using Custom Setup with fewer features installed.

The SP2 version of the backup control file is based on a minimal installation of SMS. The backup tasks for the optional components are listed, but they are commented out. If optional components are installed on your site, you must uncomment the tasks for those components to include them in the backup.

If you are using the default backup control file and any of the following components are in use, edit the new Smsbkup.ctl file to remove the comments from the lines that refer to them.

Crystal Info for SMS

Network Monitor

SNMP Events

Software Metering

Backup and Secondary Sites

SMS_SITE_BACKUP does not back up any data on secondary sites except data that is already in the primary site's SMS site database. However, there is data on secondary site servers that can help restore them. Therefore, consider backing up secondary site servers periodically.

The Backup SMS Site Server task does not support secondary sites. You must use the backup control file on a primary site as a template and create a script to back up the secondary site. Because secondary sites do not have an SMS site database and do not have a remote SQL Server, there is no SQL Server interaction or interaction with remote site systems when you back up a secondary site. This greatly simplifies secondary site backup.

Evaluating Secondary Site Backup Requirements
You must evaluate backup requirements for each secondary site. Backup requirements depend on:

The availability of backup support at the site server location.

How mission-critical the site is.

Cost tradeoffs.

For example, you can take a different approach to backup in the following scenarios:

Recommended backup: a mission-critical site with a hundred local users, including a part-time server administrator; site has a tape backup unit on the local network.

Unnecessary backup: a non-critical remote office with only four permanent staff; site has no local tape backup unit.

Keep in mind that backup is a regular overhead cost, and with reliable hardware and a secure server location, failure rarely happens. Your combined goal is to reduce the cost of desktop management and maintain reasonable uptime.

If there is more than one CAP, logon point, or distribution point, even if the site server goes down, the client continues to install software and pass inventory up the hierarchy. Putting a second CAP, logon point, and distribution point on an extra server to provide better support for the clients at that site can be more cost-effective than running a regular backup on the site server.

Backing Up a Site Server's Client Files

The backup and restore operation is not available for the SMS client. Currently, SMS does not support stopping and restarting the client processes. If you back up the client data while the processes are active, you risk corrupting the data on the disk and the data in the backup.

Backing Up Site Systems

If CAPs or logon points fail, the easiest way to recover them is to delete the CAPs and logon points and let the site rebuild them from data stored on the site server. This ensures that the site is synchronized. However, if the CAP has many files, the site has many CAPs, and the network is slow, it might take a while to rebuild them all.

Backing Up Account Data

For increased security, you can configure SMS to use many accounts. If your site is configured this way, recreating all the accounts and passwords can be tedious even if you have recorded all the account names and passwords; without them, you must start over.

The best strategy is to use multiple domain controllers so that you do not have to back up account data. If you have only a single domain controller or no domain controllers in the site, back up the account information in \%SYSDIR%\System32\Config\Sam.

Recovery

There are two basic procedures for site recovery.

For site recovery with a backup, do the following:

1. Back up the registry keys, files, and if appropriate, the SMS site database.

2. Copy the backup to tape and store the tape.

3. After site failure, restore the backup and resynchronize and repair the site.

For site recovery without a backup, do the following:

1. After site failure, resynchronize and repair the site.

2. Recreate and redistribute the packages and advertisements.

Key Backup and Recovery Points

The following items are the key things to know about backup and recovery:

Sites can be backed up and restored.

Sites can be recovered without a backup by using the same site code as the failed site, but more data is lost than when a site is recovered from a backup.

Procedures exist to recover from losing data after a site recovery.

The more often you back up data, the less data is lost because of a site failure, although you can always lose some data. The administrators responsible for the site must decide the cost-benefit, break-even point of spending resources on backup versus spending resources dealing with partial data loss after a site failure. This applies to all SMS site systems.

Best Practices

What follows is a summary of the best practices discussed in this article.

Always run a full site recovery operation when you install a site using a previously used site server name or site code.

Always back up before and after you upgrade SMS, SQL Server, or the operating system.

Always back up after you change accounts.

Do not restore an SMS site database unless you have a recent backup. You can recover a site's functionality without restoring a backup. However, all inventory and status information, and most or all other data, is lost.

The more often you back up data, the less data is lost because of a site failure.

Back up the SMS site backup to tape, and store some backup tapes offsite.

Back up the site as a snapshot.
SMS data is stored in several places. When you are backing up a site, always stop the SMS site services and back up the following items, so they can be restored as a snapshot. Failing to follow this procedure is the most common cause of corrupted backups, and trying to restore these corrupted backups is the second most common cause of unsuccessful site recoveries. SMS data from all the following locations must be restored for an SMS site to function correctly:

SQL Server databases for the site.

Site server files from the site directory (from share SMS_<site code>).

The SMS and NAL registry keys.

Configuration information on computers running Windows NT and SQL Server.

Document your hierarchy.

Document your site codes, site names, and site server names to avoid naming conflicts.

Diagram your site hierarchy structure to make it easier to design plans for distributing software that makes best use of network bandwidth and to make troubleshooting problems easier as they flow through the hierarchy.

Document customizations to the operating system security.

Note domain accounts that have been placed in local groups, or vice versa.

Record auditing, account, and system policies that have been implemented.

Document account rights or permissions that have been altered.

Although custom SMS account names might be self-explanatory, document how the account names are used.

Verify that the backup control file matches the site configuration.

In SP1, the backup control file is based on a full installation of all components, and if software metering or Crystal Reports is not installed, backup generates error messages every time it runs. In SP2, some lines in the backup control file are commented out to avoid error messages, and if these are needed for a complete backup, the comment marks should be removed.

Poor Practices

The following list includes backup and recovery practices you should avoid.

Disconnecting a child site from a failed site does not protect it from, or clean up, any damage that might have already occurred. Disconnecting a child site from a failed site does not protect it if mistakes are made during recovery and it is later reconnected to any site below the failed site in the hierarchy.

To protect a child site, always check for failed site recoveries at the new parent site, and all sites above it, before connecting a child site to the new parent site. Checking for failed recoveries helps to avoid corrupting software distribution objects at the connecting site and at all of its child sites.

Note: Connecting to sites that were not correctly recovered is the most common cause of corrupting software distribution data for the whole hierarchy.

Recovering a site without a valid backup results in more data loss and a longer time to re-establish regular operations than recovering a site with a valid backup.

Running backup while the SMS site is running causes relational integrity problems. To avoid these problems in the site backup, stop site services before running backup, and then run the backup. The Backup SMS Site Server task stops the services automatically.

Backing up only the SMS site database or only the site server files means that only part of the site data can be restored, which results in a completely broken site.

Failing to record account passwords or server configuration information before site failure is problematic. If you do not have account passwords or server configuration information, a site failure is much more disruptive, and recovery is much more problematic than if you do have this information.

Incorrectly synchronizing serial numbers after reinstalling a site causes problems. The worst case that can result from serial number problems for software distribution objects is that you must manually delete all software distribution packages and advertisements files and records from all site servers, site systems, and clients under the failed site.

Using the backup task on a parent site to back up a secondary site can cause problems:

Usually a secondary site is connected to a parent site through a poor quality network connection; otherwise, the clients on the secondary site are usually included only as members of the parent site.

The backup task was designed to back up local data, and it cannot compensate for the problems caused by poor quality network connections. A poor quality network connection could cause a corrupted backup, which would cause a failed site recovery.

There is no throttling support in the backup task, so it could saturate the network connection of a slow link for an extended period.

All the sites can be inactive at the same time if multiple backups are running, which can be a significant problem for a parent site with many secondary sites. It can also be a problem for a site that is close to the limit of keeping up with its load and that must be operational as many hours a week as possible.

Backing up the SMS client while it is running can cause data corruption, as can trying to stop the client, so the client must not be backed up at this time.

Don't use spaces in the backup destination name.

In SP1, if the site directory is "D:\SMS" and the backup destination name is "D:\SMS Backup", backup deletes the site when it cleans up the old backup.

In SP2, double quotes were added around the token for backup destination on the line where the old backup is deleted, which fixes the problem in SP1. However, since much of the configuration information is collected using command-line tools, and some of them can't accept a destination name with spaces in it, some configuration information is not backed up if there are spaces in the destination name.
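
A simple check before you configure the backup task can catch a destination name that contains spaces. The Python sketch below is illustrative; the function name and the second example path are hypothetical.

```python
def destination_is_safe(path: str) -> bool:
    """Return True if the backup destination path contains no space characters."""
    return " " not in path


print(destination_is_safe(r"D:\SMS Backup"))  # False: the problem case described above
print(destination_is_safe(r"D:\SMSBackup"))   # True
```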

Recovery Tools

The SMS Recovery Expert is built into the SMS Maintenance and Recovery area of this site, and the other tools are included in the Recovery Tools.

SMS Recovery Expert

The SMS Recovery Expert generates a complete list of required tasks to recover a failed SMS site based on your specific scenario and configuration. This list is broken into the following phases:

Prepare

Rebuild

Restore

Repair

You must always run the SMS Recovery Expert. When the SMS Recovery Wizard is available, the list of tasks that the SMS Recovery Expert generates indicates when to run the SMS Recovery Wizard.

Recovery tasks are included only in the SMS Recovery Expert, because it is beyond the scope of a single document to explain every type of recovery scenario. For a list of all possible recovery tasks, see All SMS Recovery Tasks (note that clicking on this link will open a new instance of the browser).

If a failed SMS site is on a secure network that does not have access to the recovery page on the Microsoft.com Web site, download the recovery tools and documents from the Web page to removable media, print out the recovery tasks, and then move the media to the secure network. You can also call PSS for assistance.

SMS Recovery Wizard

The SMS Recovery Wizard automates some of the repair and resynchronization tasks that are required to complete a recovery. It also does some repair that is impossible to do manually. Running the SMS Recovery Wizard is optional, but if you do run it, you must run the SMS Recovery Wizard after you use the SMS Recovery Expert.

ACL Reset Tool (ACLreset.exe)

Use ACL Reset to reset the access control lists used by the SMS Server Connection account. Run ACL Reset each time you create a new SMS Server Connection account even if you recreate it with the same name. However, you do not have to use this tool when you first set up a site.

Hierarchy Maintenance Utility (Preinst.exe)

The Hierarchy Maintenance Utility (Preinst.exe), formerly called Site Utilities Tool, passes commands to Hierarchy Manager while Hierarchy Manager is running. Use Preinst.exe to diagnose problems in a site, repair a site, or stop all SMS services at a site. For example, suppose that you remove an SMS site incorrectly by removing a child site from its parent site without detaching it first. You can use Preinst.exe to bypass the SMS Administrator console and delete the incorrectly removed site from the parent SMS site database.

Unenforce Software Metering (Unenforce.exe)

Use Unenforce Software Metering (Unenforce.exe) to turn off software metering enforcement. Usually, it is best to do this through the Software Metering Management Tool in the SMS Administrator console. However, if software metering enforcement is turned on and the site server fails, you can use Unenforce.exe to quickly turn off software metering enforcement. This prevents users from being denied access to their applications due to license balancing errors. Unenforce.exe works by setting the BIPASSIVE flag to "1" in the Microsoft Visual FoxPro® database on the software metering server.

