The Microsoft IT group backs up its clustered Microsoft® Exchange Server 2003
mailbox infrastructure by using a modified version of the Backup feature in Microsoft
Windows Server™ 2003. The modifications enable Microsoft IT to increase throughput
and back up the data faster, thereby meeting service level agreements (SLAs) despite
the very large quantity of data backed up nightly.
Introduction
The Microsoft IT group uses a two-stage process to complete the backup requirements
in its clustered Exchange Server 2003 environment.
The process is discussed at a high level in the IT Showcase technical case study
"Messaging Backup and Restore at Microsoft." The IT Showcase white paper
"Exchange 2003 Design and Architecture at Microsoft" also discusses this topic.
This document defines the setup process and server optimizations required to complete
the Exchange Server 2003 backup process that Microsoft IT uses. This document
assumes that readers are either Exchange architects or technical implementers and
are already familiar with Exchange Server 2003, Windows Server 2003 clusters,
and Exchange server backup procedures.
Note For security reasons, the sample names of forests, domains, internal resources,
organizations, and internally developed applications and files used in this document
do not represent actual names used within Microsoft and are for illustration purposes
only. In addition, the contents of this document describe how Microsoft IT runs its
enterprise data center. The procedures and processes included in this document are
not intended to be prescriptive guidance on how to run a generic data center and
may not be supported by Microsoft Customer Support Services.
Cluster Setup
The cluster setup steps assume that the Exchange storage groups and mail stores
already exist. The initial steps require specific cluster configurations that allow
disk resources to move between nodes without affecting the Exchange clustered resources.
Naming of Clustered Servers
Microsoft IT uses a multi-node clustered design in which the largest cluster configuration
contains seven nodes. There are four active (A) nodes, one primary passive (P) node,
and two alternate passive (p) nodes. Such a cluster is abbreviated as AAAAPpp. Each
node configuration provides a different level of functionality specifically to support
the backup process.
Servers in the Exchange server clusters are named through a specific, three-part
naming convention (xxx-yyyy-n) to allow for easy recognition of server roles.
In this naming convention:
- The xxx part represents the geographic location of the server.
- The yyyy part identifies the default role of the server in the cluster.
- The n part is the numeric representation of the server within the cluster.
Although the geographic location code is incidental to this discussion, clearly identifying
the server roles is necessary for understanding the Microsoft IT Exchange server
backup process. Table 1 lists the role definitions.

Table 1 Roles of Servers in an Exchange Server Cluster

| Role | Definition |
|------|------------|
| ACTV | Active node |
| PASS | Passive node (both primary and alternate) |

For example, the second active node in an Exchange server cluster located in a Texas
data center would be named TEX-ACTV-2.
Naming of Cluster Resource Groups
A resource group is a collection of cluster resources that moves as a unit. Each
resource group can be moved within the cluster independently of other resource groups.
The cluster is configured to support multiple resource groups through a three-part
naming convention similar to the server naming convention (xxx-zzzz-n) to
allow for easy recognition of resource group functions. Each resource group in a
cluster represents a virtual server for that cluster. In this naming convention:
- The xxx part represents the geographic location of the Exchange instance.
- The zzzz part identifies the role of the Exchange instance.
- The n part is the numeric representation of the Exchange instance within the cluster.
Clearly identifying the resource group roles is also necessary for understanding
the Microsoft IT Exchange server backup process. Table 2 lists the role definitions.

Table 2 Roles of Resource Groups in an Exchange Server Cluster

| Role | Definition |
|------|------------|
| MAIL | Mailbox group. This resource group maintains all resources to support Exchange functionality. There are four MAIL groups within the seven-node cluster to maintain four active Exchange virtual instances. |
| BACK | Backup group. This resource group maintains an Internet Protocol (IP) address, a network name, a physical disk, and file share resources to support the disk-to-disk and disk-to-tape backup processes. There are four BACK groups within the seven-node cluster; one backup group is assigned to each instance of an Exchange mail server. |

For example, the first mailbox resource group in an Exchange server cluster located
in a Texas data center would be TEX-MAIL-1.
Cluster Layout
In the Microsoft IT configuration, each cluster contains its own resource groups
(defined by role) and server names (defined by default status). Each of the four
active-node servers has its own MAIL resource group and BACK resource group. This
resource group layout is required to support the two-stage backup process
that Microsoft IT uses.
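The naming rules and the AAAAPpp layout can be captured in a few lines of Python, which is handy for validating inventories or generating scripts. This is a minimal sketch, not a Microsoft IT tool: the helper names parse_name and cluster_layout, the assumption that the location code is exactly three letters, numbering nodes from 1, and pairing MAIL-n and BACK-n with ACTV-n as the default placement are all illustrative assumptions.

```python
import re
from dataclasses import dataclass

# Role codes from Table 1 (servers) and Table 2 (resource groups).
SERVER_ROLES = {"ACTV": "Active node", "PASS": "Passive node (primary or alternate)"}
GROUP_ROLES = {"MAIL": "Mailbox group", "BACK": "Backup group"}

# Assumed shape of xxx-yyyy-n: three-letter location, four-letter role, number.
NAME_PATTERN = re.compile(r"^([A-Z]{3})-([A-Z]{4})-(\d+)$")


@dataclass(frozen=True)
class ClusterName:
    location: str
    role: str
    index: int
    kind: str  # "server" or "resource group"


def parse_name(name: str) -> ClusterName:
    """Split a server or resource group name into its three parts."""
    match = NAME_PATTERN.match(name.upper())
    if match is None:
        raise ValueError(f"{name!r} does not follow the xxx-yyyy-n convention")
    location, role, index = match.groups()
    if role in SERVER_ROLES:
        kind = "server"
    elif role in GROUP_ROLES:
        kind = "resource group"
    else:
        raise ValueError(f"unknown role code {role!r} in {name!r}")
    return ClusterName(location, role, int(index), kind)


def cluster_layout(location: str, active: int = 4, passive: int = 3) -> dict:
    """Model an AAAAPpp cluster: each active node owns one MAIL and one BACK group."""
    servers = [f"{location}-ACTV-{i}" for i in range(1, active + 1)]
    servers += [f"{location}-PASS-{i}" for i in range(1, passive + 1)]
    groups = {
        f"{location}-ACTV-{i}": [f"{location}-MAIL-{i}", f"{location}-BACK-{i}"]
        for i in range(1, active + 1)
    }
    return {"servers": servers, "groups": groups}
```

For example, parse_name("TEX-ACTV-2") returns ClusterName(location='TEX', role='ACTV', index=2, kind='server'), and cluster_layout("TEX") lists seven servers with TEX-MAIL-1 and TEX-BACK-1 placed on TEX-ACTV-1.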
Custom Backup Configurations
To accomplish its backup process, Microsoft IT customized the configuration of the
built-in Backup feature of Windows Server 2003 by modifying the registry, creating
backup selection files, and creating backup scripts.
Modifying the Registry
The following steps detail how Microsoft IT implemented specific registry modifications
that optimized the data throughput of the built-in Backup engine:
Warning Incorrectly editing the registry can have serious,
unexpected consequences that can prevent the system from starting and require you
to reinstall Microsoft Windows®. A recommended best practice is to export the subkey
before editing it. You can also set a System Restore point prior to using Registry
Editor.
1. Start a Remote Desktop Connection session to xxx-ACTV-0 by using an account
that will be assigned to support the scheduled tasks for Backup. At a minimum, the
account needs Exchange View Only Administrator permissions and must be a member of
the Backup Operators group on each node within the cluster.
2. Start Regedit.exe, and then browse to HKEY_CURRENT_USER\Software\Microsoft\Ntbackup.
Note Subkeys exist at this location in the registry only if the Backup feature
was previously run to back up data on the computer. If subkeys exist, skip to step 4;
if they do not exist, continue with the next step.
3. Start Backup, and then back up a single, small file to disk. This action causes
Backup to create the necessary registry entries. After Backup has finished, you can
delete the backup file that you created. Close Backup.
4. Expand HKEY_CURRENT_USER\Software\Microsoft\Ntbackup to reveal the subkeys.
5. Browse to the HKEY_CURRENT_USER\Software\Microsoft\Ntbackup\Backup Engine subkey.
6. Change the value of the Logical Disk Buffer Size entry from 32 to 64.
7. Change the value of the Max Buffer Size entry from 512 to 1024.
8. Change the value of the Max Num Tape Buffers entry from 9 to 16.
9. Close Regedit.exe.
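Because the three values live under HKEY_CURRENT_USER, they must be set while logged on as the backup account. The sketch below is a hypothetical helper, not part of the Microsoft IT process: it records the default and tuned values from the steps above and renders them either as a .reg file that Regedit can import or as equivalent reg.exe commands. Storing the values as strings (REG_SZ) follows the reg add commands used in the backup script later in this document.

```python
# Backup Engine values changed in the steps above: name -> (default, tuned).
BACKUP_ENGINE_KEY = r"HKEY_CURRENT_USER\Software\Microsoft\Ntbackup\Backup Engine"
TUNED_VALUES = {
    "Logical Disk Buffer Size": ("32", "64"),
    "Max Buffer Size": ("512", "1024"),
    "Max Num Tape Buffers": ("9", "16"),
}


def reg_file(values: dict = TUNED_VALUES) -> str:
    """Render the tuned values as the text of a .reg file."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{BACKUP_ENGINE_KEY}]"]
    # REG_SZ values are written as quoted strings in .reg files.
    lines += [f'"{name}"="{tuned}"' for name, (_default, tuned) in values.items()]
    return "\r\n".join(lines) + "\r\n"


def reg_add_commands(values: dict = TUNED_VALUES) -> list:
    """Build the equivalent reg.exe commands, one per value."""
    short_key = BACKUP_ENGINE_KEY.replace("HKEY_CURRENT_USER", "HKCU", 1)
    return [
        f'reg add "{short_key}" /v "{name}" /t REG_SZ /d {tuned} /f'
        for name, (_default, tuned) in values.items()
    ]
```

To save the .reg output, open the file with newline="" so that the explicit CRLF line endings are not translated a second time on Windows.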
Creating Backup Selection Files
Microsoft IT uses all four available storage groups, for a total of 20 mail database
stores per Exchange virtual instance. Microsoft IT's mail database stores average
between 30 and 50 gigabytes (GB) in size. Based on both the sizes of the mail database
stores within each storage group and the goal of minimizing the time needed for data
recovery, Microsoft IT backs up Exchange at the mail database store level rather
than the storage group level. More specific reasons for backing up this way include:
- It reduces the size of the .bkf flat file created on disk during the backup process.
A storage group backup would produce one ~200-GB flat file, whereas a mail database
store backup produces multiple ~30-GB to ~40-GB files (one per mail store). The
time savings are significant: if an error occurs while this content is being streamed
to tape, the retry involves a ~30-GB to ~40-GB file rather than a ~200-GB file.
- It allows for running more concurrent jobs when streaming to tape (one scheduled
job per flat file), depending on the number of tape devices available to process
requests.
However, backing up by mail store rather than by storage group makes recovery more
complicated when an entire storage group has to be recovered, because of log replay
requirements. Other IT organizations that manage smaller mail stores may choose to
perform storage group backups for their comparative simplicity.
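The retry argument above can be made concrete with the document's own approximate figures. This sketch assumes a failed tape job re-streams its whole .bkf file, uses the upper ~40-GB per-store size, and borrows the ~1.6 GB per minute per tape stream reported in the findings at the end of this document; substitute your own measurements.

```python
# Approximate figures from this document; replace them with your own measurements.
STORES_PER_SERVER = 20      # mail database stores per Exchange virtual instance
STORAGE_GROUPS = 4          # storage groups per Exchange virtual instance
STORE_FILE_GB = 40          # upper end of the ~30-GB to ~40-GB per-store .bkf files
TAPE_GB_PER_MIN = 1.6       # per-stream tape rate reported in the findings


def retry_minutes(file_gb: float, gb_per_min: float = TAPE_GB_PER_MIN) -> float:
    """Minutes to re-stream one .bkf file to tape, assuming the whole file is resent."""
    return file_gb / gb_per_min


stores_per_group = STORES_PER_SERVER // STORAGE_GROUPS   # 5 stores per storage group
group_file_gb = stores_per_group * STORE_FILE_GB         # ~200-GB storage group file

print(f"storage group retry: {retry_minutes(group_file_gb):.0f} min")  # 125 min
print(f"mail store retry:    {retry_minutes(STORE_FILE_GB):.0f} min")  # 25 min
```

Under these assumptions, a single tape failure costs roughly 25 minutes of re-streaming instead of about two hours.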
The following steps can be used to create a preconfigured storage group backup job
for each storage group:
1. Start a Remote Desktop Connection session to xxx-ACTV-0 by using an account
assigned to support the scheduled tasks for Backup.
2. Start Backup.
3. Click the Backup tab.
4. Expand Microsoft Exchange Server, and then expand the Exchange virtual server
(xxx-MAIL-0 in this example).
5. Expand Microsoft Information Store.
6. Select the first storage group.
7. Click the Job menu, and then click Save Selection As.
8. Type SG1 for the file name (as an example for the first storage group),
and then click Save to store the file in the default folder on disk: C:\Documents
and Settings\user_name\Local Settings\Application Data\Microsoft\Windows
NT\NTBackup\data. The user name in this path is that of the account referenced in
step 1.
9. Clear the selection of the first storage group.
10. Repeat steps 6 through 9 for the remaining storage groups, using SG2, SG3, and
SG4 as the file names for the corresponding storage groups, if required.
The goal of these steps is to create the Backup selection files (.bks) for each
storage group.
Note Never modify the selection files after they are created,
or they will be rendered useless and will have to be re-created through the preceding
steps.
Creating Backup Scripts
Microsoft IT created command-prompt backup scripts for its full nightly backup jobs.
The following command-prompt statements were collected in the Sg1.cmd file to back
up the first storage group. The command file automatically re-applies the relevant
registry modifications for the Backup feature to ensure optimized throughput.

```bat
reg add "HKCU\Software\Microsoft\Ntbackup\Backup Engine" /v "Logical Disk Buffer Size" /t REG_SZ /d 64 /f
reg add "HKCU\Software\Microsoft\Ntbackup\Backup Engine" /v "Max Buffer Size" /t REG_SZ /d 1024 /f
reg add "HKCU\Software\Microsoft\Ntbackup\Backup Engine" /v "Max Num Tape Buffers" /t REG_SZ /d 16 /f
rem Back up the SG1 selection file to a .bkf file on the U: drive.
%SystemRoot%\system32\ntbackup.exe backup "@C:\Documents and Settings\user_name\Local Settings\Application Data\Microsoft\Windows NT\NTBackup\data\Sg1.bks" ^
  /n "SG1" /d "SG1" /v:no /r:no /rs:no /hc:off /m normal /j "SG1" /l:s /f "U:\Sg1.bkf"
```

Note Ntbackup.exe is the command-line executable file that provides the services
associated with the Backup feature.
The following are definitions for the subset of command-prompt switches used with
Ntbackup.exe in the preceding example:
- /d "set description" (specifies a label for each backup set)
- /f "file name" (specifies the path and file name of the .bkf file)
- /hc:[on/off] (turns hardware compression on or off)
- /j "job name" (specifies the job name used in the log file)
- /l:[f/s/n] (specifies the log file type: full, summary, or none)
- /m [normal/copy/differential/incremental/daily] (specifies the backup type)
- /n "media name" (specifies a new media name)
- /r:[yes/no] (restricts access to the owner or members of the Administrators group)
- /rs:[yes/no] (backs up migrated data files located in Remote Storage)
- /v:[yes/no] (verifies the data after the backup is complete)
Note For a complete list of command-prompt switches and their definitions, see the
Ntbackup Web page.
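Writing four nearly identical .cmd files by hand invites typos in the paths. The sketch below is a hypothetical generator, not the script Microsoft IT used: USER_NAME and TARGET_DRIVE are placeholders taken from the example paths, %SystemRoot% is assumed to locate Ntbackup.exe, and only the SG1 file re-applies the registry values, matching the approach described in the next paragraph.

```python
import ntpath

# Hypothetical values: substitute the backup account and the disk-to-disk target.
USER_NAME = "user_name"
TARGET_DRIVE = "U:"
BKS_DIR = ntpath.join(
    r"C:\Documents and Settings", USER_NAME,
    r"Local Settings\Application Data\Microsoft\Windows NT\NTBackup\data",
)
BACKUP_KEY = r"HKCU\Software\Microsoft\Ntbackup\Backup Engine"
TUNED = {"Logical Disk Buffer Size": 64, "Max Buffer Size": 1024, "Max Num Tape Buffers": 16}


def ntbackup_line(sg: str, unbuffered: bool = False) -> str:
    """Build the Ntbackup.exe command for one storage group selection file."""
    base = sg.capitalize()                      # "SG1" -> "Sg1"
    bks = ntpath.join(BKS_DIR, base + ".bks")
    bkf = ntpath.join(TARGET_DRIVE + "\\", base + ".bkf")
    switches = f'/n "{sg}" /d "{sg}" /v:no /r:no /rs:no /hc:off'
    if unbuffered:
        switches += " /fu"                      # only valid with the revised Ntbackup.exe
    return (f'%SystemRoot%\\system32\\ntbackup.exe backup "@{bks}" '
            f'{switches} /m normal /j "{sg}" /l:s /f "{bkf}"')


def command_file(sg: str, include_registry: bool, unbuffered: bool = False) -> str:
    """Return the text of one SgN.cmd file; only SG1 re-applies the registry values."""
    lines = []
    if include_registry:
        lines += [f'reg add "{BACKUP_KEY}" /v "{name}" /t REG_SZ /d {value} /f'
                  for name, value in TUNED.items()]
    lines.append(ntbackup_line(sg, unbuffered))
    return "\r\n".join(lines) + "\r\n"


scripts = {f"Sg{i}.cmd": command_file(f"SG{i}", include_registry=(i == 1))
           for i in range(1, 5)}
```

Each entry in scripts can then be written to the designated script folder with open(name, "w", newline="") so that the CRLF line endings are preserved.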
Microsoft IT created additional .cmd files that reference the other selection files
for SG2, SG3, and SG4. These .cmd files were created without the registry modification
commands. Microsoft IT then copied all backup .cmd files to a folder designated
for storing them.
Creation of Scheduled Tasks
Microsoft IT uses the Scheduled Tasks application to automate the nightly full backup
process.
Creating Schedules for Backup Jobs
Microsoft IT uses the following procedure to set up the scheduled tasks that run the
backup jobs:
1. Start a Remote Desktop Connection session to xxx-ACTV-0 by using an account
assigned to support the scheduled tasks for Backup.
2. Click Start, point to All Programs, point to Accessories,
point to System Tools, and then click Scheduled Tasks to start the
application.
3. Click File, and then click New to create a blank scheduled task
(use a name such as SG1).
4. Open the properties of the new scheduled task, and then type the path of the
Sg1.cmd file created earlier.
5. Set the account details and password under which the new scheduled task will
run. Use the same account used in step 1.
6. Click the Schedule tab, and then define a start time. For example, use
8:00 PM for the first backup job (SG1).
7. Click OK to save the modifications to the scheduled task.
8. Create new scheduled tasks for the remaining storage group backup jobs. Schedule
the backup jobs for SG2, SG3, and SG4 to start five minutes after SG1 to ensure that
the registry modifications for Backup are in place.
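The same four tasks could also be created from the command line with schtasks.exe instead of the Scheduled Tasks UI. This is a sketch under stated assumptions: SCRIPT_DIR and RUN_AS are hypothetical placeholders, the HH:MM:SS start-time format matches the Windows Server 2003 version of schtasks, and omitting /rp is assumed to make schtasks prompt for the password. Check the syntax of the schtasks version you actually run.

```python
import ntpath
from datetime import datetime, timedelta

# Hypothetical folder holding the .cmd files and the account that runs them.
SCRIPT_DIR = r"C:\BackupScripts"
RUN_AS = r"DOMAIN\backup_account"


def backup_schedule(first_start: str = "20:00", offset_minutes: int = 5) -> dict:
    """SG1 starts first; SG2-SG4 start later, after SG1 has re-applied the registry values."""
    sg1 = datetime.strptime(first_start, "%H:%M")
    later = sg1 + timedelta(minutes=offset_minutes)
    return {f"SG{i}": (sg1 if i == 1 else later).strftime("%H:%M:%S") for i in range(1, 5)}


def schtasks_commands(schedule: dict) -> list:
    """One schtasks.exe /create command per job; /rp is omitted so a password prompt appears."""
    commands = []
    for job, start in schedule.items():
        script = ntpath.join(SCRIPT_DIR, job.capitalize() + ".cmd")
        commands.append(f'schtasks /create /tn "{job}" /tr "{script}" '
                        f'/sc daily /st {start} /ru {RUN_AS}')
    return commands
```

With the defaults, SG1 is scheduled for 20:00:00 and SG2 through SG4 for 20:05:00, mirroring the example times in the procedure above.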
Creating Move Schedules for Cluster Resource Groups
Microsoft IT creates move schedules for cluster resource groups. This action ensures
that the BACK resource group is active on the same node as the MAIL resource group
when the first-stage, disk-to-disk backup is scheduled to start.
The use of these scheduled tasks ensures that the xxx-BACK-0 group is moved
between active and alternate passive nodes to process the first-stage and second-stage
backups.
To create a scheduled task that will move xxx-BACK-0 to the active node:
1. Start a Remote Desktop Connection session to xxx-ACTV-0 by using an account
assigned to support the scheduled tasks for Backup.
2. Start Scheduled Tasks.
3. Create a new scheduled task called MoveACTV.
4. Open the properties of the scheduled task, and then type C:\Windows\system32\cluster.exe
group xxx-BACK-0 /move:xxx-ACTV-0 in the Run box.
5. Set the account details and password under which the new scheduled task will
run. This account should be the same account used in step 1.
6. Click the Schedule tab, and then define a start time. For example, use
7:00 PM to ensure that the move is completed before the 8:00 PM scheduled backup.
To create a scheduled task that will move xxx-BACK-0 to an alternate passive
node:
1. Create a new scheduled task called MovePASS.
2. Open the properties of the scheduled task, and then type C:\Windows\system32\cluster.exe
group xxx-BACK-0 /move:xxx-PASS-2 in the Run box.
3. Set the account details and password under which the new scheduled task will
run. This account should be the same account used in step 1 of the preceding procedure.
4. Click the Schedule tab, and then define a start time. For example, use
1:30 AM to allow enough time for the backup that starts at 8:00 PM to be completed.
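Because nothing checks automatically that the disk-to-disk stage has finished before MovePASS runs, a rough estimate of the finish time is a useful sanity check when choosing the move time. This sketch assumes constant throughput, about 200 GB per storage group (five stores of roughly 40 GB), the approximately 1.2 GB per minute per storage group reported in the findings, and a hypothetical 60-minute safety margin. It does not replace monitoring the actual end time.

```python
from datetime import datetime, timedelta


def estimated_finish(start: str, data_gb: float, gb_per_min: float) -> datetime:
    """Estimate when a disk-to-disk job ends, assuming constant throughput."""
    return datetime.strptime(start, "%H:%M") + timedelta(minutes=data_gb / gb_per_min)


def move_is_safe(job_starts: dict, data_gb: float, gb_per_min: float,
                 move_at: str = "01:30", margin_minutes: int = 60) -> bool:
    """True if every job should end at least margin_minutes before MovePASS runs."""
    first = min(datetime.strptime(s, "%H:%M") for s in job_starts.values())
    move = datetime.strptime(move_at, "%H:%M")
    if move <= first:           # the move runs after midnight, on the following day
        move += timedelta(days=1)
    latest = max(estimated_finish(s, data_gb, gb_per_min) for s in job_starts.values())
    return latest + timedelta(minutes=margin_minutes) <= move
```

With starts of 20:00 for SG1 and 20:05 for SG2 through SG4, 200 GB per job at 1.2 GB per minute gives an estimated finish around 22:51, so move_is_safe returns True for a 1:30 AM move.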
Note There is no automation in place to monitor the end time of the first-stage,
disk-to-disk backup. The process needs to be monitored manually to ensure that the
resource groups are not moved before the disk-to-disk backup is completed.
Issues
Microsoft IT encountered one key issue that needed to be resolved before the solution
was ready to be implemented.
Using the version of the Backup feature included with Windows Server 2003, Microsoft
IT detected a reduction in sustainable throughput over a 20-minute period. A cache
contention issue caused the reduction in throughput. This issue was further aggravated
by the increased backup throughput available as a result of the registry modifications
to the Backup engine.
Microsoft IT noticed a negative trend in performance when running four concurrent
backups (one per storage group). Processor utilization increased over time, accompanied
by a corresponding increase in processor utilization by the System process. These
increases correlated with a simultaneous decrease in sustainable throughput. All of
these symptoms can be detected through performance monitoring.
To eliminate the cache contention problem, Microsoft IT used a revised version of
Backup that provides a new command-prompt switch. The switch enables a "file unbuffered"
setting that bypasses the cache manager. This change provides a number of benefits during
the disk-to-disk backup process:
- Sustainable throughput over time
- Reduction in processor utilization (peak utilization reduced to 30 percent
on average)
- Elimination of impacts to the System process during the backup job
The revised version of Backup that supports the file unbuffered (/fu) switch is
targeted for general delivery with Service Pack 1 for Windows Server 2003. However,
the revised version of Ntbackup.exe can be directly downloaded as a hotfix through
the Microsoft Knowledge Base article
"System performance is negatively affected when Ntbackup.exe writes to a destination
.bkf file".
For users who have deployed the revised version of Ntbackup.exe, the Ntbackup.exe
command in the backup script defined earlier in this document should be modified to
include the /fu switch. A modified example of that command is as follows:

```bat
%SystemRoot%\system32\ntbackup.exe backup "@C:\Documents and Settings\user_name\Local Settings\Application Data\Microsoft\Windows NT\NTBackup\data\Sg1.bks" ^
  /n "SG1" /d "SG1" /v:no /r:no /rs:no /hc:off /fu /m normal /j "SG1" /l:s /f "U:\Sg1.bkf"
```

To illustrate the dramatic improvement in backup performance associated with the
/fu switch, a pair of System Monitor screenshots was taken of an Exchange mailbox
server running Ntbackup.exe. The first image shows the server running Ntbackup.exe
without the /fu switch. As shown in Figure 1, the disk write bytes per second
(white line) of Ntbackup.exe continually degraded over time, while the processor
utilization associated with the System process (pink line) quickly spiked to 100
percent and remained there. As a result, the total processor utilization (red line)
started relatively high and continued to increase for the duration of the backup.
Figure 1 Running Ntbackup.exe without the /fu switch
On the same server, backing up the same data under the same load, the /fu switch
was added to the backup script and a second System Monitor screenshot was captured,
as shown in Figure 2.
Figure 2 Running Ntbackup.exe with the /fu switch
In contrast to the previous image, Figure 2 shows how the disk write bytes per second
(white line) of Ntbackup.exe immediately reached a much higher level and maintained
that level throughout the backup process. The processor utilization associated with
the System process (pink line) started near zero and was largely unaffected by running
Ntbackup.exe. As a result, the total processor utilization (red line) started relatively
low and remained at that level for the duration of the backup.
Findings and Recommendations
The optimizations and throughputs that Microsoft IT achieved in its two-stage backup
process are based on its specific hardware platform, which was tuned for optimized
throughput. Microsoft IT recommends that any enterprise planning an Exchange
deployment investigate, during the infrastructure design stage, the optimized
throughput of a proposed storage solution. The findings of Microsoft IT include
the following:
- Using the updated version of Ntbackup.exe enables the new /fu switch, which
increases initial throughput and sustains it over time, with a substantial reduction
in processor utilization and no impact on the System process.
- Individual backup throughput per storage group can be sustained at approximately
1.2 GB per minute.
- Total throughput can be sustained at approximately 4.8 GB per minute per Exchange
virtual server with four concurrent backups running.
- Some storage area network (SAN) enclosures may not be able to support such high
levels of sustained throughput. The enclosure used by Microsoft IT sustains between
6 GB and 7 GB per minute, which enables Microsoft IT to run eight concurrent backups
across two Exchange virtual servers per SAN.
- Sustainable throughput of the disk-to-disk backup process should be monitored
with System Monitor in Windows Server 2003 during the testing stages. Specifically,
monitor the total Disk Write Bytes/sec performance counter.
- Throughput to tape can be sustained at a rate of 1.6 GB per minute per stream
with up to four concurrent streams to four LTO1 tape devices, as validated in production.
- Microsoft IT achieved the best throughput by distributing Logical Unit Numbers
(LUNs) evenly across SAN controllers, using a third-party multipath software solution.
Microsoft IT's configuration uses the following design:
  - SG1 and SG2 data, log, and backup disks per virtual server on controller 1
  - SG3 and SG4 data, log, and backup disks per virtual server on controller 2
- Restore rates of approximately 2 GB per minute can be achieved for disk-to-disk
restoration. This throughput is achievable when the disks being written to are not
under any production load.
- Microsoft IT configured the backup target disks by using redundant array of
independent disks level 5 (RAID 5) as a cost-effective use of disk resources.
- Microsoft IT disabled mirrored write-back cache on all backup targets to eliminate
a potential degradation of disk write performance.
Note This setting is specific to HP StorageWorks Enterprise Virtual Array 5000,
the storage enclosure deployed by Microsoft IT.
- Microsoft IT modified the starting sector on all backup target disks to eliminate
the misalignment that the default master boot record (MBR) structure introduces on
primary partitions, which improves disk performance. Microsoft IT recommends using
the Windows 2000 Server Resource Kit tool Diskpar.exe to partition LUN volumes to
achieve this configuration.
- Microsoft IT strongly advises against sharing disk spindles between backup targets
and Exchange databases. Following this best practice helps ensure that sequential
streaming of content to tape does not adversely affect the random access requirements
of Exchange. It also enables Microsoft IT to stream the backup content to tape as part
of the second-stage process at any time during the day without affecting users
supported on the clusters.
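The throughput findings above translate directly into backup and restore windows. This planning sketch plugs in the rates reported here, assumes roughly 800 GB per Exchange virtual server (20 stores of about 40 GB, derived from the sizes quoted earlier), and lets an optional aggregate ceiling stand in for a SAN enclosure limit. The helper name window_minutes is hypothetical, and the rates should be replaced with the results of your own testing.

```python
# Rates from the findings above, in GB per minute. They describe Microsoft IT's
# tuned hardware; treat them as placeholders for your own test results.
DISK_PER_JOB = 1.2      # one storage group backup, disk to disk
TAPE_PER_STREAM = 1.6   # one LTO1 tape stream
RESTORE_RATE = 2.0      # disk-to-disk restore with no production load


def window_minutes(total_gb: float, per_stream: float, streams: int = 1,
                   ceiling: float = float("inf")) -> float:
    """Minutes to move total_gb over parallel streams, capped by an aggregate ceiling."""
    aggregate = min(per_stream * streams, ceiling)
    return total_gb / aggregate


SERVER_GB = 800  # assumed: 20 stores of about 40 GB per Exchange virtual server

print(f"disk-to-disk, 4 jobs:    {window_minutes(SERVER_GB, DISK_PER_JOB, 4):.0f} min")     # 167 min
print(f"disk-to-tape, 4 drives:  {window_minutes(SERVER_GB, TAPE_PER_STREAM, 4):.0f} min")  # 125 min
print(f"restore one 40-GB store: {window_minutes(40, RESTORE_RATE):.0f} min")               # 20 min
```

Passing a ceiling (for example, the lower end of the 6 GB to 7 GB per minute enclosure figure) shows how an enclosure limit, rather than per-job throughput, can become the constraint when more concurrent jobs share one SAN.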