Service Management Functions

Storage Management

Introduction

Document Purpose

This guide provides detailed information about the storage management service management function (SMF) for organizations that have deployed, or are considering deploying, Microsoft® technologies in a data center or other type of enterprise computing environment. This is one of the more than 20 SMFs defined and described in Microsoft® Operations Framework (MOF). The guide assumes that the reader is familiar with the intent, background, and fundamental concepts of MOF as well as the Microsoft technologies discussed.

An overview of MOF and its companion, Microsoft Solutions Framework (MSF), is available in the Introduction to Service Management Functions guide. This overview guide also provides abstracts of each of the service management functions defined within MOF. Detailed information about the concepts and principles of each of the frameworks is also provided in technical papers available at http://www.microsoft.com/solutions/msm/.

Executive Summary

Storage management deals with onsite and offsite data storage for the purposes of data restoration and historical archiving. The storage management team must ensure the physical security of backups and archives. The goal of storage management is to define, track, and maintain data and data resources in the production IT environment.

Storage management is concerned with the operation and maintenance aspects of storage media.

Process and Activities

Storage Management Overview

The storage management operational process is a key component of the overall system administration process. Storage management is concerned with the operation and maintenance aspects of storage media. The process is used to define, track, and maintain data and data resources in the production IT environment.

Defining data and data resources involves the following tasks:

Developing the necessary plans for classifying, storing, restoring, and recovering data

Developing the appropriate policies and procedures for storing, restoring, and recovering data

Tracking data and data resources involves the following tasks:

Developing the appropriate procedures for monitoring storage resources (that is, availability, capacity, and performance)

Monitoring storage resources to ensure that they are in a usable state, according to business requirements

Predicting future storage needs based on current trends

Maintaining data and data resources involves the following tasks:

Submitting requests for change (RFCs) according to the change management process for any required changes to data and/or storage resources

Changing and tuning storage resources to improve availability, capacity, or performance (subject to the dictates of the change management process)

Ensuring that data is stored in accordance with established data security policies

Taking appropriate action to meet changes to storage needs

The storage management operational process consists of the following two major focus areas: data backup, restore, and recovery operations; and storage resource management. Each area contains various activities and associated tasks, which are described in this document.

Data Backup, Restore, and Recovery Operations

Storing, restoring, and recovering data are key storage management operational activities surrounding one of the most important business assets: company data. These activities ensure that data are stored properly and available for both restore and recovery, according to business requirements.

Data should be classified according to type, and a strategy should be developed to ensure that backup, restore, and recovery operations can be performed to fulfill business requirements and service level objectives. For more information, see the "Classify the Data" and the "Planning a Backup Strategy" sections of this document.

Note: The data backup, restore, and recovery operations activities also address planning for disaster recovery operations, but the scope is limited specifically to recovering data. For more information, see the "Disaster Recovery Considerations" section. This document does not discuss overall "business recovery" operations that typically include recovering all infrastructure components (servers, networks, and so on) in the event of a disaster. For complete business recovery operational details, see the contingency planning and the service continuity management guides.

Storage Resource Management

Storage resource management is a key storage management activity focused on ensuring that important storage media, such as disks, are formatted and installed with the appropriate file systems, and that removable storage media (for example, tapes, CDs, and so on) are organized (for example, through the use of libraries), used, recycled, and eventually retired according to business needs. For more information, see the "Disk Management", "File System Administration", and the "Tape Management" sections of this document.

In addition, storage resource management involves using management technologies to monitor storage resources to ensure they meet availability, capacity, and performance requirements. For more information, see the "Develop a Storage Monitoring and Management Plan" and the "Storage Event Monitoring" sections of this document.

The ongoing, daily storage management activities in an existing data center include: data backup, restore, and recovery operations; storage resource management activities; and other activities described in this document.

Goals and Objectives

The goals and objectives of storage management are to ensure that adequate storage exists to meet the business needs pertaining to SLAs using available technology resources. This ensures that any failures are identified in a timely fashion, future business requirements that impact storage are understood by the IT department, and the operation of the storage management function is undertaken in the most efficient and effective manner.

Scope

Storage management is concerned with the design, implementation, and operation of appropriate storage solutions to meet the needs of the organization:

Storage management produces the necessary policies and procedures for classifying, storing, restoring, and recovering data.

Storage management monitors storage resources on an ongoing basis and predicts future storage resource requirements.

Storage management implements changes and tuning actions to maintain a stable, efficient, and managed storage environment that meets business requirements.

Major Processes

Storage management comprises two main processes and a number of sub-processes as follows:

Data backup, restore, and recovery operations

Planning a backup strategy

Classify the data

Define the backup requirements

Determine how much data to store

Determine where the data is located

Determine projected data growth

Determine backup and restore performance

Determine the database backup and restore needs

Determine e-mail backup requirements

Determine backup requirements for personal computer clients

Determine time tables for backups and restores

Determine data archiving (offsite storage) requirements

Identify the constraints

Define backup and restore policies

Analyze backup and restore requirements

Select and acquire storage infrastructure components

Develop a storage monitoring and management plan

Develop procedures and methods

Develop a resource plan

Test the backup strategy

Implementing the backup strategy

Disaster recovery considerations

Questions a disaster recovery plan should answer

Testing restore and recovery procedures

Hierarchical storage management

The relationship between remote storage and removable storage

Storage resource management

Storage event monitoring

Storage management events to monitor

Analyzing events

Media management

Disk management

Common disk configurations

Direct-attached storage configurations

Centralized disk storage configurations

Network-attached storage configurations

Storage area networks

File system administration

Volume management

What is a volume set?

Managing disk availability

Selecting a RAID strategy

What is a disk cluster?

Managing disk capacity

Managing disk performance

Disk fragmentation

Tape management

Preparing media for data storage

Methods for using tape media for backups and recycling

To be avoided: tape-a-day

Grandfather-Father-Son (GFS)

The Tower of Hanoi

Media retirement

Figure 1: Storage management process and activities

Data Backup, Restore, and Recovery Operations

Storing, restoring, and recovering data are key storage management activities for maintaining company data. Data should be classified by type, and a strategy should be developed to ensure that backup, restore, and recovery operations fulfill business requirements and service level objectives. For more information, see the "Classify the Data" and the "Planning a Backup Strategy" sections of this document.

Storage management also addresses planning for disaster recovery. This document covers data recovery, but does not cover overall business recovery operations for other infrastructure components including servers and networks. For more information, see the "Disaster Recovery Considerations" section of this document. For complete business recovery operational details, see the service continuity management guide.

Planning a Backup Strategy

Backups, restores, and data recovery operations are some of the most important tasks that an IT organization performs. Businesses cannot risk losing access to data for any significant amount of time; therefore, the organization should develop and follow a detailed plan, commonly called a backup strategy. An all-encompassing, master backup strategy can be difficult to apply consistently due to differences in staffing and technologies that typically exist from one business unit to another throughout an organization. It may be valuable to develop individual strategies for various business units or user groups, depending on application usage.

The process steps described in this section are iterative. Each step can be performed with variations, whenever a new customer's service level agreement (SLA) impacts backup, restore, or data recovery requirements; or if business needs change and affect the previously mentioned issues.

Note: Executing the final steps of the backup strategy described below involves the implementation and testing of the storage solution selected. For additional details about piloting, testing, and releasing new technologies into the production IT environment, see the MOF release management guide.

An understanding of the following concepts is important when developing a backup strategy.

What is a backup?

A backup is the process of periodically copying data from one type of medium (typically hard disk) to a secondary storage medium for potential retrieval at a later date (short-term, usually within a few days to a couple of months).

The secondary storage medium is most often magnetic tape, but may also include hard disk, CD-ROM, and optical disk. Write specific policies to inform users of their responsibilities regarding backups. For example, personal data is typically the responsibility of individual users to store and restore. Company data is often stored on servers that are subject to scheduled backups, thereby ensuring data restore and recovery capabilities. Write procedures that explain these policies to users along with the defined backup schedules for different classes of data.

What is archival storage?

Archival storage, sometimes referred to as "data archiving", is essentially the same task as a backup, except that the intent is to store the data for long periods of time, possibly forever. Often, data must be kept for long periods of time for legal reasons, and it is important that these reasons be known to the storage manager and taken into account when planning for storage needs. For more information, see the "Determine Data Archiving (Offsite Storage) Requirements" section of this document.

What is a restore?

A restore is the process of retrieving data (a single file or many files) from a storage medium to a target location (typically a hard disk). In most data centers, there are policies that inform users of their responsibilities with regard to data restores. For example, personal data is typically the responsibility of individual end users to store and restore. Company data, on the other hand, often has to be stored on servers that are subject to scheduled backups, thereby ensuring data restore and recovery capabilities, should the need arise.

What is data recovery?

Data recovery is the process of completely restoring data to the state it was in at some prior point in time. Data recovery is usually performed as a result of some kind of disaster that has caused serious data loss, corruption, or both. Although we often think of disasters as being either natural (as in the case of an earthquake) or man-made (as in the case of a computer virus), a disaster can be defined as any event that causes serious interruption to the running of a business. For example, a hard disk crash on a production system could cause an e-commerce system to cease operation, and for many companies, would qualify as a disaster requiring a complete data recovery. Proper planning could mitigate these circumstances.

Classify the Data

One of the first steps that operations must execute prior to developing a good backup strategy is to classify the various types of data in the IT environment. For example, most organizations do not back up "user data", defined as personal data not related to the business. So, "user data" would be a type of data classification that could be ruled out of scope for scheduled backups and therefore falls to the responsibility of individual users to store.

"Company business data", on the other hand, could be a classification of data that is important to the company and is scheduled for regular backups. Within the "company business data" classification, there could be varying levels of company data, such as company private, while other data types could be "company resource data", "project data", and so on.

A good rule is to classify data according to its business impact. For example, there is some data that the company must have available or the business cannot run—like a parts list for a manufacturing company. This type of data has a high business impact and should be classified accordingly. Sometimes there is data that does not have to be online all the time, but must be available when needed—for example, the testing data generated by medical companies performing drug research. This too could be classified as "high business impact", because the company would be at risk if a product was flawed, and the company could not produce testing data for the last several years.

Define Backup Requirements

When the different data types have been classified, the requirements and specifications for each data type can be defined.

Note: Many of the specific requirements discussed here for determining a productive backup strategy should be provided to IT as the result of SLA development and should not demand much time or effort from IT staff to discover. The service level manager and the customer liaison work with customer management to ensure the customer's business requirements are satisfactorily addressed through the delivery of IT services. These requirements should include backup, restore, and recovery business needs, which are then negotiated and eventually committed to by IT. Each of these requirements is discussed in this section to ensure that nothing is missed during backup strategy development.

Determine How Much Data to Store

Determine, for each of the different data types, how much data needs to be stored. Whether you are dealing with terabytes of data or megabytes of data will influence the strategy. Understanding this will help to determine the types of devices required for doing the backup, the required media, whether there is sufficient time for the backup or if an online storage method must be considered, and so on.

Determine Where the Data Is Located

Now that the types of data in the environment and the storage needs of each data type are known, determine where the data is located. This information is critical in determining the technologies needed to implement the backup strategy. For example, in a geographically distributed environment, with servers located across the country (or the planet), a centralized backup solution could flood the networks with backup data. This could have a potentially serious impact on business productivity. In such a case, a localized backup solution may need to be considered, perhaps in an automated mode to reduce cost.
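As a rough illustration of the network concern, the following sketch estimates how long a centralized backup would occupy a network link. The function name, the default 70 percent usable-bandwidth allowance, and the sample figures are assumptions made for this example; they are not values from this guide.

```python
def transfer_hours(data_gb: float, link_mbps: float,
                   usable_fraction: float = 0.7) -> float:
    """Estimate the hours needed to move data_gb gigabytes over a link.

    usable_fraction is an assumed allowance for protocol overhead and
    competing traffic; replace it with a measured value where possible.
    """
    if link_mbps <= 0 or not 0 < usable_fraction <= 1:
        raise ValueError("link speed and usable fraction must be positive")
    megabits = data_gb * 8 * 1000  # decimal units: 1 GB = 8,000 megabits
    seconds = megabits / (link_mbps * usable_fraction)
    return seconds / 3600


# Hypothetical branch office: 450 GB over a fully available 100 Mbps link.
print(transfer_hours(450, 100, usable_fraction=1.0))  # 10.0
```

If the estimate exceeds the time the link can be spared, that is one indication that a localized backup solution deserves consideration.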

Many companies are finding that a lot of valuable company business data is located on mobile personal computers. This can be a difficult situation for IT because attempts to back up desktop computers en masse are usually cost prohibitive. When more and more of these client personal computers are mobile laptop computers, the situation grows more complex. A recommended best practice is to direct all personal computer users to store company business data on targeted servers, which are backed up regularly.

Note: Fortunately, technologies are becoming increasingly available that allow users' data and settings to "follow" them whenever they move from location to location, thereby increasing productivity. Taking advantage of such capabilities should be a high priority for most IT organizations.

Determine Projected Data Growth

Another critical piece of information needed to develop a backup strategy is estimating the projected growth of data by type. IT should make sure that the backup strategy developed is not quickly outdated. Future plans about the projected number of users and what type of data they create should be considered. If the company is planning to hire 100 new employees, the amount of user and business data will grow accordingly. Prepare for the future and build in the required capacity. For more information, see the "Managing Disk Capacity" section of this document.
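The growth estimate can be sketched as a simple compound-growth calculation. The function below is a minimal illustration only: the growth rate, the per-user allowance, and the function name are hypothetical, and real projections should use figures measured in the environment.

```python
def projected_storage_gb(current_gb: float, annual_growth_rate: float,
                         years: int, new_users: int = 0,
                         gb_per_user: float = 0.0) -> float:
    """Project storage needs from compound growth plus planned new hires.

    The headcount increase is treated as a one-time addition to today's
    data, which then grows at the same annual rate (an assumption).
    """
    if years < 0:
        raise ValueError("years must not be negative")
    base = current_gb + new_users * gb_per_user
    return base * (1 + annual_growth_rate) ** years


# Hypothetical: 1,000 GB today, 50% annual growth, 100 new employees
# at 2 GB each, projected two years ahead.
print(projected_storage_gb(1000, 0.5, 2, new_users=100, gb_per_user=2))  # 2700.0
```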

Determine Backup and Restore Performance Requirements

Information Technology (IT) Operations needs to determine the performance requirements for backups, restores and recovery. These requirements should align with business needs. During the course of developing SLAs, specific service level objectives (metrics) regarding backup, restore and recovery performance are defined, negotiated, and agreed to between the different business units and IT. Note that these service level objectives must be monitored for compliance with SLAs to ensure that both IT and customer commitments are being met.

Determine the Database Backup and Restore Needs

A company's most pertinent, critical data often resides in databases. Each database is different; be certain to take advantage of the tools offered by database vendors for backing up, restoring, and recovering data contained in their different databases.

Most of the major database vendors provide the ability to back up their databases online, without shutting the database down. They typically provide tools that can generate lists of files that need to be backed up and ensure that control files, archive logs, redo logs, and table spaces are backed up appropriately. Some tools even provide event-driven archival capabilities that automatically archive data when a volume exceeds a predetermined capacity. For more information, see the database management section of this document.
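The idea of a capacity-triggered archive can be sketched generically. The code below is not any vendor's tool; it only shows the threshold check such a tool might perform, using the Python standard library. The 85 percent threshold and the function names are assumptions for illustration.

```python
import shutil


def needs_archiving(used_bytes: int, total_bytes: int,
                    threshold: float = 0.85) -> bool:
    """Return True when usage has reached the (assumed) capacity threshold."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes >= threshold


def volume_needs_archiving(path: str, threshold: float = 0.85) -> bool:
    """Apply the same check to the volume that contains the given path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return needs_archiving(usage.used, usage.total, threshold)


if volume_needs_archiving("."):
    print("Capacity threshold reached; start the archive job.")
```

In practice the check would run on a schedule or in response to a monitoring event, and the archive action itself would be performed by the database vendor's own tools.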

Determine E-mail Backup Requirements

For most companies, e-mail is a mission-critical application because of the growing dependence on the instant exchange of messages in the business world. E-mail systems rely on databases, yet there are still e-mail-specific questions to answer when planning the backup strategy:

What type of support does the e-mail system provide for backups and restores?

Does it provide a capability for online backup?

Does it allow both full and incremental backups?

What is the expected performance when backing up or restoring your e-mail database?

Does the e-mail system allow backing up and restoring individual documents, mailboxes, and folders, or does it always require a complete restore of the entire e-mail database?

Note: Users commonly request assistance in recovering individual e-mail messages, folders, documents, and other items that have accidentally been deleted from the system. If the entire database must be restored every time this happens, it can have a big impact on productivity.

Determine Backup Requirements for Personal Computer Clients

Rather than backing up hundreds, even thousands, of personal computer clients, many IT organizations choose to require their users to store company-critical data on servers. This allows important data to be stored according to preset backup schedules. If some users have specific needs for desktop or mobile backups, use the capabilities of the toolsets provided by the different platforms (for example, Microsoft Windows NT®, Microsoft Windows® 2000, UNIX and so on) to do this easily and securely.

Note: Users may resist server storage because of fears they will not be able to access their data when they need it. Address this issue by ensuring that there is a high-availability plan for storage systems. For more information regarding restoring user data, see "Restoring User Data and User Settings with Windows 2000 IntelliMirror®".

Determine Time Tables for Backups and Restores

Determine how often the data needs to be backed up per data type. For example, users' working files may be backed up on a daily basis, system data on a weekly basis, and critical database transactions twice a day.

Determine the allowable timeframe for performing a backup. For instance, user files can be backed up any time users are not working on them, while some transactional databases may only have a few hours available for backup.

Evaluate the amount of data needing backup, the existing infrastructure, and the technologies to use to estimate the time required for each backup. In the case of offline backups, all these factors can affect users' access to data. For this reason, calculations for backup time requirements should be compared to specific business requirements. If the business demands that users have access to data 22 hours per day, a four-hour offline backup will not work; another solution would need to be found (for example, online backup, SAN, and so on).
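The comparison described above amounts to simple arithmetic, sketched below. The function names and the throughput figure are hypothetical; the 22-hour and four-hour values come from the example in the text.

```python
def estimated_backup_hours(data_gb: float, throughput_gb_per_hour: float) -> float:
    """Estimate backup duration from data volume and sustained throughput."""
    if throughput_gb_per_hour <= 0:
        raise ValueError("throughput must be positive")
    return data_gb / throughput_gb_per_hour


def offline_backup_fits(required_access_hours: float, backup_hours: float) -> bool:
    """Return True if an offline backup fits in the daily hours when
    users do not need access to the data."""
    if not 0 <= required_access_hours <= 24:
        raise ValueError("required_access_hours must be between 0 and 24")
    window_hours = 24 - required_access_hours
    return backup_hours <= window_hours


# 400 GB at an assumed 100 GB per hour is a four-hour backup.
hours = estimated_backup_hours(400, 100)
print(offline_backup_fits(22, hours))  # False: only two hours are free
```

When the check fails, the options are those named in the text: reduce the backup time or move to a method, such as online backup, that does not interrupt access.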

The allowable timeframe for data recovery on a per data type basis must be known. For example, it might be perfectly acceptable to take two days to restore some user files, while company business data might have to be recovered in two hours. When determining allowable recovery time, remember that this includes a combination of the time needed to access the storage media plus the time required to actually restore the data to disk. The clearest example of this is when a full system recovery is required and media must be obtained from offsite storage. This information is used to determine the specific backup schedules enforced by operations.
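As a minimal sketch of the recovery-time calculation, the total is the media access time plus the time to restore the data to disk. The function names and the sample rates below are assumptions for illustration.

```python
def recovery_hours(media_access_hours: float, data_gb: float,
                   restore_gb_per_hour: float) -> float:
    """Total recovery time: time to obtain the media plus time to restore."""
    if restore_gb_per_hour <= 0:
        raise ValueError("restore rate must be positive")
    return media_access_hours + data_gb / restore_gb_per_hour


def meets_objective(total_hours: float, allowed_hours: float) -> bool:
    """Compare the estimate with the allowable recovery time for the data type."""
    return total_hours <= allowed_hours


# Hypothetical: media takes 3 hours to arrive from offsite storage,
# then 200 GB is restored at 100 GB per hour.
total = recovery_hours(3, 200, 100)
print(total)                       # 5.0
print(meets_objective(total, 2))   # False for a two-hour objective
print(meets_objective(total, 48))  # True for a two-day objective
```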

Determine Data Archiving (Offsite Storage) Requirements

When developing the requirements for different data types, also plan, for each type, how the storage media should be secured and maintained. For instance, high business impact data should be backed up regularly and periodically stored offsite. User data, if backed up at all, will not require offsite storage. Security restrictions for data both onsite and offsite will also have to be gauged. Again, the data classification can help determine the security needs.

Also determine the length of storage time per data type. For example, user files may need to be kept for only three weeks, while information about company employees may need to be kept for five years.
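Retention periods per data type can be recorded in a simple lookup and used to decide when media may be recycled. The sketch below uses the two examples from the text; the dictionary keys, the function name, and the approximation of five years as 5 x 365 days are assumptions.

```python
from datetime import date, timedelta

# Retention periods taken from the examples above; keys are hypothetical.
RETENTION = {
    "user_files": timedelta(weeks=3),
    "employee_records": timedelta(days=5 * 365),  # approximates five years
}


def is_expired(data_type: str, backup_date: date, today: date) -> bool:
    """Return True once a backup is older than its retention period."""
    return today - backup_date > RETENTION[data_type]


print(is_expired("user_files", date(2024, 1, 1), date(2024, 1, 23)))        # True
print(is_expired("employee_records", date(2020, 1, 1), date(2024, 1, 1)))  # False
```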

Consider the following types of data and information when planning for offsite storage:

A full backup of the entire system, done weekly

Contents of the Definitive Software Library

Documents required to support an insurance claim, such as hardware and software inventory records, and copies of purchase orders or receipts for computer hardware and software

A copy of information required to reinstall and reconfigure network hardware

Identify the Constraints

As with any strategy development effort, be careful that the backup plan does not conflict with any existing, or proposed, standards or policies. Security policies may exist that dictate restrictions for data access (for example, who can request restoration of certain files), offsite storage (for example, which data must be securely stored in a vault), and so on. The backup strategy should comply with these policies.

SLAs should contain specific service level objectives for different IT customers (for example, user groups) that detail things like allowable time to restore, onsite versus offsite storage, backup schedules, and so on. The backup strategy should enable these service level objectives to be achieved. If a conflict arises, the storage manager and the service level manager determine a solution or renegotiate the service level objectives.

The specific infrastructure may also provide certain constraints on the backup strategy. Available network bandwidth, storage devices installed, cost, and other factors can limit the final strategy.

Define the Backup and Restore Policies

With all of the information gathered in the previous steps, the backup policies can now be defined and documented. Do not publish any policies that cannot be enforced. Implement the appropriate monitoring and measurements to ensure compliance.

It is imperative that specific policies regarding data backups and restores be written, made available to all necessary personnel, and strictly enforced. These policies should reflect any commitments made by IT to other IT entities via Operating Level Agreements (OLAs) or to clients via Service Level Agreements (SLAs).

As a guideline, storage policies should be developed with the following considerations:

Relating the data classification scheme to backup schedules. For example, company business data can be backed up once a day, database transaction data can be backed up twice a day, and so on.

Pertinent constraints. For example, all requests to restore files can be submitted through a request for change (RFC).

How long backed-up data will be retained. For example, the data for Application X can be stored for one year.

How backups are scheduled. For example, to avoid network overload, the data for Business Unit A is not backed up at the same time as the data for Business Unit B.

Storage resource management (SRM). For example, storage events can be reported nightly and kept for one week.

Security considerations. For example, all company proprietary data can be encrypted.

Maintenance considerations. For example, storage media can be tested.

Disaster recovery considerations. For example, in the event of hard disk failure, all data stored up to the end of the previous business day must be recoverable. For another example, in the event of a disaster—as defined by the contingency plan—all data must be recoverable to the end of the previous business week. Note that this policy implies that offsite secure storage exists and is being used.
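One way to keep such policies enforceable is to record them in a machine-readable form that monitoring tools can check against. The sketch below is an illustration only: the class, the field names, and most of the values are hypothetical, with the backup frequencies and the one-year retention drawn from the examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupPolicy:
    backups_per_day: int   # how often the data class is backed up
    retention_days: int    # how long the backups are kept
    encrypted: bool        # whether backup media must be encrypted
    offsite: bool          # whether copies go to secure offsite storage


# Hypothetical policy table keyed by data classification.
POLICIES = {
    "company_business_data": BackupPolicy(1, 365, encrypted=True, offsite=True),
    "database_transactions": BackupPolicy(2, 365, encrypted=True, offsite=True),
    "project_data": BackupPolicy(1, 90, encrypted=False, offsite=False),
}


def policy_for(classification: str) -> BackupPolicy:
    """Look up the policy for a classification, rejecting unknown classes."""
    try:
        return POLICIES[classification]
    except KeyError:
        raise ValueError(f"no backup policy defined for {classification!r}") from None


print(policy_for("database_transactions").backups_per_day)  # 2
```

Rejecting unclassified data, rather than applying a silent default, keeps the policy table aligned with the data classification scheme.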

Analyze the Backup and Restore Requirements

Review all of the requirement information gathered and the constraints and policies identified, reduce any redundancies, and document the results. This document is used as a basis for executing the next step in the process.

In environments where storage devices are distributed, storage management efficiency can often be increased by consolidating storage servers in a central location. This approach can improve storage management administration, the monitoring of storage resources, and overall network performance.

Select and Acquire Storage Infrastructure Components

Use the results of an analysis of your backup requirements to consider various storage solutions to meet business needs, including existing capabilities. With the advancements being made in storage technologies and architectures, it is worthwhile to consider the different options available.

The organization may have all the storage components it needs to address the requirements defined for the backup strategy. If it does not, balance the requirements already defined against the constraints, especially budget constraints. Then, select the right technology for the job.

Develop a Storage Monitoring and Management Plan

Review the management solutions that are currently available in the IT environment. Include, if applicable, the vendor management solutions that are included with storage technologies or are available for purchase. Select and acquire, if necessary, the monitoring and management solution that best fits the business requirements. For more information, see the "Storage Event Monitoring" section of this document.

Management systems used to monitor and manage network and system resources typically do not contain any user data; therefore, these systems usually do not require any archival storage. However, the management system backup media should still be stored in a secure location, according to the rules specified in IT security policies. Be sure to include backing up management systems in the overall backup strategy.

Develop Procedures and Methods

Develop the detailed procedures and methods that will be used by the storage management staff to run and maintain the storage solution. The procedures developed will be specific to the types of technologies deployed, but the methods chosen for backups are more general. Remember that these should also include procedures for monitoring and managing the solution.

There are essentially three different types of backups that can be performed: a full backup and two different types of partial backups, incremental and differential. The following are typical methods for performing backups used by many companies today:

Full backup. A full backup involves the complete storage of all system and data files. A full backup should be performed once each week, and a separate full backup once each month. Usually a full backup is performed at the end of a business week, often over the weekend when users do not need system access. Today, many companies do not have the luxury of scheduled system downtime and have to perform full backups online. To save time, storage managers perform partial backups (incremental or differential) Monday through Thursday.

Incremental backup. With the incremental backup, only those files that have changed since the last backup (full or incremental) are stored. Incremental backups usually take less time and require less media storage, because there are fewer files. Consider, however, that restoring files may require more time, because the restore process requires two or more steps. Files must first be restored from the last full backup, and then every incremental backup performed since that full backup must be restored in order, up to and including the most recent one, to bring the restored files up to date. For example, if files must be restored on Thursday, the full backup and then the incremental backups for Monday, Tuesday, and Wednesday have to be restored. This can take a lot of time.

Differential backup. The differential backup is a partial backup that reduces the time required to restore files. This is because each time the differential backup is performed, every file that has changed since the last full backup is backed up. To return to a known good-data state, restore the full backup plus the last differential backup. This method can be a time saver.

Unattended backups. Backups can be fully automated to increase efficiency, enhance security, and reduce errors. Consideration should be given to automated backup tools, auto-changers, and tape library systems.
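The difference in restore effort between incremental and differential backups can be sketched in a few lines of Python. This is only an illustration: the function name `restore_chain`, the day labels, and the tuple layout are assumptions made for the example, not part of any backup product.

```python
def restore_chain(backups):
    """Return the labels of the backups needed to restore to the latest state.

    `backups` is a chronological list of (label, kind) tuples, where kind is
    "full", "incremental", or "differential" (a hypothetical layout).
    """
    # Locate the most recent full backup; every restore starts there.
    last_full = None
    for i, (_, kind) in enumerate(backups):
        if kind == "full":
            last_full = i
    if last_full is None:
        raise ValueError("no full backup available")

    chain = [backups[last_full]]
    tail = backups[last_full + 1:]

    # A differential holds every change since the full backup, so only the
    # most recent one is needed.
    last_diff = None
    for i, (_, kind) in enumerate(tail):
        if kind == "differential":
            last_diff = i
    if last_diff is not None:
        chain.append(tail[last_diff])
        tail = tail[last_diff + 1:]

    # Each incremental holds only the changes since the previous backup, so
    # all remaining incrementals must be applied in order.
    chain.extend(b for b in tail if b[1] == "incremental")
    return [label for label, _ in chain]


incremental_week = [("Fri", "full"), ("Mon", "incremental"),
                    ("Tue", "incremental"), ("Wed", "incremental")]
differential_week = [("Fri", "full"), ("Mon", "differential"),
                     ("Tue", "differential"), ("Wed", "differential")]

print(restore_chain(incremental_week))
print(restore_chain(differential_week))
```

With the incremental schedule every backup since the full backup appears in the chain, while the differential schedule needs only the full backup and the latest differential.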

Develop a Resource Plan

After selecting the appropriate technologies and storage architecture to meet backup and restore requirements, other areas to address include staffing, training requirements, and organizational issues.

For the solution, determine the appropriate number of people required to implement and run the backup strategy. This may mean moving IT staff between different positions or possibly hiring additional staff. Resource considerations like this must be weighed against budget constraints.

Evaluate the current skills of the staff assigned to implement, run, and maintain the backup strategy, and compare the findings with the requirements of the selected storage solution(s). If training is required, determine whether this training is available inside the company or if outside education is needed. Often, the lead time to get people into a training class is longer than desired. Knowing when the staff will have the appropriate skill levels will have a direct impact on when the strategy can be implemented.

A best practice is to time staff training to coincide with the arrival of the storage technologies. Remember, the shorter the time between training and actual "hands-on" usage in a production environment, the better.

Test the Backup Strategy

Appropriate tests must be conducted to ensure that the backup strategy and associated technologies deliver the expected results. For more information about the steps required before releasing new technologies into the production environment, see the MOF release management SMF guide.

Implementing the Backup Strategy

With the appropriate storage infrastructure components now acquired and the staff fully trained, install the storage solution and associated monitoring and management tools into the IT environment. This effort often involves joint cooperation between different groups, including storage managers, network specialists, and the like.

The planning stages should be outlined and discussed in detail before reviewing the tasks to perform. Different servers need to implement different fault tolerance and recovery options. The critical questions that need to be asked during the planning stage are:

How critical is the data or information on a server?

Can automatic replication be set up quickly and easily?

If the server went down, what would be the impact on your business?

Is the server handling multiple functions?

If the server is a core networking server (a DNS or WINS server), is all of the data being backed up on a daily basis?

What is the role of the server? Is it a Web server, a file server, a database server?

Disaster Recovery Considerations

Disaster recovery is a major topic of discussion for most IT organizations and should not be equated with doing backups, archiving, and data recovery, although each of these activities must be considered and addressed when developing an overall disaster recovery plan. Typically, disasters that require such an extensive planning effort are cataclysmic events like the destruction of a facility and/or mission-critical systems and networks (perhaps due to fire or earthquake, and so on). For this reason, disaster recovery plans must encompass all aspects of recovering critical IT infrastructure components, and not just your data.

Recovering all of your computer components, however, will not do much good if you do not have your data. This is why backup, restore, and data recovery procedures must be defined and followed as part of your disaster recovery plan.

The difference between traditional backups and archival storage is the length of storage retention (backups are short-term and archival storage is long-term) and the location of the data (backups are onsite and archival storage is offsite). Thus, when a disaster occurs, IT can get its data from an offsite location. Some companies even build and maintain redundant IT sites complete with data duplication or pay for third parties to provide such services to address their disaster recovery needs.

Questions a Disaster Recovery Plan Should Answer

It is assumed by the MOF process model that a disaster recovery plan for IT (addressed within the service continuity management SMF) has been developed. This plan should provide detailed answers to the following key questions:

What are the possible failure scenarios?

What is the critical data?

How often should backups be performed?

When should a full backup be performed versus an incremental or differential backup?

To what medium should the backup be sent (tape, diskette, disk)?

Will backups be performed online (while users are working) or offline?

Will backups be done manually or scheduled to be done automatically?

If the backup is automated, how will it be verified that it successfully occurred?

How will backups be ensured to be useable?

How long will backups be saved before reusing the medium?

Assuming failure, how much time will it take to restore from the most recent backup?

Is that an acceptable amount of downtime?

Where will backups be stored, and do the appropriate people have access to them?

If the responsible system administrator is gone, who else knows the proper passwords and procedures to do backups and, if necessary, to restore the system?

Testing Restore and Recovery Procedures

Restore and data recovery procedures should be well planned and periodically tested as part of the overall data security and service continuity management efforts. This ensures that the procedures are capable of meeting expectations.

The following issues should be taken into consideration:

The frequency of testing restore procedures.

The frequency of testing data recovery procedures.

The frequency of testing backup-media integrity (two years is typical, but different media have different requirements). Minimally, conform to manufacturer specifications.

The time required to perform full and partial data restores with verification.

The need for re-testing data restore capabilities after major hardware or software revisions are released into the production environment.

Hierarchical Storage Management

Hierarchical storage management (HSM) refers to the capability to automatically (and transparently) migrate files across a hierarchy of storage devices. Rank the devices in this hierarchy according to parameters such as available capacity, storage speed, and cost per megabyte of storage; and set rules (typically based on the frequency of data access) that limit and define how files are migrated along the hierarchy. Attempts to restore files should also be transparent with HSM.

HSM should be evaluated for feasibility when determining the backup strategy. Remember, however, that HSM, if used, will be part of the backup strategy, but should not be considered a replacement for doing backups or data archiving. The purpose of HSM is to better manage the costs of administering and storing data and make storage management easier, not ensure data recovery.
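As an illustration of an access-frequency rule, the following Python sketch assigns a file to a tier based on how long it has been idle. The tier names and the day thresholds are hypothetical values chosen for the example; a real HSM product defines its own policies.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    max_idle_days: float  # files idle longer than this belong on a lower tier


# Hypothetical hierarchy, ordered from fastest and most expensive to
# slowest and cheapest per megabyte.
TIERS = [
    Tier("fast disk", 30),
    Tier("slow disk", 180),
    Tier("tape", float("inf")),
]


def choose_tier(days_since_access, tiers=TIERS):
    """Return the name of the first tier whose idle limit covers the file."""
    for tier in tiers:
        if days_since_access <= tier.max_idle_days:
            return tier.name
    return tiers[-1].name


print(choose_tier(5))
print(choose_tier(90))
print(choose_tier(400))
```

A migration job built on such a rule would move files down the hierarchy as they age and recall them transparently when they are accessed again.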

Storage Resource Management

Whether the environment is centralized or distributed, the various storage technologies that are being used still must be managed. This requires making good use of the vendor tools that come with the various storage systems, using third-party tool offerings that fit the organization's needs, and wrapping these technologies in well-defined policies and procedures. In the end, the capability to easily monitor and analyze the availability, capacity, and performance of the storage management systems should be available. Easy configuration of storage systems, preferably from a single console, and generation of much-needed reports should be available as well.

Storage resource management (SRM) is a key storage management activity focused on ensuring that important storage devices, such as disks, are formatted and installed with the appropriate file systems. For more information, see the "Disk Management" and "Tape Management" sections in this document.

In addition, SRM includes using management technologies to monitor storage resources to ensure that they meet availability, capacity, and performance requirements. For more information, see the "Storage Event Monitoring" section in this document.

Monitoring and managing the storage management resources used in the production environment are extremely important tasks. It is therefore imperative that the management system(s) and tools used by administrators and storage managers provide all of the capabilities required (monitoring, tuning, configuring, and so on) to ensure that data is stored properly and available for restore and recovery operations when needed.

Typically, the tools used in the production environment to monitor and manage storage resources consist of functions provided as part of installed operating systems and/or those offered by third-party vendors.

Using a management system requires proper training and skills. An understanding of some of the basic concepts necessary for monitoring and managing storage resources successfully, as well as analyzing the results, is required. In addition, selecting the right tool for the right job increases the operations group's ability to ensure data and storage resource availability, capacity, and performance.

Storage Event Monitoring

With the heavy emphasis today on fast and efficient—yet continuous—data access, storage management support teams cannot deliver the required quality of service if they only react to storage events after they have happened. Instead, support teams must be proactive and do everything in their power to address incidents before they impact the business.

Storage device availability, performance, and capacities must be monitored on an ongoing basis in order to capture the information required to analyze potential problems, performance bottlenecks, or capacity shortages. This means that IT personnel must perform the tasks of monitoring storage management events. For additional information, see the service monitoring and control guide.

Storage Management Events to Monitor

The basic types of events that are of interest to a storage manager are:

Availability. How many times have the storage systems been down? Is this causing IT to not meet its service level objectives?

Errors. How many hardware, software, and network errors are occurring on storage systems? Does this exceed the manufacturer's specifications? Does this signify the possibility of a service outage in the near future?

Performance. Have performance thresholds been exceeded for the storage systems? How many times? Is this causing IT to not meet its service level objectives? For more information, see the "What Is a Threshold?" section of this document.

Capacity. Which storage systems are approaching full capacity? Does this match your storage planning expectations?

Analyzing Events

For monitoring storage management events and thresholds to be meaningful, one must do something with the resulting data. Analyze the event data on a periodic basis and do trend analysis on storage system performance and capacities. If events and thresholds are merely monitored and not analyzed, reaction is the only option. It is the analysis of the data that really allows proactive storage resource management. Identify potential performance problems before they impact the business and predict future storage capacity requirements based on the collected data.

In addition, reports that track storage resource event trends pertaining to availability, capacity, and performance should be generated periodically and distributed to all concerned IT staff.
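One simple form of trend analysis is to fit a straight line to periodic capacity samples and project when a volume will fill. The Python sketch below uses an ordinary least-squares fit; the function name, the sample values, and the choice of a linear model are assumptions for illustration, and real growth is often not linear.

```python
def days_until_full(samples, capacity_gb):
    """Estimate days remaining until `capacity_gb` is reached.

    `samples` is a list of (day_number, used_gb) measurements. Returns None
    when usage is flat or shrinking, because no fill date can be projected.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("at least two samples are required")

    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must span more than one day")

    # Least-squares slope (GB per day) and intercept of the trend line.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in samples) / sxx
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        return None

    # Day on which the trend line crosses the capacity limit.
    full_day = (capacity_gb - intercept) / slope
    last_day = max(x for x, _ in samples)
    return max(0.0, full_day - last_day)


# Hypothetical daily readings for one volume: 10 GB of growth per day.
readings = [(0, 100), (1, 110), (2, 120), (3, 130)]
print(days_until_full(readings, capacity_gb=200))
```

The result can be compared against the lead time needed to procure and install additional storage.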

Media Management

Media management plays an important role in the storage management process. Media management includes the various tasks associated with administering and maintaining storage media (the physical media on which data is stored). The media librarian is responsible for maintaining the media library. The media librarian's role is a part of the Operations role cluster defined in the MOF Team Model.

There are many different types of media used in the production environment, such as hard disk subsystems, CD-ROMs, video, audio, and tape media of many different sorts (for example, reel-to-reel, DAT, and so on). These media are often packaged for different purposes, such as disk "farms", tape libraries, and so on. Understanding what must be done to manage these different media types is critical to ensuring that data is stored properly and capable of being either restored or recovered whenever it is required.

Disk Management

Managing disk subsystems is one of the more important tasks associated with media management because the vast majority of important business data still resides on disks today. Disk management includes administering and maintaining both the physical disks themselves and the logical disk volumes that may be used for data storage. Be careful to ensure that disk subsystems are available when needed, have the appropriate capacity to handle projected growth, and perform at a level that meets expectations for data access.

Common Disk Configurations

The following is a high-level overview of some of the more common disk storage configurations in use within the industry today.

Direct-Attached Storage Configurations

Direct-attached storage has been in existence for years and is still found in most, if not all, computer environments. With this architecture, storage devices are connected directly to servers through a bus connection such as SCSI or through Fibre Channel.

While low in cost, access to storage is directly dependent upon the reliability of the server storage subsystems due to the direct connections. This can place greater emphasis on offsite data storage for disaster recovery.

Often the servers to which the data storage devices are connected are made by different manufacturers and support different operating systems. Thus, in essence, each server has its own proprietary storage architecture, resulting in numerous islands of storage automation within the datacenter. This can have a negative impact on data sharing because users must know exactly where storage is located in order to use it. It can also increase maintenance efforts because different tools and procedures are needed to manage, tune, and monitor the storage systems.

Centralized Disk Storage Configurations

Centralized disk storage is also very common today. Essentially, this architecture involves consolidating disk storage devices to one central location and includes some built-in redundancy.

This type of storage architecture is slightly more expensive than direct-attached, and storage choices are somewhat more limited due to topology and connectivity restrictions. It still, however, addresses some of the issues that face direct-attached architectures (see previous section). For example, the redundancy that comes with centralized disk storage architectures provides greater data protection and reduces downtime. Backups can be done with the implementation of a single procedure instead of many, but note that tape libraries still need to be accessed via a LAN and can still impact the network. And both data sharing and storage management are made easier with a centralized disk approach.

Network-Attached Storage Configurations

Network-attached storage (NAS) architecture gives users access to data via data storage devices directly connected to a network. This is accomplished through the implementation of a "thin server" (a special-purpose server) embedded in the storage device itself.

Essentially, this architecture is similar to the direct-attached storage approach and thus has some of the same issues. Data access is dependent upon the reliability of the storage subsystems, and if they should fail, productivity is decreased. Because backups must be done over the LAN, network performance can be impacted. But NAS does allow storage devices to be independent of file servers, so file sharing is easier. It is a flexible solution because storage devices can be placed anywhere on the network. NAS is also easy to set up and maintain, and it can provide a cost-effective method when storage expansion is required. Be aware though, that each storage device is treated as a node on the network and that the thin server still "owns" the device, just like the direct-attached storage solution.

Storage Area Networks

The latest option for storage architectures, a storage area network (SAN), is a high-speed dedicated network used to interconnect servers and clients to a shared "pool" of storage devices such as modular disk arrays and tape libraries. Such pools typically consist of servers, external storage devices, hubs and switches, and both network and storage management tools.

A SAN increases the availability of data by allowing any server on the network to access any storage device on the SAN (regardless of location or operating system). Server performance is also increased because storage-intensive processes such as backups and recoveries can be offloaded to the SAN. SANs are being used in some datacenters to increase server connectivity to centralized arrays and tape libraries, thereby allowing an amortization of storage cost over a large number of servers.

The usage of this architecture is increasing as the technologies implemented in SAN solutions (for example, Fibre Channel) are becoming more mature, allowing costs to be reduced.

Reasons for increased usage of SAN solutions include increased availability, reliability and performance due to Fibre Channel technology which provides greater bandwidth, multiple paths, and redundancy; the ability to centralize management, thereby reducing costs; and easy scalability due to the fact that both storage devices and servers can be added online.

File System Administration

Depending on the type of computers the IT organization is supporting, it may have multiple file systems under its care. Each file system has its own characteristics, system requirements, and capabilities. When installing a new system, selecting the right file system for the organization's needs can have a major impact on issues such as security, distributed computing, backup, restore and recovery capabilities.

Volume Management

Volume management includes the tasks that create, delete, alter, and maintain storage volumes in a system. Exactly how volume management is accomplished varies depending on the file system being used.

What Is a Volume Set?

A disk volume set is a way to create one large logical disk out of multiple smaller disks. Note that if any of the smaller disks fail, the entire volume set will be lost. Be sure to back up the volume sets as part of the regular backup schedule.

Managing Disk Availability

Fault tolerance is the ability of a system to continue functioning when part of the system fails. Fault tolerance combats problems such as disk failures, power outages, or corrupted operating systems. These problems can impact startup files, the operating system itself, or system files. Note that although the data is always available and current in a fault-tolerant system, tape backups still must be made to protect the information on the disk subsystem against user errors and natural disasters. Disk fault tolerance is not an alternative to a backup strategy with offsite storage. Fault-tolerant disk systems are standardized and categorized in six levels, known as RAID level 0 through level 5. Each level offers a specific mix of performance, reliability, and cost.

Redundant Array of Independent Disks (RAID) is a class of disk subsystem that combines two or more disk drives to provide clients with a fault-tolerant solution and improved disk performance. There are several different levels of RAID:

Level 0: Disk Striping

This level provides the capability to do "data striping", which refers to the spreading of file blocks across multiple disks as opposed to sequentially writing a file to a single disk.

Results: High performance but no fault tolerance.

Level 1: Disk Mirroring

This level provides the capability to do disk "mirroring", which refers to the technique whereby data is written to two disks simultaneously. If one disk fails, the other disk can be used automatically without loss of service or data. This is a common method employed by online database systems that cannot afford to be taken offline. Note that because each file is stored in two locations, twice the usual storage space is needed to implement this feature.

Results: Improved fault tolerance; performance equivalent to a single drive; requires online backups.

Level 2: Non-Error Correcting

This RAID level was originally designed for disk drives that did not have built-in error correction.

Results: Because SCSI drives all have built-in error correction, this level is not used much anymore.

Level 3: Disk Striping and Parity

This level also provides data striping, but the data is striped at the byte level. One disk is reserved for error correction (parity) data.

Results: Improved performance and some fault tolerance (dependent on the hardware controller).

Level 4: Disk Striping and Parity

This level also provides both striping and parity like Level 3, but data is striped at the block level instead.

Results: Great for high-speed "read" situations (similar to Level 0 performance).

Level 5: Disk Striping and Parity

This level provides data striping and parity similar to Level 4, but rather than writing parity to a dedicated disk, parity is distributed across all the drives in the array. This level requires a minimum of three disks. As more disks are added to a RAID-5 set, the proportion of capacity used for parity overhead decreases. However, the benefit of having many disks in a RAID-5 set drops off when seven or more disks are used in the set.

Results: High performance and excellent fault tolerance.

Note: Of these RAID types, only RAID-1 and RAID-5 are commonly used.
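The way parity allows a lost block to be rebuilt can be shown with a short Python sketch. It assumes parity is computed as a bytewise XOR across the data blocks in a stripe, which is the usual approach but is not spelled out in this guide; the function name and sample blocks are hypothetical.

```python
def xor_blocks(blocks):
    """Return the bytewise XOR of a list of equally sized blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must be the same size")
    result = bytearray(size)
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


# One stripe: three data blocks plus one parity block (hypothetical data).
data = [b"ACCT", b"DATA", b"2024"]
parity = xor_blocks(data)

# Simulate losing the second disk and rebuilding its block from the
# surviving data blocks and the parity block.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

Because XOR is its own inverse, combining the parity block with the surviving blocks yields the missing block. The same property explains why only a single disk failure per stripe can be tolerated.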

Selecting a RAID Strategy

RAID strategies include hardware and software solutions. Choosing between RAID-1 and RAID-5 volumes depends on your computing environment. Consider the following when selecting a RAID strategy:

When compared to RAID-5 volumes, a mirrored volume implementation has a lower entry cost, requires less system memory, provides better overall performance, and does not show performance degradation during a failure. However, its cost per megabyte is higher than that for RAID-5 volumes.

A software RAID-5 volume implementation has better read performance and a lower cost per megabyte, but it requires more system memory and loses its performance advantage when a disk in the array is missing.

Hardware or software RAID-5 volumes are a good solution for data redundancy in a computing environment in which most activity consists of reading data. For example, you might want to use a RAID-5 volume on a server that is used to maintain all copies of the programs for the organization's site. It protects the programs against the loss of a single disk in the striped volume. In addition, read performance improves due to concurrent reads across the disks that make up the RAID-5 volume.

In an environment in which frequent updates to the information occur, it might be better to use mirrored volumes. However, a RAID-5 volume can still be used if redundancy is desired and if the storage overhead cost of a mirror is prohibitive.
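The cost comparison between mirrored and RAID-5 volumes can be made concrete with a small calculation. The Python sketch below assumes a two-disk mirror and identical disks throughout; the disk size and price are hypothetical figures used only for illustration.

```python
def usable_gb(level, disks, disk_gb):
    """Usable capacity for a two-disk mirror (level 1) or a RAID-5 set."""
    if level == 1:
        if disks != 2:
            raise ValueError("this sketch models a two-disk mirror only")
        return disk_gb  # the second disk holds a full copy
    if level == 5:
        if disks < 3:
            raise ValueError("RAID-5 requires at least three disks")
        return (disks - 1) * disk_gb  # one disk's worth of space holds parity
    raise ValueError("only levels 1 and 5 are modeled here")


def cost_per_usable_gb(level, disks, disk_gb, disk_cost):
    """Total disk cost divided by the capacity available for data."""
    return disks * disk_cost / usable_gb(level, disks, disk_gb)


# Hypothetical 100 GB disks at 50 currency units each.
print(cost_per_usable_gb(1, 2, 100, 50))  # mirror
print(cost_per_usable_gb(5, 3, 100, 50))  # smallest RAID-5 set
print(cost_per_usable_gb(5, 6, 100, 50))  # larger RAID-5 set
```

The mirror has the lowest entry cost (two disks) but the highest cost per usable gigabyte, and the RAID-5 cost per gigabyte keeps falling as disks are added, with diminishing returns.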

What Is a Disk Cluster?

Disk clustering is a technology solution that allows two or more computers to be connected together in such a way that they appear to act as a single computer. This technology solution is used to achieve fault tolerance.

Managing Disk Capacity

Ensuring that there is enough disk capacity for growth needs is a function of the capacity management process. The storage administration role can monitor disks to ensure that capacity thresholds are not exceeded and periodically increase disk capacity based on resource needs. For more information, see the MOF capacity management SMF guide.

Disk Fragmentation

Disk fragmentation refers to a disk condition that occurs when a disk has been used for some time (creating files, adding files, deleting files, modifying files), and the files end up in "pieces". Logically the files are contiguous, but physically the "pieces" are spread all over the disk. This is a natural result of disk usage and is invisible to end users, but it can cause disk performance problems and therefore needs to be monitored and periodically repaired.

Tape Management

Important business data must be stored securely and with the confidence that the IT organization can restore data to users when requested, or in the event of a disaster, that data and file systems can be fully recovered. This can only be achieved if the media used to store data, which is on tape for most datacenter environments, is properly prepared, maintained, and recycled.

There is a life cycle associated with tape storage media. Essentially, it consists of five phases:

1. Preparing tape media for data storage (includes "initializing" or "formatting" the media if necessary)

2. Using tape media for backup (includes defining how media is to be selected; how to check the condition of the media; how new backups are added to the media; and when data can be overwritten on the media)

3. Archiving or "vaulting" the media in a safe place

4. Recycling media for reuse

5. Media retirement

Preparing Media for Data Storage

Whether the tape media needs to be initialized or formatted depends on the type of tape media purchased. Typically, pre-initialized tape is widely available, but at a higher cost than non-formatted tape media. Since initializing tape can be a time-consuming effort, the higher cost of the media should be weighed against the labor hours required to do it manually.

Methods for Using Tape Media for Backups and Recycling

Having a plan defined for how tape media will be used for backups is very important. This plan should include defining how tape media is selected, how tape media should be checked for errors, and when tapes can be rewritten. Without such a plan, there is a risk of storing critical business data on questionable media and this can result in the data being irrecoverable.

The following sub-sections describe several common methods for using tape media found in the industry today.

To Be Avoided: Tape-a-Day

This is a very risky method of tape rotation. In the tape-a-day method, a single set of tapes is continually reused for backups. This means that every time a backup is performed, the previous backup is overwritten. If files from two weeks ago need to be restored, it will be impossible because those files have been completely wiped out by the last backup. This is totally unacceptable for most datacenters and is a practice that should definitely be avoided.

Grandfather-Father-Son (GFS)

This is one of the most common methods of doing media rotation and uses three sets of tapes for backing up data on a daily, weekly, and monthly basis.

The terms used in GFS are defined as follows:

Daily incremental backup (referred to as the "son" in the GFS method). There are four daily tapes, labeled "Monday" through "Thursday", or something similar. These tapes are reused weekly for partial backups on the day specified by the label.

Full weekly backup (referred to as the "father" in the GFS method). There are up to five weekly sets of tapes, labeled "Week 1" through "Week 5", or something similar. These sets are reused monthly for full backups each week on the day on which a daily incremental backup is not performed.

Full monthly backup (referred to as the "grandfather" in the GFS method). There are three monthly sets of tapes, labeled "Month 1" through "Month 3", or something similar. These sets are used for full backups on the last business day of each month. They are reused quarterly.

Note: Each media set may consist of a single tape or multiple tapes, depending on the amount of data to be stored.

The following table describes a possible Grandfather-Father-Son tape rotation scenario for a single month.

Table 1 GFS Media Rotation Schedule

The shaded areas represent previous backups, while the white areas represent the most recent backups. In this single month scenario, only the daily backup tapes have been reused.

The GFS method as described allows a data history of 2-3 months, which for many organizations is sufficient. If data archiving is required, the tapes may be pulled from the rotation and stored offsite, replacing the stored set with new tapes.
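The rotation rules described above can be sketched as a small Python function that picks the tape set for a given date. The sketch assumes a Monday-to-Friday business week with no holidays, weekly full backups on Fridays, and the label names used in this section; all of these are illustrative choices rather than fixed parts of the GFS method.

```python
import calendar
from datetime import date, timedelta

DAILY_LABELS = ("Monday", "Tuesday", "Wednesday", "Thursday")


def last_business_day(year, month):
    """Return the last Monday-to-Friday date of the month (holidays ignored)."""
    day = date(year, month, calendar.monthrange(year, month)[1])
    while day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        day -= timedelta(days=1)
    return day


def gfs_tape(day):
    """Return the label of the tape set to use on `day`, or None on weekends."""
    if day.weekday() >= 5:
        return None
    if day == last_business_day(day.year, day.month):
        # Grandfather: three monthly sets, reused quarterly.
        return f"Month {(day.month - 1) % 3 + 1}"
    if day.weekday() == 4:
        # Father: one set per Friday of the month, reused monthly.
        return f"Week {(day.day - 1) // 7 + 1}"
    # Son: one tape per weekday, reused weekly.
    return DAILY_LABELS[day.weekday()]


for offset in range(7):
    current = date(2024, 1, 1) + timedelta(days=offset)
    print(current.isoformat(), gfs_tape(current))
```

The loop prints the tape selection for the first week of January 2024, a month that happens to begin on a Monday.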

The Tower of Hanoi

This is another tape rotation method that is also widely used. The name is derived from the mathematical puzzle of the same name, which is commonly solved with recursive techniques. In the puzzle, a player moves a stack of disks from one peg to another, with the restriction that a smaller disk can only be placed on a larger disk. With this method, more media sets are required than with the GFS method. Therefore, this method provides more assurance that data can be recovered because every time a media set is added to this schedule, the backup history doubles. This schedule can be used with either a daily or weekly rotation.

The following table displays this method and an explanation follows:

Table 2 Tower of Hanoi Rotation Scenario

This method calls for starting the backup schedule with one media set (for example, set A) and reusing this set every other backup session. The next media set (for example, set B) is used on the first non-A day and is reused every fourth backup session. The next media set (for example, set C) is used on the first non-A or non-B day and repeats every eighth session. Media set D starts on the first non-A, non-B, and non-C day and repeats every sixteenth session. And finally, media set E alternates with every media set D.

An estimate of data traffic can be used to determine the frequency of rotation. A minimum of five media sets should be used for a weekly rotation or eight sets for a daily rotation. Again, sets should be periodically removed (and replaced) from the rotation for data archive purposes.
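The Tower of Hanoi rotation pattern can be generated programmatically. In the Python sketch below, the set used for a session is determined by how many times the session number can be divided evenly by two, with the last set absorbing all higher values. The function name and the five-set default are assumptions for illustration.

```python
def hanoi_set(session, sets="ABCDE"):
    """Return the media set for a 1-based backup session number."""
    if session < 1:
        raise ValueError("session numbers start at 1")
    # Count trailing zero bits: 0 for odd sessions, 1 for 2, 6, 10, ...,
    # 2 for 4, 12, 20, ..., and so on.
    trailing_zeros = (session & -session).bit_length() - 1
    # The last set takes every session that would need a further set.
    return sets[min(trailing_zeros, len(sets) - 1)]


schedule = "".join(hanoi_set(n) for n in range(1, 17))
print(schedule)
```

For five sets, the first sixteen sessions give set A on every other session, set B on every fourth, set C on every eighth, and sets D and E alternating every eighth session so that each repeats every sixteenth. Passing a longer `sets` string extends the history, doubling it with each added set.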

Media Retirement

With any of the media rotation schemes discussed above, multiple tapes are being used and reused. To ensure data integrity, the media should periodically be retired. Note that each tape manufacturer should provide information regarding the recommended lifetime of their media.

When reviewing tape errors on a regular basis, watch for excessive soft errors, and retire tapes after they have been used a specific number of times.
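A retirement rule of this kind is easy to automate once use counts and error counts are tracked per tape. The Python sketch below is a minimal example; the field names and the threshold values are hypothetical and should be replaced with the limits recommended by the media manufacturer.

```python
from dataclasses import dataclass


@dataclass
class Tape:
    label: str
    use_count: int             # number of times the tape has been written
    soft_errors_last_run: int  # recoverable errors reported on the last run


def should_retire(tape, max_uses, max_soft_errors):
    """Return True when the tape has reached either retirement limit."""
    return (tape.use_count >= max_uses
            or tape.soft_errors_last_run > max_soft_errors)


# Hypothetical limits and inventory.
inventory = [
    Tape("Monday", use_count=52, soft_errors_last_run=3),
    Tape("Week 1", use_count=12, soft_errors_last_run=40),
    Tape("Month 1", use_count=4, soft_errors_last_run=0),
]
to_retire = [t.label for t in inventory
             if should_retire(t, max_uses=50, max_soft_errors=25)]
print(to_retire)
```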

Roles and Responsibilities

Principal roles and their associated responsibilities for storage management have been defined according to industry best practices. Organizations might need to combine some roles, depending on organizational size, organizational structure, and the underlying service level agreements existing between the IT department and the business it serves.

Storage management is a critical operational process that is performed daily in every datacenter. Therefore, it is important to assemble the right team to perform the work. This section describes the roles that are recommended for building a team. Some of the roles directly relate to daily storage management tasks, while others are necessary only at particular times in the overall process. The role descriptions should not necessarily be interpreted as job descriptions.

Depending on the size and structure of an IT organization, an individual may perform more than one role. However, there should be only one process owner per process. This ensures that one individual is accountable for the overall performance of a process. It also ensures that there is one key individual to take initiative with resolving problems.

The following describes the roles that are required to perform daily storage management processes.

Storage Administrator

The storage administrator is responsible for carrying out the storage management process. With regard to process design and/or re-engineering efforts, the storage administrator has the most responsibility for the process.

The storage administrator is responsible for all process improvement efforts affecting storage management and its activities. These efforts may take anywhere from 25 to 75 percent of the administrator's time, so the storage administrator should have that time available and should maintain good relations with stakeholders who have a vested interest in the success of the process.

The storage administrator:

Determines backup, restore, and data recovery strategies.

Ensures that adequate backup, restore, and recovery procedures are in place and followed.

Ensures backup documentation exists and remains current.

Ensures accurate representation of storage resources in the CMDB.

Executes end-user backup and restoration requests.

Forecasts future storage capacity requirements.

Media Librarian

The media librarian maintains the media library and:

Ensures supply and control of limited-use media (for example, magnetic tapes, diskettes, cartridges, paper, microfiche, and so on).

Audits the physical media library, and ensures consistency of logical and physical media library.

Ensures the transport of media to offsite storage location in accordance with media retention and rotation policies.

Handles media according to manufacturer's recommendations.

Ensures media are loaded for backups and restores.

Ensures prompt removal of media from backup and restore devices following use.

Ensures all media are logged and tracked in the logical media library.

Supplies and controls media for the test environment.

Supplies and controls media for production testing.

Ensures media associated with production release are available and procedures are in place prior to service activation.

Relationship to Other Processes

Storage management is a service management function (SMF) in the operating quadrant of the Microsoft Operations Framework (MOF) process model. Various IT processes are dependent upon or are in other ways affected by what occurs during the daily performance of the storage management process in the data center. The graphic below depicts the relationship between storage management and other MOF service management functions (SMFs).

Figure 2: Relationship to other SMFs in the Operating quadrant


System Administration

System administration deals with the administration model used by an organization. Some organizations prefer a model where all IT functions are performed at a single site with a team of IT professionals collocated at that site. Other organizations prefer a distributed branch-office model where both technologies and support staff are geographically distributed. System administration examines the trade-offs of each model. Each type of system administration model will require unique storage and backup requirements.

Security Administration

Security administration is an IT process concerned with implementing and managing security controls that enforce corporate security policies, thereby ensuring data and system security within the production IT environment. Storage management and security administration have a relationship because corporate data, the primary concern of the storage management process, must remain secure at all times. When data exists on disks within the corporate domain, it can be made secure through passwords and varying security levels provided via software utilities. But when data is stored to tape or other external storage devices, such security mechanisms no longer apply and extra care must be taken to ensure data security (for example, keeping the data offsite, under lock and key, encrypted, and so on). The storage manager and the security administrator need to work together to ensure that the corporate data security policies are closely followed.

Service Monitoring and Control

Storage management monitors and controls the hard disks, tapes, and other storage devices. This can include monitoring for low storage space, or it may involve monitoring a backup job to ensure that it completes correctly. Storage management will have to work closely with the service monitoring and control SMF to ensure that events are monitored and support incidents created in the event of failure.

Network Administration

Network administration is an IT process concerned with managing all production networks under change management and configuration management control. Network administration and storage management have a relationship because specific change management work orders may occasionally require network configurations for various storage resources to be altered. In such cases, the network administrator and the storage manager should coordinate efforts to fulfill the work order and ensure strict adherence to storage management and network administration service level objectives.

Change Management

Change management is an IT process that manages (logs and approves) and controls (tracks and coordinates) all changes to the production IT environment. The relationship between storage management and change management is no different than change management's relationship to any other process; that is, no changes can be made to storage management resources without a request for change (RFC) being duly processed and approved. Further, certain non-scheduled requests to store and restore data may be required to go through the change management process (RFC submittal).

The change manager owns the change management process and typically relies on various change domain coordinators for specific expertise in the different areas (domains) of technologies and applications that may come under change control. The change manager and one or more change domain coordinators will need to interact periodically with storage management personnel, either when changes are proposed directly to storage management systems or applications, or when conducting risk and impact assessments for changes to related infrastructure components that may affect those systems (for example, a server, LAN, or disk drive).

Configuration Management

Configuration management is an IT process used to specify, track, and report on each IT component under configuration control, known as a configuration item (CI). Configuration data are stored in a logical entity known as the configuration management database (CMDB), which typically consists of multiple distinct databases. Storage management is related to configuration management through the CMDB entries that must be processed every time a change is initiated (via change management) to any of the storage management configuration items. The storage manager and the configuration manager (the configuration management process owner) need to agree on the storage management CMDB structures (attributes and relationships) for storage CIs. These CIs include hardware, software, network components, users, and so on. Note that no changes should occur to any storage management CIs without an RFC being processed and approved.

The storage manager may have to interact with various configuration domain coordinators responsible for various aspects of the CMDB. For example, one or more domain coordinators may be responsible for tracking different storage management infrastructure components, such as the network, the associated disk drives, and so on.

Availability Management

Availability management is an IT process concerned with assuring continual user access to IT services. It addresses issues such as service availability, reliability, maintainability, security, and the ability of services to meet the availability service level objectives defined within an SLA. Storage management has a strong relationship to availability management because availability management focuses on service availability, and the data management, data storage, and data restore and recovery capabilities inherent in the storage management process are required to meet service availability objectives. These capabilities must therefore be included when developing service availability plans.

The storage manager and the availability manager should work together to develop appropriate storage "availability" plans. This effort should be driven by defined service level objectives.

Capacity Management

Capacity management is an IT process concerned with assuring that IT resource capacities meet business requirements and are appropriately optimized. Storage management has a strong relationship to capacity management because capacity management focuses on overall service capacity, and the data management, data storage, and data restore and recovery capabilities inherent in the storage management process have a direct impact on hardware and network capacity requirements. These requirements must be addressed when developing service capacity plans.

The storage manager and the capacity manager should work together to develop appropriate storage "capacity" plans. This effort should be driven by defined service level objectives.

Service Continuity Management

Service continuity management is an IT process for developing a coherent and well-defined plan that specifies how IT can recover from a disaster and safeguard systems to prevent incidents from becoming disasters. The relationship between service continuity management and storage management is through the development, testing, and actual execution of the disaster recovery plan created as a result of the service continuity management process, involving both the contingency manager and storage manager. Such a plan must dictate data storage and data recovery requirements and capabilities in the event of a disaster. Storage management must therefore ensure that these requirements can be met.

Contributors

Many of the practices that this document describes are based on years of IT implementation experience by Accenture, Avanade, Microsoft Consulting Services, Fox IT, Hewlett-Packard Company, Lucent Technologies/NetworkCare Professional Services, and Unisys Corporation.

Microsoft gratefully acknowledges the generous assistance of these organizations in providing material for this document.

Program Management Team

William Bagley, Microsoft Corporation

Jeff Yuhas, Microsoft Corporation

Lead Writer

Jeff Drake, Hewlett Packard Corporation

Contributing Writers

Vicky Howells, Fox IT

Editors

Nancy Huber, Microsoft Corporation

Christine Waresak, Volt Technical Services

