Microsoft Distributed File System

IT Value Card

Published: December 8, 2005

This card describes how the Microsoft Corporation Information Technology group (Microsoft IT) uses Microsoft® Windows Server™ 2003 R2 Distributed File System (DFS), which contains new state-of-the-art replication, management, and compression technologies that ensure the efficient use of bandwidth. DFS includes a rich set of technologies, including DFS Namespaces, DFS Replication (DFS-R), and Remote Differential Compression (RDC).

IT Business Benefits

 

Benefit | Source or Derivation
--- | ---
Faster replication: 300 percent faster replication of large files using DFS Replication (DFS-R) | Metrics reported by the DFS-R Health Report, with the greatest benefit seen for files of 290 megabytes (MB) and larger
Faster compression: DFS Remote Differential Compression (RDC) two to three times faster than rsync 2.6.2 | Microsoft internal testing found RDC twice as fast as the rsync 2.6.2 protocol with 4 MB files, and three times as fast with 290 MB files
Operations: 40 percent less time spent managing replication operational activities; 50 percent savings in managing content; 30 percent savings in handling replication issues |

Executive Summary

The Microsoft Corporation Information Technology group (Microsoft IT) uses Microsoft® Windows Server™ 2003 R2 Distributed File System (DFS) technology to help it better manage servers in some 140 branch offices around the world. DFS, redesigned for Windows Server 2003 R2, helps ensure that Microsoft employees always have access to the data they need, while significantly reducing the bandwidth used for replication between sites. The Windows Server 2003 operating system is part of Microsoft Windows Server System™ integrated server software.

Situation

Microsoft, like many other businesses, finds that maintaining remote locations generates significant operational costs and administrative challenges. One of the greatest challenges is to ensure that the applications, tools, reports, learning materials, and other internal content that users access frequently are available to them at the closest site. A key goal is to optimize bandwidth utilization: rather than having multiple users at a remote site each retrieve the same data across the wide area network (WAN), the data is copied over the network link once and made available on a local server. This is made harder by the volume and variety of data replicated across Microsoft's worldwide operations, including replication among some 250 servers.

Much of the difficulty comes from the distributed nature of the networks linking branch offices to regional hubs and corporate headquarters. These WAN links hamper the efficient operation of branch offices. One of the fundamental problems with integrating branch offices is the underlying dependency on file system operations. Most file system protocols, such as UNIX's Network File System (NFS) and Microsoft's Common Internet File System (CIFS), do not operate efficiently over low-bandwidth or high-latency networks, because they were originally designed on the assumption that bandwidth was not a constraint. Microsoft IT needed a better solution for linking some 140 branch offices with three regional hubs: Redmond, Singapore, and Dublin. It also needed better replication reporting.

Solution

Microsoft IT is enhancing management of its branch office servers through deployment of the Windows Server 2003 R2 operating system. Its underlying technologies support the seamless integration of servers located in branch offices with the enterprise network. Windows Server 2003 R2 allows organizations to maintain the performance, availability, and productivity benefits of a local branch server while avoiding connectivity limitations and management overhead.

A key enabling technology is the newly redesigned DFS, which contains state-of-the-art replication, management, and compression technologies that ensure the efficient use of bandwidth.

DFS includes bandwidth-intelligent file system technologies, and provides an efficient framework for server-to-server file replication. DFS employs state-of-the-art compression algorithms and efficient replication mechanisms that ensure files are only transferred when needed and that only the minimal set of information required is replicated, while maintaining distributed file consistency.

DFS can also help simplify the management and increase the overall productivity of an organization's branch offices. Key elements of DFS include:

  • DFS Namespaces. DFS Namespaces allows administrators to group shared folders located on different servers and present them to users as a virtual tree of folders known as a "namespace." A namespace provides numerous benefits, including increased availability of data, load sharing, and simplified data migration.

    If local servers become unavailable, DFS Namespaces configurations provide for client failover by closest site selection and failback to a preferred server. For example, if failback is enabled on a DFS link that has targets in both the branch and the hub, branch clients automatically fail over to the hub when the branch server is unavailable, and automatically fail back to the branch when that server is available again.

    The namespace is administered by using the DFS Management Console, which provides a hierarchical view of the namespace and incorporates functionality that was previously available through a command-line interface. The console uses features of Microsoft Management Console (MMC) 3.0, including built-in HTML reports and diagnostics.

    Figure 1. When users access a folder in the namespace (1), the client computers contact the namespace server and receive a referral. Client computers access the first server in their respective referrals (2).

  • DFS-R. DFS-R is a robust multimaster file replication service that is significantly more scalable and efficient at synchronizing file servers than its predecessor, File Replication Service (FRS). DFS-R can replicate branch office data to other branches and to hub servers, any of which can serve as a backup source if the server at a branch office goes down. DFS-R supports automated recovery from database loss or corruption, as well as replication scheduling and bandwidth throttling. It uses a new compression technology, RDC.
  • RDC. RDC is an advanced, WAN-friendly compression technology that optimizes data transfers over limited-bandwidth networks. Instead of transferring similar or redundant data repeatedly, RDC identifies changes (referred to as "deltas") within and across files and transmits only those changes, achieving significant bandwidth savings. RDC detects insertions, removals, and rearrangements of data in files, enabling DFS-R to replicate only the changed file blocks when files are updated. In addition to calculating file deltas and transferring only the differences, RDC can use data that is common to both computers to copy a similar file from any client or server to another. This further reduces the amount of data sent and the overall bandwidth required for file transfers. RDC differs from local differencing techniques, sometimes called "patching," in which the differences between two known versions of a file are calculated on a server and then sent to the client, which applies them to transform the old version into the new one.
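The namespace-and-referral flow in Figure 1 can be sketched as a lookup from a virtual folder to its folder targets, followed by choosing the first target in the referral. In this hypothetical model the server names, sites, and the `referral` and `resolve` functions are illustrations, not DFS APIs, and "closest site" is simplified to "same site first"; a real namespace server computes referrals itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    server: str  # hypothetical file server name
    share: str   # shared folder on that server
    site: str    # Active Directory site where the server lives

    def unc(self) -> str:
        return f"\\\\{self.server}\\{self.share}"


# Hypothetical namespace: each virtual folder (link) maps to one or more folder targets.
NAMESPACE = {
    "tools": [Target("HUB-FS1", "tools", "Redmond"),
              Target("BRANCH-FS1", "tools", "Branch42")],
    "reports": [Target("HUB-FS2", "reports", "Redmond")],
}


def referral(link, client_site):
    """Targets for a link, with targets in the client's own site listed first.

    sorted() is stable, so targets keep their configured order within each group.
    """
    return sorted(NAMESPACE[link], key=lambda t: t.site != client_site)


def resolve(path, client_site):
    """Map a namespace path such as 'tools/setup/app.msi' to a UNC path on the
    first target in the referral (step 2 in Figure 1)."""
    link, _, rest = path.partition("/")
    first = referral(link, client_site)[0]
    suffix = "\\" + rest.replace("/", "\\") if rest else ""
    return first.unc() + suffix


print(resolve("tools/setup/app.msi", "Branch42"))  # \\BRANCH-FS1\tools\setup\app.msi
print(resolve("tools/setup/app.msi", "Dublin"))    # \\HUB-FS1\tools\setup\app.msi
```

Users see only the virtual path; which physical server answers depends on the referral, which is what allows data to be migrated or replicated without changing the paths users type.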

Microsoft IT deployed DFS at 140 branch offices and at its three major data centers, replacing Robocopy, an internal replication tool that required extensive scripting. Microsoft IT configured DFS to fail over to the nearest branch server, or to the nearest regional data center, in case of a local outage. When the local server is restored, it automatically resumes its role as the primary data store.
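The failover and failback behavior described above amounts to choosing the first reachable target from an ordered list and re-evaluating that choice over time. The sketch below assumes a fixed preference order (local branch server, nearest branch server, regional hub); the server names and `pick_target` are hypothetical, and real DFS clients work from cached referrals and need the client configuration listed under Deployment Notes for failback.

```python
from typing import Callable, Optional


def pick_target(ordered: list[str], is_up: Callable[[str], bool]) -> Optional[str]:
    """Return the first reachable target in preference order, or None."""
    for server in ordered:
        if is_up(server):
            return server
    return None


# Hypothetical preference order for one branch office.
ORDER = ["LOCAL-FS", "NEARBY-BRANCH-FS", "HUB-DUBLIN"]
status = {"LOCAL-FS": True, "NEARBY-BRANCH-FS": True, "HUB-DUBLIN": True}

print(pick_target(ORDER, status.get))  # LOCAL-FS
status["LOCAL-FS"] = False             # local outage: fail over
print(pick_target(ORDER, status.get))  # NEARBY-BRANCH-FS
status["NEARBY-BRANCH-FS"] = False     # nearest branch also down
print(pick_target(ORDER, status.get))  # HUB-DUBLIN
status["LOCAL-FS"] = True              # local server restored: fail back
print(pick_target(ORDER, status.get))  # LOCAL-FS
```

Failback falls out of re-running the same selection: once the preferred server is reachable again, it is first in the list and is chosen again.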

The DFS Management Console (a snap-in for MMC version 3.0) is used for configuration and server management.

Figure 2. The DFS Management Console provides a hierarchical view of the namespace.

Deployment Notes

To enable all features in DFS Namespaces, you must configure servers and clients as follows:

  • Servers where namespace management tasks are performed must run Windows Server 2003 R2.
  • To take advantage of new namespace features, all servers that host namespaces must run Windows Server 2003 with SP1 or Windows Server 2003 R2.
  • To take advantage of new namespace features, all domain controllers must run Windows Server 2003 with SP1 or Windows Server 2003 R2.
  • Namespaces must be created on NTFS file system volumes.
  • Clients that access namespaces can run any of the supported client operating systems, but only clients running one of the following combinations can be configured for client failback: Windows® XP with Service Pack 2 (SP2) and the Windows XP Client Failback hotfix, or Windows Server 2003 with SP1 and the Windows Server 2003 Client Failback hotfix.

Note: Administrators can request that certain servers be listed at the top or bottom of the referral list, irrespective of their site location. For example, an administrator may wish to designate a server used for data protection as the lowest priority server across all sites. This server would then be treated as a server of last resort, even for clients local to the server.
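The priority override in the note above can be modeled as a sort key in which an administrator-assigned priority class outranks site cost. This is a simplified, hypothetical model: the three classes, the numeric `site_cost` values, and the server names are assumptions, and the sketch does not attempt to reproduce every DFS target priority option.

```python
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    FIRST_AMONG_ALL = 0  # always at the top of the referral
    NORMAL = 1           # ordered by site cost
    LAST_AMONG_ALL = 2   # always at the bottom, even for clients in the same site


@dataclass(frozen=True)
class Target:
    server: str
    site_cost: int  # hypothetical cost from the client's site (0 = same site)
    priority: Priority = Priority.NORMAL


def order_referral(targets):
    """Priority class dominates; site cost only orders targets within a class."""
    return [t.server for t in sorted(targets, key=lambda t: (t.priority, t.site_cost))]


targets = [
    Target("BACKUP-FS", site_cost=0, priority=Priority.LAST_AMONG_ALL),  # data-protection server
    Target("HUB-FS", site_cost=10),
    Target("BRANCH-FS", site_cost=0),
]
print(order_referral(targets))  # ['BRANCH-FS', 'HUB-FS', 'BACKUP-FS']
```

The data-protection server sits in the client's own site (cost 0) yet still lands last, which is the "server of last resort" behavior the note describes.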

Benefits

Deployment of Windows Server 2003 R2 and DFS provided Microsoft IT with a number of benefits, including significant reduction in bandwidth usage and the ability to use local servers as service caches.

Significant Reduction in Bandwidth Usage

Microsoft IT found a significant reduction in bandwidth usage from DFS-R. Previously, changing just the title of a 3 MB Microsoft PowerPoint® presentation caused the entire file to be sent across the network for replication, which could take a minute or more. With the delta-based RDC of DFS-R, only the change to the title is sent, and replication takes less than a second. Microsoft IT internal performance testing found bandwidth reductions ranging from 37 to 95 percent compared to Microsoft's earlier custom solution. The reduction varied with how much of the data in a file had changed and with the file type: image files generally showed less reduction than documents, spreadsheets, and PowerPoint presentations.
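Content-defined chunking is one common way to get this kind of delta behavior, and the sketch below uses it to mimic the title-edit case: a receiver holding the old file needs only the chunks whose signatures it lacks. It is a generic illustration, not Microsoft's RDC algorithm (RDC's own boundary selection and recursive signatures are documented in the MS-RDC protocol specification); the window size, chunk sizes, polynomial rolling hash, SHA-256 signatures, and the `chunk` and `bytes_to_send` names are all assumptions.

```python
import hashlib
import random

# Illustrative parameters only; these are not RDC's actual values.
WINDOW = 48                 # bytes covered by the rolling hash
MASK = (1 << 12) - 1        # boundary when the low 12 bits are all ones (~4 KB average)
MIN_CHUNK, MAX_CHUNK = 1024, 32 * 1024
BASE, MOD = 257, 1 << 32
OUT_FACTOR = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window


def chunk(data: bytes) -> list:
    """Split data at content-defined boundaries chosen by a polynomial rolling hash."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * OUT_FACTOR) % MOD  # drop the oldest byte
        h = (h * BASE + byte) % MOD                         # add the newest byte
        size = i - start + 1
        if (size >= MIN_CHUNK and (h & MASK) == MASK) or size >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def bytes_to_send(old: bytes, new: bytes) -> int:
    """Bytes of `new` in chunks whose signatures the receiver (holding `old`) lacks.
    Signature traffic is ignored for simplicity."""
    have = {hashlib.sha256(c).digest() for c in chunk(old)}
    return sum(len(c) for c in chunk(new) if hashlib.sha256(c).digest() not in have)


rng = random.Random(0)
old = rng.randbytes(200_000)                  # stand-in for an existing file
new = old[:1000] + b"Q3 Review" + old[1000:]  # small edit near the start
print(bytes_to_send(old, new), "of", len(new), "bytes sent")
```

Because each boundary depends only on the bytes in the 48-byte window, an insertion shifts later offsets without moving later boundaries, so only the chunk or two around the edit has to be resent.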

Testing by Microsoft IT also found RDC to be faster than rsync, especially on larger files: RDC was twice as fast as the rsync 2.6.2 protocol with 4 MB files, and nearly three times as fast with 290 MB files.

Service Cache

With DFS, servers at a branch office act as a service cache that holds no unique state and does not require system backup. If a branch server fails, branch office functionality is unaffected: clients fail over from the local branch server to another server, selected by closest site, and then fail back to a preferred server when service is restored.

Precise and Proactive Reporting

The previous solution lacked a way to monitor and report replication progress, successes, and failures; operators had to log on to each server and scrutinize its extremely verbose log file. DFS-R includes proactive reports on the replication topology and server health. Reports for all the servers in a replication topology are displayed in a single dashboard, so operations team members have a single Web page with the data they need to identify and troubleshoot issues.

Cross-File Replication

Before copying a file from the source to a remote server, DFS Cross-File Replication checks whether the file is already available in another location on the receiving server. If it is, the file is copied locally instead of being replicated over the wire, which conserves bandwidth.
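The check described above can be sketched as a content-hash index on the receiving server: if a local file already has the needed content, it is copied locally; otherwise it is fetched remotely. This sketch assumes the sender advertises a SHA-256 digest of the file, and `build_index`, `obtain`, and `fake_fetch` are hypothetical names. It matches only byte-identical files and is not meant to reproduce DFS-R's internal implementation.

```python
import hashlib
import os
import shutil
import tempfile


def file_digest(path):
    """SHA-256 of a file's contents, read in 64 KB blocks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()


def build_index(root):
    """Map each content digest to one local file that holds that content."""
    index = {}
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            index.setdefault(file_digest(path), path)
    return index


def obtain(digest, dest, index, fetch_remote):
    """Copy from a local file with the same content if one exists;
    otherwise call fetch_remote(dest) to pull the file over the WAN."""
    local = index.get(digest)
    if local is not None:
        shutil.copyfile(local, dest)
        return "local"
    fetch_remote(dest)
    return "remote"


def fake_fetch(dest):
    # Hypothetical stand-in for a transfer from the sending server.
    with open(dest, "wb") as f:
        f.write(b"fetched over the WAN")


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "archive"))
    with open(os.path.join(root, "archive", "deck.pptx"), "wb") as f:
        f.write(b"slide data")
    index = build_index(root)

    wanted = hashlib.sha256(b"slide data").hexdigest()
    first = obtain(wanted, os.path.join(root, "deck-copy.pptx"), index, fake_fetch)
    other = hashlib.sha256(b"new content").hexdigest()
    second = obtain(other, os.path.join(root, "new.pptx"), index, fake_fetch)
    copied_ok = file_digest(os.path.join(root, "deck-copy.pptx")) == wanted

print(first, second, copied_ok)  # local remote True
```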

Less Time Spent on Server Management

The Microsoft IT deployment of DFS has resulted in a 40 percent reduction in time spent managing servers in branch offices. Microsoft IT has seen about a 50 percent reduction in time spent on activities such as hosting new content, archiving old content, and adding and removing servers, and a 30 percent reduction in time spent handling day-to-day replication issues. Microsoft IT was also able to reduce its Tier 2 support staff by one person, who was reassigned to other work.

Lessons Learned

It is possible to use DFS Namespaces when domain controllers and namespace servers run a mix of Windows 2000 Server operating system, Windows Server 2003 R2, Windows Server 2003 with SP1, and Windows Server 2003 without SP1, but some functionality is disabled or limited depending on the operating systems on the servers. Some examples of mixed-mode behavior are as follows:

  • If domain controllers or namespace servers are running Windows Server 2003 without SP1, they cannot provide referrals that support target priority or client failback.
  • If domain controllers or namespace servers are running Windows 2000 Server, they cannot provide referrals that support target server priority or client failback, nor can they order targets by lowest cost in referrals. Additional configuration is required to enable these namespace servers and domain controllers to detect the site of each target server in the namespace.
  • If the DFS Management snap-in connects to a namespace server that is not running Windows Server 2003 SP1 or Windows Server 2003 R2, none of the new configuration settings (e.g., client failback and target priority) can be enabled.
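The mixed-mode rules above can be encoded as a small lookup that shows which referral features every server in a mixed environment can provide. In this hypothetical sketch the feature names and the `referral_features` and `safe_features` functions are illustrative; the feature sets come from the bullets above, and treating lowest-cost ordering as available on Windows Server 2003 without SP1 is an inference, since the list names that limitation only for Windows 2000 Server.

```python
from enum import Enum


class ServerOS(Enum):
    WIN2000 = "Windows 2000 Server"
    WS2003_RTM = "Windows Server 2003 without SP1"
    WS2003_SP1 = "Windows Server 2003 with SP1"
    WS2003_R2 = "Windows Server 2003 R2"


ALL = frozenset({"target priority", "client failback", "lowest-cost ordering"})


def referral_features(os_version):
    """Referral features a domain controller or namespace server can provide."""
    if os_version in (ServerOS.WS2003_SP1, ServerOS.WS2003_R2):
        return ALL
    if os_version is ServerOS.WS2003_RTM:
        # No target priority or client failback; lowest-cost ordering is
        # not listed as a limitation for this version (an inference).
        return frozenset({"lowest-cost ordering"})
    # Windows 2000 Server: none of the three, and extra site configuration is needed.
    return frozenset()


def safe_features(servers):
    """Features that every listed referral source supports."""
    features = [referral_features(s) for s in servers]
    return frozenset.intersection(*features) if features else frozenset()


print(sorted(safe_features([ServerOS.WS2003_R2, ServerOS.WS2003_SP1])))
# ['client failback', 'lowest-cost ordering', 'target priority']
print(sorted(safe_features([ServerOS.WS2003_R2, ServerOS.WS2003_RTM])))
# ['lowest-cost ordering']
```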

Global Microsoft IT Environment

The Microsoft enterprise is large, complex, and constantly changing. The mission of the Microsoft IT group is unusual. In addition to running a world-class utility that keeps the business productive, its primary mission is to be Microsoft's first and best customer. This involves testing all enterprise software in the early stages of beta development by deploying it throughout the company, and providing feedback to product groups to help ensure predictable and trustworthy services for customers, clients, and partners. The following data gives some idea of the environment in which this work occurs (numbers are approximate):

  • Nearly 90,000 users of IT
  • More than 300,000 computers and devices
  • More than 400 sites supported worldwide
  • Global line-of-business (LOB) applications (for example, Siebel, Clarify, MS Sales, and World-Wide Sales and Marketing Database)
  • Global virtual help desk
  • Seven sites running Microsoft Exchange Server globally
  • 110 servers running Exchange Server
  • 38 mailbox servers
  • More than 3 million internal e-mail messages per day
  • More than 8.8 million external e-mail messages per day
  • More than 6.8 million e-mail messages blocked per day
  • More than 7.5 million remote connections per month

For More Information

Contact your local Microsoft office at http://www.microsoft.com/worldwide or visit http://www.microsoft.com/technet/itshowcase

© 2008 Microsoft Corporation. All rights reserved.