Microsoft Distributed File System
IT Value Card
Published: December 8, 2005
This card describes how the Microsoft Corporation Information Technology group (Microsoft
IT) uses Microsoft® Windows Server™ 2003 R2 Distributed File System (DFS), which
contains new state-of-the-art replication, management, and compression technologies
that ensure the efficient use of bandwidth. DFS includes a rich set of technologies,
including DFS Namespaces, DFS Replication (DFS-R), and Remote Differential Compression
(RDC).
IT Business Benefits
| Benefit | | Source or Derivation |
|---|---|---|
| Faster Replication | 300 percent faster replication of large files using DFS Replication (DFS-R) | Metrics reported by DFS-R Health Report, with greatest benefit seen for files 290 megabytes (MB) and larger |
| Faster Compression | DFS Remote Differential Compression (RDC) two to three times faster than rsync 2.6.2 | Microsoft internal testing found RDC twice as fast as the rsync 2.6.2 protocol in dealing with 4 MB files, and three times faster dealing with 290 MB files |
| Operations | 40 percent less time spent on managing replication operational activities | 50 percent savings in managing content; 30 percent savings in replication issues |
Executive Summary
The Microsoft Corporation Information Technology group (Microsoft IT) uses Microsoft® Windows Server™ 2003 R2 Distributed File System
(DFS) technology to better manage servers in some 140 branch offices around
the world. DFS, redesigned for Windows Server 2003 R2, helps ensure that Microsoft employees
always have access to the data they need, while significantly reducing the bandwidth
used for replication between sites. The Windows Server 2003 operating system is part
of the Microsoft Windows Server System™ integrated server software.
Situation
Microsoft, like many other businesses, finds that maintaining remote locations generates
significant operational costs and administrative challenges. One of the greatest
challenges is to ensure that the applications, tools, reports, learning materials,
and other internal content frequently accessed by users are available to them at
the closest site. A key goal is to optimize bandwidth utilization: data that
multiple users would otherwise each fetch across the wide area network (WAN) from
the remote site is instead copied over the link once and made available
on a local server. The challenge is compounded by the volume
and variety of data replicated across Microsoft's worldwide operations,
including replication between some 250 servers.
Much of the difficulty comes from the distributed nature of the networks linking
branch offices to regional hubs and corporate headquarters. These WANs have an undesirable
effect on the efficient operation of branch offices. One of the fundamental problems
with integrating branch offices is the underlying dependency on file system operations.
Most file system protocols, such as UNIX's Network File System (NFS) and Microsoft's
Common Internet File System (CIFS), do not operate efficiently over low-bandwidth
or high-latency networks. These protocols were originally developed under the basic
assumption that bandwidth constraints were nonexistent. Microsoft IT needed a better
solution for linking some 140 branch offices with three regional hubs: Redmond,
Singapore, and Dublin. It also needed better replication reporting.
Solution
Microsoft IT is enhancing management of its branch office servers through deployment
of the Windows Server 2003 R2 operating system. Its underlying technologies support
the seamless integration of servers located in branch offices with the enterprise
network. Windows Server 2003 R2 allows organizations to maintain the performance,
availability, and productivity benefits of a local branch server while avoiding
connectivity limitations and management overhead.
A key enabling technology is the newly redesigned DFS, which contains state-of-the-art
replication, management, and compression technologies that ensure the efficient
use of bandwidth.
DFS includes bandwidth-intelligent file system technologies, and provides an efficient
framework for server-to-server file replication. DFS employs state-of-the-art compression
algorithms and efficient replication mechanisms that ensure files are only transferred
when needed and that only the minimal set of information required is replicated,
while maintaining distributed file consistency.
DFS can also help simplify the management and increase the overall productivity
of an organization's branch offices. Key elements of DFS include:
- DFS Namespaces. DFS Namespaces
allows administrators to group shared folders located on different servers and present
them to users as a virtual tree of folders known as a "namespace." A namespace
provides numerous benefits, including increased availability of data, load sharing,
and simplified data migration.
If local servers become unavailable, DFS Namespaces provides client failover
by closest site selection and failback to a preferred server. For example,
if failback is enabled on a DFS link that has targets in both the branch and the
hub, branch clients automatically fail over to the hub when the branch service is
unavailable, and automatically fail back to the branch when that service is available again.
The namespace is administered by using the DFS Management Console, which provides
a hierarchical view of the namespace. The DFS Management Console incorporates functionality
that was previously available through a command-line interface. The DFS Management
Console uses features of Microsoft Management Console (MMC) 3.0, including
in-the-box HTML reports and diagnostics.
Figure 1. When users access a folder in the namespace (1), the client computers contact
the namespace server and receive a referral. Client computers access the first server
in their respective referrals (2).
- DFS-R. DFS-R is a robust multimaster
file-replication service that is significantly more scalable and efficient in synchronizing
file servers than its predecessor, File Replication Service (FRS). DFS-R can be used
to replicate branch office data to other branches and hub servers, any of which
can serve as backup sources if the server at a branch office goes down. DFS-R supports
automated recovery from database loss or corruption, replication scheduling, and
bandwidth throttling. DFS-R uses a new compression algorithm known as RDC.
- RDC. RDC is an advanced WAN-friendly
compression technology that optimizes data transfers over limited-bandwidth networks.
Instead of transferring similar or redundant data repeatedly, RDC accurately identifies
changes (referred to as "deltas") within and across files and transmits only those
changes to achieve significant bandwidth savings. RDC detects insertions, removals,
or rearrangements of data in files, enabling DFS-R to replicate only the changed
file blocks when files are updated. In addition to calculating file deltas and transferring
only the differences, RDC can also copy any similar file from any client or server
to another using data that is common to both computers. This further reduces the
amount of data sent and the overall bandwidth requirements for file transfers.
Local differencing techniques, sometimes called "patching," are used to
transform the old version to a new version. The differences between two known versions
of a file are calculated on a server and then sent to the client.
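The delta idea described for RDC can be illustrated with a minimal Python sketch. It is not the RDC algorithm itself: the chunking rule, the `WINDOW` and `MASK` values, and the function names (`chunk`, `make_delta`, `apply_delta`) are assumptions chosen for illustration. The sketch cuts a file into content-defined chunks, hashes each chunk, and sends literal bytes only for chunks the receiver does not already hold.

```python
import hashlib
import zlib

WINDOW = 16   # bytes examined to decide whether a position is a chunk boundary (assumed)
MASK = 0x3F   # boundary when the low 6 bits are zero, about 64-byte chunks (assumed)


def chunk(data):
    """Split data at content-defined boundaries, so an insertion disturbs only
    the chunks near the edit rather than every chunk after it."""
    chunks, start = [], 0
    for end in range(WINDOW, len(data) + 1):
        if zlib.crc32(data[end - WINDOW:end]) & MASK == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def signature(piece):
    """Strong hash used to recognize a chunk the receiver already has."""
    return hashlib.sha256(piece).digest()


def make_delta(old, new):
    """Describe `new` as references to chunks of `old` plus literal new data."""
    known = {signature(c) for c in chunk(old)}
    delta = []
    for c in chunk(new):
        sig = signature(c)
        delta.append(("ref", sig) if sig in known else ("data", c))
    return delta


def apply_delta(old, delta):
    """Rebuild the new version on the side that holds only `old` and the delta."""
    store = {signature(c): c for c in chunk(old)}
    return b"".join(store[v] if kind == "ref" else v for kind, v in delta)
```

In this sketch, inserting a few bytes into the middle of a large file yields a delta whose literal data covers only the chunks around the insertion; everything else travels as short references.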
Microsoft IT deployed DFS at 140 branch offices and at its three major data centers,
replacing an internal replication tool, called Robocopy, that required extensive
scripting. Microsoft IT configured DFS to fail over to the nearest branch server,
or to the nearest regional data center, in case of a local outage. When the local server
is restored, it automatically resumes its role as the primary data store.
The DFS Management Console (a plug-in for MMC version 3.0) is used for configuration
and server management.
Figure 2. The DFS Management Console provides a hierarchical view of the namespace.
Deployment Notes
To enable all features in DFS Namespaces, you must configure servers and clients
as follows:
- Servers where namespace management tasks are performed must run Windows Server 2003
R2.
- To take advantage of new namespace features, all servers that host namespaces must
run Windows Server 2003 with SP1 or Windows Server 2003 R2.
- To take advantage of new namespace features, all domain controllers must run Windows
Server 2003 with SP1 or Windows Server 2003 R2.
- Namespaces must be created on NTFS file system volumes.
- Clients that access namespaces can run any of the supported client operating systems,
but only clients running one of the following operating system and hotfix combinations
can be configured for client failback: the Windows® XP operating system with Service Pack 2 and
the Windows XP Client Failback hotfix, or Windows Server 2003 with SP1 and the Windows
Server 2003 Client Failback hotfix.
Note: Administrators can request
that certain servers be listed at the top or bottom of the referral list, irrespective
of their site location. For example, an administrator may wish to designate a server
used for data protection as the lowest priority server across all sites. This server
would then be treated as a server of last resort, even for clients local to the
server.
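The referral ordering, target priority, and failover and failback behavior described in this section can be sketched in Python as follows. This is a hypothetical model, not the DFS referral implementation: the `Target` class, the `"first"`/`"normal"`/`"last"` priority labels, the numeric site costs, and the function names are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical priority classes: "first" and "last" override site-based ordering.
PRIORITY_RANK = {"first": 0, "normal": 1, "last": 2}


@dataclass(frozen=True)
class Target:
    server: str
    site: str
    priority: str = "normal"


def build_referral(targets, client_site, site_cost):
    """Order targets by priority class first, then by cost from the client's site."""
    def sort_key(target):
        if target.site == client_site:
            cost = 0
        else:
            cost = site_cost.get((client_site, target.site), float("inf"))
        return (PRIORITY_RANK[target.priority], cost, target.server)
    return sorted(targets, key=sort_key)


def pick_target(referral, is_up):
    """Return the first reachable target in referral order, or None."""
    for target in referral:
        if is_up(target.server):
            return target
    return None


def next_target(referral, current, is_up, failback=True):
    """With failback, always return to the most preferred reachable target;
    without it, stay on the current target while it remains reachable."""
    if not failback and current is not None and is_up(current.server):
        return current
    return pick_target(referral, is_up)
```

In this model, a data-protection server marked `"last"` sorts to the bottom of the referral even for clients in its own site, and a branch client that failed over to the hub returns to the branch server once it is reachable again, provided `failback` is enabled.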
Benefits
Deployment of Windows Server 2003 R2 and DFS provided Microsoft IT with a number
of benefits, including significant reduction in bandwidth usage and the ability
to use local servers as service caches.
Significant Reduction in Bandwidth Usage
Microsoft IT found a significant reduction in bandwidth usage from DFS-R. For example,
changing just the title of a 3 MB Microsoft PowerPoint® presentation previously
caused the entire file to be sent across
the network for replication, which could take a minute or more. With the delta-based
RDC of DFS-R, only the change in the title is sent, taking less than a second to
replicate. Microsoft IT internal performance testing found bandwidth reductions
ranging from 37 to 95 percent, compared to Microsoft's earlier custom solution.
Bandwidth reduction varied according to how much of the data in a file had been
changed, and the file type—with image files generally showing less reduction than
documents, spreadsheets, and PowerPoint presentations.
Testing by Microsoft IT also found RDC to be faster than rsync,
especially on larger files. RDC was twice as fast as the rsync 2.6.2 protocol
when compressing 4 MB files, and nearly
three times faster when working with 290 MB files.
Service Cache
With DFS, servers at a branch office act as a service cache that holds no
unique state and does not require system backup. If the server fails, there is
no impact on branch office functionality: clients fail over from the
local branch office server to another server by closest site selection, and then
fail back to the preferred server when services are restored.
Precise and Proactive Reporting
The previous solution lacked a mechanism to monitor and report replication progress,
success, and failure without having to log on to each server and scrutinize the
extremely verbose log file. DFS-R includes proactive reports that provide various
parameters of the replication topology and server health. Reports are displayed
in a single dashboard for all the servers in a replication topology, so the operations
team members have a single Web page providing the data they need to identify and
troubleshoot issues.
Cross-File Replication
Before copying a file from the source to a remote server, DFS Cross-File Replication
checks whether the file is already available in another location on the same receiving
server. If it is, the file is copied locally instead of being replicated over the wire,
helping to optimize use of bandwidth.
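A minimal Python sketch of the local-copy check described above. This card does not describe how DFS-R matches files, so the SHA-256 digest comparison, the directory index, and the names `file_digest`, `index_local_files`, and `place_file` are all assumptions for illustration; the sketch only handles exact content matches.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path):
    """SHA-256 of a file's contents, read in 1 MB blocks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(block)
    return hasher.hexdigest()


def index_local_files(root):
    """Map content digest -> path for every file already on the receiving server."""
    return {file_digest(p): p for p in Path(root).rglob("*") if p.is_file()}


def place_file(digest, dest, local_index, fetch_over_wan):
    """Copy from a local file with identical content if one exists; otherwise
    call fetch_over_wan() for the bytes. Returns "local" or "wan"."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    source = local_index.get(digest)
    if source is not None:
        shutil.copyfile(source, dest)
        return "local"
    dest.write_bytes(fetch_over_wan())
    return "wan"
```

The sending side would supply the digest along with the file name, so the receiver can decide whether any bytes need to cross the WAN at all.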
Less Time Spent on Server Management
The Microsoft IT deployment of DFS has resulted in a 40 percent reduction in time
spent managing servers in branch offices. Microsoft IT has seen about a 50 percent
reduction in time spent on activities such as hosting new content, archiving old
content, and adding and removing servers. The group has seen a 30 percent time savings
in handling day-to-day replication issues. Microsoft IT was able to reduce the size
of its Tier 2 support by one head count, reallocating the person to other work.
Lessons Learned
It is possible to use DFS Namespaces when domain controllers and namespace servers
run a mix of Windows 2000 Server operating system, Windows Server 2003 R2, Windows
Server 2003 with SP1, and Windows Server 2003 without SP1, but some functionality
is disabled or limited depending on the operating systems on the servers. Some examples
of mixed-mode behavior are as follows:
- If domain controllers or namespace servers are running Windows Server 2003 without
SP1, they cannot provide referrals that support target priority or client failback.
- If domain controllers or namespace servers are running Windows 2000 Server, they
cannot provide referrals that support target priority or client failback,
nor can they order targets by lowest cost in referrals. Additional configuration
is required to enable these namespace servers and domain controllers to detect the
site of each target server in the namespace.
- If the DFS Management snap-in connects to a namespace server that is not running
Windows Server 2003 with SP1 or Windows Server 2003 R2, none of the new configuration
settings (e.g., client failback and target priority) can be enabled.
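The mixed-mode rules above can be summarized as a small lookup. This Python sketch encodes only the behaviors listed in this section; the operating system labels, the feature names, and the `referral_features` function are hypothetical names chosen for illustration. It treats a feature as available only when every server supports it, which follows the "all servers" and "all domain controllers" wording of the deployment notes.

```python
# Referral features each server operating system can provide, per the list above.
# Feature names are illustrative labels, not DFS setting names.
NEW_FEATURES = {"least_cost_ordering", "target_priority", "client_failback"}

FEATURES_BY_OS = {
    "Windows 2000 Server": set(),
    "Windows Server 2003": {"least_cost_ordering"},
    "Windows Server 2003 SP1": set(NEW_FEATURES),
    "Windows Server 2003 R2": set(NEW_FEATURES),
}


def referral_features(server_os_list):
    """Return the features every listed domain controller and namespace
    server supports, that is, the intersection across all servers."""
    features = None
    for os_name in server_os_list:
        supported = FEATURES_BY_OS[os_name]
        features = set(supported) if features is None else features & supported
    return features or set()
```

For example, adding a single Windows Server 2003 server without SP1 to an otherwise R2 environment leaves only least-cost ordering in the result, and adding a Windows 2000 Server machine leaves none of the three.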
Global Microsoft IT Environment
The Microsoft enterprise is large, complex, and constantly changing. The mission
of the Microsoft IT group is unusual. In addition to running a world-class
utility that keeps the business productive, its primary mission is to be Microsoft's
first and best customer. This involves testing all enterprise software in the early
stages of beta development by deploying it throughout the company, providing valuable
feedback to product groups to ensure predictable and trustworthy services for customers,
clients, and partners. The following data gives some idea of the environment in
which this all occurs (numbers are approximate):
- Nearly 90,000 users of IT
- More than 300,000 computers and devices
- More than 400 sites supported worldwide
- Global line-of-business (LOB) applications (for example, Siebel, Clarify, MS Sales,
and World-Wide Sales and Marketing Database)
- Global virtual help desk
- Seven sites running Microsoft Exchange Server globally
- 110 servers running Exchange Server
- 38 mailbox servers
- More than 3 million internal e-mail messages per day
- More than 8.8 million external e-mail messages per day
- More than 6.8 million e-mail messages blocked per day
- More than 7.5 million remote connections per month
For More Information
Contact your local Microsoft office at http://www.microsoft.com/worldwide
or visit http://www.microsoft.com/technet/itshowcase