Service Management Functions

Network Administration

Published: May 31, 2005

For the latest information, please see http://www.microsoft.com/mof

On This Page
Executive Summary
Introduction
Network Administration Overview
Processes and Activities
Roles and Responsibilities
Relationship to Other SMFs

Executive Summary

As IT capabilities have increased steadily due to improvements in technology and practices, businesses have grown increasingly reliant on the IT infrastructure to support critical business processes and to create new opportunities. Although the most novel and innovative business use of IT often occurs at the application level, the importance of core services should not be taken lightly. All the functionality of a reliable, available, and secure IT infrastructure starts with the selection and proper maintenance of the hardware and basic services that form the foundation of that infrastructure.

The Network Administration service management function (SMF) defines and delivers the processes and procedures required to operate basic network services, including Dynamic Host Configuration Protocol (DHCP), Windows Internet Name Service (WINS), and Domain Name System (DNS), on a day-to-day basis. This SMF provides fundamental guidance for operating these services and maintaining the hardware layer on which they reside. It also provides references to appropriate resources for topic-specific operating guidance on hardware and network-level software. The Network Administration SMF resides in the Operating Quadrant of Microsoft® Operations Framework (MOF).

Introduction

The Network Administration SMF presents a unified approach to the operation and maintenance of network infrastructures, including Remote Access Service (RAS), local area networks (LANs), and wide area networks (WANs). This document describes best practices and processes that may be applied generally across a broad range of network configurations and topologies. The Network Administration SMF reflects best practices developed through operation of the extensive and highly complex Microsoft internal networks, with input from partner and customer experience and guidance provided through the IT Infrastructure Library (ITIL), published by the United Kingdom’s Office of Government Commerce (OGC).

This guide provides detailed information about the Network Administration SMF for organizations that have deployed, or are considering deploying, Microsoft technologies in a data center or other type of enterprise computing environment. This is one of the more than 20 SMFs defined and described in Microsoft Operations Framework (MOF). The guide assumes that the reader is familiar with the intent, background, and fundamental concepts of MOF as well as the Microsoft technologies discussed.

An overview of MOF and its companion, Microsoft Solutions Framework (MSF), is available in the MOF Service Management Function Overview guide. This overview also provides abstracts of each of the service management functions defined within MOF. Detailed information about the concepts and principles of each of the frameworks is available in technical papers at http://go.microsoft.com/fwlink/?LinkId=47748.

Audience

This document is written primarily for IT professionals and managers, including network engineers, systems engineers, architects, and others who wish to implement standardized practices and policies within their IT organizations. The guidance provided in this document is intended to facilitate the operation and management of networks in organizations of all sizes, but is aimed primarily at large organizations with complex IT architectures and multiple locations. The guide assumes a professional level of competence and knowledge of fundamental network principles. For those who wish to review network nomenclature, architecture, and other basics, we recommend the Microsoft Press® book, ALS Networking Essentials Plus, Third Edition, ISBN 0-7356-0912-8. To review Microsoft guidance and recommendations for network architecture, please refer to the Microsoft Windows Server System™ Reference Architecture (WSSRA). An overview of WSSRA is available at
http://go.microsoft.com/fwlink/?LinkId=47749.

The complete set of WSSRA guidelines, blueprints, and documentation is available at http://go.microsoft.com/fwlink/?LinkId=47750.

What’s New

This version of the Network Administration SMF contains updated references to currently available Product Operating Guides (POGs) for Microsoft technologies operated as part of an organization’s network infrastructure. These technologies include DNS, WINS, and DHCP servers installed as part of the Microsoft Windows Server™ 2003 operating system. The guidance in this SMF reflects the current best practices used by the Microsoft IT organization in operating and maintaining the extensive internal Microsoft network.

Feedback

Please direct questions and feedback about this SMF guide to cisfdbk@microsoft.com.

Network Administration Overview

As defined in MOF, a network consists of the infrastructure components through which computer systems and shared peripherals communicate with each other. It is the most basic level of an IT infrastructure—without network facilities, there is no infrastructure, just a collection of individual computers. The Network Administration SMF is focused on the operation of this basic service.

The Network Administration SMF is situated in the MOF Operating Quadrant, illustrated below in Figure 1. It is closely related to the Storage Management, Directory Services Administration, and Job Scheduling SMFs since it provides a similar foundation on which higher-order IT layers are built.

Figure 1. The MOF Process Model, with SMFs. The Network Administration SMF resides in the Operating Quadrant.


Figure 2 illustrates the internal organization of the Operating Quadrant. The System Administration, Security Administration, and Service Monitoring and Control SMFs all exert some level of control over the more fundamental SMFs (bottom row of the triangle) in the quadrant. In some organizations, the triangular hierarchy depicted in Figure 2 may be collapsed to reflect organizational staffing levels and assignments, with some sharing of functions and responsibilities.

Figure 2. Organizational hierarchy of the Operating Quadrant


Goals and Objectives

The goal of the Network Administration SMF is to provide and reference a solid foundation of processes for administering a network environment on a day-to-day basis. This entails managing and providing operational support for various elements within the production environment. The SMF’s objectives include providing planning and deployment services to expand existing network facilities, as well as support services to troubleshoot and repair faults in the network environment. Through effective implementation of the Network Administration SMF, IT organizations can expect to:

Improve their deployment of network infrastructure.

Improve troubleshooting processes and associated incident-management processes.

Increase network reliability.

Enhance availability of IT solutions and services.

Scope

A typical network consists of hardware—including cabling, routers, switches, hubs, physical servers, and other components—and the software or firmware that controls the manner in which the hard components are utilized. In the networking model described by Open Systems Interconnection (OSI), the typical IT infrastructure is constructed in layers, from basal components that are used by all services at the bottom of the stack, to specialized applications at the top.

The layers making up the OSI stack are (from the top, down):

7. Application

6. Presentation

5. Session

4. Transport

3. Network

2. Data Link

1. Physical

Network administration is typically involved with the bottom three layers of the stack (Physical, Data Link, and Network), which mostly consist of hardware. There is some overlap between network and system administration at the transport level, which includes the linking and networking protocols that enable the transfer of data from one point to another. From the MOF perspective, services such as DNS, WINS, and DHCP provide the basic name resolution and address assignment required by fully featured IT services. Depending upon the organization, these core services may also be included as network service functions. Since DNS, WINS, and DHCP run on servers, network servers are sometimes included among the hardware components managed by the Network Administration SMF.

There is overlap between the Network Administration SMF and its sister SMFs within the Operating Quadrant. Network servers, such as DNS and WINS, require basic maintenance operations such as health monitoring (Service Monitoring and Control). In organizations running Microsoft Active Directory® directory service, there may be overlap in the processes applied to manage Active Directory itself and DHCP, which is tightly integrated with it.

The Network Administration SMF is also closely aligned with SMFs outside its quadrant. Upgrading network components is an intrinsic part of proactive network operations. These changes are controlled through the Change Management, Configuration Management, and Release Management SMFs within the Changing Quadrant. Similarly, although resolving user outages or other issues is the responsibility of the Incident Management SMF within the Supporting Quadrant, troubleshooting network-related issues is typically a specialty task that occurs within the Network Administration SMF. Some of these allied processes will be referred to or described within this SMF since they are central to a unified approach to managing a network.

This SMF provides generalized guidance for the configuration and maintenance of the hardware and software components of a network. Individual networks may vary widely in their overall architectures and component-level constituents. For this reason, guidance for specific hardware configuration and maintenance is presented as general recommendations within this document, and the reader is directed to review the specific manufacturer’s guidance for information about individual hardware components. Similarly, Microsoft has published individual POGs for DNS, DHCP, and WINS operating on the Windows Server platform. Links to each of these guides are provided at http://go.microsoft.com/fwlink/?LinkId=47751.

Key Definitions

DHCP

Dynamic Host Configuration Protocol (DHCP) is a TCP/IP standard that reduces the complexity and administrative overhead of managing network client IP address configurations by automating the assignment of IP addresses.

DNS

Domain Name System (DNS), in computer communications, is a method of translating Internet addresses so that computers connected to the Internet can find each other. A DNS server translates a host name (a sequence of words) into the numerical address assigned to a computer (such as 207.46.228.91), and vice versa.

network

Techniques, physical connections, and computer programs used to link two or more computers. Network users are able to share files, printers, and other resources; send electronic messages; and run programs on other computers.

NOS

A network operating system (NOS) is an operating system that includes software to communicate with other computers by means of a network. This allows such resources as files, application programs, and printers to be shared between computers.

protocol

A set of established standards for data transfer that enables computers to communicate with each other.

RAS

Remote Access Service (RAS) is a technology that permits remote users to log on to and use a corporate network.

VoIP

Voice over IP (VoIP) is a technology that enables voice communications (telephony) over the Internet.

WINS

Windows Internet Name Service (WINS) is the name resolution system used for Microsoft Windows NT® Server 4.0 and earlier Microsoft operating systems.
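As an illustration of the two directions of translation mentioned in the DNS definition above, the following minimal Python sketch uses only the standard library. The function names are hypothetical and are not part of any Microsoft tool. The first function builds the name that a reverse lookup queries for an address; the second asks the local system resolver for the addresses of a host name and therefore depends on the resolver configuration of the computer on which it runs.

```python
import ipaddress
import socket


def reverse_lookup_name(address: str) -> str:
    """Return the DNS name that a reverse (PTR) lookup queries for an IP address."""
    return ipaddress.ip_address(address).reverse_pointer


def resolve(host_name: str) -> list[str]:
    """Forward lookup: ask the system resolver for the addresses of host_name."""
    infos = socket.getaddrinfo(host_name, None, proto=socket.IPPROTO_TCP)
    # Each entry ends with a socket address whose first field is the IP address.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # The sample address is the one used in the DNS definition.
    print(reverse_lookup_name("207.46.228.91"))
```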

Processes and Activities

This chapter provides a detailed discussion of the processes and activities that occur in the Network Administration SMF. The initial architecture, design, and development of a network infrastructure are beyond the scope of this SMF. However, network administrators should be completely familiar with their network’s architecture and configuration in order to properly operate, expand, and maintain it. The day-to-day operation of a static, healthy network occurs primarily within the Operating Quadrant, as defined in MOF. However, other typical activities, such as routine network upgrade, component replacement, or troubleshooting, involve a broader scope of SMFs. (This relationship between the Network Administration SMF and related SMFs will be described in more detail in a subsequent section.)

Figure 3. Network administration tasks relate closely to processes in other MOF SMFs.


Network Components Overview

As mentioned previously, networks consist of complex architectures of hardware and software. Each of these components requires routine monitoring or maintenance to achieve negotiated operating levels. Components are occasionally subject to fault or error, and they may be eventually replaced or upgraded to meet business demand. To understand the processes required to operate the network, it is appropriate to briefly review the network components themselves.

Hardware Components

The hardware layer of a network may be extensive, typically including the following components:

Cabling

Network adapters/network interface cards (NICs)

Hubs

Switches

Routers

Content switching

Wireless access points

Firewalls

These components may be supplied through a variety of vendors. In fact, depending on the degree of standardization to which an IT organization subscribes, individual component categories—routers, for example—may be obtained from multiple vendors. The product documentation accompanying these components generally describes their installation and configuration in detail.

Software Components

As described above, many network hardware components contain firmware that may require initial configuration according to the manufacturer’s recommendations. For example, network devices typically are configured either through a firmware-based HTML interface, which is accessed by an Internet browser that is pointed to the device’s specific IP address, or through a telnet session.

For networks operating on Microsoft Windows Server, there are several software components in the network as well. These services include DNS, WINS, and DHCP. Each of these components provides basic functionality within the network and is critical to the availability of higher-order services. In many networks, Remote Access Service (RAS) is also a highly used network component. Maintaining these components is a necessary part of network administration.

Network Processes Overview

IT specialists recognize that their typical network-related workload is divided into three major categories. Their tasks often involve changing the network infrastructure, frequently by deploying new network segments or by reconfiguring hardware or network services in some way. Network administrators must also maintain the infrastructure by monitoring its health and performing routine maintenance. And finally, they support network operations by troubleshooting outages or other performance issues, which sometimes leads to necessary changes in the infrastructure.

Established networks may require change for a variety of reasons. The expansion of a business into new facilities, the addition of staff, or the acquisition of subsidiaries may all trigger a need to expand network facilities. Business demands may necessitate the upgrading of network facilities as well—for example, increased corporate use of video or other digital media in the workplace may require upgrading to higher speed and bandwidth networks. Sometimes new standards or vendor changes will require replacement or reconfiguration of network hardware. The MOF Change Management SMF describes the process by which changes of this nature are evaluated and approved. Some of the significant parts of the change process described in the Change Management SMF include:

Communicating the change to the customer.

Defining a roll-back plan.

Completing a technical review of the proposed change.

Convening the change advisory board (CAB) to approve the change.

Capturing the pre-change configuration of the system.

Validating the post-change configuration against the expected results.

Testing the change to confirm that it meets desired functionality.
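The last steps in the list above (capturing the pre-change configuration and validating the post-change configuration against the expected results) can be sketched as a simple comparison. This is a minimal illustration only: it assumes configurations are available as flat dictionaries of setting names and values, and the function name, report fields, and sample settings are hypothetical rather than taken from the Change Management SMF.

```python
def validate_change(pre_change: dict, post_change: dict, expected: dict) -> dict:
    """Compare a post-change configuration with the expected results.

    pre_change and post_change are snapshots of the same component;
    expected holds only the settings the approved change should produce.
    """
    keys = set(pre_change) | set(post_change)
    # Every setting whose value differs between the two snapshots.
    changed = {
        key: (pre_change.get(key), post_change.get(key))
        for key in keys
        if pre_change.get(key) != post_change.get(key)
    }
    # Expected settings that did not end up with the expected value.
    mismatches = {
        key: (wanted, post_change.get(key))
        for key, wanted in expected.items()
        if post_change.get(key) != wanted
    }
    # Settings that changed although the change request did not mention them.
    unexpected = sorted(key for key in changed if key not in expected)
    return {
        "changed": changed,
        "mismatches": mismatches,
        "unexpected": unexpected,
        "needs_review": bool(mismatches or unexpected),
    }


if __name__ == "__main__":
    before = {"vlan": 10, "mtu": 1500, "speed": "100M"}
    after = {"vlan": 20, "mtu": 1500, "speed": "1G"}
    print(validate_change(before, after, expected={"vlan": 20}))
```

A report flagged for review would be one possible trigger for the roll-back plan defined earlier in the change process.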

Maintaining a Network

Operating the network infrastructure is largely a matter of monitoring its performance, evaluating that against expected norms, and generating work items to troubleshoot if performance drops off. Most hardware components within a network should operate without hands-on maintenance or intervention within the manufacturer’s specifications for mean time between failure and other performance criteria. The MOF Capacity Management SMF provides details for capacity planning that will help the network design team in optimizing network performance.
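One way to picture the comparison of measured performance against expected norms is a baseline check. The sketch below is an assumption-laden illustration, not a method prescribed by MOF: it treats the norm as the mean of historical samples and flags a reading that lies more than a chosen number of standard deviations away. The function name, the tolerance of 3.0, and the sample values are all hypothetical.

```python
from statistics import mean, stdev


def deviates_from_norm(history, current, tolerance=3.0):
    """Return True when current lies more than `tolerance` standard
    deviations away from the mean of the historical samples."""
    if len(history) < 2:
        raise ValueError("need at least two historical samples")
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any different reading is a deviation.
        return current != baseline
    return abs(current - baseline) > tolerance * spread


if __name__ == "__main__":
    # Hypothetical throughput samples for one network link.
    samples = [100, 102, 98, 101, 99]
    for reading in (101, 60):
        if deviates_from_norm(samples, reading):
            print(f"reading {reading}: open a troubleshooting work item")
        else:
            print(f"reading {reading}: within expected norms")
```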

The server-based components of the network do require periodic attention, however. These components require regular backups, where applicable, and evaluations of storage or capacity requirements, in accordance with the Storage Management SMF. Specific guidance for DNS, WINS, DHCP, and RAS follows.

DNS Monitoring and Maintenance

Domain Name System (DNS) is the primary method for name resolution for many operating systems, including UNIX, Linux, Microsoft Windows®, and others. DNS is also a requirement for deploying Active Directory, but Active Directory is not a requirement for deploying DNS. Management of Active Directory is a function of the MOF Directory Services Administration SMF.

Operation of the DNS service is primarily focused on monitoring the continued health of the service. This task is generally assigned to the Service Monitoring and Control SMF; however, network managers are key stakeholders in identifying the attributes to be monitored and determining the thresholds at which alerts are set and action is taken. The performance items typically monitored include:

Total responses sent/second

Total queries received/second

WINS lookups received/second

WINS responses sent/second

WINS reverse lookups received/second

WINS reverse responses sent/second

Pages input/second

Pages output/second

Pages read/second

Pages written/second

Details for configuring the Windows System Monitor to log these specific items are provided in the DNS Product Operations Guide, which also describes additional procedures for accessing this information through scripting. In addition, the guide provides procedures for reporting this data and evaluating it for performance optimization. The DNS Product Operations Guide is available from Microsoft TechNet at http://go.microsoft.com/fwlink/?LinkId=47752.
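The idea of alert thresholds for the monitored items can be illustrated with a short sketch. It assumes that counter samples have already been collected into a dictionary keyed by the item names listed above. The function name, the threshold values, and the sample readings are hypothetical and are not recommendations from the DNS Product Operations Guide.

```python
def check_counters(samples: dict, thresholds: dict) -> list:
    """Return (counter, value, limit) for every sampled counter above its limit."""
    alerts = []
    for counter, limit in thresholds.items():
        value = samples.get(counter)
        # Counters that were not sampled are skipped rather than treated as zero.
        if value is not None and value > limit:
            alerts.append((counter, value, limit))
    return alerts


if __name__ == "__main__":
    # Hypothetical limits agreed between network managers and the monitoring team.
    limits = {"Total queries received/second": 500, "Pages input/second": 50}
    readings = {
        "Total queries received/second": 742,
        "Total responses sent/second": 730,
        "Pages input/second": 12,
    }
    for counter, value, limit in check_counters(readings, limits):
        print(f"ALERT {counter}: {value} exceeds {limit}")
```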

Within an organization, DNS servers will generally be configured to optimized settings. In general, standardization of DNS settings will aid in preventing failures due to setting incompatibilities between servers and/or subnets. All configuration settings should be stored in the configuration management database (CMDB) in order to provide a quick reference and the ability to restore or duplicate settings in the event of component failure or system expansion. Standardization of server and other infrastructure configurations is described in the Infrastructure Engineering SMF, while configuration documentation is discussed in the Configuration Management SMF.

WINS Maintenance

Although Microsoft Windows Server 2003 uses DNS as its primary method for matching a host name to an Internet Protocol (IP) address, it also supports the Windows Internet Name Service (WINS) for the same purpose. WINS is the name resolution system used for Windows NT Server 4.0 and earlier operating systems.

As with DNS, the primary functions of network operations personnel relative to WINS maintenance are monitoring the health of the service, applying and documenting configurations, and providing reliable backups of registry keys and other critical configuration settings.

Detailed recommendations for the capture of usage statistics, system load, and utilization metrics are provided in the WINS Service Product Operations Guide. This guide is available as a free download from Microsoft TechNet at http://go.microsoft.com/fwlink/?LinkId=47753.

DHCP Maintenance

Dynamic Host Configuration Protocol (DHCP) is a TCP/IP standard that reduces the complexity and administrative overhead of managing network client IP address configurations. Microsoft Windows Server 2003 provides the DHCP service, which enables a computer to function as a DHCP server and to configure DHCP-enabled client computers on a network. DHCP runs on a server computer, enabling the automatic, centralized management of IP addresses and other TCP/IP configuration settings for a network’s client computers. The Microsoft DHCP service also provides integration with the Active Directory and DNS services, enhanced monitoring and statistical reporting for DHCP servers, vendor-specific options and user-class support, multicast address allocation, and rogue DHCP server detection.

Similar to the operation of DNS and WINS, the ongoing operation of DHCP involves monitoring the service and applying the information obtained to maintain service availability. Operation also includes backup and restore functions and the retention of configuration settings in a central database. The DHCP Product Operations Guide contains guidance for these processes and is available as a free download from Microsoft TechNet at http://go.microsoft.com/fwlink/?LinkId=47754.
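As one example of information that DHCP monitoring might yield, the sketch below computes how much of an address range is currently leased. Scope utilization is used here as an assumed metric for illustration; the function name, the address range, and the lease list are hypothetical, and the sketch does not query a real DHCP server.

```python
import ipaddress


def scope_utilization(start: str, end: str, leased: set) -> float:
    """Return the fraction of addresses in the inclusive range start..end
    that appear in the set of leased addresses."""
    first = int(ipaddress.ip_address(start))
    last = int(ipaddress.ip_address(end))
    if last < first:
        raise ValueError("scope end precedes scope start")
    size = last - first + 1
    # Count only leases that fall inside this scope.
    in_scope = sum(
        1 for address in leased
        if first <= int(ipaddress.ip_address(address)) <= last
    )
    return in_scope / size


if __name__ == "__main__":
    leases = {"192.168.1.10", "192.168.1.11", "192.168.1.50"}
    used = scope_utilization("192.168.1.10", "192.168.1.19", leases)
    print(f"scope utilization: {used:.0%}")
```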

RAS Maintenance

Remote Access Service (RAS) is a core Windows feature that enables connectivity to the corporate network through switched services, such as analog and ISDN modems, as well as virtual tunnels over TCP/IP using PPTP and L2TP on the Internet. Users connecting by means of RAS experience virtually the same application and data access as if they were physically present on the network. Authentication of clients may be enabled over a variety of authentication protocols providing different levels of network security according to the needs of the network administrator. Remote Access Quarantine Service (RQS) is an important component of RAS. RQS, when added to RAS, provides the capability to enforce security state by customized client scripts and the Remote Quarantine Client (RQC), which signals to RQS that the user has passed the security state checks.

Ongoing operation of RAS involves monitoring the service to maintain service availability, security, and capacity management. The detailed guide for RAS management processes is available as a free download from Microsoft TechNet at http://go.microsoft.com/fwlink/?LinkId=47755.

Supporting a Network

Network support is closely aligned with activities in the Supporting Quadrant, particularly the Incident Management SMF and Problem Management SMF. Through the incident resolution process described in the Incident Management SMF, IT networking specialists correct network errors, develop workarounds, and prevent or mitigate impending network issues. Although the generic process for resolving incidents is described in the Incident Management SMF guidance document, network-specific processes for troubleshooting are provided in the following sections.

Importance of Network Troubleshooting

No matter how well a system has been designed or operated, there will always be issues that occur that will affect the network—from hardware failures to user errors. Because so many applications and services depend on network availability, the pressure on the network administrators when a network component fails is significant. For this reason, it is crucial that network troubleshooting techniques and tools be familiar to all those who provide support.

Network troubleshooting is performed by a special team of experts called a resolver group. For more information about resolver groups and their function, refer to the Incident Management Service Management Function Guide.

Troubleshooting Methodology

Having a plan of action is one of the key requirements in troubleshooting a network incident. Many of the incidents handled are likely to be user issues involving non-network errors, such as improper use of software or workstation setup. On the occasion that an administrator confronts what appears to be a truly network-related issue, he or she should follow an established troubleshooting procedure. The following steps provide a recommended model for effective network troubleshooting:

1. Establish the symptoms.

2. Identify the affected area.

3. Establish what has changed.

4. Select the most probable cause.

5. Implement a solution.

6. Test the results.

7. Recognize the potential effects of the solution.

8. Document the solution.

The process followed may vary slightly or may be performed in a slightly different order, but the overall process should contain all of the listed steps. The following sections examine each of these steps.

Establishing the Symptoms

The first step in troubleshooting a network incident is to determine exactly what is going wrong and to note the effect of the incident on the network. This evaluation provides the administrator with sufficient knowledge to assign a priority to the incident. In a large network environment, there are often many more calls for support than the network support staff can handle at one particular time. Therefore, it is essential to establish a system of priorities that dictates which calls get addressed first. As in the emergency department of a hospital, the priorities should not necessarily be based on who is first in line. More often, it is the severity of the incident that determines who gets attention first, although it is usually not wise to ignore the political reality that senior management incidents frequently are addressed before those of the rank and file.

The following guidelines may assist in establishing incident resolution priorities:

Shared resources take precedence over individual resources. An incident involving a server or other network component that prevents many users from working must take precedence over one that affects only a single user.

Network-wide incidents take precedence over workgroup or departmental incidents. Resources that provide services to the entire network, such as e-mail servers, should be considered before departmental resources, such as file and print servers.

Rate departmental issues according to the function of the department. Incidents involving resources belonging to a department that is critical to the organization, such as order entry or customer service call centers, should take precedence over departments that can better tolerate a period of downtime, such as research and development.

System-wide incidents take precedence over isolated incidents. An incident that puts an entire computer out of commission, preventing a user from getting any work done, should take precedence over an issue a user is experiencing with a single device or application.
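The four guidelines above can be sketched as a sort key for an incident queue. This is an illustration under stated assumptions: each guideline is reduced to a yes/no attribute, and the guidelines are applied in the order listed, which the text does not explicitly require. The class, field, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    summary: str
    shared_resource: bool      # affects a server or component used by many users
    network_wide: bool         # affects a service used by the entire network
    department_critical: bool  # the owning department cannot tolerate downtime
    system_wide: bool          # a whole computer is unusable, not one device or application


def priority_key(incident: Incident) -> tuple:
    """Sort key in which higher-priority incidents compare as smaller."""
    # False sorts before True, so each flag is negated.
    return (
        not incident.shared_resource,
        not incident.network_wide,
        not incident.department_critical,
        not incident.system_wide,
    )


def triage(incidents):
    """Return the incidents ordered from most to least urgent."""
    return sorted(incidents, key=priority_key)


if __name__ == "__main__":
    queue = [
        Incident("Printer issue for one user", False, False, False, False),
        Incident("E-mail server down", True, True, True, True),
        Incident("Departmental file server down", True, False, False, True),
    ]
    for incident in triage(queue):
        print(incident.summary)
```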

Part of the process of narrowing down the cause of a particular incident involves obtaining accurate information about what has occurred. Users are often vague about what they were doing when they experienced the incident, or even what the indications of the error or issue were. For example, in many cases, users call the help desk because they received an error message, but they neglect to write down the wording of the message. Persistent but subtle training of users in the proper procedures for documenting and reporting incidents is also part of the network support technician’s job.

Asking questions such as the following can help determine the cause of an incident:

What exactly were you doing when the incident occurred?

Have you had any other incidents?

Was the computer behaving normally just before the incident occurred?

Has any hardware or software been installed, removed, or reconfigured recently?

Did you (or anyone else) do anything to try to resolve the incident? What did you do?

Identifying the Affected Area

The next step in assessing the nature of the incident is to attempt to duplicate it. Network incidents that you can easily duplicate are far easier to fix, primarily because they can be tested to see if the repair was successful. However, there are many types of network incidents that are intermittent or that might occur for only a short period of time. In these cases, the incident may be left open until it occurs again. In some instances, having the user reproduce the incident can lead to the solution. User error is a common cause of incidents that can seem to be hardware- or network-related to the inexperienced user.

Once the incident has been duplicated, the actual source may be determined. If, for example, a user has trouble opening a file in a word processing application, the difficulty might lie in the application, the user’s computer, the file server where the file is stored, or any of the networking components in-between. The process of isolating the location of the incident consists of eliminating the elements that are not the cause—in a logical and methodical manner. In an incident such as this, only a limited number of possible causes are network-related.

If it is possible to duplicate the incident, isolation of the cause may be initiated by reproducing the conditions under which the incident occurred, using a procedure such as the following:

1. Have the user reproduce the incident on the computer repeatedly to determine whether the user’s actions are triggering the error.

2. Attempt to reproduce the incident by duplicating the user task. If the incident does not occur, the cause might be in how the user is performing a particular task. Check the user’s procedures carefully to see if he or she is doing something wrong. It is entirely possible that the resolver and the user perform the same task in different ways and that the user’s method is exposing an incident that the resolver’s doesn’t.

3. If the incident recurs upon performing the task, log off from the user’s account, log on using an account with administrative privileges, and repeat the task. If the incident does not recur, it is probably the result of the user not having the rights or permissions needed to perform the task.

4. If the incident recurs, try to perform the same task on another, similarly equipped computer connected to the same network. If the incident can’t be reproduced on another computer, the cause likely lies in the user’s computer or its connection to the network. If the incident does recur on another computer, it is likely a network incident, either in the server that the computer was communicating with or the hardware that connects the two.
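The decision points in the procedure above can be summarized as a small function. This is only a sketch of the logic as described: the inputs are the outcomes of steps 2 through 4, and the function name and returned descriptions are hypothetical.

```python
def isolate_cause(resolver_reproduces: bool,
                  recurs_as_admin: bool,
                  recurs_on_other_computer: bool) -> str:
    """Map the outcomes of the isolation steps to the most likely area of the cause."""
    if not resolver_reproduces:
        # Step 2: the resolver cannot reproduce what the user sees.
        return "user procedure"
    if not recurs_as_admin:
        # Step 3: the task works under an administrative account.
        return "user rights or permissions"
    if not recurs_on_other_computer:
        # Step 4: the task works on a similar computer on the same network.
        return "user's computer or its network connection"
    return "network: server or connecting hardware"


if __name__ == "__main__":
    print(isolate_cause(True, True, False))
```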

If the incident lies somewhere in the network and not in the user’s computer, the resolver can then begin the process of isolating the area of the network that is the source of the incident. For example, if the incident is reproduced on another nearby computer, then begin performing the same task on computers located elsewhere on the network. Again, proceed methodically and document the results. For example, try to reproduce the incident on another computer connected to the same hub, and then on a computer connected to a different hub on the same LAN. If the incident occurs throughout the LAN, try a computer on a different LAN. Eventually, the source of the incident should be traced to a particular component, such as a server, router, hub, or cable. A configuration management database (CMDB) should have an accurate representation of all the dependencies in the IT infrastructure and can be an invaluable tool in determining root cause. See the Configuration Management Service Management Function Guide for more information.
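The narrowing process described above can be approximated with dependency data of the kind a CMDB might hold. The sketch below assumes that, for each computer, the list of components it depends on to reach the service is known. It then reports the components shared by every affected computer and used by no working computer. The data layout, function name, and component names are hypothetical.

```python
def suspect_components(paths: dict, affected: set) -> set:
    """Return components on every affected computer's path that are
    not on the path of any unaffected computer."""
    affected_paths = [set(paths[computer]) for computer in affected]
    if not affected_paths:
        return set()
    # Components that all affected computers have in common.
    common = set.intersection(*affected_paths)
    # Components that at least one working computer still uses successfully.
    healthy = set().union(
        *(set(path) for computer, path in paths.items() if computer not in affected)
    )
    return common - healthy


if __name__ == "__main__":
    dependencies = {
        "pc1": ["hub-a", "router-1", "server-x"],
        "pc2": ["hub-a", "router-1", "server-x"],
        "pc3": ["hub-b", "router-1", "server-x"],
    }
    print(suspect_components(dependencies, affected={"pc1", "pc2"}))
```

In this hypothetical data, the two affected computers share a hub that the working computer does not use, so the hub is the only suspect returned.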

Establishing What Has Changed

When a computer or other network component that previously worked properly now does not, it is logical to assume that some change has occurred. When a user reports an incident, one of the most important pieces of information the network troubleshooter can gather is how the computing environment changed immediately prior to the malfunction. Unfortunately, getting this information from the user can often be difficult. The response to the question “Has anything changed on the computer recently?” is nearly always “No,” and it’s only some time later that the user remembers to mention that a major hardware or software upgrade was performed just prior to the incident’s occurrence. On a network with a properly established CMDB, it should be easy to determine if any upgrades or modifications to the user’s computer have been made recently. The CMDB is the first place to look for information like this.
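As a rough illustration of using the CMDB to answer "what has changed?", the sketch below filters change records for one configuration item over the days before an incident. The record layout, field names, and the 14-day default window are assumptions made for the example. A real CMDB product exposes its own schema and query interface.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ChangeRecord:
    ci_name: str        # configuration item, e.g. a workstation (hypothetical field)
    changed_on: date    # date the change was applied
    description: str    # what was changed


def recent_changes(records, ci_name, incident_date, window_days=14):
    """Return changes to ci_name made in the window before the incident, newest first."""
    earliest = incident_date - timedelta(days=window_days)
    hits = [
        r for r in records
        if r.ci_name == ci_name and earliest <= r.changed_on <= incident_date
    ]
    return sorted(hits, key=lambda r: r.changed_on, reverse=True)
```

Listing the newest change first reflects the reasoning in the text: the change made just prior to the malfunction is usually the first one worth examining.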

Major changes, such as the installation of new hardware or software, are obvious possible causes of network incidents, but the network troubleshooter must be aware that more subtle changes can cause incidents as well. For example, an increase in network traffic levels, as disclosed by a protocol analyzer, can be a contributing cause of a reduction in network performance. Occasional incidents noticed by several users of the same application, cable segment, or LAN can indicate the existence of a fault in a network component. Tracking down the source of a networking incident can often be a form of detective work, and learning to “interrogate” your “suspects” properly can be an important part of the troubleshooting process.

Selecting the Most Probable Cause

There’s an old medical school axiom that states, “When you hear hoofbeats, think horses, not zebras.” In the context of network troubleshooting, this means that when searching for possible causes of an incident, begin with the obvious. For example, if a workstation is unable to communicate with a file server, don’t start by checking the routers between the two systems; check the simple things on the workstation first, such as whether the network cable is plugged into the computer. The other important part of the process is to work methodically and document everything investigated in order to avoid duplication of efforts.

Implementing a Solution

After the source of the incident has been isolated to a particular piece of equipment, proceed to determine if the error is caused by hardware or software. If it is a hardware incident, an option is to replace the unit that is at fault or try using an alternate. Communication incidents, for example, might require replacing network cables until the faulty one is found. If the incident is in a server, replacement of components, such as hard drives, might be performed until the faulty component is identified. If the incident is caused by software, try running an application or storing data on a different computer, or reinstalling the software on the offending system.

In some cases, the process of isolating the source of an incident also resolves the incident. If, for example, the incident investigation involves replacing network patch cables until the faulty one is located, then replacing the bad cable is also the resolution of the incident. In other cases, however, the resolution might be more involved, such as having to reinstall a server application or operating system. Because other users might need to access that server, resolution of the incident may require deferral until a later time when the network is not in use and the server has been backed up. In some cases, outside help, such as a contractor to pull new cables, may be required. This can require careful scheduling to avoid having the contractor’s work conflict with user and operator activities. Sometimes an interim solution or workaround, such as providing a substitute workstation or server, may be indicated until the incident can be resolved definitively.

Testing the Results

After resolving the incident, return to the very beginning of the process and repeat the task that originally caused the incident. If the incident no longer occurs, test the other functions related to the changes made to ensure that in fixing one incident, another hasn’t been created. It is at this point that the time spent documenting the troubleshooting process becomes worthwhile. Exactly repeat the procedures used to duplicate the incident to ensure that the incident the user originally experienced has been completely eliminated, and not just temporarily masked. If the incident was intermittent to begin with, it may take some time to ascertain if the incident resolution has been effective. The user may need to be queried several times to make sure that the incident is not recurring.

Recognizing the Potential Effects of the Solution

It is important, throughout the troubleshooting process, to stay cognizant of the network as an entity and not focus too closely on the incidents experienced by one user (or application, or LAN). It is sometimes possible, while implementing a solution to one incident, to create another that is more severe or that affects more users. For example, if users on one LAN are experiencing high traffic levels that diminish their workstation performance, a possible remedy is to connect some of their computers to a different LAN. However, although this solution might help the users originally experiencing the incident, you might overload another LAN in the process, causing another incident that is more severe than the first one. A better solution might be to create an entirely new LAN and move some of the affected users over to it.
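The LAN example can be checked with simple arithmetic before any computers are moved. The sketch below is a minimal illustration: the traffic figures are supplied by the caller, and the 70 percent utilization limit is an assumed planning value, not a figure from this guide.

```python
def projected_utilization(current_mbps, delta_mbps, capacity_mbps):
    """Utilization (0.0 to 1.0) of a LAN after its load changes by delta_mbps."""
    return (current_mbps + delta_mbps) / capacity_mbps


def evaluate_move(source_mbps, target_mbps, moved_mbps, capacity_mbps, limit=0.7):
    """Project both LANs after moving moved_mbps of traffic from source to target."""
    source_after = projected_utilization(source_mbps, -moved_mbps, capacity_mbps)
    target_after = projected_utilization(target_mbps, moved_mbps, capacity_mbps)
    return {
        "source": source_after,
        "target": target_after,
        "acceptable": source_after <= limit and target_after <= limit,
    }
```

With these assumptions, moving 30 Mbps from a LAN carrying 90 Mbps to one already carrying 60 Mbps (both with 100 Mbps capacity) relieves the first LAN but pushes the second over the limit, whereas moving the same traffic to a new, empty LAN keeps both within it.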

Documenting the Solution

Although it is presented here as a separate step, the process of documenting incident resolution actions should begin as soon as the user calls for help. A well-organized network support organization should have an incident management system in place in which each incident is registered and contains a complete record of the issue and the steps taken to isolate and resolve it. In many cases, a technical support organization operates using tiers, which are groups of technicians of different skill levels. Calls come in to the first tier, and if the incident is sufficiently complex or the first-tier technician is unable to resolve it, the call is escalated to the second tier, which is composed of senior technicians. As long as everyone involved in the process documents his or her activities, there should be no difficulty when one technician hands off the ticket to another. In addition, keeping careful notes prevents people from duplicating each other’s efforts. For a more detailed explanation of this process, refer to the Incident Management SMF document.
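The tiered handoff described above depends on the ticket carrying its own history. The following sketch shows one possible shape for such a record. The class and field names are hypothetical and are not taken from any particular incident management system.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Note:
    author: str
    tier: int          # support tier the author was working in
    text: str
    recorded_at: datetime = field(default_factory=datetime.now)


@dataclass
class Ticket:
    summary: str
    tier: int = 1                      # calls come in to the first tier
    notes: list = field(default_factory=list)

    def add_note(self, author, text):
        """Record a troubleshooting step as soon as it is performed."""
        self.notes.append(Note(author, self.tier, text))

    def escalate(self, author, reason):
        """Document the reason for escalation, then move the ticket up one tier."""
        self.add_note(author, f"Escalated to tier {self.tier + 1}: {reason}")
        self.tier += 1

    def steps_already_tried(self):
        """Let the next technician see earlier work and avoid repeating it."""
        return [note.text for note in self.notes]
```

Because the escalation itself is written as a note before the tier changes, the second-tier technician receives both the steps already tried and the reason the ticket was handed off.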

Roles and Responsibilities

Principal roles and their associated responsibilities for network administration have been defined according to industry best practices. Organizations might need to combine some roles depending on organizational size, organizational structure, and the underlying service level agreements existing between the IT department and the business it serves.

It is important to note that these are roles, not job descriptions. A small organization may have one person perform several roles, while a large organization may have a team of people for each role. The specific responsibilities associated with each role are summarized below.

Network Manager

The network manager is responsible for providing network communications services for IT applications and services. Since networking is critical to so many types of applications, network managers are usually under significant pressure to maintain and improve the data communications infrastructure. As a result, the network manager must participate in IT design changes, monitor the existing infrastructure, and repair the infrastructure when it fails.

Often the network manager will be assisted by junior network technicians and network support technicians in the performance of his or her duties.

Table 1. Network Manager Responsibilities

Role: Network manager

Main Responsibilities:

Manages the data communications needs of the company.

Manages the physical network infrastructure, including wired and wireless local area network (LAN).

Manages infrastructure servers: Active Directory, WINS, DNS, DHCP, Proxy, RAS, and Internet Security and Acceleration (ISA) Server.

Manages the acquisition of new network hardware as required.

Participates in network planning, design, development, deployment, and modification.

Monitors and controls service levels of network suppliers.

Liaises with the Service Monitoring and Control SMF to establish a list of monitored network activities.

Ensures that data communication within the company is reliable and of sufficient capacity to meet business needs.

Provides physical connections to the corporate LAN as required.

Ensures that data communications packets are routed efficiently.

Provides regular feedback on network performance, both in general and against specific service levels.

Provides access to the corporate network via dial-up or virtual private network (VPN) as required.

Monitors bandwidth use, analyzes traffic patterns and volumes, and determines impact/implications of issues.

Ensures security standards are upheld.

Network Technician

The network technician works closely with the network manager. In fact, the network technician performs the routine monitoring of the network on behalf of the network manager. The network technician is the person who actually performs site installations as directed by the network manager.

Table 2. Network Technician Responsibilities

Role: Network technician

Main Responsibilities:

Monitors and controls service levels of network suppliers.

Ensures detection of alerts from the network infrastructure.

Provides physical connections to the corporate LAN as required.

Ensures that data communications packets are routed efficiently.

Provides regular feedback on network performance, both in general and against specific service levels.

Monitors bandwidth use, analyzes traffic patterns and volumes, and determines impact/implications of issues.

Ensures security standards are upheld.

Network Support Technician

The network support technician works closely with the network manager, incident manager, and problem manager. The network support technician is responsible for resolving incidents on the network, identifying problems and errors, and establishing workarounds to restore network operation.

Table 3. Network Support Technician Responsibilities

Role: Network support technician

Main Responsibilities:

Handles service requests.

Monitors incident details, including the configuration items affected.

Investigates and diagnoses incidents and problems (including resolution where possible).

Detects possible problems and notifies problem management.

Documents the resolution and recovery of assigned incidents.

Acts as a restoration team member, if required, during major incidents.

Carries out actions in order to correct known errors.

Network Security Technician

The network security technician is responsible for implementing standards and policies that secure the data and voice networks from internal and external threats. These standards and policies are incorporated into the network design and may include data encryption, encapsulation, and certification. All of these design characteristics must typically be applied to ensure data confidentiality, integrity, and availability.

Table 4. Network Security Technician Responsibilities

Role: Network security technician

Main Responsibilities:

Monitors and analyzes intrusion detection results and other security breaches.

Maintains access lists.

Performs firewall maintenance.

Voice Communications Technician

Voice communications and data communications are becoming more closely related every day. In fact, most voice traffic is currently converted to data at some point in its transfer to the receiver, and voice-over-IP (VoIP) telephones are becoming increasingly common.

The voice communications technician is responsible for providing voice communications services for business personnel and IT applications. This can include providing telephones to the desktop or modems for dial-up computer access.

The voice communications technician is also responsible for installing and maintaining the interactive voice response (IVR) and predictive dialing systems that a company may have in place for its call centers and service desks.

Table 5. Voice Communications Technician Responsibilities

Role: Voice communications technician

Main Responsibilities:

Ensures that the communications infrastructure is in place and in good working order.

Installs and maintains telephones, voice mail, and other communications equipment.

Installs and maintains private branch exchange (PBX) systems.

Installs modem banks for inbound dial-up networking and virtual private networks.

Installs and maintains in-bound interactive voice response (IVR) systems.

Installs and maintains outbound predictive dialing systems.

Outsourcing Manager

The outsourcing manager works with the network manager and the security manager to identify and mitigate potential security risks associated with suppliers and vendors.

Table 6. Outsourcing Manager Responsibilities

Role: Outsourcing manager

Main Responsibilities:

Evaluates partner offerings for applicability to need.

Negotiates and manages costs associated with partnerships.

Determines which partners will be the primary source of service and which will be the secondary or backup partners.

Manages IT procurement and purchasing functions.

Monitors the performance of provider services.

Works with the partner to optimize performance.

Assesses and minimizes any security risks that a supplier poses.

Audits a supplier for security compliance.

Creates contingency plans in the event that one or more partners fail to meet their contractual obligations.

Security Compliance Auditor

The security compliance auditor works with the network manager and the security manager to audit network security efforts and evaluate risks identified as a result of the audit.

Table 7. Security Compliance Auditor Responsibilities

Role: Security compliance auditor

Main Responsibilities:

Audits the efforts of the various security technicians associated with the network to ensure compliance with the standards set by the security manager.

Evaluates risks to the enterprise identified as a result of the security audit.

Relationship to Other SMFs

As proactive network administration becomes a more critical and central function to the operation and administration of your computing environment, it is important to understand how providing this service affects other operational processes. The following sections describe how network administration affects and/or interacts with the other MOF SMFs.

Changing Quadrant

Change Management

Network administration works closely with change management to ensure that plans to release changes to the network do not negatively affect the existing infrastructure. It is the job of network administration to understand how all of the pieces of the network infrastructure work together and to be able to assess the impact of releasing a change.

Configuration Management

Configuration management includes the processes and procedures necessary to account for the equipment in its current configuration and to document all subsequent changes to the configuration. Network administration should ensure that the current configuration of the network and all of its components is accurately represented in the CMDB, which will facilitate the troubleshooting of network-related incidents.

Release Management

The Release Management SMF pertains to the efficient release of changes across the IT environment. Network administration works with release management when it becomes necessary to change the IT network infrastructure in some manner. Network administration may also work with the release and availability managers to ensure that network resources are available for major releases or other widespread IT changes.

Operating Quadrant

Directory Services Administration

Directory services contain all user and system profiles. Directory services administration deals with properly configuring and modifying object profiles to optimize functionality and security in a system. It is extremely important that directory services administrators be familiar with the directory's network requirements. Directory replication can place a significant load on the network and should be configured with the capacity of the links as the primary consideration. Active Directory, when used, establishes a close relationship between itself, DNS, and DHCP. The configuration and operations processes associated with all three services may be performed through the Directory Services Administration or Network Administration SMFs individually or as a blend of the two.

Job Scheduling

Network administration can be involved in batch processing tasks at different times throughout the day (or night) such that use of system resources is maximized, but business and system functions are not compromised. The impact of these tasks on the network should be one of the criteria for determining when to schedule them.

Security Administration

Security is a crucial part of a network infrastructure. An information system with a weak security foundation eventually experiences a security breach. Examples of security breaches include data loss, data disclosure, loss of system availability, and corruption of data. Depending on the information system and the severity of the breach, the results could vary from embarrassment, to loss of revenue, to loss of life. An improperly configured or inadequately physically protected network can be a tremendous security risk. Network administrators must ensure proper physical security of network components to prevent unauthorized access. Network administrators must also be familiar with proper firewall configuration and maintenance.

Service Monitoring and Control

An active service monitoring and control (SMC) function is critical to the functioning of a network. Through SMC, IT organizations monitor the current health of their networks and are alerted to changes that can affect the stability of these networks and potentially cause unscheduled outages.

SMC functions include the analysis of event logs and the recording of information gathered from a variety of specialty tools. The analysis of this data can influence decisions to upgrade or expand the network or its hardware, which can have a significant impact on availability, stability, capacity, and cost.

System Administration

System administration defines the administration model used by an organization. Some organizations prefer a model where all IT functions are performed at a single site with a team of IT professionals co-located at that site. Other organizations prefer a distributed branch office model where both technologies and support staff are geographically distributed. System administration examines the trade-offs of each model. Each type of system administration model has unique network requirements. As the systems and personnel become farther apart geographically, the load on the network and the need for reliable network links become more important.

Storage Management

Storage management deals with on-site and off-site data storage for the purposes of data restoration and historical archiving. Such technologies as storage area networks and remote tape backup units can place significant stress on the network infrastructure. Network administration should provide dedicated network links for any technology that is highly network dependent.

Supporting Quadrant

Incident Management

The Incident Management SMF is responsible for resolving incidents and user issues that may occur throughout the IT infrastructure. Network troubleshooters work closely with incident management roles to identify, diagnose, and resolve those incidents that are network related.

Problem Management

When specific problems involving multiple systems become persistent on the network, network administration works closely with problem management to determine the causes of the problems and to provide solutions to them.

Optimizing Quadrant

Availability Management

Availability management is a primary concern of network administration. An entire enterprise can be disabled if network resources become unavailable. Network administration works very closely with availability management to implement such technologies as redundant network links in order to maximize availability.

Capacity Management

Capacity management deals with planning for additional resources as current system resource use increases and begins to near the point of full capacity. It is the responsibility of capacity management to use this resource usage information to justify upgrading, expanding, or possibly downsizing the organization's networking resources.
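One simple way to anticipate when use will "near the point of full capacity" is to fit a straight line to past utilization samples and project it forward. The sketch below assumes one utilization sample per month and an 80 percent planning threshold. Both are illustrative choices, and real growth is rarely perfectly linear.

```python
def months_until_threshold(samples, threshold=0.8):
    """Estimate months from the latest sample until utilization reaches threshold.

    samples holds utilization fractions (0.0 to 1.0), one per month, oldest first.
    Returns 0.0 if the latest sample is already at or above the threshold,
    or None if the fitted trend is flat or falling.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("at least two samples are needed to estimate a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    # Least-squares slope: average change in utilization per month.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples)) / sum(
        (x - mean_x) ** 2 for x in range(n)
    )
    if samples[-1] >= threshold:
        return 0.0
    if slope <= 0:
        return None
    return (threshold - samples[-1]) / slope
```

For instance, monthly samples of 0.40, 0.45, 0.50, and 0.55 grow by about five percentage points per month, so the estimate is roughly five months until the 80 percent threshold is reached.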

Infrastructure Engineering

The Infrastructure Engineering SMF involves, among other functions, the consolidation and management of organizational IT standards and policies. Network administration works closely with infrastructure engineering to establish standards for network devices and other hardware and to develop standard architectures for design and expansion.

IT Service Continuity Management

It is the responsibility of network administration to assist in the planning and testing of a contingency plan that covers not only hardware and software losses, but also facilities losses. This may involve the procurement and installation of backup network hardware or the specification of alternate emergency facilities that may be used in the event of a catastrophic infrastructure loss.

