The intent of this document is to provide operational guidance on Microsoft Systems Management Server version 2.0 (SMS) for organizations that have deployed, or are considering deploying, Microsoft technologies in a data center or in other enterprise computing environments. This document describes how to use and maintain SMS within such an environment.
This material is valuable to anyone wanting to deploy or manage this product within an existing IT infrastructure, especially one that uses IT Infrastructure Library (ITIL) or Microsoft Operations Framework (MOF). This material is aimed primarily at two main groups: information technology (IT) service managers and IT support staff (including analysts and service desk specialists).
This guide tells you how to use the tools and technologies that are delivered with this product in a specific scenario to ensure that you can run SMS smoothly and accurately in your environment.
You can read this guide as a single volume. Reading the document this way provides necessary context so that the reader can understand later material more readily. However, some people will prefer to use the document as a reference, only looking up information as they need it.
MOF, which is part of Microsoft Frameworks, connects products and technologies to IT customer solutions. MOF offers comprehensive service management and technical guidance for achieving mission-critical production system reliability, availability, and manageability for Microsoft products and technologies. MOF is divided into four phases, each representing a part of the MOF process model. Each phase comprises various Service Management Functions (SMFs). SMFs are foundational-level best practices and are the core of the MOF process model. Section 5 of this Operations Guide describes each SMF and how this product relates to it. For more information about MOF, see http://www.microsoft.com/technet/itsolutions/cits/mo/mof/default.mspx.
Every company consists of employees (people), activities that those employees perform (processes), and tools that help them perform those activities (technology). No matter what the business, it most likely consists of people, processes, and technology working together to achieve a common goal. The table below demonstrates this point.
| Area | People | Process | Technology |
Auto repair industry | Mechanic | Repair manual | Socket set |
Military | Soldier | Battle plan | Weapons |
Software development industry | Programmer | Project plan | Compiler, debugger |
IT operations | IT technician | Microsoft Operations Framework | Microsoft Systems Management Server |
SMS is a tool that helps IT operations personnel perform the processes that ensure system security, reliability, and availability. This section treats the Microsoft Operations Framework as the set of processes to be performed and shows how SMS can facilitate the service management best practices that MOF defines. Key benefits of proper service management include business alignment, system security, reliability, and availability.
You should use this document when you plan a Systems Management Server deployment. Its primary purpose is to help those who use SMS minimize downtime and get the best return on their investment in Systems Management Server. This document sets out scheduled maintenance tasks for SMS so that you can reduce operational costs, such as downtime due to system failure and duplication of maintenance effort.
SMS is Microsoft's enterprise management tool for delivering configuration and change management of Microsoft Windows server and workstation operating systems. It also provides a number of essential tools that support the problem management process.
Configuration Management
Configuration management is concerned with the full lifecycle of the components that build business applications and IT infrastructure. This lifecycle covers initial development through to implementation and eventual retirement of the configured item. SMS directly supports the concept of configuration management by providing tools that identify configuration items (CIs) and by serving as a repository for CI details.
Change Management
Change management is concerned with the coordination, authorization and communication needed to ensure that the right people know of and approve a change to the IT infrastructure or services. SMS provides direct support for change management through software distribution and status system features using a database that tracks changes from implementation to completion.
Problem Management
Problem management is concerned with reducing the effect and recurrence of IT service problems by responding to them efficiently and effectively. SMS provides help desk tools that support problem management.
The following table is a reference to the tools that are used in the procedures described in this operations guide.
| Required Tools | Description |
Microsoft SQL Server Enterprise Manager | Tool for configuring and managing Microsoft SQL Server |
SMS Administrator console | Tool for configuring and managing SMS |
Active Directory Users and Computers or User Manager for Domains | Tool for managing users in Active Directory and Windows NT 4.0 domains |
SMSTrace | Tool for analyzing SMS log files (available with SMS Support tools) |
WBEMTEST | Tool for testing WMI connectivity (available with WMI) |
SMS Message Viewer | Tool for analyzing SMS status messages (installed as part of SMS Administrator console) |
Recommended Tools | Description |
Scripting Technologies | VBScript allows more complex manipulation of SMS through the WMI APIs |
OSQL | Microsoft SQL Server command-line interface |
Batch Files | Running a sequence of command-line commands via a batch file script |
The following checklists are a quick reference for those tasks that must be performed on a regular basis. These task lists summarize the tasks that are described in subsequent sections of this guide.
| Daily Tasks |
Check staffing level |
Review daily problem reports |
Review emergency change requests |
Execute Database Consistency Checkers |
Check SQL Server error logs |
Monitor network for performance or error related conditions |
Monitor client status |
Monitor site components and service status |
Monitor site system status |
Monitor event logs on key servers |
Monitor package and advertisement status |
Monitor system performance |
Monitor system directories (logon servers) |
Make a secure backup of each Systems Management Server site |
Weekly Tasks |
Attend Change Management Review Board |
Run team meetings |
Meet with business managers |
Improve SQL performance |
Check system directories (site servers) |
Monitor system directories (client access points) |
Produce management reports |
Manage file system |
Monthly Tasks |
Compile a monthly status report |
Have one-on-one meetings with team members |
Review system health and performance |
Improve system performance |
Secure system accounts |
Review access to SMS functions |
Confirm site recovery from backup media |
| As-Needed Tasks | |
Task | When to Perform |
Team brainstorming | Periodically to ensure that the team shares ideas and plans |
Collect feedback with a survey | At least once per year and more often based on business requirements |
Provide third-line escalation and technical authority | When necessary |
Troubleshoot and fix problems reported | When necessary |
Distribute software packages to client workstations | When necessary |
Update repair disk | After every change |
Review status messages for security violations | Weekly |
Amend access to SMS security functions | When necessary |
Create package for new software application | When necessary |
Test software package created by software development team | After package is created, when necessary |
This section describes how SMS facilitates the service management functions (SMFs) defined in MOF. The table below summarizes how the product facilitates each SMF. It is followed by a more detailed description of each applicable SMF and of how you can use SMS to carry out the SMF processes.
SMS assists customers in managing their environments. SMS is specifically aimed at customers who want to manage large numbers of personal computer devices and who want to perform change, configuration and release management tasks.
| MOF SMF | How the Product Applies |
Operations | SMS assists operations by providing extensible inventory, including historical information. This allows operations staff to track the changes that are made to the system. |
System Administration | SMS helps systems administration staff make changes to a large group of devices. SMS calls these groups collections. |
Security Management | SMS provides no support for this SMF. |
Service Monitoring and Control | SMS provides rudimentary support for Service Monitoring and Control by utilizing HealthMon, a tool that is bundled with SMS. |
Job Scheduling | SMS assists Job Scheduling by distributing and scheduling jobs on computing devices. |
Network Administration | Microsoft Network Monitor provides assistance for network administration. |
Directory Services Administration | SMS provides no support for this SMF. |
Print and Output Management | SMS provides no support for this SMF. |
Storage Management | SMS provides no support for this SMF. |
Support | |
Service Desk | SMS remote control and inventory assist service desk in simple problem resolution and device identification. |
Incident Management | SMS remote control and inventory assist in incident management and problem resolution. |
Problem Management | SMS inventory assists in problem management. |
Optimizing | |
Service Level Management | SMS provides no support for this SMF. |
Financial Management | SMS provides no support for this SMF. |
Capacity Management | SMS provides little support for this SMF; however, inventory history provides some historical data that you can use for trend analysis. |
Availability Management | SMS provides no support for this SMF. |
Service Continuity Management | SMS provides no support for this SMF. |
Workforce Management | SMS provides no support for this SMF. |
Changing | |
Change Management | SMS provides a status database which tracks changes from implementation to completion. SMS will also allow administrators to apply changes through software distribution. |
Release Management | SMS can facilitate the deployment of software to the client desktop. |
Configuration Management | SMS provides tools that identify configuration items (CIs) and serves as a repository for CI details. |
This section offers detail about operational tasks that SMS 2.0 performs and how they match up to the MOF SMFs.
The purpose of System Administration is to ensure that day-to-day tasks required by the IT infrastructure are carried out in the most efficient manner possible in order to continue to meet the service level requirements defined in the service level agreement (SLA). These day-to-day tasks include Security Management, Service Monitoring and Control, Job Scheduling, Network Administration, Directory Services Administration, Print and Output Management, and Storage Management.
SMS provides a number of tools and features that assist in System Administration. The following are key tools:
| Tool | Description |
Remote Tools | SMS Remote Tools is a suite of complementary applications that helps you deliver help desk assistance to your desktop clients without having to travel to the client computer. |
Inventory | SMS hardware and software inventory provides information about the hardware and software that is installed throughout your SMS site. With hardware and software inventory, you can determine information such as how many computers you have in your organization and the configuration of these computers. |
Network Trace | Network Trace graphically displays the connections between site systems and network devices such as routers and hubs. |
Trace32 | Trace32 is an advanced log file viewer provided with SMS that enables real-time manipulation of log files. |
SMS Installer | SMS Installer provides drag-and-drop functionality for constructing executable files that perform a number of required tasks. Many system administrators use this tool to streamline their operations. |
Security is an important part of system infrastructure. An information system with a weak security foundation eventually experiences a security breach. Examples of security breaches include such items as data loss, data disclosure, loss of system availability, and corruption of data. Depending on the information system and the severity of the breach, the results could include embarrassment, loss of revenue, or loss of life.
The primary goals of security are to ensure:
| • | Data confidentiality. Only authorized individuals should be able to view data. |
| • | Data integrity. All authorized users should feel confident that the data presented to them is accurate and not improperly modified. |
| • | Data availability. Authorized users should be able to access the data they need, when they need it. |
SMS provides a number of tools and features that assist in System Administration. The key ones are described below:
| Tool | Description |
Remote Tools | SMS Remote Tools is a suite of complementary applications that helps you deliver help desk assistance to your desktop clients without having to travel to the client computer. |
Inventory | SMS hardware and software inventory provides information about the systems in your environment and how they are configured. |
Network Trace | Network Trace graphically displays the connections between site systems and network devices such as routers and hubs. |
Trace32 | Trace32 is an advanced log file viewer provided with SMS that enables real-time manipulation of log files. |
SMS Installer | SMS Installer provides drag-and-drop functionality for constructing executable files that perform a number of required tasks. Many system administrators use this tool to streamline their operations. |
Data center operations is at the center of IT and bears the responsibility of ensuring that all IT services are delivered according to the specific SLAs. Service monitoring allows operations personnel to ensure not only that the service is currently meeting its SLA, which is reactive monitoring, but also that it is likely to meet its SLA in the near future, which is proactive monitoring.
SMS provides some tools for service monitoring and control; however, this solution is not complete and requires additional tools from third-party vendors:
| Tool | Description |
SNMP Event to Trap Translator | The SNMP Event to Trap translator converts Windows Event Log messages into SNMP Traps. These can be collated and displayed by using an SNMP Trap utility such as HP OpenView. |
Microsoft HealthMon | HealthMon provides basic monitoring of core services. |
The job scheduling service management function (SMF) is concerned with ensuring the efficient processing of data at a predetermined time and in a prescribed sequence to maximize the use of system resources and minimize the impact to online users. A batch process is a system interaction with a database that runs in the background and in a sequential manner without interaction from an end user. Batch processes can be automated or manually initiated. Batches are usually run after business hours when user interaction with the system is low.
Although SMS is not designed to perform this process, you can use SMS software distribution components, such as packages, programs and advertisements, for advanced job scheduling. Therefore, SMS software distribution can support this SMF.
Network administration ensures that the network operates efficiently at all times to avoid negatively affecting the operation of the enterprise. Network administration is responsible for a reliable, consistent, and scalable network infrastructure that meets or exceeds service levels, and optimizes enterprise assets.
SMS incorporates two tools or features to assist in network administration. These are:
| Tool | Description |
Network Monitor | Network Monitor is an advanced tool that provides solutions for network analysis, troubleshooting, and fault finding. |
Network Trace | Network Trace provides a simple graphical representation of the logical layout of the network that network administration staff can use. |
The objective of the service desk is to provide a single point of contact for advice, guidance, and rapid restoration of normal services to its customers and users.
SMS incorporates two features to assist the service desk. These are:
| Tool | Description |
Remote Tools | Remote Tools allow the service desk to control a user's desktop to analyze and investigate an issue. |
Inventory | Inventory provides the service desk with information about the configuration of a problematic device. |
An incident is any event that is not part of the standard operation of a service and that causes, or might cause, an interruption to, or a reduction in, the quality of service. The primary goal of Incident Management is to restore service operation within SLA limits as quickly as possible and to minimize the adverse impact on business operations, thus ensuring that the best possible levels of service quality and availability are maintained.
The following SMS features assist with incident management:
| Tool | Description |
Network Monitor | Allows analysis of the live network. |
Inventory | Shows current configuration of devices. |
Remote Tools | Allows incident management staff to access user desktops through remote control to help diagnose problems. |
Capacity Management is responsible for ensuring that the capacity of the IT infrastructure matches the evolving demands of the business in the most cost-effective and timely manner. The process encompasses:
| • | Undertaking activities to tune the system to make the most efficient use of existing resources. |
| • | Understanding the demands currently being made for IT resources and producing forecasts for future requirements. |
| • | Influencing the demand for resources in conjunction with Financial Management. |
| • | Creating and maintaining a capacity plan that ensures existing and future business capacity requirements can be satisfied. |
SMS provides little assistance for capacity management, except that inventory of disk size history can be used to plot future requirements.
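As a rough illustration of plotting future requirements from disk-size history, the following Python sketch fits a straight line to dated disk-usage samples and estimates how many days remain before a disk fills. The sample data, the function names, and the assumption of linear growth are illustrative only; SMS itself does not provide this calculation.

```python
from datetime import date


def fit_linear_trend(samples):
    """Least-squares line through (date, used_gb) samples.

    Returns (slope_gb_per_day, intercept_gb), with dates converted to
    ordinal day numbers.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two inventory samples")
    xs = [d.toordinal() for d, _ in samples]
    ys = [used for _, used in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("samples must span more than one date")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def days_until_full(samples, capacity_gb):
    """Days after the latest sample until the trend line reaches capacity.

    Returns None when usage is flat or shrinking.
    """
    slope, intercept = fit_linear_trend(samples)
    if slope <= 0:
        return None
    last_day = max(d.toordinal() for d, _ in samples)
    full_day = (capacity_gb - intercept) / slope
    return max(0.0, full_day - last_day)


# Hypothetical disk-usage history for one server: (inventory date, GB used).
history = [
    (date(2003, 1, 1), 10.0),
    (date(2003, 1, 11), 12.0),
    (date(2003, 1, 21), 14.0),
    (date(2003, 1, 31), 16.0),
]
print(days_until_full(history, capacity_gb=20.0))
```

A straight-line fit is the simplest possible model; real growth is often uneven, so the estimate is best treated as a prompt for review rather than a forecast.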
Release Management is the process of coordinating and managing releases to a live environment. The process includes planning, testing, and deployment. This process ensures that releases are implemented in the live environment as quickly as possible in order to meet business requirements. It also ensures that releases are implemented in a controlled and systematic way that limits negative impacts to the IT environment.
SMS provides software distribution to assist release management in planning, storing and deploying releases to the production environment. SMS does not provide extensive features for directly controlling releases.
Configuration Management is a critical process that is responsible for identifying, controlling, and tracking all versions of hardware, software, documentation, processes, procedures, and all other components of the IT environment. The goal of configuration management is to ensure that only authorized components, referred to as configuration items, are used in the IT environment and that all changes to configuration items (CIs) are recorded and tracked through the component lifecycle. The CI data controlled by configuration management is stored in the configuration management database (CMDB). The following are examples of CIs:
| • | Hardware |
| • | Software |
| • | Network equipment |
| • | Configurations |
| • | Processes |
| • | Procedures |
| • | Telephony equipment |
| • | Documentation |
| • | Service level agreements |
| • | Problem records |
SMS hardware and software inventory features assist administrators in configuration management. SMS provides little tracking beyond inventory history, so these features meet only some configuration management requirements.
Every business-critical system must be properly cared for to ensure its continued security, availability, and reliability. SMS is no exception to this rule. This section outlines the operational requirements of SMS. Explanations are provided for how to introduce change to it, how to monitor it, how to troubleshoot it, and how to optimize it for continued performance.
The following table is a reference to the tools used in the procedures described in this operations guide.
| Required Tools | Description |
Trace32 | Log file viewer. |
Microsoft SQL Server Enterprise Manager | Microsoft SQL Management Console. |
SMS Administrator console | SMS Management Console. |
Performance Monitor | Tool to analyze performance of Windows operating system components. |
Microsoft Network Monitor | Tool to analyze packets moving across a network. |
SMS Status Viewer | Tool to analyze SMS specific status messages. |
SMS Service Manager | Tool to manage the state of SMS services and threads. |
Windows Explorer | Tool to manage files and directories. |
Windows NT Backup | Tool to make a tape or disk backup of files. |
User Manager for Domains or Active Directory Users and Computers | Tool to manage the security and addition/deletion of user and group accounts. |
Windows Event Log | Tool to analyze Windows event logs. |
Recommended Tools | Description |
Crystal Reports for Windows NT | Tool to perform in-depth analysis and reporting of WMI and event logs. |
Microsoft Office | Tool for producing reports, databases, presentations, and spreadsheets. |
Visual Basic Scripting Edition | Tool for automating common administrative tasks. |
The following checklists provide a quick reference for those product maintenance tasks that must be performed on a regular basis. These task lists are a summary of the tasks that are described in subsequent sections of this guide. They are limited to those tasks required for maintaining the product.
| Daily Tasks |
Execute Database Consistency Checkers |
Check SQL Server Error Logs |
Monitor network for performance or error related conditions |
Monitor client status |
Monitor site components and service status |
Monitor site system status |
Monitor event logs on key servers |
Monitor package and advertisement status |
Monitor system performance |
Monitor system directories (logon servers) |
Make a secure backup of each Systems Management Server site |
Weekly Tasks |
Check system directories (site servers) |
Monthly Tasks |
Review access to SMS functions |
Confirm site recovery from backup media |
| As-Needed Tasks | |
Task | When to Perform |
Troubleshoot and fix problems reported | As required |
Amend access to SMS security functions | When personnel are added or removed from the team. |
This section outlines the tasks that must be performed to maintain SMS. These tasks are arranged in the form of a lifecycle in order to make it easier to think about these complex issues. This lifecycle is as follows:
Operating
Managing SMS operations on a day-to-day basis, responding to events and performing regular tasks.
Supporting
Responding to problems and managing these through to completion.
Optimizing
Improving the services delivered by SMS where it is possible.
Changing
Introducing change into the system in a controlled and managed way.
It is critical that SMS is managed in a very controlled way. Distributed applications, such as SMS, require the structured approach to management that this model delivers.
This section explains the regular activities required by SMS to ensure its continued smooth operation in accordance with service level requirements defined in the SLA.
Weekly Task: | Run team meetings. |
Description: | Run a weekly team meeting to plan activities for the coming week and provide a forum for good communications between all team members. Review open calls, decisions by the change review board, and performance against agreed service levels. Identify issues, outages or deviations from expected or agreed service levels and identify why these have occurred. Agree to an action plan with the team or other involved people to deal with issues. |
Process: | The meeting should run at the same time and on the same day of each week. An agenda should be issued prior to the meeting and a documented set of actions (with owners and deadlines) distributed after the meeting. The SMS Manager should ensure that assigned actions are carried out by the agreed-upon date. Solicit feedback from the team about the effectiveness of the meeting and make appropriate improvements. |
Monthly Task: | Compile a monthly status report. |
Description: | Produce a status report and share it with operational team members, senior management, and user representatives. |
Process: | The monthly status report should include the number of open and closed calls, performance against service levels, any outstanding issues, and major activities planned for the next month. |
Monthly Task: | Complete one-on-one meetings with team members. |
Description: | Spend time with each team member, reviewing activities and progress during the month. |
Process: | The meeting should be used to review completed activities, to assess performance against objectives and agreed-upon metrics, to identify good work and areas for improvement, and to set new objectives (if needed) and review training requirements. |
As-Needed Task: | Team brainstorming. |
Description: | Run occasional facilitated team events to allow for the free flow of ideas and to identify improvements in working practices. |
Process: | Arrange for professional facilitation of the event. |
Daily Task: | Run database consistency checkers. |
Description: | Although Microsoft SQL Server 7.0 performs a number of maintenance functions automatically, it is a good practice for administrators to run database consistency checks (DBCC) against data within the Systems Management Server database. |
Process: | If the SMS site database log or the software metering database log is configured for static, rather than dynamic, sizing, the SQL Analyst should run DBCC SQLPERF (LOGSPACE) to get statistics about the use of transaction-log space in all databases. In all cases, the SQL Analyst should run the following DBCC commands against the SMS site database and (if present) the software metering database on a daily basis: |
| • | DBCC CHECKDB: Checks the allocation and structural integrity of all the objects in the databases. |
| • | DBCC CHECKCATALOG: Checks for consistency in and between system tables in the specified database. |
| • | DBCC NEWALLOC: Checks the allocation of data and index pages for each table within the structures of the databases. |
Automation: | The SQL Analyst can use the SMS schedule SQL commands feature to run valid SQL commands against the SQL Server. For more information about this feature, see the Systems Management Server 2.0 Administrator's Guide. |
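The daily checks listed above can be collected into a single T-SQL batch. The Python sketch below only builds the script text; the database names `SiteDB` and `MeteringDB` are placeholders rather than real SMS database names, and how the batch is submitted (for example, through OSQL or the SMS scheduled SQL commands feature) is left to the administrator.

```python
def build_dbcc_batch(databases, include_logspace=False):
    """Return a T-SQL script that runs the daily DBCC checks per database.

    Set include_logspace=True when the transaction logs use static sizing
    and log-space statistics are wanted as well.
    """
    lines = []
    for name in databases:
        # Quote the name as a T-SQL string literal (double any single quotes).
        quoted = "'" + name.replace("'", "''") + "'"
        for command in ("CHECKDB", "CHECKCATALOG", "NEWALLOC"):
            lines.append(f"DBCC {command} ({quoted})")
            lines.append("GO")
    if include_logspace:
        lines.append("DBCC SQLPERF (LOGSPACE)")
        lines.append("GO")
    return "\n".join(lines) + "\n"


# Placeholder database names, for illustration only.
print(build_dbcc_batch(["SiteDB", "MeteringDB"], include_logspace=True))
```

Generating the text rather than typing it by hand keeps the same three checks applied to every database in the list.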
Daily Task: | Check SQL Server error logs. |
Description: | The SQL Server Agent creates an error log which, by default, records errors and warnings. Because Systems Management Server depends on SQL Server running smoothly, the error log should be monitored regularly for problems. SQL Server maintains up to nine SQL Server Agent error logs, each of which has an extension indicating its age. |
Process: | The error log should be investigated for specific SQL Server errors. Three types of messages are displayed in the SQL Server Agent error log: informational, warning, and error. Error messages usually require intervention by a SQL Analyst, but a prudent analyst will also monitor the warning messages. The SQL Analyst can configure SQL Server to send error messages to a specific user or computer by network popup. |
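The triage rule described above (act on errors, keep an eye on warnings) can be sketched as follows. The sketch assumes that log entries have already been parsed into records; the `LogEntry` type, its fields, and the sample messages are hypothetical and do not reflect the actual SQL Server Agent log layout.

```python
from collections import namedtuple

# Hypothetical pre-parsed log record; severity is one of
# "informational", "warning", or "error".
LogEntry = namedtuple("LogEntry", ["timestamp", "severity", "text"])


def triage(entries):
    """Split entries into (needs_action, watch) lists.

    Errors usually need intervention by the SQL Analyst; warnings are
    kept for review; informational entries are ignored.
    """
    needs_action = [e for e in entries if e.severity == "error"]
    watch = [e for e in entries if e.severity == "warning"]
    return needs_action, watch


# Placeholder entries, for illustration only.
entries = [
    LogEntry("2003-01-06 02:00", "informational", "placeholder info text"),
    LogEntry("2003-01-06 02:05", "warning", "placeholder warning text"),
    LogEntry("2003-01-06 02:07", "error", "placeholder error text"),
]

needs_action, watch = triage(entries)
print(len(needs_action), "error(s) to act on,", len(watch), "warning(s) to watch")
```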
Daily Task: | Monitor the network for performance or error related conditions. |
Description: | Run daily captures on the network to identify issues or conditions such as excessive TCP retransmissions. This complex task can be simplified with Network Monitor, which listens on the network for specific events. |
Process: | At least one computer (preferably more) should be configured to run Network Monitor and placed within the network topology at a location where it can monitor and gather valuable data. Suggested locations are on critical segments of the network or on either side of a router. Currently there are six experts available to help analyze traffic that you gather: |
| • | Average Server Response Time: Uses server message block (SMB), network control block (NCB), any specified TCP port (such as HTTP, Finger, and FTP), and any specified IPX socket to calculate the response time (how many seconds it takes for a server to respond to a client request for data). |
| • | Property Distribution: Gathers statistical information for the selected protocol property in the current capture, including the number and percentage of frames. |
| • | Protocol Coalesce: Merges data from frames that are part of the same transaction. A portion of data sent from one computer can be split into several smaller fragments to be sent over the network and reassembled at the destination. |
| • | Protocol Distribution: Calculates statistics for the protocols used in a capture. |
| • | TCP Retransmit: Finds all the TCP frames in the capture that have been transmitted to the same computer more than once, and can be used to find computers that are having network connection problems. |
| • | Top Users: Helps you determine who is using the network, or which computers might be causing problems such as broadcast storms. |
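Three of the experts described above reduce to simple counting over captured frames. The following Python sketch illustrates the idea behind Protocol Distribution, TCP Retransmit, and Top Users; it is not how Network Monitor implements them. The `Frame` record, its field names, and the sample addresses are assumptions made for illustration, and the retransmit test (same source, destination, sequence number, and payload length seen twice) is a simplification.

```python
from collections import Counter, namedtuple

# Hypothetical, already-decoded capture record.
Frame = namedtuple("Frame", ["src", "dst", "protocol", "tcp_seq", "payload_len"])


def protocol_distribution(frames):
    """Map each protocol to (frame count, percentage of all frames)."""
    counts = Counter(f.protocol for f in frames)
    total = len(frames)
    return {proto: (n, 100.0 * n / total) for proto, n in counts.items()}


def tcp_retransmits(frames):
    """Return TCP data frames that repeat an earlier frame's identity."""
    seen = set()
    repeats = []
    for f in frames:
        # Skip non-TCP frames and empty segments (pure ACKs carry no data).
        if f.protocol != "TCP" or f.payload_len == 0:
            continue
        key = (f.src, f.dst, f.tcp_seq, f.payload_len)
        if key in seen:
            repeats.append(f)
        else:
            seen.add(key)
    return repeats


def top_senders(frames, n=3):
    """Return the n sources that sent the most frames."""
    return Counter(f.src for f in frames).most_common(n)


# Placeholder capture, for illustration only.
capture = [
    Frame("10.0.0.5:1034", "10.0.0.1:139", "TCP", 1000, 512),
    Frame("10.0.0.5:1034", "10.0.0.1:139", "TCP", 1512, 512),
    Frame("10.0.0.5:1034", "10.0.0.1:139", "TCP", 1000, 512),  # repeated segment
    Frame("10.0.0.1:139", "10.0.0.5:1034", "TCP", 7000, 0),    # pure ACK
    Frame("10.0.0.9", "255.255.255.255", "UDP", None, 64),
]

print(protocol_distribution(capture))
print(tcp_retransmits(capture))
print(top_senders(capture, n=1))
```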
Daily Task: | Monitor client status. |
Description: | SMS clients use status messages to indicate their current condition and whether tasks, such as software and hardware inventory processing, software distribution, and software metering, have been completed. The event screener must monitor these messages to see if clients are experiencing problems. |
Process: | Unlike for site components, the SMS Administrator console gives no visual indication of client status. The event screener must run status message queries to establish whether any client devices are reporting problems. The queries should be run at the central site or management site at the start of each working day. It might be beneficial to create collections based on these queries and have them re-evaluate membership on a daily basis. Status message queries work only if a client device was able to create a status message and that message eventually reached the central site server or management site server at the top of the Systems Management Server hierarchy. SMS clients can fail to create status messages for a number of reasons, such as insufficient access rights to client access points or failure of copy queue functionality. To identify these devices, the event screener must write a program to return all client devices that have not reported a status message within the last x days, where x is the normal frequency of hardware or software inventory plus an allowance for the propagation delay for messages to reach the management site. The following standard queries are provided within the SMS Administrator console: |
| • | Client components experiencing fatal errors |
| • | Clients that failed to create a hardware inventory (MIF) file |
| • | Clients that failed to create a software inventory (Sinv) file |
| • | Clients that failed to install <component name> |
Automation: | SMS status filter rules can be used to write messages to the Windows NT event log (on the site server) whenever certain client status messages are received. The SNMP Event to Trap translator can then be used to forward that event to an SNMP management console. It is also possible to launch a program on receipt of a status message. For more information about the status system and status filter rules, see the Microsoft Systems Management Server 2.0 Administrator's Guide. |
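The Process text above calls for a program that returns every client that has not reported a status message within the last x days. A minimal sketch in Python follows, assuming the latest status-message time for each client has already been extracted from the site database into a dictionary; the extraction itself, the client names, and the parameter names are hypothetical.

```python
from datetime import datetime, timedelta


def silent_clients(last_status, now, inventory_interval_days,
                   propagation_allowance_days):
    """Return, sorted by name, clients with no status message in the window.

    last_status maps a client name to the datetime of its latest status
    message, or to None if the client has never reported. The window x is
    the inventory interval plus the allowance for propagation delay.
    """
    window = timedelta(days=inventory_interval_days + propagation_allowance_days)
    cutoff = now - window
    return sorted(
        name for name, seen in last_status.items()
        if seen is None or seen < cutoff
    )


# Placeholder data, for illustration only.
now = datetime(2003, 3, 10, 8, 0)
last_status = {
    "WKS001": datetime(2003, 3, 9, 17, 30),  # reported yesterday
    "WKS002": datetime(2003, 2, 20, 9, 0),   # silent for over two weeks
    "WKS003": None,                          # never reported
}

print(silent_clients(last_status, now,
                     inventory_interval_days=7,
                     propagation_allowance_days=2))
```

Treating a client that has never reported as silent is a design choice; such clients are often the ones with access-rights problems, so listing them is usually the safer default.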
Daily Task | Monitor site components and service status. | ||||||||||||||||
Description: | Monitor the state of site server components and services throughout the SMS hierarchy and identify problems whenever services fail or generate a defined number of warning or error messages. | ||||||||||||||||
Process: | The status (event) monitor must regularly monitor the status indicator display for each site service to determine if it is operating in the expected way. The SMS Administrator console should be connected to the central site server or management site server so that the display shows a comprehensive summary of the entire SMS hierarchy. The failure of any site system in the hierarchy changes the status of the comprehensive summary and alerts the status monitor, who can then look in the hierarchy to find the site component or service that is failing. The information displayed in the SMS Administrator console at the management site or central site might be out of date because of the time needed for status message summaries to be replicated up to the management site or central site from child sites. The actual state of the service or thread component might not be accurately reported in the SMS Administrator console because, by default, the SMS Component Summarizer automatically resets a failed component to a status of OK at 12:00 midnight. If the status monitor is monitoring only the graphic display, serious errors can be missed. The number of error and warning messages received is not reset by the SMS Component Status summarizer, so the status monitor should check these counts in addition to the graphical display to find errors. To impose a level of management discipline, the automatic process of clearing warning conditions should be disabled, and status (event) screeners should manually clear state and error counters once problems have been resolved. Status indicators in the SMS Administrator console change only when a set number of warning or error messages have been received. It may take some time to trigger the change of status from normal to warning (or critical) and as a result, the status messages leading up to the event might have been deleted. 
If the status system has not been configured to send status messages to the management site, the status monitor must connect to the site containing the failed system to obtain more detailed information. | ||||||||||||||||
Automation: | Whenever the SMS Component Summarizer discovers a problem, it sends a pop-up message to the site server console suggesting that the logged-on user check the status system. This works only if someone is logged on to the console at that time. Because most Systems Management Server site servers and component servers are in locked computer rooms, there is little chance that an operator will be logged on at the appropriate time, so warning messages can go unheeded and problems can continue or grow. The pop-up warning messages are generated as part of a status message filter rule that is fired whenever a service or thread component is placed into a critical or warning condition. To replace this default action, the status monitor must replace the net send console message with a command that sends a notification message to members of the site operators and status monitors groups. Having received the error message, the site operator should attempt to fix the problem or escalate it to site integrity support. Because an automated process might notify individuals before the problem is reflected in the SMS Administrator console, there must be a mechanism to ensure that the fault is not reported again by status monitors when the SMS Administrator console display changes. | ||||||||||||||||
Weekly Task | Check system directories (site servers). | ||||||||||||||||
Description: | A backlog of files on the site server can indicate that a service or service component is not running or that the site server is too busy to process the files. If these situations are not monitored, data might be missing from the database, and server response times might degrade. Correcting backlogs can require more than merely restarting a particular service; backlogs can be caused by a system failure, by incorrect configuration, by site server or site components being inadequately sized to perform the task expected of them, or by incorrect positioning of site servers and site components. There are a large number of system directories, but each one must be checked on each site server within the SMS hierarchy. It is recommended that you check site servers at the bottom of the hierarchy first because there might not be a backlog higher up (if files are stalled at lower levels). To troubleshoot some issues, the administrator must enable logging for one or more SMS components. Because logging can cause processing backlogs by reducing site server performance up to 15 percent, it is recommended that logging be disabled when problems are resolved. | ||||||||||||||||
Process: | Check major system directories to verify that updates are occurring as expected and that file backlogs are not being created, including: Discovery Data Manager SMS\Inboxes\DDM.box If there is a backlog of DDR files in this directory, Discovery Data Manager could be processing changes to site boundaries or the parent site, or it could be sending inventory details to a new parent site. These changes take priority over normal DDR processing; normal DDR processing is suspended until they are processed. | ||||||||||||||||
Daily Task | Monitor system performance. | ||||||||||||||||
Description: | To check whether the site server and component servers have sufficient resources and that SMS site services are running optimally, the site operator must monitor site server and component server performance during the day, and should ensure that performance data is written to log files for later trend analysis and troubleshooting. | ||||||||||||||||
Process: | Monitoring performance on-line: Primary site servers are key to successful SMS operation. The SMS health monitor tool should be set up to poll each site and component server within the organization and display warnings and error messages on the management console whenever server performance falls outside normal parameters. This approach requires that the site operator regularly check the health monitor management console. Logging performance for later analysis: To allow systems administration staff to establish performance trends and identify potential bottlenecks, the site operator should ensure that the Windows NT Performance Monitor logging tool is installed on each server within the organization and that it is configured to capture performance data each hour and record it to a log file. Monitored objects should include: All Windows NT Servers; Systems Management Server Site Servers | ||||||||||||||||
Automation: | Rather than have site operators check the SMS health monitor console regularly, third-party management tools or Windows NT performance monitor alerts can be used to send operators on-line messages, e-mail or other notifications. | ||||||||||||||||
Weekly Task | Check system directories (site servers). | ||||||||||||||||
Process: | Client Configuration Manager SMS\Inboxes\Ccm.box This is used to store configuration requests for SMS component installation where the logged-on user does not have sufficient access rights; it can also be created by DDM for a client that is to be installed using Windows NT remote client installation. A backlog might indicate that CCM is not running or is too busy to process the requests. SMS\Inboxes\Ccrretry.box A backlog of files in this directory might indicate that CCM is unable to find an account with sufficient (administrator) privileges on a number of client devices. If these devices are not part of the domain or should be excluded from the site, consult the resource kit for information about how to prevent CCM from attempting installation. Replication Manager SMS\Inboxes\ReplMgr.Box\Outbound Files are placed in this folder by any SMS component requiring replication. A backlog indicates that Replication Manager has been stalled or is too busy to process further requests. Outstanding requests might also be backlogged in the high, normal and low priority folders below this parent folder. Scheduler SMS\Inboxes\Schedule.box\ToSend At certain times, the ToSend folder might contain a large number of files that appear to be backlogged. This condition can be ignored when the scheduler is required to communicate with and send files to a large number of sites. But a backlog of files more than two weeks old might indicate a problem with the scheduler or with communication links to the destination site. Inventory Processor SMS\Inboxes\Inventory.Box Inventory information from clients (placed here by Inbox Manager Assistant). A backlog might indicate that the inventory processor has stalled or is too busy to process any further requests. | ||||||||||||||||
Process: | Inventory Data Loader SMS\Inboxes\Dataldr.box MIF files that are to be processed are stored here until they are moved into the process directory. A backlog might indicate that the inventory data loader is stalled or is too busy to process files. SMS\Inboxes\Dataldr.box\Process Files that the Data Loader is unable to process are moved to the BADMIFS directory. A backlog might indicate that the SQL Server is unavailable, or that the Data Loader is too busy to process the files. If the folder contains a number of very large files, this might suggest that hardware inventory is collecting too much detail. SMS\Inboxes\Dataldr.box\Process\BadMifs The Inventory Data Loader places MIF files that cannot be processed here. It should always be empty. SMS\Inboxes\Dataldr.box\Process\Orphans If the client that sends a MIF file is not yet in the SMS site database, the Inventory Data Loader moves it to \Orphans. The Inventory Data Loader creates a DDR for Discovery Data Manager to process, so that the discovery data can be added to the SMS site database. If there are many files in \Orphans, Inventory Data Loader might be busy creating DDRs. Inventory Data Loader attempts to process MIF files in the \Orphans directory every 10 minutes. Software Inventory SMS\Inboxes\Sinv Software inventory files are held here prior to being processed. A backlog of files might indicate that software inventory processor is stalled or too busy to process additional files. SMS\Inboxes\Sinv\badsinv The location where software inventory processor places files that are incorrectly formatted or that it has been unable (after three attempts) to load into the SMS site database. This directory should always be empty unless the SMS site database has been offline. SMS\Inboxes\Sinv\Orphans Software Inventory Processor moves an inventory file to \Orphans if the client that sent the file is not yet in the SMS site database. | ||||||||||||||||
Process: | Status Messages Status message processing can also be affected by client and server settings. For instance, an administrator can flood the system if all detailed messages are enabled. A status message backlog can also be created if the administrator has defined a program that is fired by most status messages (through a status filter rule) and the program takes more than a few seconds to run. Status processing stops while the program is executing, for up to 60 seconds per program, and this delay can cause a huge backlog in large environments. SMS\Inboxes\Statmgr.box\Queue (thread component) If there are a large number of files in the directory, use the Component Status summary in the SMS Administrator console to see if a thread component on the site server is flooding the status system by rapidly reporting the same status message over and over. If you discover a flood, create a status filter rule that instructs Status Manager to discard the flooding status message and search the Microsoft Knowledge Base for more information about the problem. SMS\Inboxes\Statmgr.box\Statmsgs An extremely large number of small files (tens of thousands or hundreds of thousands smaller than 1KB) might indicate that a component is flooding the status system. If you suspect a flood, create a status filter rule that instructs Status Manager to discard the flooding status message, then search the Microsoft Knowledge Base for more information about the problem. A large number of large files (thousands or tens of thousands larger than 1KB) indicates that Status Manager is receiving many status messages replicated from child sites. If necessary, you can modify the status filter rules at the child sites so that unimportant status messages are not replicated to the parent site.
SMS\Inboxes\Statmgr.box\Futureq A large buildup of *.svf files in the SMS\Inboxes\Statmgr.box\Futureq directory probably indicates that many SMS clients at the site have system clocks set substantially ahead of the site server's system clock. Correct this problem by synchronizing all computer system clocks in the organization to the correct time. | ||||||||||||||||
Process: | SMS\Inboxes\Statmgr.box\Retry Status Manager is designed to process incoming status messages even when the SMS site database cannot be accessed. While the site database is inaccessible, Status Manager stores pending database transactions as *.sql files in the SMS\Inboxes\Statmgr.box\Retry directory; these text files should not be deleted or altered in any way. Status Manager periodically retries the oldest pending transaction. You can force Status Manager to retry it by stopping and restarting Status Manager using the SMS Service Manager. After the oldest pending transaction succeeds, Status Manager rapidly runs all of the pending transactions chronologically, from oldest to newest. SMS\Inboxes\Statmgr.box\Outbound Status messages intended for replication to the parent site are placed in this directory. A buildup does not necessarily indicate a problem with Replication Manager or Status Manager, because status replication priority levels might be set to low or medium. But if the folder contains files more than a week old or the number of files continues to build up over time, it could indicate that Replication Manager has a problem or that an SMS component is flooding the status system. SMS\Inboxes\Compsumm.box A buildup of files in this directory over time might indicate that the status summarizer is stalled or cannot cope with demand. SMS\Inboxes\Compsumm.box\repl Status summaries intended for replication to the parent site are placed in this directory. A buildup does not necessarily indicate a problem with Replication Manager or Status Manager, because summary replication priority levels might be set to low or medium. But if the folder contains files more than a week old or the number of files continues to build up over time, it could indicate that Replication Manager has a problem. | ||||||||||||||||
Automation: | There are so many directories to check that this task can be very labor intensive. In Systems Management Server implementations with hundreds of sites, it might not be practical to check them manually, so you must find an alternative method. You must use third-party tools to perform this task because no Microsoft products are available. The tool must check each directory, looking for file backlogs, files older than a set date, or files in directories reserved for incorrectly formatted or invalid files. Any third-party tool you select should have the capability to send on-line messages, e-mail or other notifications to the site operator when system directories fill up with old or unprocessed files. | ||||||||||||||||
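The kind of check such a tool performs on each directory can be sketched as follows. This is a minimal Python illustration under stated assumptions, not a replacement for a monitoring product: the function name, the file-count threshold, and the age threshold are hypothetical, and notification of the site operator is left out.

```python
import os
import time


def check_inbox(path, max_files=500, max_age_days=7, now=None):
    """Summarize one system directory: file count, old files, and an alert flag.

    The thresholds are illustrative assumptions. For directories that should
    always be empty (such as those reserved for invalid files), pass
    max_files=0 so that any file raises the alert.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # oldest acceptable modification time
    count = 0
    old_files = []
    with os.scandir(path) as entries:
        for entry in entries:
            if not entry.is_file():
                continue  # subdirectories are checked separately
            count += 1
            if entry.stat().st_mtime < cutoff:
                old_files.append(entry.name)
    return {
        "files": count,
        "old_files": sorted(old_files),
        "alert": count > max_files or bool(old_files),
    }
```

A scheduled job could call this function for each inbox path listed in the Process rows on each site server and forward any result whose `alert` value is true to the site operator.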
Weekly Task | Monitor system directories (client access points). | ||||||||||||||||
Description: | Although there are several important directories on client access points (CAPs), it is the responsibility of the Inbox Manager Assistant running on the client access point (Windows NT) or Inbox Manager running on the site server (Novell NetWare) to copy these files to the appropriate directories on the site server. Backlogs might be caused by a system failure, by incorrect site server configuration, or by CAPs that are inadequately sized to perform correctly or are incorrectly positioned. | ||||||||||||||||
Process: | On CAPs that are running Windows NT, a backlog of files might indicate that the Inbox Manager Assistant service is not running on the CAP or that it is unable to contact the SMS site server due to network or connectivity issues. On CAPs that are running Novell NetWare, a build-up of files over time might indicate problems with the Inbox Manager service running on the site server or the fact that the site server is unable to contact or communicate with the Novell NetWare server. A backlog in any of the following directories indicates that client data is missing from the SMS site database. The status system should report situations where the site server is unable to contact a CAP. CAP_<site code>\statmsgs CAP_<site code>\inventory.box CAP_<site code>\sinv.box CAP_<site code>\ccr.box | ||||||||||||||||
Automation: | The number of CAPs within an organization might make this task very labor intensive. In SMS deployments with hundreds of servers, it is impractical to check each CAP manually, so you must find an alternative method. You must use third-party tools to perform this task because no Microsoft products are available. The tool must check each directory, looking for file backlogs, files older than a set date, or files in directories reserved for incorrectly formatted or invalid files. Any third-party tool you select should have the capability to send on-line messages, e-mail or other notifications to the site operator when system directories fill up with old or unprocessed files. | ||||||||||||||||
Daily Task | Monitor system directories (logon servers). | ||||||||||||||||
Description: | When a user logs on to the network and network discovery is enabled, clients write a data discovery record to the DDR directory on the logon server. Windows NT logon points: The logon discovery agent (a thread of SMS_EXECUTIVE) running on the Windows NT domain controller copies data discovery records to the site server. Novell NetWare logon points: The Inbox Manager thread running on the SMS site server polls Novell NetWare servers every day (default) to obtain data discovery records generated by Novell NetWare clients. Backlogs on these servers might be caused by a system failure, incorrect site server configuration, network communication issues, or incorrectly positioned logon points. | ||||||||||||||||
Process: | SMSLOGON\DDR.Box Clients discovered by Systems Management Server logon discovery place a data discovery record in this directory. A backlog of files might indicate network communication problems or issues with services running on the site server or logon point (Windows NT only). It might also indicate that the Novell NetWare logon point polling interval is too long. | ||||||||||||||||
Automation: | The number of logon points within an organization might make this task very labor intensive. In SMS implementations with a large number of logon points it is impractical to check them manually and you must find an alternative method. You must use third-party tools to perform this task because there are no Microsoft products available. The tool must check each directory in turn, looking for file backlogs, files older than a set date, or files in directories reserved for incorrectly formatted or invalid files. | ||||||||||||||||
Daily Task | Monitor site system status. | ||||||||||||||||
Description: | Monitor the state of site systems throughout the SMS hierarchy and identify when available disk space falls below a set value or when site systems fail and are placed offline. | ||||||||||||||||
Process: | The site system status summarizer running on the site server polls each site system every 60 minutes to determine that it still is online, that it has sufficient disk space, and that the SQL Server database has sufficient free space for software metering database servers or primary site servers. The status (event) monitor must regularly monitor the status indicator display for each site service to determine if it is operating in the expected way. The SMS Administrator console should be connected to the central site server or management site server so that the display shows a comprehensive summary of the entire SMS hierarchy. The failure of any site system in the hierarchy changes the status of the comprehensive summary and alerts the status monitor, who can then look in the hierarchy to find the site component or service that is failing. The information displayed in the SMS Administrator console at the management sites or central sites might be out of date because of the time needed for site system summaries to be replicated up from child sites. Network communication problems or insufficient access rights might also prevent this information from being displayed correctly. If the status system has not been configured to send status messages to the management site, then the screener must connect to the site containing the failed system to obtain more detailed information. | ||||||||||||||||
Automation: | Whenever the site system status summarizer discovers a problem, it sends a pop-up message to the site server console suggesting that the logged-on user check the status system. This works only if someone is logged on to the console at that time. Because most SMS site servers and component servers are in locked computer rooms, there is little chance that an operator is logged on at the appropriate time. Warning messages can therefore go unseen, and a server component can end up in a critical or non-functional condition because, for example, it runs out of disk space. The pop-up warning messages are generated as part of a status message filter rule that is fired whenever a system component is placed into a critical or warning condition. To replace this default action, the status monitor must replace the net send console message with a command that sends a message to members of the site operators and status monitors groups. Having received the error message, the site operator should attempt to fix the problem or escalate the issue to site integrity support. Because an automated process might notify individuals before the problem is reflected in the SMS Administrator console, there must be a mechanism to ensure that the fault is not reported again by status monitors when the display changes. | ||||||||||||||||
Daily Task | Monitor package and advertisement status. | ||||||||||||||||
Description: | For each package and advertisement distributed by the site operator, check that source files reach distribution point(s) and that clients receive the advertised program and run it successfully. | ||||||||||||||||
Process: | To perform this task, it is expected that the SMS Administrator console is connected to the central site server or management site server so that the display shows a comprehensive count of the number of status messages received for each package and advertisement. As messages are received, the status monitor can look in the hierarchy to find more detailed information. Monitoring Package Distribution An advertisement is not available to clients until the package has been successfully placed on at least one distribution point within the site. The message counts reported by the Package Status Summarizer should be monitored to make sure package distribution is successful. Monitoring Advertisements The message counts reported by the Advertisement Status Summarizer should be monitored to see whether the advertised program reached the target workstations, whether they rejected it (because of an unsupported client computer platform or an expired advertisement) and whether it ran successfully. The information displayed in the SMS Administrator console at the central site or management sites might be out of date because of the time needed for site system summaries to be replicated up from child sites. Network communication problems or insufficient access rights might also prevent this information from being displayed correctly. If the status system has not been configured to send status messages to the management site, then the status monitor must connect to the site containing the failed advertisement or package to obtain more detailed information. More information on package and advertisement status reporting can be found in the Microsoft Systems Management Server 2.0 Administrator's Guide. | ||||||||||||||||
Automation: | The status summarizers show information only for clients and distribution points that have reported a status message. In some cases, clients might be switched off or network communication problems might have prevented status messages from reaching the site server. To determine which clients have not (yet) returned advertisement status messages, the screener must check status messages for each target workstation. To achieve this for anything other than small distributions, it is necessary to use Microsoft Office tools or a separate application program. | ||||||||||||||||
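The comparison that a separate application program would perform can be sketched as follows. This is a minimal Python illustration, assuming the list of targeted clients and the list of clients that have returned advertisement status messages have already been exported from SMS. The function name and both inputs are hypothetical, and names are compared case-insensitively on the assumption that they are computer names.

```python
def clients_without_status(targeted, reported):
    """Return targeted clients that have not yet returned a status message.

    `targeted` is the list of client names the advertisement was sent to;
    `reported` is the list of client names with at least one advertisement
    status message. Both are assumed inputs exported beforehand.
    """
    reported_upper = {name.upper() for name in reported}
    return sorted(
        name for name in set(targeted) if name.upper() not in reported_upper
    )
```

The result is the set of workstations the screener still needs to investigate, for example because they were switched off or could not reach the site server.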
Daily Task | Monitor event logs on key servers. | ||||||||||||||||
Description: | The Windows NT system, application and (if auditing is enabled) security event logs are often the first place checked during troubleshooting operations. Monitoring the logs enables operational staff to identify and deal with server related problems more efficiently. | ||||||||||||||||
Process: | Windows NT event logs are used to record detailed information about hardware, software and security events on workstations and servers. There are three logs (application, system and security), although the security log is accessible only by users with administrator privileges. The status monitor must check the Windows NT event logs daily on all key SMS site and component servers to see if problems are developing, and must report all critical problems and warnings related to system security to the person responsible for troubleshooting the site. The status system can be configured to write SMS events to the Windows NT event log, so the event logs for all SMS site servers and component servers should be configured as follows:
Auditing of user account (access) problems and system failures must be enabled through User Manager for Domains or group policies. The status monitor should save log files to disk at the end of each week, and they should be retained for at least a month. After the logs have been saved, they should be cleared to make it easier to spot new problems and to reduce the possibility that events are lost through overwriting. | ||||||||||||||||
Automation: | The number of servers running Windows NT Server in medium to large organizations makes daily monitoring of event logs impractical. The SMS design and operational teams should therefore define which error messages require attention by support staff, and configure the SNMP Event to Trap translator client agent so that it forwards those messages to an SNMP management console. | ||||||||||||||||
Weekly Task | Manage file system. | ||||||||||||||||
Description: | To keep the site server's file system and database from filling up with unused and outdated files or data, some tidying is necessary, particularly of duplicate machine information, bad MIFs, aged inventory and aged collected software inventory files. | ||||||||||||||||
Process: | Duplicate computer information Duplicate computer information can occur when computers are cloned. SMS 2.0 does not support the cloning of SMS client installations because each client must have a unique ID assigned during the installation phase, and transferring IDs can result in inconsistent inventory and software distribution. If you suspect that cloning has occurred, follow these steps:
Bad MIFs As part of the hardware inventory process, extra information can be sent from client computers to the site server database in the form of MIFs or IDMIFs that can be placed on the site server to add inventory information to specific clients. MIFs that cannot be processed correctly are placed in the SMS\inboxes\dataldr.box\BADMIFS directory. This directory must be emptied manually. If a large number of MIFs build up here, it is likely that another MIF generation process is producing incorrectly formatted files. Investigate the origin and cause of the bad MIFs. Aged inventory and aged collected files By default, inventory history and files collected as part of software inventory are kept for 90 days, after which they are deleted by automated scheduled tasks. Periodically compare how much database space this information occupies against its usefulness, and consider reducing its period of retention with Scheduled Tasks. | ||||||||||||||||
Monthly Task | Secure system accounts. | ||||||||||||||||
Description: | In many networked environments, passwords and, sometimes, service account names must be changed at regular intervals. When changing SMS passwords and account names, you must follow correct procedures to prevent services from failing due to authentication problems. The SMS Security Manager can change these SMS passwords and service account names:
For details on individual accounts and how they are configured and used, see the SMS Security Essentials white paper at http://www.microsoft.com/smserver/techinfo/deployment/20/deploysms/secessentials.asp. | ||||||||||||||||
Process: | SMS service account Change the SMS Service account by running setup from the original medium and selecting the update option. This method is preferable to changing the password in the SMS Administrator console, because it changes the account in the domain security database in addition to within SMS, and it verifies the account information and ensures that the correct rights are granted to the account. Software Metering service account The Software Metering service account must first be changed in the domain and then in the SMS Administrator console. The new account details are used the next time the software metering service starts. Windows Networking Site System Connection account Create a new user account in the domain and add this to each SMS site as a Windows Networking Site System Connection account. This results in two such accounts. After the changes have been accepted by all servers in the SMS hierarchy, delete the old connection account from each site, then retire the account from the domain security database. | ||||||||||||||||
Process: | Site address account The site address account/password details must first be changed in the domain and then in the SMS Administrator console. If the same account is used for many senders, they must all be changed. This might result in a period where the account details in SMS do not match those in the domain security database, resulting in failures to connect to other sites. It can also cause delays in sending information between sites. To overcome this, create a new sender account and migrate the senders to the new account over a period of time before retiring the original account. With this approach, the inboxes to which the senders write must have permissions for both accounts. The site address account usually has low user rights (only enough to write to the inbox of the adjacent site), so it might not be necessary to change the account password frequently. Remote Client Installation Account The remote client installation account/password details must first be changed in the domain Security Accounts Manager (SAM) database and then in the SMS Administrator console. If the same account is used in many sites, they must all be changed. This might result in a time during which the account details in SMS for some sites do not match those in the domain security database, resulting in authentication failures. Client Network Connection Account Multiple client connection accounts should be defined and rotated regularly as described in the SMS Security Essentials white paper at http://www.microsoft.com/smserver/techinfo/deployment/20/deploysms/secessentials.asp. Client Software Installation Account The client software installation account/password details must first be changed in the domain SAM and then in the SMS Administrator console. If the same account is used in many sites, they must all be changed.
This might result in a time during which the account details in SMS for some sites do not match those in the domain security database, resulting in authentication failures. Server Connection Account If a server connection account was specified during site setup (from the command line), you must first change the account in the domain security database. Then you must run SMS Setup again and specify the new account details. If the same account is used in many sites, they must all be changed. This might result in a time during which the account details in SMS for some sites do not match those in the domain security database, resulting in authentication failures. |
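The staged approach described above for connection and sender accounts (add a new account alongside the old one, wait until every site accepts it, then retire the old one) can be sketched as a simple readiness check. This is a minimal illustration only: the `Site` record, the site codes, and the account names are hypothetical, and the sketch does not call any SMS or Windows NT API. In practice the list of accepted accounts would have to be gathered by the administrator from each site.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical record of the connection accounts a site currently accepts."""
    code: str
    accepted_accounts: set = field(default_factory=set)


def sites_blocking_retirement(sites, new_account):
    """Return the codes of sites that have not yet accepted the new account."""
    return [s.code for s in sites if new_account not in s.accepted_accounts]


def can_retire_old_account(sites, new_account):
    """The old account is safe to delete only when every site accepts the new one."""
    return not sites_blocking_retirement(sites, new_account)


# Illustrative data only: three sites, one of which has not yet picked up the change.
NEW = "DOM\\smsconn_new"
OLD = "DOM\\smsconn_old"
sites = [
    Site("CEN", {OLD, NEW}),
    Site("PAR", {OLD, NEW}),
    Site("LON", {OLD}),
]

print(sites_blocking_retirement(sites, NEW))
print(can_retire_old_account(sites, NEW))
```

Keeping both accounts valid until the check passes avoids the window in which SMS and the domain security database disagree about the account details.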
Any process, system, application, or service will have problems that arise after operation begins. The support and operations staff must identify, assign, and resolve problems quickly to meet the requirements set forth in the SLAs. This section outlines the tasks that must be performed in order to determine the cause of the failure and restore correct operation.
As-Needed Task | Provide third-line escalation and technical authority. |
Description: | It is inevitable that the SMS design will require changes, either as the result of lessons learned after the system is operational or as a result of business or organizational changes. If new business locations are added to the corporate network, new sites must be added to the SMS hierarchy; management changes might require the implementation of additional sites or security measures; performance issues might require additional SMS components. The site troubleshooter and SMS Manager should always be able to ask the technical architect to provide technical guidance and support. Input from the technical architect is generally required when problems require escalation or whenever major changes are required to the SMS infrastructure. |
Process: | A change control meeting should be held when the operational team identifies that changes must be made to the SMS hierarchy. The technical architect should be present to discuss the business requirements, review (and approve, if appropriate) the amendments to the design, and update the design documentation to reflect the change being made. If there is a requirement for significant change to the design, the change control meeting should commission a formal project to develop an appropriate solution. |
As-Needed Task | Troubleshoot and fix problems reported. |
Description: | Whenever a problem is encountered within the SMS infrastructure that cannot be handled by current staff, it must be escalated to the troubleshooter. In addition to providing fixes, the troubleshooter must document problems and their solutions for future reference. The troubleshooter might have limited time to spend on a problem before escalating it up the management chain. |
Process: | When a problem occurs, the troubleshooter's first priority is to restore regular service as quickly as possible; finding the cause to prevent a recurrence is important, but secondary. Analysis of the root cause might result in:
Incidents that are reported to the troubleshooter must be recorded so that repeat occurrences of the same problem can be found in the knowledge base and the documented fix applied. |
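The recording step above can be illustrated with a small in-memory knowledge base that stores each incident with its documented fix and looks up earlier incidents with similar symptoms. This is only a sketch: the `KnowledgeBase` class, the keyword-overlap matching rule, and the sample incident text are assumptions made for illustration, not part of SMS or of any particular service desk tool.

```python
import re
from dataclasses import dataclass


@dataclass
class Incident:
    symptom: str
    fix: str = ""


def _keywords(text):
    """Lowercase words longer than three characters, used for rough matching."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 3}


class KnowledgeBase:
    """Hypothetical incident log; a real one would live in a database."""

    def __init__(self):
        self._incidents = []

    def record(self, symptom, fix=""):
        incident = Incident(symptom=symptom, fix=fix)
        self._incidents.append(incident)
        return incident

    def find_fixes(self, symptom, min_shared_words=2):
        """Return documented fixes for earlier incidents with similar symptoms."""
        words = _keywords(symptom)
        return [
            i.fix
            for i in self._incidents
            if i.fix and len(words & _keywords(i.symptom)) >= min_shared_words
        ]


kb = KnowledgeBase()
# Sample entries are invented for illustration.
kb.record("Sender cannot connect to child site",
          "Verify site address account details match the domain")
kb.record("Inventory not reported by clients")  # no fix documented yet

print(kb.find_fixes("child site sender fails to connect"))
```

Requiring at least two shared keywords is an arbitrary threshold chosen to keep the example short; a production knowledge base would need a more careful matching scheme.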
Business needs change, technology advances, and markets fluctuate. As these external influences affect a system, changes will likely be required. The Optimizing section outlines the steps that must be taken to optimize SMS for increased security, reliability, and availability. This is accomplished by managing and negotiating service levels and by evaluating several key operational metrics in the managed environment. These include capacity, throughput, response times, saturation levels, availability, cost, and revenue. With a thorough evaluation and understanding of these operational attributes, the IT staff moves from merely running a system to proactively managing a service solution.
Weekly Task: | Meet with business managers. |
Description: | Hold a weekly meeting with business managers to discuss service levels, business requirements, and open issues. |
Process: | Issue an agenda before the meeting and a documented set of actions (with owners and timescales) afterwards. The SMS Manager should ensure that assigned actions are carried out by the agreed date. |
As-Needed Task | Complete feedback survey. |
Description: | Solicit feedback from the user community to identify users' true perception of the quality of service provided. |
Process: | Agree on a set of questions with the operational team, senior management, and user managers. Give users advance notice of the survey, then publish the survey on an intranet or distribute it in e-mail. Analyze the results and issue a report describing them and any required actions. |
Monthly Task: | Review system health and performance. |
Description: | Review and analyze data captured from performance logging tools running on each SMS site system. The logs should provide sufficient detail for the troubleshooter to establish and identify trends, bottlenecks and problems with system resources, such as memory or processor. This task assumes performance logging tools have been configured to capture data from each Systems Management Server site system. |
Process: | The troubleshooter can use Microsoft Windows NT Performance Monitor, Microsoft Excel, and other tools to examine the log files, looking for trends, bottlenecks, resource issues, and memory leaks. The counters that are most useful to check in an SMS environment fall into two groups: Windows NT counters and Systems Management Server counters. Other objects and counters can and should be monitored to obtain a more detailed picture of Windows NT functionality. |
Process: | Here are the counters to monitor and the states that might require remedial action.
Memory: Committed Bytes. The size of virtual memory that has been committed (rather than reserved). Committed memory must have disk storage available, or there must be enough physical memory so that disk storage is not required. Notice that this is an instantaneous count, not an average over the time interval. Lower is better. If the value is consistently less than the amount of physical RAM, then additional RAM is not needed. If the value is consistently more than twice the amount of physical RAM, and the system is paging frequently, then more RAM might be needed.
Physical Disk: % Disk Time. The percentage of elapsed time that the selected disk drive is busy servicing read or write requests. Lower is better. Recommended range is 50 percent or below. Avoid averaging over 80 percent. This is a good counter to check to see if the disk is a system bottleneck. A high value might indicate the need for a faster disk or that there are system configuration problems.
Processor: % Total Processor Time. This counter should be included to monitor single and multiple microprocessor systems. It combines the average microprocessor usage of all microprocessors into a single counter. It can be viewed as the fraction of time spent doing productive work. Each microprocessor is assigned an idle thread in the idle process, which consumes unproductive microprocessor cycles not used by any other threads.
Redirector: Current Commands. Counts the number of requests to the Redirector that are currently queued. If this number is significantly higher than the number of network adapter cards installed in the computer, then the network(s) or the server(s) being accessed are a bottleneck.
SQL Server: Cache Hit Ratio. The percentage of time that a requested data page was found in the data cache (instead of being read from disk). Higher is better. 99 percent is desirable. If the percentage consistently averages below 97 percent, you should increase the amount of memory allocated to SQL Server. |
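The thresholds listed above can be applied mechanically to averaged values exported from the performance logs. The following is a minimal sketch under stated assumptions: the `review_counters` function, the dictionary layout, and the sample figures are hypothetical, and the `redirector_factor` of 2 is an assumed reading of "significantly higher" that the guide does not define. The processor counter is omitted because the guide gives no threshold for it.

```python
def review_counters(averages, physical_ram_bytes, network_adapters,
                    paging_frequently=False, redirector_factor=2):
    """Return a list of findings for counters outside the ranges given above.

    `averages` maps counter names to values averaged over the review period.
    `redirector_factor` is an assumption standing in for "significantly higher".
    """
    findings = []

    committed = averages["Memory: Committed Bytes"]
    if committed > 2 * physical_ram_bytes and paging_frequently:
        findings.append("Memory: more RAM might be needed")

    disk_time = averages["Physical Disk: % Disk Time"]
    if disk_time > 80:
        findings.append("Physical Disk: averaging over 80 percent, possible bottleneck")
    elif disk_time > 50:
        findings.append("Physical Disk: above the recommended 50 percent range")

    queued = averages["Redirector: Current Commands"]
    if queued > redirector_factor * network_adapters:
        findings.append("Redirector: network or accessed servers may be a bottleneck")

    if averages["SQL Server: Cache Hit Ratio"] < 97:
        findings.append("SQL Server: consider allocating more memory to SQL Server")

    return findings


MB = 1024 * 1024

# Illustrative figures only, not measurements from a real site server.
busy_site_server = {
    "Memory: Committed Bytes": 300 * MB,
    "Physical Disk: % Disk Time": 85.0,
    "Redirector: Current Commands": 1,
    "SQL Server: Cache Hit Ratio": 96.0,
}

findings = review_counters(busy_site_server, physical_ram_bytes=128 * MB,
                           network_adapters=1, paging_frequently=True)
for finding in findings:
    print(finding)
```

Because Committed Bytes is an instantaneous count, the values passed in should be averaged over enough samples to reflect a consistent state rather than a single spike.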