Abstract
This chapter briefly presents the issues facing the database administrator (DBA) in creating a service level agreement (SLA). The process of negotiating an SLA is covered, as well as the elements that should be included in an effective SLA. The chapter discusses the implications for availability of long-running queries, presents blocking and deadlocking, and describes best practices that enable the DBA to maintain the appropriate service levels. After implementing the procedures described in this chapter, the DBA will be able to negotiate an appropriate SLA given the client's needs and the resources available, and will be able to provide that level of service.
Introduction
Providing a sustained, high level of service can be a challenge for database administrators (DBAs). With continuous upgrades to the operating system (OS), Microsoft® SQL Server™, the network, and applications, the possibilities for error are numerous. Yet the demand for continuous, flawless service is constant. The service level agreement (SLA) is essential to the delivery of this service. Not only is the SLA the mechanism by which the customer clearly expresses the demand for a certain service level, it is also a tool that the DBA can use to control and improve the processes that traditionally interfere with the provision of that service. The DBA can use the SLA to justify requests for additional hardware, software, and staff and to regulate how customer demands and problems are handled.
This chapter covers the service issues that should be addressed in your SLA. The chapter provides guidance on the process of analyzing your production environment to determine what kind of requirements and restrictions you should negotiate for. It also details the techniques that you should employ to ensure that your service levels are met.
Design Considerations
The SLA described in this chapter is intended to improve operating conditions gradually, serving as a tool for reform. This approach was selected because of its success in numerous data centers in a wide variety of industries. The idea common to these success stories is that problems should be resolved and not just managed. The tradeoff of this approach is the number of personnel that become involved in the provision of service. This approach involves a DBA, problem tracking personnel, development personnel (assigned to the resolution of critical bugs), test personnel (required to clear bugs), and a customer representative. If your organization cannot provide these resources for the application you must maintain, you may find that the SLA described in this chapter is not optimal for your situation.
One other factor should be considered in creating an SLA. You will probably want to write a general SLA that can be used with most of your customers, but that includes addenda, or exceptions, for special situations. The standard SLA can then be referred to as the baseline performance standard, so that the negotiations and drafting of agreements can focus on the factors that make service to a particular organization unique. For example, in some organizations, the accounting department will have special month-end, quarterly, and year-end requirements. And, in very large international organizations, subsidiaries may not have the same year end as the parent organization.
Resource Requirements
To implement the SLA described in this chapter, you will need the following software:
You will also need to assign people to the following roles in addition to the DBA role:
Process Flowchart
Your SLA should define the process for logging, responding to, and resolving trouble tickets. The SLA should also define the interaction of the trouble ticket process with the development process through the bug tracking tool. To help all parties involved understand this complex set of processes, it is a good idea to include a process flowchart in your SLA. The process flowchart follows a trouble ticket from creation to resolution (see Figure 8.1).
Negotiating the SLA
This section describes the process of developing an SLA. The information in this section applies equally to the way that the baseline SLA is produced and how the addenda are produced for each customer. The process is one of clarifying expectations and deliverables. The type of SLA you are developing (baseline or customer-specific) affects the scope of the agreement and who is involved, but the approach is the same.
The large number of roles involved in the provision of service should suggest to you that negotiating the SLA is not an easy task. When you negotiate the SLA, you are negotiating the terms of a number of operational positions. The people who currently hold these positions are likely to be uncomfortable during the negotiation process. Be sensitive to their concerns.
To overcome the fears that surround the SLA negotiation, be sure to include all affected parties. Encourage participation in the crafting of the SLA by making it clear to all parties that the SLA will help to set proper expectations. Let them know that by participating, they can help to set a level of expectation that they can live with. Emphasize that everyone benefits from clear expectations. You might consider opening the negotiation with a review of some current operational pressures, showing all parties how improperly set expectations make these pressures worse. After you have the participation of all parties, you may want each operational group to vote for a representative to the negotiations.
This will allow you to introduce the negotiations to an all-inclusive group, but then work out the details of the SLA with a more manageable set of representatives. You will need representatives from the following groups:
The group representatives or the group as a whole will need to work out a number of issues that make up the SLA. These issues are listed in the following section. To move the negotiating group through the issues, you might consider providing an outline of the SLA issues at the start of negotiations. This outline could even include suggested text that would resolve each issue. However, if you present any prepared ideas to the negotiators, you must make it clear that they are only suggestions. Remember, the parties involved may already be uncomfortable and may not accept suggestions that appear to be decided in advance.
Elements of the SLA
The SLA should include the following sections:
Delivery of Service
After the SLA is agreed to, each party involved in the SLA should prepare to meet the responsibilities that were established by the document. For operations, this might require putting into place an entirely new support structure. The two support systems that provide the core of your SLA are trouble ticketing and bug tracking. Trouble ticketing and bug tracking are key to your organization's ability to deliver on the service levels specified by the SLA.
Trouble Ticketing
When a problem occurs in a well-run organization, it is reported by customers or operational staff. The problem is then recorded and tracked in a system that can manage trouble details as they emerge. When the problem is finally resolved, the resolution is also recorded by the system with the expectation that operational personnel will learn from the experience.
If your organization does not use a trouble ticketing system, you must first find out whether customers or operational staff are encouraged to report problems when they first encounter them. If it is easier for customers to leave your site once it fails and visit a competitor's site, you can be certain that this is exactly what will happen. Make sure that your organization encourages the reporting of problems. If you design your system so that customers benefit from pointing out problems, there is a greater chance that they will do so. Also, reward employees who quickly identify operational problems. If your organization does not reward this type of behavior, there is no reason to implement a trouble ticketing system.
After you have removed obstacles to trouble reporting, you can implement a system for issuing and tracking trouble tickets. You can conduct a preliminary test with an e-mail-based system. The process would begin with a description of the trouble being e-mailed to the appropriate person. This person would add information to the e-mail chain or would forward the e-mail to the appropriate person.
As long as a single group is responsible for issuing and tracking the e-mail chains, an e-mail-based trouble ticketing system can work. It is recommended that a third-party trouble ticketing system be adopted. This type of system allows for the assignment of trouble tickets, a standard procedure for the closing of tickets, and reporting on ticket status and progress. A third-party system is the only way to manage the work of a large operational body.
Bug Tracking
The best source of ongoing application improvement is the constant tracking and resolution of bugs. Your organization should formalize this important process by introducing bug tracking software (if it has not done so already). Bug tracking software allows application problems to be managed apart from operational problems. Although an organization may open a large number of trouble tickets a day, only one or two new bugs may be found. A separate bug tracking system helps prevent the application development group from being sidetracked with these operational issues.
Your bug tracking system cannot be based on informal e-mail exchanges. The information associated with the bug is far too critical for this. If the information is lost, it could require hours of independent testing by the developer assigned to fix the bug. Furthermore, your system may have to monitor bugs through numerous application releases and numerous development teams. As developers leave a project, the background and resolution of a given bug become invaluable to future developers.
Service Levels
No matter what the customer has asked for, there are actually only two types of service that you have the ability to provide: monitoring and response. Although you may want to improve application performance, and may have even considered offering that as a separate service, focusing on application performance is not beneficial until you have monitored the application and determined where performance needs to be improved.
After you have monitored the application and have found an area to improve, any improvements implemented can be considered a response to poor performance. Thus even application tuning can be viewed as a process of monitoring and response. Your service levels will detail the states you are watching for (monitoring) and the actions you will take when a certain state arises (response). You should also provide the timeframe associated with each monitoring and response pair. A service level can be simply defined as the response to a given operational state within a certain timeframe.
Operational States
To limit the number of service levels that your SLA must detail, it is suggested that you define responses only for the following operational states:
The last three operational states could involve many possible responses. If you are supporting something more than a simple database application, it is recommended that the operational states involving negative indicators be expanded to include indicators for each element of your application. For a complex application, the last three operational states could be expanded to include:
Timeframes
You must factor a timeframe into the delivery of each of your service levels. Unless your data center is staffed 24 hours a day, seven days a week, the supported timeframes can be simply defined. The timeframes that are most often associated with operational response are as follows:
Responses
The most complex element of your service levels will be the responses that you define for each level. Each response needs to be specifically designed to fit your application and your operational environment. All parties involved in the SLA process should work together to determine the appropriate response to each operational state. A detailed discussion of this process is beyond the scope of this chapter. However, some guidance can be provided for the response creation process.
First, do not abandon responses that you think are valid simply because the customer representative does not require them at present. Any responses that you think are valid that are not included in the final service levels should be included in your SLA as "additional responses" or "emergency responses". Your goal should be to define the full range of responses that might be needed at some point. By defining many responses and creating a support structure that can handle each type of response, you will be able to easily move service levels from one type of response to another when the customer demands it. This is important, because the customer representative you negotiate with may miscalculate the importance of some situation prior to the actual implementation of the SLA. Soon after the SLA has been signed and your support structure has been created, the customer representative may then pressure you to change the way you are dealing with the situation. When additional responses are defined at the outset, both you and the customer representative have something that you can refer to.
Another thing to consider is the full set of roles that could be involved in the response to an operational state. It is a good idea to define responses for all the parties that may be involved in solving a problem. As a problem continues, increasingly valuable support resources should be introduced.
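The definition used in this chapter (a service level is the response to a given operational state within a certain timeframe) can be written down as a small data structure, which some teams find useful when drafting the SLA. The following Python sketch is only an illustration: the class names, field names, role names, and time values are all hypothetical and are not taken from the chapter or from any SQL Server tool.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List, Tuple


@dataclass(frozen=True)
class EscalationStep:
    """One role and the elapsed time after which it joins the response."""
    role: str          # hypothetical role name, not taken from the chapter
    after: timedelta   # time since the operational state was detected


@dataclass(frozen=True)
class ServiceLevel:
    """A response to a given operational state within a certain timeframe."""
    state: str                 # operational state being monitored
    timeframe: str             # supported timeframe for this service level
    response: str              # agreed action when the state arises
    respond_within: timedelta  # agreed time in which the response must begin
    escalation: Tuple[EscalationStep, ...] = ()

    def roles_engaged(self, elapsed: timedelta) -> List[str]:
        """Return the roles that should be involved once `elapsed` has passed."""
        ordered = sorted(self.escalation, key=lambda step: step.after)
        return [step.role for step in ordered if step.after <= elapsed]


# Hypothetical example values, for illustration only.
level = ServiceLevel(
    state="database unavailable",
    timeframe="business hours",
    response="open a trouble ticket and notify the on-call DBA",
    respond_within=timedelta(minutes=15),
    escalation=(
        EscalationStep("on-call DBA", timedelta(minutes=0)),
        EscalationStep("senior DBA", timedelta(minutes=30)),
        EscalationStep("development lead", timedelta(hours=2)),
    ),
)

print(level.roles_engaged(timedelta(minutes=45)))
# prints ['on-call DBA', 'senior DBA']
```

Keeping the escalation steps as data rather than as prose makes it easier to move a service level from one type of response to another when the customer asks for a change, because only the values change and the structure stays the same.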
The following list provides an example of the roles that might be involved in a problem state and the order in which they should be introduced:
Maintaining Service Levels
The following suggestions are intended to help you avoid problem states in your SQL Server environment:
Summary
The goal of your SLA should be the creation of a responsive support organization. The members of this organization should continually meet the expectations set by the SLA, and they should be provided tools that enable their participation in the support structure that has been defined. The SLA itself should be a living document that is easily modified as the support infrastructure matures. All parties should embrace the SLA negotiation process and welcome the results. The benefits provided by clear expectations and standardized responses should be clear to all. To create this environment, you should expect to do a considerable amount of work. You will need to bring parties together, work through complex issues, and address the concerns of those involved. But the results will be worth the effort.