Chapter 4: Designing a Microsoft SQL Server System
Before you even begin loading the operating system and Microsoft SQL Server, you should have a good idea how you want to design your SQL Server system. By carefully designing your SQL Server system, you can avoid costly downtime caused by having to either rebuild the system or reinstall SQL Server with different options. In this chapter, you will learn how to design a SQL Server system. In this book, the SQL Server system is the hardware and software that make up the computer that is actually running the SQL Server database. By comparison, when we talk in general about a system, we mean all of the hardware and software that make up all of the computers working on behalf of the user in order to access data from one or more SQL Server databases.
This chapter also gives you a brief introduction to the versions of Microsoft Windows 2000 and SQL Server that you might choose to install. Finally, you will learn about the types of front-end applications that your company might purchase or create in order to access SQL Server, and about how the application architecture can affect the future scalability and performance of your system.
To determine the requirements for an entire system, you must look at the function of the system. Once you have an idea of the purpose of a system, you can start looking at ways of using one or more computers to serve these requirements. You might find that one computer is sufficient or that you need many computers to achieve your goals. This chapter looks at those goals in general. Chapters 5 and 6 will more specifically describe how to design and configure the hardware.
In this chapter, we will look at system functionality and at what configuration decisions can be made based on that functionality. We will begin by looking at the types of applications that a system might be serving before looking at how the service level requirements can be defined.
What good is the SQL Server database without an application? Well, unless everyone in your company is an expert in SQL and enjoys putting together complex queries and viewing data in tabular form, it is not of much use. The application facilitates the use of SQL Server. Some applications make it clear that you are accessing a database, and other applications completely hide the fact that there is a database at all. In any case, it is important to design the application to easily provide users the service they desire, in a timely manner. In many cases, a slow, inefficient application causes frustration for end users. Eventually, if an application doesn't meet the customer's needs, the customer might find another company to deal with that provides better service.
Application types vary based on the function that they are serving. In general, there are three main functions: online transaction processing (OLTP) systems, decision support systems (DSSs), and batch processing systems. These functions have different requirements and might use drastically different types of applications.
OLTP systems are characterized by many users accessing online data simultaneously. In addition, and probably of more importance, these users are waiting for a response from the system. OLTP systems take a variety of forms, such as the following:

- Order-entry systems
- Airline and hotel reservation systems
- Banking and automated teller machine (ATM) systems
- Point-of-sale systems
All OLTP systems have one thing in common: the user is waiting for a response from them. It is your job to design a system that can adequately service user requests within the specified response time.
Decision support systems assist the user in making important business decisions by providing a specific result based on a business question. Here are a few examples of business questions that might be answered by a decision support system:

- Which products are selling best, and in which regions?
- Which customers are the most profitable?
- How did this quarter's sales compare with the same quarter last year?
Decision support systems are different from OLTP in that the user of a decision support system expects the result of a complex query to take a significant amount of time to be returned. Decision support queries can take anywhere from seconds to minutes to hours to complete. This does not mean that response time is not important, but some compromises can be made between throughput (performance for all users) and response time (performance for each user).
Batch processing systems process offline jobs that do not have any end-user component. The following tasks are typical jobs for this kind of system:

- Billing runs
- Overnight data loads
- Report generation
- Maintenance tasks such as backups and index rebuilds
Batch processing systems typically have no users waiting for their jobs to finish, but they also typically have a certain time frame in which they must conclude the tasks. For example, overnight loads of data cannot overlap morning logins.
As you can see, it is important to get an idea of the types of applications that the system will need to support before you design the database server. Without knowing the requirements, you can't effectively design the system. As you will see later in this chapter and throughout the book, each system has its own requirements and properties. These requirements will prompt certain design decisions such as how many computers to use and whether you need distributed transaction processing.
Among the most important factors going into the design of your system are the service level requirements. The service level requirements are usually specified in a service level agreement (SLA). The SLA is made between the supplier (the CIO) and the customer (the users). Whether a formal SLA is created really depends on who the customer is and how the services are provided.
For example, if you are contracting with an outside firm for database services, you will almost certainly have an SLA. In fact, this SLA is most likely a legally binding contract. If you are providing services to other in-house departments, you might not have a legally binding agreement. In any case, an SLA will spell out the level of service that is guaranteed to the customer. It might also include penalties if these service levels are not met. SLAs are described in more detail in Chapter 6.
When a level of service is guaranteed, you must design your system to handle that level of performance. A few areas that influence that design are discussed here: performance, capacity, and uptime.
One of the most important aspects of an SLA is the specification of the minimum acceptable system performance. A typical SLA will include a chart of the various transactions that an application supports, a maximum response time for 100 percent of the transactions, and an optional, more stringent maximum response time for 95 or 90 percent of the transactions. For example, the SLA might specify that 90 percent of "add new customer" transactions must finish within 2 seconds and that all of them must finish within 3 seconds.
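To make a requirement like this testable, you can check measured response times against the two thresholds directly. The sketch below is illustrative only; the function name, the nearest-rank percentile method, and the sample timings are assumptions, not part of any SQL Server tooling.

```python
# Check a sample of measured transaction response times (in seconds)
# against the example SLA: 90 percent within 2 seconds, all within 3.
def meets_sla(times, p90_limit=2.0, max_limit=3.0):
    ordered = sorted(times)
    # 90th-percentile response time, using the nearest-rank method.
    rank = max(1, int(0.9 * len(ordered) + 0.5))
    p90 = ordered[rank - 1]
    return p90 <= p90_limit and ordered[-1] <= max_limit

samples = [0.8, 0.9, 1.1, 1.2, 1.4, 1.6, 1.7, 1.9, 1.95, 2.8]
print(meets_sla(samples))  # → True
```

In practice, the input would come from monitoring data collected over the SLA's measurement window, not from a hard-coded list.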
The provider of the services must maintain this level of performance, or a penalty might be involved. As a DBA, you are responsible for administering the system so that these service levels are maintained. In addition, you must anticipate when problems might begin to occur and add capacity to the system as necessary. You meet these responsibilities by constantly monitoring the system and periodically performing capacity-planning exercises.
In addition to containing performance requirements, the SLA might also contain capacity requirements. Capacity of the system falls into several categories, such as disk space capacity, user capacity, network connection capacity, and so forth. It is your job to maintain the system so that it does not run out of capacity.
You maintain the capacity of the system by continually monitoring the system and taking action if it appears to be nearing a threshold. As you will see in Chapter 6, a certain amount of disk space should remain available at all times for workload spikes. If this buffer begins to be used frequently, additional resources should be added to the system. This is also true of CPU usage.
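One way to automate that check is to compare each volume's free space against a chosen buffer threshold. This is only a sketch; the volume names, sizes, and the 20 percent threshold are invented for illustration.

```python
# Flag any volume whose free-space buffer has dropped below a
# chosen fraction of its total capacity.
def volumes_needing_attention(volumes, min_free_fraction=0.20):
    """volumes: dict of name -> (used_gb, total_gb)."""
    flagged = []
    for name, (used, total) in volumes.items():
        if (total - used) / total < min_free_fraction:
            flagged.append(name)
    return flagged

disks = {"data1": (70, 100), "data2": (85, 100), "log": (4, 10)}
print(volumes_needing_attention(disks))  # → ['data2']
```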
In addition to specifications for performance and capacity, specifications for system uptime are usually included in an SLA. Requirements for system uptime demand that the system be available for users during specified time intervals. The SLA might specify 99.9 percent uptime seven days a week, 52 weeks a year, or it might specify uptime for eight hours a day, five days a week, depending on your company's needs. You are responsible for developing a plan for routine backup and recovery, as well as a disaster recovery plan, in order to maintain the required uptime.
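It helps to translate an uptime percentage into a concrete downtime budget. The small calculation below, with assumed figures, shows that 99.9 percent uptime on a 24-hours-a-day, 7-days-a-week schedule leaves less than nine hours of downtime a year.

```python
# Convert an uptime percentage into allowable downtime per year and
# per week, for a schedule given in hours per week (default 24 x 7).
def downtime_budget(uptime_percent, hours_per_week=24 * 7):
    hours_per_year = hours_per_week * 52
    down_year = hours_per_year * (1 - uptime_percent / 100)
    return down_year, down_year / 52

per_year, per_week = downtime_budget(99.9)
print(f"{per_year:.1f} hours/year, {per_week:.2f} hours/week")  # → 8.7 hours/year, 0.17 hours/week
```

A budget this small leaves no room for lengthy rebuilds, which is why the backup, recovery, and disaster recovery plans mentioned above must be worked out in advance.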
Now that you have had a brief introduction to the types of applications available and service level requirements, you are ready to decide which software to install on your system. You can choose from four versions of Windows 2000 and three versions of SQL Server 2000. In this section, you will learn the differences between these versions and why you might select one over another.
The four versions of Windows 2000 are designed to provide the right software for the right application. The capabilities of Windows 2000 grow as you move from Windows 2000 Professional to Windows 2000 Server to Windows 2000 Advanced Server and finally to Windows 2000 Datacenter. The following sections describe the capabilities of each version. You should choose the version of Windows 2000 that provides the capabilities that you need, rather than just purchase the most expensive version with the most capabilities.
Windows 2000 Professional is essentially the desktop version of Windows 2000. Typically, a system running Windows 2000 Professional will take advantage of only the SQL Server 2000 client components. However, if you need to run SQL Server on your computer, you can install the Personal Edition of SQL Server 2000. The Personal Edition permits only local access to the database. Access from other systems is not permitted.
Only the Personal Edition of SQL Server (and the client components) can be installed on Windows 2000 Professional.
Windows 2000 Server is designed as a server operating system, which means that installing Windows 2000 Server on a computer allows other systems to access resources on that computer. Windows 2000 Server supports SQL Server 2000 Standard Edition, which in turn allows remote clients to access the database. Windows 2000 Server supports a maximum of four CPUs and 4 gigabytes (GB) of memory.
Only SQL Server Standard Edition, SQL Server Personal Edition, and the SQL Server client components can be installed on computers that are running Windows 2000 Server.
Windows 2000 Advanced Server is also a server operating system. As with systems that run Windows 2000 Server, systems that run Windows 2000 Advanced Server allow other systems to access their system resources as well as SQL Server. In addition to having the capabilities of Windows 2000 Server, Windows 2000 Advanced Server supports up to eight CPUs and 8 GB of memory. In order to use Microsoft Cluster Services (MSCS) for failover support, you must be using Windows 2000 Advanced Server. In addition to supporting MSCS, Windows 2000 Advanced Server with SQL Server 2000 supports the new SQL Server clustering technology, updatable distributed views.
To utilize eight CPUs and 8 GB of memory within SQL Server 2000, you must run the Enterprise Edition of SQL Server. In addition, SQL Server Standard Edition, SQL Server Personal Edition, and SQL Server client components can be installed on Windows 2000 Advanced Server.
The flagship version of Windows 2000 is the Datacenter Edition. This version supports all of the components that the other editions of Windows 2000 do, as well as up to 64 CPUs and 64 GB of memory. Windows 2000 Datacenter is available only from hardware vendors. In addition to integrating Windows 2000 Datacenter with their hardware, these vendors offer the highest level of support available for Windows 2000. This integration provides a single point of contact for Windows 2000 support and hardware support.
To utilize 64 CPUs and 64 GB of memory within SQL Server 2000, you must run the Enterprise Edition of SQL Server. In addition, SQL Server Standard Edition, SQL Server Personal Edition, and SQL Server client components can be installed on Windows 2000 Datacenter.
In addition to the versions of Windows 2000 that you have to choose from, there are several editions of SQL Server. It is fairly easy to choose, based on the amount of memory and number of CPUs that you need to use. The editions of SQL Server are described here.
The SQL Server 2000 client components consist of the network libraries and utilities needed to access a remote or local SQL Server system. These components are necessary for any system to access SQL Server, and they are identical regardless of which edition of SQL Server is installed.
The Personal Edition of SQL Server is designed for small databases that are accessed locally on a client system. SQL Server 2000 Personal Edition does not allow other computers to gain access to the database.
SQL Server 2000 Standard Edition is one of the two server editions of SQL Server 2000. The Standard Edition functions the same way the Enterprise Edition does except that a maximum of four CPUs and 4 GB of memory can be accessed from the Standard Edition.
The Enterprise Edition of SQL Server supports all the features and functionality of all versions of Windows 2000. SQL Server 2000 Enterprise Edition requires Windows 2000 Advanced Server or Windows 2000 Datacenter. In addition, SQL Server 2000 Enterprise Edition supports 2-node failover clustering and updatable distributed views.
This table summarizes the versions and capabilities of Windows 2000 and SQL Server 2000.

Table 4-1. Version comparison

Operating system               Maximum CPUs/memory    SQL Server 2000 editions supported
Windows 2000 Professional      (desktop use)          Personal Edition, client components
Windows 2000 Server            4 CPUs / 4 GB          Standard, Personal, client components
Windows 2000 Advanced Server   8 CPUs / 8 GB          Enterprise, Standard, Personal, client components
Windows 2000 Datacenter        64 CPUs / 64 GB        Enterprise, Standard, Personal, client components
As you can see, you have several choices, so consider your purchasing decision carefully. Once you have installed the system, you will have to start from scratch if you change your mind.
In addition to selecting an edition of SQL Server 2000 and a version of Windows 2000, you can take advantage of several other options. These options are described in the following section, and they include MSCS, SQL Server 2000 replication options, and updatable distributed views (new in SQL Server 2000 Enterprise Edition). Each of these options has specific capabilities and requirements and therefore may or may not be useful with your configuration, as you will see in the next few paragraphs.
MSCS stands for Microsoft Cluster Services and is a Windows 2000 option that works in conjunction with SQL Server 2000. MSCS provides the ability for one computer to act as a standby, or failover, server for another computer. This capability allows the recovery process to begin almost immediately in the event of a hardware or even a software failure.
MSCS requires a shared disk subsystem to be connected to both systems in the cluster. The SQL Server transaction log and data files, as well as executable files, must reside on this shared disk subsystem. In the event of a failure, indicated by the loss of a heartbeat signal, the standby system takes over the SQL Server functions. Because the standby system captures the IP address and system name, it appears to the outside world as though the primary database server has simply rebooted.
A cluster failover requires the same database recovery as any system that has rebooted suddenly. MSCS does not provide fault tolerance—merely quick recovery.
SQL Server replication allows data from one SQL Server database to be replicated to another database system. There are several varieties of replication—snapshot, transactional, and merge—which are described in the following paragraphs. Which one would work best for you depends on your preferences and needs. SQL Server replication works on a publish-and-subscribe model, in which the publisher publishes the data and one or more subscribers receive copies of that data.
Snapshot Replication Snapshot replication periodically takes a picture, or snapshot, of the data and provides that data for other systems to use. Snapshot replication operates only when the snapshot is being created and applied; thus, no overhead is incurred during normal operations. The downside to snapshot replication is that the data is only as current as the last snapshot, which could be quite old.
Transactional Replication Transactional replication starts with a snapshot, but from that point on, the transaction log on the publisher is continually read, and transactions that have been applied to the publisher are then applied to the subscriber or subscribers. This allows the subscriber or subscribers to be kept somewhat up-to-date. Of course, there is some delay between when a transaction is committed on the publisher and when it is applied to the subscriber or subscribers.
Merge Replication Merge replication is different from snapshot and transactional varieties in that updates take place on both the publisher and subscriber systems. SQL Server uses triggers and timestamps to coordinate the changes between the various systems involved in the replication. Merge is useful if multidirectional replication is required, but it has much higher overhead than snapshot or transactional replication.
SQL Server 2000 introduces updatable distributed views. This option allows SQL Server systems to share a logical database, thus increasing scalability. The logical database can become large, and you can spread it across many computers to increase its capacity. Updatable distributed views are described in detail in Chapter 18.
An important part of designing the SQL Server system is laying out the database. This process involves the physical placement of transaction logs, data files, and so forth. This is one of the most important tasks involved in designing a SQL Server system because placement decisions are so difficult to reverse. Chapters 5 and 6 include tips on the physical placement of the transaction log and data files.
The transaction log is critical to the operation, the stability, and the performance of the database server. Each database has its own transaction log; thus, each transaction log should be properly placed. The transaction log is used to record changes to the database, thus allowing the system to recover in the event of a failure. Because recovery relies on the transaction log, it is important that you use a RAID I/O device to protect this component of the database from possible faults. In the event of the loss of a disk drive, the transaction log should still be available.
In addition to protecting the transaction log from disk failure, you should ensure that the transaction log is on a high-performance device. If the transaction log is too slow, transactions must wait, which drastically affects the performance of the system. The transaction log should also be configured as fault tolerant. These requirements are covered in more detail in the next chapter.
Finally, there must be sufficient space within the transaction log so that the system can run uninterrupted for a long period of time. If the transaction log fills up, all transaction processing ceases until space is freed, which is done by backing up the transaction log. However, backing up the transaction log can affect performance, so some DBAs prefer to create a transaction log large enough that it needs to be backed up only once per hour or once per day. As a rule of thumb, the transaction log should be sized to run for at least eight hours without having to be backed up. As you will learn later in this book, this is a simplification of the transaction log process.
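A back-of-the-envelope sizing calculation makes that rule of thumb concrete. Everything in this sketch (the transaction rate, the log bytes generated per transaction, and the safety factor) is an assumed figure, not a SQL Server formula.

```python
# Estimate the log space needed to run for a target window between
# log backups, padded by a safety factor for workload spikes.
def log_size_gb(tx_per_sec, log_bytes_per_tx, hours, safety_factor=2.0):
    bytes_needed = tx_per_sec * 3600 * hours * log_bytes_per_tx
    return bytes_needed * safety_factor / (1024 ** 3)

# 50 transactions/second, ~1 KB of log per transaction, 8-hour window.
print(f"{log_size_gb(50, 1024, 8):.1f} GB")  # → 2.7 GB
```

The log-bytes-per-transaction figure is the hard part to estimate; measuring actual log growth under a representative workload is more reliable than guessing.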
Data file placement is an entirely different process from transaction log placement. Depending on how the data files are accessed, you should place all of them on as many disks as possible, distributing the I/O load among all of the disk drives. This process is covered in more detail in the next chapter.
You should size data files so that there is enough capacity to handle system growth. You will sometimes be surprised by how fast your database grows. As data grows, so do indexes. Periodically you should check your system and perform a sizing and capacity-planning exercise.
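A simple compound-growth projection is one way to run such an exercise. The growth rate and sizes below are hypothetical figures chosen for illustration.

```python
# Project database size forward at a compound monthly growth rate and
# report how many months remain before the allocated space is exceeded.
def months_until_full(current_gb, capacity_gb, monthly_growth):
    months, size = 0, current_gb
    while size < capacity_gb:
        size *= 1 + monthly_growth
        months += 1
    return months

# 40 GB today, 100 GB allocated, growing 5 percent per month.
print(months_until_full(40, 100, 0.05))  # → 19
```

Rerunning a projection like this periodically, with measured rather than assumed growth rates, gives you time to add disks before capacity becomes a crisis.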
So that you can plan the proper layout for the data files, you should calculate the required space, assess the performance needs, and configure the proper number of disk drives in a RAID subsystem. Whether fault tolerance is used will depend on your specific needs. Once the I/O subsystem has been determined, the data files should be spread evenly across controllers and disk drives.
A major part of your system is the application, which should be designed to perform well now and in the future. In this section, you will learn how to design an application with performance, scalability, and growth in mind.
The basic architecture of an application can take one of many forms. The major differences between application architectures have to do with the number of systems involved in the application. This distinction is known as the number of tiers. Many of the most popular applications are advertised based on the number of tiers they comprise.
Each database application is divided into three distinct components. These components are as follows:

- Presentation services (the user interface)
- Application logic (the business logic)
- The database
The differences between one-tier, two-tier, and three-tier architectures are in how these components are divided up. In a one-tier architecture, they are all part of one program. In a two-tier architecture, these components are split into two distinct parts. In a three-tier architecture, these components are divided into three distinct parts. This is shown in Figure 4-1 and described in more detail in the sections that follow it.
Figure 4-1. Differences between architectures with one, two, or three tiers.
The one-tier, or single-tier, architecture is a system in which the database, application, and presentation services (the user interface) all reside on one system. This type of system does no processing external to the platform on which it is running. An example of single-tier architecture is a Microsoft Access database with local presentation services.
It is rare nowadays to find a substantial single-tier application, especially on a Windows 2000 platform. However, many smaller, single-user applications are single tier; examples include Microsoft Money, Quicken, and TurboTax. These applications typically store their data on the same system on which they run. It is much harder to find a single-tier example that uses SQL Server. In fact, even though you can run Enterprise Manager on the same system that the database resides on, it isn't really a single-tier application, because it uses the SQL Server networking components; the fact that you happen to be running both on the same system is irrelevant.
A two-tier application is one in which the presentation services and the database reside on different systems. The presentation services (user interface) layer usually includes application logic. A good example of a two-tier application is one that uses SQL Server Enterprise Manager. For this type of application, the user interface and the application logic reside in Enterprise Manager, but all of the data that the application uses to function resides in a SQL Server database on a different system.
Two-tier applications are common. You might have worked with many of these applications already. These applications are typically created in languages that support the Windows programming APIs, such as Microsoft Visual C++ or Visual Basic. With a two-tier application, each user must have one or more connections into the SQL Server database. This architecture can be inefficient because most of those connections will be idle for most of the time.
Three-tier applications separate the database layer, the application layer, and the presentation services layer into three distinct components. Typical three-tier applications use the middle layer to multiplex connections from the presentation services layer, which reduces the number of connections into SQL Server. In addition, the middle layer can perform a great deal of the business logic, leaving the database free to do what it does best: deliver data.
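The multiplexing idea can be sketched with a tiny connection pool. The Connection class below is a stand-in rather than a real database driver, and the pool size and request strings are invented for illustration.

```python
import queue

class Connection:
    """Stand-in for a database connection; a real driver would go here."""
    def __init__(self, conn_id):
        self.conn_id = conn_id

class ConnectionPool:
    """A small, fixed set of connections shared by many clients."""
    def __init__(self, size):
        self._pool = queue.Queue()
        for i in range(size):
            self._pool.put(Connection(i))

    def execute(self, request):
        conn = self._pool.get()        # block until a connection is free
        try:
            return f"request {request!r} served on connection {conn.conn_id}"
        finally:
            self._pool.put(conn)       # return it for the next client

# Hundreds of clients can share a pool far smaller than the user count.
pool = ConnectionPool(size=2)
print(pool.execute("SELECT 1"))  # → request 'SELECT 1' served on connection 0
```

This is the essence of what a middle tier does for database connections: the database sees a handful of busy connections instead of one mostly idle connection per user.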
There is some debate over whether Web-based applications are two-tier or three-tier applications. You can use this simple test: if the presentation services layer could just as easily be a character terminal as a Web browser, with no real application logic in between, the application probably has only two tiers.
As you can see, the separation of the components allows you to use multiple systems. In fact, typical systems start with one database server connected to several application servers that, in turn, serve many PC clients. How you design your system depends on the number of users and the type of application you choose.
When you develop an application and a database schema, you should keep performance and scalability in mind. You make many choices during the application design phase that can eventually affect the performance and the scalability of the system. These choices include the following:

- How the database schema and its indexes are designed
- How transactions are structured and how long they hold locks
- How queries are written and how much data they return
- How client connections are managed
As you can see, you should keep several factors in mind if you want to design a system that performs well as the workload grows. By incorporating performance optimization techniques from the design stage, you should be able to create a scalable system.
As you have seen in this chapter, you must keep in mind many things when designing a SQL Server system. Unfortunately, it is not possible for anyone to simply tell you how to design your system. Even if you design systems for many companies, you will seldom end up with similar results because each company has its own needs and requirements.
This chapter covered several key points. You must assess the uptime requirements of your company and develop a design that will meet those requirements. This might mean multiple data centers, clustering, RAID I/O subsystems, or replication. In addition, the scalability and performance requirements of your system will influence its overall design. As you have seen in this chapter, you have a variety of options. Finally, you should design the application with performance in mind. This foresight will result in a system that does not slow down as the data set grows and the number of users increases.
In the next chapter, many of the topics introduced in this chapter will be expanded. Chapter 5 will also teach you how the I/O subsystem works, what performance and fault tolerance issues you should consider, and how to plan and configure an optimal I/O subsystem.
Last Updated: Friday, July 6, 2001