|
Chapter 4: Disk Configuration for High Availability
4 Disk Configuration for High AvailabilityDisks are the heart and soul of a database systemthey physically store the data accessed by the database server. From both performance and high availability standpoints, ensuring a proper disk configuration is one of, if not the most, important aspect of planning and configuration when it comes to the system that will run Microsoft SQL Server 2000. Even though the decisions of how many processors and how much memory you need are important, you will probably get the most from your SQL Server investment by planning and implementing the best disk subsystem for your needs.Whenever you design a system for availability, growth, and performance, as noted in Chapter 1, "Preparing for High Availability," there is some form of trade-off involved. This chapter guides you step by step through the decision-making, planning, and implementation of disk subsystems used with SQL Server.
Quick Disk Terminology CheckThis section defines a few terms that are used throughout the rest of the chapter.
Capacity PlanningNumerous factors go into deciding how your disks will be configured. The first part of that decision process must be capacity planning, the art of determining exactly how much space you need. For new systems, this can range from very simple to very complex, depending on how much information you have up front. For extending existing systems, upgrading, or migrating to a new hardware or software platform, capacity planning should be easier because there should be documented history on the prior growth of that system, database, and application.Two kinds of disk space usage must be known: raw space, and the physical number of disk drives needed for storage and to achieve the desired level of performance. Remember that figuring out how much raw space, which will then dictate how many drives you need, is based on your application's requirements of how it will be using SQL Server. The information and equations are based on these basic tenets of disk capacity planning.
Raw Disk Space NeededConceptually, the amount of raw disk space is represented by the following equation:
Minimum Disk Space Needed = Size of Data (per database, including system databases) + Size of All Indexes (per database, including system databases) There is a flaw in this equation, however. Calculating your disk space based on the minimum amount needed would be a mistake. You need to take into account the growth of the database over the time that the system will be in service. That would transform the equation into this:
Minimum Disk Space Needed = Size of Data (per database, including system databases) + Size of All Indexes (per database, including system databases, full-text indexes, and so on) + Planned Growth + Microsoft Distributed Transaction Coordinator (MS DTC) Logging Space + Amount of Operating Reserve + Amount Reserved for Hardware Optimization The revised equation is much more realistic. The amount of operating reserve is the total drive space needed to handle an emergency situation. For example, if you need to add a new column to a table, you need enough transaction log space for the entire table (possibly two times the table) because all changes are logged row by row and the table is affected in its entirety within one transaction. The amount reserved for hardware optimization is based on the disk drive performance of the inner tracks of the physical drive, which might be slower than the outer tracks (see the section "Understanding Your Hardware" later in this chapter for information on media banding and physical drive characteristics). In this case, the amount can range from 10 to 40 percent depending on whom you ask, but the performance characteristics might be different on more modern disk drives (up to 25 percent faster on the outer tracks than the inner tracks). You can combine this reserve space with the operating system reserve in most cases. With MS DTC usage, if you have high transactional volume, the logging of MS DTC might become a bottleneck.
Application Database UsageEach application using SQL Server has its own signature that is its distinct usage of SQL Server. Assess each application that will be utilizing one or more databases. What kind of work is it doing? Is it mainly reads? Mainly writes? A mixture? Only reads? Only writes? If the workload is a combination of reads and writes, what is the ratio of reads to writes? (Some hardware solutions can assist you in this matter to report on real read versus write statistics, as well as storage caching statistics.) If this is an existing database system, how has the usage changed over time? If this is a new system, what is the projected usage of the system? Do you anticipate changes in usage patterns? You might need to ask more questions. For a packaged application that you are not developing, ensure that your vendor can reasonably answer the questions so you can plan your hardware appropriately. If possible, get the input of other customers who have used that software as well.For example, an accounting package might have heavy inserts, for 70 percent on average, and 30 percent reads. Because the software is doing some aggregations at the SQL Server level and bringing back intermediate result sets, it also uses tempdb heavily. The company that manufactures the accounting package recommends a minimum of 60 GB as a starting size for the database, but you can best determine how the software would be used in your environment. These questions are vital to starting the disk configuration process because, like the guiding principles that will carry through the entire high availability process, how the database is to be used directly influences every decision made about disk configuration. Application Schema Knowing the schema and how data is used also helps you determine the capacity needed. Important things to track are whether it is a custom in-house application or a third-party application; usage of data that is inserted, updated, and deleted; the most frequently used tables (for reads and writes); and how much these tables grow. A database administrator (DBA) or someone else, if appropriate, should track these items on a regular basis. Tracking these items over time for new and existing systems not only provides an accurate picture of your database usage, but also helps in areas such as performance tuning (figuring out what indexes to use, avoiding hotspotting at a physical level when designing a disk subsystem). If you have been using a database for years and now need to migrate it to a new version of the database software (for example, from Microsoft SQL Server 7.0 to Microsoft SQL Server 2000), you can accurately know how to configure your disk subsystem and SQL Server file sizes only if you know the answers to the above questions. One consideration when thinking about schemas is the use or nonuse of Unicode. Unicode data is stored as nchar, nvarchar, and ntext data types in SQL Server as opposed to char, varchar and textr. This means that Unicode data types take up twice as much storage as non-Unicode data types, effectively halving how data is actually stored at a physical level (or doubling your space requirement, depending on how you look at it). This must be taken into account when calculating how much each row will total in terms of bytesif a row that is non-Unicode is 8000 bytes, for a Unicode row it will be 16,000 bytes. Another consideration is fixed-length versus variable-length text columns. To be able to predict row length, and ultimately table growth over time, you should use fixed-length fields such as a char instead of a varchar. A varchar also consumes more disk space. If you are writing an application that must support international customers in different languages, you need to account for the space required for each language. Therefore, if you are writing an application that must store information in English, French, Spanish, Dutch, Japanese, and German, you need to account for up to six different versions of each column. As with many other things related to system and application designs, there are trade-offs when it comes to schema design. For example, you have an application that accesses a table named CustomerInfo. Table 4-1 shows the table structure. Table 4-1 CustomerInfo Table
According to Table 4-1, each row inserted into the CustomerInfo table consumes 359 bytes. Assume that this is an e-commerce application, and the database is expected to grow to 500,000 customers in two years. That equates to 179,500 kilobytes, or just fewer than 180 MB for this table alone. Each entry might also include updates to child tables (such as customer profiles or customer orders). With each insert, you need to take into account all parts of the transaction. If your table has 10, 20, 50, 100, or 500 tables in it, suddenly 180 MB is only a small part of the puzzle. If you have not gathered already, things get complex very quickly when it comes to disk architecture.
Indexes Indexes are integral to a schema and help queries return data faster. Take performance out of the equation for a moment: each index takes up physical space. The total sum of all indexes on a table might actually exceed or equal the size of the data. In many cases, it is smaller than the total amount of data in the table. There are two types of indexes: clustered and nonclustered. A clustered index is one in which all the data pages are written in order, according to the values in the indexed columns. You are allowed one clustered index per table. Clustered indexes are great, but over time, due to inserts, updates, and deletions, they can become fragmented and negatively impact performance. Indexes can help or hurt performance. UPDATE statements with corresponding WHERE clauses and the right index can speed things up immensely. However, if you are doing an update with no WHERE clause and you have several indexes, each index containing the column being updated needs to be updated. In a similar vein, if you are inserting a row into a table, each index that contains the row has to be updated, in addition to the data actually being inserted into the table. That is a lot of disk input/output (I/O). Then there is the issue of inserting bulk data and indexesindexes hinder any kind of bulk operation. Think about it logicallywhen you do one insert, you are not only updating the data page, but all indexes. Multiply that by the number of rows you are inserting, and that is a heavy I/O impact no matter how fast or optimized your disk subsystem is at the hardware level. For performing bulk inserts, indexes should be dropped and re-created after the inserting of the data in most cases. Queries might or might not need an index. For example, if you have a lookup table with two columns, one an integer, and one a fixed length of five characters, putting indexes on the table, which might only have 100 rows, could be overkill. Not only would it take up more physical space than is necessary, but also a table scan on such a small amount of data and rows is perfectly acceptable in most cases.
Deletes, Inserts, Selects, and Updates The code you write to delete, insert, select, and update your data is the last part of the application troika, as it contains the instructions that tell SQL Server what to do with the data. Each delete, insert, and update is a transaction. Optimizing to reduce disk I/O is not an easy task because what is written is directly related to how well designed the schema is and the indexing scheme used. The key is to return or update only the data that you need, keeping the resulting transaction as atomic as possible. Atomic transactions and queries are ones that are kept extremely short. (It is possible to write a poor transaction that is seemingly atomic, so do not be misled.) Atomicity has an impact on disk I/O, and in terms of technologies like log shipping and transactional replication, the smaller the transaction that SQL Server has to handle, the better off you are in terms of availability. For example, a Web-based customer relationship management (CRM) program used by your sales force allows you to access your company's entire customer list and page through the entries one by one. The screen only displays 25, and at a maximum, displays up to 200. Assume the schema listed earlier for the CustomerInfo table is used. The application developer implements this functionality by issuing a SELECT * FROM CustomerInfo query with no WHERE clause; as you might have deduced, the combination of a SELECT * that pulls back all columns and no WHERE clause to narrow the result set is not recommended. The application then takes the results and displays only the top 200 customers. Here is the problem with this query: when you reach 500,000 customers, every time this functionality is called, it returns 180 MB of data to the client. If you have 500 salespeople, 100 of whom are always accessing the system at one time, that is 18,000 MB or 18 GB of data that can potentially be queried at the same time. Not only will you be thrashing your disks, but network performance will also suffer. A better optimization would be to issue the same query with a TOP 200 added so that you are only returning the maximum result set handled by the application. Another way to handle the situation is to make the rowcount a parameter allowing each user to customize how many rows he or she would like returned, but that would not guarantee predictability in the application. To make matters worse, if the user wanted to do different sorts, the application developer would not utilize the 180 MB result set that was already queried: he or she issues another SELECT * FROM CustomerInfo query, this time with a sorting clause (such as ORDER BY Customer_lname ASC). Each round trip, you are thrashing disks and consuming network bandwidth when you want to do a sort! If there were any sort of action associated with the rows returned, you would now have multiple users going through the list in different directions, potentially causing deadlocks and compounding the problem. Unless the data that will be pulled back changes, the application should be able to handle the sort of the data for display without another round trip. If you need to pull back different data, using a WHERE clause combined with the aforementioned TOP also helps disk I/O, especially if you have the proper indexing scheme, as you can go directly to the data you are looking for. Transact-SQL statements that use the locking hints in SQL Server, such as HOLDLOCK, NOLOCK, and so on can either help or hinder SQL Server performance and availability. You should not use locking hints because the query processor of SQL Server in most cases chooses the right locking scheme behind the scenes as its default behavior. For example, a user of the same CRM application updates a row into the CustomerInfo table. The developer, with the best of intentions, wants to ensure that no one else can update the record at the same time, so naively he or she adds the XLOCK hint to the UPDATE statement. Adding XLOCK forces the transaction to have an exclusive lock on the CustomerInfo table. This means that no other SELECT, INSERT, UPDATE, or DELETE statements can access the table, backing up requests for SQL Server, backing up disk resources, and causing a perceived unavailability problem in the application because the users cannot do their work. Thus one seemingly small decision can affect not only many people, but also various resources on the server.
Understanding Physical Disk PerformanceIf you are using a database now, or are testing one that is going into production, you need to capture statistics about what is going on at the physical and logical layers from a performance perspective under no load, light load, medium load, and heavy load scenarios. You might be set with raw disk capacity in terms of storage space, but do you already have a performance bottleneck that might only get worse on a new system configured in the same way?For Microsoft Windows 2000 Server and Microsoft Windows Server 2003, disk performance counters are permanently enabled. If, for some reason, they are not currently configured in Performance Monitor (they should appear as the LogicalDisk and PhysicalDisk categories), execute DISKPERF -Y to enable the disk performance counters at the next system restart. Obviously, this causes a potential availability problem if a restart is required immediately. For more information about the different options with DISKPERF, execute DISKPERF /? at a command prompt. Having the disk performance counters turned on adds approximately 10 percent of overhead to your system. Once the disk performance counters are enabled, you can begin capturing information to profile your disk usage. There are two types of disk statistics to consider:
The most important statistics are the physical counters, although the logical ones could help in some environments. It really depends on how your disk subsystem is configured and what is using each physical and logical disk. More on logical and physical disks is covered in the section "Pre-Windows Disk Configuration" later in this chapter. The performance counters described in Table 4-2 should provide you with a wealth of information about your disk subsystem. Table 4-2 Performance Counters
Some of the counters in Table 4-2, such as Avg. Disk sec/Transfer, have equivalents in the LogicalDisk category. It is a good idea to further isolate and gather performance statistics to track numbers for each logical disk used by SQL Server to help explain some of the numbers that occur on the PhysicalDisk counters.
Using SQL Server to Assist with Disk Capacity PlanningSQL Server can also help you understand what is happening with your disk usage. SQL Server has a function named fn_virtualfilestats that provides I/O information about all of the files (data, log, index) that make up an individual database. fn_virtualfilestats requires the number of the database you are looking for statistics about, which can be gathered by the following query:
SELECT * FROM master..sysdatabases Or you could use the following:
sp_helpdb To see the actual names of the files used in your database, execute the following query:
SELECT * FROM databasename..sysfiles Finally, to get the statistics for all files for a specified database, execute a query using fn_virtualfilestats. It takes two parameters: the database ID, which is the first number, and the file number. If you want to return all files, use -1 for the second parameter. This example retrieves the results for all files used with pubs, as shown in Figure 4-1. Figure 4-1 The results of a query designed to show the I/O statistics for all files comprising the pubs database. If you take the data returned from the IOStallMS column and divide by the sum of NumberReads and NumberWrites (IOStallMS/[NumberReads+NumberWrites]), the result determines if you have a log bottleneck.
Types of Disk SubsystemsThere are three main types of disk subsystems you need to understand when putting together a SQL Server solution: Direct-Attached Storage (DAS), Network-Attached Storage, and storage area networks (SANs).
Direct-Attached StorageDirect-Attached Storage (DAS) is the traditional SCSI bus architecture. There are quite a few variations of SCSI, and describing them all is beyond the scope of this chapter. With DAS, you have a disk or set of disks dedicated to one host, or in the case of a failover cluster, potentially available to the nodes. SCSI has some inherent cabling issues, such as distance limitations, and adding a new disk with DAS can cause an availability outage. One of the biggest problems with DAS is that when it is used with a server cluster, you have to turn all caching off, so you take a potential performance hit because the redundant array of independent disks (RAID) controllers are located in each node. However, this can vary with each individual hardware implementation. If the node's SQL Server resources fail to the other node and there are uncommitted transactions in the cache of the local controller, assuming that node is not cycled and you fail the resources back at some point, you then potentially corrupt your data when the original node sees the disks again and flushes its cache.Keep in mind that if the material in the cache is not something that needs to be written to the log (such as an insert, update, or delete), it might not matter. SQL Server automatically rolls back any unfinished transactions in a failover.
Network-Attached StorageNetwork-Attached Storage devices are fairly new to the storage arena. These devices are function-focused file servers that enable administrators to deploy and manage a large amount of disk storage in a single device. Network-Attached Storage behaves like any other kind of serverit integrates into a standard IP-based network to communicate with other clients and servers. Some storage vendors present Network-Attached Storage as a DAS device. Network-Attached Storage is usually used for file-based storage, but it can be used with SQL Server if it meets all of your performance, availability, and other goals. Besides generic Network-Attached Storage devices, there are also Windows-powered Network-Attached Storage devices based on Windows 2000 that are optimized and preconfigured for file storage. Before purchasing a Network-Attached Storage-based SQL Server storage solution, consider the following:
Performance Considerations Prior to making any storage purchase decisions, you need to determine the required disk throughput, disk capacity, and necessary processor power. If you are considering a Network-Attached Storage solution, these additional requirements need to be taken into account:
Using Network-Attached Storage as a Data Store with SQL Server SQL Server is not configured by default to support the creation and usage of a data store on a network file share, either those located on a standard server or a Network-Attached Storage device. In the case of Network-Attached Storage, data corruption could occur due to network interruptions if the process that ensures database consistency issues a write that cannot be committed to the disks of the Network-Attached Storage device. To enable support for network file shares, trace flag 1807 must be enabled. This trace flag bypasses the check to see if the location for the use and creation of the database file is on a network share. Use Query Analyzer, select the master database, and execute the following command:
DBCC TRACEON(1807) The successful result of this command should be as follows:
DBCC Execution Completed. If DBCC Printed Error Messages, Contact Your System Administrator. It is now possible to use a mapped drive or a Universal Naming Convention (UNC) path (that is, \\servername\sharename) with SQL Server 2000. If trace flag 1807 is not enabled prior to using a Network-Attached Storage device with SQL Server, you will encounter one of the following errors:
Storage Area NetworksSANs are a logical evolution of DAS, and they fix many of the issues, such as the caching problem, associated with DAS. The purpose of a SAN is to give you flexibility, scalability, availability, reliability, security, and device sharing. SANs cache work at the physical level on the disk of sectors, tracks, and cylinders.SQL Server itself technically does not care what protocol you use to access your SAN, but your choice impacts performance. SCSI and Fibre Channel have been mentioned, but there is also the VIA protocol, traditional IP, Ethernet, and Gigabit Ethernet. Fibre Channel, which does have some low-level SCSI still embedded, is strongly recommended. It is the most effective at carrying block-type data from storage to the computer writing or reading data. It also delivers predictable performance under higher loads than traditional Ethernet loads. Most important, Fibre Channel is extremely reliable.
The speed of your SAN is largely related to its architecture, which has a host bus adapter that is in each server accessing the SAN, controllers for the disk, switches, and finally, the disks themselves. Each manufacturer implements SANs a bit differently, although the overall concepts are the same, so it is important to work with your hardware vendor to ensure you understand what you are implementing. From a security standpoint, with a SAN, you can do things like zoning and masking, although your SAN vendor must support these features. Zoning occurs when you set up the SAN so that systems can be isolated from one another, and this feature is very useful for clustering. Masking allows you to hide LUNs from certain systems. Cluster nodes can be in the same or different, overlapping zones that are configured with masking. Because SANs support a wide range of operating systems and servers, a company might purchase a large SAN for use with many different servers or types of workload. Although it is good that you have a SAN, you need to realize the implications of this. Because all systems attached to the system have different workloads and the SAN only has one cache, you will be sharing the cache among multiple systems. If, for example, you also have an Exchange server on your SAN that is heavily utilizing the cache, this could impact the performance of your SQL Server. Also consider how the other operating systems or applications interface with the SAN at a base levelif another Windows server or cluster issues a bus reset as it comes online (say, after a reboot of a node), will it affect your current Windows server or cluster with SQL Server that is running with no problems? These are things you should know prior to putting the SAN into production. You might also be sharing spindles between different servers on a SAN depending on how the vendor implemented its disk technology. What this means is that one physical disk might be carved up into multiple chunks. For example, a 36-GB disk might be divided into four equal 9-GB partitions at a physical level that Windows and SQL Server would never even see; they only see the LUN. You must recognize if your hardware does this and plan accordingly.
What Disk Technology to UseWith so many choices available to you, what should you choose? Simply put, choose what makes sense for your business from many standpoints: administration, management, performance, growth, cost, and so on. This is not like other decisions you will make for your high availability solution, but it might be one of the most important technology decisions you make with regard to SQL Server. When considering cost versus performance, features, and so on, look at the long-term investment of your hardware purchase. Spending $100,000 on a basic SAN might seem excessive when compared to a $10,000 traditional DAS solution, but if your company is making millions of dollars per month, and downtime will affect profitability, over time, that SAN investment gets cheaper. The initial cost outlay is usually a barrier, however.As you might have gathered, Network-Attached Storage is currently not the best solution when you want to configure SQL Server. From an availability standpoint, your network is one large single point of failure, and the possibility of data corruption due to network interruption decreases your availability if you need to restore from a backup. That leaves DAS and SAN. At this point the main issues will be cost, supportability, and ease of expansion and administration. DAS is usually SCSI-based and it is much cheaper, but it is less flexible, and because you cannot, for example, use the write cache (read is just fine), it might not be ideal for your high availability usage of SQL Server. SANs are the way to go if you can afford a solution that fits your needs. The ease of configuration and expansion as well as flexibility are key points to think about when looking at SANs. Although they are not mentioned directly, there is always the option (in a nonclustered system) to use separate disks internal to a system, whether they are SCSI or Integrated Device Electronics (IDE). Some internal SCSI disks also support RAID, which is described in more detail in the section "A RAID Primer" later in this chapter.
Server Clusters, Failover Clustering, and DisksThere are two types of storage I/O technologies supported in server clusters: parallel SCSI and Fibre Channel. For both Windows 2000 and Windows Server 2003, support is provided for SCSI interconnects and Fibre Channel arbitrated loops for two nodes only.
If you are implementing SCSI, the following considerations must be taken into account:
If you are implementing Fibre Channel, the following considerations must be taken into account:
When you really think about it, clusters are networked storage configurations because of how clusters are set up. They are dependent on a shared storage infrastructure. SCSI-based commands are embedded in fiber at a low level. For example, clustering uses device reservations and bus resets, which can potentially be disruptive on a SAN. Systems coming and going also lead to potential disruptions. This behavior might change with Windows Server 2003 and SANs, as the Cluster service issues a command to break a reservation and the port driver can do a targeted or device reset for disks on Fibre Channel (not SCSI). The targeted resets require that the host bus adapter (HBA) drivers provided by the vendor for the SAN support this feature. If a targeted reset fails, the traditional entire buswide SCSI reset is performed. Clusters identify the logical volumes through disk signatures (as well as partition offset and partition length), which is why using and maintaining disk signatures is crucial. Clusters have a disk arbitration process (sometimes known as the challenge/defense protocol), or the process to reserve or "own" a disk. With Microsoft Windows NT 4.0 Enterprise Edition, the process was as follows: for a node to reserve a disk, it used the SCSI protocol RESERVE (issued to gain control of a device; lost if a buswide reset is issued), RELEASE (freed a SCSI device for another host bus adapter to use), and RESET (bus reset) commands. The server cluster uses the semaphore on the disk drive to represent the SCSI-level reservation status in software; SCSI-III persistent reservations are not used. The current owner reissues disk reservations and renews the lease every 3 seconds on the semaphore. All other nodes, or challengers, try to reserve the drive as well. Before Windows Server 2003, the underlying SCSI port did a bus reset, which affected all targets and LUNs. With the new StorPort driver stack of Windows Server 2003, instead of the behavior just described, a targeted LUN reset occurs. After that, a wait happens for approximately 7 to 15 seconds (3 seconds for renewal plus 2 seconds bus settle time, repeated three times to give the current owner a chance to renew). If the reservation is still clear, the former owner loses the lease and the challenger issues a RESERVE to acquire disk ownership and lease on the semaphore. With Windows 2000 and Windows Server 2003, the arbitration process is a bit different. Arbitration is done by reading and writing hidden sectors on the shared cluster disk using a mutual exclusion algorithm by Leslie Lamport. Despite this change, the Windows NT 4.0 reserve and reset process formerly used for arbitration still occurs with Windows 2000 and Windows Server 2003. However, the process is now used only for protecting the disk against stray I/Os, not for arbitration.
As of Windows 2000 Service Pack 2 or later (including Windows Server 2003), Microsoft has a new multipath I/O (MPIO) driver stack against which vendors can code new drivers. The new driver stack enables targeted resets using device and LUN reset (that is, you do not have to reset the whole bus) so that things like failover are improved. Consult with your hardware vendor to see if their driver supports the new MPIO stack.
When using a SAN with a server cluster, make sure you take the following into consideration:
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||