Chapter 15 - Detecting Cache Bottlenecks
The Windows NT file system cache is an area of memory into which the I/O system maps recently used data from disk. When processes need to read from or write to the files mapped in the cache, the I/O Manager copies the data from or to the cache as if it were an array in memory — without buffering or calling the file system. Because memory access is quicker than file operations, the cache provides an important performance boost to the processes.
Because the cache is just a part of physical memory, it is never really a bottleneck (although memory can be). However, when there is not enough memory to create an effective cache, the file system must retrieve more data from disk. This shortage of cache space is known as a cache bottleneck.
The size of the Windows NT file system cache is continually adjusted by the Virtual Memory Manager based upon the size of physical memory and the demand for memory space. In many operating systems, administrators can tune the cache size, but the Windows NT cache is designed to be self-tuning; you cannot change the cache size. For more information about the Cache Manager and the Virtual Memory Manager, see Chapter 5, "Windows NT Workstation Architecture."
Note Cache bottlenecks are rare on workstations. More often the cache is monitored as an indication of application I/O, since almost all application file system activity is mediated by the cache.
Cache bottlenecks are mainly a server problem: Workstations rarely generate enough traffic to put pressure on the cache. However, complex programs such as CAD/CAM applications and large databases that access large blocks of multiple files and benefit from the cache will suffer when the cache is too small. Also, cache bottlenecks affect only applications that use the cache effectively, for example, by reading data in the same sequence in which it is stored, so that the data requested is likely to be found in the cache.
To monitor the cache, log the Memory, Cache, and Logical Disk objects for several days at a 60-second update interval, then chart the following counters:
The Windows NT File System Cache
Cache is a French word for a place to hide necessities or valuables. In computer terminology, a cache is an additional storage area close to the component that uses it. Caches are designed to save time: In general, the closer the data is, the quicker you can get to it.
Windows NT 4.0 supports several cache architectures: caches on processor chips, caches on the motherboard, caches on physical disks, and caches in physical memory. This chapter describes the file system cache, a cache in physical memory through which data files pass on their way to and from disk or other peripheral devices.
The file system cache is designed to minimize the need for disk operations. When an application requests data from a file, the file system first searches the cache. If the data is found there (a cache hit), it is copied to the application without a disk operation; if it is not (a cache miss), the file system must read it from disk.
When determining what to cache, the Windows NT Virtual Memory Manager tries to anticipate the application's future requests for code and data, as well as its immediate needs. It might map an entire file into the cache, if space permits. This increases the likelihood that data requested will be found there.
The file system cache actually consists of a series of section objects created and indexed by the Windows NT Cache Manager. When the Virtual Memory Manager needs space in the cache, the Cache Manager creates a new section object. The files are then mapped—not copied—into the file system cache, so they don't need to be backed by the paging file. This frees the paging file for other code and data.
Cache Hits and Misses
The simplest way to judge the effectiveness of the cache is to examine the percentage of cache hits, that is, how often data sought in the cache is found there. Cache misses, however, are even more important. When data is not found in the cache, or elsewhere in memory, the file system must make a time-consuming search of the disk. An application with a miss rate of 10% (a hit rate of 90%) requires twice as much disk I/O as an application with a miss rate of 5%.
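The arithmetic behind that comparison can be sketched as follows. The read rate used here is a hypothetical figure chosen for illustration; the point is that at the same read rate, doubling the miss rate doubles the disk I/O:

```python
def disk_reads_per_sec(read_rate, hit_pct):
    """Reads that miss the cache and must be satisfied from disk."""
    return read_rate * (100 - hit_pct) / 100

reads = 100  # hypothetical application read rate (reads/sec)
print(disk_reads_per_sec(reads, 90))  # 10.0 disk reads/sec at a 10% miss rate
print(disk_reads_per_sec(reads, 95))  # 5.0 disk reads/sec at a 5% miss rate
```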
Also, especially on a workstation, you must keep cache rates in perspective. On a system where cache reads are minimal, the hit-and-miss rates are not a significant performance factor. However, when running I/O-intensive applications such as databases, the cache hit-and-miss rates are an important performance measure of the computer and the application.
Pages are removed from the cache by flushing, that is, any changes are written back to disk, and the page is deleted. Two threads in the system process—the lazy writer thread and the mapped page writer thread—periodically flush unused pages to disk. The cache is also flushed when Virtual Memory Manager needs to shrink the cache because of memory constraints.
Applications can also request that a page copied from the cache be written back to disk. With write-through caching, the disk file is updated immediately; with write-back caching (the default), the Virtual Memory Manager waits until a batch of modifications has accumulated and writes them together.
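The batching idea behind write-back caching can be sketched with a toy model. This is only an illustration of deferring and batching writes, not the Virtual Memory Manager's actual algorithm:

```python
class WriteBackCache:
    """Toy write-back cache: changes accumulate as dirty pages and
    are written to 'disk' in one batched operation when flushed."""
    def __init__(self):
        self.dirty = {}       # page -> data not yet on disk
        self.disk_writes = 0  # count of disk write operations

    def write(self, page, data):
        self.dirty[page] = data  # write-back: defer the disk write

    def flush(self):
        if self.dirty:
            self.disk_writes += 1  # one batched disk operation
            self.dirty.clear()

cache = WriteBackCache()
for page in range(8):
    cache.write(page, b"x")  # eight modifications, no disk writes yet
cache.flush()
print(cache.disk_writes)  # 1: all eight changes went out in a single batch
```

With write-through caching, by contrast, every `write` call would cost a disk operation immediately.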
Locality of Reference
Applications use memory most efficiently when they reference data in the same sequence, or a sequence similar to, the order in which the data is stored on disk. This is called locality of reference. When an application needs data, the data page or file is mapped into the cache. When an application's references are localized—to the same data, to data on the same page, or to data in the same file—the data it seeks is more likely to be found in the cache.
The nature of the application often dictates the sequence of data references. At other times, factors such as usability become more important in determining that sequence. But by localizing references whenever possible, you can improve cache efficiency, minimize the size of the process's working set, and improve application performance.
In general, sequential reads, which allow the Cache Manager to predict the application's data needs and to read larger blocks of data into the cache, are most efficient. Reads from the same page or file are almost as efficient. Reads of files dispersed throughout the disk are less efficient and random reads are least efficient.
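The effect of locality can be demonstrated with a toy page cache. The LRU replacement policy and the sizes below are assumptions for illustration, not the Cache Manager's actual behavior:

```python
from collections import OrderedDict
import random

def hit_rate(accesses, cache_pages):
    """Run a sequence of page accesses through a small LRU cache
    and return the fraction of accesses that hit."""
    cache = OrderedDict()
    hits = 0
    for page in accesses:
        if page in cache:
            hits += 1
            cache.move_to_end(page)  # mark as most recently used
        else:
            cache[page] = True
            if len(cache) > cache_pages:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(accesses)

pages = 1000
# Localized workload: each page is referenced four times in a row.
sequential = [i // 4 % pages for i in range(8000)]
random.seed(1)
scattered = [random.randrange(pages) for _ in range(8000)]

print(hit_rate(sequential, 64))  # 0.75: three of every four accesses hit
print(hit_rate(scattered, 64))   # far lower, roughly cache_pages/pages
```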
You can monitor the efficiency of your application's use of the cache by watching the cache counters for copy reads, read aheads, data flushes and lazy writes. Read Aheads usually indicate that an application is reading sequentially, although some application reading patterns may fool the system's prediction logic. When data references are localized, a smaller number of pages are changed, so the lazy writes and data flushes decrease.
Copy read hits (when data sought in the cache is found there) in the 80-90% range are excellent. In general, Data Flushes/sec is best kept below 20, but this varies widely with the workload.
The file system cache is used, by default, whenever a disk is accessed. However, an application can request that its files not be cached by using the FILE_FLAG_NO_BUFFERING parameter in its call to open a file. This is called unbuffered I/O. Applications that use unbuffered I/O are typically database applications (such as SQL Server) that manage their own cache buffers. Unbuffered I/O requests must be issued in multiples of the disk sector size.
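The sector-size-multiple requirement means an application doing unbuffered I/O must round its transfer sizes up to a sector boundary. A minimal sketch of that rounding, assuming a 512-byte sector (actual sector size varies by disk):

```python
def aligned_size(nbytes, sector=512):
    """Round a transfer size up to the next sector-size multiple,
    as unbuffered (FILE_FLAG_NO_BUFFERING) I/O requires."""
    return -(-nbytes // sector) * sector  # ceiling division

print(aligned_size(1))     # 512
print(aligned_size(512))   # 512
print(aligned_size(1300))  # 1536
```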
Cache Monitoring Utilities
In addition to Performance Monitor, several tools and utilities let you monitor the file system cache.
Task Manager displays the size of the file system cache on the Performance tab in the Physical Memory box.
Performance Meter (Perfmtr.exe), a tool on the Windows NT Resource Kit 4.0 CD in the Performance Tools group, lists current statistics on the file system cache. It is run at the command prompt. Start Performance Meter, then type r for Cache Manager read and write statistics. Type q to quit.
Response Probe, a tool on the Windows NT Resource Kit 4.0 CD, lets you design a workload and test it on your system. When your workload includes file I/O, you can choose whether the files accessed use the cache or are unbuffered. In this way, you can measure the effect of the cache strategy on your application or test file operations directly. For more information, see "Response Probe" in Chapter 11, "Performance Monitoring Tools."
Clearmem, another tool on the Windows NT Resource Kit 4.0 CD, allocates and references all available memory, consuming any inactive pages in the working sets of all processes (including the cache). It clears the cache of all file data, letting you begin your test with an empty cache.
Understanding the Cache Counters
The following Performance Monitor Cache and Memory counters are used to measure cache performance and are described in this chapter.
Important The Hit% counters are best displayed in Chart view. Hits often appear in short bursts that are not visible in reports. Also, the average displayed for Hit% on the status bar in Chart view might not match the average displayed in Report view because they are calculated differently. In Chart view, the Hit% is an average of all changes in the counter during the test interval; in Report view, it is the average of the difference between the first and last counts during the test interval.
Recognizing Cache Bottlenecks
The file system cache is a part of memory. It can be thought of as the working set of the file system. When memory becomes scarce and working sets are trimmed, the cache is trimmed as well. If the cache grows too small, cache-sensitive processes will be slowed by disk operations.
To monitor cache size, use the following counters:
Tip You can test the effect of a memory and cache shortage on your workstation without changing the physical memory in your computer. Use the MAXMEM parameter in the boot configuration to limit the amount of physical memory available to Windows NT. For more information, see "Configuring Available Memory" in Chapter 12, "Detecting Memory Bottlenecks."
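For example, a Boot.ini entry limiting Windows NT to 16 MB might look like the following. The ARC path shown is illustrative; add the switch to the entry already in your Boot.ini:

```
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINNT="Windows NT Workstation Version 4.0" /MAXMEM=16
```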
The following graph shows that a memory shortage causes the cache to be trimmed, along with the working sets of processes, and other objects that compete with the cache for space in memory. The memory shortage was produced by running LeakyApp, a test tool that consumes memory.
In this graph, the thick black line represents Process: Private Bytes for LeakyApp. (Note that it has been scaled to 0.000001 to fit on the graph.) At the plateau in this curve, it held 70.4 MB of memory. The white line represents Memory: Cache Bytes. The gray line is Memory: Available Bytes, and the thin black line is Process: Working Set.
In this example, run on a workstation with 32 MB of physical memory, the memory consumption by LeakyApp affects all memory, but not to the same degree. Available Bytes drops sharply then recovers somewhat, apparently because pages were trimmed from the working sets of processes. Cache size falls steadily until all available bytes are consumed, and then it levels off. In addition, page faults—not shown on this already busy graph—increase steadily as working sets and cache are squeezed.
The effect of a smaller cache on applications and file operations depends upon how often and how effectively applications use the cache.
Applications and Cache Bottlenecks
Applications that use the cache effectively are hurt most during a cache shortage. A relatively small cache, under 5 MB in a system with 16 MB of physical memory, is likely to become a bottleneck for the applications that use it.
However, normal rates of reads, hits, and flushes vary widely with the nature of the application and how it is structured. Thus, you must establish cache-use benchmarks for each application. Only then can you determine the effect of a cache bottleneck on the application.
To monitor the effect of a cache bottleneck on an application, log the Cache and Memory objects over time, then chart the following counters:
Tip To test your application with different size caches, add the MAXMEM parameter in the Boot.ini file. This lets you change the amount of memory available to Windows NT without affecting the physical memory in your computer.
A cache bottleneck appears in an application as a steady decrease in Copy Read Hits while Copy Reads/sec are relatively stable. There are no recommended levels for these counters, but running an application over time in an otherwise idle system with ample memory will demonstrate normal rates for the application. It will also let you compare how effectively different applications use the cache. When you run the same applications on a system where memory is scarce, you will see this rate drop if the cache is a bottleneck. In general, a hit rate of over 80% is considered to be excellent. A 10% decrease in normal hit rates is cause for concern and probably indicates a memory shortage.
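The rule of thumb above—a drop of 10 percentage points or more from an application's normal hit rate while its read rate stays roughly stable—can be expressed as a simple check. The threshold values are the chapter's rules of thumb, and the 25% read-rate tolerance is an assumption for illustration:

```python
def cache_bottleneck(baseline_hit_pct, current_hit_pct,
                     baseline_reads, current_reads,
                     hit_drop=10.0, read_tolerance=0.25):
    """Flag a likely cache bottleneck: Copy Read Hits % has fallen
    noticeably while Copy Reads/sec has stayed roughly stable."""
    reads_stable = abs(current_reads - baseline_reads) <= read_tolerance * baseline_reads
    hits_dropped = (baseline_hit_pct - current_hit_pct) >= hit_drop
    return reads_stable and hits_dropped

print(cache_bottleneck(85.0, 70.0, 6.0, 6.2))  # True: hits fell, reads stable
print(cache_bottleneck(85.0, 82.0, 6.0, 6.2))  # False: normal variation
```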
The following graph shows a comparison of copy reads and copy hits for several instances of a compiler. Compilers are relatively efficient users of the cache because their data (application code) is often read and processed sequentially. During the short time-interval represented here, the cache size varied from 6.3 MB to 7.3 MB.
In this example, the thicker line is Copy Reads/sec and the thin line is Copy Read Hits %. The Copy Reads/sec, averaging 6 per second, is a moderate rate, and the Copy Read Hits %, averaging 32%, is also moderate. This indicates that, on average, fewer than 2 reads per second are satisfied by data found in the cache. The remainder are counted as page faults and sought elsewhere in memory or on disk.
It is important to put some of these rates in perspective. When copy reads are low (around 5 per second), a 90% average hit rate means that the data for 4.5 reads was found in the cache. However, when reads are at 50 per second, a 40% hit rate means that data for 20 reads was found in the cache.
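In other words, the absolute number of reads satisfied from the cache is the read rate times the hit rate:

```python
def cache_hits_per_sec(reads_per_sec, hit_pct):
    """Reads per second satisfied by data found in the cache."""
    return reads_per_sec * hit_pct / 100

print(cache_hits_per_sec(5, 90))   # 4.5: a high hit rate on few reads
print(cache_hits_per_sec(50, 40))  # 20.0: a low hit rate on many reads
```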
Accumulating data like this while varying the amount of memory will help you determine the effect of cache size on your application.
Page Faults and Cache Bottlenecks
When memory is scarce, more data must remain on the disk. Accordingly, page faults are more likely. Similarly, when the cache is trimmed, cache hit rates drop and cache faults increase. Cache faults are a subset of all page faults.
Note The operating system sees the cache as the file system's working set, its dedicated area of physical memory. When data isn't found in the cache, the system counts it as a page fault, just as it would when data was not found in the working set of a process.
To monitor the effect of cache bottlenecks on disk, use the following counters:
The following graph shows the proportion of page faults that can be traced to the cache. Cache Faults/sec includes data sought by the file system for mapping as well as misses in copy reads for applications. Because both the Cache Faults/sec and Page Faults/sec counters are measured in numbers of pages, they can be compared without conversions.
In this example, the thin black line represents all faulted pages; the thick black line represents pages faulted from the cache. Places where the curves meet indicate that nearly all page faults are cache faults. Space between the curves indicates faults from the working sets of processes. In this example, on average, only 10% of the relatively high rate of page faults happen in the cache.
The important page faults, however, are those that require disk reads to retrieve the faulted pages. But the memory counters that measure disk operations due to paging make no distinction between the reads or pages read because of cache faults and those caused by other faults.
This graph and the report that follows show that most faulted pages are soft faults. Of the average of 182 pages faulted per second, only 21.586—less than 12%—are hard faults. It is even more difficult to attribute any of the pages input due to faults to the cache.
Applications and the Cache
Cache bottlenecks on workstations are uncommon. More often, the Performance Monitor cache counter values are used as indicators of application behavior. Although some large database applications, such as Microsoft SQL Server, bypass the cache and do their own caching, most applications use the file system cache.
Data requested by an application is first mapped into the cache and then copied from there. Data changed by applications is written from the cache to disk by the Lazy Writer system thread or by a write-through call from the application. Thus, watching the cache is like watching your application I/O.
Remember, however, that if an application uses the cache infrequently, cache activity will have an insignificant effect on the system, the disks, and on memory.
Reading from the Cache
There are four types of application reads:
The following graph shows the frequency of different kinds of cache reads during the run of a compiler. The intersecting curves are difficult to interpret, so a second copy of Performance Monitor—a report set to the same Time Window as the graph—is appended.
In this example, copy reads are more frequent than fast reads. This pattern of many first reads and fewer subsequent reads indicates that the application is probably reading from many small files. The rate of read aheads is also low, which is another indication that the application is skipping from file to file. When more fast reads than copy reads occur, the application is reading several times from the same file. The rate of read aheads should increase as well.
Writing to the Cache
Although most of this chapter has described using the cache to prevent repeated file operations for reading, it's important to note that applications can also write to data pages in the cache, though not directly. When applications write data to files in their memory buffers that have been copied from the cache, the changes are copied back to the cache. The application continues processing without waiting for the data to be written back to disk.
The system does not count copies or writes to cache directly, but these changes appear in Performance Monitor as data flushes and lazy writes when they are flushed. Cache pages are flushed to free up cache space in several different ways:
To measure the rate at which changed pages mapped into the cache are written to disk, use the following counters:
In general, lazy writes reflect the amount of memory available in the cache. Lazy writes are a subset of all data flushes, which include write-through requests from applications and write-back requests by the mapped page writer thread.
The following display was made with three copies of Performance Monitor all charting from the same log file with the same time window.
The top graph shows the ratio of Lazy Write Flushes/sec (the white line) to all data flushes, as represented by Data Flushes/sec (the black line). The space between the lines indicates mapped page writer flushes and application write-through requests. In this example, as the report shows, on average, 73.5% of the data flushes were lazy writes, but lazy writes accounted for only 60% of the pages flushed.
The bottom graph shows the relationship between Data Flush Pages/sec (the black line) and Data Flushes/sec (the white line). The points where the curves meet indicate that data is being flushed one page at a time. Space between the curves indicates that multiple pages are written. The report shows that the lazy writer flushed 1.6 pages per write on average, compared to 1.8 pages for all flushes. These small numbers indicate that the system is writing to many small files. Lazy writes often average 15-16 pages of data.
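The pages-per-flush figures in the report are just the ratio of the two counters. The counter samples below are hypothetical values chosen to reproduce the averages described above; the log's absolute rates are not given:

```python
def pages_per_flush(pages_per_sec, flushes_per_sec):
    """Average pages written per flush: Data Flush Pages/sec divided
    by Data Flushes/sec (or the lazy-write equivalents)."""
    return pages_per_sec / flushes_per_sec

print(round(pages_per_flush(9.0, 5.0), 1))  # 1.8 pages per flush, all flushes
print(round(pages_per_flush(8.0, 5.0), 1))  # 1.6 pages per lazy write
```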
The spikes in the data probably result from an application closing a file and the lazy writer writing all of the data back to disk. To see just how many pages went back to disk, narrow the time window to a single data point, and add the same counters to a report.
This report on the second spike shows that in that second (averaged for the last two data points), about 101 pages were written back to disk, nearly 40% of which were lazy writes.
Tuning the Cache
This is going to be a short section, because there is not much you can do to tune the Windows NT file system cache. The tuning mechanisms are built into the Virtual Memory Manager to save you time for more important things. Nonetheless, there are a few things you can do to make the most of the cache: