Chapter 16 - Monitoring Multiple Processor Computers
In an ideal world, five processors would do five times the work of one processor. But we live in a world of contention for shared resources, of disk and memory bottlenecks, single-threaded applications, multithreaded applications that require synchronous processing, and poorly coordinated processors. In our world, five processors can be five times as much work for the systems administrator!
Fortunately, Windows NT 4.0 is designed to make the most of multiprocessor configurations. Multiple processors enable multiple threads to execute simultaneously, with different threads of the same process running on each processor. The Windows NT 4.0 microkernel implements symmetric multiprocessing (SMP), wherein any processes—including those of the operating system—can run on any available processor, and the threads of a single process can run on different processors at the same time.
The most common bottlenecks on multiprocessor systems arise when all processors contend for the same operating system or hardware resource. If this resource is in short supply, the system can't benefit from the additional processors.
Shared memory is the Achilles' heel of multiprocessor systems: although it enables the threads of a single process to be executed on different processors, it makes multiprocessor systems highly vulnerable to memory shortages, to the design of the cache controller, and to differences in cache management strategies.
Understanding the Multiple Processor Counters
Some Performance Monitor counters were designed for single processor systems and might not be entirely accurate for multiprocessor systems.
For example, on a multiprocessor computer, a process can (and often does) use more than the equivalent of 100% processor time on one processor. Although it is limited to 100% of any single processor, its threads can use several processors, totaling more than 100%. However, the Process: % Processor Time counter never displays more than 100%. To determine how much total processor time a process is getting, chart the Thread: % Processor Time counter for each of the process's threads.
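The arithmetic can be sketched as follows. This is an illustrative calculation, not Performance Monitor output; the per-thread percentages are invented values for a process with two busy threads on a two-processor computer.

```python
# Hypothetical samples of Thread: % Processor Time for one process's threads,
# each expressed relative to a single processor (0-100).
thread_pct = [98.0, 96.5, 3.2]

# The total processor time the process is actually consuming across all CPUs
# is the sum of its threads' times.
total_pct = sum(thread_pct)

# Process: % Processor Time, by contrast, never displays more than 100%.
reported_pct = min(total_pct, 100.0)

print(total_pct)     # ~197.7 -- the real consumption across two busy CPUs
print(reported_pct)  # 100.0  -- what the per-process counter would display
```

This is why, on a multiprocessor computer, charting each of the process's threads gives a truer picture than the per-process counter alone.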
Use the following counters to monitor multiple processor computers.
Charting Multiple Processor Activity
Logging and charting are similar for multiple-processor and single-processor systems. Because the graphs can get crowded and complex, it's best to log the System, Processor, Process, and Thread objects, and then chart them one at a time. If you need to compare charts, start several copies of Performance Monitor and have them all chart or report on data from the same log file.
When monitoring a complex occurrence, a comparison of graphs can be more useful than a single graph.
You can also test each of your processors independently or in different combinations with single and multithreaded applications. Add Process: Thread Count to a Performance Monitor report to see how many threads are in each active process. Edit the Boot.ini file in your root directory to change the number and combination of active processors.
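As an example, the /NUMPROC switch on a Boot.ini entry limits how many processors Windows NT initializes at startup. The ARC path, partition number, and entry names below are placeholders; only the /NUMPROC switch itself is the point of the sketch.

```ini
[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(1)\WINNT

[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINNT="Windows NT Workstation 4.0 (1 CPU)" /NUMPROC=1
multi(0)disk(0)rdisk(0)partition(1)\WINNT="Windows NT Workstation 4.0 (2 CPUs)" /NUMPROC=2
```

Adding one entry per configuration, as above, lets you choose the number of active processors from the boot menu and compare Performance Monitor logs for each run.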
Task Manager, a new administrative tool, lets you determine which processes run on which processors of a multiprocessor computer. On the Task Manager Processes tab, click a process with the right mouse button, then select Set Affinity. The process you selected will run only on the processors selected on the panel. This is a great testing tool.
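Under the covers, an affinity setting is simply a bitmask with one bit per processor, the same form of mask that the Win32 SetProcessAffinityMask function accepts. The following sketch shows how such a mask is built; the processor numbers are arbitrary examples.

```python
def affinity_mask(processors):
    """Build a bitmask in which bit N set means the process may run on
    processor N -- the form used by processor-affinity settings."""
    mask = 0
    for cpu in processors:
        mask |= 1 << cpu
    return mask

# Restrict a process to processors 0 and 2 of a four-processor computer.
print(bin(affinity_mask([0, 2])))  # 0b101
```

Selecting check boxes on the Set Affinity panel sets or clears exactly these bits.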
The following figures use histograms of the Process: % Processor Time counter to compare two active processes running on one processor to the same processes running on two processors.
The first graph shows the processes running on a single processor computer. Each process is getting about half of the processing time. All other processes are nearly idle.
The following figure shows the same processes running on a computer with two processors.
On the multiprocessor computer, each process is using 100% of a processor, and the system is doing twice the work. The processor time is the same as for a single process with a single processor all to itself.
However, to achieve this performance, the processes had to be entirely independent; the only thing they shared was their code. Each processor had a copy of the code in its primary and secondary memory caches, so the processes didn't even have to share physical memory or any common system resources. This is the ideal, simulated by CpuStress, a test tool designed for the purpose.
In the previous case, the processes were entirely independent and shared no resources. But resource contention occurs even among multiple threads of a single process. Threads within a process share and contend for the same address space and frequently write to the same memory locations. Although this is a minor problem in single-processor configurations, it can become a bottleneck in multiprocessor systems.
Unfortunately, you can't see cache and memory contention directly with Performance Monitor, because these conflicts occur at the hardware level, where no counters exist. You can, however, get indirect evidence from response time and total throughput: the processors simply appear to be busy.
In multiprocessor systems, shared memory must be kept consistent: that is, the values of memory cells in the caches of each processor must be kept the same. This is known as cache coherency. The responsibility for maintaining cache coherency in multiprocessor systems falls to the cache controller hardware. When a memory cell is written, the cache controller checks whether that cell is also held in the cache of any other processor; if so, it invalidates or overwrites those copies with the new data and then updates main memory.
Two frequently used update strategies are write-through caching, in which every write to a cached cell is immediately propagated to main memory, and write-back caching, in which the cached cell is marked as changed and main memory is updated only when the cell is evicted from the cache.
Write-back caching usually causes fewer writes to main memory and reduces contention on the memory bus. But as the number of threads grows and the likelihood that they will need shared data increases, write-back caching can actually cause more traffic and resource contention.
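The trade-off can be seen in a toy model. This is not a hardware simulation, only a count of main-memory writes for a single cache under each strategy; the access pattern and line addresses are invented.

```python
def write_through(stores):
    """Write-through: every store is propagated to main memory immediately,
    so the number of memory writes equals the number of stores."""
    return len(stores)

def write_back(stores):
    """Write-back: a stored-to line is merely marked dirty; main memory is
    written once per dirty line, when the line is finally evicted."""
    dirty_lines = set(stores)   # lines written at least once
    return len(dirty_lines)

# Ten stores concentrated on three cache lines -- the common case of a
# thread repeatedly updating the same data.
stores = [0x10, 0x10, 0x18, 0x10, 0x20, 0x18, 0x10, 0x20, 0x10, 0x18]

print(write_through(stores))  # 10 memory writes
print(write_back(stores))     # 3 memory writes
```

With many threads sharing data across processors, however, dirty lines must be flushed or invalidated whenever another processor needs them, which erodes the write-back advantage shown here.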
Resource sharing and contention are far more common than isolated processing. Even when ample processors exist for the workload, they must share the single pool of virtual memory and contend for disk access. There is no easy solution to this problem, and it demonstrates the limits of even the most sophisticated hardware. In this situation, the traditional solutions to a bottleneck (adding more processors, disk space, or memory) cannot overcome the limitations imposed by an application's dependence on a single subsystem.