Windows 2000 Terminal Services Capacity and Scaling

Terminal Services is a technology that lets users execute Windows-based applications on a remote Windows 2000-based server. This white paper contains testing methodologies, results, analysis, and sizing guidelines for Windows 2000 Terminal Services. Groupe Bull and NEC engineers, under the supervision of Microsoft's Terminal Services development team, performed the sizing tests and data collection at NEC's Redmond Technology Center in Redmond, WA, USA. The tests were performed using Windows 2000 Advanced Server, build 2195.

Introduction

For information on Terminal Services features, licensing, and architecture, please see:

Exploring Terminal Services

http://www.microsoft.com/windows2000/terminalservices

In a server-based computing environment, all application execution and data processing occur on the server. Therefore it is extremely useful and desirable for server manufacturers to test the scalability and capacity of their servers to determine how many client sessions a server can typically support under a variety of different scenarios. Groupe Bull and NEC began this testing procedure under the supervision of Microsoft starting with the Beta 3 release of Windows 2000. Multiple NEC/Groupe Bull Express5800 hardware configurations were tested with Terminal Services in order to provide customers with guidelines to choose the right server according to their needs.

The results and analysis contained here should not be interpreted in isolation. The client applications used in the test (mostly components of Microsoft Office 2000) are not easy to characterize without accounting for the features or data sets an individual uses or creates. Three different user scenarios are tested in accordance with Gartner Group recommendations (Knowledge Worker, Structured Task Worker and Data Entry Worker), but the actual applications, features, and data sets used in these user scenarios cannot precisely mimic the experience of a real-life user on a moment-by-moment basis. The tests assume a rather robotic quality, with users taking no prolonged breaks and essentially using the same functions and data sets during a ten to thirty minute period of activity. In short, your results may vary.

The results are conservative, however: a server is considered to be at capacity when it is 10 percent slower than it was with a single-user load. With this in mind, consider buying a server that will, based on the analysis, comfortably accommodate the required number of users under the expected peak workload, leaving room for expansion.

Results Overview

Server Capacity

The actual number of users that a specific server configuration can support varies depending on several criteria, such as the processor type, the amount of memory, the hard disk, the network configuration, and the user type (typing speed, applications used, frequency, and so forth).

Table 1 Maximum Users by Scenario and Server Type

Server configuration                                   Express5800 model  Structured Task Worker  Knowledge Worker  Data Entry Worker  Data Entry Worker Dedicated
8 x Pentium III 500 MHz, 1 MB L2 cache, 4096 MB        HV8600             105                     160 (1,2)         Not tested (3)     Not tested (3)
4 x Pentium III 500 MHz, 1 MB L2 cache, 4096 MB        HX4600             90                      135               Not tested (3)     Not tested (3)
2 x Pentium III 450 MHz, 0.5 MB L2 cache, 1024 MB      MC2400             40                      70                320 (1,2)          350 (1,2)
1 x Pentium III 450 MHz (4), 0.5 MB L2 cache, 1024 MB  MC2400             25                      35                280 (1)            280 (1,5)
4 x Pentium Pro 200 MHz, 0.5 MB L2 cache, 1024 MB      MH4000             30                      50                Not tested         Not tested

Figure 1: Maximum Users by Scenario and Processor Configuration on Pentium III Systems

System and User Memory Requirements

Table 2 below contains general guidelines for Windows 2000 Terminal Services memory requirements, based on the results achieved in the performance lab.

Table 2 Recommended Memory

                      Structured Task Workers  Knowledge Workers  Data Entry Workers  Data Entry Workers Dedicated
Memory per user (MB)  9.3                      8.5                3.5                 3.3
System memory (MB)    128 (all scenarios)
Total memory          System memory + (number of users x memory per user)
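The Table 2 formula is easy to apply when sizing a server. The following Python sketch encodes the per-user figures and the 128 MB system allowance from Table 2; the dictionary keys and function names are illustrative, not part of any Microsoft tool, and the reverse calculation (how many users a given amount of RAM accommodates) is simply the same formula rearranged, not a separately measured result.

```python
# Per-user memory figures (MB) from Table 2. The keys are illustrative labels,
# not names used by any Microsoft tool.
MEMORY_PER_USER_MB = {
    "structured_task": 9.3,
    "knowledge": 8.5,
    "data_entry": 3.5,
    "data_entry_dedicated": 3.3,
}
SYSTEM_MEMORY_MB = 128  # base system allowance from Table 2


def total_memory_mb(scenario: str, users: int) -> float:
    """Total memory = System + (number of users x memory per user)."""
    return SYSTEM_MEMORY_MB + users * MEMORY_PER_USER_MB[scenario]


def users_for_memory(scenario: str, physical_mb: float) -> int:
    """Same formula rearranged: how many users fit in the given amount of RAM."""
    available = physical_mb - SYSTEM_MEMORY_MB
    if available <= 0:
        return 0
    return int(available // MEMORY_PER_USER_MB[scenario])


if __name__ == "__main__":
    print(total_memory_mb("knowledge", 100))    # 978.0
    print(users_for_memory("knowledge", 1024))  # 105
```

For example, 100 Knowledge Workers come to 128 + 100 x 8.5 = 978 MB. Keep in mind that memory sized this way produces a deliberately memory-constrained system that pages somewhat, as discussed under Memory Requirements and Utilization.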

Comparison with Windows NT Server 4.0, Terminal Server Edition

On the 4-processor Pentium Pro system with 1 GB of memory, Windows 2000 Terminal Services scaled to the same number of users that the same system running Terminal Server 4.0 achieved. In previous tests on 4-way Pentium II Xeon hardware, Windows 2000 Terminal Services scaled up to 20 percent better than Terminal Server 4.0. This may indicate that Windows 2000 Terminal Services makes better use of faster hardware than Terminal Server 4.0 does.

Test Environment and Testing Tools

Test Environment

The Terminal Services testing laboratory is shown in Figure 2 below.

The Express5800 servers tested with Windows 2000 Terminal Services were:

Express5800 HV8600

Express5800 HX4600

Express5800 MC2400

Express5800 MH4000

Windows 2000 Advanced Server, build 2195, was installed on these servers. All settings are defined in Appendix C: Terminal Server Settings. Overviews of the server specifications are included in Appendix D: Express5800 Server Specifications. For detailed specifications of the servers, see the Bull Express5800 Server Web site at http://www.servers.bull.com/express5800 and the NEC Web site at http://www.nec-computers.com.

Other components of the testing laboratory included:

Test manager: PowerMate 8100, Pentium II 400 MHz, with Windows NT Workstation 4.0 Service Pack 5 (SP5). This workstation manages the 64 client workstations, including script control, software distribution, and remote reset of the workstations.

64 client workstations: Pentium II 350 MHz, 64 MB RAM, 8 GB hard disk with Windows NT Workstation 4.0 SP5. Multiple Terminal Services Client sessions can run on each of the 64 workstations.

Client workstation domain controller: PowerMate 8100, Pentium III 500 MHz, with Windows NT Server 4.0 SP5. This server is the domain controller for the test manager and the client workstations, and also the DHCP server for the client workstations. The logon script on this server updates all the client workstations at startup.

Web server: Express5800 MT2200, 2 x Pentium II 300 MHz, 128 MB RAM, 3 x 9 GB hard disk, with Windows NT Server 4.0 SP5 and Internet Information Server (IIS) 4.0. It is also the domain controller for the mail server, the database server and the terminal server. Used in the Knowledge and Structured Task Worker tests.

Mail server: Express5800 HX4500, 4 x Pentium II Xeon 400 MHz, 1 MB Level 2 cache, 1 GB RAM, 3 x 8 GB hard disk (RAID 5), with Windows NT Server 4.0 SP5 and Microsoft Exchange 5.5 Service Pack 2 (SP2). Used in the Knowledge and Structured Task Worker tests.

Database server: Express5800 MT2200, 2 x Pentium II 300 MHz, 128 MB RAM, 9 GB hard disk with Windows NT Server 4.0 SP5 and Microsoft SQL Server 6.5 SP5. Used only for the Data Entry Worker tests.

Figure 2: Testing Lab Environment

Testing Tools and Scripts

Microsoft developed the testing tools and scripts used on the clients to accurately simulate a true user session.

Testing Tools

The SMClient tool is used to simulate a user session from a script file. It has the ability to send keystrokes, mouse movements, and clicks as well as the ability to wait for data to appear on the screen before proceeding. Unlike a utility that runs on the server, such as Microsoft Visual Test, SMClient sends data through the Microsoft Terminal Services Client software, using the Remote Desktop Protocol (RDP). The SMClient utility drives the Terminal Services Client as if a user were actually performing the actions at the client machine itself. Therefore, this testing tool leads to more accurate results than if the test software were running on the server side.

To assist with the test environment, the following three automation utilities were developed by Microsoft:

RoboClient. Runs on each client workstation and waits for commands from the RoboServer, such as when to launch the SMClient and which script to use.

RoboServer. The console utility that runs on the Test Manager workstation, allowing the tester to assign scripts to users and automatically start scripts at pre-determined intervals.

QueryIdle. A utility that polls the client sessions periodically to determine whether the scripts are still running or whether any are 'stuck' waiting for text that has not appeared.
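The QueryIdle check can be pictured as a simple staleness test. The Python sketch below assumes, hypothetically, that each session reports whether its script is running and when it last completed a step; a running session with no progress for longer than a timeout is flagged as possibly stuck. QueryIdle's actual interface and polling interval are not described in this paper, so the record layout and the 300-second timeout are assumptions.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SessionStatus:
    # Hypothetical status record; QueryIdle's real data format is not documented here.
    session_id: int
    script_running: bool
    last_progress: float  # test-clock time (seconds) when the last script step completed


def find_stuck_sessions(
    sessions: Sequence[SessionStatus], now: float, timeout: float = 300.0
) -> List[int]:
    """IDs of running sessions that have made no progress for more than `timeout` seconds."""
    return [
        s.session_id
        for s in sessions
        if s.script_running and now - s.last_progress > timeout
    ]


if __name__ == "__main__":
    sessions = [
        SessionStatus(1, True, 1000.0),  # progressed 200 s ago
        SessionStatus(2, True, 500.0),   # no progress for 700 s
        SessionStatus(3, False, 100.0),  # script finished, so not stuck
    ]
    print(find_stuck_sessions(sessions, now=1200.0))  # [2]
```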

Testing Scripts

Three scripts were developed based on Gartner Group specifications(6) for the Knowledge Worker, Structured Task Worker, and Data Entry Worker, as defined below.

Knowledge Workers

Defined as a worker who gathers, adds value to, and communicates information in a decision support process. Cost of downtime is variable but highly visible. These resources are driven by projects and ad-hoc needs towards flexible tasks. These workers make their own decisions on what to work on and how to accomplish the task.

Example job tasks include: marketing, project management, sales, desktop publishing, decision support, data mining, financial analysis, executive and supervisory management, design, and authoring.

Structured Task Workers

Workers who are typically a link in a workflow or process and perform the same tasks repetitively. These workers are driven in their daily jobs by a set process rather than by ad-hoc projects. Cost of downtime varies; most workers are only partially dependent on computer availability.

Example job tasks include: claims processing, accounts payable, accounts receivable, customer service, high-end manufacturing, high-end maintenance, and repair.

Data Entry Workers

Workers who input data into computer systems, for example in transcription, typing, order entry, clerical work, and manufacturing.

Additionally, the Data Entry Worker script was tested in a 'dedicated' mode, by not starting a Windows Explorer shell for each user.

Gartner defines another class of worker, the High Performance Worker. Workers of this type typically use specialized computing platforms and applications for tasks such as genetic engineering, chip design, quantum physics, 3D modeling, 3D animation, and simulation. Because these types of applications are not suitable for running on a terminal server, this class of worker was not tested.

A detailed flowchart describing the functions of the scripts is contained in "Appendix B: Test Script Flow Charts". The utilities used to perform these tests are available in the Windows 2000 Resource Kit.

The scripts developed for these tests are Microsoft's interpretations of the Gartner Group user definitions, and are provided "as is". They will not work in your test environment without some modifications, such as changing the various server names that are hard coded in the scripts to match those in your test environment. They are available for downloading at: http://www.microsoft.com/windows2000/library/operations/terminal/loadscripts.asp

Testing Methodology

Windows 2000 Advanced Server and Office 2000 were installed using settings described in "Appendix C: Terminal Server Settings."

An automated server and client workstation reset was performed before each test run to return all the components to a clean state.

The canary, or timer, script was used to determine when or if a terminal server was overloaded. It performs actions similar to the Structured Task Worker script, but with a higher typing rate, and it runs through the script only once and then logs off. This usually takes about nine minutes on an idle system. The canary script was executed on the Test Manager workstation before any users were logged onto the terminal server, and the time the script took to complete (elapsed time) was recorded automatically by RoboServer. This elapsed time served as the baseline response time for a given server configuration.

For each scenario, the Test Manager workstation started groups of ten client sessions on the client workstations, with a 30-second interval between sessions. When the last client session in a group was started, the canary script was re-executed on the Test Manager workstation, and a 15-minute stabilization period began during which no additional sessions were started. For both Data Entry Worker scenario tests (normal and dedicated), these intervals were shortened because of the high number of users these scenarios could support and the length of time the tests would otherwise have taken. Given the repetitive nature of the Data Entry Worker script, this was not deemed to have a significant effect on the results, unlike the Knowledge Worker and Structured Task Worker scripts, which perform more varied tasks.
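The standard ramp-up schedule shows why the Data Entry Worker intervals had to be shortened. The Python sketch below estimates how long a test takes to reach a given number of users. It assumes, as an interpretation rather than a documented detail, that each group spans nine 30-second gaps between its ten session starts and is followed by the full 15-minute stabilization period before the next group begins.

```python
import math

SESSIONS_PER_GROUP = 10
SESSION_INTERVAL_S = 30    # seconds between session starts within a group
STABILIZATION_S = 15 * 60  # stabilization period after each group


def ramp_up_seconds(target_users: int) -> int:
    """Estimated seconds to start `target_users` sessions, including stabilization.

    Assumes nine 30-second gaps per group of ten, then the full stabilization
    period before the next group starts.
    """
    groups = math.ceil(target_users / SESSIONS_PER_GROUP)
    launch_span = (SESSIONS_PER_GROUP - 1) * SESSION_INTERVAL_S  # 270 s per group
    return groups * (launch_span + STABILIZATION_S)


if __name__ == "__main__":
    for users in (70, 320):
        print(f"{users} users: {ramp_up_seconds(users) / 3600:.1f} hours")
```

Under these assumptions each group of ten takes 1,170 seconds, so reaching 70 users takes 8,190 seconds (about 2.3 hours), while reaching 320 users at the standard pace would take 37,440 seconds (10.4 hours).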

The maximum load was considered reached when the duration of the canary script was 10 percent longer than the baseline, or when a restricting server event occurred, such as running out of paged pool or system page table entry (PTE) address space. If the maximum load had not been reached, the process was repeated with 10 more users and another 15-minute stabilization period.

When the maximum load was reached, the last 10 test clients were considered to have overloaded the system and were not counted as having successfully logged on. The exception was when the average of the before-maximum and after-maximum canary times was still less than 10 percent above the baseline time; in that case, only the last five clients were considered to have overloaded the system and were not counted as having logged on.
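The stopping and counting rules can be expressed compactly. The Python sketch below takes the baseline canary time and the canary time measured after each group of ten users, and returns the number of users counted as successfully logged on. Some details are interpretations rather than statements from the text: "10 percent longer" is tested as greater than or equal to 1.10 times the baseline, the before-maximum time is taken to be the previous group's canary time, and if the very first group already overloads the server, the baseline itself is used as the before-maximum time. A restricting server event is modeled as an optional group index.

```python
from typing import Optional, Sequence

GROUP_SIZE = 10   # sessions started per group
THRESHOLD = 1.10  # canary 10 percent slower than baseline marks maximum load


def counted_users(
    baseline: float,
    canary_times: Sequence[float],
    restricting_event_group: Optional[int] = None,
) -> Optional[int]:
    """Users counted as logged on, or None if maximum load was never reached.

    canary_times[i] is the canary time measured after (i + 1) * GROUP_SIZE users.
    restricting_event_group is the index of the group during which a restricting
    server event (for example, paged pool exhaustion) occurred, if any.
    """
    limit = THRESHOLD * baseline
    for i, t in enumerate(canary_times):
        if t >= limit or i == restricting_event_group:
            users_at_max = (i + 1) * GROUP_SIZE
            # Assumption: before the first group, the baseline is the "before" time.
            before = canary_times[i - 1] if i > 0 else baseline
            if (before + t) / 2 < limit:
                # Average still under the limit: exclude only the last five clients.
                return users_at_max - 5
            # Otherwise exclude the whole last group of ten.
            return users_at_max - GROUP_SIZE
    return None


if __name__ == "__main__":
    # Hypothetical canary times in seconds against a 540-second baseline.
    times = [545, 550, 560, 575, 585, 600]
    print(counted_users(540.0, times))  # 55
```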

Figure 3 below shows an example of the elapsed time for the canary script, recorded when running Terminal Services Client sessions on an Express5800 2-way server.

Figure 3: Example of Canary Time by Number and Profile of Users

Analysis of the Results

Overview

Although the scripts used in these scenarios simulate tasks that a normal human being could perform, the users simulated in these tests are tireless: they never reduce their intensity level. The simulated clients type at a normal rate, pause as if looking at dialog boxes, and scroll through mail messages as if to read them, but they do not get up from their desks to get a cup of coffee, they never stop working as if interrupted by a phone call, and they do not break for lunch. This approach yields accurate but conservative results.

Figure 4 below shows the maximum number of users supported by scenario on the Express5800 MC2400, in 1-way and 2-way processor configurations. Both configurations had 1 GB of physical RAM. In the 2-way configuration, neither the Data Entry Worker nor the Data Entry Worker 'Dedicated' scenario is CPU-bound; in both instances the system reached a kernel address space limitation. See the section, Effect of Kernel Address Space Limitations, for more information.

Figure 4: Maximum Users by Scenario and Processor Configuration

Memory Requirements and Utilization

In addition to the 128-MB base minimum memory requirement for a Windows 2000-based server, the amount of memory needed per user for these scenarios is shown in Figure 5 below.

Figure 5: Memory Requirements by Scenario

Determining the amount of memory necessary for a particular use of a terminal server is complex. It is possible to measure how much memory an application has committed, that is, the memory the operating system has guaranteed that the application can access. But the application will not necessarily use all of that memory, and it certainly is not using all of it at any one time. The subset of committed bytes that an application has touched recently is referred to as the working set of that process. Because the operating system can page the memory outside a process's working set to disk without a performance penalty to the application, the working set, if used correctly, is a much better measure of the amount of memory needed.

The Process performance object's Working Set counter, used on the "_Total" instance to measure all processes in the system, measures how many bytes have been touched recently by threads in the process. However, if free memory in the computer is sufficient, pages are left in the working set of a process even if they are not in use. If free memory falls below a threshold, unused pages are trimmed from working sets.

Therefore the method used in these tests for determining memory requirements cannot be as simple as observing a performance counter. It must account for the dynamic behavior of a memory-limited system.

The most accurate method of calculating the amount of memory required per user is to analyze the results of the Total Process Working Set performance counter in a memory-constrained scenario. When a system has abundant physical RAM, the working set will initially grow at a high rate, and pages will be left in the working set of a process even if they are not in use. Eventually, when the total working set exceeds the amount of physical memory, the operating system will be forced to trim the unused portions of working sets until the total working set is below the amount of physical memory. This trimming of unused portions of the working sets will occur until the applications collectively need more physical memory than is available, a situation that requires the system to constantly page to maintain all the processes' working sets. In operating systems theory terminology, this constant paging state is referred to as thrashing.

Figure 6 below shows the Total Process Working Set from a Data Entry Worker test with 512 MB of physical RAM. Also plotted is the number of users for this test on the secondary y-axis.

Figure 6: Total Process Working Set and Number of Users vs. Time, Data Entry Worker Scenario

The results are very close to what is expected.

Zone 1 represents the abundant memory stage. This occurs when physical memory is greater than the total amount of memory that applications need. In this zone, the operating system has no reason to page anything to disk, even seldom-used pages.

Zone 2 represents the stage when unused portions of the working sets are trimmed. In this stage the operating system begins to trim the unused pages from the processes' working sets. This state is acceptable and applications should respond at a good rate because, in general, only unused pages are being paged to disk.

Zone 3 represents controlled growth. The working set in this stage accurately reflects the working set for the scenario being judged. Either the inflection point at which Zone 3 begins or the slope of the line in Zone 3 can be used to determine the actual per-user working set. Note that the slope of this line is shallower than the slope of the Zone 1 line.

At the end of Figure 6, the system begins thrashing. The test quickly ends as the system becomes less usable and scripts fail due to lack of responsiveness.

In Figure 6, it seems as though the amount of physical memory is greater than 512 MB, because the operating system does not start to trim working sets until the total is well above 600 MB. This is the effect of cross-process code sharing: shared code pages are counted in the working set of every process that uses them, which makes it appear that working sets use more memory than is actually available. Because of code sharing, this method will slightly overestimate the amount of memory needed per user, an acceptable situation that provides some "breathing room" for the system.
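One way to apply the slope method described for Zone 3 is an ordinary least-squares fit of total working set against the number of active sessions. The Python sketch below assumes the samples have already been restricted to Zone 3 (choosing the zone boundaries is left to the analyst, as in Figure 6), and its sample values are invented for illustration.

```python
from typing import Sequence


def per_user_working_set(users: Sequence[float], total_ws_mb: Sequence[float]) -> float:
    """Least-squares slope of total working set (MB) versus active sessions.

    In Zone 3 the slope estimates the working set each additional user adds.
    """
    n = len(users)
    if n < 2 or n != len(total_ws_mb):
        raise ValueError("need at least two paired samples")
    mean_x = sum(users) / n
    mean_y = sum(total_ws_mb) / n
    sxx = sum((x - mean_x) ** 2 for x in users)
    if sxx == 0:
        raise ValueError("user counts must vary")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(users, total_ws_mb))
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical Zone 3 samples: active sessions and total working set (MB).
    users = [120, 130, 140, 150, 160]
    ws = [600, 635, 670, 705, 740]
    print(per_user_working_set(users, ws))  # 3.5
```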

Figure 7 below shows the total process working set divided by the number of active sessions for the same scenario.

Figure 7: Working Set Per User and Number of Users vs. Time, Data Entry Worker Scenario

The amount of memory needed can be determined from the value toward which the line converges at the end of this graph (in Zone 3). The working set per user for the Data Entry Worker is 3.5 MB.
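The convergence reading in Figure 7 can be reproduced numerically by dividing the total working set by the number of active sessions at each sample and averaging that ratio over the end of the run. In the Python sketch below, averaging the last few samples (the `tail` parameter) is an assumption about how the converged value is read from the graph, and the data are hypothetical.

```python
from typing import Sequence


def converged_per_user(
    sessions: Sequence[int], total_ws_mb: Sequence[float], tail: int = 5
) -> float:
    """Average total-working-set-per-session ratio over the last `tail` samples."""
    # Skip samples taken before any session was active (avoids division by zero).
    ratios = [ws / n for n, ws in zip(sessions, total_ws_mb) if n > 0]
    if not ratios or tail < 1:
        raise ValueError("need at least one sample with active sessions and tail >= 1")
    last = ratios[-tail:]
    return sum(last) / len(last)


if __name__ == "__main__":
    # Hypothetical late-run samples: active sessions and total working set (MB).
    sessions = [100, 120, 140, 160]
    total_ws = [400.0, 420.0, 490.0, 560.0]
    print(converged_per_user(sessions, total_ws, tail=3))  # 3.5
```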

Although a reasonable amount of paging is acceptable, paging naturally consumes a small amount of CPU and other resources. Because the maximum numbers of users that could be loaded onto a system (Figure 1, Figure 4) were determined on systems with abundant physical RAM, those systems performed only a minimal amount of paging. The working set calculations, by contrast, assume that enough paging has occurred to trim the unused portions of the working sets, which happens only on a memory-constrained system. If you take the base memory requirement and add the number of users multiplied by the required working set, you end up with a system that is naturally memory-constrained, on which acceptable paging will occur. On such a system, expect a slight decrease in performance due to the overhead of paging. This decrease can reduce the number of users who can be actively working on the system before the canary time reaches 10 percent over its baseline.

Network Utilization

Network utilization for the four scenarios is shown in Figure 8 below. This includes all traffic into and out of the terminal server for these scenarios.

Figure 8: Total Network Utilization (including RDP and all other network traffic) by Scenario

Network utilization tends to be quite low on Terminal Services, both because of protocol efficiency and because the default setting of the Terminal Services Client (mstsc.exe) is to use data compression for all connections. Note that persistent caching was not enabled for this test because this feature works only with a single instance of the Terminal Services Client application. In these tests, multiple Terminal Services sessions are run on each client machine.

Figure 9 below shows network usage in bytes per user for the Data Entry Worker scenario, taken from the Bytes Total/sec counter in the Network Interface performance object. This count includes both bytes sent and received by the terminal server, using any network protocol. The graph illustrates how the bytes-per-user average was calculated: the value converges on a single number when sufficient simulated users are running through their scripts. The number of user sessions is plotted on the secondary axis.

Figure 9: Data Entry Worker Scenario Network Utilization Per User and Number of Users vs. Time

In these tests, the terminal server's local hard drive is used for all user data storage and profiles, and no roaming profiles or network home directories were used. Therefore, these network utilization numbers reflect only the traffic of the RDP protocol itself, in addition to a small amount of domain controller, Microsoft Exchange Server, Microsoft SQL Server, IIS Server, and test control traffic. In a normal terminal server environment there will be more traffic on the network, especially if user profiles are not stored locally.
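Once a per-user rate has been read from a graph such as Figure 9, projecting the load for a planned deployment is a matter of multiplication. The Python sketch below converts a per-user rate in bytes per second into a fraction of link capacity. The 1,600 bytes/sec per-user rate and the 100 Mbit/s link are hypothetical placeholders, not results from these tests, and the projection ignores the additional profile and home-directory traffic that, as noted above, a production network will carry.

```python
def link_utilization(users: int, bytes_per_sec_per_user: float, link_bits_per_sec: float) -> float:
    """Fraction of a link's capacity used by `users` sessions at a per-user byte rate."""
    total_bits_per_sec = users * bytes_per_sec_per_user * 8  # bytes -> bits
    return total_bits_per_sec / link_bits_per_sec


if __name__ == "__main__":
    # Hypothetical inputs, not measured values: 200 users at 1,600 bytes/sec
    # each, sharing a 100 Mbit/s link.
    share = link_utilization(200, 1600, 100_000_000)
    print(f"{share:.1%} of link capacity")  # 2.6% of link capacity
```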

Effect of Logon Activity on CPU Utilization

In every test, the CPU utilization graph resembles the one in Figure 10 below. Each set of 10 connections produces an ascending phase, as the test script starts on each client workstation and generates some CPU-intensive logon activity, followed by a stabilization plateau.

Figure 10: Example of Plateau Phases

Effect of Typing Rate on CPU Utilization

Increasing the typing rate in these tests increases CPU utilization and affects scalability: higher typing rates correspond to fewer users.

In the standard tests, the Structured Task Worker scenario has a typing rate of approximately 60 words per minute (WPM), and the Knowledge Worker has a typing rate of 35 WPM. Note that the Gartner Group does not specify typing rates in the worker definitions. To test the effect of altering the typing rate, each scenario was run twice, once at 35 WPM and once at 60 WPM. As Figure 11 below shows, the higher typing rate corresponds to fewer users before the canary time reaches ten percent above the baseline time.

Figure 11: Effect of Typing Speed on Scalability

Although typing rate affects the results, the two scenarios have other characteristics that also affect scalability. The Structured Task Worker script spends less time in each application than the Knowledge Worker script when both are run at the same typing speed. In addition, the Structured Task Worker opens and closes applications as it moves between different tasks. The Knowledge Worker, on the other hand, keeps applications open all the time and switches between them.

These results indicate that in real-world situations, the expected typing rate of users should be taken into consideration when sizing a system. In addition, users who open and close applications (instead of switching between them) and users who move quickly between tasks will place a heavier load on a system.
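To put the two typing rates in perspective, the Python sketch below converts words per minute into keystrokes per second and the average delay between keystrokes. It assumes the common convention of five characters per word; the paper does not state how the scripts define a word. Because all processing happens on the server, each keystroke is an input event the terminal server must handle, so the sketch also multiplies by the number of sessions.

```python
CHARS_PER_WORD = 5  # common typing-test convention; assumed, not taken from the scripts


def keystrokes_per_second(wpm: float) -> float:
    """Keystrokes per second implied by a words-per-minute typing rate."""
    return wpm * CHARS_PER_WORD / 60


def keystroke_delay_ms(wpm: float) -> float:
    """Average delay between keystrokes, in milliseconds."""
    return 1000 / keystrokes_per_second(wpm)


def server_keystroke_rate(wpm: float, sessions: int) -> float:
    """Total keystrokes per second arriving at the server from all sessions."""
    return keystrokes_per_second(wpm) * sessions


if __name__ == "__main__":
    for wpm in (35, 60):
        print(f"{wpm} WPM: {keystroke_delay_ms(wpm):.0f} ms between keys, "
              f"{server_keystroke_rate(wpm, 50):.1f} keys/sec from 50 sessions")
```

At 60 WPM this gives 5 keystrokes per second (one every 200 ms), compared with about 2.9 per second (about 343 ms apart) at 35 WPM, roughly 1.7 times as many input events per session.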

Effect of Remote Desktop Protocol Encryption

In the Windows 2000 Server family, the default Terminal Services Remote Desktop Protocol encryption level is Medium, which provides 2-way encryption using RSA Security's RC4 encryption algorithm with a 56-bit key. The Remote Desktop Protocol can also be configured to use 128-bit encryption when the Windows 2000 High Encryption Pack is installed. The pack is available at http://www.microsoft.com/windows2000/downloads/recommended/encryption/default.asp

Note that this requires high-encryption RDP clients to be installed on each client computer after the pack is applied. Tests were performed to measure the impact of using 128-bit (High) encryption on the Knowledge Worker and Structured Task Worker scenarios, with the maximum user results shown in Figure 12.

Figure 12: Effect of adding Windows 2000 High Encryption Pack

Effect of Remote Desktop Protocol Compression

Tests performed on pre-release versions of Windows 2000 Terminal Services indicated that RDP compression does not have a significant impact on server capacity. It is for this reason that RDP compression is enabled by default when the Terminal Services Client application is started.

Effect of Background Spelling and Grammar Checking

Based on the results of previous tests, background grammar checking was disabled in Microsoft Word for the Knowledge Worker and Structured Task Worker scenarios. Background grammar checking had a significant negative impact on scalability, reducing the number of users supported in the four-way Knowledge Worker scenario by about half. Microsoft is currently investigating this issue. If you disable background grammar checking, users can still check grammar in the foreground by pressing F7 in Word.

Effect of Changes to Default Settings

To achieve a manageable test environment, certain changes were made to the default settings of the operating system and applications. However, these settings were changed one at a time, and tests were run to ensure that disabling certain options did not produce results that would otherwise be unachievable.

In the baseline tests, Microsoft Word had the AutoSave option and the Allow Background Saves option disabled, to make the test environment easier to manage. Enabling these options for a one-time test did not have a significant effect on performance.

In addition, in the baseline tests Clipboard Mapping, which allows the server and client clipboards to be shared, was disabled so that several scripts could run simultaneously on each computer without interfering with one another. Running a single test on a pre-release build with this setting enabled did not have a significant impact on scalability.

Effect of Kernel Address Space Limitations

The 32-bit Windows platform is named after its 32-bit address space, meaning that up to 2^32 bytes (4 GB) can be addressed at any one time, regardless of physical RAM(7). By default, 2 GB of this address space is allocated to user-mode processes, and 2 GB is allocated to the kernel. Although each user-mode process has its own separate 2 GB region of address space, most of the 2 GB kernel area is global and remains the same regardless of which user-mode process is currently active.

The 2 GB of kernel area contains all system data structures and information. Therefore, the 2 GB kernel address space area can impose a limit on the number of system data structures and the amount of kernel information that can be stored on a system, regardless of physical memory.

Two types of data share a portion of this 2 GB address area: paged pool allocations, which are memory allocations made by kernel-mode components, and kernel stack allocations, which are the stacks the kernel creates for each thread to use when that thread makes system calls. Paged pool allocations are made in the Paged Pool area, and kernel stack allocations are made in the System PTE area.

Although these different allocations share the same area, the partition between them is fixed at boot: if the system runs out of space in one of those areas, the other area cannot donate space to it, and applications may begin to encounter unexpected errors. Therefore, when a system experiences unexpected errors or cannot accept new logons without having some other resource limitation (such as CPU or disk), the cause is probably that the Paged Pool area or the System PTE area has run out of space. Because, by default, the System PTE area is sized to be as large as possible on a system with Terminal Services enabled, the limitation will usually be due to insufficient Paged Pool address space. Fortunately, the System PTE area can be configured to be smaller, which can alleviate the symptoms and permit more users.
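The consequence of a fixed partition can be shown with a small model. In the Python sketch below every number is hypothetical: the sizes of the Paged Pool and System PTE areas and the per-user consumption of each are placeholders, because the real values depend on the system and are diagnosed as described in Knowledge Base article 247904. The model also assumes that address space removed from the System PTE area becomes available to paged pool. It shows that the user limit is set by whichever area fills first.

```python
def max_users(paged_pool_mb: float, system_pte_mb: float,
              pool_per_user_mb: float, pte_per_user_mb: float) -> int:
    """Users that fit before either fixed kernel area is exhausted.

    Neither area can borrow from the other, so the smaller limit wins.
    All inputs are hypothetical placeholders, not measured values.
    """
    by_pool = paged_pool_mb // pool_per_user_mb
    by_pte = system_pte_mb // pte_per_user_mb
    return int(min(by_pool, by_pte))


if __name__ == "__main__":
    # Hypothetical area sizes (MB) and per-user costs (MB), for illustration only.
    pool, pte = 160.0, 640.0
    per_pool, per_pte = 0.5, 1.0
    print("default split:", max_users(pool, pte, per_pool, per_pte))
    # Assumed tuning: shrink the System PTE area by 100 MB and give it to paged pool.
    print("tuned split:  ", max_users(pool + 100, pte - 100, per_pool, per_pte))
```

With these placeholder values, the default split allows 320 users (paged pool is exhausted first), and moving 100 MB from the System PTE area to paged pool raises the limit to 520. If the System PTE area were already the binding constraint, shrinking it further would lower the limit instead.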

Diagnosing and Optimizing a Kernel Address Space Limited System

To determine whether your system has run out of one of these resources, and to learn the steps necessary for tuning the System PTE allocation, see Microsoft Knowledge Base article 247904 at http://support.microsoft.com/default.aspx?scid=KB;en-us;247904&sd=tech

You can also use the Kernel Tuning spreadsheet contained in the archive located at http://www.microsoft.com/windows2000/library/operations/terminal/loadscripts.asp

Performing Your Own Scaling Tests

To Test or Pilot?

The purpose of this document is to give the system administrator a starting point for his or her own sizing efforts. Unless you are prepared to spend large amounts of resources analyzing your users' work habits and capturing those actions in a simulated script, you will find it more effective to go into a 'pilot' mode after you have determined that your applications work in a Terminal Services environment.

Once you have chosen a server configuration as a starting point (based on this white paper's findings), you can gradually add users to determine the maximum number that a system configuration (terminal server/network architecture/infrastructure servers) can support.

It is recommended that you add users to the server in small batches (similar to the testing methodology used in this paper) to determine when the system slows down to an unacceptable level. Add these batches at intervals of hours or days rather than minutes. The performance impact of each batch is likely to appear only gradually, as the new users become familiar with the system.
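The batch-by-batch approach can be expressed as a simple loop. In this hypothetical sketch, `measure` stands in for whatever response-time observation you collect once each batch has settled in (in practice, after hours or days of use). The batch size, the 400 ms limit, and the toy response-time model are arbitrary example values.

```python
# Sketch of a staged pilot ramp-up. measure(), the batch size, and the
# response-time limit are hypothetical; in a real pilot each step spans
# hours or days, and measure() comes from monitoring or user feedback.
from typing import Callable


def find_user_limit(measure: Callable[[int], float], batch_size: int,
                    max_response_ms: float, max_users: int) -> int:
    """Add users in batches; return the last user count whose measured
    response time stayed within max_response_ms (0 if none did)."""
    accepted = 0
    users = batch_size
    while users <= max_users:
        if measure(users) > max_response_ms:
            break  # this batch pushed the system past the acceptable level
        accepted = users
        users += batch_size
    return accepted


def toy_measure(users: int) -> float:
    """Illustrative only: flat response time up to 120 users, then rising steeply."""
    return 150.0 if users <= 120 else 150.0 + (users - 120) * 20.0


print(find_user_limit(toy_measure, batch_size=10, max_response_ms=400.0,
                      max_users=300))  # 130
```

With this toy model, response time stays acceptable up to 130 users, so the sketch reports 130 as the last good step.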

As a precaution, it is a good idea to have an identical secondary server available in case the first one experiences a hardware failure. However, avoid testing the effects of load balancing at first unless you are using it purely for failover. Once you have settled on the terminal server configuration, you can expand the scenario by testing load balancing.

As an aid to understanding the various factors involved when running applications on a terminal server, the following items should also be taken into consideration.

Determining Application Suitability

If some or all of your desktops are capable of running the application locally, consider using application distribution technology such as Windows 2000 Professional and IntelliMirror management technologies, or Microsoft Systems Management Server. It is a better use of resources to run a frequently used productivity application on a LAN-connected, Windows-based PC than on a terminal server attached to the same LAN. Applications that make extensive use of graphics or multimedia (such as Windows Media Player, voice recognition, or CAD applications) are not suited to running on a terminal server. They may not scale effectively, or they may not work at all. Other factors also determine an application's suitability for use on a terminal server, such as how it writes to the screen and whether it uses large amounts of CPU while idle or while the user is typing.

However, if your application is frequently updated, must be accessed from non-Windows desktops, or manipulates large amounts of data over a low-bandwidth connection, it may be a good candidate for running on a terminal server.

If you determine that a terminal server is the most practical way to distribute the application, consider running only the application on the terminal server rather than the entire desktop. This can save significant resources on the terminal server and may allow many more users to log on simultaneously.

Characterization of Users

Usage patterns have a significant impact on terminal server performance and should be considered carefully when sizing a terminal server. User behavior affects a terminal server differently than it affects a traditional Windows-based PC. In a PC-centric architecture, the speed at which a user types has no significant impact on CPU utilization. The same cannot be said for a terminal server. Each character typed on the client requires processing on the server, and many users can be typing at once, so the users' typing rate has a significant effect on scalability. Other factors also affect overall system responsiveness, such as whether all of your users log on at the same time of day and how often they take breaks.
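To get a feel for the load that typing alone generates, the arithmetic below converts words per minute into keystrokes per second. It uses the common convention of five characters per word. The 60 and 35 WPM rates are the ones used by the Structured Task Worker and Knowledge Worker scripts in Appendix B. The figure of 100 concurrent typists is a hypothetical example.

```python
# Rough arithmetic: keystrokes per second that the server must process when
# users type concurrently. Assumes the common 5-characters-per-word convention;
# the user count is a hypothetical example.
CHARS_PER_WORD = 5


def keystrokes_per_second(wpm: float, concurrent_typists: int) -> float:
    per_user = wpm * CHARS_PER_WORD / 60.0  # characters per second for one user
    return per_user * concurrent_typists


print(keystrokes_per_second(60, 100))            # 500.0
print(round(keystrokes_per_second(35, 100), 1))  # 291.7
```

On a PC, each of these keystrokes is handled locally. On a terminal server, all of them arrive at one machine, which is why typing rate appears in the sizing equation at all.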

Network Utilization

Understanding the network environment is especially important when designing a terminal server solution that involves WAN communications. Even infrequent network slowdowns can result in unacceptable performance for terminal server users. Latency (the time it takes a packet to reach the other end of the network) and bandwidth (the amount of data that can travel over the network in a given period of time) are equally important. Everything a user sees on the screen is generated by the server. As a result, high latency seriously affects the perceived responsiveness of the system, while low bandwidth increases the time it takes to deliver large chunks of data (for example, bitmaps) to the user's screen. The users' typing rate, the amount of graphics used in an application, and how many users are working at any one time over a WAN connection all factor into the question, "How many users can I connect to a terminal server over a given connection?" The only safe way to answer it is to test in real life. However, if latency over your WAN connection is low, you can use the data from Figure 8 to estimate the average network bandwidth required by each user. Keep in mind that the user experience depends heavily on sufficient bandwidth being available when the application writes large amounts of information to the screen. Connecting over a low-bandwidth connection has no significant impact on the terminal server's own scaling; its effect is on what the user experiences.
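Once you have a per-user average from Figure 8, a back-of-the-envelope check of a WAN link might look like the sketch below. The 20 kbps per-user figure, the 1,544 kbps link, and the 50% headroom held back for bursts such as large bitmaps are assumptions, not results from these tests. The calculation also ignores latency entirely, so it can rule a configuration out but cannot confirm that one will work.

```python
# Back-of-the-envelope WAN capacity check. per_user_kbps is a placeholder for
# the average read from Figure 8; the link speed and the headroom reserved
# for bursts are assumptions. Latency is not modeled.
def max_users_on_link(link_kbps: float, per_user_kbps: float,
                      headroom: float = 0.5) -> int:
    """Users whose *average* traffic fits in the non-reserved share of the link."""
    usable = link_kbps * (1.0 - headroom)
    return int(usable // per_user_kbps)


# Example: a 1,544 kbps (T1) link and a hypothetical 20 kbps per-user average.
print(max_users_on_link(1544, 20))  # 38
```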


Appendix A: Example Performance Charts

Example of Processor Utilization

Figure 13 shows the average processor utilization recorded when running Terminal Services client sessions on an Express5800 2-way server.

Figure 13: Average Processor Utilization by Scenario on an Express5800 2-Way Server


Example of Network Utilization

Figure 14 shows the average network utilization recorded when running Terminal Services client sessions on an Express5800 2-way server.

Figure 14: Average Network Utilization by Scenario on an Express5800 2-Way Server


Example of Paging on a Memory Limited System

Figure 15 shows paging activity on a memory-limited system. Figure 16 shows the corresponding total working set size. Three zones are visible: the first starts at the beginning of the chart, the second at approximately 1 hour 30 minutes, and the third at approximately 2 hours 50 minutes.

Taken together, the two charts show the following. Zone 1 shows a small amount of paging out and almost no paging in, because most memory pages are still in physical RAM. Zone 2 shows a considerable amount of paging out but very little paging in; this is the stage in which the unused portions of the working sets are trimmed. Zone 3 shows activity similar to zone 2, except toward the end, where the number of pages being paged in increases considerably. If this condition is sustained, system performance degrades dramatically.
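The warning sign in zone 3 is a page-in rate that stays high rather than spiking briefly. One hypothetical way to flag it is to look for a run of consecutive logged Memory\Pages Input/sec samples above a threshold. The function name `sustained_page_in`, the threshold, the window length, and the sample values below are made up for illustration.

```python
# Sketch: detect a sustained high page-in rate in logged samples of
# Memory\Pages Input/sec. The threshold and window length are hypothetical
# tuning values.
from typing import Optional, Sequence


def sustained_page_in(samples: Sequence[float], threshold: float,
                      window: int) -> Optional[int]:
    """Return the index where `window` consecutive samples above `threshold`
    begin, or None if no such run occurs."""
    run_start = None
    run_len = 0
    for i, value in enumerate(samples):
        if value > threshold:
            if run_len == 0:
                run_start = i  # a new run above the threshold begins here
            run_len += 1
            if run_len >= window:
                return run_start
        else:
            run_len = 0  # a brief spike ends; reset the run
    return None


pages_input = [2, 3, 80, 5, 4, 60, 75, 90, 110]  # made-up per-interval readings
print(sustained_page_in(pages_input, threshold=50, window=3))  # 5
```

The isolated spike at index 2 is ignored, and the sustained rise starting at index 5 is reported.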

Figure 15: Example of Paging on a Memory Limited System


Figure 16: Total Working Set Size on a Memory Limited System



Appendix B: Test Script Flow Charts

Structured Task Worker Script

Typing speed = 60 WPM

Definition: Workers who are typically a link in a workflow or process and who perform the same tasks repetitively. These workers are driven in their daily jobs by a set process rather than by ad-hoc projects. Cost of downtime varies; most of these workers are only partially dependent on computer availability. Claims processing, accounts payable, accounts receivable, customer service, high-end manufacturing, high-end maintenance, and repair are examples of tasks performed by structured task workers.

[Flow chart: Structured Task Worker script]

Knowledge Worker Script

Typing speed = 35 WPM

Definition: A worker who gathers, adds value to, and communicates information in a decision support process. Cost of downtime is variable but highly visible. These workers are driven by projects and ad-hoc needs that call for flexible tasks, and they decide for themselves what to work on and how to accomplish it. Typical tasks include marketing, project management, sales, desktop publishing, decision support, data mining, financial analysis, executive and supervisory management, design, and authoring.

[Flow chart: Knowledge Worker script, in four parts]

Data Entry Worker Script

Typing speed = N/A

Definition: Workers who enter data into computer systems, such as transcriptionists, typists, and order entry, clerical, and manufacturing workers.

[Flow chart: Data Entry Worker script]


Appendix C: Terminal Server Settings

Operating System Installation

All drives formatted using NTFS

Components

Terminal Services enabled in Application server mode

All other components disabled except Accessories and Utilities, Network Monitor Tools and SNMP under Management and Monitoring Tools

Networking left at default with Typical Network Settings

Server joined as a member of a Windows NT 4.0 domain

Page file initial and maximum size set to 4092 MB

Maximum registry size set to 256 MB

RDP protocol client settings:

Clipboard mapping, printer mapping and LPT mapping disabled

Office 2000 Settings

Office 2000 installed using default Terminal Server transforms file from Office 2000 Resource Kit (termsrvr.mst)

Outlook Settings

Mailbox on Exchange server.

Email Options

AutoSave of messages disabled

Automatic name checking disabled

AutoArchive disabled

Word Settings

Background grammar checking disabled

Background saves disabled

Save AutoRecover information disabled

Printer Settings

HP LaserJet 6P created to print to NUL:

Print Notification messages disabled

Spooler information event logging disabled

User Profiles

Configuration script executed to pre-create cached profiles and run through the Internet Connection Wizard

Performance Logger

Performance counters are logged on the terminal server itself


Appendix D: Express5800 Server Specifications

Express5800 HV8600

Number of processors: 8 (SMP)

Type of processor: Pentium III Xeon 500 MHz

Integrated L1 cache: 32 KB

L2 cache: 2 MB ECC

Front side bus speed (FSB): 100 MHz

Memory: 4 GB ECC

RAID controller: Mylex, 16 MB cache, write-back

Internal storage: 18 GB striped array on 3 drives

Express5800 HX4600

Number of processors: 4 (SMP)

Type of processor: Pentium III Xeon 500 MHz

Integrated L1 cache: 32 KB

L2 cache: 2 MB ECC

Front side bus speed (FSB): 100 MHz

Memory: 4 GB ECC

RAID controller: Mylex, 16 MB cache, write-back

Internal storage: 18 GB striped array on 3 drives

Express5800 MC2400 (also used for 1-way tests)

Number of processors: 2 (SMP)

Type of processor: Pentium III 450 MHz

Integrated L1 cache: 32 KB

L2 cache: 512 KB ECC

Front side bus speed (FSB): 100 MHz

Memory: 1 GB ECC

RAID controller: Optional

Maximum internal storage: 182 GB (3 x 36.4 GB + 4 x 18.2 GB)


For More Information

For more information about Windows 2000 Terminal Services, see the Exploring Terminal Services Web site at

http://www.microsoft.com/windows2000/terminalservices

For detailed server specifications and up-to-date model numbers, visit these Web sites:

http://www.servers.bull.com/express5800

http://www.nec-computers.com

1 Kernel was tuned using the procedure described in the section "Diagnosing and Optimizing a Kernel Address Space Limited System."

2 System was kernel address space limited, even after tuning the kernel.

3 Scenario not tested with a tuned kernel, because the 2-way configuration was still kernel address space limited after the kernel was tuned. Therefore, no additional users would be able to log on if this server had the same amount of RAM as the 2-way server.

4 This server was tested in a 2-way configuration with one processor disabled using the /numproc=1 boot.ini switch. It was therefore using a multiprocessor kernel and HAL rather than a uniprocessor kernel and HAL.

5 Because of a limitation in the testing simulation tools, there was no canary timer script running for the Data Entry Worker Dedicated (DEWD) scenario. As the standard Data Entry Worker was canary limited, it was assumed that the DEWD would have also been canary limited running on the same hardware.

6 TCO Manager for Distributed Computing 4.0

7 Some customers will have systems with more than 4 GB of RAM, using the Physical Address Extension (PAE) available on later Intel processors and Windows 2000 Advanced Server or Datacenter Server. This extends physical memory only. Such systems still use 32-bit virtual addresses internally, and these are mapped to 36-bit physical addresses so that the system can address all physical RAM. As a result, the system still has the same limitations on the Paged Pool and System PTE areas.
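The arithmetic behind footnote 7 is short. PAE widens physical addresses to 36 bits, but virtual addresses remain 32 bits, so the kernel's share of virtual address space does not grow. That share is where Paged Pool and System PTEs live. The sketch assumes the default even split of the 4 GB virtual space between user mode and kernel mode.

```python
# Address-space arithmetic for PAE: physical addressing grows to 36 bits,
# but the 32-bit virtual space, and therefore the kernel's 2 GB share under
# the default even split, stays the same.
GB = 2**30

virtual_space = 2**32              # bytes addressable with 32-bit virtual addresses
physical_with_pae = 2**36          # bytes addressable with 36-bit physical addresses
kernel_share = virtual_space // 2  # default 2 GB system portion (assumed even split)

print(virtual_space // GB, physical_with_pae // GB, kernel_share // GB)  # 4 64 2
```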

