Chapter 6 - TCP/IP Implementation Details
The Transmission Control Protocol/Internet Protocol (TCP/IP) suite is a set of networking protocols that govern how data passes between networked computers. Microsoft has adopted TCP/IP as the strategic enterprise network transport for its platforms. Microsoft 32-bit TCP/IP for Windows NT is a high-performance, portable, 32-bit implementation of the industry-standard TCP/IP protocol suite.
This chapter is intended for network engineers and support professionals who are already familiar with TCP/IP or who have read the TCP/IP chapters in the Networking Supplement for Windows NT Server version 4.0. This chapter provides additional technical detail about Microsoft 32-bit TCP/IP as implemented in Windows NT.
This chapter discusses the following topics:
This chapter concludes with a summary of TCP/IP changes in Windows NT Server version 4.0.
Network traces are used throughout this chapter to help illustrate concepts. These traces were gathered and formatted using Microsoft Network Monitor, a software-based protocol tracing and analysis tool included in the Microsoft Systems Management Server product. All the IP addresses in these traces have been replaced with the IP addresses for the fictitious company Terra Flora.
Note The base code described in this chapter is shared by all Microsoft 32-bit TCP/IP protocol stacks, including TCP/IP for Windows NT Server, Windows NT Workstation, and Windows 95. However, there are small differences in implementation, configuration methods, and available services. This chapter describes the implementation, configuration, and available services for Microsoft 32-bit TCP/IP for Windows NT.
Architecture of Microsoft TCP/IP for Windows NT
TCP/IP is the primary protocol of the Internet and intranets (networks that connect enterprise-wide local area networks). By using TCP/IP, you can communicate with computers running Windows NT, with devices that use other Microsoft networking products, and with computers running non-Microsoft operating systems (such as UNIX computers).
The TCP/IP protocol suite in Windows NT Server and Windows NT Workstation was completely redesigned beginning with Windows NT version 3.5. It is a high-performance, portable, 32-bit implementation of the industry-standard TCP/IP protocol suite and is easy to administer. Microsoft 32-bit TCP/IP for Windows NT provides support for standard TCP/IP features including:
The TCP/IP protocol suite for Windows NT Server and Windows NT Workstation is comprised of:
The Windows NT-based networking services include, but are not limited to, the following:
The following figure illustrates the Windows NT TCP/IP architecture model and shows the core protocol elements and the interfaces between protocol elements and services.
Figure 6.1 The Windows NT TCP/IP Architecture Model
Note Specifications and programming information for Microsoft 32-bit TCP/IP are included in the Windows NT Device Driver Kit (DDK). The Transport Driver Interface (TDI) and the Network Driver Interface Specification (NDIS) are public specifications available from Microsoft.
Network Driver Interface Specification
Microsoft networking protocols (including TCP/IP) communicate with network card drivers using the Network Driver Interface Specification (NDIS). Windows NT Server version 4.0 uses NDIS version 4.0. The following figure illustrates the NDIS interface and the layers below the NDIS interface.
Figure 6.2 NDIS Interface
The NDIS interface (Ndis.sys) provides basic services used by the protocol modules. The protocol driver uses the NDIS interface to send raw data packets over a network device and to receive notification of incoming packets received by the network interface card (NIC). NDIS allows the protocol components to function independently of the NIC.
Any NDIS-compliant protocol driver can communicate with any NDIS-compliant NIC.
Network Adapter Card Bindings
The NDIS layer supports binding, a process that establishes the communication channel between a protocol driver (such as TCP/IP) and a network card, in the following ways:
The NDIS specification describes the communications technique known as multiplexing, which allows transmitting a number of separate signals simultaneously over a single channel or line and which supports the different ways of binding a protocol driver and network card.
To improve networking performance, you can manually change the default bindings on a computer running under Windows NT by using the Bindings tab in the Control Panel—Network dialog box. You can choose to disable a binding, enable a binding, or redefine the default binding search order.
Media Access Control Layer
Because the NDIS interface handles raw packets, the protocol stack is normally responsible for building each frame, including Media Access Control (MAC) layer headers. This means that the protocol stack must explicitly support each media type. Microsoft 32-bit TCP/IP for Windows NT versions 4.0 and 3.5x provides support for:
In addition, there are now asynchronous transfer mode (ATM) adapters available for Windows NT. The drivers for these adapters use "LAN emulation" to appear to the protocol stack as a supported media type, such as Ethernet.
Link layer functionality is divided between the combination (binding) of the NIC driver and the low-level protocol stack driver. The binding of the NIC driver and low-level protocol stack driver creates filters based on the destination MAC address of each frame. Normally, the hardware filters out all incoming frames except those containing one of the following destination addresses:
Because this first filtering decision is made by the hardware, all frames not meeting the filter criteria are discarded by the NIC without any CPU processing. All frames (including broadcasts) that do pass the hardware filter are passed up to the NIC driver through a hardware interrupt.
A NIC driver is software on the local computer, and any frames that reach the NIC require some CPU time to process. The NIC driver brings the frame into system memory from the NIC. Then the frame is passed up to the appropriate bound transport drivers, in this case, TCP/IP. The NDIS specification provides more detail on this process.
Most NICs can also be placed in a non-selective (promiscuous) mode. A NIC in this mode does not perform any address filtering on frames that appear on the media. Instead, it passes upward every frame that passes the cyclic redundancy check (CRC). This feature is used by some protocol analysis software, such as Microsoft Network Monitor.
Frames are passed upward to all bound transport drivers, in the order of the binding. By default, the binding order is the alphabetical order of their key names in the Registry.
As a frame traverses a network or series of networks, the source MAC address is always that of the NIC that placed it on the media, and the destination MAC address is that of the NIC that is intended to pull it off the media. This means that in a routed network, the source and destination MAC addresses change with each "hop" through a network-layer device (router).
Maximum Transfer Unit
Each media type has a maximum frame size that cannot be exceeded. The link layer is responsible for discovering this maximum transfer unit (MTU) and reporting it to the protocols above. The protocol stack can query NDIS drivers for the local MTU. The upper layer protocols, such as TCP, use the MTU to automatically optimize packet sizes for each media type. See the section "Internet Control Message Protocol" later in this chapter for related information about path maximum transfer unit (PMTU) discovery.
If a NIC driver such as an ATM driver is using LAN-emulation mode, it may report that it has an MTU higher than what is expected for that media type. For example, it may emulate Ethernet but report an MTU of 9180 bytes. Windows NT accepts and uses the MTU size reported by the adapter even when it exceeds the normal MTU for a given media type.
Core Protocol Stack Components
The following figure illustrates the core protocol stack components in the Tcpip.sys driver which exists between the NDIS and TDI interfaces.
Figure 6.3 Core Protocol Stack Components
Address Resolution Protocol
Address Resolution Protocol (ARP) performs IP address-to-MAC-address resolution for outgoing packets. As each outgoing IP datagram is encapsulated into a frame, source and destination MAC addresses must be added. Determining the destination MAC address for each frame is the responsibility of ARP.
ARP compares the destination IP address on every outbound IP datagram to the ARP cache for the NIC over which the frame will be sent. If there is a matching entry, the MAC address is retrieved from the cache. If not, ARP broadcasts an ARP Request packet onto the local subnet, requesting that the owner of the IP address in question reply with its MAC address. If the packet is going through a router, ARP resolves the MAC address for that next-hop router rather than for the final destination host. When an ARP reply is received, the ARP cache is updated with the new information, and it is used to address the packet at the link layer.
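The lookup described above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not the Tcpip.sys implementation. The subnet mask 255.255.255.0, the choice of 172.16.112.1 as default gateway, and the send_arp_request callable are assumptions introduced for the example.

```python
import ipaddress

def next_hop(dest_ip, local_ip, netmask, default_gateway):
    """Return the IP address whose MAC address ARP must resolve."""
    local_net = ipaddress.ip_network(f"{local_ip}/{netmask}", strict=False)
    if ipaddress.ip_address(dest_ip) in local_net:
        return dest_ip           # on the local subnet: resolve the destination itself
    return default_gateway       # routed: resolve the next-hop router instead

def resolve_mac(target_ip, arp_cache, send_arp_request):
    """Consult the per-interface cache first; broadcast only on a miss.

    send_arp_request is a hypothetical callable that broadcasts an ARP
    request and returns the MAC address from the reply.
    """
    mac = arp_cache.get(target_ip)
    if mac is None:
        mac = send_arp_request(target_ip)
        arp_cache[target_ip] = mac   # the reply updates the cache
    return mac

# Cache contents mirror the first interface in the arp -a output in this section.
cache = {"172.16.112.1": "00-00-0c-1a-eb-c5",
         "172.16.112.124": "00-dd-01-07-57-15"}
hop = next_hop("172.17.1.1", "172.16.112.123", "255.255.255.0", "172.16.112.1")
mac = resolve_mac(hop, cache, send_arp_request=lambda ip: None)
```

In this sketch a datagram for 172.17.1.1 is addressed at the link layer to the router's MAC address, because the destination is not on the local subnet.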
You can view, add, or delete entries in the ARP cache using the arp utility (see the following examples). Entries added manually are static, and do not get aged out of the cache like dynamic entries do.
To view the ARP cache, type the following command; you will get results similar to those in the following example:
C:\>arp -a
Interface: 172.16.112.123
  Internet Address      Physical Address      Type
  172.16.112.1          00-00-0c-1a-eb-c5     dynamic
  172.16.112.124        00-dd-01-07-57-15     dynamic
Interface: 172.16.113.190
  Internet Address      Physical Address      Type
  172.16.113.138        00-20-af-1d-2b-91     dynamic
In these examples, the computer is multihomed, meaning that it has multiple NICs, so there is a separate ARP cache for each interface. The arp -s command can be used to add a static entry to the ARP cache used by the second interface, for the host whose IP address is 172.16.90.32 and whose NIC address is 00-60-8C-0E-6C-6A, as shown in the following example:
C:\>arp -s 172.16.90.32 00-60-8c-0e-6c-6a 172.16.48.190
C:\>arp -a
Interface: 172.16.112.123
  Internet Address      Physical Address      Type
  172.16.112.1          00-00-0c-1a-eb-c5     dynamic
  172.16.112.124        00-dd-01-07-57-15     dynamic
Interface: 172.16.48.190
  Internet Address      Physical Address      Type
  172.16.80.138         00-20-af-1d-2b-91     dynamic
  172.16.90.32          00-60-8c-0e-6c-6a     static
ARP Cache Aging
Windows NT versions 4.0 and 3.5x adjust the size of the ARP cache automatically. Entries are aged out of the ARP cache if they are not used by any outgoing datagrams for two minutes. Entries that are being referenced get aged out of the ARP cache after 10 minutes. Entries added manually are not aged out of the cache. A new registry parameter, ArpCacheLife, was added to allow more administrative control over aging. This parameter is described in online Registry Help.
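The aging rules above can be expressed as a small predicate. The following Python sketch assumes that "not used" is measured from the last outgoing datagram that referenced the entry and that the 10-minute limit is measured from when the entry was created. The ArpEntry class and its field names are hypothetical, and the real stack's timer granularity may differ.

```python
UNUSED_LIFETIME = 2 * 60     # seconds an unreferenced entry survives
MAX_LIFETIME = 10 * 60       # seconds any dynamic entry survives

class ArpEntry:
    """Hypothetical cache entry; times are seconds on a monotonic clock."""
    def __init__(self, mac, now, static=False):
        self.mac = mac
        self.created = now
        self.last_used = now
        self.static = static

def is_expired(entry, now):
    """Return True if the entry should be aged out of the cache."""
    if entry.static:
        return False                                  # manual entries never age out
    if now - entry.last_used >= UNUSED_LIFETIME:
        return True                                   # idle for two minutes
    return now - entry.created >= MAX_LIFETIME        # in use, but ten minutes old
```

An entry that keeps being referenced therefore survives the two-minute check repeatedly but is still removed, and re-resolved on next use, once it is ten minutes old.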
Entries can be deleted from the cache using arp -d as shown next:
C:\>arp -d 172.16.90.32
C:\>arp -a
Interface: 172.16.112.123
  Internet Address      Physical Address      Type
  172.16.112.1          00-00-0c-1a-eb-c5     dynamic
  172.16.112.124        00-dd-01-07-57-15     dynamic
Interface: 172.16.112.190
  Internet Address      Physical Address      Type
  172.16.112.138        00-20-af-1d-2b-91     dynamic
ARP will queue one outbound IP datagram to a given destination address while that IP address is being resolved to a MAC address. If a UDP-based program sends multiple IP datagrams to a single destination address without any pauses between them, some of the datagrams may be dropped if there is no ARP cache entry already present.
Internet Protocol
Internet Protocol (IP) is the "mailroom" of the TCP/IP stack, where packet sorting and delivery take place. At this layer, each incoming or outgoing packet is known as a datagram. Each IP datagram bears the source IP address of the sender and the destination IP address of the intended recipient. Unlike the MAC addresses, the IP addresses in a datagram remain the same throughout a packet's journey across an internetwork. The following sections describe the IP layer functions.
Routing is the primary function of IP. Datagrams are handed to the IP protocol from UDP and TCP above, and from the NIC(s) below. Each datagram is labeled with a source and destination IP address. The IP protocol examines the destination address on each datagram, compares it to a locally maintained route table, and decides what action to take. There are three possibilities for each datagram:
The route table maintains four different types of routes. They are listed here in the order in which they are searched for a match:
You can view the route table from the command prompt, as shown in the following example of the command and its results.
The preceding route table is for a computer with the IP address 172.16.112.123. It contains seven entries, as follows:
On this host, if a packet is sent to 172.16.112.122, the table is first scanned for a host route (not found), then for a subnet route (not found), then for a network route (that is found). The packet is sent by using the local interface 172.16.112.123. If a packet is sent to 172.17.1.1, the same search is used, and no host, subnet, or network route is found. In this case, the packet is directed to the default gateway, by inserting the MAC address of the default gateway into the destination MAC address field.
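The search order described above (most specific route first, default gateway last) behaves like a longest-prefix match. The following Python sketch illustrates that idea with a small, hypothetical route table. The entries, the /24 prefix length, and the gateway address 172.16.112.1 are assumptions made for the example and are not the table from this chapter.

```python
import ipaddress

# (destination, gateway, interface): illustrative entries only
ROUTES = [
    ("0.0.0.0/0",         "172.16.112.1",   "172.16.112.123"),  # default gateway
    ("127.0.0.0/8",       "127.0.0.1",      "127.0.0.1"),       # loopback
    ("172.16.112.0/24",   "172.16.112.123", "172.16.112.123"),  # local network
    ("172.16.112.123/32", "127.0.0.1",      "127.0.0.1"),       # host route to self
]

def lookup(dest, routes=ROUTES):
    """Return (destination, gateway, interface) of the most specific match."""
    addr = ipaddress.ip_address(dest)
    matches = []
    for net, gateway, interface in routes:
        network = ipaddress.ip_network(net)
        if addr in network:
            matches.append((network, gateway, interface))
    if not matches:
        return None                      # no route and no default gateway
    network, gateway, interface = max(matches, key=lambda m: m[0].prefixlen)
    return str(network), gateway, interface
```

With this table, lookup("172.16.112.122") selects the local network route and the local interface, while lookup("172.17.1.1") matches nothing more specific than 0.0.0.0/0 and is handed to the default gateway.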
The route table is maintained automatically in most cases. When a host initializes, entries for the local network(s), loopback, multicast, and configured default gateway are added. More routes may appear in the table as the IP layer learns of them. For example, a computer may receive a message from the default gateway that indicates (using ICMP, as explained later) a better route to a specific network, subnet, or host. Routes also may be added manually by using the route command. In Windows NT versions 4.0 and 3.5x, the -p (persistent) switch can be used with the route command to specify permanent routes. Permanent routes are stored in the Registry under:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\PersistentRoutes
Note In Windows NT version 3.5, manually added routes were temporary and were deleted from the table when the computer was restarted.
Most routers use a protocol such as RIP (Routing Information Protocol) or OSPF (Open Shortest Path First) to exchange routing tables with each other.
The Multi-Protocol Router (MPR) in Windows NT consists of the following:
Routers use RIP to dynamically exchange routing information. A Windows NT router running RIP dynamically exchanges routing information with other routers that are running RIP.
The Windows NT router uses the BOOTP relay agent to forward DHCP requests to DHCP servers on other subnets. This allows one DHCP server to service multiple IP subnets.
Note By default, computers running under Windows NT do not behave as routers. You must install MPR after installing TCP/IP on your computer. MPR is included with Windows NT Server and Windows NT Workstation version 4.0. MPR for Windows NT version 3.51 is available from ftp.microsoft.com, and is included with Service Pack 3 and later under the "MPR" directory. Windows NT version 3.5 or earlier, when used as a router, does not include support for RIP.
Routing for Multiple Logical Subnets
When running multiple logical subnets on the same physical network, use the following command to tell IP to treat all subnets as local and to use ARP directly for the destination:
route add 0.0.0.0 MASK 0.0.0.0 <my local ip address>
Thus, packets destined for "non-local" subnets will be transmitted directly onto the local media instead of being sent to a router. In essence, the local interface card can be designated as the default gateway. This might be useful where several class "C" networks are being used on one physical network with no router to the outside world.
Duplicate IP Address Detection
Duplicate address detection is an important feature. When the stack is first initialized, a "gratuitous" ARP Request is broadcast for the IP address(es) of the local host. If another computer replies, the IP address is already in use. When this happens, the Windows NT computer will still start; however, IP on the offending interface is disabled, a system log entry is generated, and a popup error message is displayed. If the computer that is "defending" the address is also a Windows NT computer, a system log entry is generated and a popup error message is displayed there; however, its interface will continue to operate. After transmitting the ARP reply, the "defending" computer ARPs for its own address again so that other hosts on the network will maintain the correct mapping for the address in their ARP caches.
A computer using a duplicate IP address may be started while it is not attached to the network, in which case no conflict would be detected at that point. However, if it is then plugged into the network, the first time that it ARPs for another IP address, any Windows NT computer with a conflicting address will detect the conflict. The computer detecting the conflict will display a popup error message and log a detailed event in the system log. The following is a sample event log entry:
** The system detected an address conflict for IP address 172.16.48.123 with the system having network hardware address 00:DD:01:0F:7A:B5. Network operations on this system may be disrupted as a result. **
When a computer is configured with more than one IP address, it is known as a multihomed computer. The different types of multihoming are:
When an IP datagram is sent from a multihomed host, it will be handed down to the interface card with the best apparent route to the destination. Accordingly, the datagram may bear the source IP address of one interface in the multihomed host, yet be placed on the media by a different NIC. The source MAC address on the frame will be that of the NIC that actually transmitted the frame onto the media, and the source IP address will be the one that the sending application sourced it from, not necessarily one of those associated with the sending NIC in the configuration screens in the network control panel.
Routing problems may arise when a computer is multihomed with NICs attached to disjoint networks (networks that are separate from and unaware of each other, such as one connected by using RAS). In this scenario, it is often necessary to set up static routes to remote networks.
More details on name registration and resolution with multihomed computers are provided in the section "NetBIOS over TCP/IP" later in this chapter.
Classless Interdomain Routing
Classless Interdomain Routing (CIDR), also known as supernetting, can be used to consolidate several class C network addresses into one logical network. CIDR is described in RFC 1518/1519. To use supernetting, the IP network addresses that are to be combined must share the same high-order bits, and the subnet mask is "shortened" to take bits away from the network portion of the address and add them to the host portion.
This is best explained with an example. The network addresses 172.16.16.0, 172.16.32.0, and 172.16.48.0 share the same 18 high-order bits, so they can be combined by using a subnet mask of 255.255.192.0 for each:
NET 172.16.16 (1010 1100 . 0001 0000 . 0001 0000 . 0000 0000)
NET 172.16.32 (1010 1100 . 0001 0000 . 0010 0000 . 0000 0000)
NET 172.16.48 (1010 1100 . 0001 0000 . 0011 0000 . 0000 0000)
MASK 255.255.192.0 (1111 1111 . 1111 1111 . 1100 0000 . 0000 0000)
When routing decisions are made, only the bits covered by the subnet mask are used, thus making these addresses all appear to be part of the same network for routing purposes. Any routers in use must also support CIDR and may require special configuration.
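The masking step can be checked mechanically. The following Python sketch uses the standard ipaddress module with four illustrative networks (192.168.4.0 through 192.168.7.0, chosen for this example and not taken from the chapter) to show how contiguous networks collapse into one supernet and how the shortened mask makes their addresses compare equal.

```python
import ipaddress

# Four contiguous /24 networks (illustrative addresses).
networks = [ipaddress.ip_network(f"192.168.{n}.0/24") for n in (4, 5, 6, 7)]

# collapse_addresses merges adjacent networks that share high-order bits.
supernets = list(ipaddress.collapse_addresses(networks))
supernet = supernets[0]
mask = supernet.netmask

def network_part(address, netmask):
    """Return only the bits of the address that are covered by the mask."""
    return int(ipaddress.ip_address(address)) & int(ipaddress.ip_address(netmask))

# Hosts in different /24 networks look like one network under the shorter mask.
same_network = (network_part("192.168.4.10", str(mask))
                == network_part("192.168.7.200", str(mask)))
```

Here the four networks collapse to 192.168.4.0/22, whose mask is 255.255.252.0, two bits shorter than the 255.255.255.0 mask of each individual network.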
IP multicasting is used to provide efficient multicast services to clients that may not be located on the same network segment. Windows Sockets programs can join a multicast group. For more information, see the section "Using IP Multicasts with Windows Sockets Programs," later in this chapter.
Windows NT versions 4.0 and 3.5x are level-2 (send and receive) compliant with RFC 1112. IGMP is the protocol used to manage IP multicasting.
Internet Control Message Protocol
Internet Control Message Protocol (ICMP) is a maintenance protocol specified in RFC 792 and is normally considered to be part of the IP layer. ICMP messages are encapsulated within IP datagrams, so they can be routed throughout an internetwork. ICMP is used by Windows NT to:
Maintaining Route Tables
When a Windows NT computer is initialized, the route table normally contains only a few entries. One of those specifies a default gateway. Datagrams that have a destination IP address with no match in the route table are sent to the default gateway.
However, because routers share information about network topology with each other, the default gateway may know of a better route to a given address. When this is the case, upon receiving a datagram that could be taking the better path, the router forwards the datagram normally, then advises the sender of the better route using an ICMP redirect message.
These messages can specify redirection for one host, a subnet, or for an entire network. When a Windows NT computer receives an ICMP redirect, a check is performed to be sure that it came from the first-hop gateway in the current route, and that the gateway is on a directly connected network. If so, the route table is adjusted accordingly.
If the ICMP redirect did not come from the first-hop gateway in the current route, or if that gateway is not on a directly connected network, then the ICMP redirect is ignored.
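The two acceptance checks can be summarized as a short predicate. This Python sketch is an interpretation of the rule stated above, not the actual stack logic. The function name and parameters are hypothetical.

```python
import ipaddress

def accept_redirect(sender_ip, current_first_hop, connected_networks):
    """Return True if an ICMP redirect should update the route table.

    sender_ip          -- source address of the ICMP redirect
    current_first_hop  -- gateway currently used for the affected route
    connected_networks -- networks that local interfaces are attached to
    """
    sender = ipaddress.ip_address(sender_ip)
    if sender != ipaddress.ip_address(current_first_hop):
        return False                     # not from the gateway in use: ignore
    # The gateway must also be on a directly connected network.
    return any(sender in ipaddress.ip_network(net) for net in connected_networks)
```

For example, a redirect from 172.16.112.1 is accepted only if 172.16.112.1 is the current first hop and a local interface is attached to a network such as 172.16.112.0/24.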
Path Maximum Transfer Unit Discovery
TCP uses Path Maximum Transfer Unit (PMTU) discovery. The mechanism relies on ICMP destination unreachable messages.
Using ICMP to Diagnose Problems
The ping utility is used to send ICMP echo requests to an IP address, and wait for ICMP echo responses. Ping reports on the number of responses received and the time interval between sending the request and receiving the response. There are many different options that can be used with the ping utility.
Tracert is a route-tracing utility that can be very useful. Tracert works by sending ICMP echo requests to an IP address with a time-to-live (TTL) value in the IP header that starts at 1 and is incremented by one for each successive request, and by analyzing the ICMP errors that are returned. Each succeeding echo request should get one hop further into the network before the TTL field reaches 0 and an ICMP Time Exceeded error is returned by the router attempting to forward it. Tracert simply prints out an ordered list of the routers in the path that returned these error messages. If the -d switch is used (meaning do not do a DNS lookup on each IP address), then the IP address of the near-side interface of each router is reported.
Adjusting Flow Control by Using ICMP
If a host is sending datagrams to another computer at a rate that is saturating the routers or links between them, it may receive an ICMP Source Quench message asking it to slow down. The TCP/IP stack in Windows NT honors a source quench message as long as it contains the header fragment of one of its own datagrams from an active TCP connection. If a Windows NT computer is being used as a router, and it is unable to forward datagrams at the rate they are arriving, it drops any datagrams that cannot be buffered but does not send ICMP source quench messages to the senders.
Internet Group Management Protocol
Windows NT versions 4.0 and 3.5x provide level-2 (full) support for IP multicasting as specified in RFC 1112. The introduction to RFC 1112 provides a good overall summary of IP multicasting. The text reads:
IP multicasting is the transmission of an IP datagram to a "host group", a set of zero or more hosts identified by a single IP destination address. A multicast datagram is delivered to all members of its destination host group with the same "best-efforts" reliability as regular unicast IP datagrams, i.e., the datagram is not guaranteed to arrive intact at all members of the destination group or in the same order relative to other datagrams. The membership of a host group is dynamic; that is, hosts may join and leave groups at any time. There is no restriction on the location or number of members in a host group. A host may be a member of more than one group at a time. A host need not be a member of a group to send datagrams to it.
A host group may be permanent or transient. A permanent group has a well-known, administratively assigned IP address. It is the address, not the membership of the group, that is permanent; at any time a permanent group may have any number of members, even zero. Those IP multicast addresses that are not reserved for permanent groups are available for dynamic assignment to transient groups that exist only as long as they have members.
Internetwork forwarding of IP multicast datagrams is handled by "multicast routers" that may be co-resident with, or separate from, Internet gateways. A host transmits an IP multicast datagram as a local network multicast that reaches all immediately-neighboring members of the destination host group. If the datagram has an IP time-to-live greater than 1, the multicast router(s) attached to the local network take responsibility for forwarding it towards all other networks that have members of the destination group. On those other member networks that are reachable within the IP time-to-live, an attached multicast router completes delivery by transmitting the datagram as a local multicast.
IP/ARP Extensions for IP Multicasting
To support IP multicasting, an additional route is defined. The route (added by default) specifies that if a datagram is being sent to a multicast host group, it should be sent to the IP address of the host group by using the local interface card, not forwarded to the default gateway. The following route (which can be seen with the route print command) illustrates this:
Host group addresses are easily identified, because they are from the class D range, 224.0.0.0 to 239.255.255.255. These IP addresses all have "1110" as their high-order 4 bits.
To send a packet to a host group using the local interface, the IP address must be resolved to a MAC address. From RFC 1112:
An IP host group address is mapped to an Ethernet multicast address by placing the low-order 23 bits of the IP address into the low-order 23 bits of the Ethernet multicast address 01-00-5E-00-00-00 (hex). Because there are 28 significant bits in an IP host group address, more than one host group address may map to the same Ethernet multicast address.
For example, a datagram addressed to the multicast address 224.0.0.5 would be sent to the (Ethernet) MAC address 01-00-5E-00-00-05. This MAC address is formed by joining 01-00-5E and the 23 low-order bits of 224.0.0.5 (00-00-05). Because more than one host group address might map to the same Ethernet multicast address, the NIC may indicate up some multicasts for a host group for which no local programs have registered interest. These extra multicasts are discarded. Finally, the protocol stack must provide a means of joining and leaving host groups.
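The RFC 1112 mapping is simple bit arithmetic, as the following Python sketch shows. The function name multicast_mac is a hypothetical helper introduced for this illustration.

```python
import ipaddress

ETHERNET_MULTICAST_BASE = 0x01005E000000   # 01-00-5E-00-00-00

def multicast_mac(group):
    """Map an IP host group address to its Ethernet multicast MAC address."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not a class D (multicast) address")
    low23 = int(addr) & 0x7FFFFF           # keep the low-order 23 bits
    mac = ETHERNET_MULTICAST_BASE | low23
    # Format the 48-bit value as six hyphen-separated hex octets.
    return "-".join(f"{(mac >> shift) & 0xFF:02X}" for shift in range(40, -8, -8))
```

Because only 23 of the 28 significant bits are carried into the MAC address, groups such as 224.0.0.5 and 224.128.0.5 map to the same Ethernet address, which is why the stack must discard multicasts for groups that no local program has joined.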
Using IP Multicasts with Windows Sockets Programs
IP multicasting is currently supported only on AF_INET sockets of type SOCK_DGRAM. By default, IP multicast datagrams are sent with a time-to-live (TTL) of 1. The setsockopt() call can be used by a program to specify a TTL. By convention, multicast routers use TTL thresholds to determine how far to forward datagrams. The following table lists the TTL thresholds that are used to determine how far to forward multicast datagrams.
Table 6.1 Time-to-Live Thresholds for Windows Sockets Programs
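As an illustration of the setsockopt() call mentioned above, the following sketch uses Python's socket module, which wraps the same Sockets options. The helper name and the TTL value used in the usage note are arbitrary choices for the example and are not values taken from Table 6.1.

```python
import socket

def make_multicast_sender(ttl):
    """Create a UDP socket whose outgoing multicast datagrams use the given TTL."""
    # Multicasting applies to AF_INET sockets of type SOCK_DGRAM.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Override the default multicast TTL of 1 for this socket only.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock
```

A program would call, for example, make_multicast_sender(32) and then use sendto() with a host group address and port. The setting affects only multicast datagrams sent on that socket.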
Use of IGMP by Windows NT Components
At the time of this writing, the only Windows NT component that uses IGMP is Windows Internet Name Service (WINS), which attempts to locate replication partners by using multicasting.
Transmission Control Protocol
Transmission Control Protocol (TCP) provides a connection-based, reliable, byte-stream service to programs. Microsoft networking relies upon the TCP transport for logging on, file and print sharing, replication of information between domain controllers, transfer of browse lists, and other common functions. TCP can only be used for one-to-one communications. TCP uses a checksum on both the headers and data of each segment to reduce the chance of network corruption going undetected.
Size Calculation of the TCP Receive Window
The TCP receive window size is the amount of receive data (in bytes) that can be buffered at one time on a connection. The sending host can send only that amount of data before waiting for an acknowledgment (ACK) and window update from the receiving host.
The TCP/IP stack is designed to tune itself in most environments. Instead of using a hard-coded default receive window size, TCP adjusts to even increments of the maximum segment size (MSS) negotiated during connection setup.
Matching the receive window to even increments of the MSS increases the percentage of full-sized TCP segments used during bulk data transmission. The following defaults are used for receive window size: TcpWindowSize = 8K rounded up to the nearest MSS increment for the connection; if that is not at least 4 times the MSS, then it is adjusted to 4 times the MSS, with a maximum size of 64K.
Note The maximum window size is 64K because the field in the TCP header is 16 bits in length. RFC 1323 describes a TCP window scale option that can be used to obtain larger receive windows; however Windows NT TCP/IP does not yet implement that option.
For Ethernet, the window will normally be set to 8760 bytes (8192 rounded up to six 1460-byte segments); for 16/4 Token Ring or FDDI, it will be around 16K. These are default values and it's not generally advisable to alter them; however, you can either change the registry parameter TcpWindowSize to globally change the setting for the computer, or use the setsockopt() Windows Sockets call to change the setting on a per-socket basis.
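The default calculation can be reproduced with a few lines of arithmetic. The following Python sketch assumes that 8K means 8192 bytes and that the 64K ceiling is the largest value a 16-bit field can hold (65535). The function name is hypothetical.

```python
def default_receive_window(mss, base=8192, max_window=65535):
    """Compute the default receive window for a connection with the given MSS."""
    # Round the 8K base up to the next whole multiple of the MSS.
    window = -(-base // mss) * mss
    # Make sure the window holds at least four full-sized segments.
    if window < 4 * mss:
        window = 4 * mss
    # The 16-bit window field caps the result.
    return min(window, max_window)

ethernet_window = default_receive_window(1460)   # MTU 1500 minus 40 header bytes
```

For Ethernet this yields six 1460-byte segments, or 8760 bytes. For a medium with an MSS larger than about 2K, the 4-times-MSS rule takes over, which is why the Token Ring and FDDI defaults come out near 16K.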
Per RFC 1122, TCP uses delayed acknowledgments to reduce the number of packets sent on the media. The Microsoft stack takes a common approach to implementing delayed acknowledgments. The following conditions cause an acknowledgment to be sent as data is received by TCP on a given connection:
In summary, normally an ACK is sent for every other TCP segment received on a connection, unless the delayed ACK timer (200ms) expires. There is no configuration parameter to disable delayed ACKs.
RFC 1191 describes PMTU discovery. When a connection is established, the two hosts involved exchange their TCP MSS values. The smaller of the two MSS values is used for the connection. The MSS for a computer is usually the MTU at the link layer minus 40 bytes for the IP and TCP headers.
When TCP segments are destined to a non-local network, the "don't fragment" bit is set in the IP header. Any router or media along the path may have an MTU that differs from that of the two hosts.
If a link is encountered with an MTU that is too small for the IP datagram being routed, the router attempts to fragment the datagram accordingly and finds that the "don't fragment" bit in the IP header is set. At this point, the router should inform the sending host with an ICMP destination unreachable message that the datagram can't be forwarded further without fragmentation. Most routers will also specify the MTU that is allowed for the next hop by putting the value for it in the low-order 16 bits of the ICMP header field that is labeled "unused" in the ICMP specification. See RFC 1191, section 4, for the format of this message.
Upon receiving this ICMP error message, TCP adjusts its MSS for the connection to the specified MTU minus the TCP and IP header size, so that any further packets sent on the connection will be no larger than the maximum size that can traverse the path without fragmentation. The minimum MTU permitted by RFCs is 68 bytes, and this limit is enforced by Windows NT TCP.
Some non-compliant routers may silently drop IP datagrams that cannot be fragmented, or may not correctly report their next-hop MTU. If this occurs, it may be necessary to make a configuration change to the PMTU detection algorithm. There are two registry changes that can be made to the TCP/IP stack to find and correct errors caused by these problematic routers:
The PMTU between two computers can be discovered by manually using ping with the -f (do not fragment) switch as follows:
ping -f -n <number of pings> -l <size> <destination ip address>
In the preceding example, the size parameter can be varied until the MTU is found. Note that the size parameter used by ping is the size of the data buffer to send, not including headers. The ICMP header consumes 8 bytes, and the IP header would normally be 20 bytes. In the following case (Ethernet), the link layer MTU is the maximum-sized ping buffer plus 28, or 1500 bytes:
C:\temp>ping -f -n 1 -l 1472 172.16.48.03
Pinging 172.16.48.03 with 1472 bytes of data:
Reply from 172.16.48.03: bytes=1472 time<10ms TTL=30
C:\temp>ping -f -n 1 -l 1473 172.16.48.03
Pinging 172.16.48.03 with 1473 bytes of data:
Packet needs to be fragmented but DF set
In the preceding example, the router returned an ICMP error message which ping interpreted for us. If the router had been a "black hole" router, the ping would simply not be answered once its size exceeded the MTU that the router could handle. Ping can be used in this manner to detect such a router.
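The manual procedure of varying the ping size can be expressed as a binary search. The following Python sketch assumes a hypothetical probe function that reports whether a "do not fragment" ping with a given data size was answered; here the probe is simulated rather than sent on a network, and the search assumes that the smallest candidate MTU (68 bytes) always succeeds.

```python
ICMP_HEADER = 8   # bytes
IP_HEADER = 20    # bytes, IPv4 header without options

def ping_size_for_mtu(mtu):
    """Size of the ping data buffer that exactly fills a datagram of this MTU."""
    return mtu - IP_HEADER - ICMP_HEADER

def find_path_mtu(probe, low=68, high=1500):
    """Return the largest MTU in [low, high] for which probe(size) is True.

    probe is a hypothetical callable: it takes a ping data size and returns
    True if a ping sent with the DF bit set was answered.
    """
    while low < high:
        mid = (low + high + 1) // 2
        if probe(ping_size_for_mtu(mid)):
            low = mid          # this size fits; try something larger
        else:
            high = mid - 1     # too big for some hop; try something smaller
    return low

def simulated_path(size):
    """Simulated path whose narrowest link has an MTU of 1500 bytes."""
    return size + IP_HEADER + ICMP_HEADER <= 1500

print(ping_size_for_mtu(1500))        # 1472
print(find_path_mtu(simulated_path))  # 1500
```

When the probe is a real ping, an unanswered probe may indicate either a "fragmentation needed" error or a black hole router; the search result is the same in both cases.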
A sample ICMP destination unreachable error message is as follows:
+ FRAME: Base frame properties
+ FDDI: Length = 77
+ LLC: UI DSAP=0xAA SSAP=0xAA C
+ SNAP: ETYPE = 0x0800
+ IP: ID = 0x0; Proto = ICMP; Len: 56
ICMP: Destination Unreachable, Destination: 172.16.112.125
ICMP: Packet Type = Destination Unreachable
ICMP: Unreachable Code = Fragmentation Needed, DF Flag Set
ICMP: CheckSum = 0x8ABF
ICMP: Data: Number of data bytes remaining = 28 (0x001C)
00000: 50 00 60 8C 14 C7 0E 00 00 0C 1A EB C0 AA AA 03
00010: 00 00 00 08 00 45 00 00 38 00 00 00 00 FF 01 D3
00020: 36 C7 C7 2C 01 C7 C7 2C FE 03 04 8A BF 00 00 05
00030: C7 45 00 05 F8 55 24 40 00 1F 01 1B D7 C7 C7 2C
00040: FE C7 C7 28 7D 08 00 00 75 01 00 63 00
Network Monitor did not parse the MTU suggestion in this frame, but it appears in the hex portion of the trace as the bytes 05 C7 at offsets 0x2F and 0x30. This error was generated by using ping -f -l 2000 on an FDDI-based host to send a large datagram through a router to an Ethernet host. When the router tried to place the large frame onto the Ethernet segment, it found that fragmentation was not allowed, and so it returned the error message indicating that the largest datagram that could be forwarded is 0x5c7, or 1479 bytes.
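The MTU suggestion can be extracted from the first eight bytes of the ICMP message in the trace. The following Python sketch uses the layout given in RFC 1191 (type, code, checksum, an unused 16-bit field, and the 16-bit next-hop MTU); the function name is hypothetical.

```python
import struct

# The eight ICMP header bytes copied from the trace (offsets 0x29 to 0x30).
icmp_header = bytes.fromhex("03 04 8A BF 00 00 05 C7")

def next_hop_mtu(header):
    """Return the next-hop MTU from a 'fragmentation needed' ICMP header."""
    icmp_type, code, checksum, unused, mtu = struct.unpack("!BBHHH", header[:8])
    # Type 3 is destination unreachable; code 4 is fragmentation needed, DF set.
    if (icmp_type, code) != (3, 4):
        raise ValueError("not a fragmentation-needed message")
    return mtu

print(next_hop_mtu(icmp_header))   # 1479
```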
Dead Gateway Detection
Microsoft TCP/IP provides dead gateway detection. Dead gateway detection allows TCP to detect failure of the default gateway and to make an adjustment to the IP routing table to use another default gateway.
Dead gateways are detected by using TCP retries. The Microsoft TCP/IP stack uses the triggered reselection method as described in RFC 816.
TCP will attempt to send a packet to the default gateway configured on a computer until it receives an acknowledgment or until one-half of the TcpMaxDataRetransmissions registry parameter is reached. If no response is received from the default gateway and multiple gateways are configured on the computer, TCP requests that IP switch to the next default gateway in the list.
Note If the computer running Windows NT Server or Windows NT Workstation is a DHCP client, the default gateway is automatically configured on the computer.
To add additional default gateways or to configure gateways for non-DHCP configured computers
IP utilities such as ping do not trigger the dead gateway detection process. They use the current default gateway. If TCP detects a dead gateway and selects a new one, the IP utilities will then function using the new gateway. By default, dead gateway detection is set to "on" when you configure a computer running under Windows NT with the IP address of more than one gateway.
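The switch-over rule can be modeled roughly as follows. This Python sketch is a simplified illustration, not the stack's actual logic: the class and method names are hypothetical, "one-half" of TcpMaxDataRetransmissions is assumed to round down, and wrapping from the last gateway back to the first is also an assumption.

```python
class GatewaySelector:
    """Toy model of dead gateway detection driven by TCP retransmissions."""

    def __init__(self, gateways, max_data_retransmissions=5):
        self.gateways = list(gateways)
        self.current = 0
        self.retries = 0
        # Assumption: half of the registry value, rounded down, at least 1.
        self.threshold = max(1, max_data_retransmissions // 2)

    def report_ack(self):
        """An acknowledgment arrived, so the current gateway is working."""
        self.retries = 0

    def report_retransmission(self):
        """Count a retransmission and return the gateway to use next."""
        self.retries += 1
        if self.retries >= self.threshold and len(self.gateways) > 1:
            # Assumption: advance through the list, wrapping at the end.
            self.current = (self.current + 1) % len(self.gateways)
            self.retries = 0
        return self.gateways[self.current]

selector = GatewaySelector(["172.16.48.1", "172.16.48.2"])
print(selector.report_retransmission())   # still the first gateway
print(selector.report_retransmission())   # switched to the second gateway
```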
TCP starts a retransmission timer when each outbound segment is handed down to IP. If no acknowledgment has been received for the data in a given segment before the timer expires, then the segment is retransmitted, up to the value of the TcpMaxDataRetransmissions registry parameter. The default value for this parameter is 5.
The retransmission timer is initialized to three seconds when a TCP connection is established; however, it is adjusted "on the fly" to match the characteristics of the connection using smoothed round trip time (SRTT) calculations as described in RFC 793. The timer for a given segment is doubled after each retransmission of that segment. Using this algorithm, TCP tunes itself to the "normal" delay of a connection. TCP connections over high-delay links will take much longer to time out than those over low-delay links.
Note Adding 1 to the registry parameter TcpMaxDataRetransmissions approximately doubles the total retransmission time-out period for all connections.
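The effect of the doubling timer on the total time-out can be worked out with a short Python sketch. The initial timer of one-half second is an illustrative value for a low-delay connection, and the function name is hypothetical; the model simply doubles the timer after every retransmission and includes the final wait before the connection is aborted.

```python
def retransmission_schedule(initial_timer, max_retransmissions=5):
    """Return (waits, total) for a segment that is never acknowledged.

    waits[0] is the wait before the first retransmission; the last entry is
    the wait after the final retransmission, after which TCP gives up.
    """
    waits = [initial_timer * 2 ** i for i in range(max_retransmissions + 1)]
    return waits, sum(waits)

waits, total = retransmission_schedule(0.5, 5)
print(waits)   # [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
print(total)   # 31.5

_, longer = retransmission_schedule(0.5, 6)
print(longer)  # 63.5, roughly double the total for the default value
```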
The following trace clip shows the retransmission algorithm for two hosts connected over Ethernet on the same subnet. An FTP file transfer was in progress when the receiving host was disconnected from the network. Since the SRTT for this connection is very small, the first retransmission is sent after about one-half second. The timer is then doubled for each of the retransmissions that followed. After the fifth retransmission, the timer is once again doubled, and if no acknowledgment is received before it expires, the transfer is aborted.
TCP Keepalive Messages
A TCP keepalive packet is simply an ACK with the sequence number set to one less than the current sequence number for the connection. A computer receiving one of these ACKs should respond with an ACK for the current sequence number. Keepalives can be used to verify that the computer at the remote end of a connection is still available. TCP keepalives can be sent once every KeepAliveTime (defaults to 7,200,000 milliseconds or two hours), if no other data or higher level keepalives have been carried over the TCP connection. If there is no response to a keepalive, it is repeated once every KeepAliveInterval (defaults to 1,000 milliseconds or one second). NetBT connections, such as those used by many Microsoft networking components, send NetBIOS keepalives more frequently, and so normally no TCP keepalives will be sent on a NetBIOS connection. TCP keepalives are disabled by default, but Windows Sockets programs may enable them using setsockopt().
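As an illustration of enabling keepalives from a sockets program, the following sketch uses Python's socket module, which exposes the same SO_KEEPALIVE option that a Windows Sockets program would set through setsockopt(). The helper name is hypothetical, and the socket is never connected; the example only sets and reads back the option.

```python
import socket

def enable_keepalive(sock):
    """Turn on TCP keepalives for this socket and confirm the setting."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    print(enable_keepalive(s))
```

The keepalive timing itself is still governed by the KeepAliveTime and KeepAliveInterval parameters described above.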
Slow Start Algorithm and Congestion Avoidance
When a connection is initially established, TCP starts sending at a slow rate to assess the bandwidth of the connection and to avoid overflowing the receiving host or any other devices or links in the path. The send window is initially set to two TCP segments.
If those segments are acknowledged, the send window is incremented, and so on until the amount of data being sent per burst reaches the size of the receive window on the remote host. At that point, the slow start algorithm is no longer in use and flow control is governed by the receive window on the remote host.
However, at any time during transmission, congestion could still occur on a connection. If this happens (evidenced by the need to retransmit), a congestion avoidance algorithm is used to reduce the send window size temporarily, and then to slowly increment the send window back towards the receive window size.
Note Slow start and congestion avoidance are discussed in RFC 1122.
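The window growth described in this section can be traced with a deliberately simplified Python model. It follows the description above rather than the full algorithms in the RFCs: the window starts at two segments and grows by one segment for each acknowledged burst until it reaches the receive window. Halving the window when a retransmission is needed is an assumption made for the illustration, and the function name is hypothetical.

```python
def send_window_trace(receive_window, bursts, loss_at=None):
    """Return the send window (in segments) used for each burst.

    receive_window: receive window of the remote host, in segments.
    loss_at: index of a burst that needs retransmission, or None.
    """
    window = 2                      # initial send window: two segments
    trace = []
    for burst in range(bursts):
        trace.append(window)
        if burst == loss_at:
            # Assumption: congestion cuts the window in half (minimum two).
            window = max(2, window // 2)
        elif window < receive_window:
            window += 1             # acknowledged burst: grow by one segment
    return trace

print(send_window_trace(6, 8))             # [2, 3, 4, 5, 6, 6, 6, 6]
print(send_window_trace(6, 8, loss_at=4))  # [2, 3, 4, 5, 6, 3, 4, 5]
```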
Silly Window Syndrome
Silly Window Syndrome (SWS) is described in RFC 1122 as follows:
In brief, SWS is caused by the receiver advancing the right window edge whenever it has any new buffer space available to receive data and by the sender using any incremental window, no matter how small, to send more data [TCP:5]. The result can be a stable pattern of sending tiny data segments, even though both sender and receiver have a large total buffer space for the connection.
TCP/IP for Windows NT implements SWS avoidance per RFC 1122 by not sending more data until there is a sufficient window size advertised by the receiving end to send a full segment. It also implements SWS avoidance on the receive end of a connection by not opening the receive window in increments of less than a TCP segment.
TCP/IP for Windows NT Server and Windows NT Workstation implements the Nagle algorithm described in RFC 896. The purpose of this algorithm is to reduce the number of "tiny" segments sent, especially on high-delay (remote) links. The Nagle algorithm allows only one small segment to be outstanding at a time without acknowledgment. If more small segments are generated while awaiting the ACK for the first one, then these segments are coalesced into one larger segment. Any full-sized segment is always transmitted immediately, assuming there is a sufficient receive window available. The Nagle algorithm is effective in reducing the number of packets sent by interactive programs, such as Telnet, especially over slow links.
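The coalescing behavior can be illustrated with a small Python simulation. It is a simplified model, not the stack's code: every write is a single byte, the acknowledgment for a segment is assumed to arrive exactly one round-trip time after the segment is sent, and delayed acknowledgments are ignored. The function name and the tick values are hypothetical.

```python
def nagle_segments(write_times, rtt):
    """Simulate one-byte writes under the Nagle algorithm.

    write_times: sorted times at which the program writes one byte.
    rtt: time from sending a segment until its ACK arrives.
    Returns a list of (send_time, bytes_in_segment).
    """
    segments = []
    pending = 0        # bytes held back while a small segment is unacknowledged
    ack_due = None     # arrival time of the ACK for the outstanding segment
    for t in write_times:
        # Process ACKs that arrived before this write; each ACK releases
        # the bytes that were held back, as one coalesced segment.
        while ack_due is not None and ack_due <= t:
            if pending:
                segments.append((ack_due, pending))
                ack_due += rtt
                pending = 0
            else:
                ack_due = None
        if ack_due is None:
            segments.append((t, 1))    # nothing outstanding: send immediately
            ack_due = t + rtt
        else:
            pending += 1               # a small segment is outstanding: hold
    if pending:
        segments.append((ack_due, pending))
    return segments

# Twelve keystrokes, one per tick, over a link with a three-tick round trip.
result = nagle_segments(list(range(12)), rtt=3)
print(result)        # [(0, 1), (3, 2), (6, 3), (9, 3), (12, 3)]
print(len(result))   # 5 segments instead of 12
```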
The following trace, captured by using Microsoft Network Monitor, shows the Nagle algorithm at work. The trace was captured by using PPP to dial up an Internet provider at 9600 bps. A Telnet (character-mode) session was established, and then the "y" key was held down on the Windows NT Workstation. At all times, one segment was sent, and further "y" characters were held by the stack until an acknowledgment was received for the previous segment. In this example, three to four "y" characters were saved up each time and sent together in one segment. The Nagle algorithm resulted in a large savings in the number of packets sent: the count was reduced by a factor of about three.
Each segment contained several of the "y" characters. Following is the first segment shown more fully parsed, and the data portion is pointed out in the hex at the bottom.
Time Source IP Dest IP Prot Description
0.644 172.16.48.1 172.16.112.0 TELNET To Server From Port = 1901
+ FRAME: Base frame properties
+ ETHERNET: ETYPE = 0x0800 : Protocol = IP: DOD Internet Protocol
+ IP: ID = 0xEA83; Proto = TCP; Len: 43
+ TCP: .AP..., len: 3, seq:1032660278, ack: 353339017, win: 7766, src: 1901 dst: 23 (TELNET)
TELNET: To Server From Port = 1901
TELNET: Telnet Data
D2 41 53 48 00 00 52 41 53 48 00 00 08 00 45 00 .ASH..RASH....E.
00 2B EA 83 40 00 20 06 F5 85 CC B6 42 53 C7 B5 .+..@. .....BS..
A4 04 07 6D 00 17 3D 8D 25 36 15 0F 86 89 50 18 ...m..=.%6....P.
1E 56 1E 56 00 00 79 79 79 .V.V..yyy
                                 ^^^ data
Windows Sockets programs can disable the Nagle algorithm for their connection(s) by setting the TCP_NODELAY socket option. However, this practice should be avoided unless absolutely necessary because it increases network usage. Some network programs may not perform well if their design does not take into account the effects of transmitting large numbers of small packets and the Nagle algorithm.
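For reference, setting the option looks like this from Python's socket module, which passes TCP_NODELAY through to the underlying sockets implementation. The helper name is hypothetical, and the socket is not connected; the example only sets the option and reads it back.

```python
import socket

def disable_nagle(sock):
    """Set TCP_NODELAY on this socket and confirm that it took effect."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    print(disable_nagle(s))
```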
TCP is designed to provide optimum performance over varying link conditions. Actual throughput for a link is dependent on a number of variables, but the most important factors are:
TCP throughput calculation is discussed in detail in Chapters 20 through 24 of TCP/IP Illustrated, by W. Richard Stevens. The following are some key considerations:
To summarize, Windows NT TCP/IP will adapt to most network conditions and dynamically provide the best throughput and reliability possible on a per-connection basis. Attempts at manual tuning are often counter-productive unless a qualified network engineer performs careful study of data flow.
User Datagram Protocol
User Datagram Protocol (UDP) provides a connectionless, unreliable transport service. It is often used for one-to-many communications, using broadcast or multicast IP datagrams. Because delivery of UDP datagrams is not guaranteed, programs using UDP must supply their own mechanisms for reliability if needed. Microsoft networking uses UDP for logon, browsing, and name resolution.
UDP and Name Resolution
UDP is used for (1) NetBIOS name resolution by using unicast to a NetBIOS name server (such as WINS) or subnet broadcasts, and (2) for DNS host name and IP address resolution. NetBIOS name resolution is accomplished over UDP port 137. DNS queries use UDP port 53. Because UDP itself does not guarantee delivery of datagrams, both of these services use their own retransmission schemes if they receive no answer to queries. Broadcast UDP datagrams are not usually forwarded over IP routers, and so NetBIOS name resolution in a routed environment requires a name server of some type, or the use of static database files.
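Because UDP does not retransmit, a program that sends queries over UDP has to resend them itself. The following Python sketch shows one generic way to do that; it is not the scheme used by NetBT or the DNS resolver, and the function name, retry count, and time-out values are assumptions. The demonstration runs on the loopback interface against a socket that is bound but never answers, so every attempt times out.

```python
import socket

def query_with_retries(sock, payload, server, retries=3, timeout=0.5):
    """Send payload to server, resending on time-out.

    Returns (reply, attempts); reply is None if no answer ever arrived.
    """
    sock.settimeout(timeout)
    for attempt in range(1, retries + 1):
        sock.sendto(payload, server)
        try:
            reply, _addr = sock.recvfrom(2048)
            return reply, attempt
        except socket.timeout:
            continue          # no answer: resend, because UDP will not
    return None, retries

# A "server" that is bound to a port but never replies.
silent_server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
silent_server.bind(("127.0.0.1", 0))

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
reply, attempts = query_with_retries(
    client, b"query", silent_server.getsockname(), timeout=0.05)
print(reply, attempts)   # None 3

client.close()
silent_server.close()
```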
Mailslots over UDP
Many NetBIOS programs use mailslot messaging. A second-class mailslot is a simple mechanism for sending a message from one NetBIOS name to another over UDP. Mailslot messages may be broadcast on a subnet, or may be directed to the remote computer. In order to direct a mailslot message to another computer, there must be some method of NetBIOS name resolution available. The WINS server running under Windows NT Server provides this service.
TCP/IP Security Filters
Security filtering for TCP/IP allows you to control the type of network traffic passed up the TCP/IP protocol stack to upper-layer protocols and programs. Security filters are one of the security mechanisms typically used on Internet servers.
TCP/IP security filters control the ports on which TCP connections and UDP datagrams are accepted. (For more information, see Appendix B, "Port Reference for MS TCP/IP.") The filters also control which IP protocols can be accessed by using raw sockets.
If TCP/IP security filters are configured on a computer running under Windows NT, incoming connection requests and datagrams are accepted or rejected based on the configured security filters. Outgoing connection requests and datagrams are not affected.
Security filters are configured separately for each network adapter to which TCP/IP is bound. Filters are applied to network traffic based on the adapter that received the traffic.
For specific information about configuring advanced TCP/IP security, see Microsoft TCP/IP Help.
NetBIOS over TCP/IP
The following TCP and UDP ports are used in NetBT, the Windows NT implementation of NetBIOS over TCP/IP.
NetBIOS over TCP/IP is specified by RFC 1001 and RFC 1002. The Netbt.sys driver is a kernel-mode component that supports the TDI interface. Services such as Windows NT Workstation and Windows NT Server services use the TDI interface directly, while traditional NetBIOS programs have their calls mapped to TDI calls by using the Netbios.sys driver. Using TDI to make calls to NetBT is a more difficult programming task, but can provide higher performance and freedom from historical NetBIOS limitations.
See the section "Network Application Interfaces" later in this chapter for more information about NetBIOS.
Transport Driver Interface
Microsoft developed the transport driver interface (TDI) to provide greater flexibility and functionality than provided by existing interfaces such as NetBIOS and Windows Sockets. The TDI interface is exposed by all Windows NT transport providers. The TDI interface specification describes the set of primitive functions by which transport drivers and TDI clients communicate, and the call mechanisms used for accessing them. Currently, the TDI Interface is kernel-mode only.
The Windows NT redirector and server both use TDI directly, rather than going through the NetBIOS mapping layer. By doing so, they are not subject to many of the restrictions imposed by NetBIOS, such as the 254 session limit.
TDI may be the most difficult to use of all the Windows NT network APIs. It is a simple conduit, so the programmer must determine the format and meaning of messages.
Note More information on the TDI interface is available in the Windows NT Device Driver Kit (DDK).
The following features are part of the Windows NT implementation of TDI.
Network Application Interfaces
There are a number of ways that network programs can communicate by using the TCP/IP protocol stack. Some of them, such as named pipes, go through the network redirector, which is part of the Workstation service. Many older programs were written to the NetBIOS interface, which is supported by NetBIOS over TCP/IP. Windows Sockets is used in many programs.
The network application programming interfaces (APIs) discussed in this section are:
Windows Sockets Interface
Windows Sockets is an API used for sending and receiving data on a network. Originally designed as the top-level interface for TCP/IP network transport stacks, the Windows Sockets API provides a standard Windows interface to many transports with different addressing schemes, including, for example, TCP/IP and IPX.
Windows Sockets specifies a programming interface based on the "socket" interface from the University of California at Berkeley. It includes a set of extensions designed to take advantage of the message-driven nature of Microsoft Windows. Windows Sockets is an open, industry-standard specification and Microsoft is one member of the group that originally defined Windows Sockets.
There are many Windows Sockets programs available. A number of the utilities that ship with Windows NT are Windows Sockets-based; for example, the DHCP client/server program.
Note Windows NT version 4.0 implements 32-bit Windows Sockets version 2.0. Earlier versions of Windows NT implemented 32-bit Windows Sockets version 1.1. See Appendix D, "Windows Sockets," for a list of Microsoft and other Internet sites from which you can receive Windows Sockets specifications.
Name and Address Resolution
Windows Sockets programs generally use the gethostbyname() call to resolve a host name to an IP address. The gethostbyname() call uses the following (default) name lookup sequence:
Some programs use the gethostbyaddr() call to resolve an IP address to a host name. The gethostbyaddr() call uses the following sequence:
Support for IP Multicasting
The Windows Sockets API has been extended to provide support for IP multicasting. The extensions, and a sample program, party.exe, that illustrates usage, are available from ftp.microsoft.com. IP multicasting is currently supported only on AF_INET sockets of type SOCK_DGRAM.
The Backlog Parameter
Windows Sockets server programs generally create a socket and then use listen() to listen on it for connection requests. One of the parameters passed when calling listen() is the backlog of connection requests that the program would like Windows Sockets to queue for it.
Windows NT Server version 4.0 allows a backlog maximum of 200. Windows NT Workstation version 4.0 supports only a maximum allowable value of 5.
Note Earlier versions of Windows NT based on the Windows Sockets 1.1 specification used the specified maximum allowable value (5) for backlog.
FTP or Web servers that are heavily used may benefit from increasing the backlog to a larger number than the default. Microsoft Internet Information Server allows the backlog parameter to be specified by using a registry setting.
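The backlog is simply the argument passed to listen(). The following Python sketch shows a server socket requesting a backlog of 200; the helper name is hypothetical, the loopback address and port 0 (any free port) are chosen only so the example can run anywhere, and the system may silently reduce the requested value to its own maximum.

```python
import socket

def open_listener(backlog, host="127.0.0.1", port=0):
    """Create a TCP socket, bind it, and listen with the requested backlog."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, port))
    srv.listen(backlog)     # number of pending connection requests to queue
    return srv

listener = open_listener(200)
print(listener.getsockname())   # address and port chosen by the system
listener.close()
```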
PUSH Bit Interpretation
By default, Windows NT versions 4.0 and 3.5x complete a recv() call when:
If a client program is run on a computer with a TCP/IP implementation that does not set the PUSH bit on sends, response delays may result. It's best to correct this on the client side; however, a configuration parameter (IgnorePushBitOnReceives) is added to Afd.sys to force it to treat all arriving packets as though the PUSH bit were set.
Network Basic Input/Output System (NetBIOS) defines a software interface and a naming convention, not a protocol. The NetBEUI protocol, introduced by IBM in 1985, provided a protocol for programs designed around the NetBIOS interface. However, NetBEUI is a small protocol with no networking layer and because of this, it is not a routable protocol suitable for medium-to-large intranets.
NetBIOS over TCP/IP (NetBT) provides the NetBIOS programming interface over the TCP/IP protocol, extending the reach of NetBIOS client/server programs to the WAN and providing interoperability with various other operating systems. NetBT and NetBIOS are illustrated in the following figure.
Figure 6.4 NetBIOS over TCP/IP (NetBT) Component
The Windows NT Workstation service, Server service, Browser, Messenger, and Netlogon services are all direct NetBT clients that use the TDI to communicate with NetBT. Windows NT also includes a NetBIOS emulator. The emulator takes standard NetBIOS requests from NetBIOS programs and translates them to equivalent TDI primitives.
The NetBIOS namespace is flat, meaning that all names within a network must be unique. NetBIOS names are 16 characters in length. Resources are identified by NetBIOS names that are registered dynamically when computers start, services start, or users log on. Names can be registered as unique (one owner) or as group (multiple owner) names. A NetBIOS Name Query is used to locate a resource by resolving the name to an IP address.
Microsoft networking components, such as Windows NT Workstation and Windows NT Server services, allow the first 15 characters of a NetBIOS name to be specified by the user or administrator, but reserve the 16th character of the NetBIOS name to indicate a resource type (00-FF hex). See Appendix G, "NetBIOS Names."
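The 16-character layout can be sketched in Python as follows. The function name is hypothetical, and converting the name to uppercase and padding it with spaces to 15 characters are assumptions based on common Microsoft practice rather than statements from this chapter; the resource type value 0x20 is used only as an example.

```python
def netbios_name(name, resource_type=0x00):
    """Build a 16-byte NetBIOS name: 15 name characters plus a type byte."""
    if not 1 <= len(name) <= 15:
        raise ValueError("user-definable part must be 1 to 15 characters")
    if not 0x00 <= resource_type <= 0xFF:
        raise ValueError("resource type must be in the range 00-FF hex")
    # Assumption: names are uppercased and space-padded to 15 characters.
    padded = name.upper().ljust(15).encode("ascii")
    return padded + bytes([resource_type])

full_name = netbios_name("davemac4", 0x20)
print(len(full_name))    # 16
print(full_name[:15])    # b'DAVEMAC4       '
print(hex(full_name[15]))  # 0x20
```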
To identify the names registered on your local computer
NetBIOS Scope, also known as TCP/IP Scope, provides a method for adding a second element to the single-element NetBIOS computer name. The scope ID is a character string value that is appended to the NetBIOS name and is used for all NetBT communications from that computer. The character string can be multi-part—for example, "mydomain.mycompany.com".
Note Use of NetBIOS Scope is strongly discouraged if you are not already using it, or if you use Domain Name System (DNS) on your network.
By installation default, the NetBIOS Scope value is NULL. You can change the default value by entering a character string in the Scope ID on the WINS Address tab on the Microsoft TCP/IP Properties page. Note that the maximum length of the combined NetBIOS name and NetBIOS Scope ID is limited to 256 characters.
Note NetBIOS Scope should not be confused with DHCP Scope, which defines the group of IP addresses that the DHCP server can lease to client computers.
The effect of using a NetBIOS Scope ID, other than the default NULL value, is to isolate a group of computers on the network that can communicate only with other computers that are configured with the identical NetBIOS Scope ID. Use NetBIOS Scope only when it is necessary to isolate a group of computers that cannot communicate with other computers on the intranet.
Once configured on the local computer, NetBIOS Scope is automatically attached to all NetBIOS commands on that local computer. In other words, NetBIOS programs started on a computer using a NetBIOS Scope ID cannot "see" (receive messages from or send messages to) NetBIOS programs started by a process on a computer configured with a different NetBIOS Scope ID.
Several Windows NT-based programs, such as net logon and domain controller pass-through authentication, use NetBIOS names. Therefore, consider the effect of NetBIOS Scope ID if you decide to change the default NetBIOS Scope ID. Use the following guidelines:
NetBIOS Name Registration and Resolution
Windows NT versions 4.0 and 3.5x computers use several methods for locating NetBIOS resources:
Earlier implementations used only cache, broadcasts, and LMHOSTS files; however, in Windows NT versions 4.0 and 3.5x, a NetBIOS name server—the WINS server—was implemented, and modifications were made to allow NetBIOS programs to query the DNS namespace by appending configurable domain suffixes to a NetBIOS name.
NetBIOS name resolution order depends on the node type and computer configuration. The following node types are supported:
The many configurable options sometimes make it difficult to determine what name resolution methods to choose, and what name resolution order each configuration will use. The following flowcharts illustrate name resolution for the various node types and the relationships between the different Windows NT name resolution services.
Figure 6.5 NetBIOS Name Resolution Flowchart (part 1 of 3)
Figure 6.6 NetBIOS Name Resolution Flowchart (part 2 of 3)
Figure 6.7 NetBIOS Name Resolution Flowchart (part 3 of 3)
The NetBIOS name server provided with Windows NT Server is the Windows Internet Name Service (WINS) server. Most WINS clients are set up as h-nodes; in other words, they first attempt to register and resolve names by using a WINS server, and if that fails they try local subnet broadcasts. Using a name server to locate resources is generally preferable to broadcasting, for two reasons:
NetBT and DNS
It has always been possible to connect from one Windows NT computer to another using NetBT over the Internet and other TCP/IP networks. To do so, some means of name resolution (associating a name with the appropriate IP address) is used because an IP address is required to establish a connection.
NetBT is the name resolution service for Windows-based networking in TCP/IP. DNS is the traditional and widely used name resolution service for the Internet and other TCP/IP networks. Windows NT Server version 4.0 has expanded support for DNS by implementing a DNS server.
A DNS name is similar to a NetBIOS name in that it is a "friendly" name for a computer or other network device. However, the DNS name is based on a hierarchical naming structure (also known as the name space) that is more flexible than the flat structure of NetBIOS names. DNS computer names consist of two parts: a host name and a domain name, which when combined, form the fully qualified domain name (FQDN).
NetBIOS computer names are analogous to DNS host names; however, a DNS name can be as long as 255 characters, while the NetBIOS name is limited to 15 user-definable characters.
Note Under Windows NT, the DNS host name defaults to the NetBIOS computer name. Windows NT combines the NetBIOS computer name with the DNS domain name to form an FQDN by removing the 16th character in the NetBIOS name and appending a dot and the DNS domain name. If you want to change the default host name from the NetBIOS computer name, reconfigure TCP/IP by selecting the DNS page on the Microsoft TCP/IP Properties dialog box and changing the host name displayed on the DNS page.
It is now possible, in Windows NT 4.0, to connect to a NetBT resource by using an IP address, FQDN, or NetBIOS computer name. For example, if using the Event Viewer, when prompted to "select computer," you now can choose to enter an FQDN or IP address or NetBIOS computer name.
NetBIOS Name Registration and Resolution for Multihomed Computers
As mentioned earlier, NetBT only binds to one IP address per physical network interface. From the NetBT viewpoint, then, a computer is multihomed only when it has more than one NIC installed. When a name registration packet is sent from a multihomed computer, it is flagged as a "multihomed name registration" so that it will not conflict with the same name being registered by another interface in the same computer.
When a broadcast name query is received by a multihomed computer, all NetBT interface bindings receiving the query will respond with their address and, by default, the client will choose the first response and connect to the address supplied by it. This behavior can be controlled by the RandomAdapter registry parameter described in online Registry Help.
When a directed name query is sent to a WINS server, the WINS server will respond with a list of all IP addresses that were registered with WINS by a multihomed computer.
Choosing the "best" IP address to connect to on a multihomed computer is a client function. Currently, the following algorithm is employed, in the order listed:
This algorithm provides a reasonably good way of balancing connections to a server across multiple NICs, while still favoring direct connections when they are available.
Note The current implementation of NetBT does not attempt to "walk the list" of returned addresses if a connection attempt to the first choice fails. This enhancement has been requested and is under review.
NetBIOS sessions are established between two names. For example, when a Windows NT Workstation makes a file sharing connection to a server, the following sequence of events takes place:
Once the NetBIOS session has been established, the workstation and server negotiate a higher level protocol to use over it. Microsoft networking uses only one NetBIOS session between two names at any point in time. Any additional file or print sharing connections made after the first one are multiplexed over that same NetBIOS session.
NetBIOS keepalives are used on each connection to verify that the server and workstation are both still up and able to maintain their session. This way, if a workstation is shut down ungracefully, the server will eventually clean up the connection and associated resources, and vice versa. NetBIOS keepalives are controlled by the SessionKeepAlive registry parameter and default to once per hour.
If LMHOSTS files are used and an entry is misspelled, it is possible to attempt to connect to a server using the correct IP address but an incorrect name. In this case, a TCP connection will still be established to the server. However, the NetBIOS session request (using the wrong name) will be rejected by the server, because there is no listen posted on that name. Error 51, "remote computer not listening," will be returned.
NetBT Datagram Services
Datagrams are sent from one NetBIOS name to another over UDP port 138. The datagram service provides the ability to send a message to a unique name or to a group name. Group names may resolve to a list of IP addresses or to a broadcast. For example, the command net send /d:mydomain test would send a datagram containing the text "test" to the group name <mydomain>. The <mydomain> name would resolve to an IP subnet broadcast, and so the datagram would be sent with the following characteristics:
All hosts on the subnet would pick up the datagram and process it at least to the UDP protocol. On hosts running a NetBIOS datagram service, UDP would hand the datagram to NetBT on port 138. NetBT would check the destination name to see if any program had posted a datagram receive on it, and if so would pass the datagram up. If no receive is posted, the datagram is discarded.
TCP/IP Client/Server Programs
This chapter is intended to provide an overview of the Windows NT versions 4.0 and 3.5x implementation of the TCP/IP stack, not the many clients and services that are shipped with the product or are available from third parties. However, there are a few client/server programs that are critical to the configuration and operation of the TCP/IP protocol suite. These client/server programs are briefly described in the following sections and then explained in detail in:
Dynamic Host Configuration Protocol
The Dynamic Host Configuration Protocol (DHCP) client/server is a Windows Sockets program that is used to provide automatic and dynamic configuration of various TCP/IP protocol components. The server is configured with "scopes," which are ranges of IP addresses, to distribute to network clients as they start on the network. The DHCP server can also provide the additional configuration parameters that are associated with the IP addresses. For example, a scope that includes a specific range of IP addresses may also be associated with a default gateway, DNS server, and NetBIOS Name Server (WINS), with which the DHCP clients can be configured.
Obtaining Configuration Parameters Using DHCP
When a DHCP-enabled client starts for the very first time, it broadcasts a DHCP Discover request onto the local subnet. Any DHCP server that receives the request may respond with a DHCP Offer that contains proposed configuration parameters. The client can evaluate the offer, and respond with a DHCP Request to accept it. The server finalizes the transaction with a DHCP ACK. The following example explains this sequence.
First, the DHCP Discover is sent as the stack initializes:
Time   Source IP  Dest IP          Prot  Description
0.000  0.0.0.0    255.255.255.255  DHCP  Discover (xid=68256CA8)
+ FRAME: Base frame properties
  ETHERNET: ETYPE = 0x0800 : Protocol = IP: DOD Internet Protocol
+ ETHERNET: Destination address : 255.255.255.255
+ ETHERNET: Source address : 00DD01075715
  ETHERNET: Frame Length : 342 (0x0156)
  ETHERNET: Ethernet Type : 0x0800 (IP: DOD Internet Protocol)
  ETHERNET: Ethernet Data: Number of data bytes remaining = 328 (0x0148)
  IP: ID = 0x0; Proto = UDP; Len: 328
  IP: Version = 4 (0x4)
  IP: Header Length = 20 (0x14)
+ IP: Service Type = 0 (0x0)
  IP: Total Length = 328 (0x148)
  IP: Identification = 0 (0x0)
+ IP: Flags Summary = 0 (0x0)
  IP: Fragment Offset = 0 (0x0) bytes
  IP: Time to Live = 32 (0x20)
  IP: Protocol = UDP - User Datagram
  IP: CheckSum = 0x99A6
  IP: Source Address = 0.0.0.0
  IP: Destination Address = 255.255.255.255
  IP: Data: Number of data bytes remaining = 308 (0x0134)
  UDP: IP Multicast: Src Port: BOOTP Client, (68); Dst Port: BOOTP Server (67); Length = 308 (0x134)
  UDP: Source Port = BOOTP Client
  UDP: Destination Port = BOOTP Server
  UDP: Total length = 308 (0x134) bytes
  UDP: CheckSum = 0x4A0E
  UDP: Data: Number of data bytes remaining = 300 (0x012C)
  DHCP: Discover (xid=68256CA8)
  DHCP: Op Code (op) = 1 (0x1)
  DHCP: Hardware Type (htype) = 1 (0x1) 10Mb Ethernet
  DHCP: Hardware Address Length (hlen) = 6 (0x6)
  DHCP: Hops (hops) = 0 (0x0)
  DHCP: Transaction ID (xid) = 1747283112 (0x68256CA8)
  DHCP: Seconds (secs) = 0 (0x0)
  DHCP: Flags (flags) = 0 (0x0)
  DHCP: 0............... = No Broadcast
  DHCP: Client IP Address (ciaddr) = 0.0.0.0
  DHCP: Your IP Address (yiaddr) = 0.0.0.0
  DHCP: Server IP Address (siaddr) = 0.0.0.0
  DHCP: Relay IP Address (giaddr) = 0.0.0.0
  DHCP: Client Ethernet Address (chaddr) = 00DD01075715
  DHCP: Server Host Name (sname) = <Blank>
  DHCP: Boot File Name (file) = <Blank>
  DHCP: Magic Cookie = [OK]
  DHCP: Option Field (options)
  DHCP: DHCP Message Type = DHCP Discover
  DHCP: Client-identifier = (Type: 1) 00 dd 01 07 57 15
  DHCP: Host Name = DAVEMAC4
  DHCP: End of this option field
There are several interesting points to note in the DHCP discover packet. First, it is sent as a broadcast at both the link layer and the IP layer. Second, the DHCP broadcast flag is set to 0, indicating that the client is capable of receiving a response that is directed to its MAC address (indicated by chaddr). This means that the DHCP server is not required to broadcast the response.
Note Windows NT version 3.5 computers required a broadcast response and did not set this flag to 0.
Finally, note that there is a transaction ID (XID) used to track each configuration sequence. Any response to this discover packet should reference the same XID.
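The fields highlighted above (the broadcast flag, the client hardware address, and the XID) can be made concrete with a short sketch. The Python below is an illustration only, not the Windows NT implementation: the helper names build_discover and parse_header are hypothetical, and the header layout and magic cookie are taken from the DHCP specification (RFC 2131). Only the message-type option is included; the client-identifier and host-name options seen in the trace are omitted for brevity.

```python
import struct

# Fixed-size BOOTP/DHCP header (RFC 2131): op, htype, hlen, hops, xid, secs,
# flags, ciaddr, yiaddr, siaddr, giaddr, chaddr, sname, file.
BOOTP_HEADER = struct.Struct("!BBBBIHH4s4s4s4s16s64s128s")
MAGIC_COOKIE = bytes([99, 130, 83, 99])
BROADCAST_FLAG = 0x8000  # most significant bit of the 16-bit flags field


def build_discover(xid, mac, broadcast=False):
    """Build a minimal DHCP Discover (hypothetical helper for illustration)."""
    flags = BROADCAST_FLAG if broadcast else 0
    no_ip = bytes(4)  # 0.0.0.0: the client has no address yet
    header = BOOTP_HEADER.pack(
        1,         # op: 1 = request from a client
        1,         # htype: 1 = Ethernet
        len(mac),  # hlen
        0,         # hops
        xid, 0, flags,
        no_ip, no_ip, no_ip, no_ip,
        mac, bytes(64), bytes(128),  # chaddr is zero-padded to 16 bytes
    )
    # Option 53 (DHCP Message Type), length 1, value 1 = Discover; 255 = End.
    options = bytes([53, 1, 1, 255])
    return header + MAGIC_COOKIE + options


def parse_header(packet):
    """Extract the fields discussed in the text from a BOOTP/DHCP packet."""
    fields = BOOTP_HEADER.unpack_from(packet)
    hlen, xid, flags, chaddr = fields[2], fields[4], fields[6], fields[11]
    return {
        "xid": xid,
        "broadcast": bool(flags & BROADCAST_FLAG),
        "chaddr": chaddr[:hlen],
    }


# Values taken from the Discover trace above.
packet = build_discover(0x68256CA8, bytes.fromhex("00DD01075715"))
info = parse_header(packet)
print(hex(info["xid"]), info["broadcast"], info["chaddr"].hex())
```

A server or relay that parses the header this way can match its reply to the same XID and decide from the broadcast bit whether a directed response is acceptable.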
A DHCP offer follows:
Time   Source IP       Dest IP        Prot  Description
0.165  172.16.113.254  172.16.112.13  DHCP  Offer (xid=68256CA8)
+ FRAME: Base frame properties
  ETHERNET: ETYPE = 0x0800 : Protocol = IP: DOD Internet Protocol
+ ETHERNET: Destination address : 00DD01075715
+ ETHERNET: Source address : 00000C1AEBC5
  ETHERNET: Frame Length : 590 (0x024E)
  ETHERNET: Ethernet Type : 0x0800 (IP: DOD Internet Protocol)
  ETHERNET: Ethernet Data: Number of data bytes remaining = 576 (0x0240)
  IP: ID = 0x906; Proto = UDP; Len: 576
  IP: Version = 4 (0x4)
  IP: Header Length = 20 (0x14)
+ IP: Service Type = 0 (0x0)
  IP: Total Length = 576 (0x240)
  IP: Identification = 2310 (0x906)
+ IP: Flags Summary = 0 (0x0)
  IP: Fragment Offset = 0 (0x0) bytes
  IP: Time to Live = 31 (0x1F)
  IP: Protocol = UDP - User Datagram
  IP: CheckSum = 0xAF0D
  IP: Source Address = 172.16.113.254
  IP: Destination Address = 172.16.112.13
  IP: Data: Number of data bytes remaining = 556 (0x022C)
  UDP: Src Port: BOOTP Server, (67); Dst Port: BOOTP Client (68); Length = 556 (0x22C)
  DHCP: Offer (xid=68256CA8)
  DHCP: Op Code (op) = 2 (0x2)
  DHCP: Hardware Type (htype) = 1 (0x1) 10Mb Ethernet
  DHCP: Hardware Address Length (hlen) = 6 (0x6)
  DHCP: Hops (hops) = 0 (0x0)
  DHCP: Transaction ID (xid) = 1747283112 (0x68256CA8)
  DHCP: Seconds (secs) = 0 (0x0)
  DHCP: Flags (flags) = 0 (0x0)
  DHCP: 0............... = No Broadcast
  DHCP: Client IP Address (ciaddr) = 0.0.0.0
  DHCP: Your IP Address (yiaddr) = 172.16.112.13
  DHCP: Server IP Address (siaddr) = 0.0.0.0
  DHCP: Relay IP Address (giaddr) = 172.16.112.1
  DHCP: Client Ethernet Address (chaddr) = 00DD01075715
  DHCP: Server Host Name (sname) = <Blank>
  DHCP: Boot File Name (file) = <Blank>
  DHCP: Magic Cookie = [OK]
  DHCP: Option Field (options)
  DHCP: DHCP Message Type = DHCP Offer
  DHCP: Subnet Mask = 255.255.255.0
  DHCP: Renewal Time Value (T1) = 1 Days, 12:00:00
  DHCP: Rebinding Time Value (T2) = 2 Days, 15:00:00
  DHCP: IP Address Lease Time = 3 Days, 0:00:00
  DHCP: Server Identifier = 172.16.113.254
  DHCP: End of this option field
The DHCP offer is also interesting. The XID is the same as that in the discover packet. It is a directed offer, not sent as a broadcast, and it is directed to the MAC address of the client and to the proposed IP address for the client. The source address is from a different subnet (172.16.113) than the subnet that the client is attached to, indicating that both the discover and the offer must have traversed a router. This can be verified by checking the DHCP "giaddr" field, which is set to 172.16.112.1. As you might suspect, a router is configured to forward DHCP broadcasts from this subnet to the one where the DHCP server is located. DHCP forwarding is discussed in RFC 1542, and routers used for this purpose must explicitly support the RFC and be configured accordingly.
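The reasoning in the preceding paragraph can be checked mechanically. The following Python sketch, which uses only the standard ipaddress module, tests whether a giaddr value indicates a relay and whether two addresses share a subnet. The function names are hypothetical; the addresses and mask come from the offer trace.

```python
import ipaddress


def was_relayed(giaddr):
    """A non-zero giaddr means a relay agent (router) forwarded the exchange."""
    return ipaddress.ip_address(giaddr) != ipaddress.ip_address("0.0.0.0")


def same_subnet(addr_a, addr_b, mask):
    """Return True if both addresses fall in the subnet defined by addr_a/mask."""
    network = ipaddress.ip_network(f"{addr_a}/{mask}", strict=False)
    return ipaddress.ip_address(addr_b) in network


# Values from the DHCP Offer trace.
print(was_relayed("172.16.112.1"))
print(same_subnet("172.16.112.13", "172.16.113.254", "255.255.255.0"))
print(same_subnet("172.16.112.13", "172.16.112.1", "255.255.255.0"))
```

With the trace values, the server address lies outside the client's subnet while the giaddr lies inside it, which is consistent with a relay on the client's own subnet.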
Next, the client accepts the offer:
Time   Source IP  Dest IP          Prot  Description
0.172  0.0.0.0    255.255.255.255  DHCP  Request (xid=08186BD1)
+ FRAME: Base frame properties
+ ETHERNET: ETYPE = 0x0800 : Protocol = IP: DOD Internet Protocol
+ IP: ID = 0x100; Proto = UDP; Len: 328
+ UDP: IP Multicast: Src Port: BOOTP Client, (68); Dst Port: BOOTP Server (67); Length = 308 (0x134)
  DHCP: Request (xid=08186BD1)
  DHCP: Op Code (op) = 1 (0x1)
  DHCP: Hardware Type (htype) = 1 (0x1) 10Mb Ethernet
  DHCP: Hardware Address Length (hlen) = 6 (0x6)
  DHCP: Hops (hops) = 0 (0x0)
  DHCP: Transaction ID (xid) = 135818193 (0x8186BD1)
  DHCP: Seconds (secs) = 0 (0x0)
  DHCP: Flags (flags) = 0 (0x0)
  DHCP: 0............... = No Broadcast
  DHCP: Client IP Address (ciaddr) = 0.0.0.0
  DHCP: Your IP Address (yiaddr) = 0.0.0.0
  DHCP: Server IP Address (siaddr) = 0.0.0.0
  DHCP: Relay IP Address (giaddr) = 0.0.0.0
  DHCP: Client Ethernet Address (chaddr) = 00DD01075715
  DHCP: Server Host Name (sname) = <Blank>
  DHCP: Boot File Name (file) = <Blank>
  DHCP: Magic Cookie = [OK]
  DHCP: Option Field (options)
  DHCP: DHCP Message Type = DHCP Request
  DHCP: Client-identifier = (Type: 1) 00 dd 01 07 57 15
  DHCP: Requested Address = 172.16.112.13
  DHCP: Server Identifier = 172.16.113.254
  DHCP: Host Name = DAVEMAC4
  DHCP: Parameter Request List = (Length: 7) 01 0f 03 2c 2e 2f 06
  DHCP: End of this option field
The request is again broadcast, and the proposed IP address from the server is referenced. The request is broadcast for a reason—the client could have received more than one offer and, by broadcasting its request, it allows the other DHCP servers to see that it isn't going to use their offers.
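The Parameter Request List in the request trace is shown only as raw bytes. As a reading aid, the Python sketch below maps those bytes to option names. The option codes are those assigned in RFC 2132; the function name is hypothetical, and the table covers only the codes that appear in this trace.

```python
# Option codes as assigned in RFC 2132 (only those seen in the trace).
OPTION_NAMES = {
    1: "Subnet Mask",
    3: "Router",
    6: "Domain Name Server",
    15: "Domain Name",
    44: "NetBIOS over TCP/IP Name Server",
    46: "NetBIOS over TCP/IP Node Type",
    47: "NetBIOS over TCP/IP Scope",
}


def decode_parameter_request_list(hex_bytes):
    """Translate a space-separated hex dump of option codes into names."""
    return [
        OPTION_NAMES.get(code, f"Unknown ({code})")
        for code in bytes.fromhex(hex_bytes)
    ]


# The seven bytes shown in the DHCP Request trace.
for name in decode_parameter_request_list("01 0f 03 2c 2e 2f 06"):
    print(name)
```

The decoded list lines up with the options the server returns in the ACK that follows: subnet mask, domain name, router, NetBIOS name server, and NetBIOS node type.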
Finally, the server acknowledges the request and grants the lease:
Time   Source IP       Dest IP        Prot  Description
0.061  172.16.113.254  172.16.112.13  DHCP  ACK (xid=08186BD1)
+ FRAME: Base frame properties
+ ETHERNET: ETYPE = 0x0800 : Protocol = IP: DOD Internet Protocol
+ IP: ID = 0xA06; Proto = UDP; Len: 576
+ UDP: Src Port: BOOTP Server, (67); Dst Port: BOOTP Client (68); Length = 556 (0x22C)
  DHCP: ACK (xid=08186BD1)
  DHCP: Op Code (op) = 2 (0x2)
  DHCP: Hardware Type (htype) = 1 (0x1) 10Mb Ethernet
  DHCP: Hardware Address Length (hlen) = 6 (0x6)
  DHCP: Hops (hops) = 0 (0x0)
  DHCP: Transaction ID (xid) = 135818193 (0x8186BD1)
  DHCP: Seconds (secs) = 0 (0x0)
  DHCP: Flags (flags) = 0 (0x0)
  DHCP: 0............... = No Broadcast
  DHCP: Client IP Address (ciaddr) = 0.0.0.0
  DHCP: Your IP Address (yiaddr) = 172.16.112.13
  DHCP: Server IP Address (siaddr) = 0.0.0.0
  DHCP: Relay IP Address (giaddr) = 172.16.112.1
  DHCP: Client Ethernet Address (chaddr) = 00DD01075715
  DHCP: Server Host Name (sname) = <Blank>
  DHCP: Boot File Name (file) = <Blank>
  DHCP: Magic Cookie = [OK]
  DHCP: Option Field (options)
  DHCP: DHCP Message Type = DHCP ACK
  DHCP: Renewal Time Value (T1) = 1 Days, 12:00:00
  DHCP: Rebinding Time Value (T2) = 2 Days, 15:00:00
  DHCP: IP Address Lease Time = 3 Days, 0:00:00
  DHCP: Server Identifier = 172.16.113.254
  DHCP: Subnet Mask = 255.255.255.0
  DHCP: Domain Name = (Length: 22) 63 73 77 61 74 63 70 2e 6d 69 63 72 6f 73 6f 66 ...
  DHCP: Router = 172.16.112.1
  DHCP: NetBIOS Name Service = 172.16.113.254
  DHCP: NetBIOS Node Type = (Length: 1) 08
  DHCP: End of this option field
The acknowledgment is the final packet of the transaction, and it contains all of the configuration parameters that the client will use.
Lease Expiration and Renewal
DHCP-supplied configurations are "leased" from the server. Periodically, the client will contact the server to renew the lease. The protocol and implementation are very robust and configurable, and short-term server or network outages do not generally affect lease renewal. For example, DHCP clients start to try to renew their lease when 50 percent of the lease time has expired. Repeated attempts are made to contact the DHCP server and renew the lease, until 87.5 percent of the lease time has expired. At this point, the client attempts to get a new lease from any available DHCP server.
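The 50 percent and 87.5 percent thresholds correspond to the Renewal (T1) and Rebinding (T2) values seen in the offer and ACK traces earlier in this section. A minimal Python sketch of the arithmetic follows; the function name is hypothetical.

```python
from datetime import timedelta


def renewal_times(lease):
    """Return (T1, T2): when renewal starts and when rebinding starts."""
    t1 = lease * 0.5    # begin renewing with the original server
    t2 = lease * 0.875  # begin looking for any available server
    return t1, t2


# The three-day lease from the traces.
t1, t2 = renewal_times(timedelta(days=3))
print(t1)  # 1 day, 12:00:00
print(t2)  # 2 days, 15:00:00
```

For the three-day lease in the traces, this reproduces the T1 of 1 day 12 hours and the T2 of 2 days 15 hours that the server sent.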
When a DHCP client is rebooted, it attempts to verify that the lease it holds is valid for the current subnet. If it is moved to another subnet and rebooted, the following sequence takes place:
Source    Destination  Source IP  Destination IP   Pro   Description
davemacp  *BROADCAST   0.0.0.0    255.255.255.255  DHCP  Request (xid=6E3A2E74)
router    *BROADCAST   10.57.8.1  255.255.255.255  DHCP  NACK (xid=6E3A2E74)
davemacp  *BROADCAST   0.0.0.0    255.255.255.255  DHCP  Discover (xid=51CA7FED)
router    davemacp     10.57.8.1  10.57.13.152     DHCP  Offer (xid=51CA7FED)
davemacp  *BROADCAST   0.0.0.0    255.255.255.255  DHCP  Request (xid=2081237D)
router    davemacp     10.57.8.1  10.57.13.152     DHCP  ACK (xid=2081237D)
In this example, the portable computer "davemacp" is moved to a new subnet and restarted. It broadcasts a DHCP Request to renew its old parameters, but the DHCP server responsible for the new subnet recognizes that they are invalid for that subnet and responds with a DHCP NACK. The DHCP client software then automatically goes through the normal discovery process to obtain parameters that are valid for the new location. For additional information on DHCP, see Chapter 7, "Managing Microsoft DHCP Servers."
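The reboot sequence in the trace above can be modeled as a small decision procedure. The Python below is a simplified sketch, not the client implementation. The function names are hypothetical, and both the client's old address (172.16.112.13) and the prefix length of the new subnet (10.57.8.0/21) are assumptions chosen so that the example is consistent with the addresses in the trace.

```python
import ipaddress


def server_reply(requested_ip, local_subnet):
    """ACK an address that is valid for the subnet; otherwise NACK it."""
    valid = ipaddress.ip_address(requested_ip) in ipaddress.ip_network(local_subnet)
    return "ACK" if valid else "NACK"


def reboot_sequence(old_ip, local_subnet, offered_ip):
    """List the DHCP messages exchanged when a client with a lease restarts."""
    messages = ["Request"]              # client tries to keep its old lease
    reply = server_reply(old_ip, local_subnet)
    messages.append(reply)
    if reply == "NACK":                 # lease is invalid here: start over
        messages += ["Discover", "Offer", "Request"]
        messages.append(server_reply(offered_ip, local_subnet))
    return messages


# Assumed old address and assumed /21 prefix; offered address from the trace.
print(reboot_sequence("172.16.112.13", "10.57.8.0/21", "10.57.13.152"))
```

If the client restarts on its original subnet instead, the same function returns only the Request and ACK pair, which is the normal lease verification case.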
Windows Internet Name Service
Windows Internet Name Service (WINS) is a NetBIOS name service as described in RFC 1001 and RFC 1002. When a Windows NT computer is configured as h-node (default for WINS clients), it attempts to use a WINS server for name registration and resolution first and, if that fails, it resorts to subnet broadcasts.
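The h-node ordering described above (name server first, broadcast second) amounts to a simple fallback. In the Python sketch below, query_wins and query_broadcast are hypothetical callables standing in for the two resolution mechanisms; each returns an IP address string, or None on failure.

```python
def resolve_h_node(name, query_wins, query_broadcast):
    """Resolve a NetBIOS name the way an h-node does: WINS, then broadcast."""
    address = query_wins(name)
    if address is None:
        # The name server did not answer or does not know the name.
        address = query_broadcast(name)
    return address


# Stand-in lookups for illustration; the address is taken from the traces.
wins_table = {"DAVEMAC4": "172.16.112.124"}
subnet_table = {"LOCALBOX": "172.16.112.50"}  # hypothetical host on the local subnet

print(resolve_h_node("DAVEMAC4", wins_table.get, subnet_table.get))
print(resolve_h_node("LOCALBOX", wins_table.get, subnet_table.get))
```

Because the broadcast is only attempted after the WINS query fails, a working WINS server keeps most name traffic off the broadcast path.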
WINS Name Registration and Resolution
Using WINS for name services dramatically reduces the number of IP broadcasts used by Microsoft network clients. The following portion of a trace illustrates name registration and resolution traffic caused by starting a Windows NT workstation.
Source IP       Destination IP  Prot  Description
172.16.112.124  172.16.113.254  NBT   NS: MultiHomed Name Registration req. for DAVEMAC4<00>
172.16.113.254  172.16.112.124  NBT   NS: Registration resp. for DAVEMAC4<00>, Success
172.16.112.124  172.16.113.254  NBT   NS: Registration req. for DAVEMACD<00>
172.16.113.254  172.16.112.124  NBT   NS: Registration resp. for DAVEMACD<00>, Success
172.16.112.124  172.16.113.254  NBT   NS: Query req. for DAVEMACD<1C>
172.16.113.254  172.16.112.124  NBT   NS: Query resp. for DAVEMACD<1C>, Success
172.16.112.124  172.16.113.254  NBT   NS: MultiHomed Name Registration req. for DAVEMAC4<03>
172.16.113.254  172.16.112.124  NBT   NS: Registration resp. for DAVEMAC4<03>, Success
This trace shows that the starting client (172.16.112.124) sends a single name registration request to the WINS server, asking to register the computer name (DAVEMAC4<00>) as a unique name for a multihomed host. The WINS server responds affirmatively. Next, the domain name (DAVEMACD<00>) is registered as a group name. Then a name query is sent to the WINS server, requesting a list of domain controllers (which all register the <domain>[1C] name) so that a logon server can be contacted. One more registration is shown, for DAVEMAC4<03>, which is the name registered by the Messenger service. The fully parsed version of the domain name registration follows:
Source IP       Destination IP  Prot  Description
172.16.112.124  172.16.113.254  NBT   NS: Registration req. for DAVEMACD<00>
+ FRAME: Base frame properties
+ ETHERNET: ETYPE = 0x0800 : Protocol = IP: DOD Internet Protocol
+ IP: ID = 0x300; Proto = UDP; Len: 96
+ UDP: Src Port: NETBIOS Name Service, (137); Dst Port: NETBIOS Name Service (137); Length = 76 (0x4C)
  NBT: NS: Registration req. for DAVEMACD<00>
  NBT: Transaction ID = 32770 (0x8002)
  NBT: Flags Summary = 0x2900 - Req.; Registration; Success
  NBT: 0............... = Request
  NBT: .0101........... = Registration
  NBT: .....0.......... = Non-authoritative Answer
  NBT: ......0......... = Datagram not truncated
  NBT: .......1........ = Recursion desired
  NBT: ........0....... = Recursion not available
  NBT: .........0...... = Reserved
  NBT: ..........0..... = Reserved
  NBT: ...........0.... = Not a broadcast packet
  NBT: ............0000 = Success
  NBT: Question Count = 1 (0x1)
  NBT: Answer Count = 0 (0x0)
  NBT: Name Service Count = 0 (0x0)
  NBT: Additional Record Count = 1 (0x1)
  NBT: Question Name = DAVEMACD<00>
  NBT: Question Type = General Name Service
  NBT: Question Class = Internet Class
  NBT: Resource Record Name = DAVEMACD<00>
  NBT: Resource Record Type = NetBIOS General Name Service
  NBT: Resource Record Class = Internet Class
  NBT: Time To Live = 300000 (0x493E0)
  NBT: RDATA Length = 6 (0x6)
  NBT: Resource Record Flags = 57344 (0xE000)
  NBT: 1............... = Group NetBIOS Name
  NBT: .11............. = Reserved
  NBT: ...0000000000000 = Reserved
  NBT: Owner IP Address = 172.16.112.13
Because the domain name is a group name, any number of hosts are allowed to register it.
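The two flag words in the parsed registration (0x2900 in the header and 0xE000 in the resource record) can be decoded with a few shifts and masks. The Python sketch below follows the bit layout shown in the trace, which matches the name service header defined in RFC 1002. The function names are hypothetical, and the opcode table lists only the values defined in that RFC.

```python
# Name service opcodes defined in RFC 1002.
OPCODES = {0: "Query", 5: "Registration", 6: "Release", 7: "WACK", 8: "Refresh"}


def decode_nbt_flags(flags):
    """Split the 16-bit NBT name service header flags into named fields."""
    opcode = (flags >> 11) & 0xF
    return {
        "response": bool((flags >> 15) & 1),
        "opcode": OPCODES.get(opcode, f"Unknown ({opcode})"),
        "authoritative": bool((flags >> 10) & 1),
        "truncated": bool((flags >> 9) & 1),
        "recursion_desired": bool((flags >> 8) & 1),
        "recursion_available": bool((flags >> 7) & 1),
        "broadcast": bool((flags >> 4) & 1),
        "rcode": flags & 0xF,
    }


def is_group_name(rr_flags):
    """The top bit of the resource record flags marks a group name."""
    return bool(rr_flags & 0x8000)


# Values from the parsed registration above.
print(decode_nbt_flags(0x2900))
print(is_group_name(0xE000))
```

Decoding 0x2900 yields a registration request with recursion desired and a zero result code, and 0xE000 has the group bit set, which agrees with the field-by-field breakdown in the trace.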
WINS in a DHCP Environment
WINS is especially helpful on DHCP-enabled networks. One of the DHCP-provided parameters can be the address of a WINS server, and so as soon as the client is configured by DHCP, it registers its name(s) and address with the WINS server, and can then be easily located by the other computers on the network. This combination of DHCP and WINS is ideal for dynamic situations.
For additional information on WINS, see Chapter 8, "Managing Microsoft WINS Servers."
Domain Name System
Windows NT Server version 4.0 includes an RFC-compliant Domain Name System (DNS) server. DNS servers are defined in RFCs 1034 and 1035.
DNS is a global, distributed database based on a hierarchical naming system. The naming system was developed to provide a method for uniquely identifying hosts (computers and other network devices) on the Internet and other TCP/IP networks. The root of the DNS database is managed by the Internet Network Information Center. The top-level domains are assigned by organization and by country.
The DNS name consists of two parts—the domain name and the host name—known together as the fully qualified domain name (FQDN). For example, using the fictional domain name of Terra Flora, an FQDN for a workstation in the nursery division could be: jeff.nursery.terraflora.com. Note that the DNS name can actually be multi-part with each part of the name separated by a period (.).
DNS uses a client/server model. The DNS name server contains information about a portion of the global DNS name space, such as a private intranet. Client computers can be configured to query the DNS server for host name-to-IP-address mapping as needed to connect to the Internet or an intranet TCP/IP network resource.
Integration of Windows NT DNS and WINS Servers
The Windows NT-based DNS server provides connectivity between WINS and DNS. In addition to providing an RFC-compliant DNS service, the Windows NT Server-based DNS server can pass through an unresolved DNS name query to a WINS server for final name resolution.
This occurs transparently; the client need not be aware of whether a DNS or WINS server processed the name query. In a Windows NT-based network running both DNS and WINS servers, you can perform forward look-up, which resolves a friendly (NetBIOS or DNS) name to an IP address, and reverse look-up, which resolves an IP address to a (NetBIOS or DNS) name.
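The pass-through behavior can be pictured as a two-stage lookup. The Python below is a conceptual sketch only. It assumes that the DNS server hands the leftmost label of an unresolved name in its zone to WINS as a NetBIOS name; the function name, the dictionaries standing in for the two databases, and all addresses are hypothetical.

```python
def resolve_name(fqdn, dns_records, wins_records, zone):
    """Try the static DNS data first, then fall back to a WINS lookup."""
    fqdn = fqdn.lower()
    address = dns_records.get(fqdn)
    if address is not None:
        return address
    suffix = "." + zone.lower()
    if fqdn.endswith(suffix):
        host_label = fqdn[: -len(suffix)]
        # Assumption: only a single remaining label is passed to WINS.
        if "." not in host_label:
            return wins_records.get(host_label.upper())
    return None


# Hypothetical data: one static DNS record and one dynamic WINS registration.
dns = {"www.nursery.terraflora.com": "172.16.112.10"}
wins = {"JEFF": "172.16.112.13"}

print(resolve_name("www.nursery.terraflora.com", dns, wins, "nursery.terraflora.com"))
print(resolve_name("jeff.nursery.terraflora.com", dns, wins, "nursery.terraflora.com"))
```

From the client's point of view both answers arrive as ordinary DNS responses, which is what makes the integration transparent.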
Dynamic WINS and Static DNS
WINS provides a dynamic, distributed database for registering and querying dynamic NetBIOS computer name-to-IP-address mappings. DNS provides a static, distributed database for registering and querying static FQDN name-to-IP-address mappings.
DNS depends on static files for name resolution and does not yet support dynamic updates of name-to-IP-address mappings. In other words, DNS requires static configuration of IP addresses to perform name-to-IP-address mapping, whereas WINS supports DHCP dynamic allocation of IP addresses and can resolve a NetBIOS computer name to a dynamically assigned IP address.
Note Dynamic DNS is currently under discussion by the Internet Engineering Task Force (IETF).
For information about installing a Microsoft DNS server, see online Help. For information about configuration of the DNS server and using the DNS Manager, see online DNS Manager Help.
Note Microsoft first included a Beta version of a DNS server for Windows NT 3.5x in the Windows NT 3.5x Resource Kits. You can upgrade the Beta version to Microsoft DNS server under Windows NT Server version 4.0. See the topic "To Upgrade a Windows NT 3.51 Resource Kit DNS Server" in Microsoft TCP/IP Help.
For additional information on DNS, see Chapter 9, "Managing Microsoft DNS Servers."
The Browser service (not to be confused with a Web browser) was originally designed to be a simple workgroup enumeration tool, but has been enhanced significantly over time. The Browser service supports browsing computers on the network and being browsed by other computers.
It is the service that gathers and organizes the list of computers and domains that is displayed in Network Neighborhood. (You can also see the browse list by typing net view in the command window.) The Browser maintains an up-to-date list of computers and provides this information to programs that require it.
Note Under Windows NT version 3.5x, use the File Manager Connect Network Drive dialog box to view the computer browser list.
Master Browser Elections
The Primary Domain Controller (PDC) for a domain always functions as the Domain Master Browser and is responsible for replicating the browse lists to all Master Browsers within the domain. A Master Browser is elected on each subnet within the domain.
Each domain has one Master Browser per subnet that contains computers listening for server announcements. The Master Browser maintains lists of available resources that can be requested by client computers.
As the number of hosts on a subnet grows, the Master Browser will start to replicate the browse list to Backup Browsers. If the Master Browser is shut down, an election takes place to determine the new Master Browser. Existing Backup Browsers have an advantage in the election. For this process, workgroups and domains function alike, except that all Windows NT Servers are either a Master Browser or Backup Browser, and Windows NT Workstation and Windows for Workgroups computers are not allowed to become Backup or Master browsers unless specifically configured.
Master Browser elections take place over the special <domain>[1E] NetBIOS name using subnet broadcasts (without using WINS). The election is fully automatic and takes into consideration a number of heuristics: operating system, version number, uptime, role (Workstation, Backup Domain Controller, Primary Domain Controller), etc. In general, the most robust computer on the network wins. Elections are forced when:
Maintaining Browse Lists
File servers periodically (once every 12 minutes) announce their presence to the special <domain>[1D] NetBIOS name in an IP subnet broadcast. The Master Browser builds a list from these broadcasts. In addition, all Master Browsers register a group name \0x01\0x02__MSBROWSE__\0x02\0x01 on the local subnet (not with WINS). Periodically the Master Browsers in the domains and workgroups announce their presence to this special name. Thus, in addition to the workgroup or domain membership lists, Master Browsers also maintain lists of other domains with their associated Master Browsers.
Requesting Browse Lists
When a browse request is made from a client, a "GetBackupListRequest" is sent to the <domain>[1D] name (the Master Browser), which returns a list of Browser servers for the local subnet. The "GetBackupListRequest" is also unicast to the Domain Master Browser, which handles the case in which the queried domain has no members on the subnet. The client Browser service selects three of the browsers from the list and stores them for future use. When further browsing is then done by calling the NetServerEnum API, the client contacts one of the three saved names.
When a client queries its workgroup or domain browser, it first gets back a list of all of the domains and workgroups that the browser has learned about through the \0x01\0x02__MSBROWSE__\0x02\0x01 name as well as the name of the Master Browser for each. When the user expands a domain or workgroup into a membership list, the client sends a request to <domain>[1D] to get to the list (this is translated to a local subnet broadcast by WINS). If this fails, it contacts the Master Browser for the particular domain or workgroup and fetches the membership list.
The Domain Master Browser
As mentioned earlier, the PDC always acts as the Domain Master Browser. Because each locally-elected Master Browser will only hear local membership announcements, there needs to be a mechanism to consolidate all of the members into a single list. This is the role of the Domain Master Browser.
Periodically, all of the locally-elected Master Browsers contact the PDC and replicate their membership lists to it. The PDC merges the list with the "master" list for the whole domain and replicates the master list back down.
The replication algorithm is "smart" in that the local Master Browsers only replicate the members that they have learned about locally to the domain master. This whole mechanism allows members in a domain to span subnets and, for all clients (eventually), to be able to get complete membership lists.
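The consolidation step can be sketched with sets. In the Python below, each subnet's Master Browser contributes only the members it learned locally, the Domain Master Browser merges them, and the merged list is copied back down. The function name, the subnet labels, and the computer names other than DAVEMAC4 are hypothetical.

```python
def replicate_browse_lists(local_lists):
    """Merge per-subnet member sets and return each subnet's view afterward."""
    domain_list = set()
    for members in local_lists.values():
        domain_list |= members          # PDC merges each local list
    # The merged master list is replicated back to every Master Browser.
    return {subnet: set(domain_list) for subnet in local_lists}


# Hypothetical subnets and members.
before = {
    "172.16.112.0": {"DAVEMAC4", "NURSERY1"},
    "172.16.113.0": {"PDC1", "WINS1"},
}
after = replicate_browse_lists(before)
print(sorted(after["172.16.112.0"]))
```

After one replication cycle every Master Browser holds the same domain-wide list, even though each one only ever heard announcements from its own subnet.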
On WINS-enabled networks, the browser code in Windows NT versions 4.0 and 3.5x periodically connects to WINS and learns all of the computers that have registered any <domain>[1B] names. The Browser then does a GetDCName() on each of the <domain>[1B] names (followed by an attempt on <domain>[1C]), and adds the <domain name> <master browser name> to its domain/workgroup list. This allows members of one domain to locate the Master Browser for another domain even when it is on another subnet and the two domains have no "broadcast area" in common.
Browsing for Other Windows-based Computers
Browser code for Windows for Workgroups computers has been enhanced several times to reduce the dependency on having a BDC per subnet. The updated files are available from ftp.microsoft.com. Windows 95 computers also contain enhanced browsing code.
Windows NT Workstation and Windows NT Server Services
The Workstation and Server services are used for file and print sharing. Both use NetBIOS over TCP/IP to communicate with each other; however, they are not NetBIOS programs. They are written to talk directly to NetBT over the TDI interface. Being direct TDI clients, they are high performance and not subject to limitations of the NetBIOS interface, such as the 254 session limit. The Server Message Block (SMB) protocol is used to send commands and responses between clients and servers. Public SMB specifications are available from ftp.microsoft.com.
When a user logs on to a Windows NT domain, the following sequence of events occurs:
Connecting to Network Resources
When a workstation attempts to connect to a shared resource on the network, the resource is "called" by its NetBIOS name. The name-to-IP-address resolution is done in the manner illustrated in the NetBIOS Name Resolution Flowchart (Figures 6.5 through 6.7) in the section "NetBIOS Interface" earlier in this chapter.
Once the IP address of the target host is known, a standard TCP/IP connection is set up, and a NetBIOS session is established over that connection. The user is authenticated using encrypted passwords, and then client/server messages are exchanged using the SMB protocol. The workstation and server use sophisticated caching mechanisms to reduce network traffic and provide high performance. When WINS is used, there is no reliance on IP broadcasts, with the single exception of ARPs.
The Windows NT Workstation and Windows NT Server services were designed with many optimizations to minimize network traffic and maximize throughput. The network redirector works closely with the Windows NT Cache Manager to provide read-ahead caching, write-behind caching, and search caching. Various file locking schemes, such as opportunistic locking and local file lock optimization, help to reduce network traffic. The SMB protocol supports compound commands and responses, such as LockAndRead and WriteAndUnlock.
Microsoft Remote Access Service
Windows NT Remote Access Service (RAS) is a networking service that connects remote or mobile workers to corporate networks. RAS uses the following remote access protocols for RAS server and client services:
Note Remote access protocols control the transmission of data over wide area networks (WANs). Protocols such as TCP/IP, IPX, and NetBEUI are considered local area network (LAN) protocols. The focus of this chapter is TCP/IP; for detailed information about RAS and the other LAN protocols, see the Networking Supplement for Windows NT Server version 4.0.
RAS servers act as a "proxy" for TCP/IP clients. RAS servers use proxy ARP to respond to ARP requests from dial-up networking clients, and also set up the network host routes to each dial-up client. RAS servers may obtain configuration parameters for their clients from a DHCP server, and then use PPP IPCP (Internet Protocol Control Protocol), as defined in RFC 1332, to dynamically configure their clients with these parameters over the RAS link.
When a RAS server is configured to use DHCP to obtain TCP/IP configuration parameters for its clients, a pool of leased addresses is obtained from the DHCP server and managed locally by the RAS server. If more addresses are needed, or leases need to be renewed, the RAS server will contact the DHCP server; however, it does not check with the DHCP server each time a dial-up networking client starts. If the RAS server is moved to another subnet, it may have a pool of leases that are not valid for the new subnet still stored in the registry until they expire.
RAS clients using TCP/IP can be configured to use the default gateway on the remote network while they are connected to a RAS server. This default gateway overrides any local network default gateway while the RAS connection is established. The override is accomplished by manipulating the IP route table. Any local routes, including the default gateway, get their metric (hop count) incremented by one, and a default route with a metric of 1 hop is dynamically added for the duration of the connection. One-hop routes are also added for the IP multicast address (224.0.0.0), for the local WAN interface, and for the network that the PPP server is attached to.
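The route table manipulation can be sketched as follows. This Python models only the two steps named above, incrementing existing metrics and adding a one-hop default route; the additional multicast, WAN interface, and PPP network routes are left out. The function name and all addresses are hypothetical.

```python
def apply_ras_default_gateway(routes, ras_gateway):
    """Return a new route table with the RAS default route taking priority."""
    # Every existing route, including the local default, becomes one hop worse.
    updated = [dict(route, metric=route["metric"] + 1) for route in routes]
    # The remote default gateway is added with a metric of 1.
    updated.append({"destination": "0.0.0.0", "gateway": ras_gateway, "metric": 1})
    return updated


# Hypothetical table: a local default gateway and a local subnet route.
before = [
    {"destination": "0.0.0.0", "gateway": "172.16.112.1", "metric": 1},
    {"destination": "172.16.112.0", "gateway": "172.16.112.13", "metric": 1},
]
after = apply_ras_default_gateway(before, "10.57.8.1")

defaults = [r for r in after if r["destination"] == "0.0.0.0"]
print(min(defaults, key=lambda r: r["metric"])["gateway"])
```

Because the lowest metric wins, traffic without a more specific route now leaves through the RAS link rather than the local default gateway.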
This can present a problem connecting to resources by using the local network default gateway, unless static routes are added at the client. The following are sample route tables for a computer running Windows NT Workstation or Windows NT Server before and after connecting to a remote network using PPP:
Route table before dialing a PPP Internet provider:
Route table after dialing a PPP Internet provider:
Secure Internet Transport with TCP/IP and PPTP
Windows NT-based RAS is based on PPP, the industry standard for dial-up access services, and includes industry standards for authentication and encryption. PPTP, which is used to create virtual private networks (VPNs), uses PPP to provide compressed and encrypted RAS communication. PPTP technology enables RAS user access to private networks by using the Internet instead of long distance telephone lines (thus reducing transmission costs). RAS users can use PPTP over the Internet by either:
PPTP provides multi-protocol support for IP, IPX, and NetBEUI protocols. For example, RAS clients using PPTP and the Internet (as a network backbone) can send and receive IPX and NetBEUI packets.
Note Because the Internet is a TCP/IP-based network, you must install and bind TCP/IP to the network card that will be used for RAS and PPTP communications. To select the network card (adapter) and to enable PPTP filtering, open the Microsoft TCP/IP Properties page, and click Advanced to open the Advanced TCP/IP Properties page. For specific instructions, see online Help.
The following figure illustrates the implementation of PPTP. Note that after processing a packet (from an IP, IPX, or NetBEUI transport), PPTP sends the packet to the top of the TCP/IP protocol stack. The TCP/IP protocol stack then sends the packet across the Internet. (At the receiving end of a packet transmission, the PPTP packet must be decoded by another PPTP service.)
Figure 6.8 Using RAS with PPTP and TCP/IP
For detailed information about PPTP, see the chapter "Point-to-Point Tunneling Protocol" in the Networking Supplement for Windows NT Server version 4.0.
By default, RAS uses effective compression methods to increase the amount of data that can be sent over a serial link. Bandwidth planning is important when designing and installing computers and services using RAS. As a rule of thumb, transfer rates can be estimated by assuming 10 bits per byte to allow for protocol and timing overhead. For example, 9600 bps (without compression) is approximately 1 Kbyte/second, 60 Kbytes/minute, and 3.5 Mbytes/hour. If the data being transferred compresses fairly well, a throughput of 5-8 Mbytes per hour might be expected. While this may be an adequate rate for a single workstation, it is probably not feasible as an inter-site link for most programs. ISDN (128 Kbits/second, or about 45 Mbytes/hour, not including compression) might be more realistic. ISDN service in the United States has recently become more available and economical to install and use.
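The 10-bits-per-byte rule of thumb is easy to apply to other link speeds. The Python sketch below reproduces the two estimates in the text; the function name is hypothetical, and results are in bytes so that the rounding to Kbytes and Mbytes is left to the reader.

```python
def estimate_throughput(bits_per_second, bits_per_byte=10):
    """Estimate uncompressed throughput, allowing for protocol overhead."""
    bytes_per_second = bits_per_second / bits_per_byte
    return {
        "bytes_per_second": bytes_per_second,
        "bytes_per_minute": bytes_per_second * 60,
        "bytes_per_hour": bytes_per_second * 3600,
    }


modem = estimate_throughput(9600)    # 9600 bps serial link
isdn = estimate_throughput(128000)   # 128 Kbits/second ISDN

print(modem["bytes_per_hour"])  # 3456000.0, about 3.5 Mbytes
print(isdn["bytes_per_hour"])   # 46080000.0, about 45 Mbytes
```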
Simple Network Management Protocol
The Simple Network Management Protocol (SNMP) agent in Windows NT provides some programmatic access to the TCP/IP protocol stack and can be used to get information about the performance and usage of network components. The SNMP agent supports network management programs provided by Microsoft and third-party vendors. For more information about SNMP, see Chapter 11, "Using SNMP for Network Management," and Appendix C, "MIB Object Types for Windows NT." For information about installing and configuring SNMP, see Microsoft TCP/IP Help.
Line printer (LPR) is one of the network protocols and utilities of the TCP/IP protocol suite and is defined in RFC 1179. LPR provides a standard for transmitting TCP/IP print jobs between computers. With the LPR protocol, a client can send a print job to another computer running the print spooler service known as the line-printer daemon (LPD).
Windows NT provides both the LPR and LPD services for TCP/IP printing. In general, Windows NT supports TCP/IP printing as documented in RFC 1179. However, because RFC 1179 describes an existing print server protocol that is widely used on the Internet but is not an Internet standard, printing under Windows NT 4.0 differs somewhat from the printing described in RFC 1179. The following TCP/IP print enhancements were added to Windows NT version 4.0.
Note Under Windows NT version 3.5x, all TCP/IP print jobs sent from a Windows NT computer were sourced from TCP ports 721 through 731, and, if many jobs were sent in quick succession, the ports could be "used up," causing a pause in printing until one of them passed through the TCP TIME_WAIT state.
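The port exhaustion described in this note can be estimated with simple arithmetic. The Python sketch below gives an upper bound on the sustained job rate when each job ties up one source port for the TIME_WAIT period. It ignores the time spent transferring the job, the function name is hypothetical, and the 240-second TIME_WAIT value used in the example is an assumption rather than a figure from this chapter.

```python
# Source ports used for LPR jobs under Windows NT 3.5x, per the note above.
LPR_SOURCE_PORTS = range(721, 732)  # 721 through 731 inclusive


def max_jobs_per_hour(time_wait_seconds):
    """Upper bound on jobs per hour if every port must sit out TIME_WAIT."""
    reuses_per_port = 3600 / time_wait_seconds
    return len(LPR_SOURCE_PORTS) * reuses_per_port


print(len(LPR_SOURCE_PORTS))   # 11 ports available
print(max_jobs_per_hour(240))  # assumed 240-second TIME_WAIT
```

With only 11 ports, a burst of more than 11 jobs inside one TIME_WAIT interval has to wait for a port to be released, which is the pause described above.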
Microsoft Internet Information Server
Internet Information Server (IIS) is a powerful Web, FTP, and Gopher server designed specifically for Windows NT. It uses a worker thread model (as opposed to a "thread per client" model) to provide the ability to service an extremely large number of connections with high performance. It also takes advantage of the new performance-boosting Microsoft Internet API set, which includes calls such as TransmitFile() and AcceptEx().
All three services run from within the same process (inetinfo.exe) and share resources such as worker threads and cached file handles. IIS has configurable logging that supports both text files and logging directly to a database by using ODBC. IIS also supports the use of sophisticated databases on the "backend" so that access to a database can be achieved by using a standard Web browser. In many cases a database is easier to maintain than a large number of static HTML files.
IIS ships with Windows NT Server 4.0, and versions for Windows NT Server 3.51 are available from www.microsoft.com. Microsoft Internet Information Server is not designed (or licensed) to run on Windows NT Workstation. More details are available from the Microsoft Web site.
Summary of Changes
As a convenient reference, the following sections list recent changes in Microsoft 32-bit TCP/IP for Windows NT.
Additional or Changed TCP/IP Registry Parameters