Server and Domain Isolation Using IPsec and Group Policy

Chapter 7: Troubleshooting IPsec

Published: March 17, 2005 | Updated: July 24, 2006

This chapter provides information about how to troubleshoot Internet Protocol security (IPsec), such as in server and domain isolation scenarios, and is based on the experience and processes of the Microsoft Information Technology (IT) team. Where possible, this chapter refers to existing Microsoft troubleshooting procedures and related information.

Microsoft IT support is based on a multi-tiered support model, and the help desk is referred to as Tier 1 support. Escalation procedures enable the help desk staff to escalate incidents that require the assistance of specialists.

The procedures in this chapter refer to three levels of support: Tier 1, Tier 2, and Tier 3. To ensure that the guidance is as practical and concise as possible, most of the content is at the Tier 2 level. Initial Tier 1 guidance is provided to help an organization determine as quickly as possible if a problem is related to IPsec and, if it is, to generate the required information to help Tier 2 support engineers troubleshoot the problem.

The highly detailed and complex information that would be required to support Tier 3 troubleshooting efforts is outside the scope of this chapter. If the information provided in this chapter does not fix the IPsec problem, Microsoft recommends that you contact Microsoft® Product Support Services to obtain additional assistance.

Many of the support procedures, tools, and scripts that are used by Microsoft are provided in this chapter for reference purposes. These recommendations and tools should be adapted to meet the specific needs of your organization.

When IPsec is used to secure Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) traffic on the network, typical TCP/IP network troubleshooting procedures and tools can become ineffective. For this reason, it is important to plan for and develop IPsec-specific troubleshooting techniques that can be used if an issue arises between computers that use (or attempt to use) IPsec for their communications.

On This Page
Support Tiers and Escalation
Tier 1 Troubleshooting
Tier 2 Troubleshooting Preparation
The IPsec Troubleshooting Process
Tier 3 Troubleshooting
Summary

Support Tiers and Escalation

Within Microsoft, server and domain isolation support is a standard offering and is defined in standard service level agreements (SLA). Isolation support is provided by the following tiers:

Tier 1: Help desk. The help desk is the entry point for both domain-joined and non-domain-joined client issues. The help desk also supports servers that are managed by the central IT organization. (Other servers may be managed by line of business application teams or product groups.) Help desk staff members are trained to use a taxonomy and several flowcharts for classifying problems that relate to server and domain isolation.

During the pilot phase of the Microsoft isolation solution, client issues were escalated to the Corporate IT Security department. However, after the solution was deployed into production, client issues were handled by the Tier 2 support teams.

Tier 2: Data center operations, global network operations center, line of business application support, and messaging/collaboration support. These groups are the day-to-day operations teams that monitor and manage IT services and related assets. During server and domain isolation pilots, these teams were the initial escalation point for help desk and Corporate IT Security for server-related issues and troubleshooting. Each group has a subject matter expert for server and domain isolation, as well as detailed procedures for troubleshooting.

Tier 3: Windows network and infrastructure services. For server and domain isolation pilots, this group identified a team of people to be experts in troubleshooting the solution-related architectural components and technologies, such as IPsec, TCP/IP packet processing, computer accounts, and network logon rights. Within Microsoft, if further escalation is necessary, Tier 3 works directly with the Windows Development teams until closure is reached. Outside of Microsoft, this level would engage with Microsoft Product Support Services when necessary.

The following section summarizes the troubleshooting techniques that can be used by the help desk staff in the Tier 1 support organization.

Tier 1 Troubleshooting

This section presents the overall process for troubleshooting IPsec-related problems that is used by help desk staff, who provide Tier 1 support. Typically, Tier 1 support personnel are phone-based help desk staff members who attempt to diagnose problems remotely.

Is IPsec the Problem?

The help desk is likely to receive calls such as "I was able to connect to server x until IPsec was turned on" or "Everything worked yesterday, today I can't connect to anything!" In the experience of Microsoft IT, the rollout of IPsec increased call volumes for all types of network connectivity issues and "access denied" incidents because people were paying increased attention to application and network behaviors. If someone thought a problem might be related to IPsec, they called the help desk. A server and domain isolation implementation plan should include a call classification system so that help desk personnel can provide clear reports about the volume and nature of IPsec-related problems.

After appropriate administrative information is obtained from the caller, help desk staff should follow a defined troubleshooting process. Because IPsec policy designs may vary in their impact on communications, and because the rollout process may take several days or weeks, a flowchart should be defined and updated for each set of isolation changes being implemented. Help desk management personnel must be involved in this planning process.

The goal of the help desk should be to categorize the problem so that known solutions can be attempted. If these attempts do not resolve the problem, then help desk personnel can ensure the proper information is collected and escalate the problem to Tier 2 support. For example, the help desk should be able to identify various types of problems in the following ways:

Network connectivity. Use ping and tracert Internet Control Message Protocol (ICMP) messages to test network paths.

Name resolution. Use ping <destination name> and nslookup.

Applications. Some applications work (for example, net view), but others do not when communicating with the same destination.

Services. For example, determine whether the server is running the Routing and Remote Access (RRAS) service, which creates a conflicting automatic IPsec policy for L2TP.

The caller’s computer. Determine whether it can access any host or specific trusted host destination computers that are used for help desk testing and diagnosis.

The target computer. Determine whether the caller's computer can access all help desk computers that are used for testing but cannot access a certain destination computer.  

Depending on the organization, the help desk may use Remote Assistance or Remote Desktop to connect to the caller's computer. The guidelines provided in this chapter do not require remote access, although these tools may be a useful alternative to guiding the caller through the IPsec Monitor Microsoft Management Console (MMC) snap-in or the Event Log viewer.

In scenarios where server isolation is used without domain isolation, help desk personnel should be aware of which servers are members of the isolation group.

Assign Scope and Severity

One of the first questions that Tier 1 support must address is: who is affected by the problem? Support personnel need to understand if the problem is shared by other users and, if so, how many and where they are. The support staff must then look at the extent of the problem. For example, does it affect connectivity to a single server, or are there more extensive problems such as logon or authentication failures across large parts of the network?

Problems with connectivity can involve many different layers and technologies that are used in network communications. Support engineers should be aware of how Windows TCP/IP network communications work in general, as well as specific issues related to the solution. This section reviews the different types of problems and common issues for each that Tier 1 support must handle.

Computer-specific problems. IPsec-protected communications require mutual Internet Key Exchange (IKE) computer authentication. Computers that initiate communications and computers that respond to communications must have valid domain accounts and access to domain controllers for their domain. Furthermore, IPsec policy assignment and network access controls depend upon computer accounts being in the correct domain groups. Other computer-specific issues that may affect IPsec behavior include the following:

The operating system does not have the correct service pack, patch or registry key configuration.

The computer has certain software installed or particular services running.

The network connection is using a specific IP address or communicating using a particular network path.  

Because of these types of issues, some computers may experience problems with connectivity while others do not.

Note   All of the IPsec troubleshooting tools discussed in this chapter require local administrator group privileges.

Network location and path-specific problems. In a server and domain isolation solution or other widespread deployment of IPsec, it is likely that all TCP and UDP traffic will be encapsulated. Therefore, network devices along the path will see only IKE, IPsec, and ICMP protocols. If there are any network problems in the transmission of these three protocols between the source and destination, then communication may be blocked between the two computers.

User-specific problems. The deployment of IPsec, such as in a server and domain isolation scenario, can affect the network logon rights of domain users. For example, the problem may only affect users who are not in an authorized group for network access, or an authorized user may have problems obtaining Kerberos authentication credentials that contain the proper group memberships. There may be differences in behavior between domain and local user or service accounts.

Two other features of the server and domain isolation solution that are also typically found in enterprise deployments of IPsec are the use of subnet filters to define the address ranges used on the internal network, and the application of IPsec policies that are based on domain membership and group membership, regardless of where a computer is located on the internal network. Consequently, if there is a problem with the design of the subnet filters or the network path used by that computer to reach other computers, connectivity problems may appear in only certain parts of the network, when using a certain IP address (for example, a wireless address and not a LAN address), or only on certain computers.
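One way to check whether an address-specific failure lines up with the subnet filters is to test each of a computer's addresses against the filter list. The following is a minimal sketch that uses Python's standard ipaddress module. The subnet ranges, the addresses, and the function name are hypothetical examples and are not taken from this solution.

```python
import ipaddress

def uncovered_addresses(addresses, subnet_filters):
    """Return the addresses that match none of the subnet filters."""
    networks = [ipaddress.ip_network(s) for s in subnet_filters]
    missing = []
    for addr in addresses:
        ip = ipaddress.ip_address(addr)
        if not any(ip in net for net in networks):
            missing.append(addr)
    return missing

# Hypothetical internal subnet filters and one computer's two addresses.
filters = ["10.1.0.0/16", "10.2.0.0/16"]
host_addresses = {"LAN": "10.1.4.20", "wireless": "172.16.9.33"}
print(uncovered_addresses(list(host_addresses.values()), filters))
```

In this example only the wireless address falls outside the filters, which would be consistent with a problem that appears on the wireless connection but not on the LAN connection.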

Troubleshooting Flowcharts

The call handling flowcharts in this section were developed by Microsoft IT to help classify Tier 1 IPsec support problems. In addition to standard tools, two of the flowcharts refer to an IPsec policy refresh script, a description of which is provided in the "Support Script Examples" section later in this chapter.

Figure 7.1 is used for initial diagnosis and to determine the type of problem:

Is it a network connectivity problem? If so, attempt basic network troubleshooting. If unsuccessful, escalate to Tier 2 support.

Is it a name resolution problem? If so, attempt basic name resolution troubleshooting. If unsuccessful, escalate to Tier 2 support.

Is it an application problem? If so, escalate to Tier 2 support.

Is it an IPsec problem with the caller's computer? If so, go to Figure 7.2.

Is it an IPsec problem with the target computer the caller is trying to reach? If so, go to Figure 7.3.  

Figure 7.1  Troubleshooting process for failure to communicate with a target computer


Note   This flowchart assumes the caller's computer is running IPsec and that DNS reverse lookup zones are configured to allow the correct operation of the ping -a command.
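The order of the questions in Figure 7.1 can be sketched as a small decision function. This is an illustration only: the observation names (ping_by_ip_ok, name_resolves, some_apps_work, reaches_test_hosts) are hypothetical, and the mapping from each observation to a problem type is an assumption based on the problem types listed earlier in this section, not a transcription of the flowchart.

```python
def classify_call(obs):
    """Map help desk observations to the next step, in the order used by Figure 7.1."""
    if not obs["ping_by_ip_ok"]:
        return "network connectivity: basic troubleshooting, then Tier 2"
    if not obs["name_resolves"]:
        return "name resolution: basic troubleshooting, then Tier 2"
    if obs["some_apps_work"]:
        # Some applications reach the destination while others do not.
        return "application: escalate to Tier 2"
    if not obs["reaches_test_hosts"]:
        # The caller cannot reach the trusted hosts used for help desk testing.
        return "caller computer IPsec: go to Figure 7.2"
    return "target computer IPsec: go to Figure 7.3"

example = {
    "ping_by_ip_ok": True,
    "name_resolves": True,
    "some_apps_work": False,
    "reaches_test_hosts": True,
}
print(classify_call(example))
```

An organization would replace these observations with the questions in its own flowchart, which should be updated for each set of isolation changes.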

Figure 7.2 is designed to help identify problems with the caller's own computer. Note that in addition to diagnostics, this flowchart references the use of an IPsec policy refresh script (see "Support Script Examples" later in this chapter), which may fix the problem without necessarily identifying it. The steps in Figure 7.2 help determine the following potential problems with the caller's computer:

Is it an RRAS issue? If so, either stop the RRAS service (if RRAS is not required) or escalate the problem to Tier 2 support.

Is it a policy issue? If so, try to refresh the Group Policy and the IPsec policy.

Is it a domain account issue? If so, create a domain account for the caller's computer.

Is it none of the above? If IPsec policy refresh and/or creating a domain account do not solve the problem, escalate the issue to Tier 2 support.  

Figure 7.2  Troubleshooting caller computer IPsec-related problems


Figure 7.3 is designed to help identify problems with a particular target computer. Note that this flowchart also references the use of an IPsec policy refresh script that may fix the problem without necessarily identifying it. Figure 7.3 helps determine the following potential problems with the target computer (or the path to it):

Is it an RRAS issue? If so, escalate to Tier 2 support.

Is it an IPsec policy issue? If so, try to refresh the Group Policy and the IPsec policy. Then check network connectivity.

Is it a network connectivity issue? If so, escalate to Tier 2 support.

Is it a logon right issue? If so, escalate to Tier 2 support.  

Figure 7.3  Troubleshooting target computer IPsec-related problems


After the Tier 1 support staff has worked through the flowcharts, the problem status will be one of the following:

Fixed understood. This status means the problem has been resolved and the reason for the problem may have been determined.

Fixed unclear. This status means the issue is resolved but the reason for the problem is not fully understood. For example, an IPsec policy refresh may solve the problem but does not necessarily explain why an incorrect policy, or no policy at all, came to be applied.

Not fixed. This status means the problem is still outstanding, but the likely causes have been identified and the issue is escalated to Tier 2 support.

Prevention of Social Engineering Attacks

In an isolation solution, help desk personnel may become aware of specific areas within the IT environment that are not protected by IPsec, such as computers that are members of the exemption list. They may not be used to protecting sensitive information, because in other security solutions such critical information is usually only available to higher-level support teams. For this reason, help desk personnel should be trained in how to detect and resist social engineering attacks.

In a social engineering attack, an untrusted person attempts to gain information about how security is implemented and where security is weak, often by simply taking advantage of the human tendency to trust other people. The following information should be carefully controlled by help desk personnel:

Members of the exemption list. The list of IP addresses in the exemption list filters is likely available to local administrators on all trusted hosts by using the IPsec Monitor MMC snap-in, or by viewing the domain IPsec policy cache in the local registry. In addition, the security settings used in the organization may provide non-administrative users with read access to the cache. After domain isolation is fully implemented, attackers must scan the network to detect exempted computers, which will be able to respond to TCP and UDP connection requests. Note that DNS servers, DHCP servers, and WINS servers are easily identified from the DHCP configuration, and domain controllers are easy to locate by using either a DNS query or a UDP Lightweight Directory Access Protocol (LDAP) query.

Computers in the organization that are not participating in the isolation solution. For example, certain domains or server types may not be included in the solution.

Computers that do use server isolation or require machine-based access control. The servers that contain the most sensitive information usually have the most security protections in place.

Users who are administrators or have special roles in the IT organization. In some cases, e-mail addresses are used as computer names or part of the computer name, thereby revealing logon names or e-mail addresses.

Subnets that are being used for specific purposes or by certain organizations. If this information is known, an attacker can then focus their attack on the most sensitive and valuable parts of the network.

Other network-based security measures that are being used. For example, knowledge of whether firewalls exist, whether router filters permit certain traffic, or whether network intrusion detection is being used is very helpful for an attacker.  

Help desk personnel should also be trained to be wary if a caller asks them to connect to the IP address of the caller's computer to see what is wrong. For example, an attacker might ask someone at the help desk to connect to their computer using file sharing, Remote Desktop, Telnet, or another network protocol. If a help desk person makes the connection without IPsec, the attacker's computer can learn information about the password or (in some cases, such as with Telnet) steal the password. This situation can occur because some client network protocols do not first authenticate and establish a strong trust with the destination computer, or they do not require strong password protections before revealing user identity or password-related information.

Support Script Examples

For most troubleshooting scenarios, a solution can be quickly determined after the right information is identified. This information may be found using various Windows tools, such as those referenced in the flowcharts. In the Woodgrove Bank solution, a number of scripts were developed to provide key information without requiring Tier 1 support staff to have detailed knowledge of tool operations and syntax. These scripts are available in the Tools and Templates folder of the download for this guide.

Scripts Available for Tier 1 Support

If the user is a local administrator of their computer, help desk personnel can have them run one of three scripts provided with this solution. These scripts are examples of the customized scripts used for the Woodgrove Bank environment that is detailed in this guide. They are described in this chapter to illustrate how scripts can be used to support the troubleshooting process.

Note   These scripts are tested examples but are not supported by Microsoft. They should be used as a basis for an organization's own customized solution.

IPsec_Debug.vbs

In addition to providing debug information, this script may actually fix some problems. It stops and restarts the IPsec service (which deletes all current IKE and IPsec security associations), forces a Group Policy refresh to reload the current domain-assigned IPsec policy from the Active Directory® directory service, and updates the policy cache. To avoid loss of connectivity for remote desktop sessions, the script should be downloaded to the caller's computer and run locally by an account that has administrative privileges. Use the following syntax to run the script at a command prompt:

    cscript IPsec_Debug.vbs

The script performs the following functions:

Discovers the operating system version

Calls Detect_IPsec_Policy.vbs

Increases Group Policy logging

Increases Kerberos version 5 authentication protocol logging

Purges current Kerberos protocol tickets

Refreshes Group Policy

Enables IPsec logging

Performs PING and SMB (Net View) tests

Detects IPsec file versions

Runs policy and network diagnostic tests

Copies IPsec 547 events to a text file

Disables IPsec logging

Restores Kerberos protocol logging

Restores Group Policy logging  

This script also enables all IPsec-related logs for troubleshooting by Tier 2 support.
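The list of functions follows a common pattern: each logging level that is raised before the tests is restored afterward, in reverse order. The following minimal sketch illustrates that pattern with nested context managers. It only records the actions in a list and does not change any system setting; the function names are hypothetical and this is not the logic of IPsec_Debug.vbs itself.

```python
from contextlib import contextmanager

@contextmanager
def raised_logging(name, log):
    """Raise one logging level for the duration of the block, then restore it."""
    log.append(f"increase {name} logging")
    try:
        yield
    finally:
        # Runs even if a test fails, so logging is never left raised.
        log.append(f"restore {name} logging")

def run_diagnostics(tests):
    """Record the order in which logging is raised, tests run, and logging is restored."""
    actions = []
    with raised_logging("Group Policy", actions):
        with raised_logging("Kerberos", actions):
            with raised_logging("IPsec", actions):
                for test in tests:
                    actions.append(f"run {test}")
    return actions

for action in run_diagnostics(["ping test", "net view test"]):
    print(action)
```

Restoring the original levels in a finally clause means that a failed test does not leave verbose logging enabled on the caller's computer.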

Detect_IPsec_Policy.vbs

This script determines whether the computer is running the correct IPsec policy by checking the current local registry cache for policy version information for the domain IPsec policy. Use the following syntax to run the script at a command prompt:

    cscript Detect_IPsec_Policy.vbs

Note   This script is also called from IPsec_Debug.vbs, and therefore does not need to be run in addition to that script.

Refresh_IPsec_Policy.vbs

This script is the IPsec policy refresh script referenced in the troubleshooting flowcharts. It refreshes computer Kerberos authentication protocol tickets and Group Policy, and may fix the problem if it is caused by an incorrect IPsec policy assignment or a Group Policy download failure. Use the following syntax to run the script at a command prompt:

    cscript Refresh_IPsec_Policy.vbs

Escalation

When help desk personnel need to escalate a likely IPsec problem, the following information should be collected by Tier 1 and passed with the service request:

Log files generated with the IPsec_Debug.vbs script.

The caller's machine name so that the next support tier can identify the log file generated by the script.

The destination computer to which access is denied, so that escalation can be directed to the proper support group.

Server isolation scenarios often have their own support team to investigate membership of network access groups.
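A service request form can check that these items are present before the incident is passed to Tier 2. The following is a minimal sketch of such a check; the class name, the field names, and the sample values are hypothetical and would need to match the organization's own service request system.

```python
from dataclasses import dataclass, field

@dataclass
class EscalationRequest:
    """Information that Tier 1 passes to Tier 2 with a likely IPsec problem."""
    caller_machine_name: str
    destination_computer: str
    debug_log_files: list = field(default_factory=list)

    def missing_items(self):
        """Return the names of required items that have not been supplied."""
        missing = []
        if not self.debug_log_files:
            missing.append("IPsec_Debug.vbs log files")
        if not self.caller_machine_name.strip():
            missing.append("caller's machine name")
        if not self.destination_computer.strip():
            missing.append("destination computer")
        return missing

# Hypothetical request with no log files attached yet.
request = EscalationRequest("CLIENT-01", "SERVER-01")
print(request.missing_items())
```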

Tier 2 Troubleshooting Preparation

Tier 2 support has two main roles. First, as the recipient of all Tier 1 escalations, Tier 2 validates issues and reviews the steps taken by Tier 1 to ensure that no troubleshooting steps were missed. In this respect, Tier 2 should confirm that any escalated issue is really due to IPsec, and not a misdiagnosis. Second, as skilled network support engineers, Tier 2 support staff members should be able to use their skills and experience (listed in the following section) to successfully resolve the problem through log analysis without gaining administrative control of the computer. However, logs only capture information, and corrective actions require administrative access. It is not expected that a Tier 2 support person should be a domain administrator or be able to make changes in domain-based IPsec policy or computer group memberships.

Tier 2 Support Skills

Support staff that provide Tier 2 IPsec support should have skills and expertise in the following areas:

Group Policy. Know what policies should be assigned, how they are assigned, and be able to perform the following tasks:

Check access control lists (ACL) on Group Policy objects (GPO).

Check GPO settings.

Check group memberships for computers and users.  

Experience with third-party software used by the organization.

Authentication failure identification.

Be able to verify that a domain computer account is OK by using the netdiag and nltest utilities.

IPsec configuration. Be able to perform the following tasks:

Verify IPsec filter configurations.

Reload IPsec domain policy.

Disable IPsec entirely, or just the domain policy to use local policy for testing.

Troubleshoot the IPsec IKE negotiation process and security protocols.

Networking. Be able to perform the following tasks:

Troubleshoot the network protocol stack on a host machine.

Understand and troubleshoot the information that is gathered in a network trace.

Troubleshoot network path problems, including TCP Path MTU discovery and virtual private network (VPN) remote access solutions.  

Issues Inherent with the Use of IPsec

As indicated in the previous section, Tier 2 support personnel for a server and domain isolation solution will need to know the details of IPsec-protected communications, but they also must be able to isolate problems related to other technology components.

For successful IPsec communication between two computers, both computers usually require a compatible IPsec policy. For example, an IPsec policy may block communication if the remote computer does not have an appropriate IPsec policy. Although this may be an intended or acceptable behavior during the rollout of a policy change, it may not be immediately apparent whether it blocks network connectivity with one or more computers and causes any application warnings or errors. In a worst case scenario, an administrator might accidentally assign an IPsec policy to all domain members that blocks all traffic. Unless the mistake is realized immediately, with a correct assignment that quickly replicates after the original assignment, replication of the damaging policy is not easily stopped. This type of mistake results in a situation in which communications between a client and a domain controller would be required to use IPsec. Because the authentication used in this solution relies on the Kerberos protocol, any client that inherits this policy would not be able to complete the logon process—because they would be unable to obtain the required Kerberos ticket to secure the communications. Administrators must carefully plan any policy changes and ensure that procedural safeguards exist to mitigate this type of situation.

Background information on troubleshooting TCP/IP is provided in the troubleshooting guides listed in the "More Information" section at the end of this chapter. However, many of the procedures referred to in these guides will only work while IPsec is providing successful connectivity. If IKE or IPsec is failing, then most of these procedures and tools will probably become ineffective. In a server and domain isolation scenario, some of the procedures documented in the background guides may not work at all, even if IPsec is providing successful connectivity. A support organization should expect to update and customize troubleshooting tools and procedures to remain effective within a server and domain isolation environment. Because there are many different ways that IPsec policies can be deployed to control and help secure traffic, it is unlikely that organizations will be able to rely solely on existing procedures and a generic toolkit.

It is important for support personnel to have documented examples of the expected output of network troubleshooting tools that are obtained from a lab environment where server and domain isolation or some other IPsec deployment is functioning correctly. In many cases, network diagnostic tools do not expect three-second delays for Fall back to clear, or the small delays required for IKE initial negotiation of IPsec security associations (SA). Therefore, the tools may display one result when run initially but a different result when run a few seconds later. Furthermore, where network access is deliberately denied by IPsec, the tools will report failures. The type of failure will depend on the tool and the IPsec environment.
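Because a first attempt can fail while IKE is negotiating or while Fall back to clear is timing out, a diagnostic check can be run twice with a pause in between and the two results compared. The following minimal sketch shows the idea. The probe is any caller-supplied function that returns True on success; the five-second default pause is an assumption chosen to exceed the three-second delay and should be tuned in a lab.

```python
import time

def probe_twice(probe, delay_seconds=5.0, sleep=time.sleep):
    """Run a connectivity probe twice and interpret the pair of results."""
    first = probe()
    sleep(delay_seconds)
    second = probe()
    if first and second:
        return "reachable"
    if not first and second:
        return "reachable after delay (IKE negotiation or Fall back to clear likely)"
    if first and not second:
        return "unstable"
    return "unreachable"

# Simulated probe that fails on the first attempt and succeeds on the second.
results = iter([False, True])
print(probe_twice(lambda: next(results), sleep=lambda seconds: None))
```

Recording which of the four outcomes each tool produces in a correctly functioning lab gives support personnel a baseline to compare against.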

Note   In the Tier 1 section the terms caller and target were used to help the support staff troubleshoot common problems. In the Tier 2 section it is preferable to use the IPsec terms initiator and responder to help make the more advanced troubleshooting processes clearer. The remainder of this chapter uses these IPsec terms.

Group Policy and Group Memberships

Domain-based IPsec policy depends upon Group Policy and the download of GPOs. If the client Group Policy system experiences errors in detecting GPO changes or in downloading them, then IPsec connectivity may be affected. If Group Policy assignment is controlled by organizational unit (OU) membership and computer accounts are inadvertently moved to a different OU, deleted, or recreated in the wrong OU, then an inappropriate IPsec policy may become assigned.

This solution uses domain security groups to control policy assignment and to control network access. Group membership is contained within Kerberos version 5 authentication protocol tickets (both TGT and service tickets) that have fairly long lifetimes. Therefore, administrators must plan for the time required for computers to receive new Kerberos TGT and service ticket credentials that contain group membership updates. The Kerberos protocol makes it extremely difficult to determine if the Kerberos tickets for a computer contain the proper group memberships. This difficulty is "by design," as all the information about group membership is stored in an encrypted form within the ticket. Group membership must be determined by using the information within the directory service, not from the tickets themselves.

Kerberos Authentication

The server and domain isolation design uses the Kerberos version 5 protocol for IKE authentication. Because the Kerberos protocol requires successful network connectivity and available service from DNS and domain controllers, lack of connectivity will cause Kerberos authentication and IKE to fail. (IKE will also fail if Kerberos itself fails.) Therefore, connectivity problems between computer A and computer B may be caused by blocked network connectivity between computer A and computer C, because the block prevents the Kerberos protocol from authenticating with a domain controller. In situations like this, the information provided in the 547 events in the Windows audit and security logs generally provides invaluable guidance on the source of the problem.

IPsec-Protected Inbound Traffic Required

This server and domain isolation solution specifies that IPsec-protected communication is required for inbound access. This requirement will cause remote monitoring tools that run on untrusted computers or dedicated network monitoring devices to report that a remote computer is not contactable. If these computers or devices are not able to join the "trusted" environment, they will not be able to perform the monitoring role unless some specific exemptions are added to the design. Troubleshooting is complicated by the fact that IPsec may be required to establish connectivity to a trusted host, which means that an administrator may not be able to connect to a trusted host and then stop the IPsec service without losing connectivity. If the administrator's IPsec policy allows Fall back to clear, then the remote connection will experience a three- or four-second delay after the service is stopped on the remote computer. However, stopping the IPsec service on a remote computer will delete the IPsec SAs that are in use by all other currently connected computers. If these other computers are not able to Fall back to clear, then communications will stop and TCP connections will eventually time out. Because sudden breaks in TCP communications can cause data corruption in applications, stopping the IPsec service should be used only as a last option in the troubleshooting process. Before the IPsec service is stopped, the computer should be prepared to be shut down so that all connected users and applications can properly terminate communications.

Communication Direction Issues

One common troubleshooting scenario is successful communication in one direction but failed communication in the reverse direction. IKE authentication typically requires mutual authentication between computers. If one computer cannot obtain a Kerberos ticket when it initiates IKE main mode for a remote computer, then IKE will fail. This situation could happen if the Kerberos client from the initiating computer could not access a domain controller in the domain of the destination computer. If computers are members of domains that are not mutually trusted (two-way trust), then IKE main mode negotiations will succeed when one computer initiates and fail if the other computer initiates. Similarly, inbound network logon rights may differ on two computers. It is possible for IKE main mode and quick mode negotiation to fail in one direction not only for these reasons, but also if the IPsec policy designs are not compatible on both sides.

Host-based firewalls that intercept traffic above the IPsec layer can enforce directionality on connections. Some host-based firewalls intercept traffic below the IPsec layer. After successful IPsec communication is established, IPsec-protected traffic is likely to be allowed in both directions for a period of time.

Stateful filtering by a network router or firewall can also block IKE rekey actions or IPsec traffic flow without affecting other diagnostic protocols such as ICMP. TCP and UDP ports may not be accessible on one computer because a service is not running, or because a device that works above the IPsec layer (such as Windows Firewall or a network router) is blocking access.

Network Traces and Advanced Network Path Troubleshooting

Failures in IKE negotiation often cause the computer that experiences the failure to stop responding to the IKE negotiation, or in some cases to resend the last "good" message until the retry limit expires. IKE must be able to send fragmented UDP datagrams that contain the Kerberos tickets, because such packets often exceed the path maximum transmission unit (PMTU) for the destination IP address. If fragmentation is not properly supported, such fragments may be dropped by network devices along a certain path. In addition, the network may not pass IPsec protocol packets or fragments of IPsec packets correctly. IPsec integration with TCP enables TCP to reduce the packet size to accommodate the overhead of IPsec headers. However, the TCP negotiation of the maximum segment size (MSS) during the TCP handshake does not take into account IPsec overhead. Consequently, there is an increased requirement for ICMP PMTU discovery in the network to ensure successful IPsec-protected TCP communication. Therefore, troubleshooting connectivity failures may require network traces of one or both sides of the communication, as well as logs from both sides of the communication.
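To make the effect of IPsec overhead on packet size concrete, the following Python sketch estimates the size of an ESP transport mode packet that carries a TCP segment. The byte counts are assumptions for illustration and do not come from this chapter: IPv4 and TCP headers without options (20 bytes each), an 8-byte ESP header, 3DES-CBC (8-byte IV and 8-byte block size), and a 12-byte HMAC-SHA1-96 integrity value. The function names are hypothetical.

```python
IP_HEADER = 20     # IPv4 header without options (assumption)
TCP_HEADER = 20    # TCP header without options (assumption)
ESP_HEADER = 8     # SPI (4 bytes) + sequence number (4 bytes)
ESP_TRAILER = 2    # pad length (1 byte) + next header (1 byte)


def esp_packet_size(tcp_data, iv_len=8, block=8, icv_len=12):
    """Return the total IPv4 packet size for tcp_data bytes protected by ESP."""
    protected = TCP_HEADER + tcp_data + ESP_TRAILER
    padded = -(-protected // block) * block  # round up to the cipher block size
    return IP_HEADER + ESP_HEADER + iv_len + padded + icv_len


def max_tcp_data(mtu, iv_len=8, block=8, icv_len=12):
    """Return the largest TCP data size whose ESP packet still fits in mtu."""
    room = mtu - IP_HEADER - ESP_HEADER - iv_len - icv_len
    padded = (room // block) * block  # round down to the cipher block size
    return padded - ESP_TRAILER - TCP_HEADER


if __name__ == "__main__":
    print(esp_packet_size(1460))  # segment sized for an MSS of 1460
    print(max_tcp_data(1500))     # largest TCP data for a 1500-byte MTU
```

Under these assumptions, a segment that carries 1460 bytes of TCP data becomes a 1536-byte packet, which exceeds a 1500-byte Ethernet MTU, whereas 1426 bytes of TCP data still fits. Other algorithms, tunnel mode, or UDP-ESP encapsulation change the numbers.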

Technical support engineers should understand how to read network traces, and also understand the IKE negotiation. Servers should have the Windows Network Monitor software installed. Windows 2000 Network Monitor provides parsing of IPsec AH and IKE. Windows Server 2003 adds support for parsing IPsec ESP-null, parsing ESP when encryption is offloaded, and parsing UDP-ESP encapsulation used for NAT traversal.

When you troubleshoot IPsec and take network traces between hosts, it is considered best practice to lower the ESP encryption level (if it is currently DES or 3DES) to ESP-null. The resulting trace shows the protected payload in the clear, which is much easier to read and understand than a capture of encrypted traffic.

The Troubleshooting Toolkit

Before you start troubleshooting, it is important to identify utilities that can extract information to aid the troubleshooting process. This section does not attempt to duplicate information that is found in Windows 2000, Windows XP, or Windows Server 2003 Help or that is accessible through the IPSec Troubleshooting Tools page.

Detailed tool information is only provided in this section if it is not readily found through the referenced Troubleshooting tools page or where it is useful to have summaries across operating system versions.

IP Security Policy Management MMC Snap-In

The IP Security Policy Management MMC snap-in is used to create and manage local IPsec policies or IPsec policies stored in Active Directory. It can also be used to modify IPsec policy on remote computers. The snap-in is included in the Windows Server 2003, Windows XP, Windows 2000 Server, and Windows 2000 Professional operating systems. It can be used to view and edit IPsec policy details, filters, filter lists, and filter actions, and to assign and unassign IPsec policies.

IP Security Monitor MMC Snap-In

The IP Security Monitor MMC snap-in shows IPsec statistics and active SAs. It is also used to view information about the following IPsec components:

IKE main mode and quick mode

Local or domain IPsec policies

IPsec filters that apply to the computer  

Although this snap-in is part of the Windows XP and Windows Server 2003 operating systems, there are functionality and interface differences between the Windows XP and Windows Server 2003 versions. Also, the Windows Server 2003 version has the following additional features:

Provides details on the active IPsec policy, including the policy name, description, date last modified, store, path, OU, and Group Policy object name. To obtain the same information in Windows XP, you must use the Ipseccmd command-line tool (described later in this section).

Statistics are provided separately for main mode or quick mode, in folders under each mode rather than in one display.  

Note   In Windows 2000, IP Security Monitor is a stand-alone executable program (IPsecmon.exe) with its own graphical user interface (GUI). This tool and how it can be used is described in Microsoft Knowledge Base article 257225, "IPsec troubleshooting in Microsoft Windows 2000 Server".

An update to this snap-in is available for Windows XP as part of the update that is described in Microsoft Knowledge Base article 818043, "L2TP/IPSec NAT-T update for Windows XP and Windows 2000". This update makes it possible to view Windows Server 2003 computers from Windows XP. The updated IP Security Monitor MMC snap-in can also read advanced features created in Windows Server 2003 (for example, Diffie-Hellman 2048 group information, certificate mappings, and dynamic filters), but cannot edit them. For more information see the referenced Knowledge Base article.

Netsh

Netsh is a command-line scripting utility that allows you to display or modify the network configuration. In addition, you can use Netsh either locally or remotely. Netsh is available for Windows 2000, Windows XP, and Windows Server 2003. However, the Windows Server 2003 version is enhanced to provide IPsec diagnostic and management functionality. The Netsh commands for IPsec are only available for Windows Server 2003; they replace Ipseccmd in Windows XP and Netdiag as used in Windows 2000.

Ipseccmd

Ipseccmd is a command-line alternative to the IP Security Policy MMC snap-in. It is only available for Windows XP, and Windows XP Service Pack 2 provides additional functionality for this tool.

Ipseccmd must be installed from the Support Tools folder on the Windows XP CD. An updated version is available with Windows XP SP2, which must be installed from the Support Tools folder on the Windows XP SP2 CD. The pre-SP2 version does not work on updated computers, and the updated version does not work on pre-SP2 computers.

The updated Ipseccmd utility has the following capabilities:

Dynamically turns IKE logging on and off

Displays information about a currently assigned policy

Enables you to create a persistent IPsec policy

Can display the currently assigned and active IPsec policy

For more information on the updated Ipseccmd utility, refer to Microsoft Knowledge Base article 838079, "Windows XP Service Pack 2 Support Tools".

To display all IPsec policy settings and statistics for diagnostics, use the following syntax:

ipseccmd show all

To display currently assigned and active IPsec policies (local or Active Directory), use the following syntax:

ipseccmd show gpo

Note   This command only works with the SP2 version.

To enable debug logging in Windows XP SP2, use the following syntax (no IPsec service restart is required):

    ipseccmd set logike

To turn off debug logging, use the following syntax (again, no IPsec service restart required):

    ipseccmd set dontlogike

Note   You can only use Ipseccmd to enable Oakley logging in Windows XP SP2; the above commands do not work on pre-SP2 computers.

Netdiag

Netdiag is a command-line diagnostic tool that is used to test network connectivity and configuration, including IPsec information. Netdiag is available in Windows 2000, Windows XP, and Windows Server 2003, but its functionality changes with the operating system version. In Windows Server 2003, Netdiag no longer includes IPsec functionality; instead, you can use the netsh ipsec context, and basic network testing is also obtainable from Netsh. For all operating system versions, it is important to make sure you are using the latest version by checking the Microsoft Download Center. Netdiag must be installed from the Support Tools folder of whichever Windows operating system CD is used.

Note   Netdiag is not updated when Windows XP SP2 is installed.

The relevance of Netdiag to IPsec troubleshooting depends on the operating system version. Functionality differences are described in the following table.

Table 7.1  Netdiag IPsec Functionality in Different Operating Systems

| Command | Description | Windows 2000? | Windows XP? | Windows Server 2003? |
| --- | --- | --- | --- | --- |
| netdiag /test:ipsec | View the assigned IPsec policy | Yes | Yes | No** |
| netdiag /test:ipsec /debug | Display the active IPsec policy, filters, and quick mode statistics | Yes | Yes* | No** |
| netdiag /test:ipsec /v | Display the active IPsec policy, filters, and main mode statistics | Yes | Yes* | No** |

* Provides network diagnostics, but displays IPsec policy name only. Additional IPsec information is available by using Ipseccmd.

** Provides network diagnostics, but does not display any IPsec information. Instead, use the following syntax: netsh ipsec dynamic show all.
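Because the right diagnostic command depends on the operating system version, a support script can select it from a small lookup table. The following Python sketch uses only the commands named in this section; the dictionary keys and the function name are hypothetical, and the choice of one command per platform is an illustrative simplification.

```python
# Commands are taken from the tool descriptions in this section.
# The operating system labels used as keys are hypothetical.
IPSEC_DIAG_COMMANDS = {
    "Windows 2000": ["netdiag", "/test:ipsec", "/debug"],
    "Windows XP": ["ipseccmd", "show", "all"],
    "Windows Server 2003": ["netsh", "ipsec", "dynamic", "show", "all"],
}


def ipsec_diag_command(os_name):
    """Return the argument list of the IPsec diagnostic command for os_name."""
    try:
        return list(IPSEC_DIAG_COMMANDS[os_name])
    except KeyError:
        raise ValueError("no IPsec diagnostic command known for %r" % os_name)


if __name__ == "__main__":
    for name in IPSEC_DIAG_COMMANDS:
        print(name, "->", " ".join(ipsec_diag_command(name)))
```

The returned list can be passed to a process launcher on the target computer, provided the corresponding tool has been installed there.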

Other Useful Tools for Supporting IPsec

In addition to the IPsec-specific tools noted earlier, the following table lists other tools that may be useful in troubleshooting and should be included in your Tier 2 troubleshooting toolkit.

Table 7.2  Miscellaneous Useful Tools for IPsec Troubleshooting

| Tool | Supported operating systems | How to obtain | Role | More information |
| --- | --- | --- | --- | --- |
| Ipsecpol.exe | Windows 2000 only | Windows 2000 Resource Kit | Configures IPsec policies in the directory or in a registry | Windows 2000 Resource Kit Tools Help |
| Gpresult | Windows 2000, Windows Server 2003, Windows XP | Windows 2000: Resource Kit; Windows XP and Windows Server 2003: part of the operating system | Check when Group Policy was last applied | Windows 2000 Resource Kit Tools Help, Windows XP and Windows Server 2003 Help |
| Resultant Set of Policy (RSoP) MMC snap-in | Windows Server 2003, Windows XP | Part of the operating system | View IPsec policy for a computer or for members of a Group Policy container | Windows Server 2003 Help |
| Srvinfo | Windows 2000, Windows Server 2003, Windows XP | Windows 2000 and Windows Server 2003 Resource Kits | Services, device drivers, and protocols information | Windows Server 2003 Resource Kit Tools Help |
| PortQry | Windows 2000, Windows Server 2003, Windows XP | Windows Server 2003 Resource Kit | Network port status reporting | http://support.microsoft.com/kb/310099 |
| NLTest | Windows 2000, Windows Server 2003, Windows XP | Support Tools | Test trust relationships and Netlogon secure channels | Windows Server 2003 Support Tools Help |
| KList | Windows 2000, Windows Server 2003, Windows XP | Windows 2000 and Windows Server 2003 Resource Kits | Kerberos ticket reporting | Windows Server 2003 Resource Kit Tools Help |
| Pathping | Windows 2000, Windows Server 2003, Windows XP | Part of the operating system | Network connectivity and path testing | Windows Help |
| LDP | Windows 2000, Windows Server 2003, Windows XP | Support Tools | LDAP client for Active Directory testing | Windows Server 2003 Support Tools Help |

Using ICMP-Based Tools with IPsec

Windows XP and Windows Server 2003 Ping, Pathping, and Tracert are aware of IPsec, but may not function correctly until soft SAs are established (if Fall back to clear is allowed). If IPsec SAs were negotiated successfully to encapsulate the ICMP traffic used by these utilities, they would not be able to detect any intermediate hops (routers) between the client and the target destination. Calculations on packet loss for Ping may show packets lost during the time required for IKE to successfully negotiate an IPsec SA pair with the target. Calculations on packet loss for each intermediate hop will not be available when ICMP traffic is encapsulated by IPsec.

These ICMP utilities are designed to detect whether the IPsec driver matched an IPsec filter to the outbound ICMP echo request packet, and therefore requested IKE to negotiate security. When this happens, the message "Negotiating IP security" is displayed by the utility. A known bug in Windows 2000 causes the Ping utility to not wait the proper amount of time before retrying the next echo request, which means that the command may complete immediately instead of waiting three seconds until the soft SA is established. The Ping utility in Windows XP and Windows Server 2003 waits the expected number of seconds before the next echo request is sent.

The "Negotiating IP security" message will not display under the following conditions:

If the IPsec driver drops the outbound ICMP packet because of a blocking filter.

If the IPsec driver allows the ICMP packet to pass unsecured because of a permit filter or a soft SA.

If the IPsec driver does not detect the outbound packet (for example, if it was dropped by layers above the IPsec driver).

Note   Some tools that use ICMP may not be able to detect that IPsec is negotiating security and may produce inconsistent or erroneous results.
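When Ping output is saved to a file (as the troubleshooting scripts do), the lines that report IPsec negotiation can be counted separately from replies and timeouts. The following Python sketch shows one way to do this. The function name is hypothetical, and the matching is case-insensitive text matching on English-language output, which is an assumption; the sample text is illustrative and was not captured from a real computer.

```python
def summarize_ping_output(text):
    """Count echo replies, IPsec negotiation notices, and timeouts in ping output."""
    counts = {"replies": 0, "negotiating": 0, "timeouts": 0}
    for line in text.splitlines():
        lowered = line.strip().lower()
        # Require "bytes=" so that "Reply from ...: Destination host
        # unreachable" lines are not counted as successful replies.
        if lowered.startswith("reply from") and "bytes=" in lowered:
            counts["replies"] += 1
        elif "negotiating ip security" in lowered:
            counts["negotiating"] += 1
        elif "request timed out" in lowered:
            counts["timeouts"] += 1
    return counts


if __name__ == "__main__":
    # Illustrative sample only.
    sample = (
        "Negotiating IP Security.\n"
        "Negotiating IP Security.\n"
        "Reply from 10.0.0.5: bytes=32 time<1ms TTL=128\n"
        "Reply from 10.0.0.5: bytes=32 time<1ms TTL=128\n"
    )
    print(summarize_ping_output(sample))
```

A run in which only negotiation notices appear and no replies follow suggests that IKE was triggered but did not complete, which points toward the IKE troubleshooting steps rather than basic connectivity.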

The IPsec Troubleshooting Process

If Tier 1 support has clearly identified the problem, then Tier 2 support will be able to quickly find the relevant troubleshooting procedure in the following sections. In this model, Tier 1 support primarily handles client-related access problems. It is expected that administrative owners of servers will be able to perform basic network connectivity diagnostics and may skip Tier 1 support. However, each organization should adjust the model for its own support environment. Tier 2 support should focus on identifying where the failure to communicate is happening, and then investigate related possibilities in the architecture of the system.

If your organization is using the scripts that are provided as part of the troubleshooting process, you will have access to a number of text log files that can be used to help diagnose the problem. Descriptions of the files that the script generates are provided in the following table.

Table 7.3  Files Created from the IPsec_Debug.vbs Script

| File name | Description |
| --- | --- |
| <CompName>_FileVer.txt | Lists the file versions of various IPsec-related DLLs. |
| <CompName>_gpresult.txt | Output of the gpresult command. |
| <CompName>_ipsec_547_events.txt | Output of any IPSEC 547 errors in the Security event log. |
| <CompName>_ipsec_policy_version.txt | Output of the Detect_IPsec_Policy.vbs script. Shows the current policy version on the computer and whether it matches the Active Directory policy. |
| <CompName>_ipseccmd_show_all.txt | Only on Windows XP. This file captures the output of the ipseccmd command. |
| <CompName>_kerberos_events.txt | Output of any Kerberos events in the System event log. |
| <CompName>_klist_purge_mt.txt | Output from KList while purging machine tickets. |
| <CompName>_lsass.log | Copy of the lsass.log file, if present. |
| <CompName>_netdiag.txt | Output from running netdiag. |
| <CompName>_netsh_show_all.txt | Only on Server platforms. Output from the show all command in netsh. |
| <CompName>_netsh_show_gpo.txt | Only on Server platforms. Output from the show gpo command in netsh. |
| <CompName>_oakley.log | Copy of the Oakley.log file, if present. |
| <CompName>_OSInfo.txt | Output of current operating system information. |
| <CompName>_RegDefault.txt | Output of the original registry key values prior to changing. Can be used to manually reset the registry to previous values if the script fails for some reason. |
| <CompName>_userenv.log | Copy of the userenv.log file, if present. |
| <CompName>_<SrvName>_netview.txt | Output of the net view command against <SrvName>. |
| <CompName>_<SrvName>_ping.txt | Output of the ping command against <SrvName>. |
| <CompName>_winlogon.log | Copy of the winlogon.log file, if present. |
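When log files arrive from Tier 1 support, it can help to check quickly which of the files in Table 7.3 are present for a given computer. The following Python sketch builds the expected file names and reports the ones that are missing from a folder. The function names and the folder layout (all files in one directory) are assumptions; files marked "if present" or limited to one platform may legitimately be absent.

```python
import os

# Per-computer file name suffixes from Table 7.3.
COMPUTER_SUFFIXES = [
    "FileVer.txt", "gpresult.txt", "ipsec_547_events.txt",
    "ipsec_policy_version.txt", "ipseccmd_show_all.txt",
    "kerberos_events.txt", "klist_purge_mt.txt", "lsass.log",
    "netdiag.txt", "netsh_show_all.txt", "netsh_show_gpo.txt",
    "oakley.log", "OSInfo.txt", "RegDefault.txt", "userenv.log",
    "winlogon.log",
]
# Suffixes of files that also carry the target server name.
SERVER_SUFFIXES = ["netview.txt", "ping.txt"]


def expected_files(comp_name, srv_name):
    """Return every file name from Table 7.3 for one computer and server."""
    names = ["%s_%s" % (comp_name, s) for s in COMPUTER_SUFFIXES]
    names += ["%s_%s_%s" % (comp_name, srv_name, s) for s in SERVER_SUFFIXES]
    return names


def missing_files(directory, comp_name, srv_name):
    """Return the expected file names that are not found in directory."""
    present = {name.lower() for name in os.listdir(directory)}
    return [n for n in expected_files(comp_name, srv_name)
            if n.lower() not in present]
```

The comparison ignores case because Windows file names are not case-sensitive.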

Because there are many potential points of failure, this section addresses each architectural component in order, starting with network connectivity. Procedures are defined that will help you perform the following tasks:

Verify IP network configuration, network connectivity and service with domain controllers, and client-server path connectivity for IPsec-related protocols.

Verify correct application of Group Policy and IPsec policy on both client and server.

Investigate issues with IKE negotiation and IPsec-protected communication.

Identify the cause of a problem for Tier 3 escalation, if required.  

Consider the following example scenario: a client reports being able to ping a server, but not being able to connect to a file share on that server. This is the only server the client cannot access. A quick review of the Security Log for event 547 (IKE negotiation failure), which contains the IP address of the server, will indicate that the client has an IPsec policy and that IKE is being initiated. If the client event 547 indicates that the client IKE negotiation timed out, the server IKE likely failed the negotiation. Tier 2 support would then review the MOM event database for 547 events that are collected from the specified server, which will contain the current client IP address.

Warning: Starting and Stopping the IPsec Service

The Windows Server 2003 TCP/IP Troubleshooting document and other references describe how to determine if IPsec is causing a connectivity problem by stopping the IPsec service. Although this will stop IPsec filtering on the computer, it will also disable the protection that IPsec provides, expose the computer to untrusted network access, and disable packet protection. Also, in a domain isolation environment, TCP and UDP traffic that is not protected by IPsec will be dropped by other isolation domain members. If IPsec is disabled on one computer, it will cause connectivity interruptions with those remote computers that currently have IPsec security associations established. When the IPsec service is stopped, IKE sends delete notifications for all IPsec SAs and for the IKE SA to all actively connected computers. Remote computers with an IPsec policy that allows Fall back to clear will re-establish connectivity after a three-second delay. Remote computers with an IPsec policy that does not allow Fall back to clear will be unable to communicate.

Therefore, it is important to use the techniques discussed in the following sections to troubleshoot isolation scenarios without stopping the IPsec service. The IPsec service should be disabled only as a last resort to rule out IPsec-related problems for the following situations:

Broadcast and multicast traffic environments

Connections to remote computers that do not require IPsec for inbound access (for example, the computers that are members of the exemption list)  

In Windows 2000, stopping the IPsec service will unbind the IPsec driver from TCP/IP and unload the IPsec driver from memory.

In Windows XP and Windows Server 2003, stopping the IPsec service will delete all filters from the IPsec driver and set the driver mode to PERMIT. It does not unload the IPsec driver from memory. The IPsec service must be disabled and the computer restarted to avoid loading the IPsec driver.

In Windows 2000 and Windows XP SP1, IKE logging to the Oakley.log file requires a restart of the IPsec service. Stopping the service is not required to enable and disable IKE logging to the Oakley.log file in Windows XP SP2 and Windows Server 2003. The latest update to Ipseccmd for Windows XP SP2 provides the syntax ipseccmd set logike and ipseccmd set dontlogike to dynamically enable and disable IKE logging to the Oakley.log file. Windows Server 2003 IKE logging can be enabled dynamically using the Netsh commands described in online Help.

Verifying Network Connectivity

If Tier 1 support identifies possible network connectivity issues, then the first step is to determine if basic network connectivity exists. This determination involves verifying that the proper IP configuration is being used, that there is a valid network path between the initiator and the responder computer, and that name resolution services are working.

Network IP Address Configuration Problems

If dynamic IP configuration is not successful, or if communications are blocked after restarting the computer (or even during normal operation), IPsec may be the cause. In Windows Server 2003, such problems may be related to IPsec failsafe behavior (for example, if the computer is started in Safe Mode or Active Directory Recovery Mode).

Note   For details about Windows Server 2003 failsafe behavior, see "Understanding IPSec Protection During Computer Startup".

Windows Server 2003 resorts to failsafe behavior if the IPsec service cannot successfully start or cannot apply the assigned policy. Failsafe only applies when an IPsec policy is assigned to the computer and when the IPsec service is not disabled. Consequently, connectivity to or from a computer could fail during normal operation because the IPsec driver is not enforcing the domain-based IPsec policy. After you determine what traffic is allowed and blocked by bootmode and persistent configurations, a communications failure may be easy to explain. To obtain alternative or additional information, you can query the current state from the command line by using the following syntax:

netsh ipsec dynamic show config

For Windows Server 2003, the IPsec driver is loaded at computer startup time with the TCP/IP driver. Therefore, to remove the IPsec driver failsafe behavior, the IPsec service must be disabled and the computer restarted. The previously referenced IPsec deployment chapter includes recommended configuration of boot exemptions to exempt inbound Remote Desktop Protocol connections, which will ensure that remote access to the server is available when other traffic is blocked.

In a server and domain isolation solution, broadcast traffic and traffic to the DHCP servers are exempt to ensure that dynamic IP configuration works properly. However, the exemption list must be maintained manually and may become outdated. If a computer cannot obtain proper DHCP configuration (for example, if it uses an Auto-configuration IP address of 169.254.x.x) or has problems renewing the lease, then the IPsec policy should be examined for proper exemptions. With the IPsec service running, use Ipconfig to confirm there are no problems obtaining an address. For DHCP clients, open a command window and enter the following:

ipconfig /release
ipconfig /renew
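A script that collects addresses from many clients can flag those that fell back to an Auto-configuration address. The following Python sketch tests whether an address lies in the 169.254.x.x range mentioned above; the function name is hypothetical.

```python
import ipaddress

# The Auto-configuration (169.254.x.x) range as a network object.
AUTOCONFIG_NETWORK = ipaddress.ip_network("169.254.0.0/16")


def is_autoconfig_address(address):
    """Return True if address is an IPv4 Auto-configuration address."""
    return ipaddress.ip_address(address) in AUTOCONFIG_NETWORK


if __name__ == "__main__":
    for candidate in ("169.254.10.20", "10.0.0.5"):
        print(candidate, is_autoconfig_address(candidate))
```

An address in this range indicates only that DHCP configuration did not succeed; whether a missing IPsec exemption is the cause still has to be confirmed by examining the policy.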

If the address configuration problems only happen during computer startup for Windows XP SP2 and Windows Server 2003, the configuration for exemptions (default exemptions and boot exemptions) should be inspected.

Name Resolution Problems

The IPsec policy design used in the server and domain isolation scenarios should not interfere with typical procedures that are used to determine if name resolution is working. For example, in the Woodgrove Bank scenario, the IPsec policy design exempts all traffic to DNS and WINS servers. However, it is possible that DNS and WINS servers could be configured to not respond to Ping requests. Answer the following questions to confirm that name resolution is working properly while the IPsec service is running:

Can the client ping the DNS server IP address listed in its IP configuration?

Can nslookup find a DNS server?

Can the client ping the fully qualified DNS name of the target?

Can the client ping the shortened DNS or NetBIOS name of the target?  
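Because DNS and WINS servers might not respond to Ping requests, the name-to-address part of these checks can also be exercised without ICMP. The following Python sketch asks the operating system resolver for the addresses of each name; it is a rough complement to the questions above, not a replacement for nslookup. The function names are hypothetical, and the resolver may consult the HOSTS file or other name sources in addition to DNS.

```python
import socket


def resolve(name):
    """Return the sorted IP addresses that name resolves to, or [] on failure."""
    try:
        results = socket.getaddrinfo(name, None)
    except socket.gaierror:
        return []
    # Each result ends with a socket address whose first item is the IP address.
    return sorted({entry[4][0] for entry in results})


def check_names(names):
    """Map each name to its resolved addresses so that failures stand out."""
    return {name: resolve(name) for name in names}
```

For example, calling check_names with the fully qualified name and the short name of the target server shows whether both forms resolve, and whether they resolve to the same address.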

Potential sources of name resolution problems include an active and misconfigured HOSTS file, a misconfigured DNS server entry in IP properties, incorrect DNS record registrations, zone file update problems, Active Directory replication issues, Fall back to clear used for DNS servers, and DHCP auto-update issues.

Possible reasons for NetBIOS name resolution failures include an active and misconfigured LMHOSTS file, a misconfigured WINS server entry in IP properties, WINS server unavailability, incorrect WINS record, WINS replication problems, WINS proxy failures, and network timeouts reaching the WINS server.

For procedures to help troubleshoot Active Directory-integrated DNS, refer to the Active Directory Operations Overview: Troubleshooting Active Directory-Related DNS Problems page.

Some high-security environments may require DNS and WINS servers to be protected with IPsec, which can result in name resolution problems. For example, if DNS is integrated within Active Directory and there are duplicate filters for the same IP address in the IPsec policy, one filter may negotiate security to the DNS server and one may exempt the domain controllers. For more information, see the "Troubleshooting IPsec Policy" section later in this chapter.

If name resolution problems persist, you can get the filter list from the initiator and check for duplicate filters. You can use the following command line options to view the filter lists for this task:

ipseccmd show filters
netsh ipsec static show all
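After the filter lists have been transcribed from the command output, duplicates can be found mechanically. The following Python sketch is a minimal illustration that assumes the filters are already available as tuples; it does not parse Ipseccmd or Netsh output, whose format is not described here. The tuple layout and the function name are hypothetical, and only exact duplicates (same source, destination, protocol, and port) are reported.

```python
from collections import defaultdict


def find_duplicate_filters(filters):
    """Return filter definitions that appear in more than one filter list.

    filters is an iterable of (filter_list_name, source, destination,
    protocol, port) tuples. The result maps each duplicated definition to
    the names of the filter lists that contain it.
    """
    seen = defaultdict(list)
    for list_name, source, destination, protocol, port in filters:
        # Normalize case so that "Any"/"any" and "udp"/"UDP" compare equal.
        key = (source.lower(), destination.lower(), protocol.upper(), port)
        seen[key].append(list_name)
    return {key: names for key, names in seen.items() if len(names) > 1}
```

A definition that appears in both a negotiate-security filter list and an exemption filter list is a likely candidate for the conflict described above.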

If the name resolution problems still persist, the IPsec service should be stopped briefly (if possible) while name resolution tests are repeated. If name resolution tests only fail when the IPsec service is running, investigation should continue to determine which IPsec policy is being applied, as discussed later in this section.

Verifying Connectivity and Authentication with Domain Controllers

Because IPsec policy delivery, IKE authentication, and most upper layer protocols depend on access to domain controllers, tests for network connectivity and successful operation of authentication services should be performed before IPsec-specific troubleshooting steps (described in the next section) are performed. In a scenario such as Woodgrove Bank, IPsec policy design exempts all traffic to all domain controllers, so network connectivity tests to the domain controllers should not be affected by IPsec. However, the list of domain controller IP addresses in the exemption list must be maintained manually. If IKE negotiation is seen to a domain controller IP address, then the IPsec policy may be incorrectly assigned or not updated.

To troubleshoot access to network services in Active Directory

Check that the client can ping each domain controller IP address. If not, refer to the network connectivity steps above.

Identify which IP addresses are used for the domain member's domain controllers. Use nslookup <domain name> to return the full list of IP addresses. In a server and domain isolation scenario there should be a quick mode-specific filter with a negotiation policy (filter action) of permit for each of these addresses.

Use version 2.0 or later of the portqry.exe tool or the PortQueryUI tool to test access to the domain controller UDP, LDAP, and RPC ports. The UDP protocol messages used by portqry do not usually require upper layer protocol authentication, so they can verify service availability even if authentication is not available. These steps are explained in Microsoft Knowledge Base article 816103, “HOW TO: Use Portqry to Troubleshoot Active Directory Connectivity Issues”.

When connected to the internal network, use netdiag /v >outputfile.txt to perform many DNS-related and domain controller-related connectivity tests. Netdiag uses multiple network connections and protocols to perform testing; if any of these connections trigger IKE negotiations and the authentication fails because IKE is unable to locate a domain controller for Kerberos authentication, the Event 547 failure error may be logged in the security log.  
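Where portqry.exe is not at hand, a basic TCP reachability check can be sketched in Python. This is not a substitute for PortQry: it tests TCP connections only, sends no UDP or protocol-specific queries, and cannot tell whether a failure is caused by IPsec, a firewall, or a stopped service. The port list (Kerberos 88, RPC endpoint mapper 135, LDAP 389) is an assumed subset of well-known ports, and the function names are hypothetical.

```python
import socket

# Well-known TCP ports; this subset is an assumption chosen for illustration.
DC_TCP_PORTS = {"Kerberos": 88, "RPC endpoint mapper": 135, "LDAP": 389}


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


def check_domain_controller(host, ports=DC_TCP_PORTS, timeout=3.0):
    """Return {service name: reachable} for each TCP port in ports."""
    return {name: tcp_port_open(host, port, timeout)
            for name, port in ports.items()}
```

Running check_domain_controller against each domain controller IP address gives a quick view of which services respond; any address that fails should then be tested with the PortQry procedure in the referenced Knowledge Base article.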

The Windows Support tool klist.exe can be used to verify successful Kerberos login and authentication. Klist must be run in the local system context to view the Kerberos tickets for the computer.

To view Kerberos service tickets for the logged in domain user

Open a command prompt and type the following:

klist tickets

To view domain computer tickets

1. Verify that the Task Scheduler service is running and that the logged on user is a member of the Local Administrators group.

2. At a command-line prompt, choose a time one minute ahead of the current system time (such as 4:38 pm) and type the following:

at 4:38pm /interactive cmd /k klist tickets

Note that the command window title bar contains C:\Windows\System32\svchost.exe.
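Choosing the start time by hand is error-prone when the procedure is repeated on many computers. The following Python sketch builds the at command line for a time that is at least one full minute ahead. It assumes that at accepts the time in 24-hour hours:minutes notation; the function name is hypothetical.

```python
from datetime import datetime, timedelta


def build_at_command(now, command="cmd /k klist tickets"):
    """Return an at command line scheduled at least one full minute after now."""
    # Drop the seconds, then add two minutes, so the scheduled time is
    # always between one and two minutes ahead of now.
    target = now.replace(second=0, microsecond=0) + timedelta(minutes=2)
    return "at %s /interactive %s" % (target.strftime("%H:%M"), command)


if __name__ == "__main__":
    print(build_at_command(datetime.now()))
```

The sketch only formats the command text; it still has to be run at a command prompt by a member of the Local Administrators group, as described in step 1.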

Although Kerberos tickets contain group information for the user or the computer, this information is encrypted so that the groups cannot be viewed. Therefore group membership must be confirmed manually by inspecting the group membership in Active Directory. To ensure that computers have the latest group membership information in their Kerberos tickets, use klist to purge the current Kerberos tickets. When IKE negotiation is attempted again, new Kerberos tickets will be obtained automatically.

To purge the Kerberos tickets for the computer

(Steps 1-4 must be run in Local System context)

1. Verify that the Task Scheduler service is running and that the logged on user is a member of the Local Administrators group.

2. At a command-line prompt, choose a time one minute ahead of the current system time (such as 4:38 pm) and type the following:

at 4:38pm /interactive cmd

3. Type klist purge and press Y for each ticket type to delete all Kerberos tickets.

4. Type klist tickets to confirm that no tickets exist.

5. If this procedure is part of troubleshooting IKE negotiation failure, wait one minute for IKE negotiation to time out and then try to access the target server again with the application. New IKE main mode negotiations will request new Kerberos TGT and service tickets for the destination computer, which will have the latest group information available. Be careful not to execute the application from the command window that is running in Local System context.

Additional detailed procedures for troubleshooting Kerberos are published in the following white papers:

Troubleshooting Kerberos Errors.

Troubleshooting Kerberos Delegation.

Verifying Permissions and Integrity of IPsec Policy in Active Directory

It may be necessary to verify information about the IPsec policy container in Active Directory. The following procedure uses the support tool ldp.exe.

To verify information about the IPsec policy container

1. Click Start, Run, type ldp.exe and press ENTER.

2. Select Connection, and then Connect. Specify the name of the target domain.

3. Select Connection, and then Bind. Specify logon credentials for the target domain.

4. Select View, and then Tree. Either specify no base DN and navigate to the IPsec policy container, or specify the LDAP DN for the IPsec policy container as a base location.

5. Click the plus sign (+) next to the container node in the tree view. If you have Read permissions on the container, all IPsec policy objects in the container will display.

Note   Some domain users may not have Read access to the container because of the way permissions are configured. For more information, see Microsoft Knowledge Base article 329194, "IPSec Policy Permissions in Windows 2000 and Windows Server 2003".

For advanced troubleshooting of policy retrieval and corruption problems, ldp.exe can be used to manually inspect the contents of the IP Security container and the relationships among IPsec policy objects. Windows 2000, Windows XP, and Windows Server 2003 use the same basic directory schema for IPsec policy that is shown in the IPsec Policy Structure diagram in the Windows Server 2003 How IPsec Works technical reference.

The following table shows the relationship between the Active Directory object names and the IPsec policy component names that are configured in the IPsec Policy Management MMC snap-in:

Table 7.4  IPsec Policy Component to Active Directory Object Name Mapping

IPsec policy component name           Active Directory object name

IPsec Policy                          ipsecPolicy{GUID}

IKE Key Exchange Security Methods     ipsecISAKMPPolicy{GUID}

IPsec Rule                            ipsecNFA{GUID}

IPsec Filter List                     ipsecFilter{GUID}

IPsec Filter Action                   ipsecNegotiationPolicy{GUID}
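When reading object names in ldp.exe output, the mapping in Table 7.4 can be applied mechanically. The following Python sketch is one way to do so; the function name and the GUID in the example are hypothetical, and the sketch only looks at the name prefix before the opening brace.

```python
# Name prefixes and component names taken from Table 7.4.
OBJECT_PREFIX_TO_COMPONENT = {
    "ipsecPolicy": "IPsec Policy",
    "ipsecISAKMPPolicy": "IKE Key Exchange Security Methods",
    "ipsecNFA": "IPsec Rule",
    "ipsecFilter": "IPsec Filter List",
    "ipsecNegotiationPolicy": "IPsec Filter Action",
}


def component_for(object_name: str) -> str:
    """Return the snap-in component name for an object name such as ipsecNFA{GUID}."""
    prefix, brace, _ = object_name.partition("{")
    if not brace:
        raise ValueError("expected a name of the form prefix{GUID}")
    # Accept relative DNs such as CN=ipsecNFA{...} as well as bare names.
    if prefix.upper().startswith("CN="):
        prefix = prefix[3:]
    return OBJECT_PREFIX_TO_COMPONENT.get(prefix, "unknown")


if __name__ == "__main__":
    # The GUID below is made up for illustration.
    print(component_for("CN=ipsecNFA{11111111-2222-3333-4444-555555555555}"))
```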

Ldp.exe provides the ability to identify the last time IPsec policy objects were modified, which can help troubleshoot object version and replication issues. It can be launched from a command window in the context of the local system to troubleshoot Read permission issues for the IPsec service.

Caution   It is strongly recommended that all objects in the IP Security container have the same permissions. Microsoft does not recommend setting permissions on individual IPsec policy objects. To control Read and Modify access for IPsec policy, permissions should be managed on the IP Security container itself as explained by Knowledge Base article 329194, "IPSec Policy Permissions in Windows 2000 and Windows Server 2003".

The most common form of IPsec policy corruption is an IPsec object that contains a DN reference to an object that no longer exists. However, corruption may also occur if control characters become part of the name of an object, if individual objects cannot be read because of permission problems, or if identically named objects cause improper IPsec policy design (for example, two versions of the same filter list). See the following "IPsec Service" troubleshooting section for more information about how to correct IPsec policy corruption.
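The corruption symptoms described above (dangling DN references, control characters in names, and duplicate names) can be checked with simple logic once the relevant information has been collected. The Python sketch below is an illustration only: it assumes that the object DNs, display names, and referenced DNs have already been gathered by some other means into a plain dictionary, and it does not read Active Directory or interpret the private object format. All names in the example are hypothetical.

```python
import unicodedata
from collections import Counter
from typing import Dict, List, Tuple


def find_policy_problems(objects: Dict[str, Tuple[str, List[str]]]) -> list:
    """Report likely corruption symptoms.

    objects maps an object DN to (display name, list of DNs it references).
    """
    problems = []
    for dn, (name, references) in objects.items():
        for ref in references:
            if ref not in objects:
                # The referenced object no longer exists.
                problems.append(("dangling reference", dn, ref))
        if any(unicodedata.category(ch) == "Cc" for ch in name):
            problems.append(("control character in name", dn, repr(name)))
    # Identical display names can hide two versions of the same object.
    name_counts = Counter(name for name, _ in objects.values())
    for name, count in name_counts.items():
        if count > 1:
            problems.append(("duplicate name", name, count))
    return problems


if __name__ == "__main__":
    sample = {
        "policy": ("Isolation Policy", ["rule1", "rule2"]),
        "rule1": ("Rule A", ["filterlist1"]),
        "filterlist1": ("Corp Subnets", []),
        "filterlist2": ("Corp Subnets", []),
    }
    for problem in find_policy_problems(sample):
        print(problem)
```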

Note   The design details of these objects are considered an internal private data structure and are not published by Microsoft. Differences exist within the format of these objects in different Windows releases, and Microsoft may make changes in these formats at any time. Therefore, these objects should only be managed using the IPsec Policy Management MMC snap-in and the command-line tools that are available for each platform. You should only delete objects by using LDP as a final option, when corruption prevents the IPsec Policy Management MMC snap-in or command-line tools from being used.

Network Path Connectivity

Microsoft recommends that the ICMP protocol be exempted in server and domain isolation solutions. There are several reasons for this recommendation, including the need to use ICMP for network path testing by utilities such as Ping, Pathping and Tracert. These utilities should, therefore, work properly and not display the "Negotiating IP security" message. If this message displays, then an improper IPsec policy may have been assigned.

To determine whether the problem is related to basic network configuration or path connectivity

Can the client ping its own IP address or the local loopback address 127.0.0.1? If not, then there could be a problem with the TCP/IP configuration, a third-party firewall may be installed, the Ping utility is missing, or the IP configuration is invalid. Use other TCP/IP configuration troubleshooting procedures to investigate.

Can the client ping the default gateway shown in its IP configuration? If not, then the IP configuration on the client may be a problem, the local interface may not be connected or may have limited connectivity, local or network filters may be blocking traffic, or the network path to the default gateway may be interrupted. Use other TCP/IP troubleshooting procedures to investigate.

Can the client ping the DNS servers shown in its IP configuration? If not, then the DNS servers may not allow themselves to receive ICMP echo request messages, the IPsec policy may not be exempting the proper DNS server IP addresses, or any of the possible issues mentioned previously may exist. Use other TCP/IP troubleshooting procedures to investigate.

Can the client ping an IP address in the exemption list, such as a domain controller (DC)? If not, then IPsec is not causing the problem or IPsec does not have a filter for that exempted IP address. The latter can be confirmed by inspecting the filter configuration. See the IPsec policy section later in this chapter.

Can the client ping the IP address of the target destination? If yes, then basic network connectivity exists between the client and the target without IPsec. If no, then try tracert to the target and other destination IP addresses to determine how far the network path is valid. Use other TCP/IP and core network troubleshooting procedures to investigate.  
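The questions above form an ordered checklist: the first test that fails indicates where to look. The following Python sketch encodes that order for use in a help desk script. It does not run Ping itself; it assumes the results have already been recorded as true or false, and the keys and function name are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Checks in the order given above, each with a short hint drawn from the text.
CHECKS = [
    ("self", "Check the TCP/IP configuration, third-party firewalls, and the Ping utility."),
    ("gateway", "Check the client IP configuration, the local interface, filters, and the path to the gateway."),
    ("dns", "DNS servers may not accept ICMP echo requests, or the IPsec policy may not exempt them."),
    ("exempt", "Either IPsec is not the cause, or no filter exists for the exempted IP address."),
    ("target", "Use tracert to find how far the network path to the target is valid."),
]


def first_failed_check(results: Dict[str, bool]) -> Optional[Tuple[str, str]]:
    """Return (check, hint) for the first failed ping test, or None if all passed."""
    for key, hint in CHECKS:
        # A missing result is treated as a failure so that no step is skipped.
        if not results.get(key, False):
            return key, hint
    return None


if __name__ == "__main__":
    recorded = {"self": True, "gateway": True, "dns": False, "exempt": True, "target": False}
    print(first_failed_check(recorded))
```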

Path connectivity tests may succeed for ICMP, but not when using IKE or IPsec protocols. In particular, the IPsec overhead for IKE main mode authentication packets that contain the Kerberos ticket is often larger than the PMTU for the destination IP address, which requires fragmentation. Therefore, host-based firewalls, filtering in routers, network firewalls, and filters on the target host must be open to the following protocols and ports and support fragmentation:

IKE. UDP source port 500, destination port 500 and fragments

IKE/IPsec NAT-T. UDP source port 4500, destination port 4500

IPsec ESP. IP protocol 50 and fragments

IPsec AH. IP protocol 51 and fragments  
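When reviewing firewall logs or network monitor traces, it can help to label packets according to the list above. The Python sketch below mirrors that list exactly (including the source and destination port pairs as stated); the function name is hypothetical, and the sketch says nothing about whether a given device passes fragments.

```python
from typing import Optional

# IP protocol numbers.
UDP = 17
ESP = 50
AH = 51


def classify(ip_protocol: int,
             src_port: Optional[int] = None,
             dst_port: Optional[int] = None) -> Optional[str]:
    """Label a packet as IKE, NAT-T, ESP, or AH, or return None for other traffic."""
    if ip_protocol == ESP:
        return "IPsec ESP"
    if ip_protocol == AH:
        return "IPsec AH"
    if ip_protocol == UDP:
        if src_port == 500 and dst_port == 500:
            return "IKE"
        if src_port == 4500 and dst_port == 4500:
            return "IKE/IPsec NAT-T"
    return None


if __name__ == "__main__":
    print(classify(UDP, 500, 500))
    print(classify(ESP))
```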

Stateful Filtering in the Path Is Not Recommended

Stateful filtering may cause connectivity problems for IKE, AH, and ESP because the state is typically based on activity timeouts. Devices cannot inspect IKE traffic to determine when IPsec SAs are deleted because these messages are encrypted by IKE. By definition, IKE is required to be able to rekey in either direction, which means delete messages may be sent in either direction. If one side does not receive a delete message, it may believe that an IPsec SA pair still exists when the peer no longer recognizes it and discards those packets that use it.

The direction that IKE will rekey is based on the direction of traffic flow that expires the byte-based lifetime more quickly, the small offsets for rekey when the time-based lifetime expires, and the direction that packets flow after idle IPsec SAs are deleted.

Host-based stateful filtering of IKE traffic on clients that initiate connections (and thus IKE negotiations) through Windows Firewall usually does not cause a problem. Windows Firewall does not filter IPsec packets, because the IPsec driver processes packets at a lower layer than the layer at which the firewall filtering is performed. However, the IKE ports should be configured open in the host firewall to receive incoming IKE negotiations for upper layer protocol connections that are allowed through the firewall (for example, for file sharing using SMB protocol over TCP port 445).

Support for ICMP PMTU Required by TCP

The default setting in Windows 2000 and later releases is for each TCP packet to have the Don't Fragment bit set in the IP header. This setting is preserved when either AH or ESP IPsec transport mode is used to secure the packet. Therefore, a packet that is too big will be dropped at the router and the router should return an ICMP Destination Unreachable message that specifies the maximum size allowed. This behavior is called TCP Path MTU Discovery. Both the client and the target computer must be able to receive ICMP PMTU messages for IPsec packets that are too big. It is especially important for IPsec-protected traffic to avoid fragmentation, because hardware acceleration typically does not process fragmented packets. Fragmented IPsec packets must be processed by the IPsec driver in software.

Windows 2000 and Windows XP do not support ICMP PMTU discovery processing for IPsec transport mode packets that use the NAT traversal encapsulation (UDP port 4500). Windows Server 2003 does support this discovery processing. See the "Troubleshooting Translational Bridging" page.

Note   There is a known issue that requires TCP PMTU detection to be enabled for IPsec to secure traffic in a NAT traversal scenario where IPsec UDP-ESP connections are initiated from a host outside of the NAT to a host behind a NAT. If this scenario is required, confirm that TCP PMTU detection is enabled by ensuring that the following registry key is either not defined or is set to 1 on both sides:

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\
Parameters\EnablePMTUDiscovery=1

(This key may display on more than one line; it is a single line in the registry.)

The Microsoft Windows Server 2003 Member Server Baseline Security Policy template and other third-party configurations may configure this registry key in order to disable TCP PMTU.
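The rule for the EnablePMTUDiscovery value can be stated as a one-line check. The Python sketch below assumes that the value has already been read from the registry by some other tool, with None standing for "not defined"; the function name is hypothetical.

```python
from typing import Optional


def tcp_pmtu_discovery_enabled(registry_value: Optional[int]) -> bool:
    """TCP PMTU detection is enabled when the value is absent or set to 1."""
    return registry_value is None or registry_value == 1


if __name__ == "__main__":
    # None: value not defined; 0: a template or tool has disabled PMTU discovery.
    for value in (None, 1, 0):
        print(value, tcp_pmtu_discovery_enabled(value))
```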

Support Required for Fragmentation

Network paths and filters must support passing fragments for the IKE, IPsec, AH, and ESP protocols. IKE uses UDP packets and allows them to be fragmented as necessary. The IPsec NAT traversal implementation added support for IKE fragmentation avoidance only when IKE authenticates with certificates (for example, in L2TP/IPsec VPN scenarios). IKE authentication that uses Kerberos does not support fragmentation avoidance and must be able to send and receive fragmented UDP packets that contain the Kerberos ticket.

The network path must support passing fragments for AH and ESP because IPsec secures the entire original IP packet before outbound fragmentation at the IP layer. IPsec is integrated with TCP so that when TCP packets have the DF (Don’t Fragment) flag set (the default setting), TCP will reduce its packet size to accommodate the additional bytes that are added by IPsec encapsulation.

IPsec is not integrated with UDP, and UDP applications do not have a method to detect if IPsec is protecting their traffic. Consequently, when IPsec AH or ESP is applied, UDP packets that use the full MTU size will become fragmented by the host when transmitted. Similarly, if IPsec policy filters do not exempt ICMP, use of the Ping utility may produce ICMP packets that appear as fragmented IPsec AH or ESP packets on the wire.
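The fragmentation of full-size UDP datagrams is simple arithmetic. The Python sketch below assumes an IPv4 header without options (20 bytes), an 8-byte UDP header, and an Ethernet MTU of 1500 bytes. The IPsec overhead is passed in by the caller because it depends on the protocol and algorithms in use; the 24-byte figure in the example is an illustrative assumption, not a measured value.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options (assumption)
UDP_HEADER_BYTES = 8


def udp_datagram_fragments(payload_bytes: int,
                           mtu: int = 1500,
                           ipsec_overhead_bytes: int = 0) -> bool:
    """Return True if the IP packet carrying this UDP payload exceeds the MTU."""
    packet_size = (IPV4_HEADER_BYTES + ipsec_overhead_bytes
                   + UDP_HEADER_BYTES + payload_bytes)
    return packet_size > mtu


if __name__ == "__main__":
    full_payload = 1500 - IPV4_HEADER_BYTES - UDP_HEADER_BYTES  # 1472 bytes
    print(udp_datagram_fragments(full_payload))                           # no IPsec
    print(udp_datagram_fragments(full_payload, ipsec_overhead_bytes=24))  # with IPsec
```

A payload that exactly fills the MTU without IPsec no longer fits once any encapsulation overhead is added, which is why the host fragments it.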

For more information, see Microsoft Knowledge Base article 233256, “How to Enable IPSec Traffic Through a Firewall”.

Support Required for Broadcast or Multicast Traffic

The IPsec policy design for server and domain isolation uses filters from Any <-> Subnet. Therefore, the outbound filter Subnet -> Any will match outbound broadcast and multicast traffic sent from hosts using an internal subnet IP address. However, because IPsec cannot secure multicast or broadcast traffic, it must discard such traffic if it matches the filter. Inbound multicast and most types of broadcast will not match the corresponding Any -> Subnet inbound filter. If multicast or broadcast traffic is required, then you can set the NoDefaultExempt registry key to 1, which allows multicast and broadcast traffic to bypass IPsec filtering in Windows XP and Windows Server 2003. This configuration prevents known problems with Real Time Communications (RTC) clients and Windows Media Server, both of which use multicast traffic. For details about the use and security implications of the NoDefaultExempt registry key, see the following Knowledge Base articles:

810207, IPSec default exemptions are removed in Windows Server 2003.

811832, IPSec Default Exemptions Can Be Used to Bypass IPsec Protection in Some Scenarios.

Note   Windows XP SP2 uses the same default exemptions as Windows Server 2003.

The registry key can be set to control the default exemptions as necessary for all platforms. IPsec filtering does not support configuring destination addresses for specific broadcast or multicast addresses.
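To see which outbound packets are affected, the following Python sketch tests whether a packet would match a Subnet -> Any filter and is addressed to a multicast or broadcast destination. It models only the case in which multicast and broadcast traffic is not exempted, it recognizes only the limited broadcast address and the broadcast address of the filter subnet itself, and the function name and addresses are hypothetical.

```python
import ipaddress


def discarded_by_subnet_any_filter(src: str, dst: str, subnet: str) -> bool:
    """Return True if an outbound packet matches Subnet -> Any and cannot be secured."""
    network = ipaddress.ip_network(subnet)
    source = ipaddress.ip_address(src)
    destination = ipaddress.ip_address(dst)

    matches_filter = source in network  # Subnet -> Any matches on source only
    is_multicast = destination.is_multicast
    is_broadcast = (destination == ipaddress.ip_address("255.255.255.255")
                    or destination == network.broadcast_address)
    return matches_filter and (is_multicast or is_broadcast)


if __name__ == "__main__":
    print(discarded_by_subnet_any_filter("10.1.2.3", "224.0.0.251", "10.0.0.0/8"))
    print(discarded_by_subnet_any_filter("10.1.2.3", "10.1.2.4", "10.0.0.0/8"))
```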

Diagnostics in Network Devices May Not Be Useful

One of the impacts of using IPsec encapsulation is that applications that assume TCP/IP traffic is in plaintext can no longer inspect traffic within the network. Network diagnostic tools that inspect or provide reports based on TCP and UDP ports are unlikely to be able to interpret the IPsec-encapsulated packets, even if encryption is not used (that is, when AH or ESP with null encryption protects the traffic). Updates to such tools may be required from vendors to inspect IPsec AH or ESP-null packets.

Network Interface Card and Driver Issues

IPsec packet loss can sometimes be caused by network interface cards (NIC) that perform special functions. Cards that perform clustering or "teaming" should be tested for IPsec compatibility. NIC drivers that accelerate non-IPsec functions may have problems with IPsec-protected traffic. NICs that accelerate TCP functions may be ones that support TCP checksum calculation and validation (checksum offload), as well as the ability to efficiently send large TCP data buffers (large send offload). Windows 2000 and later releases automatically disable these TCP offload functions in the TCP/IP stack when the IPsec driver has filters, even if IPsec is performing only permit and block functions. Network card drivers that are not certified and signed by the Windows Hardware Quality Lab (WHQL) may cause problems when IPsec is used to protect traffic. An extensive set of tests is used by WHQL to certify NIC drivers that are designed to support IPsec offload. To assist troubleshooting, the Windows 2000, Windows XP and Windows Server 2003 TCP/IP stack supports a registry key option to disable all forms of TCP/IP offload. Some NIC drivers also support the ability to disable offload by using the Advanced properties of the network connection. The computer may need to be restarted for driver-level configuration changes to take effect.

Troubleshooting Packet Loss in IPsec Protocols

Packets can be discarded or lost, which may affect application connectivity. This section reviews common cases in which packets are discarded by IPsec. As previously mentioned, certain network devices may not allow IP protocol types 50 or 51 or UDP port 500 or 4500 to pass. Similarly, IPsec encapsulation may cause some packets to fragment and not pass through the network. In such cases, a network monitor trace is usually needed from both sides of the communication to identify and correlate which packets are being sent and which are being received. Look for retransmissions indicated by the same size of packet appearing repeatedly. It may be necessary to capture a trace of the typical protocol behavior without IPsec and then compare it with the protocol behavior of IPsec-protected traffic.

Event Error 4285

Event title: Hash Authentication Failure

IKE and IPsec provide protection against modification of packets while they transit the network. If a device modifies a part of the packet that is protected by an integrity hash, then the receiving IKE or IPsec driver will discard this packet and cause the Hash Authentication Failure error, which is logged in the System Log as event 4285. Experience has shown that some devices, network drivers, and third-party packet processors occasionally corrupt packets of a certain size, those with a certain number of fragments, those of certain protocol types, or under certain conditions (such as when the device is congested, monitors traffic, or reboots). This error may also represent an attack on the packet by a malicious application, or by an application that did not realize it was protected. The error may also be an indicator of a denial of service attack.

To detect IPsec packet discards of corrupted packets, the following techniques can be used. However, it is also important to correlate these observations with a network monitor trace so that the source of the corruption can be found.

Examine the IPsec Packets Not Authenticated counter. In Windows Server 2003, this counter can be checked by using the IPsec counter in Performance Monitor, by using the netsh ipsec dynamic show stats command, or by looking at Statistics in the IPsec Security Monitor MMC snap-in. In Windows XP, this counter can be checked by using the ipseccmd show stats command or by looking at Statistics in the IPsec Security Monitor MMC snap-in. Windows 2000 shows this counter in the ipsecmon.exe graphical display, or by using the netdiag /test:ipsec /v command.

Enable IPsec driver logging and look for event 4285 in the System Log from source IPsec. See the "Toolkit" section in this chapter for details on how to enable IPsec driver logging. The event text will be similar to the following:

Failed to authenticate the hash for 5 packet(s) received from 192.168.0.10. This could be a temporary glitch; if it persists please stop and restart the IPSec Policy Agent service on this machine.  

Although the event text suggests that a restart of the IPsec service may fix the problem, the source of most packet loss problems is not the IPsec system. Restarting the service will not fix the problem and may cause more problems. The IPsec service should be stopped only as a last resort to identify whether a problem is IPsec-related or not.

Resolution of this error requires investigation to identify a pattern of source IP addresses, times of day, adapters, or conditions in which the error occurs. If the number of packets is small, then this error may not warrant investigation. It is important to start by trying to exclude sources of corruption in the local system. Disable IPsec offload, try to disable advanced or performance features of the driver using the configuration provided by Advanced Properties, and use the latest NIC drivers that are available from the vendor, preferably those certified and signed by the Windows Hardware Quality Lab. Then investigate the characteristics of the network paths through which the packet would be transmitted. Look for other evidence of packet corruption in TCP/IP packet discard statistics and on other computers that use the same configuration. The IP counter for Datagrams Received Discarded will increase each time IPsec discards a packet.

Note   To disable TCP/IP offload functionality, use the following registry key for computers running Windows 2000, Windows XP, or Windows Server 2003:

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\IPSEC\
EnableOffload DWORD registry value to 0

(This key may display on more than one line; it is a single line in the registry.)

Then restart the computer.
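To help identify a pattern of source IP addresses for event 4285, the event text can be parsed and totaled per source. The Python sketch below assumes that the event descriptions have already been exported from the System Log as plain strings and that they follow the wording shown earlier in this section; the function name is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the event 4285 wording shown earlier in this section.
HASH_FAILURE = re.compile(
    r"Failed to authenticate the hash for (\d+) packet\(s\) "
    r"received from (\d{1,3}(?:\.\d{1,3}){3})"
)


def hash_failures_by_source(event_texts: Iterable[str]) -> Counter:
    """Total the number of discarded packets reported per source IP address."""
    totals = Counter()
    for text in event_texts:
        match = HASH_FAILURE.search(text)
        if match:
            totals[match.group(2)] += int(match.group(1))
    return totals


if __name__ == "__main__":
    events = [
        "Failed to authenticate the hash for 5 packet(s) received from 192.168.0.10. "
        "This could be a temporary glitch; if it persists please stop and restart "
        "the IPSec Policy Agent service on this machine.",
    ]
    print(hash_failures_by_source(events).most_common())
```

A source address that dominates the totals points to a particular peer or network path, whereas failures spread across many sources suggest a local cause such as the NIC or its driver.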

Event Error 4268

Event title: Received Packets with Bad Security Parameters Index (SPI)

The Windows 2000 and Windows XP (including SP1) implementation of IKE has a known issue that results in packet loss under particular conditions. This issue is fixed in Windows Server 2003 and Windows XP SP2. The impact on upper layer protocol communications is usually negligible, because protocols already expect some packet loss for a variety of other reasons. This issue can be identified by the following symptoms:

Slow but consistent increase in Bad SPI counter values.

System Log event 4268 messages (if enabled). By default, Windows 2000 logs these messages to the System Log as event 4268. Windows XP does not log this event by default; driver logging must be enabled. Event text is similar to the following:

"Received <number> packet(s) with a bad Security Parameters Index from <ip address>. This could be a temporary glitch; if it persists please stop and restart the IPSec Policy Agent service on this machine."  

Although the event text suggests that a restart of the IPsec service may fix the problem, the source of this type of packet loss is part of the design of the IPsec system. Restarting the service will not fix the problem and may cause more problems. As noted earlier, the IPsec service should be stopped only as a last resort to identify whether a problem is IPsec-related or not.

An IPsec security parameter index (SPI) is a label in the packet that tells the responder which security association should be used to process the packet. If an SPI is not recognized, it is called a "bad SPI."  This error indicates that the receiving computer received IPsec-formatted packets when it did not have an IPsec SA with which to process them. Therefore, it must discard the packets.

These are benign error messages, although the packets are being discarded. The number of bad SPI events that are generated depends on how busy the computers are at the time and how fast IPsec-protected data is being transmitted at the time of rekey. The following conditions are likely to generate more of these events:

Transferring high volumes of IPsec traffic over 1 Gigabit or higher connections.

When there are heavily loaded (slow) servers and fast clients.

When there are slow clients communicating to a fast server.

When there are many active clients for a server, which causes IKE to constantly rekey with many clients simultaneously.  

The impact is that an IPsec-protected TCP connection will slow down for a few seconds to retransmit the lost data. If several packets are lost, TCP may go into congestion avoidance mode for that connection. In a few seconds the connection should resume full speed. File copy, Telnet, Terminal Server, and other TCP-based applications should not notice these few lost packets. However, there have been some cases in which TCP loses a burst of packets on a fast link and must reset the connection. When the connection is reset, the socket is closed and applications are notified of a connection break, which may interrupt file transfers.

The most common cause of this error is a known issue with Windows 2000 that involves how IKE synchronizes IPsec SA keying. When an IKE quick mode initiator is faster than the responder, the initiator is able to send new IPsec SA secured packets sooner than the responder is ready to receive them. As specified in the Internet Engineering Task Force (IETF) IPsec Requests for Comments (RFCs), when IKE establishes a new IPsec security association pair the initiator must use the new outbound IPsec SA to transmit data, and the slower responder receives IPsec-protected traffic that it doesn't yet recognize. Because IKE rekey is dependent on both the elapsed time and the amount of data sent under the protection of an IPsec SA, bad SPI events may be seen periodically (although not necessarily at specific intervals).
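The effect of this timing gap can be approximated with a toy model: every packet that the initiator sends on the new SA before the responder has installed it is counted as a bad SPI. The Python sketch below is an assumption-laden illustration, not a description of the IKE implementation; the function name, the 5 ms gap, and the packet rate are all hypothetical.

```python
def packets_with_bad_spi(initiator_ready_ms: float,
                         responder_ready_ms: float,
                         packets_per_second: float) -> int:
    """Estimate packets discarded while the responder has not yet installed the new SA."""
    # No loss if the responder is ready first.
    gap_ms = max(0.0, responder_ready_ms - initiator_ready_ms)
    return int(packets_per_second * gap_ms / 1000)


if __name__ == "__main__":
    # Fast client, loaded server: the responder is ready 5 ms after the initiator.
    print(packets_with_bad_spi(0.0, 5.0, 80_000))
    # Same gap on a much slower flow.
    print(packets_with_bad_spi(0.0, 5.0, 200))
```

The model shows why the same small delay is negligible on a slow link but can discard a burst of packets on a fast one.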

In third-party client interoperability scenarios, a bad SPI error may indicate that an IPsec peer did not accept and process a delete message or had problems completing the last step of IKE quick mode negotiation. The error can also mean that an attacker is flooding a computer with spoofed or injected IPsec packets. The IPsec driver counts these events and logs them to keep a record of bad SPI activity.

The LogInterval registry key can be used to investigate and minimize these events. When troubleshooting, set it to the minimum value (every 60 seconds) so the events are registered quickly. In Windows 2000, you can stop and restart the IPsec Policy Agent service to reload the IPsec driver. In Windows XP and Windows Server 2003, the computer must be restarted to reload TCP/IP and IPsec drivers.

In Windows 2000, these events cannot be eliminated by any current registry key settings or patches. The default setting in Windows XP and Windows Server 2003 is to not report these events. Reporting of these events can be enabled by using the IPsecDiagnostics configuration through the netsh ipsec command-line option, or through the registry key directly.
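For Windows Server 2003, the diagnostics level and log interval can be set from the command line. The Python sketch below only builds the argument list for such a command and does not run it. The netsh syntax (netsh ipsec dynamic set config with property= and value= arguments) and the accepted value ranges are assumptions based on the Windows Server 2003 netsh help and should be confirmed with netsh ipsec dynamic set config ? on the target computer; only the 60-second minimum comes from this chapter.

```python
from typing import List

# Assumed value ranges; verify against the netsh help on the target platform.
ALLOWED_VALUES = {
    "ipsecdiagnostics": range(0, 8),        # assumed range 0 through 7
    "ipsecloginterval": range(60, 86401),   # minimum of 60 seconds per this chapter
}


def netsh_set_config(prop: str, value: int) -> List[str]:
    """Build (but do not execute) a netsh command that sets an IPsec driver option."""
    if prop not in ALLOWED_VALUES:
        raise ValueError(f"unsupported property: {prop}")
    if value not in ALLOWED_VALUES[prop]:
        raise ValueError(f"value {value} is out of range for {prop}")
    return ["netsh", "ipsec", "dynamic", "set", "config",
            f"property={prop}", f"value={value}"]


if __name__ == "__main__":
    print(" ".join(netsh_set_config("ipsecdiagnostics", 7)))
    print(" ".join(netsh_set_config("ipsecloginterval", 60)))
```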

The following techniques can help minimize these errors:

Adjust the IPsec policy settings. Increase the quick mode lifetimes (if security requirements will allow it) to 21 or 24 hours (idle IPsec SAs are deleted in 5 minutes if they are not used). To avoid potential security weaknesses introduced by using the same key to encrypt too much data, do not set a lifetime greater than 100 MB when using ESP encryption.

Use main mode or quick mode perfect forward secrecy (PFS), which will not cause this problem for the particular IPsec SA that is being negotiated. However, either setting will substantially increase the load on the computer that services many clients, and therefore may contribute to the delay in response to other negotiations.

Add CPU or other hardware to increase performance or reduce application loads.

Install an IPsec hardware acceleration NIC if one is not already installed. These cards substantially reduce the amount of CPU utilization that IPsec uses for high throughput data transfer.

If CPU utilization remains high, investigate use of a hardware accelerator product to speed up Diffie-Hellman calculations. These products are typically a PCI card with Diffie-Hellman exponentiation offload capability that accelerate the Diffie-Hellman calculations. This acceleration also benefits public and private key operations for certificates that use the Secure Sockets Layer (SSL) protocol. Verify with the vendor that their card specifically supports the "ModExpoOffload interface in CAPI for Diffie-Hellman calculations."

If possible, create a filter to permit certain high-speed traffic that does not need IPsec protection (for example, server backup traffic over a dedicated LAN).

If these options do not work, and upgrading to Windows XP SP2 or Windows Server 2003 is not possible, then contact Microsoft Product Support Services to see if there are other options currently available.
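The interaction between the byte-based lifetime and link speed can be estimated with simple arithmetic. The Python sketch below computes how long a byte-based quick mode lifetime lasts at a sustained throughput. It assumes 1 MB = 1,000,000 bytes and ignores protocol overhead and idle time; the function name is hypothetical.

```python
def seconds_until_byte_lifetime(lifetime_bytes: int,
                                throughput_bits_per_second: float) -> float:
    """Time for a sustained transfer to consume a byte-based IPsec SA lifetime."""
    return lifetime_bytes * 8 / throughput_bits_per_second


if __name__ == "__main__":
    lifetime = 100 * 1_000_000  # 100 MB, assuming decimal megabytes
    for label, rate in (("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9)):
        seconds = seconds_until_byte_lifetime(lifetime, rate)
        print(f"{label}: rekey roughly every {seconds:.1f} s")
```

At a sustained 1 Gbit/s the 100 MB lifetime expires in under a second, which is consistent with the observation that high-volume transfers over fast links generate more rekeys and therefore more bad SPI events.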

Event Error 4284

Event title: Packets in the clear that should be secured

This event indicates that an IPsec security association was established at a time when packets were received in plaintext that should have been inside the IPsec security association. These packets are discarded to prevent packet injection attacks on IPsec-secured connections. Although the IP counter for Datagrams Received Discarded will be incremented, IPsec does not have a counter value that records packets dropped for this reason. This issue can only be identified from System Log error event 4284, which reads as follows:

"Received <number> packet(s) in the clear from <IP address> which should have been secured.

This could be a temporary glitch; if it persists please stop and restart the IPsec Policy Agent service on this machine."

As with previous errors, the event suggestion should not be followed. It is unlikely that restarting the IPsec service will correct the error.

The most likely cause of the error is a policy configuration problem that causes one side to send traffic in the clear because of a more specific outbound permit filter. For example, if a client has a filter to secure all traffic with a server and the server policy has a more specific filter to permit plaintext HTTP replies, the server will secure all traffic to the client except outbound HTTP packets. The client receives these packets and discards them for security reasons, because it expects all traffic to and from the server to be secured inside the IPsec SA pair.
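The HTTP example above can be modeled with a small "most specific filter wins" lookup. The Python sketch below is a simplified illustration of that idea and not the IPsec driver's actual filter-matching algorithm; the Filter class, its fields, and the computer names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Filter:
    peer: str                          # remote computer the filter applies to
    action: str                        # "secure" or "permit"
    protocol: Optional[str] = None     # None means any protocol
    server_port: Optional[int] = None  # None means any port

    def matches(self, peer: str, protocol: str, server_port: int) -> bool:
        return (self.peer == peer
                and self.protocol in (None, protocol)
                and self.server_port in (None, server_port))

    def specificity(self) -> int:
        return int(self.protocol is not None) + int(self.server_port is not None)


def action_for(filters: List[Filter], peer: str, protocol: str, server_port: int) -> str:
    """Return the action of the most specific matching filter."""
    candidates = [f for f in filters if f.matches(peer, protocol, server_port)]
    if not candidates:
        return "permit"  # no filter: traffic is not processed by IPsec
    return max(candidates, key=Filter.specificity).action


def plaintext_mismatch(sender: List[Filter], receiver: List[Filter],
                       sender_name: str, receiver_name: str,
                       protocol: str, server_port: int) -> bool:
    """True if the sender sends plaintext that the receiver expects to be secured."""
    sends = action_for(sender, receiver_name, protocol, server_port)
    expects = action_for(receiver, sender_name, protocol, server_port)
    return sends == "permit" and expects == "secure"


if __name__ == "__main__":
    server_policy = [Filter("client", "secure"), Filter("client", "permit", "tcp", 80)]
    client_policy = [Filter("server", "secure")]
    print(plaintext_mismatch(server_policy, client_policy, "server", "client", "tcp", 80))
```

In this model the server's HTTP replies leave in plaintext while the client expects them inside the IPsec SA, which is the condition that produces event 4284.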

This event can also occur during regular operations and during third-party client interoperability cases in which one peer deletes an IPsec security association or a filter in the IPsec driver while traffic is flowing between the computers. For example, one side may unassign IPsec policy, or may experience a policy update that deletes IPsec SAs and filters. Because one peer has already deleted the filter while an active upper-level protocol communication is taking place, the IKE delete message may not arrive and be processed by the other peer before the plaintext packets arrive, which causes the error. Also, the amount of time it takes to process the delete message depends on the current load on the peer computer.

The error message may also happen while a large policy is being loaded, because IPsec security associations may become established before the full filter set is applied to the IPsec driver. If this situation occurs, IPsec SAs may be negotiated for traffic that will be exempt after policy loading has completed.

The error message can also be an indication of an injection attack where plaintext traffic is being sent that matches (either deliberately or by chance) the traffic selectors for a particular active inbound security association.

This problem should be escalated to the IPsec policy designer.

IPsec NAT-T Timeouts When Connecting Over Wireless Networks

A recent problem was found that causes connections to time out when Windows Server 2003 or Windows XP-based client computers attempt to connect to a server on a wireless network that uses IPsec NAT-T. For more information, see Microsoft Knowledge Base article 885267, “Connections time out when client computers that are running Windows Server 2003 or Windows XP try to connect to a server on a wireless network that uses IPsec NAT-T”.

Verifying the Correct IPsec Policy

This section describes steps for detecting problems with IPsec policy assignment and interpretation. Filters from a properly interpreted IPsec policy must be in the IPsec driver for IPsec to permit and block packets, as well as to trigger IKE to negotiate IPsec SAs with remote IP addresses to secure traffic. Appropriate filters must also be in place to guide IKE as a responder. In this solution, the IPsec policy design requires all traffic (except ICMP) to be secured by IPsec. The policy also contains filters for each IP address in the exemption list.

Note   In Windows 2000, the IPsec service is called the IPsec Policy Agent; in Windows XP and Windows Server 2003 this service is called the IPsec Service.

Support engineers must be familiar with the use of Group Policy by IPsec, IPsec policy precedence, and IPsec policy interpretation. References to information about these topics can be found in the "More Information" section at the end of this chapter.  

Troubleshooting Group Policy for IPsec

Group Policy provides the mechanism for assigning a domain-based IPsec policy to a domain member. The retrieval of assigned GPOs by the domain member is what delivers the IPsec policy assignment to a host computer. Therefore, any problems with GPO retrieval will cause computers to not apply the proper IPsec policy. Common issues with Group Policy for IPsec policy management include the following:

Replication delays of various configuration components in Active Directory

Problems with the Group Policy polling and download process

Confusion over which IPsec policy version is assigned

IPsec service is not running

IPsec policy in Active Directory cannot be retrieved, so a cached copy is used instead

Delays because of IPsec policy polling for retrieval of currently assigned IPsec policy

Replication may be delayed because of the number of IPsec-related objects in Active Directory, such as IPsec policies, GPOs, attribute changes in GPO IPsec policy assignments and IPsec policy, and security group membership information. Careful planning must be done to assess the impact of an IPsec configuration change as it gradually takes effect on domain members.

For Group Policy troubleshooting procedures, see the following white papers:

Troubleshooting Group Policy in Windows 2000