An Introduction to Telephony Call Control with Microsoft Speech Server 2004
By Dan Kershaw, Program Manager, Microsoft Speech Server
Introduction
Telephony call control is an important part of speech-enabled or traditional Interactive Voice Response (IVR) applications. In addition to enforcing the business logic and correctly handling any speech input and output, applications must interact with the phone system for simple actions such as answering, making, disconnecting, and transferring calls. In addition, some applications require extended call control actions, such as setting up three-way consultation calls in order to conduct supervised transfers or conference calls.
Microsoft Speech Server 2004 (MSS) and its development tool set, the Microsoft Speech Application SDK 1.1 (SASDK), address both basic and extended call control. The SASDK includes basic call controls as well as a control for building extended call controls. These controls wrap and handle the underlying call control communication, which travels through the Simple Messaging EXtension (smex) element of the Speech Application Language Tags (SALT) specification. This article provides an introduction to both the basic and extended call control functionality supported by MSS.
Call Signaling Support
Before discussing call control in any detail, it's important to understand a little about the call information and call transfer capabilities available with various call signaling protocols. We'll first discuss ANI and DNIS, two pieces of call information helpful for many extended call control scenarios. Then we'll discuss methods of call transfer. Both application developers and administrators need to be aware of a signaling protocol's capabilities, because those capabilities determine which functionality the planned service can actually offer.
Call information and signaling protocols
When a call arrives at the MSS/TIM server(s), the signaling layer may present information about the call to the application. The most important pieces of information are:
- Automatic Number Identification (ANI): This typically represents the caller's number (often termed caller ID). In the Computer Supported Telecommunications Applications (CSTA) message and in the SASDK's CallInfo object, the ANI is represented by the CallingDevice property. With this information, an application can identify the caller and potentially provide tailored or premium services.
- Dialed Number Identification Service (DNIS): This typically represents the number that the originating caller dialed. In CSTA and in the SASDK's CallInfo object, the DNIS is represented by the CalledDevice property. With this information, MSS can associate different application services with different dialed numbers, even when calls with different DNIS values are routed to the same computer running MSS.
As the following table shows, most of the common signaling protocols provide ANI and DNIS, but some do not.
| Signaling protocol | ANI | DNIS |
|---|---|---|
| Analog | Yes, however the analog line must be provisioned with caller ID. This should be done on the PBX, or by the carrier (if direct lines come from the carrier). | No. This is not available for analog lines. |
| T1 CAS | This may be available. In many cases, CAS does not provide any ANI information. However, some switches/PBXs can be configured to provide this information. | This may be available. In many cases, CAS does not provide any DNIS information. However, some switches/PBXs can be configured to provide this information. |
| ISDN PRI | Yes | Yes |
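The DNIS-based association of applications with dialed numbers can be sketched as a simple lookup. This is a minimal illustration in Python, not SASDK code. The `CallInfo` class here only mirrors the CallingDevice and CalledDevice property names mentioned above, and the phone numbers, URLs, and the `select_application` function are hypothetical. The fallback branch covers protocols (such as analog) where no DNIS is delivered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CallInfo:
    """Illustrative stand-in for call information; not the SASDK CallInfo type."""
    calling_device: Optional[str]  # ANI (caller's number), None if not delivered
    called_device: Optional[str]   # DNIS (dialed number), None if not delivered


# Hypothetical mapping of dialed numbers to application start pages.
APPLICATIONS_BY_DNIS = {
    "5550100": "http://example.com/banking/default.aspx",
    "5550101": "http://example.com/support/default.aspx",
}
DEFAULT_APPLICATION = "http://example.com/main/default.aspx"


def select_application(call: CallInfo) -> str:
    """Pick an application by DNIS, falling back when DNIS is absent or unknown."""
    if call.called_device is None:
        return DEFAULT_APPLICATION
    return APPLICATIONS_BY_DNIS.get(call.called_device, DEFAULT_APPLICATION)
```

A single default application is the simplest fallback. A real deployment on lines without DNIS might instead dedicate specific channels to specific applications.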
Call transfer support
Similar to support for ANI and DNIS, different signaling protocols support transfers in a variety of ways. Each type of transfer may also require differing numbers of physical channels to enable the transfer.
Call transfer support provided by the various signaling protocols is summarized in the following table:
| Signaling protocol | Blind Transfer | Bridged consultation (supervised) transfer | Consultation (supervised) transfer |
|---|---|---|---|
| Analog | Yes. This is also known as a hook-flash transfer. | Yes. Ties up 2 channels during and after the transfer. | No |
| T1 CAS | Yes | Yes. Ties up 2 channels during and after the transfer. | Maybe. This is possible, but potentially difficult to configure on the PBX. |
| ISDN PRI | Maybe. AT&T *T (or *8) transfer for 1-800 services. This may also be available with other carriers. | Yes. Ties up 2 channels during and after the transfer. | Maybe. Supported through Two B-Channel Transfer (TBCT). |
Because the TIM may not support all of these possibilities, please consult your TIM documentation or TIM vendor to understand the supported transfer types for your particular TIM implementation.
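One way to reason about the table together with the TIM caveat is to treat both as inputs to a capability check. The sketch below transcribes the table into a Python dictionary. The key names, the `Support` enum, and the `usable_transfers` function are hypothetical, and the set of TIM-supported transfer types would have to come from your TIM documentation.

```python
from enum import Enum


class Support(Enum):
    YES = "yes"
    MAYBE = "maybe"  # depends on carrier or PBX configuration
    NO = "no"


# Transcribed from the table above; keys are (protocol, transfer type).
TRANSFER_SUPPORT = {
    ("analog", "blind"): Support.YES,
    ("analog", "bridged"): Support.YES,
    ("analog", "consultation"): Support.NO,
    ("t1_cas", "blind"): Support.YES,
    ("t1_cas", "bridged"): Support.YES,
    ("t1_cas", "consultation"): Support.MAYBE,
    ("isdn_pri", "blind"): Support.MAYBE,
    ("isdn_pri", "bridged"): Support.YES,
    ("isdn_pri", "consultation"): Support.MAYBE,
}


def usable_transfers(protocol, tim_supported):
    """Return (transfer type, needs_verification) pairs worth considering.

    A type is dropped if the protocol cannot do it or the TIM does not
    implement it; "maybe" entries are kept but flagged for verification.
    """
    result = []
    for kind in ("blind", "bridged", "consultation"):
        support = TRANSFER_SUPPORT[(protocol, kind)]
        if support is Support.NO or kind not in tim_supported:
            continue
        result.append((kind, support is Support.MAYBE))
    return result
```

For example, `usable_transfers("analog", {"blind", "bridged", "consultation"})` keeps only the blind and bridged options, because analog signaling does not offer a consultation transfer regardless of what the TIM implements.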
After you've determined which call functions your telephony infrastructure supports, you can choose the basic and/or extended call control capabilities that your speech application will use. However, to write applications that make effective use of these controls, you should have some background on how MSS processes telephony call controls.
Speech Server and Telephony Processing
As shown in Figure 1, MSS has two main components: Speech Engine Services (SES) and Telephony Application Services (TAS). The TAS component of MSS interacts with the phone system through a middle layer called the Telephony Interface Manager (TIM).
Figure 1 - MSS communicates with a telephone system through the Telephony Interface Manager
The call control interaction between Speech Server-based applications and the TIM is done through the exchange of XML messages defined by the ECMA standard "XML Protocol for Computer Supported Telecommunications Applications (CSTA) Phase III". This standard, often known as Standard ECMA-323, specifies XML schema definitions for the CSTA Phase III Services Standard, Standard ECMA-269, a set of services and events defined for computer-to-telephony communication. This computer-to-telephony communication is often referred to as CSTA communication. While speech application developers do not work directly with the Standard ECMA-323 messages, they need to understand the underlying messages used for call control requests, responses, and events so they can design and implement controls within the SASDK to generate and handle these CSTA messages.
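To give a rough sense of what such an XML service request looks like, the following Python sketch serializes a MakeCall-style message. It is an illustration under stated assumptions, not a message taken from MSS. The namespace URI and the child element names `callingDevice` and `calledDirectoryNumber` are modeled on the ECMA-323 schema and should be checked against the schema edition your TIM supports. The function name and sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI for the ECMA-323 schema edition in use; confirm
# the exact value against the schema your TIM supports.
CSTA_NS = "http://www.ecma-international.org/standards/ecma-323/csta/ed3"


def build_make_call(calling_device: str, called_number: str) -> str:
    """Serialize a MakeCall-style service request as an XML string."""
    # Emit the CSTA namespace as the default namespace (no prefix).
    ET.register_namespace("", CSTA_NS)
    root = ET.Element(f"{{{CSTA_NS}}}MakeCall")
    # Assumed element names, modeled on the ECMA-323 MakeCall request.
    ET.SubElement(root, f"{{{CSTA_NS}}}callingDevice").text = calling_device
    ET.SubElement(root, f"{{{CSTA_NS}}}calledDirectoryNumber").text = called_number
    return ET.tostring(root, encoding="unicode")
```

In a Speech Server application the SASDK controls generate messages of this kind on the developer's behalf. Building one by hand is mainly useful for understanding what travels between TAS and the TIM.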
Call Control Using CSTA and Smex
TAS runs speech-enabled Web applications containing both application and call control logic created by developers using the SASDK. While developers work with call control objects within the SASDK, behind the scenes CSTA XML messages are generated, serving as the communication link between TAS and the TIM. The SALT interpreter, part of TAS, establishes a "communication channel" to the TIM for call control purposes. The SALT
Figure 2: Call control communication between the application and the TIM
Basic Call Controls
The ASP.NET Call Management Controls, which are provided in the SASDK, simplify the authoring of telephony applications and hide the potentially complicated CSTA interaction from the application developer by:
- Wrapping specific client-side CSTA service requests
- Receiving and processing call control events
- Handling CSTA errors
The SASDK Call Management Controls include: AnswerCall, DisconnectCall, MakeCall, TransferCall, and SmexMessage.
AnswerCall Control
This control answers incoming calls from a telephony device using the CSTA AnswerCall service, and stops dialogue flow until an incoming call is answered. It also provides access to information such as the calling party's number and the dialed number. To support applications that require Computer Telephony Integration (CTI), the Established and Delivered events may contain CTI correlator data associated with the call. The speech platform places this information in the CorrelatorData property of the CallInfo control.
DisconnectCall Control
This control disconnects a call using the CSTA ClearConnection service. In general, application authors should use the DisconnectCall control to end the phone call in a speech application. When this control completes its operation successfully, TAS terminates the dialogue and by default closes the application and resets itself without posting back to the server.
To force a postback before the interpreter resets, application authors can either set the control's AutoPostback property to True, or call the SpeechCommon.Submit method in its OnClientDisconnected event handler. The DisconnectCall control only allows the application developer to disconnect the original party on the call. This control cannot be used to drop the consulted call leg in a three-way conference call.
MakeCall Control
Using the CSTA MakeCall service, this control, when activated by RunSpeech, initiates an outbound telephone call from the telephony device to a specified number. Further speech dialogue on the page is blocked until the call is either connected or fails to connect.
TransferCall Control
This control transfers the current active call using the CSTA SingleStepTransfer service. This provides blind transfer functionality only.
When RunSpeech runs this object, it blocks any further speech dialogue until the transfer either succeeds or fails. When the TransferCall control completes its operation successfully, TAS terminates the dialogue and by default closes the application and resets itself without posting back to the server. To force a postback before the interpreter resets, application authors can either set the control's AutoPostback property to True, or call the form.Submit method in its OnClientTransferred event handler.
The SmexMessage Control
While the basic call controls above provide the core call control functionality needed for telephony-based speech applications, the SmexMessage control is the key to advanced and extended call control capabilities. With it, a developer can create custom call management controls such as supervised transfers, conference calls, and more.
The SmexMessage control handles generic XML messages and events. By default, this control sends and receives XML messages to and from the TIM using a SALT smex element that is automatically generated by the control. The content of these messages must be CSTA XML messages. The SmexMessage control does not parse the content of messages it receives, and cannot determine when a received message indicates an error condition. When this control receives a message indicating an error condition, such as CSTA FailedEvent or CSTAErrorCode messages from the TIM, it raises the OnClientReceive event rather than the OnClientError event. The SmexMessage OnClientError event is triggered only when the client-side smex object raises its onerror event.
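Because the control does not interpret message content, the receive handler has to decide for itself whether a message signals an error. The Python sketch below shows the idea outside of any Speech Server API: it inspects the root element name of a received message and flags the two error forms named above. The function names are hypothetical, and the sketch assumes the error indication appears as the root element of the received XML.

```python
import xml.etree.ElementTree as ET

# Message types named in the text as error indications.
ERROR_ELEMENTS = {"CSTAErrorCode", "FailedEvent"}


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def is_error_message(xml_text: str) -> bool:
    """Return True if the received message's root element signals an error."""
    root = ET.fromstring(xml_text)
    return local_name(root.tag) in ERROR_ELEMENTS
```

A receive handler built this way would branch on the result and run its own recovery logic, since the OnClientError event will not fire for these messages.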
Extended Call Control
The basic Call Management controls (AnswerCall, MakeCall, DisconnectCall, and TransferCall) provide core call control processing, but developers often need more advanced call control to accomplish a task. This is where the SmexMessage control comes into play: it provides baseline functionality with which a developer can create more sophisticated extended call control capabilities. To illustrate the use of extended call control, let's look at a common scenario found in many call centers.
Many traditional or speech-enabled IVR applications deployed in call centers require the ability to transfer calls and create conferences with multiple parties. Companies typically prefer supervised transfers (transfers where the transferring party maintains control over the transfer until complete) over blind transfers (transfers where the transferring party has no control over the transfer success) because it allows the transferring party to ensure a real person (or at least working voice mail) answers the call on the other end.
Consultation calls are the first step to both supervised transfers and conference calls. A supervised transfer is started using a CSTA Consultation Call service request and is completed with a Transfer Call service request. Similarly, a three-way conference call is also initiated through a Consultation Call service request and is completed with a Conference Call service request.
A Consultation Call is a compound service request that places the original call on hold and (typically on a separate channel) issues an outbound call to the "consulted" party. Once an active connection is established with the consulted party, the Consultation Call service request completes. Generally, two physical channels are required for a consultation call: one for the held party and one for the consulted party. Figure 3 shows the connection states before a consultation call is issued and after it completes.
Figure 3: Consultation call connection states
- D1 is the consulting device. In Figure 3, D1 is a speech application attached to a local TIM channel resource.
- D2 is the original calling party (device) connected to the MSS system.
- D3 is the consulted party (device), the target party for the transfer or conference.
Before the consultation call is issued, the calling party (D2) is involved in an active call (C1) with the speech application, which is running inside TAS. During the consultation call, the original caller is placed on hold at the consulting device (D1), and a new call (C2) is initiated to the consulted device (D3). After the consultation call is connected, the consultation call service request completes, and the new call (C2) is in an active connected state between the consulting device (D1) and the consulted device (D3).
After a consultation call has been successfully established, the application has three completion options: transfer, conference, and reconnect (as shown in Figure 4).
Figure 4: Supervised transfers and conference calls with CSTA
- TransferCall transfers the held party to the consulted party (supervised transfer).
- ConferenceCall joins the held party, the consulted party, and the speech application in a three-way conference call. This service is commonly used for virtual assistant-type scenarios.
- ReconnectCall is another compound service that disconnects the consulted party connection and retrieves the held party back into an active state (with the speech application).
Figure 4 also shows that in the case of a Failed Event, a ReconnectCall is also explicitly required to return the held call to an active call state with the speech application. Some TIM implementations may automatically clear the consultation and retrieve the held call in the event of a Failed Event.
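The flow described in this section can be summarized as a small state machine. The Python sketch below is one reading of the text, not a reproduction of the figures or of any TIM's behavior. The state names and the `ConsultationCompleted` event label are invented for illustration, while `ConsultationCall`, `TransferCall`, `ConferenceCall`, `ReconnectCall`, and `FailedEvent` are the names used above.

```python
# (current state, event or service request) -> next state
TRANSITIONS = {
    # The original call C1 is active; the application starts a consultation.
    ("active", "ConsultationCall"): "consulting",
    # C1 is on hold and C2 to the consulted party is connected.
    ("consulting", "ConsultationCompleted"): "consulted",
    # The consultation failed; the held call must be retrieved explicitly.
    ("consulting", "FailedEvent"): "consult_failed",
    # Three completion options after a successful consultation.
    ("consulted", "TransferCall"): "transferred",
    ("consulted", "ConferenceCall"): "conferenced",
    ("consulted", "ReconnectCall"): "active",
    # Recovery path after a failed consultation.
    ("consult_failed", "ReconnectCall"): "active",
}


def step(state: str, event: str) -> str:
    """Apply one event; raise ValueError if it is not valid in this state."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not valid in state {state!r}") from None


def run(events, state: str = "active") -> str:
    """Apply a sequence of events starting from the given state."""
    for event in events:
        state = step(state, event)
    return state
```

Keeping the transitions in a table makes it easy to see that TransferCall and ConferenceCall are only meaningful after a consultation has completed. A TIM that automatically retrieves the held call after a failure would effectively skip the `consult_failed` state.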
For most extended call control functionality, including supervised transfers and conference calls, the CSTA Consultation Call service is crucial, and understanding how it works is imperative for the developer who wants to exploit extended call control.
While an in-depth drill-down on the CSTA messaging flow of extended call control is beyond the scope of this article, look for a forthcoming technical white paper, entitled "Telephony Call Control in Microsoft Speech Server 2004" for details and sample code you can use to start building sophisticated extended call control functionality into your telephony Speech Server-based applications.
Conclusion
By virtue of its support for the Speech Application Language Tags (SALT) specification and its Simple Messaging EXtension (smex) element, MSS allows a developer to create sophisticated telephony-based speech applications. These applications can exploit both basic call control services, using the included basic call controls, and extended call control services, by creating custom call controls that harness the power of CSTA. For more in-depth information on this topic, look for the "Telephony Call Control in Microsoft Speech Server 2004" white paper, coming to the Microsoft Speech Server Web site in August 2004.