.NET Telephony Server Platform

Peter Gavalakis

Intel Corporation

In our recent article entitled “Telephony 101,” we explored basic concepts of voice communications and telephony systems. Today we will look at how these apply to SALT-based speech solutions. We will focus on the telephony server, the component that provides the interface between the Web server and the telephone network.

 

You can learn more about this topic at the upcoming Web Cast on June 24, 2003 at 11:00 a.m. PST.

 

OVERVIEW

SALT-based speech solutions share application logic with enterprise data applications. This application code runs on a Web server. The user interface(s) are implemented separately from the application. “Pages” written in markup language are transmitted from the application server to a platform running an interpreter. For visual user interfaces, the platform is commonly a PC running an HTML browser. The browser interprets the markup and presents the information on a visual display.

 

Voice interfaces for Web applications can be implemented in a similar way, but with one important difference. Most wireless and wireline phones do not have the processing capability to run SALT interpreters and speech processing software. To solve this problem, a telephony server is deployed as an intermediary between the Web server and the client phones. Also known as a SALT gateway, the telephony server interprets SALT markup language and maintains state information on all speech and telephony resources. It also provides the physical connection and signaling path to the voice network. These complex functions are abstracted from the application developer, who only needs to understand SALT and/or the tools used to build the SALT interface. With a SALT-based telephony server, a single speech application can be deployed in a variety of telephony environments.

 

Figure 1 shows how the telephony server fits into a SALT-based speech solution. Speech services such as speech recognition and speech synthesis may run on the same platform as the telephony server or, for higher-density applications, in a distributed environment (i.e., on separate servers).

 

Note that some clients, such as voice-enabled PDAs and next-generation wireless “smartphones,” have the processing power to run a SALT interpreter and perhaps even speech engine services. When these devices are used, the telephony server may not be needed. However, there are few of these devices relative to the large number of traditional wireless and wireline phones, currently counted in the billions. The gateway-based architecture will therefore remain valid for the foreseeable future.

 

FIGURE 1: ARCHITECTURE OF SALT SOLUTION

 

A CLOSER LOOK AT THE TELEPHONY SERVER

The SALT interface abstracts the application from the underlying complexities of the phone network(s) and speech resources to which it connects. Thus, a single application can:

  • Run on a variety of networks (e.g., circuit-switched, 802.11b, Ethernet)
  • Use a variety of communications protocols (e.g., H.323, ISDN, SIP, SS7)
  • Support a variety of speech recognition and synthesis engines

 

FIGURE 2: TELEPHONY SERVER—DETAIL

 

Figure 2 shows the major components of the telephony server. This example assumes a standard server built on Intel® processors (e.g., Intel® Pentium® processors or Intel® Xeon™ processors) running the Windows Server 2003* operating system. The platform contains three primary functional components, which are highlighted in Table 1.

 

| Component | Description |
|---|---|
| Telephony Board | Provides the physical connection and communication protocols to the network. Intel offers a broad range of platform densities: a single PCI or CompactPCI* slot can support from 4 to 96 voice channels. |
| Telephony Application Services | Provides SALT language interpretation. Also maintains state information for all speech and telephony resources. |
| Telephony Interface Manager (TIM) | A layer of software that links the Telephony Application Services layer to the telephony board. This module is kept separate because different telephony environments may require different TIMs and various vendors may supply TIMs with differing features. By contrast, the Telephony Application Services module operates at a higher level of abstraction and can remain the same across all telephony environments. |
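
The separation between the Telephony Application Services layer and the TIM can be pictured as an interface boundary. The sketch below is purely illustrative: the class and method names (TelephonyInterfaceManager, answer, blind_transfer, and so on) are hypothetical and do not correspond to any actual Intel or Microsoft API. It only shows how one higher-level service might stay unchanged while different TIMs translate the same requests into environment-specific signaling.

```python
from abc import ABC, abstractmethod


class TelephonyInterfaceManager(ABC):
    """Hypothetical TIM contract; method names are illustrative only."""

    @abstractmethod
    def answer(self, channel: int) -> str:
        ...

    @abstractmethod
    def blind_transfer(self, channel: int, destination: str) -> str:
        ...


class AnalogBoardTim(TelephonyInterfaceManager):
    """Assumed TIM for an analog PBX connection."""

    def answer(self, channel: int) -> str:
        return f"analog ch{channel}: off-hook"

    def blind_transfer(self, channel: int, destination: str) -> str:
        return f"analog ch{channel}: hook-flash, dial {destination}"


class SipTim(TelephonyInterfaceManager):
    """Assumed TIM for a SIP connection."""

    def answer(self, channel: int) -> str:
        return f"sip ch{channel}: 200 OK"

    def blind_transfer(self, channel: int, destination: str) -> str:
        return f"sip ch{channel}: REFER {destination}"


class TelephonyApplicationServices:
    """Higher-level layer: the same code runs against any TIM."""

    def __init__(self, tim: TelephonyInterfaceManager) -> None:
        self.tim = tim

    def handle_call(self, channel: int, csr_extension: str) -> list:
        # The service issues abstract requests; the TIM picks the signaling.
        return [
            self.tim.answer(channel),
            self.tim.blind_transfer(channel, csr_extension),
        ]


if __name__ == "__main__":
    for tim in (AnalogBoardTim(), SipTim()):
        print(TelephonyApplicationServices(tim).handle_call(3, "2001"))
```

Swapping AnalogBoardTim for SipTim changes only the signaling strings, not the service code, which is the property the table attributes to the TIM layer.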

 

 

DESIGN CONSIDERATIONS

Proper design of the telephony server will maximize the effectiveness and efficiency of your solution by eliminating busy signals, ensuring the availability of speech processing resources, and providing cost-effective connectivity to the existing phone network.

 

Network Interface: Physical and Signaling Connections

Your speech application must fit with the existing communications infrastructure. In many cases, this means connecting to an enterprise phone switch (PBX). There are a number of issues that should be addressed:

  • Does the PBX support analog or digital connections?
  • Are there available ports on the switch, or do they need to be added?
  • If ports must be added, what is the cost?
  • Which communications protocols does the switch support (and therefore must the telephony server support)?

 

Telephony Port Density

The number of voice channels supported by your system is driven by a number of factors including the number of simultaneous callers the system must handle, the connect time of the call, and the way in which call control functions such as transfers are implemented. To illustrate this we will walk through a simple example.

 

Let’s assume a caller is interacting with the speech server system, then wants to speak with a customer service representative (CSR). The application may be designed to implement a “blind transfer,” where the call is released from the telephony server and the PBX switches the call to the CSR. In this case, a single port on the telephony server is used for the call. Once the transfer is initiated, the call is released back to the PBX and the port on the telephony server is freed.

 

Alternatively, the application may implement a “supervised transfer”. In this case, the server remains connected to the call and the server port remains occupied until the application determines that the CSR has answered and is prepared to accept the call. At this point, the transfer operation is completed and the server port becomes free to take another call.

 

In a third possible scenario, the application remains connected to the call even after the CSR is on the line and conversation has begun. This can be arranged through a three-way call in the PBX (in which case one server port remains occupied for the total duration of the call), or by using two ports on the server – one to handle the original caller line and one to extend the call through the PBX to the CSR. In this latter case, two ports are occupied on the telephony server for the total duration of the call. This arrangement, called a “hairpin” or “trombone” arrangement, consumes more server ports but allows you to implement variations of transfer and conference operations that would not be possible using other techniques. For example, you could implement a transfer operation using a telephone connection that did not support native transfer signaling.
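
To make the difference between these scenarios concrete, the following sketch totals the server port time consumed by one call under each arrangement. The function name, the scenario labels, and the sample durations are all assumptions chosen for illustration; the hairpin case is counted, as in the text, as two ports for the whole call.

```python
def port_seconds(scenario, self_service_s, csr_answer_wait_s, csr_talk_s):
    """Server port-seconds used by one call (illustrative model only).

    self_service_s    -- time the caller spends with the speech application
    csr_answer_wait_s -- time until the CSR answers after transfer begins
    csr_talk_s        -- length of the caller/CSR conversation
    """
    total = self_service_s + csr_answer_wait_s + csr_talk_s
    if scenario == "blind":
        # Port is freed as soon as the call is handed back to the PBX.
        return self_service_s
    if scenario == "supervised":
        # Port stays busy until the CSR has answered.
        return self_service_s + csr_answer_wait_s
    if scenario == "three_way":
        # One port stays on the call for its full duration.
        return total
    if scenario == "hairpin":
        # Two ports for the full duration, following the description above.
        return 2 * total
    raise ValueError(f"unknown scenario: {scenario}")


if __name__ == "__main__":
    # Hypothetical durations: 90 s self-service, 15 s wait, 180 s with the CSR.
    for name in ("blind", "supervised", "three_way", "hairpin"):
        print(name, port_seconds(name, 90, 15, 180))
```

With these made-up durations, the same call costs 90 port-seconds as a blind transfer and 570 as a hairpin, which is why the choice of transfer method has such a large effect on port density.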

 

To properly estimate the number of ports required on the telephony server, you need to decide which call handling scenarios you will implement, then estimate the number and duration of the calls you will be handling.
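
One common way to turn such estimates into a port count is the Erlang B formula from classical traffic engineering. This article does not prescribe a method, so treat the sketch below as one possible approach under stated assumptions: calls arrive at random, a caller who finds all ports busy gets a busy signal and goes away, and the 1% blocking target and the traffic figures are hypothetical.

```python
def erlang_b(offered_erlangs, ports):
    """Probability that a call finds all ports busy (Erlang B recurrence)."""
    blocking = 1.0  # with zero ports every call is blocked
    for n in range(1, ports + 1):
        blocking = offered_erlangs * blocking / (n + offered_erlangs * blocking)
    return blocking


def ports_needed(calls_per_hour, port_seconds_per_call, max_blocking=0.01):
    """Smallest port count whose blocking probability meets the target."""
    # Offered load in erlangs: average number of ports busy at once.
    offered = calls_per_hour * port_seconds_per_call / 3600.0
    if offered <= 0:
        return 0
    ports = 1
    while erlang_b(offered, ports) > max_blocking:
        ports += 1
    return ports


if __name__ == "__main__":
    # Hypothetical busy hour: 120 calls, each holding a port for 90 seconds.
    print(ports_needed(120, 90))
```

For the hypothetical busy hour shown (120 calls of 90 port-seconds each, or 3 erlangs), this yields 8 ports at a 1% blocking target. Longer port occupancy per call, as in a hairpin arrangement, raises the offered load and therefore the port count.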

 

Host CPU Sizing for Telephony

The telephony server must be designed with adequate processing capacity to support the expected traffic. Since today’s speech engines run on host CPUs, host loading is most significant when the speech engine services (speech recognition, speech synthesis) run on the telephony server. It is impossible to properly engineer speech services without understanding how often the application needs to invoke recognition and synthesis operations, the complexity of the grammar templates involved, and how efficient the engines are. Intel will continue working with members of the speech ecosystem to define reference architectures for speech engine deployments, and to evaluate the performance of those reference implementations for representative speech-enabled applications.

 

The computational load imposed by speech recognition engines can be reduced if the TIM module and its associated hardware support speech pre-processing technology such as Intel® Dialogic® Continuous Speech Processing. Once a speech recognition operation is initiated, the caller may not begin speaking immediately. Without Continuous Speech Processing technology, the recognition engine still processes the near-silence in the audio stream, attempting to recognize meaningful utterances in it, and therefore consumes significant processing cycles. With Continuous Speech Processing technology, near-silence in the audio stream is truncated in the TIM module and not sent to the recognition engine until meaningful audio is received. This reduces the processing cycles consumed by the recognition engine, which does not start its work until there is something significant to analyze.
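
The general idea of withholding near-silence from the recognizer can be sketched with a simple energy threshold. This is not how Continuous Speech Processing is implemented; it is a minimal illustration that assumes audio arrives as frames of 16-bit PCM samples and uses an arbitrary threshold value. The function names are hypothetical.

```python
import math


def rms(frame):
    """Root-mean-square level of one frame of PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def gate_leading_silence(frames, threshold=500.0):
    """Yield frames starting with the first one louder than the threshold.

    Frames before the caller starts speaking are dropped, so a downstream
    recognizer would spend no cycles on them. The threshold is an arbitrary
    illustrative value, not a tuned setting.
    """
    speech_started = False
    for frame in frames:
        if not speech_started and rms(frame) < threshold:
            continue  # still near-silent: do not forward
        speech_started = True
        yield frame


if __name__ == "__main__":
    audio = [[0, 3, -2], [10, -8, 5], [4000, -3500, 3000], [1, 0, -1]]
    forwarded = list(gate_leading_silence(audio))
    print(len(audio), "frames in,", len(forwarded), "frames forwarded")
```

In this toy input the two quiet leading frames are dropped and forwarding starts at the first loud frame. A production system would use a far more robust detector than a fixed energy threshold.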

 

Speech engine services notwithstanding, most of today’s server platforms have ample processing capacity to support the telephony application server, telephony interface manager, and network interface card. These functions will generally contribute only a small portion of the overall computational load in a speech server.

 

CONCLUSION

In SALT solutions, the telephony server shields the developer from the complexities of the voice network(s) to which the application connects. This article provided an introduction to the telephony server, including an overview of server components and guidelines on configuring and deploying real-world servers.

 

Additional detail will be covered in the June 24 Web Cast entitled .NET Telephony Server Platform.

 

Intel® Communications Systems Products

Intel, the world's largest chipmaker, is also a leading manufacturer of computer, networking and communications products. Intel® communications systems products offer developers, service providers, resellers, and communications system owners what they need to succeed in the new world of converged voice and data communications. This includes a broad family of building blocks, a global network of solutions providers, and comprehensive support and consulting services. Ranging from boards to server software, Intel building blocks meet the converged communications needs of environments as diverse as enterprise organizations and service providers. These building blocks include voice, fax, conferencing, and speech technologies; telephone and IP network interfaces; PBX integration products; carrier-class, board systems-level products; and more. Intel communications building blocks enable new, converged Web services including Internet voice browsing. For more information, visit www.intel.com/network/csp/products/index.htm.

 

*Other names and brands may be claimed as the property of others.

Intel, Intel Dialogic, Pentium, Intel Xeon, and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

Copyright © 2003 Intel Corporation.