.NET Telephony Server Platform
Peter Gavalakis
Intel Corporation
In our recent article entitled “Telephony
101,” we explored basic concepts of voice communications and telephony systems.
Today we will look at how these apply to SALT-based speech solutions. We will
focus on the telephony server, the
component that provides the interface between the Web server and the telephone
network.
You can learn more about this topic at the
upcoming Web Cast on
OVERVIEW
SALT-based speech solutions share
application logic with enterprise data applications. This application code runs
on a Web server. The user interface(s) are implemented separately from the
application. “Pages” written in markup language are transmitted from the
application server to a platform running an interpreter. For visual user
interfaces, the platform is commonly a PC running an HTML browser. The browser
interprets the markup and presents the information on a visual display.
Voice interfaces for Web applications can
be implemented in a similar way, but with one important difference. Most
wireless and wireline phones do not have the processing capability to run SALT
interpreters and speech processing software. To solve this problem, a telephony
server is deployed as an
intermediary between the Web server and the client phones. Also known as a SALT
gateway, the telephony server interprets SALT
markup language and maintains state information on all speech and telephony
resources. It also provides the physical connection and signaling path to the
voice network. These complex functions are abstracted from the application
developer, who only needs to understand SALT and/or the tools used to build the
SALT interface. With a SALT-based telephony server, a single speech application
can be deployed in a variety of telephony environments.
Figure 1 shows how the telephony server
fits into a SALT-based speech solution. Speech services such as speech recognition and speech synthesis may run on the
same platform as the telephony server or, for higher-density applications, in
a distributed environment (i.e., on separate servers).
Note that some clients, such as voice-enabled PDAs and
next-generation wireless “smartphones”, have the
processing power to run a SALT interpreter and perhaps even speech engine
services. When these devices are used, the telephony server may not be needed.
However, there are few of these devices relative to the large number of
traditional wireless and wireline phones, currently counted in the billions. The
gateway-based architecture will remain valid for the foreseeable future.

FIGURE 1: ARCHITECTURE OF SAL
A CLOSER LOOK AT THE TELEPHONY SERVER
The SALT interface abstracts the
application from the underlying complexities of the phone network(s) and speech
resources to which it connects. Thus, a single application can:

FIGURE 2: TELEPHONY SERVER—DETAIL
Figure 2 shows
the major components of the telephony server. This example uses a
standard server built on Intel® processors (e.g., Intel® Pentium® processors or Intel® Xeon™
processors) running the Windows Server 2003*
operating system. The platform contains three primary functional components,
which are highlighted in Table 1.
| Component | Description |
|---|---|
| Telephony Board | Provides the physical connection and communication protocols to the network. Intel offers a broad range of platform densities: a single PCI or CompactPCI* slot can support from 4 to 96 voice channels. |
| Telephony Application Services | Provides SALT language interpretation. Also maintains state information for all speech and telephony resources. |
| Telephony Interface Manager (TIM) | A layer of software that links the Telephony Application Services layer to the telephony board. This module is kept separate because different telephony environments may require different TIMs and various vendors may supply TIMs with differing features. By contrast, the Telephony Application Services module operates at a higher level of abstraction and can remain the same across all telephony environments. |
DESIGN CONSIDERATIONS
Proper design of the telephony server will
maximize the effectiveness and efficiency of your solution by eliminating busy
signals, ensuring the availability of speech processing resources, and
providing cost-effective connectivity to the existing phone network.
Network Interface: Physical and Signaling Connections
Your speech application must fit with the
existing communications infrastructure. In many cases, this means connecting to
an enterprise phone switch (PBX). There are a number of issues that should be
addressed:
The number of voice channels supported by
your system is driven by a number of factors including the number of
simultaneous callers the system must handle, the connect time of the call, and
the way in which call control functions such as transfers are implemented. To
illustrate this we will walk through a simple example.
Let’s assume a caller is interacting with
the speech server system, then wants to speak with a customer service representative
(CSR). The application may be designed to implement a “blind transfer” where
the call is released from the telephony server and the PBX switches the call to
the CSR. In this case, a single port on the telephony server is used for the
call. Once the transfer is implemented, the call is released back to the PBX
and the port on the telephony server is freed.
Alternatively, the application may
implement a “supervised transfer”. In this case, the server remains connected
to the call and the server port remains occupied until the application
determines that the CSR has answered and is prepared to accept the call. At
this point, the transfer operation is completed and the server port becomes
free to take another call.
In a third possible scenario, the
application remains connected to the call even after the CSR is on the line and
conversation has begun. This can be arranged through a three-way call in the
PBX (in which case one server port remains occupied for the total duration of
the call), or by using two ports on the server – one to handle the original
caller line and one to extend the call through the PBX to the CSR. In this
latter case, two ports are occupied on the telephony server for the
total duration of the call. This arrangement, called a “hairpin” or “trombone”
arrangement, consumes more server ports but allows you to implement variations
of transfer and conference operations that would not be possible using other
techniques. For example, you could implement a transfer operation using a telephone
connection that did not support native transfer signaling.
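The port usage of the transfer scenarios above can be compared with a small calculation. The following is a minimal sketch, not part of any Intel or SALT API: the `CallProfile` fields, the scenario names, and the sample timings are all hypothetical, chosen only to illustrate how long a call holds server ports under each approach. The hairpin case is counted as two ports for the whole call, as described in the text.

```python
from dataclasses import dataclass


@dataclass
class CallProfile:
    """Hypothetical timing of one call, in seconds (illustrative only)."""
    self_service: float  # caller interacts with the speech application
    csr_ring: float      # time until the CSR answers
    csr_talk: float      # caller talks with the CSR


def port_seconds(profile: CallProfile, scenario: str) -> float:
    """Return the server port-seconds consumed by one call."""
    total = profile.self_service + profile.csr_ring + profile.csr_talk
    if scenario == "blind":
        # The port is freed as soon as the call is released to the PBX.
        return profile.self_service
    if scenario == "supervised":
        # The port stays busy until the CSR has answered.
        return profile.self_service + profile.csr_ring
    if scenario == "three_way":
        # One port stays in the PBX three-way call for the whole call.
        return total
    if scenario == "hairpin":
        # Two ports are counted for the whole call (caller leg + CSR leg).
        return 2 * total
    raise ValueError(f"unknown scenario: {scenario}")


if __name__ == "__main__":
    sample = CallProfile(self_service=120, csr_ring=15, csr_talk=180)
    for name in ("blind", "supervised", "three_way", "hairpin"):
        print(f"{name:>10}: {port_seconds(sample, name):.0f} port-seconds")
```

With these sample timings, the hairpin arrangement holds ports for more than five times as long as a blind transfer, which is why the choice of call handling scenario matters so much for port counts.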
To properly estimate the number of ports
required on the telephony server, you need to decide which call handling
scenarios you will implement, then estimate the number and duration of the
calls you will be handling.
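One common way to turn call volume and duration into a port count is the Erlang B formula from classical traffic engineering. This article does not prescribe a particular method, so treat the following as a sketch under stated assumptions: calls arrive at random, a blocked call gets a busy signal rather than queuing, and the function names, traffic figures, and the 1% blocking target are illustrative.

```python
def erlang_b(offered_load: float, ports: int) -> float:
    """Blocking probability for a given load (in Erlangs) and port count.

    Uses the standard recurrence B(0) = 1,
    B(n) = E * B(n-1) / (n + E * B(n-1)).
    """
    blocking = 1.0
    for n in range(1, ports + 1):
        blocking = (offered_load * blocking) / (n + offered_load * blocking)
    return blocking


def ports_needed(calls_per_hour: float,
                 port_seconds_per_call: float,
                 max_blocking: float = 0.01) -> int:
    """Smallest port count whose blocking probability meets the target."""
    # Offered load in Erlangs: average number of ports busy at once.
    load = calls_per_hour * port_seconds_per_call / 3600.0
    ports = 0
    while erlang_b(load, ports) > max_blocking:
        ports += 1
    return ports


if __name__ == "__main__":
    # Assumed busy-hour traffic: 300 calls, each holding a port for 120 s.
    print(ports_needed(calls_per_hour=300, port_seconds_per_call=120))
```

The `port_seconds_per_call` input depends on the call handling scenario: a hairpin transfer holds ports far longer than a blind transfer, so the same call volume can require a much larger server.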
Host CPU Sizing for Telephony
The telephony server must be designed with
adequate processing capacity to support the expected traffic. Since today’s
speech engines run on host CPUs, host loading is most significant when the speech
engine services (speech recognition, speech synthesis) run on the telephony
server. It is impossible to properly engineer speech services without
understanding how often the application needs to invoke recognition and
synthesis operations, the complexity of the grammar templates involved, and how
efficient the engines are. Intel will continue working with members of the
speech ecosystem to define reference architectures for speech engine
deployments, and to evaluate the performance of those reference implementations
for representative speech-enabled applications.
The computational load imposed by speech
recognition engines can be reduced if the TIM module and its associated
hardware support speech pre-processing technology such as Intel® Dialogic® Continuous Speech Processing. Once a speech
recognition operation is initiated, the caller may not begin speaking
immediately. Without the Continuous Speech Processing Technology, the
recognition engine will still be processing the near-silence in the audio stream
and will be attempting to recognize meaningful utterances in that near-silent
stream. It will therefore be consuming significant processing cycles. With Continuous Speech Processing Technology, near-silence in the audio
stream is truncated in the TIM module and not sent to the recognition engine
until meaningful audio is received. This reduces the processing cycles consumed
by the recognition engine; it does not start its work until there is something
significant to be analyzed.
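The idea of withholding near-silence from the recognizer can be illustrated with a simple energy gate. This is only a conceptual sketch and not how Continuous Speech Processing is implemented: the frame format (lists of 16-bit PCM samples), the RMS energy measure, and the threshold value are all assumptions made for illustration.

```python
import math
from typing import Iterable, Iterator, Sequence


def rms(frame: Sequence[int]) -> float:
    """Root-mean-square amplitude of one frame of PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def gate_leading_silence(frames: Iterable[Sequence[int]],
                         threshold: float = 500.0) -> Iterator[Sequence[int]]:
    """Drop frames until one reaches the energy threshold, then pass all.

    Only the leading near-silence is withheld; once speech has started,
    every later frame (including pauses) is forwarded unchanged.
    """
    speaking = False
    for frame in frames:
        if not speaking and rms(frame) >= threshold:
            speaking = True
        if speaking:
            yield frame


if __name__ == "__main__":
    quiet = [0] * 160        # hypothetical 20 ms frame at 8 kHz
    loud = [2000] * 160
    stream = [quiet, quiet, loud, quiet]
    forwarded = list(gate_leading_silence(stream))
    print(f"forwarded {len(forwarded)} of {len(stream)} frames")
```

In this sketch the recognizer would receive nothing while the caller is silent, so it spends no cycles until the first frame with meaningful energy arrives.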
Speech engine services notwithstanding,
most of today’s server platforms have ample processing capacity to support the telephony
application server, telephony
interface manager and network
interface card. These functions
will generally contribute only a small portion of the overall computational
load in a speech server.
CONCLUSION
In SALT solutions, the telephony server
abstracts the developer from the complexities of the voice network(s) to which
the application connects. This article provided an introduction to the telephony
server, including an overview of server components and guidelines on
configuring and deploying real-world servers.
Additional detail will be covered in the
June 24 Web Cast entitled .NET Telephony Server Platform.
Intel® Communications Systems Products
Intel, the world's largest chipmaker, is also a leading manufacturer of computer, networking and communications products. Intel® communications systems products offer developers, service providers, resellers, and communications system owners what they need to succeed in the new world of converged voice and data communications. This includes a broad family of building blocks, a global network of solutions providers, and comprehensive support and consulting services. Ranging from boards to server software, Intel building blocks meet the converged communications needs of environments as diverse as enterprise organizations and service providers. These building blocks include voice, fax, conferencing, and speech technologies; telephone and IP network interfaces; PBX integration products; carrier-class, board systems-level products; and more. Intel communications building blocks enable new, converged Web services including Internet voice browsing. For more information, visit www.intel.com/network/csp/products/index.htm.
*Other names and brands may be claimed as the property of others.
Intel, Intel Dialogic, Pentium, Intel Xeon, and the
Intel logo are trademarks or registered trademarks of Intel Corporation or its
subsidiaries in the United States and other countries.
Copyright © 2003
Intel Corporation.