Tips and Tricks

Want some helpful hints for getting the most out of the Microsoft Speech Application SDK? Check out the tips and tricks section.

Have other questions or comments? Join the discussion about the Microsoft Speech Application SDK. Visit our newsgroup at microsoft.public.netspeechsdk.


Q. How Can I Find More Tools to Troubleshoot Speech Applications?
A.

Grammar compilation and loading, speech application call flow, and Speech Application Programming Interface (SAPI) error codes are all complex topics with which any speech application developer might need occasional help.

At the Microsoft Download Center, there's a small collection of tools that a speech application developer might find useful. The tools were not included with the Microsoft Speech Application SDK Version 1.1 (SASDK). They are unsupported, but they’re free.

The tools are listed and described in the following sections.

GramStat Speech Utility for Microsoft Speech Technologies

The GramStat Speech Utility is a command-line utility that provides statistics for both compiled files and raw grammar files. These statistics can be used to perform basic grammar analysis, and to troubleshoot grammar compilation problems and loading problems.

Recognizer Speech Utility for Microsoft Speech Technologies

The Recognizer speech utility is a command-line utility that is useful for the analysis of offline call flow, the diagnosis of simple speech recognition errors, and top-line error diagnosis for grammars, rules, and speech application installations.

SAPIErr Speech Utility for Microsoft Speech Technologies

The SAPIErr speech utility is a command-line lookup utility that is useful for deciphering SAPI error codes that are returned by the speech recognizer, the Microsoft Speech Server 2004 prompt engine, or SAPI itself.

GetPron Speech Utility for Microsoft Speech Technologies

The GetPron speech utility is a command-line tool that takes a list of words and outputs the pronunciations for those words that are used by the Microsoft Speech Server 2004 speech-recognition engine.

BuildAppLex Speech Utility for Microsoft Speech Technologies

The BuildAppLex.exe speech utility is a command-line tool that enables you to create an Application Lexicon by using the Speech API. The BuildAppLex.exe speech utility requires one command-line argument: a text file that contains a list of words and their corresponding pronunciations.

These tools can be downloaded by searching Microsoft.com for ‘speech utilities’ or by going to http://www.microsoft.com/downloads/details.aspx?FamilyID=52744fb8-9238-4cbd-b615-be2ca781880d&displaylang=en.

Q. How Can I Make My Grammars Load Faster?
A.

Grammars created and edited in Grammar Editor in Microsoft Speech Application SDK (SASDK) 1.1 are XML text files. Text is fine for development work and debugging, but speed is important in a production environment. Because a compiled grammar is smaller, it loads faster from the Web server.

Use the command-line grammar compiler, SrGSGc.exe, to compile your XML grammars. The grammar compiler installs with the SASDK and by default is located at %SystemDrive%\Program Files\Microsoft Speech Application SDK 1.1\SDKTools\Bin. The following example shows how to compile a grammar called Input.grxml into a grammar called Output.cfg.

Srgsgc.exe /O C:\myProject\Grammars\Input.grxml C:\myProject\Grammars\Output.cfg

As a rule of thumb, compiling a 320-KB text grammar yields a 210-KB compiled grammar.
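
If a project contains many grammars, the compile step can be scripted. The following Python sketch only builds the command lines and does not run the compiler. It assumes the argument order used in the example above (the /O switch, then the input file, then the output file), and the helper name build_compile_command is hypothetical. Check the compiler's own usage text before relying on this order.

```python
from pathlib import PureWindowsPath


def build_compile_command(grxml_path, compiler="SrGSGc.exe"):
    """Return the argument list for compiling one .grxml file.

    Assumption: arguments follow the order shown in the example above
    (/O, input file, output file). The output file takes the input
    file's name with a .cfg extension.
    """
    source = PureWindowsPath(grxml_path)
    target = source.with_suffix(".cfg")
    return [compiler, "/O", str(source), str(target)]


# Print one command line per grammar in a hypothetical project folder.
for name in ["Input.grxml", "Menu.grxml"]:
    command = build_compile_command("C:\\myProject\\Grammars\\" + name)
    print(" ".join(command))
```

Each returned list can be passed to a process launcher such as subprocess.run on a computer where the SASDK is installed.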

Q. Can I return more than one semantic value as the result of recognition on a single branch in a .grxml grammar?
A.

The short answer to this question is yes. The tip this month illustrates how to do this using semantic interpretation markup in .grxml grammars for Microsoft Speech Server (MSS) 2004 and MSS R2.

Understanding the Issue

To better understand the issue, imagine the following scenario. Suppose you want to create a directory assistance application for your organization. You want a user to be able to call your organization's main number, say the name of the person to whom the user wants to speak, and then choose between connecting to that person's office phone or cell phone. For this task, the grammar that your application uses must be able to recognize the person's name and return at least two semantic values: one value representing the person's office phone number and one representing the person's cell phone number.

To accomplish this, for each contact you can use an item element to contain the contact name and corresponding tag elements to return the semantic information associated with that contact. The key here is that because the tag element contains ordinary ECMAScript, you can use multiple script expressions within tag elements to declare, store, and return multiple property values.

A Quick Review

Before looking at a few examples, recall the following points about semantic interpretation in .grxml grammars:

A tag element contains script expressions that are executed when the recognizer follows the branch in which the tag element is located. The Microsoft speech recognizer serializes the script products and generates the semantic result in the form of a Semantic Markup Language (SML) output.

Every rule element in a grammar has a single Rule Variable object ('$') that holds a semantic value. You can use script expressions contained in tag elements to define properties of the Rule Variable and these are returned as child nodes in the SML output.

Two properties are predefined for the Rule Variable and for all custom-defined properties. These are the _value and the _attributes properties. The _value property produces the text content of an SML node, and the _attributes property produces XML attributes in the start tag of a node.

With these points in mind, we are ready for a few examples. The examples illustrate how to create custom-defined properties of the Rule Variable, but differ in how they store the semantic values that you want to return for a successful recognition. These differences produce differently structured SML output.

Example 1: Returning Semantic Values in Child Nodes of the SML Return

The following example illustrates how you can store semantic values in the _value property for custom-defined properties so that these semantic values are returned as the content of child nodes. In this example, the script expressions contained in the first three tag elements initialize three custom properties of the Rule Variable: the first to hold the contact's name, the second to hold the contact's office phone extension, and the third to hold the contact's cell phone number. The script expressions contained in the last three tag elements set the semantic value for each custom property using the _value property of the custom property.

<rule id="Contacts" scope="public">
    <tag>$.ContactName={}</tag>
    <tag>$.ContactOfficeExtension={}</tag>
    <tag>$.ContactCellPhone={}</tag>
    <item>John Smith
        <tag>$.ContactName._value="John Smith"</tag>
        <tag>$.ContactOfficeExtension._value="1234"</tag>
        <tag>$.ContactCellPhone._value="5554321"</tag>
    </item>
</rule>


Using the previously illustrated rule, a successful recognition of the utterance "John Smith" produces the following SML output.

<SML text="John Smith" utteranceConfidence="0.805" confidence="0.805">
    <ContactName confidence="0.805">John Smith</ContactName>
    <ContactOfficeExtension confidence="0.805">1234</ContactOfficeExtension>
    <ContactCellPhone confidence="0.805">5554321</ContactCellPhone>
</SML>
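
How an application reads these values depends on the platform that hosts it. As a platform-neutral illustration, the following Python sketch parses the SML text shown above with the standard library and confirms that each custom property arrives as a separate child node. The helper name child_values is hypothetical and is not part of the SDK.

```python
import xml.etree.ElementTree as ET

# The SML output from Example 1, copied from the text above.
SML_CHILD_NODES = """<SML text="John Smith" utteranceConfidence="0.805" confidence="0.805">
    <ContactName confidence="0.805">John Smith</ContactName>
    <ContactOfficeExtension confidence="0.805">1234</ContactOfficeExtension>
    <ContactCellPhone confidence="0.805">5554321</ContactCellPhone>
</SML>"""


def child_values(sml_text):
    """Map each child node name under the SML root to its text content."""
    root = ET.fromstring(sml_text)
    return {child.tag: (child.text or "").strip() for child in root}


values = child_values(SML_CHILD_NODES)
print(values["ContactOfficeExtension"])  # 1234
print(values["ContactCellPhone"])        # 5554321
```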


Because the content of tag elements is ordinary script, you can also place multiple expressions within a single tag element, provided that the expressions are delimited using semicolons. In other words, the results of the following grammar markup are equivalent to those of the previously illustrated grammar markup.

<rule id="Contacts" scope="public">
    <tag>$.ContactName={};$.ContactOfficeExtension={};$.ContactCellPhone={}</tag>
    <item>John Smith
        <tag>$.ContactName._value="John Smith";
             $.ContactOfficeExtension._value="1234";
             $.ContactCellPhone._value="5554321"
        </tag>
    </item>
</rule>


Although using multiple tag elements that each contain a single script expression requires slightly more memory, the effect on performance is usually indiscernible. On the other hand, using multiple tag elements can make your code easier to read.

Example 2: Returning Semantic Values as Attributes of a Child Node in the SML Output

The following example illustrates how you can store semantic values in the _attributes property of a custom-defined property so that these semantic values are returned as attributes of the child node. In this example, the script expression contained in the first tag element initializes the custom property as an object, the second expression sets the semantic value of the property itself, and the remaining expressions set attribute properties of the object.

<rule id="Contacts">
    <tag>$.ContactInfo={};</tag>
    <item>John Smith
        <tag>$.ContactInfo._value="John Smith"</tag>
        <tag>$.ContactInfo._attributes.officeExtension="1234"</tag>
        <tag>$.ContactInfo._attributes.cellPhone="5554321"</tag>
    </item>
</rule>


Using the previously illustrated rule, a successful recognition of the utterance "John Smith" produces the following SML output.

<SML text="John Smith" utteranceConfidence="0.805" confidence="0.805">
    <ContactInfo confidence="0.805" officeExtension="1234" cellPhone="5554321">
        John Smith
    </ContactInfo>
</SML>
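
With this structure, a consumer reads one node and takes the phone numbers from its attributes. The following Python sketch parses the SML text shown above. It is an illustration only, and the helper name read_contact is hypothetical.

```python
import xml.etree.ElementTree as ET

# The SML output from Example 2, copied from the text above.
SML_ATTRIBUTES = """<SML text="John Smith" utteranceConfidence="0.805" confidence="0.805">
    <ContactInfo confidence="0.805" officeExtension="1234" cellPhone="5554321">
        John Smith
    </ContactInfo>
</SML>"""


def read_contact(sml_text):
    """Return the ContactInfo text content and its two phone attributes."""
    node = ET.fromstring(sml_text).find("ContactInfo")
    return {
        "name": (node.text or "").strip(),  # produced by _value
        "officeExtension": node.get("officeExtension"),  # from _attributes
        "cellPhone": node.get("cellPhone"),  # from _attributes
    }


print(read_contact(SML_ATTRIBUTES))
```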

 

Example 3: Returning Semantic Values in an Array

The following example illustrates how you can store semantic values in a custom-defined property that is initialized as an array so that these semantic values are returned in a series of item elements in the SML output. In this example, the script expression contained in the tag element initializes the custom property as an array containing the semantic values for the contact's office phone extension and cell phone number.

<rule id="Contacts">
    <item>John Smith
        <tag>$.ContactInfo=["1234", "5554321"]</tag>
    </item>
</rule>


Using the previously illustrated rule, a successful recognition of the utterance "John Smith" produces the following SML output:

<SML text="John Smith" utteranceConfidence="0.805" confidence="0.805">
    <ContactInfo confidence="0.805">
        <item confidence="0.805">1234</item>
        <item confidence="0.805">5554321</item>
    </ContactInfo>
</SML>
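
With the array form, the values are identified only by position, so the consumer must know the order that the grammar author chose. The following Python sketch parses the SML text shown above and returns the item values in order. The helper name item_values is hypothetical.

```python
import xml.etree.ElementTree as ET

# The SML output from Example 3, copied from the text above.
SML_ARRAY = """<SML text="John Smith" utteranceConfidence="0.805" confidence="0.805">
    <ContactInfo confidence="0.805">
        <item confidence="0.805">1234</item>
        <item confidence="0.805">5554321</item>
    </ContactInfo>
</SML>"""


def item_values(sml_text, node_name):
    """Return the text of each item element under the named child node."""
    node = ET.fromstring(sml_text).find(node_name)
    return [(item.text or "").strip() for item in node.findall("item")]


# By the grammar's convention, index 0 is the office extension
# and index 1 is the cell phone number.
office_extension, cell_phone = item_values(SML_ARRAY, "ContactInfo")
print(office_extension, cell_phone)  # 1234 5554321
```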

 

Conclusion

The tip this month illustrates several ways to write markup in a .grxml grammar so that more than one semantic value is returned in SML output as the result of recognition on a single branch in the grammar. For a more thorough discussion of semantic interpretation markup, see the "Semantic Interpretation Markup" section in the MSS Help documentation. For additional examples, see the "SML Reference" section.

Q. Why do my RuleRefs use absolute file paths?
A.

When adding RuleRef elements to grammars in Speech Application SDK version 1.1, you might have wondered why the paths are absolute rather than relative. After all, absolute paths mean there is more work involved if you move the grammar to a different folder. For example, you might want to use a single grammar in multiple applications or want to change folders on the production server.

There is a simple explanation and an easy way to choose relative paths or absolute paths for grammar rule references. If you open a grammar as a stand-alone file, the paths in RuleRef elements are absolute paths. However, if you open a grammar in a speech project, the paths in RuleRef elements are relative paths.

To get absolute paths for grammar rule references:

1. In Visual Studio .NET 2003, click New on the File menu, and then click File.
2. In the New File dialog box, select Speech in the Categories pane.
3. In the Templates pane, select Grammar File, and then click Open.
4. In Grammar Explorer, double-click Rule 1.
5. In the Rule Editor, add a List element.
6. Add a RuleRef element to the Phrase element.
7. Right-click the RuleRef element, select Set Target Rule, and then click Browse.
8. In the Open Grammar File dialog box, select a grammar file, and then click Open.
9. In the Rule Browser dialog box, select a rule, and then click Set Target Rule.

In the Properties window, notice that the URI property is now set to a value similar to file:///C:/MyGrammarFiles/TestGrammar.grxml#InvoiceRule. This is an absolute file reference.

To get relative paths for grammar rule references:

1. In Visual Studio .NET 2003, open a speech application.
2. In Solution Explorer, double-click a grammar file in the Grammars folder.
3. In the Rule Editor, double-click a rule.
4. In the Rule Editor, add a RuleRef element to an element already present in the designer.
5. Right-click the RuleRef element, select Set Target Rule, and then click Browse.
6. In the Pick a Grammar URL dialog box, click Browse, select a grammar file, click Open, and then click OK.
7. In the Grammar Editor warning dialog box, click Continue.
8. In the Rule Browser dialog box, select a rule, and then click Set Target Rule.

In the Properties window, notice that the URI property is now set to a value similar to TestGrammar.grxml#InvoiceRule. This is a relative file reference.

You can easily choose whether grammar rule references are absolute or relative. Absolute references are created in stand-alone grammar files. Relative rule references are created in grammar files contained in a speech project.
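
If a grammar already contains absolute references, the difference between the two forms is just the folder part of the URI. The following Python sketch shows the conversion for a well-formed file URI. It is an illustration and not an SDK feature: the helper name to_relative_ruleref is hypothetical, and the sketch does not handle percent-encoded characters.

```python
import posixpath
from urllib.parse import urlsplit


def to_relative_ruleref(absolute_uri, base_dir_uri):
    """Rewrite an absolute file URI relative to the folder base_dir_uri.

    The rule name after '#' is preserved unchanged.
    """
    target = urlsplit(absolute_uri)
    base = urlsplit(base_dir_uri)
    relative_path = posixpath.relpath(target.path, base.path)
    if target.fragment:
        return relative_path + "#" + target.fragment
    return relative_path


print(to_relative_ruleref(
    "file:///C:/MyGrammarFiles/TestGrammar.grxml#InvoiceRule",
    "file:///C:/MyGrammarFiles/",
))  # TestGrammar.grxml#InvoiceRule
```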

Q. How can I enter and record prompt text in the prompt database?
A.

You know Microsoft Speech Application SDK 1.1 has great tools, but you probably don’t know it can automatically populate your prompt database for you. You might be entering prompts manually into a prompt database after the prompts are added to an application. There's an easier way. Use prompt validation to identify all the prompts in your application, and then click Add All to Database to automatically populate the transcription and extraction windows. When that's done, just click Record All to record your prompts.

To automatically populate a prompt database:

1. Create a new speech Web application.
2. Add a QA control, and then add a prompt to the control.
3. Open the prompt database.
4. On the Prompt Editor toolbar, click Prompt Validation.
5. On the Prompt Validation toolbar, click Do Validate Solution.
6. In the Solution Prompt Validation dialog box, select the project, and then click OK.
7. On the Output Window toolbar, click Add All to Database.
8. Select a row in the prompt database, and then click Record All on the Prompt Editor toolbar.

Prompt validation finds all the prompts that could possibly be called by the application. To automatically add the missing prompts to the database, click Add All to Database.

Q. How do I set TTS volume and speed?
A.

You may want to make a global change to the speed or volume of text-to-speech (TTS) prompts in your application. Altering speed or volume is easy to do by changing parameters in the Speechify configuration files, and then restarting the Speechify Voice service.

To set TTS volume and speed:

1. In a text editor, open the configuration file for the application's TTS voice. For example, the Jill voice configuration file for English (United States) applications is Ojill8.xml. By default, Ojill8.xml is located at the path Program Files\Common Files\SpeechEngines\ScanSoft\Speechify\en-US\jill
2. To change the TTS speed rate, find and change the value attribute for tts.audio.rate. The default value is 100.
3. To change the TTS volume, find and change the value attribute for tts.audio.volume. The default value is 30.
4. Restart the appropriate Speechify Voice service as described in the following steps.
5. On the Windows® taskbar, click Start, point to Administrative Tools, and then click Services.
6. In the Services pane, right-click Speechify Voice - voice, where voice is the voice service you want to restart, and then click Restart.

Use the ssml:prosody element to change the speed and volume of individual prompts. Use the TTS voice's configuration file to make a global change to TTS characteristics.
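
For the per-prompt case, the prosody element in SSML carries rate and volume attributes. The following Python sketch builds such an element as text. It assumes that the ssml prefix is bound to the SSML namespace elsewhere in the prompt markup, the helper name prosody is hypothetical, and which attribute values the installed TTS engine honors should be checked against the engine's documentation.

```python
from xml.sax.saxutils import escape, quoteattr


def prosody(text, rate=None, volume=None, prefix="ssml"):
    """Wrap prompt text in a prosody element with optional rate and volume.

    Assumption: the prefix is declared by the surrounding markup.
    """
    attributes = ""
    if rate is not None:
        attributes += " rate=" + quoteattr(rate)
    if volume is not None:
        attributes += " volume=" + quoteattr(volume)
    tag = prefix + ":prosody" if prefix else "prosody"
    return "<{0}{1}>{2}</{0}>".format(tag, attributes, escape(text))


print(prosody("Please hold.", rate="slow", volume="loud"))
# <ssml:prosody rate="slow" volume="loud">Please hold.</ssml:prosody>
```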

Q. How do I resolve a 401 Error with Telephony Application Services?
A.

In Microsoft® Speech Server (MSS) 2004 R2, requests from Telephony Application Services (TAS) to Speech Engine Services (SES) may result in the following error, seen in the Application Log in Event Viewer.

"A call failed because SES URL 'http://<application>/SES/Lobby.asmx' could not be found. Please ensure that the TAS SpeechServer property is correct. The following error was returned: 80131509: 'The request failed with HTTP status 401: Unauthorized. (System.Net.WebException)'."

Internet Information Services (IIS) authentication settings may be changed unexpectedly by updates. This may require a manual change to restore the desired settings. See the following procedure.

To reset Windows authentication:

1. On the Windows taskbar, click Start, right-click My Computer, and then click Manage.
2. In the Computer Management dialog box, in the tree view pane, expand Services and Applications, expand Internet Information Services, and then expand Web Sites.
3. Under Web Sites, browse to the MSS application directory, right-click the application directory, and then click Properties.
4. In the application's Properties dialog box, on the Directory Security tab, locate Authentication and access control, and then click Edit.
5. In the Authentication Methods dialog box, locate Authenticated access, clear the Integrated Windows authentication check box, and then click OK.
6. Once again, in the application's Properties dialog box, on the Directory Security tab, locate Authentication and access control, and then click Edit.
7. In the Authentication Methods dialog box, locate Authenticated access, and then select the Integrated Windows authentication check box.
8. Click OK twice to return to the Computer Management dialog box.

Clearing Integrated Windows authentication and applying the change, and then setting it back and applying the restoration, resets IIS so that Lobby.asmx is accessible.

Q. How do I easily compare widely separated prompt database values?
A.

A Microsoft Speech Server application prompt database contains sixteen columns, and can contain thousands of rows of data. You might want to compare the value in the first column of the first row with the value in the sixteenth column of a row that is hundreds or thousands of rows below it. You could open two instances of Visual Studio and see two separate views of the prompt database, but there's an easier way, explained in this tip.

To view widely separated prompt database fields:

1. In Visual Studio, open a prompt database. Select the first row.
2. Make sure the Properties pane is visible. To open the Properties pane, press F4. Note that the values for all sixteen columns of the selected row are visible in the Properties pane.
3. Use the scroll bars to navigate to the last row in the database. Without selecting the last row, scroll left and right to view all columns in the last row. Note that while you can freely scroll to view any column in any row, all values from the selected row are still visible in the Properties window.

Use the Properties window in combination with the Prompt Editor pane to display fields that otherwise are not visible at the same time.

Q. When I view an event log for a Microsoft Speech Server (MSS) system in a different time zone, the times shown for the events in the Event Viewer are not correct. How do I view the correct local time for the events as they occurred on the remote computer?
A.

If you need to troubleshoot a problem on an MSS system on a remote computer, you may need to view the event log for that remote system on a computer that is not in the same time zone. When you do this, the times shown for the events in the Event Viewer are offset by the difference in the time zones. For example, an event that occurred at 1:00 A.M. on a remote computer in the U.S. Eastern time zone would appear to have occurred at 10:00 P.M. on the previous day (a 3-hour difference) if viewed on a different computer in the Pacific time zone. This difference occurs for two reasons:

Event times are stored in the event log file (a file with an .evt extension) in Coordinated Universal Time (UTC), a time scale similar to Greenwich Mean Time (GMT).

Event Viewer calculates the event time recorded in the log file relative to the time zone of the computer used to view the event log, not the computer that originated the events.

This Event Viewer behavior can cause considerable confusion when the exact local time of the event is an important part of troubleshooting the problem. To show a more realistic picture of when events occurred, set the time zone on the local computer to match the time zone of the remote computer. This action forces Event Viewer to calculate event times relative to the time zone of the computer on which the events occurred.
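
The arithmetic behind the example can be checked with a short Python sketch. It uses fixed standard-time offsets for the two zones (UTC-5 and UTC-8) and therefore ignores daylight saving time. The event date is hypothetical, and the helper name local_time is not part of any Microsoft tool.

```python
from datetime import datetime, timedelta, timezone

# Fixed standard-time offsets; daylight saving time is ignored here.
EASTERN = timezone(timedelta(hours=-5), "EST")
PACIFIC = timezone(timedelta(hours=-8), "PST")


def local_time(utc_event_time, zone):
    """Convert a UTC event time to the given time zone."""
    return utc_event_time.astimezone(zone)


# A hypothetical event stored in the log as 06:00 UTC.
event_utc = datetime(2005, 1, 15, 6, 0, tzinfo=timezone.utc)

# Time on the remote (Eastern) computer where the event occurred.
print(local_time(event_utc, EASTERN).strftime("%Y-%m-%d %H:%M %Z"))
# 2005-01-15 01:00 EST

# Time displayed when the same log is viewed on a Pacific computer.
print(local_time(event_utc, PACIFIC).strftime("%Y-%m-%d %H:%M %Z"))
# 2005-01-14 22:00 PST
```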

To set the time zone:

1. On the local computer, click Start, click Control Panel, and then double-click Date and Time.
2. Click the Time Zone tab and then choose the remote computer time zone from the drop-down list.
3. Click Apply and then click OK.

Note: When you are done viewing events that originated on the remote computer, set the local computer time zone back to the correct local time zone.

An alternative solution is to save the event log on the remote computer in text file format (a file with a .txt extension) or comma-separated value format (a file with a .csv extension). This solution causes Event Viewer to write the actual local time of the events to the file being saved, instead of UTC.

To save an event log as a .txt or .csv file:

1. In the Event Viewer console tree, click the log you want to save.
2. On the Action menu, click Save Log File As.
3. In File name, enter a name for the archived log file.
4. In Save as type, select the .txt or .csv file format, and then click Save.

Q. How does the speech recognition engine in Microsoft Speech Server 2004 treat abbreviations, digit strings, dollar amounts, etc?
A.

To recognize the words and phrases specified in a grammar, the speech recognition (SR) engine in Microsoft Speech Server 2004 needs to look up the pronunciation of each word in the grammar. If your grammar contains abbreviations like "Mr. Smith", digit strings like "123", or dollar amounts like "$34.05", the SR engine first converts these strings into one or more unambiguous sequences of words in a process called text normalization. For example, the SR engine converts the string "123" into the word sequence "one hundred and twenty three". Once the string is converted, the SR engine can look up the pronunciation of each individual word and use it in the recognition process. Converting a string like "123" is not necessarily as straightforward as turning it into "one hundred and twenty three", though. Other valid interpretations might be "one two three", "hundred twenty three", "one twenty three", or "twelve three". Similarly, the abbreviation "Dr." might mean "Doctor" or "Drive", and its correct interpretation depends on context, which can be complicated to determine.

Therefore it is always better to spell out phrases like abbreviations, digit strings, or dollar amounts in your grammar explicitly rather than rely on the SR engine to guess the appropriate phrase for them.
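
One way to follow this advice for digit strings is to generate the spelled-out phrase before you write it into the grammar. The following Python sketch spells a string digit by digit. The helper name spell_digits is hypothetical, and reading zero as "oh" is a choice made here to match the digit-by-digit forms in the list of examples for this tip.

```python
DIGIT_WORDS = {
    "0": "oh", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def spell_digits(digit_string):
    """Spell a string of ASCII digits one digit at a time."""
    if not digit_string or any(ch not in DIGIT_WORDS for ch in digit_string):
        raise ValueError("expected a non-empty string of digits 0-9")
    return " ".join(DIGIT_WORDS[ch] for ch in digit_string)


print(spell_digits("123"))    # one two three
print(spell_digits("12340"))  # one two three four oh
```

If callers are likely to say a number in more than one way, each spelled-out alternative still has to be listed in the grammar explicitly.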

Below is a list of examples that show how the SR engine will normalize phrases for US English in Microsoft Speech Server 2004. Some examples have multiple normalized forms and in this case all are used as valid phrases in the grammar. This list is not exhaustive, but is meant to cover the most frequent and/or interesting cases:

Numbers

Less than 1,000

Example        Normalized Form
925            nine hundred twenty five
925            nine hundred and twenty five

1000 to 9999 (because they could be interpreted as years)

Example        Normalized Form
1905           nineteen oh five
2002           two thousand two
1500           fifteen hundred

10,000 to 4,000,000,000

Example        Normalized Form
12340          one two three four oh
12,340         twelve thousand three hundred forty
12,340         twelve thousand three hundred and forty

Above 4,000,000,000

Example        Normalized Form
12345678910    one two three four five six seven eight nine one oh

Decimal

Example        Normalized Form
92.5           nine two point five
92.50          nine two point five oh
12345.6        one two three four five point six
12,345.6       twelve thousand three hundred forty five point six

Dollar Amounts

Example        Normalized Form
$35.23         thirty five dollars and twenty three cents
$1             one dollar
$0.50          fifty cents
$45,000        forty five thousand dollars
$45000         dollar four five oh oh oh

Abbreviations
These are case insensitive.

Example        Normalized Form
assoc          association
bldg           building
ch             chapter
cont           continued
cont           cont
corp           corporation
corp           corp
etc            etcetera
intl           international
jr             junior
mr             Mister
mrs            Missus
miss           Miss
mt             mountain
oz             ounce
oz             oz
pres           president
pres           pres
sec            S. E. C.
sec            second
sec            seconds
sq             square
sq             S. Q.
sr             senior
sr             S. R.
vol            volume
vol            vol

Symbols

Example        Normalized Form
!              exclamation-point
"              quote
#              pound-sign
$              dollar
%              percent
&              ampersand
'              quote
(              paren
)              close-paren
*              asterisk
+              plus
,              comma
--             double-dash
-              hyphen
...            ellipsis
.              dot, period
/              slash
:              colon
;              semicolon
<              less-than
=              equals
>              greater-than
?              question-mark
@              at-sign
[              bracket
\              back-slash
]              close-bracket
^              circumflex
_              underscore
`              back-quote
{              left-brace
|              vertical-bar
}              right-brace
~              tilde

Q. How can I get my JScript files to support multiple character sets?
A.

In multilanguage projects, JScript files from any editor must be saved as Unicode (UTF-8 with signature) - Codepage 65001. In particular, when saving JScript files in Visual Studio .NET 2003, this selection must be made every time the file is saved, or the setting will be incorrect. If this is not done, one possible result is that extended characters are incorrectly stripped from strings.

Visual Studio provides a setting that makes this the default setting whenever JScript files are saved. See the following procedure for details.

To set JScript file encoding in Visual Studio .NET 2003:

1. In Visual Studio .NET 2003, on the File menu select Advanced Save Options.
2. In the Advanced Save Options dialog box, in the Encoding list select Unicode (UTF-8 with signature) - Codepage 65001.
3. Click OK.
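
Outside Visual Studio, you can check whether a file was saved with the signature by looking at its first three bytes. A UTF-8 signature is the byte sequence EF BB BF. The following Python sketch checks for it and, if needed, rewrites the file with the signature. The helper names are hypothetical, and the sketch assumes that the file is already valid UTF-8.

```python
import codecs


def has_utf8_signature(path):
    """Return True if the file starts with the UTF-8 signature (EF BB BF)."""
    with open(path, "rb") as handle:
        return handle.read(3) == codecs.BOM_UTF8


def resave_with_signature(path):
    """Rewrite a UTF-8 file so that it starts with the UTF-8 signature.

    Reading with 'utf-8-sig' drops an existing signature, so the file
    never ends up with two of them.
    """
    with open(path, "r", encoding="utf-8-sig", newline="") as handle:
        text = handle.read()
    with open(path, "w", encoding="utf-8-sig", newline="") as handle:
        handle.write(text)
```

A build script could call has_utf8_signature on every .js file in a project and report the files that need to be saved again.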

Q. How Can I Record Messages Longer Than 20 Seconds?
A.

In Microsoft Speech Server 2004, use the RecordSound control to record user speech. When you use the RecordSound control, you'll find that by default, recording ends after 20 seconds. If you want to record messages longer than 20 seconds, there are three properties you can set to increase the timeout.

The EndSilence, BabbleTimeout, and MaxTimeout properties interact with each other to set the recording timeout. The default values for these properties are listed in the following table.

Property          Default Value
EndSilence        1000 milliseconds
BabbleTimeout     20000 milliseconds
MaxTimeout        120000 milliseconds

The three properties interact in the following ways:

The EndSilence property sets the maximum length of any silent period after the time when the user starts speaking. Use EndSilence to determine the end of user speech.

The BabbleTimeout property sets the maximum time for recording the user's speech, beginning at the point that speech is detected.

The MaxTimeout property sets the maximum total time that can be recorded and must be equal to or greater than the sum of EndSilence and BabbleTimeout.

For example, assuming that the EndSilence and MaxTimeout properties are at their defaults, to record a message up to 30 seconds the only change needed is to set the BabbleTimeout property to 30000.
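
The rule that MaxTimeout must cover EndSilence plus BabbleTimeout can be expressed as a small calculation. The following Python sketch works out the three values for a desired message length, starting from the defaults listed above. The helper name timeouts_for is hypothetical, and the sketch models only the arithmetic, not the RecordSound control itself.

```python
def timeouts_for(max_message_ms, end_silence_ms=1000, max_timeout_ms=120000):
    """Return property values that allow messages up to max_message_ms.

    BabbleTimeout is set to the desired message length. MaxTimeout is
    raised only if it would otherwise be less than the sum of
    EndSilence and BabbleTimeout.
    """
    required_total = end_silence_ms + max_message_ms
    return {
        "EndSilence": end_silence_ms,
        "BabbleTimeout": max_message_ms,
        "MaxTimeout": max(max_timeout_ms, required_total),
    }


# 30-second messages: only BabbleTimeout changes.
print(timeouts_for(30000))
# {'EndSilence': 1000, 'BabbleTimeout': 30000, 'MaxTimeout': 120000}

# 3-minute messages: MaxTimeout must also be raised.
print(timeouts_for(180000))
# {'EndSilence': 1000, 'BabbleTimeout': 180000, 'MaxTimeout': 181000}
```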

When the value of any of these properties is exceeded, recording ends and a file of the type specified by the Type property is written to the folder specified by the SavePath property. If the values of the BabbleTimeout or MaxTimeout properties are exceeded, the recording is only written to the file if the SavePartialRecording property is set to True.

Q. How Do I Send a Fax Using an ASP.NET Speech-Enabled Web Application?
A.

Sending a fax using an ASP.NET speech-enabled Web application is as easy as sending a fax using a non-speech-enabled ASP.NET Web application. This article briefly discusses the major tasks required to create a speech-enabled fax-back application, provides fax service implementation details, and points out several security issues.

Task Overview
In this scenario, we would like a customer to be able to call our fax-back service, listen to a list of document titles, choose a document, and receive a faxed copy of the document at a fax number that the customer provides. In order to do this, the application must be able to perform the following tasks:

1. Load the document titles, present the list, and get the customer's selection.
2. Elicit and confirm the fax number to which the customer wants the document sent.
3. Fax the requested document.

Using the Microsoft Speech Application SDK (SASDK), you can easily accomplish the first two tasks. The SASDK includes Application Speech Controls that are well suited for these tasks.

Use the DataTableNavigator Speech Control to accomplish the first task. For a simple implementation, read the document titles directly from an XML file into a DataSet, and then bind the DataSet to the DataTableNavigator control. For a more sophisticated implementation, select the document information from a database table, fill a DataSet, and then bind the DataSet to the DataTableNavigator control. Alternatively, you can construct the document by using pieces of information from various sources.

For the second task, use the Phone Speech Control to get and confirm the customer's fax number. Assuming that the actual faxing is performed in a server-side event handler such as Page_Unload, the responses provided by the customer (such as selected document title and fax number) must be posted back to the server. The easiest way to do this is to enable AutoPostBack in the speech SemanticMap properties for each of the semantic items corresponding to the customer's responses (for example, document title, area code, and local number).

Implementing the Fax Service
Faxing information is the most challenging task because the .NET Framework does not provide fax services. To work around this problem, use the Fax Service Extended COM API with early binding via the .NET COM Interop. Although you can use .NET reflection and late binding to access these COM objects, early binding yields better performance and makes the job of writing the code easier as well.

In order to use the Fax Service Extended COM API, you must first use Windows Setup to install the Fax Service component on the host computer. Once installed, for computers running Microsoft Windows XP and Windows Server 2003, the Fax Service is provided by the file fxscomex.dll, which is usually found in the Windows\System32 directory. If you are creating your page with Microsoft Visual Studio .NET 2003, add a reference to this DLL in your project so that Visual Studio imports the DLL's COM objects as .NET classes. If you are using a program other than Visual Studio, create the import library with the TLBIMP utility. In either case, make sure that the import library is in the bin subdirectory of your fax-back application's Web host virtual directory so that the application can automatically compile it into the assembly when it is first accessed.

The Fax API requires a physical file path to the document file that is to be faxed. If your documents are stored in a subdirectory of the Web host virtual directory, you can map document files to a physical file path using the MapPath() function that your fax-back application inherits from the Page class as follows:

string ProsewareDataPath = this.MapPath(".");

The following code illustrates how to fax the document file. In this example, the fax-back application uses a dialing prefix to place a call outside of the fictitious company Proseware, and uses a remote fax server to send the fax.

FaxDocument objFaxDoc = new FaxDocumentClass();
FaxServer objFaxServer = new FaxServerClass();

objFaxDoc.Body = String.Format(@"{0}\{1}", ProsewareDataPath, SelectedDocFile);
objFaxDoc.DocumentName = "Proseware FaxBack Document";
objFaxDoc.Priority = fxscomexassembly.FAX_PRIORITY_TYPE_ENUM.fptNORMAL;              
   // 0 == low, 1 == normal, 2 == high

string dialoutPrefix = "9";	
string faxRecipientNumber = String.Format("{0}{1}",
	dialoutPrefix,
	siFaxNumberLocalDigits.Text);

objFaxDoc.Recipients.Add(faxRecipientNumber, "Proseware Customer");
   // Adds the fax phone number and the name of addressee.

objFaxDoc.ReceiptType = fxscomexassembly.FAX_RECEIPT_TYPE_ENUM.frtNONE;         
   // 0 == no receipt, 1 == e-mail, 4 == message box

objFaxDoc.CoverPageType = fxscomexassembly.FAX_COVERPAGE_TYPE_ENUM.fcptLOCAL;         
   // 0 = no cover page, 1 = local cover page, 2 = server cover page

objFaxDoc.CoverPage = String.Format(@"{0}\Proseware.COV", ProsewareDataPath);       
   // The path to the cover page file. See MS Fax Server Cover Page editor.

objFaxDoc.Note = "Here is the document you requested.";    
   // The text of the note printed on the cover page.

objFaxDoc.ScheduleType = fxscomexassembly.FAX_SCHEDULE_TYPE_ENUM.fstNOW;        
   // 0 == "now" (as soon as possible), 1 = scheduled time, 
   // 2 = discounted period. See FaxOutgoingQueue.DiscountRateStart, etc.

objFaxDoc.Subject = String.Format("The document you requested: \"{0}\"", siSelectedDocName.Text);

   // All of the following lines set sender information:
objFaxDoc.Sender.Title = "Mr.";
objFaxDoc.Sender.Name = "Great Docs Fax Robot";
objFaxDoc.Sender.City = "Redmond";
objFaxDoc.Sender.State = "WA";
objFaxDoc.Sender.Company = "Proseware, Inc.";
objFaxDoc.Sender.Country = "USA";
objFaxDoc.Sender.Email = "FaxBackRobot@proseware.com";
objFaxDoc.Sender.FaxNumber = "11234567890";
objFaxDoc.Sender.HomePhone = "10987654321";
objFaxDoc.Sender.OfficeLocation = "Redmond";
objFaxDoc.Sender.OfficePhone = "12223334444";
objFaxDoc.Sender.StreetAddress = "Great Documents Library\nRedmond, WA 98052";
objFaxDoc.Sender.TSID = "ProsewareFAX";
objFaxDoc.Sender.ZipCode = "98052";
objFaxDoc.Sender.BillingCode = "NCC1701C";
objFaxDoc.Sender.Department = "Library Fax Support";

objFaxDoc.Sender.SaveDefaultSender();  
   // This saves the sender information for reuse if you want to send 
   // the document to multiple recipients using the same sender information. 

objFaxServer.Connect(@"REMOTEFAXSERVER01");
objFaxServer.Connect(@"REMOTEFAXSERVER01");
   // Connects to the fax server. See the second note following this code
   // sample for an explanation of why this method is called twice.

objFaxDoc.ConnectedSubmit(objFaxServer);
objFaxServer.Disconnect();

Note: Only computers running Windows Server 2003 can accept fax requests from remote client computers. If the fax server is running on a computer that is running Windows Server 2003, remote fax client computers cannot access the fax server through the Fax Services Extended COM API until you:

1. Share the "fax printer" on the computer running Windows Server 2003.

2. Add the "fax printer" to the remote fax client computer.

Note: A known bug in the FaxServer.Connect(FaxServerName) method causes the method to always create a connection to the local fax server, even if a remote fax server is specified in the parameter, and even though the call appears to complete normally and the FaxServer.ServerName property returns the name of the remote fax server. If the computer on which the fax-back application is running is not running a fax server (in other words, if there is no local fax server), the subsequent FaxDocument.ConnectedSubmit() call fails and throws an exception. To work around this problem, connect to a remote fax server by calling the FaxServer.Connect(FaxServerName) method twice, as illustrated in the previous code sample.

Fax-back Application Security Issues
As with any application, and particularly with an ASP.NET application, pay special attention to security issues. First, because the host page is executing server-side code (in its own account and security context) on behalf of an untrusted client, be careful not to expose sensitive information such as physical file paths and computer names to the client side.

Second, be aware that Fax Service faxes a document file by "printing" it as a temporary TIFF image using a Windows application that is:

Installed on the host computer.

Associated with the TIFF file type.

Set so that the ASP.NET process has rights to use it.

If the Windows application associated with the TIFF file type is compromised or replaced, "printing a fax" could compromise the host computer. For this reason, always ensure that the Windows TIFF application is properly protected and that the ASP.NET host process runs in a minimal security context.

Third, if you use a fax server computer that is independent of the ASP.NET host computer (the remote fax client computer), the remote fax client computer requires security rights to access the fax server, but not necessarily rights to anything else on the fax server computer. Always apply the Rule of Least Privilege: Grant only the minimum access needed to get your job done.

Be aware that by default, ASP.NET Web pages run in an application pool under the "Network Service" identity. The access permissions for this account are not sufficient to establish a connection to a remote fax server. Therefore, if you are developing a fax-back application that utilizes a remote fax server, you must do the following:

1. Create an application pool that runs in an account that is a member of the Users group (or a higher-privilege group if required).

2. Add the account (to which you assigned the application pool) to the IIS_WPG group. The IIS_WPG group is the group of accounts that can run the IIS worker process required for a remote fax server connection.

Q.How do I use Speech Application Error pages?
A.

To gracefully respond to unexpected errors in voice-only applications on Microsoft® Speech Server 2004 (MSS), use custom error pages. MSS uses two types of error pages: application and system error pages. When an unexpected Speech Control error occurs, the application error page runs. If a more serious error occurs, the system error page runs. This tip provides information about how to create and specify these two types of error pages.

Using the Application Error Page
The application error page might be a custom error page or the default error page. The default application error page, DefaultErrorPage.aspx, is installed by the Microsoft Speech Application SDK (SASDK) at \Inetpub\wwwroot\aspnet_speech\<build number>\client_script. The default application error page plays one of several text-to-speech (TTS) prompts, depending on the nature of the error.

If the quality of TTS messages is acceptable but different message text is needed, rename and edit the default application error page. It is best not to edit the default page itself because it is a resource used by all speech applications on the Web server. If the quality of TTS messages is not adequate, replace the default page and its TTS prompts with a custom application error page containing QA controls that play recorded messages from the prompt database.

Specifying a Custom Application Error Page
To specify a custom application error page, use the appSettings tag in the Web.config file, as shown in the following example.

<configuration>
    <appSettings>
        <add key="errorpage" value="ErrorPage.aspx" />
    </appSettings>
</configuration>

Note: By default, the Web.config file is located in the application's root folder and is visible in the Solution Explorer window.

Using the System Error Page
Provide a system error page to prevent calls from disconnecting without warning in the event of an unexpected error that the platform cannot recover from, including the following:

Web server HTTP errors, such as 404 and 500 errors

Application page errors, such as a failure to create the Document Object Model or a failure to compile inline JScript® code

JScript run-time errors

Problems with SES requests, such as SES becoming unavailable during a call

Specifying a System Error Page
There are two ways to specify the system error page: using the Microsoft Management Console (MMC) snap-in for MSS and using the error page meta tag.

To specify the system error page in the MMC, use the Global Error Page URL setting. For more information, see Adding a Speech Application in the MSS Help file, MSS.chm.

To specify the system error page on an .aspx page, use a meta tag as in the following example.

<meta http-equiv="error-page" content="http://MyServer/MyApplication/SystemErrorPage.html"/>

The error page setting, whether made in the MMC or using the meta tag, specifies the error page that is stored in the cache. Only one system error page per SALT interpreter is stored in the cache. The setting persists until it is overridden. If the page is specified in the MMC, that setting lasts until a meta tag on an application page overrides it. The new setting persists until a different setting is encountered in a subsequent navigation.
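The override behavior described above can be pictured with a small toy model. This is only a sketch of the rule as stated in the text, not MSS code: the function name, the `errorPageMeta` field, and the URLs are all hypothetical.

```javascript
// Toy model: which system error page is cached after each page navigation.
// `mmcUrl` stands for the Global Error Page URL set in the MMC.
// Each entry in `pages` has a hypothetical `errorPageMeta` field holding the
// URL from its error-page meta tag, or null when the page has no such tag.
function cachedErrorPageAfterEachPage(mmcUrl, pages) {
  var current = mmcUrl;
  var result = [];
  for (var i = 0; i < pages.length; i++) {
    if (pages[i].errorPageMeta) {
      current = pages[i].errorPageMeta; // a meta tag overrides the cached setting
    }
    result.push(current); // otherwise the previous setting persists
  }
  return result;
}
```

In this model, a page without a meta tag does not restore the MMC setting; the most recent override simply stays in effect until another meta tag replaces it.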

Creating a System Error Page
The system error page should be designed so that it does not rely on external services such as Speech Engine Services (SES) and the Web server. To do this, provide a .wav file containing the error message. To play the .wav file, use SALT elements in the .html file specified in the application start page meta tag described previously. See the following example.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html xmlns:SALT="http://www.saltforum.org/2002/SALT">
<head>
<meta name="GENERATOR" content="Microsoft Visual Studio .NET 7.1">
<meta name=ProgId content=VisualStudio.HTML>
<meta name=Originator content="Microsoft Visual Studio .NET 7.1">

<object id="Speechtags" CLASSID="clsid:DCF68E5B-84A1-4047-98A4-0A72276D19CC" VIEWASTEXT></object>
<?import namespace="salt" implementation="#Speechtags" ?>
<SALT:prompt id="SystemErrorPrompt">
  <SALT:content id="PromptContent" href="http://myServer/SystemError.wav" />
</SALT:prompt>

<script language=jscript>
function fnOnLoad()
{
  SystemErrorPrompt.Start();
}
</script>

</head>
<body onload="fnOnLoad()">
</body>
</html>

The .wav file must be 8-kHz mono a-law or mu-law compressed audio depending on the telephony standard of the locale where it is used. For better quality audio, use 8-kHz 16-bit PCM. Store the system error page on the Web server.
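Before deploying the error message file, it can help to confirm its format. The following is a minimal sketch, not an SASDK tool: it assumes the simplest .wav layout, in which the "fmt " chunk immediately follows the RIFF header, and it takes the file contents as a plain array of byte values. The function names are illustrative. The format tags used (1 for PCM, 6 for a-law, 7 for mu-law) are the standard WAVE format codes.

```javascript
// Read an unsigned little-endian integer of `size` bytes from an array of byte values.
function readLE(bytes, offset, size) {
  var value = 0;
  for (var i = size - 1; i >= 0; i--) {
    value = value * 256 + bytes[offset + i];
  }
  return value;
}

// Read `size` bytes as an ASCII string.
function readAscii(bytes, offset, size) {
  var s = "";
  for (var i = 0; i < size; i++) {
    s += String.fromCharCode(bytes[offset + i]);
  }
  return s;
}

// Describe the audio format, or return null if the header does not have
// the simple layout this sketch assumes.
function describeWav(bytes) {
  if (readAscii(bytes, 0, 4) !== "RIFF" ||
      readAscii(bytes, 8, 4) !== "WAVE" ||
      readAscii(bytes, 12, 4) !== "fmt ") {
    return null;
  }
  var tags = { 1: "PCM", 6: "a-law", 7: "mu-law" };
  var tag = readLE(bytes, 20, 2);
  return {
    encoding: tags[tag] || "other",
    channels: readLE(bytes, 22, 2),
    sampleRate: readLE(bytes, 24, 4),
    bitsPerSample: readLE(bytes, 34, 2)
  };
}

// True when the file is 8-kHz mono and either 8-bit a-law/mu-law or 16-bit PCM.
function isTelephonyWav(info) {
  if (!info || info.channels !== 1 || info.sampleRate !== 8000) {
    return false;
  }
  if (info.encoding === "a-law" || info.encoding === "mu-law") {
    return info.bitsPerSample === 8;
  }
  return info.encoding === "PCM" && info.bitsPerSample === 16;
}
```

Files that contain extra chunks before "fmt " are reported as unrecognized by this sketch rather than parsed.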

Conclusion
Error pages that play .wav files are a good alternative to the default SASDK error page or to calls disconnecting without warning. Use the described techniques to specify and create application and system error pages. Remember that system and application error pages have different scopes, respond to different types of errors, and are specified separately.

Q.How do I reduce false barge-in issues caused by prompt echo?
A.

Barge-in is an important feature of Microsoft® Speech Server 2004 (MSS) that allows the caller to interrupt a prompt. One of the main benefits of barge-in is that prompts can be designed so that novice users get sufficient guidance, while repeat users can quickly move through the application. However, using barge-in can sometimes be problematic. One of the main sources of barge-in problems is the presence of prompt echo. It can cause prompt playback to suddenly stop without the caller's intervention (false barge-in) and usually results in an incorrect recognition by the system.

This article discusses the causes of prompt echo, how to verify prompt echo by using log analysis tools, the steps you can take to reduce prompt echo, and finally, issues that should be considered when disabling barge-in.

Causes of prompt echo
Prompt echo is caused by one of two conditions:

An impedance mismatch at an analog connection point, which leads to a partial reflection of the prompt signal. This is typically created in one of two places:

Between an analog handset used by the caller and the Central Office,

Between an analog telephony card and the Private Branch Exchange (PBX).

Acoustic echo between the telephone speaker and the telephone microphone. In this case, the microphone hears the prompt being played by the speaker. This typically occurs when using speaker phones. Speaker phones may employ echo-elimination techniques such as half-duplex (in which speaker and microphone take turns and are not turned on at the same time) or acoustic echo cancellation.

False barge-in caused by an analog handset occurs only on some calls, because only some analog handsets produce an echo loud enough to cause it. Also, it is generally only an issue for local calls because telephony service providers are required to provide network echo cancellation on long-distance telephone calls, which reduces the echo enough to avoid false barge-in. The length of the delay of the prompt echo increases with the distance of the echo source. Therefore, the prompt echo delay is often greater than what the telephony card echo canceller (available on both analog and digital telephony cards) can effectively eliminate.

In contrast, echo caused by the connection of an analog telephony card and the PBX typically results in consistent barge-in regardless of the type of phone that is used to call the system. Some amount of echo caused by the connection is unavoidable. Therefore, analog telephony cards are equipped with an echo-canceling feature that can usually significantly reduce the echo caused by the analog connection to the PBX. However, if there is a significant impedance mismatch between the telephony card and the PBX, the echo may be so strong that it cannot be sufficiently removed. In this case, contact your Intel representative to get assistance. For more information, see the Intel telecom support resource document titled Alternate Impedance and Gain settings for the DMV160LP and D/41JCT Boards.

Verifying the presence of echo by using the log analysis tools
You can easily verify the presence of significant echo by logging the recognition audio and playing it back.

You can turn on recognition audio logging by opening a command window on the computer running MSS, and changing the directory to the following folder:

%programfiles%\Microsoft Speech Server\Administrative Tools\Scripts

and running the MSSLogConfig.vbs file, using the following command:

cscript MSSLogConfig.vbs /activate /filter:RecognitionAudio

To play back the audio data, use the Microsoft Log Analysis Tools for Speech Applications, which are provided as a redistributable installer in the Microsoft Speech Application SDK Version 1.1 (SASDK). After a few calls have been received, extract the files containing the logged audio by using the MSSContentExtract log analysis tool. The audio files can be played in most standard audio players. If significant prompt echo is present, it is audible in the logged audio files.

For more information on how to set up logging and use the MSSContentExtract tool, see "Log Analysis Tools" in SASDK Help. The Help file also describes how to set up additional logging and how to use the CallViewer tool, which enables you to analyze the events logged by MSS.

Reducing echo
If echo caused by an analog handset is creating problems with your speech application, there are a few things you can do to reduce prompt echo:

Reduce the prompt volume to as low a level as possible, since this in turn lowers the volume of the prompt echo.

Use a toll-free number, since telephony service providers generally use network echo cancellation on all calls to such a number. Check with your telephony service provider to verify that it uses echo-cancellation with toll-free numbers.

Use an external echo canceller to reduce echo. An external echo canceller is a hardware device that can be inserted between the PBX and the telephony card. Since it is mainly used by network providers to provide network echo cancellation, it is only available for T1 connections.

Disabling barge-in
In most cases, the procedures outlined above enable you to avoid false barge-ins caused by prompt echo. However, sometimes it is necessary to disable barge-in because it is important to guarantee that the caller hears the entire prompt. This requires special care to avoid recognition problems that can occur when a caller speaks too quickly right after a prompt has been played.

When barge-in is disabled, the system does not begin to listen to the caller until it finishes playing the prompt. If the caller begins speaking before the completion of the prompt, the system misses the beginning of the caller's utterance. Additionally, if the prompt ends with a silence, the silence may make the caller think the system is ready for a response when it is not. This timing is critical because callers often speak immediately after they think the prompt is finished. Responses whose beginning is cut off cause misrecognitions, and completely missed responses make the system seem unresponsive. Both of these errors can occur without any reason that is obvious to the caller.

One method of ensuring proper timing is to set the beep property on prompt elements. This will cause the telephony card to play a beep right before it starts listening to the caller. This turn-taking cue is quickly picked up by users and they adapt their behavior to speak only after hearing the beep. Since this beep is generated by the telephony card, it ensures that the system is listening right after the beep, therefore eliminating any timing issues between the end of the prompt and the start of the listen.

Conclusion
Barge-in is an extremely useful feature that you can use to accommodate both novice and experienced callers. Being able to recognize, and possibly mitigate, the types of false barge-in issues that prompt echo can introduce into your applications gives you an advantage in developing more effective speech solutions.

Q.How do I export a .wav File From a Prompt Database?
A.

It is well known that you can use the Speech Prompt Editor in Microsoft Speech Application SDK Version 1 to import .wav files to a prompt database. Many users don't realize that it's also possible to export .wav files by using the Wave Editor, which is a tool included with the Speech Prompt Editor.

To export a .wav file from a prompt database

In the Speech Prompt Editor, double-click a .wav icon in the Transcription pane. The Wave Editor opens, displaying wave boundaries and tuning alignments for the selected .wav file.

In the Wave Editor, on the File menu, click Save Prompts.promptdb As.

In the Choose File Format dialog box, select a recording format and sampling frequency, and then click OK.

In the Save Copy As dialog box, select a save location, type a file name, and then click Save.

Q.How Can I Record Prompts for Application Speech Controls?
A.

You can spend significant time and money getting the prompts for a speech application recorded by professional voice talent. Then, if you add Application Speech Controls to the project, you find that the default prompts for those controls play as text-to-speech (TTS). As a result, application user experience is inconsistent, with users hearing a mixture of both professionally recorded prompts and TTS prompts.

Application Speech Controls are a valuable tool for speech application development. They make it easy to add frequently used functionality, such as collecting phone numbers and dates, to an application. The challenge in this case is to get Application Speech Control prompts to speak in the same voice as the rest of the application.

Speech Prompt Editor offers an easy and convenient, but little-known solution: import the transcriptions from the Application Speech Control, add them to the prompt database, and then record the transcriptions using the same voice talent you use to record the rest of the application prompts.

To import transcriptions from an Application Speech Control

1. In Solution Explorer, open a prompt project, and then double-click a prompt database.

2. In Speech Prompt Editor, select Add New Item on the File menu.

3. In the Add New Item – <prompt project name> dialog box, select Prompt Project Items in the left pane.

4. In the right pane, select the template for the appropriate Application Speech Control.

5. Click Open.

Conclusion
Application Speech Controls offer valuable functionality to speech application developers. To make these controls fit seamlessly into an application that uses recorded prompts, it is important to record the prompts in the Application Speech Control using the same voice as the other prompts. To do this, import the Application Speech Control prompt transcriptions into the prompt database, where they can then be edited and recorded in common with the application's other prompts.

Q.How do I use the Speech Application SDK Log Player to Make Testing and Debugging Easier?
A.

Log Player allows you to record and play back application debug sessions. Log Player is installed, by default, as part of the Microsoft Speech Application SDK in the \Program Files\Microsoft Speech Application SDK 1.0 folder.

Use Log Player as a shortcut to reach interesting points in a dialogue
Some parts of dialogues that need attention during debugging or testing might be difficult to reach, either because of the time needed to step through the dialogue or because of the need to provide specific inputs. Use Log Player to create a shortcut to a selected point in a dialogue or to reproduce a specific set of conditions for a test case. Run the application once to get to the desired state or location. Close Speech Debugging Console, and the log file is saved. Replay the log file to continue debugging manually at that point.

Use Log Player for regression testing
Use Log Player to detect changes to an application. When portions of a dialogue are complete and should not change, record a log file. Then, replay the log file periodically. If the application has changed, Log Player returns an error or a warning.

Use caution when recording dialogues that are still in development. If those dialogues change, the log files that contain them break and require re-recording, which can be time-consuming if numerous log files are affected.

Saving and Replaying Log Files
For information about saving and replaying log files, see the following two procedures.

To save a debug session log

1. In Visual Studio .NET 2003, click Options on the Tools menu.

2. In the left pane, select Speech Application SDK, and then select Speech Debugging Console.

3. Under Logging, select Record Log Files.

4. Click Browse, and then select a folder.

5. Click Open.

When recording, log files are opened when the application is started in debug mode and closed when debugging stops. File names are time stamps and 14 digits in length. To prevent confusion, it is a good idea to give log files a descriptive name as soon as possible. Do not forget to clear the Record Log Files check box, unless you want to continue recording log files. Editing the log files is not recommended; when a dialogue changes, re-record the log file.
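Choosing a descriptive name is easier when the time stamp is kept as part of it. The following sketch assumes the yyyymmddhhmmss.xml naming format used by Log Player; the function names and the `label` argument are hypothetical, and the sketch only builds the new name as a string without touching the file system.

```javascript
// Split a Log Player file name such as "20030507093859.xml" into date and time parts.
// Returns null when the name does not match the 14-digit time stamp format.
function parseLogFileName(fileName) {
  var m = /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})\.xml$/i.exec(fileName);
  if (!m) {
    return null;
  }
  return { year: m[1], month: m[2], day: m[3], hour: m[4], minute: m[5], second: m[6] };
}

// Build a descriptive name that keeps the time stamp readable.
// `label` is a tag chosen by the tester, for example the dialogue being tested.
function describeLogFile(fileName, label) {
  var p = parseLogFileName(fileName);
  if (!p) {
    return fileName; // leave names that are already descriptive unchanged
  }
  return label + "_" + p.year + "-" + p.month + "-" + p.day +
         "_" + p.hour + p.minute + p.second + ".xml";
}
```

For example, `describeLogFile("20030507093859.xml", "PinEntry")` yields `PinEntry_2003-05-07_093859.xml`.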

To replay a debug session log

1. Click Start, select All Programs, select Microsoft Speech Application SDK Version 1.0, select Debugging Tools, and then select Speech Debugging Console Log Player.

2. In Log Player, click Browse.

3. In the Open A Log File box, select a log file, and then click Open.

4. On the toolbar, click Start Replay.

Note: When trying to identify which log file you want to open, bear in mind that files are named using the format yyyymmddhhmmss.xml.

When replay starts, Speech Debugging Console opens and programmatic output text begins to stream into the Output pane. If the log file ends part way through the application, Speech Debugging Console remains open and ready for you to continue debugging manually at that point.

Simple and Strict Modes
Use Simple and Strict modes to choose how sensitive Log Player is to differences between the log file and the application.

If you want to know about even the slightest change, use Strict mode.

If you only want to know about changes that noticeably affect the user (such as changes in a dialogue, prompt, or SML), use Simple mode.

Replaying Multiple Log Files
You can also replay log files in batch mode. Create a batch file listing the log files to be played, and then run Log Player from the command line as described in the following procedure. The batch file lists the log files to be played and specifies a file to contain the batch replay results. Use the following sample and save the text in a file with an .xml file name extension.

<BatchReplay>
  <ResultsFilePath>BatchResults.xml</ResultsFilePath> 
  <Replay Mode="Strict">
    <LogFilePath>c:\SpeechLogs\20030507093859.xml</LogFilePath>
  </Replay>
  <Replay Mode="Strict">
    <LogFilePath>c:\SpeechLogs\20030507095458.xml</LogFilePath>
  </Replay>
</BatchReplay>

To run the batch
Open a command prompt window. Browse to the folder containing LogPlayer.exe. By default, this is \Program Files\Microsoft Speech Application SDK 1.0\SDKTools. Type the following at the command prompt.

LogPlayer myBatchFilePath\myBatchFileName

The log files run in the sequence specified in the batch file. See the results file after the batch finishes. Subsequent replays overwrite the results file unless you change the file name in the batch file.
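When many log files are involved, the batch file can be generated instead of typed by hand. The following sketch builds text in the same shape as the sample batch file above. The function names are illustrative, and the only Mode value confirmed by that sample is "Strict"; passing "Simple" is an assumption based on the mode names.

```javascript
// Escape the characters that are not allowed in XML element text.
function escapeXml(text) {
  return String(text).replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Build the batch file text for the given results file, log files, and replay mode.
// `mode` is written as given; "Strict" matches the sample, "Simple" is assumed.
function buildBatchReplay(resultsFilePath, logFilePaths, mode) {
  var lines = [
    "<BatchReplay>",
    "  <ResultsFilePath>" + escapeXml(resultsFilePath) + "</ResultsFilePath>"
  ];
  for (var i = 0; i < logFilePaths.length; i++) {
    lines.push('  <Replay Mode="' + mode + '">');
    lines.push("    <LogFilePath>" + escapeXml(logFilePaths[i]) + "</LogFilePath>");
    lines.push("  </Replay>");
  }
  lines.push("</BatchReplay>");
  return lines.join("\n");
}
```

Using a different results file name for each generated batch avoids overwriting earlier results.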

Conclusion
Using Log Player to record application dialogues helps with automating debugging and testing tasks on the computer running the Speech Application SDK.

Q.How do I troubleshoot DTMF issues?
A.

An overview of DTMF processing
Microsoft Speech Server 2004 (MSS) is designed to recognize both speech and dual tone multi-frequency (DTMF) input. Although speech recognition is often a more attractive option, DTMF recognition is useful in cases such as keeping a PIN private or recognizing a credit card in a noisy environment. As an application runs, the DTMF inputs that the caller enters are stored in an internal DTMF buffer. This buffer is used by MSS to keep track of DTMF key presses and is beneficial for experienced callers who want to type ahead of the prompts. When a speech control with a DTMF grammar is activated, the contents of the DTMF buffer begin to be collected and compared with the DTMF grammar. Collection continues until it can be determined whether or not a match with the grammar exists, and then the collection is cleared. At this point, the remainder of the DTMF buffer is then ready for the next speech control activation, input collection, and DTMF grammar comparison.
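The buffer and collection behavior described above can be illustrated with a toy model. This is not how MSS is implemented: the `grammar` object is a hypothetical stand-in for a DTMF grammar (a fixed number of keys, each taken from an allowed set), and the model assumes that a key that fails the grammar is consumed along with the rest of the collection.

```javascript
// Toy model of one speech control activation reading from the DTMF buffer.
// `buffer` is an array of key characters already pressed by the caller.
// `grammar` is hypothetical: { length: number of keys, allowed: string of valid keys }.
function collectFromBuffer(buffer, grammar) {
  var collection = [];
  var remaining = buffer.slice(0); // copy so the caller's array is untouched
  var status = "waiting for more keys";
  while (remaining.length > 0) {
    var key = remaining.shift();
    collection.push(key);
    if (grammar.allowed.indexOf(key) === -1) {
      status = "rejected"; // key is outside the grammar
      break;
    }
    if (collection.length === grammar.length) {
      status = "accepted"; // grammar fully matched
      break;
    }
  }
  // The collection is cleared after the decision; `remaining` is what the
  // next activated control would see.
  return { status: status, collected: collection.join(""), remaining: remaining };
}
```

In this model, a caller who types ahead "123456" satisfies a four-digit grammar with "1234", and "56" stays in the buffer for the next control.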

Managing the RecordSound control
The DTMF buffer works in the standard way for most speech controls, except for the RecordSound control. Although this control reads from the DTMF buffer if the StopOnDtmf property is set to a value other than DtmfNone, it does not actually remove any DTMF inputs. The unique behavior of the RecordSound control provides a great benefit in this case:

When the caller presses a DTMF input to stop recording, it remains in the buffer so that it can be processed to perform additional tasks by the application. For example, you might want '#' to just mean that the control should stop recording, but '*' to mean cancel recording and reactivate the RecordSound control. You could then read the value of the DTMF input from the buffer with the next speech control that is activated, and then perform the appropriate application logic depending on that value.

It is useful to keep this behavior in mind when considering how to handle application logic for successive speech controls that capture DTMF input. If no processing of the DTMF input is planned, that input still remains in the buffer and could interfere with the processing of the next DTMF grammar. For example, if the '#' key stops the recording but the DTMF grammar for the next speech control only recognizes numbers, a "No Reco: Out Of Grammar Key Press" (-13) error will occur. A common way to prevent this is to set the PreFlush property to True on the next speech control, which clears the DTMF buffer of any extra inputs before that control begins DTMF recognition.
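The effect of PreFlush on a leftover stop key can be sketched with a small toy model. It does not call any SASDK API: both functions and their return strings are hypothetical, and the next control is assumed to use a digits-only DTMF grammar.

```javascript
// Toy model: RecordSound reads the stop key but leaves it in the buffer.
function stopRecording(buffer) {
  return {
    stopKey: buffer.length > 0 ? buffer[0] : null,
    buffer: buffer.slice(0) // unchanged copy: nothing is removed
  };
}

// Toy model of the next control, which accepts digits only.
// When `preFlush` is true, the buffer is emptied before any key is examined.
function activateDigitsControl(buffer, preFlush) {
  var keys = preFlush ? [] : buffer.slice(0);
  if (keys.length === 0) {
    return "waiting for input";
  }
  return /^[0-9]$/.test(keys[0]) ? "collecting" : "no reco: out of grammar key press";
}
```

With '#' left over from the recording, the model reports an out-of-grammar key press when `preFlush` is false and simply waits for new input when it is true. Flushing also discards any keys the caller typed ahead, which is the trade-off discussed next.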

On the other hand, if it is important that callers retain the ability to type ahead, you must provide additional application logic to account for the extra DTMF input that remains in the buffer. This can be done, for example, by temporarily pausing RunSpeech, pulling out the single DTMF input in the buffer that we wish to discard, and then resuming RunSpeech. The SALT dtmf element can be used to accomplish this task with the following steps:

1. In the Speech Application SDK (SASDK), set the StopOnDtmf property of the RecordSound control to any value other than DtmfNone.

2. In HTML view, verify that the SALT namespace is declared in the html element at the top of the page:

<HTML xmlns:salt="http://www.saltforum.org/2002/SALT">

3. Now we can create SALT elements on the page. In HTML view, create a new SALT dtmf element as follows:

<salt:dtmf id="myDtmf" onreco="myDtmf_onreco()">
	<salt:grammar src="Grammars/myDtmf.grxml#Rule1" id="myDtmf_DtmfGrammar1">
	</salt:grammar>
</salt:dtmf>

4. In a script block before the myDtmf element, create the event handler for the onreco event of the myDtmf element:

function myDtmf_onreco()
{
	RunSpeech.Resume();
}	

5. Create a new grammar rule, Rule1, in a grammar named myDtmf.grxml. This grammar should be designed to accept any DTMF input. Note that this grammar is just a placeholder for collecting the DTMF input that is in the buffer; in this case, its actual value does not matter. For more information on designing grammars, see the section titled "Creating Grammars" in the SASDK Help documentation. For an example of a DTMF grammar that accepts all input, see the section titled "Dtmf Remarks" in the SASDK Help documentation.

6. In a script block before the RecordSound control, create an event handler for the OnClientDone event of the RecordSound control, for example RecordSound1_OnClientDone:

function RecordSound1_OnClientDone()
{
	RunSpeech.Pause(false);
	myDtmf.Start();
}

To summarize what we have done here, the SALT dtmf element is started when the RecordSound element is done. While the dtmf element collects the single DTMF input from the buffer, RunSpeech is paused. Once the input is collected, RunSpeech resumes, allowing the next speech control to be activated by RunSpeech. For more information on the SALT dtmf element, see the section titled "dtmf Element" in the SASDK Help documentation. For more information on client side RunSpeech functionality, see the section titled "Additional Client Scripting Elements" in the SASDK Help documentation.

Troubleshooting with the Speech Debugging Console
Once you have designed your application to take into account the DTMF buffer behavior as described above, you can perform further debugging by taking advantage of the DTMF tab on the Speech Debugging Console, a tool that comes with the SASDK. Be sure to enable the Break on DTMF Start button at the top of the window when trying this, so that application runtime will pause to allow you to view the DTMF buffer or enter test DTMF input. When paused, the Use Buffer button near the bottom of the DTMF panel will allow you to submit the keys that are already stored in the buffer and then perform any additional DTMF key presses. Additionally, there is a Collection field, which shows the grouping of DTMF inputs that are being compared with the DTMF grammar. When collection begins, there is a Status of Collection Active. When the collection is accepted or rejected, the Status changes to Collection Finished.

Conclusion
By understanding how DTMF inputs are stored in the buffer, how the buffer works in coordination with different speech controls, and how the Speech Debugging Console can help with troubleshooting DTMF issues, your DTMF applications can be expanded to allow for call flows that satisfy both the novice and advanced caller. You will be able to combine the interaction of Command, RecordSound, QA, and many more controls with both DTMF and voice recognition logic, ultimately resulting in a speech application that is more intuitive and secure.

Q.How do I build a speech-enabled application to call customers?
A.

Outbound Dialing Applications

Consider this.

You've been carefully monitoring the progress of a vintage camera for the past several days on a popular e-bidding site. There are two hours left until the auction closes and you have the top bid. You're confident that you're going to get your hands on this camera as you leave for dinner at a restaurant. A little less than two hours later you receive a call on your cell phone. An automated agent tells you that someone just outbid you with less than a minute to go, offering a bid only slightly higher than yours. You converse with the agent, direct the agent to raise your bid, and win the camera!

Speech applications in which a caller dials in to book a flight, make a stock transaction, or reach a person are very well known. Notifications via e-mail or instant messaging for important events are also very common. The happy marriage of these two types of communication, where the customer can receive a phone call triggered by events of their choosing and then engage in a natural conversation to direct a response to these events, not only opens up numerous compelling end-user scenarios but also highlights ways in which businesses can save money.

The types of applications in which outbound notifications can add value are nearly endless, such as applications that:

Remind customers of an upcoming dentist appointment and reschedule on the phone if needed

Alert bank customers about upcoming bills and provide them the capability to pay them on the phone

Notify parents of school closures

Inform store patrons of shopping opportunities specific to their interests

Call an employee and read out a set of important e-mails

Advise a manufacturer about changes in daily production schedules

Report important changing business conditions (such as changes in the stock market) to executives

Caution homeowners about potential nature hazards in their area

The Microsoft Speech Server (MSS) and the Speech Application SDK (SASDK) are uniquely positioned to provide developers with the ability to build complex outbound dialing applications, as well as deliver them on a highly robust and performant platform.

Getting Started
The easiest way to start getting acquainted with outbound dialing is to build a simple application in which you can press a button and receive an outbound call that speaks a message using text-to-speech (TTS).

In order to do this, you will need to create a new speech Web application and build the following pages:

A graphical user interface page (GUI) that has a text field for text input and a button that submits that text input and acts as the trigger for the voice user interface (VUI) application

The VUI page that contains a single prompt-only QA that takes the text and speaks it out

To find out more about how to build a simple outbound dialing application, see the "MakeCall Example" topic in the Microsoft Speech Application SDK 1.0 documentation.

The Next Level
Two of the most important components of an outbound dialing application that you will need to understand in order to build an application that can be deployed in an enterprise environment are the:

Notification generator: Every outbound application depends on the ability to generate one or more notifications. These notifications might be triggered by events as simple as a button click. For more advanced applications, Microsoft provides SQL Server Notification Services (SQL NS), a set of software components that sits on top of SQL Server, monitors changing data, and generates a notification event when a particular rule is matched (for example, if someone makes a higher bid than you).

Notification queue: Managing multiple events requires the use of a queue to contain these notifications and to ensure timely and accurate delivery, while maintaining priority sequencing and logic handling for retries (for example, if a call is placed and the receiving phone is busy or if an answering machine picks up instead of a human).
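The queue behavior described above (priority sequencing plus retry handling) can be sketched in a few lines of script. This is a minimal in-memory illustration only, not the Banking Alerts implementation; the function name createNotificationQueue, its methods, and the maxAttempts limit are all hypothetical names introduced for this example.

```javascript
// Minimal in-memory sketch of a notification queue with priority ordering
// and bounded retries. All names here are hypothetical, not SASDK APIs.
// maxAttempts is the total number of dial attempts allowed per notification.
function createNotificationQueue(maxAttempts) {
    var items = [];
    var sequence = 0; // preserves first-in, first-out order within a priority

    function add(notification, priority, attempts) {
        items.push({
            notification: notification,
            priority: priority,
            attempts: attempts,
            order: sequence++
        });
        // Higher priority first; ties fall back to insertion order.
        items.sort(function (a, b) {
            return (b.priority - a.priority) || (a.order - b.order);
        });
    }

    return {
        enqueue: function (notification, priority) {
            add(notification, priority, 0);
        },
        // Returns the next entry to dial, or null when the queue is empty.
        dequeue: function () {
            return items.length > 0 ? items.shift() : null;
        },
        // Call when the line was busy or an answering machine picked up.
        // Returns true if the entry was re-queued, false if it was dropped.
        retry: function (entry) {
            var attempts = entry.attempts + 1;
            if (attempts >= maxAttempts) {
                return false;
            }
            add(entry.notification, entry.priority, attempts);
            return true;
        },
        size: function () {
            return items.length;
        }
    };
}
```

A production queue would normally be durable (for example, a message queue or database table) so that pending notifications survive a restart; the in-memory array here only illustrates the ordering and retry rules.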

The SASDK ships with a detailed reference application called Banking Alerts. Banking Alerts allows customers to choose which of three transaction events they want to be notified about, calls them when one of the events is triggered, and engages them in a conversation to elicit a response to the event. Banking Alerts provides detailed examples of a notification generator, a notification queue, and voice user interfaces.

Get started with Banking Alerts by reading the topic "The Banking Alerts Reference Application: Overview" in the SASDK Help documentation. To find this topic, in the SASDK Help documentation table of contents expand "Speech Application SDK," expand "Learning with Microsoft Speech Application SDK," expand "SASDK Sample and Reference Applications," and then click "The Banking Alerts Reference Application: Overview."

Let us know how you get along and we look forward to providing more tips and tricks on outbound dialing applications!

Q.How can I handle increased memory demand when running multiple applications on Microsoft Speech Server?
A.

Problem Description
Deploying multiple applications on Microsoft Speech Server 2004 may increase the load on available memory due to the additional application resources that are preloaded into system memory by default.

In such cases it may make sense to create an "engine partition," that is, to dedicate one Speech Engine Services (SES) engine configuration to handling the resources for a particular application. This solution avoids the default situation, in which all application resources are preloaded into all available SES engine instances.

In this article we use the following scenario to illustrate the issue:

Two applications are deployed on Microsoft Speech Server.

The first application, Application1, has a very large grammar that needs to be preloaded. Users access this application regularly and repeatedly.

The second application, Application2, also has a very large (and different) grammar that needs to be preloaded. Users rarely access this application more than once.
For example, Application2 might be an enrollment application for users of Application1.

Application1 receives many more calls than Application2.

The grammars for both applications are large enough that if both are preloaded in all engines, the maximum total available engines serving Application1 (the high-demand application) will be lower than if a partition is used. Note: Extra memory is consumed by preloading the low-demand application grammars in all engine instances.

Specific numbers are application dependent, but in our scenario it is reasonable to expect gains similar to the following:

1.

Preloading all grammars in all engine instances: max 15 engine instances serving both applications.

2.

Partitioning: max 20 engine instances serving Application1, 4 serving Application2.

The partition enables Application1 to handle a higher volume of incoming calls without overusing system memory. The tradeoff is that Application2 handles a lower call volume.
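To see where gains of this kind come from, the arithmetic can be sketched with a deliberately simplified memory model. The figures below (memory budget, per-instance base cost, and grammar sizes, all in MB) are hypothetical and were chosen only so that the model reproduces the 15 versus 20-plus-4 numbers above; they are not measurements of Speech Engine Services.

```javascript
// Simplified memory model with hypothetical figures (in MB); real numbers
// depend on your grammars and hardware.
function maxInstancesAllPreloaded(memoryBudget, baseCost, grammarCosts) {
    // Every instance carries the base cost plus every preloaded grammar.
    var perInstance = baseCost;
    for (var i = 0; i < grammarCosts.length; i++) {
        perInstance += grammarCosts[i];
    }
    return Math.floor(memoryBudget / perInstance);
}

function maxInstancesPartitioned(memoryBudget, baseCost, primaryGrammar,
                                 secondaryGrammar, secondaryInstances) {
    // Reserve memory for the dedicated low-demand instances first.
    var reserved = secondaryInstances * (baseCost + secondaryGrammar);
    return Math.floor((memoryBudget - reserved) / (baseCost + primaryGrammar));
}

var budget = 1800, base = 10, app1Grammar = 70, app2Grammar = 40;
var shared = maxInstancesAllPreloaded(budget, base, [app1Grammar, app2Grammar]);
var dedicated = maxInstancesPartitioned(budget, base, app1Grammar, app2Grammar, 4);
console.log(shared, dedicated); // prints: 15 20
```

The saving comes entirely from not paying for the Application2 grammar in the instances that serve Application1.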

Solution
The Speech Application software development kit (SASDK) creates a manifest file, Manifest.xml, by default when a new speech Web application is created. Developers add application-specific information to the file. For our scenario, we use the following manifest files:

Application1 manifest:

<?xml version="1.0" encoding="utf-8" ?>
<manifest>
  <application name="Application1">
    <resourceset type="TelephonyRecognizer">
      <resource src="Grammars/Library.grxml" />
      <resource src="Grammars/App1LargeGrammar.grxml" />
    </resourceset>
    <resourceset type="Voice">
      <resource src="Prompts/App1Prompts.prompts" />
    </resourceset>
  </application>
</manifest>

Application2 manifest:

<?xml version="1.0" encoding="utf-8" ?>
<manifest>
  <application name="Application2">
    <resourceset type="TelephonyRecognizer">
      <resource src="Grammars/Library.grxml" /> 
      <resource src="Grammars/App2LargeGrammar.grxml" /> 
    </resourceset>
    <resourceset type="Voice">
      <resource src="Prompts/App2Prompts.prompts" />
    </resourceset>
  </application>
</manifest>

When creating the dedicated engine configuration for an application, certain values from the manifest file must match settings in the Speech Engine configuration tab of the MMC snap-in. These are:

The "Resource set to preload" setting must match the "type" attribute of the <resourceset> element in the manifest.

The "Application list" field must contain the value of the "name" attribute of the <application> element in the manifest.
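Because a mismatch between the manifest and the engine configuration is easy to overlook, it can help to check the two values programmatically before deployment. The sketch below is illustrative only: readManifestSettings and matchesEngineConfiguration are hypothetical helper names, and a regular expression is used instead of an XML parser purely to keep the example self-contained.

```javascript
// Illustrative only: pulls the two values that must match the MMC settings
// out of a manifest string. A regular expression keeps the sketch
// dependency-free; a real tool should use an XML parser.
function readManifestSettings(manifestXml) {
    var appMatch = /<application\s+name="([^"]*)"/.exec(manifestXml);
    var types = [];
    var typePattern = /<resourceset\s+type="([^"]*)"/g;
    var match;
    while ((match = typePattern.exec(manifestXml)) !== null) {
        types.push(match[1]);
    }
    return {
        applicationName: appMatch ? appMatch[1] : null,
        resourceSetTypes: types
    };
}

// Returns true when the engine configuration values agree with the manifest.
function matchesEngineConfiguration(settings, resourceSetToPreload, applicationList) {
    return settings.resourceSetTypes.indexOf(resourceSetToPreload) !== -1 &&
           applicationList.indexOf(settings.applicationName) !== -1;
}
```

For the Application2 manifest above, the check would pass for a resource set of TelephonyRecognizer and an application list containing Application2.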

These procedures assume that the applications are deployed to a default Microsoft Speech Server Standard Edition installation.

Step 1: Create a new engine configuration to preload Application2 grammars.

1.

Open the MMC Administration console. (For instructions, see MMC Administration Overview in Microsoft Speech Server help.)

2.

In the console pane, expand the applicable group.

3.

In the details pane, double-click the computer running Speech Engine Services (SES) to open the SES properties page.

4.

Click the Speech Engine Configurations tab.

5.

Click Add.

6.

Type Application2TelephonyRecognizer and click OK.

7.

Set the following properties for the new configuration:

Number of instances: 4

Engine Class: Recognition (default)

Engine Name: Microsoft English (U.S. Telephony) v 7.0

Resource set to preload: TelephonyRecognizer (Must match the type attribute of the <resourceset> element in the manifest file for the application.)

Application list: Application2 (Must match the name attribute of the <application> element in the manifest file.)

8.

Click Apply.

Step 2: Modify the DefaultTelephonyRecognizer engine configuration to adjust the number of instances and preload only Application1 grammars.

1.

In the Speech Engine Configurations tab, select DefaultTelephonyRecognizer from the EngineConfigurations list.

2.

Set the following properties:

Instances: 20

Application list: Application1 (Must match the name attribute of the <application> element in the manifest file.)

3.

Click Apply, and then click OK.

Step 3: Deploy the applications to Microsoft Speech Server using Speech Application Deployment Service (SADS).

This procedure remains the same whether partitioning is used or not. Follow the instructions described in the SASDK documentation to:

1.

Install SADS on a Web server. (See "Preparing the Web Server.")

2.

Deploy the applications to a Web server. (See "Preparing the Web Server.")

3.

Configure Microsoft Speech Server to use the SADS service. (See "Adding and Configuring Speech Application Deployment Services.")

4.

Add both applications to that SADS service using MMC. (See "Adding and Configuring Speech Application Deployment Services.")

NOTES:

1.

This example only describes creating an engine partition for preloading grammar files into recognition engines. Prompt databases for Application1 and Application2 are still preloaded into all speech output engine instances (default out-of-the-box configuration). Depending on your application, you may wish to create a partition for the voice output engine instances.

2.

This example illustrates a simple way to configure the partition. You can use the same technique to configure more sophisticated schemes. For example, because a given engine configuration is capable of preloading resources from multiple applications, the example above could be slightly changed so that the "Application2TelephonyRecognizer" configuration preloads resources for both Application1 and Application2. With this change, all 24 engine instances would have the preloaded resources to serve Application1 (instead of 20), while keeping the number of instances preloaded for Application2 at 4.

3.

Creating dedicated engine configurations does not set a hard limit on the number of incoming calls that an application can handle. In our example, Microsoft Speech Server will adapt to small peaks in the number of incoming calls for a particular application (for example, five simultaneous incoming calls for Application2). Setting hard limits can be done by appropriately configuring the switch or PBX servicing the Microsoft Speech Server deployment.

Q.How do I handle unsuccessful MakeCalls in an outbound application?
A.

All outbound applications need to be capable of handling call failures (for example, when the number called by the application is busy). A typical solution for this is to try calling the number again later. This can be done by inserting the requested message for the outbound call back to the outbound call message queue.

This may present a challenge because there is no server-side event for MakeCall:ConnectionFailed, but here is a simple list of steps to follow in order to attempt making the call later. For a more advanced, deployable solution, see the Banking Alerts Reference Application, which can be found in the Reference Applications directory on the Microsoft Speech Application SDK 1.0 CD.

The first step is to declare a SemanticItem called siFailed in the SemanticMap control for your application. We'll use the AutoPostback feature of this SemanticItem to add the message back to the outbound call request message queue. To do this, perform the following steps:

1.

Right-click the SemanticMap control for your application, and select Property Builder.

2.

In the SemanticMap Properties dialog, click Add to add a SemanticItem.

3.

In the ID edit box, replace the default value with siFailed.

4.

Select the AutoPostBack box, and in the Changed edit box, enter siFailed_Changed.

5.

Click OK.

Next, on the server-side code-behind page, add the following to put the message back in the message queue:

// Requires a reference to System.Messaging (using System.Messaging;).
private void siFailed_Changed(object sender, Microsoft.Speech.Web.UI.SemanticEventArgs e)
{
    // Put the outbound call request back on the queue so it is retried later.
    MessageQueue queue = new MessageQueue(@".\private$\YourOutboundMessageQueue");
    queue.Send("http://" + Environment.MachineName + "/"
        + Request.Url.Segments[1] + "Dialog.aspx?PhoneNumber="
        + MainMakeCall.CalledDirectoryNumber + "&Message="
        + Request.QueryString["Message"]);
}

Finally, add a client-side function MakeCall_OnClientFailed. Remember to return true so that RunSpeech will resume after calling this error-handling routine.

function MakeCall_OnClientFailed()
{
    // Setting the semantic item triggers the AutoPostBack to siFailed_Changed.
    siFailed.SetText("false", true);
    return true;
}

Now, when an outbound call fails (for example, when the line is busy):

RunSpeech will execute MakeCall_OnClientFailed(), which will set the semantic item siFailed.

When RunSpeech resumes, it will post back and trigger siFailed_Changed.

The message for requesting the outbound call will then be added back to the outbound call message queue, and the request will be received by the telephony server when another outbound channel becomes available.
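As written, the steps above re-queue a failed request every time, so a number that is never answered would be retried indefinitely. One way to bound this is to carry an attempt counter in the request itself. The sketch below illustrates the idea only: buildRetryRequest and the "Attempt" query parameter are hypothetical and not part of the sample above or of the SASDK, and the server-side handler would need the equivalent logic in the code-behind.

```javascript
// Hypothetical helper: builds the request URL that is put back on the queue,
// adding an "Attempt" counter (not part of the original sample) so that
// retries can be capped. previousAttempt is the number of the attempt that
// just failed; the function returns null once maxAttempts is exhausted.
function buildRetryRequest(baseUrl, phoneNumber, message, previousAttempt, maxAttempts) {
    var attempt = previousAttempt + 1;
    if (attempt > maxAttempts) {
        return null; // give up instead of re-queuing forever
    }
    // Encode the values so that spaces or '&' in the message do not
    // corrupt the query string.
    return baseUrl +
        "?PhoneNumber=" + encodeURIComponent(phoneNumber) +
        "&Message=" + encodeURIComponent(message) +
        "&Attempt=" + attempt;
}
```

For example, a request whose first attempt failed would be re-queued with Attempt=2, and a request that has already used its last allowed attempt would be dropped (or routed to some other follow-up process).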

Q.How do I build a custom control for more advanced call control functions?
A.

Currently the Speech Application SDK (SASDK) provides speech call controls for answering a call, making a call, disconnecting a call, and blindly transferring a call. The example below shows how you can create a user control that implements a "supervised transfer," where unlike a blind transfer, the application receives call progress events for the transfer. This allows for scenarios such as follow-me style transfer applications.

On the Microsoft Speech Server (MSS) platform, the SALT interpreter establishes a communication channel to the Telephony Interface Manager (TIM) for call control purposes. The SALT <smex> element is used for this simple communication channel where XML messages are sent to the TIM (using the sent property) and received from the TIM (using the onreceive event). The XML message contains CSTA XML service requests and events as defined in Standard ECMA-323 (XML Protocol for Computer Supported Telecommunications Applications Phase III). Typically the SALT application makes service requests and the TIM responds with service request responses and call control events.

A supervised transfer uses the CSTA Consultation Call service, completed with a CSTA Transfer Call service.

The code example for this Tip and Trick can be found in two files:

Default.aspx is a simple page that answers the phone, plays a welcome prompt, attempts a transfer, plays a failure prompt if the transfer failed, and then disconnects.

SupervisedTransferControl.ascx is a user control that implements the supervised transfer.

The SupervisedTransferControl wraps the supervised transfer functionality into a reusable control. The following code segment shows its use on Default.aspx page:

<STC:SupervisedTransferControl id="SupervisedTransferControl1"
         runat="server"
   TransferToNum="5551234"
   ClientActivationFunction=
         "SupervisedTransferControl1_ClientActivationFunction"
   OnClientFailure="SupervisedTransferControl1_OnClientFailure"
   OnClientTransfered=
         "SupervisedTransferControl1_OnClientTransfered"/>

The SupervisedTransferControl relies on support from three SmexMessage controls, described below, to implement the supervised transfer. The SmexMessage is a standard speech control that allows the author to send a CSTA message to the TIM and pause execution of the dialog until a particular response has been received.

1.

The ConsultationMessage control is activated when the application initiates the supervised transfer. The ConsultationMessage control sends the CSTA ConsultationCall message and waits to see if the consultation call is successful or not. The ConsultationCall places the original call on hold, now the held party, and places an outbound call to the consulted party. The call to the consulted party is generally on a separate channel.

Success: Receipt of a CSTA EstablishedEvent (the consulted party has answered)

Failure: Receipt of either a CSTAErrorCode (for example, no channel resources are available for the transfer) or a CSTA FailedEvent (for example, the consulted party is busy or the phone "rings no answer")

2.

If the consultation was successful, the TransferMessage control is activated. This control joins the original (held) party to the consulted party. The control sends the CSTA TransferCall message and waits to see if the transfer call is successful or not.

Success: Receipt of a CSTA TransferedEvent
Note that the CSTA specification has spelled this message with one 'r' in the word transferred; for consistency, this user control maintains that spelling throughout.

Failure: Receipt of either a CSTAErrorCode or CSTA FailedEvent

3.

If the consultation or the transfer fails, the ReconnectMessage control is activated. This attempts to disconnect the consulted party and bring the original (held) party back into an active call state with the SALT application. The control sends the CSTA ReconnectCall message and waits to see if the reconnect is successful or not.

Success: Receipt of a CSTA RetrievedEvent

Failure: Receipt of either a CSTAErrorCode or FailedEvent
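The three-step flow above is essentially a small state machine, which can be sketched as follows. This models the control flow only and does not send or receive CSTA messages; the function name nextTransferStep and the step labels (including the terminal labels "Transferred", "Retrieved", and "ReconnectFailed") are hypothetical names chosen for this illustration, not identifiers from the user control.

```javascript
// Sketch of the control flow only; it does not send CSTA messages.
// The first three step names mirror the three SmexMessage controls above.
function nextTransferStep(step, eventName) {
    var failed = (eventName === "FailedEvent" || eventName === "CSTAErrorCode");
    switch (step) {
    case "Consultation":
        if (eventName === "EstablishedEvent") { return "Transfer"; }
        if (failed) { return "Reconnect"; }
        break;
    case "Transfer":
        // The event name keeps the CSTA spelling.
        if (eventName === "TransferedEvent") { return "Transferred"; }
        if (failed) { return "Reconnect"; }
        break;
    case "Reconnect":
        if (eventName === "RetrievedEvent") { return "Retrieved"; }
        if (failed) { return "ReconnectFailed"; }
        break;
    }
    return step; // unrelated events leave the current step unchanged
}
```

Keeping the transitions in one function makes it easy to see that every failure path leads through the reconnect step, so the original caller is not left on hold.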

Q.Advanced Debugging with TASim: Sending Events to the Microsoft Windows Event Log Using Speech Server
A.

Problem Description
The Telephony Application Simulator generates log events equivalent to those generated by the Telephony Application Service of Microsoft Speech Server, but only a subset of these events (those that are of most interest to application developers) are displayed in the Speech Debugging Console's output window. Sometimes, a developer needs additional information to determine what kinds of problems are occurring.

By default, when a grammar resource is missing, the Speech Debugging Console only reports a '-4' error message that might look similar to the following:

Listen QA1_Reco: onerror beep="False" initialtimeout="3000" babbletimeout="20000" maxtimeout="120000" endsilence="1000" reject="0" mode="automatic" recoresult="" text="" status="-4" recordlocation="" recordtype="" recordduration="0" recordsize="0" id="QA1_Reco" title="" lang="en-us" dir="" className="" xmlns="" onreco="QA1_Reco_obj.SysOnReco()" onerror="QA1_Reco_obj.SysOnError()" onnoreco="QA1_Reco_obj.SysOnNoReco()" onsilence="QA1_Reco_obj.SysOnSilence()"

Although it is possible to start with the -4 status and determine which grammar is missing, having additional information could save time. This additional information can help you determine exactly which resource is causing the problem, particularly when there are multiple grammars, a multilevel grammar, or dynamic grammars where the URL to the grammar is constructed at run time.

Solution
You can configure the Telephony Application Simulator to use the speech engine of the Microsoft Speech Server and to write additional log entries to the Windows Event Log by modifying the TASim.exe.config and TASimInstrumentation.config files.

(1) Add a reference to the sink named logSink in the TASimInstrumentation.config file. TASimInstrumentation.config is located at: \Program Files\Microsoft Speech Application SDK 1.0\SDKTools\Telephony Application Simulator\

<filters>
  <filter name="TraceAll">
    <eventCategoryRef name="All Events">
      <eventSinkRef name="TASimSink" />
      <eventSinkRef name="logSink" />
    </eventCategoryRef>
  </filter>
</filters>

(2) Modify the TASim.exe.config file to include the speech engine of the Microsoft Speech Server. TASim.exe.config is located at: \Program Files\Microsoft Speech Application SDK 1.0\SDKTools\Telephony Application Simulator\

<configuration>
  <appSettings>
   <add key="Lang" value="en-us"/>
   <add key="RecordingDirectory" value="%TEMP%" />
   <add key="SpeechServer"
   value="http://yourSESserver/speechserverweb/lobby.asmx" />
   <add key="instrumentationConfigFile"
   value="TASimInstrumentation.config" />
  </appSettings>
</configuration>

(3) Add the Web server that hosts your application to the trusted sites list of the Speech Engine Service.

(4) Launch TASim.exe. From the File menu, click Open, enter http://yourwebservername/yourapplication/default.aspx, and dial the number.

Suppose that your application has a QA control called QA1. QA1 has a grammar called toplevel.grxml that has a rule reference to another grammar called pizza.grxml. Assume that pizza.grxml is a dynamic grammar that is missing at the time you are testing the application. If you have configured the speech engine using the instructions above, you will now find additional information in the Windows Event Log; for example:

Event Type: Information 
Event Source: Application (TASim) 
Event Category: None 
Event ID: 0 
Date: 3/6/2004 
Time: 10:33:09 AM 
User: N/A 
Computer: YOURWEBSERVERNAME 

Description:
Microsoft.SpeechServer.Log.Trace
{
String Message = "Recognition error, error = 8004600A, description = 
"System.Web.Services.Protocols.SoapException: Server was unable to process request. 
---> Microsoft.SpeechServer.GrammarException: Error loading grammar 
'http://YOURWEBSERVERNAME/yourapp/Grammars/toplevel.grxml' or one of the grammars it references. 
---> Microsoft.SpeechServer.SpeechServerException: Error downloading grammar 
'http://YOURWEBSERVERNAME/yourapp/Grammars/pizza.grxml'. 
---> System.Net.WebException: The remote server returned an error: (404) Not Found.
  --- End of inner exception stack trace ---
  --- End of inner exception stack trace ---
  --- End of inner exception stack trace ---""
  String ExceptionDetails = ""
  Int32 ProcessID = 2244
  WindowsSecurityInfo WindowsSecurity = <null>
  ManagedSecurityInfo ManagedSecurity = <null>
  String StackTrace = ""
  String ServiceProvider = "DesktopSaltInterpreter"
  Int32 EventLogEntryTypeID = 4
  Int64 EventSequenceNumber = 55
  String EventSourceInstance = "f68372f1-1073-4a9f-94ae-e690a5dac007"
  String EventSourceName = "Application"
  String MachineName = "YOURWEBSERVERNAME "
  DateTime TimeStamp = 3/6/2004 10:33:09 AM
  SpeechContext SpeechContext = {
    String ApplicationInstance = "0ba6ddd0-7ca7-4fa3-957c-b299f9bc68d5"
    String PageUri = "http://YOURWEBSERVERNAME/logsinktest/default.aspx"
    String RequestID = <null>
  }
}

Here, the Windows Event Log clearly shows that the grammar 'http://YOURWEBSERVERNAME/yourapp/Grammars/pizza.grxml' is responsible for the -4 error.
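When an entry nests several exceptions, as in the example above, the grammar named last is the one that actually failed to download. If you review many such entries, a small script can pick that URL out of the description text. The following is a sketch under the assumption that the description uses the "Error loading grammar" and "Error downloading grammar" phrasing shown above; findFailingGrammar is a hypothetical helper name.

```javascript
// Extracts the grammar URLs mentioned in an event-log description and returns
// the last one, which in a nested exception chain is the innermost failure.
// Assumes the message wording shown in the sample entry above.
function findFailingGrammar(description) {
    var pattern = /Error (?:loading|downloading) grammar\s+'([^']+)'/g;
    var urls = [];
    var match;
    while ((match = pattern.exec(description)) !== null) {
        urls.push(match[1]);
    }
    return urls.length > 0 ? urls[urls.length - 1] : null;
}
```

Applied to the description in the sample entry, the function would return the pizza.grxml URL rather than the toplevel.grxml URL that references it.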

Q.How to Debug Server and Client-Side Code at the Same Time?
A.

Problem Description

Microsoft Speech Server applications employ both server-side and client-side code, which means you need to be able to debug code on both sides. By default, only client-side debugging is available. Other procedures have been documented that allow you to debug server-side code, but once you follow these steps you can't debug client-side code. So you're forced to use one or the other, while what you really need is to do both: client-side and server-side debugging, in the same debugger at the same time.

Solution

From Visual Studio .NET, create a speech application using the 'Speech Web Application' template

Select the speech project node in the Solution Explorer window.

Right-click and select Properties.

In the left pane, select Configuration Properties and then Debugging.

Keep the Debug Mode set to Program.

Clear out the command line arguments (delete 'http://localhost/yourproject/default.aspx').

Ensure that Always Use Internet Explorer is set to false.

Click OK to close the dialog box.

Set up breakpoints in the code-behind.

Add the script command 'debugger' to .js and .pf files.

If you start debugging by pressing F5 in Visual Studio .NET, TASim.exe (the Telephony Application Simulator) is launched. Because the Telephony Application Simulator has not been given the application URL, it waits for you to enter one. This gives you an opportunity to attach to aspnet_wp.exe or w3wp.exe (the name varies depending on which version of Microsoft Internet Information Services you are running), as follows: Enter the application URL in the Telephony Application Simulator. Before you click the OK button in the 'Open Start Page' dialog, switch to Visual Studio .NET. From the Debug menu, select Processes. In the Processes window, attach to aspnet_wp.exe or w3wp.exe. Then switch back to the Telephony Application Simulator, and click the OK button.

The debugger will stop at breakpoints in the code-behind first, because the debugger is attached to aspnet_wp.exe or w3wp.exe. Once the page is rendered to the Telephony Application Simulator, the debugger will stop at the script command 'debugger' in the client-side code as well. This is because by default, the debugger is attached to the Telephony Application Simulator, which was set up by the 'Speech Web Application' template.

Q.How to Test nbest in the Desktop Development Environment?
A.

This Tip and Trick is for developers using the Microsoft Speech Server Beta 2 and the Microsoft Speech Application SDK v1.0 Beta 4.

Problem Description
A speech recognition engine can return more than one recognition hypothesis. The best hypothesis (for example, the hypothesis with the highest score) is always returned as the recognition itself. If alternates are specified, additional hypotheses are returned as alternate recognitions.

The nbest parameter specifies the number of hypotheses that will be returned. The default value of nbest is 1, which specifies that the speech recognition engine should return no alternates. If you specify the value of nbest as 2, then the speech recognition engine will return the default hypothesis plus one alternative.

Currently, the speech recognition engine installed by the Microsoft Speech Application SDK v1.0 Beta 4 for use in the development environment does not support nbest. The following two workarounds will allow you to test nbest in the development environment before deploying the application to the production server.

Solution 1: Configure the TASim.exe to use the Microsoft Speech Server speech recognition engine, which supports nbest.

Open the TASim.exe.config file, located in the same directory where TASim.exe is installed, and change the SpeechServer key to point to the speech recognition engine in the Microsoft Speech Server, as follows:

<add key="SpeechServer"
     value="http://yourMSS/speechserverweb/lobby.asmx" />

TASim.exe will then use the Microsoft Speech Server speech recognition engine, which supports nbest.

Solution 2: If you do not have access to a Microsoft Speech Server speech recognition engine during development, modify the SML result in the Speech Debugging Console to simulate an nbest result.

This solution allows you to test the logic that you provide in your application for handling alternate recognitions. The following example illustrates how this works.

1.

Create a Telephony application. Add a SemanticItem object (siCity) to the SemanticMap control, and add a QA control (QA1) to the Web form.

2.

Create a grammar that recognizes two phrases: 'houston' and 'boston'.

3.

In the HTML view of the application, in the <reco> element of the QA control, set the value of nbest to 2 using <Params> </Params>, as follows:

<Reco ID="QA1_Reco"> 
    <Params> 
        <speech:Param Name="nbest" >2</speech:Param> 
    </Params> 
    <Grammars> 
        <speech:Grammar .....></speech:Grammar> 
    </Grammars> 
</Reco> 

4.

In the Web form view, right-click on the QA control (QA1), select Property Builder, add siCity to the QA control's Answer collection, and add a client normalization function for siCity.

5.

Implement the client normalization function as follows (assume the client normalization function is called QA1_ClientNormalizationFunction):

<script>
function QA1_ClientNormalizationFunction(theSML, siCity)
{
    // Two alternates are read below, so require at least two.
    if (siCity.alternates.length > 1)
        LogMessage("NBest one:" + siCity.alternates[0].value.text,
                   "NBest two:" + siCity.alternates[1].value.text);
    else
        LogMessage("siCity", theSML.text);
}
</script>

Now, if the engine returns the alternates object, the LogMessage() function logs its contents for monitoring purposes. Note that this example hard-codes two items from the alternates object in the LogMessage() call. You might prefer to enumerate the items instead, for example with a loop such as: for (var i = 0; i < siCity.alternates.length; i++)

6.

In the Speech Debugging Console, enable SML editing. When QA1 is listening, speak or enter 'boston' in the Speech Debugging Console. In the Speech Debugging Console SML window, you will see something like this:

<SML confidence="1.000" text="boston" utteranceConfidence="1.000"> 
    <city confidence="1.000">boston</city> 
</SML> 

7.

Change the SML result to the following:

<SML confidence="1.000" text="boston" utteranceConfidence="1.000"> 
    <city confidence="1.000">boston</city> 
    <alternate Rank="1"> 
        <city confidence="1.000">boston</city> 
    </alternate> 
    <alternate Rank="2"> 
        <city confidence="1.000">houston</city> 
    </alternate> 
</SML>

8.

Click the Submit button in the Speech Debugging Console. The default result and the alternative will now be logged by the LogMessage( ) function when a speech recognition occurs.
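The enumeration suggested in step 5 can be sketched as follows. So that the example runs on its own, it uses logMessageStub, a hypothetical stand-in for the LogMessage function, and a plain object shaped like the siCity semantic item used in this example (an alternates array whose entries expose value.text); in a real page you would call LogMessage and use the semantic item passed to the normalization function.

```javascript
// Stand-in for the LogMessage function so the sketch runs on its own;
// it simply collects what would have been logged.
var loggedMessages = [];
function logMessageStub(name, value) {
    loggedMessages.push(name + "=" + value);
}

// Logs every alternate instead of assuming exactly two are present.
// Falls back to the recognized text when there are no alternates.
function logAlternates(semanticItem, fallbackText) {
    var alternates = semanticItem.alternates || [];
    if (alternates.length === 0) {
        logMessageStub("siCity", fallbackText);
        return;
    }
    for (var i = 0; i < alternates.length; i++) {
        logMessageStub("NBest " + (i + 1), alternates[i].value.text);
    }
}
```

Looping over the array means the same handler keeps working if nbest is later raised above 2.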


Q.Can we explicitly stop the listen event when DTMF starts?
A.

Problem Description
According to the Speech Application Language Tags (SALT) specification 2.3.6, when listen mode (for audio input) and DTMF mode (for touch-tone input) are both supported in an application simultaneously, there are two default behaviors: "(i) the disabling of initial timeouts on the other mode on detection of input, and (ii) the automatic cancellation of one mode when the other mode comes to an end."

There are some cases where we might need to fine-tune these behaviors. For example, if the user needs to enter a very long DTMF string, the listen element may fire other timeout events, preventing the user from finishing their DTMF inputs. This may occur even if the initial timeouts have been disabled.

Another example is when the phone connection is poor and has an echo. Because listen mode comes to an end before DTMF mode, if the echo causes a false recognition, the DTMF mode will be cancelled. In this case, the user will not be able to finish the DTMF input.

Solution
To help avoid these problems, you can tune the timeout properties of the listen element.

Alternatively, you can handle the QA control's DTMF client-side event OnClientKeyPress( ) and call Cancel( ) on the listen object to explicitly cancel listen mode, as follows:

<script> 
function MyQAName_OnClientKeyPress( ) 
{ 
	MyQAName_Reco.Cancel(); 
}
</script>

Q.How do you detect a user-initiated hang-up in a speech application?
A.

If an application is actively running (that is, a QA control is running as part of a dialog), and the end user initiates a disconnect (for example, by clicking the hang-up button on the Telephony Application Simulator), is there any way to detect the user-initiated disconnect in client-side script?

Solution
When an end user initiates a hang-up while the application is still running, you can use RunSpeech APIs to detect it. Specifically, you can use the "RunSpeech.OnUserDisconnected" event as described below:

When a user initiates the hang-up, the SMEX "ConnectionClearedEvent" is activated, which causes "RunSpeech.OnUserDisconnected" to be called.

You can register the "RunSpeech.OnUserDisconnected" event in your script in any function. One approach is to use the AnswerCall control's "OnClientConnected" event to register the disconnect event. If you have a DisconnectCall control in your application that initiates the disconnect, then the "OnClientDisconnected" event is activated. However, if the user initiates the disconnect before the "DisconnectCall" control gets activated, then the "RunSpeech.OnUserDisconnected" event is activated.

Example
Following is an example of this approach to detecting disconnects:

function AnswerCall1_OnClientConnected( sender, callId, callingDevice, calledDevice ) 
{ 
	// Call accepted successfully, register for user-initiated disconnects 
	RunSpeech.OnUserDisconnected = OnUserDisconnected; 
} 

function OnUserDisconnected( activeObject ) 
{ 
	// Handle user-initiated disconnects here
}

function DisconnectCall1_OnClientDisconnected( sender ) 
{ 
	// Handle application-initiated disconnects here 
}

Q.When creating speech applications that will accept DTMF (touch-tone) input, how do I enable users to enter information using the # key as an optional terminator?
A.

Speech applications can be designed to accept dual-tone multi-frequency (DTMF), or touch-tone, input as one type of input from users. This is an important feature when you need to capture digit strings of a particular length and would like the recognition to end when the # key is pressed. The trick to terminating DTMF input with the # key is simply to build the terminator into the grammar file itself.

Shown below is an example of a DTMF grammar that has a "UserName" rule that contains a # phrase and a group referencing a second grammar: "digits.grxml". This second grammar includes the "digitkey" rule, which contains phrases from "0" to "9".

[Image: DTMF grammar with the "UserName" rule]

Note that the # element in the phrase above is set to optional, which means that users may or may not press # after entering the other digits. If the user does press the # key when finished, recognition ends immediately and the application responds sooner, rather than waiting for a timeout to elapse. The above grammar is set to accept exactly 5 digits from users, excluding the # key. You can test this by adding a DTMF grammar in the QA control's Property Builder, and then dialing with the Telephony Application Simulator (TASim) tool. (The TASim tool is automatically installed as part of the Microsoft Speech Application SDK Beta 3.)
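As a rough illustration of which key sequences such a grammar accepts, the following sketch expresses the same rule (five digits, then an optional #) as a regular expression. This is only an analogy: the SDK matches input against the .grxml grammar, not a regular expression, and matchesUserNameRule( ) is a hypothetical name.

```javascript
// Illustration only: five digit keys followed by an optional # terminator.
// The real application relies on the DTMF grammar, not on this function.
function matchesUserNameRule(keys) {
    return /^[0-9]{5}#?$/.test(keys);
}

var acceptedWithoutPound = matchesUserNameRule("12345");   // five digits
var acceptedWithPound = matchesUserNameRule("12345#");     // five digits plus #
var rejectedTooShort = matchesUserNameRule("1234#");       // only four digits
var rejectedTooLong = matchesUserNameRule("123456");       // six digits
```

Both "12345" and "12345#" satisfy the rule. The difference at run time is that pressing # lets recognition complete without waiting for a timeout.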

Q.My QA control cannot find its .pf file after a Server.Transfer() to a page in a subfolder. What can I do to fix this?
A.

I have a subfolder in my application that contains an .aspx file (page2.aspx). The QA control on this page is configured to use a .pf file (page2.pf) in the same subfolder. During the post back, my application's start page (page1.aspx) calls Server.Transfer() to transfer to page2.aspx. I get a runtime error saying that the page2.pf file cannot be found.

When you add a .pf (PromptSelectFunction) file to a QA control, there are three path options you can use: Absolute, Document Relative, and Root Relative. Typically, you can use the default option: Document Relative. Because the path of the .pf file is relative to that of the .aspx file, this option makes it easier to move the .aspx and .pf files. For example, to move the .aspx and .pf files to a new application folder using the default setting, you can just copy and paste both files to the new folder, and the QA control will be able to locate the .pf file successfully.

This approach will not work if you are transferring an .aspx page to a subfolder, and the .aspx page contains a QA control that needs to reference a .pf file in the same subfolder. This fails because on the server side, when Server.Transfer() is called to transfer an application to a different .aspx page in a subfolder, the path of the new page is different from the previous path. However, by design, the client is still looking for the .pf file using the original URL path.

For example, the client is first connected to http://servername/speechApp/page1.aspx, but during the post back, the application is transferred to page2.aspx, which is under a subfolder of the speechApp folder. The server is now serving a file located at http://servername/speechApp/subfolder/page2.aspx. But when the QA control in the page2.aspx file needs to reference page2.pf file, the client will look for it in http://servername/speechApp/page2.pf rather than in the subfolder, causing the runtime error.

To solve this problem, you should use the Root Relative option.
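The difference between the two options can be sketched with ordinary URL resolution. This is a minimal illustration using the standard URL class, not SDK code. It assumes the Root Relative path is written from the site root as /speechApp/subfolder/page2.pf; the exact form the Property Builder emits may differ.

```javascript
// The URL the client originally requested. After Server.Transfer(), the
// client still resolves relative paths against this URL.
var clientUrl = "http://servername/speechApp/page1.aspx";

// Resolves a path the way a client resolves a relative reference.
function resolveAgainst(path, baseUrl) {
    return new URL(path, baseUrl).href;
}

// Document Relative: resolved against page1.aspx, so the subfolder is lost.
var documentRelative = resolveAgainst("page2.pf", clientUrl);

// Root Relative (assumed form): independent of the page the client requested.
var rootRelative = resolveAgainst("/speechApp/subfolder/page2.pf", clientUrl);
```

Here documentRelative resolves to http://servername/speechApp/page2.pf, which does not exist, while rootRelative resolves to the file in the subfolder.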

Q.How do I use Prompt and Listen?
A.

The Basic Speech Controls used in the Microsoft Speech Application SDK are an ASP.NET representation of the two fundamental Speech Application Language Tags (SALT) elements: prompt and listen. These controls are designed especially for applications running on tap-and-talk client devices, and are used primarily for managing application flow through a graphical user interface (GUI).

In order to use speech recognition on handheld devices running Pocket Internet Explorer, a separate speech server running Microsoft Speech Server SES (Speech Engine Services) must be available. The URL of the Microsoft Speech Server SES must be specified for each Listen and Prompt control. Although the Property Builder included in the Microsoft Speech Application SDK makes it easy to enter the URL for an individual control, it can be impractical to enter this information over and over, and to keep it up to date.

Because in most cases the same server will be used for all controls in an application, and because the name of the server might change during development, testing, and deployment, it makes sense to store this URL in just one location in the application. One solution to this might be to store the URL of the speech server in the application's web.config file, in the appSettings area. For example, the web.config file might include a section like this:

<appSettings>
	<add key="speechserver" value="http://MyServer/speechserverweb/lobby.asmx" />
</appSettings>

The author can then read the URL from the web.config file at render-time. The URL could then be stored in a string, and ASP.NET data binding could be used to link this value to the Prompt and Listen elements. The author would still need, however, to manually add the necessary text to each ASPX page to specify the information that will be databound.

A more advanced but convenient approach is to add the required information to each control dynamically, at render time. In essence, code is added that finds each Prompt and Listen control on the page, and dynamically inserts a Param tag specifying the speech server. The code to do this can be added using a Page_PreRender method.

In order to employ this method, when in Design mode, select the current page (in the example described here, "_Default") and enter "Page_PreRender" in the Properties window. This will cause Microsoft Visual Studio® to insert the code to attach a new handler, as follows:

private void InitializeComponent()
{    
	...
	this.PreRender += new System.EventHandler(this.Page_PreRender);
	...
}

Visual Studio will automatically jump to the code-behind for the page, and will have created and selected the new Page_PreRender method. The code to add the necessary Param tags looks like this:

private void Page_PreRender(object sender, System.EventArgs e)
{
	// Check if the web.config lists a server to use
	if (ConfigurationSettings.AppSettings["speechserver"] == null)
		return;
	string server = 
		ConfigurationSettings.AppSettings["speechserver"].ToString();

	// Find all the controls of type Prompt
	ArrayList a = FindAllControlsOfType(Page.Controls, typeof(Prompt));

	// Loop over them
	for (int i=0;i<a.Count;i++) 
	{
		Prompt p = a[i] as Prompt;
		// If it already contains a server param, skip it
		if (p.Params.ContainsName("server")) continue;

		// Otherwise create and add a new param setting the server
		Param sp = new Param();
		sp.Name = "server";
		sp.Value = server;
		p.Params.Add(sp);
	}

	// And do the same for the Listen controls
	a = FindAllControlsOfType(Page.Controls, typeof(Listen));
	for (int i=0;i<a.Count;i++) 
	{
		Listen l = a[i] as Listen;
		if (l.Params.ContainsName("server")) continue;
		Param sp = new Param();
		sp.Name = "server";
		sp.Value = server;
		l.Params.Add(sp);
	}
}

One additional method is required, to build an ArrayList of all controls on the page of a particular type:

private ArrayList FindAllControlsOfType(ControlCollection theList, 
	System.Type theType) 
{
	// Loop over all the controls in the control collection that is
	// passed in. Keep an ArrayList of those matching the requested 
	// type. Recursively check the children of all controls.
	ArrayList toRet = new ArrayList();
	for(int i=0;i<theList.Count;i++) 
	{
		if (theList[i].GetType() == theType)
			toRet.Add(theList[i]);
		toRet.AddRange(
			FindAllControlsOfType(theList[i].Controls,
			theType));
	}
	return(toRet);
}

Using this mechanism, the author need only specify the URL of the Speech Server in one place (the web.config file), and the URL in the Speech Controls Property Builder in every instance can be left blank, unless the author wishes to override the global setting.

A more advanced author might wish to move this code to a separate helper class or to implement a new class derived from System.Web.UI.Page, so that it would not be necessary to duplicate this work on each page.

Q.How do I determine whether input is Speech or DTMF (Touch-Tone)?
A.

QA Dialog Speech Controls support both speech and DTMF (dual-tone multi-frequency) input modes simultaneously. It is often important to know which mode the user is using when there is a noreco event (when the speech input is not recognized). When the noreco event is from DTMF input, it is also useful to retrieve the digits and render the prompts accordingly. There is a significant difference in the way the prompt should be rendered, depending on whether the noreco event was related to speech input or DTMF input. For example, if we are receiving a speech-related noreco event, the prompt can say something like: "sorry, I did not understand you." However, this prompt does not make sense if it is a DTMF noreco because in the DTMF mode, we always 'know' what the user enters - but the digits may be incorrect or unexpected.

QA Dialog Speech Controls provide two client-side OnClientNoReco properties: one for Speech and one for DTMF. You can provide a different event handler for each one of them. For example, in the property builder for the QA control, under DTMF properties, you can provide an event handler for the OnClientNoReco property. Let's assume that we have a QA called QA1, and the function name for the DTMF OnClientNoReco is "DTMFNoReco" in the property builder.

In the HTML view of the Web form, we will have something like this:

<Dtmf OnClientNoReco="DTMFNoReco" ID="QA1_Dtmf">

After you provide the function DTMFNoReco( ), if there is a DTMF noreco event, the function DTMFNoReco( ) will be called, and you will know that this is a DTMF noreco event.

According to the SALT (Speech Application Language Tags) specification version 1.0, section 2.3.2.2, the DTMF object has a 'text' property which is updated on every onkeypress, onreco, and onnoreco event. This means that we can retrieve the digits that the user entered, up to and including the first key that went wrong. For example, suppose the user enters 12354 as a PIN, and the correct PIN is 12345. Here the fourth digit is wrong, and the DTMF text property will capture the digits 1235.

Using the code given previously, the DTMF object ID is 'QA1_Dtmf', so we can use the following implementation:

</form>
<script>
function DTMFNoReco()
{
	LogMessage("DTMF NoReco", QA1_Dtmf.text);
}
</script>
//...
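To make the PIN example concrete, the following sketch models which digits the text property would hold when a noreco occurs. It is only a simplified model under a stated assumption: the real engine matches keys against the DTMF grammar, whereas digitsUpToFirstMismatch( ) is a hypothetical helper that compares against a single fixed expected string.

```javascript
// Simplified model: returns the entered keys up to and including the first
// key that does not match the expected sequence. If every key matches,
// the whole entry is returned.
function digitsUpToFirstMismatch(entered, expected) {
    for (var i = 0; i < entered.length; i++) {
        if (i >= expected.length || entered.charAt(i) !== expected.charAt(i)) {
            return entered.substring(0, i + 1);
        }
    }
    return entered;
}

// The user keys in 12354 while the expected PIN is 12345.
var captured = digitsUpToFirstMismatch("12354", "12345");
```

With these inputs, captured holds "1235", which matches the behavior described above. A DTMF noreco handler can use such digits to word its prompt differently from a speech noreco prompt.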

Q.How do I migrate applications from Beta 2 to Beta 3 of the Microsoft Speech Application SDK?
A.

Now that the Beta 3 version of the Microsoft Speech Application SDK is available, it will be important to understand how to migrate your current applications from Beta 2. The document Migrating Applications from Beta 2 to Beta 3 of the Microsoft Speech Application SDK describes the changes made to various components of the SDK, and can assist you in this migration.
