Tips and Tricks


Want some helpful hints for getting the most out of the Microsoft Speech Application SDK? Check out the tips and tricks section.

Have other questions or comments? Join the discussion about the Microsoft Speech Application SDK. Visit our newsgroup at microsoft.public.netspeechsdk.


Q.How Can I Find More Tools to Troubleshoot Speech Applications?
A.

Grammar compilation and loading, speech application call flow, and Speech Application Programming Interface (SAPI) error codes are all complex topics with which any speech application developer might need occasional help.

At the Microsoft Download Center, there's a small collection of tools that a speech application developer might find useful. The tools were not included with the Microsoft Speech Application SDK Version 1.1 (SASDK). They are unsupported, but they’re free.

The tools are listed and described in the following sections.

GramStat Speech Utility for Microsoft Speech Technologies

The GramStat Speech Utility is a command-line utility that provides statistics for both compiled files and raw grammar files. These statistics can be used to perform basic grammar analysis, and to troubleshoot grammar compilation problems and loading problems.

Recognizer Speech Utility for Microsoft Speech Technologies

The Recognizer speech utility is a command-line utility that is useful for the analysis of offline call flow, the diagnosis of simple speech recognition errors, and top-line error diagnosis for grammars, rules, and speech application installations.

SAPIErr Speech Utility for Microsoft Speech Technologies

The SAPIErr speech utility is a command-line lookup utility that is useful for deciphering SAPI error codes returned by the speech recognizer, the Microsoft Speech Server 2004 prompt engine, or SAPI itself.
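SAPI error codes are 32-bit HRESULT values. This means you can get a rough first reading of one before you look it up with SAPIErr. The JavaScript sketch below splits an HRESULT into the fields of the standard Windows HRESULT layout: severity, facility, and code. It only decodes bit fields. It does not know SAPI's symbolic error names, which is what SAPIErr provides, and its small facility table is an assumption that covers only a few common facilities. The example value, 80131509, is the code that appears in the 401 error message discussed later on this page.

```javascript
// Decodes the bit fields of a 32-bit Windows HRESULT:
// bit 31 = severity (1 means failure), bits 16-26 = facility, bits 0-15 = code.
// The facility table below is deliberately small; it is not a complete list.
const FACILITY_NAMES = {
  0: "FACILITY_NULL",
  4: "FACILITY_ITF",   // interface-specific errors; SAPI-defined errors typically use this
  7: "FACILITY_WIN32",
  19: "FACILITY_URT",  // .NET common language runtime
};

// Accepts a number or a hex string such as "80131509" or "0x80131509".
function decodeHresult(value) {
  const n = typeof value === "string" ? parseInt(value, 16) : value;
  if (!Number.isFinite(n)) throw new TypeError(`not a hexadecimal HRESULT: ${value}`);
  const hr = n >>> 0; // force an unsigned 32-bit value
  const facility = (hr >>> 16) & 0x7ff;
  return {
    hex: "0x" + hr.toString(16).toUpperCase().padStart(8, "0"),
    failed: (hr >>> 31) === 1,
    facility,
    facilityName: FACILITY_NAMES[facility] ?? `facility ${facility}`,
    code: hr & 0xffff,
  };
}

const info = decodeHresult("80131509");
console.log(info.hex, info.failed, info.facilityName, info.code);
// 0x80131509 true FACILITY_URT 5385
```

A result in FACILITY_ITF is a hint that the code is interface-specific, possibly from SAPI. A result in FACILITY_URT points to the .NET runtime instead, and SAPIErr is then unlikely to help.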

GetPron Speech Utility for Microsoft Speech Technologies

The GetPron speech utility is a command-line tool that takes a list of words and outputs the pronunciations that the Microsoft Speech Server 2004 speech-recognition engine uses for those words.

BuildAppLex Speech Utility for Microsoft Speech Technologies

The BuildAppLex.exe speech utility is a command-line tool that enables you to create an Application Lexicon by using the Speech API. BuildAppLex.exe takes one required command-line argument: a text file that contains a list of words and their corresponding pronunciations.

These tools can be downloaded by searching Microsoft.com for ‘speech utilities’ or by going to http://www.microsoft.com/downloads/details.aspx?FamilyID=52744fb8-9238-4cbd-b615-be2ca781880d&displaylang=en.

Q.How Can I Make My Grammars Load Faster?
A.

Grammars created and edited in Grammar Editor in Microsoft Speech Application SDK (SASDK) 1.1 are XML text files. Text is fine for development work and debugging, but speed is important in a production environment. Because a compiled grammar is smaller, it loads faster from the Web server.

Use the command-line grammar compiler, SrGSGc.exe, to compile your XML grammars. The grammar compiler installs with the SASDK and by default is located at %SystemDrive%\Program Files\Microsoft Speech Application SDK 1.1\SDKTools\Bin. The following example shows how to compile a grammar called Input.grxml into a grammar called Output.cfg.

Srgsgc.exe /O C:\myProject\Grammars\Input.grxml C:\myProject\Grammars\Output.cfg

As a rough guide, compiling a 320-KB text grammar yields a compiled grammar of about 210 KB, roughly one-third smaller.

Q.Can I return more than one semantic value as the result of recognition on a single branch in a .grxml grammar?
A.

The short answer to this question is yes. The tip this month illustrates how to do this using semantic interpretation markup in .grxml grammars for Microsoft Speech Server (MSS) 2004 and MSS R2.

Understanding the Issue

To better understand the issue, imagine the following scenario. Suppose you want to create a directory assistance application for your organization. You want a user to be able to call your organization's main number and say the name of the person to whom the user wants to speak. The user is then offered the choice of connecting to either that person's office phone or cell phone. For this task, the grammar that your application uses must be able to recognize the person's name and return at least two semantic values: one representing the person's office phone number and one representing the person's cell phone number.

To accomplish this, for each contact you can use an item element to contain the contact name, with corresponding tag elements to return the semantic information associated with that contact. The key here is that the tag element contains ordinary ECMAScript. As a result, you can use multiple script expressions within tag elements to declare, store, and return multiple property values.

A Quick Review

Before looking at a few examples, recall the following points about semantic interpretation in .grxml grammars:

A tag element contains script expressions that are executed when the recognizer follows the branch in which the tag element is located. The Microsoft speech recognizer serializes the results of these script expressions and generates the semantic result in the form of Semantic Markup Language (SML) output.

Every rule element in a grammar has a single Rule Variable object ('$') that holds a semantic value. You can use script expressions in tag elements to define properties of the Rule Variable, and these properties are returned as child nodes in the SML output.

Two properties are predefined for the Rule Variable and for all custom-defined properties. These are the _value and the _attributes properties. The _value property produces the text content of an SML node, and the _attributes property produces XML attributes in the start tag of a node.

With these points in mind, we are ready for a few examples. The examples illustrate how to create custom-defined properties of the Rule Variable, but differ in how they store the semantic values that you want to return for a successful recognition. These differences produce differently structured SML output.

Example 1: Returning Semantic Values in Child Nodes of the SML Return

The following example illustrates how you can store semantic values in the _value property for custom-defined properties so that these semantic values are returned as the content of child nodes. In this example, the script expressions contained in the first three tag elements initialize three custom properties of the Rule Variable: the first to hold the contact's name, the second to hold the contact's office phone extension, and the third to hold the contact's cell phone number. The script expressions contained in the last three tag elements set the semantic value for each custom property using the _value property of the custom property.

<rule id="Contacts" scope="public">
    <tag>$.ContactName={}</tag>
    <tag>$.ContactOfficeExtension={}</tag>
    <tag>$.ContactCellPhone={}</tag>
    <item>John Smith
        <tag>$.ContactName._value="John Smith"</tag>
        <tag>$.ContactOfficeExtension._value="1234"</tag>
        <tag>$.ContactCellPhone._value="5554321"</tag>
    </item>
</rule>


Using the previously illustrated rule, a successful recognition of the utterance "John Smith" produces the following SML output.

<SML text="John Smith" utteranceConfidence="0.805" confidence="0.805">
    <ContactName confidence="0.805">John Smith</ContactName>
    <ContactOfficeExtension confidence="0.805">1234</ContactOfficeExtension>
    <ContactCellPhone confidence="0.805">5554321</ContactCellPhone>
</SML>


Because the content of tag elements is ordinary script, you can also place multiple expressions in a single tag element, provided that the expressions are separated by semicolons. In other words, the following grammar markup produces the same results as the previous grammar markup.

<rule id="Contacts" scope="public">
    <tag>$.ContactName={};$.ContactOfficeExtension={};$.ContactCellPhone={}</tag>
    <item>John Smith
        <tag>$.ContactName._value="John Smith";
             $.ContactOfficeExtension._value="1234";
             $.ContactCellPhone._value="5554321"
        </tag>
    </item>
</rule>


Although using multiple tag elements that each contain a single script expression requires slightly more memory, the effect on performance is usually indiscernible. On the other hand, using multiple tag elements can make your code easier to read.

Example 2: Returning Semantic Values as Attributes of the Top-Level SML Node

The following example illustrates how you can store semantic values in the _attributes property of a custom-defined property so that these semantic values are returned as attributes of that property's SML node. In this example, the script expression in the first tag element initializes the custom property as an object. The expression in the second tag element sets the semantic value of the property itself, and the expressions in the remaining tag elements set attributes of the object.

<rule id="Contacts">
    <tag>$.ContactInfo={};</tag>
    <item>John Smith
        <tag>$.ContactInfo._value="John Smith"</tag>
        <tag>$.ContactInfo._attributes.officeExtension="1234"</tag>
        <tag>$.ContactInfo._attributes.cellPhone="5554321"</tag>
    </item>
</rule>


Using the previously illustrated rule, a successful recognition of the utterance "John Smith" produces the following SML output.

<SML text="John Smith" utteranceConfidence="0.805" confidence="0.805">
    <ContactInfo confidence="0.805" officeExtension="1234" cellPhone="5554321">
        John Smith
    </ContactInfo>
</SML>

 

Example 3: Returning Semantic Values in an Array

The following example illustrates how you can store semantic values in a custom-defined property that is initialized as an array so that these semantic values are returned in a series of item elements in the SML output. In this example, the script expression contained in the tag element initializes the custom property as an array containing the semantic values for the contact's office phone extension and cell phone number.

<rule id="Contacts">
    <item>John Smith
        <tag>$.ContactInfo=["1234", "5554321"]</tag>
    </item>
</rule>


Using the previously illustrated rule, a successful recognition of the utterance "John Smith" produces the following SML output:

<SML text="John Smith" utteranceConfidence="0.805" confidence="0.805">
    <ContactInfo confidence="0.805">
        <item confidence="0.805">1234</item>
        <item confidence="0.805">5554321</item>
    </ContactInfo>
</SML>
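Because tag content is ordinary ECMAScript, you can model the behavior of the three examples in plain JavaScript. The following sketch is a simplified, hypothetical model of how Rule Variable properties map to SML, written only to reproduce the outputs shown above. It is not the Microsoft recognizer's serializer, and it does not compute confidence: the confidence value is passed in. It also differs from the grammar runtime in one way. Plain ECMAScript objects do not have a predefined _attributes property, so the sketch creates it explicitly. Its output is unindented but otherwise matches the SML in Examples 1 through 3.

```javascript
// Simplified, hypothetical model of how Rule Variable properties map to SML.
// Not the Microsoft recognizer's serializer: confidence is passed in, and only
// the cases shown in Examples 1-3 are handled.

// Escapes characters that are special in XML text and attribute values.
function escapeXml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Arrays become <item> children; objects use _value for text content and
// _attributes for XML attributes; any other value becomes text content.
function serializeProperty(name, prop, conf) {
  const c = ` confidence="${conf}"`;
  if (Array.isArray(prop)) {
    const items = prop.map((v) => serializeProperty("item", v, conf)).join("");
    return `<${name}${c}>${items}</${name}>`;
  }
  if (prop !== null && typeof prop === "object") {
    const attrs = Object.entries(prop._attributes || {})
      .map(([k, v]) => ` ${k}="${escapeXml(v)}"`)
      .join("");
    const text = prop._value !== undefined ? escapeXml(prop._value) : "";
    // Any other custom property becomes a nested child node.
    const children = Object.entries(prop)
      .filter(([k]) => !k.startsWith("_"))
      .map(([k, v]) => serializeProperty(k, v, conf))
      .join("");
    return `<${name}${c}${attrs}>${text}${children}</${name}>`;
  }
  return `<${name}${c}>${escapeXml(prop)}</${name}>`;
}

// Wraps the Rule Variable's custom properties in the top-level SML node.
function toSml(utterance, ruleVariable, conf) {
  const body = Object.entries(ruleVariable)
    .filter(([k]) => !k.startsWith("_"))
    .map(([k, v]) => serializeProperty(k, v, conf))
    .join("");
  return `<SML text="${escapeXml(utterance)}" utteranceConfidence="${conf}" confidence="${conf}">${body}</SML>`;
}

// The tag scripts from Example 2, run as ordinary JavaScript.
const $ = {};
$.ContactInfo = {};
$.ContactInfo._value = "John Smith";
$.ContactInfo._attributes = {}; // predefined by the grammar runtime; created by hand here
$.ContactInfo._attributes.officeExtension = "1234";
$.ContactInfo._attributes.cellPhone = "5554321";

console.log(toSml("John Smith", $, "0.805"));
```

Running this prints the Example 2 SML on a single line. If you replace the tag scripts with those from Example 1 or Example 3, the output matches those examples in the same way. This shows that the only difference between them is where the values are stored on the Rule Variable.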

 

Conclusion

The tip this month illustrates several ways to write markup in a .grxml grammar so that more than one semantic value is returned in SML output as the result of recognition on a single branch in the grammar. For a more thorough discussion of semantic interpretation markup, see the "Semantic Interpretation Markup" section in the MSS Help documentation. For additional examples, see the "SML Reference" section.

Q.Why do my RuleRefs use absolute file paths?
A.

When adding RuleRef elements to grammars in Speech Application SDK version 1.1, you might have wondered why the paths are absolute rather than relative. After all, absolute paths mean there is more work involved if you move the grammar to a different folder. For example, you might want to use a single grammar in multiple applications or want to change folders on the production server.

There is a simple explanation and an easy way to choose relative paths or absolute paths for grammar rule references. If you open a grammar as a stand-alone file, the paths in RuleRef elements are absolute paths. However, if you open a grammar in a speech project, the paths in RuleRef elements are relative paths.

To get absolute paths for grammar rule references:

1. In Visual Studio .NET 2003, click New on the File menu, and then click File.
2. In the New File dialog box, select Speech in the Categories pane.
3. In the Templates pane, select Grammar File, and then click Open.
4. In Grammar Explorer, double-click Rule 1.
5. In the Rule Editor, add a List element.
6. Add a RuleRef element to the Phrase element.
7. Right-click the RuleRef element, select Set Target Rule, and then click Browse.
8. In the Open Grammar File dialog box, select a grammar file, and then click Open.
9. In the Rule Browser dialog box, select a rule, and then click Set Target Rule.

In the Properties window, notice that the URI property is now set to a value similar to file:///C:/MyGrammarFiles/TestGrammar.grxml#InvoiceRule. This is an absolute file reference.

To get relative paths for grammar rule references:

1. In Visual Studio .NET 2003, open a speech application.
2. In Solution Explorer, double-click a grammar file in the Grammars folder.
3. In the Rule Editor, double-click a rule.
4. In the Rule Editor, add a RuleRef element to an element already present in the designer.
5. Right-click the RuleRef element, select Set Target Rule, and then click Browse.
6. In the Pick a Grammar URL dialog box, click Browse, select a grammar file, click Open, and then click OK.
7. In the Grammar Editor warning dialog box, click Continue.
8. In the Rule Browser dialog box, select a rule, and then click Set Target Rule.

In the Properties window, notice that the URI property is now set to a value similar to TestGrammar.grxml#InvoiceRule. This is a relative file reference.

You can easily choose whether grammar rule references are absolute or relative. Absolute references are created in stand-alone grammar files. Relative rule references are created in grammar files contained in a speech project.
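The practical difference shows up when a grammar moves. The sketch below uses the standard WHATWG URL API, built into Node.js and browsers, to resolve both kinds of RuleRef URI against the URI of the grammar that contains them. It assumes that relative RuleRef URIs are resolved against the containing grammar's URI, which is how the SRGS specification describes relative rule references. The names Main.grxml, prodserver, and MyApp are hypothetical. The absolute URI is written with a slash after the drive letter, as a well-formed file URI requires.

```javascript
// Sketch of standard URI resolution for RuleRef URIs (WHATWG URL API).
// Assumption: relative RuleRef URIs resolve against the containing grammar's URI.
// Main.grxml, prodserver, and MyApp are hypothetical names.
function resolveRuleRef(ruleRefUri, containingGrammarUri) {
  const resolved = new URL(ruleRefUri, containingGrammarUri);
  return {
    grammar: resolved.href.split("#")[0], // grammar file, without the fragment
    rule: resolved.hash.slice(1),          // rule id after "#"
  };
}

const relativeRef = "TestGrammar.grxml#InvoiceRule";
const absoluteRef = "file:///C:/MyGrammarFiles/TestGrammar.grxml#InvoiceRule";

// The same containing grammar, during development and after deployment.
const devGrammar = "file:///C:/MyGrammarFiles/Main.grxml";
const prodGrammar = "http://prodserver/MyApp/Grammars/Main.grxml";

console.log(resolveRuleRef(relativeRef, devGrammar).grammar);
// file:///C:/MyGrammarFiles/TestGrammar.grxml
console.log(resolveRuleRef(relativeRef, prodGrammar).grammar);
// http://prodserver/MyApp/Grammars/TestGrammar.grxml
console.log(resolveRuleRef(absoluteRef, prodGrammar).grammar);
// file:///C:/MyGrammarFiles/TestGrammar.grxml
```

The relative reference follows the grammar to its new location. The absolute reference still points at the original development folder, which is why it needs editing after a move.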

Q.How can I enter and record prompt text in the prompt database?
A.

You know Microsoft Speech Application SDK 1.1 has great tools, but you probably don’t know it can automatically populate your prompt database for you. You might be entering prompts manually into a prompt database after the prompts are added to an application. There's an easier way. Use prompt validation to identify all the prompts in your application, and then click Add All to Database to automatically populate the transcription and extraction windows. When that's done, just click Record All to record your prompts.

To automatically populate a prompt database:

1. Create a new speech Web application.
2. Add a QA control, and then add a prompt to the control.
3. Open the prompt database.
4. On the Prompt Editor toolbar, click Prompt Validation.
5. On the Prompt Validation toolbar, click Do Validate Solution.
6. In the Solution Prompt Validation dialog box, select the project, and then click OK.
7. On the Output Window toolbar, click Add All to Database.
8. Select a row in the prompt database, and then click Record All on the Prompt Editor toolbar.

Prompt validation finds all the prompts that could possibly be called by the application. To automatically add the missing prompts to the database, click Add All to Database.

Q.How do I set TTS volume and speed?
A.

You may want to make a global change to the speed or volume of text-to-speech (TTS) prompts in your application. You can do this easily by changing parameters in the Speechify configuration files and then restarting the Speechify Voice service.

To set TTS volume and speed:

1. In a text editor, open the configuration file for the application's TTS voice. For example, the Jill voice configuration file for English (United States) applications is Ojill8.xml. By default, Ojill8.xml is located in the folder Program Files\Common Files\SpeechEngines\ScanSoft\Speechify\en-US\jill.
2. To change the TTS speaking rate, find and change the value attribute for tts.audio.rate. The default value is 100.
3. To change the TTS volume, find and change the value attribute for tts.audio.volume. The default value is 30.
4. Save the file, and then restart the appropriate Speechify Voice service as described in the following steps.
5. On the Windows® taskbar, click Start, point to Administrative Tools, and then click Services.
6. In the Services pane, right-click Speechify Voice - voice, where voice is the voice service you want to restart, and then click Restart.

Use the ssml:prosody element to change the speed and volume of individual prompts. Use the TTS voice's configuration file to make a global change to TTS characteristics.
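For the per-prompt approach, a prompt function can wrap its text in an SSML prosody element. The sketch below is a hypothetical JavaScript helper, not an SASDK API. It accepts only the named rate and volume values that SSML 1.0 defines for prosody. It also assumes that your prompt markup accepts the element written as prosody, without the ssml: prefix. Check the prompt markup reference in the MSS documentation for the exact form your prompts require.

```javascript
// Hypothetical prompt helper; not an SASDK API.
// Only the named values defined for the SSML 1.0 prosody element are accepted.
const RATES = ["x-slow", "slow", "medium", "fast", "x-fast", "default"];
const VOLUMES = ["silent", "x-soft", "soft", "medium", "loud", "x-loud", "default"];

// Escapes characters that are special in XML text and attribute values.
function escapeXml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Wraps prompt text in <prosody> when a rate or volume is given;
// otherwise returns the escaped text unchanged.
function withProsody(text, { rate, volume } = {}) {
  const attrs = [];
  if (rate !== undefined) {
    if (!RATES.includes(rate)) throw new RangeError(`unsupported rate: ${rate}`);
    attrs.push(`rate="${rate}"`);
  }
  if (volume !== undefined) {
    if (!VOLUMES.includes(volume)) throw new RangeError(`unsupported volume: ${volume}`);
    attrs.push(`volume="${volume}"`);
  }
  const body = escapeXml(text);
  return attrs.length > 0 ? `<prosody ${attrs.join(" ")}>${body}</prosody>` : body;
}

console.log(withProsody("Please hold while I transfer your call.", { rate: "slow", volume: "loud" }));
// <prosody rate="slow" volume="loud">Please hold while I transfer your call.</prosody>
```

Limiting the helper to named values keeps each prompt's change relative to the voice's configured defaults. The global settings therefore stay in one place, the configuration file.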

Q.How do I resolve a 401 Error with Telephony Application Services?
A.

In Microsoft® Speech Server (MSS) 2004 R2, requests from Telephony Application Services (TAS) to Speech Engine Services (SES) may result in the following error, seen in the Application Log in Event Viewer.

"A call failed because SES URL 'http://<application>/SES/Lobby.asmx' could not be found. Please ensure that the TAS SpeechServer property is correct. The following error was returned: 80131509: 'The request failed with HTTP status 401: Unauthorized. (System.Net.WebException)'."

Software updates can unexpectedly change Internet Information Services (IIS) authentication settings. If this happens, you might need to restore the desired settings manually, as described in the following procedure.

To reset Windows authentication:

1. On the Windows taskbar, click Start, right-click My Computer, and then click Manage.
2. In the Computer Management dialog box, in the tree view pane, expand Services and Applications, expand Internet Information Services, and then expand Web Sites.
3. Under Web Sites, browse to the MSS application directory, right-click the application directory, and then click Properties.
4. In the application's Properties dialog box, on the Directory Security tab, locate Authentication and access control, and then click Edit.
5. In the Authentication Methods dialog box, locate Authenticated access, clear the Integrated Windows authentication check box, and then click OK.
6. Once again, in the application's Properties dialog box, on the Directory Security tab, locate Authentication and access control, and then click Edit.
7. In the Authentication Methods dialog box, locate Authenticated access, and then select the Integrated Windows authentication check box.
8. Click OK twice to return to the Computer Management dialog box.

Clearing Integrated Windows authentication and applying the change, and then setting it back and applying the restoration, resets IIS so that Lobby.asmx is accessible.

Q.How do I easily compare widely-separated prompt database values?
A.

A Microsoft Speech Server application prompt database contains sixteen columns and can contain thousands of rows of data. You might want to compare the value in the first column of the first row with the value in the sixteenth column of a row that is hundreds or thousands of rows further down. You could open two instances of Visual Studio to see two separate views of the prompt database, but there's an easier way, explained in this tip.

To view widely-separated prompt database fields:

1. In Visual Studio, open a prompt database, and then select the first row.
2. Make sure the Properties window is visible; if it is not, press F4 to open it. Note that the values for all sixteen columns of the selected row appear in the Properties window.
3. Use the scroll bars to navigate to the last row in the database. Without selecting the last row, scroll left and right to view all columns in the last row. Note that while you can freely scroll to view any column in any row, all values from the selected row remain visible in the Properties window.

Use the Properties window in combination with the Prompt Editor pane to display fields that otherwise are not visible at the same time.

Q.When I view an event log for a Microsoft Speech Server (MSS) system in a different time zone, the times shown for the events in the Event Viewer are not correct. How do I view the correct local time for the events as they occurred on the remote computer?
A.

If you need to troubleshoot a problem on an MSS system on a remote computer, you may need to view the event log for that remote system on a computer that is in a different time zone. When you do this, the times shown for the events in Event Viewer are offset by the difference between the time zones. For example, an event that occurred at 1:00 A.M. on a remote computer in the U.S. Eastern time zone would appear to have occurred at 10:00 P.M. the previous day (a 3-hour difference) if viewed on a computer in the Pacific time zone. This difference occurs for two reasons:

Event times are stored in the event log file (a file with an .evt extension) in Coordinated Universal Time (UTC), a time scale similar to Greenwich Mean Time (GMT).

Event Viewer converts the UTC event time recorded in the log file to the time zone of the computer used to view the event log, not the time zone of the computer that originated the events.
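You can reproduce these two rules with the standard JavaScript Intl API, which converts a stored UTC instant into any IANA time zone. In this sketch, the event time and the zone names are illustrative assumptions. The event time is 06:00 UTC on January 15, 2005, which is 1:00 A.M. Eastern Standard Time. Event Viewer itself is not involved.

```javascript
// Event logs store event times in UTC; Event Viewer displays them in the
// viewing computer's time zone. This sketch reproduces the effect with Intl.
// The event time and zone names are illustrative assumptions.

// Hour of day (0-23) of a UTC instant as seen in an IANA time zone.
function hourIn(date, timeZone) {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    hour: "numeric",
    hourCycle: "h23",
  }).formatToParts(date);
  return Number(parts.find((p) => p.type === "hour").value);
}

// Human-readable timestamp in an IANA time zone.
function formatIn(date, timeZone) {
  return new Intl.DateTimeFormat("en-US", {
    timeZone,
    year: "numeric",
    month: "short",
    day: "numeric",
    hour: "numeric",
    minute: "2-digit",
    timeZoneName: "short",
  }).format(date);
}

// The UTC instant recorded in the log: 06:00 UTC on January 15, 2005.
const storedUtc = new Date(Date.UTC(2005, 0, 15, 6, 0, 0));
const remoteZone = "America/New_York";    // computer where the event occurred
const viewerZone = "America/Los_Angeles"; // computer running Event Viewer

console.log("Occurred: ", formatIn(storedUtc, remoteZone));
console.log("Displayed:", formatIn(storedUtc, viewerZone));
console.log(hourIn(storedUtc, remoteZone), hourIn(storedUtc, viewerZone)); // 1 22
```

The stored instant never changes; only the zone used to display it does. Changing the local computer's time zone, as described next, amounts to setting viewerZone equal to remoteZone.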

This Event Viewer behavior can cause considerable confusion when the exact local time of an event is an important part of troubleshooting the problem. To see event times as they occurred on the remote computer, set the time zone on the local computer to match the time zone of the remote computer. This action forces Event Viewer to calculate event times relative to the time zone of the computer on which the events occurred.

To set the time zone:

1. On the local computer, click Start, click Control Panel, and then double-click Date and Time.
2. Click the Time Zone tab, and then choose the remote computer's time zone from the drop-down list.
3. Click Apply, and then click OK.

Note: When you are done viewing events that originated on the remote computer, set the local computer time zone back to the correct local time.

An alternative solution is to save the event log on the remote computer in text file format (a file with a .txt extension) or comma-separated value format (a file with a .csv extension). Event Viewer then writes the actual local time of the events to the saved file, instead of UTC.

To save an event log as a .txt or .csv file:

1. In the Event Viewer console tree, click the log you want to save.
2. On the Action menu, click Save Log File As.
3. In File name, enter a name for the archived log file.
4. In Save as type, select the .txt or .csv file format, and then click Save.

Q.How does the speech recognition engine in Microsoft Speech Server 2004 treat abbreviations, digit strings, dollar amounts, etc?
A.

To recognize the words and phrases specified in a grammar, the speech recognition (SR) engine in Microsoft Speech Server 2004 needs to look up the pronunciation of each word in the grammar. Your grammar might contain abbreviations like "Mr. Smith", digit strings like "123", or dollar amounts like "$34.05". If it does, the SR engine first converts these strings into one or more unambiguous sequences of words, in a process called text normalization. For example, the SR engine converts the string "123" into the word sequence "one hundred and twenty three". Once the string is converted, the SR engine can look up the pronunciation of each individual word and use it in the recognition process.

Converting a string like "123" is not always as straightforward as turning it into "one hundred and twenty three", though. Other valid interpretations might be "one two three", "hundred twenty three", "one twenty three", or "twelve three". Similarly, the abbreviation "Dr." might mean "Doctor" or "Drive", and its correct interpretation depends on context, which can be complicated to determine.

Therefore, it is better to spell out abbreviations, digit strings, and dollar amounts explicitly in your grammar rather than rely on the SR engine to guess the appropriate phrase for them.
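If you generate grammar phrases programmatically, for example from a list of extensions or prices, a small helper can produce the explicit word sequences so that the SR engine does not have to guess. The sketch below is a hypothetical JavaScript helper, not part of the SASDK or Microsoft Speech Server. It reads whole numbers from 0 to 999,999 as quantities. It can also read a digit string one digit at a time with "oh" for zero, mirroring two of the normalized forms listed below. It produces only one reading per number, so add other readings, such as "fifteen hundred" or forms with "and", as separate grammar items if callers might say them.

```javascript
// Hypothetical helpers for writing explicit number phrases into a grammar.
// Not part of the SASDK or Microsoft Speech Server.
const ONES = [
  "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
  "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
  "seventeen", "eighteen", "nineteen",
];
const TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"];

// Reads 0-999 as a quantity, for example 925 -> "nine hundred twenty five".
function under1000(n) {
  const words = [];
  if (n >= 100) {
    words.push(ONES[Math.floor(n / 100)], "hundred");
    n %= 100;
  }
  if (n >= 20) {
    words.push(TENS[Math.floor(n / 10)]);
    if (n % 10 > 0) words.push(ONES[n % 10]);
  } else if (n > 0 || words.length === 0) {
    words.push(ONES[n]);
  }
  return words.join(" ");
}

// Reads 0-999,999 as a quantity (without "and"), for example
// 12340 -> "twelve thousand three hundred forty".
function numberToWords(n) {
  if (!Number.isInteger(n) || n < 0 || n > 999999) {
    throw new RangeError("this sketch supports whole numbers 0-999,999 only");
  }
  const thousands = Math.floor(n / 1000);
  const rest = n % 1000;
  if (thousands === 0) return under1000(rest);
  const head = `${under1000(thousands)} thousand`;
  return rest === 0 ? head : `${head} ${under1000(rest)}`;
}

// Reads a digit string one digit at a time, saying "oh" for zero,
// for example "12340" -> "one two three four oh".
function spellDigits(digits) {
  if (!/^\d+$/.test(digits)) throw new TypeError("expected a string of digits");
  return [...digits].map((d) => (d === "0" ? "oh" : ONES[Number(d)])).join(" ");
}

console.log(numberToWords(925));   // nine hundred twenty five
console.log(numberToWords(12340)); // twelve thousand three hundred forty
console.log(spellDigits("12340")); // one two three four oh
```

Each resulting string can then be written into the grammar as the content of an item element, so the grammar contains only plain words.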

The following examples show how the SR engine normalizes phrases for US English in Microsoft Speech Server 2004. Some examples have multiple normalized forms; in these cases, all forms are used as valid phrases in the grammar. This list is not exhaustive, but it covers the most frequent and interesting cases:

Numbers

Less than 1,000

Example        Normalized Form
925            nine hundred twenty five
925            nine hundred and twenty five

1000 to 9999 (because they could be interpreted as years)

Example        Normalized Form
1905           nineteen oh five
2002           two thousand two
1500           fifteen hundred

10,000 to 4,000,000,000

Example        Normalized Form
12340          one two three four oh
12,340         twelve thousand three hundred forty
12,340         twelve thousand three hundred and forty

Above 4,000,000,000

Example        Normalized Form
12345678910    one two three four five six seven eight nine one oh

Decimal

Example        Normalized Form
92.5           nine two point five
92.50          nine two point five oh
12345.6        one two three four five point six
12,345.6       twelve thousand three hundred forty five point six

Dollar Amounts

Example        Normalized Form
$35.23         thirty five dollars and twenty three cents
$1             one dollar
$0.50          fifty cents
$45,000        forty five thousand dollars
$45000         dollar four five oh oh oh

Abbreviations
These are case-insensitive.

Example        Normalized Form
assoc          association
bldg           building
ch             chapter
cont           continued
cont           cont
corp           corporation
corp           corp
etc            etcetera
intl           international
jr             junior
mr             Mister
mrs            Missus
miss           Miss
mt             mountain
oz             ounce
oz             oz
pres           president
pres           pres
sec            S. E. C.
sec            second
sec            seconds
sq             square
sq             S. Q.
sr             senior
sr             S. R.
vol            volume
vol            vol

Symbols

Example        Normalized Form
!              exclamation-point
"              quote
#              pound-sign
$              dollar
%              percent
&              ampersand
'              quote
(              paren
)              close-paren
*              asterisk
+              plus
,              comma
--             double-dash
-              hyphen
...            ellipsis
.              dot, period
/              slash
:              colon
;              semicolon
<              less-than
=              equals
>              greater-than
?              question-mark
@              at-sign
[              bracket
\              back-slash
]              close-bracket
^              circumflex
_              underscore
`              back-quote
{              left-brace
|              vertical-bar
}              right-brace
~              tilde

Q.How can I get my JScript files to support multiple character sets?
A.

In multilanguage projects, JScript files from any editor must be saved as Unicode (UTF-8 with signature) - Codepage 65001. In particular, when you save JScript files in Visual Studio .NET 2003, you must make this selection every time you save the file; otherwise, the encoding will be incorrect. If the file is saved with the wrong encoding, one possible result is that extended characters are incorrectly stripped from strings.

Visual Studio provides a setting that makes this the default setting whenever JScript files are saved. See the following procedure for details.

To set JScript file encoding in Visual Studio .NET 2003:

1. In Visual Studio .NET 2003, on the File menu, click Advanced Save Options.
2. In the Advanced Save Options dialog box, in the Encoding list, select Unicode (UTF-8 with signature) - Codepage 65001.
3. Click OK.
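"UTF-8 with signature" means that the file begins with the UTF-8 byte order mark, the bytes EF BB BF. To check a batch of JScript files outside Visual Studio, you can use a minimal Node.js sketch like the following. It detects the signature and can produce correctly encoded content. It is a hypothetical helper, and reading or writing actual files is left to the caller; the file path in the final comment is illustrative only.

```javascript
// Minimal Node.js sketch: detect or add the UTF-8 signature (byte order mark).
const UTF8_SIGNATURE = Buffer.from([0xef, 0xbb, 0xbf]);

// True if the bytes start with EF BB BF.
function hasUtf8Signature(bytes) {
  return bytes.length >= 3 && bytes.subarray(0, 3).equals(UTF8_SIGNATURE);
}

// Encodes text as UTF-8 and prepends the signature unless the text already has one.
function encodeWithUtf8Signature(text) {
  const body = Buffer.from(text, "utf8");
  return hasUtf8Signature(body) ? body : Buffer.concat([UTF8_SIGNATURE, body]);
}

// Example: script text containing extended characters.
const saved = encodeWithUtf8Signature('var greeting = "Grüß Gott";');
console.log(hasUtf8Signature(saved));                     // true
console.log(hasUtf8Signature(Buffer.from("var x = 1;"))); // false

// To check a real file (hypothetical path):
// const fs = require("fs");
// console.log(hasUtf8Signature(fs.readFileSync("Scripts/Prompts.js")));
```

A file that lacks the signature is the kind of file that Visual Studio might not treat as UTF-8. Such a file is worth re-saving with the encoding set as shown above.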

Q.How Can I Record Messages Longer Than 20 Seconds?
A.

In Microsoft Speech Server 2004, use the RecordSound control to record user speech. By default, recording with the RecordSound control ends after 20 seconds. If you want to record messages longer than 20 seconds, there are three properties you can set to increase the timeout.

The EndSilence, BabbleTimeout, and MaxTimeout properties interact with each other to set the recording timeout. The default values for these properties are listed in the following table.

Property         Default Value
EndSilence       1000 milliseconds
BabbleTimeout    20000 milliseconds
MaxTimeout       120000 milliseconds

The three properties interact in the following ways:

The EndSilence property sets the maximum length of any silent period after the user starts speaking. It is used to determine the end of user speech.

The BabbleTimeout property sets the maximum time for recording the user's speech, beginning at the point that speech is detected.

The MaxTimeout property sets the maximum total time that can be recorded and must be equal to or greater than the sum of EndSilence and BabbleTimeout.

For example, assuming that the EndSilence and MaxTimeout properties are at their defaults, to record a message up to 30 seconds the only change needed is to set the BabbleTimeout property to 30000.
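You can capture the arithmetic in this example in a small helper. This is a hypothetical JavaScript sketch, not an SASDK API. It uses the property names and defaults from the table above and sets BabbleTimeout to the desired message length. It raises MaxTimeout only when needed to keep MaxTimeout at least EndSilence + BabbleTimeout.

```javascript
// Hypothetical helper, not an SASDK API. Values are in milliseconds and use
// the RecordSound property names and defaults from the table above.
const RECORDSOUND_DEFAULTS = { EndSilence: 1000, BabbleTimeout: 20000, MaxTimeout: 120000 };

// Returns property values that allow up to maxSeconds of recorded speech.
function recordSoundTimeouts(maxSeconds, current = RECORDSOUND_DEFAULTS) {
  if (!(maxSeconds > 0)) throw new RangeError("maxSeconds must be positive");
  const EndSilence = current.EndSilence;
  // BabbleTimeout limits recording time measured from the start of speech.
  const BabbleTimeout = Math.round(maxSeconds * 1000);
  // MaxTimeout must be >= EndSilence + BabbleTimeout; raise it only if needed.
  const MaxTimeout = Math.max(current.MaxTimeout, EndSilence + BabbleTimeout);
  return { EndSilence, BabbleTimeout, MaxTimeout };
}

const thirty = recordSoundTimeouts(30);
console.log(thirty.BabbleTimeout, thirty.MaxTimeout); // 30000 120000

const twoAndAHalfMinutes = recordSoundTimeouts(150);
console.log(twoAndAHalfMinutes.BabbleTimeout, twoAndAHalfMinutes.MaxTimeout); // 150000 151000
```

The first call matches the 30-second example. The second call shows the case where the defaults are no longer enough. With EndSilence at 1000 ms, any message longer than 119 seconds also requires a larger MaxTimeout.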

When the value of any of these properties is exceeded, recording ends and a file of the type specified by the Type property is written to the folder specified by the SavePath property. If the values of the BabbleTimeout or MaxTimeout properties are exceeded, the recording is only written to the file if the SavePartialRecording property is set to True.

Q.How Do I Send a Fax Using an ASP.NET Speech-Enabled Web Application?
A.

Sending a fax using an ASP.NET speech-enabled Web application is as easy as sending a fax using a non-speech-enabled ASP.NET Web application. This article briefly discusses the major tasks required to create a speech-enabled fax-back application, provides fax service implementation details, and points out several security issues.

Task Overview
In this scenario, we would like a customer to be able to call our fax-back service, listen to a list of document titles, choose a document, and receive a faxed copy of the document at a fax number that the customer provides. In order to do this, the application must be able to perform the following tasks:

1. Load the document titles, present the list, and get the customer's selection.
2. Elicit and confirm the fax number to which the customer wants the document sent.
3. Fax the requested document.

Using the Microsoft Speech Application SDK (SASDK), you can easily accomplish the first two tasks. The SASDK includes Application Speech Controls that are well suited for these tasks.

Use the DataTableNavigator Speech Control to accomplish the first task. For a simple implementation, read the document titles directly from an XML file into a DataSet, and then bind the DataSet to the DataTableNavigator control. For a more sophisticated implementation, select the document information from a database table, fill a DataSet, and then bind the DataSet to the DataTableNavigator control. Alternatively, you can construct the document by using pieces of information from various sources.

For the second task, use the Phone Speech Control to get and confirm the customer's fax number. Assuming that the actual faxing is performed in a server-side event handler such as Page_Unload, the responses provided by the customer (such as selected document title and fax number) must be posted back to the server. The easiest way to do this is to enable AutoPostBack in the speech SemanticMap properties for each of the semantic items corresponding to the customer's responses (for example, document title, area code, and local number).

Implementing the Fax Service
Faxing the document is the most challenging task because the .NET Framework does not provide fax services. To work around this limitation, use the Fax Service Extended COM API with early binding through .NET COM interop. Although you can use .NET reflection and late binding to access these COM objects, early binding yields better performance and also makes the code easier to write.

In order to use the Fax Service Extended COM API, you must first use Windows Setup to install the Fax Service component on the host computer. On computers running Microsoft Windows XP and Windows Server 2003, the installed Fax Service Extended COM API is provided by the file fxscomex.dll, which is usually found in the Windows\System32 directory. If you are creating your page with Microsoft Visual Studio .NET 2003, add a reference to this DLL in your project so that Visual Studio imports the DLL's COM objects as .NET classes. If you are using a program other than Visual Studio, create the interop assembly with the TLBIMP (Type Library Importer) utility. In either case, make sure that the interop assembly is in the bin subdirectory of your fax-back application's Web host virtual directory so that ASP.NET can load it when the application is first accessed.

The Fax API requires a physical file path to the document file that is to be faxed. If your documents are stored in a subdirectory of the Web host virtual directory, you can map document files to a physical file path using the MapPath() function that your fax-back application inherits from the Page class as follows:

string ProsewareDataPath = this.MapPath(".");

The following code illustrates how to fax the document file. In this example, the fax-back application uses a dialing prefix to place a call outside of the fictitious company Proseware, and uses a remote fax server to send the fax.

// Requires the interop namespace to be imported: using fxscomexassembly;
FaxDocument objFaxDoc = new FaxDocumentClass();
FaxServer objFaxServer = new FaxServerClass();

objFaxDoc.Body = String.Format(@"{0}\{1}", ProsewareDataPath, SelectedDocFile);
objFaxDoc.DocumentName = "Proseware FaxBack Document";
objFaxDoc.Priority = fxscomexassembly.FAX_PRIORITY_TYPE_ENUM.fptNORMAL;              
   // 0 == low, 1 == normal, 2 == high

string dialoutPrefix = "9";	
string faxRecipientNumber = String.Format("{0}{1}",
	dialoutPrefix,
	siFaxNumberLocalDigits.Text);

objFaxDoc.Recipients.Add(faxRecipientNumber, "Proseware Customer");
   // Adds the fax phone number and the name of addressee.

objFaxDoc.ReceiptType = fxscomexassembly.FAX_RECEIPT_TYPE_ENUM.frtNONE;         
   // 0 == no receipt, 1 == e-mail, 4 == message box

objFaxDoc.CoverPageType = fxscomexassembly.FAX_COVERPAGE_TYPE_ENUM.fcptLOCAL;         
   // 0 == no cover page, 1 == local cover page, 2 == server cover page

objFaxDoc.CoverPage = String.Format(@"{0}\Proseware.COV", ProsewareDataPath);       
   // The path to the cover page file. See MS Fax Server Cover Page editor.

objFaxDoc.Note = "Here is the document you requested.";    
   // The text of the note printed on the cover page.

objFaxDoc.ScheduleType = fxscomexassembly.FAX_SCHEDULE_TYPE_ENUM.fstNOW;        
   // 0 == "now" (as soon as possible), 1 == scheduled time, 
   // 2 == discounted period. See FaxOutgoingQueue.DiscountRateStart, etc.

objFaxDoc.Subject = String.Format("The document you requested: \"{0}\"", siSelectedDocName.Text);

   // All of the following lines set sender information:
objFaxDoc.Sender.Title = "Mr.";
objFaxDoc.Sender.Name = "Great Docs Fax Robot";
objFaxDoc.Sender.City = "Redmond";
objFaxDoc.Sender.State = "WA";
objFaxDoc.Sender.Company = "Proseware, Inc.";
objFaxDoc.Sender.Country = "USA";
objFaxDoc.Sender.Email = "FaxBackRobot@proseware.com";
objFaxDoc.Sender.FaxNumber = "11234567890";
objFaxDoc.Sender.HomePhone = "10987654321";
objFaxDoc.Sender.OfficeLocation = "Redmond";
objFaxDoc.Sender.OfficePhone = "12223334444";
objFaxDoc.Sender.StreetAddress = "Great Documents Library\nRedmond, WA 98052";
objFaxDoc.Sender.TSID = "ProsewareFAX";
objFaxDoc.Sender.ZipCode = "98052";
objFaxDoc.Sender.BillingCode = "NCC1701C";
objFaxDoc.Sender.Department = "Library Fax Support";

objFaxDoc.Sender.SaveDefaultSender();  
   // This saves the sender information for reuse if you want to send 
   // the document to multiple recipients using the same sender information. 

objFaxServer.Connect(@"REMOTEFAXSERVER01");
objFaxServer.Connect(@"REMOTEFAXSERVER01");
   // Connects to the fax server. See the second note following this code
   // sample for an explanation of why this method is called twice.

objFaxDoc.ConnectedSubmit(objFaxServer);
objFaxServer.Disconnect();

Note: Only computers running Windows Server 2003 can accept fax requests from remote client computers. If the fax server is running on a computer that is running Windows Server 2003, remote fax client computers cannot access the fax server through the Fax Service Extended COM API until you:

1. Share the "fax printer" on the computer running Windows Server 2003.

2. Add the "fax printer" to the remote fax client computer.

Note: A known bug in the FaxServer.Connect(FaxServerName) method causes the method to always create a connection to the local fax server, even if a remote fax server is specified in the parameter, even though the call appears to complete normally, and even though the FaxServer.ServerName property returns the name of the remote fax server. If the computer running the fax-back application is not running a fax server (that is, if there is no local fax server), the subsequent FaxDocument.ConnectedSubmit() call fails and throws an exception. To work around this problem, connect to a remote fax server by calling the FaxServer.Connect(FaxServerName) method twice, as illustrated in the previous code sample.

Fax-back Application Security Issues
As with any application, and particularly with an ASP.NET application, pay special attention to security issues. First, because the host page is executing server-side code (in its own account and security context) on behalf of an untrusted client, be careful not to expose sensitive information such as physical file paths and computer names to the client side.

Second, be aware that Fax Service faxes a document file by "printing" it as a temporary TIFF image using a Windows application that is:

Installed on the host computer.

Associated with the TIFF file type.

Set so that the ASP.NET process has rights to use it.

If the Windows application associated with the TIFF file type is compromised or replaced, "printing a fax" could compromise the host computer. For this reason, always ensure that the Windows TIFF application is properly protected and that the ASP.NET host process runs in a minimal security context.

Third, if you use a fax server computer that is independent of the ASP.NET host computer (the remote fax client computer), the remote fax client computer requires security rights to access the fax server, but not necessarily rights to anything else on the fax server computer. Always apply the principle of least privilege: grant only the minimum access needed to get the job done.

Be aware that by default, ASP.NET Web pages run in an application pool under the "Network Service" identity. The access permissions for this account are not sufficient to establish a connection to a remote fax server. Therefore, if you are developing a fax-back application that uses a remote fax server, you must do the following:

1. Create an application pool that runs under an account that is a member of the Users group (or a higher-privilege group if required).

2. Add the account that you assigned to the application pool to the IIS_WPG group. The IIS_WPG group contains the accounts that can run the IIS worker process, which is required for a remote fax server connection.

Q.How do I use Speech Application Error pages?
A.

To gracefully respond to unexpected errors in voice-only applications on Microsoft® Speech Server 2004 (MSS), use custom error pages. MSS uses two types of error pages: application and system error pages. When an unexpected Speech Control error occurs, the application error page runs. If a more serious error occurs, the system error page runs. This tip provides information about how to create and specify these two types of error pages.

Using the Application Error Page
The application error page might be a custom error page or the default error page. The default application error page, DefaultErrorPage.aspx, is installed by the Microsoft Speech Application SDK (SASDK) at \Inetpub\wwwroot\aspnet_speech\<build number>\client_script. The default application error page plays one of several text-to-speech (TTS) prompts, depending on the nature of the error.

If the quality of the TTS messages is acceptable but different message text is needed, copy the default application error page, rename the copy, and edit it. Do not edit the default page itself, because it is a resource used by all speech applications on the Web server. If the quality of the TTS messages is not adequate, replace the default page and its TTS prompts with a custom application error page containing QA controls that play recorded messages from the prompt database.

Specifying a Custom Application Error Page
To specify a custom application error page, use the appSettings tag in the Web.config file, as shown in the following example.

<configuration>
    <appSettings>
        <add key="errorpage" value="ErrorPage.aspx" />
    </appSettings>
</configuration>

Note: By default, the Web.config file is located in the application's root folder and is visible in the Solution Explorer window.

Using the System Error Page
Provide a system error page to prevent calls from disconnecting without warning in the event of an unexpected error that the platform cannot recover from, including the following:

Web server HTTP errors, such as 404 and 500 errors

Application page errors, such as a failure to create the Document Object Model or a failure to compile inline JScript® code

JScript run-time errors

Problems with Speech Engine Services (SES) requests, such as SES becoming unavailable during a call

Specifying a System Error Page
There are two ways to specify the system error page: using the Microsoft Management Console (MMC) snap-in for MSS and using the error page meta tag.

To specify the system error page in the MMC, use the Global Error Page URL setting. For more information, see Adding a Speech Application in the MSS Help file, MSS.chm.

To specify the system error page on an .aspx page, use a meta tag as in the following example.

<meta http-equiv="error-page" content="http://MyServer/MyApplication/SystemErrorPage.html"/>

The error page setting, whether made in the MMC or using the meta tag, specifies the error page that is stored in the cache. Only one system error page per SALT interpreter is stored in the cache. The setting persists until it is overridden. If the page is specified in the MMC, that setting lasts until a meta tag on an application page overrides it. The new setting persists until a different setting is encountered in a subsequent navigation.
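The persistence rules above can be summarized as a small state model. The following JavaScript sketch (JavaScript being close to the JScript used on SALT pages) is illustrative only: `SystemErrorPageSetting`, `navigate`, and `current` are hypothetical names, not MSS or SALT APIs, and the sketch assumes one instance per SALT interpreter, seeded with the MMC Global Error Page URL if one is set.

```javascript
// Hypothetical model of the cached system error page setting (not an MSS API).
// One instance stands for one SALT interpreter, which caches one page at a time.
class SystemErrorPageSetting {
  constructor(mmcGlobalErrorPageUrl = null) {
    // The MMC Global Error Page URL (if set) is the initial cached value.
    this.cachedUrl = mmcGlobalErrorPageUrl;
  }

  // Called on each navigation. metaErrorPageUrl is the content of the page's
  // error-page meta tag, or null if the page has no such tag.
  navigate(metaErrorPageUrl) {
    if (metaErrorPageUrl) {
      this.cachedUrl = metaErrorPageUrl; // a meta tag overrides the cached page
    }
    return this.cachedUrl; // otherwise the previous setting persists
  }

  current() {
    return this.cachedUrl;
  }
}

// Usage: the MMC setting holds until an application page overrides it.
const errorPageSetting = new SystemErrorPageSetting("http://MyServer/SystemErrorPage.html");
errorPageSetting.navigate(null); // still the MMC page
errorPageSetting.navigate("http://MyServer/MyApplication/SystemErrorPage.html");
errorPageSetting.navigate(null); // still the MyApplication page
console.log(errorPageSetting.current());
// prints http://MyServer/MyApplication/SystemErrorPage.html
```

The model deliberately keeps a single value per interpreter, which mirrors the statement that only one system error page per SALT interpreter is cached.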

Creating a System Error Page
The system error page should be designed so that it does not rely on external services such as Speech Engine Services (SES) and the Web server. To do this, provide a .wav file containing the error message, and play it by using SALT elements in the .html file that you specify as the system error page (for example, in the error-page meta tag described previously). See the following example.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html xmlns:SALT="http://www.saltforum.org/2002/SALT">
<head>
<meta name="GENERATOR" content="Microsoft Visual Studio .NET 7.1">
<meta name=ProgId content=VisualStudio.HTML>
<meta name=Originator content="Microsoft Visual Studio .NET 7.1">

<object id="Speechtags" CLASSID="clsid:DCF68E5B-84A1-4047-98A4-0A72276D19CC" VIEWASTEXT></object>
<?import namespace="salt" implementation="#Speechtags" ?>
<SALT:prompt id="SystemErrorPrompt">
  <SALT:content id="PromptContent" href="http://myServer/SystemError.wav" />
</SALT:prompt>

<script language=jscript>
function fnOnLoad()
{
  SystemErrorPrompt.Start();
}
</script>

</head>
<body onload="fnOnLoad()">
</body>
</html>

The .wav file must contain 8-kHz mono audio: either a-law or mu-law compressed audio, depending on the telephony standard of the locale where it is used, or, for better quality, 8-kHz 16-bit PCM. Store the system error page on the Web server.
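To catch format problems before deployment, you could check the .wav file header. The following JavaScript sketch reads the standard RIFF/WAVE `fmt ` chunk and accepts only the combinations listed above: 8-kHz mono a-law (format tag 6), mu-law (format tag 7), or 16-bit PCM (format tag 1, assumed to be mono as well). The function name `checkSystemErrorWav` is hypothetical, and the sketch is not an MSS validation tool; it inspects only the header, not the audio content.

```javascript
// Hypothetical header check for the system error page .wav file.
// Accepts only 8-kHz mono a-law, mu-law, or 16-bit PCM.
const WAVE_FORMAT_PCM = 1;
const WAVE_FORMAT_ALAW = 6;
const WAVE_FORMAT_MULAW = 7;

function readFourCC(view, offset) {
  let id = "";
  for (let i = 0; i < 4; i++) id += String.fromCharCode(view.getUint8(offset + i));
  return id;
}

// bytes: a Uint8Array holding (at least) the beginning of the .wav file.
function checkSystemErrorWav(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  if (bytes.byteLength < 12 || readFourCC(view, 0) !== "RIFF" || readFourCC(view, 8) !== "WAVE") {
    return { ok: false, reason: "not a RIFF/WAVE file" };
  }
  // Walk the chunks that follow the 12-byte RIFF header until "fmt " is found.
  let offset = 12;
  while (offset + 8 <= bytes.byteLength) {
    const chunkId = readFourCC(view, offset);
    const chunkSize = view.getUint32(offset + 4, true); // RIFF is little-endian
    if (chunkId === "fmt ") {
      if (offset + 24 > bytes.byteLength) return { ok: false, reason: "truncated fmt chunk" };
      const formatTag = view.getUint16(offset + 8, true);
      const channels = view.getUint16(offset + 10, true);
      const sampleRate = view.getUint32(offset + 12, true);
      const bitsPerSample = view.getUint16(offset + 22, true);
      if (channels !== 1) return { ok: false, reason: "not mono" };
      if (sampleRate !== 8000) return { ok: false, reason: "not 8 kHz" };
      if (formatTag === WAVE_FORMAT_ALAW && bitsPerSample === 8) return { ok: true, encoding: "a-law" };
      if (formatTag === WAVE_FORMAT_MULAW && bitsPerSample === 8) return { ok: true, encoding: "mu-law" };
      if (formatTag === WAVE_FORMAT_PCM && bitsPerSample === 16) return { ok: true, encoding: "pcm16" };
      return { ok: false, reason: "unsupported encoding" };
    }
    offset += 8 + chunkSize + (chunkSize % 2); // chunks are padded to even length
  }
  return { ok: false, reason: "no fmt chunk" };
}
```

In Node.js, for example, you could pass the bytes returned by `require("fs").readFileSync("SystemError.wav")`, because a Node.js `Buffer` is a `Uint8Array`.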

Conclusion
Error pages that play .wav files are a good alternative to the default SASDK error page or to calls disconnecting without warning. Use the described techniques to specify and create application and system error pages. Remember that system and application error pages have different scopes, respond to different types of errors, and are specified separately.

Q.How do I reduce false barge-in issues caused by prompt echo?
A.

Barge-in is an important feature of Microsoft® Speech Server 2004 (MSS) that allows the caller to interrupt a prompt. One of the main benefits of barge-in is that prompts can be designed so that novice users get sufficient guidance, while repeat users can quickly move through the application. However, using barge-in can sometimes be problematic. One of the main sources of barge-in problems is the presence of prompt echo. It can cause prompt playback to suddenly stop without the caller's intervention (false barge-in) and usually results in an incorrect recognition by the system.

This article discusses the causes of prompt echo, how to verify prompt echo by using log analysis tools, the steps you can take to reduce prompt echo, and finally, issues that should be considered when disabling barge-in.

Causes of prompt echo
Prompt echo is caused by one of two conditions:

An impedance mismatch at an analog connection point, which leads to a partial reflection of the prompt signal. This is typically created in one of two places:

Between an analog handset used by the caller and the Central Office.

Between an analog telephony card and the Private Branch Exchange (PBX).

Acoustic echo between the telephone speaker and the telephone microphone. In this case, the microphone hears the prompt being played by the speaker. This typically occurs when using speaker phones. Speaker phones may employ echo-elimination techniques such as half-duplex (in which speaker and microphone take turns and are not turned on at the same time) or acoustic echo cancellation.

False barge-in caused by an analog handset occurs only on some calls, because only some analog handsets produce an echo loud enough to cause it. It is also generally an issue only for local calls, because telephony service providers are required to provide network echo cancellation on long-distance telephone calls, which reduces the echo enough to avoid false barge-in. The delay of the prompt echo increases with the distance to the echo source, so when the echo originates at the caller's end, the echo delay is often greater than the telephony card echo canceller (available on both analog and digital telephony cards) can effectively eliminate.

In contrast, echo caused by the connection between an analog telephony card and the PBX typically results in consistent false barge-in, regardless of the type of phone used to call the system. Some amount of echo from this connection is unavoidable. Therefore, analog telephony cards are equipped with an echo-canceling feature that can usually reduce this echo significantly. However, if there is a significant impedance mismatch between the telephony card and the PBX, the echo may be too strong to be removed sufficiently. In this case, contact your Intel representative for assistance. For more information, see the Intel telecom support resource document titled Alternate Impedance and Gain settings for the DMV160LP and D/41JCT Boards.

Verifying the presence of echo by using the log analysis tools
You can easily verify the presence of significant echo by logging the recognition audio and playing it back.

You can turn on recognition audio logging by opening a command window on the computer running MSS, and changing the directory to the following folder:

%programfiles%\Microsoft Speech Server\Administrative Tools\Scripts

and running the MSSLogConfig.vbs file, using the following command:

cscript MSSLogConfig.vbs /activate /filter:RecognitionAudio

To play back the audio data, use Microsoft Log Analysis Tools for Speech Applications, which is included as a redistributable installer in the Microsoft Speech Application SDK Version 1.1 (SASDK). After a few calls have been received, use the MSSContentExtract log analysis tool to extract files containing the logged audio. The audio files can be played in most standard audio players. If significant prompt echo is present, it will be audible in the logged audio files.

For more information on how to set up logging and use the MSSContentExtract tool, see "Log Analysis Tools" in SASDK Help. The Help file also describes how to set up additional logging and how to use the CallViewer tool, which enables you to analyze the events logged by MSS.

Reducing echo
If echo caused by an analog handset is creating problems with your speech application, there are a few things you can do to reduce prompt echo:

Reduce the prompt volume to as low a level as possible, since this in turn lowers the volume of the prompt echo.

Use a toll-free number, because telephony service providers generally use network echo cancellation on all calls to such numbers. Check with your telephony service provider to verify that it uses echo cancellation with toll-free numbers.

Use an external echo canceller to reduce echo. An external echo canceller is a hardware device that can be inserted between the PBX and the telephony card. Since it is mainly used by network providers to provide network echo cancellation, it is only available for T1 connections.

Disabling barge-in
In most cases, the procedures outlined above will enable you to avoid false barge-ins caused by prompt echo. However, it is sometimes necessary to disable barge-in, for example when it is important to guarantee that the caller hears the entire prompt. This requires special care to avoid the recognition problems that can occur when a caller starts speaking too soon after a prompt has been played.

When barge-in is disabled, the system does not begin to listen to the caller until it finishes playing the prompt. If the caller begins speaking before the prompt is complete, the system misses the beginning of the caller's utterance. Additionally, if the prompt ends with silence, the silence may make the caller think the system is ready for a response when it is not. This timing is critical, because callers often speak immediately after they think the prompt has finished. Responses that the system partially cuts off cause misrecognitions, and completely missed responses make the system seem unresponsive. Both of these errors can occur for no reason apparent to the caller.

One method of ensuring proper timing is to set the beep property on prompt elements. This will cause the telephony card to play a beep right before it starts listening to the caller. This turn-taking cue is quickly picked up by users and they adapt their behavior to speak only after hearing the beep. Since this beep is generated by the telephony card, it ensures that the system is listening right after the beep, therefore eliminating any timing issues between the end of the prompt and the start of the listen.

Conclusion
Barge-in is an extremely useful feature that you can use to accommodate both novice and experienced callers. Being able to recognize, and possibly mitigate, the types of false barge-in issues that prompt echo can introduce into your applications gives you an advantage in developing more effective speech solutions.

Q.How do I export a .wav File From a Prompt Database?
A.

It is well known that you can use the Speech Prompt Editor in Microsoft Speech Application SDK Version 1 to import .wav files into a prompt database. Many users don't realize that you can also export .wav files by using the Wave Editor, a tool included with the Speech Prompt Editor.

To export a .wav file from a prompt database

In the Speech Prompt Editor, double-click a .wav icon in the Transcription pane. The Wave Editor opens, displaying wave boundaries and tuning alignments for the selected .wav file.

In the Wave Editor, on the File menu, click Save Prompts.promptdb As.

In the Choose File Format dialog box, select a recording format and sampling frequency, and then click OK.

In the Save Copy As dialog box, select a save location, type a file name, and then click Save.

Q.How Can I Record Prompts for Application Speech Controls?
A.

You can spend significant time and money getting the prompts for a speech application recorded by professional voice talent. Then, if you add Application Speech Controls to the project, you find that the default prompts for those controls play as text-to-speech (TTS). As a result, application user experience is inconsistent, with users hearing a mixture of both professionally recorded prompts and TTS prompts.

Application Speech Controls are a valuable tool for speech application development. They make it easy to add frequently used functionality, such as collecting phone numbers and dates, to an application. The challenge in this case is to get Application Speech Control prompts to speak in the same voice as the rest of the application.

Speech Prompt Editor offers an easy and convenient, but little-known solution: import the transcriptions from the Application Speech Control, add them to the prompt database, and then record the transcriptions using the same voice talent you use to record the rest of the application prompts.

To import transcriptions from an Application Speech Control

1. In Solution Explorer, open a prompt project, and then double-click a prompt database.

2. In Speech Prompt Editor, select Add New Item on the File menu.

3. In the Add New Item – <prompt project name> dialog box, select Prompt Project Items in the left pane.

4. In the right pane, select the template for the appropriate Application Speech Control.

5. Click Open.

Conclusion
Application Speech Controls offer valuable functionality to speech application developers. To make these controls fit seamlessly into an application that uses recorded prompts, it is important to record the prompts in the Application Speech Control using the same voice as the other prompts. To do this, import the Application Speech Control prompt transcriptions into the prompt database, where they can then be edited and recorded in common with the application's other prompts.

Q.How do I use the Speech Application SDK Log Player to Make Testing and Debugging Easier?
A.

Log Player allows you to record and play back application debug sessions. Log Player is installed, by default, as part of the Microsoft Speech Application SDK in the \Program Files\Microsoft Speech Application SDK 1.0 folder.

Use Log Player as a shortcut to reach interesting points in a dialogue
Some parts of dialogues that need attention during debugging or testing might be difficult to reach, either because of the time needed to step through the dialogue or because of the need to provide specific inputs. Use Log Player to create a shortcut to a selected point in a dialogue or to reproduce a specific set of conditions for a test case. Run the application once to get to the desired state or location. Close Speech Debugging Console, and the log file is saved. Replay the log file to continue debugging manually at that point.

Use Log Player for regression testing
Use Log Player to detect changes to an application. When portions of a dialogue are complete and should not change, record a log file. Then, replay the log file periodically. If the application has changed, Log Player returns an error or a warning.

Use caution when recording dialogues that are still in development. If those dialogues change, the log files that contain them break and require re-recording, which can be time-consuming if numerous log files are affected.

Saving and Replaying Log Files
For information about saving and replaying log files, see the following two procedures.

To save a debug session log

1. In Visual Studio .NET 2003, click Options on the Tools menu.

2. In the left pane, select Speech Application SDK, and then select Speech Debugging Console.

3. Under Logging, select Record Log Files.

4. Click Browse, and then select a folder.

5. Click Open.

When recording, log files are opened when the application is started in debug mode and closed when debugging stops. File names are 14-digit time stamps. To prevent confusion, give log files descriptive names as soon as possible. Remember to clear the Record Log Files check box unless you want to continue recording log files. Editing log files is not recommended; when a dialogue changes, re-record the log file.
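Because the automatically generated names are only time stamps, a small helper can decode them and add a descriptive label. The following JavaScript sketch is hypothetical (`parseLogFileName` and `descriptiveLogFileName` are not part of the SASDK); it assumes the yyyymmddhhmmss.xml naming used by Log Player and keeps the original time stamp in the new name so that the recording time is not lost.

```javascript
// Hypothetical helpers for Log Player's time-stamped file names
// (yyyymmddhhmmss.xml, 14 digits).
function parseLogFileName(fileName) {
  const match = /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})\.xml$/i.exec(fileName);
  if (!match) return null; // not an automatically generated log file name
  const [year, month, day, hour, minute, second] = match.slice(1).map(Number);
  return { year, month, day, hour, minute, second };
}

// Prefix a descriptive label while keeping the recording time in the name.
function descriptiveLogFileName(fileName, label) {
  if (!parseLogFileName(fileName)) {
    throw new Error(`Not a time-stamped log file name: ${fileName}`);
  }
  // Replace runs of characters that are awkward in file names with "-".
  const safeLabel = label.trim().replace(/[^A-Za-z0-9_-]+/g, "-");
  return `${safeLabel}_${fileName}`;
}

// Usage:
console.log(parseLogFileName("20030507093859.xml"));
console.log(descriptiveLogFileName("20030507093859.xml", "Order status happy path"));
// prints Order-status-happy-path_20030507093859.xml
```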

To replay a debug session log

1. Click Start, select All Programs, select Microsoft Speech Application SDK Version 1.0, select Debugging Tools, and then select Speech Debugging Console Log Player.

2. In Log Player, click Browse.

3. In the Open A Log File box, select a log file, and then click Open.

4. On the toolbar, click Start Replay.

Note: When trying to identify which log file you want to open, bear in mind that files are named using the format yyyymmddhhmmss.xml.

When replay starts, Speech Debugging Console opens and programmatic output text begins to stream into the Output pane. If the log file ends part way through the application, Speech Debugging Console remains open and ready for you to continue debugging manually at that point.

Simple and Strict Modes
Use Simple and Strict modes to choose how sensitive Log Player is to differences between the log file and the application.

If you want to know about the absolute slightest change, use Strict mode.

If you only want to know about changes that noticeably affect the user (such as changes in a dialogue, prompt, or SML), use Simple mode.

Replaying Multiple Log Files
You can also replay log files in batch mode. Create a batch file, which is an XML file that lists the log files to be played and specifies a file to contain the batch replay results, and then run Log Player from the command line as described in the following procedure. Use the following sample, and save the text in a file with an .xml file name extension.

<BatchReplay>
  <ResultsFilePath>BatchResults.xml</ResultsFilePath> 
  <Replay Mode="Strict">
    <LogFilePath>c:\SpeechLogs\20030507093859.xml</LogFilePath>
  </Replay>
  <Replay Mode="Strict">
    <LogFilePath>c:\SpeechLogs\20030507095458.xml</LogFilePath>
  </Replay>
</BatchReplay>
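If you replay many log files, you might generate the batch file instead of editing it by hand. The following JavaScript sketch builds XML in the format of the sample above. `buildBatchReplay` is a hypothetical helper, and it assumes that Simple mode is written as `Mode="Simple"`, by analogy with the `Mode="Strict"` value shown in the sample.

```javascript
// Hypothetical generator for a Log Player batch file in the <BatchReplay>
// format shown above.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// mode: "Strict" (as in the sample) or "Simple" (assumed attribute value).
function buildBatchReplay(resultsFilePath, logFilePaths, mode = "Strict") {
  if (mode !== "Strict" && mode !== "Simple") {
    throw new Error(`Unknown replay mode: ${mode}`);
  }
  const lines = ["<BatchReplay>", `  <ResultsFilePath>${escapeXml(resultsFilePath)}</ResultsFilePath>`];
  for (const logFilePath of logFilePaths) {
    // One <Replay> element per log file; files replay in this order.
    lines.push(`  <Replay Mode="${mode}">`);
    lines.push(`    <LogFilePath>${escapeXml(logFilePath)}</LogFilePath>`);
    lines.push("  </Replay>");
  }
  lines.push("</BatchReplay>");
  return lines.join("\n");
}

// Usage: reproduces the sample above.
const batchXml = buildBatchReplay("BatchResults.xml", [
  "c:\\SpeechLogs\\20030507093859.xml",
  "c:\\SpeechLogs\\20030507095458.xml",
]);
console.log(batchXml);
```

To save the result, you could write it with Node.js, for example `require("fs").writeFileSync("myBatch.xml", batchXml)`, and then pass that file to LogPlayer as shown in the following procedure.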

To run the batch
Open a command prompt window, and change to the folder containing LogPlayer.exe. By default, this is \Program Files\Microsoft Speech Application SDK 1.0\SDKTools. Type the following at the command prompt.

LogPlayer myBatchFilePath\myBatchFileName

The log files run in the sequence specified in the batch file. See the results file after the batch finishes. Subsequent replays overwrite the results file unless you change the file name in the batch file.

Conclusion
Using Log Player to record application dialogues helps with automating debugging and testing tasks on the computer running the Speech Application SDK.

Q.How do I troubleshoot DTMF issues?
A.

An overview of DTMF processing
Microsoft Speech Server 2004 (MSS) is designed to recognize both speech and dual-tone multi-frequency (DTMF) input. Although speech recognition is often the more attractive option, DTMF recognition is useful in cases such as keeping a PIN private or collecting a credit card number in a noisy environment. As an application runs, the DTMF inputs that the caller enters are stored in an internal DTMF buffer. MSS uses this buffer to keep track of DTMF key presses, which benefits experienced callers who want to type ahead of the prompts. When a speech control with a DTMF grammar is activated, the contents of the DTMF buffer begin to be collected and compared with the DTMF grammar. Collection continues until it can be determined whether the input matches the grammar, and then the collection is cleared. Whatever remains in the DTMF buffer is then ready for the next speech control activation, input collection, and DTMF grammar comparison.
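The buffering and collection behavior described above can be illustrated with a simplified model. The following JavaScript sketch is not MSS code: `DtmfBuffer`, `collect`, and the `fourDigitPin` grammar are hypothetical, and time-outs and other real-world details are ignored. It shows how a control consumes only the keys it needs to reach a match or no-match decision, leaving type-ahead keys for the next control.

```javascript
// Simplified, hypothetical model of the DTMF buffer (not MSS code).
// Time-outs, termination keys, and other real behavior are ignored.
class DtmfBuffer {
  constructor() {
    this.keys = []; // buffered key presses, oldest first
  }

  // Type-ahead: key presses are buffered even if no control is listening.
  press(keys) {
    for (const key of keys) this.keys.push(key);
  }

  // Activate a control whose grammar classifies the collected keys as
  // "match", "nomatch", or "partial" (could still match with more keys).
  collect(grammar) {
    const collected = [];
    while (this.keys.length > 0) {
      collected.push(this.keys.shift()); // collected keys leave the buffer
      const verdict = grammar(collected.join(""));
      if (verdict !== "partial") {
        // Decision reached: this collection ends, and any remaining keys
        // stay buffered for the next control.
        return { status: verdict, value: collected.join("") };
      }
    }
    // Buffer exhausted before a decision; a real control would keep listening.
    return { status: "partial", value: collected.join("") };
  }

  contents() {
    return this.keys.join("");
  }
}

// Hypothetical DTMF grammar: exactly four digits.
function fourDigitPin(sequence) {
  if (!/^[0-9]*$/.test(sequence) || sequence.length > 4) return "nomatch";
  return sequence.length === 4 ? "match" : "partial";
}

// Usage: an experienced caller types ahead "12345#".
const dtmf = new DtmfBuffer();
dtmf.press("12345#");
console.log(dtmf.collect(fourDigitPin)); // match on "1234"
console.log(dtmf.contents()); // "5#" is left for the next control
```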

Managing the RecordSound control
The DTMF buffer works in the standard way for most speech controls, except for the RecordSound control. Although this control reads from the DTMF buffer if the StopOnDtmf property is set to a value other than DtmfNone, it does not actually remove any DTMF inputs. The unique behavior of the RecordSound control provides a great benefit in this case:

When the caller presses a DTMF key to stop recording, the key remains in the buffer so that the application can process it to perform additional tasks. For example, you might want '#' to mean only that the control should stop recording, but '*' to mean cancel the recording and reactivate the RecordSound control. You could then read the value of the DTMF input from the buffer with the next speech control that is activated, and perform the appropriate application logic depending on that value.

It is useful to keep this behavior in mind when considering how to handle application logic for successive speech controls that capture DTMF input. If no processing of the DTMF input is planned, that input still remains in the buffer and can interfere with the processing of the next DTMF grammar. For example, if the '#' key stops the recording but the DTMF grammar for the next speech control recognizes only numbers, a "No Reco: Out Of Grammar Key Press" (-13) error occurs. A common solution is simply to set the PreFlush property to True on the next speech control, which clears any extra inputs from the DTMF buffer so that they do not interfere with DTMF recognition for that control.
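The difference between PreFlush and removing only the stop key can also be sketched. In this simplified JavaScript model (not SASDK code; all function names are hypothetical, and the next control's grammar is assumed to accept exactly one digit), the caller presses '#' to stop recording and then types ahead '2'.

```javascript
// Simplified, hypothetical model (not SASDK code). Arrays hold buffered
// DTMF keys, oldest first.

// RecordSound with StopOnDtmf set: it reacts to the first stop key in the
// buffer but, unlike other controls, does not remove it.
function recordSoundStopKey(buffer, stopKeys) {
  const key = buffer.find((k) => stopKeys.includes(k));
  return key === undefined ? null : key;
}

// The next control; its hypothetical grammar accepts exactly one digit.
function collectOneDigit(buffer, preFlush) {
  if (preFlush) buffer.length = 0; // PreFlush = True: discard all buffered keys
  if (buffer.length === 0) return { status: "waiting" }; // waits for new input
  const key = buffer.shift(); // ordinary controls consume what they collect
  return { status: /^[0-9]$/.test(key) ? "match" : "nomatch", value: key };
}

// Type-ahead-friendly alternative: remove only the oldest key (assumed to be
// the stop key), as the SALT dtmf element does in the steps that follow.
function discardStopKey(buffer) {
  return buffer.shift();
}

// Usage: the caller presses # to stop recording, then types ahead 2.
const withoutFlush = ["#", "2"];
recordSoundStopKey(withoutFlush, ["#", "*"]); // "#", still buffered
console.log(collectOneDigit(withoutFlush, false)); // nomatch on "#" (out of grammar)

const withFlush = ["#", "2"];
console.log(collectOneDigit(withFlush, true)); // waiting: the typed-ahead 2 is lost too

const withDiscard = ["#", "2"];
discardStopKey(withDiscard);
console.log(collectOneDigit(withDiscard, false)); // match on "2"
```

PreFlush is the simpler fix, but as the model shows, it also throws away keys the caller typed ahead; removing only the stop key preserves them at the cost of extra page logic.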

On the other hand, if it is important that callers retain the ability to type ahead, you must provide additional application logic to account for the extra DTMF input that remains in the buffer. This can be done, for example, by temporarily pausing RunSpeech, pulling out the single DTMF input in the buffer that we wish to discard, and then resuming RunSpeech. The SALT dtmf element can be used to accomplish this task with the following steps:

1. In the Speech Application SDK (SASDK), set the StopOnDtmf property of the RecordSound control to any value other than DtmfNone.

2. In HTML view, verify that the SALT namespace is declared in the html element at the top of the page:

<HTML xmlns:salt="http://www.saltforum.org/2002/SALT">

3. Now we can create SALT elements on the page. In HTML view, create a new SALT dtmf element as follows:

<salt:dtmf id="myDtmf" onreco="myDtmf_onreco()">
  <salt:grammar src="Grammars/myDtmf.grxml#Rule1" id="myDtmf_DtmfGrammar1">
  </salt:grammar>
</salt:dtmf>

4. In a script block before the myDtmf element, create the event handler for the onreco event of the myDtmf element:

function myDtmf_onreco()
{
	RunSpeech.Resume();
}	

5. Create a new grammar rule, Rule1, in a grammar named myDtmf.grxml. This grammar should be designed to accept any DTMF input. The grammar is only a placeholder for collecting the DTMF input in the buffer; the actual value of that input does not matter here. For more information on designing grammars, see the section titled "Creating Grammars" in the SASDK Help documentation. For an example of a DTMF grammar that accepts all input, see the section titled "Dtmf Remarks" in the SASDK Help documentation.

6. In a script block before the RecordSound control, create an event handler for the OnClientDone event of the RecordSound control, for example, RecordSound1_OnClientDone:

function RecordSound1_OnClientDone()
{
	RunSpeech.Pause(false);
	myDtmf.Start();
}

To summarize what we have done here, the SALT dtmf element is started when the RecordSound element is done. While the dtmf element collects the single DTMF input from the buffer, RunSpeech is paused. Once the input is collected, RunSpeech resumes, allowing the next speech control to be activated by RunSpeech. For more information on the SALT dtmf element, see the section titled "dtmf Element" in the SASDK Help documentation. For more information on client side RunSpeech functionality, see the section titled "Additional Client Scripting Elements" in the SASDK Help documentation.

Troubleshooting with the Speech Debugging Console
Once you have designed your application to take into account the DTMF buffer behavior described above, you can debug further by using the DTMF tab on the Speech Debugging Console, a tool that comes with the SASDK. Be sure to enable the Break on DTMF Start button at the top of the window so that the application pauses, allowing you to view the DTMF buffer or enter test DTMF input. When the application is paused, the Use Buffer button near the bottom of the DTMF panel submits the keys already stored in the buffer, after which you can make additional DTMF key presses. The Collection field shows the group of DTMF inputs being compared with the DTMF grammar. When collection begins, the Status is Collection Active; when the collection is accepted or rejected, the Status changes to Collection Finished.

Conclusion
By understanding how DTMF inputs are stored in the buffer, how the buffer works in coordination with different speech controls, and how the Speech Debugging Console can help with troubleshooting DTMF issues, you can extend your DTMF applications to support call flows that satisfy both novice and advanced callers. You will be able to combine Command, RecordSound, QA, and many other controls with both DTMF and voice recognition logic, ultimately resulting in a speech application that is more intuitive and secure.

Q.How do I build a speech-enabled application to call customers?
A.

Outbound Dialing Applications

Consider this.

You've been carefully monitoring the progress of a vintage camera for the past several days on a popular e-bidding site. There are two hours left until the auction closes and you have the top bid. Confident that the camera will be yours, you leave for dinner at a restaurant. A little less than two hours later you receive a call on your cell phone. An automated agent tells you that someone just outbid you with less than a minute to go, offering a bid only slightly higher than yours. You converse with the agent, direct the agent to raise your bid, and win the camera!

Speech applications in which a caller dials in to book a flight, make a stock transaction, or reach a person are well known. Notifications of important events via e-mail or instant messaging are also very common. Combining these two types of communication, so that customers receive a phone call triggered by events of their choosing and then engage in a natural conversation to direct a response to those events, not only opens up numerous compelling end-user scenarios but also highlights ways in which businesses can save money.

The types of applications in which outbound notifications can add value are nearly endless, such as applications that:

Remind customers of an upcoming dentist appointment and reschedule on the phone if needed

Alert bank customers about upcoming bills and provide them the capability to pay them on the phone

Notify parents of school closures

Inform store patrons of shopping opportunities specific to their interests

Call an employee and read out a set of important e-mails

Advise a manufacturer about changes in daily production schedules

Report important changing business conditions (such as changes in the stock market) to executives

Caution homeowners about potential natural hazards in their area

The Microsoft Speech Server (MSS) and the Speech Application SDK (SASDK) are uniquely positioned to provide developers with the ability to build complex outbound dialing applications, as well as deliver them on a highly robust and performant platform.

Getting Started
The easiest way to start getting acquainted with outbound dialing is to build a simple application in which you can press a button and receive an outbound call that speaks a message using text-to-speech (TTS).

In order to do this, you will need to create a new speech Web application and build the following pages:

A graphical user interface (GUI) page with a text field for input and a button that submits that input and acts as the trigger for the voice user interface (VUI) application

A VUI page containing a single prompt-only QA control that takes the text and speaks it

To find out more about how to build a simple outbound dialing application, see the "MakeCall Example" topic in the Microsoft Speech Application SDK 1.0 documentation.

The Next Level
To build an outbound dialing application that can be deployed in an enterprise environment, you will need to understand two of its most important components:

Notification generator: Every outbound application depends on the ability to generate one or more notifications. These notifications might be triggered by events as simple as a button click. For more advanced applications, Microsoft provides the SQL Notification Server (SQL NS), a set of software components that sits on top of SQL Server, monitors changing data, and generates a notification event when a particular rule is matched (for example, when someone makes a higher bid than yours).

Notification queue: Managing multiple events requires a queue that holds these notifications and ensures timely and accurate delivery, while maintaining priority sequencing and retry logic (for example, when a call is placed and the receiving phone is busy, or when an answering machine picks up instead of a person).
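The two components can be sketched in JavaScript as follows. This is a minimal, hypothetical illustration of the ideas described above, not SQL NS or SASDK code: generateOutbidNotifications, NotificationQueue, and the call results "human", "busy", and "machine" are invented names, and placeCall stands in for whatever actually dials the customer.

```javascript
// Hypothetical sketch: these names are illustrative and are not part of
// the SASDK or SQL Notification Server.

// Notification generator: emit an event when a rule matches changed data.
function generateOutbidNotifications(previousTopBid, newBid) {
  // Rule: notify the previous top bidder when someone else bids higher.
  if (newBid.amount > previousTopBid.amount && newBid.bidder !== previousTopBid.bidder) {
    return [{
      to: previousTopBid.bidder,
      priority: 1, // Lower number means higher priority.
      attempts: 0,
      message: `You were outbid at ${newBid.amount}.`,
    }];
  }
  return [];
}

// Notification queue: deliver by priority and retry unsuccessful calls.
class NotificationQueue {
  constructor(maxAttempts) {
    this.maxAttempts = maxAttempts;
    this.items = [];
    this.failed = [];
  }

  enqueue(notification) {
    this.items.push(notification);
    // Array sort is stable, so items keep FIFO order within a priority.
    this.items.sort((a, b) => a.priority - b.priority);
  }

  // placeCall(notification) is assumed to return "human", "busy", or "machine".
  processNext(placeCall) {
    const n = this.items.shift();
    if (n === undefined) return null;
    n.attempts += 1;
    const result = placeCall(n);
    if (result === "human") return { delivered: n, result };
    // Busy line or answering machine: requeue until attempts run out.
    if (n.attempts < this.maxAttempts) this.enqueue(n);
    else this.failed.push(n);
    return { delivered: null, result };
  }
}

// Usage: Bob outbids Alice, and Alice's line is busy on the first attempt.
const queue = new NotificationQueue(3);
generateOutbidNotifications({ bidder: "alice", amount: 100 }, { bidder: "bob", amount: 105 })
  .forEach((n) => queue.enqueue(n));
console.log(queue.processNext(() => "busy").result);
console.log(queue.processNext(() => "human").delivered.message);
```

A production queue would also need persistence and a delay between retries; this sketch keeps everything in memory to show only the ordering and retry logic.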

The SASDK ships with a detailed reference application called Banking Alerts. Banking Alerts lets customers choose which of three transaction events they want to be notified about, calls them when one of those events is triggered, and engages them in a conversation to elicit a response to the event. Banking Alerts provides detailed examples of a notification generator, a notification queue, and voice user interfaces.

Get started with Banking Alerts by reading the topic "The Banking Alerts Reference Application: Overview" in the SASDK Help documentation. To find this topic, in the SASDK Help documentation table of contents expand "Speech Application SDK," expand "Learning with Microsoft Speech Application SDK," expand "SASDK Sample and Reference Applications," and then click "The Banking Alerts Reference Application: Overview."

Let us know how you get along; we look forward to providing more tips and tricks on outbound dialing applications!

Q.How can I handle increased memory demand when running multiple applications on Microsoft Speech Server?
A.

Problem Description
Deploying multiple applications on Microsoft Speech Server 2004 may increase the load on available memory due to the additional application resources that are preloaded into system memory by default.

In such cases it may make sense to create an "engine partition," that is, to dedicate one Speech Engine Services (SES) engine configuration to handling the resources for a particular application. This avoids the default behavior, in which all application resources are preloaded into every available SES engine instance.

In this article we use the following scenario to illustrate the issue:

Two applications are deployed on Microsoft Speech Server.

The first application, Application1, has a very large grammar that needs to be preloaded. Users access this application regularly and repeatedly.

The second application, Application2, also has a very large (and different) grammar that needs to be preloaded. Users rarely access this application more than once.
For example, Application2 might be an enrollment application for users of Application1.

Application1 receives many more calls than Application2.

The grammars for both applications are large enough that, if both are preloaded in all engines, the maximum number of engines available to serve Application1 (the high-demand application) will be lower than if a partition is used. Note: Extra memory is consumed by preloading the low-demand application's grammars in all engine instances.

Specific numbers are application dependent, but in our scenario it is reasonable to expect gains similar to the following:

1.

Preloading all grammars in all engine instances: a maximum of 15 engine instances serving both applications.

2.

Partitioning: a maximum of 20 engine instances serving Application1 and 4 serving Application2.

The partition enabled Application1 to handle a higher volume of incoming calls without overusing system memory. The tradeoff is that Application2 handles a lower call volume.
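The tradeoff can be sketched with a deliberately simple memory model, assuming each engine instance costs a fixed base amount plus the size of every grammar preloaded into it. The function names and all sizes below are hypothetical placeholders in arbitrary units, not measurements of Microsoft Speech Server, and they are not meant to reproduce the figures above.

```javascript
// Hypothetical memory model: each engine instance costs a fixed base amount
// plus the size of every grammar preloaded into it. Units are arbitrary.
function maxSharedEngines(budget, base, grammarSizes) {
  // Default behavior: every engine preloads every application's grammars.
  const perEngine = base + grammarSizes.reduce((sum, g) => sum + g, 0);
  return Math.floor(budget / perEngine);
}

function partitionedEngines(budget, base, app1Grammar, app2Grammar, app2Engines) {
  // Reserve memory for the small Application2 partition first.
  const reserved = app2Engines * (base + app2Grammar);
  if (reserved > budget) throw new Error("Application2 partition exceeds the memory budget");
  // The remaining memory goes to engines that preload only Application1's grammar.
  const app1Engines = Math.floor((budget - reserved) / (base + app1Grammar));
  return { app1Engines, app2Engines };
}

// Invented example values, chosen only to illustrate the effect.
const budget = 3000, base = 50, app1Grammar = 100, app2Grammar = 50;
console.log(maxSharedEngines(budget, base, [app1Grammar, app2Grammar])); // 15
console.log(partitionedEngines(budget, base, app1Grammar, app2Grammar, 4)); // { app1Engines: 17, app2Engines: 4 }
```

Under these assumptions, removing Application2's grammar from most engines lowers the per-engine cost for Application1, so more engines fit in the same memory, at the price of capping Application2 at its small partition.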

Solution
The Speech Application software development kit (SASDK) creates a manifest file, Manifest.xml, by default when a new speech Web application is created. Developers add application-specific information to the file. For our scenario, we use the following manifest files:

Application1 manifest:

<?xml version="1.0" encoding="utf-8" ?>
<manifest>
  <application name="Application1">
    <resourceset type="TelephonyRecognizer">
      <resource src="Grammars/Library.grxml" />
      <resource src="Grammars/App1LargeGrammar.grxml" />
    </resourceset>
    <resourceset type="Voice">
      <resource src="Prompts/App1Prompts.prompts" /> 
    </resourceset>
  </application>
</manifest>

Application2 manifest:

<?xml version="1.0" encoding="utf-8" ?>
<manifest>
  <application name="Application2">
    <resourceset type="TelephonyRecognizer">
      <resource src="Grammars/Library.grxml" /> 
      <resource src="Grammars/App2LargeGrammar.grxml" /> 
    </resourceset>
    <resourceset type="Voice">
      <resource src="Prompts/App2Prompts.prompts" />
    </resourceset>
  </application>
</manifest>

When creating the dedicated engine configuration f