Want some helpful hints for getting the most out of the Microsoft Speech Application SDK? Check out the tips and tricks section.
Have other questions or comments? Join the discussion about the Microsoft Speech Application SDK. Visit our newsgroup at microsoft.public.netspeechsdk.
Q. How Can I Find More Tools to Troubleshoot Speech Applications?
A. Grammar compilation and loading, speech application call flow, and Speech Application Programming Interface (SAPI) error codes are all complex topics on which any speech application developer might occasionally need help. The Microsoft Download Center offers a small collection of tools that speech application developers might find useful. The tools were not included with the Microsoft Speech Application SDK Version 1.1 (SASDK). They are unsupported, but they’re free. The tools are described in the following sections.
GramStat Speech Utility for Microsoft Speech Technologies
The GramStat speech utility is a command-line utility that provides statistics for both compiled and raw grammar files. You can use these statistics to perform basic grammar analysis and to troubleshoot grammar compilation and loading problems.
Recognizer Speech Utility for Microsoft Speech Technologies
The Recognizer speech utility is a command-line utility that is useful for offline call-flow analysis, for diagnosing simple speech recognition errors, and for top-line error diagnosis of grammars, rules, and speech application installations.
SAPIErr Speech Utility for Microsoft Speech Technologies
The SAPIErr speech utility is a command-line lookup utility that deciphers SAPI error codes returned by the speech recognizer, the Microsoft Speech Server 2004 prompt engine, or SAPI itself.
GetPron Speech Utility for Microsoft Speech Technologies
The GetPron speech utility is a command-line tool that takes a list of words and outputs the pronunciations that the Microsoft Speech Server 2004 speech recognition engine uses for those words.
BuildAppLex Speech Utility for Microsoft Speech Technologies
The BuildAppLex.exe speech utility is a command-line tool that enables you to create an Application Lexicon by using the Speech API. It takes one required command-line argument: a text file that contains a list of words and their corresponding pronunciations.
You can download these tools by searching Microsoft.com for ‘speech utilities’ or by going to http://www.microsoft.com/downloads/details.aspx?FamilyID=52744fb8-9238-4cbd-b615-be2ca781880d&displaylang=en.
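Use SAPIErr to look up what a specific code means. As background, SAPI error codes are COM HRESULT values. An HRESULT packs three things into 32 bits: a severity bit, a facility code, and a facility-specific error code. The following JavaScript sketch splits a code into those fields.

`decodeHresult` is a hypothetical helper, not part of the SASDK. It does not map codes to symbolic names, which is the job SAPIErr does. The sample value, 80131509, is the hexadecimal code quoted in the 401 error message in the Telephony Application Services tip.

```javascript
// Minimal sketch: split a 32-bit COM HRESULT into its standard fields.
// decodeHresult is a hypothetical helper name, not part of the SASDK.
function decodeHresult(hr) {
  const value = hr >>> 0; // treat the input as an unsigned 32-bit integer
  return {
    failed: (value >>> 31) === 1,      // severity bit: 1 means the call failed
    facility: (value >>> 16) & 0x1fff, // facility code (bits 16-28)
    code: value & 0xffff,              // facility-specific error code
    hex: "0x" + value.toString(16).toUpperCase().padStart(8, "0"),
  };
}

// Event logs often show the code without a 0x prefix, so parse it as hex.
console.log(decodeHresult(parseInt("80131509", 16)));
// { failed: true, facility: 19, code: 5385, hex: '0x80131509' }
```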
Q. How Can I Make My Grammars Load Faster?
A. Grammars created and edited in the Grammar Editor in Microsoft Speech Application SDK (SASDK) 1.1 are XML text files. Text is fine for development work and debugging, but speed is important in a production environment. Because a compiled grammar is smaller, it loads faster from the Web server.
Use the command-line grammar compiler, SrGSGc.exe, to compile your XML grammars. The grammar compiler installs with the SASDK and by default is located at %SystemDrive%\Program Files\Microsoft Speech Application SDK 1.1\SDKTools\Bin. The following example shows how to compile a grammar called Input.grxml into a grammar called Output.cfg.
Srgsgc.exe /O C:\myProject\Grammars\Input.grxml C:\myProject\Grammars\Output.cfg
As a rough guide, compiling a 320-KB text grammar yields a 210-KB compiled grammar, about a one-third reduction.
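If a project has many grammars, you could script the compiler instead of running it by hand. The following Node.js sketch makes two assumptions that you should check against your SASDK installation:

- It uses the argument order shown in the example above: /O, then the input .grxml path, then the output .cfg path.
- It uses the default installation folder.

`compileCommand` and `compileFolder` are hypothetical helper names. The sketch names each output file after its input grammar rather than Output.cfg.

```javascript
const path = require("path");
const fs = require("fs");
const { execFileSync } = require("child_process");

// Assumed default location of the grammar compiler (see the text above).
const COMPILER = path.win32.join(
  process.env.SystemDrive || "C:",
  "Program Files",
  "Microsoft Speech Application SDK 1.1",
  "SDKTools",
  "Bin",
  "SrGSGc.exe"
);

// Builds the argument list for one grammar, mirroring the example above:
//   Srgsgc.exe /O Input.grxml Output.cfg
// The output file gets the input's base name with a .cfg extension.
function compileCommand(grxmlPath) {
  const { dir, name } = path.win32.parse(grxmlPath);
  return ["/O", grxmlPath, path.win32.join(dir, name + ".cfg")];
}

// Compiles every .grxml file in a folder. Windows only; requires the SASDK.
function compileFolder(folder) {
  for (const file of fs.readdirSync(folder)) {
    if (path.win32.extname(file).toLowerCase() !== ".grxml") continue;
    const args = compileCommand(path.win32.join(folder, file));
    execFileSync(COMPILER, args, { stdio: "inherit" }); // throws if the compiler fails
  }
}
```

Calling `compileFolder("C:\\myProject\\Grammars")` would then run the compiler once per grammar. This works only on a Windows computer with the SASDK installed.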
Q. Can I return more than one semantic value as the result of recognition on a single branch in a .grxml grammar?
A. The short answer to this question is yes. The tip this month illustrates how to do this using semantic interpretation markup in .grxml grammars for Microsoft Speech Server (MSS) 2004 and MSS R2.
Understanding the Issue
To better understand the issue, imagine the following scenario. Suppose you want to create a directory assistance application for your organization. You want a user to be able to call your organization's main number, say the name of the person to whom the user wants to speak, and then choose between connecting to that person's office phone or cell phone. For this task, the grammar that your application uses must be able to recognize the person's name and return at least two semantic values: one representing the person's office phone number and one representing the person's cell phone number.
To accomplish this, for each contact you can use an item element to contain the contact name and corresponding tag elements to return the semantic information associated with that contact. The key here is that because the tag element contains ordinary ECMAScript, you can use multiple script expressions within tag elements to declare, store, and return multiple property values.
A Quick Review
Before looking at a few examples, recall the following points about semantic interpretation in .grxml grammars:
With these points in mind, we are ready for a few examples. The examples illustrate how to create custom-defined properties of the Rule Variable, but differ in how they store the semantic values that you want to return for a successful recognition. These differences produce differently structured SML output.
Example 1: Returning Semantic Values in Child Nodes of the SML Return
The following example illustrates how you can store semantic values in the _value property for custom-defined properties so that these semantic values are returned as the content of child nodes. In this example, the script expressions contained in the first three tag elements initialize three custom properties of the Rule Variable: the first to hold the contact's name, the second to hold the contact's office phone extension, and the third to hold the contact's cell phone number. The script expressions contained in the last three tag elements set the semantic value for each custom property using the _value property of the custom property.
<rule id="Contacts" scope="public">
<tag>$.ContactName={}</tag>
<tag>$.ContactOfficeExtension={}</tag>
<tag>$.ContactCellPhone={}</tag>
<item>John Smith
<tag>$.ContactName._value="John Smith"</tag>
<tag>$.ContactOfficeExtension._value="1234"</tag>
<tag>$.ContactCellPhone._value="5554321"</tag>
</item>
</rule>
When a caller says "John Smith", this rule returns SML output similar to the following:
<SML text="John Smith" utteranceConfidence="0.805" confidence="0.805">
<ContactName confidence="0.805">John Smith</ContactName>
<ContactOfficeExtension confidence="0.805">1234</ContactOfficeExtension>
<ContactCellPhone confidence="0.805">5554321</ContactCellPhone>
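</SML>
Because tag elements contain ordinary ECMAScript, you can also combine these expressions into fewer tag elements by separating them with semicolons. The following rule returns the same SML output: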
<rule id="Contacts" scope="public">
<tag>$.ContactName={};$.ContactOfficeExtension={};$.ContactCellPhone={}</tag>
<item>John Smith
<tag>$.ContactName._value="John Smith";
$.ContactOfficeExtension._value="1234";
$.ContactCellPhone._value="5554321"
</tag>
</item>
</rule>
Example 2: Returning Semantic Values as Attributes of the Top-Level SML Node
The following example illustrates how you can store semantic values in the _attributes property of a custom-defined property so that these semantic values are returned as attributes of the corresponding child node in the SML output (here, ContactInfo). In this example, the script expression contained in the first tag element initializes the custom property as an object, the second expression sets the semantic value of the property itself, and the remaining expressions set attribute properties of the object.
<rule id="Contacts">
<tag>$.ContactInfo={};</tag>
<item>John Smith
<tag>$.ContactInfo._value="John Smith"</tag>
<tag>$.ContactInfo._attributes.officeExtension="1234"</tag>
<tag>$.ContactInfo._attributes.cellPhone="5554321"</tag>
</item>
</rule>
This rule returns SML output similar to the following:
<SML text="John Smith" utteranceConfidence="0.805" confidence="0.805">
<ContactInfo confidence="0.805" officeExtension="1234" cellPhone="5554321">
John Smith
</ContactInfo>
</SML>
Example 3: Returning Semantic Values in an Array
The following example illustrates how you can store semantic values in a custom-defined property that is initialized as an array so that these semantic values are returned in a series of item elements in the SML output. In this example, the script expression contained in the tag element initializes the custom property as an array containing the semantic values for the contact's office phone extension and cell phone number.
<rule id="Contacts">
<item>John Smith
<tag>$.ContactInfo=["1234", "5554321"]</tag>
</item>
</rule>
This rule returns SML output similar to the following:
<SML text="John Smith" utteranceConfidence="0.805" confidence="0.805">
<ContactInfo confidence="0.805">
<item confidence="0.805">1234</item>
<item confidence="0.805">5554321</item>
</ContactInfo>
</SML>
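Because tag contents are ordinary ECMAScript, you can prototype a rule's tag expressions in any JavaScript environment before you put them in a grammar. The following sketch is a simplified model, not the MSS semantic interpretation engine. It runs the tag expressions from Example 2 against a plain object standing in for the Rule Variable ($), then renders objects in the SML shapes shown in the three examples.

Keep the following limitations in mind:

- `toSml` is a hypothetical helper.
- It omits the text and confidence attributes that real SML output carries.
- In plain JavaScript, assigning to `$.ContactInfo._attributes.officeExtension` fails unless `_attributes` already exists, so the sketch creates it explicitly.

```javascript
// Simplified model of how Rule Variable properties map to SML elements.
// toSml is hypothetical; text and confidence attributes are omitted.
function escapeXml(s) {
  return String(s).replace(/&/g, "&amp;").replace(/</g, "&lt;")
    .replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

function toSml(name, value) {
  if (Array.isArray(value)) {
    // Arrays become a series of <item> child elements (Example 3).
    const items = value.map((v) => toSml("item", v)).join("");
    return `<${name}>${items}</${name}>`;
  }
  if (value !== null && typeof value === "object") {
    // _attributes become XML attributes (Example 2).
    const attrs = Object.entries(value._attributes || {})
      .map(([k, v]) => ` ${k}="${escapeXml(v)}"`).join("");
    // _value becomes the element's text (Examples 1 and 2).
    const text = "_value" in value ? escapeXml(value._value) : "";
    // Any other properties become child elements (Example 1).
    const children = Object.entries(value)
      .filter(([k]) => k !== "_value" && k !== "_attributes")
      .map(([k, v]) => toSml(k, v)).join("");
    return `<${name}${attrs}>${text}${children}</${name}>`;
  }
  return `<${name}>${escapeXml(value)}</${name}>`;
}

// Example 2: the same expressions as the tag elements, plus _attributes = {}.
const $ = {};
$.ContactInfo = {};
$.ContactInfo._attributes = {}; // needed in plain JavaScript
$.ContactInfo._value = "John Smith";
$.ContactInfo._attributes.officeExtension = "1234";
$.ContactInfo._attributes.cellPhone = "5554321";
console.log(toSml("SML", $));
// <SML><ContactInfo officeExtension="1234" cellPhone="5554321">John Smith</ContactInfo></SML>

// Example 3: an array produces <item> elements.
console.log(toSml("SML", { ContactInfo: ["1234", "5554321"] }));
// <SML><ContactInfo><item>1234</item><item>5554321</item></ContactInfo></SML>
```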
Conclusion
The tip this month illustrates several ways to write markup in a .grxml grammar so that more than one semantic value is returned in SML output as the result of recognition on a single branch in the grammar. For a more thorough discussion of semantic interpretation markup, see the "Semantic Interpretation Markup" section in the MSS Help documentation. For additional examples, see the "SML Reference" section.
Q. Why do my RuleRefs use absolute file paths?
A. When adding RuleRef elements to grammars in Speech Application SDK version 1.1, you might have wondered why the paths are absolute rather than relative. After all, absolute paths mean more work if you move the grammar to a different folder, for example, to use a single grammar in multiple applications or to change folders on the production server.
There is a simple explanation, and an easy way to choose relative or absolute paths for grammar rule references. If you open a grammar as a stand-alone file, the paths in RuleRef elements are absolute paths. However, if you open a grammar in a speech project, the paths in RuleRef elements are relative paths.
To get absolute paths for grammar rule references:
In the Properties window, notice that the URI property is now set to a value similar to file:///C:/MyGrammarFiles/TestGrammar.grxml#InvoiceRule. This is an absolute file reference.
To get relative paths for grammar rule references:
In the Properties window, notice that the URI property is now set to a value similar to TestGrammar.grxml#InvoiceRule. This is a relative file reference.
You can easily choose whether grammar rule references are absolute or relative. Absolute references are created in stand-alone grammar files. Relative rule references are created in grammar files contained in a speech project.
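Relative references are easier to move, assuming MSS resolves rule reference URIs the way ordinary relative URIs are resolved: against the URI of the grammar that contains the reference. Under that assumption, the target moves along with the referencing grammar. The following sketch uses the URL class built into Node.js to illustrate that resolution. Main.grxml and the D:/Production/Grammars folder are hypothetical.

```javascript
// Resolves a rule reference URI against the URI of the grammar containing it.
function resolveRuleRef(ruleRefUri, containingGrammarUri) {
  return new URL(ruleRefUri, containingGrammarUri).href;
}

const relative = "TestGrammar.grxml#InvoiceRule";
console.log(resolveRuleRef(relative, "file:///C:/MyGrammarFiles/Main.grxml"));
// file:///C:/MyGrammarFiles/TestGrammar.grxml#InvoiceRule

// After both grammars move to another folder, the relative reference
// still points at the sibling file.
console.log(resolveRuleRef(relative, "file:///D:/Production/Grammars/Main.grxml"));
// file:///D:/Production/Grammars/TestGrammar.grxml#InvoiceRule

// An absolute reference ignores the base, so it still points at the old folder.
const absolute = "file:///C:/MyGrammarFiles/TestGrammar.grxml#InvoiceRule";
console.log(resolveRuleRef(absolute, "file:///D:/Production/Grammars/Main.grxml"));
// file:///C:/MyGrammarFiles/TestGrammar.grxml#InvoiceRule
```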
Q. How can I enter and record prompt text in the prompt database?
A. You know Microsoft Speech Application SDK 1.1 has great tools, but you probably don’t know it can automatically populate your prompt database for you. You might be entering prompts manually into a prompt database after the prompts are added to an application. There's an easier way. Use prompt validation to identify all the prompts in your application, and then click Add All to Database to automatically populate the transcription and extraction windows. When that's done, just click Record All to record your prompts.
To automatically populate a prompt database:
Prompt validation finds all the prompts that could possibly be called by the application. To automatically add the missing prompts to the database, click Add All to Database.
Q. How do I set TTS volume and speed?
A. You may want to make a global change to the speed or volume of text-to-speech (TTS) prompts in your application. Altering speed or volume is easy to do by changing parameters in the Speechify configuration files, and then restarting the Speechify Voice service.
To set TTS volume and speed:
Use the ssml:prosody element to change the speed and volume of individual prompts. Use the TTS voice's configuration file to make a global change to TTS characteristics.
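For per-prompt changes, prompt markup can wrap text in a prosody element. The following JavaScript sketch builds such markup under these assumptions:

- `withProsody` is a hypothetical helper.
- It accepts the SSML 1.0 labels for rate and volume, plus signed percentages such as "+20%".
- It assumes the ssml prefix is bound to the SSML namespace (http://www.w3.org/2001/10/synthesis) in the markup where the result is used.

Whether a particular TTS voice honors every value is something to verify against your engine.

```javascript
// Labels defined by SSML 1.0 for the prosody rate and volume attributes.
const RATE_LABELS = ["x-slow", "slow", "medium", "fast", "x-fast", "default"];
const VOLUME_LABELS = ["silent", "x-soft", "soft", "medium", "loud", "x-loud", "default"];
const RELATIVE_PERCENT = /^[+-]\d+(\.\d+)?%$/; // e.g. "+20%" or "-10%"

function escapeXml(s) {
  return String(s).replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Rejects values that are neither a known label nor a signed percentage.
function checkValue(name, value, labels) {
  if (!labels.includes(value) && !RELATIVE_PERCENT.test(value)) {
    throw new RangeError(`Unsupported ${name} value: ${value}`);
  }
}

// Wraps prompt text in an ssml:prosody element; omitted settings are left out.
function withProsody(text, { rate, volume } = {}) {
  let attrs = "";
  if (rate !== undefined) {
    checkValue("rate", rate, RATE_LABELS);
    attrs += ` rate="${rate}"`;
  }
  if (volume !== undefined) {
    checkValue("volume", volume, VOLUME_LABELS);
    attrs += ` volume="${volume}"`;
  }
  return `<ssml:prosody${attrs}>${escapeXml(text)}</ssml:prosody>`;
}

console.log(withProsody("Your balance is low.", { rate: "slow", volume: "loud" }));
// <ssml:prosody rate="slow" volume="loud">Your balance is low.</ssml:prosody>
```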
Q. How do I resolve a 401 error with Telephony Application Services?
A. In Microsoft® Speech Server (MSS) 2004 R2, requests from Telephony Application Services (TAS) to Speech Engine Services (SES) may result in the following error, seen in the Application Log in Event Viewer:
"A call failed because SES URL 'http://<application>/SES/Lobby.asmx' could not be found. Please ensure that the TAS SpeechServer property is correct. The following error was returned: 80131509: 'The request failed with HTTP status 401: Unauthorized. (System.Net.WebException)'."
Updates may unexpectedly change Internet Information Services (IIS) authentication settings. Restoring the desired settings may then require a manual change, as described in the following procedure.
To reset Windows authentication:
Clearing the Integrated Windows authentication setting and applying the change, then selecting it again and applying the change, resets IIS so that Lobby.asmx is accessible.
Q. How do I easily compare widely separated prompt database values?
A. A Microsoft Speech Server application prompt database contains sixteen columns and can contain thousands of rows of data. You might want to compare the value in the first column of the first row with the value in the sixteenth column of a row hundreds or thousands of rows further down. You could open two instances of Visual Studio to get two separate views of the prompt database, but there's an easier way, explained in this tip.
To view widely separated prompt database fields:
Use the Properties window in combination with the Prompt Editor pane to display fields that otherwise are not visible at the same time.
Q. When I view an event log for a Microsoft Speech Server (MSS) system in a different time zone, the times shown for the events in the Event Viewer are not correct. How do I view the correct local time for the events as they occurred on the remote computer?
A. If you need to troubleshoot a problem on an MSS system on a remote computer, you may need to view the event log for that remote system on a computer that is not in the same time zone. When you do this, the times shown for the events in the Event Viewer are offset by the difference between the time zones. For example, an event that occurred at 1:00 A.M. on a remote computer in the U.S. Eastern time zone would appear to have occurred at 10:00 P.M. the previous day (a 3-hour difference) if viewed on a computer in the Pacific time zone. This difference occurs for two reasons:
This Event Viewer behavior can cause considerable confusion when the exact local time of an event is an important part of troubleshooting the problem. To see event times as they were on the remote computer, set the time zone on the local computer to match the time zone of the remote computer. Event Viewer then calculates event times relative to the same time zone as the computer on which the events occurred.
To set the time zone:
Note: When you are done viewing events that originated on the remote computer, set the local computer time zone back to the correct local time.
An alternative solution is to save the event log on the remote computer in text file format (a file with a .txt extension) or comma-separated value format (a file with a .csv extension). This causes Event Viewer to write the actual local time of the events to the saved file, instead of UTC.
To save an event log as a .txt or .csv file:
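The shift described in this tip is ordinary time-zone arithmetic. This assumes, as the note about saved logs implies, that the log stores each event time in UTC and Event Viewer renders it in the viewing computer's zone. The following sketch reproduces the tip's example with the Intl API built into Node.js. The January 15, 2005 timestamp is hypothetical, and the IANA zone names America/New_York and America/Los_Angeles stand in for the U.S. Eastern and Pacific time zones.

```javascript
// Returns the day of the month and the hour (0-23) of an instant as seen in
// the given IANA time zone.
function localDayAndHour(date, timeZone) {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    day: "numeric",
    hour: "numeric",
    hourCycle: "h23",
  }).formatToParts(date);
  const field = (type) => Number(parts.find((p) => p.type === type).value);
  return { day: field("day"), hour: field("hour") };
}

// Hypothetical event time: 06:00 UTC on January 15, 2005.
const eventTime = new Date(Date.UTC(2005, 0, 15, 6, 0, 0));

console.log(localDayAndHour(eventTime, "America/New_York"));    // { day: 15, hour: 1 }
console.log(localDayAndHour(eventTime, "America/Los_Angeles")); // { day: 14, hour: 22 }
```

The same UTC value reads as 1:00 A.M. on the 15th in the Eastern zone and 10:00 P.M. on the 14th in the Pacific zone, which is the 3-hour difference described above.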
Q. How does the speech recognition engine in Microsoft Speech Server 2004 treat abbreviations, digit strings, dollar amounts, etc.?
A. To recognize the words and phrases specified in a grammar, the speech recognition (SR) engine in Microsoft Speech Server 2004 needs to look up the pronunciation of each word in the grammar. If your grammar contains abbreviations like "Mr. Smith", digit strings like "123", or dollar amounts like "$34.05", the SR engine first converts these strings into one or more unambiguous sequences of words in a process called text normalization. For example, the speech recognition engine converts the string "123" into the word sequence "one hundred and twenty three". Once the string is converted, the SR engine can look up the pronunciation of each individual word and use it in the recognition process.
Converting a string like "123" is not necessarily as straightforward as turning it into "one hundred and twenty three", though. Other valid interpretations might be "one two three", "hundred twenty three", "one twenty three", or "twelve three". Similarly, the abbreviation "Dr." might mean "Doctor" or "Drive", and its correct interpretation depends on context, which can be complicated to determine. Therefore, it is always better to spell out abbreviations, digit strings, and dollar amounts explicitly in your grammar rather than rely on the SR engine to guess the appropriate phrase for them.
Below is a list of examples that show how the SR engine normalizes phrases for US English in Microsoft Speech Server 2004. Some examples have multiple normalized forms; in this case, all are used as valid phrases in the grammar. This list is not exhaustive, but is meant to cover the most frequent or interesting cases:
Numbers
Decimal
Dollar Amounts
Abbreviations
Symbols
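Following the advice above, you can generate the spelled-out forms yourself and list them explicitly in the grammar. The following sketch is a hypothetical helper for one- to three-digit strings. It is not the SR engine's normalization algorithm. It accepts three readings: digit by digit, and the number read with and without "and". Which readings to accept is a design decision for your application. The helper emits a standard SRGS one-of element with one item per distinct spoken form.

```javascript
const ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
  "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
  "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"];
const TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty",
  "seventy", "eighty", "ninety"];

// Spells out 0-999 as a cardinal number, optionally with "and" after "hundred".
function cardinal(n, useAnd = false) {
  if (n < 20) return ONES[n];
  if (n < 100) return TENS[Math.floor(n / 10)] + (n % 10 ? " " + ONES[n % 10] : "");
  const rest = n % 100;
  const hundreds = ONES[Math.floor(n / 100)] + " hundred";
  return rest ? hundreds + (useAnd ? " and " : " ") + cardinal(rest) : hundreds;
}

// Reads a digit string one digit at a time, e.g. "123" -> "one two three".
function digitByDigit(digits) {
  return [...digits].map((d) => ONES[Number(d)]).join(" ");
}

// Builds an SRGS one-of element listing each distinct spoken form.
function spokenFormsOneOf(digits) {
  if (!/^\d{1,3}$/.test(digits)) throw new RangeError("Expected 1-3 digits");
  const n = Number(digits);
  const forms = new Set([digitByDigit(digits), cardinal(n), cardinal(n, true)]);
  const items = [...forms].map((form) => `  <item>${form}</item>`).join("\n");
  return `<one-of>\n${items}\n</one-of>`;
}

console.log(spokenFormsOneOf("123"));
// <one-of>
//   <item>one two three</item>
//   <item>one hundred twenty three</item>
//   <item>one hundred and twenty three</item>
// </one-of>
```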
Q. How can I get my JScript files to support multiple character sets?
A. In multilanguage projects, JScript files from any editor must be saved as Unicode (UTF-8 with signature) - Codepage 65001. In particular, when you save JScript files in Visual Studio .NET 2003, you must make this selection every time the file is saved, or the setting will be incorrect. If this is not done, extended characters might be incorrectly stripped from strings. Visual Studio provides a setting that makes this encoding the default whenever JScript files are saved. See the following procedure for details.
To set JScript file encoding in Visual Studio .NET 2003:
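Codepage 65001 is UTF-8, and "with signature" means the file starts with the UTF-8 byte order mark (bytes EF BB BF). Before deployment, you could check that your JScript files carry the signature with a small Node.js script like the following. The helper names are hypothetical. The script only prepends the signature to files that are already valid UTF-8, because a file saved in another codepage must be re-encoded, not just marked.

```javascript
const fs = require("fs");

// "UTF-8 with signature" means the file starts with these three bytes.
const UTF8_BOM = Buffer.from([0xef, 0xbb, 0xbf]);

function hasUtf8Bom(buf) {
  return buf.length >= 3 && buf.subarray(0, 3).equals(UTF8_BOM);
}

// Returns the bytes with a signature prepended, unless one is already present.
function withUtf8Bom(buf) {
  return hasUtf8Bom(buf) ? buf : Buffer.concat([UTF8_BOM, buf]);
}

// True if the bytes decode as UTF-8 without errors.
function isValidUtf8(buf) {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(buf);
    return true;
  } catch {
    return false;
  }
}

// Adds the signature to a JScript file in place and reports whether it changed.
// Files that are not valid UTF-8 were probably saved in another codepage.
function ensureUtf8Signature(filePath) {
  const bytes = fs.readFileSync(filePath);
  if (hasUtf8Bom(bytes)) return false;
  if (!isValidUtf8(bytes)) throw new Error(`${filePath} is not valid UTF-8`);
  fs.writeFileSync(filePath, withUtf8Bom(bytes));
  return true;
}
```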
Q. How Can I Record Messages Longer Than 20 Seconds?
A. In Microsoft Speech Server 2004, use the RecordSound control to record user speech. When you use the RecordSound control, you'll find that by default, recording ends after 20 seconds. To record messages longer than 20 seconds, you can set three properties to increase the timeout. The EndSilence, BabbleTimeout, and MaxTimeout properties interact with each other to set the recording timeout. The default values for these properties are listed in the following table.
The three properties interact in the following ways:
For example, assuming that the EndSilence and MaxTimeout properties are at their defaults, to record a message of up to 30 seconds the only change needed is to set the BabbleTimeout property to 30000 (milliseconds). When the value of any of these properties is exceeded, recording ends and a file of the type specified by the Type property is written to the folder specified by the SavePath property. If the value of the BabbleTimeout or MaxTimeout property is exceeded, however, the recording is written to the file only if the SavePartialRecording property is set to True.
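The rule in the last paragraph for whether a recording is written to disk can be captured in a small decision function, which may help when you plan tests for a voice-mail page. This is only a sketch. The EndReason values and the function names are hypothetical labels for the property whose limit ended the recording; they are not members of the RecordSound control. The sketch also reflects that the timeout properties are set in milliseconds, as the 30000 value for 30 seconds shows.

```javascript
// Hypothetical labels for the property whose limit ended the recording.
const EndReason = Object.freeze({
  END_SILENCE: "EndSilence",
  BABBLE_TIMEOUT: "BabbleTimeout",
  MAX_TIMEOUT: "MaxTimeout",
});

// Recordings ended by EndSilence are written to the SavePath folder;
// recordings ended by BabbleTimeout or MaxTimeout are written only when
// SavePartialRecording is true.
function isRecordingWritten(reason, savePartialRecording) {
  switch (reason) {
    case EndReason.END_SILENCE:
      return true;
    case EndReason.BABBLE_TIMEOUT:
    case EndReason.MAX_TIMEOUT:
      return savePartialRecording === true;
    default:
      throw new RangeError(`Unknown end reason: ${reason}`);
  }
}

// The timeout properties take milliseconds, so 30 seconds is 30000.
function secondsToTimeout(seconds) {
  return Math.round(seconds * 1000);
}

console.log(secondsToTimeout(30));                                // 30000
console.log(isRecordingWritten(EndReason.BABBLE_TIMEOUT, false)); // false
```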
Q. How Do I Send a Fax Using an ASP.NET Speech-Enabled Web Application?
A. Sending a fax using an ASP.NET speech-enabled Web application is as easy as sending a fax using a non-speech-enabled ASP.NET Web application. This article briefly discusses the major tasks required to create a speech-enabled fax-back application, provides fax service implementation details, and points out several security issues.
Task Overview
Using the Microsoft Speech Application SDK (SASDK), you can easily accomplish the first two tasks. The SASDK includes Application Speech Controls that are well suited for these tasks. Use the DataTableNavigator Speech Control to accomplish the first task. For a simple implementation, read the document titles directly from an XML file into a DataSet, and then bind the DataSet to the DataTableNavigator control. For a more sophisticated implementation, select the document information from a database table, fill a DataSet, and then bind the DataSet to the DataTableNavigator control. Alternatively, you can construct the document by using pieces of information from various sources. For the second task, use the Phone Speech Control to get and confirm the customer's fax number.
Assuming that the actual faxing is performed in a server-side event handler such as Page_Unload, the responses provided by the customer (such as the selected document title and fax number) must be posted back to the server. The easiest way to do this is to enable AutoPostBack in the speech SemanticMap properties for each of the semantic items corresponding to the customer's responses (for example, document title, area code, and local number).
Implementing the Fax Service
To use the Fax Service Extended COM API, you must first use Windows Setup to install the Fax Service component on the host computer. On computers running Microsoft Windows XP and Windows Server 2003, the Fax Service Extended COM API is provided by the file fxscomex.dll, which is usually found in the Windows\System32 directory. If you are creating your page with Microsoft Visual Studio .NET 2003, add a reference to this DLL in your project so that Visual Studio imports the DLL's COM objects as .NET classes. If you are using a program other than Visual Studio, create the import library with the TLBIMP utility.
In either case, make sure that the import library is in the bin subdirectory of your fax-back application's Web host virtual directory so that ASP.NET can reference it when it compiles the application on first access. The Fax API requires a physical file path to the document file that is to be faxed. If your documents are stored in a subdirectory of the Web host virtual directory, you can map document files to a physical file path using the MapPath() method that your fax-back application inherits from the Page class, as follows:
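// Physical path of the folder holding the documents and the cover page file;
// the fax code below refers to this variable as ProsewareDataPath.
string ProsewareDataPath = this.MapPath(".");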
The following code illustrates how to fax the document file. In this example, the fax-back application uses a dialing prefix to place a call outside of the fictitious company Proseware, and uses a remote fax server to send the fax.
FaxDocument objFaxDoc = new FaxDocumentClass();
FaxServer objFaxServer = new FaxServerClass();
objFaxDoc.Body = String.Format(@"{0}\{1}", ProsewareDataPath, SelectedDocFile);
objFaxDoc.DocumentName = "Proseware FaxBack Document";
objFaxDoc.Priority = fxscomexassembly.FAX_PRIORITY_TYPE_ENUM.fptNORMAL;
// 0 == low, 1 == normal, 2 == high
string dialoutPrefix = "9";
string faxRecipientNumber = String.Format("{0}{1}",
dialoutPrefix,
siFaxNumberLocalDigits.Text);
objFaxDoc.Recipients.Add(faxRecipientNumber, "Proseware Customer");
// Adds the fax phone number and the name of addressee.
objFaxDoc.ReceiptType = fxscomexassembly.FAX_RECEIPT_TYPE_ENUM.frtNONE;
// 0 == no receipt, 1 == e-mail, 4 == message box
objFaxDoc.CoverPageType = fxscomexassembly.FAX_COVERPAGE_TYPE_ENUM.fcptLOCAL;
// 0 = no cover page, 1 = local cover page, 2 = server cover page
objFaxDoc.CoverPage = String.Format(@"{0}\Proseware.COV", ProsewareDataPath);
// The path to the cover page file. See MS Fax Server Cover Page editor.
objFaxDoc.Note = "Here is the document you requested.";
// The text of the note printed on the cover page.
objFaxDoc.ScheduleType = fxscomexassembly.FAX_SCHEDULE_TYPE_ENUM.fstNOW;
// 0 == "now" (as soon as possible), 1 = scheduled time,
// 2 = discounted period. See FaxOutgoingQueue.DiscountRateStart, etc.
objFaxDoc.Subject = String.Format("The document you requested: \"{0}\"", siSelectedDocName.Text);
// All of the following lines set sender information:
objFaxDoc.Sender.Title = "Mr.";
objFaxDoc.Sender.Name = "Great Docs Fax Robot";
objFaxDoc.Sender.City = "Redmond";
objFaxDoc.Sender.State = "WA";
objFaxDoc.Sender.Company = "Proseware, Inc.";
objFaxDoc.Sender.Country = "USA";
objFaxDoc.Sender.Email = "FaxBackRobot@proseware.com";
objFaxDoc.Sender.FaxNumber = "11234567890";
objFaxDoc.Sender.HomePhone = "10987654321";
objFaxDoc.Sender.OfficeLocation = "Redmond";
objFaxDoc.Sender.OfficePhone = "12223334444";
objFaxDoc.Sender.StreetAddress = "Great Documents Library\nRedmond, WA 98052";
objFaxDoc.Sender.TSID = "ProsewareFAX";
objFaxDoc.Sender.ZipCode = "98052";
objFaxDoc.Sender.BillingCode = "NCC1701C";
objFaxDoc.Sender.Department = "Library Fax Support";
objFaxDoc.Sender.SaveDefaultSender();
// This saves the sender information for reuse if you want to send
// the document to multiple recipients using the same sender information.
objFaxServer.Connect(@"REMOTEFAXSERVER01");
objFaxServer.Connect(@"REMOTEFAXSERVER01");
// Connects to the fax server. See the second note following this code
// sample for an explanation of why this method is called twice.
try
{
    objFaxDoc.ConnectedSubmit(objFaxServer);
}
finally
{
    // Release the server connection even if the submission throws.
    objFaxServer.Disconnect();
}
Note: Only computers running Windows Server 2003 can accept fax requests from remote client computers. If the fax server is running on a computer that is running Windows Server 2003, remote fax client computers cannot access the fax server through the Fax Services Extended COM API until you:
Note: A known bug in the FaxServer.Connect(FaxServerName) method causes the method to always create a connection to the local fax server. This happens even if a remote fax server is specified in the parameter, even though the call appears to complete normally, and even though the FaxServer.ServerName property returns the name of the remote fax server. If the computer on which the fax-back application is running is not running a fax server (in other words, if there is no local fax server), the subsequent FaxDocument.ConnectedSubmit() call fails and throws an exception. To work around this problem, connect to a remote fax server by calling the FaxServer.Connect(FaxServerName) method twice, as illustrated in the previous code sample.
Fax-back Application Security Issues
Second, be aware that Fax Service faxes a document file by "printing" it as a temporary TIFF image using a Windows application that is:
If the Windows application associated with the TIFF file type is compromised or replaced, "printing a fax" could compromise the host computer. For this reason, always ensure that the Windows TIFF application is properly protected and that the ASP.NET host process runs in a minimal security context.
Third, if you use a fax server computer that is independent of the ASP.NET host computer (the remote fax client computer), the remote fax client computer requires security rights to access the fax server, but not necessarily rights to anything else on the fax server computer. Always apply the Rule of Least Privilege: Grant only the minimum access needed to get your job done. Be aware that by default, ASP.NET Web pages run in an application pool under the "Network Service" identity. The access permissions for this account are not sufficient to establish a connection to a remote fax server. Therefore, if you are developing a fax-back application that uses a remote fax server, you must do the following:
Q. How do I use Speech Application Error pages?
A. To gracefully respond to unexpected errors in voice-only applications on Microsoft® Speech Server 2004 (MSS), use custom error pages. MSS uses two types of error pages: application and system error pages. When an unexpected Speech Control error occurs, the application error page runs. If a more serious error occurs, the system error page runs. This tip provides information about how to create and specify these two types of error pages.
Using the Application Error Page
If the quality of TTS messages is acceptable but different message text is needed, edit a renamed copy of the default application error page. It is best not to edit the default page itself because it is a resource used by all speech applications on the Web server. If the quality of TTS messages is not adequate, replace the default page and its TTS prompts with a custom application error page containing QA controls that play recorded messages from the prompt database.
Specifying a Custom Application Error Page
<configuration>
<appSettings>
<add key="errorpage" value="ErrorPage.aspx" />
</appSettings>
</configuration>
Note: By default, the Web.config file is located in the application's root folder and is visible in the Solution Explorer window.
Using the System Error Page
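Specifying a System Error Page
The system error page can be specified in the MMC snap-in or with a meta tag on an application page, as in the following example: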
<meta http-equiv="error-page" content="http://MyServer/MyApplication/SystemErrorPage.html"/>
The error page setting, whether made in the MMC or with the meta tag, specifies the error page that is stored in the cache. Only one system error page per SALT interpreter is stored in the cache, and the setting persists until it is overridden. A setting made in the MMC lasts until a meta tag on an application page overrides it, and that new setting persists until a different setting is encountered in a subsequent navigation.
Creating a System Error Page
The following example is a system error page that plays a recorded message when it loads:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html xmlns:SALT="http://www.saltforum.org/2002/SALT">
<head>
<meta name="GENERATOR" content="Microsoft Visual Studio .NET 7.1">
<meta name=ProgId content=VisualStudio.HTML>
<meta name=Originator content="Microsoft Visual Studio .NET 7.1">
<object id="Speechtags" CLASSID="clsid:DCF68E5B-84A1-4047-98A4-0A72276D19CC" VIEWASTEXT></object>
<?import namespace="salt" implementation="#Speechtags" ?>
<SALT:prompt id="SystemErrorPrompt">
<SALT:content id="PromptContent" href="http://myServer/SystemError.wav" />
</SALT:prompt>
<script language=jscript>
function fnOnLoad()
{
SystemErrorPrompt.Start();
}
</script>
</head>
<body onload="fnOnLoad()">
</body>
</html>
The .wav file must be 8-kHz mono audio, either A-law or mu-law compressed (depending on the telephony standard of the locale where it is used) or, for better audio quality, 16-bit PCM. Store the system error page on the Web server.
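You can check a recorded message's header against these requirements before copying it to the Web server. The following standalone JavaScript function is a minimal sketch that assumes the file's bytes are already in a Uint8Array. The function name checkPromptWav is hypothetical. The format tags (1 = PCM, 6 = A-law, 7 = mu-law) come from the standard RIFF/WAVE format registry, not from the SASDK.

```javascript
// WAVE format tags from the RIFF/WAVE format registry.
const WAVE_FORMAT_PCM = 1;
const WAVE_FORMAT_ALAW = 6;
const WAVE_FORMAT_MULAW = 7;

// Reads a four-character chunk identifier such as "RIFF" or "fmt ".
function readTag(view, offset) {
  return String.fromCharCode(
    view.getUint8(offset), view.getUint8(offset + 1),
    view.getUint8(offset + 2), view.getUint8(offset + 3));
}

// Hypothetical helper: checks for 8-kHz mono A-law/mu-law (8-bit) or 16-bit PCM.
function checkPromptWav(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  if (view.byteLength < 12 || readTag(view, 0) !== "RIFF" || readTag(view, 8) !== "WAVE") {
    return { ok: false, reason: "not a RIFF/WAVE file" };
  }
  // Walk the chunk list after the 12-byte RIFF header, looking for "fmt ".
  let offset = 12;
  while (offset + 8 <= view.byteLength) {
    const id = readTag(view, offset);
    const size = view.getUint32(offset + 4, true);
    if (id === "fmt ") {
      if (offset + 24 > view.byteLength) return { ok: false, reason: "truncated fmt chunk" };
      const format = view.getUint16(offset + 8, true);
      const channels = view.getUint16(offset + 10, true);
      const sampleRate = view.getUint32(offset + 12, true);
      const bits = view.getUint16(offset + 22, true);
      if (sampleRate !== 8000) return { ok: false, reason: `sample rate is ${sampleRate} Hz, expected 8000` };
      if (channels !== 1) return { ok: false, reason: `${channels} channels, expected mono` };
      const companded = (format === WAVE_FORMAT_ALAW || format === WAVE_FORMAT_MULAW) && bits === 8;
      const pcm16 = format === WAVE_FORMAT_PCM && bits === 16;
      if (companded || pcm16) return { ok: true, reason: "" };
      return { ok: false, reason: `format tag ${format} with ${bits} bits is not A-law, mu-law, or 16-bit PCM` };
    }
    // Chunks are padded to an even number of bytes.
    offset += 8 + size + (size % 2);
  }
  return { ok: false, reason: "no fmt chunk found" };
}
```

In Node.js, for example, `checkPromptWav(require("fs").readFileSync("SystemError.wav"))` works directly, because a Buffer is a Uint8Array.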
Q. How do I reduce false barge-in issues caused by prompt echo?
A. Barge-in is an important feature of Microsoft® Speech Server 2004 (MSS) that allows the caller to interrupt a prompt. One of the main benefits of barge-in is that prompts can be designed so that novice users get sufficient guidance, while repeat users can move quickly through the application. However, barge-in can sometimes be problematic. One of the main sources of barge-in problems is prompt echo. It can cause prompt playback to stop suddenly without the caller's intervention (false barge-in), and it usually results in an incorrect recognition by the system. This article discusses the causes of prompt echo, how to verify prompt echo by using log analysis tools, the steps you can take to reduce prompt echo, and finally, issues to consider when disabling barge-in.
Causes of prompt echo
False barge-in caused by an analog handset occurs only on some calls, because only some analog handsets produce an echo loud enough to cause false barge-in. It is also generally an issue only for local calls, because telephony service providers are required to provide network echo cancellation on long-distance calls, which reduces the echo enough to avoid false barge-in. The delay of the prompt echo increases with the distance to the echo source, so it is often longer than the telephony card echo canceller (available on both analog and digital telephony cards) can effectively eliminate. In contrast, echo caused by the connection between an analog telephony card and the PBX typically causes false barge-in consistently, regardless of the type of phone used to call the system. Some amount of echo from this connection is unavoidable, so analog telephony cards are equipped with an echo-canceling feature that can usually reduce it significantly. However, if there is a significant impedance mismatch between the telephony card and the PBX, the echo may be so strong that it cannot be sufficiently removed. In this case, contact your Intel representative for assistance. For more information, see the Intel telecom support resource document titled "Alternate Impedance and Gain settings for the DMV160LP and D/41JCT Boards."
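The link between distance and echo delay is simple arithmetic: the echo travels to the echo source and back. The following standalone JavaScript sketch estimates the round-trip propagation delay. It assumes that signals travel at roughly 200 km per millisecond (about two-thirds of the speed of light, typical of copper and fiber). Real networks add switching and coding delays. The echo canceller tail length is a hypothetical input here, so check your telephony card's documentation for the actual value.

```javascript
// Approximate signal speed in copper or fiber: about 200,000 km/s, or 200 km per ms.
const PROPAGATION_KM_PER_MS = 200;

// Round-trip propagation delay, in ms, for an echo source distanceKm away.
// Ignores switching and codec delays, which only make the echo arrive later.
function echoDelayMs(distanceKm) {
  return (2 * distanceKm) / PROPAGATION_KM_PER_MS;
}

// True if an echo canceller with the given tail length (ms) can cover this echo.
// tailLengthMs is a hypothetical input; use the value documented for your card.
function cancellerCoversEcho(distanceKm, tailLengthMs) {
  return echoDelayMs(distanceKm) <= tailLengthMs;
}

// With a hypothetical 16 ms tail, a source 1,000 km away (10 ms) is covered,
// but one 3,000 km away (30 ms) is not.
console.log(echoDelayMs(1000), cancellerCoversEcho(1000, 16)); // 10 true
console.log(echoDelayMs(3000), cancellerCoversEcho(3000, 16)); // 30 false
```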
Verifying the presence of echo by using the log analysis tools
To turn on recognition audio logging, open a command window on the computer running MSS, change to the following folder, and run the MSSLogConfig.vbs script:
cd "%programfiles%\Microsoft Speech Server\Administrative Tools\Scripts"
cscript MSSLogConfig.vbs /activate /filter:RecognitionAudio
To play back the audio data, use Microsoft Log Analysis Tools for Speech Applications, which is included as a redistributable installer in the Microsoft Speech Application SDK Version 1.1 (SASDK). After a few calls have been received, extract the files containing the logged audio by using the MSSContentExtract log analysis tool. The audio files can be played in most standard audio players. If significant prompt echo is present, it will be audible in the logged audio files. For more information about how to set up logging and use the MSSContentExtract tool, see "Log Analysis Tools" in SASDK Help. The Help file also describes how to set up additional logging and how to use the CallViewer tool, which enables you to analyze the events logged by MSS.
Reducing echo
Disabling barge-in
When barge-in is disabled, the system does not begin to listen to the caller until it finishes playing the prompt. If the caller begins speaking before the prompt has finished, the system misses the beginning of the caller's utterance. Additionally, if the prompt ends with silence, the silence may make the caller think the system is ready for a response when it is not. This timing is critical, because callers often speak immediately after they think the prompt is finished. Responses whose beginnings are cut off cause misrecognitions, and completely missed responses make the system seem unresponsive. Both errors can occur for no reason apparent to the caller. One way to ensure proper timing is to set the beep property on listen elements. This causes the telephony card to play a beep just before it starts listening to the caller. Callers quickly pick up this turn-taking cue and adapt by speaking only after they hear the beep. Because the beep is generated by the telephony card, the system is guaranteed to be listening right after the beep, which eliminates any timing issues between the end of the prompt and the start of the listen.
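A little arithmetic makes the timing problem concrete. The following standalone JavaScript sketch models a prompt played with barge-in disabled. It computes how much of the caller's response falls before the recognizer starts listening. The function name and the millisecond values are hypothetical.

```javascript
// With barge-in disabled, the recognizer starts listening only after the whole prompt,
// including any trailing silence, has played. All times are ms from the prompt start.
function missedSpeechMs({ spokenPromptMs, trailingSilenceMs, speechStartMs, speechEndMs }) {
  const listenStartMs = spokenPromptMs + trailingSilenceMs;
  // Only the part of the response that occurs before listenStartMs is lost.
  const lostUntilMs = Math.min(speechEndMs, listenStartMs);
  return Math.max(0, lostUntilMs - speechStartMs);
}

// 2 seconds of speech followed by 500 ms of trailing silence.
const prompt = { spokenPromptMs: 2000, trailingSilenceMs: 500 };

// The caller answers 200 ms after the last word: the first 300 ms is cut off.
console.log(missedSpeechMs({ ...prompt, speechStartMs: 2200, speechEndMs: 3000 })); // 300
// A short answer that ends before listening starts is missed entirely (all 400 ms).
console.log(missedSpeechMs({ ...prompt, speechStartMs: 2050, speechEndMs: 2450 })); // 400
// A caller who waits for a beep played at 2500 ms loses nothing.
console.log(missedSpeechMs({ ...prompt, speechStartMs: 2700, speechEndMs: 3300 })); // 0
```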
Q. How do I export a .wav file from a prompt database?
A. You can use the Speech Prompt Editor in the Microsoft Speech Application SDK Version 1 to import .wav files into a prompt database. Many users don't realize that you can also export .wav files by using the Wave Editor, a tool included with the Speech Prompt Editor.
To export a .wav file from a prompt database
Q. How Can I Record Prompts for Application Speech Controls?
A. You can spend significant time and money having the prompts for a speech application recorded by professional voice talent, only to find, after adding Application Speech Controls to the project, that the default prompts for those controls play as text-to-speech (TTS). The result is an inconsistent user experience, in which users hear a mixture of professionally recorded prompts and TTS prompts. Application Speech Controls are a valuable tool for speech application development: they make it easy to add frequently used functionality, such as collecting phone numbers and dates, to an application. The challenge is to make the Application Speech Control prompts speak in the same voice as the rest of the application. Speech Prompt Editor offers an easy, convenient, but little-known solution: import the transcriptions from the Application Speech Control, add them to the prompt database, and then have them recorded by the same voice talent who recorded the rest of the application prompts.
To import transcriptions from an Application Speech Control
Q. How do I use the Speech Application SDK Log Player to Make Testing and Debugging Easier?
A. Log Player allows you to record and replay application debugging sessions. By default, Log Player is installed as part of the Microsoft Speech Application SDK in the \Program Files\Microsoft Speech Application SDK 1.0 folder.
Use caution when recording dialogues that are still in development. If those dialogues change, the log files that contain them break and must be re-recorded, which can be time-consuming if many log files are affected.
Saving and Replaying Log Files
To save a debug session log
When recording is turned on, a log file is opened when the application starts in debug mode and closed when debugging stops. Each log file is named with a 14-digit time stamp, so to prevent confusion, give log files descriptive names as soon as possible. Remember to clear the Record Log Files check box unless you want to continue recording log files. Editing log files is not recommended; when a dialogue changes, re-record the log file.
To replay a debug session log
When replay starts, Speech Debugging Console opens and programmatic output text begins to stream into the Output pane. If the log file ends partway through the application, Speech Debugging Console remains open, ready for you to continue debugging manually from that point.
Simple and Strict Modes
Replaying Multiple Log Files
To replay several log files in one run, list them in a batch file, as in the following example:
<BatchReplay>
<ResultsFilePath>BatchResults.xml</ResultsFilePath>
<Replay Mode="Strict">
<LogFilePath>c:\SpeechLogs\20030507093859.xml</LogFilePath>
</Replay>
<Replay Mode="Strict">
<LogFilePath>c:\SpeechLogs\20030507095458.xml</LogFilePath>
</Replay>
</BatchReplay>
To run the batch, use the following command:
LogPlayer myBatchFilePath\myBatchFileName
The log files run in the sequence specified in the batch file. When the batch finishes, check the results file named in the ResultsFilePath element. Subsequent replays overwrite the results file unless you change the file name in the batch file.
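If you have many log files, writing the batch file by hand gets tedious. The following standalone JavaScript sketch generates a batch file in the format shown above and reads the time stamp from a log file that has not been renamed. The yyyyMMddHHmmss layout is inferred from the sample names (for example, 20030507093859). Treating "Simple" as the other valid Mode value is an assumption based on the mode names. The function names are hypothetical.

```javascript
// Log Player names each log file with a 14-digit time stamp. The yyyyMMddHHmmss layout
// is inferred from sample names such as 20030507093859.xml.
const TIME_STAMP_NAME = /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})\.xml$/i;

// Returns "yyyy-MM-dd HH:mm:ss" for an un-renamed log file, or null for a renamed one.
function logTimeStamp(path) {
  const name = path.split(/[\\/]/).pop();
  const m = TIME_STAMP_NAME.exec(name);
  if (m === null) return null;
  const [, year, month, day, hour, minute, second] = m;
  return `${year}-${month}-${day} ${hour}:${minute}:${second}`;
}

function escapeXml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

// Builds a BatchReplay file; the log files replay in array order.
// "Simple" as the alternative Mode value is an assumption.
function buildBatchReplay(resultsFilePath, logFilePaths, mode = "Strict") {
  if (mode !== "Strict" && mode !== "Simple") throw new Error(`Unknown mode: ${mode}`);
  const lines = ["<BatchReplay>", `  <ResultsFilePath>${escapeXml(resultsFilePath)}</ResultsFilePath>`];
  for (const path of logFilePaths) {
    lines.push(`  <Replay Mode="${mode}">`, `    <LogFilePath>${escapeXml(path)}</LogFilePath>`, "  </Replay>");
  }
  lines.push("</BatchReplay>");
  return lines.join("\n");
}

const logs = ["c:\\SpeechLogs\\20030507095458.xml", "c:\\SpeechLogs\\20030507093859.xml"];
// Within one folder, fixed-width time stamps sort chronologically as plain strings.
const batch = buildBatchReplay("BatchResults.xml", [...logs].sort());
console.log(logTimeStamp(logs[1])); // 2003-05-07 09:38:59
console.log(batch);
```

Once you give log files descriptive names, as recommended above, the time stamp is gone. In that case, pass the paths in the order you want them replayed.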
Q. How do I troubleshoot DTMF issues?
A. An overview of DTMF processing
Managing the RecordSound control
When the caller presses a DTMF key to stop recording, the key press remains in the buffer so that the application can process it to perform additional tasks. For example, you might want '#' simply to stop recording, but '*' to cancel the recording and reactivate the RecordSound control. You could then read the DTMF input from the buffer with the next speech control that is activated and perform the appropriate application logic depending on its value. Keep this behavior in mind when you design the application logic for successive speech controls that capture DTMF input. If no processing of the DTMF input is planned, that input still remains in the buffer and can interfere with the processing of the next DTMF grammar. For example, if the '#' key stops the recording but the DTMF grammar for the next speech control recognizes only digits, a "No Reco: Out Of Grammar Key Press" (-13) error occurs. A common solution is to set the PreFlush property to True on the next speech control, which clears the DTMF buffer of any extra input before that control listens. On the other hand, if it is important that callers retain the ability to type ahead, you must provide additional application logic to account for the extra DTMF input that remains in the buffer. One way to do this is to pause RunSpeech temporarily, remove the single unwanted DTMF input from the buffer, and then resume RunSpeech. The SALT dtmf element can be used to accomplish this task with the following steps:
To summarize: the SALT dtmf element is started when the RecordSound control finishes. While the dtmf element collects the single DTMF input from the buffer, RunSpeech is paused. Once the input is collected, RunSpeech resumes and activates the next speech control. For more information about the SALT dtmf element, see "dtmf Element" in the SASDK Help documentation. For more information about client-side RunSpeech functionality, see "Additional Client Scripting Elements" in the SASDK Help documentation.
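The buffer behavior described above can be modeled in a few lines. The following standalone JavaScript sketch is a toy model, not the SALT runtime. DtmfBuffer, collectDigits, and the status value 0 for success are hypothetical; -13 is the out-of-grammar code mentioned above. The sketch compares three approaches: leaving the '#' in the buffer, setting PreFlush, and consuming a single key as the dtmf element approach does.

```javascript
// A toy model of the DTMF buffer behavior described above; it is not the SALT runtime.
class DtmfBuffer {
  constructor() {
    this.keys = [];
  }
  press(key) {
    this.keys.push(key);
  }
  // Models PreFlush=True: discard everything buffered before the control listens.
  flush() {
    this.keys = [];
  }
  // Models the extra dtmf element: consume exactly one buffered key.
  takeOne() {
    return this.keys.shift();
  }
}

const OUT_OF_GRAMMAR_KEY_PRESS = -13; // "No Reco: Out Of Grammar Key Press"

// Hypothetical next control whose DTMF grammar accepts digits only. It reads buffered
// keys first, then the keys the caller presses afterwards. Status 0 means success here.
function collectDigits(buffer, preFlush, laterKeys) {
  if (preFlush) buffer.flush();
  laterKeys.forEach((k) => buffer.press(k));
  let digits = "";
  while (buffer.keys.length > 0) {
    const key = buffer.takeOne();
    if (!/^[0-9]$/.test(key)) return { status: OUT_OF_GRAMMAR_KEY_PRESS, digits };
    digits += key;
  }
  return { status: 0, digits };
}

// The caller presses '#' to stop recording and types '4' ahead, then presses '2'.
function afterRecording() {
  const b = new DtmfBuffer();
  b.press("#");
  b.press("4");
  return b;
}

console.log(collectDigits(afterRecording(), false, ["2"]).status); // -13: '#' is out of grammar
console.log(collectDigits(afterRecording(), true, ["2"]).digits);  // 2: PreFlush also drops the typed-ahead '4'
const kept = afterRecording();
kept.takeOne();                                                    // remove only the '#'
console.log(collectDigits(kept, false, ["2"]).digits);             // 42: type-ahead preserved
```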
Q. How do I build a speech-enabled application to call customers?
A. Outbound Dialing Applications
Consider this scenario. For the past several days, you've been carefully monitoring the progress of a vintage camera on a popular online auction site. Two hours remain until the auction closes, and you have the top bid. As you leave for dinner at a restaurant, you're confident that the camera will be yours. A little less than two hours later, you receive a call on your cell phone. An automated agent tells you that, with less than a minute to go, someone has outbid you with a bid only slightly higher than yours. You converse with the agent, direct it to raise your bid, and win the camera!
Speech applications in which a caller dials in to book a flight, make a stock transaction, or reach a person are well known. Notifications of important events by e-mail or instant messaging are also very common. Combining these two types of communication, so that customers receive a phone call triggered by events of their choosing and can then hold a natural conversation to respond to those events, opens up numerous compelling end-user scenarios. It also highlights ways in which businesses can save money. The types of applications in which outbound notifications can add value are nearly endless, such as applications that:
Microsoft Speech Server (MSS) and the Speech Application SDK (SASDK) are uniquely positioned to let developers build complex outbound dialing applications and deliver them on a highly robust, high-performance platform.
Getting Started
To build an outbound dialing application, create a new speech Web application and build the following pages:
To find out more about how to build a simple outbound dialing application, see the "MakeCall Example" topic in the Microsoft Speech Application SDK 1.0 documentation.
The Next Level
The SASDK ships with a detailed reference application called Banking Alerts. Banking Alerts lets customers choose which of three transaction events they want to be notified about, calls them when one of those events is triggered, and engages them in a conversation to elicit a response to the event. Banking Alerts provides detailed examples of a notification generator, a notification queue, and voice user interfaces. To get started with Banking Alerts, read "The Banking Alerts Reference Application: Overview" in the SASDK Help documentation. To find this topic, in the table of contents of the SASDK Help documentation, expand "Speech Application SDK," expand "Learning with Microsoft Speech Application SDK," expand "SASDK Sample and Reference Applications," and then click "The Banking Alerts Reference Application: Overview." Let us know how you get along. We look forward to providing more tips and tricks on outbound dialing applications!
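The basic outbound flow can be sketched in a few lines: the customer chooses events, an event triggers a notification, and the notification becomes an outbound call request. The following standalone JavaScript is a minimal sketch, not the Banking Alerts implementation. The class, method, and event names are hypothetical, and a plain array stands in for the notification queue, where a real application would use a durable message queue.

```javascript
// Hypothetical notification generator: customers subscribe to event types,
// and each matching event becomes an outbound call request in a queue.
class NotificationGenerator {
  constructor(callQueue) {
    this.callQueue = callQueue;      // plain array standing in for a durable message queue
    this.subscriptions = new Map();  // eventType -> Set of phone numbers
  }

  subscribe(phoneNumber, eventType) {
    if (!this.subscriptions.has(eventType)) this.subscriptions.set(eventType, new Set());
    this.subscriptions.get(eventType).add(phoneNumber);
  }

  // Called by the back-end system when an event occurs for a customer.
  // Returns true if an outbound call request was queued.
  onEvent(eventType, phoneNumber, message) {
    const subscribers = this.subscriptions.get(eventType);
    if (!subscribers || !subscribers.has(phoneNumber)) return false;
    this.callQueue.push({ phoneNumber, eventType, message });
    return true;
  }
}

const queue = [];
const generator = new NotificationGenerator(queue);
generator.subscribe("5551234", "Outbid");
generator.onEvent("Outbid", "5551234", "You have been outbid."); // queued
generator.onEvent("AuctionWon", "5551234", "You won.");          // not subscribed, ignored
console.log(queue.length); // 1
```

The dialing side then takes requests from the queue and starts a call for each one, as the MakeCall example does.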
Q. How can I handle increased memory demand when running multiple applications on Microsoft Speech Server?
A. Problem Description
In such cases, it may make sense to create an "engine partition," that is, to dedicate one Speech Engine Services (SES) engine configuration to handling the resources for a particular application. This avoids the default situation, in which all application resources are preloaded into all available SES engine instances. In this article, we use the following scenario to illustrate the issue:
Specific numbers are application dependent, but in our scenario it is reasonable to expect gains similar to the following:
The partition enabled Application1 to handle a higher volume of incoming calls without overusing system memory. The tradeoff is that Application2 handles a lower call volume.
Solution
Application1 manifest:
<?xml version="1.0" encoding="utf-8" ?>
<manifest>
<application name="Application1">
<resourceset type="TelephonyRecognizer">
<resource src="Grammars/Library.grxml" />
<resource src="Grammars/App1LargeGrammar.grxml" />
</resourceset>
<resourceset type="Voice">
<resource src="Prompts/App1Prompts.prompts" />
</resourceset>
</application>
</manifest>
Application2 manifest:
<?xml version="1.0" encoding="utf-8" ?>
<manifest>
<application name="Application2">
<resourceset type="TelephonyRecognizer">
<resource src="Grammars/Library.grxml" />
<resource src="Grammars/App2LargeGrammar.grxml" />
</resourceset>
<resourceset type="Voice">
<resource src="Prompts/App2Prompts.prompts" />
</resourceset>
</application>
</manifest>
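Rough arithmetic shows the memory argument behind partitioning. The following standalone JavaScript sketch estimates preloaded-grammar memory for both layouts, using the TelephonyRecognizer resource lists from the two manifests above. The grammar sizes and instance counts are hypothetical, and real engine memory use includes more than grammars.

```javascript
// Hypothetical per-instance memory cost of each preloaded grammar, in MB.
// These numbers are illustrative only; measure your own grammars.
const grammarMb = {
  "Grammars/Library.grxml": 5,
  "Grammars/App1LargeGrammar.grxml": 60,
  "Grammars/App2LargeGrammar.grxml": 60,
};

// TelephonyRecognizer resource lists from the two manifests above.
const app1 = ["Grammars/Library.grxml", "Grammars/App1LargeGrammar.grxml"];
const app2 = ["Grammars/Library.grxml", "Grammars/App2LargeGrammar.grxml"];

// Every engine instance in a configuration preloads each distinct grammar once.
function configMb({ instances, grammars }) {
  const perInstance = [...new Set(grammars)].reduce((sum, g) => sum + grammarMb[g], 0);
  return instances * perInstance;
}

function totalMb(configs) {
  return configs.reduce((sum, c) => sum + configMb(c), 0);
}

// Default: all 8 hypothetical instances preload both applications' grammars.
const unpartitioned = totalMb([{ instances: 8, grammars: [...app1, ...app2] }]);
// Partitioned: 6 instances dedicated to Application1, 2 to Application2.
const partitioned = totalMb([
  { instances: 6, grammars: app1 },
  { instances: 2, grammars: app2 },
]);
console.log(unpartitioned, partitioned); // 1000 520
```

The shared Library.grxml is counted once per instance in either layout. The savings come from no instance holding both large grammars, at the cost of fewer instances for Application2.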
When creating the dedicated engine configuration for an application, certain values from the manifest file must match settings in the Speech Engine configuration tab of the MMC snap-in. These are:
These procedures assume that the applications are deployed to a default Microsoft Speech Server Standard Edition installation.
Step 1: Create a new engine configuration to preload Application2 grammars.
Step 2: Modify the DefaultTelephonyRecognizer engine configuration to adjust the number of instances and preload only Application1 grammars.
Step 3: Deploy the applications to Microsoft Speech Server using Speech Application Deployment Service (SADS). This procedure remains the same whether partitioning is used or not. Follow the instructions described in the SASDK documentation to:
NOTES:
Q. How do I handle unsuccessful MakeCalls in an outbound application?
A. All outbound applications need to handle call failures (for example, when the number called by the application is busy). A typical solution is to try calling the number again later by inserting the outbound call request message back into the outbound call message queue. This can be challenging because there is no server-side event for MakeCall:ConnectionFailed, but the following simple steps let you attempt the call again later. For a more advanced, deployable solution, see the Banking Alerts Reference Application, which can be found in the Reference Applications directory on the Microsoft Speech Application SDK 1.0 CD.
The first step is to declare a SemanticItem called siFailed in the SemanticMap control for your application. We'll use the AutoPostback feature of this SemanticItem to add the message back to the outbound call request message queue. To do this, perform the following steps:
Next, on the server-side code-behind page, add the following to put the message back in the message queue:
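// Requires "using System.Messaging;" (with a reference to System.Messaging.dll)
// and "using System.Web;" in the code-behind file.
// Runs when siFailed changes and AutoPostback posts the page back to the server.
private void siFailed_Changed(object sender, Microsoft.Speech.Web.UI.SemanticEventArgs e)
{
    // Put the original outbound call request back in the queue so the call is retried later.
    // URL-encode the values because they are placed in a query string.
    using (MessageQueue queue = new MessageQueue(@".\private$\YourOutboundMessageQueue"))
    {
        queue.Send(@"http://" + Environment.MachineName + "/"
            + Request.Url.Segments[1] + "Dialog.aspx?PhoneNumber="
            + HttpUtility.UrlEncode(MainMakeCall.CalledDirectoryNumber)
            + "&Message=" + HttpUtility.UrlEncode(Request.QueryString["Message"]));
    }
}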
Finally, add a client-side function MakeCall_OnClientFailed. Remember to return true so that RunSpeech will resume after calling this error-handling routine.
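// Client-side failure handler for the MakeCall control.
function MakeCall_OnClientFailed()
{
    // Changing siFailed triggers its AutoPostback, which runs siFailed_Changed on the server.
    siFailed.SetText("false", true);
    // Return true so that RunSpeech resumes after this handler.
    return true;
}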
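Now, when an outbound call fails (for example, when the line is busy), MakeCall_OnClientFailed changes siFailed, the resulting postback runs siFailed_Changed on the server, and the call request is placed back in the outbound call message queue so that the call can be attempted again later.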
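Re-queuing on every failure will keep dialing a number that never answers. The following standalone JavaScript sketch adds two things the steps above do not cover: a delay before each retry and a cap on the number of attempts. The function names, the request shape, and the default limits are hypothetical. In the approach above, the queue is the private MSMQ queue written to by siFailed_Changed; here a plain array stands in for it.

```javascript
// Hypothetical retry policy for failed outbound calls: requeue the request with a
// retry time, and give up after maxFailures failed attempts.
function handleCallFailure(queue, request, nowMs, { maxFailures = 3, retryDelayMs = 60000 } = {}) {
  const failures = (request.failures || 0) + 1;
  if (failures >= maxFailures) return "gave up";
  queue.push({ ...request, failures, notBeforeMs: nowMs + retryDelayMs });
  return "requeued";
}

// The dialer takes the first request whose retry time has arrived, if any.
function nextRequest(queue, nowMs) {
  const i = queue.findIndex((r) => (r.notBeforeMs || 0) <= nowMs);
  return i === -1 ? null : queue.splice(i, 1)[0];
}

const queue = [];
let request = { phoneNumber: "5551234", message: "Outbid" };
console.log(handleCallFailure(queue, request, 0));      // requeued
console.log(nextRequest(queue, 30000));                 // null: too early to retry
request = nextRequest(queue, 60000);                    // retry time reached
console.log(handleCallFailure(queue, request, 60000));  // requeued (second failure)
request = nextRequest(queue, 120000);
console.log(handleCallFailure(queue, request, 120000)); // gave up (third failure)
```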
Q. How do I build a custom control for more advanced call control functions?
A. The Speech Application SDK (SASDK) currently provides speech call controls for answering a call, making a call, disconnecting a call, and performing a blind transfer. The example below shows how to create a user control that implements a "supervised transfer," in which, unlike a blind transfer, the application receives call progress events for the transfer. This enables scenarios such as follow-me style transfer applications.
On the Microsoft Speech Server (MSS) platform, the SALT interpreter establishes a communication channel to the Telephony Interface Manager (TIM) for call control purposes. The SALT <smex> element is used for this simple communication channel: XML messages are sent to the TIM (using the sent property) and received from the TIM (using the onreceive event). The XML messages contain CSTA XML service requests and events as defined in Standard ECMA-323 (XML Protocol for Computer Supported Telecommunications Applications Phase III). Typically, the SALT application makes service requests, and the TIM responds with service request responses and call control events. A supervised transfer uses the CSTA Consultation Call service, completed with a CSTA Transfer Call service.
The code example for this Tip and Trick can be found in two files:
The SupervisedTransferControl wraps the supervised transfer functionality into a reusable control. The following code segment shows its use on the Default.aspx page:
<STC:SupervisedTransferControl id="SupervisedTransferControl1"
runat="server"
TransferToNum="5551234"
ClientActivationFunction=
"SupervisedTransferControl1_ClientActivationFunction"
OnClientFailure="SupervisedTransferControl1_OnClientFailure"
OnClientTransfered=
"SupervisedTransferControl1_OnClientTransfered"/>
The SupervisedTransferControl relies on three SmexMessage controls to implement the supervised transfer. The SmexMessage control is a standard speech control that allows the author to send a CSTA message to the TIM and pause execution of the dialog until a particular response has been received.
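The following standalone JavaScript sketch builds the two request bodies of the Consultation Call and Transfer Call sequence and models the decision made when CSTA events arrive. It is a sketch, not the SupervisedTransferControl source. The element names follow ECMA-323, and the namespace declaration is omitted, so check which edition of the schema your TIM expects. The device and call identifiers would come from the TIM's earlier responses and events. The function names are hypothetical.

```javascript
// Request sent first: place a consultation call while the original caller is held.
function consultationCallXml(deviceId, callId, consultedDevice) {
  return [
    "<ConsultationCall>",
    "  <existingCall>",
    `    <deviceID>${deviceId}</deviceID>`,
    `    <callID>${callId}</callID>`,
    "  </existingCall>",
    `  <consultedDevice>${consultedDevice}</consultedDevice>`,
    "</ConsultationCall>",
  ].join("\n");
}

// Request sent once the consulted party answers: join the held and active calls.
function transferCallXml(deviceId, heldCallId, activeCallId) {
  const connection = (tag, callId) => [
    `  <${tag}>`,
    `    <deviceID>${deviceId}</deviceID>`,
    `    <callID>${callId}</callID>`,
    `  </${tag}>`,
  ];
  return ["<TransferCall>", ...connection("heldCall", heldCallId),
    ...connection("activeCall", activeCallId), "</TransferCall>"].join("\n");
}

// Simplified state machine for the supervised transfer.
function nextStep(state, cstaEvent) {
  if (state === "consulting" && cstaEvent === "EstablishedEvent") {
    return { state: "transferring", send: "TransferCall" }; // consulted party answered
  }
  if (state === "consulting" && cstaEvent === "FailedEvent") {
    return { state: "failed", send: null }; // for example, busy: return to the held caller
  }
  if (state === "transferring" && cstaEvent === "TransferedEvent") {
    return { state: "transferred", send: null }; // ECMA-323 spells this event "Transfered"
  }
  return { state, send: null }; // ignore unrelated events
}

console.log(consultationCallXml("1000", "1", "5551234"));
console.log(nextStep("consulting", "EstablishedEvent")); // { state: 'transferring', send: 'TransferCall' }
```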
Q. Advanced Debugging with TASim: Sending Events to the Microsoft Windows Event Log Using Speech Server
A. Problem Description
By default, when a grammar resource is missing, the Speech Debugging Console reports only a '-4' error message that might look similar to the following:
Listen QA1_Reco: onerror beep="False" initialtimeout="3000" babbletimeout="20000" maxtimeout="120000" endsilence="1000" reject="0" mode="automatic" recoresult="" text="" status="-4" recordlocation="" recordtype="" recordduration="0" recordsize="0" id="QA1_Reco" title="" lang="en-us" dir="" className="" xmlns="" onreco="QA1_Reco_obj.SysOnReco()" onerror="QA1_Reco_obj.SysOnError()" onnoreco="QA1_Reco_obj.SysOnNoReco()" onsilence="QA1_Reco_obj.SysOnSilence()"
Although it is possible to start with the -4 status and work out which grammar is missing, additional information can save time. It can help you determine exactly which resource is causing the problem, particularly when there are multiple grammars, a multiple-level grammar, or dynamic grammars whose URLs are constructed at run time.
Solution
(1) Add a reference to the sink named logSink in the TASimInstrumentation.config file, which is located at:
\Program Files\Microsoft Speech Application SDK 1.0\SDKTools\Telephony Application Simulator\
<filters>
  <filter name="TraceAll">
    <eventCategoryRef name="All Events">
      <eventSinkRef name="TASimSink" />
      <eventSinkRef name="logSink" />
    </eventCategoryRef>
  </filter>
</filters>
(2) Modify the TASim.exe.config file so that it uses the speech engine of Microsoft Speech Server.
TASim.exe.config is located at:
\Program Files\Microsoft Speech Application SDK 1.0\SDKTools\Telephony Application Simulator\
<configuration>
  <appSettings>
    <add key="Lang" value="en-us"/>
    <add key="RecordingDirectory" value="%TEMP%" />
    <add key="SpeechServer" value="http://yourSESserver/speechserverweb/lobby.asmx" />
    <add key="instrumentationConfigFile" value="TASimInstrumentation.config" />
  </appSettings>
</configuration>
(3) Add the Web server that hosts your application to the trusted sites list of the Speech Engine Service.
(4) Launch TASim.exe. From the File menu, click Open, enter http://yourwebservername/yourapplication/default.aspx, and dial the number.
Suppose that your application has a QA control called QA1. QA1 has a grammar called toplevel.grxml that contains a rule reference to another grammar called pizza.grxml. Assume that pizza.grxml is a dynamic grammar that is missing at the time you test the application. If you have configured the speech engine as described above, you will now find additional information in the Windows Event Log, for example:
Event Type: Information
Event Source: Application (TASim)
Event Category: None
Event ID: 0
Date: 3/6/2004
Time: 10:33:09 AM
User: N/A
Computer: YOURWEBSERVERNAME
Description:
Microsoft.SpeechServer.Log.Trace
{
String Message = "Recognition error, error = 8004600A, description =
"System.Web.Services.Protocols.SoapException: Server was unable to process request.
---> Microsoft.SpeechServer.GrammarException: Error loading grammar
'http://YOURWEBSERVERNAME/yourapp/Grammars/toplevel.grxml' or one of the grammars it references.
---> Microsoft.SpeechServer.SpeechServerException: Error downloading grammar
'http://YOURWEBSERVERNAME/yourapp/Grammars/pizza.grxml'.
---> System.Net.WebException: The remote server returned an error: (404) Not Found.
--- End of inner exception stack trace ---
--- End of inner exception stack trace ---
--- End of inner exception stack trace ---""
String ExceptionDetails = ""
Int32 ProcessID = 2244
WindowsSecurityInfo WindowsSecurity = <null>
ManagedSecurityInfo ManagedSecurity = <null>
String StackTrace = ""
String ServiceProvider = "DesktopSaltInterpreter"
Int32 EventLogEntryTypeID = 4
Int64 EventSequenceNumber = 55
String EventSourceInstance = "f68372f1-1073-4a9f-94ae-e690a5dac007"
String EventSourceName = "Application"
String MachineName = "YOURWEBSERVERNAME "
DateTime TimeStamp = 3/6/2004 10:33:09 AM
SpeechContext SpeechContext = {
String ApplicationInstance = "0ba6ddd0-7ca7-4fa3-957c-b299f9bc68d5"
String PageUri = "http://YOURWEBSERVERNAME/logsinktest/default.aspx"
String RequestID = <null>
}
}
Here, the Windows Event Log clearly shows that the grammar 'http://YOURWEBSERVERNAME/yourapp/Grammars/pizza.grxml' is responsible for the -4 error.
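When the Event Log holds many entries, a script can pull the useful parts out of the message text. The following standalone JavaScript sketch assumes the message format shown above, with nested exceptions separated by "--->" and grammar URLs in single quotes. The function name is hypothetical, and the format may differ in other versions.

```javascript
// Pulls grammar URLs and the innermost error out of an MSS trace message such as the
// one above. Nested exceptions are separated by "--->", outermost first.
function diagnoseGrammarError(message) {
  const grammarUrl = /grammar\s*'([^']+)'/g;
  const grammars = [];
  let match;
  while ((match = grammarUrl.exec(message)) !== null) grammars.push(match[1]);

  const causes = message.split("--->").map((part) => part.trim()).filter((part) => part.length > 0);
  const innermost = causes.length > 0 ? causes[causes.length - 1].split("\n")[0].trim() : "";

  return {
    grammars, // outermost first
    failingGrammar: grammars.length > 0 ? grammars[grammars.length - 1] : null,
    rootCause: innermost,
  };
}

const sample =
  "Recognition error, error = 8004600A, description = \"System.Web.Services.Protocols.SoapException: Server was unable to process request.\n" +
  "---> Microsoft.SpeechServer.GrammarException: Error loading grammar\n" +
  "'http://YOURWEBSERVERNAME/yourapp/Grammars/toplevel.grxml' or one of the grammars it references.\n" +
  "---> Microsoft.SpeechServer.SpeechServerException: Error downloading grammar\n" +
  "'http://YOURWEBSERVERNAME/yourapp/Grammars/pizza.grxml'.\n" +
  "---> System.Net.WebException: The remote server returned an error: (404) Not Found.\n" +
  "--- End of inner exception stack trace ---\n";

const result = diagnoseGrammarError(sample);
console.log(result.failingGrammar); // http://YOURWEBSERVERNAME/yourapp/Grammars/pizza.grxml
console.log(result.rootCause);      // System.Net.WebException: The remote server returned an error: (404) Not Found.
```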
Q. How to Debug Server and Client-Side Code at the Same Time?
A. Problem Description
Microsoft Speech Server applications employ both server-side and client-side code, so you need to be able to debug code on both sides. By default, only client-side debugging is available. Other documented procedures allow you to debug server-side code, but once you follow them, you can't debug client-side code. You are forced to choose one or the other, when what you really need is client-side and server-side debugging in the same debugger at the same time.
Solution
If you start debugging by pressing F5 in Visual Studio .NET, TASim.exe (the Telephony Application Simulator) launches and waits for you to enter the application URL. This gives you an opportunity to attach the debugger to the ASP.NET worker process, which is aspnet_wp.exe on IIS 5.x or w3wp.exe on IIS 6.0, as follows: Enter the application URL in the Telephony Application Simulator. Before you click OK in the 'Open Start Page' dialog box, switch to Visual Studio .NET. On the Debug menu, click Processes. In the Processes window, attach to aspnet_wp.exe or w3wp.exe. Then switch back to the Telephony Application Simulator and click OK. The debugger stops at breakpoints in the code-behind first, because it is attached to aspnet_wp.exe or w3wp.exe. Once the page is rendered in the Telephony Application Simulator, the debugger also stops at the 'debugger' statement in the client-side code, because by default the debugger is attached to the Telephony Application Simulator, as set up by the 'Speech Web Application' template.
| Q. | How Can I Test nbest in the Desktop Development Environment? |
| A. | This Tip and Trick is for developers using Microsoft Speech Server Beta 2 and the Microsoft Speech Application SDK v1.0 Beta 4.
Problem Description
The nbest parameter specifies the number of recognition hypotheses that will be returned. The default value of nbest is 1, which means that the speech recognition engine returns no alternates. If you set nbest to 2, the speech recognition engine returns the default hypothesis plus one alternate. Currently, the speech recognition engine that the Microsoft Speech Application SDK v1.0 Beta 4 installs for the development environment does not support nbest. The following two workarounds let you test nbest in the development environment before you deploy the application to the production server.
Solution 1: Configure TASim.exe to use the Microsoft Speech Server speech recognition engine, which supports nbest. Open the TASim.exe.config file, which is located in the same directory as TASim.exe. Change the speechserver key to point to your Microsoft Speech Server, as follows:
<add key="speechserver" value="http://yourMSS/speechserverweb/lobby.asmx" />
TASim.exe will then use the Microsoft Speech Server speech recognition engine, which supports nbest.
Solution 2: If you do not have access to a Microsoft Speech Server speech recognition engine during development, edit the SML result in the Speech Debugging Console to simulate an nbest result. This lets you test the logic your application provides for handling alternate recognitions. The following example illustrates how this works.
|
| Q. | Can I explicitly stop listen mode when DTMF input starts? |
| A. | Problem Description
When a QA control accepts speech and DTMF input at the same time, there are cases where you might need to fine-tune how the two modes interact. For example, a user may need to enter a very long DTMF string. The listen element may then fire timeout events that prevent the user from finishing the DTMF input, even if the initial timeouts have been disabled. Another example is a poor phone connection with an echo. Listen mode ends before DTMF mode, so if the echo causes a false recognition, DTMF mode is cancelled and the user cannot finish the DTMF input.
Solution
You can handle the QA control's client-side DTMF event OnClientKeyPress and call the listen object's Cancel() method. This explicitly cancels listen mode as soon as the user presses a key, as follows:
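<script>
// Wire this function to the QA control's DTMF OnClientKeyPress event.
// MyQAName_Reco is the listen object of the QA control named MyQAName.
function MyQAName_OnClientKeyPress( )
{
    // A key was pressed: stop listening for speech so DTMF input can finish.
    MyQAName_Reco.Cancel();
}
</script>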
|
| Q. | How do you detect a user-initiated hang-up in a speech application? |
| A. | Problem Description
Suppose an application is actively running (that is, a QA control is running as part of a dialog). The end user then initiates a disconnect, for example by clicking the hang up button in the Telephony Application Simulator. Is there any way to detect the user-initiated disconnect in client-side script?
Solution
When the user hangs up, the SMEX "ConnectionClearedEvent" is raised, which causes "RunSpeech.OnUserDisconnected" to be called. You can register a handler for "RunSpeech.OnUserDisconnected" from any function in your script. One approach is to register it in the AnswerCall control's "OnClientConnected" event. If your application has a DisconnectCall control that initiates the disconnect, that control's "OnClientDisconnected" event is raised. However, if the user hangs up before the DisconnectCall control is activated, "RunSpeech.OnUserDisconnected" is called instead.
Example
function AnswerCall1_OnClientConnected( sender, callId, callingDevice, calledDevice )
{
    // Call accepted successfully; register for user-initiated disconnects
    RunSpeech.OnUserDisconnected = OnUserDisconnected;
}
function OnUserDisconnected( activeObject )
{
    // Handle user-initiated disconnects here
}
function DisconnectCall1_OnClientDisconnected( sender )
{
    // Handle application-initiated disconnects here
}
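One way to act on this distinction is to record how the call ended, so that later logging or cleanup can branch on it. The sketch below is hypothetical. callEndedBy is an illustrative variable, and the RunSpeech object is a stand-in so the code runs outside a SALT browser. In a real page, the SDK supplies RunSpeech.

```javascript
// Stand-in for the SDK's client-side RunSpeech object; omit this line in a real page.
// Only the OnUserDisconnected slot is modeled here.
var RunSpeech = { OnUserDisconnected: null };

// Hypothetical state: "user", "application", or null while the call is active.
var callEndedBy = null;

function AnswerCall1_OnClientConnected(sender, callId, callingDevice, calledDevice) {
  // New call: reset the state and register for user-initiated disconnects.
  callEndedBy = null;
  RunSpeech.OnUserDisconnected = OnUserDisconnected;
}

function OnUserDisconnected(activeObject) {
  // The caller hung up before the DisconnectCall control ran.
  callEndedBy = "user";
}

function DisconnectCall1_OnClientDisconnected(sender) {
  // The application ended the call through its DisconnectCall control.
  callEndedBy = "application";
}

// Simulate a call in which the caller hangs up mid-dialog.
AnswerCall1_OnClientConnected(null, "call-1", "caller", "application");
RunSpeech.OnUserDisconnected(null);
console.log(callEndedBy); // prints: user
```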
|
| Q. | When creating speech applications that will accept DTMF (touch-tone) input, how do I enable users to enter information using the # key as an optional terminator? |
| A. | Speech applications can be designed to accept Dual Tone Multi-Frequency (DTMF), or touch-tone, input as one type of input from users. This is important when you need to capture digit strings of a particular length and want recognition to end as soon as the user presses the # key. The trick to terminating DTMF input with the # key is to handle the terminator in the grammar file itself. Shown below is an example of a DTMF grammar with a "UserName" rule that contains a # phrase and a group referencing a second grammar, "digits.grxml". This second grammar includes the "digitkey" rule, which contains phrases from "0" to "9".
Note that the # element in the phrase above is optional, so users may or may not press # after entering the digits. If the user does press # when finished, the application responds sooner instead of waiting for a timeout to elapse. The grammar above accepts exactly 5 digits from the user, not counting the # key. To test this, add the DTMF grammar in the QA control's Property Builder, then dial using the Telephony Application Simulator (TASim) tool. The TASim tool is installed automatically with the Microsoft Speech Application SDK Beta 3. |
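When testing, it helps to know exactly which key sequences such a grammar should accept. The input described here (exactly five digits, optionally followed by #) can be modeled with a regular expression. This is a hypothetical test aid for reasoning about the grammar, not SRGS and not part of the SDK. The function names are illustrative.

```javascript
// Hypothetical model of the key sequences the grammar described above accepts:
// exactly five digits (the "digitkey" rule covers 0 through 9), optionally followed by #.
var USER_NAME_KEYS = /^[0-9]{5}#?$/;

function acceptsUserNameKeys(keys) {
  return USER_NAME_KEYS.test(keys);
}

// The value the application cares about, with the optional terminator removed.
function stripTerminator(keys) {
  return keys.charAt(keys.length - 1) === "#" ? keys.slice(0, -1) : keys;
}

console.log(acceptsUserNameKeys("12345"));  // true: no terminator
console.log(acceptsUserNameKeys("12345#")); // true: the caller pressed # to finish early
console.log(acceptsUserNameKeys("1234#"));  // false: too few digits
console.log(stripTerminator("12345#"));     // prints: 12345
```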
| Q. | Why can't my QA control find its .pf file after Server.Transfer() to a page in a subfolder? |
| A. | Problem Description
I have a subfolder in my application that contains an .aspx file (page2.aspx). The QA control on this page is configured to use a .pf file (page2.pf) in the same subfolder. During the postback, my application's start page (page1.aspx) calls Server.Transfer() to transfer to page2.aspx. I get a runtime error saying that the page2.pf file cannot be found.
Solution
When you add a .pf (PromptSelectFunction) file to a QA control, you can choose among three path options: Absolute, Document Relative, and Root Relative. Typically you can use the default option, Document Relative. Because the path of the .pf file is relative to that of the .aspx file, this option makes it easier to move the .aspx and .pf files. For example, to move them to a new application folder with the default setting, you can simply copy both files to the new folder. The QA control will then locate the .pf file successfully.
This approach does not work, however, when you transfer to an .aspx page in a subfolder and that page contains a QA control that references a .pf file in the same subfolder. On the server, when Server.Transfer() transfers execution to a page in a subfolder, the path of the new page differs from the previous path. By design, however, the client still looks for the .pf file using the original URL path. For example, the client first connects to http://servername/speechApp/page1.aspx. During the postback, the application is transferred to page2.aspx, which is in a subfolder of the speechApp folder. The server is now executing http://servername/speechApp/subfolder/page2.aspx. However, when the QA control on page2.aspx references page2.pf, the client looks for it at http://servername/speechApp/page2.pf rather than in the subfolder, causing the runtime error.
To solve this problem, use the Root Relative option. The path to page2.pf is then given from the root of the Web server (for example, /speechApp/subfolder/page2.pf), so it resolves to the same file regardless of which page URL the client is using. |
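The difference between the two path options comes down to ordinary URL resolution: the client resolves the .pf reference against the URL it originally requested. The sketch below uses the standard WHATWG URL class, which is global in browsers and in Node.js 10 and later. It uses the host and folder names from the example above. It assumes that the Root Relative option yields a path that starts at the server root, such as /speechApp/subfolder/page2.pf.

```javascript
// The URL the client requested; it does not change when the server calls Server.Transfer().
var clientUrl = "http://servername/speechApp/page1.aspx";

// Resolve a .pf reference the way a browser resolves any relative URL.
function resolvePfPath(reference) {
  return new URL(reference, clientUrl).href;
}

// Document Relative: resolved against speechApp/, so the subfolder is lost.
console.log(resolvePfPath("page2.pf"));
// prints: http://servername/speechApp/page2.pf

// Root Relative: the full path from the server root survives the transfer.
console.log(resolvePfPath("/speechApp/subfolder/page2.pf"));
// prints: http://servername/speechApp/subfolder/page2.pf
```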
| Q. | How Do I Use Prompt and Listen? |
| A. | The Basic Speech Controls in the Microsoft Speech Application SDK are ASP.NET representations of the two fundamental Speech Application Language Tags (SALT) elements: prompt and listen. These controls are designed especially for applications running on tap-and-talk client devices, and are used primarily for managing application flow through a graphical user interface (GUI). To use speech recognition on handheld devices running Pocket Internet Explorer, a separate server running Microsoft Speech Server Speech Engine Services (SES) must be available. The URL of that server must be specified for each Listen and Prompt control. The Property Builder in the Microsoft Speech Application SDK makes it easy to enter the URL for an individual control. However, entering this information over and over, and keeping it up to date, can be impractical. In most cases the same server is used for all controls in an application, and the server name might change between development, testing, and deployment. It therefore makes sense to store this URL in just one location in the application. One solution is to store the URL of the speech server in the appSettings section of the application's web.config file. For example, the web.config file might include a section like this:
<appSettings>
    <add key="speechserver" value="http://MyServer/speechserverweb/lobby.asmx" />
</appSettings>
The author can then read the URL from the web.config file at render time, store it in a string, and use ASP.NET data binding to link this value to the Prompt and Listen elements. However, the author would still need to manually add the necessary markup to each ASPX page to specify the information that will be data-bound. A more advanced but more convenient approach is to add the required information to each control dynamically, at render time.
In essence, you add code that finds each Prompt and Listen control on the page and dynamically inserts a Param tag specifying the speech server. The code can be added in a Page_PreRender method. To set this up in Design mode, select the current page (in the example described here, "_Default"). Then, in the Properties window, enter "Page_PreRender" for the PreRender event. Microsoft Visual Studio® then inserts the code that attaches the new handler, as follows:
private void InitializeComponent()
{
    ...
    this.PreRender += new System.EventHandler(this.Page_PreRender);
    ...
}
Visual Studio then switches to the page's code-behind, where it has created and selected the new Page_PreRender method. The code that adds the necessary Param tags looks like this:
// Requires: using System.Collections; using System.Configuration;
// plus the namespace that defines the SDK's Prompt, Listen, and Param controls.
private void Page_PreRender(object sender, System.EventArgs e)
{
    // Check whether web.config lists a server to use
    string server = ConfigurationSettings.AppSettings["speechserver"];
    if (server == null || server.Length == 0)
        return;
    // Find all the controls of type Prompt
    ArrayList a = FindAllControlsOfType(Page.Controls, typeof(Prompt));
    // Loop over them
    for (int i = 0; i < a.Count; i++)
    {
        Prompt p = a[i] as Prompt;
        // If it already contains a server param, skip it
        if (p.Params.ContainsName("server")) continue;
        // Otherwise create and add a new param setting the server
        Param sp = new Param();
        sp.Name = "server";
        sp.Value = server;
        p.Params.Add(sp);
    }
    // And do the same for the Listen controls
    a = FindAllControlsOfType(Page.Controls, typeof(Listen));
    for (int i = 0; i < a.Count; i++)
    {
        Listen l = a[i] as Listen;
        if (l.Params.ContainsName("server")) continue;
        Param sp = new Param();
        sp.Name = "server";
        sp.Value = server;
        l.Params.Add(sp);
    }
}
One additional method is required to build an ArrayList of all the controls on the page that are of a particular type:
private ArrayList FindAllControlsOfType(ControlCollection theList,
    System.Type theType)
{
    // Loop over all the controls in the control collection that is
    // passed in. Keep an ArrayList of those whose type exactly matches
    // the requested type (subclasses are not included). Recursively
    // check the children of all controls.
    ArrayList toRet = new ArrayList();
    for (int i = 0; i < theList.Count; i++)
    {
        if (theList[i].GetType() == theType)
            toRet.Add(theList[i]);
        toRet.AddRange(
            FindAllControlsOfType(theList[i].Controls, theType));
    }
    return toRet;
}
With this mechanism, the author needs to specify the URL of the speech server in only one place (the web.config file). The URL in the Speech Controls Property Builder can be left blank for every control, unless the author wants to override the global setting. A more advanced author might move this code into a separate helper class, or into a new class derived from System.Web.UI.Page, so that it does not have to be duplicated on each page. |
| Q. | How do I determine whether input is Speech or DTMF (Touch-Tone)? |
| A. | QA Dialog Speech Controls support speech and DTMF (dual-tone multi-frequency) input modes simultaneously. When there is a noreco event (the input is not recognized), it is often important to know which mode the user was using. When the noreco event comes from DTMF input, it is also useful to retrieve the digits and render the prompts accordingly. The prompt should be rendered quite differently depending on whether the noreco event came from speech input or DTMF input. For example, after a speech noreco event, the prompt can say something like "Sorry, I did not understand you." However, this prompt does not make sense after a DTMF noreco. In DTMF mode we always 'know' what the user entered, but the digits may be incorrect or unexpected. QA Dialog Speech Controls provide two client-side OnClientNoReco properties, one for speech and one for DTMF, and you can provide a different event handler for each. For example, in the Property Builder for the QA control, under the DTMF properties, you can provide an event handler for the OnClientNoReco property. Assume that we have a QA control called QA1, and that the function name for the DTMF OnClientNoReco in the Property Builder is "DTMFNoReco". In the HTML view of the Web form, we will then have something like this:
<Dtmf OnClientNoReco="DTMFNoReco" ID="QA1_Dtmf">
Once you provide the function DTMFNoReco(), it is called whenever there is a DTMF noreco event, so you know that the noreco came from DTMF input. According to the SALT (Speech Application Language Tags) specification version 1.0, section 2.3.2.2, the DTMF object has a 'text' property. This property is updated on every onkeypress, onreco, and onnoreco event. This means that we can retrieve the digits the user entered, up to and including the first key that went wrong. For example, suppose the user enters 12354 as a PIN, and the correct PIN is 12345.
Here the fourth digit is wrong, so the DTMF text property captures the digits 1235. In the markup above, the DTMF object ID is 'QA1_Dtmf', so we can use the following implementation:
</form>
<script>
function DTMFNoReco()
{
    // QA1_Dtmf.text holds the keys entered so far, up to the key that failed
    LogMessage("DTMF NoReco", QA1_Dtmf.text);
}
</script>
//...
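The goal is to render prompts according to the input mode. To do that, both OnClientNoReco handlers can share one helper that picks the re-prompt text. The helper noRecoPrompt is hypothetical. QA1_Dtmf is stubbed with the text from the 12354 example so that the sketch runs outside a SALT browser. How the returned text reaches the caller (for example, through the QA control's prompt function) is application-specific and not shown.

```javascript
// Stand-in for the DTMF object the QA control renders; in a SALT browser,
// QA1_Dtmf.text is filled in by the platform (here: the 12354 example, cut at the wrong key).
var QA1_Dtmf = { text: "1235" };

// Hypothetical helper: choose re-prompt text by input mode ("speech" or "dtmf").
function noRecoPrompt(mode, dtmfText) {
  if (mode === "dtmf") {
    if (!dtmfText) {
      return "I did not receive any keys. Please try again.";
    }
    // In DTMF mode we know exactly which keys were pressed, so read them back one by one.
    return "You entered " + dtmfText.split("").join(" ") + ". That is not valid. Please try again.";
  }
  // Speech noreco: we do not know what the caller said.
  return "Sorry, I did not understand you.";
}

console.log(noRecoPrompt("dtmf", QA1_Dtmf.text));
// prints: You entered 1 2 3 5. That is not valid. Please try again.
console.log(noRecoPrompt("speech"));
// prints: Sorry, I did not understand you.
```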
|
| Q. | How do I migrate applications from Beta 2 to Beta 3 of the Microsoft Speech Application SDK? |
| A. | Now that the Beta 3 version of the Microsoft Speech Application SDK is available, it will be important to understand how to migrate your current applications from Beta 2. The document Migrating Applications from Beta 2 to Beta 3 of the Microsoft Speech Application SDK describes the changes made to various components of the SDK, and can assist you in this migration. |