Frequently Asked Technical Questions - SDK

This topic contains frequently asked questions and answers about using the Microsoft Speech Application SDK (SASDK). Some of the questions and answers refer to the Speech Application Language Tags (SALT) specification, which is available on the SALTforum Web site.


Q.When authoring a multi-language DTMF application, what value should I give the mode attribute of the grammar tag in my grammar files?
A.

When creating applications for multi-language deployment, ensure that the mode attribute for the grammar tag in all grammar files is set to dtmf. This ensures that the grammar file can be loaded into any engine regardless of the language specified.

Q.How do I cause my application to raise a server event when the value of a particular semantic item changes?
A.

Enable the AutoPostBack property of the semantic item that you want to monitor.

Q.Can I confirm all 10 digits of a telephone number by using the Phone Application control?
A.

No. This control confirms either the area code or the local number, but not all 10 digits together.

Work around this problem in one of two ways:

Write an Application control to wrap the Phone control.

Use the AlphaDigit control instead of the Phone control.

Q.Can I associate multiple extraction ids with a single word in the prompt database? For example, if the utterance "zero" is recorded into Rec0000.wav, can I place the ids [zero] and [0] on the same word?
A.

No.

Q.How do I set the bargein type for a particular Speech Control?
A.

To set the bargein type for a particular Speech Control, do the following:

Create a SpeechControlSettingsItem element within the SpeechControlSettings element of the Web.config file, and assign it a unique id value.

Add a settings attribute to the HTML code for the targeted Speech Control, using the id value of the SpeechControlSettingsItem as the value of the settings attribute.

For example, suppose that you want to set the bargein type of a QA control in your speech application to speech. In the Web.config file, add the following SpeechControlSettingsItem element that has an id attribute with the value QAStyle:

<configuration>
   <system.web>
       <!-- ... -->
   </system.web>
   <speechControlSettings>
       <speechControlSettingsItem id="QAStyle">
           <qa>
               <prompt>
                   <Params>
                       <speech:Param Name="bargeintype">speech</speech:Param>
                   </Params>
               </prompt>
           </qa>
       </speechControlSettingsItem>
   </speechControlSettings>
</configuration>

Next, as shown in the following code, add a settings attribute with the value QAStyle to the speech:QA element of the QA control for which you want to set the bargein type to speech:

<speech:QA settings="QAStyle" id="QA1" runat="server"></speech:QA>

Alternatively, you can add a Params collection containing a Param element directly to the QA, as shown in the following code:

<speech:QA id="QA1" style="Z-INDEX: 102; LEFT: 304px; POSITION: absolute; TOP: 176px" runat="server">
    <Dtmf ID="QA1_Dtmf"></Dtmf>
    <Reco ID="QA1_Reco"></Reco>
    <Prompt InlinePrompt="hello" ID="QA1_Prompt">
        <Params>
            <speech:Param Name="bargeintype">speech</speech:Param>
        </Params>
    </Prompt>
</speech:QA>

Note To set the bargein type globally for all Speech Controls of a particular type, create a SpeechControlSettingsItem element within the SpeechControlSettings element of the Web.config file, and assign it the id value globalStyle (case-sensitive).

Q.Prompt validation does not validate prompts that are embedded in Application Controls. Is this expected behavior?
A.

Yes, this is expected behavior. For more information, see the topic "Validating Prompts" in the SASDK Help documentation.

Q.What is the recommended practice for verifying that ASP.NET Speech Controls (Speech Controls) are installed on a Web server?
A.

Verify that one of the following two registry keys exists in the Web server registry:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall\{65FE981F-A24C-4B44-A559-70A22D63B153}
Note This key is registered when the Speech Controls standalone installer installs Speech Controls.

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall\{4EBB919B-F15C-41BD-882A-FF3246613D41}
Note This key is registered when the Speech Application SDK installer installs Speech Controls.

Q.Data Transformation Services (DTS) takes three hours to import a 940 MB log. Is this expected behavior?
A.

Yes, this is expected behavior. If your logs include audio that you do not need, removing audio from logging should improve import performance.

Q.Are the rejection and confirmation threshold values logged by QAsummary associated with the SemanticItem object?
A.

No. These values are associated with utterances.

Q.How do I return to a previous QA?
A.

A QA is active only when its semantic item has not been filled. To return to a previous QA, clear out its semantic item value.

Q.How do I clear the semantic items for a QA control?
A.

Add the following code to your script to loop over all the answers of the QA and clear the value of each semantic item, where myQA is the name of your QA:

// Clear the semantic item bound to each answer of the QA.
for (var i = 0; i < myQA.answers.length; i++) {
    var theSI = myQA.answers[i].semanticItem;
    theSI.Clear();
}

The previous code loops over the answers of a QA. To loop over the ExtraAnswers or Confirms collection of the QA instead, replace myQA.answers in the loop with one of the following:
     myQA.extraAnswers
     -or-
     myQA.confirms
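To clear all three collections in one pass, the loop can be wrapped in a helper such as the following sketch. The function name clearAllSemanticItems is hypothetical; like the loop above, it assumes that each collection item exposes a semanticItem with a Clear() method, and it skips any collection that is not defined on the QA.

```javascript
// Hypothetical helper: clears every semantic item referenced by a QA's
// Answers, ExtraAnswers, and Confirms collections.
// Assumes each item exposes a semanticItem with a Clear() method, as the
// loop above does; collections that are undefined are skipped.
function clearAllSemanticItems(qa) {
    var collections = [qa.answers, qa.extraAnswers, qa.confirms];
    for (var c = 0; c < collections.length; c++) {
        var items = collections[c];
        if (!items) {
            continue;
        }
        for (var i = 0; i < items.length; i++) {
            var theSI = items[i].semanticItem;
            if (theSI) {
                theSI.Clear();
            }
        }
    }
}
```

Usage: call clearAllSemanticItems(myQA) wherever you would otherwise run the three loops.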

Q.What is the difference between the Reject property of the Reco object, and the Reject property of the Answer object?
A.

The value of Reco.Reject is used by the listen object to determine whether to reject the recognition result and send a NoReco exception to RunSpeech. If a recognition is successful, RunSpeech uses the value of an answer's Answer.Reject property to determine whether to reject the answer as a possible match with the recognition result.

The Answer.Reject property is typically used in cases where a Reco object listens for one of several candidate answers. In these cases, the Answer.Reject property of each candidate answer sets the threshold value that RunSpeech compares to the recognition confidence value of the associated recognition result. If the confidence value of the recognition result is lower than the rejection threshold value of a particular answer, RunSpeech ignores that answer. In effect, when conditions such as poor telephone-line quality lower recognition confidence, more answers fall below their thresholds and fewer remain as candidates; when recognition is exceptionally good, most or all candidate answers pass their thresholds and remain in contention.
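The two rejection stages can be summarized in a simplified model. The following sketch is not SASDK code: the function name and the object shapes (id and reject fields) are hypothetical, and for simplicity a single confidence value stands for the whole recognition result.

```javascript
// Simplified, hypothetical model of Reco.Reject and Answer.Reject.
// answers: array of { id: string, reject: number } (illustrative shape only).
function evaluateRecognition(confidence, recoReject, answers) {
    // Stage 1 (Reco.Reject): below this threshold the listen object rejects
    // the result, and RunSpeech receives a NoReco.
    if (confidence < recoReject) {
        return { noReco: true, candidates: [] };
    }
    // Stage 2 (Answer.Reject): an answer is ignored when the result's
    // confidence is lower than that answer's rejection threshold.
    var candidates = [];
    for (var i = 0; i < answers.length; i++) {
        if (confidence >= answers[i].reject) {
            candidates.push(answers[i].id);
        }
    }
    return { noReco: false, candidates: candidates };
}
```

With answers whose thresholds are 0.2, 0.5, and 0.8, a result with confidence 0.6 keeps only the first two answers as candidates.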

Q.What is the relationship between rank and confidence in an n-Best list?
A.

Rank and confidence are usually (but not always) correlated. Rank indicates the certainty with which the speech recognition (SR) engine recognizes an utterance, while confidence indicates the degree of certainty that the recognition is correct. Rank is one of the factors that the SR engine uses to calculate confidence.

Q.Can I specify additional answers for any of the Application Speech Controls? For example, can I add "tomorrow at two" to the Date control in such a way that the Date control produces a semantic item for both "tomorrow" and "at two"?
A.

This is not supported. Application Speech Control functionality is encapsulated by design.

Q.Is there a way to determine which constituent of an Application Speech Control is being activated at any one point in time? For example, the CreditCardDate control appears to contain three internal QA controls. Can I get the id of the QA that is active at a given point in time?
A.

No. Application Speech Control functionality is encapsulated, and as a consequence, the internal QA ids of controls such as the CreditCardDate control are hidden by design.

Q.Which scripting engines are used by the SALT interpreters in the Microsoft Speech Platform?
A.

There are four SALT interpreters in the Speech Platform. The following table lists each SALT interpreter and the JScript interpreter it uses.

SALT interpreter                              JScript interpreter used
Speech Add-in for Internet Explorer           JScript version 5.5
Speech Add-in for Pocket Internet Explorer    JScript version 3.0
Telephony Application Simulator               JScript .NET
Telephony Application Services                JScript .NET

Q.How do I use the SASDK MakeCall control in an application?
A.

The .aspx page on which you want to use the MakeCall control should contain the following declaration, where the value of the CalledDirectoryNumber attribute is the phone number that you want to call. In this example, the control places a call to the phone number 123-4567.

<speech:MakeCall id="MakeCall" style="Z-INDEX: 101; LEFT: 8px; POSITION: absolute; TOP: 16px"
     runat="server" CalledDirectoryNumber="1234567"></speech:MakeCall>

Q.How do I terminate dual tone multi-frequency (DTMF) input using the pound (#) key?
A.

Handle this in the recognition grammar. In a DTMF grammar, create a rule that contains a phrase consisting of the pound sign (#).

Q.How do I write effective grammars for speech applications?
A.

A comprehensive discussion of how to write effective grammars is beyond the scope of this documentation. However, the following guidelines provide a solid foundation.

1.

Do not generate unnecessary grammar rules. As a rule, do not add words or phrases unless there is evidence that callers will use them. Additional words in rules reduce recognition accuracy, increase resource use, and lower performance. For example, if a voice dialer application prompts the caller to say a first name followed by a last name, write the recognition rule so that names are recognized only when spoken in that order. In other words, do not write the rule so that it also recognizes a last name followed by a first name.

2.

Do not create unnecessary rule hierarchies in the grammar. The Microsoft Speech API (SAPI) can optimize grammars at the rule level: duplicated paths within a rule are merged automatically to improve performance, but SAPI cannot optimize across referenced rules. For this reason, avoid unnecessary hierarchies in your grammars.

3.

Normalize text in the grammars. Do not assume that the speech recognition engine knows how to pronounce abbreviations or proper names in a grammar. For example, replace "James Allen III" with "James Allen the third." When in doubt, add a pronunciation for the word in question by using the token element.

4.

Avoid phonetically confusing phrases. If possible, remove phonetically confusing words or phrases from the rules, and use the application's prompts to tell callers which words to use. For example, the words "next" and "exit" sound very similar over a phone line. In a recognition rule where "next," "exit," and "quit" may all be appropriate responses, use only the words "next" and "quit" in the rule, and tell the caller to say either "next" or "quit" in response to the prompt.

5.

When barge-in is allowed, reduce the set of words (initial fanouts) that start the rule. Limiting the number of words that can start a rule significantly reduces the number of false barge-ins that a speech recognition engine triggers as it tries to detect the first word of a rule.

6.

Optimize the number and content of semantic markup language (SML) elements embedded in the grammar. Values generated by expressions contained in SML elements require memory. As the number of SML elements increases, and as the complexity of processing the expressions contained in these elements increases, demands on memory increase, and performance is negatively affected.

7.

Use the SASDK grammar library as much as possible. The rules in the SASDK grammar library provide good coverage of common speech recognition tasks and are optimized to achieve high accuracy and efficiency.
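Guideline 5 refers to a rule's initial fanout. As an illustration only, the following sketch models a rule as a flat list of alternative phrases (an assumption; real grammars are XML files with nested and referenced rules) and returns the distinct words that can start it. The function name initialFanout is hypothetical.

```javascript
// Illustration only: models a rule as a flat list of alternative phrases and
// returns the distinct words that can start it (its initial fanout).
function initialFanout(phrases) {
    var seen = {};
    var firstWords = [];
    for (var i = 0; i < phrases.length; i++) {
        // Normalize case and surrounding whitespace, then take the first word.
        var first = phrases[i].trim().toLowerCase().split(/\s+/)[0];
        if (first && !Object.prototype.hasOwnProperty.call(seen, first)) {
            seen[first] = true;
            firstWords.push(first);
        }
    }
    return firstWords;
}
```

For the phrases "transfer money", "transfer funds", "check balance", "check my balance", and "talk to an operator", the fanout is transfer, check, and talk. Adding a carrier phrase such as "I want to transfer money" would add "i" as another possible first word.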

Q.What is the difference between the developer version and the end-user version of Microsoft Internet Explorer Speech Add-in provided in the SASDK?
A.

The developer version enables debugging and provides support for the SASDK tools environment. It is identified as follows:

<!-- Microsoft Internet Explorer Speech Add-in-->
<object id="SpeechTags" CLASSID="clsid:DCF68E5B-84A1-4047-98A4-0A72276D19CC" VIEWASTEXT>
</object>

The end-user version does not provide these features. It is identified as follows:

<!-- Microsoft Internet Explorer Speech Add-in-->
<object id="SpeechTags" CLASSID="clsid:33cbfc53-a7de-491a-90f3-0e782a7e347a" VIEWASTEXT>
</object>

Q.Do I need to replace the CLSID for the developer version of the Speech Add-in component with the end-user Speech Add-in for my final page?
A.

If you develop your application by using the QA control from the SDK, you do not need to replace the CLSID: the QA control detects which Speech Add-in component is installed and renders accordingly. If you develop your application by using SALT directly, or if your application mixes in SALT code (for example, to provide prompt support in multimodal mode), you must configure the IIS MIME types.

Q.Can I install both the developer and end-user versions of the Speech Add-in components on the same computer?
A.

No, both versions cannot be installed on the same computer. If one version is already installed, the SASDK installer does not install the other version.

Q.Can I test the final speech-enabled Web page on my development computer?
A.

Yes. However, we strongly recommend that for testing, you use a clean computer that is configured with the same environment the user will have.

Q.How do I detect whether the client browser supports SALT?
A.

Use the user agent string that is documented in the SALT 1.0 specification. The following is an example of a user agent string:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461; Q312461; .NET CLR 1.0.3328; SALT 1.0.1023 3583)

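The following sketch shows one way to look for the SALT token in a user agent string such as the one above. It assumes that the token is "SALT" followed by a dotted version number, as in the example; the function names are hypothetical. In a browser, you would pass navigator.userAgent.

```javascript
// Hypothetical helper: returns the version that follows the "SALT" token in a
// user agent string (for example "1.0.1023"), or null if there is none.
function getSaltVersion(userAgent) {
    var match = /\bSALT\s+(\d+(?:\.\d+)*)/.exec(userAgent);
    return match ? match[1] : null;
}

// True when the user agent string advertises SALT support.
function supportsSalt(userAgent) {
    return getSaltVersion(userAgent) !== null;
}
```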
Q.Can I have a QA that can respond to either <listen> or DTMF? For example, can I say "Please say your PIN number or enter your password."?
A.

Yes, you can. Detailed information on supporting DTMF and <listen> can be found in the SALT specification under section 2.3.6 "Using Listen and DTMF simultaneously". SALT enables this in two ways: "(i) the disabling of initial timeouts on the other mode on detection of input, and (ii) the automatic cancellation of one mode when the other mode comes to an end."

Q.What wave file recording format is supported by the SASDK?
A.

The SASDK supports only 8 kHz, 8 bit, mono, PCM recording.
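Before adding recordings to the prompt database, you might want to check that a file is in this format. The following sketch is a hypothetical helper, not an SASDK tool: it reads the RIFF/WAVE header from a Uint8Array holding the start of the file (for example, file contents loaded in a script runtime such as Node.js) and checks the fmt chunk for format tag 1 (PCM), one channel, 8000 samples per second, and 8 bits per sample.

```javascript
// Hypothetical helper: true if a WAV header describes 8 kHz, 8-bit, mono PCM.
// bytes: Uint8Array containing at least the RIFF header and the fmt chunk.
function isSasdkRecordingFormat(bytes) {
    var view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
    function tag(offset) {
        return String.fromCharCode(bytes[offset], bytes[offset + 1],
                                   bytes[offset + 2], bytes[offset + 3]);
    }
    if (bytes.length < 12 || tag(0) !== "RIFF" || tag(8) !== "WAVE") {
        return false;
    }
    // Walk the chunks that follow the 12-byte RIFF header until "fmt " is found.
    var offset = 12;
    while (offset + 8 <= bytes.length) {
        var size = view.getUint32(offset + 4, true); // little-endian chunk size
        if (tag(offset) === "fmt ") {
            if (offset + 24 > bytes.length) {
                return false; // truncated fmt chunk
            }
            var formatTag = view.getUint16(offset + 8, true);   // 1 = PCM
            var channels = view.getUint16(offset + 10, true);
            var sampleRate = view.getUint32(offset + 12, true);
            var bitsPerSample = view.getUint16(offset + 22, true);
            return formatTag === 1 && channels === 1 &&
                   sampleRate === 8000 && bitsPerSample === 8;
        }
        offset += 8 + size + (size % 2); // chunk data is padded to an even size
    }
    return false;
}
```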

Q.Can I debug into a prompt function in the SASDK?
A.

Yes. You can add a debugger statement in the code of the prompt function. When the prompt function is executed, the debugger starts.
Note Script debugging must be enabled in Internet Explorer (on the Tools menu, click Internet Options, and then on the Advanced tab, clear the Disable script debugging check box).

Q.How do I pass a textbox string into a prompt function?
A.

Enter document.all["TextBox1"].value in the Runtime Value column of the Prompt Function Editor, where TextBox1 is the id of your text box.

Q.How do I play a recorded prompt that is mixed with TTS voice?
A.

Use a <div/> element in the prompt function. For example, suppose that you have recorded the prompt: "you are flying from", and you want to follow this prompt with the name of a city spoken using a TTS voice. Write the prompt function as follows:

{
    return "you are flying from <div/>" + runtime_city_name_variable;
}

Alternatively, you can use the <tts> element to specify content that should be spoken using the TTS voice, as in the following example.

{
    return "you have email from <tts> Joe </tts>";
}

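Written as a complete function that takes its runtime value as a parameter, the first example might look like the following sketch. The function name FlyingFromPrompt and its cityName parameter are hypothetical; in the Prompt Function Editor, the runtime value would typically be supplied through the Runtime Value column (for example, document.all["TextBox1"].value).

```javascript
// Hypothetical prompt function for the "you are flying from" example.
// "you are flying from" can be matched to a recording in the prompt database;
// as described above, the city name after <div/> is spoken by the TTS voice.
function FlyingFromPrompt(cityName) {
    return "you are flying from <div/>" + cityName;
}
```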
Q.My prompt function has many strings that are not in the prompt database. How do I extract the prompt string from the Prompt Function Editor to the prompt database tool so that I do not need to type it twice?
A.

Do the following:

1.

Open the prompt function using Prompt Function Editor.

2.

On the View menu, click Prompt Validation. The Prompt Validation tool appears.

3.

Click the Do Validate Solution button, which is the third button from the left. Any words that are not in the prompt database are displayed in red.

4.

In the Prompt Validation Results window, select a word (or sentence) that is displayed in red, and then click the Add to Database button. This will add the prompt to the prompt database.

Q.Can I run the SASDK sample applications on Microsoft Speech Server?
A.

Yes. Manage the sample applications like any other speech application using Speech Application Deployment Services (SADS). See the Microsoft Speech Server Help documentation for information on using SADS.

Q.Can I programmatically change the TTS prompt voice that is used by the SASDK?
A.

No.

Q.Can I pass a client parameter to the server in the URL that points to the server?
A.

Yes. For example, your URL could be "http://MyServer/webform1.aspx?a=1&b=2&c=3". In the code-behind file, you can use code similar to the following, which displays "1" in a text box:

string str1 = Request.QueryString["a"]; // null if "a" is not in the URL
TextBox1.Text = (str1 != null) ? str1 : String.Empty;

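On the client, the URL can be assembled from script. The following sketch uses a hypothetical helper, buildQueryUrl, that percent-encodes each name and value with the standard encodeURIComponent function, so that values containing spaces or ampersands arrive intact in Request.QueryString.

```javascript
// Hypothetical helper: appends name/value pairs to a URL as a query string.
function buildQueryUrl(baseUrl, params) {
    var parts = [];
    for (var name in params) {
        if (Object.prototype.hasOwnProperty.call(params, name)) {
            parts.push(encodeURIComponent(name) + "=" +
                       encodeURIComponent(String(params[name])));
        }
    }
    if (parts.length === 0) {
        return baseUrl;
    }
    // Use "&" if the URL already has a query string, otherwise start one with "?".
    var separator = baseUrl.indexOf("?") === -1 ? "?" : "&";
    return baseUrl + separator + parts.join("&");
}
```

For example, buildQueryUrl("http://MyServer/webform1.aspx", { a: 1, b: 2, c: 3 }) produces the URL shown above.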
Q.How do I pass a "hangup" event to the Web server?
A.

This mechanism is not implemented. As a workaround, write a string into a text box on the Web form from client-side script, and then read the string in server-side code.
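A minimal sketch of the client side of this workaround follows. The function name, the "hangup" marker string, and the use of a form post-back are assumptions; textBox and form stand in for the text box and form elements of your Web form, and what triggers the call depends on your application. Server-side code can then read the text box value.

```javascript
// Hypothetical workaround sketch: record the hangup in a text box and post the
// form back so that server-side code can read the marker value.
// textBox and form stand in for DOM elements of the Web form.
function signalHangup(textBox, form) {
    textBox.value = "hangup";
    form.submit();
}
```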

Q.Why does the TTS engine sometimes spell out a word that is in uppercase letters instead of rendering the word as a single entity?
A.

Words represented in uppercase letters must be included in the TTS lexicon in order to be treated as a single word. To avoid this issue, use lowercase letters to represent the words that you send to TTS. For example, instead of using <tts>EXAMPLE</tts>, use <tts>example</tts>.
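If the text comes from a data source you do not control, a small filter can apply this advice automatically. The following hypothetical helper lowercases every word of two or more capital letters before the text is placed in a <tts> element. Note that it also lowercases acronyms that you may want spelled out, so adjust the pattern to your data.

```javascript
// Hypothetical helper: lowercases all-capital words (two or more letters) so
// that TTS reads them as words instead of spelling them out.
function lowercaseAllCapsWords(text) {
    return text.replace(/\b[A-Z]{2,}\b/g, function (word) {
        return word.toLowerCase();
    });
}
```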

Q.Why do I sometimes see a result returned in the SML output displayed in the Speech Debugging Console, but the result is not bound to a text box?
A.

This is usually caused by the value of the ConfirmThreshold property. If you use a confirm threshold and the confidence score of the returned result is lower than the value of the ConfirmThreshold property, the SemanticItem state is set to "NeedConfirmation". The result is not bound to the text box until the SemanticItem is confirmed. To work around this, do one of the following:

     Provide a confirm QA.
     - or -
     Change the SemanticItem in the SemanticMap control, and set its binding behavior to "BindOnChanged".
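The following sketch restates the default behavior as a simplified model. It is not SASDK code: the function names are hypothetical, and "Confirmed" as the state name for results at or above the threshold is an assumption for illustration.

```javascript
// Simplified, hypothetical model of how ConfirmThreshold affects binding.
function semanticItemState(confidence, confirmThreshold) {
    return confidence < confirmThreshold ? "NeedConfirmation" : "Confirmed";
}

// By default, the value is bound to the text box only after confirmation.
function shouldBindToTextBox(state) {
    return state === "Confirmed";
}
```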

Q.How do I test the AlphaDigit Application Speech Control using text?
A.

Set the value of the InputMask property of the control so that it accepts text. For example, if the InputMask property is set to AAADDD, the control recognizes a series of 3 letters followed by 3 digits, such as "a. b. c. one two three".
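To check compact test strings such as abc123 against a mask, the following sketch converts a mask into a regular expression. It assumes only the two mask characters described here, A for one letter and D for one digit, and rejects anything else; the control's full InputMask syntax may differ, and the function name is hypothetical.

```javascript
// Hypothetical helper: converts an input mask such as "AAADDD" into a
// regular expression. Assumes "A" = one letter and "D" = one digit only.
function inputMaskToRegExp(mask) {
    var pattern = "";
    for (var i = 0; i < mask.length; i++) {
        var c = mask.charAt(i);
        if (c === "A") {
            pattern += "[A-Za-z]";
        } else if (c === "D") {
            pattern += "[0-9]";
        } else {
            throw new Error("Unsupported mask character: " + c);
        }
    }
    // Anchor the pattern so the whole string must match the mask.
    return new RegExp("^" + pattern + "$");
}
```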

Q.How do I specify a remote Speech Engine Services (SES) service to an application using SALT?
A.

Add a param element named server to the listen element, as shown in the following example:

<speech:listen id="...." ............. >
    <speech:param name="server">http://Server_Name/SpeechServerWeb/Lobby.asmx</speech:param>
</speech:listen>

Q.How can I play a wave prompt using SALT?
A.

Create a reference to the .wav file using the <content> element, as shown in the following example:

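<html xmlns:salt="http://www.saltforum.org/2002/SALT">
    <!-- Load the Speech Add-in and bind the salt: prefix to it -->
    <object id="SpeechTags" CLASSID="clsid:DCF68E5B-84A1-4047-98A4-0A72276D19CC" VIEWASTEXT></object>
    <?import namespace="salt" implementation="#SpeechTags" />
    <body onload="prmt.Start()">
        <input id="ButtonCityListen" type="button" value="Play" onclick="prmt.Start()" />

        <!--SALT-->
        <salt:prompt id="prmt">
            <salt:content href="./1.wav"></salt:content>
        </salt:prompt>
    </body>
</html>
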
Q.How can I set the BargeInType value of Application Speech Controls when I set the SpeechBargeIn = "engine"?
A.

Set the BargeInType value in the Web.config file. Insert the following code inside the <configuration> element of the Web.config file.

<SpeechControlSettings>
    <speechControlSettingsItem id="globalStyle">
        <QA>
            <Prompt>
                <Params>
                    <Param Name="bargeintype">grammar</Param>
                </Params>
            </Prompt>
        </QA>
    </speechControlSettingsItem>
</SpeechControlSettings>

Q.How can I use the Navigator Application Speech Control and leave the DataContentFields property empty? (I just need the values of the DataHeaderFields property.)
A.

Set the DisableColumnNavigation property to true.

Q.Can I raise the volume in a QA?
A.

Yes. Using SALT markup, adjust the volume using the volume attribute of the prosody element, as shown in the following example:

<html xmlns:salt="http://www.saltforum.org/2002/SALT">
<object id="SpeechTags" CLASSID="clsid:DCF68E5B-84A1-4047-98A4-0A72276D19CC" VIEWASTEXT></object>
<?import namespace="salt" implementation="#SpeechTags" />
  <body onload="promptHello.Start()">
    <salt:prompt id="promptHello">
        <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">

            Hello. <emphasis> Hello. </emphasis> <prosody volume = "20"> quiet </prosody>

        </speak>  
    </salt:prompt>
  </body>
</html>

To do the same with a QA control, use markup such as the following as the QA's inline prompt:

<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
    <prosody volume="10"> quiet text.</prosody>
    <prosody volume="80"> loud text.</prosody>
</speak>

Q.Why can't I find any prompt function files for the SASDK Application Speech Control samples?
A.

The promptSelectFunction functions used in the SASDK samples are contained in .js files (not in .pf files). For example, the prompt function for the sample "Navigating Tabular Data" is contained in the PromptHelper.js file associated with that sample.
