Tips and Tricks Archive

Want some helpful hints for getting the most out of the Microsoft Speech Application SDK? Check out the tips and tricks section.

Have other questions or comments? Join the discussion about the Microsoft Speech Application SDK. Visit our newsgroup at microsoft.public.netspeechsdk.

These tips and tricks are for the Beta 2 version of the Microsoft .NET Speech SDK.


Q. How do you monitor how many times an exception is received in the PromptSelectFunction attribute?
A.

The PromptSelectFunction attribute has a default parameter called 'count,' which tells you how many times the PromptSelectFunction has been called. However, this parameter cannot tell you which exception the PromptSelectFunction received. For example, what if you want to know how many times the NoReco exception or the Silence exception was received?

Sample 2 in the Microsoft .NET Speech SDK Beta 2 shows how to monitor the exceptions received in the PromptSelectFunction using client-side ViewState variables, which allows the counter (RunSpeech.ClientViewState["nMumbleSilenceCount"]++;) for NoReco and Silence to be incremented dynamically as follows:

<SCRIPT>
{
  var sUnsure = "";
  var sSilence = "";
  var sPrompt = "";
  
  var sMain = "Please select Black or White.";
    
  if (lastCommandOrException == "NoReco"){
    RunSpeech.ClientViewState["nMumbleSilenceCount"]++;
	if (RunSpeech.ClientViewState["nMumbleSilenceCount"] == 1)
		sUnsure = "I'm sorry, I didn't understand that. ";
	else
		sUnsure = "I'm sorry, my fault again. ";
	
	sPrompt = sUnsure + sMain;
	
  } else if (lastCommandOrException == "Silence") {
    RunSpeech.ClientViewState["nMumbleSilenceCount"]++;
	if (RunSpeech.ClientViewState["nMumbleSilenceCount"] == 1)
		sSilence = "I'm sorry, I didn't hear you. ";
	else
		sSilence = "I'm sorry, my fault again. ";
  
    sPrompt = sSilence + sMain; 
  } else {
    RunSpeech.ClientViewState["nMumbleSilenceCount"]=0;
    sPrompt = sMain;
  }
 
  return(sPrompt);
}</SCRIPT>

The side effect of this approach is that the Prompt Editor cannot validate the following statement:

RunSpeech.ClientViewState["nMumbleSilenceCount"]++;

The user must comment out this statement during the prompt validation process within the Prompt Editor. We propose an alternative approach that avoids this problem:

1. Create a QA control with a speech index of 1, and name it QA1.

2. In the clientActivation function of QA1, check lastCommandOrException for NoReco and Silence. For example, if lastCommandOrException == "NoReco", increment the NoReco counter (noRecoCount) by 1.

3. Give QA1 a prompt string such as "this is a fake prompt", because the Beta 2 SDK does not support an empty prompt string.

4. Return false from the clientActivation function of QA1, so that the fake prompt for QA1 is never played.

<script>
var silenceCount = 0;
var noRecoCount = 0;

function LastCommandOrException_Tracking()
{
	if (SpeechCommon.lastCommandOrException == "NoReco")
	{
		noRecoCount++;
	}
	else if (SpeechCommon.lastCommandOrException == "Silence")
	{
		silenceCount++;
	}
	return false;
}
</script>

5. In the PromptSelectFunction attribute of the other QA controls, pass the exception counters for NoReco (noRecoCount) and Silence (silenceCount) in the Runtime variable column of the PromptSelectFunction. If you add the noRecoCount counter to the Runtime variable column first, a variable called "param1" appears in the Parameter Name column. If you then add the silenceCount counter, another variable called "param2" appears in the Parameter Name column. "param1" and "param2" then stand for "noRecoCount" and "silenceCount", respectively. The PromptSelectFunction can be implemented as follows:

{
    if (lastCommandOrException == "NoReco")
    {
        // param1 carries noRecoCount
        if (param1 == 3)
            return "transferring to operator";
        else
            return "I can not recognize anything";
    }
    else if (lastCommandOrException == "Silence")
    {
        // param2 carries silenceCount
        if (param2 == 2)
            return "no data is coming through";
        else
            return param2;
    }
    else
    {
        return "where are you flying from? ";
    }
}
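Unlike Sample 2, the tracking function shown in step 4 never resets its counters. The sketch below is one possible variant that resets both counters once a turn completes without NoReco or Silence, mirroring the else branch of Sample 2. It assumes lastCommandOrException holds some other value after a successful recognition; the SpeechCommon stub is only there so that the sketch runs on its own, and would be omitted in a real page where the speech runtime supplies that object.

```javascript
// Stand-in for the runtime's SpeechCommon object (an assumption for this
// sketch only; omit it in a real page).
var SpeechCommon = { lastCommandOrException: "" };

var silenceCount = 0;
var noRecoCount = 0;

// Activation function for QA1: counts consecutive NoReco and Silence
// exceptions and clears both counters after any other outcome.
function LastCommandOrException_Tracking()
{
    if (SpeechCommon.lastCommandOrException == "NoReco")
    {
        noRecoCount++;
    }
    else if (SpeechCommon.lastCommandOrException == "Silence")
    {
        silenceCount++;
    }
    else
    {
        // Any other outcome ends the run of failures.
        noRecoCount = 0;
        silenceCount = 0;
    }
    // Returning false keeps QA1 inactive, so its fake prompt is never played.
    return false;
}
```

With this variant, a check such as param1 == 3 means three failures since the last successful turn, not three failures over the whole call.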

Q. I want my application to be able to speak U.S. telephone numbers in a recorded voice. When I just record the digits 0-9 and use them to construct the phone numbers, the resulting output sounds strange and artificial.
A.

Clearly you cannot record every possible telephone number and then pick a whole recording each time. However, you can produce much more natural output by recording the digits in each of the contexts in which they will appear in the final phone number. For example, take the phone number:

203 535 3245

Each instance of the digit 3 sounds very different because of its position in the phone number. People normally read phone numbers with a rising intonation for the first block, a relatively flat intonation during the second, and a falling intonation during the final block.

The key to getting natural-sounding output is to capture each of the digits in every position in the phone number, tag these contextually different recordings appropriately, and then use them to construct the final number.

In order to make a reasonable* capture of all of the positional contexts, one only has to make twelve recordings. An example of the required recordings is shown below.

321 230 1234
132 302 4123
213 023 3412
654 879 2341
465 798 5678
546 987 8567
987 546 7856
798 465 6785
879 654 9012
023 213 2901
302 132 1290
230 321 0129

Every digit appears once in each of the ten positions. The digit 5, for example, is captured in every positional context by the fourth through ninth recordings.

You should then tag each of the digits with a tag that can be easily constructed programmatically in the prompt functions. For example, the tag b3_4_9 refers to the digit 9 in position 4 of the 3rd block of numbers.

The output from the prompt function would then be a string of the form "<WITHTAG TAG=b1_1_2> 2 </WITHTAG> <WITHTAG TAG=b1_2_0> 0 </WITHTAG> ...".
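As a minimal sketch of how such a string might be assembled, the function below builds the tagged output for a 10-digit number using the b<block>_<position>_<digit> scheme described above. The function name buildPhonePrompt is hypothetical, and the sketch assumes the closing tag is written </WITHTAG>.

```javascript
// Hypothetical helper: builds the tagged prompt string for a 10-digit
// U.S. phone number, one WITHTAG element per digit.
function buildPhonePrompt(phoneNumber) {
  // Keep digits only, so "203 535 3245" and "(203) 535-3245" both work.
  var digits = String(phoneNumber).replace(/\D/g, "");
  if (digits.length !== 10) {
    throw new Error("Expected a 10-digit phone number, got: " + phoneNumber);
  }

  var blockLengths = [3, 3, 4]; // area code, exchange, line number
  var parts = [];
  var index = 0;

  for (var block = 0; block < blockLengths.length; block++) {
    for (var pos = 1; pos <= blockLengths[block]; pos++) {
      var digit = digits.charAt(index++);
      // Tag format: b<block>_<position>_<digit>, for example b3_4_9.
      var tag = "b" + (block + 1) + "_" + pos + "_" + digit;
      parts.push("<WITHTAG TAG=" + tag + "> " + digit + " </WITHTAG>");
    }
  }
  return parts.join(" ");
}
```

Calling buildPhonePrompt("203 535 3245") yields ten tagged digits, beginning with the b1_1_2 and b1_2_0 tags and ending with b3_4_5.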

*Note that this method only captures the positional context of the digits, not the coarticulatory effects of different bordering digits. For example, in natural speech the 2 in 206 and in 216 will sound slightly different because of the bordering digit.

Q. How do I reuse the command object across pages?
A.

One of the best ways to create a reusable command object across pages is to create a User Control.

There is one trick though: you've got to remember to set the scope for the commands! The scope doesn't default to "every QA", which means that you need to specify the name of the parent container. You can use a bit of server-side code to get the name of the container object and you can also add properties to your User Control, if you desire!

Q. Some of the prompts in my application need to be changed frequently. Do I need to recompile and redeploy a huge .prompts file every time I change a prompt?
A.

Your application can use more than one .prompts file simultaneously, so you can put the prompts that change frequently in separate, smaller databases. When a prompt changes, you can recompile and redeploy just that file. Use "Manage Prompt Databases..." (from the prompt panel of a speech control property builder) to add additional prompt databases when creating your application.

Alternatively, you can use the SALT content tag to set a frequently changed prompt to reference a single wave file, which you can then update as necessary. To do this, enter "<salt:content href='/MyWav.wav' />" as the text of your inline prompt, where /MyWav.wav is a URI referencing the intended audio.

Q. How do I add prompts to a multimodal application?
A.

So you've installed your copy of the Microsoft .NET Speech SDK Beta 2 and you want to write your "Hello World" test application. As a first step, you decide that you can use a QA (Question-Answer) control and specify its Prompt property, right? And you can, for the most part.

However, when the application is in multimodal mode, prompts will not play, as those of you who have tried this already know. QA controls are designed for spoken dialog (voice-only interaction), where the prompting strategy is closely tied to the dialog flow. For multimodal browsers, QA controls "degenerate" to the equivalent of a SALT <listen> tag, enabling the "tap-and-talk" authoring paradigm.

If you want to enable spoken dialog in a multimodal application, we recommend adding speech synthesis (or waveform concatenation) capabilities to your application by using the SALT <prompt> tag directly.

Let's imagine a simple multimodal application that uses a text box to capture a destination city. This is shown below:

<HTML>
  <body>
    <form id="Form1" method="post" runat="server">
      <asp:Label id="Label1" runat="server">
		Destination city:</asp:Label>
      <asp:TextBox id="DestinationTextBox" 
		runat="server"></asp:TextBox>
    </form>
  </body>
</HTML>

To speech-enable the text box, you would add a SemanticMap control and a QA control, and set the appropriate events to use "tap-and-talk." (For illustration purposes, this QA control has an inline grammar that accepts "New York," "Los Angeles," and "Seattle" only.)

Here is what the HTML looks like after adding these controls using the visual designer. Note that this code is generated for you automatically:

<%@ Register TagPrefix="speech" 
	Namespace="Microsoft.Web.UI.SpeechControls" 
	Assembly="Microsoft.Web.UI.SpeechControls, Version=1.0.3200.0, 
	Culture=neutral, PublicKeyToken=31bf3856ad364e35" %>
<HTML>
  <body>
    <form id="Form1" method="post" runat="server">
      <asp:Label id="Label1" runat="server">
		Destination city:</asp:Label>
      <asp:TextBox id="DestinationTextBox" runat="server">
		</asp:TextBox>
      <speech:SemanticMap id="Sm1" runat="server">
        <SemItems>
          <speech:SemanticItem ID="Destination" 
				TargetElement="DestinationTextBox" 
				TargetAttribute="value">
          </speech:SemanticItem>
        </SemItems>
      </speech:SemanticMap>


      <speech:QA id="QA1" runat="server">
        <Prompt InlinePrompt="Where are you flying to?">
           </Prompt>
        <Reco StartElement="DestinationTextBox" 
          StartEvent="onclick">
          <Grammars>
            <speech:Grammar id="CityGrammar">
              <grammar xml:lang="en-US" 
                tag-format="semantics-ms/1.0" 
                version="1.0" mode="voice" 
                root="Rule1" 
                xmlns="http://www.w3.org/2001/06/grammar">
                <rule id="Rule1">
                  <one-of>
                    <item>New York</item>
                    <item>Los Angeles</item>
                    <item>Seattle</item>
                  </one-of>
                </rule>
              </grammar>
            </speech:Grammar>
          </Grammars>
        </Reco>
        <Answers>
		<speech:Answer SemanticItem="Destination" 
			XpathTrigger="/SML"></speech:Answer>
        </Answers>
      </speech:QA>

    </form>
  </body>
</HTML>

At this point, the basic interaction is ready; however, your customers might have difficulty understanding how to interact with this application. Let's add a help button that will trigger a prompt that will provide further instructions. We will add these elements at the bottom of the form:

[Previous lines deleted for clarity]

        </Answers>
      </speech:QA>
      <input type="button" value="Help" onclick="HelpPrompt.start()">
      <salt:prompt id="HelpPrompt">Click on the textbox 
		before speaking.</salt:prompt>
    </form>
  </body>
</HTML>

Remember that QA controls are designed primarily for spoken dialog, but can also be used to enable "tap-and-talk" scenarios. You can achieve more flexibility by using SALT tags directly on your ASP.NET page.

Q. How do I build and use Dynamic Grammars?
A.

When building ASP.NET speech applications, the author may want to incorporate dynamic content into their grammars. In most cases, this is done to make the application more flexible when the content is not known ahead of time or is expected to change. For instance, in a retail application in which the user chooses an item from the "special deals of the day," the author can build a dynamic grammar that pulls in the specials for that particular day.

Dynamic grammars can be implemented as normal ASPX pages. The author specifies the static content of the grammar declaratively, and can generate the dynamic parts in the code-behind. For example, the file MyGrammar.aspx might look like:

<%@ Page language="c#" Codebehind="MyGrammar.aspx.cs"
AutoEventWireup="false" Inherits="MyApp.MyGrammar" %>
<grammar langid="409" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:pdc="http://www.microsoft.com/speech/pdc">
	<rule name="MyRule" export="true">
		<l>
		<%# ListContents %>
		</l>
	</rule>
</grammar>

In the code-behind, the author can perform a Page.DataBind() in Page_Load, and can define the property ListContents as follows:

protected string ListContents 
{
	get 
	{
		string s = "";
		// build 's' into the dynamic bit
		// e.g., s += "      <p propname=\"LARGE\">big</p>\n"
		return(s);
	}
}
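The string-building step is where dynamic text enters the grammar, so any characters that are special in XML need escaping. The sketch below illustrates that logic in JavaScript purely for brevity; in the code-behind it would be ported to the C# getter. The helper names and the { name, phrase } item shape are assumptions made for illustration, and the element form follows the <p propname="..."> example in the comment above.

```javascript
// Escape characters that are special in XML text and attribute values.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: items is an array of { name, phrase } objects,
// for example the day's specials loaded from a database.
function buildListContents(items) {
  var s = "";
  for (var i = 0; i < items.length; i++) {
    // One <p> element per phrase the user may say.
    s += "      <p propname=\"" + escapeXml(items[i].name) + "\">" +
         escapeXml(items[i].phrase) + "</p>\n";
  }
  return s;
}
```

Without the escaping step, a phrase such as "fish & chips" would produce a grammar that is not well-formed XML.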

This grammar and rule are referenced like any others:

<ruleref name="MyRule" url="MyGrammar.aspx" />

However, it is important to note that the grammar editor will be unable to load this file, and will therefore be unable to test any grammars that include this grammar as a child.

Q. How do I work around the error "SpeechCommon is undefined"?
A.

The speech control components in the Microsoft .NET Speech SDK V1.0 Beta may not function correctly if Microsoft Internet Information Services (IIS) is not configured to use the default Home Directory (\Inetpub\wwwroot\). To identify the home directory that IIS is using, go to Control Panel | Administrative Tools | Internet Information Services. Expand the tree view to locate Web Sites | Default Web Site. Right-click Default Web Site, choose Properties, and select the Home Directory tab. To work around the "SpeechCommon is undefined" error, manually copy the directory %SYSTEMDRIVE%\Inetpub\wwwroot\aspnet_speech into the root of your IIS Home Directory.

Q. I want to validate prompt coverage for my Web application before I record any audio. How do I do this?
A.

Open the project properties dialog box by right-clicking on the prompt project in the Solution Explorer and clicking Properties. In the properties dialog box, select the Use filler for missing waveforms check box. When you validate prompt coverage for your solution, the prompt editor will insert a few samples of silence for every missing waveform, allowing the project to build (a necessary step for validation).

Q. All of my recordings are at somewhat different volumes. How do I make them sound more natural when they are concatenated?
A.

You can use volume normalization to achieve this. Open the properties dialog box for your prompt project again. You can either maximize the peak for each wave, or you can adjust the average energy in each wave to match a recording that you like. To match a recording, select Match to promptdb recording, browse to a .promptdb file, and enter the name of a .wav audio file within the prompt file. (You can find the .wav file names in the Prompt Editor.) The next time you build the project, the volume of the audio in the output .prompts file will be normalized based on your selection.
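The difference between the two normalization options comes down to how the gain is chosen. The sketch below illustrates the arithmetic only and is not the SDK's implementation; it assumes samples are floating-point values between -1 and 1, and all function names are hypothetical.

```javascript
// Peak normalization: scale so the loudest sample reaches full scale (1.0).
function peakNormalize(samples) {
  var peak = 0;
  for (var i = 0; i < samples.length; i++) {
    peak = Math.max(peak, Math.abs(samples[i]));
  }
  if (peak === 0) return samples.slice(); // silence: nothing to scale
  var gain = 1 / peak;
  return samples.map(function (s) { return s * gain; });
}

// Root-mean-square level, used here as the measure of average energy.
function rms(samples) {
  if (samples.length === 0) return 0;
  var sum = 0;
  for (var i = 0; i < samples.length; i++) {
    sum += samples[i] * samples[i];
  }
  return Math.sqrt(sum / samples.length);
}

// Energy matching: scale so the average energy equals that of a reference
// recording.
function matchEnergy(samples, reference) {
  var current = rms(samples);
  if (current === 0) return samples.slice();
  var gain = rms(reference) / current;
  return samples.map(function (s) { return s * gain; });
}
```

Peak normalization guarantees that no sample clips, but recordings can still differ in perceived loudness. Energy matching evens out perceived loudness, although a large gain can push individual samples past full scale.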

Q. My prompts sound choppy when they are concatenated together. How can I more easily identify word boundaries when I tune the alignments?
A.

When you import a waveform audio file, the speech recognizer automatically aligns the text to the audio data. Generally, the alignments are quite accurate, but sometimes they need to be tuned. The spectrogram view can help you. The spectrogram view shows concentrations of energy at different frequencies. Word boundaries typically occur where energy levels at various frequencies change substantially. To view a spectrogram for a recording, from the Prompt Editor, open the .wav file you want to view. On the Wave menu, click Show Spectrum. Look for the edges of the black areas.
