|
Figure 3
Speech Hardware Minimum Requirements
Technology | Description | CPU | RAM |
Discrete command and control | User speaks simple commands like "mail," "change time," "minimize." | 386/33 | 500KB |
Continuous command and control | User speaks complex commands, like "Send mail to Fred," "Change the time to ten o'clock," and "Minimize the window." | 486/33 | 1MB |
Discrete dictation | Transcribes whatever the user says into a word processor. The user must pause between words. | 486/66 | 8MB |
Continuous dictation | Transcribes natural speech into a word processor. | P6 | 16MB |
Text-to-speech | Converts ASCII or Unicode strings to natural speech. | 486/33 | 1MB |
Figure 5
High-level Speech Objects
Voice Commands Object |
IUnknown | Provides access to other interfaces in the object. |
IVoiceCmd | Simple command and control speech recognition. Member functions let the app create Voice Menu objects. |
IVCmdAttributes | Controls the attributes of the speech recognition engine, such as the automatic gain, speaker name, and recognition threshold. |
IVCmdDialogs | Displays Windows dialog boxes that let the user configure the speech recognition engine, such as training. |
IVCmdNotifySink | Supplied by the app. Used to notify the app when a command is recognized, the user is speaking too loudly or softly, or something else happens. |
Voice Menu Object |
IUnknown | Provides access to other interfaces in the object. |
IVCmdMenu | Methods to add/remove/modify voice commands, and to start listening for them. |
Voice Text Object |
IUnknown | Provides access to other interfaces in the object. |
IVoiceText | Main interface for generating speech; contains the Speak function. |
IVTxtAttributes | Controls the attributes of the TTS engine, such as the voice's pitch and gender. |
IVTxtDialogs | Displays dialog boxes that let the user configure the TTS engine. |
IVTxtNotifySink | Supplied by the app. Used to notify the app when talking has begun or ended, when a bookmark is reached, or when something else happens. |
Figure 6
Low-level Speech Objects
Speech Recognition Grammar Object |
IUnknown | Provides access to other interfaces in the object. |
ISRGramCommon | Provides methods to activate and deactivate the grammar object, or archive it to disk. |
ISRGramCFG | Provides interfaces specific to context-free grammars and methods to manage lists of words and link grammars together. |
ISRGramDictation | Used for dictation grammars. Apps can supply hints about what the user might be dictating next. |
ISRGramNotifySink | Supplied by the app. Used to pass grammar notifications from the engine to the app. |
Speech Recognition Results Object (All interfaces are optional except IUnknown) |
IUnknown | Provides access to other interfaces in the object. |
ISRResAudio | Gets an audio recording of what was spoken. |
ISRResBasic | Provides general information about what was spoken, such as the phrase that was recognized and when it was spoken. |
ISRResCorrection | Lets the app confirm that the phrase was correctly or incorrectly recognized, so the engine can learn from its mistakes. |
ISRResEval | Tells the engine to reevaluate a recognition decision based on what it now knows about the context. |
ISRResGraph | Provides a graph of alternate recognition hypotheses, either for words or phonemes. |
ISRResMemory | Lets apps control how results objects are stored, since storing them consumes memory. |
ISRResMerge | Merges or splits results objects. |
ISRResModifyGUI | Tells the engine to display a graphical user interface so the user can correct a recognition result. |
ISRResSpeaker | If the engine supports it, lets the application identify who spoke. |
Text-to-Speech Engine Object |
IUnknown | Provides access to other interfaces in the object. |
ITTSAttributes | Controls the attributes of the text-to-speech engine, such as the volume, processor usage, speaking speed, and pitch. |
ITTSCentral | Controls the engine object. Member functions allow an application to add buffers, and start and stop speech. |
ITTSDialogs | Displays Windows dialog boxes that allow the end user to configure the text-to-speech engine, such as correcting word pronunciations. |
ITTSBufNotifySink | Supplied by the app. Used to notify the app of changes to the text buffer, such as when bookmarks are reached. |
ITTSNotifySink | Supplied by the app. Used to notify the app when audio starts or stops, or when attributes are changed. |
ILexPronounce | Optional. Lets the app query and control the pronunciation of words. |
Figure 8
CIVCmdNotifySink
class CIVCmdNotifySink : public IVCmdNotifySink {
public:
    CIVCmdNotifySink(void);
    ~CIVCmdNotifySink(void);

    // Standard IUnknown members,
    // all COM objects must have them.
    //
    STDMETHODIMP         QueryInterface (REFIID, LPVOID FAR *);
    STDMETHODIMP_(ULONG) AddRef(void);
    STDMETHODIMP_(ULONG) Release(void);

    // IVCmdNotifySink members
    //
    STDMETHODIMP CommandRecognize (DWORD, PVCMDNAME, DWORD, DWORD, PVOID, DWORD, PSTR, PSTR);
    STDMETHODIMP CommandOther     (PVCMDNAME, PSTR);
    STDMETHODIMP MenuActivate     (PVCMDNAME, BOOL);
    STDMETHODIMP UtteranceBegin   (void);
    STDMETHODIMP UtteranceEnd     (void);
    STDMETHODIMP CommandStart     (void);
    STDMETHODIMP VUMeter          (WORD);
    STDMETHODIMP AttribChanged    (DWORD);
    STDMETHODIMP Interference     (DWORD);
};
|
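The Figure 8 declaration lists the three IUnknown members that every notification sink has to supply, but not their bodies. The following is a minimal, portable sketch of the usual reference-counting pattern behind them. It is not the Speech SDK's code: IUnknownLike, INotifySinkLike, NotifySink, the string-based interface IDs, and the result constants are all hypothetical stand-ins, chosen so the example compiles without the Windows or Speech API headers. A real sink would derive from IVCmdNotifySink and compare REFIID values instead.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-ins for HRESULT/ULONG and their usual values.
typedef long          ResultCode;
typedef unsigned long RefCount;
const ResultCode kOk          = 0;   // plays the role of NOERROR
const ResultCode kNoInterface = -1;  // plays the role of E_NOINTERFACE

// Stand-in for IUnknown; interface IDs are plain strings here (assumption).
struct IUnknownLike {
    virtual ResultCode QueryInterface(const char* iid, void** ppv) = 0;
    virtual RefCount   AddRef() = 0;
    virtual RefCount   Release() = 0;
protected:
    virtual ~IUnknownLike() {}
};

// Stand-in for IVCmdNotifySink, reduced to a single notification.
struct INotifySinkLike : IUnknownLike {
    virtual ResultCode CommandRecognize(unsigned long commandId) = 0;
};

class NotifySink : public INotifySinkLike {
public:
    NotifySink() : m_refs(0), m_lastCommand(0) {}

    ResultCode QueryInterface(const char* iid, void** ppv) override {
        if (std::strcmp(iid, "IUnknown") == 0 ||
            std::strcmp(iid, "INotifySink") == 0) {
            *ppv = static_cast<INotifySinkLike*>(this);
            AddRef();                // a successful query hands out a reference
            return kOk;
        }
        *ppv = nullptr;
        return kNoInterface;
    }

    RefCount AddRef() override { return ++m_refs; }

    RefCount Release() override {
        RefCount remaining = --m_refs;
        if (remaining == 0) delete this;  // last reference gone
        return remaining;
    }

    ResultCode CommandRecognize(unsigned long commandId) override {
        m_lastCommand = commandId;   // a real app would act on the command
        return kOk;
    }

private:
    ~NotifySink() {}                 // only Release() may destroy the object
    RefCount      m_refs;
    unsigned long m_lastCommand;
};

// Walks one sink through its lifetime; returns the count after the first Release.
RefCount DemoSinkLifetime() {
    NotifySink* sink = new NotifySink;
    void* pv = nullptr;
    ResultCode hr = sink->QueryInterface("INotifySink", &pv);  // count is 1
    assert(hr == kOk);
    INotifySinkLike* iface = static_cast<INotifySinkLike*>(pv);
    iface->AddRef();                          // count is 2 (an engine's reference)
    iface->CommandRecognize(42);
    RefCount afterFirst = iface->Release();   // count is 1
    iface->Release();                         // count is 0, object deletes itself
    return afterFirst;
}
```

Calling DemoSinkLifetime() exercises the whole cycle: the query takes the first reference, a second holder adds its own, and the object frees itself only when both have released. Starting the count at zero and making the destructor private are design choices of this sketch, not requirements taken from the figure.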