Speech recognition is a subfield of computer science, usually studied at a postgraduate level, that develops methodologies and technologies enabling computer systems to convert spoken words into their text equivalent. If the text is information or communication, it can be printed; if it is a command, it can be executed. Because it draws on fields as diverse as linguistics, phonetics, acoustics, signal processing, mathematics, machine learning, and even psychology, speech recognition is a multidisciplinary subject.
“We totally believe speech recognition will go mainstream somewhere over the next decade.” Bill Gates, 14 September 2005
How does Speech Recognition Work?
- As the computer listens to the voice signal from the speaker, it tries to break it down into elemental units such as phonemes. Depending on the granularity of the system, those units can be syllables or individual vowels and consonants.
- The signature of those elemental sounds is compared to known acoustic sounds through pattern matching.
- A result is produced, but it is not a binary True or False. Instead, it is a list of probabilities. The results can be visualized as a matrix.
- Next, the system sifts through the volume of phonemes and probabilities, and tries to match them to actual words in its vocabulary.
- An acoustic model may be employed to aid in the identification, especially if the user’s voice and pronunciation characteristics are known.
- The candidate words produced are referred to as hypotheses in programmers’ parlance, and each hypothesis is assigned a confidence score from 0 to 1, where 0 means “no confidence” and 1 means “total confidence”.
Figure: Visual Studio 2019 providing insights into a speech recognizer API
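The steps above can be illustrated with a small Python sketch. It is a toy under stated assumptions, not how any particular recognizer is implemented: the vocabulary, the phoneme symbols, the `Hypothesis` class and the `rank_hypotheses` function are all hypothetical, and the sketch assumes exactly one sound segment per phoneme, which real systems do not. It takes a matrix of phoneme probabilities, scores each vocabulary word against it, and returns hypotheses with confidence scores between 0 and 1.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Hypothesis:
    word: str
    confidence: float  # 0.0 = no confidence, 1.0 = total confidence


# Hypothetical toy vocabulary: word -> expected phoneme sequence.
VOCABULARY = {
    "cat": ["k", "ae", "t"],
    "cap": ["k", "ae", "p"],
    "bat": ["b", "ae", "t"],
}


def rank_hypotheses(prob_matrix, vocabulary=VOCABULARY):
    """Rank vocabulary words against a phoneme probability matrix.

    prob_matrix holds one dict per sound segment, mapping each candidate
    phoneme to its probability. Simplifying assumption: one segment
    corresponds to exactly one phoneme.
    """
    scores = {}
    for word, phonemes in vocabulary.items():
        if len(phonemes) != len(prob_matrix):
            continue  # word length does not fit the number of segments
        # Multiply the probabilities of the phonemes this word expects.
        scores[word] = prod(
            segment.get(phoneme, 0.0)
            for segment, phoneme in zip(prob_matrix, phonemes)
        )

    total = sum(scores.values())
    if total == 0:
        return []  # nothing in the vocabulary matches

    # Normalize so the confidences of all hypotheses sum to 1.
    hypotheses = [Hypothesis(w, s / total) for w, s in scores.items()]
    return sorted(hypotheses, key=lambda h: h.confidence, reverse=True)


# Each row is one sound segment; each entry is a phoneme probability.
matrix = [
    {"k": 0.7, "b": 0.3},
    {"ae": 0.9, "eh": 0.1},
    {"t": 0.6, "p": 0.4},
]

for hypothesis in rank_hypotheses(matrix):
    print(f"{hypothesis.word}: {hypothesis.confidence:.2f}")
```

With this input, "cat" ranks first, followed by "cap" and "bat". Normalizing the scores is only one way to obtain a value between 0 and 1; production recognizers typically derive confidence differently.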
Speech recognition software distinguishes between words that should be printed and commands. Commands are statements that instruct the app to do something to the printed content, such as to select a segment of text, delete a sentence, apply the bold typeface to a word, scroll to a different part of the document, and so on.
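As a rough illustration of that distinction, the Python sketch below routes a recognized utterance either to a command or to the document text. The command phrases, action names and the `route_utterance` function are hypothetical, and exact phrase matching is an assumption made for brevity; actual products generally rely on richer grammars and context.

```python
# Hypothetical command phrases mapped to illustrative action names.
COMMANDS = {
    "delete sentence": "DELETE_SENTENCE",
    "bold that": "APPLY_BOLD",
    "scroll down": "SCROLL_DOWN",
}


def route_utterance(utterance, document):
    """Treat an utterance as a command or as dictated text.

    document is a list of dictated strings; it is only extended
    when the utterance is not a known command.
    """
    text = utterance.strip()
    key = text.lower()
    if key in COMMANDS:
        # A command acts on the document; it is not printed.
        return ("command", COMMANDS[key])
    # Anything else is treated as content to be printed.
    document.append(text)
    return ("dictation", text)


document = []
print(route_utterance("Dear customer", document))
print(route_utterance("bold that", document))
```

Here the first utterance is appended to the document, while the second is returned as a command and leaves the document unchanged.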
Frequently Asked Questions…
What is speech recognition used for?
Speech recognition allows a user to interact with a mobile device or PC through speech, bypassing the mouse and keyboard or the touch screen. Spoken statements are either printed or executed, depending on whether they are information or communication, or commands.
What is the best speech recognition software?
There are many capable speech recognition programs for Windows, Mac, Android and iOS. Some of the better known are Dragon Professional, Google Now, Amazon Transcribe, Braina Pro and Watson Speech to Text. SpeechToText Pro is an inexpensive alternative that often outperforms other apps.