

TTS has strong market potential in five major segments: education, assistive technology for people with disabilities, computer interfaces, consumer products, and telecommunications.

What is Text-to-Speech?
Text-to-speech is a process through which text is rendered as digital audio and then "spoken." Most text-to-speech engines can be categorized by the method that they use to translate phonemes into audible sound.
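Here is a toy Python sketch of the first half of that pipeline: split the text into words, then look each word up in a pronunciation lexicon to get phonemes. The lexicon, the function names, and the use of ARPAbet symbols (the notation used by the CMU Pronouncing Dictionary) are assumptions for illustration. A real engine also normalizes numbers and abbreviations, falls back to letter-to-sound rules for unknown words, and then passes the phoneme sequence to its back end, which produces the audio.

```python
# Toy TTS front end: text -> words -> phonemes.
# The lexicon is a hand-written, illustrative subset in ARPAbet notation
# (as used by the CMU Pronouncing Dictionary); it is an assumption, not
# any particular engine's data. Turning phonemes into audio is not shown.
import re

LEXICON = {
    "printing": ["P", "R", "IH1", "N", "T", "IH0", "NG"],
    "complete": ["K", "AH0", "M", "P", "L", "IY1", "T"],
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z']+", text.lower())


def to_phonemes(text: str) -> list[str]:
    """Look up every word; this sketch raises KeyError for unknown words
    where a real engine would apply letter-to-sound rules instead."""
    phonemes: list[str] = []
    for word in tokenize(text):
        if word not in LEXICON:
            raise KeyError(f"no pronunciation for {word!r}")
        phonemes.extend(LEXICON[word])
    return phonemes


print(to_phonemes("Printing complete."))
```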

Why Use Text-to-Speech?
Text-to-speech should be used to communicate information to the user audibly when digital audio recordings are inadequate. Generally, text-to-speech is better than audio recordings when:

  • Audio recordings would be too large to store on disk or too expensive to record.
  • Audio recording is impossible because the application doesn't know ahead of time what it will speak.
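To put "too large to store" in perspective, the Python sketch below estimates the size of uncompressed PCM recordings. The sample rate (22,050 Hz), format (16-bit mono), phrase length (2 seconds), and phrase count (one phrase per minute of the day) are assumptions chosen for illustration, not figures from any particular application.

```python
# Rough storage estimate for uncompressed PCM recordings.
# Sample rate, bit depth, phrase length and phrase count are all
# illustrative assumptions.

def pcm_bytes(seconds: float, sample_rate: int = 22_050,
              bits_per_sample: int = 16, channels: int = 1) -> int:
    """Size in bytes of raw PCM audio (no file header, no compression)."""
    bytes_per_second = sample_rate * channels * bits_per_sample // 8
    return int(seconds * bytes_per_second)


# One 2-second recording for every minute of a 24-hour clock.
phrases = 24 * 60
total = phrases * pcm_bytes(2.0)
print(f"{phrases} phrases -> {total:,} bytes (~{total / 1e6:.0f} MB)")
# prints "1440 phrases -> 127,008,000 bytes (~127 MB)"
```

Under these assumptions the recordings need roughly 127 MB, while the text of each phrase is only a few dozen bytes.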

Text-to-speech also offers a number of benefits. In general, it is most useful for short phrases or in situations where prerecording is not practical. Text-to-speech has the following practical uses:

  • Reading dynamic text. Text-to-speech is useful for phrases that vary too much to record and store in every possible form. For example, speaking the time is a good use for text-to-speech, because recording every possible time, or concatenating recorded fragments to build one, would take considerable effort and storage.
  • Proofreading. Audible proofreading of text and numbers helps the user catch typing errors missed by visual proofreading.
  • Conserving storage space. Text-to-speech is useful for phrases that would occupy too much storage space if they were prerecorded in digital-audio format.
  • Notifying the user of events. Text-to-speech works well for informational messages. For example, to inform the user that a print job is complete, an application could say "Printing complete" rather than displaying a message box and requiring the user to click OK. Use spoken notifications only for noncritical events, because the user may have turned the computer's sound off or may be out of hearing range.
  • Providing audible feedback. Text-to-speech can provide audible feedback when visual feedback is inadequate or impossible. For example, the user's eyes might be busy with another task, such as transcribing data from a paper document. Users who have low vision may rely on text-to-speech as their sole means of feedback from the computer.
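Reading dynamic text and proofreading both come down to generating the string the engine should speak. The Python sketch below shows two hypothetical helpers, `time_phrase` and `read_back_digits`; their phrasing rules are assumptions, and handing the resulting string to an actual speech engine is not shown.

```python
# Hypothetical helpers that build the text a TTS engine would be asked
# to speak. Phrasing rules are illustrative; no real speech API is called.

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty"}


def number_words(n: int) -> str:
    """Spell out an integer from 0 to 59 in words."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def time_phrase(hour: int, minute: int) -> str:
    """Dynamic text: turn a 24-hour time into a 12-hour spoken phrase."""
    suffix = "a.m." if hour < 12 else "p.m."
    h12 = hour % 12 or 12          # 0 and 12 both become "twelve"
    if minute == 0:
        mins = "o'clock"
    elif minute < 10:
        mins = "oh " + ONES[minute]  # 3:05 -> "three oh five"
    else:
        mins = number_words(minute)
    return f"It is {number_words(h12)} {mins} {suffix}"


def read_back_digits(entry: str) -> str:
    """Proofreading aid: read digits one at a time so typos stand out."""
    words = []
    for ch in entry:
        if ch in "0123456789":
            words.append(ONES[int(ch)])
        elif ch == ".":
            words.append("point")
        elif not ch.isspace():
            words.append(ch)
    return " ".join(words)


print(time_phrase(15, 5))         # It is three oh five p.m.
print(read_back_digits("12.50"))  # one two point five zero
```

Generating the phrase at run time replaces the 1,440 separate recordings a clock would otherwise need with a few dozen words of logic.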

Games and Edutainment
Text-to-speech is useful in games and edutainment because it lets the characters in the application "talk" to the user instead of displaying speech balloons. Prerecorded speech is, of course, an alternative.

Text-to-Speech Voice Quality
Most text-to-speech engines can render individual words successfully. However, as soon as the engine speaks a sentence, it is easy to identify the voice as synthesized because it lacks human prosody: the inflection, accent, and timing of speech.
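Many engines let an application supply prosody hints explicitly instead of relying only on the engine's defaults. One standard way to do this is the W3C Speech Synthesis Markup Language (SSML), whose `<prosody>` and `<break>` elements control speaking rate, pitch, and pauses. The Python sketch below wraps a sentence in SSML; the helper name `ssml_sentence` and its defaults are hypothetical, and whether a particular engine accepts SSML (and which attribute values it honors) is an assumption to check against that engine's documentation.

```python
# Sketch: wrap plain text in W3C SSML so an SSML-capable engine can apply
# explicit prosody. The helper name and defaults are hypothetical; engine
# support for SSML is an assumption.
from xml.sax.saxutils import escape, quoteattr


def ssml_sentence(text: str, rate: str = "medium", pitch: str = "medium",
                  pause_ms: int = 300) -> str:
    """Return an SSML document that speaks `text` at the given rate and
    pitch, followed by a pause of `pause_ms` milliseconds."""
    return (
        '<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xml:lang="en-US">'
        f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>"
        f"{escape(text)}</prosody>"   # escape &, < and > in the text
        f'<break time="{pause_ms}ms"/>'
        "</speak>"
    )


print(ssml_sentence("Printing complete.", rate="slow", pitch="high"))
```

Markup like this only adjusts how the engine renders the sentence; it does not by itself make synthesized speech sound human.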

Application Design Considerations
TTS in Multimedia Builder