Wednesday, October 31, 2007

Text to Speech with accents

Because development of text-to-speech voices has traditionally been driven by call centers, companies have generally gone for the most standard, non-accented speakers they can find. I suppose that makes sense: if you are running a nationwide call-in center, you don't want your automated voices sounding like they are from Boston or Georgia; you want just a plain vanilla American. And that is what you get with most US English voices. We've been able to get some variety with great British voices like Audrey and the Australian Karen. AT&T and Nuance have added Indian-accented voices, and the first South African voice is on the way, but there still aren't a lot of choices.

I'm not sure traditional businesses are driving demand for more variety, but online users and general consumers are always looking for more personality in voices. I've heard requests for everything from Southern redneck to inner-city black kid, and just about everything in between. We have nothing to offer in those areas yet, although it will come, but the requests did prod me into an interesting experiment. I took all of the non-English-speaking voices we offer and gave each of them a shot at a paragraph of English text. In some cases the results were terrible, in some tolerable, in a few funny, and a few were fantastic. Listen to all the results at the NextUp.com Accents Page.

2 comments:

Manoj said...

The latest trend among TTS companies seems to be heading in that direction. Besides selling voices, they are now selling software that lets you create your own synthetic voice. You can hire your own talent, or use your own voice, to build a synthetic voice. Cepstral's VoiceForge is a step in that direction. I am told Nuance has a similar product (I don't know all the details).

I would really like to see next-generation TTS offerings that can record my voice and then use it for automatic speech synthesis.

P.S.: My browser crashed when I clicked Publish, so I am resubmitting my comment.

Mike Rozak said...

Accents are ABSOLUTELY CRITICAL to TTS in games, a market that the TTS industry is currently ignoring (the ye olde chicken-and-egg problem).

For my own game, the AWB voice from the Blizzard 2005 challenge is probably the most interesting of the voices I have. It's a Scottish voice, which I've analyzed with an American lexicon. Anyone with intimate knowledge of a Scottish accent thinks it sounds strange, but Americans and Australians think it sounds Scottish enough.

Of course, part of the problem is that there aren't any lexicons for "inner-city black kid", or even for Scottish (and from which part of Scotland?). From my POV, a free Australian lexicon would be nice. I live in Australia and can easily record Australian accents, but they sound a bit strange (to me) when generated against an American lexicon.