Wednesday, July 30, 2008

Creating your own voice

This idea has been one of the Holy Grails in TTS, letting you easily create new TTS voices based on your own voice. Nobody has it right yet, it just isn't simple. The efforts I know of are
http://www.modeltalker.com, which I've tried. I was able to create something on par with, or slightly worse than, Microsoft Mary, but you could sort of tell I was in there.

Cepstral has a project underway at
http://www.voiceforge.com

And now this, coming from OKI:
OKI Brings a Unique Voice to TTS (from SpeechTEK)

Posted Jul 25, 2008

Japanese telecommunications firm OKI Electric Industry yesterday launched Polluxstar, a text-to-speech (TTS) software solution that allows users to reproduce their own voice.

Using Polluxstar software on their computers, individuals can communicate through TTS in their own voice, complete with their unique tones and inflections, rather than a computerized, non-human voice.

To recreate the individual’s voice, OKI requires users to submit prescribed voice files or go to an OKI-approved recording studio. Once the company has the recorded audio it needs, it enters the individual’s voice information into a database that is included with the software package he buys.

The technology is only available in Japan right now, but the company has not ruled out the possibility of taking it to other markets. "At this point, we think the overseas market is a good possibility, but we don’t have any immediate plans," says Naomi Takeuchi, OKI’s U.S. spokesperson.

Takeuchi adds that the product is currently being targeted to consumers, especially those who have illnesses that could result in the loss of their voices. One of the first users of the technology was Izumi Maki, a computer science professor at Osaka University of Arts. Before he underwent surgery to remove his vocal cords, OKI recorded his voice data and installed it in Polluxstar for him to use. Six months after his surgery, Maki returned to the university to give lectures again, using his real voice through the computer.


8 comments:

RickyF said...

This is quite interesting.

Does anyone have any idea when it will be available and what it will cost?

Mike Rozak said...

I'm perplexed at why you wrote: "This idea has been one of the Holy Grails in TTS, letting you easily create new TTS voices based on your own voice. Nobody has it right yet, it just isn't simple."

Why hasn't anyone gotten it right? It's a fairly standard feature, especially in research engines. (My game-oriented engine has it too.) The major engine vendors DON'T have it only because they're afraid it will hurt their bottom line.

Ken White said...

There currently isn't technology where a user can create a high quality TTS voice themselves. The process is still far too complicated to do it right, and the tools that have been made available produce very poor voices.

Mike Rozak said...

Which parts of them are too complicated?

(I haven't tried other companies' systems, but the basic process is to read 1000-ish prompts, which is long and boring.)

Ken White said...

Have you heard the voices that come out of that short process? They just don't sound great compared to more natural ones like
http://nextup.com/TextAloud/SpeechEngine/voices.html

My understanding is that the really good ones take more like 20 hours of scripts read in a studio, then a few months with a sound engineer chopping up the audio.

Mike Rozak said...

Ken White wrote: "Have you heard the voices that come out of that short process?"

No, I haven't heard samples. Is there anything on the web to listen to?

I have, however, participated in and heard the Blizzard Challenge. As far as I know, most of the groups DON'T spend months hand-tuning results.

Blizzard tests are based on 1000 sentences (1 hour) from the original voice, as well as the full 10,000 (10 hours). 10K voices sound around 0.3 to 0.4 MOS (mean opinion score) better, but many of the 1000-sentence voices are very good.

FYI, I don't do any hand tuning. My voice was last this year, but I've improved the algorithms since; it's still not as good as the pros. If you want, I can e-mail you some samples generated with 16K unit versions of some of my voices (based on 1000 sentences). I don't have a working e-mail address for you. My e-mail is Mike@mXac.com.au if you want to reply.

My voices don't sound as good as the pro voices you have. But I'm not trying to push my voices. My point is that if I can do an okay job of automated voice generation, then the pro companies can do a better job... if they want.

aledan said...

Have you tried the Italian-language speech synthesis from SLD Software? Listen in real time:
"VOCE VIVA - LETTORE VOCALE DELLA LINGUA ITALIANA"?
Find them here:
http://demoserver1.ath.cx:51234/
http://www.voceviva.it
http://www.voceviva.it/page4.html