Tuesday, February 27, 2007

Windows TTS on Linux

Here's a great article from our friend Del about using Wine, a Windows emulator, to get TextAloud and good voices working on Linux. The full article is in the NextUp Forum; be sure to check out the forum for follow-ups.

I seem to have got TextAloud running on Linux with NeoSpeech US English premium voices.

The TextAloud icon is now sitting on my Linux desktop, and I can simply click it to start using this great application in Linux. An important point is that TextAloud integrates with the Linux environment, so I can now directly open and save documents in TextAloud on my Linux machine. Running TextAloud on Linux also lets me directly convert text to WAV and MP3 audio files, which are written flawlessly to the hard disk through the Linux file system.

I decided to post the scenario that got TextAloud working for me on Linux, in the hope that other users looking to move TextAloud to Linux will find these guidelines helpful.

1. Install Wine. Wine is needed to run TextAloud on Linux. If Wine is not installed on your system, you will have to download and install the appropriate package. See the Wine User Guide for detailed instructions.

2. Configure Wine. The latest builds have a graphical interface and are easy to configure. Just run 'winecfg' in a terminal window to open the configuration applet.

A. I’d strongly recommend setting Wine to emulate Windows Me. I’ve tried other Windows versions on top of a Red Hat-based Linux and found that Windows Me was the fastest and didn’t cause any issues with opening and saving files in TextAloud.

B. Make sure Wine is set to use the appropriate sound driver. Older Linux distributions (kernel 2.4) typically use the OSS driver, while 2.6 kernels have switched to ALSA. (In my setup ALSA worked fine.) Also set Acceleration to ‘Standard’ and uncheck the ‘Driver Emulation’ box on the Audio settings tab.

3. Install SAPI5. You will need SAPI5 to use NeoSpeech premium voices with TextAloud on Linux. To check whether SAPI5 is already installed, download, unzip, and run the FixRegistry utility. If FixRegistry reports a SAPI error, download and install SAPI5.

4. Install TextAloud. For some unknown reason, TextAloud v 223x may need to be reinstalled to work properly on Linux. It was not until I reinstalled TextAloud that its icon showed up on my Linux desktop and TextAloud started OK. I installed TextAloud using the default options suggested by the installer and then simply installed it a second time on top of the first installation.

5. Launch TextAloud to verify that it works with the Microsoft SAPI5 voices, and then exit. If you see the splash screen but TextAloud doesn’t actually start and freezes at the initialization stage, try different audio settings in Wine. The most likely reason for the freeze is that TextAloud can’t find the audio device, so try switching to a different audio driver in Wine.

6. Install NeoSpeech voices. Please note that the NeoSpeech temporary files take about 1 GB of disk space, so the system might ‘freeze’ for a minute or two during the install.

7. Launch TextAloud and enjoy using this great program with NeoSpeech premium voices on Linux.

Hopefully this info will be of some help to users who want to start using TextAloud in Linux. As it’s impossible to work out a universal scenario applicable to all Linux versions, I’d greatly appreciate any comments, corrections, and additions to this topic.
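For readers comfortable with a little scripting, parts of steps 2 and 3 can also be checked from a terminal. The Python sketch below is an illustration, not part of Del's instructions: it picks OSS or ALSA from the kernel version using the 2.4/2.6 rule of thumb above, prints a registry fragment that sets the emulated Windows version to Windows Me, and lists any SAPI5 voices it finds in the Wine prefix. The registry names it uses (a `Version` value under `HKEY_CURRENT_USER\Software\Wine`, an `Audio` value under `...\Wine\Drivers`, and SAPI5 voices under `HKLM\Software\Microsoft\Speech\Voices\Tokens`) are assumptions about how Wine and SAPI5 store these settings; winecfg and FixRegistry remain the recommended tools.

```python
import os
import platform
import re


def suggest_audio_driver(kernel_release: str) -> str:
    """Pick a Wine audio driver from a Linux kernel release string.

    Rule of thumb from step 2B: 2.4 kernels typically use OSS, 2.6 kernels
    use ALSA. Unparseable strings fall back to ALSA, which worked for Del.
    """
    m = re.match(r"(\d+)\.(\d+)", kernel_release)
    if m is None:
        return "alsa"
    major, minor = int(m.group(1)), int(m.group(2))
    return "oss" if (major, minor) <= (2, 4) else "alsa"


def wine_reg_fragment(windows_version: str, audio_driver: str) -> str:
    """Build .reg text setting the emulated Windows version and audio driver.

    ASSUMPTION: these key/value names match what winecfg writes in your Wine
    build; compare with ~/.wine/user.reg before importing anything.
    """
    return (
        "REGEDIT4\n\n"
        "[HKEY_CURRENT_USER\\Software\\Wine]\n"
        f'"Version"="{windows_version}"\n\n'
        "[HKEY_CURRENT_USER\\Software\\Wine\\Drivers]\n"
        f'"Audio"="{audio_driver}"\n'
    )


# ASSUMPTION: each SAPI5 voice is a subkey of
# HKLM\Software\Microsoft\Speech\Voices\Tokens, and Wine saves HKLM in
# <prefix>/system.reg with backslashes doubled inside [section] headers.
TOKEN_KEY = re.compile(
    r"^\[Software\\\\Microsoft\\\\Speech\\\\Voices\\\\Tokens\\\\([^\\\]]+)\]",
    re.IGNORECASE,
)


def sapi5_voice_tokens(system_reg_text: str) -> list[str]:
    """Return voice token names (direct subkeys of Voices\\Tokens) found."""
    tokens = []
    for line in system_reg_text.splitlines():
        match = TOKEN_KEY.match(line)
        if match:
            tokens.append(match.group(1))
    return tokens


if __name__ == "__main__":
    # Step 2: Windows Me ("winme") plus a kernel-appropriate audio driver.
    print(wine_reg_fragment("winme", suggest_audio_driver(platform.release())))
    # Step 3: look for SAPI5 voices in the current Wine prefix.
    prefix = os.environ.get("WINEPREFIX", os.path.expanduser("~/.wine"))
    try:
        path = os.path.join(prefix, "system.reg")
        with open(path, encoding="utf-8", errors="replace") as fh:
            voices = sapi5_voice_tokens(fh.read())
    except FileNotFoundError:
        voices = []
    print("SAPI5 voice tokens:", ", ".join(voices) if voices else "none found")
```

To use the fragment, you could paste it into a file (say, textaloud-wine.reg, a hypothetical name) and import it with `wine regedit textaloud-wine.reg`. An empty voice list only suggests that SAPI5 is missing; FixRegistry gives the more reliable answer.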

Monday, February 26, 2007

Using TTS from PC on BlueTooth Cell Phone

Interesting thread here about using NextUp Talker with a Bluetooth-enabled wireless phone.

I did some checking on the idea of using the Windows desktop as a hands-free device for a cell phone. In other words, the PC acts like a headset: audio output from the PC is sent to your cell phone's mic, and audio input from the cell phone is directed to your PC speakers.

I found that the Widcomm Bluetooth stack includes a Headset service, and that it was pretty simple to set up to work with Text-To-Speech output.

Assuming you have the Widcomm stack installed, here are the steps:

1. Enable Bluetooth on the PC and cell phone.

2. On the cell phone, search for new devices and locate your PC. Add that device.

3. Next, scan the device for services and look for "Headset". Select this service.

4. On your phone, mark your PC 'device' as hands-free.

Now when you make a phone call, your cell phone should automatically connect to the pc as if it were an ordinary headset.

The only other thing you need to do is choose "Select Audio Device" from the NextUp Talker Options menu, and set the audio output device. On my system the audio device is named "Bluetooth Audio". Once this is selected, TTS output from NextUp Talker is sent to your 'headset' microphone.

The real trick to this is figuring out what Bluetooth stack you're using, and getting the Widcomm stack installed if you happen to be using the Microsoft stack. If you're using a stack other than Microsoft or Widcomm, you'll need to find out if the stack includes a headset service.

Some helpful links:

Finding Your Bluetooth PC Stack

Updating the Bluetooth stack on your XP Computer

This second link talks about Bluetooth headsets in general, and covers how to switch from the Microsoft stack to the Widcomm stack.

TextAloud on the Radio

Interesting TextAloud mention. NextUp Talker might be a better choice for this.

The interview with Paige was awesome; everything worked. She will now edit the recording before putting it “on air”. I’ll announce the details once I know them.

For now, I would like to share how I, with cerebral palsy and a significant speech impairment, was able to give my first radio show! It is actually mind-boggling that this “non-verbal” red-head was able to do this.

Here are the steps taken to accomplish this feat:

  1. Paige sent me her questions ahead of time.
  2. I typed my responses into Microsoft Word.
  3. I copied each individual response into my text-to-speech software TextAloud and tweaked the text so that my computerized voice Kate reads it as accurately as possible.
  4. I saved each response as a separate wave file.
  5. I created one PowerPoint slide with links to each wave file; that way each response is only one mouse click away.
  6. In Paige’s online room used for recording, when it was time to give a response, I hit the microphone button and then the appropriate link in PowerPoint.
  7. Voila…my first radio interview!
Full Story...
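Steps 4 and 5 of that story boil down to one audio file per answer and one click per file. For anyone without PowerPoint, a plain HTML page opened in a browser could play the same role. The Python sketch below builds such a page; the file names and questions are hypothetical placeholders, and the WAV files themselves would still be saved from TextAloud.

```python
import html


def response_filenames(count: int) -> list[str]:
    # Hypothetical naming scheme: zero-padded so files sort in question order.
    return [f"response_{i:02d}.wav" for i in range(1, count + 1)]


def link_page(questions: list[str]) -> str:
    """Build a one-page HTML index with one link per pre-recorded answer."""
    items = []
    for question, wav in zip(questions, response_filenames(len(questions))):
        # Escape the question text so characters like & or < display correctly.
        items.append(f'<li><a href="{wav}">{html.escape(question)}</a></li>')
    return "<html><body><ol>\n" + "\n".join(items) + "\n</ol></body></html>\n"


if __name__ == "__main__":
    questions = ["How did you get started?", "What tools do you use?"]  # placeholders
    print(link_page(questions))
```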

Friday, February 23, 2007

Making your own TTS Voice

I'll be getting more info on this, but it shows some interesting promise.

Build Your Own Talking Voice With VoiceForge(TM) by Cepstral
Self-Directed Voice-Banking Tools for Creating Synthetic TTS Voices

Distribution Source : Market Wire

Date : Tuesday, February 20, 2007

PITTSBURGH, PA -- (Market Wire - Feb 20, 2007) -- Cepstral LLC announced the release of VoiceForge(TM), a web 2.0 product that can turn a set of recorded audio prompts into a Text-to-Speech (TTS) voice capable of saying anything. With VoiceForge(TM), companies or actors can capture or "bank" their voices on their own. Once a voice is synthetically forged, it can be used to speak dynamic information for Entertainment, Telephony, Navigation, Education, or Reminder applications.

VoiceForge(TM) is a novel suite of web tools that gives clients the ability to create their own voices rapidly and inexpensively by themselves. Furthermore, the client retains all intellectual property rights to his/her voice creations. The final voice database is a plug-in that synthesizes using Cepstral's core engine. Cepstral's core engine runs on all platforms from the smallest cell phone devices to large distributed systems as well as PC, Mac, and Linux desktops offering unparalleled flexibility with respect to distribution once a voice is finished.


"As an industry, we make voices that are safe, but not necessarily exciting," said Cepstral CEO Craig Campbell. "With VoiceForge(TM), clients can now create unique high-quality TTS voices that keep pace with consumer and business demand for branded, celebrity, ethnic, and even cartoon personalities. To cite but one example of the need to improve voice diversity, there are currently no African American TTS voices available," added Mr. Campbell.

VoiceForge(TM) may help spur a new layer of speech services as companies take on voice building to serve the specific needs of their vertical markets. In the 1990s, Cepstral's founders released an open source TTS engine, Festival. Today, Cepstral's tools are proprietary, but there exists an ecosystem of "master voice builders" who have toiled under the complex old tools and welcome newer-better-faster ones. One such partner is Silex Creations who uses VoiceForge(TM) to offer professional voice creation in conjunction with their audio and voice manipulation technology. "The VoiceForge system has been fast and intuitive. Within days we can hear voices. We are now in a position where we can commercially apply our experience and help clients bring truly exciting speech products to market," said François Lanctôt, president of Silex Creations.

VoiceForge(TM) is a breakthrough for any company, entertainer, or brand manager interested in voice banking their talent, preserving a celebrity voice for an estate, or extending a franchise to include dynamic features such as VoIP announcements, SMS-to-Voice, Text-to-Podcast, custom ring tones, etc. Clients have the option to bring the tools in-house, or contract services through third-party experts like Silex.

About Cepstral LLC

Cepstral is a speech technology company based in Pittsburgh, PA, USA, which provides speech technologies and services for the spoken delivery of information. We build high quality, natural sounding voices for hand-held, desktop, and server applications. Cepstral: We Build Voices.

Thursday, February 22, 2007

Experts say more and more drivers reaching for iPods

Nice TextAloud mention

According to a survey conducted by text-to-speech software manufacturer NextUp.com, three out of four iPod and portable consumers also use their devices for listening to text outside the home or office, with 60 percent listening in their car.

The company manufactures TextAloud software utilizing voice synthesis to "speak aloud" documents, Web pages and e-mails for playback by MP3 players.

Rick Ellis, president of NextUp.com, said most people view their commute to work as wasted time, but with the use of portable audio devices like iPods, a whole new world for the commuter has opened up.

According to Novi Mayor David Landry, 183,000 vehicles travel Interstate 275 each day, and 153,000 vehicles pass through Interstate 96.

"The world passes through Novi," he said during his recent state of the city address.

And traffic backups come hand-in-hand with long or congested commutes, causing many drivers to reach for their MP3 players for solace.

Full Story...

Spoken Translation(TM) Unveils World's First Software for Reliable Translation of Extensive Written or Spoken Conversations

Spoken Translation, Inc.(TM), the worldwide developer of ground-breaking technology for cross-lingual communication, today introduced Converser for Healthcare at a press conference at the renowned SpeechTEK speech technology conference in San Francisco.

Converser is a system for two-way translated communication between limited-English-speaking patients and English-speaking caregivers. The system allows people who do not speak the same language to hold broad health-related conversations in real time, without a human interpreter. It addresses a major pain-point in healthcare organizations: low budgets for patient communication and interpreting services. Converser gives medical institutions a translation solution that not only significantly reduces costs but improves overall patient safety, helping to eliminate numerous grave errors made by non-professional human interpreters.


Converser represents a fundamental advance in Machine Translation (MT) technology. No other system on the market today can provide reliable, bi-directional, real-time, wide-ranging translation via multiple interface modalities including speech recognition.

To improve translation accuracy and enhance the user experience, Converser provides reverse (or back-) translations and permits verification and selection of word definitions to ensure that the translation is "what you mean to say." Never before has a commercial product for conversational translation enabled a user to verify in real time that the translation is accurate, and, if not, to correct it on the spot. By allowing even monolingual users to monitor and guide translations as they happen, Converser promotes understanding of and trust in its translations, even in wide-ranging conversations. Monolinguals are thus empowered in multilingual settings, achieving an unprecedented degree of control. Other software products usable for real-time translation (e.g. free online translation services like http://babelfish.altavista.com) provide no such control or confidence.

Monitoring human translators is also impractical, although human translation errors have been a significant issue in healthcare institutions. Studies have shown that non-professional medical interpreters risk patient safety and increase liability. According to a study published in Pediatrics, the leading journal for illnesses affecting children, an average of 31 interpreter errors occurred on each of the 13 doctor visits studied.

Some of the mistakes were minor, such as omission of a word that did not significantly change a doctor's meaning, but 63% were considered serious enough to have medical consequences. In these cases, the incorrect translation changed the description of an illness to the doctor, misstated diagnostic or treatment options, or affected a parent's understanding of a child's condition or the need for follow-up visits or referrals.

The problem is a serious one. According to a 2004 study of 200 state hospitals, roughly 51 percent of California hospital patients who needed translation did not receive it. San Mateo Medical Center spokesman Dave Hook said in an Examiner.com article last summer that an estimated 35 percent, or about 23,400, of the Medical Center's annual patients speak a language other than English. He added that this number is growing. A significant number of medical errors have occurred nationwide because people have misinterpreted medical information.

In the U.S., the increasing number of patients with limited English proficiency (LEP) has recently been attracting considerable attention in federal and state legislatures. Language barriers impact the quality of care, service utilization, patient satisfaction, health outcomes, legal liability, and hospital admissions, and have resulted in excessive costs within the healthcare industry.

As a result, the Department of Health and Human Services (HHS) Office of Civil Rights and Office of Minority Health have mandated that any entities receiving federal funds, including healthcare organizations, "must offer and provide language assistance services, including bilingual staff and interpreter services, at no cost to each patient/consumer with limited English proficiency, at all points of contact, in a timely manner during all hours of operation."(1) The 6,003 hospitals (2003 http://www.USNews.com) and 836,156 physicians in the United States (2001 http://www.ama.com) are expected to absorb hundreds of millions of dollars to comply. Converser for Healthcare will directly address this demand and provide institutions with a reliable alternative which is lower in cost than any other solution on the market today.

Benefits of the Converser for Healthcare translation system: Converser is

-- A cost-efficient alternative to human translators, interpreters, and transcribers.

-- Highly reliable. It is the first broad-coverage translation product that allows a user to check accuracy and easily correct errors in real time, using Spoken Translation's unique Meaning Cues(TM) technology.

-- A private, consistent, and verifiable solution for translation in an environment where mistranslation could result in medical reporting errors and incorrect patient diagnoses.

-- An around-the-clock system that can be used anywhere, anytime.

-- Capable of broader, more reliable translation results than other automatic translation solutions on the market today.

Availability & Pricing:

Converser for Healthcare is available starting in March 2007 for English to Spanish, with other languages planned for release later this year. Chinese is planned for the healthcare market, while German and Japanese are currently under development for other markets. Converser can run on Tablet PCs or laptops (full-size or ultra-portable), and release is planned for numerous handheld devices. Converser uses Nuance's RealSpeak text-to-speech engine, which accurately pronounces translated text.

The list price is slated for $1,499. In North America, Spoken Translation will sell Converser for Healthcare through VARs (value added resellers), direct sales, and government contracts. In North America and worldwide, sales through OEMs are also planned.

For further information about reselling or purchasing Converser, please call 1-866-SPOKENT.

Monday, February 19, 2007

4 new Nuance RealSpeak Voices

We just released 4 new Nuance RealSpeak Voices.

Samantha US English Female 22 kHz Voice Sample **New**

MP3 File WMA File WAV File

Sangeeta Indian Accent English 22 kHz Voice Sample **New**

MP3 File WMA File WAV File

Yannick German Male 22 kHz Voice Sample **New**

MP3 File WMA File WAV File

Alexandros Greek Male 22 kHz Voice Sample **New**

MP3 File WMA File WAV File

Especially excited about Alexandros as he is our first Greek voice.

Purchase is via

RealSpeak™ Solo is a text-to-speech solution, optimized to enhance embedded conversational applications. It is a scalable solution that provides exceptionally high speech quality output across a range of footprints. Scaling from 8 to 20MB for embedded and automotive applications and up to 100MB for desktop applications, it is the ideal solution for a wide range of deployments where high-quality speech output is essential.

Sunday, February 18, 2007

TTS Reader for PocketPC and SmartPhone

Text To Speech for Pocket PC

We continue to receive a lot of requests for TextAloud on PocketPC. While we still don't have this ready yet, we have found a partner with a decent PocketPC product. It uses the NeoSpeech voices Kate and Paul. There isn't a downloadable trial version, but it is available for purchase. You can read more about it through the affiliate links below.

If you do purchase it and we later come up with a TextAloud for PocketPC, we will give you a free upgrade to it.

To read more, see screenshots and purchase, use the links below:

Windows Mobile PocketPC Version

Windows Mobile SmartPhone Version

If you do purchase be sure to email us and let us know what you think.

Kevin Lenzo of Cepstral talks about TTS

Kevin Lenzo has a unique background in academia, the open source community, and now as the founder of Cepstral, a text-to-speech (TTS) company seeking to interact with the open source community to build a commercial product. This gives him a panoramic view of both the potential and the problems involved in implementing voice technology most effectively.

Lenzo asserts that the key speech technology is not speech recognition, but text-to-speech. Speech output is of paramount importance, not speech input. He uses the example of a car radio to illustrate that buttons can be just as effective in controlling what you hear, with greater privacy, and without the inevitable occasional failures associated with speech recognition.

Lenzo presents a long list of possible applications of TTS, including hands-free in-car navigation systems, location-based weather reporting, remote network monitoring, and just-in-time broadcasting. He contrasts the latter with packaged podcasts that can end up relaying stale information. In all his examples, he sees it as crucial that devices are driven by user needs rather than the needs of the service provider, so that such applications can evolve into what he terms "an external brain" that, in a sense, controls the user. This may sound almost threatening on first hearing, but Lenzo welcomes devices, such as location-aware warehouse systems, that can guide and inform as you perform other tasks.

There are other areas in which TTS can be extremely useful. Lenzo is involved in a project in Kenya to provide speech services via phone in areas where computers are extremely rare. With literacy and language problems, TTS can provide accessibility to information that traditional computing cannot.

After bemoaning the problems involved in porting VoiceXML across different platforms, Lenzo ends his presentation with a plea for a vendor-independent cross-platform API for speech components.

More from ITConversations...

Friday, February 16, 2007

TextAloud a Vital Resource for Those with Aging Parents

View a PDF file for this press release

CLEMMONS, NC - NextUp's TextAloud software has made several headlines recently, thanks to its usefulness to the blind. Now, users with aging or visually impaired relatives are discovering its value as well. Easy to install even for PC novices, TextAloud is the perfect tool for those helping older relatives deal with an increasing loss of sight, enabling them to quickly, cheaply, and easily export e-mails, web pages, magazine articles, books and more, into spoken audio.

TextAloud most recently proved its usefulness to Warren Scharbert and his father, Don Scharbert of Joliet, Illinois. Don Scharbert is an active and knowledgeable longtime computer user with macular degeneration in one eye and a detached retina in the other. This leaves him with limited vision, and with screen magnifiers and other tools an increasingly necessary, frustrating part of life -- pages that might take seconds or minutes for sighted viewers, may take Don hours to read by magnifier. His son sought a better solution for him -- and found TextAloud, which not only integrated flawlessly with his father's existing PC programs, but quickly proved itself to be an invaluable support for Don's love of computing, bringing the web, e-mail, and more into spoken audio.

An award-winning program that converts text into spoken audio for listening on a PC or laptop, TextAloud can also save text to audio files for playback on portables like the iPod®, PocketPC®, and a wide range of other devices. Priced at just $29.95, TextAloud is simple for anyone with a PC, and has become popular for the visually impaired, who are increasingly turning to the program as their main source of Text to Speech.

As a computer user who enjoys everything from reading e-mail to surfing the web, despite the difficulties posed by his disability, Don Scharbert was delighted to discover TextAloud. Like many with visual disabilities, Don had tried and been disappointed by several other high-end programs, including one created to work as a magnifier and reader, which crashed his NetZero High-Speed and was incompatible with many of his PC's configurations. By contrast, TextAloud was able to integrate directly with Internet Explorer, and worked like a dream. This instantly simplified such tasks as surfing the 'Net, visiting his Church's website, and reading his e-mail.

"TextAloud has made the Internet enjoyable again," Don comments. "You just highlight the text and press Play -- can't get much simpler than that!" TextAloud also enables Don to read newsletters, online documents and PDFs with remarkable ease and quickness, even changing the voices depending on his mood. "Most of all, I'm so happy to be able to get my daily e-mails from friends and family again," he adds. "Before you spend lots of money on a text reader that promises the moon, try TextAloud, because it does what it says efficiently -- and without glitches." A free download trial of TextAloud is offered from the NextUp.com website.

About TextAloud
TextAloud has been featured in The New York Times, PC Magazine, Writer's Digest, on CNN, and more. Hailed by critics and users alike, TextAloud is priced at just $29.95, and is compatible with systems using Windows (R) 98, NT, 2000, XP and VISTA. The program is available for fast, safe and secure purchase via http://www.nextup.com/purchase.html.

NextUp.com also offers TextAloud with optional premium voices from AT&T Natural Voices (TM), NeoSpeech (R), Nuance (R), Acapela (R) and Cepstral (R) for the most natural-sounding computer speech anywhere. Available languages include US English, UK English, Indian Accent English, Scottish Accent English, French, Canadian French, Latin American Spanish, Castilian (European) Spanish, Mexican Spanish, Brazilian Portuguese, European Portuguese, Russian, Mandarin Chinese, Cantonese Chinese, Korean, Japanese, German, Italian, Dutch, Belgian Dutch, Danish, Swedish, Norwegian, Polish, and Arabic.

About NextUp.com
NextUp.com, a division of NextUp Technologies, LLC, provides award-winning Text to Speech software for consumers, business customers, educators, and those with visual or vocal impairment, or learning disabilities.

In addition to TextAloud, NextUp.com markets other innovative Windows software designed to save time and deliver vital information. NewsAloud™ is a talking personal "news agent" that finds the stories users want, and then reads them aloud or to portable audio files. WeatherAloud™ is a weather application that lets users select and listen to personalized weather forecasts, while StocksAloud™ reads stock updates and related news headlines aloud for specific companies of interest. NextUp Talker is an easy and affordable program that allows people who have lost their voices to use the latest in high-quality computer voices to communicate with others. Most recently, NextUp introduced a new text reader, AbleReader, available with the AT&T Natural Voices (TM) for use on Mac computers. Information on AbleReader is available at http://www.AbleReader.com.

Note to Editors:

Evaluation copies of TextAloud are currently available upon request. To receive a review copy, or for more information on NextUp.com or TextAloud, please contact publicist Angela Mitchell at (904) 982-8043.

All companies and products referenced in this press release are the trademarks of their respective owners.

Repeating text over and over using TTS

I've been getting this question a lot lately: can I use TextAloud to repeat the same text over and over? It seems most of these requests come from people trying to learn the content of a text or memorize it, although a few users were doing other things, like listening to motivational statements or affirmations.

On the TextAloud main menu, there are two items under Speak:
Speak->Loop Speak Current Article
Speak->Loop Speak All Articles

With either of these, text will be repeated over and over until Stop is pressed.

This could certainly be an interesting way to learn material.
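Conceptually, the two Loop Speak commands repeat a playlist until Stop is pressed: one article for Loop Speak Current Article, all of them for Loop Speak All Articles. Here is a minimal Python sketch of that behavior, assuming a generic `speak` callable in place of a real TTS engine and a `threading.Event` standing in for the Stop button. It is an illustration only, not TextAloud's internal code.

```python
import threading


def loop_speak(articles, speak, stop_event, max_passes=None):
    """Repeat the given articles in order until stop_event is set.

    Pass [current_article] for "Loop Speak Current Article", or the full list
    for "Loop Speak All Articles". `speak` is any callable that voices one
    string (a placeholder for a real TTS engine). `max_passes` is an optional
    cap, handy for testing. Returns the number of completed passes.
    """
    if not articles:
        return 0  # nothing to repeat; avoid spinning forever
    passes = 0
    while not stop_event.is_set():
        for text in articles:
            if stop_event.is_set():  # Stop was pressed mid-pass
                return passes
            speak(text)
        passes += 1
        if max_passes is not None and passes >= max_passes:
            break
    return passes


if __name__ == "__main__":
    stop = threading.Event()
    heard = []

    def fake_speak(text):
        # Stand-in for TTS: print the text, then "press Stop" after 5 utterances.
        print("Speaking:", text)
        heard.append(text)
        if len(heard) == 5:
            stop.set()

    loop_speak(["I am calm.", "I am focused."], fake_speak, stop)
```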

Thursday, February 15, 2007

Proofreading with Text To Speech

One great use for Text To Speech is proofreading. We have a ton of writers, ranging from professionals to those who just want to make sure their emails make sense, who use TextAloud's proofreading features. Editors like Word, WordPerfect, and others all have built-in spell check, which is great. Some also have grammar check, which can at times be helpful but at other times just gets in the way. But how many times have you written a document that passes spell check with no problem, only to find that you either used the wrong word or left a word out, and now a sentence doesn't make any sense? These types of mistakes can reflect badly on you. Proofreading by simply reading over your text sometimes catches errors, but our brains are so good at reading that they will often fill in or correct the mistake internally, and you never really see the problem.

Text To Speech to the rescue. Below are some quotes from writers who use it:

Praise from Writers who use TextAloud:

Tom Hannon
"Everything I write gets the run-through with TextAloud now," comments fiction writer Tom Hannon. "It is as important as running a spell-check." It's not uncommon, when proofreading from a screen or printed page, to miss sentence fragments or improper word choices. However, "with TextAloud, you can hear when something doesn't sound right," he adds. "It's so easy to use. It is the greatest writing tool since the word processor."

Kathryn Caskie
"I use TextAloud every day when I am writing," comments multi-published fiction author Kathryn Caskie, whose latest book, A Lady's Guide to Rakes, was released on September 1, 2005 for Warner Books. "In fact, when I was recently out of town and didn't have access to my favorite voice 'Audrey,' I wasn't nearly as productive."

Caskie finds the program valuable not only for its ambiance (listening to British-accented voice 'Audrey' helps her to get into the feel of the English characters in her historical romance in progress), but also for its value as a proofreading tool. "Most writers read their work aloud to themselves in order to make sure the dialogue sounds natural, as well as to catch typos and to ensure that their prose flows. However, by the time many authors get to this point, they've likely read any given passage several times. So, too often, we read what 'should' be on that page, not necessarily what actually appears. With TextAloud, I can just sit back with a cup of coffee and listen to my book read back to me. I follow along on the computer screen and correct any typos, and will also pause instantly if a particular line of dialogue doesn't sing."

She even uses TextAloud to save her chapters in MP3 format, loading them onto her iPod for later listening at her convenience, whether in the car or at a child's soccer game. "I've used TextAloud for two published books so far," she comments. "I love the program, and rave about it to all of my author friends."

Sherryl King-Wilds
Book Reviewer and fantasy novelist Sherryl King-Wilds values TextAloud's usefulness when editing her novels, as well as in writing her book reviews, and has high praise for the program's edit function in particular. "The ability to highlight a certain section of text and hear it without having to listen to endless blocks of 'wordage' gives me flexibility and saves me much-needed time," she says.

In her fiction writing, as well as her book reviews and articles for the webzine Fantasy Novel Review, the program alerts King-Wilds to awkward language - or worse, "the horrible typo," and allows her to correct such mistakes instantly. "I can fix things then and there," she comments, "without having to worry so much about my editor's flowing red ink pen." For her, TextAloud has proven to be "a time saver, an editor, a partner - even a preservationist of sanity" when deadlines approached.


So I wanted to give a quick run-through of how to set up and use the TextAloud Proofread hotkey for proofreading. Here is the FAQ response I usually give:

One great way to improve your writing is with better proofreading. Spell checkers catch the typos but don't help much with missing or wrong words, or with bad grammar. Whether you write for a living or just want your emails to be mistake free, using TextAloud to listen to text will help catch most mistakes. To make this process even easier, in TextAloud 2.0 we've created a special proofreading process that makes catching mistakes and correcting them very simple.

You start by setting a proofread hotkey. Go to Options->Hotkey Setup on the TextAloud main menu, and select a hotkey combination for Proofreading. Hotkeys are special key combinations that you can press within any program, and as long as TextAloud is running (even if the window isn't displayed), TextAloud will take an action. You want the hotkey to be an obscure combination so that it will be unique and not in use by other programs. We recommend Control-Alt-Shift-P for Proofread. Once this is set, click OK. Now you can minimize the TextAloud window.

After typing your document or email, highlight a paragraph of text with your mouse, then hold down the Shift, Control, and Alt keys and press P. The TextAloud Proofread window will pop up and speak the paragraph, highlighting each word as it goes. If you hear a mistake, simply click anywhere on the Proofread window: speaking stops and you are returned to your original document to make the correction. Repeat this process through the document until it is mistake-free. Based on the feedback we've heard so far, this simple process can quickly lead to mistake-free writing.

From the TextAloud Manual

Proofreading with TextAloud

An often overlooked use for TextAloud is to help proofread. Spell checkers within word processors and email clients help correct many common typing and spelling errors, but do little to correct other common problems such as typing the wrong word, leaving out words, or poorly constructed sentences. Proofreading the old-fashioned way is often ineffective too, because our brains are so adept at reading that we will often not catch mistakes. But hearing our own written words spoken back to us in another voice will almost always alert us to mistakes.

To assist with this proofreading task, TextAloud has a special Proofread HotKey and Popup Proofreading window. Via Options->HotKey Setup you can set a keyboard combination that will activate the proofreading window. Choose something obscure to ensure other programs are not using the combination. We suggest using Control-Shift-Alt-P, but you can experiment with any combination that works for you.

The theory behind the Proofreading function is that you need to hear small sections of text at a time while watching the words being highlighted. Since you typically aren't typing this text within TextAloud, but within your email or word processing program, you need a way to quickly return to the text to make corrections, and switching back and forth with the TextAloud main window would be cumbersome. Instead, the Proofread HotKey opens a popup window that shows you the text. If you spot a mistake, simply click on the window: speaking stops and you are returned to the program you are writing in to make the correction.

To demonstrate this process, assume you are typing a document in a word processor such as Microsoft Word. Once the document, or a section of it, is complete and you are ready to proofread, return to the top and highlight a paragraph. Next, press the Proofread HotKey combination (Control-Alt-Shift-P, for example). The TextAloud Proofreading Window will appear.

Text from the highlighted paragraph will immediately begin speaking as words are highlighted. You can customize the size of the window, voice and speed used, as well as Font and Colors used for the text. These settings will be remembered for future use. Most users will increase speed to slightly faster than normal listening since this is text they are already familiar with.

If, while listening and watching the text, you find a mistake, simply click anywhere within the text area of this window: speaking stops and the window disappears, returning you to your word processor. Correct the mistake, then repeat the process. If no mistake is found, the window disappears when the paragraph is complete, and you are ready to repeat the process with the next paragraph until the document is finished. This process will greatly cut down on mistakes in your writing.

Wednesday, February 14, 2007

IBM getting closer to Voice Translation?

One dream many companies are working toward is a Star Trek-style translator that would let people who speak different languages simply talk through a device and hear the other person's words in their own language. This kind of thing will eventually arrive and work well, but progress is slow.

IBM to break language barrier

MUMBAI: Imagine you are in a foreign country where you don't speak the language, and you need to decipher a confusing train schedule in a hurry. Wouldn't it be handy to be able to talk into a device, asking questions about departures and ticket prices, and have your queries translated into spoken word in the native language of train officials? Thanks to IBM's experiments with translation and speech technology, the spoken language gap for travellers, and others who might need a personal translator in their pockets, may be bridged now.

Revealing some innovative research initiatives that are under way, IBM innovation and technology executive vice-president Nicholas Donofrio says: "We have been working on speech technology for nearly 35 years now. As opposed to our earlier efforts where we were solely focussed on perfect translation, the ones that can stand up in a court of law, or face up to financial scrutiny, this time around we focused on its use in other parlances where perfection does not matter. A technology was born that can offer translation service in real time." Although several companies, including IBM, produce software that provides text-to-speech translation, so-called speech-to-speech translation engines have always remained on the horizon.

The prototype of the IBM software, dubbed Multilingual Automatic Speech-to-Speech Technology or MASTOR, "lets someone speak to me in say Hindi or Chinese, and the receiver understand by way of MASTOR what is being said in English in real time. Maybe a few words would be changed, give or take a few prepositions and adjectives here and there. But the whole idea now is to have speech-to-speech language translation even without the perfection of the language or grammatical skills on the part of the technology," explains Mr Donofrio.

IBM's earlier attempts at speech translation include ViaVoice, which gave voice to handheld devices like PDAs, and Phraselator. Now, after testing with the US armed forces in the Iraq war, IBM has commercial plans for its newest technological breakthrough. It intends to explore market opportunities where language translation technologies are in high demand, including medical facilities, law enforcement, banking and travel. "We are planning to talk to telcos and get them interested in offering the service," the innovation head for IBM adds. The company also plans to offer the service to first-time care givers such as the fire department.

The technology also brings good news to gamers around the world. IBM has plans for the technology for gaming companies. "The technology has uses in massively multiplayer online games aka MMOGs. Imagine you are playing World of Warcraft with players from different countries. How can you converse in a common language? Well, with MASTOR, you could be playing from Korea, China or Italy, all at the same time and everybody could understand each other," he explains. For internet users, IBM also has plans to bring this technology to use in e-mails. "The service is not just for voice, it works for text as well," says Mr Donofrio.

Monday, February 12, 2007

Recent TextAloud blog mentions

Busan Mike

NeoSpeech led in turn to the discovery of TextAloud, where NeoSpeech and other compatible voices are sold. While a TextAloud demo succeeded in getting the Microsoft Korean voice to work, the quality was predictably robotic - though my girlfriend thought at first it was reading with a North Korean accent - make of this what you will... Having satisfied myself that a combination of TextAloud and the NeoSpeech Korean 'Yumi' voice was the best, I bought them. Purchasing was fairly painless but the voice file was a 550Mb download which took a while.

Create audio books for iPod with TextAloud: make your own MP3 audio
TextAloud is the text to speech tool that enables you to create your very own audio books, ready to ... Create your own audio books. In today's busy world, audio books are an excellent way of being ...

The Advantages of Text To Speech Readers

We live in such a fast-paced world. Everything that we do is getting quicker and quicker. Waiting a minute or two for something 10 years ago is equivalent to waiting 2 or 3 seconds today. Don't you think that is true? And we are now masters at multitasking. Even men multitask, and that used to be thought of as a woman's talent. I found a new-to-me gadget that helps us to multitask called text to speech. More specifically it is a tool called TextAloud. It is really rather amazing! It can read your email, web pages, reports and lots more aloud to you on your PC. Imagine being able to dust and have your email read aloud to you, or cook supper while listening to a report for work. You can also have them saved as MP3 or Windows Media files. Then you can just take your reading with you and listen on your iPod or PocketPC. Isn't that amazing?

Moving Back to Jamaica


I have found something good!

As an increasingly frequent writer, my concern about putting out well written material has given me pause for thought. Some recently read, self-published books that I found to be horrific literary adventures, have only added to my concern. Given that I write a blog, I can't very well blame my editor, publisher or proof-reader.

I imagine that I could blame my wife (my unpaid editor)... but doing so would only confirm suspicions that my jackass writings are indeed written by... a jackass.

Well, the good thing I have found will at least let people know that I can, more frequently than not, put together the basics of grammar, punctuation and spelling.

The tool is called TextAloud, and it simply converts written words into words spoken aloud by my computer.

Sunday, February 11, 2007

Nice series on TTS in XP

Troubleshooting can be a difficult task, especially if you have not worked with a specific technology before. When it comes to troubleshooting text-to-speech problems, there are a few points that you should keep in mind.

  • Use the Preview Text button from the Speech Properties dialog box to verify that the TTS engine is working.
  • Open the Utility Manager to check the status of the Narrator program.
  • If you do not hear any sound and you are using external speakers, make sure they are turned on.
  • Check the Master Volume dialog box to make sure that muting is not enabled.
  • Verify that the speakers are properly connected to the computer. You may need to check the documentation that came with the speakers for the proper procedure.
  • Use Device Manager to check the status of the computer’s sound card. If necessary, reinstall or update the drivers for the device.

Friday, February 9, 2007

Interesting TextAloud mention - MEET BORING MARY

From Linda Ford Books:

I want you to meet Mary. She has a monotone voice and speaks at a slow, steady pace. It's hard to listen to her for more than a few minutes at a time, and yet I regularly submit to listening to her for several hours. Why, you ask? Because Mary is the voice in a computer program I have called TextAloud. (For more information see www.NextUp.com.) And one of the final steps on every manuscript is to let Mary read it to me. The purpose is not that I might enjoy the story. If I do, that's a bonus. What I'm wanting is to hear and catch typos, repeated words, things like using ever instead of even. And it's amazing how many little things I catch. Things that would normally be called line editing. But with Mary's help I can correct them before the manuscript goes to the editor. It's nice to know I'm sending a manuscript as clean as I can make it.

Thursday, February 8, 2007

Cepstral Voices

Cepstral is another voice company offering a wide variety of SAPI5 Voices. These voices integrate into TextAloud and the other Aloud products, plus any application that uses SAPI5.

These voices have come a long way over the last few years, and a few of them are very, very good. Cepstral's voices are known for a smaller footprint, using less memory and CPU power, and being very quick at creating audio files. Cepstral also offers a couple of fun voices, Shouty and Whispery, which are truly one of a kind and cost only $6.95 each.

The other unique thing about Cepstral is that you can download trial versions of their voices. The trial versions work indefinitely, but until you purchase them they play a short audio notice at the beginning of the audio.

You can download a trial of my favorite, Callie at
or the others via the Cepstral Store.

Samples and more info from our site below:

Cepstral® Voices from NextUp.com

*Version 4.0 Cepstral Voices Now Available* Exciting new voices from Cepstral® are now available for only $29.99 each. These high quality voices take up less disk space (under 50MB on average) than most premium voices, use less processor power, and are very fast when creating audio files. These SAPI5 compliant voices are supported by all NextUp.com products as well as most other speech products.

NOTE: If you purchased Cepstral Voices prior to January 2006, email us for upgrade information.

Click Here to purchase any or all of these great new voices from Cepstral® for only $29.99 each and download them after purchase.

Samples of each voice are available as MP3, WMA, and WAV files.

Premium Voices

Callie US English Voice **NEW**
David US English Voice
Diane US English Voice
William US English Voice
Italian Vittoria Voice
Lawrence UK English Voice
Isabelle Canadian French Voice
Katrin German Voice
Miguel Spanish Voice

Character Voices (only $6.99 each)

Shouty US English Character Voice **NEW**
Whispery US English Character Voice **NEW**
Damien Character Voice **NEW**
Dog Character Voice **NEW**
Duchess Character Voice **NEW**

Additional Voices

Robin US English Child's Voice
Amy US English Voice
Duncan Scottish Accent Voice
Emily US English Voice
Linda US English Voice
Walter US English Voice
Millie UK English Voice
Jean-Pierre Canadian French Voice
Matthias Voice
Marta Spanish Voice



Wednesday, February 7, 2007

Speech synthesis - Wikipedia

A good primer from Wikipedia on what TTS, or speech synthesis, is.

Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech.[1]

Synthesized speech can also be created by concatenating pieces of recorded speech that are stored in a database. Systems differ in the size of the stored speech units; a system that stores phones or diphones provides the largest output range, but may lack clarity. For specific usage domains, the storage of entire words or sentences allows for high-quality output. Alternatively, a synthesizer can incorporate a model of the vocal tract and other human voice characteristics to create a completely "synthetic" voice output.[2]

The quality of a speech synthesizer is judged by its similarity to the human voice, and by its ability to be understood. An intelligible text-to-speech program allows people with visual impairments or reading disabilities to listen to written works on a home computer. Many computer operating systems have included speech synthesizers since the early 1980s.
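To make the concatenative approach described above a little more concrete, here is a minimal Python sketch. It assumes the stored units are already short lists of audio samples keyed by a unit name; the `inventory` dictionary, the unit names, and the `overlap` length are all hypothetical. It looks up each unit and joins them with a brief linear crossfade so the seams are less abrupt. Real concatenative systems also choose among many candidate recordings and adjust pitch and duration, which this toy leaves out.

```python
from typing import Dict, List, Sequence


def crossfade_concat(units: Sequence[List[float]], overlap: int = 4) -> List[float]:
    """Join recorded units end to end, blending `overlap` samples at each seam."""
    out: List[float] = []
    for unit in units:
        # No blending for the first unit (out is empty) or for very short units.
        n = min(overlap, len(out), len(unit))
        start = len(out) - n
        # Linear crossfade: the tail of `out` fades down while the new unit fades in.
        for i in range(n):
            w = (i + 1) / (n + 1)
            out[start + i] = out[start + i] * (1 - w) + unit[i] * w
        out.extend(unit[n:])
    return out


def synthesize(unit_names: Sequence[str],
               inventory: Dict[str, List[float]],
               overlap: int = 4) -> List[float]:
    """Look up each stored unit (phone, diphone, or whole word) and concatenate."""
    missing = [name for name in unit_names if name not in inventory]
    if missing:
        raise KeyError(f"no recording for units: {missing}")
    return crossfade_concat([inventory[name] for name in unit_names], overlap)


# Hypothetical "recordings": two whole-word units of 8 samples each.
inventory = {"hello": [0.5] * 8, "world": [0.0] * 8}
audio = synthesize(["hello", "world"], inventory)  # 8 + 8 - 4 overlapped = 12 samples
```

Storing whole words, as in the usage lines, corresponds to the "entire words or sentences" case in the quote: fewer seams and better quality, but only for the vocabulary that was recorded.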



Overview of text processing


A text-to-speech system (or "engine") is composed of two parts: a front-end and a back-end. The front-end has two major tasks. First, it converts raw text containing symbols like numbers and abbreviations into the equivalent of written-out words. This process is often called text normalization, pre-processing, or tokenization. The front-end then assigns phonetic transcriptions to each word, and divides and marks the text into prosodic units, like phrases, clauses, and sentences. The process of assigning phonetic transcriptions to words is called text-to-phoneme or grapheme-to-phoneme conversion.[3] Phonetic transcriptions and prosody information together make up the symbolic linguistic representation that is output by the front-end. The back-end, often referred to as the synthesizer, then converts the symbolic linguistic representation into sound.

More at
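The front-end steps in the excerpt above (normalization, prosodic units, grapheme-to-phoneme conversion) can be sketched as a toy Python pipeline. Everything here is an assumption for illustration: the two-entry abbreviation table, reading numbers digit by digit, the seven-word ARPAbet-style lexicon, and the letter-spelling fallback for unknown words are stand-ins, not how any particular engine works. The back-end that turns phonemes into sound is not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical toy tables; a real engine ships far larger ones.
ABBREVIATIONS = {"dr": "doctor", "st": "street"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]
LEXICON: Dict[str, List[str]] = {  # word -> ARPAbet-style phonemes
    "doctor": ["D", "AA", "K", "T", "ER"],
    "smith": ["S", "M", "IH", "TH"],
    "lives": ["L", "IH", "V", "Z"],
    "at": ["AE", "T"],
    "two": ["T", "UW"],
    "main": ["M", "EY", "N"],
    "street": ["S", "T", "R", "IY", "T"],
}

Word = Tuple[str, List[str]]  # (normalized word, phonemes)


def normalize(token: str) -> List[str]:
    """Text normalization: expand known abbreviations, read digits one by one."""
    t = token.lower()
    if t in ABBREVIATIONS:
        return [ABBREVIATIONS[t]]
    if t.isascii() and t.isdigit():
        return [DIGITS[int(d)] for d in t]
    return [t]


def to_phonemes(word: str) -> List[str]:
    """Grapheme-to-phoneme step: dictionary lookup, else spell the letters out."""
    return LEXICON.get(word, list(word.upper()))


def front_end(text: str) -> List[List[Word]]:
    """Return prosodic units (split at punctuation), each a list of transcribed words."""
    units: List[List[Word]] = []
    current: List[Word] = []
    for raw in text.split():
        core = raw.rstrip(",;:.!?")
        if core:
            current.extend((w, to_phonemes(w)) for w in normalize(core))
        # A period after a known abbreviation ("Dr.") does not end the unit.
        # Known weakness: a sentence that ends in an abbreviation is not split.
        period_is_abbrev = raw.endswith(".") and core.lower() in ABBREVIATIONS
        if raw[-1] in ",;:.!?" and not period_is_abbrev and current:
            units.append(current)
            current = []
    if current:
        units.append(current)
    return units


for unit in front_end("Hello, Dr. Smith lives at 22 Main St."):
    print([word for word, _ in unit])
```

The abbreviation check shows why normalization is harder than it looks: the same period can mark an abbreviation or the end of a sentence, and a simple table lookup only gets part of the way.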


Monday, February 5, 2007

NeoSpeech SAPI5 Voices

The next set of voices I wanted to highlight is NeoSpeech. Developed by the Korean company VoiceWare, these are among the most natural-sounding voices around. They are SAPI5 compatible, which means they not only work in TextAloud, but are also very popular among users of screen readers like Window-Eyes and JAWS for Windows, along with programs like Kurzweil 3000 and people simply using Windows' built-in TTS.

The Kate and Paul bundle of US English voices is $35, and they are fantastic. NeoSpeech is also unique in its focus on Asian voices. There is a nice interactive demo of the voices at

Some samples from our site

NeoSpeech Voices

NextUp.com is pleased to now be able to offer new Text To Speech voices from NeoSpeech. Kate and Paul are US English voices, available in 16kHz or 8kHz versions, supporting SAPI5 speech applications including all NextUp.com products, most newer TTS programs from other companies, and the TTS functions built into Windows XP.

Asian voices for Korean, Japanese, and Mandarin Chinese are available in 16kHz SAPI5 versions.

Each voice requires between 300 and 650MB of disk space, and is available on CD or via download. The voices support speed and pitch adjustments, and require at least a 400MHz Pentium II with 128MB of RAM.
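The 16kHz and 8kHz versions differ in sampling rate, which directly sets how much audio data each second of speech produces. As a rough worked example, assuming uncompressed 16-bit mono PCM output (an assumption; the exact output format depends on the application and settings), the data rate is simply samples per second times bytes per sample times channels. The function names below are illustrative only.

```python
def pcm_bytes_per_second(sample_rate_hz: int, bits_per_sample: int = 16,
                         channels: int = 1) -> int:
    """Uncompressed PCM data rate: samples/s x bytes per sample x channels."""
    return sample_rate_hz * (bits_per_sample // 8) * channels


def pcm_megabytes_per_minute(sample_rate_hz: int, bits_per_sample: int = 16,
                             channels: int = 1) -> float:
    """Same rate expressed as decimal megabytes per minute of speech."""
    return pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels) * 60 / 1_000_000


for rate in (16_000, 8_000):
    print(f"{rate // 1000} kHz: {pcm_bytes_per_second(rate):,} bytes/s, "
          f"{pcm_megabytes_per_minute(rate):.2f} MB per minute")
```

Run as-is, it prints 32,000 bytes/s (1.92 MB per minute) for 16 kHz and 16,000 bytes/s (0.96 MB per minute) for 8 kHz: halving the sampling rate halves the size of uncompressed WAV output.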

Samples of each voice are available as MP3, WMA, and WAV files.

US English Voices

Kate16 - 16kHz US English Female
Paul16 - 16kHz US English Male
Kate8 - 8kHz US English Female
Paul8 - 8kHz US English Male

Mandarin Chinese Voices

Lily - 16kHz Chinese Female
Wang - 16kHz Chinese Male

Korean Voices

Junwoo - 16kHz Korean Male
Yumi - 16kHz Korean Female

Japanese Voices

Miyu - 16kHz Japanese Female

If you have any questions, please email us at support@nextup.com.

The technology, created by VoiceWare, is called VoiceText. Here is some of the information from their site on these voices:


VoiceText™ is the leading software solution for generating extremely natural-sounding voices from text input.

VoiceText is available in configurations for a wide range of embedded devices, desktop and network/server applications, making it the most flexible high quality TTS solution on the market today.

VoiceText is available in US English, Korean, Japanese and Mandarin Chinese.


Natural Sound and Clear Pronunciation
VoiceText provides very natural and highly intelligible output.

Variable Footprints
With variable footprints ranging from 16 to over 500 megabytes, VoiceText is configurable for use in a wide range of embedded, desktop, and network/server applications. No other single TTS solution offers this degree of flexibility.

Multiple Languages, Multiple Voices
VoiceText is available in US English, Korean, Japanese and Mandarin Chinese. A collection of eleven native voices is available across these languages.

Large Extensible Dictionary
Hundreds of thousands of pronunciations are included in the default dictionary of each of the supported languages. VoiceText supports customization of the dictionary so that developers can adjust pronunciations of symbols, abbreviations, and new terms.
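A customizable dictionary of this kind usually means a user layer that is consulted before the built-in lexicon. Here is a minimal sketch of that layering in Python; the class name, the ARPAbet-style phoneme strings, and the "return None to fall back to letter-to-sound rules" convention are assumptions for illustration, not VoiceText's actual API.

```python
from typing import Dict, List, Optional, Sequence


class PronunciationDictionary:
    """Default lexicon plus user entries; user entries win on lookup."""

    def __init__(self, default: Dict[str, List[str]]) -> None:
        # Keys are stored lower-case so lookups are case-insensitive.
        self._default = {word.lower(): list(p) for word, p in default.items()}
        self._user: Dict[str, List[str]] = {}

    def add(self, word: str, phonemes: Sequence[str]) -> None:
        """Add or override the pronunciation of a symbol, abbreviation, or new term."""
        self._user[word.lower()] = list(phonemes)

    def lookup(self, word: str) -> Optional[List[str]]:
        """User entry first, then the default dictionary; None if neither has it."""
        key = word.lower()
        if key in self._user:
            return self._user[key]
        return self._default.get(key)  # None: caller falls back to letter-to-sound rules


# Hypothetical usage: override one default entry and add a new term.
lexicon = PronunciationDictionary({"tomato": ["T", "AH", "M", "EY", "T", "OW"]})
lexicon.add("tomato", ["T", "AH", "M", "AA", "T", "OW"])
lexicon.add("NeoSpeech", ["N", "IY", "OW", "S", "P", "IY", "CH"])
```

Keeping user entries in a separate layer means the shipped dictionary is never modified, so a bad custom entry can be removed without losing the original pronunciation.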

Expressive Control
Pitch, speed, volume, and pauses can be customized, both dynamically and by default.

Pre-Processing of Input Text
VoiceText automatically handles special input such as dates, times, abbreviations found in addresses, and sentences with mixed languages. New cases can be added using customizable rules.

System Design
VoiceText synthesizes speech faster than real time, and supports multi-threaded and multi-channel architectures. An optional load balancer is available for the network/server configuration.
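Synthesizing "faster than real time" (sometimes called sub-real-time) is usually measured as a real-time factor: the time spent synthesizing divided by the duration of the audio produced. A factor below 1 means speech is generated faster than it plays back, which is what makes multi-channel serving possible. The sketch below shows the arithmetic; the timing numbers are hypothetical, and the streams-per-core estimate assumes one single-threaded synthesis stream per channel and ignores I/O and scheduling overhead.

```python
import math


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Time spent synthesizing divided by the length of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def realtime_streams_per_core(rtf: float) -> int:
    """Rough ceiling on simultaneous live channels one core could keep fed."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    # Ignores scheduling, I/O and memory limits; the epsilon guards float rounding.
    return math.floor(1.0 / rtf + 1e-9)


# Hypothetical measurement: 10 seconds of speech produced in 2 seconds.
rtf = real_time_factor(synthesis_seconds=2.0, audio_seconds=10.0)
print(rtf, realtime_streams_per_core(rtf))
```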

Flexible Data Output Formats
VoiceText supports various formats including 8kHz/16kHz sampling rates, linear 8-bit/16-bit PCM, 8-bit mu-law/a-law, ADPCM, Windows .wav, and others.
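Among the formats listed above, 8-bit mu-law stores each sample on a logarithmic scale, so quiet sounds keep more resolution than they would with 8-bit linear PCM. Here is a minimal sketch of the continuous mu-law curve (mu = 255) in Python, assuming samples normalized to the range -1 to 1. This is the textbook formula only: the G.711 codec used in telephony applies a segmented approximation and its own bit layout, so the 8-bit codes below are not G.711-compatible, and nothing here describes how VoiceText itself encodes audio.

```python
import math

MU = 255.0  # mu value used for 8-bit mu-law


def mulaw_compress(x: float) -> float:
    """Continuous mu-law curve: sgn(x) * ln(1 + mu|x|) / ln(1 + mu), for x in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mulaw_expand(y: float) -> float:
    """Inverse curve: sgn(y) * ((1 + mu)^|y| - 1) / mu."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def to_8bit(y: float) -> int:
    """Quantize a compressed value in [-1, 1] to one of 256 levels (0..255)."""
    return max(0, min(255, round((y + 1.0) / 2.0 * 255)))


# Small inputs are stretched toward larger values before quantization.
for x in (0.001, 0.01, 0.1, 1.0):
    print(x, round(mulaw_compress(x), 3), to_8bit(mulaw_compress(x)))
```

Because a 1% amplitude signal maps to roughly a fifth of the full compressed range, it lands on many more of the 256 levels than it would with linear 8-bit quantization, which is the reason mu-law is common at the 8kHz telephony rate.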

Support of APIs
VoiceText supports SAPI 5, C/C++, COM, and Java-based Application Programming Interfaces (APIs).


From monitoring services to custom voice creation, VoiceText offers a range of product support services to meet your exact needs and specifications. NeoSpeech is committed to working more closely with developers to deliver the products, pricing and support that will ensure our customers' success in the marketplace.

Secure Internet Support
The VoiceText network/server configuration can be monitored and managed from a distance via a secure Internet connection.

New Voice Creation
VoiceText can be adapted to create new voices. Ask us about custom voice creation!

Application Tuning
Does your application have a lot of cryptic input or hard-to-pronounce vocabulary? Ask us about our cost-effective tuning services!


Unparalleled Flexibility
VoiceText is highly flexible and can be easily configured to support a variety of embedded, desktop and network/server applications. Regardless of the application, VoiceText provides an instant, seamless way to present dynamic content or update evolving information on the fly.


From PDAs to network servers, VoiceText accurately delivers the news and information your customers, employees and vendors need to know. Our clear, natural-sounding voices can process any size or type of text, from simple driving directions to complete news stories to personal e-mail, delivering incomparable accuracy and quality.

Second Language Instruction
VoiceText is of such high quality that it can easily be used in English as a Second Language (ESL) or other second language learning applications. It allows users to learn any material, gathered from any electronic source, any time. VoiceText TTS technology is currently in use in ESL applications in three languages.

Multimedia Games and Avatars
VoiceText allows avatars in multimedia applications to say anything. Characters can interact with the player in real-time.

Messaging and Information Delivery
Whether it's personal account information for a valued customer, or rapidly changing information that requires constant updating, VoiceText's seamless technology provides the clear, concise processing to effectively communicate and showcase your information.

  • Customer account information. VoiceText smoothly and seamlessly delivers individual account information
  • Promotional information in call queues. Instantly update your customers on new developments.
  • News reading
  • Driving directions
  • Visually Impaired
  • Web access
  • Book and document reading

System Requirements

Requirement | Network/Server | Desktop | Embedded
O/S | Windows NT 4.0 / 2000 / 2003 | Windows 98 / NT 4.0 / 2000 / XP / 2003 | Windows CE 2.0 - 5.0 / PocketPC 2002
CPU | Pentium IV 1.7 GHz | Pentium III 500 MHz | 170 MHz
RAM | 1 GB | 128 MB (256 MB recommended) | 6-12 MB
DB | 35-900 MB | 12-64 MB