Voice Recognition Software to the Rescue?
by David L. Farquhar
08/31/2000
If you're a writer or a programmer, I know three words that may strike greater fear into your heart, soul, and mind than the thought of death itself: carpal tunnel syndrome.
I thought I had left career-threatening injuries behind when I quit playing baseball seriously at sixteen. And as a sysadmin type by day and a writer by night, I always figured the greatest danger to my livelihood was braving insane St. Louis traffic twice a day.
In February I started feeling weird pains in my lower arm. In my baseball days, I always played in pain, so I didn't think much of it. Athletes rest and heal between games and during the off-season. So I figured I'd just tough it out, get my new book finished, and take some extra time off before starting another. Besides, the pain didn't bother me. It just felt a little weird.
But as time passed, the pain became excruciating. Finally, I went to see my doctor, who uttered the three dreaded words. I got a second opinion, then a third. The doctors disagreed about what I had, but they all agreed on one point: Keep keyboarding to a minimum, or better yet, stop completely.
With a book on deadline and my doctor's orders to avoid typing, I hoped voice recognition software would come to my rescue.
So I cracked open my copy of IBM's ViaVoice 97. Its system requirements: Pentium-MMX 166MHz, 32 megs of RAM, Windows 95. I normally scoff at minimum system requirements, but this was serious business. I installed it on my Celeron-400 with 256 megs running Windows 98, spent a couple of hours training it, and found it totally unusable.
For starters, it was slow. In spite of having literally four times the horsepower stated in the system requirements and having used every trick in the book to make that system overachieve (and when I say I wrote the book on system performance, I'm not lying), I could get 20 words per minute at best. That's slightly faster than I figure I can type if I hunt and peck with my two big toes. When healthy, I can touch-type well over 80 wpm. I'm not interested in spending $2,000 to see if a gigahertz CPU would do better when voice recognition was promised to come of age at 100MHz.
Not only was it slow, it was also inaccurate. I added the word "Celeron" to my dictionary. Next thing I knew, every other word I said was "Celeron." I said "is" once and my PC heard "Celeron." If I hadn't been so mad, I'd have thought it hilarious. Unless I spoke far more slowly than the words came to me, ViaVoice got hopelessly confused, and then when I stopped to make corrections, I tended to lose my train of thought.
There's nothing like an example, so I pulled several books off my bookshelf and read a short segment from each into ViaVoice. The first, The Christmas Cross by Max Lucado, is a children's book. The second, my own Optimizing Windows for Games, Graphics and Multimedia, was supposed to be more challenging. Finally, for a giggle, I pulled out Oh, the Places You'll Go! by Dr. Seuss. ViaVoice struggled on all of them.
Here's a segment from Lucado rendered by ViaVoice:
So, it was the fight with demand that got me into taxes. But it was the photo that led me to clear water. Began had received did in the mail. No return address. Your. Just this photo: black and white image of a large, stone building.
Here's what Lucado actually published:
So, it was the fight with Meg that got me into Texas. But it was the photo that led me to Clearwater. Megan had received it in the mail. No return address. No letter. Just this photo: a black and white image of a large, stone building.
ViaVoice reduced Optimizing Windows to laughable gibberish:
one Web site changed my whole way of looking at computers.
I was looking for sites about Windows98. I guess I about cars. If there was ever any doubt in my mind that there is a Web site about everything, this site E restate. Before in Newark, I was reading about tweaking of Dodge spirits in raising them. One fan of the site rhodium, criticizing some aspects of the spirits designed. Another rebuff said it that without cutting down the this brains, modifying that gizmo that holds the ear filter to get more beer flowing into the engine, and other less-than-trivial modifications, the cart was "totally inadequate " for a stop-and-goal city driving.
I related the story to a couple of friends don't exactly sure my love of computers and passed it palatable and eyes and a black woman started talking about computers. Both agree that adamantly.
Reading about raising Dodge Spirits in Newark? Beer flowing into the engine? My original words weren't that ridiculous:
One Web site changed my whole way of looking at computers.
I was looking for sites about Windows 98. I got a site about cars. If there was ever any doubt in my mind that there is a Web site about everything, this site erased it. Before I knew it, I was reading about tweaking out Dodge Spirits and racing them. One fan of the site wrote in, criticizing some aspects of the Spirit's design. Another buff said that without cutting down the springs, modifying the gizmo that holds the air filter to get more air flowing into the engine, and other less-than-trivial modifications, the car was "totally inadequate" for stop-and-go city driving.
I related the story to a couple of friends who don't exactly share my love of computers and asked if that was what I sounded like when I started talking about computers. Both agreed adamantly.
And finally, Dr. Seuss:
you'll come down from the lurch with an unpleasant bomb. And the chances are, then, that you'll be in this long. And when you were in this long, you're not in for much fun. On the slumping yourself is not easily done.
The original reveals that ViaVoice was slumping badly:
You'll come down from the Lurch with an unpleasant bump. And the chances are, then, that you'll be in a Slump. And when you're in a Slump, you're not in for much fun. Un-slumping yourself is not easily done.
It's not entirely ViaVoice's fault. The quality of PC audio hardware today is absolutely pathetic. I've dabbled in songwriting and recording, so I'm not a complete stranger to audio. The quality of the audio inputs on most sound cards on the market today is an atrocity. The quality of the majority of computer microphones is worse. Occasionally, when I played back a word the computer didn't understand, I couldn't understand it myself because the recording was so bad. If I can't understand it, how can I expect the computer to?
Since one of my other systems (a 350MHz K6-2) has an Ensoniq sound card in it, I tried ViaVoice on that machine too. Ensoniq makes professional-quality synthesizers and keyboards, so they know what they're doing when it comes to audio. In theory, their sound cards should reflect that. In all fairness, the Ensoniq-equipped K6-2 did a noticeably better job, but its results were still unacceptable.
As for the microphone that came with ViaVoice, no self-respecting crooner would be seen in the same room with it. I'm a self-deprecating punk-rock vocalist, and I don't trust any mic that costs less than $300.
In addition to the technical problems, there's a human element. Talking is a totally different mental process from writing. Some of us are very comfortable with one but not the other. I'm much more comfortable writing. While I was writing up a tech review of PC Hardware in a Nutshell by Robert Bruce Thompson and Barbara Fritchman Thompson, I tried an experiment. I dictated half of the review and typed the other half. The results sounded like the work of two totally different writers: one a tolerable writer (no writer should think too highly of his/her own work) and the other a boring, unadventurous writer who knew only about 34 words.
Once I thought about it, the difference made total sense: I had quickly learned the kinds of words the computer could and couldn't understand, and I had avoided saying words the computer would probably mangle. But knowing your computer won't understand certain words makes you work within those limits, which in turn makes for some pretty poor writing.
So, rather than throw money at sound cards and microphones and faster motherboards and CPUs and the latest and greatest versions of IBM ViaVoice and Dragon NaturallySpeaking, I bought a mind-boggling array of vitamins, started taking them religiously, and disconnected my computers for a while. When I did have to use a computer, I held a capped pen in each hand and used them to hunt and peck.
I suppose I could request evaluation copies of the very latest offerings from both IBM and Dragon, plus a smattering of sound cards and microphones, and then try different combinations until I found something workable. (Computer companies are usually pretty receptive to the idea of loaning stuff to writers to test.) I expect accuracy would increase, but I'd be shocked if real-world tests produced anything faster than 30 wpm. And, of course, Joe Consumer doesn't have that luxury.
I probably wouldn't have time for such experiments anyway, especially ones that hold so little promise. And, now that my wrists are feeling much better, I've got a book to write.
David L. Farquhar saw a computer for the first time, a Commodore VIC-20, at the impressionable age of seven and hasn't been the same since. He has five years' experience as a systems analyst and a journalism degree from the University of Missouri. When he's not fixing computers or writing about them, he's a die-hard Kansas City Royals fan, fiction writer, and songwriter. He volunteers frequently at his church as a Bible study teacher, sound technician, and projectionist. Dave lives in St. Louis, Missouri. You'll find his personal Web site at www.access2k1.net/users/farquhar.



