In Hollywood science-fiction thrillers, humans talking freely with machines is de rigueur. In such films it would appear strange if people could not talk with their machines.
In reality, however, it remains a dream, and one distant enough to frustrate any hardcore sci-fi fan.
Citing the many technological obstacles that have been in place for decades, some experts doubt the day of talking machines will come anytime soon, while others have talked up technologies that may enable man-machine communication via speech, at least in certain specific circumstances, in the not-too-distant future.
The consensus now among researchers is that putting available technologies into salable form is more realistic at the moment than sinking resources into technology that will allow free talk between human and machine.
"There is still difficulty in realizing this goal given the technologies available now," said Xu Bo, a researcher from the Institute of Automation Technology (IAT) of the Chinese Academy of Sciences. "But in some lower-end areas, technology has become mature enough to create salable products."
Eric Chang, a research fellow at Beijing-based Microsoft Research Asia, predicted that the day the sci-fi dream comes true is at least 20 to 30 years off.
"Progress has been continuously made in this field," he said. "But people have to take a more realistic attitude."
In fact, the realistic attitude has proved to be a shot in the arm for stagnant research and has created a thriving market in speech technologies and products that has attracted an increasing number of giant companies and research institutes.
Personal computers, mobile devices and even fixed-line telephones embedded with speech technologies have been continuously emerging in the Chinese market over the past few years. The new devices promise to facilitate human-machine interaction through a more natural interface.
More than a dozen Chinese companies are now developing speech technologies, either independently or in collaboration with foreign companies, all eyeing the huge potential of the Chinese-language market.
The IAT, for instance, came out with its Pattek ASR series of Chinese-language speech products, which can be used to replace humans in paging services or be embedded in toys and teaching instruments. Their applications, like others on the market, are restricted to specific conversational situations.
Similar products in other languages have proved commercially viable in some applications, but are typically disliked by users because they are inefficient, rigid, incomplete and difficult to figure out. These shortcomings prevent them from being more widely deployed.
Naturally, researchers hope to develop universally applicable speech technologies that allow for more natural human-machine interaction.
At the center of the research is so-called speech recognition, which has been described by experts as more difficult than sending a man to the moon in technological terms.
Research in this area began as soon as the computer was invented and was popularized through science fiction. But inaccuracy in recognition has constantly plagued researchers, who have to look for ways to educate the machine to adapt to the nuances of human speech. "The problem is that human languages appear too ambiguous for the machine," said Frank Seide, a researcher with Microsoft Research Asia.
Another problem is the spontaneity of human speech, which could confuse even the most powerful computer in the world, he said.
When you take into account the various local accents, particularly of the Chinese language, and the differences in individual expression, creating a truly universal application for speech recognition with desirable accuracy appears all but impossible.
Even with the best speech technology now available on the market, talking with machines -- a computer, for example -- is still a bumpy, far from pleasurable experience often characterized by inaccuracy and misunderstanding, experts said.
Take dictation software products for example. It often takes hours or even longer to "teach" the software to get acquainted with your voice and accent before it can transform your speech into text. And you have to carefully control your tone and speed to ensure as few inaccuracies as possible.
And it is commonplace to see speech software malfunctioning in public demonstrations, forcing the users to awkwardly repeat instructions until it reacts.
Under such circumstances, people's enthusiasm wears off rather quickly.
When IBM released its first Chinese speech recognition software, ViaVoice, five years ago, many Chinese computer experts and ordinary users who were tired of the keyboard enthusiastically embraced it.
The product, as its name suggests, reportedly allows users to input text using their voices rather than the keyboard, which is of special significance to Chinese users who have long been plagued with the difficulty of typing Chinese characters with Roman alphabet-based keyboards. It was touted as the first marketable product for Chinese speech recognition in the world.
After the preliminary hubbub settled down, ViaVoice's shortcomings began to show.
The application turned out not to be as "universal" as expected. It performs well in dictation of a news article, for example, but appears far less efficient when the dictation is of a short essay.
What's more, it requires lengthy recitation in a quiet environment to get the computer acquainted with your voice.
Microsoft's Office XP also incorporates a dictation application but requires similar training, which has prevented its wider use.
Microsoft's Chang admitted the technological barriers might not be easy to break in the near future.
But his research group has nevertheless developed a spoken document search engine called Speech finder to allow users to search voice mail or online presentations using keywords. New probabilistic techniques have been developed to reduce the impact of imperfect speech recognition on retrieval precision, according to Chang. This technology also allows the user to skip quickly to audio segments of interest, he said.
Xu from the IAT also argued that dictation software, which requires verbatim speech-to-text transformation, might not necessarily be the best focus of current applications. Rather, mobile telecommunication is the field where speech recognition technologies should be more widely applied for now, because verbatim dictation, with all its difficulties, is unnecessary.
"Its main function will then be to replace the time-consuming typing or button pressing needed to give instructions to mobile devices," he said.
Market demand in this field is huge, and far more tangible than sci-fi dreams, at least in the foreseeable future, he said.
(China Daily October 23, 2002)