
EETE SEPTEMBER 2012

DESIGN & PRODUCTS
DATA ACQUISITION

Voice input processing for automotive speech recognition systems

By Sverrir Olafsson

In a quiet, controlled environment, today’s speech recognition engines have become quite effective. Whether doing dictation with a quality headset in a quiet office, or giving search phrases to a smartphone in a silent room, hit rates of close to 100 percent are now commonly achieved. However, adding a few disturbances tends to quickly degrade the performance.

The automobile environment is one of the most challenging in this respect. A variety of noise sources both outside the car (passing cars, honking horns) and inside it (multiple passengers talking, the air conditioning fan, the radio), along with audio reverberations off the hard surfaces, result in the lackluster performance with which many car owners are familiar. Further, in order to avoid false triggers, the driver of the car needs to push a button to trigger the speech command system. This is not just a nuisance but also a safety hazard. Yet few applications could benefit more from using speech recognition for voice command operation than the automobile.

It is therefore of great value if technology can make speech recognition more effective in cars, detecting commands reliably in the presence of all these disturbances and without button presses. Although this is fundamentally a speech recognition problem, the performance improvements will come primarily from processing the voice input signal to remove noise and disturbances.

In recent years, one of the key areas to which Conexant has applied its vast experience in audio technology is Voice Input Processing (VIP). Through careful design, from the microphone interface (providing clean bias signals, low-noise pre-amplification and gain control) to the implementation of complex digital signal processing algorithms on its high-performance yet low-power DSPs, Conexant has been able to deliver VIP devices for a number of applications including TVs, home appliances and automobiles. Within those applications, one of the primary advantages of the Conexant solution is that it improves the performance of speech recognition engines: the solution has been optimized for many of the common speech recognition algorithms for use in challenging environments.

To achieve superior performance, several algorithms are employed to enhance the desired input signal and suppress noise sources in a coordinated manner. Conexant’s Selective Source Pickup (SSP) algorithm is uniquely able to separate the desired signal from the noise sources by analyzing statistical and spatial information in the signal. The interference coming from the local loudspeakers is cancelled with Conexant’s advanced Multichannel Acoustic Echo Canceller (MAEC), reverberation is suppressed with a novel de-reverberation algorithm, and the remaining environmental noise is attenuated by a Non-Stationary Noise Reduction (NSNR) algorithm. Tuning these algorithms together, and in particular tuning them for a specific speech recognition engine, can vastly improve the word hit rate without any changes to the speech recognition system.

Selective source pickup (SSP)

Independent Component Analysis (ICA) is an emerging area of research within audio technology that attempts to separate or extract different voice or noise sources. Established in the early 90s, it is based on the idea that the underlying sources of a mixed signal are statistically independent. Using prior knowledge of the statistics of certain types of signals combined with the measured correlation parameters, adaptive techniques can in fact separate or “de-mix” the combined signal to extract one or more of the underlying sources. Typically, ICA algorithms require an extreme amount of processing power and memory. This makes them impractical for implementation in embedded real-time systems.

Conexant’s SSP algorithm utilizes some of the fundamental ideas from ICA, reduces these requirements to a practical level and yet delivers on the promise of separating one talker from another talker or from the environmental noise using only two microphones. The decision of which source to extract can be made in real time. The algorithm can simply extract the dominant talker or use the position of the talker with respect to the microphones to decide what signal to extract. In effect, this allows the VIP to zoom in on a single talker in a room or car filled with interference from other sources, which can be extremely useful for a speech recognition application in an automobile environment.

Multi-channel acoustic echo cancelling (MAEC)

One of the most controlled sources of noise in a car is the audio being played back from a radio or CD. Most current speech recognition systems require that audio playback be either attenuated or fully squelched for the recognition system to work. However, using echo cancellation techniques, the audio playback signal can be estimated as it appears at the microphone and subtracted out, leaving only the desired voice signal for the speech recognition engine. This is common practice for speakerphone conversations over Bluetooth, but with audio playback there are typically multiple speakers playing the audio, poten-

Sverrir Olafsson is VP of engineering at Conexant - www.conexant.com

Electronic Engineering Times Europe September 2012 www.electronics-eetimes.com
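
The echo cancellation idea described in the MAEC section (estimate the playback signal as it appears at the microphone, then subtract it) can be illustrated with a textbook normalized LMS (NLMS) adaptive filter. The sketch below is not Conexant's MAEC: it is single-channel, it has no double-talk handling, and the function name, filter length, step size, echo path and synthetic signals are all assumptions chosen purely for illustration.

```python
import random


def nlms_echo_cancel(mic, ref, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `ref` from `mic` (NLMS).

    mic: microphone samples (voice plus echo of the playback signal)
    ref: playback samples sent to the loudspeaker
    Returns the residual, i.e. mic minus the estimated echo.
    """
    weights = [0.0] * taps   # running estimate of the loudspeaker-to-mic path
    history = [0.0] * taps   # most recent playback samples, newest first
    residual = []
    for d, x in zip(mic, ref):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate                      # what is left after subtraction
        energy = sum(h * h for h in history) + eps
        step = mu * e / energy                     # normalized step size
        weights = [w + step * h for w, h in zip(weights, history)]
        residual.append(e)
    return residual


def power(signal):
    """Mean squared value of a sequence of samples."""
    return sum(s * s for s in signal) / len(signal)


# Synthetic demonstration with made-up signals (assumptions, not measurements).
rng = random.Random(0)
n = 4000
playback = [rng.uniform(-1.0, 1.0) for _ in range(n)]      # stand-in for radio audio
voice = [0.05 * rng.uniform(-1.0, 1.0) for _ in range(n)]  # stand-in for the talker
echo_path = [0.6, 0.3, -0.2, 0.1]                          # hypothetical acoustic path

mic = []
for i in range(n):
    echo = sum(h * playback[i - k] for k, h in enumerate(echo_path) if i - k >= 0)
    mic.append(echo + voice[i])

residual = nlms_echo_cancel(mic, playback)

# Compare signal power before and after cancellation, once the filter has adapted.
tail = slice(n // 2, n)
power_before = power(mic[tail])
power_after = power(residual[tail])
print("residual/mic power ratio:", power_after / power_before)
```

In this toy setup the filter keeps adapting while the "voice" is present, which slightly disturbs the estimate. Practical systems typically slow or freeze adaptation during double-talk, and a multichannel canceller must also cope with several correlated playback channels.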

