
EETE FEBRUARY 2013

AUDIO & VIDEO ELECTRONICS

3D noise cancellation makes voice control significantly more reliable

By Lior Blanka

Voice quality is becoming a hot issue due to the recent rise of voice control interfaces for tablets, computers, Smart TVs and other consumer electronic devices. Without intelligible speech, automatic voice recognition can neither function properly nor be relied upon as a form of device control. The problem is compounded by noisy environments, which can degrade speech quality so severely that voice control becomes completely inoperable.

Traditional noise cancellation suffers from a trade-off between the degree of noise reduction and voice quality: the higher the noise reduction level, the greater the potential for voice distortion. In an attempt to minimise this trade-off, engineers have developed noise reduction algorithms, but these perform well mainly with stationary noise and poorly with non-stationary noise such as street noise and similar sounds.

Noise cancellation technology took a leap forward with the introduction of a second microphone in smart phones, enabling the two microphones to operate in a similar manner to the human auditory system. However, this capability does not provide sufficient noise cancellation to eliminate all background noise for voice calls or voice control while driving or riding on public transportation, or even at home when, for instance, music is turned up loud.

Adding a sensor for advanced noise cancellation

Advanced noise cancellation technology uses an additional sensor alongside the standard two audio microphones, and then applies a 3D-Vocal algorithm to perform multiple voice processing tasks, including echo and background noise cancellation, loudness equalisation and general voice enhancement. Removing background noise significantly improves the accuracy of automatic speech recognition (ASR) and of voice-call applications on smart phones, tablets and other mobile devices.
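The noise-reduction trade-off described above can be illustrated with classic power spectral subtraction, a widely used single-channel technique that is not necessarily the one the author has in mind. This minimal sketch assumes a fixed (stationary) noise estimate, which is why such methods suit stationary noise; the over-subtraction factor `alpha`, the spectral `floor` and the function name are illustrative assumptions.

```python
import math


def spectral_subtraction_gain(noisy_power, noise_power, alpha=1.0, floor=0.01):
    """Per-bin amplitude gains for power spectral subtraction.

    noisy_power: |Y(k)|^2 for each frequency bin of one noisy (S+N) frame.
    noise_power: estimated |N(k)|^2, assumed stationary (e.g. averaged over
                 speech-free frames); this is an assumption of the sketch.
    alpha:       over-subtraction factor; larger values suppress more noise
                 but also remove more speech energy (voice distortion).
    floor:       fraction of the noisy power always kept, which limits
                 "musical noise" artefacts.
    """
    gains = []
    for y, n in zip(noisy_power, noise_power):
        if y <= 0.0:
            gains.append(0.0)  # silent bin: nothing to scale
            continue
        clean = max(y - alpha * n, floor * y)  # estimated clean-speech power
        gains.append(math.sqrt(clean / y))     # amplitude gain for this bin
    return gains


# Bin 0 holds speech (power 4) plus noise (power 1); bin 1 holds noise only,
# but louder (1.5) than the stationary estimate of 1.0.
noisy = [5.0, 1.5]
noise_estimate = [1.0, 1.0]
for alpha in (1.0, 3.0):
    g = spectral_subtraction_gain(noisy, noise_estimate, alpha=alpha)
    output_power = [y * gi * gi for y, gi in zip(noisy, g)]
    print(f"alpha={alpha}: output power per bin = {output_power}")
```

With `alpha=1` the speech bin keeps its full power of 4 while the noise-only bin still leaks 0.5; with `alpha=3` the noise bin drops to the spectral floor, but the speech bin is cut to 2. That is the reduction-versus-distortion trade-off in miniature.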
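The two-microphone approach loosely mimics the way the ears exploit the small difference in arrival time of a sound at each ear. As a generic sketch, and not the algorithm used in any particular smart phone, the code below estimates that inter-microphone delay by cross-correlation and then uses it in a delay-and-sum beamformer that reinforces sound from the talker's direction. The signals, the 3-sample delay and the function names are hypothetical.

```python
import random


def estimate_delay(mic1, mic2, max_lag):
    """Return the lag (in samples) by which mic2 lags mic1, chosen to
    maximise the cross-correlation over -max_lag..max_lag."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, a in enumerate(mic1):
            j = i + lag
            if 0 <= j < len(mic2):
                score += a * mic2[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_and_sum(mic1, mic2, lag):
    """Align mic2 to mic1 by `lag` samples and average the two channels.
    Sound from the estimated direction adds coherently; sound arriving with
    a different delay is partly attenuated."""
    out = []
    for i, a in enumerate(mic1):
        j = i + lag
        b = mic2[j] if 0 <= j < len(mic2) else 0.0  # zero-pad at the edges
        out.append(0.5 * (a + b))
    return out


rng = random.Random(0)
source = [rng.uniform(-1.0, 1.0) for _ in range(400)]  # hypothetical talker signal
true_delay = 3
mic1 = source
mic2 = [0.0] * true_delay + source[:-true_delay]  # reaches mic2 3 samples later
lag = estimate_delay(mic1, mic2, max_lag=10)
beam = delay_and_sum(mic1, mic2, lag)
print("estimated delay:", lag)
```

Two microphones give only one spatial degree of freedom, which hints at why, as the article notes, this alone cannot remove all background noise in a car or on a train.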
An example of how advanced noise cancellation affects noisy speech is shown in figure 1. The upper waveform illustrates the noisy speech, which is the superposition of speech and ambient noise (S+N), while the lower waveform shows the clean speech signal after 3D voice processing. Figure 2 shows the corresponding spectrograms: the upper graph presents the spectrogram of the noisy speech S+N, and the lower graph shows the resulting speech signal after 3D voice processing.

Fig. 1: Typical 3D voice processing results on speech and ambient noise.

Fig. 2: Spectrogram of 3D processing on speech and ambient noise.

Using the expanded set of data provided by the sensor and the two microphones, the 3D-Vocal algorithm extracts features that characterise the speech source and distinguishes between the sound components that belong to the desired speech and those that belong to ambient noise. The block diagram in figure 3 shows the audio path for the advanced noise cancellation technique.

3D voice processing diagram components

The 3D-Vocal (Spectro-Temporal Analysis) block receives all the signals from the microphone array and from the VSensor, and performs special spectro-temporal processing on the combined information. Some correlated patterns in the 3D-Vocal data are associated with ambient noise, while others are identified as the user's voice. The 3D-Vocal spectro-temporal process separates the user's voice from the predicted ambient noise and produces reference information for the voice/noise Feature Extraction block.

The Feature Extraction block feeds voice/noise data to the other blocks. The extracted features contain real-time spectro-temporal information about the user's speech and the ambient noise. This information can be used to filter out ambient noise from the user's speech, to enhance echo cancellation performance, and more.

Ambient Noise Cancellation cancels various types of stationary and non-stationary, coherent and non-coherent ambient noise.
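Spectrograms like those in figure 2 are typically computed with a short-time Fourier transform (STFT): the signal is cut into overlapping windowed frames, and each frame's magnitude spectrum becomes one column of the image. The dependency-free sketch below uses a naive DFT for clarity (a real implementation would use an FFT); the frame length, hop size, sample rate and 1 kHz test tone are assumptions, not parameters from the article.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def stft_magnitude(x, frame_len=256, hop=128):
    """Return one list of |X(k)|, k = 0..frame_len//2, per frame of x."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = [x[start + i] * window[i] for i in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT of bin k; O(N^2) per frame, acceptable for a demo.
            acc = sum(v * cmath.exp(-2j * math.pi * k * i / frame_len)
                      for i, v in enumerate(seg))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


def to_db(frames, eps=1e-12):
    """Convert magnitudes to dB, as usually plotted in a spectrogram."""
    return [[20.0 * math.log10(m + eps) for m in frame] for frame in frames]


fs = 8000                        # assumed sample rate (Hz)
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(512)]
frames = stft_magnitude(tone, frame_len=64, hop=32)
bin_hz = fs / 64                 # 125 Hz per frequency bin
peak_bins = [max(range(len(f)), key=f.__getitem__) for f in frames]
print("frames:", len(frames), "peak at", peak_bins[0] * bin_hz, "Hz")
spectrogram_db = to_db(frames)   # rows = time frames, columns = frequency bins
```

Applied to the S+N signal, the same transform would show speech as time-varying harmonic structure on top of the noise energy; after noise cancellation, the background between those harmonics should be visibly darker.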
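The internals of the Feature Extraction and Ambient Noise Cancellation blocks are not described in the article. As a hedged, single-channel illustration of how a voice/noise feature can drive noise cancellation, the sketch below uses a hypothetical energy-ratio feature to flag each frame as voice or noise, updates a per-bin noise estimate only during noise frames (so it can follow slowly changing noise), and applies a Wiener-style gain. The function name, thresholds and smoothing constant are all assumptions.

```python
def feature_driven_suppression(power_frames, smoothing=0.9, vad_ratio=3.0,
                               gain_floor=0.1):
    """power_frames: list of per-frame power spectra |Y(k)|^2 (for example,
    squared STFT magnitudes). Assumes the first frame is noise only.
    Returns (voice_flags, gains), one entry per frame."""
    noise = list(power_frames[0])  # initial noise estimate
    voice_flags, gains = [], []
    for frame in power_frames:
        # Hypothetical feature: frame energy relative to the noise floor.
        ratio = sum(frame) / max(sum(noise), 1e-12)
        is_voice = ratio > vad_ratio
        if not is_voice:
            # Track the noise only where the feature says "noise", so that
            # speech energy does not leak into the noise estimate.
            noise = [smoothing * n + (1.0 - smoothing) * p
                     for n, p in zip(noise, frame)]
        frame_gains = []
        for p, n in zip(frame, noise):
            snr = max(p / max(n, 1e-12) - 1.0, 0.0)  # maximum-likelihood SNR estimate
            frame_gains.append(max(snr / (1.0 + snr), gain_floor))  # Wiener gain
        voice_flags.append(is_voice)
        gains.append(frame_gains)
    return voice_flags, gains


frames = [
    [1.0, 1.0, 1.0, 1.0],   # noise only
    [1.0, 1.0, 1.0, 1.0],   # noise only
    [1.0, 9.0, 9.0, 1.0],   # speech in bins 1 and 2
    [2.0, 2.0, 2.0, 2.0],   # noise got louder (non-stationary)
]
flags, gains = feature_driven_suppression(frames)
print(flags)
print([round(g, 3) for g in gains[2]])
```

Speech bins in the third frame pass with a gain near 0.89, while noise-only bins are held at the floor; in the last frame the noise estimate starts to rise towards the new level instead of staying fixed, which is the behaviour a stationary estimate lacks.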
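Voice/noise features can also help the echo canceller. A common, generic arrangement (assumed here, not taken from the article) is a normalised LMS (NLMS) adaptive filter that models the loudspeaker-to-microphone echo path and freezes adaptation whenever a voice-activity flag reports that the near-end user is talking, so that double-talk does not corrupt the echo model. The echo path, step size, tap count and function name are hypothetical.

```python
import random


def nlms_echo_cancel(far_end, mic, near_voice, taps=8, mu=0.5, eps=1e-6):
    """far_end:    loudspeaker reference signal x[n].
    mic:        microphone signal d[n] = echo + near-end sound.
    near_voice: per-sample flags from a voice-activity feature; adaptation
                is frozen while the near-end user talks (double-talk).
    Returns (e, w): the echo-cancelled signal e[n] = d[n] - y[n] and the
    final filter taps w (the estimated echo path)."""
    w = [0.0] * taps
    buf = [0.0] * taps  # most recent far-end samples, newest first
    out = []
    for x, d, talking in zip(far_end, mic, near_voice):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))  # echo estimate
        e = d - y
        if not talking:
            norm = eps + sum(b * b for b in buf)
            w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        out.append(e)
    return out, w


rng = random.Random(1)
echo_path = [0.6, 0.3, -0.1]    # hypothetical loudspeaker-to-microphone response
far = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [sum(h * far[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
       for n in range(len(far))]  # echo only, no near-end speech
residual, w = nlms_echo_cancel(far, mic, [False] * len(far))
print("estimated echo path:", [round(t, 3) for t in w[:3]])
```

Gating on the voice flag is the key design choice: without it, the filter would try to "cancel" the user's own speech during double-talk and drift away from the true echo path.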
Fig. 3: Block diagram for 3D voice processing.

Lior Blanka is Corporate Vice President and Chief Technology Officer of the DSP Group - www.dspg.com

16 Electronic Engineering Times Europe February 2013 www.electronics-eetimes.com

