11.1. Noise attenuation
When using speech technology in realistic environments, such as at home, in the office or in a car, other sounds will invariably be present in addition to the speech of the desired speaker. There will be the background hum of computers and air conditioning, cars honking, other speakers, and so on. Such sounds reduce the quality of the desired signal, making it more strenuous to listen to and more difficult to understand; in the worst case, they might render the speech signal unintelligible. A common feature of these sounds, however, is that they are independent of and uncorrelated with the desired signal [Benesty et al., 2008].
In practice, we can usually assume that such noises are additive, so that the observed signal \(y\) is the sum of the desired signal \(x\) and the interfering noise \(v\), that is, \(y=x+v\). To improve the quality of the observed signal, we would like to form an estimate \( \hat x \) of the desired signal \(x\). The estimate should approximate the desired signal, \( \hat x \approx x \), or equivalently, we would like to minimize the distance \( d\left(x,\hat x\right) \) for some distance measure \(d(\cdot,\cdot)\).
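As a rough illustration of the additive model, the sketch below builds a synthetic observation \(y=x+v\) and evaluates one possible distance measure. Everything in it is an assumption made for illustration: a sine tone stands in for the desired signal, white Gaussian noise stands in for the interference, and the mean squared error is picked as \(d(\cdot,\cdot)\); the text itself does not prescribe any of these choices.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the sketch is repeatable

fs = 16000                       # assumed sample rate in Hz
t = np.arange(fs) / fs           # one second of sample times

# Stand-in for the desired signal x: a 220 Hz tone (not real speech)
x = 0.5 * np.sin(2 * np.pi * 220 * t)
# Stand-in for the interfering noise v: white Gaussian noise
v = 0.1 * rng.standard_normal(len(t))

# Additive model: the observation is the sum of signal and noise
y = x + v


def mse(a, b):
    """Mean squared error, one possible choice for the distance d(a, b)."""
    return np.mean((a - b) ** 2)


# Trivial estimate: take the noisy observation as-is
x_hat = y
distance = mse(x, x_hat)         # for this estimate, equals the noise power
noise_power = np.mean(v ** 2)

# Signal-to-noise ratio of the observation in decibels
snr_db = 10 * np.log10(np.mean(x ** 2) / noise_power)

# Sample correlation between x and v; close to zero for independent signals
corr = np.corrcoef(x, v)[0, 1]

print(f"d(x, x_hat) = {distance:.5f}")
print(f"noise power = {noise_power:.5f}")
print(f"SNR         = {snr_db:.2f} dB")
print(f"corr(x, v)  = {corr:.4f}")
```

With the trivial estimate \( \hat x = y \), the distance is simply the noise power, so any useful enhancement method should yield a smaller \( d\left(x,\hat x\right) \) than this baseline.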
# Initialization for all
from scipy.io import wavfile
import numpy as np
import matplotlib.pyplot as plt
import IPython.display as ipd
from helper_functions import stft, istft, halfsinewindow

fs = 44100  # Sample rate (overwritten below by the rate stored in the file)
seconds = 5  # Duration of recording
window_length_ms = 30
window_step_ms = 15
window_length = int(window_length_ms*fs/2000)*2  # rounded to an even number of samples
window_step_samples = int(window_step_ms*fs/1000)
windowing_function = halfsinewindow(window_length)

filename = 'sounds/enhancement_test.wav'

# read from storage
fs, data = wavfile.read(filename)
data = data[:]
ipd.display(ipd.Audio(data, rate=fs))

# Upper panel: time-domain waveform
plt.figure(figsize=[12, 6])
plt.subplot(211)
t = np.arange(0, len(data), 1)/fs
plt.plot(t, data)
plt.xlabel('Time (s)')
plt.ylabel('Amplitude')
plt.title('Waveform of noisy audio')
plt.axis([0, len(data)/fs, 1.05*np.min(data), 1.05*np.max(data)])

spectrogram_matrix = stft(data,
                          fs,
                          window_length_ms=window_length_ms,
                          window_step_ms=window_step_ms,
                          windowing_function=windowing_function)
# The matrix is assumed to have shape (window_count, fft_length),
# consistent with the indexing and transpose used below
fft_length = spectrogram_matrix.shape[1]
window_count = spectrogram_matrix.shape[0]
length_in_s = window_count*window_step_ms/1000

# Lower panel: magnitude spectrogram in dB, limited to 0-8 kHz
plt.subplot(212)
plt.imshow(20*np.log10(np.abs(spectrogram_matrix[:, range(fft_length)].T)),
           origin='lower', aspect='auto',
           extent=[0, length_in_s, 0, fs/2000])
plt.axis([0, length_in_s, 0, 8])
plt.xlabel('Time (s)')
plt.ylabel('Frequency (kHz)')
plt.title('Spectrogram of noisy audio')
plt.tight_layout()
plt.show()