11.8.3. Time-Delay of Arrival (TDoA) and Direction of Arrival (DoA) Estimation#

In multi-channel speech enhancement, a recurring task is to estimate the time-delay between channels or, equivalently, the angle at which a wavefront arrives at an array of microphones. Knowing the time-delay or angle of arrival, we can use beamforming to isolate sounds from that particular direction. A frequently used method for time-delay estimation is the generalized cross-correlation (GCC) method, and especially its variant weighted with the phase transform (PHAT), known as GCC-PHAT [Azaria and Hertz, 1984, Knapp and Carter, 1976, Kwon et al., 2010].

Generalized cross-correlation#

The cross-spectrum of two spectra \(X_{1,k,t}\) and \(X_{2,k,t}\) is

\[ C_{k,t} = X_{1,k,t}^* X_{2,k,t}, \]

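where \(k\) and \(t\) are the frequency and time indices. For a single frame, we drop the time index \(t\) and model the spectra as \(X_{h,k}=a_{h,k} e^{-i\frac{2\pi kn_h}N}\), where \(a_{h,k}\) is the amplitude (taken here as real and positive), \(n_h\) is the delay of channel \(h\) in samples and \(N\) is the length of the analysis window. We thus have

\[ C_k = a_{1,k} e^{i\frac{2\pi kn_1}N} a_{2,k} e^{-i\frac{2\pi kn_2}N} . \]

If the time-difference between channels is \(\tau=n_2-n_1\), then

\[ C_k = a_{1,k}a_{2,k} e^{i\frac{2\pi kn_1}N-i\frac{2\pi k(\tau+n_1)}N} = a_{1,k}a_{2,k} e^{-i\frac{2\pi k\tau}N} . \]

The phase of the cross-spectrum thus depends only on the delay \(\tau\), while its magnitude depends on the signal amplitudes. To remove the influence of the magnitude, the cross-spectrum can be weighted in a variety of ways. With the phase transform (PHAT), each component is normalized to unit magnitude,

\[ C_k' = \frac{X_{1,k}^* X_{2,k}}{|X_{1,k}^* X_{2,k}|} = e^{-i\frac{2\pi k\tau}N} \]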

to obtain the generalized cross-spectrum. The generalized cross-correlation is the inverse Fourier transform of the generalized cross-spectrum

\[ r_k' = {\mathcal F}^{-1}\{C_k'\} = \delta_{k-\tau}, \]

where \(k\) now denotes the time lag in samples and \(\delta_k\) is the Kronecker delta, which is one at \(k=0\) and zero elsewhere. In other words, the generalized cross-correlation has a single peak whose position indicates the time-delay \(\tau\) between the two channels.
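
The procedure above can be sketched on synthetic data as follows. This is a minimal illustration rather than a reference implementation: the function name `gcc_phat_delay` and its `max_delay` parameter are hypothetical, the source is white noise, and the whole signal is treated as a single frame. It assumes NumPy's FFT sign convention, under which a channel 2 that lags channel 1 by \(\tau\) samples produces a peak at lag \(+\tau\).

```python
import numpy as np

def gcc_phat_delay(x1, x2, max_delay):
    """Estimate the delay (in samples) of x2 relative to x1 with GCC-PHAT."""
    n_fft = len(x1) + len(x2)            # zero-padding avoids circular wrap-around
    X1 = np.fft.rfft(x1, n=n_fft)
    X2 = np.fft.rfft(x2, n=n_fft)
    cross_spectrum = np.conj(X1) * X2
    # PHAT weighting: keep only the phase; the small constant avoids division by zero
    weighted = cross_spectrum / (np.abs(cross_spectrum) + 1e-12)
    gcc = np.fft.irfft(weighted, n=n_fft)
    # Reorder so that the lags run from -max_delay to +max_delay
    gcc = np.concatenate((gcc[-max_delay:], gcc[:max_delay + 1]))
    return int(np.argmax(gcc)) - max_delay

# Synthetic test signals: channel 2 is channel 1 delayed by true_delay samples
rng = np.random.default_rng(0)
source = rng.standard_normal(4000)
true_delay = 20
channel1 = np.concatenate((source, np.zeros(true_delay)))
channel2 = np.concatenate((np.zeros(true_delay), source))

# Independent sensor noise on each channel
channel1 = channel1 + 0.1 * rng.standard_normal(len(channel1))
channel2 = channel2 + 0.1 * rng.standard_normal(len(channel2))

estimated_delay = gcc_phat_delay(channel1, channel2, max_delay=50)
print(estimated_delay)
```

Restricting the search to lags within `max_delay` reflects that the physically possible delay is bounded by the microphone spacing; a negative result would indicate that channel 1 lags channel 2.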

import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
import scipy.fft
import IPython.display as ipd
from helper_functions import stft

# read from storage
filename = 'sounds/temp.wav'
fs, data = wavfile.read(filename)

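# Simulate two channels: channel 2 is channel 1 delayed by delay_samples
delay_samples = 20
delay_zeros = np.zeros(delay_samples)

data1 = np.concatenate((data,delay_zeros))
data2 = np.concatenate((delay_zeros,data))

# Add independent white noise to each channel, 10 dB below the signal level
noise_gain_dB = -10
noise_gain = np.std(data)*10**(noise_gain_dB/20)

observation1 = data1 + noise_gain*np.random.randn(len(data1))
observation2 = data2 + noise_gain*np.random.randn(len(data2))
X1 = stft(observation1,fs)
X2 = stft(observation2,fs)

# Average the cross-spectrum over axis 0 (assumed to be the frame axis of the
# helper stft output), apply PHAT weighting and transform back to the lag domain
crossspectrum = np.mean(np.conj(X1)*X2,axis=0)
crosscorrelation = scipy.fft.irfft(crossspectrum/np.abs(crossspectrum))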
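
# Plot both channels around the largest sample of channel 1
ix = np.argmax(observation1)
plt.plot(range(ix-100,ix+100),observation1[(ix-100):(ix+100)],label='Channel 1')
plt.plot(range(ix-100,ix+100),observation2[(ix-100):(ix+100)],label='Channel 2')
plt.xlabel('Time (samples)')
plt.title('Input signals')
plt.legend()
plt.show()

# Plot the first lags of the generalized cross-correlation with the true delay
# (the 100-lag range is an illustrative choice)
plt.plot(crosscorrelation[:100],label='Generalized cross-correlation')
plt.plot([delay_samples,delay_samples],[-.1,.6],'r--',label='True delay location')
plt.xlabel('Sample $k$')
plt.ylabel("Generalized cross-correlation $r_k'$")
plt.legend()
plt.show()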
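
Once the delay is known, it can be mapped to an angle of arrival. The following is a minimal sketch under assumptions that are not part of the example above: a far-field (plane-wave) source, a pair of microphones at a known spacing, and a speed of sound of 343 m/s. The function name `delay_to_angle` and its parameters are hypothetical. Under these assumptions the delay \(\tau\) in seconds satisfies \(\sin\theta = c\tau/d\), where \(d\) is the microphone spacing, \(c\) the speed of sound and \(\theta\) the angle measured from the broadside direction.

```python
import numpy as np

def delay_to_angle(delay_samples, fs, mic_distance, speed_of_sound=343.0):
    """Far-field angle of arrival in radians, measured from broadside, for two microphones."""
    delay_seconds = delay_samples / fs
    # sin(theta) = c * tau / d; clipping guards against estimates slightly outside [-1, 1]
    sin_theta = np.clip(speed_of_sound * delay_seconds / mic_distance, -1.0, 1.0)
    return np.arcsin(sin_theta)

# Illustrative values: a 4-sample delay at 16 kHz with microphones 20 cm apart
angle_deg = np.degrees(delay_to_angle(delay_samples=4, fs=16000, mic_distance=0.2))
print(angle_deg)
```

Since the delay is estimated in whole samples, the angular resolution of this mapping is limited by the sampling rate and the microphone spacing.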

References#


Mordechai Azaria and David Hertz. Time delay estimation by generalized cross correlation methods. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2):280–285, 1984. doi:10.1109/TASSP.1984.1164314.


C. Knapp and G. Carter. The generalized correlation method for estimation of time delay. IEEE Transactions on Acoustics, Speech, and Signal Processing, 24(4):320–327, 1976. doi:10.1109/TASSP.1976.1162830.


Byoungho Kwon, Youngjin Park, and Youn-sik Park. Analysis of the GCC-PHAT technique for multiple sources. In ICCAS 2010, 2070–2073. 2010. doi:10.1109/ICCAS.2010.5670137.