# 5.4. Vector quantization (VQ)

Suppose you have recorded sounds at different locations and want to categorize them into groups of similar sounds. In other words, you have a stochastic vector $$x$$ which you want to characterize with a simple description. For example, the categories could correspond to office, street, hallway and cafeteria. A classic approach to this task is to choose template vectors $$c_{k}$$, each of which represents a typical sound in environment $$k$$. To categorize a sound, you then find the template vector which is closest to your recording $$x$$. In mathematical notation, you search for the index $$k^*$$ by

$k^* = \arg\min_k \|x-c_k\|^2.$

The above expression thus calculates the squared error between $$x$$ and each of the vectors $$c_{k}$$ and chooses the index $$k$$ of the vector with the smallest error. Together, the vectors $$c_{k}$$ form a codebook, and the vector $$x$$ is quantized to $$c_{k^*}$$. This is the basic idea behind vector quantization, which is closely related to k-means clustering.
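The search for $$k^*$$ can be sketched in a few lines of plain Python. This is a minimal illustration, not the original course code: the helper names `squared_error` and `quantize` and the four-vector 2-D codebook are hypothetical, and lists are used instead of NumPy only to keep the example dependency-free.

```python
def squared_error(x, c):
    """Squared Euclidean distance ||x - c||^2 between two equal-length vectors."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def quantize(x, codebook):
    """Return (k_star, c_k_star): index and value of the codebook vector closest to x."""
    # arg min over k of ||x - c_k||^2 (ties go to the smaller index)
    k_star = min(range(len(codebook)), key=lambda k: squared_error(x, codebook[k]))
    return k_star, codebook[k_star]


# Hypothetical 2-D codebook with four template vectors
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k_star, c = quantize([0.9, 0.2], codebook)
print(k_star, c)  # → 1 [1.0, 0.0]
```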

An illustration of a simple vector codebook is shown on the right. The input data are samples from a Gaussian distribution, shown as grey dots, and the codebook vectors $$c_{k}$$ are shown as red circles. For each input vector, we search for the nearest codebook vector; the borders of the regions where input vectors are assigned to a particular codebook vector are illustrated with blue lines. These regions are known as Voronoi regions, and the blue lines are the decision boundaries between codebook vectors.

Example of a codebook for a 2D Gaussian with 16 code vectors.
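Assigning every input vector to its Voronoi region is simply the nearest-codevector search applied to each sample. The sketch below draws samples from a 2-D standard Gaussian and partitions them with a 16-vector codebook. As an assumption for illustration, the codebook is a fixed 4x4 grid rather than the optimized codebook of the figure; for such a grid, the decision boundaries lie halfway between neighboring codevectors (at -1, 0 and 1 along each axis). The names `nearest` and `voronoi_labels` are hypothetical.

```python
import random


def nearest(x, codebook):
    """Index of the codebook vector with the smallest squared error to x."""
    return min(range(len(codebook)),
               key=lambda k: sum((xi - ci) ** 2 for xi, ci in zip(x, codebook[k])))


def voronoi_labels(data, codebook):
    """Label each input vector with the Voronoi region (codebook index) it falls into."""
    return [nearest(x, codebook) for x in data]


random.seed(0)
# Input data: 1000 samples from a 2-D standard Gaussian (the grey dots)
data = [[random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)] for _ in range(1000)]

# Hypothetical 16-vector codebook on a 4x4 grid (not optimized)
grid = [-1.5, -0.5, 0.5, 1.5]
codebook = [[a, b] for a in grid for b in grid]

labels = voronoi_labels(data, codebook)
counts = [labels.count(k) for k in range(len(codebook))]
print(counts)  # number of input vectors in each Voronoi region
```

Regions near the center of the Gaussian collect many more samples than those at the corners, which is one reason an optimized codebook places its vectors more densely where the data is dense.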

## 5.4.1. Metric for codebook quality

Suppose then that you have a large collection of vectors $$x_{h}$$ and want to find out how well the codebook represents the input data. The expectation of the squared error can be approximated by the mean over your data, such that

$E_h\left[ \min_k \|x_h-c_k\|^2 \right] \approx \frac 1N \sum_{h=1}^N \min_k \|x_h-c_k\|^2,$

where $$E_h[\cdot]$$ is the expectation operator and $$N$$ is the number of input vectors $$x_{h}$$. Above, we thus find the codebook vector which is closest to $$x_{h}$$, compute its squared error and take the expectation over all possible inputs. This expectation is approximately equal to the mean of those squared errors over a set of input vectors.
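The right-hand side of the approximation above translates directly into code: for each input vector take the squared error to its nearest codevector, then average over the $$N$$ inputs. A minimal sketch with a small hypothetical dataset and codebook; `codebook_mse` is an illustrative name.

```python
def squared_error(x, c):
    """Squared Euclidean distance ||x - c||^2."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def codebook_mse(data, codebook):
    """Sample estimate of E_h[min_k ||x_h - c_k||^2]: the mean over the N input vectors."""
    total = 0.0
    for x in data:
        # squared error to the nearest codebook vector
        total += min(squared_error(x, c) for c in codebook)
    return total / len(data)


# Hypothetical data and codebook
data = [[0.0, 0.0], [0.9, 0.1], [2.0, 2.0]]
codebook = [[0.0, 0.0], [1.0, 0.0], [2.0, 2.0]]
print(codebook_mse(data, codebook))  # small: every x_h is close to some c_k
print(codebook_mse(data, [[0.0, 0.0]]))  # larger: a single codevector fits poorly
```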

To find the best set of codebook vectors $$c_{k}$$, we then need to minimize the mean squared error as

$\{c_k^*\} := \arg\min_{\{c_k\}}\, E_h\left[ \min_k \|x_h-c_k\|^2 \right]$

or more specifically, for a dataset as

$\{c_k^*\} := \arg\min_{\{c_k\}} \sum_{h=1}^N \min_k \|x_h-c_k\|^2.$

Unfortunately, there is no analytic solution to this optimization problem, so we have to use numerical, iterative methods.

## 5.4.2. Codebook optimization

### 5.4.2.1. Expectation maximization (EM)

Classical methods for finding the best codebook are variants of expectation maximization (EM), which is based on two alternating steps:

Expectation maximization (EM) algorithm:

1. For every vector $$x_{h}$$ in a large database, find the best codebook vector $$c_{k}$$.
2. For every codebook vector $$c_{k}$$:
   1. Find all vectors $$x_{h}$$ assigned to that codevector.
   2. Calculate the mean of those vectors.
   3. Assign the mean as the new value of the codevector.
3. If converged, stop; otherwise go to step 1.

This algorithm is guaranteed to give, at every step, a codebook which is not worse than the previous codebook. That is, the codebook improves (or stays the same) at each iteration until the algorithm reaches a local minimum, where it stops changing. The reason is that each step in the iteration finds a partial best solution. In the first step, we find the best matching codebook vector for each data vector $$x_{h}$$, given the current codebook. In the second step, we find the within-category mean, given the current assignments. Since the mean is the vector which minimizes the sum of squared errors to the vectors in its category, the new mean is at least as accurate as the previous codevector, in that it does not increase the average squared error. If the mean is equal to the previous codevector, then there is no improvement.
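The three steps of the EM algorithm above amount to the classic k-means (Lloyd) iteration. The following is a minimal sketch in plain Python, not the original sample code: the function names are hypothetical, and initializing the 8 codevectors from randomly chosen data points is an assumption, since the text does not specify an initialization. The sketch records the mean squared error after every iteration so that the non-increasing behavior described above can be checked.

```python
import random


def squared_error(x, c):
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def nearest(x, codebook):
    return min(range(len(codebook)), key=lambda k: squared_error(x, codebook[k]))


def mse(data, codebook):
    """Mean squared error between each x_h and its nearest codevector."""
    return sum(squared_error(x, codebook[nearest(x, codebook)]) for x in data) / len(data)


def em_codebook(data, codebook, max_iter=100):
    """Alternate assignment (step 1) and mean update (step 2) until nothing changes."""
    codebook = [list(c) for c in codebook]
    history = [mse(data, codebook)]
    for _ in range(max_iter):
        # Step 1: assign every x_h to its nearest codebook vector
        labels = [nearest(x, codebook) for x in data]
        # Step 2: replace each codevector by the mean of the vectors assigned to it
        new_codebook = []
        for k, c in enumerate(codebook):
            members = [x for x, lab in zip(data, labels) if lab == k]
            if members:
                new_codebook.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_codebook.append(c)  # empty region: keep the old codevector
        history.append(mse(data, new_codebook))
        # Step 3: converged when the codebook no longer changes
        if new_codebook == codebook:
            break
        codebook = new_codebook
    return codebook, history


random.seed(1)
data = [[random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)] for _ in range(500)]
initial = random.sample(data, 8)  # assumption: 8 codevectors taken from random data points
codebook, history = em_codebook(data, initial)
print(history[0], history[-1])  # mean squared error before and after optimization
```

Running it with different seeds typically ends in different final errors, which illustrates the local-minimum problem discussed next.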


As noted above, this algorithm is the basis of most vector quantization codebook optimization algorithms. There are multiple reasons why this simple algorithm is usually not sufficient on its own. Most importantly, it is slow to converge to a stable solution, and it often finds a local minimum instead of the global minimum.

To improve performance, we can apply several heuristic approaches. For example, we can start with a small codebook $$\{ c_k \}_{k=1}^K$$ of $$K$$ elements and optimize it with the EM algorithm. We then split each codebook vector into two, offset by a small delta $$d$$ such that $$\|d\|<\epsilon$$, which gives the new codebook $$\{ \hat c_k \}_{k=1}^{2K} := \{ c_k,\, c_k+d \}_{k=1}^K$$ of $$2K$$ elements. We then rerun the EM algorithm on the new codebook. The codebook thus doubles in size at every iteration, and we continue until we reach the desired codebook size.
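This splitting procedure is commonly known as the Linde-Buzo-Gray (LBG) algorithm. The sketch below grows a codebook from one vector to eight. It is an illustration under stated assumptions, not the original sample code: the starting codebook is the single mean of all data (which is already the EM optimum for $$K=1$$), the offset $$d$$ has every component equal to a hypothetical `delta`, and the function names are illustrative. Because the size doubles, a desired size that is not $$K$$ times a power of two is overshot.

```python
import random


def squared_error(x, c):
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def nearest(x, codebook):
    return min(range(len(codebook)), key=lambda k: squared_error(x, codebook[k]))


def mse(data, codebook):
    return sum(squared_error(x, codebook[nearest(x, codebook)]) for x in data) / len(data)


def em_codebook(data, codebook, max_iter=100):
    """EM iterations: nearest-codevector assignment followed by mean update."""
    for _ in range(max_iter):
        labels = [nearest(x, codebook) for x in data]
        new = []
        for k, c in enumerate(codebook):
            members = [x for x, lab in zip(data, labels) if lab == k]
            # keep the old codevector if no input vector was assigned to it
            new.append([sum(col) / len(members) for col in zip(*members)] if members else c)
        if new == codebook:
            break
        codebook = new
    return codebook


def split_codebook(codebook, delta=1e-3):
    """Return {c_k, c_k + d}: each codevector plus a copy offset by d."""
    d = [delta] * len(codebook[0])  # hypothetical offset; ||d|| = delta * sqrt(dim)
    return [list(c) for c in codebook] + [[ci + di for ci, di in zip(c, d)] for c in codebook]


def lbg(data, desired_size, delta=1e-3, max_iter=100):
    # Start from one codevector: the mean of all data
    codebook = [[sum(col) / len(data) for col in zip(*data)]]
    while len(codebook) < desired_size:
        codebook = split_codebook(codebook, delta)   # K -> 2K
        codebook = em_codebook(data, codebook, max_iter)  # re-optimize with EM
    return codebook


random.seed(2)
data = [[random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)] for _ in range(500)]
codebook = lbg(data, 8)
print(len(codebook), mse(data, codebook))
```

Each split starts EM from a codebook that is already good for half the size, which tends to avoid the poor local minima that a random initialization can run into.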
