Introduction to Speech Processing
1. Preface
2. Introduction
2.1. Why speech processing?
2.2. Physiological speech production
Speech perception (Wikipedia)
2.5. Linguistic structure of speech
Speech-language pathology (Wikipedia)
2.6. Applications and systems structures
Social and cognitive processes in human communication (external)
3. Basic Representations
3.1. Waveform
3.2. Windowing
3.3. Signal energy, loudness and decibel
3.4. Spectrogram and the STFT
3.5. Autocorrelation and autocovariance
3.6. Cepstrum and MFCC
3.7. Linear prediction
3.8. Fundamental frequency (F0)
3.9. Zero-crossing rate
3.10. Deltas and Delta-deltas
3.11. Pitch-Synchronous Overlap-Add (PSOLA)
3.12. Jitter and shimmer
Crest factor (Wikipedia): https://en.wikipedia.org/wiki/Crest_factor
4. Pre-processing
4.1. Pre-emphasis
Noise gate (Wikipedia)
Dynamic Range Compression (Wikipedia)
5. Modelling tools in speech processing
5.1. Source modelling and perceptual modelling
5.2. Linear regression
5.3. Sub-space models
5.4. Vector quantization (VQ)
5.5. Gaussian mixture model (GMM)
5.6. Neural networks
5.7. Non-negative Matrix and Tensor Factorization
6. Evaluation of speech processing methods
6.1. Subjective quality evaluation
6.2. Objective quality evaluation
6.3. Other performance measures
6.4. Analysis of evaluation results
7. Speech analysis
7.1. Fundamental frequency estimation
7.2. Inverse filtering for glottal activity estimation
Voice and speech analysis (Wikipedia)
7.3. Measurements for medical applications
Electroglottography (Wikipedia)
Videokymography (Wikipedia)
7.3.1. Glottal inverse filtering
7.4. Forensic analysis
8. Recognition tasks in speech processing
8.1. Voice activity detection (VAD)
8.2. Wake-word and keyword spotting
8.3. Speech Recognition
8.4. Speaker Recognition and Verification
8.5. Speaker Diarization
8.6. Paralinguistic speech processing
9. Speech Synthesis
9.2. Concatenative speech synthesis
9.3. Statistical parametric speech synthesis
10. Transmission, storage and telecommunication
10.1. Design goals
10.2. Basic tools
10.3. Modified discrete cosine transform (MDCT)
10.3.4. Entropy coding
10.3.5. Perceptual modelling in speech and audio coding
10.4. Code-excited linear prediction (CELP)
10.5. Frequency-domain coding
11. Speech enhancement
11.1. Noise attenuation
11.2. Echo cancellation
11.3. Bandwidth extension (BWE)
11.4. Multi-channel speech enhancement and beamforming
Chatbots / Conversational design (external link)
12. Computational models of human language processing
13. Security and privacy in speech technology
7.3.1. Glottal inverse filtering