Introduction to Speech Processing
1. Preface
1.4. Using this book
1.6. List of authors
1.7. Instructions for Developers
1.8. Contributing
1.9. Contributor Covenant Code of Conduct
2. Introduction
2.1. Why speech processing?
2.2. Speech production and acoustic properties
Speech perception (Wikipedia)
2.3. Linguistic structure of speech
Speech-language pathology (Wikipedia)
2.4. Applications and systems structures
Social and cognitive processes in human communication (external)
3. Basic Representations
3.1. Waveform
3.2. Windowing
3.3. Spectrogram and the STFT
3.4. Short-time analysis of speech and audio signals
3.5. Short-time processing of speech signals
3.6. Signal energy, loudness and decibel
3.7. Autocorrelation and autocovariance
3.8. The cepstrum, mel-cepstrum and mel-frequency cepstral coefficients (MFCCs)
3.9. Linear prediction
3.10. Fundamental frequency (F0)
3.11. Zero-crossing rate
3.12. Deltas and Delta-deltas
3.13. Pitch-Synchronous Overlap-Add (PSOLA)
3.14. Jitter and shimmer
4. Pre-processing
4.1. Pre-emphasis
Noise gate (Wikipedia)
Dynamic Range Compression (Wikipedia)
5. Modelling tools in speech processing
5.1. Source modelling and perceptual modelling
5.2. Linear regression
5.3. Sub-space models
5.4. Vector quantization (VQ)
5.5. Gaussian mixture model (GMM)
5.6. Neural networks
5.7. Non-negative Matrix and Tensor Factorization
5.8. Vocoder
5.9. The Griffin-Lim algorithm: Signal estimation from modified short-time Fourier transform
6. Evaluation of speech processing methods
6.1. Subjective quality evaluation
6.2. Objective quality evaluation
6.3. Other performance measures
6.4. Analysis of evaluation results
7. Speech analysis
Voice and speech analysis (Wikipedia)
7.1. Measurements for medical analysis of speech
Electroglottography (Wikipedia)
Videokymography (Wikipedia)
7.2. Forensic Speaker Recognition
8. Recognition tasks in speech processing
8.1. Voice Activity Detection (VAD)
8.2. Wake-word and keyword spotting
8.3. Speech Recognition
8.4. Speaker Recognition and Verification
8.5. Speaker Diarization
8.6. Paralinguistic speech processing
9. Speech Synthesis
9.1. Concatenative speech synthesis
9.2. Statistical parametric speech synthesis
10. Transmission, storage and telecommunication
10.1. Design goals
10.2. Code-excited linear prediction (CELP)
10.3. Frequency-domain coding
10.4. Modified discrete cosine transform (MDCT)
10.5. Entropy coding
10.6. Perceptual modelling in speech and audio coding
11. Speech enhancement
11.1. Noise attenuation
11.2. Echo cancellation
11.3. Bandwidth extension (BWE)
11.8. Multi-channel speech enhancement and beamforming
11.8.3. Time-Delay of Arrival (TDoA) and Direction of Arrival (DoA) Estimation
12. Self-supervised learning
13. Computational models of human language processing
14. Research and Development
14.1. Design of Experiments and Projects in Speech Technology
14.2. Speech data and experiment design
15. Security and privacy in speech technology
16. Glossary
17. References