Introduction to Speech Processing

Index

B | F | O | P | S | T

B

  • Backchannel

F

  • Formants
  • Fundamental frequency (F_0)

O

  • Objective test

P

  • Perceptual model
  • Phonation
  • Phone
  • Phoneme

S

  • Sampling rate
  • Subjective test

T

  • Turn-taking

By Tom Bäckström, Okko Räsänen, Abraham Zewoudie, Pablo Pérez Zarazaga, Liisa Koivusalo, Sneha Das, Esteban Gómez Mellado, Mariem Bouafif Mansali, Daniel Ramos, Mohammad Hassan Vali

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.