7. Speech analysis

  • Fundamental frequency estimation
  • Formant estimation and tracking
  • Measurements for medical applications
  • Inverse filtering for glottal activity estimation
  • Forensic speaker recognition

By Tom Bäckström, Okko Räsänen, Abraham Zewoudie, Pablo Pérez Zarazaga, Liisa Koivusalo, Sneha Das, Esteban Gómez Mellado, Mariem Bouafif Mansali, Daniel Ramos

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.