- Backchannel
Speech is an interaction in which the parties are in constant communication. Although speakers generally take turns, so that only one speaker ‘has the turn’ and speaks at a time, the listening participants also take an active part in the interaction. They nod or shake their heads in agreement or disagreement, or use interjections such as ‘Uh-oh’, ‘Yeah’, and ‘Huh?’. These listener responses are known as backchannels. See also Backchannel (linguistics) / Wikipedia.
- Formants
The vocal tract has acoustic resonances, which emphasise some frequency ranges while attenuating others. Such high-energy regions of the spectrum are known as formants. They are important in speech because their locations in frequency (and their amplitudes) are the main acoustic cues that distinguish vowels from each other. By changing the shape of the vocal tract, we change the locations of these resonances, and hence of the formants, and thereby choose which vowel we utter. See also Acoustic properties of speech signals and Formant / Wikipedia.
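To make "emphasise some frequency ranges while attenuating others" concrete, the sketch below uses a common simplified model, which is an assumption here and not part of the definition above: a single formant is represented as a two-pole digital resonator whose pole angle is set by the formant frequency. The helper names `resonator_gain` and `peak_frequency`, the pole radius 0.97, and the 8 kHz sampling rate are hypothetical, illustrative choices.

```python
import cmath
import math

def resonator_gain(freq_hz, formant_hz, pole_radius, fs_hz):
    """Magnitude |H| at freq_hz of the two-pole resonator
    H(z) = 1 / (1 - 2 r cos(theta) z^-1 + r^2 z^-2), theta = 2*pi*formant/fs."""
    theta = 2 * math.pi * formant_hz / fs_hz
    z_inv = cmath.exp(-1j * 2 * math.pi * freq_hz / fs_hz)  # z^-1 on the unit circle
    denom = 1 - 2 * pole_radius * math.cos(theta) * z_inv + pole_radius**2 * z_inv**2
    return 1 / abs(denom)

def peak_frequency(formant_hz, pole_radius=0.97, fs_hz=8000):
    """Frequency (on a 1 Hz grid from 0 to fs/2) where the resonator gain is largest."""
    freqs = range(0, fs_hz // 2 + 1)
    return max(freqs, key=lambda f: resonator_gain(f, formant_hz, pole_radius, fs_hz))

# Moving the resonance, as a change of vocal-tract shape would, moves the spectral peak.
for formant in (300, 700):
    print(formant, peak_frequency(formant))
```

Real vocal tracts have several formants at once; in this kind of model they are usually approximated by cascading one such resonator per formant.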
- Fundamental frequency (\(F_0\))
The vocal folds can oscillate when air flows through them and when they are appropriately tensioned. The frequency of such oscillation is known as the fundamental frequency, often abbreviated as \(F_0\), and it is perceived as the pitch of a speech sound. See also Acoustic properties of speech signals and Voice frequency / Wikipedia.
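One common way to estimate \(F_0\) from a short voiced frame (not necessarily the method used elsewhere in this material) is to find the lag at which the frame's autocorrelation is largest. That lag approximates the period in samples, so \(F_0 \approx f_s / \text{lag}\). The function name `estimate_f0` and the 60 to 400 Hz search range are illustrative assumptions, and the sketch assumes the frame really is voiced (periodic).

```python
import math

def estimate_f0(frame, fs, f0_min=60.0, f0_max=400.0):
    """Estimate F0 (Hz) of a voiced frame from its autocorrelation peak.

    Searches lags (candidate periods, in samples) between fs/f0_max and
    fs/f0_min and returns fs divided by the lag with the largest
    autocorrelation.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]  # remove any DC offset
    lag_min = int(fs / f0_max)  # shortest candidate period
    lag_max = min(int(fs / f0_min), n - 1)  # longest candidate period
    if lag_max < lag_min:
        raise ValueError("frame is too short for the requested F0 range")
    best_lag, best_r = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Autocorrelation at this lag, not normalized by the overlap length,
        # which slightly favours shorter lags and helps avoid octave-down errors
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag

fs = 8000  # sampling rate in Hz
frame = [math.sin(2 * math.pi * 125 * n / fs) for n in range(800)]  # 100 ms of a 125 Hz tone
print(estimate_f0(frame, fs))
```

Because the estimate is quantized to whole-sample lags, its resolution is coarse at low sampling rates; practical pitch trackers add refinements such as interpolation around the peak and voicing decisions.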
- Objective test
An evaluation methodology in which a computational algorithm, rather than a human subject, rates a characteristic of a system or signal. See also Objective quality evaluation.
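As an illustration of an objective test, the signal-to-noise ratio (SNR) is one of the simplest measures: an algorithm compares a degraded signal with a clean reference and outputs a number in decibels, with no listener involved. SNR is an example chosen here, not a measure prescribed by this glossary; practical objective speech-quality measures are usually more elaborate and often build on a perceptual model. The function name `snr_db` is hypothetical.

```python
import math

def snr_db(reference, degraded):
    """Signal-to-noise ratio in dB of `degraded` relative to a clean `reference`.

    The "noise" is taken to be the sample-wise difference between the two
    signals, so they must be time-aligned and of equal length.
    """
    if len(reference) != len(degraded):
        raise ValueError("signals must have the same length")
    signal_energy = sum(r * r for r in reference)
    noise_energy = sum((r - d) ** 2 for r, d in zip(reference, degraded))
    if noise_energy == 0:
        return math.inf  # identical signals: no measurable distortion
    return 10 * math.log10(signal_energy / noise_energy)

clean = [1.0, -1.0, 1.0, -1.0]
noisy = [1.1, -0.9, 0.9, -1.1]  # every sample off by 0.1
print(round(snr_db(clean, noisy), 6))  # 20.0
```

A higher SNR means less distortion, but SNR treats all errors equally, regardless of how audible they are to a human.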
- Perceptual model
A model that simulates human perception. It is typically used in quality evaluation, to judge how important different characteristics of a signal are to a human listener.
- Phonation
The physiological process of producing a speech sound is referred to as phonation. In some fields, the term is restricted to voiced sounds, or more generally to sounds that involve some kind of oscillation. See also Phonation / Wikipedia.
- Phones
Phones are the elementary units of speech, associated with the articulatory gestures responsible for producing them and with the acoustic cues that make them distinct from other phones. See also Phones and Phone (phonetics) / Wikipedia.
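- Phonemes
Phonemes are defined by their meaning-contrasting function: two different phones of a language are also different phonemes if replacing one with the other can change the meaning of a word. For example, /p/ and /b/ are different phonemes in English, because ‘pat’ and ‘bat’ are different words. See also Phonemes and Phoneme / Wikipedia.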
- Sampling rate
The frequency at which the time-domain signal is sampled (measured), expressed in samples per second, that is, in hertz (Hz). See also Waveform/Sampling rate.
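The sketch below samples a pure tone at the times \(t = n / f_s\), so a signal of duration \(T\) yields \(T \cdot f_s\) samples. It uses a hypothetical helper `sample_tone` and assumes an ideal cosine input. It also shows a standard consequence of the choice of sampling rate: a tone above \(f_s/2\) (here 7 kHz at \(f_s\) = 8 kHz) produces exactly the same samples as a 1 kHz tone, an effect known as aliasing.

```python
import math

def sample_tone(freq_hz, fs_hz, duration_s):
    """Sample cos(2*pi*freq*t) at t = n / fs for n = 0, 1, ..., N-1."""
    num_samples = round(duration_s * fs_hz)  # N = duration x sampling rate
    return [math.cos(2 * math.pi * freq_hz * n / fs_hz) for n in range(num_samples)]

fs = 8000  # samples per second (Hz)
tone = sample_tone(1000, fs, 0.01)   # 10 ms of a 1 kHz tone -> 80 samples
alias = sample_tone(7000, fs, 0.01)  # 7 kHz lies above fs/2 = 4 kHz
# The 7 kHz tone gives the same sample values as the 1 kHz tone (aliasing).
print(len(tone), max(abs(a - b) for a, b in zip(tone, alias)) < 1e-9)  # prints: 80 True
```

This is why the sampling rate limits the highest frequency that a sampled signal can represent unambiguously to half the sampling rate.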
- Subjective test
An evaluation methodology in which human subjects rate a characteristic of a system or signal. See also Subjective quality evaluation.
- Turn-taking
Humans are generally able to follow only one speech message at a time. In a dialogue, it is therefore important that only one person speaks at a time. The organization of a dialogue, that is, the agreement on who ‘has the turn’ and is currently ‘in turn’ to speak, is known as turn-taking. See also Turn taking / Wikipedia.