2.1. Why speech processing?

2.1.1. Why speech?

Speech is our primary mode of communication: when you want to communicate something important, you say it face-to-face. Think about your first “I love you”, your last job interview, or a nice evening with friends. The things that matter most to us tend to be communicated in spoken form.

Speech is about communication. A characteristic trait of humans, in comparison to other animals, is our refined ability to communicate. To work efficiently as a group, we need to communicate. To learn from our mistakes, we need to communicate. While hand waving and smoke signals can also be used to communicate, speech remains our best way to communicate abstract thoughts.

However, a common idiom says that “a picture is worth a thousand words”. That is also why this document has pictures on the side: they help in capturing the essential information. An important difference between speech and images, however, is that where pictures excel in transmitting information, speech excels in interaction. The game “Pictionary” is fun precisely because interaction through pictures is difficult.

Speech interaction is a large research field in its own right (see e.g. the book Message processing).


2.1.2. Early communications technology

The expressive power of speech is tremendous, and it is a powerful tool for interaction. In early human cultures, however, spoken information was difficult to store. Storytelling was a way to memorize history, but our capability to accurately reproduce stories is limited.

Cave paintings were an early way to store information more permanently, and this technology later evolved into stone tablets, papyrus and paper letters. Such ancient documents provide the most accurate information we have about our past. We would know far less about Socrates without the writings of Plato. Innovations in communications technology, such as cave paintings, book printing and the Internet, have been so important that they characterize historical eras.


2.1.3. Evolution of speech technology

Telecommunications was another milestone in human history. Though the telegraph was an effective way of communicating, it required specialized training. The invention of the telephone, dated by some accounts to 1849 (Bell’s well-known patent followed in 1876), was therefore a great step forward because it was the first technology to provide instantaneous telecommunication without specialized training.

The first wireless (radio) transmission of speech followed in 1900, and radio quickly became an important broadcast medium. Again, while newspapers had an important role in broadcasting news, the radio was faster and more accessible (it does not require the ability to read).

Another important step was the spread of mobile phones in the 1990s. Their impact is easily demonstrated by the changes in our behaviour that the new technology has brought about:

  • Before, we would agree on a specific time to talk on the phone: “I’ll call you at home around 18:00.” The other party would then know to stay at home waiting for the phone call. Today we just say “I’ll talk to you later”. There is no need to know where the other person is, and we do not need to agree on a specific time either.

  • Before, we would agree on a specific time and place to meet: “I’ll meet you at the main building at 12:15.” Both people would then time their arrival to match the agreed time. Today, we can just say “I’ll call you when I’m nearby.”

In both cases, we are more flexible in our scheduling, making for more efficient use of time.


2.1.4. Further development

We have thus determined that speech is an important and powerful mode of communication for humans. For improving technology, this gives two prominent opportunities:

  • If technology can make spoken communication easier, the benefit can be substantial. For example, if telecommunication, such as telephony, teleconferences, and voice-over-IP, can be improved, then people can use speech more efficiently.

  • We can take advantage of people’s preference for speech communication. For example, interactions with devices and computers could be improved by allowing spoken interaction with them. In particular, typing on a keyboard and using other tactile interfaces is difficult for children, the elderly and people with disabilities, whereas a majority of people (but not all) can speak. Similarly, visual user interfaces often rely on menus for accessing information and services. Natural language can be more intuitive and simpler to use; we could just say to the washing machine “Wash this small load of dirty curtains.” instead of searching for washing options in a menu.

The devices and services which use speech and language are extremely widespread. By now, a majority of people in the world have access to a mobile phone, and there are almost 8 billion active mobile-phone subscriptions. If we can improve the technology behind those 8 billion subscriptions, by, say, reducing energy consumption, then the impact of such improvements would be enormous.