IBM Type 704 Electronic Data Processing Machine. Image via NASA.

History of Text-to-Speech

Written by Kirey Ismaya on February 19, 2021

As summarized on Wikipedia, the first computer-based speech-synthesis systems originated in the late 1950s. In 1961, physicist John Larry Kelly, Jr. and his colleague Louis Gerstman used an IBM 704 computer to synthesize speech, an event among the most prominent in the history of Bell Labs. Later, in 1968, Noriko Umeda developed the first general English text-to-speech system at the Electrotechnical Laboratory in Japan.

Kelly's voice recorder synthesizer (vocoder) recreated the song "Daisy Bell", with musical accompaniment from Max Mathews. Coincidentally, Arthur C. Clarke was visiting his friend and colleague John Pierce at the Bell Labs Murray Hill facility. Clarke was so impressed by the demonstration that he used it in the climactic scene of the screenplay for his novel 2001: A Space Odyssey, in which the HAL 9000 computer sings the same song as astronaut Dave Bowman shuts it down. Despite the success of purely electronic speech synthesis, research into mechanical speech synthesizers continues.

In the 1960s, speech analysis and synthesis techniques were divided into two approaches: articulatory synthesis and terminal-analogue synthesis. In the articulatory approach, the speech production mechanism itself, the vocal tract and its articulators, is modelled physiologically in sufficient detail. In the terminal-analogue approach, by contrast, the emphasis is on modelling the speech signal that comes out, not on how it is physically produced.
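The terminal-analogue idea can be made concrete with a small sketch: instead of simulating a vocal tract, we model the output signal directly as a periodic source passed through resonators tuned near formant frequencies. This is an illustrative toy, not code from any historical system; the sample rate, pitch, and formant values below are assumed, rough figures for a vowel-like sound.

```python
import math

def resonator(signal, freq, bandwidth, fs):
    """Two-pole IIR resonator: a single formant in a terminal-analogue model."""
    r = math.exp(-math.pi * bandwidth / fs)   # pole radius from bandwidth
    theta = 2 * math.pi * freq / fs           # pole angle from center frequency
    a1 = -2 * r * math.cos(theta)
    a2 = r * r
    gain = 1 - r                              # rough amplitude normalization
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = gain * x - a1 * y1 - a2 * y2
        y2, y1 = y1, y
        out.append(y)
    return out

fs = 8000                 # assumed sample rate (Hz)
f0 = 120                  # assumed glottal pitch (Hz)
n = fs // 2               # half a second of samples

# Crude glottal source: an impulse train at the pitch period.
source = [1.0 if i % (fs // f0) == 0 else 0.0 for i in range(n)]

# Cascade two resonators at illustrative formant values for an /a/-like vowel.
wave = resonator(source, 700, 130, fs)    # F1
wave = resonator(wave, 1220, 70, fs)      # F2
```

The design point is that everything here operates on the signal: pitch is an impulse spacing and formants are filter poles, with no physiological quantities anywhere, which is exactly what distinguishes terminal-analogue synthesis from the articulatory approach.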