NETtalk (artificial neural network)

NETtalk is an artificial neural network that learns to pronounce written English text by supervised learning. It takes English text as input and produces matching phonetic transcriptions as output.^[1]

It is the result of research carried out in the mid-1980s by Terrence Sejnowski and Charles Rosenberg. The intent behind NETtalk was to construct a simplified connectionist model that might shed light on the complexity of learning human-level cognitive tasks, while itself learning to perform a comparable task. The authors trained it by backpropagation.^[1]

The network was trained on a large number of English words and their corresponding pronunciations, and was able to generate pronunciations for unseen words with a high level of accuracy. The success of NETtalk inspired further research in pronunciation generation and speech synthesis, and demonstrated the potential of neural networks for solving complex natural language processing problems. The output of the network was a stream of phonemes, which was fed into DECtalk to produce audible speech. It achieved popular success, appearing on the Today show.^[2]^: 115

From the point of view of modeling human cognition, NETtalk does not specifically model the image processing stages and letter recognition of the visual cortex. Rather, it assumes that the letters have already been classified and recognized. NETtalk's task is to learn the proper association between a given sequence of letters and its correct pronunciation, based on the context in which the letters appear.
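As a rough illustration of what "pre-classified letters in context" can mean as network input, the sketch below slides a fixed-width window over a string and encodes each window as concatenated one-hot vectors. It is only a minimal sketch: the names `letter_windows` and `one_hot_window`, the 27-symbol alphabet (26 letters plus a padding symbol), and the default window width of 7 are illustrative assumptions, not details taken from the text above.

```python
import string

# Assumed alphabet: 26 lowercase letters plus one padding/boundary symbol.
PAD = "_"
ALPHABET = string.ascii_lowercase + PAD


def letter_windows(text, width=7):
    """Yield (centre_letter, window) pairs for every position in the text.

    The text is padded on both sides so that every letter, including the
    first and last, sits at the centre of a full-width window.
    """
    if width % 2 == 0:
        raise ValueError("width must be odd so the window has a centre")
    half = width // 2
    # Anything that is not a lowercase letter is treated as the boundary symbol.
    cleaned = "".join(
        c if c in string.ascii_lowercase else PAD for c in text.lower()
    )
    padded = PAD * half + cleaned + PAD * half
    for i, centre in enumerate(cleaned):
        yield centre, padded[i:i + width]


def one_hot_window(window):
    """Concatenate one one-hot vector per symbol in the window."""
    vec = []
    for c in window:
        unit = [0] * len(ALPHABET)
        unit[ALPHABET.index(c)] = 1
        vec.extend(unit)
    return vec


if __name__ == "__main__":
    for centre, window in letter_windows("cat"):
        print(centre, window, sum(one_hot_window(window)))
```

In a setup like this, each encoded window would be the input for predicting the phoneme of its centre letter, so the same letter can map to different sounds depending on its neighbours.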

A similar architecture was subsequently used for the opposite task, that of converting a continuous speech signal into a phoneme sequence.^[3]

[1] Sejnowski, Terrence J.; Rosenberg, Charles R. (1987). "Parallel networks that learn to pronounce English text". Complex Systems. 1 (1): 145–168.
[2] Sejnowski, Terrence J. (2018). The Deep Learning Revolution. Cambridge, Massachusetts; London, England: The MIT Press. ISBN 978-0-262-03803-4.
[3] Bourlard, H.; Wellekens, C.J. (December 1990). "Links between Markov models and multilayer perceptrons". IEEE Transactions on Pattern Analysis and Machine Intelligence. 12 (12): 1167–1178. doi:10.1109/34.62605.
