All models are wrong, but some are useful.
— George Box

PROJECT AIMS

Wide Incremental learning with Discrimination nEtworks

Principal Investigator: R. Harald Baayen (Professor of Quantitative Linguistics)

This five-year project aims to deepen our understanding of how we produce and understand words in everyday speech.

It is almost universally assumed that language use involves a form of mental calculus, in which alphabets of elementary symbols and rules define well-formed sequences. This calculus is usually believed to operate at two distinct levels: the level of phonology, and the level of morphology and syntax. The phonological alphabet consists of letter-like units of sound called phonemes. Strings of phonemes build the atomic meaningful units of the language, known as morphemes. Rules and constraints define which sequences of phonemes can form legal morphemes. These morphemes in turn comprise the alphabet of a second calculus, with morphological and syntactic rules defining the legal sequences of morphemes (and thus the words and sentences of a language).

This pairing of a meaning-free phonological calculus with a morpheme-based morphological and syntactic calculus is widely regarded as a fundamental design feature of language, one that structuralist linguistics referred to as the dual articulation of language. Psychologists have followed linguists in positing that phonemes and morphemes exist as real mental units, and a large body of research has sought to show how these units are strung together in production, and how, in comprehension, visual or auditory input is first segmented into these elementary units, which are subsequently re-assembled into hierarchical structures.

In this project, we are investigating whether the comprehension and production of words truly require sub-word units such as phonemes and morphemes. The realization of phonemes is known to vary tremendously with the context in which they occur. For distinguishing a 'p' from a 't' or a 'k', changes in the first and second formants of adjacent vowels are crucial. Furthermore, the theoretical construct of the morpheme, as the smallest linguistic sign, is perhaps attractive for agglutinating languages such as Turkish, but is not helpful at all for understanding the structure of words in fusional languages such as Latin. The central hypothesis under investigation in this project is that the relation between words' forms and their meanings can be modeled computationally, in an insightful and cognitively valid way, without using the theoretically problematic constructs of the phoneme and the morpheme.

Recent advances in machine learning and natural language engineering have shown that much can be achieved without these constructs. How far current natural language processing technology has moved away from concepts in classical (psycho)linguistic theory is exemplified by Hannun et al. (2014), who announced that they "... do not need a phoneme dictionary, nor even the concept of a 'phoneme'". Importantly, the construct of the morpheme has also been heavily criticized within theoretical morphology. For inflectional morphology, many researchers now agree that inflectional features (such as person, number, and tense) are realized in sound without there being a one-to-one mapping between bits of sound and individual feature values. In fact, one morphological theory, Word and Paradigm Morphology (Blevins, 2016), holds that words, and not sublexical units such as stems and affixes, are the fundamental units. According to this theory, proportional analogies between whole words drive morphological cognition.

The first goal of the WIDE project is to show that indeed the relation between words' forms and meanings can be computationally modeled without using phonemes and morphemes. In other words, we aim to develop a computational implementation of Word and Paradigm Morphology that provides, at the functional level, a cognitively valid characterization of the comprehension and production of complex words.

The second goal of the WIDE project is to clarify how much progress can be made with, and what the limits are of, wide learning networks, i.e., networks with very large numbers of input and output nodes, but no hidden layers. The mathematics of these networks is well understood: from a statistical perspective, wide learning is closely related to multivariate multiple regression. In this respect, wide learning differs from deep learning. Deep learning networks, however impressive their performance, are still largely black boxes when it comes to understanding why they work, and how exactly they work for a given problem.
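
To make the regression perspective concrete, the following minimal sketch (in Python, with toy bigram cues and random stand-ins for corpus-derived semantic vectors; illustrative only, not the project's actual implementation) estimates a wide network's single weight matrix exactly as in multivariate multiple regression:

    import numpy as np

    # Toy lexicon: four words; the form cues are letter bigrams.
    words = ["cat", "cats", "dog", "dogs"]
    def bigrams(w):
        return {w[i:i + 2] for i in range(len(w) - 1)}
    cues = sorted(set().union(*map(bigrams, words)))

    # C: word-by-cue indicator matrix -- the input nodes of the wide network.
    C = np.array([[1.0 if c in bigrams(w) else 0.0 for c in cues]
                  for w in words])

    # S: word-by-semantic-dimension matrix; random vectors stand in for
    # semantic vectors estimated from corpora.
    rng = np.random.default_rng(42)
    S = rng.normal(size=(len(words), 5))

    # The wide network is a single weight matrix F (no hidden layers),
    # estimated by least squares, exactly as in multivariate regression.
    F, *_ = np.linalg.lstsq(C, S, rcond=None)

    # Comprehension: a form's predicted semantics is its cue vector times F.
    S_hat = C @ F
    print(np.allclose(S_hat, S))  # True here: the four cue patterns are independent

Every cell of F is interpretable as the weight of one form cue on one semantic dimension, which is what makes such networks transparent in a way that deep networks are not.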

There are three main reasons for studying wide networks. Apart from their interpretational transparency, they turn out to perform surprisingly well, especially when their input and output features are carefully designed against the background of what we know about language and the brain. Furthermore, if we can show that wide networks can perform speech production and language comprehension with a high degree of accuracy, similar to that of listeners and speakers, then we have the strongest possible proof for the existence of algorithms that can accomplish comprehension and production without the help of phoneme and morpheme units. This is of crucial importance in the context of deep learning networks. The units on the hidden layers of deep learning networks applied to natural language processing have been interpreted as "fuzzy" variants of phonemes and morphemes, and hence as evidence that the classical hierarchical linguistic models must be correct after all. For instance, Hannagan et al. (2014) proposed a deep learning network explaining lexical learning in baboons, and attributed hidden units at various levels of granularity to different parts of the ventral pathway in the primate brain. However, as shown by Linke et al. (2017), much better predictions of baboon learning behavior are obtained with a wide learning network.

REFERENCES

Arnold, D., Tomaschek, F., Sering, K., Lopez, F., and Baayen, R.H. (2017). Words from spontaneous conversational speech can be recognized with human-like accuracy by an error-driven learning algorithm that discriminates between meanings straight from smart acoustic features, bypassing the phoneme as recognition unit. PLoS ONE 12(4): e0174623, 1-16.

Baayen, R. H., Chuang, Y. Y., Shafaei-Bajestan E., and Blevins, J. P. (2019). The discriminative lexicon: A unified computational model for the lexicon and lexical processing in comprehension and production grounded not in (de)composition but in linear discriminative learning. Complexity, 2019, 1-39.

Birkholz, P. (2013). Modeling consonant-vowel coarticulation for articulatory speech synthesis. PLoS ONE, 8.

Blevins, J. P. (2016). Word and paradigm morphology. Oxford University Press.

Hannagan, T., Ziegler, J. C., Dufau, S., Fagot, J., and Grainger, J. (2014). Deep learning of orthographic representations in baboons. PLoS ONE, 9.

Hannun, A., Case, C., Casper, J., Catanzaro, B., Diamos, G., Elsen, E., Prenger, R., Satheesh, S., Sengupta, S., Coates, A., et al. (2014). Deep speech: Scaling up end-to-end speech recognition. arXiv:1412.5567.

Linke, M., Bröker, F., Ramscar, M., and Baayen, R. H. (2017). Are baboons learning "orthographic" representations? Probably not. PLoS ONE, 12 (8): e0183876.

Shafaei-Bajestan, E. and Baayen, R. H. (2018). Wide Learning for Auditory Comprehension. In Yegnanarayana, B. (Chair) Proceedings of Interspeech 2018, 966-970. Hyderabad, India: International Speech Communication Association (ISCA).

PROJECT

Project Packages

The WIDE research programme comprises three subprojects: one addressing language comprehension, one addressing speech production, and one focusing on how best to model word use and lexical semantics.

A synthesis of some recent results is presented in Baayen et al. (2019). An outreach article on this and related research carried out in the quantitative linguistics lab is available in the leading science communication publication, Scientia.

LANGUAGE COMPREHENSION

The project on language comprehension focuses on the understanding of natural spontaneous speech. Building on previous work on speech comprehension (Arnold et al. 2017), we are studying auditory word recognition with wide learning.

Wide learning networks trained on low-level acoustic features extracted from the audio signal of words occurring in corpora of spontaneous speech (such as the vast repository of multimodal TV news broadcasts of the Distributed Little Red Hen Lab) perform surprisingly well, outperforming deep learning networks on the task of isolated word recognition by a factor of two (Shafaei-Bajestan & Baayen, 2018). Deep learning networks, however, are amazingly good at recognizing words in continuous speech, and an important challenge for this project is to show that wide learning can also be made to work for continuous speech.
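
In practice, such networks are trained incrementally, with error-driven updates applied trial by trial. The sketch below illustrates the Widrow-Hoff learning rule that underlies this kind of training; the random "acoustic" vectors are placeholders, not the smart acoustic features of Arnold et al. (2017):

    import numpy as np

    rng = np.random.default_rng(0)
    n_cues, n_outcomes = 300, 50        # acoustic input nodes, lexical output nodes
    W = np.zeros((n_cues, n_outcomes))  # the wide network: one weight matrix

    def widrow_hoff(W, cues, target, lr=0.001):
        """One error-driven update: move the predictions toward the target."""
        error = target - cues @ W
        W += lr * np.outer(cues, error)

    # Each training trial pairs the acoustic features of one word token
    # (placeholder random vectors here) with that word as lexical outcome.
    prototypes = rng.normal(size=(n_outcomes, n_cues))
    for _ in range(5000):
        word = rng.integers(n_outcomes)
        cues = prototypes[word] + rng.normal(scale=0.5, size=n_cues)  # token noise
        widrow_hoff(W, cues, np.eye(n_outcomes)[word])

    # Recognition: the outcome with the highest activation wins.
    test = prototypes[7] + rng.normal(scale=0.5, size=n_cues)
    print((test @ W).argmax())  # typically 7 after training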

SPEECH PRODUCTION

The project on speech production addresses the question of how to model the learning of articulation.

We have started working with the VocalTractLab (VTL) model developed by Birkholz and collaborators at the TU Dresden (Birkholz, 2013). VTL provides a three-dimensional model of the vocal tract, and generates speech sounds based on simulated articulator and vocal fold motion. The model has 20 control parameters, including parameters for velic opening, horizontal jaw position, tongue root position, parameters for the tongue body and tongue tip, lip parameters, and velum shape. The challenge here is to learn how to modulate these parameters over time to produce words, given the lexical semantics to be expressed and the feedback the speaker receives from the audio signal and her own articulators.
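
The shape of this learning problem can be pictured with the toy loop below. Everything in it is a deliberate simplification: the synthesize() function is a hypothetical stand-in (a fixed linear map) for the actual VTL synthesizer, and random hill-climbing stands in for the richer, feedback-driven learning schemes the project explores:

    import numpy as np

    rng = np.random.default_rng(3)
    N_PARAMS, N_FRAMES = 20, 60  # VTL-style control parameters over time frames

    MIXING = rng.normal(size=(N_PARAMS, 12))  # 12 acoustic features per frame

    def synthesize(traj):
        """Hypothetical stand-in for the synthesizer: maps a parameter
        trajectory (N_FRAMES x N_PARAMS) to acoustic features. The real
        model simulates articulator and vocal fold motion instead."""
        return traj @ MIXING

    # The auditory target the learner is trying to match.
    target_audio = synthesize(rng.uniform(-1, 1, (N_FRAMES, N_PARAMS)))

    # Babbling-style learning: propose small changes to the trajectory and
    # keep those that bring the produced audio closer to the target.
    traj = np.zeros((N_FRAMES, N_PARAMS))
    err = np.sum((synthesize(traj) - target_audio) ** 2)
    for _ in range(20000):
        candidate = traj + rng.normal(scale=0.02, size=traj.shape)
        cand_err = np.sum((synthesize(candidate) - target_audio) ** 2)
        if cand_err < err:  # auditory feedback: keep only improvements
            traj, err = candidate, cand_err
    print(err)  # the error has shrunk markedly from its starting value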

SPEECH IN CONTEXT

The third project is concerned with how to represent words' meanings, and how to model the effect of words' contexts on comprehension and production.

Spoken words can be very difficult to make sense of without context. For instance, in conversational German, 'wuerden' ('would') is often realized as 'wuen' instead of 'wuerdn'; in spontaneous Dutch, 'natuurlijk' ('of course') reduces to 'tuuk'; English 'hilarious' becomes 'hlεrəs'; and in Mandarin informal speech, all that may be left of the three-syllable word '要不然' ('jaʊpuʐan', 'otherwise') is 'ʊɪ'. We will therefore examine different statistical models that predict words' probabilities given their context, as these will be informative for extending our current system for auditory comprehension so that it can deal not only with single-word recognition but also with the understanding of continuous speech. A better understanding of the role of context is also essential for modeling how exactly words are articulated. This project also addresses the question of the optimal representation of words' meanings, focusing on the meanings of morphologically complex words on the one hand, and exploring the potential of wide learning networks on the other.
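
A simple baseline for predicting a word from its context is a smoothed bigram model. The sketch below (toy corpus, add-alpha smoothing chosen for brevity; the project compares considerably richer context models) shows the basic idea:

    from collections import Counter

    # Toy corpus; in the project such models are estimated from large
    # corpora of spontaneous speech.
    corpus = "of course he said of course she said it was hilarious".split()

    bigram_counts = Counter(zip(corpus, corpus[1:]))
    unigram_counts = Counter(corpus)
    vocab_size = len(unigram_counts)

    def p_next(word, prev, alpha=1.0):
        """Add-alpha smoothed estimate of P(word | prev)."""
        return ((bigram_counts[(prev, word)] + alpha)
                / (unigram_counts[prev] + alpha * vocab_size))

    # A strongly predictive context makes even a heavily reduced form
    # recoverable: 'course' is far more expected after 'of' than 'hilarious'.
    print(p_next("course", "of"), p_next("hilarious", "of"))  # 0.3 vs 0.1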

Publications

ARTICLES

Baayen, R. H., Chuang, Y. Y., Shafaei-Bajestan E., and Blevins, J. P. (2019). The discriminative lexicon: A unified computational model for the lexicon and lexical processing in comprehension and production grounded not in (de)composition but in linear discriminative learning. Complexity, 2019, 1-39.

Baayen, R. H. (2019). Are You Listening? Teaching a Machine to Understand Speech. Scientia, 2019, 1-5.

Baayen, R. H., Chuang, Y. Y. and Blevins, J. P. (2018). Inflectional morphology with linear mappings. The Mental Lexicon, 13 (2), 232-270.

Sering, K., Milin, P. and Baayen, R. H. (2018). Language comprehension as a multi-label classification problem. Statistica Neerlandica, 72, 339-353.

CONFERENCE PAPERS

Chuang, Y. Y., Sun, C. C., Fon, J., and Baayen, R. H. (2019). Geographical variation of the merging between dental and retroflex sibilants in Taiwan Mandarin. In Calhoun, S., Escudero, P., Tabain, M., and Warren, P. (Eds.) Proceedings of the 19th International Congress of Phonetic Sciences, Melbourne, Australia, 274-276. Canberra, Australia: Australasian Speech Science and Technology Association Inc.

Chuang, Y. Y., Vollmer, M. L., Shafaei-Bajestan, E., Gahl, S., Hendrix, P., and Baayen, R. H. (2019). On the processing of nonwords in word naming and auditory lexical decision. In Calhoun, S., Escudero, P., Tabain, M., and Warren, P. (Eds.) Proceedings of the 19th International Congress of Phonetic Sciences, Melbourne, Australia, 1432-1436. Canberra, Australia: Australasian Speech Science and Technology Association Inc.

Boll-Avetisyan, N., Nixon, J. S., Lentz, T. O., Liu, L., van Ommen, S., Çöltekin, Ç., and van Rij, J. (2018). Neural response development during distributional learning. In Yegnanarayana, B. et al. (Eds.) Proceedings of Interspeech 2018, 1432-1436. Hyderabad, India: International Speech Communication Association (ISCA).

Nixon, J. S. (2018). Effective acoustic cue learning is not just statistical, it is discriminative. In Yegnanarayana, B. et al. (Eds.) Proceedings of Interspeech 2018, 1447-1451. Hyderabad, India: International Speech Communication Association (ISCA).

Shafaei-Bajestan, E. and Baayen, R. H. (2018). Wide Learning for Auditory Comprehension. In Yegnanarayana, B. (Chair) Proceedings of Interspeech 2018, 966-970. Hyderabad, India: International Speech Communication Association (ISCA).

Project Presentations

UPCOMING

---

2019

Chuang, Y. Y., Vollmer, M. L., Shafaei-Bajestan, E., Gahl, S., Hendrix, P., Baayen, R. H., On the processing of nonwords in word naming and auditory lexical decision, International Congress of Phonetic Sciences (ICPhS2019), Melbourne, Australia, August 8, 2019.

Chuang, Y. Y., Sun, C. C., Fon, J., Baayen, R. H., Geographical variation of the merging between dental and retroflex sibilants in Taiwan Mandarin, International Congress of Phonetic Sciences (ICPhS2019), Melbourne, Australia, August 5, 2019.

Tomaschek, F., Nixon, J. S., Emerging Structures in Random Data Result in Naive Learning, Psycholinguistics in Iceland – Parsing and Prediction, Reykjavik, Iceland, June 20, 2019.

Baayen, R. H., Wide learning in language modeling, Colloquium ICCLS - Interdisciplinary Centre for Cognitive Language Studies, München, Germany, June 17, 2019.

Sun, K., A Regression Model for Simulating and Predicting the Use of Periods by Chinese Natives, Interpunktion international, Regensburg, Germany, May 4, 2019.

Baayen, R. H., Throwing off the shackles of the morpheme with simple linear transformations, Colloquium for Computational Linguistics and Linguistics in Stuttgart, Stuttgart, Germany, April 29, 2019 (invited).

Chuang, Y. Y., Making sense of auditory nonwords, Groningen Spring School on Cognitive Modeling, Groningen, The Netherlands, April 11, 2019 (keynote).

Baayen, R. H., Wide learning in language modeling, Vienna University of Economics and Business, Vienna, Austria, March 15, 2019 (invited).

Sering, K., Stehwien, N., Gao, Y., Butz, M. V., Baayen, R. H., Resynthesizing the GECO speech corpus with VocalTractLab, 30th Conference on Electronic Speech Signal Processing (ESSV), Dresden, Germany, March 7, 2019.

Chuang, Y. Y., Baayen, R. H., Making sense of auditory nonwords, Workshop - "Models of Computational Morpho(phono)logy", Cambridge, UK, February 15, 2019 (invited).

Baayen, R. H., Linear discriminative learning and the bilingual lexicon, A Language Learning Roundtable, Fribourg, Switzerland, February 11, 2019 (invited).

2018

Baayen, R. H., Throwing off the shackles of the morpheme with simple linear mappings, Annual Meeting of the Society for Computers in Psychology (SCiP), New Orleans, USA, November 15, 2018 (keynote).

Cassani, G., Chuang, Y.-Y., and Baayen, R. H., On the Semantics of Non-words and their Lexical Categories, Eleventh International Conference on the Mental Lexicon, Edmonton, Canada, September 27, 2018 (poster presentation).

Nixon, J. S., The Kamin Blocking Effect in Speech Acquisition: Non-native Acoustic Cue Learning is Blocked by Already-learned Cues, Eleventh International Conference on the Mental Lexicon, Edmonton, Canada, September 27, 2018 (poster presentation).

Sun, K., Diachronic and Qualitative Analysis of English Hyphenated Compounds in the Last Two Hundred Years, Eleventh International Conference on the Mental Lexicon, Edmonton, Canada, September 27, 2018 (poster presentation).

Baayen, R. H., Speech Production in the Discriminative Lexicon, Eleventh International Conference on the Mental Lexicon, Edmonton, Canada, September 26, 2018 (poster presentation).

Chuang, Y.-Y., and Baayen, R. H., Computational Modeling of the Role of Phonology in Silent Reading, Eleventh International Conference on the Mental Lexicon, Edmonton, Canada, September 26, 2018.

Denistia, K., Shafaei-Bajestan, E., and Baayen, R. H., A Semantic Vector Model for the Indonesian Prefixes pe- and peN-, Eleventh International Conference on the Mental Lexicon, Edmonton, Canada, September 26, 2018 (poster presentation).

Baayen, R. H., Word and Paradigm Morphology with Linear Discriminative Learning, University of Sheffield, Sheffield, UK, September 12, 2018 (invited).

Boll-Avetisyan, N., Nixon, J. S. Lentz, T. O., Liu, L., van Ommen, S., Çöltekin, Ç. and van Rij, J. (2018). Neural response development during distributional learning, Interspeech 2018 – The 19th Annual Conference of the International Speech Communication Association, Hyderabad, India. September 2-6, 2018.

Nixon, J. S. (2018). Effective acoustic cue learning is not just statistical, it is discriminative, Interspeech 2018 – The 19th Annual Conference of the International Speech Communication Association, Hyderabad, India. September 2-6, 2018.

Shafaei-Bajestan, E. and Baayen, R. H. (2018). Wide Learning for Auditory Comprehension, Interspeech 2018 – The 19th Annual Conference of the International Speech Communication Association, Hyderabad, India. September 2-6, 2018.

Baayen, R. H., Participant in a discussion on the lifespan development of the mental lexicon, Symposium on the Aging Lexicon, Basel, Switzerland, June 7-9, 2018 (invited).

Steiner, I., Tomaschek, F., Bolkart, T., Hewer, A., and Sering, K., Simultaneous Dynamic 3D Face Scanning and Articulography, SimPhon.Net workshop 5, Stuttgart, Germany, June 6, 2018.

Baayen, R. H., and E. Shafaei, A discriminative perspective on lexical access in auditory comprehension, Basque Center for Applied Mathematics, Bilbao, Spain, April 10, 2018 (invited).

Baayen, R. H., and E. Shafaei, A discriminative perspective on lexical access in auditory comprehension and speech production, Basque Center on Cognition, Brain and Language, San Sebastian, Spain, April 9, 2018 (invited).

2017

Baayen, R. H., Tomaschek, F., Ernestus, M., and Plag, I., Explaining the acoustic durations of s in conversational English with naive discriminative learning, Workshop Current Approaches to Morphology, Edmonton, Canada, December 20, 2017 (invited).

Baayen, R. H., Trial-by-trial discrimination learning in the lexical decision task, CLiPS (Computational Linguistics and Psycholinguistics) Colloquium, Antwerp, Belgium, October 16, 2017 (invited).