Harald Baayen and Elnaz Shafaei, A discriminative perspective on lexical access in auditory comprehension and speech production
Date: Apr 09, 2018

What: A discriminative perspective on lexical access in auditory comprehension and speech production

Where: BCBL Auditorium

Who: Harald Baayen and Elnaz Shafaei; Eberhard-Karls University Tübingen, Tübingen, Germany.

When: 4 PM

Standard linguistic theories portray language as a two-tiered system. A sound system defines minimal meaning-bearing units, the morphemes, which are taken to consist of hierarchically organized sound units, the phonemes. A second system defines sentences as hierarchically ordered compositional sequences of morphemes. It is well known that both the phoneme and the morpheme are highly problematic units for a variety of theoretical and descriptive reasons. Even so, introductory linguistics textbooks, as well as the overwhelming majority of papers on language processing published in the psychology and neuroscience communities, adopt the traditional post-Bloomfieldian perspective as ground truth. However, recent advances in machine learning have shown that deep convolutional networks are much more effective for a variety of natural language processing tasks than algorithms grounded in the classical post-Bloomfieldian constructs of traditional linguistics. The current successes of deep learning networks challenge the post-Bloomfieldian framework of language at its heart, leaving linguistics with basically two choices. One option is to argue that the intermediate layers of deep networks represent the hierarchical layers of units that linguists “have always known to be necessary” for understanding language. The other option is to fundamentally rethink how language actually works, to step away from the traditional post-Bloomfieldian constructs, and to start exploring what machine learning has to offer for understanding the relation between form and meaning. In our presentation, we report recent work that attempts to fundamentally rethink lexical access. Instead of using deep learning, however, we are exploring wide learning: large but simple networks that have only input and output units and no hidden layers, as implemented in naive discriminative learning (Baayen et al., 2011).
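To make "wide learning" concrete, here is a minimal Python sketch of error-driven learning in a network with only input units (cues) and output units (outcomes), using the Rescorla-Wagner delta rule on which naive discriminative learning builds. The letter-bigram cues, the toy lexicon, and the function names `bigrams`, `train_rw`, and `activations` are hypothetical illustrations chosen for this sketch. It is not the ndl software, and Baayen et al. (2011) may estimate the weights differently (for example, at equilibrium rather than trial by trial).

```python
from collections import defaultdict


def bigrams(word):
    """Letter bigrams with '#' marking word boundaries (illustrative cues only)."""
    padded = f"#{word}#"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def train_rw(events, outcomes, eta=0.05, lambda_=1.0, epochs=1):
    """Rescorla-Wagner (Widrow-Hoff) learning in a network with no hidden layer.

    events:   list of (cues, present_outcomes) pairs, each a set of strings.
    outcomes: every outcome the network can predict.
    Returns a table mapping (cue, outcome) to an association weight.
    """
    weights = defaultdict(float)
    for _ in range(epochs):
        for cues, present in events:
            for outcome in outcomes:
                # An outcome's activation is the summed weight from the active cues.
                activation = sum(weights[(c, outcome)] for c in cues)
                target = lambda_ if outcome in present else 0.0
                error = target - activation
                # Only connections from cues present on this learning event change.
                for c in cues:
                    weights[(c, outcome)] += eta * error
    return weights


def activations(weights, cues, outcomes):
    """Support for each outcome given a set of active cues."""
    return {o: sum(weights.get((c, o), 0.0) for c in cues) for o in outcomes}


# Hypothetical toy lexicon: word forms paired with their meaning outcomes.
lexicon = [
    ("hand", {"HAND"}),
    ("hands", {"HAND", "PLURAL"}),
    ("land", {"LAND"}),
    ("lands", {"LAND", "PLURAL"}),
]
outcomes = {"HAND", "LAND", "PLURAL"}
events = [(bigrams(w), meanings) for w, meanings in lexicon]
weights = train_rw(events, outcomes, eta=0.05, epochs=300)

# After training, "hands" should strongly support HAND and PLURAL, not LAND.
act = activations(weights, bigrams("hands"), outcomes)
print({o: round(a, 2) for o, a in sorted(act.items())})

# The cue that most strongly supports PLURAL (one of the word-final bigrams).
cue_set = set().union(*(cues for cues, _ in events))
best = max(cue_set, key=lambda c: weights[(c, "PLURAL")])
print(best)
```

In this toy setup no unit stands for a plural morpheme. PLURAL is discriminated directly by form cues such as `ds` and `s#`, while `d#` ends up with a negative weight to PLURAL. That mirrors, on a very small scale, the idea of mapping form to meaning without intermediate morpheme units.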
In wide learning, the choice of input and output features is crucial, as there are no hidden layers to compensate for suboptimal coding at input and output. For auditory comprehension, we will present a wide learning model that discriminates between words’ meanings on the basis of smart low-level features, developed by Arnold et al. (2017), that are extracted from the speech signal. We will present an extension of this model that, for the audio of words presented in isolation, achieves twice the accuracy of off-the-shelf deep learning networks. For speech production, we will lay out a computational model that provides proof of concept that it is possible to predict words’ forms from their semantics with high accuracy, without representations for morphemes, stems, affixes, or phonemes and without rules operating on these representations. The model correctly predicts that the speech signal is elongated when there is more uncertainty, and is hence able to explain ‘morphological’ effects in production (Weingarten et al., 2004; Bertram et al., 2015; Cho, 2001) without requiring morphemes.

References

Arnold, D., Tomaschek, F., Sering, K., Lopez, F., & Baayen, R. H. (2017). Words from spontaneous conversational speech can be recognized with human-like accuracy by an error-driven learning algorithm that discriminates between meanings straight from smart acoustic features, bypassing the phoneme as recognition unit. PLoS ONE, 12(4), e0174623.
Baayen, R. H., Milin, P., Filipovic Durdevic, D., Hendrix, P., & Marelli, M. (2011). An amorphous model for morphological processing in visual comprehension based on naive discriminative learning. Psychological Review, 118, 438-482.
Bertram, R., Tønnessen, F. E., Strömqvist, S., Hyönä, J., & Niemi, P. (2015). Cascaded processing in written compound word production. Frontiers in Human Neuroscience, 9, 207.
Cho, T. (2001). Effects of morpheme boundaries on intergestural timing: Evidence from Korean. Phonetica, 58(3), 129-162.
Weingarten, R., Nottbusch, G., & Will, U. (2004). Morphemes, syllables, and graphemes in written word production. Trends in Linguistics: Studies and Monographs, 157, 529-572.