Predictive neural computations in speech perception

Davis, M.

MRC Cognition and Brain Sciences Unit, Cambridge, UK

Human speech perception achieves unmatched speed and accuracy, even for speech that is acoustically degraded. This work explores the underlying neural computations by relating computational simulations to brain activity measured with magnetoencephalography (MEG).

A first experiment assessed early identification of spoken words (e.g. /fo:mju/ identifies “formula” from the cohort of matching words; Marslen-Wilson & Tyler, 1980). Lexical competition accounts propose inhibitory selection among matching words, with activation of “formula” suppressing competitors like “formal” (TRACE; McClelland & Elman, 1986). Conversely, predictive coding uses lexical knowledge to predict upcoming sounds following unique sequences like /fo:mju/ (cf. Elman, 1990). To distinguish these theories, participants learned novel words (e.g. “formubo”) that each resemble one existing word. Computational simulations show that knowing “formubo”: (1) increases lexical competition following /fo:m/ but reduces prediction error, and (2) increases prediction error at later segments where “formula” and “formubo” diverge. MEG responses in the superior temporal gyrus (STG) were uniquely consistent with the predictive coding simulations.
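The contrast can be illustrated with a minimal toy sketch (illustrative assumptions only, not the simulations reported here): a hypothetical mini-lexicon of three equally frequent words, letter-sized segments, and the novel word “formubo” added after learning. Lexical competition is scored as entropy over the cohort of words still consistent with the input, and prediction error as one minus the probability the cohort assigns to the segment actually heard.

    # Toy cohort sketch: lexical competition vs. prediction error as a word unfolds.
    from math import log2

    def cohort(lexicon, prefix):
        # Words (with frequency weights) still consistent with the heard prefix.
        return {w: f for w, f in lexicon.items() if w.startswith(prefix)}

    def entropy(weights):
        # Lexical competition: entropy (in bits) over the remaining cohort.
        total = sum(weights.values())
        return -sum((f / total) * log2(f / total) for f in weights.values() if f > 0)

    def next_segment_probs(lexicon, prefix):
        # Predicted distribution over the next segment, pooled over the cohort.
        members, probs = cohort(lexicon, prefix), {}
        total = sum(members.values())
        for w, f in members.items():
            if len(w) > len(prefix):
                probs[w[len(prefix)]] = probs.get(w[len(prefix)], 0.0) + f / total
        return probs

    def report(lexicon, word):
        for i in range(1, len(word)):
            prefix, heard = word[:i], word[i]
            competition = entropy(cohort(lexicon, prefix))
            error = 1.0 - next_segment_probs(lexicon, prefix).get(heard, 0.0)
            print(f"after '{prefix}': competition = {competition:.2f} bits, "
                  f"prediction error for '{heard}' = {error:.2f}")

    # Hypothetical mini-lexicon with equal frequencies (invented for illustration).
    before = {"formula": 1.0, "formal": 1.0, "format": 1.0}
    after = dict(before, formubo=1.0)   # lexicon once the novel word has been learned

    print("-- before learning 'formubo' --"); report(before, "formula")
    print("-- after learning 'formubo' --");  report(after, "formula")

In this miniature, adding “formubo” raises cohort entropy after “form” while lowering prediction error for the shared segment “u” (point 1), and raises prediction error after “formu”, where “formula” and “formubo” diverge (point 2).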

A second study assessed MEG responses to speech degraded using noise vocoding (Shannon et al., 1995). Manipulations of both acoustic clarity and prior knowledge (priming with matching text) increased ratings of speech clarity, but had opposite effects on neural responses. The STG produced a larger response to acoustically clearer speech, but a reduced response to syllables that followed matching text. These opposite effects are inconsistent with models like TRACE, in which bottom-up input and top-down expectations are summed in pre-lexical processing. However, they are consistent with a predictive coding account in which the STG computes the difference between predicted and heard speech sounds.
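The opposing predictions of the two schemes can be sketched with a toy example (illustrative assumptions only, not the reported MEG analysis or a published implementation): a made-up spectral pattern stands in for a syllable, “clarity” scales how much spectral detail survives vocoding, and “prior” scales how strongly matching text predicts the upcoming sound.

    # Toy sketch: summed (TRACE-like) vs. prediction-error responses to degraded speech.
    import numpy as np

    clean = np.array([0.2, 0.9, 0.4, 0.7, 0.1, 0.8])   # hypothetical spectral channels

    def heard(clarity):
        # Degraded input: only a fraction of the spectral detail survives vocoding.
        return clarity * clean

    def predicted(prior):
        # Top-down expectation of the syllable, scaled by strength of prior knowledge.
        return prior * clean

    def summed_response(clarity, prior):
        # Bottom-up input and top-down expectation add (interactive activation).
        return np.sum(heard(clarity) + predicted(prior))

    def prediction_error_response(clarity, prior):
        # Only the input left unexplained by the prediction drives the response;
        # the residual is rectified because firing rates cannot go negative.
        return np.sum(np.maximum(heard(clarity) - predicted(prior), 0.0))

    for clarity in (0.25, 0.75):                        # low vs. high acoustic clarity
        for label, prior in (("neutral text ", 0.0), ("matching text", 0.6)):
            print(f"clarity = {clarity:.2f}, {label}: "
                  f"summed = {summed_response(clarity, prior):5.2f}, "
                  f"prediction error = {prediction_error_response(clarity, prior):5.2f}")

In this sketch the summed response grows with both acoustic clarity and matching text, whereas the prediction-error response grows with clarity but shrinks when matching text explains away part of the input, mirroring the opposite effects observed in the STG.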

Thus, both experiments support a predictive coding model of the neural computations involved in speech perception. STG neurons code the difference between previously predicted and currently heard speech sounds. Prediction error is fed forward to higher levels to generate a phonological percept, update lexical activation, and drive comprehension.