Modelling multimodal interaction in language mediated eye gaze

Smith, A.1, Huettig, F.1,2 & Monaghan, P.3

1 Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
2 Donders Institute for Brain, Cognition & Behaviour, Radboud University, Nijmegen, The Netherlands
3 Department of Psychology, Lancaster University, Lancaster, U.K.

Hub-and-spoke models of semantic processing, which integrate modality-specific information within a central resource, have proven successful in capturing a range of neuropsychological phenomena (Rogers et al., 2004; Dilkina et al., 2008). In this study we investigate whether the scope of the hub-and-spoke architectural framework can be extended to capture behavioural phenomena in other areas of cognition. The visual world paradigm (VWP) has contributed significantly to our understanding of the information and processes involved in spoken word recognition. In particular, it has highlighted the importance of non-linguistic influences during language processing, indicating that performance on such tasks reflects combined information from vision, phonology and semantics (see Huettig, Rommers & Meyer, 2011). Huettig & McQueen (2007) demonstrated that participants’ fixations to objects presented within a single visual display varied systematically according to their phonological, semantic and visual relationship to a spoken target word. The authors argue that only an explanation allowing for influence from all three knowledge types can account for the observed behaviour. To date, computational models of the VWP (Allopenna et al., 1998; Mayberry et al., 2009; Kukona et al., 2011) have focused largely on linguistic aspects of the task and have therefore been unable to explain the growing body of experimental evidence emphasising the influence of non-linguistic information on spoken word recognition. Our study demonstrates that an emergent connectionist model, based on the hub-and-spoke models of semantic processing, which integrates visual, phonological and functional information within a central resource, is able to capture the intricate time course dynamics of eye fixation behaviour reported in Huettig & McQueen (2007). Our findings indicate that such language-mediated visual attention phenomena can emerge largely from the statistics of the problem domain and may not require additional domain-specific processing constraints.
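
To make the architectural idea concrete, the following is a minimal illustrative sketch of a hub-and-spoke arrangement in which visual, phonological and functional input layers all feed a single recurrent hub layer. It is not the model used in the study: the class name HubAndSpoke, the layer sizes, the random untrained weights, the sigmoid activation and the number of time steps are all assumptions chosen purely for illustration.

```python
import math
import random


def sigmoid(x):
    """Logistic activation, squashing net input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


class HubAndSpoke:
    """Toy network (hypothetical): several input spokes feed one shared hub."""

    def __init__(self, spoke_sizes, hub_size, seed=0):
        rng = random.Random(seed)
        self.hub_size = hub_size
        # One weight matrix per spoke, shaped hub_size x spoke_size.
        # Weights are random and untrained (assumption for illustration).
        self.w_in = {
            name: [[rng.uniform(-0.5, 0.5) for _ in range(size)]
                   for _ in range(hub_size)]
            for name, size in spoke_sizes.items()
        }
        # Recurrent hub-to-hub weights let hub activity evolve over time.
        self.w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(hub_size)]
                      for _ in range(hub_size)]

    def step(self, hub, inputs):
        """Advance the hub by one time step given the current spoke inputs."""
        new_hub = []
        for j in range(self.hub_size):
            # Contribution from the hub's own previous state.
            net = sum(w * h for w, h in zip(self.w_rec[j], hub))
            # Contributions from every modality are summed in the same unit,
            # which is where integration across modalities takes place.
            for name, vector in inputs.items():
                net += sum(w * x for w, x in zip(self.w_in[name][j], vector))
            new_hub.append(sigmoid(net))
        return new_hub

    def run(self, inputs, steps=5):
        """Return the hub state at each time step, starting from zeros."""
        hub = [0.0] * self.hub_size
        trajectory = []
        for _ in range(steps):
            hub = self.step(hub, inputs)
            trajectory.append(hub)
        return trajectory


# Layer sizes and input patterns below are arbitrary placeholders.
model = HubAndSpoke(
    {"visual": 4, "phonological": 3, "functional": 3}, hub_size=5
)
trajectory = model.run(
    {
        "visual": [1, 0, 0, 1],
        "phonological": [0, 1, 0],
        "functional": [1, 1, 0],
    },
    steps=5,
)
```

Here trajectory holds one hub state per time step, so the way activity unfolds over time can be inspected. In a trained model of this general kind, time course behaviour would be read out from such unfolding activity; this sketch omits training and any output layer.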