External speakers: Hans Rutger. Just beat it: How the timing of simple gestures guides word recognition, vowel perception, and speech segmentation

29/1/2026 - BCBL Auditorium (and BCBL Auditorium Zoom room)

What: Just beat it: How the timing of simple gestures guides word recognition, vowel perception, and speech segmentation

Where: BCBL Auditorium and Auditorium Zoom room (if you would like to attend this meeting, please reserve a place at info@bcbl.eu)

Who: Hans Rutger, PhD, Donders Centre for Cognition, Radboud University, Nijmegen, The Netherlands

When: Thursday, January 29th at 12:00 noon.

The study of audiovisual speech perception has typically focused on the integration of visual articulatory cues (i.e., lip movements) with auditory speech, as exemplified by the McGurk effect. Moreover, many such studies have investigated the perception of speech segments (e.g., consonants), overlooking the important role of prosody in speech perception.
In this talk, I argue that speech prosody is principally conveyed through other bodily articulators, focusing primarily on the timing of simple hand gestures known as beat gestures. This multimodal perspective on prosody predicts that the timing of simple hand movements may (like acoustic prosody) influence word recognition, segmental perception, and speech segmentation. I will present empirical data in support of this prediction, demonstrating that carefully timed beat gestures guide the perception of lexical stress (i.e., distinguishing OBject from obJECT). In fact, subtle visual articulatory cues on the face, despite carrying relevant information about prosody, are purposefully ignored in audiovisual stress perception. This highlights the distinct contributions of articulatory versus gestural cues to audiovisual speech perception. Furthermore, the temporal alignment of simple hand movements can be shown to influence even the perception of segmental distinctions (e.g., vowel length in Dutch) and word segmentation in English.
Together, these findings emphasize that the timing of seemingly meaningless hand movements, which commonly occur in everyday face-to-face conversation, is an important cue to prosody, with a pervasive influence on audiovisual speech perception. Moreover, they raise important questions about the theoretical construct of 'audiovisual integration': whether it exists at all and, if so, how it might best be quantified.