OS_14.6 - To see and hear a word, we inefficiently combine features but efficiently combine streams

Dubois, M. 1,2, Poeppel, D. 2 & Pelli, D. G. 2

1 Laboratoire Cognition, Langage et Développement, Université Libre de Bruxelles, Brussels, Belgium
2 Psychology and Neural Science, New York University, New York, USA

To recognize an object, we detect and bind the features it is made of. We also merge information across the senses into a coherent percept of our external environment. In general, how well do we combine information from several sources, be they features, cues, or sensory modalities? Building on the classic efficiency approach, here we introduce a "relative efficiency" paradigm to assess binding. We measure the energy threshold for word recognition as a function of object extent (word length), and for an audiovisual combination as opposed to each component (audio or visual) alone. Efficient binding has a fixed energy threshold, independent of length or of distribution among modalities. Inefficient binding requires more energy as length or the number of modalities increases. Our results reveal a striking dichotomy. Energy is integrated inefficiently within each modality: observers need more energy to recognize longer words, whether seen or heard. However, text and speech summate perfectly: observers require the same overall energy, irrespective of its distribution across eye and ear. Thus, to see and hear a word, we inefficiently combine features but efficiently combine streams.
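The efficient-vs-inefficient contrast can be sketched numerically. The toy model below is an illustrative assumption, not the authors' analysis: it parameterizes how the total energy threshold scales with the number of combined components (features or modalities), where an exponent of 0 corresponds to perfect summation (fixed threshold) and an exponent of 1 to no summation (threshold proportional to the number of components). The function name, the unit single-source threshold, and the power-law form are all hypothetical choices for illustration.

```python
def threshold_energy(n_components: int, exponent: float) -> float:
    """Predicted total energy threshold when combining n_components sources.

    Hypothetical power-law parameterization (not the authors' model):
      exponent = 0 -> efficient binding: threshold is fixed,
                      independent of how many components carry the signal;
      exponent = 1 -> fully inefficient binding: threshold grows in
                      proportion to the number of components.
    """
    E1 = 1.0  # threshold for a single component, arbitrary units
    return E1 * n_components ** exponent

# Efficient cross-modal summation: splitting energy across eye and ear
# leaves the total threshold unchanged.
print(threshold_energy(2, 0.0))  # same as one modality alone

# Inefficient within-modality combination: longer words (more features)
# demand proportionally more energy.
print(threshold_energy(4, 1.0))  # four features cost four times the energy
```

Under this sketch, the reported dichotomy corresponds to an exponent near 1 for features within a modality and near 0 for streams across modalities.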