Seeing Two Things at Once: A Neural Network Model

Henderson, C. M. & McClelland, J. L.

Department of Psychology, Stanford University

Cluttered scenes are ubiquitous in both natural and artificial environments. Experimental paradigms such as illusory conjunctions probe how visual encoding is affected when multiple objects are present. These paradigms have given rise to theories such as Feature Integration Theory, in which perception is limited to a single object at a time and subjective experience to the contrary is posited to be spurious (Treisman & Schmidt, 1982). However, the common experience of seeing a small set of objects at once is consistent with, and often supported by, empirical data (e.g., Najemnik & Geisler, 2005). Our explorations of neural network models have led to a system that can simultaneously identify abstract depictions of two objects while also demonstrating illusory conjunctions under certain conditions (Henderson & McClelland, 2011).

This model consists of ventral and dorsal components, capturing aspects of object recognition and of representation for action (reaching for and grasping objects). The ventral component alone is sufficient to recognize a single object, while the dorsal component alone can model a simple analog of reaching and grasping. Simultaneous recognition of two objects, however, depends on collaboration between the components. Simulated lesions support this characterization and reproduce two patterns reported in lesion cases: the errors seen in simultanagnosia (Robertson et al., 1997) and spared vision for simple actions in the absence of visual recognition (Milner & Goodale, 2006).
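
As a purely illustrative aid, the sketch below shows one way a two-pathway network with simulated lesions could be laid out in code. It is not the model of Henderson & McClelland (2011): the class name TwoStreamNet, the layer sizes, the tanh and softmax choices, and the lesioning-by-zeroing scheme are all assumptions made for this example, and the weights are random and untrained, so the output says nothing about recognition performance. It shows only how silencing one pathway changes what the shared read-out receives.

```python
import numpy as np


def init_weights(rng, n_in, n_out):
    """Small random weights; a hypothetical initialisation for illustration."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out))


class TwoStreamNet:
    """Toy two-pathway network (assumed layout, not the published model)."""

    def __init__(self, n_input=16, n_hidden=8, n_identity=4, seed=0):
        rng = np.random.default_rng(seed)
        # Separate hidden layers standing in for the two components.
        self.w_ventral = init_weights(rng, n_input, n_hidden)
        self.w_dorsal = init_weights(rng, n_input, n_hidden)
        # One read-out with two identity slots, fed by both pathways.
        self.w_out = init_weights(rng, 2 * n_hidden, 2 * n_identity)
        self.n_identity = n_identity

    def forward(self, x, lesion=None):
        """Return identity probabilities for two object slots.

        lesion: None, "ventral", or "dorsal"; the named pathway is
        silenced by zeroing its hidden activity (an assumed lesion scheme).
        """
        ventral = np.tanh(x @ self.w_ventral)
        dorsal = np.tanh(x @ self.w_dorsal)
        if lesion == "ventral":
            ventral = np.zeros_like(ventral)
        elif lesion == "dorsal":
            dorsal = np.zeros_like(dorsal)
        hidden = np.concatenate([ventral, dorsal], axis=-1)
        logits = hidden @ self.w_out
        logits = logits.reshape(*logits.shape[:-1], 2, self.n_identity)
        # Softmax over identities, computed separately for each slot.
        logits = logits - logits.max(axis=-1, keepdims=True)
        p = np.exp(logits)
        return p / p.sum(axis=-1, keepdims=True)


net = TwoStreamNet()
x = np.ones(16)                               # placeholder input pattern
intact = net.forward(x)                       # both pathways active
no_dorsal = net.forward(x, lesion="dorsal")   # dorsal pathway silenced
print(intact.shape, no_dorsal.shape)
```

In a trained version of such a sketch, the intact and lesioned outputs could be compared on one-object and two-object inputs to ask which tasks survive the loss of each pathway.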

We are now exploring spatial proximity and topographic organization, both in perception and in the model. There is evidence that the spatial proximity of stimuli affects illusory conjunctions. In our talk we will present these findings and describe our simulation studies exploring topography in the model. We are especially interested in whether restrictions on the model structure and/or on the spatial structure of inputs can lead to emergent topography, or whether it is necessary to incorporate explicit mechanisms that promote topographic bias.