Cognitive “Tricks” in Embodied Grounded Language Learning: going beyond labels

Morse, A.¹, Marocco, D.¹, Kerstin, F.² & Cangelosi, A.¹

¹ Center for Robotics and Neural Systems, University of Plymouth
² University of Southern Denmark

While it is possible that evolution has crafted an incredibly complex, specialized and modular cognitive engine in us, another possibility, and the hypothesis explored here, is that we embody a relatively simple and homogeneous cognitive mechanism bolstered, as Dennett has suggested, by a variety of “tricks”.

For Dennett, the “bag of tricks” hypothesis draws an analogy between cognition and a conjurer who uses several methods to achieve the same illusion, so that while one method is being tested another is actually in use and the veil of mystique remains. Herein we show that a system implementing multiple learning methods (cross-situational learning, mutual exclusivity, and simple grammar learning) can outstrip the capabilities of each method alone, becoming more than the sum of its parts.

Our modeling approach is the Epigenetic Robotics Architecture (ERA), which combines self-organizing maps with structured Hebbian learning to autonomously generate a dynamic neural structure displaying a wide variety of psychological phenomena. Previous work with the architecture explored the role that body posture plays in infant learning and led to child experiments confirming the model's predictions. Herein the same model is again embodied in the iCub humanoid robot, which interacts with people who teach it the names of objects, actions and features. Although these word-feature-action relationships can be learned across multiple exposures, cross-situational learning requires a well-balanced set of experiences to expose the appropriate statistical relationships without introducing false ones. We show not only that mutual exclusivity and grammar cues remove many false statistical relationships, but also that cross-situational learning can provide the grounding required to make grammar learning possible in the first place. These methods interact so that multiple novel words, in the presence of multiple novel targets, can be correctly attributed in a single exposure, going well beyond what the basic mechanism alone is supposedly capable of.
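The interaction between cross-situational learning and mutual exclusivity can be illustrated with a deliberately simplified sketch. It is not the ERA implementation: it assumes discrete labels in place of the winning units of self-organizing maps, a Hebbian weight that simply counts word-referent co-occurrences, and a hypothetical `HebbianLexicon` class; grammar cues are omitted. Cross-situational learning alone needs two scenes to disambiguate "ball", whereas mutual exclusivity lets a novel word ("wug") be attributed after a single exposure.

```python
from collections import defaultdict


class HebbianLexicon:
    """Toy word-to-referent associator (hypothetical, not the ERA code).

    Discrete labels stand in for the winning units of a word SOM and an
    object SOM; each association weight is a Hebbian co-activation count.
    """

    def __init__(self, rate=1.0, mutual_exclusivity=True):
        self.rate = rate                      # Hebbian learning rate (assumed)
        self.mutual_exclusivity = mutual_exclusivity
        self.weights = defaultdict(float)     # (word, referent) -> weight
        self.known_words = set()

    def best_referent(self, word):
        """Return the uniquely strongest referent for `word`, or None if ambiguous."""
        scores = {ref: v for (w, ref), v in self.weights.items() if w == word}
        if not scores:
            return None
        top = max(scores.values())
        winners = [ref for ref, v in scores.items() if v == top]
        return winners[0] if len(winners) == 1 else None

    def observe(self, words, referents):
        """Update associations from one scene: words heard, referents in view."""
        # Referents that some already-known word unambiguously names.
        named = {self.best_referent(w) for w in self.known_words} - {None}
        for word in words:
            targets = list(referents)
            if self.mutual_exclusivity and word not in self.known_words:
                # Mutual exclusivity: a novel word prefers an unnamed referent.
                unnamed = [r for r in referents if r not in named]
                if unnamed:
                    targets = unnamed
            # Cross-situational (Hebbian) update: strengthen every
            # co-present word-referent pair that survived the filter.
            for ref in targets:
                self.weights[(word, ref)] += self.rate
        self.known_words.update(words)


# Usage: two ambiguous scenes, then one scene containing a novel word.
lex = HebbianLexicon()
lex.observe(["ball"], ["BALL", "CUP"])         # "ball" still ambiguous
lex.observe(["ball"], ["BALL", "DOG"])         # cross-situational: BALL wins
lex.observe(["ball", "wug"], ["BALL", "TOY"])  # mutual exclusivity: wug -> TOY
print(lex.best_referent("ball"), lex.best_referent("wug"))  # BALL TOY
```

With `mutual_exclusivity=False` the same three scenes leave "wug" tied between BALL and TOY, so the single-exposure attribution depends on the two mechanisms working together rather than on co-occurrence statistics alone.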