Connectionist model of object reaching and grasping based on reinforcement learning

Zdechovan, L. & Farkas, I.

Comenius University in Bratislava

Developmental cognitive robotics, a field of rising interest, seeks to provide new understanding of how cognitive functions develop, using a synthetic approach, and of how physical embodiment shapes information structuring through an agent's interactions with the environment. We use a simulator of the humanoid robot iCub, whose task is to master reaching for and grasping objects of various shapes (block, cylinder, ball) at various locations within the robot's reach. The novelty of the approach lies in the action learning modules, which are based on biologically plausible reinforcement learning that, unlike supervised learning, does not use any target variables. The model operates in continuous state and action spaces, which lend themselves to generalization. We approach the task by devising two separate but interacting modules, Reaching and Grasping, each with an actor-critic architecture in which the actor learns motor actions (changes of joint angles) and the critic learns to estimate the reward for visited states. Both modules, each implemented with two multi-layer perceptrons (actor and critic), are trained by the recently proposed Continuous Actor-Critic Learning Automaton (CACLA; van Hasselt, 2007), using visual and proprioceptive information and, in Grasping, also haptic/tactile information. We experiment with the CACLA algorithm and, for Reaching, with two of its modifications. The learning of both modules is driven by the reward function, whose design is an important feature of the approach, “motivating” the robot to perform actions that maximize the reward. The reward function combines features that reflect the arm position and the mutual relationship between the approaching hand and the object. We show that iCub, using 12 degrees of freedom in its right arm, learns to reach for novel object positions and becomes able to perform all three types of grasp (power, side, and precision, learned roughly in this order) in fewer than 500 training episodes.
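
As an illustration of the CACLA update rule named above, the following minimal Python sketch shows one learning step. It is not the authors' implementation: it assumes linear function approximators in place of the multi-layer perceptrons, and the class name `LinearCacla`, the hyperparameters (`alpha`, `beta`, `gamma`, `sigma`) and their values are hypothetical choices made for the example. The two modifications of CACLA mentioned in the abstract are not covered.

```python
import numpy as np


class LinearCacla:
    """Minimal CACLA sketch. Assumption: linear actor and critic, not MLPs."""

    def __init__(self, state_dim, action_dim,
                 alpha=0.1, beta=0.1, gamma=0.9, sigma=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.actor_w = np.zeros((action_dim, state_dim))  # actor weights
        self.critic_w = np.zeros(state_dim)               # critic weights
        self.alpha = alpha  # critic learning rate (hypothetical value)
        self.beta = beta    # actor learning rate (hypothetical value)
        self.gamma = gamma  # discount factor (hypothetical value)
        self.sigma = sigma  # std of Gaussian exploration noise

    def value(self, s):
        """Critic: estimated value of state s."""
        return float(self.critic_w @ s)

    def act(self, s):
        """Actor: deterministic action, e.g. changes of joint angles."""
        return self.actor_w @ s

    def explore(self, s):
        """Action actually executed: actor output plus Gaussian noise."""
        noise = self.rng.normal(0.0, self.sigma, size=self.actor_w.shape[0])
        return self.act(s) + noise

    def update(self, s, a, r, s_next, terminal=False):
        """One CACLA step for the transition (s, a, r, s_next)."""
        target = r if terminal else r + self.gamma * self.value(s_next)
        delta = target - self.value(s)  # temporal-difference error
        # The critic is always moved toward the TD target.
        self.critic_w += self.alpha * delta * s
        # The actor is moved toward the executed action only when that
        # action turned out better than expected (delta > 0).
        if delta > 0:
            self.actor_w += self.beta * np.outer(a - self.act(s), s)
        return delta
```

The actor is trained toward the explored action itself rather than toward an externally supplied target, which is why no target variables are needed. A typical loop under these assumptions would call `a = agent.explore(s)`, apply `a` in the simulator to obtain `r` and `s_next`, and then call `agent.update(s, a, r, s_next)`.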