Deferred Restructuring of Experience in Autonomous Machines

DREAM overview

Accumulating knowledge over long periods of time requires a consolidation process, so as to avoid being overwhelmed by the abundance of incoming information. Sleep has been shown to be critical for many consolidation processes, such as restructuring of representations, maintaining knowledge integration and coherence, improving insight learning, driving abstractions, forming novel levels of description, deleting unwanted information, exploring recombination of concepts, and stimulating creative thinking (Wagner et al., Nature, 2004). Our targeted scientific breakthrough is to enable robots to gain an open-ended understanding of the world over long periods of time, with alternating periods of experience and sleep. The role of sleep has so far been neglected in robotics and artificial intelligence.

To achieve higher levels of autonomy and understanding in developmental robotics, we propose a paradigm shift with DREAM, a cognitive architecture that exploits sleep to improve its functioning. We contend that Evolutionary Neurodynamic ensemble methods (Fernando et al., Frontiers in Comp Neuro, 2012; Bellas et al., IEEE-TAMD, 2010) are a unifying principle for creative thinking and knowledge consolidation; these methods form the core of DREAM. Our key insight is that the brain consists of three coupled subsystems that are generated and adapted according to experience through evolutionary means: Models, which make predictions about the future state of the environment, notably to understand the results of actions; Policies, which generate actions and behaviors and are related to task-specific perceptual features; and Values, which reward, evaluate and compare policies or models. The long-term vision is to build genuinely situated and embodied agents with beliefs, desires, personalities, and idiosyncrasies, who are as inevitably influenced by their individual developmental trajectories as we are. To reach the proposed adaptive properties, the architecture will alternate between active interaction and passive introspection over past events, i.e. sleep, and must satisfy the following realistic and measurable objectives:

Objective 1

Evolve and decompose new values and motivations in an open-ended manner, on the basis of a low-dimensional set of immutable intrinsic motivations and other self-built values. Users should be able to guide the evolution of the value system, and values should be exchangeable between different robots. The discovery of new policies and models enables the robot to associate specific skills with the motivations they can fulfill, which drives the exploration of its own abilities. This objective is met when the robot discovers its own talents and capabilities in an open-ended manner ("I can move objects"), can use human interaction to guide and accelerate this discovery ("I should move objects there"), and can use its discovered talents to fulfill externally specified tasks ("tidy the table").

Objective 2

Restructure representations and models to understand and organize the dynamics of being in a particular environment. This understanding is organized as a set of predictive models and efficient policies, which are acquired during experience and then evolved, refined and restructured during sleep. This objective is met when restructuring enables the robot to understand basic causal relationships such as gravity (as studied in Daniel Povinelli's experiments with chimpanzees), or more task-related relationships ("different skills are required to manipulate objects of type A and B").

Objective 3

Consolidate knowledge by committing successful predictive models (those that predict environment dynamics well), value decompositions (those that satisfy lower-level motivations), and policies (those that fulfill values) to long-term memory. This consolidation takes place during sleep. This objective is met when robots can switch between different contexts or domains without catastrophic forgetting of previously acquired models, values and policies ("I am able to manipulate books in the library and cutlery in the kitchen").

Objective 4

Expand knowledge through social interactions by sharing knowledge between different beings in the environment. Different experiences can lead to different models, policies or values, depending on the encountered conditions. Sharing knowledge can lead (1) to the identification of the most efficient ones and (2) to more generic and robust values, models and policies through a consolidation over the knowledge acquired by the different beings. This objective is met when a robot can generalize to an unforeseen context through the experience of other robots ("I can manipulate cutlery though I never went to a kitchen before"). Robots should be able to identify one another based on their capabilities and experience to enable privileged information sharing ("we have always been very close, I trust you more than anyone else to solve this new problem").
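
As a rough illustration of the alternation between experience and sleep, and of the consolidation targeted by Objective 3, the following Python sketch may help. It is a toy under strong assumptions and not the DREAM architecture: the one-dimensional environment, the linear "gain" models, the simple mutate-and-select loop standing in for the evolutionary methods, and every name used (wake, sleep, prediction_error, run, long_term_memory) are hypothetical choices made for this example only.

```python
import random


def wake(env_gain, n_steps, rng):
    """Active phase: collect (action, outcome) pairs from a toy environment."""
    experience = []
    for _ in range(n_steps):
        action = rng.uniform(-1.0, 1.0)
        outcome = env_gain * action  # assumed toy dynamics: outcome scales with action
        experience.append((action, outcome))
    return experience


def prediction_error(model_gain, experience):
    """Mean squared error of a linear predictive model on replayed experience."""
    return sum((model_gain * a - o) ** 2 for a, o in experience) / len(experience)


def sleep(population, experience, rng, generations=30, sigma=0.2):
    """Passive phase: evolve candidate models offline against stored experience."""
    for _ in range(generations):
        # Variation: each parent produces one mutated child.
        children = [g + rng.gauss(0.0, sigma) for g in population]
        # Selection: keep the best individuals among parents and children.
        pool = sorted(population + children,
                      key=lambda g: prediction_error(g, experience))
        population = pool[:len(population)]
    return population  # sorted, best model first


def run(contexts, seed=0):
    """Alternate wake and sleep over several contexts, consolidating after each."""
    rng = random.Random(seed)
    long_term_memory = {}
    for name, env_gain in contexts.items():
        experience = wake(env_gain, 50, rng)
        population = [rng.uniform(-3.0, 3.0) for _ in range(8)]
        population = sleep(population, experience, rng)
        # Consolidation: commit the best model under its own context key,
        # so learning a new context does not overwrite an earlier one.
        long_term_memory[name] = population[0]
    return long_term_memory


memory = run({"library": 2.0, "kitchen": -0.5})
for context, gain in memory.items():
    print(context, round(gain, 2))
```

In this sketch each context keeps its own consolidated model, which is the simplest possible way to avoid overwriting earlier knowledge; it says nothing about how values and policies would be evolved or how models would be restructured and shared.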

To achieve these objectives, DREAM is organized into eight work packages:

WP1: Value decomposition and restructuring (WP leader: Richard Duro, UDC)
WP2: Representation restructuring (STM) (WP leader: David Filliat, ENSTA ParisTech / Armines)
WP3: Knowledge consolidation and restructuring (LTM) (WP leader: Richard Duro, UDC)
WP4: Collective sharing of knowledge (WP leader: A.E. Eiben, VU/VUmc)
WP5: Computational neuroscience and cross-fertilization (WP leader: Benoît Girard, SU)
WP6: Integration and demonstration (WP leader: Stéphane Doncieux, SU)
WP7: Dissemination (WP leader: Nicolas Bredeche, SU)
WP8: Project Management (WP leader: Zoitsa Siaplaoura Karagkouni, SU)