Letter
doya{at}atr.co.jp, Human Information Science Laboratories, ATR International, Seika, Soraku, Kyoto 619-0288, Japan; CREST, Japan Science and Technology Corporation, Seika, Soraku, Kyoto 619-0288, Japan; Kawato Dynamic Brain Project, ERATO, Japan Science and Technology Corporation, Seika, Soraku, Kyoto 619-0288, Japan; and Nara Institute of Science and Technology, Ikoma, Nara 630-0101, Japan
samejima{at}atr.co.jp, Human Information Science Laboratories, ATR International, Seika, Soraku, Kyoto 619-0288, Japan, and Kawato Dynamic Brain Project, ERATO, Japan Science and Technology Corporation, Seika, Soraku, Kyoto 619-0288, Japan
keniti-k{at}syd.odn.ne.jp, ATR Human Information Processing Research Laboratories, Seika, Soraku, Kyoto 619-0288, Japan, and Nara Institute of Science and Technology, Ikoma, Nara 630-0101, Japan
kawato{at}atr.co.jp, Human Information Science Laboratories, ATR International, Seika, Soraku, Kyoto 619-0288, Japan; Kawato Dynamic Brain Project, ERATO, Japan Science and Technology Corporation, Seika, Soraku, Kyoto 619-0288, Japan; and Nara Institute of Science and Technology, Ikoma, Nara 630-0101, Japan
We propose a modular reinforcement learning architecture for nonlinear, nonstationary control tasks, which we call multiple model-based reinforcement learning (MMRL). The basic idea is to decompose a complex task into multiple domains in space and time based on the predictability of the environmental dynamics. The system is composed of multiple modules, each of which consists of a state prediction model and a reinforcement learning controller. The "responsibility signal," which is given by the softmax function of the prediction errors, is used to weight the outputs of multiple modules, as well as to gate the learning of the prediction models and the reinforcement learning controllers. We formulate MMRL for both the discrete-time, finite-state case and the continuous-time, continuous-state case. The performance of MMRL was demonstrated for the discrete case in a nonstationary hunting task in a grid world and for the continuous case in a nonlinear, nonstationary control task of swinging up a pendulum with variable physical parameters.
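As a rough illustration of the responsibility signal described above, the following minimal sketch computes a softmax over the modules' prediction errors and uses the result to blend module outputs and to scale a learning update. The abstract does not give the exact formula, so the squared-error form, the width parameter `sigma`, and the function names `responsibility`, `blend`, and `gated_update` are assumptions made for illustration only, not the authors' implementation.

```python
import math


def responsibility(errors, sigma=1.0):
    """Softmax over negative scaled squared prediction errors.

    Assumption: each module's score is -(error^2) / (2 * sigma^2), so a
    module with a smaller prediction error gets a larger weight.
    """
    scores = [-(e ** 2) / (2.0 * sigma ** 2) for e in errors]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def blend(outputs, weights):
    """Responsibility-weighted sum of the modules' controller outputs."""
    return sum(w * u for w, u in zip(weights, outputs))


def gated_update(param, delta, weight, lr=0.1):
    """Scale a module's learning step by its responsibility (hypothetical rule)."""
    return param + lr * weight * delta


# Three hypothetical modules: prediction errors and scalar controller outputs.
errors = [0.1, 1.0, 2.0]
outputs = [1.0, -1.0, 0.5]

lam = responsibility(errors, sigma=0.5)
action = blend(outputs, lam)
new_params = [gated_update(0.0, 1.0, w) for w in lam]
```

In this sketch the module that predicts the environment best dominates both the blended action and the size of its own learning step, while poorly predicting modules are largely left unchanged. A smaller `sigma` makes the selection sharper; a larger one spreads responsibility more evenly across modules.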