(Neural Computation. 2002;14:1347-1369.)
© 2002 The MIT Press


Letter

Multiple Model-Based Reinforcement Learning

Kenji Doya

doya{at}atr.co.jp, Human Information Science Laboratories, ATR International, Seika, Soraku, Kyoto 619-0288, Japan; CREST, Japan Science and Technology Corporation, Seika, Soraku, Kyoto 619-0288, Japan; Kawato Dynamic Brain Project, ERATO, Japan Science and Technology Corporation, Seika, Soraku, Kyoto 619-0288, Japan; and Nara Institute of Science and Technology, Ikoma, Nara 630-0101, Japan

Kazuyuki Samejima

samejima{at}atr.co.jp, Human Information Science Laboratories, ATR International, Seika, Soraku, Kyoto 619-0288, Japan, and Kawato Dynamic Brain Project, ERATO, Japan Science and Technology Corporation, Seika, Soraku, Kyoto 619-0288, Japan

Ken-ichi Katagiri

keniti-k{at}syd.odn.ne.jp, ATR Human Information Processing Research Laboratories, Seika, Soraku, Kyoto 619-0288, Japan, and Nara Institute of Science and Technology, Ikoma, Nara 630-0101, Japan

Mitsuo Kawato

kawato{at}atr.co.jp, Human Information Science Laboratories, ATR International, Seika, Soraku, Kyoto 619-0288, Japan; Kawato Dynamic Brain Project, ERATO, Japan Science and Technology Corporation, Seika, Soraku, Kyoto 619-0288, Japan; and Nara Institute of Science and Technology, Ikoma, Nara 630-0101, Japan

We propose a modular reinforcement learning architecture for nonlinear, nonstationary control tasks, which we call multiple model-based reinforcement learning (MMRL). The basic idea is to decompose a complex task into multiple domains in space and time based on the predictability of the environmental dynamics. The system is composed of multiple modules, each of which consists of a state prediction model and a reinforcement learning controller. The "responsibility signal," which is given by the softmax function of the prediction errors, is used to weight the outputs of the multiple modules, as well as to gate the learning of the prediction models and the reinforcement learning controllers. We formulate MMRL for both the discrete-time, finite-state case and the continuous-time, continuous-state case. The performance of MMRL was demonstrated for the discrete case in a nonstationary hunting task in a grid world and for the continuous case in a nonlinear, nonstationary control task of swinging up a pendulum with variable physical parameters.
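As a rough illustration of the responsibility signal described above, the following minimal sketch computes a softmax over the modules' prediction errors and uses the result to blend the modules' outputs. The abstract does not give the exact formula, so the Gaussian-style scaling by a width parameter `sigma`, the function names `responsibilities` and `blend`, and the example numbers are all assumptions made for illustration, not the authors' implementation.

```python
import math

def responsibilities(errors, sigma=1.0):
    """Softmax over negatively scaled prediction errors (illustrative).

    errors: one squared prediction error per module.
    sigma:  assumed width parameter; smaller values sharpen the selection.
    """
    scores = [-e / (2.0 * sigma ** 2) for e in errors]
    m = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def blend(outputs, lam):
    """Weight each module's scalar output by its responsibility."""
    return sum(l * u for l, u in zip(lam, outputs))

# Hypothetical example: three modules, the first predicts the state best.
errors = [0.1, 2.0, 5.0]
lam = responsibilities(errors, sigma=1.0)
action = blend([1.0, -1.0, 0.5], lam)
```

The module with the smallest prediction error receives the largest responsibility, so its controller dominates the blended action. The same weights could also scale each module's learning rate, which is one way to read the "gating" of learning mentioned in the abstract.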







