(Neural Computation. 2000;12:2451-2471.)
© 2000 The MIT Press


Letter

Learning to Forget: Continual Prediction with LSTM

Felix A. Gers

IDSIA, 6900 Lugano, Switzerland

Jürgen Schmidhuber

IDSIA, 6900 Lugano, Switzerland

Fred Cummins

IDSIA, 6900 Lugano, Switzerland

Long short-term memory (LSTM; Hochreiter & Schmidhuber, 1997) can solve numerous tasks not solvable by previous learning algorithms for recurrent neural networks (RNNs). We identify a weakness of LSTM networks processing continual input streams that are not a priori segmented into subsequences with explicitly marked ends at which the network's internal state could be reset. Without resets, the state may grow indefinitely and eventually cause the network to break down. Our remedy is a novel, adaptive "forget gate" that enables an LSTM cell to learn to reset itself at appropriate times, thus releasing internal resources. We review illustrative benchmark problems on which standard LSTM outperforms other RNN algorithms. All algorithms (including LSTM) fail to solve continual versions of these problems. LSTM with forget gates, however, easily solves them, and in an elegant way.
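The state growth described in the abstract can be illustrated with a minimal scalar sketch. It is not the authors' implementation or experimental setup. It assumes a single cell whose state is updated as s_t = f * s_{t-1} + i * tanh(x_t), where f = 1 stands for a cell without a forget gate. The names `run_cell`, `forget_bias`, and `input_gate` are hypothetical, and the forget gate is held at a constant value here, whereas in the paper it is adaptive and learned.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function used as the gate activation."""
    return 1.0 / (1.0 + math.exp(-x))


def run_cell(inputs, forget_bias=None, input_gate=1.0):
    """Track the internal state of one simplified scalar memory cell.

    forget_bias=None models a cell without a forget gate: the previous
    state is carried over with weight 1.0. Otherwise the previous state
    is scaled by a constant gate value sigmoid(forget_bias). This constant
    is an assumption for illustration; a real forget gate depends on the
    input and on learned weights.
    """
    state = 0.0
    history = []
    for x in inputs:
        candidate = math.tanh(x)  # squashed cell input
        f = 1.0 if forget_bias is None else sigmoid(forget_bias)
        state = f * state + input_gate * candidate
        history.append(state)
    return history


# A continual stream with no segment boundaries and no external resets.
stream = [1.0] * 1000

no_forget = run_cell(stream)                    # state grows linearly
with_forget = run_cell(stream, forget_bias=2.0) # state settles at a bound

print(f"final state without forget gate: {no_forget[-1]:.2f}")
print(f"final state with forget gate:    {with_forget[-1]:.2f}")
```

Under these assumptions the state without a forget gate grows in proportion to the stream length, while the gated state converges to tanh(1) / (1 - f). A learned gate could additionally drive f toward 0 at appropriate times, which corresponds to the self-reset the abstract describes.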



