
Neural Computation, Vol 9, 1735-1780, Copyright © 1997 by The MIT Press


LETTERS

Long Short-Term Memory

Sepp Hochreiter and Jürgen Schmidhuber

Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it by introducing a novel, efficient, gradient-based method called long short-term memory (LSTM). Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units. Multiplicative gate units learn to open and close access to the constant error flow. LSTM is local in space and time; its computational complexity per time step and weight is O(1). Our experiments with artificial data involve local, distributed, real-valued, and noisy pattern representations. In comparisons with real-time recurrent learning, backpropagation through time, recurrent cascade correlation, Elman nets, and neural sequence chunking, LSTM leads to many more successful runs and learns much faster. LSTM also solves complex, artificial long-time-lag tasks that have never been solved by previous recurrent network algorithms.
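
To make the gating idea in the abstract concrete, here is a minimal sketch of one forward step of a single gated memory cell. It is an illustration under stated assumptions, not the authors' implementation: the function name `memory_cell_step`, the use of `tanh` as the squashing function, and the hand-picked gate inputs are all hypothetical choices made for this example. The point it shows is that the internal state is carried forward with a fixed self-connection of weight 1.0, so the derivative of the new state with respect to the old state is exactly 1 (the "constant error carousel"), while multiplicative gates decide when the state is written and when it is read.

```python
import math


def sigmoid(x):
    """Logistic function, used here for the gate activations."""
    return 1.0 / (1.0 + math.exp(-x))


def memory_cell_step(state, net_c, net_in, net_out):
    """One forward step of a single gated memory cell (illustrative only).

    state   -- internal cell state carried over from the previous step
    net_c   -- net input to the cell itself (assumed precomputed)
    net_in  -- net input to the input gate (assumed precomputed)
    net_out -- net input to the output gate (assumed precomputed)
    """
    y_in = sigmoid(net_in)    # input gate: controls write access to the state
    y_out = sigmoid(net_out)  # output gate: controls read access to the state
    # The old state is added back unchanged (self-connection weight 1.0),
    # so d(new_state)/d(state) == 1 regardless of the gate values.
    new_state = state + y_in * math.tanh(net_c)
    output = y_out * math.tanh(new_state)
    return new_state, output


# Write a value once with the input gate open and the output gate closed.
state, _ = memory_cell_step(0.0, net_c=2.0, net_in=10.0, net_out=-10.0)
stored = state

# Keep both gates closed for 1000 steps while distracting input arrives.
for _ in range(1000):
    state, _ = memory_cell_step(state, net_c=-3.0, net_in=-30.0, net_out=-10.0)

# Open the output gate to read the stored value back out.
_, out = memory_cell_step(state, net_c=0.0, net_in=-30.0, net_out=10.0)
```

In this sketch the gate inputs are fixed by hand; in a trained network they would be computed from learned weights, which is what lets the gates learn when to open and close.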


This article has been cited by other articles:

A. Gruning
Elman Backpropagation as Reinforcement for Simple Recurrent Networks
Neural Comput., November 1, 2007; 19(11): 3108 - 3131.

S. Hochreiter, M. Heusel, and K. Obermayer
Fast model-based protein homology detection without alignment
Bioinformatics, July 15, 2007; 23(14): 1728 - 1736.

J. Schmidhuber, D. Wierstra, M. Gagliolo, and F. Gomez
Training Recurrent Networks by Evolino
Neural Comput., March 1, 2007; 19(3): 757 - 779.

J. A. Franklin
Recurrent Neural Networks for Music Computation
INFORMS Journal on Computing, January 1, 2006; 18(3): 321 - 338.

M. C. Ozturk, D. Xu, and J. C. Principe
Analysis and Design of Echo State Networks
Neural Comput., January 1, 2006; 19(1): 111 - 138.

N. P. Rougier, D. C. Noelle, T. S. Braver, J. D. Cohen, and R. C. O'Reilly
Prefrontal cortex and flexible cognitive control: Rules without symbols
PNAS, May 17, 2005; 102(20): 7338 - 7343.

R. C. O'Reilly and M. J. Frank
Making Working Memory Work: A Computational Model of Learning in the Prefrontal Cortex and Basal Ganglia
Neural Comput., February 1, 2005; 18(2): 283 - 328.

B. Hammer and P. Tino
Recurrent Neural Networks with Small Weights Implement Definite Memory Machines
Neural Comput., August 1, 2003; 15(8): 1897 - 1929.

G. de A. Barreto, A. F. R. Araujo, and S. C. Kremer
A Taxonomy for Spatiotemporal Connectionist Networks Revisited: The Unsupervised Case
Neural Comput., June 1, 2003; 15(6): 1255 - 1320.

A. D. Back and T. Chen
Universal Approximation of Multiple Nonlinear Operators by Neural Networks
Neural Comput., November 1, 2002; 14(11): 2561 - 2566.

J. Schmidhuber, F. Gers, and D. Eck
Learning Nonregular Languages: A Comparison of Simple Recurrent Networks and LSTM
Neural Comput., September 1, 2002; 14(9): 2039 - 2041.

A. Aussem
Sufficient Conditions for Error Backflow Convergence in Dynamical Recurrent Neural Networks
Neural Comput., August 1, 2002; 14(8): 1907 - 1927.

S. C. Kremer
Spatiotemporal Connectionist Networks: A Taxonomy and Review
Neural Comput., February 1, 2001; 13(2): 249 - 306.

F. A. Gers, J. Schmidhuber, and F. Cummins
Learning to Forget: Continual Prediction with LSTM
Neural Comput., October 1, 2000; 12(10): 2451 - 2471.


