Neural Computation, Vol 9, 1735-1780, Copyright © 1997 by The MIT Press
LETTERS
Sepp Hochreiter and Jürgen Schmidhuber
Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it by introducing a novel, efficient, gradient-based method called long short-term memory (LSTM). Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units. Multiplicative gate units learn to open and close access to the constant error flow. LSTM is local in space and time; its computational complexity per time step and weight is O(1). Our experiments with artificial data involve local, distributed, real-valued, and noisy pattern representations. In comparisons with real-time recurrent learning, backpropagation through time, recurrent cascade correlation, Elman nets, and neural sequence chunking, LSTM leads to many more successful runs, and learns much faster. LSTM also solves complex, artificial long-time-lag tasks that have never been solved by previous recurrent network algorithms.
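To make the gating idea in the abstract concrete, the following is a minimal sketch of the forward pass of a single memory cell. It is not the authors' implementation. The function names (`cell_step`, `run_demo`), the use of `tanh` as the squashing function, and the hand-picked gate activations are all assumptions for illustration; in a trained network the gate inputs would be computed from learned weights. The cell state is updated additively with a fixed self-connection of 1.0 (the constant error carousel), so a stored value survives a long lag as long as the input gate stays closed.

```python
import math


def sigmoid(x):
    """Logistic function, used here for both gate activations."""
    return 1.0 / (1.0 + math.exp(-x))


def cell_step(state, cell_input, in_gate_net, out_gate_net):
    """One forward step of a single memory cell (illustrative sketch).

    state        -- internal cell state carried over from the previous step
    cell_input   -- net input to the cell
    in_gate_net  -- net input to the multiplicative input gate
    out_gate_net -- net input to the multiplicative output gate
    """
    y_in = sigmoid(in_gate_net)    # near 0: write access closed, near 1: open
    y_out = sigmoid(out_gate_net)  # near 0: read access closed, near 1: open
    # Additive update with self-connection weight 1.0: the old state is
    # carried over unchanged, and the gated input is added on top.
    new_state = state + y_in * math.tanh(cell_input)
    output = y_out * math.tanh(new_state)
    return new_state, output


def run_demo(lag=1000):
    """Store a value, hold it for `lag` steps of distractor input, read it out."""
    # Step 0: input gate open, output gate closed, so the value is written.
    state, _ = cell_step(0.0, 1.0, in_gate_net=20.0, out_gate_net=-20.0)
    stored = state
    # Lag: both gates closed, so distractor input cannot reach the state.
    for _ in range(lag):
        state, _ = cell_step(state, 0.5, in_gate_net=-40.0, out_gate_net=-20.0)
    # Final step: output gate open, so the stored value is read out.
    _, output = cell_step(state, 0.0, in_gate_net=-40.0, out_gate_net=20.0)
    return stored, state, output


if __name__ == "__main__":
    stored, state, output = run_demo(1000)
    print(f"stored={stored:.6f} state_after_lag={state:.6f} output={output:.6f}")
```

Because the state update is a plain sum, the derivative of the new state with respect to the old state is exactly 1 in this sketch, which is the sense in which error can flow back through the cell without decaying. Learning the gate weights, and the truncated gradient the abstract mentions, are not shown here.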