(Neural Computation. 2002;14:1771-1800.)
© 2002 The MIT Press

Training Products of Experts by Minimizing Contrastive Divergence

Geoffrey E. Hinton

hinton{at}cs.toronto.edu, Gatsby Computational Neuroscience Unit, University College London, London WC1N 3AR, U.K.

It is possible to combine multiple latent-variable models of the same data by multiplying their probability distributions together and then renormalizing. This way of combining individual "expert" models makes it hard to generate samples from the combined model but easy to infer the values of the latent variables of each expert, because the combination rule ensures that the latent variables of different experts are conditionally independent when given the data. A product of experts (PoE) is therefore an interesting candidate for a perceptual system in which rapid inference is vital and generation is unnecessary. Training a PoE by maximizing the likelihood of the data is difficult because it is hard even to approximate the derivatives of the renormalization term in the combination rule. Fortunately, a PoE can be trained using a different objective function called "contrastive divergence" whose derivatives with regard to the parameters can be approximated accurately and efficiently. Examples are presented of contrastive divergence learning using several types of expert on several types of data.
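The combination rule described in the abstract (multiply the experts' distributions, then renormalize) can be illustrated on a toy discrete space. The following is only a minimal sketch under stated assumptions, not the paper's implementation: the function name `product_of_experts` and the two four-state example distributions are hypothetical, and each expert is given as an explicit probability table rather than as a latent-variable model. With explicit tables the renormalizing sum is trivial; in a realistic model the same sum runs over every possible data vector, and that sum is the renormalization term the abstract identifies as the source of difficulty for maximum-likelihood training.

```python
from math import prod, isclose


def product_of_experts(experts):
    """Combine discrete distributions by pointwise product, then renormalize.

    `experts` is a list of equal-length probability tables over the same
    finite set of states (an illustrative assumption, not the paper's setup).
    """
    n_states = len(experts[0])
    if any(len(e) != n_states for e in experts):
        raise ValueError("all experts must cover the same set of states")

    # Unnormalized product: a state scores highly only if every expert
    # assigns it reasonable probability.
    unnormalized = [prod(e[i] for e in experts) for i in range(n_states)]

    # Renormalization term: cheap here, intractable when the state space
    # is too large to enumerate.
    z = sum(unnormalized)
    if z == 0:
        raise ValueError("experts assign zero probability to every state")
    return [u / z for u in unnormalized]


# Two hypothetical experts over four states.
expert_a = [0.5, 0.25, 0.125, 0.125]
expert_b = [0.25, 0.5, 0.125, 0.125]

combined = product_of_experts([expert_a, expert_b])
print(combined)
```

In this toy case the combined distribution puts less mass on the two states that both experts consider unlikely than either expert does alone, which is the sharpening effect that a product gives and a mixture does not.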




This article has been cited by other articles:


I. Sutskever and G. E. Hinton
Deep, Narrow Sigmoid Belief Networks Are Universal Approximators
Neural Comput., November 1, 2008; 20(11): 2629 - 2636.


J. R. Movellan
Contrastive Divergence in Gaussian Diffusions
Neural Comput., September 1, 2008; 20(9): 2238 - 2252.


N. Le Roux and Y. Bengio
Representational Power of Restricted Boltzmann Machines and Deep Belief Networks
Neural Comput., June 1, 2008; 20(6): 1631 - 1649.


P. Byrne and S. Becker
A principle for learning egocentric-allocentric transformation.
Neural Comput., March 1, 2008; 20(3): 709 - 737.


S.-i. Amari
Integration of Stochastic Models by Minimizing α-Divergence
Neural Comput., October 1, 2007; 19(10): 2780 - 2796.


R. Turner and M. Sahani
A maximum-likelihood interpretation for slow feature analysis.
Neural Comput., April 1, 2007; 19(4): 1022 - 1038.


A. Hyvärinen
Consistency of Pseudolikelihood Estimation of Fully Visible Boltzmann Machines.
Neural Comput., October 1, 2006; 18(10): 2283 - 2292.


S. Osindero, M. Welling, and G. E. Hinton
Topographic Product Models Applied to Natural Scene Statistics
Neural Comput., February 1, 2006; 18(2): 381 - 414.



