hinton{at}cs.toronto.edu, Gatsby Computational Neuroscience Unit, University College London, London WC1N 3AR, U.K.
It is possible to combine multiple latent-variable models of the same data by multiplying their probability distributions together and then renormalizing. This way of combining individual "expert" models makes it hard to generate samples from the combined model but easy to infer the values of the latent variables of each expert, because the combination rule ensures that the latent variables of different experts are conditionally independent when given the data. A product of experts (PoE) is therefore an interesting candidate for a perceptual system in which rapid inference is vital and generation is unnecessary. Training a PoE by maximizing the likelihood of the data is difficult because it is hard even to approximate the derivatives of the renormalization term in the combination rule. Fortunately, a PoE can be trained using a different objective function called "contrastive divergence" whose derivatives with regard to the parameters can be approximated accurately and efficiently. Examples are presented of contrastive divergence learning using several types of expert on several types of data.
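As a minimal sketch of the combination rule described above, the following assumes each expert is simply a discrete probability table over the same small set of states. This is a simplification: the experts in the abstract are latent-variable models. The function name `product_of_experts` and the example tables are hypothetical and chosen only for illustration. The sketch multiplies the experts' probabilities state by state and then divides by the sum over all states, which is the renormalization term.

```python
from math import prod


def product_of_experts(experts):
    """Multiply discrete distributions state by state, then renormalize.

    `experts` is a list of probability tables, all over the same states.
    This toy setup is an assumption made for illustration only.
    """
    n_states = len(experts[0])
    if any(len(e) != n_states for e in experts):
        raise ValueError("all experts must cover the same states")

    # Unnormalized product: one factor per expert for each state.
    unnormalized = [prod(e[i] for e in experts) for i in range(n_states)]

    # Renormalization term: the sum of the product over every state.
    z = sum(unnormalized)
    if z == 0.0:
        raise ValueError("the experts assign zero joint mass to every state")

    return [u / z for u in unnormalized]


# Two hypothetical experts that each spread their mass broadly.
expert_a = [0.5, 0.4, 0.1]
expert_b = [0.1, 0.4, 0.5]
combined = product_of_experts([expert_a, expert_b])
```

In this toy case the combined distribution concentrates on the middle state, the only one to which both experts give substantial probability, so the product is sharper than either expert alone. The renormalization sum is trivial here because there are only three states; over a large data space it cannot be enumerated this way, which is the source of the training difficulty the abstract mentions.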