(Neural Computation. 2003;15:1013-1033.)
© 2003 The MIT Press


Letter

Learning Coefficients of Layered Models When the True Distribution Mismatches the Singularities

Sumio Watanabe

swatanab{at}pi.titech.ac.jp, Precision and Intelligence Laboratory, Tokyo Institute of Technology, Midori-ku, Yokohama, 226-8503 Japan

Shun-ichi Amari

amari{at}brain.riken.go.jp, Laboratory for Mathematical Neuroscience, RIKEN Brain Science Institute, Wako-shi, Saitama, 351-0198, Japan

Hierarchical learning machines such as layered neural networks have singularities in their parameter spaces. At singularities, the Fisher information matrix becomes degenerate, so the conventional learning theory of regular statistical models does not hold. Recently, it was proved that if the parameter of the true distribution is contained in the singularities of the learning machine, the generalization error in Bayes estimation is asymptotically equal to λ/n, where 2λ is smaller than the dimension of the parameter and n is the number of training samples. However, the constant λ depends strongly on the local geometrical structure of the singularities; hence, the generalization error has not yet been clarified for the case in which the true distribution is almost but not completely contained in the singularities. In this article, in order to analyze such cases, we study the Bayes generalization error under the condition that the Kullback distance of the true distribution from the distribution represented by the singularities is proportional to 1/n, and we show two results. First, if the dimension of the parameter from inputs to hidden units is not larger than three, then there exists a region of true parameters such that the generalization error is larger than that of the corresponding regular model. Second, if the dimension from inputs to hidden units is larger than three, then for an arbitrary true distribution, the generalization error is smaller than that of the corresponding regular model.
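As a rough numerical illustration of the comparison made in the abstract, the sketch below contrasts the asymptotic Bayes generalization error λ/n of a singular model with the value d/(2n) that standard theory gives for a regular model with d parameters. The d/(2n) formula is not stated in the abstract and is used here as the assumed baseline; the function names and the values of d, λ, and n are hypothetical, chosen only so that 2λ < d as the abstract requires.

```python
def regular_generalization_error(d: int, n: int) -> float:
    """Asymptotic Bayes generalization error d/(2n) of a regular model
    with d parameters (assumed baseline, not stated in the abstract)."""
    return d / (2 * n)


def singular_generalization_error(lam: float, n: int) -> float:
    """Asymptotic Bayes generalization error lambda/n when the true
    parameter lies in the singularities of the learning machine."""
    return lam / n


# Hypothetical values for illustration only; they must satisfy 2 * lam < d.
d = 10      # dimension of the parameter
lam = 3.0   # learning coefficient lambda
n = 1000    # number of training samples

regular = regular_generalization_error(d, n)
singular = singular_generalization_error(lam, n)

print(f"regular model  d/(2n)   = {regular:.4f}")
print(f"singular model lambda/n = {singular:.4f}")
```

With these made-up numbers the singular model has the smaller error, which is the situation the abstract describes when the true distribution lies exactly in the singularities. The article's two results concern what happens when the true distribution is only near the singularities, which this sketch does not model.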




This article has been cited by other articles:

H. Wei, J. Zhang, F. Cousseau, T. Ozeki, and S.-i. Amari
Dynamics of learning near singularities in layered networks.
Neural Comput., March 1, 2008; 20(3): 813-843.

S. Nakajima and S. Watanabe
Variational Bayes solution of linear neural networks and its generalization performance.
Neural Comput., April 1, 2007; 19(4): 1112-1153.

S.-i. Amari, H. Park, and T. Ozeki
Singularities affect dynamics of learning in neuromanifolds.
Neural Comput., May 1, 2006; 18(5): 1007-1065.

S. Nakajima and S. Watanabe
Generalization performance of subspace Bayes approach in linear neural networks.
IEICE Trans D: Information, March 1, 2006; E89-D(3): 1128-1138.



