
Neural Computation, Vol 9, 1211-1243, Copyright © 1997 by The MIT Press


ARTICLES

On Bias Plus Variance

David H. Wolpert

This article presents several additive corrections to the conventional quadratic-loss bias-plus-variance formula. One of these corrections is appropriate when the target is not fixed (as in Bayesian analysis) and training sets are averaged over (as in the conventional bias-plus-variance formula). Another additive correction casts conventional fixed-training-set Bayesian analysis directly in terms of bias plus variance. Another correction is appropriate for measuring full generalization error over a test set rather than (as with conventional bias plus variance) error at a single point. Yet another correction can help explain the recent counterintuitive bias-variance decomposition of Friedman for zero-one loss. After presenting these corrections, this article discusses some other loss-function-specific aspects of supervised learning. In particular, it discusses the fact that if the loss function is a metric (e.g., zero-one loss), then there is a bound on the change in generalization error that accompanies changing the algorithm's guess from h₁ to h₂, a bound that depends only on h₁ and h₂ and not on the target. This article ends by presenting versions of the bias-plus-variance formula appropriate for logarithmic and quadratic scoring, and then all the additive corrections appropriate to those formulas. Each of the correction terms presented is a covariance between the learning algorithm and the posterior distribution over targets. Accordingly, in the (very common) contexts in which those terms apply, there is not a "bias-variance trade-off" or a "bias-variance dilemma," as one often hears. Rather, there is a bias-variance-covariance trade-off.
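A minimal Monte Carlo sketch of the general idea, assuming a hypothetical Gaussian toy model that is not taken from the article. For a guess h and a target output y drawn jointly, quadratic loss obeys the identity E[(h - y)^2] = Var(y) + (E[y] - E[h])^2 + Var(h) - 2 Cov(h, y). With a fixed target and test noise independent of the training set, the covariance term vanishes and the conventional bias-plus-variance formula remains. When the target is itself random and training sets are averaged over, the guess co-varies with the target and the covariance term does not vanish. The second function checks, for zero-one loss and randomly drawn label distributions, the triangle-inequality bound |E L(h1, y) - E L(h2, y)| <= L(h1, h2), one bound of the kind described above that depends only on h1 and h2. All function and parameter names (simulate, shrink, n_train, check_metric_bound) are illustrative.

```python
import numpy as np


def quadratic_decomposition(h, y):
    """Split mean quadratic loss E[(h - y)^2] into the four terms of the
    bias-variance-covariance identity, using population (ddof=0) moments,
    for which the identity holds exactly on the sample."""
    h = np.asarray(h, dtype=float)
    y = np.asarray(y, dtype=float)
    intrinsic = y.var()                       # spread of the target's output
    bias_sq = (y.mean() - h.mean()) ** 2      # squared bias
    variance = h.var()                        # variance of the learner's guess
    covariance = np.mean((h - h.mean()) * (y - y.mean()))
    total = np.mean((h - y) ** 2)
    return total, intrinsic, bias_sq, variance, covariance


def simulate(n_trials=200_000, n_train=5, noise=1.0, shrink=0.8, seed=0):
    """Hypothetical toy model: the target is a random constant theta ~ N(0, 1),
    as under a Bayesian prior. Each trial draws a training set of n_train noisy
    observations of theta; the learner guesses shrink * (sample mean), and the
    test output is one more noisy observation of the same theta."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(0.0, 1.0, n_trials)
    train = theta[:, None] + rng.normal(0.0, noise, (n_trials, n_train))
    h = shrink * train.mean(axis=1)
    y = theta + rng.normal(0.0, noise, n_trials)
    return quadratic_decomposition(h, y)


def zero_one(a, b):
    """Zero-one loss, which is a metric on a finite label set."""
    return 0.0 if a == b else 1.0


def check_metric_bound(n_labels=4, n_checks=1000, seed=1):
    """Check |E[L(h1, y)] - E[L(h2, y)]| <= L(h1, h2) for zero-one loss,
    with y drawn from arbitrary (random) distributions over the labels."""
    rng = np.random.default_rng(seed)
    for _ in range(n_checks):
        p = rng.dirichlet(np.ones(n_labels))  # arbitrary distribution over targets
        h1, h2 = rng.integers(0, n_labels, size=2)
        err1 = sum(p[y] * zero_one(h1, y) for y in range(n_labels))
        err2 = sum(p[y] * zero_one(h2, y) for y in range(n_labels))
        if abs(err1 - err2) > zero_one(h1, h2) + 1e-12:
            return False
    return True


total, intrinsic, bias_sq, variance, cov = simulate()
print(f"E[(h - y)^2]               = {total:.4f}")
print(f"intrinsic + bias^2 + var   = {intrinsic + bias_sq + variance:.4f}")
print(f"  ... minus 2 * covariance = {intrinsic + bias_sq + variance - 2 * cov:.4f}")
print("metric bound holds:", check_metric_bound())
```

In this toy model the guess and the test output share the random theta, so the covariance is positive and the three conventional terms alone overstate the expected error; only the version with the covariance subtracted matches the measured loss.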


This article has been cited by other articles:


M. T. A. Shamim, M. Anwaruddin, and H.A. Nagarajaram
Support Vector Machine-based classification of protein folds using the structural properties of amino acid residues and amino acid residue pairs
Bioinformatics, December 15, 2007; 23(24): 3320 - 3327.


Z. Chen and S. Haykin
On Different Facets of Regularization Theory
Neural Comput., December 1, 2002; 14(12): 2791 - 2846.



