Neural Computation, Vol 10, 1895-1923, Copyright © 1998 by The MIT Press
LETTERS
Thomas G. Dietterich
This article reviews five approximate statistical tests for determining whether one learning algorithm outperforms another on a particular learning task. These tests are compared experimentally to determine their probability of incorrectly detecting a difference when no difference exists (type I error). Two widely used statistical tests are shown to have a high probability of type I error in certain situations and should never be used: a test for the difference of two proportions and a paired-differences t test based on taking several random train-test splits. A third test, a paired-differences t test based on 10-fold cross-validation, exhibits a somewhat elevated probability of type I error. A fourth test, McNemar's test, is shown to have low type I error. The fifth test is a new test, 5x2 cv, based on five iterations of twofold cross-validation. Experiments show that this test also has acceptable type I error. The article also measures the power (the ability to detect algorithm differences when they do exist) of these tests. The cross-validated t test is the most powerful. The 5x2 cv test is shown to be slightly more powerful than McNemar's test. The choice of the best test is determined by the computational cost of running the learning algorithm. For algorithms that can be executed only once, McNemar's test is the only test with acceptable type I error. For algorithms that can be executed 10 times, the 5x2 cv test is recommended, because it is slightly more powerful and because it directly measures variation due to the choice of training set.
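The two recommended tests can be illustrated with a minimal sketch. The abstract does not give the formulas, so the code below assumes the commonly stated forms: the continuity-corrected McNemar statistic, and a 5x2 cv t statistic that divides the first fold's error difference by the square root of the averaged per-replication variance. The function names and the input layout (`n01`, `n10`, `diffs`) are hypothetical and chosen only for illustration.

```python
import math


def mcnemar_statistic(n01, n10):
    """Continuity-corrected McNemar statistic (assumed form).

    n01: test examples misclassified by algorithm A but not by B
    n10: test examples misclassified by algorithm B but not by A
    Under the null hypothesis the result is approximately chi-square
    distributed with 1 degree of freedom.
    """
    if n01 + n10 == 0:
        return 0.0  # the two classifiers never disagree
    return (abs(n01 - n10) - 1) ** 2 / (n01 + n10)


def five_by_two_cv_t(diffs):
    """5x2 cv paired t statistic (assumed form).

    diffs: five (p1, p2) pairs, one per replication of twofold
    cross-validation, where p1 and p2 are the differences in error
    rate between the two algorithms on fold 1 and fold 2.
    Under the null hypothesis the result approximately follows a
    t distribution with 5 degrees of freedom.
    """
    if len(diffs) != 5:
        raise ValueError("expected exactly five replications")
    variances = []
    for p1, p2 in diffs:
        mean = (p1 + p2) / 2
        variances.append((p1 - mean) ** 2 + (p2 - mean) ** 2)
    denom = math.sqrt(sum(variances) / 5)
    if denom == 0:
        raise ValueError("zero variance estimate; statistic undefined")
    # Numerator: the difference observed on fold 1 of replication 1.
    return diffs[0][0] / denom


# Illustrative, made-up inputs (not data from the article).
chi2 = mcnemar_statistic(n01=10, n10=25)
t = five_by_two_cv_t(
    [(0.02, 0.04), (0.03, 0.01), (0.02, 0.02), (0.05, 0.03), (0.01, 0.03)]
)
```

At the 0.05 level, `chi2` would be compared against the chi-square critical value of about 3.841 (1 degree of freedom) and `abs(t)` against the two-sided t critical value of about 2.571 (5 degrees of freedom). The five replications require ten training runs in total, which matches the abstract's recommendation of the 5x2 cv test for algorithms that can be executed 10 times.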