(Neural Computation. 2003;15:2931-2942.)
© 2003 The MIT Press


Letter

Selecting Informative Data for Developing Peptide-MHC Binding Predictors Using a Query by Committee Approach

Jens Kaae Christensen

jenskc{at}cbs.dtu.dk, Center for Biological Sequence Analysis, BioCentrum-DTU, Technical University of Denmark, DK-2800 Lyngby, Denmark

Kasper Lamberth

k.lamberth{at}immi.ku.dk, Department of Experimental Immunology, Institute of Medical Microbiology and Immunology, University of Copenhagen, DK-2200 Copenhagen N, Denmark

Morten Nielsen

mniel{at}cbs.dtu.dk, Center for Biological Sequence Analysis, BioCentrum-DTU, Technical University of Denmark, DK-2800 Lyngby, Denmark

Claus Lundegaard

lunde{at}cbs.dtu.dk, Center for Biological Sequence Analysis, BioCentrum-DTU, Technical University of Denmark, DK-2800 Lyngby, Denmark

Peder Worning

peder{at}cbs.dtu.dk, Center for Biological Sequence Analysis, BioCentrum-DTU, Technical University of Denmark, DK-2800 Lyngby, Denmark

Sanne Lise Lauemøller

lauemoeller{at}get2net.dk, Department of Experimental Immunology, Institute of Medical Microbiology and Immunology, University of Copenhagen, DK-2200 Copenhagen N, Denmark

Søren Buus

sb{at}immi.ku.dk, Department of Experimental Immunology, Institute of Medical Microbiology and Immunology, University of Copenhagen, DK-2200 Copenhagen N, Denmark

Søren Brunak

brunak{at}cbs.dtu.dk, Center for Biological Sequence Analysis, BioCentrum-DTU, Technical University of Denmark, DK-2800 Lyngby, Denmark

Ole Lund

lund{at}cbs.dtu.dk, Center for Biological Sequence Analysis, BioCentrum-DTU, Technical University of Denmark, DK-2800 Lyngby, Denmark

Strategies for selecting informative data points for training prediction algorithms are important, particularly when data points are difficult and costly to obtain. A Query by Committee (QBC) training strategy uses the disagreement among a committee of different algorithms to suggest new data points that most rationally complement the existing data, that is, the most informative data points. To evaluate this QBC approach on a real-world problem, we compared strategies for selecting new data points. We trained neural network algorithms to predict the binding affinity of peptides to the MHC class I molecule HLA-A2. We show that the QBC strategy leads to higher performance than a baseline strategy in which new data points are selected at random from a pool of available data. Most peptides bind HLA-A2 with low affinity, and, as expected, a strategy of selecting peptides that are predicted to have high binding affinities also leads to more accurate predictors than the baseline strategy. The QBC value is shown to correlate with the measured binding affinity. This demonstrates that the different predictors can easily learn whether a peptide will fail to bind but often conflict in predicting whether a peptide binds. Using a carefully constructed computational setup, we demonstrate that selecting peptides with a high QBC value performs better than selecting low-QBC peptides, independently of binding affinity. When predictors are trained on a very limited set of data, they cannot be expected to disagree in a meaningful way, and we find a data limit below which the QBC strategy fails. Finally, data selection strategies similar to those used here might be of use in other settings in which generating more data is costly.
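
The selection step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define how the QBC value is computed, so the sketch assumes it is the population standard deviation of the committee members' predicted affinities for each candidate. The function names, the placeholder peptide labels, and the prediction values are all hypothetical.

```python
from statistics import pstdev


def qbc_scores(committee_predictions):
    """Return one disagreement score per candidate.

    committee_predictions[m][i] is the prediction of committee member m
    for candidate i. Assumption: disagreement is measured as the
    population standard deviation across members.
    """
    n_candidates = len(committee_predictions[0])
    return [
        pstdev([member[i] for member in committee_predictions])
        for i in range(n_candidates)
    ]


def select_by_qbc(candidates, committee_predictions, k):
    """Pick the k candidates on which the committee disagrees most."""
    scores = qbc_scores(committee_predictions)
    ranked = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return [candidates[i] for i in ranked[:k]]


# Hypothetical pool of unmeasured peptides (placeholder labels, not real sequences).
peptides = ["PEPTIDE_A", "PEPTIDE_B", "PEPTIDE_C"]

# Made-up predicted affinities from a committee of three predictors.
predictions = [
    [0.10, 0.80, 0.50],
    [0.12, 0.30, 0.52],
    [0.09, 0.60, 0.48],
]

# The committee agrees on A and C but conflicts on B, so B is queried first.
next_to_measure = select_by_qbc(peptides, predictions, k=1)
```

In an active-learning loop of this kind, the selected peptides would be measured experimentally, added to the training set, and the committee retrained before the next round of selection. The random baseline mentioned in the abstract would replace `select_by_qbc` with a uniform random draw from the pool.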






