Bioinformatics, 2006, 22, 453-459. Epub 2005 Dec 13
PMID: 16352655
Method: We combine a simple Bayes classifier with IMGT unique numbering. Our method comprises two steps: (i) selection of discriminant binary features, each of which associates an alignment position with an amino acid group; and (ii) training of the classifier by estimating the frequencies of the selected features conditional on the B2M-binding property.
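As an illustration of step (ii), the following is a minimal sketch of a simple Bayes classifier over binary (position, amino acid group) features. It is not the authors' implementation: the amino acid groups in AA_GROUPS, the function names, the Laplace smoothing, and the assumption that the discriminant features have already been selected in step (i) are all hypothetical choices made for this example.

```python
import math

# Hypothetical amino acid groups (an assumption; the abstract does not list them).
AA_GROUPS = {
    "hydrophobic": set("AVLIMFWC"),
    "polar": set("STNQYGP"),
    "charged": set("DEKRH"),
}

def extract(seq, features):
    """Return a binary vector: 1 if the residue at `pos` belongs to `group`."""
    return [1 if seq[pos] in AA_GROUPS[group] else 0 for pos, group in features]

def train(seqs, labels, features):
    """Estimate class priors and per-class feature frequencies.

    Laplace smoothing (+1 / +2) is an assumption added here to avoid
    zero probabilities; the abstract does not say how frequencies are smoothed.
    """
    model = {}
    for c in set(labels):
        rows = [extract(s, features) for s, y in zip(seqs, labels) if y == c]
        n = len(rows)
        prior = n / len(seqs)
        freqs = [(sum(r[j] for r in rows) + 1) / (n + 2)
                 for j in range(len(features))]
        model[c] = (prior, freqs)
    return model

def predict(model, seq, features):
    """Return the class with the highest log-posterior under feature independence."""
    x = extract(seq, features)

    def log_posterior(c):
        prior, freqs = model[c]
        return math.log(prior) + sum(
            math.log(p if v else 1.0 - p) for v, p in zip(x, freqs)
        )

    return max(model, key=log_posterior)
```

Here `features` would be a list such as `[(0, "hydrophobic"), (2, "charged")]`, with positions given as 0-based indices into the aligned sequence and labels given as True (binds B2M) or False.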
Results: Our dataset contains aligned sequences of 806 allelic forms of 47 MhcSF proteins, corresponding to 9 receptor types and 4 mammalian species. Eighteen discriminant features are selected; they either belong to B2M contact sites or stabilize the molecular structure required for this contact. Three leave-one-out procedures are used to assess classifier performance, corresponding to B2M-binding prediction for (1) new proteins, (2) species not represented in the dataset and (3) new receptor types. The prediction accuracies are 98%, 94% and 70%, respectively. Application of our classifier to lower vertebrate MHC-I proteins indicates that these proteins bind B2M and should therefore be expressed on the cell surface by a process similar to that of mammalian MHC-I proteins. These results demonstrate the usefulness and accuracy of our simple approach, which should be applicable to other function or interaction prediction problems.
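The three evaluation procedures can each be read as a leave-one-group-out scheme, where the held-out group is a protein, a species or a receptor type. The sketch below shows that general scheme only; the function names are hypothetical, and the majority-class learner is a placeholder standing in for the actual classifier so that the example is self-contained.

```python
from collections import Counter

def leave_one_group_out(samples, labels, groups, fit, predict):
    """Hold out each group in turn, train on the rest, and return overall accuracy.

    `groups[i]` names the group (e.g. protein, species or receptor type)
    that sample i belongs to. `fit` and `predict` are caller-supplied.
    """
    correct = 0
    for held_out in sorted(set(groups)):
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        test_idx = [i for i, g in enumerate(groups) if g == held_out]
        model = fit([samples[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        correct += sum(predict(model, samples[i]) == labels[i] for i in test_idx)
    return correct / len(samples)

# Placeholder learner (an assumption for illustration): always predicts
# the most frequent training label, ignoring the sample itself.
def fit_majority(samples, labels):
    return Counter(labels).most_common(1)[0][0]

def predict_majority(model, sample):
    return model
```

Grouping by species, for instance, means that no sequence from the held-out species is seen during training, which is a stricter test than leaving out single sequences.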