Prediction of metabolic pathways is based on InterPro indices taken from the UniProt database.
We selected all records from uniprot_sprot_archaea.dat and uniprot_sprot_bacteria.dat that mention the involvement of the protein in metabolic pathways. From these records, the InterPro indices, the descriptions of the metabolic pathways, and the organism names were retrieved.
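The selection step can be sketched in Python as follows. This is a minimal illustration, not the code actually used: it assumes the standard UniProtKB flat-file layout (OS lines for the organism, "DR   InterPro;" cross-references, "CC   -!- PATHWAY:" comments, "//" as record terminator). The function name parse_pathway_records and the sample record, including its identifiers, are made up for the example.

```python
SAMPLE = """\
ID   DEMO1_EXAMP             Reviewed;         100 AA.
OS   Examplebacter demo (strain X1).
CC   -!- FUNCTION: Made-up function text.
CC   -!- PATHWAY: Cofactor biosynthesis; example pathway;
CC       example branch; step 1/2.
DR   InterPro; IPR900001; Demo_dom_A.
DR   InterPro; IPR900002; Demo_dom_B.
DR   Pfam; PF90001; Demo; 1.
//
ID   DEMO2_EXAMP             Reviewed;         120 AA.
OS   Examplebacter demo (strain X1).
DR   InterPro; IPR900003; Demo_dom_C.
//
"""


def parse_pathway_records(lines):
    """Yield one dict per record that carries at least one PATHWAY comment."""
    organism, interpro, pathways = [], set(), []
    in_pathway = False  # True while reading continuation lines of a PATHWAY comment
    for raw in lines:
        line = raw.rstrip("\n")
        code, body = line[:2], line[5:]  # two-letter line code, content from column 6
        if code == "//":  # end of record
            if pathways:
                yield {
                    "organism": " ".join(organism).rstrip("."),
                    "interpro": sorted(interpro),
                    "pathways": [p.rstrip(".") for p in pathways],
                }
            organism, interpro, pathways = [], set(), []
            in_pathway = False
        elif code == "OS":
            organism.append(body.strip())
        elif code == "DR":
            fields = [f.strip() for f in body.split(";")]
            if fields[0] == "InterPro" and len(fields) > 1:
                interpro.add(fields[1])
        elif code == "CC":
            if body.startswith("-!- PATHWAY:"):
                pathways.append(body[len("-!- PATHWAY:"):].strip())
                in_pathway = True
            elif body.startswith("-!-") or body.startswith("---"):
                in_pathway = False  # another comment topic or the copyright block
            elif in_pathway:
                pathways[-1] += " " + body.strip()


records = list(parse_pathway_records(SAMPLE.splitlines()))
for rec in records:
    print(rec["organism"], rec["interpro"], rec["pathways"])
```

In this toy input only the first record is kept, because the second one has no PATHWAY comment. For the real files, the same function could be fed an open file handle instead of SAMPLE.splitlines().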
Based on these data, the Apriori program was used to derive rules for predicting metabolic pathways. The source data were randomly split into a learning set and a testing set, the former about five times larger than the latter.
The learning set was used to develop the rules, and the testing set to test the predictions.
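The split and the rule derivation can be illustrated with a small Python sketch. It is a simplified stand-in for the Apriori program, not a reimplementation of it: it counts every small subset of InterPro indices by brute force, without Apriori's candidate pruning. The thresholds max_size, min_support and min_confidence, the function names, and the toy data are all assumptions made for the example; the settings actually used are not stated here.

```python
import random
from collections import Counter
from itertools import combinations


def split_records(records, ratio=5, seed=0):
    """Randomly split records so the learning set is about `ratio` times the testing set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * ratio // (ratio + 1)
    return shuffled[:cut], shuffled[cut:]


def mine_rules(learning, max_size=2, min_support=2, min_confidence=1.0):
    """Derive rules {set of InterPro indices} -> {pathways} from (indices, pathway) pairs."""
    antecedent_counts = Counter()  # how often each index subset occurs
    joint_counts = Counter()       # how often it occurs together with a given pathway
    for ids, pathway in learning:
        for size in range(1, max_size + 1):
            for subset in combinations(sorted(ids), size):
                antecedent_counts[subset] += 1
                joint_counts[(subset, pathway)] += 1
    rules = {}
    for (subset, pathway), n in joint_counts.items():
        confidence = n / antecedent_counts[subset]
        if n >= min_support and confidence >= min_confidence:
            rules.setdefault(frozenset(subset), set()).add(pathway)
    return rules


# Toy learning data with placeholder index and pathway names.
toy_learning = [
    ({"IPR_A", "IPR_B"}, "P1"),
    ({"IPR_A", "IPR_B"}, "P1"),
    ({"IPR_A"}, "P2"),
    ({"IPR_C"}, "P2"),
    ({"IPR_C"}, "P2"),
]
toy_rules = mine_rules(toy_learning)
for antecedent, pathways in toy_rules.items():
    print(sorted(antecedent), "->", sorted(pathways))

learning, testing = split_records(range(12))
print(len(learning), len(testing))
```

In the toy data IPR_A alone yields no rule, because it occurs with both P1 and P2 and so falls below the confidence threshold of 1.0, whereas IPR_B, the pair IPR_A plus IPR_B, and IPR_C each yield one.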
Metabolic pathways were predicted correctly for 84.1% of proteins, while for 15.8% the program found no pathway. For only four proteins out of 14,821 (about 0.03%) were the pathways predicted incorrectly.
For each query sequence, the InterProScan program generates a set of InterPro indices; these are compared with the rules derived from the learning set, and a prediction is made.
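The comparison step might look like the following Python sketch. It assumes that the InterPro indices of the query have already been extracted from the InterProScan output, and that a rule fires when all of its indices are present in the query; both the matching criterion and the definition of a "correct" prediction in evaluate are assumptions, as are the function names and the placeholder rules.

```python
def predict_pathways(query_ids, rules):
    """Return the union of pathways of all rules whose indices are all present in the query."""
    query = set(query_ids)
    predicted = set()
    for antecedent, pathways in rules.items():
        if antecedent <= query:  # every index of the rule occurs in the query
            predicted |= pathways
    return predicted


def evaluate(testing, rules):
    """Tally testing-set outcomes: correct, no prediction, or incorrect."""
    tally = {"correct": 0, "none": 0, "incorrect": 0}
    for ids, true_pathway in testing:
        predicted = predict_pathways(ids, rules)
        if not predicted:
            tally["none"] += 1
        elif true_pathway in predicted:
            tally["correct"] += 1
        else:
            tally["incorrect"] += 1
    return tally


# Placeholder rules and testing data, for illustration only.
example_rules = {
    frozenset({"IPR_B"}): {"P1"},
    frozenset({"IPR_C"}): {"P2"},
}
example_testing = [
    ({"IPR_A", "IPR_B"}, "P1"),  # rule fires, pathway matches
    ({"IPR_D"}, "P1"),           # no rule fires
    ({"IPR_C"}, "P1"),           # rule fires, pathway differs
]
print(predict_pathways({"IPR_A", "IPR_B"}, example_rules))
print(evaluate(example_testing, example_rules))
```

The three tally categories correspond to the three outcomes reported for the testing set: a correct pathway, no pathway found, and an incorrect pathway.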