Prediction of splice sites in DNA sequences
A combined linear discriminant recognition function was developed using information about significant triplet frequencies in various functional parts of splice site regions and the preferences of octanucleotides in protein-coding and intron regions. On the test set, the splice site prediction scheme recognizes donor sites with 97% accuracy (correlation coefficient C = 0.62) and acceptor splice sites with 96% accuracy (C = 0.48). The method is a good alternative to the neural network approach (Brunak et al., J. Mol. Biol., 1991), which has C = 0.61 at 95% accuracy of donor site prediction and C < 0.40 at 95% accuracy of acceptor site prediction. The false positive rate for splice site prediction is relatively high: about one false positive per true site at 97% accuracy of true site prediction. More precise splice site positions can be found with the exon recognition (Fex) and gene structure prediction (Fgenesh) programs.
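The correlation coefficient C quoted above is not defined in this text. The sketch below assumes it is the usual coefficient computed from the counts of true positives, true negatives, false positives, and false negatives, as is common in evaluations of gene prediction programs. The function name and the example counts are hypothetical and serve only to illustrate the calculation.

```python
import math


def correlation_coefficient(tp: int, tn: int, fp: int, fn: int) -> float:
    """Correlation between predicted and true site labels.

    Assumed definition (not stated in the source):
    C = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Undefined when any marginal total is zero; report 0.0 by convention.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: 97 of 100 true sites found, with one false
# positive per true positive among many candidate positions.
example_c = correlation_coefficient(tp=97, tn=9803, fp=97, fn=3)
print(f"C = {example_c:.2f}")
```

Because true splice sites are rare among candidate positions, C is much more sensitive to the number of false positives than the percentage of true sites recovered, which is why both figures are reported.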
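The output has the following layout:
First line - name of your sequence
Second line - length of your sequence
The remaining lines give, separately for donor and acceptor sites, the number of predicted sites and the score threshold, followed by one row per predicted site with its index, position, and score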
    HUMALPHA 4556 bp ds-DNA PRI 15-SEP-1
    length of sequence - 4556
    Number of Donor sites: 11 Threshold: 0.76
    1 329 0.76
    2 517 0.87
    3 728 0.88
    4 955 0.98
    5 1322 0.81
    6 1954 0.85
    ..............
    Number of Acceptor sites: 18 Threshold: 0.65
    1 244 0.65
    2 379 0.67
    3 610 0.89
    4 615 0.68
    5 838 0.83
    6 1146 0.75
    ...............
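The output described above can be read into a program with a few lines of code. The following is a minimal sketch, assuming the output is laid out with one record per line and that each site row holds an index, a position, and a score. The names `Site` and `parse_sites` and the 0.85 cutoff are hypothetical and are not part of the program itself.

```python
import re
from typing import List, NamedTuple


class Site(NamedTuple):
    kind: str       # "donor" or "acceptor"
    index: int      # running number of the site within its block
    position: int   # position of the site in the sequence
    score: float    # score assigned to the site


# Block header, e.g. "Number of Donor sites: 11 Threshold: 0.76"
HEADER = re.compile(r"Number of (Donor|Acceptor) sites:\s*(\d+)\s+Threshold:\s*([\d.]+)")
# Site row, e.g. "4 955 0.98"
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\d*\.\d+)\s*$")


def parse_sites(text: str) -> List[Site]:
    """Collect predicted sites from output text (assumed one record per line)."""
    sites: List[Site] = []
    kind = None
    for line in text.splitlines():
        header = HEADER.search(line)
        if header:
            kind = header.group(1).lower()
            continue
        row = ROW.match(line)
        if row and kind is not None:
            sites.append(Site(kind, int(row.group(1)), int(row.group(2)), float(row.group(3))))
    return sites


# Shortened copy of the sample output, two sites per block.
SAMPLE = """\
HUMALPHA 4556 bp ds-DNA PRI 15-SEP-1
length of sequence - 4556
Number of Donor sites: 11 Threshold: 0.76
1 329 0.76
2 517 0.87
Number of Acceptor sites: 18 Threshold: 0.65
1 244 0.65
2 379 0.67
"""

sites = parse_sites(SAMPLE)
# Keep only higher-scoring sites to reduce false positives (cutoff is arbitrary).
strong = [s for s in sites if s.score >= 0.85]
for s in strong:
    print(s.kind, s.position, s.score)
```

Filtering by a stricter score than the reported threshold trades some true sites for fewer false positives, which may be useful given the false positive rate noted in the method description.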
Solovyev V.V., Salamov A.A., Lawrence C.B. (1994) Predicting internal exons by oligonucleotide composition and discriminant analysis of spliceable open reading frames. Nucl. Acids Res., 22(24), 5156-5163.
Solovyev V.V., Salamov A.A., Lawrence C.B. (1994) The prediction of human exons by oligonucleotide composition and discriminant analysis of spliceable open reading frames. In: The Second International Conference on Intelligent Systems for Molecular Biology (eds. Altman R., Brutlag D., Karp R., Lathrop R. and Searls D.), AAAI Press, Menlo Park, CA, 354-362.
Solovyev V.V., Lawrence C.B. (1993) Identification of human gene functional regions based on oligonucleotide composition. In: Proceedings of the First International Conference on Intelligent Systems for Molecular Biology (eds. Hunter L., Searls D., Shavlik J.), Bethesda, 371-379.