Prediction of protein secondary structure by combining nearest-neighbor algorithms and multiple sequence alignments

The input sequence for this program must be in FASTA format, with no more than 80 sequence letters per line.
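As a minimal sketch of preparing valid input, the Python helpers below wrap a raw sequence into FASTA with at most 80 letters per line and check an existing file's line lengths. The function names `to_fasta` and `line_lengths_ok` and the header text are hypothetical; they are not part of NNSSP.

```python
def to_fasta(header: str, sequence: str, width: int = 80) -> str:
    """Format a sequence as FASTA with at most `width` letters per line."""
    if width < 1:
        raise ValueError("width must be positive")
    seq = "".join(sequence.split()).upper()  # drop whitespace, normalize case
    lines = [">" + header]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


def line_lengths_ok(fasta_text: str, width: int = 80) -> bool:
    """True if every non-header line has at most `width` letters."""
    return all(len(line) <= width
               for line in fasta_text.splitlines()
               if line and not line.startswith(">"))
```

For example, `to_fasta("query", seq)` on a 200-residue sequence yields a header line followed by lines of 80, 80 and 40 letters.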

Yi and Lander (*) developed a neural-network and nearest-neighbor method for predicting protein secondary structure, with a scoring system that combined a sequence similarity matrix with the local structural environment scoring scheme of Bowie et al. (**). We have improved their scoring system by treating the N- and C-terminal positions of α-helices and β-strands, as well as β-turns, as distinct types of secondary structure. A further improvement, which also significantly decreases computation time, restricts the database to a smaller subset of proteins that are similar to the query sequence. Using multiple sequence alignments rather than single sequences, together with a simple jury decision method, we achieved an overall three-state accuracy of 72.2%, which is better than that observed for the most accurate multilayered neural network approach tested on the same data set of 126 non-homologous protein chains.
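The nearest-neighbor idea and the jury step can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the NNSSP implementation: `identity_score` (a count of identical residues) is a hypothetical stand-in for the combined similarity-matrix and Bowie et al. environment scoring, the window half-width `half` and neighbor count `k` are illustrative values, and `jury_vote` assumes a per-residue majority vote over predictions that have already been mapped onto query positions (gap handling is omitted). Terminal helix/strand states and β-turns are not modeled.

```python
from collections import Counter


def window(seq: str, center: int, half: int) -> str:
    """Residues center-half .. center+half, padded with '-' past either end."""
    padded = "-" * half + seq + "-" * half
    return padded[center:center + 2 * half + 1]


def identity_score(w1: str, w2: str) -> int:
    """Toy similarity (assumed stand-in): count identical non-padding residues."""
    return sum(1 for x, y in zip(w1, w2) if x == y and x != "-")


def nn_predict(query: str, database: list[tuple[str, str]],
               half: int = 3, k: int = 5) -> str:
    """For each query residue, take the k database windows scoring highest
    against the query window and vote on the states of their centers."""
    # Every database window paired with the observed state (a/b/c) of its center.
    windows = [(window(seq, i, half), ss[i])
               for seq, ss in database for i in range(len(seq))]
    predicted = []
    for i in range(len(query)):
        qw = window(query, i, half)
        nearest = sorted(windows, key=lambda w: identity_score(qw, w[0]),
                         reverse=True)[:k]
        votes = Counter(state for _, state in nearest)
        predicted.append(votes.most_common(1)[0][0])
    return "".join(predicted)


def jury_vote(predictions: list[str], tie_state: str = "c") -> str:
    """Per-residue majority vote over equal-length a/b/c strings; a tie for
    first place falls back to `tie_state` (an assumed tie-breaking rule)."""
    if not predictions or any(len(p) != len(predictions[0]) for p in predictions):
        raise ValueError("need one or more predictions of equal length")
    consensus = []
    for column in zip(*predictions):
        ranked = Counter(column).most_common()
        tied = len(ranked) > 1 and ranked[1][1] == ranked[0][1]
        consensus.append(tie_state if tied else ranked[0][0])
    return "".join(consensus)
```

With a toy database `[("AAAAAAA", "aaaaaaa"), ("VVVVVVV", "bbbbbbb")]`, `nn_predict("AAAA", db, half=2, k=3)` returns `"aaaa"`, and `jury_vote(["aaacc", "aabcc", "abbcb"])` returns `"aabcc"`.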

(*) Yi T-M., Lander E.S. (1993)

Protein secondary structure prediction using nearest-neighbor methods.

J. Mol. Biol., 232, 1117-1129.

(**) Bowie J.U., Lüthy R., Eisenberg D. (1991)

A method to identify protein sequences that fold into a known
three-dimensional structure.

Science, 253, 164-170.

**Accuracy:**

Overall three-state (a = α-helix, b = β-strand, c = coil) prediction gives ~67.6% correctly predicted residues on 126 non-homologous proteins using the jack-knife test procedure. Using multiple sequence alignments instead of single sequences increases the prediction accuracy to 72.2%.
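The three-state accuracy and the jack-knife procedure can be sketched as below. This sketch assumes that accuracy is pooled over all residues of the test set (rather than averaged per chain) and that the jack-knife is a leave-one-chain-out loop; `train_and_predict` is a hypothetical callback standing in for any predictor.

```python
from typing import Callable

Protein = tuple[str, str]  # (amino-acid sequence, observed a/b/c string)


def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state matches the observed one."""
    if not observed or len(predicted) != len(observed):
        raise ValueError("need two non-empty strings of equal length")
    return 100.0 * sum(p == o for p, o in zip(predicted, observed)) / len(observed)


def jackknife_q3(proteins: list[Protein],
                 train_and_predict: Callable[[list[Protein], str], str]) -> float:
    """Leave-one-out test: predict each chain from all the others, then pool
    correct residues over the whole set."""
    correct = total = 0
    for i, (seq, observed) in enumerate(proteins):
        training = proteins[:i] + proteins[i + 1:]  # hold out chain i
        predicted = train_and_predict(training, seq)
        correct += sum(p == o for p, o in zip(predicted, observed))
        total += len(observed)
    if total == 0:
        raise ValueError("no residues to evaluate")
    return 100.0 * correct / total
```

For instance, `q3("aabc", "aacc")` is 75.0: three of the four residues agree.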

See also the **SSP** program.

**Example of NNssp output:** The output gives the probabilities Pa and Pb of α-helix (a) and β-strand (b) at each residue on a 0-9 scale. The probability of coil (c) is approximately 10 - Pa - Pb.
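A minimal sketch of recovering the approximate coil score from the `Prob a` and `Prob b` rows, following the 10 - Pa - Pb relation above. Clamping the result to 0-9 is an assumption added here so the value stays on the same scale; `coil_scores` is a hypothetical helper name.

```python
def coil_scores(prob_a: str, prob_b: str) -> list[int]:
    """Approximate per-residue coil score as 10 - Pa - Pb, clamped to 0..9
    (the clamp is an assumption, to stay on the 0-9 scale)."""
    if len(prob_a) != len(prob_b):
        raise ValueError("Prob a and Prob b rows must have equal length")
    return [min(9, max(0, 10 - int(a) - int(b))) for a, b in zip(prob_a, prob_b)]
```

Applied to the first seven residues of the adenylate kinase example (`"9988865"`, `"0000122"`), it gives `[1, 1, 2, 2, 1, 2, 3]`.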

```
ADENYLATE KINASE ISOENZYME-3, /GTP:AMP$  L= 214
SS content: a= 0.43  b= 0.05  c= 0.52

                10        20        30        40        50
PredSS  aaaaaaa aaaaaa aaaaaaaa aa
AA seq  RLLRAIMGAPGSGKGTVSSRITKHFELKHLSSGDLLRDNMLRGTEIGVLA
Prob a  99888651000001112244545422211111346775554221332335
Prob b  00001221000001134422321222233221001110010101134443

                60        70        80        90       100
PredSS  aaaa aaaaaaaaaaaaaaaa aaaaaaaaa
AA seq  KTFIDQGKLIPDDVMTRLVLHELKNLTQYNWLLDGFPRTLPQAEALDRAY
Prob a  54543201110346789888877545553334210001113588888875
Prob b  22221001210001111000000000111233410101110000000011

               110       120       130       140       150
PredSS  bb aaaaaaaa bb bbbb
AA seq  QIDTVINLNVPFEVIKQRLTARWIHPGSGRVYNIEFNPPKTMGIDDLTGE
Prob a  32111111111466766643321110001100000000000111111111
Prob b  12135643321222110122245531001478764210013333211101

               160       170       180       190       200
PredSS  aaaaaaaaaaaaaaaaaaaaaaa bbb a
AA seq  PLVQREDDRPETVVKRLKAYEAQTEPVLEYYRKKGVLETFSGTETNKIWP
Prob a  23433211146788999997765577888886621121111111123335
Prob b  12321000001110000000000000000000101365542111111221

               210
PredSS  aaaaaaa
AA seq  HVYAFLQTKLPQRS
Prob a  46687764210111
Prob b  22211110110001
```
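The `SS content` line appears to report the fraction of residues assigned to each state (the three values sum to 1.00). Assuming that reading, and assuming a full-width `PredSS` row in which any character other than `a` or `b` (such as a blank) means coil, it can be recomputed with a small hypothetical helper:

```python
def ss_content(pred_ss: str) -> dict[str, float]:
    """Fractions of residues predicted as a, b and coil (c), rounded to two
    decimals; any character other than 'a' or 'b' counts as coil."""
    n = len(pred_ss)
    if n == 0:
        raise ValueError("empty PredSS string")
    a, b = pred_ss.count("a"), pred_ss.count("b")
    return {"a": round(a / n, 2), "b": round(b / n, 2), "c": round((n - a - b) / n, 2)}
```

For example, `ss_content("aaab  c ab")` returns `{"a": 0.4, "b": 0.2, "c": 0.4}`.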

**Reference:**

Salamov A.A., Solovyev V.V. (1995)

Prediction of protein secondary structure by combining nearest-neighbor
algorithms and multiple sequence alignments.

J. Mol. Biol., 247, 11-15.