The program predicts putative cytotoxic T lymphocyte (CTL) epitopes
in protein sequences. Such peptides are considered potential candidates
for vaccine design.
The sequence length for predicted epitopes is 9.
The input is a protein sequence in the 20-letter amino acid alphabet, in FASTA format.
For each position of the sequence (except the eight C-terminal positions), the program outputs whether the polypeptide of length 9 starting at that position is predicted to be a cytotoxic T lymphocyte epitope (*) or not ( ). If the List Output checkbox is checked, a list of the predicted epitopes is printed.
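The scan over the sequence can be sketched as follows. This is a minimal illustration of the output layout only: the function names and the toy classifier are assumptions made for the example and do not reproduce the program's actual model.

```python
WINDOW = 9  # epitope length stated in the text


def scan(sequence, is_epitope):
    """Return one marker per start position: '*' if the 9-mer starting
    there is predicted as an epitope, ' ' otherwise.

    The last eight positions cannot start a full 9-mer, so the result
    has len(sequence) - 8 characters (empty for shorter sequences).
    """
    marks = []
    for start in range(len(sequence) - WINDOW + 1):
        peptide = sequence[start:start + WINDOW]
        marks.append("*" if is_epitope(peptide) else " ")
    return "".join(marks)


def list_epitopes(sequence, is_epitope):
    """Return (1-based start position, peptide) pairs for predicted epitopes."""
    hits = []
    for start in range(len(sequence) - WINDOW + 1):
        peptide = sequence[start:start + WINDOW]
        if is_epitope(peptide):
            hits.append((start + 1, peptide))
    return hits


def toy_classifier(peptide):
    """Hypothetical stand-in for the real predictor, for demonstration only."""
    return peptide[1] == "L" and peptide[-1] == "V"
```

Any function that maps a 9-residue string to True or False can be passed as `is_epitope`, so the same scan works with a trained discriminant function in place of the toy rule.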
The algorithm uses sequence comparison and linear discriminant analysis to predict CTL epitopes. For each query sequence of length 9, we calculate position-specific similarity scores against position-specific score matrices derived from the positive and negative training sets (9 predictive parameters).
Additionally, we calculate the 5 top sequence similarity scores between the query sequence and sequences from the positive set, and the 5 top scores against the negative set (10 parameters).
Using these 19 parameters, we obtain a linear discriminant function from the training dataset.
We use this function to discriminate between epitope and non-epitope sequences.
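A rough sketch of how such a 19-parameter vector might be assembled is given here. Several details are assumptions made for illustration, since the text does not specify them: the per-position score is taken as the difference of log-frequencies between the positive and negative matrices, the sequence similarity is a simple count of identical positions, and all function names are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
LENGTH = 9


def position_matrix(peptides, pseudocount=1.0):
    """Per-position log-frequency table built from a set of 9-mers."""
    matrix = []
    for pos in range(LENGTH):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for pep in peptides:
            counts[pep[pos]] += 1
        total = sum(counts.values())
        matrix.append({aa: math.log(c / total) for aa, c in counts.items()})
    return matrix


def similarity(a, b):
    """Assumed similarity measure: number of identical positions."""
    return sum(x == y for x, y in zip(a, b))


def top_scores(query, peptides, k=5):
    """The k highest similarity scores, padded with zeros if fewer exist."""
    scores = sorted((similarity(query, p) for p in peptides), reverse=True)[:k]
    return scores + [0] * (k - len(scores))


def features(query, positives, negatives, pos_matrix, neg_matrix):
    """9 position scores + 5 top positive scores + 5 top negative scores."""
    position_scores = [
        pos_matrix[i][aa] - neg_matrix[i][aa]  # assumed scoring scheme
        for i, aa in enumerate(query)
    ]
    return position_scores + top_scores(query, positives) + top_scores(query, negatives)


def discriminant(x, weights, bias):
    """Linear discriminant value; the sign decides epitope vs. non-epitope."""
    return sum(w * v for w, v in zip(weights, x)) + bias
```

The weights and bias passed to `discriminant` would come from fitting a linear discriminant on the training vectors, which is not shown in this sketch.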
We used the MHCBN database (1) to obtain the training and testing datasets.
The data extraction procedure is similar to that described in (2). For positive examples, we selected CTL epitopes from the database using the criteria [ACTIVITY=yes] & [SEQLEN=9] & [BINDING=yes]. After removing identical sequences and sequences with non-standard amino acids, 1368 sequences remained.
The negative dataset was constructed from non-epitope and non-binding sequences in the same way as described in (2).
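The positive-set selection criteria can be expressed as a short filter. This is a sketch under stated assumptions: the record layout (dictionaries with the keys 'sequence', 'activity' and 'binding') is hypothetical and does not reflect the actual MHCBN export format.

```python
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")


def select_positive(records):
    """Keep unique 9-mers with activity and binding both 'yes'.

    `records` is an iterable of dicts with hypothetical keys
    'sequence', 'activity' and 'binding'.
    """
    seen = set()
    selected = []
    for rec in records:
        seq = rec["sequence"].upper()
        if rec["activity"] != "yes" or rec["binding"] != "yes":
            continue  # [ACTIVITY=yes] & [BINDING=yes]
        if len(seq) != 9:
            continue  # [SEQLEN=9]
        if not set(seq) <= STANDARD:
            continue  # drop non-standard amino acids
        if seq in seen:
            continue  # drop identical sequences
        seen.add(seq)
        selected.append(seq)
    return selected
```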
The data were randomly split into a test set of 200 positive and 200 negative sequences, with the remaining sequences forming the training set.
On the test set, the fraction of correct predictions by our program is 0.835 (334 correct predictions out of 400).
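The split and the reported accuracy can be illustrated with a short sketch. The function names and the fixed random seed are assumptions for the example; the original random split is not reproducible from the text.

```python
import random


def split(positives, negatives, n_test=200, seed=0):
    """Randomly hold out n_test positives and n_test negatives for testing.

    Returns (train, test) as lists of (sequence, label) pairs,
    with label 1 for epitopes and 0 for non-epitopes.
    """
    rng = random.Random(seed)  # seed is an arbitrary choice for the example
    pos, neg = list(positives), list(negatives)
    rng.shuffle(pos)
    rng.shuffle(neg)
    test = [(p, 1) for p in pos[:n_test]] + [(n, 0) for n in neg[:n_test]]
    train = [(p, 1) for p in pos[n_test:]] + [(n, 0) for n in neg[n_test:]]
    return train, test


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)
```

With 334 matching labels out of 400, `accuracy` gives 334 / 400 = 0.835, the figure quoted above.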
(1) Bhasin M, Singh H, Raghava GPS. MHCBN: a comprehensive database of MHC binding and non-binding peptides. Bioinformatics (2003) 19:666.
(2) Bhasin M, Raghava GPS. Prediction of CTL epitopes using QM, SVM and ANN techniques. Vaccine (2004) 22:3195-3204.