CYS_REC: Program for Predicting SS-bonding States of Cysteines and Disulfide Bridges in Protein Sequences.
The program predicts the SS-bonding states of cysteines and locates disulfide bridges in proteins.
Methodology
Procedure: the query sequence is processed in the following steps.
- Secondary structure is predicted for the query sequence.
- For each cysteine, the amino acid fragment and the corresponding secondary structure fragment within ±10 positions are compared with the fragments of the training sets using a precomputed log-odds matrix, and the maximal score is determined for each set.
- For the given fragment, scores are calculated against profiles (weight matrices) constructed from positive (bonded) and negative (non-bonded) examples.
- The value of a linear discriminant function is calculated from the 4 most significant amino acid properties.
- The resulting score, computed as a linear combination of the five scores listed above, is used to recognize the SS-bonding states of cysteines.
- A neural network calculates a score for each possible pair of cysteines, forming a 'Matrix of pair scores'.
- The pattern of bonded cysteine pairs is chosen as the one that maximizes the sum of the corresponding scores in the matrix.
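The last step can be illustrated with a small sketch. It is not the program's own code: it assumes that the pattern is found by exhaustively enumerating all ways to pair an even number of cysteines and keeping the pairing with the largest total score. The function name best_pairing and the dictionary layout are hypothetical. The scores are taken from the 'Matrix of pair scores' shown in the output example of this page.

```python
from typing import Dict, FrozenSet, List, Tuple

Pair = Tuple[int, int]


def best_pairing(positions: List[int],
                 scores: Dict[FrozenSet[int], float]) -> Tuple[float, List[Pair]]:
    """Exhaustively search for the pairing with the maximal sum of pair scores.

    Assumption: every listed cysteine takes part in exactly one pair,
    so the number of positions must be even.
    """
    if len(positions) % 2 != 0:
        raise ValueError("an even number of cysteine positions is required")
    if not positions:
        return 0, []
    first, rest = positions[0], positions[1:]
    best_total, best_pairs = None, []
    # Pair the first cysteine with every other one and recurse on the rest.
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        sub_total, sub_pairs = best_pairing(remaining, scores)
        total = scores[frozenset((first, partner))] + sub_total
        if best_total is None or total > best_total:
            best_total, best_pairs = total, [(first, partner)] + sub_pairs
    return best_total, best_pairs


# Matrix of pair scores copied from the output example (1AC5_).
positions = [79, 251, 271, 293, 308, 345]
rows = [
    [-999, -21, -4, 8, 18, 143],
    [-21, -999, 155, 7, -3, -12],
    [-4, 155, -999, 13, -20, -15],
    [8, 7, 13, -999, 133, -8],
    [18, -3, -20, 133, -999, -7],
    [143, -12, -15, -8, -7, -999],
]
scores = {
    frozenset((p, q)): rows[i][j]
    for i, p in enumerate(positions)
    for j, q in enumerate(positions)
    if i < j
}

total, pairs = best_pairing(positions, scores)
print(total, pairs)
```

For this matrix the search returns the pairs 79-345, 251-271 and 293-308, which agrees with the pattern reported in the output example. Exhaustive enumeration is practical only for a small number of cysteines; a maximum-weight matching algorithm would be the usual choice for longer lists.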
Input Format
A FASTA-formatted sequence, split into lines of at most 80 positions, is accepted.
A specially prepared alignment without gaps in the first sequence is accepted too.
Example of alignment:
|
T0129
5 182
MLISHSDLNQQLKSAGIGFNATELHGFLSGLLCGGLKDQSWLPLLYQFSN
---SYSDFSQQLKTAGIALSAAELHGFLTGLICGGIHDQSWQPLLFQFTN
-LPTYPSLALALSQQAVALTPAEMHGLISGMLCGGSKDNGWQTLVHDLTN
----YDEMNRFLNQQGAGLTPAEMHGLISGMICGGNNDSSWQPLLHDLTN
----YNEMNQYLNQQGTGLTPAEMHGLISGMICGGNDDSSWLPLLHDLTN
DNHAYPTGLVQPVTELYEQISQTLSDVEGFTFELGLTEDENVFTQADSLS
ENHAYPTALLQEVTQIQQHISKKLADIDGFDFELWLPENEDVFTRADALS
EGVAFPQALSLPLQQLHEATQEALEN-EGFMFQLLIPEGEDVFDRADALS
EGLAFGHELAQALRKMHAATSDALED-DGFLFQLYLPEDVSVFDRADALA
EGMAFGHELAQALRKMHSATSDALQD-DGFLFQLYLPDDVSVFDRADALA
DWANQFLLGIGLAQPELAKEKGEIGEAVDDLQDICQLGYDEDDNEEELAE
EWTNHFLLGLGLAQPKLDKEKGDIGEAIDDLHDICQLGYDESDDKEELSE
GWVNHFLLGLGMLQPKLAQVKDEVGEAIDDLRNIAQLGYDEDEDQEELAQ
GWVNHFLLGLGVTQPKLDKVTGETGEAIDDLRNIAQLGYDESEDQEELEM
GWVNHFLLGLGVTQPKLDKVTGETGEAIDDLRNIAQLGYDEDEDQEELEM
ALEEIIEYVRTIAMLFYSHFNEGEIESKPVLH
ALEEIIEYVRTLACLLFTHFQPQLPEQKPVLH
SLEEVVEYVRVAAILCHIEFTQQKPTAKPTLH
SLEEIIEYVRVAALLCHDTFTRQQPTAKPTLH
SLEEIIEYVRVAALLCHDTFTHPQPTAKPTLH
|
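The layout of the alignment is not specified beyond this example, so the following reader is only a sketch built on what the example suggests: a name line, a line with the number of sequences and the alignment length (5 and 182 above), and then interleaved blocks with one row per sequence. The function name parse_alignment is hypothetical, and the short two-sequence input is made up for illustration.

```python
from typing import List, Tuple


def parse_alignment(text: str) -> Tuple[str, List[str]]:
    """Read an interleaved alignment laid out like the example above.

    Assumed layout (inferred from the example, not from a specification):
    line 1 is the name, line 2 holds "<number of sequences> <length>",
    and the remaining rows cycle through the sequences block by block.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    name = lines[0]
    n_seq, length = (int(tok) for tok in lines[1].split())
    rows = lines[2:]
    if len(rows) % n_seq != 0:
        raise ValueError("row count is not a multiple of the sequence count")
    seqs = [""] * n_seq
    for i, row in enumerate(rows):
        seqs[i % n_seq] += row  # row i belongs to sequence i modulo n_seq
    if any(len(s) != length for s in seqs):
        raise ValueError("sequence length differs from the declared length")
    if "-" in seqs[0]:
        raise ValueError("the first sequence must not contain gaps")
    return name, seqs


# Made-up miniature alignment: 2 sequences, 12 columns, 2 blocks.
example = """T_demo
2 12
MKCAAC
-KCAGC
LLCGGT
LICG-T
"""
name, seqs = parse_alignment(example)
print(name, seqs)
```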
Output Format
The output contains:
- Query sequence.
- Positions of cysteines that are predicted to form disulfide bonds.
- Matrix of pair scores.
- Results of SS-bonding state predictions.
- The most probable pattern of pairs.
Example of output:
|
CYS_REC Version 2. Recognition of SS-bounded cysteines
>1AC5_
length=483
LPSSEEYKVAYELLPGLSEVPDPSNIPQMHAGHIPLRSEDADEQDSSDLEYFFWKFTNNDSNGNVDRPLIIWLNGGPGCSS
MDGALVESGPFRVNSDGKLYLNEGSWISKGDLLFIDQPTGTGFSVEQNKDEGKIDKNKFDEDLEDVTKHFMDFLENYFKIF
PEDLTRKIILSGESYAGQYIPFFANAILNHNKFSKIDGDTYDLKALLIGNGWIDPNTQSLSYLPFAMEKKLIDESNPNFKH
LTNAHENCQNLINSASTDEAAHFSYQECENILNLLLSYTRESSQKGTADCLNMYNFNLKDSYPSCGMNWPKDISFVSKFFS
TPGVIDSLHLDSDKIDHWKECTNSVGTKLSNPISKPSIHLLPGLLESGIEIVLFNGDKDLICNNKGVLDTIDNLKWGGIKG
FSDDAVSFDWIHKSKSTDDSEEFSGYVKYDRNLTFVSVYNASHMVPFDKSLVSRGIVDIYSNDVMIIDNNGKNVMITT
7 cysteines are found in positions: 79 251 271 293 308 345 386
Matrix of pair scores
POS:     79   251   271   293   308   345
 79:   -999   -21    -4     8    18   143
251:    -21  -999   155     7    -3   -12
271:     -4   155  -999    13   -20   -15
293:      8     7    13  -999   133    -8
308:     18    -3   -20   133  -999    -7
345:    143   -12   -15    -8    -7  -999
CYS 79 is SS-bounded Score= 56.7
CYS 251 is SS-bounded Score= 53.2
CYS 271 is SS-bounded Score= 47.0
CYS 293 is SS-bounded Score= 68.1
CYS 308 is SS-bounded Score= 63.9
CYS 345 is SS-bounded Score= 60.7
CYS 386 is not SS-bounded Score= -70.7
The most probable pattern of pairs: 79-345, 251-271, 293-308,
|
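For downstream processing, the per-cysteine lines and the final pattern line can be extracted from such a report. The sketch below relies only on the wording visible in the example output; the function name parse_report and the regular expressions are hypothetical, and other program versions may format these lines differently.

```python
import re
from typing import Dict, List, Tuple

# Patterns derived from the example report only (an assumption).
STATE_RE = re.compile(
    r"^CYS\s+(\d+)\s+is\s+(not\s+)?SS-bounded\s+Score=\s*(-?\d+(?:\.\d+)?)"
)
PAIR_RE = re.compile(r"(\d+)-(\d+)")
PATTERN_PREFIX = "The most probable pattern of pairs:"


def parse_report(text: str) -> Tuple[Dict[int, Tuple[bool, float]],
                                     List[Tuple[int, int]]]:
    """Return ({position: (is_bonded, score)}, [(pos1, pos2), ...])."""
    states: Dict[int, Tuple[bool, float]] = {}
    pairs: List[Tuple[int, int]] = []
    for line in text.splitlines():
        line = line.strip()
        match = STATE_RE.match(line)
        if match:
            bonded = match.group(2) is None  # no "not" means bonded
            states[int(match.group(1))] = (bonded, float(match.group(3)))
        elif line.startswith(PATTERN_PREFIX):
            tail = line[len(PATTERN_PREFIX):]
            pairs = [(int(a), int(b)) for a, b in PAIR_RE.findall(tail)]
    return states, pairs


# Lines copied from the example output above.
report = """CYS 79 is SS-bounded Score= 56.7
CYS 345 is SS-bounded Score= 60.7
CYS 386 is not SS-bounded Score= -70.7
The most probable pattern of pairs: 79-345, 251-271, 293-308,
"""
states, pairs = parse_report(report)
print(states, pairs)
```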
Performance
3000 positive and 3000 negative examples (i.e., ±10-position fragments surrounding bonded and non-bonded cysteines) were prepared from PDB sequences that were not used in training. The accuracy of SS-bonding state recognition by the combined function on this control set was ~90%.
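The ±10 fragments mentioned above can be cut from a sequence as in the following sketch. How the program treats cysteines closer than 10 positions to a sequence end is not stated; padding with '-' is an assumption made here, and the function name cysteine_windows is hypothetical.

```python
from typing import Dict


def cysteine_windows(seq: str, flank: int = 10, pad: str = "-") -> Dict[int, str]:
    """Map the 1-based position of each cysteine to its +/- flank fragment.

    Assumption: windows that run past a sequence end are padded with `pad`
    so that every fragment has length 2 * flank + 1.
    """
    seq = seq.upper()
    padded = pad * flank + seq + pad * flank
    # Residue i (0-based) sits at index i + flank in the padded string,
    # so its window starts at index i and spans 2 * flank + 1 characters.
    return {
        i + 1: padded[i:i + 2 * flank + 1]
        for i, residue in enumerate(seq)
        if residue == "C"
    }


# Toy sequence with a small flank to keep the output readable.
windows = cysteine_windows("ACDEFCG", flank=2)
print(windows)
```

With the default flank of 10, each fragment is 21 positions long with the cysteine in the middle.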