New and updated gene identification programs: Fgenesh+ and Prot_map in comparison with GeneWise.

Softberry has significantly improved its programs for gene prediction with protein support. The new Prot_map program can be used to generate a set of genes in a new organism, which can then be used to learn parameters for the gene prediction programs Fgenesh and Fgenesh+. It is also very useful for finding pseudogenes, by selecting the corrupted genes generated by mapping known proteins.

Speed of processing sequences

Test set                            Fgenesh+   Prot_map   GeneWise
88 sequences of genes < 20 kb       ~1 min     ~1 min     ~90 min
8 sequences of genes > 400000 kb    ~1 min     ~1 min     ~1200 min

Mapping a set of 55946 human proteins onto chromosome 19 (~59 Mb) with Prot_map takes just 90 min when only the best hit for each protein is reported, and 148 min when all significant hits for each protein are reported.
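As a rough illustration of what these timings imply, the following sketch converts the figures quoted above into throughput and speed-up values. The function names are hypothetical, and the inputs are the approximate ("~") run times from this page, so the results are order-of-magnitude estimates only.

```python
def throughput_per_min(items: int, minutes: float) -> float:
    """Average number of items processed per minute."""
    return items / minutes


def speedup(slow_minutes: float, fast_minutes: float) -> float:
    """How many times faster the fast run is than the slow run."""
    return slow_minutes / fast_minutes


if __name__ == "__main__":
    proteins = 55946  # human protein set mapped onto chromosome 19

    # Prot_map run times quoted above, in minutes.
    for label, minutes in [("best hit", 90), ("all significant hits", 148)]:
        rate = throughput_per_min(proteins, minutes)
        print(f"Prot_map, {label}: about {rate:.0f} proteins/min")

    # GeneWise versus Fgenesh+ / Prot_map (~1 min) on the two test sets.
    for label, genewise_min in [("genes < 20 kb", 90), ("long genes", 1200)]:
        print(f"{label}: roughly {speedup(genewise_min, 1):.0f}x faster than GeneWise")
```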

Accuracy comparison

The accuracy of ab initio gene prediction by Fgenesh was compared with that of prediction with protein support by Fgenesh+, GeneWise and Prot_map (mapping of a protein to human DNA). The comparison was done on a large set of human genes, using homologous mouse or Drosophila proteins. Fgenesh+ shows the best performance with mouse proteins. With Drosophila proteins, ab initio prediction by Fgenesh works better than GeneWise at the exon level for all ranges of similarity, and Fgenesh+ is the best predictor if similarity is higher than 60%.

Abbreviations: Sn ex, sensitivity at the exon level (exact exon predictions); Sno ex, sensitivity with exon overlap; Sp ex, specificity at the exon level; Sn nuc, sensitivity at the nucleotide level; Sp nuc, specificity at the nucleotide level; CC, correlation coefficient; %CG, percent of genes predicted completely correctly (no missing and no extra exons, and all exon boundaries predicted exactly).
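The measures listed above can be sketched in code. This page does not give formulas, so the sketch assumes the definitions conventionally used in gene-prediction evaluation: Sn = TP/(TP+FN), Sp = TP/(TP+FP), and CC as the correlation between annotated and predicted coding nucleotides. The function names and the exon representation (0-based, end-exclusive coordinate pairs) are illustrative assumptions, not the scoring code used to produce the tables.

```python
from math import sqrt


def nucleotide_metrics(true_exons, pred_exons, seq_len):
    """Return (Sn nuc, Sp nuc, CC) for one sequence of length seq_len."""
    true_pos = {p for start, end in true_exons for p in range(start, end)}
    pred_pos = {p for start, end in pred_exons for p in range(start, end)}
    tp = len(true_pos & pred_pos)   # coding, predicted coding
    fp = len(pred_pos - true_pos)   # non-coding, predicted coding
    fn = len(true_pos - pred_pos)   # coding, predicted non-coding
    tn = seq_len - tp - fp - fn     # non-coding, predicted non-coding
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tp / (tp + fp) if tp + fp else 0.0
    denom = sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    cc = (tp * tn - fn * fp) / denom if denom else 0.0
    return sn, sp, cc


def exon_metrics(true_exons, pred_exons):
    """Return (Sn ex, Sno ex, Sp ex) for one gene."""
    true_set, pred_set = set(true_exons), set(pred_exons)
    exact = len(true_set & pred_set)  # both boundaries match exactly
    # Annotated exons touched by at least one predicted exon.
    overlapped = sum(
        any(ps < te and ts < pe for ps, pe in pred_set) for ts, te in true_set
    )
    sn_ex = exact / len(true_set) if true_set else 0.0
    sno_ex = overlapped / len(true_set) if true_set else 0.0
    sp_ex = exact / len(pred_set) if pred_set else 0.0
    return sn_ex, sno_ex, sp_ex


def gene_completely_correct(true_exons, pred_exons):
    """True if there are no missing or extra exons and all boundaries match."""
    return set(true_exons) == set(pred_exons)


if __name__ == "__main__":
    # Toy gene: the second exon is predicted with a wrong start boundary.
    annotated = [(100, 200), (300, 400), (500, 650)]
    predicted = [(100, 200), (310, 400), (500, 650)]
    print(nucleotide_metrics(annotated, predicted, seq_len=1000))
    print(exon_metrics(annotated, predicted))
    print(gene_completely_correct(annotated, predicted))
```

In this toy case a single shifted boundary barely changes the nucleotide-level scores but lowers exact exon sensitivity to two out of three and makes the gene count as not completely correct, which is why %CG is the strictest measure in the tables. %CG would then be the percentage of genes in a test set for which `gene_completely_correct` is true.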

Gene prediction with mouse protein support:

1. Similarity level > 90% - 921 sequences

Program     Sn ex   Sno ex   Sp ex   Sn nuc   Sp nuc   CC       %CG
Fgenesh     86.2    91.7     88.6    93.9     93.4     0.9334   34
GeneWise    93.9    97.6     95.9    99.0     99.6     0.9926   66
Fgenesh+    97.3    98.9     98.0    99.1     99.6     0.9936   81
Prot_map    95.9    98.3     96.9    99.1     99.5     0.9924   73

2. 80% < similarity level < 90% - 1441 sequences

Program     Sn ex   Sno ex   Sp ex   Sn nuc   Sp nuc   CC       %CG
Fgenesh     85.8    92.1     87.7    94.0     93.4     0.9334   30
GeneWise    92.6    98.0     94.1    98.9     99.5     0.9912   58
Fgenesh+    96.8    99.0     97.2    99.1     99.5     0.9929   77
Prot_map    93.9    98.5     94.1    98.9     99.3     0.9898   60

3. 60% < similarity level < 80% - 1425 sequences

Program     Sn ex   Sno ex   Sp ex   Sn nuc   Sp nuc   CC       %CG
Fgenesh     83.4    90.9     86.8    93.2     94.9     0.937    30
GeneWise    88.1    96.5     90.5    97.8     99.2     0.984    43
Fgenesh+    93.9    97.9     94.9    98.4     99.3     0.988    65
Prot_map    87.0    96.5     86.6    97.0     98.5     0.976    40

4. 0% < similarity level < 60% - 259 sequences

Program     Sn ex   Sno ex   Sp ex   Sn nuc   Sp nuc   CC       %CG
Fgenesh     78.7    88.0     82.3    91.8     93.5     0.921    33
GeneWise    74.8    92.1     79.5    95.6     98.6     0.963    24
Fgenesh+    84.8    94.0     87.5    95.9     98.4     0.969    48
Prot_map    65.8    91.6     61.5    90.3     96.8     0.930    19

Gene prediction with Drosophila protein support (similarity ranging from 22% to 98%, coverage of both proteins > 75%):

1. Similarity level > 80% - 66 sequences

Program     Sn ex   Sno ex   Sp ex   Sn nuc   Sp nuc   CC       %CG
Fgenesh     90.5    93.8     95.1    97.9     96.9     0.950    55
GeneWise    79.3    83.9     86.8    97.3     99.5     0.985    23
Fgenesh+    95.1    97.8     97.0    98.9     99.5     0.9914   70
Prot_map    86.4    95.3     88.1    97.6     99.0     0.982    41

2. 60% < similarity level < 80% - 290 sequences

Program     Sn ex   Sno ex   Sp ex   Sn nuc   Sp nuc   CC       %CG
Fgenesh     88.6    93.1     90.8    94.9     93.8     0.941    34
GeneWise    76.3    91.8     82.9    92.8     99.4     0.959    7
Fgenesh+    89.2    94.4     92.7    95.5     98.5     0.968    44
Prot_map    75.1    92.5     74.9    91.4     97.5     0.941    10

3. 40% < similarity level < 60% - 653 sequences

Program     Sn ex   Sno ex   Sp ex   Sn nuc   Sp nuc   CC       %CG
Fgenesh     86.3    91.8     88.4    93.6     92.8     0.917    30
GeneWise    64.5    85.2     75.1    84.9     98.5     0.911    1
Fgenesh+    78.2    89.5     82.8    89.5     96.3     0.925    20
Prot_map    48.1    81.0     44.8    73.6     91.4     0.811    1