
FGENESH_GC - Program for predicting multiple genes in genomic DNA sequences

A version of the FGENESH program that accepts the NONCANONICAL GC dinucleotide in donor splice sites is available for on-line use.
This program is useful for analyzing ALTERNATIVE gene structures, where non-standard splice sites are often found (see also the FGENES-M program for predicting alternative gene variants), and for creating A SET of GENES and PROTEINS that are absent from standard gene predictions.
The GC donor splice site accounts for the major part of non-standard splice sites in human genes. It represents about 0.6% of all splice sites and is observed in more than 5% of human genes. Gene predictions on large-scale genomic sequences will therefore contain hundreds of GC-donor exons and require programs that can predict most of them. We recently investigated noncanonical splice sites (Burset, Seledtsov and Solovyev, 2000, Nucleic Acids Res., 28(21), 4364-4375) and obtained about 20000 EST-verified splice sites. We obtained a very strong GC-donor site weight matrix, which is used in the gene prediction program. We developed this variant of the program to predict GC-donor exons in addition to standard exons while preserving the accuracy of the program on standard genes. Testing the program on 68 human genes with at least one GC donor site shows that FGENESH (GC) provides a 10% higher rate of exact exon prediction for this group and 5% higher accuracy at the nucleotide level.
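
To illustrate why allowing GC donors enlarges the search space, the following minimal Python sketch lists every position where a candidate donor dinucleotide (GT or GC) occurs in a sequence. The function name `find_donor_candidates` and its parameters are hypothetical and are not part of FGENESH. The real program scores candidate sites with a weight matrix rather than accepting every occurrence, and that matrix is not reproduced here.

```python
def find_donor_candidates(seq, dinucleotides=("GT", "GC")):
    """Return (1-based position, dinucleotide) for each candidate donor start.

    Illustrative only: every GT or GC is reported, with no scoring.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 1):
        di = seq[i:i + 2]
        if di in dinucleotides:
            hits.append((i + 1, di))
    return hits


# Compare the canonical-only candidate set with the GT + GC candidate set.
example = "AAGTAAGCAA"
canonical_only = find_donor_candidates(example, dinucleotides=("GT",))
with_gc = find_donor_candidates(example)
print(len(canonical_only), len(with_gc))
```

Because most GC occurrences are not real donor sites, a simple scan like this is only a starting point; a strong site model is needed to keep accuracy on standard genes.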

Click Human parameters and the FGENESH_GC button. Paste your sequence into the window or load a file containing your sequence in FASTA format.

Solovyev V.V. (2001) Statistical approaches in eukaryotic gene prediction. In: Handbook of Statistical Genetics (eds. Balding D. et al.), John Wiley & Sons, Ltd., p. 83-127.

Fgenesh_GC output:
(In this example the 2nd exon, which has a GC-donor site, is found; it is lost by standard gene finders.)
G - predicted gene number, starting from start of sequence;
Str - DNA strand (+ for direct or - for complementary);
Feature - type of coding sequence: CDSf - first coding segment (starting with the start codon), CDSi - internal exon, CDSl - last coding segment (ending with the stop codon);
TSS - Position of transcription start (TATA-box position and score);
Start and End - Position of the Feature;
Weight - log likelihood*10 score for the feature (shown in the Score column of the output);
ORF - start/end positions where the first complete codon starts and the last codon ends.
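
The column layout described above can be read programmatically. The following is a minimal Python sketch, assuming the CDS rows always follow the whitespace-separated layout of the sample output on this page. The `Exon` type, the `ROW` pattern and `parse_exons` are hypothetical helper names, not part of any Softberry tool, and rows of other feature types (such as TSS) are simply skipped.

```python
import re
from typing import List, NamedTuple


class Exon(NamedTuple):
    gene: int        # G: predicted gene number
    strand: str      # Str: "+" or "-"
    number: int      # exon number within the gene
    feature: str     # CDSf, CDSi or CDSl
    start: int       # Start of the feature
    end: int         # End of the feature
    score: float     # Score column
    orf_start: int   # first base of the first complete codon
    orf_end: int     # last base of the last complete codon
    orf_len: int     # Len column


# One CDS row, e.g. "  1 +   1 CDSf     501 -     580     15.57     501 -     578     78"
ROW = re.compile(
    r"^\s*(\d+)\s+([+-])\s+(\d+)\s+(CDS[fil])"
    r"\s+(\d+)\s+-\s+(\d+)"
    r"\s+(-?\d+(?:\.\d+)?)"
    r"\s+(\d+)\s+-\s+(\d+)"
    r"\s+(\d+)\s*$"
)


def parse_exons(report: str) -> List[Exon]:
    """Collect every CDS row found in the text of a report."""
    exons = []
    for line in report.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # header, blank line, protein sequence, other features
        g, strand, num, feat, start, end, score, o1, o2, olen = m.groups()
        exons.append(Exon(int(g), strand, int(num), feat, int(start),
                          int(end), float(score), int(o1), int(o2), int(olen)))
    return exons


sample = """
  G Str Feature    Start     End   Score        ORF           Len

  1 +   1 CDSf     501 -     580     15.57     501 -     578     78
  1 +   2 CDSi     747 -     853     22.53     748 -     852    105
"""
for exon in parse_exons(sample):
    print(exon.feature, exon.start, exon.end, exon.score)
```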


fgeneshgc  Wed Jan 30 20:59:27 EST 2002
 FGENESH (with GC possible donor site) Gene prediction in Human      genomic DNA
 Time:   Wed Jan 30 20:59:27 2002
 Seq name: Softberry SERVER PAST Sequence 
 Length of sequence:  2932  GC content: 65 Zone: 4
 Number of predicted genes 1 in +chain 1 in -chain 0
 Number of predicted exons 5 in +chain 5 in -chain 0
 Positions of predicted genes and exons:
  G Str Feature    Start     End   Score        ORF           Len

  1 +   1 CDSf     501 -     580     15.57     501 -     578     78
  1 +   2 CDSi     747 -     853     22.53     748 -     852    105
  1 +   3 CDSi    1847 -    1980     17.97    1849 -    1980    132
  1 +   4 CDSi    2255 -    2333     10.88    2255 -    2332     78
  1 +   5 CDSl    2563 -    2705     15.94    2565 -    2705    141

Predicted protein(s):
>FGENESH   1   5 exon (s)    501  -   2705    180 aa, chain +
MADSELQLVEQRIRSFPDFPTPGVVFRDISPVLKDPASFRAAIGLLARHLKATHGGRIDY
IAGLDSRGFLFGPSLAQELGLGCVLIRKRGKLPGPTLWASYSLEYGKAELEIQKDALEPG
QRVVVVDDLLATGGTMNAACELLGRLQAEVLECVSLVELTSLKGREKLAPVPFFSLLQYE
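
As a quick consistency check on the example above, the summed exon lengths should account for the reported protein length. The Python sketch below assumes 1-based inclusive coordinates and a complete gene (CDSf through CDSl) whose last codon is the stop codon. The function names are hypothetical.

```python
# Start/End coordinates of the five predicted exons in the example output.
exons = [(501, 580), (747, 853), (1847, 1980), (2255, 2333), (2563, 2705)]


def coding_length(exons):
    """Total number of coding bases, with inclusive coordinates."""
    return sum(end - start + 1 for start, end in exons)


def expected_protein_length(exons):
    """Number of amino acids, excluding the final stop codon."""
    total = coding_length(exons)
    if total % 3 != 0:
        raise ValueError("coding length is not a multiple of 3")
    return total // 3 - 1


print(coding_length(exons))            # 543 bases
print(expected_protein_length(exons))  # 180 amino acids
```

The 543 coding bases give 181 codons, that is, 180 amino acids plus the stop codon, which matches the "180 aa" reported in the protein header.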