FGENESH_C - Program for predicting multiple genes in genomic DNA sequences using an HMM gene model plus similarity with a known mRNA/EST.

The program can be used when you know an mRNA/EST sequence that is homologous to the predicted gene. First, run an ab initio gene-finding program such as FGENES or FGENESH. Then run a BLAST database search with each predicted exon. If a homologous mRNA is found, use it to improve the accuracy of assembly of your predicted gene.

Ab initio gene prediction programs usually predict a significant fraction of the exons in a gene correctly, but they often assemble the gene incorrectly: they combine several genes or split one gene into several, skip exons, or include false exons. Using mRNA homology information obtained from one or several correctly predicted exons can significantly improve the accuracy of gene finding.

Program use and output are similar to those of FGENESH+:
G - predicted gene number, starting from start of sequence;
Str - DNA strand (+ for direct or - for complementary);
Feature - type of coding sequence: CDSf - first coding segment (starting with the start codon), CDSi - internal exon, CDSl - last coding segment (ending with the stop codon);
TSS - Position of transcription start (TATA-box position and score);
Start and End - Position of the Feature;
Weight - Log likelihood*10 score for the feature (the Score column in the output);
ORF - start/end positions where the first complete codon starts and the last codon ends;
Last three values - length of the ORF segment (the Len column), positions in the homologous mRNA/EST, percent of similarity with the homologous mRNA/EST.
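
The exon lines of the output can be read into named fields for further processing. The following is a minimal Python sketch, assuming the thirteen-value layout seen in the sample output below (gene number, strand, exon number, feature, start - end, score, ORF start - end, length, matched positions from - to, percent similarity). The layout is inferred from that sample, not from a format specification, and the names EXON_RE and parse_exon_line are hypothetical and not part of FGENESH_C.

```python
import re

# One exon line: gene, strand, exon number, feature, start - end, score,
# ORF start - end, length, matched positions (from - to), percent similarity.
# Assumption: this column order is inferred from sample output only.
EXON_RE = re.compile(
    r"^\s*(?P<gene>\d+)\s+(?P<strand>[+-])\s+(?P<exon>\d+)\s+(?P<feature>CDS\w)\s+"
    r"(?P<start>\d+)\s+-\s+(?P<end>\d+)\s+(?P<score>-?\d+(?:\.\d+)?)\s+"
    r"(?P<orf_start>\d+)\s+-\s+(?P<orf_end>\d+)\s+(?P<length>\d+)\s+"
    r"(?P<hit_start>\d+)\s+-\s+(?P<hit_end>\d+)\s+(?P<similarity>\d+)\s*$"
)

TEXT_FIELDS = ("strand", "feature")


def parse_exon_line(line):
    """Return a dict of typed fields for one exon line, or None if it does not match."""
    match = EXON_RE.match(line)
    if match is None:
        return None
    fields = {}
    for key, value in match.groupdict().items():
        if key in TEXT_FIELDS:
            fields[key] = value          # keep strand and feature as strings
        elif key == "score":
            fields[key] = float(value)   # score has a decimal part
        else:
            fields[key] = int(value)     # all coordinates and counts are integers
    return fields


if __name__ == "__main__":
    sample = "  1 +   1 CDSi     105 -     178     33.09     106 -     177     72      5  -     78  100"
    exon = parse_exon_line(sample)
    print(exon["feature"], exon["start"], exon["end"], exon["score"])
```

Lines that do not look like exon lines (headers, the protein sequence) return None, so the function can be applied to every line of a report and the None results discarded.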

FGENESH_C Prediction of potential genes in Human genomic DNA
 Time:   Tue Nov  7 15:50:03 2000
 Seq name: HUMSFRS_8213_DNA_14-FEB-1996
 cDNA - >HUMSFRS_8213_DNA_14-FEB-1996 Length  817 Sim: 95
 Length of sequence:  6423  GC content: 43 Zone: 1
 Number of predicted genes 1 in +chain 1 in -chain 0
 Number of predicted exons 8 in +chain 8 in -chain 0
 Positions of predicted genes and exons:
  G Str Feature    Start     End   Score        ORF           Len

  1 +   1 CDSi     105 -     178     33.09     106 -     177     72      5  -     78  100
  1 +   2 CDSi    1213 -    1393    135.18    1215 -    1391    177     79  -    259  100
  1 +   3 CDSi    1702 -    1878    105.94    1703 -    1876    174    260  -    436  100
  1 +   4 CDSi    2754 -    2828     34.63    2755 -    2826     72    437  -    511  100
  1 +   5 CDSi    3250 -    3360     46.17    3251 -    3358    108    512  -    622  100
  1 +   6 CDSi    4659 -    4712     23.18    4660 -    4710     51    623  -    676  100
  1 +   7 CDSi    5227 -    5262     25.79    5228 -    5260     33    677  -    712  100
  1 +   8 CDSl    6219 -    6273     19.89    6220 -    6273     54    713  -    767  100

Predicted protein(s):
>FGENESH_C   1   8 exon (s)    105  -   6273    253 aa, chain +
PGRCLLKSRARGSVIMSRYGRYGGETKVYVGNLGTGAGKGELERAFSYYGPLRTVWIARN
PPGFAFVEFEDPRDAEDAVRGLDGKVICGSRVRVELSTGMPRRSRFDRPPARRPFDPNDR
CYECGEKGHYAYDCHRYSRRRRSRSRSRSHSRSRGRRYSRSRSRSRGRRSRSASPRRSRS
ISLRRSRSASLRRSRSGSIKGSRYFQSPSRSRSRSRSISRPRSSRSKSRSPSPKRSRSPS
GSPRRSASPERMD
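
The coordinates in the sample output above can be cross-checked against each other. The Python sketch below hard-codes the eight exon lines and checks that each exon covers as many bases in the genomic sequence as in the matched cDNA positions, that the cDNA positions are contiguous, and that the total coding length agrees with the reported 253 aa protein. It assumes that the first exon (labeled CDSi) begins with one base of an incomplete codon, as its ORF start of 106 suggests, and that the last exon ends with a stop codon; the variable names are illustrative only.

```python
# (start, end, orf_start, orf_end, cdna_from, cdna_to), copied from the sample output.
EXONS = [
    (105, 178, 106, 177, 5, 78),
    (1213, 1393, 1215, 1391, 79, 259),
    (1702, 1878, 1703, 1876, 260, 436),
    (2754, 2828, 2755, 2826, 437, 511),
    (3250, 3360, 3251, 3358, 512, 622),
    (4659, 4712, 4660, 4710, 623, 676),
    (5227, 5262, 5228, 5260, 677, 712),
    (6219, 6273, 6220, 6273, 713, 767),
]


def span(first, last):
    """Number of positions in an inclusive coordinate range."""
    return last - first + 1


genomic_lengths = [span(e[0], e[1]) for e in EXONS]
cdna_lengths = [span(e[4], e[5]) for e in EXONS]

# Each exon should cover the same number of bases in both coordinate systems.
assert genomic_lengths == cdna_lengths

# Consecutive exons should map to adjacent stretches of the cDNA.
for previous, current in zip(EXONS, EXONS[1:]):
    assert current[4] == previous[5] + 1

coding_bases = sum(genomic_lengths)
# Assumption: bases before the first ORF start belong to an incomplete codon.
skipped = EXONS[0][2] - EXONS[0][0]
codons, remainder = divmod(coding_bases - skipped, 3)
# Assumption: the final codon of the last exon is the stop codon.
protein_length = codons - 1

print(coding_bases, codons, remainder, protein_length)  # 763 254 0 253
```

Under these assumptions the exons hold 763 coding bases, which give 254 complete codons and, without the stop codon, the 253 aa stated in the protein header.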