The program can be used if you know an mRNA/EST sequence that is homologous to that of the predicted gene. First, run any ab initio gene-finding program, such as FGENES or FGENESH. Then run a BLAST database search with each predicted exon. If a homologous mRNA is found, use it to improve the accuracy of the assembly of your predicted gene.
Ab initio gene prediction programs usually predict a significant fraction of the exons in a gene correctly, but they often assemble the gene incorrectly: they combine several genes into one, split one gene into several, skip exons, or include false exons. Using mRNA homology information provided by one or several correctly predicted exons can significantly improve the accuracy of gene finding.
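The step of running a BLAST search with each predicted exon needs one query sequence per exon. A minimal sketch of preparing those queries is shown below, assuming exon coordinates are 1-based and inclusive, as in the output tables of the gene finders. The function names and the FASTA header layout are hypothetical, chosen only for illustration; they are not part of FGENESH_C or BLAST.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(table)[::-1]


def exon_fasta_records(genomic, exons, name="query"):
    """Build one FASTA record per predicted exon.

    genomic -- the genomic DNA sequence as a string
    exons   -- list of (start, end, strand) tuples; coordinates are
               assumed to be 1-based and inclusive
    name    -- hypothetical prefix used in the FASTA headers
    """
    records = []
    for number, (start, end, strand) in enumerate(exons, 1):
        # Convert 1-based inclusive coordinates to a Python slice.
        sub = genomic[start - 1:end]
        if strand == "-":
            # Exons on the complementary strand are reported in
            # coordinates of the direct strand, so flip them.
            sub = reverse_complement(sub)
        header = f">{name}_exon{number}_{start}_{end}_{strand}"
        records.append(f"{header}\n{sub}")
    return records


if __name__ == "__main__":
    demo = exon_fasta_records("AACCGGTTAC", [(3, 6, "+"), (7, 10, "-")])
    print("\n".join(demo))
```

Each record can then be saved to a file and submitted as a separate BLAST query against an mRNA/EST database.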
Program use and output are similar to those of FGENESH+:
G - predicted gene number, starting from start of sequence;
Str - DNA strand (+ for direct or - for complementary);
Feature - type of coding sequence: CDSf - first coding segment (starting with start codon), CDSi - internal exon, CDSl - last coding segment (ending with stop codon);
TSS - Position of transcription start (TATA-box position and score);
Start and End - Position of the Feature;
Weight - Log likelihood*10 score for the feature;
ORF - start/end positions where the first complete codon starts and the last codon ends;
Last three values - length of exon, positions in protein, percent of similarity with target protein.
 FGENESH_C Prediction of potential genes in Human genomic DNA
 Time: Tue Nov 7 15:50:03 2000
 Seq name: HUMSFRS_8213_DNA_14-FEB-1996
 cDNA - >HUMSFRS_8213_DNA_14-FEB-1996 Length 817 Sim: 95
 Length of sequence: 6423 GC content: 43 Zone: 1
 Number of predicted genes 1 in +chain 1 in -chain 0
 Number of predicted exons 8 in +chain 8 in -chain 0
 Positions of predicted genes and exons:

   G Str   Feature   Start        End    Score           ORF           Len

   1 +    1 CDSi       105 -       178   33.09       106 -       177     72      5 -    78  100
   1 +    2 CDSi      1213 -      1393  135.18      1215 -      1391    177     79 -   259  100
   1 +    3 CDSi      1702 -      1878  105.94      1703 -      1876    174    260 -   436  100
   1 +    4 CDSi      2754 -      2828   34.63      2755 -      2826     72    437 -   511  100
   1 +    5 CDSi      3250 -      3360   46.17      3251 -      3358    108    512 -   622  100
   1 +    6 CDSi      4659 -      4712   23.18      4660 -      4710     51    623 -   676  100
   1 +    7 CDSi      5227 -      5262   25.79      5228 -      5260     33    677 -   712  100
   1 +    8 CDSl      6219 -      6273   19.89      6220 -      6273     54    713 -   767  100

 Predicted protein(s):
 >FGENESH_C 1 8 exon (s) 105 - 6273 253 aa, chain +
 PGRCLLKSRARGSVIMSRYGRYGGETKVYVGNLGTGAGKGELERAFSYYGPLRTVWIARN
 PPGFAFVEFEDPRDAEDAVRGLDGKVICGSRVRVELSTGMPRRSRFDRPPARRPFDPNDR
 CYECGEKGHYAYDCHRYSRRRRSRSRSRSHSRSRGRRYSRSRSRSRGRRSRSASPRRSRS
 ISLRRSRSASLRRSRSGSIKGSRYFQSPSRSRSRSRSISRPRSSRSKSRSPSPKRSRSPS
 GSPRRSASPERMD
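
The exon lines of the sample output can also be read programmatically. The sketch below assumes that every exon line has the same thirteen whitespace-separated fields as in the example (gene number, strand, exon number, feature, start - end, score, ORF start - end, length, and the last three values). The class name, field names and regular expression are hypothetical and are derived only from this one sample; lines of any other kind are simply skipped.

```python
import re
from typing import NamedTuple, Optional


class Exon(NamedTuple):
    """One exon line of the output table (field names are illustrative)."""
    gene: int
    strand: str
    number: int
    feature: str
    start: int
    end: int
    score: float
    orf_start: int
    orf_end: int
    length: int
    hit_start: int
    hit_end: int
    similarity: int


# Layout inferred from the sample output only; other feature types
# that the program may print are not handled here.
EXON_RE = re.compile(
    r"^\s*(\d+)\s+([+-])\s+(\d+)\s+(CDS[fil])\s+"
    r"(\d+)\s+-\s+(\d+)\s+(-?\d+\.\d+)\s+"
    r"(\d+)\s+-\s+(\d+)\s+(\d+)\s+"
    r"(\d+)\s+-\s+(\d+)\s+(\d+)\s*$"
)


def parse_exon_line(line: str) -> Optional[Exon]:
    """Return an Exon for an exon line, or None for any other line."""
    m = EXON_RE.match(line)
    if m is None:
        return None
    g = m.groups()
    return Exon(
        int(g[0]), g[1], int(g[2]), g[3],
        int(g[4]), int(g[5]), float(g[6]),
        int(g[7]), int(g[8]), int(g[9]),
        int(g[10]), int(g[11]), int(g[12]),
    )


if __name__ == "__main__":
    sample = "1 + 1 CDSi 105 - 178 33.09 106 - 177 72 5 - 78 100"
    print(parse_exon_line(sample))
```

In the sample rows, the Len value equals ORF end minus ORF start plus one (for the first exon, 177 - 106 + 1 = 72) and is a multiple of three, which gives a simple consistency check on parsed lines.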