Annotation of Plant Genomes: Genes, Promoters, Functional Motifs, Protein Sub-Cellular Localization. Softberry Software and Services:

  • Ab initio gene prediction: Fgenesh (3,180 citations reporting its use in genomic projects, according to Google Scholar), with a total of 506 genome-specific parameter sets for various organisms, including 42 for plants.
  • Gene prediction supported by similar proteins: Fgenesh+
  • Automatic genome annotation pipeline Fgenesh++, which maps known genes (mRNAs) with Est_map and database proteins with Prot_map, then performs evidence-based gene prediction with Fgenesh+.
  • Fgenesh++est: Annotation pipeline that incorporates EST mapping for improved predictions, including the actual noncoding 5' and 3' gene ends, when possible.
  • PGF: Pseudogene finding program.
  • Togenbank: A set of scripts for converting the output of FGENESH or the annotation pipeline to GenBank and Sequin formats, for visualization in popular viewers and for submission to GenBank.
  • TSSP: Promoter prediction program for plant genes.
  • NsitePL: Search tool for functional motifs in plant promoter sequences.
  • RegSite DB: Database of plant functional motifs with 1816 entries.
  • ProtCompPL: Program for predicting protein localization in cellular compartments.
  • FindmiRNA and TargetmiRNA: programs for finding miRNAs and their targets.
  • Genome Explorer: Interactive genome viewer with search capabilities.
  • Other programs: Finding new regulatory motifs, analysis of expression data, genome comparison, etc.
  • Services: Annotation of new genomes; development of customized pipelines and custom genome-specific parameters for gene finders; compilation of specific gene families in particular genomes (for example, all cytochrome P450 genes), etc.

Evaluations:

Plant Molecular Biology (2005), 57, 3, 445-460: "Five ab initio programs (FGENESH, GeneMark.hmm, GENSCAN, GlimmerR and Grail) were evaluated for their accuracy in predicting maize genes. FGENESH yielded the most accurate and GeneMark.hmm the second most accurate predictions" (FGENESH identified 11% more correct gene models than GeneMark on a set of 1353 test genes).

Science (2002) 296, 79-92: As part of the rice genome sequencing project, the team led by the Beijing Genomics Institute compared several well-known ab initio gene prediction programs and showed that FGENESH was by far the most accurate. As a result, their rice genome annotation was based almost exclusively on FGENESH results.

Applications in plant genome analysis (selected publications):

Fgenesh group software: Nature (2008) 452, 991-996; Nature (2008), doi:10.1038/nature07410; Plant Physiology (2008) 146, 940-951; Plant Physiology (2008) 146, 200-212; Molecular Plant (2008) 1(3), 471-481; Nature Biotechnology (2007) 25, 930-937; Plant Physiology (2007) 144, 623-636; The Plant Cell (2006) 18, 1339-1347; Nature (2005) 436, 793-800; Nature Genetics (2005) 37, 997-1002; Plant Physiology (2005) 139, 1612-1624; Nature Biotechnology (2005) 23, 482-487; Science (2003) 300, 1566-1569; Nature (2002) 420, 316-320; Science (2002) 296, 79-92; Science (2002) 296, 92-100. A Google Scholar search for the keywords "Fgenesh" and "plant" yields ~500 research publications that report using Fgenesh software for gene finding in plants (Arabidopsis, rice, maize, banana, Medicago, poplar, tomato, and many other genomes).
TSSP: Plant Biotechnology Journal (2007) 5(5), 664-674; Genetics (2007) 176, 2541-2549; The Plant Cell (2006) 18, 2929-2945; Genome (2006) 49(3), 209-218; Genetica (2006) 128, 395-407; Bioinformatics (2005) 21(14), 3074-3081; BMC Bioinformatics (2005) 6, 114, doi:10.1186/1471-2105-6-114; The Plant Journal (2004) 37(4), 517-527; Plant Physiology (2004) 136, 3023-3033.
NsitePL: Journal of Experimental Botany (2008) 59(8), 2043-2056; Plant Pathology (2008) 57(1), 92-102; Plant Physiology (2007) 144, 1786-1796; In Silico Biology (2007) 7(1), 7-19; The Plant Cell (2007) 19, 1278-1294; Biochimica et Biophysica Acta (BBA) (2007) 1769(2), 139-148; Plant Molecular Biology (2006) 60(2), 269-275; The Plant Cell (2006) 18, 2443-2451.
ProtCompPL: Journal of Plant Physiology (2007) 164(3), 350-363; In Silico Biology (2007) 7(1), 7-19; Journal of Experimental Botany (2006) 57(14), 3767-3779; Molecular Biology Reports (2006) 33(4), 279-285; MPMI (2006) 19(10), 1055-1061; Genetics and Molecular Biology (2005) 28(3 suppl), 529-538; MPMI (2004) 17(7), 789-797; Plant Physiology (2004) 134, 286-295; Journal of Cellular Biochemistry (2003) 90, 361-378.

Fgenesh, TSSP, NsitePL and ProtCompPL have also been used in numerous patents and patent applications.