Annotation of Plant Genomes: Genes, Promoters, Functional Motifs, Protein
Sub-Cellular Localization. Softberry Software and Services:
- Ab initio gene prediction: Fgenesh (3180 citations for use in genomic projects, according to Google Scholar), with a total of 506 genome-specific parameter sets for various organisms, including 42 for plants.
- Gene prediction supported by similar proteins: Fgenesh+
- Automatic genome annotation pipeline: Fgenesh++ (see flowchart), which includes
mapping of known genes (mRNAs) by Est_map and mapping of database proteins by Prot_map,
with subsequent evidence-based gene prediction by Fgenesh+.
- Fgenesh++est: Annotation pipeline that incorporates EST mapping
for improved predictions, including the actual noncoding 5' and 3' gene
ends where possible.
- PGF: Pseudogene finding program.
- Togenbank: A set of scripts for converting the output of FGENESH or the annotation
pipeline to GenBank and Sequin formats, for visualization in popular
viewers and submission to GenBank.
- TSSP: Promoter prediction program for plant genes.
- NsitePL: Search tool for functional motifs in plant promoter sequences.
- RegSite DB: Database of plant functional motifs with 1816 entries.
- ProtCompPL: Program for predicting protein localization in cellular compartments.
- FindmiRNA and TargetmiRNA: Programs for finding miRNAs and their targets.
- Genome Explorer: Interactive genome viewer with search capabilities.
- Other programs: Finding new regulatory motifs, analysis of
expression data, comparison of genomes, etc.
- Services: Annotation of new genomes, development of customized
pipelines and custom genome-specific parameters for gene finders; compilation
of specific genes in certain genomes (for example, all cytochrome P450
genes) etc.
Evaluations:
Plant Molecular Biology (2005), 57, 3, 445-460:
"Five ab initio programs (FGENESH, GeneMark.hmm, GENSCAN,
GlimmerR and Grail) were evaluated for their accuracy in predicting maize
genes. FGENESH yielded the most accurate and GeneMark.hmm the second
most accurate predictions" (FGENESH identified 11% more correct gene models
than GeneMark on a set of 1353 test genes).
Science (2002) 296, 79-92: As part of the rice genome sequencing project,
a team led by the Beijing Genomics Institute compared several well-known
ab initio gene prediction programs and showed that FGENESH
was by far the most accurate. As a result, their rice genome annotation
was based almost exclusively on FGENESH results.
Applications in plant genome analysis (selected publications):
Fgenesh group software:
Nature (2008) 452, 991-996;
Nature (2008) doi:10.1038/nature07410;
Plant Physiology (2008) 146, 940-951;
Plant Physiology (2008) 146, 200-212;
Molecular Plant (2008) 1(3), 471-481;
Nature Biotechnology (2007) 25, 930-937;
Plant Physiology (2007) 144, 623-636; The Plant Cell (2006) 18, 1339-1347;
Nature (2005) 436, 793-800; Nature Genetics (2005) 37, 997-1002; Plant Physiology (2005) 139, 1612-1624;
Nature Biotechnology (2005) 23, 482-487; Science (2003) 300, 1566-1569; Nature (2002) 420, 316-320;
Science (2002) 296, 79-92; Science (2002) 296, 92-100.
A Google Scholar search for the keywords "Fgenesh" and "plant"
yields ~500 research publications that report using Fgenesh
software for gene finding in plants (Arabidopsis, rice, maize, banana,
Medicago, poplar, tomato, and many other genomes).
TSSP: Plant Biotechnology Journal (2007) 5, 5, 664-674; Genetics (2007) 176, 2541-2549;
The Plant Cell (2006) 18, 2929-2945; Genome (2006) 49, 3, 209-218; Genetica (2006) 128, 395-407;
Bioinformatics (2005) 21, 14, 3074-3081; BMC Bioinformatics (2005) 6, 114, doi:10.1186/1471-2105-6-114;
The Plant Journal (2004) 37 (4), 517-527; Plant Physiology (2004) 136, 3023-3033.
NsitePL:
Journal of Experimental Botany (2008) 59(8), 2043-2056;
Plant Pathology (2008) 57(1), 92-102;
Plant Physiology (2007) 144, 1786-1796; In Silico Biology (2007) 7, 1, 7-19;
The Plant Cell (2007) 19, 1278-1294;
Biochimica et Biophysica Acta (BBA) (2007) 1769, 2, 139-148; Plant Molecular Biology (2006) 60, 2, 269-275;
The Plant Cell (2006) 18, 2443-2451.
ProtCompPL: Journal of Plant Physiology (2007) 164, 3, 350-363;
In Silico Biology (2007) 7, 1, 7-19; Journal of Experimental Botany (2006) 57, 14, 3767-3779;
Molecular Biology Reports (2006) 33, 4, 279-285; MPMI (2006) 19, 10, 1055-1061;
Genetics and Molecular Biology (2005) 28, 3 (suppl), 529-538; MPMI (2004) 17, 7, 789-797;
Plant Physiology (2004) 134, 286-295; Journal of Cellular Biochemistry (2003) 90, 361-378.
Fgenesh, TSSP, NsitePL and ProtCompPL have
also been used in numerous patents and patent applications.