NSITE can be used to analyze regulatory regions and the composition of their functional motifs.
The method is based on a statistical estimate of the expected number of occurrences of a nucleotide consensus pattern in a given sequence [1-3]. NSITE-PL searches for statistically significant functional motifs in plant promoter and regulatory sequences. The plant functional motifs are selected from the RegSite Database, developed by Softberry Inc. using published data on transcription regulation of plant genes.
If a pattern is found whose expected number of occurrences is significantly less than one, the match is unlikely to have arisen by chance, so the analyzed sequence may possess the pattern's function.
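To make the idea of an expected number concrete, here is a minimal sketch in Python. It assumes a simple model in which bases are independent and drawn with the query's nucleotide frequencies; the exact statistics used by NSITE are described in [1-3] and may differ. The function names are hypothetical, and degenerate consensus symbols are not handled.

```python
def match_probability(consensus, freqs, max_mismatches=0):
    """Probability that one window matches `consensus` with at most
    `max_mismatches` mismatches, assuming independent bases (an assumption
    of this sketch, not necessarily NSITE's model)."""
    # dist[m] = probability of exactly m mismatches in the positions seen so far
    dist = [1.0] + [0.0] * max_mismatches
    for base in consensus.upper():
        p = freqs[base]  # chance that a random base equals the consensus base
        new = [0.0] * (max_mismatches + 1)
        for m, prob in enumerate(dist):
            new[m] += prob * p                    # match at this position
            if m + 1 <= max_mismatches:
                new[m + 1] += prob * (1.0 - p)    # mismatch at this position
        dist = new
    return sum(dist)


def expected_occurrences(consensus, seq_length, freqs, max_mismatches=0):
    """Expected number of matching windows on one strand of a sequence."""
    windows = seq_length - len(consensus) + 1
    if windows <= 0:
        return 0.0
    return windows * match_probability(consensus, freqs, max_mismatches)
```

Under this model, a short or heavily mismatched pattern yields an expected number near or above one and is uninformative, while a long exact match in a sequence of a few thousand bases yields a value far below one.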
The NSITE output lists each pattern found, its position in the sequence, and the accession number, ID, motif description and binding factor name from the original database, if available.
nsitep Thu Jun 27 20:25:01 EDT 2002
Program N S I T E (Softberry Inc.)
Search for motifs of 432 Regulatory Elements from RegSite - The Transcription Regulatory Sites Database (Plants) (http://www.softberry.com)
Number of QUERY Sequences: 1
File of QUERY Sequences: /httpd/tmp/loadrun/pssp.seq.176588
Search PARAMETERS:
  Expected Mean Number : 0.0100000
  Statistical Significance Level : 0.9500000
  Print Query Sequence : No
  Special numbering of Query Sequence : No
  Variation of Distance between RE Blocks: No
NOTE:
  RE - Regulatory Element/Consensus
  AC - Accession No of RE in RegSite
  OS - Organism/Species
  BF - Binding Factor or One of them
  Mism. - Mismatches
  Mean. Exp. Number - Mean Expected Number
  Up.Conf.Int. - Upper Confidence Interval
==================================================
QUERY: >Softberry SERVER PAST Sequence
Length of Query Sequence: 2975
Nucleotide Frequencies: A - 0.30 G - 0.20 T - 0.26 C - 0.23
..................................................
RE: 21. AC: RSP00021 /OS: Catharantus roseus /GENE: TDC /RE: GT-1#Box5 /BF: GT-1
Motifs on "-" Strand: Mean Exp. Number 0.00915 Up.Conf.Int. 1 Found 1
   389 AAAAAGTAAAgA 378 (Mism.= 1)
..................................................
RE: 34. AC: RSP00034 /OS: Zea mays /GENE: gamma-27kDa zein /RE: P-box (s) /BF: PB
Motifs on "+" Strand: Mean Exp. Number 0.00002 Up.Conf.Int. 1 Found 1
  1353 GACGTGTAAAGTAAATTTACAAC 1375 (Mism.= 0)
..................................................
RE: 183. AC: RSP00366 /OS: Nicotiana tabacum /GENE: CHN50 /RE: ERE /BF: TDBA12
Motifs on "-" Strand: Mean Exp. Number 0.00849 Up.Conf.Int. 1 Found 1
  2826 TGACTTTCTGAt 2815 (Mism.= 1)
..................................................
RE: 199. AC: RSP00395 /OS: Zea mays /GENE: gamma-27kDa zein /RE: O2-like-box /BF:
Motifs on "+" Strand: Mean Exp. Number 0.00365 Up.Conf.Int. 1 Found 1
  1414 TTACGTAGAT 1423 (Mism.= 0)
..................................................
RE: 234. AC: RSP00430 /OS: barley /GENE: Hor2 gene /RE: GSN; hor1-box; /BF: BLZ1;
Motifs on "+" Strand: Mean Exp. Number 0.00918 Up.Conf.Int. 1 Found 1
  1221 GTGAGTCAT 1229 (Mism.= 0)
..................................................
RE: 264. AC: RSP00459 /OS: coix /GENE: alpha-coixin /RE: O2u /BF: O2
Motifs on "-" Strand: Mean Exp. Number 0.00384 Up.Conf.Int. 1 Found 1
   992 TTGACTAGGA 983 (Mism.= 0)
..................................................
RE: 295. AC: RSP00491 /OS: Zea mays /GENE: Zc2 /RE: Zc2 A/T-1 /BF: nuclear factor
Motifs on "+" Strand: Mean Exp. Number 0.00000 Up.Conf.Int. 1 Found 1
   771 CATATGTTTTATTAAAacAAAaTTTATC 798 (Mism.= 3)
..................................................
RE: 296. AC: RSP00492 /OS: Zea mays /GENE: Zc2 /RE: Zc2 A/T-2 /BF: nuclear factor
Motifs on "+" Strand: Mean Exp. Number 0.00000 Up.Conf.Int. 1 Found 10
   789 AaAatTtatcATATATATATATATATATATATATATATATAT 830 (Mism.= 7)
   791 AatTtatcATATATATATATATATATATATATATATATATAT 832 (Mism.= 6)
   793 tTtatcATATATATATATATATATATATATATATATATATAT 834 (Mism.= 5)
   795 tatcATATATATATATATATATATATATATATATATATATAT 836 (Mism.= 4)
   797 tcATATATATATATATATATATATATATATATATATATATAT 838 (Mism.= 2)
   799 ATATATATATATATATATATATATATATATATATATATATAT 840 (Mism.= 0)
   801 ATATATATATATATATATATATATATATATATATATATATAa 842 (Mism.= 1)
   803 ATATATATATATATATATATATATATATATATATATATAata 844 (Mism.= 3)
   805 ATATATATATATATATATATATATATATATATATATAatata 846 (Mism.= 5)
   807 ATATATATATATATATATATATATATATATATATAatataAa 848 (Mism.= 6)
Motifs on "-" Strand: Mean Exp. Number 0.00000 Up.Conf.Int. 1 Found 10
   848 tTtatatTATATATATATATATATATATATATATATATATAT 807 (Mism.= 6)
   846 tatatTATATATATATATATATATATATATATATATATATAT 805 (Mism.= 5)
   844 tatTATATATATATATATATATATATATATATATATATATAT 803 (Mism.= 3)
   842 tTATATATATATATATATATATATATATATATATATATATAT 801 (Mism.= 1)
   840 ATATATATATATATATATATATATATATATATATATATATAT 799 (Mism.= 0)
   838 ATATATATATATATATATATATATATATATATATATATATga 797 (Mism.= 2)
   836 ATATATATATATATATATATATATATATATATATATATgata 795 (Mism.= 4)
   834 ATATATATATATATATATATATATATATATATATATgataAa 793 (Mism.= 5)
   832 ATATATATATATATATATATATATATATATATATgataAatT 791 (Mism.= 6)
   830 ATATATATATATATATATATATATATATATATgataAatTtT 789 (Mism.= 7)
..................................................
Totally 27 motifs of 8 different REs have been found
=========================================================================================
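For downstream processing, individual hit lines from the output, such as `389 AAAAAGTAAAgA 378 (Mism.= 1)`, can be parsed with a short script. The sketch below is in Python and rests on assumptions drawn only from the sample shown here: each hit has the form start, motif, end, mismatch count; a start greater than the end indicates the "-" strand; and lowercase letters appear to mark the mismatched positions. The function name `parse_hit` is hypothetical and not part of NSITE.

```python
import re

# One hit line: "<start> <motif> <end> (Mism.= <n>)", as seen in the sample output.
HIT_RE = re.compile(r"^\s*(\d+)\s+([A-Za-z]+)\s+(\d+)\s+\(Mism\.=\s*(\d+)\)\s*$")


def parse_hit(line):
    """Parse a single hit line; return a dict, or None if the line is not a hit."""
    m = HIT_RE.match(line)
    if m is None:
        return None
    start, end = int(m.group(1)), int(m.group(3))
    motif = m.group(2)
    return {
        "start": start,
        "end": end,
        # Assumption from the sample: "-" strand hits are listed with start > end.
        "strand": "+" if start <= end else "-",
        "motif": motif,
        "mismatches": int(m.group(4)),
        # Assumption from the sample: lowercase letters mark mismatched positions.
        "lowercase": sum(ch.islower() for ch in motif),
    }
```

Lines that are not hits (headers, separators, the summary line) return `None`, so the function can be applied to every line of a report and the results filtered.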
References:
1. Shahmuradov K.A., Kolchanov N.A., Solovyev V.V., Ratner V.A. (1986) Enhancer-like structures in middle repetitive sequences of the eukaryotic genomes. Genetics (Russ), 22, 357-368.
2. Solovyev V.V., Kolchanov N.A. (1994) Search for functional sites using consensus. In Computer analysis of Genetic macromolecules (eds. Kolchanov N.A., Lim H.A.), World Scientific, p. 16-21.
3. Solovyev V.V. (2002) Structure, Properties and Computer Identification of Eukaryotic genes. In Bioinformatics from Genomes to Drugs. V.1. Basic Technologies (ed. Lengauer T.), p. 59-111.