PROGRAM
ScanWM-PL searches for functional motifs described by weight matrixes of plant regulatory
sequences.
Weight matrixes used with the program are built for a subset of plant regulatory sequences from
the RegSite Database, which was developed by Softberry Inc. using published data on transcription
regulation of plant genes.
The program assumes that if a pattern found in a sequence has a weight greater than the cut-off
value for the corresponding weight matrix, the pattern can be expected to be a functional motif,
and the analyzed sequence can be expected to possess the pattern's function.
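The cut-off rule above can be illustrated with a minimal sketch. It assumes a simple additive scoring scheme (the weight of a window is the sum of per-position weights) and uses an invented 4-bp matrix; the names `TOY_MATRIX`, `window_weight`, and `scan` are hypothetical, and the actual scoring and threshold scheme of ScanWM-PL may differ.

```python
# Hypothetical weight matrix for a 4-bp motif: one dict of weights per position.
# The values are invented for illustration and are NOT taken from RegSite.
TOY_MATRIX = [
    {"A": 1.2, "C": -0.8, "G": -0.5, "T": -1.0},
    {"A": -1.0, "C": -0.7, "G": -0.9, "T": 1.3},
    {"A": -0.6, "C": 1.1, "G": -1.0, "T": -0.4},
    {"A": 1.0, "C": -0.9, "G": -0.3, "T": -0.8},
]


def window_weight(window, matrix):
    """Sum the per-position weights of one window (additive scoring is an assumption)."""
    return sum(column[base] for column, base in zip(matrix, window))


def scan(sequence, matrix, cutoff):
    """Return (start, end, pattern, weight) for every window whose weight exceeds cutoff.

    Coordinates are 1-based and inclusive. The sequence is assumed to contain
    only the uppercase letters A, C, G, and T.
    """
    width = len(matrix)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        weight = window_weight(window, matrix)
        if weight > cutoff:
            hits.append((i + 1, i + width, window, round(weight, 2)))
    return hits


hits = scan("GGATCAGGATCA", TOY_MATRIX, cutoff=4.0)
for start, end, pattern, weight in hits:
    print(start, pattern, end, weight)
```

Only windows that score above the cut-off are reported; all other windows are silently skipped, which mirrors the idea that a sub-threshold pattern is not expected to be functional.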
OUTPUT
In the output of ScanWM-PL, each query sequence is identified by its ID line, followed by the
weight matrix patterns (motifs) found in that query sequence.
Weight matrixes' ID lines include accession numbers of regulatory sites from the original database
(RegSite Database) and additional fields like organism name, gene name, and binding factor name,
if available.
For each motif found on the "+" and/or "-" strand of DNA, the output gives its nucleotide
sequence, its coordinates in the query sequence, and a weight calculated from the corresponding
weight matrix. All motifs are shown in 5' to 3' orientation on the corresponding strand of DNA.
For motifs found on the "-" strand, the 1st coordinate is greater than the 2nd coordinate because
coordinates are indicated relative to the "+" strand, which corresponds to the query sequence.
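The coordinate convention for the "-" strand can be sketched as follows. This is an illustration only: the function names and the short test sequence are hypothetical, and the sketch simply reverse-complements a region of the "+" strand and reports its coordinates in descending order, as described above.

```python
# Translation table for complementing uppercase DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def minus_strand_hit(plus_seq, plus_start, plus_end):
    """Describe a "-" strand motif covering plus_start..plus_end of the "+" strand.

    plus_start and plus_end are 1-based, inclusive "+" strand coordinates with
    plus_start <= plus_end. The motif is returned in 5' to 3' orientation on the
    "-" strand, so the first reported coordinate is the larger one.
    """
    motif = reverse_complement(plus_seq[plus_start - 1:plus_end])
    return (plus_end, motif, plus_start)


# Hypothetical 10-bp query; positions 3..8 on the "+" strand read GTCGAT.
first, motif, second = minus_strand_hit("TTGTCGATAA", 3, 8)
print(first, motif, second)
```

Reading the reported motif right to left and complementing each base recovers the "+" strand text between the second and first coordinates.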
An example of the program output for one query sequence is shown below.
Program ScanWM (Softberry Inc.)
Search for motifs by Weight Matrixes of Regulatory Elements
Version 1.2004
SET of WMs: derived from subsection of REGSITE DB (Plants; version IV)
____________________________________________________________
File with QUERY Sequences: TEST_SEQ.seq
Search PARAMETERS:
   Threshold type : 2
   Threshold value : 0.90
   Search for motifs on "+" strand : yes
   Search for motifs on "-" strand : yes
NOTE: WM - Weight Matrix of Regulatory Element
      AC - Accession No of Regulatory Element in a given DB
      OS - Organism/Species
      BF - Binding Factors or One of them
============================================================
QUERY: >At4g00160 [-300,+50] region of F-box family protein
Length of Query Sequence: 350
............................................................
WM: >151. AC: RSP00151//OS: tomato, Lycopersicon esculentum /GENE: Lhcb1*1, Lhcb1*2, Lhca3, Lhca4/RE: CRE, consensus /BF:unknown
Motifs on "+" strand (in DIR orientation): Found 1
      79 CAAGTACATC 88 7.76
............................................................
WM: >174. AC: RSP00174//OS: Phaseolus vulgaris /GENE: beta-phaseolin, or phas/RE: ATCATC motif /BF:unknown
Motifs on "+" strand (in DIR orientation): Found 2
      21 ATCATC 26 7.98
      102 ATCATC 107 7.98
............................................................
WM: >359. AC: RSP00359//OS: barley, Hordeum vulgare /GENE: GCCGAC motif/RE: HVA1s /BF: HvCBF1
Motifs on "-" strand (in INV orientation): Found 1
      103 ATCGAC 98 4.73
............................................................
WM: >707. AC: RSP00707//OS: /GENE: /RE: W-box (consensus 1) /BF: transcription factors of WRKY family
Motifs on "-" strand (in INV orientation): Found 3
      120 AATGACC 114 4.56
      137 AATGACC 131 4.56
      286 AATGACT 280 4.42
............................................................
WM: >722. AC: RSP00722//OS: Nicotiana plumbaginifolia /GENE: rbcS 8B/RE: I-box /BF: unknown transcription factor
Motifs on "-" strand (in INV orientation): Found 1
      251 GATAAGA 245 9.12
............................................................
Totally 8 motifs of 5 different WMs have been found
------------------------------------------------------------
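For downstream processing, motif hits in output like the example above could be extracted with a short script. The sketch below is not part of ScanWM-PL: the regular expression is inferred only from the example (first coordinate, sequence, second coordinate, weight), and `parse_hits` and the record field names are hypothetical. The strand is deduced from the coordinate order described earlier.

```python
import re

# Pattern inferred from the example output: "<coord> <sequence> <coord> <weight>".
HIT = re.compile(r"(\d+)\s+([ACGT]+)\s+(\d+)\s+(\d+\.\d+)")


def parse_hits(text):
    """Extract motif hits from ScanWM-style output text.

    A hit whose first coordinate is greater than its second is treated as a
    "-" strand hit; start and end are normalized to "+" strand order.
    """
    hits = []
    for first, motif, second, weight in HIT.findall(text):
        first, second = int(first), int(second)
        hits.append({
            "strand": "+" if first <= second else "-",
            "start": min(first, second),
            "end": max(first, second),
            "motif": motif,
            "weight": float(weight),
        })
    return hits


# A fragment copied from the example output.
sample = """
Motifs on "+" strand (in DIR orientation): Found 2
21 ATCATC 26 7.98
102 ATCATC 107 7.98
Motifs on "-" strand (in INV orientation): Found 1
103 ATCGAC 98 4.73
"""
records = parse_hits(sample)
for record in records:
    print(record)
```

Because the pattern only looks for the four-field hit layout, it ignores header lines, separators, and the "Found N" counts.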
MORE INFO
For complete information on ScanWM-PL, please consult the program manual (technical notes) or
request additional information if you have any questions.