
PROMH(W) Recognition of human and animal Pol II promoters

(Transcription Start Site and TATA-box)

Method description
To improve on the promoter identification accuracy achieved by the TSSW program, we developed a new program, PromH(W), by extending the TSSW feature set. PromH(W) uses linear discriminant functions that take into account, in addition to the features implemented in TSSW, conserved features of the major promoter functional components, such as transcription start points, TATA-boxes and regulatory motifs, in pairs of orthologous genes aligned by the SCAN2 program.
The program was tested on two sets of pairs of orthologous sequences, mostly human and rodent, with known transcription start sites (TSS), annotated as having TATA promoters (21 genes) or TATA-less promoters (38 genes). For the first set, PromH(W) correctly predicted the TSS for all 21 genes, with a median deviation of 2 bp from the annotated site location. For only two genes was there a significant discrepancy (46 and 105 bp) between the predicted and annotated TSS positions. For the second set, the TATA-less promoters, a TSS was predicted for 27 genes: in 14 cases within 10 bp of the annotated TSS, and in 21 cases within 100 bp. Although there are more discrepancies between predicted and annotated TSS for genes from the second set, these results are consistent with observations that multiple TSS occur much more often in TATA-less promoters.
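
As a rough illustration of the linear discriminant approach described above, the sketch below scores a candidate site as a weighted sum of feature values and compares the score with a threshold. The feature names, weights and threshold values are hypothetical placeholders chosen for the example; the actual PromH(W) features and coefficients are not given on this page.

```python
# Minimal sketch of a linear discriminant score. All names and numbers
# here are hypothetical; they are NOT the real PromH(W) parameters.
HYPOTHETICAL_WEIGHTS = {
    "tata_box_score": 0.5,       # strength of a TATA-box match
    "tss_conservation": 1.0,     # conservation around the TSS
    "tata_conservation": 0.75,   # conservation of the TATA-box
    "motif_conservation": 0.25,  # conservation of regulatory motifs
}


def ldf_score(features, weights=HYPOTHETICAL_WEIGHTS):
    """Return the weighted sum of the feature values."""
    return sum(weights[name] * features[name] for name in weights)


def is_promoter(features, threshold, weights=HYPOTHETICAL_WEIGHTS):
    """Call a site a promoter if its score reaches the threshold."""
    return ldf_score(features, weights) >= threshold


if __name__ == "__main__":
    candidate = {
        "tata_box_score": 2.0,
        "tss_conservation": 1.0,
        "tata_conservation": 0.5,
        "motif_conservation": 0.5,
    }
    print(ldf_score(candidate), is_promoter(candidate, threshold=2.0))
```

In a scheme of this kind, adding conservation features from an aligned orthologous sequence simply adds terms to the sum, which matches the description of PromH(W) as TSSW with an extended feature set.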

Due to TRANSFAC license limitations, only academic users are allowed to access PromH(W) at our site.

PromH(W) output
An output file begins with a description of the program's purpose, the abbreviations used and the search parameters (lines 1-11). The next lines give the name and length of the first query sequence and the number of predicted promoter regions. Then the positions of the predicted sites, their "weights" and the TATA-box position (for TATA promoters) are given. After that, functional motifs are listed for every predicted region: (+) and (-) denote the direct or complementary strand, and names containing "$" (for example, HS$ALBU_02) are motif identifiers from the TRANSFAC database (Wingender et al., Nucleic Acids Res., 2001, 28, 316-319). The same information is then given for the second query sequence.
Example of output file


   Program  promHW  (Softberry Inc.)
      Search for TATA+/TATA- promoters in 2 aligned DNA sequences

 NOTE:  PHa  - Homology Level of Aligned Sequences in LOCAL Search Area (-100,TSS+40)
        PHs  - Homology Level of Aligned Sequences around TSS
        PHss - Homology Level of Aligned Sequences to Right from TSS
        PHt  - Homology Level of TATA-boxes in Aligned Sequences
        PHr  - Mean Homology Level of Regulatory Elements in LOCAL Search Area

 Initial / Final Thresholds for TATA+ promoters -   0.10 /  2.50
 Initial / Final Thresholds for TATA-/enhancers -   0.70 /  3.70
 ===========================================================================
 >h-PGAM2 [1:962]/-920:61/ AC J05073 
  Length of sequence-       981
      2 promoter/enhancer(s) have been predicted
  Enhancer Pos:    899 (Weight:   5.79)
      PHa -  68%   PHs - 100%   PHss -  22%   PHr -  76%
 Promoter Pos:  921 (Weight -  3.61) TATA box at: 895 (Weight - 18.51)
      PHa -  66%   PHs -  77%   PHss -  23%   PHt -  70%   PHr -  71%
 
  Transcription factor binding sites:
 for promoter at position -     921
    752 (+) MAIZE$ADH1   CGTGG
    631 (+) Y$ADH2_01    TCTCC
    854 (+) HS$ALBU_02   TTGGCA
    853 (+) MOUSE$A21C   ATTGG
    824 (+) MOUSE$MCK_   cccaaCACCTGCtgcctgagcc
 ...................
 --------------------------------------------------
 >r-PGAM2 [-1181..+800: 1:2160] AC Z17319/   
  Length of sequence-      1300
      2 promoter/enhancer(s) have been predicted
  Enhancer Pos:   1123 (Weight:   3.97)
      PHa -  68%   PHs - 100%   PHss -  22%   PHr -  80%
 Promoter Pos: 1148 (Weight -  2.83) TATA box at: 1119 (Weight - 17.83)
      PHa -  65%   PHs -  88%   PHss -  23%   PHt -  70%   PHr -  82%
 
  Transcription factor binding sites:
 for promoter at position -    1148
    902 (+) Y$ADH2_01    TCTCC
    935 (+) HS$ALBU_02   TTGGCA
   1081 (+) MOUSE$A21C   ATTGG
    942 (+) RAT$EAI_08   ccctgccCAGCTGgc
 ........................................
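
For downstream processing, output of this form can be read into a simple data structure. The following is a minimal sketch that assumes the layout shown in the example above; the function name `parse_promhw` and the dictionary keys are our own choices for illustration, not part of the program, and other output variants may need adjusted patterns.

```python
import re

# Patterns inferred from the sample output above (an assumption, not a spec).
SITE_RE = re.compile(
    r"^\s*(Enhancer|Promoter)\s+Pos:\s*(\d+)\s*\(Weight\s*[:-]\s*([\d.]+)\)"
    r"(?:\s*TATA box at:\s*(\d+)\s*\(Weight\s*-\s*([\d.]+)\))?"
)
LENGTH_RE = re.compile(r"Length of sequence-\s*(\d+)")
COUNT_RE = re.compile(r"(\d+)\s+promoter/enhancer\(s\) have been predicted")
HOMOLOGY_RE = re.compile(r"(PH\w+)\s*-\s*(\d+)%")
FOR_RE = re.compile(r"^\s*for\s+\w+\s+at position\s*-\s*(\d+)")
MOTIF_RE = re.compile(r"^\s*(\d+)\s+\(([+-])\)\s+(\S+)\s+(\S+)\s*$")


def parse_promhw(text):
    """Parse promHW text output into a list of per-sequence records."""
    records = []
    rec = None     # record for the current query sequence
    site = None    # last predicted site (owner of the next PH... line)
    target = None  # site that the following motif lines belong to
    for line in text.splitlines():
        if line.lstrip().startswith(">"):
            rec = {"header": line.strip()[1:].strip(), "length": None,
                   "predicted": None, "sites": []}
            records.append(rec)
            site = target = None
            continue
        if rec is None:
            continue  # still in the preamble (abbreviations, thresholds)
        if (m := LENGTH_RE.search(line)):
            rec["length"] = int(m.group(1))
        elif (m := COUNT_RE.search(line)):
            rec["predicted"] = int(m.group(1))
        elif (m := SITE_RE.match(line)):
            tata = None
            if m.group(4) is not None:
                tata = {"pos": int(m.group(4)), "weight": float(m.group(5))}
            site = {"kind": m.group(1), "pos": int(m.group(2)),
                    "weight": float(m.group(3)), "tata": tata,
                    "homology": {}, "motifs": []}
            rec["sites"].append(site)
        elif (m := FOR_RE.match(line)):
            pos = int(m.group(1))
            target = next((s for s in rec["sites"] if s["pos"] == pos), None)
        elif (m := MOTIF_RE.match(line)) and target is not None:
            target["motifs"].append({"pos": int(m.group(1)),
                                     "strand": m.group(2),
                                     "id": m.group(3),
                                     "sequence": m.group(4)})
        elif site is not None:
            # Homology percentages such as "PHa -  68%   PHs - 100%"
            for name, pct in HOMOLOGY_RE.findall(line):
                site["homology"][name] = int(pct)
    return records
```

Calling `parse_promhw(open("result.txt").read())` (the file name is a placeholder) returns one record per query sequence, each holding its predicted sites together with their weights, homology levels and motif hits.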