Introduction
Running RScan
PATfile format
PATfile: mandatory tags
PATfile: optional tags
CFGfile format
RScan options
Output format
Score calculation
Examples of use
Scanning speed
The RScan program searches for occurrences of defined secondary structure patterns in long genomic sequences.
RScan is a console application. It can be run as follows:
$ rscan in.fa in.pat o:rs.cfg [options]
Here
in.fa  (where to search) is a FASTAfile with one or many DNA/RNA sequences
in.pat  (what to search) is a PATfile (a file with a description of the secondary structure pattern)
rs.cfg  CFGfile (configuration file)
Below is an example of PATfile for a clover leaf structure:
RNA_TREE_BEGIN
F ESL:0.3
 E len:0..0
 S len:5..10 LEN:45..60 msl:2 tmm_ex
  L len:4..12 cons_5:AG cons_3:AAA mm_3:1(w:1)
  S len:3..7 msl:3 tmm_in
   L len:5..15 len_opt:7(tdev:1,mul:1.5)
  L len:0..7
  S len:3..7 msl:2
   L len:5..10
  L len:0..7
  S len:3..7 L_cons_5:AWAG L_dist_5:1..2 L_mm_5:2(w:1)
   L len:5..10
  L len:0..12
 E len:0..0
RNA_TREE_END
PSEUDOKNOTS_BEGIN
PSEUDOKNOTS_END
Identifiers RNA_TREE_BEGIN, RNA_TREE_END denote the beginning and the end of the topology description section. Pseudoknots section is not processed in the current version. Strings beginning with ';' are comments.
An RNA pattern is a tree of a secondary RNA structure of a certain configuration. Tree elements are nodes of different types ("F" - fictive, "E" - end, "L" - loop, "S" - stem) and the edges connecting them. Only nodes of types "F" and "S" can have descendant nodes, and when nodes are listed from 5' to 3' (starting from 1), "L" nodes are odd and "H"/"S" nodes are even. There are only two "E" nodes in a pattern: the first and the last children of node "F". Nodes of type "E" and type "L" are treated identically by the algorithm, so mostly "L" nodes are mentioned below.
In the description, each node occupies one line. The fictive node line begins with "F", end node lines begin with "E", loop lines with "L" and stem lines with "S". The indent from the left edge equals the level of the corresponding node in the tree.
F, E, S, L  Denote node types
len:5..7  The interval of allowed lengths of this element, for the nodes E, S, L
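The line-per-node layout described above can be read back into a tree with a short script. The following is a minimal sketch, not part of RScan: it assumes that levels are expressed with leading spaces (deeper indent means child node), that tags are whitespace-separated, and that lines starting with ';' are comments. The names `Node` and `parse_pat_tree` are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str      # "F", "E", "S" or "L"
    tags: dict     # e.g. {"len": "3..10"}; flags without ':' map to True
    level: int     # number of leading spaces on the line
    children: list = field(default_factory=list)

def parse_pat_tree(text):
    """Parse the lines between RNA_TREE_BEGIN and RNA_TREE_END into a tree."""
    root, stack, inside = None, [], False
    for raw in text.splitlines():
        line = raw.rstrip()
        stripped = line.strip()
        if not stripped or stripped.startswith(";"):
            continue                      # blank line or comment
        if stripped == "RNA_TREE_BEGIN":
            inside = True
            continue
        if stripped == "RNA_TREE_END":
            break
        if not inside:
            continue
        level = len(line) - len(line.lstrip(" "))
        kind, *rest = stripped.split()
        tags = {}
        for tok in rest:
            key, sep, value = tok.partition(":")
            tags[key] = value if sep else True
        node = Node(kind, tags, level)
        # Pop back to the nearest line with a smaller indent: that is the parent.
        while stack and stack[-1].level >= level:
            stack.pop()
        if stack:
            stack[-1].children.append(node)
        else:
            root = node
        stack.append(node)
    return root

# Hairpin pattern with one space of indent per level (assumed layout).
EXAMPLE = """\
RNA_TREE_BEGIN
F
 E len:0..0
 S len:2..2
  L len:3..10 cons_5:ATGCKYBH mm_5:1(w:1)
 E len:0..0
RNA_TREE_END
"""

tree = parse_pat_tree(EXAMPLE)
print([child.kind for child in tree.children])
```

Only the tree shape is recovered here; tag values are kept as raw strings and are not validated.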
"F" node can have two additional tags:
ESL:1.5  outputs occurrences whose ratio of ES score to length L is above 1.5. Here, ES is the energy (in kcal/mol) taken with the opposite sign and multiplied by 10.
LEN:70..90  limits occurrence length by an interval [70,90]
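As an illustration of the two "F" node filters, the sketch below applies them to a candidate occurrence. It is an assumption-laden sketch rather than RScan code: the function names are hypothetical, and the energy is passed as a plain floating-point number in kcal/mol.

```python
def es_score(energy_kcal_per_mol):
    """ES score: the energy taken with the opposite sign and multiplied by 10."""
    return -10.0 * energy_kcal_per_mol

def passes_esl(energy_kcal_per_mol, length, esl):
    """True if the ratio of ES score to occurrence length is above the ESL value."""
    return es_score(energy_kcal_per_mol) / length > esl

def passes_len(length, lo, hi):
    """True if the occurrence length lies in the closed interval [lo, hi]."""
    return lo <= length <= hi

# An 80-nt occurrence with an energy of -12.5 kcal/mol has ES = 125, ES/L = 1.5625.
print(passes_esl(-12.5, 80, esl=1.5), passes_len(80, 70, 90))
```

Read this way, 'ESL:1.5' corresponds to an energy below -0.15 kcal/mol per nucleotide.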
E and L nodes can have the following additional tags:
len_opt:15(tdev:2.1,mul:1.5)  
defines the optimal length of the element. If the length of the element differs from the optimal value,
its DS score is penalized according to a formula given below (see "Score calculation"); if the option is absent, the length of the element is not penalized (though the element still has a length-dependent energy)
cons_5:AWGUC  consensus sequence in the 15-letter IUPAC alphabet, whose presence is required at the 5' border of the element
mm_5:2(w:1)  up to 2 mismatches are allowed in the consensus; the weight of a mismatch is 1
dist_5:0..2  
allowed shift of the consensus from the 5' edge of the element, here a distance
of 0 to 2 nt. Negative values of small absolute size are also
allowed (for example, 'dist_5:-1..2'). A negative shift value means that the
consensus is "sticking out" of the element in the 5' direction (or in the 3'
direction in the case of the 'dist_3:' option)
dist_5_opt:2(w:1)  The option shows that a shift of 2 nt is considered optimal. Each 1 nt of deviation of the
shift value from the optimum is penalized in the DS score with
a weight of 1. If the option is absent, there is no penalty.
cons_3:AWGUC  similarly, for the 3' boundary
mm_3:2(w:1)  similarly, for the 3' boundary
dist_3:0..2  similarly, for the 3' boundary
dist_3_opt:2(w:1)  similarly, for the 3' boundary
S nodes may have the following tags:
LEN:70..90  sets the length limits of the subfragment, closed by this stem (including the stem itself)
msl:3  ("max stem loop") Defines the maximum size of the interior loop in the stem,
or its "looseness"; for example, at value 'msl:3' the stems
((((...))...)), ((.((...))..)), ((..((...)).)), ((...((...)))) are allowed, whereas the stems ((.((...))...)) or ((((...))....)) are not allowed
tmm_ex  checks the pair of nucleotides adjacent to the stem from the "outside" for non-complementarity. The check is performed only if the loops adjacent to the stem have a nonzero length
Basically, the CFG file is not intended for editing. However, lines beginning with "COMMAND_LINE:" can be edited. The options placed at these lines are the same as allowed in the command line (see next section).
D:N (def.: 0)  strand to search in. N = 0 - search in the direct strand, N = 1 - in the reverse strand, N = 2 - in both strands 
P1:N1 P2:N2  search only in interval [N1,N2] 
max_stem_loop:N (def.: 2)  the maximum length of the interior loop in the stem (analogous to the stem option 'msl:' in the PATfile). The value N is set for all stems for which the individual value of this parameter is not set in the PATfile 
tmm_mode:N (def.: 0) 
how to check the noncomplementarity of nucleotides flanking the stems;
N = 0 - check only for stems with 'tmm_in' or 'tmm_ex' set in the PATfile; N = 1 - do not check; N = 2 - check for all stems where possible 
out_N  outputs fragments containing the unknown nucleotide 'N' 
out_over  outputs all overlapping fragments. Overrides option 'max_len_perc:' 
score_type:N (def.: 1)  which score to optimize: ES (energy), DS (deviation score), or CS (combination score). N = 0 - optimize the DS score, which penalizes mismatches with the consensus, deviation of the consensus distance to the element boundary from its optimal value, and deviation of the element length from its optimal value. N = 1 - optimize the ES score. N = 2 - optimize the CS score (a linear combination of the DS and ES scores). More details about the scores can be found below. 
d_e_mul:x:y (def.: 1:1)  weights of the DS and ES scores when calculating the CS score: CS = x * DS + y * ES. The option takes integer numbers, so for a finer balance larger values can be used, for example 'd_e_mul:14:15' 
cmm_mul:N (def.: 10)  the penalty for a mismatch with consensus is multiplied by N. At the default value of the option the mismatch with letters A, U, G, C will give a penalty of 20 * w, mismatch with the letters W, R, M, K, Y, S will give a penalty of 10 * w, mismatch with the letters B, V, H, D will give a penalty of 4 * w, where w is an individual penalty value of every consensus, given by the options looking like ' mm_5:2(w:1)' or 'L_mm_5:2(w:1)' in the PATfile 
score_thr:N (def.: -2147483648)  score threshold. The score is the DS score (at 'score_type:0'), the ES score (at 'score_type:1') or the CS score (at 'score_type:2'). Fragments with a score lower than N are not output 
node_ener_thr100:N (def.: -2147483648)  the option is associated with the option 'net_loops_n'. During the calculation, discard substructures with ES < (N * L / 100), where L is the length of the fragment. Speeds up the calculation. 
net_loops_n:N (def.: 100)  apply the option 'node_ener_thr100' only for substructures that contain at least N loops 
nothr  disables all the stem energy thresholds and the general score threshold 
max_len_perc:N (def.: 100)  the option is associated with the option 'mlp_loops_n'. Discard all substructures that have a length L > A + (B - A) * N / 100, where A is the minimum and B is the maximum possible length of the substructure. 
mlp_loops_n:N (def.: 100)  apply the option 'max_len_perc' only for substructures that contain at least N loops 
max_iloop_len:N (def.: 40)  maximum allowed length of the interior loop between stems ('max_stem_loop'  the same value, but inside the stems) 
max_mloop_len:N (def.: 50)  maximum allowed length of a multiloop (the sum of the lengths of all its arms). Note: sometimes structures with a multiloop slightly longer than N still appear in the output. This happens when a shorter (<=N) multiloop is also possible but has a lower score 
max_xloop_len:N (def.: 50)  The maximum allowed length of the "external" loop. The length of an "external" loop is the length of 'x' in a structure like ...(((...)))xxx(((...)))xxx(((...)))... 
out_mode:N (def.: 2)  outputs the result in one of the following formats: N = 0 - occurrences in Vienna and GCG formats, N = 1 - occurrences in FASTA format, N = 2 - occurrences in extended Vienna format, N = 3 - all sequences that contain occurrences, in FASTA format 
stat  outputs statistics on nucleotides and nucleotide pairs for each occurrence. Works only with 'out_mode:2' option 
del_olap_perc:N  discards heavily overlapping occurrences. More specifically, among all the occurrences, a pair is sought whose members overlap by more than N% of the length of the shorter of the two. The member of the pair with the lower score is discarded. This is repeated until no such pairs remain. The type of score is defined by the option 'score_type' 
del_max_len_diff:N (def.: 100)  used in conjunction with the 'del_olap_perc' option (see its description). The overlap of two occurrences is detected only if the relative difference in their lengths does not exceed N% 
out_best:N  outputs only the N best entries. Since scanning of long sequences is done in chunks, N occurrences are output for each such piece (by default, 30000 nt) 
toL  applies the 'out_best' or 'del_olap_perc' option not to the value of the score S, but to the ratio S / L, where L is the length of the occurrence (without taking into account the length of the flanks, that is, nodes of type "E") 
strnum  calculates the number of possible parsings of the [i, j] fragment by the pattern. Slightly slows down the calculation 
progr  outputs progress info 
min_stem_cm:N  sets the minimum required number of matches with consensus, common to the consensus of all stems. Overrides the settings in the PATfile 
stem_cmm_pen:N (def.: 1)  sets the multiplier to the penalty for a mismatch with consensus, common to the consensus of all the stems 
min_loop_cm:N  sets the minimum required number of matches with consensus, common to the consensus of all loops. Overrides mismatch options in the PATfile 
loop_cmm_pen:N (def.: 1)  sets the multiplier to the penalty for a mismatch with consensus, common to the consensus of all loops 
cmm_pen_freq_depend  makes consensus mismatch score depend on the frequencies of nucleotides in the sample 
cons_stickout_max:N (def.: 4)  the maximum possible "stick out" of the consensus beyond its element, allowed in the PATfile 
max_stem_loop:N (def.: 2)  defines the upper size of the interior loop inside all stems, except those for which the individual option 'msl' is set in PATfile 
stem_ener_thr:N (def.: 3)  the stem is considered if its ES >= N * (L1 + L2) / 2, where L1 and L2 are the lengths of the stem arms 
stem_ener_thr_1bp:N (def.: 8)  a 1-bp stem is considered if its ES >= T. The threshold T = max (N, M), where M is specified by the 'stem_ener_thr:M' option 
stem_ener_thr_2bp:N (def.: 12)  a 2-bp stem is considered if its ES >= T. The threshold T = max (N, 2M), where M is specified by the 'stem_ener_thr:M' option 
stem_ener_thr_3bp:N (def.: 16)  a 3-bp stem is considered if its ES >= T. The threshold T = max (N, 3M), where M is specified by the 'stem_ener_thr:M' option 
stable_root  if the stem is the root (or one of the root stems), this option requires that its ES is equal to or greater than the destabilizing contribution of the loop it forms 
nowse  
("no weak stem ends") excludes from consideration stems with weak closing
helices. For example,
uauggg...cccuuuuug
((.(((...)))....))
mispair_score:N  allows the formation of all noncanonical pairs, assigning them ES = N. Slows down the algorithm dramatically. Ensures that an occurrence can be found in any fragment of a suitable length 
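The pruning procedure described for the 'del_olap_perc' option can be sketched as a simple greedy loop. This is an illustrative reading of the description, not RScan's implementation: occurrences are modelled as hypothetical (start, length, score) tuples, the order in which pairs are examined is an assumption, and the 'del_max_len_diff' and 'toL' refinements are left out.

```python
def overlap_len(a, b):
    """Number of positions shared by two occurrences (start, length, score)."""
    lo = max(a[0], b[0])
    hi = min(a[0] + a[1], b[0] + b[1])
    return max(0, hi - lo)

def drop_overlapping(occurrences, n_percent):
    """Repeatedly remove the lower-scoring member of any pair that overlaps
    by more than n_percent of the length of the shorter occurrence."""
    kept = list(occurrences)
    changed = True
    while changed:
        changed = False
        for i in range(len(kept)):
            for j in range(i + 1, len(kept)):
                a, b = kept[i], kept[j]
                shortest = min(a[1], b[1])
                if overlap_len(a, b) * 100 > n_percent * shortest:
                    # Scores are maximized, so the lower score loses.
                    kept.remove(a if a[2] < b[2] else b)
                    changed = True
                    break
            if changed:
                break
    return kept

occs = [(100, 60, 23), (110, 60, 40), (400, 50, 10)]
print(drop_overlapping(occs, 50))
```

In this example the first two occurrences share 50 of 60 positions, so the one with the lower score is dropped and the distant third occurrence is kept.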
The following is an example of the recommended type of the RScan output (produced with an option 'out_mode:2'):
>NM:[chr_rand].1 CH:+ X:2023007 L:60 ES:23 DS:-51(cons:-50) CS:-28 LN:2.40000e+002
uaccuuagaauuucauacacggguggcccugccggcaguguguucggcgcacacaaggua
((((((.....(((......)))..(((.....))).((((((.....))))))))))))
AAAAAA.....BBB......BBB..CCC.....CCC.DDDDDD.....DDDDDDAAAAAA
......agaaU.........................aGUg....................
......ag............................awag....................
........aaa.................................................
The first line contains:
NM:  the name of the sequence in which the occurrence was found, in square brackets. After the brackets comes the number of the occurrence in the given sequence and chain 
CH:  chain ("+" or "-") 
X:  position from the 5' end of the current chain 
L:  length 
ES:  energy score. Is equal to the energy in kcal / mol, taken with the opposite sign and multiplied by 10 
DS:  deviation score. Includes penalties for deviations from optimal lengths and for mismatches with consensus (penalty for mismatches is given separately in brackets "(cons:50)") 
CS:  complex score; a combination of ES and DS 
LN:  number of the pattern parsings at the given fragment 
The second line contains sequence fragment.
The third line shows the secondary structure in dot-bracket notation.
The fourth line shows the stems mark up. Nucleotides belonging to the same stem
are marked with the same letter.
The fifth line shows the positions of the fragment corresponding to the consensus.
If a position agrees with all overlapping consensus fragments, it is denoted by a
lowercase letter, otherwise by an uppercase letter.
The sixth line shows the consensus fragments that are associated with the 5' edges of the
elements of the pattern.
The seventh line shows the consensus fragments that are associated with the 3' edges of
the elements of the pattern.
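For downstream processing, the first line of each record can be split into its fields with a regular expression. The sketch below is hypothetical helper code, not part of RScan; it assumes the field order shown in the example above and accepts an optional minus sign in front of the ES, DS, cons and CS values.

```python
import re

HEADER_RE = re.compile(
    r">NM:\[\s*(?P<name>[^\]]+)\]\.(?P<index>\d+)\s+"
    r"CH:(?P<strand>[+-])\s+X:(?P<pos>\d+)\s+L:(?P<length>\d+)\s+"
    r"ES:(?P<es>-?\d+)\s+DS:(?P<ds>-?\d+)\(cons:(?P<cons>-?\d+)\)\s+"
    r"CS:(?P<cs>-?\d+)\s+LN:(?P<ln>\S+)"
)

def parse_header(line):
    """Return the fields of an 'out_mode:2' header line as a dictionary."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError("not an RScan header line: " + line)
    fields = match.groupdict()
    for key in ("index", "pos", "length", "es", "ds", "cons", "cs"):
        fields[key] = int(fields[key])
    fields["ln"] = float(fields["ln"])   # number of pattern parsings
    return fields

header = (">NM:[chr_rand].1 CH:+ X:2023007 L:60 "
          "ES:23 DS:-51(cons:-50) CS:-28 LN:2.40000e+002")
info = parse_header(header)
print(info["name"], info["strand"], info["pos"], info["length"], info["cs"])
```

The remaining six lines of a record all have the length given in the L: field, which is a convenient consistency check when reading a file.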
ES score:
By default, optimization is performed according to the ES score ('score_type:1').
ES score represents the energy (in kcal/mol) multiplied by 10 and taken with the
opposite sign. ES score, like DS score and CS score, is maximized (the higher
means the better).
DS score:
If the option 'score_type:0' is given, the optimization is carried out according
to the DS score (deviation score). DS score is the total of penalties for
mismatches with consensus (where consensus is defined in the PATfile), penalties
for shifting of the consensus from the optimal position, penalties for deviation
of the elements length from the optimal values (only those elements for which the
optimal length values are set). DS score cannot be positive.
DS score, consensus:
Let's say you have a PATfile exa.hp.pat looking like this:
RNA_TREE_BEGIN
F
 E len:0..0
 S len:2..2
  L len:3..10 cons_5:ATGCKYBH mm_5:1(w:1)
 E len:0..0
RNA_TREE_END
PSEUDOKNOTS_BEGIN
PSEUDOKNOTS_END
If the weight of the mismatch is set to 1 ('w:1' in the PATfile) and the option 'cmm_mul:100' is given, the DS score for a mismatch with the letters A, T, G, C will be -200, the DS score for a mismatch with the letters W(=AT), R(=AG), M(=AC), K(=TG), Y(=TC), S(=CG) will be -100, and the DS score for a mismatch with the letters B(=TGC), V(=AGC), H(=ATC), D(=ATG) will be -42:
$ rscan chr.fa exa.hp.pat o:rscan.cfg score_type:0 cmm_mul:100
Output:
>NM:[ chr].4 CH:+ X:751 L:15 ES:-38 DS:-200(cons:-200) CS:-238 LN:1.00000e+000
ugauacgccucgcgc
((........))...
AA........AA...
..auAcgccu.....
..augckybh.....
...............
>NM:[ chr].7 CH:+ X:1166 L:14 ES:-46 DS:-42(cons:-42) CS:-88 LN:1.00000e+000
uaaugcucauaggc
((.......))...
AA.......AA...
..augcucAu....
..augckybh....
..............
>NM:[ chr].16 CH:+ X:4286 L:16 ES:-48 DS:-100(cons:-100) CS:-148 LN:1.00000e+000
ggaugcuggcuucgcg
((.........))...
AA.........AA...
..augcuGgc......
..augckybh......
................
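The penalty magnitudes quoted above (200, 100 and 42 at 'cmm_mul:100', and 20, 10 and 4 at the default 'cmm_mul:10') are all reproduced by cmm_mul * w * log2(4 / k), rounded to the nearest integer, where k is the number of nucleotides the consensus letter allows. This formula is an inference from those numbers and is not taken from the RScan source; the function and table names below are hypothetical.

```python
import math

# Number of nucleotides matched by each IUPAC letter.
DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1, "U": 1,
    "W": 2, "R": 2, "M": 2, "K": 2, "Y": 2, "S": 2,
    "B": 3, "V": 3, "H": 3, "D": 3,
    "N": 4,
}

def mismatch_penalty(letter, w=1, cmm_mul=10):
    """Magnitude of the DS penalty for one mismatch (inferred formula)."""
    k = DEGENERACY[letter.upper()]
    return round(cmm_mul * w * math.log2(4 / k))

for letter in ("A", "K", "B"):
    print(letter, mismatch_penalty(letter, w=1, cmm_mul=100))
```

Under this reading the penalty is the information content of the consensus position in bits, scaled by the multiplier, so a mismatch at a fully degenerate position costs nothing.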
The option 'dist_5_opt:L_opt(w:X)', referring to some consensus in the PATfile, adds to the DS score a penalty for the shift of the consensus position from the optimal one. In this case, the penalty is linear in the deviation: DS = -|L - L_opt| * X, where L is the actual shift and L_opt is the optimal shift of the consensus relative to the element boundary.
DS score, optimal elements length:
Penalties for deviations from the optimal lengths of elements are calculated by the
following formula: DS = -Mul * ln(((L - L_opt) / L_tdev)^2 + 1), where L is
the actual length of the element (or the average length of its arms, in the case
of a stem), L_opt is the optimal length value, L_tdev is a typical deviation, and
Mul is an arbitrary multiplier. The values L_opt, L_tdev, Mul are specified in
the PATfile in the following form: 'len_opt:L_opt(tdev:L_tdev,mul:Mul)', for
example, 'len_opt:6.7(tdev:0.8,mul:10.0)'.
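The two deviation penalties described above (shift of a consensus and length of an element) can be written out as follows. This is a sketch under assumptions: the functions return contributions to the DS score with a negative sign, because the text states that the DS score cannot be positive, and the shift penalty uses the absolute deviation. Any rounding or scaling applied by RScan itself is not modelled, and the function names are hypothetical.

```python
import math

def shift_penalty(shift, shift_opt, w):
    """DS contribution of 'dist_5_opt:shift_opt(w:w)': linear in the deviation."""
    return -abs(shift - shift_opt) * w

def length_penalty(length, l_opt, l_tdev, mul):
    """DS contribution of 'len_opt:l_opt(tdev:l_tdev,mul:mul)'."""
    return -mul * math.log(((length - l_opt) / l_tdev) ** 2 + 1)

# 'dist_5_opt:2(w:1)' with an actual shift of 0 nt.
print(shift_penalty(0, 2, 1))
# 'len_opt:6.7(tdev:0.8,mul:10.0)' for loops of 7 and 10 nt.
print(length_penalty(7, 6.7, 0.8, 10.0), length_penalty(10, 6.7, 0.8, 10.0))
```

The logarithm makes the length penalty grow slowly for large deviations, so a single very long loop is penalized less harshly than it would be under a quadratic penalty.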
CS score:
If 'score_type:2' is given, the optimization is done by the CS score (complex
score). CS score is calculated as a combination of ES score and DS score:
CS = x*DS + y*ES .
The coefficients x, y are specified by the option 'd_e_mul:x:y'.
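The combination can be checked against the example outputs in this document, taking the DS score as negative (as stated above, it cannot be positive). The function name is hypothetical; the numbers are those of the output-format example (default 'd_e_mul:1:1') and of the first occurrence in Example 5 ('d_e_mul:5:3').

```python
def cs_score(ds, es, x=1, y=1):
    """CS = x*DS + y*ES, with integer weights x and y from 'd_e_mul:x:y'."""
    return x * ds + y * es

print(cs_score(ds=-51, es=23))            # -28
print(cs_score(ds=-24, es=92, x=5, y=3))  # 156
```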
In the work directory, some examples of patterns and sets of sequences containing occurrences of these patterns are given. They will be used in the examples below.
Example 1:
$ rscan
Running the program without options prints the help message.
Example 2:
$ rscan exa.regexp.fa exa.regexp.pat o:rscan.cfg nothr score_type:0 out_best:3
This example shows that RScan can be used to search a primary sequence for a kind of regular expression. The file exa.regexp.pat describes a pattern whose stems have zero length. It is configured to search for the following context template (regular expression): ATC...HWAGCSS...ATB...AAA...TACGTG...SS...HYWWYSS, in which "..." denotes intervals of arbitrary (within some limits) length; the allowed number of mismatches in each block is also set in the pattern. The option 'nothr' removes all energy thresholds (since in this case the energy is not of interest), and the 'score_type:0' option requires optimizing the DS score (in this case, minimizing the number of mismatches with the consensus). Output:
>NM:[ RegExpExampleSeq].1 CH:+ X:8 L:20 ES:-83 DS:-204(cons:-204) CS:-287 LN:1.00000e+000
ggcuaagaaagcuuauuagc
....................
....................
GGc....aAAgCUUAuUAg.
auc....aaaguacgugss.
....................
>NM:[ RegExpExampleSeq].2 CH:+ X:17 L:47 ES:-130 DS:-94(cons:-94) CS:-224 LN:1.00000e+000
agcuuauuagcgauaauucuccuauaugccuucauauuaugcagccg
...............................................
...............................................
aGc...uuagcgAuAaU.....uaUAugcc.....auuaugc.....
auc...hwagcsauaaa.....uacgugss.....hywwyss.....
...............................................
...
Example 3:
$ rscan exa.ires.fa exa.ires.pat o:rscan.cfg out_best:1
This example shows the use of RScan for searching the internal ribosome entry site (IRES). A pattern in the file exa.ires.pat is described fairly strictly and finds 72 occurrences of 520 IRES entries (from RFAM 12.0), and zero occurrences per 10MB of a random sequence. Output:
>NM:[L02971.1/237705].1 CH:+ X:158 L:79 ES:139 DS:-60(cons:-60) CS:79 LN:1.00000e+000
auccuagugccagcggaacaacaucugguaacagaugccucuggggccaaaagccaagguuugacagacccauuaggau
(((((((((((((.((......(((((....))))).)).))))(((.....)))..(((((...))))))))))))))
AAAAAAAAABBBBBBB......CCCCC....CCCCC.BBBBBBBDDD.....DDD..EEEEE...EEEEEAAAAAAAAA
.........ccagcggaacaAcAUcugguaa.............ggccaaaa...aAgGuu..................
.........syrbsggaahhccymyykgura.............ggccraaa...aygyby..................
...............................................................................
...
Example 4:
$ rscan exa.trna.fa exa.trna.1.pat o:rscan.cfg
This example shows the use of RScan for searching tRNA. The exa.trna.1.pat pattern is defined in such a way that it finds 2,508 occurrences of 3,514 tRNAs (from RFAM 12.0) and 1 occurrence per 1,000 nt of a random sequence. The pattern uses 4 consensus blocks and the energy threshold ESL = 0.5 (that is, Energy / L <= -0.05 kcal/mol/nt is required). Output:
>NM:[DR1281].1 CH:+ X:1 L:71 ES:59 DS:-20(cons:-20) CS:39 LN:1.00000e+000
cauucauagcucaauuggauagagcggcggacuucgaauccgaagguugcagguucgacuccugcugagug
((((((..((((.........))))..((((.......))))......(((((.......)))))))))))
AAAAAA..BBBB.........BBBB..CCCC.......CCCC......DDDDD.......DDDDDAAAAAA
......uagc.....uggA............cu...aa...............uucgacucc.........
......urgc.....uggu............cu....................uucranucc.........
....................................ra.................................
...
Example 5:
$ rscan exa.trna.fa exa.trna.2.pat o:rscan.cfg score_type:2 d_e_mul:5:3
This example also shows the use of RScan for searching tRNA. The exa.trna.2.pat pattern is defined in such a way that it finds 2,508 occurrences of 3,514 tRNAs (from RFAM 12.0), and at a given CS threshold ('score_thr:0') detects 1 occurrence in approximately 360,000 nt of a random sequence. The pattern does not fix consensual nucleotides, but it sets optimal lengths of stems and loops and penalizes deviations from them. The optimization is performed according to CS score ('score_type:2'), the ratio of DS and ES score in CS score is set to 5:3 ('d_e_mul:5:3'). Output:
>NM:[DR1281].1 CH:+ X:0 L:73 ES:92 DS:-24(cons:0) CS:156 LN:1.00000e+000
gcauucauagcucaauuggauagagcggcggacuucgaauccgaagguugcagguucgacuccugcugagugc
(((((((..((((.........))))..((((.......))))......(((((.......))))))))))))
AAAAAAA..BBBB.........BBBB..CCCC.......CCCC......DDDDD.......DDDDDAAAAAAA
.........................................................................
.........................................................................
.........................................................................
>NM:[DH9330].1 CH:+ X:0 L:71 ES:121 DS:-4(cons:0) CS:343 LN:1.00000e+000
gccgugaucguauagggguuaguacucugcguuguggccgcagcaaccucgguucgaauccgagucacggc
(((((((..((((........)))).(((((.......)))))....(((((.......))))))))))))
AAAAAAA..BBBB........BBBB.CCCCC.......CCCCC....DDDDD.......DDDDDAAAAAAA
.......................................................................
.......................................................................
.......................................................................
...
Example 6:
$ rscan exa.trna.fa exa.trna.2.pat o:rscan.cfg mispair_score:50 out_best:1
It often happens that the pattern is not found in sequences in which it should be present. In these cases, it is recommended to use the 'mispair_score:N' option, which allows any non-canonical pairs. This often helps to identify the shortcomings of the pattern and to correct it. Output:
>NM:[DM2440].1 CH:+ X:0 L:73 ES:-32 DS:-157(cons:0) CS:-189 LN:1.00000e+000
gccugcuuagcucaguugguuagagcguccguuucauaagcugauugucacuaguucaaaucuaguagcaggc
(((((((((((...)))).(((((((....)).)).)))..........(((((<(...)>))))))))))))
AAAAAAABBBB...BBBB.CCCCCCC....CCCCCCCCC..........DDDDDDD...DDDDDDDAAAAAAA
.........................................................................
.........................................................................
.........................................................................
>NM:[DN1140].1 CH:+ X:0 L:72 ES:-8 DS:-135(cons:0) CS:-143 LN:1.00000e+000
ggcuuuuuagcucagcagguagagcaaccggcuguuaaccgguuugucacagguucgagcccuguaaaagcc
(((((((..((((<(....)>))))((((((<(...)>))))))....(((((<(...)>))))))))))))
AAAAAAA..BBBBBB....BBBBBBCCCCCCCC...CCCCCCCC....DDDDDDD...DDDDDDDAAAAAAA
........................................................................
........................................................................
........................................................................
Here the non-canonical pairs are denoted by angular brackets, like in this case:
gggaaaaauugcc
((<((...))>))
Introduction
Running RInf
PATfile format
CFGfile format
RInf options
Output format
Examples of use
The RInf program estimates the frequency (or bit score) of occurrence of a pattern with a defined secondary structure in a long random sequence. It is assumed that the sequence is generated according to the Bernoulli scheme with a uniform nucleotide distribution. The program takes into account the shape of the structure and the energy threshold.
The current version does not take into account the contextual constraints imposed on the pattern. Only a rough estimation of the information content of contextual requirements is given in isolation from the structural one.
Frequency estimation is done by any combination of three algorithms:
(A) by scanning a random sequence,
(B) by estimating the properties of pseudo-random (generated) occurrences,
(C) by linear regression from some statistics of the pattern, including
characteristics of its frequency-energy spectrum.
Algorithm (A) works only with relatively frequently occurring patterns (with a
frequency from 10^{-7} to 1).
Algorithm (B) can evaluate any pattern, but it needs to generate tens of thousands
of pseudo-occurrences, which can sometimes take tens of minutes.
Algorithm (C) gives a less accurate estimate than (A) and (B), but runs for
no longer than one minute.
Note: Frequency estimates can sometimes exceed 1. This is because several occurrences with a different length may start from or end in the same position, whereas as an estimate the sum of frequencies over all possible lengths of occurrences is considered.
RInf is a console application. It can be run as follows:
$ rinf x in.pat o:rinf.cfg [options]
Here
x  the 1st argument is an empty option
in.pat  is a PATfile (a file with a description of the secondary structure pattern)
rinf.cfg  CFGfile (configuration file)
The description of the PATfile can be found in the RScan help page.
Basically, the CFG file is not intended for editing. However, lines beginning with "COMMAND_LINE:" can be edited. The parameters located on these lines are the same as the command line parameters.
sec_scan:N (def.: 30)  Search for occurrences in a random sequence no longer than N seconds 
sec_imit:N (def.: 30)  Simulate random occurrences no longer than N seconds 
vol_scan:N (def.: 1000)  Stop searching for occurrences in a random sequence after finding N occurrences. Note: the search for random occurrences stops when one of the thresholds is reached: 'sec_scan:X' or 'vol_scan:Y' 
vol_imit:N (def.: 1000)  The amount of pseudorandom occurrences to be simulated. Note: simulating pseudorandom occurrences is stopped when one of the thresholds is reached: 'sec_imit:X' or 'vol_imit:Y' 
vol_imit_min:N (def.: 50)  Until this amount of statistics is reached, the 'sec_imit:X' and 'vol_imit:Y' options do not work 
noscan  Skip scanning algorithm 
noimit  Skip the algorithm generating pseudooccurrences 
nospec  Skip spectrum estimation algorithm 
max_iloop_len:N (def.: 40)  Same as in the RScan program. Maximum allowed length of the interior loop between stems 
max_mloop_len:N (def.: 50)  Same as in the RScan program. Maximum allowed length of the multiloop (the sum of all its arms length) 
max_xloop_len:N (def.: 50)  Same as in the RScan program. Maximum allowed length of "external" loop. An "external" loop is the length of 'x' in a structure not closed by stem, like ...(((...)))xxxx(((...)))xx(((...)))... 
Below is an example of the basic type of RInf output:
Summ of frequences (partition function), without energy threshold: 4.625e+001
Estimated frequence , without energy threshold: 3.191e+000
Primary sequence consensus bits: 13.87
Shape bits: 0.00

Summ of frequences (partition function), above energy threshold: 4.827e-001
Estimated frequence , above energy threshold: 9.768e-002
Shape-energy bits: 3.36

      1     2     3         4          5          6     7         8         9        10    11        12        13        14        15         16  17     18
-100.00  0.81  0.80  8.90e-01  8.90e-001  8.90e-001  8626  1.62e+00  8.69e-01  9.71e-01  9310  4.62e+01  3.19e+00  3.19e+00  3.91e+07  1.20e-003   3  70.05
  -2.00  0.81  0.80  8.82e-01  8.82e-001  8.82e-001  8549  1.60e+00  8.59e-01  9.59e-01  9272  4.36e+01  2.85e+00  6.05e+00  3.91e+07  1.20e-003   3  70.05
  -1.50  0.81  0.80  8.51e-01  8.53e-001  8.51e-001  8247  1.54e+00  8.24e-01  9.20e-01  9023  3.84e+01  2.45e+00  6.19e+00  3.91e+07  1.20e-003   3  70.05
  -1.00  0.81  0.80  7.70e-01  7.65e-001  7.67e-001  7460  1.34e+00  7.22e-01  8.04e-01  8091  2.91e+01  1.85e+00  5.31e+00  3.91e+07  1.20e-003   3  70.05
  -0.50  0.81  0.80  5.99e-01  5.93e-001  5.96e-001  5802  9.78e-01  5.25e-01  5.84e-01  5860  1.76e+01  1.19e+00  3.52e+00  3.91e+07  1.20e-003   3  70.05
   0.00  0.81  0.80  3.73e-01  3.68e-001  3.72e-001  3611  5.37e-01  2.89e-01  3.19e-01  3128  8.00e+00  6.22e-01  1.69e+00  3.91e+07  1.20e-003   3  70.05
   0.50  0.81  0.80  1.67e-01  1.72e-001  1.74e-001  1618  2.07e-01  1.12e-01  1.22e-01  1111  2.69e+00  2.60e-01  5.75e-01  3.91e+07  1.20e-003   3  70.05
   1.00  0.81  0.80  5.24e-02  5.80e-002  5.84e-002   508  5.51e-02  2.97e-02  3.21e-02   302  6.70e-01  8.55e-02  1.37e-01  3.91e+07  1.20e-003   3  70.05
   1.50  0.81  0.80  1.62e-02  1.37e-002  1.37e-002   157  1.02e-02  5.49e-03  5.83e-03    63  1.23e-01  2.21e-02  2.32e-02  3.91e+07  1.20e-003   3  70.05
   2.00  0.81  0.80  5.16e-03  2.22e-003  2.20e-003    50  1.35e-03  7.32e-04  7.62e-04     7  1.76e-02  4.53e-03  2.95e-03  3.91e+07  1.20e-003   3  70.05
   2.50  0.81  0.80  5.16e-04  2.45e-004  2.39e-004     5  1.36e-04  7.39e-05  7.52e-05     1  1.94e-03  7.38e-04  2.81e-04  3.91e+07  1.20e-003   3  70.05
   3.00  0.81  0.80  0.00e+00  1.82e-005  1.74e-005     0  1.08e-05  5.88e-06  5.86e-06     0  1.71e-04  9.64e-05  2.07e-05  3.91e+07  1.20e-003   3  70.05
   3.50  0.81  0.80  0.00e+00  9.05e-007  8.47e-007     0  7.03e-07  3.79e-07  3.71e-07     0  1.25e-05  1.02e-05  1.24e-06  3.91e+07  1.20e-003   3  70.05
   4.00  0.81  0.80  0.00e+00  3.01e-008  2.74e-008     0  3.70e-08  1.98e-08  1.90e-08     0  7.55e-07  8.76e-07  6.10e-08  3.91e+07  1.20e-003   3  70.05
   4.50  0.81  0.80  0.00e+00  6.66e-010  5.87e-010     0  1.55e-09  8.21e-10  7.76e-10     0  3.87e-08  6.13e-08  2.47e-09  3.91e+07  1.20e-003   3  70.05
   5.00  0.81  0.80  0.00e+00  9.79e-012  8.32e-012     0  5.05e-11  2.66e-11  2.46e-11     0  1.64e-09  3.50e-09  8.05e-11  3.91e+07  1.20e-003   3  70.05
The first 4 lines contain:
- The sum of the frequencies of different shapes of the pattern (partition
function) without taking energy into account
- Estimate of the frequency of the pattern without taking energy into account
- Estimate of the information content of contextual constraints of the pattern
- Estimate of the information content of the structure (shape) of the pattern
The following 3 lines contain:
- The sum of the frequencies of different shapes of the pattern (partition
function), taking into account the energy threshold
- Estimate of the frequency of the pattern, taking into account the energy
threshold
- Estimate of the information content of the structure (shape) of the pattern,
taking into account the energy threshold
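The bit values in the sample output appear to follow from the frequency estimates as bits = -log2(frequency): the estimated frequency above the energy threshold, 9.768e-002, gives the reported 3.36 shape-energy bits. The relation is inferred from these numbers rather than taken from the RInf source, and the clamp at zero is an assumption suggested by the "Shape bits: 0.00" line, where the frequency estimate (3.191) exceeds 1. The function name is hypothetical.

```python
import math

def frequency_to_bits(freq):
    """Information content in bits of an occurrence frequency, clamped at 0."""
    return max(0.0, -math.log2(freq))

print(round(frequency_to_bits(9.768e-2), 2))  # 3.36
print(round(frequency_to_bits(3.191), 2))     # 0.0
```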
Below follows the table:
Column 1  ES/L threshold, where ES is the energy (in kcal/mol) taken with the opposite sign and multiplied by 10, and L is
the length of the occurrence
Column 2  value ES/L, which cuts out half of the occurrences, averaged over all
shapes of the pattern
Column 3  standard deviation of the ES/L value
Column 4  Observed frequency by scanning algorithm
Column 5 is the same as column 4, but asymptotically approximated by the normal distribution
Column 6 is the same as column 4, but asymptotically approximated by an asymmetric
distribution with a light tail
Column 7  the volume of statistics, collected by the scanning algorithm
Column 8  estimate of the pattern frequency using the algorithm generating pseudorandom
occurrences. The estimate (still not taking energy into account) is a sum of
the frequencies of all shapes of the pattern, divided by the average
number of variants of the pattern placement on a fixed fragment of
the sequence. The obtained value is multiplied then by the fraction
of occurrences having ES/L score above the threshold from column 1
Column 9  estimate of the pattern frequency using the algorithm generating pseudorandom
occurrences. The estimate (still not taking energy into account) is the
frequency of the occurrence of fragments that do not require the
replacement of noncanonical pairs for the perfect correspondence
of the fragment to the pattern. The obtained value is multiplied then
by the fraction of occurrences having ES/L score above the
threshold from column 1. The estimate is more rough than in column 8
Column 10 is a linear combination of values from columns 8 and 9, better than both of them
Column 11  the volume of statistics of pseudorandom occurrences
Column 12  the sum of frequencies (partition function) of different shapes of the pattern
with ES/L score exceeding the threshold from column 1
Column 13  estimate of the frequency based on the assumed normality of the
energy distribution of the occurrences. This estimate is better than
in column 14 in the area of high frequencies
Column 14  estimate of the frequency by linear regression, in which the most
meaningful regressor is the value from column 12, the other regressors
are other statistics of the pattern. This estimate is somewhat better
than in column 13, especially in the area of lower frequencies
Column 15  the exact number of possible pattern shapes
Column 16  the frequency of the most frequent shape of the pattern (only the shape,
without taking energy into account)
Column 17  number of stems in the pattern
Column 18  estimate of the average occurrence length
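The arithmetic behind columns 1 and 8 can be illustrated with a minimal Python sketch. It is not RInf code: the function names and the sample numbers are hypothetical, and it assumes that "negenergy" means the energy with its sign reversed and that "above the threshold" is a strict comparison.

```python
def esl_score(energy_kcal_mol, length):
    """ES/L score: energy with reversed sign (kcal/mol) times 10, divided by occurrence length."""
    if length <= 0:
        raise ValueError("occurrence length must be positive")
    es = -energy_kcal_mol * 10.0  # assumption: 'negenergy' is the sign-reversed energy
    return es / length


def column8_estimate(shape_freq_sum, mean_placements, scores, threshold):
    """Column 8 as described in the text (hypothetical helper, not RInf's code).

    shape_freq_sum  -- sum of the frequencies of all shapes of the pattern
    mean_placements -- average number of placement variants on a fixed fragment
    scores          -- ES/L scores of the pseudorandom occurrences
    threshold       -- ES/L threshold from column 1
    """
    if mean_placements <= 0 or not scores:
        raise ValueError("need a positive placement count and at least one score")
    shape_only = shape_freq_sum / mean_placements  # estimate without energy
    fraction_above = sum(s > threshold for s in scores) / len(scores)
    return shape_only * fraction_above


# Made-up occurrences: (energy in kcal/mol, length)
occurrences = [(-12.0, 60), (-3.0, 60), (-24.0, 60), (-6.0, 60)]
scores = [esl_score(e, n) for e, n in occurrences]
estimate = column8_estimate(1e-3, 4.0, scores, threshold=1.0)
```

With these made-up inputs, half of the occurrences score above the threshold, so the shape-only estimate is halved.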
The working directory contains some example patterns. They are used in the examples below.
Example 1:
$ rinf
Running the program without options prints the help.
Example 2:
$ rinf x exa.secis.pat o:rinf.cfg
This example shows the simplest way to launch RInf. All three evaluation algorithms are run: scanning, generation of pseudorandom occurrences, and regression on spectrum characteristics.
Example 3:
$ rinf x exa.secis.pat o:rinf.cfg noscan noimit
Same as in the previous example, but only the regression algorithm is run.
Example 4:
$ rinf x exa.secis.pat o:rinf.cfg noscan noimit nospec
Do not run any of the algorithms that take energy into account. An estimate is given only for the shape of the pattern. This completes within one second.
Example 5:
$ rinf x exa.secis.pat o:rinf.cfg sec_scan:3600 vol_scan:100000
Scan for at least 1 hour. Scanning is interrupted if the statistics volume reaches 100,000 occurrences.
Introduction
Running b2t
b2t options
Output format
Examples of use
The b2t ("bracket to tree") program generates files with secondary structure patterns (PATfiles), which are accepted by the RScan program. The input of b2t is a dot-bracket RNA secondary structure, which can be obtained from any RNA folding program.
b2t is a console application. It can be run as follows:
$ b2t file:in.file [options]
Here
in.file is a file consisting of 2 or 3 lines. The first line should contain the RNA
sequence; its dot-bracket structure should follow in the second line; the third
line can (optionally) contain the primary sequence constraints in the 15-letter (IUPAC) alphabet.
An example of the contents of the input file is shown below:
gcaugcaagccgcgggaacucccccuuggugacaaggacccgcggggccaaaagccacguucucugaaccuugcaugu
((((((((((((((((.......(((((....))))).)))))))(((.....)))..((((...)))))))))))))
.............SGSMA..........DDDD........................AC....................
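The constraints on the input file can be checked before running b2t. The following Python sketch is not part of b2t; the function name `check_b2t_input` is hypothetical, and it assumes that blank lines may be ignored and that T and U count as the same letter of the 15-letter alphabet.

```python
# 15-letter IUPAC nucleotide alphabet; T and U are accepted as the same symbol (assumption).
IUPAC_15 = set("ACGTURYSWKMBDHVN")


def check_b2t_input(text):
    """Return the 2 or 3 non-empty lines of a b2t input file, or raise ValueError."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) not in (2, 3):
        raise ValueError("expected 2 or 3 lines: sequence, structure, [constraints]")
    seq, struct = lines[0], lines[1]
    if len(seq) != len(struct):
        raise ValueError("sequence and structure differ in length")

    # The structure may contain only '(', ')' and '.', and brackets must balance.
    depth = 0
    for ch in struct:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("closing bracket without an opening one")
        elif ch != ".":
            raise ValueError("unexpected structure character: %r" % ch)
    if depth != 0:
        raise ValueError("unclosed opening bracket")

    if len(lines) == 3:
        cons = lines[2]
        if len(cons) != len(seq):
            raise ValueError("constraint line differs in length from the sequence")
        bad = set(cons.upper()) - IUPAC_15 - {"."}
        if bad:
            raise ValueError("non-IUPAC constraint characters: %s" % sorted(bad))
    return lines


checked = check_b2t_input("aagcgacccucgcaa\n..((((...))))..\nAASC..MC.......\n")
```

A failed check raises `ValueError` with a short reason, so the function can serve as a guard in a script that prepares inputs in bulk.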
read_cons_str  Read the third line, which specifies the primary sequence requirements (in the 15-letter code). When the option is on, the input file should look like this:
aagcgacccucgcaa
..((((...))))..
AASC..MC.......
otherwise like this:
aagcgacccucgcaa
..((((...))))..
max_stem_loop:N (def.: 2)  Max allowed stem defect size (sum of left and right internal loop arms) 
min_stem_len:N (def.: 2)  Lowest allowed minimum stem length (the value of 'A' in the 'len:A..B' section of a stem description). Allowed values are 0, 1, 2, 3.
consider_lp  Consider lonely pairs. By default, nucleotides of lonely pairs in input structures are dropped.
st_A: (def.: 1.0)  is A for stems 
st_B: (def.: 1.0)  is B for stems 
st_C: (def.: 0.5)  is C for stems
hp_A: (def.: 1.0)  is A for hairpin loops 
hp_B: (def.: 1.0)  is B for hairpin loops 
hp_C: (def.: 0.5)  is C for hairpin loops
in_A: (def.: 1.0)  is A for internal loops 
in_B: (def.: 1.0)  is B for internal loops 
in_C: (def.: 0.5)  is C for internal loops
ex_A: (def.: 1.0)  is A for external loops 
ex_B: (def.: 1.0)  is B for external loops 
ex_C: (def.: 0.5)  is C for external loops
mu_A: (def.: 1.0)  is A for multiple loops 
mu_B: (def.: 1.0)  is B for multiple loops 
mu_C: (def.: 0.5)  is C for multiple loops
sp_A: (def.: 1.0)  is A for a subpattern closed with a stem 
sp_B: (def.: 1.0)  is B for a subpattern closed with a stem 
sp_C: (def.: 0.5)  is C for a subpattern closed with a stem
sp_randshift  Select the interval of subpattern LEN randomly. A pattern produced with 'sp_randshift' may not catch the original structure
sp_sometimes  Set subpattern LEN limits only for some (randomly chosen) nodes
sp_never  Cancel setting LEN limits for subpatterns
tl_A: (def.: 1.0)  is A for total pattern length; if 'tl_A' <= 100, total pattern len is made fixed (LEN:X..X) 
tl_B: (def.: 1.0)  is B for total pattern length 
tl_C: (def.: 0.5)  is C for total pattern length 
tl_randshift  Select the interval of total LEN randomly. A pattern produced with 'tl_randshift' may not catch the original structure
al_A: (def.: 1.0)  is A for all elements above (including total length) except specified 
al_B: (def.: 1.0)  is B for all elements above (including total length) except specified 
al_C: (def.: 0.5)  is C for all elements above (including total length) except specified
relax_stem_loop:N (def.: 0)  Increase 'max_stem_loop' for all stems in the output. 
mark:s  Put mark string into output (in comments). 
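The default handling of lonely pairs (without 'consider_lp') can be illustrated with a short Python sketch. It is not b2t's implementation and rests on two assumptions: a lonely pair is a base pair (i, j) that is stacked on neither (i-1, j+1) nor (i+1, j-1), and "dropped" means that both nucleotides become unpaired in the structure. The function names are hypothetical.

```python
def pair_table(struct):
    """Map each paired position of a dot-bracket string to its partner."""
    stack, partner = [], {}
    for i, ch in enumerate(struct):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError("unbalanced structure")
    return partner


def drop_lonely_pairs(struct):
    """Replace every lonely pair with dots (assumed meaning of 'dropped')."""
    partner = pair_table(struct)
    out = list(struct)
    for i, j in partner.items():
        if i > j:
            continue  # visit each pair once, from its 5' side
        stacked_inside = partner.get(i + 1) == j - 1
        stacked_outside = partner.get(i - 1) == j + 1
        if not stacked_inside and not stacked_outside:
            out[i] = out[j] = "."
    return "".join(out)


cleaned = drop_lonely_pairs("((..(...)..))")  # the inner single pair is lonely
```

One pass is enough here: a lonely pair has no stacked neighbours, so removing it cannot make another pair lonely.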
The description of the resulting PATfile can be found on the RScan help page.
The working directory contains an example file with a dot-bracket secondary structure: exa.b2t.in. It is used in the following examples.
Example 1:
$ b2t
Running the program without options prints the help.
Example 2:
$ b2t file:exa.b2t.in > exa.b2t.pat
This example shows the simplest way to run b2t. The output of the program is redirected to the file exa.b2t.pat.
Example 3:
$ b2t file:exa.b2t.in read_cons_str > exa.b2t.pat
Same as in the previous example, but the program also tries to read the third line of the input file. The third line must have the same length as the sequence in the first line and the dot-bracket structure in the second line, and must consist only of dots ('.') and characters of the 15-letter (IUPAC) alphabet. Letters are converted into consensus requirements.
Example 4:
$ b2t file:exa.b2t.in max_stem_loop:0 > exa.b2t.pat
Sets a stronger restriction on stem defects. For example, the 'max_stem_loop:0' option does not allow two adjacent stems of three pairs each (in the input) to be combined into one stem (in the output) of six to seven pairs in the following structure: (((.(((...))))))
Example 5:
$ b2t file:exa.b2t.in al_A:2 > exa.b2t.pat
Allow a wider range of lengths than the default for all element types.
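The effect of 'max_stem_loop' on the structure (((.(((...)))))) can be reproduced with a small Python sketch. It is one interpretation of the option, not b2t's algorithm: consecutive base pairs are placed in the same stem while the defect between them (unpaired nucleotides on the left arm plus those on the right arm) does not exceed the limit. The function names are hypothetical.

```python
def pair_table(struct):
    """Map each paired position of a dot-bracket string to its partner."""
    stack, partner = [], {}
    for i, ch in enumerate(struct):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError("unbalanced structure")
    return partner


def group_stems(struct, max_stem_loop=2):
    """Group base pairs into stems, tolerating defects up to max_stem_loop."""
    partner = pair_table(struct)
    opens = sorted(i for i, j in partner.items() if i < j)
    stems, seen = [], set()
    for i in opens:
        if i in seen:
            continue
        stem = [(i, partner[i])]
        seen.add(i)
        while True:
            a, b = stem[-1]
            k = a + 1  # next paired position to the right of a
            while k < b and k not in partner:
                k += 1
            m = b - 1  # next paired position to the left of b
            while m > a and m not in partner:
                m -= 1
            # Continue only through a stack, bulge or internal loop:
            # the two nearest paired positions must pair with each other.
            if k >= m or partner.get(k) != m:
                break
            defect = (k - a - 1) + (b - m - 1)
            if defect > max_stem_loop:
                break
            stem.append((k, m))
            seen.add(k)
        stems.append(stem)
    return stems


example = "(((.(((...))))))"
strict = group_stems(example, max_stem_loop=0)   # bulge of size 1 splits the helix
relaxed = group_stems(example, max_stem_loop=2)  # bulge is tolerated
```

With a limit of 0 the one-nucleotide bulge separates two stems of three pairs each; with the default limit of 2 the same pairs form a single stem of six pairs.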