LCRep-P description

Program for mapping low complexity regions in protein sequences. The search for low complexity regions uses Shannon's information measure.

Algorithm description

Low complexity regions are found using Shannon's information measure, which is defined as follows:

H = -\sum_{i=1}^{k} P(a_i) \log P(a_i)

where {a1, ..., ak} is an alphabet of size k, and P(ai) is the fractional composition of ai, that is, the fraction of residues in the fragment that are equal to ai.
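As a minimal sketch, the measure can be computed for a single fragment from its letter counts. The function name `shannon_information` is hypothetical, the standard Shannon entropy formula is assumed, and the base-2 logarithm is an assumption as well, since the description does not state which base the program uses.

```python
import math
from collections import Counter


def shannon_information(fragment: str) -> float:
    """Shannon information of a fragment, from its fractional composition.

    Assumption: base-2 logarithm (the base is not given in the description).
    """
    n = len(fragment)
    if n == 0:
        return 0.0
    counts = Counter(fragment)
    # P(ai) = count(ai) / n; letters that do not occur contribute nothing.
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())
```

Under these assumptions a fragment made of a single repeated residue scores 0, and the score grows as the composition becomes more even, which is why low values indicate low complexity.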

The search proceeds as follows. For each position i of the sequence S and each window size l in the range [lbegin, lend], Shannon's information H(i, l) is calculated over the window of size l starting at i. If H(i, l) falls below a prespecified threshold Hthr(l), the fragment [i, i+l] is declared low complexity. The union of all such fragments at the end of the calculation gives the map of low complexity regions of the sequence S.
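The search described above could be sketched roughly as follows. This is not the program's actual code: the names `find_low_complexity` and `threshold` are hypothetical, the logarithm base is assumed to be 2, a window of size l is assumed to cover exactly l residues, and the final step is assumed to merge overlapping or adjacent fragments into regions. Positions are reported 1-based and inclusive.

```python
import math
from collections import Counter


def shannon_information(fragment):
    """Shannon information of a fragment (base-2 logarithm is an assumption)."""
    n = len(fragment)
    counts = Counter(fragment)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def find_low_complexity(seq, l_begin, l_end, threshold):
    """Return merged low complexity regions as (p1, p2), 1-based inclusive.

    threshold is a callable giving Hthr(l) for a window size l (hypothetical
    interface; how the program stores its thresholds is not described).
    """
    fragments = []
    for l in range(l_begin, l_end + 1):
        for i in range(len(seq) - l + 1):
            if shannon_information(seq[i:i + l]) < threshold(l):
                fragments.append((i + 1, i + l))

    # Merge overlapping or adjacent fragments into a map of regions.
    merged = []
    for start, end in sorted(fragments):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(region) for region in merged]
```

For instance, with a single window size of 5 and a constant threshold of 0.5, a run of ten identical residues flanked by diverse sequence would be reported as one region covering exactly that run.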

Output examples


>EXAMPLE SEQ
Masked regions:
p1: 81        p2: 120       l: 40        chain(+) [Low Complexity Region]
p1: 81        p2: 120       l: 40        chain(+) [Low Complexity Region]
p1: 81        p2: 120       l: 40        chain(+) [Low Complexity Region]
p1: 81        p2: 120       l: 40        chain(+) [Low Complexity Region]
....


>EXAMPLE SEQ
ASFDPHEKQLIGDLWHKVDVAHCGGEALSRMLIVYPWKRRYFENFGDISNAQAIMHNEKVQAHGKKVLASFGEAVCHLDG
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXIRAHFANLSKLHCEKLHVDPENFKLLGDIIIIVLAAHYPK
DFGLECHAAYQKLVRQVAAALAAEYHIGDLXXXXXXXXXXXXXXXXXX
....


>EXAMPLE FILE
asfdphekqligdlwhkvdvahcggealsrmlivypwkrryfenfgdisnaqaimhnekvqahgkkvlasfgeavchldg
EEEEEKKKKKEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEirahfanlsklhceklhvdpenfkllgdiiiivlaahypk
dfglechaayqklvrqvaaalaaeyhigdlEEEEEEEEEEEEEEEEEE
....
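
The X-masked form shown in the second example could be derived from a region list along these lines. This is a sketch only: `mask_regions` is a hypothetical helper, p1 and p2 are assumed to be 1-based inclusive positions (consistent with the region 81 to 120 of length 40 above), and the 80-character line width is taken from the example output.

```python
def mask_regions(seq, regions, mask_char="X", width=80):
    """Replace every residue inside the given regions with mask_char.

    regions: iterable of (p1, p2) pairs, assumed 1-based and inclusive.
    The result is wrapped to `width` characters per line.
    """
    chars = list(seq)
    for p1, p2 in regions:
        for pos in range(p1 - 1, p2):
            chars[pos] = mask_char
    masked = "".join(chars)
    return "\n".join(masked[i:i + width] for i in range(0, len(masked), width))
```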