Genome Alignment pipeline help

Genome Alignment pipeline help

Description

Genomic sequences alignment is to be completed in two stages.

Input data

target_set.fa and query_set.fa files are to be in multifasta format.

1. Produce raw alignment

    Options:
    Chain - strand:
  • direct - in direct only,
  • reverse - in reverse only,
  • both - in both strands.

At the current stage for every sequence of target_set.fa there will be created a binary file in *.da format with the set of all possible variants of its alignments.

File name is being determined by the names of input files and a number of sequence in "target" set, and has the following appearance:
"target_set.fa:query_set.fa:XXXXX:00.da" , where
target_set.fa - name of file with target set
query_set.fa - name of file with query set
XXXXX - number of sequence in target set, and the first sequence has the number 00000, second - 00001 etc.

2. Get optimal coverage from raw alignment

The possible variants of coverage search: with search for optimal way for each query sequence separately, and with search for optimal way for the whole query set. It is also possible to output all found local alignments.

To the program input the *.da file obtained at the previous stage is being set, thus the current step is to be executed individually for every of the target sequences.

    Options:
    Find the optimal path:
  • for each of the query sequence alone - Search for optimal way for every of the query sequences separately.
  • for the entire set of the query sequences - Search for optimal way for the whole set of query sequences.
  • all found alignments - The total list of all found alignments will be output.
  • Way search options are not mandatory, if they are not specified, the default values are to be used.

    Pictures