Content

1. Introduction

The "MALI viewer" application is designed for aligning nucleotide sequences and working with the resulting alignments. It supports:

  • Aligning sequences
  • Editing sequences
  • Using variable color schemes for residues
  • Building tree diagrams (dendrograms) for sequences
  • Principal component analysis

2. General system description

2.1. Main window

The main window contains (fig. 2.1.1):

  • The main menu
  • The control panel
  • The alignment area
  • The alignment's histogram area
  • Information bars
  • The status bar

Main window
Figure 2.1.1.

1. Main menu. 2. Control panel. 3. Alignment area. 4. Information bars. 5. Status bar.

2.2. Commands of the main menu

    Action - the group of commands for loading and calculating alignments:
  • Input align - opens the Alignment dialog box for specifying the alignment to be loaded and its format.
  • Output align - opens the Alignment dialog box for retrieving the alignment currently loaded in this window in the selected format.
  • Send postscript - opens the Export postscript file dialog box for obtaining a PostScript file.
  • Process alignment - opens the Load dialog box for calculating a new alignment.
  • Realign all sequences - opens the Load dialog box for realigning all sequences.
  • Realign selected sequences - opens the Load dialog box for aligning the selected sequences.
  • Exit - closes the application.
    Edit - the group of commands for editing sequences and their groups:
  • Groups… - opens the Group properties dialog box for working with groups of sequences.
  • Groups editing mode - when enabled, edits all sequences of the same group simultaneously.
  • Select all sequences - makes all sequences selected.
  • Deselect all sequences - makes all sequences unselected.
  • Invert sequence selection - makes the selection of sequences inverted.
  • Delete selected sequences - removes selected sequences.
  • Move selected sequences to new alignment - moves selected sequences to a new window.
  • Copy selected sequences to new alignment - creates copies of selected sequences in a new window.
  • Deselect all columns - makes all columns unselected.
  • Delete selected columns - removes selected columns.
  • Remove sequence <- left of selected columns - removes the fragments of sequences to the left of selected columns.
  • Remove sequence -> right of selected columns - removes the fragments of sequences to the right of selected columns.
  • Set gap character to - selects the gap symbol: "." or "-".
  • Remove gapped columns - removes columns consisting of gaps only.
    View - the group of commands for setting up the image display:
  • Options - opens the Options dialog box for setting up how sequences are displayed in the main window.
    Color - the group of commands for selecting the color scheme applied to the nucleotides of the alignment. The following color schemes are available in the application (see chapter 3.3 for their description):
  • By PID
    Calculate - the group of commands for performing calculations:
  • Consensus - calculates a consensus.
  • Conservation - calculates the conservation of sequences. Conservation is calculated within groups.
  • Remove redundancy - removes the sequences whose similarity to the consensus is below the defined threshold.
  • Sort - sequence sorting modes:
    • Sort by pairwise identity - sorts by pairwise identity.
    • Sort by ID - sorts alphabetically by sequence ID.
    • Sort by group ID - sorts by group first and then alphabetically within each group.
    • Sort by tree - sorts sequences by their order on the tree.
  • Show tree - opens the Tree diagram dialog box for visualizing the hierarchical clustering of sequences (the tree of sequences).
  • Principal Component Analysis - principal component analysis:
    • Calculate - calculates principal components.
    • PCA 2D sequences - opens the Sequence PCA Diagram 2D window displaying a two-dimensional plot of the sequences' principal components.
    • PCA 3D sequences - opens the Sequence PCA Diagram 3D window displaying a three-dimensional plot of the sequences' principal components.
    • PCA 2D nucleotides - opens the Nucleotide PCA Diagram 2D window displaying a two-dimensional plot of the nucleotides' principal components.
    • PCA 3D nucleotides - opens the Nucleotide PCA Diagram 3D window displaying a three-dimensional plot of the nucleotides' principal components.
  • Calculate all - calculates the consensus, conservation and principal components.
    Help - the group of commands for opening boxes with information on the application:
  • About - opens the About dialog box with information on the application.

2.3. Control panel

Functions of control panel buttons:

Select all sequences - makes all sequences selected.
Deselect all sequences - makes all sequences unselected.
Inverse sequence selection - makes the selection of sequences inverted.
Delete selected sequences - removes selected sequences (strings).
Delete selected columns - removes selected columns.
Move selected sequences to new alignment - moves selected sequences to a new alignment window.
Copy selected sequences to new alignment - creates copies of selected sequences in a new alignment window.
Deselect all columns - makes all columns unselected.
Remove sequence <- left of selected columns - removes the fragments of sequences to the left of selected columns.
Remove sequence -> right of selected columns - removes the fragments of sequences to the right of selected columns.
Tree - opens the dialog box with tree diagram.
Options - opens the configuration dialog box.

2.4. Alignment area

This area is used for viewing and editing the sequences of the alignment.

A single left-click on a symbol highlights it with color and sets it in italics; the same symbols at the same position in all other sequences are highlighted as well (see fig. 2.1.1).

To move a fragment of a sequence, hold down the "Shift" key, place the mouse pointer over a symbol, press the left mouse button and drag the symbol. Dragging a symbol to the right inserts a gap at the symbol's original position and shifts the sequence in that direction. If Edit>Groups editing mode is on, the gap is inserted in all sequences of the selected group. Dragging a symbol to the left replaces the gap symbols with the dragged fragment.
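The effect of dragging on the underlying sequence strings can be illustrated with a minimal Python sketch. The function names and the `steps` parameter are hypothetical and are not taken from the application; the sketch only assumes that a drag to the right inserts gaps and a drag to the left consumes gaps that lie immediately before the dragged symbol.

```python
GAP = "-"  # the application allows "." or "-" as the gap character


def drag_right(seq: str, pos: int, steps: int = 1) -> str:
    """Insert `steps` gaps at `pos`, shifting the rest of the sequence right."""
    return seq[:pos] + GAP * steps + seq[pos:]


def drag_left(seq: str, pos: int, steps: int = 1) -> str:
    """Remove up to `steps` gap characters located immediately left of `pos`."""
    start = pos
    while start > 0 and pos - start < steps and seq[start - 1] == GAP:
        start -= 1
    return seq[:start] + seq[pos:]


def drag_right_group(group: dict[str, str], pos: int, steps: int = 1) -> dict[str, str]:
    """Groups editing mode: apply the same gap insertion to every sequence of a group."""
    return {name: drag_right(seq, pos, steps) for name, seq in group.items()}
```

In this sketch a leftward drag over a position that is not preceded by a gap leaves the sequence unchanged.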

2.4.1. Replacing a symbol in a sequence.

Double-clicking a symbol in a sequence opens the Symbol selection dialog box (fig. 2.4.1.1). The box lists all symbols and their meanings, and the clicked symbol is highlighted in the list. To replace the symbol, select the new symbol in the list and press the OK button. The Symbol selection dialog box closes and the new symbol (letter) appears in the sequence.

 
Figure 2.4.1.1.

1. List of symbols. 2. Button for closing the window and applying changes. 3. Button for closing the window and rejecting changes.

2.5. Information bars

The left information bar (fig. 2.1.1) displays a summary of the sequence whose symbol is under the mouse pointer: its ID, its similarity to the consensus, and the ID of the sequence most homologous to it.

The right information bar (fig. 2.1.1) displays a summary of the symbol under the mouse pointer: the name of the symbol, its rate of occurrence in the column, and the name of the consensus symbol at that position.

3. Operating the application

3.1. Loading a file, calculating an alignment

Loading an alignment

    The Action>Input align command opens the Alignment dialog box for loading an alignment (fig. 3.1.1). It contains:
  • The entry field for alignment input
  • The Alignment format list provides the available formats of alignment.
    If the wrong format is selected, the alignment will not be loaded.
    The list contains the following formats:
  • FASTA
  • PIR
  • CLUSTAL
  • PFAM
  • POSTAL
  • BLC
  • MSF

The Load in new frame checkbox. If the box is checked, the alignment is loaded in a new window.
The OK button closes the window and loads the alignment.
The Apply button loads the alignment without closing the window.
The Cancel button cancels the loading.
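As an illustration of what the entry field expects when FASTA is selected, here is a minimal Python sketch of a FASTA reader. It is not the application's parser: the function name `parse_fasta` is hypothetical, and the checks (a header line starting with ">", equal sequence lengths) are assumptions about what makes a text a valid aligned FASTA input.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {sequence ID: aligned sequence}."""
    records: dict[str, list[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("FASTA header without an ID")
            current = fields[0]  # the ID is the first word of the header
            if current in records:
                raise ValueError(f"duplicate sequence ID: {current}")
            records[current] = []
        elif current is None:
            # Sequence data before any header: probably not FASTA at all.
            raise ValueError("text does not start with a FASTA header")
        else:
            records[current].append(line)
    alignment = {name: "".join(parts) for name, parts in records.items()}
    if len({len(seq) for seq in alignment.values()}) > 1:
        raise ValueError("sequences of an alignment must have equal length")
    return alignment


example = ">s1 first sequence\nAC-G\nT\n>s2\nACGGT\n"
alignment = parse_fasta(example)
```

Text in another format (for example CLUSTAL) fails the header check in this sketch, which mirrors the behaviour described above: with the wrong format selected, nothing is loaded.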

 
Figure 3.1.1.

1. Alignment entry field. 2. List of alignment formats. 3. "Load in new frame" checkbox. 4. "OK" button. 5. "Apply" button. 6. "Cancel" button.

Outputting alignment

    The Action>Output align command opens the Alignment dialog box for retrieving an alignment in the selected format (fig. 3.1.2).
  • The alignment entry field holds the text of the alignment.
  • The Alignment format list defines the format of the alignment.
    The list contains the following formats:
  • FASTA
  • PIR
  • CLUSTAL
  • PFAM
  • POSTAL
  • BLC
  • MSF


The OK button closes the window and outputs the alignment.
The Apply button writes the alignment from the main window into the alignment entry field in the selected format.
The Cancel button cancels the output.

 
Figure 3.1.2.

1. Alignment entry field. 2. List of alignment formats. 3. "OK" button. 4. "Apply" button. 5. "Cancel" button.

Downloading an alignment from the server

    The Send postscript command opens the Export postscript file dialog box (fig. 3.1.1.1) for obtaining the PostScript file from the server in one of the following ways:
  • Only send e-mail - to send the file with data via e-mail (the address is to be specified in the "E-mail" field).
  • Only get file - to download the file.
  • Send mail and get file - both to receive the file via e-mail (the address is to be specified in the "E-mail" field) and to download it.

The Options button opens the Postscript options dialog box (fig. 3.1.1.2) for defining the parameters of the data file (page size, page layout, font).
The OK button closes the window and applies the selected changes.
The Cancel button cancels all changes.

 
Figure 3.1.1.1.

1. The e-mail address entry field. 2. Panel for selecting an action. 3. Button for opening the options dialog box. 4. The "OK" button. 5. The "Cancel" button.

 
Figure 3.1.1.2.

1. Panel for selecting size of paper. 2. Panel for selecting layout of paper. 3. Panel for selecting font. 4. Button for setting default parameters. 5. The "OK" button. 6. The "Cancel" button.

Calculating an alignment

  • Process alignment - opens the Load dialog box for calculating an alignment.
  • Realign all sequences - opens the Load dialog box for realigning all sequences.
  • Realign selected sequences - opens the Load dialog box for aligning the selected sequences.

"Load dialog" box

Sequences can be compared using one of two algorithms, or a combination of both:
1. The alignment is calculated in the order given by a dendrogram.
2. The alignment and the similarity dendrogram are built concurrently.

Algorithm 1 and Algorithm 2 can be used in combination. At the beginning of the analysis it is more efficient to use Algorithm 1, and at the following steps Algorithm 2.

Algorithm 2 (alignment and dendrogram built concurrently).

A matrix of pairwise similarities is calculated for the input sequences using the sequence similarity measure. The pair of sequences with the maximal similarity is selected and removed from the pool, and an "object" consisting of these two sequences (the profile of their pairwise alignment) is added in their place. Joining these two sequences forms the first node of the dendrogram. The matrix is then recalculated for the remaining pool (its size is reduced by 1) and the process is repeated. The similarities between the objects of the matrix are calculated anew every time. This algorithm provides more precise results but is time-consuming. Each iteration reduces the size of the matrix by 1, so the total number of iterations is n-1, where n is the original size of the pool. Every iteration adds a new node to the dendrogram. At the end of the process a single object remains, the profile of the alignment of all sequences of the original pool (the multiple alignment), together with a dendrogram that records the order in which the sequences were joined.
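The iteration described above (pick the most similar pair, replace it with a joined object, recompute, repeat n-1 times) can be sketched in Python as follows. This is only an outline of the loop: a real profile of a pairwise alignment is replaced here by a plain tuple of member IDs, and the `similarity` callable and the toy measure `mean_identity` are hypothetical stand-ins for the measures listed in this manual.

```python
from itertools import combinations
from typing import Callable, Sequence

Profile = tuple[str, ...]  # stand-in for an alignment profile: just the member IDs


def join_order(
    ids: Sequence[str],
    similarity: Callable[[Profile, Profile], float],
) -> list[tuple[Profile, Profile]]:
    """Return the joins (dendrogram nodes) in the order they are made."""
    pool: list[Profile] = [(i,) for i in ids]
    joins: list[tuple[Profile, Profile]] = []
    while len(pool) > 1:
        # Similarities are recomputed for every pair on each iteration.
        a, b = max(combinations(pool, 2), key=lambda pair: similarity(*pair))
        pool.remove(a)
        pool.remove(b)
        pool.append(a + b)  # the joined object replaces the two it was built from
        joins.append((a, b))
    return joins


# Toy data and a toy measure: mean fraction of identical positions between members.
seqs = {"s1": "ACGT", "s2": "ACGA", "s3": "TTTT"}


def mean_identity(p: Profile, q: Profile) -> float:
    scores = [
        sum(x == y for x, y in zip(seqs[i], seqs[j])) / len(seqs[i])
        for i in p
        for j in q
    ]
    return sum(scores) / len(scores)


joins = join_order(list(seqs), mean_identity)
```

For a pool of n sequences the loop runs n-1 times and the pool shrinks by one object per iteration, matching the description above.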

Algorithm 1 (ordered by dendrogram).

A dendrogram can be either loaded from a file or built using one of the matrix clustering methods (UPGMA, WPGMA, NN, FN).

These methods require only a table of pairwise similarities between the objects. The required matrix is obtained, as in Algorithm 2, by applying the sequence similarity measure to every pair of sequences in turn. The dendrogram is then built using the selected clustering method.

The sequence similarity measures used by Algorithms 1 and 2 may differ.
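A compact Python sketch of building a dendrogram from a pairwise similarity table with the four named methods is given below. It assumes the textbook definitions of these methods applied to similarities (NN keeps the largest similarity, FN the smallest, UPGMA the size-weighted average, WPGMA the plain average); the application's actual implementation may differ. The names `build_dendrogram` and `METHODS` are hypothetical.

```python
from itertools import combinations

# How the similarity between a new cluster (i joined with j) and another
# cluster k is derived from the similarities s_ik and s_jk.
METHODS = {
    "NN": lambda s_ik, s_jk, n_i, n_j: max(s_ik, s_jk),     # nearest neighbours
    "FN": lambda s_ik, s_jk, n_i, n_j: min(s_ik, s_jk),     # furthest neighbours
    "UPGMA": lambda s_ik, s_jk, n_i, n_j: (n_i * s_ik + n_j * s_jk) / (n_i + n_j),
    "WPGMA": lambda s_ik, s_jk, n_i, n_j: (s_ik + s_jk) / 2,
}


def build_dendrogram(names, similarity, method="UPGMA"):
    """Return a nested tuple tree; `similarity` maps frozenset({a, b}) to a score."""
    update = METHODS[method]
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (tree, size)
    sim = {
        frozenset((i, j)): similarity[frozenset((names[i], names[j]))]
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > 1:
        pair = max(sim, key=sim.get)  # most similar pair of clusters
        i, j = sorted(pair)
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        del sim[pair]
        for k in clusters:
            s_ik = sim.pop(frozenset((i, k)))
            s_jk = sim.pop(frozenset((j, k)))
            sim[frozenset((next_id, k))] = update(s_ik, s_jk, n_i, n_j)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    tree, _size = next(iter(clusters.values()))
    return tree


table = {
    frozenset(("a", "b")): 0.9,
    frozenset(("a", "c")): 0.2,
    frozenset(("b", "c")): 0.4,
}
tree = build_dendrogram(["a", "b", "c"], table, method="UPGMA")
```

The resulting tree gives the joining order that Algorithm 1 would then follow when aligning the sequences.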

    The following can serve as the similarity measure:
  • When an alignment is used:
    • The sum of the scores of the alignment blocks.
    • The alignment score, expressing the probability that the alignment arose by chance.
    • The score of the best alignment block.
    • The total alignment score, including gap penalties etc.
  • When no alignment is used:
    • The maximal number of common L-plets.
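The alignment-free measure can be illustrated with a short Python sketch. It assumes that an L-plet is a contiguous substring of length L and that repeated L-plets are counted as many times as they occur in both sequences; the manual does not define these details, so treat them as assumptions. The function names are hypothetical.

```python
from collections import Counter


def lplets(seq: str, length: int) -> Counter:
    """Count every contiguous substring of the given length."""
    return Counter(seq[i:i + length] for i in range(len(seq) - length + 1))


def common_lplets(a: str, b: str, length: int = 3) -> int:
    """Number of L-plets shared by two sequences (multiset intersection)."""
    shared = lplets(a, length) & lplets(b, length)
    return sum(shared.values())
```

Because no alignment is needed, this measure is cheap to compute for every pair of sequences when filling the similarity matrix.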

Tab "Sequences"

    The Sequences tab (fig. 3.1.1.3) is used for loading the sequences to be aligned. The Source data panel selects the source of the data to be loaded:
  • From text area - sequences are taken from the text field.
  • From file - sequences are taken from a file.
 
Figure 3.1.1.3.

1. Panel for selecting the source of data. 2. Checkbox regulating the loading of alignment in a new window. 3. The "Process" button. 4. The "Clear" button. 5. The "Cancel" button.

Tab "Alignment"

The Alignment tab is used for defining the parameters of alignment processing.

    The Align mode panel selects the alignment algorithm:
  • Align all - to use Algorithm 1 only.
  • Percent of sequences to align - to align N percent of the sequences using Algorithm 1 and the rest using Algorithm 2.
  • Number of sequences to align - to align N sequences using Algorithm 1 and the rest using Algorithm 2.
    The panel of alignment options contains:
  • The Align method list, for selecting the sequence similarity measure to be used in Algorithm 2:
    • Alignment score - the alignment score, expressing the probability that the alignment arose by chance.
    • Sum of block weight - the sum of the scores of the alignment blocks.
    • Maximum of block weight (weight of best block of alignment) - the score of the best alignment block.
    • Total alignment weight - the total alignment score, including gap penalties etc.
  • The Join flag checkbox. If the box is checked, the unaligned flanks are added to the alignment; otherwise they are not.
  • The Align only tree (project) option. If the option is selected, only the sets of sequences selected in a project are compared. In this case several separate alignments may be obtained.
    The Globopt panel defines the following parameters:
  • Use globopt for aligned sequences label - N/A.
  • Distance between globopt using label - the frequency of global alignment optimization (i.e. how often the positions of internal gaps are optimized). If the value "0" is entered in the field, the option is not used.
 
Figure 3.1.1.4.

1. The panel for choosing the algorithm of aligning. 2. The panel of alignment's options. 3. The "Globopt" panel. 4. The checkbox regulating the loading of alignment in a new window. 5. The "Process" button. 6. The "Clear" button. 7. The "Cancel" button.

Tab "Tree"

The Tree tab is used for defining the parameters of dendrogram processing.

    The Method of matrix evaluation list contains the sequence similarity measures that can be used for building a dendrogram:
  • Number of common L-plets - the maximal number of common L-plets.
  • Sum of alignment blocks weights - the sum of the scores of the alignment blocks.
  • Alignment score - the alignment score, expressing the probability that the alignment arose by chance.
  • Weight of best block of alignment - the score of the best alignment block.
  • Total alignment weight - the total alignment score, including gap penalties etc.
    The Build tree using method list contains the methods that can be used for joining the nodes of a dendrogram:
  • UPGMA
  • WPGMA
  • Nearest Neighbours
  • Furthest Neighbours
 
Figure 3.1.1.5.

1. The list of the sequence similarity measures. 2. The list of the methods for joining the nodes of a dendrogram. 3. The checkbox regulating the loading of alignment in a new window. 4. The "Process" button. 5. The "Clear" button. 6. The "Cancel" button.

    Operating the Load dialog box:
  • The Load in new frame checkbox. If the box is checked, the alignment is loaded in a new window; otherwise it is loaded in the current one.
  • The Process button closes the window and starts the processing of the alignment.
  • The Clear button restores the default settings of the window.
  • The Cancel button closes the window without performing the calculations.

3.2. Editing

Editing can be performed with the commands of the Edit group and with the mouse in the alignment area.

3.2.1. Editing the groups of sequences

The Edit>Groups… command opens the Group properties dialog box for editing groups of sequences (fig. 3.2.1.1).

The list of groups

The list of groups contains the identifiers of the groups and the number of sequences in each group.

The Add new group button adds a new group to the list. Pressing the button opens the Enter group name dialog box (fig. 3.2.1.2). Enter the name of the group in this box and press the OK button.

 
Figure 3.2.1.1.

1. List of groups. 2. List 1 (sequences included in a group). 3. Buttons controlling lists 1 and 2. 4. List 2 (sequences excluded from a group). 5. Button for adding a new group. 6. Button for removing a group. 7. Button for applying changes. 8. Button for closing the window.

 
Figure 3.2.1.2.

The Delete selected group button removes the selected groups from the list.

Comment. By default, the list contains a single group named "default", which initially (i.e. before editing) includes all sequences of the loaded alignment.

Lists of sequences

To edit a group, select it in the list. List 1 displays the identifiers of the sequences included in this group. List 2 contains the identifiers of the sequences not included in any of the groups.

The "Add", "Delete", "Add all" and "Delete all" buttons operate on lists 1 and 2.

    Buttons for operating the "Group properties" box:
  • Apply - applies all changes made.
  • Close - closes the window without applying the changes.

3.2.2. Editing sequences.

Substituting a symbol and inserting a gap in a sequence are described in chapter 2.4.
Selecting, deleting, copying and moving sequences are performed with the commands of the Edit group of the main menu.

3.3. Color schemes

The application can apply different color schemes to the symbols of residues. A color scheme is selected with the commands of the Color group of the main menu. Either the residue symbol or its background is colored, depending on the settings defined in the Options dialog box (see chapter 3.5).

By PID - residues are colored according to the rate of occurrence of the symbol in the column (see table 3.3.1, fig. 3.3.1). This scheme is used by default.

Table 3.3.1.

Percent    Color
> 80 %     Mid blue
> 60 %     Light blue
> 40 %     Light grey
<= 40 %    White
 
Figure 3.3.1.
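The thresholds of table 3.3.1 can be expressed as a small Python sketch. The function names are hypothetical, and the sketch assumes that the rate of occurrence is computed over all characters of a column, gaps included; the manual does not say how gaps are treated.

```python
from collections import Counter


def pid_color(percent: float) -> str:
    """Map the rate of occurrence of a symbol (in percent) to a color of table 3.3.1."""
    if percent > 80:
        return "mid blue"
    if percent > 60:
        return "light blue"
    if percent > 40:
        return "light grey"
    return "white"


def column_colors(column: str) -> list[str]:
    """Color every symbol of one alignment column by its share of the column."""
    counts = Counter(column)
    return [pid_color(100 * counts[symbol] / len(column)) for symbol in column]
```

Note that the comparisons are strict, so a symbol occurring in exactly 80 % of a column falls into the "> 60 %" band.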

3.4. Performing the calculations

    Calculations, sorting, and opening the tree diagram and principal component plots are performed with the commands of the Calculate group of the main menu:
  • Consensus - to calculate the consensus for the set of sequences that are in the display area.
  • Conservation - to calculate the conservation of sequences. Conservation is calculated within groups.
  • Remove redundancy - to remove the sequences whose similarity to the consensus is below the defined threshold value. The command opens the Enter redundancy… dialog box (fig. 3.4.1). Enter the threshold value in the Percent identity field and press the OK button.
    Figure 3.4.1.
  • Sort - sequence sorting modes:
    • Sort by pairwise identity - to sort by pairwise identity.
    • Sort by ID - to sort in alphabetical order.
    • Sort by group ID - to sort by group first and then alphabetically within each group.
    • Sort by tree - to sort sequences by their order in a tree.
  • Show tree - opens the "Tree diagram" dialog box (see chapter 3.4.1) for visualizing the hierarchical clustering of sequences (dendrogram).
  • Principal Component Analysis - to calculate and visualize the principal components (see chapter 3.4.2).
  • Calculate all - to calculate the consensus, the conservation and the principal components.
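The Consensus and Remove redundancy commands can be illustrated with a minimal Python sketch. It assumes a simple majority consensus (the most frequent symbol of each column) and measures similarity as percent identity to that consensus; the application's exact definitions are not given in this manual, and all function names are hypothetical. The alignment must be non-empty.

```python
from collections import Counter


def consensus(alignment: dict[str, str]) -> str:
    """Most frequent symbol of every column of the alignment."""
    columns = zip(*alignment.values())
    return "".join(Counter(column).most_common(1)[0][0] for column in columns)


def percent_identity(seq: str, ref: str) -> float:
    """Share of positions (in percent) at which seq matches ref."""
    matches = sum(a == b for a, b in zip(seq, ref))
    return 100.0 * matches / len(ref)


def remove_below_threshold(alignment: dict[str, str], threshold: float) -> dict[str, str]:
    """Keep only the sequences whose identity to the consensus reaches the threshold."""
    cons = consensus(alignment)
    return {
        name: seq
        for name, seq in alignment.items()
        if percent_identity(seq, cons) >= threshold
    }


aln = {"s1": "ACGT", "s2": "ACGT", "s3": "ACGA", "s4": "TTTT"}
kept = remove_below_threshold(aln, 50)
```

Here the threshold plays the role of the value entered in the Percent identity field.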

3.4.1. Tree diagram

    The Tree Diagram dialog box (fig. 3.4.1.1) displays the tree of sequences. The box can be opened using:
  • The Calculate>Show tree command of the main menu
  • The button of the control panel
    The window contains:
  • The main menu;
  • The diagram visualization area.
    Main menu commands
  • Action - contains the following commands:
    • Show scalable tree diagram - opens the scalable diagram's dialog box.
    • Send postscript - opens the Export postscript file dialog box for obtaining the PostScript file from the server (see chapter 3.1).
 
Figure 3.4.1.1.

1. Main menu. 2. Diagram visualization area. 3. Navigation line.

Diagram's visualization area

Clicking the left mouse button on the diagram sets the maximal distance between any two sequences in a group. The different clusters are marked with different colors both on the tree diagram and in the main window of the application.

3.4.1.1. The dialog box of the scalable diagram

    The Scalable Tree Diagram dialog box (fig. 3.4.1.1.1) can be opened using the Action>Show scalable tree diagram command of the Tree Diagram window's main menu. It contains:
  • The main menu
  • The control panel
  • The area of diagram's visualization and scaling
 
Figure 3.4.1.1.1.

1. The main menu. 2. The control panel. 3. The area of diagram's visualization and scaling.

Commands of the main menu and functions of the control panel's buttons

    File - provides the set of commands for operating the files:
  • Close - to close the dialog box.
    Edit - provides the set of commands for changing the scale of images:
  • Zoom in vertical (the button) - to expand the image vertically.
  • Zoom in horizontal (the button) - to expand the image horizontally.
  • Zoom out vertical (the button) - to condense the image vertically.
  • Zoom out horizontal (the button) - to condense the image horizontally.
  • Zoom 100% (the button) - to restore the original size of the image.
  • Zoom min (the button) - to set the minimal scaling of the image.
  • Zoom in (the button) - to expand the selected part of the image.
  • Zoom out (the button) - to condense the selected part of the image.
  • Go to (the button) - to select a node for visualizing on the main diagram.

3.4.2. Principal components

Principal components are calculated and visualized using the Calculate>Principal Component Analysis group of commands of the main menu.

Calculating the components.
The Calculate>Principal Component Analysis>Calculate command calculates the principal components. While the calculation is running, an information window is shown (fig. 3.4.2.1); it closes when the calculation is finished.

 
Figure 3.4.2.1.
    Visualizing the principal components.
    The application can display the principal components of sequences and of residues in both 2D and 3D space. Plots can be opened using the following commands of the application's main menu:
  • PCA 2D sequences - opens the Sequence PCA Diagram 2D window showing the 2D plot of the sequences' principal components.
  • PCA 3D sequences - opens the Sequence PCA Diagram 3D window showing the 3D plot of the sequences' principal components.
  • PCA 2D nucleotides - opens the Nucleotide PCA Diagram 2D window showing the 2D plot of the residues' principal components.
  • PCA 3D nucleotides - opens the Nucleotide PCA Diagram 3D window (see fig. 3.4.2.2) showing the 3D plot of the residues' principal components.

The composition and functions of these four windows are similar.

 
Figure 3.4.2.2.

1. Control panel. 2. Plot visualization area.

    The control panel contains:
  • The button returns the plot to its originally loaded position.
  • The button scales the image to fit the plot's visualization area (for 3D plots only).
  • The button switches selection mode on.
  • The X axis list allows selecting the X axis components.
  • The Y axis list allows selecting the Y axis components.
  • The Z axis list allows selecting the Z axis components (for 3D plots only).

The plot visualization area allows:
Zooming the image in and out. To perform this operation, move the mouse pointer while holding down the "Shift" key. Moving the pointer up zooms the image out, moving it down zooms the image in.
Moving the image. To perform this operation, move the mouse pointer while holding down the "Ctrl" key.
Rotating the image (for 3D plots only). To perform this operation, use the mouse pointer.
Selecting. Selection is made with the left mouse button once selection mode has been switched on with the button. Sequences selected in the plot area are automatically selected in the sequences area of the main window and vice versa. The same applies to residues selected in the plot area.

The yellow axis on the 2D plots is auxiliary; the user can turn this axis in any direction with the mouse.

3.5. Options dialog box

    The Options dialog box (fig. 3.5.1) is used for setting up how sequences are displayed. This box can be opened using:
  • The View>Options command of the main menu;
  • The button of the control panel.
    The Text color panel sets the color of symbols:
  • Color text - for colored symbols. The color of the symbols depends on the applied color scheme, which is selected using the Color group of commands.
  • Black text - for black symbols.

The Boxes checkbox. If the box is checked, the colored background of residues is shown. The background color depends on the applied color scheme, which is selected using the Color group of commands.

The Text checkbox. If the box is checked, the residue symbols are shown.

    The Font panel sets the font for residues:
  • The Size field sets the font size.
  • The Plain option - for normal font.
  • The Bold option - for bold font.

The OK button closes the box and applies the selected options, the Apply button applies the selected options without closing the box, and the Cancel button closes the box and discards the selected options.

 
Figure 3.5.1.

1. Panel for setting up the symbols' color. 2. "Boxes" checkbox. 3. "Text" checkbox. 4. Panel for setting up the font. 5. "OK" button. 6. "Apply" button. 7. "Cancel" button.