The "MALI viewer" application is designed for aligning amino acid sequences and working with the results of such alignments:
The main window contains (fig. 2.1.1):
Calculate - the group of commands for performing calculations:
Functions of control panel buttons:
|Select all sequences - makes all sequences selected.|
|Deselect all sequences - makes all sequences unselected.|
|Inverse sequence selection - inverts the selection of sequences.|
|Delete selected sequences - removes selected sequences (strings).|
|Delete selected columns - removes selected columns.|
|Move selected sequences to new alignment - moves selected sequences to a new alignment window.|
|Copy selected sequences to new alignment - creates copies of selected sequences in a new alignment window.|
|Deselect all columns - makes all columns unselected.|
|Remove sequence <- left of selected columns - removes the fragments of sequences to the left of selected columns.|
|Remove sequence -> right of selected columns - removes the fragments of sequences to the right of selected columns.|
|Tree - opens the dialog box with the tree diagram.|
|Options - opens the configuration dialog box.|
This area is used for displaying and editing the sequences of the alignment.
A single left mouse button click on a symbol highlights it in color and changes its style to italic; similar symbols at the same position in all other sequences are highlighted as well (see fig. 2.1.1).
To move a fragment of a sequence, hold down the "Shift" key, place the mouse pointer over a symbol, press the left mouse button and drag the symbol. If you drag the symbol to the right, a gap appears at its original position and the sequence is shifted in that direction. If "Edit>Groups editing mode" is on, the gap is inserted in all sequences of the selected group. If you drag the symbol to the left, the gap symbols are replaced by the dragged fragment.
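The drag behaviour above can be sketched as string operations on a single alignment row. This is a minimal sketch under stated assumptions, not the viewer's code: the helper names `drag_right`, `drag_left` and `drag_right_group` are hypothetical, the dragged fragment is assumed to run from the grabbed symbol to the end of the row, and dragging left re-appends the consumed gaps at the end so the row keeps its length.

```python
GAP = "-"


def drag_right(row: str, pos: int, n: int = 1) -> str:
    """Insert n gaps before the symbol at `pos`, shifting the rest of the row right."""
    return row[:pos] + GAP * n + row[pos:]


def drag_left(row: str, pos: int, n: int = 1) -> str:
    """Move the fragment starting at `pos` left over n gaps.

    The n positions just left of `pos` must all be gaps. They are consumed
    and re-appended at the end of the row (assumption) to keep its length.
    """
    if pos < n or row[pos - n:pos] != GAP * n:
        raise ValueError("a fragment can only be dragged left over gaps")
    return row[:pos - n] + row[pos:] + GAP * n


def drag_right_group(rows: list[str], group: set[int], pos: int, n: int = 1) -> list[str]:
    """Groups editing mode: insert the gap in every row whose index is in `group`."""
    return [drag_right(r, pos, n) if i in group else r for i, r in enumerate(rows)]
```

In groups editing mode the rows outside the group end up shorter than the edited ones; a real editor would presumably pad them, which this sketch does not attempt.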
Double-clicking a symbol of a sequence with the left mouse button opens the "Symbol selection" dialog box (fig. 22.214.171.124). The box lists all symbols and their meanings, with the clicked symbol highlighted. To replace the symbol, select a new one in the list and press the "OK" button. The "Symbol selection" dialog box closes and the new symbol (letter) appears in the sequence.
This area is shown if the "View>Quality" command of the main menu is selected.
The histogram represents the internal homogeneity of the positions. It is calculated as follows: for each column, all pairs of residues are examined and the BLOSUM62 scores [1] of these pairs are summed. The resulting values are then normalized to the maximum value over all columns.
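A rough sketch of the quality calculation described above. It assumes gaps are skipped and that the histogram stays at zero when no column has a positive score; the text covers neither case. Only a small excerpt of BLOSUM62 is included to keep the example short, and names such as `quality_histogram` are hypothetical.

```python
from itertools import combinations

# Excerpt of BLOSUM62 (Henikoff and Henikoff, 1992) covering only the residues
# used in the demo; a real implementation needs the full table.
BLOSUM62_EXCERPT = {
    ("I", "I"): 4, ("L", "L"): 4, ("V", "V"): 4, ("K", "K"): 5, ("R", "R"): 5,
    ("I", "L"): 2, ("I", "V"): 3, ("L", "V"): 1, ("K", "R"): 2,
    ("I", "K"): -3, ("L", "K"): -2, ("K", "V"): -2,
}


def score(a, b):
    """Symmetric lookup; raises KeyError for pairs missing from the excerpt."""
    key = (a, b) if (a, b) in BLOSUM62_EXCERPT else (b, a)
    return BLOSUM62_EXCERPT[key]


def column_scores(columns, gap="-"):
    """Sum of BLOSUM62 scores over all residue pairs of each column."""
    totals = []
    for col in columns:
        residues = [c for c in col if c != gap]  # assumption: gaps are ignored
        totals.append(sum(score(a, b) for a, b in combinations(residues, 2)))
    return totals


def quality_histogram(columns):
    """Normalize the column sums to the maximum over all columns."""
    raw = column_scores(columns)
    top = max(raw)
    return [s / top for s in raw] if top > 0 else [0.0 for _ in raw]
```

For an alignment stored as rows, the columns can be obtained with `["".join(c) for c in zip(*rows)]`.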
The left information bar (fig. 2.1.1) displays a summary of the sequence containing the symbol under the mouse pointer: the sequence ID, its similarity to the consensus and the ID of the sequence most homologous to it.
The right information bar (fig. 2.1.1) displays a summary of the symbol under the mouse pointer: the name of the symbol, its frequency of occurrence in the given column and the name of the consensus symbol at the current position.
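The values shown in the two information bars can be approximated as follows. The text does not define the consensus or the similarity measure, so this sketch assumes the consensus is the most frequent non-gap symbol of each column and similarity is the fraction of identical residues over positions where neither sequence has a gap. All function names are hypothetical.

```python
from collections import Counter


def column(rows, j):
    return [r[j] for r in rows]


def symbol_frequency(rows, j, symbol):
    """Right bar: share of rows that have `symbol` in column j."""
    col = column(rows, j)
    return col.count(symbol) / len(col)


def consensus(rows, gap="-"):
    """Most frequent non-gap symbol per column (ties: first symbol encountered)."""
    out = []
    for j in range(len(rows[0])):
        counts = Counter(c for c in column(rows, j) if c != gap)
        out.append(counts.most_common(1)[0][0] if counts else gap)
    return "".join(out)


def identity(a, b, gap="-"):
    """Fraction of identical residues over positions where both rows are non-gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    return sum(x == y for x, y in pairs) / len(pairs) if pairs else 0.0


def left_bar(ids, rows, i):
    """Left bar: ID, similarity to consensus, ID of the most similar other row."""
    cons = consensus(rows)
    best = max((k for k in range(len(rows)) if k != i),
               key=lambda k: identity(rows[i], rows[k]))
    return ids[i], identity(rows[i], cons), ids[best]
```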
Loading an alignment
The "Action>Input align" command opens the "Alignment dialog" box, used for loading an alignment (fig. 3.1.1):
The "Load in new frame" checkbox. If it is checked, the alignment is loaded into a new window.
The "OK" button closes the window and loads the alignment.
The "Apply" button loads the alignment without closing the window.
The "Cancel" button cancels loading.
The "Action>Output align" command opens the "Alignment dialog" box, used for outputting the alignment in the selected format (fig. 3.1.2).
The "OK" button closes the window and outputs the alignment.
The "Apply" button writes the alignment from the main window into the alignment entry field in the selected format.
The "Cancel" button cancels outputting the alignment.
Downloading an alignment from the server
The "Send postscript" command opens the "Export postscript file dialog" box, used for downloading the PostScript file from the server (fig. 126.96.36.199) as follows:
The "Options" button opens the "Postscript options dialog" box (fig. 188.8.131.52), used for defining the parameters of the file (page size and layout, font).
The "OK" button closes the window and applies the selected changes.
The "Cancel" button cancels all changes.
Calculating an alignment
"Load dialog" box
To compare sequences, one of two algorithms (or a combination of them) can be used:
1. An alignment is calculated using a dendrogram;
2. The alignment and a similarity dendrogram are built concurrently.
Algorithm 1 and Algorithm 2 can be combined: at the beginning of the analysis Algorithm 1 is more efficient, while at later steps Algorithm 2 is preferable.
Algorithm 2 (concurrent alignment and dendrogram building).
For the given sequences, a pairwise similarity matrix is calculated using the sequence similarity measure, and the pair of sequences with maximal similarity is selected. These two sequences are removed from the pool and replaced by an "object" consisting of both of them (the profile of their pairwise alignment). Joining these two sequences forms the first node of the dendrogram. The remaining pool (now smaller by 1) is used to recalculate the matrix with the sequence similarity measure, and the process is repeated; the similarity of the matrix's objects is recalculated from scratch at every iteration. This algorithm gives more precise results but is time consuming. Each iteration reduces the size of the matrix by 1, so the total number of iterations is n-1, where n is the original size of the pool. Every iteration adds a new node to the dendrogram, and at the end of the alignment there is a single sequence, which is the profile of the alignment of all sequences from the original pool (the multiple alignment), together with a dendrogram that represents the order in which the sequences were joined.
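The loop of Algorithm 2 can be outlined as below: recompute the similarities of the current pool, join the most similar pair into a new object, and repeat n-1 times. This is only an outline. The real program aligns the two objects into a profile, whereas the hypothetical `merge` here just pools the rows because the demo sequences are already aligned to equal length, and `object_similarity` (mean pairwise identity) is an assumed stand-in for the program's similarity measure.

```python
from itertools import combinations


def identity(a, b):
    """Fraction of identical positions of two equal-length, pre-gapped rows."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def object_similarity(p, q):
    """Hypothetical object similarity: mean identity over all member pairs."""
    total = sum(identity(a, b) for a in p["rows"] for b in q["rows"])
    return total / (len(p["rows"]) * len(q["rows"]))


def merge(p, q):
    """Hypothetical merge: the real program would align p and q into a profile;
    here the rows are already equal length, so they are simply pooled."""
    return {"tree": (p["tree"], q["tree"]), "rows": p["rows"] + q["rows"]}


def progressive(labels, rows):
    pool = [{"tree": lab, "rows": [r]} for lab, r in zip(labels, rows)]
    while len(pool) > 1:  # n - 1 iterations in total
        # Recompute the similarities of the current pool at every iteration.
        i, j = max(combinations(range(len(pool)), 2),
                   key=lambda ij: object_similarity(pool[ij[0]], pool[ij[1]]))
        joined = merge(pool[i], pool[j])  # new dendrogram node
        pool = [o for k, o in enumerate(pool) if k not in (i, j)] + [joined]
    return pool[0]
```

The returned object carries both results named in the text: the pooled rows stand in for the multiple alignment, and `tree` records the joining order as nested tuples.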
Algorithm 1 (ordered by dendrogram).
A dendrogram can either be read from a file or built with one of the matrix clustering methods (UPGMA, WPGMA, NN, FN).
These methods require only the table of pairwise similarities between objects. This matrix can be obtained by applying the sequence similarity measure to every pair of sequences (as in Algorithm 2). The dendrogram is then built with the selected clustering method.
The sequence similarity measures used for Algorithms 1 and 2 may differ.
The following similarity measures can be used:
The "Sequences" tab (fig. 184.108.40.206) is used for loading the sequences to be aligned. The "Source data" panel selects the source of the data to be loaded:
The "Alignment" tab defines the parameters of the alignment calculation.
The "Align mode" panel selects the alignment algorithm:
The panel of alignment options contains:
The "Globopt" panel defines the following parameters:
The "Tree" tab defines the parameters of dendrogram construction.
The "Method of matrix evaluation" list contains the sequence similarity measures used for building a dendrogram:
The "Build tree using method" list contains the methods used for joining the nodes of a dendrogram:
Operating the "Load dialog" box:
Editing can be performed with the commands of the "Edit" group and with the mouse in the alignment area.
The "Edit>Group" command opens the "Group properties" dialog box, used for editing groups of sequences (fig. 220.127.116.11).
The list of groups
The list of groups contains the group identifiers and the number of sequences in each group.
The "Add new group" button adds a new group to the list. Pressing it opens the "Enter group name" dialog box (fig. 18.104.22.168); enter the name of the group and press the "OK" button.
The "Delete selected group" button removes the selected groups from the list.
Note. By default, the list contains a single group named "default", which initially (i.e. before editing) includes all sequences of the loaded alignment.
Lists of sequences
To edit a group, select it in the list. List 1 then displays the identifiers of the sequences included in this group, while list 2 contains the identifiers of the sequences not included in any group.
The "Add", "Delete", "Add all" and "Delete all" buttons are used to manage lists 1 and 2.
Buttons for operating the "Group properties" box:
Substituting a symbol and inserting a gap in the sequence are described in chapter 2.4.
Selecting, deleting, copying and moving sequences can be performed with the commands of the "Edit" group in the main menu.
The application can apply different color schemes to residue symbols. A color scheme is selected with the commands of the "Color" group in the main menu. The coloring applies either to the residue symbol or to its background, depending on the settings in the "Options" dialog box (see chapter 3.5).
ClustalX colors - the color scheme used in the "ClustalX" application [2]. The color of a symbol depends on the residue type and on its frequency of occurrence in the column (see fig. 3.3.1, tab. 3.3.1).
|Residue type||Share in column||Color|
|With negative charge (DE)||>50%||MAGENTA|
|With positive charge (KR)||>60%||RED|
|Aromatic (FYW)||>50%||hydrophobic CYAN|
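The table can be read as a simple rule: a residue is colored when its residue group makes up more than the listed share of the column. This reading is an assumption, as is treating the group's combined share as the quantity compared. The "hydrophobic" wording in the aromatic row is unclear, so the sketch simply applies CYAN above 50 %. Only the three rows listed are encoded, and `clustalx_color` is a hypothetical name.

```python
# (residue group, threshold as a fraction of the column, color), from the table.
CLUSTALX_RULES = [
    ("DE", 0.50, "MAGENTA"),  # negative charge
    ("KR", 0.60, "RED"),      # positive charge
    ("FYW", 0.50, "CYAN"),    # aromatic
]


def clustalx_color(column, residue, rules=CLUSTALX_RULES):
    """Return the color for `residue` in `column`, or None if it stays uncolored."""
    n = len(column)
    for group, threshold, color in rules:
        if residue in group:
            share = sum(c in group for c in column) / n
            return color if share > threshold else None
    return None  # residue not covered by these rows
```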
Zappo colorscheme - amino acid residues are colored according to their physicochemical properties (see fig. 3.3.2, table 3.3.2).
|Residues||Property||Color|
|KRH||With positive charge||Red|
|DE||With negative charge||Green|
|PG||Proline/Glycine (conformationally special)||Magenta|
Taylor colorscheme - amino acid residues are colored according to the scheme proposed by Taylor [3] (fig. 3.3.3).
By hydrophobicity - amino acid residues are colored according to the hydrophobicity table of Kyte and Doolittle [4]. The most hydrophobic residues are colored red and the most hydrophilic ones blue; residues with intermediate properties are colored in halftones of purple that depend on their scale value (fig. 3.3.4).
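A minimal sketch of the hydrophobicity coloring, using the Kyte-Doolittle hydropathy values [4]. The linear red-blue interpolation that produces the purple halftones is an assumption about how the scale is mapped to color, and `hydrophobicity_rgb` is a hypothetical name.

```python
# Kyte-Doolittle hydropathy scale.
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}
LO, HI = min(KYTE_DOOLITTLE.values()), max(KYTE_DOOLITTLE.values())


def hydrophobicity_rgb(residue):
    """Blue for the most hydrophilic residue, red for the most hydrophobic one,
    and a linear red/blue mix (purple halftones) in between (assumed mapping)."""
    t = (KYTE_DOOLITTLE[residue] - LO) / (HI - LO)
    return (round(255 * t), 0, round(255 * (1 - t)))
```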
Helix propensity - residues are colored according to their propensity to form an α-helix (fig. 3.3.5).
Strand propensity - residues are colored according to their propensity to form β-strands (fig. 3.3.6).
Turn propensity - residues are colored according to their propensity to form turns (fig. 3.3.7).
Buried index - residues are colored according to their frequency of occurrence inside the protein globule (fig. 3.3.8).
By conservation - changes the color intensity of the current color scheme depending on the conservation of the column [6, 7]. Conservation is expressed by a number from 0 to 9 or by the symbol "*". The number gives how many of the considered physicochemical characteristics are shared by the amino acids in the alignment column; "*" means that the residues in the column share all the characteristics considered. For "*" the color is left unchanged; for numbers, the lower the number, the lighter the color. Figure 3.3.9 shows the result of applying the "ClustalX colors" and "By conservation" schemes in succession.
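One way to express "the lower the number, the lighter the color" is to blend the scheme color toward white. The linear weight `level / 10` is purely an assumption, since the text gives no formula, and `apply_conservation` is a hypothetical name.

```python
def apply_conservation(rgb, level):
    """Lighten `rgb` toward white according to the conservation level ("0".."9" or "*")."""
    if level == "*":
        return rgb  # fully conserved: color left unchanged
    weight = int(level) / 10  # assumed linear mapping; 0 gives pure white
    return tuple(round(255 - (255 - c) * weight) for c in rgb)
```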
Above PID threshold only - the chosen color scheme (Zappo colorscheme, Taylor colorscheme or By hydrophobicity) is applied only to those amino acid residues whose percentage in the current column exceeds the threshold value (fig. 3.3.10). This command opens the "Enter PID threshold" dialog box (fig. 3.3.11), where the user enters the threshold value and then presses the "OK" or "Apply" button.
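The threshold rule can be written as a wrapper around any color scheme function. The sketch assumes the percentage is computed over all rows of the column, gaps included; `pid` and `above_pid_threshold` are hypothetical names, and the scheme passed in is any function taking a column and a residue.

```python
def pid(column, residue):
    """Percentage of rows in the column that hold `residue`."""
    return 100 * column.count(residue) / len(column)


def above_pid_threshold(scheme, threshold):
    """Wrap `scheme` so it colors only residues above the PID threshold."""
    def colored(column, residue):
        return scheme(column, residue) if pid(column, residue) > threshold else None
    return colored
```

For example, `above_pid_threshold(lambda col, res: "RED", 50)` colors "L" in the column "LLLK" (75 %) but leaves "K" (25 %) uncolored.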
By PID - amino acid residues are colored according to the frequency of occurrence of the symbol in the column (see table 3.3.4, fig. 3.3.12). This scheme is used by default.
|Occurrence in column||Color|
|> 80 %||Mid blue|
|> 60 %||Light blue|
|> 40 %||Light grey|
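A sketch of the By PID scheme using the thresholds from the table above. Leaving residues at 40 % or less uncolored is an assumption, since the table lists no lower level; `pid_color` is a hypothetical name.

```python
# (percentage threshold, color), checked from the highest level down.
PID_LEVELS = [(80, "Mid blue"), (60, "Light blue"), (40, "Light grey")]


def pid_color(column, residue):
    """Color of `residue` according to its percentage in `column`."""
    share = 100 * column.count(residue) / len(column)
    for threshold, color in PID_LEVELS:
        if share > threshold:
            return color
    return None  # assumption: 40 % or less stays uncolored
```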
By BLOSUM62 score - amino acid residues are colored according to their score in the column, determined from the BLOSUM62 matrix [1] (fig. 3.3.13). The symbol with the maximal score is colored blue. A symbol whose BLOSUM62 score against the maximal-scoring symbol is positive is colored light blue; all other symbols are colored white.
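The rule above can be sketched as follows. The text does not say how a symbol's "score in the column" is computed, so this sketch assumes it is the sum of its BLOSUM62 values against all other rows and that the column contains no gaps. Only the BLOSUM62 entries needed for the demo are included, and the function names are hypothetical.

```python
# Excerpt of BLOSUM62 with only the residues used in the demo.
BLOSUM62_EXCERPT = {
    ("I", "I"): 4, ("L", "L"): 4, ("K", "K"): 5,
    ("I", "L"): 2, ("I", "K"): -3, ("K", "L"): -2,
}


def b62(a, b):
    key = (a, b) if (a, b) in BLOSUM62_EXCERPT else (b, a)
    return BLOSUM62_EXCERPT[key]


def blosum_colors(column):
    """Map each residue type in the column to 'blue', 'light blue' or 'white'."""
    def column_score(i):
        # Assumed score: sum of BLOSUM62 values against every other row.
        return sum(b62(column[i], column[k]) for k in range(len(column)) if k != i)

    best = column[max(range(len(column)), key=column_score)]
    colors = {}
    for res in set(column):
        if res == best:
            colors[res] = "blue"
        elif b62(res, best) > 0:
            colors[res] = "light blue"
        else:
            colors[res] = "white"
    return colors
```

In the column "LLIK", L has the highest summed score and is colored blue; I scores +2 against L and becomes light blue, while K scores -2 and stays white.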
Calculations, sorting, and opening the tree diagram and principal component plot dialog boxes are available through the "Calculate" command group of the main menu:
The "Tree Diagram" dialog box (fig. 22.214.171.124) displays the tree of sequences. It can be opened using:
The window contains:
Main menu commands
Diagram's visualization area
Clicking the left mouse button on the diagram moves the navigation line.
Clicking the left mouse button on a node reverses the order of its leaves if the node has no daughter nodes; otherwise it reverses the order of its daughter nodes.
Clicking the left mouse button on the diagram sets the maximal distance between any two sequences in a group. The resulting clusters are marked with different colors both in the tree diagram and in the main window of the application.
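Grouping by a maximal distance can be sketched as a recursive cut of the dendrogram: a subtree becomes one cluster if no two of its sequences are farther apart than the threshold, otherwise its children are examined. This is an assumed interpretation of the click behaviour, not a description of the program's code. The tree is represented as nested tuples of sequence IDs, distances are keyed by `frozenset` pairs, and `clusters_by_max_distance` is a hypothetical name.

```python
from itertools import combinations


def leaves(tree):
    """Sequence IDs of a dendrogram given as nested tuples of strings."""
    return [tree] if isinstance(tree, str) else [x for sub in tree for x in leaves(sub)]


def clusters_by_max_distance(tree, dist, threshold):
    """Split the dendrogram into groups in which no two sequences are
    farther apart than `threshold`."""
    members = leaves(tree)
    diameter = max((dist[frozenset(p)] for p in combinations(members, 2)), default=0.0)
    if diameter <= threshold or isinstance(tree, str):
        return [members]
    return [g for sub in tree for g in clusters_by_max_distance(sub, dist, threshold)]
```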
The "Scalable Tree Diagram" dialog box (fig. 126.96.36.199.1) is opened with the "Action>Show scalable tree diagram" command of the "Tree Diagram" window's main menu. It contains:
Commands of the main menu and functions of the control panel's buttons
"File" - commands for working with files:
"Edit" - commands for changing the image scale:
Principal components are calculated and displayed with the "Calculate>Principal Component Analysis" command group of the main menu.
Calculating the components.
The "Calculate>Principal Component Analysis>Calculate" command calculates the principal components. During the calculation an information window is shown; it disappears when the calculation is finished (fig. 188.8.131.52).
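The text does not say how sequences are turned into numeric vectors for the analysis. The sketch below assumes a one-hot encoding of each (column, symbol) pair and computes the component scores with NumPy's SVD; `one_hot` and `principal_components` are hypothetical names.

```python
import numpy as np


def one_hot(rows):
    """Encode aligned, equal-length rows as 0/1 vectors with one slot per
    (column, symbol) pair seen in the data (hypothetical encoding)."""
    symbols = sorted(set("".join(rows)))
    index = {s: k for k, s in enumerate(symbols)}
    x = np.zeros((len(rows), len(rows[0]) * len(symbols)))
    for i, row in enumerate(rows):
        for j, s in enumerate(row):
            x[i, j * len(symbols) + index[s]] = 1.0
    return x


def principal_components(rows, k=2):
    """Coordinates of each sequence on the first k principal components."""
    x = one_hot(rows)
    x -= x.mean(axis=0)  # center every feature
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :k] * s[:k]  # component scores = U * S
```

Use `k=2` for the 2D plots and `k=3` for the 3D ones. The sign of each component is arbitrary, so plots may appear mirrored between runs or implementations.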
Visualizing the principal components.
The composition and functions of these four windows are similar.
The control panel contains:
The plot visualization area allows:
Zooming the image in and out. Move the mouse pointer while holding down the "Shift" key: moving the pointer up zooms the image out, moving it down zooms the image in.
Moving the image. Move the mouse pointer while holding down the "Ctrl" key.
Rotating the image (for 3D plots only). To perform this operation use the mouse pointer.
Selection is made with the left mouse button; selection mode is switched on with the button. Identifiers of sequences selected in the plot area are automatically selected in the sequence area of the main window, and vice versa; the same applies to selected amino acid residues.
The yellow axis on the 2D plots is auxiliary; the user can rotate it in any direction with the mouse.
The "Options" dialog box (fig. 3.5.1) configures how sequences are displayed. It can be opened using:
The "Text color" panel sets the color of the symbols:
The "Boxes" checkbox. If it is checked, the colored background of each residue is shown. The background color depends on the applied color scheme, which can be selected with the "Color" command group.
The "Text" checkbox. If it is checked, the residue symbols are shown.
The "Font" panel sets the font used for residues:
The "OK" button closes the box and applies the selected options, the "Apply" button applies the selected options without closing the box, and the "Cancel" button closes the box and discards the selected options.
1. Henikoff, S. and Henikoff, J. G. (1992). Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA 89: 10915-10919.
2. Thompson, J. D., Higgins, D. G. and Gibson, T. J. (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22: 4673-4680.
3. Taylor, W. R. (1997). Residual colours: a proposal for aminochromography. Protein Eng. 10: 743-746.
4. Kyte, J. and Doolittle, R. F. (1982). A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157: 105-132.
5. Levitt, M. (1978). Conformational preferences of amino acids in globular proteins. Biochemistry 17: 4277-4285.
6. Livingstone, C. D. and Barton, G. J. (1993). CABIOS 9: 745-756.
7. Zvelebil, M. J. J. M. et al. (1987). J. Mol. Biol. 195: 957-961.