• 1. Introduction
  • 2. Common system description
  • 3. System operating
    • 3.1. Selecting genes by request
    • 3.2. Operating the groups of fields (experiments)
    • 3.3. Dialog window of fields selecting
    • 3.4. Selecting genes with most correlated expression profiles
    • 3.5. Setting the parameters of request for coefficients of correlation between genes' profiles
    • 3.6. Search for correlation between two sets of fields
    • 3.7. Dialog window of the correlation matrix
    • 3.8. Building the tree of genes
    • 3.9. Building the tree of fields
    • 3.10. Dialog window for visualization of the tree as a diagram
    • 3.11. Clustering genes
    • 3.12. Analysis of principal components
    • 3.13. The dialog window of profiles
      • 3.13.1. The plot
      • 3.13.2. The information bar
      • 3.13.3. Modes of experiments' visualization
      • 3.13.4. The list of genes
      • 3.13.5. The list of groups of experiments

    1. Introduction

    SelTag is one of the most advanced tools developed for the analysis of gene expression data. It allows:

    • To analyze either all or selected groups of genes or tissues;
    • To select tissue-specific genes based on complex criteria;
    • To visualize expression data;
    • To identify genes that are expressed similarly (with correlated profiles) in a given set of tissues;
    • To select specific genes, such as receptors or secreted proteins, associated with certain disorders.

    2. Common system description

    2.1. Main window

    The main window displays gene expression data as a table. The main menu commands are located in the upper part of the window.

    2.2. Description of the main menu commands

      1. The "File" menu contains the set of commands for operating files:
    • "Open data" - calls the "Load data" dialog window, which allows loading a file with genes expression table from a certain project.
    • "Upload data" - calls the "Upload file" dialog window, which allows adding a new file to a project from the workstation.
    • "Data description" - opens a document, which contains description and list of files for loaded data.
    • "Save" - N/A in the current version.
    • "Save as" - N/A in the current version.
    • "Close" - unload currently loaded data.
    • "New project" - create a new project.
    • "Open project" - calls the "Open project" dialog window, which allows loading existing projects by their ID.
    • "Delete project" - calls the "Delete projects" dialog window, which allows removing existing projects.
    • "Close project" - close a currently opened project.
    • "Link gene data" - links the file with gene descriptions. Calls the "Load data" dialog window, intended for loading the file with gene descriptions. Once the file is loaded, the "URLs>UniGene" command of the popup menu becomes active. If the size of this file exceeds the value defined in the "Application Options" dialog window (opened with the "Options>Application Options" command of the main menu), a dialog window appears suggesting the dynamic loading mode. Depending on the option selected, the system either activates the proposed mode or loads the file completely.
    • "Save gene data" - N/A in the current version.
    • "Unlink gene data" - unlinks the file with genes' description from the main data.
    • "Link sequence" - links the file with nucleotide sequences of genes. Calls the "Load data" dialog window, intended for loading the file with gene sequences. Once the file is loaded, the "Show sequence" command of the popup menu becomes active. If the size of this file exceeds the value defined in the "Application Options" dialog window (opened with the "Options>Application Options" command of the main menu), a dialog window appears suggesting the dynamic loading mode. Depending on the option selected, the system either activates the proposed mode or loads the file completely.
    • "Unlink gene sequence" - unlinks the file with genes' sequences from the main data.
    • "Exit" - quit the program.
      2. The "Edit" menu contains the set of commands for quick search for genes in a table:
    • "Quick search gene" - calls the "Quick search" dialog window, which allows searching a table for a gene by the "Find what" field's value.
      3. The "View" menu contains the set of commands for viewing expression matrix and information on genes:
    • "Image" - calls the diagram of expression matrix visualization.
    • "Project information" - calls the "Project information" dialog window, which allows viewing the information on a project (IDs of the active project and file).
    • "Gene description" - N/A in the current version.
      4. The "Select" menu contains the set of commands for tables operating:
    • "Select genes by query…" - calls the "make selection" dialog window, which allows selecting genes in the worktable, which meet certain conditions, followed by the reorganization of these genes into a new table.
    • "Initial data" - opens the main table (table with the complete list of genes loaded at the start of a project) of genes expression.
    • "Remove all selections" - closes all tables created as a result of certain selection operations.
      5. The "Group" menu contains the set of commands for operating the groups of experiments:
    • "View" - calls the "view group data" dialog window, which allows viewing groups.
    • "Add" - calls the "edit group data" dialog window, which allows creating groups.
    • "Edit" - calls the "select field group" dialog window, which allows selecting one of the groups for editing in the appropriate dialog window ("Edit group data").
    • "Load" - N/A in the current version.
    • "Save" - N/A in the current version.
    • "Delete" - calls the "select field group" dialog window, which allows selecting one (or several) group(s) for further deleting.
      6. The "Analysis" menu contains the set of commands for data analysis:
    • "Correlations":
      • "Select most correlated genes" - calls the "select most correlated genes for specified gene set" dialog window, which allows searching for genes, expression profiles of which are most correlated with one or several selected genes.
      • "Get correlations between genes" - calls the "correlation analysis setup" dialog window, which allows calculating the correlation matrix between two sets of genes.
      • "Get correlations between fields" - calls the "correlation analysis setup" dialog window, which allows calculating the correlation matrix between two sets of fields.
    • "Clustering":
      • "Build tree for genes" - calls the "tree calculation setup" dialog window, which allows building the tree for genes.
      • "Find genes clusters" - calls the "setup for clustering procedure" dialog window, which allows setting up the parameters for genes clustering request.
      • "Build tree for fields" - calls the "tree calculation setup" dialog window, which allows building the tree for fields.
      • "Load tree" - N/A in the current version.
    • "Principal component" - calls the "setup for principal component analysis" dialog window, which allows analyzing the correlation or covariance matrix of gene expression profiles by the principal component method and visualizing the results.
      7. The "Graph" menu contains the set of commands for graphic visualization of genes expression profiles:
    • "Specified with current selection" - calls the "Profile dialog" dialog window, which allows graphic visualizing of genes' profiles from the active table.
      8. The "Options" menu contains the command:
    • "Application Options" - calls the "Application Options" dialog window, which allows setting size limits for the loaded file with gene descriptions (in the "Gene Description" section) and for the file with nucleotide sequences of genes (in the "Gene Sequence" section). If the project settings have been changed, the system offers at the end of the session to save them on the server.
      9. The "Help" menu contains two commands for calling the information on the program:
    • "Short help" - call the brief user's manual.
    • "About" - call the dialog window with the information on the program.

    2.3. Format of the file with genes expression table

    Expression data for a set of genes are represented as a table consisting of rows (genes) and columns (fields). Each row corresponds to the set of data on a single gene. The set of fields is the same for all genes (rows). As a rule, columns correspond to measurements of gene expression under various conditions (tissues, organs, cell cultures, etc.). In addition, some columns may contain additional information (numerical or textual) related to the genes (e.g. the gene's name).

    In general, the table's columns (fields) may be of four basic types:

    IVALUE - an integer value.
    FVALUE - a floating-point value.
    WORD - text without blanks (a single word).
    STRING - text with blanks (a phrase).

    Fields are completely defined by the type and the name.

    2.3.1. Basic format of the SelTag's input file

    The basic format of the input file must be the following:

    ; May contain comment in any line of the file
    NAME<tab>WORD
    GENEID<tab>IVALUE
    TISSUECANCER0<tab>FVALUE
    TISSUECANCER1<tab>FVALUE
    TISSUENORMAL0<tab>FVALUE
    TISSUENORMAL1<tab>FVALUE
    TISSUENORMAL2<tab>FVALUE
    DATA
    GENE04675<tab>402<tab>6.00<tab>5.60<tab>5.97<tab>6.00<tab>6.00
    GENE46890<tab>794<tab>2.77<tab>3.22<tab>5.65<tab>5.68<tab>5.68
    GENE23794<tab>404<tab>5.97<tab>5.97<tab>6.00<tab>5.60<tab>5.97

    In this example, <tab> denotes the tab character ('Tab' key).

    The first lines (above the "DATA" line) describe the data format. In this part of the file each line describes one field: the field's name and its basic type.

    Lines below the "DATA" line contain the expression data, one gene per line. Field values are separated by tab characters. Two consecutive tab characters denote an empty field.
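
    The layout described above can be read with a few lines of code. The following is only a minimal sketch, not SelTag's own loader: the function name parse_expression_table is hypothetical, and it assumes that a comment is a line starting with ";" and that short data lines are padded with empty fields.

```python
from typing import Dict, List, Tuple

# Converters for the four basic field types of the format.
CONVERTERS = {"IVALUE": int, "FVALUE": float, "WORD": str, "STRING": str}


def parse_expression_table(text: str) -> Tuple[List[Tuple[str, str]], List[Dict[str, object]]]:
    """Return (fields, rows); fields is a list of (name, type) pairs."""
    fields: List[Tuple[str, str]] = []
    rows: List[Dict[str, object]] = []
    in_data = False
    for line in text.splitlines():
        # Assumption: blank lines and lines starting with ";" carry no data.
        if not line.strip() or line.lstrip().startswith(";"):
            continue
        if not in_data:
            if line.strip() == "DATA":
                in_data = True
                continue
            name, ftype = line.split("\t")
            fields.append((name.strip(), ftype.strip()))
        else:
            values = line.split("\t")
            # Assumption: missing trailing values are treated as empty fields.
            values += [""] * (len(fields) - len(values))
            row: Dict[str, object] = {}
            for (name, ftype), raw in zip(fields, values):
                # An empty string (two consecutive tabs) marks an empty field.
                row[name] = CONVERTERS[ftype](raw) if raw != "" else None
            rows.append(row)
    return fields, rows


sample = (
    "; May contain comment in any line of the file\n"
    "NAME\tWORD\n"
    "GENEID\tIVALUE\n"
    "TISSUECANCER0\tFVALUE\n"
    "TISSUENORMAL0\tFVALUE\n"
    "DATA\n"
    "GENE04675\t402\t6.00\t5.60\n"
    "GENE46890\t794\t\t5.65\n"
)
fields, rows = parse_expression_table(sample)
```

    In the sample, the second data line contains two consecutive tabs, so its TISSUECANCER0 value is stored as None.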

    2.3.2. Groups of fields

    Fields of the same type can be combined into a group of fields. A group can be defined by the list of appropriate fields and a name. In general, such a combination may represent a functional significance of these fields. For example, fields representing the levels of expression in tumors can be combined into the "Cancer tissues" group. The same field may be included into several groups of fields, i.e. sets of fields in groups may overlap.

    Description of the format may include a description of the fields' group. It begins with the "#GROUP" line, which defines the name of a group. Next follow the lines containing the list of fields, which are included in this group. Description of a group must be finished with the "#ENDGROUP" line.

    An example of the data format with defined groups of fields:

    ; May contain comment in any line of the file
    NAME<tab>WORD
    GENEID<tab>IVALUE
    TISSUECANCER0<tab>FVALUE
    TISSUECANCER1<tab>FVALUE
    TISSUENORMAL0<tab>FVALUE
    TISSUENORMAL1<tab>FVALUE
    TISSUENORMAL2<tab>FVALUE
    #GROUP<tab>Cancer tissues
    TISSUECANCER0
    TISSUECANCER1
    #ENDGROUP
    #GROUP<tab>Arbitrary group
    TISSUECANCER1
    TISSUECANCER2
    TISSUENORMAL0
    TISSUENORMAL1
    #ENDGROUP
    DATA
    GENE04675<tab>402<tab>6.00<tab>5.60<tab>5.97<tab>6.00<tab>6.00
    GENE46890<tab>794<tab>2.77<tab>3.22<tab>5.65<tab>5.68<tab>5.68
    GENE23794<tab>404<tab>5.97<tab>5.97<tab>6.00<tab>5.60<tab>5.97

    In this data format two groups are defined: "Cancer tissues" (includes the TISSUECANCER0 and TISSUECANCER1 fields) and "Arbitrary group" (includes the TISSUECANCER1, TISSUECANCER2, TISSUENORMAL0 and TISSUENORMAL1 fields).
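
    A format description with groups can be split into field declarations and group definitions as sketched below. This is an illustration only: the function names are hypothetical, and the check for group members without a declaration line is an addition of this sketch; whether SelTag itself performs such a check is not stated here.

```python
from typing import Dict, List, Tuple


def parse_header_groups(header_lines: List[str]) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Split format-description lines into declared fields and named groups."""
    fields: Dict[str, str] = {}        # field name -> basic type
    groups: Dict[str, List[str]] = {}  # group name -> member field names
    current = None                     # name of the group being read, if any
    for line in header_lines:
        line = line.strip()
        if not line or line.startswith(";"):
            continue
        if line.startswith("#GROUP"):
            current = line.split("\t", 1)[1].strip()
            groups[current] = []
        elif line == "#ENDGROUP":
            current = None
        elif current is not None:
            groups[current].append(line)
        else:
            name, ftype = line.split("\t")
            fields[name] = ftype
    return fields, groups


def undeclared_members(fields: Dict[str, str],
                       groups: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """List (group, field) pairs whose field has no declaration line."""
    return [(g, f) for g, members in groups.items() for f in members if f not in fields]


header = [
    "NAME\tWORD", "GENEID\tIVALUE",
    "TISSUECANCER0\tFVALUE", "TISSUECANCER1\tFVALUE",
    "TISSUENORMAL0\tFVALUE", "TISSUENORMAL1\tFVALUE", "TISSUENORMAL2\tFVALUE",
    "#GROUP\tCancer tissues", "TISSUECANCER0", "TISSUECANCER1", "#ENDGROUP",
    "#GROUP\tArbitrary group", "TISSUECANCER1", "TISSUECANCER2",
    "TISSUENORMAL0", "TISSUENORMAL1", "#ENDGROUP",
]
fields, groups = parse_header_groups(header)
missing = undeclared_members(fields, groups)
```

    Applied to the header of the example above, the check reports TISSUECANCER2, which is listed in "Arbitrary group" but has no field declaration line of its own.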

    Data can also be loaded into SelTag from files with a simpler table format, or retrieved from a set of tables that share the same set and order of columns.

    2.4. Format of the file with genes' description

    The "Gene Description" type file consists of several successive descriptions. Each description takes up one or more lines.

    The first line has the following form:
    ">" Id "|" Short_description
    where Id is an identifier used to link the description to the data and Short_description is a brief single-line description.

    The first line can be followed by several URL_description lines and then by several Long_description lines.

    URL_description is a line of the following form:
    "URL:" Menu_text "|" URL
    Menu_text is a textual description of the URL.
    URL is a regular URL, i.e. a line of the form "http://.../...".

    Long_description is a line(s) containing a long (multiline) description.
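
    The record structure described above can be parsed as sketched below. This is a minimal illustration under stated assumptions: the function name is hypothetical, blank lines are skipped, whitespace around Id and the descriptions is trimmed, and the sample identifiers, texts and example.org address are invented for the demonstration.

```python
from typing import Dict


def parse_gene_descriptions(text: str) -> Dict[str, dict]:
    """Map each Id to its short description, URL entries and long description."""
    records: Dict[str, dict] = {}
    current = None
    for line in text.splitlines():
        if line.startswith(">"):
            # First line of a description: ">" Id "|" Short_description
            gene_id, _, short = line[1:].partition("|")
            current = {"short": short.strip(), "urls": [], "long": []}
            records[gene_id.strip()] = current
        elif current is None:
            continue  # ignore anything before the first ">" line
        elif line.startswith("URL:"):
            # URL_description: "URL:" Menu_text "|" URL
            menu_text, _, url = line[4:].partition("|")
            current["urls"].append((menu_text.strip(), url.strip()))
        elif line.strip():
            # Any other non-empty line belongs to the long description.
            current["long"].append(line.strip())
    return records


sample = (
    ">402|Hypothetical receptor protein\n"
    "URL:Example entry|http://example.org/gene/402\n"
    "First line of the long description.\n"
    "Second line of the long description.\n"
    ">794|Another hypothetical gene\n"
)
records = parse_gene_descriptions(sample)
```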


    3. System operating

    3.1. Selecting genes by request

    The "make selection" dialog window can be called using the "Select>Select genes by query…" command and allows selecting genes from the worktable by logical and/or mathematical expression. This window contains the following elements:

    3.1.1. Expression line. It is an editable text field placed in the upper part of the window. The "Clear Expr." button allows clearing a content of this line.

    3.1.2. Status (comments) line. It is a text field, in which additional information is displayed, placed below the Expression line.

    3.1.3. "Field" bar (field index entry bar). It contains the drop-down list of fields, which can be opened by pressing the "Select" button. Fields are listed in the following format:
    Fn NAME, where n - the field's order number, NAME - the field's name.
    On selecting a field the following information is displayed in the status line:
    $Fn: type=FIELDTYPE. Is item of group(s):x, where n - the field's order number, FIELDTYPE - the name of the field's data type, x - the group's order number(s) (if the field is included in any group(s)).
    The "Insert" button allows adding the current value of $Fn to an expression, where n - the field's order number.

    3.1.4. Text fields' values selection bar contains:

    • The list of text fields' values allows selecting values of a text type. The list is activated when the selected field is of a text type (WORD or STRING); otherwise it remains empty.
    • The "Select" button allows calling the list of text fields.
    • The "Insert" button allows adding a selected value from the list of current elements in the "TEXT" format to an expression.

    3.1.5. "Group" bar (group order number entry bar) contains:

    • 3.1.5.1. The "Group" list contains groups of experiments. The list can be called by pressing the "Select" button. Groups are in the following format:
      Gn NAME, where n - the group's order number, NAME - the group's name.
      On selecting a group the following information is displayed in the status line (3.1.2):
      $Gn: type=FIELDTYPE, fields[x]: y1,y2,y3.....
      where n - the group's order number, FIELDTYPE - the name of the field's data type, x - the number of fields included in the group, y1,y2... - the order numbers of the fields included in the group.
    • 3.1.5.2. The "Input condition level" field (the field of an additional condition). Sets an additional condition on selection in a group. This condition defines the lower threshold on the number of experiments in the group that must satisfy the main condition. The threshold can be given either as an absolute number of experiments (an integer is entered into the field) or as a percentage of the experiments (a value in the X% format is entered into the field).
    • 3.1.5.3. The "Insert" button allows adding the current $Gn value to an expression, where n - the group's order number.
      If additional conditions are defined, they become added to an expression:
      $Gn:50%
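
    The effect of the additional condition can be illustrated as follows. This is a sketch of one plausible reading, not SelTag's implementation: the function name is hypothetical, and the threshold is assumed to be inclusive because the text calls it a lower threshold.

```python
from typing import Sequence


def meets_condition_level(flags: Sequence[bool], level: str) -> bool:
    """Check the "Input condition level" for one gene.

    flags - one boolean per experiment of the group, True where the main
            condition holds for this gene.
    level - either an absolute count such as "3" or a percentage such as "50%".
    """
    satisfied = sum(bool(f) for f in flags)
    level = level.strip()
    if level.endswith("%"):
        required_share = float(level[:-1])
        # Compare shares without dividing, to avoid rounding surprises.
        return 100.0 * satisfied >= required_share * len(flags)
    return satisfied >= int(level)


# A group of four experiments with the condition $Gn:50%.
print(meets_condition_level([True, True, False, False], "50%"))
print(meets_condition_level([True, False, False, False], "50%"))
```

    With a group of four experiments, "50%" is met when the main condition holds in at least two of them.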

    3.1.6. "Card No." bar. This bar allows entering the specifier of a gene's order number in the table. The "Insert" button allows adding a $N value to an expression.

    3.1.7. Buttons for operating the dialog window:

    • "Cancel" - exit without building a new table.
    • "Scan" - perform a query by an expression. Once the query is finished, the number of genes found is displayed in the status line.
    • "OK" - exit with acceptance of query results and building a new table of genes.

    3.1.8. Query entering buttons allow adding to an expression various lexemes: operations, numbers, and mathematical functions.

    3.1.9. "Expr. Score options" bar contains:

    • The "Calculate scores for last selection" checkbox. If this box is checked, the list of genes selected by the last query appears together with their scores.
    • The field with the list of genes and their scores.
        The "Type" drop-down list and the "Value" field are intended for additional selection of genes by their score. The "Type" list allows two types of selection:
      • "Not applied" - on pressing the "OK" button all genes, satisfying the query conditions, will be selected.
      • "Best N" - on pressing the "OK" button only the genes with highest scores will be selected. The number of genes can be defined in the "Value" field.

    Once a query is finished, an additional item appears in the "Select" menu of the main window; its name corresponds to the selected set of genes (query number + expression). During a session the user can save the retrieved sets of genes or switch between them. To remove all tables created by selections, use the "Remove all selections" command of the menu.

    3.2. Operating the groups of fields (experiments)

    Operating the groups of fields can be performed using the commands of the "Group" menu.

    • The Group>View command calls the "view group data" dialog window. In the "Group list" list the groups are displayed; in the "Group Fields" list the fields contained in the selected group are displayed. Above the lists the information on a total number of groups and a number of non-overlapping fields in a project is displayed. The "OK" button provides quitting the window.
    • The Group>Add command calls the "edit group data" dialog window, intended for adding new groups to a project. To create a new group:
      1. Select the types of fields using the radio buttons on the "Select field types" bar. Fields of the selected type will appear in the "Data fields" list. To add fields to the group use the "Add" button; to remove them use the "Remove" button.
      2. Enter the name of the group (e.g. NewGroup) into the "Group Name" field. If the "Simultaneously add inverted group" checkbox is checked, two groups will be created: NewGroup and INV_NewGroup (which contains the fields not included in NewGroup).
      3. Press the "Save group" button. The "Cancel" button cancels all changes. If the name of the newly created group coincides with that of an existing one, the "Save group" and "Cancel" buttons become inactive.
    • The Group>Edit command calls the "Select field group" dialog window, intended for selecting a group to be edited. Pressing the "OK" button opens the "edit group data" dialog window, which allows:
      1. Changing the list of fields included in the group (displayed in the "Group Fields" list). To add fields to the group use the "Add" button; to remove them use the "Remove" button.
      2. Changing the name of the group in the "Group Name" field.
      3. Creating an "inverted group" (it will include the fields contained in the "Data fields" list, and the "INV_" prefix will be added to the group's name). This requires checking the "Simultaneously add inverted group" checkbox. If an "inverted group" with this name already exists, its content is updated automatically to reflect the changes.
      4. The "Save group" button confirms the changes; the "Cancel" button rejects them.
    • The Group>Delete command calls the "Select field group" dialog window, which allows removing a group from a project. Select a group in the list by clicking on it and press the "Delete" button. To confirm the changes press the "OK" button; to reject them press "Cancel".
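
    The relationship between a group and its inverted group can be sketched as a simple set complement. The function name below is hypothetical, and the sketch assumes that the complement is taken over the fields offered in the "Data fields" list for the selected field type.

```python
from typing import Dict, List, Sequence


def group_with_inverse(name: str, members: Sequence[str],
                       data_fields: Sequence[str]) -> Dict[str, List[str]]:
    """Build a group and its "INV_" counterpart from the available fields."""
    chosen = set(members)
    return {
        # The group itself, kept in the order of the available fields.
        name: [f for f in data_fields if f in chosen],
        # The inverted group: every available field that was not chosen.
        "INV_" + name: [f for f in data_fields if f not in chosen],
    }


data_fields = ["TISSUECANCER0", "TISSUECANCER1", "TISSUENORMAL0", "TISSUENORMAL1"]
groups = group_with_inverse("NewGroup", ["TISSUECANCER0", "TISSUECANCER1"], data_fields)
```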

    3.3. Dialog window of fields selecting

    The field selection dialog window can be opened by pressing the "Fields…" button in the "Select most correlated genes for specified gene set", "correlation analysis setup", "Tree calculation setup", "setup for clustering procedure" and "setup for principal component analysis" dialog windows, as well as by using the "Graph>specified with current selection" command (when no groups are present). It allows selecting fields from the main table.

    3.3.1. List of data types

    The list is activated when the "Field types filtering" checkbox is checked. The list displays the names of data types: WORD, IVALUE, STRING, and FVALUE. The types selected in the list define a filter for the fields and groups of a project.

    If the "Field types filtering" checkbox is not checked, the list remains inactive, and the list of groups and the list of fields show groups and fields of all data types.

    3.3.2. List of the groups of fields (experiments)

    The list is activated when the "Field groups filtering" checkbox is checked. The list contains the names of groups. The groups selected in the list define a filter for fields (experiments).

      Buttons for operating the list:
    • "Select all" - allows selecting all groups.
    • "Invert selection" - allows inverting the selection.
    • "Unselect all" - allows unselecting of all selected groups.

    If the "Field types filtering" checkbox is checked, the list 3.3.2 contains the groups whose types correspond to those selected in the list 3.3.1. If the checkbox is not checked, the list 3.3.2 contains all groups.

    3.3.3. List of fields

    This list is located on the "Fields" bar. It contains the fields, which satisfy the restrictions, set in the lists 3.3.1 and 3.3.2. If no restrictions are set, the list contains all fields.

    Buttons for operating the list:

    • "Select all" - allows selecting all fields.
    • "Invert selection" - allows inverting the selection.
    • "Unselect all" - allows unselecting all selected fields.
    • "Select all experiments" - allows selecting all fields, which contain expression values.

    3.3.4. Buttons for operating the dialog window

    • "Cancel" - close the window without applying the selections made.
    • "OK" - close the window and accept the fields selected in the list 3.3.3.

    3.4. Selecting genes with most correlated expression profiles

    The "Analysis>Correlations>Select most correlated genes" command calls the "Select most correlated genes for specified gene set" dialog window, which allows setting the values for parameters of query.

    3.4.1. The list of genes available for selection is located below the "Gene list to select from" title. It contains genes from the current table. Above the list there is a text field that allows selecting lines by mask. In the mask, the "%" character substitutes for a single character and the "*" character substitutes for multiple characters.

    3.4.2. The list of selected genes is located below the "Specified genes (query set to compare with)" title. It contains the genes selected from the list 3.4.1. Above the list there is a text field that allows selecting lines by mask, with the same "%" and "*" characters.

    Genes from the list 3.4.1 can be selected to form a subset of genes (set#2), which consists of one or more genes. Genes of set#2 are displayed in the list 3.4.2. Genes remaining in the list 3.4.1 form set#1. Sets #1 and #2 have no genes in common.
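
    The mask syntax used above the two gene lists can be illustrated by translating a mask into a regular expression. This is a sketch under assumptions, not SelTag's matching code: the function names are hypothetical, "*" is assumed to match zero or more characters, the mask is assumed to cover the whole name, and matching is assumed to be case-sensitive.

```python
import re
from typing import Iterable, List


def mask_to_regex(mask: str) -> "re.Pattern[str]":
    """Translate a mask with "%" and "*" into a compiled regular expression."""
    parts = []
    for ch in mask:
        if ch == "%":
            parts.append(".")      # exactly one character
        elif ch == "*":
            parts.append(".*")     # assumption: zero or more characters
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def select_by_mask(names: Iterable[str], mask: str) -> List[str]:
    """Return the names that match the mask as a whole."""
    pattern = mask_to_regex(mask)
    return [name for name in names if pattern.fullmatch(name)]


names = ["GENE04675", "GENE46890", "GENE23794"]
starts_with_4 = select_by_mask(names, "GENE4*")
nine_in_fourth = select_by_mask(names, "GENE%%%9%")
```

    Here "GENE4*" keeps only GENE46890, while "GENE%%%9%" keeps the two names whose fourth digit is 9.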

    3.4.3. Buttons for operating the lists:
    "Add->" - move the genes selected in the list 3.4.1 to the list 3.4.2.
    "All->" - move all genes from the list 3.4.1 to the list 3.4.2.
    "<-Remove" - move the genes selected in the list 3.4.2 to the list 3.4.1.
    "<-All" - move all genes from the list 3.4.2 to the list 3.4.1.
    "File load->" - load the list 3.4.2 from a file. N/A in the current version.

    From the genes of set#1 the program selects those whose expression profiles are most correlated with the profiles of genes from set#2. Once the process is finished, the expression profiles of the selected genes are built and the correlation coefficients are placed in a separate table.

    3.4.4. If the "Initial gene set" checkbox is checked, the list 3.4.1 is filled with the genes from the main table.

    Setting the computation parameters

    3.4.5. The "Fields..." button calls the dialog window for selecting fields (3.3). The text field shows the number of selected fields.

    3.4.6. The "Correlation type" list contains 3 types of correlation coefficients:

    • Pearson r - Pearson's correlation coefficient.
    • Spearman r - Spearman's correlation coefficient.
    • Kendall tau - Kendall's correlation coefficient.
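
    For reference, the three coefficients can be computed as in the sketch below. It uses textbook definitions written in plain Python; which variants SelTag actually implements (for example, how ties are handled or which form of Kendall's tau is used) is not stated in this manual, so the tie handling and the tau-a form are assumptions. The functions fail with a division by zero when a profile is constant.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's r: covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's r: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_a(x: Sequence[float], y: Sequence[float]) -> float:
    """Tau-a: (concordant - discordant) / number of pairs; ties count as neither."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)
    return score / (n * (n - 1) / 2)


linear = pearson([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
monotone = spearman([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])
reversed_order = kendall_tau_a([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])
```

    Pearson's coefficient measures linear dependence, whereas the two rank-based coefficients reach 1 for any strictly increasing relationship between two profiles.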

    3.4.7. The "Threshold type" list provides 4 types of restrictions on selecting genes:

    • "Best N" - select the N genes with the best correlation values from set#1 (N is defined in the "and value" field).
    • "Best %" - select a percentage of the genes with the best correlation values from set#1 (the percentage is defined in the "and value" field).
    • "Value" - select from set#1 the genes whose absolute correlation value is equal to or higher than the threshold defined in the "value" field.
    • "All" - select all genes from the set#1.

    3.4.8. The "value" field allows setting the correlation threshold value (see 3.4.7).
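
    The four threshold types can be sketched as filters over a list of (gene, correlation) pairs. This is an illustration under assumptions: the function name is hypothetical, "best" is taken to mean the highest signed correlation value, and the "Best %" count is rounded down; the manual does not specify either detail.

```python
from typing import List, Tuple


def apply_threshold(scored: List[Tuple[str, float]], kind: str,
                    value: float = 0.0) -> List[Tuple[str, float]]:
    """Filter (gene, correlation) pairs according to the threshold type."""
    # Assumption: "best" means the highest signed correlation value.
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    if kind == "Best N":
        return ranked[: int(value)]
    if kind == "Best %":
        # Assumption: the number of genes kept is rounded down.
        return ranked[: int(len(ranked) * value / 100)]
    if kind == "Value":
        # The absolute correlation value is compared with the threshold.
        return [item for item in ranked if abs(item[1]) >= value]
    if kind == "All":
        return ranked
    raise ValueError(f"unknown threshold type: {kind}")


scored = [("g1", 0.2), ("g2", 0.9), ("g3", -0.8), ("g4", 0.5)]
best_two = apply_threshold(scored, "Best N", 2)
strong = apply_threshold(scored, "Value", 0.7)
```

    Note that "Value" keeps g3 because its absolute correlation (0.8) passes the threshold, while "Best N" under the assumption above does not.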

    3.4.9. The "Regime to treat multiple genes for query set" list contains three modes of computing the correlation:

    • "Max. correlation value to select" - a gene from set#1 is scored by the maximal value of its correlation coefficient with the genes from set#2.
    • "Aver. correlation value to select" - a gene from set#1 is scored by the average value of its correlation coefficient with the genes from set#2.
    • "Corr for aver field values to select" - a gene from set#1 is scored by its correlation coefficient with the average profile of the genes from set#2, i.e. a "Fictive Gene" whose field values are the averages over set#2 is created and used to compute the correlation coefficient.
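
    The difference between the three regimes is easiest to see on a small example. The sketch below scores one gene of set#1 against a query set of two genes; the function name and the regime keys "max", "average" and "average_profile" are hypothetical labels, and Pearson's coefficient is used only for the demonstration.

```python
import math
from statistics import mean
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient of two equally long profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def score_against_query(profile: Sequence[float],
                        query_profiles: Sequence[Sequence[float]],
                        regime: str) -> float:
    """Score one gene of set#1 against all genes of set#2."""
    if regime == "max":
        return max(pearson(profile, q) for q in query_profiles)
    if regime == "average":
        return mean(pearson(profile, q) for q in query_profiles)
    if regime == "average_profile":
        # The "Fictive Gene": field-wise average of the query profiles.
        fictive = [mean(column) for column in zip(*query_profiles)]
        return pearson(profile, fictive)
    raise ValueError(f"unknown regime: {regime}")


profile = [1.0, 2.0, 3.0, 4.0]
query = [[1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0]]
by_max = score_against_query(profile, query, "max")
by_average = score_against_query(profile, query, "average")
by_profile = score_against_query(profile, query, "average_profile")
```

    For this data the individual coefficients are 1.0 and 0.8, so the first two regimes give 1.0 and 0.9, while the correlation with the averaged profile is about 0.949. The three regimes can therefore rank the same genes differently.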

    3.4.10. Options for saving the data in a file. N/A in the current version.

    3.4.11. History of lists 2. N/A in the current version.

    3.4.12. Data output parameters.

    • The "View corr. matrix in separate window" checkbox. When it is checked, the correlation coefficients are displayed in an additional window (the matrix dialog window is described in 3.7). In the matrix, the rows correspond to the genes selected from set#1, ordered by descending coefficient of correlation with the genes from set#2. The first column contains the gene identifiers; the second contains the correlation coefficient used for selection (according to the selected regime). All remaining columns contain the coefficients of correlation with the genes from set#2.

    3.4.13. Buttons for operating the dialog window:

    • "Cancel" - quit without completing an operation.
    • "OK" - complete an operation. Pressing this button calls the "Profile dialog" dialog window; its list of experiment groups includes a new group ("group from SelCorr +N0") containing the experiments that were used to complete the operation.

    3.5. Setting the parameters of request for coefficients of correlation between genes' profiles

    The "correlation analysis setup" dialog window allows setting the parameters for computing the matrix of correlation between two sets of genes, and can be called using the "Analysis>Correlations>Get correlations between genes" command.

    The complete set of genes is used to form two subsets: set#1, which corresponds to the rows of the correlation matrix, and set#2, which corresponds to its columns. These sets of genes may overlap. Coefficients of correlation between the expression profiles of genes from these sets are computed. As a result, a correlation matrix of size n*m (n rows and m columns) is formed. The element of the matrix with indices "i,j" is the coefficient of correlation between the expression profiles of the gene with index "i" from set#1 and the gene with index "j" from set#2.
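
    The construction of the n*m matrix can be sketched as follows. The gene names and profiles are invented for the illustration, the function names are hypothetical, and Pearson's coefficient stands in for whichever correlation type is selected in the dialog.

```python
import math
from typing import Callable, Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient of two equally long profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(profiles: Dict[str, Sequence[float]],
                       set1: Sequence[str], set2: Sequence[str],
                       corr: Callable[[Sequence[float], Sequence[float]], float] = pearson
                       ) -> List[List[float]]:
    """Element [i][j] is corr(profile of set1[i], profile of set2[j])."""
    return [[corr(profiles[a], profiles[b]) for b in set2] for a in set1]


# Hypothetical profiles over four fields (experiments).
profiles = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],
    "GENE_C": [4.0, 3.0, 2.0, 1.0],
}
# The two sets may overlap: GENE_A and GENE_C appear in both.
matrix = correlation_matrix(profiles, ["GENE_A", "GENE_B", "GENE_C"], ["GENE_A", "GENE_C"])
```

    Here set#1 has three genes and set#2 has two, so the result has 3 rows and 2 columns.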

    3.5.1. The "Set 1 (rows)" list contains genes, to be used to form the set#1.

    3.5.2. The "Set 2 (columns)" list contains genes, to be used to form the set#2.

    3.5.3. The "Fields…" button calls the dialog window for selecting fields (3.3). The textual field represents the number of selected fields.

    3.5.4. The "Correlation type" list contains three types of correlation coefficients:

    • Pearson r - Pearson's correlation coefficient.
    • Spearman r - Spearman's correlation coefficient.
    • Kendall tau - Kendall's correlation coefficient.

    3.5.5. The "Threshold type" list allows selecting a type of threshold for correlation coefficients. Elements of the matrix exceeding the selected threshold are displayed in red. The list contains 4 types of thresholds:

    • "Best N" - select the N correlation coefficients with maximal values (N is defined in the "Value" field).
    • "Best %" - select a percentage of the correlation coefficients with maximal values (the percentage is defined in the "Value" field).
    • "Value" - select the correlation coefficients with values equal to or higher than the threshold defined in the "Value" field.
    • "All" - select all correlation coefficients.

    3.5.6. The "Value" field allows setting the correlation threshold value (see 3.5.5).

    3.5.7. Buttons for operating the dialog window:

    • "Cancel" - quit without completing an operation.
    • "OK" - complete an operation. Pressing this button calls the "Correlation matrix dialog" dialog window, in which the correlation matrix is displayed (the matrix dialog window is described in 3.7).

    3.6. Search for correlation between two sets of fields

    The Analysis>Correlations>Get correlations between fields command calls the "correlation analysis setup" dialog window, which allows setting the parameters for computing the correlation matrix between two sets of fields.

    3.6.1. The "ITYPE" checkbox. When it is checked, the fields that belong to the "ITYPE" data type are added to the lists 3.6.3 and 3.6.4.

    3.6.2. The "FTYPE" checkbox. When it is checked, the fields that belong to the "FTYPE" data type are added to the lists 3.6.3 and 3.6.4.

    3.6.3. The "Field set 1 (rows)" list contains fields, to be used to form the set#1. The "Mark all" button allows selecting all fields in the list. The "Unmark all" button allows deselecting all fields in the list.

    3.6.4. The "Field set 2 (columns)" list contains fields, to be used to form the set#2. The "Mark all" button allows selecting all fields in the list. The "Unmark all" button allows deselecting all fields in the list.

    3.6.5. The "Correlation type" list contains three types of correlation coefficients:

    • Pearson r - Pearson's correlation coefficient.
    • Spearman r - Spearman's correlation coefficient.
    • Kendall tau - Kendall's correlation coefficient.

    3.6.6. The "Type" list allows selecting a type of threshold for correlation coefficients. Elements of the matrix exceeding the selected threshold are displayed in red. The list contains 4 types of thresholds:

    • "Best N" - select the N correlation coefficients with maximal values (N is defined in the "Value" field).
    • "Best %" - select a percentage of the correlation coefficients with maximal values (the percentage is defined in the "Value" field).
    • "Value" - select the correlation coefficients with values equal to or higher than the threshold defined in the "Value" field.
    • "All" - select all genes from the set#1.
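    The threshold types can be sketched as a single selection function. This is only an illustration: the function name is hypothetical, and rounding the "Best %" count up is an assumption, since the manual does not say how fractional counts are treated.

```python
import math

def select_coefficients(coeffs, threshold_type, value=None):
    """Return the coefficients that pass the chosen threshold type."""
    ranked = sorted(coeffs, reverse=True)  # maximal values first
    if threshold_type == "Best N":
        return ranked[:int(value)]
    if threshold_type == "Best %":
        # Assumption: a fractional count is rounded up.
        count = math.ceil(len(ranked) * value / 100.0)
        return ranked[:count]
    if threshold_type == "Value":
        return [c for c in ranked if c >= value]
    if threshold_type == "All":
        return ranked
    raise ValueError("unknown threshold type: %r" % threshold_type)

# Hypothetical coefficients, for illustration only.
coeffs = [0.9, 0.1, 0.75, -0.4, 0.5]
print(select_coefficients(coeffs, "Best N", 2))
print(select_coefficients(coeffs, "Value", 0.5))
```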

    3.6.7. The "Value" field allows setting the correlation threshold value (see 3.6.6).

    3.6.8. Buttons for operating the dialog window:

    • "Cancel" - quit without completing an operation.
    • "OK" - complete an operation. Pressing this button calls the "Correlation matrix dialog" dialog window, in which the correlation matrix is displayed (description of the work with the matrix dialog window is in 3.7).

    3.7. Dialog window of the correlation matrix

    The "Correlation matrix dialog" dialog window is called from the "Select most correlated genes for specified gene set" dialog window (if checkbox 3.4.12 is checked) and from the "correlation analysis setup" dialog window.

    1. The "File" menu contains the set of commands for operating files:

    • Save matrix - N/A in the current version.
    • Save row genes - N/A in the current version.
    • Close - close the dialog window.

    2. The "Action" menu contains the set of commands for operating the matrix:

    • Sort rows - calls the "sort matrix rows" dialog window, which allows sorting the rows of data.
    • Graph>Line plot - calls the "setup for correlations graph" dialog window, which allows building the profiles of correlation matrix's values.

    3.7.1. The "sort matrix rows" dialog window allows setting the parameters for sorting the rows of data.

    3.7.1.1. "Regime to get sorting values from row data" - the mode that defines which value of each data row is used as the sorting key:

    • "Max. correlation value to select" - the key is the maximal correlation value in the row.
    • "Aver. correlation value to select" - the key is the average correlation value in the row.

    3.7.1.2. "Order" - the order of sorting:

    • "Descending" - by descending values
    • "Ascending" - by ascending values

    3.7.1.3. "Sign operation" - the operation applied to the sign of the values before sorting:

    • "Not applied" - values are used as they are
    • "ABS" - by the absolute value
    • "Negative" - by the negative value

    3.7.1.4. Buttons for operating the dialog window:

    • "Cancel" - quit without sorting.
    • "OK" - sorting by the defined parameters.
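    The sorting parameters of the "sort matrix rows" dialog window can be sketched as follows. The function names are hypothetical, and reading "Negative" as "negate each value before computing the key" is an assumption; the manual does not define it precisely.

```python
def row_key(row, regime="max", sign="Not applied"):
    """Sorting key of one matrix row."""
    if sign == "ABS":
        values = [abs(v) for v in row]
    elif sign == "Negative":
        values = [-v for v in row]  # assumption: values are negated
    else:
        values = list(row)
    if regime == "max":
        return max(values)
    return sum(values) / len(values)  # "aver" regime

def sort_rows(matrix, regime="max", order="Descending", sign="Not applied"):
    """Return the rows sorted by their key; the input is not modified."""
    return sorted(matrix,
                  key=lambda r: row_key(r, regime, sign),
                  reverse=(order == "Descending"))

# Hypothetical correlation matrix with three rows.
matrix = [[0.2, -0.9], [0.5, 0.4], [0.1, 0.3]]
print(sort_rows(matrix))
print(sort_rows(matrix, sign="ABS"))
```

    With "ABS", the row containing -0.9 moves to the top, which is useful when strong negative correlations matter as much as positive ones.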

    3.7.2. The "setup for correlations graph" dialog window allows setting the parameters for graphic visualization of the correlation matrix in the "Profile dialog" dialog window (operating the window is described in 3.13):

    3.7.2.1. The "Set 1 (rows)" list contains genes, to be used to form the set#1.

    3.7.2.2. The "Set 2 (columns)" list contains genes, to be used to form the set#2.

    3.7.2.3. "Select X-Y axis representation" - the mode for assigning the sets to the coordinate axes:

    • "X = rows (Set1), Graphs=columns (Set2)" - the X-axis represents the identifications of genes from the set#1, the Y-axis represents the coefficients of correlation between genes from the set#1 and the set#2. The list of genes contains genes from the set#2.
    • "X = columns (Set2), Graphs=rows (Set1)" - the X-axis represents the identifications of genes from the set#2, the Y-axis represents the coefficients of correlation between genes from the set#1 and the set#2. The list of genes contains genes from the set#1.

    3.7.2.4. Buttons for operating the dialog window:

    • "Cancel" - quit without building a plot.
    • "OK" - calls the "Profile dialog" dialog window with defined visualization parameters.
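    The two axis representations of 3.7.2.3 differ only in whether the rows or the columns of the correlation matrix become the plotted series. A minimal sketch, with a hypothetical function name and hypothetical gene identifiers:

```python
def correlation_series(matrix, row_ids, col_ids, x_axis="rows"):
    """Split a correlation matrix into plot series.

    Returns (x_labels, series); series maps a gene name to its Y values.
    """
    if x_axis == "rows":
        # X = rows (Set1), one graph per column (Set2).
        series = {c: [matrix[i][j] for i in range(len(row_ids))]
                  for j, c in enumerate(col_ids)}
        return list(row_ids), series
    if x_axis == "columns":
        # X = columns (Set2), one graph per row (Set1).
        series = {r: list(matrix[i]) for i, r in enumerate(row_ids)}
        return list(col_ids), series
    raise ValueError("x_axis must be 'rows' or 'columns'")

# Hypothetical identifiers and coefficients, for illustration only.
rows = ["geneA", "geneB"]
cols = ["geneX", "geneY", "geneZ"]
matrix = [[0.9, -0.2, 0.1],
          [0.3, 0.8, -0.5]]
x_labels, series = correlation_series(matrix, rows, cols, x_axis="rows")
print(x_labels, series)
```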

    3.8. Building the tree of genes

    The "Analysis>Clustering>Build tree for genes" command calls the "Tree calculation setup" dialog window, which allows setting the parameters for building the tree of genes by the current table.

    3.8.1. The "Fields..." button calls the dialog window for selecting fields (see 3.3). The text field shows the number of selected fields.

    3.8.2. The "Correlation type" list contains three types of correlation coefficients:

    • Pearson r - Pearson's correlation coefficient.
    • Spearman r - Spearman's correlation coefficient.
    • Kendall tau - Kendall's correlation coefficient.

    3.8.3. The "Distance type" list contains 3 types of distance measures, which are computed on the basis of correlation coefficients Rij:

    • 1-Rij
    • 1+Rij
    • 1-|Rij|
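    The three distance measures follow directly from the formulas above. In this short sketch the function name is hypothetical; the formulas themselves are those listed in the manual.

```python
def correlation_distance(r, kind="1-Rij"):
    """Convert a correlation coefficient Rij into a distance."""
    if kind == "1-Rij":
        return 1.0 - r       # anticorrelated profiles are far apart
    if kind == "1+Rij":
        return 1.0 + r       # anticorrelated profiles are close
    if kind == "1-|Rij|":
        return 1.0 - abs(r)  # any strong correlation, of either sign, is close
    raise ValueError("unknown distance type: %r" % kind)

for kind in ("1-Rij", "1+Rij", "1-|Rij|"):
    print(kind, correlation_distance(-0.8, kind))
```

    Since Rij lies between -1 and 1, the first two measures range from 0 to 2 and the third from 0 to 1.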

    3.8.4. The "Amalgamation rule" list contains 4 types of tree nodes combination:

    • UPGMA
    • Nearest neighbor
    • Furthest neighbor
    • WPGMA
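    The four amalgamation rules differ only in how the distance from a newly merged node to every other node is recomputed. The sketch below uses the standard textbook definitions of these rules in a naive agglomerative procedure; the function names are hypothetical, and it is not a description of SelTag's internal implementation.

```python
def merge_distance(rule, d_ki, d_kj, n_i, n_j):
    """Distance from cluster k to the union of clusters i and j."""
    if rule == "Nearest neighbor":   # single linkage
        return min(d_ki, d_kj)
    if rule == "Furthest neighbor":  # complete linkage
        return max(d_ki, d_kj)
    if rule == "UPGMA":              # average weighted by cluster sizes
        return (n_i * d_ki + n_j * d_kj) / (n_i + n_j)
    if rule == "WPGMA":              # plain average of the two distances
        return (d_ki + d_kj) / 2.0
    raise ValueError("unknown amalgamation rule: %r" % rule)

def build_tree(dist, rule="UPGMA"):
    """Merge items 0..n-1 step by step; return (left, right, height) tuples.

    Each merged node receives the next free id (n, n+1, ...).
    """
    n = len(dist)
    size = {i: 1 for i in range(n)}
    d = {frozenset((i, j)): dist[i][j]
         for i in range(n) for j in range(i + 1, n)}
    merges = []
    next_id = n
    while len(size) > 1:
        pair = min(d, key=d.get)     # closest pair of active nodes
        i, j = sorted(pair)
        height = d.pop(pair)
        for k in list(size):
            if k in (i, j):
                continue
            d_ki = d.pop(frozenset((k, i)))
            d_kj = d.pop(frozenset((k, j)))
            d[frozenset((k, next_id))] = merge_distance(
                rule, d_ki, d_kj, size[i], size[j])
        size[next_id] = size.pop(i) + size.pop(j)
        merges.append((i, j, height))
        next_id += 1
    return merges

# Hypothetical distance matrix for four genes.
dist = [[0, 2, 6, 10],
        [2, 0, 5, 9],
        [6, 5, 0, 4],
        [10, 9, 4, 0]]
print(build_tree(dist, "UPGMA"))
```

    On this example all rules merge the same pairs, but the height of the final merge differs: 5 for nearest neighbor, 10 for furthest neighbor and 7.5 for the two averaging rules.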

    3.8.5. The "Data subset" list contains subsets of genes for building the tree:

    • all genes - all genes from the current table.

    3.8.6. The "Create expr. image" checkbox. When checked, the diagram of the expression matrix appears once the computation is finished.

    3.8.7. The "Make tree for fields" checkbox. When checked, the tree of similarity between the experiments' values in the expression matrix is also computed, and the order of experiments in the expression diagram is determined by this tree of fields.

    3.8.8. Options for saving the data in a file. N/A in the current version.

    3.8.9. Buttons for operating the dialog window:

    • "Cancel" - quit without completing an operation.
    • "OK" - complete the tree building operation and call the dialog window for visualization of the tree as a diagram (operating the dialog window is described in 3.10).

    3.9. Building the tree of fields

    The "Analysis>Clustering>Build tree for fields" command calls the "Tree calculation setup" dialog window, which allows setting the parameters for building the tree of fields by the current table.

    3.9.1. The "Fields..." button calls the dialog window for selecting fields (see 3.3). The text field shows the number of selected fields.

    3.9.2. The "Correlation type" list contains three types of correlation coefficients:

    • Pearson r - Pearson's correlation coefficient.
    • Spearman r - Spearman's correlation coefficient.
    • Kendall tau - Kendall's correlation coefficient.

    3.9.3. The "Distance type" list contains 3 types of distance measures, which are computed on the basis of correlation coefficients Rij:

    • 1-Rij
    • 1+Rij
    • 1-|Rij|

    3.9.4. The "Amalgamation rule" list contains 4 types of tree nodes combination:

    • UPGMA
    • Nearest neighbor
    • Furthest neighbor
    • WPGMA

    3.9.5. Options for saving the data in a file. N/A in the current version.

    3.9.6. Buttons for operating the dialog window:

    • "Cancel" - quit without completing an operation.
    • "OK" - complete the tree building operation and call the dialog window for visualization of the tree as a diagram (operating the dialog window is described in 3.10).

    3.10. Dialog window for visualization of the tree as a diagram

    The "Tree Diagram" dialog window appears when the tree building command is completed (in the "Tree calculation setup" dialog window) and displays the tree as a diagram.

    Commands of the main menu:

    • "File" - N/A in the current version.
    • "Edit" - provides the set of commands for setting the modes of operating the tree diagram.
      • Mark - marking operations:
        • Mark nodes - calls the "find node setup" dialog window, which allows setting the parameters for marking nodes. N/A in the current version.
        • Unmark all - unmark all nodes.
      • Mouse mode - selecting the modes of mouse click events:
        • Mark nodes - allows marking certain nodes on click
        • Swap child nodes - allows swapping child nodes on click
        • Marking all descent nodes - allows marking all descendant nodes on click
      • Options - calls the "tree options" dialog window.
    • "Marked leaves…" - N/A in the current version.
    • "Image" - provides the set of commands for setting the modes of operating the expression matrix's diagram.
      • "View expr. image" - if checked, a diagram visualizing the expression matrix is added.
      • "Image setup" - calls the "pattern setup" dialog window, which allows setting up the visualization of expression matrix.
      • "Column data subset" - N/A in the current version.
    • "Scalable Tree" - contains the single command:
      • "Open Tree" - calls the "Scalable Tree Diagram" dialog window (see 3.10.1), which allows scaling the built tree.

    Functions of the toolbar's buttons:

    • "Nodes marking" - allows marking certain nodes on mouse click. Functions similarly to the "Edit>Mouse mode>Mark nodes" command of the main menu.
    • "Child nodes swapping" - allows swapping child nodes on mouse click. Functions similarly to the "Edit>Mouse mode>Swap child nodes" command of the main menu.
    • "Marking of all descent nodes" - allows marking descendant nodes on mouse click. Functions similarly to the "Edit>Mouse mode>Marking all descent nodes" command of the main menu.
    • "Unselect all" - allows unmarking all nodes. Functions similarly to the "Edit>Mark>Unmark all" command of the main menu.
    • "Options" - calls the "tree options" dialog window. Functions similarly to the "Edit>Options" command of the main menu.
    • "Image creation" - adds a diagram visualizing the expression matrix (N/A for tree of fields).
    • "Open scalable tree" - calls the "Scalable Tree Diagram" dialog window, which allows scaling the built tree.

    3.10.1. The "find node setup" dialog window can be called using the "Edit>Mark>Mark nodes" command and allows setting the parameters for marking nodes. N/A in the current version.

    3.10.1.1. The "Mark by leaf name" switch. If selected, nodes are marked in accordance with the name of a leaf, defined in the text field.

    3.10.1.2. The "Mark by distance range" switch. If selected, nodes are marked in accordance with the defined range of distances.

    Buttons for operating the dialog window:

    • "Cancel" - quit without defining the parameters.
    • "OK" - quit and apply the defined parameters.

    3.10.2. The "pattern setup" dialog window can be called using the "Image> Image setup" command and allows setting up the visualization of expression matrix.

    3.10.2.1. The "Palette type" list contains the following color palettes:

    • 3.10.2.1.1. "White(min)->Colormax" - color of cells with ascending values changes from white to the most saturated one, defined on the bar 3.10.2.4.
    • 3.10.2.1.2. "Black(min)->Colormax" - color of cells with ascending values changes from black to the most saturated one, defined on the bar 3.10.2.4.
    • 3.10.2.1.3. "Colormin->White->Colormax" - color of cells with ascending values changes from the less saturated color, defined on the bar 3.10.2.5, to the most saturated one, defined on the bar 3.10.2.4. The color of cells with intermediate values is white.
    • 3.10.2.1.4. "Colormin->Black->Colormax" - color of cells with ascending values changes from the less saturated color, defined on the bar 3.10.2.5, to the most saturated one, defined on the bar 3.10.2.4. The color of cells with intermediate values is black.
    • 3.10.2.1.5. "Geographic map colors" - palette of geographic maps.
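    The two-color and three-color palettes can be pictured as interpolation between the boundary colors. The sketch below assumes linear interpolation in RGB space and a midpoint exactly halfway between the minimal and maximal values; the manual states neither, and the function names are hypothetical. The "Geographic map colors" palette is not covered.

```python
WHITE = (255, 255, 255)
BLACK = (0, 0, 0)

def lerp_color(c0, c1, t):
    """Linear blend of two RGB colors, t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))

def cell_color(value, vmin, vmax, palette, color_max, color_min=None):
    """RGB color of a cell for the given palette type."""
    t = 0.0 if vmax == vmin else (value - vmin) / (vmax - vmin)
    t = min(1.0, max(0.0, t))  # clamp values outside the range
    if palette == "White(min)->Colormax":
        return lerp_color(WHITE, color_max, t)
    if palette == "Black(min)->Colormax":
        return lerp_color(BLACK, color_max, t)
    if palette in ("Colormin->White->Colormax", "Colormin->Black->Colormax"):
        middle = WHITE if "White" in palette else BLACK
        if t < 0.5:
            return lerp_color(color_min, middle, t * 2)
        return lerp_color(middle, color_max, (t - 0.5) * 2)
    raise ValueError("unsupported palette: %r" % palette)

RED, BLUE = (255, 0, 0), (0, 0, 255)
print(cell_color(10, 0, 10, "White(min)->Colormax", RED))
print(cell_color(5, 0, 10, "Colormin->Black->Colormax", RED, BLUE))
```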

    3.10.2.2. The "Range type" list. N/A in the current version.

    3.10.2.3. The "Field specific ranges" checkbox. N/A in the current version.

    3.10.2.4. The "Max. value color" bar allows setting the color for the maximal value. Available for the palettes 3.10.2.1.1, 3.10.2.1.2, 3.10.2.1.3 and 3.10.2.1.4. The "Color" button calls the "Color chooser dialog" dialog window, which allows setting the required color. The selected color is displayed on the "Preview" bar.

    3.10.2.5. The "Min. value color" bar allows setting the color for the minimal value. Available for the palettes 3.10.2.1.3 and 3.10.2.1.4. The "Color" button calls the "Color chooser dialog" dialog window, which allows setting the required color. The selected color is displayed on the "Preview" bar.

    3.10.2.6. The "Dataset max & min" field allows visualizing the maximal and minimal values of the expression matrix's cells:

    • 3.10.2.6.1. The "Max. value" field displays the maximal value.
    • 3.10.2.6.2. The "Min. value" field displays the minimal value.

    3.10.2.7. The "User defined values" field. N/A in the current version.

    3.10.2.8. The "Number of intervals" list - allows setting the number of intervals for computing the scale of palettes' colors.
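    One way to picture the number of intervals is as a quantization of the value range into equal-width bins, each drawn with one palette color. This is an assumption made for illustration (the manual does not say whether the intervals are of equal width), and the function name is hypothetical.

```python
def interval_index(value, vmin, vmax, n_intervals):
    """Index (0-based) of the equal-width interval that contains value."""
    if vmax == vmin:
        return 0
    idx = int((value - vmin) / (vmax - vmin) * n_intervals)
    # Clamp so that vmax (and out-of-range values) map to a valid interval.
    return min(max(idx, 0), n_intervals - 1)

# With 5 intervals over the range 0..10, each interval is 2 units wide.
print([interval_index(v, 0, 10, 5) for v in (0, 4.9, 7.5, 10)])
```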

    Buttons for operating the dialog window:

    • "Cancel" - quit without applying the settings.
    • "OK" - quit with applying the settings.

    The status line displays information on the cell of the expression matrix that is currently under the mouse pointer: the number of the gene in the table, the identification of the gene, the number of the field in the table, the identification of the field and the value of the experiment.

    3.10.1. Dialog window of the scalable diagram

    The "Scalable Tree Diagram" dialog window can be called in one of the following ways:

    • The "Scalable Tree>Open Tree" command of the "Tree Diagram" window's main menu.
    • The button on the "Tree Diagram" window's toolbar.

    Commands of the main menu:

    • "File" - provides the set of commands for operating files:
      • "Save GIF data" - N/A in the current version.
      • "Close" - close the dialog window.
    • "Edit" - provides the set of commands for scaling an image:
      • "Zoom in vertical" - enlarge the image vertically
      • "Zoom in horizontal" - enlarge the image horizontally
      • "Zoom out vertical" - shrink the image vertically
      • "Zoom out horizontal" - shrink the image horizontally
      • "Zoom 100%" - restore the original size of the image
      • "Zoom min" - set the image to the minimal scale
      • "Zoom in" - enlarge the selected area
      • "Zoom out" - shrink the selected area
      • "Go to" - select a node to be visualized on the main diagram
    • "Image" - provides the set of commands for setting up the modes of operating the expression matrix's diagram:
      • "View expr. image" - when checked, the diagram visualizing the expression matrix is added.
      • "Image setup" - calls the "pattern setup" dialog window for setting up the visualization of the expression matrix (see description in 3.10.2).

    Functions of the toolbar's buttons:

    • "Zoom in vertical" - enlarge the image vertically
    • "Zoom in horizontal" - enlarge the image horizontally
    • "Zoom out vertical" - shrink the image vertically
    • "Zoom out horizontal" - shrink the image horizontally
    • "Zoom 100%" - restore the original size of the image
    • "Zoom min" - set the image to the minimal scale
    • "Zoom in" - enlarge the selected area
    • "Zoom out" - shrink the selected area
    • "Go to" - select a node to be visualized on the main diagram
    • "Image creation" - add the diagram of the expression matrix's visualization (N/A for trees of fields).

    3.11. Clustering genes

    The "Analysis>Clustering>Find genes clusters" command calls the "setup for clustering procedure" dialog window, which allows setting the parameters for clustering genes.

    Setting the parameters of clustering.

    3.11.1. The "Fields..." button calls the dialog window for selecting fields (see 3.3). The text field shows the number of selected fields.

    3.11.2. The "Distance type" list contains 3 types of distance measures, which are computed on the basis of correlation coefficients Rij:

    • 1-Rij
    • 1+Rij
    • 1-|Rij|

    3.11.3. The "Correlation type" list contains three types of correlation coefficients:

    • Pearson r - Pearson's correlation coefficient.
    • Spearman r - Spearman's correlation coefficient.
    • Kendall tau - Kendall's correlation coefficient.

    3.11.4. The "Distance threshold" field defines the threshold of clustering: genes are combined into a single cluster if the distance between them is less than the defined threshold.
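    The threshold rule can be sketched as follows. The manual does not specify how pairs of genes are chained into clusters, so this illustration assumes the simplest reading: any two genes closer than the threshold end up in the same cluster, directly or through intermediate genes (single-linkage components). The function name is hypothetical.

```python
def threshold_clusters(dist, threshold):
    """Group items whose pairwise distance is below the threshold.

    dist is a symmetric matrix; returns a sorted list of clusters,
    each a sorted list of item indices.
    """
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path compression.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] < threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical distances between four genes.
dist = [[0.0, 0.1, 0.9, 0.8],
        [0.1, 0.0, 0.85, 0.9],
        [0.9, 0.85, 0.0, 0.2],
        [0.8, 0.9, 0.2, 0.0]]
print(threshold_clusters(dist, 0.3))
```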

    Clustering occurs on pressing the "Clustering" button. The results of clustering are displayed in the right part of the dialog window.

    Operating the results of clustering.

    3.11.5. The "Cluster #, size, score" list contains the obtained clusters. Each cluster is represented by its number, size and score.

    3.11.6. The "Gene NAME, cluster index, gene score" list contains the genes that belong to the clusters selected in the list 3.11.5. Each gene is represented by its identification and score.

    3.11.7. The "Sort clusters by:" list contains the parameters of clusters' sorting:

    • ascending index - sort by ascending index (cluster's number)
    • r-score - sort by score
    • size - sort by size

    3.11.8. The "Find gene by name" field allows searching for a gene in the list 3.11.6 by its identification. Enter the identification (name) of the gene and press the "Find" button. If the gene is present in the list, it becomes selected; if not, the message "Not Clustered!" appears.

    Setting the parameters of the window quitting.

    3.11.9. The "Build tree for selected genes" checkbox. When checked, on exiting the dialog window (on pressing the "OK, Exit" button) the tree is built for the genes that belong to the clusters selected in the list 3.11.5.

    3.11.10. The "Save selected data" checkbox, the "File name" field and the "Browse" button are N/A in the current version.

    3.11.11. The "Add cluster info for current data" switch (radio button). When selected, on exiting the dialog window (on pressing the "OK, Exit" button) the results of clustering are added to 4 new fields with the following names:

    • ~TXT_n - the number of cluster for each gene
    • ~TXT_size - the size of cluster for each gene
    • ~TXT_Raver - the score of cluster for each gene
    • ~TXT_Rcard - the score of gene

    Here TXT is the text string defined in the "Fields name" field.

    By default, every time the dialog window is started, TXT is assigned the "Cl#" value, where # is the order number of the current launch of the window. If fields with such names already exist, they are updated; if not, they are created.
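    The naming scheme and the update-or-create behavior can be illustrated with a small sketch. The function names and the representation of the table as a dictionary of columns are hypothetical; only the field name pattern comes from the manual.

```python
def cluster_field_names(txt):
    """Names of the four result fields for the prefix TXT."""
    return ["~%s_%s" % (txt, suffix)
            for suffix in ("n", "size", "Raver", "Rcard")]

def default_prefix(launch_number):
    """Default TXT value: "Cl" followed by the launch order number."""
    return "Cl%d" % launch_number

def add_cluster_info(table, txt, columns):
    """Create the four fields, or overwrite them if they already exist."""
    for name, values in zip(cluster_field_names(txt), columns):
        table[name] = list(values)
    return table

# Hypothetical table with one expression field and two genes.
table = {"liver": [5.1, 0.3]}
add_cluster_info(table, default_prefix(1),
                 [[1, 2], [10, 4], [0.9, 0.7], [0.95, 0.6]])
print(sorted(table))
```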

    The table with the results of clustering is added to the list of saved projects under an appropriate name (*.table).

    3.11.12. The "Do not add clustering info" switch (radio button). When selected, on exiting the dialog window (on pressing the "OK, Exit" button) the results of clustering are not saved.

    3.11.13. Buttons for operating the dialog window:

    • "Cancel" - quit without completing the clustering.
    • "Clustering" - complete the clustering with defined parameters and switch to the work with the results of clustering. When the process is finished, the information bar with the number of clusters and their maximal/minimal size appears.
    • "OK, Exit" - quit the dialog window with additional options.

    3.12. Analysis of principal components

    The "Analysis>Principal component" command calls the "setup for principal component analysis" dialog window, which allows the following:

    • Setting up the parameters for decomposition of the covariance or correlation matrix into eigenvalues and eigenvectors.
    • Visualizing the profiles of eigenvalues and the coefficients of eigenvectors.
    • Visualizing the projections of genes' expression values onto the plane of different pairs of eigenvectors.

    The covariance/correlation matrix and its eigenvalues and eigenvectors are computed for the selected fields.
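    The computation can be sketched in pure Python. This is an illustration under stated assumptions, not SelTag's implementation: the manual does not name the eigenvalue solver (a cyclic Jacobi rotation method is used here because it is compact), the covariance is normalized by n-1, and all function names are hypothetical.

```python
import math

def covariance_matrix(rows):
    """Covariance between fields; rows are genes, columns are fields."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in range(m)] for a in range(m)]

def correlation_matrix(rows):
    """Covariance rescaled by the standard deviations of the fields."""
    cov = covariance_matrix(rows)
    sd = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [[cov[a][b] / (sd[a] * sd[b]) for b in range(len(cov))]
            for a in range(len(cov))]

def jacobi_eigen(sym, sweeps=50, tol=1e-24):
    """Eigen-decomposition of a symmetric matrix by Jacobi rotations.

    Returns (eigenvalue, eigenvector) pairs sorted by descending eigenvalue.
    """
    m = len(sym)
    a = [row[:] for row in sym]
    v = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(m) for j in range(m) if i != j)
        if off < tol:
            break
        for p in range(m - 1):
            for q in range(p + 1, m):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes a[p][q].
                theta = (math.pi / 4 if diff == 0
                         else 0.5 * math.atan(2 * a[p][q] / diff))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(m):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(m):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(m):  # accumulate the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    pairs = [(a[i][i], [v[k][i] for k in range(m)]) for i in range(m)]
    return sorted(pairs, key=lambda t: t[0], reverse=True)

# Hypothetical expression values: 4 genes (rows) in 3 fields (columns).
rows = [[1.0, 2.0, 0.5], [2.0, 1.0, 1.5], [3.0, 4.0, 1.0], [4.0, 3.0, 3.0]]
for eigenvalue, eigenvector in jacobi_eigen(covariance_matrix(rows)):
    print(eigenvalue, eigenvector)
```

    Choosing the correlation matrix instead of the covariance matrix gives every field unit variance, so fields with large expression ranges do not dominate the first components.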

    3.12.1. The "Fields..." button calls the dialog window for selecting fields. The textual field contains the number of selected fields.

    3.12.2. The "Matrix type" list contains the types of matrices:

    • covariation
    • correlation

    3.12.3. Buttons for operating the dialog window:

    • "Cancel" - quit without computing.
    • "Calculate" - perform the computation and switch to visualization of the results. When the computation is finished, the lists 3.12.4, 3.12.7 and 3.12.8 are filled with the order numbers of the components, from 1 to the number of fields selected for analysis.
    • "OK" - quit after the computation is finished.

    3.12.4. The "Component plots" list contains the components (eigenvectors), numbered in the order of descending eigenvalues.

    3.12.5. The "Mark All" button allows selecting all components in the list. The "Unmark all" button allows deselecting all components in the list.

    3.12.6. The bar for visualizing the sum of eigenvalues of the components selected in the "Component plots" list.

    • The "Variance" text field shows the sum of eigenvalues of the components selected in the "Component plots" list.
    • The "Variance (%total)" text field shows this sum as a percentage of the sum of all eigenvalues.
    • The "Eigenvalue plot" button calls the "Graph" dialog window, which visualizes the values of the selected components.
    • The "Loadings plot" button calls the "Graph" dialog window, which visualizes the values of groups for each component.
    • The "Save results" button. N/A in the current version.
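    The two variance fields reduce to simple arithmetic on the eigenvalues. A minimal sketch, with a hypothetical function name and hypothetical eigenvalues:

```python
def variance_summary(eigenvalues, selected):
    """Sum of the selected eigenvalues and its share of the total, in %."""
    total = sum(eigenvalues)
    part = sum(eigenvalues[i] for i in selected)
    return part, 100.0 * part / total

# Hypothetical eigenvalues in descending order; components 1 and 2 selected.
eigenvalues = [3.0, 0.75, 0.25]
print(variance_summary(eigenvalues, [0, 1]))
```

    A high percentage for the first few components suggests that they capture most of the variation in the selected fields.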

    Building the projections.

    3.12.7. "ComponentX" - contains the components (eigenvectors), numbered in the order of descending eigenvalues, and allows selecting the component for the X-axis.

    3.12.8. "ComponentY" - contains the components (eigenvectors), numbered in the order of descending eigenvalues, and allows selecting the component for the Y-axis.

    The "Draw" button allows building the diagram.
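    A projection onto a pair of components can be sketched as two dot products per gene. The function name is hypothetical, and centering each field on its mean before projecting is an assumption; the manual does not describe this step.

```python
import math

def project(rows, vec_x, vec_y, center=True):
    """Project each gene (row) onto two eigenvectors; returns (x, y) points."""
    m = len(rows[0])
    if center:
        # Assumption: each field is centered on its mean first.
        means = [sum(r[j] for r in rows) / len(rows) for j in range(m)]
    else:
        means = [0.0] * m
    points = []
    for r in rows:
        x = sum((r[j] - means[j]) * vec_x[j] for j in range(m))
        y = sum((r[j] - means[j]) * vec_y[j] for j in range(m))
        points.append((x, y))
    return points

# Hypothetical data: two genes in two fields, and two orthonormal eigenvectors.
h = 1 / math.sqrt(2)
rows = [[1.0, 1.0], [3.0, 3.0]]
print(project(rows, (h, h), (h, -h)))
```

    Both genes vary only along the first eigenvector here, so their Y coordinates are zero.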

    3.13. The dialog window of profiles

    The dialog window of profiles visualizes the profiles of genes' expression and is composed of three parts: the plot, the list of genes and the list of fields' groups.

    3.13.1. The plot

    • The button sets the mode of data visualization as a histogram.
    • The button sets the mode of data visualization as a plot with data markers.
    • The button sets the mode of data visualization as a plot.
    • The button sets the mode of data visualization as a dot diagram.
    • The button turns on/off horizontal grid lines in the plot area.
    • The button turns on/off vertical grid lines in the plot area.
    • The button enables the pseudo 3D-plot mode (for histogram only).
    • The button calls the dialog window for setting up the options.

    3.13.2. The information bar

    The information bar is located below the diagram area. Information becomes available on placing the mouse pointer over an element of a plot.

    3.13.3. Modes of experiments' visualization

    There are two available modes of experiments' visualization:

    • Single
    • Multiple

    Switching between these modes can be performed in the list of genes.

    The "single" mode enables visualizing the profile for a single gene. The color of plot's elements corresponds to that of the group, to which experiments belong. Above the plot the maximal and minimal values for all groups of experiments are displayed. In the upper part of the plot there is a line, which represents the layout of the groups of experiments. The color of the line corresponds to that of the group below the line. Below the groups' layout and above the plot the total profile of a gene for all experiments is displayed. The Y-axis represents the expression value, the X-axis - the brief names of experiments.

    The "multiple" mode enables visualizing the profiles for one or more genes. The color of plot's elements corresponds to that of the gene. Above the plot the maximal and minimal values for all groups of experiments are displayed. In the upper part of the plot there is a line, which represents the layout of the groups of experiments. The color of the line corresponds to that of the group below the line. Below the groups' layout and above the plot the total profile of all genes from the list is displayed.

    3.13.4. The list of genes

    In the upper right part of the window there is the list, which contains the names of genes.

    Each button allows switching between the "single" and "multiple" modes. It also allows turning on/off the visualization of the corresponding profile. This option is associated with the indicator in the left part of each button.

    If one of the buttons is pressed, the "single" mode is enabled; otherwise, the "multiple" one. The black indicator means the profile is visualized, the white one means it is not.

    In order to change the color of a profile, right-click the corresponding button. In the popup menu that appears select the "Set color" command and then set the required color in the "Color dialog" window.

    Above the list of genes there are the buttons for quickly selecting and unselecting the genes.

    The "C" button calls the dialog window with the list of the results of clustering and its color layout. Genes that do not belong to any cluster are colored in black. When a cluster is removed from visualization, all genes in this cluster automatically cease to be visualized as well.

    The drop down list above the list of genes contains the identifications of the worktables.

    3.13.5. The list of groups of experiments

    In the lower right part of the dialog window there is the list, which contains the names of the groups of experiments. The buttons (color of the button corresponds to that of the group) allow operating each group separately:

    • - turn off visualizing the group.
    • - display the minimal and maximal values of the group only.
    • - display the elements, selected in the list of this group. The list can be called using the popup menu of the group (see below).
    • - display all elements of the group.

    If one of the buttons in the list is pressed, the "single" mode is enabled; otherwise, the "multiple" one.

    Above the list of groups there are the buttons for quickly switching the mode of visualization of all groups at once: turning off visualizing, displaying the maximums and minimums of the groups only, displaying the experiments selected in the lists of groups, and displaying all elements of all groups.

    Right-clicking a group opens the popup menu:

    • The "Set color" command calls the "Color dialog" window, which allows setting the color of a group.
    • The "Select elements" command calls the window with the list of the group's experiments. To select experiments in the list for visualization, click the indicator to the left of the name of the experiment; the indicator shows whether the experiment is displayed or not. Above the list there are the buttons for quickly selecting and unselecting the experiments.