SelTag is a tool developed for the analysis of gene expression data. It allows:
The main window displays gene expression data as a table. The main menu commands are located in the upper part of the window.
Expression data for a set of genes are shown in a table consisting of rows (genes) and columns (fields). Each row of the table corresponds to the set of data on a single gene. The sets of data are the same for all genes (rows). As a rule, columns correspond to measures of gene expression under various conditions (tissues, organs, cell cultures, etc.). In addition, some columns may contain additional information (both numerical and textual) related to the genes (e.g. the gene's name).
In general, the table's columns (fields) may be of four main types:
IVALUE - the value is a whole number (integer).
FVALUE - the value is a floating-point number.
WORD - the value is text without blanks (a single word).
STRING - the value is text with blanks (a phrase).
A field is completely defined by its type and its name.
The input file must have the following basic format:
; May contain comment in any line of the file
NAME<tab>WORD
GENEID<tab>IVALUE
TISSUECANCER0<tab>FVALUE
TISSUECANCER1<tab>FVALUE
TISSUENORMAL0<tab>FVALUE
TISSUENORMAL1<tab>FVALUE
TISSUENORMAL2<tab>FVALUE
DATA
GENE04675<tab>402<tab>6.00<tab>5.60<tab>5.97<tab>6.00<tab>6.00
GENE46890<tab>794<tab>2.77<tab>3.22<tab>5.65<tab>5.68<tab>5.68
GENE23794<tab>404<tab>5.97<tab>5.97<tab>6.00<tab>5.60<tab>5.97
In this example, <tab> denotes the tab character (the 'Tab' key).
The first lines (above the "DATA" line) describe the data format. In this part of the file, each line describes one field: the field's name and its basic type.
The lines below the "DATA" line contain the expression data for each gene. Each line corresponds to a separate gene. Field values are separated by a tab character. Two consecutive tab characters denote an empty field.
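The layout described above can be read with a few lines of code. The following is a minimal sketch in Python, not SelTag's own loader. The function name parse_seltag_text, the use of None for an empty field, and the treatment of comments as whole lines starting with ";" are assumptions made for illustration.

```python
def parse_seltag_text(text):
    """Parse the basic format: field descriptions, a DATA line, then gene rows."""
    converters = {"IVALUE": int, "FVALUE": float, "WORD": str, "STRING": str}
    fields = []      # (name, type) pairs in declaration order
    rows = []        # one dict per gene
    in_data = False
    for line in text.splitlines():
        # Assumption: a comment is a whole line that starts with ';'.
        if line == "" or line.startswith(";"):
            continue
        if not in_data:
            if line.strip() == "DATA":
                in_data = True
                continue
            parts = line.split("\t")
            name, ftype = parts[0].strip(), parts[1].strip()
            if ftype not in converters:
                raise ValueError("unknown field type: " + ftype)
            fields.append((name, ftype))
        else:
            cells = line.split("\t")
            # Pad short rows so that every declared field gets a value.
            cells += [""] * (len(fields) - len(cells))
            row = {}
            for (name, ftype), cell in zip(fields, cells):
                # A double tab produces an empty cell, i.e. an empty field.
                row[name] = converters[ftype](cell) if cell != "" else None
            rows.append(row)
    return fields, rows


sample = (
    "; May contain comment in any line of the file\n"
    "NAME\tWORD\n"
    "GENEID\tIVALUE\n"
    "TISSUECANCER0\tFVALUE\n"
    "TISSUENORMAL0\tFVALUE\n"
    "DATA\n"
    "GENE04675\t402\t6.00\t5.97\n"
    "GENE46890\t794\t\t5.65\n"
)
fields, rows = parse_seltag_text(sample)
```

In the second data row of the sample the TISSUECANCER0 value is missing (two consecutive tabs), so the sketch stores None for it.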
Fields of the same type can be combined into a group of fields. A group is defined by a name and the list of its fields. In general, such a combination may reflect the functional significance of these fields. For example, fields representing the levels of expression in tumors can be combined into the "Cancer tissues" group. The same field may be included in several groups, i.e. the sets of fields in groups may overlap.
The description of the format may include a description of a group of fields. It begins with the "#GROUP" line, which defines the name of the group. The lines that follow list the fields included in this group. The description of a group must end with the "#ENDGROUP" line.
An example of the data format with defined groups of fields:
; May contain comment in any line of the file
NAME<tab>WORD
GENEID<tab>IVALUE
TISSUECANCER0<tab>FVALUE
TISSUECANCER1<tab>FVALUE
TISSUENORMAL0<tab>FVALUE
TISSUENORMAL1<tab>FVALUE
TISSUENORMAL2<tab>FVALUE
#GROUP<tab>Cancer tissues
TISSUECANCER0
TISSUECANCER1
#ENDGROUP
#GROUP<tab>Arbitrary group
TISSUECANCER1
TISSUECANCER2
TISSUENORMAL0
TISSUENORMAL1
#ENDGROUP
DATA
GENE04675<tab>402<tab>6.00<tab>5.60<tab>5.97<tab>6.00<tab>6.00
GENE46890<tab>794<tab>2.77<tab>3.22<tab>5.65<tab>5.68<tab>5.68
GENE23794<tab>404<tab>5.97<tab>5.97<tab>6.00<tab>5.60<tab>5.97
In this data format two groups are defined: "Cancer tissues" (includes the TISSUECANCER0 and TISSUECANCER1 fields) and "Arbitrary group" (includes the TISSUECANCER1, TISSUECANCER2, TISSUENORMAL0 and TISSUENORMAL1 fields).
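The group syntax can be illustrated with a small reader for the format description. This is a sketch under assumptions, not SelTag's implementation: the function names parse_header and undeclared_members are hypothetical, and the check for group members that are not declared as fields is an illustrative addition.

```python
def parse_header(lines):
    """Read field and group descriptions up to the DATA line."""
    fields = {}     # field name -> type
    groups = {}     # group name -> list of field names
    current = None  # name of the group being read, if any
    for line in lines:
        if line == "" or line.startswith(";"):
            continue  # assumption: comments are whole lines starting with ';'
        if line == "DATA":
            break
        parts = line.split("\t")
        if parts[0] == "#GROUP":
            current = parts[1]
            groups[current] = []
        elif parts[0] == "#ENDGROUP":
            current = None
        elif current is not None:
            groups[current].append(parts[0])
        else:
            fields[parts[0]] = parts[1]
    return fields, groups


def undeclared_members(fields, groups):
    """Return, per group, the members that are not declared as fields."""
    report = {}
    for name, members in groups.items():
        missing = [m for m in members if m not in fields]
        if missing:
            report[name] = missing
    return report


header = [
    "; May contain comment in any line of the file",
    "NAME\tWORD",
    "GENEID\tIVALUE",
    "TISSUECANCER0\tFVALUE",
    "TISSUECANCER1\tFVALUE",
    "TISSUENORMAL0\tFVALUE",
    "TISSUENORMAL1\tFVALUE",
    "TISSUENORMAL2\tFVALUE",
    "#GROUP\tCancer tissues",
    "TISSUECANCER0",
    "TISSUECANCER1",
    "#ENDGROUP",
    "#GROUP\tArbitrary group",
    "TISSUECANCER1",
    "TISSUECANCER2",
    "TISSUENORMAL0",
    "TISSUENORMAL1",
    "#ENDGROUP",
    "DATA",
]
fields, groups = parse_header(header)
problems = undeclared_members(fields, groups)
```

Run on the header of the example above, the check reports TISSUECANCER2 in "Arbitrary group", because that field is listed in the group but is not declared in the field list. How SelTag itself treats such a member is not stated here.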
Data can also be loaded into SelTag from files with a simpler table format, or retrieved from a set of tables that have a similar set and order of columns.
The "Gene Description" type file consists of several successive descriptions. Each description takes up one or more lines.
The first line has the following form:
">" Id "|" Short_description
where Id is an identifier used to link the description to the data and Short_description is a brief, single-line description.
The first line can be followed by several URL_description lines and then by several Long_description lines.
URL_description is a line of the following form:
"URL:" Menu_text "|" URL
Menu_text is a text description of the URL.
URL is an ordinary URL, i.e. a string of the form "http://.../...".
Long_description is one or more lines containing a long (multiline) description.
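The record structure described above can be sketched as a small parser. This is an illustration only. The function name parse_gene_descriptions, the dictionary keys, the stripping of blanks around the "|" separator, and the sample identifiers and URL are all assumptions that do not come from SelTag.

```python
def parse_gene_descriptions(text):
    """Split a "Gene Description" text into records."""
    records = []
    current = None
    for line in text.splitlines():
        if line.startswith(">"):
            # First line of a description: ">" Id "|" Short_description
            gene_id, _, short = line[1:].partition("|")
            current = {"id": gene_id.strip(), "short": short.strip(),
                       "urls": [], "long": []}
            records.append(current)
        elif current is None:
            continue  # assumption: text before the first ">" line is ignored
        elif line.startswith("URL:") and not current["long"]:
            # URL lines come before the long description.
            menu_text, _, url = line[4:].partition("|")
            current["urls"].append((menu_text.strip(), url.strip()))
        else:
            current["long"].append(line)
    return records


sample = (
    ">GENE04675|Hypothetical short description\n"
    "URL:Home page|http://example.org/GENE04675\n"
    "First line of the long description.\n"
    "Second line of the long description.\n"
    ">GENE46890|Another hypothetical entry\n"
)
records = parse_gene_descriptions(sample)
```

A line that starts with "URL:" after the long description has begun is kept as part of the long description, which follows the stated order of the lines.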
The "make selection" dialog window is opened with the "Select>Select genes by query…" command and allows selecting genes from the worktable by a logical and/or mathematical expression. This window contains the following elements:
3.1.1. Expression line. An editable text field in the upper part of the window. The "Clear Expr." button clears the content of this line.
3.1.2. Status (comments) line. A text field below the Expression line that displays additional information.
3.1.3. "Field" bar (field index entry bar). It contains the drop-down list of fields, which is opened by pressing the "Select" button. Fields are listed in the following format:
Fn NAME, where n is the field's order number and NAME is the field's name.
When a field is selected, the following information is displayed in the status line:
$Fn: type=FIELDTYPE. Is item of group(s):x, where n is the field's order number, FIELDTYPE is the name of the field's data type, and x is the order number(s) of the group(s) that include the field (if any).
The "Insert" button adds the current value of $Fn to the expression, where n is the field's order number.
3.1.4. The bar for selecting the values of text fields contains:
3.1.5. "Group" bar (group order number entry bar) contains:
3.1.6. "Card No." bar. This bar allows entering the specifier of the order number of a gene in the table. The "Insert" button adds a $N value to the expression.
3.1.7. Buttons for operating the dialog window:
3.1.8. Query entry buttons add various tokens to the expression: operations, numbers, and mathematical functions.
3.1.9. "Expr. Score options" bar contains:
When a query is finished, an additional item appears in the "Select" menu of the main window. Its name corresponds to the selected set of genes (query number + expression). While working, the user can save the retrieved sets of genes or switch between them. To remove the list of tables, use the "Remove all selections" option of the menu.
Groups of fields are managed with the commands of the "Group" menu.
The field selection dialog window is opened by pressing the "Fields…" button in the "Select most correlated genes for specified gene set", "correlation analysis setup", "Tree calculation setup", "setup for clustering procedure", and "setup for principal component analysis" dialog windows, or with the "Graph>specified with current selection" command (when no groups are present). It allows selecting fields from the main table.
The list becomes active when the "Field types filtering" checkbox is checked. The list displays the names of the data types: WORD, IVALUE, STRING, and FVALUE. The types selected in the list define a filter for the fields and groups of a project.
If the "Field types filtering" checkbox is not checked, the list remains inactive, and the list of groups and the list of fields show groups and fields of all data types.
The list becomes active when the "Field groups filtering" checkbox is checked. The list contains the names of groups. The groups selected in the list define a filter for fields (experiments).
If the "Field types filtering" checkbox is checked, list 3.3.2 contains the groups whose types correspond to those selected in list 3.3.1. If the "Field types filtering" checkbox is not checked, list 3.3.2 contains all groups.
This list is located on the "Fields" bar. It contains the fields that satisfy the restrictions set in lists 3.3.1 and 3.3.2. If no restrictions are set, the list contains all fields.
Buttons for operating the list:
The "Analysis>Correlations>Select most correlated genes" command opens the "Select most correlated genes for specified gene set" dialog window, which allows setting the query parameters.
3.4.1. The list of genes available for selection is located below the "Gene list to select from" title. It contains the genes from the current table. Above the list there is a text field that allows selecting lines by mask. In the mask, the "%" symbol substitutes a single symbol and the "*" symbol substitutes multiple symbols.
3.4.2. The list of selected genes is located below the "Specified genes (query set to compare with)" title. It contains the genes selected from list 3.4.1. Above the list there is a text field that allows selecting lines by mask. In the mask, the "%" symbol substitutes a single symbol and the "*" symbol substitutes multiple symbols.
Genes from list 3.4.1 can be selected to form a subset of genes (set#2), which consists of one or more genes. The genes of set#2 are displayed in list 3.4.2. The genes remaining in list 3.4.1 form set#1. Sets #1 and #2 have no genes in common.
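The mask syntax used in the text fields above lists 3.4.1 and 3.4.2 can be illustrated by translating a mask into a regular expression. This is a sketch, not SelTag's matching code. The function names are hypothetical, and it is assumed that the mask must match the whole gene name, that matching is case-sensitive, and that "*" may also match zero symbols.

```python
import re


def mask_to_regex(mask):
    """Translate a mask into a compiled regular expression."""
    parts = []
    for ch in mask:
        if ch == "%":
            parts.append(".")     # "%" stands for exactly one symbol
        elif ch == "*":
            parts.append(".*")    # "*" stands for multiple symbols (assumed: zero or more)
        else:
            parts.append(re.escape(ch))  # every other symbol matches itself
    return re.compile("".join(parts))


def select_by_mask(names, mask):
    """Return the names that match the mask as a whole."""
    pattern = mask_to_regex(mask)
    return [name for name in names if pattern.fullmatch(name)]


names = ["GENE04675", "GENE46890", "GENE23794"]
starts_with_zero = select_by_mask(names, "GENE0*")
nine_then_one = select_by_mask(names, "*9%")
five_symbols = select_by_mask(names, "GENE%%%%%")
```

With the gene names from the data example, "GENE0*" selects GENE04675 only, "*9%" selects the two names whose next-to-last symbol is 9, and "GENE%%%%%" selects all three names.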
3.4.3. Buttons for operating the lists:
"Add->" - move the genes selected in list 1 to list 2.
"All->" - move all genes from list 1 to list 2.
"<-Remove" - move the genes selected in list 2 to list 1.
"<-All" - move all genes from list 2 to list 1.
"File load->" - load list 2 from a file. N/A in the current version.
The program selects from the genes of set#1 those whose expression profiles are most correlated with the profiles of the genes from set#2. When the process is finished, the expression profiles of the selected genes are built and the correlation coefficients are placed in a separate table.
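The selection step can be sketched as follows. The sketch rests on several assumptions: it uses the Pearson coefficient as the correlation type, scores a candidate gene by the mean of its coefficients with all genes of set#2, and keeps candidates whose score is not below a threshold. The function names and the profiles are hypothetical, and SelTag's actual correlation types, threshold types and modes for multiple query genes may differ.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a constant profile is treated as uncorrelated
    return sxy / sqrt(sxx * syy)


def most_correlated(profiles, query_names, threshold):
    """Rank the genes outside the query set by their mean correlation with it."""
    scored = []
    for gene, profile in profiles.items():
        if gene in query_names:
            continue  # set#1 and set#2 have no genes in common
        coefficients = [pearson(profile, profiles[q]) for q in query_names]
        score = sum(coefficients) / len(coefficients)
        if score >= threshold:
            scored.append((gene, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical expression profiles over four experiments.
profiles = {
    "A": [1, 2, 3, 4],
    "B": [2, 4, 6, 8],
    "C": [4, 3, 2, 1],
    "D": [1, 3, 2, 4],
}
result = most_correlated(profiles, ["A"], 0.5)
```

Here gene "A" plays the role of set#2. Genes "B" and "D" pass the threshold, while "C", whose profile is anti-correlated with "A", does not.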
3.4.4. If the "Initial gene set" checkbox is checked, list 1 is filled with the genes from the main table.
Setting the computation parameters
3.4.5. The "Fields..." button opens the dialog window for selecting fields (3.3). The text field shows the number of selected fields.
3.4.6. The "Correlation type" list contains 3 types of correlation coefficients:
3.4.7. The "Threshold type" list provides 4 types of restrictions on selecting genes:
3.4.8. The "value" field allows setting the correlation threshold value (see 3.4.7).
3.4.9. The "Regime to treat multiple genes for query set" list contains three modes of computing the correlation:
3.4.10. Options for saving the data in a file. N/A in the current version.
3.4.11. History of list 2. N/A in the current version.
3.4.12. Data output parameters.
3.4.13. Buttons for operating the dialog window:
The "correlation analysis setup" dialog window allows setting the parameters for computing the matrix of correlations between two sets of genes. It is opened with the "Analysis>Correlations>Get correlations between genes" command.
The complete set of genes is used to form two subsets: set#1, which corresponds to the rows of the correlation matrix, and set#2, which corresponds to its columns. These sets of genes may overlap. The coefficients of correlation between the expression profiles of the genes from these sets are computed. As a result, a correlation matrix of size n*m (n rows and m columns) is formed. The element of the matrix with indices "i,j" is the coefficient of correlation between the expression profiles of the gene with index "i" from set#1 and the gene with index "j" from set#2.
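The n*m matrix described above can be sketched in a few lines. The Pearson coefficient is assumed as the correlation type, and the function names and profiles are hypothetical. This is an illustration of the matrix layout, not SelTag's computation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a constant profile is treated as uncorrelated
    return sxy / sqrt(sxx * syy)


def correlation_matrix(profiles, set1, set2):
    """Rows follow set1, columns follow set2; element [i][j] is r(set1[i], set2[j])."""
    return [[pearson(profiles[a], profiles[b]) for b in set2] for a in set1]


# Hypothetical expression profiles over four experiments.
profiles = {
    "A": [1, 2, 3, 4],
    "B": [2, 4, 6, 8],
    "C": [4, 3, 2, 1],
    "D": [1, 3, 2, 4],
}
set1 = ["A", "B"]        # n = 2 rows
set2 = ["A", "C", "D"]   # m = 3 columns; gene "A" is in both sets
matrix = correlation_matrix(profiles, set1, set2)
```

Because the two sets may overlap, gene "A" appears both as a row and as a column, and the corresponding element is the correlation of the profile with itself.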
3.5.1. The "Set 1 (rows)" list contains the genes used to form set#1.
3.5.2. The "Set 2 (columns)" list contains the genes used to form set#2.
3.5.3. The "Fields…" button opens the dialog window for selecting fields (3.3). The text field shows the number of selected fields.
3.5.4. The "Correlation type" list contains three types of correlation coefficients:
3.5.5. The "Threshold type" list allows selecting the type of threshold for correlation coefficients. Elements of the matrix exceeding the selected threshold are displayed in red. The list contains 4 types of thresholds:
3.5.6. The "Value" field allows setting the correlation threshold value (see 3.5.5).
3.5.7. Buttons for operating the dialog window:
The "Analysis>Correlations>Get correlations between fields" command opens the "correlation analysis setup" dialog window, which allows setting the parameters for computing the correlation matrix between two sets of fields.
3.6.1. The "ITYPE" checkbox. When checked, the fields of the "ITYPE" data type are added to lists 3.6.3 and 3.6.4.
3.6.2. The "FTYPE" checkbox. When checked, the fields of the "FTYPE" data type are added to lists 3.6.3 and 3.6.4.
3.6.3. The "Field set 1 (rows)" list contains the fields used to form set#1. The "Mark all" button selects all fields in the list. The "Unmark all" button deselects all fields in the list.
3.6.4. The "Field set 2 (columns)" list contains the fields used to form set#2. The "Mark all" button selects all fields in the list. The "Unmark all" button deselects all fields in the list.
3.6.5. The "Correlation type" list contains three types of correlation coefficients:
3.6.6. The "Type" list allows selecting the type of threshold for correlation coefficients. Elements of the matrix exceeding the selected threshold are displayed in red. The list contains 4 types of thresholds:
3.6.7. The "Value" field allows setting the correlation threshold value (see 3.6.6).
3.6.8. Buttons for operating the dialog window:
The "Correlation matrix dialog" dialog window is opened when working with the "Select most correlated genes for specified gene set" dialog window (if checkbox 3.4.12 is checked) and with the "correlation analysis setup" dialog window.
The "File" menu contains the set of commands for operating files:
The "Action" menu contains the set of commands for operating the matrix:
3.7.1. The "sort matrix rows" dialog window allows setting the parameters for sorting the rows of data.
3.7.1.1. "Regime to get sorting values from row data" - the mode of sorting the values of data rows:
3.7.1.2. "Order" - the order of sorting:
3.7.1.3. "Sign operation" - sorting by the sign of the value:
3.7.1.4. Buttons for operating the dialog window:
3.7.2. The "setup for correlations graph" dialog window allows setting the parameters for graphic visualization of the correlation matrix in the "Profile dialog" dialog window (operating the window is described in 3.13):
3.7.2.1. The "Set 1 (rows)" list contains the genes used to form set#1.
3.7.2.2. The "Set 2 (columns)" list contains the genes used to form set#2.
3.7.2.3. "Select X-Y axis representation" - the mode for selecting the coordinate axes:
"X = rows (Set1), Graphs=columns (Set2)" - the X-axis represents the identifiers of the genes from set#1, the Y-axis represents the coefficients of correlation between the genes from set#1 and set#2. The list of genes contains the genes from set#2.
"X = columns (Set2), Graphs=rows (Set1)" - the X-axis represents the identifiers of the genes from set#2, the Y-axis represents the coefficients of correlation between the genes from set#1 and set#2. The list of genes contains the genes from set#1.
3.7.2.4. Buttons for operating the dialog window:
The "Analysis>Clustering>Build tree for genes" command opens the "Tree calculation setup" dialog window, which allows setting the parameters for building the tree of genes from the current table.
3.8.1. The "Fields..." button opens the dialog window for selecting fields (3.3). The text field shows the number of selected fields.
3.8.2. The "Correlation type" list contains three types of correlation coefficients:
3.8.3. The "Distance type" list contains 3 types of distance measures, which are computed on the basis of correlation coefficients Rij:
3.8.4. The "Amalgamation rule" list contains 4 types of tree nodes combination:
3.8.5. The "Data subset" list contains subsets of genes for building the tree:
3.8.6. The "Create expr. image" checkbox. When checked, the diagram of the expression matrix appears once the computation is finished.
3.8.7. The "Make tree for fields" checkbox. When checked, the tree of similarity between the experiments' values in the expression matrix is computed, and when the expression diagram is displayed the order of experiments is determined by the tree of fields.
3.8.8. Options for saving the data in a file. N/A in the current version.
3.8.9. Buttons for operating the dialog window:
The "Analysis>Clustering>Build tree for fields" command opens the "Tree calculation setup" dialog window, which allows setting the parameters for building the tree of fields from the current table.
3.9.1. The "Fields..." button opens the dialog window for selecting fields (3.3). The text field shows the number of selected fields.
3.9.2. The "Correlation type" list contains three types of correlation coefficients:
3.9.3. The "Distance type" list contains 3 types of distance measures, which are computed on the basis of correlation coefficients Rij:
3.9.4. The "Amalgamation rule" list contains 4 types of tree nodes combination:
3.9.5. Options for saving the data in a file. N/A in the current version.
3.9.6. Buttons for operating the dialog window:
The "Tree Diagram" dialog window appears when the tree building command (in the "Tree calculation setup" dialog window) completes, and displays the tree as a diagram.
Commands of the main menu:
Functions of the toolbar's buttons:
"Nodes marking" - allows marking certain nodes on mouse click. Functions similarly to the "Edit>Mouse mode>Mark nodes" command of the main menu.
"Child nodes swapping" - allows swapping child nodes on mouse click. Functions similarly to the "Edit>Mouse mode>Swap child nodes" command of the main menu.
"Marking of all descent nodes" - allows marking all descendant nodes on mouse click. Functions similarly to the "Edit>Mouse mode>Marking all descent nodes" command of the main menu.
"Unselect all" - allows unmarking all nodes. Functions similarly to the "Edit>Mark>Unmark all" command of the main menu.
"Options" - opens the "tree options" dialog window. Functions similarly to the "Edit>Mark>Options" command of the main menu.
"Image creation" - adds a diagram visualizing the expression matrix (N/A for the tree of fields).
"Open scalable tree" - opens the "Scalable Tree Diagram" dialog window, which allows scaling the built tree.
3.10.1. The "find node setup" dialog window is opened with the "Edit>Mark>Mark nodes" command and allows setting the parameters for marking nodes. N/A in the current version.
3.10.1.1. The "Mark by leaf name" switch. If selected, nodes are marked according to the name of a leaf defined in the text field.
3.10.1.2. The "Mark by distance range" switch. If selected, nodes are marked according to the defined range of distances.
Buttons for operating the dialog window:
3.10.2. The "pattern setup" dialog window is opened with the "Image>Image setup" command and allows setting up the visualization of the expression matrix.
3.10.2.1. The "Palette type" list contains the following color palettes:
3.10.2.2. The "Range type" list. N/A in the current version.
3.10.2.3. The "Field specific ranges" checkbox. N/A in the current version.
3.10.2.4. The "Max. value color" bar allows setting the color for the maximal value. Available for the palettes 3.10.2.1.1, 3.10.2.1.2, 3.10.2.1.3 and 3.10.2.1.4. The "Color" button calls the "Color chooser dialog" dialog window, which allows setting the required color. The selected color is displayed on the "Preview" bar.
3.10.2.5. The "Min. value color" bar allows setting the color for the minimal value. Available for the palettes 3.10.2.1.3 and 3.10.2.1.4. The "Color" button calls the "Color chooser dialog" dialog window, which allows setting the required color. The selected color is displayed on the "Preview" bar.
3.10.2.6. The "Dataset max & min" field allows visualizing the maximal and minimal values of the expression matrix's cells:
3.10.2.7. The "User defined values" field. N/A in the current version.
3.10.2.8. The "Number of intervals" list - allows setting the number of intervals for computing the scale of palettes' colors.
Buttons for operating the dialog window:
The status line displays information on the cell of the expression matrix that is currently under the mouse pointer: the number of the gene in the table, the identifier of the gene, the number of the field in the table, the identifier of the field, and the value of the experiment.
The "Scalable Tree Diagram" can be opened in one of the following ways:
Commands of the main menu:
Functions of the toolbar's buttons:
"Zoom in vertical" - enlarge the image vertically.
"Zoom in horizontal" - enlarge the image horizontally.
"Zoom out vertical" - shrink the image vertically.
"Zoom out horizontal" - shrink the image horizontally.
"Zoom 100%" - restore the original size of the image.
"Zoom min" - set the image to the minimal scale.
"Zoom in" - enlarge the selected area.
"Zoom out" - shrink the selected area.
"Go to" - select a node to be displayed on the main diagram.
"Image creation" - add a diagram visualizing the expression matrix (N/A for trees of fields).
The "Analysis>Clustering>Find genes clusters" command opens the "setup for clustering procedure" dialog window, which allows setting the parameters of the gene clustering request.
Setting the parameters of clustering.
3.11.1. The "Fields..." button opens the dialog window for selecting fields (3.3). The text field shows the number of selected fields.
3.11.2. The "Distance type" list contains 3 types of distance measures, which are computed on the basis of correlation coefficients Rij:
3.11.3. The "Correlation type" list contains three types of correlation coefficients:
3.11.4. "Distance threshold" defines the clustering threshold: genes are combined into a single cluster if the distance between them is less than the defined threshold.
Clustering starts when the "Clustering" button is pressed. The results of clustering are displayed in the right part of the dialog window.
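The threshold rule in 3.11.4 can be read in more than one way. The sketch below assumes one simple reading: any two genes whose distance is below the threshold are joined, and clusters are the resulting connected groups (single linkage). The function names and the distance values are hypothetical, and SelTag's actual procedure and distance measures may differ.

```python
def threshold_clusters(names, distance, threshold):
    """Group names so that any pair closer than the threshold shares a cluster."""
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the representative, shortening the path on the way.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if distance(a, b) < threshold:
                parent[find(a)] = find(b)  # merge the two groups

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    # Largest clusters first.
    return sorted(clusters.values(), key=len, reverse=True)


# Hypothetical pairwise distances between four genes.
pair_distance = {
    ("A", "B"): 0.1, ("A", "C"): 0.9, ("A", "D"): 0.6,
    ("B", "C"): 0.8, ("B", "D"): 0.2, ("C", "D"): 0.7,
}


def distance(a, b):
    return pair_distance[(a, b)] if (a, b) in pair_distance else pair_distance[(b, a)]


clusters = threshold_clusters(["A", "B", "C", "D"], distance, 0.3)
```

With a threshold of 0.3, genes "A", "B" and "D" end up in one cluster even though the distance between "A" and "D" is 0.6, because both are close to "B". A stricter reading of the rule, in which every pair within a cluster must be closer than the threshold, would give a different result.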
Working with the results of clustering.
3.11.5. The "Cluster #, size, score" list contains the obtained clusters. Each cluster is represented by its number, size and score.
3.11.6. The "Gene NAME, cluster index, gene score" list contains the genes that belong to the clusters selected in list 3.11.5. Each gene is represented by its identifier and score.
3.11.7. The "Sort clusters by:" list contains the parameters of clusters' sorting:
3.11.8. The "Find gene by name" field allows searching for a gene in list 3.11.6 by its identifier. Enter the identifier (name) of the gene and press the "Find" button. If the gene is present in the list, it becomes selected; if not, the message "Not Clustered!" appears.
Setting the options for exiting the window.
3.11.9. The "Build tree for selected genes" checkbox. When checked, on exiting the dialog window (on pressing the "OK, Exit" button) a tree is built for the genes that belong to the clusters selected in list 3.11.5.
3.11.10. The "Save selected data" checkbox, the "File name" field and the "Browse" button are N/A in the current version.
3.11.11. The "Add cluster info for current data" switch (radio button). When selected, on exiting the dialog window (on pressing the "OK, Exit" button) the results of clustering are added to 4 new fields with the following names:
By default, each time the dialog window is started, TXT is assigned the value "Cl#", where # is the order number of the current launch of the window. If fields with such names already exist, they are updated; if not, they are created.
The table with the results of clustering is added to the list of saved projects under an appropriate name (*.table).
3.11.12. The "Do not add clustering info" switch (radio button). When selected, on exiting the dialog window (on pressing the "OK, Exit" button) the results of clustering are not saved.
3.11.13. Buttons for operating the dialog window:
The "Analysis>Principal component" command opens the "setup for principal component analysis" dialog window, which allows the following:
3.12.1. The "Fields..." button opens the dialog window for selecting fields. The text field shows the number of selected fields.
3.12.2. The "Matrix type" list contains the types of matrices:
3.12.3. Buttons for operating the dialog window:
3.12.4. The "Component plots" list contains the components (eigenvectors), numbered in descending order of their eigenvalues.
3.12.5. The "Mark All" button allows selecting all components in the list. The "Unmark all" button allows deselecting all components in the list.
3.12.6. The bar showing the sum of the eigenvalues of the components selected in the "Component plots" list.
Building the projections.
3.12.7. "ComponentX" - contains the components (eigenvectors), numbered in descending order of their eigenvalues, and allows selecting the component for the X-axis.
3.12.8. "ComponentY" - contains the components (eigenvectors), numbered in descending order of their eigenvalues, and allows selecting the component for the Y-axis.
The "Draw" button builds the diagram.
The profile diagram dialog window displays gene expression profiles and consists of three parts: the plot, the list of genes, and the list of groups of fields.
The information bar is located below the diagram area. Information becomes available when the mouse pointer is placed over an element of the plot.
There are two available modes of visualizing experiments:
Switching between these modes is done in the list of genes.
The "single" mode displays the profile of a single gene. The color of the plot's elements corresponds to that of the group to which the experiments belong. The maximal and minimal values for all groups of experiments are displayed above the plot. In the upper part of the plot there is a line that represents the layout of the groups of experiments. The color of the line corresponds to that of the group below the line. The total profile of the gene for all experiments is displayed below the groups' layout and above the plot. The Y-axis represents the expression value; the X-axis represents the brief names of the experiments.
The "multiple" mode displays the profiles of one or more genes. The color of the plot's elements corresponds to that of the gene. The maximal and minimal values for all groups of experiments are displayed above the plot. In the upper part of the plot there is a line that represents the layout of the groups of experiments. The color of the line corresponds to that of the group below the line. The total profile of all genes from the list is displayed below the groups' layout and above the plot.
In the upper right part of the window there is a list containing the names of genes.
Each button allows switching between the "single" and "multiple" modes. It also allows turning on/off the visualization of an appropriate profile. This option is associated with the indicator in the left part of each button.
If one of the buttons is pressed, the "single" mode is enabled; otherwise the "multiple" mode is enabled. The black indicator means the profile is visualized; the white one means it is not.
To change the color of a profile, right-click the appropriate button, select the "Set color" command in the popup menu, and then set the required color in the "Color dialog" window.
Above the list of genes there are buttons for quickly selecting and unselecting the genes.
The "C" button opens the dialog window with the list of clustering results and its color layout. Genes that do not belong to any cluster are colored black. When a cluster is removed from the visualization, all genes in this cluster automatically stop being visualized as well.
The drop-down list above the list of genes contains the identifiers of the worktables.
In the lower right part of the dialog window there is a list containing the names of the groups of experiments. The buttons (the color of each button corresponds to that of its group) allow operating each group separately:
Above the list of groups there are buttons for quickly switching the visualization mode of all groups:
turn off visualization,
display only the maximums and minimums of the groups,
display the experiments selected in the lists of groups,
display all elements of all groups.
Right-clicking opens the popup menu: