
1. Introduction

The alon1999_set.txt data were analyzed with the SelTag software. The data contain expression-level measurements for 2000 human cDNAs and ESTs (including sequences homologous to some known eukaryotic genes) in colon adenocarcinoma tissues from several patients. For some patients, expression of these RNAs was also measured in normal colon tissue. In total, the table contains expression measurements for 40 tumor and 22 normal colon tissues. These data are combined into the corresponding measurement groups, "Tumor" and "Normal". The analysis consisted of building a hierarchical clustering of the tissues, which divided the tissues (experiments) into two classes: the first includes predominantly tumor tissues, the second predominantly normal ones. The results were compared with those obtained in the original paper [1].

2. Running the application and loading data.

2.1. When the application starts, the "Login" dialog window appears (fig. 2.1). In this window, select the "Anonymous" mode and press the "OK" button.
Note. The "Anonymous" mode is intended for working with demo data.

Figure 2.1.
1. Non-registered user mode. 2. Button to start the application. 3. Button to cancel application startup.

2.2. After the application starts, the main application window appears. Select the "File>Open data" command in the main menu (fig. 2.2).

Figure 2.2.

2.3. The "Load data" dialog window then appears, listing the names and sizes of the data table files (fig. 2.3.1). In this window, select a file and press the "OK" button. A "Wait" message box appears and closes once the data have finished loading (fig. 2.3.2).

Figure 2.3.1.
1. Selected file with expression data. 2. Confirmation button. 3. Cancel button.

Figure 2.3.2.

2.4. The table with the selected data is shown in the main application window.

2.5. To load a file with gene descriptions, select the "File>Link gene data" command in the main menu (fig. 2.5).

Figure 2.5.

2.6. The "Load data" dialog window appears, listing the names and sizes of the gene description files (fig. 2.6). In this window, select a file and press the "OK" button.

Figure 2.6.
1. Selected file with gene descriptions. 2. Confirmation button. 3. Cancel button.

2.7. The "Description load" message box appears, suggesting the dynamic file loading mode (fig. 2.7). Press the "Yes" button.

Figure 2.7.

2.8. In the context menu of the main application table (opened with a right mouse click), the "URLs>UniGene" command becomes active (fig. 2.8.1). Using the web link for the gene, this command loads the corresponding UniGene database card in a new window of your web browser (fig. 2.8.2).

Figure 2.8.1.

Figure 2.8.2.

2.9. To load a file with gene nucleotide sequences, select the "File>Link sequence" command in the main menu (fig. 2.9).

Figure 2.9.

2.10. The "Load data" dialog window appears, listing the names and sizes of the gene sequence files (fig. 2.10). In this window, select a file and press the "OK" button.

Figure 2.10.
1. Selected file with gene sequences. 2. Confirmation button. 3. Cancel button.

2.11. The "Description load" message box then appears, suggesting the dynamic file loading mode (fig. 2.11). Press the "Yes" button.

Figure 2.11.

2.12. In the context menu of the main application table (opened with a right mouse click), the "Show sequence" command becomes active (fig. 2.8.1). This command opens a window with the nucleotide sequence of the gene (fig. 2.12).

Figure 2.12.

2.13. To retrieve a description of the loaded data, select the "File>Data description" command in the main menu (fig. 2.13).

Figure 2.13.

2.14. This opens a document from the Softberry server with the description and list of files for the "alon1999_set" data (fig. 2.14).

Figure 2.14.

3. Hierarchical clustering of experiments.

One approach to revealing clusters of genes with similar expression profiles is hierarchical clustering [2]. Here the same analysis is applied to experiments: a binary tree is built for the experiments using a chosen distance metric between them. Each node of the tree connects two child nodes, and branch lengths correspond to the distances between the expression profiles of the experiments.
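
The tree-building idea can be sketched in a few lines of Python. This is a minimal illustration, not SelTag's implementation: the `Node` class, the `build_tree` function and the toy distances are hypothetical names and values introduced here, and the sketch joins clusters by their smallest pairwise distance (the Nearest neighbor rule) only because it is the simplest to write.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A tree node: a leaf (one experiment) or a join of two child nodes."""
    members: tuple                  # names of the experiments under this node
    height: float = 0.0             # distance at which the children were joined
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(names, dist):
    """Agglomerate experiments into a binary tree.

    dist maps frozenset({a, b}) to the distance between experiments a and b.
    """
    def cluster_distance(x, y):
        # Nearest neighbor rule: smallest distance over all cross pairs.
        return min(dist[frozenset((a, b))] for a in x.members for b in y.members)

    nodes = [Node(members=(n,)) for n in names]
    while len(nodes) > 1:
        # Find the closest pair among the current clusters.
        i, j = min(
            ((i, j) for i in range(len(nodes)) for j in range(i + 1, len(nodes))),
            key=lambda p: cluster_distance(nodes[p[0]], nodes[p[1]]),
        )
        a, b = nodes[i], nodes[j]
        merged = Node(a.members + b.members, cluster_distance(a, b), a, b)
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0]


# Toy distances, invented for illustration (not taken from alon1999_set.txt).
toy = {
    frozenset(("T1", "T2")): 0.1,
    frozenset(("N1", "N2")): 0.2,
    frozenset(("T1", "N1")): 0.8,
    frozenset(("T1", "N2")): 0.9,
    frozenset(("T2", "N1")): 0.7,
    frozenset(("T2", "N2")): 0.85,
}
root = build_tree(["T1", "T2", "N1", "N2"], toy)
print(root.left.members, root.right.members, root.height)
```

With these toy values the two tumor-like leaves join first, then the two normal-like leaves, and the root connects the two pairs, which mirrors the two-class split described in the introduction.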

This chapter describes how to build trees for fields using various hierarchical clustering methods and compares the clustering results with the original ones [1].

To do this, perform the following steps:

3.1. Select the "Clustering>Build tree for fields" command in the main menu (fig. 3.1).

Figure 3.1.

3.2. The "Tree calculation setup" dialog opens (fig. 3.2). First, choose the fields to be used in the calculation by pressing the "Fields" button (fig. 3.2).

Figure 3.2.
1. Button that opens the "Field selection" dialog window.

3.3. The "Field selection" dialog for selecting fields appears (fig. 3.3). In this example, all experiments are used in the calculation. Press the "Select all experiments" button.

Figure 3.3.
1. Field selection list. 2. Button that selects all fields with experiment data.

3.4. All fields that contain expression values are now selected (fig. 3.4). Press the "OK" button.

Figure 3.4.
1. Selected fields with experiment data. 2. Confirmation button.

3.5. In the "Tree calculation setup" dialog, the number of selected fields appears next to the "Fields" button (fig. 3.5).

Figure 3.5.
1. Number of selected fields. 2. List of correlation types. 3. List of distance types. 4. List of clustering rules. 5. Confirmation button.

3.6. Then do the following:

3.6.1. Choose the appropriate option from the "Correlation type" list (fig. 3.6.1).

Figure 3.6.1.

3.6.2. Choose the distance type from the "Distance type" list (fig. 3.6.2). Distances are calculated from the correlation coefficient Rij.

Figure 3.6.2.

3.6.3. Choose the clustering method from the "Amalgamation rule" list (fig. 3.6.3).

Figure 3.6.3.

3.6.4. Press the "OK" button.
An example of the settings is shown in fig. 3.5.
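
The settings chosen in step 3.6 can be illustrated with a short Python sketch. It assumes Pearson's correlation and the 1-Rij distance, the combination used in the example of fig. 3.5. The function names and the toy profiles are hypothetical, and SelTag's own computation may differ in detail.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


def distance_matrix(profiles):
    """Distance 1 - Rij for every ordered pair of experiments (i, j)."""
    names = list(profiles)
    return {
        (a, b): 1.0 - pearson(profiles[a], profiles[b])
        for a in names
        for b in names
    }


# Toy expression profiles, invented for illustration: each list holds the
# expression values of the same four genes in one experiment.
profiles = {
    "T1": [1.0, 2.0, 3.0, 4.0],
    "T2": [2.0, 4.0, 6.0, 8.0],   # same shape as T1, so Rij = 1
    "N1": [4.0, 3.0, 2.0, 1.0],   # mirror image of T1, so Rij = -1
}
d = distance_matrix(profiles)
print(d[("T1", "T2")], d[("T1", "N1")])
```

With this distance type, perfectly correlated experiments are at distance 0 and perfectly anti-correlated ones at distance 2, so the distance depends on the shape of the profile rather than on its absolute scale.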

3.7. The "Tree Diagram" dialog appears with the resulting tree diagram for the fields. Figures 3.7-3.10 show the diagrams obtained with different node-joining rules. The diagrams were built with the parameters shown in fig. 3.5 (Pearson's correlation and the 1-Rij distance type). Fig. 3.7 shows the result for the UPGMA rule, fig. 3.8 for WPGMA, fig. 3.9 for Furthest neighbor, and fig. 3.10 for Nearest neighbor.

Figure 3.7.
Figure 3.8.
Figure 3.9.
Figure 3.10.
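
The four joining rules differ only in how the distance from a newly merged cluster to every other cluster is computed. The Python sketch below uses the standard textbook definitions of these rules; that SelTag follows exactly these definitions is an assumption, and the function names and numbers are hypothetical. Each function takes the distances from some cluster k to the two clusters i and j being merged, together with the sizes of i and j, and returns the distance from k to the merged cluster.

```python
def nearest_neighbor(d_ki, d_kj, n_i, n_j):
    """Distance to the closest member of the merged cluster."""
    return min(d_ki, d_kj)


def furthest_neighbor(d_ki, d_kj, n_i, n_j):
    """Distance to the most distant member of the merged cluster."""
    return max(d_ki, d_kj)


def upgma(d_ki, d_kj, n_i, n_j):
    """Unweighted average: every original experiment counts equally."""
    return (n_i * d_ki + n_j * d_kj) / (n_i + n_j)


def wpgma(d_ki, d_kj, n_i, n_j):
    """Weighted average: the two merged clusters count equally,
    whatever their sizes."""
    return (d_ki + d_kj) / 2.0


RULES = {
    "Nearest neighbor": nearest_neighbor,
    "Furthest neighbor": furthest_neighbor,
    "UPGMA": upgma,
    "WPGMA": wpgma,
}

# Invented example: cluster i has 3 experiments at distance 0.2 from k,
# cluster j has 1 experiment at distance 0.8 from k.
updated = {name: rule(0.2, 0.8, 3, 1) for name, rule in RULES.items()}
for name, value in updated.items():
    print(f"{name}: {value:.2f}")
```

In this example UPGMA stays closer to the larger cluster (about 0.35) while WPGMA lands midway (0.5), which is why the two rules can place the same tissue in different clusters.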

In all the diagrams, the tissues are clearly divided into cancerous and normal ones. In every diagram the clusters of normal tissues contain the cancerous tissues T30, T33 and T36, while the clusters of cancerous tissues include the normal tissues N34 (in all four diagrams) and N8 (in the Furthest neighbor and UPGMA diagrams), in accordance with the results of Alon et al., 1999. Three diagrams (UPGMA, WPGMA and Furthest neighbor) contain a small cluster of the N36, T2, T37 and T40 tissues that is consistently separated from the common pool.

Thus, the analysis of cluster contents with the methods described shows that the clustering of these data is stable.
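
The check for misplaced tissues can be expressed as a short Python sketch. The membership lists are copied from the UPGMA "sort" column of Table 1 for clusters T2 and N2. The `misplaced` function is a hypothetical helper, and it assumes that the first letter of a cluster name (T or N) gives the tissue type the cluster is expected to hold.

```python
def misplaced(cluster_label, members):
    """Return the members whose tissue type (first letter, T or N)
    differs from the type indicated by the cluster name."""
    expected = cluster_label[0]
    return [m for m in members if m[0] != expected]


# UPGMA clusters T2 and N2, "sort" column of Table 1.
upgma_clusters = {
    "T2": ["N8", "N34", "T1", "T4", "T11", "T14", "T15", "T16", "T17",
           "T18", "T19", "T23", "T25", "T27", "T28", "T35", "T38", "T39"],
    "N2": ["N1", "N2", "N6", "N11", "N27", "N28", "N29", "N32", "N33",
           "N35", "N39", "N40", "T30", "T33", "T36"],
}

report = {label: misplaced(label, members)
          for label, members in upgma_clusters.items()}
print(report)
```

For these two clusters the helper reports N8 and N34 inside the cancerous cluster and T30, T33 and T36 inside the normal one, the same tissues named in the text above.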

Table 1. Cluster contents for the UPGMA, Furthest neighbor (FN) and WPGMA methods, and for the results described in the original paper [1]. The "original" columns list the cluster fields in tree order; the "sort" columns list the same fields sorted by number. Cluster numbering corresponds to that in figures 3.8-3.10. Normal tissues included in a cancerous cluster, and cancerous tissues included in a normal cluster, are shown in red.

Cluster  Paper            UPGMA            FN               WPGMA
name     original  sort   original  sort   original  sort   original  sort
T1       T16       T1     T12       T3     T32       T5     T12       T5
         T28       T4     T9        T5     T34       T6     T9        T6
         T13       T5     T10       T6     T31       T9     T6        T9
         T9        T8     T5        T7     T26       T10    T29       T10
         T21       T9     T32       T8     T24       T12    T20       T12
         T35       T10    T31       T9     T21       T20    T34       T20
         T10       T13    T34       T10    T12       T21    T26       T21
         T27       T15    T26       T12    T10       T24    T24       T24
         T8        T16    T24       T13    T5        T26    T21       T26
         T5        T21    T21       T20    T9        T29    T32       T29
         T4        T26    T22       T21    T29       T31    T31       T31
         T1        T27    T13       T22    T20       T32    T10       T32
         T15       T28    T8        T24    T6        T34    T5        T34
         T26       T35    T7        T26    -         -      -         -
         -         -      T3        T29    -         -      -         -
         -         -      T29       T31    -         -      -         -
         -         -      T20       T32    -         -      -         -
         -         -      T6        T34    -         -      -         -

T2       T17       N34    N8        N8     T19       N34    T19       N34
         T25       T14    T25       N34    T15       T1     T11       T1
         T18       T17    T19       T1     T17       T4     T23       T4
         T23       T18    T11       T4     T16       T11    T1        T11
         T31       T20    T23       T11    T14       T14    T16       T14
         T20       T23    T1        T14    T11       T15    T17       T15
         N34       T24    T16       T15    T23       T16    T15       T16
         T24       T25    T17       T16    T1        T17    T14       T17
         T29       T29    T15       T17    T18       T18    T18       T18
         T38       T31    T14       T18    T4        T19    T4        T19
         T14       T32    T18       T19    N34       T23    N34       T23
         T40       T38    T4        T23    T39       T27    T39       T27
         T32       T40    N34       T25    T27       T39    T27       T39
         -         -      T39       T27    -         -      -         -
         -         -      T27       T28    -         -      -         -
         -         -      T38       T35    -         -      -         -
         -         -      T35       T38    -         -      -         -
         -         -      T28       T39    -         -      -         -

T3       T39       N8     -         -      N8        N2     T25       T3
         T11       N12    -         -      T38       N6     T22       T7
         T6        T3     -         -      T35       N8     T13       T8
         T19       T6     -         -      T28       T3     T8        T13
         T12       T7     -         -      N6        T7     T7        T22
         T22       T11    -         -      N2        T8     T3        T25
         T34       T12    -         -      T25       T13    T38       T28
         T7        T19    -         -      T22       T22    T35       T35
         N8        T22    -         -      T13       T25    T28       T38
         T3        T34    -         -      T8        T28    -         -
         N12       T39    -         -      T7        T35    -         -
         -         -      -         -      T3        T38    -         -

N1       N4        N2     N7        N3     N4        N3     N4        N3
         N33       N3     N4        N4     N12       N4     N5        N4
         N7        N4     N5        N5     N9        N9     N10       N5
         T37       N5     N10       N7     N10       N10    N3        N9
         N5        N7     N3        N9     N3        N12    N12       N10
         N27       N10    N12       N10    -         -      N9        N12
         N3        N27    N9        N12    -         -      -         -
         N2        N29    -         -      -         -      -         -
         N40       N33    -         -      -         -      -         -
         N36       N36    -         -      -         -      -         -
         T2        N40    -         -      -         -      -         -
         N29       T2     -         -      -         -      -         -
         N10       T37    -         -      -         -      -         -

N2       N9        N1     N33       N1     N39       N1     N33       N1
         T30       N6     N35       N2     T33       N5     N35       N2
         T36       N9     N27       N6     N1        N11    N27       N6
         N6        N11    N40       N11    N32       N27    N40       N11
         T33       N28    T36       N27    T30       N28    T36       N27
         N11       N32    N29       N28    N11       N29    N29       N28
         N1        N35    N28       N29    N5        N32    N28       N29
         N39       N39    N11       N32    N33       N33    N11       N32
         N28       T30    N39       N33    N35       N35    N39       N33
         N35       T33    T33       N35    N27       N39    T33       N35
         N32       T36    N1        N39    N40       N40    N1        N39
         -         -      N32       N40    T36       T30    N32       N40
         -         -      T30       T30    N29       T33    T30       T30
         -         -      N6        T33    N28       T36    N6        T33
         -         -      N2        T36    -         -      N2        T36

A dash (-) marks an empty cell.
 
Figure 3.1.1.

4. References

1. Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, and Levine AJ (1999) Broad patterns of gene expression revealed by clustering of tumor and normal colon tissues probed by oligonucleotide arrays, Proc. Natl. Acad. Sci. USA, 96, 6745-6750.

2. Sneath PHA and Sokal RR (1973) Numerical Taxonomy. The principles and practice of numerical classification. San Francisco: W.H. Freeman and Co.