P. Toronen, G. Wong and E. Castren. Pages 445-463 (19)
The analysis of gene expression has been revolutionized in recent years by DNA arrays, which can monitor the expression of thousands of genes. The field has great potential but requires a multidisciplinary approach that integrates biochemistry with computer science, statistics, and engineering. There is no golden rule for analyzing gene expression data, and methodologies vary extensively between studies. Errors are inherent in the data and should be estimated as part of the analysis. This article presents analytical methods applied to gene expression data. We describe the preprocessing steps and explain why they are needed. The analysis itself concentrates on clustering of gene expression data and on how clustering can be used in conjunction with other data sources, such as functional annotation. We also explain how clustering algorithms function. Finally, we present a method for validating clustering results that is based on the probabilities of the observed frequencies of functional classes within clusters. This method can integrate information from several functional classes and rank them by statistical significance.
gene expression data, DNA arrays, functional annotation, clustering algorithms
A.I. Virtanen Institute, University of Kuopio, Box 1627, 70211 Kuopio, Finland.
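
The validation idea mentioned in the abstract, scoring how probable the observed frequency of a functional class within a cluster is, can be illustrated with a small sketch. The abstract does not state which distribution the article uses, so the hypergeometric tail probability below is an assumption, chosen because it is a common model for drawing a cluster of genes without replacement from an annotated gene set. The function names `hypergeom_tail` and `rank_classes` and the data layout (sets of gene identifiers) are hypothetical and not taken from the article.

```python
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(X >= k) for a hypergeometric variable (assumed model, see lead-in).

    N: total number of genes on the array
    K: genes on the array annotated with the functional class
    n: genes in the cluster
    k: genes in the cluster annotated with the class
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probability mass for k, k+1, ..., upper annotated genes.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def rank_classes(cluster, annotations, universe_size):
    """Return (class name, overlap, p-value) tuples, most significant first.

    cluster: set of gene identifiers in one cluster
    annotations: dict mapping class name -> set of gene identifiers
    universe_size: total number of genes considered
    """
    results = []
    for name, members in annotations.items():
        overlap = len(cluster & members)
        p = hypergeom_tail(overlap, len(cluster), len(members), universe_size)
        results.append((name, overlap, p))
    # A smaller tail probability means a less likely overlap under chance.
    return sorted(results, key=lambda r: r[2])
```

Calling `rank_classes` once per cluster gives a list of functional classes ordered by how unlikely their overlap with that cluster would be by chance. When many classes and clusters are tested, the resulting probabilities would normally need a multiple-testing correction, which this sketch omits.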