**What:** Determine gene association and sample clustering.

**Why:** A widely used tool for clustering gene expression profiles and samples is the
parametric Pearson correlation. Correlation-based clustering is applied to microarray data to
identify significant associations in genomic data or to establish links between pairs of
genes in a network. A graphical Gaussian model (GGM) characterizes gene associations using
correlations and partial correlations computed from the data matrix, in which columns are
genes and rows are samples. The partial correlation matrix is usually defined by the
correlation between each pair of genes conditioned on all the remaining genes. Hence, for g
genes, g(g-1)/2 partial correlations are computed. If the number of pairs is large, some
spuriously significant partial correlations could be obtained purely by chance. The partial
correlation of a pair of variables is the standard Pearson correlation computed between the
residuals of two multiple linear regressions (the two regression equations have the members
of the specified pair as dependent variables and all the conditioning variables as
independent variables).
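
The residual-based definition above can be illustrated with a minimal sketch in Python using NumPy. The function name `partial_correlation`, the simulated data, and the choice of ordinary least squares with an intercept are assumptions made for illustration, not the procedure of any particular tool. The sketch also assumes more samples than genes, so that both regressions are well determined.

```python
import numpy as np


def partial_correlation(data, i, j):
    """Partial correlation of columns i and j given all other columns.

    data: 2-D array with samples in rows and genes in columns.
    (Hypothetical helper, written only to illustrate the definition.)
    """
    n_samples, n_genes = data.shape
    others = [k for k in range(n_genes) if k not in (i, j)]
    # Design matrix: an intercept column plus all conditioning genes.
    design = np.column_stack([np.ones(n_samples), data[:, others]])
    # Two multiple linear regressions: gene i and gene j on the other genes.
    coef_i, *_ = np.linalg.lstsq(design, data[:, i], rcond=None)
    coef_j, *_ = np.linalg.lstsq(design, data[:, j], rcond=None)
    resid_i = data[:, i] - design @ coef_i
    resid_j = data[:, j] - design @ coef_j
    # Pearson correlation between the two residual vectors.
    return np.corrcoef(resid_i, resid_j)[0, 1]


# Simulated data (an assumption): 50 samples, 4 genes, gene 1 depends on gene 0.
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 4))
data[:, 1] += 0.8 * data[:, 0]

g = data.shape[1]
n_pairs = g * (g - 1) // 2  # number of distinct gene pairs
r = partial_correlation(data, 0, 1)
print(f"{n_pairs} pairs; partial correlation of genes 0 and 1: {r:.3f}")
```

Because each value is built from two sets of regression residuals, any inaccuracy in those fits carries over directly into the partial correlation.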

The point is that the accuracy of the partial correlations, and hence the reliability of any
conclusion drawn from a procedure that relies on partial correlation, depends on the accuracy
of the underlying regression residuals.

Copyright ©2010 by INCDSB