Partial correlation of gene association


What: Determine gene association and sample clustering.

Why: A widely used tool for clustering gene expression profiles and samples is the parametric Pearson correlation. The clustering method is applied to microarray data to identify significant associations in genomic data or to establish links between pairs of genes in a network. A Graphical Gaussian Model (GGM) characterizes gene associations by computing correlations and partial correlations from the data matrix, in which columns are genes and rows are samples. The partial correlation matrix is usually defined as the correlation between each pair of genes conditioned on all the remaining genes. Hence, for g genes, g(g-1)/2 pairwise partial correlations are computed. If the number of pairs is large, some partial correlations can appear significant purely by chance. The partial correlation of a pair of variables is the standard Pearson correlation between the residuals of two multiple linear regressions: each regression takes one variable of the pair as the dependent variable and all the conditioning variables as independent variables.
The point is that the accuracy of a partial correlation, and hence the reliability of any conclusion or explanation built on it, depends on the accuracy of the underlying regression residuals.
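The residual-based definition above can be illustrated with a minimal sketch in Python using NumPy. The function name partial_correlation, the synthetic data, and the choice of ordinary least squares with an intercept are assumptions made for illustration only; they are not taken from any specific GGM package, and the sketch assumes there are more samples than genes so that the regressions are well determined.

```python
import numpy as np


def partial_correlation(data, i, j):
    """Partial correlation of columns i and j given all other columns.

    data: 2-D array with samples in rows and genes in columns.
    (Hypothetical helper, written for illustration.)
    """
    n_samples, n_genes = data.shape
    others = [k for k in range(n_genes) if k not in (i, j)]
    # Design matrix: an intercept plus every conditioning gene.
    design = np.column_stack([np.ones(n_samples), data[:, others]])
    # Fit both regressions at once: gene i and gene j are the dependent variables.
    targets = data[:, [i, j]]
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
    residuals = targets - design @ coef
    # Pearson correlation between the two residual vectors.
    return np.corrcoef(residuals[:, 0], residuals[:, 1])[0, 1]


# Synthetic example: two genes that are both driven by a third one.
rng = np.random.default_rng(0)
n = 500
driver = rng.normal(size=n)
gene_a = driver + 0.5 * rng.normal(size=n)
gene_b = driver + 0.5 * rng.normal(size=n)
data = np.column_stack([gene_a, gene_b, driver])

pearson = np.corrcoef(gene_a, gene_b)[0, 1]
partial = partial_correlation(data, 0, 1)
print(f"Pearson correlation: {pearson:.3f}")
print(f"Partial correlation: {partial:.3f}")
```

In this constructed example the two genes are strongly correlated only because they share a common driver, so the Pearson correlation is large while the partial correlation, which conditions on the driver, should be close to zero. Any numerical error in the two sets of residuals propagates directly into the final coefficient.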