**What:** DNA microarray technology allows the simultaneous analysis of thousands
of genes (or of the entire genome) to determine their expression levels or changes in
expression, and can also be used to detect mutations such as single nucleotide polymorphisms (SNPs).

**Why:** There are two main technology platforms for implementing and analysing DNA
microarrays: the “Two-Colour Spotted Microarrays” (or “Spotted Microarrays”) platform
and the “Probe-Set Arrays” (or “Affymetrix”) platform.

*Pre-processing microarray data*

The first step of the processing stage is image analysis. The microarray is scanned
to obtain a digital image in which every sample is described by a few tens of
pixels. Image segmentation follows, to locate the pixels that correspond
to each sample. Each sample is then quantified (the light intensities of the corresponding
pixels are summarized). The background must also be quantified, in order to separate
specific effects from nonspecific ones.
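
The quantification step can be illustrated with a minimal sketch. It assumes that segmentation has already produced the pixel coordinates of a spot and of its surrounding background, and it uses the median as the summary statistic; both the function name `quantify_spot` and the choice of the median are illustrative assumptions, not something the text prescribes.

```python
from statistics import median

def quantify_spot(image, spot_pixels, background_pixels):
    """Summarize one spot (hypothetical helper).

    Returns the median intensity of the spot pixels and the median
    intensity of the local background pixels.
    """
    fg = median(image[r][c] for r, c in spot_pixels)
    bg = median(image[r][c] for r, c in background_pixels)
    return fg, bg

# Toy 4x4 "scan": a bright 2x2 spot surrounded by a dim background.
image = [
    [10, 12, 11, 10],
    [11, 90, 95, 12],
    [10, 92, 98, 11],
    [12, 10, 11, 10],
]
spot = [(1, 1), (1, 2), (2, 1), (2, 2)]
# Every pixel outside the spot is treated as local background here.
ring = [(r, c) for r in range(4) for c in range(4) if (r, c) not in spot]

foreground, background = quantify_spot(image, spot, ring)
print(foreground, background)
```

The median is used here only because it is less sensitive to a few very bright or very dark pixels than the mean; the mean would be an equally valid summary for this sketch.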

The next step is to eliminate the background effect. The objective is to
estimate the abundance of genetic material by measuring the signal intensity of the samples.
Other important steps are data normalization, which ensures compatibility with
other microarray experiments, and data quality evaluation, which identifies
discordant data.
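
As a rough sketch of these two steps, the code below subtracts the local background from each spot, forms log2 ratios of target to control, and centres them on their median. The text does not name a specific correction or normalization method, so simple subtraction, the `floor` parameter and median centring are all assumptions chosen for illustration.

```python
import math
from statistics import median

def background_correct(foreground, background, floor=1.0):
    """Subtract the local background from each spot.

    Values are clamped at `floor` (an assumed small positive constant)
    so that the logarithm taken later stays defined.
    """
    return [max(f - b, floor) for f, b in zip(foreground, background)]

def log_ratios(target, control):
    """log2(target / control) for each gene."""
    return [math.log2(t / c) for t, c in zip(target, control)]

def median_center(values):
    """Shift the values so that their median is zero."""
    m = median(values)
    return [v - m for v in values]

# Made-up intensities for four genes.
target_fg = [520.0, 1100.0, 260.0, 90.0]
target_bg = [20.0, 100.0, 10.0, 26.0]
control_fg = [270.0, 270.0, 270.0, 144.0]
control_bg = [20.0, 20.0, 20.0, 16.0]

target = background_correct(target_fg, target_bg)
control = background_correct(control_fg, control_bg)
normalized = median_center(log_ratios(target, control))
print(normalized)
```

Centring on the median assumes that most genes do not change between the two samples, which is a common but not universal working assumption.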

*Statistical analysis of gene expression*

In the simplest case, a microarray experiment tests gene expression for one gene in
one sample (“one gene-one sample”). The question is whether a specific gene is
significantly expressed in a given sample.

The hybridization intensity values obtained from the target sample and the control sample
form dependent pairs. The statistical problem of interest is to test the null
hypothesis of equal means for two dependent random variables or, in other words,
to test whether there is a significant difference between the means of the two groups of
dependent values. Thus, the case “one gene-one sample” becomes “one gene-two samples”.
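
For dependent pairs, the test of equal means reduces to a one-sample t-test on the pairwise differences. The sketch below computes only the test statistic and its degrees of freedom with the Python standard library; the function name and the intensity values are made up for illustration, and in practice a statistics package would also supply the p-value.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(target, control):
    """Paired t statistic for H0: mean(target - control) = 0.

    Returns (t, degrees_of_freedom).
    """
    diffs = [t - c for t, c in zip(target, control)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1

# Made-up log-intensities of one gene on five arrays (target vs. control).
target = [8.1, 7.9, 8.4, 8.0, 8.6]
control = [7.5, 7.6, 7.9, 7.7, 8.0]

t_stat, dof = paired_t_statistic(target, control)
print(t_stat, dof)
```

The statistic is then compared with the critical value of the t distribution for the given degrees of freedom (2.776 for a two-sided test at the 5% level with 4 degrees of freedom).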

The most natural problem in microarray analysis is to compare gene expression levels
in two target samples, one with diseased cells and one with healthy cells. This
also reduces to the “one gene-two samples” case above, except that here the variables
can be either dependent or independent.
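
When the two samples are independent, the pairing is lost and a two-sample statistic is needed instead. As one possible illustration, the sketch below computes Welch's t statistic, a variant of the two-sample t-test that does not assume equal variances, together with its approximate degrees of freedom. The choice of Welch's variant, the function name and the data are assumptions made for this example.

```python
import math
from statistics import mean, variance

def welch_t_statistic(a, b):
    """Welch's two-sample t statistic for independent samples.

    Returns (t, approximate degrees of freedom from the
    Welch-Satterthwaite formula).
    """
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    dof = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, dof

# Made-up log-intensities of one gene in diseased and healthy samples.
diseased = [9.2, 8.8, 9.5, 9.1]
healthy = [8.1, 8.4, 7.9, 8.2]

t_stat, dof = welch_t_statistic(diseased, healthy)
print(t_stat, dof)
```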

For these problems there are well-known statistical tests, already implemented in
software packages such as SPSS, SAS, Statistica and R: the two-sample t-test,
the one-sample t-test (testing for a mean), the Wilcoxon two-sample test, the Wilcoxon
signed-rank test, the two-sample permutation test and ANOVA.
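
Of the tests listed, the two-sample permutation test is simple enough to sketch directly. The version below is a minimal exact implementation for small samples: it enumerates every way of splitting the pooled values into two groups of the original sizes and counts how often the absolute difference in means is at least as large as the observed one. The function name and the small tolerance for floating-point ties are assumptions of this sketch; the packages named above offer their own implementations.

```python
from itertools import combinations
from statistics import mean

def permutation_test(a, b):
    """Exact two-sided permutation p-value for a difference in means.

    Only practical for small samples, because the number of splits
    grows combinatorially with the sample sizes.
    """
    pooled = list(a) + list(b)
    total = sum(pooled)
    observed = abs(mean(a) - mean(b))
    splits = extreme = 0
    for idx in combinations(range(len(pooled)), len(a)):
        sum_a = sum(pooled[i] for i in idx)
        mean_a = sum_a / len(a)
        mean_b = (total - sum_a) / len(b)
        splits += 1
        # Small tolerance so that floating-point ties count as extreme.
        if abs(mean_a - mean_b) >= observed - 1e-12:
            extreme += 1
    return extreme / splits

# Made-up expression values for one gene in two groups of three samples.
p_value = permutation_test([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(p_value)
```

With three values per group there are only 20 possible splits, so the smallest attainable two-sided p-value is 2/20 = 0.1; larger samples are needed before such a test can reach conventional significance levels.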

Copyright ©2010 by INCDSB