Background: Data-analytic approaches to Affymetrix microarray data include: (a) a covariate model, in which the observed signal is an estimated linear function of the perfect match (PM) and mismatch (MM) signals; (b) a difference model (PM - MM); and (c) a PM-only model, in which the MM data are not used.
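The three models can be sketched as functions of paired PM and MM intensities. This is a minimal illustration only. The coefficients `a`, `b` and the intercept `c` of the covariate model are hypothetical placeholders that would have to be estimated from data, and `model_signals` is an illustrative name, not part of any published software.

```python
import numpy as np

def model_signals(pm, mm, a, b, c=0.0):
    """Per-probe signal under the three models.

    a, b, c are hypothetical covariate-model coefficients; in practice
    they would be estimated rather than supplied by hand.
    """
    pm = np.asarray(pm, dtype=float)
    mm = np.asarray(mm, dtype=float)
    return {
        "covariate": c + a * pm + b * mm,  # (a) linear function of PM and MM
        "difference": pm - mm,             # (b) PM - MM
        "pm_only": pm.copy(),              # (c) MM data ignored
    }

signals = model_signals([5.0, 7.0], [2.0, 3.0], a=1.0, b=-0.5)
```

Note that the difference model is the covariate model with `a = 1, b = -1, c = 0`, and the PM-only model is the case `a = 1, b = 0, c = 0`.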
Methods: By decomposing the correlations among the variables in the statistical model under certain assumptions, we theoretically derive which model best reflects the actual gene expression level under a variety of conditions expected in microarray data.
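To illustrate the kind of correlation decomposition involved, the sketch below assumes a hypothetical linear model (not necessarily the one used in this work): a true expression level G, a nonspecific-binding term N shared by PM and MM, and independent noise terms, with PM loading fully on G and MM loading on it with weight 0.3. All loadings and variances are invented for illustration. Given the covariance of (G, PM, MM), the correlation of any weighted signal w[0]*PM + w[1]*MM with G follows directly. The covariate weights that maximize it are the least-squares weights solve(Cov(PM, MM), Cov([PM, MM], G)). Because PM-only and PM - MM are special cases of the covariate model, this optimum can never be lower than either of theirs when the covariances are known. The sketch says nothing about statistical power when the weights must be estimated from data.

```python
import numpy as np

def covariance_from_loadings(loadings, variances):
    """Cov of (G, PM, MM) when each is a linear combination of independent terms."""
    A = np.asarray(loadings, dtype=float)
    return A @ np.diag(variances) @ A.T

def corr_with_expression(w, sigma):
    """Correlation of w[0]*PM + w[1]*MM with G; sigma is Cov of (G, PM, MM)."""
    w = np.asarray(w, dtype=float)
    s_xg = sigma[1:, 0]   # Cov([PM, MM], G)
    s_xx = sigma[1:, 1:]  # Cov of (PM, MM)
    return (w @ s_xg) / np.sqrt((w @ s_xx @ w) * sigma[0, 0])

def best_covariate_weights(sigma):
    """Weights maximizing the correlation with G (least-squares regression of G on PM, MM)."""
    return np.linalg.solve(sigma[1:, 1:], sigma[1:, 0])

# Hypothetical model: columns are G, N (shared nonspecific binding), e_PM, e_MM.
loadings = [[1.0, 0.0, 0.0, 0.0],   # G
            [1.0, 1.0, 1.0, 0.0],   # PM = G + N + e_PM
            [0.3, 1.0, 0.0, 1.0]]   # MM = 0.3*G + N + e_MM
sigma = covariance_from_loadings(loadings, [1.0, 0.5, 0.2, 0.2])

r_pm = corr_with_expression([1.0, 0.0], sigma)     # PM-only model
r_diff = corr_with_expression([1.0, -1.0], sigma)  # difference model
r_cov = corr_with_expression(best_covariate_weights(sigma), sigma)  # covariate model
```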
Results and conclusion: When modeling non-systematic variation, the covariate model provides maximum flexibility and often reflects the actual gene expression levels better than the difference model. However, the PM-only model demonstrates superior power in the overwhelming majority of realistic situations, providing theoretical support for the current trend toward PM-only models in microarray data analyses.