Which is Better for cDNA-Microarray-Based Classification: Ratios or Direct Intensities?

Sanju Attoor1, Edward R. Dougherty1,2,5, Yidong Chen3, Michael L. Bittner4, Jeffrey M. Trent4

1Department of Electrical Engineering, Texas A&M University, College Station, Texas
2Department of Pathology, University of Texas M. D. Anderson Cancer Center, Houston, Texas
3National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland
4Translational Genomics Research Institute, Phoenix, Arizona
5Corresponding author (e-dougherty@tamu.edu)

Abstract

Motivation: There are two general methods for making gene-expression microarrays: one is to hybridize a single test set of labeled targets to the probe, and measure the background-subtracted intensity at each probe site; the other is to hybridize both a test and a reference set of differentially labeled targets to a single detector array, and measure the ratio of the background-subtracted intensities at each probe site. Which method is better depends on the variability in the cell system and the random factors resulting from the microarray technology. It also depends on the purpose for which the microarray is being used. Classification is a fundamental application and it is the one considered here.

Results: This paper describes a model-based simulation paradigm that compares the classification accuracy provided by these methods over a variety of noise types and presents the results of a study modeled on noise typical of cDNA microarray data. The model consists of four parts: (a) the measurement equation for genes in the reference state; (b) the measurement equation for genes in the test state; (c) the ratio and normalization procedure for a dual-channel system; and (d) the intensity and normalization procedure for a single-channel system. In the reference state, the mean intensities are modeled as a shifted exponential distribution, and the intensity for a particular gene is modeled via a normal distribution, Normal(I, αI), about its mean intensity I, with α being the coefficient of variation of the cell system. In the test state, some genes have their intensities up-regulated by a random factor. The model includes a number of random factors affecting intensity measurement: deposition gain d, labeling gain, and post-image-processing residual noise. The key conclusion resulting from the study is that the coefficient of variation governing the randomness of the intensities and the deposition gain are the most important factors for determining whether a single-channel or dual-channel system provides superior classification, and the decision region in the α-d plane is approximately linear.
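
As a rough illustration of the reference-state and test-state intensity models described above, the following minimal Python sketch draws mean intensities from a shifted exponential distribution and then draws each gene's intensity from Normal(I, αI), read here as a normal distribution with mean I and standard deviation αI. The function names and all numerical values (shift, scale, fraction of up-regulated genes, fold-change range, and the uniform distribution used for the up-regulation factor) are assumptions made for illustration only and are not taken from the paper. Deposition gain, labeling gain, and residual noise are deliberately left out.

```python
import random


def simulate_reference(n_genes, alpha, shift=100.0, scale=3000.0, rng=None):
    """Return (means, observed) for genes in the reference state.

    shift and scale are hypothetical values, not the paper's parameters.
    """
    rng = rng or random.Random()
    # Mean intensity I of each gene: shifted exponential distribution.
    means = [shift + rng.expovariate(1.0 / scale) for _ in range(n_genes)]
    # Intensity of each gene: Normal(I, alpha * I), so that alpha is the
    # coefficient of variation (standard deviation divided by mean).
    observed = [rng.gauss(m, alpha * m) for m in means]
    return means, observed


def simulate_test(means, alpha, frac_up=0.1, min_fold=1.5, max_fold=3.0, rng=None):
    """Return (test_means, observed) for genes in the test state.

    A random subset of genes is up-regulated by a random factor. The
    fraction and the uniform fold-change range are assumptions.
    """
    rng = rng or random.Random()
    test_means = []
    for m in means:
        if rng.random() < frac_up:
            m = m * rng.uniform(min_fold, max_fold)
        test_means.append(m)
    observed = [rng.gauss(m, alpha * m) for m in test_means]
    return test_means, observed


if __name__ == "__main__":
    rng = random.Random(0)
    ref_means, ref_obs = simulate_reference(1000, alpha=0.15, rng=rng)
    test_means, test_obs = simulate_test(ref_means, alpha=0.15, rng=rng)
    n_up = sum(t > r for t, r in zip(test_means, ref_means))
    print("up-regulated genes:", n_up)
```

Increasing alpha in this sketch widens the spread of each gene's intensity around its mean, which is the cell-system variability that the study identifies, together with the deposition gain d, as most important for choosing between single-channel and dual-channel systems.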

1. Color Figures. [Figure 1, Figure 2, Figure 3]
2. Appendices. [pdf]
This appendix contains supplemental material for the paper and provides basic background material for readers who might not be familiar with some of the concepts concerning the model or the statistical methods. In addition to the model references in the main body of the paper, we refer to the literature for overview discussions concerning: microarrays [Mousses et al., 2000; Dobbin et al., 2003; Chen et al., 2002], microarray classification [Dougherty and Attoor, 2002; Dougherty, 2001], and Boolean regulatory networks [Somogyi and Greller, 2001; Huang, 1999; Shmulevich et al., 2002].
i. Appendix A1. Noise Sources
ii. Appendix A2. Parameter Estimation
iii. Appendix A3. Classification

    Updated: May, 2004
    Yidong Chen