Ratio Statistics of Gene Expression Levels and
Applications to Microarray Data Analysis
Yidong Chen1, Vishnu Kamat2,
Edward R. Dougherty2,*, Michael L. Bittner1, Paul S.
Meltzer1, and Jeffery M. Trent1
1Cancer Genetics Branch, National Human
Genome Research Institute, National Institutes of Health, Bethesda, MD 20892
2Department of Electrical Engineering,
Texas A&M University, College Station, TX 77843-3128
Corresponding author: Edward
R. Dougherty (e-dougherty@tamu.edu)
and Yidong Chen (yidong@nhgri.nih.gov).
1. Abstract. [PDF]
Motivation: Expression-based analysis for large families of genes has
recently become possible owing to the development of cDNA microarrays, which
allow simultaneous measurement of transcript levels for thousands of genes. For
each spot on a microarray, signals in
two channels must be extracted from their backgrounds. This requires algorithms
to extract signals arising from tagged mRNA hybridized to arrayed cDNA
locations and algorithms to determine the significance of signal ratios.
Results: This paper
focuses on estimation of signal ratios from the two channels, and the
significance of those ratios. The key issue is the determination of whether a
ratio is significantly high or low in order to conclude whether the gene is
up-regulated or down-regulated. The paper builds on an earlier study that
involved a hypothesis test based on a ratio statistic under the supposition
that the measured fluorescent intensities subsequent to image processing can be
assumed to reflect the signal intensities. Here, a refined hypothesis test is
considered in which the measured intensities forming the ratio are assumed to
be combinations of signal and background. The new method involves a
signal-to-noise ratio, and for a high signal-to-noise ratio the new test
reduces (with close approximation) to the original test. The effect of low
signal-to-noise ratio on the ratio statistics constitutes the main theme of the
paper. Finally, and in this vein,
a quality metric is formulated for spots. This measure can be used to decide
whether or not a spot ratio should be deleted, or to adjust various
measurements to reflect confidence in the quality of the measurement.
Key words: genomics, image processing, microarray, ratio statistics
2. Microarray Image Analysis.
A typical
glass-substrate and fluorescent-based cDNA microarray detection system is based
on a scanning confocal microscope, where two monochrome images are obtained
from laser excitations at two different wavelengths. We generally assume that
specific DNA products from the two samples have an equal probability of
hybridizing to the specific target; the fluorescence intensity measurement is
therefore a function of the amount of specific RNA available within each sample,
provided that the samples are well mixed and the amount of cDNA deposited at each
target location is sufficiently abundant. An earlier paper describes a previously
implemented image analysis system [Chen, 1997]. Since
measurement via image processing techniques is essential to the ratio
statistics discussed in this paper, we briefly summarize the key
modifications and improvements made to that system on the basis of
five years of experience. The block diagram of the image analysis system
is shown in Fig. 1. A microarray image is first segmented into individual cDNA
targets, either by manual interaction or by an automated algorithm. For each
target, the surrounding background fluorescence intensity is estimated, along
with the exact target location, fluorescence intensity, and expression ratio.
2.1. Target segmentation and clone information assignment. [PDF] Given a
typical microarray image with a regular pen-spotting pattern, many algorithms
have been proposed to automatically detect the location of the array within a
scanned image and thereby determine the segmentation of the cDNA target
locations. Some use a set of "landing lights" that form a predefined
spotting pattern to assist the detection process. Assuming that a good
robotic spotting process produces a rigid grid pattern, we have chosen to
implement a user-assisted graphical interface that overlays a grid on each image
to ensure that the grid is correctly placed. After segmentation, clone
information (clone ID, title, etc., usually supplied with the location in
a microtiter plate) must be connected to array locations before further
processing. Information regarding non-printed, duplicated, negative-control,
or housekeeping-gene locations is useful for normalization and other
calibration purposes. The mapping of clone information to microarrays is
determined by the robotic arrayer’s deposition program. We use a
standard array pattern that most arrayers can produce and a standard input
format for other mass-fabricated microarray layouts.
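To make the rigid-grid assumption concrete, the following is a minimal sketch of how a user-placed grid could be turned into one bounding box per target. It is an illustration only, not the interface described above: the function name `grid_cells`, the `(top, left, bottom, right)` box convention, and the axis-aligned, evenly pitched grid are all assumptions introduced here.

```python
from typing import List, Tuple

# Hypothetical box convention: (top, left, bottom, right), bottom/right exclusive.
Box = Tuple[int, int, int, int]


def grid_cells(origin: Tuple[int, int], pitch: Tuple[int, int],
               rows: int, cols: int) -> List[List[Box]]:
    """Return one bounding box per target for a rigid, axis-aligned grid.

    origin: (top, left) pixel of the first cell, as placed by the user.
    pitch:  (vertical, horizontal) spacing between adjacent targets, in pixels.
    """
    top0, left0 = origin
    dy, dx = pitch
    return [
        [(top0 + r * dy, left0 + c * dx,
          top0 + (r + 1) * dy, left0 + (c + 1) * dx)
         for c in range(cols)]
        for r in range(rows)
    ]


# Example: a 2 x 3 block of targets spaced 12 pixels apart.
cells = grid_cells(origin=(10, 20), pitch=(12, 12), rows=2, cols=3)
print(cells[1][2])  # box of the target in row 1, column 2
```

Each box can then be used as the key that links a target to its clone record, with the row and column order dictated by the arrayer's deposition program.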
2.2. Background detection. [PDF] We
assume that the total fluorescence emitted from a target location consists of
two components: fluorescent signal derived from fluor-tagged mRNA specific to
the target, and background fluorescence due to non-specific binding to the
glass surface or to the target itself. We use the peripheral area of
the target under consideration to estimate the background fluorescence that
would be present at the target site if there were no signal.
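The idea of estimating background from the peripheral area can be sketched as follows. This is a simplified illustration under stated assumptions: the periphery is taken to be a fixed-width frame just inside the target's bounding box, and the median is used as the estimator. Neither choice is specified in the text above, and the name `background_level` is hypothetical.

```python
from statistics import median


def background_level(image, box, margin=1):
    """Estimate local background as the median of the peripheral pixels.

    image:  2-D list of pixel intensities for one channel.
    box:    (top, left, bottom, right), bottom/right exclusive (assumed convention).
    margin: width in pixels of the peripheral frame (an assumption of this sketch).
    """
    top, left, bottom, right = box
    ring = [
        image[y][x]
        for y in range(top, bottom)
        for x in range(left, right)
        # Keep only pixels within `margin` of any edge of the box.
        if (y - top < margin or bottom - 1 - y < margin
            or x - left < margin or right - 1 - x < margin)
    ]
    return median(ring)


# Example: a 5 x 5 cell with a bright 3 x 3 target on a flat background of 10.
cell = [[10] * 5 for _ in range(5)]
for y in range(1, 4):
    for x in range(1, 4):
        cell[y][x] = 100
print(background_level(cell, (0, 0, 5, 5), margin=1))
```

A median is used here because it is insensitive to a few bright pixels that spill over from the target or from dust; a mean or a histogram mode would be equally plausible choices.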
2.3. Target detection. [PDF] A
target detection algorithm is designed to exploit the fact that cDNA
targets printed by the same print-tip have similar morphology. Therefore, a
target mask, which determines the rough location and shape, is obtained for
each print-tip in order to reduce the interference of background noise with the
expression signal. After the target mask is placed at each cDNA location, a
thresholding algorithm is employed to locate the cDNA target precisely.
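The two-stage scheme (a per-print-tip mask for rough shape, then thresholding for precise location) can be illustrated with the sketch below. The thresholding rule is not specified above, so a single fixed intensity threshold is assumed here; the function name `detect_target` and the mask representation (a 2-D list of 0/1 values aligned with the top-left corner of the bounding box) are likewise hypothetical.

```python
def detect_target(image, box, mask, threshold):
    """Return (row, col) coordinates of pixels judged to belong to the target.

    A pixel is kept only if it lies inside the print-tip mask AND its
    intensity exceeds `threshold` (a fixed cut-off assumed for this sketch).
    """
    top, left, _, _ = box
    return [
        (top + dy, left + dx)
        for dy, mask_row in enumerate(mask)
        for dx, inside in enumerate(mask_row)
        if inside and image[top + dy][left + dx] > threshold
    ]


# Example: the bright pixel at (3, 3) lies outside the mask and is ignored,
# and the dim pixel at (2, 2) lies inside the mask but fails the threshold.
image = [
    [10, 10, 10, 10],
    [10, 90, 80, 10],
    [10, 85, 12, 10],
    [10, 10, 10, 95],
]
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
print(detect_target(image, (0, 0, 4, 4), mask, threshold=50))
```

In practice the threshold would more likely be derived from the local background estimate than fixed by hand.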
2.4. Intensity measurement and
ratio calculation. [PDF] We use a
trimmed mean to estimate the intensity from each fluorescent channel. The
expression ratio is simply the background-subtracted fluorescence intensity
from the first sample channel, usually called the "red channel," divided by
that of the second sample channel.
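The ratio calculation can be sketched as below. The trimming fraction (10% from each end by default) and the function names `trimmed_mean` and `expression_ratio` are assumptions made for illustration; the text above does not state the fraction used.

```python
def trimmed_mean(values, trim=0.1):
    """Mean of `values` after discarding the lowest and highest `trim` fraction."""
    if not values:
        raise ValueError("no pixel values supplied")
    if not 0 <= trim < 0.5:
        raise ValueError("trim must satisfy 0 <= trim < 0.5")
    ordered = sorted(values)
    k = int(len(ordered) * trim)          # number of pixels dropped at each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def expression_ratio(red_pixels, green_pixels, red_bg, green_bg, trim=0.1):
    """Background-subtracted red intensity divided by that of the second channel."""
    red = trimmed_mean(red_pixels, trim) - red_bg
    green = trimmed_mean(green_pixels, trim) - green_bg
    if green <= 0:
        # The ratio is undefined when the denominator signal does not exceed background.
        raise ValueError("second-channel signal does not exceed its background")
    return red / green


# Example: one outlier pixel (100) is removed by trimming 20% from each end.
print(trimmed_mean([1, 2, 3, 4, 100], trim=0.2))                 # 3.0
print(expression_ratio([210] * 10, [110] * 10, red_bg=10, green_bg=10))  # 2.0
```

Trimming guards the channel estimate against a small number of saturated or dust-contaminated pixels, at the cost of discarding some valid ones. The explicit check on the denominator reflects the low signal-to-noise situations that the paper is concerned with.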
3. Figures. [PDF]
4. Appendices
i. Appendix A. [PDF]
ii. Appendix B. [PDF]