Ratio Statistics of Gene Expression Levels and Applications to Microarray Data Analysis

 

Yidong Chen¹, Vishnu Kamat², Edward R. Dougherty²,*, Michael L. Bittner¹, Paul S. Meltzer¹, and Jeffery M. Trent¹

¹Cancer Genetics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892

²Department of Electrical Engineering, Texas A&M University, College Station, TX 77843-3128

 

 

Corresponding authors: Edward R. Dougherty (e-dougherty@tamu.edu) and Yidong Chen (yidong@nhgri.nih.gov).

 

 

1.      Abstract.

Motivation: Expression-based analysis for large families of genes has recently become possible owing to the development of cDNA microarrays, which allow simultaneous measurement of transcript levels for thousands of genes. For each spot on a microarray, signals in two channels must be extracted from their backgrounds. This requires algorithms to extract signals arising from tagged mRNA hybridized to arrayed cDNA locations and algorithms to determine the significance of signal ratios.

Results: This paper focuses on estimating signal ratios from the two channels and assessing the significance of those ratios. The key issue is determining whether a ratio is significantly high or low, in order to conclude whether the gene is up-regulated or down-regulated. The paper builds on an earlier study that used a hypothesis test based on a ratio statistic, under the assumption that the fluorescent intensities measured after image processing reflect the signal intensities. Here, a refined hypothesis test is considered in which the measured intensities forming the ratio are assumed to be combinations of signal and background. The new method involves a signal-to-noise ratio, and for a high signal-to-noise ratio the new test closely approximates the original test. The effect of a low signal-to-noise ratio on the ratio statistics is the main theme of the paper. Finally, and in the same vein, a quality metric is formulated for each spot. This metric can be used to decide whether a spot ratio should be deleted, or to adjust various measurements to reflect confidence in the quality of the measurement.
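To make the effect of the signal-to-noise ratio concrete, the following Python sketch simulates a gene whose true expression is identical in both channels (true ratio 1) and adds independent Gaussian noise to each background-subtracted intensity. The noise model, the SNR definition (signal divided by the noise standard deviation), and the function name `ratio_iqr` are illustrative assumptions, not the statistical model developed in this paper; the sketch only shows that the observed ratio spreads out as SNR falls.

```python
import random
import statistics


def ratio_iqr(snr: float, n: int = 20000, seed: int = 0) -> float:
    """Interquartile range of simulated first/second channel ratios (true ratio 1).

    Assumed model (illustrative only): each background-subtracted channel
    intensity equals a common signal S plus independent Gaussian noise with
    standard deviation S / snr.
    """
    rng = random.Random(seed)
    signal = 1000.0
    sigma = signal / snr
    ratios = []
    for _ in range(n):
        first = signal + rng.gauss(0.0, sigma)
        second = signal + rng.gauss(0.0, sigma)
        if second != 0.0:  # guard against division by zero
            ratios.append(first / second)
    q1, _, q3 = statistics.quantiles(ratios, n=4)  # quartile cut points
    return q3 - q1


for snr in (20.0, 5.0, 2.0):
    print(f"SNR={snr:5.1f}  IQR of ratio={ratio_iqr(snr):.3f}")
```

At high SNR the ratio stays tightly clustered around 1; at low SNR the denominator occasionally approaches zero and the ratio distribution becomes very wide, which is why a test that ignores background noise can misjudge weak spots.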

 

Key words: genomics, image processing, microarray, ratio statistics

 

2.      Microarray Image Analysis.

A typical glass-substrate, fluorescence-based cDNA microarray detection system is built around a scanning confocal microscope, which produces two monochrome images from laser excitation at two different wavelengths. We generally assume that specific DNA products from the two samples have an equal probability of hybridizing to the specific target; therefore, the measured fluorescent intensity is a function of the amount of specific RNA available in each sample, provided that the samples are well mixed and that the amount of cDNA deposited at each target location is sufficiently abundant. An image analysis system we implemented earlier is described in [Chen, 1997]. Because the measurements obtained by image processing are essential to the ratio statistics discussed in this paper, we briefly summarize the key modifications and improvements made to that system in light of five years of experience. The block diagram of the image analysis system is shown in Fig. 1. A microarray image is first segmented into individual cDNA targets, either by manual interaction or by an automated algorithm. For each target, the surrounding background fluorescent intensity is estimated, along with the exact target location, the fluorescent intensity, and the expression ratio.

 

2.1.    Target segmentation and clone information assignment.

Given a typical microarray image with a regular pen-spotting pattern, many algorithms have been proposed to automatically locate the array within a scanned image and thereby segment the cDNA target locations. Some use a set of "landing lights" that form a predefined spotting pattern to assist the detection process. Because a well-behaved robotic spotting process produces a rigid grid pattern, we chose to implement a user-assisted graphical interface that overlays a grid on each image, ensuring that the grid is correctly placed.
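Once the user has positioned a rigid grid, segmentation reduces to computing one window per spot from a handful of grid parameters. The Python sketch below assumes a single rectangular grid described by the position of the upper-left spot centre, the spot pitch, and the number of rows and columns; the names `GridSpec` and `target_cells` are hypothetical and do not describe the interface actually used.

```python
from dataclasses import dataclass


@dataclass
class GridSpec:
    """Hypothetical grid parameters that a user would adjust interactively."""
    origin_x: float  # x of the upper-left spot centre, in pixels
    origin_y: float  # y of the upper-left spot centre, in pixels
    pitch_x: float   # centre-to-centre spacing along a row
    pitch_y: float   # centre-to-centre spacing along a column
    rows: int
    cols: int


def target_cells(grid: GridSpec):
    """Return {(row, col): (x0, y0, x1, y1)} with one box per spot.

    Each box is centred on the nominal spot position and is one pitch wide
    and one pitch tall, so adjacent boxes share their edges.
    """
    cells = {}
    for r in range(grid.rows):
        for c in range(grid.cols):
            cx = grid.origin_x + c * grid.pitch_x
            cy = grid.origin_y + r * grid.pitch_y
            cells[(r, c)] = (
                int(round(cx - grid.pitch_x / 2)),
                int(round(cy - grid.pitch_y / 2)),
                int(round(cx + grid.pitch_x / 2)),
                int(round(cy + grid.pitch_y / 2)),
            )
    return cells
```

For example, `target_cells(GridSpec(10, 10, 20, 20, 2, 3))[(1, 2)]` gives the box `(40, 20, 60, 40)`. Because the grid is rigid, correcting its placement only means re-entering these few parameters rather than editing individual spots.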

After segmentation, clone information (clone ID, title, etc., usually supplied together with the clone's location in a microtiter plate) must be linked to array locations before further processing. Information about non-printed, duplicated, negative-control, or housekeeping-gene locations is useful for normalization and other calibration purposes. The mapping of clone information to the microarray is determined by the robotic arrayer's deposition program. We use a standard array pattern that most arrayers can produce, and a standard input format for other mass-fabricated microarray layouts.
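The mapping step can be pictured as a lookup from array position to clone record, from which control locations can then be selected for normalization. The Python sketch below assumes, purely for illustration, that clones are deposited in the listed order and fill the array row by row; real deposition programs (print-tip blocks, plate interleaving) produce different orders. `Clone`, `assign_clones`, and `positions_of_kind` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Clone:
    clone_id: str
    title: str
    kind: str = "gene"  # e.g. "gene", "negative", "housekeeping", "empty"


def assign_clones(clones, rows, cols):
    """Map clones to (row, col) array positions.

    Assumption (hypothetical): clones are deposited in the order given,
    filling the array row by row. The real mapping must follow the
    arrayer's own deposition program.
    """
    if len(clones) > rows * cols:
        raise ValueError("more clones than array positions")
    # divmod(i, cols) turns a running index into (row, col)
    return {divmod(i, cols): clone for i, clone in enumerate(clones)}


def positions_of_kind(layout, kind):
    """Return the sorted array positions holding clones of the given kind."""
    return sorted(pos for pos, clone in layout.items() if clone.kind == kind)
```

With the layout in hand, `positions_of_kind(layout, "housekeeping")` or `positions_of_kind(layout, "negative")` returns the spots to use for normalization and calibration.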

 

2.2.    Background detection.

We assume that the total fluorescence emitted from a target location consists of two components: fluorescent signal derived from fluor-tagged mRNA specific to the target, and background fluorescence due to non-specific binding to the glass surface or to the target itself. We use the area surrounding each target to estimate the background fluorescence that would be present at the target site if no signal were present.
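A minimal Python sketch of a peripheral background estimate follows. The text does not specify the shape of the peripheral region or the statistic used, so this sketch assumes an annulus around the target centre and takes the median of its pixels, which resists the occasional bright dust speck or neighbouring-spot bleed. `local_background` and its parameters are hypothetical.

```python
import math
import statistics


def local_background(image, cx, cy, inner_r, outer_r):
    """Estimate background as the median of pixels in an annulus around a target.

    image: 2-D list of pixel intensities indexed as image[y][x].
    (cx, cy): target centre; inner_r should exceed the target radius so that
    target pixels are excluded. The annulus shape and the median are
    illustrative choices, not a specified estimator.
    """
    height, width = len(image), len(image[0])
    # restrict the scan to the bounding square of the outer circle
    y_lo = max(0, math.floor(cy - outer_r))
    y_hi = min(height - 1, math.ceil(cy + outer_r))
    x_lo = max(0, math.floor(cx - outer_r))
    x_hi = min(width - 1, math.ceil(cx + outer_r))
    values = []
    for y in range(y_lo, y_hi + 1):
        for x in range(x_lo, x_hi + 1):
            if inner_r <= math.hypot(x - cx, y - cy) <= outer_r:
                values.append(image[y][x])
    if not values:
        raise ValueError("annulus contains no pixels")
    return statistics.median(values)
```

Subtracting this value from the target intensity in each channel yields the background-subtracted intensities used to form the expression ratio.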

 

2.3.    Target detection.

Our target detection algorithm exploits the fact that cDNA targets printed by the same print-tip have similar morphology. A target mask, which gives the approximate location and shape of the targets, is therefore obtained for each print-tip to reduce the interference of background noise with the expression signal. After the target mask is placed at each cDNA location, a thresholding algorithm is applied to locate the cDNA target precisely.
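The two stages can be sketched in Python as follows. This is not the algorithm used here: it assumes a simple voting rule to build the per-print-tip mask (a pixel belongs to the mask if it is bright in at least half of that tip's spots) and a fixed cutoff of background plus `k` background standard deviations as a stand-in for the unspecified thresholding algorithm. All names and parameters are hypothetical.

```python
def print_tip_mask(patches, threshold, min_fraction=0.5):
    """Build a shared target mask from equally sized patches of one print-tip.

    patches: list of 2-D lists (same shape), each a window centred on a spot
    printed by the same print-tip. A pixel joins the mask when it exceeds
    `threshold` in at least `min_fraction` of the patches (assumed rule).
    """
    h, w = len(patches[0]), len(patches[0][0])
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            votes = sum(1 for p in patches if p[y][x] > threshold)
            mask[y][x] = votes >= min_fraction * len(patches)
    return mask


def detect_target(patch, mask, background, bg_sd, k=3.0):
    """Return (x, y) pixels inside the mask that exceed background + k * bg_sd.

    The fixed-multiple cutoff is a stand-in for the actual thresholding step.
    """
    cutoff = background + k * bg_sd
    return [
        (x, y)
        for y, row in enumerate(patch)
        for x, value in enumerate(row)
        if mask[y][x] and value > cutoff
    ]
```

Because the mask is shared by all spots from one print-tip, a bright artifact lying outside the typical spot shape is ignored even when it exceeds the threshold.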

 

2.4.    Intensity measurement and ratio calculation.

We use a trimmed mean to estimate the intensity in each fluorescent channel. The expression ratio is simply the background-subtracted fluorescent intensity of the first sample channel (usually called the "red channel") divided by that of the second sample channel.
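The intensity and ratio computation can be sketched in a few lines of Python. The trimming fraction is not stated in the text, so the 10% default below is an assumption, as is labelling the second channel "green" (a common convention). `trimmed_mean` and `expression_ratio` are hypothetical helpers; this sketch simply rejects spots whose second-channel signal does not exceed background, rather than applying any particular low-SNR treatment.

```python
def trimmed_mean(values, trim=0.1):
    """Mean after discarding the lowest and highest `trim` fraction of values.

    trim=0.1 (10% from each end) is an illustrative default.
    """
    if not 0 <= trim < 0.5:
        raise ValueError("trim must be in [0, 0.5)")
    data = sorted(values)
    k = int(len(data) * trim)  # number of values dropped from each end
    kept = data[k:len(data) - k]
    if not kept:
        raise ValueError("no values left after trimming")
    return sum(kept) / len(kept)


def expression_ratio(red_pixels, green_pixels, red_bg, green_bg, trim=0.1):
    """Background-subtracted red intensity divided by background-subtracted green intensity."""
    red = trimmed_mean(red_pixels, trim) - red_bg
    green = trimmed_mean(green_pixels, trim) - green_bg
    if green <= 0:
        raise ValueError("second-channel signal does not exceed background")
    return red / green
```

For example, `trimmed_mean([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])` drops 1 and 100 and returns 5.5, showing how trimming suppresses a single saturated or dust-contaminated pixel.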

 

 

3.      Figures.

 

4.      Appendices

i.      Appendix A.

ii.     Appendix B.