dna microarrays: a powerful genomic tool for biomedical and … · 2018. 10. 9. · 528 | trevino...

15
INTRODUCTION Genomics approaches have changed the way we do research in biology and medicine. We now can measure the ma- jority of mRNAs, proteins, metabolites, protein-protein interactions, genomic mu- tations, polymorphisms, epigenetic alter- ations, and micro RNAs in a single ex- periment. The data generated by these methods together with the knowledge de- rived by their analyses was unimaginable just a few years ago. These techniques, however, produce such amounts of data that making sense of them is a difficult task. So far, DNA microarray technologies are perhaps the most successful and ma- ture methodologies for high-throughput and large-scale genomic analyses. DNA microarray technologies initially were designed to measure the transcrip- tional levels of RNA transcripts derived from thousands of genes within a genome in a single experiment. This technology has made it possible to relate physiologi- cal cell states to gene expression patterns for studying tumors, diseases progression, cellular response to stimuli, and drug tar- get identification. For example, subsets of genes with increased and decreased activ- ities (referred to as transcriptional profiles or gene expression “signatures”) have been identified for acute lymphoblast leukemia (1), breast cancer (2), prostate cancer (3), lung cancer (4), colon cancer (5), multiple tumor types (6), apoptosis- induction (7), tumorigenesis (8), and drug response (9). Moreover, because the pub- lished data is increasing every day, inte- grated analysis of several studies or “meta-analysis,” have been proposed in the literature (10). These approaches de- tect generalities and particularities of gene expression in diseases. More recent uses of DNA microarrays in biomedical research are not limited to gene expression. DNA microarrays are being used to detect single nucleotide polymor- phisms (SNPs) of our genome (Hap Map project) (11), aberrations in methylation patterns (12), alterations in gene copy- number (13), alternative RNA splicing (14), and pathogen detection (15,16). In the last ten or 15 years, high quality arrays, standardized hybridization proto- cols, accurate scanning technologies, and robust computational methods have es- tablished DNA microarray for gene ex- pression as a powerful, mature, and easy to use essential genomic tool. Although the identification of the most relevant in- formation from microarray experiments is still under active research, very well established methods are available for a broad spectrum of experimental setups. In this publication, we present the most common uses of DNA microarray tech- nologies, provide an overview of their MOL MED 13(9-10)527-541, SEPTEMBER-OCTOBER 2007 | TREVINO ET AL. | 527 DNA Microarrays: a Powerful Genomic Tool for Biomedical and Clinical Research Address correspondence and reprint requests to Hugo A. Barrera-Saldaña, Departamento de Bioquímica, Facultad de Medicina de la Universidad Autónoma de Nuevo León, Avenida. Madero y Eduardo Aguirre Pequeño, Colonia, Mitras Centro Zip Code. 64460, Monterrey, Nuevo León, México. Phone: 818-329-4174 ext. 2587; Fax: 818-123-8249; E-mail : [email protected]. Submitted December 6, 2006; Accepted for publication July 2, 2007. Communicated by: Adolofo Martinez-Palomo Victor Trevino, 1,2 Francesco Falciani, 2 and Hugo A Barrera-Saldaña 3 1 Instituto Tecnológico y de Estudios Superiores de Monterrey, Monterrey, Nuevo León, México; 2 School of Biosciences, University of Birmingham, Birmingham, United Kingdom; 3 Laboratorio de Genómica y Bioinformática del Unidad de Laboratorios de Ingeniería y Expresión Genética, Departamento de Bioquímica, Facultad de Medicina de la Universidad Autónoma de Nuevo León. Monterrey, Nuevo León, México Among the many benefits of the Human Genome Project are new and powerful tools such as the genome-wide hybridization devices referred to as microarrays. Initially designed to measure gene transcriptional levels, microarray technologies are now used for comparing other genome features among individuals and their tissues and cells. Results provide valuable information on disease subcategories, disease prognosis, and treatment outcome. Likewise, they reveal differences in genetic makeup, regula- tory mechanisms, and subtle variations and move us closer to the era of personalized medicine. To understand this powerful tool, its versatility, and how dramatically it is changing the molecular approach to biomedical and clinical research, this review de- scribes the technology, its applications, a didactic step-by-step review of a typical microarray protocol, and a real experiment. Finally, it calls the attention of the medical community to the importance of integrating multidisciplinary teams to take advan- tage of this technology and its expanding applications that, in a slide, reveals our genetic inheritance and destiny. Online address: http://www.molmed.org doi: 10.2119/2006–00107.Trevino

Upload: others

Post on 08-Jul-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: DNA Microarrays: a Powerful Genomic Tool for Biomedical and … · 2018. 10. 9. · 528 | TREVINO ET AL. | MOL MED 13(9-10)527-541, SEPTEMBER-OCTOBER 2007 DNA MICROARRAYS FOR CLINICAL

INTRODUCTIONGenomics approaches have changed

the way we do research in biology andmedicine. We now can measure the ma-jority of mRNAs, proteins, metabolites,protein-protein interactions, genomic mu-tations, polymorphisms, epigenetic alter-ations, and micro RNAs in a single ex-periment. The data generated by thesemethods together with the knowledge de-rived by their analyses was unimaginablejust a few years ago. These techniques,however, produce such amounts of datathat making sense of them is a difficulttask. So far, DNA microarray technologiesare perhaps the most successful and ma-ture methodologies for high-throughputand large-scale genomic analyses.

DNA microarray technologies initiallywere designed to measure the transcrip-

tional levels of RNA transcripts derivedfrom thousands of genes within a genomein a single experiment. This technologyhas made it possible to relate physiologi-cal cell states to gene expression patternsfor studying tumors, diseases progression,cellular response to stimuli, and drug tar-get identification. For example, subsets ofgenes with increased and decreased activ-ities (referred to as transcriptional profilesor gene expression “signatures”) havebeen identified for acute lymphoblastleukemia (1), breast cancer (2), prostatecancer (3), lung cancer (4), colon cancer(5), multiple tumor types (6), apoptosis-induction (7), tumorigenesis (8), and drugresponse (9). Moreover, because the pub-lished data is increasing every day, inte-grated analysis of several studies or“meta-analysis,” have been proposed in

the literature (10). These approaches de-tect generalities and particularities of geneexpression in diseases.

More recent uses of DNA microarrays inbiomedical research are not limited to geneexpression. DNA microarrays are beingused to detect single nucleotide polymor-phisms (SNPs) of our genome (Hap Mapproject) (11), aberrations in methylationpatterns (12), alterations in gene copy-number (13), alternative RNA splicing (14),and pathogen detection (15,16).

In the last ten or 15 years, high qualityarrays, standardized hybridization proto-cols, accurate scanning technologies, androbust computational methods have es-tablished DNA microarray for gene ex-pression as a powerful, mature, and easyto use essential genomic tool. Althoughthe identification of the most relevant in-formation from microarray experimentsis still under active research, very wellestablished methods are available for abroad spectrum of experimental setups.In this publication, we present the mostcommon uses of DNA microarray tech-nologies, provide an overview of their

M O L M E D 1 3 ( 9 - 1 0 ) 5 2 7 - 5 4 1 , S E P T E M B E R - O C T O B E R 2 0 0 7 | T R E V I N O E T A L . | 5 2 7

DNA Microarrays: a Powerful Genomic Tool for Biomedicaland Clinical Research

Address correspondence and reprint requests to Hugo A. Barrera-Saldaña, Departamento deBioquímica, Facultad de Medicina de la Universidad Autónoma de Nuevo León, Avenida.Madero y Eduardo Aguirre Pequeño, Colonia, Mitras Centro Zip Code. 64460, Monterrey, NuevoLeón, México. Phone: 818-329-4174 ext. 2587; Fax: 818-123-8249; E-mail: [email protected] December 6, 2006; Accepted for publication July 2, 2007.Communicated by: Adolofo Martinez-Palomo

Victor Trevino,1,2 Francesco Falciani,2 and Hugo A Barrera-Saldaña3

1Instituto Tecnológico y de Estudios Superiores de Monterrey, Monterrey, Nuevo León, México; 2School of Biosciences, University ofBirmingham, Birmingham, United Kingdom; 3Laboratorio de Genómica y Bioinformática del Unidad de Laboratorios de Ingeniería yExpresión Genética, Departamento de Bioquímica, Facultad de Medicina de la Universidad Autónoma de Nuevo León. Monterrey,Nuevo León, México

Among the many benefits of the Human Genome Project are new and powerful tools such as the genome-wide hybridizationdevices referred to as microarrays. Initially designed to measure gene transcriptional levels, microarray technologies are nowused for comparing other genome features among individuals and their tissues and cells. Results provide valuable information ondisease subcategories, disease prognosis, and treatment outcome. Likewise, they reveal differences in genetic makeup, regula-tory mechanisms, and subtle variations and move us closer to the era of personalized medicine. To understand this powerful tool,its versatility, and how dramatically it is changing the molecular approach to biomedical and clinical research, this review de-scribes the technology, its applications, a didactic step-by-step review of a typical microarray protocol, and a real experiment.Finally, it calls the attention of the medical community to the importance of integrating multidisciplinary teams to take advan-tage of this technology and its expanding applications that, in a slide, reveals our genetic inheritance and destiny.Online address: http://www.molmed.orgdoi: 10.2119/2006–00107.Trevino

Page 2: DNA Microarrays: a Powerful Genomic Tool for Biomedical and … · 2018. 10. 9. · 528 | TREVINO ET AL. | MOL MED 13(9-10)527-541, SEPTEMBER-OCTOBER 2007 DNA MICROARRAYS FOR CLINICAL

frequent biomedical applications, de-scribe the steps of a typical laboratoryprocedure, guide the reader through theprocessing of a real experiment to detectdifferentially expressed genes, and listvaluable web-based microarray data andsoftware repositories.

TECHNOLOGY DESCRIPTIONIt is well known that complementary

single-stranded sequences of nucleicacids form double stranded hybrids. Thisproperty is the basis of the very powerfulmolecular biology tools such as Southernand Northern blots, in situ hybridization,and Polymerase Chain Reaction (PCR).In these, specific single-stranded DNAsequences are used to probe for its com-plementary sequence (DNA or RNA)forming hybrids. This same idea also isused in DNA microarray technologies.The aim, however, is not only to detectbut also to measure the expression levelsof not a few but rather thousands ofgenes in the same experiment. For thispurpose, thousands of single-strandedsequences that are complementary totarget sequences are bound, synthesized,or spotted to a glass support whose sizeis similar to a typical microscope slide.There are mainly two types of DNA arrays,depending on the type of spotted probes.One uses small single-stranded oligonu-cleotides (~22 nt) synthesized in situwhose leading provider is Affymetrix(Santa Clara, CA, USA, http://www.affymetrix.com). The other type of arraysuses complementary DNA (cDNA) ob-tained by reverse transcription of thegenes’ messenger RNAs (mRNA), com-pletion of the second strand, cloning ofthe double-stranded DNAs, and typicallyPCR amplification of their open readingframes (ORF), which become the boundprobes. One of the limitations of usinglarge ORF or cDNA sequences is an un-even optimal melting temperaturecaused by differences in their sizes andthe content of GC-paired nucleotides. Asecond problem is cross-hybridization ofclosely related sequences, overlappedgenes, and splicing variants. In oligo-based DNA arrays, the targeted nucleic

acid specie is redundantly detected bydesigning several complementaryoligonucleotides spanning each entiretarget sequence by segments. Theoligonucleotides are designed in such away to avoid the cDNA probe draw-backs and to maximize the specificity forthe target gene. Initially, DNA arrayswere based on nylon membranes that arestill in use. However, the glass providesan excellent support for attaching the nu-cleotide sequences, is less sensitive tolight than membranes, and is non-porous, allowing the use of very smallamounts of sample. There is a more re-cent and different technology that usesdesigned oligonucleotide probes at-tached to beads that are deposited ran-domly in a support. The position of eachbead and hence the sequence it carries isdetermined by a complex pseudo-se-quencing process. These types of arrays,provided by Illumina (SanDiego, CA,USA, http://www.illumina.com) aremainly used for genotyping, copy-numbermeasurements, sequencing, and detect-

ing loss of heterozygosity (LOH), allele-specific expression, and methylation. Arecent review of this technology has beenpublished elsewhere (17). For clinical re-search, however, the preferred technologyso far is the oligo-based microarrays whoseleading provider is Affymetrix.

The general process in microarrayexperiments is depicted in Figure 1.Fluorescent dyes are used to label theextracted mRNAs or amplified cDNAsfrom the tissue or cell samples to beanalyzed. The DNA array is then hy-bridized with the labeled sample(s) byincubating, usually overnight, andthen washing to remove non-specifichybrids. A laser excites the attachedfluorescent dyes to produce light whichis detected by a (confocal) scanner.The scanner generates a digital imagefrom the excited microarray. The digitalimage is further processed by special-ized software to transform the imageof each spot to a numerical reading.This process is performed, first, findingthe specific location and shape of each

5 2 8 | T R E V I N O E T A L . | M O L M E D 1 3 ( 9 - 1 0 ) 5 2 7 - 5 4 1 , S E P T E M B E R - O C T O B E R 2 0 0 7

D N A M I C R O A R R AY S F O R C L I N I C A L R E S E A R C H

Figure 1. Schematic Representation of a Gene Expression Microarray Assay. Arrows repre-sent process (left column) and pictures or text represent the product. Differences in theprotocol in one- and two-dye technologies are specific to the technology rather than tothe samples or question. For CGH, the process is similar, replacing mRNA by DNA.

Page 3: DNA Microarrays: a Powerful Genomic Tool for Biomedical and … · 2018. 10. 9. · 528 | TREVINO ET AL. | MOL MED 13(9-10)527-541, SEPTEMBER-OCTOBER 2007 DNA MICROARRAYS FOR CLINICAL

spot, followed by the integration (sum-mation) of intensities inside the de-fined spot, and, finally, estimating thesurrounding background noise. Back-ground noise generally is subtractedfrom the integrated signal. This finalreading is an integer value assumed tobe proportional to the concentration ofthe target sequence in the sample towhich the probe in the spot is directed.In competitive two-dye assays, thereading is transformed to a ratio equalto the relative abundance of the targetsequence (labeled with one type of flu-orochrome) from a sample respect to areference sample (labeled with anothertype of fluorochrome). In the one-dyeAffymetrix technologies, the fluores-cence is commonly yellow, whereas intwo-dyes technologies the colors usedare green for reference and red for sample(although a replicate using dye-swap iscommon). The choice of the technologythat is more appropriate depends onexperimental design, availability, costs,and the expected number of expressionchanges. In general, when only a mi-nority of the genes is expected tochange, a two-dye or reference designis more suitable, otherwise a one-dyetechnology may be more appropriate.

Finally, at the end of the experiment,an important issue derived from statisti-cal tests in microarray data is the conceptof the real significance of results and theconcomitant need for multiplicity of tests.For example, when applying a t-test, theresult is the probability that the observedvalues are given by chance. Commonly,we call a result significant when theprobability is smaller than five percent.For large-scale data, a t-test would beperformed thousands of times (one foreach gene) which means that from10,000 t-tests at five percent of signifi-cance level, we will call 500 genes differ-entially expressed merely by chancewhich is very close or even higher thanthose actually selected from experi-ments. Therefore, a correction to attemptto control for false positives should beperformed. The most common correctionmethod is the False Discovery Rate

proposed originally by Benjamini andHochberg (18) and extended by Storeyand Tibshirani (19).

APPLICATIONS IN BIOMEDICALRESEARCH

The ultimate output from any microar-ray assay, independent of the technology,is to provide a measure for each gene orprobe of the relative abundance of thecomplementary target in the examinedsample. In this section, we revise themost common applications of the dataderived from clinical studies using mi-croarrays irrespective of the technologyemployed.

Relating Gene Expression toPhysiology: Differential ExpressedGenes

The most common and basic questionin DNA microarray experiments iswhether genes appear to be downregu-lated (the expression has decreased) orupregulated (the expression has in-creased) between two or more groupsof samples. This type of analysis is es-

sential because it provides the simplestcharacterization of the specific molecu-lar differences that are associated witha specific biological effect. These signa-tures can be used to generate new hy-potheses and guide the design of fur-ther experiments. A statistical test isused to assess each gene to determinewhether the expression is statisticallydifferent between two or more groupsof samples (Figure 2). When comparingpopulations of individuals, a large num-ber of samples per class are needed toavoid interference from variation dueto individuals rather than experimentalgroup. For laboratory-controlled sam-ples, such as cell lines or strains, at leastthree biological replicates are recom-mended to compute a good estimate ofthe variance, hence the statistical confi-dence (as more replicates means moreconfidence and fewer false positives).Using a statistical technique calledpower analysis, it is possible to estimatethe number of samples required toidentify a high percentage of truly dif-ferentially regulated genes. Although

R E S E A R C H A R T I C L E

M O L M E D 1 3 ( 9 - 1 0 ) 5 2 7 - 5 4 1 , S E P T E M B E R - O C T O B E R 2 0 0 7 | T R E V I N O E T A L . | 5 2 9

Figure 2. Detection of Differential Expressed Genes. Large differences in gene expressionare likely to be genuine differences between two groups of samples (A and B) whereassmall differences are unlikely to be truly differences. Samples can be biological replicatesor unreplicated populational samples.

Page 4: DNA Microarrays: a Powerful Genomic Tool for Biomedical and … · 2018. 10. 9. · 528 | TREVINO ET AL. | MOL MED 13(9-10)527-541, SEPTEMBER-OCTOBER 2007 DNA MICROARRAYS FOR CLINICAL

the use of this approach is commonpractice in the design of biological ex-periments, its use is not widespread inthe microarray community.

To detect differentially expressedgenes, intuitive and formal statisticalapproaches have been proposed. Themost famous intuitive approach, pro-posed in early microarray studies, is thefold change in fluorescence intensity(20,21) expressed as the logarithm (base2 or log2) of the sample divided by thereference (ratios). In this way, foldchange equal to one means that the ex-pression level has increased two fold(upregulation), fold change equal to –1 means that the expression level hasdecreased two fold (downregulation)whereas zero means that the expressionlevel has not changed. Larger values account for larger fold changes. Geneswhose fold change is larger than a cer-tain (arbitrary) value, are selected forfurther analyses. Although fold changeis a very useful measure, the weak-nesses of this criterion are the overesti-mation for low expressed genes in thereference (denominators close to zerotend to elevate the value of the ratio),the subjective nature of the value thatdetermines a “significant” change, andthe tendency to omit small but signifi-cant changes in gene expression levels.For these reasons, currently the mostsensible option is following formal sta-tistical approaches to select differen-tially expressed genes. For two groupsof samples, the common t-test is the eas-iest option, while not the best, for ana-lyzing two-dye microarrays whose log2

ratios generate normal-like distributionsafter normalization (see next section),and the ANOVA (analysis of variance)test for more than two groups of sam-ples. These options apply for both one-and two-dye microarrays. If the data isnon-standardized, Wilcoxon or Mann-Whitney tests may be applied. A com-parison of differential expression statis-tical tests, including t-test, has beenpublished elsewhere (22).

The approaches we have described areunivariate. That is, one gene is tested at

a time independently of any other gene.There are multivariate procedures how-ever, where genes are tested in combina-tions rather than isolated. Whilst beingmore powerful (23–26), these approachesrequire a more complex analysis.

Biomarker Detection: SupervisedClassification

Disease type and severity often are de-termined by expert physicians or patholo-gists on the basis of patient symptoms orby analyzing features of the diseased tis-sue obtained by biopsy inspection. Thiscategorization may allow the choice of ap-propriate pharmacological or surgicaltherapy. In this context, the availability ofmolecular markers associated with clinicaloutcome have been useful in allowing dis-ease monitoring to begin at a very earlystage and complementing the clinical andhisto-pathological analysis. The more re-cent application of DNA microarrays inclinical research has been a very importantstep toward the development of morecomplex markers based on multi-gene sig-natures. The identification of gene expres-

sion “signatures” associated with diseasecategories is called biomarker detection orsupervised classification (Figure 3).

The fundamental difference betweenidentifying differentially expressed genesand identifying a set of genes of real di-agnostic or prognostic value is that a bio-marker needs to be predictive of diseaseclass or clinical outcome. For this reason,it must be possible to associate, to a givenset of marker genes, a rule that allowsidentification of an unknown sample. Theclassification accuracy of the biomarkeralso needs to be determined with robuststatistical procedures. Therefore, duringthe biomarker selection procedure, a sub-stantial fraction of the samples are setaside in order to evaluate independentlythe accuracy of the selected biomarkers(in terms of sensitivity and specificity).Thus, such studies require a relativelylarge number of samples.

We already explained that unlike differ-ential expression, in biomarker selectionfor diagnostics, a rule is needed to makepredictions. This rule is generated by aclassifier, a statistical model that assigns a

5 3 0 | T R E V I N O E T A L . | M O L M E D 1 3 ( 9 - 1 0 ) 5 2 7 - 5 4 1 , S E P T E M B E R - O C T O B E R 2 0 0 7

D N A M I C R O A R R AY S F O R C L I N I C A L R E S E A R C H

Figure 3. Biomarker Detection. Larger differences in gene expression are more likely to begenuine differences between two groups of samples (A and B) than small differences. Inthis case, a large number of samples are more informative than individual replications.

Page 5: DNA Microarrays: a Powerful Genomic Tool for Biomedical and … · 2018. 10. 9. · 528 | TREVINO ET AL. | MOL MED 13(9-10)527-541, SEPTEMBER-OCTOBER 2007 DNA MICROARRAYS FOR CLINICAL

sample to a certain category based ongene expression values. For example, asensible classifier for diabetes is whethersugar levels in serum reach certain value.In statistics, this classifier is referred to asunivariate. That is, only one variable(sugar level) is needed in the rule. Never-theless, for DNA microarray studies, it iscommon to obtain a large gene list usefulfor disease discrimination. Multiple genesprovide robustness in the estimation andconsider potential synergy between genes.Therefore, multivariate classifiers are com-monly used. For example, it is well knownthat obesity and parental predispositionto diabetes, in addition to sugar levels inserum, is a more precise diabetes diagno-sis criteria. Multivariate classifier can bedesigned using genes selected either by aunivariate method such as t-test, ANOVA,Wilcoxon, PAM (27), Golub’s centroid (1),or by a multivariate method (23–26).

Thus, the possibility to characterize themolecular state of diseased tissues hasled to an improvement in prognosis anddiagnosis as well as providing evidenceof the existence of distinct disease sub-classes in previously considered homo-geneous diseases.

Describing the Relationship Betweenthe Molecular State of BiologicalSamples: Unsupervised Classification

One key issue in the analysis of mi-croarray data is finding genes with asimilar expression profile across a num-ber of samples. Co-expressed genes havethe potential to be regulated by the sametranscriptional factors or to have similarfunctions (for example belonging to thesame metabolic or signaling pathways).The detection of co-expressed genestherefore may reveal potential clinicaltargets, genes with similar biologicalfunctions, or expose novel biologicalconnections between genes. On the otherhand, we may want to describe the de-gree of similarity between biologicalsamples at the transcriptional level (28).We may expect such analysis to confirmthat samples with similar biologicalproperties (for example samples derivedfrom patients affected by the same dis-

ease) tend to have a similar molecularprofile. Although this is true, it also hasbeen demonstrated that the molecularprofile of samples reflects disease hetero-geneity and therefore it is useful in dis-covering novel diseases sub-classes (5).From the methodological prospective,these questions can be addressed usingunsupervised clustering methods.

In this context, hierarchical clusteringis, among several options (29), one of themost used unsupervised classificationmethods (Figure 4). Other methods areavailable in several software packagessuch as R (The R Roundation for StatisticalComputing, http://www.r-project.org),GEPAS (30), TIGR T4 (31), (32), Gene-Spring (33), and Genesis (34). The coreconcept behind hierarchical clustering isthe progressive construction of gene orsample cluster by adding one element(gene, sample, or a smaller cluster) at thetime. In this way, more similar elementsare added early to small clusters whereasless similar elements are added to later

forming larger clusters. To decide whichelement is more similar to another, it isimportant to rely on a similarity or dis-similarity measure. Commonly usedmeasures include Euclidean distance (de-fined as the geometrical distance betweentwo elements in an n-dimensional space)and correlation distance. The result ofthe hierarchical clustering is therefore ahierarchical organization of patterns, sim-ilar to a phylogenetic tree. For example,in Figure 4b the most similar genes fiveand six are first merged to form a cluster,then genes one and two form a differentcluster which is lengthened later on byadding the next more similar gene three;and the process continues until all geneshave been included in a cluster and allclusters have been merged. For large-scale microarray data, it is common touse a simultaneous hierarchical clusteringfor samples and genes (32). Typically,genes are represented in the y-axis,whereas samples are drawn in the x-axis.A color-coded matrix (heatmap), where

R E S E A R C H A R T I C L E

M O L M E D 1 3 ( 9 - 1 0 ) 5 2 7 - 5 4 1 , S E P T E M B E R - O C T O B E R 2 0 0 7 | T R E V I N O E T A L . | 5 3 1

Figure 4. Unsupervised Classification and Detection of Co-expressed Genes. (A) Double-Hierarchical clustering of gene expression values (heatmap), in rows by genes, and incolumns by samples. Similar samples (columns) generate clusters easily identified. For ex-ample, the gene expression of samples A and C is similar across genes. However A and Care different from the rest. Co-expressed genes (rows) form tight and small clusters. A se-lected cluster framed by dotted lines is shown in B. (B) Hierarchical generation of clustersfrom a selected group of genes in A.

Page 6: DNA Microarrays: a Powerful Genomic Tool for Biomedical and … · 2018. 10. 9. · 528 | TREVINO ET AL. | MOL MED 13(9-10)527-541, SEPTEMBER-OCTOBER 2007 DNA MICROARRAYS FOR CLINICAL

samples and genes are sorted accordingto the results of the clustering, is used torepresent the expression values for eachgene in each sample. This two-dimen-sional clustering procedure is particularlysuitable to explore the results of a largemicroarray experiment (see Figure 4).

Identification of Prognostic GenesAssociated with Risk and Survival

In medicine, the association of prog-nostic factors with survival times is in-valuable. The link between gene expres-sion levels and survival times mayprovide a useful tool for early diagnosis,prompt therapeutic intervention, anddesigning patient-specific treatments.Consequently, the selection of biomark-ers that correlate with survival times isa very important objective in the analysisof microarray data. To date, a number ofapproaches have been developed. Themost commonly used procedures incor-porate genes into exponential, poison, orCox regression models using a univariatevariable selection procedure (35). Thegene selection procedure is summarizedin Figure 5. The selected genes combined

in clinical classes can then be used to de-tect variations in survival times usingboth the Kaplan-Meier method and sta-tistical tests. Often, researchers are inter-ested in finding subgroups of samplesindependently of the recorded clinicaldata whose survival times are signifi-cantly different. This information canthen be used to prescribe specific treat-ments. In previous sections, we haveshown how unsupervised data explo-ration methods such as cluster analysiscan be used to identify sub-groups ofsamples within what was previouslyconsidered an homogeneous disease.Once these sub-groups have been identi-fied, survival analysis can be used totest whether they are characterized bydifferent clinical outcomes (35).

Association of Genes with DiseaseSurrogate Markers: Regression Analysis

An interesting question in the analysisof microarray data derived from clinicalstudies is whether there is an associationbetween gene expression and an ordinalvariable that represent a response, ormore generally, a measure of disease

progression – a surrogate marker. Exam-ples of these variables are the concentra-tion of metabolites, proteins in serum,response to treatment or dosage, growth,or any other clinical measure whose nu-merical representation makes sense pro-gressively. The approach, depicted inFigure 6, is conceptually similar to thatintroduced in the Survival Analysis sec-tion of this review. The mathematicalmodel in the cases that relate the inde-pendent variable, such as time, levels ofmetabolites, protein, or treatment, to de-pendent variables (genes) is, commonly,a linear regression model. Nevertheless,such a model can be modified to includeother available information.

Genetic Disorders: Gene CopyNumber and Comparative GenomicHybridization

It is well known that several inheriteddiseases are a consequence of genetic re-arrangements such as gene duplications,translocations, and deletions. Moreover,these alterations are observed in cancercells as well. A specific microarray tech-nique used to detect these abnormalitiesin a single hybridization experiment iscalled Comparative Genomic Hybridiza-tion (CGH) (Pollack, 1999) (13). The coreconcept in CGH is the use of genomicDNA (gDNA) in the hybridization tocompare the gDNA from a disease sam-ple versus that of a healthy individual.Hence, a typical microarray design canbe used in this approach (see Figure 1).The signal intensity in all probes in themicroarray should, therefore, be verysimilar for healthy samples. Thus, differ-ences in gene copy number are easilydetected by changes in signal intensity.Using this technology, Zhao et al., (2005)(36) recently have characterized the vari-ations of gene copy number in severalcell lines derived from prostate cancerand Braude et al., (36) confirmed an al-teration in chronic myeloid leukemia.

Genetic Disorders: Epigenetics andMethylation

Around 80 percent of CpG-dinucleotidesare naturally methylated at the fifth posi-

5 3 2 | T R E V I N O E T A L . | M O L M E D 1 3 ( 9 - 1 0 ) 5 2 7 - 5 4 1 , S E P T E M B E R - O C T O B E R 2 0 0 7

D N A M I C R O A R R AY S F O R C L I N I C A L R E S E A R C H

Figure 5. Selection Procedure for Genes Associated with Survival Times as Risk Factors. Apositive gene (left plot) is that whose expression included as a risk factor in a survivalmodel (Cox, exponential, poison, etc.) can be fitted reasonably well (dotted line) to theoriginal survival times (steep solid line). The predicted survival curve from a negative gene(dotted line in right plot) is not close to the observed survival curve (steep solid line).

Page 7: DNA Microarrays: a Powerful Genomic Tool for Biomedical and … · 2018. 10. 9. · 528 | TREVINO ET AL. | MOL MED 13(9-10)527-541, SEPTEMBER-OCTOBER 2007 DNA MICROARRAYS FOR CLINICAL

tion of the cytosine pyrimidine ring (37).The patterns of cytosine methylationalong with histone acetylation and phos-phorylation control the activation anddeactivation of genes without changingthe nucleotide sequence (38). These regu-latory mechanisms are known as the epi-genetic phenomena. In particular, genesmethylated in their promoters becomeinactive irrespective of the presence ofthe transcriptional activators. Aberra-tions in any of these epigenetic patternscause several syndromes and may pre-dispose carriers to cancer (39). To detectpatterns of methylation using microar-rays, two main methods have been pro-posed (40). One is based on the enrich-ment of the unmethylated fraction ofCpG islands and the other focuses on thehypermethylated fraction. Both methodsmake use of methylation-sensitive re-striction enzymes to generate fragmentsenriched in either unmethylated ormethylated CpG sites (Figure 7). In thefirst method, sample and control gDNAare cleaved with methylation-sensitiveenzymes that cut unmethylated CpGsites generating protruding shorter frag-

ments leaving methylated CpG sites un-altered. Specific adaptors then are linkedto these protruding ends. Methylatedfragments subsequently are cut by a CpGspecific enzyme. The remaining frag-ments that contain the adaptor, thosethat were originally unmethylated, areamplified using PCR and primers com-plementary to the adaptors’ sequence.The result is that genes belonging to theunmethylated fraction are associatedwith higher fluorescent intensities onthe microarray. On the other hand, in thesecond method, the gDNA from the sam-ple and control samples are cleaved witha restriction enzyme to generate smallprotruding fragments. Fragments then arelinked to adaptors and cut by methylation-sensitive restriction enzymes leavingmethylated flanked fragments unalteredwhich are amplified using PCR. The re-sult is that the methylated fraction isamplified and detected in the microarray.The microarrays used in these experi-ments are, therefore, specially designed

R E S E A R C H A R T I C L E

M O L M E D 1 3 ( 9 - 1 0 ) 5 2 7 - 5 4 1 , S E P T E M B E R - O C T O B E R 2 0 0 7 | T R E V I N O E T A L . | 5 3 3

Figure 6. Selection Procedure for Genes Associated with Outcome. The expression of apositive gene (horizontal axis in left plot) is highly correlated with the associated outcome(vertical axis). For a non-associated gene (right plot), the gene expression (horizontal axis)is not correlated to outcome (vertical axis).

Figure 7. Detection of Altered Methylated Patterns and DNA Polymorphisms in Genomic DNA.Left Panel: Enrichment of unmethylated DNA fragments (see text). Right Panel: Enrichment ofhypermethylated fragments (see text). Scheme adapted from Schumacher et al. (2006) (41).

Page 8: DNA Microarrays: a Powerful Genomic Tool for Biomedical and … · 2018. 10. 9. · 528 | TREVINO ET AL. | MOL MED 13(9-10)527-541, SEPTEMBER-OCTOBER 2007 DNA MICROARRAYS FOR CLINICAL

to include such fragments. Using themethods described, methylation patternshave been screened for several types ofcancers (41–46).

Genetic Disorders and Variability:Gene Polymorphism and SingleNucleotide Polymorphism

The human genome carries at least tenmillion nucleotide positions that vary inat least one of 100 individuals in a popu-lation (47). The identification of thesesingle nucleotide polymorphisms (SNPs)is an important tool for identifying ge-netic loci linked to complex disorders(47). Although there are commerciallyavailable microarrays to detect SNP, thesetechnologies still are in their infancy andthe widespread distribution is still haltbecause of the relatively high cost persample. So far, the number of SNPsstored in public databases is more than

two million whereas the available mi-croarrays for SNPs detection only cover10,000 SNPs. The three major strategiesfor SNP genotyping using microarraysare all based on primer extension tech-niques depicted in Figure 8. The primerincluded in the microarray probe hy-bridizes to the target sequence preciselyadjacent to its SNP. The first strategy (seeFigure 8A) consists of mini-sequencingthe primer specific for each polymor-phism immobilized in the microarraysupport. PCR products, DNA poly-merase, and different color fluorescent-la-beled nucleotides are added in the hy-bridization-one-base-extension to detectthe SNPs in parallel. The genotype is de-tected by color combinations. The secondstrategy (see Figure 8B) uses the sameconcept of primer-specific hybridization,though combined with only one dye andmore than one base extension. The geno-

type is revealed by signal strength. Thethird strategy (see Figure 8C) makes one-base extension in solution combined withdifferent color fluorescent-labeled nu-cleotides. Primers then are captured byhybridization in the microarray. Thegenotype is detected by color combina-tions. Recent studies have produced ge-nome-wide SNP characterization for anumber of tumor types (48–50).

Chromatin Immunoprecipitation:Genetic Control and TranscriptionalRegulation

Transcription factors (TF) are regula-tory proteins that can bind specific DNAsequences (usually promoters) to controlthe level of gene expression. Mutationsor alterations in the expression or activa-tion of TF are known in several diseases(51). For example, abnormal over-expres-sion of the TF c-Myc is found in 90 per-

5 3 4 | T R E V I N O E T A L . | M O L M E D 1 3 ( 9 - 1 0 ) 5 2 7 - 5 4 1 , S E P T E M B E R - O C T O B E R 2 0 0 7

D N A M I C R O A R R AY S F O R C L I N I C A L R E S E A R C H

Figure 8. Major Techniques for Detection of SNPs Using Microarrays. Colors and patterns are used for illustrative purposes. Schemeadapted from Syvanen (2005) (48).

Page 9: DNA Microarrays: a Powerful Genomic Tool for Biomedical and … · 2018. 10. 9. · 528 | TREVINO ET AL. | MOL MED 13(9-10)527-541, SEPTEMBER-OCTOBER 2007 DNA MICROARRAYS FOR CLINICAL

cent of gynecological cancers, 80 percentof breast cancers, 70 percent of colon can-cers, and 50 percent of hepatocarcinomas(52). Therefore, establishing the link be-tween TF and their targets is essential tocharacterize and design better cancertherapies. To identify these targets, DNAfragments are incubated with a selectedTF that has been tagged (Figure 9). Thecomplex DNA-TF is precipitated usinga quite specific antibody against thetagged peptide. Precipitated DNA thenis labeled and hybridized in DNA mi-croarrays to reveal genome-wide targetsfor the selected TF (see Figure 9). An ex-perimental overview and computationalmethods for the analysis of these datahave been revised elsewhere (53,54).

Pathogen DetectionClassically, pathogen detection is

achieved through a series of clinical testswhich detect, generally, single pathogens.A battery of clinical assays is therefore

performed to typify a sample. A radicalrecent approach uses DNA microarraysto test for the presence of hundreds ofpathogens in a single experiment (15,16).For this, known sequences from eachpathogen are collected and those beingpathogen-specific are selected (Figure 10).The collection of specific sequences isused to build a purpose-specific microar-ray. Then genomic DNA from a patientbiopsy, or from a food sample suspectedto be infected, is extracted and hybridizedto the microarray. Pathogen detection issimply revealed by spot intensity.

AN OVERVIEW OF A TYPICALMICROARRAY EXPERIMENT

In this section we provide a brief de-scription of the typical workflow of amicroarray experiment and its dataanalysis (see Figure 1).

RNA ExtractionRNA can be extracted from tissue or

cultured cells using molecular biology lab-oratory procedures (although several com-mercial kits are available). The amount ofmRNA required is about 0.5/μ/g whichis equivalent to 20/μ/g of total RNA,though there is some variation depend-ing on the microarray technology. Whenthe amount of mRNA (or DNA) is scarce,

an amplification step, for example byPCR amplification of reverse transcribedcDNA, is needed before labeling.

LabelingmRNA is retro-transcribed using re-

verse transcriptase to generate cDNA.Labeling is achieved by including in thereaction (or in a separate reaction) modi-fied fluorescent nucleotides that aremade fluorescent by excitation at appro-priate wavelengths. The most commonfluorescent dyes used are Cy3 (green)and Cy5 (red). The unincorporated dyesusually are removed by column chro-matography or ethanol precipitation.

HybridizationHybridization is carried out according

to conventional protocols. Hybridizationsolution contains saline sodium citrate(SSC), sodium dodecyl sulphate (SDS)as detergent, non-specific DNA such asyeast DNA, salmon sperm DNA, orrepetitive sequences, blocking reagentslike bovine serum albumin (BSA) orDenhardt’s reagent, and labeled cDNAfrom the samples. Hybridization tem-peratures range from 42°C to 45°C forcDNA-based microarrays and from 42°Cto 50°C for oligo-based microarrays.Hybridization volumes vary between

R E S E A R C H A R T I C L E

M O L M E D 1 3 ( 9 - 1 0 ) 5 2 7 - 5 4 1 , S E P T E M B E R - O C T O B E R 2 0 0 7 | T R E V I N O E T A L . | 5 3 5

Figure 9. Chromatin Immuno-Precipitation(ChIP-on-chip) Essay. The generation of ahybrid gene between a gene for a tran-scription factor (TF) and a tag coding se-quence renders a quimaeric TF. Uponbinding to its DNA target the complex canbe pulled-down from the tag to recoversuch type of DNA sequences.

Figure 10. Multi-Pathogen Detection Using DNA Microarrays. Specific DNA sequences fromdisease-causing micro-organisms can be spotted on a microarray for pathogen detection.

Page 10: DNA Microarrays: a Powerful Genomic Tool for Biomedical and … · 2018. 10. 9. · 528 | TREVINO ET AL. | MOL MED 13(9-10)527-541, SEPTEMBER-OCTOBER 2007 DNA MICROARRAYS FOR CLINICAL

20/μ/L to 1 mL depending on the mi-croarray technology. A hybridizationchamber is usually needed to keep temperature and humidity constant.

ScanningAfter hybridization, the microarray is

washed in salt buffers of decreasing con-centration and dried by slide centrifuga-tion or by blowing air after immersion inalcohol. Then the slide is read by a scan-ner which consists of a device similar toa fluorescence microscope coupled witha laser, robotics, and digital camera torecord the fluorescent excitation. The ro-botics focuses on the slide, lens, camera,and laser by rows similar to a commondesktop scanner. The amount of signal(color) detected is presumed to be pro-portional to the amount of dye at eachspot in the microarray and hence propor-tional to the RNA concentration of thecomplementary sequence in the sample.The output is, for each fluorescent dye,a monochromatic (non-colored) digitalimage file typically in TIFF format. False-color images (red, green, and yellow) arereconstructed by specialized software forvisualization purposes only.

Image AnalysisThe goal in this step is to identify the

spots in the microarray image, quantifythe signal, and record the quality of eachspot. Depending on the software used,this step may need some degree ofhuman intervention. The digital imagesare loaded in specialized software witha pre-loaded design of the microarray(grid layout) which instructs the soft-ware to consider number, position,shape, and dimension of each spot. Thegrid is then accommodated to the actualimage automatically or manually. Fine-tuning of spot positions and shapes isusually performed to avoid any bias inthe robotic construction of the microar-ray. Human involvement is needed tomark those spots that could be artifactssuch as bubbles or scratches which arecommon. Finally, an automated integra-tion function is performed using the soft-ware to convert the actual spot readings

to a numerical value. The integrationfunction considers the signal and back-ground noise for each spot. The outputof the image analysis may be commonlya tab-delimited text file or a specific fileformat. Common image analysis soft-ware include ScanArray (PerkinElmer,Waltham, MA, USA), GenePix (Axon),(Molecular Devices Corporation, UnionCity, CA, USA) TIGR-SpotFinder/TM4(www.tigr.org), (The Institute for Ge-nomic Research, Rockville, MD, USA)and GeneChip (Affymetrix, Santa Clara,CA, USA). This process varies from auto-matic or semi-automatic to manual de-pending on the microarray technology,scanner, and software used.

NormalizationSystematic errors are introduced in la-

beling, hybridization, and scanning pro-cedures. The main aims of normalizationis to correct for these errors preservingthe biological information and to gener-ate values that can be compared betweenexperiments, especially when they weregenerated in, and with, different times,places, reagents, microarrays, or techni-cians. There are two types of normaliza-tion, “within” and “between” array nor-malization. “Within” array normalizationrefers to normalization applied in thesame slide and it is applicable, generally,to two-dye technologies. For this, let usdefine M = Log2(R/G) and A = Log2(R*G)/2 where R and G are the red and greenreadings respectively. Under the assump-tion that the majority of genes have notbeen differentially expressed, the major-ity of the M values should oscillatearound zero. “Within” normalization isfinally performed shifting the imaginaryline produced by the values of M (in ver-tical axis) to zero along the values of A (inhorizontal axis). This kind of normaliza-tion, sometimes called loess, usually isperformed by spatial blocks to avoid anybias in the microarray printing process(called print-tip-loess). “Between” nor-malization is necessary when at least twoslides are analyzed to guarantee that bothslides are measured in the same scaleand that its values are independent from

the parameters used to generate themeasurements. The goal is to transformthe data in such a way that all microar-rays have the same distribution of values.For two-dye technologies this is optionaland is commonly done through scalingor standardizing the values once withinnormalization has been performed. Forone-dye microarrays, between normaliza-tion is usually performed using methodsto equalize distributions such as quantile-normalization (55) after log2 transforma-tion. There are, however, a number ofnormalization methods. The right choiceis usually data-dependent. A compari-son of the results of different normaliza-tion methods is recommended.

Missing ValuesThe image analysis process (generally

in spotted microarrays) does not alwaysgenerate a value for a gene because thespot was defective or manually markedas faulty. This is not a major issue whengenes are replicated in several spots inthe microarray, because the reading ofthe gene still can be estimated using theremaining spots. If the value in a spot issystematically missing in several arrays,it should be removed from the analysis.If the number of missing values is low,the corresponding spots can simply notbe considered in all arrays. However,when the number of arrays is large, thiscould lead to the removal of severalspots. To avoid these problems, one mustuse only those methods that can dealwith missing values, or, use algorithmsto infer those values (30). Results should,therefore, be interpreted considering thatsome values were inferred.

FilteringCurrent microarrays contain more than

10,000 genes, spots, or probes. Dealingwith large amounts of data may requireexpensive computational resources andlarge processing times. A common prac-tice is to remove genes that have notshown significant changes across samples,genes with several missing data, or thosewhose average expression is very low(because low expressed genes are more

5 3 6 | T R E V I N O E T A L . | M O L M E D 1 3 ( 9 - 1 0 ) 5 2 7 - 5 4 1 , S E P T E M B E R - O C T O B E R 2 0 0 7

D N A M I C R O A R R AY S F O R C L I N I C A L R E S E A R C H

Page 11: DNA Microarrays: a Powerful Genomic Tool for Biomedical and … · 2018. 10. 9. · 528 | TREVINO ET AL. | MOL MED 13(9-10)527-541, SEPTEMBER-OCTOBER 2007 DNA MICROARRAYS FOR CLINICAL

susceptible to noise). The most commonapproaches use statistical tests (lower),signal-to-noise estimations (higher), vari-ability (higher), and average (higher).

TransformationThe numerical values from image

analysis are commonly integer numbersbetween one and 32,000 for both signaland background. The background nor-mally is subtracted from the signal. Thedistribution of these values is, however,concentrated in a narrow range and,therefore, is transformed using loga-rithms (base 2 generally) which generatenormal-like distributions. Negative val-ues resulting from subtraction may raiseproblems in transformations which areresolved by restricting the values or per-forming more robust transformationssuch as the generalized logarithm.

Statistical AnalysisThe procedure after image analysis

and data processing depends mainly onthe particular biological issue and dataavailable. These procedures have beendescribed in the Applications Section ofthis review.

ILLUSTRATING THE DETECTION OFDIFFERENTIALLY EXPRESSED GENES: THE CASE OF TERM PLACENTA

In previous sections, we have intro-duced the experimental and data analy-sis methods used in common microar-ray experiments. To illustrate theseprocedures, we will use a case studydesigned to identify genes that are pref-erentially expressed in placenta. Thisstudy, currently ongoing in our labora-tory, is part of a larger project whose re-sults are expected to assist further re-search revealing molecular mechanismsinvolved in fetus development, placen-tal function, and pathologies related topregnancy. To identify genes specific forhuman placenta, we used a two-colormicroarray. In this experiment, mRNAextracted from two normal human pla-centas was compared with a pool ofmRNA extracted from several normaltissues not including placenta. To gain

information on the variability expectedfrom experimental errors, we also com-pared two aliquots of the referencemRNA in the same array. An overviewof the process is depicted in Figure 11. A brief description of the detailed pro-cedure follows.

Step 1: mRNA Extraction andMicroarray Hybridization

Human total term placenta RNA iso-lated using proteinase K-phenol basedprotocol (described in (56)) and a pool ofcommercially available total RNAs fromseveral human tissues not including pla-centa were part of the set of reagents uti-lized in the EMBO-INER Advanced Prac-tical Course 2005 held in Mexico city(EMBO Courses and Workshops Pro-gramme, Heidelberg Germany, http://www.embo.org/courses_workshops/mexico.html). They were quality controlled byrunning them in a RNA 6000 Nano Assayfrom Agilent (Agilent Technologies Inc.,Santa Clara, CA, USA). First strand cDNAwas synthesized from each RNA (5 μg)sample by reverse transcription using anoligo-dT primer with a T7-promoter se-quence attached to its 5′ end, while s

strand resulted from treating the firststrands with RNase H plus DNA poly-merase I (Message Amp aRNA kit fromAmbion, Austin, TX, USA). Column puri-fied double-stranded cDNAs were tran-scribed (in vitro transcription) with T7RNA polymerase and the amplified RNAs(aRNAs) were purified also by columnbinding and subsequent elution. Fluores-cent labels were attached indirectly to thehybridization probes by a two-step proce-dure. The first step consisted of a reversetranscription of the aRNA using this timea mixture of all four desoxiribonu-cleotides and including aminoallyl-dUTP.In the second step, N-hydroxysuccin-imide-activated fluorescent dyes (Cy3 andCy5) were coupled to the cDNAs by reac-tion with the amino functional groups.Probes were preincubated with blockingreagents (human Cot DNA at 1 μg/mLand poly-dA DNA also at μg/mL) andthen hybridized to prehybridized (6XSSC, 0.5 percent SDS and one percentBSA) slides in hybridization buffer (50percent formamide, 6X SCC, 0.5 percentSDS and 5X Denhardt’s solution). Slideswere washed once in 2X SSC/0.1 percentSDS at 65°C for five minutes, twice in

R E S E A R C H A R T I C L E

M O L M E D 1 3 ( 9 - 1 0 ) 5 2 7 - 5 4 1 , S E P T E M B E R - O C T O B E R 2 0 0 7 | T R E V I N O E T A L . | 5 3 7

Figure 11. Experimental Design of the Placenta Microarray Experiment. RNAs from twoterm human placentas were compared with RNAs from a collection of human tissues, ex-cept placenta, in search of placental specific transcripts.

Page 12: DNA Microarrays: a Powerful Genomic Tool for Biomedical and … · 2018. 10. 9. · 528 | TREVINO ET AL. | MOL MED 13(9-10)527-541, SEPTEMBER-OCTOBER 2007 DNA MICROARRAYS FOR CLINICAL

0.1X SSC/0.1 percent SDS but first at65°C for ten minutes, and then at roomtemperature for two minutes, and finallyin isopropanol, also at room temperature,with slide centrifugation between eachwashing step, and stored in the dark untilscanning. Fluorescent probes were hy-bridized to cDNA microarrays (laboratorymade oligo-based microarray containinghalf of the probes in each of two slides).

Step 2: Microarray Scanning, SpotFinding and Image Processing

Microarrays were scanned using Sca-nArray Express (PerkinElmer, Waltham,MA, USA). Images obtained were ana-lyzed using ChipSkipper (EMBLEMTechnology Transfer GmbH, Heidelberg,Germany, http://www.embl-em.de) toobtain a single value for each spot repre-senting the ratio (in log2 scale) of themRNA expression level from placenta to

the reference mRNA from the pool ofnon-placenta tissues. A value of zerorepresents similar expression level inboth mRNA samples. A value of one rep-resents two-fold over-expression in pla-centa whereas a value of –1 representstwo-fold downregulation in placenta.One placental sample was hybridized induplicate into the two microarrays usinga dye-swap design. In this approach thelabeling scheme is reversed in two sepa-rate microarrays. To gain information onthe variability associated with experi-mental error, two aliquots of the refer-ence pool mRNA were compared on thesame microarray. Likewise the compari-son between experimental and controlsamples and the comparison betweenthe two control samples were performedin duplicate using the dye-swap design.To summarize, the experiment was per-formed using six microarrays (two pla-

centa samples compared with a referencein duplicate and two reference mRNA ascontrols, see Figure 11).

Step 3: Quality Assessment,Processing and Normalization

To ensure that all microarrays werecomparable in scale, we performedprint-tip loess normalization, shiftingthe imaginary M line to zero (Figure 12).We processed the dataset, removingfrom the analysis all control and emptyspots. Representative plots before andafter “within” normalization and pro-cessing for both placenta and controlexperiments are shown in Figure 12.Note that, as expected, there are impor-tant differences in ratio values (see Mvalue in Figure 12C-D) for highly ex-pressed genes (A value) in placentacompared with the reference (see Figure12C), whereas ratios in the control ex-periment are very close to zero (seeFigure 12D) indicating a very high re-producibility of the technology.

Step 4: Detection of DifferentialExpressed Genes

Duplicated spots were averaged togenerate a unique measure per gene perarray. To detect differentially expressedgenes, we used a one-sample t-testunder the null hypothesis of no differ-ential expression (mean ratio equalzero). Resulted P-values were adjustedfor multiplicity tests using the FalseDiscovery Rate (FDR) approach (18,57).Because of the small number of sam-ples, we treated the replicated biologi-cal samples as independent for prelimi-nary purposes only. The effect of thisexercise is a slight underestimation ofthe variance in favor of more sensibleresults. We treated the replicated bio-logical samples as independent to in-crease the level of confidence in the statistical tests. In addition, we limitedthe selection of differentially expressedgenes to those that fulfill two condi-tions: firstly, genes whose FDR value is less than 0.10 (ten percent corre-sponding to raw P-values less than0.0000118), and secondly, genes whose

5 3 8 | T R E V I N O E T A L . | M O L M E D 1 3 ( 9 - 1 0 ) 5 2 7 - 5 4 1 , S E P T E M B E R - O C T O B E R 2 0 0 7

D N A M I C R O A R R AY S F O R C L I N I C A L R E S E A R C H

Figure 12. Quality Assessment and Normalization. (A) Ratio values (M = Log2(R/G), R =Red channel, G = Green channel) versus average values (A = Log2(R×G)/2) for one pla-centa sample. Dots represent spots in the microarray. Crosses correspond to controlspots. Lines represent the tendency for each block (print-tip) in the microarray. (B) Con-trol assay, two reference mRNA aliquots were hybridized changing the dye color only.Symbols as in (A). (C) Normalized data from (A). (D) Normalized data from (B). Controlspots removed in (C) and (D).

Page 13: DNA Microarrays: a Powerful Genomic Tool for Biomedical and … · 2018. 10. 9. · 528 | TREVINO ET AL. | MOL MED 13(9-10)527-541, SEPTEMBER-OCTOBER 2007 DNA MICROARRAYS FOR CLINICAL

absolute fold expression is at least two.Using these criteria, 350 (out of 21,456)were selected. A subset of 205 genes isdepicted in Figure 13 (see step 5).

Step 5: ValidationTo verify the process of selection, we

made two comparisons. First, as negativecontrol, we followed the same selectioncriteria for the control microarrays thatmade use of the reference sample in bothchannels. The result was that no genesmatch the criteria. Second, we performeda comparison using the Tissue Expressiontool (http://www.t1dbase.org/page/TissueHome) from T1dbase (59). This tool makesuse of Gene Expression Atlas (59),SAGEmap (60), and TissueInfo (58), inte-grating all measurements in a single score(58). This score, estimated for several tis-sues, represents whether the expressionfor a gene is tissue-specific. Scores closer toone are meant to be tissue-specific whereasscores closer to zero represents no-tissue-specificity. From the 350 genes resulted inStep 4, we selected only those that are in-

cluded in this database. The result was 201genes. Several genes that seem to be over-expressed in the placentas processed here(darker colors in Figure 13A) shows con-sistently higher placenta-specific scores inT1dbase (darker colors in Figure 13B).These results suggest that the experimentis coherent and valid.

Step 6: AnalysisOnce genes have been selected, further

computational, literature, and laboratoryanalyses are needed to confirm, expand,or restrain the results. Here, the analysisonly dealt with comparing the resultswith T1dbase-Tissue Specific ExpressionTool. However, queries to Gene Ontol-ogy, KEEG pathways, Pubmed, Blasts,or any other pertinent database resourceshould be considered a compulsory step.

CONCLUSIONS AND TRENDSDNA microarrays are a powerful, ma-

ture, versatile, and easy-to-use genomictool that can be applied for biomedicaland clinical research. The research com-

munity is expanding the use of this ap-proach for novel applications. The mainadvantage is the genomic-wide informa-tion provided at reasonable costs. Biologi-cal interpretation however requires the in-tegration of several sources of information.In this context, a new discipline referredas Systems Biology is emerging that inte-grates biological knowledge, clinical infor-mation, mathematical models, computersimulations, biological databases, imaging,and high-throughput “omic” technologies,such as microarray experiments. There-fore, multidisciplinary groups involvingclinicians, biologists, statisticians, and, re-cently, bioinformaticians are being formedand expanded in all important researchinstitutions. Subsequently, virtually allbiology-related research areas are movingfrom merely describing cellular and mo-lecular components in a qualitative man-ner, toward a more quantitative approach.These new teams are generating hugeamounts of data and more convincingmodels to ultimately reveal hidden piecesin the biological puzzle. This new knowl-edge is having a crucial impact on thetreatment of diseases, because, amongother things, it individualizes subtypesof pathologies, disease risks, and survival,treatment, prognosis, and outcome,quickly moving biomedical research tothe era of personalized medicine.

All supplementary materials are avail-able online at molmed.org.

ACKNOWLEDGMENTSHABS thanks the Staff of the Microar-

ray Technology EMBO-INER AdvancedPractical Course for enjoyable course les-sons, materials and results; Peter Davies,Nancy and Greg Shipley of UT MedicalSchool for additional laboratory training;Albert Sasson for critical reading of themanuscript and the offices of the Deanof his school and of the President of hisUniversity for support. Victor Trevinothanks Darwin Trust of Edinburgh andCONACyT for his PhD scholarship, andITESM for support.

REFERENCES1. Golub TR et al. (1999) Molecular classification of

R E S E A R C H A R T I C L E

M O L M E D 1 3 ( 9 - 1 0 ) 5 2 7 - 5 4 1 , S E P T E M B E R - O C T O B E R 2 0 0 7 | T R E V I N O E T A L . | 5 3 9

Figure 13. Genes differentially expressed in placenta compared with other tissues. (A)Heatmap showing the relative gene expression in placenta. Darker color means higherexpression in placenta. Genes are ordered using a hierarchical clustering algorithm. (B)Heatmap showing the score in T1dbase corresponding to genes in (A). Darker colors rep-resent more specific expression.

Page 14: DNA Microarrays: a Powerful Genomic Tool for Biomedical and … · 2018. 10. 9. · 528 | TREVINO ET AL. | MOL MED 13(9-10)527-541, SEPTEMBER-OCTOBER 2007 DNA MICROARRAYS FOR CLINICAL

cancer: Class discovery and class prediction bygene expression monitoring. Science. 286:531–7.

2. van’t Veer LJ et al. (2002) Gene expression profil-ing predicts clinical outcome of breast cancer. Nature. 415:530–6.

3. Singh D et al. (2002) Gene expression correlatesof clinical prostate cancer behavior. Cancer Cell.1:203–9.

4. Wang T et al. (2000) Identification of genes differ-entially over-expressed in lung squamous cellcarcinoma using combination of cDNA subtrac-tion and microarray analysis. Oncogene. 19:1519–28.

5. Alon U et al. (1999) Broad patterns of gene ex-pression revealed by clustering analysis of tumorand normal colon tissues probed by oligonu-cleotide arrays. Proc. Natl Acad. Sci. U. S. A. 96:6745–50.

6. Ramaswamy S et al. (2001) Multiclass cancer di-agnosis using tumor gene expression signatures.Proc. Natl Acad. Sci. U. S. A. 98:15149–54.

7. Brachat A, Pierrat B, Brungger A, Heim J. (2000)Comparative microarray analysis of gene expres-sion during apoptosis-induction by growth factordeprivation or protein kinase C inhibition. Onco-gene. 19:5073–82.

8. Bonner AE, Lemon WJ, You M. (2003) Gene ex-pression signatures identify novel regulatorypathways during murine lung development: implications for lung tumorigenesis. J. Med. Gen.40:408–17.

9. Brachat A et al. (2002) A microarray-based, inte-grated approach to identify novel regulators ofcancer drug response and apoptosis. Oncogene.21:8361–71.

10. Rhodes DR et al. (2004) Large-scale meta-analysisof cancer microarray data identifies commontranscriptional profiles of neoplastic transforma-tion and progression. Proc. Natl Acad. Sci. U. S. A.101:9309–14.

11. Cutler DJ et al. (2001) High-throughput variationdetection and genotyping using microarrays. Ge-nome Res. 11:1913-1925.

12. Yan PS et al. (2001) Dissecting complex epi-genetic alterations in breast cancer usingCpG island microarrays. Cancer Res. 61:8375–80.

13. Pollack JR, Perou CM, Alizadeh AA, et al. (1999)Genome-wide analysis of DNA copy-numberchanges using cDNA microarrays. Nat. Genet.23:41–6.

14. Relogio A et al. (2005) Alternative splicing mi-croarrays reveal functional expression of neuron-specific regulators in Hodgkin lymphoma cells. J. Biol. Chem. 280: 4779–84.

15. Wang D et al. (2002) Microarray-based detectionand genotyping of viral pathogens. Proc. NatlAcad. Sci. U. S. A. 99:15687–92.

16. Conejero-Goldberg C et al. (2005) Infectiouspathogen detection arrays: viral detection in celllines and postmortem brain tissue. Biotechniques.39:741–51.

17. Fan JB, Chee MS, Gunderson KL. (2006) Highly

parallel genomic assays. Nat. Rev. Genet. 7:632–44.

18. Benjamini Y, Hochberg Y. (1995) Controlling theFalse Discovery Rate - a Practical and PowerfulApproach to Multiple Testing. J. R. Stat. Soc. Ser.B. 57:289–300.

19. Storey JD, Tibshirani R. (2003) Statistical signifi-cance for genomewide studies. Proc. Natl Acad.Sci. U. S. A. 100:9440–5.

20. Yue H et al. (2001) An evaluation of the perform-ance of cDNA microarrays for detecting changesin global mRNA expression. Nucleic Acids Res. 29:E41–41.

21. Mutch DM, Berger A, Mansourian R, Rytz A,Roberts MA. (2001) Microarray data analysis: apractical approach for selecting differentially ex-pressed genes. Genome Biol. 2: PREPRINT0009.

22. Kim SY, Lee JW, Sohn IS. (2006) Comparison ofvarious statistical methods for identifying differ-ential gene expression in replicated microarraydata. Stat. Methods Med. Res. 15:3–20.

23. Li LP, Weinberg CR, Darden TA, Pedersen LG.(2001) Gene selection for sample classificationbased on gene expression data: study of sensitivityto choice of parameters of the GA/KNN method.Bioinformatics. 17:1131–42.

24. Ooi CH, Tan P. (2003) Genetic algorithms appliedto multi-class prediction for the analysis of geneexpression data. Bioinformatics. 19:37–44.

25. Sha NJ et al. (2004) Bayesian variable selection inmultinomial probit models to identify molecularsignatures of disease stage. Biometrics 60:812–9.

26. Trevino V, Falciani F. (2006) GALGO: an R pack-age for multivariate variable selection using ge-netic algorithms. Bioinformatics. 22:1154–6.

27. Tibshirani R, Hastie T, Narasimhan B, Chu G.(2002) Diagnosis of multiple cancer types byshrunken centroids of gene expression. Proc. NatlAcad. Sci. U. S. A. 99:6567–72.

28. Getz G, Levine E, Domany E. (2000) Coupledtwo-way clustering analysis of gene microarraydata. Proc. Natl Acad. Sci. U. S. A. 97:12079–84.

29. Sheng Q, Moreau Y, Smet FD, Marchal K, MoorBD. (2005) Advances in Cluster Analysis of Mi-croarray Data. In: Azuaje F, Dopazo J (eds.) Dataanalysis and visualization in genomics and pro-teomics. John Wiley, Hoboken, NJ, pp. 153-171.

30. Vaquerizas JM et al. (2005) GEPAS, an experiment-oriented pipeline for the analysis of microarraygene expression data. Nucleic Acids Res. 33:W616–20.

31. Saeed AI, Hagabati NK, Braisted JC, et al. (2006)TM4 microarray software suite. DNA Microar-rays, Part B: Databases and Statistics 411:134-193.

32. Grewal A, Conway A. (2000) Tools for AnalyzingMicroarray Expression Data. Journal of Lab Au-tomation 5:62–4.

33. Sturn A, Quackenbush J, Trajanoski Z. (2002)Genesis: cluster analysis of microarray data.Bioinformatics. 18:207–8.

34. Eisen MB, Spellman PT, Brown PO, Botstein D.(1998) Cluster analysis and display of genome-wide expression patterns. Proc. Natl Acad. Sci. U. S. A. 95:14863–8.

35. Rosenwald A, Wright G, Chan WC, et al. (2002)The use of molecular profiling to predict survivalafter chemotherapy for diffuse large-B-cell lym-phoma. N. Engl. J. Med. 346:1937–47.

36. Zhao HJ, Kim Y, Wang P, et al. (2005) Genome-wide characterization of gene expression varia-tions and DNA copy number changes in prostatecancer cell lines. Prostate 63:187-197.

37. Braude I et al. (2006) Large scale copy numbervariation (CNV) at 14q12 is associated with thepresence of genomic abnormalities in neoplasia.BMC Genomics. 7:138.

38. Bird AP. (1986) Cpg-Rich Islands and the Func-tion of DNA Methylation. Nature. 321:209–13.

39. Henikoff S, Matzke MA. (1997) Exploring and ex-plaining epigenetic effects. Trends Genet. 13:293–5.

40. Laird PW. (2003) The power and the promise ofDNA methylation markers. Nat. Rev. Cancer. 3:253–66.

41. Schumacher A, Kapranov P, Kaminsky Z, et al.(2006) Microarray-based DNA methylation pro-filing: technology and applications. Nucleic AcidsRes. 34:528–42.

42. Lodygin D, Epanchintsev A, Menssen A, Diebold J,Hermeking H. (2005) Functional epigenomicsidentifies genes frequently silenced in prostatecancer. Cancer Res. 65:4218–27.

43. Gebhard C et al. (2006) Genome-wide profilingof CpG methylation identifies novel targets ofaberrant hypermethylation in myeloid leukemia.Cancer Res. 66:6118–28.

44. Shi H et al. (2006) Discovery of novel epigeneticmarkers in non-Hodgkin’s lymphoma. Carcino-genesis. 28:60–70.

45. Zhang D et al. (2006) Microarray-based molecu-lar margin methylation pattern analysis in colo-rectal carcinoma. Anal. Biochem. 355:117–24.

46. Wei SH et al. (2006) Prognostic DNA methylationbiomarkers in ovarian cancer. Clin. Cancer Res.12:2788–94.

47. Piotrowski A et al. (2006) Microarray-based sur-vey of CpG islands identifies concurrent hyper-and hypomethylation patterns in tissues derivedfrom patients with breast cancer. Genes Chromo-somes Cancer. 45:656–67.

48. Syvanen AC. (2005) Toward genome-wide SNPgenotyping. Nat. Genet. 37:S5–10.

49. Teh MT et al. (2005) Genomewide single nu-cleotide polymorphism microarray mapping inbasal cell carcinomas unveils uniparental disomyas a key somatic event. Cancer Res. 65: 8597–603.

50. Hoque MO, Lee CC, Cairns P, Schoenberg M,Sidransky D. (2003) Genome-wide genetic char-acterization of bladder cancer: a comparison ofhigh-density single-nucleotide polymorphismarrays and PCR-based microsatellite analysis.Cancer Res. 63:2216–22.

51. Dumur CI et al. (2003) Genome-wide detection ofLOH in prostate cancer using human SNP mi-croarray technology. Genomics. 81:260–9.

52. Moreno-Rocha JC, Revol de Mendoza A, Barrera-Saldana HA. (1999) Genetic transcription in eu-karyotes: from transcriptional factors to disease.Rev. Invest. Clin. 51:375–84.

5 4 0 | T R E V I N O E T A L . | M O L M E D 1 3 ( 9 - 1 0 ) 5 2 7 - 5 4 1 , S E P T E M B E R - O C T O B E R 2 0 0 7

D N A M I C R O A R R AY S F O R C L I N I C A L R E S E A R C H

Page 15: DNA Microarrays: a Powerful Genomic Tool for Biomedical and … · 2018. 10. 9. · 528 | TREVINO ET AL. | MOL MED 13(9-10)527-541, SEPTEMBER-OCTOBER 2007 DNA MICROARRAYS FOR CLINICAL

53. Gardner L, Lee LA, Dang CV. (2002) c-myc Pro-tooncogene. In: Bertino JR (ed.) Encyclopedia ofCancer. Academic Press, San Diego, Calif., pp.555-561.

54. Wu J, Smith LT, Plass C, Huang TH. (2006) ChIP-chip comes of age for genome-wide functionalanalysis. Cancer Res. 66:6899–902.

55. Beyer A et al. (2006) Integrated assessment andprediction of transcription factor binding. PLoSComput. Biol. 2:e70.

56. Bolstad BM, Irizarry RA, Astrand M, Speed TP.(2003) A comparison of normalization methodsfor high density oligonucleotide array data basedon variance and bias. Bioinformatics. 19:185–93.

57. Barrera-Saldana HA, Robberson DL, Saunders GF.(1982) Transcriptional products of the humanplacental lactogen gene. J. Biol. Chem. 257:12399–404.

58. Storey JD. (2002) A direct approach to false dis-covery rates. J. R. Stat. Soc. Ser. B. 64:479–98.

59. Hulbert EM, Smink LJ, Adlem EC, et al. (2007)T1DBase: integration and presentation of com-plex data for type 1 diabetes research. NucleicAcids Research 35:D742-D746.

60. Su AI et al. (2002) Large-scale analysis of thehuman and mouse transcriptomes. Proc. Natl.Acad. Sci. U. S. A. 99:4465–70.

61. Lash AE et al. (2000) SAGEmap: a public geneexpression resource. Genome Res. 10:1051–60.

62. Huminiecki L, Lloyd AT, Wolfe KH. (2003) Con-gruence of tissue expression profiles from GeneExpression Atlas, SAGEmap and TissueInfo data-bases. BMC Genomics. 4:31.

63. Brazma A et al. (2001) Minimum informationabout a microarray experiment (MIAME)-towardstandards for microarray data. Nat. Genet. 29:365–71.

64. Spellman PT et al. (2002) Design and implemen-tation of microarray gene expression markuplanguage (MAGE-ML). Genome Biol. 3:RESEARCH0046.

R E S E A R C H A R T I C L E

M O L M E D 1 3 ( 9 - 1 0 ) 5 2 7 - 5 4 1 , S E P T E M B E R - O C T O B E R 2 0 0 7 | T R E V I N O E T A L . | 5 4 1