High Throughput Gene Expression Screening: Its Emerging Role in Drug Discovery

Tom Freeman

Gene Expression Group, The Sanger Centre, Hinxton Hall, Cambridge, UK

Abstract: The genetic makeup and the environment influence the health and welfare of an individual. At both the tissue and cellular level, physiological function can be correlated with the transcription of genes, whose protein products contribute to and influence the activity of biological systems. In order to understand these processes, it is therefore essential to determine the temporal and spatial patterns of gene expression and, with particular relevance to drug discovery, to define changes that occur during the development of disease or treatment with therapeutic agents. © 2000 John Wiley & Sons, Inc., Med Res Rev 20, No. 3, 197–202, 2000

Key words: gene expression; microarrays; drug targets

In recent years, our knowledge of gene sequence has increased massively, principally due to large-scale cDNA and genome sequencing programs. The availability of this information resource has fueled efforts to develop ways to analyze gene expression systematically and, as a result, there is now a range of approaches available that allow parallel analysis of a large number of genes. These tools can provide a comprehensive view of the genes expressed in samples of tissue, and even individual cells, and in so doing, will further our understanding of biochemical pathways and the functional role of novel genes. This article examines a number of these new technologies, highlighting their advantages and limitations, and outlines how they are beginning to play an important role in the process of drug discovery.

The process of drug discovery, from target identification to marketing of the product, has always been time-consuming and costly, and in many cases successful development has relied on serendipity as much as on design. In recent years, however, there have been major advances in a number of fields (notably combinatorial chemistry, genomics, proteomics, and bioinformatics) that now promise to rationalize the search for novel therapeutics. The hope is that, by employing these technologies, it should first be possible to identify a biochemical pathway, gene, or protein that is perturbed during the disease process, and then, based on this knowledge, design drugs to specifically interact with the predefined target (see this issue, T. Harris). Such is the promise of the “new” approach that almost all the major pharmaceutical companies are now investing substantially in the exploration and exploitation of these technologies. Crucial to the genomics component is the analysis of where, when, and to what degree genes are turned on (expressed), so-called “expression profiling.” This paper reviews developments in the field of expression profiling, highlights some of the insights it will provide, and explains how it will play a role in the development of novel drugs.

Correspondence to: Tom Freeman, Gene Expression Group, The Sanger Centre, Hinxton Hall, Cambridge, UK

Genes are stretches of genomic DNA that encode proteins. They are copied into messenger RNA (transcribed), and this mRNA carries the genetic code from the nucleus to the cytoplasm to direct protein synthesis. There are thought to be between 60,000 and 100,000 genes in the genome of man and other mammals. In each cell, only a proportion of these genes is active at any one time, thought to be in the region of 10,000–15,000. The expression of these genes instructs the protein synthetic machinery to produce a specific set of proteins that are required for the cell to perform its normal functional role. Certain genes are expressed in all cells all of the time, and encode so-called “housekeeping” proteins, whereas others are expressed only in certain cell types or at certain times. These proteins give a cell its unique structural and functional characteristics and ultimately make one cell different from another.

The complement of genes expressed by a cell is not, however, a fixed entity, and there are many genes whose expression can be induced or reduced, as required. This provides the flexibility in biological systems to respond and adapt to different stimuli, whether they are part of the normal development and homeostatic processes, or a response to injury, disease, or drug treatment. As transcription reflects physiological status, the ability to study the complement of genes expressed and the abundance of the mRNAs in a given tissue or cell type, i.e., their “expression signature,” ultimately provides valuable insights into their biochemistry and function.

There are, however, certain inherent difficulties in the study of gene expression. Unlike DNA, which is essentially the same in all the cells of an individual, there can be enormous variation in the abundance and distribution of a particular mRNA species among cells. Some genes are highly expressed, that is to say their mRNA is abundant (>1,000 copies per cell); other genes are weakly expressed, their transcripts being present at only a few copies per cell. In addition, because most tissues are composed of distinct cell populations, an mRNA that is present in only one of those cell types, perhaps already at low levels, becomes even rarer when the RNA is extracted from that tissue, as it is diluted in the RNA derived from nonexpressing cells. It is also becoming increasingly apparent that for many genes, the transcripts can exist in different forms, so-called alternative splice variants. These splice variants allow an even greater diversity in the complement of proteins that can be generated from the genetic code, but add another level of complexity to the analysis of gene expression.
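The dilution effect described above can be made concrete with a little arithmetic. The sketch below is purely illustrative: the function name, the copy number, and the fraction of expressing cells are assumptions chosen for the example, not measurements from this article, and it assumes that every cell contributes an equal amount of RNA to the extract.

```python
def mean_copies_per_cell(copies_in_expressing_cells, fraction_expressing):
    """Average transcript copies per cell in a whole-tissue RNA extract.

    Simplifying assumption: every cell, expressing or not, contributes
    the same amount of total RNA to the extract.
    """
    if not 0.0 <= fraction_expressing <= 1.0:
        raise ValueError("fraction_expressing must lie between 0 and 1")
    return copies_in_expressing_cells * fraction_expressing


# Hypothetical case: a weakly expressed gene (5 copies per cell) that is
# transcribed in only 2% of the cells making up the tissue.
diluted = mean_copies_per_cell(5, 0.02)
print(f"effective abundance: {diluted:.2f} copies per cell")
```

Under these assumed numbers, a transcript present at a few copies in the expressing cells behaves in the extract like one present at a fraction of a copy per cell, which is why assay sensitivity matters so much for mixed tissues.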

At the present time, we have just scratched the surface in our understanding of the expression pattern of most genes. This is largely because our ability to examine gene expression has for a long time been severely limited. The Northern blot, the most commonly used tool for such analysis, is slow, normally uses radioactivity for detection, requires large amounts of RNA but is still relatively insensitive, and makes it difficult to analyze more than a few genes at a time. These factors mean that the data are of variable quality, which makes this method unsuitable for any large-scale, routine application. As for our knowledge of the “global” picture of all the genes expressed in any given cell or tissue, it is almost nonexistent and could never realistically be addressed by such analytical techniques as the Northern blot. However, two factors are now set to change this, and, in so doing, promise to radically change the way we study and perceive biological systems. The first is the identification of the genes themselves, and the second is the development of new technology to analyze their expression.

In recent years, large-scale DNA sequencing projects have resulted in a massive increase in the availability of gene sequences for a range of different species from bacteria to man (see this issue, T. Harris and D. Bentley). We now possess full or partial sequences for probably more than half the genes present in man, with the remainder set to follow in the next couple of years with the completion of the human genome project. The hope is that a number of these newly identified genes or novel genes of the disease pathway may be drug targets of the future, but validating these targets will not be easy.

Knowledge of a novel gene’s sequence on its own usually provides few clues as to the functional role of the protein that it encodes. However, sequence information can be used for further characterization of the gene, and is a good starting point in defining its expression. If a gene’s expression is limited to certain tissues or cell types, or its expression is changed during disease, then we can begin to postulate a function and focus in on those sites. Likewise, if the expression of other genes maps to the same cells, we can begin to identify the protein complexes or biochemical pathways with which the gene product might interact. In the same way, if we know that a gene is only expressed, for instance, in the intestine, we can be fairly certain that it is not contributing to a disorder of the central nervous system. The other major goal of expression profiling has always been the identification of genes that are expressed at different levels between one system or experimental paradigm and another. Knowledge of these genes not only sheds light on the biochemical events underlying the change, but also in some cases provides a list of potentially interesting genes, for example, those that are expressed in malignant but not in normal cells. For these reasons, expression profiling will help in the understanding of gene function and the biology of complex systems, as well as in many aspects of the drug discovery process.

The last few years have witnessed the development of a plethora of new methodologies to study gene expression: subtractive hybridization,1 subtractive PCR,2 differential display,3 serial analysis of gene expression (SAGE),4 in situ PCR,5,6 and single cell PCR,7,8 to name but a few. While all can clearly provide insights into the complement of genes expressed in a particular system, what really makes the difference, and ultimately determines how widely they are used in the future, is detailed knowledge of how they work and the quality of the data they yield. The main issues in expression profiling are, therefore, as follows:

1. Sensitivity: can you detect low abundance sequences, and how much starting material (mRNA) do you require (smaller being preferable)?

2. Specificity: is the assay highly specific for the transcript of interest?

3. Quantitation: how accurately can it measure mRNA abundance?

4. Reproducibility: how robust is the methodology, and can it generate the same results week in, week out for a given sample?

5. Coverage: how many transcripts can be analyzed at once, and does the approach have the ability to detect previously uncharacterized genes?

6. Redundancy: how often is each transcript sampled (some systems have the potential to analyze the same mRNA a number of times, thereby increasing the complexity of the data)?

7. False positives: how often do things appear to be differentially expressed but turn out not to be?

8. Scale of analysis: is the approach amenable to high throughput screening?

9. Data output: does the assay give results that are easy to interpret?

10. Cost: what is the price of the system (is it proprietary), and what is the cost per gene per assay?

Although the methodologies mentioned above have been widely used in the research environment, not all are applicable for high throughput, routine analysis of gene expression. However, one approach above all others shows most promise in this respect and is now driving the field of expression profiling, that is, DNA “chips” or microarrays.

Although there are some important differences between DNA chips and microarrays, both work by a similar mechanism, that is, the hybridization of complex mixtures of DNA or RNA to complementary DNA probes immobilized on a solid surface. DNA chips are at present available through one company: Affymetrix. Affymetrix utilizes a novel method of light-directed solid phase synthesis of DNA on a glass substrate.9 Using technologies borrowed from the microchip industry, such as photolithography, they can now synthesize up to 300,000 different oligonucleotide features in an area just larger than 1 cm². If these oligos are designed to be complementary to specific sets of genes, they can be used to detect the hybridization of the transcripts of those genes in a complex pool of RNA.10 Forty DNA “features” (20 oligos matching the sequence and 20 mismatched oligos, which are used as controls) are required for each gene (sequence) represented on the chip. The Affymetrix technology represents a large step forward in the ability to perform the parallel analysis of many genes. Likewise, DNA microarrays, which were first developed by Patrick Brown and his group at Stanford,11 also consist of small features of DNA immobilized on a solid substrate, usually a glass microscope slide (for review, see Ref. 12). However, in contrast to chips, the DNA, which is often derived from cDNA clone inserts, is placed onto the support using robotic spotting tools. Most robotic microarray systems available now can easily deliver over 10,000 spots of DNA onto a standard microscope slide.13 In both cases, expression profiling is then performed by extracting the RNA from biological samples of interest, labeling cDNA or cRNA with fluorescence, hybridizing it to the chip or microarray, and finally examining the fluorescence intensity of each set of the oligonucleotide features or DNA spots. The brighter the feature or spot, the greater the level of expression. Comparison of expression levels among two or more samples can be performed by labeling one of the samples or a reference sample with one fluorochrome, e.g., Cy3, while one or more others are labeled with another, e.g., Cy5. Samples are hybridized at the same time, and differential expression, i.e., the change in the level of expression of a gene in one sample compared to the others, can then be calculated by measuring the ratio of the fluorescence derived from one sample vs. the other. If the ratio is close to one, then the level of expression has not changed between the samples; if it is significantly greater or lower than one, it can be taken as a strong indication that the gene has been regulated up or down.
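The ratio calculation described above can be sketched in a few lines. This is a minimal illustration, not the analysis used in any of the cited studies: the function names, the spot intensities, the choice of Cy3 for the reference and Cy5 for the sample, and the twofold cutoff are all assumptions made for the example. Ratios are expressed on a log2 scale so that increases and decreases of the same magnitude are symmetric around zero.

```python
import math


def log2_ratio(sample_intensity, reference_intensity, background=0.0):
    """log2 of the background-subtracted sample/reference fluorescence."""
    sample = sample_intensity - background
    reference = reference_intensity - background
    if sample <= 0 or reference <= 0:
        return None  # signal cannot be distinguished from background
    return math.log2(sample / reference)


def call_regulation(ratio_log2, threshold_log2=1.0):
    """Label a spot; a threshold of 1.0 is a twofold change (assumed cutoff)."""
    if ratio_log2 is None:
        return "no call"
    if ratio_log2 >= threshold_log2:
        return "up"
    if ratio_log2 <= -threshold_log2:
        return "down"
    return "unchanged"


# Hypothetical spot intensities in arbitrary units: (Cy5 sample, Cy3 reference)
spots = {
    "gene_a": (4000.0, 1000.0),  # ratio 4.0
    "gene_b": (1100.0, 1000.0),  # ratio 1.1
    "gene_c": (250.0, 1000.0),   # ratio 0.25
}
calls = {name: call_regulation(log2_ratio(cy5, cy3))
         for name, (cy5, cy3) in spots.items()}
print(calls)
```

In practice the two channels also need to be normalized against each other before ratios are compared, a step this sketch deliberately leaves out.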

A number of studies have employed both chips and arrays, which have served to illustrate the power of this approach.14–19 No longer do we have to restrict our analysis to certain genes we “think” may play a part in the function of a biological system; we can simply perform the analysis and then attempt to interpret the resulting expression profiles.

So what role will expression profiling play in the future of drug discovery? The ability to examine the expression of all or at least a large number of genes at once will provide new insights into the biology of disease. For instance, many have for a long time sought to identify the differences between normal and cancerous cells. Undoubtedly, there must be gene products that play a crucial role in metastasis and that could be targeted by drug treatment. Working out which those are, which cancers they are active in, and their specificity to tumor cells should now be possible. To that end, Golub et al.19 recently demonstrated that the molecular classification of cancer types could be predicted by gene expression profiling. Indeed, expression profiling is a powerful research tool for analyzing the differences between tumor types and distinguishing between tumors, which up until now have been classified on their morphological characteristics and the presence of a few markers. Once a more complete set of genetic markers has been defined, they can then be employed as the diagnostic tools of the future. The same is true for other pathologies that also result in an abnormal expression profile in the diseased tissue. Again, knowing which genes have been altered will provide new insights into the underlying effects of the disorder. Genetic disorders, which have a complex multigenic basis, can be mapped back to so-called quantitative trait loci (QTLs). Recently, microarrays in combination with QTLs were used to identify the genes underlying the genetic defect.20 (Also see this issue, D. West, et al. and T. Harris.)

One of the other main applications of expression profiling in the drug discovery process is in the characterization of a compound’s action. For many therapeutic agents, even those that are already on the market, little is known about their true mechanism of action, their effect on systems other than the one that has been targeted, and how one drug differs in its action from another related compound. Again, by monitoring expression levels before and at numerous times after administration, it may be possible to observe subtle differences in the action of one compound that might be completely unexpected or undetectable by other means.

In toxicology, many researchers are now looking to explore the mechanisms of the toxic response and the various pathways that are activated by compounds with different degrees of toxicity.21 It may be possible to identify specific patterns of expression (or “signatures”) that are indicative of a specific toxic response long before the damage can be observed by other means. Once the genes and pathways have been identified, smaller arrays of key genes might be used routinely to screen a compound for toxicity long before assays are currently performed.

These are still early days in the development of high throughput expression profiling and its application to the drug discovery process, and clearly a lot remains to be done. We have yet to identify all human genes and, to an even greater extent, those of the main model organisms, notably rat and mouse. There is also considerable room for improvement in the basic technologies and methodology. Many are now exploring novel surface chemistries that will help overcome the current limitations on the sensitivity of the systems and allow for an increase in the density of features. One of the major problems with the simultaneous analysis of the expression of thousands of genes is the sheer weight of data it generates. Not only is it essential to link the result from each DNA probe back to the parent sequence, it is also necessary to decide which results are significant and then generate a list of “interesting genes.” If, say, 300 genes change in a given experimental paradigm, which changes are due to experimental error, which are interesting and worthy of following up, and what do they tell us about what is really happening in the system under investigation? How also can one store and integrate this enormous amount of data over years of experimentation, and even compare data generated by different labs? These problems will undoubtedly remain for some time, but the impetus is clearly there to resolve them.
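One simple way to start separating reproducible changes from experimental noise is to repeat the hybridization and keep only genes whose change is both large and consistent across replicates. The sketch below illustrates that idea under stated assumptions: the function name, the replicate values, and both cutoffs (a mean change of at least twofold and a standard deviation of at most 0.5 log2 units) are hypothetical, and the article does not prescribe this or any other specific filter.

```python
import statistics


def shortlist(replicate_log2_ratios, min_abs_mean=1.0, max_sd=0.5):
    """Return (gene, mean, sd) for genes with a large, reproducible change.

    replicate_log2_ratios maps a gene name to its log2 expression ratios
    from repeated hybridizations. Both cutoffs are assumed values.
    """
    selected = []
    for gene, ratios in replicate_log2_ratios.items():
        if len(ratios) < 2:
            continue  # reproducibility cannot be judged from one experiment
        mean = statistics.mean(ratios)
        sd = statistics.stdev(ratios)
        if abs(mean) >= min_abs_mean and sd <= max_sd:
            selected.append((gene, round(mean, 2), round(sd, 2)))
    # Largest absolute change first
    return sorted(selected, key=lambda row: abs(row[1]), reverse=True)


# Hypothetical log2 ratios from three replicate hybridizations
data = {
    "gene_a": [2.1, 1.9, 2.0],     # consistent increase
    "gene_b": [1.8, -0.9, 2.2],    # large but erratic
    "gene_c": [0.1, -0.2, 0.0],    # essentially unchanged
    "gene_d": [-1.4, -1.6, -1.5],  # consistent decrease
}
interesting = shortlist(data)
print([gene for gene, _, _ in interesting])
```

A filter of this kind only addresses reproducibility; deciding which of the surviving genes are biologically interesting still requires linking each probe back to its sequence and to what is known about the system.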

While clearly there is much to do on many fronts, it is almost certainly true to say that expression profiling will help revolutionize the way we perform biological investigations in the future. RNA profiling, together with the many other genomics technologies, promises to have a major impact on the process of drug discovery.

REFERENCES

1. Watson JB, Margulies JE. Differential screening strategies. Develop Neurosci 1993;15:77–86.

2. Diatchenko L, Lau YF, Campbell AP, Chenchik A, Moqadam F, Huang B, Lukyanov S, Lukyanov K, Gurskaya N, Sverdlov ED, Siebert PD. Suppression subtractive hybridization: a method for generating differentially regulated or tissue-specific cDNA probes and libraries. Proc Natl Acad Sci USA 1996;93:6025–6030.

3. Liang, Pardee AB. Differential display of eukaryotic messenger RNA by means of the polymerase chain reaction. Science 1992;257:967–971.

4. Velculescu VE, Zhang L, Vogelstein B, Kinzler KW. Serial analysis of gene expression. Science 1995;270:484–487.

5. Nakai M, Kawamata T, Taniguchi T, Maeda K, Tanaka C. Expression of apolipoprotein E mRNA in rat microglia. Neurosci Lett 1996;211:41–44.

6. Ohtaka-Maruyama C, Hanaoka F, Chepelinsky AB. A novel alternative spliced variant of the transcription factor AP2alpha is expressed in the murine ocular lens. Dev Bio 1998;202:125–135.

7. Eberwine J, Yeh H, Miyashiro K, Cao Y, Nair S, Finnell R, Zettel M, Coleman P. Analysis of gene expression in single live neurons. Proc Natl Acad Sci USA 1992;89:3010–3014.

8. Dixon AK, Richardson PJ, Lee K, Carter NP, Freeman TC. Expression profiling of single cells using three prime end amplification (TPEA) PCR. Nucleic Acid Res 1998;26:4426–4431.

9. Lipshutz RJ, Fodor SP, Gingeras TR, Lockhart DJ. High density synthetic oligonucleotide arrays. Nat Genet Suppl 1999;1:20–24.

10. Lockhart DJ, Dong H, Byrne MC, Follettie MT, Gallo MV, Chee MS, Mittmann M, Wang C, Kobayashi M, Horton H, Brown EL. Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat Biotechnol 1996;14:1675–1680.

11. Schena M, Shalon D, Davis RW, Brown PO. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 1995;270:467–470.

12. Cheung VG, Morley M, Aguilar F, Massimi A, Kucherlapati R, Childs G. Making and reading microarrays. Nat Genet Suppl 1999;1:15–19.

13. Bowtell DD. Options available—from start to finish—for obtaining expression data by microarray. Nat Genet Suppl 1999;1:25–32.

14. DeRisi J, Penland L, Brown PO, Bittner ML, Meltzer PS, Ray M, Chen Y, Su YA, Trent JM. Use of a cDNA microarray to analyse gene expression patterns in human cancer. Nat Genet 1996;14:457–460.

15. Heller RA, Schena M, Chai A, Shalon D, Bedilion T, Gilmore J, Woolley DE, Davis RW. Discovery and analysis of inflammatory disease-related genes using cDNA microarrays. Proc Natl Acad Sci USA 1997;94:2150–2155.

16. Iyer VR, Eisen MB, Ross DT, Schuler G, Moore T, Lee JCF, Trent JM, Staudt LM, Hudson J Jr, Boguski MS, Lashkari D, Shalon D, Botstein D, Brown PO. The transcriptional program in the response of human fibroblasts to serum. Science 1999;283:83–87.

17. Cirelli C, Tononi G. Differences in brain gene expression between sleep and waking as revealed by mRNA differential display and cDNA microarray technology. Sleep Res Suppl 1999;1:44–52.

18. Bryant Z, Subrahmanyan L, Tworoger M, LaTray L, Liu CR, Li MJ, van den Engh G, Ruohola-Baker H. Characterization of differentially expressed genes in purified Drosophila follicle cells: toward a general strategy for cell type-specific developmental analysis. Proc Natl Acad Sci USA 1999;96:5559–5564.

19. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 1999;286:531–537.

20. Aitman TJ, Glazier AM, Wallace CA, Cooper LD, Norsworthy PJ, Wahid FN, Al-Majali KM, Trembling PM, Mann CJ, Shoulders CC, Graf D, St Lezin E, Kurtz TW, Kren V, Pravenec M, Ibrahimi A, Abumrad NA, Stanton LW, Scott J. Identification of Cd36 (Fat) as an insulin-resistance gene causing defective fatty acid and glucose metabolism in hypertensive rats. Nat Genet 1999;21:76–83.

21. Nuwaysir EF, Bittner M, Trent J, Barrett JC, Afshari CA. Microarrays and toxicology: the advent of toxicogenomics. Mol Carcinog 1999;24:153–159.

Tom C. Freeman is a Research Fellow and the head of the Gene Expression Group at the Sanger Centre, Wellcome Trust Genome Campus, Cambridge, UK. His pre- and postdoctoral work focused on the physiology and cellular biology of gastrointestinal epithelia. Since joining the Sanger Centre in 1994, his interests have centered on the development and application of technologies for the analysis of gene expression as a means to further the understanding of the molecular events underlying physiological and pharmacological function, and the role of newly identified genes.
