genomics, genetic epidemiology, and genomic medicine

CLINICAL GENOMICS

Genomics, Genetic Epidemiology, and Genomic Medicine

KONSTANTINOS N. LAZARIDIS* and GLORIA M. PETERSEN*,‡

*Division of Gastroenterology and Hepatology, Center for Basic Research in Digestive Diseases, and ‡Division of Epidemiology, Mayo Clinic College of Medicine, Rochester, Minnesota

Medical science is on the threshold of unparalleled progress as a result of the advent of genomics and related disciplines. Human genomics, the study of the structure, function, and interactions of all genes in the human genome, promises to improve the diagnosis, treatment, and prevention of disease. This opportunity is the result of the recent completion of the Human Genome Project. It is anticipated that genomics will give physicians a powerful means to discover hereditary elements that interact with environmental factors to cause disease. However, the expected transformation toward genomics-based medicine will occur over decades. It will require many scientists and physicians to begin now to sort through the vast amount of information in the human genome and translate it into meaningful applications in clinical practice. Meanwhile, practicing physicians and health professionals need to be trained in the principles, applications, and limitations of genomics and genomic medicine. Only then will we be in a position to benefit patients, which is the ultimate goal of accelerating scientific progress in medicine. In this inaugural article, we introduce and discuss concepts, facts, and methods of genomics and genetic epidemiology that will be drawn on in the forthcoming topics of the clinical genomics series.

In April 1953, the seminal discovery of the double helical structure of DNA by James Watson and Francis Crick1 revolutionized the biologic sciences. Exactly 50 years later, the complete sequence of the human genome became a reality,2,3 a scientific landmark achieved over 13 years by an international effort known as the Human Genome Project (HGP). If the complete sequencing of the Homo sapiens genome ended the pre-genomic era, then the genome (or post-genome) era has already started.4,5 Given these extraordinary scientific achievements, it is timely to assess the current status and future influence of genomics in gastrointestinal and liver diseases. In this issue of the journal, precisely 52 years after the description of the double helix, we launch the first installment of a series on clinical genomics in Clinical Gastroenterology and Hepatology. The series will cover both single-gene (ie, Mendelian) and complex (ie, multifactorial) gastrointestinal and liver diseases.

The knowledge gained from the completion of the HGP, coupled with the rise of genomics and other related scientific fields, will promote basic and translational studies of how genetic predisposition and environmental factors interact to cause disease. These 2 elements have to be dissected to shed light on disease pathogenesis and to devise novel treatments before we can prevent illnesses. Amid all this exciting scientific progress, one important question comes to mind: will genomics alter the means we currently use to diagnose, treat, and prevent gastrointestinal and liver diseases? The answer is not simple, but the promise is enormous. We first need to understand where we currently stand and to recognize the challenges and opportunities that lie ahead.

In this article, we present principles and approaches of genomics and genetic epidemiology that will be discussed in future installments of the clinical genomics series.
Genomics is a scientific field that aims to investigate the structure, function, and interaction of all genes in the entire human genome.6 Genetic epidemiology is the discipline that investigates the basis of susceptibility to disease by using family and population studies.7 To this end, 3 remarks are in order. First, although we are at the beginning of the genomic era, this period is about more than simply discovering human genes. Indeed, what should define the genomic era in gastroenterology and hepatology is understanding the functions of the thousands of genes that regulate the cellular and molecular pathways of the digestive system and liver. Second, the human genome is in perpetual interaction with environmental factors that operate long before birth and contribute significantly to gastrointestinal and liver disease biology. Third, clinicians will have a significant influence on the direction and application of genomics in gastroenterology and hepatology, as they effectively assess and classify pertinent disease phenotypes and traits.

Abbreviations used in this paper: HGP, Human Genome Project; SNP, single nucleotide polymorphism; TDT, transmission disequilibrium test.

© 2005 by the American Gastroenterological Association
1542-3565/05/$30.00
PII: 10.1053/S1542-3565(05)00085-6
CLINICAL GASTROENTEROLOGY AND HEPATOLOGY 2005;3:320–328

In the next sections, we discuss the differences between single-gene and complex diseases, particularly as these relate to discovering disease-causing genes; the structure and elements of the human genome; the variation of the human genome; the relation of genetic variation to disease phenotypes; study designs to dissect disease-causing genetic variants; the human haplotype map; and the ethical, legal, and social implications of human genomics. A basic glossary of genomics and genetic epidemiology terms is included in the Appendix to familiarize readers with the terminology used in the present and upcoming articles of the clinical genomics series.

Single-Gene Diseases Versus Complex Diseases

From a "genetics" perspective, all diseases, aside from most cases of trauma, have a genetic component.8 In broad terms, there are 3 categories of genetic disorders: chromosomal, single-gene (Mendelian), and complex (multifactorial).

Chromosomal diseases result from the deletion or addition of intact chromosomes or chromosome segments and affect approximately 1% of live-born deliveries. Many chromosomal disorders lead to spontaneous abortions or miscarriages because missing or aberrant chromosomes are usually incompatible with life. Thus, the peak incidence of chromosomal disorders occurs before birth. In a live-born delivery, the loss or gain of a chromosome results in profound physical characteristics in the newborn (eg, Down syndrome, trisomy 21). This is because hundreds of genes normally expressed from that chromosome are lost or added. The advent of techniques like cytogenetics to study chromosome number and structure in peripheral blood lymphocytes or other cells has facilitated the prenatal diagnosis and prevention of chromosomal disorders (eg, through amniocentesis).

Single-gene diseases exhibit familial patterns consistent with autosomal recessive, autosomal dominant, or X-linked inheritance. The characteristic of dominant inheritance is that only a single trait-causing allele located on an autosome or an X chromosome is required to express the phenotype. The hallmark of an autosomal recessive phenotype is that both alleles (ie, paternal and maternal) must be trait-causing for the trait to be expressed. To develop an X-linked recessive disease, a male needs only a single trait-causing allele, whereas a female needs both alleles. Moreover, to express an X-linked dominant trait, only a single allele is required in either males or females. Mendelian diseases segregate in families, and full expression of the disease is caused by a few rare mutations of a single gene. In a given family, the same mutation is responsible for the disease phenotype; in another family, a different mutation of the same gene might occur. Single-gene diseases are uncommon in the population; the most frequent is hereditary hemochromatosis, which affects 1 of every 300 individuals. The genetic basis of Mendelian diseases is considered simple because of the direct correspondence of a specific genotype to a phenotype (Figure 1). More than 1000 genes causing Mendelian diseases have been identified. This catalog of human genes linked to genetic diseases is available online at Online Mendelian Inheritance in Man (OMIM) (www.ncbi.nlm.nih.gov/omim).
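The autosomal inheritance rules above can be made concrete by enumerating a Punnett square. The sketch below is an illustration, not a method from this article: it assumes a single autosomal locus with 2 alleles and equally likely transmission of each parental allele, and the function names and allele symbols ("A"/"a", "D"/"d") are hypothetical. X-linked inheritance is not modeled.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1: str, parent2: str) -> Counter:
    """Enumerate the 4 equally likely offspring genotypes (a Punnett square).

    Each parent is a 2-character string of alleles at one autosomal locus, eg "Aa".
    """
    counts = Counter()
    for allele1, allele2 in product(parent1, parent2):
        # Sort so that "aA" and "Aa" are counted as the same genotype.
        counts["".join(sorted(allele1 + allele2))] += 1
    return counts


def prob_affected(parent1: str, parent2: str, mode: str, disease_allele: str) -> Fraction:
    """Probability that a child is affected under a simple autosomal model.

    mode="recessive": 2 copies of disease_allele are required.
    mode="dominant": 1 copy of disease_allele is sufficient.
    """
    if mode not in ("recessive", "dominant"):
        raise ValueError("mode must be 'recessive' or 'dominant'")
    required = 2 if mode == "recessive" else 1
    counts = offspring_genotypes(parent1, parent2)
    affected = sum(n for genotype, n in counts.items()
                   if genotype.count(disease_allele) >= required)
    return Fraction(affected, sum(counts.values()))


# Two unaffected carriers of a recessive allele: 1 in 4 children affected.
print(prob_affected("Aa", "Aa", "recessive", disease_allele="a"))  # 1/4
# One affected heterozygous parent with a dominant allele: 1 in 2 children affected.
print(prob_affected("Dd", "dd", "dominant", disease_allele="D"))   # 1/2
```

The same enumeration shows why 2 carrier parents have a 1 in 2 chance of an unaffected carrier child ("Aa" appears in 2 of the 4 cells).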

Complex diseases, such as irritable bowel syndrome, nonalcoholic steatohepatitis, and inflammatory bowel disease, are considered multifactorial in etiology. These diseases are believed to be caused by the interaction of several genetic variants with environmental factors; hence the term "complex" diseases (Figure 2). As a result, the direct correspondence of one genotype to one phenotype that characterizes a single-gene disease does not exist in a complex disorder. This fundamental concept might explain the proposed heterogeneity of complex disease etiology and/or the variation of phenotypes (Figure 2), that is, disease manifestations, progression, and response to treatment. Thus, complex diseases have a genetic component that is not strictly Mendelian, but they demonstrate familial aggregation, in which the risk of disease among relatives of the proband is greater than the estimated risk in the general population.9 The sibling relative risk ratio (λs) was coined to define the risk of a sibling developing a specific disease if a biologic brother or sister is already affected. λs is calculated by dividing the prevalence of disease among siblings by the prevalence of disease in the general population.9 Therefore, the higher the value of λs, the greater the evidence for a genetic role in a disease.
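The λs definition above reduces to a single ratio. The helper below is a minimal sketch; the function name and the prevalence figures in the usage line are hypothetical and chosen only for illustration, not taken from any study. A value of 1 means siblings are at no higher risk than the general population.

```python
def sibling_relative_risk(prevalence_in_siblings: float,
                          prevalence_in_population: float) -> float:
    """Sibling relative risk ratio (lambda_s).

    prevalence_in_siblings: disease prevalence among siblings of affected probands.
    prevalence_in_population: disease prevalence in the general population.
    """
    if not 0 < prevalence_in_population <= 1:
        raise ValueError("population prevalence must be in (0, 1]")
    if not 0 <= prevalence_in_siblings <= 1:
        raise ValueError("sibling prevalence must be in [0, 1]")
    return prevalence_in_siblings / prevalence_in_population


# Hypothetical figures: 5% of siblings affected vs 0.5% of the population.
lambda_s = sibling_relative_risk(0.05, 0.005)
print(f"lambda_s = {lambda_s:.1f}")  # lambda_s = 10.0
```

Because siblings share environment as well as genes, a high λs points to familial aggregation rather than proving a genetic cause on its own.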

Mendelian and complex diseases lie at opposite ends of a spectrum. Mendelian diseases have low prevalence in the population and result from a single gene with high penetrance of the phenotype. Complex diseases, on the other hand, are the products of modest effects of multiple genetic variants (genes and non-gene genomic regions) and have high prevalence in the population. It is also important to stress that environmental factors contribute more to the disease phenotype in complex diseases than in Mendelian disorders. Our challenge is to identify the environmental factors that are harmful for an individual with genetic susceptibility (Figure 3).

Structure and Elements of the Human Genome

One of the main tasks of the HGP was to produce a 99.99% accurate human genome sequence that would be publicly available.10 With Internet access, the entire 3.2-gigabase human genome can be browsed, nucleotide by nucleotide, at http://genome.ucsc.edu/cgi-bin/hgGateway?org=human.

Figure 1. In Mendelian diseases, a single gene is responsible for causing a disorder, and the disease phenotype follows a predicted inheritance pattern (ie, autosomal dominant, autosomal recessive, or X-linked). In a family, all affected members carry exactly the same mutation. Mendelian diseases are characterized by a close correspondence of a genotype to a phenotype. Modified and reprinted with permission from Peltonen L, et al.5 Science 2001;291:1224–1229. Copyright 2001 AAAS (www.sciencemag.org).

Figure 2. In complex diseases, multiple genetic variants interact with each other and with the environment to cause the disease phenotype. Each genetic variant and environmental factor has a small effect on the phenotype. Because of the contribution of several genetic variants and environmental factors, complex diseases are heterogeneous in their pathogenesis, progression, and response to treatment. Modified and reprinted with permission from Peltonen L, et al.5 Science 2001;291:1224–1229. Copyright 2001 AAAS (www.sciencemag.org).

The international effort of the HGP has provided novel information2,3: (1) the number of genes in the human genome is approximately 30,000, far fewer than once expected; (2) human genes are unevenly spaced across the genome, with gene-rich and gene-poor chromosomes; (3) less than 2% of genomic DNA encodes proteins; (4) more than 50% of genomic DNA consists of repetitive sequences that might have functional capacity; (5) approximately 35% of human genes undergo alternative splicing (a molecular mechanism that generates protein isoforms with different functional capacities; see Appendix); and (6) protein-coding regions account for less than 50% of the DNA that has been conserved during the 70 million years since the divergence of human and mouse, suggesting that non-coding regions of the human genome are subject to evolutionary selection much more than was previously appreciated.

The goals of the HGP continue into the genome era and include: (1) classification and characterization of the entire set of functional elements encoded in the human genome, and sequencing of other mammalian and non-mammalian genomes, because comparing genome sequences from evolutionarily diverse species provides a strategy for discovering functionally important genomic elements; (2) examination of developmental and organizational genetic networks and protein pathways in humans, aimed at elucidating the mechanism(s) that contribute to cellular and/or whole-organ phenotypes; (3) comprehensive understanding of the heritable variation in the human genome; and (4) creation of policy guidelines that would assist the widespread use of genomic information in the research enterprise and clinical practice.4

Figure 3. In complex diseases, the susceptibility genotypes of 2 unrelated individuals (A and B) and their separate interactions with the environment define their present health status. Each genotype's norm of reaction will determine the future health course (ie, healthy vs sick, separated by the dotted line). Modified and reprinted with permission from Sing CF, et al.24 In: Variation in the human genome. Chichester, UK: John Wiley and Sons, 1996:211–232.


Variation of the Human Genome

Prevalence and Origin

The human species displays relatively limited genetic diversity (ie, variation or polymorphism) because of its young age (approximately 100,000 years) and the fact that genetic material has been transmitted through a small number (approximately 5,000) of generations from our ancestral origins.6 Nevertheless, as demonstrated by the HGP, genetic variation exists in both health and disease. A better understanding of the relation between genetic variation and the biologic function of gene(s) will undoubtedly furnish us with novel insights into human biology.11

Whereas monozygotic (ie, identical) twins share 100% of their genetic material, any 2 unrelated human beings share 99.9% of their genomic sequence.2,3 The remaining 0.1% difference translates to approximately 3 million genetic variants scattered across the human genome. These subtle genetic differences, coupled with an individual's unique environmental exposures (eg, household, lifestyle, habits), will determine the phenotypic variation we observe in health (eg, body weight) or disease (eg, irritable bowel syndrome, colon cancer). One of the aims of the HGP was to develop a comprehensive catalog of the millions of existing human genetic variants. These variants can be classified as (1) single nucleotide polymorphisms (SNPs), (2) repetitive sequences located within intergenic or intronic DNA (ie, microsatellites), and (3) insertions/deletions.2,3
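As a quick check of the arithmetic behind the 3 million figure, using the 3.2-gigabase genome size quoted above:

```latex
\[
0.001 \times \left(3.2 \times 10^{9}\ \text{bp}\right) = 3.2 \times 10^{6}\ \text{bp} \approx 3\ \text{million differing positions}
\]
```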

Genomic variation originates from a combination of random nucleotide substitutions and recombination events that occurred over thousands of generations.11–13 Genetic variants reflect the history of genetic events, most of which are innocuous.13

Categories of Variation

Single nucleotide polymorphisms. SNPs are highly abundant and account for the vast majority (>90%) of polymorphic loci in the human genome.12 Approximately every 500–1000 base pairs of human genome sequence, there is a SNP, a position where alternative nucleotides can exist. For example, a C/T SNP (ie, cytosine or thymine) is a nucleotide location that can harbor 1 of 2 alleles (C or T). The more frequent allele of a SNP in a population is called the major allele, and the other is the minor allele.12 The total number of SNPs detected depends on the number of individuals tested in a population. For instance, between 2 unrelated individuals there are approximately 3 million SNPs across their genomes. In a large population, however, the number of SNPs increases, likely exceeding 10 million.12 The location of each SNP within the genome might determine its functional significance. SNPs located within or near a gene are more likely to affect gene function, particularly if they introduce a stop codon or change an amino acid of a protein. SNPs in intergenic regions are thought to have no functional consequences for gene(s), but they can serve as useful genetic markers in disease mapping studies and population genetics.14
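The major/minor allele distinction can be computed directly from genotype counts. This is a minimal sketch assuming a biallelic SNP, diploid genotypes written as 2-character strings, and no missing calls; the function names and the sample genotypes are hypothetical.

```python
from collections import Counter


def allele_frequencies(genotypes: list[str]) -> dict[str, float]:
    """Allele frequencies at one SNP from diploid genotype strings such as "CT"."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def major_minor_alleles(genotypes: list[str]) -> tuple[str, str, float]:
    """Return (major allele, minor allele, minor allele frequency) for a biallelic SNP."""
    freqs = allele_frequencies(genotypes)
    if len(freqs) != 2:
        raise ValueError("expected exactly 2 observed alleles at a biallelic SNP")
    # Ties (both alleles at 0.5) are broken arbitrarily by sort order.
    major, minor = sorted(freqs, key=freqs.get, reverse=True)
    return major, minor, freqs[minor]


# Hypothetical C/T SNP genotyped in 5 individuals (10 chromosomes).
sample = ["CC", "CT", "CC", "TT", "CC"]
print(major_minor_alleles(sample))  # ('C', 'T', 0.3)
```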

Microsatellites. Microsatellites are short runs (less than 100 base pairs in length) of tandem di-, tri-, or tetra-nucleotide repeats with a very simple DNA sequence (eg, CACACACACACACACACA is a microsatellite consisting of 9 di-nucleotide [CA] repeats).9 In a population, polymorphic microsatellites have multiple alleles, usually 8–12. Each allele is represented by a different number of di-, tri-, or tetra-nucleotide repeats.9 Microsatellites are evenly spread across chromosomes and are less frequent in the human genome (approximately 100,000) than SNPs (up to 10 million), but they are more polymorphic. The polymerase chain reaction has been used to type microsatellites across the entire human genome and subsequently to identify alleles that are linked to disease. In fact, microsatellites have been used successfully in linkage studies of Mendelian diseases.
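Because each microsatellite allele is defined by its repeat count, naming an allele amounts to counting tandem copies of the repeat unit. The sketch below works on a sequence string with a hypothetical helper and hypothetical sequences; in practice, PCR-based typing usually infers the repeat number from the size of the amplified fragment, so this only illustrates the allele definition.

```python
import re


def longest_tandem_run(sequence: str, unit: str) -> int:
    """Largest number of consecutive, in-frame copies of `unit` found in `sequence`."""
    unit = unit.upper()
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    pattern = re.compile(f"(?:{re.escape(unit)})+")  # one or more back-to-back copies
    return max((len(m.group()) // len(unit)
                for m in pattern.finditer(sequence.upper())), default=0)


# The example from the text: 9 CA repeats.
print(longest_tandem_run("CACACACACACACACACA", "CA"))  # 9
# Two hypothetical alleles of the same microsatellite differ only in repeat number.
print(longest_tandem_run("GGTCACACACAGGT", "CA"),
      longest_tandem_run("GGTCACACACACACAGGT", "CA"))  # 4 6
```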

Other variations. Insertions and deletions are less common polymorphisms and represent approximately 5% of variation in the human genome. Insertions occur when one or more nucleotides are introduced into a DNA sequence (eg, AGGCC → AGGACC). Deletions occur when one or more nucleotides are lost from a sequence (eg, AGGCC → AGCC). Several other types of genetic variants exist in human chromosomes2; however, further discussion of this topic is beyond the scope of this article.

Significance of Human Genetic Variation

Genetic variation is the spice of life.12 SNPs are likely the most important genetic variants in the human genome because of their high frequency. SNPs can be the basis for a healthy trait or a disease phenotype. As inherited markers of variation, SNPs might lie in close proximity to a genetic factor that causes a disease. When recombination between a SNP and a disease-causing allele on a chromosome has not taken place, the SNP and the actual genetic factor are said to be in linkage disequilibrium and form a haplotype (Figure 4).15 An international effort is under way to define SNP-based haplotypes (ie, combinations of SNP alleles found at neighboring loci on the same chromosomal segment, which tend to be transmitted together from generation to generation). These chromosomal regions (termed haplotype-blocks) represent stretches of 25,000–35,000 base pairs spanning the human genome.16 More importantly, although many SNPs could be present in a haplotype-block, only a few (termed tag-SNPs) are needed to define each block and its haplotypes. Defining the haplotype-blocks and relevant tag-SNPs of our genomes is the current aim of the Human Haplotype Map (ie, the HapMap) project.4
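The article does not specify how linkage disequilibrium is quantified; a common choice, assumed here, is the pair of statistics D′ and r², computed from haplotype and allele frequencies. An r² close to 1 means one SNP's allele almost fully predicts the other's, which is the intuition behind letting a few tag-SNPs stand in for a whole haplotype-block. The function name and frequencies below are hypothetical.

```python
def ld_statistics(p_ab: float, p_a: float, p_b: float) -> tuple[float, float, float]:
    """Pairwise linkage disequilibrium between 2 biallelic loci.

    p_ab: frequency of the haplotype carrying allele a at locus 1 and allele b at locus 2.
    p_a, p_b: frequencies of allele a and allele b.
    Returns (D, signed D', r^2).
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b  # deviation from the haplotype frequency expected under independence
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max  # d_max > 0 because both loci are polymorphic
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical tag SNP and disease allele that always travel together (no recombination).
print(ld_statistics(p_ab=0.3, p_a=0.3, p_b=0.3))
# The same allele frequencies with alleles assorting independently.
print(ld_statistics(p_ab=0.09, p_a=0.3, p_b=0.3))
```

In the first call D′ and r² are both 1 (up to floating-point rounding); in the second, D and r² are essentially 0.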

SNPs and SNP-based haplotype methods will likely be powerful approaches for identifying the genetic basis of common complex diseases. In the near future, large-scale, whole-genome association studies based on tag-SNPs will become feasible and enable the identification of human haplotypes that predispose to disease.

Relation of Genetic Variation to Disease Phenotypes

The current theoretical and practical challenges in genomics and genetic epidemiology emerge from the goal of linking human genetic variation with complex disease phenotypes. To this end, 2 overarching hypotheses have been proposed: (1) the common disease–common variant hypothesis and (2) the common disease–rare allele hypothesis. Each has different implications for designing studies to discover the genetic basis of complex diseases.

The Common Disease–Common Variant Hypothesis

The common disease–common variant hypothesis is based on the fact that the present human population of 6 billion people represents a global expansion that occurred approximately 100,000 years ago from a single sub-Saharan African founding population of relatively small size (approximately 10,000 people). Thus, the current human population shares a number of alleles from this small group of founders. The hypothesis proposes that alleles present before the global expansion and divergence of humans contribute significantly to predisposition (ie, susceptibility alleles) to common complex diseases. Such alleles might confer moderate risk of common disease and should occur at relatively high frequencies (higher than 1%) in the present human population.17 This high frequency implies that association studies in large population cohorts will lead to identifying the susceptibility variants of common complex diseases. The presence of haplotype-blocks in the human genome and the fact that a limited number of common haplotypes account for the majority of haplotypes16 suggest that association studies with representative SNPs (tag-SNPs) will identify common haplotypes associated with predisposition to common complex diseases. This hypothesis is the theoretical basis for developing a genome-wide human haplotype map that describes all major haplotypes and the tag-SNPs that define them.4

Figure 4. The position of a genetic variant (SNP) is shown with an arrow on an ancestral chromosome. Because of meiotic recombination that occurs over thousands of generations, contemporary chromosomes have variable-length segments of the common ancestral chromosome (regions shown in white) that flank the original SNP (arrowhead), whereas new chromosomal sections introduced by recombination are depicted by regions shown in gray. Thus, genetic markers (SNPs) within the regions shown in white that are in physical proximity to the original SNP (arrowhead) will remain associated and in linkage disequilibrium with it. Modified and reprinted with permission from Ardlie KG, et al.25 (http://www.nature.com/).

The Common Disease–Rare Allele Hypothesis

An opposing view proposes that most common complex diseases are caused by rare rather than frequent alleles.18,19 This hypothesis predicts extensive allelic and locus heterogeneity at complex disease loci (ie, different alleles at the same locus, and alleles at numerous different loci, independently cause the same disease phenotype). Furthermore, it postulates that more than 99% of the variants predisposing to common complex diseases arose after the global expansion and divergence of the human population.18 If this hypothesis is true, genome-wide association studies in a heterogeneous population that search for susceptibility alleles of common complex diseases will be fruitless. Similarly, the current construction of a human haplotype map based on common alleles would be inadequate to define the variants of common complex diseases.

Study Designs to Dissect Disease-Causing Genetic Variants

Linkage Analysis

This well-established approach to localizing disease genes has proven useful for Mendelian disorders.9 Genetic linkage analysis is based on the fact that alleles of disease genes and the genetic markers being analyzed coexist on the same chromosomes and should segregate together (ie, they are physically linked). However, chromosomes might not stay intact during meiotic recombination; crossing over between a pair of homologous chromosomes can separate a disease gene and a genetic marker onto different chromosomes. It follows that the distance between 2 loci on a chromosome is related to the probability that their alleles are transmitted independently (ie, closely located alleles will segregate together because recombination between them is rarer).
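The link between inter-locus distance and independent transmission is often summarized with a map function. The article does not name one; the sketch below assumes Haldane's map function, which treats crossovers as independent events (no interference) and converts a genetic distance d in Morgans into a recombination fraction r = (1 − e^(−2d))/2. The function name is hypothetical.

```python
import math


def haldane_recombination_fraction(distance_morgans: float) -> float:
    """Recombination fraction between 2 loci d Morgans apart (Haldane map function).

    Assumes crossovers occur independently along the chromosome (no interference).
    """
    if distance_morgans < 0:
        raise ValueError("genetic distance must be non-negative")
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgans))


for cm in (1, 10, 50, 200):
    r = haldane_recombination_fraction(cm / 100)  # centimorgans -> Morgans
    print(f"{cm:>3} cM apart: recombination fraction {r:.3f}")
```

Closely spaced loci (a few centimorgans) almost never recombine, while the fraction approaches 0.5, the value expected for unlinked loci, as the distance grows.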

In principle, linkage analysis seeks to detect the co-segregation of polymorphic genetic markers (eg, DNA microsatellites) with disease among affected family members. For instance, affected relatives in a given family should share the chromosomal regions and genes causing the disease. Once regional chromosomal linkage between a disease and a genetic marker is established, additional markers within this genomic region can be evaluated to map the location of the disease gene more closely.9 Linkage studies have limited capacity to detect genes with low penetrance of the disease phenotype (ie, those underlying complex diseases).

Association Studies

Given the difficulty of using linkage strategies to identify the causal genes for complex diseases, alternative approaches such as association analysis have been developed. Association analysis is based on a case-control design that searches for a statistical correlation between particular genetic variant(s) and a disease or disease trait.20 Large association studies have greater statistical power than linkage methods to detect genes that have a small effect on the disease phenotype.21 The genetic variants (ie, SNPs) might be located on genes (candidate genes) or distributed throughout the genome.

One association study design that evaluates genetic variant(s) of plausible candidate genes would follow this general procedure22,23: (1) hypothesized (ie, “candidate”) genes that might be involved in the pathogenesis of the disease of interest are proposed; (2) functional genetic variants within or in close proximity to coding regions, 5′ and 3′ untranslated regions, and intron/exon boundaries of the candidate genes are identified; (3) subjects are ascertained, including careful definition of the disease phenotype in cases and well-matched, unrelated, unaffected individuals (controls); (4) the cases and controls are genotyped; and (5) statistical analysis is performed to determine whether there is an association between the examined variants and the disease phenotype. Candidate gene approaches can be limited by population stratification bias and by the inability to reproduce reported associations.23 Often, the lack of reproducibility arises because the initial reports are based on small samples (fewer than 200 cases), differences in study design, or heterogeneity of the disease locus (ie, affected individuals carry causal variants at different loci).
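Step (5) of this procedure can be illustrated with the simplest allele-based analysis: a 2 × 2 table of allele counts in cases and controls, tested with a Pearson chi-square and summarized by an odds ratio. This is a minimal sketch under stated assumptions: Hardy-Weinberg equilibrium in controls, no zero cells, and no population stratification. It makes no correction for stratification or for multiple testing. The counts are hypothetical.

```python
import math


def allelic_association(case_alt: int, case_ref: int,
                        ctrl_alt: int, ctrl_ref: int):
    """Allelic chi-square test (1 df) and odds ratio for a case-control study.

    Arguments are allele counts (two per genotyped subject), not subject counts.
    Returns (chi_square, p_value, odds_ratio).
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_totals = [sum(r) for r in table]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi_square = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi_square += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return chi_square, p_value, odds_ratio


# Hypothetical counts: 200 cases (400 alleles) and 200 controls (400 alleles).
chi2, p, odds = allelic_association(120, 280, 80, 320)
print(f"chi2={chi2:.2f}  p={p:.4f}  OR={odds:.2f}")
```

For fixed allele proportions, the Pearson chi-square scales linearly with sample size. Halving every count therefore halves the statistic, which helps explain why the small initial samples mentioned above often fail to replicate.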

Another approach based on an association strategy is not feasible at present but will be greatly facilitated by the completion of the Human Haplotype Map. Briefly, this method is a genome-wide association study in which hundreds of thousands of specific SNPs spanning the entire genome are analyzed in patients (cases) and unrelated normal individuals (controls). Linkage disequilibrium analysis is then used to map the genomic region in order to identify susceptibility genes or variants. This method is unbiased with respect to specific genes or regions of the genome; however, it might be biased by population stratification.
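The linkage disequilibrium analysis mentioned above rests on pairwise measures computed from haplotype and allele frequencies. Following the glossary definition, these measures quantify how far alleles co-occur more or less often than their individual frequencies predict. The sketch below computes the commonly used D, D', and r² for two biallelic SNPs. It assumes that haplotype frequencies are already known; in practice they are usually estimated from unphased genotypes, a step not shown here.

```python
def ld_measures(p_ab: float, p_a: float, p_b: float):
    """Pairwise linkage disequilibrium between two biallelic loci.

    p_ab: frequency of haplotypes carrying allele A at locus 1 and B at locus 2
    p_a, p_b: population frequencies of alleles A and B
    Returns (D, D_prime, r_squared).
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic")
    # D: observed haplotype frequency minus the frequency expected under independence
    d = p_ab - p_a * p_b
    # D': D rescaled by its largest attainable value given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))
    d_prime = d / d_max
    # r^2: squared correlation between the alleles at the two loci
    r_squared = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return d, d_prime, r_squared


# Hypothetical frequencies: partial LD, then complete LD (AB is the only A- or B-carrying haplotype).
print(ld_measures(0.5, 0.6, 0.6))
print(ld_measures(0.3, 0.3, 0.3))
```

An r² near 1 means one SNP's genotype nearly predicts the other's. That property lets a genome-wide scan detect an untyped causal variant through a typed neighbor.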

Transmission Disequilibrium Test

The transmission disequilibrium test (TDT) was developed to address the stratification concerns of case-control association studies. In essence, the TDT is an association study that uses a family-based design. In this method, probands and both parents, who might or might not be affected by the disease of interest, are studied. The TDT compares the frequencies of parental alleles that are transmitted to the affected offspring with the frequencies of the alleles that are not transmitted. If a disease is associated with a high-risk allele, then that allele is expected to be more frequent among the transmitted alleles than among the nontransmitted alleles. Accordingly, when a case-control association study identifies a potentially causal allele for a disease, this genetic variant should be subjected to the TDT to validate the finding.
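The comparison of transmitted and nontransmitted alleles is commonly summarized by a McNemar-type statistic, (b − c)²/(b + c). Here, b and c count heterozygous parents who did or did not transmit the candidate allele to an affected child. The sketch below computes it from hypothetical per-parent records and assumes the transmitted allele has already been inferred from trio genotypes. The record format is an illustrative assumption, not a standard file layout.

```python
import math
from typing import Iterable, Tuple


def count_transmissions(parents: Iterable[Tuple[Tuple[str, str], str]], allele: str):
    """Tally transmissions of `allele` from heterozygous parents to affected children.

    Each record is ((allele1, allele2), transmitted_allele) for one parent.
    Homozygous parents and parents not carrying `allele` are uninformative.
    Returns (b, c): b = times `allele` was transmitted, c = times it was not.
    """
    b = c = 0
    for (a1, a2), transmitted in parents:
        if a1 == a2 or allele not in (a1, a2):
            continue
        if transmitted == allele:
            b += 1
        else:
            c += 1
    return b, c


def tdt(b: int, c: int):
    """TDT chi-square (1 df), (b - c)^2 / (b + c), with its p-value."""
    if b + c == 0:
        raise ValueError("no informative heterozygous parents")
    chi_square = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Hypothetical tally: 60 heterozygous parents transmitted the allele, 40 did not.
print(tdt(60, 40))
```

Only heterozygous parents contribute, and each is compared with itself (transmitted versus nontransmitted allele). As a result, allele-frequency differences between subpopulations cannot create a spurious signal the way they can in a case-control comparison.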

The Human Haplotype Map

The goal of the HGP to elucidate the complete sequence of the human genome has been achieved. Despite this milestone, however, the greatest challenges remain ahead. Translating this accurate sequence data into discoveries that identify disease-causing genes and genetic variants will be an enormous task for scientists, physicians, and other health professionals for years to come.

As mentioned above, genetic variants such as SNPs can affect an individual’s overall phenotype and predisposition to disease. A group of neighboring SNPs located on the same chromosome can be inherited together as a block (ie, haplotype block).6 It is anticipated that 00,000–400,000 blocks might exist in the human genome.16 Although each block contains several SNPs, a small number of tag-SNPs will be adequate to identify
Page 7: Genomics, genetic epidemiology, and genomic medicine


326 LAZARIDIS AND PETERSEN CLINICAL GASTROENTEROLOGY AND HEPATOLOGY Vol. 3, No. 4

most blocks in the genome as well as define the haplotypes that exist in a block. Thus, the current aim of the National Institutes of Health is to develop a human haplotype map (HapMap) across the entire genome.4 In simple terms, the HapMap will be a navigator of haplotype blocks, along with the tag-SNPs that will define the haplotypes present in each block. The significance of the HapMap lies in decreasing the number of SNPs needed to pursue a whole-genome association study in complex diseases. For example, investigators will need to study only approximately 500,000 tag-SNPs, instead of the millions of SNPs that exist in the human genome, to define those haplotypes associated with complex diseases. Because haplotypes differ among populations of different origins, the HapMap will focus on common SNPs and haplotypes in 4 large, geographically distinct ethnic groups, namely, Japanese, Han Chinese, Yoruba of Nigeria, and US residents with ancestry derived from northern and western Europeans.
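One way to see how a few tag-SNPs can stand in for many is a greedy selection over pairwise r² values. This sketch is an illustrative assumption, not the HapMap project's own procedure. It repeatedly picks the SNP that captures the most not-yet-captured SNPs at r² ≥ 0.8, a commonly used but arbitrary threshold, until every SNP in the region is captured. The SNP names and r² values are invented.

```python
from typing import Dict, FrozenSet, List, Set


def greedy_tag_snps(snps: List[str],
                    r2: Dict[FrozenSet[str], float],
                    threshold: float = 0.8) -> List[str]:
    """Pick tag-SNPs greedily until every SNP is captured by some tag.

    A tag captures itself and every SNP whose pairwise r^2 with it is at
    least `threshold`. Pairs missing from `r2` are treated as r^2 = 0.
    """
    def captured_by(tag: str) -> Set[str]:
        return {s for s in snps
                if s == tag or r2.get(frozenset((tag, s)), 0.0) >= threshold}

    untagged = set(snps)
    tags: List[str] = []
    while untagged:
        # Choose the SNP capturing the most still-untagged SNPs (ties: list order).
        best = max(snps, key=lambda s: len(captured_by(s) & untagged))
        tags.append(best)
        untagged -= captured_by(best)
    return tags


# Hypothetical region of five SNPs with invented pairwise r^2 values.
snps = ["snp1", "snp2", "snp3", "snp4", "snp5"]
r2 = {
    frozenset(("snp1", "snp2")): 0.95,
    frozenset(("snp1", "snp3")): 0.85,
    frozenset(("snp2", "snp3")): 0.90,
    frozenset(("snp3", "snp4")): 0.10,
    frozenset(("snp4", "snp5")): 0.82,
}
print(greedy_tag_snps(snps, r2))  # ['snp1', 'snp4']
```

Here 2 tags capture all 5 SNPs. Applied genome-wide, the same idea lets roughly 500,000 tag-SNPs stand in for the millions of SNPs in the genome.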

Ethical, Legal, and Social Implications of Human Genomics

The Ethical, Legal and Social Implications (ELSI) program of the National Human Genome Research Institute was developed as an essential part of the HGP to address the ethical and sociolegal issues raised by human genomics.

Physicians are responsible for protecting the confidentiality of their patients’ medical records and for practicing medicine in a safe manner: “primum non nocere.” However, genetic information is distinct from other types of data gathered (ie, demographic, social, medical) because it has implications for the future risk of disease in an individual and, likely, in his/her relatives. The relevant values of bioethics, including the principles of beneficence, respect for autonomy, privacy, confidentiality, and equity, all apply to genetic testing. These principles have implications for how clinicians should approach and manage genetic information with patients and their relatives.

Physicians are often asked to disclose the benefits and risks of genetic testing, to maintain the confidentiality of genetic information, and to warn patients and their family members of inherited genetic risk. The legal and social ramifications of genetic predisposition testing are multiple and interrelated. Genetic information is considered confidential. Genetic results should be released only to patients, and health professionals must take every precaution to prevent unauthorized disclosure to third parties.

One threat of mishandling genetic information is the loss of employability if a genetic risk becomes known to the employer. Some legal protections are in place, such as the Health Insurance Portability and Accountability Act, which protects access to health insurance. Another concern is that individuals at risk for disease might experience genetic discrimination. For example, insurance companies might use genetic information to deny insurance coverage, and individuals may be denied employment because genetic testing showed that they are at high risk of developing a chronic disease.

Addressing the ethical, legal, and social implications of knowing our genetic predisposition is a multifaceted task. Physicians and health care providers have to be educated on how to interpret and communicate genetic information to patients and relatives to help them make informed decisions regarding their health. Public health agencies should focus on determining when genetic data and tests are trustworthy for routine clinical use. Society has to create laws and monitor their implementation to prevent the inappropriate use of genetic information.

Figure 5. Genomics will likely lead to a better understanding of genetic variation as a determinant of disease susceptibility. This knowledge will improve our diagnosis, treatment, and, hopefully, prevention of human illnesses.

Summary

Genomics will likely influence the practice of medicine for years to come (Figure 5). The overall aim is to better predict an individual’s risk of developing common complex diseases, so that preventive interventions can be applied and, if needed, treatment can be optimized. To achieve this goal, 3 steps appear essential. First, we need to better elucidate the structure (ie, variation) and function of the human genome. Second, genetic epidemiology studies are required to dissect the inherited susceptibility variants and environmental factors that contribute to disease phenotypes. Third, experimental biology approaches are necessary to translate the discovered susceptibility variants


April 2005 GENOMICS, GENETIC EPIDEMIOLOGY, AND GENOMIC MEDICINE 327

into clinical tests for early disease diagnosis and to devise novel pharmacologic targets to treat patients more effectively.

At the dawn of the 21st century, progress in human genomics along with genetic epidemiology will lead us to genomic medicine. In this new period, a patient’s genetic variation will likely shape the medical care provided for many diseases, whether for prevention, diagnosis, or treatment. In the genome era, the practicing clinician has, and will continue to have, a critically important role in identifying the phenotype of disease and applying the advances of genomic medicine to the care of the patient.

Appendix: Glossary

Allele an alternative form (ie, “spelling”) of a gene or a DNA sequence at a specific locus.

Alternative splicing a regulatory mechanism by which variations in the incorporation of a gene’s exons, or coding regions, into messenger RNA lead to the production of more than one related protein, or isoform.

Autosomes all human chromosomes other than the sex chromosomes (ie, X and Y); the mitochondrial DNA is also excluded.

Codon a three-base nucleotide sequence (ie, DNA or RNA) that specifies a particular amino acid.

Codon, stop a codon that causes termination of protein translation.

Epigenetic a term describing non-mutational phenomena, such as methylation and histone modification, that alter gene expression.

Euchromatin the loose chromatin; the gene-rich regions of the genome.

Exon a transcribed region of a gene that codes for a protein.

Genotype a person’s genetic structure, as reflected by his/her DNA sequence; there are two alleles at each locus, one of paternal and one of maternal origin.

Haplotype the combination of alleles found at adjacent loci on the same chromosomal segment that tend to be transmitted together.

Heterochromatin the dense chromatin; the gene-poor regions of the genome composed of repetitive DNA sequences.

Heterozygous having two different alleles at a specific autosomal (or X chromosome in a female) gene locus.

Homozygous having two identical alleles at a specific autosomal (or X chromosome in a female) gene locus.

Intron a region of a gene that is transcribed but spliced out of the messenger RNA and therefore does not code for a protein.

Linkage the tendency of genes or other DNA sequences at specific loci to be inherited together as a consequence of their physical proximity on a single chromosome.

Linkage analysis a method to trace and measure the co-segregation of a disease with marker loci in a family.

Loci plural of locus; the physical location of a gene.

LOD score the extent of linkage is measured by formulating a LOD score. The LOD score is the logarithm (base 10) of the likelihood ratio comparing the hypothesis of linkage with the hypothesis of no linkage (ie, the hypothesis of free recombination). The closer the genetic marker is to the disease gene, the greater the extent of co-segregation and the bigger the LOD score.

Linkage disequilibrium particular alleles at two or more neighboring loci show allelic association if they occur together with frequencies significantly different from those predicted from the individual allele frequencies.

Microsatellites short runs (usually less than 0.1 kb in length) of tandem repeats of a very simple DNA sequence, commonly 1–4 base pairs.

Mutation a fixed modification in the sequence of genomic DNA that is inherited.

Mutation, frame-shift a mutation caused by deletion or insertion of a number of nucleotides (ie, DNA bases) that is not a multiple of three, resulting in an altered open reading frame of a gene and usually in a truncated protein.

Mutation, missense a single nucleotide (ie, DNA base) substitution leading to a codon that specifies a different amino acid of a protein.

Mutation, nonsense a single nucleotide (ie, DNA base) substitution resulting in a stop codon that causes truncation of a protein.

Mutation, silent a single nucleotide (ie, DNA base) substitution that causes no change in the amino acid of a protein.

Penetrance the likelihood that a person carrying a particular mutant gene will have an altered phenotype.

Phenotype the observable features or expressions of specific gene(s), environmental factors, or both.

Polymorphism any variation involving two or more alleles at a locus (synonym, variant).

Recombination a natural phenomenon during gametogenesis by which regions of paired homologous chromosomes, and thus DNA, are exchanged. This random biologic process makes the chromosomes of the offspring distinct from those of the parents, thereby introducing variation in a population.

Relative risk ratio of a sibling (λs) the risk of a sibling developing a specific disease if a biologic brother or sister is already affected. The λs is calculated by dividing the prevalence of disease among siblings of affected individuals by the prevalence of disease in the general population.

Proband the index case; the affected person through whom a pedigree is discovered and explored.

Sibling (sib)-pair analysis a type of linkage analysis in which genetic markers are tested for linkage to a disease or trait by measuring the extent to which affected sib pairs share the marker haplotypes.

SNPs (single nucleotide polymorphisms) any polymorphism (ie, variation) due to a difference at a single nucleotide between two or more genomes.
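The sibling relative risk ratio (λs) defined in the glossary reduces to a simple ratio. This is a minimal sketch that assumes sibling prevalence is estimated from counts of affected siblings of probands. The function name and numbers are hypothetical.

```python
def sibling_relative_risk(affected_sibs: int, total_sibs: int,
                          population_prevalence: float) -> float:
    """Sibling relative risk ratio (lambda_s).

    affected_sibs / total_sibs estimates disease prevalence among siblings
    of affected probands; dividing by the general-population prevalence
    gives lambda_s.
    """
    if total_sibs <= 0 or population_prevalence <= 0:
        raise ValueError("counts and prevalence must be positive")
    sibling_prevalence = affected_sibs / total_sibs
    return sibling_prevalence / population_prevalence


# Hypothetical: 30 of 600 siblings affected (5%) vs 0.5% in the population.
print(sibling_relative_risk(30, 600, 0.005))
```

A λs well above 1 indicates familial clustering, which may reflect shared genes, shared environment, or both.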

References

1. Watson JD, Crick F. Molecular structure of nucleic acids: a structure for deoxyribose nucleic acid. Nature 1953;171:737–738.

2. Lander ES, Linton LM, Birren B, et al. Initial sequencing and analysis of the human genome. Nature 2001;409:860–921.

3. Venter JC, Adams MD, Myers EW, et al. The sequence of the human genome. Science 2001;291:1304–1351.

4. Collins FS, Green ED, Guttmacher AE, et al. A vision for the future of genomics research. Nature 2003;422:835–847.

5. Peltonen L, McKusick VA. Dissecting human disease in the postgenomic era. Science 2001;291:1224–1229.

6. Guttmacher AE, Collins FS. Genomic medicine: a primer. N Engl J Med 2002;347:1512–1520.

7. Kaprio J. Science, medicine, and the future: genetic epidemiology. BMJ 2000;320:1257–1259.

8. Collins FS. Shattuck Lecture: medical and societal consequences of the Human Genome Project. N Engl J Med 1999;341:28–37.

9. Ghosh S, Collins FS. The geneticist’s approach to complex disease. Annu Rev Med 1996;47:333–353.

10. Collins FS, Morgan M, Patrinos A. The human genome project: lessons from large-scale biology. Science 2003;300:286–290.

11. Collins FS, Guyer MS, Chakravarti A. Variations on a theme: cataloging human DNA sequence variation. Science 1997;278:1580–1581.

12. Kruglyak L, Nickerson DA. Variation is the spice of life. Nat Genet 2001;27:234–236.

13. Collins FS, Mansoura MK. The human genome project: revealing the shared inheritance of all humankind. Cancer 2001;91:221–225.

14. Risch NJ. Searching for genetic determinants in the new millennium. Nature 2000;405:847–856.

15. Goldstein DB, Weale ME. Population genomics: linkage disequilibrium holds the key. Curr Biol 2001;11:R576–R579.

16. Gabriel SB, Schaffner SF, Nguyen H, et al. The structure of haplotype blocks in the human genome. Science 2002;296:2225–2229.

17. Cargill M, Daley CQ. Mining for SNPs: putting the common variants–common disease hypothesis to the test. Pharmacogenomics 2000;1:27–37.

18. Weiss KM, Terwilliger JD. How many diseases does it take to map a gene with SNPs? Nat Genet 2000;26:151–157.

19. Pritchard JK. Are rare variants responsible for susceptibility to complex diseases? Am J Hum Genet 2001;69:124–137.

20. Romero R, Kuivaniemi H, Tromp G, et al. The design, execution, and interpretation of genetic association studies to decipher complex diseases. Am J Obstet Gynecol 2002;187:1299–1312.

21. Devlin B, Roeder K. Genomic control for association studies. Biometrics 1999;55:997–1004.

22. Hirschhorn JN, Lohmueller K, Byrne E, et al. A comprehensive review of genetic association studies. Genet Med 2002;4:45–61.

23. Tabor HK, Risch NJ, Myers RM. Candidate-gene approaches for studying complex genetic traits: practical considerations. Nat Rev Genet 2002;3:391–397.

24. Sing CF, Haviland MB, Reilly SL. Genetic architecture of common multifactorial diseases. In: Chadwick D, Cardew G, eds. Variation in the human genome. Chichester, UK: John Wiley and Sons, 1996:211–232.

25. Ardlie KG, Kruglyak L, Seielstad M. Patterns of linkage disequilibrium in the human genome. Nat Rev Genet 2002;3:299–310.

Address requests for reprints to: Konstantinos N. Lazaridis, MD, Center for Basic Research in Digestive Diseases, Mayo Clinic College of Medicine, 200 First Street SW, Rochester, MN 55905. e-mail: [email protected]; fax: (507) 284-0762.

Supported by grants from the National Institutes of Health (DK 68290), the Foundation for Digestive Health and Nutrition and the American Gastroenterological Association (Research Scholar Award), and the Palumbo Foundation (to K.N.L.).