
Quantitative genetics: past and present

Prem Narain

Received: 26 June 2009 / Accepted: 4 February 2010 / Published online: 18 February 2010

© Springer Science+Business Media B.V. 2010

Abstract Most characters of economic importance

in plants and animals, and complex diseases in

humans, exhibit quantitative variation, the genetics

of which has been a fascinating subject of study since

Mendel’s discovery of the laws of inheritance. The

classical genetic basis of continuous variation based

on the infinitesimal model of Fisher and mostly using

statistical methods has since undergone major modi-

fications. The advent of molecular markers and their

extensive mapping in several species has enabled

detection of genes of metric characters known as

quantitative trait loci (QTL). Modeling the high-

resolution mapping of QTL by association analysis at

the population level as well as at the family level has

indicated that incorporation of a haplotype of a pair of

single-nucleotide polymorphisms (SNPs) in the model

is statistically more powerful than a single marker

approach. High-throughput genotyping technology

coupled with micro-arrays has allowed expression profiling of

thousands of genes with known positions in the genome

and has provided an intermediate step with mRNA

abundance as a sub-phenotype in the mapping of

genotype onto phenotype for quantitative traits. Such

gene expression profiling has been combined with

linkage analysis in what is known as eQTL mapping.

The first study of this kind was on budding yeast. The

associated genetic basis of protein abundance using

mass spectrometry has also been attempted in the same

population of yeast. A comparative picture of tran-

script vs. protein abundance levels indicates that

functionally important changes in the levels of the

former are not necessarily reflected in changes in the

levels of the latter. Genes and proteins must therefore

be considered simultaneously to unravel the complex

molecular circuitry that operates within a cell. One has

to take a global perspective on life processes instead of

individual components of the system. The network

approach connecting data on genes, transcripts, pro-

teins, metabolites etc. indicates the emergence of a

systems quantitative genetics. It seems that the inter-

play of the genotype-phenotype relationship for

quantitative variation is not only complex but also

requires a dialectical approach for its understanding in

which ‘parts’ and ‘whole’ evolve as a consequence of

their relationship and the relationship itself evolves.

Keywords Quantitative characters · Genetic basis · Molecular markers · Quantitative trait loci (QTL) · High-resolution mapping · Power of statistical modeling · eQTL · mRNA abundance · Protein abundance · Systems quantitative genetics · Dialectical approach

P. Narain (✉)

IASRI, New Delhi, India

e-mail: [email protected]

Mol Breeding (2010) 26:135–143

DOI 10.1007/s11032-010-9406-4

Introduction

Most traits of economic importance in plants and

animals as well as disease traits in humans have an

underlying genetic basis involving several genes and

are subject to modification by environmental factors.

Statistical considerations have been predominant in

dissecting such complex traits into estimable com-

ponents. The heritability of a trait as the proportion of

phenotypic variation that is attributed to genetic

causes has been a prime indicator helpful in making

decisions for the genetic improvement of economic

traits. The prediction of response to artificial selection

based on intensity and accuracy of selection and the

existence of genetic variability has been successful

across several crop plants, livestock, poultry and

fisheries. However, the relationship between pheno-

type and genotype has been like a black box, where

an inferential approach has been the only way to look

into it. This scenario is now changing with the advent

of modern technologies of gene sequencing, micro-

array experiments and enormous advances in

attempts to understand gene and protein expression

within a cell of an organism. Information on molec-

ular markers has been extremely helpful in identify-

ing the regions on chromosomes that bring about

variation in the trait (quantitative trait loci; QTL),

thereby providing tools that can lead to much more

accurate selection procedures for genetic improve-

ment of economic traits. Saturated genetic maps of

markers, giving their order along a chromosome and

relative distances between them, have been developed.

The map distance is based on the expected number

of crossovers between the two markers, whereas the

physical distance between them is in terms of

nucleotide base pairs (bp). A centiMorgan (cM),

corresponding to a recombination frequency of about 1%, can be a span of

10–1,000 kbp and can vary across species. The gene

transcript data from microarray experiments can be

integrated with molecular marker information to map

expression traits (eQTL) that can possibly lead to

causal networks. In this paper we discuss briefly some

of these developments and indicate how the evolution

of quantitative genetics from the past to the

present is heading towards a systems quantitative

genetics.
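The role of heritability and of the intensity and accuracy of selection in predicting response can be illustrated with a small sketch. It assumes the standard textbook form R = i · r · σ_A (selection intensity, accuracy, additive genetic standard deviation) and the narrow-sense definition of heritability; the function names and all numbers are hypothetical and serve only as an illustration.

```python
def heritability(var_additive: float, var_phenotypic: float) -> float:
    """Proportion of phenotypic variance attributed to additive genetic causes."""
    if var_phenotypic <= 0:
        raise ValueError("phenotypic variance must be positive")
    return var_additive / var_phenotypic


def predicted_response(intensity: float, accuracy: float, var_additive: float) -> float:
    """Predicted response to one generation of selection: R = i * r * sigma_A."""
    return intensity * accuracy * var_additive ** 0.5


# Hypothetical variance components, for illustration only.
var_a, var_p = 4.0, 16.0
h2 = heritability(var_a, var_p)

# Assumption: selection on the individual's own phenotype,
# for which the accuracy equals the square root of the heritability.
response = predicted_response(intensity=1.4, accuracy=h2 ** 0.5, var_additive=var_a)
print(h2, response)
```

With no genetic variability (σ_A = 0) the predicted response is zero whatever the intensity, which is why the existence of genetic variability is a precondition for selection to work.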

QTL mapping

Since the marker genotypes can be followed in their

inheritance through generations, they can serve as

molecular tags for following the QTL provided they

are tightly linked with the QTL. The first problem is

therefore to detect the marker–QTL linkage. Once

this is established, the next problem is to estimate the

QTL map position on the chromosome and estimate

the effect of allelic substitution. However, these

problems depend on whether we have data on

experimental populations obtained from controlled

crosses, as in plants and animals, or on natural

populations like humans where controlled crosses

cannot be made. It is, however, important to note that

the markers chosen for the QTL analysis should not

show any segregation distortion, as that may lead to

biased marker-trait association. Also, the phenotypic

data on the quantitative trait should follow a normal

distribution. One has therefore to verify these

assumptions for the data under consideration before

embarking on the QTL analysis.

The detection of marker–QTL linkage is based on

a statistical test of a null hypothesis (H0) against an

alternative hypothesis (H1). It is therefore subject to

two types of error. H0 postulates that there is no QTL

and hence no linkage exists between the marker and

the QTL. Rejecting it when it is true is a Type I error

which means that we detect marker–QTL linkage

when in fact no QTL is present. This is termed false

positive and the probability of such a contingency (α)

is kept as low as 5% or less. On the other hand, if we

accept H0 when in fact a QTL is present, we commit a

Type II error. This means that our test misses the

QTL. As in any statistical test, the strategy is to

minimize the probability of committing a Type II

error (β) for a fixed value of α. The statistical power

for QTL detection is then (1 − β). In QTL studies, such

testing is done at several points or intervals where

markers are located on each of the several chromo-

somes across the genome. Such multiple testing poses

a challenging problem that is primarily statistical.
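The multiple-testing problem can be made concrete with a short sketch. It assumes, for simplicity, that the tests are independent, which is not true for linked markers, so the numbers are only indicative; the Bonferroni correction shown is one common remedy and is not taken from the text.

```python
def familywise_error(alpha_single: float, n_tests: int) -> float:
    """Probability of at least one false positive among n independent tests."""
    return 1.0 - (1.0 - alpha_single) ** n_tests


def bonferroni_alpha(alpha_family: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise rate at alpha_family."""
    return alpha_family / n_tests


# Hypothetical genome scan with 50 tested intervals.
print(familywise_error(0.05, 50))   # far above the nominal 5%
print(bonferroni_alpha(0.05, 50))   # stricter per-test level
```

Because tests at neighbouring positions are correlated, the Bonferroni level is conservative for genome scans; thresholds derived for the genome as a whole are used in practice.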

The most common method of QTL mapping is that

of interval mapping. The whole chromosome is

divided into short intervals of about 20 cM each

and each interval is treated separately for QTL

detection and estimation. The maximum likelihood

method leading to LOD score statistics is used for

this purpose. A LOD score threshold T is chosen for

comparing with the observed value. An observed

value greater than T indicates significance. The LOD

score values obtained for each interval are plotted

against the chromosome position to give a Likelihood

Map. The maximum value of the significant LOD


scores provides a possible position of the QTL for the

given genomic region.
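A minimal sketch of the LOD score is given below for the special case in which the putative QTL sits exactly at a marker in a backcross. Under the assumption of normally distributed errors with a common variance, the LOD reduces to (n/2)·log10(RSS0/RSS1), where RSS0 and RSS1 are the residual sums of squares without and with the marker effect. Interval mapping proper evaluates positions between flanking markers as well, which this sketch does not attempt; the function name and the data are hypothetical.

```python
import math


def lod_at_marker(phenotypes, genotypes):
    """LOD score for a QTL at a marker; genotypes are coded 0 or 1 (backcross)."""
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    # Residual sum of squares under H0: one common mean, no QTL.
    rss_null = sum((y - grand_mean) ** 2 for y in phenotypes)
    # Residual sum of squares under H1: one mean per marker genotype.
    rss_qtl = 0.0
    for code in (0, 1):
        group = [y for y, g in zip(phenotypes, genotypes) if g == code]
        if not group:
            raise ValueError("both marker genotypes must be present")
        group_mean = sum(group) / len(group)
        rss_qtl += sum((y - group_mean) ** 2 for y in group)
    if rss_qtl == 0.0:
        raise ValueError("no residual variation within genotype groups")
    return (n / 2.0) * math.log10(rss_null / rss_qtl)


# Hypothetical trait values for eight backcross plants.
trait = [5.1, 4.9, 5.3, 5.0, 6.2, 6.0, 6.4, 6.1]
linked_marker = [0, 0, 0, 0, 1, 1, 1, 1]
unlinked_marker = [0, 1, 0, 1, 0, 1, 0, 1]
print(lod_at_marker(trait, linked_marker))
print(lod_at_marker(trait, unlinked_marker))
```

Plotting such values against chromosome position, one value per tested position, gives the likelihood map described above.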

Although simple interval mapping (SIM) is the most

widely used method for QTL mapping and works well

in several practical situations, it ignores

the fact that most quantitative traits are influenced by

numerous QTL. This is overcome either by adopting

a model of multiple QTL mapping (MQM) or by

combining SIM with the method of multiple linear

regression, a procedure known as composite interval

mapping (CIM). In all these methods, one uses the

approach of maximum likelihood which produces

only point estimates of the parameters such as the

number of QTL, their location, and effects. The

corresponding confidence intervals are required to be

determined separately by re-sampling methods. Fur-

ther, the correct number of QTL is difficult to

determine using traditional methods. Their incorrect

specification leads to distortion of the estimates of

locations and effects of QTL. To address these

problems a Bayesian approach is adopted wherein the

joint posterior distribution of all unknown parameters

given their prior distributions and the observed data

is computed. This is done using iterative simulation

procedures on high-speed computers.

The first application of interval mapping in plant

breeding was to an inter-specific backcross in

tomato. The parents for the backcross were the

domestic tomato Lycopersicon esculentum (E) with

fruit mass 65 g and a wild South American green-fruited

tomato L. chmielewskii (CL) with fruit mass

5 g. A total of 237 backcross plants were assayed for

continuously varying characters like fruit mass,

soluble-solids concentration and pH, and 63 RFLP

and 20 isozyme markers spaced at approximately

20 cM intervals were selected for QTL mapping. The

methods of maximum likelihood and LOD scores

were used through the software MAPMAKER-QTL

to implement the interval mapping. A threshold

T = 2.4, giving the probability of less than 5% that

even a single false positive will occur anywhere in

the genome, was used. This corresponds approxi-

mately to the significance level for any single test of

0.001. The resulting QTL likelihood maps revealed

multiple QTL for each trait (6 for fruit weight, 4 for

concentration of soluble solids and 5 for fruit pH) and

estimated their location to within 20–30 cM.
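The correspondence between a LOD threshold of 2.4 and a single-test significance level of about 0.001 can be checked with a short calculation. The sketch assumes that the likelihood-ratio statistic 2·ln(10)·LOD follows a chi-square distribution with one degree of freedom (one QTL effect in a backcross), for which the tail probability is erfc(√(x/2)); the function name is hypothetical.

```python
import math


def lod_to_pointwise_p(lod: float) -> float:
    """Approximate single-test p-value for a LOD score (chi-square, 1 df)."""
    lr_statistic = 2.0 * math.log(10.0) * lod
    # Tail probability of a chi-square variable with one degree of freedom.
    return math.erfc(math.sqrt(lr_statistic / 2.0))


print(lod_to_pointwise_p(2.4))
```

The value obtained is close to 0.001, in line with the figure quoted for the tomato study.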

In regard to fruit weight, the above type of

investigation was continued, with more and more

QTL for this trait being identified. In another study, at

least 28 QTL controlling the difference in fruit

weight between wild and cultivated tomato were

identified, one of them being fw2.2 on chromosome 2.

Using refined mapping studies, this QTL was local-

ized to a narrow chromosomal region of the order of

1/10,000 of the genome. Using a map-based

approach, fw2.2 was cloned and a 19-kb segment of

DNA containing it was sequenced. This made it

possible to identify a single gene, ORFX, responsible

for the QTL effect. By transforming the wild version

of the gene into a cultivated tomato, it was shown that

the fruit weight of the transformed plants decreased by around

30%, as predicted, thus confirming that there are no

additional fruit weight QTL nearby on the chromosome.

In yet another experiment, the population

under study was derived from a cross between the

wild species L. pimpinellifolium with average

fruit weight of 1 g and L. esculentum var.

Giant Heirloom with fruit weight in excess of

1,000 g. The same six major loci on chromosomes

1–3 and 11 accounting for as much as 67% of

phenotypic variation in fruit mass as in the previous

experiments were identified. The two most significant

QTL detected in this study are fw11.3 and fw2.1 on

chromosomes 11 and 2 respectively.

Linkage disequilibrium or association mapping

Association studies that involve linkage disequilib-

rium (LD) between markers and genes underlying

complex traits are being undertaken in different parts

of the world, but mostly in human genetics. The key

idea is that a disease mutation assumed to have arisen

once on the ancestral haplotype of a single chromo-

some in the past history of the population of interest

is passed on from generation to generation together

with markers at tightly linked loci resulting in LD.

The usual method adopted in human genetic studies

is that of case–control analysis wherein genotype or

allele frequencies of candidate genes are compared in

unrelated cases and controls. However, when the

population is composed of a recent admixture of

different ethnic groups that differ in marker allele

frequencies and disease frequencies, the method of

case–control comparison leads to spurious associa-

tion between the marker genotypes and the disease

traits. Family-based association methods such as the


transmission/disequilibrium test (TDT) can circum-

vent such problems.
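The TDT statistic can be sketched in a few lines. Among parents heterozygous for a marker, let b be the number of times one allele is transmitted to an affected child and c the number of times it is not; the statistic (b − c)²/(b + c) is then referred to a chi-square distribution with one degree of freedom. The function name and the counts below are hypothetical.

```python
import math


def tdt(transmitted: int, untransmitted: int):
    """Return the TDT chi-square statistic and its approximate p-value (1 df)."""
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("at least one informative transmission is required")
    chi_square = (transmitted - untransmitted) ** 2 / total
    # Tail probability of a chi-square variable with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Hypothetical counts from heterozygous parents of affected children.
statistic, p_value = tdt(60, 40)
print(statistic, p_value)
```

Because each transmission is compared with the non-transmitted allele of the same parent, differences in allele frequency between ethnic groups cancel out, which is the reason the test is robust to admixture.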

Several studies on modeling the high-resolution

mapping of QTL by association analysis at the

population level as well as at the family level have

been conducted (Spielman et al. 1993; Luo et al.

1997; Luo et al. 2000; Fan et al. 2006 and several

others). Because of the difficulty in ascertaining the

phase of a haplotype consisting of several single-

nucleotide polymorphisms (SNPs), these models

considered marker genotypes at each locus sepa-

rately, thus losing information on their joint charac-

teristics. Narain (2007, 2009) therefore considered

the full genotypic model at a pair of flanking diallelic

SNPs, in the context of a family-based approach like

the TDT for testing the association in the presence of

LD. It led to a more powerful test when expressed in

terms of non-centrality parameters. This strategy for

high-resolution mapping of QTL by association

analysis was also investigated at the population level

and led to increased power of the corresponding tests.
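The joint information carried by a pair of SNPs is summarized by their linkage disequilibrium. A minimal sketch, assuming the haplotype frequency of the pair is known, computes the usual coefficient D = p_AB − p_A·p_B and the squared correlation r²; the function name and frequencies are hypothetical, and the sketch does not reproduce the non-centrality calculations of the cited work.

```python
def ld_measures(freq_ab: float, freq_a: float, freq_b: float):
    """Return (D, r^2) for two diallelic loci from haplotype and allele frequencies."""
    d = freq_ab - freq_a * freq_b
    denominator = freq_a * (1.0 - freq_a) * freq_b * (1.0 - freq_b)
    if denominator == 0.0:
        raise ValueError("both loci must be polymorphic")
    return d, d * d / denominator


# Hypothetical frequencies: haplotype AB = 0.4, allele A = 0.5, allele B = 0.5.
d, r_squared = ld_measures(0.4, 0.5, 0.5)
print(d, r_squared)
```

Analysing each SNP separately uses only the allele frequencies; the haplotype frequency, and hence D, is the extra information a two-marker model exploits.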

Joint linkage and LD mapping

While linkage mapping can readily detect chromo-

somal regions harboring QTL, it is difficult to locate

them precisely. Also, since this approach depends on

the cross between two true breeding parents, it

captures only a tiny fraction of the genetic diversity

in the population. Association mapping, on the other

hand, samples genetic diversity widely and

requires fewer individuals but has less power to

detect QTL when they are not common. The advan-

tages of the two approaches can, however, be

combined by initially detecting QTL using linkage

mapping with a moderate number of markers fol-

lowed by a second stage of high-resolution associa-

tion mapping in QTL regions that capitalizes on a

high-density marker map.

The benefits of linkage and association mapping

have recently been combined in a single population

of maize by adopting a nested association mapping

(NAM) approach. The maize NAM population was

derived by crossing a common reference sequence

strain to 25 different maize lines. Individuals result-

ing from each of the 25 crosses were self-fertilized

for four further generations, to produce 5,000 NAM

recombinant inbred lines (RILs). This population was

first used for initial detection of QTL using a linkage

mapping approach. Subsequently, within each diverse

strain, high-resolution association mapping was

adopted with a high-density marker map. It is

significant to note that within each RIL all individuals

are genetically nearly identical. This means we can

estimate the true breeding value of each line much

more accurately by averaging the phenotypic mea-

surements of a given trait taken on several individuals

with the same genotype.
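The gain from averaging over genetically identical individuals can be sketched as follows. The sketch assumes that the phenotype is the sum of a line effect with variance var_genetic and an independent environmental deviation with variance var_error, so that the mean of n individuals has variance var_genetic + var_error/n; the function name and the variance values are hypothetical.

```python
def line_mean_heritability(var_genetic: float, var_error: float, n_replicates: int) -> float:
    """Proportion of the variance of line means that is genetic."""
    if n_replicates < 1:
        raise ValueError("at least one individual per line is required")
    return var_genetic / (var_genetic + var_error / n_replicates)


# Hypothetical trait with low heritability on a single-plant basis.
for n in (1, 4, 16):
    print(n, line_mean_heritability(1.0, 4.0, n))
```

As n grows the environmental part of the variance shrinks and the line mean approaches the genotypic value of the line.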

In a recent experiment, the genetic architecture of

flowering time in Zea mays (maize) was dissected

using NAM. About 1 million plants were assayed in

eight environments to map the QTL. About 29–56

QTL were found to affect flowering time. These were

small-effect QTL shared among the diverse families.

The analysis showed, surprisingly, the absence of any

single large-effect QTL. Moreover, no evidence of

epistasis or environmental interactions was found.

Flowering time controls adaptation of plants to

their local environment in the outcrossing species Zea

mays. A simple additive genetic model accurately

predicting the flowering time in this species is thus in

sharp contrast to what has been observed in several

plant species which practice self-fertilization.

Mapping of QTL for gene expression profile

(eQTL)

The advent of DNA chip technology in the form of

cDNA and oligonucleotide microarrays has provided

huge and complex datasets on gene expression

profiles of different cell lines from different organ-

isms. Such gene expression profiles have recently

been combined with linkage analysis based on QTL

mapping through molecular markers in what has been

termed ‘genetical genomics’ (Jansen and Nap 2001).

Gene expression levels for each individual of a

segregating population are phenotypes that are cor-

related with markers, genotyped for that individual,

to identify the QTL and their locations on the genome

to which the expression traits are linked. Such

expression quantitative trait loci (eQTL) studies are

similar to traditional multi-trait QTL studies but with

thousands of phenotypes. It is also important to note

that, underlying the gene expression differences,

there are two types of regulatory sequence variation.

One is cis-regulatory that affects its own expression


and the other is trans-acting or protein coding that

affects the expression of other genes. The first study

in which transcript abundance was used to study the

linkage with the QTL was on budding yeast (Brem

et al. 2002) based on a cross between a laboratory

strain and a wild strain, the parents being haploid

derivatives. The heritability estimation was based on

haploid segregants and the linkage with a marker was

tested by partitioning the segregants into two groups

according to marker genotypes and comparing the

expression levels between the groups with the

Wilcoxon–Mann–Whitney test. They found eight

trans-acting loci, each affecting the expression of a

group of 7–94 genes of related function. Since then,

several eQTL studies have been published in species

like mice, maize, humans, rats and Arabidopsis

thaliana (Schadt et al. 2003; Lan et al. 2003; Morley

et al. 2004; DeCook et al. 2006). These have led to

some general principles of genetic mapping of

genome-wide gene expression as reviewed by Rock-

man and Kruglyak (2006).
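The marker test used in the yeast study, partitioning the segregants by marker genotype and comparing expression levels with the Wilcoxon–Mann–Whitney test, can be sketched as below. The sketch uses average ranks for ties and a normal approximation without tie or continuity correction, which is an assumption of convenience; the function name and the expression values are hypothetical.

```python
import math


def mann_whitney_u(group_a, group_b):
    """Return U for group_a and a two-sided p-value (normal approximation)."""
    pooled = sorted([(v, 0) for v in group_a] + [(v, 1) for v in group_b])
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values at sorted positions i..j-1 share the average of ranks i+1..j.
        average_rank = (i + 1 + j) / 2.0
        rank_sum_a += average_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n_a, n_b = len(group_a), len(group_b)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2.0
    mean_u = n_a * n_b / 2.0
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    z = (u_a - mean_u) / sd_u
    return u_a, math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical expression levels of one transcript in haploid segregants,
# grouped by the parental allele carried at one marker.
allele_lab = [1.0, 2.0, 3.0]
allele_wild = [4.0, 5.0, 6.0]
u_statistic, p_value = mann_whitney_u(allele_lab, allele_wild)
print(u_statistic, p_value)
```

In an eQTL scan the same comparison is repeated for every transcript at every marker, so the multiple-testing burden is far heavier than in a single-trait QTL study.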

Conducting experiments to identify QTL for

organismal phenotype (P) as well as for the corre-

sponding transcript phenotype (Ps) can indicate the

genetic relationship between them, as borne out by

the study of Lan et al. (2003) on type 2 diabetes in a

population of F2-ob/ob mice from a cross of two

mouse strains. There were 8 mRNA traits (several Ps)

and 8- and 10-week levels of fasting plasma glucose,

insulin and body mass—the six physiological pheno-

typic traits (several P) for diabetes—and known

genotypes of 192 microsatellite markers included in

the study. In addition, of course, each transcript had a

known position on the genome, as is true for any

microarray experiment. The clustering of the two

types of phenotypes together led to two groups of 4

each of the 8 mRNA traits due to their mutual

correlations, with one of the groups containing SCD1

transcript (Ps), showing strong association with the

insulin trait. eQTL mapping of the first principal

component of this group revealed two loci DMC1 and

DMC2 that were significantly associated with SCD1.

The region of the former, on chromosome 2, over-

lapped with the locus t2dm3 that was found to be

associated with fasting insulin levels (P), using

traditional QTL mapping. Similarly, the region of

the DMC2 gene, on chromosome 5, overlapped with a

locus associated with fasting glucose levels (P). Thus

SCD1 mRNA expression was shown to be linked to

the loci that are associated with type 2 diabetes using

both multi- as well as single-trait QTL mapping. This

study points out that the phenotypic correlation

between P and Ps is due to the genetic correlation

between the corresponding genotypes—the DNA

sequence variation—and the possible correlation

between their corresponding environmental compo-

nents. As we will see later, such data can develop into

causal networks.

QTL for protein levels in yeast

In each cell of an organism, most of the day-to-day

work in terms of metabolism and structure is

performed by proteins consisting of long polypeptide

chains of amino acids that are of 20 types. It is well

known that the function of a protein is coded in a 20-

letter-alphabet language of amino acids and the type

of amino acid is dictated by the genetic code that

consists of successive triplets of nucleotides along the

DNA. The relationship between DNA and proteins is

provided by the manner in which the 4-letter

language of DNA is transformed into the 20-letter

language of protein. It is therefore expected that

functionally important changes in transcript levels

should be reflected in changes in the levels of the

corresponding proteins.

Proteome profiling based on mass spectrometry

has been used for quantitative measurement of

protein abundance to study the genetic basis of

protein level in a cross between two diverse strains of

the budding yeast, the two strains differing at 0.6% of

base pairs (Foss et al. 2007). The same cross was also

used earlier to understand the genetic basis of

transcript levels (Brem et al. 2002). This therefore

allowed the comparison of the genetics of protein and

transcript levels in the same population. Just as

transcript levels are compared across samples by

measurements of corresponding spot hybridization

intensities on micro-arrays, levels of peptides in an

output of a mass spectrometry experiment consisting

of a matrix of peaks, each of which represents a

peptide, are measured in terms of ion intensities after

appropriate alignment of the matrices.

Total proteins from eight independent logarithmic-

phase cultures of each parent and from two indepen-

dent cultures of each of 98 segregants were isolated,

digested with trypsin and analyzed by mass


spectrometry. Only the best peptide for a given

protein was selected. This led to 221 unique peptides

with high quality data and corresponded to 278

proteins. The genetic contribution to the observed

variability in protein abundance was estimated from a

subset of 156 of these proteins for which high-quality

data from the parent strains were also available. The

heritability of protein abundance was found to be

0.62. The comparison between genetic regulation of

proteins and that of the transcripts revealed more

differences than similarities; the average correlation

between them was found to be only 0.186. The

parental strains differed in both proteins as well as

transcripts to the extent of about 33%. However, only

43% of proteins that differed between the parents

corresponded to transcripts that were different

between the parents. Linkage analysis detected loci

for 156 of 278 transcripts (56%) compared to 85 of

221 peptides (38%). Most loci affected either peptide

abundance or transcript abundance but not both.

Since traits are not physically located in the region,

the corresponding hot spots are trans-acting. Protein

linkages were found to be concentrated in fewer hot-

spots than the transcript linkages. The overall

conclusion of this study was startling in that the loci

that influenced protein abundance differed from those

that influenced transcript levels, much against

expectations.

Systems quantitative genetics

The relationship between genotype and phenotype is

viewed by Rockman (2008) as a reverse engineering

process in which observations from segregating

populations on genes, transcript abundance, QTL

for transcript abundance and molecular markers are

used to infer causal networks to understand how the

system works as an integrated whole. Based on the

premise that genetic variation occurring naturally in a

population is a source of multi-factorial perturbation,

he reviews the recent literature to show how models

of probabilistic causal networks can be built up to

establish the genotype–phenotype map. In a way, the

review indicates the emergence of a systems quan-

titative genetics.

We illustrate the causal network with the help of a

hypothetical example given in Rockman (2008). The

existence of a large segregating population derived

from a cross between two pure breeding lines, such as

F2 and F3 progeny, recombinant inbred lines (RILs),

or backcross (BC) progeny etc. is assumed. Each

individual of the population is used for analysis by

micro-array profiling based on RNA, protein or

metabolites as well as molecular marker analysis.

We take the transcript abundances (Ps) corresponding

to 10 genes, denoted as 1–10, as sub-phenotypes of

the main phenotype such as a disease (P). For each

transcript we carry out a linkage analysis in which

each point along the genome is tested to find out

whether it affects the abundance of the transcript.

QTL are detected as peaks of statistical significance

along the chromosome that exceed a certain

threshold (Table 1). The correlation between genes

and transcript phenotypes is studied by plotting the

physical position of the gene corresponding to each

transcript against the position of the QTL associated

with variation in the transcript abundance (Fig. 1).

The data points aligning on the diagonal represent

local linkages typically due to cis-acting regulatory

polymorphisms whereas vertical alignments indicate

linkage hotspots—regions in the genome where

variation changes the abundance of a large number

of transcripts, a pleiotropic effect. Such plots lead to a

causal network in which QTL variation is the cause

of variation in the transcript abundance phenotypes

(Figs. 2 and 3). Transcript abundance (Ps) QTL can

co-localize with QTL for organismal phenotypes (P)

such as a disease. The reverse engineering process

lies in finding out whether Ps that is affected by QTL

of P is a cause or effect of the disease. Several other

approaches like Bayesian networks, structural equation

modeling, or a simple network derived from pairwise

trait correlations could be adopted to study the

systems level understanding of the biological cause

and effect relationship. But each of them has its own

limitations. A complete reverse engineering of the

genotype–phenotype map, however, does not seem to

be feasible unless we can include all possible causal

variables in the network-inference methodology.

Table 1 QTL analysis of 10 hypothetical transcripts

cDNA Marker

A B C D E F G

1 *

2 *

3 * *

4 * *

5 * *

6 * *

7 * *

8 *

9 *

10

* Means significant, indicating the presence of QTL in the region
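The reading of the gene-position against QTL-position plot described above can be sketched as a simple classification. Linkages whose QTL lies close to the gene encoding the transcript fall on the diagonal and are labelled local; QTL positions shared by several distant linkages are reported as hotspots. The window size, the minimum count, the function names and the data are all hypothetical choices made for illustration.

```python
from collections import Counter


def classify_linkages(linkages, gene_position, window=0.5):
    """Label each (transcript, qtl_position) pair as 'local' or 'distant'."""
    classified = []
    for transcript, qtl_position in linkages:
        distance = abs(gene_position[transcript] - qtl_position)
        kind = "local" if distance <= window else "distant"
        classified.append((transcript, qtl_position, kind))
    return classified


def linkage_hotspots(classified, min_transcripts=3):
    """QTL positions to which at least min_transcripts distant linkages map."""
    counts = Counter(qtl for _, qtl, kind in classified if kind == "distant")
    return {qtl: n for qtl, n in counts.items() if n >= min_transcripts}


# Hypothetical map positions of five genes and six detected linkages.
gene_position = {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0, 5: 5.0}
linkages = [(1, 1.0), (2, 2.0), (3, 3.0), (3, 6.0), (4, 6.0), (5, 6.0)]
classified = classify_linkages(linkages, gene_position)
print(classified)
print(linkage_hotspots(classified))
```

Such a classification only organizes the linkages; deciding the direction of causation between a transcript and the organismal phenotype needs the network-inference methods mentioned above.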

Discussion

There have been two major developments in recent

times that have changed the way we are accustomed

to looking at the mapping of genotype onto phenotype

for quantitative characters. The first is the advent of

molecular markers, their extensive mapping in sev-

eral species and their incorporation in statistical

models as covariates. In addition to classical herita-

bility as the proportion of phenotypic variation in the

character that is due to additive effects of QTL, we

have now the proportion of additive genetic variation

that is associated with the markers. The larger this

proportion, the greater is our ability to detect QTL.

However, the regions to which the QTL are mapped

are usually large, of the order of 10–20 cM or even

greater, making candidate gene evaluation impossi-

ble. High-resolution mapping based on association

genetics must then be undertaken for which various

models have been developed, most of which consider

a single marker at a time thereby losing valuable

information due to linkage between them. For family-

based association methods like the TDT, Narain

(2007, 2009) developed the theory with haplotypes

instead of a single marker and proposed that one can

study the putative gene at any given location on the

11111

10

9

8

7

6

5

4

3

2

1

*

*

*

**

*

*

*

*

*

*

*

*

Marker Position Along The Genome

A B C D E F G

*

*

####

#

###

##

Fig. 1 Plot of QTL positions against the physical positions of

the genes

Fig. 2 Co-variation in transcript phenotype due to variation in

QTLs

Fig. 3 Causal network

Mol Breeding (2010) 26:135–143 141

123

Page 8: Quantitative genetics: past and present

chromosome by considering only a pair of markers

around it rather than the whole set of markers.
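For orientation, the classical single-marker TDT of Spielman et al. (1993), which the haplotype-based theory extends, can be sketched in a few lines. This is only the textbook single-marker statistic, not the two-marker haplotype version of Narain (2007, 2009), whose formulas are not reproduced in this section. The function name `tdt_statistic` and the transmission counts are hypothetical and chosen purely for illustration.

```python
import math
from typing import Tuple


def tdt_statistic(b: int, c: int) -> Tuple[float, float]:
    """Classical single-marker TDT (a McNemar-type test).

    b: heterozygous (M1M2) parents that transmitted allele M1 to an affected child
    c: heterozygous (M1M2) parents that transmitted allele M2 to an affected child
    Returns (chi_square, p_value); the statistic has 1 degree of freedom
    under the null hypothesis of no linkage or no association.
    """
    if b < 0 or c < 0:
        raise ValueError("transmission counts must be non-negative")
    n = b + c
    if n == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi_square = (b - c) ** 2 / n
    # For a chi-square variable with 1 df, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Hypothetical counts: 60 transmissions of M1 versus 40 of M2,
# so chi-square = (60 - 40)^2 / 100 = 4.0.
chi_square, p_value = tdt_statistic(60, 40)
print(f"TDT chi-square = {chi_square:.2f}, p = {p_value:.4f}")
```

Only heterozygous parents contribute to the statistic, which is why the test is robust to population stratification; a haplotype-based version would count transmissions of two-marker haplotypes instead of single alleles.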

The second development is high-throughput genotyping technology which, coupled with micro-arrays, has allowed the expression of thousands of genes with known positions in the genome to be profiled and has provided an intermediate step, with mRNA abundance as a sub-phenotype (Ps), in the mapping of genotype onto phenotype for quantitative traits. The combination of such gene expression profiling with linkage analysis is termed eQTL mapping. Recently, mass spectrometry has also been used to study the genetic basis of protein abundance. A comparison of transcript and protein abundance levels in the same population of budding yeast, however, indicates that functionally important changes in the levels of the former are not necessarily reflected in changes in the levels of the latter. It may be worthwhile to discuss this finding from a conceptual angle.
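The idea of treating mRNA abundance as a sub-phenotype and scanning it against markers can be illustrated with a minimal sketch. It assumes genotypes coded 0/1 (as for the two parental alleles in haploid segregants) and uses simple single-marker regression with the standard LOD score, LOD = (n/2) log10(RSS_null / RSS_full). This is not claimed to be the procedure of any of the cited studies; the function names, marker names, and data values are invented for illustration.

```python
import math
from typing import Dict, Sequence


def marker_lod(genotype: Sequence[int], expression: Sequence[float]) -> float:
    """LOD score for one transcript at one marker (genotypes coded 0 or 1)."""
    n = len(expression)
    if n != len(genotype) or n < 3:
        raise ValueError("need equal-length inputs with at least 3 individuals")
    groups = {0: [], 1: []}
    for g, y in zip(genotype, expression):
        if g not in groups:
            raise ValueError("genotypes must be coded 0 or 1")
        groups[g].append(y)
    if not groups[0] or not groups[1]:
        return 0.0  # monomorphic marker carries no information
    grand_mean = sum(expression) / n
    rss_null = sum((y - grand_mean) ** 2 for y in expression)
    rss_full = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        rss_full += sum((y - group_mean) ** 2 for y in values)
    if rss_null == 0.0:
        return 0.0  # no variation in expression to explain
    if rss_full == 0.0:
        return math.inf  # marker explains all variation exactly
    return (n / 2.0) * math.log10(rss_null / rss_full)


def eqtl_scan(markers: Dict[str, Sequence[int]],
              expression: Sequence[float]) -> Dict[str, float]:
    """LOD score of one transcript at every marker."""
    return {name: marker_lod(g, expression) for name, g in markers.items()}


# Invented data: 8 segregants, 2 markers, 1 transcript.
markers = {
    "A": [0, 0, 0, 0, 1, 1, 1, 1],
    "B": [0, 1, 0, 1, 0, 1, 0, 1],
}
expression = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
lods = eqtl_scan(markers, expression)
best = max(lods, key=lods.get)
print(f"strongest linkage at marker {best}")
```

In a real scan the same loop runs over thousands of transcripts, and significance thresholds must account for the resulting multiple testing, for example by permutation.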

As we know, the central dogma of molecular biology stipulates that sequence information flows from DNA to RNA to protein but not in the reverse direction. Rockman (2008) has also indicated that many causal orderings in network analysis are prohibited by the central dogma, at least within an individual, as phenotype does not feed back to affect genotype, though between individuals phenotypes do feed back through selection to shape genes. But Kimchi-Sarfaty et al. (2007) reported data indicating that a protein’s three-dimensional structure is not necessarily determined by its amino acid sequence, which is specified by the DNA sequence. An mRNA, if subjected to translational braking, can generate a protein with a structure different from that specified by the DNA sequence. This has been termed the ‘translation-dependent folding’ (TDF) hypothesis (Newman and Bhat 2007). Differential gene expression resulting in transcripts as sub-phenotypes could then lead to different proteins and could give results similar to those obtained in the yeast experiment. Genes and proteins therefore need to be considered simultaneously to unravel the complex molecular circuitry that operates within a cell. One has to take a global perspective on the genotype–phenotype relationship instead of looking at individual components of a cellular system, such as DNA or proteins.

It seems that the genotype–phenotype relationship for quantitative variation is not only complex but also requires a closer look at how we view it: purely at the DNA–RNA level, as in the reductionist approach, or at the level of the cell as a whole, where DNA and RNA are just parts of the cellular system and other contextual forces present in the micro-environments of the cell also play important roles. Such situations have also been noticed in agricultural experimentation, where a dialectical approach has been advocated (Narain 2006). In the grain production process, studying how the process affects soil health and the ecosystem surrounding the plant is as important as studying the effect of the inputs on production. In the dialectical approach, this relationship between the plant and its environment is studied both ways, input to output as well as output to input, as a sort of feedback. A similar possibility seems to exist in the genotype–phenotype relationship within a cell. The protein as a phenotype is determined by the DNA sequence as the genotype, but the reverse phenomenon of protein affecting the DNA could also take place, at the expense of violating the central dogma.

In fact, studies are being conducted to explore biochemical signaling pathways that regulate the function of living cells through regulatory networks having positive and negative feedback loops (Ray 2008), though it is unclear how genetics can be incorporated into them. These feedback loops are basically cybernetic concepts that are inherent in the dialectical approach. This approach also takes into account the dynamics of the system over time, in which development is a consequence of opposing forces. This is based on the concept of contradiction inherent in the meaning of dialectics. Things change because of the action of opposing forces on them, and things remain as they are because of a temporary balance of opposing forces. The opposing forces are seen as contradictory in the sense that each taken separately would have opposite effects, but their joint action may differ from the result of either acting alone. These forces are, however, part of self-regulation, and the development of the object is regarded as a network of positive and negative feedback loops, the incorporation of which in the genetic context would violate the central dogma.

Genes, transcripts, proteins, metabolites, physical components, etc. can be regarded as ‘parts’ of the cellular system, and the ‘whole’ is regarded as a relation of these parts that acquire properties by virtue of being parts of a particular whole. As soon as the parts acquire properties by being together, they impart to the whole new properties that are in turn reflected in changes in the parts, and so on. Parts and whole therefore evolve as a consequence of their relationship, and the relationship itself evolves. Genes are fixed but their expression, the transcript, is not. At any given moment genes are expressed depending on the requirement of the cell and through the information contained in the DNA. At this moment the cellular system is said to be in a particular state. At the next moment the same genes are expressed, but differently, depending on the requirement of the cell at that time and on the feedback, if any, from the system’s state at the previous time point, assuming that the process is Markovian. This gives the next state of the system, which may or may not differ from the previous state. And so the process goes on, continually modifying the relationship between the different parts of the system based on the interactions and feedbacks. It seems a dialectical approach could provide the clue to understanding how the ‘parts’ of a system and the ‘whole’ system behave in the genetics context. But how to model such a process remains to be seen.
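Purely to make the notion of a Markovian state update with feedback concrete, the toy sketch below advances a two-variable cell state (transcript level, protein level) one time step at a time, with each new state depending only on the previous one. It is not a proposal for the model the text calls for. The two-variable state, the repression term `synthesis / (1 + p)`, the noise term, and all parameter values are assumptions made for illustration, and the feedback acts on expression, not on the DNA sequence itself.

```python
import random
from typing import List, Tuple

State = Tuple[float, float]  # (transcript level, protein level)


def step(state: State, rng: random.Random, synthesis: float = 1.0,
         translation: float = 0.5, decay_m: float = 0.2,
         decay_p: float = 0.1, noise: float = 0.0) -> State:
    """One Markovian update: the next state depends only on the current state."""
    m, p = state
    # Negative feedback loop: more protein means less new transcript.
    m_next = m + synthesis / (1.0 + p) - decay_m * m + rng.gauss(0.0, noise)
    # Protein is produced from the current transcript and decays.
    p_next = p + translation * m - decay_p * p
    # Abundances cannot be negative.
    return (max(m_next, 0.0), max(p_next, 0.0))


def simulate(initial: State, n_steps: int, seed: int = 0,
             **params: float) -> List[State]:
    """Trajectory of states starting from `initial`; `seed` fixes the noise."""
    rng = random.Random(seed)
    states = [initial]
    for _ in range(n_steps):
        states.append(step(states[-1], rng, **params))
    return states


# Without noise, the opposing forces (synthesis vs. decay and repression)
# settle into a temporary balance, i.e. a steady state.
trajectory = simulate((0.0, 0.0), n_steps=500)
m_final, p_final = trajectory[-1]
print(f"steady state: transcript = {m_final:.3f}, protein = {p_final:.3f}")
```

With `noise` set above zero the same code gives a stochastic Markov chain that fluctuates around the balance point instead of resting at it.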

References

Brem RB, Yvert G, Clinton R, Kruglyak L (2002) Genetic dissection of transcriptional regulation in budding yeast. Science 296:752–755

DeCook R, Lall S, Nettleton D, Howell SH (2006) Genetic regulation of gene expression during shoot development in Arabidopsis. Genetics 172:1155–1164

Fan RZ, Jung JS, Jin L (2006) High-resolution association mapping of quantitative trait loci: a population-based approach. Genetics 172:663–686

Foss EJ, Radulovic D, Shaffer SA, Ruderfer DM, Bedalov A, Goodlett DR, Kruglyak L (2007) Genetic basis of proteome variation in yeast. Nat Genet 39:1369–1375

Jansen RC, Nap J-P (2001) Genetical genomics: the added value from segregation. Trends Genet 17(7):388–391

Kimchi-Sarfaty C, Oh JM, Kim IW, Sauna ZE, Calcagno AM, Ambudkar SV, Gottesman MM (2007) A “silent” polymorphism in the MDR1 gene changes substrate specificity. Science 315:525–528

Lan H, Stoehr JP, Nadler ST, Schueler KL, Yandell BS, Attie AD (2003) Dimension reduction for mapping mRNA abundance as quantitative traits. Genetics 164:1607–1614

Luo ZW, Thompson R, Woolliams JA (1997) A population genetics model of marker-assisted selection. Genetics 146:1173–1183

Luo ZW, Tao SH, Zeng Z-B (2000) Inferring linkage disequilibrium between a polymorphic marker locus and a trait locus in natural populations. Genetics 156:457–467

Morley M et al (2004) Genetic analysis of genome-wide variation in human gene expression. Nature 430:743–747

Narain P (2006) Dialectical agriculture. Natl Acad Sci Lett 29:253–260

Narain P (2007) A theoretical treatment of interval mapping of a disease gene using transmission disequilibrium tests. J Biosci 32(7):1317–1324

Narain P (2009) Transmission Disequilibrium Test (TDT) for a pair of linked marker loci. Comput Stat Data Anal 53(5):1883–1893

Newman SA, Bhat R (2007) Genes and proteins: dogmas in decline. J Biosci 32(6):1041–1043

Ray LB (2008) Getting your loops straight. Science 322:389

Rockman MV (2008) Reverse engineering the genotype-phenotype map with natural genetic variation. Nature 456:738–744

Rockman MV, Kruglyak L (2006) Genetics of global gene expression. Nat Rev Genet 7:862–872

Schadt EE, Monks SA, Drake TA, Lusis AJ, Che N, Colinayo V, Ruff TG, Milligan SB, Lamb JR, Cavet G, Linsley PS, Mao M, Stoughton RB, Friend SH (2003) Genetics of gene expression surveyed in maize, mouse and man. Nature 422:297–302

Spielman RS, McGinnis RE, Ewens WJ (1993) Transmission tests for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet 52:506–516
