discovery and development of exome ... - cerealsdb.uk.net · made available via the use of...

17
Discovery and development of exome-based, co-dominant single nucleotide polymorphism markers in hexaploid wheat (Triticum aestivum L.) Alexandra M. Allen 1, *, Gary L. A. Barker 1 , Paul Wilkinson 1 , Amanda Burridge 1 , Mark Winfield 1 , Jane Coghill 1 , Cristobal Uauy 2 , Simon Griffiths 2 , Peter Jack 3 , Simon Berry 4 , Peter Werner 5 , James P. E. Melichar 6 , Jane McDougall 7 , Rhian Gwilliam 7 , Phil Robinson 7 and Keith J. Edwards 1 1 School of Biological Sciences, University of Bristol, Bristol, UK 2 John Innes Centre, Norwich, UK 3 RAGT, Ickleton, Essex, UK 4 Limagrain, Woolpit, Suffolk, UK 5 KWS, Thriplow, Hertfordshire, UK 6 Syngenta Seeds Ltd, Whittlesford, Cambridge, UK 7 KBioscience Unit 7, Hertfordshire, UK Received 20 April 2012; revised 6 August 2012; accepted 10 August 2012. *Correspondence (Tel 44 117 331 6770; fax 44 117 925 7374; email [email protected]) Keywords: wheat, next-generation sequencing, KASPar genotyping, single nucleotide polymorphism. Summary Globally, wheat is the most widely grown crop and one of the three most important crops for human and livestock feed. However, the complex nature of the wheat genome has, until recently, resulted in a lack of single nucleotide polymorphism (SNP)-based molecular markers of practical use to wheat breeders. Recently, large numbers of SNP-based wheat markers have been made available via the use of next-generation sequencing combined with a variety of genotyping platforms. However, many of these markers and platforms have difficulty distinguishing between heterozygote and homozygote individuals and are therefore of limited use to wheat breeders carrying out commercial-scale breeding programmes. To identify exome-based co-dominant SNP-based assays, which are capable of distinguishing between heterozygotes and homozyg- otes, we have used targeted re-sequencing of the wheat exome to generate large amounts of genomic sequences from eight varieties. Using a bioinformatics approach, these sequences have been used to identify 95 266 putative single nucleotide polymorphisms, of which 10 251 were classified as being putatively co-dominant. Validation of a subset of these putative co-dominant markers confirmed that 96% were true polymorphisms and 65% were co-dominant SNP assays. The new co-dominant markers described here are capable of genotypic classification of a segregating locus in polyploid wheat and can be used on a variety of genotyping platforms; as such, they represent a powerful tool for wheat breeders. These markers and related information have been made publically available on an interactive web-based database to facilitate their use on genotyping programmes worldwide. Introduction Bread wheat (Triticum aestivum) is an allohexaploid (AABBDD) crop derived from the hybridisation of the diploid genome of Aegilops tauschii (DD) with the AABB tetraploid genome of Triticum turgidum (Dubcovsky and Dvorak, 2007). These hybridi- sation events, the domestication process and the inbreeding nature of wheat have together resulted in a reduced level of genetic diversity between cultivated wheat varieties, when compared with their wild ancestors (Haudry et al., 2007). Wheat breeders and geneticists require tools to exploit the genetic diversity available within germplasm collections and carry out breeding programmes, which utilise this diversity to maximum effect. Molecular markers enable breeders and geneticists to carry out this process; however, in allohexaploid wheat, the develop- ment of molecular markers has, until recently, been problematic due to the presence of homoeologous and paralogous copies of the various genes (Kaur et al., 2012). Recent advances in genotyping platforms have built upon the wealth of data provided by next-generation sequencing (NGS) technologies to enable, for the first time, the large-scale identification, validation and application of molecular markers in wheat breeding pro- grammes (Berkman et al., 2012; Paux et al., 2011). These developments have come at a critical time, where the need for a substantial increase in yields to feed a growing global population has coincided with reduced genetic gains and increasing climatic and environmental pressures (Dixon et al., 2009; Reynolds et al., 2009). Many of the recently developed genotyping platforms rely on the identification of single nucleotide polymorphisms (SNPs), which are polymorphic between different wheat varieties (Paux et al., 2011). To overcome the various bottlenecks and problems associated with SNP generation, characterisation and most importantly validation in wheat, we and others have previously used NGS-based technology to identify and map relatively large numbers of gene-based SNP loci (Allen et al., 2011; Akhunov et al., 2009; Chao et al., 2010). However, these studies used cDNA and EST sequences and were therefore subject to variation ª 2012 The Authors Plant Biotechnology Journal ª 2012 Society for Experimental Biology, Association of Applied Biologists and Blackwell Publishing Ltd 1 Plant Biotechnology Journal (2012), pp. 1–17 doi: 10.1111/pbi.12009

Upload: others

Post on 12-Jul-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Discovery and development of exome ... - cerealsdb.uk.net · made available via the use of next-generation sequencing combined with a variety of genotyping platforms. However, many

Discovery and development of exome-based, co-dominantsingle nucleotide polymorphism markers in hexaploidwheat (Triticum aestivum L.)Alexandra M. Allen1,*, Gary L. A. Barker1, Paul Wilkinson1, Amanda Burridge1, Mark Winfield1, Jane Coghill1,Cristobal Uauy2, Simon Griffiths2, Peter Jack3, Simon Berry4, Peter Werner5, James P. E. Melichar6, Jane McDougall7,Rhian Gwilliam7, Phil Robinson7 and Keith J. Edwards1

1School of Biological Sciences, University of Bristol, Bristol, UK2John Innes Centre, Norwich, UK3RAGT, Ickleton, Essex, UK4Limagrain, Woolpit, Suffolk, UK5KWS, Thriplow, Hertfordshire, UK6Syngenta Seeds Ltd, Whittlesford, Cambridge, UK7KBioscience Unit 7, Hertfordshire, UK

Received 20 April 2012;

revised 6 August 2012;

accepted 10 August 2012.

*Correspondence (Tel 44 117 331 6770;

fax 44 117 925 7374;

email [email protected])

Keywords: wheat, next-generation

sequencing, KASPar genotyping, single

nucleotide polymorphism.

SummaryGlobally, wheat is the most widely grown crop and one of the three most important crops for

human and livestock feed. However, the complex nature of the wheat genome has, until

recently, resulted in a lack of single nucleotide polymorphism (SNP)-based molecular markers of

practical use to wheat breeders. Recently, large numbers of SNP-based wheat markers have been

made available via the use of next-generation sequencing combined with a variety of genotyping

platforms. However, many of these markers and platforms have difficulty distinguishing between

heterozygote and homozygote individuals and are therefore of limited use to wheat breeders

carrying out commercial-scale breeding programmes. To identify exome-based co-dominant

SNP-based assays, which are capable of distinguishing between heterozygotes and homozyg-

otes, we have used targeted re-sequencing of the wheat exome to generate large amounts of

genomic sequences from eight varieties. Using a bioinformatics approach, these sequences have

been used to identify 95 266 putative single nucleotide polymorphisms, of which 10 251 were

classified as being putatively co-dominant. Validation of a subset of these putative co-dominant

markers confirmed that 96% were true polymorphisms and 65% were co-dominant SNP assays.

The new co-dominant markers described here are capable of genotypic classification of a

segregating locus in polyploid wheat and can be used on a variety of genotyping platforms; as

such, they represent a powerful tool for wheat breeders. These markers and related information

have been made publically available on an interactive web-based database to facilitate their use

on genotyping programmes worldwide.

Introduction

Bread wheat (Triticum aestivum) is an allohexaploid (AABBDD)

crop derived from the hybridisation of the diploid genome of

Aegilops tauschii (DD) with the AABB tetraploid genome of

Triticum turgidum (Dubcovsky and Dvorak, 2007). These hybridi-

sation events, the domestication process and the inbreeding

nature of wheat have together resulted in a reduced level of

genetic diversity between cultivated wheat varieties, when

compared with their wild ancestors (Haudry et al., 2007). Wheat

breeders and geneticists require tools to exploit the genetic

diversity available within germplasm collections and carry out

breeding programmes, which utilise this diversity to maximum

effect. Molecular markers enable breeders and geneticists to carry

out this process; however, in allohexaploid wheat, the develop-

ment of molecular markers has, until recently, been problematic

due to the presence of homoeologous and paralogous copies of

the various genes (Kaur et al., 2012). Recent advances in

genotyping platforms have built upon the wealth of data

provided by next-generation sequencing (NGS) technologies to

enable, for the first time, the large-scale identification, validation

and application of molecular markers in wheat breeding pro-

grammes (Berkman et al., 2012; Paux et al., 2011). These

developments have come at a critical time, where the need for

a substantial increase in yields to feed a growing global

population has coincided with reduced genetic gains and

increasing climatic and environmental pressures (Dixon et al.,

2009; Reynolds et al., 2009).

Many of the recently developed genotyping platforms rely on

the identification of single nucleotide polymorphisms (SNPs),

which are polymorphic between different wheat varieties (Paux

et al., 2011). To overcome the various bottlenecks and problems

associated with SNP generation, characterisation and most

importantly validation in wheat, we and others have previously

used NGS-based technology to identify and map relatively large

numbers of gene-based SNP loci (Allen et al., 2011; Akhunov

et al., 2009; Chao et al., 2010). However, these studies used

cDNA and EST sequences and were therefore subject to variation

ª 2012 The Authors

Plant Biotechnology Journal ª 2012 Society for Experimental Biology, Association of Applied Biologists and Blackwell Publishing Ltd 1

Plant Biotechnology Journal (2012), pp. 1–17 doi: 10.1111/pbi.12009

Page 2: Discovery and development of exome ... - cerealsdb.uk.net · made available via the use of next-generation sequencing combined with a variety of genotyping platforms. However, many

in expression of homoeologous and paralogous genes. In

hexaploid wheat, this situation is further aggravated as homo-

eologous and paralogous genes are often silenced or can show

differential spatial and/or temporal expressions (Adams and

Wendel, 2005; Akhunova et al., 2010; Liu et al., 2009). Genomic

DNA is likely to be a more reliable source of putative SNPs;

however, the size of the wheat genome means that sequencing

the whole genome of multiple varieties to the depths required for

successful SNP identification is impractical, time consuming and

costly (Biesecker et al., 2011). To overcome these resource-

associated problems, we have used a recently developed

sequence capture targeted resequencing approach to characte-

rise a significant proportion of the wheat exome (Winfield et al.,

2012). By using a reference collection of the wheat exome as the

basis of our SNP collection, we have been able to sequence and

compare equivalent regions of the wheat genome from several

wheat varieties.

To be fully utilised in breeding programmes, putative SNPs

need to be identified and converted to working assays on a high-

throughput genotyping platform. Recently, several technologies

have revolutionised wheat genotyping: Illumina’s GoldenGate/

Infinium technologies and KBioscience’s KASPar (Akhunov et al.,

2009; Allen et al., 2011). Development of these platforms has

encouraged the widespread uptake of SNP-based genotyping in

wheat; however, both technologies have two significant draw-

backs. Firstly, they require the identification and characterisation

of varietal SNPs among an excess of homoeologous and

paralogous SNPs. Secondly, as both platforms were developed

for diploid species, they have problems with the scoring of varietal

SNPs in polyploid heterozygotes, for instance, F2 and backcross

populations. The detection of heterozygous SNPs in allohexaploid

wheat is dependent on the ability of the system to accurately

discriminate between different call ratios. For ‘dominant’ SNP

assays, which amplify all three homoeologous copies, these

systems are often incapable of distinguishing homozygote (having

a call ratio of 4 : 2) and heterozygote lines (having a call ratio of

5 : 1) (Allen et al., 2011; Paux et al., 2011). In contrast, both

genotyping platforms work well when the SNP is amplified from

just a single homoeologous/paralogous copy. Such SNP assays are

usually referred to as co-dominant SNP assays, that is, they are

capable of differentiating between homozygotes (having a call

ratio of 2 : 0) and heterozygotes (having a call ratio of 1 : 1). As

such, co-dominant SNP assays are preferred markers compared

with dominant SNP assays. Unfortunately, co-dominant SNP

assays usually make up < 20% of the SNP assays generated by

conventional means (Allen et al., 2011). However, careful primer

design can lead to the successful amplification of just one

homoeolog/paralog, but this process is time consuming as the

variable nature of each set of sequences demands a manual

approach to primer design. To overcome this bottleneck, we have

developed a SNP identification pipeline which incorporates a

novel bioinformatics procedure designed to identify putative

co-dominant SNP assays.

The developments described here have led to both the

generation of an extensive set of putative varietal SNPs from

genomic DNA and within this data set the identification of a

subset of putative co-dominant SNPs. The use of an exome-based

SNP discovery strategy has targeted gene discovery to genic

regions. Validation of a subset of these putative co-dominant SNP

assays and a comparison with dominant SNP markers has

provided useful insights into their design and characteristics.

Finally, the work described here has resulted in a significant

increase in the number of gene-derived co-dominant SNP assays,

which will be of considerable interest to wheat researchers, and in

particular, the breeding community.

Results

SNP discovery

In this study, the exome of the UK varieties Alchemy, Avalon,

Cadenza, Hereward, Rialto, Robigus, Savannah and Xi19 was

captured using the NimbleGen capture array (NimbleGen array

reference 100819_Wheat_Hall_cap_HX1) described in Winfield

et al. (2012). This generated between 9.8 and 48.7 million reads

on the Illumina GAIIx platform. Sequence data were filtered as

described in the experimental procedures. Varietal SNPs were

called in the filtered data where read coverage was sufficiently

high that there was less than a 0.1% chance of an observed allelic

difference between two varieties being due to failure to sample

an allele. For example, if the varieties Avalon and Cadenza have

observed calls of A(20) and A(10)G(10), respectively, we would

expect half of the alleles in Avalon (10 calls) to be G under the null

hypothesis that there is no real genotypic difference. Randomi-

sation tests showed that for the data set as a whole, using a

minimum expected count of 10 for null bases resulted in a false

discovery rate of < 1%. Putative co-dominant SNP markers were

identified as the subset of SNPs meeting the above criteria, but

where every variety had only a single allele called. The SNP

discovery pipeline identified 95 266 putative varietal SNPs in

26 551 distinct reference sequences (Winfield et al., 2012).

Examination of these SNPs suggested that as in our previous

work, only 10%–20% were co-dominant (Table 1; Allen et al.,

2011), with 10 251 putative co-dominant SNP markers identified

within 5308 contigs.

Co-dominant SNP validation

As co-dominant SNP assays are of significant interest to the wheat

community, it is important that such assays have a level of

polymorphism that is not significantly different to that previously

shown for dominant SNP assays. In addition, it is important that

the distribution of the co-dominant SNP markers across the three

homoeologous genomes do not show a bias beyond those shown

previously for mapped dominant SNP markers (Allen et al., 2011).

To assess both of these features, we selected a subset of 1337

putative co-dominant SNP assays for validation and further

analysis. While the selection process used to identify this subset

was essentially random, to aid further investigation, we selected

those SNPs that appeared, via the sequence analysis, to be

polymorphic between the parents of the UK mapping populations

Avalon 9 Cadenza and Savannah 9 Rialto. Of the 1337 SNPs

selected, we were able to design working KASPar assays for 1190

SNPs (89%; Data S1).

Genotyping of a panel of 47 wheat varieties using the 1190

KASPar probes resulted in 1138 probes (96%) generating data

that could be scored consistently and were polymorphic in at least

one of the varieties screened (Data S2). Examination of the

genotypic data revealed three types of varietal SNP markers:

co-dominant SNP markers (where homozygous scores were

detected for all hexaploid varieties, for instance, only scores of

A:A or T:T were obtained, Figure 1a); partially co-dominant SNP

markers (where heterozygous and homozygous scores were

detected in hexaploid varieties, that is, A:A, T:T and the mixed A:

T, Figure 1b); and dominant SNP markers (where a single

homozygous and heterozygous score was detected in hexaploid

ª 2012 The Authors

Plant Biotechnology Journal ª 2012 Society for Experimental Biology, Association of Applied Biologists and Blackwell Publishing Ltd, Plant Biotechnology Journal, 1–17

Alexandra M. Allen et al.2

Page 3: Discovery and development of exome ... - cerealsdb.uk.net · made available via the use of next-generation sequencing combined with a variety of genotyping platforms. However, many

varieties, that is, A:A and mixed A:T only, Figure 1c). Of the 1138

validated probes, 734 (65%) were co-dominant, 194 (17%) were

partially co-dominant and 210 (18%) were dominant. Dominant

and co-dominant markers were used to screen an F4 population

known to contain homozygote and heterozygote individuals.

Screening this population with co-dominant SNP assays resulted

in three separate clusters for the various homozygote and

heterozygote individuals (Figure 1d). However, screening the

same population with dominant SNP assays produced a more

scattered cluster where homozygous and heterozygous loci were

indistinguishable (Figure 1e). Screening the F4 population with

partially co-dominant SNPs yielded the same results as described

for both dominant and co-dominant SNP assays depending on

the genotypes of the population parents (data not shown).

The SNP markers developed through the NimbleGen exome

capture were compared with the existing database of SNP

markers developed from EST and normalised cDNA sequences

using the experimental procedures described in Allen et al., 2011;

(Table 2; Data S3). The number of co-dominant SNP assays

generated was significantly higher, and dominant SNPs signifi-

cantly lower, when compared with the previous data set of

validated EST/cDNA-derived SNPs (v2 = 131.98, P < 0.001).

Comparison of the two data sets showed that the numbers of

partially co-dominant SNP assays were not significantly different

between the two data sets (v2 = 3.87, P = 0.14). In addition, the

polymorphism information content (PIC) scores and minor allele

frequencies (MAF) were similar between the two data sets; they

were highest for the partially co-dominant SNP assays and lowest

for the dominant SNP assays.

Characterisation of the different SNP types

To characterise the different SNP marker types identified in

this study, and in particular, the co-dominant and partially

co-dominant SNP assays, several analyses were performed using

the contig sequences containing the SNPs. The average sizes of

contigs containing different SNP types were similar (co-dominant,

692 bp; dominant, 690 bp; partially co-dominant, 666 bp). We

hypothesised that co-dominant SNP assays were likely to be

derived from 5′ or 3′ untranslated regions (UTRs) of genic

sequences. In the absence of functional coding constraints, such

regions are more likely to have diverged between homoeologs

and thus represent effectively unique regions of sequence. To

address this hypothesis, SNP-containing contig sequences were

used to screen, via BLASTX (Altschul et al., 1990), the non-

redundant (nr) protein database. If a match was found (E-value

1e-5), a further analysis was then performed to identify whether

the SNP was located inside or outside the coding region. This

analysis showed that a higher proportion of the contig sequences

used to develop co-dominant SNP assays returned no hit when

subjected to a BLASTX analysis against the nr database, compared

with dominant SNP assay sequences. Where a hit was identified,

a higher proportion of the co-dominant SNPs were found to lie

outside the coding region, compared with dominant SNPs. The

number of co-dominant SNPs located within known coding

(a) (b) (c)

(d) (e)

Figure 1 KASPar plots of different varietal SNP

types screened against a panel of hexaploid wheat

varieties with examples of (a) a co-dominant SNP

assay, (b) a partially co-dominant SNP assay and

(c) a dominant SNP assay. Screening an F4population containing heterozygotes with a

co-dominant SNP assay results in a separate

cluster for heterozygote individuals (d). Screening

the same population with a dominant SNP assay

produces a more scattered cluster where

homozygote and heterozygote individuals are

indistinguishable.

Table 1 Summary of next-generation

sequence data and SNPs identified for eight

wheat varietiesVariety

No. of

sequences

(million)

No. mapped

reads (million)

No. of SNPs

(compared with

Avalon)

No. of co-dominant

SNPs (compared

with Avalon)

Proportion of total

SNPs that are

co-dominant (%)

Avalon 27.2 10.9 N/A N/A N/A

Alchemy 44.4 16.9 22 092 2558 11.6

Cadenza 20.2 4.9 8909 1550 17.4

Hereward 41.0 15.2 13 379 2399 17.9

Rialto 30.4 11.8 15 141 2662 17.6

Robigus 31.4 6.3 10 823 2114 19.5

Savannah 48.7 16.8 18 648 2818 15.1

Xi19 9.8 2.9 4391 899 20.5

ª 2012 The Authors

Plant Biotechnology Journal ª 2012 Society for Experimental Biology, Association of Applied Biologists and Blackwell Publishing Ltd, Plant Biotechnology Journal, 1–17

Wheat exome-based co-dominant SNPs 3

Page 4: Discovery and development of exome ... - cerealsdb.uk.net · made available via the use of next-generation sequencing combined with a variety of genotyping platforms. However, many

regions was significantly lower than would be expected to occur

by chance (v2 = 7.56, P = 0.02). Partially co-dominant SNPs were

midway between dominant and co-dominant SNPs (Figure 2a).

Across all SNP marker types, the average length of contigs

returning no hit was lower (approximately 520 bp) than the

average length of contigs returning BLASTX hits (approximately

770 bp), suggesting that contig length affected the likelihood of

obtaining a BLASTX match. To check whether the contigs

returning no hit represent genes that had not yet been

annotated, the same contig sequences were subjected to a

BLASTN analysis (E-value 1e-3) against the NCBI nr nucleotide

database; 86% of the contigs returning no hit from the BLASTX

analysis also had no match in the BLASTN nr database.

We further hypothesised that some co-dominant SNP assaysmay

have been derived from single-copy regions of the wheat genome.

Such regions may have either been unique to only one progenitor

genome or alternatively one or more copies have been lost since

polyploidisation. To address this second hypothesis, the same sets

of sequences were screened, using BLASTN, against the 5 9

Chinese Spring genomic raw reads at http://www.cerealsdb.uk.

net/. BLAST hit coverage was calculated for every nucleotide

position in the query sequence and averaged over the whole

sequence to derive a mean contig coverage. All three SNP types

peak in coverage at 15 9 , indicative of three gene copies each at

5 9 coverage; however, the co-dominant SNPs and to a lesser

extent the partially co-dominant SNPs had a secondary peak of

coverage at fivefold coverage, indicative of single-copy number

sequences. This peak is absent from the dominant SNPs (Figure 2b).

These analyses were combined by comparing the coverage of

sequences containing co-dominant SNPs that had different

BLASTX results. The sequences with 5 9 coverage are most

highly represented by those returning a ‘no hit’ from BLASTX

analysis (Figure 2c). When contig length was plotted against

median coverage for co-dominant SNPs, no relationship was

observed (r = �0.0009). A similar result (r = �0.0009) was

obtained by performing the same analysis using only sequences

returning no BLASTX hit, suggesting that the contig length does

not affect the number of hits returned from the BLASTN analysis

or the estimated level of coverage.

Map location of dominant, partially co-dominant andco-dominant SNP assays

The map positions of the different SNP marker types were

investigated to determine whether any bias in genetic location

was introduced by using co-dominant SNP assays in two doubled-

haploid mapping populations developed from UK cultivars

Avalon 9 Cadenza (A 9 C) and Savannah 9 Rialto (S 9 R). Of

the 3214 SNP markers developed to date (Table 2), 2109 were

identified as polymorphic between Avalon and Cadenza (via

screening of the 47 varieties above), of which 1807 were placed

on the Avalon 9 Cadenza map. These consisted of 1152 EST/

cDNA-derived markers and 655 NimbleGen-derived markers. Of

the remaining 1105 SNP assays not polymorphic between Avalon

SNP type

Validated

NimbleGen

SNPs (%)

Validated

EST/cDNA

SNPs (%)

Total validated

SNPs

Average minor

allele frequency

Average

PIC score

Dominant 210 (19) 1195 (58) 1407 0.249 0.273

Partially

co-dominant

194 (17) 437 (21) 632 0.315 0.315

Co-dominant 734 (64) 444 (21) 1175 0.270 0.287

All SNPs 1138 2076 3214 0.270 0.286

PIC, polymorphism information content.

Table 2 Summary of validated SNPs

0

10

20

30

40

50

60

Perc

enta

ge o

f tot

al n

umbe

r of

sequ

ence

s

No hits

SNP outside coding regionSNP within coding region

0

5

10

15

20

25

0 5 10 15 20 25 30

Freq

uenc

y

Coverage in 5x genome

no hit

SNP outside coding regionSNP within coding reigon

0100020003000400050006000700080009000

0 5 10 15 20 25 30

Freq

uenc

y

Coverage in 5x genome

Dominant

Co-dominant

Partially co-dominant

(a)

(b)

(c)

Figure 2 Characteristics of sequences containing dominant, partially

dominant and co-dominant SNP types. (a) BLASTX analysis against the

non-redundant (nr) protein database and (b) BLASTN against the 5 9

Chinese Spring raw reads. (c) Coverage of sequences containing

co-dominant SNPs against the 5 9 Chinese Spring raw reads classified

according to the BLASTX designation in (a).

ª 2012 The Authors

Plant Biotechnology Journal ª 2012 Society for Experimental Biology, Association of Applied Biologists and Blackwell Publishing Ltd, Plant Biotechnology Journal, 1–17

Alexandra M. Allen et al.4

Page 5: Discovery and development of exome ... - cerealsdb.uk.net · made available via the use of next-generation sequencing combined with a variety of genotyping platforms. However, many

and Cadenza, 562 were identified as polymorphic between

Savannah and Rialto and 541 of these markers were placed on

the Savannah 9 Rialto map. These consisted of 187 EST/cDNA-

derived markers and 375 NimbleGen-derived markers. To enable

comparisons between the maps, 231 evenly spaced loci from

the Avalon 9 Cadenza map were also included on the

Savannah 9 Rialto map (Figure 3; Data S3). For the Ava-

lon 9 Cadenza map, previously mapped SSR markers were used

to help assign linkage groups to chromosomes (http://www.wgin.

org.uk/resources/MappingPopulation/TAmapping.php; Data S4).

In total, 2350 (73%) of the validated SNP markers were mapped;

these comprised of 969 dominant SNP loci, 444 partially

co-dominant loci and 937 co-dominant SNP loci. In the Ava-

lon 9 Cadenza map, the linkage groups ranged from 54.5 to

239.0 centiMorgans (cM) in size, with 8–214 SNP markers. The

total map length was 2434.4 cM with an average spacing of

1.3 cM between SNP loci. In the Savannah 9 Rialto map, linkage

groups ranged from 1.3 to 221.1 cM, with 2–98 SNP markers.

The total Savannah 9 Rialto map length was 2861.8 cM with an

average spacing of 3.8 cM between SNP loci (Table 3). The two

linkage maps aligned well with each other, showing similar

arrangements of common loci within linkage groups.

In both populations, over 97.5% of the SNP markers could be

mapped unequivocally to a linkage group and assigned to a

unique chromosome position. The lack of markers on the short

arm of chromosome 1B in the Savannah 9 Rialto map can be

attributed to the presence of the same 1BL.1RS rye translocation

in both Savannah and Rialto, where the short arm of rye

chromosome 1B has replaced the short arm of wheat chromo-

some 1B (Figure 3). Clustering of SNP markers was observed in

both linkage maps, with 55% of A 9 C markers and 61% of

S 9 R markers being completely linked (0 cM distance between

them). Of the remaining markers, 81% of A 9 C markers are

separated by < 5 cM and 90% are within 10 cM of the next

marker. For the S 9 R map, these figures are lower (55%markers

separated by < 5 cM and 74% within 10 cM), probably due to a

smaller number of markers on the map. Similar levels of clustering

were observed for the different SNP marker types; 60% of A 9 C

co-dominant markers and 52% of dominant markers were

completely linked. Of the remaining co-dominant markers, 77%

of markers are within 5 cM of each other and 88% are within a

10 cM interval. The corresponding proportions for dominant

markers are similar (84% within 5 cM and 92% within 10 cM).

The different SNP types showed similar patterns of distribution

between the A, B and D linkage groups in both the Ava-

lon 9 Cadenza and Savannah 9 Rialto maps, with the only

difference of a higher proportion of the partially co-dominant

markers mapped to the D genome (Figure 4a). Similarly, although

clustering of SNP markers was observed within the linkage

groups, there was no obvious bias of different marker types

(Figure 3).

Summary statistics of mapped loci

The summary statistics of mapped SNP markers were compared

to assess whether different marker types had varying levels of

polymorphism in the 47 varieties screened and to ensure that the

co-dominant SNP markers developed in this study would be

useful across a wide range of material. The mean MAF and levels

of genetic diversity of SNP markers were compared between the

different marker types and assigned genomes of the

Avalon 9 Cadenza and Savannah 9 Rialto maps (Table 4). These

summary statistics were very similar for both maps; however,

differences within the maps were observed. Partially co-dominant

SNP assays had the highest average MAF and PIC scores, and

dominant SNP assays had the lowest. Loci from the separate

homoeologous genomes had consistent MAF and PIC measure-

ments, although the A and D genome measurements were

slightly higher than the B genome (Table 4). The different classes

of SNP loci showed differences in the distribution of MAF scores.

Co-dominant and partially co-dominant loci showed an increased

proportion of medium and high frequency alleles compared with

dominant loci (Figure 5a). Similarly, D genome loci showed a

trend to have higher MAF compared with A and B genome loci

(Figure 5b). The distribution of PIC scores showed that co-

dominant and partially co-dominant loci types had a higher

proportion of high PIC scores than dominant SNP assays

(Figure 5c). A and B genome loci had a similar distribution of

PIC scores, while D genome loci had a comparatively higher

proportion of high PIC scores (Figure 5d).

Discussion

In this study, we present a SNP discovery pipeline capable of

identifying large numbers of putative SNPs from genomic

sequence obtained by targeted exome capture. This proved an

efficient method to generate equivalent sequences from multiple

varieties from which we were able to generate over 90 000

putative SNPs between eight elite UK cultivars. Given our results,

this same approach is likely to prove highly effective at

identifying SNPs across a wide range of cultivars, and in a

wider range of germplasms, such as landraces, progenitors and

alien species. By cataloguing SNPs using a reference collection of

sequences derived from just the wheat exome, we provide a

unique context for each SNP, thereby both reducing the chance

of duplications within the SNP data set and allowing direct

comparisons between different wheat lines. A key advantage to

the SNP collection described here compared with other SNP

markers such as insertion site–based polymorphisms (ISBPs; Paux

et al., 2010) is that by the nature of the targeted sequencing,

all the SNPs developed are associated with genes, and as

such, are likely to prove useful in gene-based marker-assisted

breeding.

When examined further, the SNP database was shown to

contain 10 251 putative co-dominant SNPs. Validation of over

10% of the co-dominant SNP assays on the KASPar genotyping

platform resulted in a significantly improved validation rate

compared with our previous study with 96% being polymorphic

between the varieties screened compared with 67% as described

in Allen et al. (2011). This increased validation rate is probably

due to the use of genomic DNA, as opposed to transcriptome-

derived data, in the SNP discovery phase where the problems of

expression differences and presence of intron–exon splice sites

hindered effective SNP identification and primer design (Trick

et al., 2012). Of the subset of putative co-dominant SNP

assays, over 80% were validated as co-dominant or partially

co-dominant, compared with < 20% in previous studies where

random SNPs were validated (Allen et al., 2011). The occurrence

of putative co-dominant SNP assays, which were dominant when

validated, is likely to be due to the presence of homeologous

sequences that were not represented in the sequence data but

were amplified by the KASPar primers. This may be due to a

feature of individual sequences that prevent them from mapping

on to the assembly or a consequence of reduced sequence

coverage.

ª 2012 The Authors

Plant Biotechnology Journal ª 2012 Society for Experimental Biology, Association of Applied Biologists and Blackwell Publishing Ltd, Plant Biotechnology Journal, 1–17

Wheat exome-based co-dominant SNPs 5

Page 6: Discovery and development of exome ... - cerealsdb.uk.net · made available via the use of next-generation sequencing combined with a variety of genotyping platforms. However, many

AxAA C_1A SxR_1RR AXBS00010592 XBS00010791XBS00022505 XBS00003831XBS00011608 XBS00001634XBS00022650 XBS00010076XBS00003930 XBS00003880XBS00003911 XBS00003917XBSXX 00010032 XBSXX 00010106

D1_RxSD1_CxAxxB1_RxSB1_CxAxx

XBS00010488

XBS0XX 0023118

XBS0XX 0022361XBXX S00010128 XBS0XX 0001635XBS00004382 XBS0XX 0010039

XBS0XX 0010267XBS00022271 XBS0X 0022285

XBXX S00010488XBS00000713 XBS00004528XBS00005272 XBS00022506XBS00022677 XBS00022759XBS00022770 XBS00023176XBS00000742 XBS00022355XBS00022483 XBS00022566XBS00023130 XBS00023167XBS00023201

XBS00009466XBS00003691XBXX S00012226 XBXX S00022997XBX S00010128

XBS00010508 XBS00010868XBS00011695 XBS00011961

XBS00022310XBS00022702 XBS00022973XBS00022102 XBS00023102

XBS00023177XBS00022294 XBS00022101

XBS00003684XBS00011641 XBS00022030

XBS00022135XBS00022247 XBS00022687XBS00022632 XBS00022670XBS00022104 XBS00023162XBS00023122 XBS00022303XBS00022508 XBS00022575XBS00022011 XBS00022089XBS00022133 XBS00022324XBS00022375 XBS00021946XBS00022616 XBS00022888XBS00022581 XBS00022740

XBXX S00021710XBS0XX 0002301 XBXX S00022778XBXX S00022779XBXX S000107377 5 XBS00011783XBS0XX 0012250 XBXX S00012899XBS0XX 0021935 XBXX S00022093XBS0XX 0022639 XBXX S00022742XBS0X 0012525 XBX S00022571

XBS0XX 0021748XBXX S00003806

XBS00021723 XBS0XX 0022323

XBXX S00003806 XBS0XX 00038230

5

10

15

20 XBS00022271 XBS0XX 0022285XBS00022270 XBS0XX 0022701XBS00010054 XBS0XX 0012072

XBS00012226XBS0XX 0022070XBS0XX 0022934

XBS00003572 XBS0XX 0003761XBS0XX 0012052XBS0XX 0021662

XBXX S00011787 XBS00022391XBS00009885 XBS0XX 0009926

XBS0XX 0003851XBS00014906 XBS0XX 0021661XBS00009866 XBS0XX 0022152XBS00011128 XBS0XX 0011286XBXX S00021779 XBS0XX 0022201XBS00022130 XBS0XX 0010481XBXX S00022144 XBS0XX 0003758XBXX S00009444 XBS0XX 0009495XBS00009603 XBS0XX 0010004

XBXX S00010128XBS00022667XBS00022243 XBS00022443XBS00022682XBS00004033

XBS00004043

XBS00022581 XBS00022740XBS00021947 XBS00022490XBS00022536 XBS00022085XBS00022645 XBS00022748XBS00022749 XBS00022909XBS00022088 XBS00022739XBS00023213 XBS00022625XBS00021996 XBS00022007XBS00022722 XBS00021995XBS00022549 XBS00022648XBS00022690 XBS00023044

XBS00021954XBS00022783 XBS00022911

XBS00022005XBS00022281 XBS00003619XBS00003839 XBS00009381XBS00014413 XBS00021052

XBS00009827XBS00003822

XBS00010246 XBS00010992

XBS0XX 0012525 XBXX S00022571XBXX S000037477 3 XBS00004269XBXX S00009324 XBS00012443XBS0XX 0012480 XBXX S00023148XBS0XX 0022789XBXX S00022567 XBS00022614XBS0XX 0022849 XBXX S00023071XBXX S00022411 XBXX S00022530XBXX S00022851XBS0XX 0000764 XBXX S00012773XBS0XX 0022249 XBXX S00022251

XBXX S00003633 XBS00022539

XBS0XX 0003751

XBS0XX 0023049

XBS00010661 XBS0XX 0021819XBS0XX 0022984

25

30

35

40

XBS00009603 XBS0XX 0010004XBS00010416 XBS0XX 0010452XBS00010595 XBS0XX 0010643XBS00011835 XBS0XX 0011864XBS00021730 XBS0XX 0021777

XBS0XX 0021863XBS00022156 XBS0XX 0022217

XBS0XX 0022132XBS00022149 XBS0XX 0022167XBS00022216 XBS0XX 0022207XBS00000853 XBS0XX 0021864

XBS0XX 0022733XBS00009559 XBS0XX 0010187XBS00003656 XBS0XX 0003829XBS00021741 XBS0XX 0022172XBXX S00022173 XBS00022220

XBS0XX 0022158XBXX S000037477 3 XBS0XX 0009808XBS00011146 XBS0XX 0009459

XBS0XX 0021764

XBS00021740

XBS00023065 XBS00023141

XBS00012210

XBS00010246 XBS00010992XBS00013901

XBS00009675 XBS00015200XBS00011462 XBS00021866XBS00011379 XBS00003575XBS00003844 XBS00009921XBS00009977 XBS00011745XBS00012062 XBS00010875XBS00013824 XBS00018526XBS00009488 XBS00010352

XBS00009483XBS00011104 XBS00011333XBS00000496 XBS00003627XBS00009731 XBS00009983XBS00010428 XBS00011324XBS00012051 XBS00021907XBS00003677 XBS00010253XBS00011008 XBS00017597XBS00003567 XBS00011463XBSXX 00011712 XBSXX 00003600

XBS0XX 0021934 XBXX S00022609XBS0XX 0023194

XBX S00023173

XBS00011451 XBS0XX 0010486XBS00022633 XBS0XX 0022595XBS00022027 XBS0XX 0022962XBS00009639 XBS0XX 0010435XBXX S00014671 XBXX S00022623XBXX S00022768 XBXX S00021690XBS00022334 XBS0XX 0004282

XBS0XX 0011310XBXX S00003558 XBS0XX 0021702XBXX S00003559 XBS0XX 0018250

XBS0XX 0003816XBS0XX 0022041

XBXX S00001737

XBXX S00003558 XBS0XX 0017501XBXX S00021690 XBXX S00022623XBXX S00003559

45

50

55

60

65

XBS00009700 XBS0XX 0012042XBS00021665 XBS0XX 0021758XBXX S00022061 XBS00022239XBXX S00022396 XBS0XX 0022848

XBS0XX 0021759XBS0XX 0003863

XBXX S00021797 XBS0XX 0021798XBS0XX 0022356XBS0XX 0021778

XBS00012210

XBS00022885XBS00022426 XBS00022714XBS00022870

XBS00004266 XBS00021714XBS00022698 XBS00023101XBS00002000 XBS00002215XBXX S00009444 XBXX S00012496XBS00021672 XBS00021719XBS00021766 XBXX S00021779XBS00021808 XBS00021878XBS00021889 XBS00021916XBXX S00022144 XBXX S00022275XBS00022286 XBS00023126XBXX S00022173 XBXX S00022220XBS00022432

XBS00003634 XBS00009270XBS00009502 XBS00009547XBS00009952 XBS00010549XBS00011159 XBS00011222XBS00011450 XBS00011468XBS00012079 XBS00021900

XBS00011500XBS00018653XBXX S00010735XBS00001224

XBS00023113 XBS00023142XBS00009945XBS00023001XBS00022878

XBS00022343 XBS00022569XBS00022615 XBS00022842

XBS00022920XBS00012056 XBS00021711XBS00010130 XBS00001414

XBXX S00023173

XBS0XX 0000033 XBXX S00009709

XBS0XX 0022398 XBXX S00022676XBS0XX 0023105

XBS0XX 0000010

XBS0XX 0021877

XBS0XX 0022188

XBXX S00012449XBXX S00021851

XBXX S00004145 XBS00012350XBX S00012392 XBS00012455

70

75

80

85

90

XBS00021760 XBXX S00022061XBXX S00022239 XBXX S00022396

XBS00021879

XBS00022351

XBS00009510

XBSXX 00011062 XBSXX 0001127477XBS00011423 XBS00011841

XBS00011973XBS00010475 XBS00010779XBS00010888 XBS00012456

XBXX S00022093XBS0XX 0022468 XBXX S00022778XBXX S00022779 XBS00022883XBS00012021 XBS00010625

XBS00010399XBS00021988 XBS00021975XBS0XX 0010157 XBXX S00021710XBS00010536 XBS00010938XBS00003934 XBS00009706XBS00013948 XBS00022626XBXX S00023148 XBS00021680XBS00004328 XBS00010009XBS00022889 XBS00022392XBS00009500 XBS00012502XBS0X 0000651 XBX S00009324

XBS0XX 0021697

XBXX S00001128

XBS00011855 XBS0XX 0000424XBS00003573 XBS0XX 0003687XBS00003725 XBS0XX 0009375XBS00009377 XBS0XX 0009389XBS00009625 XBS0XX 0010248XBS00010402 XBS0XX 0010498XBS00010528 XBS0XX 0010669XBS00010713 XBS0XX 0010807XBS00010851 XBS0XX 0010946XBS00011041 XBS0XX 0011168XBS00011292 XBS0XX 0011672

XBS0XX 0011947

XBXX S00012392 XBS00012455XBXX S00013397

95

100

105

110

115

XBS00000226

XBS0XX 0000651 XBXX S00009324XBS00010148 XBS00010436XBS00011934 XBS00012029

XBXX S00022567XBS0XX 0022577 XBXX S00022851XBXX S00022530 XBS00022775

XBXX S00022411XBXX S00003633

XBS00021941 XBS00002250XBS00004129 XBS00010392

XBS00021876XBS00009909 XBS00011670

XBXX S0002317377XBS00012012

XBXX S00009709 XBS00010610XBS00022342XBS00003702XBS00011811XBXX S00001128XBS00022153

XBS0XX 0022484

XBS0XX 0021667 XBXX S00022802

XBS0XX 0011947XBS0XX 0022485XBS0XX 0022654XBS0XX 0009412

120

125

130

135

XBS00023203XBXX S00021797

XBS00022153140

145

Figure 3 Genetic linkage maps of wheat derived from 190 Avalon 9 Cadenza doubled-haploid lines and 95 Savannah 9 Rialto doubled-haploid

lines. Each linkage group was assigned to a chromosome indicated above the linkage group, and chromosomes are arranged with the short arm

above the long arm. SNP loci mapped in this study are designated XBS and are coloured according to the SNP type: Dominant SNP loci are shown in

black, co-dominant SNP loci are shown in red and partially co-dominant SNP loci are shown in blue. Common markers between the Avalon 9 Cadenza

and Savannah 9 Rialto maps are underlined. Map distances, calculated using the Kosambi mapping function, are shown in centiMorgans (cM) on the

ruler to the left of linkage groups.

ª 2012 The Authors

Plant Biotechnology Journal ª 2012 Society for Experimental Biology, Association of Applied Biologists and Blackwell Publishing Ltd, Plant Biotechnology Journal, 1–17

Alexandra M. Allen et al.6

Page 7: Discovery and development of exome ... - cerealsdb.uk.net · made available via the use of next-generation sequencing combined with a variety of genotyping platforms. However, many

D2_RxSD2_CxAxxB2_RxSB2_CxAxxA2_RxSA2_CxAxx

XBS00022024 XBS00023039XBS00023052XBXX S00022777

XBS00010624 XBS00003735

XBS00004496 XBS00009837XBS00010629 XBS00012284XBS00012592XBS00011949XBS00004089 XBS00004673

XBXX S00023068 XBXX S00010734XBXX S00009972

XBS00012159

XBS00014046

XBXX S00018412 XBXX S000184120

XBS00004385XBXX S00011271

XBS00023215XBXX S00003845 XBS00003867XBS00011101 XBS00011593XBS00014736 XBS00016676XBS00021691 XBS00021706

XBS00004156 XBS00012249XBS00012628 XBS00012784XBS00014923 XBS00022665XBXX S00003845 XBXX S00004112XBS00004243 XBS00005283XBS00012221 XBS00012228XBS00012313 XBS00013085XBS00013294 XBS00013474XBS00013669 XBS00018554XBS00022322 XBXX S00022777

XBS00003807XBS00004221 XBS00012561XBS00013258 XBS00022799XBS00023143XBS00004040XBS00003999 XBS00004067XBS00005270 XBXX S00011271XBXX S00011466 XBSXX 00011893

XBS00XX 010318 XBXX S00010711XBXX S00002660XBXX S00022059

XBS00XX 022493 XBXX S00003833XBS00XX 009263 XBXX S00010637XBS00XX 010700 XBXX S00012209XBS00XX 003719 XBXX S00009801XBS00XX 022330 XBXX S00023025

XBXX S00009848

XBS00014046XBS00004244 XBS00004262XBXX S00010734XBS00012471

XBS00012202

XBXX S00023068XBS00021707 XBS00021724

XBXX S00022059 XBXX S00022391XBXX S00002660

XBS00009771

XBS00003791 XBS00009976XBS00010492

XBXX S00009575 XBSXX 00010031

Xbarc1rr 24

Xbarc2rr 04XBS00018347 XBS00022305XBS00023062 XBS00023099XBS00022957

5

10

15

20

25

30

XBXX S00009604 XBS00009933XBXX S00022487

XBS00009730 XBS00016060XBS00022332

XBS00003779 XBS00011865XBS00022872 XBS00003574XBS0XX 0010194 XBXX S00009588XBS0XX 0021812 XBXX S00009667XBS00002994 XBS00003878XBS00003898 XBS00004004XBXX S00009256 XBSXX 00010672

XBS00012205 XBS00012239XBS00012409 XBS00013209XBS00013534 XBS00022327XBS00022331 XBS00023206

XBS00009697 XBS00022982XBXX S00009604 XBXX S00022487

XBS00010087

XBXX S00012081

XBXX S00022950

XBXX S00022060XBXX S00022734

XBS00XX 014974 XBXX S00009807XBXX S00012078XBXX S00010688XBXX S00010055XBXX S00021959

XBS00XX 011039 XBXX S00011187XBS00XX 004433 XBXX S00009305XBS00XX 010219 XBXX S00010829XBS00XX 010916 XBXX S00009962XBS00XX 009989 XBXX S00011958XBS00X 011990 XBX S00012103

XBS00023015 XBS00023195

XBS00023043

XBS00022730 XBS00022900

XBS00011425

XBS00011313 XBS00016116XBS00022976XBS00022210XBS00021888

XBS00010681 XBS00011752XBS00021852

XBS00010514 XBS00014219XBS00021923 XBS00003660XBS00009606 XBS00011740XBS00000002 XBS00003804

XBS00010131XBS00009673 XBS00010184XBS00XX 009932 XBXX S00014647

XBXX S00021865Xbarc1rr 68

35

40

45

50

55

60

65 XBS00011045 XBS00011467XBS00012040XBS00023157

XBS00022301 XBS00022494XBS00021767

XBXX S00011517 XBS00012126XBS00011945XBS00010696

XBS00022108

XBS00021739XBS00000590 XBS00021751XBS00003582 XBXX S00009588XBXX S00009667 XBXX S00010014

XBS00XX 011990 XBXX S00012103XBS00XX 009247 XBXX S00009461XBS00XX 011275 XBXX S00011315XBS00XX 012071 XBXX S00022558XBS00XX 011229 XBXX S00009442XBS00XX 010567 XBXX S00012036XBS00XX 022707 XBXX S00022014XBS00XX 016650 XBXX S00019285XBS00XX 022350 XBXX S00023145

XBXX S00003714XBXX S00003788 XBXX S00010012XBS00XX 011630 XBXX S00022091XBXX S00009258 XBXX S00011954XBS00XX 022295 XBXX S00003810XBS00XX 011531 XBXX S00011737XBS00XX 021896 XBXX S00022765XBS00XX 009722 XBXX S00010303XBS00XX 011709 XBXX S00021713

XBXX S00011757XBS00XX 022335 XBXX S00022781

XBS00000011 XBS00000092XBS00004120 XBS00004140XBS00022422 XBS00022503XBS00022651 XBS00022966XBS00004372 XBS00009815XBS00010513 XBS00012294

XBXX S00021865XBS00021823

XBS00021940 XBS00022019XBXX S00023211 XBS00022679

XBS00022917XBS00021912

XBS00000905 XBS00010472XBS00010792 XBS00011413XBS00021483 XBS00021927

XBS00022337XBS00009982XBS00011109XBS00021950

XBXX S00014647 XBXX S00021865XBXX S00023211XBS00021891

Xgwmww 539

65

70

75

80

85

90

95

XBS00010901XBS00001260XBXX S00011478XBS00003663XBS00011075XBS00004090

XBS00009959 XBS00021893XBS00022209

XBS0XX 0022321 XBXX S00011169XBS00009692 XBS00009737XBS00010760 XBS00011190XBS00011392 XBS00011539XBS00011792 XBS00012066XBS00013738 XBS00022150XBS00014087 XBS00010477XBXX S00009295 XBXX S00021937

XBXX S00009667 XBXX S00010014XBS00010932 XBXX S00011517XBS00012111 XBS00012154XBS00012545XBS00022641 XBS00022896XBS00022945XBS00004255 XBS00012320XBS00022241 XBS00022257XBS00022260 XBS00022377XBS00023123XBS00012237 XBS00023214

XBS00021693

XBXX S00022819XBS00XX 022314 XBXX S00022002XBS00XX 022712 XBXX S00023124

XBXX S00022010XBXX S00022112

XBXX S00003585 XBXX S00010432XBS00XX 011430 XBXX S00011483XBS00XX 022064 XBXX S00010523XBS00XX 003589 XBXX S00003597XBS00XX 003673 XBXX S00004405XBS00XX 009437 XBXX S00009519XBXX S00009736 XBXX S00009791XBS00XX 009864 XBXX S00010361XBS00XX 010640 XBXX S00011047XBS00XX 011276 XBXX S00011858XBS00XX 021895 XBXX S00022222XBS00XX 009460 XBXX S00010081

XBXX S00022478XBXX S00022185XBX S00010918

XBS00010513 XBS00012294XBS00012538 XBXX S00022112XBS00022374XBXX S00003585 XBXX S00003788XBS00004011 XBS00004223XBS00004224 XBS00004241XBS00004242 XBS00004417XBS00004422 XBS00012171XBS00012190 XBS00012336XBS00012382 XBS00012517XBS00022313 XBS00022394XBS00022497 XBS00022547XBS00022618 XBS00022674XBS00022713 XBS00022805XBS00023169XBS00003939 XBS00004130XBS00004228 XBS00004322XBS00004378 XBS00004420XBXX S00009258 XBXX S00009511XBS00012230 XBS00012390XBS00012468 XBS00012830

XBS00021828XBS00003584 XBS00009533XBS00009637 XBS00009901XBS00010323 XBS00010382XBXX S00010393 XBS00010442XBS00012014 XBS00021825XBS00021826 XBS00021827

XBS00010685

Xgwmww 320

XBS00022941 XBS00022942

XBXX S00010393

100

105

110

115

120

125 S00009 95 S000 93XBS00009616 XBS00009710XBS00010729 XBS00014037XBS00022467 XBS00022923

XBS00022925XBS00009362XBS00021676

XBS00023048XBS00022456 XBS00022461

XBXX S00011478 XBXX S00022265XBS00012343 XBS00016238XBS00022381 XBS00022585XBS00022813XBS00023144XBS00022409

XBXX S00011169

XBXX S00010918XBXX S00009418 XBXX S00011078

XBXX S00011384XBS00XX 019095 XBXX S00009594

XBXX S00010591XBS00XX 010859 XBXX S00011388

XBS00012468 XBS00012830XBS00020962 XBS00021892XBS00021908 XBS00021931XBS00022272 XBS00022421XBS00022445 XBS00022563XBS00022649 XBXX S00022837XBS00023011 XBS00023047XBS00011182XBS00012251 XBS00012493XBS00022399 XBS00022515XBS00009998

130

135

140

145

150

155XBXX S00011169

XBS00004303

XBS00010726

XBXX S00009418

160

165

170

175

180

185

XBS00010005 XBS00012328XBS00012379 XBS00023104

XBXX S00009418

XBS00000972 XBS00004200XBS00012438 XBXX S00022478

XBS00012467

190

195

200

205

210

Figure 3 Continued

ª 2012 The Authors

Plant Biotechnology Journal ª 2012 Society for Experimental Biology, Association of Applied Biologists and Blackwell Publishing Ltd, Plant Biotechnology Journal, 1–17

Wheat exome-based co-dominant SNPs 7

Page 8: Discovery and development of exome ... - cerealsdb.uk.net · made available via the use of next-generation sequencing combined with a variety of genotyping platforms. However, many

D3_RxSD3_CxAxxB3_RxSB3_CxAxxA3_RxSA3_CxAxx

XBXX S00010945 XBSXX 00021849XBS00012160XBXX S00009440 XBSXX 00012316

XBSXX 00009992XBSXX 00001335XBSXX 00003596

XBXX S00009476 XBSXX 00012531XBSXX 00011532

XBXX S00011806 XBS00022154XBXX S00022971 XBSXX 00011438XBXX S00003814 XBSXX 00010844XBXX S00012127 XBSXX 00011373XBXX S00004074 XBSXX 00009393

XBSXX 00010849XBS00022190XBSXX 00022622XBSXX 00009839

XBXX S00011966 XBS00004108XBXX S00011277 XBSXX 00011551XBXX S00009805 XBSXX 00011596XBXX S00003902 XBSXX 00022315

XBSXX 00010332

XBSXX 00022803

XBSXX 00003639

XBXX S00009476

XBSXX 00022669 XBS00023186XBXX S00022154

XBXX S00021930 XBSXX 00022212

XBSXX 00021920XBSXX 00022166

XBXX S00011395 XBSXX 000117377 8XBSXX 00010258

Xgwm383-3D

Xwmc529-3D

Xwmc294-3D XwXX mww c533-3DXBXX S00023210

XBS0000377477XBS00003733 XBS00009793XBXX S00010059 XBS00011516

XBS00011840

XBXX S00022006

XBS00000628

XBS00012160XBS00003932 XBS00004149XBS00004158 XBS00004414XBS00005117 XBXX S00010059XBS00021872 XBXX S00022006XBS00022279 XBS00022347XBS00022452 XBS00022798XBS00022985 XBS00023022XBS00019919

XBS00003933 XBS00022695

0

5

10

15

20

25XBSXX 00010332XBSXX 00022048XBSXX 00022242

XBXX S00003871 XBSXX 00009513XBXX S00022961 XBSXX 00010060XBXX S00010396 XBSXX 00011243

XBSXX 00010818XBXX S00010405 XBSXX 00010778XBXX S00010881 XBSXX 00019018XBXX S00015306 XBS00011042XBXX S00003864 XBSXX 00010826XBXX S00011064 XBSXX 00011570XBXX S00003708 XBSXX 00010795XBXX S00022415 XBS0002307477XBXX S00023037 XBSXX 00023227XBXX S00022072 XBSXX 00023191

XBSXX 00022741XBXX S00023121 XBS00023120

XBSXX 00022078XBXX S00022453 XBSXX 00022708XBX S00011264 XBSX 00011720

XBXX S00022190XBXX S00004108 XBS00011966XBSXX 00022387 XBS00022397XBSXX 00023188

XBSXX 00022501

XBS00022624 XBXX S00021981XBXX S00022844XBS00021976

XBS00022029 XBS00022129XBS00022016

XBS00022148 XBXX S00022516XBS00003801 XBS00013997

XBS00021699XBS00010947

XBS00003837 XBS00010204XBXX S00010531

XBS00011270 XBS00011929XBS00009357

XBS00003964 XBS00010284

XBS00003933 XBS00022695XBS00022746

XBS00009516

30

35

40

45

50

55

XBXX S00011264 XBSXX 00011720XBXX S00009411 XBSXX 00010715XBXX S00009289 XBS00022728

XBSXX 00011869XBSXX 00009475

XBXX S00015099 XBSXX 00022051XBXX S00009651 XBSXX 00003727XBXX S00003781 XBSXX 00012452XBXX S00000793 XBSXX 00009632XBXX S00022099 XBSXX 00022732

XBSXX 00022792XBS00022122

XBXX S00009990 XBS00009579XBSXX 00010136

XBXX S00009701 XBSXX 00009892XBXX S00017577 XBSXX 00010083XBXX S00022513 XBSXX 00022021XBXX S00023190 XBSXX 00000978XBXX S00010316 XBSXX 00001242XBXX S00009549 XBSXX 00020861 XBXX S00011042 XBS00022415

XBSXX 00021850XBXX S00003688 XBSXX 00021919

XBXX S00004334

XBS00003964 XBS00010284XBXX S00011158 XBS00011170

XBS00011798XBS00022631 XBS00022881XBXX S00022502 XBS00022882

XBXX S00022719XBS00022083 XBS00022325XBS00022629 XBS00022675

XBS00022691XBS00009465 XBS00010295XBS00010854 XBS00011612XBS00011660 XBS00022612

XBS00023026XBXX S00011012XBS00021909

XBS00010022 XBS00011456XBS00003778 XBS00010167XBS00000646 XBS00003768XBS00003924 XBS00010414XBS00017988 XBS00019326XBS00022703 XBS00022692XBSXX 00022752 XBSXX 00023087

XBS00001088 XBS00021694XBXX S00021981 XBXX S00022516XBS00022711 XBXX S00022844

XBS00022379XBS00012551XBXX S00010531XBXX S00011012XBS00003971 XBS00004183XBXX S00011158 XBXX S00022159XBXX S00022502 XBXX S00022719

60

65

70

75

80

85

XBXX S00022025 XBSXX 00022039XBXX S00022084 XBSXX 00022512XBXX S00022826 XBSXX 00022939

XBSXX 00023030XBSXX 00022316XBSXX 00022611

XBXX S00022440 XBSXX 00011937XBXX S00012388 XBSXX 00013262

XBSXX 00011766XBXX S00022219 XBSXX 00022140XBXX S00010324 XBSXX 00012080

XBSXX 00009337XBSXX 00022131

XBXX S00022403 XBS00022245XBXX S00010572 XBSXX 00022904

XBSXX 00021978

XBXX S00011042 XBS00022415XBXX S0002307477 XBS00023120XBSXX 00003262 XBS00009579XBSXX 00010407 XBS00011022XBSXX 00012450 XBS00012472XBSXX 00012549 XBS00012743XBXX S00022122 XBXX S00022474XBXX S00022728 XBXX S00022774XBSXX 00023091XBSXX 00005059

XcXX fd009ff -3D

XBS00022845 XBS00022989XBS00022533XBS00021980XBS00009740XBS00022058

XBXX S00011171 XBS00011231XBS00011280 XBS00011812XBXX S00011888 XBS00012084

XBS00021772 XBS00022552

XBXX S00011171 XBXX S00011888

XBS00022256XBS00003935 XBXX S00022245XBS00022459 XBS00022658XBS00022804 XBS00022884

90

95

100

105

110

115

XBS00022715

XBSXX 00003836XBSXX 00003931

XBXX S00021967 XBS00022441

XBXX S00022715

XBS00009649 XBS00022410XBXX S00022182 XBS00022862

120

125

130

135

140

145

XBXX S00003717 XBS00009671XBXX S00011898 XBS00022401

XBSXX 00003884

XBXX S00022441

XX

XBS00013724XBS00022419 XBS00022915

XBS00022735

XBS00022092

XBXX S00003849

XBS00022273XBXX S00022182

XBS00003956 XBS00022968

150

155

160

165

170

175

XBSXX 00018668

XBXX S00009671

XBS00003956 XBS00022968

XBXX S00003849 XBXX S00021754XBXX S00022401

180

185

190

195

Figure 3 Continued

ª 2012 The Authors

Plant Biotechnology Journal ª 2012 Society for Experimental Biology, Association of Applied Biologists and Blackwell Publishing Ltd, Plant Biotechnology Journal, 1–17

Alexandra M. Allen et al.8

Page 9: Discovery and development of exome ... - cerealsdb.uk.net · made available via the use of next-generation sequencing combined with a variety of genotyping platforms. However, many

D4_RxSD4_CxAxxB4_RxSB4_CxAxxA4_RxSA4_CxAxx

XBS00003914

XBS00017267XBS00001207 XBS00003776

XBS00022015

XBS00011060

XBS00022125XBS00022189

XBSXX 00011173 XBXX S00011378

XBXX S00003914

XBXX S00001207

XBXX S00022015

XBXX S00011038XBXX S00022183

XBS00022846 XBXX S00022847

XBS00022090 XBS00022988XBS00009915

XBS00022113XBS00021924XBS00021903

XBSXX 00003854 XBS00018120XBS00023046XBS00023095

XBXX S00022376 XBXX S00023046XBXX S00023131

XBS00023014

Xgwmww 194g

0

5

10

15

20

XBSXX 00001241 XBS00009974XBSXX 00011261 XBS00021736XBS00022169 XBS00022174

XBS00022661XBSXX 00022816 XBS00015312XBSXX 00011628 XBS00010339XBSXX 00010925 XBS00011273

XBS00012054XBSXX 00009492 XBS00011224

XBS00010455XBS00019518

XBSXX 00022213 XBS00023031XBS00003623

XBSXX 00003674 XBS00010216XBS00010504

XBSX 0001111 5 XBS00022418

XBXX S00009970

XBXX S00021737 XBSXX 00021738

XBXX S00021715

XBS00010235 XBXX S00011148

XBS00009426 XBXX S00009480XBS00010115 XBXX S00012107

XBXX S00015504XBXX S00011510

XBS00010830 XBXX S00014274XBXX S00022534

XBS00022686 XBXX S00021984XBXX S00022785

XBS00010374 XBXX S00011338XBX S00022038

XBS00009915

XBS00012457

XBS00012419

XBS00011336

XBS00012006 XBS00023035XBS00021722 XBS00022582XBS00022646 XBS00022653

XBS00023014XgXX wmww 194g

XgwXX mww 609

25

30

35

40

45

XBSXX 0001111 5 XBS00022418

XBS00011939

XBS00010582

XBS00022837

XBS00018651

XBXX S00021727

XBXX S00022038XBXX S00021986XBXX S00022258

XBS00022090 XBXX S00022161XBS00022193 XBXX S00022227XBS00009253 XBXX S00009439XBS00003646 XBXX S00011851XBS00011859 XBXX S00022809XBS00022179 XBXX S00009530XBS00018707 XBXX S00022576XBS00009915 XBXX S00014289XBS00003858 XBXX S00009672

XBXX S00011085XBS00022186 XBXX S00022194

XBXX S00020575XBS00010940 XBXX S00009774XBS00022184 XBXX S00003759XBS00010067 XBXX S00010409XBS00021752 XBXX S00022136

XBXX S00022653XBXX S00022046 XBS0002258XX 2

Xgwmww 624

50

55

60

65

70

XBS00023220

XBS00022363 XBXX S00022364XBS00003765 XBXX S00011040

XBXX S00017892XBXX S00009272 XBXX S00022646

XBXX S00009342XBXX S00004407 XBXX S00011336

XBXX S00022830XBXX S00022855

XBS00022972 XBXX S00022413XBXX S00023035XBXX S00012006

XBS00014075 XBXX S00022466XBXX S00023179

XBS00003704 XBXX S00003927XBS00022018 XBXX S00022055

XBXX S00022808XBS00022793 XBXX S00023024

XBXX S00023051XBS00022366 XBXX S00023204XBS00009333 XBXX S00010015

75

80

85

90

95

XBS00022395XBS00021957

XBS00009882 XBS00010716XBSXX 00000346 XBS00009590XBSXX 00009387 XBS00011583XBSXX 00011623 XBS00010645XBSXX 00021989 XBS00010766XBSXX 00021958 XBS00022929

XBS00022932XBS00023219XBS00023164

XBS00023007

XBXX S00012482 XBSXX 00020151

100

105

110

115

120

XBSXX 00010116 XBS00010202XBS00009680XBS00009703

XBSXX 00003587 XBS00023012XBS00017912XBS00022546XBS00022338

XBSXX 00022996 XBS00003692XBS00022414

XBXX S00021957 XBXX S00022395

XBXX S00010716

125

130

135

140

145

XBXX S00023164

150

155

160

165

Figure 3 Continued

ª 2012 The Authors

Plant Biotechnology Journal ª 2012 Society for Experimental Biology, Association of Applied Biologists and Blackwell Publishing Ltd, Plant Biotechnology Journal, 1–17

Wheat exome-based co-dominant SNPs 9

Page 10: Discovery and development of exome ... - cerealsdb.uk.net · made available via the use of next-generation sequencing combined with a variety of genotyping platforms. However, many

XBXX S00004373 XBS00010953 XBXX S00003643 XBS00022116 XBS00022336 XBXX S00003944

XBXX S00011294 XBXX S00022336

XBSXX 00021901XBSXX 00022100XBSXX 00022067

XBS00012312 XBXX S00021670

XBS00000763

0

5

D5_RxS2D5_CxAxxB5_RxSB5_CxAxxA5_RxSA5_CxAxx

XBS00022116

XBS00023008

XBS00003696

XBS00021708XBXX S00010218 XBS00011722

XBS00021873XBS00011537XBXX S00011116

XBS00023008XBXX S00021743

XBXX S00023018XBXX S00021734

XBXX S00021773

XBS00009264 XBSXX 00010981

XBS00022115 XBSXX 00003739XBSXX 00021732

XBS00010153 XBSXX 00011495

XBXX S00022107 XBS00022773XBXX S00023134XBXX S00004283 XBXX S00023072XBXX S00023226 XBSXX 00022228

10

15

20

25

30

35

40

XBS00021963XBXX S00022034 XBS00022500

XBS00022110XBS00011278XBS00005311

XBXX S00000645 XBS00021801XBS00000373XBS00009484XBS00021953XBS00022815

XBXX S00015653 XBS00011873XBXX S00003746 XBS00009531XBXX S00009658 XBS00022495XBXX S00022191 XBS00010275

XBS00010966XBS00000435XBS00011318XBS00000648

XBS00021708XBXX S00022710XBXX S00010190XBS00022110XBXX S00022812XBXX S00000615 XBS00000962XBS00003746 XBSXX 00004142XBXX S00004148 XBS00004187XBXX S00004202 XBS00012772XBXX S00013703 XBS00022191XBXX S00022358 XBS00022509XBXX S00022914 XBS00022974XBXX S00023180XBXX S00022987XBXX S00023013XBS00011116XBXX S00022683

XBS00011095 XBSXX 00015136XBS00010491 XBSXX 00011480XBS00022107 XBSXX 00022602

XBS0002277377XBSXX 00018917

XBSXX 00003726XBS00021695 XBSXX 00010616

XBSXX 00011186XBSXX 00003715XBSXX 00003789XBSXX 00021952XBSXX 00022056XBSXX 00022326XBSXX 00001817

XBXX S00004201 XBXX S00012140XBXX S00022647 XBXX S00022886XBXX S00004698XBXX S00001597 XBXX S00003993XBXX S00022231 XBXX S00022477XBXX S00012503XBXX S00004195XBXX S00021868 XBXX S00022899XBX S00022956

XBSXX 00022699

XBS00022559XBSXX 00009821

XBXX S00022157 XBS00022537XBSXX 00022850

XBS00011469 XBSXX 00011794XBS00011829 XBSXX 00022036

XBSXX 00022291XBS00011077 XBSXX 00011769XBS00004065 XBSXX 00013935XBS00022688 XBSXX 00009287

XBSXX 00021687XBS00003783

XBS00023103

45

50

55

60

65

70

75

XBS00011890XBS00022457

XBXX S00011235 XBS00021669XBXX S00021678 XBS00022215

XBXX S00021955 XBS00011227XBS00001735XBS00009799XBS00009369

XBXX S00018510 XBS00021942XBXX S00022071 XBS00022638XBXX S00010757 XBS00010419XBXX S00010702 XBS00011360

XBS00011605XBXX S00022069 XBS00022893

XBXX S00009745

XBXX S00021725

XBXX S00022608 XBS00022761XBXX S00022762 XBS00022817XBXX S00022818 XBS00022936

XBSXX 00001817

XBS00003565 XBSXX 00004427XBSXX 00010573XBS0002267377XBS00011282XBSXX 00004296XBSXX 00022755XBSXX 00018740XBSXX 00011633XBSXX 00022141XBSXX 00003586XBSXX 00021948

XBS00010703 XBSXX 00022476XBS00021709 XBSX 00012098

XBXX S00022956XBXX S00000703XBXX S00022433 XBXX S00022991XBXX S00021960XBXX S00003655

XBXX S00003942 XBXX S00022065XBXX S00023067XBXX S00009621 XBS00011282XBXX S00022689

XBXX S00010337 XBXX S00012348XBXX S00022673 XBS00023127XBXX S00023139

XBS00003783XBS00003826 XBSXX 00009294

XBSXX 00021731XBS00000622 XBSXX 00003711XBS00010322 XBSXX 00012069

XBSXX 00021742XBXX S00021991 XBS00022277XBXX S00022570 XBS00022814XBXX S00009777 XBS00023094

XBS00003563

80

85

90

95

100

105

110

115 XBXX S00022069 XBS00022893

XBXX S00022128 XBS00022198XBXX S00010882 XBS00011298

XBS00022200

XBS00022205XBXX S00021968 XBS00022838

XBS00022867

XBXX S00022818 XBS00022936XBXX S00023029 XBS00023076XBS00022457

XBS00010757 XBSXX 00022311XBXX S00009369

XBS00021709 XBSXX 00012098XBS00021960

XBS00009388 XBSXX 00021671XBS00003655 XBSXX 00009378XBS00022065 XBSXX 00022852XBS00010287 XBSXX 00010802XBXX S00011183 XBS00023067

XBS00009621XBS00003592 XBSXX 00023064

XBSXX 00009816XBS00011049 XBSXX 00018073

XBSXX 00022718XBS00022963 XBSXX 00023077XBS00009443 XBSXX 00003615XBS00003637 XBSXX 00003770XBS00011453 XBSXX 00011514XBS00022780 XBSXX 00003904

XBSXX 00010311XBSXX 00003740

XBS00012038 XBSXX 00017264XBSXX 00000592

XBXX S00009335 XBS00012975XBXX S00023163

XBS00003563

XBS00022559

XBS00004065 XBS00022688

115

120

125

130

135

140

145

150 XBS00022867XBS00022299 XBS00021703XBXX S00021969 XBS00022681

XBS00023070XBSXX 00022545 XBS00022630

XBS00022864XBS00022028

XBS00022890 XBS00023125XBXX S00003501 XBS00009274XBXX S00011915 XBS00021684XBXX S00000006 XBS00000028XBXX S00000037 XBS00001953XBXX S00003693 XBS00010048XBXX S00011221 XBS00011592XBXX S00011816 XBS00011846

XBS00021860XBS00010234

XBXX S00023016

XBXX S00021756 XBS00021757XBS00022128 XBSXX 00022891

XBXX S00022656

XBSXX 00000592XBS00021971 XBSXX 00022017

XBSXX 00022400XBS00022918 XBSXX 00023216XBS00000701 XBSXX 00010590

XBSXX 00009414XBSXX 00009589XBS00009335

XBS00014912 XBSXX 00014954XBSXX 00010017

XBS00010923 XBSXX 00022033XBS00022763 XBSXX 00022068XBS00022408 XBSXX 00022471XBS00023098 XBSXX 00022520XBS00022578 XBSXX 00022640XBS00023096 XBSXX 00022040XBS00022063 XBSXX 00022999XBS00015364 XBSXX 00000865XBS00003724 XBSXX 00003736XBS00009373 XBSXX 00009382XBS00009581 XBSXX 00009851

XBXX S00003705 XBS00021925XBXX S00022555 XBXX S00022743XBXX S00023081XBXX S00022357XBXX S00003876 XBS00022151XBXX S00023161

XBS00000622 XBXX S00003580XBS00003995 XBXX S00019894XBS00021991 XBS00022277XBS00022352 XBXX S00022814XBS00023094

XBS00003783

XBS00022537

XBS00003563

155

160

165

170

175

180

185

XBS00021703 XBXX S00022299

XBS00021969 XBXX S00023070

XBXX S00022664

XBXX S00004166 XBS00015694XBS00022890XBS00000006 XBXX S00010234XBS00022630 XBXX S00023125

XBS00009581 XBSXX 00009851XBS00009931 XBSXX 00010033XBS00010366 XBSXX 00010542XBS00010581 XBSXX 00010636XBS00010675 XBSXX 00010863XBS00010886 XBSXX 00010929XBS00010952 XBSXX 00011006XBS00011123 XBSXX 00011635XBS00011782 XBSXX 00011946XBS00021869 XBSXX 00003879XBS00010341 XBSXX 00010720XBS00011474 XBSXX 00011750XBS00010631 XBSXX 00011108XBS00021994 XBSXX 00011558XBS00010124 XBSXX 00010301XBS00010532 XBSXX 00012035XBS00011931 XBSXX 00010044

XBSXX 00015089XBSXX 00022075

XBS00022600 XBSXX 00023078XBSXX 00021949

190

195

200

205

210

215

220

XBXX S00012489 XBSXX 00009366XBS00009311 XBSXX 00009607XBS00010292 XBSXX 00011386

XBSXX 00003681XBS00003705XBSXX 00009789

XBXX S00009719 XBS00003876XBSXX 00011205

XBS00003900 XBSXX 00011655XBSXX 00016286

XBS00022293 XBSXX 00022663XBS00023132 XBSXX 00022086XBS00022103 XBSXX 00022662

XBSXX 00022652XBS00022151XBSXX 00022037

XBS00022542 XBSXX 0002317477

225

230

235

Figure 3 Continued

ª 2012 The Authors

Plant Biotechnology Journal ª 2012 Society for Experimental Biology, Association of Applied Biologists and Blackwell Publishing Ltd, Plant Biotechnology Journal, 1–17

Alexandra M. Allen et al.10

Page 11: Discovery and development of exome ... - cerealsdb.uk.net · made available via the use of next-generation sequencing combined with a variety of genotyping platforms. However, many

D6_RxSD6_CxAxxB6_RxSB6_CxAxxA6_RxSA6_CxAxx

XBXX S00011125 XBS00011445XBS00010922XBS00010458

XBS00011642

XBS00012530

XBXX S00011125 XBS00012304XBXX S00021982

XBS0XX 0010693XBS00003786 XBS0XX 0011821XBXX S00018835 XBS0XX 0003593XBXX S00003622 XBS0XX 0003659XBXX S00003748 XBS0XX 0003772XBXX S00003792 XBS0XX 0003812XBXX S00009825 XBS0XX 0010277XBXX S00010439 XBS0XX 0010551XBXX S00011746 XBS0XX 0015492

XBS0XX 0016587XBXX S00012481 XBS0XX 0022232XBXX S00022491 XBS0XX 0003785XBXX S00009795 XBXX S00009967

XBS0XX 0004009

XBS0XX 0012308

XBXX S00003786

XBS0XX 0021867 XBS00022481XBS00009806XBS00022795XBXX S00021983XBS00010742XBXX S00009514

XBS00009514XBS00021983 XBS00022094

0

5

10XBXX S00021982 XBXX S00022843

XBS00003632 XBS00004278XBS00009759 XBS00010401XBS00011131 XBS00012150

XBS00012414XBS00022946XBS00022926XBS00023192

XBS00009331 XBS00010600XBS00011010 XBS00011436XBS00003635 XBS00009584XBS00022292 XBS00022660

XBS00022951XBS0002305455

XBS00003861 XBS00009664XBS00009729 XBS00009746XBS00009782 XBS00010360XBS00010933 XBS00011711XBS00009985 XBS00011034

XBS00012319

XBS00010144 XBXX S00010600XBS00011591 XBXX S00022951XBXX S00023054

S00009795 BS0000996XBS0XX 0010580

XBXX S00010443 XBS0XX 0010869XBXX S00011381 XBS0XX 0022499

XBS0XX 0010403

XBS0XX 0012491XBXX S00023196 XBS0XX 0003852XBXX S00003893 XBS0XX 0010415XBS00010420 XBS0XX 0010985XBXX S00011365 XBS0XX 0011665

XBS0XX 0023223XBS00004016 XBS0XX 0009844XBXX S00010179 XBS0XX 0010602XBXX S00011624 XBS0XX 0011995

XBS0XX 0011604

XBS0XX 0021677

10

15

20

25

30

35

XBSXX 00009319 XBXX S00009783XBS00010777 XBS00010850XBS00011861 XBS00003881XBS00004377 XBS00010441XBS00016670 XBS00021765XBS00009988 XBS00011265XBS00011789 XBS00013851XBS00014317 XBS00001132XBS00003581 XBS00009871XBS00010497 XBS00010872XBS00011122 XBS00014117XBS00010749 XBS00014510

XBS00015710XBS00001037

XBS00021999 XBS00023086XBS00009376XBS00011607XBS00022704XBS00022057XBS00010811

XBS00022947 XBS00022031

XBS00001234

XBS00021747 XBXX S00021999XBXX S00023086XBS00021343

XBXX S00021674 XBS0XX 0021686XBXX S00021698 XBS0XX 0021705XBXX S00022772 XBS0XX 0011036XBXX S00011301 XBS0XX 0011809XBXX S00012093 XBS0XX 0019726XBS00003620 XBXX S00003897XBXX S00009695 XBS0XX 0023208XBXX S00022096 XBXX S00010223XBXX S00014588 XBSXX 00010093

XBSXX 00022668XBSXX 00022226

XBXX S00022163 XBSXX 00022370XBS00010328 XBXX S00022117XBXX S00010537 XBSXX 00003641XBXX S00014363 XBSXX 00022155

XBXX S00003671XBXX S00003562 XBSXX 00011429XBXX S00010993 XBSXX 00011479

XBSXX 00023187XBSXX 00023032

XBSXX 00022865

XBSXX 00022621 XBS00022823XBXX S00010420

XBS0XX 0003891 XBS00012411

XBS00021970

XBS00022856

XBS00019487

40

45

50

55

60

65

XBS00022947 XBS00022031XBS00022992XBS00023089XBS00023119

XBS00022120 XBS00022412XBS00022553 XBS00022605XBS00022879 XBS00022892XBS00022894 XBS00022913

XBS00023183XBS00023092XBS00023020

XBS00022613 XBS00022628XBS00022840 XBS00022836XBS00001405 XBS00003616XBS00003676 XBS00003877XBS00003916 XBS00009261XBS00010576 XBS00010586XBS00010780 XBS00011696XBS00011982 XBS00012028XBS00010730 XBS00019008

XBXX S00023033

XBS00000043 XBXX S00022553XBXX S00022605 XBS00022913

XBS00022832 XBXX S00022937XBSXX 00022480XBSXX 00023224

XBSXX 00022081

XBXX S00022937XBS0XX 0021658

XBXX S00019487

XBS00019487

XBS00012213

XBS00021862 XBS00021881

70

75

80

85

90

95

XBS00018855XBS00022489 XBS00022592XBS00003835 XBS00021961

XBS00021965

XBS00023088

XBXX S00021965XBS00000749 XBXX S00022372XBS00023199

XBS0XX 0011458

XBS0XX 0010618

XBS0XX 0023066

XBXX S00023042 XBS0XX 0022444XBXX S00022240 XBS0XX 0022529XBXX S00022437 XBS0XX 0023021XBXX S00009548 XBS0XX 0010627XBXX S00010870 XBS0XX 0011603XBXX S00012034 XBS0XX 0022828XBXX S00022278 XBS0XX 0011795XBX S00010121 XBS0X 0010342

XBS0XX 0010902 XBS00012454XBS0XX 0022672

XBXX S00003897 XBS00010223XBS0XX 0012274 XBS00022709XBS0XX 0023168XBS0XX 0004012 XBS00021726XBS0XX 0004272 XBS00021729XBS0XX 0022345 XBS00022599XBS0XX 0023050 XBS00023061XBS0XX 0023217XBXX S00003562XBXX S00003671 XBS00004016

XBS00009390 XBXX S00009835XBS00012510 XBXX S00010049XBS0XX 0001068 XBS00003569XBS00003697 XBXX S00010186XBS00010787 XBXX S00011542

XBXX S00011061

XBS00003568 XBXX S00003653XBS00003753 XBXX S00003827XBS00003903 XBXX S00011111XBS00012247 XBXX S00014600

XBXX S00022204XBS00022206

XBS00009390 XBXX S00009835XBS00010049 XBXX S00010186XBS00012510XBS00010540 XBXX S00011542XBS00003697 XBXX S00011061

100

105

110

115

120

125

XBS00003339 XBS00009351XBS00010195 XBS00012050XBS00010526 XBS00011578

XBS00012046XBS00010408 XBS00011376XBS00003818 XBS00003958

XBS00009314XBS00003723 XBS00009300

XBS00003905XBS00011962

XBS00010129 XBS00010326XBS00011404XBS00021704

XBS00022953 XBS00023197XBS00022095 XBS00022164

XBXX S00010121 XBSXX 00010342 XBXX S00003671 XBS00004016XBXX S00022117XBS0XX 0011671XBXX S00022096XBS0XX 0022462

XBS00022206XBS00023009

XBS00003653 XBXX S00003753XBS00003827 XBXX S00003903XBS00004157 XBS00010516XBS00011111 XBXX S00012247XBS00014600 XBXX S00022204XBS00022787

125

130

135

140

145

150

Figure 3 Continued

ª 2012 The Authors

Plant Biotechnology Journal ª 2012 Society for Experimental Biology, Association of Applied Biologists and Blackwell Publishing Ltd, Plant Biotechnology Journal, 1–17

Wheat exome-based co-dominant SNPs 11

Page 12: Discovery and development of exome ... - cerealsdb.uk.net · made available via the use of next-generation sequencing combined with a variety of genotyping platforms. However, many

XBS00003665 XBXX S00009404XBXX S00010783 XBXX S00011724XBXX S00011813 XBXX S00022146XBXX S00010887 XBXX S00016937

XBXX S00009332XBXX S00022705XBXX S00022406

XBX S00010006 XBX S00012133

XBXX S00012368

XBX S00009404 XBX S00009736

XBSXX 00003494 XBSXX 00003760XBSXX 00003830 XBSXX 00010557

XBSXX 00012117XBSXX 00022009 XBSXX 00021972

XBXX S00023198

XBXX S00012587 XBXX S00013172XBXX S00022522 XBXX S00023034XBXX S00023166XBXX S00019563

XBXX S00022610XBXX S00022825 XBXX S00023150

XBXX S00023159

XBXX S00009457 XBXX S00018030

XBXX S00001983 XBXX S00022948XBXX S000231080

5

D7_RxSD7_CxAxB7_RxSB7_CxAxA7_RxSA7_CxAx

XBXX S00010006 XBXX S00012133XBXX S00010796

XBXX S00009543 XBXX S00011883XBXX S00022082 XBXX S00022404

XBXX S00010988XBXX S00023055

XBXX S00009404 XBXX S00009736

XBXX S00022386XBXX S00022538XBXX S00016214 XBXX S00022724

XBXX S00010783XBXX S00012033

XBSXX 00011069 XBSXX 00009290XBSXX 00003350

XBSXX 00010500 XBSXX 00022045XBSXX 00022106XBSXX 00022390XBSXX 00022053XBSXX 00009556

XBSXX 00009398

XBXX S00012174XBXX S00022087 XBXX S00022756

XBXX S00009457 XBXX S00018030XBXX S00021838 XBXX S00022077

XBXX S00010427

XBXX S00022203

10

15

20

25

30

35

XBXX S00022076

XBXX S00013872XBXX S00002556XBXX S00011330

XBXX S00022097

XBXX S00022145XBXX S00009978

XBXX S00010020 XBXX S00010307XBXX S00010765XBXX S00010605XBX S00022187

XBXX S00022406 XBS00022720XBXX S00022857 XBXX S00023147

XBXX S00010796

XBXX S00000345

XBSXX 00010251XBSXX 00009861

XBSXX 00023023 XBSXX 00022455

XBSXX 00009464 XBSXX 00010082XBSXX 00010454 XBSXX 00010660

XBSXX 00011065XBXX S00023166XBXX S00023034

XBXX S00022087 XBXX S00022756XBSXX 00023059XBSXX 00010327

XBSXX 00022195 XBSXX 00003664XBSXX 00003672 XBSXX 00010560XBSXX 00009879 XBSXX 00010348

XBXX S00004171 XBSXX 00004346XBXX S00004350 XBXX S00010355

XBXX S00009403XBXX S00011872XBXX S00009565XBXX S00010064

40

45

50

55

60

65

XBXX S00022187XBXX S00022435

XBXX S00021964 XBXX S00022469XBXX S00022475 XBXX S00022306XBXX S00021771 XBXX S00021769

XBXX S00021770XBXX S00011072 XBXX S00021789

XBXX S00021997XBXX S00022751XBXX S00022757

XBXX S00022049 XBXX S00022959XBXX S00011244XBXX S00009477XBXX S00011241

XBXX S00009346 XBXX S00011248XBXX S00022079

XBXX S00022202 XBXX S00022895XBXX S00009694 XBXX S00009838XBXX S00011179 XBXX S00011701

XBXX S00022004XBXX S00023128

XBXX S00004136

XBXX S00022442 XBXX S00023172XBXX S00010988 XBXX S00023055

XBXX S00022076

XBXX S00012317

XBSXX 00005062 XBSXX 00009455XBSXX 00010208

XBSXX 00010181 XBSXX 00021666XBSXX 00009245 XBSXX 00009379

XBSXX 00010819XBSXX 00003756 XBSXX 00004376

XBSXX 00009619XBSXX 00021717 XBSXX 00009794XBSXX 000107477 7 XBSXX 00022175XBSXX 00003703 XBSXX 00010689XBSXX 00011571 XBSXX 00023069XBSXX 00003649 XBSXX 00011767XBXX S00004171 XBSXX 00004403XBXX S00010355 XBSXX 00011217

XBSXX 00011399

XBXX S00010142

XBXX S00022463

XBXX S00021987 XBXX S00022511XBXX S00022721 XBXX S00022875

XBXX S00023045XBXX S00023184XBXX S00003945

XBS00010071 XBXX S00021745XBS00011507 XBXX S00012122XBXX S00003749 XBXX S00009623

XBXX S00011639

XBXX S00022463

XBXX S00021861

70

75

80

85

90

95

XBXX S00022983 XBXX S00023027XBXX S00009550XBXX S00009975

XBXX S00003894

XBXX S00010809 XBXX S00022170

XBXX S00022097XBXX S00021692

XBXX S00022145 XBS00022235XBXX S00022290 XBXX S00022907

XBXX S00003843XBXX S00000816 XBXX S00011244XBXX S00012302 XBXX S00021659XBXX S00021668 XBXX S000217477 9

XBXX S00003945

XBXX S00021745XBXX S00010967 XBXX S00012122XBXX S00021987 XBXX S00022511XBXX S00023045XBXX S00021859

100

105

110

115

120

125

XBXX S00011350

XBS00022560 XBXX S00022427XBXX S00011597

XBXX S00010748XBXX S00021962

XBXX S00023178 XBXX S00023207XBXX S00021966XBXX S00022617XBXX S00022811

XBXX S00001450 XBXX S00003621XBXX S00011622 XBXX S00011926

XBXX S00022202 XBXX S00022895

XBXX S00011677

125

130

135

140

145

150

XBXX S00011622 XBXX S00011926XBXX S00014199XBXX S00013951XBXX S00014246XBXX S00023109

XBXX S00000663 XBXX S00003720XBXX S00009797

XBXX S00021685 XBXX S00021744XBXX S00003865

XBXX S00004348 XBXX S00009283XBXX S00009886 XBXX S00011689XBXX S00022269 XBXX S00023200XBXX S00022284 XBXX S00022447XBXX S00022119 XBXX S00022137

XBXX S00021657XBXX S00002048 XBXX S00003894XBXX S00022266XBXX S00021663 XBXX S00022169XBXX S00023218

XBXX S00009995

155

160

165

170

175

180

XBXX S00003952XBXX S00022427

185

190

195

200

20

Figure 3 Continued

ª 2012 The Authors

Plant Biotechnology Journal ª 2012 Society for Experimental Biology, Association of Applied Biologists and Blackwell Publishing Ltd, Plant Biotechnology Journal, 1–17

Alexandra M. Allen et al.12

Page 13: Discovery and development of exome ... - cerealsdb.uk.net · made available via the use of next-generation sequencing combined with a variety of genotyping platforms. However, many

Characterisation of the validated co-dominant SNP assays

showed that their PIC scores and MAF were on average higher

than dominant SNP assays, suggesting they are highly useful

genetic markers for use on a range of materials. Analyses using

the contig sequences containing the different SNP types revealed

that co-dominant SNP assays were more likely to be located in

contigs returning no BLAST hit to either protein or nucleotide

databases, or outside coding regions in those contigs returning a

BLASTX hit. Our analysis is consistent with the hypothesis that a

proportion of the contigs used to develop co-dominant SNP

assays represent single-copy genes. These contigs most likely

represent genes that were lost before or during the domestication

process as they are found as single copies in both landraces, such

as Chinese Spring, and modern varieties. For those SNP contigs

with 15 9 Chinese Spring genomic coverage, it is quite possible

that while these are represented as three homoeologs in Chinese

Spring, they have undergone gene loss down to single copy in the

UK germplasm we have studied. Intracultivar heterogeneity has

been documented between elite inbred lines of crop species, and

there are reports of intervarietal gene loss in wheat (Haun et al.,

2011; Swanson-Wagner et al., 2010; Winfield et al., 2012).

In addition to the factors outlined above, the Chinese Spring

reference used to map the NimbleGen-captured sequences was

based upon cDNA. If only one homoeolog was sampled in the

cDNA data, and this was sufficiently divergent from the other two

homoeologous copies, we may have only been able to map

Illumina sequence data to that single genome. This would be the

case in many 3′ UTR regions that are more divergent than protein-

coding sequence and have diverged sufficiently during evolution

to preclude their co-amplification in the KASPar PCR. This

homoeolog-specific amplification could fortuitously lead to the

development of co-dominant markers, yet BLAST analysis of such

sequences against the Chinese Spring genome would show them

to be present in three copies. In summary, investigations into the

average copy number of sequences used to develop co-dominant

SNP assays and the location of the SNP in the sequence suggests

that these SNPs are likely to reside in single-copy genes of as yet

unknown function, and/or three-copy genes which are suffi-

ciently divergent that sequence data from one homoeolog does

not map to other copies. The uncharacterised nature of these

genes makes them an exciting and intriguing source of further co-

dominant markers and scientific investigation.

Table 3 Summary of linkage groups and

mapped loci

Chromosome

Avalon 9 Cadenza map Savannah 9 Rialto map

Number of

bristol SNP

loci

Size

(cM)

Average

spacing between

loci (cM)

Number of

bristol SNP

loci

Size

(cM)

Average

spacing

between loci (cM)

1A 89 74.8 0.8 68 143.4 2.1

1B 214 126.4 0.6 50 115.9 2.3

1D 57 116.2 1.6 15 104.1 6.9

2A 82 132.2 1.6 98 211.1 2.2

2B 118 129.4 1.1 91 221.7 2.4

2D 63 77.7 1.2 13 49.8 3.8

3A 86 107.6 1.3 55 194.3 3.5

3B 136 154.5 1.1 34 205.0 6.0

3D 11 66.7 6.1 2 72.3 36.2

4A 71 136.3 1.9 14 157.7 11.3

4B 87 94.4 1.1 12 23.9 2.0

4D 8 60.3 7.5 4 1.3 0.3

5A 93 162.7 1.7 64 223.8 3.5

5B 172 239.0 1.4 51 147.2 2.9

5D 38 95.0 2.5 19 194.6 10.2

6A 136 141.1 1.0 23 111.4 4.8

6B 105 120.2 1.1 37 158.6 4.3

6D 31 91.6 3.0 28 135.6 4.8

7A 103 171.0 1.7 48 200.2 4.2

7B 64 82.8 1.3 14 56.1 4.0

7D 29 54.5 1.9 13 133.8 10.3

Total 1793 2434.4 1.3 753 2861.8 3.8

A genome 660 925.7 1.4 370 1241.9 3.4

B genome 896 946.7 1.1 289 928.4 3.2

D genome 237 562 2.2 94 691.4 7.4

Group 1 360 317.4 0.8 133 363.4 2.7

Group 2 263 339.3 1.3 202 482.6 2.4

Group 3 233 328.8 1.4 91 471.6 5.2

Group 4 166 291 1.8 30 182.9 6.1

Group 5 303 496.7 1.6 134 565.6 4.2

Group 6 272 352.9 1.3 88 405.6 4.6

Group 7 193 308.3 1.6 75 390.1 5.2

ª 2012 The Authors

Plant Biotechnology Journal ª 2012 Society for Experimental Biology, Association of Applied Biologists and Blackwell Publishing Ltd, Plant Biotechnology Journal, 1–17

Wheat exome-based co-dominant SNPs 13

Page 14: Discovery and development of exome ... - cerealsdb.uk.net · made available via the use of next-generation sequencing combined with a variety of genotyping platforms. However, many

During this study, we have created two complementary genetic

maps, enabling 73% of our validated SNPs to be assigned a map

location. The co-dominant SNP loci had a similar pattern of

distribution between linkage groups compared with dominant

loci, suggesting that co-dominant SNP markers have a similar

distribution to the previously used dominant markers. Analysis of

the MAF and PIC scores of the different types of mapped SNPs

demonstrated that the co-dominant and partially co-dominant

SNP markers had higher levels of genetic diversity within the lines

tested, compared with dominant SNP assays, suggesting that co-

dominant SNP assays are highly suitable for use as genetic

markers.

The two genetic maps aligned well with each other, with a

similar assignment and order of common markers. Clustering of

SNP markers was observed in both linkage maps, indicating that

despite the relatively large mapping populations used, a lack of

recombination events between these markers may affect map

resolution. This may be overcome by mapping these markers

against a larger number of individuals. Preliminary results indicate

that mapping a subset of 223 evenly spaced A 9 C markers on

566 individuals from an extended A 9 C population reduced the

proportion of completely linked markers from 50.4% to 44.9%,

and it is likely that this figure could be further decreased by

specifically targeting clustered markers. However, despite the

high proportion of clustered markers, 89% of the remaining

markers map to within 10 cM of the next marker, suggesting that

these provide good overall coverage of the genome, with few

gaps. When co-dominant and dominant markers were compared

separately, similar proportions of markers were observed to map

to within 10 cM of each other (88% and 92%, respectively),

suggesting that both marker types are similarly distributed across

the map.

Both maps had a relatively low proportion of D genome loci;

this has been observed in previous studies and is likely to relate to

a lower level of diversity found in the D genome due to the effects

of the genetic bottleneck that accompanied the domestication of

hexaploid wheat (Allen et al., 2011; Caldwell et al., 2004; Chao

Table 4 Summary statistics for mapped loci

Number of

loci

Minor allele

frequency

Polymorphism

information content

Avalon 9 Cadenza

mapped loci

1793 0.264 0.284

Co-dominant loci 672 0.277 0.290

Partially co-dominant

loci

332 0.308 0.313

Dominant loci 789 0.235 0.266

A genome 660 0.263 0.284

B genome 896 0.260 0.283

D genome 237 0.278 0.288

Savannah 9 Rialto

mapped loci

753 0.284 0.299

Co-dominant loci 395 0.291 0.300

Partially co-dominant

loci

213 0.319 0.323

Dominant loci 145 0.246 0.280

A genome 370 0.290 0.305

B genome 289 0.272 0.290

D genome 94 0.294 0.305

0.0

10.0

20.0

30.0

40.0

50.0

60.0

Dominant partially co-dominant

co-dominant All loci Dominant partially co-dominant

co-dominant All loci

AxC SxR

Prop

orti

on o

f tot

al (%

)

Marker type

A genome

B genome

D genome

0.0

5.0

10.0

15.0

20.0

25.0

30.0

35.0

40.0

Dominant partially co-dominant

co-dominant All loci Dominant partially co-dominant

co-dominant All loci

AxC SxR

Prop

orti

on o

f tot

al (%

)

Marker type

Group 1

Group 2

Group 3

Group 4

Group 5

Group 6

Group 7

(a)

(b)

Figure 4 Distribution of the different marker types across the (a) A, B and D linkage groups and (b) homoeologous chromosome groups of the

Avalon 9 Cadenza and Savannah 9 Rialto genetic maps.

ª 2012 The Authors

Plant Biotechnology Journal ª 2012 Society for Experimental Biology, Association of Applied Biologists and Blackwell Publishing Ltd, Plant Biotechnology Journal, 1–17

Alexandra M. Allen et al.14

Page 15: Discovery and development of exome ... - cerealsdb.uk.net · made available via the use of next-generation sequencing combined with a variety of genotyping platforms. However, many

et al., 2009). Although the mean MAF and PIC scores were

similar for A, B and D genome loci, some differences were

observed in the distributions of these measurements. The results

for A and B genome markers were similar; however, loci assigned

to the D genome had a higher proportion of high MAF and PIC

scores compared with A and B genome loci. This is the opposite

to what has been detected in previous studies (Akhunov et al.,

2010; Chao et al., 2009) and suggests that, although the lower

genetic diversity within the D genome hinders SNP discovery and

marker development, the D genome SNPs identified by our

pipeline are as informative and useful as loci from the A and B

genome.

This study has described the design, implementation and val-

idation of a pipeline designed to identify gene-based co-dominant

SNP assays from genomic DNA sequence data. The validation

results suggest that this approach is highly efficient and the

resulting co-dominant SNP markers are evenly distributed across

the genome with relatively high MAF and PIC scores. As such,

these should prove a highly valuable resource for use in breeding

programmes. The construction of two complementary genetic

maps has maximised the amount of mapped SNP loci and allowed

comparisons between UK breeding materials. The genotype data

generated in this study for 47 widely used wheat lines, combined

with genetic map locations for SNP markers, should enable wheat

researchers to target their efforts to regions of interest and enable

QTL studies and marker-assisted selection. The markers described

in this study will be useful in linking the genetic map with the

developing physical maps and so will enhance the possibility of

efficient map-based cloning in hexaploid wheat. The entire data

set presented in this study has been made publicly available via

the provision of supplementary data sets and an interactive

website (http://www.cerealsdb.uk.net/), to make this resource as

accessible and useful as possible. These new co-dominant wheat

SNP-based markers will be useful on a number of genotyping

platforms and germplasm collections and hence should be a

powerful new tool for wheat breeders and researchers alike. In

addition, the pipeline developed here to identify co-dominant

SNP markers should be applicable to other polyploid crops where

SNP discovery and marker development have previously been

challenging (Cordeiro et al., 2006; Trick et al., 2009; Yu et al.,

2012).

Experimental procedures

Plant material

Forty-seven wheat varieties were grown for DNA extraction (for

details see Data S5). The Avalon 9 Cadenza doubled-haploid

(DH) population was supplied by the John Innes Centre and was

developed by Clare Ellerbrook, Liz Sayers and the late Tony

Worland as part of a Defra-funded project led by ADAS. The

parents were originally chosen (to contrast for canopy architec-

ture traits) by Steve Parker (CSL), Tony Worland and Darren Lovell

(Rothamsted Research). The Savannah 9 Rialto DH population

was supplied by Limagrain UK Limited (Woolpit, Suffolk, UK). All

plants were grown in pots in a peat-based soil and maintained in

a glasshouse at 15–25 °C under a light regime of 16 h light and

8 h dark. Leaf tissues were harvested from 6-week-old plants and

immediately frozen on liquid nitrogen and stored at �80 °C until

nucleic acid extraction. Genomic DNA was prepared from leaf

tissue using a phenol–chloroform extraction method (Sambrook

et al., 1989).

Preparation of NimbleGen libraries

The NimbleGen capture array was designed to capture a significant

proportion of the wheat exome and was developed using a gene-

rich assembly of 454 titanium sequence data from normalised and

non-normalised cDNA libraries of Chinese Spring line 42, publically

available EST sequences and the NCBI unigene set (Winfield et al.,

2012). The resulting assembly was used by NimbleGen to design an

array containing 132 605 features with an average length of

426 bp (NimbleGen array reference 100819_Wheat_Hall_-

cap_HX1). NimbleGen sequence libraries were prepared for eight

wheat varieties (Alchemy, Avalon, Cadenza, Hereward, Rialto,

Robigus, Savannah and Xi19) as described by Winfield et al.

(2012). Post-capture-enriched sequencing libraries were subjected

to 110 bp of paired end sequencing on a Illumina Genome

0.0

5.0

10.0

15.0

20.0

25.0

30.0

35.0

Prop

orti

on o

f tot

al (%

)

Minor allele frequency

Co-dominant loci

Partially co-dominant loci

Dominant loci

(a)

0.0

5.0

10.0

15.0

20.0

25.0

30.0

Prop

rtio

n of

tota

l (%

)

Minor allele frequency

A genome loci

B genome loci

D genome loci

(b)

0.010.020.030.040.050.060.070.080.0

Prop

orti

on o

f tot

al (%

)

Polymorphism information content

Co-dominant loci

Partially co-dominant loci

Dominant loci

(c)

0.0

10.0

20.0

30.0

40.0

50.0

60.0

70.0

Prop

orti

on o

f tot

al (%

)

Polymorphism information content

A genome loci

B genome loci

D genome loci

(d)

Figure 5 Distribution of minor allele frequency

(MAF) and polymorphism information content

(PIC) scores among the 47 wheat varieties. Loci

were separated into subgroups according to (a,c)

marker type and (b,d) genome.

ª 2012 The Authors

Plant Biotechnology Journal ª 2012 Society for Experimental Biology, Association of Applied Biologists and Blackwell Publishing Ltd, Plant Biotechnology Journal, 1–17

Wheat exome-based co-dominant SNPs 15

Page 16: Discovery and development of exome ... - cerealsdb.uk.net · made available via the use of next-generation sequencing combined with a variety of genotyping platforms. However, many

Analyser (GAIIx) using Illumina TruSeq v5 Cluster Generation

(Illumina Inc., San Diego, CA) and sequencing reagents following

the manufacturers preparation guides for paired end runs (Part

15019435 RevB, Oct2010 and Part 15013595 Rev C, Feb 2011,

respectively).

SNP discovery

After pre-processing of reads, where adapter sequences were

removed, the data were submitted to a custom pipeline (Winfield

et al., 2012). NGS sequences generated from the eight varieties

were mapped to the NimbleGen array reference using BWA

version 0.5.9-r16 (Li and Durbin, 2009) with a seed length of 32

bases, and the resulting SAM files were used for downstream

analysis. Uniquely mapped reads were analysed using a series of

custom PERL scripts designed to identify only differences between

varieties as opposed to those between each variety and the

reference sequence. This enabled the exclusion of homoeologous

SNPs (which are not useful markers), which were removed from

the SNP discovery pipeline. SNPs were called where there were at

least two alternative bases predicted at a reference position. An

additional constraint on SNP prediction required each SNP to be

represented by two or more independent reads or 2% of all reads

examined (whichever was the greater). Only bases that were

located at the centre of a three-base window of PHRED

quality � 20 were included in the analysis. Sequences were

discarded if they displayed more than 10% sequence variation

from the reference over their length or if they mapped equally

well to more than one locus, as the mapping in these situations

could be regarded as uncertain. In cases where multiple reads

started at the same position in the reference, all but one were

ignored to guard against clonal reads being sampled more than

once. All NGS data generated for this study will be available at:

http://www.cerealsdb.uk.net. In addition, the Illumina fastq files

and associated metadata have been uploaded to NCBI Sequence

Read Archive (SRA) under the study accession SRP011067.

Accession numbers of fastq files for each variety are as follows:

Alchemy (SRR417586.1), Avalon (SRR417587.1), Cadenza

(SRR417953.1), Hereward (SRR417954.1), Rialto (SRR417955.1),

Robigus (SRR418209.1), Savannah (SRR418210.1) and Xi19

(SRR418211.1).

SNP validation

For each putative varietal SNP, two allele-specific forward primers

and one common reverse primer (Data S1) were designed

(KBioscience, Hoddesdon, UK). Genotyping reactions were per-

formed in a Hydrocycler (KBioscience) in a final volume of 1 lLcontaining 1 9 KASP 1536 Reaction Mix (KBioscience), 0.07 lLassay mix (containing 12 lM each allele-specific forward primer

and 30 lM reverse primer) and 10–20 ng genomic DNA. The

following cycling conditions were used: 15 min at 94 °C; 10

touchdown cycles of 20 s at 94 °C, 60 s at 65–57 °C (dropping

0.8 °C per cycle); and 26–35 cycles of 20 s at 94 °C, 60 s at 57 °C. Fluorescence detection of the reactions was performed using a

Omega Pherastar scanner (BMG LABTECH GmbH, Offenburg,

Germany), and the data were analysed using the KlusterCaller 1.1

software (KBioscience).

Genetic map construction

The software programme MapDisto v. 1.7 (Lorieux, 2012) was

used to place the SNP markers in the previously established

genetic map for Avalon 9 Cadenza (http://www.wgin.org.uk/

resources/MappingPopulation/TAmapping.php). A chi-square test

was performed on all loci to test for segregation distortion from

the expected 1 : 1 ratio of each allele in a DH population, and any

loci showing significant distortion were removed from the data

set before constructing the linkage groups. Loci were assembled

into linkage groups using likelihood odds (LOD) ratios with a LOD

threshold of 6.0 and a maximum recombination frequency

threshold of 0.40. The linkage groups were ordered using the

likelihoods of different locus-order possibilities and the iterative

error removal function (maximum threshold for error probability

0.05) in MapDisto and drawn in MapChart (Voorrips, 2002). The

Kosambi mapping function (Kosambi, 1944) was used to calcu-

late map distances (cM) from recombination frequency.

SNP data analysis

Summary statistics (MAF and PIC estimates) were calculated for

loci using Powermarker 3.25 software (Liu and Muse, 2005).

Acknowledgements

We are grateful to the Biotechnology and Biological Sciences

Research Council, UK, and the Crop Improvement Research Club

(CIRC) for providing the funding for this work (awards BB/

I003207/1, BB/I017496/1). We are grateful to the Wheat Genetic

Improvement Network for making the mapping data relating to

the Avalon 9 Cadenza population public. For further details of

the Avalon 9 Cadenza mapping population, please refer to the

Wheat Genetic Improvement Network web site at: http://www.

wgin.org.uk/resources/MappingPopulation/TAmapping.php. We

also thank Limagrain UK limited for supplying the Savan-

nah 9 Rialto mapping population and related marker data.

References

Adams, K.L. and Wendel, J.F. (2005) Novel patterns of gene expression in

polyploid plants. Trends Genet. 21, 539–543.

Akhunov, E., Nicolet, C. and Dvorak, J. (2009) Single nucleotide polymorphism

genotyping in polyploidy wheat with the Illumina GoldenGate assay. Theor.

Appl. Genet. 119, 507–517.

Akhunov, E.D., Akhunova, A.R., Anderson, O.D., Anderson, J.A., Blake, N.,

Clegg, M.T., Coleman-Derr, D., Conley, E.J., Crossman, C.C., Deal, K.R.,

Dubcovsky, J., Gill, B.S., Gu, Y.Q., Hadam, J., Heo, H., Huo, N., Lazo, G.R.,

Luo, M.C., Ma, Y.Q., Matthews, D.E., McGuire, P.E., Morrell, P.L., Qualset, C.

O., Renfro, J., Tabanao, D., Talbert, L.E., Tian, C., Toleno, D.M., Warburton,

M.L., You, F.M., Zhang, W. and Dvorak, J. (2010) Nucleotide diversity maps

reveal variation in diversity among wheat genomes and chromosomes. BMC

Genomics, 11, 702.

Akhunova, A.R., Matniyazov, R.T., Liang, H. and Akhunov, E.D. (2010)

Homoeolog-specific transcriptional bias in allopolyploid wheat. BMC

Genomics, 11, 505.

Allen, A.M., Barker, G.L., Berry, S.T., Coghill, J.A., Gwilliam, R., Kirby, S.,

Robinson, P., Brenchley, R.C., D’Amore, R., McKenzie, N., Waite, D., Hall, A.,

Bevan, M., Hall, N. and Edwards, K.J. (2011) Transcript-specific, single-

nucleotide polymorphism discovery and linkage analysis in hexaploid bread

wheat (Triticum aestivum L.). Plant Biotechnol. J. 9, 1086–1099.

Altschul, S.F., Gish, W., Miller, W., Myers, E.W. and Lipman, D.J. (1990) Basic

local alignment search tool. J. Mol. Biol. 215, 403–410.

Berkman, P.J., Lai, K., Lorenc, M.T. and Edwards, D. (2012) Next generation

sequencing applications for wheat crop improvement. Am. J. Bot. 99, 365–

371.

Biesecker, L.G., Shianna, K.V. and Mullikin, J.C. (2011) Exome sequencing: the

expert view. Genome Biol. 12, 128.

Caldwell, K.S., Dvorak, J., Lagudah, E.S., Akhunov, E., Luo, M.C., Wolters, P.

and Powell, W. (2004) Sequence polymorphism in polyploid wheat and their

D-genome diploid ancestor. Genetics, 167, 941–947.

ª 2012 The Authors

Plant Biotechnology Journal ª 2012 Society for Experimental Biology, Association of Applied Biologists and Blackwell Publishing Ltd, Plant Biotechnology Journal, 1–17

Alexandra M. Allen et al.16

Page 17: Discovery and development of exome ... - cerealsdb.uk.net · made available via the use of next-generation sequencing combined with a variety of genotyping platforms. However, many

Chao, S., Zhang, W., Akhunov, E., Sherman, J., Ma, Y., Luo, M.C.

and Dubcovsky, J. (2009) Analysis of gene-derived SNP marker

polymorphism in US wheat (Triticum aestivum L.) cultivars. Mol. Breed.

23, 23–33.

Chao, S., Dubcovsky, J., Dvorak, J., Luo, M.C., Baenziger, S.P., Matnyazov, R.,

Clark, D.R., Talbert, L.E., Anderson, J.A., Dreisigacker, S., Glover, K., Chen, J.,

Campbell, K., Bruckner, P.L., Rudd, J.C., Haley, S., Carver, B.F., Perry, S.,

Sorrells, M.E. and Akhunov, E.D. (2010) Population- and genome-specific

patterns of linkage disequilibrium and SNP variation in spring and winter

wheat (Triticum aestivum L.). BMC Genomics, 11, 727.

Cordeiro, G.M., Eliott, F., McIntyre, C.L., Casu, R.E. and Henry, R.J. (2006)

Characterisation of single nucleotide polymorphisms in sugarcane ESTs.

Theor. Appl. Genet. 113, 331–343.

Dixon, J., Braun, H.J. and Crouch, J. (2009) Transitioning wheat research to

serve the future needs of the developing world. In: Wheat Facts and

Futures (Dixon, J., Braun, H.J. and Kosina, P., eds), pp. 1–19. Mexico:

CIMMYT.

Dubcovsky, J. and Dvorak, J. (2007) Genome plasticity a key factor in the

success of polyploid wheat under domestication. Science, 316, 1862–1866.

Haudry, A., Cenci, A., Ravel, C., Bataillon, T., Brunel, D., Poncet, C., Hochu, I.,

Poirier, S., Santoni, S., Glemin, S. and David, J. (2007) Grinding up wheat: a

massive loss of nucleotide diversity since domestication. Mol. Biol. Evol. 24,

1506–1517.

Haun, W.J., Hyten, D.L., Xu, W.W., Gerhardt, D.J., Albert, T.J., Richmond, T.,

Jeddeloh, J.A., Jia, G., Springer, N.M., Vance, C.P. and Stupar, R.M.

(2011) The composition and origins of genomic variation among

individuals of the soybean reference cultivar Williams 82. Plant Physiol.

155, 645–655.

Kaur, S., Francki, M.G. and Forster, J.W. (2012) Identification, characterization

and interpretation of single-nucleotide sequence variation in allopolyploid

crop species. Plant Biotechnol. J. 10, 125–138.

Kosambi, D.D. (1944) The estimation of map distances from recombination

values. Ann. Eugen. 12, 172–175.

Li, H. and Durbin, R. (2009) Fast and accurate short read alignment with

Burrows-Wheeler transform. Bioinformatics, 25, 1754–1760.

Liu, K. and Muse, S.V. (2005) PowerMarker: an integrated analysis environment

for genetic marker analysis. Bioinformatics, 21, 2128–2129.

Liu, B., Xu, C., Zhao, N., Qi, B., Kimatu, J.N., Pang, J. and Han, F. (2009) Rapid

genomic changes in polyploid wheat and related species: implications for

genome evolution and genetic improvement. J. Genet. Genomics, 36, 519–

528.

Lorieux, M. (2012) MapDisto: fast and efficient computation of genetic linkage

maps. Mol. Breeding, 30, 1231–1235.

Paux, E., Faure, S., Choulet, F., Roger, D., Gauthier, V., Martinant, J.P.,

Sourdille, P., Balfourier, F., Le Paslier, M.C., Chauveau, A., Cakir, M.,

Gandon, B. and Feuillet, C. (2010) Insertion site-based polymorphism markers

open new perspectives for genome saturation and marker-assisted selection

in wheat. Plant Biotechnol. J. 8, 196–210.

Paux, E., Sourdille, P., Mackay, I. and Feuillet, C. (2011) Sequence-based marker

development in wheat: advances and applications to breeding. Biotechnol.

Adv. http://dx.doi.org/10.1016/j.bbr.2011.03.031.

Reynolds, M., Foulkes, M.J., Slafer, G.A., Berry, P., Parry, M.A.J., Snape, J.W.

and Angus, W.J. (2009) Raising yield potential in wheat. J. Exp. Bot. 60,

1899–1918.

Sambrook, J., Fritsch, E.F. and Maniatis, T. (1989) Molecular Cloning:

A Laboratory Manual, 2nd edn. Cold Spring Harbor: Cold Spring Harbor

Laboratory Press.

Swanson-Wagner, R.A., Eichten, S.R., Kumari, S., Tiffin, P., Stein, J.C., Ware, D.

and Springer, N.M. (2010) Pervasive gene content variation and copy number

variation in maize and its undomesticated progenitor. Genome Res. 20,

1689–1699.

Trick, M., Long, Y., Meng, J. and Bancroft, I. (2009) Single nucleotide

polymorphism (SNP) discovery in the polyploid Brassica napus using Solexa

transcriptome sequencing. Plant Biotechnol. J. 7, 334–346.

Trick, M., Adamski, N., Mugford, S.G., Jiang, C., Febrer, M. and Uauy, C. (2012)

Combining SNP discovery from next-generation sequencing data with bulked

segregant analysis (BSA) to fine-map genes in polyploid wheat. BMC Plant

Biol. 12, 1–14.

Voorrips, R.E. (2002) MapChart: software for the graphical presentation of

linkage maps and QTLs. J. Hered. 93, 77–78.

Winfield, M.O., Wilkinson, P.A., Allen, A.M., Barker, G.L.A., Coghill, J.A.,

Burridge, A., Hall, A., Brenchley, R.C., D’Amore, R., Hall, N., Bevan, M.,

Richmond, T., Gerhardt, D.J., Jeddeloh, J.A. and Edwards, K.J. (2012)

Targeted re-sequencing of the allohexaploid wheat exome. Plant Biotechnol.

J. 10, 733–742.

Yu, J.Z., Kohel, R.J., Fang, D.D., Cho, J., Van Deynze, A., Ulloa, M., Hoffman, S.

M., Pepper, A.E., Stelly, D.M., Jenkins, J.N., Saha, S., Kumpatla, S.P., Shah,

M.R., Hugie, W.V. and Percy, R.G. (2012) A high-density simple sequence

repeat and single nucleotide polymorphism genetic map of the tetraploid

cotton genome. Genes Genomes Genetics, 2, 43–58.

Supporting information

Additional Supporting information may be found in the online

version of this article:

Data S1 KASPar assay details for 1190 SNPs designed as part of

this study.

Data S2 Genotyping results of a panel of 47 wheat varieties

screened with 1138 KASPar assays.

Data S3 KASPar assay details for all assays, including genetic map

position.

Data S4 Mapping data for all assays mapped on the

Avalon 9 Cadenza and Savannah 9 Rialto cross.

Data S5 Details of the 47 wheat varieties used in this study.

ª 2012 The Authors

Plant Biotechnology Journal ª 2012 Society for Experimental Biology, Association of Applied Biologists and Blackwell Publishing Ltd, Plant Biotechnology Journal, 1–17

Wheat exome-based co-dominant SNPs 17