

PI & Project Coordinator: Calvin Qualset
Project Manager: Patrick McGuire

Objectives 1 and 2. EST Production. Coordinator: Olin Anderson
Objective 5. Bioinformatics. Coordinator: Olin Anderson
Objective 3. Mapping. Coordinator: Bikram Gill
Objective 4. Functional Genomics. Coordinator: Mark Sorrells
Objective 6. Genome Structure & Evolution. Coordinator: Jan Dvořák

[Figure: Objectives and coordination structure. Boxes: Objs. 1 & 2, EST Production (cDNA libraries, screening/normalizations, sequencing, data analysis, DNA storage/distribution); Obj. 3, Mapping; Obj. 4, Functional Genomics; Obj. 6, Genome Structure & Evolution. Activity labels: EST Arrays, SAGE, Sequence Matching, Deletion Mapping, Comparative Mapping.]

b. The second round of global phrap assembly was done on April 1, 2002 on 77,022 Project ESTs. Of these, 70,074 were 5' ESTs and 6,948 were 3' ESTs. They were assembled into 11,758 contigs.

c. Representative ESTs selected from each contig, together with the unassembled ESTs, form the resource pool for singleton selection. Altogether, about 32,000 ESTs are in this resource pool for further screening.

d. ESTs containing sequences found to match retro-elements, E. coli, phage, mitochondrial, and chloroplast gene sequences are removed. Sequence comparison is done using the cross_match program.

e. Validation process: Redundant ESTs are further screened and removed by comparing 3' sequence data with 3' sequences of previously identified singletons. The resulting singletons were rearrayed for probe distribution.

2. Mapping results

As of Aug. 27, 8,789 probes have been sent out to the 10 labs and mapping data have been returned for 46% of the distributed probes. At Albany, mapping data are processed to display the mapped probes by chromosome bin position, defined by the deletion line break points. The Project has assigned, from among the investigators, a coordinator for each homoeologous chromosome group who reviews and validates the assignments of the probe locations. Each probe may identify more than one locus and, at this point, the validated probe locations account for 7,985 individual loci mapped to chromosome bins.
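The idea of a chromosome bin defined by deletion break points can be sketched in a few lines. This is only an illustration under stated assumptions, not the Project's processing code: it assumes terminal deletions, each described by a breakpoint given as the fraction of the arm retained (0 = centromere, 1 = telomere), and a present/absent score for the probe in each deletion line. The function name `assign_bin`, the line names, and the breakpoint values are all hypothetical.

```python
def assign_bin(breakpoints, band_present):
    """Return the (proximal, distal) limits of the bin containing a locus.

    breakpoints:  dict mapping deletion line -> fraction of the arm retained
                  (hypothetical values; 0 = centromere, 1 = telomere).
    band_present: dict mapping deletion line -> True if the probe's band
                  is still detected in that line.
    """
    proximal, distal = 0.0, 1.0
    for line, fraction in breakpoints.items():
        if band_present[line]:
            # Band retained: the locus lies proximal to this breakpoint.
            distal = min(distal, fraction)
        else:
            # Band missing: the locus lies in the deleted (distal) segment.
            proximal = max(proximal, fraction)
    if proximal >= distal:
        raise ValueError("hybridization pattern is inconsistent with a single bin")
    return proximal, distal


# Hypothetical deletion lines for one chromosome arm.
breakpoints = {"del_1": 0.35, "del_2": 0.60, "del_3": 0.85}
band_present = {"del_1": False, "del_2": False, "del_3": True}

print(assign_bin(breakpoints, band_present))
```

A probe that detects several loci would need one present/absent pattern per band, which is one reason a probe can contribute more than one mapped locus.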

Approach: Initially, the plan was to produce microarrays of the mapped EST singletons and analyze them with respect to function in 10 labs focusing on five aspects of wheat reproduction. As a result of an NSF mid-term site review, the plan now is to develop a test array, organize and hold a training workshop in microarray construction and analysis for Project personnel, and evaluate microarray production strategies for wheat.

Status: Technology development for using cDNAs in microarray analysis has been initiated at Albany. All equipment (arrayer and scanner) is in place and operational in the Albany labs of O.D. Anderson and D. Laudencia-Chingcuanco, and printing of a limited number of test arrays is underway. RNA has been prepared to test this initial array and begin evaluation of analysis software options.

The training workshop was held in August (see Training).

An evaluation of the suitability of arrays of long oligonucleotides for transcriptional analysis in wheat is being carried out. Information sought includes an estimate of the optimal size of oligos to represent the wheat ESTs and a comparison of an oligo microarray with a cDNA array (PIs Steber, Sorrells, and K.S. Gill).

4. To determine functional activity of the mapped ESTs relevant to reproductive biology of wheat.

Approach: Develop and enhance means to analyze, interpret, and visualize Project data (data processing, database modifications, and web page maintenance).

Status: Protocols were established for data entry and the linking of the EST data to the records for the mapped loci. All the mapping laboratories participate by submitting hybridization results through a web-based interface to the central bioinformatics site in Albany. The information is parsed by Perl scripts, which prepare the submitted information for database entry. All hybridizations were scanned by each submitting lab, formatted to an image template, and then submitted to this central database. To date, 3408 images are online.

Data are viewable at the public website (http://wheat.pw.usda.gov/NSF/). An interface was developed for the mapping coordinators to survey results and verify scoring of the results. Both validated (“Confirmed”) and nonvalidated (“Unconfirmed”) data are presented along with a disclaimer making clear the preliminary nature of the unconfirmed locations. Using a relational database built with mySQL, several display options are available to users through queries with such criteria as location, status of verification, or mapping lab origin.
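The kind of query described above can be sketched as follows. This is a minimal illustration and not the Project's schema: the table name `mapped_locus`, its columns, the probe, bin, and lab names, and the helper `find_loci` are all hypothetical, and Python's built-in `sqlite3` module stands in for mySQL so that the example is self-contained.

```python
import sqlite3

# In-memory SQLite database standing in for the mySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE mapped_locus (
           probe TEXT,
           chromosome_bin TEXT,
           status TEXT,        -- 'Confirmed' or 'Unconfirmed'
           mapping_lab TEXT
       )"""
)

# Made-up rows, for illustration only.
rows = [
    ("probe_A", "bin_1", "Confirmed", "lab_1"),
    ("probe_A", "bin_3", "Unconfirmed", "lab_1"),
    ("probe_B", "bin_2", "Confirmed", "lab_2"),
    ("probe_C", "bin_1", "Unconfirmed", "lab_2"),
]
conn.executemany("INSERT INTO mapped_locus VALUES (?, ?, ?, ?)", rows)


def find_loci(conn, chromosome_bin=None, status=None, mapping_lab=None):
    """Return loci matching any combination of the three criteria."""
    clauses, params = [], []
    criteria = (
        ("chromosome_bin", chromosome_bin),
        ("status", status),
        ("mapping_lab", mapping_lab),
    )
    for column, value in criteria:
        if value is not None:
            # Column names are fixed strings; only values are parameterized.
            clauses.append(f"{column} = ?")
            params.append(value)
    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    sql = (
        "SELECT probe, chromosome_bin, status, mapping_lab FROM mapped_locus"
        + where
        + " ORDER BY probe, chromosome_bin"
    )
    return conn.execute(sql, params).fetchall()


for row in find_loci(conn, status="Confirmed"):
    print(row)
```

Keeping the verification status as an ordinary column lets the same query serve both the confirmed and the unconfirmed display.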

Two databases are maintained: one uses the ACEDB biologically oriented database program and the other uses the mySQL relational database program. Information from ACEDB is available through the webace/AceBrowser interface, which also includes links to EST and contig assembly information. The ACEDB display will be familiar to the many Triticeae laboratories that know GrainGenes. The mySQL relational database also has these links and, in addition, contains specialized constructions for data-mining the relationships of loci to ESTs and contig assembly information. The mySQL database is a version built for efficient mining of the archived information. A user-friendly link to EST data from map information was also created.

Databases allow linking of the mapped information to other information associated with the EST project. In some cases, external links are made to resource sites and related projects.

Co-PI Close has contributed to the annotations for the Project cDNA libraries that are available from the Project website. In addition, he and a programmer (Steve Wanamaker) have developed a stand-alone tool for creating contig assemblies of EST data (HarvEST, http://harvest.ucr.edu). This database integrates all Triticeae EST data, including wheat and rye ESTs generated by this Project and the CUGI Barley EST Project (http://www.genome.clemson.edu/projects/barley/), and allows analyses of the relationships of ESTs assembled into contigs and their cDNA library of origin.

5. To process, analyze, and display data accumulated in this project (bioinformatics).

6. To analyze gene density and distribution of mapped ESTs and thus genes in the wheat genomes (genome structure and evolution).

Approach: Analyze densities and distributions of ESTs in deletion maps.

Status: The database of mapped ESTs became large enough in this past year to allow (1) study of wheat transcriptome structure and evolution and (2) comparisons of wheat ESTs with sequence information from other taxa. For (1), one manuscript, directed by co-PI Dvořák and postdoc E. Akhunov, has been submitted and a second is in preparation. For (2), a manuscript, directed by co-PI Sorrells, is in preparation.

1. An analysis of 3977 ESTs mapped into chromosome deletion bins found that single-gene loci that were not subjected to gene duplication and loci ancestral to duplicated loci are most frequently found in proximal chromosome regions, while multi-gene loci and loci derived by duplication are most frequently found in distal chromosome regions. This distribution correlated with increasing recombination rates from centromere to telomere along chromosome arms. It is suggested that recombination has played a central role in evolution of wheat transcriptome structure and that microsynteny of the wheat transcriptome is diverging faster where recombination is higher.

2. An analysis of 2835 ESTs mapped into chromosome deletion bins and segregating populations, in comparison to the public rice genome sequence data from ordered BAC/PAC clones, revealed strong similarities between the resulting DNA sequence-based comparative map and previously published comparative maps based on RFLPs. While there appears to be extensive conservation of both gene content and order at the resolution conferred by the physical chromosome deletions in the wheat genome, there has also been an abundance of rearrangements, insertions, deletions, and duplications that may complicate the use of rice as a model for cross-species transfer of information in nonconserved regions.

Bioinformatics personnel: Data Curator Shiaoman Chao, based at Albany, and Bioinformatics Programmer Hugh Edwards, based at Cornell Univ., are supported by collaboration with USDA ARS bioinformatics specialists in Albany (G. Lazo) and at Cornell (D. Matthews).

Objectives, approaches, and status after 36 months (9/1/99–8/31/02)

2. To determine the base-pair sequence of these cDNAs, yielding ESTs.

Approach: In-house, single-site 5' sequencing of approx. 3000 clones in at least 30 libraries, with 3' sequencing of putative singletons.

Status: Sequencing has been carried out at O. Anderson’s lab, Albany, CA. To date, over 90,000 5'-sequenced ESTs have been generated from 41 of the libraries (Table 1). Library quality was evaluated based on (1) number of empty clones or clones containing vector sequence or short adapter sequence only, (2) number of clones containing ribosomal RNA sequence contamination, and (3) number of clones with reversed orientation (most of the libraries were made with cDNAs cloned in a fixed direction).

Library complexity was evaluated based on the level of clone redundancy, determined by comparing all the 5' ESTs within each library. ESTs are considered redundant if they show sufficient similarity to, and overlap with, other ESTs. These ESTs can be grouped and assembled together into a contig. Representatives from each contig and those ESTs not forming contigs are singleton candidates. Those libraries exhibiting the highest proportions of singleton candidates are considered to be of higher complexity, and thus worth extensive sampling. EST assembly analysis was also carried out among libraries. This analysis has indicated that among the 90,000 ESTs generated so far, about 22,000 are singleton candidates (Table 1). More analysis is underway to characterize and identify unique gene sequences.
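The within-library measure described above can be illustrated with a short calculation. This is a sketch of one reading of the text, not the Project's analysis code: it counts one representative per contig plus every unassembled EST as a singleton candidate and uses the candidate fraction as a complexity score. The three libraries and their counts are taken from Table 1; the class and property names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Library:
    name: str
    n_ests: int         # 5' ESTs sequenced from the library
    n_contigs: int      # contigs formed within the library
    n_unassembled: int  # ESTs that joined no contig

    @property
    def singleton_candidates(self) -> int:
        # One representative per contig plus every unassembled EST.
        return self.n_contigs + self.n_unassembled

    @property
    def complexity(self) -> float:
        # Fraction of sequenced ESTs that are singleton candidates
        # (an assumed scoring; the text only says "highest proportions").
        return self.singleton_candidates / self.n_ests


# Within-library counts from Table 1.
libraries = [
    Library("TA001E1X", 2728, 417, 1125),
    Library("TA006E2N", 1686, 336, 268),
    Library("TA009XXX", 10287, 1854, 4881),
]

# Highest proportion of singleton candidates first.
ranked = sorted(libraries, key=lambda lib: lib.complexity, reverse=True)
for lib in ranked:
    print(f"{lib.name}: {lib.singleton_candidates} candidates, "
          f"complexity {lib.complexity:.2f}")
```

Under this scoring, a library whose ESTs mostly collapse into a few contigs ranks low and would be a poor choice for further sampling.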

1. To produce cDNA libraries from as many tissue and condition combinations as possible.

Approach: Produce multiple cDNA libraries from mRNAs isolated in several labs with a target of 30 total libraries.

Status: This work is essentially completed. A total of 50 cDNA libraries are now available to the Project: 28 were made at T. Close’s lab at the University of California, Riverside, eight are from H. Nguyen’s lab at Texas Tech University, and 14 were contributed from other sources. Tissue sources included spikes sampled at various developmental stages, anther, embryo, endosperm, young seedling, root, crown, and flag leaf and sheath. Tissues were sampled under various treatments, such as drought stress, cold stress, salt stress, aluminum stress, ABA treatment, and vernalization. Of these libraries, 41 have been used to date for ESTs (Table 1).

In year 2, the B. Gill lab (KSU) held a workshop (Feb. 11–16, 2001) for the postdocs from the 10 mapping labs to ensure standard mapping and data entry protocols.

Also in year 2, Project PIs were successful with an NSF REU proposal to support participation of 13 undergraduates in Project labs.

In year 3, a microarray production and analysis workshop was held for 8 Project postdocs and graduate students in the D. Laudencia-Chingcuanco lab (USDA-ARS, Albany) (Aug. 12–16, 2002).

Training

The Project’s goal is to generate and map a large number of unique DNA sequences from the bread wheat genomes. The assumption is that these unique DNA sequences will correspond to individual genes of wheat and that their identification is a first step in determining gene function. The ultimate use of this information is the improvement of wheat quality, yield, and adaptability to new and marginal environments, thus increasing production.

Because of the large size of the wheat genomes, it is unlikely that the actual base-pair sequences of the DNA molecules will be learned completely in the near future. This Project takes an alternative strategy to realize the benefits of new techniques for discovering genes and learning their function. Following the identification of 10,000 unique wheat DNA sequences (termed ESTs, Expressed Sequence Tags), they will be mapped to their physical location on wheat chromosomes using a set of deletion stocks. The information gathered on the sequence and position of these genes in the wheat chromosomes is publicly available, distributed by means of the website created for this Project.

The results from this Project will be immediately applicable to other crops, because of the close relationship of wheat to other species in the Triticeae tribe and other grass species, especially corn and rice. The diversity of experimental techniques and traits pursued in the individual laboratories collaborating on this Project is an ideal training ground for graduate students and postdoctoral scientists. The large pool of well-characterized and mapped unique DNA sequences, available in the public domain, will be an exceedingly important resource for future Triticeae research and basic functional genomics research.

Introduction

The Structure and Function of the Expressed Portion of the Wheat Genomes

Contract Agreement DBI-9975989

3. To map into wheat deletion stocks a set of 10,000 unique ESTs.

Approach: Map EST singletons into bins defined by wheat deletion stocks; target is 10,000 mapped singletons.

Status: 1. Singleton selection strategy:

a. Processed 5' ESTs were searched against NCBI’s nonredundant nucleotide (blastn) and protein (blastx) databases.

Distribution of research investigators by objective

Table columns: Investigator; Obj. 1, cDNA libraries; Obj. 2, cDNA sequencing; Obj. 3, Mapping; Obj. 4, Functional genomics; Obj. 5, Bioinformatics; Obj. 6, Genome structure & evolution

Investigator    Institution          Objectives marked
OD Anderson     UC Davis/ARS         X* X* X X*
TJ Close        UC Riverside         X X
HT Nguyen       U Missouri           X X X
BS Gill         Kansas State U       X* X X
ME Sorrells     Cornell U            X X* X
J Dvořák        UC Davis             X X
J Dubcovsky     UC Davis             X X
KS Gill         Wash State U         X X
JP Gustafson    U Mo/ARS             X X
SF Kianian      N Dak State U        X X
JA Anderson     U Minn               X
NLV Lapitan     Colo State U         X
CM Steber       Wash State U/ARS     X

* designates coordinator for the corresponding objective.

X*

X

Table 1. Sequencing status by library

The first three count columns are within a library; the last column is among all libraries.

Name*     Tissue                               No. ESTs  No. contigs  No. unassembled ESTs  No. ESTs unique to library
TA001E1X  endosperm (Cheyenne)                    2,728          417                 1,125                         305
TA001E1S  endosperm subtracted (Cheyenne)           269           23                   218                          55
TA005E1X  dehydrated seedling                       795           82                   622                         168
TA006E1X  unstressed shoot                        2,261          375                 1,224                         433
TA006E2N  unstressed shoot normalized             1,686          336                   268                          52
TA006E3N  unstressed shoot normalized             1,672          139                 1,338                         836
TA007E1X  cold-stressed seedling                    938          107                   696                         181
TA007E3S  cold-stressed seedling subtracted       1,555          203                   956                         816
TA008E1X  etiolated root                          4,017          643                 2,143                         747
TA008E3N  etiolated root normalized               4,308          963                 1,739                         702
TA009XXX  spike (Sumai3)                         10,287        1,854                 4,881                       3,003
TA012XXX  ABA-treated embryo (Brevor)             2,207          264                 1,491                         625
TA015E1X  heat-stressed seedling                    821          100                   567                         200
TA016E1X  vernalized crown                        2,286          283                 1,555                         496
TA017E1X  20 to 45 DAP spike                      1,076          127                   422                         119
TA018E1X  5 to 15 DAP spike                       2,860          415                 1,581                         499
TA019E1X  pre-anthesis spike                     11,194        1,754                 5,201                       2,766
TA027E1X  drought-stressed leaf (TAM W101)          905           94                   635                         231
TA031E1X  heat-stressed flag leaf                   973           86                   710                         243
TA032E1X  heat-stressed spike                     1,012           97                   716                         259
TA036E1X  drought-stressed leaf                     641           55                   485                         165
TA037E1X  salt-stressed sheath                      964          123                   559                         166
TA038E1X  salt-stressed crown                       943           75                   743                         207
TA047E1X  root tip                                  959          125                   682                         178
TA048E1X  Al-stressed root tip (BH1146)             991          143                   646                         214
TA049E1X  dormant embryo (Brevor)                 2,927          438                 1,519                         714
TA055E1X  drought-stressed root                   1,023          116                   769                         345
TA056E1X  Al-stressed root tip                    1,032          174                   657                         219
TA058E1X  unstressed root at tiller stage         1,025          127                   770                         286
TA059E1X  whole grain (Butte)                     3,649          624                 1,451                         509
TA065E1X  salt-stressed root                      2,055          288                 1,385                         585
TA066E1X  mixed tissue                            1,404          211                   864                         303
TM011XXX  vegetative apex (acc. DV92)             3,031          432                 1,906                         937
TM043E1X  early reproductive apex (acc. DV92)     2,647          382                 1,516                         673
TT039E1X  whole plant (Langdon-16)                1,194          123                   765                         241
SC010XXX  Al-stressed root tip (Blanco)           1,198          105                   905                         457
SC013XXX  control root tip (Blanco)                 778           57                   649                         319
SC024E1X  anther (Blanco)                         4,631          639                 1,994                         987
AS040E1X  anther                                  2,466          330                 1,408                         591
AS067E1X  anther                                  1,044          134                   695                         231
Total (9/23/02)                                  91,715                                                         22,001

*In the Name field, TA indicates Triticum aestivum, TM is T. monococcum, TT is T. turgidum, SC is Secale cereale, and AS is Aegilops speltoides. All of the TA libraries are from the Chinese Spring genotype except where indicated otherwise in parentheses in the Tissue field.