Functional Genomics and Bioinformatics Applied to Understanding Oxidative Stress Resistance in Plants
Ruth Grene Alscher, Lenwood S. Heath
Virginia Tech, December 14, 2001
Overview
• Organization of our group
• About environmental stress and reactive oxygen species (ROS)
• Plant responses to ROS
• Analysis of responses to stress on a chip: microarray technology
• Expresso: management system for microarrays
– Managing expression experiments
– Analyzing expression data
– Reaching conclusions
• Where do we go from here?
Boris Chevone
Ron Sederoff (NCSU)
Dawei Chen
Ruth Alscher, Lenny Heath, Naren Ramakrishnan, Keying Ye
Len van Zyl (NCSU)
Carol Loopstra (Texas A and M)
Jonathan Watkinson
Margaret Ellis
Logan Hanks
Senior Collaborators
Students: VT
Cecilia Vasquez
Iterative strategy for detection of stress-mediated effects on gene expression using microarrays and CS expertise
[Cycle diagram with steps numbered 1 to 4, centered on Expresso. Labels:]
• Detection of stress-mediated gene expression effects on microarrays
• Computational tools to infer interaction among genes, pathways
• Genetic Regulatory Networks
• Test inferences with varying conditions and genotypes
• Revised / New Tools and Experiments
Plant Response to Stress
• Plants adapt to changing environmental conditions through global cellular responses involving successive changes in, and interactions among, expression patterns of numerous genes.
• Our group studies these changes through a combination of bioinformatics and genomic techniques.
Long Term Goals
• Biological: To identify molecular stress resistance mechanisms in tree and crop species.
• Bioinformatic: To support iterative experimentation in plant genomics, capture and analyze experimental data, integrate biological information from diverse sources, and close the experimental loop.
The Paradox of Aerobiosis
• Oxygen is essential, but toxic.
• Aerobic cells face constant danger from reactive oxygen species (ROS).
• ROS can act as mutagens, cause lipid peroxidation, and denature proteins.
ROS Arise as a Result of Exposure to:
• Ozone
• Sulfur dioxide
• High light
• Herbicides
• Extremes of temperature
• Salinity
• Drought
Scenarios for Effects of Abiotic Stress on Gene Expression in Plants
[Diagram labels: Environmental Stress; Free Radicals; Prooxidants (ROS); Antioxidants; Redox Regulation of Cellular Systems; Responses to Environmental Signals; Membrane Receptors; Protein kinases, phosphatases; Transcription factors; Gene Expression; Metabolite Defense; Defense, Repair, Apoptosis]
Drought Stress Responses in Loblolly Pine: Questions to be Addressed
• Can a hierarchy of drought stress resistance mechanisms be identified?
• Can a clear distinction be made between rapidly responding and long term adaptational mechanisms?
• Can particular subgroups within gene families be associated with drought tolerance?
Hypotheses
• There is a group of genes whose expression confers resistance to drought stress.
• Based on previous work, increased expression of defense genes is co-regulated and is correlated with resistance to oxidative stress. Failure to cope is correlated with little or no defense gene activation. Candidate resistance genes follow this pattern of expression.
• A common core of defense genes exists, which responds to several different stresses.
Components of Stress Study
[Diagram labels:]
• Pine Drought Stress Experiments
• Expresso Prototype
• Design and Print Microarrays
• Select Pine cDNAs: 384 (1999), 2400 (2001)
• Design Functional Hierarchy
• Capture Spot Intensities
• Integrate and Analyze
• Inductive Logic Programming (ILP)
Imposition of Successive Cycles of Mild or Severe Drought Stress on 1-year-old Loblolly Pine Seedlings
[Figure, two panels plotting water potential (bars; axis ticks 0, -2, -10, -15) against days. Cycles of Mild Drought Stress: four dry-down (water withheld) and recovery (water given) cycles, with RNA Harvests I to IV. Cycles of Severe Drought Stress: Cycles I to III of dry down and recovery, with RNA Harvests I to III. Legend: marker = PS (photosynthesis).]
Categories within Protective and Protected Processes
Plant Growth Regulation
Environmental Change
Gene Expression
Signal Transduction
Protective Processes
Protected Processes
ROS and Stress
Cell Wall Related
Phenylpropanoid Pathway
Development
Metabolism
Chloroplast Associated
Carbon Metabolism
Respiration and Nucleic Acids
Mitochondrion
Cells
Tissues
Cytoskeleton
Secretion
Trafficking
Nucleus
Protease-associated
Categories within “Protective Processes”
Protective Processes
Stress
Cell Wall Related
Phenylpropanoid Pathway
Abiotic
Biotic
Antioxidant Processes
Drought
Heat
Non-Plant
Xenobiotics
NADPH/Ascorbate/Glutathione Scavenging Pathway
Cytosolic ascorbate peroxidase
Dehydrins, Aquaporins
Heat shock proteins (Chaperones)
superoxide dismutase-Fe
superoxide dismutase-Cu-Zn
glutathione reductase
Sucrose Metabolism
Cellulose
Arabinogalactan proteins
Hemicellulose
Pectins
Xylose
Other Cell Wall Proteins
isoflavone reductases
phenylalanine ammonia-lyases
S-adenosylmethionine decarboxylases
glycine hydroxymethyltransferases
Lignin Biosynthesis
CCoAOMTs
4-coumarate-CoA ligases
cinnamyl-alcohol dehydrogenase
Chaperones
“Isoflavone Reductases”
GSTs
Extensins and proline rich proteins
Hypotheses versus Results: 1999 Experiment
• Among the genes responding positively to mild stress, there exists a population of genes whose expression is negative or unchanged under severe stress.
– These are candidate stress resistance genes.
– Genes in 69 categories (e.g., HSP70s and HSP100s, aquaporins, but not HSP80s) responded positively to mild stress. The effect of severe stress was not detectable or negative.
Hypotheses versus Results: 1999 Experiment
• Genes associated with other stresses responded to drought stress
– Isoflavone reductase homologs and GSTs responded positively to mild drought stress.
– These categories are previously documented to respond to biotic stress and xenobiotics, respectively.
– However, both isoflavone reductase homologs and GSTs also responded positively to severe drought stress. Thus, they do not fall into the category of candidate stress resistance genes.
Candidate Categories
• Include
– Aquaporins
– Dehydrins
– Heat shock proteins/chaperones
• Exclude
– Isoflavone reductases
Flow of a Microarray Experiment
Hypotheses
Select cDNAs
PCR
Test of Hypotheses
Extract RNA
Replication and Randomization
Reverse Transcription and Fluorescent Labeling
Robotic Printing
Hybridization
Identify Spots
Intensities
Statistics
Clustering
Data Mining, ILP
Design of Microarrays I: Randomization
• Selected 384 archived ESTs
• Organized into four 96-well microtitre source plates after PCR
• Pipetted into 8 sets of four randomized microtitre plates
• Each set is a different randomized arrangement of the 384 ESTs
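The randomization step can be illustrated with a short Python sketch. It only mirrors the numbers on the slide (384 ESTs, 8 sets, four 96-well plates per set). The EST identifiers, the function name `randomized_plate_sets`, and the fixed seed are hypothetical, and this is not the procedure actually used to lay out the plates.

```python
import random

def randomized_plate_sets(est_ids, n_sets=8, plate_size=96, seed=2001):
    """Return n_sets independent shuffles of est_ids, each cut into plates."""
    rng = random.Random(seed)  # fixed seed so the layout can be reproduced
    plate_sets = []
    for _ in range(n_sets):
        order = list(est_ids)
        rng.shuffle(order)  # a fresh random arrangement for every set
        plates = [order[i:i + plate_size]
                  for i in range(0, len(order), plate_size)]
        plate_sets.append(plates)
    return plate_sets

# Hypothetical EST identifiers; the real clone names are not given in the slides.
ests = [f"EST{i:03d}" for i in range(384)]
plate_sets = randomized_plate_sets(ests)

# First four sets feed type A arrays, second four sets feed type B arrays.
type_a_sets, type_b_sets = plate_sets[:4], plate_sets[4:]
print(len(plate_sets), len(plate_sets[0]), len(plate_sets[0][0]))  # prints: 8 4 96
```

Every set contains each EST exactly once, so printing from four sets places four copies of each EST on an array type, each at an independently chosen position.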
Design of Microarrays II: Replication
• Printed type A microarrays from first four sets (16 plates); printed type B microarrays from second four sets
• Each array type has four replicates of each EST, randomly placed
• Each comparison was performed with four different hybridizations, with dyes reversed in two
• Total of 16 replicates of each EST in each comparison
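The replicate count follows from 4 spots per EST on an array times 4 hybridizations per comparison, giving 16 measurements. The sketch below shows one common way to bring dye-reversed hybridizations into the same orientation before pooling. It rests on two assumptions that the slides do not state: base-2 logarithms, and negating the log ratio for dye-reversed hybridizations. The function name and the ratios are made up for illustration.

```python
import math

REPLICATES_PER_ARRAY = 4   # copies of each EST printed on one array type
HYBRIDIZATIONS = 4         # hybridizations per comparison, two with dyes reversed

def oriented_log_ratios(hybridizations):
    """Return log2(treated/control) for every spot of one EST.

    hybridizations: list of (dye_reversed, [Cy5/Cy3 ratio per replicate spot]).
    In a dye-reversed hybridization the treated sample carries Cy3, so the
    sign of the log ratio is flipped (an assumption, not from the slides).
    """
    values = []
    for dye_reversed, ratios in hybridizations:
        for ratio in ratios:
            log_ratio = math.log2(ratio)
            values.append(-log_ratio if dye_reversed else log_ratio)
    return values

# Made-up ratios for an EST that is two-fold up in the treated sample.
example = [
    (False, [2.0] * REPLICATES_PER_ARRAY),
    (False, [2.0] * REPLICATES_PER_ARRAY),
    (True, [0.5] * REPLICATES_PER_ARRAY),
    (True, [0.5] * REPLICATES_PER_ARRAY),
]
log_ratios = oriented_log_ratios(example)
print(len(log_ratios))  # prints: 16
```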
Spot and Clone Analysis
• Image Analysis: gridding, spot identification, intensity and background calculation, normalization
• Statistics:
– Fold or ratio estimation
– Combining replicates
• Higher-level Analysis:
– Clustering methods
– Inductive logic programming (ILP)
Spot Identification and Intensity Analysis
• Microarray Suite: Manual grid; extract intensities for each spot; compute ratios; compute calibrated ratios
• Spot Statistics:
– Each calibrated ratio is the uncalibrated ratio divided by the mean of all the uncalibrated ratios; the result is simply that the mean of the calibrated ratios is 1.0
– Our tools use the logarithm of each calibrated ratio
   – Positive: expression increase
   – Negative: expression decrease
   – Zero: no change in expression
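The calibration described above amounts to dividing by the mean ratio and then taking a logarithm. A minimal Python sketch follows. The base of the logarithm is not given on the slide, so base 2 is an assumption here, and the function name and input ratios are illustrative only.

```python
import math

def calibrated_log_ratios(ratios):
    """Divide each ratio by the mean ratio, then take log2 (base is an assumption)."""
    mean_ratio = sum(ratios) / len(ratios)
    calibrated = [r / mean_ratio for r in ratios]
    logs = [math.log2(c) for c in calibrated]
    return calibrated, logs

# Made-up uncalibrated ratios for three spots.
calibrated, logs = calibrated_log_ratios([1.0, 2.0, 3.0])
print(calibrated)  # prints: [0.5, 1.0, 1.5]
```

After calibration the ratios average 1.0, so a log value below zero reads as decreased expression, zero as no change, and above zero as increased expression.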
Analysis of Expression Data
• The multiple (typically 16) log calibrated ratios for a replicated clone do NOT follow a normal distribution.
• Distribution is spread relatively evenly over a large range.
• Statistical analysis based on mean and standard deviation will be overly pessimistic in identifying clones that are up- or down-expressed.
• From the observation of an even spread of the log ratios, we assume that a clone whose expression does not differ between the two members of a probe pair will show a distribution centered at a mean log ratio of 0.0.
Computational Methods: Alternate Assumptions
• Our more general assumption avoids the trap of having to classify the response of each SPOT; rather, we classify the response of an EST as one of
– Up-regulated
– Down-regulated
– No clear change
• Response CLASSIFICATION rather than QUANTIFICATION allows us to develop unified relationships among genes and among treatments.
• Provides sufficient results for the use of inductive logic programming (ILP).
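The slides do not say which test assigns an EST to one of the three classes. As one possible illustration that avoids a normality assumption, the sketch below uses a two-sided sign test on the replicate log ratios, taking "centered at 0.0" as the null hypothesis. The sign test, the `alpha` threshold, and the function name are all assumptions, not the authors' method.

```python
from math import comb

def classify_est(log_ratios, alpha=0.05):
    """Classify an EST from its replicate log ratios with a sign test.

    Null hypothesis: positive and negative log ratios are equally likely,
    i.e. the distribution is centered at 0.0.
    """
    pos = sum(1 for x in log_ratios if x > 0)
    neg = sum(1 for x in log_ratios if x < 0)
    n = pos + neg  # zeros carry no sign information and are ignored
    if n == 0:
        return "no clear change"
    k = max(pos, neg)
    # P(X >= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    p_value = min(1.0, 2 * tail)
    if p_value >= alpha:
        return "no clear change"
    return "up-regulated" if pos > neg else "down-regulated"

# Made-up replicate sets of 16 log ratios each.
mostly_up = [0.4] * 14 + [-0.2] * 2
balanced = [0.3] * 8 + [-0.3] * 8
mostly_down = [-0.5] * 14 + [0.1] * 2
labels = [classify_est(x) for x in (mostly_up, balanced, mostly_down)]
print(labels)
```

Because the test looks only at signs, it classifies the response of the EST as a whole rather than quantifying each spot, which is the distinction the slide draws.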
Data Mining: Inductive Logic Programming
• ILP is a data mining algorithm expressly designed for inferring relationships.
• By expressing relationships as rules, it provides new information and resultant testable hypotheses.
• ILP groups related data and chooses in favor of relationships having short descriptions.
• ILP can also flexibly incorporate a priori biological knowledge (e.g., categories and alternate classifications).
Rule Inference in ILP
• Infers rules relating gene expression levels to categories, both within a probe pair and across probe pairs, without explicit direction
• Example Rule:
[Rule 142] [Pos cover = 69 Neg cover = 3]
level(A,moist_vs_severe,not positive) :- level(A,moist_vs_mild,positive).
• Interpretation: “If the moist versus mild stress comparison was positive for some clone named A, it was negative or unchanged in the moist versus severe comparison for A, with a confidence of 95.8%.”
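The 95.8% confidence of Rule 142 matches its positive cover divided by its total cover, 69 / (69 + 3). The Python sketch below checks that arithmetic and shows how such covers could be counted for this one rule on a table of classified clones. The data structure, function names, and toy clones are hypothetical, and the sketch does not reproduce how an ILP system searches for rules.

```python
def rule_142_cover(levels):
    """Count clones that satisfy or violate the example rule.

    levels maps clone -> {comparison: 'positive' | 'negative' | 'unchanged'}.
    Body: moist_vs_mild is positive. Head: moist_vs_severe is not positive.
    """
    pos_cover = neg_cover = 0
    for calls in levels.values():
        if calls["moist_vs_mild"] == "positive":
            if calls["moist_vs_severe"] != "positive":
                pos_cover += 1   # body and head both hold
            else:
                neg_cover += 1   # body holds, head fails
    return pos_cover, neg_cover

def confidence(pos_cover, neg_cover):
    """Fraction of covered clones for which the rule head holds."""
    return pos_cover / (pos_cover + neg_cover)

# Covers reported on the slide for Rule 142.
slide_confidence = round(100 * confidence(69, 3), 1)
print(slide_confidence)  # prints: 95.8

# Toy data with invented clone names, only to exercise the counting.
toy = {
    "cloneA": {"moist_vs_mild": "positive", "moist_vs_severe": "unchanged"},
    "cloneB": {"moist_vs_mild": "positive", "moist_vs_severe": "positive"},
    "cloneC": {"moist_vs_mild": "negative", "moist_vs_severe": "negative"},
}
toy_cover = rule_142_cover(toy)
```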
ILP subsumes two forms of reasoning
• Unsupervised learning
– “Find clusters of genes that have similar/consistent expression patterns”
• Supervised learning
– “Find a relationship between a priori functional categories and gene expression”
• Hybrid reasoning: Information Integration
– “Is there a relationship between genes in a given functional category and genes in a particular expression cluster?”
– ILP mines this information in a single step
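The hybrid question, whether genes in a functional category tend to fall into a particular expression cluster, can be made concrete with a plain cross-tabulation. This is a much simpler stand-in than ILP, shown only to illustrate the kind of relationship being sought. The clone names, categories, and class labels below are invented and are not results from the experiments.

```python
from collections import Counter

def category_cluster_table(genes):
    """Count genes for every (functional category, expression class) pair."""
    return Counter((category, expr_class) for _, category, expr_class in genes)

def fraction_in_class(table, category, expr_class):
    """Share of a category's genes that fall into one expression class."""
    total = sum(n for (cat, _), n in table.items() if cat == category)
    return table[(category, expr_class)] / total if total else 0.0

# Invented example records: (clone, functional category, expression class).
genes = [
    ("clone1", "aquaporin", "up"),
    ("clone2", "aquaporin", "up"),
    ("clone3", "HSP80", "no clear change"),
    ("clone4", "aquaporin", "no clear change"),
]
table = category_cluster_table(genes)
aquaporin_up = fraction_in_class(table, "aquaporin", "up")
print(table[("aquaporin", "up")])  # prints: 2
```

A table like this needs the clusters and the categories to be fixed in advance, whereas the slide's point is that ILP can relate the two in a single step.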
NSF-Supported Work of 2001: Expresso Progress to Date
Margaret Ellis and Logan Hanks (computer science graduate students):
• MEL: Semistructured data model for experiment capture
• Parsing: Automatic parser generators to drive archival storage
• Database: Loading and cataloging MEL data in a Postgres RDBMS
• Pipeline: Linkages to data analysis and data mining software
Final Harvest; Control versus Mild Stress; 2001
[Figure: Cy3 TIFF image and Cy5 TIFF image, annotated "Replication" and "Differential Expression"]
Final Harvest; Control versus Mild Stress; 2001
• Cy5 to Cy3 ratios. Final harvest after four drought cycles. RNA harvested 24 hours after final watering.
• Cy5 = treated; Cy3 = control.
• Aquaporins responded positively. HSP80s were unaffected (same as in 1999 results).
Drought Stress Responses in Loblolly Pine: Questions to be Addressed
• Can a hierarchy of drought stress resistance mechanisms be identified?
• Can a clear distinction be made between rapidly responding and long term adaptational mechanisms?
• Can particular subgroups within gene families be associated with drought tolerance?
Proposed Project: 2002-2005
Plant Biology (with co-PIs: Ron Sederoff, NCSU; Carol Loopstra, TAMU)
• An investigation of drought stress responses in loblolly pine in a variety of provenances.
• Quantitative RT-PCR to confirm and expand results obtained with microarrays.
• In situ hybridization to stressed and unstressed cell and tissue types.
Proposed Project: 2002-2005
Sources of cDNAs for 2002-2005 arrays
• NCSU ESTs selected on the basis of function.
• Stressed cDNA libraries from roots and stems of drought-tolerant families from East Texas and Lost Pines, and from the Atlantic Coastal Plain (humid conditions).
• Homologs of drought-responsive Arabidopsis genes.
Drought Stress Responses in Loblolly Pine: Future Bioinformatics Goals
• Support incorporation of biological information in the form of functional hierarchies and gene families.
• Close the computational and experimental loop to support iterative experimental regimes.
• Integrate information from multiple experiments involving multiple provenances, drought stresses, and EST sets.
Gene Discovery in the Arabidopsis Transcriptome
Data Capture
Postgres Database
Database Queries
Statistical Analysis and Clustering
Data Mining, ILP
Possible Identification of Novel Drought Responsive Genes in Arabidopsis
Drought Stress (short and long term)
Hybridize to Arabidopsis Transcriptome
Scanning, Image Processing
Select Pine cDNAs Via Contigs
Robotic Replication and Printing
Identification of Drought Responsive Genes and Pathways Across Provenances in Loblolly Pine
Data Capture
Postgres Database
Database Queries
Statistical Analysis and Clustering
Data Mining, ILP
Drought Stress Experiments on NC, TX Pine
Hybridization
Scanning, Image Processing
Identification of Drought Responsive Pine Genes
Close The Loop
Arabidopsis Drought Responsive genes
Proposed Project: 2002-2005
Bioinformatics I (Alscher, Heath, Ramakrishnan)
• Constraint-based selection of cDNAs, including intelligent use of contigs.
• Assignment of pine ESTs to subgroups within protein families (ProDom, Pfam).
• Extend information integration in ILP to include Mendel classification of gene families.
• Integrating data across provenances and known degrees of drought tolerance.
Proposed Project: 2002-2005
Bioinformatics II (Ramakrishnan, Heath)
• Specialize ILP for particular biological information sources.
• Automatic tuning of ILP parameters.
• Pushing data mining functionality into the database.
• Interleaving and iteration of query, data analysis, and data mining operations.