Applications of Functional Genomics and Bioinformatics
Towards an Understanding of Oxidative Stress Resistance in Plants: Expresso and Chips
Overview
• Environmental stress and reactive oxygen species (ROS)
• Plant responses to ROS
• Stress on a chip: current results
• Expresso
– Managing expression experiments
– Analyzing expression data
– Reaching conclusions
• Some future directions: collaborating with CIP on resistance mechanisms in Andean root and tuber crop species
The Paradox of Aerobiosis
• Oxygen is essential, yet also potentially toxic.
• Aerobic cells maintain themselves against constant danger of production of reactive oxygen species (ROS).
• ROS can act as mutagens, cause lipid peroxidation, and denature proteins.
ROS Arise Throughout the Cell
[Figure: cell diagram. Compartments: mitochondrion, chloroplast, nucleus, cytosol, cell wall (ROS subcellular sites unclear). Stresses: wounding, chilling, ozone, drought, salinity, paraquat, high light + chilling, sulfur dioxide, pathogens. Responses: gene expression of antioxidant genes; post-transcriptional effects.]
Cellular Redox Homeostasis
• Maintained enzymatically
• Glutathione, Ascorbate (soluble).
– Alpha-tocopherol, Carotenoids (membrane).
• Antioxidant pools increase with stress.
• Protein methionine sulfoxidation is an additional antioxidant reservoir.
• Molecular chaperones (heat shock proteins) act as repair mechanism.
ROS Arise as a Result of Exposure to:
• Ozone
• Sulfur dioxide
• High light
• Paraquat
• Extremes of temperature
• Salinity
• Drought
Plant-Environment Interactions
• Several defense systems that respond to environmental stress are known.
• Their relative importance is not known.
• Mechanistic details are not known. Redox sensing may be involved.
A Basis for Cellular Responses to ROS
[Figure labels: Stress; Thiol Redox Control; Defense]
Redox Regulation of Gene Expression
[Figure labels: Environmental stress; prooxidants (ROS): O2.-, H2O2, NO.; antioxidants: Trx-(SH)2/Trx-S2, 2 GSH/GSSG, Grx-(SH)2/Grx-S2, Met/MetO, Asc/DHAsc; membrane receptors (oxylipins); protein kinases, phosphoprotein phosphatases; transcription factors (redox-sensitive?); gene expression; cellular response: defense processes, repair processes; cellular defense response; adaptation.]
ROS Scavenging in Plastids
[Figure: stromal and thylakoid pathways of ROS scavenging, drawn across the thylakoid membrane between thylakoid lumen and stroma. Labels: PSII, PSI, Fd, SOD (Fe? Cu,Zn?), tAPX, sAPX, MDHAR, DHAR, GR, AsA, MDA., DHA, GSH, GSSG, NADPH, NADP+, NAD(P)H, NAD(P)+, O2, O2.-, H2O2, H2O. Water oxidation at PSII: 2 H2O → 4 e- + 4 H+ + O2.]
Stress Resistance — Short Term “Emergency”
• Accumulated evidence suggests that successful resistance to stress imposition consists in the mobilization of cellular defense machinery.
(Short term exposures to oxidative stress conditions in a number of crop species, and cultivars within species.)
• Activation of defense genes, such as SOD, glutathione reductase
• Stimulation of antioxidant biosynthetic pathways, such as glutathione
Differential Response of Plastid SOD to Sulfur Dioxide in Two Cultivars of Pea
Exposure to sulfur dioxide in resistant (Progress) and sensitive (Nugget) cultivars of pea resulted in increases in plastid Cu-Zn SOD mRNA and protein only in the resistant cultivar. Kinetics of increase correlates with recovery of photosynthesis in cv. Progress.
Stress Resistance: Long Term Adaptation to Harsh Environmental Conditions
Less data available than for emergency responses. But overlap with emergency processes?
Candidates include:
• Low temperatures: glutathione-associated processes, cryoprotective proteins and oligosaccharides
• High temperature: heat shock proteins
• Drought: water channel proteins (aquaporins), dehydrins
Season-Specific Isoforms of Glutathione Reductase in Spruce
Winter and summer specific isoforms of glutathione reductase exist in red spruce. The appearance of the winter specific form correlates with the onset of hardening.
Candidate Resistance Mechanisms
• In the past, candidate mechanisms were examined known gene by known gene, process by process.
• Microarray Technology
– Simultaneous examination of groups of candidate genes and associated interactions
– Possible discovery of new defense mechanisms
[Figure: microarray workflow. Spots (sequences affixed to slide); treatment and control samples are mixed; hybridization; excitation and emission; detection of relative abundance.]
Iterative strategy for detection of genetic interactions using microarrays
[Figure: four-step cycle around Genetic Regulatory Networks. Step labels: Detection of gene expression effects on microarrays; Characterize gene function; Test mutant phenotypes; Identify mutants.]
Long Term Goal
• Precedent I: Plants adapt to adverse environmental conditions via a global cellular response involving changes in the expression patterns of numerous genes.
• Precedent II: To study these changes, the Expresso team uses bioinformatics and experimental techniques.
• Long term goal: To identify and improve emergency and long term adaptational stress response mechanisms in crop species.
Expresso: A Problem Solving Environment for Microarray Experiment Design and Analysis
• Integration of design and procedures
• Integration of image analysis tools and statistical analysis
• Connections to web databases and sequence alignment tools
• The software Aleph was used for inductive logic programming (ILP).
Who’s Who
Plant Biology
• Plant Stress (Virginia Tech): Ruth Alscher, Boris Chevone
• Forest Biotechnology (North Carolina State Univ.): Ron Sederoff, Ross Whetten, Len van Zyl, Y-H. Sun
• Molecular Biology, Bioinformatics: Dawei Chen
Computer Science (Virginia Tech)
• Lenwood Heath (CS): Algorithms
• Naren Ramakrishnan (CS): Data Mining, Problem Solving Environments
• Craig Struble, Vincent Jouenne (CS): Image Analysis
Statistics (Virginia Tech)
• Ina Hoeschele (DS): Statistical Genetics
• Keying Ye (STAT): Bayesian Statistics
Expresso People
[Photos: Ross Whetten, Boris Chevone, Ron Sederoff, Y-H. Sun, Dawei Chen, Lenny Heath, Ruth Alscher, Vincent Jouenne, Naren Ramakrishnan, Keying Ye, Len van Zyl, Craig Struble]
The 1999 Experiment: A Measure of Long Term Adaptation to Drought Stress
• Loblolly pine seedlings (two unrelated genotypes “C” and “D”) were subjected to mild or severe drought stress for four (mild) or three (severe) cycles.
– Mild stress: needles dried down to –10 bars; little effect on growth, new flushes as in control trees.
– Severe stress: needles dried down to –17 bars; growth retardation, fewer new flushes compared to controls.
• Harvest RNA at the end of growing season, determine patterns of gene expression on DNA microarrays.
• With algorithms incorporated into Expresso, identify genes and groups of genes involved in stress responses.
Hypotheses
• There is a group of genes whose expression confers resistance to drought stress.
• Expression of this group of genes is lower under severe than under mild stress.
• Individual members of gene families show distinct responses to drought stress.
Selection of cDNAs for Arrays
• 384 ESTs (xylem, shoot tip cDNAs of loblolly) were chosen on the basis of function and grouped into categories.
• Major emphasis was on processes known to be stress responsive.
• In cases where more than one EST had similar BLAST hits, all ESTs were used.
Categories within Protective and Protected Processes
[Figure: category diagram. Labels: Environmental Change; Plant Growth Regulation; Gene Expression; Signal Transduction; Protective Processes (ROS and Stress, Cell Wall Related, Phenylpropanoid Pathway); Protected Processes; Development; Metabolism; Chloroplast Associated; Carbon Metabolism; Respiration and Nucleic Acids; Mitochondrion; Cells; Tissues; Cytoskeleton; Secretion; Trafficking; Nucleus; Protease-associated.]
A Note about Categories
Categories are not mutually exclusive; a gene may be assigned to more than one category. For example, heat shock proteins have been grouped under these different categories and subcategories:
– Abiotic stress – heat
– Gene expression – post-translational processing – chaperones
– Abiotic stress – chaperones
Categories within “Protective Processes”
[Figure: category tree for Protective Processes with three branches.
Stress: abiotic; biotic; antioxidant processes; drought; heat; non-plant; xenobiotics; NADPH/ascorbate/glutathione scavenging pathway; cytosolic ascorbate peroxidase; superoxide dismutase-Fe; superoxide dismutase-Cu-Zn; glutathione reductase; dehydrins, aquaporins; heat shock proteins (chaperones); chaperones; “isoflavone reductases”; GSTs.
Cell Wall Related: sucrose metabolism; cellulose; arabinogalactan proteins; hemicellulose; pectins; xylose; extensins and proline rich proteins; other cell wall proteins.
Phenylpropanoid Pathway: isoflavone reductases; phenylalanine ammonia-lyases; S-adenosylmethionine decarboxylases; glycine hydroxymethyltransferases; lignin biosynthesis: CCoAOMTs, 4-coumarate-CoA ligases, cinnamyl-alcohol dehydrogenase.]
Quality Control
• Positive: LP-3, a loblolly pine gene known to respond positively to drought stress, was included.
• LP-3 was positive in the moist versus mild comparison, and unchanged in the moist versus severe comparison.
• Negative: Four clones of human genes used as negative controls in the Arabidopsis Functional Genomics project were included. The clones did not respond.
Categories that contained positives in genotypes C and D (Control versus Mild)
[Figure: the Protective Processes category tree with three branches.
ROS and Stress: abiotic; biotic; antioxidant processes; drought; heat; non-plant; xenobiotics; NADPH/ascorbate/glutathione scavenging pathway; cytosolic ascorbate peroxidase; superoxide dismutase-Fe; superoxide dismutase-Cu-Zn; glutathione reductase; dehydrins, aquaporins; heat shock proteins; chaperones; “isoflavone reductases”; GSTs.
Cell Wall Related: sucrose metabolism; cellulose; extensins, arabinogalactan, and proline rich proteins; hemicellulose; pectins; xylose; other cell wall proteins.
Phenylpropanoid Pathway: isoflavone reductases; phenylalanine ammonia-lyase; S-adenosylmethionine decarboxylase; glycine hydroxymethyltransferase; lignin biosynthesis: CCoAOMT, 4-coumarate-CoA ligase, cinnamyl-alcohol dehydrogenase.]
Data from two slides (4 arrays) for C and two slides (4 arrays) for D were collected.
Hypotheses versus Results
• Among the genes responding to mild stress, there exists a population of genes whose expression confers resistance.
– Genes in 69 categories responded positively to mild stress in Genotypes C and D (the positive response was not observed in the severe stress condition in Genotype D).
• There is evidence for a response to drought among genes associated with other stresses.
– Isoflavone reductase homologs and GSTs responded positively to mild drought stress.
– These categories are previously documented to respond to biotic stress and xenobiotics, respectively.
Relationships among HSP Homologs
In control versus mild stress, HSP 100, 70, and 23 responded in C and D; HSP 80s did not respond in either C or D.
Candidate Categories — Long-term Adaptation to Drought Stress
• Include
– Aquaporins
– Dehydrins
– Heat shock proteins/chaperones
• Exclude
– Isoflavone reductases
Design of Microarrays
• Clones on the drought-stress microarrays were replicated and randomly placed
• Experiment involved 384 archived pine ESTs
• Organized into 4 microtitre source plates after PCR
• Pipetted into 8 sets of 4 microtitre plates each
• Each set a different random arrangement of 384 ESTs
• Printed type A microarrays from first 4 sets
• Printed type B microarrays from second 4 sets
• Each array has 4 randomly placed replicates of each EST
• Each control versus stress comparison was done on 4 arrays: A and B; flip dyes; A and B
• Total of 16 replicates of each EST in each comparison
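The replication scheme for the drought-stress arrays can be checked with a short simulation. The sketch below is illustrative only: the counts (384 ESTs, 8 random arrangements, 4 arrangements per array type, 4 arrays per comparison) come from the slides, while the function names, the fixed seed, and the use of Python's random module are assumptions made for the example, not the procedure actually used for printing.

```python
import random
from collections import Counter

N_ESTS = 384         # archived pine ESTs (from the slides)
N_SETS = 8           # random arrangements of the 384 ESTs
SETS_PER_ARRAY = 4   # sets 1-4 print type A, sets 5-8 print type B


def make_arrangements(n_ests=N_ESTS, n_sets=N_SETS, seed=0):
    """Return n_sets independent random orderings of the EST indices."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    arrangements = []
    for _ in range(n_sets):
        order = list(range(n_ests))
        rng.shuffle(order)
        arrangements.append(order)
    return arrangements


def build_array_types(arrangements):
    """Type A carries the first 4 arrangements, type B the second 4."""
    return {
        "A": arrangements[:SETS_PER_ARRAY],
        "B": arrangements[SETS_PER_ARRAY:],
    }


def replicates_per_comparison(array_types):
    """Count spots per EST over the 4 arrays used in one comparison."""
    arrays_used = ["A", "B", "A", "B"]  # A and B; flip dyes; A and B
    counts = Counter()
    for name in arrays_used:
        for arrangement in array_types[name]:
            counts.update(arrangement)
    return counts


arrangements = make_arrangements()
array_types = build_array_types(arrangements)
counts = replicates_per_comparison(array_types)
print(set(counts.values()))  # {16}
```

Every EST appears once in each arrangement, so each array holds 4 replicates and the 4 arrays of a comparison give 16, matching the total stated on the slide.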
Spot and Clone Analysis
• Image Analysis: gridding, spot identification, intensity and background calculation, normalization
• Statistics:
– Fold or ratio estimation
– Combining replicates
• Higher-level Analysis:
– Clustering methods
– Inductive logic programming (ILP)
Image Analysis
Microarray Suite:
• Manual gridding
• Extract two intensities for each spot
• Compute ratios
• Compute calibrated ratios
Our tools use the logarithm of the calibrated ratios
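As a rough illustration of the quantity the downstream tools consume, the sketch below turns two intensities per spot into log calibrated ratios. The slides do not say how Microarray Suite calibrates ratios, so dividing by the median ratio is an assumption chosen for the example, as are the function name and the base-2 logarithm.

```python
import math
from statistics import median


def log_calibrated_ratios(treatment, control):
    """Return log2 of each spot's ratio after centering on the median ratio.

    Median centering is an assumed stand-in for the calibration step;
    the calibration actually applied by Microarray Suite is not described here.
    """
    if len(treatment) != len(control):
        raise ValueError("need one treatment and one control intensity per spot")
    if any(v <= 0 for v in list(treatment) + list(control)):
        raise ValueError("intensities must be positive")
    ratios = [t / c for t, c in zip(treatment, control)]
    centre = median(ratios)
    return [math.log2(r / centre) for r in ratios]


# Hypothetical intensities for three spots.
values = log_calibrated_ratios([200.0, 400.0, 800.0], [100.0, 100.0, 100.0])
print(values)  # [-1.0, 0.0, 1.0]
```

Working on the log scale makes a doubling and a halving symmetric about zero, which is what the sign-based analysis of replicates relies on.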
Computational and Statistical Analysis
• The multiple (typically 16) log calibrated ratios for a replicated clone do NOT follow a normal distribution.
• We assume a zero-centered distribution for log ratios.
• The number of positive (or negative) log ratios follows a binomial distribution with parameters 16 and 0.5.
• Under this model, 12 or more positive log ratios out of 16 arise by chance with probability about 0.04, so a clone reaching that count is called up-expressed with a confidence of 0.96.
• We classify each EST response as one of
– Up-regulated
– Down-regulated
– No clear change
• This classification provides sufficient input for the use of inductive logic programming (ILP).
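The 0.96 figure follows from the upper tail of a binomial distribution with parameters 16 and 0.5. The sketch below reproduces that arithmetic and applies the three-way classification. The function names are hypothetical, and using the same threshold of 12 for negative log ratios is an assumption based on the symmetry of the model; the slides state the threshold only for the positive case.

```python
from math import comb


def upper_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def classify(log_ratios, threshold=12):
    """Classify one clone from the signs of its replicate log ratios."""
    positives = sum(1 for x in log_ratios if x > 0)
    negatives = sum(1 for x in log_ratios if x < 0)
    if positives >= threshold:
        return "up-regulated"
    if negatives >= threshold:  # symmetric threshold is an assumption
        return "down-regulated"
    return "no clear change"


tail = upper_tail(16, 12)
print(round(tail, 4), round(1 - tail, 4))  # 0.0384 0.9616

# Hypothetical replicate values for three clones.
print(classify([0.8] * 13 + [-0.2] * 3))   # up-regulated
print(classify([-0.5] * 12 + [0.1] * 4))   # down-regulated
print(classify([0.3] * 9 + [-0.3] * 7))    # no clear change
```

Because only the signs of the log ratios are used, the test does not depend on the replicates being normally distributed.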
Related Statistical Results
• Chen et al. (J. Biomed. Optics 2, 1997, 364-374)
– Assume a normal distribution and normalize ratios
– No replicates
– Estimate a confidence interval for ratios that applies to each spot
• Lee et al. (PNAS 97, August 29, 2000, 9834-9) emphasize need for replication
• Black and Doerge (PNAS, to appear)
– Investigate distributional assumptions of log-normal and gamma distributions on intensities
– Determine the number of replicates needed for a particular confidence level under each distribution
– Assume normalization has been done and location-dependent error has been eliminated.
Further Analysis: Inductive Logic Programming
• ILP is a data mining algorithm expressly designed for inferring relationships.
• By expressing relationships as rules, it provides new information and resultant testable hypotheses.
• ILP groups related data and chooses in favor of relationships having short descriptions.
• ILP can also flexibly incorporate a priori biological knowledge (e.g., categories and alternate classifications).
Rule Inference in ILP
• Infers rules relating gene expression levels to categories, both within a probe pair and across probe pairs, without explicit direction
• Example Rule:
[Rule 142] [Pos cover = 69 Neg cover = 3]
level(A,moist_vs_severe,not positive) :- level(A,moist_vs_mild,positive).
• Interpretation:
“If the moist versus mild stress comparison was positive for some clone named A, it was negative or unchanged in the moist versus severe comparison for A, with a confidence of 95.8%.”
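The confidence quoted for Rule 142 is the fraction of covered clones for which the rule holds. The sketch below shows that arithmetic and how positive and negative cover could be counted from classified clones. It is an illustration only: the clone names and levels are made up, the function names are hypothetical, and this is not how Aleph computes or reports cover internally.

```python
def rule_confidence(pos_cover, neg_cover):
    """Fraction of covered examples that satisfy the rule."""
    total = pos_cover + neg_cover
    if total == 0:
        raise ValueError("rule covers no examples")
    return pos_cover / total


def cover_for_rule_142(levels):
    """Count clones supporting or contradicting the example rule.

    levels maps clone name -> {comparison: level}. The rule body is
    'moist_vs_mild is positive'; the head is 'moist_vs_severe is not positive'.
    """
    pos = neg = 0
    for clone_levels in levels.values():
        if clone_levels["moist_vs_mild"] != "positive":
            continue  # rule body does not apply to this clone
        if clone_levels["moist_vs_severe"] != "positive":
            pos += 1
        else:
            neg += 1
    return pos, neg


# Cover counts reported on the slide.
print(f"{rule_confidence(69, 3):.1%}")  # 95.8%

# Hypothetical clones, for illustration only.
toy = {
    "clone_1": {"moist_vs_mild": "positive", "moist_vs_severe": "unchanged"},
    "clone_2": {"moist_vs_mild": "positive", "moist_vs_severe": "negative"},
    "clone_3": {"moist_vs_mild": "positive", "moist_vs_severe": "positive"},
    "clone_4": {"moist_vs_mild": "unchanged", "moist_vs_severe": "positive"},
}
print(cover_for_rule_142(toy))  # (2, 1)
```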
More Rules We Obtained
• [Rule 6]
level(A,moist_vs_mild,positive) :- category(A, transport_protein).
level(A,mild_vs_severe,negative) :- category(A, transport_protein).
• [Rule 13]
level(A,moist_vs_mild,positive) :- category(A, heat).
• [Rule 17]
level(A,moist_vs_mild,positive) :- category(A, cellwallrelated).
ILP Subsumes Two Forms of Reasoning
• Unsupervised learning
– “Find clusters of genes that have similar/consistent expression patterns”
• Supervised learning
– “Given several patterns of gene expression for two conditions, give an equation that distinguishes the patterns for each condition”
• Hybrid reasoning
– “Is there a relationship between genes in a given functional category and genes in a particular expression cluster?”
– ILP mines this information in a single step
Current Status of Expresso
• Completely automated and integrated
– Statistical analysis
– Data mining
– Experiment capture in MEL
• Current Work: Integrating
– Image processing
– Querying by semi-structured views
– Expresso-assisted experiment composition
Future Directions: Next Generation Stress Chips
1. Time course, short and long term, to capture gene expression events underlying “emergency” and adaptive events following drought stress imposition.
(Use all available ESTs for candidate stress resistance genes.)
2. Generate cDNA library from stressed seedlings.
3. Initiate modeling of kinetics of drought stress responses.
Gene Expression Events Associated with Extreme Environmental Conditions
• Hypothesis 1: Specific genes that confer ability to adapt to extreme conditions are expressed in Andean potato varieties and in other root and tuber crops of the region.
• Hypothesis 2: The adaptive genes act either individually or in co-adaptive groups.
• Hypothesis 3: The adaptive genes are also expressed in temperate zone varieties or they are specific to extreme conditions.
• Proposed approach: Use of microarrays as a tool for discovery of these adaptive genes.
Successful Emergency Responses Versus Adaptation to Diurnal Variation
• Cultivar differences with respect to degree/rapidity of gene expression
• Cultivar differences with respect to rate of synthesis of antioxidant(s)
• Our question: Are the genes that respond in the short term the same ones that confer stress resistance diurnally?
Relevant Data
• Similarities among orthologous genes are sufficiently close that cross-hybridization occurs on microarrays between species (R. R. Sederoff, personal communication).
• Although the above confounds treatment responses shown by individual members of multi-gene families, it allows use of a chip based on one species in inter-specific hybridizations.
Collaborating with CIP: Suggested Strategy I
• Identify patterns of gene expression in S. tuberosum associated with successful adaptation to stress (temperature extremes, with or without drought?)
• Construct two or more cDNA libraries from adapted potato
• Design “potato stress chips” including the stress cDNA library and known stress-responsive genes of S. tuberosum (SolGenes resource)
Suggested Strategy II: An Experimental Approach
• Identify variety-specific diurnal gene expression patterns in Andean potato varieties using potato stress chip.
• Construct arrays of cDNAs from Andean varieties. Compare mRNA populations isolated during adaptation of temperate zone potatoes with RNA obtained from Andean varieties.
Iterative strategy for detection of genetic interactions using microarrays
[Figure: four-step cycle around Genetic Regulatory Networks. Step labels: Detection of gene expression effects on microarrays; Characterize gene function; Test mutant phenotypes; Identify mutants.]
Iterative strategy for detection of genetic interactions using microarrays and CS expertise
[Figure: four-step cycle around Genetic Regulatory Networks. Step labels: Detection of stress-mediated gene expression effects on microarrays; Computational tools to infer interaction among genes, pathways; Test inferences with mutants/varying conditions; Revised / New Tools and Experiments.]