![Page 1: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/1.jpg)
Bioinformatics
Expression profiling and functional genomics
Part I: Preprocessing
Ad 29/10/2006
![Page 2: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/2.jpg)
• http://www.esat.kuleuven.ac.be/~kmarchal/
• Course material: course notes + powerpoint files
• Exercises
![Page 3: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/3.jpg)
Overview
MICROARRAY PREPROCESSING
• Gene expression
• Omics era
• Transcript profiling
• Experiment design
• Preprocessing
• Exercises
![Page 4: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/4.jpg)
[Figure: DNA → (transcription, start +1) → mRNA → (translation) → protein]
Gene expression
![Page 5: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/5.jpg)
Adaptation of cell to its environment
[Figure: bacterial cell with signal 1 and signal 2 arriving from outside; inside the cell an FNR box in front of the genes cytN, cytO, cytQ and cytP]
Adaptation of a cell:
• response to environmental signals
• response to e.g. hormones (cell differentiation)
The cellular response is determined by the genes that are switched on upon a signal
Gene expression
![Page 6: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/6.jpg)
The action of genetic networks underlies the observed phenotypic behavior
Gene expression
![Page 7: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/7.jpg)
Overview
MICROARRAY PREPROCESSING
• Gene expression
• Omics era
• Transcript profiling
• Experiment design
• Preprocessing
• Exercises
![Page 8: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/8.jpg)
Functional genomics
Structural Genomics
Comparative Genomics
![Page 9: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/9.jpg)
Traditional molecular biology
– Directed toward understanding the role of a particular gene or protein in a molecular biological process
– Northern analysis
– Mutational analysis
– Expression by reporter fusions
Omics era
– Measurement of the expression of thousands of genes or proteins simultaneously
– The function or the expression of a gene in the global context of the cell
– Holistic approaches allow a better understanding of fundamental molecular biological processes
A gene does not act on its own: it is always embedded in a larger network (systems biology)
![Page 10: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/10.jpg)
transcriptomics
[Figure: reference sample and test sample, RNA → cDNA → detection; DNA → (transcription, start +1) → mRNA → (translation) → protein]
Omics era
![Page 11: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/11.jpg)
proteomics
[Figure: DNA → (transcription, start +1) → mRNA → (translation) → protein]
Omics era
![Page 12: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/12.jpg)
metabolomics
Omics era
![Page 13: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/13.jpg)
SYSTEMS BIOLOGY
Consider the cell as a system
Omics era
![Page 14: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/14.jpg)
SYSTEMS BIOLOGY
Mechanistic insight into the biological system at the molecular level
High throughput data
Omics era
![Page 15: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/15.jpg)
• Analysis of such large-scale data is no longer trivial => computational challenges
– Low signal-to-noise ratio
– High dimensionality
• Simple spreadsheet analyses (e.g. in Excel) are no longer sufficient
• More advanced data mining procedures become necessary
• Another urgent problem is how to store and organize all the information
Bioinformatics
Omics era
![Page 16: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/16.jpg)
Overview
MICROARRAY PREPROCESSING
• Gene expression
• Omics era
• Transcript profiling
– Principle of microarray
– Applications
• Experiment design
• Preprocessing
• Exercises
![Page 17: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/17.jpg)
transcriptomics
[Figure: reference sample and test sample, RNA → cDNA → detection]
Transcript profiling
![Page 18: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/18.jpg)
• Previously: measure the expression level of one gene: Northern blot analysis
• Novel techniques: measure the expression level of all genes simultaneously => EXPRESSION PROFILING
Principle: hybridisation
mRNA: 5' -UGACCUGACG- 3'
cDNA: 3' -ACTGGACTGC- 5'
Hybridize: stick together
Transcript profiling
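The base pairing behind the hybridisation example can be checked with a few lines of Python. This is only an illustrative sketch: the function name `complementary_cdna` and the lookup table are assumptions made for this example, not part of the course material.

```python
# Base-pairing rules between an RNA template and its DNA copy (cDNA).
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}

def complementary_cdna(mrna: str) -> str:
    """Return the cDNA strand that pairs with `mrna`, position by position.

    The mRNA is read 5'->3', so the returned cDNA is written 3'->5',
    which is the layout used on the slide.
    """
    return "".join(RNA_TO_DNA[base] for base in mrna.upper())

print(complementary_cdna("UGACCUGACG"))  # ACTGGACTGC
```

Two strands hybridize well only when every position pairs up like this, which is why a spotted cDNA captures its own transcript.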
![Page 19: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/19.jpg)
Monitor molecular activities on a global level
– protein levels (proteomics)
– enzyme activities
– metabolites
– gene expression (mRNA): transcriptomics = transcript profiling
This gives a general insight into global cell behavior (holistic)
Molecular biological methods
– RT-PCR
– SAGE
– Protein arrays
– Microarray analysis
Transcript profiling
![Page 20: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/20.jpg)
Transcript profiling
EXPERIMENTAL PROCEDURES
• SLIDE PRODUCTION: cDNA clones, printing slides
• cDNA µA EXPERIMENT: experiment design, sample preparation, hybridization & scanning
• DATA ANALYSIS
![Page 21: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/21.jpg)
cDNA array
Spotted cDNA on a glass slide
Upscaled Northern hybridisation
[Figure: gene (DNA, start +1) → transcript (mRNA) → cDNA]
Transcript profiling
![Page 22: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/22.jpg)
Preparation of probes
• Collect cDNA clones
• Amplify target cDNA insert by PCR
• Check yield & specificity by electrophoresis
Spot PCR products on glass slides
Transcript profiling
![Page 23: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/23.jpg)
[Figure: reference sample and test sample, RNA → cDNA → detection]
Transcript profiling
![Page 24: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/24.jpg)
1. Cell culture (signal 1, signal 2)
2. mRNA isolation
3. Labeling
4. Hybridization + washing
5. Scanning
6. Image analysis => numerical value
Transcript profiling
![Page 25: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/25.jpg)
http://www.bio.davidson.edu/courses/genomics/chip/chip.html
Transcript profiling
![Page 26: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/26.jpg)
Superimposed color image
* Transform into color images
* Superimpose color images from R and G channel
good alignment bad alignment
Transcript profiling
![Page 27: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/27.jpg)
black spots : gene was neither expressed in test nor in control sample
green : gene was only expressed in control sample
red : gene was only expressed in test sample
yellow : gene was expressed both in test and in control sample
Superimposed color image
Transcript profiling
![Page 28: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/28.jpg)
Signal intensity is proportional to the amount of cDNA present in the sample
signal Cy3 -> numerical value
signal Cy5 -> numerical value
Data analysis
Image analysis
Transcript profiling
![Page 29: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/29.jpg)
Transcript profiling
Data representation
Gene profile
Experiment profile
![Page 30: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/30.jpg)
Spotted DNA microarray High density oligonucleotide array
Transcript profiling
![Page 31: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/31.jpg)
Overview
MICROARRAY PREPROCESSING
• Gene expression
• Omics era
• Transcript profiling
• Experiment design
• Preprocessing
• Exercises
![Page 32: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/32.jpg)
The mathematical approach depends on the experimental design
• Comparison of 2 samples (black/white)
• Comparison of multiple arrays
• Global dynamic profiling
• Static experiment: Comparison of samples (mutants, patients)
Experiment Design
![Page 33: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/33.jpg)
Type1: Comparison of 2 samples
Statistical testing
Control sample
Induced sample
Retrieve statistically over or under expressed genes
2 sample design
Experiment Design
![Page 34: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/34.jpg)
Black/white experiment description (array V mice genes)
• Condition 1: pygmy mouse, 10 days old (test)
• Condition 2: normal mouse, 10 days old (ref)
Detect differentially expressed genes
Experiment design (Latin Square)
Array 1: Condition 1 with dye 1 (replica L and replica R), Condition 2 with dye 2 (replica L and replica R)
Array 2: Condition 2 with dye 1 (replica L and replica R), Condition 1 with dye 2 (replica L and replica R)
Per gene and per condition, 4 measurements are available
Experiment Design
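The count of 4 measurements per gene and per condition follows from enumerating the design. The sketch below does that; it assumes that L and R denote two replicate spots on each slide and that array 2 is the colour flip of array 1, and all variable names are made up for the example.

```python
from collections import Counter

# Dye assignment per array; assumption: array 2 is the colour flip of array 1.
arrays = {
    "array 1": {"condition 1": "dye 1", "condition 2": "dye 2"},
    "array 2": {"condition 2": "dye 1", "condition 1": "dye 2"},
}
replicate_spots = ("L", "R")  # assumed: two replicate spots per gene on a slide

# One record per measurement of a single gene: (array, condition, dye, spot).
measurements = [
    (array, condition, dye, spot)
    for array, dyes in arrays.items()
    for condition, dye in dyes.items()
    for spot in replicate_spots
]

per_condition = Counter(condition for _, condition, _, _ in measurements)
dyes_per_condition = {
    c: {dye for _, cond, dye, _ in measurements if cond == c}
    for c in per_condition
}
print(per_condition)       # 4 measurements for each condition
print(dyes_per_condition)  # each condition is measured with both dyes
```

Because every condition is seen with both dyes, a dye effect does not coincide with the condition effect.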
![Page 35: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/35.jpg)
Measure expression of all genes
• During time (dynamic profile)
• In different conditions
Identify coexpressed genes
Identify mechanism of coregulation
Motif Finding
Clustering
Multiple array design
Experiment Design
![Page 36: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/36.jpg)
Multiple array design
• Study of the mitotic cell cycle of Saccharomyces cerevisiae with oligonucleotide arrays (Cho et al., 1999), 15 time points (E=18)
• Time points 90 & 100 min deleted (Zhang et al., 1999; Tavazoie et al., 1999)
Original dataset: 6178 genes
Preprocessing:
• select 4634 most variable (25 % most variable)
• variance normalized
• adaptive quality based clustering (32 clusters) (95%)
Experiment Design
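The first two preprocessing steps for a multiple array design (keep the most variable genes, then variance-normalize each profile) might look roughly as follows. This is a sketch on toy data; the function names, the toy profiles and the choice of the population standard deviation are assumptions, not the procedure used in the cited studies.

```python
from statistics import mean, pstdev

def select_most_variable(profiles, n_keep):
    """Keep the n_keep gene profiles with the largest standard deviation."""
    ranked = sorted(profiles, key=lambda gene: pstdev(profiles[gene]), reverse=True)
    return {gene: profiles[gene] for gene in ranked[:n_keep]}

def variance_normalize(profile):
    """Rescale one profile to mean 0 and standard deviation 1.

    Assumes the profile is not constant (standard deviation > 0).
    """
    m, s = mean(profile), pstdev(profile)
    return [(x - m) / s for x in profile]

# Hypothetical expression profiles over four time points.
profiles = {
    "geneA": [1.0, 5.0, 1.0, 5.0],
    "geneB": [2.0, 2.1, 2.0, 2.1],   # almost flat, will be filtered out
    "geneC": [0.0, 1.0, 2.0, 3.0],
}
kept = select_most_variable(profiles, n_keep=2)
normalized = {gene: variance_normalize(p) for gene, p in kept.items()}
print(sorted(kept))          # ['geneA', 'geneC']
print(normalized["geneA"])   # [-1.0, 1.0, -1.0, 1.0]
```

After this step, profiles are compared by their shape over time rather than by their absolute level, which is what clustering of coexpressed genes needs.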
![Page 37: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/37.jpg)
• Reference: unsynchronized cells
• Condition: synchronized cells during the cell cycle at distinct time intervals
Each array pairs one condition (Condition 1, 2, 3, 4, …; dye 1, replica L) with the common reference, Condition 19 (dye 2, replica L)
Reference design: e.g. Spellman dataset
Experiment Design
![Page 38: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/38.jpg)
Loop design
Experiment Design
![Page 39: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/39.jpg)
Overview
MICROARRAY PREPROCESSING
• Gene expression
• Omics era
• Transcript profiling
• Experiment design
• Preprocessing
– Sources of Variation
– General normalization steps
– Slide by slide normalization
– ANOVA normalization
![Page 40: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/40.jpg)
Sources of variation
– Overshine effects
– Dye effect
– Spot effects
– Array effect
Consistent errors
• Consistent errors complicate direct comparison of measurements of the same gene/condition
• Consistent errors need to be removed by preprocessing/normalization
Preprocessing
• Tedious
• Influences downstream measurements
![Page 41: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/41.jpg)
1. Cell culture (signal 1, signal 2)
2. mRNA isolation
3. Labeling
4. Hybridization + washing
5. Scanning
6. Image analysis => numerical value
Preprocessing
Dye effect
![Page 42: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/42.jpg)
Dye, condition effect: within slide variation
Measurement error:
– Preparation of mRNA
– Labeling & reverse transcription
Overall signal in one channel is more pronounced than in the other channel
Normalization
Global normalization assumption: on average, $\log_2(\text{test}/\text{ref}) = 0$
Preprocessing
![Page 43: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/43.jpg)
1. Cell culture (signal 1, signal 2)
2. mRNA isolation
3. Labeling
4. Hybridization + washing
5. Scanning
6. Image analysis => numerical value
Preprocessing
Array effect
![Page 44: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/44.jpg)
• normalization within slide
• ratio
Differences in global intensity between slides
Comparison between slides impossible
Array effects: between slide variation
Preprocessing
Hybridization differences
![Page 45: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/45.jpg)
Array effects: Between slide variation
[Figure: box plots (min value, Q1, Q3, max value) for Series1 to Series4, values between about -9 and 7]
Preprocessing
![Page 46: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/46.jpg)
Measurement error: Different quantity of DNA in spot
Difference in duplicate spots
Ratio: compare differential expression between genes
Spot effect
Absolute levels between genes incomparable
Gene 1: test: 4 ref:2 R/G:2
Gene 2: test: 8 ref:4 R/G:2
Pin main effects: spot effects
Preprocessing
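The two example genes show why the ratio compensates for spot effects: the absolute intensities differ by a factor of two, yet the test/reference ratio is the same. A minimal sketch with the numbers from the slide (the dictionary layout and function name are assumptions for illustration):

```python
# Measured intensities from the slide: (test, ref) per gene.
spots = {"Gene 1": (4.0, 2.0), "Gene 2": (8.0, 4.0)}

def ratio(test, ref):
    """Test/reference ratio of one spot.

    A spot-specific factor (e.g. more DNA in the spot) multiplies both
    channels and therefore cancels in the ratio.
    """
    return test / ref

ratios = {gene: ratio(test, ref) for gene, (test, ref) in spots.items()}
print(ratios)  # {'Gene 1': 2.0, 'Gene 2': 2.0}
```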
![Page 47: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/47.jpg)
Non-specific Cy5 or Cy3 signal resulting from overshining (= emission from neighboring spots)
Overshine effects: within slide variation
Preprocessing
Background intensity increases with the intensity of the neighboring spots
![Page 48: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/48.jpg)
Removing sources of variation is an obligatory step
• To make comparisons within a slide possible
– E.g. find differentially expressed genes
• To allow interslide comparisons
– E.g. combining the replicates of the original experiment and the color flip
Preprocessing
![Page 49: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/49.jpg)
Overview
MICROARRAY PREPROCESSING
• Gene expression
• Omics era
• Transcript profiling
• Experiment design
• Preprocessing
– Sources of Variation
– General normalization steps
– Slide by slide normalization
– ANOVA normalization
![Page 50: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/50.jpg)
ANOVA based: background correction → log transformation → filtering → linearisation → bootstrapping
Array by array approach: background correction → log transformation → filtering → normalization → ratio → test statistic (T-test)
Preprocessing
![Page 51: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/51.jpg)
• Background correction compensates for overshining
• Background correction is considered additive
Preprocessing: Background correction
Background correction
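Because the background is treated as additive, the simplest correction subtracts a local background estimate from each spot signal. The sketch below assumes such an estimate is already available per spot; the intensities are hypothetical.

```python
def background_correct(foreground, background):
    """Subtract the local background estimate from each spot's foreground signal."""
    return [fg - bg for fg, bg in zip(foreground, background)]

# Hypothetical intensities for three spots in one channel.
foreground = [1200.0, 340.0, 95.0]
background = [100.0, 90.0, 110.0]
print(background_correct(foreground, background))  # [1100.0, 250.0, -15.0]
```

The third spot ends up negative, so its log value is undefined; spots like this have to be handled in the filtering step.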
![Page 52: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/52.jpg)
ANOVA based: background correction → log transformation → filtering → linearisation → bootstrapping
Array by array approach: background correction → log transformation → filtering → normalization → ratio → test statistic (T-test)
Preprocessing
![Page 53: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/53.jpg)
• Additive error: the absolute level of the error stays the same regardless of the measured intensity (high relative error at low expression levels, low relative error at high expression levels)
• Multiplicative error: the error increases with the measured intensity (constant relative error, so a high absolute error at high levels)
Multiplicative error
Preprocessing: log transformation
![Page 54: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/54.jpg)
LOG2 transformed intensity values: Multiplicative effects removed, additive effects more pronounced
residuals are constant at high intensities
Additive error: error increases as the signal is lower (intuitively plausible)
Preprocessing: log transformation
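A small deterministic calculation illustrates both statements: on the log2 scale a multiplicative error has the same size at every intensity, while an additive error becomes large at low intensities. The error sizes (10 % and 50 units) and the function name are arbitrary assumptions for this sketch.

```python
from math import log2

def log_error(true_intensity, additive=0.0, multiplicative=1.0):
    """Deviation, on the log2 scale, of a measurement from its true intensity."""
    measured = true_intensity * multiplicative + additive
    return log2(measured) - log2(true_intensity)

for intensity in (100.0, 10_000.0):
    mult = log_error(intensity, multiplicative=1.1)  # 10 % multiplicative error
    add = log_error(intensity, additive=50.0)        # +50 units additive error
    print(f"{intensity:>8.0f}  multiplicative: {mult:.3f}  additive: {add:.3f}")
```

The multiplicative column is identical for both intensities, whereas the additive column is much larger for the weak spot than for the strong one.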
![Page 55: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/55.jpg)
Preprocessing: log transformation
![Page 56: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/56.jpg)
Why log2
• Ratio (test/ref):
– test > ref: upregulation, range 1…+infinity
– test < ref: downregulation, range 0…1 (range of downregulation squashed)
• Log2(test/ref) = log2(test) - log2(ref):
– upregulation range 0…+infinity
– downregulation range 0…-infinity
2-fold overexpression: Ratio = 2, Log2(Ratio) = 1
2-fold underexpression: Ratio = 0.5, Log2(Ratio) = -1
Preprocessing: log transformation
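The symmetry of the log2 ratio is easy to verify numerically. The helper name `log_ratio` is an assumption for this sketch.

```python
from math import log2

def log_ratio(test, ref):
    """log2(test/ref), written as a difference of log intensities."""
    return log2(test) - log2(ref)

print(log_ratio(2.0, 1.0))  # 1.0  (2-fold overexpression)
print(log_ratio(1.0, 2.0))  # -1.0 (2-fold underexpression)
```

Swapping test and reference only flips the sign, so up- and downregulation of the same magnitude lie at equal distances from 0.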
![Page 57: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/57.jpg)
ANOVA based: background correction → log transformation → filtering → linearisation → bootstrapping
Array by array approach: background correction → log transformation → filtering → normalization → ratio → test statistic (T-test)
Preprocessing
![Page 58: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/58.jpg)
Spot detection and signal acquisition
• Spots are identified by image analysis
– Array Vision
– ImaGene
– Matarray
E.g. the signal is defined as the mean pixel intensity of all pixels in a spot for which the intensity is higher than the local background + 2 SD
• Spots can have different qualities (artifacts)
– Irregular spots
– Spots with an excessively large diameter
– Spots which are extremely small
Preprocessing: filtering
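The example signal definition can be written out as a short function. This sketch assumes that "local background + 2 SD" means the mean of the local background pixels plus twice their sample standard deviation; image analysis packages may define it differently, and the pixel values are hypothetical.

```python
from statistics import mean, stdev

def spot_signal(spot_pixels, background_pixels, n_sd=2.0):
    """Mean of the spot pixels brighter than background mean + n_sd * SD.

    Returns None when no pixel passes the threshold.
    """
    threshold = mean(background_pixels) + n_sd * stdev(background_pixels)
    bright = [p for p in spot_pixels if p > threshold]
    return mean(bright) if bright else None

# Hypothetical pixel intensities.
background = [10, 12, 11, 9, 13]
spot = [11, 14, 40, 42, 44]
print(spot_signal(spot, background))  # 42
```

Here the two dim pixels fall below the threshold and do not dilute the spot signal; a spot without any pixel above the threshold returns None and can be filtered out.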
![Page 59: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/59.jpg)
[Figure legend: red > 0.1 stdev, green > 1 stdev, blue > 2 stdev]
Preprocessing: filtering
![Page 60: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/60.jpg)
Filtering:
• Zero values: treat these separately, because the ratio and the log transformation are undefined
• Zero values are interesting in a black/white experiment: genes off in condition 1 versus on in condition 2
Preprocessing: filtering
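One way to treat zero values separately is to label each spot before taking ratios, so that on/off genes are kept aside instead of producing undefined values. The function name and the status labels below are assumptions for this sketch.

```python
from math import log2

def classify_spot(test, ref):
    """Return (log2 ratio or None, status) for one spot.

    The log ratio is only computed when both intensities are positive.
    """
    if test > 0 and ref > 0:
        return log2(test / ref), "defined"
    if test > 0:
        return None, "on in test only"
    if ref > 0:
        return None, "on in reference only"
    return None, "off in both"

print(classify_spot(8.0, 4.0))  # (1.0, 'defined')
print(classify_spot(0.0, 5.0))  # (None, 'on in reference only')
```

Spots flagged as on in only one sample are exactly the candidates of interest in a black/white experiment, provided the zero is not a labeling artifact.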
![Page 61: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/61.jpg)
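• Some genes are only labeled with the green dye, not with the red dye
• If no mRNA of a gene is present, does the green dye bind aspecifically to the spot?
Color flip essential to eliminate false positives
Seemingly underexpressed

exp1: test in red, reference in green (column names: rood = red, groen = green; L and R are the two replicates)

| cloneId | LroodItest | RroodItest | LgroenIref | RgroenIref |
|---|---|---|---|---|
| 26635 | 2106.563 | 0 | 101692.979 | 10399.822 |
| 27141 | 836.407 | 0 | 123838.567 | 45432.931 |
| 27500 | 803.205 | 0 | 111507.935 | 72379.887 |
| 28152 | 0 | 1331.273 | 9263.894 | 14005.905 |
| 28333 | 0 | 1255.175 | 87102.68 | 9188.587 |
| 28756 | 363.247 | 0 | 115771.253 | 88541.343 |
| 30694 | 924.256 | 0 | 22029.599 | 50306.219 |

exp2 (color flip): test in green, reference in red

| cloneId | LgroenItest | RgroenItest | LroodIref | RroodIref |
|---|---|---|---|---|
| 26635 | 14376.307 | 12190.883 | 0 | 995.694 |
| 27141 | 14804.307 | 13242.277 | 1315.193 | 762.172 |
| 27500 | 22051.507 | 18835.761 | 0 | 0 |
| 28152 | 29270.26 | 26939.077 | 90.713 | 3402.73 |
| 28333 | 25964.137 | 22326.256 | 0 | 0 |
| 28756 | 14270.607 | 20442.069 | 0 | 1007.763 |
| 30694 | 20150.615 | 19003.462 | 4750.326 | 7988.791 |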
Preprocessing: filtering
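One common way to combine a slide with its color flip is to average the two log ratios with opposite signs, so that a gene-specific dye bias cancels. This is a sketch of that general idea, not necessarily the procedure used in the course; it assumes positive intensities in both channels, and the numbers are hypothetical.

```python
from math import log2

def dye_swap_log_ratio(red1, green1, red2, green2):
    """Combine a slide and its colour flip into one log2(test/ref) estimate.

    Slide 1: test in red, reference in green.
    Slide 2 (flip): test in green, reference in red.
    A gene-specific dye bias adds the same amount to log2(red/green) on
    both slides, so it cancels in the difference.
    """
    m1 = log2(red1 / green1)  # log2(test/ref) + bias
    m2 = log2(red2 / green2)  # log2(ref/test) + bias
    return (m1 - m2) / 2

# Hypothetical gene that is not differentially expressed, but whose spot
# is always four times brighter in green than in red.
print(dye_swap_log_ratio(100.0, 400.0, 100.0, 400.0))  # 0.0
```

On slide 1 alone this gene would look 4-fold underexpressed; the flip shows the same green excess with the samples swapped, which exposes it as a dye artifact.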
![Page 62: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/62.jpg)
MICROARRAY PREPROCESSING
• Gene expression
• Omics era
• Transcript profiling
• Experiment design
• Preprocessing– Sources of Variation
– General normalization steps
– Slide by slide normalization
– ANOVA normalization
Overview
![Page 63: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/63.jpg)
ANOVA based: background correction → log transformation → filtering → linearisation → bootstrapping
Array by array approach: background correction → log transformation → filtering → normalization → ratio → test statistic (T-test)
Preprocessing
![Page 64: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/64.jpg)
• On average ratio red/green should be 1
– Rescale based on average of housekeeping genes
– Rescale based on spikes
– Rescale based on average expression value of the full array (global normalization)
• Methods used for normalization
– linear normalization
– Intensity dependent normalization
Preprocessing: normalization
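Global normalization rescales a slide so that the average log ratio becomes 0, i.e. the average red/green ratio becomes 1 on the log scale. A minimal sketch, assuming positive intensities and using the mean of the log ratios as the rescaling constant (the median is a frequent alternative); the intensities are hypothetical.

```python
from math import log2
from statistics import mean

def global_normalize(red, green):
    """Centre the log2(R/G) ratios of one slide so that their mean is 0."""
    log_ratios = [log2(r / g) for r, g in zip(red, green)]
    shift = mean(log_ratios)
    return [m - shift for m in log_ratios]

# Hypothetical slide on which the red channel is brighter overall.
red = [200.0, 400.0, 800.0]
green = [100.0, 100.0, 100.0]
print(global_normalize(red, green))  # [-1.0, 0.0, 1.0]
```

Rescaling on housekeeping genes or spikes works the same way, except that the shift is computed from that subset of spots only.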
![Page 65: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/65.jpg)
Linear Normalization
[Figure: two plots with axes G and R]
Preprocessing: normalization
![Page 66: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/66.jpg)
– Red and green related by a constant factor
– Calculate factor by linear regression
[Figure: two plots of Log2(ratio), with 0 marked]
• Linear normalization factor determined by linear regression
• Filtering to remove outliers in the non-linear range (green values)
• http://afgc.stanford.edu/~finkel/talk.htm
Preprocessing: normalization
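If red and green differ by a constant factor, that factor can be estimated as the slope of a regression line through the origin. The sketch below assumes this zero-intercept model and hypothetical intensities; the function names are made up for the example.

```python
def scale_factor(red, green):
    """Least-squares slope k of the line red = k * green through the origin."""
    return sum(r * g for r, g in zip(red, green)) / sum(g * g for g in green)

def linear_normalize(red, green):
    """Divide the red channel by k so that both channels share one scale."""
    k = scale_factor(red, green)
    return [r / k for r in red]

# Hypothetical slide: red is 1.5 times brighter than green overall.
green = [100.0, 200.0, 400.0]
red = [150.0, 300.0, 600.0]
print(scale_factor(red, green))      # 1.5
print(linear_normalize(red, green))  # [100.0, 200.0, 400.0]
```

Bright spots dominate this fit, which is one reason why outliers in the non-linear range are filtered out first.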
![Page 67: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/67.jpg)
Linear normalization not straightforward,…
[Figure: Log2(R/G) plotted against (Log2(R) + Log2(G))/2, with a linear fit and a lowess fit]
Preprocessing: normalization
![Page 68: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/68.jpg)
Non-linear intensity dependent normalization
Lowess (Dudoit et al., 2000): genes that seem underexpressed because of a specific dye effect will be compensated for
Log R and log G are recalculated based on the lowess fit
Lowess both linearizes and normalizes the data
Preprocessing: normalization
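The idea of intensity-dependent normalization can be sketched without any library: fit a smooth trend of log2(R/G) against the average log intensity and subtract it. The code below is a simplified local linear regression with tricube weights and no robustness iterations, so it is an approximation of lowess rather than the published procedure; the function names and the `frac` parameter are assumptions.

```python
from math import log2

def lowess_fit(x, y, frac=0.5):
    """Local linear regression with tricube weights (no robustness iterations)."""
    n = len(x)
    k = max(2, int(round(frac * n)))  # number of points in each local window
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12  # window half-width (guard against 0)
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

def lowess_normalize(red, green, frac=0.5):
    """Return log2(R/G) per spot after subtracting the intensity-dependent trend."""
    a = [(log2(r) + log2(g)) / 2 for r, g in zip(red, green)]
    m = [log2(r) - log2(g) for r, g in zip(red, green)]
    trend = lowess_fit(a, m, frac)
    return [mi - ti for mi, ti in zip(m, trend)]
```

The design assumption is the same as for global normalization, applied locally: at every intensity level most genes are not differentially expressed, so the fitted trend reflects the dye effect rather than biology.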
![Page 69: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/69.jpg)
Intensity dependent normalization
Preprocessing: normalization
![Page 70: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/70.jpg)
Result of the normalization
A. Before normalization
[Figure: box plots (min value, Q1, Q3, max value) for Series1 to Series4]
B. After normalization
[Figure: box plots (min value, Q1, Q3, max value) of RATIO1_NORM, RATIO2_NORM and RATIO3_NORM]
Preprocessing: normalization
![Page 71: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/71.jpg)
ANOVA based: background correction → log transformation → filtering → linearisation → bootstrapping
Array by array approach: background correction → log transformation → filtering → normalization → ratio → test statistic (T-test)
Preprocessing
![Page 72: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/72.jpg)
• Compensates for spot effects
• Choice of the reference is important
– Intuitive reference: intuitive interpretation possible, but ratio often undefined
• First time point
• Uninduced sample
– Independent reference (reference design): ratio defined, but interpretation complicated
• Tissue mixture
Preprocessing: ratio
![Page 73: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/73.jpg)
• Ratio (R/G):
– R > G: upregulation, range 1…+infinity
– R < G: downregulation, range 0…1: range of downregulation squashed
• Log ratio:
– upregulation: range 0…+infinity
– downregulation: range 0…-infinity
• 2 fold overexpression: Ratio = 2, Log2(Ratio) = 1
• 2 fold underexpression: Ratio = 0.5, Log2(Ratio) = -1
Preprocessing: ratio
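The symmetry gained by the log transformation can be checked with a short example. The intensities below are hypothetical, and `log2_ratio` is an illustrative helper rather than part of any particular package.

```python
import math

def log2_ratio(red, green):
    """Return log2(R/G) for one spot; both intensities must be positive."""
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive to take a log ratio")
    return math.log2(red / green)

# Hypothetical background-corrected intensities (R = test, G = reference).
spots = {
    "2-fold up": (2000.0, 1000.0),
    "2-fold down": (1000.0, 2000.0),
    "unchanged": (1500.0, 1500.0),
}

for name, (red, green) in spots.items():
    ratio = red / green
    print(f"{name:12s} ratio = {ratio:4.2f}  log2(ratio) = {log2_ratio(red, green):+.2f}")
```

On the ratio scale the two-fold changes sit at 2 and 0.5, at unequal distances from 1; on the log2 scale they sit at +1 and -1, symmetric around 0.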
![Page 74: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/74.jpg)
![Page 75: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/75.jpg)
Overview further analysis
Raw data → Preprocessing → Preprocessed data
Preprocessed data → Test statistic → Differentially expressed genes
Preprocessed data → Clustering → Clusters of coexpressed genes
![Page 76: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/76.jpg)
![Page 77: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/77.jpg)
I. MAIN EFFECTS + EFFECT OF INTEREST
• Overall mean ($\mu$)
• Gene effect ($G_i$): constitutive level of the gene
• Condition effect ($C_j$): mRNA isolation efficiency
• Array effect ($A_n$): hybridisation efficiency
• Dye effect ($D_m$): labeling efficiency
• GC effect ($(GC)_{ij}$): differential expression due to the altered variety
$$y_{ijnmk} = \mu + G_i + C_j + A_n + D_m + (GC)_{ij} + \varepsilon_{ijnmk}$$
Model the expression level of each gene as a combination of the different factors
Least squares fit:
• subject to restrictions
• contrast of interest: estimate $(GC)_{i1} - (GC)_{i2}$
Multifactor, linear, fixed levels
Preprocessing: ANOVA
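For a balanced design, the least-squares estimates under sum-to-zero restrictions reduce to differences of marginal means, which makes the model easy to illustrate. The sketch below assumes a hypothetical two-array dye-swap experiment with three genes and noise-free simulated log intensities; all names and effect sizes are made up for the illustration.

```python
from itertools import product
from statistics import fmean

genes, conditions, arrays = ["g1", "g2", "g3"], [1, 2], [1, 2]

# Hypothetical "true" effects, used only to simulate noise-free log intensities.
mu = 10.0
G = {"g1": 1.0, "g2": 0.0, "g3": -1.0}
C = {1: 0.2, 2: -0.2}
A = {1: 0.3, 2: -0.3}
D = {"R": 0.1, "G": -0.1}
GC = {("g1", 1): 0.5, ("g1", 2): -0.5, ("g2", 1): 0.0, ("g2", 2): 0.0,
      ("g3", 1): -0.5, ("g3", 2): 0.5}

def dye_of(condition, array):
    """Dye swap: condition 1 is red on array 1 and green on array 2."""
    return "R" if condition == array else "G"

# One observation per gene, condition and array (balanced design).
obs = []
for g, c, a in product(genes, conditions, arrays):
    d = dye_of(c, a)
    y = mu + G[g] + C[c] + A[a] + D[d] + GC[(g, c)]
    obs.append({"gene": g, "cond": c, "array": a, "dye": d, "y": y})

def cell_mean(**levels):
    """Mean of y over all observations matching the given factor levels."""
    return fmean([o["y"] for o in obs
                  if all(o[k] == v for k, v in levels.items())])

grand = cell_mean()
gene_eff = {g: cell_mean(gene=g) - grand for g in genes}
cond_eff = {c: cell_mean(cond=c) - grand for c in conditions}
array_eff = {a: cell_mean(array=a) - grand for a in arrays}
dye_eff = {d: cell_mean(dye=d) - grand for d in ("R", "G")}
gc_eff = {(g, c): cell_mean(gene=g, cond=c) - grand - gene_eff[g] - cond_eff[c]
          for g, c in product(genes, conditions)}

# Contrast of interest per gene: (GC)_i1 - (GC)_i2.
contrast = {g: gc_eff[(g, 1)] - gc_eff[(g, 2)] for g in genes}
for g in genes:
    print(f"{g}: (GC)_i1 - (GC)_i2 = {contrast[g]:+.2f}")
```

With unbalanced data the marginal-mean shortcut no longer holds and a general least-squares solver would be needed instead.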
![Page 78: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/78.jpg)
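Assumption:
Independent, additive error $\varepsilon_{ijnmk} \sim F$, where $F$ is a distribution with mean 0 and variance $\sigma^2$
$$y_{ijnmk} = \mu + G_i + C_j + A_n + D_m + (GC)_{ij} + \varepsilon_{ijnmk}$$
Plot the residuals ($y_{\text{estimated}} - y_{\text{measured}}$) against the estimated intensity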
Preprocessing: ANOVA
![Page 79: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/79.jpg)
I. MAIN EFFECTS + EFFECT OF INTEREST
$$y_{ijnmk} = \mu + G_i + C_j + A_n + D_m + (GC)_{ij} + \varepsilon_{ijnmk}$$
Analysis of variance shows the relative contribution of each of these effects
Preprocessing: ANOVA
![Page 80: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/80.jpg)
Advantages:
• Gains more information with fewer observations => derives variation from all measurements made (fewer replicates required, e.g. array effect based on N-1 gene measurements)
• Statistical testing: estimated error can be used for bootstrapping to estimate confidence levels
• No ratios required
Requirements:
• Requires knowledge about experimental effects
• Model used implies that all effects and combinations of effects should be linear
• Bootstrapping: residuals should be normally distributed around zero with constant variance
Preprocessing: ANOVA
![Page 81: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/81.jpg)
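ANOVA Bootstrap analysis
$$y_{ijnmk} = \mu + G_i + C_j + A_n + D_m + (GC)_{ij} + \varepsilon_{ijnmk}$$
1. Estimate the error (residuals)
2. Simulate new datasets based on the estimated error (3000 times): $y^{boot}_{ijnmk} = \hat{y}_{ijnm} + \varepsilon^{*}_{ijnmk}$
3. Calculate the factor of interest (GC effect) for each bootstrapped dataset (recalculate ANOVA)
4. Calculate the CI on (GC1 - GC2) for each of the N genes based on the 3000 bootstraps
5. Use this interval to test for significant genes (position of the interval on GC1 - GC2 relative to 0)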
Preprocessing: ANOVA
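The bootstrap procedure on this slide can be sketched as follows. This is a minimal illustration under stated assumptions: a hypothetical balanced design with four genes and four dye-swapped arrays, simulated log intensities, a marginal-means fit that is only valid for balanced data, and a simple percentile interval. A gene is called significant here when its interval excludes 0, which is one possible reading of the slide.

```python
import random
from itertools import product
from statistics import fmean

GENES, CONDS, ARRAYS = ["g1", "g2", "g3", "g4"], [1, 2], [1, 2, 3, 4]
N_BOOT = 3000

def dye_of(cond, array):
    """Hypothetical dye-swap layout: condition 1 is red on odd arrays."""
    return "R" if (cond == 1) == (array % 2 == 1) else "G"

DESIGN = [(g, c, a, dye_of(c, a)) for g, c, a in product(GENES, CONDS, ARRAYS)]

def effects(y, key):
    """Mean of y per level of key(row), centred on the grand mean."""
    groups = {}
    for row, value in zip(DESIGN, y):
        groups.setdefault(key(row), []).append(value)
    grand = fmean(y)
    return {level: fmean(vals) - grand for level, vals in groups.items()}

def fit(y):
    """Balanced-design least squares: fitted values and GC contrasts."""
    G = effects(y, lambda r: r[0])
    C = effects(y, lambda r: r[1])
    A = effects(y, lambda r: r[2])
    D = effects(y, lambda r: r[3])
    cell = effects(y, lambda r: (r[0], r[1]))
    GC = {(g, c): v - G[g] - C[c] for (g, c), v in cell.items()}
    grand = fmean(y)
    fitted = [grand + G[g] + C[c] + A[a] + D[d] + GC[(g, c)]
              for g, c, a, d in DESIGN]
    return fitted, {g: GC[(g, 1)] - GC[(g, 2)] for g in GENES}

# Simulated log intensities: g1 is up and g2 is down in condition 1.
rng = random.Random(1)
true_gc = {"g1": 1.0, "g2": -1.0, "g3": 0.0, "g4": 0.0}
y = [10.0 + (true_gc[g] if c == 1 else -true_gc[g]) + rng.gauss(0.0, 0.2)
     for g, c, a, d in DESIGN]

fitted, observed = fit(y)
residuals = [yi - fi for yi, fi in zip(y, fitted)]          # estimated error

boot = {g: [] for g in GENES}
for _ in range(N_BOOT):
    y_boot = [fi + rng.choice(residuals) for fi in fitted]  # simulated dataset
    _, contrast = fit(y_boot)                               # recalculate ANOVA
    for g in GENES:
        boot[g].append(contrast[g])

def percentile_interval(values, level=0.99):
    """Approximate two-sided percentile interval from bootstrap replicates."""
    s = sorted(values)
    tail = int(round((1.0 - level) / 2.0 * len(s)))
    return s[tail], s[len(s) - 1 - tail]

for g in GENES:
    lo, hi = percentile_interval(boot[g])
    verdict = "significant" if lo > 0.0 or hi < 0.0 else "not significant"
    print(f"{g}: GC1-GC2 = {observed[g]:+.2f}, "
          f"99% CI [{lo:+.2f}, {hi:+.2f}] -> {verdict}")
```

Resampling the residuals with replacement means no particular error distribution has to be specified; only the fitted model and the observed residuals are used.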
![Page 82: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/82.jpg)
![Page 83: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/83.jpg)
DATA
• Filtered for zero values
• set 1: unnormalised data
MODELS (Kerr et al. 2000, 2001)
• Model 1 (no spot effects)
• Model 2 (spot effects independent)
• Model 3 (spot effects dependent)
MODELS
• GC effects not confounded with the spot effects
• type of model does influence the residual error => does influence the bootstrap interval
More Arrays Simultaneously
Preprocessing
![Page 84: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/84.jpg)
![Page 85: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/85.jpg)
![Page 86: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/86.jpg)
Least squares fit:
• subject to restrictions
• contrast of interest: estimate $(VG)_{k_1 g} - (VG)_{k_2 g}$ (variety × gene notation of Kerr et al.; corresponds to the GC contrast)
• Usual confidence intervals based on normal theory not appropriate
Bootstrap analysis of residuals avoids making distributional assumptions about the error
Assumption:
Independent, additive error $\varepsilon_{ijnmk} \sim F$, where $F$ is a distribution with mean 0 and variance $\sigma^2$
$$y_{ijnmk} = \mu + G_i + C_j + A_n + D_m + (GC)_{ij} + \varepsilon_{ijnmk}$$
More Arrays Simultaneously
Preprocessing
![Page 87: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/87.jpg)
More Arrays Simultaneously
Preprocessing
![Page 88: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/88.jpg)
[Figure: four panels, TEST ARRAY 1, REFERENCE ARRAY 1, REFERENCE ARRAY 2 and TEST ARRAY 2, each with ŷ on the axis]
More Arrays Simultaneously
Preprocessing
![Page 89: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/89.jpg)
More Arrays Simultaneously
Additive error and non-linear effects undermine the application of ANOVA
Preprocessing
![Page 90: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/90.jpg)
![Page 91: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/91.jpg)
Lowess
$$y_{ijnmk} = \mu + G_i + C_j + A_n + D_m + (AG)_{ni} + (RG)_{ki} + (GC)_{ij} + \varepsilon_{ijnmk}$$
99 % confidence interval based on 100 genes, 3000 bootstraps
retained 370 genes (62 with T-test p value < 0.01)
Bootstrap analysis

| ID | Rat_1 | Rat_2 | Rat_3 | Rat_4 | p | D_GC_effects |
|---|---|---|---|---|---|---|
| 285 | -3.31674 | -3.20904 | -2.08115 | -1.62183 | 0.008818 | -2.577397 |
| 1076 | -1.39327 | -2.04573 | -1.85822 | -2.42609 | 0.002899 | -2.175438 |
| 3755 | -0.81029 | -1.50631 | -0.99613 | -1.40283 | 0.005643 | -1.245061 |
Preprocessing
![Page 92: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/92.jpg)
Methods tested on the pygmy dataset (3750 genes)
1. ANOVA 99 % CI
2. ANOVA 95 % CI
3. SAM
4. T-test
5. Fold test
Retained 360 genes
Construct for each gene a binary profile 1 1 1 1 1
Hierarchically cluster genes based on this profile
Comparison of methods
Only 8 genes retained by all methods
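Building the binary profiles is straightforward once each method has produced its list of retained genes. The sketch below uses made-up gene names and gene lists, not the results from the slides. It groups genes with identical profiles, which corresponds only to the zero-distance level of a hierarchical clustering rather than the full clustering.

```python
from collections import Counter

METHODS = ["ANOVA 99% CI", "ANOVA 95% CI", "SAM", "T-test", "Fold test"]

# Hypothetical gene lists retained by each method (not the slide results).
retained = {
    "ANOVA 99% CI": {"geneA", "geneB"},
    "ANOVA 95% CI": {"geneA", "geneB", "geneC"},
    "SAM": {"geneA", "geneC", "geneD"},
    "T-test": {"geneA", "geneB", "geneC"},
    "Fold test": {"geneA", "geneD", "geneE"},
}

all_genes = sorted(set().union(*retained.values()))

# One binary profile per gene: 1 if the method retained it, 0 otherwise.
profiles = {
    gene: tuple(int(gene in retained[m]) for m in METHODS)
    for gene in all_genes
}

# Genes with profile 1 1 1 1 1 are retained by every method.
by_all_methods = [gene for gene, p in profiles.items() if all(p)]

# Count how many genes share each profile (identical profiles group together).
groups = Counter(profiles.values())

for gene, profile in profiles.items():
    print(gene, " ".join(map(str, profile)))
print("retained by all methods:", by_all_methods)
```

A full hierarchical clustering would additionally merge profiles that differ in only a few methods, for example by using the Hamming distance between profiles.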
![Page 93: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/93.jpg)
Comparison of methods
![Page 94: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/94.jpg)
Comparison of methods
![Page 95: Bioinformatics Expression profiling and functional genomics Part I: Preprocessing Ad 29/10/2006](https://reader035.vdocuments.site/reader035/viewer/2022081513/5a4d1acf7f8b9ab059970c9a/html5/thumbnails/95.jpg)
• Latin Square (mouse data set)
• Reference: normal mouse
• Condition: pygmy mouse
• Two experiments, C=1 and C=2, reflect two sample time points
• 2 batches (B1, B2): not all genes of the genome on one array

| Array | Condition | Batch | Test | Ref |
|---|---|---|---|---|
| A 1 | C 1 | B1 | R | G |
| A 2 | C 1 | B1 | G | R |
| A 3 | C 1 | B2 | R | G |
| A 4 | C 1 | B2 | R | G |
| A 5 | C 2 | B1 | R | G |
| A 6 | C 2 | B1 | G | R |
| A 7 | C 2 | B2 | R | G |
| A 8 | C 2 | B2 | G | R |
Transcript profiling Experiment Design