bioinformatics: gene expression basics ollie rando, lrb 903
Post on 21-Dec-2015
![Page 1: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/1.jpg)
Bioinformatics: gene expression basics
Ollie Rando, LRB 903
![Page 2: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/2.jpg)
Biological verification and interpretation
Microarray experiment
Experimental design
Image analysis
Normalization
Biological question (hypothesis-driven or explorative)
Analysis: Testing, Estimation, Discrimination, Clustering
Experimental Cycle
Quality Measurement
Failed
Pass
Pre-processing
To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
Ronald Fisher
![Page 3: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/3.jpg)
Lecture 1.1 3
DNA Microarray
![Page 4: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/4.jpg)
From experiment to data
![Page 5: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/5.jpg)
Microarrays & Spot Colour
![Page 6: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/6.jpg)
Microarray Analysis Examples
Brain 67,679
Heart 9,400
Liver 37,807
Colon 4,832
Prostate 7,971
Skin 3,043
Bone 4,832
Lung 20,224
Panels: Brain, Lung, Liver, Liver Tumor
![Page 7: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/7.jpg)
Raw data are not mRNA concentrations
• tissue contamination
• RNA degradation
• amplification efficiency
• reverse transcription efficiency
• hybridization efficiency and specificity
• clone identification and mapping
• PCR yield, contamination
• spotting efficiency
• DNA support binding
• other array manufacturing related issues
• image segmentation
• signal quantification
• “background” correction
![Page 8: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/8.jpg)
Scatterplot panels: Data; Data (log scale)
Message: look at your data on log-scale!
![Page 9: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/9.jpg)
MA Plot
A = (1/2) log2(R × G)
M = log2(R / G)
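A minimal sketch of the M and A values defined on this slide, assuming `red` and `green` are paired lists of strictly positive spot intensities. The function name and the sample intensities are illustrative and not taken from the slides.

```python
import math

def ma_values(red, green):
    """Return (M, A) lists for paired red/green spot intensities.

    M = log2(R / G) is the log ratio of the two channels.
    A = 1/2 * log2(R * G) is the mean log intensity of the spot.
    Assumes every intensity is strictly positive.
    """
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a

# Hypothetical intensities for three spots.
red = [400.0, 100.0, 64.0]
green = [100.0, 100.0, 256.0]
m, a = ma_values(red, green)
```

A four-fold excess of red gives M = 2, equal channels give M = 0, and a four-fold excess of green gives M = -2, so up- and down-regulation are symmetric around zero on this scale.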
![Page 10: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/10.jpg)
Median centering
Axis label: Log signal, centered at 0
One of the simplest strategies is to bring all "centers" of the array data to the same level.
Assumption: the majority of genes are unchanged between conditions.
The median is more robust to outliers than the mean.
Divide all expression measurements of each array by that array's median. On the log scale this corresponds to subtracting the array's median log signal, so that each array is centered at 0.
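As a rough sketch of median centering, assuming each array is a list of strictly positive intensities: log-transform, then subtract the array's own median. The array names and values are made up for illustration.

```python
import math
from statistics import median

def median_center(intensities):
    """Log2-transform one array's intensities and subtract the array median,
    so the centered log signals have median 0."""
    logs = [math.log2(x) for x in intensities]
    center = median(logs)
    return [v - center for v in logs]

# Two hypothetical arrays; the second is simply 10x brighter overall.
arrays = {
    "array_1": [8.0, 16.0, 32.0],
    "array_2": [80.0, 160.0, 320.0],
}
centered = {name: median_center(values) for name, values in arrays.items()}
```

After centering, the two arrays line up even though the second was scanned ten times brighter, which is the kind of global, non-biological difference this step is meant to remove.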
![Page 11: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/11.jpg)
Problem of median-centering
Scatterplot of log-signals after median-centering (axes: Log Green, Log Red)
M-A plot of the same data (axes: A = (Log Green + Log Red) / 2, M = Log Red - Log Green)
Median-centering is a global method. It does not adjust for local effects, intensity-dependent effects, print-tip effects, etc.
![Page 12: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/12.jpg)
Lowess normalization
M-A plot (axes: A = (Log Green + Log Red) / 2, M = Log Red - Log Green)
Local estimate
Use the estimate to bend the banana straight
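The idea of "estimate the local trend, then subtract it" can be sketched as follows. This is a simplified stand-in for lowess, not a full implementation: it does a single pass of tricube-weighted local linear fits with no robustness iterations. The function names and the `frac` parameter (fraction of points used in each local fit) are assumptions for illustration.

```python
def local_linear_trend(a, m, frac=0.5):
    """Estimate the trend of M as a function of A with local weighted lines.

    For each point, fit a weighted straight line through its nearest
    neighbours along the A axis (tricube weights) and evaluate it there.
    """
    n = len(a)
    k = max(2, min(n, round(frac * n)))  # neighbours per local fit
    fitted = []
    for a0 in a:
        nearest = sorted(range(n), key=lambda i: abs(a[i] - a0))[:k]
        # Slightly inflated bandwidth keeps every neighbour's weight positive.
        h = 1.001 * max(abs(a[i] - a0) for i in nearest) or 1.0
        weights = [(1 - (abs(a[i] - a0) / h) ** 3) ** 3 for i in nearest]
        sw = sum(weights)
        mean_a = sum(w * a[i] for w, i in zip(weights, nearest)) / sw
        mean_m = sum(w * m[i] for w, i in zip(weights, nearest)) / sw
        sxx = sum(w * (a[i] - mean_a) ** 2 for w, i in zip(weights, nearest))
        sxy = sum(w * (a[i] - mean_a) * (m[i] - mean_m)
                  for w, i in zip(weights, nearest))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(mean_m + slope * (a0 - mean_a))
    return fitted

def normalize(a, m, frac=0.5):
    """Subtract the local trend from M, i.e. bend the banana straight."""
    trend = local_linear_trend(a, m, frac)
    return [mi - ti for mi, ti in zip(m, trend)]

# Hypothetical curved M-A data: M drifts with intensity A.
a_demo = [float(i) for i in range(1, 11)]
m_demo = [0.05 * (x - 5.0) ** 2 - 0.5 for x in a_demo]
m_normalized = normalize(a_demo, m_demo)
```

In practice a library routine with robustness iterations (for example the lowess smoother in statsmodels) would usually be preferred; the sketch only shows where the intensity-dependent correction comes from.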
![Page 13: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/13.jpg)
Summary I
• Raw data are not mRNA concentrations
• We need to check data quality on different levels
– Probe level
– Array level (all probes on one array)
– Gene level (one gene on many arrays)
• Always log your data
• Normalize your data to avoid systematic (non-biological) effects
• Lowess normalization straightens banana
![Page 14: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/14.jpg)
OK, so I’ve got a gene list with expression changes: now what?
YPL171C 7.743877387
YBR008C 6.390877387
YFL056C 5.740877387
YKL086W 5.408877387
YOL150C 4.831877387
YOL151W 4.760877387
YFL057C 4.725877387
YKL071W 4.172877387
YLR327C 4.167877387
YLL060C 4.130877387
YLR460C 4.063877387
YML131W 4.047877387
YDL243C 4.031877387
YKR076W 3.942877387
YOR374W 3.937877387
“Huh. Turns out the standard names for the most upregulated genes all start with ‘HSP’, or ‘GAL’ … I wonder if that’s real …”
![Page 15: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/15.jpg)
Gene Ontology
• Organization of curated biological knowledge
– 3 branches: biological process, molecular function, cellular component
![Page 16: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/16.jpg)
Hypergeometric Distribution
• Probability of observing x or more genes in a cluster of n genes with a common annotation
– N = total number of genes in genome
– M = number of genes with annotation
– n = number of genes in cluster
– x = number of genes in cluster with annotation
• Multiple hypothesis correction required if testing multiple functions (Bonferroni, FDR, etc.)
• Additional genes in clusters with strong enrichment may be related
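With the symbols defined on this slide, the enrichment p-value is the upper tail of the hypergeometric distribution: P(X ≥ x) = Σ_{i=x}^{min(n,M)} C(M,i) · C(N−M, n−i) / C(N,n). A minimal sketch in Python follows; the function names are illustrative, and the Bonferroni helper is the simplest of the corrections the slide mentions.

```python
from math import comb

def hypergeom_tail(N, M, n, x):
    """P(X >= x): probability of seeing x or more annotated genes in a
    cluster of n genes, when M of the N genes in the genome carry the
    annotation and the cluster is drawn at random."""
    upper = min(n, M)
    total = comb(N, n)
    hits = sum(comb(M, i) * comb(N - M, n - i) for i in range(x, upper + 1))
    return hits / total

def bonferroni(p_values):
    """Bonferroni correction: multiply each p-value by the number of
    tests performed, capped at 1."""
    k = len(p_values)
    return [min(1.0, p * k) for p in p_values]

# Toy numbers: 10 genes in total, 4 annotated, cluster of 3 with 2 annotated.
p = hypergeom_tail(N=10, M=4, n=3, x=2)
```

For the toy numbers the tail is (6·6 + 4·1) / 120 = 1/3, which is easy to check by hand before trusting the function on genome-sized inputs.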
![Page 17: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/17.jpg)
Kolmogorov-Smirnov test
• Hypergeometric test requires “hard calls” – this list of 278 genes is my upregulated set
• But say all 250 genes involved in oxygen consumption go up ~10-20% each – this would not likely show up
• KS test asks whether *distribution* for a given geneset (GO category, etc.) deviates from your dataset’s background, and is nonparametric
• Cumulative Distribution Function (CDF) plot:
• Gene Set Enrichment Analysis:
• http://www.broadinstitute.org/gsea/
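The quantity behind the CDF plot is the two-sample KS statistic: the largest vertical gap between the empirical CDF of the gene set and that of the background. A minimal sketch follows, computing only the statistic and not a p-value; the function name and the sample values are illustrative.

```python
from bisect import bisect_right

def ks_statistic(gene_set, background):
    """Two-sample Kolmogorov-Smirnov statistic: maximum absolute
    difference between the two empirical CDFs."""
    s, b = sorted(gene_set), sorted(background)
    d = 0.0
    for v in set(s) | set(b):
        f_set = bisect_right(s, v) / len(s)   # fraction of gene set <= v
        f_bg = bisect_right(b, v) / len(b)    # fraction of background <= v
        d = max(d, abs(f_set - f_bg))
    return d

# Hypothetical log ratios: the gene set is shifted up slightly as a whole.
background = [-0.4, -0.2, -0.1, 0.0, 0.1, 0.2, 0.4]
gene_set = [0.15, 0.2, 0.25, 0.3]
d = ks_statistic(gene_set, background)
```

Turning the statistic into a p-value needs the KS distribution; a library routine such as `scipy.stats.ks_2samp` handles that step.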
![Page 18: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/18.jpg)
GO term Enrichment Tools
• SGD’s & Princeton’s GoTermFinder
– http://go.princeton.edu
• GOLEM (http://function.princeton.edu/GOLEM)
• HIDRA
Sealfon et al., 2006
![Page 19: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/19.jpg)
Supervised analysis
= learning from examples, classification
– We have already seen groups of healthy and sick people. Now let’s diagnose the next person walking into the hospital.
– We know that these genes have function X (and these others don’t). Let’s find more genes with function X.
– We know many gene-pairs that are functionally related (and many more that are not). Let’s extend the number of known related gene pairs.
Known structure in the data needs to be generalized to new data.
![Page 20: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/20.jpg)
Un-supervised analysis
= clustering
– Are there groups of genes that behave similarly in all conditions?
– Disease X is very heterogeneous. Can we identify more specific sub-classes for more targeted treatment?
No structure is known. We first need to find it. Exploratory analysis.
![Page 21: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/21.jpg)
Supervised analysis
Calvin, I still don’t know the difference between cats and dogs …
Don’t worry! I’ll show you once more:
Class 1: cats Class 2: dogs
Oh, now I get it!!
![Page 22: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/22.jpg)
Un-supervised analysis
Calvin, I still don’t know the difference between cats and dogs …
I don’t know it either.
Let’s try to figure it out together …
![Page 23: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/23.jpg)
Supervised analysis: setup
• Training set
– Data: microarrays
– Labels: for each one we know if it falls into our class of interest or not (binary classification)
• New data (test data)
– Data for which we don’t have labels.
– E.g. genes without known function
• Goal: Generalization ability
– Build a classifier from the training data that is good at predicting the right class for the new data.
![Page 24: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/24.jpg)
One microarray, one dot
Axes: Expression of gene 1 (x), Expression of gene 2 (y)
Think of a space with #genes dimensions (yes, it’s hard for more than 3).
Each microarray corresponds to a point in this space.
If gene expression is similar under some conditions, the points will be close to each other.
If gene expression overall is very different, the points will be far away.
![Page 25: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/25.jpg)
Which line separates best?
Panels: A, B, C, D
![Page 26: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/26.jpg)
No sharp knife, but a …
FAT PLANE
![Page 27: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/27.jpg)
Support Vector Machines
Maximal margin separating hyperplane
Datapoints closest to separating hyperplane = support vectors
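To make the "fat plane" idea concrete, the sketch below compares a few candidate separating lines by their margin, i.e. the distance from the line to the closest data point, and picks the widest. This is not an SVM trainer (a real SVM searches over all possible hyperplanes); the points, labels and candidate lines are hypothetical.

```python
import math

def margin(w, b, points, labels):
    """Smallest signed distance from any point to the line w.x + b = 0.
    Positive only if every point lies on its own side of the line."""
    norm = math.hypot(w[0], w[1])
    return min(y * (w[0] * p[0] + w[1] * p[1] + b) / norm
               for p, y in zip(points, labels))

# Four hypothetical arrays described by two genes, in two classes.
points = [(3, 3), (4, 3), (1, 1), (0, 1)]
labels = [+1, +1, -1, -1]

# Candidate separating lines as (w, b) pairs.
candidates = {
    "diagonal": ((1, 1), -4),         # x + y = 4
    "vertical_near": ((1, 0), -2.5),  # x = 2.5
    "vertical_mid": ((1, 0), -2),     # x = 2
}

margins = {name: margin(w, b, points, labels)
           for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)

# Support vectors: the points that sit exactly on the margin of the best line.
w, b = candidates[best]
support = [p for p, y in zip(points, labels)
           if math.isclose(y * (w[0] * p[0] + w[1] * p[1] + b) / math.hypot(*w),
                           margins[best])]
```

All three lines separate the classes perfectly, so training error alone cannot choose between them; the margin is what distinguishes them.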
![Page 28: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/28.jpg)
How well did we do?
The classifier will usually perform worse than before:
Test error > training error
Same classifier (= line)
New data from same classes
Training error: how well do we do on the data we trained the classifier on?
But how well will we do in the future, on new data?
Test error: How well does the classifier generalize?
![Page 29: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/29.jpg)
Cross-validation
Train classifier and test it on the same data: training error
Train | Test: test error
K-fold cross-validation (here for K=3)
Step 1. Train | Test | Train
Step 2. Test | Train | Train
Step 3. Train | Train | Test
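The K-fold scheme can be sketched as an index splitter: the samples are divided into K folds and each fold takes one turn as the test set. The function name is illustrative, and the sketch assumes the sample order is unrelated to the class labels (otherwise shuffle first).

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k-fold cross-validation.
    Every sample appears in exactly one test set."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

# Seven hypothetical arrays, K = 3 as on the slide.
splits = list(k_fold_indices(7, 3))
```

A classifier would be trained on each `train` list and scored on the matching `test` list; averaging the K test errors gives the cross-validated estimate of how well the classifier generalizes.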
![Page 30: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/30.jpg)
Additional supervised approaches might depend on your goal: cell cycle analysis
![Page 31: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/31.jpg)
Clustering
• Let the data organize itself
• Reordering of genes (or conditions) in the dataset so that similar patterns are next to each other (or in separate groups)
• Identify subsets of genes (or experiments) that are related by some measure
![Page 32: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/32.jpg)
Quick Example
Heatmap axes: Genes (rows), Conditions (columns)
![Page 33: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/33.jpg)
Why cluster?
• “Guilt by association” – if unknown gene X is similar in expression to known genes A and B, maybe they are involved in the same/related pathway
• Visualization: datasets are too large to be able to get information out without reorganizing the data
![Page 34: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/34.jpg)
Clustering Techniques
• Algorithm (Method)
– Hierarchical
– K-means
– Self Organizing Maps
– QT-Clustering
– NNN
– ...
• Distance Metric
– Euclidean (L2)
– Pearson Correlation
– Spearman Correlation
– Manhattan (L1)
– Kendall’s τ
– ...
![Page 35: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/35.jpg)
Distance Metrics
• Choice of distance measure is important for most clustering techniques
• Pair-wise metrics – compare vectors of numbers
– e.g. genes x & y, each with n measurements
Euclidean Distance
Pearson Correlation
Spearman Correlation
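The formulas on this slide did not survive extraction, so here is a minimal sketch of the three measures in their standard textbook form, for two genes x and y with n measurements each. The function names are illustrative.

```python
import math

def euclidean(x, y):
    """Euclidean (L2) distance between two expression vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations.
    Assumes neither vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            r[order[t]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

The two correlations are similarities rather than distances; a common convention is to cluster on 1 - r. Pearson ignores overall magnitude and scale, so two genes with the same shape but different amplitude look identical, while Euclidean distance keeps them apart.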
![Page 36: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/36.jpg)
Distance Metrics
Euclidean Distance
Pearson Correlation
Spearman Correlation
![Page 37: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/37.jpg)
Hierarchical clustering
• Imposes (pair-wise) hierarchical structure on all of the data
• Often good for visualization
• Basic Method (agglomerative):
1. Calculate all pair-wise distances
2. Join the closest pair
3. Calculate pair’s distance to all others
4. Repeat from 2 until all joined
![Page 38: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/38.jpg)
Hierarchical clustering
![Page 39: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/39.jpg)
Hierarchical clustering
![Page 40: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/40.jpg)
Hierarchical clustering
![Page 41: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/41.jpg)
Hierarchical clustering
![Page 42: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/42.jpg)
Hierarchical clustering
![Page 43: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/43.jpg)
Hierarchical clustering
![Page 44: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/44.jpg)
HC – Interior Distances
• Three typical variants to calculate interior distances within the tree
– Average linkage: mean/median over all possible pair-wise values
– Single linkage: minimum pair-wise distance
– Complete linkage: maximum pair-wise distance
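The agglomerative procedure and the three linkage rules can be sketched together. This is a deliberately naive version that recomputes cluster distances from scratch at every step, which is fine for a handful of genes but far too slow for a real dataset. Euclidean distance, the function names and the toy points are assumptions for illustration.

```python
import math
from itertools import combinations

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def cluster_distance(c1, c2, points, linkage):
    """Distance between two clusters (tuples of point indices)."""
    d = [euclidean(points[i], points[j]) for i in c1 for j in c2]
    if linkage == "single":
        return min(d)           # closest pair
    if linkage == "complete":
        return max(d)           # farthest pair
    if linkage == "average":
        return sum(d) / len(d)  # mean over all pairs
    raise ValueError(f"unknown linkage: {linkage}")

def agglomerate(points, linkage="average"):
    """Repeatedly join the two closest clusters until one remains.
    Returns the list of merges as (cluster_a, cluster_b, distance)."""
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(pair[0], pair[1],
                                                     points, linkage))
        merges.append((a, b, cluster_distance(a, b, points, linkage)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Five hypothetical one-dimensional "expression profiles".
points = [(0.0,), (1.0,), (5.0,), (6.0,), (20.0,)]
merges = agglomerate(points, linkage="average")
```

The order of merges is the tree; the merge distances are the branch heights. Switching `linkage` changes the heights at which the two tight pairs, and then the outlier, are joined.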
![Page 45: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/45.jpg)
Hierarchical clustering: problems
• Hard to define distinct clusters
• Genes assigned to clusters on the basis of all experiments
• Optimizing node ordering hard (finding the optimal solution is NP-hard)
• Can be driven by one strong cluster – a problem for gene expression b/c data in row space is often highly correlated
![Page 46: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/46.jpg)
Cluster analysis of combined yeast data sets
Eisen M B et al. PNAS 1998;95:14863-14868
©1998 by The National Academy of Sciences
![Page 47: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/47.jpg)
To demonstrate the biological origins of patterns seen in Figs. 1 and 2, data from Fig. 1 were clustered by using methods described here before and after random permutation within rows (random 1), within columns (random 2), and both (random 3).
Eisen M B et al. PNAS 1998;95:14863-14868
©1998 by The National Academy of Sciences
![Page 48: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/48.jpg)
Hierarchical Clustering: Another Example
• Expression of tumors hierarchically clustered
• Expression groups by clinical class
Garber et al.
![Page 49: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/49.jpg)
K-means Clustering
• Groups genes into a pre-defined number of independent clusters
• Basic algorithm:
1. Define k = number of clusters
2. Randomly initialize each cluster with a seed (often with a random gene)
3. Assign each gene to the cluster with the most similar seed
4. Recalculate all cluster seeds as means (or medians) of genes assigned to the cluster
5. Repeat 3 & 4 until convergence (e.g. no genes move, means don’t change much, etc.)
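The five steps can be sketched directly. Euclidean distance and cluster means are assumed here (the slide also allows medians and other similarity measures), and the optional `seeds` argument is an addition for reproducibility, not part of the slide's algorithm.

```python
import math
import random

def kmeans(points, k, seeds=None, max_iter=100, rng=None):
    """Basic k-means on a list of equal-length tuples.
    Returns (assignment, centers)."""
    rng = rng or random.Random()
    # Step 2: seed each cluster, by default with a randomly chosen gene.
    centers = list(seeds) if seeds is not None else rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Step 3: assign each gene to the cluster with the closest seed.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Step 5: stop once no gene changes cluster.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 4: recompute each seed as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous seed
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return assignment, centers

# Six hypothetical genes measured in two conditions, forming two groups.
genes = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
# Deliberately poor seeds: both taken from the same group.
assignment, centers = kmeans(genes, k=2, seeds=[(0, 0), (0, 1)])
```

Even with both seeds starting in the same group, the assignments settle on the two obvious clusters after a couple of rounds. That is not guaranteed in general, which is why k-means is usually run several times from different random seeds.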
![Page 50: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/50.jpg)
K-means example
![Page 51: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/51.jpg)
K-means example
![Page 52: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/52.jpg)
K-means example
![Page 53: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/53.jpg)
K-means example
![Page 54: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/54.jpg)
K-means: problems
• Have to set k ahead of time
– Ways to choose “optimal” k: minimize within-cluster variation compared to random data or held out data
• Each gene only belongs to exactly 1 cluster
• One cluster has no influence on the others (one dimensional clustering)
• Genes assigned to clusters on the basis of all experiments
![Page 55: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/55.jpg)
Clustering “Tweaks”
• Fuzzy clustering – allows genes to be “partially” in different clusters
• Dependent clusters – consider between-cluster distances as well as within-cluster
• Bi-clustering – look for patterns across subsets of conditions
– Very hard problem (NP-complete)
– Practical solutions use heuristics/simplifications that may affect biological interpretation
![Page 56: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/56.jpg)
Cluster Evaluation
• Mathematical consistency
– Compare coherency of clusters to background
• Look for functional consistency in clusters
– Requires a gold standard, often based on GO, MIPS, etc.
• Evaluate likelihood of enrichment in clusters
– Hypergeometric distribution, etc.
– Several tools available
![Page 57: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/57.jpg)
More Unsupervised Methods
• Search-based approaches
– Starting with a query gene/condition, find most related group
• Singular Value Decomposition (SVD) & Principal Component Analysis (PCA)
– Decomposition of data matrix into “patterns”, “weights” and “contributions”
– Real names are “principal components”, “singular values” and “left/right eigenvectors”
– Used to remove noise, reduce dimensionality, identify common/dominant signals
![Page 58: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/58.jpg)
SVD (& PCA)
• SVD is the method, PCA is performing SVD on centered data
• Projects data into another orthonormal basis
• New basis ordered by variance explained
X = U Σ Vt
X: original data matrix; U: “eigen-conditions”; Σ: singular values; Vt: “eigen-genes”
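To show what the leading component of the decomposition looks like without a linear-algebra library, the sketch below uses power iteration, a swapped-in technique that recovers only the largest singular value and its pair of vectors. A real analysis would call a full SVD routine such as `numpy.linalg.svd`, and for PCA would center the data first. The function name and the toy matrix are illustrative.

```python
import math

def leading_singular_triplet(X, n_iter=200):
    """Approximate the largest singular value s of X (a list of rows)
    and the matching vectors u and v, so that X v = s u.

    Uses power iteration on X^T X. Assumes X is non-zero and the
    all-ones start vector is not orthogonal to the leading direction."""
    n_rows, n_cols = len(X), len(X[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(n_iter):
        xv = [sum(x * vj for x, vj in zip(row, v)) for row in X]     # X v
        w = [sum(X[i][j] * xv[i] for i in range(n_rows))
             for j in range(n_cols)]                                 # X^T (X v)
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    xv = [sum(x * vj for x, vj in zip(row, v)) for row in X]
    s = math.sqrt(sum(c * c for c in xv))
    u = [c / s for c in xv]
    return u, s, v

# Toy matrix with a single pattern: every row is a multiple of (1, 2).
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
u, s, v = leading_singular_triplet(X)
```

Because every row of the toy matrix follows the same pattern, one component explains all of it: `v` comes out proportional to (1, 2), `u` holds each row's weight on that pattern, and `s` measures how much of the data the pattern accounts for.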
![Page 59: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/59.jpg)
SVD
![Page 60: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/60.jpg)
OK, so all that’s fine. Let’s give it a shot
• Say we’ve run a gene expression array for changes in gene expression when chromatin protein X is deleted
• What GO categories show differential expression?
• What TF binding sites regulate these genes?
• I think this protein will affect genes near the ends of the chromosomes – how do I check?
• I bet TATA-containing genes are disproportionately affected, so let’s check.
• I think this protein is involved in stress response – let’s compare it to a stress response dataset
![Page 61: Bioinformatics: gene expression basics Ollie Rando, LRB 903](https://reader035.vdocuments.site/reader035/viewer/2022062714/56649d5a5503460f94a3a3a7/html5/thumbnails/61.jpg)
Where do we go for relevant datasets?
• GO: see previous
• Yeast genomic annotations: Saccharomyces Genome Database
• Potential regulatory sites – MEME: http://meme.sdsc.edu/meme4_3_0/cgi-bin/meme.cgi
• TATA box data for yeast: Basehoar … Pugh, Cell, 2004
• Stress response: Gasch et al