![Page 1: SS 2017 Analysis Lecture 7 16.06 - hu-berlin.de](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/1.jpg)
Humboldt-Universität zu Berlin
Gene Expression Analysis
Grundlagen der Bioinformatik (Fundamentals of Bioinformatics), SS 2017
Lecture 7, 16.06.2017
"DNA Repair" by Tom Ellenberger, Washington University School of Medicine in St. Louis. - Biomedical Beat, Cool Image Gallery. Licensed under Public Domain via Commons - https://commons.wikimedia.org/wiki/File:DNA_Repair.jpg#/media/File:DNA_Repair.jpg
![Page 2](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/2.jpg)
Recap: Proteins & mRNA
❖ Proteins are the cell's worker units
❖ DNA -> mRNA -> amino acids -> protein
❖ mRNA abundance ~ gene activity
❖ Linked to phenotypes, e.g. cancer
![Page 3](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/3.jpg)
Recap: Microarrays (structure)
❖ Single-stranded DNA on glass slides
❖ cDNA hybridization
❖ Laser illumination
![Page 4](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/4.jpg)
Recap: Microarrays (data analysis)
❖ Biological & technical errors/biases
❖ Discretize, visualize, and correct errors and biases
Normal distribution assumption
![Page 5](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/5.jpg)
Lecture 7
Gene Expression Analysis
Outline
❖ Differential expression
❖ Fold-change
❖ T-test
❖ Clustering
❖ Databases
Make heterogeneous data great again
![Page 6](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/6.jpg)
Differential gene expression - Etiology
Identify the causes and evolution of diseases such as cancer (etiology)
Adapt treatment
Example: understanding the development of cancer
![Page 7](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/7.jpg)
Differential gene expression - Biomarker
Find markers for the early detection of cancer
Find markers for, e.g., drug response
Example: increased angiogenesis signals
![Page 8](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/8.jpg)
Differential gene expression - Personalized medicine
❖ Sequence the patient
❖ Determine similarity to known cases
❖ Administer the best drug
❖ And avoid side effects!
![Page 9](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/9.jpg)
Basic concept of differential expression
(Figure: p-value, log-FC, multiple-testing correction)
![Page 10](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/10.jpg)
Problem definition
![Page 11](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/11.jpg)
Scatterplot vs. differential expression
![Page 12](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/12.jpg)
Fold-Change
Thresholds (common but arbitrary)
❖ |log2 FC| < 1 (less than twofold change): not interesting
❖ |log2 FC| > 2 (more than fourfold change): very interesting
Log FC
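A minimal sketch of the fold-change computation, assuming expression values on a linear (not log) scale and assuming the common cut-offs of 1 and 2 refer to the absolute log2 fold change. The helper names `log2_fold_change` and `interest` are hypothetical, chosen for illustration.

```python
import math
from statistics import mean

def log2_fold_change(case, control):
    """log2 of the ratio of mean expression (case / control); inputs on linear scale."""
    return math.log2(mean(case) / mean(control))

def interest(lfc):
    # common but arbitrary cut-offs on |log2 FC|
    if abs(lfc) > 2:
        return "very interesting"
    if abs(lfc) < 1:
        return "not interesting"
    return "borderline"

lfc = log2_fold_change([800, 820, 780], [100, 110, 90])
print(lfc, interest(lfc))  # 3.0 very interesting
```

The log scale makes up- and down-regulation symmetric: an eightfold increase gives +3 and an eightfold decrease gives -3. If the data are already log2-transformed, the log fold change is simply the difference of the group means.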
![Page 13](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/13.jpg)
Identification of differential expression
![Page 14](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/14.jpg)
Log FC differential expression
❖ Identify differentially expressed genes
❖ Fold-change is problematic
Same FC, but a different likelihood of being differentially expressed
![Page 15](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/15.jpg)
Probability measure
Measure the likelihood that truly differentially expressed genes show these distributions
Example probability measure
![Page 16](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/16.jpg)
P-value & statistical error types
P-value governed by α
α := probability of a type I error (false positive)
![Page 17](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/17.jpg)
Statistical hypothesis testing
1. Formulate the null and alternative hypotheses
2. Select a significance level α
3. Sample the population/cohort
4. Calculate the test statistic
5. Make a p-value-based decision
Requires known variance and mean
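The five steps above can be made concrete with a one-sample z-test, the textbook case in which the population variance is known. A minimal sketch, assuming a two-sided test and made-up numbers; the function names are hypothetical, and the normal CDF is built from `math.erf`.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    # 1. H0: true mean == mu0, H1: true mean != mu0 (two-sided)
    # 2. the significance level alpha is passed in
    # 3./4. test statistic from the sample, using the KNOWN sigma
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # 5. p-value-based decision
    p = 2 * (1 - normal_cdf(abs(z)))
    return z, p, p <= alpha

print(z_test(10.5, 10.0, 1.0, 16))  # z = 2.0, p ≈ 0.0455, H0 rejected
```

When sigma is not known and must be estimated from a small sample, the statistic no longer follows a normal distribution, which is what motivates the t-test.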
![Page 18](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/18.jpg)
Central Limit Theorem
Assume a normal distribution for the probabilities of the mean (empirical expected value)
Likelihood of the sum of n six-sided dice
The probability distribution of the mean of i.i.d. random variables (with finite variance) tends to a normal distribution as the number of variables grows
❖ i.i.d. = independent and identically distributed
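The dice example can be reproduced exactly by convolving the distribution of one fair six-sided die with itself n times. A small sketch (the function name is hypothetical) that returns exact probabilities as fractions; the mean 3.5n and variance 35n/12 are the parameters of the normal distribution the sum approaches. The mean of the dice is just the sum divided by n, so the same argument applies to it.

```python
from fractions import Fraction

def dice_sum_distribution(n):
    """Exact probability of each possible sum of n fair six-sided dice."""
    dist = {0: Fraction(1)}
    for _ in range(n):
        nxt = {}
        for total, p in dist.items():
            for face in range(1, 7):
                nxt[total + face] = nxt.get(total + face, 0) + p * Fraction(1, 6)
        dist = nxt
    return dist

n = 5
dist = dice_sum_distribution(n)
mu = sum(s * p for s, p in dist.items())
var = sum((s - mu) ** 2 * p for s, p in dist.items())
print(mu, var)  # 35/2 175/12
```

Plotting `dist` for n = 1, 2, 5, 10 shows the flat single-die distribution turning into a bell shape.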
![Page 19](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/19.jpg)
Student’s t-test
❖ Compare means & variances of cohorts
❖ Assumes equal variances
❖ Under the null hypothesis, the test statistic follows a t-distribution
Test on rejection of equality
t-value calculation (in general ω_0 = 0)
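A sketch of the two-sample Student's t statistic under the equal-variance assumption, using the standard pooled-variance formula. The parameter `omega0` stands for the hypothesized difference of means (the slide's omega_0, usually 0); the function name is hypothetical, and each cohort needs at least two values for the sample variance.

```python
import math
from statistics import mean, variance

def student_t(x, y, omega0=0.0):
    """Two-sample t statistic with pooled variance (equal variances assumed)."""
    n, m = len(x), len(y)
    df = n + m - 2
    # pooled variance: sample variances weighted by their degrees of freedom
    sp2 = ((n - 1) * variance(x) + (m - 1) * variance(y)) / df
    t = (mean(x) - mean(y) - omega0) / math.sqrt(sp2 * (1 / n + 1 / m))
    return t, df

t, df = student_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(t, df)
```

Here both cohorts have variance 2.5 and their means differ by 1, which gives t = -1 with 8 degrees of freedom.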
![Page 20](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/20.jpg)
Problem variance & sub-sampling
Why not assume a normal distribution for the variance?
![Page 21](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/21.jpg)
Problem variance & sub-sampling
For sub-sampled (small) data, the variance of differential expression is not normally distributed
Bummer for the normal distribution
μ = true mean, x̄ = empirical mean
![Page 22](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/22.jpg)
T-distribution: definition
❖ The sub-sample mean, standardized with the estimated (sample) variance, follows a t-distribution
❖ Thus, apply a t-test rather than a normal (z-)test
Probability density function of the t-distribution
![Page 23](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/23.jpg)
Example T-statistic
![Page 24](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/24.jpg)
T-test statistic
❖ Retrieve t-values from the test statistic (t-table)
❖ Based on |cohort 1| (n) and |cohort 2| (m)
(Figure, p-value acquisition: t-table indexed by the t-statistic and the degrees of freedom #m + #n - 2; p-value = 1 - value)
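Instead of reading a printed t-table, the p-value can be computed numerically. A dependency-free sketch: it integrates the t-distribution density with Simpson's rule (an illustrative choice; libraries such as SciPy provide this directly, e.g. `scipy.stats.t.sf`). The degrees of freedom are n + m - 2 as on the slide. The one-sided value corresponds to "p-value = 1 - value"; the two-sided value doubles the tail. All function names are hypothetical.

```python
import math

def t_pdf(t, df):
    # density of Student's t-distribution; lgamma avoids overflow for large df
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t), integrating the density from 0 to |t| with Simpson's rule."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    s = t_pdf(0, df) + t_pdf(a, df)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = s * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def p_values(t, n, m):
    df = n + m - 2
    one_sided = 1 - t_cdf(t, df)  # upper tail: "p-value = 1 - value"
    two_sided = 2 * (1 - t_cdf(abs(t), df))
    return one_sided, two_sided

print(p_values(1.0, 5, 5))  # two-sided p ≈ 0.347 for t = 1, df = 8
```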
![Page 25](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/25.jpg)
Volcano Plot
❖ Combines log-FC and p-value (here as -log10 p-value)
❖ Applies a two-parameter cut-off
❖ Identifies differentially expressed genes
Volcano plot: note higher (right) and lower (left) expression
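The two-parameter cut-off behind a volcano plot can be sketched as a simple classification. The thresholds |log2 FC| > 1 and p < 0.05, the function name and the gene names are illustrative assumptions, not values from the lecture.

```python
import math

def volcano_class(log2fc, p, fc_cut=1.0, p_cut=0.05):
    """Classify one gene by the two volcano-plot cut-offs."""
    if p < p_cut and log2fc > fc_cut:
        return "up"    # right-hand side of the plot
    if p < p_cut and log2fc < -fc_cut:
        return "down"  # left-hand side of the plot
    return "not significant"

# hypothetical genes: (log2 FC, p-value)
genes = {"geneA": (2.3, 0.001), "geneB": (-1.8, 0.02), "geneC": (2.5, 0.3), "geneD": (0.2, 0.0001)}
for name, (lfc, p) in genes.items():
    # the y-coordinate of the volcano plot is -log10(p)
    print(name, lfc, round(-math.log10(p), 2), volcano_class(lfc, p))
```

geneC has a large fold change but an unconvincing p-value, and geneD is highly significant but barely changed. Only genes passing both cut-offs end up in the upper-left or upper-right "arms" of the volcano.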
![Page 26](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/26.jpg)
Example hypothesis testing
![Page 27](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/27.jpg)
Multiple Testing Problem
Thousands of hypotheses are tested simultaneously
❖ Increased chance of false positives
❖ 10,000 genes per chip: even if no gene is differentially expressed, 10,000 × 0.01 = 100 genes are expected to have a p-value < 0.01 by chance
❖ Multiple-testing methods make it possible to assess the statistical significance of findings
Corrected p-values := q-values
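The "100 by chance" arithmetic can be checked by simulation. Under the null hypothesis, p-values are uniformly distributed on [0, 1], so about 1% of 10,000 null genes fall below 0.01. This sketch assumes that no gene is truly differentially expressed; the seed is arbitrary.

```python
import random

rng = random.Random(42)
n_genes = 10_000
# under H0 each gene's p-value is Uniform(0, 1)
null_pvalues = [rng.random() for _ in range(n_genes)]
false_positives = sum(p < 0.01 for p in null_pvalues)
print(false_positives, "expected:", n_genes * 0.01)
```

Every one of these "hits" is a false positive, which is exactly what the corrections below are meant to control.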
![Page 28](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/28.jpg)
Multiple Testing Problem
Approach 1: FWER
The family-wise error rate (FWER) is defined as the probability of at least one type I error (false positive) among the genes selected as significant
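For m tests that are all true nulls and independent, each run at level α, the FWER is 1 - (1 - α)^m. This grows quickly toward 1. The short calculation below assumes independence; for dependent tests the formula does not hold exactly. The function name is hypothetical.

```python
def fwer_independent(alpha, m):
    """Probability of at least one false positive among m independent true-null tests."""
    return 1 - (1 - alpha) ** m

for m in (1, 10, 100, 10_000):
    print(m, round(fwer_independent(0.05, m), 4))
```

Already at 10 tests the chance of at least one false positive is about 40%.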
![Page 29](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/29.jpg)
Multiple Testing Problem
Approach 2: FDR
The false discovery rate (FDR) is the expected proportion of false positives (rejected true null hypotheses) among all rejections
![Page 30](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/30.jpg)
Bonferroni correction
❖ If the adjusted p-value is smaller than the pre-chosen significance level, the probe is considered differentially expressed
❖ Very conservative (many failures to reject a false H0), rarely used
❖ Bonferroni does not require independent tests, but it is especially conservative when tests are correlated (as gene expression values usually are)
❖ Appropriate when a single false positive in a set of tests would be a problem (e.g., drug development)
Q-value = min(1, P-value × number of p-values)
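A one-line sketch of the Bonferroni adjustment. Capping at 1 keeps the result a valid probability. The function name is hypothetical.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: p * (number of tests), capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

adjusted = bonferroni([0.0001, 0.004, 0.03])
print(adjusted)                       # roughly [0.0003, 0.012, 0.09]
print([q < 0.05 for q in adjusted])  # the third probe is no longer significant
```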
![Page 31](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/31.jpg)
Benjamini-Hochberg
1. Choose α (e.g. α = 0.05)
2. Sort the m p-values from small to large: p_(1) ≤ … ≤ p_(m)
3. Correct the p-values: BH(p_(i)) = p_(i) · m / i, for i = 1, …, m
4. p_(i) is significant if BH(p_(i)) ≤ α
(Figure: α; the area under the curve holds 5% of the p-values)
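A sketch of the Benjamini-Hochberg adjustment. Beyond the p_i · m / i formula, it also enforces monotonicity, so an adjusted value never exceeds that of a larger p-value. R's `p.adjust(method = "BH")` reports adjusted p-values in the same way, by taking a running minimum from the largest p-value downward and capping at 1. The function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """BH-adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda k: pvalues[k])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        k = order[rank - 1]
        running_min = min(running_min, pvalues[k] * m / rank)
        adjusted[k] = running_min
    return adjusted

adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(adj)                     # approximately [0.02, 0.04, 0.04, 0.02]
print([q <= 0.05 for q in adj])
```

With α = 0.05 all four genes pass here. Bonferroni on the same input (multiplying by 4) would keep only the two smallest p-values, which illustrates why FDR control is less conservative.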
![Page 32](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/32.jpg)
Receiver Operating Characteristic curve
❖ Determine optimal cut-offs, e.g. for q-values
❖ Trained on a gold standard
❖ Estimation of (future) sensitivity and specificity
ROC curve
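A minimal sketch of how ROC points are obtained from scores and gold-standard labels. Scores might be, e.g., -log10 q-values, where higher means more confident. Labels are 1 for truly differentially expressed genes and 0 otherwise. The sketch assumes no tied scores and at least one positive and one negative label. The area under the curve (AUC) is computed with the trapezoid rule. Function names and data are hypothetical.

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs obtained by lowering the threshold past each score.

    Assumes no tied scores and at least one positive and one negative label.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]  # threshold above every score: nothing called positive
    for score, label in sorted(zip(scores, labels), reverse=True):
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))  # (1 - specificity, sensitivity)
    return points

def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

pts = roc_points([0.9, 0.8, 0.7, 0.3, 0.2], [1, 1, 0, 1, 0])
print(pts, auc(pts))
```

Each point corresponds to one possible cut-off. The "optimal" cut-off is then chosen on this curve, for example the point with the best trade-off between sensitivity and specificity.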
![Page 33](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/33.jpg)
Linear regression
❖ Model data
❖ Predict, e.g., cancer risk
❖ Identify correlated parameters B
(Figure: dependent variable vs. data; correlated features)
![Page 34](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/34.jpg)
Linear regression
Y (effect) = X (data) * B (linear parameters)
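For a single feature plus an intercept, Y = X · B can be solved by ordinary least squares in closed form. A dependency-free sketch with made-up data; the function name is hypothetical. With many features one would instead solve the normal equations or use a linear-algebra library.

```python
from statistics import mean

def fit_simple_ols(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx        # slope: covariance / variance of x
    b0 = my - b1 * mx     # intercept: line passes through the means
    return b0, b1

b0, b1 = fit_simple_ols([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)  # 1.0 2.0
```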
![Page 35](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/35.jpg)
Robust Multi-array Average
❖ Abbreviated RMA
❖ For arrays with perfect-match & mismatch probes (standard RMA uses only the perfect-match intensities)
1. Background-corrected, log2-transformed data
2. Rank the expression values within each array
3. Replace each ranked expression value by the mean across arrays at that rank (quantile normalization)
4. Fit a linear (regression) expression model
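Ranking the values and replacing them by rank-wise means amounts to quantile normalization. A sketch assuming each array is a list of already background-corrected, log2 probe values of equal length. Ties are broken by original order for simplicity. The function name is hypothetical.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of arrays (each a list of probe values of equal length)."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # mean value at each rank across all arrays
    rank_means = [sum(s[r] for s in sorted_arrays) / len(arrays) for r in range(n)]
    normalized = []
    for a in arrays:
        # probe indices ordered by their value in this array (ties keep original order)
        order = sorted(range(n), key=lambda k: a[k])
        out = [0.0] * n
        for rank, k in enumerate(order):
            out[k] = rank_means[rank]  # replace value by the mean at its rank
        normalized.append(out)
    return normalized

print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalization every array has exactly the same distribution of values; only the ordering of probes within each array is kept.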
![Page 36](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/36.jpg)
RMA example
![Page 37](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/37.jpg)
RMA example
![Page 38](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/38.jpg)
Linear (regression) model
❖ Y_ij = m_i + a_j + e_ij
❖ Y_ij = corrected (single) probe’s value
❖ m_i = probe set value
❖ a_j = (single) probe’s affinity
❖ e_ij = error term
❖ i = sample
❖ j = probe
Normalized single-probe expression
Note the distinction between perfect-match and mismatch probes
(Figure: oligo array)
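RMA estimates m_i and a_j in the additive model Y_ij = m_i + a_j + e_ij robustly with median polish rather than ordinary least squares. Below is a simplified sketch: it uses a fixed number of sweeps and no convergence check. It assumes Y is a samples × probes matrix of normalized log2 values for one probe set. Under the usual identifiability constraint the probe affinities have median zero, so all sample values are shifted by the same constant and only their differences matter. The function name is hypothetical.

```python
from statistics import median

def median_polish(Y, sweeps=10):
    """Fit Y[i][j] ~ overall + row[i] + col[j] + resid[i][j] robustly.

    Rows i = samples (arrays), columns j = probes of one probe set.
    """
    rows, cols = len(Y), len(Y[0])
    resid = [list(r) for r in Y]
    overall, row, col = 0.0, [0.0] * rows, [0.0] * cols
    for _ in range(sweeps):
        # remove row medians
        for i in range(rows):
            med = median(resid[i])
            row[i] += med
            resid[i] = [v - med for v in resid[i]]
        med = median(col)
        col = [c - med for c in col]
        overall += med
        # remove column medians
        for j in range(cols):
            med = median([resid[i][j] for i in range(rows)])
            col[j] += med
            for i in range(rows):
                resid[i][j] -= med
        med = median(row)
        row = [r - med for r in row]
        overall += med
    return overall, row, col, resid

overall, row, col, resid = median_polish([[5, 6, 8], [7, 8, 10]])
m_hat = [overall + r for r in row]  # probe-set value per sample (m_i)
print(m_hat, col)                   # col = probe affinities a_j (median 0)
```

Medians make the fit resistant to a few outlying probes, which is the "robust" in Robust Multi-array Average.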
![Page 39](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/39.jpg)
Clustering
❖ Identify subgroups
❖ Quality control
❖ Similarity-based
Colors = spatial clustering
The distance metric is critical
![Page 40](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/40.jpg)
Overview Clustering
Today’s topic
![Page 41](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/41.jpg)
Unsupervised vs. supervised
![Page 42](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/42.jpg)
Example Clustering
❖ Colors := hierarchical tree-cut
(Figure: PCA scatter plot, PC1 (17.6% explained var.) vs. PC2 (12.0% explained var.); groups BA, CL, MS)
Hierarchical, based on pairwise similarity
![Page 43](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/43.jpg)
Hierarchical Clustering
1. Choose a distance metric (Euclidean, Pearson, etc.)
2. Compute the similarity (distance) matrix S
![Page 44](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/44.jpg)
Hierarchical Clustering
3. While |S| > 1:
Determine the pair (X, Y) with minimal distance
Merge them into a new element Z (e.g. Z = avg(X, Y); single, average, or complete linkage)
Delete X and Y from S, insert Z into S
Compute the new distances of Z to all elements in S
Visualize X and Y as a pair
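The loop above can be sketched as naive agglomerative clustering. Assumptions: Euclidean distance (via `math.dist`), 1-D or n-D points given as tuples, and cluster-to-cluster distances recomputed from the members each round. This is simple but slow; real implementations update the distance matrix incrementally as in step 3. The function name is hypothetical.

```python
import math

def agglomerative(points, linkage="average"):
    """Naive agglomerative clustering; returns merges as (id_a, id_b, distance).

    Leaves have ids 0..n-1; each merged cluster gets the next free id.
    """
    clusters = {i: [i] for i in range(len(points))}  # cluster id -> member point indices
    next_id = len(points)
    merges = []

    def cluster_distance(a, b):
        # all pairwise Euclidean distances between members of clusters a and b
        d = [math.dist(points[p], points[q]) for p in clusters[a] for q in clusters[b]]
        if linkage == "single":
            return min(d)
        if linkage == "complete":
            return max(d)
        return sum(d) / len(d)  # average linkage

    while len(clusters) > 1:
        ids = sorted(clusters)
        pairs = [(a, b) for i, a in enumerate(ids) for b in ids[i + 1:]]
        a, b = min(pairs, key=lambda ab: cluster_distance(*ab))  # closest pair (X, Y)
        dist = cluster_distance(a, b)
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)  # replace X, Y by Z
        merges.append((a, b, dist))
        next_id += 1
    return merges

print(agglomerative([(0.0,), (1.0,), (10.0,), (11.0,)]))
# [(0, 1, 1.0), (2, 3, 1.0), (4, 5, 10.0)]
```

The list of merges with their distances is exactly what a dendrogram draws. Switching `linkage` to `"single"` or `"complete"` changes the height of the final merge to 9.0 or 11.0 in this example.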
![Page 45](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/45.jpg)
Example hierarchical clustering
![Page 46](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/46.jpg)
Distance metrics
❖ Define the 'distance', i.e. which data points (dots) are merged
❖ Linear vs. non-linear distances
❖ Differ especially w.r.t. outlier sensitivity
❖ Euclidean
❖ Squared Euclidean
❖ Manhattan
❖ Maximum
❖ Mahalanobis (S = covariance matrix)
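A sketch of several of these distances for two expression profiles. It also includes a correlation-based distance (1 - Pearson r), one common way to turn a correlation into a distance. Mahalanobis is omitted because it additionally needs the inverse covariance matrix. The profiles are made up: they differ by 1 in three positions and by 6 in one. Note how the squared Euclidean distance lets that single large difference dominate, which illustrates the outlier sensitivity mentioned above.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def maximum(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def pearson_distance(a, b):
    """1 - Pearson correlation: 0 for perfectly correlated profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return 1 - cov / (sa * sb)

a, b = (1, 2, 3, 4), (2, 3, 4, 10)
for f in (euclidean, squared_euclidean, manhattan, maximum, pearson_distance):
    print(f.__name__, round(f(a, b), 3))
```

The correlation distance ignores scale and offset: the profiles (1, 2, 3) and (2, 4, 6) have distance 0, even though their Euclidean distance is not 0.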
![Page 47](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/47.jpg)
Linkage rules
❖ Define how clusters of data points (dots) are merged
❖ Represent the desired 'definition' of a cluster
❖ E.g. average ('mean') linkage will generally yield more balanced clusters
❖ Single
❖ Complete
❖ Average
❖ Cluster centers (c = cluster centroids)
![Page 48](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/48.jpg)
K-Means clustering
❖ Partitions n observations into k clusters
❖ Minimizes the sum of squared distances of the n data points from their respective cluster centers
Clusters by proximity to the k centers
Difference from hierarchical clustering: no pairwise merging
![Page 49](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/49.jpg)
K-Means clustering
Cluster centers are critical
1. Choose k random cluster centers μ_1, …, μ_k
2. Assign each point x in dataset S to the closest cluster center
3. Compute a new center μ_i (the mean of its points) for every cluster C_i
4. Repeat steps 2-3 until the cluster centers no longer change
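The four steps can be sketched as Lloyd's algorithm. Assumptions: Euclidean distance, points given as tuples, and random initial centers drawn from the data points unless `init` is supplied. An empty cluster simply keeps its old center. The function name and parameters are hypothetical.

```python
import math
import random

def kmeans(points, k, init=None, max_iter=100, seed=0):
    # 1. choose k random cluster centers (here: k distinct data points), unless given
    centers = list(init) if init is not None else random.Random(seed).sample(points, k)
    clusters = []
    for _ in range(max_iter):
        # 2. assign each point to its closest center
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        # 3. new center = mean of the assigned points (keep old center if cluster is empty)
        new_centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centers[c]
            for c, members in enumerate(clusters)
        ]
        # 4. stop when the centers no longer change
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

pts = [(0.0,), (0.5,), (1.0,), (10.0,), (10.5,), (11.0,)]
centers, clusters = kmeans(pts, 2, init=[(0.0,), (1.0,)])
print(centers)  # [(0.5,), (10.5,)]
```

Even a poor start (both initial centers in the left group) recovers the two groups here, but in general the result depends on the initialization. A common remedy is to run several random starts and keep the solution with the smallest within-cluster sum of squares.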
![Page 50](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/50.jpg)
Maximum-likelihood
❖ Find optimal cluster centers
❖ Convergence to the global optimum is not assured
❖ Initialization and number of centers (centroids) are critical
Maximum-likelihood centroids
![Page 51](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/51.jpg)
Databases - GEO
![Page 52](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/52.jpg)
Databases - GEO
GPL (GEO platform): platform description
GDS (GEO dataset): grouping of experiments
NCBI public repository (http://www.ncbi.nlm.nih.gov/geo/) that archives microarray, NGS, and other high-throughput genomics data submitted by the research community
![Page 53](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/53.jpg)
MIAME (Minimum Information About a Microarray Experiment) checklist
1. Raw data present
2. Processed data present
3. Sample annotation present (e.g. experimental factors, values & protocols)
4. Experimental design explained (e.g. what samples are replicates and why)
5. Annotation of the array (e.g., gene identifiers & genomic coordinates)
6. Laboratory and data processing protocols (e.g. normalisation method)
![Page 54](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/54.jpg)
Take-home messages: differential expression
❖ Combine log-FC and p-values (volcano plot)
❖ The t-test identifies significantly differentially expressed genes
❖ Multiple-testing correction
![Page 55](https://reader034.vdocuments.site/reader034/viewer/2022051601/5ad2fa9e7f8b9a86158dc3ab/html5/thumbnails/55.jpg)
Take-home messages
Clustering
❖ Identifies subgroups
❖ Depends on distance metric & linkage function
❖ The GEO database offers public expression data