Differential Expression

Post on 05-Jan-2016

  • Microarrays

    Differential Expression Analysis of Microarray Data

    Steven Buechler

    Department of Mathematics, 276B Hurley Hall; 1-6233

    Fall, 2007

  • Microarrays

    Outline

    Microarrays
    Introduction to Differential Expression Analysis
    Multiple Testing Problem
    Prefiltering the Gene List
    Multtest Package
    Differential Expression Example

  • Microarrays

    Most Expression Analysis is Comparative

    Frequently, the important investigations with microarrays aim to identify the genes whose expression levels change between two sample groups. To understand the effect of a drug, we may ask which genes are up-regulated (increased in expression) or down-regulated (decreased in expression) between treatment and control groups.

    Other paradigms involve clustering genes that follow the same expression pattern across a set of samples, or clustering samples with similar expression patterns across genes. These are beyond the scope of the course for reasons of time.

  • Microarrays

    Differential Expression Compares Means

    Each sample group will contain numerous replicates. The group expression level for a probe is summarized as the mean of the expression levels across the group's replicates. Thus, differential expression problems are comparisons of means. When there are two sample groups, this is a t test of some kind.

  • Microarrays

    One Gene Example

    As a very simple example, we ask whether Mmp7 is differentially expressed between the vehicle and sulindac samples in the mouse expression data created previously.

    > load("/Users/steve/Documents/Bio/Rcourse/Lect10/sulindacObjects/sulEset1")

    > library(affy)

    > library(moe430a)

  • Microarrays

    Steps in Testing One Gene

    Find the probe(s) associated with the gene.

    For each probe, create the vector of expression values for the two sample groups.

    Execute a t test for each probe.

    To be conservative, we use the Mann-Whitney test, a non-parametric alternative to the t test.

  • Microarrays

    Find the Probe(s) for Mmp7

    The environment we need to query is moe430aSYMBOL, but we need to work backwards from the value to the key. A test like moe430aSYMBOL == "Mmp7" does not work with an environment. Instead, we extract the symbols into a long character vector whose names are the probe IDs.

    > prbs <- ...
    > moeSyms <- ...
    > moeSyms[1:3]

    1415670_at 1415671_at 1415672_at
        "Copg" "Atp6v0d1"   "Golga7"
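The right-hand sides of the assignments on this slide are not shown in the transcript. As a minimal sketch of one way to build such a vector, assuming the usual `ls()`/`mget()` idiom for environments: the toy environment `symEnv` below is a hypothetical stand-in for `moe430aSYMBOL`, seeded with the probe-to-symbol pairs printed on these slides plus one made-up unannotated probe.

```r
# Hypothetical stand-in for moe430aSYMBOL (the real environment comes from
# the moe430a annotation package and holds one entry per probe).
symEnv <- new.env()
assign("1415670_at", "Copg",     envir = symEnv)
assign("1415671_at", "Atp6v0d1", envir = symEnv)
assign("1415672_at", "Golga7",   envir = symEnv)
assign("1449478_at", "Mmp7",     envir = symEnv)
assign("unannotated_at", NA_character_, envir = symEnv)  # made-up probe with no symbol

# Keys of the environment: the probe IDs.
prbs <- ls(symEnv)

# Look up every key at once and flatten the resulting list into a
# character vector whose names are the probe IDs.
moeSyms <- unlist(mget(prbs, envir = symEnv))

print(moeSyms)
```

With the real annotation environment, the same two calls would presumably be applied to `moe430aSYMBOL` in place of `symEnv`.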

  • Microarrays

    Find the Probe(s) for Mmp7

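    moeSyms has many NA values, which complicates the matching process. (Comparing NA with "Mmp7" yields NA rather than FALSE, so a logical subscript returns an NA entry for every unannotated probe, and these must be dropped.)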

    > mmp7S1 <- ...
    > mmP7S <- ...
    > mmP7S

    1449478_at
        "Mmp7"
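The effect of the NA values on matching can be seen in a small sketch. The vector below is a hypothetical three-element stand-in for `moeSyms` (two pairs taken from the slides, one made-up unannotated probe), and the variable names `naive`, `mmp7Probes`, and `mmp7Alt` are illustrative only.

```r
# Hypothetical miniature of moeSyms: names are probe IDs, values are symbols.
moeSyms <- c("1415670_at" = "Copg",
             "1449478_at" = "Mmp7",
             "unannotated_at" = NA)

# Naive match: the comparison gives FALSE, TRUE, NA, and subsetting with an
# NA index keeps an NA entry alongside the real hit.
naive <- moeSyms[moeSyms == "Mmp7"]

# which() keeps only the positions that are TRUE, so NA comparisons drop out.
mmp7Probes <- moeSyms[which(moeSyms == "Mmp7")]

# Equivalent: exclude the NA values explicitly.
mmp7Alt <- moeSyms[!is.na(moeSyms) & moeSyms == "Mmp7"]

print(naive)
print(mmp7Probes)
```

Either of the last two forms leaves only the probe annotated as Mmp7.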

  • Microarrays

    Extract the Expression Vectors for the Mmp7 Probe

    > sulExp <- ...
    > pd <- ...
    > sMmp7 <- ...
    > vMmp7 <- ...
    > sMmp7

    S1.cel S3.cel S4.cel S6.cel S8.cel S9.cel
    11.138  9.711  9.648 11.212 10.715 10.090

    > vMmp7

    V2.cel V3.cel V4.cel V5.cel V6.cel V7.cel V8.cel
    11.195  9.408 12.317 12.262 11.871 11.753 11.549
    V9.cel
    11.416
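The assignments that produce `sulExp`, `pd`, `sMmp7`, and `vMmp7` are not shown in the transcript. The sketch below illustrates the general idea of splitting one probe's row by sample group. It uses a plain one-row matrix filled with the values printed above in place of the real expression matrix, and a hypothetical `treatment` column inferred from the file-name prefixes (S for sulindac, V for vehicle, which is an assumption). In a real workflow these objects would more likely come from `exprs()` and `pData()` applied to the ExpressionSet.

```r
# Expression values for probe 1449478_at, copied from the slide output.
vals <- c(S1.cel = 11.138, S3.cel = 9.711,  S4.cel = 9.648,
          S6.cel = 11.212, S8.cel = 10.715, S9.cel = 10.090,
          V2.cel = 11.195, V3.cel = 9.408,  V4.cel = 12.317,
          V5.cel = 12.262, V6.cel = 11.871, V7.cel = 11.753,
          V8.cel = 11.549, V9.cel = 11.416)

# Stand-in for the expression matrix: probes in rows, samples in columns.
sulExp <- matrix(vals, nrow = 1,
                 dimnames = list("1449478_at", names(vals)))

# Hypothetical phenotype table; the group label is inferred from the
# first letter of each file name (an assumption, not from the slides).
pd <- data.frame(
  treatment = ifelse(startsWith(colnames(sulExp), "S"), "sulindac", "vehicle"),
  row.names = colnames(sulExp)
)

# One named numeric vector per sample group for the Mmp7 probe.
sMmp7 <- sulExp["1449478_at", pd$treatment == "sulindac"]
vMmp7 <- sulExp["1449478_at", pd$treatment == "vehicle"]

print(sMmp7)
print(vMmp7)
```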

  • Microarrays

    Execute the t Test

    Use the Mann-Whitney, also called the Wilcoxon Rank Sum test.

    > wilcox.test(sMmp7, vMmp7)

    Wilcoxon rank sum test

    data:  sMmp7 and vMmp7
    W = 7, p-value = 0.02930
    alternative hypothesis: true location shift is not equal to 0
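The test can be reproduced from the values printed on the previous slide. The sketch below re-enters the two vectors by hand, runs the same `wilcox.test` call, and, as an addition not in the slides, also runs Welch's `t.test` and computes the difference in group means for comparison.

```r
# Expression values for the Mmp7 probe, copied from the slide output.
sMmp7 <- c(S1.cel = 11.138, S3.cel = 9.711,  S4.cel = 9.648,
           S6.cel = 11.212, S8.cel = 10.715, S9.cel = 10.090)
vMmp7 <- c(V2.cel = 11.195, V3.cel = 9.408,  V4.cel = 12.317,
           V5.cel = 12.262, V6.cel = 11.871, V7.cel = 11.753,
           V8.cel = 11.549, V9.cel = 11.416)

# Mann-Whitney / Wilcoxon rank sum test. With small samples and no ties,
# R computes an exact p-value.
wres <- wilcox.test(sMmp7, vMmp7)
print(wres$statistic)  # W = 7, as on the slide
print(wres$p.value)    # about 0.0293, as on the slide

# Not in the slides: Welch's t test on the same data, for comparison.
tres <- t.test(sMmp7, vMmp7)
print(tres$p.value)

# Direction of the change: a negative value means lower expression
# in the sulindac group.
meanDiff <- mean(sMmp7) - mean(vMmp7)
print(meanDiff)
```

W counts the (sulindac, vehicle) pairs in which the sulindac value is the larger one; only 7 of the 48 possible pairs are ordered that way.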

  • Microarrays

    Conclusion About Mmp7

    Conclude that Mmp7 expression is decreased in the sulindac-treated samples.

    However, if all we wanted to know about were one gene, we'd just use RT-PCR, not an array with 22,000 features. The purpose of an array is to get a list of genes that are differentially expressed, after testing each one. This brings into play the multiple hypothesis testing problem.

  • Microarrays

    Many Genes Tested at Once

    In array-based differential expression analysis, the problem is to generate a list of genes that are differentially expressed, being as complete as possible.

    From a statistical point of view, for each gene we are testing the null hypothesis that there is no differential expression across the sample groups. This may represent thousands of tests.

  • Microarrays

    Example of the Problem

    Suppose that 1000 genes are represented on an array and we test each with a t test at a Type I error threshold of 0.05. We might expect 40 genes to be differentially expressed. Of the 960 non-differentially expressed genes, we can expect 5% errors, or $0.05 \times 960 = 48$ false positives. In other words, there are more false positives than truly differentially expressed genes.

    This illustrates the multiple hypothesis testing problem. With many independent tests, the number of false positives may overwhelm the true positives and make the analysis effectively useless.

    Most experimenters would state the goal of such a multiple-gene differential analysis as holding the probability of at least one false positive to 0.05. That is, the experiment-wise Type I error is 0.05.
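The arithmetic of this example can be checked in a few lines. The sketch below uses the slide's figures (1000 genes, 40 truly changed, threshold 0.05). Two quantities go beyond the slide and rest on stated assumptions: the probability of at least one false positive assumes the 960 null tests are independent, and the share of false positives in the gene list assumes all 40 true genes are detected.

```r
nGenes <- 1000   # genes on the array (from the slide)
nTrue  <- 40     # genes assumed to be truly differentially expressed
alpha  <- 0.05   # per-test Type I error threshold

# Genes with no real change, and the expected number of false positives.
nNull      <- nGenes - nTrue      # 960
expectedFP <- alpha * nNull       # 48

# Assumption: independent tests. Probability of at least one false
# positive among the 960 null genes; this is essentially 1.
probAnyFP <- 1 - (1 - alpha)^nNull

# Assumption: every truly changed gene is detected (best case).
# Share of the resulting gene list that is false.
falseShare <- expectedFP / (expectedFP + nTrue)

print(c(expectedFP = expectedFP, probAnyFP = probAnyFP, falseShare = falseShare))
```

Even in the best case for detection, more than half of the reported genes would be false positives.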

  • Microarrays

    Bonferroni Correction

    Suppose that a collection of g null hypotheses is being tested; i.e., differential expression of g genes is being assessed. The family-wise error rate (FWER) is the probability of rejecting at least one null hypothesis, given that they are all true (at least one false positive). If $\alpha$ is the desired FWER, this can be achieved by testing each null hypothesis with a Type I error of $\alpha/g$. This is called the Bonferroni correction.
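Why a per-test level of $\alpha/g$ is enough follows from the union bound. In the short derivation below, $R_i$ is notation introduced here (not from the slides) for the event that the $i$-th true null hypothesis is rejected:

```latex
\mathrm{FWER}
  = P\Big(\bigcup_{i=1}^{g} R_i\Big)
  \le \sum_{i=1}^{g} P(R_i)
  \le g \cdot \frac{\alpha}{g}
  = \alpha .
```

The bound requires no independence assumption among the tests, which is part of why the correction is so conservative.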

    If the desired FWER is 0.05 and there are 1000 genes tested, set the threshold for each at 0.00005.

    Slightly less brutal is the Sidak correction, which replaces $\alpha/g$ by $K(g, \alpha) = 1 - \sqrt[g]{1 - \alpha}$, but this still places a very stringent threshold on each test.
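The two per-test thresholds can be compared numerically. The sketch below uses the slide's values (FWER 0.05, 1000 genes). The FWER figures it computes assume independent tests, and the final lines apply the Bonferroni threshold to the Mmp7 p-value from earlier as if Mmp7 were one of 1000 genes tested, which is a hypothetical scenario for illustration.

```r
alpha <- 0.05   # desired family-wise error rate
g     <- 1000   # number of genes (hypotheses) tested

# Per-test thresholds.
bonferroni <- alpha / g                  # 0.00005
sidak      <- 1 - (1 - alpha)^(1 / g)    # slightly larger than Bonferroni

# Assumption: independent tests. FWER actually achieved by each threshold
# when all g null hypotheses are true.
fwerBonf  <- 1 - (1 - bonferroni)^g      # a little below alpha
fwerSidak <- 1 - (1 - sidak)^g           # alpha, up to rounding

print(c(bonferroni = bonferroni, sidak = sidak))
print(c(fwerBonf = fwerBonf, fwerSidak = fwerSidak))

# Hypothetical: the Mmp7 p-value treated as one of g tests.
pMmp7 <- 0.02930
passesBonf <- pMmp7 <= bonferroni
adjusted   <- p.adjust(pMmp7, method = "bonferroni", n = g)  # min(1, p * g)
print(passesBonf)
print(adjusted)
```

The Sidak threshold is only marginally more lenient than Bonferroni's, and a single-gene p-value near 0.03 is far from passing either.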

  • Microarrays