The microarray data analysis Ana Deckmann Carla Judice Jorge Lepikson Jorge Mondego Leandra Scarpari Marcelo Falsarella Carazzolle Michelle Servais Tais Herig

Post on 21-Apr-2015

The microarray data analysis

Ana Deckmann

Carla Judice

Jorge Lepikson

Jorge Mondego

Leandra Scarpari

Marcelo Falsarella Carazzolle

Michelle Servais

Tais Herig

Summary

- Statistics background

- Introduction to microarray

- Pre-processing microarray data

- Statistical analysis

- D-maps

Statistics background

Error model

- measurement = truth + error

- error = bias + variance

Bias describes a systematic tendency of the measurement and is addressed by normalization. Example: the dyes Cy3 and Cy5 do not have the same efficiency.

Variance is often normally distributed and is addressed by experimental replicates (technical and biological) and statistics. Examples: instrument imperfection and biological variation.
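
The error model above can be illustrated with a small simulation. This is only a sketch: the function name `simulate_measurements`, the numeric values, and the Gaussian noise are assumptions chosen for illustration, not part of the slides. It shows that averaging more replicates shrinks the random part of the error, while the systematic offset (bias) stays and has to be removed by normalization.

```python
import random
import statistics


def simulate_measurements(truth, bias, noise_sd, n, seed=0):
    """Draw n replicates following measurement = truth + bias + noise."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [truth + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]


if __name__ == "__main__":
    # Hypothetical values: true signal 10.0, systematic bias 0.5, noise sd 1.0.
    for n in (3, 30, 3000):
        values = simulate_measurements(truth=10.0, bias=0.5, noise_sd=1.0, n=n)
        offset = statistics.mean(values) - 10.0
        # With many replicates the offset settles near the bias, not near zero.
        print(f"n={n:5d}  mean offset from truth = {offset:+.3f}")
```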

Introduction to microarray

-Three different microarray technologies :

- Spotted cDNA microarrays (500 to 2500 bp)

- Spotted oligonucleotide microarrays (30 to 70 bp)

- Affymetrix chips (25 bp)

- Can be used for:

- Differential gene expression studies, gene co-regulation studies, gene function identification studies, time-course studies, dose-response studies, clinical diagnosis, …

Two color architecture

Codelink architecture (one color)

Probes: 30-mers, 90% located up to 550 bases from the 3' end

Targets: 10 µg biotinylated cRNA

Scanning

[Figure: scanning diagram. Labels: excitation (red laser, green laser); emission; higher frequency, more energy; lower frequency, less energy; overlay images.]

Ludwig scanner

[Figure: slide layout; labels A to H, 1 to 4, 1 to 11, and a to k.]

Scarpari, Leandra – 2006 – Doctoral thesis

Ludwig flags:

(0) Int <= Back

(1) Irregular spots

(3) Spot OK

(4) Saturated

Codelink flags :

(L) near background

(C) contaminated

(S) saturated

(M) masked

(G) good

Codelink scanner

LGE scanner

[Figure: slide layout; labels A to H and 1 to 4.]

LGE-defined flags:

(0) Spot OK

(1) Saturated spot

(2) Int/Back <= 1.05

(3) Area <= 110 or 50 (9x9 or 11x11)

Defined intensity:

- Int Cy3 = Area Cy3 * (median(Int Cy3) - median(Bkgd Cy3))

- Int Cy5 = Area Cy5 * (median(Int Cy5) - median(Bkgd Cy5))
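
The intensity definition and the flags above could be sketched as follows. The slides give only the formula and the thresholds, so several details here are assumptions: that a spot is represented as a list of pixel values, that the area is the pixel count, that saturation means any pixel reaching a hypothetical `saturation_level`, and that the flags are checked in the order shown. The area threshold is left as a required parameter (`min_area`) because the slide lists two values.

```python
from statistics import median


def integrated_intensity(spot_pixels, background_pixels):
    """Area * (median(Int) - median(Bkgd)) for one channel of one spot."""
    area = len(spot_pixels)  # assumption: area is the number of spot pixels
    return area * (median(spot_pixels) - median(background_pixels))


def lge_flag(spot_pixels, background_pixels, min_area, saturation_level=65535):
    """Return an LGE-style flag (0 ok, 1 saturated, 2 near background, 3 small).

    The order of the checks and the saturation criterion are assumptions.
    """
    if max(spot_pixels) >= saturation_level:
        return 1
    # Int/Back <= 1.05, written without a division to avoid dividing by zero.
    if median(spot_pixels) <= 1.05 * median(background_pixels):
        return 2
    if len(spot_pixels) <= min_area:
        return 3
    return 0


if __name__ == "__main__":
    spot = [1200] * 100        # hypothetical spot pixels
    background = [200] * 40    # hypothetical local background pixels
    print(integrated_intensity(spot, background))
    print(lge_flag(spot, background, min_area=50))
```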

Cy3 = 3329280; Cy5 = 2251624; r = 0.67 (fold = -1.49)

(Target median - Bkgd median) * Area = integrated intensity

[Figure: spot image marking the pixels inside the spot ("pixels in") and the surrounding pixels ("pixels out").]

Cy3 = 222824; Cy5 = 15488; r = 0.069; fold = -14.5; flag = 0

Cy3 = 481536; Cy5 = 676000; r = fold = 1.40; flag = 0

Cy3 = 293664; Cy5 = 485368; r = 1.65; flag = 0

Cy3 = 6400; Cy5 = -3584; NA (signal:noise <= 1); flag = 2

Cy3 = 8767720; Cy5 = 1349296; r = 0.15; fold = -6.7; flag = 1

Pre-processing microarray data

- Bioconductor repository (http://www.bioconductor.org/)

- Log intensities

[Figure: plots with the reference lines R = G and log2 R = log2 G.]

Most genes have low expression levels. What happens here?

up-regulated genes

down-regulated genes

non-differentially expressed genes are now along the horizontal line:

M = 0

log2R - log2G = 0

R = G

Transformed data {(M,A)i}:

M = log2(R) - log2(G) (minus)

A = ½·[log2(R) + log2(G)] (add)

M vs A plot
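
The (M, A) transform above is short enough to write out. The sketch below assumes background-corrected, strictly positive intensities and takes Cy5 as the red channel and Cy3 as the green channel; the function name `ma_transform` is illustrative only. The example values are the Cy3/Cy5 pair from one of the spots shown earlier.

```python
import math


def ma_transform(red, green):
    """Return (M, A) for one spot: M = log2(R) - log2(G), A = (log2(R) + log2(G)) / 2."""
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive to take log2")
    log_r = math.log2(red)
    log_g = math.log2(green)
    return log_r - log_g, 0.5 * (log_r + log_g)


if __name__ == "__main__":
    # Cy5 = 676000 (red), Cy3 = 481536 (green)
    m, a = ma_transform(red=676000, green=481536)
    print(f"M = {m:.3f}, A = {a:.3f}")
```

A spot with R = G gives M = 0, which is why non-differentially expressed genes lie along the horizontal line in the plot.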

Density plot

log2R = red channel signal

log2G = green channel signal

Print-tip box plot

[Figure: box plots for print-tip groups 1 to 16.]

Normalization within slides

Expectation: most genes are non-differentially expressed, i.e. most of the data points should lie around M = 0.

Median normalization: sets the median of the log intensity ratios to zero

Median value = 0

Lowess normalization: global lowess normalization
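
Median normalization is the simpler of the two and can be sketched in a few lines, assuming the M values (log2 ratios) of one slide are already available as a list; `median_normalize` is an illustrative name. Lowess normalization follows the same idea but subtracts an intensity-dependent curve fitted to M as a function of A instead of a single constant; that fit is not shown here.

```python
from statistics import median


def median_normalize(m_values):
    """Shift the log ratios of one slide so that their median becomes zero."""
    center = median(m_values)
    return [m - center for m in m_values]


if __name__ == "__main__":
    raw_m = [0.5, 1.0, 1.5, 3.0]   # hypothetical log2 ratios with a dye offset
    normalized = median_normalize(raw_m)
    print(normalized, median(normalized))
```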

Print-tip normalization : print-tip group lowess normalization

X*ij=(Xij-median(GRIDj))/sd(GRIDj)

Scaled print-tip : scaled print-tip group lowess normalization
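
The scaling formula X*ij = (Xij - median(GRIDj)) / sd(GRIDj) can be sketched as below. This covers only the centering and scaling step written on the slide, not the lowess fit within each print-tip group. The input layout (a dictionary mapping a grid label to its values) and the use of the sample standard deviation are assumptions.

```python
from statistics import median, stdev


def scale_by_grid(values_by_grid):
    """Apply X*_ij = (X_ij - median(GRID_j)) / sd(GRID_j) within each grid j."""
    scaled = {}
    for grid, values in values_by_grid.items():
        center = median(values)
        spread = stdev(values)  # sample standard deviation (assumption)
        if spread == 0:
            raise ValueError(f"grid {grid!r} has zero spread")
        scaled[grid] = [(x - center) / spread for x in values]
    return scaled


if __name__ == "__main__":
    # Two hypothetical grids with different location and spread.
    grids = {"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0]}
    print(scale_by_grid(grids))
```

After scaling, both grids share the same center and spread, which is the purpose of the scaled print-tip step.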

Normalization across slides

- Quantile

QQ plot

Mean across 8 slides
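
Quantile normalization across slides can be sketched as follows: sort each slide, average the sorted values rank by rank to get a reference distribution (the mean across slides), then give every spot the reference value of its rank. This is a minimal version under stated assumptions: all slides have the same number of spots, there are no missing values, and ties are broken by position.

```python
def quantile_normalize(slides):
    """Force every slide (a list of values) to share the same distribution."""
    n = len(slides[0])
    if any(len(s) != n for s in slides):
        raise ValueError("all slides must have the same number of spots")

    # Reference distribution: mean of the k-th smallest value across slides.
    sorted_slides = [sorted(s) for s in slides]
    reference = [sum(column) / len(slides) for column in zip(*sorted_slides)]

    normalized = []
    for s in slides:
        order = sorted(range(n), key=lambda i: s[i])  # spot indices by rank
        out = [0.0] * n
        for rank, index in enumerate(order):
            out[index] = reference[rank]
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    # Two hypothetical slides with three spots each.
    print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
```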

- LOWESS (applied to one-color microarrays)

Transformed data {(M,A)i}:

M = log2(Int1) - log2(Int2) ; A= ½·[log2(Int1) + log2(Int2)]

Statistical analysis

- t-statistic test

The t-statistic down-weights the importance of the average when the deviation is large, and vice versa:

T = mean(x) / SE(x)

where SE(x) = std.dev(x) / sqrt(N) (standard error of the mean)

The blue gene has a lower T-value than the red gene.
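
A short sketch of the statistic, using the standard error of the mean, std.dev(x) / sqrt(N). The two replicate sets are made-up values chosen so that both genes have the same mean log ratio but different spread; the labels "red" and "blue" only mirror the wording of the slide.

```python
import math
import statistics


def t_statistic(values):
    """One-sample t-statistic: mean(x) / (std.dev(x) / sqrt(N))."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two replicates")
    # Note: replicates that are all identical give a zero standard error.
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return statistics.mean(values) / standard_error


# Hypothetical replicate log ratios: same mean (1.0), different spread.
red_gene = [1.0, 1.1, 0.9, 1.0]
blue_gene = [0.0, 2.0, 0.5, 1.5]

if __name__ == "__main__":
    print(f"red:  T = {t_statistic(red_gene):.2f}")
    print(f"blue: T = {t_statistic(blue_gene):.2f}")
```

The gene with the noisier replicates gets the lower T-value even though the averages are equal.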

Top table and volcano plot

p.value    F.change   GENE
1.01E-07   -1.5       interleukin-18 binding protein
3.94E-06   -1.3234    Matrix metalloproteinase 3
0.000734   -1.93895   leukocyte integrin alpha chain
7.25E-05   1.960643   azurocidin 1 preproprotein
1.38E-09   2.317313   Macrophage-stimulating protein
6.82E-05   2.34858    alpha1-antichymotrypsin

Fold change = ratio, if ratio >= 1; or -1/ratio, if ratio < 1
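
The signed fold change defined above can be written as a small function. The name `signed_fold_change` is illustrative, and rejecting non-positive ratios is an assumption (the slides mark such spots as NA).

```python
def signed_fold_change(ratio):
    """Return ratio if ratio >= 1, otherwise -1/ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be positive; such spots are reported as NA")
    return ratio if ratio >= 1 else -1.0 / ratio


if __name__ == "__main__":
    # Ratios taken from the spot examples shown earlier (r = 0.67 and r = 1.40).
    for r in (0.67, 1.40):
        print(r, round(signed_fold_change(r), 2))
```

With this convention a two-fold decrease is reported as -2 rather than 0.5, so up- and down-regulation are on the same scale.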

Cluster data analysis

Purpose of the program

Automate the data analysis

Different formats

GeneTAC (LGE)

ScanArray (Ludwig)

CodeLink

NimbleGen (future)

Program features

Allows the creation of different projects

Structured in stages

Languages: CGI, R (statistical analysis)

Database: MySQL

Portuguese and English

Program structure

Submission of slide files

Data selection

Normalization

Statistical analyses

Project definition

Slide configuration

LGE and Ludwig

CodeLink

Program structure: Project definition

Create / select a project

Define the pattern

Number of functional plates

Program structure: Slide files

File submission

Group definition

Channel definition

Program structure: Data selection

Exclusion of unwanted spots

Different ways of displaying the data

Different filters

Images

Program structure: Normalization

Different methods

Options

Visualization

Program structure: Statistical analyses

Fold change

P-value

Plots: Slide

Grid

(Source: Leandra Scarpari)

Plots: M vs A plot

M = log2(R/G)

A = ½ log2(RG)

(Source: Leandra Scarpari)

Plots: M vs A plot

(Source: Ana Deckmann)

Plots: Density

(Source: Leandra Scarpari)

Plots: Volcano plot

Fold change: scale for comparing the ratios (the larger the absolute value, the more differentially expressed)

P-value: reproducibility of the data (the smaller the value, the more reproducible the data)

(Source: Leandra Scarpari, Ana Deckmann)
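
As a sketch of how a volcano plot places each gene, the function below maps a (p-value, signed fold change) pair to plot coordinates. The axis conventions are assumptions, since the slides do not state them: the x axis uses log2 of the underlying ratio and the y axis uses -log10 of the p-value, which is a common choice. The example rows are taken from the top table shown earlier.

```python
import math


def volcano_point(p_value, fold_change):
    """Map a p-value and a signed fold change (|f| >= 1) to (x, y) coordinates."""
    if not 0 < p_value <= 1:
        raise ValueError("p-value must be in (0, 1]")
    if abs(fold_change) < 1:
        raise ValueError("signed fold change must satisfy |f| >= 1")
    # A fold change of -2 corresponds to a ratio of 1/2, i.e. log2 ratio = -1.
    x = math.copysign(math.log2(abs(fold_change)), fold_change)
    y = -math.log10(p_value)
    return x, y


if __name__ == "__main__":
    top_table = [
        (1.01e-07, -1.5, "interleukin-18 binding protein"),
        (1.38e-09, 2.317313, "Macrophage-stimulating protein"),
    ]
    for p, f, gene in top_table:
        x, y = volcano_point(p, f)
        print(f"{gene}: x = {x:+.2f}, y = {y:.2f}")
```

Genes far from x = 0 and high on the y axis are both strongly changed and reproducible.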

Plots: Clustering

Pattern search

(Source: Ana Deckmann)

End

Box plot

Comparison of normalization methods for Codelink Bioarray data

Differences between pairs of arrays in the technical replicates:

(1) Array 1 vs array 4

(2) Array 4 vs array 5

BMC Bioinformatics 2005, 6:309

- Within-slide normalization

[Figure: before and after normalization.]

Print-tip normalization

[Figure panels: no normalization, print-tip, scaled print-tip.]

Nucleic Acids Research, 2002, vol 30, No 4