Post on 13-Jul-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Unravelling `omics' data with the mixOmics R package - …mixomics.org/wp-content/uploads/2012/03/mixOmics.pdf

Introduction Concept of mixOmics Single Omics Analysis Integrative Omics Analysis Recent developments Conclusions

Unravelling `omics' data with the mixOmics R package

Illustration on several studies

Kim-Anh Lê Cao

Queensland Facility for Advanced Bioinformatics
The University of Queensland


Challenges

The issue with integrative systems biology

Unlimited quantity of data (n << p problem)

Data from multiple sources

→ Efficient and biologically relevant statistical methodologies are needed to combine the information in these heterogeneous data sets.


The biological questions

Single Omics analysis

Do we observe a `natural' separation between the different groups of patients?

Can we identify potential biomarker candidates predicting the status of the patients?

Integrative Omics analysis

Can we identify a subset of correlated genes and proteins from matching data sets?

Can we predict the abundance of a protein given the expression of a small subset of genes?

Do two matching omics data sets contain the same information?


The data


n = number of patients
p, q = number of biological features (genes, proteins, ...)

Single Omics analysis

one omic data set X (n x p)

for a supervised analysis, Y vector indicating the class of the patients

Integrative Omics analysis

two matching omics data sets (measured on the same patients)

X (n x p) and Z (n x q)


Multivariate analysis

Linear multivariate approaches enable:

Dimension reduction → project the data in a smaller subspace

To handle multicollinear, irrelevant, missing variables

To capture experimental and biological variation

In particular, in mixOmics, focus is on:

Data integration

Variable selection

Computationally efficient methodologies for large biological data sets

Interpretable graphical outputs


mixOmics

mixOmics is an R package dedicated to the exploration and the integrative analysis of high dimensional biological data sets.

Website
- R tutorials
- Newsletter

Web Interface
- User friendly interface
- Comprehensive results page

Lê Cao et al. (2009) integRomics/mixOmics: an R package to unravel relationships between two omics data sets, Bioinformatics


mixOmics

Data: one data set (n x p)
  Multivariate approach: PCA, IPCA, PLS-DA
  Internal variable selection: sPCA, sIPCA, sPLS-DA

Data: two matching data sets (n x p) & (n x q)
  Multivariate approach: PLS (canonical), rCCA, PLS (regression)
  Internal variable selection: sPLS (canonical), sPLS (regression)

Data with missing values (n x p): NIPALS imputation

Graphical outputs: sample plots, variable plots


Introduction with PCA

Principal Component Analysis: PCA

Seek the best directions in the data that account for most of the variability

→ principal components: artificial variables that are linear combinations of the original variables:

c = X v, with c of length n, X of size n x p and v of length p

[Figure: scatter plot of the data on the original axes x1 and x2]

c is a linear function of the elements of X having maximal variance

v is called the associated loading vector
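
The relation c = X v can be illustrated with a small sketch. This is not the mixOmics implementation: it is plain Python with made-up toy data, and it assumes a simple power iteration on X'X to find the loading vector v that gives c maximal variance. The function names are hypothetical.

```python
import math

def center(X):
    """Subtract each column mean so that every variable has mean zero."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]

def project(X, v):
    """Component c = X v: one score per sample (length n)."""
    return [sum(x * w for x, w in zip(row, v)) for row in X]

def first_loading(X, n_iter=200):
    """Power iteration on X'X; returns the unit-norm loading vector v (length p)."""
    p = len(X[0])
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        c = project(X, v)
        # v <- X'c, then rescale to unit norm
        v = [sum(row[j] * ci for row, ci in zip(X, c)) for j in range(p)]
        norm = math.sqrt(sum(w * w for w in v))
        v = [w / norm for w in v]
    return v

# Toy data (made up): n = 5 samples, p = 2 strongly correlated variables.
X = center([[2.0, 1.9], [1.0, 1.1], [0.0, 0.2], [-1.0, -1.2], [-2.0, -2.0]])
v = first_loading(X)
c = project(X, v)
print("loading vector v:", v)
print("principal component c:", c)
```

On this toy data the sum of squares of c is larger than that of either original variable, which is what "maximal variance" means for the first component.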


Introduction with PCA

Principal components cont.

The new PCs form a vector subspace of dimension < p

Project the data on these new axes.

[Figure: the data represented on the new axes z1 and z2]

→ approximate representation of the data points in a lower dimensional space

Problem: interpretation is difficult with a very large number of (possibly) irrelevant variables


Introduction with PCA

sparse Principal Component Analysis: sPCA

[Figure: principal components (axes x1, x2); loading vectors from PCA (loading 1, loading 2); sparse loading vectors from sPCA (loading 1, loading 2)]

The principal components are linear combinations of the original variables; the variable weights are defined in the associated loading vectors.

Sparse PCA computes sparse loading vectors to remove irrelevant variables using lasso penalizations (Shen & Huang 2008, J. Multivariate Analysis).
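
How a lasso penalty produces exact zeros in a loading vector can be sketched as follows. This is a simplified illustration in the spirit of the penalized rank-one approximation cited above, not the code used in mixOmics: the alternating scheme, the penalty value `lam` and the toy data are assumptions made for the example.

```python
import math

def soft_threshold(z, lam):
    """Lasso (soft-thresholding) operator: shrink z towards 0 by lam."""
    if abs(z) <= lam:
        return 0.0
    return z - lam if z > 0 else z + lam

def normalise(a):
    """Rescale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in a))
    return [x / norm for x in a]

def sparse_loading(X, lam, n_iter=50):
    """Alternate u = Xv/||Xv|| and v = soft(X'u, lam); return a unit-norm sparse v.

    lam must be small enough that at least one loading survives the threshold.
    """
    p = len(X[0])
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        u = normalise([sum(x * w for x, w in zip(row, v)) for row in X])
        v = [soft_threshold(sum(row[j] * ui for row, ui in zip(X, u)), lam)
             for j in range(p)]
    return normalise(v)

# Toy data (made up, columns already centred): variables 1 and 2 carry a
# common signal s, variables 3 and 4 are small noise.
s = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
noise_a = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
noise_b = [0.05, 0.05, -0.1, -0.1, 0.05, 0.05]
X = [[si, 0.9 * si, a, b] for si, a, b in zip(s, noise_a, noise_b)]

v = sparse_loading(X, lam=0.5)
print("sparse loading vector:", v)
```

With this penalty the two noise variables receive a loading of exactly zero, so only the two informative variables contribute to the component.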


Independent Principal Component Analysis

IPCA is based on Independent Component Analysis (ICA):

assumes a non-Gaussian data distribution (≠ PCA).

`blind source' signal separation.

seeks a set of independent components (≠ PCA).

Combines the advantages of both PCA and ICA.

The PCA loadings are transformed via ICA to obtain independent loading vectors and independent principal components.

sparse IPCA was also developed to select the variables contributing to the independent loading vectors

Yao, F., Coquery, J. and Lê Cao, K-A. (2012) Independent Principal Component Analysis for biologically meaningful dimension reduction of large biological data sets, BMC Bioinformatics.


Independent Principal Component Analysis

Illustration of PCA and IPCA

Sample representation for the kidney transplant study (3 groups of rejection status patients)


Discriminant Analysis

PLS - Discriminant Analysis

Similarly to Linear Discriminant Analysis, classical PLS-DA looks for the best components to separate the sample groups.

As opposed to PCA/IPCA methods, it is a supervised approach.

In addition, sPLS-DA searches for discriminative variables that can help separate the sample groups.

The discriminative power of the selected variables is evaluated using external data sets or cross-validation.

Lê Cao K-A., Boitard S. and Besse P. (2011) Sparse PLS Discriminant Analysis: biologically relevant feature selection and graphical displays for multiclass problems, BMC Bioinformatics, 12:253.
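
The idea of evaluating a discriminant component by cross-validation can be sketched as below. The sketch makes several simplifying assumptions that are not taken from the slides: two classes coded 0/1, a single PLS component whose weight vector is proportional to X'y on centred data, a nearest-centroid prediction rule, and leave-one-out folds. All function names and the toy data are hypothetical.

```python
import math

def score(row, x_mean, w):
    """Project one sample onto the component after centring with the training means."""
    return sum((x - m) * wj for x, m, wj in zip(row, x_mean, w))

def fit_plsda_1comp(X, y):
    """One-component PLS-DA for labels coded 0/1: w is proportional to X'y (centred)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    w = [sum((row[j] - x_mean[j]) * (yi - y_mean) for row, yi in zip(X, y))
         for j in range(p)]
    norm = math.sqrt(sum(wj * wj for wj in w))
    w = [wj / norm for wj in w]
    # Class centroids of the training scores, used for prediction.
    scores = [score(row, x_mean, w) for row in X]
    centroids = {}
    for label in set(y):
        members = [t for t, yi in zip(scores, y) if yi == label]
        centroids[label] = sum(members) / len(members)
    return x_mean, w, centroids

def predict(model, row):
    """Assign the class whose centroid is closest to the sample's score."""
    x_mean, w, centroids = model
    t = score(row, x_mean, w)
    return min(centroids, key=lambda label: abs(t - centroids[label]))

def loo_error_rate(X, y):
    """Leave-one-out cross-validation: refit without sample i, then predict it."""
    errors = 0
    for i in range(len(X)):
        model = fit_plsda_1comp(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        errors += predict(model, X[i]) != y[i]
    return errors / len(X)

# Toy data (made up): variable 1 separates the two classes, variable 2 is noise.
X = [[0.0, 0.5], [0.2, -0.5], [-0.1, 0.3], [0.1, -0.3],
     [10.0, 0.4], [10.2, -0.4], [9.9, 0.2], [10.1, -0.2]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
error_rate = loo_error_rate(X, y)
print("leave-one-out error rate:", error_rate)
```

The key design point is that the left-out sample never takes part in centring or fitting, so the error rate is an honest estimate of predictive performance. The same loop could be repeated for different numbers of selected variables to tune a sparse model.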


Discriminant Analysis

Illustration of sPLS-DA

[Figure: Tuning the number of variables to select with cross-validation. Error rate against the number of selected genes for comp 1, comp 1:2 and comp 1:3; genomics, training data]

[Figure: Predicting the class of the test set samples. Comp 1 against Comp 2; genomics; train and test samples from classes AR and NR]


Motivation

Aims:

unravel the correlation structure between two data sets

select co-regulated biological entities across samples

→ select and integrate the different types of data in a one-step procedure

Partial Least Squares regression maximises the covariance between the linear combinations (components) associated with each data set

sparse PLS has been developed to include variable selection from both data sets

Two modes are proposed to model the relationship between the two data sets ('regression' and 'canonical')

Lê Cao et al. (2008), SAGMB, Lê Cao et al. (2009), BMC Bioinformatics
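
The covariance criterion can be made concrete with a small sketch for the first pair of components. It assumes the standard result that the unit-norm loading vectors maximising the covariance between X u and Z v are the leading singular vectors of X'Z, found here by power iteration. This is plain Python on made-up data, not the mixOmics code, and the function names are hypothetical.

```python
import math

def center(A):
    """Subtract each column mean."""
    n, p = len(A), len(A[0])
    means = [sum(row[j] for row in A) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in A]

def normalise(a):
    """Rescale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in a))
    return [x / norm for x in a]

def cross_product(X, Z):
    """M = X'Z (p x q): cross-products between the variables of the two data sets."""
    p, q = len(X[0]), len(Z[0])
    return [[sum(rx[j] * rz[k] for rx, rz in zip(X, Z)) for k in range(q)]
            for j in range(p)]

def first_pls_loadings(X, Z, n_iter=200):
    """Power iteration on M = X'Z; returns unit-norm loading vectors u, v and M."""
    M = cross_product(X, Z)
    p, q = len(M), len(M[0])
    v = normalise([1.0] * q)
    for _ in range(n_iter):
        u = normalise([sum(M[j][k] * v[k] for k in range(q)) for j in range(p)])
        v = normalise([sum(M[j][k] * u[j] for j in range(p)) for k in range(q)])
    return u, v, M

# Toy matching data sets (made up): n = 5 samples, p = 3 and q = 2 variables.
X = center([[-2.0, 1.0, 0.5], [-1.0, -1.0, -0.5], [0.0, 0.0, 0.0],
            [1.0, -1.0, 0.5], [2.0, 1.0, -0.5]])
Z = center([[-4.1, 1.0], [-1.9, -2.0], [0.0, 1.0], [1.9, -1.0], [4.1, 1.0]])

u, v, M = first_pls_loadings(X, Z)
t_x = [sum(x * w for x, w in zip(row, u)) for row in X]  # component of X
t_z = [sum(z * w for z, w in zip(row, v)) for row in Z]  # component of Z
cov = sum(a * b for a, b in zip(t_x, t_z))  # proportional to the covariance
print("cross-product of the two components:", cov)
```

The cross-product between the two components is at least as large as the cross-product between any single variable of X and any single variable of Z. A sparse variant would additionally soft-threshold u and v at each iteration.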


Integrating genomics and proteomics

Illustration of sparse PLS: sample plot

sPLS aims at selecting correlated variables (genes, proteins) across the same samples by performing a multivariate regression.

Regression: explain the protein abundance w.r.t. the gene expression ("⇒ relationship").

The latent variables (components) are determined based on the selected genes and proteins → they give more insight into the similarities between samples.

Unsupervised approach


Integrating genomics and proteomics

Illustration of sparse PLS: variable plot

Relevance networks are bipartite graphs directly inferred from the sPLS components.

Some other insightful graphical outputs:

correlation circle plots

clustered image maps

[Figure: correlation circle plot, Dimension 1 against Dimension 2; legend: protein group, probe set]

González I., Lê Cao K.-A., Davis, M.D. and Déjean S. Visualising association between paired `omics' data sets. In revision.


Comparing two proteomics platforms

Illustration of sPLS canonical mode

Selects correlated variables across the same samples and highlights the correlation structure between the two data sets.

Canonical mode: "⇔ relationship"

Figure: Arrow plot to highlight the similarities between 2 data sets (sample labels include BR, CNS, COL, LEU, MEL, NS, OV, PR and RE)


Multilevel analysis

Cross-over design

Data: one data set, repeated measurements (n x p)
  Multivariate approach: multilevel PLS-DA (supervised)
  Internal variable selection: multilevel sPLS-DA

Data: two matching data sets, repeated measurements (n x p) & (n x q)
  Multivariate approach: multilevel PLS (canonical, 1 level)
  Internal variable selection: multilevel sPLS

Graphical outputs: sample plots, variable plots

A novel approach for cross-over designs (repeated measurements, up to 2 cross factors)
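
The slides do not spell out how the repeated measurements are handled. A common first step in multilevel analyses with one factor, assumed here purely for illustration, is to remove each subject's own mean so that only the within-subject variation is passed on to PLS-DA or PLS. The function name and the toy data below are hypothetical.

```python
def within_subject(X, subject):
    """Subtract, for each variable, the mean of the subject's own repeated measurements."""
    p = len(X[0])
    rows_by_subject = {}
    for row, s in zip(X, subject):
        rows_by_subject.setdefault(s, []).append(row)
    means = {s: [sum(r[j] for r in rows) / len(rows) for j in range(p)]
             for s, rows in rows_by_subject.items()}
    return [[row[j] - means[s][j] for j in range(p)] for row, s in zip(X, subject)]

# Toy data (made up): 2 subjects, each measured under 2 conditions, p = 2 variables.
X = [[10.0, 1.0], [12.0, 3.0],   # subject "A", conditions 1 and 2
     [20.0, 5.0], [22.0, 7.0]]   # subject "B", conditions 1 and 2
subject = ["A", "A", "B", "B"]

Xw = within_subject(X, subject)
print(Xw)  # [[-1.0, -1.0], [1.0, 1.0], [-1.0, -1.0], [1.0, 1.0]]
```

In this toy example the large baseline difference between the two subjects disappears, and the condition effect, which is identical in both subjects, becomes the only remaining source of variation.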


Multilevel analysis

Illustration of multilevel sPLS-DA

[Figure: sample plot, X-variate 1 against X-variate 2; legend: stimulations GAG+, GAG−, LIPO5 and NS; t1 and t2]

[Figure: heat map of the selected genes; annotations: Sample (1 to 12) and Stimulation (GAG+, GAG−, LIPO5, NS)]


Conclusions


The exploratory and integrative approaches:

are flexible and can answer various types of questions.

can highlight the potential of the data.

make it possible to generate new hypotheses to be further investigated.

Future work includes:

Cross-platform comparison

Integration of multiple data sets (unsupervised and supervised)

Time-course experiments


Conclusions

Acknowledgements

mixOmics team

Sébastien Déjean, Univ. Tlse

Ignacio González, Univ. Tlse

Xin Yi Chua, QFAB

Other contributors to mixOmics

Jeff Coquery, QFAB, Sup'Biotech

Benoit Liquet, Univ. Bordeaux

Eric Fangzhou Yao, QFAB, Shanghai Univ of Finance and Economics

Pierre Monget, QFAB, CESI


Questions?


http://[email protected]

[email protected]
