statistical bioinformatics - lumc

Introduction Needles and Haystacks Gene set testing and extensions Discussion Statistical Bioinformatics Jelle Goeman Medical Statistics & Bioinformatics Leiden University Medical Center Kick-off Meeting, 2009-11-10 Statistical Bioinformatics Jelle Goeman


Page 1: Statistical Bioinformatics - LUMC

Introduction Needles and Haystacks Gene set testing and extensions Discussion

Statistical Bioinformatics

Jelle Goeman

Medical Statistics & Bioinformatics

Leiden University Medical Center

Kick-off Meeting, 2009-11-10

Statistical Bioinformatics Jelle Goeman

Page 2: Statistical Bioinformatics - LUMC

Outline

1 Introduction

2 Needles and Haystacks

3 Gene set testing and extensions

4 Discussion

Page 3: Statistical Bioinformatics - LUMC

Introduction

Data avalanche

Advent of genomics has had a great impact on statistics

New ways of working and thinking needed

Old rule of thumb: need at least five subjects for every measured feature

Old software (SPSS) could not handle the data

Old methods broke down

Great stimulus: much development

Exciting new subfield of statistics: high dimensional data analysis

Page 4: Statistical Bioinformatics - LUMC

Bioinformaticians in the medical statistics group

Jeanine Houwing and Stefan Böhringer

Statistical genetics

Bart Mertens

Statistics of proteomics

Erik van Zwet and Jelle Goeman

Statistics of microarray data

Page 5: Statistical Bioinformatics - LUMC

Statistical consulting

Medical Statistics group

Long tradition of statistical consulting for whole LUMC

Similar consulting for statistical bioinformatics

We can

Advise

Show/teach how to do statistical analyses

Perform analyses for you

Page 6: Statistical Bioinformatics - LUMC

Experimental design

Close relationship

Design of the experiment ⇐⇒ statistical analysis

Recommendation

See a statistician before you start your experiment

Page 7: Statistical Bioinformatics - LUMC

Multiple testing

Many simultaneous measurements

Consequence: many simultaneous research questions

Which gene expressions are different between cases and controls?

Multiplicity: needle-in-a-haystack problem

Among so many genes, many will seem to be different

Many P-values < 0.05 by pure chance

Correct for multiplicity

By statistical adjustment for multiple testing

Risk: throw out the good genes with the bad
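The point about chance findings can be made concrete with a small simulation. The sketch below draws p-values for 10,000 genes with no real effect at all, counts how many fall below 0.05, and then applies two common adjustments. The choice of Holm (familywise error) and Benjamini-Hochberg (false discovery rate) is an assumption made for illustration; the slides do not name a specific method, and the function names are hypothetical.

```python
import random

def holm_adjust(pvalues):
    """Return Holm-adjusted p-values (controls the familywise error rate)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p-values
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * pvalues[i])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[i] = running_max
    return adjusted

def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (controls the FDR)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # rank of this p-value in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

random.seed(1)
n_genes = 10000
# Under the null hypothesis a p-value is uniform on [0, 1].
null_pvalues = [random.random() for _ in range(n_genes)]
raw_hits = sum(p < 0.05 for p in null_pvalues)
holm_hits = sum(p < 0.05 for p in holm_adjust(null_pvalues))
bh_hits = sum(p < 0.05 for p in bh_adjust(null_pvalues))
print("unadjusted:", raw_hits, "Holm:", holm_hits, "BH:", bh_hits)
```

With no true effects, roughly 5% of the unadjusted p-values (about 500 genes) are expected to fall below 0.05, whereas the adjusted counts are expected to be zero or close to it. The price is the risk noted above: with real effects present, a strict adjustment can also remove genuine findings.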

Page 8: Statistical Bioinformatics - LUMC

Prognostic modeling

Patient-level prediction

Using genomic information to distinguish good/bad prognosis

Similar needle-in-a-haystack problem

Many prediction rules seem to do well; few really do

[Figure: survival probability (0.0 to 1.0) against time in years (0 to 15). Curves: all tumors; percentile 0-25; percentile 25-50; percentile 50-75; percentile 75-100.]
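Why many prediction rules seem to do well can be shown with pure noise. The sketch below is a toy simulation, not the analysis behind the figure: outcomes and features are random, the best single-feature rule is picked on a training set, and the same rule is then scored on independent patients. All names, sample sizes and the sign-based rule are assumptions chosen for illustration.

```python
import random

def accuracy(feature, labels, direction):
    """Fraction of patients whose label is predicted by the sign of one feature."""
    hits = 0
    for x, y in zip(feature, labels):
        predicted = 1 if direction * x > 0 else 0
        hits += predicted == y
    return hits / len(labels)

def simulate(n_train=40, n_test=200, n_features=1000, seed=0):
    """Select the best noise feature on training data, then validate it."""
    rng = random.Random(seed)
    n = n_train + n_test
    # Labels and features are independent noise: no feature is truly predictive.
    labels = [rng.randint(0, 1) for _ in range(n)]
    features = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_features)]
    best = None
    for j, f in enumerate(features):
        for direction in (1, -1):
            acc = accuracy(f[:n_train], labels[:n_train], direction)
            if best is None or acc > best[0]:
                best = (acc, j, direction)
    train_acc, j, direction = best
    # Score the selected rule on patients that played no role in the selection.
    test_acc = accuracy(features[j][n_train:], labels[n_train:], direction)
    return train_acc, test_acc

train_acc, test_acc = simulate()
print("training accuracy:", train_acc, "validation accuracy:", test_acc)
```

The selected rule looks good on the training patients only because it was the best of a thousand tries; on new patients it is expected to perform at about chance level. This is the same needle-in-a-haystack effect as in multiple testing, and it is why validation on independent data matters.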

Page 9: Statistical Bioinformatics - LUMC

The Needle-in-a-haystack problem

Problem

Too much data

Risk of false positive findings

Lack of structure

One solution: provide structure

Use external information to structure statistical learning

Source of information

Bioinformatics: databases

Page 10: Statistical Bioinformatics - LUMC

Gene set testing

Microarray gene expression studies

Would produce (long) lists of differentially expressed genes

But: genes do not operate in isolation

What are the biological processes these genes are involved in?

Typical: post hoc analysis

Analyzing the list of differentially expressed genes for common functions

Using databases of gene function (Gene Ontology)

Problem: statistically highly inefficient

Better: incorporate gene function into analysis directly

Page 11: Statistical Bioinformatics - LUMC

Globaltest

Globaltest method

Analyze your data directly at the level of gene sets

Gene set                                       # genes   p-value
chromosome segregation                              14   1e-05
cell cycle                                         230   1e-05
cytokinesis                                          7   2e-05
microtubule cytoskeleton organiz. and biogen.       22   2e-05
microtubule-based process                           47   2e-05
mitotic cell cycle                                  69   2e-05
G2/M transition of mitotic cell cycle                4   2e-05
DNA replication                                     49   3e-05
mitosis                                             53   3e-05
M phase                                             66   3e-05
M phase of mitotic cell cycle                       54   3e-05
sister chromatid segregation                         9   3e-05
mitotic sister chromatid segregation                 9   3e-05
establishment of organelle localization              3   4e-05
cytoskeleton organization and biogenesis           128   4e-05
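To give a feel for what testing "at the level of gene sets" means, here is a minimal sketch of a set-level permutation test. It is not the implementation of the globaltest package: the statistic (a sum over genes of squared covariances with the outcome), the permutation scheme and all function names are assumptions chosen to keep the example short. One p-value is produced for the whole set rather than one per gene.

```python
import random

def set_statistic(expression, outcome):
    """Sum over genes of the squared (unnormalised) covariance with the outcome.

    expression: list of genes, each a list with one value per sample.
    outcome: list with one value per sample (e.g. 0 = control, 1 = case).
    """
    n = len(outcome)
    mean_y = sum(outcome) / n
    centered = [y - mean_y for y in outcome]
    total = 0.0
    for gene in expression:
        score = sum(x * r for x, r in zip(gene, centered))
        total += score ** 2
    return total

def gene_set_pvalue(expression, outcome, n_perm=999, seed=0):
    """Permutation p-value for 'no gene in the set is associated with the outcome'."""
    rng = random.Random(seed)
    observed = set_statistic(expression, outcome)
    shuffled = list(outcome)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between samples and outcome
        if set_statistic(expression, shuffled) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

# Toy data (hypothetical): 20 samples, two gene sets of 5 genes each.
rng = random.Random(42)
outcome = [0] * 10 + [1] * 10
signal_set = [[3 * y + rng.gauss(0, 1) for y in outcome] for _ in range(5)]
noise_set = [[rng.gauss(0, 1) for _ in outcome] for _ in range(5)]
p_signal = gene_set_pvalue(signal_set, outcome)
p_noise = gene_set_pvalue(noise_set, outcome)
print("signal set:", p_signal, "noise set:", p_noise)
```

Because the genes in a set are pooled into one statistic, many modest effects pointing at the same biological process can add up to a clear result, and the number of tests drops from the number of genes to the number of gene sets.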

Page 12: Statistical Bioinformatics - LUMC

Using the structure of Gene Ontology

Looking at many gene sets

Still relatively unstructured

Exploit structure

Gene Ontology is a graph

Let the graph guide the search

Result of one test shows which test to do next

Result: more power

Page 13: Statistical Bioinformatics - LUMC

Structure in data

Structured data

Measurements along the genome

Exploit this structure

Start testing at the chromosome level

Go down deeper where you find effects

[Figure: hierarchy of tests along chromosome 8, with levels Chr, Arm (p, q), Band (p23.3 to q24.3) and Gene (H84926...R56148).]
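The idea of starting at the chromosome level and going deeper only where effects are found can be sketched as a top-down walk over a tree of hypotheses. The version below uses a size-weighted Bonferroni adjustment, similar in spirit to Meinshausen's hierarchical testing procedure; this choice is an assumption and not necessarily the method used in the talk. The tree, the region sizes and all p-values are hypothetical, and in practice each p-value would come from a test on the genes in that region.

```python
def top_down(node, n_genes, alpha=0.05, parent_adjusted=0.0, found=None):
    """Walk a tree of hypotheses from the root, descending only below rejections.

    node: dict with keys "name", "p" (raw p-value), "size" (number of genes
    covered) and optionally "children" (list of nodes of the same form).
    Returns a list of (name, adjusted p-value) for all rejected nodes.
    """
    if found is None:
        found = []
    # Size-weighted Bonferroni: smaller regions pay a larger multiplicity penalty.
    adjusted = min(1.0, node["p"] * n_genes / node["size"])
    # A region is never declared more significant than the region containing it.
    adjusted = max(adjusted, parent_adjusted)
    if adjusted <= alpha:
        found.append((node["name"], adjusted))
        for child in node.get("children", []):
            top_down(child, n_genes, alpha, adjusted, found)
    return found

# Hypothetical tree and p-values, for illustration only.
chr8 = {
    "name": "chr8", "p": 0.001, "size": 100,
    "children": [
        {"name": "8p", "p": 0.30, "size": 40,
         "children": [{"name": "8p23", "p": 0.001, "size": 20}]},
        {"name": "8q", "p": 0.002, "size": 60,
         "children": [
             {"name": "8q22", "p": 0.20, "size": 30},
             {"name": "8q24", "p": 0.004, "size": 30},
         ]},
    ],
}
found = top_down(chr8, n_genes=100)
names = [name for name, _ in found]
print(names)
```

In this toy tree the search moves from the chromosome to the q arm and then to one band, while the p arm is abandoned after a single test. The band below the p arm is never tested, even though its raw p-value is small: the result of one test decides which test is done next, which keeps the number of tests low in regions without signal.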

Page 14: Statistical Bioinformatics - LUMC

Data-driven structure

Use clustering to get a data-driven structure

Exploit that structure for insight and increased power

[Figure: clustering of genes by absolute correlation (axis from 1 down to 0.2), with a p-value per cluster (axis from 1 down to 1e-08). Legend: positive association with survival; negative association with survival.]
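A data-driven structure of this kind can be obtained by grouping genes whose expression profiles are strongly correlated, regardless of sign. The sketch below is a deliberately simple single-linkage version with a fixed cut-off; the linkage rule, the threshold of 0.8, the gene names and the profiles are all assumptions for illustration, and the figure may have been produced differently.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equally long, non-constant profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def cluster_by_abs_correlation(profiles, min_abs_corr=0.8):
    """Single linkage: merge clusters while any cross-cluster pair has |r| >= cut-off.

    profiles: dict mapping gene name to a list of expression values.
    Returns a list of sets of gene names.
    """
    clusters = [{name} for name in profiles]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                linked = any(
                    abs(pearson(profiles[a], profiles[b])) >= min_abs_corr
                    for a in clusters[i] for b in clusters[j]
                )
                if linked:
                    clusters[i] |= clusters[j]
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break  # restart the scan after every merge
    return clusters

# Hypothetical profiles over five samples.
profiles = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [2, 4, 6, 8, 10],   # rises with geneA
    "geneC": [5, 4, 3, 2, 1],    # falls as geneA rises
    "geneD": [1, -1, 1, -1, 1],  # unrelated pattern
}
clusters = cluster_by_abs_correlation(profiles)
print(sorted(sorted(c) for c in clusters))
```

Using the absolute correlation puts genes that move in opposite directions in the same cluster, which fits the legend above: one cluster can contain genes with positive and with negative association with survival. The resulting clusters can then play the same role as gene sets or chromosome regions in a top-down search.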

Page 15: Statistical Bioinformatics - LUMC

Future: use other structures

Page 16: Statistical Bioinformatics - LUMC

Discussion

Statistical Bioinformatics

Many new developments

Builds upon classical statistics

Greatest challenge and opportunity

Using biological knowledge to guide the analysis
