Learning rule-based models from gene expression time profiles annotated with Gene Ontology terms Jan Komorowski and Astrid Lägreid

Post on 14-Feb-2016


Page 1: Learning rule-based models from gene expression time  profiles annotated with Gene Ontology terms

Learning rule-based models from gene expression time profiles annotated with Gene Ontology terms

Jan Komorowski and Astrid Lägreid

Page 2: Learning rule-based models from gene expression time  profiles annotated with Gene Ontology terms

J. Komorowski and A. Lägreid

Joint work with

• Torgeir R. Hvidsten, Herman Midelfart, Astrid Lægreid and Arne K. Sandvik

Page 3: Learning rule-based models from gene expression time  profiles annotated with Gene Ontology terms

Selected Challenges in Gene-expression Analysis

• Function similarity corresponds to expression similarity, but:
– Functionally correlated genes may be expression-wise dissimilar (e.g. anti-coregulated)
– Genes usually have multiple functions
– Measurements may be approximate and contradictory

• Can we obtain clusters of biologically related genes?

• Can we build models that classify unknown genes into functional classes, that are human-legible, and that handle approximate and often contradictory data?

• How can we re-use biological knowledge?

Page 4: Learning rule-based models from gene expression time  profiles annotated with Gene Ontology terms

Data

• Data material
– Serum-starved fibroblasts, 8,613 genes
   • Added serum to medium at time = 0
   • Used starved fibroblasts as reference
   • Measured gene activity at various time points
– 493 genes found to be differentially expressed

• Results
– 278 genes known (3 repeats)
– 212 genes unknown (uncharacterized)
– 211 genes given hypothetical function with 88% quality

Page 5: Learning rule-based models from gene expression time  profiles annotated with Gene Ontology terms

Fibroblast - serum response

[Figure: timeline of the experiment. Quiescent, non-proliferating fibroblasts become proliferating after serum is added; samples for microarray analysis are taken along a time axis running to 24 hours.]

Page 6: Learning rule-based models from gene expression time  profiles annotated with Gene Ontology terms

Processes

[Figure: time axis running to 24 hours, from quiescent, non-proliferating to proliferating cells, annotated with the processes protein synthesis, lipid synthesis, stress response, cell motility, re-entry cell cycle, organelle biogenesis and transcription.]

Page 7: Learning rule-based models from gene expression time  profiles annotated with Gene Ontology terms

Dynamic processes

[Figure: time axis running to 24 hours, from quiescent, non-proliferating to proliferating cells, with the response phases immediate early, delayed immediate early, intermediate and late, grouped as primary, secondary and tertiary.]

Page 8: Learning rule-based models from gene expression time  profiles annotated with Gene Ontology terms

Protein appears after the transcript

[Figure: time axis running to 24 hours, from quiescent, non-proliferating to proliferating cells, with primary, secondary and tertiary phases.]

Page 9: Learning rule-based models from gene expression time  profiles annotated with Gene Ontology terms

Protein dynamics are not always similar to transcript dynamics

[Figure: gene, transcript and protein levels along a time axis running to 24 hours.]

Page 10: Learning rule-based models from gene expression time  profiles annotated with Gene Ontology terms

Molecular mechanisms of transcriptional response

[Diagram with the labels: serum = signal; immediate early response genes; immediate early response factors; delayed immediate early response genes; secondary transcription factors; intermediate/late response genes; effectors = cellular response.]

Page 11: Learning rule-based models from gene expression time  profiles annotated with Gene Ontology terms

The dynamics of cellular processes

[Figure: time axis (1, 4, 8, 24), from quiescent, non-proliferating to proliferating cells. Process labels in transcript order: protein synthesis; DNA synthesis; energy metabolism; cell motility; stress response; cell motility; cell adhesion; DNA synthesis; lipid synthesis; cell cycle regulation; cell proliferation, negative regulation.]

Page 12: Learning rule-based models from gene expression time  profiles annotated with Gene Ontology terms

Gene  0HR    15MIN  30MIN  1HR    2HR    4HR    6HR    8HR    12HR   16HR   20HR   24HR   Process
g1    0.00   -0.47  -3.32  -0.81  0.11   -0.60  -1.36  -1.03  -1.84  -1.00  -0.60  -0.94  Unknown
g2    0.00   0.66   0.07   0.20   0.29   -0.89  -0.45  -0.29  -0.29  -0.15  -0.45  -0.42  Transport and defense response
g3    0.00   0.14   -0.04  0.00   -0.15  -0.58  -0.30  -0.18  -0.38  -0.49  -0.81  -1.12  Cell cycle control
g4    0.00   -0.04  0.00   -0.23  -0.25  -0.47  -0.60  -0.56  -1.09  -0.71  -0.76  -0.62  Positive control of cell proliferation
g5    0.00   0.28   0.37   0.11   -0.17  -0.18  -0.60  -0.23  -0.58  -0.79  -0.29  -0.74  Positive control of cell proliferation
...   ...    ...    ...    ...    ...    ...    ...    ...    ...    ...    ...    ...    ...

[Diagram: the annotated processes (Positive control of cell proliferation, Defense response, Cell cycle control, Transport) are placed in the ontology, with the genes g2, g3, g4 and g5 attached to them.]

0 - 4(Increasing) AND 6 - 10(Decreasing) AND 14 - 18(Constant) => GO(cell proliferation)

Methodology

1. Mining functional classes from an ontology
2. Extracting features for learning
3. Inducing minimal decision rules using rough sets
4. Predicting the function of unknown genes using the rules

[Plot: gene expression profiles; x axis 0 to 24, y axis -2 to 1.5.]

Page 13: Learning rule-based models from gene expression time  profiles annotated with Gene Ontology terms

Gene Ontology

[Diagram: excerpt of the Gene Ontology hierarchy; terms listed in transcript order.]

GENE FUNCTION
• CELLULAR COMPARTMENT
• PROCESS
• FUNCTION

Cell growth and maintenance
– Metabolism
– Energy pathways
– Nucleotide and nucleic acid metabolism
– DNA metabolism
– Transcription
– DNA packaging
– DNA repair
– Mutagenesis
– Intracellular protein traffic
– Ion homeostasis
– Transport
– Lipid metabolism
– Protein metabolism and modification
– Amino-acid and derivative metabolism
– Protein targeting
– Cell death
– Cell motility
– Stress response
– Organelle organization and biogenesis
– Oncogenesis
– Cell proliferation
– Cell cycle

Cell communication
– Cell adhesion
– Signal transduction
– Cell surface receptor linked signal transduction
– Intracellular signalling cascade

Developmental processes

Physiological processes
– Blood coagulation
– Circulation

Page 14: Learning rule-based models from gene expression time  profiles annotated with Gene Ontology terms

Biological processes from GO

• Energy pathways
• DNA metabolism
• Amino acid and derivative metabolism
• Protein targeting
• Lipid metabolism
• Transport
• Ion homeostasis
• Intracellular traffic
• Cell death
• Cell motility
• Stress response
• Organelle organization and biogenesis
• Oncogenesis
• Cell cycle
• Cell adhesion
• Cell surface receptor linked signal transduction
• Intracellular signaling cascade
• Developmental processes
• Blood coagulation
• Circulation

Page 15: Learning rule-based models from gene expression time  profiles annotated with Gene Ontology terms

Hierarchical Clustering of the Fibroblast Data

It’s not a cluster!

Page 16: Learning rule-based models from gene expression time  profiles annotated with Gene Ontology terms

Gene Ontology vs. Clusters found by Iyer et al.

Page 17: Learning rule-based models from gene expression time  profiles annotated with Gene Ontology terms

Template-based feature synthesis

[Diagram: gene expression time series data + all possible subintervals in the time series and the templates (Increasing, Decreasing, Constant) → MATCH → groups containing genes matching the same templates over the same subinterval.]

12 measurement points, 55 possible intervals of length >2
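The interval count can be checked with a short sketch. It assumes that "length >2" means a window of at least three consecutive measurement points; the function name `subintervals` is illustrative and does not come from the original work.

```python
from typing import List, Tuple


def subintervals(n_points: int, min_points: int = 3) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index pairs of all contiguous windows
    that cover at least `min_points` measurement points."""
    return [
        (start, end)
        for start in range(n_points)
        for end in range(start + min_points - 1, n_points)
    ]


intervals = subintervals(12)
print(len(intervals))  # 55
```

With 12 points there are 10 windows of three points, 9 of four points, and so on down to a single window of twelve points, which sums to 55.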

Page 18: Learning rule-based models from gene expression time  profiles annotated with Gene Ontology terms

Examples of template definitions

[Figure: example definitions of an Increasing-template and a Constant-template over the 2HR, 4HR, 6HR, 8HR and 12HR measurement points, annotated with the thresholds MIN. 0.6, MAX 0.2, MIN. 0.1, MIN. 0.2 (MEAN) and MIN. 0; axis values 0.5 and 1.0.]
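A minimal sketch of how such templates could be tested against a window of expression values is given below. The exact template definitions are not recoverable from the transcript, so the three threshold constants and the way they are applied are assumptions made for illustration; only the template names Increasing, Decreasing and Constant come from the slides.

```python
from typing import Optional, Sequence

# Hypothetical thresholds, for illustration only.
MIN_STEP = 0.1    # smallest change required between neighbouring points
MIN_TOTAL = 0.6   # smallest overall change across the window
MAX_SPREAD = 0.2  # largest allowed deviation from the window mean


def is_increasing(values: Sequence[float]) -> bool:
    """Every step rises by at least MIN_STEP and the total rise is at least MIN_TOTAL."""
    steps = [b - a for a, b in zip(values, values[1:])]
    return all(s >= MIN_STEP for s in steps) and values[-1] - values[0] >= MIN_TOTAL


def is_decreasing(values: Sequence[float]) -> bool:
    """Mirror image of is_increasing: test the negated profile."""
    return is_increasing([-v for v in values])


def is_constant(values: Sequence[float]) -> bool:
    """Every value stays within MAX_SPREAD of the window mean."""
    mean = sum(values) / len(values)
    return all(abs(v - mean) <= MAX_SPREAD for v in values)


def match_template(values: Sequence[float]) -> Optional[str]:
    """Return the name of the first matching template, or None."""
    if is_increasing(values):
        return "Increasing"
    if is_decreasing(values):
        return "Decreasing"
    if is_constant(values):
        return "Constant"
    return None


print(match_template([0.0, 0.3, 0.7]))  # Increasing
print(match_template([0.0, 1.0, 0.0]))  # None
```

Applying `match_template` to every subinterval of a gene's profile would yield the (interval, template) features that the rules refer to.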

Page 19: Learning rule-based models from gene expression time  profiles annotated with Gene Ontology terms

Rule example 1

Rule: 0 - 4(Constant) AND 0 - 10(Increasing) => GO(protein metabolism and modification) OR GO(mesoderm development) OR GO(protein biosynthesis)

Covered genes: M35296, J02783, D13748, X05130, X60957, D13748, U90918 (unknown)

[Plot: expression profiles of the covered genes; x axis 0 to 24, y axis -1 to 3.]

Page 20: Learning rule-based models from gene expression time  profiles annotated with Gene Ontology terms

Rule example 2

Rule: 0 - 4(Increasing) AND 6 - 10(Decreasing) AND 14 - 18(Constant) => GO(cell proliferation) OR GO(cell-cell signaling) OR GO(intracellular signaling cascade) OR GO(oncogenesis)

Covered genes: Y07909, X58377, U66468, X58377, X85106, Y07909

[Plot: expression profiles of the covered genes; x axis 0 to 24, y axis -2 to 1.5.]
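One way to represent such a rule in code is as a set of (interval, template) conditions joined by AND, together with the alternative GO classes it concludes. The sketch below is illustrative only: the `Rule` type, the `fires` function and the example feature set are assumptions, and the interval endpoints are copied as written on the slide without interpreting their units.

```python
from typing import FrozenSet, NamedTuple, Set, Tuple

# (start, end, template); endpoints copied as written on the slide.
Feature = Tuple[int, int, str]


class Rule(NamedTuple):
    conditions: FrozenSet[Feature]  # all must hold (AND)
    classes: Tuple[str, ...]        # alternative GO classes (OR)


RULE_2 = Rule(
    conditions=frozenset({
        (0, 4, "Increasing"),
        (6, 10, "Decreasing"),
        (14, 18, "Constant"),
    }),
    classes=(
        "cell proliferation",
        "cell-cell signaling",
        "intracellular signaling cascade",
        "oncogenesis",
    ),
)


def fires(rule: Rule, gene_features: Set[Feature]) -> bool:
    """A rule fires when every one of its conditions is among the gene's features."""
    return rule.conditions <= gene_features


# Hypothetical feature set of a gene, for illustration only.
example_features = {
    (0, 4, "Increasing"),
    (6, 10, "Decreasing"),
    (14, 18, "Constant"),
    (0, 10, "Constant"),
}
print(fires(RULE_2, example_features))  # True
```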

Page 21: Learning rule-based models from gene expression time  profiles annotated with Gene Ontology terms

Classification using template-based rules

[Figure: the expression profile of gene X60957 (x axis 0 to 24, y axis -1 to 3) is matched against a list of IF … THEN … rules, among them: IF 0 - 4(Constant) AND 0 - 10(Increasing) THEN GO(prot. met. and mod.) OR …; annotation "+4".]

Process                               Votes
protein metabolism and modification   6
protein amino acid phosphorylation    3
proteolysis and peptidolysis          2
transcription                         1
transport                             1
vision                                1
…

Votes are normalized and processes with vote fractions higher than a selection-threshold are chosen as predictions
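The voting step can be sketched as follows. The slide does not say how votes are normalized, so dividing by the total number of votes is an assumption, as are the function name `predict` and the threshold value 0.2. The vote table on the slide is truncated, so the fractions computed here from the listed rows are illustrative only.

```python
from typing import Dict, List


def predict(votes: Dict[str, int], threshold: float) -> List[str]:
    """Return the processes whose share of all votes exceeds the threshold."""
    total = sum(votes.values())
    if total == 0:
        return []
    return [process for process, count in votes.items() if count / total > threshold]


# Rows listed on the slide for gene X60957 (the table continues beyond these).
votes = {
    "protein metabolism and modification": 6,
    "protein amino acid phosphorylation": 3,
    "proteolysis and peptidolysis": 2,
    "transcription": 1,
    "transport": 1,
    "vision": 1,
}

print(predict(votes, threshold=0.2))
# ['protein metabolism and modification', 'protein amino acid phosphorylation']
```

Raising the threshold yields fewer, more confident predictions; lowering it yields more predictions per gene.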

Page 22: Learning rule-based models from gene expression time  profiles annotated with Gene Ontology terms

Cross validation estimates Iyer et al.

PROCESS                                AUC   SE
Ion homeostasis                        1.00  0.00
Protein targeting                      0.99  0.03
Blood coagulation                      0.96  0.08
DNA metabolism                         0.94  0.09
Intracellular signaling cascade        0.94  0.06
Energy pathways                        0.93  0.12
Cell cycle                             0.93  0.04
Oncogenesis                            0.92  0.11
Circulation                            0.91  0.11
Cell death                             0.90  0.10
Developmental processes                0.90  0.07
Transcription                          0.88  0.11
Defense (immune) response              0.88  0.05
Cell adhesion                          0.87  0.09
Stress response                        0.86  0.15
Protein metabolism and modification    0.85  0.10
Cell motility                          0.84  0.11
Cell surface rec linked signal transd  0.82  0.15
Lipid metabolism                       0.81  0.14
Transport                              0.79  0.17
Cell organization and biogenesis       0.79  0.11
Cell proliferation                     0.79  0.06
Amino acid and derivative metabolism   0.69  0.06
AVERAGE                                0.88  0.09

A: Coverage: 84%, Precision: 50%
B: Coverage: 71%, Precision: 60%
C: Coverage: 39%, Precision: 90%

Coverage = TP/(TP+FN)
Precision = TP/(TP+FP)
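The two measures can be computed from sets of predicted and annotated assignments, as in the sketch below. Counting over (gene, process) pairs is an assumption, and the gene names and assignments in the example are hypothetical.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # (gene, process)


def coverage_and_precision(predicted: Set[Pair], annotated: Set[Pair]) -> Tuple[float, float]:
    """Coverage = TP/(TP+FN), Precision = TP/(TP+FP), counted over pairs."""
    tp = len(predicted & annotated)  # predicted and annotated
    fp = len(predicted - annotated)  # predicted but not annotated
    fn = len(annotated - predicted)  # annotated but not predicted
    coverage = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return coverage, precision


# Hypothetical annotations and predictions, for illustration only.
annotated = {
    ("geneA", "transport"),
    ("geneA", "defense response"),
    ("geneB", "cell cycle control"),
}
predicted = {
    ("geneA", "transport"),
    ("geneB", "cell cycle control"),
    ("geneB", "transport"),
    ("geneB", "vision"),
}

cov, prec = coverage_and_precision(predicted, annotated)
print(round(cov, 2), round(prec, 2))  # 0.67 0.5
```

The example has TP = 2, FP = 2 and FN = 1, which shows how the two measures can diverge for the same set of predictions.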

Page 23: Learning rule-based models from gene expression time  profiles annotated with Gene Ontology terms

Cross validation estimates Cho et al.

Process                                GO          AUC   SE
apoptosis*                             GO:0006915  0.81  0.01
carbohydrate metabolism                GO:0005975  0.72  0.02
cell adhesion*                         GO:0007155  0.77  0.02
cell cycle control*                    GO:0000074  0.83  0.01
cell motility*                         GO:0006928  0.81  0.01
cell proliferation                     GO:0008283  0.80  0.01
cell surface rec linked signal transd  GO:0007166  0.79  0.01
cell-cell signaling                    GO:0007267  0.80  0.01
DNA metabolism                         GO:0006259  0.78  0.02
energy pathways                        GO:0006091  0.76  0.02
humoral immune response                GO:0006959  0.77  0.02
immune response                        GO:0006955  0.81  0.01
intracellular signaling cascade        GO:0007242  0.81  0.02
lipid metabolism                       GO:0006629  0.71  0.02
mesoderm development                   GO:0007498  0.77  0.02
mitotic cell cycle*                    GO:0000278  0.84  0.01
neurogenesis                           GO:0007399  0.78  0.01
oncogenesis                            GO:0007048  0.77  0.01
phototransduction                      GO:0007602  0.85  0.01
physiological processes                GO:0007582  0.77  0.01
protein biosynthesis                   GO:0006412  0.80  0.02
protein metabolism and modification    GO:0006411  0.77  0.01
protein amino acid phosphorylation     GO:0006468  0.82  0.01
proteolysis and peptidolysis           GO:0006508  0.80  0.02
transcription                          GO:0006350  0.71  0.01
transport                              GO:0006810  0.71  0.01
vision                                 GO:0007601  0.83  0.01
AVERAGE                                            0.78  0.01
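Assuming the AUC column denotes the area under the ROC curve, the value for one process can be computed as the probability that a gene annotated to the process receives a higher score than a gene that is not, with ties counted as one half. The sketch below uses that pairwise definition; the scores are hypothetical (they could be, for example, normalized vote fractions), and this is not claimed to be the procedure used to produce the table.

```python
from typing import Sequence


def auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positive and
    negative scores; a tie contributes one half."""
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical scores for genes with and without a given annotation.
print(round(auc([0.9, 0.6, 0.4], [0.5, 0.3, 0.1]), 3))  # 0.889
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of annotated from non-annotated genes.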

Coverage: 58%, Precision: 61%

Coverage = TP/(TP+FN)
Precision = TP/(TP+FP)

Page 24: Learning rule-based models from gene expression time  profiles annotated with Gene Ontology terms

Protein Metabolism and Modification

[Figure: panels A to E]

A – annotations
B – false negatives
C – false positives
D – true positives
E – pred. unknown gene

Page 25: Learning rule-based models from gene expression time  profiles annotated with Gene Ontology terms

Re-classification of the Known Genes

Page 26: Learning rule-based models from gene expression time  profiles annotated with Gene Ontology terms

Co-classifications for the Unknown Genes

Page 27: Learning rule-based models from gene expression time  profiles annotated with Gene Ontology terms

Conclusions

• Our methodology
– Incorporates background biological knowledge
– Handles well the noise and incompleteness in the microarray data
– Can be objectively evaluated
– Predicts multiple functions per gene
– Can reclassify known genes and provide possible new functions of the known genes
– Can provide hypotheses about the function of unknown genes

• Experimental work needs to be done to confirm our predictions

Page 28: Learning rule-based models from gene expression time  profiles annotated with Gene Ontology terms

Genomic ROSETTA: http://www.idi.ntnu.no/~aleks/rosetta