A graph-theoretic modeling on GO space for biological interpretation of gene clusters

Sung Geun Lee, Jung Uk Hur and Yang Seok Kim
Bioinformatics Unit, ISTECH Inc.; Cancer Metastasis Research Center, Yonsei University College of Medicine
Presenter: 張家榮

Post on 20-Jan-2016


Page 1: A graph-theoretic modeling on GO space for biological interpretation of gene clusters

A graph-theoretic modeling on GO space for biological interpretation of gene clusters

Bioinformatics Unit, ISTECH Inc.
Cancer Metastasis Research Center, Yonsei University College of Medicine

Sung Geun Lee, Jung Uk Hur and Yang Seok Kim

Presenter: 張家榮

Page 2: A graph-theoretic modeling on GO space for biological interpretation of gene clusters

Introduction

Gene Ontology (GO)
– A controlled vocabulary shared by various genomic databases covering diverse species

Clusters of microarray data
– Each cluster contains several genes

Extracting GO terms for a gene cluster
– Each gene has several corresponding GO terms

Purpose
– To discover the biological meaning of each cluster

Page 3: A graph-theoretic modeling on GO space for biological interpretation of gene clusters
Page 4: A graph-theoretic modeling on GO space for biological interpretation of gene clusters

Metric structure of GO tree

Page 5: A graph-theoretic modeling on GO space for biological interpretation of gene clusters

Lowest common ancestor

Given a non-empty subset U of nodes, v is a common ancestor of U if every node in U lies in the subtree rooted at v. A common ancestor v0 is a lowest common ancestor (LCA) of U if the level of v0 is greater than or equal to the level of w for every common ancestor w of U.
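The definition can be illustrated with a minimal Python sketch. It assumes the GO tree is stored as a child-to-parent map; the toy node names and the helpers `path_to_root` and `lowest_common_ancestor` are illustrative and do not come from the slides.

```python
# Hypothetical toy tree (not real GO terms): child -> parent, the root has parent None.
PARENT = {
    "root": None,
    "a": "root", "b": "root",
    "a1": "a", "a2": "a",
    "a1x": "a1",
}

def path_to_root(v):
    """Return v followed by all of its ancestors, ending at the root."""
    path = [v]
    while PARENT[v] is not None:
        v = PARENT[v]
        path.append(v)
    return path

def lowest_common_ancestor(nodes):
    """Return the deepest node whose subtree contains every node in `nodes`."""
    nodes = list(nodes)
    if not nodes:
        raise ValueError("U must be non-empty")
    # Common ancestors are the nodes that appear on every path to the root.
    common = set(path_to_root(nodes[0]))
    for v in nodes[1:]:
        common &= set(path_to_root(v))
    # Walking upward from the first node, the first common ancestor met is the lowest.
    for candidate in path_to_root(nodes[0]):
        if candidate in common:
            return candidate

print(lowest_common_ancestor(["a1x", "a2"]))  # a
print(lowest_common_ancestor(["a1x", "b"]))   # root
```

A node counts as its own ancestor here, matching the wording "on a subtree having v as the root".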

Page 6: A graph-theoretic modeling on GO space for biological interpretation of gene clusters

Lowest common ancestor

Page 7: A graph-theoretic modeling on GO space for biological interpretation of gene clusters

Principal distance

Each level has its own weight
– W: IH -> R+
– W(i) > W(i+1)
– For example: W(k) = 150 - 10(k-1)

[Formula for the principal distance between v1 and v2 not preserved], where w0 is the LCA of v1 and v2
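A hedged sketch of how such a distance could be computed follows. The defining formula is missing from this transcript, so the code assumes that the principal distance of two distinct nodes is the weight of the level of their LCA, that a node is at distance 0 from itself, and that the root sits on level 1. The toy tree and all function names are hypothetical.

```python
# Hypothetical toy tree (not real GO terms): child -> parent, the root has parent None.
PARENT = {
    "root": None,
    "a": "root", "b": "root",
    "a1": "a", "a2": "a",
    "a1x": "a1",
}

def weight(k):
    """Example weight from the slide: W(k) = 150 - 10(k-1), decreasing with depth."""
    return 150 - 10 * (k - 1)

def level(v):
    """Level of a node, assuming the root is level 1."""
    k = 1
    while PARENT[v] is not None:
        v = PARENT[v]
        k += 1
    return k

def lca(v1, v2):
    """Lowest common ancestor of two nodes in the tree."""
    seen = set()
    v = v1
    while v is not None:
        seen.add(v)
        v = PARENT[v]
    v = v2
    while v not in seen:
        v = PARENT[v]
    return v

def principal_distance(v1, v2):
    """ASSUMED definition: weight of the LCA's level, and 0 for identical nodes."""
    if v1 == v2:
        return 0
    return weight(level(lca(v1, v2)))

print(principal_distance("a1x", "a2"))  # 140 (LCA is "a", level 2)
print(principal_distance("a1x", "b"))   # 150 (LCA is the root, level 1)
print(principal_distance("a1", "a1"))   # 0
```

Because W(i) > W(i+1), two terms whose LCA lies deeper in the tree come out closer under this assumption.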

Page 8: A graph-theoretic modeling on GO space for biological interpretation of gene clusters

Principal distance

[Figure: principal distance example on the GO tree; values shown: 40, 30, 20, 10, 0]

Page 9: A graph-theoretic modeling on GO space for biological interpretation of gene clusters

Multiset

Mathematically, the three sets {1}, {1, 1} and {1, 1, 1} are equal in set notation.

Yet we want to take the number of occurrences of each element into account.

Such a set is called a multiset.
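As a small illustration, Python's `collections.Counter` behaves like a multiset. The GO code labels below are placeholders, not real GO terms.

```python
from collections import Counter

# Hypothetical GO codes collected from the genes of one cluster.
codes = ["t1", "t1", "t1", "t2"]

as_set = set(codes)           # duplicates collapse
as_multiset = Counter(codes)  # occurrences are counted

print(set([1]) == set([1, 1]) == set([1, 1, 1]))  # True
print(Counter([1]) == Counter([1, 1]))            # False
print(len(as_set))                                # 2
print(as_multiset["t1"])                          # 3
```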

Page 10: A graph-theoretic modeling on GO space for biological interpretation of gene clusters

MaxPd and AverPd

Given a multiset G = {v1, v2, ..., vn}

Page 11: A graph-theoretic modeling on GO space for biological interpretation of gene clusters

MaxPd and AverPd

MaxPd
– Gives the comprehensive biological meaning of a gene cluster
– Not flexible, but informs us of the existence of some functional outliers

AverPd
– Signifies the most frequent GO codes
– May yield more than one GO code
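The contrast between the two measures can be sketched in Python. The defining formulas did not survive in this transcript, so the code assumes that MaxPd(v, G) is the largest and AverPd(v, G) the mean principal distance from a candidate node v to the elements of the multiset G, with the principal distance taken as the weight of the LCA's level. The tree, the names and these definitions are all assumptions made for illustration.

```python
# Hypothetical toy tree (not real GO terms): child -> parent, the root has parent None.
PARENT = {
    "root": None,
    "a": "root", "b": "root",
    "a1": "a", "a2": "a",
    "a1x": "a1",
}

def path_to_root(v):
    path = [v]
    while PARENT[v] is not None:
        v = PARENT[v]
        path.append(v)
    return path

def pd(v1, v2):
    """ASSUMED principal distance: W(level of LCA) with W(k) = 150 - 10(k-1)."""
    if v1 == v2:
        return 0
    up2 = set(path_to_root(v2))
    w0 = next(a for a in path_to_root(v1) if a in up2)  # LCA of v1 and v2
    lca_level = len(path_to_root(w0))                   # root is level 1
    return 150 - 10 * (lca_level - 1)

def max_pd(v, G):
    """ASSUMED MaxPd: worst-case distance from v to the multiset G."""
    return max(pd(v, g) for g in G)

def aver_pd(v, G):
    """ASSUMED AverPd: mean distance from v to the multiset G (repeats count)."""
    return sum(pd(v, g) for g in G) / len(G)

G_clean = ["a1", "a1", "a1", "a2"]   # multiset stored as a list
G_outlier = G_clean + ["b"]          # one functional outlier added

print(min(max_pd(v, G_clean) for v in PARENT))    # 140
print(min(max_pd(v, G_outlier) for v in PARENT))  # 150
print(min(PARENT, key=lambda v: aver_pd(v, G_outlier)))  # a1
```

In this toy case a single outlier pushes the best achievable MaxPd up to the root's weight, while AverPd still selects the most frequent code.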

Page 12: A graph-theoretic modeling on GO space for biological interpretation of gene clusters

Algorithmic approach (1)

c[i,j] is the j-th GO code of the i-th gene
We consider ordered GO codes g[m], where 1 ≤ m ≤ α
α is a constant related to the input data
α ≤ Ω, where Ω is the total number of GO codes

Page 13: A graph-theoretic modeling on GO space for biological interpretation of gene clusters

Algorithmic approach (2)

MaxPd is used to find the LCA of C
Complexity: 3αn

Page 14: A graph-theoretic modeling on GO space for biological interpretation of gene clusters

MaxPd

[Figure: MaxPd example on the GO tree; values shown: 40, 30, 20, 10, 0 and 40, 40, 30, 40, 20]

Page 15: A graph-theoretic modeling on GO space for biological interpretation of gene clusters

Algorithmic approach (3)

AverPd is used to find an optimal GO code g[m0] such that the average distance between g[m0] and each gene in C is smaller than that of any other g[m]

Complexity: 3αn
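A rough sketch of this search is given below, under several assumptions that the slides do not confirm: GO codes are written as tuples giving the path from the root, the candidates g[m] are the distinct codes present in the cluster, the principal distance is the weight of the LCA's level, and a gene with several GO codes is compared with g[m] through its nearest code. All names are hypothetical.

```python
def weight(k):
    """Example weight from the slides: W(k) = 150 - 10(k-1)."""
    return 150 - 10 * (k - 1)

def pd(v1, v2):
    """ASSUMED principal distance between two path-encoded GO codes."""
    if v1 == v2:
        return 0
    shared = 0
    for x, y in zip(v1, v2):
        if x != y:
            break
        shared += 1
    # The LCA is the shared prefix; the root () is level 1.
    return weight(shared + 1)

def gene_distance(g, gene_codes):
    """ASSUMPTION: a gene is as close to g as its nearest GO code."""
    return min(pd(g, c) for c in gene_codes)

def best_code_by_aver_pd(cluster):
    """Return the candidate g[m0] with the smallest average distance to the genes."""
    candidates = sorted({c for gene in cluster for c in gene})  # ordered g[1..alpha]

    def aver(g):
        return sum(gene_distance(g, gene) for gene in cluster) / len(cluster)

    return min(candidates, key=aver)

# cluster[i][j] plays the role of c[i,j]: the j-th GO code of the i-th gene.
cluster = [
    [(1, 1), (2,)],
    [(1, 1)],
    [(1, 1, 1)],
    [(1, 2)],
]
print(best_code_by_aver_pd(cluster))  # (1, 1)
```

The loop visits every candidate once per gene, so its cost grows with αn, in line with the complexity stated on the slide up to the constant factor.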

Page 16: A graph-theoretic modeling on GO space for biological interpretation of gene clusters

AverPd

Page 17: A graph-theoretic modeling on GO space for biological interpretation of gene clusters

Discussion

Other algorithms consider GO term frequencies or compare specific GO term-related gene groups

In our modeling, the topological property of the GO hierarchy is used.

Page 18: A graph-theoretic modeling on GO space for biological interpretation of gene clusters

Utility

Biological assessment of the clustering results of DNA microarray data

Can be coupled with any clustering technique to predict the functional category of unknown genes

Applies not only to DNA microarray data but also to any kind of group analysis with an ontology that has the same structure as GO

Page 19: A graph-theoretic modeling on GO space for biological interpretation of gene clusters

Another approach

The length of a GO code is about log α
Take one number of the GO code each time

pseudo-code:
for 1 ≤ k ≤ log α
    cluster C by the kth number
    break if no cluster above n
    delete clusters under n
end
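The pseudo-code above can be turned into a short runnable sketch. It assumes each GO code is a tuple of numbers (the path from the root) and reads n as a minimum cluster size, so a cluster "above n" is taken to mean one holding at least n codes; both readings are assumptions, and the function name is hypothetical.

```python
from collections import defaultdict

def prefix_clusters(codes, n):
    """Refine clusters one code position at a time, keeping those with >= n codes."""
    clusters = {(): list(codes)}  # start with everything under the empty prefix
    max_len = max((len(c) for c in codes), default=0)
    for k in range(max_len):
        refined = defaultdict(list)
        # Cluster the surviving codes by their k-th number.
        for prefix, members in clusters.items():
            for code in members:
                if len(code) > k:  # codes that end at this prefix drop out
                    refined[prefix + (code[k],)].append(code)
        # Delete clusters under n; break if no cluster is large enough.
        kept = {p: m for p, m in refined.items() if len(m) >= n}
        if not kept:
            break
        clusters = kept
    return clusters

codes = [(1, 2, 3), (1, 2, 4), (1, 2, 3, 5), (1, 3)]
print(list(prefix_clusters(codes, 4)))  # [(1,)]
print(list(prefix_clusters(codes, 3)))  # [(1, 2)]
print(list(prefix_clusters(codes, 2)))  # [(1, 2, 3)]
```

With n equal to the number of codes, the surviving prefix is the longest prefix common to all of them, which corresponds to the LCA in this encoding; a smaller n refines the clusters further.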

Page 20: A graph-theoretic modeling on GO space for biological interpretation of gene clusters

MaxPd

Page 21: A graph-theoretic modeling on GO space for biological interpretation of gene clusters

Complexity

Worst case: O(n log α)
Best case: O(n)
Alternative
– Change the break condition to cluster in more detail

Page 22: A graph-theoretic modeling on GO space for biological interpretation of gene clusters

Disadvantages

Cannot obtain information that is not contained in GO, such as disease-related genes

GO terms on the same level may carry different levels of information

The GO hierarchy is dynamic and flexible