Complex Networks and Data Mining on genetic databases

Upload: luiz-max-carvalho

Post on 25-Jan-2015

DESCRIPTION

This is my presentation at WaFiS last year, on the use of complex networks for data mining in genetic databases.

TRANSCRIPT

Page 1: Complex Networks and Data Mining on genetic databases

Knowledge Discovery in Databases through Complex Networks: application to phylodynamics

Luiz Max F. de Carvalho

Scientific Computing Programme (PROCC), Fiocruz

Pan American Center for Foot-and-Mouth Disease (PAHO/WHO)

WaFiS 2012

Knowledge Discovery in Databases (KDD)

Complex Networks

Example 1: Chitin pathway phylogeny

Example 2: Foot-and-mouth disease virus in South America

Who’s this guy?

Knowledge Discovery in Databases through Complex Networks: application to phylodynamics

Luiz Max F. de Carvalho

Scientific Computing Programme (PROCC), Fiocruz

Pan American Center for Foot-and-Mouth Disease (PAHO/WHO)

WaFiS 2012

September 28, 2012

Page 2: Complex Networks and Data Mining on genetic databases

Outline

1 Knowledge Discovery in Databases (KDD)

2 Complex Networks

3 Example 1: Chitin pathway phylogeny

4 Example 2: Foot-and-mouth disease virus in South America

Page 3: Complex Networks and Data Mining on genetic databases

Knowledge Discovery in Databases (KDD)

Lots of data

The human brain has a very limited processing capacity

Information → Knowledge

Increasing amount of molecular data (sequences, 3D structures, antigenicity, ...)

Is it possible to explore these databases to discover useful stuff?

Page 4: Complex Networks and Data Mining on genetic databases

Well... Let’s see

Page 5: Complex Networks and Data Mining on genetic databases

[We may use] Complex Networks

Graphs → G = (V, E)

Page 6: Complex Networks and Data Mining on genetic databases

Yeah, but how?

We can explore the "dynamic signature" of these Complex Networks, i.e., study and compare their structural properties. Some useful formulas:

Clustering coefficient: $\langle c \rangle = \dfrac{3 \times \#\text{triangles}}{\#\text{triples}}$

Degree distribution (cumulative): $P_K = \sum_{K'=K}^{\infty} p_{K'}$

Diameter: $\max_{i,j} d(i, j)$
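A minimal sketch of the three measures listed above, assuming an undirected, unweighted network stored as a 0/1 adjacency matrix (a list of lists). The function names and the toy graph are illustrative and not part of the talk; the diameter is taken over reachable pairs only, which is an assumption for disconnected networks.

```python
from collections import deque


def clustering_coefficient(adj):
    """Global clustering coefficient: 3 * #triangles / #connected triples."""
    closed = 0   # closed triples; every triangle is counted once per centre node
    triples = 0  # connected triples (paths of length two), counted by centre node
    for centre in range(len(adj)):
        nbrs = [v for v, linked in enumerate(adj[centre]) if linked]
        k = len(nbrs)
        triples += k * (k - 1) // 2
        for a in range(k):
            for b in range(a + 1, k):
                if adj[nbrs[a]][nbrs[b]]:
                    closed += 1
    return closed / triples if triples else 0.0


def cumulative_degree_distribution(adj):
    """P_K for K = 0..max degree: fraction of nodes with degree >= K."""
    n = len(adj)
    degrees = [sum(row) for row in adj]
    return [sum(1 for d in degrees if d >= k) / n
            for k in range(max(degrees) + 1)]


def shortest_path_lengths(adj, source):
    """Breadth-first search distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v, linked in enumerate(adj[u]):
            if linked and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter(adj):
    """Largest shortest-path length over all reachable pairs (assumption)."""
    return max(max(shortest_path_lengths(adj, s).values())
               for s in range(len(adj)))


# Toy graph (hypothetical): a triangle 0-1-2 with a pendant node 3 attached to 2.
toy = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
print(clustering_coefficient(toy))
print(cumulative_degree_distribution(toy))
print(diameter(toy))
```

Counting closed triples by their centre node gives three per triangle, which is where the factor 3 in the formula comes from.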

Page 7: Complex Networks and Data Mining on genetic databases

OK, let’s work then

1 Grab n sequences;

2 Create an n × n matrix using some kind of (normalized) distance (say, S);

3 For each σ ∈ [0, 1] build M(σ) such that:

$$m_{ij}(\sigma) = \begin{cases} 1 & \text{if } S_{ij} > \sigma, \\ 0 & \text{if } S_{ij} < \sigma. \end{cases}$$

In a sense, we are transforming a single network into a family of networks.
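The thresholding step can be sketched as follows, assuming S is already normalized to [0, 1] and stored as a list of lists. The function names, the evenly spaced grid of σ values, the exclusion of self-loops, and the choice to map ties (S_ij equal to σ) to 0 are all assumptions made for illustration; the slides do not specify them.

```python
def threshold_network(S, sigma):
    """Binary adjacency matrix M(sigma): m_ij = 1 where S_ij > sigma.

    Assumptions: the diagonal is ignored (no self-loops) and
    entries equal to sigma are treated as 0.
    """
    n = len(S)
    return [[1 if i != j and S[i][j] > sigma else 0 for j in range(n)]
            for i in range(n)]


def network_family(S, steps=10):
    """One network per threshold on an evenly spaced grid over [0, 1]."""
    sigmas = [k / steps for k in range(steps + 1)]
    return {sigma: threshold_network(S, sigma) for sigma in sigmas}


# Hypothetical 3 x 3 matrix S, for illustration only.
S = [
    [0.0, 0.9, 0.2],
    [0.9, 0.0, 0.6],
    [0.2, 0.6, 0.0],
]
family = network_family(S, steps=4)
for sigma, M in family.items():
    print(sigma, M)
```

Each value of σ yields one member of the family, so any structural index can then be tracked as a function of σ.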

Page 8: Complex Networks and Data Mining on genetic databases

Analysis

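We shall explore the relationships between these networks. First, define a higher-order neighborhood indicator function: binarize the adjacency matrix with regard to the path length $\ell$, so that $M(\ell)$ marks the pairs of nodes at distance $\ell$, and obtain the matrix $M = \sum_{\ell=1}^{D} \ell\, M(\ell)$, with $D$ the network diameter. Then

$$\delta(\alpha, \beta) = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \left( \frac{m_{ij}(\alpha)}{D(\alpha)} - \frac{m_{ij}(\beta)}{D(\beta)} \right) \tag{1}$$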

Evaluating δ(σ, σ + ∆σ) can give some interesting insights.
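A rough sketch of the neighborhood matrix and of equation (1), assuming undirected 0/1 adjacency matrices of equal size. Because each pair of nodes has a single shortest-path length, the weighted sum of the per-length indicator matrices is just the matrix of shortest-path lengths, with 0 for unreachable pairs. The function names are illustrative. The sum is implemented as it appears in the transcript, where signed differences can cancel; the `squared` flag is an added assumption for comparison, not something the slides state. Treating an edgeless network (diameter 0) as all zeros is also an assumption.

```python
from collections import deque


def neighborhood_matrix(adj):
    """Return (matrix of shortest-path lengths, diameter) for a 0/1 adjacency matrix."""
    n = len(adj)
    lengths = [[0] * n for _ in range(n)]
    for source in range(n):
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for v, d in dist.items():
            lengths[source][v] = d
    diameter = max(max(row) for row in lengths)
    return lengths, diameter


def scaled(matrix, diameter):
    """Divide every entry by the diameter; an edgeless network maps to zeros."""
    if diameter == 0:
        return [[0.0 for _ in row] for row in matrix]
    return [[value / diameter for value in row] for row in matrix]


def delta(adj_a, adj_b, squared=False):
    """Equation (1) as transcribed; `squared=True` squares each term (assumption)."""
    a = scaled(*neighborhood_matrix(adj_a))
    b = scaled(*neighborhood_matrix(adj_b))
    n = len(adj_a)
    total = 0.0
    for i in range(n):
        for j in range(n):
            diff = a[i][j] - b[i][j]
            total += diff * diff if squared else diff
    return total / n ** 2


# Hypothetical pair of networks on three nodes: a path and a triangle.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(delta(path, triangle))
print(delta(path, triangle, squared=True))
```

Applied to consecutive members of the thresholded family, `delta` would give the δ(σ, σ + ∆σ) curve mentioned on this slide.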

Page 9: Complex Networks and Data Mining on genetic databases

Example 1: Chitin pathway phylogeny

Proteins related to the chitin metabolic pathway from 1605 complete genomes;

BLAST distances (which are asymmetric);

Search for phylogenetic relationships

Page 10: Complex Networks and Data Mining on genetic databases

Example 1: Some results

Page 11: Complex Networks and Data Mining on genetic databases

Example 1: Some more results

Page 12: Complex Networks and Data Mining on genetic databases

Example 1: The expected Network(s)

Page 13: Complex Networks and Data Mining on genetic databases

Example 2: Foot-and-mouth disease virus in South America

S was built with phylogenetic (TN93) distances for nucleotide (NT) sequences and JTT distances for amino acid (AA) sequences;

Try to make sense of a somewhat big data set (167 sequences);

Extract some nice patterns.

Page 14: Complex Networks and Data Mining on genetic databases

Indexes × σ

[Figure: panels (a) and (b)]

Page 15: Complex Networks and Data Mining on genetic databases

A nice network

Page 16: Complex Networks and Data Mining on genetic databases

Some more developments

Page 17: Complex Networks and Data Mining on genetic databases

Related Work

Identify transmission clusters (HIV, HCV) (Lewis et al., 2008, PLoS Medicine)

Explore scale-free behavior in phylodynamics (Shiino, 2012, Frontiers in Microbiology)

Page 18: Complex Networks and Data Mining on genetic databases

Future Directions

Explore the spatial aspect in the construction of S

Maybe S = µ + S(G)α

Power law analysis

Implement assortativity

Suggestions...

Page 19: Complex Networks and Data Mining on genetic databases

Thank You!