complex networks and data mining on genetic databases
DESCRIPTION
This is my presentation at WaFiS last year, about the use of complex networks for data mining in genetic databases.
TRANSCRIPT
Knowledge Discovery in Databases through Complex Networks: application to phylodynamics
Luiz Max F. de Carvalho
Scientific Computing Programme (PROCC), Fiocruz
Pan American Center for Foot-and-Mouth Disease (PAHO/WHO)
WaFiS 2012
Knowledge Discovery in Databases (KDD)
Complex Networks
Example 1: Chitin pathway phylogeny
Example 2: Foot-and-mouth disease virus in South America
Who’s this guy?
Knowledge Discovery in Databases through Complex Networks: application to phylodynamics
Luiz Max F. de Carvalho
Scientific Computing Programme (PROCC), Fiocruz
Pan American Center for Foot-and-Mouth Disease (PAHO/WHO)
WaFiS 2012
September 28, 2012
Outline
1 Knowledge Discovery in Databases (KDD)
2 Complex Networks
3 Example 1: Chitin pathway phylogeny
4 Example 2: Foot-and-mouth disease virus in South America
Knowledge Discovery in Databases (KDD)
Lots of data
The human brain has very limited processing capacity
Information → Knowledge
Increasing amount of molecular data (sequences, 3D structures, antigenicity, ...)
Is it possible to explore these databases to discover useful stuff?
Well... Let’s see
[We may use] Complex Networks
Graphs → G = (V, E)
Yeah, but how?
We can explore the "dynamic signature" of these complex networks, i.e., study and compare their structural properties.
Some useful formulas:
Clustering coefficient: $\langle c \rangle = \dfrac{3 \times \#\text{triangles}}{\#\text{triples}}$
Degree distribution: $P_K = \sum_{K'=K}^{\infty} p_{K'}$
Diameter: $\max_{i,j} d(i, j)$
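The three quantities above can be sketched in a few lines of plain Python. This is only an illustration, not the code used for the talk: the function names and the toy graph are hypothetical, the graph is assumed to be undirected and simple, P_K is read as the cumulative degree distribution (fraction of nodes with degree at least K), and the diameter is assumed to be taken over reachable pairs only.

```python
from collections import deque

def transitivity(adj):
    """3 * #triangles / #triples; adj maps each node to the set of its neighbours."""
    closed, triples = 0, 0
    for nbrs in adj.values():
        nbrs = sorted(nbrs)
        for a in range(len(nbrs)):
            for b in range(a + 1, len(nbrs)):
                triples += 1                      # one connected triple centred here
                if nbrs[b] in adj[nbrs[a]]:
                    closed += 1                   # each triangle is seen 3 times overall
    return closed / triples if triples else 0.0

def cumulative_degree_distribution(adj):
    """P_K = sum over K' >= K of p_K', returned as a dict {K: P_K}."""
    degrees = [len(nbrs) for nbrs in adj.values()]
    n = len(degrees)
    return {k: sum(1 for d in degrees if d >= k) / n
            for k in range(max(degrees) + 1)}

def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def diameter(adj):
    """max d(i, j) over reachable pairs (assumption: unreachable pairs are ignored)."""
    return max(max(bfs_distances(adj, v).values()) for v in adj)

# Hypothetical toy graph: a triangle 0-1-2 with a pendant node 3 attached to 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(transitivity(toy), cumulative_degree_distribution(toy), diameter(toy))
```

Counting triples per centre node means every triangle contributes three closed triples, which is where the factor 3 in the formula comes from.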
OK, let’s get to work then
1 Grab n sequences;
2 Create an n × n matrix using some kind of (normalized) distance (say, S);
3 For each σ ∈ [0, 1] build M(σ) such that:
$$
m_{ij}(\sigma) =
\begin{cases}
1 & \text{if } S_{ij} > \sigma, \\
0 & \text{if } S_{ij} < \sigma.
\end{cases}
$$
In a sense, we are transforming a single network into a family of networks.
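The thresholding step can be sketched as follows. This is a minimal illustration under stated assumptions, not the original implementation: `threshold_network`, `network_family`, `edge_count` and the toy matrix `S_toy` are hypothetical names, the diagonal is excluded (no self-loops), and the tie case S_ij = σ, which the rule above leaves open, is mapped to 0.

```python
def threshold_network(S, sigma):
    """Binary adjacency matrix M(sigma): 1 where S[i][j] > sigma, else 0.

    Assumptions: no self-loops, and ties (S[i][j] == sigma) give 0.
    """
    n = len(S)
    return [[1 if i != j and S[i][j] > sigma else 0 for j in range(n)]
            for i in range(n)]

def network_family(S, steps=10):
    """List of (sigma, M(sigma)) pairs for sigma = 0, 1/steps, ..., 1."""
    return [(k / steps, threshold_network(S, k / steps))
            for k in range(steps + 1)]

def edge_count(M):
    """Number of undirected edges in a symmetric 0/1 matrix."""
    return sum(sum(row) for row in M) // 2

# Hypothetical normalized matrix for three sequences.
S_toy = [[0.0, 0.9, 0.2],
         [0.9, 0.0, 0.5],
         [0.2, 0.5, 0.0]]

for sigma, M in network_family(S_toy):
    print(sigma, edge_count(M))
```

Because the rule keeps an edge only while S_ij exceeds σ, the edge count can only stay the same or drop as σ grows, so the family moves from a dense network to an empty one.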
Analysis
We shall explore the relationships between these networks.
First, define a higher-order neighborhood indicator function, binarizing the adjacency matrix with regard to the path length $\ell$, to obtain a matrix $M = \sum_{\ell=1}^{D} \ell\, M(\ell)$. Then
$$
\delta(\alpha, \beta) = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \left( \frac{m_{ij}(\alpha)}{D(\alpha)} - \frac{m_{ij}(\beta)}{D(\beta)} \right) \tag{1}
$$
Evaluating $\delta(\sigma, \sigma + \Delta\sigma)$ can give some interesting insights.
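A rough sketch of this computation is given below. It reads the entry m_ij of M as the shortest-path length between i and j and D as the largest such length, and it follows equation (1) exactly as it appears here, i.e. as a signed sum with no square or absolute value. The function names and toy matrices are hypothetical; setting unreachable pairs to 0 and treating an edgeless network (D = 0) as an all-zero normalized matrix are assumptions of the sketch.

```python
from collections import deque

def neighborhood_matrix(M):
    """Return (Mhat, D) for a 0/1 adjacency matrix M.

    Mhat[i][j] is the shortest-path length between i and j
    (0 on the diagonal and, by assumption, for unreachable pairs);
    D is the largest path length found.
    """
    n = len(M)
    Mhat = [[0] * n for _ in range(n)]
    for s in range(n):
        dist = {s: 0}
        queue = deque([s])
        while queue:                      # breadth-first search from s
            v = queue.popleft()
            for w in range(n):
                if M[v][w] and w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        for t, d in dist.items():
            Mhat[s][t] = d
    D = max(max(row) for row in Mhat)
    return Mhat, D

def delta(M_alpha, M_beta):
    """Signed sum of equation (1) for two networks on the same N nodes."""
    n = len(M_alpha)
    A, Da = neighborhood_matrix(M_alpha)
    B, Db = neighborhood_matrix(M_beta)
    total = 0.0
    for i in range(n):
        for j in range(n):
            a = A[i][j] / Da if Da else 0.0   # assumption: edgeless network -> 0
            b = B[i][j] / Db if Db else 0.0
            total += a - b
    return total / n ** 2

# Hypothetical toy networks on three nodes: a path 0-1-2 and a triangle.
M_path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
M_tri = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(delta(M_path, M_tri))
```

In practice one would evaluate `delta` on consecutive members of the thresholded family, M(σ) and M(σ + Δσ), and look for values of σ where it changes sharply.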
Example 1: Chitin pathway phylogeny
Proteins related to the chitin metabolic pathway from 1605 complete genomes;
BLAST distances (which are asymmetric);
Search for phylogenetic relationships
Example 1: Some results
Example 1: Some more results
Example 1: The expected Network(s)
Example 2: Foot-and-mouth disease virus in South America
S was built with phylogenetic (TN93) distances for NT and JTT distances for AA;
Try to make sense of a somewhat big data set (167 seqs);
Extract some nice patterns;
Indexes × σ [figure with panels (a) and (b)]
A nice network
Some more developments
Related Work
Identify transmission clusters (HIV, HCV) (Lewis et al., 2008, PLoS Medicine)
Explore scale-free behavior in phylodynamics (Shiino, 2012, Frontiers in Microbiology)
Future Directions
Explore the spatial aspect in the construction of S
Maybe S = µ + S(G)α
Power law analysis
Implement assortativity
Suggestions...
Thank You!