presentationmiev0 - aalborg universitet · alzheimer's disease (ad) : the most common form of...
TRANSCRIPT
30/08/09 2
Alzheimer's disease (AD): the most common form of dementia
26.6 million people worldwide had Alzheimer's (2006)
Number of patients worldwide increasing (fourfold by 2050)
Interest of the biomedical community in discovering the genes involved in AD development
Study of gene expression: information on the synthesis of a functional gene product
Microarrays: to compare the expression of thousands of genes in different tissues, cells or conditions
Making biomedical sense of microarray analyses is a big challenge because of the large amounts of data
Importance of data mining for discovering previously unknown knowledge from huge volumes of data
Adaptations
Genes
Microarrays
Intensity (expression) of a gene measured by a microarray
Huge density: the Affymetrix U-133 Plus 2.0 Array carries 54,675 probesets
DNA microarray technologies
Online biological knowledge databases and bibliographical resources
New knowledge
Making biological sense of all these data remains very challenging
Collaboration between
Objectives: To provide a process which enables experts to interpret transcriptomic data
Application: To decipher mechanisms of brain ageing and associated pathologies (Alzheimer's disease)
Data: Transcriptome of the temporal cortex of Microcebus murinus, measured with Affymetrix microarrays
Data Mining techniques
Sequential patterns
Clustering and visualisation techniques
Selected sequential patterns
Interpretation techniques
New biological knowledge
Example: <(G2)(G1 G5)(G3)> means that G2 has a lower expression than genes G1 and G5, whose expressions are close to each other and lower than that of G3
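The encoding described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the intensity values and the `tolerance` rule (two neighbouring genes share an itemset when their intensities differ by at most `tolerance`) are hypothetical and are not taken from the slides.

```python
from typing import Dict, List, Tuple

Itemset = Tuple[str, ...]


def expression_to_sequence(intensities: Dict[str, float], tolerance: float) -> List[Itemset]:
    """Order genes by increasing intensity and group close neighbours.

    Assumption: a gene joins the current itemset when its intensity differs
    from the previous gene's intensity by at most `tolerance`.
    """
    ordered = sorted(intensities.items(), key=lambda kv: (kv[1], kv[0]))
    groups: List[List[str]] = []
    previous = None
    for gene, value in ordered:
        if previous is not None and value - previous <= tolerance:
            groups[-1].append(gene)
        else:
            groups.append([gene])
        previous = value
    return [tuple(sorted(group)) for group in groups]


def format_sequence(sequence: List[Itemset]) -> str:
    """Render a sequence in the <(G2)(G1 G5)(G3)> notation."""
    return "<" + "".join("(" + " ".join(items) + ")" for items in sequence) + ">"


# Made-up intensities for one microarray, chosen for illustration only.
m1 = {"G1": 5.0, "G2": 2.0, "G3": 8.0, "G4": 11.0, "G5": 5.2}
print(format_sequence(expression_to_sequence(m1, tolerance=0.5)))
```

With these made-up values, G1 and G5 end up in the same itemset because their intensities are within the tolerance, while the other genes each form their own itemset.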
Microarray    Gene expression sequence
M1            <(G2)(G1 G5)(G3)(G4)>
M2            <(G2)(G1 G5)(G4)(G3)>
M3            <(G2)(G4)(G1 G5)(G3)>
M4            <(G2)(G3)(G1 G5)(G4)>
Sequential pattern: <(G2)(G1 G5)(G3)>
Terminology for <(G2)(G1 G5)(G3)>: each gene (e.g. G2) is an item, each group of genes in parentheses (e.g. (G1 G5)) is an itemset, and the ordered list of itemsets is a sequence
Support of <(G2)(G1 G5)(G3)>: 3/4 (found in M1, M2 and M3; in M4, G3 comes before (G1 G5))
DBSAP Algorithm [Salle et al., AIME 2009]
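The support value above can be checked with a short Python sketch. It only counts how many microarray sequences contain the pattern; it is not the DBSAP algorithm, and the helper names (`seq`, `contains`, `support`) are hypothetical.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]


def seq(*itemsets: str) -> List[Itemset]:
    """Build a sequence from strings such as "G1 G5" (one string per itemset)."""
    return [frozenset(s.split()) for s in itemsets]


def contains(sequence: List[Itemset], pattern: List[Itemset]) -> bool:
    """True if the pattern's itemsets appear in order, each included in an itemset."""
    position = 0
    for itemset in sequence:
        if position < len(pattern) and pattern[position] <= itemset:
            position += 1
    return position == len(pattern)


def support(database: List[List[Itemset]], pattern: List[Itemset]) -> float:
    """Fraction of sequences in the database that contain the pattern."""
    return sum(contains(s, pattern) for s in database) / len(database)


# The four gene expression sequences from the slide.
microarrays = {
    "M1": seq("G2", "G1 G5", "G3", "G4"),
    "M2": seq("G2", "G1 G5", "G4", "G3"),
    "M3": seq("G2", "G4", "G1 G5", "G3"),
    "M4": seq("G2", "G3", "G1 G5", "G4"),
}
pattern = seq("G2", "G1 G5", "G3")
print(support(list(microarrays.values()), pattern))  # 0.75
```

M4 is the only sequence that does not contain the pattern, because G3 appears before (G1 G5) there.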
Results: Discriminant sequential patterns for various supports
Frequent for a biological class (young adults)
Not frequent for the complementary class (aged animals)
Too numerous (between 100 and 185,240 patterns)
Difficult to interpret
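The notion of a discriminant pattern used above can be sketched as a simple filter on per-class supports. The two thresholds, the function name and the support values below are assumptions made for illustration; the slides do not state how "frequent" and "not frequent" are thresholded.

```python
from typing import Dict, List, Tuple


def discriminant_patterns(
    supports: Dict[str, Tuple[float, float]],
    min_support: float,
    max_support_other: float,
) -> List[str]:
    """Keep patterns frequent in the target class and infrequent in the other.

    `supports` maps a pattern to (support in target class, support in other class).
    """
    return [
        name
        for name, (target, other) in supports.items()
        if target >= min_support and other <= max_support_other
    ]


# Toy support values (young adults, aged animals), invented for illustration.
toy_supports = {
    "<(G1)(G2 G3)>": (0.75, 0.25),
    "<(G2 G3)(G1)>": (0.75, 0.75),
    "<(G4)(G1)>": (0.50, 0.00),
}
print(discriminant_patterns(toy_supports, min_support=0.75, max_support_other=0.25))
```

Lowering `min_support` lets more patterns through, which is one way to see why the number of discriminant patterns varies so widely with the chosen support.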
S75%=<(G1)(G2 G3)>
S'75%=<(G2 G3)(G1)>
Similarity measure [Saneifar et al., AusDM’08]
Method of hierarchical clustering [Nin Guerero et al., CBMS’08]
S75%,25%=<(G1)(G2 G3)>
S75%,25%=<(G2 G3)(G1)>
Because of the quantity of patterns and the depth of the hierarchy, these results are not easily understandable and actionable by experts
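To illustrate how a similarity measure feeds a hierarchical clustering of patterns, here is a minimal Python sketch. It does not reproduce the cited similarity measure or clustering method: as a stand-in, similarity is the length of the longest common subsequence of itemsets divided by the length of the longer pattern, and the clustering is a plain average-linkage agglomeration. All names are hypothetical, and patterns are assumed to be non-empty.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

Pattern = Tuple[FrozenSet[str], ...]


def pattern(*itemsets: str) -> Pattern:
    """Build a pattern from strings such as "G2 G3" (one string per itemset)."""
    return tuple(frozenset(s.split()) for s in itemsets)


def similarity(a: Pattern, b: Pattern) -> float:
    """Stand-in measure: LCS of itemsets, normalised by the longer pattern."""
    rows, cols = len(a) + 1, len(b) + 1
    lcs = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            if a[i - 1] == b[j - 1]:
                lcs[i][j] = lcs[i - 1][j - 1] + 1
            else:
                lcs[i][j] = max(lcs[i - 1][j], lcs[i][j - 1])
    return lcs[-1][-1] / max(len(a), len(b))


def cluster(patterns: List[Pattern]) -> List[Tuple[List[int], List[int], float]]:
    """Average-linkage agglomerative clustering; returns the merge history."""

    def linkage(pair: Tuple[List[int], List[int]]) -> float:
        x, y = pair
        scores = [similarity(patterns[i], patterns[j]) for i in x for j in y]
        return sum(scores) / len(scores)

    clusters = [[i] for i in range(len(patterns))]
    history = []
    while len(clusters) > 1:
        best = max(combinations(clusters, 2), key=linkage)
        x, y = best
        history.append((x, y, linkage(best)))
        clusters = [c for c in clusters if c is not x and c is not y] + [x + y]
    return history


s = pattern("G1", "G2 G3")
s_prime = pattern("G2 G3", "G1")
s_other = pattern("G1", "G2 G3", "G4")  # hypothetical third pattern
print(similarity(s, s_prime))
for left, right, score in cluster([s, s_prime, s_other]):
    print(left, right, round(score, 3))
```

The two patterns from the slide contain the same genes but in a different order, so an order-aware measure scores them well below 1. The merge history is the hierarchy that a visualisation step would then have to present to the experts.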
Collaboration with the company PIKKO
http://www.lirmm.fr/tatoo/spip.php?page=prototypes
Results: S75=<(MRVI1)(PGAP1)(PLA2R1)(A2M)(GSK3B)>
Those proteins might be involved in signalling or metabolism
Some of them interfere with Alzheimer's disease cellular events
What about the other knowledge available online?
Example: querying KEGG's Web services
To find all the genes associated with a particular gene in a pathway
To find all the diseases related to a particular gene through a pathway
…
Objectives: validation + search for novelties
Search for documents associated with 1..n genes of a pattern
Identification of the synonyms of the genes in GO
With 2 genes, 73% of the queries return fewer than 15 documents
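The document search described above can be sketched as the generation of one query per subset of genes in a pattern, with each gene expanded by its synonyms. The boolean query syntax, the function names and the small synonym table are assumptions for illustration; in the process described here the synonyms would come from GO, and the target search engine is not specified in the slides.

```python
from itertools import combinations
from typing import Dict, Iterator, List


def gene_clause(gene: str, synonyms: Dict[str, List[str]]) -> str:
    """OR together a gene symbol and its known synonyms."""
    names = [gene] + synonyms.get(gene, [])
    return "(" + " OR ".join(f'"{name}"' for name in names) + ")"


def build_queries(genes: List[str], synonyms: Dict[str, List[str]], size: int) -> Iterator[str]:
    """Yield one AND query for every subset of `size` genes of the pattern."""
    for subset in combinations(genes, size):
        yield " AND ".join(gene_clause(gene, synonyms) for gene in subset)


# Genes of the pattern S75=<(MRVI1)(PGAP1)(PLA2R1)(A2M)(GSK3B)>.
pattern_genes = ["MRVI1", "PGAP1", "PLA2R1", "A2M", "GSK3B"]
# Illustrative synonym table, hand-written here rather than retrieved from GO.
illustrative_synonyms = {
    "A2M": ["alpha-2-macroglobulin"],
    "GSK3B": ["glycogen synthase kinase 3 beta"],
}

for query in build_queries(pattern_genes, illustrative_synonyms, size=2):
    print(query)
```

Queries on pairs of genes are the case quantified above; raising `size` yields more specific queries over the same pattern.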
S75%,25%=<(G1)(G2 G3)> Texts
By discovering new, biologically significant knowledge from transcriptomic data, we pave the way for promising research in both data mining and biology.
Improve each step of this process
Other types of patterns (fuzzy patterns)
Other clustering methods and Treemap visualisation for the groups
Generalise these methods to other types of massive data (genomic data)
Use of sequential patterns for prediction tasks