bioinformática aplicada al estudio del control de la expresión de genes en el cerebro humano
TRANSCRIPT
Juan A. Bo*a Ins-tute of Neurology, University College London, UK Facultad de Informá-ca, Universidad de Murcia, Spain
Algorithmic Approaches for the construc3on of gene co-‐expression networks from control brain 3ssue samples mRNA
RNA-‐seq Substan-a nigra and Putamen brain co-‐expression networks on the UKBEC project to study Parkinson’s Disease
03/04/17 Conferencias de Inves-gación para Posgrado, Fac. Informátca, UCM 2
The central dogma of biology
source Wikipedia
We use pre-‐mRNA
03/04/17 Conferencias de Inves-gación para Posgrado, Fac. Informátca, UCM 3
Chapter I. The dataset
03/04/17 Conferencias de Inves-gación para Posgrado, Fac. Informátca, UCM 4
Braineacv2, RNA-‐seq based, focused on Parkinson’s Disease
l Affects 1% to 2% of the population older than 65 years l Symptons: resting tremor, bradykinesia, rigidity and impairment in ability
to initiate and sustain movements l The hallmark of this disease is the progressive loss of dopaminergic
neurons, mainly in the substantia nigra
03/04/17 Conferencias de Inves-gación para Posgrado, Fac. Informátca, UCM 5
Chapter II. The computa-onal model
03/04/17 Conferencias de Inves-gación para Posgrado, Fac. Informátca, UCM 6
Network analysis: aprioris-c versus free approaches
03/04/17 Conferencias de Inves-gación para Posgrado, Fac. Informátca, UCM 7
Are networks something more than a fancy graph and nice plots?
Yes they are!! • Can be used to iden-fy the ac-ve pathways in specific samples (cases vs. controls)
• Describe subsystems (i.e. cell types) • Iden-fy candidate genes (GBA)
03/04/17 Conferencias de Inves-gación para Posgrado, Fac. Informátca, UCM 8
To create networks we need to es-mate links between genes
03/04/17 Conferencias de Inves-gación para Posgrado, Fac. Informátca, UCM 9
From gene expression to gene co-‐expression networks
TREM2 forms a receptor signaling complex with TYROBP, which triggers the ac-va-on of immune responses in macrophages and dendri-c cells, and the func-onal polymorphism of TREM2 is r e p o r t e d t o b e a s s o c i a t e d w i t h neurodegenera-ve disorders such as Alzheimer’s disease (AD).
03/04/17 Conferencias de Inves-gación para Posgrado, Fac. Informátca, UCM 10
From gene expression to gene co-‐expression networks
TREM2 forms a receptor signaling complex with TYROBP, which triggers the ac-va-on of immune responses in macrophages and dendri-c cells, and the func-onal polymorphism of TREM2 is r e p o r t e d t o b e a s s o c i a t e d w i t h neurodegenera-ve disorders such as Alzheimer’s disease (AD).
03/04/17 Conferencias de Inves-gación para Posgrado, Fac. Informátca, UCM 11
From gene expression to gene co-‐expression networks
TREM2 forms a receptor signaling complex with TYROBP, which triggers the ac-va-on of immune responses in macrophages and dendri-c cells, and the func-onal polymorphism of TREM2 is r e p o r t e d t o b e a s s o c i a t e d w i t h neurodegenera-ve disorders such as Alzheimer’s disease (AD).
TYROBP TREM2 0.76
03/04/17 Conferencias de Inves-gación para Posgrado, Fac. Informátca, UCM 12
From gene expression to gene co-‐expression networks
TREM2 forms a receptor signaling complex with TYROBP, which triggers the ac-va-on of immune responses in macrophages and dendri-c cells, and the func-onal polymorphism of TREM2 is r e p o r t e d t o b e a s s o c i a t e d w i t h neurodegenera-ve disorders such as Alzheimer’s disease (AD).
TYROBP TREM2 0.76
03/04/17 Conferencias de Inves-gación para Posgrado, Fac. Informátca, UCM 13
But before reaching that
• Scale free topology assump-on – The degree distribu-on p(k) of a network follows a power law so p(k) ~ k-‐ϒ
– Evidence supports this for many organisms (ϒ is approx. 2.2)
03/04/17 Conferencias de Inves-gación para Posgrado, Fac. Informátca, UCM 14
But before reaching that (& 2)
• Modularity assump-on – Varia-on coefficient of organisms, Ci = ︎ 2n/ki(ki – 1) with n number of direct links connec-ng the ki nearest neighbours of i-‐th node, suggests strong modular organiza-on
– Evidence suggests the coefficient of varia-on is higher than expected in SFT networks
03/04/17 Conferencias de Inves-gación para Posgrado, Fac. Informátca, UCM 15
But before reaching that (& 3)
• Hierarchies solve this apparent dilemma
03/04/17 Conferencias de Inves-gación para Posgrado, Fac. Informátca, UCM 16
Chapter III. The problem
03/04/17 Conferencias de Inves-gación para Posgrado, Fac. Informátca, UCM 17
Our main focus: Parkison's Disease l Affects 1% to 2% of the population older than 65 years l Symptons: resting tremor, bradykinesia, rigidity and impairment in ability
to initiate and sustain movements l The hallmark of this disease is the progressive loss of dopaminergic
neurons, mainly in the substantia nigra excitatory
inhibitory
Substantia Nigra Pars Compacta
Brain regions most typically affected by adult-onset disease
03/04/17
Conferencias de Inves-gación para Posgrado, Fac. Informátca, UCM 18
Step 1: RPKM exonic gene quantification and CQN normalization Step 2: RPKM-CQN > 0.2 & missingness < 70% Step 3: Data correcting for Sex, Age and 7/8 Peer axes Step 4: WGCNA “signed” network construction Step 5: k-Means optimization of module partitions Step 6: Network assessment Step 7: Within tissue and between tissues subsystem characterization
33670 Ensembl genes
Approx. 19K genes, two datasets
Two corrected datasets
SNIG and PUTM networks And gene modules assignment
Modified gene modules assignment for SNIG and
PUTM Quality metrics for networks and
Gene partitions
Functional characterization, correlation with traits, gene
function prediction
Steps on the pipeline Outcomes
03/04/17 Conferencias de Inves-gación para Posgrado, Fac. Informátca, UCM 19
Co-expression analysis methodology
03/04/17 Conferencias de Inves-gación para Posgrado, Fac. Informátca, UCM 20
A measure of similarity between genes, values in [0,1]
From similarity to adjacency, hard thresholding
From similarity to adjacency, sou thresholding
From adjacency to TOM values
03/04/17
Conferencias de Inves-gación para Posgrado, Fac. Informátca, UCM 21
From TOM values to clusters by 1-‐TOM as a distance
complete linkage hierarchical approach for clustering summarisa-on based on eigenvalue
03/04/17
Conferencias de Inves-gación para Posgrado, Fac. Informátca, UCM 22
l Hierarchical clustering's results are highly variable depending on linkage (max/complete, min/single, average linkages)
l Module membership (MM) of g is the correlation of g and
the 1st PC of gene expression (module eigengene)
l This doesn't necessarily mean all genes are in the best module according to MM
l Previous approaches based on reassigning some/all genes l k-means algorithm helps finding a better partition in which
genes are (hopefully) assigned to a module in a more natural way
Why do we need an optimization process for WGCNA
03/04/17
Conferencias de Inves-gación para Posgrado, Fac. Informátca, UCM 23
A k-‐means heuris-c How does it work?
03/04/17 Conferencias de Inves-gación para Posgrado, Fac. Informátca, UCM 24
03/04/17 Conferencias de Inves-gación para Posgrado, Fac. Informátca, UCM 25
03/04/17 Conferencias de Inves-gación para Posgrado, Fac. Informátca, UCM 26
03/04/17 Conferencias de Inves-gación para Posgrado, Fac. Informátca, UCM 27
03/04/17 Conferencias de Inves-gación para Posgrado, Fac. Informátca, UCM 28
03/04/17 Conferencias de Inves-gación para Posgrado, Fac. Informátca, UCM 29
03/04/17 Conferencias de Inves-gación para Posgrado, Fac. Informátca, UCM 30
Outline of the op-miza-on
Accepted in BCM Systems Biology 03/04/17
Conferencias de Inves-gación para Posgrado, Fac. Informátca, UCM 31
Chapter IV. The results
03/04/17 Conferencias de Inves-gación para Posgrado, Fac. Informátca, UCM 32
What we get from the optimization • More accurate par--on construc-on • Bever func-on annota-on for modules • Bever cell markers enrichment • More preserved modules across similar -ssues
03/04/17
Conferencias de Inves-gación para Posgrado, Fac. Informátca, UCM 33
How to assess the accuracy of a co-expression network
cluster driven validation
data driven validation by replication
Are the gene groups good according to a
given index
same tissue similar tissue
same network model
diff. network model
Biology: Does my module
make sense?
functional characterization
03/04/17
Conferencias de Inves-gación para Posgrado, Fac. Informátca, UCM 34
How to assess the accuracy of a co-expression network
cluster driven validation
data driven validation by replication
Are the gene groups good according to a
given index
same tissue similar tissue
same network model
diff. network model
Biology: Does my module
make sense?
functional characterization
03/04/17
Conferencias de Inves-gación para Posgrado, Fac. Informátca, UCM 35
Replication in GTEx GNAT networks for Substantia Nigra
lightgreenmidnightblue
cyantan
turquoisegrey60
lightyellowgreen
pinkblue
magentapurpleyellow
redblack
lightcyanbrown
salmongreenyellow
Mantel fold SNIG GTEx coexpression within0.
0
0.5
1.0
1.5
2.0
2.5
3.0
*** 340*** 412
*** 449*** 385
*** 574** 295
*** 250*** 427*** 457*** 783*** 505
*** 417*** 579
*** 521 244
* 410 260 88
372
redpurple
magentaturquoise
blueyellow
lightyellowcyan
lightcyantan
greengrey60
midnightbluelightgreen
pinkbrown
greenyellowblack
salmon
Mantel fold SNIG microarray binary between
0.0
0.5
1.0
1.5
2.0
*** 701*** 624
*** 760*** 837
*** 1070*** 837*** 365*** 624*** 653*** 460
*** 658 477 475
417 743 406 579 402
149
03/04/17
Conferencias de Inves-gación para Posgrado, Fac. Informátca, UCM 36
Replication in GTEx GNAT networks for Putamen
lightcyan
grey60
yellow
salmon
greenyellow
pink
green
black
tan
brown
purple
magenta
turquoise
lightgreen
midnightblue
blue
cyan
Mantel fold PUTM GTEx coexpression within
0.0
0.5
1.0
1.5
2.0
2.5
3.0
*** 429
*** 275
* 72
*** 372
*** 268
*** 444
*** 486
*** 484
*** 541
*** 611
*** 574
*** 617
*** 546
*** 440
*** 461
** 386
*** 759
greenyellow
salmon
lightcyan
brown
green
pink
grey60
cyan
tan
magenta
black
lightgreen
purple
turquoise
midnightblue
blue
yellow
Mantel fold PUTM GTEx binary between0.
0
0.5
1.0
1.5
2.0
2.5
*** 268
*** 372
*** 429
*** 611
*** 486
*** 444
*** 275
*** 759
*** 541
*** 617
*** 484
*** 440
*** 574
*** 546
*** 461
*** 386
72
03/04/17
Conferencias de Inves-gación para Posgrado, Fac. Informátca, UCM 37
How to assess the accuracy of a co-expression network
cluster driven validation
data driven validation by replication
Are the gene groups good according to a
given index
same tissue similar tissue
same network model
diff. network model
Biology: Does my module
make sense?
functional characterization
03/04/17
Conferencias de Inves-gación para Posgrado, Fac. Informátca, UCM 38
Asignment of biological func3on to modules with gProfiler
• Based on GO (BP, MF, CC) and gProfileR • Fisher's exact test and Bonferroni corrected p-‐values • What should we expect?
• Normal cell processes like respira-on, cell development, immune func-on
• But also brain related terms (hopefully movement disorders, signalling) in some of the modules
• What should we consider when looking for enrichment? • GO is not a closed world ontology • Something not found doesn't imply it doesn't exist
• Genes can play new roles • Groups of genes can have new func-ons • It is possible to find modules with no GO and s-ll be valid
03/04/17 Conferencias de Inves-gación para Posgrado, Fac. Informátca, UCM 39
Significant similarities in practically all modules
This is a tabular View of significant agreements (Fisher's Exact test) on genes between modules from the two tissues
03/04/17 Conferencias de Inves-gación para Posgrado, Fac. Informátca, UCM 40
Subsystems
03/04/17 Conferencias de Inves-gación para Posgrado, Fac. Informátca, UCM 41
Subsystems cell type & function
Neuron cells, Synapse/NADH
Microglia cells, Immune system
Nucleus, transcription
Neuron, astrocytes & microglia cell types Response to stimulus
Endothelial cell type, Cell division
Oligodendrocytes cell type, synapse & ion transport
Mitochondrion
Cytosolic rybosome
Ubiqutin 03/04/17 Conferencias de Inves-gación para Posgrado, Fac. Informátca, UCM 42
Lessons learned l The default WGCNA can be improved to get more
coherent gene groups l Network analysis reveals
l cell specific subsystems in putamen and substantia nigra
l Interesting differences between the two tissues at the subsystem level
Ongoing work l Models to explain the differences between subsystems l Function prediction for non coding species and intergenic
regions
03/04/17 Conferencias de Inves-gación para Posgrado, Fac. Informátca, UCM 43
03/04/17
Acknowledgements
University College London
Jana Vandrovcova Sebastian Guelfi Karishma D'sha John Hardy Mar Matarin Daniah Trabzuni
King's College London
Mike Weale Mina Ryten Paola Forabosco Adai Ramasamy
Conferencias de Inves-gación para Posgrado, Fac. Informátca, UCM 44