    Integrating Biologic and Clinical Data towards Resolving Heterogeneity in Childhood Inflammatory Diseases

    by

    Andrey Mikhaylov

    A thesis submitted in conformity with the requirements for the degree of Master of Science

    Department of Immunology

    University of Toronto

    © Copyright by Andrey Mikhaylov 2016

    Abstract

    Kawasaki disease (KD) is the leading cause of acquired heart disease in children in the developed world, with up to a 25% risk of developing aneurysms if untreated. Diagnosis relies on a set of classical clinical symptoms, which fail to capture the heterogeneity in KD. One solution is to incorporate new biomarkers and expanded biologic datasets to generate new predictive models that can better discern homogeneous groups of patients. Using Similarity Network Fusion (SNF), a novel computational technique, we uncovered 3 robust clusters of patients after fusing gene expression and clinical datasets for 171 KD patients. The first cluster consists of older females with marked activation of the innate immune response; the second consists of patients with prolonged fever and markers of activation of the adaptive response; and the third consists of males with no lymphadenopathy and a less severe innate immune response. SNF identified clinically meaningful clusters of patients and is a promising new tool for future KD studies.

    Acknowledgments

    I would like to express my deepest gratitude to my supervisor, Dr. Rae Yeung, for all her guidance and support during my time in the lab. I would also like to thank my committee members, Dr. Pamela Ohashi, Dr. Anna Goldenberg, and Dr. Shannon Dunn, for providing invaluable feedback on my thesis project. A huge thanks to Dr. Trang Duong for being patient with me and helping me at every step of this journey. Lastly, I am very grateful to have met every single member of the Yeung lab. Thank you for all the help and the fun times!

    Table of Contents

    Acknowledgments
    Table of Contents
    List of Tables
    List of Figures
    List of Abbreviations
    1 Introduction
        1.1 Kawasaki Disease - overview and epidemiology
            1.1.1 Overview
            1.1.2 Incidence rates
            1.1.3 Seasonal outbreaks
        1.2 Kawasaki Disease - Diagnosis and Treatment
            1.2.1 Clinical symptoms
            1.2.2 Laboratory tests
            1.2.3 Extra-cardiac findings
            1.2.4 Cardiac findings
            1.2.5 KD Treatment
            1.2.6 AHA Diagnostic criteria sensitivity and specificity
            1.2.7 Risk scoring systems
        1.3 Etiology
            1.3.1 Immune response
            1.3.2 Environmental triggers
        1.4 Translational studies
            1.4.1 Linkage analysis
            1.4.2 Genome-wide association studies (GWAS)
            1.4.3 Gene expression
        1.5 Post-translational studies in children
            1.5.1 Candidate gene approach
            1.5.2 ITPKC and CASP3
            1.5.3 FCGR2A
            1.5.4 MHCII, CD40, and BLK
            1.5.5 Summary of findings
        1.6 Computational Analysis
            1.6.1 Introduction
            1.6.2 Data aggregation
            1.6.3 Approach to computational analysis
            1.6.4 Similarity network fusion
            1.6.5 Gene enrichment analysis
            1.6.6 Feature selection and classifiers
            1.6.7 Heterogeneity in KD
            1.6.8 Rationale
            1.6.9 Hypothesis and objectives
    2 Methods
        2.1 KD Cohort
        2.2 Gene expression microarray
        2.3 Datasets
        2.4 Computational analysis workflow
        2.5 Data pre-processing
        2.6 Similarity network fusion
        2.7 Gene enrichment analysis
        2.8 Co-clustering probability
        2.9 Statistical analysis
        2.10 FeaLect feature selection
    3 Results
        3.1 KD cohort and data pre-processing
        3.2 Three unique clusters were identified after aggregation of clinical and gene expression datasets with SNF
        3.3 High robustness and low clinical feature sensitivity amongst the 3 clusters
