Case Study: Characterizing Diseased States from Expression/Regulation Data
Tuck et al., BMC Bioinformatics, 2006.
Background
● How do we classify processes/expression related to disease/phenotype (separating signal/data)?
● How do we use all of the data available to us – sequences, expression, regulation?
● Present case study of acute leukemia and breast cancer (normal vs. diseased cells).
Summary of Contributions
● Construct sample-specific regulatory networks.
● Identify links between transcription factors (TFs) and regulated genes that differentiate healthy states from diseased states.
● Generalize to simultaneous changes in the functionality of multiple regulatory links pointing to a regulated gene or emanating from one TF.
Summary of Contributions
● Examine distances in transcriptional networks for subsets of genes that characterize a diseased state.
● Observe that genes that optimally classify samples are concentrated in neighborhoods.
● Genes that are deregulated in diseased states exhibit high connectivity.
● TF-regulated gene links and centrality of genes can be used to characterize diseased cells.
Background
● Current work largely focuses on identification of individual differentially expressed genes, or co-regulated gene sets.
● There is significant work on module identification (graph models, SVD, connected components, etc.).
● There is work on expression patterns of genes that can classify tumor types.
● There was also some work on transcription networks prior to this study [TRANSFAC/CREME].
Constructing Disease Cell Networks
● Intersect the connectivity network, which represents TF binding to gene promoter regions, with co-expression networks, which represent TF-target gene co-expression.
● Use TRANSFAC to relate known TF binding sites to promoter regions of genes and to known TF-target gene interactions.
● For the data derived from each microarray (sample or patient), construct a co-expression network in which each TF-gene pair is assigned +1 or -1 based on up/down co-regulation.
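One way to picture the +1/-1 assignment described above is the sketch below. It assumes, purely for illustration, that a gene counts as "up" in a sample when its expression exceeds a per-gene reference level, and that a pair gets +1 when TF and gene move in the same direction and -1 otherwise; the slides do not state the actual rule. The names `expression`, `reference`, and `coexpression_sign` are hypothetical.

```python
def coexpression_sign(expression, reference, tf, gene):
    """Return +1 if the TF and the gene move in the same direction, else -1.

    ASSUMPTION: "up" means expression above a per-gene reference level.
    The source does not specify how up/down regulation is called.
    """
    tf_up = expression[tf] > reference[tf]
    gene_up = expression[gene] > reference[gene]
    return 1 if tf_up == gene_up else -1


# Toy expression profile for one sample and reference levels (made-up numbers).
expression = {"TF_A": 9.0, "GENE_1": 8.5, "GENE_2": 2.0}
reference = {"TF_A": 5.0, "GENE_1": 5.0, "GENE_2": 5.0}

# Candidate TF-gene pairs for this sample.
pairs = [("TF_A", "GENE_1"), ("TF_A", "GENE_2")]
sample_network = {
    pair: coexpression_sign(expression, reference, *pair) for pair in pairs
}
print(sample_network)
```

Storing the result as a dictionary keyed by (TF, gene) keeps one signed entry per pair, which makes per-sample networks easy to compare later.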
Constructing Disease Cell Networks
● Intersecting the connectivity network with each individual co-expression network gives condition-specific (CS) regulatory networks.
● CS networks were derived from 6 gene expression studies using 3 types of datasets: normal cell lineages, tumor vs. normal tissues, and disease-specific tumors associated with variable clinical outcomes.
● 4821 genes and 196 TFs on early Affymetrix arrays, and 13363 genes and 233 TFs on newer arrays.
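The intersection step can be sketched as a simple filter. This is a minimal illustration, assuming the connectivity network is a set of (TF, gene) pairs and each sample's co-expression network is a dictionary of signed pairs; both representations and the function name `condition_specific_network` are hypothetical choices, not the authors' data structures.

```python
def condition_specific_network(connectivity, coexpression):
    """Keep only the signed TF-gene links that are also in the connectivity network.

    connectivity: set of (tf, gene) pairs supported by binding-site evidence.
    coexpression: dict mapping (tf, gene) -> +1 or -1 for one sample.
    """
    return {
        link: sign for link, sign in coexpression.items() if link in connectivity
    }


# Toy inputs (made-up identifiers).
connectivity = {("TF_A", "GENE_1"), ("TF_A", "GENE_2"), ("TF_B", "GENE_3")}
coexpression = {
    ("TF_A", "GENE_1"): 1,
    ("TF_B", "GENE_3"): -1,
    ("TF_B", "GENE_1"): 1,  # co-expressed, but no known binding evidence
}

cs_network = condition_specific_network(connectivity, coexpression)
print(cs_network)
```

Running the same filter once per microarray yields one CS network per sample, all defined over the same connectivity backbone.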
Classifying based on network features.
● Assume that each disease sample has a distinct regulatory network (pattern of activated links that gives rise to its expression profile).
● Examine how different aspects of network structure characterize different phenotypes.
Classifying based on network features.
Link Based Approach
● Examine differences between patient samples by analyzing the activity status of regulatory links.
● Construct networks unique to patients.
● Yields complete discriminatory networks.
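As a rough illustration of the link-based comparison, the sketch below lists the links whose status differs between two patient networks. It assumes each network is a dictionary of signed links and treats a missing link as status 0 (inactive); that convention and the name `differing_links` are assumptions made for the example.

```python
def differing_links(net_a, net_b):
    """Return links whose status differs between two sample networks.

    ASSUMPTION: a link absent from a network has status 0 (inactive).
    The result maps each differing link to (status in a, status in b).
    """
    all_links = sorted(set(net_a) | set(net_b))
    return {
        link: (net_a.get(link, 0), net_b.get(link, 0))
        for link in all_links
        if net_a.get(link, 0) != net_b.get(link, 0)
    }


# Toy patient networks (made-up identifiers).
patient_1 = {("TF_A", "G1"): 1, ("TF_B", "G2"): -1}
patient_2 = {("TF_A", "G1"): -1, ("TF_B", "G2"): -1, ("TF_C", "G3"): 1}

diff = differing_links(patient_1, patient_2)
print(diff)
```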
Classifying based on network features.
Degree Based Approach
● "Centrality" of individual genes in networks.
● Degree: the number of TFs activating or suppressing a particular gene (in-degree), or the number of genes regulated by a single TF (out-degree).
● Use a genome-wide degree profile: identifying the nodes with the largest changes in centrality (rewiring) will assist us in detecting hotspots.
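The degree definitions above translate directly into counting. The sketch below computes in-degree and out-degree from a list of active (TF, gene) links and then reports the change in each node's degree between two networks. How a link is judged "active" is left outside the example, and the function names are hypothetical.

```python
from collections import Counter


def degree_profile(links):
    """Return (in_degree, out_degree) counters for a list of (tf, gene) links."""
    in_degree, out_degree = Counter(), Counter()
    for tf, gene in links:
        out_degree[tf] += 1   # genes regulated by this TF
        in_degree[gene] += 1  # TFs acting on this gene
    return in_degree, out_degree


def degree_changes(links_a, links_b):
    """Degree in network b minus degree in network a, per (kind, node)."""
    in_a, out_a = degree_profile(links_a)
    in_b, out_b = degree_profile(links_b)
    changes = {}
    for node in set(out_a) | set(out_b):
        changes[("out", node)] = out_b[node] - out_a[node]
    for node in set(in_a) | set(in_b):
        changes[("in", node)] = in_b[node] - in_a[node]
    return changes


# Toy networks (made-up identifiers).
normal = [("TF_A", "G1"), ("TF_A", "G2"), ("TF_A", "G3"), ("TF_B", "G3")]
disease = [("TF_A", "G1"), ("TF_B", "G1"), ("TF_B", "G2"), ("TF_B", "G3")]

changes = degree_changes(normal, disease)
# Nodes ordered by the size of their degree change; ties are broken by name.
hotspots = sorted(changes, key=lambda key: (-abs(changes[key]), key))
print(hotspots[:2])
```

In this toy case the two TFs swap roles, so they show the largest out-degree changes and come first in the hotspot ordering.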
Classifying based on network features.
Sample Classification
● Create regulatory networks for every sample and apply a classifier.
● Rank features to identify a set of TF-gene links.
● Use training sets to identify features and rank the links, genes, and node degrees that undergo the most substantial changes.
● Acute lymphoblastic leukemia vs. acute myeloid leukemia.
● Two different myeloid leukemia types.
● Different matched cell types (renal-cell carcinoma vs. normal).
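The feature-ranking step could look like the following sketch. The slides do not name the ranking criterion, so the example swaps in a simple stand-in: the absolute difference between the mean link status in the two classes, computed on training samples only. Missing links are treated as status 0, and `rank_links` is a hypothetical name.

```python
def rank_links(train_networks, train_labels):
    """Rank links by |mean status in class 1 - mean status in class 2|.

    ASSUMPTION: this scoring rule is a stand-in, not the criterion used in
    the source. A link absent from a sample is treated as status 0.
    """
    classes = sorted(set(train_labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")

    def class_mean(link, label):
        members = [n for n, y in zip(train_networks, train_labels) if y == label]
        return sum(n.get(link, 0) for n in members) / len(members)

    links = set()
    for network in train_networks:
        links.update(network)
    scores = {
        link: abs(class_mean(link, classes[0]) - class_mean(link, classes[1]))
        for link in links
    }
    # Highest score first; ties are broken by link name for reproducibility.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy training set (made-up identifiers and statuses).
train_networks = [
    {("TF_A", "G1"): 1, ("TF_B", "G2"): 1},
    {("TF_A", "G1"): 1, ("TF_B", "G2"): -1},
    {("TF_A", "G1"): -1, ("TF_B", "G2"): 1},
    {("TF_A", "G1"): -1, ("TF_B", "G2"): -1},
]
train_labels = ["ALL", "ALL", "AML", "AML"]

ranked = rank_links(train_networks, train_labels)
print(ranked)
```

Here the link TF_A to G1 separates the two classes perfectly and ranks first, while TF_B to G2 carries no class information and scores zero.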
Classifying based on network features.
Sample Classification
● Pass the top links to train a basic classifier.
● Cross validate.
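A minimal sketch of these two steps follows. The slide says only "a basic classifier", so a nearest-centroid classifier with leave-one-out cross-validation is used here as a stand-in. The list `top_links` is assumed to have been selected already, and all function names are hypothetical.

```python
def to_vector(network, links):
    """Encode a sample network as link statuses; a missing link counts as 0."""
    return [network.get(link, 0) for link in links]


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def predict(vector, centroids):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def squared_distance(label):
        return sum((v - c) ** 2 for v, c in zip(vector, centroids[label]))
    return min(sorted(centroids), key=squared_distance)


def leave_one_out_accuracy(vectors, labels):
    """Hold out each sample once, train on the rest, and score the prediction."""
    correct = 0
    for i, held_out in enumerate(vectors):
        train = [(v, y) for j, (v, y) in enumerate(zip(vectors, labels)) if j != i]
        centroids = {
            label: centroid([v for v, y in train if y == label])
            for label in {y for _, y in train}
        }
        correct += predict(held_out, centroids) == labels[i]
    return correct / len(vectors)


# Toy data (made-up identifiers and statuses).
top_links = [("TF_A", "G1"), ("TF_B", "G2")]
networks = [
    {("TF_A", "G1"): 1, ("TF_B", "G2"): 1},
    {("TF_A", "G1"): 1, ("TF_B", "G2"): -1},
    {("TF_A", "G1"): 1, ("TF_B", "G2"): 1},
    {("TF_A", "G1"): -1, ("TF_B", "G2"): 1},
    {("TF_A", "G1"): -1, ("TF_B", "G2"): -1},
    {("TF_A", "G1"): -1, ("TF_B", "G2"): -1},
]
labels = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]

vectors = [to_vector(network, top_links) for network in networks]
accuracy = leave_one_out_accuracy(vectors, labels)
print(accuracy)
```

For an unbiased estimate, the top links should be re-selected inside each fold from the training samples only; the sketch fixes them up front to stay short.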
Classification Techniques
Classification results