case study: characterizing diseased states from expression/regulation data

Case Study: Characterizing Diseased States from Expression/Regulation Data

Tuck et al., BMC Bioinformatics, 2006.

Background

● How do we classify processes/expression related to disease/phenotype (separating signal/data)?

● How do we use all of the data available to us – sequences, expression, regulation?

● Present case study of acute leukemia and breast cancer (normal vs. diseased cells).

Summary of Contributions

● Construct sample-specific regulatory networks.

● Identify links between transcription factors and regulated genes that differentiate healthy states from diseased states.

● Generalize to simultaneous changes in the functionality of multiple regulatory links, either pointing to a single gene or emanating from a single TF.

Summary of Contributions

● Examine distances in transcriptional networks for subsets of genes that characterize the diseased state.

● Observe that genes that optimally classify samples are concentrated in network neighborhoods.

● Genes that are deregulated in diseased states exhibit high connectivity.

● TF-regulated gene links and centrality of genes can be used to characterize diseased cells.

Background

● Current work largely focuses on identification of individual differentially expressed genes, or co-regulated gene sets.

● There is significant work on module identification (graph models, SVD, connected components, etc.)

● There is work on expression patterns of genes that can classify tumor types.

● There is also some prior work on transcription networks [TRANSFAC/CREME].

Constructing Disease Cell Networks

● Intersect a connectivity network, representing TF binding to gene promoter regions, with co-expression networks, representing co-expression of TFs and their target genes.

● Use TRANSFAC to relate known TF binding sites to promoter regions of genes and known TF-target gene interactions.

● For the data derived from each microarray (sample or patient), construct a co-expression network in which each TF-gene pair is assigned +1 or -1 based on up/down co-regulation.


● The intersection of the connectivity network and an individual co-expression network gives a condition-specific (CS) regulatory network.

● CS networks are derived from 6 gene expression studies using 3 types of datasets – normal cell lineages, tumor vs. normal tissues, and disease-specific tumors associated with variable clinical outcomes.

● 4821 genes and 196 TFs on early Affymetrix arrays, and 13363 genes and 233 TFs on newer arrays.
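The construction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: expression is taken to be already discretized to +1 (up) or -1 (down) per gene, and a TF-gene pair gets +1 when the two states agree and -1 when they disagree. The slides do not give the exact assignment rule, and the names `coexpression_network`, `condition_specific_network`, and the toy data are hypothetical.

```python
from typing import Dict, Set, Tuple

Link = Tuple[str, str]  # (transcription factor, target gene)


def coexpression_network(states: Dict[str, int], tfs: Set[str]) -> Dict[Link, int]:
    """Label every TF-gene pair +1 (states agree) or -1 (states disagree).

    `states` maps each gene or TF to +1 (up) or -1 (down) in one sample.
    The agreement rule is an assumption made for illustration.
    """
    network: Dict[Link, int] = {}
    for tf in sorted(tfs):
        if tf not in states:
            continue  # TF not measured on this array
        for gene, state in states.items():
            if gene != tf:
                network[(tf, gene)] = 1 if states[tf] == state else -1
    return network


def condition_specific_network(
    connectivity: Set[Link], coexpr: Dict[Link, int]
) -> Dict[Link, int]:
    """Keep only co-expression links that also appear in the connectivity network."""
    return {link: sign for link, sign in coexpr.items() if link in connectivity}


# Toy data (hypothetical names, not taken from the study).
connectivity = {("TF1", "G1"), ("TF1", "G2"), ("TF2", "G3")}
sample_states = {"TF1": 1, "TF2": -1, "G1": 1, "G2": -1, "G3": -1, "G4": 1}

cs_network = condition_specific_network(
    connectivity, coexpression_network(sample_states, {"TF1", "TF2"})
)
```

In this toy sample, links such as TF1 to G4 are co-expressed but lack binding-site support in the connectivity network, so they are dropped from the CS network.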

Classifying Based on Network Features

● Assume that each disease sample has a distinct regulatory network (pattern of activated links that gives rise to its expression profile).

● Examine how different aspects of network structure characterize different phenotypes.


Link Based Approach

● Examine differences between patient samples by analyzing the activity status of regulatory links.

● Construct networks unique to patients.

● Yields complete discriminatory networks.
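One way to picture the link-based comparison is to score each TF-gene link by how differently it behaves in two groups of samples. The sketch below assumes each sample network is a dictionary from (TF, gene) to +1 or -1, treats a link that is absent from a sample as 0, and uses the difference in mean link value as the score. That score and the function name `link_scores` are illustrative choices, not the method from the paper.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]  # (transcription factor, target gene)
Network = Dict[Link, int]


def link_scores(
    networks: List[Network], labels: List[str], class_a: str, class_b: str
) -> Dict[Link, float]:
    """Score each link by mean value in class_a minus mean value in class_b.

    Assumes both classes have at least one sample; absent links count as 0.
    """
    group_a = [n for n, y in zip(networks, labels) if y == class_a]
    group_b = [n for n, y in zip(networks, labels) if y == class_b]
    all_links = set().union(*[n.keys() for n in networks])
    scores: Dict[Link, float] = {}
    for link in all_links:
        mean_a = sum(n.get(link, 0) for n in group_a) / len(group_a)
        mean_b = sum(n.get(link, 0) for n in group_b) / len(group_b)
        scores[link] = mean_a - mean_b
    return scores


# Toy sample networks (hypothetical data).
networks = [
    {("TF1", "G1"): 1},
    {("TF1", "G1"): 1},
    {("TF1", "G1"): -1},
    {("TF1", "G1"): -1, ("TF2", "G2"): 1},
]
labels = ["ALL", "ALL", "AML", "AML"]

scores = link_scores(networks, labels, "ALL", "AML")
# Links with the largest absolute score differ most between the two groups.
ranked = sorted(scores, key=lambda link: (-abs(scores[link]), link))
```

Here the TF1 to G1 link flips sign between the two groups and ranks first, while the TF2 to G2 link appears in only one sample and scores much lower.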


Degree Based Approach

● “Centrality” of individual genes in networks.

● Degree – number of TFs activating or suppressing a particular gene (in degree), or number of genes regulated by a single TF (out degree).

● Use the genome-wide degree profile – identifying nodes with the largest changes in centrality (rewiring) will assist us in detecting hotspots.
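The degree definitions above translate directly into counting. The following sketch computes in degree and out degree for one network and then lists the nodes whose total degree changes most between two networks. Summing in and out degree and using the absolute difference as the measure of rewiring are assumptions for illustration; the function names and toy networks are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Link = Tuple[str, str]  # (transcription factor, target gene)


def degree_profile(links: Iterable[Link]) -> Tuple[Counter, Counter]:
    """Return (in_degree, out_degree) counters for a set of TF-gene links."""
    in_degree: Counter = Counter()
    out_degree: Counter = Counter()
    for tf, gene in links:
        out_degree[tf] += 1   # genes regulated by this TF
        in_degree[gene] += 1  # TFs regulating this gene
    return in_degree, out_degree


def degree_changes(
    net_a: Dict[Link, int], net_b: Dict[Link, int]
) -> List[Tuple[str, int]]:
    """Rank nodes by absolute change in total degree between two networks."""
    in_a, out_a = degree_profile(net_a)
    in_b, out_b = degree_profile(net_b)
    nodes = set(in_a) | set(out_a) | set(in_b) | set(out_b)
    changes = {
        node: abs((in_a[node] + out_a[node]) - (in_b[node] + out_b[node]))
        for node in nodes
    }
    # Largest change first; ties broken by name for a stable order.
    return sorted(changes.items(), key=lambda item: (-item[1], item[0]))


# Toy networks for two conditions (hypothetical data).
normal = {("TF1", "G1"): 1, ("TF1", "G2"): 1, ("TF2", "G1"): -1}
diseased = {("TF2", "G1"): -1}

hotspots = degree_changes(normal, diseased)
```

In the toy example TF1 loses both of its links in the diseased network, so it heads the list as the most rewired node.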


Sample Classification

● Create regulatory networks for every sample and apply a classifier.

● Rank features to identify a set of TF-gene links.

● Use training sets to identify features and rank the links, genes, and node degrees that undergo the most substantial changes.

● Acute lymphoblastic leukemia vs. acute myeloid leukemia.

● Two different myeloid leukemia types.

● Different matched cell types (renal-cell carcinoma vs. normal).


Sample Classification

● Pass top links to train a basic classifier.

● Cross validate.
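A minimal sketch of this pipeline follows. The slides do not name the classifier, so a nearest-centroid rule stands in for the "basic classifier", and leave-one-out is used as the cross-validation scheme; both are assumptions. Links are ranked on the training samples only within each fold, so the held-out sample never influences feature selection. All function names and the toy data are hypothetical.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]  # (transcription factor, target gene)
Network = Dict[Link, int]


def rank_links(networks: List[Network], labels: List[str], k: int) -> List[Link]:
    """Return the k links whose mean value differs most between two classes.

    Assumes exactly two classes, each present in `labels`; absent links count as 0.
    """
    class_a, class_b = sorted(set(labels))
    links = sorted(set().union(*[n.keys() for n in networks]))

    def mean_value(link: Link, cls: str) -> float:
        values = [n.get(link, 0) for n, y in zip(networks, labels) if y == cls]
        return sum(values) / len(values)

    links.sort(key=lambda l: -abs(mean_value(l, class_a) - mean_value(l, class_b)))
    return links[:k]


def predict(train: List[Network], labels: List[str], sample: Network, k: int) -> str:
    """Nearest-centroid prediction using the top-k links from the training set."""
    links = rank_links(train, labels, k)
    best_label, best_dist = "", float("inf")
    for cls in sorted(set(labels)):
        members = [n for n, y in zip(train, labels) if y == cls]
        centroid = [sum(n.get(l, 0) for n in members) / len(members) for l in links]
        dist = sum((sample.get(l, 0) - c) ** 2 for l, c in zip(links, centroid))
        if dist < best_dist:
            best_label, best_dist = cls, dist
    return best_label


def leave_one_out_accuracy(networks: List[Network], labels: List[str], k: int) -> float:
    """Hold out each sample in turn, train on the rest, and report accuracy."""
    correct = 0
    for i, (sample, truth) in enumerate(zip(networks, labels)):
        train = networks[:i] + networks[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        correct += predict(train, train_labels, sample, k) == truth
    return correct / len(networks)


# Toy data (hypothetical): one informative link and one noisy link.
INFORMATIVE, NOISY = ("TF1", "G1"), ("TF2", "G2")
networks = [
    {INFORMATIVE: 1, NOISY: 1}, {INFORMATIVE: 1, NOISY: -1}, {INFORMATIVE: 1, NOISY: 1},
    {INFORMATIVE: -1, NOISY: 1}, {INFORMATIVE: -1, NOISY: -1}, {INFORMATIVE: -1, NOISY: -1},
]
labels = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]

accuracy = leave_one_out_accuracy(networks, labels, k=1)
```

Ranking inside each fold costs more than ranking once on all samples, but it avoids an optimistic bias in the cross-validated accuracy.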

Classification Techniques

Classification results
