
Initial Steps Toward Computational Discovery of Genetic Regulatory Networks in Pancreatic Islet Development

Georg Gerber, PhD
Gifford Laboratory, MIT CSAIL
April 9, 2009


Outline

Goals
Expression data overview
TF-TF interaction networks
◦ pair-wise mutual information
◦ Bayesian networks
Gene expression programs
ChIP-seq data
Directions for future work


Biological goals of building a transcriptional regulatory network of pancreatic specification

Knowledge of distinct signaling/transcriptional steps involved in pancreatic specification
◦ Optimize ES differentiation by determining signaling event(s) directly inducing each sequential TF

What is the network structure? Linear or cross-regulatory, parallel or all interrelated
◦ Direct reprogramming using TFs would benefit from knowing hierarchy of each network
◦ Are TFs that play role in specification of pancreas necessary for later function of pancreas or are they merely required to properly induce other necessary TFs?

Can knowledge of the pancreatic specification network teach us about lineage diversification within the pancreas (endocrine, exocrine, duct)?


Immediate computational goals

Determine set of transcription factors active at different developmental stages

Discover network “wiring”

Determine how network changes/evolves throughout development

Compare in vivo and ESC networks



Expression data overview

[Figure: tissue samples profiled. Tissue labels: definitive endoderm (E7.75 and E8.75 as well); embryonic mesoderm; embryonic ectoderm/notochord; esophageal endoderm; lung endoderm; pancreatic endoderm (E10.5 as well); liver endoderm; stomach endoderm; intestinal endoderm. Stage labels: E8.25, E11.5]


[Figure labels: Tcf2, Foxa2; DMSO, 2 uM RA]

[Figure: mESC differentiation scheme. Labels: ES; Sox17 GFP+; 50 ng/mL ActA, 6 days; DMSO/2 uM RA, 6h/24h; FACS sort Sox17GFP+ Dpp4- definitive endoderm and perform microarray]

1. Implant bead coated with DMSO/RA into foregut of E8.25 (4-6 somite) embryo
2. Explant embryo anterior to 1st somite
3. Culture for 6/24 hours
4. Dissociate, sort for EpCAM+ endoderm
5. Amplify RNA and profile on Illumina Mouse Ref8 v2 chips


Expression data overview (cont.)

120 Illumina arrays (18118 genes/array)

72 distinct experiments (41 in mESC’s)

Standardized mESC/in vivo experiments separately

2758 genes w/ ≥ 2-fold change in ≥ 5 experiments

154 TFs w/ ≥ 2-fold change in ≥ 5 experiments (out of 946 “definite” or “candidate” TFs from TFCat, Fulton et al, Genome Biology 2009)
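
The "≥ 2-fold change in ≥ 5 experiments" filter can be sketched as below. This is a minimal illustration, not the analysis code behind the slide: it assumes the data are log2 ratios against some reference (the slide does not say which reference was used), counts up- and down-regulation alike, and the function name and defaults are hypothetical.

```python
import numpy as np

def responsive_genes(log2_ratio, min_fold=2.0, min_experiments=5):
    """Boolean mask over genes (rows) for a genes x experiments matrix.

    A gene passes if |log2 ratio| >= log2(min_fold) in at least
    min_experiments columns. Treating up- and down-regulation the
    same way is an assumption of this sketch.
    """
    hits = np.abs(log2_ratio) >= np.log2(min_fold)
    return hits.sum(axis=1) >= min_experiments

# Toy matrix: 3 genes x 6 experiments (made-up values).
toy = np.array([
    [1.0, 1.0, 1.0, 1.0, 1.0, 0.0],     # 5 experiments at exactly 2-fold
    [-2.0, -2.0, -2.0, -2.0, 0.5, 0.5],  # only 4 experiments pass
    [1.5, -1.5, 1.5, -1.5, 1.5, 0.0],    # 5 pass, mixed direction
])
mask = responsive_genes(toy)
```

Restricting the same mask to rows annotated as TFs would give the TF subset.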


Limitations of expression data for genetic network reconstruction

Need 100’s of varied experiments for finding relevant/significant networks

Association ≠ causation

High false positive rates (high dimensional, noisy, dependent data)

High false negative rates (low TF transcript abundance, post-transcriptional regulation, etc.)



Pair-wise mutual information networks (CLR)

Context Likelihood of Relatedness method: Faith et al., PLoS Biology 2007

Computes MI between all pairs of genes

Innovation: considers MI distribution for both target and source to compute p-values/estimate FDR
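
A rough sketch of the CLR idea: compute pairwise MI, then judge each MI value against the background MI distribution of both genes in the pair. Several things here are assumptions for illustration: a plain histogram MI estimator stands in for whatever estimator the published method uses, the bin count is arbitrary, and the conversion of scores to p-values/FDR is left out.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Histogram estimate of MI (in nats) between two expression profiles."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nz = pxy > 0                          # skip empty cells (0 * log 0 = 0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def clr_scores(expr, bins=8):
    """expr: genes x experiments array. Returns a symmetric score matrix."""
    n = expr.shape[0]
    mi = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            mi[i, j] = mi[j, i] = mutual_information(expr[i], expr[j], bins)
    # z-score every MI value against its row gene's MI distribution
    # (diagonal excluded); negative z-scores are clipped to zero.
    off_diag = ~np.eye(n, dtype=bool)
    mean = np.array([mi[i, off_diag[i]].mean() for i in range(n)])
    std = np.array([mi[i, off_diag[i]].std() for i in range(n)])
    z = np.maximum(0.0, (mi - mean[:, None]) / (std[:, None] + 1e-12))
    # combine the z-score seen from gene i with the one seen from gene j
    score = np.sqrt(z ** 2 + z.T ** 2)
    np.fill_diagonal(score, 0.0)
    return score
```

High scores mark pairs whose MI stands out for both genes, which is what suppresses edges to genes that have uniformly high MI with everything.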


CLR (cont.)


E8.25 4-6s definitive endoderm

TF-TF network (MI)


E8.75 13-15s definitive endoderm

TF-TF network (MI)


E9.5 definitive endoderm

TF-TF network (MI)


E10.5 pancreatic endoderm

TF-TF network (MI)


E11.5 pancreatic endoderm

TF-TF network (MI)


E11.5 intestinal endoderm

TF-TF network (MI)


6h 83 uM RA bead; mES 2 uM RA 6h

TF-TF network (MI)


24h 83 uM RA bead; mES 2 uM RA 24h

TF-TF network (MI)



Bayesian networks

Directed networks, allow for multiple parents

Encode conditional independence

Penalize complexity automatically

Software: Banjo (Alexander Hartemink, Duke University)
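
To illustrate "penalize complexity automatically": a Bayesian score is a marginal likelihood, so an extra parent that explains nothing lowers the score without any explicit penalty term. The sketch below computes a BDeu-style local score for one node on discretized data. It is not Banjo's code or API; the function name, the binary discretization, and the equivalent sample size of 1 are assumptions made for illustration.

```python
from collections import Counter
from math import lgamma

def bdeu_local_score(data, child, parents, arity=2, ess=1.0):
    """Log marginal likelihood of `child` given `parents` (BDeu prior).

    data: list of dicts mapping variable name -> discrete state in
    range(arity). Parent configurations never seen in the data
    contribute zero and are skipped.
    """
    q = arity ** len(parents)      # number of parent configurations
    a_j = ess / q                  # prior mass per parent configuration
    a_jk = a_j / arity             # prior mass per (configuration, state)
    joint = Counter()
    per_config = Counter()
    for row in data:
        cfg = tuple(row[p] for p in parents)
        joint[(cfg, row[child])] += 1
        per_config[cfg] += 1
    score = 0.0
    for cfg, n_j in per_config.items():
        score += lgamma(a_j) - lgamma(a_j + n_j)
        for k in range(arity):
            score += lgamma(a_jk + joint[(cfg, k)]) - lgamma(a_jk)
    return score

# Toy data: y copies x, z is exactly independent of x.
toy = [{"x": i % 2, "y": i % 2, "z": (i // 2) % 2} for i in range(200)]
gain_y = bdeu_local_score(toy, "y", ["x"]) - bdeu_local_score(toy, "y", [])
gain_z = bdeu_local_score(toy, "z", ["x"]) - bdeu_local_score(toy, "z", [])
```

In this toy case adding x as a parent of y raises the score, while adding it as a parent of z lowers it. A structure search sums such local scores over all nodes and compares candidate graphs.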


E8.25 4-6s definitive endoderm

TF-TF network (Bayes Net)


E8.75 13-15s definitive endoderm

TF-TF network (Bayes Net)


E9.5 definitive endoderm

TF-TF network (Bayes Net)


E10.5 pancreatic endoderm

TF-TF network (Bayes Net)


E11.5 pancreatic endoderm

TF-TF network (Bayes Net)


6h 83 uM RA bead; mES 2 uM RA 6h

TF-TF network (Bayes Net)


24h 83 uM RA bead; mES 2 uM RA 24h

TF-TF network (Bayes Net)



Advantages to methods that discover groups of genes

Infer more robust relationships because considering many genes

Allow for enrichment analysis
◦ Functional categories
◦ Signaling pathways
◦ TF DNA binding sequence motifs


GeneProgram

Gerber et al, PLoS Comp Bio 2007

Discovers sets of genes co-expressed across subsets of conditions

Innovations:
◦ Simultaneously models probabilistic structure of experiments (tissues) and genes
◦ Uses Hierarchical Dirichlet Processes, a fully Bayesian method for automatically determining the number of expression programs and tissue groups
◦ Outperforms state-of-the-art biclustering methods
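
The hierarchical model itself is beyond a short example. As a much simpler stand-in, the Chinese restaurant process below shows how a single-level Dirichlet process prior lets the number of groups be determined by the data rather than fixed in advance. This is not GeneProgram; the function name, the concentration parameter value, and the seed are hypothetical.

```python
import random

def chinese_restaurant_process(n_items, alpha, seed=0):
    """Draw a random partition of n_items from a CRP with concentration alpha.

    Item i (0-based) joins existing group k with probability
    size_k / (i + alpha) and opens a new group with probability
    alpha / (i + alpha). Returns (assignments, group_sizes).
    """
    rng = random.Random(seed)
    sizes = []
    assignments = []
    for i in range(n_items):
        r = rng.uniform(0, i + alpha)
        cumulative = 0.0
        for k, size in enumerate(sizes):
            cumulative += size
            if r < cumulative:
                sizes[k] += 1
                assignments.append(k)
                break
        else:
            # r fell in the last alpha-wide slice: open a new group
            sizes.append(1)
            assignments.append(len(sizes) - 1)
    return assignments, sizes

assignments, sizes = chinese_restaurant_process(n_items=50, alpha=1.0)
```

The number of groups, `len(sizes)`, is an outcome of the draw. In a full model the group assignments would be inferred jointly with the expression data instead of sampled from the prior.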


[Figure: methods compared. Labels: Hierarchical clustering; Singular Value Decomposition (SVD); Non-negative Matrix Factorization (NMF); GeneProgram w/o tissue groups; Full GeneProgram model]


GeneProgram produced a map of 12 tissue groups and 62 expression programs

[Figure annotation: tissue groups]


GeneProgram produced a map of 12 tissue groups and 62 expression programs

[Figure annotation: tissue]


GeneProgram produced a map of 12 tissue groups and 62 expression programs

[Figure annotation: expression programs (sorted by generality score)]


GeneProgram produced a map of 12 tissue groups and 62 expression programs

[Figure annotation: expression program use by tissue]


Expression program enrichment analysis

GO categories
◦ FDR controlled to 5%

TRANSFAC motifs
◦ Software: SAMBA
◦ Scans +3000 to -200 bp for each motif
◦ Uses PWM to score region, background to calculate p-value (Bonferroni corrected)
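
The PWM scoring step can be sketched as a log-odds scan over a promoter sequence. This is a generic illustration only, not the software named on the slide: the uniform background, the pseudocount, the toy motif, and the function names are all assumptions, and the background-based p-value and Bonferroni correction are not reproduced.

```python
import math

BASES = "ACGT"

def pwm_log_odds(counts, background=0.25, pseudocount=1.0):
    """Turn per-position base counts into a log2-odds PWM.

    counts: list of dicts (one per motif position) mapping base -> count.
    """
    pwm = []
    for col in counts:
        total = sum(col.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((col.get(b, 0) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm

def best_hit(seq, pwm):
    """Return (score, start) of the highest-scoring window in seq."""
    width = len(pwm)
    best = (float("-inf"), -1)
    for start in range(len(seq) - width + 1):
        score = sum(pwm[j][seq[start + j]] for j in range(width))
        best = max(best, (score, start))
    return best

# Toy motif with consensus TGAC (made-up counts), scanned over a toy sequence.
toy_pwm = pwm_log_odds([{"T": 10}, {"G": 10}, {"A": 10}, {"C": 10}])
score, start = best_hit("AAATGACAAA", toy_pwm)
```

A p-value for the best score would then come from the score distribution on background sequence, with the threshold adjusted for the number of motifs tested.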


E8.25 4-6s definitive endoderm

Expression programs (GO and motif enrichment)


E8.75 13-15s definitive endoderm

Expression programs (GO and motif enrichment)


E9.5 definitive endoderm

Expression programs (GO and motif enrichment)


E10.5 pancreatic endoderm

Expression programs (GO and motif enrichment)


Expression programs showing TFs in programs and motif enrichment

E8.25 4-6s definitive endoderm


E8.75 13-15s definitive endoderm

Expression programs showing TFs in programs and motif enrichment


E9.5 definitive endoderm

Expression programs showing TFs in programs and motif enrichment


E10.5 pancreatic endoderm

Expression programs showing TFs in programs and motif enrichment


Expression programs showing TFs in programs and motif enrichment

E11.5 pancreatic endoderm



Retinoic acid receptor ChIP-seq data

Generated in the Wichterle lab at Columbia (unpublished data, Motor Neuron Development Project)

mESC’s grown to embryoid body stage, profiled after 8h of RA exposure


ChIP-seq RAR binding: Cyp26a1


ChIP-seq RAR binding: Rarb


Overlap of Melton lab expression data and RAR binding data

Condition          # upreg genes   # bound genes   % bound genes   p-value
6h RA 83 uM bead   104             29              28%             0
1d RA 83 uM bead   369             29              8%              0.069
mESC 6h 2 uM RA    165             33              20%             0
mESC 1d 2 uM RA    220             38              17%             0

Binding events determined with modified MACS method (Zhang et al, Genome Biology 2008); called if significant peak found w/in 50 kb of gene start site
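
The "within 50 kb of gene start site" rule and an overlap p-value can be sketched as follows. Peak calling itself is not shown. The slide does not state how its p-values were computed, so the hypergeometric tail here is an assumption, as are the function names and the toy coordinates; the table cannot be recomputed from the slide alone because the total number of bound genes is not given.

```python
from bisect import bisect_left
from collections import defaultdict
from math import comb

def genes_near_peaks(tss, peaks, window=50_000):
    """Genes whose start site lies within `window` bp of any peak.

    tss: dict gene -> (chrom, start); peaks: iterable of (chrom, position).
    """
    by_chrom = defaultdict(list)
    for chrom, pos in peaks:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()
    bound = set()
    for gene, (chrom, start) in tss.items():
        positions = by_chrom.get(chrom, [])
        # first peak at or right of (start - window); bound if it is
        # also no further right than (start + window)
        i = bisect_left(positions, start - window)
        if i < len(positions) and positions[i] <= start + window:
            bound.add(gene)
    return bound

def hypergeom_tail(population, successes, draws, observed):
    """P(X >= observed) when drawing `draws` genes without replacement."""
    upper = min(successes, draws)
    total = comb(population, draws)
    return sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    ) / total

# Toy coordinates, not real annotations.
toy_tss = {"geneA": ("chr1", 100_000), "geneB": ("chr1", 500_000), "geneC": ("chr2", 100_000)}
toy_peaks = [("chr1", 140_000), ("chr2", 200_000)]
bound = genes_near_peaks(toy_tss, toy_peaks)
```

With `population` set to the number of genes on the array, `successes` to the number of bound genes genome-wide, `draws` to the number of upregulated genes, and `observed` to the number of upregulated genes that are bound, `hypergeom_tail` gives one way to test whether the overlap exceeds chance.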


Future computational directions

Add publicly available ES expression data

Apply more sophisticated TF binding motif methods (phylogeny, spatial arrangements, co-regulation)

Extend GeneProgram framework for add’l data types (TF expression, binding motifs, ChIP-seq, knockdown/overexpression, ?protein-protein interactions, etc.) → causal/predictive models

Infer dynamic rewiring networks over inferred developmental tree

Develop novel probabilistic methods for ChIP-seq data


Acknowledgements

Rich Sherwood (Melton lab) - all the expression data!

Arvind Jammalamadaka (Gifford lab) - initial data analysis/normalization methods

Shaun Mahony (Gifford lab) - RA ChIP-seq data analysis

Esteban Mazzoni (Wichterle lab) - RA ChIP-seq data


Backup slides


E11.5 stomach endoderm

TF-TF network (MI)


E12.5 esophagus endoderm

TF-TF network (MI)


E11.5 liver endoderm

TF-TF network (MI)


E11.5 lung endoderm

TF-TF network (MI)


E8.25 anterior endoderm


E8.25 4-6s ectoderm


E8.25 4-6s mesoderm


6h 83 uM RA bead


d1 83 uM RA bead


mES 2 uM RA 6h


mES 2 uM RA 24h


mES differentiated 7d


GeneProgram outperformed popular biclustering algorithms in discovery of biologically meaningful gene sets from real microarray data

Data source   Algorithm     Gene dimension              Tissue dimension
                            (GO category enrichment)    (manually derived category enrichment)
N             GeneProgram   93%                         76%
N             NMF           35%                         29%
N             Samba         53%                         9%
S             GeneProgram   66%                         53%
S             NMF           28%                         19%
S             Samba         51%                         28%

N = Novartis Tissue Atlas v2 (141 mouse and human tissues)

S = Shyamsundar et al. (115 human tissues)