
Introduction to Network Analysis in Systems Biology

Avi Ma’ayan, Ph.D.

Department of Pharmacology and Systems Therapeutics

Systems Biology Center New York (SBCNY)

Mount Sinai School of Medicine New York, NY

Lecture 1 Representation of biological systems as networks

1

Two Fundamental Ways to Abstract Biochemical Reactions

Eisenberg et al. Nature 405:823 (2000)

2

[Figure panels a–e: schematic networks of components A1–A3, B1, C1–C2, D1–D3 and E1–E3; later panels split some nodes into sub-nodes such as A1.1/A1.2, C2.1/C2.2 and D3.1/D3.2]

Ma'ayan et al. Annu Rev Biophys Biomol Struct. 34:319-349 (2005)

Different Levels of System Representation

A- gene ontology

B- protein-protein interactions (undirected graphs)

C- signaling network diagrams (mixed graphs, directed/undirected)

D- ODE modeling of signaling pathways (directed and weighted)

E- PDE modeling of signaling pathways considering space (directed, weighted, and nodes can move or be in different compartments)

3

Graph Theory - Basic Concepts G = {V, E, A}

G – graph

V – vertices/nodes

E – edges/links

A- arcs/directed edges/arrows

Planar Graphs: graphs that can be drawn with no edge crossings

Bipartite Graphs: two sets of nodes; links only connect members of one set to members of the other set

http://en.wikipedia.org

4
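The bipartite definition can be checked mechanically. The sketch below is a minimal Python version. It assumes the graph is stored as a dict that maps each node to a set of neighbors. The function name `is_bipartite` and the toy enzyme/substrate graph are illustrative and do not come from the slides. Breadth-first search two-colors each connected component, and the graph is bipartite exactly when no link joins two nodes of the same color.

```python
from collections import deque


def is_bipartite(adj):
    """Return True if the undirected graph `adj` (node -> set of neighbors)
    can be split into two sets with links only between the sets."""
    color = {}
    for start in adj:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in color:
                    # Put each neighbor in the opposite set.
                    color[nbr] = 1 - color[node]
                    queue.append(nbr)
                elif color[nbr] == color[node]:
                    # A link inside one set: not bipartite.
                    return False
    return True


# Toy enzyme/substrate graph (illustrative): links only enzyme <-> substrate.
metabolic = {
    "hexokinase": {"glucose", "G6P"},
    "PGI": {"G6P", "F6P"},
    "glucose": {"hexokinase"},
    "G6P": {"hexokinase", "PGI"},
    "F6P": {"PGI"},
}
# A triangle contains an odd cycle, so it cannot be bipartite.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}

print(is_bipartite(metabolic))  # True
print(is_bipartite(triangle))   # False
```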

Metabolic Networks

• Two types of nodes: enzymes and substrates

• Reactions can be directional or bidirectional

• Bipartite graph: enzymes are not connected to other enzymes, and substrates are not connected to other substrates

Bourqui et al. BMC Systems Biology 1:29 (2007)

Glycolysis

Berg et al. Biochemistry. New York: W. H. Freeman and Co.; 2002

5

Cell Signaling Pathways

• Nodes are proteins, metabolites, lipids, second messengers, or peptides

• Interactions designate information flow, can be activation or inhibition, and are direct and physical

Gi/o Pathway

Ma'ayan A, et al. Sci Signal. 2:cm1 (2009)

6

Cell Signaling Networks

Signaling pathways are not isolated and can be merged into large networks

Ma’ayan et al. Science 310, 1078 (2005)

7

Indirect Signaling Interactions from Literature

Li et al. PLoS Biol. 4:e312 (2006)

Pseudo-nodes are used as placeholders to fill in unknown links and components

8

Kinase-Substrate Network

Protein kinase-substrate networks are directed bipartite graphs that connect kinases to their substrates through protein phosphorylation

Tan et al. Sci Signal. 2009 Jul 28;2(81):ra39

9

Example of Gene Regulation Networks

MacArthur et al., PLoS ONE 3: e3086 (2008)

Stem cell differentiation regulation

• Nodes are genes and transcription factors

• Interactions can be directional or bidirectional

• Interactions can be activation or inhibition

10

• Nodes are genes, transcription factors, or signaling components

• Interactions are directional and can be activation or inhibition

Drosophila Segment Polarity Expression Pattern

Another Example of a Gene Regulation Network

Albert R, Othmer HG. J Theor Biol. 2003 223(1):1-18.

11

Network Construction from Legacy Literature

• Manual

• Semi-automated (e.g. preBIND)

• Natural Language Processing (NLP) (e.g. PathwayStudio)

Donaldson I, et al. BMC Bioinformatics. 4:11 (2003)

preBIND

12

PPI Networks from Y2H Screens

• Yeast

Does the small overlap between the two studies mean that high-throughput Y2H screens are not identifying real interactions?

13

PPI Networks from Y2H Screens

Giot et al. Science 302:1727 (2003)

Fly Worm

Li et al. Science 303:540 (2004)

14

PPI Networks from Y2H Screens

• Human

Blue- literature

Red- Y2H screen (~78% verified by Co-IP)

• Defined different levels of confidence

• Identified disease genes

• Assessed overlap with literature-based interactions

• Used GO annotation

15

Epistasis Networks: Inferring Networks from Double Deletion Mutants

291 genetic interactions among 204 yeast genes

Tong AH, et al. Science 294:2364 (2001)

16

Epistasis Interactions in Yeast Metabolism

Segre et al., Nature Genetics 37:77 (2004)

Two types of links: buffering and aggravating

Links can be directional or bidirectional

17
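The distinction between buffering and aggravating links can be written as a worked equation. The formula below is one common multiplicative definition of epistasis. It is an assumption here, because the slides give no formula, and it may differ from the normalized score used in the cited study. Let $W_x$ and $W_y$ be the fitness of the two single mutants and $W_{xy}$ the fitness of the double mutant, all relative to wild type:

```latex
\varepsilon_{xy} = W_{xy} - W_x W_y ,
\qquad
\varepsilon_{xy} > 0 \;\Rightarrow\; \text{buffering},
\qquad
\varepsilon_{xy} < 0 \;\Rightarrow\; \text{aggravating}
```

Here is a worked example. Suppose $W_x = 0.8$ and $W_y = 0.5$. The expected double-mutant fitness is then $0.4$. An observed $W_{xy} = 0.1$ gives $\varepsilon_{xy} = -0.3$, which is an aggravating interaction.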

Inferring Networks from Time Series Microarrays

Zou M, Conzen SD. Bioinformatics. 2005 21(1):71-9.

18

Perturbations and Bayesian Networks

Networks can be inferred using targeted perturbations

Sachs et al. Science. 2005 308:523-9

19

Disease Gene Networks

Each node corresponds to a distinct disorder, colored based on the disorder class. The size of

each node is proportional to the number of genes in the corresponding disorder, and the link

thickness is proportional to the number of genes shared by the disorders connected by the link.

Goh et al. Proc Natl Acad Sci USA. (2007) 104:8685-90

20
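The disease network above is a projection of a bipartite disorder-gene network onto the disorders. The sketch below shows that projection in minimal form, using made-up disorder and gene names. Node size is the number of genes per disorder. Link weight is the number of genes that two disorders share.

```python
from itertools import combinations


def project_disease_network(disease_genes):
    """Project a bipartite disorder -> genes map onto the disorders.

    Returns (node_size, link_weight): node_size[d] is the number of genes of
    disorder d; link_weight[(d1, d2)] is the number of genes d1 and d2 share
    (only pairs that share at least one gene get a link)."""
    node_size = {d: len(genes) for d, genes in disease_genes.items()}
    link_weight = {}
    for d1, d2 in combinations(sorted(disease_genes), 2):
        shared = len(disease_genes[d1] & disease_genes[d2])
        if shared:
            link_weight[(d1, d2)] = shared
    return node_size, link_weight


# Hypothetical disorder-gene associations, for illustration only.
toy = {
    "disorder_A": {"G1", "G2", "G3"},
    "disorder_B": {"G2", "G3", "G4"},
    "disorder_C": {"G5"},
}
sizes, links = project_disease_network(toy)
print(sizes)  # {'disorder_A': 3, 'disorder_B': 3, 'disorder_C': 1}
print(links)  # {('disorder_A', 'disorder_B'): 2}
```

The same projection applies to the drug-target networks described next: project onto drugs (shared targets) or onto proteins (shared drugs).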

Drug-Target Networks

Ma’ayan et al. Mt Sinai J Med (2007) 74:27

Yildirim et al. Nat Biotechnol. (2007) 25:1110

Drugs can be connected to their known protein targets

21

Bipartite Networks for Data Integration

Tanay et al. PNAS (2004) 101:2981

Gene IDs can be used as anchors for integrating different omics datasets

22

Pajek - Free Windows Software to Visualize Networks

http://vlado.fmf.uni-lj.si/pub/networks/pajek/

23

Cytoscape - Leading Academic Network Analysis and Visualization Software

Shannon et al. Genome Res. 2003 13(11):2498-504

24

Summary

• Different types of biological intracellular molecular networks can be represented by different types of graphs

• Networks can be created from collecting interactions published in many papers, or networks can be reconstructed directly from data

• Protein interaction networks and cell signaling networks can be connected to drugs and diseases

• Network representation can be used to integrate different datasets using genes as anchors

25


Lecture 2 Milestones and key concepts in network analysis

26

Königsberg Bridge Problem

27

What are the mathematical consequences of throwing a random number of buttons on the floor and randomly connecting them with a random number of links?

In the late 1950s and early 1960s, Paul Erdős and Alfréd Rényi studied the properties of random graphs.

P. Erdős, A. Rényi. Publ. Math. (Debrecen) 6, 290-297 (1959)

28
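The random-graph idea can be sketched in a few lines of Python. This version uses the closely related G(n, p) variant, in which each pair of nodes is linked independently with probability p, rather than a formulation with a fixed number of links. The function name and parameter values are illustrative. In this model the expected mean degree is p(n - 1).

```python
import random
from itertools import combinations


def erdos_renyi(n, p, seed=None):
    """G(n, p) random graph: each of the n*(n-1)/2 node pairs is linked
    independently with probability p. Returns node -> set of neighbors."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


g = erdos_renyi(n=1000, p=0.01, seed=1)
mean_degree = sum(len(nbrs) for nbrs in g.values()) / len(g)
# Expected mean degree is p * (n - 1) = 9.99; the sample value is close to it.
print(round(mean_degree, 1))
```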

“Real” Networks are “Small World”

Watts DJ, Strogatz SH. Collective dynamics of 'small-world' networks. Nature. 1998 Jun 4;393(6684):440-2.

29

Clustering Coefficient

Characteristic Path Length

Average shortest path between all possible pairs of nodes

Ravasz et al. Science 297, 1551 (2002)

30
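Both measures can be computed directly from an adjacency structure. The sketch below does this with illustrative function names. The clustering coefficient of a node with k neighbors is the number of links among those neighbors divided by k(k - 1)/2. Nodes with fewer than two neighbors are assigned 0 here, which is one common convention. The characteristic path length averages breadth-first-search distances over all pairs of nodes that are connected.

```python
from collections import deque


def clustering_coefficient(adj, node):
    """Fraction of pairs of `node`'s neighbors that are themselves linked."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    return sum(clustering_coefficient(adj, v) for v in adj) / len(adj)


def characteristic_path_length(adj):
    """Mean shortest-path length (in links) over all ordered pairs of
    distinct, connected nodes, using BFS from every node."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Toy graph: a square 0-1-2-3 with one diagonal 0-2.
square = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
print(average_clustering(square))          # ≈ 0.833 (= 10/12)
print(characteristic_path_length(square))  # ≈ 1.167 (= 14/12)
```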

Creating Small-World Networks

Watts DJ, Strogatz SH. Collective dynamics of 'small-world' networks. Nature. 1998 Jun 4;393(6684):440-2.

31
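The Watts-Strogatz construction starts from a regular ring lattice and rewires each link with probability p. A few random shortcuts shorten paths sharply while most of the local clustering survives. The sketch below is minimal and makes these assumptions: a link is rewired by moving one endpoint to a uniformly chosen node, and any rewire that would create a self-loop or duplicate link is skipped. The function name is illustrative.

```python
import random


def watts_strogatz(n, k, p, seed=None):
    """Ring of n nodes, each linked to its k nearest neighbors (k even),
    then each lattice link is rewired with probability p to a random node.
    Self-loops and duplicate links are avoided, so the link count stays n*k/2."""
    if k % 2 or k >= n:
        raise ValueError("k must be even and smaller than n")
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for j in range(1, k // 2 + 1):
            u = (v + j) % n
            adj[v].add(u)
            adj[u].add(v)
    for j in range(1, k // 2 + 1):
        for v in range(n):
            u = (v + j) % n
            if u in adj[v] and rng.random() < p:
                # Candidate new endpoints: not v itself, not already linked to v.
                choices = [w for w in range(n) if w != v and w not in adj[v]]
                if choices:
                    w = rng.choice(choices)
                    adj[v].discard(u)
                    adj[u].discard(v)
                    adj[v].add(w)
                    adj[w].add(v)
    return adj


small_world = watts_strogatz(1000, 10, 0.05, seed=42)
print(sum(len(nbrs) for nbrs in small_world.values()) // 2)  # 5000
```

Setting p = 0 gives the regular lattice, and setting p = 1 gives a nearly random graph. The small-world regime lies at small intermediate values of p.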

Barabási’s group analyzed databases of metabolic networks in lower organisms and the protein-protein interaction map of the yeast proteome inferred from high-throughput yeast two-hybrid screens. All were shown to have a scale-free connectivity distribution.

Barabási, Albert and colleagues found that many real networks, including the Internet and the WWW, are scale-free. This means that the connectivity (degree) distribution of the nodes follows a power law.

Jeong et al. Nature 407, 651 (2000) Jeong et al. Nature 411, 41 (2001)

Barabasi and Albert. Science

286, 509 (1999)

“Real” Networks are “Scale Free”

32
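The power-law statement can be written explicitly. In the notation below, P(k) is the fraction of nodes that have k links and γ is the degree exponent. This notation is an assumption, since the slides give no formula:

```latex
P(k) \propto k^{-\gamma}
\quad\Longleftrightarrow\quad
\log P(k) = -\gamma \log k + \text{const}
```

A power law therefore appears as a straight line of slope -γ on log-log axes. Exponents reported for many real networks often fall between 2 and 3.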

Erdős-Rényi random networks vs. Barabási-Albert scale-free networks

Barabási, Physics World, July 2001

33

Creating Scale-Free Networks

Barabasi and Albert. Science 286, 509 (1999)

34
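The Barabási-Albert model grows a network by adding nodes one at a time. Each new node attaches to m existing nodes, chosen with probability proportional to their current degree (preferential attachment). The sketch below is minimal and assumes growth starts from a complete graph on m + 1 nodes, which is one of several common choices. The function name is illustrative.

```python
import random


def barabasi_albert(n, m, seed=None):
    """Grow a network to n nodes; each new node links to m distinct existing
    nodes chosen with probability proportional to their current degree.
    Starts from a complete graph on m + 1 nodes."""
    if m < 1 or n <= m:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    adj = {v: set() for v in range(m + 1)}
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            adj[u].add(v)
            adj[v].add(u)
    # Each node appears here once per link it has, so a uniform draw from
    # this list picks a node with probability proportional to its degree.
    endpoints = [v for v in adj for _ in adj[v]]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        adj[new] = set()
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            endpoints.extend([new, t])
    return adj


g = barabasi_albert(2000, 2, seed=3)
degrees = sorted((len(nbrs) for nbrs in g.values()), reverse=True)
print(degrees[:5])  # a few hubs with degrees far above m = 2
```

For large networks this model produces a power-law degree distribution with exponent γ = 3. A few early nodes become hubs, while most nodes keep close to m links.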

The Importance of Hubs

Albert R, Jeong H, Barabasi A-L: Error and attack tolerance of complex

networks. Nature 2000, 406(6794):378-382.

H. Jeong, S. P. Mason, A.-L. Barabási and Z. N. Oltvai. Lethality and centrality

in protein networks. Nature 411, 41-42 (2001)

35
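Albert et al. compared random node failures with targeted attacks on the most connected nodes, tracking how the largest connected component shrinks. The sketch below makes that comparison on a tiny, hypothetical hub-dominated network. Removing a peripheral node barely matters, while removing the hub fragments the network. The function names are illustrative.

```python
from collections import deque


def largest_component_size(adj, removed=frozenset()):
    """Size of the largest connected component after deleting `removed` nodes."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        size, queue = 0, deque([start])
        while queue:
            u = queue.popleft()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best


def attack_order(adj):
    """Targeted attack: nodes sorted from highest to lowest degree."""
    return sorted(adj, key=lambda v: len(adj[v]), reverse=True)


# Hypothetical network: node 0 is a hub linked to nodes 1..8,
# plus one extra link between nodes 1 and 2.
toy = {0: set(range(1, 9))}
for leaf in range(1, 9):
    toy[leaf] = {0}
toy[1].add(2)
toy[2].add(1)

print(largest_component_size(toy))                          # 9: intact network
print(largest_component_size(toy, {5}))                     # 8: a peripheral node fails
print(largest_component_size(toy, {attack_order(toy)[0]}))  # 2: the hub is attacked
```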

Creating Scale-Free Networks using Duplication-Divergence Growth

Vázquez et al. Complexus 1:1 (2003)

The network grows by copying a node with its links; then some links are deleted with probability p, and a link is formed between the copied node and the new node with probability q.

36
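The growth rule described above can be sketched in Python. The sketch follows the slide's wording under three assumptions. First, each copied link is deleted independently with probability p. Second, growth starts from two linked nodes. Third, the node to copy is chosen uniformly at random. The published model of Vázquez et al. may differ in details, and the function name is illustrative.

```python
import random


def duplication_divergence(n, p, q, seed=None):
    """Grow a network to n nodes: copy a random node with its links, delete
    each copied link with probability p, and link the copy to its original
    with probability q. Starts from two linked nodes."""
    if n < 2:
        raise ValueError("n must be at least 2")
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}
    while len(adj) < n:
        original = rng.choice(list(adj))
        new = len(adj)
        adj[new] = set()
        for nbr in adj[original]:
            # Keep each inherited link with probability 1 - p.
            if rng.random() >= p:
                adj[new].add(nbr)
                adj[nbr].add(new)
        # Link the copy to the node it was copied from with probability q.
        if rng.random() < q:
            adj[new].add(original)
            adj[original].add(new)
    return adj


g = duplication_divergence(500, p=0.6, q=0.3, seed=7)
print(max(len(nbrs) for nbrs in g.values()))  # largest degree in the grown network
```

With p = 0 and q = 1, every copy inherits all links and also links to its original, so the network stays a complete graph. Intermediate values of p and q produce heterogeneous degree distributions.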

Creating Geometric Random Networks

Throwing a bunch of buttons in N dimensions and connecting buttons if they are close in Euclidean space (geometric distance between nodes)

Pržulj et al. Bioinformatics. 2004 20:3508

37
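The sketch below builds a geometric random network in minimal form. It assumes nodes are placed uniformly in the unit cube [0, 1]^N and linked when their Euclidean distance is at most a chosen radius. The function name and parameter values are illustrative. `math.dist` requires Python 3.8 or later.

```python
import math
import random


def geometric_random_graph(n, radius, dim=2, seed=None):
    """Place n nodes uniformly at random in [0, 1]^dim and link every pair
    whose Euclidean distance is at most `radius`."""
    rng = random.Random(seed)
    points = [tuple(rng.random() for _ in range(dim)) for _ in range(n)]
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if math.dist(points[u], points[v]) <= radius:
                adj[u].add(v)
                adj[v].add(u)
    return adj, points


adj, points = geometric_random_graph(200, radius=0.15, dim=3, seed=5)
print(sum(len(nbrs) for nbrs in adj.values()) / len(adj))  # mean degree
```

Unlike Erdős-Rényi graphs, these networks have high clustering, because two neighbors of a node are themselves likely to be close in space.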

Network Motifs are Recurring Patterns of Connectivity

Motifs are circuits that are statistically more prevalent in real networks than in randomized networks

Milo et al. Science, 298, 824 (2002)

38
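The sketch below detects one three-node motif, the feed-forward loop (X->Y, Y->Z, X->Z). It compares the motif count in a network with counts in randomized versions of that network. The randomization uses degree-preserving edge swaps, so each node keeps its in-degree and out-degree. The numbers of swaps and random networks are arbitrary choices. Function names and the toy network are illustrative.

```python
import random
import statistics


def count_ffl(edges):
    """Count feed-forward loops (X->Y, Y->Z and X->Z) among directed edges."""
    out = {}
    for a, b in edges:
        out.setdefault(a, set()).add(b)
    count = 0
    for x, targets in out.items():
        for y in targets:
            for z in out.get(y, ()):
                if z != x and z in targets:
                    count += 1
    return count


def randomize(edges, n_swaps, rng):
    """Degree-preserving shuffle: pick edges a->b and c->d and rewire them to
    a->d and c->b, skipping swaps that would create self-loops or duplicates."""
    edge_list = list(edges)
    edge_set = set(edge_list)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if a == d or c == b or (a, d) in edge_set or (c, b) in edge_set:
            continue
        edge_set -= {(a, b), (c, d)}
        edge_set |= {(a, d), (c, b)}
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return edge_set


def ffl_z_score(edges, n_random=200, seed=0):
    """Compare the FFL count of `edges` with degree-preserving random networks."""
    rng = random.Random(seed)
    real = count_ffl(edges)
    counts = [count_ffl(randomize(edges, 10 * len(edges), rng))
              for _ in range(n_random)]
    mean, sd = statistics.mean(counts), statistics.pstdev(counts)
    z = (real - mean) / sd if sd > 0 else float("nan")
    return real, mean, z


# Hypothetical directed network containing three feed-forward loops.
toy = {(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4),
       (4, 5), (5, 6), (4, 6), (6, 0)}
print(count_ffl(toy))  # 3
print(ffl_z_score(toy))
```

A large positive z-score marks the motif as overrepresented relative to networks with the same degree sequence.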

Evolutionary conservation of motif constituents in the yeast protein interaction network

Wuchty S, et al. Nature Genetics 35, 176-179 (2003)

Graphlets – motifs in undirected networks

39

Considering Protein Structure of Hubs

Hub proteins are either multi-site or single-site

Kim et al. Science 314, 1938 (2006)

40

Bow-Tie Structure of Signaling Networks

Oda and Kitano. Molecular Systems Biology 2:2006.0015 (2006)

41

Hierarchical Organization of Pathways from Ligands to Effectors

A topology common for systems that need to make discrete decisions based on a continuous, complex state of the environment

Ma'ayan et al. Phys Rev E Stat Nonlin Soft Matter Phys. 2006 73:061912

Power-law distribution of branched pathways

42

General Topological Properties of Biomolecular Networks

Ma'ayan A. J Biol Chem. 2009 284(9):5451

A- power-law connectivity distribution

B- party hubs and date hubs

C- multi-site and single-site hubs

D- power-law distribution of branched pathways

E- bow-tie structure of signaling pathways

F- bifans, the most common motifs

G- negative feedback loops at the membrane

H- monotone system topology

I- nesting of positive feedback loops

43

Ma’ayan et al. PNAS 105:19235 (2008)

44

Ma’ayan et al. PNAS 105:19235 (2008)

45

Ma’ayan et al. PNAS 105:19235 (2008)

46

MacArthur, Sanchez-Garcia and Ma’ayan, Phys. Rev. Lett. 104, 168701 (2010)

47

Summary

• Real networks are “small world” and “scale free”

• Simple algorithms can recreate the structure of real networks

• Shuffled networks are created for statistical control

• Network motifs and graphlets define the topology at the microscopic level

• Real biological regulatory networks have “date” and “party” hubs; hubs are either multi-site or single-site; pathway branching follows a power law; signaling networks display a bow-tie structure; bifans are highly enriched; and feedback loops are depleted and nested to provide dynamical stability.

48


Lecture 3 Making predictions using network analysis

49

Making Predictions based on Network Topology

Proteins close to each other in the interactome network are also likely to share GO terms

Sharan et al. Molecular Systems Biology 3:88 (2007)

50
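The simplest way to exploit this observation is neighborhood counting ("guilt by association"). Each GO term is scored for a protein by how many of its interaction partners carry that term. The sketch below shows this in minimal form, with hypothetical proteins and annotations. The methods reviewed by Sharan et al. are more elaborate.

```python
from collections import Counter


def predict_terms(adj, annotations, protein, top=3):
    """Rank GO terms for `protein` by how many of its interaction partners
    carry each term (neighborhood majority vote)."""
    votes = Counter()
    for partner in adj[protein]:
        votes.update(annotations.get(partner, ()))
    return votes.most_common(top)


# Hypothetical proteins and annotations, for illustration only.
adj = {
    "P1": {"P2", "P3", "P4"},
    "P2": {"P1"}, "P3": {"P1"}, "P4": {"P1"},
}
annotations = {
    "P2": {"DNA repair", "nucleus"},
    "P3": {"DNA repair"},
    "P4": {"nucleus", "cell cycle"},
}
print(predict_terms(adj, annotations, "P1"))
```

Here "DNA repair" and "nucleus" each get two votes and "cell cycle" gets one, so the first two terms are the strongest predictions for P1.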

Making Predictions based on Network Topology

Albert and Albert used the SUGGEST algorithm, which is used to organize products in a supermarket, to predict protein-protein interactions based on known protein-protein interactions

51

Making Predictions based on Network Topology

Completing defective cliques can be used to predict protein interactions

Yu et al. Bioinformatics 22, 7 (2006)

52
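The sketch below illustrates the idea, restricted to the smallest case. This restriction is an assumption, since Yu et al. consider larger defective cliques. A non-linked pair u, v is predicted to interact when it has two common partners that are linked to each other, so adding u-v would complete a 4-node clique. The function name and toy network are illustrative.

```python
from itertools import combinations


def predict_from_defective_cliques(adj):
    """Return non-linked pairs (u, v) that have two common neighbors w, x
    linked to each other, i.e. {u, v, w, x} is a 4-node clique missing only
    the u-v link. Each predicted pair maps to the number of such cliques."""
    predictions = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue
        common = sorted(adj[u] & adj[v])
        support = sum(1 for w, x in combinations(common, 2) if x in adj[w])
        if support:
            predictions[(u, v)] = support
    return predictions


# Hypothetical network: A, B, C, D are all linked except A-D.
net = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"B", "C"},
}
print(predict_from_defective_cliques(net))  # {('A', 'D'): 1}
```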

How can we use prior knowledge networks to analyze multivariate experimental results?

Computational Modeling + Experiments (High-content)

Low-hanging-fruit hypotheses

53

Govek et al. Genes & Dev. 19:1 (2005)

The Goal is to Better Understand Initial Cell Signaling

• Activation of transcription factors after HU-210 stimulation of CB1 receptors

• Induction of neurite outgrowth

• Study the process of cell differentiation

54

Protein-DNA Arrays: Measuring Transcription Factor Activation

[Array panels: DMSO and 20 min; labeled spots include AP-2, RAR, PAX6, CREB, MYB and STAT3]

23 TFs increase binding to DNA after 20 minutes: TFAP2A, CEBPA, NFYA, MYB, CREB1, NR3C1, STAT3, SMAD3, SMAD4, STAT4, THRA, THRB, VDR, GATA2, STAT1, PAX6, XBP1, NR1I2, HOXD8, HOXD9, HOXD10, RUNX2, HIVEP1

Validated factors with gel-shift assays

[Diagram: a transcription factor (P) bound to a consensus promoter sequence produces a signal]

Bromberg KD, Ma'ayan A, Neves SR, Iyengar R.

Science. 2008 May 16;320(5878):903-9.

55

Genes2Networks

[Diagram: List of TFs → Genes2Networks (Integrator and Filter applied to unfiltered and filtered interaction datasets, including Vidal and Stelzl) → output subnetwork with significant intermediates]

Berger SI, Posner JM, Ma'ayan A.

BMC Bioinformatics. 2007 Oct 4;8:372.

56

The Genes2Networks Algorithm

Inputs:

• Large-scale mammalian protein-protein interaction network

• Seed list of proteins which are nodes in the background network

Algorithm:

Step 1: Find all shortest paths for all pairs of nodes from the seed list

Step 2: Combine all links and nodes from all found shortest paths to form a subnetwork

Step 3: Add all missing links that directly connect any pair of nodes from the subnetwork using interactions from the background network

Step 4: Rank intermediate nodes (nodes that are not from the seed list) based on the proportion of links in the created subnetwork vs. total links in the background network, using a binomial proportion test

Output:

• Subnetwork connecting the seed nodes

• Table with ranked intermediate proteins

Berger SI, Posner JM, Ma'ayan A.

BMC Bioinformatics. 2007 Oct 4;8:372.

57
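The four steps can be sketched in Python. This is a hypothetical re-implementation for illustration, not the published tool. The background network is an unweighted, undirected dict of neighbor sets. For Step 4, the sketch assumes a z-score form of the binomial proportion test. An intermediate with k of its n background links inside the subnetwork is compared with p0 = (links in the subnetwork) / (links in the background network). The published statistic may differ.

```python
import math
from collections import deque


def bfs_predecessors(adj, source):
    """BFS from `source`; returns shortest distances and, for every reached
    node, all of its predecessors on shortest paths from `source`."""
    dist, preds = {source: 0}, {source: []}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                preds[w] = [u]
                queue.append(w)
            elif dist[w] == dist[u] + 1:
                preds[w].append(u)
    return dist, preds


def genes2networks_sketch(adj, seeds):
    seeds = [s for s in dict.fromkeys(seeds) if s in adj]
    nodes, edges = set(seeds), set()
    # Steps 1-2: union of all nodes and links on all shortest paths
    # between every pair of seeds.
    for i, s in enumerate(seeds):
        dist, preds = bfs_predecessors(adj, s)
        for t in seeds[i + 1:]:
            if t not in dist:
                continue  # no path between s and t in the background
            stack, seen = [t], {t}
            while stack:
                v = stack.pop()
                nodes.add(v)
                for u in preds[v]:
                    edges.add(frozenset((u, v)))
                    if u not in seen:
                        seen.add(u)
                        stack.append(u)
    # Step 3: add every background link between two subnetwork nodes.
    for u in nodes:
        for w in adj[u] & nodes:
            edges.add(frozenset((u, w)))
    # Step 4 (assumed statistic): z-score of each intermediate's share k / n
    # of links inside the subnetwork against p0.
    total_links = sum(len(nbrs) for nbrs in adj.values()) / 2
    p0 = len(edges) / total_links if total_links else 0.0
    ranking = []
    for v in nodes - set(seeds):
        n = len(adj[v])
        k = len(adj[v] & nodes)
        sd = math.sqrt(p0 * (1 - p0) / n)
        z = (k / n - p0) / sd if sd > 0 else 0.0
        ranking.append((v, k, n, z))
    ranking.sort(key=lambda row: row[3], reverse=True)
    return nodes, edges, ranking


# Hypothetical background network; seeds A and B.
background = {
    "A": {"X", "Y"}, "B": {"X", "Z"}, "X": {"A", "B", "Q"},
    "Y": {"A", "Z"}, "Z": {"Y", "B"}, "Q": {"X"},
}
nodes, edges, ranking = genes2networks_sketch(background, ["A", "B"])
print(sorted(nodes))  # ['A', 'B', 'X']
print(ranking)        # X has 2 of its 3 links inside the subnetwork
```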

Genes2Networks Web Interface

- Hash function for fast loading of the datasets

- Implementation of AJAX allows changing the page without reloading

- Uses the GraphViz, Overlib, and PerlMagick libraries

http://actin.pharm.mssm.edu/genes2networks

Berger SI, Posner JM, Ma'ayan A. BMC Bioinformatics. 2007 Oct 4;8:372.

58

Network Connecting Activated Factors

Bromberg KD, Ma'ayan A, Neves SR, Iyengar R.

Science. 2008 May 16;320(5878):903-9.

59

Making Predictions by Network Analysis

Bromberg KD, Ma'ayan A, Neves SR, Iyengar R. Science. 2008 320(5878):903-9.

60

Experimental Validation

BRCA1 blocks neurite outgrowth

PI3K-AKT pathway is important for neurite outgrowth and regulates many of the identified factors

Bromberg KD, Ma'ayan A, Neves SR, Iyengar R. Science. 2008 320(5878):903-9.

61

Predicting Disease Genes: Noonan Syndrome

- Mild upregulation of the MAPK pathway (gain-of-function mutations)

- Mutations in four disease genes were identified in about 60% of patients

Noonan’s Symptoms

- Heart Defects

- Distinct Facial Features

- Learning Difficulties

- Bruising and Bleeding

62

Genes2Networks was used to find additional genes that may be mutated in Noonan syndrome

Use known disease genes to build a network around them and identify new genes/nodes that could be additional disease genes

Cordeddu V, Di Schiavi E, Pennacchio LA, Ma'ayan A, et al. Nat Genet. 41:1022 (2009)

63

Steiner Trees used to Connect Seed Genes

White and Ma’ayan, 41st ACSSC 2007. IEEE p. 155-159

64

Steiner Trees Used to Connect Signaling Pathways to Gene Regulation

Huang SS, Fraenkel E. Sci Signal. 2009 2(81):ra40

65
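A minimum Steiner tree is the smallest tree that connects all seed nodes, possibly through extra nodes. Finding one is NP-hard, so in practice heuristics or specialized solvers are used. The sketch below implements the classic shortest-path heuristic on an unweighted network. This choice is an assumption and not necessarily the method used in the cited studies. The function name and toy graph are illustrative.

```python
from collections import deque


def steiner_tree_sketch(adj, seeds):
    """Approximate a Steiner tree connecting `seeds` in an unweighted graph:
    start from the first seed and repeatedly attach the nearest remaining
    seed by a shortest path from the current tree. Returns tree links."""
    seeds = list(dict.fromkeys(seeds))
    if not seeds:
        return set()
    tree_nodes = {seeds[0]}
    tree_edges = set()
    remaining = set(seeds[1:])
    while remaining:
        # Multi-source BFS from all tree nodes; the first seed reached is
        # the one closest to the tree.
        parent = {v: None for v in tree_nodes}
        queue = deque(tree_nodes)
        found = None
        while queue and found is None:
            u = queue.popleft()
            for w in adj[u]:
                if w not in parent:
                    parent[w] = u
                    if w in remaining:
                        found = w
                        break
                    queue.append(w)
        if found is None:
            raise ValueError("some seeds cannot be reached from the tree")
        # Walk back from the new seed to the tree, adding the path.
        v = found
        while parent[v] is not None:
            tree_edges.add(frozenset((v, parent[v])))
            tree_nodes.add(v)
            v = parent[v]
        remaining.discard(found)
    return tree_edges


# Hypothetical network: seeds 1, 2, 3 share neighbor 0; a longer detour 1-7-8-2.
net = {0: {1, 2, 3}, 1: {0, 7}, 2: {0, 8}, 3: {0}, 7: {1, 8}, 8: {7, 2}}
print(sorted(tuple(sorted(e)) for e in steiner_tree_sketch(net, [1, 2, 3])))
# [(0, 1), (0, 2), (0, 3)]
```

The tree routes all three seeds through node 0, a candidate intermediate, rather than through the longer detour.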

PluriNet - Connecting Differentially Expressed Genes in Different Stem Cells Using Protein Interactions from Literature

Müller et al. Nature 455:401 (2008)

66

KEA- kinase-substrate interaction database and web-based system for kinase enrichment analysis

Lachmann and Ma’ayan. Bioinformatics 11, 87 (2010)

http://amp.pharm.mssm.edu/lib/kea.jsp

67

ChEA- ChIP-chip and ChIP-seq database of protein-DNA interactions and enrichment analysis tool

• 118 unique transcription factors

• 107 publications

• 35,286 genes

• >150 ChIP-X assays (ChIP-chip, ChIP-seq, ChIP-PET)

• Average targets per transcription factor ≈ 1,300

• Total interactions 254,854

68

ChEA works well for determining TFs regulating gene expression changes: Myc was inferred as an effector of estrogen in MCF7 cells

Lachmann A, Xu H, Krishnan J, Berger SI, Mazloom AR, and Ma’ayan A. ChEA: Transcription Factor Regulation Inferred from Integrating Genome-Wide ChIP-X Experiments. Bioinformatics, 26, 2438-44 (2010)

69
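ChEA and KEA rank transcription factors or kinases by how strongly their known targets overlap an input gene list. The sketch below scores such overlap with one standard method, a one-sided Fisher exact (hypergeometric) test. The exact scoring used by the tools may differ. The gene sets are hypothetical.

```python
from math import comb


def enrichment_p_value(input_genes, gene_set, background_size):
    """One-sided Fisher exact test (hypergeometric upper tail): probability of
    an overlap at least as large as observed if `input_genes` were a random
    draw of the same size from `background_size` genes."""
    input_genes, gene_set = set(input_genes), set(gene_set)
    overlap = len(input_genes & gene_set)
    n_in, n_set = len(input_genes), len(gene_set)
    tail = sum(comb(n_set, k) * comb(background_size - n_set, n_in - k)
               for k in range(overlap, min(n_in, n_set) + 1))
    return tail / comb(background_size, n_in)


def rank_gene_sets(input_genes, library, background_size):
    """Rank gene sets (e.g. TF -> target genes) from most to least enriched."""
    scored = [(name, enrichment_p_value(input_genes, genes, background_size))
              for name, genes in library.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical TF target sets and input list, for illustration only.
library = {
    "TF_A": {"g1", "g2", "g3", "g4"},
    "TF_B": {"g5", "g6", "g7", "g8"},
}
ranked = rank_gene_sets({"g1", "g2", "g3"}, library, background_size=20000)
print(ranked)  # TF_A first, with a very small p-value
```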

Summary

• Prior knowledge networks can be used to predict function of proteins, protein interactions and disease genes

• Different algorithms can be used to connect seed lists of proteins with known interactions from prior knowledge networks

• Network analysis can be used to develop hypotheses for functional experiments by combining high-throughput profiling data with prior knowledge networks

70

Slides from a lecture in the course Systems Biology: Biomedical Modeling

Citation: A. Ma’ayan, Introduction to network analysis in systems biology. Sci. Signal. 4, tr5 (2011).