Computational analysis of biological networks

Introduction to molecular biology

Lecture 1

Cells are fundamental working units of all organisms

Yeast are unicellular organisms

Humans are multi-cellular organisms

Understanding how a cell works is critical to understanding how the organism functions

Prokaryotes vs. Eukaryotes

Yeast is a eukaryote just like humans. Fundamental biological processes are very similar.

Biological macromolecules

What are the main players in molecular biology?

What are DNA, RNA, proteins, and lipids?

Key biological macromolecules

• Lipids

– Mostly structural function

– Construct compartments that separate inside from outside

• DNA

– Encodes hereditary information

• Proteins

– Do most of the work in the cell

– Form 3D structure and complexes critical for function

Lipids

• Each lipid consists of a hydrophilic (water-loving) fragment and a hydrophobic (water-repelling) fragment

• Spontaneously form lipid bilayers => membranes

DNA

• Uses alphabet of 4 letters {ATCG}, called bases

• Encodes genetic information in triplet code

• Structure: a double helix

Proteins

• A sequence of amino acids (alphabet of 20)

• Each amino acid encoded by 3 DNA bases

• Perform most of the actual work in the cell

• Fold into complex 3D structure

Courtesy of the Zhou Laboratory, The State University of New York at Buffalo
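
A quick arithmetic check of why three bases per amino acid are enough. This is a minimal sketch that is not part of the original slides; the names `BASES`, `NUM_AMINO_ACIDS`, and `count_words` are made up for illustration. With a 4-letter alphabet, two-base words give too few combinations to cover 20 amino acids, while three-base words give more than enough.

```python
from itertools import product

BASES = "ATCG"          # the 4-letter DNA alphabet from the slides
NUM_AMINO_ACIDS = 20    # size of the amino acid alphabet from the slides

def count_words(length):
    """Number of distinct base sequences of the given length."""
    return len(list(product(BASES, repeat=length)))

doublets = count_words(2)   # same as 4**2
triplets = count_words(3)   # same as 4**3

print(doublets, triplets)
# True when triplets suffice but doublets do not
print(triplets >= NUM_AMINO_ACIDS > doublets)
```

Because there are more triplets than amino acids, several triplets can encode the same amino acid.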

How does a cell function?

The “Central Dogma” of biology

How are proteins made?

What are translation & transcription?

How does a cell function?

DNA is a sequence of bases {A, T, C, G}

TAT-CGT-AGT

Proteins consist of amino acids, whose sequence is encoded in DNA

Tyr-Arg-Ser

Each 3 bases of DNA encode 1 amino acid

Courtesy U.S. Department of Energy Genomes to Life program
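
The triplet example above (TAT-CGT-AGT giving Tyr-Arg-Ser) can be sketched as a table lookup. This is a minimal illustration, not code from the lecture: `CODON_TABLE` lists only a handful of entries from the standard genetic code (a full table has 64), and the function names and the `Xaa` placeholder for unlisted triplets are assumptions made here.

```python
# Partial codon table: DNA triplet -> amino acid (standard genetic code).
# Only a few of the 64 triplets are listed, enough for the slide's example.
CODON_TABLE = {
    "TAT": "Tyr", "TAC": "Tyr",
    "CGT": "Arg", "CGC": "Arg",
    "AGT": "Ser", "AGC": "Ser",
    "ATG": "Met",
    "GGT": "Gly",
}

def split_codons(dna):
    """Split a DNA string into consecutive, non-overlapping triplets."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]

def translate(dna):
    """Look up each triplet; triplets missing from the table become 'Xaa'."""
    return [CODON_TABLE.get(codon, "Xaa") for codon in split_codons(dna.upper())]

peptide = translate("TATCGTAGT")
print("-".join(peptide))
```

In the cell the lookup happens on mRNA rather than directly on DNA; the sketch skips that intermediate step and only shows the 3-bases-to-1-amino-acid mapping.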

DNA-RNA-protein

The Central Dogma of biology

Transcription (DNA->RNA)

• DNA unwinds.

• RNA polymerase recognizes a specific base sequence in the DNA called a promoter and binds to it. The promoter identifies the start of a gene, which strand is to be copied, and the direction in which it is to be copied.

• Complementary bases are assembled (U instead of T).

• A termination code in the DNA indicates where transcription will stop.

• The RNA produced is called an mRNA transcript.
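
The base-pairing step of transcription can be sketched in a few lines. This is a simplified illustration under stated assumptions: it ignores promoters and termination signals, the template sequence is hypothetical, and the names `DNA_TO_RNA` and `transcribe` are made up here.

```python
# Base-pairing rules when RNA is assembled against a DNA template strand.
# A in the template pairs with U (not T) in the RNA.
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}

def transcribe(template_3_to_5):
    """Return the mRNA (written 5'->3') that is complementary to a DNA
    template strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())

template = "ATAGCATCA"   # hypothetical template strand, written 3'->5'
mrna = transcribe(template)
print(mrna)
```

The resulting mRNA has the same sequence as the other (coding) DNA strand, with U in place of T.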

mRNA processing (eukaryotes)

Translation (mRNA->protein)

Ribosomes attach to mRNA and move along it, making protein by adding amino acids in the order specified by the mRNA sequence

Gene expression

What is gene expression?

How is it controlled?

How is it studied?

Genes vs. proteins

• Genes are units of inheritance

• They are static blueprints

• It’s proteins (dynamic) that do most of the work

• The process of making mRNA, and then protein, from a gene (or genes) is called GENE EXPRESSION

• It’s the control of gene expression that causes most phenotypic differences in organisms

Control of gene expression

• Gene expression (GE) is controlled at many levels (in eukaryotes):

– Transcriptional: prevents transcription.

– Posttranscriptional: controls or regulates mRNA after it has been produced.

– Translational: prevents translation; often involves protein factors needed for translation.

– Posttranslational: acts after the protein has been produced.

Gene regulation (in eukaryotes)

Hundreds of different transcription factors have been discovered; each recognizes and binds to a specific nucleotide sequence in the DNA. A specific combination of transcription factors is necessary to activate a gene.

Transcription factors are regulated by signals produced by other molecules. For example, hormones activate transcription factors and thus enable transcription. Hormones therefore activate certain genes.
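
The idea that a gene needs a specific combination of transcription factors, and that signals such as hormones switch those factors on, can be sketched as simple set logic. Everything named below (`geneA`, `TF1`, `hormoneX`, and so on) is a hypothetical placeholder, and the all-or-nothing rule is a deliberate simplification of real regulation.

```python
# Hypothetical genes and the transcription factors (TFs) each one requires.
REQUIRED_TFS = {
    "geneA": {"TF1", "TF2"},
    "geneB": {"TF2", "TF3"},
}

# Hypothetical signals (e.g. hormones) and the TFs they activate.
SIGNAL_ACTIVATES = {
    "hormoneX": {"TF1"},
    "hormoneY": {"TF2"},
    "hormoneZ": {"TF3"},
}

def active_tfs(signals):
    """Collect every TF activated by at least one of the present signals."""
    tfs = set()
    for signal in signals:
        tfs |= SIGNAL_ACTIVATES.get(signal, set())
    return tfs

def expressed_genes(signals):
    """A gene is on only if all of its required TFs are active (AND logic)."""
    tfs = active_tfs(signals)
    return sorted(g for g, needed in REQUIRED_TFS.items() if needed <= tfs)

print(expressed_genes({"hormoneX"}))
print(expressed_genes({"hormoneX", "hormoneY"}))
```

With only one of the two required factors active, `geneA` stays off; it switches on once both signals are present.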

Gene regulation - the lac operon (in E. coli)

Genes A, B, and C represent the genes whose products are necessary to digest lactose. Under normal conditions (no lactose), these genes are not expressed because a repressor protein is active and bound to the DNA, preventing transcription. While the repressor is bound to the DNA, RNA polymerase cannot bind to it. The repressor must be removed before the genes can be transcribed.

Lactose is a sugar found in milk. If lactose is present, E. coli needs to produce the necessary enzymes to digest it (3 enzymes).

The lac operon (cont)

When lactose is present, it binds the repressor, forcing the repressor to leave its site on the DNA. That activates transcription.
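
The on/off logic of the lac operon described here can be written as a two-step Boolean rule. This is a minimal sketch with made-up function names; it treats the system as purely on or off and ignores concentrations and timing.

```python
def repressor_bound(lactose_present):
    """The repressor sits on the DNA unless lactose binds it and removes it."""
    return not lactose_present

def lac_genes_transcribed(lactose_present):
    """Genes A, B, and C are transcribed only when the repressor is off the DNA."""
    return not repressor_bound(lactose_present)

for lactose in (False, True):
    state = "ON" if lac_genes_transcribed(lactose) else "OFF"
    print(f"lactose={lactose}: lac genes {state}")
```

The double negation is the point: lactose does not activate the genes directly, it inactivates the protein that blocks them.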

Modeling biological pathways & networks

An example… the lac operon.

What data is available?

What are biological networks?

Why is this exciting?

The lac operon of Escherichia coli.

a, Without inducer, the lac repressor binds to the operator and LacZ is repressed.

b, With inducer, the lac repressor does not bind to the operator, lac operon mRNA is transcribed, and protein is produced.

c, Output of a single run of a stochastic simulation based on this model. Note the fluctuation in number of lac repressor molecules, which is due to variation in timing of individual synthesis and decay events. At time zero, the concentration of inducer (IPTG) rises to 4 × 10^-4 M; lac repressor is inactivated within seconds and induced transcription begins, followed soon after by translation of lac mRNA and the appearance of the first new LacZ proteins.

Source: “Modelling cellular behaviour”, Nature 409, 391-395 (2001)
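
To show where such fluctuations come from, here is a minimal sketch of a stochastic simulation with only two event types, synthesis and decay of a single molecular species. It uses a Gillespie-style scheme, which is an assumption here: the caption does not say which algorithm the figure used. The rate constants and the function name `simulate_birth_death` are hypothetical, and the sketch is far simpler than the lac operon model in the figure.

```python
import random

def simulate_birth_death(k_syn, k_deg, t_end, n0=0, seed=0):
    """Gillespie-style simulation of one species with two events:
    synthesis (rate k_syn) and decay (rate k_deg * n)."""
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        rate_syn = k_syn
        rate_deg = k_deg * n
        total = rate_syn + rate_deg
        if total == 0:
            break  # nothing can happen any more
        # Waiting time to the next event is exponentially distributed.
        t += rng.expovariate(total)
        if t > t_end:
            break
        # Choose which event fires, in proportion to its rate.
        if rng.random() * total < rate_syn:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts

# Hypothetical rate constants, chosen only for illustration.
times, counts = simulate_birth_death(k_syn=2.0, k_deg=0.1, t_end=100.0)
print(len(times), counts[-1])
```

Each run with a different `seed` gives a different trajectory, because the timing of individual synthesis and decay events varies from run to run.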

Large-scale studies

• Systems biology – study of biology on a whole-cell or whole-organism level

• Recently, large amounts of new data have become available that enable these studies

• We can now potentially look at gene regulation on a whole-genome scale

The “omes”

• Genome – organism’s complete set of DNA

– Relatively stable through an organism’s lifetime

– Size: from 600,000 to several billion bases

– Gene is a basic unit of heredity (genes make up only 2% of the human genome)

• Proteome – organism’s complete set of proteins

– Dynamic – changes minute to minute

– Proteins actually perform most cellular functions; they are encoded by genes (not a 1-to-1 relationship)

– Protein function and structure form the molecular basis for disease

Beyond the “omes” – systems biology

• Understanding the function and regulation of cellular machinery, as well as cell-to-cell communication on the molecular level

• Why? Because most important biological problems are fundamentally systems-level problems

– Systems-level understanding of disease (e.g. cancer)

– Molecular medicine

– Gene therapy

Systems-level challenges

• Gene function annotation – what does a gene do

– ~30,000 genes in the human genome => systems-level approaches necessary

– A modern human microarray experiment produces ~500,000 data points => computational analysis & visualization necessary

– Many high-throughput functional technologies => computational methods necessary to integrate the data

• Biological networks – how do proteins interact

– Large amounts of high-throughput data => computation necessary to store and analyze it

– Data has variable specificity => computational approaches necessary to separate reliable conclusions from random coincidences

• Comparative genomics – comparing data between organisms

– Need to map concepts across organisms on a large scale => practically impossible to do by hand

– Large amounts of variable-quality data => computational methods needed for integration, visualization, and analysis

– Data often distributed in databases across the globe, with variable schemas etc. => data storage and consolidation methods needed

Biological networks

• Interaction maps (no directions)

• Pathway models (dynamic or static)

• Metabolic networks

• Genetic regulatory networks
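
The difference between an interaction map (no directions) and a regulatory network (directed) shows up directly in how the network is stored. The sketch below is one possible representation using adjacency sets; the protein and gene names are hypothetical placeholders, and the function names are made up for illustration.

```python
from collections import defaultdict

def undirected_graph(pairs):
    """Interaction map: each interaction is stored in both directions."""
    adj = defaultdict(set)
    for a, b in pairs:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def directed_graph(edges):
    """Regulatory network: an edge (regulator, target) is stored one way only."""
    adj = defaultdict(set)
    for regulator, target in edges:
        adj[regulator].add(target)
    return adj

# Hypothetical protein and gene names, for illustration only.
interactions = undirected_graph([("P1", "P2"), ("P2", "P3")])
regulation = directed_graph([("G1", "G2"), ("G2", "G3")])

print(sorted(interactions["P2"]))   # partners of P2
print("G1" in regulation["G2"])     # does G2 regulate G1?
```

In the undirected map, "P1 interacts with P2" and "P2 interacts with P1" are the same fact; in the directed network, "G1 regulates G2" says nothing about the reverse.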

H. Jeong, S. P. Mason, A.-L. Barabási, and Z. N. Oltvai, “Lethality and centrality in protein networks”, Nature 411, 41-42 (2001)

Yeast interaction network
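
As the title of the cited paper suggests, one basic question to ask of an interaction network is how central each protein is. The simplest measure is the degree, the number of interaction partners. The sketch below computes it on a hypothetical toy interaction list; it does not use the yeast data and is not the analysis from the paper.

```python
from collections import Counter

# Hypothetical interaction list (undirected); names are placeholders.
INTERACTIONS = [
    ("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
    ("P2", "P3"), ("P4", "P5"),
]

def degrees(pairs):
    """Count how many interaction partners each protein has."""
    deg = Counter()
    for a, b in pairs:
        deg[a] += 1
        deg[b] += 1
    return deg

def hubs(pairs, min_degree):
    """Proteins with at least min_degree partners, most connected first."""
    deg = degrees(pairs)
    ranked = sorted(deg.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, d in ranked if d >= min_degree]

print(degrees(INTERACTIONS)["P1"])
print(hubs(INTERACTIONS, min_degree=2))
```

On real data, the ranked list would be compared against experimental knowledge about which gene deletions are lethal.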

Regulatory network – urchin Hox proteins

Image from: Hox pro DB

Metabolic network in E. coli

Metabolic maps provide a framework for studying the consequences of genotype changes and the relationships between genotypes and phenotypes.

This metabolic network model: 436 metabolic intermediates undergoing 720 possible enzyme-catalyzed reactions

Circles contain abbreviated names of the metabolic intermediates, and the arrows represent enzymes. The very heavy lines indicate links with high metabolic fluxes. Analyses were correct 90% of the time in predicting the ability of 36 mutants with single-gene deletions to grow on different media.

adapted from J.S. Edwards and B.O. Palsson, PNAS 97, 5528-33 (2000)
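
To give a feel for how a network model can predict whether a deletion mutant grows, here is a deliberately simplified sketch. It is not the method of the cited study, which is usually described as a flux-balance analysis of the full network; instead it only checks whether a target metabolite is still reachable from the nutrients after one reaction is removed. The five-reaction network and all names in it are hypothetical.

```python
# Hypothetical toy network: reaction -> (substrates, products).
REACTIONS = {
    "r1": ({"glucose"}, {"A"}),
    "r2": ({"A"}, {"B"}),
    "r3": ({"A"}, {"C"}),
    "r4": ({"B"}, {"biomass"}),
    "r5": ({"C"}, {"B"}),
}

def producible(reactions, nutrients):
    """Expand the set of available metabolites until nothing new can be made."""
    available = set(nutrients)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available

def can_grow(deleted=None, nutrients=("glucose",)):
    """Growth here simply means that 'biomass' is still reachable."""
    remaining = {r: v for r, v in REACTIONS.items() if r != deleted}
    return "biomass" in producible(remaining, nutrients)

for reaction in REACTIONS:
    print(reaction, can_grow(deleted=reaction))
```

In this toy network, deleting `r2` is tolerated because `r3` and `r5` provide a bypass, while deleting `r1` or `r4` blocks growth. Changing `nutrients` mimics growth on different media.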

Genetic circuits & pathway kinetics

• Bacterial virus lambda can choose:

– lytic path of immediate replication of the virus and destruction of its E. coli host

– lysogenic path of incorporation of viral DNA into the host genome, allowing the virus to remain in a dormant state

• This diagram:

– bold horizontal lines - stretches of double-stranded DNA

– arrows over genes show the transcription direction

– dashed boxes - promoter control complex

• This model of pathway selection at different virus concentrations uses a kinetic model of the genetic regulatory circuit. It is consistent with experimental observations. Developing this model required nearly 40 empirical rate constants and the use of a supercomputer.

adapted from A. Arkin, J. Ross, and H. H. McAdams, Genetics 149, 1633-48 (1998); DOE Genomes to Life program
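
A kinetic model of a genetic switch can be illustrated at toy scale with two regulators that repress each other. The sketch below is not the cited lambda model (which needed nearly 40 rate constants); it is a generic two-variable circuit integrated with forward Euler, and the equations, the parameters `alpha` and `n`, and the function name are all assumptions chosen for illustration.

```python
def simulate_switch(x0, y0, alpha=10.0, n=2, dt=0.01, steps=5000):
    """Forward-Euler integration of two mutually repressing regulators.

    dx/dt = alpha / (1 + y**n) - x
    dy/dt = alpha / (1 + x**n) - y
    """
    x, y = x0, y0
    for _ in range(steps):
        dx = alpha / (1.0 + y ** n) - x
        dy = alpha / (1.0 + x ** n) - y
        x += dt * dx
        y += dt * dy
    return x, y

# Two starting points that differ only in which regulator has the head start.
x_high = simulate_switch(x0=3.0, y0=1.0)
y_high = simulate_switch(x0=1.0, y0=3.0)
print(x_high, y_high)
```

Whichever regulator starts ahead ends up dominant and keeps the other one low, so the same circuit can settle into either of two stable states. That is the basic behavior a pathway-selection model has to capture.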