Ivan Erill: "Beyond the Regulon: reconstructing the SOS response of the human gut microbiome"


Upload: vall-dhebron-institute-of-research-vhir

Post on 11-May-2015


DESCRIPTION

Metagenomic projects provide a unique window into the genetic composition of microbial communities. To date, metagenomic analyses have focused primarily on studying the composition of microbial populations and inferring shared metabolic pathways. In this work we analyze how high-quality metagenomic data can be leveraged to infer the composition of transcriptional regulatory networks through a combination of in silico and in vitro methods. Using the SOS response as a case example, we analyze human gut microbiome data to determine the composition of the SOS meta-regulon in a natural context. Our analysis provides proof of concept that the existing knowledge base on regulatory networks and reference genomes can be effectively leveraged to mine metagenomic data and reconstruct multi-species regulatory networks. This approach allows us to identify de novo the core elements of the human gut SOS meta-regulon, highlighting the relevance of error-prone polymerases in this stress response, and to identify putative novel SOS protein clusters involved in cell wall biogenesis, chromosome partitioning and restriction-modification. The methodology implemented in this work can be applied to other metagenomic datasets and transcriptional systems, potentially providing the means to compare regulatory networks across metagenomes. The use of metagenomic data to analyze transcriptional regulatory networks provides a realistic snapshot of these systems in their natural context and allows us to probe their extended composition in non-culturable organisms, yielding insights into their interconnection and into the overall structure of transcriptional systems in microbiomes.

TRANSCRIPT

Page 1: Ivan Erill: "Beyond the Regulon: reconstructing the SOS response of the human gut microbiome"


BIOLOGICAL SCIENCES

Beyond the regulon: reconstructing the SOS response of the human gut microbiome

Ivan Erill

Page 2

The research

Comparative genomics

Molecular microbiology

Computational biology

Bioinformatics

Transcription factors

Stress responses

Microbial metagenomics

Codon usage indices

Machine learning

Evolutionary simulations

Motif search & discovery

High-throughput assays

Clinical microbiology

Molecular phylogeny


Page 4

On regulons

Regulons: sets of genes/operons (transcriptionally) regulated by a particular transcription factor (TF)

Cellular response to specific internal or external stimuli

Defined by specific binding of the TF to the promoter region of regulated genes

Regulon genes can be repressed or activated

The TF recognizes a specific binding motif

Guzmán-Vargas and Santillán BMC Systems Biology 2:13 (2008)

[Figure: schematic bacterial promoter (RNA polymerase and transcription factor (TF) upstream of an open reading frame); regulon diagram (nodes TFi, TG1 to TG4, S, TFx, Gx, TFy); example TF-binding motif sites CTGTAAAG, CTGCACAG, CTGATCAG]
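The point that a TF recognizes a specific but variable binding motif can be made concrete by aligning the three example sites shown on this slide and tallying bases per column. This is a minimal Python sketch, not the author's pipeline; reporting `N` for columns without a unique majority base is an illustrative assumption, and the function names are hypothetical.

```python
from collections import Counter


def position_counts(sites):
    """Count nucleotides in each column of equal-length, pre-aligned binding sites."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    return [Counter(site[i] for site in sites) for i in range(width)]


def consensus(sites):
    """Majority-rule consensus; a column with tied top bases is reported as 'N'."""
    letters = []
    for column in position_counts(sites):
        ranked = column.most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            letters.append("N")  # no single preferred base at this position
        else:
            letters.append(ranked[0][0])
    return "".join(letters)


example_sites = ["CTGTAAAG", "CTGCACAG", "CTGATCAG"]  # sites shown on the slide
print(consensus(example_sites))  # CTGNACAG
```

Fully conserved columns (here CTG...AG) are the positions a scoring matrix built from such sites weights most heavily.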

Page 5

On metagenomes

Metagenome: a multi-species, heterogeneous collection of high-throughput reads from a natural habitat

The good: “unculturable” species; diversity sampling; natural population sampling

The bad: low coverage; high levels of polymorphism; diversity of low-complexity regions; contamination with eukaryotic DNA

The ugly: lack of proper models for pre-filtering, assembly, gene calling, analysis?

[Figure: high-throughput sequencing]

Gest, H. Microbiology Today 35: 220 (2008)

Schloss, P. D. and Handelsman, J. Genome Biol. 6: 229 (2005)

Page 6

On metagenomes

Metagenome: a multi-species, heterogeneous collection of high-throughput reads from a natural habitat

Properties: lots of data! Noisy! Increasingly cheap and abundant!

Typical post-processing format: assembled contigs/scaffolds with predicted, functionally annotated genes

Problem: how do we extract useful information from metagenome data? (i.e. how do we evade Brenner’s “low input, high-throughput, no output” epithet?)

[Figure: high-throughput sequencing, followed by assembly, gene calling & functional annotation]

Friedberg, E. C. Nat Rev Mol Cell Biol 9, 8-9 (2008)


Page 8

Analysis of metagenomic data: the metagenome & regulatory networks

The metagenome: a multi-species, heterogeneous collection of high-throughput reads from a natural habitat

Problem: how do we extract useful information from metagenome data?

Conventional workflow (e.g. metabolic networks): knowledge from references is used as the end point; data are mapped onto an existing, static knowledge base; inference is made on the mapped data

An interesting repertoire of new questions

[Figure: high-throughput sequencing; assembly, gene calling & functional annotation; pathway mapping, clustering and enrichment against reference phylogeny and pathways. Map to reference: low discovery potential]

Muegge, B. D. et al. Science 332 (6032), 970-974 (2011)


Page 10

Analysis of metagenomic data: the metagenome & regulatory networks

The metagenome: a multi-species, heterogeneous collection of high-throughput reads from a natural habitat

Problem: how do we extract useful information on regulatory networks from metagenome data?

Alternative workflow (e.g. regulatory networks): knowledge from the reference is used as a seed; directed mining of metagenome data; inference on the mined data

Promising questions and challenges: Is network composition governed by convergent evolution or by phylogeny? Can we effectively infer regulatory networks from metagenomic data?

[Figure: high-throughput sequencing; assembly, gene calling & functional annotation; regulon analysis, clustering and enrichment. Seed reference: high discovery potential]

Page 11

Analysis of metagenomic data: metagenomics and regulatory network analysis

Advantages: real bacterial populations; unculturable organisms and mobile elements; variability at the species and subspecies levels

Challenges: noisy search process over a huge dataset; how to handle data integration, enrichment and analysis

Goals (proof of concept):

Analyze the potential of metagenomic & regulatory sequence data to explore known regulatory systems

Study a regulatory network in its natural setting

Page 12

Analysis of metagenomic data

Metagenomics and regulatory network analysis requires:

A regulatory network to analyze: the bacterial SOS response

A metagenome on which to analyze it: the human gut microbiome

Page 13

The bacterial SOS response: transcriptional response against DNA damage

“Canonical” stress response, widespread in bacteria

Well characterized in most bacterial phyla: E. coli, B. subtilis, M. tuberculosis, V. parahaemolyticus, S. meliloti, B. bacteriovorus, X. campestris, G. sulfurreducens…

Two-component system: RecA (sensor) and LexA (repressor), responding to DNA-damaging agents

Well-characterized regulon: ~40 target genes in E. coli, ~30 in B. subtilis

Functions: recombination & DNA repair; cell-division inhibition; translesion synthesis

Erill, I. et al. FEMS Microbiol. Rev. 31 (6), 637 (2007)

Page 14

The bacterial SOS response: transcriptional response against DNA damage

High clinical relevance; widespread in bacteria

Two-component system: RecA (sensor) and LexA (repressor), responding to a broad range of antibiotics and to bacteriophage infection

Extended regulon functions: integron recombination; bacteriophage induction; toxin production; dissemination of pathogenicity islands; antibiotic-induced mutagenesis; regulation of persistence

Erill, I. et al. FEMS Microbiol. Rev. 31 (6), 637 (2007)

Guerin, E. et al. Science 324 (5930), 1034 (2009)

Page 15

The bacterial SOS response: transcriptional response against DNA damage

Interesting evolution; widespread in bacteria

Absent in some clades (Bacteroidetes/Chlorobi group); supplanted by the competence regulon in S. pneumoniae

Extreme diversity of LexA-binding motifs: clade-specific & monophyletic

[Figure: LexA-binding motifs for Geobacteres, Gram-positive bacteria, Myxobacteriales, Xanthomonadales, Alpha Proteobacteria, Beta/Gamma Proteobacteria, Cyanobacteria and Fibrobacteres]

Erill, I. et al. FEMS Microbiol. Rev. 31 (6), 637 (2007)

Page 16

The human gut microbiome

Metagenomics project: target metagenome

Human microbiome: multiple datasets (locations: gut, armpit, etc.); multiple initiatives (HMP & MetaHit); available data & features: high-throughput sequencing + 16S rRNA data, ORF predictions & functional annotation

MetaHit human gut microbiome: 86 healthy subjects; large contigs, high-quality gene calling; 7.1 Gbp total sequence in 4.5 M contigs (N50: 2.2 kbp); 9.3 M predicted ORFs (3.7 M complete), λ = 660 bp; 1 M COG annotations

[Figure: taxonomic composition of the MetaHit human gut microbiome, two panels (Bacteroides, Firmicutes, Gammaproteobacteria, Actinobacteria, other)]

Qin, J. et al. Nature 464, 59 (2010)

Nelson, K. E. et al. Science 328, 994 (2010)

Segata, N. et al. Genome Biol. 13, R42 (2012)
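The slide summarizes assembly contiguity as an N50 of 2.2 kbp. For readers unfamiliar with the statistic, here is a minimal Python sketch of the standard N50 definition; the function name and the contig lengths are hypothetical toy values, not MetaHit data.

```python
def n50(lengths):
    """N50: length of the shortest contig among the longest contigs that
    together cover at least half of the total assembled sequence."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:  # reached 50% of the assembly
            return length


# Hypothetical toy assembly (contig lengths in kbp), not MetaHit data
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_contigs))  # 8
```

Because N50 is weighted by sequence rather than by contig count, a few long contigs can dominate it even when most contigs are short.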

Page 17

Analysis workflow

Data compilation

LexA-binding motif compilation: Gram-positive bacteria, CollecTF database, 118 sites from 8 species

Reference genome panel: 121 genomes from MetaHit and the Human Microbiome Jumpstart Reference Strains Consortium

Reference SOS response: 18 described SOS responses (Acidobacteria, Alphaproteobacteria, Gammaproteobacteria, Deltaproteobacteria, Bacilli, Clostridia, Actinobacteria, Fibrobacteria), 272 regulated genes

[Figure: Gram-positive reference motif]

collectf.umbc.edu

Kiliç, S. et al. Nuc. Acids Res. 42, D156-D160 (2013)

Nelson, K. E. et al. Science 328, 994 (2010)

Cornish, J. P. et al. Evol Bioinform. 8: 449–461 (2012)

Erill, I. et al. FEMS Microbiol. Rev. 31 (6), 637 (2007)

Page 18

Analysis workflow

Data compilation: LexA-binding motif compilation; reference genome panel; reference SOS response

Metagenome mining: PSSM-based search with the reference motif on both strands

Operon prediction and site-operon association: distance-based

Taxonomic annotation: through reference panel mapping, for phylogenetic filtering of results

Functional clustering: through COG mapping, for functional analysis

[Figure: PSSM-based search, with example sites GAACTACTGTTC, GTACAACTGTTC, GATCTATTGTTC, GAACTCATGTTT, GTTCAAAAGATC and GAACTCCTGTCC; histogram of LexA-binding motif scores (x-axis: score; y-axis: frequency)]
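The PSSM-based search with the reference motif on both strands can be sketched as a log-odds scan. This is an illustration under stated assumptions, not the talk's implementation: a uniform 0.25 background, a pseudocount of 0.5, hypothetical helper names, and a tiny training set taken from the example sequences drawn on the slide (the actual reference motif was built from 118 CollecTF sites).

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def build_pssm(sites, pseudocount=0.5, background=0.25):
    """Log-odds PSSM (bits) from aligned sites, with a uniform background."""
    pssm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        total = len(column) + 4 * pseudocount
        pssm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pssm


def score(pssm, word):
    return sum(col[base] for col, base in zip(pssm, word))


def scan(pssm, sequence):
    """Score every window on both strands; return (start, best score, strand)."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        word = sequence[start:start + width]
        if set(word) - set(BASES):
            continue  # skip windows containing ambiguous bases such as N
        forward = score(pssm, word)
        reverse = score(pssm, reverse_complement(word))
        strand = "+" if forward >= reverse else "-"
        hits.append((start, max(forward, reverse), strand))
    return hits


# Hypothetical training set built from sequences drawn on the slide
training = ["GAACTACTGTTC", "GAACTCATGTTT", "GAACTCCTGTCC", "GTACAACTGTTC"]
pssm = build_pssm(training)
contig = "AAAAAAAA" + "GAACTACTGTTC" + "AAAAAAAA"
best = max(scan(pssm, contig), key=lambda hit: hit[1])
print(best[0], best[2], round(best[1], 2))  # 8 + 16.36
```

Scoring each window and its reverse complement with the same matrix is what "2 strands" amounts to: a site on the opposite strand is found without building a second matrix.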

Page 19


[Figure: operon prediction on contigs carrying putative LexA-binding sites (e.g. GAACTACTGTTC, GAACTCATGTTT)]
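Distance-based operon prediction is commonly done by grouping consecutive genes on the same strand whose intergenic distance is short. The sketch below follows that heuristic; the 50 bp threshold, the `Gene` record and the toy contig are hypothetical choices, not parameters reported in the talk.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int   # leftmost coordinate on the contig
    end: int     # rightmost coordinate on the contig
    strand: str  # "+" or "-"


def predict_operons(genes, max_gap=50):
    """Group adjacent genes on the same strand separated by <= max_gap bp."""
    operons = []
    for gene in sorted(genes, key=lambda g: g.start):
        if operons:
            previous = operons[-1][-1]
            gap = gene.start - previous.end - 1  # intergenic distance in bp
            if gene.strand == previous.strand and gap <= max_gap:
                operons[-1].append(gene)
                continue
        operons.append([gene])
    return operons


# Hypothetical contig with four predicted ORFs
genes = [
    Gene("orfA", 1, 900, "+"),
    Gene("orfB", 920, 1500, "+"),
    Gene("orfC", 1700, 2300, "+"),
    Gene("orfD", 2350, 3000, "-"),
]
print([[g.name for g in operon] for operon in predict_operons(genes)])
# [['orfA', 'orfB'], ['orfC'], ['orfD']]
```

Overlapping genes produce a negative gap and are grouped together, which matches the usual intuition that overlapping same-strand ORFs are co-transcribed.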

Page 20


[Figure: site-operon association]
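A site can be associated with an operon by measuring its position relative to the translation start (TLS) of the operon's first gene. This sketch borrows the -350 to +50 bp window described for the data-filtering step and keeps only the operon with the closest TLS; that closest-TLS rule and all names are simplifying assumptions (a site between divergent operons could plausibly regulate both).

```python
def relative_to_tls(site_pos, tls, strand):
    """Site position relative to the translation start; negative = upstream."""
    return site_pos - tls if strand == "+" else tls - site_pos


def associate_site(site_pos, operon_heads, upstream=350, downstream=50):
    """Return (operon, relative position) for the operon whose first gene's
    TLS is closest to the site within [-upstream, +downstream], else None."""
    best = None
    for name, tls, strand in operon_heads:
        rel = relative_to_tls(site_pos, tls, strand)
        if -upstream <= rel <= downstream and (best is None or abs(rel) < abs(best[1])):
            best = (name, rel)
    return best


# Hypothetical operon heads: (name, TLS coordinate, strand).
# For "-" strand operons the TLS is the rightmost coordinate of the first gene.
heads = [("opA", 1000, "+"), ("opB", 700, "-")]
print(associate_site(900, heads))   # ('opA', -100)
print(associate_site(1200, heads))  # None
```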

Page 21


[Figure: taxonomic annotation by mapping against the reference genome library]
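Taxonomic annotation through reference panel mapping can be sketched as a best-hit assignment: each ORF inherits the taxon of the reference genome it matches best, provided the match is close enough. The alignment step itself (producing the hits) is not shown; the 70% identity cut-off, the genome labels and the hits are hypothetical.

```python
def assign_taxonomy(hits, genome_taxon, min_identity=70.0):
    """Assign each ORF the taxon of its best-matching reference genome.

    hits: iterable of (orf_id, reference_genome, percent_identity).
    ORFs whose best hit falls below min_identity are left 'unassigned'.
    """
    best = {}
    for orf_id, genome, identity in hits:
        if orf_id not in best or identity > best[orf_id][1]:
            best[orf_id] = (genome, identity)
    return {
        orf_id: genome_taxon.get(genome, "unassigned") if identity >= min_identity else "unassigned"
        for orf_id, (genome, identity) in best.items()
    }


# Hypothetical reference panel and alignment hits
genome_taxon = {"refA": "Firmicutes", "refB": "Bacteroidetes"}
hits = [
    ("orf1", "refA", 92.0),
    ("orf1", "refB", 60.0),
    ("orf2", "refB", 85.5),
    ("orf3", "refA", 40.0),
]
print(assign_taxonomy(hits, genome_taxon))
# {'orf1': 'Firmicutes', 'orf2': 'Bacteroidetes', 'orf3': 'unassigned'}
```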

Page 22


[Figure: functional clustering by mapping against the COG reference library (illustrative labels COG123, COG345, COG567, COG789)]
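Functional clustering through COG mapping amounts to grouping the ORFs that carry a putative site by their COG assignment. A minimal sketch, assuming one COG per ORF; the ORF identifiers are hypothetical and the COG labels reuse the illustrative ones from the slide.

```python
from collections import defaultdict


def cluster_by_cog(orf_cog, orfs_with_site):
    """Group ORFs that carry a putative LexA-binding site by their COG."""
    clusters = defaultdict(list)
    for orf in orfs_with_site:
        cog = orf_cog.get(orf)
        if cog is not None:  # ORFs without a COG annotation are left out
            clusters[cog].append(orf)
    # Largest clusters are listed first for inspection
    return dict(sorted(clusters.items(), key=lambda item: len(item[1]), reverse=True))


orf_cog = {"orf1": "COG123", "orf2": "COG345", "orf3": "COG123"}
print(cluster_by_cog(orf_cog, ["orf1", "orf2", "orf3", "orf9"]))
# {'COG123': ['orf1', 'orf3'], 'COG345': ['orf2']}
```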

Page 23

The human gut microbiome: workflow

Data compilation: motif compilation; reference genome panel; reference SOS response

Metagenome mining: PSSM search; operon prediction; site-operon association; phylogeny annotation; functional clustering

Analysis: positional enrichment analysis; data filtering; COG enrichment analysis; gene-based functional analysis

[Figure: data for analysis]

Page 24

The human gut SOS response

Initial search results: over 500,000 putative LexA-binding sites identified

Positional enrichment analysis (promoter regions): site scores are significantly enriched in promoter regions, and high-scoring sites co-localize in promoter regions

[Figure: permutation analysis of site scores]
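The slide reports that site scores are significantly enriched in promoter regions according to a permutation analysis, without stating the test statistic. One plausible form, sketched here under that assumption, compares the mean score of promoter-region sites with that of all other sites and estimates a one-sided p-value by randomly reassigning region labels. The function name and toy scores are hypothetical.

```python
import random
from statistics import mean


def permutation_test(promoter_scores, other_scores, n_permutations=10000, seed=0):
    """One-sided permutation test on the difference in mean site score
    between promoter-region sites and all other sites."""
    rng = random.Random(seed)
    observed = mean(promoter_scores) - mean(other_scores)
    pooled = list(promoter_scores) + list(other_scores)
    k = len(promoter_scores)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign region labels
        if mean(pooled[:k]) - mean(pooled[k:]) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimated p-value above zero
    return observed, (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical toy scores (bits), not MetaHit results
promoter = [14, 15, 16, 13, 15, 17, 16, 14]
elsewhere = [5, 6, 4, 7, 5, 6, 5, 4, 6, 5]
observed, p_value = permutation_test(promoter, elsewhere, n_permutations=2000)
print(round(observed, 2), p_value)
```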

Page 25

The human gut SOS response

Data filtering: two-pronged approach
Distance-based: only sites located between -350 and +50 of the predicted TLS (translation start site)
Taxonomy-based: only sites associated with predicted protein-coding genes mapping to Gram-positive reference genomes

Filtering results: dramatic reduction in the number of putative sites
Over 43,000 sites meeting both criteria
Taxonomy-based filtering provides enhanced resolution
Law of large numbers: by chance alone, high-scoring sites can be found in the promoter regions of many Bacteroides genes
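The two filters can be expressed as a single predicate. This sketch assumes each hit records the site coordinate, the predicted TLS coordinate, the gene strand and a taxonomic label from the reference-genome mapping; the class and field names are hypothetical. Positions are measured in the gene's orientation, so negative values lie upstream of the TLS.

```python
from dataclasses import dataclass

@dataclass
class SiteHit:
    """A putative LexA site linked to its downstream gene (hypothetical fields)."""
    site_pos: int      # contig coordinate of the site (e.g. its centre)
    tls: int           # contig coordinate of the predicted translation start
    strand: str        # strand of the gene: "+" or "-"
    taxon_group: str   # taxonomic label of the best-matching reference genome

def relative_position(hit: SiteHit) -> int:
    """Distance from the TLS in gene orientation; negative values are upstream."""
    return hit.site_pos - hit.tls if hit.strand == "+" else hit.tls - hit.site_pos

def passes_filters(hit: SiteHit, upstream=-350, downstream=50,
                   allowed_groups=frozenset({"Gram-positive"})) -> bool:
    """Distance-based window AND taxonomy-based filter."""
    in_window = upstream <= relative_position(hit) <= downstream
    return in_window and hit.taxon_group in allowed_groups

def filter_hits(hits):
    return [h for h in hits if passes_filters(h)]
```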


The human gut SOS response: COG category analysis

Inferred regulon maps onto experimentally characterized SOS responses
Gradual enrichment of canonical SOS categories with increasing score cutoff: repair/replication (L), signal transduction (T) and transcription (K) genes
Cell cycle control (D) category not enriched: COGs are getting old!

[Figure: relative frequency of COG categories (J, K, L, D, V, T, M, C, G, F, R, S) for the MetaHit COG reference, COGs with an SOS site, COGs with a site >12 bits, >14 bits and >16 bits, and the SOS ensemble reference]
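The category comparison amounts to tabulating relative category frequencies among COGs whose best site exceeds each score cutoff. A sketch of that tabulation, assuming a hypothetical mapping from each COG to its one-letter category codes and from each COG to the best site score found upstream of its members. Multi-category COGs contribute once to each of their categories, which is one convention among several.

```python
from collections import Counter

def category_frequencies(cog_to_categories, cogs):
    """Relative frequency of each one-letter COG category among the given COGs."""
    counts = Counter()
    for cog in cogs:
        # A COG may carry several categories (e.g. "KT"); count each one
        counts.update(cog_to_categories.get(cog, ""))
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()} if total else {}

def frequencies_by_cutoff(cog_to_categories, best_site_score, cutoffs):
    """Category frequencies among COGs whose best site score exceeds each cutoff (bits)."""
    return {
        cutoff: category_frequencies(
            cog_to_categories,
            [cog for cog, s in best_site_score.items() if s > cutoff],
        )
        for cutoff in cutoffs
    }
```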


The human gut SOS response: COG analysis

Question: how to identify “SOS COGs”?

Score enrichment measure
Goal: identify bona fide members of the regulon; capture the maximum number of known SOS genes

Analysis of canonical SOS genes in 308 Gram-positive genomes
LexA-binding site scores normally distributed (lexA: μ = 16.2 bits, σ = 2.3; recA: μ = 16.3 bits, σ = 2.5)
Cumulative distribution approximately linear in the central scoring range (12-20 bits)

Prototypical SOS COG
High linear coefficient of determination (R² > 0.85, set empirically)
At least one site above the average score (16 bits)
At least 10 sites in the 12-20 bit range

[Figure: canonical SOS genes. Left: cumulative distribution of site scores (bits) for lexA and recA (Firmicutes). Right: quantile-quantile plot of empirical vs. theoretical scores for lexA and recA (Firmicutes)]
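The prototypical SOS COG criteria can be turned into a small classifier. This sketch assumes the R² comes from a least-squares line fitted to the empirical cumulative distribution of a COG's site scores within the 12-20 bit range. The slide does not state the exact fitting procedure, so treat this as one plausible reading; the default thresholds are the slide's values.

```python
from bisect import bisect_right

def r_squared(xs, ys):
    """Coefficient of determination of a simple least-squares line y ~ a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)

def is_prototypical_sos_cog(scores, low=12.0, high=20.0, mean_score=16.0,
                            min_sites=10, min_r2=0.85):
    """Return (passes, r2) for the site scores (bits) found upstream of one COG's members."""
    in_range = sorted(s for s in scores if low <= s <= high)
    if len(in_range) < min_sites or not any(s > mean_score for s in scores):
        return False, 0.0
    ranked = sorted(scores)
    # Empirical CDF of all scores, evaluated at each in-range score
    cdf = [bisect_right(ranked, s) / len(ranked) for s in in_range]
    r2 = r_squared(in_range, cdf)
    return r2 >= min_r2, r2
```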


The human gut SOS response: COG analysis

Results: detection of the canonical SOS regulon (lexA, recA, excision repair, recombination)

SOS meta-regulon composition
Four major functions:
Transcriptional repression (lexA)
Translesion synthesis (dinB, uvrX, imuB, umuD)
Sensing of DNA damage & stabilization (recA)
Excision repair (uvrA, uvrB, uvrD, pcrA)
Translesion synthesis as the primary SOS component

Interesting new putative SOS regulon COGs

COG0732 HsdS – restriction endonuclease

COG2001 MraZ – cell wall biogenesis

COG4974 CodV – chromosome partitioning

r²     Associated genes     COG
0.98   dinB, imuB, uvrX     COG0389
0.98   recA, imuA           COG0468
0.97   uvrB                 COG0556
0.96   lexA, umuD           COG1974
0.92   uvrD, pcrA           COG0210
0.91   mraZ                 COG2001
0.91   hsdS                 COG0732
0.91   uvrA                 COG0178
0.88   parE                 COG0187
0.87   codV                 COG4974
0.87   ruvA                 COG0632
0.86   recN                 COG0497

[Figure: normalized number of sites per COG for COG1974 (lexA, umuD), COG0389 (dinB, uvrX, imuB), COG0468 (recA, imuA), COG0556 (uvrB), COG0210 (uvrD, pcrA), COG0178 (uvrA), COG2001 (mraZ), COG0497 (recN), COG0187 (parE), COG0732 (hsdS), COG4974 (codV) and COG0632 (ruvA)]


The human gut SOS response: targeted gene analysis

Assessment of non-canonical functions in genes with high-scoring sites
Toxin-antitoxin / virulence systems (higB / rhuM): linked to persistence phenotypes
Phage integrases (intP): in line with integron integrase regulation and phage control by the SOS response

Validation of enriched COGs
Cell wall biogenesis (mraZ): possible role in cell division control
Evidence of convergent regulation: YneA (B. subtilis), DivS (C. glutamicum)

Experimental validation: EMSA with purified B. subtilis protein

[Figure: EMSA for recA, mraZ, intP and rhuM, each without (-) and with (+) protein]


Beyond the regulon

Proof of concept: the human gut SOS meta-regulon

Methodology
Provides the means to expand our knowledge of described regulatory systems
COG enrichment as a method for functional assessment of the meta-regulon
Analysis allows visualizing a regulatory response in a wild population

Inference of novel knowledge on regulon function and components
Consistent with known SOS responses; primary focus on mutagenesis
Contains several elements linking it to other cellular processes of clinical relevance

Future directions: analyze and compare regulatory networks in metagenomes
Is network evolution dictated by phylogeny or habitat?
How do changes in habitat affect meta-regulons?
How does the overlap between meta-regulons vary among populations?


Beyond the regulon

Automating meta-regulon inference

A transcription factor:
exists in a subset of species;
has binding sites that are enriched in a subset of functional clusters.

How can we automatically determine the set of species & COGs?

[Figure: LexA-binding site score distribution, shown as average score count in gene upstream regions vs. score (bits), for Firmicutes (SOS COGs), Firmicutes (random COGs) and all taxa, all COGs; two panels with different axis ranges]
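The slide contrasts site-score distributions for SOS COGs, random COGs and all taxa, which suggests treating the upstream scores of a COG or taxon as a mixture of background and SOS-like sites. As a sketch of that idea, the code below models a set of scores as a two-component Gaussian mixture, fits only the mixing fraction by EM, and reports a likelihood-ratio statistic against the background-only model. The SOS component reuses the lexA mean and standard deviation reported on the COG analysis slide (16.2 and 2.3 bits). The background component (5.0, 3.0) and the Gaussian form are hypothetical assumptions, not the author's model.

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

SOS = (16.2, 2.3)          # lexA site-score mean and sd (bits) from the slides
BACKGROUND = (5.0, 3.0)    # hypothetical background parameters (assumption)

def fit_mixing_fraction(scores, sos=SOS, bg=BACKGROUND, n_iter=500, tol=1e-10):
    """EM estimate of the SOS fraction; both component distributions stay fixed."""
    pi = 0.5
    for _ in range(n_iter):
        # E-step: posterior probability that each score comes from the SOS component
        resp = []
        for s in scores:
            a = pi * normal_pdf(s, *sos)
            b = (1 - pi) * normal_pdf(s, *bg)
            resp.append(a / (a + b) if a + b > 0 else 0.0)
        # M-step: the new mixing fraction is the mean responsibility
        new_pi = sum(resp) / len(resp)
        converged = abs(new_pi - pi) < tol
        pi = new_pi
        if converged:
            break
    return pi

def log_likelihood(scores, pi, sos=SOS, bg=BACKGROUND):
    return sum(math.log(max(pi * normal_pdf(s, *sos) + (1 - pi) * normal_pdf(s, *bg), 1e-300))
               for s in scores)

def lr_statistic(scores, sos=SOS, bg=BACKGROUND):
    """2 * (LL(mixture) - LL(background only)) and the fitted mixing fraction."""
    pi = fit_mixing_fraction(scores, sos, bg)
    return 2 * (log_likelihood(scores, pi, sos, bg) - log_likelihood(scores, 0.0, sos, bg)), pi
```

Because the null value π = 0 lies on the boundary of the parameter space, this statistic does not follow the usual chi-square distribution; a permutation- or simulation-based threshold would be needed to turn it into a p-value.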


Beyond the regulon

EM algorithm for isolation of enriched COGs/taxa

Define likelihood function: statistical test for a mixture model in the observed distribution

Assign weights to COGs (Ci) and taxa (Tj)
For given COG weights, compute the likelihood of each taxon and update its weight with that likelihood
For given taxon weights, compute the likelihood of each COG and update its weight with that likelihood

[Figure: four panels illustrating weight updates for COGs C1-C6 (weights C1 0.3, C2 0.2, C3 0.3, C4 0.7, C5 0.8, C6 0.1) and taxa T1-T6, with one taxon weight changing from 0.9 to 0.8 between iterations]
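The two update steps on this slide alternate between COG weights and taxon weights. The sketch below mirrors that alternating structure with a deliberately simple, hypothetical likelihood: each taxon's weight becomes the COG-weighted mean of an enrichment matrix (for example, the fraction of a COG's upstream regions in that taxon carrying a high-scoring site), and each COG's weight becomes the taxon-weighted mean. It is not the author's EM formulation, only an illustration of how the alternation concentrates weight on a co-enriched block of COGs and taxa.

```python
def alternate_weights(enrichment, n_iter=100, tol=1e-9, init=0.5):
    """enrichment[i][j]: hypothetical enrichment value in [0, 1] for COG i in taxon j."""
    n_cogs, n_taxa = len(enrichment), len(enrichment[0])
    cog_w = [init] * n_cogs
    taxon_w = [init] * n_taxa
    for _ in range(n_iter):
        # Given COG weights, score each taxon by its COG-weighted mean enrichment
        total_c = max(sum(cog_w), 1e-12)
        new_taxon_w = [sum(cog_w[i] * enrichment[i][j] for i in range(n_cogs)) / total_c
                       for j in range(n_taxa)]
        # Given taxon weights, score each COG by its taxon-weighted mean enrichment
        total_t = max(sum(new_taxon_w), 1e-12)
        new_cog_w = [sum(new_taxon_w[j] * enrichment[i][j] for j in range(n_taxa)) / total_t
                     for i in range(n_cogs)]
        delta = max(abs(a - b) for a, b in zip(cog_w + taxon_w, new_cog_w + new_taxon_w))
        cog_w, taxon_w = new_cog_w, new_taxon_w
        if delta < tol:
            break
    return cog_w, taxon_w
```

Because every update is a weighted mean of values in [0, 1], the weights stay in [0, 1]; COGs and taxa that are enriched together reinforce each other across iterations.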


Conclusions & Acknowledgements

Acknowledgements

Erill Lab: Joe Cornish, Neus Sanchez-Alberola, Pat O’Neill, Jameel Gheba, Ron O’Keefe, Talmo Pereira, David Nicholson

Wolf Lab: Richard Wolf, Lanyn Perez

Barbé Lab: Susana Campoy, Jordi Barbé

Funding: UMBC Office of Research – Special Research Assistantship/Initiative Support; NSF grant MCB-1158056


BIOLOGICAL SCIENCES

Beyond the regulon reconstructing the SOS response of the human gut microbiome

Ivan Erill