Queensland University of Technology CRICOS No. 00213J Using a Beagle to sniff for Bacterial Promoters Stefan R. Maetschke, Michael Towsey and James M. Hogan Queensland University of Technology


Post on 12-Jan-2016


Queensland University of Technology

CRICOS No. 00213J

Using a Beagle to sniff for Bacterial Promoters

Stefan R. Maetschke, Michael Towsey and James M. Hogan

Queensland University of Technology

CRICOS No. 00213J, a university for the real world

An Agenda

• Bacterial promoters
  – The domain and the motifs
  – Earlier approaches, including ours
• Why dumber is better
  – Not quite, but flexibility before sophistication
  – Exploiting new features as they are identified
• Results


Upstream from a Bacterial Gene

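[Figure: region upstream of a bacterial gene, labelled with the promoter, RNA polymerase, the transcription start site (TSS), the direction of transcription, the gene start (GSS) and the gene.]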

• Search for ‘conserved’ -10 and -35 hexamers
  – Except they’re not really conserved
  – Plagued by massive false positive rates
• But this is the Reader’s Digest version
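A minimal sketch of why a bare consensus search is noisy: because the hexamers are only loosely conserved, a scanner has to tolerate mismatches, and every tolerated mismatch admits more spurious hits. The function names, the mismatch threshold and the example sequence are illustrative assumptions, not taken from the talk.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def scan_hexamer(seq: str, consensus: str, max_mismatches: int) -> list[int]:
    """Return 0-based start positions where seq matches consensus
    with at most max_mismatches mismatches."""
    k = len(consensus)
    return [
        i
        for i in range(len(seq) - k + 1)
        if hamming(seq[i:i + k], consensus) <= max_mismatches
    ]


# Hypothetical sequence, made up for illustration only.
seq = "GCTTGACATTTAGGCTATAATGCTTTACAGG"
exact = scan_hexamer(seq, "TTGACA", 0)   # exact consensus hits only
loose = scan_hexamer(seq, "TTGACA", 2)   # hits allowing up to two mismatches
print(len(exact), "exact hit(s);", len(loose), "hit(s) with two mismatches allowed")
```

Every exact hit is also a loose hit, so relaxing the threshold can only increase the number of candidates that need to be filtered by other evidence.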


Previous Work

• Mainly in the E. coli system
• PWMs – simple, but poor discrimination
  – Good performance if compound structure used
  – (Collado-Vides et al.: state of the art pre-2006)
• HMMs – less successful than in eukaryotes
• TDNNs – boosted by GSS offset distribution
• SVMs – spectrum kernel ensemble
  – (Gordon et al. (us): state of the art, but at a price)
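For readers unfamiliar with the spectrum kernel mentioned above, here is a minimal sketch of a single k-mer spectrum kernel: each sequence is represented by its k-mer counts and the kernel is the inner product of those count vectors. This shows the general technique only; it is not the ensemble used by Gordon et al., and the function names are assumptions.

```python
from collections import Counter


def spectrum(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a: str, b: str, k: int) -> int:
    """Inner product of the k-mer count vectors of a and b."""
    sa, sb = spectrum(a, k), spectrum(b, k)
    return sum(count * sb[kmer] for kmer, count in sa.items())
```

A kernel of this form can be handed to any kernel-based classifier. The value is large when two sequences share many k-mers, regardless of where in the sequence they occur.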

σ70


Beagle

• Principled and rapid inclusion of motifs as they are discovered or hypothesised
  – Prior to the Gordon et al. paper, a TP:FP ratio of 1:300 was considered good
  – But this was based solely on -10 and -35 motifs
• A model description language and parser
  – Less sophisticated than it sounds, but sufficient
• Iterative refinement of the model


Upstream from a Bacterial Gene

[Figure: upstream region showing the -35 element (consensus TTGACA), the -10 element (consensus TATAAT), the TSS and the GSS (ATG).]

Core Enzyme:

Specific sigma controls binding at -10, -35 elements

But binding probability varies enormously

Compensate when hexamers are weak

“It has long been known that domains 2 and 4 … bind to the strongly conserved -10 and -35 boxes”. Except when they don’t because they aren’t…


Upstream from a Bacterial Gene

[Figure: upstream region showing the -35 element (TTGACA), the TRTG motif of the extended -10 element, the -10 element (TATAAT), the TSS and the GSS (ATG).]

Simple extended -10 (TG): discovered in B. subtilis, found in 20% of promoters in E. coli

-16 hypothesised to be important in E. coli, with TRTG or T(AG)TG consensus

But even the alpha subunits aren’t what they seem…
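The TRTG consensus uses IUPAC ambiguity codes, where R stands for A or G, which is why the slide also writes it as T(AG)TG. A minimal sketch of how such a consensus can be searched for, assuming a small hand-written code table and hypothetical function names:

```python
import re

# Subset of the standard IUPAC nucleotide codes that appear on the slides.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]",    # purine
    "W": "[AT]",    # weak pair
    "N": "[ACGT]",  # any base
}


def iupac_to_regex(motif: str) -> str:
    """Translate an IUPAC consensus into a regular expression."""
    return "".join(IUPAC[base] for base in motif)


def find_motif(seq: str, motif: str) -> list[int]:
    """0-based start positions of all (possibly overlapping) matches."""
    # A lookahead is used so that overlapping occurrences are all reported.
    pattern = re.compile("(?=" + iupac_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(seq)]
```

For example, `iupac_to_regex("TRTG")` gives `T[AG]TG`, so both TATG and TGTG count as matches.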


Upstream from a Bacterial Gene

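[Figure: upstream region showing the distal and proximal UP elements (with CTD1, CTD2, NTD1 and NTD2 labelled), the -35 element (TTGACA), the TRTG motif at -16, the extended -10 element (TGTATAAT), the TSS and the GSS (ATG). UP element sequence labels: AAAAAARNR and AWWWWWTTTTT.]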

CTDs are carboxy terminal domains, binding to UP elements

AT-rich region, proximal element more important


The Data

• E. coli and B. subtilis
• Confirmed TSS locations within 250 bp of the nearest gene start
  – No overlapping reading frames
• N = 492 (E. coli), 205 (B. subtilis)
• 250 bp USRs available
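A minimal sketch of the two selection steps described above, under stated assumptions: a USR is taken to be the 250 bp immediately upstream of the gene start, coordinates are 0-based, and only the forward strand is handled. The function names are hypothetical and the talk does not describe how the data set was actually built.

```python
def upstream_region(genome: str, gene_start: int, length: int = 250) -> str:
    """Return up to `length` bases immediately upstream of gene_start
    (0-based index of the first base of the gene, forward strand only)."""
    return genome[max(0, gene_start - length):gene_start]


def tss_within_range(tss: int, gene_start: int, max_dist: int = 250) -> bool:
    """True if the TSS lies upstream of the gene start and no more than
    max_dist bases away from it."""
    return 0 < gene_start - tss <= max_dist
```

A reverse-strand gene would need the reverse complement of the region downstream of its start, and the overlapping reading frame filter is omitted here.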


Beagle algorithm

• Define a consensus promoter
  – e.g. <TTGACA (15, 21) TATAAT (4, 13) TSS>
  – Ordered pairs specify gap ranges
• Parse the description and define PWMs and weighted gaps
  – Initially trivial
• Refine using the confirmed TSS locations
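The description format above is simple enough to tokenise with one regular expression. The following is a sketch of a parser for the example pattern only; the real Beagle grammar is not given in the slides, so the class names, the token rules and the treatment of `TSS` as an anchor are all assumptions.

```python
import re
from dataclasses import dataclass


@dataclass
class Motif:
    consensus: str  # e.g. "TTGACA"


@dataclass
class Gap:
    min_len: int  # smallest allowed spacer length
    max_len: int  # largest allowed spacer length


@dataclass
class Anchor:
    name: str  # e.g. "TSS"


# Either an ordered pair "(min, max)" or a run of letters.
TOKEN = re.compile(r"\(\s*(\d+)\s*,\s*(\d+)\s*\)|([A-Za-z]+)")


def parse_pattern(text: str) -> list:
    """Parse a description such as '<TTGACA (15, 21) TATAAT (4, 13) TSS>'."""
    body = text.strip()
    if not (body.startswith("<") and body.endswith(">")):
        raise ValueError("pattern must be enclosed in <...>")
    elements = []
    for m in TOKEN.finditer(body[1:-1]):
        if m.group(1) is not None:
            lo, hi = int(m.group(1)), int(m.group(2))
            if lo > hi:
                raise ValueError(f"empty gap range ({lo}, {hi})")
            elements.append(Gap(lo, hi))
        elif m.group(3) == "TSS":
            elements.append(Anchor("TSS"))
        else:
            elements.append(Motif(m.group(3).upper()))
    return elements
```

Each `Motif` would then be turned into an initial PWM and each `Gap` into a distribution over spacer lengths, which is the "initially trivial" model that the refinement step improves.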


Beagle algorithm

• For each USR in the training set:
  – Anchor the pattern to the known TSS location
  – Determine the best match based on the current model
• Find the MLE of the model parameters based on the best matches from the training data
• Test the refined definition on unseen data
  – 10 repeats × 10-fold cross-validation
  – Essentially TSS prediction
• Iterate until improvement ceases
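The refinement loop above can be sketched as follows for the two canonical boxes. This is an illustration under several assumptions rather than the authors' implementation: sequences contain only A, C, G and T, gap lengths are unweighted, a pseudocount smooths the frequency estimates (so they are not strictly maximum likelihood), and a fixed number of rounds replaces the "until improvement ceases" stopping rule. All names are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"


def consensus_pwm(consensus: str, p_match: float = 0.7) -> list[dict[str, float]]:
    """Initial 'trivial' PWM: the consensus base gets p_match,
    the other three bases share the remainder equally."""
    rest = (1.0 - p_match) / 3
    return [{b: (p_match if b == c else rest) for b in BASES} for c in consensus]


def log_score(pwm, site: str) -> float:
    """Log-probability of a site under a PWM."""
    return sum(math.log(col[base]) for col, base in zip(pwm, site))


def best_match(seq, tss, pwm35, pwm10, gap35_10=(15, 21), gap10_tss=(4, 13)):
    """Anchor the pattern at the known TSS index and return
    (score, start35, start10) for the best placement, or None."""
    best = None
    for g2 in range(gap10_tss[0], gap10_tss[1] + 1):
        start10 = tss - g2 - len(pwm10)
        for g1 in range(gap35_10[0], gap35_10[1] + 1):
            start35 = start10 - g1 - len(pwm35)
            if start35 < 0:
                continue  # pattern would run off the start of the sequence
            score = (log_score(pwm35, seq[start35:start35 + len(pwm35)])
                     + log_score(pwm10, seq[start10:start10 + len(pwm10)]))
            if best is None or score > best[0]:
                best = (score, start35, start10)
    return best


def reestimate(sites: list[str], pseudocount: float = 1.0):
    """Smoothed column-wise base frequencies of the aligned best matches."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(len(sites[0])):
        counts = Counter(site[i] for site in sites)
        pwm.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return pwm


def refine(training, pwm35, pwm10, rounds: int = 5):
    """training: list of (sequence, tss_index) pairs."""
    for _ in range(rounds):
        found = [(seq, best_match(seq, tss, pwm35, pwm10)) for seq, tss in training]
        found = [(seq, m) for seq, m in found if m is not None]
        if not found:
            break
        pwm35 = reestimate([seq[m[1]:m[1] + len(pwm35)] for seq, m in found])
        pwm10 = reestimate([seq[m[2]:m[2] + len(pwm10)] for seq, m in found])
    return pwm35, pwm10
```

A fuller version would also re-estimate the gap-length weights from the chosen placements and would stop when accuracy on held-out data no longer improves.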


TSS recognition (% accuracy)

Pattern                        E. coli         B. subtilis
Canonical -35, -10 boxes       37.5 ± 1.4 %    61.6 ± 1.8 %
Canonical + distance to GSS    43.3 ± 1.2 %    61.2 ± 1.7 %

Guess which promoter boxes are more strongly conserved…
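The accuracies come from 10 repeats of 10-fold cross-validation. A minimal sketch of how such splits and a mean ± spread summary can be produced with the standard library; the slides do not say whether the ± values are standard deviations or standard errors, so the use of the sample standard deviation here is an assumption, as are the function names.

```python
import random
import statistics


def repeated_kfold(n_items: int, k: int = 10, repeats: int = 10, seed: int = 0):
    """Yield (train_indices, test_indices) for `repeats` shuffled k-fold splits."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n_items))
        rng.shuffle(order)
        folds = [order[i::k] for i in range(k)]  # k nearly equal folds
        for i, test in enumerate(folds):
            train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
            yield train, test


def summarise(accuracies: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation of a list of accuracies."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)
```

With the E. coli set (N = 492) this yields 100 train/test splits, each of which would give one accuracy figure to pass to `summarise`.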


Including UP elements

• NNW15NN
  – AT rich region
• NNAAAWWTWTTNNAAANNN
  – Estrem et al. 1998
• NNAAAWWTWTTN – A6RNR
  – Gourse et al. 2000
  – distal - proximal motif
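The patterns above use a repeat shorthand, where W15 means fifteen consecutive W (A or T) positions and A6 means six A's. A small sketch of expanding that shorthand and of measuring AT content in a window; the helper names are assumptions, not part of Beagle.

```python
import re


def expand_repeats(motif: str) -> str:
    """Expand shorthand such as 'W15' or 'A6' into repeated letters."""
    return re.sub(r"([A-Z])(\d+)", lambda m: m.group(1) * int(m.group(2)), motif)


def at_fraction(window: str) -> float:
    """Fraction of A/T bases in a window (0.0 for an empty window)."""
    if not window:
        return 0.0
    return sum(base in "AT" for base in window) / len(window)
```

For example, `expand_repeats("A6RNR")` gives `AAAAAARNR`. The expanded string can then be treated like any other IUPAC consensus.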


TSS recognition (% accuracy)

Pattern                                         E. coli         B. subtilis
Canonical boxes + distance to GSS               43.3 ± 1.2 %    61.2 ± 1.7 %
Canonical + distance to GSS + Estrem UP         41.4 ± 1.2 %    62.0 ± 1.7 %
Canonical + distance to GSS + AT rich region    47.3 ± 1.2 %    64.8 ± 1.8 %


Comparing E. coli and B. subtilis promoters

[Figure: four panels comparing the B. subtilis -35 element, the B. subtilis -10 element, the E. coli -35 element and the E. coli -10 element.]

E. coli has 7 known sigmas; B. subtilis has 18…


Motifs ‘in the Gap’

• Extended -10 element
  – Consensus TGTATAAT
  – Strongly implicated in B. subtilis
  – Hypothesised as significant in 20% of E. coli promoters
• Extended -16 element
  – Consensus TRTG


TSS recognition (% accuracy)

Pattern                                             E. coli         B. subtilis
Canonical boxes + distance to GSS                   43.3 ± 1.2 %    61.2 ± 1.7 %
Canonical + distance to GSS + TG extended -10       41.6 ± 1.3 %    62.5 ± 1.8 %
Canonical + distance to GSS + TRTG extended -10     37.6 ± 1.3 %    62.6 ± 1.8 %


The Complete Picture

[Figure: the complete picture. Labels: -10 and -35 elements, σ70, NTD, CTD, CTDII at positions -40.5, -52, -62 and -72 (variable location), UP element, AT rich.]


TSS recognition (% accuracy)

Pattern                                                              E. coli         B. subtilis
Canonical boxes + distance to GSS                                    43.3 ± 1.2 %    61.2 ± 1.7 %
Canonical + distance to GSS + TG extended -10 + AT rich region       48.3 ± 1.5 %    68.8 ± 1.6 %
Canonical + distance to GSS + TRTG extended -10 + AT rich region     40.5 ± 1.4 %    71.2 ± 1.7 %


TSS recognition (% accuracy)

               Baseline    + AT rich    + extended -10    + both
E. coli        43.3%       47.3%        41.6% (TG)        48.3%
B. subtilis    61.2%       64.8%        62.6% (TRTG)      71.2%

Baseline: canonical boxes + distance to GSS.


Conclusions

• Beagle provides a simple bridge between experiment and computational discovery
  – Is the extended -16 motif really important in E. coli?
  – (Well, not in any general sense)
• Fast, robust and flexible
• Extensions
  – Combination of model organisms
  – Comparative genomics & regulation