MrBUMP – Molecular Replacement with Bulk Model Preparation

Ronan Keegan, Martyn Winn

CCP4 group, Daresbury Laboratory

Como May 23rd 2006

The aim of MrBUMP

• An automation framework for Molecular Replacement.
• Particular emphasis on generating a variety of search models.
• Can be used to generate models only.

• Wraps Phaser and/or Molrep.
• Also uses a variety of helper applications (e.g. Chainsaw) and bioinformatics tools (e.g. Fasta, Mafft).
• Uses on-line databases (e.g. PDB, Scop).

• In favourable cases, gives a “one-button” solution.
• In unfavourable cases, will suggest likely search models for manual investigation (lead generation).

[Workflow diagram: Target MTZ & Sequence -> Target Details]

• Currently calculated:
– Number of residues and molecular weight
– Matthews coefficient
– Estimated number of molecules in the a.s.u.
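The target details listed above can be illustrated with a small calculation. This is a minimal sketch, not MrBUMP's own code: it assumes the standard definition V_M = V_cell / (Z × n × MW), the common solvent-content approximation 1 − 1.23 / V_M, a rough figure of 110 Da per residue, and a hypothetical rule that picks the copy number whose solvent content is closest to 50%. All function names and the example cell are invented for illustration.

```python
def approx_mol_weight(n_residues, da_per_residue=110.0):
    """Rough molecular weight in Da (assumption: ~110 Da per residue)."""
    return n_residues * da_per_residue


def matthews_coefficient(cell_volume, n_symops, n_copies, mol_weight):
    """V_M in cubic Angstrom per Dalton for n_copies molecules per a.s.u."""
    return cell_volume / (n_symops * n_copies * mol_weight)


def solvent_fraction(vm):
    """Approximate solvent content from V_M (1.23 is the usual protein constant)."""
    return 1.0 - 1.23 / vm


def estimate_copies(cell_volume, n_symops, mol_weight, max_copies=12):
    """Return (n_copies, V_M, solvent) with solvent closest to 50%.

    Hypothetical selection rule, for illustration only.
    Returns None if even one copy does not fit in the cell.
    """
    best = None
    for n in range(1, max_copies + 1):
        vm = matthews_coefficient(cell_volume, n_symops, n, mol_weight)
        solvent = solvent_fraction(vm)
        if solvent <= 0.0:
            break  # more copies than the cell can physically hold
        candidate = (abs(solvent - 0.5), n, vm, solvent)
        if best is None or candidate < best:
            best = candidate
    return None if best is None else best[1:]


if __name__ == "__main__":
    # Invented example: 60 x 70 x 80 A orthorhombic cell, 4 symmetry operators,
    # 200-residue target.
    mw = approx_mol_weight(200)
    print(estimate_copies(60.0 * 70.0 * 80.0, 4, mw))
```

For the invented cell above, one copy gives about 68% solvent and two copies about 36%, so the sketch reports two molecules in the a.s.u.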

[Workflow diagram: Target MTZ & Sequence -> Target Details -> Model Search]

Model Search: generate a list of structures that are possible templates for search models.

Search for homologous proteins

• FASTA search of PDB
– Sequence-based search using the sequence of the target structure.
– Can be run locally if the user has the fasta34 program installed, or remotely using the OCA web-based service hosted by the EBI.
– Local search is done against the complete list of PDB sequences derived from ATOM records in the PDB structure files.
– All of the resulting PDB id codes are added to a list.
– The alignment to the target is not used at this stage.

Search for similar structures

• Secondary structure based search (optional)
– Top hit from the FASTA search is used as the template structure for a secondary structure based search.
– Uses the SSM web service provided by the EBI.
– Any new structures found that are not already in the list of matches from the FASTA search are added to the list.
– Provides structural variation that is not based on direct sequence similarity to the target.

• Manual addition
– Additional PDB id codes can be added to the list, e.g. from FFAS or PSI-BLAST searches.
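The way the template list grows across the FASTA search, the SSM search and manual additions can be sketched as an ordered, de-duplicated merge. This is an illustrative sketch only: the function name and the lower-casing of id codes are assumptions, and no real FASTA or SSM call is made here.

```python
def build_template_list(fasta_hits, ssm_hits=(), manual_ids=()):
    """Merge PDB id codes from several sources, keeping first-seen order.

    Hypothetical helper: ids are normalised to lower case so that
    "1ABC" and "1abc" count as the same entry.
    """
    seen = set()
    templates = []
    for source in (fasta_hits, ssm_hits, manual_ids):
        for pdb_id in source:
            key = pdb_id.strip().lower()
            if key and key not in seen:
                seen.add(key)
                templates.append(key)
    return templates


if __name__ == "__main__":
    # Invented id codes, for illustration only.
    print(build_template_list(["1ABC", "2xyz"], ["2XYZ", "3def"], ["4ghi"]))
```

FASTA hits come first, so structures found only by SSM or added by hand are appended without displacing the sequence-based matches.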

Multiple Alignment

• After the set of PDB ids has been collected in the FASTA and SSM searches, their coordinate-based sequences are collected and put through a multiple alignment with the target sequence.

• Aims:
– Score template structures in a consistent manner, in order to prioritise them for subsequent steps.
– Extract the pairwise alignment between template and target for use in the Chainsaw step. The multiple alignment should give a better set of alignments than the original pairwise FASTA alignments.
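Extracting a pairwise alignment from a multiple alignment amounts to taking two rows and dropping the columns where both are gaps. The sketch below shows that idea under simple assumptions: the alignment is held as a dictionary of equal-length strings with "-" as the gap character, and the identifiers are invented. It is not the parser MrBUMP uses for ClustalW or MAFFT output.

```python
def extract_pairwise(msa, target_id, template_id, gap="-"):
    """Return (target_row, template_row) with all-gap columns removed.

    msa is assumed to map sequence ids to aligned rows of equal length.
    """
    target_row = msa[target_id]
    template_row = msa[template_id]
    if len(target_row) != len(template_row):
        raise ValueError("aligned rows must have the same length")
    kept = [(t, m) for t, m in zip(target_row, template_row)
            if not (t == gap and m == gap)]
    return ("".join(t for t, _ in kept), "".join(m for _, m in kept))


if __name__ == "__main__":
    # Invented three-sequence alignment, for illustration only.
    msa = {
        "target": "MK-TA--LV",
        "1abc_A": "MR-T-G-LV",
        "2xyz_B": "MKQTAGHLV",
    }
    print(extract_pairwise(msa, "target", "1abc_A"))
```

Columns that are gapped in both rows exist only because of the third sequence, so removing them leaves a valid pairwise alignment of the same residues.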

Multiple Alignment

[Figure: multiple alignment displayed in Jalview 2.08.1 (Barton group, Dundee), with the target, the model templates and a pairwise alignment labelled]

ClustalW or MAFFT are currently supported for multiple alignment.

Template Model Scoring

• Sequence identity:
– Ungapped sequence identity, i.e. the sequence identity over aligned target residues.
• Alignment quality:
– Depends on the alignment length, the number of gaps created in the template alignment and the extent of each of these gaps.
– The penalties for the number and size of gaps are biased so that alignments that preserve domains of the structure score higher than those that spread the aligned residues out.
• Alignment scoring:

score = sequence identity × alignment quality

The top scoring models are then used for further processing.
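The scoring rule can be sketched as follows. The ungapped identity follows the definition on the slide (identical residues divided by aligned positions). The alignment quality term is a placeholder: the slide does not give its formula, so the coverage term and the gap penalty weights below are invented purely to show the shape of the calculation and are not MrBUMP's values.

```python
import re


def ungapped_identity(target_row, template_row, gap="-"):
    """Fraction of identical residues over columns where both rows have a residue."""
    aligned = identical = 0
    for t, m in zip(target_row, template_row):
        if t != gap and m != gap:
            aligned += 1
            if t == m:
                identical += 1
    return identical / aligned if aligned else 0.0


def alignment_quality(target_row, template_row, gap="-",
                      gap_open=0.1, gap_extend=0.01):
    """Hypothetical quality term: target coverage divided by a gap penalty.

    gap_open and gap_extend are made-up weights, not MrBUMP's.
    """
    target_len = sum(1 for t in target_row if t != gap)
    if target_len == 0:
        return 0.0
    aligned = sum(1 for t, m in zip(target_row, template_row)
                  if t != gap and m != gap)
    gap_runs = re.findall(re.escape(gap) + "+", template_row)
    penalty = (1.0 + gap_open * len(gap_runs)
               + gap_extend * sum(len(run) for run in gap_runs))
    return (aligned / target_len) / penalty


def template_score(target_row, template_row):
    """score = sequence identity x alignment quality."""
    return (ungapped_identity(target_row, template_row)
            * alignment_quality(target_row, template_row))


if __name__ == "__main__":
    print(template_score("MKTA-LV", "MRT-GLV"))
    print(template_score("MKTALV", "MRTALV"))
```

With these placeholder weights, a template aligned without gaps outranks one with similar identity whose alignment is broken by gaps, which is the behaviour the slide describes.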

Domains

• Suitable templates for target domains may exist in isolation in the PDB, or in combination with dissimilar domains.

• Where there is relative domain motion, it may be preferable to solve the domains separately.

Domains

• Domains search:
– Top scoring templates from the multiple alignment are tested to see if they contain any domains.
– Uses the SCOP database. This only lists domains that appear more than once in the PDB.
– The database is scanned to see if domains exist for each of the PDBs in the list of templates.
– Domains are then extracted from the parent PDB structure file and added to the list of template models as additional search models for MR.
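Extracting a domain from its parent structure can be sketched as a filter on ATOM records by chain and residue range, using the fixed columns of the PDB format (chain ID in column 22, residue number in columns 23 to 26). The domain boundaries would come from SCOP; here they are simply passed in, and the function name is hypothetical.

```python
def extract_domain(pdb_lines, chain_id, first_res, last_res):
    """Keep ATOM records of one chain whose residue number is in range.

    Assumes standard fixed-column PDB records; insertion codes are ignored.
    """
    kept = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        if line[21] != chain_id:
            continue
        res_seq = int(line[22:26])
        if first_res <= res_seq <= last_res:
            kept.append(line)
    return kept


if __name__ == "__main__":
    # Hypothetical file name and domain boundaries, for illustration only.
    with open("parent.pdb") as handle:
        domain = extract_domain(handle.read().splitlines(), "A", 1, 120)
    print(len(domain), "atoms kept")
```

A domain made of several segments could be handled by calling the function once per segment and concatenating the results.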

Multimers

• Multimer search:
– Search for quaternary structures that may be used as search models.
– Better signal-to-noise ratio than a monomer, if the assembly is correct for the target.
– Multimeric structures based on top templates are retrieved using the PQS service at the EBI, and added to the list of search models.
– PQS will soon be replaced by the use of the PISA service at the EBI (Eugene Krissinel).

1n5a SPLIT-ASU into 4 Oligomeric files of type TRIMERIC
1n5b SPLIT-ASU into 2 Oligomeric files of type DIMERIC
1n5c SYMMETRY-COMPLEX Oligomeric file of type DIMERIC
1n5d SYMMETRY-COMPLEX Oligomeric file of type DIMERIC

[Workflow diagram: Target MTZ & Sequence -> Target Details -> Model Search -> Model Preparation]

Search Model Preparation

Search models are prepared in four ways:

1. PDBclip
– Original PDB with waters removed, hydrogens removed, most probable conformations for side chains selected and chain IDs added if missing.
2. Molrep
– Molrep contains a model preparation function which aligns the template sequence with the target sequence and prunes the non-conserved side chains accordingly.
3. Chainsaw
– Can be given any alignment between the target and template sequences.
– Non-conserved residues are pruned back to the gamma atom.
4. Polyalanine
– Created by excluding all of the side chain atoms beyond the CB atom using the Pdbset program.
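The polyalanine preparation can be illustrated with a few lines that keep only the main-chain atoms and CB of each residue. This is a sketch of the idea, not a reproduction of Pdbset: it assumes fixed-column PDB records (atom name in columns 13 to 16), leaves residue names unchanged, and passes every non-ATOM record through untouched. The function name is hypothetical.

```python
# Atoms retained for a polyalanine model: main chain plus CB.
BACKBONE_PLUS_CB = {"N", "CA", "C", "O", "CB"}


def to_polyalanine(pdb_lines):
    """Drop side-chain atoms beyond CB from ATOM records.

    Non-ATOM records (HETATM, CRYST1, END, ...) are returned unchanged.
    Residue names are not rewritten to ALA in this sketch.
    """
    kept = []
    for line in pdb_lines:
        if line.startswith("ATOM"):
            atom_name = line[12:16].strip()
            if atom_name in BACKBONE_PLUS_CB:
                kept.append(line)
        else:
            kept.append(line)
    return kept


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    with open("template.pdb") as handle:
        trimmed = to_polyalanine(handle.read().splitlines())
    with open("template_polyala.pdb", "w") as handle:
        handle.write("\n".join(trimmed) + "\n")
```

Glycine has no CB, so its records pass through with the four main-chain atoms only.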

Search Model Preparation

Ensemble for Phaser:

• Top scoring search models are “superposed” to create an ensemble model.

• This may provide a better search model than any of the individual models on their own.

• Currently the default is to use the top 5 scoring search models; the plan is to choose the ensemble members dynamically, based on the MW and RMSDs of the constituent search models.

[Workflow diagram: Target MTZ & Sequence -> Target Details -> Model Search -> Model Preparation -> Molecular Replacement & Refinement]

Molecular Replacement and Refinement

• The search models can be processed with Molrep or Phaser or both.

• The resulting models from molecular replacement are passed to Refmac for restrained refinement.

• The change in the Rfree value during refinement is used to determine how good the resulting model is.

• If the final value for Rfree is less than 0.35, or it is less than 0.5 and has fallen by more than 20% from the initial Rfree, a solution is deemed to have been found.

• Models that produce an Rfree below 0.5 that appears to be falling are highlighted as “marginal solutions”, worthy of further investigation if no solution is found using the other search models.
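The Rfree criteria above can be written as a small decision function. This is a sketch under two stated assumptions: "fallen by more than 20%" is read as a relative drop from the initial Rfree, and "looks to be falling" is reduced to the final value being lower than the initial one. The function name and return labels are hypothetical.

```python
def classify_rfree(initial_rfree, final_rfree):
    """Classify a refined MR result from its initial and final Rfree.

    Assumption: the 20% fall is relative, i.e.
    (initial - final) / initial > 0.20.
    """
    relative_drop = (initial_rfree - final_rfree) / initial_rfree
    if final_rfree < 0.35 or (final_rfree < 0.5 and relative_drop > 0.20):
        return "solution"
    if final_rfree < 0.5 and final_rfree < initial_rfree:
        return "marginal"
    return "no solution"


if __name__ == "__main__":
    for start, end in [(0.45, 0.34), (0.49, 0.38), (0.52, 0.47), (0.53, 0.529)]:
        print(start, end, classify_rfree(start, end))
```

Under this reading, an Rfree that starts at 0.52 and ends at 0.47 is flagged as marginal rather than solved, because the final value is below 0.5 but the fall is under 20%.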

[Workflow diagram: Target MTZ & Sequence -> Target Details -> Model Search -> Model Preparation -> Molecular Replacement & Refinement]

Serial mode: check scores and exit, or select the next model.

[Workflow diagram: Target MTZ & Sequence -> Target Details -> Model Search -> Model Preparation -> five Molecular Replacement & Refinement jobs side by side]

Parallel mode: start multiple MR jobs and exit when one finds a solution.
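The difference between the serial and parallel modes can be sketched with Python's standard library. Here run_job is a hypothetical callable that runs molecular replacement and refinement for one search model and returns True if a solution was found; local threads stand in for whatever job submission mechanism is actually used.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_serial(models, run_job):
    """Try search models one at a time; stop at the first solution."""
    for model in models:
        if run_job(model):
            return model
    return None


def run_parallel(models, run_job, workers=4):
    """Start several jobs at once; return the first model that gives a solution.

    Jobs that have not started yet are cancelled once a solution is found;
    jobs already running are allowed to finish before the pool shuts down.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(run_job, model): model for model in models}
        for future in as_completed(futures):
            if future.result():
                for other in futures:
                    other.cancel()
                return futures[future]
    return None


if __name__ == "__main__":
    # Invented model names and a stand-in job, for illustration only.
    models = ["model_1", "model_2", "model_3", "model_4"]
    print(run_serial(models, lambda m: m == "model_3"))
    print(run_parallel(models, lambda m: m == "model_3"))
```

Serial mode spends no effort on models ranked below the first success, while parallel mode trades extra compute for a shorter wait when the successful model is not the top-ranked one.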

MrBUMP on compute clusters

• MrBUMP can take advantage of a compute cluster to farm out the Molecular Replacement jobs.

• Currently Sun Grid Engine enabled clusters are supported; support for LSF, Condor and other queuing systems will be added if there is enough demand.

Pre-release version of MrBUMP

• Pre-release made available in Jan 06
• Simple installation
• Currently runs on Linux and OSX.
• Windows version almost ready.
• Comes with CCP4 GUI.
• Can also be run from the command line with keyword input
• Good deal of interest and some successes
• Regular updates (currently version 0.3)

http://www.ccp4.ac.uk/MrBUMP

Example 1

1vlw: 3 chains of 205 aa. Data in C2221 to 2.3 Å. Using Molrep.

Search model     Seq id. (%)  Contrast  CC             Refmac Rfree   ARP/wARP
1fq0_C_CHNSAW    31.8         3.25      0.342 / 0.318  0.504 / 0.480
1fq0_C_MOLREP    31.8         1.59      0.376 / 0.369  0.521 / 0.476
1fq0_B_CHNSAW    31.8         2.28      0.336 / 0.320  0.523 / 0.499
1fq0_B_MOLREP    31.8         1.37      0.358 / 0.357  0.530 / 0.529
1fq0_A_CHNSAW    31.8         4.53      0.345 / 0.308  0.526 / 0.466  239 (29), R=0.202; 23 chains, conn = 0.81
1fq0_A_MOLREP    31.8         1.61      0.352 / 0.350  0.527 / 0.479

Example 2

Anon:

Search model     Seq id.  RMSD   Contrast / RFZ  CC / TFZ       Refmac Rfree   ARP/wARP
xxx_A_MOLREP     57.6     2.831  7.25            0.436 / 0.419  0.570 / 0.541  yes
                                 Z=3.6           Z=3.2          0.537 / 0.491  yes
xxx_A_CHAINSAW   57.6     2.833  3.42            0.442 / 0.423  0.536 / 0.535  no
                                 Z=4.0           Z=3.3          0.542 / 0.536  no
yyy_A_MOLREP     57.6     2.863  3.21            0.450 / 0.434  0.567 / 0.546  no
                                 Z=4.6           Z=2.9          0.545 / 0.498  yes
yyy_A_CHAINSAW   57.6     2.853  2.05            0.449 / 0.434  0.544 / 0.531  no
                                 Z=4.3           Z=2.8          0.560 / 0.547  no
yyy_B_MOLREP     57.6     3.106  3.49            0.455 / 0.433  0.535 / 0.519  yes
                                 Z=4.6           Z=2.4          0.529 / 0.503  yes
yyy_B_CHAINSAW   57.6     2.851  1.76            0.447 / 0.444  0.531 / 0.526  no
                                 Z=2.9           Z=3.2          0.538 / 0.529  yes

yes = ARP/wARP builds and docks entire molecule
no = ARP/wARP fails = wrong MR solution

[Slide callouts on table rows: “MrBUMP marginal solution”; “solution used”]

A few observations ...

• In difficult cases, success in MrBUMP may depend on the particular template, chain and model preparation method.
• Nevertheless, may get several putative solutions.
• Ease of subsequent model re-building and model completion may depend on the choice of solution.

• First solution or check everything?
• Expectation was that a quick solution is required; in fact, most users seem happy to let MrBUMP run for a long time (hours, days).

• Worth checking “failed” solutions!

Future developments

• Windows support (almost done)
• Complexes (in progress)
– Processing of multiple target sequences
• Improved alignment:
– Multiple alignment against larger sequence database
– Alignment from profile-based search
– User-supplied alignment
• Incorporate PISA multimer determining service (in progress)
• Model generation:
– Identification of flexible loops
– Normal mode generated conformations
• Develop web-service version to allow CCP4i users to run jobs on CCP4 cluster

Acknowledgements

• Ronan Keegan, CCP4 @ Daresbury
• Thanks to authors of all underlying programs and services
• Other suggestions from:
– Dave Meredith, Graeme Winter, Daresbury Laboratory
– Eugene Krissinel, EBI, Cambridge
– Eleanor Dodson, YSBL, York University
– Geoff Barton, Charlie Bond, University of Dundee
– Randy Read, Airlie McCoy, Cambridge
• Funding:
– BBSRC (e-HTPX, CCP4)

http://www.ccp4.ac.uk/MrBUMP