new building biomolecular models in amber · 2020. 9. 30. · workflow for building 1uf0 –pdb...
Building Biomolecular Models in AMBER
Concepts and practicalities
AMBER concepts
What you need to know to get started
The AMBER Suite of Programs
AMBERTOOLS is a free software toolkit for building,
running and analysing MD simulations of biomolecules.
antechamber: For parameterisation of non-standard
residues or small molecules.
xLeap: Associates a biomolecular structure with an MD
force-field and creates the atomistic model.
sander/sander.MPI: Runs MD on a single CPU or, with the MPI
version, on multiple CPUs.
pmemd/pmemd.MPI/pmemd.cuda: Runs MD on a single CPU, on
multiple CPUs (MPI version), or on a GPU (cuda version).
Requires an AMBER licence.
cpptraj: Toolkit for manipulating and analysing MD
trajectories.
https://ambermd.org/Manuals.php
Using the AMBER Leap Module
There are two ways of using the Leap module:
tLeap
Command line only. Will read a list of Leap instructions
from a file, e.g.:
$ tleap -f leapscript
xLeap
Interactive GUI enabling you to visualise your protein as
you build it. You need to type in each command sequentially.
Using xLeap
The xLeap module connects the protein structure
with the AMBER force-field.
AMBER contains residue templates for standard
biological units (e.g. amino acid residues) that the
program uses to assign force-field parameters from
a pdb file. It understands chemistry!
xLeap produces files called “prmtop” (or top) and
“rst”.
“prmtop” contains the parameters and connectivities.
“rst” contains the starting coordinates.
Histidines within AMBER
In crystal structures, histidine is normally denoted by HIS,
because the protonation state is ambiguous. There are
three residue templates for histidine:
HIE (default, H on the epsilon nitrogen)
HID (H on the delta nitrogen)
HIP (hydrogens on both nitrogens; this is positively charged).
[Figure: HIE and HID tautomers; HIP resonance structures]
Choosing your AMBER Force-field
AMBER force-fields:
The “Molecular mechanics force fields” chapter of the AMBER
manual provides a detailed description of each available force-field,
with their respective advantages and caveats (on pg 35 onwards for
AMBER20).
Specifying a Force-field within Leap
The following Leap script will load the recommended
protein, DNA, lipid, solvent and general force-field components:
# Protein
source leaprc.protein.ff14SB
# DNA
source leaprc.DNA.bsc1
# Lipid
source leaprc.lipid17
# Water and ions
source leaprc.water.tip3p
# General
source leaprc.gaff2
Additional parameters for common co-factors (e.g. ATP/NADH),
amino acid modifications (e.g.
phosphoserine/histidine/threonine/tyrosine) and alternative solvent
boxes (e.g. DMSO/chloroform) are available from:
http://research.bmh.manchester.ac.uk/bryce/amber/
Structures that Require Extra Set-up
Biomolecular complexes - correct position of TER flag in
pdb and force-field parameterisation (for ligands).
Disulphide bonds (covalent linkage can be added in Leap)
Co-ordinated metal ions (e.g. Zn/Mg/Mn). Chimera has a
metal centre builder.
Post-translational modifications (e.g. phospho-
serine/glycosylation).
Membrane proteins embedded in a bilayer
AMBER Demonstration
Workflow transcripts – for reference/clarification
AMBER commands to build 6UF0
Antechamber command to (very crudely) parameterise the
ligand:
Run antechamber using the bcc charge model. Ligand has a net charge of -1 (-nc flag). Use
gaff2 atom types (-at flag).
antechamber -i 6uf0_ligand.pdb -fi pdb -o 6uf0_ligand.mol2 -fo mol2 -c bcc -at gaff2 -nc -1
Run parmchk2 to generate any missing parameters:
parmchk2 -i 6uf0_ligand.mol2 -f mol2 -o 6uf0_ligand.frcmod
Reliable force-field parameterisation for small molecules is a “dark
art”, and requires expertise. For example, you may wish to use the
Gaussian program for a better treatment of the QM (antechamber
uses a cheap semi-empirical method).
AMBER commands to build 6UF0
Leap commands to build the solvated complex:
# Load in the ff14SB, gaff2 and solvent force-fields
source leaprc.protein.ff14SB
source leaprc.gaff2
source leaprc.water.tip3p
# Load the ligand mol2 and frcmod
Q5V = loadmol2 6uf0_ligand.mol2
loadamberparams 6uf0_ligand.frcmod
# Load the pdb of the complex
pdb = loadpdb 6uf0_final_cleanH_HID.pdb
# Save the unsolvated prmtop/rst
saveamberparm pdb 6uf0_Q5Y.prmtop 6uf0_Q5Y.rst
# Neutralise with Na+
addions pdb Na+ 0
# Add a 10.0 Angstrom water box
solvatebox pdb TIP3PBOX 10.0
# Add 28 Na+ ions
addions pdb Na+ 28
# Neutralise with Cl-
addions pdb Cl- 0
# Save the solvated prmtop/rst
saveamberparm pdb 6uf0_Q5Y_wat.prmtop 6uf0_Q5Y_wat.rst
AMBER Tip: To add excess salt (e.g. 150 mM), note the number of water molecules added
by Leap (e.g. 10424). Water is 55.5 M. Therefore, calculate ((0.15/55.5) × num_waters).
For 10424 waters this gives about 28.2, so we need to add 28 excess ions.
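The arithmetic in the tip above can be checked with a few lines of Python. This is only a sketch: the function name and the default of 0.15 M are illustrative choices, and the 55.5 M figure is the one quoted in the tip.

```python
WATER_MOLARITY = 55.5  # mol/L, the value quoted in the tip


def excess_ions(num_waters, salt_molar=0.15):
    """Excess ions needed for the target salt concentration, rounded to a whole number."""
    return round(salt_molar / WATER_MOLARITY * num_waters)


print(excess_ions(10424))  # prints 28
```

The result is the number to pass to the second "addions" command, after which the opposite ion is added with a count of 0 to restore neutrality.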
Workflow for Building 6UF0 – pdb preparation
1. Download the pdb and PDB-REDO files. Compare the two, and note the changes made during
re-refinement. Assess the resolution of the structures, to know how much you can trust the
reported sidechain/rotamer positions.
2. Load the complex into VMD and assess the contents of the pdb file. Create a new
representation (“Graphics → Representations → Create Rep”), use the “Licorice” drawing
method, and in the “Selected atoms” box type “not protein”. This will reveal everything in
the file that is not your protein, and allow you to inspect your ligand.
3. Delete anything that is a buffer component and therefore an artefact of the crystallisation (e.g.
by manually editing your pdb file). You can generally delete the crystallographic waters so long as
they do not form bridging interactions with the ligand. You must check this carefully in
VMD, as water bridges can be key to specificity.
4. Check whether your protein contains disulphide bonds. (If it does, refer to, for example,
https://ambermd.org/tutorials/pengfei/index.php).
5. Ensure that there is a TER flag between your protein and your ligand (and any ions or waters
you choose to retain).
6. Check for missing residues (Chimera is best for this, and links directly to the Modeller
program which will build missing loops).
7. Protonate your structure in Chimera (or with H++ http://biophysics.cs.vt.edu/). In Chimera
use “Tools → Structure Editing → AddH”. Make sure that the “also consider H-bonds”
option is switched on. Pay careful attention to any titratable groups on your ligand and their
neighbouring interactions which may favour a particular protonation state (e.g. induce a pKa
shift). Save this new pdb file.
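Step 5 above (the TER flag) is easy to overlook, so a quick automated check can help. The following is a minimal sketch, not part of the original workflow: the function name is hypothetical, and it assumes the ligand, ions and retained waters are written as HETATM records directly after the protein ATOM records. A modified residue stored as HETATM in the middle of the chain would also be flagged, so inspect the result by eye.

```python
def hetatm_without_ter(pdb_text):
    """Return residue names of HETATM groups that directly follow an ATOM record."""
    flagged = []
    previous = ""
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "HETATM" and previous == "ATOM":
            # PDB columns 18-20 hold the residue name
            flagged.append(line[17:20].strip())
        if record in ("ATOM", "HETATM", "TER"):
            previous = record
    return flagged


example = "ATOM      1  N   ALA A   1\nHETATM    2  C1  Q5V A 501"
print(hetatm_without_ter(example))  # prints ['Q5V']
```

An empty list means every HETATM group that follows the protein is already separated from it by a TER record.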
Workflow for Building 6UF0 - xLeap
1. Execute all of the leap commands up to and including the step where the pdb file of the
complex is loaded in.
2. Type “list” to check that your ligand (Q5V) has been read into the residue template list.
3. You will see that xLeap adds a new atom to HIE residue number 436, in this case. By
typing “edit pdb” you will be able to identify where this atom is, and understand where
the problem has come from. You will see that this residue should be of type HID, not
HIE, which is the default state assumed for the HIS residue type. This deviation from the
default occurs in this case because Chimera has identified a backbone H-bond interaction
with carbonyl oxygen of ILE 435.
4. Close xLeap and edit the pdb file to change the name of residue 436 from HIS to HID.
Notice that there are multiple other HIS in the pdb file that are in the default, HIE state.
5. Repeat step 1. You should see that there are now no xLeap errors. Save the parameter and
restart files.
6. Type “charge pdb” to find out the charge on your molecule (in this case -4).
7. Neutralise with counterions.
8. Add waters, and note the number of water molecules added.
9. Use the number of waters added to calculate the number of excess ions you require to
obtain the salt concentration required (generally around 150mM).
10. Add the necessary number, then neutralise. Save solvated parameter and restart files.
AMBER tip: When AMBER gives you warnings, it’s most often telling you about things it has fixed. If it
will save a topology file, this normally means that everything is ok. Try this first (e.g. “saveamberparm”),
and then problem solve if it refuses to write out topology/parameter information.
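The residue renaming in step 4 of the xLeap workflow can be done in a text editor, but for many residues a short script is less error-prone. This is a minimal sketch with a hypothetical function name. It relies only on the fixed-column PDB layout (residue name in columns 18-20, residue number in columns 23-26) and ignores chain identifiers, so it assumes residue numbers are unique in the file.

```python
def rename_residue(pdb_text, res_number, old="HIS", new="HID"):
    """Rename one residue in ATOM/HETATM records, leaving all other lines untouched."""
    out = []
    for line in pdb_text.splitlines():
        if (line.startswith(("ATOM", "HETATM"))
                and line[17:20] == old
                and int(line[22:26]) == res_number):
            line = line[:17] + new + line[20:]
        out.append(line)
    return "\n".join(out)


example = "ATOM      1  N   HIS A 436\nATOM      2  N   HIS A 440"
print(rename_residue(example, 436))
```

Only residue 436 is renamed; the other HIS entries keep the default name and are therefore treated as HIE by Leap.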
Workflow for Building 6UF0 – run and visualise
1. Equilibrate your structure using pmemd.MPI on a parallel CPU machine. The AMBER
CPU version is more numerically stable, and is recommended for equilibration.
2. Perform your production run using pmemd.cuda on your GPU. Note that implicitly
solvated GB/SA simulations are not compatible with the GPU version of AMBER, and
require parallel CPU.
3. Use “cpptraj” to catenate your trajectories (if you have restarted your production run
multiple times). If you have a biomolecular complex, you may need to use the
“image” command to ensure that both biomolecules are located in the same periodic
box. Remove water and counterions (unless you are specifically interested in ion or
solvation shells around your protein, which you may well be).
4. Read your “.prmtop” and “.nc” files into VMD to visualise your trajectory. Be careful
that there is the same number of atoms in your trajectory (.nc) and topology file, or
your visualisation will be a horrible mess (e.g. using a solvated topology file with an
unsolvated trajectory output by “cpptraj”, or vice versa, gives dramatically terrifying
results!!)
5. Inspect your ligand/protein interactions very carefully. Do the key interactions you
observed in the original pdb file persist? Are there any problems with regions of the
protein changing shape during the trajectory? Are there any other closely homologous
protein/ligand interactions in the literature that you can compare with? History repeats
itself in structural biology, and key motifs occur repeatedly in unexpected places, so
these comparisons can be invaluably helpful.
[Figure: xLeap view of the histidine at residue 436. The diamond shows the δ-hydrogen correctly added into the pdb file by Chimera (but currently unbonded). The default ɛ-hydrogen associated with the HIS residue name is bonded in accordance with the residue template. The backbone carbonyl oxygen forms an H-bond with the HID δ-hydrogen.]
AMBER cpptraj commands to process 6UF0
cpptraj commands to image then dehydrate the trajectory:
# Read in the solvated trajectory (containing 500000 structures), keeping every 100th conformer
trajin 6uf0_Q5Y_watmd9.x 1 500000 100
# Put all periodic images back into the principal simulation box
autoimage
# Remove water molecules
strip :WAT
# Remove global translation and rotation of the protein (residues 1-287)
rms first :1-287
# Write out new trajectory
trajout image_6uf0_Q5Y_ions.nc
go
How to make the corresponding prmtop file for visualisation:
# Remove the waters from the topology file
parmstrip :WAT
# Write out your new topology file
parmwrite out 6uf0_Q5Y_ions.prmtop
go
Run cpptraj on your solvated prmtop/trajectory with:
$ cpptraj 6uf0_Q5Y_wat.prmtop < myptrajscript
Visualisation of 6UF0 Trajectory
VMD visualisation showing the protein (new cartoon), the ligand (licorice), residues
containing an atom within 5Å of the ligand (lines) and Na+ (blue VdW) and Cl- (red VdW).
AMBER practical
Build and run a peptide MD simulation
Keep your Workspace Tidy!
Make a directory for your current work.
$ mkdir peptide_model
Enter that directory.
$ cd peptide_model
It is important to keep different simulations in
different directories or you will get into a horrible
mess.
Building Peptides in xLeap
Start xLeap:
$ xleap
You should see the xLeap Universe Editor window.
For this practical, we only need the protein force-field
(leaprc.protein.ff14SB).
> source leaprc.protein.ff14SB
To see which molecules you have available in ff14SB, type:
> list
You should see a list of amino acid residues and their C- and
N-terminal counterparts.
To build your peptide, think of ~5 amino acids (substitute
RES with the amino acid of your choice – they can all be
different!!):
You need NXXX and CXXX at the ends of the peptide to
correctly chemically cap the ends. Use the edit command to
visualise your molecule.
> peptide = sequence {NRES RES RES RES CRES}
> edit peptide
Close the editor window using the drop down menu.
Now save your molecule:
> savepdb peptide peptide.pdb
> saveamberparm peptide peptide.prmtop peptide.rst
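If you want to generate the "sequence" command for a longer or scripted peptide, the capping rule above (N-prefixed first residue, C-prefixed last residue) can be written as a small helper. This is a sketch only: the function name and the example residues are illustrative, and it simply builds the text of the Leap command for you to paste or write into a tleap script.

```python
def leap_sequence_command(residues, unit_name="peptide"):
    """Build a Leap 'sequence' command with N- and C-terminal residue names."""
    if len(residues) < 2:
        raise ValueError("need at least two residues to cap both ends")
    names = [r.upper() for r in residues]
    capped = ["N" + names[0]] + names[1:-1] + ["C" + names[-1]]
    return unit_name + " = sequence { " + " ".join(capped) + " }"


print(leap_sequence_command(["ALA", "GLY", "SER", "VAL", "LEU"]))
# prints: peptide = sequence { NALA GLY SER VAL CLEU }
```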
Playing with xLeap
> edit peptide
(Look at the molecule.)
Now you have saved your files you can play with the
select, draw, erase commands. Play with the mouse
buttons and work out how to rotate the molecule and
zoom in and out.
Select a SMALL group of atoms. Use the drop-down
menu to edit selected atoms. What do you see? What do
the columns mean?
Try relax selection, check unit, calculate net charge.
Close the molecule editor and close xLeap.
MD Simulations in Implicit Solvent
In this example, we perform an implicitly solvated
simulation of our peptide using the GB/SA (Generalised
Born/Surface Area) model (pg 67 of the AMBER20 manual).
BEWARE!!
For simplicity, we have used the igb = 1 option, which may
not be the best choice for your system. Please refer to the
relevant section of the manual for a detailed discussion
(Section 4.1, pg 69-70 of the AMBER20 manual).
GB/SA models are often used in post-processing to
calculate interaction energies.
When used for MD, conformational sampling is enhanced
because of the absence of solvent damping.
Running MD with Sander (or PMEMD)
To run sander you need:
1) The topology (prmtop) and input coordinate (rst) files
2) Sander input files (for implicit solvent just min1.in,
min2.in, md1.in, md2.in, md3.in)
3) A shell script which tells the computer to run sander:
l=rst
f=min1
sander -O -i $f.in -o $f.out -inf $f.inf -c peptide.$l -ref peptide.$l -r peptide.$f \
  -p peptide.prmtop -x peptide$f.nc -e peptide$f.ene
The flags are:
-i Input file (e.g. min1.in)
-o, -inf Output files (e.g. min1.out/min1.inf)
-c Input coordinates file (e.g. peptide.rst)
-ref Reference coordinates for restraints (here, the input coordinates)
-r Restart file from this run (e.g. peptide.min1)
-p Topology file
-x Output trajectory file
-e Output file containing energy information
If you are fortunate enough to have a GPU, replace sander
with pmemd.cuda (not available in AmberTools - sorry).
Equilibration
1. Energy minimisation (with then without
restraints).
2. Heat molecule from 0 to 300K (with
restraints).
3. Restrained MD at 300K
4. Production run
Equilibration is essential to obtaining a stable trajectory.
Running Sander (Implicit Solvent)
Put your input files in your working directory. You need
the .in/.sh/.prmtop and .rst files.
You will first need to make your script that runs sander
executable in Linux with chmod.
In the gbsa_peptide_run.sh script, you need to make
sure that the filenames for your ****.prmtop and
****.rst files are the same as those you have in your
working directory or AMBER won’t find them!
$ chmod +x gbsa_peptide_run.sh    # Make the script executable
$ ./gbsa_peptide_run.sh           # Execute the script
Implicit Solvent MD input file for Sander
Here is a specimen input file for an implicitly
solvated MD run (time in ps = nstlim × dt).
How long will this simulation run for?
How many MD snapshots will it output?
Production MD run at 300K
&cntrl
ntc=2, ! Enable SHAKE to constrain all bonds involving hydrogen
ntf=2, ! Setting to not calculate force for SHAKE constrained bonds
cut=12.0, ! Nonbonded cutoff distance in Angstroms (for PME, limit of the direct space sum - do NOT reduce this below 8.0.)
igb=1, ! Pairwise generalized Born (implicit) solvent
gbsa=1, ! Carry out generalised Born/surface area simulations
saltcon=0.1, ! Set concentration of 1-1 mobile counterions
ntpr=500, ! Print to the Amber mdout output file every ntpr cycles
ntwx=500, ! Write Amber trajectory file mdcrd every ntwx steps
nstlim =500000, ! Number of MD steps in run (nstlim * dt = run length in ps)
dt=0.002, ! Time step in picoseconds (ps). The time length of each MD step
ntt=1, ! Temperature control with the Berendsen weak-coupling thermostat (ntt=3 selects Langevin dynamics)
temp0=300.0, ! Target thermostat temperature in K
ntx=5, ! Read coordinates and velocities from the restart (inpcrd) file
irest=1, ! Restart previous MD run [This means velocities are expected in the inpcrd file and will be used to provide initial atom velocities]
nscm = 1000, ! Remove translational and rotational center-of-mass movement at regular intervals
/
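One way to check your answers to the two questions above is to do the arithmetic explicitly. The sketch below uses the values from the specimen input file (nstlim, dt, ntwx); the function name is just an illustrative choice.

```python
def run_summary(nstlim, dt_ps, ntwx):
    """Return (run length in ps, number of trajectory snapshots written)."""
    length_ps = nstlim * dt_ps   # total simulated time
    snapshots = nstlim // ntwx   # one frame is written every ntwx steps
    return length_ps, snapshots


length_ps, snapshots = run_summary(nstlim=500000, dt_ps=0.002, ntwx=500)
print(f"{length_ps:.0f} ps ({length_ps / 1000:.1f} ns), {snapshots} snapshots")
# prints: 1000 ps (1.0 ns), 1000 snapshots
```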
Visualising the Results
Visualising your trajectories is the most effective way to
assess if anything obvious has gone wrong.
Start VMD and read in your topology (e.g. peptide.prmtop)
file (this is of file type amber7parm) and your trajectory file
(e.g. peptidemd3.nc) (of file type amber netcdf).
Watch your dynamics!!!
You cannot currently view AMBER netcdf (.nc)
trajectories with VMD in Windows.
You can change the format of the trajectory file output by AMBER using the
“ioutfm” flag in sander/pmemd. Or you could post-process the trajectory with
cpptraj and write it out in a different format (e.g. mdcrd).
“Repeat” MD Simulations
One way to perform an independent “repeat” of a
simulation is to reassign the velocities at a chosen
point in the trajectory.
These input flags change!
(Use ntx=5 and irest=1 for a restart where velocities are
retained.)
Assign new velocities then run MD at 300K
&cntrl
ntc=2, ! Enable SHAKE to constrain all bonds involving hydrogen
ntf=2, ! Setting to not calculate force for SHAKE constrained bonds
cut=12.0, ! Nonbonded cutoff distance in Angstroms (for PME, limit of the direct space sum - do NOT reduce this below 8.0.)
igb=1, ! Pairwise generalized Born (implicit) solvent
gbsa=1, ! Carry out generalised Born/surface area simulations
saltcon=0.1, ! Set concentration of 1-1 mobile counterions
ntpr=500, ! Print to the Amber mdout output file every ntpr cycles
ntwx=500, ! Write Amber trajectory file mdcrd every ntwx steps
nstlim =250000, ! Number of MD steps in run (nstlim * dt = run length in ps)
dt=0.002, ! Time step in picoseconds (ps). The time length of each MD step
ntt=1, ! Temperature control with the Berendsen weak-coupling thermostat (ntt=3 selects Langevin dynamics)
temp0=300.0, ! Target thermostat temperature in K
ntx=1, ! Read coordinates but NOT velocities from the restart (inpcrd) file
irest=0, ! Assign new velocities
nscm = 1000, ! Remove translational and rotational center-of-mass movement at regular intervals
/
Run a Repeat MD Simulation
Make a new directory, and copy the files you need into here.
Continuing from the restart file written by the md2.in run (e.g.
peptide.md2), run an independent repeat of your previous
simulation by asking md3.in to assign a new set of
velocities (e.g. use md3_repeat.in).
You can use the file “gbsa_peptide_run_repeat.sh” to help you.
Call this trajectory something different (e.g. peptide_repeatmd3.nc).
Compare the two trajectories in VMD, and convince
yourself that they are different.
For example, you could plot a graph of the end to end
distance of the peptide in the two simulations.
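As a sketch of the end-to-end distance calculation, the function below works on coordinates that have already been extracted from the trajectory (for example with cpptraj) into plain Python lists. The function name, the atom indices and the two tiny frames are hypothetical and only illustrate the calculation; in practice you would pick one atom in the first residue and one in the last.

```python
import math


def end_to_end_distances(frames, first_atom, last_atom):
    """Distance (in Angstroms) between two atoms in every frame.

    frames: list of frames, each a list of (x, y, z) tuples.
    """
    return [math.dist(frame[first_atom], frame[last_atom]) for frame in frames]


# Two made-up three-atom frames
frames = [
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 4.0, 0.0)],
]
print(end_to_end_distances(frames, 0, -1))  # prints [3.0, 5.0]
```

Plotting the two lists (original and repeat) against frame number shows whether the runs diverge.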
AMBER practical appendix
Most Common Simulation Problems
Very Common Simulation Problems
P: When I look at my trajectory in VMD, there
are funny lines all over the place!
S: The topology file you have loaded does not
contain the same number of atoms as your
trajectory. You may be using a topology file
with water, when your trajectory has been
dehydrated.
Alternatively, you have loaded the trajectory as
“with periodic box” when no box information
is present, or vice versa.
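To confirm which topology you are holding, you can read the atom count straight from the prmtop file: it is the first integer (NATOM) in the POINTERS section. The helper below is a minimal sketch with a hypothetical name. It takes the file contents as a string, and the example text is a cut-down, made-up prmtop fragment rather than a real file.

```python
def natom_from_prmtop(prmtop_text):
    """Return NATOM, the first 8-character integer field of the POINTERS section."""
    lines = prmtop_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("%FLAG POINTERS"):
            j = i + 1
            # Skip the %FORMAT line (and any %COMMENT lines) after the flag
            while lines[j].startswith("%"):
                j += 1
            return int(lines[j][:8])
    raise ValueError("no POINTERS section found")


example = (
    "%FLAG TITLE\n%FORMAT(20a4)\npeptide\n"
    "%FLAG POINTERS\n%FORMAT(10I8)\n      74      11      38\n"
)
print(natom_from_prmtop(example))  # prints 74
```

Compare this number with the atom count VMD or cpptraj reports for the trajectory; if they differ, you have loaded the wrong topology.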
P: I built my molecule in leap, but when I try and
run a simulation it explodes!
S: You have a bad starting configuration – probably with
some nasty VdW clashes (e.g. atoms on top of one another).
You may inadvertently have multiple conformers in your file (e.g. for
sidechains) which are sitting on top of each other. Delete all but one
of these and try again!!
Otherwise, use the visualisation package Chimera to identify clashes
(“Tools → Surface/Binding Analysis → Find Clashes/Contacts”),
and either change the way you build your molecule or relax that
section separately in Chimera (very slow), in xLeap/sander, or (better) in
both.
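Removing duplicate conformers by hand is tedious, so here is a minimal sketch of doing it with a script. It relies on the PDB convention that column 17 of ATOM/HETATM records holds the alternate-location flag; the function name and the choice to keep conformer "A" are assumptions for illustration, and you should still check the result visually.

```python
def keep_first_conformer(pdb_text, keep="A"):
    """Drop alternate conformers other than `keep` and blank the altLoc flag."""
    kept = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")) and len(line) > 16:
            alt = line[16]  # PDB column 17: alternate location indicator
            if alt not in (" ", keep):
                continue    # skip the other conformers
            line = line[:16] + " " + line[17:]
        kept.append(line)
    return "\n".join(kept)


example = (
    "ATOM      5  CB ASER A   2\n"
    "ATOM      6  CB BSER A   2\n"
    "ATOM      7  CA  SER A   2"
)
print(keep_first_conformer(example))
```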
P: When I run a simulation of DNA (or
drug/protein complex), I find that one of the
molecules jumps out of the box!!
S: AMBER has saved the coordinates of different
periodic boxes for the different parts of your
complex. You can fix this with the “image”
command in “cpptraj” (but it can be fiddly!)
P: When I read my pdb file into Leap, AMBER
complains that it doesn’t recognise the atom names.
It also adds lots of atoms that I didn’t expect.
S: AMBER requires very specific atom names to
compare with its residue templates. If those in your
pdb file are not AMBER compatible, it won’t
understand them. You can either use
“addPdbAtomMap” in Leap or change the names in the
original pdb file.
P: When I read my drug-DNA complex/ phosphorylated
protein etc that I downloaded from the pdb into xLeap, it
won’t save a topology file because there are “missing
parameters”.
S: The most usual cause is a missing TER flag between your ligand
and your protein/metal ions etc. AMBER tries to bond these when
the TER is missing, but will be unable to find the relevant parameters
because they are not chemically relevant.
If this is not the problem, then maybe you are trying to run a
simulation of a residue that is not standard, and AMBER does not
know the parameters by default. You need to check whether
AMBER parameters are available for the simulation you are trying to
perform. If not, you need to calculate your own with
antechamber/Gaussian etc. You might be able to use gaff2 (this is a
dirty solution!)