gromacs molecular modeling tutorial

11
Introduction to Molecular Dynamics with GROMACS Molecular Modeling Course 2007 Simulation of Lysozyme in Water Erik Lindahl ([email protected] )

Upload: cateixeira

Post on 23-Oct-2014

307 views

Category:

Documents


4 download

TRANSCRIPT

Page 1: Gromacs Molecular Modeling Tutorial

Introduction toMolecular Dynamics

with GROMACSMolecular Modeling Course 2007

Simulation of Lysozyme in Water

Erik Lindahl ([email protected])

Page 2: Gromacs Molecular Modeling Tutorial

SummaryMolecular simulation is a very powerful toolbox in modern molecular modeling, and enables us to

follow and understand structure and dynamics with extreme detail – literally on scales where motion of individual atoms can be tracked. !is task will focus on the two most commonly used methods, namely energy minimization and molecular dynamics that respectively optimize structure and simulate the natural motion of biological macromolecules. we are first going to set up your Gromacs environments, have a look at the structure, prepare the input files necessary for simulation, solvate the structure in wa-ter, minimize & equilibrate it, and finally perform a short production simulation. After this we’ll test some simple analysis programs that are part of Gromacs.

You should write a brief report about your work and describe your findings. In particular, try to think about why you are using particular algorithms/choices, and whether there are alternatives.

Molecular motions occur over a wide range of scales in both time and space, and the choice of ap-proach to study them depends on the question asked. Molecular simulation is far from the only avail-able method, and when the aim e.g. is to predict the structure of a protein it is often more efficient to use bioinformatics instead of spending thousands or millions of CPU hours. Ideally, the time-dependent Schrödinger equation should be able to predict all properties of any molecule with arbitrary precision ab initio. However, as soon as more than a handful of particles are involved it is necessary to introduce approximations. For most biomolecular systems we therefore choose to work with empirical parameterizations of models instead, for instance classical Coulomb interactions between pointlike atomic charges rather than a quantum description of the electrons. !ese models are not only orders of magnitude faster, but since they have been parameterized from experiments they also perform better when it comes to reproducing observations on microsecond scale (Fig. 1), rather than extrapolating quantum models 10 orders of magnitude. !e first molecular dynamics simulation was performed as late as 1957, although it was not until the 1970’s that it was possible to simulate water and biomole-cules.

Background & TheoryMacroscopic properties measured in an experiment are not direct observations, but averages over bil-

lions of molecules representing a statistical mechanics ensemble. !is has deep theoretical implications that are covered in great detail in the literature, but even from a practical point of view there are impor-tant consequences: (i) It is not sufficient to work with individual structures, but systems have to be ex-panded to generate a representative ensemble of structures at the given experimental conditions, e.g. temperature and pressure. (ii) !ermodynamic equilibrium properties related to free energy, such as binding constant, solubilities, and relative stability cannot be calculated directly from individual simula-tions, but require more elaborate techniques. (iii) For equilibrium properties (in contrast to kinetic) the aim is to examine structure ensembles, and not necessarily to reproduce individual atomic trajectories!

!e two most common ways to generate statistically faithful equilibrium ensembles are Monte Carlo and Molecular Dynamics simulations. Monte Carlo simulations rely on designing intelligent moves to generate new conformations, but since these are fairly difficult to invent most simulations tend to do classical dynamics with Newton’s equations of motion since this also has the advantage of accurately reproducing kinetics of non-equilibrium properties such as diffusion or folding times. When a starting configuration is very far from equilibrium, large forces can cause the simulation to crash or distort the system, and for this reason it is usually necessary to start with an Energy Minimization of the system prior to the molecular dynamics simulation. In addition, energy minimizations are commonly used to refine low-resolution experimental structures.

All classical simulation methods rely on more or less empirical approximations called Force fields to calculate interactions and evaluate the potential energy of the system as a function of point-like atomic coordinates. A force field consists of both the set of equations used to calculate the potential energy and forces from particle coordinates, as well as a collection of parameters used in the equations.

Page 3: Gromacs Molecular Modeling Tutorial

10-15s 10-12s 10-9s 10-6s 10-3s 1s 103s

bond length vibration rotation

around bonds transport inion channel

rapidprotein folding

normal proteinfolding

lipidrotation

ribosomesynthesis

membraneprotein fodling

waterrelaxation

lipiddiffusion "biology"

Accessible to atomic-detail simulation today

Fig. 1: Time scales if chemical/biological process and current simulation capabilities

For most purposes these approximations work great, but they cannot reproduce quantum effects like bond formation or breaking. All common force fields subdivide potential functions in two classes. Bonded interactions cover covalent bond-stretching, angle-bending, torsion potentials when rotating around bonds, and out-of-plane “improper torsion” potentials, all which are normally fixed throughout a simulation – see Fig. 2. !e remaining nonbonded interactions consist of Lennard-Jones repulsion and dispersion as well as electrostatics. !ese are computed from neighbor lists updated every 5-10 steps.

Given the potential and force (negative gradient of potential) for all atoms, the coordinates are up-dated for the next step. For energy minimization, the steepest descent algorithm simply moves each atom a short distance in direction of decreasing energy, while molecular dynamics is performed by integrating Newton’s equations of motion:

Fi = −∂V (r1,...,rN )

∂ri

mi∂ 2ri∂t 2

= Fi

!e updated coordinates are then used to evaluate the potential energy again, as shown in the flow-chart of Fig. 3.

Typical biomolecular simulations apply periodic boundary conditions to avoid surface artifacts, so that a water molecule that exits to the right reappears on the left; if the box is sufficiently large the molecules will not interact significantly with their periodic copies. !is is intimately related to the nonbonded in-teractions, which ideally should be summed over all neighbors in the resulting infinite periodic system. Simple cut-offs can work for Lennard-Jones interactions that decay very rapidly, but for Coulomb inter-actions a sudden cut-off can lead to large errors. One alternative is to “switch off” the interaction before the cut-off as shown in Fig. 4, but a better option is to use Particle-Mesh-Ewald summation (PME) to calculate the infinite electrostatic interactions by splitting the summation into short- and long-range parts. For PME, the cut-off only determines the balance between the two parts, and the long-range part is treated by assigning charges to a grid that is solved in reciprocal space through Fourier transforms.

Cut-offs and rounding errors can lead to drifts in energy, which will cause the system to heat up dur-ing the simulation. To control this, the system is normally coupled to a thermostat that scales velocities during the integration to maintain room temperature. Similarly, the total pressure in the system can be adjusted through scaling the simulation box size, either isotropically or separately in x/y/z dimensions.

!e single most demanding part of simulations is the computation of nonbonded interactions, since millions of pairs have to be evaluated for each time step. Extending the time step is thus an important way to improve simulation performance, but unfortunately errors are introduced in bond vibrations already at 1 fs. However, in most simulations the bond vibrations are not of interest per se, and can be

!e purpose of this tutorial is not to master all parts of Gromacs’ simulation and analysis tools in detail, but rather to give an overview and “feeling” for the typical steps used in practical simulations. Since the time available for this exercise is rather limited we will focus on a sample simulation system and perform some simplified analyses - in practice you would typically use one or several weeks for the production simulation.

Page 4: Gromacs Molecular Modeling Tutorial

Fig. 2: Typical molecular mechanics interactions used in Gromacs.

removed entirely by introducing bond constraint algorithms (e.g. SHAKE or LINCS). Constraints make it possible to extend time steps to 2 fs, and in addition the fixed-length bonds are actually better ap-proximations of the quantum mechanical grounds state than harmonic springs!

In principle, the most basic system we could simulate would be water or even a gas like Argon, but to show some of the capabilities of the analysis tools we will work with a protein: Lysozyme. !is is a fascinating enzyme that has ability to kill bacteria (kind of the body’s own antibiotic), and is present e.g. in tears, saliva, and egg white. It was discovered by Alexander Fleming in 1922, and one of the first pro-tein X-ray structures to be determined (David Phillips, 1965). !e two sidechains Glu35 and Asp52 are crucial for the enzyme functionality. It is fairly small by protein standards (although intermediate to large by simulation standards...) with 164 residues and a molecular weight of 14.4 kdalton - including hydrogens it consists of 2890 atoms (see illustration on front page), although these are not present in the PDB structure since they only scatter X-rays weakly (few electrons).

In the tutorial, we are first going to set up your Gromacs environments (might already be done for you), have a look at the structure, prepare the input files necessary for simulation, solvate the structure in water, minimize & equilibrate it, and finally perform a short production simulation. After this we’ll test some simple analysis programs that are part of Gromacs.

Setting up Gromacs & other programsBefore starting the actual work, we need a couple of programs that might already be installed on

your system. Since the actual installation can depend on your system we will do this part interactively at the tutorial, although you can find more detailed instructions at http://www.gromacs.org.

1. You will need a Unix-type system with an X11 window environment (look at the install disks if you have a mac). You can use CygWin under windows, although this is a bit more cumbersome, and you will also need to install a separate X11 server.

2. Gromacs uses FFTs (fast fourier transforms) for some interactions, in particular PME. While Gromacs comes with the slower FFTPACK libraries built-in, it is a very good idea to install the freely available and faster FFT libraries from http://www.fftw.org. If you have an Intel mac you’re in luck, since the Gromacs package comes with this included. You can also find finished RPMs for x86 Linux on the Gromacs or FFTW sites. If you need to install this from source, unpack the distribution tarball with “tar -xzvf fftw-3.1.2.tar.gz”, go into the directory and issue “./configure --enable-float” (since Gromacs uses single precision by default) followed by “make” and finally become root and issue “make install”. If you do not have root access you can install it under your home directory, using the switch “--prefix=/path/to/somewhere” when you run the configure script.

Page 5: Gromacs Molecular Modeling Tutorial

3. Install Gromacs from a package, or compile it the same way as FFTW.

4 . I n s t a l l P y M o l f r o m t h e w e b s i t e http://www.pymol.org, or alternatively VMD.

5. Install the 2D plotting program Grace/xmgrace. Depending on your platform you might have to install the Lesstif (clone of Mo-tif ) libraries first (www.lesstif.org), and then get the Grace program source code from the site http://plasma-gate.weizmann.ac.il/Grace/.

6. To create beautiful secondary structure analysis plots, you can use the program “DSSP” from http://swift.cmbi.ru.nl/gv/dssp/. !is is freely available for academic use but requires a license, so we will use the Gromacs built-in al-ternative “my_dssp” for this tutorial.

If you are following this exercise as part of a lab course, the installation might already have been done for you.

The PDB Structure - think before you simulate!

!e first thing you will need is a structure of the protein you want to simulate. Go to the Protein Data Bank at http://www.pdb.org, and search for “Lysozyme”. You should get lots of hits, many of which were determined with bound ligands, mutations, etc. One possible alternative is the PDB entry 1LYD, but feel free to test some other alterna-tive too.

It is a VERY good idea to look carefully at the structure before you start any simulations. Is the reso-lution high enough? Are there any missing residues? Are there any strange ligands? Check both the PDB web page and the header in the actual file. Finally, try to open it in a viewer like PyMOL, VMD, or RasMol and see what it looks like. In PyMOL you have some menus on the right where you can show (“S”) e.g. cartoon representations of the backbone!

Creating a Gromacs topology from the PDB fileWhen you believe you have a good starting structure the next step is to prepare it for simulation.

What do we need to prepare? Well, for instance the choice of force fields, water model, charge states of titratable residues, termini, add in the missing hydrogens, etc. You will fairly quickly realize that most of this stuff is static and independent of the protein coordinates, so it makes sense to put it in a separate file called a “topology” for the system. !e good news is that Gromacs can create topologies automati-cally for you with the tool pdb2gmx.

!ere are several places where you can find documentation for the Gromacs commands. You should always get some help with “pdb2gmx -h”, which first includes a brief description of what the program

Update coordinates & velocities according to equations of motion

More steps?

Compute potential V(r) andforces Fi = iV(r) on atoms

Initial input data:Interaction function V(r) - "force field"

coordinates r, velocities v

Collect statistics and write energy/coordinates to

trajectory files

Done!

Yes

No

Repe

at for

milli

ons o

f step

s

Fig. 3: Simplified flowchart of a standard molecular dynamics simulation, e.g. as implemented in Gromacs.

Page 6: Gromacs Molecular Modeling Tutorial

does, then a list of (optional) input/output files, and finally a list of available flags. !e same informa-tion is present with “man pdb2gmx” (if it was properly installed), and you can also use the www.gromacs.org website: Choose “documentation” in the top menu, and then “manuals” in the left-side menu. !ere is both an extensive book describing everything and a shorter online reference to the commands and file formats.

For the first command we’ll help you. pdb2gmx should read its input from a coordinate file, and just for fun we’ll pick a specific water model and force field. Type

pdb2gmx -f 1LYD.pdb -water tip3p

!e program will ask you for a force field - for this tutorial I suggest that you use “OPLS-AA/L”, and then write out lots of information. !e important thing to look for is if there were any errors or warn-ings (it will say at the end). Otherwise you are probably fine. Read the help text of pdb2gmx (-h) and try to find out what the different files it just created do, and what options are available for pdb2gmx! !e important output files are (find out yourself what they do)

conf.gro topol.top posre.itp

If you issue the command several times, Gromacs will back up old files so their names start with a hash mark (#) - you can safely ignore (or erase) these for now.

Adding solvent water around the proteinIf you started a simulation now it would be in vacuo. !is was actually the norm 25-30 years ago,

since it is much faster, but since we want this simulation to be accurate you are going to add solvent.

However, before adding solvent we have to determine how big the simulation box should be, and what shape to use for it. !ere should be at least half a cutoff between the molecule and the box side, and preferably more. Due to the limited time you be cheap here and only use 0.5nm margin between the protein and box. To further reduce the volume of the box you could explore the box type options and for instance test a “rhombic dodecahedron” box instead of a cubic one. !e program you should use to alter the box is “editconf”. !is time you will have to figure out the exact command(s) yourself, but the primary options to look at in the documentation are “–f”, “–d”, “–o”, and “–bt”.

As an exercise, you could try a couple of different box sizes (the volume is written on the output!) and also other box shapes like “cubic” or “octahedron”. What effect does it have on the volume?

!e last step before the simulation is to add water in the box to solvate the protein. !is is done by using a small pre-equilibrated system of water coordinates that is repeated over the box, and overlapping water molecules removed. !e Lysozyme system will require roughly 6000 water molecules with 0.5nm to the box side, which increases the number of atoms significantly (from 2900 to over 20,000). GROMACS does not use a special pre-equilibrated system for TIP3P water since water coordinates can be used with any model – the actual parameters are stored in the topology and force field. For this rea-son the program only provides pre-equilibrated coordinates for a small 216-water system using a water model called “SPC” (simple point charge). !e best program to add solvent water is “genbox”, i.e. you will generate a box of solvent. You can select the SPC water coordinates to add with the flag “–cs spc216.gro”, and for the rest of the choices you can use the documentation. Remember that your topology needs to be updated with the new water molecules! !e easiest way to do that is to supply the topology to the genbox command.

Before you continue it is once more a good idea to look at this system in PyMOL. However, Py-MOL and most other programs cannot read the Gromacs gro file directly. !ere are two simple ways to create a PDB file for PyMOL:

Page 7: Gromacs Molecular Modeling Tutorial

1. Use the ability of all Gromacs programs to write output in alternative formats, e.g.:genbox –cp box.gro –cs spc216.gro –p topol.top –o solvated.pdb

2. Use the Gromacs editconf program to convert it (use -h to get help on the options):editconf -f solvated.gro -o solvated.pdb

If you just use the commands like this, the resulting structure might look a bit strange, with water in a rectangular box even if the system is triclinic/octahedron/dodecahedron. !is is actually fine, and de-pends on the algorithms used to add the water combined with periodic boundary conditions. However, to get a beautiful-looking unit cell you can try the commands (backslash means the entire command should be on a single line):

trjconv -s solvated.gro -f solvated.gro -o solv_triclinic.pdb \ -pbc inbox -ur tric

trjconv -s solvated.gro -f solvated.gro -o solv_compact.pdb \ -pbc inbox -ur compact

We’re not going to tell you what they do - play around and use the -h option to learn more :-)

Running energy minimization!e added hydrogens and broken hydrogen bond network in water would lead to quite large forces

and structure distortion if molecular dynamics was started immediately. To remove these forces it is nec-essary to first run a short energy minimization. !e aim is not to reach any local energy minimum, so e.g. 200 to 500 steps of steepest descent works very well as a stable rather than maximally efficient mini-mization. Nonbonded interactions and other settings are specified in a parameter file (em.mdp); it is only necessary to specify parameters where we deviate from the default value, for example:------em.mdp------integrator = steepnsteps = 200nstlist = 10rlist = 1.0coulombtype = pmercoulomb = 1.0vdw-type = cut-offrvdw = 1.0nstenergy = 10------------------

Comments on the parameters used:We choose a standard cut-off of 1.0 nm, both for the neighborlist generation and the coulomb and

Lennard-Jones interactions. nstlist=10 means it is updated at least every 10 steps, but for energy minimi-zation it will usually be every step. Energies and other statistical data are stored every 10 steps (nstenergy), and we have chosen the more expensive Particle-Mesh-Ewald summation for electrostatic interactions. "e treatment of nonbonded interactions is frequently bordering to religion. One camp advocates standard cutoffs are fine, another swears by switched-off interactions, while the third wouldn’t even consider any-thing but PME. One argument in this context is that ‘true’ interactions should conserve energy, which is violated by sharp cut-offs since the force is no longer the exact derivative of the potential. On the other hand, just because an interaction conserves energy doesn’t mean it describes nature accurately. In practice, the difference is most pronounced for systems that are very small or with large charges, but the key lesson is

Page 8: Gromacs Molecular Modeling Tutorial

really that it is a trade-off. PME is great, but also clearly slower than cut-offs. Longer cut-offs are always better than short ones (but slower), and while switched interactions improve energy conservation they in-troduce artificially large forces. Using PME is the safe option, but if that’s not fast enough it is worth in-vestigating reaction-field or cut-off interactions. It is also a good idea to check and follow the recom-mended settings for the force field used.

GROMACS uses a separate preprocessing program grompp to collect parameters, topology, and coordinates into a single run input file (e.g. em.tpr) from which the simulation is then started (this makes it easier to prepare a run on your workstation but run it on a separate supercomputer).

As we just said, the command grompp prepares run input files. You will need to provide it with the parameter file above, the topology, a set of coordinates, and give the run input file some name (oth-erwise it will get the default topol.tpr name). The program to actually run the energy minimization is mdrun - this is the very heart of Gromacs. mdrun has a really smart shortcut flag called “-deffnm” to use a string you provide (e.g. “em”) as the default name for all options. If you want to get some status update during the run it is also smart to turn on the verbosity flag (guess the letter, or consult the documentation). !e minimization should finish in a couple of minutes.

Carefully equilibrating the water around the proteinTo avoid unnecessary distortion of the protein when the molecular dynamics simulation is started,

we first perform an equilibration run where all heavy protein atoms are restrained to their starting posi-tions (using the file posre.itp generated earlier) while the water is relaxing around the structure. !is should really be 50-100ps, but if that appears to be too slow you can make do with 10ps (5000 steps) here. Bonds will be constrained to enable 2 fs time steps. Other settings are identical to energy minimi-zation, but for molecular dynamics we also control the temperature and pressure with the Berendsen weak coupling algorithm. !is is mostly done by extending the parameter file used above:

------pr.mdp------integrator = mdnsteps = 5000dt = 0.002nstlist = 10rlist = 1.0coulombtype = pmercoulomb = 1.0vdw-type = cut-offrvdw = 1.0tcoupl = Berendsentc-grps = protein non-proteintau-t = 0.1 0.1ref-t = 298 298Pcoupl = Berendsentau-p = 1.0compressibility = 5e-5 5e-5 5e-5 0 0 0ref-p = 1.0 nstenergy = 100define = -DPOSRES

Page 9: Gromacs Molecular Modeling Tutorial

------------------

For a small protein like Lysozyme it should be more than enough with 100 ps (50,000 steps) for the water to fully equilibrate around it, but in a large membrane system the slow lipid motions can require several nanoseconds of relaxation. !e only way to know for certain is to watch the potential energy, and extend the equilibration until it has converged. To run this equilibration in GROMACS you should once more use the programs grompp and mdrun, but (i) apply the new parameters, and (ii) start from the final coordinates of the energy minimization.

Running the production simulation!e difference between equilibration and production run will be minimal: the position restraints and

pressure coupling are turned off, we decide how often to write output coordinates to analyze (say, every 100 steps), and start a significantly longer simulation. How long depends on what you are studying, and that should be decided before starting any simulations! For decent sampling the simulation should be at least 10 times longer than the phenomena you are studying, which unfortunately sometimes conflicts with reality and available computer resources. !is will depend a lot on the performance of your com-puter, but one option is to first run a very short simulation so you have something to work with while learning the analysis tools, and in the meantime you can run a longer simulation in the background. Select the number of steps and output frequencies yourself, but for reasons you’ll see later it is recom-mended to write coordinates at least every picosecond, i.e. 500 steps.

------run.mdp------integrator = mdnsteps = < select yourself! > dt = 0.002nstlist = 10rlist = 1.0coulombtype = pmercoulomb = 1.0vdw-type = cut-offrvdw = 1.0tcoupl = Berendsentc-grps = protein non-proteintau-t = 0.1 0.1ref-t = 298 298nstxout = < select yourself! >nstvout = < select yourself! >nstxtcout = < select yourself! >nstenergy = < select yourself! >------------------

Storing full precision coordinates/velocities (nstxout/nstvout) enables restart if runs crash (power outage, full disk, etc.). !e analysis normally only uses the compressed coordinates (nstxtcout). Use the documentation to figure out the names of the corresponding files. Perform the production run with the commands grompp and mdrun (remember the verbosity flag to see how the run is progressing).

Analysis of the simulation

Page 10: Gromacs Molecular Modeling Tutorial

You don’t have to do exactly these analyses, but try one or a couple of them that sound interesting and then use the documentation website or Google to see what else you can do if you prefer that. At least for the longer ones you will get clearer results with a longer trajectory.

Deviation from X-ray structureOne of the most important fundamental properties to analyze is whether the protein is stable and

close to the experimental structure. !e standard way to measure this is the root-mean-square displace-ment (RMSD) of all heavy atoms with respect to the X-ray structure. GROMACS has a finished pro-gram to do this, called g_rms that takes a run input file and a trajectory to produce RMSD output.

It is usually a good idea to use a reference structure (the tpr file) from the input before energy mini-mization, since that is the experimental one. !e program will prompt both for a fit group, and the group to calculate RMSD for – choose “Protein-H” (protein except hydrogens) for both. !e output will be written to rmsd.xvg, and if you installed the Grace program you will directly get a finished graph that you can display with xmgrace rmsd.xvg.

!e RMSD increases pretty rapidly in the first part of the simulation, but stabilizes around 0.l9 nm, roughly the resolution of the X-ray structure. !e difference is partly caused by limitations in the force field, but also because atoms in the simulation are moving and vibrating around an equilibrium struc-ture. A better measure can be obtained by first creating a running average structure from the simulation and comparing the running average to the X-ray structure, which would give a more realistic RMSD around 0.16 nm.

Comparing fluctuations with temperature factorsVibrations around the equilibrium are not random, but depend on local structure flexibility. !e

root-mean-square-fluctuation (RMSF) of each residue is straightforward to calculate over the trajectory, but more important they can be converted to temperature factors that are also present for each atom in a PDB file. Once again there is a program that will do the entire job - g_rmsf. !e output can be writ-ten to an xvg file you can view with xmgrace, but also as temperature factors directly in a PDB file (check the output file options for the program). When prompted, you can use the group “C-alpha” to get one value per residue. !e overall agreement should be quite good (check the temperature factor field in this and the input PDB file), which hopefully lends further credibility to the accuracy and sta-bility of the simulation - your fluctuations are similar to those seen in the X-ray experiment.

Secondary structure (Requires DSSP)Another measure of stability is the protein secondary structure. !is can be calculated for each frame

with a program such as DSSP. If the DSSP program is installed and the environment variable DSSP points to the binary (do_dssp -h will give you help on that), the GROMACS program do_dssp can create time-resolved secondary structure plots. Since the program writes output in a special xpm (X pixmap) format you probably also need the GROMACS program xpm2ps to convert it to postscript. Both commands are reasonably well documented, and the end result will be an eps file that you can view e.g. with ghostview or some other standard previewer application.

Use the group “protein” for the secondary structure calculation. !e DSSP secondary structure defi-nition is pretty tight, so it is quite normal for residues to fluctuate around the well-defined state, in par-ticular at the ends of helices or sheets. For a (long) protein folding simulation, a DSSP plot would show how the secondary structures form during the simulation. If you do not have DSSP installed, you can compile the Gromacs built-in replacement “my_dssp” from the src/contrib directory, or hope that your system administrator already did so for you.

Page 11: Gromacs Molecular Modeling Tutorial

Distance and hydrogen bondsWith basic properties accurately reproduced, we can use the simulation to analyze more specific de-

tails. As an example, Lysozyme appears to be stabilized by hydrogen bonds between the residues GLU22 and ARG137 (the numbers might be slightly different for structures other than 1LYD), so how much does this fluctuate in the simulation, and are the hydrogen bonds intact? Gromacs has some very power-ful tools to analyze distances and hydrogen bonds, but to specify what to analyze you need an “index file” with the groups of atoms in question. To create such a file, use the command

make_ndx –f run.gro

At the prompt, create a group for GLU22 with “r 22”, ARG137 with “r 137” (for residue 22 and 137), and then “q” to save an index file as index.ndx. !e distance and number of hydrogen bonds can now be calculated with the commands g_dist and g_hbond. Both use more or less the same input (including the index file you just created). For the first program you might e.g. want the distance as a function of time as output, and for the second the total number of hydrogen bonds vs. time. In both cases you should select the two groups you just created. Two hydrogen bonds should be present almost throughout the simulation, at least for 1LYD.

Making a movieA normal movie uses roughly 30 frames/second, so a 10-second movie requires 300 simulation tra-

jectory frames. To make a smooth movie the frames should not be more 1-2 ps apart, or it will just ap-pear to shake nervously. To create a PDB-format trajectory (which can be read by PyMOL and VMD) with roughly 300 frames you can use the command trjconv. !e option -e will set an end time (to get fewer frames), or you can use -skip e.g. to only use every fifth frame. Write the output in PDB format by using .pdb as extension, e.g. movie.pdb.

Choose the protein group for output rather than the entire system (water move too fast). If you open this trajectory in PyMOL you can immediately play it using the VCR-style controls on the bottom right, adjust visual settings in the menus, and even use photorealistic ray-tracing for all images in the movie. With MacPyMOL you can directly save the movie as a quicktime file, and on Linux you can save it as a sequence of PNG images for assembly in another program. Rendering a movie only takes a few minutes, and if that doesn’t work you can still export PNG files. PyMOL can also create (rendered) PNG files with transparent background if you do: set ray_opaque_background, off

Conclusions & Report!e main point of this exercise was to get your feet wet, and at least an idea of how to use MD in

practice. !ere are more advanced analysis tools available, not to mention a wealth of simulation op-tions. Write a brief report about your work and describe your findings. In particular, try to think about why you are using particular algorithms/choices, and whether there are alternatives!

If you want to continue we’d like to recommend the Gromacs manual which is available from the website: It goes quite a bit beyond just being a program manual, and explains many of the algorithms and equations in detail! !ere are also a few videos & slides available online - Google is your friend!