Basic MD tutorial 1: mini-protein folding-bell

Basic MD simulation tutorial for the Simmerling Lab

updated 10/2017 by carlos to fix some language and make simulation longer, and add cpptraj so traj files can be loaded in Windows VMD.

This tutorial will show some basic information needed to run an Amber MD simulation of a peptide using the GPU cluster in the Simmerling lab. It will be assumed that you have read the relevant sections of the Amber manual (Leap and Sander will be used here), and also that you have read about how MD simulations work. You also need to have learned some of the basics of Unix before doing this tutorial. It can also be helpful to do the VMD tutorial first. Note that inputs for PMEMD.CUDA may not be compatible with PMEMD.MPI or SANDER.

IMPORTANT

Note that this tutorial uses simulation methods that are only appropriate for a case where you do not have an initial structure from experiment. This is not the right procedure to use for when you do have structure information- in that case, make sure to read the Basic MD tutorial 3: build it from a pdb and equilibrate in explicit water-bell and guide to setting up a new system and doing equilibration!

build the prmtop and inpcrd using tleap

In this section, we need to create the “prmtop” that contains all of the force field parameters for the molecule, as well as initial coordinates. In this case we will let tleap build the initial structure. We could also load known coordinates, such as a file from the protein data bank (PDB, www.rcsb.org), but we won’t do that yet. Of course leap will not build “good” coordinates, but it will just make a guess so that we can start doing simulations. Since the initial structure is not “good”, we don’t have to be very careful about our initial equilibration to adjust that structure to the simulation environment. Equilibration of structures obtained from experiments is much more complicated and is the subject of Basic MD tutorial 3a: build it from a PDB and equilibrate in GB-bell and Basic MD tutorial 3: build it from a PDB and equilibrate in explicit water-bell.

1: Bell is the machine where you should usually work, and you can log into this machine from inside or outside the lab using ssh. read the Simmerling Lab cluster (Bell) page in our private wiki, if you have a problem logging in. note also that logging on to Bell from outside the lab requires a few extra steps (read the aforementioned Simmerling Lab cluster page for details). Log into Bell


2: change to your home directory; if you are using the comprot account, make sure you are in the sub-directory that has YOUR files (usually this directory is your name)

3: setup Amber environment (add these lines to your .bashrc file using vi or another text editor. the .bashrc is in your home directory. Here is a simple vim tutorial if you haven't used vim before https://coderwall.com/p/adv71w/basic-vim-commands-for-getting-started)

 export AMBERHOME=/opt/amber
 test -f $AMBERHOME/amber.sh
 source $AMBERHOME/amber.sh

4: Type the below command for changes in your .bashrc to be loaded.

 source .bashrc

create a directory to put all of your files for the tutorial, and change to that directory

 comprot2@bell:~$ pwd
 /mnt/raidc/comprot2
 comprot2@bell:~$ mkdir MYNAME
 comprot2@bell:~$ cd MYNAME
 comprot2@bell:~/MYNAME$ mkdir TUTORIAL
 comprot2@bell:~/MYNAME$ cd TUTORIAL
 comprot2@bell:~/MYNAME/TUTORIAL$ ls
 comprot2@bell:~/MYNAME/TUTORIAL$

Now we need to copy files that Leap will need. First is the leaprc file. This sets the force field used for the topology file to the ff14SB force field.

Type:

 cp /opt/amber/dat/leap/cmd/leaprc.protein.ff14SBonlysc leaprc

The file will be copied to a new file in the current directory called “leaprc”. Leap looks for this file when it loads.

Now we need to make our input file that will tell leap what we want to do. For a simple system like we are building, you can make the file from scratch. For a more complicated system, it is usually easiest to take someone else’s file and modify it. Type “vim leap.in” to create the file, and then type in the contents exactly as shown below.

 x = sequence {NASN LEU TYR ILE GLN TRP LEU LYS ASP GLY GLY PRO SER SER GLY ARG PRO PRO PRO CSER}
 set default pbradii mbondi3
 saveamberparm x tc5b.top tc5b.crd

The first command specifies the sequence of our peptide. Note that the entire sequence is on a single line- it might look like 2 here because it wraps to the next line. Write all of them on 1 line. We have put “N” and “C” before the N- and C-terminal residues, respectively. This will create charged termini. If we wanted blocking groups, we would specify them (such as ACE or NHE for acetylated and amidated termini, respectively). If you are doing a simulation to compare to an experiment, it is important to find out what the termini should be (check the methods section of the paper). Proteins usually have charged termini (as shown in the input file above) but peptides often have blocking groups that need to be specified. If you load a structure from the PDB, it will have the information about the termini so you don’t need to worry about it in that case.

The second command tells leap that we want to use a specific set of atomic radii for the solvation model (mbondi3). This will be discussed later, for now just remember to add it.

The third command tells leap to save the prmtop and coordinate files so that the other Amber programs can use them. The “x” in the line refers back to the name that we gave this molecule when we specified the sequence on line 1.

Now we are ready to run leap using this file. Save the file and exit vi.

Type “tleap” to run leap. When you get the command prompt, type “source leap.in” to read your command script. The output is shown below. Leap has read the file and built the molecule. It tells you that the charge is not zero (that’s fine for this tutorial). It also tells us that the beginning and end of the peptide chain are unconnected (also fine). There are no other warnings or errors, so everything worked well. Next, quit leap by typing “quit”. Type “ls” to make sure that leap created the parameter and coordinate files. Leap also created a file called “leap.log”, in which it saved the detailed history of what it did. Always read your leap.log file! Every time you run leap it will append to this file, so if you run leap multiple times it can get to be confusing. If you re-run leap after an error, delete the leap.log file first. Also build different molecules in different directories so the log files don't get mixed.

xxxx@bell:~/MYNAME/TUTORIAL$ tleap
-I: Adding /home/jk/Desktop/softwares/amber15/dat/leap/prep to search path.
-I: Adding /home/jk/Desktop/softwares/amber15/dat/leap/lib to search path.
-I: Adding /home/jk/Desktop/softwares/amber15/dat/leap/parm to search path.
-I: Adding /home/jk/Desktop/softwares/amber15/dat/leap/cmd to search path.

Welcome to LEaP!
Sourcing leaprc: ./leaprc
Log file: ./leap.log
Loading parameters: /home/jk/Desktop/softwares/amber15/dat/leap/parm/parm10.dat
Reading title:
PARM99 + frcmod.ff99SB + frcmod.parmbsc0 + OL3 for RNA
Loading parameters: /home/jk/Desktop/softwares/amber15/dat/leap/parm/frcmod.ff14SB
Reading force field modification type file (frcmod)
Reading title:
ff14SB protein backbone and sidechain parameters
Loading parameters: /home/jk/Desktop/softwares/amber15/dat/leap/parm/frcmod.ff99SB14
Reading force field modification type file (frcmod)
Reading title:
ff99SB backbone parameters (Hornak & Simmerling) with ff14SB atom types
Loading library: /home/jk/Desktop/softwares/amber15/dat/leap/lib/amino12.lib
Loading library: /home/jk/Desktop/softwares/amber15/dat/leap/lib/aminoct12.lib
Loading library: /home/jk/Desktop/softwares/amber15/dat/leap/lib/aminont12.lib
Loading library: /home/jk/Desktop/softwares/amber15/dat/leap/lib/nucleic12.lib
Loading library: /home/jk/Desktop/softwares/amber15/dat/leap/lib/atomic_ions.lib
Loading library: /home/jk/Desktop/softwares/amber15/dat/leap/lib/solvents.lib
> source leap.in
----- Source: ./leap.in
----- Source of ./leap.in done
Using ArgH and AspGluO modified Bondi2 radii
Checking Unit.
WARNING: The unperturbed charge of the unit: 1.000000 is not zero.

 -- ignoring the warning.

Building topology.
Building atom parameters.
Building bond parameters.
Building angle parameters.
Building proper torsion parameters.
Building improper torsion parameters.
 total 61 improper torsions applied
Building H-Bond parameters.
Incorporating Non-Bonded adjustments.
Not Marking per-residue atom chain types.
Marking per-residue atom chain types.
  (Residues lacking connect0/connect1 -
   these don't have chain types marked:

        res     total affected

        CSER    1
        NASN    1
  )
 (no restraints)
> quit
        Quit

  
  bell:~/MYNAME/TUTORIAL> ls
  leap.in  leap.log  leaprc  tc5b.crd  tc5b.top

Use vmd to take a look at the molecule

Do this using the tc5b.crd and tc5b.top files. There is a separate VMD tutorial that you can use to learn more about VMD.

First, run VMD and select File/New molecule. Then click on browse so that we can find the files. You should have the raidc directory mounted on your computer, and how you find it will depend on the operating system, etc. The following example is for windows. If raidc doesn’t show up in “My Computer”, check with a senior person in the lab to get it mounted. In the meantime, you can use the ssh "window/new file transfer" window to transfer the data files to your PC.

VMD image

You need to load the prmtop file first (we called this tc5b.top). Select that file and click “open”. The Molecule File Browser will stay open and you should change the file type to “Amber7 parm”. Click the Load button. VMD will load this molecule but the File Browser will still stay open for you to load the coordinates - the prmtop has no coordinates. Click the Browse button and go back to your directory and select tc5b.crd and click “open”. Now, it is important that you change the file type to “Amber7 restart” since VMD will have guess “Amber coordinates”, but that is the format of trajectory files that will be generated during our MD simulation. Once you have changed the file type, then click on “load”. You can now close the Molecule File Browser since we are done loading files. You should see that your molecule is listed in the VMD Main window with 1 frame (we only loaded 1 coordinate set). The molecule itself is shown as a line drawing. The structure is nearly completely extended, except for the Pro residues that make the chain change direction a little. It is also possible to have leap impose a conformation when it builds the molecule (such as a helix), but this extended structure is fine for our tutorial. For all of your work in VMD, turn off the perspective view (use orthographic) since this can make small molecules look very distorted. Also consider turning off hydrogen atoms (by using the "noh" keyword in your atom selections) to help keep things less cluttered. You can turn back on hydrogen atoms that are important (such as ones that are involved in hydrogen bonds that you are examining), but most can be left off.

File:MDtutorial2.JPG

Don’t quit VMD, we will use it later in this tutorial.

Prepare for simulation using the pmemd.cuda module.

Prior to running any simulations, you should have done some basic reading on MD simulations as well as reading the web site about using Amber on GPUs (it contains lots of useful suggestions and warnings).

In order to run pmemd.cuda, you will need a few additional files to go along with the ones that you have already created. First, you need a script file that tells the computer what program you want to run (pmemd.cuda) and other options. Second, you need input files for sander that tells it the details of each of the two simulations that we want.


Since the cluster is sometimes updated, you should always go to the wiki page on using the cluster and update the script below if needed. That page will have an example of a current working script. Usually you will keep everything from the working example and change only the lines near the bottom that give the specific input and output files for your simulation.


First, let’s look at the run.pmemd file. It has comments that explain what it is doing. It is a file for the SLURM job scheduler, which tells SLURM which computers you want to use, where to find your program, and how to run the program. Most of this will not need changing. SLURM takes care of putting your job in line so it can run as soon as a computer of the type you requested becomes available.

#!/bin/bash

# this instructs slurm that you will be needing 1 gpu
# a cpu is also assigned to help the gpu
#SBATCH --gres=gpu:1
# this will let slurm know what you would like to name your job
#SBATCH --job-name kTutorialOne
# this tells slurm what type of gpu you want (using commas tells it that any of these are good)
#SBATCH --partition=680
hostname
echo $CUDA_VISIBLE_DEVICES
pwd

#here is the actual Amber job, listed in multiple pmemd runs. Once one finishes, the next will start.
/opt/amber/bin/pmemd -O \
 -i ./min.in \
 -p ./tc5b.top \
 -c ./tc5b.crd \
 -o ./min.out \
 -x ./min.x \
 -e ./minen \
 -v ./minvel \
 -inf ./mininfo \
 -r ./min.r 
 
/opt/amber/bin/pmemd.cuda -O \
 -i ./md.in \
 -p ./tc5b.top \
 -c ./min.r \
 -o ./md.out \
 -x ./md.x \
 -e ./mden \
 -v ./mdvel \
 -inf ./mdinfo \
 -r ./md.r

exit

“pmemd.cuda” is the module of Amber that does MD simulations by using GPU (like leap is the module that builds molecules). We are going to run two consecutive simulations: first will be energy minimization, since the initial structure built by leap may have some parts that are not ideal and this could cause high energies, resulting in large forces and instability when we run dynamics. We will first minimize the structure to relax it, and then use the relaxed structure as the input for an actual molecular dynamics simulation.

Note that the minimization was done by using CPU version pmemd. pmemd.cuda may have problems with minimization so please always run minimization using CPU pmemd or sander. For this tutorial, minimization only take about 1 min so it's OK to run minimization when you actually reserved a GPU but doing nothing on it. For your future projects, minimization and equilibration will take a lot of time so they should be done by only reserving corresponding amount of CPUs.


Next we are going to check the min.in and md.in files, which tell pmemd.cuda the kind of simulations that you want to do as well as other important information. Use vi to create min.in first:

energy minimization
 &cntrl
 	  imin = 1, maxcyc=1000,
 	  ntx = 1,
 	  ntwr = 100, ntpr = 100,
 	  cut = 999.0,
 	  ntb=0, igb = 8, gbsa = 0,
         saltcon = 0.0,
 /


Further details can be found in the sander manual, but the quick summary is that this job will do energy minimization for 1000 steps, using the GB implicit solvent model. After 1000 steps (or perhaps earlier if the minimization doesn’t need that many to relax the structure), pmemd will save the new coordinates and the job will end. You do not need to make any changes to this file.

Now let’s look at the md.in file:

langevin dynamics simulations
 &cntrl
 	  ntx = 1, irest=0, ig = -1,
 	  imin = 0, nstlim = 1000000000, dt = 0.002,
 	  ntt = 3, gamma_ln=1., temp0 = 300.0, tempi=200.,
 	  ntc= 2, ntf = 2,
 	  igb=8, ntb = 0, saltcon=0.,
 	  ntwx = 50000, ntwe = 0, ntwr = 50000, ntpr = 50000,
 	  cut = 1000.0,
 	  nscm = 500,
 /


Again you should check the Amber manual to see what each of these flags does. In summary, this run will perform 1 billion steps of MD, each of length 0.002ps (2fs). Since we are reading a file from a minimization, there will be no velocities in the input coordinates. We will assign them using the expected velocity distribution at 200K, and then run the simulation at 300K. It is important to note that we will save snapshots to our trajectory file each 50000 time steps (each 100 picoseconds). This is a reasonable interval; too frequently results in large trajectory files and frames that are very similar, and too infrequently means we won’t have the time resolution that we want to have for analyzing the simulation later.

Everyone except the most beginning users should modify their prmtop in order to use H mass repartitioning, allowing 4s steps. This will let your MD explore twice as much time in the same amount of computer effort. It isn't difficult if you just follow the directions. See the H-mass repartitioning Hydrogen-mass repartitioning to do larger time step (4fs) wiki page for details.

Now that we have prepared our prmtop and inprcrd files, and we have created a pmemd/pmemd.cuda run script as well as input files for both simulations, we are ready to ask the job scheduler to run the simulations for us. In order to submit this job, simply type “sbatch run.pmemd” and SLURM will tell you the job ID. You can also type “squeue” to see the status of your job. “P” means it is waiting and has not yet started, while “R” means it is running. If it does not show up, check your output files as it may have completed. If it finishes very quickly, you might have an error. Look at the MD output from Amber as well as the slurm output file, since sometimes errors are put in the slurm out file.

If you want further details on your simulation status, you can type “scontrol show job ##” where ## is the job id # that you can get from the standard squeue output. Also, the jobs that are running will differ, of course, from what you see when you do the tutorial.

 bell:~/MYNAME/TUTORIAL> sbatch run.pmemd
 Submitted batch job 6027
 bell:~/MYNAME/TUTORIAL> squeue
            JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
             5949       all 004.prep    kevin  R 1-02:34:26      1 naga87
             5950       all 006.prep    kevin  R 1-02:33:26      1 naga84
             5951       all 004.prep    kevin  R   21:41:56      1 naga73
             5965       all 004.prep    kevin  R    7:24:42      1 naga5
             5971       all 003.prep    kevin  R    3:39:39      1 naga83
             6027       all TUTORIAL    juzou  R       0:53      1 naga5
             5924       cpu  1-9.2.0  hehuang  R    9:28:27      1 naga3
 
 bell:~/MYNAME/TUTORIAL> scontrol show job 6027
 JobId=6027 Name=TUTORIAL
  UserId=juzou(1004) GroupId=comp(1008)
  Priority=4294899792 Nice=0 Account=(null) QOS=(null)
  JobState=RUNNING Reason=None Dependency=(null)
  Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0
  RunTime=00:08:16 TimeLimit=3-00:00:00 TimeMin=N/A
  SubmitTime=2014-06-30T14:46:11 EligibleTime=2014-06-30T14:46:11
  StartTime=2014-06-30T14:46:11 EndTime=2014-07-03T14:46:11
  PreemptTime=None SuspendTime=None SecsPreSuspend=0
  Partition=all AllocNode:Sid=bell:11081
  ReqNodeList=(null) ExcNodeList=(null)
  NodeList=naga5
  BatchHost=naga5
  NumNodes=1 NumCPUs=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
  Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=0
  MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
  Features=(null) Gres=gpu:1 Reservation=(null)
  Shared=1 Contiguous=0 Licenses=(null) Network=(null)
  Command=/mnt/raidc2/juzou/TUTORIAL/run.pmemd
  WorkDir=/mnt/raidc2/juzou/TUTORIAL
  StdErr=/mnt/raidc2/juzou/TUTORIAL/slurm-6027.out
  StdIn=/dev/null
  StdOut=/mnt/raidc2/juzou/TUTORIAL/slurm-6027.out

Minimization

Minimization should only take a few minutes for a molecule of this size. Check the output file “min.out” and look at it carefully. The beginning simply reminds you of what files you told pmemd to use, and what flags were in your input file. It also tells you about all of the other flags that you didn’t specify (which left them at their default values). These are described in more detail in the Amber manual. Always check this part to make sure sander is doing what you wanted it to do.

         -------------------------------------------------------
         Amber 14 SANDER                              2014
         -------------------------------------------------------
 
 | PMEMD implementation of SANDER, Release 14
 
 | Run on 06/30/2014 at 14:46:11
 
 |   Executable path: /opt/amber15/bin/pmemd
 | Working directory: /mnt/raidc2/juzou/carlos/TUTORIAL
 |          Hostname: naga5
 
   [-O]verwriting output
 
 File Assignments:
 |   MDIN: ./min.in
 |  MDOUT: ./min.out
 | INPCRD: ./tc5b.crd
 |   PARM: ./tc5b.top
 | RESTRT: ./min.r
 |   REFC: refc
 |  MDVEL: ./minvel
 |   MDEN: ./minen
 |  MDCRD: ./min.x
 | MDINFO: ./mininfo
 |  MDFRC: mdfrc
 
 
  Here is the input file:
 
 energy minimization
   &cntrl
         imin = 1, maxcyc=1000,
 
         ntx = 1,
 
         ntwr = 100, ntpr = 100,
 
         cut = 999.0,
 
         ntb=0, igb = 8, gbsa = 0,
         
         saltcon = 0.0,
 /
 | Conditional Compilation Defines Used:
 | PUBFFT
 | BINTRAJ
 | EMIL
 
 | New format PARM file being parsed.
 | Version =    1.000 Date = 06/29/14 Time = 21:29:58
 
 | Note: 1-4 EEL scale factors are being read from the topology file.
 
 | Note: 1-4 VDW scale factors are being read from the topology file.
 | INFO:    Reading atomic numbers from topology file.
 | Duplicated    0 dihedrals
 
 | Duplicated    0 dihedrals
 
 --------------------------------------------------------------------------------
      1.  RESOURCE   USE:
 --------------------------------------------------------------------------------
 
  NATOM  =     304 NTYPES =      12 NBONH =     150 MBONA  =     160
  NTHETH =     346 MTHETA =     219 NPHIH =     700 MPHIA  =     653
  NHPARM =       0 NPARM  =       0 NNB   =    1701 NRES   =      20
  NBONA  =     160 NTHETA =     219 NPHIA =     653 NUMBND =      53
  NUMANG =     124 NPTRA  =     135 NATYP =      26 NPHB   =       0
  IFBOX  =       0 NMXRS  =      24 IFCAP =       0 NEXTRA =       0
  NCOPY  =       0
 
  Implicit solvent radii are ArgH and AspGluO modified Bondi2 radii (mbondi3)
 
  Replacing prmtop screening parameters with GBn2 (igb=8) values
 
 

Scroll down to the “RESULTS” section of the output where the energies are being reported for each step:

 --------------------------------------------------------------------------------
    4.  RESULTS
 --------------------------------------------------------------------------------
 
 
 
    NSTEP       ENERGY          RMS            GMAX         NAME    NUMBER
       1       3.8228E+06     3.6500E+06     5.9207E+07     HD2       253
 
  BOND    =       13.6690  ANGLE   =       38.7157  DIHED      =      221.9149
  VDWAALS =  3823264.6720  EEL     =    -1313.2880  EGB        =     -428.8900
  1-4 VDW =      103.7261  1-4 EEL =      911.7746  RESTRAINT  =        0.0000
 
 
    NSTEP       ENERGY          RMS            GMAX         NAME    NUMBER
     100      -5.6318E+02     1.7979E+00     1.0099E+01     CD2       114
 
  BOND    =       15.0662  ANGLE   =       47.7025  DIHED      =      217.9751
  VDWAALS =      -43.5080  EEL     =    -1277.5847  EGB        =     -452.3228
  1-4 VDW =       55.9959  1-4 EEL =      873.5008  RESTRAINT  =        0.0000
 
 
    NSTEP       ENERGY          RMS            GMAX         NAME    NUMBER
     200      -5.9564E+02     1.3124E+00     8.8062E+00     NH1       243
 
  BOND    =       14.7939  ANGLE   =       42.3678  DIHED      =      203.6594
  VDWAALS =      -49.1002  EEL     =    -1266.8536  EGB        =     -457.8900
  1-4 VDW =       52.4985  1-4 EEL =      864.8823  RESTRAINT  =        0.0000
   

Pmemd gives you detailed information on the energies (total plus individual types of energies that contribute to the total). It also tells you the energy gradient under “RMS”, and the atom with the highest force at this step (under GMAX, NAME, NUMBER).

Note that in this case the energy of the initial structure from leap is pretty high, especially in VDW. There must be some atoms that are a little too close to each other. The minimization algorithm is able to handle this, since the energy goes down and by step 100 the VDW energy is favorable (negative). After 100-200 steps, the energy doesn’t drop much and stays about the same until the minimization finishes.

Next we want to look at the structure after minimization. Amber will have written the restart file in netcdf format. If you're on Unix, VMD can read that, but otherwise (like on Windows) you'll need to run a quick cpptraj script to convert the file to a text format that VMD can read. Pay attention to this script, since you'll use something like this often to make your structures (and trajectories) VMD-readable. Other examples in this tutorial give more details.

#!/bin/bash
AMBERHOME=/opt/amber
cat > CPPtraj.in << EOF
parm ./tc5b.top
trajin ./min.r
trajout min.rst7
EOF
$AMBERHOME/bin/cpptraj -i CPPtraj.in

Use vi to copy this text into a new file called run.cpptraj.min. Make it executable by typing "chmod u+x run.cpptraj.min". Next, run the file by typing "./run.cpptraj.min". Your new file (min.rst7) will be VMD-readable in Windows. Load the minimized coordinates into VMD, the same way that you loaded the coordinates built by leap. Load it as a second molecule (new molecule, and load the prmtop first and then the minimized coordinates). This will allow you to show the initial and minimzed structure at the same time. In the picture below I show the original coordinates in gray and the minimized ones colored by atom. Notice that even though the energy changed a lot, the structure changed only slightly. The arrows in the picture point to some of the side chains that have moved from their initial positions, which may have been too close to the backbone.

File:MDtutorial10.JPG

MD

By now your MD simulation should have started running (the run.pmemd script started the MD right after the minimization finished. Look at the output file md.out. Most of it is the same as the minimization output; pmemd.cuda tells you about your input files and also the flags that you set in your md.in file. Scroll down to the RESULTS section to see the energies:

 --------------------------------------------------------------------------------
    4.  RESULTS
 --------------------------------------------------------------------------------
 
 
  NSTEP =        0   TIME(PS) =       0.000  TEMP(K) =   338.48  PRESS =     0.0
  Etot   =      -377.4415  EKtot   =       256.2728  EPtot      =      -633.7142
  BOND   =        11.0024  ANGLE   =        36.1515  DIHED      =       191.2463
  1-4 NB =        50.3586  1-4 EEL =       861.1627  VDWAALS    =       -55.6491
  EELEC  =     -1268.1393  EGB     =      -459.8473  RESTRAINT  =         0.0000
  ------------------------------------------------------------------------------
 
 
  NSTEP =      500   TIME(PS) =       1.000  TEMP(K) =   249.61  PRESS =     0.0
  Etot   =      -244.7386  EKtot   =       188.9846  EPtot      =      -433.7233
  BOND   =        59.2269  ANGLE   =       139.1236  DIHED      =       223.2853
  1-4 NB =        56.1610  1-4 EEL =       861.8115  VDWAALS    =       -51.9160
  EELEC  =     -1251.8055  EGB     =      -469.6101  RESTRAINT  =         0.0000
  ------------------------------------------------------------------------------
 
 
  NSTEP =     1000   TIME(PS) =       2.000  TEMP(K) =   284.32  PRESS =     0.0
  Etot   =      -178.1199  EKtot   =       215.2673  EPtot      =      -393.3872
  BOND   =        71.8073  ANGLE   =       141.5098  DIHED      =       248.2450
  1-4 NB =        56.5237  1-4 EEL =       862.1840  VDWAALS    =       -52.5565
  EELEC  =     -1236.3847  EGB     =      -484.7159  RESTRAINT  =         0.0000
  ------------------------------------------------------------------------------
 
 
  NSTEP =     1500   TIME(PS) =       3.000  TEMP(K) =   295.61  PRESS =     0.0
  Etot   =      -165.4818  EKtot   =       223.8136  EPtot      =      -389.2953
  BOND   =        59.2467  ANGLE   =       160.2499  DIHED      =       242.9907
  1-4 NB =        58.5768  1-4 EEL =       858.8242  VDWAALS    =       -52.1994
  EELEC  =     -1226.5232  EGB     =      -490.4610  RESTRAINT  =         0.0000
  ------------------------------------------------------------------------------
  

Note that the format is a little different from the minimization output. In this case pmemd.cuda also tells us the time and temperature of our simulation. Even though we set a “target” temperature, there will be fluctuations so you should not be concerned that the simulation isn’t exactly at 300K at each step (the average T should be 300K). Since our initial structure was minimized, it may take a while for the temperature to stabilize near our thermostat temperature of 300K.

About RMSD analysis

Start by reading some information on what an RMSD value means as well as this article. RMSD stands for "root mean squared deviation". It compares the coordinates of a set of atoms in 2 structures, and calculates the square root of the average over selected atoms of the distance between the matching atoms in the structures. Before calculating RMSD, the program generally overlaps the two structures to get a "best-fit", meaning it overlaps them in a way that minimizes the deviation for the selected atoms. Each structure pair can have a specific RMSD value. Normally, one compares one or more structures to a single reference structure (often the initial structure, or an experimental structure). When you have an MD simulation, you can choose a reference structure and then compare each frame of the time trajectory to the reference, giving you an RMSD value as a function of time. This is very useful to show if the structure is moving away from or getting closer to another structure as time goes on. We will use RMSD values extensively. Remember, the key points are that you need a reference structure, and that every structure will have it's own RMSD value compared to that reference. Higher RMSD values reflect larger differences. Since the RMSD is an average over all of the selected atoms in the structure, you don't always know exactly how to interpret RMSD values. Generally, if the value is small (under 1 or 2 angstroms), the match is pretty good and we call these "very similar" structures. If the value is higher, such as 3 or 4, it's hard to know if that means all of the selected atoms are about 3-4 angstroms from where they are in the reference, or if it means that many of the atoms match very closely (1-2), and that other atoms deviate more strongly (4-5, or maybe more). At some point you need to look at the structures visually to get better insight into the meaning of the RMSD value. Also keep in mind that as RMSD values get higher, it just says that the structure does not match the reference very well. It is very possible for 2 structures to both have RMSD values of 6 to the same reference, yet these two structure do not match each other, either. In other words, two structures with similar RMSD values to a reference do not have to be similar to each other just because they have the same RMSD. The expection would be that if both have very small RMSD values (0-1), they both closely match the reference and therefore also match each other. It's only at higher RMSD values where you become uncertain if two structures with the same RMSD are similar or not.

When you compare structures, the program will try to match up the atoms in the two structures. You can choose a subset of the atoms (such as just the backbone), but it's important to remember that most programs use one of the structures to figure out which atoms match your selection, and then assume that the same atom numbers in the second structure are also the match to your selection. This means that you have to make sure that the atoms in both structures being compared are in the same order, and correspond to the same systems. Otherwise you might get results that are incorrect. So, don't run a simulation and use the mdcrd/prmtop to compare to the original PDB file- when you load things into Leap, the order of atoms in the prmtop might not be the same as the original PDB. You'll need to process that original PDB through Leap and get a reference structure that has the atoms in the same order as your simulation.

RMSD values will be discussed in more detail below as you start to calculate them.

Doing analysis in cpptraj

Start by carefully reading this cpptraj RMSD tutorial from Dan Roe.

If you are running VMD under UNIX/LINUX, VMD may already support the netcdf binary file format used by Amber. Under Windows, though, VMD doesn't support netcdf so you'll first need to convert your trajectory format to a windows-compatible version before transferring it to your machine and using in VMD. For this we will use the extremely valuable cpptraj module of Amber, which you will end up using a lot as time goes on. Read the cpptraj manual to find out what kinds of things it can do (and see the other wiki tutorials too). The script below does the conversion, but it also shows an example of calculating RMSD values in cpptraj. You can do that in VMD too, but you'll probably end up doing it more often with cpptraj since it does much more analysis than possible in VMD (and is easier to script).

#!/bin/bash
AMBERHOME=/opt/amber

# This CPPtraj script does a few things in one script. Edit to suit your needs.
############################
# write a cpptraj input file#
############################
cat > CPPtraj.in << EOF
#############################################
# Specify the topology file, and its location
#############################################
parm ./tc5b.top
#############################################
# Specific the crd/trj file(s), and locations
############################################
trajin ./md.x

#########################################################
# Specify the crd/trj output file.                      #
# Specify 'nobox' if you plan to strip and use gas phase#
# topology file for VMD'ing or analyses                  #
# crd extension tells cpptraj to use amber text trajectory format
#########################################################
trajout ./tc5b.md.crd nobox
##########################

# cpptraj allows multiple reference structures to be used
# they each need a "tag" that you refer to when using the reference structure

 reference ./min.r [min]
 reference ./tc5b.native.crd [folded]

# calculate backbone rmsd to initial structure
 rmsd rms1 ref [min] :3-18@CA,N,C,O out rmsd.tc5b.ref_init.2-19.bb.dat

# calculate backbone rmsd to folded structure
 rmsd rms2 ref [folded] :3-18@CA,N,C,O out rmsd.tc5b.ref_native.2-19.bb.dat

########################################
#Stop writing to the CPPtraj input file#
########################################
EOF
####################################################
# Run CPPtraj, using the above input file...#
####################################################
$AMBERHOME/bin/cpptraj -i CPPtraj.in

Use vi to copy the text from this box into a NEW file called run.cpptraj. Look at the file and make sure you understand what it is doing, and also check the input file names to make sure they match yours. It reads a prmtop called tc5b.top, it reads a trajectory file called md.x, it reads 2 reference structures (initial from leap, and folded that you obtained through the link in this tutorial). Make sure those structures are present in your directory and have the right names. The concept of "reference" structures is extremely important - cpptraj will compare your structure to each of these, and tell you how different the current snapshot from MD is with respect to each reference. As output, you'll get 2 RMSD files (using the 2 reference structures), and also the trajectory in VMD-readable format. RMSD values only have meaning when you know exactly what you're comparing to. It tells you whether your structure is similar or not (and a number saying how similar). That will only make sense if you know your reference. Imagine that you are asking a friend about the location of a restaurant or gym, and they say "it's only 6 blocks away". That only has meaning if you know what they are using as the reference - is it 6 blocks from where you are now, 6 blocks from where they are, 6 blocks from school, 6 blocks from the train station? The reference is absolutely crucial, and every time you do an RMSD you should make sure you know what you are using as the reference structure. Reference structures are discussed more in the VMD section below.

The next thing to notice in this script is the concept of the atom "mask". This is the selection of atoms for which the comparison will be done. There is not a single correct atom selection- the atoms that you select will influence the RMSD values, and you must choose a set of atoms that makes sense for what you are trying to learn. Often we will only use protein backbone atoms (CA,N,C,O as in the script above), then the RMSD values tell us how close the backbone matches the reference but it will not tell us if the side chains are packed correctly or not. Sometimes we will choose all atoms, or all heavy atoms (non-H). Sometimes we use a subset of the structure if we just want to know about that region. Sometimes (like the case here) we will ignore 1-2 residues on each end of the protein chain if they are flexible. This is especially true if the ends are seen to be flexible in the experiment. Here we choose residues 3 through 18 out of 20 total, ignoring any deviation in the first two and last two residues. Sometimes it is helpful to do several RMSD calculations using different atom masks.

Save the file and quit. Then at the command prompt, type "chmod u+x run.cpptraj". This will make the file executable. To run the cpptraj job, just type ./run.cpptraj. This should run (check for errors and fix them) and produce your text-format trajectory file ready to load into VMD. You can also start to add additional analysis into this cpptraj file- it is much more powerful for data analysis than VMD, so you'll want to learn more about what it can do.

Viewing the dynamics in VMD and calculating RMSDs

Next, you can load the trajectory file from the MD run into VMD using the “Amber coordinates” format and the same procedure that you used for the previous files that contained only a single structure. VMD will load all of the structures and allow you to watch a movie and do some limited analysis on the data. More detailed analysis is beyond the scope of this tutorial.

A simple analysis we can do in VMD is to calculate the RMSD of the structure during the trajectory. You should already be familiar with what RMSD means and how to interpret different values. The important concepts here are (1) the atom selection mask and (2) the reference structure.

Go to the VMD menu “Extensions/Analysis/RMSD trajectory tool”. Check the box labeled “backbone” so that you only calculate backbone (non-sidechain) RMSD. IMPORTANT: selecting a subset of atoms for an RMSD calculation is one of the things you'll change often, depending on the goal. Exactly which atoms do you want to compare? Backbone or side chain? All of the protein, or just part? Don't just click the default atom selection (the "mask" in Amber) and assume it's the best choice. We expect you to think about what you want to compare before you set up the RMSD calculation, not after. DESIGN the analysis so that it answers your questions. Here the question is "does the overall shape of the backbone match the experiment?" (which tells you nothing about other things, such as whether the tryptophan side chain is properly packed in the core, for example. Often you'll have more than 1 RMSD calculation with different masks for the same trajectory!).

Next, you need to think about the reference structure. Your RMSD plot will be just a series of comparisons- comparing the positions of the selected atoms in each frame to the selected atoms in the reference structure. When you set up analysis, choose a reference structure that will give the comparison that you want. If your question is "did the structure change from the one I used at the beginning of the simulation", then you want a reference that is the beginning of the simulation. If the question is "does the simulation ever look like the structure determined by NMR", then you want the NMR (experimental) structure as the reference. None of the comparisons mean anything if you don't know exactly what the reference is. In cpptraj, you set a reference structure using the "reference" command. In VMD, you select it in the window for the RMSD trajectory tool. Don't do an RMSD calculation without fully understanding what you picked as the reference structure.

Ok, so back to VMD... look in the upper right of the window, under the "RMSD" and "Align" buttons, where it says "Ref". This is the important place where you say what to use as the reference structure for calculating RMSD. Most often, you will want to click "selected", which means you will highlight in the structure list below exactly which one to use as the reference. If you select a molecule that is a single structure, that's what it uses as reference. If you select a molecule that has many structures loaded (a trajectory), then you need to tell VMD which frame of that trajectory to use as a reference (the text box following "Trajectory Frame ref"). ONLY DO THIS IF YOU ARE ALREADY EXPERIENCED WITH TRAJECTORY ANALYSIS. The first frame is the default, but the first frame of a trajectory may not be a good reference unless you loaded an entire equilibration trajectory and know for sure that the first frame is exactly what you need to use as the reference. Here, the first frame will be your leap-built linear structure, and how close or far your simulation is from this very artificial structure probably doesn't tell you much. For now we'll do this anyway just so you get a sense of how the analysis works, Later in this tutorial, we'll load a useful reference structure.

Check the box “Plot” in the RMSD Trajectory Tool window so that you get a rough plot of RMSD (you probably don't want to use these VMD plots for a report or paper, since you have very little control over the X and Y axes and it's hard to make a good plot. It's just a guide while you're looking at your structures in VMD. Use cpptraj for any serious data analysis). . Now click the “Align” box, which will overlap each frame of the trajectory to your reference. Next click "RMSD", which will calculate the deviations. It will calculate the RMSD for each frame and generate a (not-so-pretty) plot. Use this as a guide, but as discussed above you'll want to learn to calculate RMSD values inside cpptraj and then export them to a more powerful plotting program. Next, try some different masks and see what happens. Never just accept the default atom selection without thinking about what you want to learn. For more info on how to set up masks in cpptraj, look here. Note that the mask you use for the "align" step doesn't have to be the same one you use later for the "RMSD" step - that's a useful tool that you will eventually find helpful for your analysis.

File:MDtutorial12.JPG

It is clear that the structure changes during the simulation, since the RMSD values compared to the initial structure become relatively high. Comparing the RMSD to the artificial initial structure is not really telling us anything helpful though, other than that the structure changed. You will want to know more than that. Has the peptide already folded to the correct structure for tc5b? We can visually see that it has not, since the final structure from our trajectory (shown in the VMD image above) still doesn’t look like the known tc5b structure. But- did it fold somewhere along the way, then unfold? You'd have to watch every frame to know this. Also, you want a quantitative analysis and not just "does it look sort of folded?". In order for an RMSD calculation to do what you want, you need to use the experimental structure as the reference. To do this in VMD you can load in the experimental structure for tc5b as a different molecule, select it as the reference in the RMSD tool (or in cpptraj), and quantitatively compare the trajectory to that structure, instead of comparing to the structure that leap built. That is much more informative.

Copy the experimental structure (in Amber format) to your current directory, which can be found in Media:Tc5b.native.crd.txt (rename it to Tc5b.native.crd). Now load a new molecule by using the File/New Molecule menu of VMD. Make sure the top box says “Load files for New Molecule” (you do not want to load this into the molecule that we already loaded). Load the prmtop file as Amber7 parm, and then load the tc5b.native.crd into the new molecule as “Amber7 restart” format. Finally, go to the VMD main window and click on the first molecule (the one with the MD trajectory). Go to the menu and select "Molecule/Make_Top). This will make the MD molecule the active one for the MD frame slider at the bottom of the VMD main window. VMD will have set the new molecule as "top" but that's not what you want.

In the picture below I have changed the color of this structure to white to make it stand out from our simulation structure:

File:MDtutorial13.JPG

We have both molecules loaded, but now we want to compare them. Go back to the RMSD trajectory tool window. If both molecules are not listed, then click “Add all” at the bottom of the window so it adds both molecules to the RMSD calculation. Now check the box for “Selected” in the top right. This will let us select which molecule is the reference structure. We want to use the native structure as the reference, so select it by clicking on the sane in the list (it should be the bottom molecule in the list). Now click “Align” and then “RMSD” as you did before. Note that the RMSD is much larger in this example than we obtained using the initial structure as the reference. In this case the atoms are on average about 12 angstroms away from the positions they have in the native structure, even after a best-fit alignment. This means that the simulation has not found the native structure, in fact it is still very, very far away from it. Much longer simulations are needed to get this peptide to fold. Your simulation, when it finishes, should be long enough to observe folding and you should see small RMSD values. How small? Look in the paper listed at the start of the tutorial and see what sort of RMSD values we have obtained for this simulation in the past. Importantly, you won't ever see RMSD values of 0. That doesn't mean the simulation doesn't match experiment, it means that the experiment gave a single average structure at low T, and your simulation will have some thermal fluctuations and motion even when it's properly folded. In that sense, the simulation is a better model of reality than a single structure from experiment. So, go to your RMSD data and do some thinking. Does it fold properly? How can you tell? are there periods of time where the RMSD values of the snapshots are low (using the folded structure as the reference)? Does it stay there only briefly, or for long periods of time? How often is it at a low RMSD compared to high RMSD (this is comparable to an equilibrium constant for folding)? Here is where you need to think about what the numbers actually are telling you.

File:MDtutorial14.JPG

Conclusion

Congratulations! You have successfully performed MD simulation on a peptide in (implicit) water. You are now ready for the challenging (but important!) part, which is the analysis of your data. You can look at our lab's data analysis tutorials [1] or the Amber web page for more information on other methods for analysis and more advanced simulation topics. For example, a good place to start is the Histograms_and_free_energy_plots tutorial.

It is important to not stop here, but to do more analysis on this simulation to see if it reproduces all of the known features of the trpcage fold (you will need to read the original trp-cage paper to do this!). In case yours still does not fold, we have included a long 1us trajectory here. You can download these files and do further analysis. The trajectory has been split into two netcdf format files of 500ns each [2] [3]. The files are in .tar.gz format, so type "tar -xvzf md1.trj.tar.gz", "tar -xvzf md2.trj.tar.gz" to extract the files before you analyze them. As discussed above, if you are using Windows you'll need to use cpptraj to convert these to a format that VMD can read. Use a script like the one that was provided above for converting your own trajectory file. Then, analyze these long trajectories and answer the questions below.

Include a description of your simulations- did it fold? when? how did you know? Were there multiple folding events? Was folding reversible? Was folding cooperative? What structural measures of folding did you decide to use (besides the RMSD shown above), and why? What were the results? Was folding similar for different measures of structure? DOn't just answer these questions, but show the analysis that supports your answer.

Are there limitations or problems in your simulations? What are the results of analysis of the long trajectory (location given above)? How does it compare to your own results, and what does that tell you about simulations of a single molecule? How do these results compare to published work on trp-cage folding (do background reading!), especially our lab's recent paper on protein folding that included trp-cage as one of the systems? Does your work match that, or not? If not, why not? Spend time trying to understand what might have gone wrong, and testing your work.

When you complete this tutorial, submit a short report to Prof. Simmerling. Your report should be in the format of a single MS Word document (or pdf), with embedded figures that you refer to in the text using figure numbers. Figures should have descriptions under them and be placed immediately after the paragraph that refers to them. It might be helpful to also read the Guidelines for writing a research article in the Simmerling Lab.The quality of the analysis and the insight that you show in generating it are important - give it careful thought and show that you understand what you have done and that you are ready to move on to the next step.

Personal