Difference between revisions of "Running MD simulation"

From tutorial
Jump to: navigation, search
(How to use csh/tcsh)
(How to use csh/tcsh)
Line 50: Line 50:
 
Shell scripting is easy and powerful. Tcsh  is compatible with the C shell (csh).  You can create a script file using any editors available, e.g., VI,  
 
Shell scripting is easy and powerful. Tcsh  is compatible with the C shell (csh).  You can create a script file using any editors available, e.g., VI,  
  
<pre> vi testing.tcsh </pre>
+
<pre> vi testing.csh </pre>
  
 
For example, you insert the following lines into this file by using, e.g., vi
 
For example, you insert the following lines into this file by using, e.g., vi
Line 68: Line 68:
  
 
Recap the file content,  
 
Recap the file content,  
   cat testing.tcsh
+
   cat testing.csh
  
 
Now, you can type the following and see the results
 
Now, you can type the following and see the results
   csh testing.tcsh
+
   csh testing.csh
  
 
===Checking available software and loading VMD and AMBER ===
 
===Checking available software and loading VMD and AMBER ===

Revision as of 13:32, 12 October 2016

(please see click for Yang lab website)

Before starting

By now, everyone should have an account available on the CWRU HPCC, a cluster of computers designated for stable, robust performance in computing.

(Note that your user name/password on this cluster are identical to what you have for your CASEID).

A default shell of tcsh was requested for new users. If not, you can choose to change the default to tcsh on this webpage below

https://hpcmgmt.case.edu/student.

Connecting to HPCC

You may need the VPN activated in order to gain access to the CWRU HPCC (unless you are using the campus CaseWireless network. Download of a VPN client may be obtained from CWRU.

More details about the HPCC can be found at its webpage

   https://sites.google.com/a/case.edu/hpc-upgraded-cluster/home  --> Click user support on the left

A simple command from your local Cygwin (or Mac) terminal will let you connect to HPCC,

 ssh -l yourCaseID hpc1.case.edu
When prompted,
<use the same password associated with your case ID>

By now, you are not working on your own laptop any more, but a remote computer (or a cluster of computers). In this case, the frontend of this computer cluster has a hostname called hpclogin.case.edu. Once you are logged in, we can do a lot of things from now on.

All the UNIX/Linux commands are available. For example, you can try

 pwd 

showing the current working directory. Or

 ps 

showing all current running processes, including which Unix shell that is running.

All the commands we tried last time should work perfectly here:

 http://theyanglab.org/tutorial/index.php?title=Tutorial:Main_Page#Basic_UNIX.2FLINUX_commands

Knowing which UNIX shell is running

Information about which shell you are using is held in the SHELL environment variable. You can find out via the command

  echo $SHELL

displaying the value of this variable. The default for this class is csh/tcsh. You could change the shell by visiting HPCC.

Here is the output from my side,

[sxy227@hpc1 ~]$ echo $SHELL
/bin/tcsh

How to use csh/tcsh

Shell scripting is easy and powerful. Tcsh is compatible with the C shell (csh). You can create a script file using any editors available, e.g., VI,

 vi testing.csh 

For example, you insert the following lines into this file by using, e.g., vi

#!/bin/tcsh
set cdir = `pwd`
echo "Welcome to HPCC, `whoami`"
echo "************************"
echo "Your current working directory is $cdir"
echo "Your local time is `date`"

Now, you can save it and quit vi.

You can copy this file from my folder

 cp /home/sxy227/bioc430/testing.csh .

Recap the file content,

  cat testing.csh

Now, you can type the following and see the results

  csh testing.csh

Checking available software and loading VMD and AMBER

First, it is easy to check all the software packages that have been already installed

   module avail

A list of software packages should be listed (e.g., matlab and schrodinger). For example,

[sxy227@hpc1 ~/bioc430]$ module avail

----------------------------------------------------------------- /usr/local/share/modulefiles -----------------------------------------------------------------
   PEET/1.10.1              cuda/7.0.28                  imagemagick/6.9.2        namd/2015                   python/3.5.2
   R/3.2.0                  cufflinks/2.2.1              imod/4.8.43              ncurses/6.0                 qt/4.8.5                (D)
   R/3.2.3           (D)    depends                      impute/2.3.2             netcdf/4.3.3.1              qt/5.3
   SPEC/1.0                 ds3/3.0.0                    intel/12.0-ia32          netcdf-fortran/4.4.2        qt/5.4
   abinit/7.10.4            eigensoft/5.0.1              intel/12.0               neuron/7.4                  qt/5.5
   ace/6.3.0                emflex/1-J.11                intel/13.0-ia32          opencv/3.1.0                rclone/1.29
   amber/amber14            ffmpeg/2.8.2                 intel/13.0        (D)    openfoam/2.4.0       (D)    readline/6.3
   ansys/16.1        (D)    fftw/3.3.4            (D)    intel/2015               openfoam/3.0.1              relion/1.4
   ansys/17.1               fftw/2015                    ismrmrd/2015             openfoam/4.0                resmap/1.1.4
   apr/1.5.2                fiji/2.35                    iv/19                    openmpi/1.6.3               rust/1.5.0
   apr-util/1.5.4           flacs/10.5                   julia/0.4.5              openmpi/1.8.3               samtools/1.2
   armadillo/5.600.2        fltk/1.3.3                   king/1.4                 openmpi/1.8.5               scalapack/200
   atompaw/4.0.0.12         ga/5.4b                      lammps/2015              openmpi/1.8.8        (D)    schrodinger/2015-3
   bc/2015                  gadgetron/3.8.2              lapack/3.4.2             openmpi/1.10.2              scotch/6.0.3
   bcftools/1.3             gatk/3.6                     lapack/3.5.0      (D)    openmpi-gnu/1.8.5           shapeit/v2.r790
   bigdft/1.7.7             gaussian/09                  libxc/2.2.2              pandoc/1.13.1               siemens_to_ismrmrd/2015
   blas/2015                gaussview/5.0                libxml2/2.9.2            paraview/4.4.0              siesta/3.2
   boost/1_58_0             gcc/4.7.3                    libxslt/1.1.28           paris/1.1.3                 slurm/14.11.5
   bowtie/1.1.2             gcc/4.9.3             (D)    ls-dyna/r8               pedsys/2.0                  slurm-drmaa/1.0.7
   bowtie/2.2.6      (D)    gdc/1.0.1             (D)    magma/1.7.0              perl/5.18.2                 sqlite/3.8.11.1
   bsoft/1.9.0              gdc/1.1.0                    mathematica/10.2         perl/5.22.0          (D)    texlive/2016
   bsoft/1.9.2       (D)    genetorrent/3.8.7            matlab/R2014b            pgi/15.10                   tophat/2.1.0
   bwa/0.7.12               git/2.4.8                    matlab/R2015b     (D)    phylip/3.696                torch/2015
   caffe/2015               globus/0.4.0                 mercurial/3.6.1          picard/2.6.0                vasp/4.6
   cblas/2015               gnuplot/5.0.3                merlin/1.1.2             pigz/2.3.3                  visit/2.10.0
   charm/6.6.1              graphicsmagick/1.3.24        mitoseek/1.3             plato/2.0.0                 vmd/1.9.1               (D)
   charmm/c35b3             gromacs/5.1                  mothur/1.36.1            plink/1.07                  vmd/1.9.2
   chimera/1.10.2           gsl/1.16                     motioncorr/2.1           plink/1.9            (D)    voro++/0.4.6
   circos/0.69              gstreamer/0.10.36            mpfr/3.1.2               probabel/0.4.5              vtk/6.3.0
   clhep/2.2.0.8            hdf5/1.8.11           (D)    mpich/1.2.7p1            pycasp/2015                 xerces/3.1.1
   ctffind/4.0.17           hdf5/1.8.15                  namd/2.10-cuda           python/2.7.8         (D)    xqilla/2.3.2
   cuda/5.5.22       (D)    i/1.0.0                      namd/2.10         (D)    python/2.7.10               yaml/0.1.5

VMD and AMBER should be there as well. These packages are installed at a system level, but you have to load/add a specific package under your own account first before you start to use it. Today, we are going to use AMBER.

AMBER may be pre-loaded into your account. You can check this by

   echo $AMBERHOME

It will show the location of all AMBER-related files. If not, run this command

    module load amber

or

    module add amber

You can check again after loading by

   echo $AMBERHOME

The output would be something like

  /usr/local/amber/amber14

This shows the default Amber version14 is loaded and ready to go.

Of course, you can do the same for VMD by

    module load vmd

Keep in mind that this VMD copy is loaded on the cluster, which is different from the one you have on your laptop.

Also, if AMBER is not pre-loaded, you can add this line into your .cshrc file under your home directory by

     vi ~/.cshrc

Simulation Preparation

Choice of an MD integrator

There are many MD software packages. Here, AMBER is used as a tool of choice. One key advantage: Very easy to learn with quite user-friendly manuals and a lot of great tutorials are available at

    AmberMD.org

This tutorial is to provide a simple example for beginners with no prior experience of MD simulations to run molecular dynamics simulations.

The current AMBER installed is version 14. Its manual can be found at

   http://ambermd.org/doc12/Amber14.pdf

Model system: Estrogen Receptor

In this example, we will use part of the estrogen receptor (i.e., the estrogen-binding domain), a key target of drug discovery in breast cancer, as an example.

Or, you can copy this PDB file to your own folder (at your current dir) by

  cp -r /home/sxy227/bioc430/ebd.pdb .

Note that the "." is needed to stand for your current directory.

In the case that you download your own PDB file, before proceeding to the next step, now is a good time to use your favorite text editor to check for a few things:

1.) Are crystal waters present in the PDB file? 
It may be a good idea to delete the waters now and allow the Amber solvation tool to fill them back in later.

2.) Are there any ligands or non-standard residues present in the PDB file? 
Without going into too much detail, non-standard residues are okay 
so long as the residue name and atom names are well defined in the force field you choose.

3.) Are there any residues with missing atoms in the PDB file? 
If so, you would need to take extra preparation steps beforehand to fix the broken residues.

4.) Are there any residues with duplicate coordinates in the PDB file? 
If there were, you would need to decide which one to keep  before continuing.

AMBER Coordinate and Topology Files

In order to run an MD simulation, four basic files are needed. Two of them are:

1.    xxx.prmtop - describing the parameter (ff) and topology of the molecules in the system

2.    xxx.prmcrd - describing the initial molecular coordinates of the system

How to generate these files are described next.

How to generate topology and coordinate files (.prmtop and .prmcrd)

There are two options. One is a series of interactive line-by-line commands and the other is a batch mode by putting all command lines into a single file and running them all at once in background. When you get familiar, you may prefer the latter. Here for demonstration, we will go the former and show the latter afterwards.

Let us create a folder under your home directory

  mkdir testing

Enter this new directory

  cd testing

Now you do try some of other unix commands such as

  ls 

or

  pwd

For sure, you can go back to the parent directory

  cd ..

Or go back to your home directory

  cd ~

Before we start, you can copy your PDB file into the current working directory. I also provide a sample PDB file under my directory. If you want to use this PDB, type the following and you should have it in your current directory

  cp /home/sxy227/class2015/ebd.pdb .

To generate these two *.prmtop and *.prmcrd files, we will use a tool within Amber, called leap. There are two versions: graphic (xleap) and text-mode (tleap). It may be easier to go with tleap by typing

  tleap

Now you should see a welcome message. You can exit anytime by typing

   quit

Load the Energy Function (aka force field) related parameters that describe how atoms of this protein should interact with each other, after tleap is activated,

  tleap 

Now you can load (source) the FF14SB force field by

  source leaprc.ff14SB

Next, we are going to load the PDB into tleap and give this protein a name, ER

  ER = loadpdb ebd.pdb

Note that ebd.pdb is the file you copied from my folder, which will be used for the rest of demonstration.

We would like to soak the protein into a water box so we are getting as close as possible to reality,

  solvateBox ER TIP3PBOX 10.0

Here, TIP3PBOX is a type of water box used. The value 10.0 means that the molecule should have a layer of water molecules and the thickness of this layer is 10 Angstroms.

Now you can save the prmtop and prmcrd files to the current working directory via

  saveAmberParm ER ebd.prmtop ebd.prmcrd

You may also want to save the protein-water system in a PDB format (e.g., you can load it to vmd later and take a look later)

  savepdb ER ebd_wat.pdb

Now you quit tleap by

  quit

The ebd.prmtop and ebd.prmcrd files (as well as ebd_wat.pdb) should be in your working directory,

  ls

Above is a demonstration to show you in a step-by-step fashion. In practice, you put all these lines into a single file, say, named tleap.in containing these lines:

  source leaprc.ff14SB
  ER = loadpdb ebd.pdb
  solvateBox ER TIP3PBOX 10.0
  saveAmberParm ER ebd.prmtop ebd.prmcrd
  savepdb ER ebd_wat.pdb
  quit

You can also copy this file from my folder by

  cp /home/sxy227/bioc430/tleap.in .

You should be able to repeat all above calculations by running

  tleap -s -f tleap.in

Note that tleap is part of the Amber package loaded into your account. Now you should have both *.prmtop and *.prmcrd files in your folder; these two files are needed for next steps. It also generates a log file (*.log) for your reference.

Now, it is a good time to take a close look if you see any error message. If any, you would need to fix it before continuing.

AMBER coordinate (.prmcrd) file

It looks like this


 36498
  41.6854588  56.7349139  18.2331679  42.1312793  56.4253467  19.0849385
  42.2856754  57.4082757  17.7788286  41.3590576  55.9254492  17.7249074
  40.4834588  57.4729139  18.5941679  40.5267827  58.4925893  18.2114351
  39.2594588  56.7599139  17.9661679  39.4788534  56.9337902  16.9127296
  39.4755409  55.7209498  18.2150897  37.7684588  57.0539139  18.2361679
  37.6637358  58.0966426  18.5359191  36.9284588  56.7969139  16.9811679
...

Topology (.prmtop) file

It begins with

%VERSION  VERSION_STAMP = V0001.000  DATE = 11/09/15  13:30:17
%FLAG TITLE
%FORMAT(20a4)

%FLAG POINTERS
%FORMAT(10I8)
   36498      16   34537    1999    4596    2703    8792    6771       0       0
   65349   11085    1999    2703    6771      43      89      42      30       1
       0       0       0       0       0       0       0       1      24       0
       0
%FLAG ATOM_NAME
%FORMAT(20a4)
N   H1  H2  H3  CA  HA  CB  HB2 HB3 CG  HG  CD1 HD11HD12HD13CD2 HD21HD22HD23C
O   N   H   CA  HA  CB  HB1 HB2 HB3 C   O   N   H   CA  HA  CB  HB2 HB3 CG  HG
CD1 HD11HD12HD13CD2 HD21HD22HD23C   O   N   H   CA  HA  CB  HB2 HB3 OG  HG  C
O   N   H   CA  HA  CB  HB2 HB3 CG  HG  CD1 HD11HD12HD13CD2 HD21HD22HD23C   
...

The protein soaked in a water box

The file looks something like

ATOM      1  N   LEU     1       4.615  14.388 -18.018  1.00  0.00
ATOM      2  H1  LEU     1       5.061  14.079 -17.166  1.00  0.00
ATOM      3  H2  LEU     1       5.215  15.062 -18.472  1.00  0.00
ATOM      4  H3  LEU     1       4.289  13.579 -18.526  1.00  0.00
ATOM      5  CA  LEU     1       3.413  15.126 -17.657  1.00  0.00
ATOM      6  HA  LEU     1       3.456  16.146 -18.040  1.00  0.00
ATOM      7  CB  LEU     1       2.189  14.413 -18.285  1.00  0.00
ATOM      8  HB2 LEU     1       2.408  14.587 -19.338  1.00  0.00
ATOM      9  HB3 LEU     1       2.405  13.374 -18.036  1.00  0.00
ATOM     10  CG  LEU     1       0.698  14.707 -18.015  1.00  0.00
ATOM     11  HG  LEU     1       0.593  15.750 -17.715  1.00  0.00
ATOM     12  CD1 LEU     1      -0.142  14.450 -19.270  1.00  0.00
ATOM     13 HD11 LEU     1      -0.038  13.408 -19.570  1.00  0.00
ATOM     14 HD12 LEU     1      -1.189  14.664 -19.057  1.00  0.00
ATOM     15 HD13 LEU     1       0.203  15.096 -20.077  1.00  0.00
ATOM     16  CD2 LEU     1       0.203  13.825 -16.869  1.00  0.00
ATOM     17 HD21 LEU     1       0.783  14.037 -15.971  1.00  0.00
ATOM     18 HD22 LEU     1      -0.850  14.033 -16.679  1.00  0.00
ATOM     19 HD23 LEU     1       0.323  12.776 -17.139  1.00  0.00
ATOM     20  C   LEU     1       3.350  15.138 -16.123  1.00  0.00
ATOM     21  O   LEU     1       2.321  15.454 -15.527  1.00  0.00
ATOM     22  N   ALA     2       4.476  14.815 -15.492  1.00  0.00
ATOM     23  H   ALA     2       5.284  14.549 -16.037  1.00  0.00
...
...
...
ATOM  36493  O   WAT  11084    -32.822 -32.435 -34.462  1.00  0.00
ATOM  36494  H1  WAT  11084    -33.451 -31.967 -33.913  1.00  0.00
ATOM  36495  H2  WAT  11084    -32.708 -33.281 -34.027  1.00  0.00
TER
ATOM  36496  O   WAT  11085    -21.493 -29.257 -19.193  1.00  0.00
ATOM  36497  H1  WAT  11085    -20.701 -29.752 -19.401  1.00  0.00
ATOM  36498  H2  WAT  11085    -21.200 -28.578 -18.586  1.00  0.00
TER
END

You can transfer ebd_wat.pdb to your local laptop and use VMD to take a look. From your local Cygwin terminal,

  scp yourCaseId@hpc1.case.edu:/home/yourCaseID/testing/ebd_wat.pdb .

There are other methods for transferring as well; see HPCC. Some may find the Globus Online very easy to use

   https://sites.google.com/a/case.edu/hpc-upgraded-cluster/hpcc-news/transferfileswithaclick

Once the file is transferred to your local PC, you can use your local copy VMD to look at it. Does it have a lot of water molecules? If yes, we are ready to move on.

Preparing Amber MD input files

Lastly, input files that define the program settings for each MD run, as follows

3.   min.inp - describing the setting parameters to setup the Amber energy minimization 
4.   md.inp  - describing the setting parameters to setup the Amber MD integrator


In general, for each system, it is typical to follow a 3-step procedure by performing an energy minimization on the system, then slowly heating the system, and then carrying out production MD at the desired temperature and pressure.

   Step 1.    Minimization
   Step 1.5   Gradually heating from 0K to 300K with constant volume (NVT)
   Step 2.    Production MD with constant pressure and temperature (NPT) at 300K and 1atm 

For some simple, stable systems, I found over the years that the second step may not make a big difference. Hence, for simplicity here, we will skip the 2nd step and proceed with the 1st (mininization) and the last (MD) steps.

The integrator

The Amber MD integrator is called sander, a numerical solver for the Newton Equation. To ask sander to perform MD simulations, we have to give specific commands in a file named md.in (or any other name you would prefer). Ideally, we could start MD right away. However, when we prepare the initial PDB files, some atoms (such as Hydrogen) may be slightly misplaced in the 3D space, which can cause the instability of MD simulations. For this reason, we often start with some simple energy minimization (at a minimal level). In other words, two input files are needed here: one for minimization (min.in) and the other for MD (md.in), both of which can be triggered via the sander interface.

You can find the usage by

  sander -h

Input file for minimization

Let us first create the file min.in that includes the following settings for minimization.

# minimization
 &cntrl
  imin   = 1,
  ntx    = 1,
  irest  = 0,
  maxcyc = 1000,
  ncyc   = 100,
  ntpr   = 100,
  ntwx   = 0,
  cut    = 12,
/

Note that maxcyc=1000 means that we will have 1000 steps of minimization. More details of their settings can be found in the manual.

The settings can be summarized as follows:

imin=1 	        choose a minimization run
ntx=1   	read coordinates but not velocities from ASCII formatted inpcrd coordinate file
irest=0         do not restart simulation. (not applicable to minimization)
maxcyc=1000     maximum cycles of minimization
ncyc=100 	The steepest descent algorithm for the first 0-ncyc cycles, then switches the conjugate gradient algorithm for ncyc-maxcyc cycles
ntpr=100        frequency to print to the Amber mdout output file every ntpr cycles
ntwx=0          no Amber mdcrd trajectory file written (not applicable to minimization)
cut=12          nonbonded cutoff distance in Angstroms (for PME, limit of the direct space sum - do NOT reduce this below 8.0. Higher numbers give slightly better accuracy but at vastly increased computational cost.)

You can copy this file from my folder

  cp /home/sxy227/bioc430/min.in .

Input file for MD

The production time of this simulation is only 10ps. Ideally, we would run this simulation for much longer, but in the interest of time for this on-the-fly tutorial, we have limited the final production simulation time.

Below is the example for md.in

# eq-MD with NVT
 &cntrl
  imin=0,
  ntx=1,
  irest=0,
  nstlim=5000,
  dt=0.002,
  ntf=2,
  ntc=2,
  temp0=300.0,
  ntpr=500,
  ntwx=500,
  cut=12.0,
  ntb=2,
  ntp=0,
  ntt=3,
  gamma_ln=2.0,
  ig=-1,
/


The settings for production can be summarized as follows:

imin=0        Start MD, not minimization
ntx=1         Read coordinates (but no velocities) from the coordinate file
irest=0	      NO restart from a previous MD run
nstlim=5000   A total of 5000 MD steps, resulting 5000*0.002 ps = 10 ps
dt=0.002      the timestep size is 0.002 ps  or 2 fs
temp0=300.0   Thermostat temperature. Run at 300K
ntwx=500      write the output to an Amber trajectory file (ebd.mdcrd) every 500 steps or every 1ps
ntb=2         Use periodic boundary conditions with constant pressure
ntp=1         md with isotropic position scaling for constant pressure (NPT)
ntt=3         Use Langevin dynamics with the collision frequency


The value of nstlim is the number is MD steps. Here, it was set at 5000, which means we will run MD for a total of 5000*0.002ps = 10ps. It is quite short by today's standard. For your own protein system, you run longer, say 100ps. In other words, you can change the values of both nstlim and ntwx for for your own need.

To speed up the calculations, I used a much smaller cut value reduced from 12 to 8 angstroms. This is just for demonstration only; For realistic applications, a cut value of 12 is recommended.

Submitting your computing job

There are more details about job submission at HPCC. Below is a simple working example, which will include both the energy minimization and the MD steps.

An example is given below. It begins with a so-called submitting script file, say, submit.csh with the following content.

#!/bin/csh
#PBS -N test
#PBS -l nodes=1:ppn=12
#PBS -l walltime=10:00:00
#PBS -l mem=4gb
#PBS -j oe

module load amber
set p = ebd

# set $np to have the current number of processors
set PBS_NODEFILE=`generate_pbs_nodefile`
setenv np `wc -l $PBS_NODEFILE | awk '{print $1}'`
#
echo '========================================'

# change to the working directory
cd $SLURM_SUBMIT_DIR

# start the 1st-step: energy minimization
mpirun -n $np sander.MPI -i min.in -o ${p}.min.out -p ${p}.prmtop -c ${p}.prmcrd -r ${p}.min.rst

# start the 2nd-step: MD
mpirun -n $np sander.MPI -i md.in -o ${p}.md.out -p ${p}.prmtop -c ${p}.min.rst -r ${p}.md.rst -x ${p}.mdcrd

The beginning region of this submit.csh file is to request a total of 12 processors from 1 node.

You can copy this file from my folder

  cp /home/sxy227/bioc430/submit.csh .

Now, we are READY to ask the cluster to do the calculations via

  qsub submit.csh

By now, the request has been made to the cluster. The computing may start right away or have to wait in the line for a while depending on the availability.

You can check the status of your just-submitted computing job via

   qstat -u yourCaseID

If your job starts, the status column S is marked as R meaning it is Running.

Just in case you want to cancel it, you may do so by

   qdel yourjobid

Here, yourjobid can be found out by running the previous qstat command. For example,

   qdel 4161073

Note that a different job-submitting method is being implemented as well. See more at

 https://sites.google.com/a/case.edu/hpc-upgraded-cluster/slurm-cluster-commands

Waiting for job to finish

Once the job is submitted, you can log out and take a break. Even if you log out, the job is running (or waiting to start) at the cluster. Nice!

Once the computing starts, it takes about tens of minutes or so to finish.

Specific questions about cluster operations should be sent to hpc-support@case.edu, but do expect to wait for some time to get response since the cluster is a mixed use of teaching and research.

Output MD files

Once finished, you can log in back and check all the flies generated via

 ls

There should be several new files, including

ebd.min.out 
ebd.md.out

These two files detail most of the information about the simulation itself you need to know.

Another key file is
ebd.mdcrd

It is the so-called MD trajectory file containing the xyz coordinates of each atom at each saved timepoint during the simulations.

Now you can transfer this MD trajectory file to your own computer, e.g., by scp. There are other methods for transferring as well; see HPCC as mentioned previously.

Visualizing AMBER trajectories

Transferring trajectory files

VMD is installed on HPCC as well as on your laptop. It is possible to use VMD on HPCC, but it may be very slow to use since a lot of graphic information are transferred from HPCC to your laptop. In addition, if there are a lot of users using VMD at the same time (via the frontend), it may crash the login node. For this reason, it is recommended and required to use the VMD copy on your laptop. The only thing you need to do is transferring the files to your laptop. You can do this from your Cygwin terminal using the command scp,

scp yourCaseID@hpclogin.case.edu:yourdirpath/filename .

More information on how to use scp can be obtained

man scp

There are other methods for transferring as well; see HPCC

Loading into VMD

So far, we should have two files on your own computers: ebd_wat.pdb and ebd.mdcrd. They are good to go for VMD visualization. First, we can load the PDB file as demonstrated previously. For VMD, you should see two windows: Main and Display.

           Click the PDB file in the Main window

You will see it highlighted. Next, we will load the MD trajectory in the Main window,

   go to File -> Load Data into Molecule  -> Browse -> Select ebd.mdcrd 
              -> File Type: Amber with PBox -> Load

Now you should be able to see the whole MD trajectory by moving the bottom bar back and forth.

Also, there are many different visualization options. For example,

 go to Graphics -> Representations -> Create Rep 
                -> Selected Atom: protein -> Coloring method: Secondary structure 
                -> Drawing method: New Cartoon

You can double-click each Rep to show or not show the current representation

Use VMD to make a movie

A lot of things we can do with VMD visualization

  • Take a look to get a feel
  • Get a sense about overall flexibility (more quantitative analyses will come later)
  • Generate a molecular movie, e.g.,
 go to Extensions -> Visualization -> Movie Maker -> Make Movie

Now you should have a movie file for you to watch. Enjoy!

Post-simulation data analysis

Here, two specific examples are demonstrated to give you a feeling of how it works.

A simple RMSD calculation

A root-mean-square deviation (RMSD) calculation is a standard practice. We have two options to do the calculation.

  • A simple graphic interface. In the VMD main window,
  go to Extensions -> Analysis -> RMSD visualizer tool -> ALIGN -> RMSD -> Plot result. 

Note that ALIGN is required, even if you can picked any reference frame as you want. Otherwise, you may not get what you wanted.

  • A text/script mode. This can be in a script as well. Here is an example, but beyond today's demonstration.

Flexibility via RMSF

We are going to use another tool within AMBER, called ptraj. It is very useful and quite easy to use. Here, an example is given to show how to access the structural flexibility from MD trajectories for each amino acid. We will compute the atomic positional fluctuations for all the atoms. Here, the output is performed only for Ca atoms in mask. This positional fluctuation of RMSF is related to the so-called B-factors, which means multiplying the squared fluctuations by (8/3)pi**2.

For example, you can make a file, say, rmsf.csh containing

#!/bin/csh

set p = ebd

cat << eof >! ptraj.in
trajin $p.mdcrd
average avg.pdb pdb
atomicfluct out ${p}RMSF.dat :3-247@CA byres
eof

ptraj  $p.prmtop  ptraj.in  > ptraj.log

Now you run this calculation via

  csh rmsf.csh 

The output file is ebdRMSF.dat, which is a two-column file (residue index vs. RMSF). You can transfer/plot this data file using your favorite plotting software.

Clustering analysis: Selection of molecular poses for docking

This is a slightly advanced topic. There are several ways to pick molecular structures (or poses) from simulation trajectories.

  • A simple way is to save a pose of choice from VMD trajectory (based on your prior knowledge). You can do this via the VMD main window,
go to File -> Save Coordinate -> Selection: not water 
           -> First: [frame number X] -> Last [frame number X] -> Save

where X is the frame number you want to save.

  • Use all the major poses that have been simulated so far. The identification of such major poses can be achieved via a so-called clustering method, where similar structures are grouped into a same cluster. There are quite a few metrics that can be used for such clustering. Today, we are going to use RMSD, which is essentially a measure of the similarity between two structures or poses.

To get an idea, you can copy a sample file from my folder

cp /home/sxy227/class2015/cluster.csh . 

More details about its usage can be found in the user mannual ($AMBERHOME/doc/AmberTools.pdf; page 117)

Then, you can start the clustering via

csh cluster.csh

where a number of 5 poses or clusters are requested. Since we ask for 5 poses, there should be a list of 5 files under your working directory, e.g.,

ls 

showing the filenames: ebd_cluster.rep.c0, ebd_cluster.rep.c1, ..., ebd_cluster.rep.c4.

Now, you can download them into your own computer and take a look. They are in the format of PDB, so you can rename them with a PDB extension name. If you want to use them for docking, you can get rid of all the water molecules for sure.

One quick thing you can do is to load all 5 poses together into VMD and take a look at their difference, in a similar way you loaded MD trajectory. Do you see a big difference between/among these poses? If not, why? How long did you simulate? Perhaps too short to sample any large changes? Just some hints to make sense your results.

For docking, you can cherry-pick any pose, presumably based on available experimental knowledge. Or you can perform the docking workflow on all of them given the relatively lost cost of virtual docking. In that case, you can simply repeat what you had learned from your previous docking projects.

PCA

PCA

Summary and Outlooks for Computer-aided drug discovery (CADD)

Congratulations! You've now run your first MD simulation, make your molecular movie, and analyzed the results. This is a simplest example of running a MD simulation that I can imagine. If you want to learn more you are encouraged to visit the AMBER website (or any other simulation website).

The molecular structures generated from MD simulations have been successfully used in the past for CADD such as docking. For this reason, it makes sense to learn both docking and MD simulations together for this class and for your project of interest. No need to repeat the significance of MD simulations for CADD here. Now, let us take a break!