Running MD simulation

From tutorial
(Redirected from MD)
Jump to: navigation, search

(please see click for Yang lab website)

Contents

Before starting

By now, everyone should have an account available on the CWRU HPCC, a cluster of computers designated for stable, robust performance in computing.

(Note that your user name/password on this cluster are identical to what you have for your CASEID).

A default shell of tcsh was requested for new users. If not, you can choose to change the default to tcsh on this webpage below

https://hpcmgmt.case.edu/student.

Connecting to HPCC

You may need the VPN activated in order to gain access to the CWRU HPCC (unless you are using the campus CaseWireless network. Download of a VPN client may be obtained from CWRU.

More details about the HPCC can be found at its webpage

   https://sites.google.com/a/case.edu/hpc-upgraded-cluster/home  --> Click user support on the left

A simple command from your local Cygwin (or Mac) terminal will let you connect to HPCC,


     ssh -l yourCaseID hpc1.case.edu


When prompted,
<use the same password associated with your case ID>

By now, you are not working on your own laptop any more, but a remote computer (or a cluster of computers). In this case, the frontend of this computer cluster has a hostname called hpclogin.case.edu. Once you are logged in, we can do a lot of things from now on.

All the UNIX/Linux commands are available. For example, you can try

 pwd 

showing the current working directory. Or

 ps 

showing all current running processes, including which Unix shell that is running.

All the commands we tried last time should work perfectly here:

 http://theyanglab.org/tutorial/index.php?title=Tutorial:Main_Page#Basic_UNIX.2FLINUX_commands

Knowing which UNIX shell is running

Information about which shell you are using is held in the SHELL environment variable. You can find out via the command

  echo $SHELL

displaying the value of this variable. The default for this class is csh/tcsh. You could change the shell by visiting HPCC.

Here is the output from my side,

[sxy227@hpc1 ~]$ echo $SHELL
/bin/tcsh

How to use csh/tcsh

Shell scripting is easy and powerful. Tcsh is compatible with the C shell (csh). You can create a script file using any editors available, e.g., VI,

 vi testing.csh 

For example, you insert the following lines into this file by using, e.g., vi

#!/bin/tcsh
set cdir = `pwd`
echo "Welcome to HPCC, `whoami`"
echo "************************"
echo "Your current working directory is $cdir"
echo "Your local time is `date`"

Now, you can save it and quit vi.

You can copy this file from my folder

 cp /home/sxy227/bioc430/testing.csh .

Recap the file content,

  cat testing.csh

Now, you can type the following and see the results

  csh testing.csh

Checking available software and loading VMD and AMBER

First, it is easy to check all the software packages that have been already installed

    module avail


A list of software packages should be listed (e.g., matlab and schrodinger). For example,

[sxy227@hpc1 ~/bioc430]$ module avail

----------------------------------------------------------------- /usr/local/share/modulefiles -----------------------------------------------------------------
   PEET/1.10.1              cuda/7.0.28                  imagemagick/6.9.2        namd/2015                   python/3.5.2
   R/3.2.0                  cufflinks/2.2.1              imod/4.8.43              ncurses/6.0                 qt/4.8.5                (D)
   R/3.2.3           (D)    depends                      impute/2.3.2             netcdf/4.3.3.1              qt/5.3
   SPEC/1.0                 ds3/3.0.0                    intel/12.0-ia32          netcdf-fortran/4.4.2        qt/5.4
   abinit/7.10.4            eigensoft/5.0.1              intel/12.0               neuron/7.4                  qt/5.5
   ace/6.3.0                emflex/1-J.11                intel/13.0-ia32          opencv/3.1.0                rclone/1.29
   amber/amber14            ffmpeg/2.8.2                 intel/13.0        (D)    openfoam/2.4.0       (D)    readline/6.3
   ansys/16.1        (D)    fftw/3.3.4            (D)    intel/2015               openfoam/3.0.1              relion/1.4
   ansys/17.1               fftw/2015                    ismrmrd/2015             openfoam/4.0                resmap/1.1.4
   apr/1.5.2                fiji/2.35                    iv/19                    openmpi/1.6.3               rust/1.5.0
   apr-util/1.5.4           flacs/10.5                   julia/0.4.5              openmpi/1.8.3               samtools/1.2
   armadillo/5.600.2        fltk/1.3.3                   king/1.4                 openmpi/1.8.5               scalapack/200
   atompaw/4.0.0.12         ga/5.4b                      lammps/2015              openmpi/1.8.8        (D)    schrodinger/2015-3
   bc/2015                  gadgetron/3.8.2              lapack/3.4.2             openmpi/1.10.2              scotch/6.0.3
   bcftools/1.3             gatk/3.6                     lapack/3.5.0      (D)    openmpi-gnu/1.8.5           shapeit/v2.r790
   bigdft/1.7.7             gaussian/09                  libxc/2.2.2              pandoc/1.13.1               siemens_to_ismrmrd/2015
   blas/2015                gaussview/5.0                libxml2/2.9.2            paraview/4.4.0              siesta/3.2
   boost/1_58_0             gcc/4.7.3                    libxslt/1.1.28           paris/1.1.3                 slurm/14.11.5
   bowtie/1.1.2             gcc/4.9.3             (D)    ls-dyna/r8               pedsys/2.0                  slurm-drmaa/1.0.7
   bowtie/2.2.6      (D)    gdc/1.0.1             (D)    magma/1.7.0              perl/5.18.2                 sqlite/3.8.11.1
   bsoft/1.9.0              gdc/1.1.0                    mathematica/10.2         perl/5.22.0          (D)    texlive/2016
   bsoft/1.9.2       (D)    genetorrent/3.8.7            matlab/R2014b            pgi/15.10                   tophat/2.1.0
   bwa/0.7.12               git/2.4.8                    matlab/R2015b     (D)    phylip/3.696                torch/2015
   caffe/2015               globus/0.4.0                 mercurial/3.6.1          picard/2.6.0                vasp/4.6
   cblas/2015               gnuplot/5.0.3                merlin/1.1.2             pigz/2.3.3                  visit/2.10.0
   charm/6.6.1              graphicsmagick/1.3.24        mitoseek/1.3             plato/2.0.0                 vmd/1.9.1               (D)
   charmm/c35b3             gromacs/5.1                  mothur/1.36.1            plink/1.07                  vmd/1.9.2
   chimera/1.10.2           gsl/1.16                     motioncorr/2.1           plink/1.9            (D)    voro++/0.4.6
   circos/0.69              gstreamer/0.10.36            mpfr/3.1.2               probabel/0.4.5              vtk/6.3.0
   clhep/2.2.0.8            hdf5/1.8.11           (D)    mpich/1.2.7p1            pycasp/2015                 xerces/3.1.1
   ctffind/4.0.17           hdf5/1.8.15                  namd/2.10-cuda           python/2.7.8         (D)    xqilla/2.3.2
   cuda/5.5.22       (D)    i/1.0.0                      namd/2.10         (D)    python/2.7.10               yaml/0.1.5

VMD and AMBER should be there as well. These packages are installed at a system level, but you have to load/add a specific package under your own account first before you start to use it. Today, we are going to use AMBER.

AMBER may be pre-loaded into your account. You can check this by

   echo $AMBERHOME

It will show the location of all AMBER-related files. If not, run this command

   module load amber 
  

or

    module add amber

You can check again after loading by

   echo $AMBERHOME

The output would be something like

  /usr/local/amber/amber14

This shows the default Amber version14 is loaded and ready to go.

Of course, you can do the same for VMD by

   module load vmd

Keep in mind that this VMD copy is loaded on the cluster, which is different from the one you have on your laptop.

Also, if AMBER is not pre-loaded, you can add this line into your .cshrc file under your home directory by

     vi ~/.cshrc

Simulation Preparation

Choice of an MD integrator

There are many MD software packages. Here, AMBER is used as a tool of choice. One key advantage: Very easy to learn with quite user-friendly manuals and a lot of great tutorials are available at

    AmberMD.org

This tutorial is to provide a simple example for beginners with no prior experience of MD simulations to run molecular dynamics simulations.

The current AMBER installed is version 14. Its manual can be found at

   http://ambermd.org/doc12/Amber14.pdf

Model system: Estrogen Receptor

In this example, we will use part of the estrogen receptor (i.e., the estrogen-binding domain), a key target of drug discovery in breast cancer, as an example.

Or, you can copy this PDB file to your own folder (at your current dir) by

  cp -r /home/sxy227/bioc430/ebd.pdb .

Note that the "." is needed to stand for your current directory.

In the case that you download your own PDB file, before proceeding to the next step, now is a good time to use your favorite text editor to check for a few things:

1.) Are crystal waters present in the PDB file? 
It may be a good idea to delete the waters now and allow the Amber solvation tool to fill them back in later.

2.) Are there any ligands or non-standard residues present in the PDB file? 
Without going into too much detail, non-standard residues are okay 
so long as the residue name and atom names are well defined in the force field you choose.

3.) Are there any residues with missing atoms in the PDB file? 
If so, you would need to take extra preparation steps beforehand to fix the broken residues.

4.) Are there any residues with duplicate coordinates in the PDB file? 
If there were, you would need to decide which one to keep  before continuing.

AMBER Coordinate and Topology Files

In order to run an MD simulation, four basic files are needed. Two of them are:

1.    xxx.prmtop - describing the parameter (ff) and topology of the molecules in the system

2.    xxx.prmcrd - describing the initial molecular coordinates of the system

How to generate these files are described next.

How to generate topology and coordinate files (.prmtop and .prmcrd)

There are two options. One is a series of interactive line-by-line commands and the other is a batch mode by putting all command lines into a single file and running them all at once in background. When you get familiar, you may prefer the latter. Here for demonstration, we will go the former and show the latter afterwards.

Let us create a folder under your home directory

  mkdir testing

Enter this new directory

  cd testing

Now you do try some of other unix commands such as

  ls 

or

  pwd

For sure, you can go back to the parent directory

  cd ..

Or go back to your home directory

  cd ~

Before we start, you can copy your PDB file into the current working directory. I also provide a sample PDB file under my directory. If you want to use this PDB, type the following and you should have it in your current directory

  
  cp /home/sxy227/class2015/ebd.pdb .

To generate these two *.prmtop and *.prmcrd files, we will use a tool within Amber, called leap. There are two versions: graphic (xleap) and text-mode (tleap). It may be easier to go with tleap by typing

  tleap

Now you should see a welcome message. You can exit anytime by typing

   quit

Load the Energy Function (aka force field) related parameters that describe how atoms of this protein should interact with each other, after tleap is activated,

  tleap 

Now you can load (source) the FF14SB force field by

  source leaprc.ff14SB

Next, we are going to load the PDB into tleap and give this protein a name, ER

  ER = loadpdb ebd.pdb

Note that ebd.pdb is the file you copied from my folder, which will be used for the rest of demonstration.

We would like to soak the protein into a water box so we are getting as close as possible to reality,

  solvateBox ER TIP3PBOX 10.0

Here, TIP3PBOX is a type of water box used. The value 10.0 means that the molecule should have a layer of water molecules and the thickness of this layer is 10 Angstroms.

Now you can save the prmtop and prmcrd files to the current working directory via

  saveAmberParm ER ebd.prmtop ebd.prmcrd

You may also want to save the protein-water system in a PDB format (e.g., you can load it to vmd later and take a look later)

  savepdb ER ebd_wat.pdb

Now you quit tleap by

  quit

The ebd.prmtop and ebd.prmcrd files (as well as ebd_wat.pdb) should be in your working directory,

  ls

Above is a demonstration to show you in a step-by-step fashion. In practice, you put all these lines into a single file, say, named tleap.in containing these lines:

  source leaprc.ff14SB
  ER = loadpdb ebd.pdb
  solvateBox ER TIP3PBOX 10.0
  saveAmberParm ER ebd.prmtop ebd.prmcrd
  savepdb ER ebd_wat.pdb
  quit

You can also copy this file from my folder by

      cp /home/sxy227/bioc430/tleap.in . 

You should be able to repeat all above calculations by running

      tleap -s -f tleap.in 

Note that tleap is part of the Amber package loaded into your account. Now you should have both *.prmtop and *.prmcrd files in your folder by

      ls

These two files are needed for next steps. It also generates a log file (*.log) for your reference.

Now, it is a good time to take a close look if you see any error message. If any, you would need to fix it before continuing.

AMBER coordinate (.prmcrd) file

It looks like this


 36498
  41.6854588  56.7349139  18.2331679  42.1312793  56.4253467  19.0849385
  42.2856754  57.4082757  17.7788286  41.3590576  55.9254492  17.7249074
  40.4834588  57.4729139  18.5941679  40.5267827  58.4925893  18.2114351
  39.2594588  56.7599139  17.9661679  39.4788534  56.9337902  16.9127296
  39.4755409  55.7209498  18.2150897  37.7684588  57.0539139  18.2361679
  37.6637358  58.0966426  18.5359191  36.9284588  56.7969139  16.9811679
...

Topology (.prmtop) file

It begins with

%VERSION  VERSION_STAMP = V0001.000  DATE = 11/09/15  13:30:17
%FLAG TITLE
%FORMAT(20a4)

%FLAG POINTERS
%FORMAT(10I8)
   36498      16   34537    1999    4596    2703    8792    6771       0       0
   65349   11085    1999    2703    6771      43      89      42      30       1
       0       0       0       0       0       0       0       1      24       0
       0
%FLAG ATOM_NAME
%FORMAT(20a4)
N   H1  H2  H3  CA  HA  CB  HB2 HB3 CG  HG  CD1 HD11HD12HD13CD2 HD21HD22HD23C
O   N   H   CA  HA  CB  HB1 HB2 HB3 C   O   N   H   CA  HA  CB  HB2 HB3 CG  HG
CD1 HD11HD12HD13CD2 HD21HD22HD23C   O   N   H   CA  HA  CB  HB2 HB3 OG  HG  C
O   N   H   CA  HA  CB  HB2 HB3 CG  HG  CD1 HD11HD12HD13CD2 HD21HD22HD23C   
...

The protein soaked in a water box

The file looks something like

ATOM      1  N   LEU     1       4.615  14.388 -18.018  1.00  0.00
ATOM      2  H1  LEU     1       5.061  14.079 -17.166  1.00  0.00
ATOM      3  H2  LEU     1       5.215  15.062 -18.472  1.00  0.00
ATOM      4  H3  LEU     1       4.289  13.579 -18.526  1.00  0.00
ATOM      5  CA  LEU     1       3.413  15.126 -17.657  1.00  0.00
ATOM      6  HA  LEU     1       3.456  16.146 -18.040  1.00  0.00
ATOM      7  CB  LEU     1       2.189  14.413 -18.285  1.00  0.00
ATOM      8  HB2 LEU     1       2.408  14.587 -19.338  1.00  0.00
ATOM      9  HB3 LEU     1       2.405  13.374 -18.036  1.00  0.00
ATOM     10  CG  LEU     1       0.698  14.707 -18.015  1.00  0.00
ATOM     11  HG  LEU     1       0.593  15.750 -17.715  1.00  0.00
ATOM     12  CD1 LEU     1      -0.142  14.450 -19.270  1.00  0.00
ATOM     13 HD11 LEU     1      -0.038  13.408 -19.570  1.00  0.00
ATOM     14 HD12 LEU     1      -1.189  14.664 -19.057  1.00  0.00
ATOM     15 HD13 LEU     1       0.203  15.096 -20.077  1.00  0.00
ATOM     16  CD2 LEU     1       0.203  13.825 -16.869  1.00  0.00
ATOM     17 HD21 LEU     1       0.783  14.037 -15.971  1.00  0.00
ATOM     18 HD22 LEU     1      -0.850  14.033 -16.679  1.00  0.00
ATOM     19 HD23 LEU     1       0.323  12.776 -17.139  1.00  0.00
ATOM     20  C   LEU     1       3.350  15.138 -16.123  1.00  0.00
ATOM     21  O   LEU     1       2.321  15.454 -15.527  1.00  0.00
ATOM     22  N   ALA     2       4.476  14.815 -15.492  1.00  0.00
ATOM     23  H   ALA     2       5.284  14.549 -16.037  1.00  0.00
...
...
...
ATOM  36493  O   WAT  11084    -32.822 -32.435 -34.462  1.00  0.00
ATOM  36494  H1  WAT  11084    -33.451 -31.967 -33.913  1.00  0.00
ATOM  36495  H2  WAT  11084    -32.708 -33.281 -34.027  1.00  0.00
TER
ATOM  36496  O   WAT  11085    -21.493 -29.257 -19.193  1.00  0.00
ATOM  36497  H1  WAT  11085    -20.701 -29.752 -19.401  1.00  0.00
ATOM  36498  H2  WAT  11085    -21.200 -28.578 -18.586  1.00  0.00
TER
END

You can transfer ebd_wat.pdb to your local laptop and use VMD to take a look. From your local Cygwin terminal,

  scp yourCaseId@hpc1.case.edu:/home/yourCaseID/testing/ebd_wat.pdb .

There are other methods for transferring as well; see HPCC. Some may find the Globus Online very easy to use

   https://sites.google.com/a/case.edu/hpc-upgraded-cluster/hpcc-news/transferfileswithaclick

Once the file is transferred to your local PC, you can use your local copy VMD to look at it. Does it have a lot of water molecules? If yes, we are ready to move on.

Preparing Amber MD input files

Lastly, input files that define the program settings for each MD run, as follows

3.   min.inp - describing the setting parameters to setup the Amber energy minimization 
4.   md.inp  - describing the setting parameters to setup the Amber MD integrator


In general, for each system, it is typical to follow a 3-step procedure by performing an energy minimization on the system, then slowly heating the system, and then carrying out production MD at the desired temperature and pressure.

   Step 1.    Minimization
   Step 1.5   Gradually heating from 0K to 300K with constant volume (NVT)
   Step 2.    Production MD with constant pressure and temperature (NPT) at 300K and 1atm 

For some simple, stable systems, I found over the years that the second step may not make a big difference. Hence, for simplicity here, we will skip the 2nd step and proceed with the 1st (mininization) and the last (MD) steps.

The integrator

The Amber MD integrator is called sander, a numerical solver for the Newton Equation. To ask sander to perform MD simulations, we have to give specific commands in a file named md.in (or any other name you would prefer). Ideally, we could start MD right away. However, when we prepare the initial PDB files, some atoms (such as Hydrogen) may be slightly misplaced in the 3D space, which can cause the instability of MD simulations. For this reason, we often start with some simple energy minimization (at a minimal level). In other words, two input files are needed here: one for minimization (min.in) and the other for MD (md.in), both of which can be triggered via the sander interface.

You can find the usage by

  sander -h

Input file for minimization

Let us first create the file min.in that includes the following settings for minimization.

# minimization
 &cntrl
  imin   = 1,
  ntx    = 1,
  irest  = 0,
  maxcyc = 1000,
  ncyc   = 100,
  ntpr   = 100,
  ntwx   = 0,
  cut    = 12,
/

Note that maxcyc=1000 means that we will have 1000 steps of minimization. More details of their settings can be found in the manual.

The settings can be summarized as follows:

imin=1 	        choose a minimization run
ntx=1   	read coordinates but not velocities from ASCII formatted inpcrd coordinate file
irest=0         do not restart simulation. (not applicable to minimization)
maxcyc=1000     maximum cycles of minimization
ncyc=100 	The steepest descent algorithm for the first 0-ncyc cycles, then switches the conjugate gradient algorithm for ncyc-maxcyc cycles
ntpr=100        frequency to print to the Amber mdout output file every ntpr cycles
ntwx=0          no Amber mdcrd trajectory file written (not applicable to minimization)
cut=12          nonbonded cutoff distance in Angstroms (for PME, limit of the direct space sum - do NOT reduce this below 8.0. Higher numbers give slightly better accuracy but at vastly increased computational cost.)

You can copy this file from my folder

          cp /home/sxy227/bioc430/min.in .

Input file for MD

The production time of this simulation is only 10ps. Ideally, we would run this simulation for much longer, but in the interest of time for this on-the-fly tutorial, we have limited the final production simulation time.

Below is the example for md.in

# eq-MD with NVT
 &cntrl
  imin=0,
  ntx=1,
  irest=0,
  nstlim=5000,
  dt=0.002,
  ntf=2,
  ntc=2,
  temp0=300.0,
  ntpr=500,
  ntwx=500,
  cut=12.0,
  ntb=2,
  ntp=0,
  ntt=3,
  gamma_ln=2.0,
  ig=-1,
/


The settings for production can be summarized as follows:

imin=0        Start MD, not minimization
ntx=1         Read coordinates (but no velocities) from the coordinate file
irest=0	      NO restart from a previous MD run
nstlim=5000   A total of 5000 MD steps, resulting 5000*0.002 ps = 10 ps
dt=0.002      the timestep size is 0.002 ps  or 2 fs
temp0=300.0   Thermostat temperature. Run at 300K
ntwx=500      write the output to an Amber trajectory file (ebd.mdcrd) every 500 steps or every 1ps
ntb=2         Use periodic boundary conditions with constant pressure
ntp=1         md with isotropic position scaling for constant pressure (NPT)
ntt=3         Use Langevin dynamics with the collision frequency

You can copy this file from my folder as well

         cp /home/sxy227/bioc430/md.in .

The value of nstlim is the number is MD steps. Here, it was set at 5000, which means we will run MD for a total of 5000*0.002ps = 10ps. It is quite short by today's standard. For your own protein system, you run longer, say 100ps. In other words, you can change the values of both nstlim and ntwx for for your own need.

To speed up the calculations, I used a much smaller cut value reduced from 12 to 8 angstroms. This is just for demonstration only; For realistic applications, a cut value of 12 is recommended.

Submitting your computing job

There are more details about job submission at HPCC. Below is a simple working example, which will include both the energy minimization and the MD steps.

An example is given below. It begins with a so-called submitting script file, say, submit.csh with the following content.

#!/bin/csh
#PBS -N test
#PBS -l nodes=1:ppn=12
#PBS -l walltime=10:00:00
#PBS -l mem=4gb
#PBS -j oe

module load amber
set p = ebd

# set $np to have the current number of processors
set PBS_NODEFILE=`generate_pbs_nodefile`
setenv np `wc -l $PBS_NODEFILE | awk '{print $1}'`
#
echo '========================================'

# change to the working directory
cd $SLURM_SUBMIT_DIR

# start the 1st-step: energy minimization
mpirun -n $np sander.MPI -i min.in -o ${p}.min.out -p ${p}.prmtop -c ${p}.prmcrd -r ${p}.min.rst

# start the 2nd-step: MD
mpirun -n $np sander.MPI -i md.in -o ${p}.md.out -p ${p}.prmtop -c ${p}.min.rst -r ${p}.md.rst -x ${p}.mdcrd

The beginning region of this submit.csh file is to request a total of 12 processors from 1 node.

You can copy this file from my folder

         cp /home/sxy227/bioc430/submit.csh .

Now, we are READY to ask the cluster to do the calculations via

         qsub submit.csh

By now, the request has been made to the cluster. The computing may start right away or have to wait in the line for a while depending on the availability.

You can check the status of your just-submitted computing job via

   qstat -u yourCaseID

If your job starts, the status column S is marked as R meaning it is Running.

Just in case you want to cancel it, you may do so by

   qdel yourjobid

Here, yourjobid can be found out by running the previous qstat command. For example,

   qdel 4161073

Note that a different job-submitting method is being implemented as well. See more at

 https://sites.google.com/a/case.edu/hpc-upgraded-cluster/slurm-cluster-commands

Waiting for job to finish

Once the job is submitted, you can log out and take a break. Even if you log out, the job is running (or waiting to start) at the cluster. Nice!

Once the computing starts, it takes about tens of minutes or so to finish.

Specific questions about cluster operations should be sent to hpc-support@case.edu, but do expect to wait for some time to get response since the cluster is a mixed use of teaching and research.

Output MD files

Once finished, you can log in back and check all the flies generated via

 ls

There should be several new files, including

ebd.min.out 
ebd.md.out

These two files detail most of the information about the simulation itself you need to know.

Another key file is
ebd.mdcrd

It is the so-called MD trajectory file containing the xyz coordinates of each atom at each saved timepoint during the simulations.

Now you can transfer this MD trajectory file to your own computer, e.g., by scp. There are other methods for transferring as well; see HPCC as mentioned previously.

 ________________ 
< Are you ready? >
 ---------------- 
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

Alternative job submission via sbatch

For those who got error using the qsub command (especially who did not have an account on hpcc before), the cluster we are using is currently switching to a different job submission mechanism, called slurm (see details at https://sites.google.com/a/case.edu/hpc-upgraded-cluster/slurm-cluster-commands).

In that case, you can copy this file from my folder

         cp /home/sxy227/bioc430/submit2.csh .

The content of this file can be viewed by

  cat submit2.csh

It contains the following commands,

#!/bin/csh
#SBATCH -N 1      # request 1 nodes
#SBATCH -n 8     # 8 MPI processes per node
#SBATCH --time=10:00:00
#SBATCH --mem=4G     # 4 GB RAM per node
#SBATCH --output=mpi_job_slurm.log

module load amber
module load openmpi

set p = ebd

set NPROCS=$SLURM_NTASKS # Assign the number of processors
cd $SLURM_SUBMIT_DIR  # change to the working directory

# start the 1st-step: energy minimization
mpirun -n $NPROCS sander.MPI -O -i min.in -o  ${p}.min.out -p ${p}.prmtop -c ${p}.prmcrd -r ${p}.min.rst

wait

# start the 2nd-step: MD
mpirun -n $NPROCS sander.MPI -O -i md.in -o ${p}.mdout -p ${p}.prmtop -c ${p}.min.rst -r ${p}.md.rst -x ${p}.mdcrd

wait

Once you have this file, the job can be submitted by

        sbatch submit2.csh

Again, you can check the status of this submitted job by

 qstat -u yourCASEID



Now, take a break ... see you on Friday ...

 




................................................................................................
in the mean time, you can practice your linux command and the use of your chosen text editor
................................................................................................








Completion of the MD simulation

Once finished, you can have all the production MD outputs available under your folder by

 ls

The output files include

ebd.md.out
ebd.mdcrd
...

If you want to view the output of the MD simulation, try

 more ebd.md.out

Just in case your job is not finished yet, you get these files from my folder

 cp  /home/sxy227/bioc430/ebd.mdcrd .
 cp  /home/sxy227/bioc430/ebd.prmcrd .
 cp  /home/sxy227/bioc430/ebd.prmtop .
 cp  /home/sxy227/bioc430/ebd_wat.pdb .

File transferring to your local laptop

There are several methods for transferring from and to the cluster.

 Option 1: scp --- if you know how to use it, please go ahead

The scp command works like this

  scp yourCaseID@hpc1.case.edu:/home/yourCaseID/filename .

If you are uncomfortable with scp, the use of GlobusOnline is recommended.

 Option 2: GlobusOnline --> https://www.globus.org/ --> Click log in (on the top-right corner) 
           --> Search Case Western Reserve University --> Use yourCASEID/password ---> ... 

More detailed GO setups can be found at

 https://sites.google.com/a/case.edu/hpc-upgraded-cluster/home/important-notes-for-new-users/transferring-files

Other options of transferring can be found at

   https://sites.google.com/a/case.edu/hpc-upgraded-cluster/home/important-notes-for-new-users/helpful-references/transferring-files-hpcc


Now, use one of these options to transfer at least the following 2 files to your local laptop and make sure you have these two files on your laptop,

ebd.mdcrd
ebd.prmtop

Visualizing the result from AMBER simulation

You've now run an MD simulation. In order to visualize the results, we will now use VMD. This is a molecular graphics program that can render molecular structures in 3D. VMD not only loads Protein Database (PDB) structure files, but also MD trajectories from many programs.

VMD is installed on HPCC as well as on your laptop. It is possible to use VMD on HPCC, but it may be very slow to use since a lot of graphic information are transferred from HPCC to your laptop. In addition, if there are a lot of users using VMD at the same time (via the frontend), it may crash the login node. For this reason, it is recommended to use the local VMD copy on your laptop.

Loading into VMD

So far, we should have two files on your own computers: ebd.prmtop and ebd.mdcrd. They are good to go for VMD visualization. First, we can load the PDB file as demonstrated previously.

After VMD is started on your laptop,

  VMD Main -->  File --> New Molecule 

You should see the "Molecule File Brower" window,

   Filename --> Browse to "ebd.prmtop"

Then "AMBER& Parm" should be the default for file type. If not, set the file type to AMBER7 Parm.

   Click Load.

Now, you should see the Main window.

    Highlighting the file "ebd.prmtop" by clicking

Then in the Main window, we will load the MD trajectory by

     File --> "Load Data Into Molecule" --> Browse -> Select ebd.mdcrd 
          --> File type: AMBER Coordinates with Periodic Box --> Load

Now you should be able to see the whole MD trajectory and control playback, by moving the arrows on the bottom (of the Main window), back and forth.

You should be able to see the estrogen receptor as well as the many water molecules in the display. You can rotate, zoom and pan the molecules around the display with the mouse.

Taking a look at the short molecular motion occurred in computer

Now, there are many different visualization options, as we mentioned last week. For example from the Main window,

       Graphics --> Representations --> Create Rep 
                --> Selected Atom: all not water. --> Coloring method: Secondary structure 
                --> Drawing method: New Cartoon

You can double-click each Rep to show or not show the current representation

You may remember how to change the background color, etc at

  http://theyanglab.org/tutorial/index.php?title=Main_Page#Beatifying_the_protein_structure

or

  http://theyanglab.org/tutorial/index.php?title=Tutorial:Main_Page&action=edit&section=13

You can drag the arrows around to control playback.

  ...   
  Play with it to get a sense of molecular motion we simulated by computer
  ...

Make a movie out of simulation

VMD has a lot of functions that can be used to analyze and study a MD trajectory. For example it is possible to align molecules, measure root mean squared deviations (RMSD), save structures from a trajectory, and measure physical system parameters throughout a trajectory. It's also possible to render a movie of a trajectory.


A lot of things we can do with VMD visualization

  • Take a look to get a feel as you just did -- this is often one of the best ways to look at your data directly
  • Get a sense about overall flexibility (more quantitative analyses will come later)
  • Generate a molecular movie, e.g.,

Here, we will make a simple movie from "ebd.mdcrd", via a two-step procedure

First, from the Main window,

     1.  Extensions -> Analysis -> RMSD Visualizer Tool -> ALIGN. 

Now, all the frames are aligned, using the default 1st frame (Frame # 0) as a reference. In principle, you can pick any frame as you like to.

Second from the Main window,

      2. Extensions -> Visualization -> Movie Maker --> Movie Settings --> Trajectory Rock
                    -> [feel free to change working dir]--> [feel free to give a name ] --> Make Movie

Now, wait for a couple of minutes and you should have a movie file in your chosen folder. Open the file to watch ... you are welcome!



... take a short break ...


Simple post-simulation data analysis

It is getting more advanced here and we start to take a more serious look at our simulation data.

Several specific examples are demonstrated here to give you a feel of how it works.

A simple RMSD calculation

A root-mean-square deviation (RMSD) calculation is a standard practice. The RMSD value is a measurement of how similar a structure's internal atomic coordinates are relative to some reference molecule coordinates. For this example, we will measure how the internal atomic coordinates change relative to, say, the starting structure (ebd.pdb).

We have at least 3 different options to do the calculation. Here, you can pick one of them and give a try (the 1st one may be the easiest)

Option 1: Using the VMD graphic interface

From the VMD main window,

   Option 1. Extensions -> Analysis -> RMSD Trajectory Tool -> ALIGN -> RMSD -> Plot result. 

Note that ALIGN is required; otherwise, you may not get what you wanted.


Option 2: Using a text/script mode of VMD

Create a file called vmd.rmsd.tcl with:

mol load parm7 ebd.prmtop crdbox ebd.mdcrd
set outfile [open rmsd.dat w]
set nf [molinfo top get numframes]
set frame0 [atomselect top "protein and backbone and noh" frame 0]
set sel [atomselect top "protein and backbone and noh"]
# rmsd calculation loop
for { set i 1 } { $i <= $nf } { incr i } {
$sel frame $i
$sel move [measure fit $sel $frame0]
puts $outfile "[measure rmsd $sel $frame0]"
}
close $outfile
exit

Or you can copy from my folder

        cp /home/sxy227/bioc430/vmd.rmsd.tcl .


Once copied, you can run this file by

        vmd  -dispdev text -e vmd.rmsd.tcl > vmd.rmsd.log

The output file is rmsd.dat containing the calculated RMSD values for all frames by

        cat rmsd.dat

Option 3: Using cpptraj from Amber

Create a file called rmsd.cpptraj, including

trajin ebd.mdcrd
autoimage
rmsd  :1-245@CA  first  out rms.dat

You can copy from my folder as well

  cp /home/sxy227/bioc430/rmsd.cpptraj .

Type the following to start the calculation

$AMBERHOME/bin/cpptraj -p ebd.prmtop -i rmsd.cpptraj

The result is saved to the output file rms.dat

Flexibility of each residue via RMSF

We are going to use another tool within AMBER, called ptraj or cpptraj. It is very useful and quite easy to use. Here, an example is given to show how to access the structural flexibility from MD trajectories for each amino acid. We will compute the atomic positional fluctuations for all the atoms. Here, the output is performed only for Ca atoms in mask. This positional fluctuation of RMSF is related to the so-called B-factors, which means multiplying the squared fluctuations by (8/3)pi**2.

Make a file called rmsf.cpptraj containing

trajin ebd.mdcrd
atomicfluct out ebd_RMSF.dat :1-245@CA byres
quit

You can copy from my folder as well

  cp /home/sxy227/bioc430/rmsf.cpptraj .

Type the following to start the calculation

$AMBERHOME/bin/cpptraj -p ebd.prmtop -i rmsf.cpptraj

The result is saved to the output file ebd_RMSF.dat for each residue from 3 to 247, which is a two-column file (residue index vs. RMSF). You can transfer/plot this data file using your favorite plotting software.


   \
    \
        .--.
       |o_o |
       |:_/ |
      //   \ \
     (|     | )
    /'\_   _/`\
    \___)=(___/

(optional) Alternative: Running VMD in GUI Mode from the cluster

Some of you may have issues in using a local copy of VMD at your laptop. Alternatively, it is possible to use the VMD installed at the cluster.

Here is the instruction from HPCC; see detail. Generally, one needs to:

It is recommended to avoid using x11-forwarding since it doesn't support the data transfer well.

  Install the NoMachine client NoMachine client
  

Using the client, connect to hpc2

From the hpc2 session, create an interactive session on a compute node

 Launch VMD from the session on the compute node.


Start an interactive job,

  srun --x11 --pty tcsh

Load the vmd module:,

  module load vmd

Finally, launch the module by

  vmd

This is just an alternative way, which may take some time to figure it out. It will pay off in a long run if heavily used.

Advanced data analysis using cpptraj

Calculate the average structure

given the long (or short) time you simulated, it is quite straightforward to calculate the averaged coordinate from the MD trajectory by writing out a PDB file containing the averaged coordinates over all frames.

Make a file called avg.cpptraj containing

trajin ebd.mdcrd
average ebd_avg.pdb pdb :1-245

You can copy from my folder as well

  cp /home/sxy227/bioc430/avg.cpptraj .

Type the following to start the calculation

$AMBERHOME/bin/cpptraj -p ebd.prmtop -i avg.cpptraj

The output file is ebd_avg.pdb. You can take a look at it by VMD.

  Any difference between this average structure and the initial structure used for simulation?

Yes, where. Not, why not? Too short simulation?

It should be mentioned that an average is just an average of the coordinates visited during the trajectory. The system is rotating and translating so you cannot hope to learn anything from this average. It is not a physically meaningful structure (remember this when you are averaging coordinates), but it will get a sense of its preferred position.

Clustering analysis: Selection of molecular poses for protein-ligand docking

This is a slightly advanced topic. There are several ways to pick molecular structures (or poses) from simulation trajectories.

  • A simple way is to save a pose of choice from VMD trajectory (based on your prior knowledge). You can do this via the VMD main window,
go to File --> Save Coordinate -> Selection: not water 
           --> First: [frame number X] --> Last [frame number X] --> Save

where X is the frame number you want to save.

  • A more quantitative approach of clustering by using all simulated data.

The identification of such major poses can be achieved via a so-called clustering method, where similar structures are grouped into a same cluster. There are quite a few metrics that can be used for such clustering. Today, we are going to use a RMSD metric, which is essentially a measure of the similarity between two structures or poses.

You can start the clustering by making a file called cluster.cpptraj containing

trajin ebd.mdcrd
cluster clusters 5  rms @CA  repout cluster  repfmt pdb averagelinkage

The settings can be summarized as follows:

clusters 5 	Finish clustering when 5 clusters remain
rms  @CA   	Distance between frames calculated via best-fit coordinate RMSD using Ca atoms
repout cluster  Write representative frames to separate files named cluster.X.<ext>, 
                  where X is the cluster number and <ext> is a format-specific filename extension (PDB as indicated next).
repfmt pdb      Save the output file in the PDB format
averagelinkage  Use the average distance between members of two clusters.

You can copy from my folder as well (if you do not want to type in)

  cp /home/sxy227/bioc430/cluster.cpptraj .

Type the following to start the calculation

$AMBERHOME/bin/cpptraj -p ebd.prmtop -i cluster.cpptraj

As a result, since we ask for 5 clusters (or poses), there should be a list of 5 files under your working directory, e.g.,

     ls cluster.*.pdb

showing the following output files,

     cluster.c0.pdb  cluster.c1.pdb  cluster.c2.pdb  cluster.c3.pdb  cluster.c4.pdb. 

Now, you can download them into your own computer and take a look. They are in the format of PDB, so You can take a look at them one-by-one by VMD. If you want to use them for docking, you can get rid of all the water molecules for sure.

One quick thing you can do is to load all 5 poses together into VMD and take a look at their difference, in a similar way you loaded your MD trajectory. For example,

  Load all 5 PDB files into VMD, one-by-one

Now, you should see a total of 5 frames. Then from the Main window,

  Graphics --> Draw style --> Selected Atoms: all not water
  --> Coloring method: Trajetory->Timestep --> Drawing Method: NewCartoon
  --> Trajectory --> Draw Multiple Frames: 0:4 --> Hit Enter

Do you see all 5 structures overlapped on top of each other? If so, you can save the figure (needed for your takehome exam).

Any difference between/among these poses? If not, why? How long did you simulate? Perhaps too short to sample any large changes? Just some hints to make sense your results.

For docking, you can cherry-pick any pose, presumably based on available experimental knowledge. Or you can perform the docking workflow on all of them given the relatively lost cost of virtual docking. In that case, you can simply repeat what you will learn from your next docking projects, on each of these clusters or protein poses.


Principal component analysis (PCA)

PCA is a widely-used technique--not limited to MD--that can be used to transform a series of potentially coordinated observations into a set of orthogonal vectors called principal components (PCs). One way to think of PCs is that they are a means of explaining variance in the data. If the input data are Cartesian coordinates, then a PC is a means of showing variance in coordinate space. PCA is done in such a way that the first PC shows the largest variance in the data, the second PC shows the second largest and so on.

The input to PCA in this example will be the coordinate covariance matrix calculated from the time series of 3D positional coordinates, so the PCs will represent certain modes of motion undergone by the system, with the first PC representing the dominant motion.

One important thing to keep in mind is that while PCA is useful for gaining insight into the dynamics of a system, the actual motion of the system throughout the course of a simulation is almost always a combination of the individual PCs. So while motion along a single PC might show a transition, it is usually not actually how the system undergoes that transition.

See more about PCA at

 https://en.wikipedia.org/wiki/Principal_component_analysis

In short, create a file called pca.cpptraj

trajin ebd.mdcrd
rms first :1-245&!@H=
average crdset average
createcrd avgtrajectories
run
crdaction  avgtrajectories rms ref average :1-245&!@H=
crdaction  avgtrajectories matrix covar name ebd-covar :1-245&!@H=
runanalysis diagmatrix ebd-covar out ebd-evecs.dat \
    vecs 3 name myEvecs \
    nmwiz nmwizvecs 3 nmwizfile ebd.nmd nmwizmask :1-245&!@H=
##########
clear all
readdata ebd-evecs.dat name Evecs
parm ebd.prmtop
parmstrip !(:1-245&!@H=)
parmwrite out ebd-modes.prmtop 
runanalysis modes name Evecs trajout ebd-mode1.nc \
  pcmin -100 pcmax 100 tmode 1 trajoutmask :1-245&!@H= trajoutfmt netcd

Or you can copy from my folder

 cp /home/sxy227/bioc430/pca.cpptraj . 

Start the calculation by

 $AMBERHOME/bin/cpptraj -p ebd.prmtop -i pca.cpptraj

At least, 2 output files are there

  ls  ebd-mode1.nc ebd-modes.prmtop

showing

  ebd-mode1.nc  ebd-modes.prmtop

Now you can open these two files in VMD and watch the movie of the first mode of motion!

  Use VMD to load ebd-modes.prmtop and ebd-mode1.nc.

Summary and outlooks for computer-aided drug discovery (CADD)

Congratulations! You've now run your first MD simulation, make your molecular movie, and analyzed the results.

This is perhaps the simplest example of running a MD simulation that I can imagine. If you want to learn more you are encouraged to visit the AMBER website (or any other simulation website).

Looking back, MD simulation is a powerful tool to examine:


  • Protein dynamics (instead of static structure)
     -- A fundamental structure-dynamics-function relation
  • Functional large-scale motion from simulation
     -- Not available (sometime) from other techniques
  • Molecular poses for drug docking (CADD)
     -- Will be covered in part II

Take home exam (due Oct 31) by

  cp /home/sxy227/bioc430/TakeHomeExam2016.docx .   
  

Please feel free to email/stop by. (Email: sxy227@case.edu for an appointment)

Finally, let us conclude by saying

 _______________________________
< I love MD, but byebye for now >
 -------------------------------
    \                                  ___-------___
     \                             _-~~             ~~-_
      \                         _-~                    /~-_
             /^\__/^\         /~  \                   /    \
           /|  O|| O|        /      \_______________/        \
          | |___||__|      /       /                \          \
          |          \    /      /                    \          \
          |   (_______) /______/                        \_________ \
          |         / /         \                      /            \
           \         \^\\         \                  /               \     /
             \         ||           \______________/      _-_       //\__//
               \       ||------_-~~-_ ------------- \ --/~   ~\    || __/
                 ~-----||====/~     |==================|       |/~~~~~
                  (_(__/  ./     /                    \_\      \.
                         (_(___/                         \_____)_)