Learnings from Molecular Dynamic Simulation
Introduction
To understand the process of creating and running a molecular dynamic simulation using NAMD and VMD, I ran a simple simulation of a water box. The first half of the post discusses MD Simulations and their sophistication and intricacies. The second half discusses creating and simulating a water box using a high-performance computing cluster.
Computer Simulations
Computer simulations help us better analyze the properties of molecular assemblies in terms of microscopic interactions and structure of the same. This works as a supplement to traditional experiments, empowering us to learn something new that we couldnât learn any other way. Molecular dynamics (MD) and Monte Carlo are the most common simulation techniques (MC). Computer simulations help link time scales and the microscopic length with the macroscopic world in the laboratory. It provides a view of the interactions among molecules and obtains accurate predictions of the bulk properties. Molecular dynamics (MD) is a procedure where the motion of the particles of a system over a specific period is recorded by solving Newtonâs equations of motion. This is done for each particle in the system over the simulation duration. However, the MD simulation method was introduced in the 1950s by Marshall Rosenbluth and Nicholas Metropolis in the form of the Metropolis-Hastings algorithm.
The general procedure for MD Simulation is illustrated |
NAMD
NAMD was first introduced by Nelson et al., who was part of the theoretical biophysics group, in the year 1995. It was implemented and created using C++. Currently, the theoretical biophysics group and the parallel programming laboratory at the University of Illinois develop, upgrade, and maintain NAMD. NAMD is used to perform dynamic molecular simulations. For any simulation, the input files required and their description are given below.
- PDB (Protein Data Bank) - Contains information on the coordinates of atoms of the Protein
- PSF (Protein Structure File) - Contains all the information necessary to convert a list of residual names into a complete PSF structure file. Aids with the automatic designation of hydrogen atoms to the protein
- Topology file - Contains all the molecular-specific information needed to apply a particular force field to the molecular system
- Parameter file - Contains all the numerical constants necessary to evaluate energies, forces, speed, and position of the atoms of the protein
VMD
The theoretical and computational biophysics group established VMD under the supervision of Klaus Schulten. VMD has Tcl (a high-level programming language) and Python interpreters. VMD is used along with NAMD to visualize the proteins and the steps involved in the simulation. The theoretical and computational biophysics group currently maintains it. VMD recognizes various files of molecular data for visualization purposes. The list below gives more information about the same.
- Protein Data Bank files
- X-PLOR and Chemistry at Harvard Macromolecular Mechanics (CHARMM) PSF files
- X-PLOR and Chemistry at Harvard Macromolecular Mechanics (CHARMM) trajectory DCD files
- X-PLOR and Chemistry at Harvard Macromolecular Mechanics (CHARMM) trajectory DCD files
- Assisted Model Building with Energy Refinement (AMBER) trajectory and structure files
- GROningen MAchine for Chemical Simulations (GROMACS) trajectory and structure files
Molecular Dynamics Simulation
Computer simulations help us better analyze the properties of molecular assemblies in terms of microscopic interactions and structure of the same. This works as a supplement to traditional experiments, empowering us to learn something new that we couldnât learn any other way. Molecular dynamics (MD) and Monte Carlo are the most common simulation techniques (MC). Computer simulations help link time scales and the microscopic length with the macroscopic world in the laboratory. It provides a view of the interactions among molecules and obtains accurate predictions of the bulk properties. Molecular dynamics (MD) is a procedure where the motion of the particles of a system over a specific period is recorded by solving Newtonâs equations of motion. This is done for each particle in the system over the simulation duration.
Characteristics of Molecular Dynamic Simulation include the following:
- It is a deterministic technique
- It only takes into consideration the the intermolecular forces and the starting position of each particle in the system
- It can be successfully be used as a statistical mechanics method
- The data recorded about the distances and the velocities can be averaged over time and over all the molecules of the system to obtain the thermodynamic quantities of the system
- It provides atomic and molecular level information on the system
The Steps Involved in Molecular Dynamic Simulation |
The main disadvantage of Molecular Dynamic Simulation is that the result is often an approximate measure because solving the Schrodinger equation to obtain the wavefunction that accurately describes atom/particle is computationally expensive and time-consuming. However, the error obtained from simulation can be minimized by choosing the correct potential function. A wide range of potential functions makes the simulation results realistic and more precise.
The Algorithm
Newtonâs equation of motions are solved for all the system particles for all molecular dynamic simulations.
As input we need only the following:
- The starting set of coordinates of the atoms of a system.
- The set of potential functions for obtaining the forces among various atoms/particles of the system.
The five quintessential components for any Molecular Dynamic Simulation are initial condition, boundary condition, integrator/ensemble, force calculation, and property calculation. The distinct properties calculated include - dynamic structure factor and thermal conductivity, the viscosity of the system particles, internal energy, free energy, enthalpy, and many more. The various potentials include - the Lennard-Jones potential CoulombâBuckingham potential.
Lennard-Jones Potential
It is an intermolecular pair potential. Among all the intermolecular pairs, researchers have extensively examined the Lennard-Jones potential.
Lennard-Jones Potential Expression
Lennard-Jones Potential Graph
CoulombâBuckingham Potential
It is a modification of the Buckingham Potential, which was put forth by Richard Buckingham. For the application of the Buckingham potential to ionic systems, the modification was made, and hence the new potential came to be known as Coulomb-Buckingham Potential.
CoulombâBuckingham Potential Expression
CoulombâBuckingham Potential Graph
Water Box Simulation
The water box is modelled by utilizing the VMD software. We use the âAdd Solvation Boxâ command to generate the water box. The command can be accessed as follows:
VMD > Extensions > Modeling > Add Solvation Box
The command generates the PDB (Protein Data Bank) and the PSF (Protein Structure File) for the water box after taking in the coordinates of the box, for our case we provide 40nm as the length of our water box, centered at (0, 0, 0). The diagram below depicts the PDB of water box generated as displayed by VMD.
Water Cube
The first few lines of the PDB file for the water box:
CRYST1 40.000 40.000 40.000 90.00 90.00 90.00 P 1 1
ATOM 1 OH2 TIP3W 5 -16.332 -9.918 -4.096 1.00 0.00 WT1 O
ATOM 2 H1 TIP3W 5 -16.776 -9.549 -4.899 1.00 0.00 WT1 H
ATOM 3 H2 TIP3W 5 -16.908 -9.621 -3.373 1.00 0.00 WT1 H
ATOM 4 OH2 TIP3W 7 -13.967 -15.124 0.891 1.00 0.00 WT1 O
ATOM 5 H1 TIP3W 7 -13.922 -14.776 1.798 1.00 0.00 WT1 H
ATOM 6 H2 TIP3W 7 -13.408 -15.912 0.961 1.00 0.00 WT1 H
Now that the PDB and the PSF files are ready, we need to create the configuration file that NAMD uses to set parameters for the MD simulation that needs to be run. The various parameters necessary for running a simulation are - structure file, coordinate file, temperature, restartfreq, dcdfreq, xstFreq, outputEnergies, force-field parameters, cutoff, stepspercycle, timestep, langevin, langevinDamping, langevinTemp, langevinHydrogen, periodic boundary conditions, particle mesh Ewald (PME), langevinPiston, langevinPistonTarget, langevinPistonPeriod, langevinPistonDecay, langevinPistonTemp, minimize, and run.
This section explores each of these parameters in detail.
- Structure - This takes as input the relative location of the structure file, in our case the PSF file generated by VMD.
structure ../input/solvate.psf
- Coordinates - Just like the structure parameter, this takes the relative location of the PDB file, which the coordinates of the structure.
coordinate ../input/solvate.pdb
- Temperature - Sets the temperature for the MD simulation (in Kelvin).
temperature 300
- . Restart Frequency, DCD Frequency, XST Frequency, Output Energies - The frequency with which the restart files, DCD (coordinates), and the XST (velocity) files are generated. Output Energies rewrites the output files every fixed number of steps.
restartfreq 1000 # 1000 steps = 2ps dcdfreq 1000 xstFreq 1000 outputEnergies 150
- Force Field Parameters - paraTypeCharmm and parameters - The paraTypeCharmm parameter checks whether the parameters file is CHARMM type, and the parameter keyword takes the relative location of the force field parameter file.
paraTypeCharmm on parameters ../input/par_all27_prot_lipid.inp
- Cutoff - NAMD effectively removes the Van Der Waal forces of interaction by the specified cutoff value. The Fig. 2.2 illustrates the working of the Cutoff parameter.
cutoff 10
- Steps per cycle - The total number of steps taken for completion of one cycle is specified by the stepspercycle parameter.
stepspercycle 5
Illustration of the working of the Cutoff parameter
- Time Step - The time taken to complete one step, specified in femtoseconds.
timestep 2.0
- Temperature Control Parameters - The âlangevinâ parameter specifies whether the Langevin dynamics are to be utilized. âlangevinDampingâ specifies the damping coefficient for the said dynamics. âlangevinTempâ states the temperature in Kelvin for langevin dynamics. âlangevinHydrogenâ tells NAMD whether to apply Langevin dynamics to hydrogen atoms present.
langevin on langevinDamping 1 langevinTemp 300 langevinHydrogen no
- Periodic Boundary Conditions - The parameters cellBasisVector1, cellBasisVector2, and cellBasisVector3 state the length of the repeating units. This is implemented to avoid the finite size problem and to make the system an infinite one for smoother simulation. cellOrigin - takes as input the center/origin of the simulation.
cellBasisVector1 40 0 0 cellBasisVector2 0 40 0 cellBasisVector3 0 0 40 cellOrigin 0 0 0
- Pressure Control - The âlangevinPistonâ specifies whether the Langevin piston is to be used or not. âlangevinPistonTargetâ specifies the target pressure required for the calculations involved in piston method, in bar. âlangevinPistonPeriodâ specifies the duration for the piston method, in femtoseconds. âlangevinPistonDecayâ - specifies the damping time period for the method, in femtoseconds. âlangevinPistonTempâ specifies the piston temperature, in Kelvin.
useGroupPressure yes
useFlexibleCell no
useConstantArea no
LangevinPiston on
LangevinPistonTarget 1.01325
LangevinPistonPeriod 100
LangevinPistonDecay 50
LangevinPistonTemp 300.0
- PME or Particle Mesh Ewald is used along side with periodic boundary conditions. âPMEGridSpacingâ specifies the distance between the grids.
PME yes
PMEGridSpacing 2
- Minimize - Performs the process of minimization, which refers to the process of changing the spatial arrangement of the atoms to lower the energy and to prevent bad contacts from occurring as soon as the simulation starts, for specified number of steps.
minimize 5000
- Run - The Run commands starts the simulation for a specified number of steps.
In our case since we have to submit our job to a cluster we define multiple configuration files, this is done to so that if our system crashes at a certain point we can use the previous simulation output files rather than starting from scratch.
Multiple files were created as part of the simulation, overall the simulation ran for 12 nanoseconds at 310 K.
Now the batch file syntax looked as follows:
#!/bin/bash
#SBATCH -p compute
#SBATCH -N 1
#SBATCH -n 8
#SBATCH -t 120:00:00
#SBATCH --job-name="water_system"
#SBATCH -o slurm.%j.out
#SBATCH -e slurm.%j.err
#SBATCH --mail-user=email-id
#SBATCH --mail-type=ALL
#module load namd-2.14-gcc-10.2.0-blihfvg
# module load openmpi-4.0.5-gcc-10.2.0-vx4yhsi
#module load namd-2.14-aocc-2.3.0-besm5io
#module load openmpi-4.0.5-aocc-2.3.0-kyn74k2
spack load namd %aocc
#srun namd2 prod26.in>prodout26
#charmrun ++n 8 srun namd2 prod26.in > out
namd2 +setcpuaffinity +p8 eq1.in > out1
mv out1 output
namd2 +setcpuaffinity +p8 eq2.in > out2
mv out2 output
namd2 +setcpuaffinity +p8 eq3.in > out3
mv out3 output
namd2 +setcpuaffinity +p8 eq4.in > out4
mv out4 output
In here, first we write the settings that the systems needs to employ these include - number of cores, CPUs, job name (to keep track of job), name of output log file and name of output error file and finally mail (so that one knows when the job starts running and is completed). Following this, NAMD is loaded onto the cluster and finally run. Since the directory we are running on doesnât have much of storage space, we move the files obtained after a run to a different directory to ensure that the system doesnât run out of storage place.
Files and Further Analysis
The files for replicating the simulation are available here. Furthermore, I utilized the output trajectory files to obtain the mean square displacement and further the diffusion coefficient, the blog post for the same is available here.