Using ORCA to develop charge parameters

 written by Kellon A. A. Belfon

A little bit of theory

A good partial charge model is important for the forcefield equation in AMBER to produce accurate results. There are many options one can use to generate charges for non-standard residues such as am1-bcc in antechamber[1]. The idea is to ensure the charges generated is transferable when the molecule is in different conformations or as a building block in a larger system so as to avoid refitting the charges. In this tutorial, a method known as the Restrained Electrostatic Potential (RESP) [2] is utilized to calculate the partial charges. This method has been used to fit the charges for the standard amino acids and nucleic acids in Amber and it is the most widely used charge model method in AMBER. For details on how RESP operate please read J. Phys. Chem, 1993, 97, 10269-10280. To summarize the idea behind the tutorial: Pairwise charges will be generated from gas-phase quantum mechanic (QM) calculations using the HF method and Polpe's basis set 6-31G* (HF/6-31 G*). Although this is not the most accurate level of theory in QM the error obtained in the HF/6-31 G* calculation is close to the difference between the charge distribution in solution and in gas phase [3]. The QM calculations were done in ORCA.

The steps are as follow:

(1) Obtain a structure or structures

(2) Optimize structure(s) in Orca

(3) Obtain electrostatic grid points

(4) Calcuate electrostatic potentials using Orca_vpot

(5) Fit charges uing RESP

(6) Repeat step 1-5 if you want to do a multiple conformation RESP fit

This tutorial has lots of details, some may not be applicable to you. It is meant for more advanced users because it requires difficult decisions along the way. Methionine is used only as an example, but the same procedures can be used for any molecule.

Tutorial using Methionine as an example

In this tutorial, we will be obtaining RESP charges for Methionine (alpha and beta conformations shown below) using a similar method to the original Cieplak et al RESP paper [4]

alpha conformation of Methionine beta conformation of Methionine

The Cieplak et al paper is a great documentation of how the charges for the amino acids and nucleic acids were derived for Amber. If you are doing this tutorial I implore you to read this paper first before continuing.

Although Amber already have charges for Methionine, we choose it to (1) provide a simple example of how to do RESP charge fitting (see the notes at the end for advice on complex systems) and (2) we can compare our charges to the one in Amber to attest to the reproducibility and accuracy of the method.

Step 1: Obtain structures for charge fitting

For any charge fitting protocol, the first step is to have a structure (some form of XYZ coordinates). If you are doing a multi conformation fit to get a set of charges that best approximate the change in charge distribution as the molecule change conformations during simulations, then you will need to generate a set of structures. Two questions come to mind:

(1) How many structures (conformations) should you generate for charge fitting?

The answer to this question depends on the molecule you are using to fit charges. Be mindful that the RESP program has a hardcoded limit of 99,999 electrostatic points. This hardcoded limit will restrict the number of conformations you can use. For the amino acids in Amber, only two conformations were used (Cieplak paper), except for Proline (4 was used). Those two conformations had an alpha and beta backbone where the dihedral values of phi and psi were picked based on extensive calculations and research on Alanine. The sidechain used for each backbone was picked from the McGregor et al database and were selected because (1) the sidechains were the most commonly observed for that particular backbone, (2) the two sidechains had different Chi-1 values, and (3) the two sidechains had different chi-n values, where chi-n had a heteroatom in the first or fourth positions. Having alternative or diverse conformations can help produce a more robust set of charges. For most molecules, 2 to 6 diverse conformations are good enough to give you a robust set of charges.

(2) How can you generate the structures for charge fitting?

There are different methods to generate structures. If it is an unnatural amino acid you can use two conformations with different backbones and select the sidechains based on the closest natural amino acid counterpart. Multidimensional scanning (using a QM package such as Orca or Amber's tleap) and simulated annealing (using Amber's Sander) are two techniques often used in the Simmerling lab.

In this tutorial, we will use the same two conformations of Methionine used in the Cieplak et al paper. We will use tleap to generate these two conformations. Keep in mind this is only possible because Amber already has a set of charges for Methionine. Using tleap to generate conformations only work if you are updating charges or doing new charges for a molecule that already have parameters in Amber. If you have no charges, to begin with, you cannot use this tleap step. An easy way is to use the conformation you got from your PDB and follow the steps in the tutorial to obtain a representative set of charges. Then use the methods above to generate more structures and select conformations from the generated structures to do a multi conformation fit to get more robust charges.

Use one conformation to generate charges
           |
Generate more structures using simulated annealing or multi-dimension scan
           |
Pick a diverse set of structures 
           |
Redo the charges using a multi conformation RESP fit

To get the conformations we need for charge fitting Methionine we will use the impose command in tleap. This command allows us to set dihedrals to a specific value. The bash script (make_conf.sh) used is shown below.

#!/bin/bash

#this script use Met conformation from Cieplak et al paper to get charges for MET
#a is phi b is psi and c is chi1 d is chi2 e is chi3

#alpha
a=-60
b=-40
c=-62
d=-54
e=-58
   
cat > leap.MET.phi$a.psi$b.chi.$c.$d.$e.in << EOG
source leaprc.protein.ff14SB
m=sequence {ACE MET NME}
impose m {1 2 3} {{"C" "N" "CA" "C" $a} {"N" "CA" "C" "N" $b} {"N" "CA" "CB" "CG" $c} {"CA" "CB" "CG" "SD" $d} {"CB" "CG" "SD" "CE" $e}}
saveamberparm m MET.ff14SB.MM.parm7 MET_alphabb.rst7
savepdb m MET_alphabb.pdb
quit
EOG

/opt/amber/bin/tleap -f leap.MET.phi$a.psi$b.chi.$c.$d.$e.in

#C5/beta
a=-155
b=153
c=-173
d=176
e=180

cat > leap.MET.phi$a.psi$b.chi.$c.$d.$e.in << EOG
source leaprc.protein.ff14SB
m=sequence {ACE MET NME} 
impose m {1 2 3} {{"C" "N" "CA" "C" $a} {"N" "CA" "C" "N" $b} {"N" "CA" "CB" "CG" $c} {"CA" "CB" "CG" "SD" $d} {"CB" "CG" "SD" "CE" $e}} 
saveamberparm m MET.ff14SB.MM.parm7 Met_betabb.rst7
savepdb m MET_betabb.pdb
quit
EOG

/opt/amber/bin/tleap -f leap.MET.phi$a.psi$b.chi.$c.$d.$e.in

Make a directory name 0_makeStructure and in that directory run the script below with bash.

bash make_conf.sh

This script will give us two structures. One with an alpha backbone and the other with a beta backbone. In the 0_makeStructure directory, you will have MET_alphabb.rst7, MET_alphabb.pdb, MET_betabb.rst7, MET_betabb.pdb, MET.ff14SB.MM.parm7. You will need all five files to move onto the next step in the tutorial.

Step 2: Optimizing the structure(s)

In this step, we need to optimize the structures using Quantum Mechanics (QM). In the original Cieplak et al paper QM was not used to optimize the structure. All amino acids except for alanine were optimized using ff94. In the paper, the authors noted that this was a limitation in the method and that QM optimized structure would be better for generating RESP charges. We will diverge from this method in the Cieplak et al paper and optimized the structures using Orca, a QM package. Additionally, we will use the same Cieplak method by optimizing methionine using ff14SB. To clarify at the end of this step, we will have four optimized structures

1 alpha backbone and 1 beta backbone optimized in QM
1 alpha backbone and 1 beta backbone optimized in MM

If you have a molecule obtained from the PDB database as a ligand and Amber has no charges for it then skip step 2A and go straight to step 2B since it is not needed and only used in this tutorial for comparison at the end.

Step 2A: Optimizing in MM (ONLY for comparison, Do not do for regular charge fitting)

Make a new directory name 1_MM

In this directory, you will run a short minimization of the tleap structures generated from tleap using the python script below

import sys
import os

#python2 run_min.py ../0_makeStructure/MET.ff14SB.MM.parm7 

# functions used in this script
def make_min(f1,f2,f3,f4,f5):
  min_template="""min.in
&cntrl
imin = 1, ntx = 1, maxcyc = 100000,
ntwr=100, ntpr=100,
cut = 99.0,
ntb=0, igb = 0, saltcon = 0.0,
nmropt = 1,
/
&wt TYPE='REST',istep1=0,istep2=499,value1={f1}.,value2={f1}.,
/
 &wt type='REST',istep1=500,istep2=999,value1={f2}.,value2={f2}.,
/
 &wt TYPE='REST',istep1=1000,istep2=1499,value1={f3}.,value2={f3}.,
/
 &wt TYPE='REST',istep1=1500,istep2=1999,value1={f4}.,value2={f4}.,
/
 &wt TYPE='REST',istep1=2000,value1={f5}.,
/
&wt type='END'  /
DISANG=./chi.rst
END
"""
  #f1=100;f2=500;f3=1000;f4=5000;f5=10000
  force_constant={
    "f1":f1,
    "f2":f2,
    "f3":f3,
    "f4":f4,
    "f5":f5
  }
  with open("min.in", "w") as f:
    f.write(min_template.format(**force_constant))
    f.close()

def run_min(top,crd,name):
  runmin_template="""#!/bin/bash

sander=/opt/amber/AmberTools/bin/sander
parm={top}
crd={crd}
name={name}

$sander -O -i ../min.in -p ../$parm -c $crd -ref ./$name.rst7 -o ./$name.out -x ./$name.nc -e ./$name.en -v ./$name.vel -inf ./$name.info -r ./$name.rst7

"""
  top1={
    "top":top,
    "crd":crd,
    "name":name
  }
  with open("run_min.sh", "w") as f:
    f.write(runmin_template.format(**top1))
    f.close()


# step1: make necessary directories using whatever for loop
#   for example this uses a for loop for only two structures alpha and opt backbone
#   ensure your variable for the directory is the name of the structure file.for example
#   Here I am using alphabb as the directory name and my crd file name is alphabb.rst7
# Or in other places I use struct1 as the directory name so the crd file will be struct1.rst7
top=sys.argv[1]
struct_list=['alphabb', 'betabb']
for struct in struct_list:
  os.system("mkdir %s" %(struct))
  os.chdir("./%s" %(struct))
  os.system("cp ../../0_makeStructure/%s.rst7 ./" %(struct))
  os.system("python2 ../make_chirst.py ../%s %s.rst7" %(top,struct))
# Step 2: write min.in for energy minimization with fs being the force constants at different steps 
#  which was used in the ff14SB paper
  make_min(100,500,1000,5000,10000)
# Step 3: write bash script to run minimization and run minimization
  run_min(top, "%s.rst7" %(struct), "%s" %struct)
  os.system("bash run_min.sh")
  os.chdir("../")
                                                          

Step 2B: Optimizing in QM using Orca

Here we will optimize the structure(s) in QM to get an optimized geometry for the calculation of the electrostatic potentials (ESPs). If you are doing this tutorial using a different molecule it is okay to have just one conformation in the meantime. In this example, we have two conformations of methionine generated from tleap. Most likely you will only have one conformation if you are fitting charges for a molecule such as a ligand.

Make a new directory and name it 1_QM

General information on Orca input file

Check out the Orca manual [5] for detail information. In this step, you will learn how to create a General input file in ORCA that will do an optimization.

Here is an example of a general input file for geometry optimization in Orca. Do not use this input, it is just an example.

 %MaxCore 4000                              #maximum memory used during the calculation (4000 MB)
 ! HF 6-31G(d) TightSCF TightOpt KeepDens   #QM level of theory HF/6-31 G* ,tight SCF and optimization convergence,  keep density matrix on disk
 %id "MET"                                  # this is the title of your file or compound
 
 * xyz 0 1                                  # specifies xyz coordinates, then your overall charge (0), then your multiplicity (2S + 1) 
 H   2.3873875  -4.0170092  -2.6955220
 C   1.8289424  -3.2550675  -3.2349181
 H   1.0191517  -3.7219918  -3.7930139
 H   2.4900169  -2.7327040  -3.9245439
 C   1.2316623  -2.2432307  -2.2837944
 O   0.6467249  -1.2702062  -2.7305808
 N   1.3771172  -2.4778349  -0.9788252
 H   1.8665045  -3.3136112  -0.7072394 
 C   0.8847753  -1.5929615   0.0900104
 H  -0.1882808  -1.4837993  -0.0781802
 C   1.0874754  -2.2690208   1.4557468
 H   0.7674846  -3.3108792   1.4063890
 H   2.1510421  -2.2510925   1.6983280
 C   0.3033079  -1.5781945   2.5786755
 H   0.2207970  -0.5118871   2.3672101
 H  -0.7047896  -1.9928993   2.6088296
 S  1.0745440  -1.7446966   4.2102697
 C  -0.2383535  -1.0333856   5.2368115
 H  -0.4246745  -0.0034670   4.9309774
 H  -1.1516929  -1.6174695   5.1225765
 H   0.0705784  -1.0481230   6.2821687
 C   1.4819535  -0.1736673   0.0497819
 O   0.7435138   0.8044234   0.1159758
 N   2.8138139  -0.0627033  -0.0449178
 H   3.3529678  -0.9110134  -0.0871989
 C   3.5230554   1.2136511  -0.1012502
 H   3.3165281   1.7896258   0.8030373
 H   4.5975030   1.0486424  -0.1882873
 H   3.1746114   1.7866017  -0.9634200
 *                                     #after you list all your xyz coordinates then end the section with an *

If you wanted to do a single point calculation after you finish your optimization. For example when you need to do dihedral fitting then you can add this to the end of the file

 $new_job                              #this tell orca to run a new job (single point energy calculation of your optimized structure from job1)
 %MaxCore 3000
 ! MP2 6-31G(d) TightSCF               #single point energy calculation is done at the MP2/6-31G*
 %id "MSE"
 
 * xyzfile 0 1                         #it tell orca to take the .xyz file generated from the optimization as the input file for xyz coordinates

For Peptides, we will want to keep the backbone to a specific dihedral that matched a specific backbone conformation. To do this you can add all your dihedrals that you want to constraint using the %geom Constraints command. Check out the ORCA manual [6] for some more cool tricks (e,g bond or angle constraints) you can do with your structure during optimization.

 %geom Constraints
    { D 0 1 4 6 C }   #D stands for dihedral constraint, numbers are the four atoms and C stands for Cartesian space to do the constraints
    { D 1 4 6 7 C }
    ...
    ...
    { D 10 13 16 17 C }
     end
   end

Check out the QM page for more information on adding basis set etc. Remember Orca index from zero and not 1.


Generate an input file for basic optimization

Here you need to make a few decisions

Decision 1: What level of theory and basis set you will use for geometry optimization?

In most cases, using HF 6-31G(d) for optimization is good enough. It is consistent with what was used in the Cieplak paper and what was used to generate the charges for the amino acids in Amber. Sometimes, keeping it consistent is the best method and although HF 6-31G(d) is not the best method for optimization, the Amber developers have a good reason for using it. (Explained at the beginning of the tutorial)

Decision 2: Do you need to use any constraints?

This depends on what kind of molecule you are optimizing. For example, if you optimizing an unnatural amino acid you may want to constrain the dihedral values for the backbone and sidechain, or if your molecule has a ring that you want to keep in a particular state, you may want to constrain the ring. Most likely if it is a ligand from a complex in the PDB, you will not need to add any constraints.

If you are not adding any constraints, then you can go ahead and run the following python script makeorca.py

import sys
import linecache
import re
import os

# this script uses python2
# usage python makeorca.py pdbfile molecule_name charge multiplicity 
#       python2 makeorca.py alphabb.pdb alphabb 0 1
# remember orca index start from 0 so minus 1 from pdb index

def make_orca(pdbfile, mol_name, charge, multiplicity):
    f1 = open('%s.inp' %(mol_name),'w')
    l1="! HF 6-31G(d) TightSCF TightOpt KeepDens"
    l2="%MaxCore 4000"
    l3='%base' + "\"%s\"" %(mol_name)
    l4=" "
    l5="* xyz %s %s" %(charge, multiplicity)
    f1.write('{}\n{}\n{}\n{}\n{}\n'.format(l1,l2,l3,l4,l5))
    f2 = open(pdbfile,'r').readlines()
    for line in f2:
        if re.match('TER',line):
            break
        else:
            xyz=line.split()[5:8]
            xyz='   '.join(xyz)
            atom_name=line.split()[2][0]
            atom_name=''.join(atom_name)
            f1.write(atom_name+'   '+xyz+'\n')
    f1.write("*\n")

def create_orcarun(mol_name):
  orcarun_template='''#!/bin/bash

#SBATCH -n 1
#SBATCH --partition=980
#SBATCH --job-name $dir.{mol_name} 

export PATH=$PATH:/opt/orca 

#this command run orca
/opt/orca/orca {mol_name}.inp > {mol_name}.log
'''
  molecule={
    "mol_name":mol_name
  }
  with open("run_orca.sh", "w") as f:
    f.write(orcarun_template.format(**molecule))
    f.close()
#...............................................................#
# Everything start here

# load your pdb file
pdbfile=sys.argv[1]
# load your molecule name
mol_name=sys.argv[2]
# what is your molecule charge
charge=sys.argv[3]
# what is your molecule multiplicity
multiplicity=sys.argv[4]
# make a directory based on molecule name
os.system("mkdir %s" %(mol_name))
# cd into the directory
os.chdir("./%s" %(mol_name))
# function that generate your orca input file
make_orca("../%s" %(pdbfile), mol_name, charge, multiplicity)
# function that create the run script
create_orcarun(mol_name)
# execute the run script
os.system("sbatch run_orca.sh")
#go back to main directory
os.chdir("../")
print "Your Job has been submitted"

Name the script makeorca.py. This script has two functions def make_orca(): and def create_orcarun():. The first function creates an orca input file. If you need to make any changes to your orca input file then make the changes in this function or just simply create your own orca input file. The second function creates a bash script to submit the job to our cluster using the partition 980. If you want to use another cluster then make your own input files. The rest of the python script creates directories based on your molecule name (in this case for the tutorial, alphabb, and betabb). Ensure all header lines and connect lines are removed from your pdb file. The script takes four arguments:

  • (1) a pdbfile which you get from the PDB or created in tleap or antechamber
  • (2) molecule_name, which is the name of your molecule (e.g alphabb or betabb). Keep the same molecule name throughout this tutorial.
  • (3) charge, which is the charge of your molecule (e.g 0, -1 , 1)
  • (4) multiplicity, which is the multiplicity of the molecule energy level (2S + 1 where S is the total spin angular momentum). In most cases, we have singlets so it is 1. If you have a radical, carbene, diradical then it may not be 1.

usage: python makeorca.py pdbfile molecule_name charge multiplicity

For example for if you are following this tutorial with your own PDB structure, then

 python2 makeorca.py pdbname.pdb pdbname 0 1

However, for the alpha and beta conformations of Methionine, we will need to constrain the backbone and sidechain dihedrals using the atom index. I did the liberty of making a change to function make_orca(), but if you are using another type of unnatural amino acids, you will have to define the index number for the dihedrals. In this example psi is 6 8 21 23, phi is 4 6 8 21, chi1 is 6 8 10 13, chi2 is 8 10 13 16 and chi3 is 10 13 16 17. The D stands for dihedral and C stands for constraint. If you add a new line or remove a new line then ensure you add or remove a {}\n and update the name of your line or remove the name of your line in the .format(). Additionally, if you have a case where you want to constraint all dihedrals (often done for forcefield development) you can use the option { D * * * * C } instead of having multiple lines of {D..... C}. Below are some other options in orca.


all bond lengths where N1 is involved : { B N1 * C}
all bond lengths : { B * * C}
all bond angles where N2 is the central atom: { A * N2 * C }
all bond angles : { A * * * C }
all dihedral angles with central bond N2-N3 : { D * N2 N3 * C }
all dihedral angles : { D * * * * C }

Here is the new function

def make_orca(pdbfile, mol_name, charge, multiplicity):
    f1 = open('%s.inp' %(mol_name),'w')
    l1="! HF 6-31G(d) TightSCF TightOpt KeepDens"
    l2="%MaxCore 4000"
    l3='%base' + "\"%s\"" %(mol_name)
    nl4="%geom Constraints"
    psi="      { D 6 8 21 23 C }"
    phi="      { D 4 6 8 21 C }"
    chi1="      { D 6 8 10 13 C }"
    chi2="      { D 8 10 13 16 C }"
    chi3="      { D 10 13 16 17 C }"
    nl5="      end"
    nl6="    end"
    l4=" "
    l5="* xyz %s %s" %(charge, multiplicity)
    f1.write('{}\n{}\n{}\n{}\n{}\n{}\n{}\n{}\n{}\n{}\n{}\n{}\n{}\n'.format(l1,l2,l3,nl4,psi,phi,chi1,chi2,chi3,nl5,nl6,l4,l5))
    f2 = open(pdbfile,'r').readlines()
    for line in f2:
        if re.match('TER',line):
            break
        else:
            xyz=line.split()[5:8]
            xyz='   '.join(xyz)
            atom_name=line.split()[2][0]
            atom_name=''.join(atom_name)
            f1.write(atom_name+'   '+xyz+'\n')
    f1.write("*\n")

Replace the old function make_orca() in your makeorca.py script with the new function directly above to continue the tutorial with Methionine. Copy the MET_alphabb.pdb and MET_betabb.pdb into your 1_QM directory and run the python script.

For the alpha conformation of Methionine, the command will be

 python2 makeorca.py MET_alphabb.pdb alphabbMet 0 1

For the beta conformation of Methionine, the command will be

 python2 makeorca.py MET_betabb.pdb betabbMet 0 1

This script will create two directory alphabbMet and betabbMet and make a run script that will be submitted to our Queuing system. Note if you are using a different cluster or our cluster has been updated, make changes to the create_orcarun() function as needed.

The main part of the script with load the arguments, make a directory name after your molecule name, cd into the new directory, create the orca files and submit the job to the queue. The script works with python2 version on bell. You will get a printout saying "Submitted batch job #". Wait until your job is complete and move onto step 3. You will need the .gbw, .xyz, .scf files.

Grab some lunch, or breakfast or coffee, it will take about an hour for this job to run.

  • make sure your QM calculations converged by looking for the statement below in your .log file. It will be pdbname.log in the pdbname directory, where pdbname is the name of your molecule.
                   ***********************HURRAY********************
                   ***        THE OPTIMIZATION HAS CONVERGED     ***
                   *************************************************

Step 3: Obtain electrostatic grid points

In this step, we will generate the electrostatic grid points to calculate the electrostatic potentials in step 4 to fit the charges. You will need the python script jenerate_resp_points.py written by alumni James Maier, to create the esp points. You can obtain the script on this page [7]. Copy the jenerate_resp_points.py python script and place it in your 1_QM directory. Ensure to keep the same name (jenerate_resp_points.py). In jenerate_resp_points.py, you can change the density to edit the number of points generated by the script. The maximum number of ESP points that the program RESP can utilize is 99,999. The larger the molecule or the more conformation, the smaller the density. For example, a molecule MSE has 29 atoms and if a density of 6 was used, ~5600 grid points will be generated. As a result, only 16 conformations can be used to fit the charges. In our case, we use a density of 30, since we only have two conformations of Methionine to fit.

I have generated a simple python script name make_esppoints.py, that takes two arguments. Argument one is the molecule name and argument 2 is the density.

import os
import linecache
import re
import sys 

# this script uses python3 because James wrote the generate script
# usage python make_esppoints.py molecule_name density 
#       python make_esppoints.py alphabbMet 30
#...................................................................# 
# load your molecule name
mol_name=sys.argv[1]
# load the density value to control the number of grid points
# remember the total hardcoded limit of 99,999 in RESP
density=sys.argv[2]
# make a directory based on molecule name
os.system("mkdir %s_esppoints" %(mol_name))
# cd into the directory
os.chdir("./%s_esppoints" %(mol_name))
# run james python script to generate the gridpoints. It uses the QM log file 
os.system("python3 ../jenerate_resp_points.py ../%s/%s.log %s" %(mol_name,mol_name,density))
#go back to main directory
os.chdir("../")

copy this script and name it make_esppoints.py

For the tutorial, you can run the script for the two conformations of Methionine.

python make_esppoints.py alphabbMet 30
python make_esppoints.py betabbMet 30

If you are using your own pdb, then run it as

python make_esppoints.py pdbname 30

As the script is running you will get an output “Orca” on your screen. The script will create a new directory moleculename_esppoints. In this directory, you will see four sets of files (esppoints_14, esppoints_16, esppoints_18, esppoints_20). Inside of these files, the first line has the number of grid points and then XYZ coordinates for each grid point. The numbers 14, 16, 18 and 20 corresponds to the distance away from the VDW radius the points are generated. 14 means we generated points 1.4 times the VDW radius, 16 is 1.6 times the VDW radius and so on. This allows us to determine the electrostatic potential up to 2 times the VDW radius. Additionally, there are images that are displayed in the directory that help put everything into perspective. Here are an example of the images. Hover your mouse pointer over the image.

esppoints 1.4 times the VDW radius esppoints 1.6 times the VDW radius esppoints 1.8 times the VDW radius esppoints 2.0 times the VDW radius

If you have time you can play around with the density argument to generate a different number of grid points.

Step 4: Calculate Electrostatic potential

To calculate the electrostatic potential for each grid points generated in step 3, we will need the gbw, scfp, XYZ coordinates file, and the esp points (Esppoints_14 etc). I have created a python script (make_orcavpot.py) that does most of the copying and calculations. Let us walk through the python script. As usual, first, we load important modules and then we define a couple of functions.

The first function create_orca_vpot_run(mol_name): creates a simple submission script for our cluster (Bell). Note if the cluster is different, then you will need a different submission script. This one uses the 980 partition, and uses the program orca_vpot (/opt/orca/orca_vpot) to calculate the electrostatic potentials. The orca_vpot program takes a gbw, scfp and esp points file as an input and gives an output *vpot.out. The submission script does this for all four sets of grid points.

The second function def create_esp(mol_name, num_atoms): takes the orca_vpot outputs (*vpot.out) and the XYZ coordinates file and create a single file called the .esp file. This .esp file has a specific fortran format necessary for the program orca.

Toward the end of the script, the molecule name and number of atoms are loaded as parameters. A directory is created called molecule name_esp, then the gbw, scfp and XYZ files are copied from the directory used for the geometry optimization. Finally, the functions are called, the job is submitted to cluster.


import os
import linecache
import re
import sys
import time

# this script uses python2
# usage python2 make_orcavpot.py molecule_name number_of_atoms
#       python2 make_orcavpot.py alphabbMet 29

def create_orca_vpot_run(mol_name):
  orcarunvpot_template='''#!/bin/bash

sbatch << EOF
#!/bin/bash 

#SBATCH -n 1
#SBATCH --partition=980
#SBATCH --job-name orcavpot.{mol_name}

export PATH=$PATH:/opt/orca

#this command run orca
/opt/orca/orca_vpot {mol_name}.gbw {mol_name}.scfp ../{mol_name}_esppoints/esppoints_14 esppoints_14.vpot.out
/opt/orca/orca_vpot {mol_name}.gbw {mol_name}.scfp ../{mol_name}_esppoints/esppoints_16 esppoints_16.vpot.out
/opt/orca/orca_vpot {mol_name}.gbw {mol_name}.scfp ../{mol_name}_esppoints/esppoints_18 esppoints_18.vpot.out
/opt/orca/orca_vpot {mol_name}.gbw {mol_name}.scfp ../{mol_name}_esppoints/esppoints_20 esppoints_20.vpot.out
EOF
'''
  molecule={
    "mol_name":mol_name
  }
  with open("run_orcavpot.sh", "w") as f:
    f.write(orcarunvpot_template.format(**molecule))
    f.close()

def create_esp(mol_name, num_atoms):
  createEsp_template='''#!/bin/bash

#editing the .xyz file from ang to bohr
awk '{{printf "%16s%16.7E%16.7E%16.7E\\n","", $2/0.52917721092, $3/0.52917721092, $4/0.52917721092}}' {mol_name}.xyz > {mol_name}.tmp
#delete line 1-2 in .xyz file
sed -i '1,2d' {mol_name}.tmp
for i in `seq 7 10`
do
 I=$((2*i))
 # move column 4(vpot) to column 1 so respgen can read the file
 awk '{{printf "%16.7E%16.7E%16.7E%16.7E\\n",$4, $1, $2, $3}}' esppoints_$I.vpot.out > esppoints.$I.out 
 #editing the .vpot.out, delete line 1 in vpot.out
 sed -i '1d' esppoints.$I.out
done

#combine xyz and esp to one file
cat esppoints.14.out esppoints.16.out esppoints.18.out esppoints.20.out > esppoints.tmp
cat {mol_name}.tmp esppoints.tmp > {mol_name}.esppoints.tmp
t_lines=$(cat esppoints.tmp | wc -l)
echo {num_atom} $t_lines 0 > line1.txt
awk '{{printf "   %s %s    %s\\n",$1, $2, $3}}' line1.txt > line.txt
cat line.txt {mol_name}.esppoints.tmp > {mol_name}.esp
#remove all unncessary files
rm -f blank.txt
rm -f line1.txt
rm -f line.txt
rm -f {mol_name}.esppoints.tmp
rm -f *.tmp
rm -f esppoints.tmp
echo done with .esp
'''
  molecule={
    "mol_name":mol_name,
    "num_atom":num_atom
  }
  with open("create_esp.sh", "w") as f:
    f.write(createEsp_template.format(**molecule))
    f.close()
#..........................................................#
# load your molecule name
mol_name=sys.argv[1]
# load the number of atoms in your molecule
num_atom=sys.argv[2]
# make a directory based on molecule name
os.system("mkdir %s_esp" %(mol_name))
# cd into the directory
os.chdir("./%s_esp" %(mol_name))
# copy your .gbw and .scfp file to the working directory to calculate the vpot
os.system("cp ../%s/*.gbw ./" %(mol_name))
os.system("cp ../%s/*.scfp ./" %(mol_name))
os.system("cp ../%s/*.xyz ./" %(mol_name))
# function that generate your orca input file to calculate the electrostatic potential (vpot)
create_orca_vpot_run(mol_name)
# execute the run script
os.system("bash run_orcavpot.sh")
# wait 30 secs for the command above to be done
# if you get "awk: fatal: cannot open file" error extend time.sleep to 60s
time.sleep(30)
# edit the files to get the write format for fortran
create_esp(mol_name, num_atom)
os.system("bash create_esp.sh")
#go back to main directory
os.chdir("../")

copy the python script above and name is make_orcavpot.py In this tutorial run the python script for both conformations of Methionine using the commands below Run this python script for the two conformations of methionine.

python make_orcavpot.py alphabbMet 29
python make_orcavpot.py betabbMet 29

Again, if you have your own pdb file then you need the number of atoms, and it can be run like this

python make_orcavpot.py pdbname number_of_atoms

Check your directory name molecule_name_esp for a file moleculename.esp. In our case, it will be alphabbMet.esp. This is the file we will use in the next step to do the resp fitting. The first line in the file has three numbers. The first number is the number of atoms, the second number is the number of esppoints and the third number is your charge. This line is followed by the xyz coordinates and then the esp points (X Y Z) with the electrostatic potential for each point.

Step 5: Fitting the charges with RESP

We are almost done. We have obtained the electrostatic potentials from the previous step. Now we will use this to calculate the charges. First, you will need to create a list that contains the directory for the conformations you will use to fit the charges. Name this list dirlist. For this tutorial dirlist should have

alphabbMet
betabbMet

Since we are using the same backbone as ff14SB for methionine then we need to create a qin.dat file. If you are fitting the side chain charges for an unnatural or natural amino acid then you will need to do this step as well. If you are fitting the charges for a ligand (drug molecule) then you can skip this step. The qin.dat file contains charges you want to constrain or remain the same during the resp calculations.

The format is CHARGE chargeofMolecule atom number atomtype

This is the one I use for Methionine based on the charges on the backbone

CHARGE 0.11230 1 HH31 
CHARGE -0.36620 2 CH3 
CHARGE 0.11230 3 HH32
CHARGE 0.11230 4 HH33
CHARGE 0.597300 5 C1
CHARGE -0.567900 6 O1
CHARGE -0.415700 7 N
CHARGE 0.271900 8 H
CHARGE 0.597300 22 C
CHARGE -0.567900 23 O
CHARGE -0.415700 24 N1
CHARGE 0.271900 25 H1
CHARGE -0.14900 26 CH3 
CHARGE 0.09760 27 HH31 
CHARGE 0.09760 28 HH32
CHARGE 0.09760 29 HH33

Note It is debatable whether to restrain the charges of the methyl group in ACE and NME to be the same as in Amber. This is my take on it. I think we should because it makes it easier for the central group (MET) to be neutral. Sometimes the charges on the methyl group change with the fitting, and this result in the overall charge for the central group not equal to zero. Then you will have to redistribute the extra charges on hydrogen atoms to make the central group zero. Another reason is since the simulations used the amber charges for the capping groups, then we might as well fit the central unit charges with the amber methyl charges. The charges for ACE and NME can be found in the /opt/amber/dat/leap/prep/amino12.in. You are always welcome to try both approaches and look at the difference but in this tutorial, we will restrain the methyl group to be the same as in Amber.

Note the Carbonyl Carbon in ACE has a charge of 0.5972 but in the neutral amino acids, the carbonyl carbon has a charge of 0.5973. It is a difference of 0.0001. In our case, we will use 0.5973 to restrain the carbonyl carbon for the ACE group. The ACE group will still be neutral (overall charge of 0.0001 vs -1.1102230246251565e-16)

Now we will use a python script I created to calculate the charges. The name of the script is make_charges.py

import os
import linecache
import re
import sys
import time
import numpy as np

#Usage python make_charges.py number_of_conformations
#      python2 make_charges.py 2

def create_esp(mol_name):
  # make a directory based on molecule name for charges
  os.system("mkdir %s_charge" %(mol_name))
  # cd into the directory
  os.chdir("./%s_charge" %(mol_name))
  # copy mol_name.esp to the working directory
  os.system("cp ../%s_esp/%s.esp ./" %(mol_name,mol_name))
  # copy the pdb to the working directory
  os.system("cp ../%s.pdb ./" %(mol_name))
  # convert pdb to .ac file using the antechamber on our cluster
  os.system("/opt/amber/bin/antechamber -i %s.pdb -fi pdb -o %s.ac -fo ac" %(mol_name,mol_name) )
  # make all .esp files into one
  os.system("cat %s.esp >> ../charge.esp" %(mol_name))
  #go back to main directory
  os.chdir("../")
  
def run_resp(first_mol_name, num_mol):
  # only need the first .ac file for the atom name and index
  # create a charge directory
  os.system("mkdir charge_dir")
  # cd into the directory
  os.chdir("./charge_dir")
  # copy the .ac file from the first directory on your dirlist
  os.system("cp ../%s_charge/*.ac ./" %(first_mol_name))
  # create the input files for resp
  os.system("/opt/amber/bin/respgen -i %s.ac -o step1.respin1 -f resp1 -a ../qin.dat -n %s" %(first_mol_name, num_mol))
  os.system("/opt/amber/bin/respgen -i %s.ac -o step2.respin2 -f resp2 -n %s" %(first_mol_name, num_mol))
  # run resp
  os.system("/opt/amber/bin/resp -O -i step1.respin1 -o step1.respout1 -e ../charge.esp -q QIN -t qout_stage1")
  os.system("/opt/amber/bin/resp -O -i step2.respin2 -o step2.respout2 -e ../charge.esp -q qout_stage1 -t qout_stage2")
#........................................................................................................................................#
# Everything starts here

#get the number of molecule you are fitting
num_mol=sys.argv[1]
# read the dirlist file
dirlist=np.genfromtxt("dirlist", dtype=str)
# remove charge.esp incase it is there from before
os.system("rm ./charge.esp")
# Incase the list only have one structure
if num_mol == "1":
  mol_name=dirlist
  create_esp(mol_name)
else:
  for mol_name in dirlist:
    create_esp(mol_name)

if num_mol == "1":
  first_mol_name=dirlist
  run_resp(first_mol_name, num_mol)
else:
  first_mol_name=dirlist[0] #only need the first .ac file on your list
  run_resp(first_mol_name, num_mol)

To run the script

python2 make_charges.py 2

This script creates the input files for resp and run resp with the input file. look at the resp input file to ensure it is correct. RESP fit occurs in two stages. In the first stage, all atoms are allowed to vary and then in a second stage all degenerate hydrogen atoms are constrained to have equal charge and refit.

You should take a careful look at this file to make sure you understand it. The first stage consists of:

resp run #1

&cntrl
nmol=2,
ihfree=1,
qwt=0.0005,
iqopt=2,
/

nmol=2 tells RESP that this is a multi-conformational fit over two 'molecules' (two conformations of Met). ihfree=1 tells RESP we only want the weak restraint on the heavy atoms. qwt=0.0005 Strength of the restraint in A.U. iqopt=2 tells RESP to read in initial charges from a qin file in the form of Lagrange constraints. This is so we can fix the NME and ACE and backbone cap charges.

The next section contains the number of molecules:

The first value (1.0) is a weighting factor for the multi-molecule fit. In this case, we set it to 1.0 for both since we want them weighted equally. Then we have a line specifying the total charge and the number of atoms (0 29). Each following line specifies an individual atom listed in the same order as in the QM calculation. The first column is the atomic number and the second column is N.

N can either be -99, 0 or a positive integer. If N=-99 then it tells RESP that this atom's charge should be read from the qin file and constrained to this value. A 0 means that it is free to change during the fit and a positive integer allows you to specify that two atoms have the same charge. This is used in the second stage of the fit to constrain degenerate atoms to have the same charge. E.g. if we wanted atom 11 to have the same charge as atom 10 but for them to vary as a pair then atom 10 would have a 0 and atom 11 would have a 10.

The last section is specific to doing a multi-conformational fit.

   2
   1    1    2    1
   2
   1    2    2    2
   2
   1    3    2    3
   2
   1    4    2    4
   2
   1    5    2    5
   2
   1    6    2    6
   2
   1    7    2    7
   2

... Here you have sets of 2 lines for each atom. The first line just specifies the number of molecules, so in this case, it is always 2. The second line has the format: MOLECULE ATOM MOLECULE ATOM

  1        1       2      1

and is used to specify the equivalences between the molecules. So, in this case, the second line reads 1 1 2 1. Which implies that atom 1 of molecule 1 is equivalent to atom 1 of molecule 2.

This stage you might run into problems when doing multiconformations with the file format for the .esp file and the respin files. For the respin files ensure there are two empty lines between the first section and the multiconformation section for example

1 -99
 

2
1   1  2  1

For the charge.esp file, ensure that the number of grid points end at the 10th space on the line. The file format is 5 space for the number of atoms and then 5 space for the number of grid points. This is the reason there is a limit of 99,999 grid points per conformation. for example

   2910000  

where there are 29 atoms and 10,000 grid points

once you get your charges, run the following command to get an .ac file to generate a .lib file in tleap

antechamber -i mol_name.ac -fi ac -o mol_name_resp.ac -fo ac -c rc -cf qout_stage2

This will give you an .ac file with charges The .ac file can be use to generate a .lib file in tleap

I had to manual take the charges from the qout_stage2 file and enter it into the .ac file. You can write a script to do it or manually do it if it is only one molecule

Personal