Inter-residue energy flow (irEF) and energy conductivity(irEC)
In 2006, we proposed the concept of inter-residue energy conductivity (irEC) to measure the strength of the residue-residue interactions (Ishikura & Yamato, Chem. Phys. Lett. 432: 533-537, (2006)). irEC represents the amount of the energy transferred from one residue to another per unit time and proved to be particularly useful for the detailed analysis of the intramolecular signaling mechanism of proteins. For instance, we reported the application of irEC to illustrate the photosignal transduction mechanism of a small water-soluble photosensory protein–photoactive yellow protein, as mentioned in the reference above. The value of the energy flow from \(j\) to \(i\) can be calculated as follows:
The energy flow between atom groups, A and B, can be calculated by the summation of the inter-atomic energy flow between the pair:
If the atom groups A and B represent residues A and B, respectively, the corresponding energy flow between A and B represents the inter-residue energy flow (irEF). Next, the energy conductivity is defined based on the Green-Kubo linear response theory as:
In practice, the time integral is replaced by the sum of the auto-correlation function of \(J\) at discrete time points. We came to know, from our experience, that the time resolution of \(J\) should be no larger than 10 fs for the accurate estimation of the irEC. The CURP program performs these calculations for the proteins and visualizes the energy conductivities using an energy exchange network (EEN).This allows the user to grasp the pattern of the inter-residue interactions at a glance. In principle, it is possible to calculate irECs based on a single NVE trajectory. However, it is highly recommended to calulate thermal averages of irECs based on multiple NVE trajectories that start from different points on the energy landscape of the target protein. If the number of the NVE trajectories are sufficiently large, the thermally averaged irECs are fair representations of the irECs of the thermally fluctuating protein system.
Note
In the previous paper(CPL ,2006), we calculated the irEC based on the atomic decomposition of the potential energy using classical force-field functions, where the way of the decomposition involves arbitrariness. In the CURP program, the irEC calculation is performed in a different manner based on the pairwise decomposition of forces (CPL 539(2012)144). Accordingly, the numerical results of the irEC may not be coincide with those obtained earlier.
We will illustrate how to perform energy flow analysis using the PDZ3 domain as an example. To calculate the irEC described above, coordinates and velocities trajectory of the system of interest are required. The sample trajectory data of the PDZ3 domain can be downloaded from the following link:
Note that each trajectory file can be larege (a few GB). Below is a summary of the MD simulation:
CURP analyzes the inter-residue interactions through the following steps:
When we are focusing on residue-residue interactions in a polypeptide chain, solvent coordinates and velocities are not needed. It is possible to save disc space by running the follwoing scripts and disregarding the solvent degrees of freedom:
# Solvent spripping from the coordinates trajetory
$CURP_HOME/bin/conv-trj -crd \
-p system.prmtop -pf amber \
-i nev.crd.nc -if netcdf --irange 1 -1 1 \
-o stripped.crd.nc -of netcdf --orange 1 -1 1 \
mask -m mask.pdb
# Solvent stripping from the velocities trajectory
$CURP_HOME/bin/conv-trj -vel \
-p system.prmtop -pf amber \
-i nev.vel.nc -if netcdf --irange 1 -1 1 \
-o stripped.vel.nc -of netcdf --orange 1 -1 1 \
mask -m mask.pdb
Suppose that the time step of the NVE simulation was 2 fs. In the first (second) example above, the mask subcommand removes the solvent degrees of freedom from the coordinates (velocities) trajectory file, nev.crd.nc (nev.vel.nc), in which coordinates are saved every 10 fs = 5 frames (2 fs = 1 frame), and a solvent-stripped trajectory is stored in stripped.crd.nc (stripped.nev.nc). The filename of the parameter/topology file is specified after the -p option.
conv-trj command options
Note that the parameter and topology file is needed to be modified according to the solvent splitting for the subsequent MD simulations of the new system.
In the Amber restart and trajectory files, the time frames of atomic velocities are shifted by \(-\Delta t/2\) from those of atomic coordinates. In the CURP program, however, the time points of the atomic coordinates must coincide with those of the atomic velocities. Therefore, the atomic velocities in the Amber trajectory file must be modified before the energy flow calculations. As explained below, the conv-trj program of the CURP package estimates the atomic velocities at the time point of \(t\) from those at the time points of \(t - \Delta t/2\), and \(t + \Delta t/2\). Accordingly, the velocity frames should be recorded every step to the trajectory file.
Note
Suppose that you started your MD simulation from time \(t_0\). Time points of the Amber coordinate and the velocity frames are, then, (\(t_0 + \Delta t, t_0 + 2\Delta t, t_0 + 3\Delta t, \cdots\)), (\(t_0 + \Delta t/2, t_0 + 3\Delta t/2, t_0 + 5\Delta t/2, \cdots\)), respectively. In addition, the restart file contains the coordinate and the velocity frames at the time points of \(t_0\) and \(t_0 - \Delta t/2\), respectively.
To modify the time points of the Amber velocities trajectory, the following script is available:
# adjust the velocity time
$CURP_HOME/bin/conv-trj -vel \
-p stripped.prmtop -pf amber \
-i stripped.vel.nc -if netcdf --irange 1 -1 1 \
-o adjusted.vel.nc -of netcdf --orange 5 -1 5 \
adjust-vel
Suppose the the NVE simulation was performed with the time step of 2 fs. In the above example, the 1st, 2nd, ... , up to the last frames are read from the stripped.vel.nc file obtained in the Solvent stripping section. For each of the frame-pairs, (4th, 5th), (9th, 10th), \(\cdots\) , a new frame is generated at the midtime point of the frame-pair. Consequently, the time points of the the original 5th, 10th, \(\cdots\), frames are shifted by \(- \Delta t/2\) altogether and output to the adjusted.vel.nc file. Note that this process is needed for the velocities trajectory obtained by the leap frog algorithm, while not needed for that obtained by the velocity Verlet algorithm.
In this tutorial, we provided the sample trajectories in which the time points of the coordinates and velocities trajectories were adjusted.
A special care is needed when you use the leap frog integrator, which is usually employed in the AMBER program, and split your trajectory into multipile files. Suppose that the time period of the i-th trajectory segment is \([T_{i-1} + \Delta t, T_{i}]\), and the atomic coordinates and the velocities in this segment are recorded at the time points of \((T_{i-1} + \Delta t, T_{i-1} + 2\Delta t, \cdots ,T_{i} - \Delta t, T_{i})\) and \((T_{i-1} + \Delta t/2, T_{i-1} + 3\Delta t/2, \cdots, T_{i} - 3\Delta t/2, T_{i} - \Delta t/2)\), respectively. If you need to consider the velocities at \(T_{i}\) for the further calculations of energy flow, you need the velocity trajectory files of both i-th and (i+1)-st segments, because the velocities at \(T_{i}\) are estimated from those at \(T_{i} - \Delta t/2\) and \(T_{i} + \Delta t/2\). Note that the velocity trajectory file of the i-th segment can be replaced with the restart file generated at the end of the i-th segment.
# Example 1: adjust velocities for the 1st half of the velocity trajectory
$CURP_HOME/bin/conv-trj -vel \
-p system.prmtop -pf amber \
-i nve1.vel.nc -if netcdf --irange 1 -1 1 \
-o stripped1.vel.nc -of netcdf --orange 1 -1 1 \
mask -m mask.pdb
$CURP_HOME/bin/convtrj -vel \
-p strip.prmtop -pf amber \
-i stripped1.vel.nc -if netcdf --irange 1 -1 1 \
-o adjusted1.vel.nc -of netcdf --orange 5 -1 5 \
adjust-vel
Example 1 shows how to adjust the time points of the velocities to those of the atomic coordinates for the first half of the trajectory after removing unnecessary part of the system. Note that strip.prmtop represents the parameter/topology file generated for mask.pdb. As a result of the adjustment, the velocities at the time points of 10f, 20fs, \(\cdots\), and 4990 fs are saved to adjusted1.vel.nc.
# Example 2: adjust velocities for the 2nd half of the velocity trajectory
$CURP_HOME/bin/conv-trj -vel \
-p system.prmtop -pf amber \
-i nve1.rst -if restart --irange 1 -1 1 \
-i nve2.vel.nc -if netcdf --irange 1 -1 1 \
-o stripped2.vel.nc -of netcdf --orange 1 -1 1 \
mask -m mask.pdb
$CURP_HOME/bin/convtrj -vel \
-p strip.prmtop -pf amber \
-i stripped2.vel.nc -if netcdf --irange 1 -1 1 \
-o adjusted2.vel.nc -of netcdf --orange 1 -1 5 \
adjust-vel
Similarly, example 2 shows the velocity adjustment for the 2nd half of the trajectory. Here we need to read the restart file, nve1.rst before reading the velocity trajectory nve2.vel.nc. The velocities at t = 4999 fs (t = 5001 fs) are saved in nve1.rst (at the 1st frame of nve2.vel.nc), and the velocities at t = 5000 fs are estimated from those at 4999 and 5001 fs. If the velocities at t = 5000 fs are not necessary for the final output file, you do not need to read the restart file in this example. Note that the final velocity file is generated with “–orange 1 -1 5” so that the first frame at t = 5000 fs is included in the output file named adjusted2.vel.nc and, thus, the velocities at 5000 fs, 5010 fs, \(\cdots\), and 9990 fs are save in the output file.
Here we explain how to calculate irEF using PDZ3 as an example, with
To start the calculations, please type in the following command:
$ $CURP_HOME/bin/curp < eflow.cfg > eflow.log
or
$ mpiexec -n 2 $CURP_HOME/bin/curp < eflow.cfg > eflow.log
for parallel calculations with OpenMPI. In this case the number of cores is 2, eflow.cfg (see below) is a configuration file for the irEF calculations and eflow.log is the log file.
Alternatively, run_eflow.sh performs the equivalent tasks.
$ cd $CURP_HOME/tutorial/pdz3-eflow/eflow+ec
$ ./run_eflow.sh
After a while, the prompt will be back and you will get the following two files:
flux_grp.nc stores the fime series of irEF in the netcdf format. To check the content of this file, type in the following command:
$ ncdump outdata/flux_grp.nc
Here we show an example of eflow.cfg:
[input]
format = amber
# first_last_interval = 1 4 1
# group_file = group.ndx
[input_amber]
target = trajectory
topology_file = ../pdz3/stripped.prmtop.gz
coordinate_format = netcdf
coordinate_file = ../pdz3/strip.crd.nc
velocity_format = netcdf
velocity_file = ../pdz3/strip.vel.nc
[curp]
potential = amber12SB
method = energy-flux
group_method = residue
flux_grain = group
# target_atoms =
# enable_inverse_pair = no
group_pair_file = gpair.ndx
remove_trans = no
remove_rotate = no
log_frequency = 2
[output]
filename = outdata/flux.nc
format = netcdf
decomp = no
output_energy = no
A detailed explanation is provided below:
The input file format.
In this section name, after input_ comes the keyword specified as the format key in the [input] section. The following keywords are used in the input_amber section.
In this section, parameters and keywords are set for the irEF calculations.
Setting the output format.
The log file looks like this.
After irEF calculations, irEC is calculated based on the linear response theory. You will need the time series of irEF stored in flux_grp.nc. Type in the following command:
$ $CURP_HOME/bin/cal-tc \
--frame-range 1 10 1 --average-shift 1 \
-a outdata/acf.nc \
-o outdata/ec.dat outdata/flux_grp.nc > ec.log
A useful script file, run_ec.sh is available for these calculations:
$ cd $CURP_HOME/tutorial/pdz3-eflow/eflow+ec
$ ./run_ec.sh
You will then obtain energy conductivity data output/ec.dat and the time-correlatioin data file, outdata/acf.nc.
In each line of the data file, ec.dat, a pair of residues and the corresponding value of irEC is written as <residue_A> <residue_B> <L_AB * RT>, where <A> = donor residue, <B> = acceptor residue, L_AB = irEC between the residues A and B, R is the gas constant, and T is the absolute temperature (= 300 K for most cases). The unit of <L_AB * RT> is measured in \((kcal/mol)^2/fs\). Note that the order of <A> and <B> makes no difference bacause the value of L_AB is evaluated for the pair (A,B) without any directionality. However, the difference between the donor and acceptor could be important in some cases. For example, the interatomic electron tunneling current has directionality and in that case the order of the donor and the acceptor is important.
Example scripts to draw the EEN are found in $CURP_HOME/tutorial/pdz3-eflow/een directory, that contains:
In this example, the working directory for drawing EEN graph is graph-een. We used the data file apo.ec.dat in this directory. If user wish to use the user’s own irEC data file, perform the following steps:
Then you will get the EEN graph output by running make.
After editing the config.mk file and specifying some parameters for the graph-een command, run make to obtain an strong.ec.pdf file, which graphically illustrates the EEN.
By running make with different options, you can obtain the EEN in different representations as follows:
To customize the graph drawing conditions, the config.mk fle should be edited. An example of the config.mk file is shown below:
$ cat ../all.ec.dat | ./renum_residue.py 1:10
In this directory, a useful make target, make sel.ec.dat, is provided to run multiple python scripts at a time.
Here we describe parameters for the $CURP_HOME/bin/graph-een command.
The selected residues will appear in the image irrespective of whether the residues are donors or acceptors.
Here we describe the format of the cluster definition file, cluster.cfg.
Example of cluster.cfg:
[Ligand binding pocket]
322 323 324 325 326 327 328 331 339 372 376 379 380
shape = box, filled
color = black
fillcolor = yellow
fontsize = 30
.
.
.
In the above example each cluster name is indicated by square brackets [ and ]. This is followed by the listing of the members of each cluster. For instance, residues 29-60 are included in the cluster TM1.
Cluster attributes may be specified as well. You can specify multiple attributes such as color (= red, yellow, ...), shape (= box, rounded,...) and so on. Basically, any Graphviz-attributes can be specified.
Examples of common attributes:
Note
Sometimes, you may group multiple nodes together. To do so:
fillcolor = white
label =
color = white
An important point is to leave label-attribute blank.
This file describes node attributes. The format of this file is similar to that of cluster definition file. The difference between the two files is that the cluster definition file describes the attributes of the cluster as a whole, while the node style definition file specifies the attributes of each individual node.
Example of the node style definition file:
[default]
1-
fillcolor = lavender
fontcolor = black
fontname = Arial Bold
# fontname = Helvetica Neue Bold
fontsize = 14
height = 0.3
width = 0.7
[alpha helix 3]
393-399
fillcolor = black
fontcolor = white
A section name is indicated by square brackets [ and ]. In contrast to the cluster names, section names do not actually appear on the image of the EEN. If a node belongs to two different sections, the attribute setting of the new section overwrites the previous one. Therefore, it is convenient to put the [default] section at the top and set the default attributes there. The line, 1-, in the default section means that the default attributes apply to all of the nodes.
Attribute examples:
The differential EEN graph illustrates the difference of the irEC between a pair of homologous proteins or different states of the same protein as an EEN representation. The working directory for this is graph-een-diff. In this example, we show how to draw the differential EEN between the apo- and the \(\alpha 3\)-truncated forms of PDZ3 domain. The respective irEC data are stored in chart-een-apo/outdata/sel.ec.dat and chart-een-a3rem/outdata/sel.ec.dat. Before drawing the differential EEN graph, set the ec_fp1 (ec_fp2) variable to chart-een-apo/outata/sel.ec.dat (chart-een-a3rem/outdata/sel.ec.dat) in the config.mk file.
The differential EEN is consisted of two parts: For the 1st (2nd) part, the values of irEC increase (decrease) from ec_fp1 to ec_fp2, and these parts are denoted as inc and dec, respectively. Note that inc (dec) would contain the node pairs that are found only one of either ec_fp1 or ec_fp2. In such cases, there are two kinds of possibilities as described below:
In Makefile in the graph-een-diff directory, you will the following variables:
dval = 0.0001
Set the virtual value of irEC if a node pair is missing in either ec_fp1 or ec_fp2 (see above). If you running make, you will get the following five files:
Sometimes, you may be interested in only on of the five files. In such a case, for instance, if you type the following command:
$ make inc
then you will get only inc.ec.pdf.