The Prediction Model

The model rests on a statistical mechanics framework: potentials of mean force are derived from the radial distributions of 40 atom types and from the main torsion angles of the amino acids. Stepwise multiple regression was then used to combine the atom and torsion angle potentials into the prediction model.

Atom Pair Potentials

Organising the potentials at the atomic level, based on radial distributions, extends the conventional amino acid potentials. It covers a wide range of local and non-local interactions and thereby improves the prediction accuracy. The potentials were extracted from a training set of 4024 non-redundant protein structures, selected from a recent PDB repository using the PISCES algorithm with a sequence identity cutoff of 50% and a resolution better than 2.5 Å.

The energy functions are derived predominantly from potentials of mean force based on the inverse Boltzmann principle, which essentially states that probability densities and energies are closely related quantities.
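
The equation itself is not reproduced in this text. Written with the symbols defined in this section, the standard form of the inverse Boltzmann relation would read as follows; the energy symbol ΔE_ij and the explicit kT prefactor are assumptions:

```latex
\Delta E_{ij}(r_d) = -kT \,\ln\!\left[\frac{g_{ij}(r_d)}{g(r_d)}\right]
```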

where g_ij(r_d) is the radial pair distribution function of a pair of atom types i, j separated by a distance r_d, and g(r_d) describes the reference state. For the mean force potentials, the distributions of all 40 heavy-atom types are collected over a radial range of 2.5-20 Å with a bin size of 0.5 Å.
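
The binning and energy calculation described here can be sketched in Python. This is a minimal illustration, not the authors' implementation: the function names, the kT value, and the pseudocount are assumptions, and the reference-state counts are simply passed in because the text does not specify how they are built.

```python
import numpy as np

KT = 0.592  # kcal/mol near 298 K; assumed energy scale, not stated in the text

# Radial bins from the text: 2.5-20 Å in 0.5 Å steps (36 edges, 35 bins).
BIN_EDGES = np.arange(2.5, 20.0 + 0.5, 0.5)


def radial_histogram(distances):
    """Count pair distances (in Å) in the 0.5 Å bins; out-of-range values are dropped."""
    counts, _ = np.histogram(distances, bins=BIN_EDGES)
    return counts.astype(float)


def pair_potential(pair_counts, reference_counts, pseudocount=1.0):
    """Inverse Boltzmann energy per bin: -kT * ln(g_ij / g_ref).

    Both histograms are first converted to probability distributions.
    The pseudocount is a hypothetical choice that keeps empty bins finite.
    """
    pair = np.asarray(pair_counts, dtype=float) + pseudocount
    ref = np.asarray(reference_counts, dtype=float) + pseudocount
    g_ij = pair / pair.sum()
    g_ref = ref / ref.sum()
    return -KT * np.log(g_ij / g_ref)
```

Bins in which a pair is observed more often than in the reference state receive negative (favourable) energies, and under-represented bins receive positive energies.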

Torsion Angle Potentials

The same dataset of 4024 non-redundant structures was used to derive the torsion angles φ and ψ, after running DSSP on the whole dataset. The minimum bin size for the torsion angles was set to 1°, with bins ranging from -180° to 180° for both angles. Before the potential was derived, every torsion angle bin was initialised with a constant so that empty bins would not produce undefined Boltzmann energy values. The bins were then normalised by a standard procedure, using a circular Gaussian function in which φ and ψ follow a bivariate normal distribution.

Here, σ is the standard deviation and A(φ,ψ) is the Gaussian apodisation function for the torsion angles φ and ψ. It tapers the torsion angle distribution around its peaks to accommodate torsion angle perturbations in the mutants.
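
The exact form of A(φ,ψ) is not reproduced in this text. One plausible reading is a periodic Gaussian smoothing of the 1° count grid, sketched below with NumPy. The function names, the pseudocount, and the value of σ are assumptions chosen for illustration only.

```python
import numpy as np

N_BINS = 360  # 1° bins covering -180° to 180°


def torsion_histogram(phi, psi, pseudocount=1.0):
    """(phi, psi) count grid with every bin initialised to a constant."""
    edges = np.linspace(-180.0, 180.0, N_BINS + 1)
    counts, _, _ = np.histogram2d(phi, psi, bins=[edges, edges])
    return counts + pseudocount


def circular_gaussian_kernel(sigma_deg):
    """Isotropic Gaussian on the periodic grid, centred on bin (0, 0)."""
    offsets = np.arange(N_BINS)
    # Shortest angular distance to bin 0, respecting the 360° wrap-around.
    d = np.minimum(offsets, N_BINS - offsets).astype(float)
    g1d = np.exp(-0.5 * (d / sigma_deg) ** 2)
    kernel = np.outer(g1d, g1d)
    return kernel / kernel.sum()


def gaussian_smooth(counts, sigma_deg=10.0):
    """Circular convolution of the grid with the Gaussian kernel via FFT."""
    kernel = circular_gaussian_kernel(sigma_deg)
    smoothed = np.fft.ifft2(np.fft.fft2(counts) * np.fft.fft2(kernel)).real
    return smoothed / smoothed.sum()  # normalise to a probability distribution
```

Because the convolution is circular, counts near φ = 180° spread into the bins near φ = -180°, which matches the periodic nature of torsion angles.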

The torsion angle counts occur with different frequencies, and the population of the angle bins differs from one amino acid to another. To compensate for this, the torsion angle bins of each of the 20 amino acids were further normalised individually for the angles φ and ψ. The normalised bins were then used to calculate the Boltzmann energy values for the mean force potentials of all amino acids:
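
The equation is not reproduced in this text. By analogy with the atom pair potentials, it would plausibly take the following form; the energy symbol ΔE and the kT prefactor are assumptions:

```latex
\Delta E(\varphi,\psi) = -kT \,\ln\!\left[\frac{g(\varphi,\psi)}{g_{\mathrm{ref}}(\varphi,\psi)}\right]
```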

Here, g(φ,ψ) and g_ref(φ,ψ) are the torsion angle distribution of a specific amino acid and the average distribution over all amino acids, respectively.
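
The per-residue normalisation and energy calculation can be sketched as follows. This is a minimal sketch under stated assumptions: the function name and the kT value are hypothetical, and the reference distribution is taken as the plain average of the individually normalised grids, which is one reading of "average distribution over all amino acids".

```python
import numpy as np

KT = 0.592  # kcal/mol near 298 K; assumed energy scale, not stated in the text


def torsion_potentials(distributions):
    """Map {amino acid: (phi, psi) grid} to Boltzmann energy grids.

    Each grid is normalised individually, and g_ref is the plain average
    of the normalised grids. Grids must be strictly positive, for example
    after initialising every bin with a constant.
    """
    names = sorted(distributions)
    stack = np.stack([distributions[n] / distributions[n].sum() for n in names])
    g_ref = stack.mean(axis=0)
    return {n: -KT * np.log(g / g_ref) for n, g in zip(names, stack)}
```

Regions of the (φ,ψ) plane that a residue populates more than the average residue come out with negative energies, and regions it avoids come out with positive energies.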