PetaFLOPS computing will have many applications in the study of the electronic structure of macromolecules, clusters, surfaces, and solids. Here, we discuss one example: the structure and properties of HIV protease, studied by means of ab initio quantum chemical methods.
The detailed electronic structure of the HIV protease molecule can be
elucidated using methods based on Hartree-Fock theory and its
extensions (post-Hartree-Fock theory) such as configuration interaction
and many-body perturbation theory. Using present-day computers,
application of these theories to such biological molecules is
prohibitively expensive at the lower levels of approximation and becomes
intractable as higher levels are used. These and related methods of
ab initio quantum chemistry utilize the complete
nonrelativistic Hamiltonian operator for a system composed of
electrons and nuclei and require the calculation of integrals
involving kinetic energy, nuclear attraction, and electron repulsion
operators [].
The Hartree-Fock-Roothaan (HFR) procedure is used in most present day
ab initio applications to polyatomic systems. It, or components
thereof, also can be used as a point of departure for post-Hartree-Fock
methods. HFR calculations use basis sets to define one-electron
orbitals in the construction of molecular orbitals (MOs) as linear
combinations of atomic orbitals (LCAOs). The basis functions are
usually Gaussian-type functions (GTFs) centered at the atomic nuclei
for a polyatomic system. For an $N$ basis function representation of
the electrons in the system, the HFR procedure results in a total of
order $N^2$ kinetic energy, order $N^2$ nuclear attraction, and order
$N^4$ electron repulsion integrals. Although the $N^2$ scale-up of the
one-electron integrals is manageable with current computing
technologies, the $N^4$ scale-up of the two-electron integrals soon
exhausts the capabilities of even the most advanced massively parallel
processing supercomputers, in spite of the fact that the computation
of such integrals (matrix elements) is ``embarrassingly parallel'' [].
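The growth in integral counts can be made concrete with a minimal sketch. It assumes real basis functions and counts only permutationally unique integrals (symmetric one-electron matrices, eightfold symmetry for the two-electron integrals); the function names are illustrative and are not taken from the source.

```python
def unique_one_electron(n_basis: int) -> int:
    """Unique elements of a symmetric N x N one-electron matrix."""
    return n_basis * (n_basis + 1) // 2


def unique_two_electron(n_basis: int) -> int:
    """Unique (ij|kl) integrals, assuming eightfold permutational symmetry."""
    pairs = unique_one_electron(n_basis)  # number of unique ij index pairs
    return pairs * (pairs + 1) // 2       # unique pairs of index pairs


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, unique_one_electron(n), unique_two_electron(n))
```

For large basis sets the two-electron count approaches one eighth of the fourth power of the basis size, which is why the two-electron step dominates long before the one-electron step becomes a concern.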
At the Hartree-Fock level, the integrals are assembled into a
Fock matrix of order $N$ (where $N$ equals the number of contracted
GTFs; refer to example below) and its eigensolutions are extracted.
This process is repeated until the solutions are invariant to
within a pre-chosen threshold (i.e., attain self-consistency).
Typically, this can require hundreds of iterations. A benchmark is an
HFR calculation on a cluster of 135 beryllium atoms represented by a
basis set of contracted GTFs []. (The larger basis of 3 $s$-type and
2 $p$-type ``primitive'' GTFs is collected, using fixed coefficients
derived from free-atom calculations, into the smaller ``contracted''
set of 2 $s$-type and 1 $p$-type GTFs for computational
efficiency [].) The basis set for the cluster contains primitive and
contracted GTFs. The integrals step, which was reduced significantly
using molecular point group symmetry, consumed 24 CRAY X-MP cpu hours.
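The size of such a basis can be estimated with a short sketch. It assumes that the shell counts quoted above are per atom, that they refer to s-type and p-type shells, and that each p-type shell contributes three functions. The totals it produces are derived under those assumptions and are not quoted from the source.

```python
N_ATOMS = 135  # beryllium atoms in the benchmark cluster


def functions_per_atom(n_s_shells: int, n_p_shells: int) -> int:
    """Basis functions on one atom: one per s shell, three per p shell."""
    return n_s_shells + 3 * n_p_shells


# Assumed per-atom shell counts: 3 s + 2 p primitive, 2 s + 1 p contracted.
n_primitive = N_ATOMS * functions_per_atom(3, 2)
n_contracted = N_ATOMS * functions_per_atom(2, 1)

if __name__ == "__main__":
    print("primitive GTFs:", n_primitive)
    print("contracted GTFs:", n_contracted)
```

Because the Fock matrix is built over the contracted functions, it is the smaller of the two totals that sets the order of the matrix to be diagonalized.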
Biological molecules such as HIV protease contain thousands of
nuclei and electrons. A modest, but not unreasonable, basis
set could contain 10,000 basis functions. Simple scaling of the
beryllium-cluster result indicates that it would require cpu years on
the approximately 1 GigaFLOPS CRAY X-MP to calculate the required
two-electron integrals. This calculation reduces to a very
tractable 15 minutes on a PetaFLOPS computer.
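The relation between the two machine speeds can be checked with a small sketch. It assumes that run time is inversely proportional to sustained floating-point rate and that the two-electron step scales as the fourth power of the basis size; the helper names are illustrative only.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def rescale_time(seconds: float, from_flops: float, to_flops: float) -> float:
    """Time for the same operation count on a machine of different speed."""
    return seconds * from_flops / to_flops


def n4_scaled_time(t_ref: float, n_ref: int, n_target: int) -> float:
    """Scale a reference timing by (n_target / n_ref)**4."""
    return t_ref * (n_target / n_ref) ** 4


# 15 minutes at 1 PetaFLOPS, expressed as years at 1 GigaFLOPS.
gigaflops_years = rescale_time(15 * 60, 1e15, 1e9) / SECONDS_PER_YEAR

if __name__ == "__main__":
    print(f"{gigaflops_years:.1f} cpu years at 1 GigaFLOPS")
    print("doubling the basis multiplies the time by", n4_scaled_time(1.0, 100, 200))
```

Under these assumptions, 15 minutes at 1 PetaFLOPS corresponds to roughly 28.5 years at 1 GigaFLOPS, and every doubling of the basis set multiplies the integral time by 16.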
To determine the equilibrium structure of HIV protease using Hartree-Fock theory would require repeating this 15-minute calculation for different molecular geometries until the total energy reaches a minimum. Since there are nearly 4,500 vibrational degrees of freedom, an unrestricted geometry search scaling roughly as the number of nuclei squared becomes prohibitive even on a PetaFLOPS computer. Fortunately, it is appropriate to focus on the active site of the molecule and restrict the atomic motions to the few in its vicinity. The problem of geometry variation then can be addressed in hours to days.
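A rough sketch of why the unrestricted search is prohibitive follows. It assumes a nonlinear molecule (3N - 6 vibrational degrees of freedom), exactly one 15-minute energy evaluation per search step, and a step count equal to the number of nuclei squared with a prefactor of one. These are illustrative assumptions, not figures from the source.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60


def vibrational_dof(n_nuclei: int) -> int:
    """Vibrational degrees of freedom of a nonlinear molecule."""
    return 3 * n_nuclei - 6


def nuclei_from_dof(dof: int) -> int:
    """Invert 3N - 6 to recover the number of nuclei."""
    return (dof + 6) // 3


n_nuclei = nuclei_from_dof(4500)     # about 1,500 nuclei
evaluations = n_nuclei ** 2          # assumed step count, prefactor of one
search_years = evaluations * 15 / MINUTES_PER_YEAR

if __name__ == "__main__":
    print(n_nuclei, "nuclei,", evaluations, "energy evaluations")
    print(f"about {search_years:.0f} years at 15 minutes per evaluation")
```

Restricting the search to a handful of atoms near the active site reduces the step count by several orders of magnitude, which is what brings the estimate down to hours or days.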
Reactions and interactions at the active site are of particular
interest in HIV protease. Because chemical bonds are formed and
cleaved, it may be necessary to use higher levels of approximation
than Hartree-Fock theory to calculate accurate results. Configuration
interaction and many-body perturbation theory approaches can be used
in such cases []. A requirement for these post-HF
methods is the so-called four-index transformation of the
basis function (contracted GTF) integrals to the MO basis, an $N^5$
process []. This step is followed in the configuration
interaction (CI) procedure, for example, by construction of the
Hamiltonian matrix composed of elements connecting excitations or
configurations involving so-called virtual MOs. In a full-CI
procedure, the order of the Hamiltonian matrix grows combinatorially
for a configuration space of $N$ MOs containing $n$ electrons. In the
case of HIV protease, the resulting number of configurations implies a
calculation that would take an inconceivable length of time. A more
feasible procedure would be to treat at the CI level only those
electrons in the vicinity of the active site. In this instance, the CI
procedure becomes tractable and the calculation may be limited by the
$N^5$ transformation step. Furthermore, the excitation levels could
be constrained to replacements of 1, 2, 3, and so on electrons from
occupied to virtual MOs (single, double, triple, etc., excitations).
The CI calculation then scales roughly as a power of the number of MOs
that increases with the highest excitation level retained. The
lowest energy eigensolutions of the Hamiltonian matrix must still be
calculated, a formidable task even for a PetaFLOPS computer, because
the operation count of even a truncated CI on a system of this size
remains very large.
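The difference between full and truncated CI can be illustrated by counting Slater determinants. This is a minimal sketch that ignores spin and point group symmetry reductions; it uses standard binomial counting, not the source's own scaling expression, and the function names are illustrative.

```python
from math import comb


def full_ci_determinants(n_orbitals: int, n_alpha: int, n_beta: int) -> int:
    """Determinants from placing alpha and beta electrons in n_orbitals MOs."""
    return comb(n_orbitals, n_alpha) * comb(n_orbitals, n_beta)


def excitation_count(n_occ_so: int, n_virt_so: int, level: int) -> int:
    """Determinants with exactly `level` electrons moved from occupied
    to virtual spin orbitals."""
    return comb(n_occ_so, level) * comb(n_virt_so, level)


if __name__ == "__main__":
    # Small example: 10 electrons in 10 MOs (10 occupied, 10 virtual spin orbitals).
    print("full CI:", full_ci_determinants(10, 5, 5))
    for level in (1, 2, 3):
        print(f"level {level}:", excitation_count(10, 10, level))
```

Even in this small example the full-CI space is far larger than the singles and doubles spaces combined, and the gap widens rapidly as orbitals and electrons are added.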
The discussion presented here clearly implies that the advent of PetaFLOPS computing will lead to an unprecedented expansion in the scope of ab initio quantum chemistry. Biological systems such as HIV protease, which are currently impossible to study at any level of theory using today's state-of-the-art computers, will become routine targets for HFR methods on a PetaFLOPS machine. Although highly accurate full-CI calculations will remain out of reach, post-Hartree-Fock methods will become feasible for such systems. Consequently, the advent of PetaFLOPS supercomputing will result in the birth of ab initio quantum biochemistry and the coming of age of ab initio quantum chemistry.