
Computational Quantum Chemistry: HIV Protease Structure

PetaFLOPS computing will have many applications in the study of the electronic structure of macromolecules, clusters, surfaces, and solids. Here we discuss one example: the structure and properties of HIV protease as determined by ab initio quantum chemical methods.

The detailed electronic structure of the HIV protease molecule can be elucidated using methods based on Hartree-Fock theory and its extensions (post-Hartree-Fock theory), such as configuration interaction and many-body perturbation theory. On present-day computers, applying these theories to such biological molecules is prohibitively expensive at the lower levels of approximation and becomes intractable as higher levels are used. These and related methods of ab initio quantum chemistry use the complete nonrelativistic Hamiltonian operator for a system composed of electrons and nuclei and require the calculation of integrals involving kinetic energy, nuclear attraction, and electron repulsion operators [].

The Hartree-Fock-Roothaan (HFR) procedure is used in most present-day ab initio applications to polyatomic systems. It, or components of it, can also serve as a point of departure for post-Hartree-Fock methods. HFR calculations use basis sets to define one-electron orbitals in the construction of molecular orbitals (MOs) as linear combinations of atomic orbitals (LCAOs). The basis functions are usually Gaussian-type functions (GTFs) centered at the atomic nuclei of the polyatomic system. For an $N$ basis function representation of the electrons in the system, the HFR procedure requires a total on the order of $N^4$ kinetic energy, nuclear attraction, and electron repulsion integrals, dominated by the two-electron (electron repulsion) integrals. Although the number of one-electron integrals is manageable with current computing technologies, the growth in the number of two-electron integrals soon exhausts the capabilities of even the most advanced massively parallel supercomputers, even though the computation of such integrals (matrix elements) is "embarrassingly parallel" [].
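The growth in integral counts described above can be illustrated with a minimal sketch. It assumes real basis functions, so that one-electron integrals are symmetric in their two indices and two-electron integrals have the usual eight-fold permutational symmetry. The function name `integral_counts` is hypothetical and is not taken from any particular quantum chemistry package.

```python
def integral_counts(n_basis):
    """Count symmetry-unique integrals for n_basis real basis functions.

    Assumption (not from the source text): real orbitals, so that
    (i|h|j) = (j|h|i) and (ij|kl) has eight-fold permutational symmetry.
    Point group symmetry and integral screening are ignored.
    """
    # Unique index pairs (i, j) with i >= j.
    pairs = n_basis * (n_basis + 1) // 2
    # One kinetic energy and one nuclear attraction integral per pair.
    one_electron = 2 * pairs
    # Unique "pairs of pairs" for the electron repulsion integrals,
    # which approaches n_basis**4 / 8 for large n_basis.
    two_electron = pairs * (pairs + 1) // 2
    return one_electron, two_electron


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        one_e, two_e = integral_counts(n)
        print(f"N={n:>6}: one-electron={one_e:.3e}, two-electron={two_e:.3e}")
```

Under these assumptions the one-electron count grows as $N^2$ while the two-electron count grows as $N^4/8$, which is why the two-electron step dominates for large basis sets.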

At the Hartree-Fock level, the integrals are assembled into a Fock matrix of order $N$ (where $N$ equals the number of contracted GTFs; refer to the example below) and its eigensolutions are extracted. This process is repeated until the solutions are invariant to within a pre-chosen threshold (i.e., attain self-consistency). Typically, this can require hundreds of iterations. A benchmark is an HFR calculation on a cluster of 135 beryllium atoms represented by a basis set of contracted GTFs []. (The larger basis of 3 s-type and 2 p-type "primitive" GTFs is collected, using fixed coefficients derived from free-atom calculations, into the smaller "contracted" set of 2 s-type and 1 p-type GTFs for computational efficiency [].) The basis set contains primitive and contracted GTFs. The integrals step, which was reduced significantly using molecular point group symmetry, consumed 24 CRAY X-MP CPU hours.
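The relationship between primitive and contracted basis sizes can be sketched as follows. This is an illustration under stated assumptions, not a reproduction of the benchmark's own figures: it assumes the two shell types in the beryllium basis are s and p, that each s-type shell contributes one function and each p-type shell contributes three (x, y, z) components, and that every atom carries the same basis. The names `COMPONENTS` and `count_functions` are hypothetical.

```python
# Number of functions contributed by one shell of each type.
# Assumption: s -> 1 function, p -> 3 components (px, py, pz).
COMPONENTS = {"s": 1, "p": 3}


def count_functions(shells_per_atom, n_atoms):
    """Total number of basis functions for n_atoms identical atoms.

    shells_per_atom maps a shell type ("s" or "p") to the number of
    shells of that type on each atom.
    """
    per_atom = sum(COMPONENTS[kind] * count
                   for kind, count in shells_per_atom.items())
    return n_atoms * per_atom


if __name__ == "__main__":
    n_atoms = 135                      # cluster size quoted in the text
    primitive = {"s": 3, "p": 2}       # assumed shell labels
    contracted = {"s": 2, "p": 1}      # assumed shell labels
    print("primitive GTFs: ", count_functions(primitive, n_atoms))
    print("contracted GTFs:", count_functions(contracted, n_atoms))
```

The point of the sketch is the design trade-off: the order of the Fock matrix is set by the contracted count, while the integral evaluation still has to process every primitive function.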

Biological molecules such as HIV protease contain approximately nuclei and electrons. A modest, but not unreasonable, basis set could contain 10,000 basis functions. Simple scaling of the benchmark result indicates that it would require CPU years on the approximately 1 GigaFLOPS CRAY X-MP to calculate the required two-electron integrals. This calculation reduces to a very tractable 15 minutes on a PetaFLOPS computer.
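The relation between the two machine speeds quoted above can be checked with a short calculation. This sketch assumes that run time scales inversely with sustained floating-point rate, which ignores memory, I/O, and parallel efficiency. It only relates the stated 15-minute PetaFLOPS figure to a 1 GigaFLOPS machine and does not attempt to reproduce the scaling from the beryllium benchmark. The function name `rescale_time` is hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def rescale_time(seconds, from_flops, to_flops):
    """Rescale a run time between machines, assuming time ~ 1 / FLOPS."""
    return seconds * from_flops / to_flops


if __name__ == "__main__":
    peta_seconds = 15 * 60  # 15 minutes on a 1e15 FLOPS machine
    giga_seconds = rescale_time(peta_seconds, from_flops=1e15, to_flops=1e9)
    print(f"{giga_seconds / SECONDS_PER_YEAR:.1f} years at 1 GigaFLOPS")
```

Under this idealized assumption, 15 minutes at 1 PetaFLOPS corresponds to about 28.5 years at 1 GigaFLOPS, a factor of one million in both speed and time.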

Determining the equilibrium structure of HIV protease using Hartree-Fock theory would require repeating this 15-minute calculation for different molecular geometries until the total energy reaches a minimum. Since there are nearly 4,500 vibrational degrees of freedom, an unrestricted geometry search, which scales roughly as the square of the number of nuclei, becomes prohibitive even on a PetaFLOPS computer. Fortunately, it is appropriate to focus on the active site of the molecule and restrict the atomic motions to the few atoms in its vicinity. The problem of geometry variation can then be addressed in hours to days.
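The cost argument for the geometry search can be made concrete with a rough estimate. The sketch below assumes a nonlinear molecule with 3n - 6 vibrational degrees of freedom, a nucleus count of 1,500 inferred from the quoted figure of nearly 4,500 degrees of freedom, one 15-minute energy evaluation per unit of the n-squared scaling (a prefactor of one, chosen purely for illustration), and a hypothetical active site of 10 moving atoms.

```python
def vibrational_dof(n_nuclei):
    """Vibrational degrees of freedom of a nonlinear molecule: 3n - 6."""
    return 3 * n_nuclei - 6


def search_time_hours(n_moving_nuclei, minutes_per_energy=15.0):
    """Rough search cost, assuming n**2 energy evaluations in total.

    The prefactor of 1 is an illustrative assumption; real optimizers
    differ in how many energy and gradient evaluations they need.
    """
    n_evaluations = n_moving_nuclei ** 2
    return n_evaluations * minutes_per_energy / 60.0


if __name__ == "__main__":
    print("vibrational DOF for 1500 nuclei:", vibrational_dof(1500))
    full = search_time_hours(1500)
    print(f"unrestricted search: {full / (24 * 365.25):.0f} years")
    site = search_time_hours(10)  # hypothetical 10-atom active site
    print(f"active-site search:  {site:.0f} hours")
```

With these assumptions the unrestricted search costs decades of machine time, while the restricted search falls in the range of hours to days, consistent with the estimate in the text.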

Reactions and interactions at the active site are of particular interest in HIV protease. Because chemical bonds are formed and cleaved, it may be necessary to use higher levels of approximation than Hartree-Fock theory to obtain accurate results. Configuration interaction and many-body perturbation theory approaches can be used in such cases []. A requirement for these post-HF methods is the so-called four-index transformation of the basis function (contracted GTF) integrals to the MO basis, an $N^5$ process []. This step is followed in the configuration interaction (CI) procedure, for example, by construction of the Hamiltonian matrix composed of elements connecting excitations or configurations involving so-called virtual MOs. In a full-CI procedure, the order of the Hamiltonian matrix grows roughly as for a configuration space of MOs containing electrons. In the case of HIV protease, this is configurations, a calculation that would take an inconceivable length of time. A more feasible procedure would be to treat at the CI level only those electrons in the vicinity of the active site. In this instance, the CI procedure becomes tractable and the calculation may be limited by the transformation step. Furthermore, the excitation levels could be constrained to replacements of electrons from occupied to virtual MOs in order of for single, double, triple, etc., excitations. The CI calculation then scales roughly as . The lowest energy eigensolutions of the Hamiltonian matrix must be calculated, a formidable task even for a PetaFLOPS computer handling a CI that scales as because is still greater than .
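The size of the CI problem and the cost of the transformation step can be estimated with a few lines of combinatorics. The sketch below uses standard counting formulas rather than the specific expressions of the original text: the number of Slater determinants in a full CI is taken as the product of binomial coefficients for alpha and beta electrons, the number of excitations at a given level is counted in a spin-orbital picture without spin or spatial symmetry restrictions, and the four-index transformation is costed as $N^5$ operations with a prefactor of one. All function names and the example sizes are hypothetical.

```python
from math import comb


def full_ci_determinants(n_orbitals, n_alpha, n_beta):
    """Slater determinants for a full CI in n_orbitals spatial MOs."""
    return comb(n_orbitals, n_alpha) * comb(n_orbitals, n_beta)


def excitation_count(n_occupied, n_virtual, level):
    """Ways to replace `level` occupied spin orbitals by virtual ones.

    Upper bound: spin and spatial symmetry restrictions are ignored.
    """
    return comb(n_occupied, level) * comb(n_virtual, level)


def four_index_seconds(n_basis, flops):
    """Idealized time for an N**5 transformation (prefactor assumed 1)."""
    return n_basis ** 5 / flops


if __name__ == "__main__":
    # Small hypothetical active space: 10 electrons in 10 spatial MOs.
    print("full CI determinants:", full_ci_determinants(10, 5, 5))
    for level, name in enumerate(("single", "double", "triple"), start=1):
        print(f"{name} excitations:", excitation_count(10, 10, level))
    hours = four_index_seconds(10_000, 1e15) / 3600
    print(f"N^5 transformation, N=10,000 at 1 PetaFLOPS: {hours:.1f} hours")
```

Even this small active space shows the pattern the text describes: truncating the excitation level keeps the configuration count polynomial, whereas the full-CI count grows combinatorially.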

The discussion presented here clearly implies that the advent of PetaFLOPS computing will lead to an unprecedented expansion in the scope of ab initio quantum chemistry. Biological systems such as HIV protease, which are impossible to study at any level of theory using today's state-of-the-art computers, will become routine targets for HFR methods on a PetaFLOPS machine. Although highly accurate full-CI calculations will remain out of reach, post-Hartree-Fock methods will become feasible. Consequently, the advent of PetaFLOPS supercomputing will result in the birth of ab initio quantum biochemistry and the coming of age of ab initio quantum chemistry.





gcf@npac.syr.edu