Jesmin Jahan Tithi, PhD Student, Department of Computer Science, SBU
Fast Polarization Energy on Multicores, Clusters of Multicores and GPUs
In this talk, I will present our shared-memory, distributed-memory, and distributed-shared-memory parallel algorithms for approximating the GB-polarization energy (i.e., the polar part of the free energy of hydration) of protein molecules. The algorithm is octree-based and hierarchical: it is built on a Greengard-Rokhlin-type near-far decomposition of the data points (atoms and points sampled from the molecular surface), and it computes the polarization energy using a surface-based r^6 approximation of the atoms' Generalized Born radii. We use approximations, cache- and space-efficient data structures, and efficient load-balancing schemes to develop highly scalable, fast parallel algorithms for computing this energy. We have shown that our implementations outperform state-of-the-art GB-polarization energy implementations such as Amber-12, GBr6, Gromacs 4.5.3, NAMD 2.9, and Tinker-6.0. Using as few as 144 cores, we achieved a speedup factor of ~400 relative to Amber for molecules with as many as half a million atoms. The error in the result was less than 1% of the energy value obtained by directly evaluating the corresponding equations without any additional approximation.
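For readers unfamiliar with the underlying equations, the following is a minimal direct-evaluation sketch, roughly the kind of unapproximated O(M^2) computation that an error figure like the one above is measured against. It assumes Still's pairwise form of f_GB and the standard surface-integral r^6 formula for Born radii, discretized over sampled surface points with outward unit normals and area weights. The function names, the toy spherical "molecule", and the unit convention (Coulomb constant folded into the charges) are hypothetical and are not taken from the implementations discussed in the talk.

```python
import math


def sphere_surface(center, a, n=200):
    """Hypothetical toy surface: n near-uniform (Fibonacci) points on a sphere of
    radius a, each with an outward unit normal and an equal area weight."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        rho = math.sqrt(1.0 - z * z)
        normal = (rho * math.cos(golden * k), rho * math.sin(golden * k), z)
        pos = tuple(c + a * u for c, u in zip(center, normal))
        pts.append((pos, normal, 4.0 * math.pi * a * a / n))
    return pts


def born_radii_r6(atoms, surface):
    """Surface-based r^6 Born radii (assumed discretization):
    1/R_i^3 = (1/(4 pi)) * sum_k w_k ((s_k - x_i) . n_k) / |s_k - x_i|^6
    """
    radii = []
    for x in atoms:
        acc = 0.0
        for s, normal, w in surface:
            d = [si - xi for si, xi in zip(s, x)]
            d2 = sum(c * c for c in d)
            acc += w * sum(c * nc for c, nc in zip(d, normal)) / d2 ** 3
        radii.append((acc / (4.0 * math.pi)) ** (-1.0 / 3.0))
    return radii


def gb_polarization_energy(atoms, charges, radii, eps_solvent=80.0):
    """Direct O(M^2) sum with Still's pairwise function (assumed form):
    E_pol = -(tau/2) sum_i sum_j q_i q_j / f_GB(r_ij),  tau = 1 - 1/eps_solvent,
    f_GB  = sqrt(r_ij^2 + R_i R_j exp(-r_ij^2 / (4 R_i R_j))).
    The i == j terms give the Born self-energies -(tau/2) q_i^2 / R_i."""
    tau = 1.0 - 1.0 / eps_solvent
    total = 0.0
    for xi, qi, ri in zip(atoms, charges, radii):
        for xj, qj, rj in zip(atoms, charges, radii):
            r2 = sum((a - b) ** 2 for a, b in zip(xi, xj))
            rr = ri * rj
            total += qi * qj / math.sqrt(r2 + rr * math.exp(-r2 / (4.0 * rr)))
    return -0.5 * tau * total
```

As a sanity check, for a single unit charge at the center of a sphere of radius 2, the r^6 radius recovers the sphere radius (R ≈ 2.0) and the energy reduces to the Born self-energy -(τ/2)q²/R ≈ -0.2469 for a solvent dielectric of 80.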
Similar octree-based approximation schemes can be used to estimate other compute-intensive energy terms such as Lennard-Jones, dispersion, and H-bonding energies. Most of these algorithms run in O(M) time and use O(M) space on a serial machine, where M is the number of atoms in the molecule. On a multicore machine with p cores, they complete execution in O(M/p + log M) expected time when load-balanced with a randomized work-stealing scheduler. We have shown that the octree-based algorithms incur far fewer cache misses than traditional loop-based implementations and are therefore less prone to slowdowns from cache-miss penalties on modern machines. I will also show several examples with timing results for these algorithms.
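To make the near-far idea concrete, here is a serial sketch of a dual-tree octree traversal applied to a plain pairwise sum of q_i q_j / r_ij, used as a stand-in for the GB, Lennard-Jones, or dispersion kernels. It is a Barnes-Hut-style monopole approximation, not the authors' scheme: the separation test d > (r_u + r_v)(1 + 2/ε), the use of point centroids as expansion centers, the leaf size, and all names are assumptions chosen for illustration.

```python
import math


class Octant:
    """Octree node: indices of its points, their total charge, and a bounding
    sphere (centroid c, radius r) used for the near/far test."""

    def __init__(self, idx, pts, q, center, half, leaf_size=8):
        self.idx = idx
        self.charge = sum(q[i] for i in idx)
        self.c = tuple(sum(pts[i][d] for i in idx) / len(idx) for d in range(3))
        self.r = max(math.dist(pts[i], self.c) for i in idx)
        self.children = []
        if len(idx) > leaf_size and half > 1e-9:
            # split the cube [center - half, center + half]^3 into up to 8 octants
            buckets = {}
            for i in idx:
                key = tuple(pts[i][d] >= center[d] for d in range(3))
                buckets.setdefault(key, []).append(i)
            h = half / 2.0
            for key, sub in buckets.items():
                cc = tuple(center[d] + (h if key[d] else -h) for d in range(3))
                self.children.append(Octant(sub, pts, q, cc, h, leaf_size))


def build_octree(pts, q, leaf_size=8):
    lo = [min(p[d] for p in pts) for d in range(3)]
    hi = [max(p[d] for p in pts) for d in range(3)]
    center = tuple((lo[d] + hi[d]) / 2.0 for d in range(3))
    half = max(hi[d] - lo[d] for d in range(3)) / 2.0 + 1e-12
    return Octant(list(range(len(pts))), pts, q, center, half, leaf_size)


def pair_sum(u, v, pts, q, eps):
    """Approximate sum of q_i q_j / r_ij over ordered pairs i in u, j in v, i != j."""
    d = math.dist(u.c, v.c)
    if u is not v and d > (u.r + v.r) * (1.0 + 2.0 / eps):
        # far field: each octant acts as one pseudo-atom at its centroid
        return u.charge * v.charge / d
    if not u.children and not v.children:
        # near field: exact pairwise interactions between two leaves
        return sum(q[i] * q[j] / math.dist(pts[i], pts[j])
                   for i in u.idx for j in v.idx if i != j)
    # otherwise refine the larger node; these recursive calls are independent
    if u.children and (not v.children or u.r >= v.r):
        return sum(pair_sum(c, v, pts, q, eps) for c in u.children)
    return sum(pair_sum(u, c, pts, q, eps) for c in v.children)


def octree_energy(pts, q, eps=0.5, leaf_size=8):
    root = build_octree(pts, q, leaf_size)
    # the traversal counts every unordered pair twice, so halve it
    return 0.5 * pair_sum(root, root, pts, q, eps)


def direct_energy(pts, q):
    n = len(pts)
    return sum(q[i] * q[j] / math.dist(pts[i], pts[j])
               for i in range(n) for j in range(i + 1, n))
```

Well-separated octant pairs collapse to a single interaction and only nearby leaves are summed exactly; shrinking ε tightens the test and trades speed for accuracy, and a vanishingly small ε reproduces the direct sum. Because the recursive calls on children are independent, they would be natural tasks to spawn under a randomized work-stealing scheduler such as Cilk's in a multicore version; this sketch does not attempt that.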
Finally, I will talk about some very recent results (still under review) on GPU variants of these algorithms, implemented by Prof. Chandrajit Bajaj's group at UT Austin in collaboration with Prof. Rezaul Chowdhury of Stony Brook University.