We investigate the applicability of single-precision (fp32) floating point operations within our linear-scaling, seminumerical exchange method sn-LinK and find that the vast majority of the three-center-one-electron (3c1e) integrals can be computed with reduced numerical precision with virtually no loss in overall accuracy. This leads to a near doubling in performance on central processing units (CPUs) compared to pure fp64 evaluation. Since the cost of evaluating the 3c1e integrals is less significant on graphics processing units (GPUs) than on CPUs, the performance gains from accelerating the 3c1e integrals alone are less impressive on GPUs. Therefore, we also investigate the possibility of employing only fp32 operations to evaluate the exchange matrix within the self-consistent-field (SCF) procedure, followed by an accurate one-shot evaluation of the exchange energy using mixed fp32/fp64 precision. This still provides very accurate results (1.8 µEh maximal error) while providing a sevenfold speedup on a typical “gaming” GPU (GTX 1080Ti). We also propose the use of incremental exchange builds to further reduce these errors. The proposed SCF scheme (i-sn-LinK) requires only one mixed-precision exchange matrix calculation, while all other exchange-matrix builds are performed with only fp32 operations. Compared to pure fp64 evaluation, this leads to 4–7× speedups for the whole SCF procedure without any significant deterioration of the results or the convergence behavior.

By default, most quantum chemistry programs execute the necessary floating point operations in double precision (fp64) due to its reliable accuracy (about 10⁻¹⁶ relative precision). [30] However, most “gaming grade” GPUs provide significantly less computational performance for fp64 operations than for single-precision floating point (fp32) operations (e.g., the fp32:fp64 ratio for the GTX 1080Ti is 32:1). The possibility of executing as many of the necessary computations as possible using fp32 operations therefore needs to be explored, despite the lower numerical precision of about 10⁻⁷. In addition, the possibility of twofold speedups on CPUs justifies the need for such a study even for pure CPU code.
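The precision figures quoted above (about 10⁻¹⁶ for fp64, about 10⁻⁷ for fp32) and the idea behind incremental builds can be illustrated with a small toy model. The sketch below is not sn-LinK: the "exchange build" is replaced by a plain matrix-vector product, which shares only the property that it is linear in the density. The density sequence is an invented geometric convergence, and all names (`to_fp32`, `matvec_fp32`, `run_demo`) are hypothetical. It also assumes that the single accurate build is done for the first density, which the text does not specify. fp32 arithmetic is emulated by rounding every intermediate result through the standard-library `struct` module.

```python
import random
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (fp64) to the nearest fp32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def matvec_fp64(a, p):
    """Toy stand-in for an accurate build: K = A p evaluated in fp64."""
    return [sum(a_ij * p_j for a_ij, p_j in zip(row, p)) for row in a]


def matvec_fp32(a, p):
    """Same linear map with inputs, products and partial sums rounded to fp32."""
    out = []
    for row in a:
        acc = 0.0
        for a_ij, p_j in zip(row, p):
            prod = to_fp32(to_fp32(a_ij) * to_fp32(p_j))
            acc = to_fp32(acc + prod)
        out.append(acc)
    return out


def run_demo(seed: int = 0, n: int = 50, steps: int = 8):
    """Return (max error of direct fp32 build, max error of incremental build)."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    p_star = [rng.uniform(0.0, 1.0) for _ in range(n)]
    d = [rng.uniform(-1.0, 1.0) for _ in range(n)]

    # Invented density sequence that converges geometrically towards p_star.
    densities = [
        [ps + 0.01 * 0.3**k * di for ps, di in zip(p_star, d)]
        for k in range(steps + 1)
    ]

    k_ref = matvec_fp64(a, densities[-1])     # fp64 reference for the final density
    k_direct = matvec_fp32(a, densities[-1])  # full build in fp32 only

    # Incremental scheme (assumed ordering): one accurate build for the first
    # density, then fp32 builds on the density differences, summed in fp64.
    k_inc = matvec_fp64(a, densities[0])
    for prev, cur in zip(densities, densities[1:]):
        delta = [c - q for c, q in zip(cur, prev)]
        update = matvec_fp32(a, delta)
        k_inc = [k + u for k, u in zip(k_inc, update)]

    err_direct = max(abs(x - y) for x, y in zip(k_direct, k_ref))
    err_inc = max(abs(x - y) for x, y in zip(k_inc, k_ref))
    return err_direct, err_inc


if __name__ == "__main__":
    # Spacing of representable numbers just above 1.0 in each format.
    print("fp32 epsilon:", 2.0**-23)
    print("fp64 epsilon:", 2.0**-52)
    print("max errors (direct fp32, incremental):", run_demo())
```

Because the map is linear, the build for the final density equals the first build plus the builds on all density differences. In this toy the fp32 rounding error of each update scales with the size of the difference it acts on, so the accumulated error is expected to stay below that of a single full fp32 build. Whether the same holds quantitatively for real exchange matrices depends on details that this sketch does not model.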