Tell Machine Learning Potentials What They Are Needed For: Simulation-Oriented Training Exemplified for Glycine

Fuchun Ge; Ran Wang; Chen Qu; Peikun Zheng; Apurba Nandi; Riccardo Conte; Paul L Houston; Joel M Bowman; Pavlo O Dral

doi:10.1021/acs.jpclett.4c00746

J Phys Chem Lett. 2024 Apr 25;15(16):4451-4460. doi: 10.1021/acs.jpclett.4c00746. Epub 2024 Apr 16.

Authors

Fuchun Ge¹, Ran Wang¹, Chen Qu², Peikun Zheng¹, Apurba Nandi³,⁴, Riccardo Conte⁵, Paul L Houston⁶, Joel M Bowman³, Pavlo O Dral¹

Affiliations

¹ State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China.
² Independent Researcher, Toronto, Ontario M9B0E3, Canada.
³ Department of Chemistry and Cherry L. Emerson Center for Scientific Computation, Emory University, Atlanta, Georgia 30322, United States.
⁴ Department of Physics and Materials Science, University of Luxembourg, Luxembourg City L-1511, Luxembourg.
⁵ Dipartimento di Chimica, Università degli Studi di Milano, via Golgi 19, 20133 Milano, Italy.
⁶ Department of Chemistry and Chemical Biology, Cornell University, Ithaca, New York 14853, United States.

PMID: 38626460

Abstract

Machine learning potentials (MLPs) are widely used as an efficient alternative for representing potential energy surfaces (PESs) in many chemical simulations. MLPs are often evaluated by their root-mean-square errors on a test set drawn from the same distribution as the training data. Here, we systematically investigate the relationship between such test errors and the accuracy of simulations with MLPs, using a full-dimensional, global PES for the amino acid glycine as an example. Our results show that test-set errors do not unambiguously reflect MLP performance in different simulation tasks, such as computing relative conformer energies, barriers, vibrational levels, and zero-point vibrational energies. We also offer an easily accessible solution for improving MLP quality in a simulation-oriented manner, which yields the most precise relative conformer energies and barriers. This solution also passed a stringent test with diffusion Monte Carlo simulations.
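To illustrate why a test-set root-mean-square error need not track a simulation-relevant quantity, here is a minimal sketch. The four "conformer" energies and both sets of model predictions are invented toy numbers in arbitrary units, not data from the paper. The helper names (`rmse`, `relative_to_first`, `max_relative_error`) are hypothetical and do not correspond to any particular MLP package. Two hypothetical models are constructed to have the same RMSE but very different errors in relative energies.

```python
import math


def rmse(pred, ref):
    """Root-mean-square error between predicted and reference energies."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def relative_to_first(energies):
    """Energies measured relative to the first entry (taken as the reference conformer)."""
    return [e - energies[0] for e in energies]


def max_relative_error(pred, ref):
    """Largest absolute error in relative energies, a simulation-oriented metric."""
    rel_pred = relative_to_first(pred)
    rel_ref = relative_to_first(ref)
    return max(abs(p - r) for p, r in zip(rel_pred, rel_ref))


# Toy reference energies for four hypothetical conformers (arbitrary units).
# ASSUMPTION: illustrative numbers only, not taken from the paper.
ref = [0.0, 1.0, 2.0, 3.0]

# Hypothetical model A: every energy is shifted by the same constant.
model_a = [e + 0.5 for e in ref]

# Hypothetical model B: errors of the same magnitude but alternating sign.
model_b = [e + s for e, s in zip(ref, [0.5, -0.5, 0.5, -0.5])]

rmse_a, rmse_b = rmse(model_a, ref), rmse(model_b, ref)
rel_a, rel_b = max_relative_error(model_a, ref), max_relative_error(model_b, ref)

print("RMSE A, B:", rmse_a, rmse_b)
print("Max relative-energy error A, B:", rel_a, rel_b)
```

In this toy case the constant shift in model A cancels when energies are taken relative to a reference conformer, while the sign-alternating errors in model B do not, even though both models score identically on RMSE. This is only one simple mechanism by which the two kinds of metric can diverge; the paper's own analysis covers barriers, vibrational levels, and zero-point energies as well.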