Facile Solutions to the Problems Associated with Chemical Information and Mathematical Symbolism While Using Machine Translation Tools

J Chem Inf Model. 2020 Jul 27;60(7):3423-3430. doi: 10.1021/acs.jcim.0c00274. Epub 2020 Jun 25.

Abstract

Computer-aided translation technology has made tremendous progress in accuracy in the past few years. Chemical Abstracts Service of the American Chemical Society summarizes scientific works from more than 50 languages and allows users to search papers in nine selected languages. Currently, only the abstracts are rendered into English, by human experts or by machine translation, because full-text translation of millions of articles is beyond human capacity today. An English translation of a foreign-language research paper, scientific book, or patent is often required for research, data mining, and historical purposes. Many fundamental papers in chemistry, quantum chemistry, physics, and mathematics contain a significant number of chemical or mathematical equations. One of the major known problems in machine translation of such symbolically dense texts is incorrect or meaningless output. This article describes how to optimize existing machine translation tools to read foreign-language papers embedded with chemical/mathematical equations. German and French have been selected to illustrate translation into English. Direct upload of text with extensive symbolism is possible with certain services, but this also occasionally produces an erroneous rendition into English. A facile solution to the problems associated with embedded equations and mathematical formulas is to replace the equations and notations with "dummy" variables. After translation, the placeholder or dummy symbols are removed and the original equations are substituted back. This approach, which can be automated in the future, relies on the idea that chemical formulas and mathematical notations are universal. Following the guidelines in the article, excellent translations can be produced from a text having interspersed equations and chemical symbols.
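
The dummy-variable approach described in the abstract could be automated roughly as follows. This is a minimal sketch under stated assumptions, not the authors' procedure: it assumes equations are delimited by `$...$` in the source text, the placeholder format `EQN0000` is arbitrary, and `translate` is a hypothetical callable standing in for whatever machine translation service is used.

```python
import re
from typing import Callable, Dict, Tuple

# Assumption: equations in the source text are delimited by $...$.
EQUATION_PATTERN = re.compile(r"\$[^$]+\$")
# Assumption: placeholder tokens look like EQN0000, EQN0001, ...
TOKEN_PATTERN = re.compile(r"EQN\d{4}")


def mask_equations(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace each equation with a dummy token and remember the mapping."""
    mapping: Dict[str, str] = {}

    def _swap(match: "re.Match[str]") -> str:
        token = f"EQN{len(mapping):04d}"
        mapping[token] = match.group(0)
        return token

    return EQUATION_PATTERN.sub(_swap, text), mapping


def restore_equations(text: str, mapping: Dict[str, str]) -> str:
    """Substitute the original equations back in place of the dummy tokens."""
    # Unknown tokens are left untouched rather than raising an error.
    return TOKEN_PATTERN.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)


def translate_with_placeholders(text: str, translate: Callable[[str], str]) -> str:
    """Mask equations, translate the remaining prose, then restore equations.

    `translate` is a hypothetical stand-in for a machine translation call.
    """
    masked, mapping = mask_equations(text)
    return restore_equations(translate(masked), mapping)
```

The sketch relies on the translation service passing the dummy tokens through unchanged; if a service alters or drops them, a different token format would be needed.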

MeSH terms

  • Humans
  • Language*
  • Mathematics
  • Natural Language Processing
  • Symbolism
  • Translating*