Chemically driven variable selection by focused multimodal genetic algorithms in mid-IR spectra

Anal Bioanal Chem. 2007 Dec;389(7-8):2331-42. doi: 10.1007/s00216-007-1608-1. Epub 2007 Oct 3.

Abstract

Four genetic-algorithm-based approaches to variable selection in spectral data sets are presented, ranging from a pure black-box approach to a chemically driven one. The latter uses a fitness function that considers not only typical parameters, such as the number of errors when classifying a training set, but also the chemical interpretability of the selected variables. Because multiple solutions may be acceptable, a multimodal genetic algorithm (GA) is employed and the most satisfactory solution is selected. This multimodal GA, termed the "hybrid two populations" GA (HTP-GA), uses two populations: a classical population, from which potential solutions emerge, and a new population, which maintains diversity in the search space (as required by multimodal problems). Results show that the HTP-GA approach improves the chemical understanding of the selected solution compared to the other GA approaches, while its classification capabilities remain good. All of the GA strategies for variable selection were compared with a classical parametric technique, Procrustes rotation, which does not consider interpretability.
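
The idea of a fitness function that weighs training-set classification errors together with chemical interpretability can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the nearest-centroid classifier, the penalty weights `w_size` and `w_chem`, the hypothetical `bands` list of wavenumber intervals treated as chemically interpretable, and the toy data. Lower fitness is treated as better.

```python
from typing import List, Sequence, Tuple


def nearest_centroid_errors(X: Sequence[Sequence[float]],
                            y: Sequence[str],
                            mask: Sequence[int]) -> int:
    """Count training samples misclassified by a nearest-centroid rule
    that sees only the variables switched on in `mask` (assumed classifier)."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return len(y)  # no variables selected: count every sample as an error
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    errors = 0
    for x, lab in zip(X, y):
        def dist(label: str) -> float:
            return sum((x[j] - c) ** 2 for j, c in zip(cols, centroids[label]))
        predicted = min(centroids, key=dist)
        errors += predicted != lab
    return errors


def fitness(mask: Sequence[int],
            X: Sequence[Sequence[float]],
            y: Sequence[str],
            wavenumbers: Sequence[float],
            bands: List[Tuple[float, float]],
            w_size: float = 0.1,
            w_chem: float = 1.0) -> float:
    """Hypothetical fitness (lower is better): classification errors,
    plus a penalty per selected variable, plus a penalty per selected
    variable lying outside every interval in `bands`."""
    errors = nearest_centroid_errors(X, y, mask)
    selected = [wavenumbers[j] for j, keep in enumerate(mask) if keep]
    outside = sum(1 for w in selected
                  if not any(lo <= w <= hi for lo, hi in bands))
    return errors + w_size * len(selected) + w_chem * outside


# Toy data: four spectra, four variables, two classes (all values invented).
wavenumbers = [1000.0, 1500.0, 1720.0, 2900.0]
bands = [(1700.0, 1750.0)]  # hypothetical "interpretable" region
X = [
    [0.1, 0.9, 0.2, 0.5],
    [0.2, 0.1, 0.3, 0.5],
    [0.9, 0.8, 0.8, 0.5],
    [0.8, 0.2, 0.9, 0.5],
]
y = ["a", "a", "b", "b"]

mask_chem = [0, 0, 1, 0]   # selects the variable inside the band
mask_black = [1, 0, 0, 0]  # classifies equally well, but outside the band

fit_chem = fitness(mask_chem, X, y, wavenumbers, bands)
fit_black = fitness(mask_black, X, y, wavenumbers, bands)
```

In this toy case both masks classify the training set without error, so the interpretability penalty alone separates them. A black-box fitness would correspond to `w_chem = 0`, under which the two masks score the same.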