Envelope-based partial partial least squares with application to cytokine-based biomarker analysis for COVID-19

Stat Med. 2022 Oct 15;41(23):4578-4592. doi: 10.1002/sim.9526. Epub 2022 Jul 15.

Abstract

Partial least squares (PLS) regression is a popular alternative to ordinary least squares regression because of its superior prediction performance in many cases. In various contemporary applications, the predictors include both continuous and categorical variables. A common practice in PLS regression is to treat the categorical variables as continuous. However, studies find that this practice may lead to biased estimates and invalid inferences (Schuberth et al., 2018). Based on a connection between the envelope model and PLS, we develop an envelope-based partial PLS estimator that performs PLS regression on the conditional distributions of the response(s) and the continuous predictors given the categorical predictors. Root-n consistency and asymptotic normality are established for this estimator. A numerical study shows that this approach can achieve greater efficiency gains in estimation and produce better predictions. The method is applied to identify cytokine-based biomarkers for COVID-19 patients, which reveals the association between the cytokine-based biomarkers and patients' clinical information, including disease status at admission and demographic characteristics. The efficient estimation leads to a clear scientific interpretation of the results.
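The idea of running PLS on the conditional distributions given the categorical predictors can be illustrated with a minimal sketch. This is not the envelope-based estimator developed in the paper. It is a simplified stand-in that assumes a single response, removes the categorical effects from the response and the continuous predictors by least squares, and then applies ordinary PLS1 (NIPALS) to the residuals. The helper names `one_hot`, `residualize`, and `pls1_coef`, the simulated data, and the choice of two components are all hypothetical and for illustration only.

```python
import numpy as np

def one_hot(labels):
    """Dummy-code a 1-D array of category labels, with an intercept column."""
    levels = np.unique(labels)
    # Drop the first level to avoid collinearity with the intercept.
    dummies = (labels[:, None] == levels[None, 1:]).astype(float)
    return np.column_stack([np.ones(len(labels)), dummies])

def residualize(A, Z):
    """Residuals of A after least-squares regression on the design matrix Z."""
    coef, *_ = np.linalg.lstsq(Z, A, rcond=None)
    return A - Z @ coef

def pls1_coef(X, y, n_components):
    """PLS1 (NIPALS) coefficients for centered X (n x p) and centered y (n,)."""
    Xk, yk = X.copy(), y.copy()
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                  # weight: direction of covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                     # score vector
        tt = t @ t
        p_load = Xk.T @ t / tt         # X loading
        c = yk @ t / tt                # y loading
        Xk = Xk - np.outer(t, p_load)  # deflate X
        yk = yk - c * t                # deflate y
        W.append(w); P.append(p_load); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)

# Simulated data (assumption): one 3-level categorical predictor that shifts
# both the continuous predictors and the response.
rng = np.random.default_rng(0)
n, p = 200, 3
group = rng.integers(0, 3, size=n)
X = rng.normal(size=(n, p)) + group[:, None]
beta_true = np.array([1.0, -0.5, 0.0])
y = X @ beta_true + 2.0 * group + 0.1 * rng.normal(size=n)

# Step 1: condition on the categorical predictor by residualizing.
Z = one_hot(group)
X_res, y_res = residualize(X, Z), residualize(y, Z)

# Step 2: PLS on the residuals gives coefficients for the continuous predictors.
beta_hat = pls1_coef(X_res, y_res, n_components=2)
print(beta_hat)
```

Because `Z` contains an intercept, the residuals are already centered, so no separate centering step is needed before PLS. When the number of components equals the number of continuous predictors, this sketch reduces to the ordinary least squares fit of the full model; fewer components give the dimension reduction that PLS is used for.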

Keywords: Grassmann manifold; dimension reduction; envelope model; multivariate regression; partial least squares.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biomarkers
  • COVID-19* / diagnosis
  • Cytokines*
  • Humans
  • Least-Squares Analysis

Substances

  • Biomarkers
  • Cytokines