Q- and A-learning Methods for Estimating Optimal Dynamic Treatment Regimes

Phillip J Schulte; Anastasios A Tsiatis; Eric B Laber; Marie Davidian

doi:10.1214/13-STS450


Stat Sci. 2014 Nov;29(4):640-661. doi: 10.1214/13-STS450.

Authors

Phillip J Schulte¹, Anastasios A Tsiatis², Eric B Laber³, Marie Davidian⁴

Affiliations

¹ Biostatistician, Duke Clinical Research Institute, Durham, North Carolina 27701, USA (phillip.schulte@duke.edu).
² Gertrude M. Cox Distinguished Professor, Department of Statistics, North Carolina State University, Raleigh, North Carolina 27695-8203, USA (tsiatis@ncsu.edu).
³ Assistant Professor, Department of Statistics, North Carolina State University, Raleigh, North Carolina 27695-8203, USA (eblaber@ncsu.edu).
⁴ William Neal Reynolds Professor, Department of Statistics, North Carolina State University, Raleigh, North Carolina 27695-8203, USA (davidian@ncsu.edu).

Abstract

In clinical practice, physicians make a series of treatment decisions over the course of a patient's disease based on his/her baseline and evolving characteristics. A dynamic treatment regime is a set of sequential decision rules that operationalizes this process. Each rule corresponds to a decision point and dictates the next treatment action based on the accrued information. A key goal is to use existing data to estimate the optimal regime, that is, the regime that, if followed by the patient population, would yield the most favorable outcome on average. Q- and A-learning are two main approaches for this purpose. We provide a detailed account of these methods, study their performance, and illustrate them using data from a depression study.
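To make the idea of sequential decision rules concrete, the following is a minimal sketch of Q-learning by backward induction. It assumes two decision points, binary treatments coded -1/+1, and linear working models fit by least squares. The helper names (`simulate`, `design`, `fit`, `best_action`, `q_learning`) and the data-generating model in `simulate` are hypothetical, chosen only for illustration. This is not the authors' implementation, and A-learning is not shown. At stage 2 the outcome is regressed on the full history and treatment. The predicted outcome under the better stage-2 action then serves as a pseudo-outcome for the stage-1 regression. Each estimated rule picks the action with the larger predicted value.

```python
import numpy as np


def simulate(n, seed=0):
    """Hypothetical two-stage data with randomized treatments coded -1/+1."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)                        # baseline covariate
    a1 = rng.choice([-1.0, 1.0], size=n)           # stage-1 treatment
    x2 = 0.5 * x1 + 0.3 * a1 + 0.5 * rng.normal(size=n)  # interim covariate
    a2 = rng.choice([-1.0, 1.0], size=n)           # stage-2 treatment
    # Assumed outcome model: stage-2 treatment helps when x2 < 0.8.
    y = 1 + x2 + a1 * (0.5 + x1) + a2 * (0.8 - x2) + rng.normal(size=n)
    return x1, a1, x2, a2, y


def design(h, a):
    """Linear working model: history main effects plus history-by-treatment terms."""
    base = np.hstack([np.ones((h.shape[0], 1)), h])
    return np.hstack([base, base * a[:, None]])


def fit(h, a, y):
    """Least-squares estimate of the working-model coefficients."""
    beta, *_ = np.linalg.lstsq(design(h, a), y, rcond=None)
    return beta


def best_action(beta, h):
    """Return the action (-1 or +1) maximizing the fitted Q-function, and that maximum."""
    n = h.shape[0]
    q_plus = design(h, np.ones(n)) @ beta
    q_minus = design(h, -np.ones(n)) @ beta
    return np.where(q_plus >= q_minus, 1.0, -1.0), np.maximum(q_plus, q_minus)


def q_learning(x1, a1, x2, a2, y):
    """Backward induction: fit stage 2 first, then stage 1 on the pseudo-outcome."""
    h2 = np.column_stack([x1, a1, x2])   # history available at decision 2
    beta2 = fit(h2, a2, y)
    _, v2 = best_action(beta2, h2)       # predicted outcome under best stage-2 action
    h1 = x1.reshape(-1, 1)               # history available at decision 1
    beta1 = fit(h1, a1, v2)
    return beta1, beta2


if __name__ == "__main__":
    x1, a1, x2, a2, y = simulate(5000, seed=1)
    beta1, beta2 = q_learning(x1, a1, x2, a2, y)
    print("stage-2 treatment-interaction coefficients:", np.round(beta2[4:], 2))
    print("stage-1 treatment-interaction coefficients:", np.round(beta1[2:], 2))
```

Under this simulated model, the stage-2 working model contains the true treatment contrast, so its interaction coefficients should land near (0.8, 0, 0, -1). The estimated rule therefore recommends +1 roughly when x2 < 0.8. The stage-1 linear working model, by contrast, is likely misspecified, because the pseudo-outcome involves an absolute value of x2. The keywords flag model misspecification as a concern for exactly this kind of reason.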

Keywords: Advantage learning; bias-variance tradeoff; model misspecification; personalized medicine; potential outcomes; sequential decision making.

