Partial identification in the statistical matching problem

Daniel Ahfock; Saumyadipta Pyne; Sharon X Lee; Geoffrey J McLachlan

doi:10.1016/j.csda.2016.06.005

Comput Stat Data Anal. 2016 Dec:104:79-90. doi: 10.1016/j.csda.2016.06.005.

Authors

Daniel Ahfock¹, Saumyadipta Pyne²,³, Sharon X Lee¹, Geoffrey J McLachlan¹

Affiliations

¹ Department of Mathematics, University of Queensland, Australia.
² Public Health Foundation of India, IIPH Hyderabad, India.
³ CR Rao Advanced Institute of Mathematics, Statistics and Computer Science, Hyderabad, India.

Abstract

The statistical matching problem involves the integration of multiple datasets where some variables are not observed jointly. This missing data pattern leaves most statistical models unidentifiable. Statistical inference is still possible when operating under the framework of partially identified models, where the goal is to bound the parameters rather than to estimate them precisely. In many matching problems, developing feasible bounds on the parameters is equivalent to finding the set of positive-definite completions of a partially specified covariance matrix. Existing methods for characterising the set of possible completions do not extend to high-dimensional problems. A Gibbs sampler to draw from the set of possible completions is proposed. The variation in the observed samples gives an estimate of the feasible region of the parameters. The Gibbs sampler extends easily to high-dimensional statistical matching problems.

Keywords: Data integration; Missing data; Positive-definite matrix completion; Statistical matching.
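
As a small illustration of the positive-definite completion idea mentioned in the abstract, consider the simplest matching setting: three standardised variables X, Y, Z, where corr(X, Y) and corr(X, Z) are observed in separate datasets and corr(Y, Z) is never observed. In this three-variable case the feasible range for the missing correlation has a well-known closed form. The sketch below is not the authors' Gibbs sampler; the function names and the input values 0.6 and 0.5 are assumptions chosen for illustration only.

```python
import numpy as np

def feasible_interval(r_xy, r_xz):
    """Closed-form bounds on corr(Y, Z) for a valid 3x3 correlation matrix."""
    centre = r_xy * r_xz
    half_width = np.sqrt((1.0 - r_xy**2) * (1.0 - r_xz**2))
    return centre - half_width, centre + half_width

def completion(r_xy, r_xz, r_yz):
    """Fill the unobserved (Y, Z) entry of the partially specified matrix."""
    return np.array([[1.0, r_xy, r_xz],
                     [r_xy, 1.0, r_yz],
                     [r_xz, r_yz, 1.0]])

def is_positive_definite(m):
    """A symmetric matrix is positive definite iff its smallest eigenvalue is > 0."""
    return bool(np.linalg.eigvalsh(m).min() > 0.0)

# Hypothetical observed correlations (illustrative values, not from the paper).
r_xy, r_xz = 0.6, 0.5
lo, hi = feasible_interval(r_xy, r_xz)

# Uniform draws inside the interval are all valid completions; their spread
# traces out the feasible region for the unidentified parameter.
rng = np.random.default_rng(0)
draws = rng.uniform(lo, hi, size=1000)
all_valid = all(is_positive_definite(completion(r_xy, r_xz, r)) for r in draws)
```

With several unobserved entries the feasible set no longer has such a simple description, which is the situation the proposed sampler is aimed at.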
