Systematic Review of Privacy-Preserving Distributed Machine Learning From Federated Databases in Health Care

Fadila Zerka; Samir Barakat; Sean Walsh; Marta Bogowicz; Ralph T H Leijenaar; Arthur Jochems; Benjamin Miraglio; David Townend; Philippe Lambin

doi:10.1200/CCI.19.00047

JCO Clin Cancer Inform. 2020 Mar:4:184-200. doi: 10.1200/CCI.19.00047.

Authors

Fadila Zerka¹ ², Samir Barakat¹ ², Sean Walsh¹ ², Marta Bogowicz¹ ³, Ralph T H Leijenaar¹ ², Arthur Jochems¹, Benjamin Miraglio², David Townend⁴, Philippe Lambin¹

Affiliations

¹ The D-Lab, Department of Precision Medicine, GROW School for Oncology and Developmental Biology, Maastricht University Medical Centre, Maastricht, The Netherlands.
² Oncoradiomics, Liège, Belgium.
³ Department of Radiation Oncology, University Hospital Zurich and University of Zurich, Zurich, Switzerland.
⁴ Department of Health, Ethics, and Society, CAPHRI (Care and Public Health Research Institute), Maastricht University, Maastricht, The Netherlands.

Abstract

Big data for health care is one of the potential solutions to deal with the numerous challenges of health care, such as rising cost, aging population, precision medicine, universal health coverage, and the increase of noncommunicable diseases. However, data centralization for big data raises privacy and regulatory concerns.

Covered topics include (1) an introduction to privacy of patient data and distributed learning as a potential solution to preserving these data, a description of the legal context for patient data research, and a definition of machine/deep learning concepts; (2) a presentation of the adopted review protocol; (3) a presentation of the search results; and (4) a discussion of the findings, limitations of the review, and future perspectives.

Distributed learning from federated databases makes data centralization unnecessary. Distributed algorithms iteratively analyze separate databases, essentially sharing research questions and answers between databases instead of sharing the data. In other words, one can learn from separate and isolated datasets without patient data ever leaving the individual clinical institutes.

Distributed learning promises great potential to facilitate big data for medical application, in particular for international consortiums. Our purpose is to review the major implementations of distributed learning in health care.
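
The iterative exchange of "questions and answers" described in the abstract can be illustrated with a minimal sketch. It is not taken from the article or from any implementation it reviews: the function names (local_gradient, distributed_fit), the toy linear model, and the example data are all assumptions made for illustration. A coordinator sends the current model parameters to each site (the question), each site returns only summed gradients and a record count (the answer), and the patient-level records stay where they are.

```python
from typing import List, Tuple

# Hypothetical local dataset: (x, y) records that never leave the site.
Site = List[Tuple[float, float]]


def local_gradient(records: Site, w: float, b: float) -> Tuple[float, float, int]:
    """Runs at a site: returns summed squared-error gradients and the record count."""
    gw = sum((w * x + b - y) * x for x, y in records)
    gb = sum((w * x + b - y) for x, y in records)
    return gw, gb, len(records)


def distributed_fit(sites: List[Site], lr: float = 0.05, rounds: int = 2000) -> Tuple[float, float]:
    """Runs at the coordinator: fits y = w * x + b using site-level aggregates only."""
    w, b = 0.0, 0.0
    for _ in range(rounds):
        # "Question": the current parameters. "Answer": aggregated gradients.
        answers = [local_gradient(site, w, b) for site in sites]
        n = sum(count for _, _, count in answers)
        w -= lr * sum(gw for gw, _, _ in answers) / n
        b -= lr * sum(gb for _, gb, _ in answers) / n
    return w, b


# Toy data split over three hypothetical institutes; all points lie on y = 2x + 1.
sites = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0)],
    [(4.0, 9.0)],
]
w, b = distributed_fit(sites)
print(f"w = {w:.4f}, b = {b:.4f}")
```

Because the summed gradients equal the gradient that would be computed on the pooled data, the coordinator follows the same optimization path as a centralized fit in this toy setting. The sketch includes no encryption or other safeguards, and shared aggregates can still reveal information about small sites.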

Publication types

Research Support, Non-U.S. Gov't
Systematic Review

MeSH terms

Algorithms*
Data Management / standards*
Data Mining / ethics*
Data Mining / methods
Databases, Factual / statistics & numerical data
Delivery of Health Care / ethics*
Delivery of Health Care / methods
Electronic Health Records / ethics*
Humans
Machine Learning*
Precision Medicine / methods
Privacy*