Abstract: Synthesizing information from multiple data sources is critical to ensure knowledge generalizability. Integrative analysis of multi-source data is challenging due to the heterogeneity across sources and data-sharing constraints. In this paper, we consider a general robust inference framework for federated meta-learning of data from multiple sites, enabling statistical inference for the prevailing model, defined as the one matching the majority of the sites. Statistical inference for the prevailing model is challenging since it requires a data-adaptive mechanism to select eligible sites and subsequently account for the selection uncertainty. We propose a novel sampling method to address the additional variation arising from the selection. Our devised confidence interval does not require sites to share individual-level data and is shown to be valid without requiring the selection of eligible sites to be error-free. The proposed robust inference for federated meta-learning (RIFL) methodology is broadly applicable and illustrated with three inference problems: aggregation of parametric models, high-dimensional prediction models, and inference for average treatment effects. We use RIFL to perform federated learning of mortality risk for patients hospitalized with COVID-19 using real-world EHR data from 16 healthcare centers representing 275 hospitals across four countries.
About the Speaker:
Zijian Guo is an associate professor in the Department of Statistics at Rutgers University. He obtained his Ph.D. in Statistics in 2017 from the Wharton School, University of Pennsylvania, under the supervision of Tony Cai. His research interests include high-dimensional statistics, causal inference, multi-source learning, and nonstandard statistical inference. His work has been published in AoS, JRSSB, JASA, JMLR, and JoE. He is currently serving as an associate editor for JASA.
Meeting ID: 394-642-267
Meeting Link: https://meeting.tencent.com/dm/sGnGsqDhoi2R