Paradoxes and resolutions for semiparametric fusion of individual and summary data
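  • Speaker: Wang Miao (苗旺), Researcher, Department of Probability and Statistics, Peking University
  • Time: March 21, 2023, 10:00 a.m.
  • Venue: (in person) Room 1416, Administration Building, Zijingang Campus, Zhejiang University; (online) Tencent Meeting ID: 565-449-305
  • Host: Center for Data Science, Zhejiang University
  • Co-organizer: Institute of Statistics, Zhejiang University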

Abstract: Suppose we have individual data from an internal study and various types of summary statistics from relevant external studies. External summary statistics have been used as constraints on the internal data distribution, which promises to improve statistical inference on the internal data. However, the additional use of external summary data may lead to paradoxical results: efficiency loss may occur if the uncertainty of the summary statistics is not negligible, and estimation bias can emerge if they are obtained from a population different from that of the internal study. We investigate these paradoxical results in a semiparametric framework. We establish the semiparametric efficiency bound for estimating a general functional of the internal data distribution, which is shown to be no larger than the bound obtained using only internal data. We propose a data-fused efficient estimator that achieves this bound, so that the efficiency paradox is resolved. This data-fused estimator is further regularized with an adaptive lasso penalty, so that the resulting estimator achieves the same asymptotic distribution as the oracle estimator that uses only unbiased summary statistics, which resolves the bias paradox. Simulations and an application to a Helicobacter pylori infection dataset are used to illustrate the proposed methods.
