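- Speaker: Shuyi Zhang (张澍一), Associate Professor, East China Normal University
- Time: February 25, 2025, 15:30
- Venue: Lecture Hall 1417, Administration Building, Zijingang Campus, Zhejiang University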
Abstract: Empirical risk minimization,
where the underlying loss function depends on a pair of data points, covers a
wide range of application areas in statistics including pairwise ranking and
survival analysis. The common empirical risk estimator obtained by averaging
values of a loss function over all possible pairs of observations is
essentially a U-statistic. One well-known problem with minimizing U-statistic-type
empirical risks is that the computational complexity of U-statistics
increases quadratically with the sample size. When faced with big data, this
poses computational challenges as the colossal number of observation pairs
virtually prohibits centralized computing from being performed on a single machine.
This paper addresses this problem by developing two computationally and
statistically efficient methods based on the divide-and-conquer strategy on a
decentralized computing system, whereby the data are distributed among machines
to perform the tasks. One of these methods is based on a surrogate of the
empirical risk, while the other method extends the one-step updating scheme in
classical M-estimation to the case of pairwise loss. We show that the proposed
estimators are as asymptotically efficient as the benchmark global U-estimator
obtained under centralized computing. We also introduce two distributed
iterative algorithms to facilitate the implementation of the proposed methods,
and conduct extensive numerical experiments to demonstrate their merit.
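
To make the quadratic cost mentioned in the abstract concrete, here is a minimal sketch of a U-statistic-type empirical risk and of a naive divide-and-conquer average of local risks. It is not the surrogate-risk or one-step method of the talk; the logistic ranking loss and the names `pairwise_risk`, `averaged_local_risk`, and `n_pairs` are assumptions chosen only for illustration.

```python
import math
from itertools import combinations


def n_pairs(n):
    """Number of unordered pairs among n observations: n(n-1)/2."""
    return n * (n - 1) // 2


def pairwise_risk(theta, xs, ys):
    """Average of a pairwise loss over all pairs i < j (a U-statistic).

    Assumed loss (illustrative only): logistic ranking loss on the
    difference of linear scores, signed by the order of the responses.
    """
    scores = [sum(t * v for t, v in zip(theta, x)) for x in xs]
    total, count = 0.0, 0
    for i, j in combinations(range(len(ys)), 2):
        sign = (ys[i] > ys[j]) - (ys[i] < ys[j])
        margin = sign * (scores[i] - scores[j])
        # Numerically stable log(1 + exp(-margin)).
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
        count += 1
    return total / count


def averaged_local_risk(theta, xs, ys, n_machines):
    """Naive divide-and-conquer: each hypothetical machine evaluates the
    risk on its own block (within-block pairs only); results are averaged.
    Assumes every block holds at least two observations."""
    risks = []
    for b in range(n_machines):
        idx = range(b, len(ys), n_machines)
        risks.append(pairwise_risk(theta, [xs[i] for i in idx], [ys[i] for i in idx]))
    return sum(risks) / len(risks)
```

With 1,000 observations, the centralized risk touches `n_pairs(1000) = 499500` pairs, while 10 machines holding 100 observations each evaluate only `10 * n_pairs(100) = 49500` within-block pairs in total. The cross-machine pairs that this naive average discards are where a distributed estimator can lose information.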
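Bio: Shuyi Zhang (张澍一) is an associate professor at the School of Statistics and the Academy of Statistics and Interdisciplinary Sciences, East China Normal University. Zhang received a PhD from Peking University in 2019 and subsequently conducted postdoctoral research at Harvard University. Research interests include distributed inference, multi-source learning, high-dimensional testing, and missing data analysis, as well as applications of statistics in environmental and marine science. Major results have been published in journals including AOS and JMLR. Zhang was selected for the Shanghai Leading Talents (Young Overseas) program and the Pujiang Talent program, and currently serves as an associate editor of Environmetrics, a leading international journal in environmetrics.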