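- Speaker: Yixuan Qiu (邱怡轩), Associate Professor, Shanghai University of Finance and Economics
- Time: April 1, 2025, 15:30
- Venue: Lecture Hall 1417, Administration Building, Zijingang Campus, Zhejiang University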
Abstract: Coreset selection, a technique for compressing large datasets while preserving
performance for downstream tasks, is crucial for modern machine learning. This work
presents a novel method for generating high-quality Wasserstein coresets using
the optimal transport approach, a powerful tool to compare and manipulate
probability distributions. We formulate this task as a bi-level optimization
problem, and address two central problems in this task: the forward computation
of the Sinkhorn loss that characterizes the difference between two probability
distributions, and the backward differentiation of the Sinkhorn loss with
respect to model parameters. In the forward stage, we show that the widely used
Sinkhorn algorithm may suffer from numerical instability and slow convergence.
To address these issues, we propose and analyze new computational algorithms that
are both fast and stable with strong theoretical guarantees. In the backward
stage, we obtain an analytical formula for the derivative of the Sinkhorn loss,
accompanied by a rigorous error analysis. Numerical experiments demonstrate that
our approach significantly outperforms existing methods in both sample
selection quality and computational efficiency.
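
For readers unfamiliar with the forward computation mentioned in the abstract, the following is a minimal sketch of the classical Sinkhorn iteration for entropic optimal transport, written in the log domain, which is a standard way to reduce numerical overflow and underflow. It is not the new algorithm proposed in the talk. The function names (`log_sum_exp`, `sinkhorn_log`), the regularization value `eps`, and the toy one-dimensional data are all illustrative assumptions.

```python
import numpy as np

def log_sum_exp(x, axis):
    """Numerically stable log(sum(exp(x))) along the given axis."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))

def sinkhorn_log(a, b, C, eps=0.1, n_iter=5000, tol=1e-9):
    """Classical Sinkhorn iterations in the log domain (illustrative sketch).

    a, b : source and target weights (positive, each summing to 1)
    C    : cost matrix of shape (len(a), len(b))
    eps  : entropic regularization strength (assumed value)
    Returns the transport plan P and the dual potentials f, g, where
    P_ij = a_i * b_j * exp((f_i + g_j - C_ij) / eps).
    """
    log_a, log_b = np.log(a), np.log(b)
    f = np.zeros_like(a)
    g = np.zeros_like(b)
    for _ in range(n_iter):
        # Update f so that the rows of P sum to a
        f = -eps * log_sum_exp(log_b[None, :] + (g[None, :] - C) / eps, axis=1)
        # Update g so that the columns of P sum to b
        g = -eps * log_sum_exp(log_a[:, None] + (f[:, None] - C) / eps, axis=0)
        log_P = log_a[:, None] + log_b[None, :] + (f[:, None] + g[None, :] - C) / eps
        P = np.exp(log_P)
        # Columns match b after the g update, so check the row marginals
        if np.abs(P.sum(axis=1) - a).max() < tol:
            break
    return P, f, g

# Toy example: two small point clouds on [0, 1] with squared-distance cost
x = np.linspace(0.0, 1.0, 5)
y = np.linspace(0.0, 1.0, 7)
a = np.full(5, 1.0 / 5)
b = np.full(7, 1.0 / 7)
C = (x[:, None] - y[None, :]) ** 2
P, f, g = sinkhorn_log(a, b, C)
transport_cost = float(np.sum(P * C))  # transport-cost part of the regularized objective
```

Each iteration costs on the order of the number of entries in `C`, and the number of iterations needed typically grows as `eps` shrinks, which is the kind of slow convergence the abstract refers to.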
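Bio: Yixuan Qiu is an Associate Professor in the School of Statistics and Data Science at Shanghai University of Finance and Economics. Qiu received a Ph.D. from the Department of Statistics at Purdue University and then worked as a postdoctoral researcher at Carnegie Mellon University. Main research interests include deep learning, generative models, and large-scale statistical computing and optimization. Qiu's work has been published in leading international statistics journals (such as JASA and Biometrika) and top machine learning conferences (such as NeurIPS and ICLR). Qiu is a long-time contributor to the statistics and data science community "统计之都" (Capital of Statistics), and is the developer and maintainer of many open-source algorithm packages (such as Spectra, LBFGS++, and ReHLine). Homepage: https://statr.me.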