Efficient, Stable, and Differentiable Optimal Transport for Wasserstein Coreset Selection
Posted: 2025-03-24
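  • Speaker: Yixuan Qiu (邱怡轩), Associate Professor, Shanghai University of Finance and Economics
  • Time: April 1, 2025, 15:30
  • Venue: Lecture Hall 1417, Administration Building, Zijingang Campus, Zhejiang University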

Abstract: Coreset selection, a technique for compressing large datasets while preserving performance on downstream tasks, is crucial for modern machine learning. This work presents a novel method for generating high-quality Wasserstein coresets using optimal transport, a powerful tool for comparing and manipulating probability distributions. We formulate the task as a bi-level optimization problem and address its two central problems: the forward computation of the Sinkhorn loss, which characterizes the difference between two probability distributions, and the backward differentiation of the Sinkhorn loss with respect to model parameters. In the forward stage, we show that the widely used Sinkhorn algorithm may suffer from numerical instability and slow convergence. To address this, we propose and analyze new computational algorithms that are both fast and stable, with strong theoretical guarantees. In the backward stage, we obtain an analytical formula for the derivative of the Sinkhorn loss, together with a rigorous error analysis. Numerical experiments demonstrate that our approach significantly outperforms existing methods in both sample selection quality and computational efficiency.
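For readers unfamiliar with the forward computation mentioned in the abstract, the following is a minimal sketch of the standard textbook Sinkhorn iteration, written in the log domain so that small regularization values do not underflow. It is not the speaker's proposed algorithm, which the abstract does not specify. The function names `logsumexp` and `sinkhorn_log`, the parameters `eps` and `n_iter`, and the toy data are illustrative assumptions, and the weights `a` and `b` are assumed to be strictly positive and to sum to one.

```python
import numpy as np

def logsumexp(A, axis):
    """Numerically stable log(sum(exp(A))) along one axis."""
    m = np.max(A, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(A - m), axis=axis))

def sinkhorn_log(a, b, C, eps=0.05, n_iter=500):
    """Entropy-regularized optimal transport via log-domain Sinkhorn updates.

    a: (n,) source weights, b: (m,) target weights, C: (n, m) cost matrix.
    Returns the transport plan P and the transport cost sum(P * C).
    """
    log_a, log_b = np.log(a), np.log(b)
    f = np.zeros(len(a))  # dual potential attached to the source points
    g = np.zeros(len(b))  # dual potential attached to the target points
    for _ in range(n_iter):
        # Each update enforces one marginal constraint of the plan exactly.
        f = eps * (log_a - logsumexp((g[None, :] - C) / eps, axis=1))
        g = eps * (log_b - logsumexp((f[:, None] - C) / eps, axis=0))
    # Plan implied by the potentials: P_ij = exp((f_i + g_j - C_ij) / eps).
    P = np.exp((f[:, None] + g[None, :] - C) / eps)
    return P, float(np.sum(P * C))

# Toy usage (hypothetical data): two small point clouds with uniform weights.
rng = np.random.default_rng(0)
x, y = rng.random((5, 2)), rng.random((4, 2))
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)  # squared distances
a, b = np.full(5, 1 / 5), np.full(4, 1 / 4)
P, cost = sinkhorn_log(a, b, C)
```

The plain (non-log) iteration forms `exp(-C / eps)` directly, which underflows to zero when `eps` is small relative to the costs; that is one common source of the instability the abstract refers to. The log-domain form avoids the underflow but does not by itself speed up convergence.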

 

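Bio: Yixuan Qiu (邱怡轩) is an Associate Professor in the School of Statistics and Data Science at Shanghai University of Finance and Economics. Qiu received a Ph.D. from the Department of Statistics at Purdue University and subsequently worked as a postdoctoral researcher at Carnegie Mellon University. Main research interests include deep learning, generative models, and large-scale statistical computing and optimization. Qiu's research has been published in leading international statistics journals (such as JASA and Biometrika) and top machine learning conferences (such as NeurIPS and ICLR). Qiu is a long-time contributor to the statistics and data science community "统计之都" (Capital of Statistics), and is the developer and maintainer of many open-source algorithm packages (such as Spectra, LBFGS++, and ReHLine). Homepage: https://statr.me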