From Theory to Practice: Enhancing Data Privacy with Mathematics and Statistics
Published: 2025-11-05
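  • Speaker: Chendi Wang (Assistant Professor, Xiamen University)
  • Time: November 11, 2025, 15:30
  • Location: Lecture Hall 1417, Administration Building, Zijingang Campus, Zhejiang University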

Abstract: A fundamental challenge in data privacy is the privacy-utility trade-off: stronger privacy protection often reduces data utility. A key to mitigating this trade-off is to compute the privacy budget more accurately, a task known as privacy accounting. In this talk, we introduce a general framework for tight privacy accounting in practice, leveraging statistical tools such as hypothesis testing and mathematical tools from harmonic analysis and discrete mathematics. We demonstrate its effectiveness in two applications: the 2020 US Census and fully decentralized federated learning algorithms.

For the 2020 US Census, tight privacy accounting is difficult because the budget depends on complicated discrete distributions. To address this, we propose a hybrid (numerical and analytical) privacy accountant that reduces the mean squared error of privatized contingency tables by 15.8% to 24.8% across geographical levels compared to the method used in current products.

In fully decentralized federated learning, privacy is amplified by data decentralization and random walks, which makes the budget depend on complicated mixture distributions that are hard to quantify. Using tools from hypothesis testing and Markov chain analysis, we develop a sharper privacy accountant that, in many cases, improves the privacy budget by more than 2x.

 

Bio: Chendi Wang is an Assistant Professor at the Wang Yanan Institute for Studies in Economics (WISE) and the School of Economics, Xiamen University. He received his Ph.D. from The Hong Kong Polytechnic University and his Bachelor’s degree from Beijing Normal University. From 2021 to 2024, he was a visiting scholar in the Wharton Department of Statistics and Data Science, University of Pennsylvania. His research focuses on data privacy and machine learning, with publications in leading journals such as PNAS and top-tier AI conferences such as ICML, ICLR, and NeurIPS, including an ICML 2024 oral presentation (top 1.5%) and a NeurIPS 2025 spotlight paper (top 3%). His collaborative research on the privacy of the 2020 US Census has also been featured in New Scientist.