Recent advances on conformal p-values for large-scale applications
Posted: 2025-11-28
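  • Speaker: Hongxin Wei (Assistant Professor, Southern University of Science and Technology)
  • Time: December 2, 2025, 15:30
  • Venue: Lecture Hall 1417, Administration Building, Zijingang Campus, Zhejiang University
  • Host: Center for Data Science, Zhejiang University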

Abstract: Selecting promising candidates from massive pools is a central task in scientific discovery and modern AI systems. Conformal p-values provide a distribution-free, model-agnostic framework that guarantees finite-sample control of error rates such as the false discovery rate (FDR). In this talk, we present recent advances that make conformal selection scalable and flexible for real-world large-scale applications, with particular emphasis on large language models. First, we introduce our recent work, the Multi-Condition Conformal Selection (MCCS) algorithm, which extends conformal selection to scenarios with multiple conditions. In this work, we propose a novel nonconformity score with regional monotonicity for conjunctive conditions and a global Benjamini-Hochberg (BH) procedure for disjunctive conditions, thereby establishing finite-sample FDR control with theoretical guarantees. Then, we show how to design selection methods for AI labeling and training data identification with FDR control. Finally, we introduce PAC reasoning, which controls the performance loss of reasoning models while improving efficiency. The presented results illustrate the considerable potential of conformal techniques for real-world applications, with particular relevance to large language models.
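
For readers new to the topic, the basic pipeline the abstract builds on (conformal p-values followed by the Benjamini-Hochberg procedure) can be sketched as below. This is a minimal illustration of the standard textbook construction, not the MCCS algorithm or any other method from the talk. The function names, the toy numbers, and the convention that a larger score means stronger evidence against the null are all assumptions made for this example. The usual FDR guarantee additionally relies on conditions such as exchangeability between the calibration and test points, which the sketch does not check.

```python
import numpy as np


def conformal_p_values(calib_scores, test_scores):
    """Conformal p-values, assuming larger scores are stronger evidence
    against the null (hypothetical convention for this sketch)."""
    calib = np.sort(np.asarray(calib_scores, dtype=float))
    test = np.asarray(test_scores, dtype=float)
    n = calib.size
    # Count calibration scores that are >= each test score.
    n_ge = n - np.searchsorted(calib, test, side="left")
    # The +1 terms count the test point itself, so p is never zero.
    return (1.0 + n_ge) / (n + 1.0)


def benjamini_hochberg(p_values, alpha=0.1):
    """Boolean mask of hypotheses selected by the BH step-up rule."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    # Positions k (0-based) where the k-th smallest p-value is under its threshold.
    below = np.nonzero(p[order] <= thresholds)[0]
    selected = np.zeros(m, dtype=bool)
    if below.size > 0:
        # Select everything up to and including the largest such position.
        selected[order[: below[-1] + 1]] = True
    return selected


# Toy usage with made-up scores.
calib = [0.1, 0.2, 0.3, 0.4]
test = [0.35, 0.05, 0.9]
p = conformal_p_values(calib, test)
mask = benjamini_hochberg(p, alpha=0.1)
```

With only four calibration points the smallest attainable p-value is 1/5, so nothing can be selected at a level of 0.1 in this toy run; in practice the calibration set has to be large enough for small p-values to be reachable.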


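Bio: Hongxin Wei is an assistant professor in the Department of Statistics and Data Science at Southern University of Science and Technology. His main research area is uncertainty estimation in machine learning and its applications to data optimization and privacy. His work aims to enable machine learning models to accurately express the uncertainty in their predictions through probabilities or conformal prediction. He received his PhD from Nanyang Technological University, Singapore, in 2023. He previously worked as a research assistant at the Institute for Interdisciplinary Information Sciences, Tsinghua University, and was a visiting researcher at the University of Wisconsin-Madison during his doctoral studies. He has published more than 40 papers in top international conferences and journals ranked CCF-A, including JMLR, ICML, NeurIPS, CVPR, AAAI, and TKDE. He regularly serves as an area chair for the machine learning conferences ICML, NeurIPS, and ICLR and as a senior program committee member for IJCAI, and he reviews for leading statistics and machine learning journals including JASA, TPAMI, IJCV, and JMLR. His open-source project TorchCP, a conformal prediction toolbox, has been downloaded and installed by the community more than 20,000 times.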