High-Dimensional Inference for Weak-Supervision with Feature-Dependent Label Noise
Posted: 2024-03-25
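  • Speaker: Zhaojun Wang (王兆军), Nankai University
  • Time: 14:00, April 12, 2024 (Beijing time)
  • Venue: Lecture Hall 1417, Administration Building, Zijingang Campus, Zhejiang University
  • Hosts: Center for Data Science, Zhejiang University; School of Mathematical Sciences, Zhejiang University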

Abstract: This paper is concerned with a typical form of weak supervision, the label noise problem. A common setting for classification with label noise assumes that the noise level is independent of the features and known a priori. We consider the setting where, in addition to a large dataset with label noise, a validation dataset with correct labels is available at learning time. We argue that classification with possibly feature-dependent noise in weakly supervised settings can be solved naturally by a general logistic regression. Rate-optimal estimators are obtained by maximizing a penalized joint likelihood function. A sample-splitting-based method is further proposed for constructing confidence intervals for individual components of the regression vector, which enables us to identify label-noise-related features with error rate control. The superiority of our method is demonstrated through asymptotic properties as well as numerical experiments. A real-data example is also presented to illustrate how to use the proposed method in practice.



About the speaker:

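Zhaojun Wang (王兆军) is Executive Dean and Professor of the School of Statistics and Data Science at Nankai University, a member of the Statistics Discipline Evaluation Group of the Academic Degrees Committee of the State Council, and a member of the National Statistical Textbook Editorial and Review Committee. Professor Wang serves as Vice President of the China Society for Industrial and Applied Mathematics, Vice President of the Chinese Statistical Education Society, Vice President of the China Industrial Statistics Teaching Research Association, and Vice President of the Chinese Society of Probability and Statistics. Previous roles include member of the National Statistics Expert Advisory Committee, Vice President of the Chinese Association for Applied Statistics (中国现场统计研究会), President of the Tianjin Association for Applied Statistics, and President of the Tianjin Society for Industrial and Applied Mathematics. Honors include the State Council Special Government Allowance, recognition as supervisor of a National Top 100 Outstanding Doctoral Dissertation, a Second Prize of the Ministry of Education Natural Science Award, and a First Prize of the Tianjin Natural Science Award.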