PhD Student Seminar 2025 [17]
Date: 2025-09-22
- Speaker: 韩雨哲 (Han Yuzhe)
- Time: September 23, 2025, 14:00
- Venue: Lecture Hall 1417, Administration Building, Zijingang Campus, Zhejiang University
Paper: Probability-Gap–Driven Relabeling for PNU Semi-Supervised Learning with Label Selection Bias
Abstract: High-performance machine learning models typically require large amounts of labeled data, yet in clinical settings reliable labels are costly and scarce. Gold-standard tests, such as blood pressure measurement or laboratory confirmation, provide definitive diagnoses, but they are not administered at random: they are biased toward easily distinguishable cases. As a result, labeled datasets suffer from selection bias, while the unlabeled majority often contains ambiguous or borderline instances, which limits model generalization. We address this challenge with a Positive-Negative-Unlabeled (PNU) learning framework that explicitly accounts for biased labeling. Our method introduces a semi-supervised relabeling algorithm that assigns pseudo-labels to unlabeled samples using probability gaps, ensuring consistency with the Bayes optimal classifier. The relabeled dataset is then integrated into a pseudo-likelihood estimation framework under a missing-not-at-random (MNAR) mechanism, while kernel mean matching–based reweighting corrects for distributional shift. The proposed framework is flexible across parametric and non-parametric settings and accommodates diverse feature types. Experiments on synthetic and real-world medical datasets demonstrate that the approach effectively leverages unlabeled data, mitigates selection bias, and improves classification performance.
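The abstract does not spell out the relabeling rule, so the following is only a rough illustration of the general idea, not the speaker's algorithm. It assumes that the "probability gap" of a sample is |P(y=1|x) - P(y=0|x)| = |2p - 1| for some fitted posterior estimate p, and that an unlabeled sample receives the Bayes-rule label (1 if p > 0.5, else 0) only when this gap reaches a threshold `tau`. The names `probability_gap`, `relabel_by_gap`, `tau`, and the toy logistic predictor `toy_proba` are all hypothetical. The sketch omits the MNAR pseudo-likelihood and kernel mean matching steps entirely.

```python
import math
from typing import Callable, List, Sequence, Tuple


def probability_gap(p: float) -> float:
    """Gap between the two class posteriors: |P(y=1|x) - P(y=0|x)| = |2p - 1|."""
    return abs(2.0 * p - 1.0)


def relabel_by_gap(
    unlabeled: Sequence[Sequence[float]],
    predict_proba: Callable[[Sequence[float]], float],
    tau: float = 0.6,
) -> List[Tuple[int, int]]:
    """Return (index, pseudo_label) pairs for unlabeled points with gap >= tau.

    Hypothetical rule: the pseudo-label follows the Bayes decision rule
    (1 if p > 0.5 else 0); points whose gap falls below tau are treated as
    ambiguous and stay unlabeled.
    """
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    pseudo = []
    for i, x in enumerate(unlabeled):
        p = predict_proba(x)
        if probability_gap(p) >= tau:
            pseudo.append((i, 1 if p > 0.5 else 0))
    return pseudo


def toy_proba(x: Sequence[float]) -> float:
    """Hypothetical logistic posterior with fixed weights: sigmoid(2*x0 - 1)."""
    return 1.0 / (1.0 + math.exp(-(2.0 * x[0] - 1.0)))


# Toy unlabeled pool: two confident points and two borderline ones.
U = [[-2.0], [0.4], [0.6], [3.0]]

if __name__ == "__main__":
    print(relabel_by_gap(U, toy_proba, tau=0.6))  # [(0, 0), (3, 1)]
```

Here the two borderline points (x = 0.4 and x = 0.6, posteriors near 0.45 and 0.55) stay unlabeled, which mirrors the abstract's concern that ambiguous cases should not be forced into a class. In a self-training setup, one would typically refit the classifier on the labeled plus pseudo-labeled data and repeat; whether the presented method iterates this way is not stated in the abstract.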