2025 Spring Doctoral Student Forum
Posted: 2025-05-23
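  • Speakers: 吴正楷, 申百宁, 张艳
  • Time: May 26, 2025, 9:00-10:30
  • Venue: Lecture Hall 1417, Administration Building, Zijingang Campus, Zhejiang University
  • Organizer: Center for Data Science, Zhejiang University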

9:00-9:30

·Speaker 1: 吴正楷

·Year and program: 2023 cohort, Probability and Mathematical Statistics

·Advisor: 孙文光

·Title: Weighted false discovery rate control with conformal p-values

·Abstract: The weighted false discovery rate (wFDR) framework offers a powerful approach for integrating hypothesis-specific weights to address the varying severity of decision errors. However, the weighted Benjamini–Hochberg (wBH) procedure faces challenges in modern machine learning applications, where the lack of a clearly defined null distribution and the complexity of deep learning algorithms limit its applicability. This article aims to overcome these obstacles by exploiting conformal inference tools to systematically improve the applicability, theoretical robustness, and efficiency of existing wFDR methods. The proposed adaptive wBH (ada-wBH) procedure eliminates the conservativeness of wBH and can handle positively dependent conformal p-values. We further generalize ada-wBH for testing grouped hypotheses and examine novel strategies for simultaneously incorporating decision and procedural weights. Our generalization of the wFDR framework unifies theory across both semi-supervised and classical contexts. Numerical experiments on simulated and real data demonstrate that ada-wBH effectively controls the wFDR while delivering significantly improved power relative to competitive approaches.
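For readers new to the two building blocks named in this abstract, the following is a minimal sketch, not the speaker's ada-wBH procedure. It assumes the standard split-conformal p-value (larger score means more evidence against the null) and a textbook weighted BH step that divides each p-value by a procedural weight normalized to mean 1. The function names `conformal_p_values` and `weighted_bh`, the scores, and the weights are all hypothetical choices made for illustration.

```python
import numpy as np


def conformal_p_values(calib_scores, test_scores):
    """Split-conformal p-values from null calibration scores.

    Assumption: a larger score indicates stronger evidence against the null.
    """
    calib = np.sort(np.asarray(calib_scores, dtype=float))
    test = np.asarray(test_scores, dtype=float)
    n = calib.size
    # Count calibration scores that are >= each test score.
    n_ge = n - np.searchsorted(calib, test, side="left")
    return (1.0 + n_ge) / (n + 1.0)


def weighted_bh(p_values, weights, alpha=0.1):
    """BH step-up applied to p_i / w_i, with weights rescaled to mean 1.

    This is the generic procedural-weight variant, used here only as an
    illustration; it is not the ada-wBH procedure from the talk.
    """
    p = np.asarray(p_values, dtype=float)
    w = np.asarray(weights, dtype=float)
    if np.any(w <= 0):
        raise ValueError("weights must be strictly positive")
    m = p.size
    w = w * m / w.sum()                      # rescale so the weights average 1
    q = p / w                                # weighted p-values
    order = np.argsort(q)
    thresholds = alpha * np.arange(1, m + 1) / m
    passed = np.nonzero(q[order] <= thresholds)[0]
    reject = np.zeros(m, dtype=bool)
    if passed.size:
        # Step-up rule: reject everything up to the largest passing rank.
        reject[order[: passed[-1] + 1]] = True
    return reject


# Hypothetical scores: calibration points are known nulls.
p_vals = conformal_p_values([0.1, 0.2, 0.3, 0.4], [0.5, 0.25, 0.05])
decisions = weighted_bh([0.03, 0.03], weights=[1.8, 0.2], alpha=0.05)
```

With n calibration points the smallest attainable conformal p-value is 1/(n+1), so the calibration set has to be reasonably large before any hypothesis can be rejected at a small level.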

 

9:30-10:00

·Speaker 2: 申百宁

·Year and program: 2023 cohort, Probability and Mathematical Statistics

·Advisor: 蒋杭进

·Title: Regularized Estimation of High-Dimensional Matrix-Variate Autoregressive Models

·Abstract: Matrix-variate time series data are increasingly popular in economics, statistics, and environmental studies, among other fields. The bilinear autoregressive structure is a popular modeling approach for such data, as it reduces model complexity while capturing dynamic interactions between rows and columns. However, in high-dimensional settings, the conventional iterated least-squares method requires estimating a large number of parameters, which hampers interpretability and scalability. To address this challenge, we propose regularized estimation procedures designed for settings in which the autoregressive coefficient matrices exhibit banded or sparse structures. Specifically, we introduce a Bayesian Information Criterion (BIC)-based approach to estimate the bandwidth in the banded case, and employ the LASSO technique for enforcing sparsity in the coefficient matrices. We derive asymptotic properties for both methods as the dimensions diverge and the sample size T → ∞. Simulations and real data examples demonstrate the effectiveness of our methods, comparing their forecasting performance against common autoregressive models in the literature.
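As background for the unregularized baseline mentioned in this abstract, here is a minimal sketch of iterated (alternating) least squares. It assumes the commonly used first-order bilinear form X_t = A X_{t-1} B^T + E_t, which the abstract does not spell out, and it does not implement the banded or LASSO estimators from the talk. The names `simulate_mar1` and `fit_mar1_als`, the coefficient values, and the Frobenius-norm normalization of A are illustrative assumptions.

```python
import numpy as np


def simulate_mar1(A, B, T, rng, burn_in=100):
    """Simulate X_t = A X_{t-1} B^T + E_t with standard normal noise."""
    m, n = A.shape[0], B.shape[0]
    X = np.zeros((m, n))
    series = []
    for t in range(T + burn_in):
        X = A @ X @ B.T + rng.standard_normal((m, n))
        if t >= burn_in:                     # discard the burn-in period
            series.append(X)
    return np.stack(series)                  # shape (T, m, n)


def fit_mar1_als(X, n_iter=50):
    """Alternate two least-squares updates: A given B, then B given A."""
    _, m, n = X.shape
    prev, curr = X[:-1], X[1:]
    A, B = np.eye(m), np.eye(n)
    for _ in range(n_iter):
        Y = prev @ B.T                       # regressors for X_t ~ A Y_t
        num = np.einsum("tij,tkj->ik", curr, Y)   # sum_t X_t Y_t^T
        den = np.einsum("tij,tkj->ik", Y, Y)      # sum_t Y_t Y_t^T (symmetric)
        A = np.linalg.solve(den, num.T).T
        Z = A @ prev                         # regressors for X_t^T ~ B Z_t^T
        num = np.einsum("tji,tjk->ik", curr, Z)   # sum_t X_t^T Z_t
        den = np.einsum("tji,tjk->ik", Z, Z)      # sum_t Z_t^T Z_t (symmetric)
        B = np.linalg.solve(den, num.T).T
    # A and B are identified only up to a scalar, so fix ||A||_F = 1.
    scale = np.linalg.norm(A)
    return A / scale, B * scale


# Hypothetical coefficients, chosen so that the process is stable.
A_true = np.array([[0.5, 0.2, 0.0], [0.1, 0.4, 0.1], [0.0, 0.2, 0.6]])
B_true = np.array([[0.8, 0.1], [-0.2, 0.7]])
X = simulate_mar1(A_true, B_true, T=2000, rng=np.random.default_rng(0))
A_hat, B_hat = fit_mar1_als(X)
```

Because only the Kronecker product of B and A is identified, `np.kron(B_hat, A_hat)` is the natural quantity to compare against `np.kron(B_true, A_true)`. The unrestricted fit estimates every entry of both matrices, and that parameter count is what the regularized procedures in the talk aim to reduce.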

 

10:00-10:30

·Speaker 3: 张艳

·Year and program: 2023 cohort, Computer Science and Technology

·Advisor: 苗晓晔

·Title: Proxy-Validated Importance-Aware Federated Sample Selection with Meta Learning

·Abstract: Federated data selection strategically chooses a group of high-quality samples to train a global model, and it shows promise for improving the convergence and reducing the resource overhead of federated learning (FL). However, existing studies either fail to account for the dynamic importance of training samples or rely on external unbiased validation datasets. These shortcomings can compromise FL model performance and complicate the application of such methods in real-world scenarios.

In this paper, we propose a novel proxy-validated importance-aware federated sample selection framework, termed FedSelect. It employs a novel meta learning approach with a proxy validation dataset to select the most positively important clients and their samples in each round, thereby optimizing FL model performance. To eliminate the dependency on external unbiased data, we present a momentum-based meta-margin function to discover influential samples as the proxy validation dataset. We also develop an online meta model update strategy to guarantee the efficiency of FedSelect. Comprehensive experiments on three benchmark datasets demonstrate that FedSelect is superior in both effectiveness and efficiency, while maintaining strong scalability across diverse scenarios.