Center for Data Science, Zhejiang University - Statistics and Data Science Forum

Posted: 2021-06-09

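Time: Saturday, June 26, 2021, 8:30-17:00
Venue: Lizhou Hall, Hangzhou Yuanzheng Qizhen Hotel (in person); Tencent Meeting ID: 735 364 396 (online)
Organizers: Center for Data Science, Zhejiang University; Institute of Statistics, Zhejiang University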

Forum agenda:

Time	Content
8:30-9:30	Shurong Zheng (郑术蓉, Northeast Normal University) Title: Spectral Properties of High-dimensional Sample Correlation Matrices
9:30-10:00	Tea break
10:00-11:00	Yundong Tu (涂云东, Peking University) Title: Group Fused Lasso for Large Factor Models with Multiple Structural Breaks
11:00-12:00	Shaojun Guo (郭绍俊, Renmin University of China) Title: Dealing with Functional Sparsity in High Dimensional Functional Data Analysis
12:00-13:30	Lunch
13:30-14:30	Yanxi Hou (侯燕曦, Fudan University) Title: Prediction of Extremal Expectile Based on Regression Models With Heteroscedastic Extremes
14:30-15:30	Xu Guo (郭旭, Beijing Normal University) Title: A New Procedure for Controlling False Discovery Rate in Large-Scale t-tests
15:30-16:00	Tea break
16:00-17:00	Tianying Wang (王天颖, Tsinghua University) Title: Integrated quantile rank test with its application in genetic and microbiome data

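Faculty and students attending the forum in person, please register by scanning the QR code:

All faculty and students are warmly welcome to attend!

Center for Data Science, Zhejiang University

June 9, 2021

Appendix: Speaker biographies and talk information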

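Shurong Zheng (郑术蓉; Professor, School of Mathematics and Statistics, Northeast Normal University)

Main research interests: large-dimensional random matrix theory and high-dimensional statistical analysis. Has published a number of papers on large-dimensional random matrix theory in statistics journals including Annals of Statistics, JASA, and Biometrika. Currently serves on the editorial boards of Statistica Sinica, Journal of Multivariate Analysis, and other journals, and as vice president of the National Association of Young Statisticians (全国青年统计学家协会), among other roles. Has led several projects funded by the National Natural Science Foundation of China (NSFC), including an Excellent Young Scientists Fund grant and General Program grants.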

Title: Spectral Properties of High-dimensional Sample Correlation Matrices

Abstract:

The high-dimensional sample correlation matrix is an important random matrix in multivariate statistical analysis. Its central limit theory is a main theoretical basis for statistical inference on high-dimensional correlation matrices. Under the high-dimensional framework in which the dimension tends to infinity proportionally with the sample size, we establish central limit theorems (CLTs) for linear spectral statistics (LSS) of sample correlation matrices under two settings: (1) the population follows an independent component structure; (2) the population follows an elliptical structure, including some heavy-tailed distributions. The results show that the CLTs for LSS of sample correlation matrices differ substantially between the two settings. In particular, the CLTs differ even when the population correlation matrix is the identity matrix. An application of the two CLTs is given.
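For readers unfamiliar with the objects in this abstract, the following is a minimal sketch of how a linear spectral statistic of a sample correlation matrix can be computed numerically. It only illustrates the definition (the sum of a test function over the eigenvalues) and does not reproduce the CLTs of the talk. The function names, the seed, the sizes n and p, and the two test functions are assumptions chosen for illustration.

```python
import numpy as np

def sample_correlation_eigenvalues(x):
    """Eigenvalues of the sample correlation matrix of an (n, p) data array."""
    r = np.corrcoef(x, rowvar=False)   # p x p sample correlation matrix
    return np.linalg.eigvalsh(r)       # real eigenvalues in ascending order

def linear_spectral_statistic(eigenvalues, f):
    """LSS: the sum of the test function f over the eigenvalues."""
    return float(np.sum(f(eigenvalues)))

rng = np.random.default_rng(0)         # arbitrary seed (assumption)
n, p = 200, 50                         # illustrative sizes with p / n = 0.25
x = rng.standard_normal((n, p))        # independent components, identity correlation

eig = sample_correlation_eigenvalues(x)
lss_identity = linear_spectral_statistic(eig, lambda t: t)  # trace of R, which is p
lss_log = linear_spectral_statistic(eig, np.log)            # log-determinant of R
```

With f(t) = t the statistic is the trace of the correlation matrix and therefore always equals p, so only nonlinear choices such as the logarithm carry information about the spectrum.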

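Yundong Tu (涂云东; Joint Associate Professor and Research Fellow, Department of Business Statistics and Econometrics, Guanghua School of Management, and Center for Statistical Science, Peking University)

Has received honors from the Econometric Society and the Phi Beta Kappa International Scholarship Award. Author of more than 30 papers in well-known international and Chinese journals, including Journal of Econometrics, Econometric Reviews, Journal of Business and Economic Statistics, Oxford Bulletin of Economics and Statistics, Statistica Sinica, Journal of Empirical Finance, Computational Statistics and Data Analysis, and Systems Engineering - Theory & Practice (《系统工程理论与实践》). Serves as an anonymous referee for a number of academic journals and for the Natural Science Foundation, and has led several Natural Science Foundation projects. Theoretical research covers time series models, nonparametric and semiparametric econometric methods, model selection and model averaging, network data modeling, financial econometrics, information econometrics, and model specification testing. Applied research includes macroeconomic forecasting, price index modeling, financial market forecasting, environmental pollution forecasting, and COVID-19 forecasting.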

Title: Group Fused Lasso for Large Factor Models with Multiple Structural Breaks

Abstract:

High-dimensional factor models are becoming popular for modeling high-dimensional time series, especially with the arrival of the big data era. Because of systematic structural changes, the time instability of factor loadings has recently attracted much research attention and has led to advances in inferential methods that incorporate structural breaks in factor models. This paper reformulates the identification of multiple structural breaks in factor loadings as a problem of detecting structural breaks in factor regressions, and proposes a group fused Lasso based estimation procedure to identify the break dates. Our procedure is easy to implement in practice and overcomes the drawbacks of classical methods, which often involve multiple tuning parameters and are computationally demanding when dealing with multiple unknown breaks. Theoretical properties of the proposed estimators are established, with a data-driven choice of the tuning parameter in the procedure. Monte Carlo simulations and a real data example illustrate that our procedure is fast to implement and achieves desirable accuracy, and thus has practical merit.

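Shaojun Guo (郭绍俊; Tenured Associate Professor, Institute of Statistics and Big Data, Renmin University of China)

Current main research interests: statistical learning; nonparametric and semiparametric statistical modeling; survival analysis and functional data analysis. Received a bachelor's degree from Shandong Normal University in 2003 and a PhD in science from the Academy of Mathematics and Systems Science, Chinese Academy of Sciences, in 2008. After the PhD, stayed on at the Academy of Mathematics and Systems Science as an assistant researcher until 2016. From 2009 to 2010, conducted postdoctoral research on high-dimensional data analysis in the Department of Operations Research and Financial Engineering at Princeton University, and from 2014 to 2016 conducted postdoctoral research on large-dimensional time series modeling in the Department of Statistics at the London School of Economics.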

Title: Dealing with Functional Sparsity in High Dimensional Functional Data Analysis

Abstract:

Covariance function estimation is a fundamental task in multivariate functional data analysis and arises in many applications. In this talk, we consider estimating sparse covariance functions for high-dimensional functional data, where the number of random functions p is comparable to, or even larger than, the sample size n. Aided by the Hilbert-Schmidt norm of functions, we introduce a new class of functional thresholding operators that combine functional versions of thresholding and shrinkage, and propose adaptive functional thresholding of the sample covariance function, which captures the variability of individual functional entries. We investigate the convergence and support recovery properties of our proposed estimator under a high-dimensional regime where p can grow exponentially with n. Our simulations demonstrate that the adaptive functional thresholding estimators significantly outperform the competing estimators. Finally, we illustrate the proposed method by analyzing brain functional connectivity using two neuroimaging datasets.
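As a rough illustration of what thresholding by the Hilbert-Schmidt norm could look like, the sketch below applies a soft-thresholding rule to covariance kernels sampled on a grid. It is a simplified stand-in and not the estimator of the talk: it uses one universal threshold `lam` rather than adaptive, entry-specific thresholds, it thresholds every block including the diagonal ones, and the function names, array layout, and grid spacings are all assumptions.

```python
import numpy as np

def hs_norm(block, du, dv):
    """Hilbert-Schmidt norm of a kernel sampled on a grid (Riemann-sum approximation)."""
    return float(np.sqrt(np.sum(block ** 2) * du * dv))

def functional_soft_threshold(cov, lam, du, dv):
    """Shrink each covariance block according to its Hilbert-Schmidt norm.

    cov has shape (p, p, m, m): cov[j, k] is the sampled kernel for functions j and k.
    Blocks with norm at most lam become zero; others are scaled by (1 - lam / norm).
    """
    out = np.zeros_like(cov)
    p = cov.shape[0]
    for j in range(p):
        for k in range(p):
            norm = hs_norm(cov[j, k], du, dv)
            if norm > lam:
                out[j, k] = cov[j, k] * (1.0 - lam / norm)
    return out

# Toy input: two functions observed on a 2-point grid over [0, 1].
m = 2
du = dv = 1.0 / m
cov = np.zeros((2, 2, m, m))
cov[0, 0] = cov[1, 1] = 1.0   # strong blocks, Hilbert-Schmidt norm 1
cov[0, 1] = cov[1, 0] = 0.1   # weak cross-covariance blocks, norm 0.1
thresholded = functional_soft_threshold(cov, lam=0.5, du=du, dv=dv)
```

In this toy input the weak cross-covariance blocks are set to zero as whole functions, while the strong blocks are shrunk but kept, which is the blockwise sparsity pattern the abstract refers to as functional sparsity.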

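Yanxi Hou (侯燕曦; Associate Professor, School of Data Science, Fudan University)

Main research interests: extreme value theory, copulas and tail copulas, nonparametric statistical methods, and applications of statistical inference in financial econometrics and risk management. Received a PhD from the School of Mathematics at the Georgia Institute of Technology in 2017. Main research results have been published in international journals including AoS, JASA, JBES, and IME.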

Title: Prediction of Extremal Expectile Based on Regression Models With Heteroscedastic Extremes

Abstract:

Expectiles have recently received much attention for their coherence as a tail risk measure. Estimation of conditional expectiles at extremal tails is of great interest in quantitative risk management. Regression analysis is a convenient and useful way to quantify the conditional effect of predictors or risk factors on a response variable of interest. However, when it comes to estimating extremal conditional expectiles, traditional inference methods may suffer from considerable variability because too few observations fall in the tail regions, which makes the prediction inaccurate. In this article, we study the estimation of extremal conditional expectiles based on quantile regression and expectile regression models. We propose three extrapolation methods based on a second-order condition within a framework of so-called conditionally heteroscedastic and unconditionally homoscedastic extremes. In addition, we establish the asymptotic properties of the proposed methods and show their empirical behavior through simulation studies. Finally, a data analysis illustrates the application of the proposed methods to real problems.
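As background for this abstract, the sketch below computes an ordinary (unconditional) sample expectile by iteratively reweighted averaging, which solves the asymmetric least squares first-order condition. It does not implement the extremal extrapolation methods of the talk, and the function name, tolerance, and toy data are assumptions made for illustration.

```python
def sample_expectile(values, tau, tol=1e-10, max_iter=1000):
    """Sample expectile at level tau in (0, 1) via iteratively reweighted means.

    The tau-expectile e balances weighted deviations:
    tau * sum((x - e)+) == (1 - tau) * sum((e - x)+).
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    e = sum(values) / len(values)              # start from the mean (tau = 0.5 case)
    for _ in range(max_iter):
        # Observations above the current estimate get weight tau, the rest 1 - tau.
        weights = [tau if v > e else 1.0 - tau for v in values]
        new_e = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        if abs(new_e - e) < tol:
            return new_e
        e = new_e
    return e

data = [1.0, 2.0, 3.0, 4.0, 10.0]              # toy sample (assumption)
central = sample_expectile(data, 0.5)          # coincides with the sample mean
upper = sample_expectile(data, 0.9)            # pulled toward the largest value
```

At levels close to 1 the estimate depends almost entirely on the few largest observations, which is the instability that motivates extrapolation in the extremal setting.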

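Xu Guo (郭旭; Associate Professor, School of Statistics, Beijing Normal University)

Received a PhD from Hong Kong Baptist University in 2014 and visited the Department of Statistics at Pennsylvania State University as an Assistant Research Professor from September 2018 to February 2020. Research focuses on model specification testing, high-dimensional data analysis, and semiparametric regression analysis, with a series of published results. Has published nearly 30 papers, in top statistics journals including JRSSB, Biometrika, and JASA (accepted) and in mainstream statistics journals such as Statistica Sinica and Scandinavian Journal of Statistics. Currently leads an NSFC General Program project and has completed NSFC Tianyuan Mathematics Fund and Young Scientists Fund projects.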

Title: A New Procedure for Controlling False Discovery Rate in Large-Scale t-tests

Abstract:

This paper is concerned with false discovery rate (FDR) control in large-scale multiple testing problems. We first propose a new data-driven testing procedure for controlling the FDR in large-scale t-tests for the one-sample mean problem. The proposed procedure achieves exact FDR control in finite samples when the populations are symmetric, regardless of the number of tests or the sample sizes. Compared with the existing bootstrap method for FDR control, the proposed procedure is computationally efficient. We show that the proposed method can control the FDR asymptotically for asymmetric populations even when the test statistics are not independent. We further show that the proposed procedure with a simple correction is as accurate as the bootstrap method to the second-order degree, and can be much more effective than the existing normal calibration. We extend the proposed procedure to the two-sample mean problem. Empirical results show that the proposed procedures have better FDR control than existing ones when the proportion of true alternative hypotheses is not too low, while maintaining reasonably good detection ability.
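For context on what FDR control means operationally, the sketch below implements the classical Benjamini-Hochberg step-up procedure on a list of p-values. This is a standard baseline, named here plainly as a substitute, and it is not the new procedure described in the abstract. The function name and the toy p-values are assumptions.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, True where the hypothesis is rejected.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # indices by ascending p-value
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Track the largest rank whose p-value falls under the BH line alpha * rank / m.
        if p_values[idx] <= alpha * rank / m:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected

# Toy p-values (assumption), for example from one-sample t-tests.
p_vals = [0.212, 0.001, 0.039, 0.008, 0.041, 0.042, 0.060, 0.074, 0.205, 0.216]
decisions = benjamini_hochberg(p_vals, alpha=0.05)
```

The step-up rule rejects every hypothesis ranked at or below the largest rank that passes the threshold, so a small p-value can be rejected even if an intermediate rank fails its own comparison.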

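Tianying Wang (王天颖; Assistant Professor, Center for Statistical Science, Tsinghua University)

Research interests: quantile regression, measurement error analysis, high-dimensional statistical analysis, statistical analysis in epidemiology and genetics, and analysis of electronic health record data. Received a PhD in statistics from Texas A&M University in 2018, conducted postdoctoral research in the Department of Biostatistics at Columbia University from 2018 to 2020, and joined the Center for Statistical Science at Tsinghua University as an assistant professor in 2020. Main research results have been published in top statistics journals including JASA and Statistics in Medicine.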

Title: Integrated quantile rank test with its application in genetic and microbiome data

Abstract:

We will introduce a new family of gene-level association tests that integrate the quantile rank score process to better accommodate complex associations. The resulting test statistics have multiple advantages: (1) they are almost as efficient as the best existing tests when the associations are homogeneous across quantile levels and have improved efficiency for complex and heterogeneous associations, (2) they provide useful insights into risk stratification, (3) they are distribution-free and can hence accommodate a wide range of underlying distributions, and (4) they are computationally efficient. We establish the asymptotic properties of the proposed tests under the null and alternative hypotheses and conduct large-scale simulation studies to investigate their finite-sample performance. The performance of the proposed approach is compared with that of conventional mean-based tests, i.e., the Burden and SKAT tests, through simulation studies and applications to a Metabochip dataset on lipid traits and to the genotype-tissue expression data in GTEx to identify eGenes. Further, we will discuss a generalized version with different kernels for microbiome data.
