Asymptotic properties of high dimensional random forests
Posted: 2023-12-04
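  • Speaker: Yingying Fan (University of Southern California)
  • Time: Friday, December 15, 2023, 15:00 (Beijing time)
  • Venue: Lecture Hall 1417, Administration Building, Zijingang Campus, Zhejiang University
  • Host: Center for Data Science, Zhejiang University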

Abstract

As a flexible nonparametric learning tool, the random forests algorithm has been widely applied in practice with appealing empirical performance, even in high-dimensional feature spaces. Yet, because of its black-box nature, the results produced by random forests can be hard to interpret in many big data applications. This talk contributes to a fine-grained understanding of the random forests algorithm by discussing its consistency and variable selection properties in a general high-dimensional nonparametric regression setting. Specifically, we derive consistency rates, through a bias-variance decomposition analysis, for the random forests algorithm with the sample CART splitting criterion used in the original version of the algorithm (Breiman, 2001). Our new theoretical results show that random forests can indeed adapt to high dimensionality and allow for discontinuous regression functions. Our bias analysis takes a global approach that characterizes explicitly how the random forests bias depends on the sample size, tree height, and column subsampling parameter, while our variance analysis takes a local approach that bounds the forest variance by bounding the tree variance. A major technical innovation of our work is the sufficient impurity decrease (SID) condition, which makes our bias analysis possible and precise.
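To make the sample CART splitting criterion and the column subsampling parameter a little more concrete, here is a minimal Python sketch. It assumes squared-error impurity within a single cell and midpoint thresholds between consecutive distinct feature values. The function names and the `mtry` argument (standing in for the column subsampling parameter) are hypothetical illustrations, not the speakers' code or any library's API. The toy data use a step-function response, a simple discontinuous regression function.

```python
import random


def _sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def impurity_decrease(column, y, threshold):
    """Sample CART criterion: reduction in mean squared error of the cell
    when it is split into {column <= threshold} and {column > threshold}."""
    n = len(y)
    left = [yi for xi, yi in zip(column, y) if xi <= threshold]
    right = [yi for xi, yi in zip(column, y) if xi > threshold]
    return (_sse(y) - _sse(left) - _sse(right)) / n


def best_split(X, y, mtry, rng):
    """Best (feature, threshold, decrease) among `mtry` randomly drawn columns.

    X is a list of rows. `mtry` is a hypothetical name for the column
    subsampling parameter: only that many features are considered per split.
    Returns (None, None, 0.0) if no split reduces the impurity."""
    p = len(X[0])
    candidates = rng.sample(range(p), mtry)
    best = (None, None, 0.0)
    for j in candidates:
        column = [row[j] for row in X]
        values = sorted(set(column))
        # Candidate cut points: midpoints between consecutive distinct values.
        for a, b in zip(values, values[1:]):
            tau = (a + b) / 2
            gain = impurity_decrease(column, y, tau)
            if gain > best[2]:
                best = (j, tau, gain)
    return best


# Toy data: feature 0 drives a step (discontinuous) response,
# feature 1 is an alternating 0/1 pattern that is only weakly related to y.
X = [[i / 10, i % 2] for i in range(10)]
y = [1.0 if row[0] > 0.45 else 0.0 for row in X]

feature, tau, gain = best_split(X, y, mtry=2, rng=random.Random(0))
print(feature, tau, gain)
```

With both columns considered, the split lands on feature 0 at the jump, where the impurity decrease is largest. In a full forest this search would be repeated recursively in each cell of each tree, with fresh column draws at every split.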

Speaker Biography
Yingying Fan is Centennial Chair in Business Administration and Professor in the Data Sciences and Operations Department at USC Marshall, and Professor of Economics at USC. She received her Ph.D. in Operations Research and Financial Engineering from Princeton University in 2007. She was a Lecturer in the Department of Statistics at Harvard University and received the Royal Statistical Society Guy Medal in Bronze. Her research interests include statistics, data science, machine learning, economics, big data and business applications, and artificial intelligence and blockchain. Her papers have been published in journals in statistics, economics, computer science, information theory, and biology.