Towards Better Policies in Sequential Decision Making: A Robust Test for Stationarity
Posted: 2024-07-04
  • Speaker: Zhenke Wu (University of Michigan)
  • Time: July 23, 2024, 10:00 AM (Beijing time)
  • Venue: Room 1417, Administration Building, Zijingang Campus, Zhejiang University
  • Host: Data Science Research Center, Zhejiang University


Abstract: Reinforcement learning (RL) is a powerful technique that allows an autonomous agent to learn a policy that maximizes the expected return. The optimality guarantees of many RL algorithms rely on the stationarity assumption, which requires time-invariant state-transition and reward functions. However, in real-world applications such as robotics control, healthcare, and digital marketing, the environment often deviates from stationarity over extended periods, and policies learned under the stationarity assumption can then be sub-optimal. We propose a doubly-robust procedure for testing the stationarity assumption and detecting change points in offline RL settings, e.g., using data obtained from a completed sequentially randomized trial. The proposed test is robust to model misspecification and effectively controls the type-I error while achieving high statistical power, especially in high-dimensional settings. I will use an interventional mobile health study, the largest to date in the US, to illustrate the advantages of our method in detecting change points and optimizing long-term rewards in high-dimensional, non-stationary environments.
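
To make the testing problem concrete, below is a minimal sketch in Python/NumPy of a change-point test on offline trajectory data. It is not the doubly-robust procedure presented in the talk: it uses a naive CUSUM-style mean-shift statistic computed on rewards alone, calibrated by permutation (which assumes exchangeability under the null and ignores temporal dependence and state transitions). All names and parameter values are illustrative.

```python
# Illustrative sketch only: a naive CUSUM-style change-point test on
# offline rewards, NOT the doubly-robust test discussed in the talk.
import numpy as np

rng = np.random.default_rng(0)

# Simulated offline data: N trajectories of length T, with the mean
# reward shifting at an (unknown to the test) change point t* = 60.
N, T, t_star = 50, 100, 60
rewards = rng.normal(0.0, 1.0, size=(N, T))
rewards[:, t_star:] += 0.3  # non-stationarity: reward drift after t*

def max_cusum(r):
    """Max over candidate change points u of the mean-shift statistic
    |mean(r[:, :u]) - mean(r[:, u:])| * sqrt(u * (T - u) / T)."""
    T = r.shape[1]
    stats = []
    for u in range(10, T - 10):  # trim boundaries for stability
        left, right = r[:, :u].mean(), r[:, u:].mean()
        stats.append(abs(left - right) * np.sqrt(u * (T - u) / T))
    return max(stats), 10 + int(np.argmax(stats))

obs_stat, est_cp = max_cusum(rewards)

# Permutation calibration: shuffling the time indices destroys any
# change point while preserving the marginal reward distribution.
# (This calibration is only valid under exchangeability; it is a
# simplification relative to the doubly-robust procedure.)
B = 500
null_stats = []
for _ in range(B):
    perm = rng.permutation(T)
    null_stats.append(max_cusum(rewards[:, perm])[0])
p_value = (1 + sum(s >= obs_stat for s in null_stats)) / (B + 1)

print(f"estimated change point: t = {est_cp}, p-value = {p_value:.3f}")
```

A small p-value rejects stationarity, and the maximizing index serves as a crude change-point estimate; a practical pipeline would then re-learn the policy using only post-change data.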


Speaker Bio:

Zhenke Wu is an Associate Professor of Biostatistics at the University of Michigan, Ann Arbor. His research involves the development of statistical methods that inform health decisions made by individuals. He is particularly interested in scalable Bayesian methods that integrate multiple sources of evidence, with a focus on hierarchical latent variable modeling. He also works on sequential decision making, developing new statistical tools for reinforcement learning and micro-randomized trials.