Towards Better Policies in Sequential Decision Making: A Robust Test for Stationarity
Posted: 2024-07-04
  • Speaker: Zhenke Wu (University of Michigan)
  • Time: July 23, 2024, 10:00 AM (Beijing time)
  • Venue: Room 1417, Administration Building, Zijingang Campus, Zhejiang University
  • Host: Data Science Research Center, Zhejiang University


Abstract: Reinforcement learning (RL) is a powerful technique that allows an autonomous agent to learn a policy that maximizes the expected return. The optimality guarantees of many RL algorithms rely on the stationarity assumption, which requires time-invariant state-transition and reward functions. However, in real-world applications such as robotics control, healthcare, and digital marketing, the environment often deviates from stationarity over extended periods, and policies learned under the stationarity assumption can then be sub-optimal. We propose a doubly-robust procedure for testing the stationarity assumption and detecting change points in offline RL settings, e.g., using data obtained from a completed sequentially randomized trial. The proposed test is robust to model misspecification and effectively controls the type-I error while achieving high statistical power, especially in high-dimensional settings. I will use an interventional mobile health study, the largest to date in the US, to illustrate the advantages of our method in detecting change points and optimizing long-term rewards in high-dimensional, non-stationary environments.
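
To make the testing problem concrete, below is a minimal sketch in Python/NumPy of a change-point test on offline trajectory data. It is not the doubly-robust procedure presented in the talk: it uses a naive CUSUM-style mean-shift statistic computed on rewards alone, calibrated by permutation (which assumes exchangeability under the null and ignores temporal dependence and state transitions). All names and parameter values are illustrative.

```python
# Illustrative sketch only: a naive CUSUM-style change-point test on
# offline rewards, NOT the doubly-robust test discussed in the talk.
import numpy as np

rng = np.random.default_rng(0)

# Simulated offline data: N trajectories of length T, with the mean
# reward shifting at an (unknown to the test) change point t* = 60.
N, T, t_star = 50, 100, 60
rewards = rng.normal(0.0, 1.0, size=(N, T))
rewards[:, t_star:] += 0.3  # non-stationarity: reward drift after t*

def max_cusum(r):
    """Max over candidate change points u of the mean-shift statistic
    |mean(r[:, :u]) - mean(r[:, u:])| * sqrt(u * (T - u) / T)."""
    T = r.shape[1]
    stats = []
    for u in range(10, T - 10):  # trim boundaries for stability
        left, right = r[:, :u].mean(), r[:, u:].mean()
        stats.append(abs(left - right) * np.sqrt(u * (T - u) / T))
    return max(stats), 10 + int(np.argmax(stats))

obs_stat, est_cp = max_cusum(rewards)

# Permutation calibration: shuffling the time indices destroys any
# change point while preserving the marginal reward distribution.
# (This calibration is only valid under exchangeability; it is a
# simplification relative to the doubly-robust procedure.)
B = 500
null_stats = []
for _ in range(B):
    perm = rng.permutation(T)
    null_stats.append(max_cusum(rewards[:, perm])[0])
p_value = (1 + sum(s >= obs_stat for s in null_stats)) / (B + 1)

print(f"estimated change point: t = {est_cp}, p-value = {p_value:.3f}")
```

A small p-value rejects stationarity, and the maximizing index serves as a crude change-point estimate; a practical pipeline would then re-learn the policy using only post-change data.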


Speaker Bio:

Zhenke Wu is an Associate Professor of Biostatistics at the University of Michigan, Ann Arbor. His research involves the development of statistical methods that inform health decisions made by individuals. He is particularly interested in scalable Bayesian methods that integrate multiple sources of evidence, with a focus on hierarchical latent variable modeling. He also works on sequential decision making, developing new statistical tools for reinforcement learning and micro-randomized trials.