Rate-Optimal Online Learning for Dynamic Assortment Selection with Positioning
Posted: 2024-03-25
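  • Speaker: Yiyun Luo (骆钇澐), Shanghai University of Finance and Economics
  • Time: April 9, 2024, 15:30 (Beijing Time)
  • Venue: Lecture Hall 1417, Administration Building, Zijingang Campus, Zhejiang University
  • Host: Center for Data Science, Zhejiang University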

Abstract: In online retailing, the seller aims to offer an assortment of items that maximizes expected revenue. In this work, we introduce a new online learning problem, Dynamic Assortment Selection with Positioning (DAP), which additionally considers where each item is placed within the assortment. Specifically, customers make purchases according to a multinomial logit (MNL) choice model in which an item's attractiveness is the product of its position effect and an unknown preference parameter. Our objective is to maximize revenue over a finite horizon of T rounds. We first show that any assortment-only algorithm that ignores position effects incurs linear regret. To close this gap, we propose the Truncated Linear Regression Upper Confidence Bound (TLR-UCB) policy. TLR-UCB exploits a novel geometric, linear-bandit-type feedback structure to construct upper confidence bounds (UCBs) for the unknown preference parameters, accommodating both random and adaptive position effects. To ensure that these UCBs are valid, TLR-UCB truncates the conditional geometric responses before applying linear regression. Theoretically, we establish an O(√T) regret upper bound for TLR-UCB, which matches the regret lower bound we derive for the DAP problem. Extensive experiments demonstrate the superior performance of TLR-UCB, which comes from incorporating position effects into the dynamic assortment selection process.
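To make the choice model in the abstract concrete, here is a minimal sketch, assuming a standard MNL model in which item i shown in position k has attractiveness alpha[k] * theta[i] and the no-purchase option has attractiveness 1. The names `choice_probabilities`, `expected_revenue`, and `best_placement`, and all numbers, are hypothetical. The brute-force search is only a full-information oracle for tiny instances; it is not TLR-UCB and does not learn anything.

```python
import itertools


def choice_probabilities(placement, theta, alpha):
    """MNL choice probabilities for a positioned assortment.

    placement: list of (position k, item i) pairs; each position and item used once.
    Item i in position k has attractiveness alpha[k] * theta[i] (assumed model);
    the no-purchase option has attractiveness 1.
    Returns (purchase probabilities in placement order, no-purchase probability).
    """
    weights = [alpha[k] * theta[i] for k, i in placement]
    denom = 1.0 + sum(weights)
    return [w / denom for w in weights], 1.0 / denom


def expected_revenue(placement, theta, alpha, price):
    """Expected revenue from a single customer facing this placement."""
    probs, _ = choice_probabilities(placement, theta, alpha)
    return sum(p * price[i] for p, (_, i) in zip(probs, placement))


def best_placement(theta, alpha, price):
    """Oracle: try every item subset, every position subset, and every ordering."""
    best, best_rev = [], 0.0
    n_items, n_pos = len(theta), len(alpha)
    for size in range(1, min(n_items, n_pos) + 1):
        for positions in itertools.combinations(range(n_pos), size):
            for items in itertools.permutations(range(n_items), size):
                placement = list(zip(positions, items))
                rev = expected_revenue(placement, theta, alpha, price)
                if rev > best_rev:
                    best, best_rev = placement, rev
    return best, best_rev


theta = [1.0, 0.5]  # hypothetical preference parameters
alpha = [1.0, 0.2]  # hypothetical position effects (slot 0 is stronger)
price = [1.0, 2.0]  # hypothetical prices

print(best_placement(theta, alpha, price))  # ([(0, 1), (1, 0)], 0.70588...)
print(expected_revenue([(0, 0), (1, 1)], theta, alpha, price))  # 0.571428...
```

In this toy instance, showing the same two items in the opposite order lowers expected revenue per customer from about 0.706 to about 0.571. A policy that picks the right set but ignores placement can therefore lose a constant amount every round, which is consistent with the abstract's point that assortment-only algorithms incur linear regret.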
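The phrase "geometric feedback" echoes a known fact from epoch-based MNL bandits. Suppose the same assortment is offered to successive customers until one buys nothing. Then the number of purchases of a given item in that epoch is geometric with mean equal to its attractiveness, here alpha[k] * theta[i], which is linear in theta[i]. The sketch below is a hypothetical illustration of that idea only, not the speaker's TLR-UCB. It assumes the position effects alpha are known, clips each count at a hypothetical `cap` as a simple stand-in for the truncation step (whose exact form the abstract does not give), and estimates one theta by least squares through the origin. It omits the confidence bounds and adaptive placement that the actual policy uses. All function names and numbers are hypothetical.

```python
import random


def simulate_epoch(placement, theta, alpha, rng):
    """Offer one placement to successive customers until someone buys nothing.

    placement: list of (position k, item i) pairs. Returns {item: purchase count}.
    Assumed MNL model: attractiveness alpha[k] * theta[i], outside option 1.
    """
    weights = [alpha[k] * theta[i] for k, i in placement]
    total = 1.0 + sum(weights)
    counts = {i: 0 for _, i in placement}
    while True:
        u = rng.random() * total
        if u < 1.0:  # outside option chosen: the epoch ends
            return counts
        u -= 1.0
        for w, (_, i) in zip(weights, placement):
            if u < w:
                counts[i] += 1
                break
            u -= w
        else:  # guard against floating-point round-off at the upper edge
            counts[placement[-1][1]] += 1


def truncated_ls_estimate(observations, cap):
    """Least squares through the origin of min(y, cap) on alpha.

    observations: list of (alpha_k, y), where y is one item's count in an epoch
    and E[y] = alpha_k * theta for that item.
    """
    num = sum(a * min(y, cap) for a, y in observations)
    den = sum(a * a for a, _ in observations)
    return num / den


def demo(n_epochs=20000, seed=0, cap=10):
    theta = [0.5, 0.8]  # hypothetical true preferences (unknown to the learner)
    alpha = [1.0, 0.5]  # hypothetical position effects, assumed known here
    placements = [[(0, 0), (1, 1)], [(0, 1), (1, 0)]]
    rng = random.Random(seed)
    obs = []
    for t in range(n_epochs):
        placement = placements[t % 2]  # alternate so item 0 visits both positions
        counts = simulate_epoch(placement, theta, alpha, rng)
        k0 = next(k for k, i in placement if i == 0)
        obs.append((alpha[k0], counts[0]))
    return truncated_ls_estimate(obs, cap)


print(demo())  # an estimate close to the true theta[0] = 0.5
```

With a cap of 10, the clipping barely biases the counts in this toy setting, because a count above 10 is very unlikely when the mean is at most 0.5. The cap matters more when attractiveness can be large and the count distribution has a heavy right tail.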


Speaker Bio:


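Dr. Yiyun Luo is a faculty member at the School of Statistics and Management, Shanghai University of Finance and Economics, working mainly on online learning, bandit algorithms, dynamic pricing, and dynamic assortment. This work has appeared in leading international journals and conferences, including Mathematics of Operations Research, NeurIPS, and the Canadian Journal of Statistics. Dr. Luo studied at Peking University and then at the University of North Carolina at Chapel Hill.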