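In the era of big data, observational studies have become an important source of data for inferring causal relationships in scientific research. Observational data often contain unmeasured confounders and missing data, which can bias causal inference and lead to wrong decisions. The instrumental variable method is one of the most effective ways to adjust for unmeasured confounders and is widely used in economics, finance, medicine, and biology. However, the validity of an instrument cannot be verified from observed data, so causal conclusions based on instrumental variables are often questioned. This short course reviews several fundamental methods for observational studies and introduces new and effective instrumental variable methods, proximal inference, and methods for analyzing data missing not at random, together with applications in biomedicine, epidemiology, and socioeconomics.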
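Instructors:
Zijian Guo (郭子剑), Associate Professor, Rutgers University
Wang Miao (苗旺), Assistant Professor, Peking University
Time: June 24 to June 27, 9:00-12:00, four sessions in total
Location: Center for Data Science, Zhejiang University, Administration Building, Room 1417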
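This short course will discuss causal inference for observational studies from the following perspectives:
Part 1: Foundations of causal inference, instrumental variables, weak and invalid instruments
1. Foundations of causal inference: unmeasured confounder bias and its consequences, introduced through potential outcomes and structural equation models.
2. Foundations of instrumental variables: the instrumental variable assumptions, the two-stage least squares estimator, and the control function approach (Chapter 5 of [1] and [2]).
3. Statistical inference with weak instrumental variables [3]: concentration parameters, the Anderson-Rubin test, and the conditional likelihood ratio test.
4. Valid inference in the presence of invalid instrumental variables [4].
5. Post-selection inference problems caused by instrument selection, and their solutions [5].
6. Statistical inference with high-dimensional endogenous variables [6].
7. Effective use of machine learning methods in observational studies [7].
8. Applications of instrumental variable methods in economics and medicine (including but not limited to Mendelian randomization), and software usage [8, 9].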
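As a small illustration of items 1 and 2 above, the following sketch simulates unmeasured confounder bias and compares ordinary least squares with a two-stage least squares estimate. It assumes a linear model with a single valid instrument; the data-generating process, the coefficient values, and the function names (`simulate`, `ols_fit`, `two_stage_least_squares`) are hypothetical choices made for this example and are not taken from the course materials or the referenced software.

```python
import numpy as np


def simulate(n=20000, beta=2.0, seed=0):
    """Simulate (z, d, y) under an assumed linear model with a hidden confounder u."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                        # instrument: affects y only through d
    u = rng.normal(size=n)                        # unmeasured confounder
    d = 1.0 * z + u + rng.normal(size=n)          # treatment depends on z and u
    y = beta * d + 2.0 * u + rng.normal(size=n)   # outcome depends on d and u
    return z, d, y


def ols_fit(x, y):
    """Return (intercept, slope) from a least squares fit of y on x."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def two_stage_least_squares(z, d, y):
    """Stage 1: regress d on z. Stage 2: regress y on the fitted values of d."""
    intercept, slope = ols_fit(z, d)
    d_hat = intercept + slope * z
    return ols_fit(d_hat, y)[1]


z, d, y = simulate()
beta_ols = ols_fit(d, y)[1]
beta_2sls = two_stage_least_squares(z, d, y)
print(f"true effect: 2.0, OLS: {beta_ols:.3f}, 2SLS: {beta_2sls:.3f}")
```

In this assumed design the OLS slope is biased upward because the treatment and the outcome share the confounder `u`, while the two-stage estimate uses only the variation in the treatment that is driven by the instrument. The sketch returns point estimates only; valid standard errors for two-stage least squares require the usual correction and are not computed here.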
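Part 2: Proximal inference, synthetic control, and missing data analysis
1. Adjustment for fully observed confounders
2. Difference-in-differences and synthetic control [10]
3. Proximal inference [11-13]
4. Synthetic control and test-negative designs based on proximal inference [14-15]
5. Identifiability and doubly robust inference for data missing not at random [16-18]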
References
[1] Wooldridge, J. M. (2010). Econometric analysis of cross section and panel data. MIT Press.
[2] Guo, Z., and Small, D. S. (2016). Control function instrumental variable estimation of nonlinear causal effect models. Journal of Machine Learning Research, 17(100), 1-35.
[3] Stock, J. H., Wright, J. H., and Yogo, M. (2002). A Survey of Weak Instruments and Weak Identification in Generalized Method of Moments. Journal of Business & Economic Statistics, 20(4), 518-529.
[4] Guo, Z., Kang, H., Cai, T. T., and Small, D. S. (2018). Confidence Interval for Causal Effects with Invalid Instruments using Two-Stage Hard Thresholding. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 80(4), 793-815.
[5] Guo, Z. (2021). Causal Inference with Invalid Instruments: Post-selection Problems and A Solution Using Searching and Sampling. Journal of the Royal Statistical Society: Series B (Statistical Methodology), to appear.
[6] Guo, Z., Cevid, D., and Bühlmann, P. (2022). Doubly Debiased Lasso: High-Dimensional Inference under Hidden Confounding. Annals of Statistics, 50(3), 1320-1347.
[7] Guo, Z. and Bühlmann, P. (2022). Causal Inference with Invalid Instruments: Exploring Nonlinear Treatment Models with Machine Learning. arXiv preprint arXiv:2203.12808.
[8] Yao, M., Guo, Z., and Liu, Z. (2023). Selecting Valid Genetic Instruments and Constructing Robust Confidence Intervals for Two-sample Mendelian Randomization Using Genome-wide Summary Statistics. medRxiv, 2023.02.20.23286200.
[9] Koo, T., Lee, Y., Small, D. S., and Guo, Z. (2023). RobustIV and controlfunctionIV: Causal Inference for Linear and Nonlinear Models with Invalid Instrumental Variables. arXiv preprint arXiv:2301.04412.
[10] Abadie, A., Diamond, A., and Hainmueller, J. (2010). Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California’s Tobacco Control Program. Journal of the American Statistical Association, 105(490), 493–505.
[11] Miao, W., Geng, Z., and Tchetgen Tchetgen, E. (2018). Identifying causal effects with proxy variables of an unmeasured confounder. Biometrika, 105, 987–993.
[12] Shi, X., Miao, W., Nelson, J. C., and Tchetgen Tchetgen, E. (2020). Multiply robust causal inference with double negative control adjustment for categorical unmeasured confounding. Journal of the Royal Statistical Society: Series B, 82, 521–540.
[13] Cui, Y., Pu, H., Shi, X., Miao, W., and Tchetgen Tchetgen, E. (2023). Semiparametric proximal causal inference. Journal of the American Statistical Association.
[14] Li, K. Q., Shi, X., Miao, W., and Tchetgen Tchetgen, E. (2023). Double negative control inference in test-negative design studies of vaccine effectiveness. Journal of the American Statistical Association.
[15] Shi, X., Miao, W., Hu, M., and Tchetgen Tchetgen, E. (2022). Theory for identification and Inference with Synthetic Controls: A Proximal Causal Inference Framework.
[16] Miao, W., Ding, P., and Geng, Z. (2016). Identifiability of normal and normal mixture models with nonignorable missing data. Journal of the American Statistical Association, 111, 1673–1683.
[17] Miao, W. and Tchetgen Tchetgen, E. (2016). On varieties of doubly robust estimators under missingness not at random with a shadow variable. Biometrika, 103, 475–482.
[18] Miao, W., Li, X., and Sun, B. (2022). A stableness of resistance model for nonresponse adjustment with callback.