EnsembleIV: Creating Instrumental Variables from Ensemble Learners for Robust Statistical Inference with ML-Generated Variables

Published Online: https://doi.org/10.1287/mnsc.2024.08999

References

  • Allon G, Chen D, Jiang Z, Zhang D (2023) Machine learning and prediction errors in causal inference. Preprint, submitted June 15, https://doi.org/10.2139/ssrn.4480696.
  • Andrews I, Stock JH, Sun L (2019) Weak instruments in instrumental variables regression: Theory and practice. Annu. Rev. Econom. 11:727–753.
  • Angrist JD, Pischke JS (2008) Mostly Harmless Econometrics: An Empiricist’s Companion (Princeton University Press, Princeton, NJ).
  • Belloni A, Chen D, Chernozhukov V, Hansen C (2012) Sparse models and methods for optimal instruments with an application to eminent domain. Econometrica 80(6):2369–2429.
  • Bound J, Jaeger DA, Baker RM (1995) Problems with instrumental variables estimation when the correlation between the instruments and the endogenous explanatory variable is weak. J. Amer. Statist. Assoc. 90(430):443–450.
  • Breiman L (2001) Random forests. Machine Learn. 45(1):5–32.
  • Buzas JS, Stefanski LA (1996) Instrumental variable estimation in generalized linear measurement error models. J. Amer. Statist. Assoc. 91(435):999–1006.
  • Carroll RJ, Ruppert D, Stefanski LA (1995) Measurement Error in Nonlinear Models, vol. 105 (CRC Press, Boca Raton, FL).
  • Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM (2006) Measurement Error in Nonlinear Models: A Modern Perspective (Chapman and Hall/CRC).
  • Cengiz D, Dube A, Lindner A, Zentler-Munro D (2022) Seeing beyond the trees: Using machine learning to estimate the impact of minimum wages on labor market outcomes. J. Labor Econom. 40(S1):S203–S247.
  • Chen T, Guestrin C (2016) XGBoost: A scalable tree boosting system. Krishnapuram B, Shah M, Smola A, Aggarwal C, Shen D, Rastogi R, eds. Proc. 22nd ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (ACM, New York), 785–794.
  • Chernozhukov V, Chetverikov D, Demirer M, Duflo E, Hansen C, Newey W, Robins J (2018) Double/debiased machine learning for treatment and structural parameters. Econom. J. 21(1):C1–C68.
  • Conley TG, Hansen CB, Rossi PE (2012) Plausibly exogenous. Rev. Econom. Statist. 94(1):260–272.
  • Cragg JG, Donald SG (1993) Testing identifiability and specification in instrumental variable models. Econom. Theory 9(2):222–240.
  • Davidson R, MacKinnon JG (2006) The power of bootstrap and asymptotic tests. J. Econom. 133(2):421–441.
  • Davidson R, MacKinnon JG (2008) Bootstrap inference in a linear equation estimated by instrumental variables. Econom. J. 11(3):443–477.
  • Devlin J, Chang MW, Lee K, Toutanova K (2018) BERT: Pre-training of deep bidirectional transformers for language understanding. Preprint, submitted October 11, https://arxiv.org/abs/1810.04805.
  • Donald SG, Newey WK (2001) Choosing the number of instruments. Econometrica 69(5):1161–1191.
  • Fanaee-T H, Gama J (2014) Event labeling combining ensemble detectors and background knowledge. Progress Artificial Intelligence 2(2–3):113–127.
  • Fisher RA (1925) Statistical Methods for Research Workers, 5th ed. (Oliver and Boyd).
  • Fong C, Tyler M (2021) Machine learning predictions as regression covariates. Political Anal. (Oxford) 29(4):467–484.
  • Fuller WA (1977) Some properties of a modification of the limited information estimator. Econometrica 45(4):939–953.
  • Gleser LJ (1992) The importance of assessing measurement reliability in multivariate regression. J. Amer. Statist. Assoc. 87(419):696–707.
  • Goh KY, Heng CS, Lin Z (2013) Social media brand community and consumer behavior: Quantifying the relative impact of user- and marketer-generated content. Inform. Systems Res. 24(1):88–107.
  • Greene WH (2003) Econometric Analysis (Pearson Education India, Chennai, India).
  • Hall P (1992) The Bootstrap and Edgeworth Expansion (Springer, New York).
  • Hopkins D, King G (2007) Extracting systematic social science meaning from text.
  • Horowitz JL (2019) Bootstrap methods in econometrics. Annu. Rev. Econom. 11(1):193–224.
  • Hu Y, Schennach SM (2008) Instrumental variable treatment of nonclassical measurement error models. Econometrica 76(1):195–216.
  • Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, Ye Q, et al. (2017) LightGBM: A highly efficient gradient boosting decision tree. Guyon I, Von Luxburg U, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 30 (Curran Associates Inc., Red Hook, NY).
  • Kleibergen F, Paap R (2006) Generalized reduced rank tests using the singular value decomposition. J. Econom. 133(1):97–126.
  • Küchenhoff H, Mwalili SM, Lesaffre E (2006) A general method for dealing with misclassification in regression: The misclassification SIMEX. Biometrics 62(1):85–96.
  • Lee D, Hosanagar K, Nair HS (2018) Advertising content and consumer engagement on social media: Evidence from Facebook. Management Sci. 64(11):5105–5131.
  • Mehrhoff J (2009) A solution to the problem of too many instruments in dynamic panel data GMM. Bundesbank Series 1 Discussion Paper No. 2009,31, https://doi.org/10.2139/ssrn.2785360.
  • Moreno A, Terwiesch C (2014) Doing business with strangers: Reputation in online service marketplaces. Inform. Systems Res. 25(4):865–886.
  • Moro S, Cortez P, Rita P (2014) A data-driven approach to predict the success of bank telemarketing. Decision Support Systems 62:22–31.
  • Murray MP (2006) Avoiding invalid instruments and coping with weak instruments. J. Econom. Perspective 20(4):111–132.
  • Nevo A, Rosen AM (2012) Identification with imperfect instruments. Rev. Econom. Statist. 94(3):659–671.
  • Oxley L, McAleer M (1993) Econometric issues in macroeconomic models with generated regressors. J. Econom. Survey 7(1):1–40.
  • Pagan A (1984) Econometric issues in the analysis of regressions with generated regressors. Internat. Econom. Rev. (Philadelphia) 25(1):221–247.
  • Qiao M, Huang KW (2021) Correcting misclassification bias in regression models with variables generated via data mining. Inform. Systems Res. 32(2):462–480.
  • Roodman D (2009) A note on the theme of too many instruments. Oxford Bull. Econom. Statist. 71(1):135–158.
  • Stefanski LA, Cook JR (1995) Simulation-extrapolation: The measurement error jackknife. J. Amer. Statist. Assoc. 90(432):1247–1256.
  • Stock JH, Yogo M (2002) Testing for weak instruments in linear IV regression. NBER Working Paper No. 0284, National Bureau of Economic Research, Cambridge, MA.
  • Stock JH, Wright JH, Yogo M (2002) A survey of weak instruments and weak identification in generalized method of moments. J. Bus. Econom. Statist. 20(4):518–529.
  • Terza JV, Basu A, Rathouz PJ (2008) Two-stage residual inclusion estimation: Addressing endogeneity in health econometric modeling. J. Health Econom. 27(3):531–543.
  • Tirunillai S, Tellis GJ (2012) Does chatter really matter? Dynamics of user-generated content and stock performance. Marketing Sci. 31(2):198–215.
  • Wan F, Small D, Mitra N (2018) A general approach to evaluating the bias of 2-stage instrumental variable estimators. Statist. Medicine 37(12):1997–2015.
  • Wang S, McCormick TH, Leek JT (2020) Methods for correcting inference based on outcomes predicted by machine learning. Proc. Natl. Acad. Sci. USA 117(48):30266–30275.
  • Wei Y, Malik N (2022) Unstructured data, econometric models, and estimation bias. Preprint, submitted May 22, http://dx.doi.org/10.2139/ssrn.4113608.
  • Welch WJ (1990) Construction of permutation tests. J. Amer. Statist. Assoc. 85(411):693–698.
  • Wooldridge JM (2002) Econometric Analysis of Cross Section and Panel Data (MIT Press, Cambridge, MA).
  • Wu X, Nethery RC, Sabath MB, Braun D, Dominici F (2020) Air pollution and Covid-19 mortality in the United States: Strengths and limitations of an ecological regression analysis. Sci. Adv. 6(45):eabd4049.
  • Yang M, Ren Y, Adomavicius G (2019) Understanding user-generated content and customer engagement on Facebook business pages. Inform. Systems Res. 30(3):839–855.
  • Yang M, Adomavicius G, Burtch G, Ren Y (2018) Mind the gap: Accounting for measurement error and misclassification in variables generated via data mining. Inform. Systems Res. 29(1):4–24.
  • Yang M, McFowland E III, Burtch G, Adomavicius G (2022) Achieving reliable causal inference with data-mined variables: A random forest approach to the measurement error problem. INFORMS J. Data Sci. 1(2):138–155.
  • Zhang S, Lee D, Singh PV, Srinivasan K (2021) What makes a good image? Airbnb demand analytics leveraging interpretable image features. Management Sci. 68(8):5644–5666.
  • Zhang J, Xue W, Yu Y, Tan Y (2023) Debiasing machine-learning- or AI-generated regressors in partial linear models. Preprint, submitted November 17, http://dx.doi.org/10.2139/ssrn.4636026.