EnsembleIV: Creating Instrumental Variables from Ensemble Learners for Robust Statistical Inference with ML-Generated Variables

