A Robust Optimization Approach to Reliable Statistical Inference with Variables Generated by Machine Learning

Aaron Schecter (Corresponding Author)
[email protected]
https://orcid.org/0000-0002-3186-7788
Department of Management Information Systems, University of Georgia, Athens, Georgia 30602

Weifeng Li
[email protected]
https://orcid.org/0000-0002-2105-3596
Department of Management Information Systems, University of Georgia, Athens, Georgia 30602

Published Online: 24 Dec 2025
https://doi.org/10.1287/isre.2023.0340


Information Systems Research, Articles In Advance

Received: July 07, 2023
Accepted: November 15, 2025
Published Online: December 24, 2025

Cite as

Aaron Schecter, Weifeng Li (2025) A Robust Optimization Approach to Reliable Statistical Inference with Variables Generated by Machine Learning. Information Systems Research 0(0).

https://doi.org/10.1287/isre.2023.0340

Acknowledgments

A. Schecter thanks the faculty at the University of Notre Dame Department of Information Technology, Analytics, and Operations for their helpful feedback. The authors thank the anonymous reviewers, associate editor, and senior editor for their constructive feedback.
