Open Access

Robust Predictive Modeling Under Unseen Data Distribution Shifts: A Methodological Commentary

Hanyu Duan
https://orcid.org/0000-0003-2219-7759
Department of Information Systems, Business Statistics and Operations Management, School of Business and Management, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong

Yi Yang
https://orcid.org/0000-0001-8863-112X
Department of Information Systems, Business Statistics and Operations Management, School of Business and Management, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong

Ahmed Abbasi (Corresponding Author)
https://orcid.org/0000-0001-7698-7794
Human-centered Analytics Lab and Department of IT, Analytics, and Operations, Mendoza College of Business, University of Notre Dame, Notre Dame, Indiana 46556

Kar Yan Tam
https://orcid.org/0000-0003-3242-0184
Department of Information Systems, Business Statistics and Operations Management, School of Business and Management, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong

Published Online: 23 Mar 2026
https://doi.org/10.1287/isre.2022.0537

Information Systems Research, Articles in Advance

Received: September 23, 2022
Accepted: January 11, 2026
Published Online: March 23, 2026

Cite as

Hanyu Duan, Yi Yang, Ahmed Abbasi, Kar Yan Tam (2026) Robust Predictive Modeling Under Unseen Data Distribution Shifts: A Methodological Commentary. Information Systems Research 0(0).

https://doi.org/10.1287/isre.2022.0537

Acknowledgments

The authors gratefully thank the senior editor, the associate editor, and anonymous reviewers for their constructive advice and guidance.
