Reference Aware Delexicalization (RAD) Framework: Theory Driven Artificial Intelligence Modeling for Domain Generalization

Sandeep Suntwal (Corresponding Author)
[email protected]
https://orcid.org/0000-0002-7746-7114
Information Systems Department, College of Business, University of Colorado, Colorado Springs, Colorado 80918

Susan A. Brown
[email protected]
https://orcid.org/0000-0002-9484-4428
Management Information Systems Department, Eller College of Management, University of Arizona, Tucson, Arizona 85721

Published Online: 5 Mar 2026
https://doi.org/10.1287/isre.2023.0457

Information Systems Research, Articles In Advance

Information

Received: August 01, 2023
Accepted: November 04, 2025
Published Online: March 05, 2026

Cite as

Sandeep Suntwal, Susan A. Brown (2026) Reference Aware Delexicalization (RAD) Framework: Theory Driven Artificial Intelligence Modeling for Domain Generalization. Information Systems Research 0(0).

https://doi.org/10.1287/isre.2023.0457

Acknowledgments

The authors thank the senior editor, associate editor, and the anonymous reviewers for their thoughtful guidance and insights.
