Reference Aware Delexicalization (RAD) Framework: Theory Driven Artificial Intelligence Modeling for Domain Generalization
Published Online: 5 Mar 2026
https://doi.org/10.1287/isre.2023.0457

