Reference Aware Delexicalization (RAD) Framework: Theory Driven Artificial Intelligence Modeling for Domain Generalization

Published Online: https://doi.org/10.1287/isre.2023.0457
