Align Generative Artificial Intelligence with Human Preferences: A Novel Large Language Model Fine-Tuning Method for Online Review Management

Yanan Wang
Department of Information Systems and Operations Management, The University of Texas at Arlington, Arlington, Texas 76019

Yong Ge (Corresponding Author)
https://orcid.org/0000-0002-9630-795X
Department of Management Information Systems, University of Arizona, Tucson, Arizona 85721

Published Online: 20 Apr 2026
https://doi.org/10.1287/isre.2024.1518

References

Abbasi A, Parsons J, Pant G, Sheng ORL, Sarker S (2024) Pathways for design research on artificial intelligence. Inform. Systems Res. 35(2):441–459.
Abdin M, Jacobs SA, Awan AA, Aneja J, Awadallah A, Awadalla H, Bach N, et al. (2024) Phi-3 technical report: A highly capable language model locally on your phone. Preprint, submitted August 30, http://arxiv.org/abs/2404.14219.
Achiam J, Adler S, Agarwal S, Ahmad L, Akkaya I, Aleman FL, Almeida D, et al. (2024) GPT-4 technical report. Preprint, submitted March 4, http://arxiv.org/abs/2303.08774.
AdVon Commerce (2024) Importance of fine-tuning AI models for real-world applications. Retrieved September 7, https://www.advoncommerce.com/topics/fine-tuning-ai.
Amini A, Vieira T, Cotterell R (2024) Direct preference optimization with an offset. Ku LW, Martins A, Srikumar V, eds. Findings Assoc. Comput. Linguistics ACL 2024 (Association for Computational Linguistics, Dublin, Ireland), 9954–9972.
Azar MG, Guo ZD, Piot B, Munos R, Rowland M, Valko M, Calandriello D (2024) A general theoretical paradigm to understand learning from human preferences. Proc. 27th Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 4447–4455.
Azov G, Pelc T, Alon AF, Kamhi G (2024) Self-improving customer review response generation based on LLMs. Preprint, submitted May 6, http://arxiv.org/abs/2405.03845.
Bengio Y, Louradour J, Collobert R, Weston J (2009) Curriculum learning. Proc. 26th Annual Internat. Conf. Machine Learn. ICML ’09 (Association for Computing Machinery, New York), 41–48.
Bowman SR (2023) Eight things to know about large language models. Preprint, submitted April 2, http://arxiv.org/abs/2304.00612.
Bradley RA, Terry ME (1952) Rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika 39(3/4):324–345.
Chaudhari S, Aggarwal P, Murahari V, Rajpurohit T, Kalyan A, Narasimhan K, Deshpande A, da Silva BC (2024) RLHF deciphered: A critical analysis of reinforcement learning from human feedback for LLMs. Preprint, submitted April 16, http://arxiv.org/abs/2404.08555.
Chen Y, Liu Y, Meng F, Chen Y, Xu J, Zhou J (2023a) Improving translation faithfulness of large language models via augmenting instructions. Preprint, submitted August 24, http://arxiv.org/abs/2308.12674.
Chen L, Li S, Yan J, Wang H, Gunaratna K, Yadav V, Tang Z, et al. (2023b) AlpaGasus: Training a better alpaca with fewer data. Twelfth Internat. Conf. Learn. Representations (ICLR, Vienna, Austria).
Chevalier JA, Mayzlin D (2006) The effect of word of mouth on sales: Online book reviews. J. Marketing Res. 43(3):345–354.
Chung J, Kastner K, Dinh L, Goel K, Courville A, Bengio Y (2016) A recurrent latent variable model for sequential data. Preprint, submitted April 6, http://arxiv.org/abs/1506.02216.
Claeys AS, Cauberghe V (2014) What makes crisis response strategies work? The impact of crisis involvement and message framing. J. Bus. Res. 67(2):182–189.
Colburn L (2024) Risks from AI hallucinations and how to avoid them. Persado (January 11), https://www.persado.com/articles/ai-hallucinations/.
Deng C, Ravichandran T (2024) Managerial response to online positive reviews: Helpful or harmful? Inform. Systems Res. 35(4):1802–1823.
Ethayarajh K, Xu W, Muennighoff N, Jurafsky D, Kiela D (2024) Model alignment as prospect theoretic optimization. Proc. 41st Internat. Conf. Machine Learn. (PMLR, New York), 12634–12651.
Farooq U, Siddique AB, Jamour F, Zhao Z, Hristidis V (2020) App-aware response synthesis for user reviews. 2020 IEEE Internat. Conf. Big Data (IEEE, Atlanta), 699–708.
Fujimoto S, Meger D, Precup D (2019) Off-policy deep reinforcement learning without exploration. Proc. 36th Internat. Conf. Machine Learn. (PMLR, New York), 2052–2062.
Gabbianelli G, Neu G, Papini M (2024) Importance-weighted offline learning done right. Proc. 35th Internat. Conf. Algorithmic Learn. Theory (PMLR, New York), 614–634.
Gao Y, Xiong Y, Gao X, Jia K, Pan J, Bi Y, Dai Y, Sun J, Wang M, Wang H (2024) Retrieval-augmented generation for large language models: A survey. Preprint, submitted March 27, http://arxiv.org/abs/2312.10997.
Gourav M (2024) Transform your base and chat suggestions with fine-tuned LLM. Retrieved September 7, https://convin.ai/en-us/blog/fine-tuned-large-language-model.
Gronroos C (1988) Service quality: The six criteria of good perceived service. Rev. Bus. 9(3):10–13.
Gu B, Ye Q (2014) First step in social media: Measuring the influence of online management responses on customer satisfaction. Production Oper. Management 23(4):570–582.
Howarth J (2023) 81 Online review statistics (new 2024 data). Retrieved September 6, 2024, https://explodingtopics.com/blog/online-review-stats.
Huang L, Yu W, Ma W, Zhong W, Feng Z, Wang H, Chen Q, et al. (2024) A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. Preprint, submitted November 19, http://arxiv.org/abs/2311.05232.
Ivison H, Wang Y, Pyatkin V, Lambert N, Peters M, Dasigi P, Jang J, et al. (2023) Camels in a changing climate: Enhancing LM adaptation with Tulu 2. Preprint, submitted November 20, http://arxiv.org/abs/2311.10702.
Ji Z, Lee N, Frieske R, Yu T, Su D, Xu Y, Ishii E, Bang YJ, Madotto A, Fung P (2023) Survey of hallucination in natural language generation. ACM Comput. Surveys 55(12):248.
Karatepe OM (2006) Customer complaints and organizational responses: The effects of complainants’ perceptions of justice on satisfaction and loyalty. Internat. J. Hospitality Management 25(1):69–90.
Kew T, Volk M (2022) Improving specificity in review response generation with data-driven data filtering. Proc. Fifth Workshop E-Commerce NLP ECNLP 5 (Association for Computational Linguistics, Dublin, Ireland), 121–133.
Kew T, Amsler M, Ebling S (2020) Benchmarking automated review response generation for the hospitality domain. Proc. Workshop Natural Language Processing E-Commerce (Association for Computational Linguistics, Barcelona, Spain), 43–52.
Kingma DP, Welling M (2014) Auto-encoding variational Bayes. Preprint, submitted May 1, https://arxiv.org/abs/1312.6114v10.
Kumar A (2019) Data-driven deep reinforcement learning. Retrieved June 18, 2024, http://bair.berkeley.edu/blog/2019/12/05/bear/.
Kumar N, Qiu L, Kumar S (2018) Exit, voice, and response on digital platforms: An empirical investigation of online management response strategies. Inform. Systems Res. 29(4):849–870.
Kumar A, Fu J, Soh M, Tucker G, Levine S (2019) Stabilizing off-policy Q-learning via bootstrapping error reduction. Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, eds. Adv. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY).
Levine S, Kumar A, Tucker G, Fu J (2020) Offline reinforcement learning: Tutorial, review, and perspectives on open problems. Preprint, submitted November 1, http://arxiv.org/abs/2005.01643.
Liu XY, Wang G, Yang H, Zha D (2023a) FinGPT: Democratizing internet-scale data for financial large language models. Preprint, submitted November 14, https://arxiv.org/abs/2307.10485v2.
Liu X, Zheng Y, Du Z, Ding M, Qian Y, Yang Z, Tang J (2023b) GPT understands, too. Preprint, submitted October 25, http://arxiv.org/abs/2103.10385.
Madaan A, Tandon N, Gupta P, Hallinan S, Gao L, Wiegreffe S, Alon U, et al. (2023) Self-refine: Iterative refinement with self-feedback. Oh A, Naumann T, Globerson A, Saenko K, Hardt M, Levine S, eds. Adv. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY), 46534–46594.
Maleki N, Padmanabhan B, Dutta K (2024) AI hallucinations: A misnomer worth clarifying. Preprint, submitted January 9, http://arxiv.org/abs/2401.06796.
Miller JL, Craighead CW, Karwan KR (2000) Service recovery: A framework and empirical investigation. J. Oper. Management 18(4):387–400.
MyScale (2024) Prompt engineering vs fine-tuning vs RAG. Retrieved September 7, https://myscale.com/blog/prompt-engineering-vs-finetuning-vs-rag/.
OpenAI (2023) DALL·E 3. Retrieved September 7, https://openai.com/index/dall-e-3/.
OpenAI (2024) Video generation models as world simulators. Retrieved September 7, https://openai.com/index/video-generation-models-as-world-simulators/.
Ouyang L, Wu J, Jiang X, Almeida D, Wainwright C, Mishkin P, Zhang C, et al. (2022) Training language models to follow instructions with human feedback. Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A, eds. Adv. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY), 27730–27744.
Padmanabhan B, Fang X, Sahoo N, Burton-Jones A III (2022) Machine learning in information systems research. MIS Quart. 46(1):iii–xix.
Panickssery A, Bowman SR, Feng S (2024) LLM evaluators recognize and favor their own generations. Preprint, submitted April 15, http://arxiv.org/abs/2404.13076.
Peng XB, Kumar A, Zhang G, Levine S (2019) Advantage-weighted regression: Simple and scalable off-policy reinforcement learning. Preprint, submitted October 7, http://arxiv.org/abs/1910.00177.
Pomerleau DA (1991) Efficient training of artificial neural networks for autonomous navigation. Neural Comput. 3(1):88–97.
Rafailov R, Sharma A, Mitchell E, Manning CD, Ermon S, Finn C (2023) Direct preference optimization: Your language model is secretly a reward model. Adv. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY).
Ravichandran T, Deng C (2023) Effects of managerial response to negative reviews on future review valence and complaints. Inform. Systems Res. 34(1):319–341.
Reisenbichler M, Reutterer T, Schweidel DA, Dan D (2022) Frontiers: Supporting content marketing with natural language generation. Marketing Sci. 41(3):441–452.
Schulman J (2023) Reinforcement learning from human feedback: Progress and challenges. Retrieved July 1, https://www.youtube.com/watch?v=hhiLw5Q_UFg.
Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms. Preprint, submitted August 28, http://arxiv.org/abs/1707.06347.
Singh A, Kumar A, Vuong Q, Chebotar Y, Levine S (2023) ReDS: Offline RL with heteroskedastic datasets via support constraints. Oh A, Naumann T, Globerson A, Saenko K, Hardt M, Levine S, eds. Adv. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY), 23921–23933.
Singhal K, Tu T, Gottweis J, Sayres R, Wulczyn E, Hou L, Clark K, et al. (2023) Towards expert-level medical question answering with large language models. Preprint, submitted May 16, http://arxiv.org/abs/2305.09617.
Sohn K, Lee H, Yan X (2015) Learning structured output representation using deep conditional generative models. Adv. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY).
Strauss J, Hill DJ (2001) Consumer complaints by e-mail: An exploratory investigation of corporate responses and customer reactions. J. Interactive Marketing 15(1):63–73.
Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. Adv. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY).
Tang B, Matteson DS (2021) Probabilistic transformer for time series analysis. Adv. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY), 23592–23608.
Tax SS, Brown SW, Chandrashekaran M (1998) Customer evaluations of service complaint experiences: Implications for relationship marketing. J. Marketing 62(2):60–76.
Touvron H, Martin L, Stone K, Albert P, Almahairi A, Babaei Y, Bashlykov N, et al. (2023) Llama 2: Open foundation and fine-tuned chat models. Preprint, submitted July 19, http://arxiv.org/abs/2307.09288.
Tversky A, Kahneman D (1992) Advances in prospect theory: Cumulative representation of uncertainty. J. Risk Uncertainty 5(4):297–323.
Van Veen D, Van Uden C, Blankemeier L, Delbrouck JB, Aali A, Bluethgen C, Pareek A, et al. (2024) Adapted large language models can outperform medical experts in clinical text summarization. Nature Medicine 30(4):1134–1142.
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Adv. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY).
Wang L, Krishnamurthy A, Slivkins A (2024a) Oracle-efficient pessimism: Offline policy optimization in contextual bandits. Proc. 27th Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 766–774.
Wang W, Zhao Z, Sun T (2024b) Customizing large language models for business context: Framework and experiments. Preprint, submitted May 14, http://arxiv.org/abs/2312.10225.
Wang Y, Ivison H, Dasigi P, Hessel J, Khot T, Chandu K, Wadden D, et al. (2023) How far can camels go? Exploring the state of instruction tuning on open resources. Adv. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY).
Xie KL, So KKF, Wang W (2017) Joint effects of management responses and online reviews on hotel financial performance: A data-analytics approach. Internat. J. Hospitality Management 62:101–110.
Zhang W, Gu W, Gao C, Lyu MR (2023) A transformer-based approach for improving app review response generation. Software Practice Experience 53(2):438–454.
Zhang T, Kishore V, Wu F, Weinberger KQ, Artzi Y (2020) BERTScore: Evaluating text generation with BERT. 8th Internat. Conf. Learn. Representations (ICLR, Addis Ababa, Ethiopia).
Zhang Y, Li Y, Cui L, Cai D, Liu L, Fu T, Huang X, et al. (2025) Siren’s song in the AI ocean: A survey on hallucination in large language models. Preprint, submitted September 14, http://arxiv.org/abs/2309.01219.
Zhao Y, Joshi R, Liu T, Khalman M, Saleh M, Liu PJ (2023) SLiC-HF: Sequence likelihood calibration with human feedback. Preprint, submitted May 17, http://arxiv.org/abs/2305.10425.
Zhou W, Bajracharya S, Held D (2021) PLAS: Latent action space for offline reinforcement learning. Proc. 2020 Conf. Robot Learn. (PMLR, New York), 1719–1735.
Zhou C, Liu P, Xu P, Iyer S, Sun J, Mao Y, Ma X, et al. (2023) LIMA: Less is more for alignment. Adv. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY).

Information Systems Research, Articles In Advance

Article Information

Received: September 16, 2024
Accepted: March 09, 2026
Published Online: April 20, 2026

Cite as

Yanan Wang, Yong Ge (2026) Align Generative Artificial Intelligence with Human Preferences: A Novel Large Language Model Fine-Tuning Method for Online Review Management. Information Systems Research 0(0).

https://doi.org/10.1287/isre.2024.1518