Align Generative Artificial Intelligence with Human Preferences: A Novel Large Language Model Fine-Tuning Method for Online Review Management

Published Online: https://doi.org/10.1287/isre.2024.1518
