Align Generative Artificial Intelligence with Human Preferences: A Novel Large Language Model Fine-Tuning Method for Online Review Management
Published Online: 20 Apr 2026, https://doi.org/10.1287/isre.2024.1518
References
- (2024) Pathways for design research on artificial intelligence. Inform. Systems Res. 35(2):441–459.
- (2024) Phi-3 technical report: A highly capable language model locally on your phone. Preprint, submitted August 30, http://arxiv.org/abs/2404.14219.
- (2024) GPT-4 technical report. Preprint, submitted March 4, http://arxiv.org/abs/2303.08774.
- AdVon Commerce (2024) Importance of fine-tuning AI models for real-world applications. Retrieved September 7, https://www.advoncommerce.com/topics/fine-tuning-ai.
- (2024) Direct preference optimization with an offset. Ku LW, Martins A, Srikumar V, eds. Findings Assoc. Comput. Linguistics ACL 2024 (Association for Computational Linguistics, Dublin, Ireland), 9954–9972.
- (2024) A general theoretical paradigm to understand learning from human preferences. Proc. 27th Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 4447–4455.
- (2024) Self-improving customer review response generation based on LLMs. Preprint, submitted May 6, http://arxiv.org/abs/2405.03845.
- (2009) Curriculum learning. Proc. 26th Annual Internat. Conf. Machine Learn. ICML ’09 (Association for Computing Machinery, New York), 41–48.
- (2023) Eight things to know about large language models. Preprint, submitted April 2, http://arxiv.org/abs/2304.00612.
- (1952) Rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika 39(3/4):324–345.
- (2024) RLHF deciphered: A critical analysis of reinforcement learning from human feedback for LLMs. Preprint, submitted April 16, http://arxiv.org/abs/2404.08555.
- (2023a) Improving translation faithfulness of large language models via augmenting instructions. Preprint, submitted August 24, http://arxiv.org/abs/2308.12674.
- (2023b) AlpaGasus: Training a better alpaca with fewer data. Twelfth Internat. Conf. Learn. Representations (ICLR, Vienna, Austria).
- (2006) The effect of word of mouth on sales: Online book reviews. J. Marketing Res. 43(3):345–354.
- (2016) A recurrent latent variable model for sequential data. Preprint, submitted April 6, http://arxiv.org/abs/1506.02216.
- (2014) What makes crisis response strategies work? The impact of crisis involvement and message framing. J. Bus. Res. 67(2):182–189.
- (2024) Risks from AI hallucinations and how to avoid them. Persado (January 11), https://www.persado.com/articles/ai-hallucinations/.
- (2024) Managerial response to online positive reviews: Helpful or harmful? Inform. Systems Res. 35(4):1802–1823.
- (2024) Model alignment as prospect theoretic optimization. Proc. 41st Internat. Conf. Machine Learn. (PMLR, New York), 12634–12651.
- (2020) App-aware response synthesis for user reviews. 2020 IEEE Internat. Conf. Big Data (IEEE, Atlanta), 699–708.
- (2019) Off-policy deep reinforcement learning without exploration. Proc. 36th Internat. Conf. Machine Learn. (PMLR, New York), 2052–2062.
- (2024) Importance-weighted offline learning done right. Proc. 35th Internat. Conf. Algorithmic Learn. Theory (PMLR, New York), 614–634.
- (2024) Retrieval-augmented generation for large language models: A survey. Preprint, submitted March 27, http://arxiv.org/abs/2312.10997.
- (2024) Transform your base and chat suggestions with fine-tuned LLM. Retrieved September 7, https://convin.ai/en-us/blog/fine-tuned-large-language-model.
- (1988) Service quality: The six criteria of good perceived service. Rev. Bus. 9(3):10–13.
- (2014) First step in social media: Measuring the influence of online management responses on customer satisfaction. Production Oper. Management 23(4):570–582.
- (2023) 81 Online review statistics (new 2024 data). Retrieved September 6, 2024, https://explodingtopics.com/blog/online-review-stats.
- (2024) A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. Preprint, submitted November 19, http://arxiv.org/abs/2311.05232.
- (2023) Camels in a changing climate: Enhancing LM adaptation with Tulu 2. Preprint, submitted November 20, http://arxiv.org/abs/2311.10702.
- (2023) Survey of hallucination in natural language generation. ACM Comput. Surveys 55(12):248.
- (2006) Customer complaints and organizational responses: The effects of complainants’ perceptions of justice on satisfaction and loyalty. Internat. J. Hospitality Management 25(1):69–90.
- (2022) Improving specificity in review response generation with data-driven data filtering. Proc. Fifth Workshop E-Commerce NLP ECNLP 5 (Association for Computational Linguistics, Dublin, Ireland), 121–133.
- (2020) Benchmarking automated review response generation for the hospitality domain. Proc. Workshop Natural Language Processing E-Commerce (Association for Computational Linguistics, Barcelona, Spain), 43–52.
- (2014) Auto-encoding variational Bayes. Preprint, submitted May 1, https://arxiv.org/abs/1312.6114v10.
- (2019) Data-driven deep reinforcement learning. Retrieved June 18, 2024, http://bair.berkeley.edu/blog/2019/12/05/bear/.
- (2018) Exit, voice, and response on digital platforms: An empirical investigation of online management response strategies. Inform. Systems Res. 29(4):849–870.
- (2019) Stabilizing off-policy Q-learning via bootstrapping error reduction. Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, eds. Adv. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY).
- (2020) Offline reinforcement learning: Tutorial, review, and perspectives on open problems. Preprint, submitted November 1, http://arxiv.org/abs/2005.01643.
- (2023a) FinGPT: Democratizing internet-scale data for financial large language models. Preprint, submitted November 14, https://arxiv.org/abs/2307.10485v2.
- (2023b) GPT understands, too. Preprint, submitted October 25, http://arxiv.org/abs/2103.10385.
- (2023) Self-refine: Iterative refinement with self-feedback. Oh A, Naumann T, Globerson A, Saenko K, Hardt M, Levine S, eds. Adv. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY), 46534–46594.
- (2024) AI hallucinations: A misnomer worth clarifying. Preprint, submitted January 9, http://arxiv.org/abs/2401.06796.
- (2000) Service recovery: A framework and empirical investigation. J. Oper. Management 18(4):387–400.
- MyScale (2024) Prompt engineering vs fine-tuning vs RAG. Retrieved September 7, https://myscale.com/blog/prompt-engineering-vs-finetuning-vs-rag/.
- OpenAI (2023) DALL·E 3. Retrieved September 7, https://openai.com/index/dall-e-3/.
- OpenAI (2024) Video generation models as world simulators. Retrieved September 7, https://openai.com/index/video-generation-models-as-world-simulators/.
- (2022) Training language models to follow instructions with human feedback. Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A, eds. Adv. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY), 27730–27744.
- (2022) Machine learning in information systems research. MIS Quart. 46(1):iii–xix.
- (2024) LLM evaluators recognize and favor their own generations. Preprint, submitted April 15, http://arxiv.org/abs/2404.13076.
- (2019) Advantage-weighted regression: Simple and scalable off-policy reinforcement learning. Preprint, submitted October 7, http://arxiv.org/abs/1910.00177.
- (1991) Efficient training of artificial neural networks for autonomous navigation. Neural Comput. 3(1):88–97.
- (2023) Direct preference optimization: Your language model is secretly a reward model. Adv. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY).
- (2023) Effects of managerial response to negative reviews on future review valence and complaints. Inform. Systems Res. 34(1):319–341.
- (2022) Frontiers: Supporting content marketing with natural language generation. Marketing Sci. 41(3):441–452.
- (2023) Reinforcement learning from human feedback: Progress and challenges. Retrieved July 1, https://www.youtube.com/watch?v=hhiLw5Q_UFg.
- (2017) Proximal policy optimization algorithms. Preprint, submitted August 28, http://arxiv.org/abs/1707.06347.
- (2023) ReDS: Offline RL with heteroskedastic datasets via support constraints. Oh A, Naumann T, Globerson A, Saenko K, Hardt M, Levine S, eds. Adv. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY), 23921–23933.
- (2023) Towards expert-level medical question answering with large language models. Preprint, submitted May 16, http://arxiv.org/abs/2305.09617.
- (2015) Learning structured output representation using deep conditional generative models. Adv. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY).
- (2001) Consumer complaints by e-mail: An exploratory investigation of corporate responses and customer reactions. J. Interactive Marketing 15(1):63–73.
- (2014) Sequence to sequence learning with neural networks. Adv. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY).
- (2021) Probabilistic transformer for time series analysis. Adv. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY), 23592–23608.
- (1998) Customer evaluations of service complaint experiences: Implications for relationship marketing. J. Marketing 62(2):60–76.
- (2023) Llama 2: Open foundation and fine-tuned chat models. Preprint, submitted July 19, http://arxiv.org/abs/2307.09288.
- (1992) Advances in prospect theory: Cumulative representation of uncertainty. J. Risk Uncertainty 5(4):297–323.
- (2024) Adapted large language models can outperform medical experts in clinical text summarization. Nature Medicine 30(4):1134–1142.
- (2017) Attention is all you need. Adv. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY).
- (2024a) Oracle-efficient pessimism: Offline policy optimization in contextual bandits. Proc. 27th Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 766–774.
- (2024b) Customizing large language models for business context: Framework and experiments. Preprint, submitted May 14, http://arxiv.org/abs/2312.10225.
- (2023) How far can camels go? Exploring the state of instruction tuning on open resources. Adv. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY).
- (2017) Joint effects of management responses and online reviews on hotel financial performance: A data-analytics approach. Internat. J. Hospitality Management 62:101–110.
- (2023) A transformer-based approach for improving app review response generation. Software Practice Experience 53(2):438–454.
- (2020) BERTScore: Evaluating text generation with BERT. 8th Internat. Conf. Learn. Representations (ICLR, Addis Ababa, Ethiopia).
- (2025) Siren’s song in the AI ocean: A survey on hallucination in large language models. Preprint, submitted September 14, http://arxiv.org/abs/2309.01219.
- (2023) SLiC-HF: Sequence likelihood calibration with human feedback. Preprint, submitted May 17, http://arxiv.org/abs/2305.10425.
- (2021) PLAS: Latent action space for offline reinforcement learning. Proc. 2020 Conf. Robot Learn. (PMLR, New York), 1719–1735.
- (2023) LIMA: Less is more for alignment. Adv. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY).

