Improving Human Sequential Decision Making with Reinforcement Learning

Hamsa Bastani
[email protected]
https://orcid.org/0000-0002-8793-4732
Operations, Information and Decisions, The Wharton School, University of Pennsylvania, Philadelphia, Pennsylvania 19104
Osbert Bastani
[email protected]
https://orcid.org/0000-0001-9990-7566
Computer and Information Science, University of Pennsylvania, Philadelphia, Pennsylvania 19104
Wichinpong Park Sinchaisri
Corresponding Author
[email protected]
https://orcid.org/0000-0001-9351-0541
Haas School of Business, University of California, Berkeley, Berkeley, California 94720

Published Online: 22 May 2025
https://doi.org/10.1287/mnsc.2022.02455

Volume 72, Issue 1

January 2026

Pages 1-782, iv-vi

Article Information

Received: August 11, 2022
Accepted: December 10, 2024
Published Online: May 22, 2025

Cite as

Hamsa Bastani, Osbert Bastani, Wichinpong Park Sinchaisri (2025) Improving Human Sequential Decision Making with Reinforcement Learning. Management Science 72(1):733-755.

https://doi.org/10.1287/mnsc.2022.02455

Acknowledgments

The authors thank Sinan Aral, Ryan Buell, Paul Leonardi, Bryce McLaughlin, and Jann Spiess as well as conference and seminar participants at Boston University, Harvard Business School, the INFORMS Annual Meeting, Massachusetts Institute of Technology, the MSOM Conference, Stanford University, the University of California, Berkeley, and the University of Pennsylvania for helpful comments. The authors thank the Wharton Behavioral Laboratory, the Wharton Risk Center Ackoff Doctoral Student Fellowship, and the BAIR Open Research Commons for financial support. The authors are grateful for the research assistance of Brandon Chin, Xiteng Lin, Ron Wang, and Yuanxin Zhu.
