Offline Reinforcement Learning for Human-Guided Human-Machine Interaction with Private Information

Zuyue Fu
Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, Illinois 60208

Zhengling Qi (Corresponding Author)
https://orcid.org/0000-0003-0270-7969
Department of Decision Sciences, George Washington University, Washington, District of Columbia 20052

Zhuoran Yang
Department of Statistics and Data Science, Yale University, New Haven, Connecticut 06511

Zhaoran Wang
Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, Illinois 60208

Lan Wang
https://orcid.org/0000-0002-3217-0202
Department of Management Science, University of Miami, Coral Gables, Florida 33146

Published Online: 6 Aug 2025
https://doi.org/10.1287/mnsc.2022.04112

Abstract

Motivated by human-machine interactions such as recommending videos to improve customer engagement, we study human-guided human-machine interaction for decision making with private information. We model this interaction as a two-player turn-based game, where one player (Bob, a human) guides the other player (Alice, a machine) toward a common goal. Specifically, we focus on offline reinforcement learning (RL) in this game, where the goal is to find a policy pair for Alice and Bob that maximizes their expected total rewards based on an offline data set collected a priori. The offline setting presents two challenges: (i) we cannot collect Bob’s private information, which leads to confounding bias when standard RL methods are used, and (ii) there is a distributional mismatch between the behavior policy used to collect the data and the optimal policy we aim to learn. To tackle the confounding bias, we treat Bob’s previous action as an instrumental variable for Alice’s current decision making to adjust for the unmeasured confounding. We establish a novel identification result and propose a new off-policy evaluation (OPE) method for evaluating policy pairs in this two-player turn-based game. To tackle the distributional mismatch, we leverage the idea of pessimism and use our OPE method to develop an off-policy learning algorithm for finding a desirable policy pair for Alice and Bob. Moreover, we prove that, under some technical assumptions, the policy pair obtained through our method converges to the optimal one at a satisfactory rate. Finally, we conduct a simulation study to demonstrate the performance of the proposed method.
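
As a rough illustration of the pessimism idea mentioned in the abstract (and not the authors' algorithm), the sketch below picks a policy pair by its lower bound rather than by its point estimate. The candidate names, the value estimates, the uncertainty widths, the `beta` parameter, and the function `select_pessimistic` are all hypothetical and assumed only for this example; in practice the estimates would come from an OPE procedure.

```python
from typing import Dict, Tuple


def select_pessimistic(candidates: Dict[str, Tuple[float, float]], beta: float = 1.0) -> str:
    """Return the candidate with the largest lower bound: estimate - beta * uncertainty."""
    lower_bounds = {
        name: estimate - beta * uncertainty
        for name, (estimate, uncertainty) in candidates.items()
    }
    return max(lower_bounds, key=lower_bounds.get)


# Hypothetical off-policy value estimates for three policy pairs,
# stored as (estimate, uncertainty). These numbers are made up.
candidates = {
    "pair_a": (1.00, 0.60),  # highest estimate, but poorly covered by the offline data
    "pair_b": (0.90, 0.10),  # slightly lower estimate, well covered
    "pair_c": (0.50, 0.05),
}

# Greedy selection trusts the point estimates alone.
greedy = max(candidates, key=lambda name: candidates[name][0])

# Pessimistic selection penalizes estimates that the data support weakly.
pessimistic = select_pessimistic(candidates, beta=1.0)

print("greedy:", greedy)
print("pessimistic:", pessimistic)
```

With these made-up numbers, the greedy rule favors the pair with the highest but least certain estimate, whereas the penalized rule favors a pair whose value the data support more firmly. Setting `beta` to zero recovers the greedy rule.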

This paper was accepted by Nicolas Stier, Special Issue on the Human-Algorithm Connection.

Funding: L. Wang’s research is partially supported by the National Science Foundation [Grant FRGMS-1952373].

Supplemental Material: The online appendices and data files are available at https://doi.org/10.1287/mnsc.2022.04112.

Volume 72, Issue 1

January 2026

Pages 1-782, iv-vi

Information

Received: December 21, 2022
Accepted: December 27, 2024
Published Online: August 06, 2025

Cite as

Zuyue Fu, Zhengling Qi, Zhuoran Yang, Zhaoran Wang, Lan Wang (2025) Offline Reinforcement Learning for Human-Guided Human-Machine Interaction with Private Information. Management Science 72(1):646-666.

https://doi.org/10.1287/mnsc.2022.04112

Acknowledgments

The authors thank the department editor, associate editor, and three reviewers for helpful comments, constructive suggestions, and insightful feedback that significantly improved this manuscript.
