Technical Note—Online Matching with Bayesian Rewards

David Simchi-Levi
https://orcid.org/0000-0002-4650-1519
Institute for Data, Systems, and Society, Department of Civil & Environmental Engineering, and Operations Research Center, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139

Rui Sun (Corresponding Author)
https://orcid.org/0000-0001-6273-6898
Institute for Data, Systems, and Society, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139

Xinshang Wang
https://orcid.org/0000-0003-4683-7167
Alibaba Group US, San Mateo, California 94402; and Antai College of Economics and Management, Shanghai Jiao Tong University, Shanghai 200240, China

Published Online: 28 Jul 2023
https://doi.org/10.1287/opre.2021.0499

Abstract

We study an online matching problem in which a central platform needs to match a number of limited resources to different groups of users that arrive sequentially over time. The reward of each matching option depends on both the type of resource and the time period in which the user arrives. The matching rewards are assumed to be unknown but drawn from probability distributions that are known a priori. The platform therefore needs to learn the true rewards online from real-time observations of the matching results. The goal of the central platform is to maximize the total reward from all of the matchings without violating the resource capacity constraints. We formulate this matching problem with Bayesian rewards as a Markovian multiarmed bandit problem with budget constraints, where each arm corresponds to a pair of a resource and a time period. We devise our algorithm by first finding policies for each single arm separately via a relaxed linear program and then “assembling” these policies together through judicious selection criteria and well-designed pulling orders. We prove that the expected reward of our algorithm is at least $\frac{1}{2} (\sqrt{2} - 1)$ of the expected reward of an optimal algorithm.
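
To make the setting concrete, the following is a minimal Python sketch of a toy instance. It is not the algorithm of the paper: it replaces the relaxed linear program and the assembling step with a simple greedy rule based on posterior means. The Bernoulli rewards, the Beta prior, and the names `greedy_posterior_mean`, `capacities`, and `users_per_period` are all assumptions introduced here for illustration. The constant `GUARANTEE` evaluates the bound stated in the abstract, which is roughly 0.207.

```python
import math
import random

# Approximation guarantee stated in the abstract: (1/2) * (sqrt(2) - 1).
GUARANTEE = 0.5 * (math.sqrt(2) - 1)  # roughly 0.207


def greedy_posterior_mean(capacities, users_per_period, prior=(1.0, 1.0), seed=0):
    """Toy greedy baseline (NOT the paper's algorithm).

    Each arm pairs one resource with one time period. Its Bernoulli
    success probability is drawn from a Beta prior (an assumption made
    here), and every match reveals one reward that updates the posterior.
    """
    rng = random.Random(seed)
    n_res = len(capacities)
    remaining = list(capacities)   # remaining capacity per resource
    matches = [0] * n_res          # number of matches made per resource
    total_reward = 0
    for n_users in users_per_period:
        # Hidden true rewards for this period's arms, drawn from the prior.
        theta = [rng.betavariate(*prior) for _ in range(n_res)]
        # Posterior parameters [alpha, beta] per arm, starting at the prior.
        post = [list(prior) for _ in range(n_res)]
        for _ in range(n_users):
            available = [i for i in range(n_res) if remaining[i] > 0]
            if not available:
                break  # every resource is exhausted
            # Pick the available resource with the highest posterior mean.
            i = max(available, key=lambda k: post[k][0] / (post[k][0] + post[k][1]))
            reward = 1 if rng.random() < theta[i] else 0
            post[i][0] += reward       # Beta-Bernoulli posterior update
            post[i][1] += 1 - reward
            remaining[i] -= 1
            matches[i] += 1
            total_reward += reward
    return total_reward, matches
```

For example, `greedy_posterior_mean([3, 2], [2, 2, 2])` serves six arriving users with five units of capacity, so it stops after five matches. A baseline like this carries no performance guarantee; it only illustrates the interplay of capacity constraints and online learning that the paper's analysis addresses.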

Funding: The authors thank the Massachusetts Institute of Technology (MIT)-IBM partnership in Artificial Intelligence and the MIT Data Science Laboratory for support.

Supplemental Material: The e-companion is available at https://doi.org/10.1287/opre.2021.0499.

Volume 73, Issue 1

January-February 2025

Pages iii-vii, 1-582, C2-C3

Received: December 24, 2019
Accepted: June 6, 2023
Published Online: July 28, 2023

Cite as

David Simchi-Levi, Rui Sun, Xinshang Wang (2023) Technical Note—Online Matching with Bayesian Rewards. Operations Research 73(1):278-289.

https://doi.org/10.1287/opre.2021.0499

Acknowledgments

The authors thank the area editor Prof. Daniel Kuhn, the associate editor, and the reviewers for their invaluable guidance and support throughout the review process. The authors sincerely appreciate their time and expertise in evaluating our paper. Their insightful and constructive feedback greatly enhanced the quality of this work.
