Online Learning and Pricing for Service Systems with Reusable Resources

Huiwen Jia
[email protected]
https://orcid.org/0000-0002-2633-9278
Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, Michigan 48109

Cong Shi (Corresponding Author)
[email protected]
https://orcid.org/0000-0003-3564-3391
Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, Michigan 48109

Siqian Shen
[email protected]
https://orcid.org/0000-0002-2854-163X
Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, Michigan 48109

Published Online: 10 Nov 2022
https://doi.org/10.1287/opre.2022.2381

Abstract

We consider a price-based revenue management problem with finite reusable resources over a finite time horizon $T$. Customers arrive following a price-dependent Poisson process, and each customer requests one unit of $c$ homogeneous reusable resources. If there is an available unit, the customer gets served within a price-dependent exponentially distributed service time; otherwise, the customer waits in a queue until the next available unit. In this paper, we assume that the firm does not know how the arrival and service rates depend on posted prices, and thus it makes adaptive pricing decisions in each period based only on past observations to maximize the cumulative revenue. Given a discrete price set with cardinality $P$, we propose two online learning algorithms, termed batch upper confidence bound (BUCB) and batch Thompson sampling (BTS), and prove that the cumulative regret upper bound is $\tilde{O}(\sqrt{PT})$, which matches the regret lower bound. In establishing the regret, we bound the transient system performance upon price changes via a novel coupling argument, and also generalize bandits to accommodate subexponential rewards. We also extend our approach to models with balking and reneging customers and discuss a continuous price setting. Our numerical experiments demonstrate the efficacy of the proposed BUCB and BTS algorithms.
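The setting described above can be illustrated with a minimal Python sketch. It is not the paper's BUCB algorithm: it uses the textbook UCB1 index applied to batch-average revenue as a stand-in, and the batch length, the `scale` constant, the demand curves passed as `lam_fn` and `mu_fn`, and the choice to book the posted price at service completion are all assumptions made for illustration. The queue state is carried from one batch to the next, which is where transient effects after a price change would show up.

```python
import math
import random


def simulate_batch(price, lam, mu, c, duration, n, rng):
    """Simulate an M/M/c queue for `duration` time units, starting with n customers.

    Returns (revenue, n_end). Booking the price at service completion is an
    assumption of this sketch, not something stated in the abstract.
    """
    t, revenue = 0.0, 0.0
    while True:
        busy = min(n, c)                  # units currently in use
        total_rate = lam + mu * busy      # rate of the next event (lam > 0 assumed)
        t += rng.expovariate(total_rate)
        if t > duration:
            return revenue, n
        if rng.random() < lam / total_rate:
            n += 1                        # arrival: enters service or waits
        else:
            n -= 1                        # service completion frees a unit
            revenue += price


def batch_ucb(prices, lam_fn, mu_fn, c, num_batches, batch_len, scale=1.0, seed=0):
    """Hold one price per batch; pick it with a plain UCB1 index (stand-in)."""
    rng = random.Random(seed)
    counts = [0] * len(prices)
    means = [0.0] * len(prices)           # running mean of revenue per unit time
    n = 0                                 # customers in system, carried across batches
    for b in range(1, num_batches + 1):
        untried = [i for i, k in enumerate(counts) if k == 0]
        if untried:
            i = untried[0]                # play every price once first
        else:
            i = max(
                range(len(prices)),
                key=lambda j: means[j]
                + scale * math.sqrt(2.0 * math.log(b) / counts[j]),
            )
        p = prices[i]
        revenue, n = simulate_batch(p, lam_fn(p), mu_fn(p), c, batch_len, n, rng)
        counts[i] += 1
        means[i] += (revenue / batch_len - means[i]) / counts[i]
    return counts, means


# Hypothetical rate functions, chosen only to make the sketch runnable.
counts, means = batch_ucb(
    prices=[1.0, 2.0, 3.0],
    lam_fn=lambda p: 6.0 - 1.5 * p,
    mu_fn=lambda p: 1.0 + 0.1 * p,
    c=3,
    num_batches=200,
    batch_len=20.0,
)
```

Here `counts` records how many batches each price was posted and `means` holds the estimated revenue rate per price. Holding a price fixed for a whole batch, rather than repricing after every event, keeps each reward sample closer to the steady-state behavior of the queue under that price.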

Funding: This research was partially supported by an Amazon research award and the Department of Energy [Award DE-SC0018018].

Volume 72, Issue 3

May-June 2024

Pages iii-vi, 871-1316, C2-C3

Article Information

Received: January 16, 2021
Accepted: September 01, 2022
Published Online: November 10, 2022

Cite as

Huiwen Jia, Cong Shi, Siqian Shen (2022) Online Learning and Pricing for Service Systems with Reusable Resources. Operations Research 72(3):1203-1241.

https://doi.org/10.1287/opre.2022.2381

Acknowledgments

The authors thank area editors Ramandeep Randhawa and Itai Gurvich, the associate editor, and the three anonymous referees for their very detailed and constructive comments, which helped significantly improve the content and exposition of this paper.
