Technical Note—Dynamic Pricing and Learning with Discounting

Zhichao Feng
https://orcid.org/0000-0002-4680-9193
Department of Logistics and Maritime Studies, Faculty of Business, The Hong Kong Polytechnic University, Kowloon, Hong Kong

Milind Dawande
https://orcid.org/0000-0001-6956-0856
Naveen Jindal School of Management, The University of Texas at Dallas, Richardson, Texas 75080

Ganesh Janakiraman
https://orcid.org/0000-0001-7386-4318
Naveen Jindal School of Management, The University of Texas at Dallas, Richardson, Texas 75080

Anyan Qi (Corresponding Author)
https://orcid.org/0000-0003-2933-8294
Naveen Jindal School of Management, The University of Texas at Dallas, Richardson, Texas 75080

Published Online: 14 Jul 2023
https://doi.org/10.1287/opre.2023.2477

Abstract

In many practical settings, learning algorithms can take a substantial amount of time to converge, which raises the need to understand the role of discounting in learning. We illustrate the impact of discounting on the performance of learning algorithms by examining two classic and representative dynamic-pricing and learning problems studied in Broder and Rusmevichientong (BR) [Broder J, Rusmevichientong P (2012) Dynamic pricing under a general parametric choice model. Oper. Res. 60(4):965–980] and Keskin and Zeevi (KZ) [Keskin NB, Zeevi A (2014) Dynamic pricing with an unknown demand model: Asymptotically optimal semi-myopic policies. Oper. Res. 62(5):1142–1167]. In both settings, a seller sells a product with unlimited inventory over T periods. The seller initially does not know the parameters of the general choice model in BR (respectively, the linear demand curve in KZ). Given a discount factor ρ, the seller's objective is to determine a pricing policy that maximizes the expected discounted revenue over T periods. In both settings, we establish lower bounds on the regret under any policy and show limiting bounds of $\Omega(\sqrt{1/(1-\rho)})$ when $T \to \infty$ and $\Omega(\sqrt{T})$ when $\rho \to 1$. In the model of BR with discounting, we propose an asymptotically tight learning policy and show that the regret under our policy, as well as that under the MLE-CYCLE policy in BR, is $O(\sqrt{1/(1-\rho)})$ (respectively, $O(\sqrt{T})$) when $T \to \infty$ (respectively, $\rho \to 1$). In the model of KZ with discounting, we present sufficient conditions for a learning policy to guarantee asymptotic optimality and show that the regret under any policy satisfying these conditions is $O(\log(1/(1-\rho)) \sqrt{1/(1-\rho)})$ (respectively, $O(\log T \sqrt{T})$) when $T \to \infty$ (respectively, $\rho \to 1$). We show that three different policies achieve this upper bound on the regret: the two variants of the greedy iterated least squares policy in KZ and a different policy that we propose. We numerically examine the behavior of the regret under our policies as well as those in BR and KZ in the presence of discounting. We also analyze a setting in which the discount factor per period is a function of the number of decision periods in the planning horizon.
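
One informal way to read the two limiting regimes is through the total discount weight over the horizon. The sketch below is illustrative and is not the authors' analysis: it assumes period t (for t = 1, ..., T) is weighted by ρ^(t-1), a convention the abstract does not state, and the function name `discount_weight_sum` is hypothetical. Under that assumption the weights sum to (1 - ρ^T)/(1 - ρ), which approaches 1/(1 - ρ) as T grows with ρ fixed and approaches T as ρ approaches 1 with T fixed. These are the same two quantities that appear under the square roots in the bounds above.

```python
import math


def discount_weight_sum(rho: float, T: int) -> float:
    """Sum of rho**(t-1) for t = 1..T (assumed weighting convention)."""
    if not 0.0 < rho <= 1.0:
        raise ValueError("rho must lie in (0, 1]")
    if rho == 1.0:
        return float(T)  # no discounting: every period counts fully
    return (1.0 - rho**T) / (1.0 - rho)  # closed form of the geometric series


if __name__ == "__main__":
    # Long horizon with fixed rho: the sum is close to 1 / (1 - rho).
    # rho = 1: the sum equals T.
    for rho, T in [(0.99, 10**6), (0.999, 10**6), (1.0, 10**4)]:
        h = discount_weight_sum(rho, T)
        print(f"rho={rho}, T={T}: weight sum={h:.1f}, sqrt={math.sqrt(h):.2f}")
```

The printed square roots only indicate the scale of the quantities inside the bounds; they are not regret values.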

Funding: Z. Feng received support from the National Natural Science Foundation of China [Grant 72201256].

Supplemental Material: The online appendix is available at https://doi.org/10.1287/opre.2023.2477.

Volume 72, Issue 2

March-April 2024

Pages iii-vi, 425-870, C2-C3

Received: February 18, 2022
Accepted: March 21, 2023
Published Online: July 14, 2023

Cite as

Zhichao Feng, Milind Dawande, Ganesh Janakiraman, Anyan Qi (2023) Technical Note—Dynamic Pricing and Learning with Discounting. Operations Research 72(2):481-492.

https://doi.org/10.1287/opre.2023.2477

Acknowledgments

The authors thank the area editor, the associate editor, and two anonymous reviewers for their many constructive comments and suggestions that significantly improved this paper in both content and exposition.
