Tailored Base-Surge Policies in Dual-Sourcing Inventory Systems with Demand Learning

Published Online: https://doi.org/10.1287/opre.2022.0624

We consider a periodic-review dual-sourcing inventory system in which the expedited supplier is faster and more costly, whereas the regular supplier is slower and cheaper. When the demand distribution is fully known, it is well known that the optimal policy is extremely complex but that the celebrated Tailored Base-Surge (TBS) policy performs near optimally. Under such a policy, a constant order is placed at the regular source in each period, whereas the order placed at the expedited source follows a simple order-up-to rule. In this paper, we assume that the firm does not know the demand distribution a priori and makes adaptive inventory ordering decisions in each period based only on past sales (a.k.a. censored demand) data. The standard performance measure is regret, which is the cost difference between a feasible learning algorithm and the clairvoyant (full-information) benchmark. When the benchmark is chosen to be the (full-information) best TBS policy, we develop the first nonparametric learning algorithm that admits a regret bound of $O(\sqrt{T}\,(\log T)^3 \log\log T)$, which is provably tight up to a logarithmic factor. Leveraging the structure of this problem, our approach combines the power of bisection search and stochastic gradient descent and also involves a delicate high-probability coupling argument between the dynamics of our system and those of the clairvoyant optimal system.
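To make the TBS rule described above concrete, the following is a minimal simulation sketch, not the paper's learning algorithm. Everything specific in it is an assumption made for illustration: the function name `simulate_tbs`, the cost parameters `c_r`, `c_e`, `h`, and `b`, an expedited lead time of zero, a regular pipeline that is already full (so exactly `r` units arrive every period), and backlogging of unmet demand.

```python
def simulate_tbs(demands, r, S, c_r=0.0, c_e=1.0, h=1.0, b=4.0):
    """Average per-period cost of a Tailored Base-Surge policy (toy model).

    Illustrative assumptions (not taken from the paper):
      - expedited orders arrive immediately (lead time 0);
      - the regular pipeline is full, so r units arrive each period;
      - unmet demand is backlogged (net inventory may go negative).
    """
    inventory = 0.0   # net inventory: positive = on hand, negative = backlog
    total_cost = 0.0
    for d in demands:
        # Regular source: the same constant quantity r every period.
        inventory += r
        # Expedited source: order up to level S, never a negative amount.
        q_e = max(0.0, S - inventory)
        inventory += q_e
        # Demand is realized and served (or backlogged).
        inventory -= d
        total_cost += (
            c_r * r                      # regular purchase cost
            + c_e * q_e                  # expedited purchase cost
            + h * max(inventory, 0.0)    # holding cost on leftover stock
            + b * max(-inventory, 0.0)   # penalty cost on backlog
        )
    return total_cost / len(demands)


if __name__ == "__main__":
    sample_demands = [4, 7, 5, 6, 3, 8, 5, 5]  # hypothetical demand path
    for r, S in [(0, 6), (3, 6), (5, 6)]:
        print(r, S, simulate_tbs(sample_demands, r=r, S=S))
```

In this toy setting, the regret of a learning algorithm over T periods would be its cumulative cost minus the cumulative cost of the best fixed pair `(r, S)`; the sketch evaluates only fixed pairs and does not attempt to learn them from censored sales data.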

Funding: The research of C. Shi is partially supported by an Amazon research award.

Supplemental Material: The online appendix is available at https://doi.org/10.1287/opre.2022.0624.
