Learning to Price Supply Chain Contracts Against a Learning Retailer
Abstract
The rise of big data analytics has automated decision-making in companies and increased supply chain agility. In this paper, we study the supply chain contract design problem faced by a data-driven supplier (she) who must respond to the inventory decisions of a downstream retailer (he). Both the supplier and the retailer are uncertain about market demand and must learn it sequentially over a fixed time horizon. Moreover, the supplier does not know the retailer's inventory learning policy, which may change dynamically. The supplier's goal is to develop data-driven pricing policies that achieve sublinear regret under a wide range of possible retailer inventory learning policies. To capture the dynamics induced by the retailer's inventory learning policy, we connect the problem to nonstationary online learning through the notion of a variation budget. We first observe that existing approaches to nonstationary online learning cannot precisely delineate the dynamics induced by the retailer's inventory learning policy and may lead to linear growth in the supplier's regret under some well-known retailer inventory learning policies. To overcome this challenge, we introduce a new notion of variation budget that better quantifies the impact of the retailer's learning on the supplier's decision-making environment, and we demonstrate its advantages in our setting over the variation budgets used in the existing literature. We then propose dynamic pricing policies for the supplier for both discrete and continuous demand distributions. These policies achieve sublinear regret bounds for the supplier under a wide range of retailer inventory learning policies and empirically outperform policies from the existing nonstationary online learning literature.
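To make the variation-budget idea concrete, the sketch below computes the classical sup-norm variation budget, V_T = sum over t of max_w |f_{t+1}(w) - f_t(w)|, of the supplier's per-period profit function f_t(w) = (w - c) * q_t(w) when the retailer is learning. This is the standard notion from the nonstationary online learning literature, not the new variation budget proposed in this paper, whose definition the abstract does not state. Every ingredient is a hypothetical assumption chosen for illustration: retail price p = 10, supplier unit cost c = 2, a wholesale-price grid of 3 through 9, i.i.d. uniform integer demand on 0 to 20, and a retailer who uses sample average approximation (SAA), ordering the empirical (p - w)/p quantile of the demand observed so far. The function names `empirical_quantile`, `saa_order`, `supplier_rewards`, and `variation_budget` are likewise illustrative.

```python
import math
import random
from fractions import Fraction


def empirical_quantile(sorted_sample, level):
    """Smallest sample value x such that the fraction of the sample <= x is >= level."""
    n = len(sorted_sample)
    idx = min(max(math.ceil(level * n) - 1, 0), n - 1)
    return sorted_sample[idx]


def saa_order(history, w, p):
    """Hypothetical SAA newsvendor retailer: order the empirical critical quantile.

    With retail price p, wholesale price w and no salvage value, the newsvendor
    critical ratio is (p - w) / p. Fraction keeps the ratio exact.
    """
    level = Fraction(p - w, p)
    return empirical_quantile(sorted(history), level)


def supplier_rewards(histories, prices, p, c, order_fn):
    """Matrix of f_t(w) = (w - c) * q_t(w): one row per period, one column per price."""
    return [[(w - c) * order_fn(h, w, p) for w in prices] for h in histories]


def variation_budget(rewards):
    """Classical budget on a finite price grid: sum_t max_w |f_{t+1}(w) - f_t(w)|."""
    return sum(
        max(abs(b - a) for a, b in zip(f_t, f_next))
        for f_t, f_next in zip(rewards, rewards[1:])
    )


if __name__ == "__main__":
    rng = random.Random(0)
    p, c, T = 10, 2, 200                 # hypothetical retail price, unit cost, horizon
    prices = range(3, 10)                # hypothetical wholesale-price grid, c < w < p
    demand = [rng.randint(0, 20) for _ in range(T)]   # hypothetical i.i.d. demand
    # Learning retailer: in period t it has seen the first t + 1 demand observations.
    histories = [demand[: t + 1] for t in range(T)]
    learning = supplier_rewards(histories, prices, p, c, saa_order)
    # Benchmark retailer that already "knows" demand: same full sample every period.
    informed = supplier_rewards([demand] * T, prices, p, c, saa_order)
    print("learning retailer V_T:", variation_budget(learning))
    print("informed retailer V_T:", variation_budget(informed))  # stationary, so 0
```

In this toy setup, the informed retailer leaves the supplier's profit function unchanged from period to period, so its budget is exactly zero, whereas the SAA retailer's shifting orders produce a positive budget. That gap is the kind of learning-induced nonstationarity the abstract refers to; how the paper's own variation budget measures it is not reproduced here.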
At the managerial level, we show that there exists a pricing policy with a sublinear regret bound for the supplier under a wide range of retailer inventory learning policies, even though she faces both a learning retailer and an unknown demand distribution. Our work also offers a new perspective on data-driven operations management in which the principal must learn to react to the learning policies employed by other agents in the system.
This paper was accepted by Jeannette Song, operations management.
Supplemental Material: The online appendix and data files are available at https://doi.org/10.1287/mnsc.2022.03339.

