Online Learning for Constrained Assortment Optimization Under Markov Chain Choice Model

Published Online:https://doi.org/10.1287/opre.2022.0693

We study a dynamic assortment selection problem where arriving customers make purchase decisions among offered products from a universe of products under a Markov chain choice (MCC) model. The retailer only observes the assortment and the customer’s single choice per period. Given limited display capacity, resource constraints, and no a priori knowledge of problem parameters, the retailer’s objective is to sequentially learn the choice model and optimize cumulative revenues over a finite selling horizon. We develop a fast linear system based explore-then-commit (FastLinETC for short) learning algorithm that balances the tradeoff between exploration and exploitation. The algorithm can simultaneously estimate the arrival and transition probabilities in the MCC model by solving a linear system of equations and determining the near-optimal assortment based on these estimates. Furthermore, our consistent estimators offer superior computational times compared with existing heuristic estimation methods, which often suffer from inconsistency or a significant computational burden.

Funding: The research of Q. Luo is partially supported by the National Science Foundation [Grant CMMI-2308750]. The research of Z. Huang is partially supported by the Shanghai Sailing Program [Grant 22YF1451100 and the Fundamental Research Funds for the Central Universities]. The research of C. Shi is partially supported by Amazon [Research Award].

INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.