Online Learning for Constrained Assortment Optimization Under Markov Chain Choice Model
Abstract
We study a dynamic assortment selection problem where arriving customers make purchase decisions among offered products from a universe of products under a Markov chain choice (MCC) model. The retailer only observes the assortment and the customer’s single choice per period. Given limited display capacity, resource constraints, and no a priori knowledge of problem parameters, the retailer’s objective is to sequentially learn the choice model and optimize cumulative revenues over a finite selling horizon. We develop a fast linear system based explore-then-commit (FastLinETC for short) learning algorithm that balances the tradeoff between exploration and exploitation. The algorithm can simultaneously estimate the arrival and transition probabilities in the MCC model by solving a linear system of equations and determining the near-optimal assortment based on these estimates. Furthermore, our consistent estimators offer superior computational times compared with existing heuristic estimation methods, which often suffer from inconsistency or a significant computational burden.
Funding: The research of Q. Luo is partially supported by the National Science Foundation [Grant CMMI-2308750]. The research of Z. Huang is partially supported by the Shanghai Sailing Program [Grant 22YF1451100 and the Fundamental Research Funds for the Central Universities]. The research of C. Shi is partially supported by Amazon [Research Award].

