Doubly High-Dimensional Contextual Bandits: An Interpretable Model for Joint Assortment-Pricing
Abstract
Key challenges in running a business include deciding which products or services to present to consumers (the assortment problem) and how to price products (the pricing problem) to maximize revenue or profit. Instead of considering these problems in isolation, we address assortment-pricing jointly and tackle the intrinsic double high dimensionality: both actions and contextual vectors can take continuous values in high-dimensional spaces. We propose a doubly high-dimensional contextual bandit model to formulate this problem. To circumvent the curse of dimensionality, we keep the model simple yet flexible: it captures the interaction effects between covariates (context) and actions on the reward via a low-rank representation matrix. The resulting class of models is reasonably expressive while remaining interpretable through latent factors, and it includes various bandit and pricing models as special cases, making it suitable for applications involving simultaneous multiple decision-making beyond joint assortment-pricing. We develop a computationally tractable procedure that combines an exploration/exploitation protocol with an efficient low-rank matrix estimator. We provide a nonasymptotic instance-dependent regret bound involving dimensions and rank in addition to the time horizon. Simulations on standard bandit and pricing models, which are special cases of our model, demonstrate that our method yields lower regret than state-of-the-art methods. Real-world assortment-pricing case studies, from an industry-leading instant noodle manufacturer to an emerging beauty start-up, underscore the gains achievable using our method, showing at least threefold gains in revenue/profit and the interpretability of the learned latent factor models.
This paper was accepted by J. George Shanthikumar, data science.
Funding: Financial support from The Wharton School [Wharton AI & Analytics Initiative Fund], the National Science Foundation [Grants DMS-2311072 and DMS-2515896], and the Office of Naval Research [Grant N00014-21-1-2842] is gratefully acknowledged.
Supplemental Material: The online appendix and data files are available at https://doi.org/10.1287/mnsc.2024.08311.
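Illustrative sketch: The abstract describes rewards driven by context-action interactions through a low-rank representation matrix. The snippet below is one possible reading of that idea, not the paper's procedure. It assumes a bilinear expected reward x^T Θ a with a rank-r matrix Θ, and it substitutes plain least squares followed by a truncated SVD for the paper's low-rank estimator. The exploration/exploitation protocol is omitted. All names (`theta`, `truncate_rank`, the dimensions, the noise level) are hypothetical and chosen only for illustration.

```python
import numpy as np

def make_low_rank(d_x, d_a, rank, rng):
    """Draw a random d_x-by-d_a matrix of the given rank (assumed model)."""
    u = rng.standard_normal((d_x, rank))
    v = rng.standard_normal((d_a, rank))
    return u @ v.T

def truncate_rank(m, rank):
    """Best rank-`rank` approximation of m in Frobenius norm, via SVD."""
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]

rng = np.random.default_rng(0)
d_x, d_a, rank, n = 8, 6, 2, 2000   # hypothetical dimensions and sample size

theta = make_low_rank(d_x, d_a, rank, rng)

# Random contexts and actions; reward = x^T theta a + Gaussian noise (assumed).
X = rng.standard_normal((n, d_x))
A = rng.standard_normal((n, d_a))
rewards = np.einsum("ni,ij,nj->n", X, theta, A) + 0.1 * rng.standard_normal(n)

# x^T theta a equals the inner product <x a^T, theta>, so regress the rewards
# on the flattened outer products, then project the estimate back to rank r.
features = np.einsum("ni,nj->nij", X, A).reshape(n, d_x * d_a)
theta_ls = np.linalg.lstsq(features, rewards, rcond=None)[0].reshape(d_x, d_a)
theta_hat = truncate_rank(theta_ls, rank)

rel_err = np.linalg.norm(theta_hat - theta) / np.linalg.norm(theta)
print(f"relative Frobenius error: {rel_err:.4f}")
```

The two factors returned by the SVD play the role of latent factors for contexts and actions, which is where an interpretation of the fitted matrix would start under these assumptions.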

