Doubly High-Dimensional Contextual Bandits: An Interpretable Model for Joint Assortment-Pricing

Junhui Cai
[email protected]
https://orcid.org/0009-0005-2740-3840
Department of Information Technology, Analytics, and Operations, University of Notre Dame, Notre Dame, Indiana 46556

Ran Chen
[email protected]
https://orcid.org/0009-0005-0695-3210
Department of Statistics and Data Science, Washington University in St. Louis, St. Louis, Missouri 63130

Martin J. Wainwright
[email protected]
https://orcid.org/0000-0002-8760-2236
Laboratory for Information and Decision Systems, Statistics and Data Science Center, Departments of Electrical Engineering and Computer Science and Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139

Linda Zhao (Corresponding Author)
[email protected]
https://orcid.org/0009-0002-2752-7294
Department of Statistics and Data Science, University of Pennsylvania, Philadelphia, Pennsylvania 19104

Published Online: 9 Jun 2026
https://doi.org/10.1287/mnsc.2024.08311

Information

Received: October 29, 2024
Accepted: October 30, 2025
Published Online: June 9, 2026

Cite as

Junhui Cai, Ran Chen, Martin J. Wainwright, Linda Zhao (2026) Doubly High-Dimensional Contextual Bandits: An Interpretable Model for Joint Assortment-Pricing. Management Science 0(0).

https://doi.org/10.1287/mnsc.2024.08311

Acknowledgments

The authors thank the department editor, associate editor, and referees for their feedback and suggestions, which significantly improved the paper. The authors also thank the industry partners. Authors are listed in alphabetical order.
