Doubly High-Dimensional Contextual Bandits: An Interpretable Model for Joint Assortment-Pricing

Published Online: https://doi.org/10.1287/mnsc.2024.08311

References

  • Abbasi-Yadkori Y, Pál D, Szepesvári C (2011) Improved algorithms for linear stochastic bandits. Adv. Neural Inform. Processing Systems, vol. 24 (Curran Associates Inc., New York), 2312–2320.
  • Abbasi-Yadkori Y, Pál D, Szepesvári C (2012) Online-to-confidence-set conversions and application to sparse stochastic bandits. Proc. 15th Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 1–9.
  • Abdallah T, Vulcano G (2021) Demand estimation under the multinomial logit model from sales transaction data. Manufacturing Service Oper. Management 23(5):1196–1216.
  • Agrawal R (1995) The continuum-armed bandit problem. SIAM J. Control Optim. 33(6):1926–1951.
  • Agrawal S, Avadhanula V, Goyal V, Zeevi A (2019) MNL-bandit: A dynamic learning approach to assortment selection. Oper. Res. 67(5):1453–1485.
  • Akçay Y, Natarajan HP, Xu SH (2010) Joint dynamic pricing of multiple perishable products under consumer choice. Management Sci. 56(8):1345–1361.
  • Aparicio D, Eckles D, Kumar M (2023) Algorithmic pricing and consumer sensitivity to price variability. Preprint, submitted May 8, https://doi.org/10.2139/ssrn.4435831.
  • Araman VF, Caldentey R (2009) Dynamic pricing for nonperishable products with demand learning. Oper. Res. 57(5):1169–1188.
  • Auer P (2002) Using confidence bounds for exploitation-exploration trade-offs. J. Machine Learn. Res. 3(Nov):397–422.
  • Ban G-Y, Keskin NB (2021) Personalized dynamic pricing with machine learning: High-dimensional features and heterogeneous elasticity. Management Sci. 67(9):5549–5568.
  • Ban Y, Yan Y, Banerjee A, He J (2022) EE-net: Exploitation-exploration neural networks in contextual bandits. Tenth Internat. Conf. Learn. Representations (Virtual, 2022).
  • Bastani H, Bayati M (2020) Online decision making with high-dimensional covariates. Oper. Res. 68(1):276–294.
  • Bastani H, Simchi-Levi D, Zhu R (2022) Meta dynamic pricing: Transfer learning across experiments. Management Sci. 68(3):1865–1881.
  • Belloni A, Freund R, Selove M, Simester D (2008) Optimizing product line designs: Efficient methods and comparisons. Management Sci. 54(9):1544–1552.
  • Bennett J, Lanning S (2007) The Netflix prize. Proc. KDD Cup Workshop 2007 (Association for Computing Machinery, New York), 35.
  • Bernstein F, Modaresi S, Sauré D (2019) A dynamic clustering approach to data-driven assortment personalization. Management Sci. 65(5):2095–2115.
  • Bertsimas D, Mišić VV (2019) Exact first-choice product line optimization. Oper. Res. 67(3):651–670.
  • Besbes O, Zeevi A (2009) Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Oper. Res. 57(6):1407–1420.
  • Besson L, Kaufmann E (2018) What doubling tricks can and can’t do for multi-armed bandits. Preprint, submitted March 19, https://arxiv.org/abs/1803.06971.
  • Broder J, Rusmevichientong P (2012) Dynamic pricing under a general parametric choice model. Oper. Res. 60(4):965–980.
  • Cai TT, Guo Z (2017) Confidence intervals for high-dimensional linear regression: Minimax rates and adaptivity. Ann. Statist. 45(2):615–646.
  • Cai TT, Zhang A (2018) Rate-optimal perturbation bounds for singular subspaces with applications to high-dimensional statistics. Ann. Statist. 46(1):60–89.
  • Cai TT, Zhou W-X (2016) Matrix completion via max-norm constrained optimization. Electronic J. Statist. 10(1):1493–1525.
  • Candes EJ, Plan Y (2010) Matrix completion with noise. Proc. IEEE 98(6):925–936.
  • Caro F, Gallien J (2007) Dynamic assortment with demand learning for seasonal consumer goods. Management Sci. 53(2):276–292.
  • Cavallo A (2018) More Amazon effects: Online competition and pricing behaviors. NBER Working Paper No. 25138, National Bureau of Economic Research, Cambridge, MA.
  • Chen R (2022) Estimation and inference for convex functions and computational efficiency in high dimensional statistics. PhD thesis, University of Pennsylvania, Philadelphia.
  • Chen KD, Hausman WH (2000) Mathematical properties of the optimal product line selection problem using choice-based conjoint analysis. Management Sci. 46(2):327–332.
  • Chen N, Gallego G (2021) Nonparametric pricing analytics with customer covariates. Oper. Res. 69(3):974–984.
  • Chen X, Wang Y (2017) A note on a tight lower bound for MNL-bandit assortment selection models. Preprint, submitted September 18, https://arxiv.org/abs/1709.06109v1.
  • Chen X, Krishnamurthy A, Wang Y (2024) Robust dynamic assortment optimization in the presence of outlier customers. Oper. Res. 72(3):999–1015.
  • Chen X, Wang Y, Zhou Y (2020) Dynamic assortment optimization with changing contextual information. J. Machine Learn. Res. 21(1):8918–8961.
  • Chen X, Owen Z, Pixton C, Simchi-Levi D (2022a) A statistical learning approach to personalization in revenue management. Management Sci. 68(3):1923–1937.
  • Chen X, Shi C, Wang Y, Zhou Y (2021) Dynamic assortment planning under nested logit models. Production Oper. Management 30(1):85–102.
  • Chen Y, Xie M, Liu J, Zhao K (2022b) Interconnected neural linear contextual bandits with UCB exploration. 26th Pacific-Asia Conf. Knowledge Discovery Data Mining (Springer, Berlin, Heidelberg), 169–181.
  • Chen Y, Wang Y, Fang EX, Wang Z, Li R (2022c) Nearly dimension-independent sparse linear bandit over small action spaces via best subset selection. J. Amer. Statist. Assoc. 119(545):246–258.
  • Cheung WC, Simchi-Levi D (2017) Thompson sampling for online personalized assortment optimization problems with multinomial logit choice models. Preprint, submitted November 21, https://doi.org/10.2139/ssrn.3075658.
  • Cohen MC, Lobel I, Paes Leme R (2020) Feature-based dynamic pricing. Management Sci. 66(11):4921–4943.
  • Dani V, Hayes TP, Kakade SM (2008) Stochastic linear optimization under bandit feedback. 21st Annual Conf. Learn. Theory (Helsinki, Finland, 2008), 355–366.
  • den Boer AV (2015) Dynamic pricing and learning: Historical origins, current research, and new directions. Surveys Oper. Res. Management Sci. 20(1):1–18.
  • den Boer AV, Zwart B (2014) Simultaneously learning and optimizing using controlled variance pricing. Management Sci. 60(3):770–783.
  • Estelami H, Lehmann DR, Holden AC (2001) Macro-economic determinants of consumer price knowledge: A meta-analysis of four decades of research. Internat. J. Res. Marketing 18(4):341–355.
  • Fan J, Guo Y, Yu M (2022) Policy optimization using semiparametric models for dynamic pricing. J. Amer. Statist. Assoc. 119(545):552–564.
  • Fan J, Li K, Liao Y (2021) Recent developments in factor models and applications in econometric learning. Annu. Rev. Financial Econom. 13:401–430.
  • Féraud R, Allesiardo R, Urvoy T, Clérot F (2016) Random forest for the contextual bandit problem. Proc. 19th Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 93–101.
  • Ferreira KJ, Mower E (2023) Demand learning and pricing for varying assortments. Manufacturing Service Oper. Management 25(4):1227–1244.
  • Gallego G, Wang R (2014) Multiproduct price optimization and competition under the nested logit model with product-differentiated price sensitivities. Oper. Res. 62(2):450–461.
  • Gordon BR, Goldfarb A, Li Y (2013) Does price elasticity vary with economic growth? A cross-category analysis. J. Marketing Res. 50(1):4–23.
  • Green PE, Krieger AM (1985) Models and heuristics for product line selection. Marketing Sci. 4(1):1–19.
  • Green PE, Krieger AM (1993) Conjoint analysis with product-positioning applications. Eliashberg J, Lilien GL, eds. Handbooks in Operations Research and Management Science, vol. 5 (North–Holland, Amsterdam), 467–515.
  • Hao B, Lattimore T, Wang M (2020) High-dimensional sparse linear bandits. Adv. Neural Inform. Processing Systems, vol. 34 (Curran Associates Inc., New York), 10753–10763.
  • Hotelling H (1933) Analysis of a complex of statistical variables into principal components. J. Ed. Psych. 24(6):417.
  • Hu J, Chen X, Jin C, Li L, Wang L (2021) Near-optimal representation learning for linear bandits and linear RL. Proc. 38th Internat. Conf. Machine Learn. (PMLR, New York), 4349–4358.
  • Javanmard A, Nazerzadeh H (2019) Dynamic pricing in high-dimensions. J. Machine Learn. Res. 20(1):315–363.
  • Jun K-S, Willett R, Wright S, Nowak R (2019) Bilinear bandits with low-rank structure. Proc. 36th Internat. Conf. Machine Learn. (PMLR, New York), 3163–3172.
  • Kaiser HF (1958) The Varimax criterion for analytic rotation in factor analysis. Psychometrika 23(3):187–200.
  • Kallus N, Udell M (2020) Dynamic assortment personalization in high dimensions. Oper. Res. 68(4):1020–1037.
  • Kang Y, Hsieh C-J, Lee TCM (2022) Efficient frameworks for generalized low-rank matrix bandit problems. Adv. Neural Inform. Processing Systems, vol. 36 (Curran Associates Inc., New York), 19971–19983.
  • Keskin NB, Zeevi A (2014) Dynamic pricing with an unknown demand model: Asymptotically optimal semi-myopic policies. Oper. Res. 62(5):1142–1167.
  • Kim G-S, Paik MC (2019) Doubly-robust LASSO bandit. Adv. Neural Inform. Processing Systems, vol. 33 (Curran Associates Inc., New York), 5877–5887.
  • Kim J-h, Vojnovic M (2021) Scheduling servers with stochastic bilinear rewards. Preprint, submitted December 13, https://arxiv.org/abs/2112.06362v1.
  • Kleinberg R (2004) Nearly tight bounds for the continuum-armed bandit problem. Proc. 17th Internat. Conf. Neural Inform. Processing Systems NIPS’04 (MIT Press, Cambridge, MA), 697–704.
  • Kleinberg R, Leighton T (2003) The value of knowing a demand curve: Bounds on regret for online posted-price auctions. Proc. 44th Annu. IEEE Sympos. Foundations Comput. Sci. 2003 (IEEE, Piscataway, NJ), 594–605.
  • Kleinberg R, Slivkins A, Upfal E (2019) Bandits and experts in metric spaces. J. ACM 66(4):30.
  • Krishnamurthy A, Langford J, Slivkins A, Zhang C (2020) Contextual bandits with continuous actions: Smoothing, zooming, and adapting. J. Machine Learn. Res. 21(1):5402–5446.
  • Kumar V, Umashankar N, Kim KH, Bhagwat Y (2014) Assessing the influence of economic and customer experience factors on service purchase behaviors. Marketing Sci. 33(5):673–692.
  • Kveton B, Szepesvári C, Rao A, Wen Z, Abbasi-Yadkori Y, Muthukrishnan S (2017) Stochastic low-rank bandits. Preprint, submitted December 13, https://arxiv.org/abs/1712.04644.
  • Lale S, Azizzadenesheli K, Anandkumar A, Hassibi B (2019) Stochastic linear bandits with hidden low rank structure. Preprint, submitted January 28, https://arxiv.org/abs/1901.09490.
  • Lattimore T, Szepesvári C (2020) Bandit Algorithms (Cambridge University Press, Cambridge, UK).
  • Lee SJ, Sun WW, Liu Y (2024) Low-rank online dynamic assortment with dual contextual information. Preprint, submitted April 19, https://arxiv.org/abs/2404.17592v1.
  • Li H, Webster S, Yu G (2020) Product design under multinomial logit choices: Optimization of quality and prices in an evolving product line. Manufacturing Service Oper. Management 22(5):1011–1025.
  • Li Q, Cheng G, Fan J, Wang Y (2018) Embracing the blessing of dimensionality in factor models. J. Amer. Statist. Assoc. 113(521):380–389.
  • Li L, Chu W, Langford J, Schapire RE (2010) A contextual-bandit approach to personalized news article recommendation. WWW’10 Proc. 19th Internat. Conf. World Wide Web (Association for Computing Machinery, New York), 661–670.
  • Lu T, Pál D, Pál M (2010) Contextual multi-armed bandits. Proc. 13th Internat. Conf. Artificial Intelligence Statist. JMLR Workshop Conf. Proc. (PMLR, New York), 485–492.
  • Lu Y, Meisami A, Tewari A (2021) Low-rank generalized linear bandit problems. Proc. 24th Internat. Conf. Artificial Intelligence Statist. AISTATS (PMLR, New York), 460–468.
  • Madden TJ, Hewett K, Roth MS (2000) Managing images in different cultures: A cross-national study of color meanings and preferences. J. Internat. Marketing 8(4):90–107.
  • McBride RD, Zufryden FS (1988) An integer programming approach to the optimal product line selection problem. Marketing Sci. 7(2):126–140.
  • Miao S, Chao X (2021) Dynamic joint assortment and pricing optimization with demand learning. Manufacturing Service Oper. Management 23(2):525–545.
  • Miao S, Chao X (2022) Online personalized assortment optimization with high-dimensional customer contextual data. Manufacturing Service Oper. Management 24(5):2741–2760.
  • Miao S, Chen X, Chao X, Liu J, Zhang Y (2022) Context-based dynamic pricing with online clustering. Production Oper. Management 31(9):3559–3575.
  • Negahban S, Wainwright MJ (2011) Estimation of (near) low-rank matrices with noise and high-dimensional scaling. Ann. Statist. 39(2):1069–1097.
  • Negahban S, Wainwright MJ (2012) Restricted strong convexity and (weighted) matrix completion: Optimal bounds with noise. J. Machine Learn. Res. 13:1665–1697.
  • Papini M, Tirinzoni A, Restelli M, Lazaric A, Pirotta M (2021) Leveraging good representations in linear contextual bandits. Proc. 38th Internat. Conf. Machine Learn. (PMLR, New York), 8371–8380.
  • Pol LG (1991) Demographic contributions to marketing: An assessment. J. Acad. Marketing Sci. 19:53–59.
  • Qiang S, Bayati M (2016) Dynamic pricing with demand covariates. Preprint, submitted April 25, https://arxiv.org/abs/1604.07463.
  • Recht B, Fazel M, Parrilo PA (2010) Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 52(3):471–501.
  • Rizk G, Thomas A, Colin I, Laraki R, Chevaleyre Y (2021) Best arm identification in graphical bilinear bandits. Proc. 38th Internat. Conf. Machine Learn. (PMLR, New York), 9010–9019.
  • Robbins H (1952) Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. 58(5):527–535.
  • Rohe K, Zeng M (2023) Vintage factor analysis with Varimax performs statistical inference. J. Roy. Statist. Soc. Ser. B Statist. Methodology 85(4):1037–1060.
  • Rusmevichientong P, Tsitsiklis JN (2010) Linearly parameterized bandits. Math. Oper. Res. 35(2):395–411.
  • Rusmevichientong P, Shen Z-JM, Shmoys DB (2010) Dynamic assortment optimization with a multinomial logit choice model and capacity constraint. Oper. Res. 58(6):1666–1680.
  • Sauré D, Zeevi A (2013) Optimal dynamic assortment planning with demand learning. Manufacturing Service Oper. Management 15(3):387–404.
  • Schuler A, Liu V, Wan J, Callahan A, Udell M, Stark DE, Shah NH (2016) Discovering patient phenotypes using generalized low rank models. Biocomputing 2016 Proc. Pacific Sympos. (World Scientific, Singapore), 144–155.
  • Shen S, Chen X, Fang E, Lu J (2023) Combinatorial inference on the optimal assortment in multinomial logit models. Preprint, submitted February 27, https://doi.org/10.2139/ssrn.4371919.
  • Singh S (2006) Impact of color on marketing. Management Decision 44(6):783–789.
  • Slivkins A (2011) Contextual bandits with similarity information. Proc. 24th Annu. Conf. Learn. Theory JMLR Workshop Conf. Proc. (PMLR, New York), 679–702.
  • Srebro N, Alon N, Jaakkola TS (2005) Generalization error bounds for collaborative prediction with low-rank matrices. Adv. Neural Inform. Processing Systems, vol. 18 (Curran Associates, New York), 1321–1328.
  • Turğay E, Bulucu C, Tekin C (2020) Exploiting relevance for online decision-making in high-dimensions. IEEE Trans. Signal Processing 69:1438–1451.
  • Udell M, Horn C, Zadeh R, Boyd S (2016) Generalized low rank models. Foundations Trends Machine Learn. 9(1):1–118.
  • Vulcano G, Van Ryzin G, Ratliff R (2012) Estimating primary demand for substitutable products from sales transaction data. Oper. Res. 60(2):313–334.
  • Wainwright MJ (2019) High-Dimensional Statistics: A Non-Asymptotic Viewpoint, vol. 48 (Cambridge University Press, Cambridge, UK).
  • Wang R (2021) Consumer choice and market expansion: Modeling, optimization, and estimation. Oper. Res. 69(4):1044–1056.
  • Xu K, Bastani H (2021) Learning across bandits in high dimension via robust statistics. Preprint, submitted December 28, https://arxiv.org/abs/2112.14233v1.
  • Xu P, Wen Z, Zhao H, Gu Q (2022) Neural contextual bandits with deep representation and shallow exploration. Tenth Internat. Conf. Learn. Representations (Virtual, 2022).
  • Yang J, Hu W, Lee JD, Du SS (2020) Impact of representation learning in linear bandits. Preprint, submitted October 13, https://arxiv.org/abs/2010.06531v1.
  • Zhou D, Li L, Gu Q (2020) Neural contextual bandits with UCB-based exploration. Proc. 37th Internat. Conf. Machine Learn. (PMLR, New York), 11492–11502.