Collaborative Learning and Decision Making on Pricing and Recommendation: A Simple Framework for Planning

Published Online:https://doi.org/10.1287/mnsc.2023.00320

References

  • Abbasi-Yadkori Y, Antos A, Szepesvári C (2009) Forced-exploration based algorithms for playing in stochastic linear bandits. Proc. COLT Workshop On-Line Learn. Limited Feedback, vol. 92, 236.Google Scholar
  • Abbasi-Yadkori Y, Pál D, Szepesvári C (2011) Improved algorithms for linear stochastic bandits. Adv. Neural Inform. Processing Systems, vol. 24 (Curran Associates Inc., Red Hook, NY), 2312–2320.Google Scholar
  • Agarwal A, Dudík M, Kale S, Langford J, Schapire R (2012) Contextual bandit learning with predictable rewards. Conf. Artificial Intelligence Statist., vol. 22 (PMLR, New York), 19–26.Google Scholar
  • Agarwal A, Hsu D, Kale S, Langford J, Li L, Schapire R (2014) Taming the monster: A fast and simple algorithm for contextual bandits. Proc. Internat. Conf. Machine Learn. (PMLR, New York), 1638–1646.Google Scholar
  • Agrawal S, Goyal N (2013) Thompson sampling for contextual bandits with linear payoffs. Proc. Internat. Conf. Machine Learn. (PMLR, New York), 127–135.Google Scholar
  • Alptekinoğlu A, Semple JH (2016) The exponomial choice model: A new alternative for assortment and price optimization. Oper. Res. 64(1):79–93.LinkGoogle Scholar
  • Ban GY, Keskin NB (2021) Personalized dynamic pricing with machine learning: High-dimensional features and heterogeneous elasticity. Management Sci. 67(9):5549–5568.LinkGoogle Scholar
  • Banerjee S, Sinclair SR, Tambe M, Xu L, Yu CL (2022) Artificial replay: A meta-algorithm for harnessing historical data in bandits. Preprint, submitted September 30, https://arxiv.org/abs/2210.00025.Google Scholar
  • Bastani H, Bayati M, Khosravi K (2021) Mostly exploration-free algorithms for contextual bandits. Management Sci. 67(3):1329–1349.LinkGoogle Scholar
  • Bayati M, Hamidi N, Johari R, Khosravi K (2020) Unreasonable effectiveness of greedy algorithms in multi-armed bandit with many arms. Adv. Neural Inform. Processing Systems 33:1713–1723.Google Scholar
  • Besbes O, Zeevi A (2009) Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Oper. Res. 57(6):1407–1420.LinkGoogle Scholar
  • Bietti A, Agarwal A, Langford J (2021) A contextual bandit bake-off. J. Machine Learn. Res. 22(1):5928–5976.Google Scholar
  • Bird S, Barocas S, Crawford K, Diaz F, Wallach H (2016) Exploring or exploiting? Social and ethical implications of autonomous experimentation in AI. Proc. Workshop Fairness Accountability Transparency Machine Learn.Google Scholar
  • Broder J, Rusmevichientong P (2012) Dynamic pricing under a general parametric choice model. Oper. Res. 60(4):965–980.LinkGoogle Scholar
  • Cao J, Gao R (2021) Contextual decision-making under parametric uncertainty and data-driven optimistic optimization. Preprint, submitted October 14, https://optimization-online.org/2021/10/8634/.Google Scholar
  • Cao J, Leng Y (2025) Adaptive data acquisition for personalized recommender systems with optimality guarantees on short-form video platforms. Management Sci., ePub ahead of print August 25, https://doi.org/10.1287/mnsc.2022.01130.LinkGoogle Scholar
  • Cao J, Sun W (2024) Tiered assortment: Optimization and online learning. Management Sci. 70(8):5481–5501.LinkGoogle Scholar
  • Chen N, Gallego G (2021) Nonparametric pricing analytics with customer covariates. Oper. Res. 69(3):974–984.LinkGoogle Scholar
  • Chen Y, Shi C (2019) Joint pricing and inventory management with strategic customers. Oper. Res. 67(6):1610–1627.LinkGoogle Scholar
  • Chen X, Simchi-Levi D (2012) Pricing and inventory management. The Oxford Handbook of Pricing Management, vol. 1, 784–824.Google Scholar
  • Chen X, Miao S, Wang Y (2023) Differential privacy in personalized pricing with nonparametric demand models. Oper. Res. 71(2):581–602.LinkGoogle Scholar
  • Chen B, Wang Y, Zhou Y (2024) Optimal policies for dynamic pricing and inventory control with nonparametric censored demands. Management Sci. 70(5):3362–3380.LinkGoogle Scholar
  • Chen X, Owen Z, Pixton C, Simchi-Levi D (2022) A statistical learning approach to personalization in revenue management. Management Sci. 68(3):1923–1937.LinkGoogle Scholar
  • Chu W, Li L, Reyzin L, Schapire R (2011) Contextual bandits with linear payoff functions. Proc. 14th Internat. Conf. Artificial Intelligence Statist., 208–214.Google Scholar
  • Cohen MC, Elmachtoub AN, Lei X (2022) Price discrimination with fairness constraints. Management Sci. 68(12):8536–8552.LinkGoogle Scholar
  • Cohen MC, Lobel I, Paes Leme R (2020) Feature-based dynamic pricing. Management Sci. 66(11):4921–4943.LinkGoogle Scholar
  • Dani V, Hayes TP, Kakade SM (2008) Stochastic linear optimization under bandit feedback. Proc. Conf. Learn. Theory.Google Scholar
  • den Boer AV, Keskin NB (2022) Dynamic pricing with demand learning and reference effects. Management Sci. 68(10):7112–7130.Google Scholar
  • den Boer AV, Zwart B (2014) Simultaneously learning and optimizing using controlled variance pricing. Management Sci. 60(3):770–783.LinkGoogle Scholar
  • den Boer AV, Zwart B (2015) Dynamic pricing and learning with finite inventories. Oper. Res. 63(4):965–978.LinkGoogle Scholar
  • Dimakopoulou M, Zhou Z, Athey S, Imbens G (2019) Balanced linear contextual bandits. Proc. AAAI Conf. Artificial Intelligence, vol. 33, 3445–3453.Google Scholar
  • Dubey A, Pentland A (2020) Differentially-private federated linear bandits. Adv. Neural Inform. Processing Systems 33:6003–6014.Google Scholar
  • El Housni O, Topaloglu H (2022) Joint assortment optimization and customization under a mixture of multinomial logit models: On the value of personalized assortments. Oper. Res. 71(4):1197–1215.Google Scholar
  • Ettl M, Harsha P, Papush A, Perakis G (2020) A data-driven approach to personalized bundle pricing and recommendation. Manufacturing Service Oper. Management 22(3):461–480.LinkGoogle Scholar
  • Ferreira KJ, Simchi-Levi D, Wang H (2018) Online network revenue management using Thompson Sampling. Oper. Res. 66(6):1586–1602.LinkGoogle Scholar
  • Filippi S, Cappe O, Garivier A, Szepesvári C (2010) Parametric bandits: The generalized linear case. Lafferty JD, Williams CKI, Shawe-Taylor J, Zemel RS, Culotta A, eds. Adv. Neural Inform. Processing Systems, vol. 23 (Curran Associates, Inc., Red Hook, NY), 586–594.Google Scholar
  • Foster D, Rakhlin A (2020) Beyond UCB: Optimal and efficient contextual bandits with regression oracles. Proc. Internat. Conf. Machine Learn. (PMLR, New York), 3199–3210.Google Scholar
  • Foster D, Agarwal A, Dudik M, Luo H, Schapire R (2018) Practical contextual bandits with regression oracles. Proc. Internat. Conf. Machine Learn. (PMLR, New York), 1539–1548.Google Scholar
  • Gittins JC (1979) Bandit processes and dynamic allocation indices. J. Roy. Statist. Soc. Ser. B (Methodological) 41(2):148–164.CrossrefGoogle Scholar
  • Hu P, Shum S, Yu M (2016) Joint inventory and markdown management for perishable goods with strategic consumer behavior. Oper. Res. 64(1):118–134.LinkGoogle Scholar
  • Huang R, Wu W, Yang J, Shen C (2021) Federated linear contextual bandits. Adv. Neural Inform. Processing Systems 34:27057–27068.Google Scholar
  • Jagabathula S, Rusmevichientong P (2017) A nonparametric joint assortment and price choice model. Management Sci. 63(9):3128–3145.LinkGoogle Scholar
  • Javanmard A, Nazerzadeh H (2019) Dynamic pricing in high-dimensions. J. Machine Learn. Res. 20(1):315–363.Google Scholar
  • Jun KS, Willett R, Wright S, Nowak R (2019) Bilinear bandits with low-rank structure. Proc. Internat. Conf. Machine Learn. (PMLR, New York), 3163–3172.Google Scholar
  • Kairouz P, McMahan HB, Avent B, Bellet A, Bennis M, Bhagoji AN, Bonawitz K, et al. (2021) Advances and open problems in federated learning. Foundations Trends Machine Learn. 14(1–2):1–210.CrossrefGoogle Scholar
  • Kallus N, Udell M (2020) Dynamic assortment personalization in high dimensions. Oper. Res. 68(4):1020–1037.LinkGoogle Scholar
  • Kannan S, Morgenstern JH, Roth A, Waggoner B, Wu ZS (2018) A smoothed analysis of the greedy algorithm for the linear contextual bandit problem. Adv. Neural Inform. Processing Systems, vol. 31 (Curran Associates Inc., Red Hook, NY), 2227–2236.Google Scholar
  • Kao H, Wei CY, Subramanian V (2022) Decentralized cooperative reinforcement learning with hierarchical information structure. Proc. Internat. Conf. Algorithmic Learn. Theory (PMLR, New York), 573–605.Google Scholar
  • Keskin NB, Zeevi A (2014) Dynamic pricing with an unknown demand model: Asymptotically optimal semi-myopic policies. Oper. Res. 62(5):1142–1167.LinkGoogle Scholar
  • Keskin NB, Zeevi A (2017) Chasing demand: Learning and earning in a changing environment. Math. Oper. Res. 42(2):277–307.LinkGoogle Scholar
  • Keskin NB, Zeevi A (2018) On incomplete learning and certainty-equivalence control. Oper. Res. 66(4):1136–1167.LinkGoogle Scholar
  • Keskin NB, Li Y, Sunar N (2024) Data-driven clustering and feature-based retail electricity pricing with smart meters. Oper. Res. 73(5):2636–2660.LinkGoogle Scholar
  • Kök AG, Xu Y (2011) Optimal and competitive assortments with endogenous pricing under hierarchical consumer choice models. Management Sci. 57(9):1546–1563.LinkGoogle Scholar
  • Krishnamurthy A, Langford J, Slivkins A, Zhang C (2020) Contextual bandits with continuous actions: Smoothing, zooming, and adapting. J. Machine Learn. Res. 21(1):5402–5446.Google Scholar
  • Kveton B, Zaheer M, Szepesvari C, Li L, Ghavamzadeh M, Boutilier C (2020) Randomized exploration in generalized linear bandits. Proc. Internat. Conf. Artificial Intelligence Statist., 2066–2076.Google Scholar
  • Lattimore T, Szepesvári C (2020) Bandit Algorithms (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • Li L, Lu Y, Zhou D (2017) Provably optimal algorithms for generalized linear contextual bandits. Proc. 34th Internat. Conf. Machine Learn., vol. 70, 2071–2080.Google Scholar
  • Li L, Chu W, Langford J, Schapire RE (2010) A contextual-bandit approach to personalized news article recommendation. Proc. 19th Internat. Conf. World Wide Web, 61–670.Google Scholar
  • Li T, Sahu AK, Talwalkar A, Smith V (2020) Federated learning: Challenges, methods, and future directions. IEEE Signal Processing Magazine 37(3):50–60.CrossrefGoogle Scholar
  • Liu X (2023) Dynamic coupon targeting using batch deep reinforcement learning: An application to livestream shopping. Marketing Sci. 42(4):637–658.LinkGoogle Scholar
  • Lu Y, Meisami A, Tewari A (2021) Low-rank generalized linear bandit problems. Proc. Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 460–468.Google Scholar
  • McMahan B, Moore E, Ramage D, Hampson S, Agüera y Arcas B (2017) Communication-efficient learning of deep networks from decentralized data. Artificial Intelligence and Statistics (PMLR, New York), 1273–1282.Google Scholar
  • Mersereau AJ, Rusmevichientong P, Tsitsiklis JN (2009) A structured multiarmed bandit problem and the greedy policy. IEEE Trans. Automated Control 54(12):2787–2802.CrossrefGoogle Scholar
  • Miao S, Chao X (2021) Dynamic joint assortment and pricing optimization with demand learning. Manufacturing Service Oper. Management 23(2):525–545.Google Scholar
  • Nambiar M, Simchi-Levi D, Wang H (2019) Dynamic learning and pricing with model misspecification. Management Sci. 65(11):4980–5000.LinkGoogle Scholar
  • Perivier N, Goyal V (2022) Dynamic pricing and assortment under a contextual MNL demand. Adv. Neural Inform. Processing Systems 35:3461–3474.Google Scholar
  • Qiang S, Bayati M (2016) Dynamic pricing with demand covariates. Preprint, submitted April 25, https://arxiv.org/abs/1604.07463.Google Scholar
  • Rusmevichientong P, Tsitsiklis JN (2010) Linearly parameterized bandits. Math. Oper. Res. 35(2):395–411.LinkGoogle Scholar
  • Russo D, Van Roy B (2014) Learning to optimize via posterior sampling. Math. Oper. Res. 39(4):1221–1243.LinkGoogle Scholar
  • Russo D, Van Roy B (2016) An information-theoretic analysis of Thompson sampling. J. Machine Learn. Res. 17(1):2442–2471.Google Scholar
  • Shen M, Tang CS, Wu D, Yuan R, Zhou W (2020) JD.com: Transaction-level data for the 2020 MSOM data driven research challenge. Manufacturing Service Oper. Management 26(1):2–10.LinkGoogle Scholar
  • Shi C, Shen C (2021) Federated multi-armed bandits. Proc. AAAI Conf. Artificial Intelligence, vol. 35, 9603–9611.Google Scholar
  • Shin D, Vaccari S, Zeevi A (2022) Dynamic pricing with online reviews. Management Sci. 69(2):824–845.Google Scholar
  • Simchi-Levi D, Xu Y (2021) Bypassing the monster: A faster and simpler optimal algorithm for contextual bandits under realizability. Math. Oper. Res. 47(3):1904–1931.Google Scholar
  • Slivkins A (2019) Introduction to multi-armed bandits. Foundations Trends Machine Learn. 12(1–2):1–286.CrossrefGoogle Scholar
  • Sun H, Li X, Teo C-P (2025) Partition and prosper: Design and pricing of single bundle. Oper. Res. 73(4):1983–2001.Google Scholar
  • Valko M, Korda N, Munos R, Flaounas I, Cristianini N (2013) Finite-time analysis of kernelised contextual bandits. Preprint, submitted September 26, https://arxiv.org/abs/1309.6869.Google Scholar
  • Wang R (2012) Capacitated assortment and price optimization under the multinomial logit model. Oper. Res. Lett. 40(6):492–497.CrossrefGoogle Scholar
  • Wang CC, Kulkarni SR, Poor HV (2005) Bandit problems with side observations. IEEE Trans. Automated Control 50(3):338–355.CrossrefGoogle Scholar
  • Wang H, Talluri K, Li X (2025) Technical note—On dynamic pricing with covariates. Oper. Res. 73(4):1932–1943.LinkGoogle Scholar
  • Wang M, Zhang H, Rusmevichientong P, Shen M (2024) Optimizing offline product design and online assortment policy: Measuring the relative impact of each decision. Management Sci. 71(5):4266–4286.LinkGoogle Scholar
  • Woodroofe M (1979) A one-armed bandit problem with a concomitant variable. J. Amer. Stat. Assoc. 74(368):799–806.CrossrefGoogle Scholar
  • Xu Y, Zeevi A (2020) Upper counterfactual confidence bounds: A new optimism principle for contextual bandits. Preprint, submitted July 15, https://arxiv.org/abs/2007.07876.Google Scholar
  • Zhang H, Cheng L (2015) Restricted strong convexity and its applications to convergence analysis of gradient-type methods in convex optimization. Optim. Lett. 9:961–979.CrossrefGoogle Scholar
  • Zhou D, Li L, Gu Q (2020) Neural contextual bandits with UCB-based exploration. Internat. Conf. Machine Learn. (PMLR, New York), 11492–11502.Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.