Collaborative Learning and Decision Making on Pricing and Recommendation: A Simple Framework for Planning