Nonparametric Pricing Bandits Leveraging Informational Externalities to Learn the Demand Curve

Published Online: https://doi.org/10.1287/mksc.2022.0247

References

  • Aghion P, Bolton P, Harris C, Jullien B (1991) Optimal learning by experimentation. Rev. Econom. Stud. 58(4):621–654.
  • Agrawal R (1995) Sample mean based index policies by O(log n) regret for the multi-armed bandit problem. Adv. Appl. Probab. 27(4):1054–1078.
  • Agrawal S, Goyal N (2012) Analysis of Thompson sampling for the multi-armed bandit problem. Mannor S, Srebro N, Williamson RC, eds. Proc. 25th Annual Conf. Learn. Theory, vol. 23 (PMLR, New York), 39.1–39.26.
  • Aparicio D, Simester D (2022) Price frictions and the success of new products. Marketing Sci. 41(6):1057–1073.
  • Auer P (2002) Using confidence bounds for exploitation-exploration trade-offs. J. Machine Learn. Res. 3(November):397–422.
  • Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2–3):235–256.
  • Bayati M, Hamidi N, Johari R, Khosravi K (2020) Unreasonable effectiveness of greedy algorithms in multi-armed bandit with many arms. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. Adv. Neural Inform. Processing Systems, vol. 33 (Curran Associates Inc., Red Hook, NY), 1713–1723.
  • Bergemann D, Schlag KH (2008) Pricing without priors. J. Eur. Econom. Assoc. 6(2–3):560–569.
  • Besbes O, Zeevi A (2009) Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Oper. Res. 57(6):1407–1420.
  • Botev Z, Belzile L (2025) TruncatedNormal: Truncated multivariate normal and student distributions. https://github.com/lbelzile/truncatednormal.
  • Brochu E, Hoffman MW, de Freitas N (2010) Portfolio allocation for Bayesian optimization. Preprint, submitted September 28, https://arxiv.org/abs/1009.5419.
  • Chapelle O, Li L (2011) An empirical evaluation of Thompson sampling. Shawe-Taylor J, Zemel R, Bartlett P, Pereira F, Weinberger KQ, eds. Adv. Neural Inform. Processing Systems, vol. 24 (Curran Associates Inc., Red Hook, NY), 2249–2257.
  • Chatterjee S, Sen S (2021) Regret minimization in isotonic, heavy-tailed contextual bandits via adaptive confidence bands. Preprint, submitted October 19, https://arxiv.org/abs/2110.10245.
  • Chen Q, Jasin S, Duenyas I (2019) Nonparametric self-adjusting control for joint learning and optimization of multiproduct pricing with finite resource capacity. Math. Oper. Res. 44(2):601–631.
  • Cheshire J, Ménard P, Carpentier A (2020) The influence of shape constraints on the thresholding bandit problem. Abernethy J, Agarwal S, eds. Proc. Thirty Third Conf. Learn. Theory, vol. 125 (PMLR, New York), 1228–1275.
  • Ching AT, Osborne M (2020) Identification and estimation of forward-looking behavior: The case of consumer stockpiling. Marketing Sci. 39(4):707–726.
  • Chowdhury SR, Gopalan A (2017) On kernelized multi-armed bandits. Precup D, Teh YW, eds. Proc. 34th Internat. Conf. Machine Learn., vol. 70 (PMLR, New York), 844–853.
  • Cohen SN, Treetanthiploet T (2020) Asymptotic randomised control with applications to bandits. Preprint, submitted October 14, https://arxiv.org/abs/2010.07252.
  • Dann C, Mansour Y, Mohri M, Sekhari A, Sridharan K (2022) Guarantees for epsilon-greedy reinforcement learning with function approximation. Chaudhuri K, Jegelka S, Song L, Szepesvari C, Niu G, Sabato S, eds. Proc. 39th Internat. Conf. Machine Learn., vol. 162 (PMLR, New York), 4666–4689.
  • Dholakia U (2015) The risks of changing your prices too often. Harvard Bus. Rev. (July 6), https://hbr.org/2015/07/the-risks-of-changing-your-prices-too-often?ab=HP-hero-for-you-text-2.
  • Duvenaud D (2014) Automatic model construction with Gaussian processes. PhD thesis, University of Cambridge, Cambridge, UK.
  • Erdem T, Keane MP (1996) Decision-making under uncertainty: Capturing dynamic brand choice processes in turbulent consumer goods markets. Marketing Sci. 15(1):1–20.
  • Ferreira KJ, Simchi-Levi D, Wang H (2018) Online network revenue management using Thompson sampling. Oper. Res. 66(6):1586–1602.
  • Filippi S, Cappe O, Garivier A, Szepesvári C (2010) Parametric bandits: The generalized linear case. Lafferty J, Williams C, Shawe-Taylor J, Zemel R, Culotta A, eds. Adv. Neural Inform. Processing Systems, vol. 23 (Curran Associates Inc., Red Hook, NY).
  • Furman J, Simcoe T (2015) The economics of big data and differential pricing. The White House President Barack Obama (blog) (February 6), https://obamawhitehouse.archives.gov/blog/2015/02/06/economics-big-data-and-differential-pricing.
  • Gittins J (1974) A dynamic allocation index for the sequential design of experiments. Gittins JC, Jones DM, eds. Progress in Statistics (North-Holland, Amsterdam), 241–266.
  • Goli A, Reiley DH, Zhang H (2025) Personalizing ad load to optimize subscription and ad revenues: Product strategies constructed from experiments on Pandora. Marketing Sci. 44(2):327–352.
  • Gordon BR, Zettelmeyer F, Bhargava N, Chapsky D (2019) A comparison of approaches to advertising measurement: Evidence from big field experiments at Facebook. Marketing Sci. 38(2):193–225.
  • Guntuboyina A, Sen B (2018) Nonparametric shape-restricted regression. Statist. Sci. 33(4):568–594.
  • Handel BR, Misra K (2015) Robust new product pricing. Marketing Sci. 34(6):864–881.
  • Hanssens DM, Pauwels KH (2016) Demonstrating the value of marketing. J. Marketing 80(6):173–190.
  • Hauser JR, Urban GL, Liberali G, Braun M (2009) Website morphing. Marketing Sci. 28(2):202–223.
  • Hendel I, Nevo A (2006) Measuring the implications of sales and consumer inventory behavior. Econometrica 74(6):1637–1673.
  • Hill DN, Nassif H, Liu Y, Iyer A, Vishwanathan S (2017) An efficient bandit algorithm for realtime multivariate optimization. Proc. 23rd ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 1813–1821.
  • Hoban PR, Bucklin RE (2015) Effects of internet display advertising in the purchase funnel: Model-based insights from a randomized field experiment. J. Marketing Res. 52(3):375–393.
  • Huang Y, Ellickson PB, Lovett MJ (2022) Learning to set prices. J. Marketing Res. 59(2):411–434.
  • Jindal P, Zhu T, Chintagunta P, Dhar S (2020) Marketing-mix response across retail formats: The role of shopping trip types. J. Marketing 84(2):114–132.
  • Kawale J, Bui HH, Kveton B, Tran-Thanh L, Chawla S (2015) Efficient Thompson sampling for online matrix-factorization recommendation. Cortes C, Lawrence N, Lee D, Sugiyama M, Garnett R, eds. Adv. Neural Inform. Processing Systems, vol. 28 (Curran Associates Inc., Red Hook, NY), 1297–1305.
  • Kermisch R, Burns D (2018) Is pricing killing your profits? Bain & Company. Accessed June 16, 2018, http://www.bain.com/publications/articles/is-pricing-killing-your-profits.aspx.
  • Lai TL, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1):4–22.
  • Lambrecht A, Tucker C, Wiertz C (2018) Advertising to early trend propagators: Evidence from Twitter. Marketing Sci. 37(2):177–199.
  • Maatouk H, Bay X (2017) Gaussian process emulators for computer experiments with inequality constraints. Math. Geosciences 49(5):557–582.
  • Miao S, Wang Y (2024) Demand balancing in primal-dual optimization for blind network revenue management. Preprint, submitted April 6, https://arxiv.org/abs/2404.04467.
  • Micchelli CA, Xu Y, Zhang H (2006) Universal kernels. J. Machine Learn. Res. 7(12):2651–2667.
  • Misra K, Schwartz EM, Abernethy J (2019) Dynamic online pricing with incomplete information using multiarmed bandit experiments. Marketing Sci. 38(2):226–252.
  • Nair H (2007) Intertemporal price discrimination with forward-looking consumers: Application to the US market for console video-games. Quant. Marketing Econom. 5(3):239–292.
  • Oren SS, Smith SA, Wilson RB (1982) Nonlinear pricing in markets with interdependent demand. Marketing Sci. 1(3):287–313.
  • Rao RC, Bass FM (1985) Competition, strategy, and price dynamics: A theoretical and empirical investigation. J. Marketing Res. 22(3):283–296.
  • Ringbeck D, Huchzermeier A (2019) Dynamic pricing and learning: An application of Gaussian process regression. Preprint, submitted June 24, http://dx.doi.org/10.2139/ssrn.3406293.
  • Rothschild M (1974) A two-armed bandit theory of market pricing. J. Econom. Theory 9(2):185–202.
  • Rubel O (2013) Stochastic competitive entries and dynamic pricing. Eur. J. Oper. Res. 231(2):381–392.
  • Sahni NS, Nair HS (2020) Does advertising serve as a signal? Evidence from a field experiment in mobile search. Rev. Econom. Stud. 87(3):1529–1564.
  • Schwartz EM, Bradlow ET, Fader PS (2017) Customer acquisition via display advertising using multi-armed bandit experiments. Marketing Sci. 36(4):500–522.
  • Shahriari B, Swersky K, Wang Z, Adams RP, De Freitas N (2015) Taking the human out of the loop: A review of Bayesian optimization. Proc. IEEE 104(1):148–175.
  • Simester D, Hu Y, Brynjolfsson E, Anderson ET (2009) Dynamics of retail advertising: Evidence from a field experiment. Econom. Inquiry 47(3):482–499.
  • Srinivas N, Krause A, Kakade SM, Seeger M (2009) Gaussian process optimization in the bandit setting: No regret and experimental design. Preprint, submitted December 21, https://arxiv.org/abs/0912.3995.
  • Thomas M, Morwitz V (2005) Penny wise and pound foolish: The left-digit effect in price cognition. J. Consumer Res. 32(1):54–64.
  • Thompson WR (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3/4):285–294.
  • Tirole J (1988) The Theory of Industrial Organization (MIT Press, Cambridge, MA).
  • Urteaga I, Wiggins CH (2018) Nonparametric Gaussian mixture models for the multi-armed bandit. Preprint, submitted August 8, https://arxiv.org/abs/1808.02932.
  • Wang Y, Chen B, Simchi-Levi D (2021) Multimodal dynamic pricing. Management Sci. 67(10):6136–6152.
  • Williams CK, Rasmussen CE (2006) Gaussian Processes for Machine Learning, vol. 2 (MIT Press, Cambridge, MA).
  • Yu M, Debo L, Kapuscinski R (2016) Strategic waiting for consumer-generated quality information: Dynamic pricing of new experience goods. Management Sci. 62(2):410–435.
  • Zhang L, Chung DJ (2020) Price bargaining and competition in online platforms: An empirical analysis of the daily deal market. Marketing Sci. 39(4):687–706.
  • Zhao H, He J, Zhou D, Zhang T, Gu Q (2023) Variance-dependent regret bounds for linear bandits and reinforcement learning: Adaptivity and computational efficiency. Neu G, Rosasco L, eds. Proc. Thirty Sixth Conf. Learn. Theory, vol. 195 (PMLR, New York), 4977–5020.