Bid Shading in First-Price Auction: Nonstationary Bayesian Multiarmed Bandit Methods for Real-Time Bidding

Published Online:https://doi.org/10.1287/isre.2025.1837

References

  • Abbasi A, Parsons J, Pant G, Sheng ORL, Sarker S (2024) Pathways for design research on artificial intelligence. Inform. Systems Res. 35(2):441–459.LinkGoogle Scholar
  • Alami R (2023) Bayesian change-point detection for bandit feedback in non-stationary environments. Khan E, Gonen M, eds. Proc. 14th Asian Conf. Machine Learn. (PMLR, New York), 17–31.Google Scholar
  • Aramayo N, Schiappacasse M, Goic M (2023) A multiarmed bandit approach for house ads recommendations. Marketing Sci. 42(2):271–292.LinkGoogle Scholar
  • Auer P (2002) Using confidence bounds for exploitation-exploration trade-offs. J. Machine Learn. Res. 3:397–422.Google Scholar
  • Auer P, Cesa-Bianchi N, Fischer P (2002a) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2):235–256.CrossrefGoogle Scholar
  • Auer P, Cesa-Bianchi N, Freund Y, Schapire RE (1995) Gambling in a rigged casino: The adversarial multi-armed bandit problem. Proc. IEEE 36th Annual Foundations Comput. Sci. (IEEE, Piscataway, NJ), 322–331.Google Scholar
  • Auer P, Cesa-Bianchi N, Freund Y, Schapire RE (2002b) The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1):48–77.CrossrefGoogle Scholar
  • Balseiro S, Golrezaei N, Mahdian M, Mirrokni V, Schneider J (2019) Contextual bandits with cross-learning. Wallach HM, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox EB, eds. Proc. 33rd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 9679–9688.Google Scholar
  • Bergstra J, Bengio Y (2012) Random search for hyper-parameter optimization. J. Machine Learn. Res. 13:281–305.Google Scholar
  • Besbes O, Gur Y, Zeevi A (2014) Stochastic multi-armed-bandit problem with non-stationary rewards. Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ, eds. NIPS’14: Proc. 28th Internat. Conf. Neural Inform. Processing Systems (MIT Press, Cambridge, MA), 199–207.Google Scholar
  • Besbes O, Gur Y, Zeevi A (2015) Non-stationary stochastic optimization. Oper. Res. 63(5):1227–1244.LinkGoogle Scholar
  • Bubeck S, Cesa-Bianchi N (2012) Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations Trends Machine Learn. 5(1):1–122.CrossrefGoogle Scholar
  • Cao Y, Wen Z, Kveton B, Xie Y (2019) Nearly optimal adaptive procedure with change detection for piecewise-stationary bandit. Chaudhuri K, Sugiyama M, eds. Proc. 22nd Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 418–427.Google Scholar
  • Choi WJ, Sayedi A (2019) Learning in online advertising. Marketing Sci. 38(4):584–608.LinkGoogle Scholar
  • Choi WJ, Sayedi A (2024) Agency market power and information disclosure in online advertising. Marketing Sci. 43(6):1279–1298.LinkGoogle Scholar
  • Choi H, Mela CF, Balseiro SR, Leary A (2020) Online display advertising markets: A literature review and future directions. Inform. Systems Res. 31(2):556–575.LinkGoogle Scholar
  • Christopher RM, Park S, Han SP, Kim MK (2022) Bypassing performance optimizers of real time bidding systems in display ad valuation. Inform. Systems Res. 33(2):399–412.LinkGoogle Scholar
  • Despotakis S, Ravi R, Sayedi A (2021) First-price auctions in online display advertising. J. Marketing Res. 58(5):888–907.CrossrefGoogle Scholar
  • Ding W, Qin T, Zhang XD, Liu TY (2013) Multi-armed bandit with budget constraint and variable costs. Proc. AAAI Conf. Artificial Intelligence, vol. 27 (AAAI, Washington, DC), 232–238.Google Scholar
  • Engelbrecht-Wiggans R, Katok E (2008) Regret and feedback information in first-price sealed-bid auctions. Management Sci. 54(4):808–819.LinkGoogle Scholar
  • Engelbrecht-Wiggans R, Katok E (2009) A direct test of risk aversion and regret in first price sealed-bid auctions. Decision Anal. 6(2):75–86.LinkGoogle Scholar
  • Filiz-Ozbay E, Ozbay EY (2007) Auctions with anticipated regret: Theory and experiment. Amer. Econom. Rev. 97(4):1407–1418.CrossrefGoogle Scholar
  • Galgana R, Golrezaei N (2025) Learning in repeated multiunit pay-as-bid auctions. Manufacturing Service Oper. Management 27(1):200–229.LinkGoogle Scholar
  • Garivier A, Moulines E (2011) On upper-confidence bound policies for switching bandit problems. Kivinen J, Szepesvári C, Ukkonen E, Zeugmann T, eds. Algorithmic Learn. Theory. ALT 2011, Lecture Notes in Computer Science, vol. 6925 (Springer, Berlin, Heidelberg), 174–188.Google Scholar
  • Gligorijevic D, Zhou T, Shetty B, Kitts B, Pan S, Pan J, Flores A (2020) Bid shading in the brave new world of first-price auctions. CIKM’20: Proc. 29th ACM Internat. Conf. Inform. Knowledge Management (Association for Computing Machinery, New York), 2453–2460.Google Scholar
  • Gong Z, Niu L, Zhao Y, Xu M, Zhang H, Zheng Z, Zhang Z, et al. (2023) MEBS: Multi-task end-to-end bid shading for multi-slot display advertising. CIKM’23: Proc. 32nd ACM Internat. Conf. Inform. Knowledge Management (Association for Computing Machinery, New York), 4588–4594.Google Scholar
  • Guo M, Zhang W, Yuan C, Jia B, Song G, Hua H, Wang S, Zhang Q (2024) A Bayesian multi-armed bandit algorithm for bid shading in online display advertising. CIKM’24: Proc. 33rd ACM Internat. Conf. Inform. Knowledge Management (Association for Computing Machinery, New York), 4506–4513.Google Scholar
  • Gupta S, Chaudhari S, Joshi G, Yağan O (2021) Multi-armed bandits with correlated arms. IEEE Trans. Inform. Theory 67(10):6711–6732.CrossrefGoogle Scholar
  • Han Y, Weissman T, Zhou Z (2025) Optimal no-regret learning in repeated first-price auctions. Oper. Res. 73(1):209–238.LinkGoogle Scholar
  • Han Y, Zhou Z, Flores A, Ordentlich E, Weissman T (2020) Learning to bid optimally and efficiently in adversarial first-price auctions. Preprint, submitted July 9, https://arxiv.org/abs/2007.04568.Google Scholar
  • Jadbabaie A, Rakhlin A, Shahrampour S, Sridharan K (2015) Online optimization: Competing with dynamic comparators. Proc. 18th Internat. Conf. Artificial Intelligence Statist., vol. 38 (PMLR, New York), 398–406.Google Scholar
  • Jiang C (2015) Online advertisements and multi-armed bandits. Doctoral dissertation, University of Illinois at Urbana-Champaign, Champaign.Google Scholar
  • Karlsson N, Sang Q (2021) Adaptive bid shading optimization of first-price ad inventory. 2021 Amer. Control Conference (ACC) (IEEE, Piscataway, NJ), 4983–4990.Google Scholar
  • Kasberger B, Schlag KH (2024) Robust bidding in first-price auctions: How to bid without knowing what others are doing. Management Sci. 70(7):4219–4235.LinkGoogle Scholar
  • Katuščák P, Michelucci F, Zajíček M (2015) Does feedback really matter in one-shot first-price auctions? J. Econom. Behav. Organ. 119:139–152.CrossrefGoogle Scholar
  • Komiyama J, Fouché E, Honda J (2024) Finite-time analysis of globally nonstationary multi-armed bandits. J. Machine Learn. Res. 25(112):1–56.Google Scholar
  • Komiyama J, Honda J, Nakagawa H (2015) Optimal regret analysis of Thompson sampling in stochastic multi-armed bandit problem with multiple plays. Bach F, Blei D, eds. ICML’15: Proc. 32nd Internat. Conf. Machine Learn., vol. 37 (PMLR, New York), 1152–1161.Google Scholar
  • Li X, Zhang MM, Wang Z, Tong Y (2022) Arbitrary distribution modeling with censorship in real-time bidding advertising. KDD’22: Proc. 28th ACM SIGKDD Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 3250–3258.Google Scholar
  • Liberali G, Ferecatu A (2022) Morphing for consumer dynamics: Bandits meet hidden Markov models. Marketing Sci. 41(4):769–794.LinkGoogle Scholar
  • Lin Q, Zheng Z, Wu F (2024) Robust auto-bidding strategies for online advertising. KDD’24: Proc. 30th ACM SIGKDD Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 1804–1815.Google Scholar
  • Mao W, Zheng Z, Wu F, Chen G (2018) Online pricing for revenue maximization with unknown time discounting valuations. Lang J, ed. IJCAI’18: Proc. 27th Internat. Joint Conf. Artificial Intelligence (AAAI Press, Washington, DC), 440–446.Google Scholar
  • Misra K, Schwartz EM, Abernethy J (2019) Dynamic online pricing with incomplete information using multiarmed bandit experiments. Marketing Sci. 38(2):226–252.LinkGoogle Scholar
  • Ou W, Chen B, Dai X, Zhang W, Liu W, Tang R, Yu Y (2023a) A survey on bid optimization in real-time bidding display advertising. ACM Trans. Knowledge Discovery Data 18(3):1–31.Google Scholar
  • Ou W, Chen B, Yang Y, Dai X, Liu W, Zhang W, Tang R, Yu Y (2023b) Deep landscape forecasting in multi-slot real-time bidding. KDD’23: Proc. 29th ACM SIGKDD Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 4685–4695.Google Scholar
  • Pan S, Kitts B, Zhou T, He H, Shetty B, Flores A, Gligorijevic D, et al. (2020) Bid shading by win-rate estimation and surplus maximization. AdKDD’20: SIG Conf. Knowledge Discovery Data Mining (ACM, New York), 6 pages.Google Scholar
  • Pandey S, Chakrabarti D, Agarwal D (2007) Multi-armed bandit problems with dependent arms. Ghahramani Z, ed. ICML ‘07: Proc. 24th Internat. Conf. Machine Learn. (Association for Computing Machinery, New York), 721–728.Google Scholar
  • Ren K, Qin J, Zheng L, Yang Z, Zhang W, Yu Y (2019) Deep landscape forecasting for real-time bidding advertising. KDD’23: Proc. 25th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 363–372.Google Scholar
  • Rhuggenaath J, Akcay A, Zhang Y, Kaymak U (2022) Setting reserve prices in second-price auctions with unobserved bids. INFORMS J. Comput. 34(6):2950–2967.LinkGoogle Scholar
  • Riley JG, Samuelson WF (1981) Optimal auctions. Amer. Econom. Rev. 71(3):381–392.Google Scholar
  • Sayedi A (2018) Real-time bidding in online display advertising. Marketing Sci. 37(4):553–568.LinkGoogle Scholar
  • Schwartz EM, Bradlow ET, Fader PS (2017) Customer acquisition via display advertising using multi-armed bandit experiments. Marketing Sci. 36(4):500–522.LinkGoogle Scholar
  • Sunar N, Wang Z (2024) Optimal multi-armed bandit with dependent arms. Preprint, submitted May 30, https://doi.org/10.2139/ssrn.4847785.Google Scholar
  • Tilli T, Espinosa-Leal L (2021) Multi-armed bandits for bid shading in first-price real-time bidding auctions. J. Intelligent Fuzzy Systems 41(6):6111–6125.Google Scholar
  • Vickrey W (1961) Counterspeculation, auctions, and competitive sealed tenders. J. Finance 16(1):8–37.CrossrefGoogle Scholar
  • Waisman C, Nair HS, Carrion C (2025) Online causal inference for advertising in real-time bidding auctions. Marketing Sci. 44(1):176–195.LinkGoogle Scholar
  • Wang C, Guo P (2017) Behavioral models for first-price sealed-bid auctions with the one-shot decision theory. Eur. J. Oper. Res. 261(3):994–1000.CrossrefGoogle Scholar
  • Wang J, Zhang W, Yuan S (2017) Display advertising with real-time bidding (RTB) and behavioural targeting. Foundations Trends Inform. Retrieval 11(4–5):297–435.CrossrefGoogle Scholar
  • Wilson R (1985) Game-Theoretic Analyses of Trading Processes (Institute for Mathematical Studies in the Social Sciences, Stanford University, Stanford, CA).Google Scholar
  • Wu WCH, Yeh MY, Chen MS (2015) Predicting winning price in real time bidding with censored data. KDD’15: Proc. 21st ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 1305–1314.Google Scholar
  • Wu W, Yeh MY, Chen MS (2018) Deep censored learning of the winning price in the real time bidding. KDD’18: Proc. 24th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 2526–2535.Google Scholar
  • Xu Y, Ni B, Shen W, Wang X, Wang Z, Xue Y, Tang P (2024) Simultaneous optimization of bid shading and internal auction for demand-side platforms. Proc. AAAI Conf. Artificial Intelligence, vol. 38 (AAAI Press, Washington, DC), 9935–9943.Google Scholar
  • Yang T, Zhang L, Jin R, Yi J (2016) Tracking slowly moving clairvoyant: Optimal dynamic regret of online learning with true and noisy gradient. Balcan MF, Weinberger KQ, eds. Proc. 33rd Internat. Conf. Machine Learn., vol. 48 (PMLR, New York), 449–457.Google Scholar
  • Zhang W, Yuan S, Wang J (2014) Optimal real-time bidding for display advertising. KDD’14: Proc. 20th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 1077–1086.Google Scholar
  • Zhang W, Kitts B, Han Y, Zhou Z, Mao T, He H, Pan S, Flores A, Gultekin S, Weissman T (2021) MEOW: A space-efficient nonparametric bid shading algorithm. KDD’21: Proc. 27th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 3928–3936.Google Scholar
  • Zhou T, He H, Pan S, Karlsson N, Shetty B, Kitts B, Gligorijevic D, et al. (2021) An efficient deep distribution network for bid shading in first-price auctions. KDD’21: Proc. 27th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 3996–4004.Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.