Bid Shading in First-Price Auction: Nonstationary Bayesian Multiarmed Bandit Methods for Real-Time Bidding
Published Online: 1 Jun 2026 https://doi.org/10.1287/isre.2025.1837
References
- (2024) Pathways for design research on artificial intelligence. Inform. Systems Res. 35(2):441–459.
- (2023) Bayesian change-point detection for bandit feedback in non-stationary environments. Khan E, Gonen M, eds. Proc. 14th Asian Conf. Machine Learn. (PMLR, New York), 17–31.
- (2023) A multiarmed bandit approach for house ads recommendations. Marketing Sci. 42(2):271–292.
- (2002) Using confidence bounds for exploitation-exploration trade-offs. J. Machine Learn. Res. 3:397–422.
- (2002a) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2):235–256.
- (1995) Gambling in a rigged casino: The adversarial multi-armed bandit problem. Proc. IEEE 36th Annual Foundations Comput. Sci. (IEEE, Piscataway, NJ), 322–331.
- (2002b) The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1):48–77.
- (2019) Contextual bandits with cross-learning. Wallach HM, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox EB, eds. Proc. 33rd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 9679–9688.
- (2012) Random search for hyper-parameter optimization. J. Machine Learn. Res. 13:281–305.
- (2014) Stochastic multi-armed-bandit problem with non-stationary rewards. Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ, eds. NIPS’14: Proc. 28th Internat. Conf. Neural Inform. Processing Systems (MIT Press, Cambridge, MA), 199–207.
- (2015) Non-stationary stochastic optimization. Oper. Res. 63(5):1227–1244.
- (2012) Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations Trends Machine Learn. 5(1):1–122.
- (2019) Nearly optimal adaptive procedure with change detection for piecewise-stationary bandit. Chaudhuri K, Sugiyama M, eds. Proc. 22nd Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 418–427.
- (2019) Learning in online advertising. Marketing Sci. 38(4):584–608.
- (2024) Agency market power and information disclosure in online advertising. Marketing Sci. 43(6):1279–1298.
- (2020) Online display advertising markets: A literature review and future directions. Inform. Systems Res. 31(2):556–575.
- (2022) Bypassing performance optimizers of real time bidding systems in display ad valuation. Inform. Systems Res. 33(2):399–412.
- (2021) First-price auctions in online display advertising. J. Marketing Res. 58(5):888–907.
- (2013) Multi-armed bandit with budget constraint and variable costs. Proc. AAAI Conf. Artificial Intelligence, vol. 27 (AAAI, Washington, DC), 232–238.
- (2008) Regret and feedback information in first-price sealed-bid auctions. Management Sci. 54(4):808–819.
- (2009) A direct test of risk aversion and regret in first price sealed-bid auctions. Decision Anal. 6(2):75–86.
- (2007) Auctions with anticipated regret: Theory and experiment. Amer. Econom. Rev. 97(4):1407–1418.
- (2025) Learning in repeated multiunit pay-as-bid auctions. Manufacturing Service Oper. Management 27(1):200–229.
- (2011) On upper-confidence bound policies for switching bandit problems. Kivinen J, Szepesvári C, Ukkonen E, Zeugmann T, eds. Algorithmic Learn. Theory. ALT 2011, Lecture Notes in Computer Science, vol. 6925 (Springer, Berlin, Heidelberg), 174–188.
- (2020) Bid shading in the brave new world of first-price auctions. CIKM’20: Proc. 29th ACM Internat. Conf. Inform. Knowledge Management (Association for Computing Machinery, New York), 2453–2460.
- (2023) MEBS: Multi-task end-to-end bid shading for multi-slot display advertising. CIKM’23: Proc. 32nd ACM Internat. Conf. Inform. Knowledge Management (Association for Computing Machinery, New York), 4588–4594.
- (2024) A Bayesian multi-armed bandit algorithm for bid shading in online display advertising. CIKM’24: Proc. 33rd ACM Internat. Conf. Inform. Knowledge Management (Association for Computing Machinery, New York), 4506–4513.
- (2021) Multi-armed bandits with correlated arms. IEEE Trans. Inform. Theory 67(10):6711–6732.
- (2025) Optimal no-regret learning in repeated first-price auctions. Oper. Res. 73(1):209–238.
- (2020) Learning to bid optimally and efficiently in adversarial first-price auctions. Preprint, submitted July 9, https://arxiv.org/abs/2007.04568.
- (2015) Online optimization: Competing with dynamic comparators. Proc. 18th Internat. Conf. Artificial Intelligence Statist., vol. 38 (PMLR, New York), 398–406.
- (2015) Online advertisements and multi-armed bandits. Doctoral dissertation, University of Illinois at Urbana-Champaign, Champaign.
- (2021) Adaptive bid shading optimization of first-price ad inventory. 2021 Amer. Control Conference (ACC) (IEEE, Piscataway, NJ), 4983–4990.
- (2024) Robust bidding in first-price auctions: How to bid without knowing what others are doing. Management Sci. 70(7):4219–4235.
- (2015) Does feedback really matter in one-shot first-price auctions? J. Econom. Behav. Organ. 119:139–152.
- (2024) Finite-time analysis of globally nonstationary multi-armed bandits. J. Machine Learn. Res. 25(112):1–56.
- (2015) Optimal regret analysis of Thompson sampling in stochastic multi-armed bandit problem with multiple plays. Bach F, Blei D, eds. ICML’15: Proc. 32nd Internat. Conf. Machine Learn., vol. 37 (PMLR, New York), 1152–1161.
- (2022) Arbitrary distribution modeling with censorship in real-time bidding advertising. KDD’22: Proc. 28th ACM SIGKDD Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 3250–3258.
- (2022) Morphing for consumer dynamics: Bandits meet hidden Markov models. Marketing Sci. 41(4):769–794.
- (2024) Robust auto-bidding strategies for online advertising. KDD’24: Proc. 30th ACM SIGKDD Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 1804–1815.
- (2018) Online pricing for revenue maximization with unknown time discounting valuations. Lang J, ed. IJCAI’18: Proc. 27th Internat. Joint Conf. Artificial Intelligence (AAAI Press, Washington, DC), 440–446.
- (2019) Dynamic online pricing with incomplete information using multiarmed bandit experiments. Marketing Sci. 38(2):226–252.
- (2023a) A survey on bid optimization in real-time bidding display advertising. ACM Trans. Knowledge Discovery Data 18(3):1–31.
- (2023b) Deep landscape forecasting in multi-slot real-time bidding. KDD’23: Proc. 29th ACM SIGKDD Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 4685–4695.
- (2020) Bid shading by win-rate estimation and surplus maximization. AdKDD’20: SIG Conf. Knowledge Discovery Data Mining (ACM, New York), 6 pages.
- (2007) Multi-armed bandit problems with dependent arms. Ghahramani Z, ed. ICML’07: Proc. 24th Internat. Conf. Machine Learn. (Association for Computing Machinery, New York), 721–728.
- (2019) Deep landscape forecasting for real-time bidding advertising. KDD’19: Proc. 25th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 363–372.
- (2022) Setting reserve prices in second-price auctions with unobserved bids. INFORMS J. Comput. 34(6):2950–2967.
- (1981) Optimal auctions. Amer. Econom. Rev. 71(3):381–392.
- (2018) Real-time bidding in online display advertising. Marketing Sci. 37(4):553–568.
- (2017) Customer acquisition via display advertising using multi-armed bandit experiments. Marketing Sci. 36(4):500–522.
- (2024) Optimal multi-armed bandit with dependent arms. Preprint, submitted May 30, https://doi.org/10.2139/ssrn.4847785.
- (2021) Multi-armed bandits for bid shading in first-price real-time bidding auctions. J. Intelligent Fuzzy Systems 41(6):6111–6125.
- (1961) Counterspeculation, auctions, and competitive sealed tenders. J. Finance 16(1):8–37.
- (2025) Online causal inference for advertising in real-time bidding auctions. Marketing Sci. 44(1):176–195.
- (2017) Behavioral models for first-price sealed-bid auctions with the one-shot decision theory. Eur. J. Oper. Res. 261(3):994–1000.
- (2017) Display advertising with real-time bidding (RTB) and behavioural targeting. Foundations Trends Inform. Retrieval 11(4–5):297–435.
- (1985) Game-Theoretic Analyses of Trading Processes (Institute for Mathematical Studies in the Social Sciences, Stanford University, Stanford, CA).
- (2015) Predicting winning price in real time bidding with censored data. KDD’15: Proc. 21st ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 1305–1314.
- (2018) Deep censored learning of the winning price in the real time bidding. KDD’18: Proc. 24th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 2526–2535.
- (2024) Simultaneous optimization of bid shading and internal auction for demand-side platforms. Proc. AAAI Conf. Artificial Intelligence, vol. 38 (AAAI Press, Washington, DC), 9935–9943.
- (2016) Tracking slowly moving clairvoyant: Optimal dynamic regret of online learning with true and noisy gradient. Balcan MF, Weinberger KQ, eds. Proc. 33rd Internat. Conf. Machine Learn., vol. 48 (PMLR, New York), 449–457.
- (2014) Optimal real-time bidding for display advertising. KDD’14: Proc. 20th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 1077–1086.
- (2021) MEOW: A space-efficient nonparametric bid shading algorithm. KDD’21: Proc. 27th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 3928–3936.
- (2021) An efficient deep distribution network for bid shading in first-price auctions. KDD’21: Proc. 27th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 3996–4004.

