Bid Shading in First-Price Auction: Nonstationary Bayesian Multiarmed Bandit Methods for Real-Time Bidding

Mengzhuo Guo
Mengzhuo Guo
[email protected]
https://orcid.org/0000-0003-3559-733X
Business School, Sichuan University, Chengdu, Sichuan Province 610065, China
Search for more papers by this author
,
Wuqi Zhang
Wuqi Zhang
[email protected]
https://orcid.org/0000-0003-4236-1827
Tencent, Nanshan District, Shenzhen, Guangdong Province 518054, China
Search for more papers by this author
,
Yiwen Shen
Yiwen Shen
[email protected]
https://orcid.org/0000-0002-9170-9044
School of Business and Management, Hong Kong University of Science and Technology, Kowloon, Hong Kong 999077, China
Search for more papers by this author
,
Qingpeng Zhang
Corresponding Author
Qingpeng Zhang
[email protected]
https://orcid.org/0000-0002-6819-0686
Musketeers Foundation Institute of Data Science, The University of Hong Kong, Hong Kong, China; and Department of Pharmacology and Pharmacy, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China; and HKU Shanghai Intelligent Computing Research Center, Shanghai, China; and Shenzhen Loop Area Institute, Shenzhen, China
Search for more papers by this author

Business School, Sichuan University, Chengdu, Sichuan Province 610065, China

Search for more papers by this author

Wuqi Zhang

[email protected]

https://orcid.org/0000-0003-4236-1827

Tencent, Nanshan District, Shenzhen, Guangdong Province 518054, China

Search for more papers by this author

Yiwen Shen

[email protected]

https://orcid.org/0000-0002-9170-9044

School of Business and Management, Hong Kong University of Science and Technology, Kowloon, Hong Kong 999077, China

Search for more papers by this author

Qingpeng Zhang

Corresponding Author

Qingpeng Zhang

[email protected]

https://orcid.org/0000-0002-6819-0686

Musketeers Foundation Institute of Data Science, The University of Hong Kong, Hong Kong, China; and Department of Pharmacology and Pharmacy, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China; and HKU Shanghai Intelligent Computing Research Center, Shanghai, China; and Shenzhen Loop Area Institute, Shenzhen, China

Search for more papers by this author

Published Online:1 Jun 2026https://doi.org/10.1287/isre.2025.1837

References

Abbasi A, Parsons J, Pant G, Sheng ORL, Sarker S (2024) Pathways for design research on artificial intelligence. Inform. Systems Res. 35(2):441–459.Link, Google Scholar
Alami R (2023) Bayesian change-point detection for bandit feedback in non-stationary environments. Khan E, Gonen M, eds. Proc. 14th Asian Conf. Machine Learn. (PMLR, New York), 17–31.Google Scholar
Aramayo N, Schiappacasse M, Goic M (2023) A multiarmed bandit approach for house ads recommendations. Marketing Sci. 42(2):271–292.Link, Google Scholar
Auer P (2002) Using confidence bounds for exploitation-exploration trade-offs. J. Machine Learn. Res. 3:397–422.Google Scholar
Auer P, Cesa-Bianchi N, Fischer P (2002a) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2):235–256.Crossref, Google Scholar
Auer P, Cesa-Bianchi N, Freund Y, Schapire RE (1995) Gambling in a rigged casino: The adversarial multi-armed bandit problem. Proc. IEEE 36th Annual Foundations Comput. Sci. (IEEE, Piscataway, NJ), 322–331.Google Scholar
Auer P, Cesa-Bianchi N, Freund Y, Schapire RE (2002b) The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1):48–77.Crossref, Google Scholar
Balseiro S, Golrezaei N, Mahdian M, Mirrokni V, Schneider J (2019) Contextual bandits with cross-learning. Wallach HM, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox EB, eds. Proc. 33rd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 9679–9688.Google Scholar
Bergstra J, Bengio Y (2012) Random search for hyper-parameter optimization. J. Machine Learn. Res. 13:281–305.Google Scholar
Besbes O, Gur Y, Zeevi A (2014) Stochastic multi-armed-bandit problem with non-stationary rewards. Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ, eds. NIPS’14: Proc. 28th Internat. Conf. Neural Inform. Processing Systems (MIT Press, Cambridge, MA), 199–207.Google Scholar
Besbes O, Gur Y, Zeevi A (2015) Non-stationary stochastic optimization. Oper. Res. 63(5):1227–1244.Link, Google Scholar
Bubeck S, Cesa-Bianchi N (2012) Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations Trends Machine Learn. 5(1):1–122.Crossref, Google Scholar
Cao Y, Wen Z, Kveton B, Xie Y (2019) Nearly optimal adaptive procedure with change detection for piecewise-stationary bandit. Chaudhuri K, Sugiyama M, eds. Proc. 22nd Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 418–427.Google Scholar
Choi WJ, Sayedi A (2019) Learning in online advertising. Marketing Sci. 38(4):584–608.Link, Google Scholar
Choi WJ, Sayedi A (2024) Agency market power and information disclosure in online advertising. Marketing Sci. 43(6):1279–1298.Link, Google Scholar
Choi H, Mela CF, Balseiro SR, Leary A (2020) Online display advertising markets: A literature review and future directions. Inform. Systems Res. 31(2):556–575.Link, Google Scholar
Christopher RM, Park S, Han SP, Kim MK (2022) Bypassing performance optimizers of real time bidding systems in display ad valuation. Inform. Systems Res. 33(2):399–412.Link, Google Scholar
Despotakis S, Ravi R, Sayedi A (2021) First-price auctions in online display advertising. J. Marketing Res. 58(5):888–907.Crossref, Google Scholar
Ding W, Qin T, Zhang XD, Liu TY (2013) Multi-armed bandit with budget constraint and variable costs. Proc. AAAI Conf. Artificial Intelligence, vol. 27 (AAAI, Washington, DC), 232–238.Google Scholar
Engelbrecht-Wiggans R, Katok E (2008) Regret and feedback information in first-price sealed-bid auctions. Management Sci. 54(4):808–819.Link, Google Scholar
Engelbrecht-Wiggans R, Katok E (2009) A direct test of risk aversion and regret in first price sealed-bid auctions. Decision Anal. 6(2):75–86.Link, Google Scholar
Filiz-Ozbay E, Ozbay EY (2007) Auctions with anticipated regret: Theory and experiment. Amer. Econom. Rev. 97(4):1407–1418.Crossref, Google Scholar
Galgana R, Golrezaei N (2025) Learning in repeated multiunit pay-as-bid auctions. Manufacturing Service Oper. Management 27(1):200–229.Link, Google Scholar
Garivier A, Moulines E (2011) On upper-confidence bound policies for switching bandit problems. Kivinen J, Szepesvári C, Ukkonen E, Zeugmann T, eds. Algorithmic Learn. Theory. ALT 2011, Lecture Notes in Computer Science, vol. 6925 (Springer, Berlin, Heidelberg), 174–188.Google Scholar
Gligorijevic D, Zhou T, Shetty B, Kitts B, Pan S, Pan J, Flores A (2020) Bid shading in the brave new world of first-price auctions. CIKM’20: Proc. 29th ACM Internat. Conf. Inform. Knowledge Management (Association for Computing Machinery, New York), 2453–2460.Google Scholar
Gong Z, Niu L, Zhao Y, Xu M, Zhang H, Zheng Z, Zhang Z, et al. (2023) MEBS: Multi-task end-to-end bid shading for multi-slot display advertising. CIKM’23: Proc. 32nd ACM Internat. Conf. Inform. Knowledge Management (Association for Computing Machinery, New York), 4588–4594.Google Scholar
Guo M, Zhang W, Yuan C, Jia B, Song G, Hua H, Wang S, Zhang Q (2024) A Bayesian multi-armed bandit algorithm for bid shading in online display advertising. CIKM’24: Proc. 33rd ACM Internat. Conf. Inform. Knowledge Management (Association for Computing Machinery, New York), 4506–4513.Google Scholar
Gupta S, Chaudhari S, Joshi G, Yağan O (2021) Multi-armed bandits with correlated arms. IEEE Trans. Inform. Theory 67(10):6711–6732.Crossref, Google Scholar
Han Y, Weissman T, Zhou Z (2025) Optimal no-regret learning in repeated first-price auctions. Oper. Res. 73(1):209–238.Link, Google Scholar
Han Y, Zhou Z, Flores A, Ordentlich E, Weissman T (2020) Learning to bid optimally and efficiently in adversarial first-price auctions. Preprint, submitted July 9, https://arxiv.org/abs/2007.04568.Google Scholar
Jadbabaie A, Rakhlin A, Shahrampour S, Sridharan K (2015) Online optimization: Competing with dynamic comparators. Proc. 18th Internat. Conf. Artificial Intelligence Statist., vol. 38 (PMLR, New York), 398–406.Google Scholar
Jiang C (2015) Online advertisements and multi-armed bandits. Doctoral dissertation, University of Illinois at Urbana-Champaign, Champaign.Google Scholar
Karlsson N, Sang Q (2021) Adaptive bid shading optimization of first-price ad inventory. 2021 Amer. Control Conference (ACC) (IEEE, Piscataway, NJ), 4983–4990.Google Scholar
Kasberger B, Schlag KH (2024) Robust bidding in first-price auctions: How to bid without knowing what others are doing. Management Sci. 70(7):4219–4235.Link, Google Scholar
Katuščák P, Michelucci F, Zajíček M (2015) Does feedback really matter in one-shot first-price auctions? J. Econom. Behav. Organ. 119:139–152.Crossref, Google Scholar
Komiyama J, Fouché E, Honda J (2024) Finite-time analysis of globally nonstationary multi-armed bandits. J. Machine Learn. Res. 25(112):1–56.Google Scholar
Komiyama J, Honda J, Nakagawa H (2015) Optimal regret analysis of Thompson sampling in stochastic multi-armed bandit problem with multiple plays. Bach F, Blei D, eds. ICML’15: Proc. 32nd Internat. Conf. Machine Learn., vol. 37 (PMLR, New York), 1152–1161.Google Scholar
Li X, Zhang MM, Wang Z, Tong Y (2022) Arbitrary distribution modeling with censorship in real-time bidding advertising. KDD’22: Proc. 28th ACM SIGKDD Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 3250–3258.Google Scholar
Liberali G, Ferecatu A (2022) Morphing for consumer dynamics: Bandits meet hidden Markov models. Marketing Sci. 41(4):769–794.Link, Google Scholar
Lin Q, Zheng Z, Wu F (2024) Robust auto-bidding strategies for online advertising. KDD’24: Proc. 30th ACM SIGKDD Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 1804–1815.Google Scholar
Mao W, Zheng Z, Wu F, Chen G (2018) Online pricing for revenue maximization with unknown time discounting valuations. Lang J, ed. IJCAI’18: Proc. 27th Internat. Joint Conf. Artificial Intelligence (AAAI Press, Washington, DC), 440–446.Google Scholar
Misra K, Schwartz EM, Abernethy J (2019) Dynamic online pricing with incomplete information using multiarmed bandit experiments. Marketing Sci. 38(2):226–252.Link, Google Scholar
Ou W, Chen B, Dai X, Zhang W, Liu W, Tang R, Yu Y (2023a) A survey on bid optimization in real-time bidding display advertising. ACM Trans. Knowledge Discovery Data 18(3):1–31.Google Scholar
Ou W, Chen B, Yang Y, Dai X, Liu W, Zhang W, Tang R, Yu Y (2023b) Deep landscape forecasting in multi-slot real-time bidding. KDD’23: Proc. 29th ACM SIGKDD Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 4685–4695.Google Scholar
Pan S, Kitts B, Zhou T, He H, Shetty B, Flores A, Gligorijevic D, et al. (2020) Bid shading by win-rate estimation and surplus maximization. AdKDD’20: SIG Conf. Knowledge Discovery Data Mining (ACM, New York), 6 pages.Google Scholar
Pandey S, Chakrabarti D, Agarwal D (2007) Multi-armed bandit problems with dependent arms. Ghahramani Z, ed. ICML ‘07: Proc. 24th Internat. Conf. Machine Learn. (Association for Computing Machinery, New York), 721–728.Google Scholar
Ren K, Qin J, Zheng L, Yang Z, Zhang W, Yu Y (2019) Deep landscape forecasting for real-time bidding advertising. KDD’23: Proc. 25th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 363–372.Google Scholar
Rhuggenaath J, Akcay A, Zhang Y, Kaymak U (2022) Setting reserve prices in second-price auctions with unobserved bids. INFORMS J. Comput. 34(6):2950–2967.Link, Google Scholar
Riley JG, Samuelson WF (1981) Optimal auctions. Amer. Econom. Rev. 71(3):381–392.Google Scholar
Sayedi A (2018) Real-time bidding in online display advertising. Marketing Sci. 37(4):553–568.Link, Google Scholar
Schwartz EM, Bradlow ET, Fader PS (2017) Customer acquisition via display advertising using multi-armed bandit experiments. Marketing Sci. 36(4):500–522.Link, Google Scholar
Sunar N, Wang Z (2024) Optimal multi-armed bandit with dependent arms. Preprint, submitted May 30, https://doi.org/10.2139/ssrn.4847785.Google Scholar
Tilli T, Espinosa-Leal L (2021) Multi-armed bandits for bid shading in first-price real-time bidding auctions. J. Intelligent Fuzzy Systems 41(6):6111–6125.Google Scholar
Vickrey W (1961) Counterspeculation, auctions, and competitive sealed tenders. J. Finance 16(1):8–37.Crossref, Google Scholar
Waisman C, Nair HS, Carrion C (2025) Online causal inference for advertising in real-time bidding auctions. Marketing Sci. 44(1):176–195.Link, Google Scholar
Wang C, Guo P (2017) Behavioral models for first-price sealed-bid auctions with the one-shot decision theory. Eur. J. Oper. Res. 261(3):994–1000.Crossref, Google Scholar
Wang J, Zhang W, Yuan S (2017) Display advertising with real-time bidding (RTB) and behavioural targeting. Foundations Trends Inform. Retrieval 11(4–5):297–435.Crossref, Google Scholar
Wilson R (1985) Game-Theoretic Analyses of Trading Processes (Institute for Mathematical Studies in the Social Sciences, Stanford University, Stanford, CA).Google Scholar
Wu WCH, Yeh MY, Chen MS (2015) Predicting winning price in real time bidding with censored data. KDD’15: Proc. 21st ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 1305–1314.Google Scholar
Wu W, Yeh MY, Chen MS (2018) Deep censored learning of the winning price in the real time bidding. KDD’18: Proc. 24th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 2526–2535.Google Scholar
Xu Y, Ni B, Shen W, Wang X, Wang Z, Xue Y, Tang P (2024) Simultaneous optimization of bid shading and internal auction for demand-side platforms. Proc. AAAI Conf. Artificial Intelligence, vol. 38 (AAAI Press, Washington, DC), 9935–9943.Google Scholar
Yang T, Zhang L, Jin R, Yi J (2016) Tracking slowly moving clairvoyant: Optimal dynamic regret of online learning with true and noisy gradient. Balcan MF, Weinberger KQ, eds. Proc. 33rd Internat. Conf. Machine Learn., vol. 48 (PMLR, New York), 449–457.Google Scholar
Zhang W, Yuan S, Wang J (2014) Optimal real-time bidding for display advertising. KDD’14: Proc. 20th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 1077–1086.Google Scholar
Zhang W, Kitts B, Han Y, Zhou Z, Mao T, He H, Pan S, Flores A, Gultekin S, Weissman T (2021) MEOW: A space-efficient nonparametric bid shading algorithm. KDD’21: Proc. 27th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 3928–3936.Google Scholar
Zhou T, He H, Pan S, Karlsson N, Shetty B, Kitts B, Gligorijevic D, et al. (2021) An efficient deep distribution network for bid shading in first-price auctions. KDD’21: Proc. 27th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 3996–4004.Google Scholar

cover image Information Systems Research

Articles In Advance

Article Information

Supplemental Material

Metrics

Information

Received:January 22, 2025
Accepted:April 10, 2026
Published Online:June 01, 2026

Cite as

Mengzhuo Guo, Wuqi Zhang, Yiwen Shen, Qingpeng Zhang (2026) Bid Shading in First-Price Auction: Nonstationary Bayesian Multiarmed Bandit Methods for Real-Time Bidding. Information Systems Research 0(0).

https://doi.org/10.1287/isre.2025.1837

Keywords

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Bid Shading in First-Price Auction: Nonstationary Bayesian Multiarmed Bandit Methods for Real-Time Bidding

References

Articles In Advance

Article Information

Supplemental Material

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News