Bid Shading in First-Price Auction: Nonstationary Bayesian Multiarmed Bandit Methods for Real-Time Bidding
Abstract
In real-time bidding systems, ad exchanges and supply-side platforms are switching from the second-price auction (SPA) to the first-price auction (FPA), in which the winning advertiser pays exactly the amount it bid. To mitigate the risk of overpaying, advertisers employ bid shading strategies that set bids below their true valuations. Such strategies balance the trade-off between maximizing the probability of winning and minimizing cost, and they typically rely on the simplifying assumption that the market price distribution is stationary over time. However, the real-world market price distribution is inherently nonstationary, and existing bidding strategies may fail under such conditions, especially because advertisers lack visibility into competitors’ bids. We therefore propose two complementary Bayesian multiarmed bandit methods for nonstationary bid shading, namely, BayesMAB-NS and BayesMAB-CD. Our methods incorporate dependencies among arms, so that the outcome of one bid informs the rewards and selection criteria of the others, while progressively refining the estimate of the market price distribution through Bayesian updates. BayesMAB-NS employs predefined segmentation and time discounting to adapt to evolving environments when prior structural knowledge about market dynamics is available. BayesMAB-CD, in contrast, introduces adaptive change-point detection and soft posterior resets to track unknown changes in the market price distribution automatically. Empirical evaluations using simulated data, real-world offline data sets, and online replay demonstrate the strong performance of the BayesMAB family relative to nonstationary MAB baselines. In addition, the performance of BayesMAB and BayesMAB-NS is further validated in large-scale online A/B tests on a large Chinese online display advertising platform. The online results show reductions in cost per mille and cost per action of up to 18.68% and 17.71%, respectively, along with a 17.78% increase in return on investment, without compromising winning rates. Our methods have been deployed online and are used in practice to handle large volumes of traffic daily.
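The ideas summarized above (dependent arms, Bayesian updating, and time discounting) can be illustrated with a minimal sketch. This is not the authors' BayesMAB-NS implementation: the class name `DependentDiscountedBandit`, the Beta-Bernoulli posterior, the Thompson-sampling selection rule, and the parameters `value` and `gamma` are all assumptions chosen for illustration. The sketch relies only on two facts about first-price auctions: a win at bid b implies a win at every higher bid, and a loss at b implies a loss at every lower bid.

```python
import random


class DependentDiscountedBandit:
    """Hypothetical sketch: one Beta posterior per candidate bid level."""

    def __init__(self, bids, value, gamma=0.99, seed=0):
        self.bids = sorted(bids)       # candidate bid levels (the arms)
        self.value = value             # advertiser's valuation (assumed known)
        self.gamma = gamma             # discount factor in (0, 1]
        self.alpha = [1.0] * len(self.bids)  # Beta(1, 1) priors: win counts
        self.beta = [1.0] * len(self.bids)   # Beta(1, 1) priors: loss counts
        self.rng = random.Random(seed)

    def select(self):
        """Thompson sampling on the expected surplus (value - bid) * P(win)."""
        best_idx, best_score = 0, float("-inf")
        for i, bid in enumerate(self.bids):
            p_win = self.rng.betavariate(self.alpha[i], self.beta[i])
            score = (self.value - bid) * p_win
            if score > best_score:
                best_idx, best_score = i, score
        return best_idx

    def update(self, idx, won):
        """Discount old evidence, then propagate the outcome to other arms."""
        # Shrink accumulated counts toward the prior so old data fades.
        for k in range(len(self.bids)):
            self.alpha[k] = 1.0 + self.gamma * (self.alpha[k] - 1.0)
            self.beta[k] = 1.0 + self.gamma * (self.beta[k] - 1.0)
        if won:
            # Winning at this bid implies winning at every higher bid.
            for k in range(idx, len(self.bids)):
                self.alpha[k] += 1.0
        else:
            # Losing at this bid implies losing at every lower bid.
            for k in range(idx + 1):
                self.beta[k] += 1.0
```

In use, one would call `select()` to choose a bid level for each auction, submit `bids[idx]`, and then call `update(idx, won)` with the observed outcome. Setting `gamma` below 1 lets the posterior forget stale evidence; a change-point variant would instead reset or soften the counts when a shift is detected.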
History: Martin Bichler, Senior Editor; Mochen Yang, Associate Editor.
Funding: This research was supported by the National Natural Science Foundation of China (NSFC) [Grant 72401210], the China Postdoctoral Science Foundation funded project [Grant 2024M752282], the Natural Science Foundation of Sichuan Province [Grant 2025NSFSC1998], the Hong Kong University of Science and Technology [Grant B000-0172-R9281], the General Research Fund of the Research Grants Council of Hong Kong [Grant 17209225], the seed grant of the HKU Shanghai Intelligent Computing Research Center, and the seed grant of Shenzhen Loop Area Institute.
Supplemental Material: The online appendix is available at https://doi.org/10.1287/isre.2025.1837.

