Contextual Bandits with Cross-Learning

Published Online:https://doi.org/10.1287/moor.2022.1313

References

  • [1] Alon N, Cesa-Bianchi N, Dekel O, Koren T (2015) Online learning with feedback graphs: Beyond bandits. Conf. Learn. Theory, 23–35.Google Scholar
  • [2] Araman VF, Caldentey R (2009) Dynamic pricing for nonperishable products with demand learning. Oper. Res. 57(5):1169–1188.LinkGoogle Scholar
  • [3] Auer P, Cesa-Bianchi N, Freund Y, Schapire RE (2002) The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1):48–77.CrossrefGoogle Scholar
  • [4] Balseiro S, Golrezaei N, Mahdian M, Mirrokni V, Schneider J (2019) Contextual bandits with cross-learning. Proc. 33rd Internat. Conf. Neural Inform. Processing Systems 9679–9688.Google Scholar
  • [5] Besbes O, Zeevi A (2009) Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Oper. Res. 57(6):1407–1420.LinkGoogle Scholar
  • [6] Beygelzimer A, Langford J, Li L, Reyzin L, Schapire R (2011) Contextual bandit algorithms with supervised learning guarantees. Proc. 14th Internat. Conf. Artificial Intelligence Statist., 19–26.Google Scholar
  • [7] Braverman M, Mao J, Schneider J, Matthew Weinberg S (2018) Selling to a no-regret buyer. Proc. 2018 ACM Conf. Econom. Comput. (Association for Computing Machinery, New York), 523–538.Google Scholar
  • [8] Bubeck S, Cesa-Bianchi N (2012) Regret Analysis of Stochastic and Nonstochastic Multi-Armed Bandit Problems. Foundations and Trends in Machine Learning, vol. 5.Google Scholar
  • [9] Cai Y, Daskalakis C (2017) Learning multi-item auctions with (or without) samples. Proc. 2017 IEEE 58th Annual Sympos. Foundations of Computer Sci. (IEEE), 516–527. FOCS.Google Scholar
  • [10] Cheung WC, Simchi-Levi D, Wang H (2017) Dynamic pricing and demand learning with limited price experimentation. Oper. Res. 65(6):1722–1731.LinkGoogle Scholar
  • [11] Cox S (2019) Simplifying programmatic: First price auctions for Google Ad Manager. Accessed September 15, 2022, https://www.blog.google/products/admanager/simplifying-programmatic-first-price-auctions-google-ad-manager.Google Scholar
  • [12] den Boer AV (2015) Dynamic pricing and learning: Historical origins, current research, and new directions. Surveys Oper. Res. Management Sci. 20(1):1–18.CrossrefGoogle Scholar
  • [13] den Boer AV, Zwart B (2013) Simultaneously learning and optimizing using controlled variance pricing. Management Sci. 60(3):770–783.LinkGoogle Scholar
  • [14] Dudík M, Haghtalab N, Luo H, Schapire RE, Syrgkanis V, Vaughan JW (2017) Oracle-efficient learning and auction design. FOCS.Google Scholar
  • [15] Farias VF, Van Roy B (2010) Dynamic pricing with a prior on market response. Oper. Res. 58(1):16–29.LinkGoogle Scholar
  • [16] Feng Z, Podimata C, Syrgkanis V (2018) Learning to bid without knowing your value. Proc. 2018 ACM Conf. Econom. Comput. (ACM), 505–522.Google Scholar
  • [17] Golrezaei N, Jaillet P, Liang JCN (2020) No-regret learning in price competitions under consumer reference effects. Adv. Neural Inf. Processing Systems 33.Google Scholar
  • [18] Golrezaei N, Javanmard A, Mirrokni V (2021) Dynamic incentive-aware learning: Robust pricing in contextual auctions. Oper. Res. 69(1):297–314.LinkGoogle Scholar
  • [19] Golrezaei N, Liang JCN, Jaillet P (2019) Incentive-aware contextual pricing with non-parametric market noise. Preprint, submitted, https://arxiv.org/abs/1911.03508.Google Scholar
  • [20] Golrezaei N, Jaillet P, Liang JCN, Mirrokni V (2021) Bidding and pricing in budget and ROI constrained markets. Preprint, submitted July 16, https://arxiv.org/abs/2107.07725.Google Scholar
  • [21] Guan MY, Jiang H (2018) Nonparametric stochastic contextual bandits. 32nd AAAI Conf. Artificial Intelligence.Google Scholar
  • [22] Han Y, Zhou Z, Weissman T (2020) Optimal no-regret learning in repeated first-price auctions. Preprint, submitted March 22, https://arxiv.org/abs/2003.09795.Google Scholar
  • [23] Hartline J, Syrgkanis V, Tardos E (2015) No-regret learning in Bayesian games. Adv. Neural Inform. Processing Systems, 3061–3069.Google Scholar
  • [24] Hazan E, Megiddo N (2007) Online learning with prior knowledge. Internat. Conf. .Comput. Learn. Theory (Springer), 499–513.Google Scholar
  • [25] Kakade SM, Tauman Kalai A, Ligett K (2009) Playing games with approximation algorithms. SIAM J. Comput. 39(3):1088–1106.CrossrefGoogle Scholar
  • [26] Kale S, Reyzin L, Schapire RE (2010) Non-stochastic bandit slate problems. Adv.n Neural Inform. Processing Systems, 1054–1062.Google Scholar
  • [27] Kanoria Y, Nazerzadeh H (2014) Dynamic reserve prices for repeated auctions: Learning from bids. Internat. Conf. Web and Internet Econom. (Springer), 232.Google Scholar
  • [28] Kleinberg R, Niculescu-Mizil A, Sharma Y (2010) Regret bounds for sleeping experts and bandits. Machine Learn. 80(2–3):245–272.CrossrefGoogle Scholar
  • [29] Krause A, Ong CS (2011) Contextual Gaussian process bandit optimization. Adv. Neural Inform. Processing Systems, 2447–2455.Google Scholar
  • [30] Lai TL, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1):4–22.CrossrefGoogle Scholar
  • [31] Langford J, Zhang T (2008) The epoch-greedy algorithm for multi-armed bandits with side information. Platt JC, Koller D, Singer Y, Roweis ST, eds. Advances in Neural Information Processing Systems, vol. 20 (Curran Associates, Inc.), 817–824.Google Scholar
  • [32] Lattimore T, Szepesvári C (2020) Bandit Algorithms (Cambridge University Press).CrossrefGoogle Scholar
  • [33] Li L, Chu W, Langford J, Schapire RE (2010) A contextual-bandit approach to personalized news article recommendation. Proc. 19th Internat. Conf. World Wide Web (ACM), 661–670.Google Scholar
  • [34] Lovász L (1972) A characterization of perfect graphs. J. Combin. Theory Ser. B. 13(2):95–98.CrossrefGoogle Scholar
  • [35] Lovász L (1972) Normal hypergraphs and the perfect graph conjecture. Discrete Math. 2(3):253–267.CrossrefGoogle Scholar
  • [36] Lu T, Pál D, Pál M (2009) Showing relevant ads via context multi-armed bandits. Proc. AISTATS.Google Scholar
  • [37] Mannor S, Shamir O (2011) From bandits to experts: On the value of side-observations. Adv. Neural Inform. Processing Systems, 684–692.Google Scholar
  • [38] Mohri M, Medina AM (2016) Learning algorithms for second-price auctions with reserve. J. Machine Learn. Res. 17(1):2632–2656.Google Scholar
  • [39] Morgenstern J, Roughgarden T (2016) Learning simple auctions. Feldman V, Rakhlin A, Shamir O (eds.) 29th Annual Conf. Learn. Theory (Proc. Machine Learn. Res.), vol. 49 (Columbia University, New York), 1298–1318.Google Scholar
  • [40] Niazadeh R, Golrezaei N, Wang JR, Susan F, Badanidiyuru A (2021) Online learning via offline greedy algorithms: Applications in market design and optimization. Proc. 22nd ACM Conf. Econom. Comput., 737–738.Google Scholar
  • [41] Perchet V, Rigollet P (2013) The multi-armed bandit problem with covariates. Ann. Statist. 41(2):693–721.CrossrefGoogle Scholar
  • [42] Qian W, Yang Y (2016) Kernel estimation and model combination in a bandit problem with covariates. J. Machine Learn. Res. 17(1):5181–5217.Google Scholar
  • [43] Rigollet P, Zeevi A (2010) Nonparametric bandits with covariates. Preprint, submitted March 8, https://arxiv.org/abs/1003.1630.Google Scholar
  • [44] Robbins H (1952) Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. 58(5):527–535.CrossrefGoogle Scholar
  • [45] Slivkins A (2011) Contextual bandits with similarity information. Proc. 24th Annual Conf. Learn. Theory, 679–702.Google Scholar
  • [46] Van Parys B, Golrezaei N (2020) Optimal learning for structured bandits. Preprint, submitted August 12, https://dx.doi.org/10.2139/ssrn.3651397.Google Scholar
  • [47] Vickrey W (1961) Counterspeculation, auctions, and competitive sealed tenders. J. Finance 16(1):8–37.CrossrefGoogle Scholar
  • [48] Villar SS, Bowden J, Wason J (2015) Multi-armed bandit models for the optimal design of clinical trials: Benefits and challenges. Statist. Sci. 30(2):199–215.CrossrefGoogle Scholar
  • [49] Weed J, Perchet V, Rigollet P (2016) Online learning in repeated auctions. Conf. Learn. Theory, 1562–1583.Google Scholar
  • [50] West DB (1996) Introduction to Graph Theory, vol. 2 (Prentice Hall, Upper Saddle River, NJ).Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.