[1] Alon N, Cesa-Bianchi N, Dekel O, Koren T (2015) Online learning with feedback graphs: Beyond bandits. Conf. Learn. Theory, 23–35.Google Scholar
[2] Araman VF, Caldentey R (2009) Dynamic pricing for nonperishable products with demand learning. Oper. Res. 57(5):1169–1188.Link, Google Scholar
[3] Auer P, Cesa-Bianchi N, Freund Y, Schapire RE (2002) The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1):48–77.Crossref, Google Scholar
[4] Balseiro S, Golrezaei N, Mahdian M, Mirrokni V, Schneider J (2019) Contextual bandits with cross-learning. Proc. 33rd Internat. Conf. Neural Inform. Processing Systems 9679–9688.Google Scholar
[5] Besbes O, Zeevi A (2009) Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Oper. Res. 57(6):1407–1420.Link, Google Scholar
[6] Beygelzimer A, Langford J, Li L, Reyzin L, Schapire R (2011) Contextual bandit algorithms with supervised learning guarantees. Proc. 14th Internat. Conf. Artificial Intelligence Statist., 19–26.Google Scholar
[7] Braverman M, Mao J, Schneider J, Matthew Weinberg S (2018) Selling to a no-regret buyer. Proc. 2018 ACM Conf. Econom. Comput. (Association for Computing Machinery, New York), 523–538.Google Scholar
[8] Bubeck S, Cesa-Bianchi N (2012) Regret Analysis of Stochastic and Nonstochastic Multi-Armed Bandit Problems. Foundations and Trends in Machine Learning, vol. 5.Google Scholar
[9] Cai Y, Daskalakis C (2017) Learning multi-item auctions with (or without) samples. Proc. 2017 IEEE 58th Annual Sympos. Foundations of Computer Sci. (IEEE), 516–527. FOCS.Google Scholar
[10] Cheung WC, Simchi-Levi D, Wang H (2017) Dynamic pricing and demand learning with limited price experimentation. Oper. Res. 65(6):1722–1731.Link, Google Scholar
[11] Cox S (2019) Simplifying programmatic: First price auctions for Google Ad Manager. Accessed September 15, 2022, https://www.blog.google/products/admanager/simplifying-programmatic-first-price-auctions-google-ad-manager.Google Scholar
[12] den Boer AV (2015) Dynamic pricing and learning: Historical origins, current research, and new directions. Surveys Oper. Res. Management Sci. 20(1):1–18.Crossref, Google Scholar
[13] den Boer AV, Zwart B (2013) Simultaneously learning and optimizing using controlled variance pricing. Management Sci. 60(3):770–783.Link, Google Scholar
[14] Dudík M, Haghtalab N, Luo H, Schapire RE, Syrgkanis V, Vaughan JW (2017) Oracle-efficient learning and auction design. FOCS.Google Scholar
[15] Farias VF, Van Roy B (2010) Dynamic pricing with a prior on market response. Oper. Res. 58(1):16–29.Link, Google Scholar
[16] Feng Z, Podimata C, Syrgkanis V (2018) Learning to bid without knowing your value. Proc. 2018 ACM Conf. Econom. Comput. (ACM), 505–522.Google Scholar
[17] Golrezaei N, Jaillet P, Liang JCN (2020) No-regret learning in price competitions under consumer reference effects. Adv. Neural Inf. Processing Systems 33.Google Scholar
[18] Golrezaei N, Javanmard A, Mirrokni V (2021) Dynamic incentive-aware learning: Robust pricing in contextual auctions. Oper. Res. 69(1):297–314.Link, Google Scholar
[19] Golrezaei N, Liang JCN, Jaillet P (2019) Incentive-aware contextual pricing with non-parametric market noise. Preprint, submitted, https://arxiv.org/abs/1911.03508.Google Scholar
[20] Golrezaei N, Jaillet P, Liang JCN, Mirrokni V (2021) Bidding and pricing in budget and ROI constrained markets. Preprint, submitted July 16, https://arxiv.org/abs/2107.07725.Google Scholar
[21] Guan MY, Jiang H (2018) Nonparametric stochastic contextual bandits. 32nd AAAI Conf. Artificial Intelligence.Google Scholar
[22] Han Y, Zhou Z, Weissman T (2020) Optimal no-regret learning in repeated first-price auctions. Preprint, submitted March 22, https://arxiv.org/abs/2003.09795.Google Scholar
[23] Hartline J, Syrgkanis V, Tardos E (2015) No-regret learning in Bayesian games. Adv. Neural Inform. Processing Systems, 3061–3069.Google Scholar
[24] Hazan E, Megiddo N (2007) Online learning with prior knowledge. Internat. Conf. .Comput. Learn. Theory (Springer), 499–513.Google Scholar
[25] Kakade SM, Tauman Kalai A, Ligett K (2009) Playing games with approximation algorithms. SIAM J. Comput. 39(3):1088–1106.Crossref, Google Scholar
[26] Kale S, Reyzin L, Schapire RE (2010) Non-stochastic bandit slate problems. Adv.n Neural Inform. Processing Systems, 1054–1062.Google Scholar
[27] Kanoria Y, Nazerzadeh H (2014) Dynamic reserve prices for repeated auctions: Learning from bids. Internat. Conf. Web and Internet Econom. (Springer), 232.Google Scholar
[28] Kleinberg R, Niculescu-Mizil A, Sharma Y (2010) Regret bounds for sleeping experts and bandits. Machine Learn. 80(2–3):245–272.Crossref, Google Scholar
[29] Krause A, Ong CS (2011) Contextual Gaussian process bandit optimization. Adv. Neural Inform. Processing Systems, 2447–2455.Google Scholar
[30] Lai TL, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1):4–22.Crossref, Google Scholar
[31] Langford J, Zhang T (2008) The epoch-greedy algorithm for multi-armed bandits with side information. Platt JC, Koller D, Singer Y, Roweis ST, eds. Advances in Neural Information Processing Systems, vol. 20 (Curran Associates, Inc.), 817–824.Google Scholar
[32] Lattimore T, Szepesvári C (2020) Bandit Algorithms (Cambridge University Press).Crossref, Google Scholar
[33] Li L, Chu W, Langford J, Schapire RE (2010) A contextual-bandit approach to personalized news article recommendation. Proc. 19th Internat. Conf. World Wide Web (ACM), 661–670.Google Scholar
[34] Lovász L (1972) A characterization of perfect graphs. J. Combin. Theory Ser. B. 13(2):95–98.Crossref, Google Scholar
[35] Lovász L (1972) Normal hypergraphs and the perfect graph conjecture. Discrete Math. 2(3):253–267.Crossref, Google Scholar
[36] Lu T, Pál D, Pál M (2009) Showing relevant ads via context multi-armed bandits. Proc. AISTATS.Google Scholar
[37] Mannor S, Shamir O (2011) From bandits to experts: On the value of side-observations. Adv. Neural Inform. Processing Systems, 684–692.Google Scholar
[38] Mohri M, Medina AM (2016) Learning algorithms for second-price auctions with reserve. J. Machine Learn. Res. 17(1):2632–2656.Google Scholar
[39] Morgenstern J, Roughgarden T (2016) Learning simple auctions. Feldman V, Rakhlin A, Shamir O (eds.) 29th Annual Conf. Learn. Theory (Proc. Machine Learn. Res.), vol. 49 (Columbia University, New York), 1298–1318.Google Scholar
[40] Niazadeh R, Golrezaei N, Wang JR, Susan F, Badanidiyuru A (2021) Online learning via offline greedy algorithms: Applications in market design and optimization. Proc. 22nd ACM Conf. Econom. Comput., 737–738.Google Scholar
[41] Perchet V, Rigollet P (2013) The multi-armed bandit problem with covariates. Ann. Statist. 41(2):693–721.Crossref, Google Scholar
[42] Qian W, Yang Y (2016) Kernel estimation and model combination in a bandit problem with covariates. J. Machine Learn. Res. 17(1):5181–5217.Google Scholar
[43] Rigollet P, Zeevi A (2010) Nonparametric bandits with covariates. Preprint, submitted March 8, https://arxiv.org/abs/1003.1630.Google Scholar
[44] Robbins H (1952) Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. 58(5):527–535.Crossref, Google Scholar
[45] Slivkins A (2011) Contextual bandits with similarity information. Proc. 24th Annual Conf. Learn. Theory, 679–702.Google Scholar
[46] Van Parys B, Golrezaei N (2020) Optimal learning for structured bandits. Preprint, submitted August 12, https://dx.doi.org/10.2139/ssrn.3651397.Google Scholar
[47] Vickrey W (1961) Counterspeculation, auctions, and competitive sealed tenders. J. Finance 16(1):8–37.Crossref, Google Scholar
[48] Villar SS, Bowden J, Wason J (2015) Multi-armed bandit models for the optimal design of clinical trials: Benefits and challenges. Statist. Sci. 30(2):199–215.Crossref, Google Scholar
[49] Weed J, Perchet V, Rigollet P (2016) Online learning in repeated auctions. Conf. Learn. Theory, 1562–1583.Google Scholar
[50] West DB (1996) Introduction to Graph Theory, vol. 2 (Prentice Hall, Upper Saddle River, NJ).Google Scholar

cover image Mathematics of Operations Research

Volume 48, Issue 3

August 2023

Pages 1213-1809, C2

Article Information

Supplemental Material

Metrics

Information

Received:July 16, 2020
Accepted:August 09, 2022
Published Online:September 30, 2022

Cite as

Santiago Balseiro, Negin Golrezaei, Mohammad Mahdian, Vahab Mirrokni, Jon Schneider (2022) Contextual Bandits with Cross-Learning. Mathematics of Operations Research 48(3):1607-1629.

https://doi.org/10.1287/moor.2022.1313

Keywords

Acknowledgment

Part of this work has appeared in Balseiro et al. [4].

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Contextual Bandits with Cross-Learning

References

Volume 48, Issue 3

Article Information

Supplemental Material

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News