Contextual Bandits with Cross-Learning
Published Online:30 Sep 2022https://doi.org/10.1287/moor.2022.1313
References
- [1] (2015) Online learning with feedback graphs: Beyond bandits. Conf. Learn. Theory, 23–35.Google Scholar
- [2] (2009) Dynamic pricing for nonperishable products with demand learning. Oper. Res. 57(5):1169–1188.Link, Google Scholar
- [3] (2002) The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1):48–77.Crossref, Google Scholar
- [4] (2019) Contextual bandits with cross-learning. Proc. 33rd Internat. Conf. Neural Inform. Processing Systems 9679–9688.Google Scholar
- [5] (2009) Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Oper. Res. 57(6):1407–1420.Link, Google Scholar
- [6] (2011) Contextual bandit algorithms with supervised learning guarantees. Proc. 14th Internat. Conf. Artificial Intelligence Statist., 19–26.Google Scholar
- [7] (2018) Selling to a no-regret buyer. Proc. 2018 ACM Conf. Econom. Comput. (Association for Computing Machinery, New York), 523–538.Google Scholar
- [8] (2012) Regret Analysis of Stochastic and Nonstochastic Multi-Armed Bandit Problems. Foundations and Trends in Machine Learning, vol. 5.Google Scholar
- [9] (2017) Learning multi-item auctions with (or without) samples. Proc. 2017 IEEE 58th Annual Sympos. Foundations of Computer Sci. (IEEE), 516–527. FOCS.Google Scholar
- [10] (2017) Dynamic pricing and demand learning with limited price experimentation. Oper. Res. 65(6):1722–1731.Link, Google Scholar
- [11] (2019) Simplifying programmatic: First price auctions for Google Ad Manager. Accessed September 15, 2022, https://www.blog.google/products/admanager/simplifying-programmatic-first-price-auctions-google-ad-manager.Google Scholar
- [12] (2015) Dynamic pricing and learning: Historical origins, current research, and new directions. Surveys Oper. Res. Management Sci. 20(1):1–18.Crossref, Google Scholar
- [13] (2013) Simultaneously learning and optimizing using controlled variance pricing. Management Sci. 60(3):770–783.Link, Google Scholar
- [14] (2017) Oracle-efficient learning and auction design. FOCS.Google Scholar
- [15] (2010) Dynamic pricing with a prior on market response. Oper. Res. 58(1):16–29.Link, Google Scholar
- [16] (2018) Learning to bid without knowing your value. Proc. 2018 ACM Conf. Econom. Comput. (ACM), 505–522.Google Scholar
- [17] (2020) No-regret learning in price competitions under consumer reference effects. Adv. Neural Inf. Processing Systems 33.Google Scholar
- [18] (2021) Dynamic incentive-aware learning: Robust pricing in contextual auctions. Oper. Res. 69(1):297–314.Link, Google Scholar
- [19] (2019) Incentive-aware contextual pricing with non-parametric market noise. Preprint, submitted, https://arxiv.org/abs/1911.03508.Google Scholar
- [20] (2021) Bidding and pricing in budget and ROI constrained markets. Preprint, submitted July 16, https://arxiv.org/abs/2107.07725.Google Scholar
- [21] (2018) Nonparametric stochastic contextual bandits. 32nd AAAI Conf. Artificial Intelligence.Google Scholar
- [22] (2020) Optimal no-regret learning in repeated first-price auctions. Preprint, submitted March 22, https://arxiv.org/abs/2003.09795.Google Scholar
- [23] (2015) No-regret learning in Bayesian games. Adv. Neural Inform. Processing Systems, 3061–3069.Google Scholar
- [24] (2007) Online learning with prior knowledge. Internat. Conf. .Comput. Learn. Theory (Springer), 499–513.Google Scholar
- [25] (2009) Playing games with approximation algorithms. SIAM J. Comput. 39(3):1088–1106.Crossref, Google Scholar
- [26] (2010) Non-stochastic bandit slate problems. Adv.n Neural Inform. Processing Systems, 1054–1062.Google Scholar
- [27] (2014) Dynamic reserve prices for repeated auctions: Learning from bids. Internat. Conf. Web and Internet Econom. (Springer), 232.Google Scholar
- [28] (2010) Regret bounds for sleeping experts and bandits. Machine Learn. 80(2–3):245–272.Crossref, Google Scholar
- [29] (2011) Contextual Gaussian process bandit optimization. Adv. Neural Inform. Processing Systems, 2447–2455.Google Scholar
- [30] (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1):4–22.Crossref, Google Scholar
- [31] (2008) The epoch-greedy algorithm for multi-armed bandits with side information. Platt JC, Koller D, Singer Y, Roweis ST, eds. Advances in Neural Information Processing Systems, vol. 20 (Curran Associates, Inc.), 817–824.Google Scholar
- [32] (2020) Bandit Algorithms (Cambridge University Press).Crossref, Google Scholar
- [33] (2010) A contextual-bandit approach to personalized news article recommendation. Proc. 19th Internat. Conf. World Wide Web (ACM), 661–670.Google Scholar
- [34] (1972) A characterization of perfect graphs. J. Combin. Theory Ser. B. 13(2):95–98.Crossref, Google Scholar
- [35] (1972) Normal hypergraphs and the perfect graph conjecture. Discrete Math. 2(3):253–267.Crossref, Google Scholar
- [36] (2009) Showing relevant ads via context multi-armed bandits. Proc. AISTATS.Google Scholar
- [37] (2011) From bandits to experts: On the value of side-observations. Adv. Neural Inform. Processing Systems, 684–692.Google Scholar
- [38] (2016) Learning algorithms for second-price auctions with reserve. J. Machine Learn. Res. 17(1):2632–2656.Google Scholar
- [39] (2016) Learning simple auctions. Feldman V, Rakhlin A, Shamir O (eds.) 29th Annual Conf. Learn. Theory (Proc. Machine Learn. Res.), vol. 49 (Columbia University, New York), 1298–1318.Google Scholar
- [40] (2021) Online learning via offline greedy algorithms: Applications in market design and optimization. Proc. 22nd ACM Conf. Econom. Comput., 737–738.Google Scholar
- [41] (2013) The multi-armed bandit problem with covariates. Ann. Statist. 41(2):693–721.Crossref, Google Scholar
- [42] (2016) Kernel estimation and model combination in a bandit problem with covariates. J. Machine Learn. Res. 17(1):5181–5217.Google Scholar
- [43] (2010) Nonparametric bandits with covariates. Preprint, submitted March 8, https://arxiv.org/abs/1003.1630.Google Scholar
- [44] (1952) Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. 58(5):527–535.Crossref, Google Scholar
- [45] (2011) Contextual bandits with similarity information. Proc. 24th Annual Conf. Learn. Theory, 679–702.Google Scholar
- [46] (2020) Optimal learning for structured bandits. Preprint, submitted August 12, https://dx.doi.org/10.2139/ssrn.3651397.Google Scholar
- [47] (1961) Counterspeculation, auctions, and competitive sealed tenders. J. Finance 16(1):8–37.Crossref, Google Scholar
- [48] (2015) Multi-armed bandit models for the optimal design of clinical trials: Benefits and challenges. Statist. Sci. 30(2):199–215.Crossref, Google Scholar
- [49] (2016) Online learning in repeated auctions. Conf. Learn. Theory, 1562–1583.Google Scholar
- [50] (1996) Introduction to Graph Theory, vol. 2 (Prentice Hall, Upper Saddle River, NJ).Google Scholar

