Speed Up the Cold-Start Learning in Two-Sided Bandits with Many Arms
Published Online:26 Jun 2026https://doi.org/10.1287/mnsc.2022.03394
References
- (2011) Improved algorithms for linear stochastic bandits. Adv. Neural Inform. Processing Systems, vol. 24 (Curran Associates Inc., Red Hook, NY), 2312–2320.Google Scholar
- (1976) Commodity bundling and the burden of monopoly. Quart. J. Econom. 90(3):475–498.Google Scholar
- (2014) Bandits with concave rewards and convex knapsacks. Babaioff M, Conitzer V, Easley D, Nisan N, eds. Proc. 15th ACM Conf. Econom. Comput. (ACM, New York), 989–1006.Google Scholar
- (2012) Analysis of Thompson sampling for the multi-armed bandit problem. Mannor S, Srebro N, Williamson RC, eds. Proc. Conf. Learn. Theory, vol. 39 (PMLR, New York), 1–26.Google Scholar
- (2023) Causal matrix completion. Neu G, Rosasco L, eds. Proc. 36th Ann. Conf. Learn. Theory (PMLR, New York), 3821–3826.Google Scholar
- (2023) Optimal pricing with a single point. Management Sci. 69(10):5866–5882.Link, Google Scholar
- (2021) Matrix completion methods for causal panel data models. J. Amer. Statist. Assoc. 116(536):1716–1730.Crossref, Google Scholar
- (2002a) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2):235–256.Crossref, Google Scholar
- (2002b) The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1):48–77.Crossref, Google Scholar
- (2021) Multiple randomization designs. Preprint, submitted December 27, https://arxiv.org/abs/2112.13495.Google Scholar
- (2022) Artificial replay: A meta-algorithm for harnessing historical data in bandits. Preprint, submitted October 1, https://arxiv.org/abs/2210.00025.Google Scholar
- (2020) Online decision making with high-dimensional covariates. Oper. Res. 68(1):276–294.Link, Google Scholar
- (2022) Learning personalized product recommendations with customer disengagement. Manufacturing Service Oper. Management 24(4):2010–2028.Link, Google Scholar
- (2020) Unreasonable effectiveness of greedy algorithms in multi-armed bandit with many arms. Adv. Neural Inform. Processing Systems 33:1713–1723.Google Scholar
- (1997) Bandit problems with infinitely many arms. Ann. Statist. 25(5):2103–2116.Google Scholar
- (2023) How big should your data really be? Data-driven newsvendor: Learning one sample at a time. Management Sci. 69(10):5848–5865.Link, Google Scholar
- (2022) The creator economy: Managing ecosystem supply, revenue sharing, and platform design. Management Sci. 68(7):5233–5251.Link, Google Scholar
- (2013) Two-target algorithms for infinite-armed bandits with Bernoulli rewards. Adv. Neural Inform. Processing Systems 26:2184–2192.Google Scholar
- (2011) Handbook of Markov Chain Monte Carlo (CRC Press, Boca Raton, FL).Crossref, Google Scholar
- (2012) Exact matrix completion via convex optimization. Comm. ACM 55(6):111–119.Crossref, Google Scholar
- (2015) Simple regret for infinitely many armed bandits. Bach F, Blei D, eds. Proc. Internat. Conf. Machine Learn. (PMLR, New York), 1133–1141.Google Scholar
- (2018) Quantile-Regret Minimisation in Infinitely Many-Armed Bandits (UAI).Google Scholar
- (2015) Incoherence-optimal matrix completion. IEEE Trans. Inform. Theory 61(5):2909–2923.Crossref, Google Scholar
- (2020) Noisy matrix completion: Understanding statistical guarantees for convex relaxation via nonconvex optimization. SIAM J. Optim. 30(4):3098–3121.Crossref, Google Scholar
- (2011) Contextual bandits with linear payoff functions. Proc. 14th Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 208–214.Google Scholar
- (2008) Stochastic linear optimization under bandit feedback. Servedio RA, Zhang T, eds. Proc. Conf. Learn. Theory (Omnipress, Madison, WI), 355–366.Google Scholar
- (2016) An overview of low-rank matrix recovery from incomplete observations. IEEE J. Selected Topics Signal Processing 10(4):608–622.Crossref, Google Scholar
- (2022) Uncertainty quantification for low-rank matrix completion with heterogeneous and sub-exponential noise. Camps-Valls G, Ruiz FJR, Guyon I, eds. Proc. Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 1179–1189.Google Scholar
- (2010) Parametric bandits: The generalized linear case. Adv. Neural Inform. Processing Systems 23:586–594.Google Scholar
- (2020) Online evaluation of audiences for targeted advertising via bandit experiments. Proc. AAAI Conf. Artificial Intelligence 34(1):13273–13279.Crossref, Google Scholar
- (2013) A linear response bandit problem. Stochastic Systems 3(1):230–261.Link, Google Scholar
- (2014) Thompson sampling for complex online problems. Proc. Internat. Conf. Machine Learn. (PMLR, New York), 100–108.Google Scholar
- (2022) Data pooling in stochastic optimization. Management Sci. 68(3):1595–1615.Link, Google Scholar
- (2021) Small-data, large-scale linear optimization with uncertain objectives. Management Sci. 67(1):220–241.Link, Google Scholar
- (2019) Personalizing many decisions with high-dimensional covariates. Adv. Neural Inform. Processing Systems 32:11469–11480.Google Scholar
- (2015) Statistical Learning with Sparsity: The Lasso and Generalizations (Taylor & Francis, Boca Raton, FL).Crossref, Google Scholar
- (2022) Experimental design in two-sided platforms: An analysis of bias. Management Sci. 68(10):7069–7089.Link, Google Scholar
- (2019) Bilinear bandits with low-rank structure. Chaudhuri K, Salakhutdinov R, eds. Proc. Internat. Conf. Machine Learn. (PMLR, New York), 3163–3172.Google Scholar
- (1993) Learning in Embedded Systems (MIT Press, Cambridge, MA).Crossref, Google Scholar
- (2020) Dynamic assortment personalization in high dimensions. Oper. Res. 68(4):1020–1037.Link, Google Scholar
- (2017) Stochastic rank-1 bandits. Singh A, Zhu J, eds. Proc. Artificial Intelligence Statist. (PMLR, New York), 392–401.Google Scholar
- (1995) Sequential choice from several populations. Proc. Natl. Acad. Sci. USA 92(19):8584.Crossref, Google Scholar
- (2010) Matrix completion from a few entries. IEEE Trans. Informs. Theory 56(6):2980–2998.Crossref, Google Scholar
- (2024) Data-driven clustering and feature-based retail electricity pricing with smart meters. Oper. Res. 73(5):2636–2660.Link, Google Scholar
- (2017) Stochastic low-rank bandits. Preprint, submitted December 13, https://arxiv.org/abs/1712.04644.Google Scholar
- (2020) Randomized exploration in generalized linear bandits. Proc. Internat. Conf. Artificial Intelligence Statist., 2066–2076.Google Scholar
- (1987) Adaptive treatment allocation and the multi-armed bandit problem. Ann. Statist. 15(3):1091–1114.Crossref, Google Scholar
- (2020) Bandit Algorithms (Cambridge University Press, Cambridge, MA).Crossref, Google Scholar
- (2017) Provably optimal algorithms for generalized linear contextual bandits. Precup D, Teh YW, eds. Proc. 34th Internat. Conf. Machine Learn., vol. 70 (PMLR, New York), 2071–2080.Google Scholar
- (2010) A contextual-bandit approach to personalized news article recommendation. Proc. 19th Internat. Conf. World Wide Web (ACM, New York), 661–670.Google Scholar
- (2014) Facing the cold start problem in recommender systems. Expert Systems Appl. 41(4):2065–2073.Crossref, Google Scholar
- (2011) Content-based recommender systems: State of the art and trends. Ricci F, Rokach L, Shapira B, Kantor P, eds. Recommender Systems Handbook (Springer, Berlin), 73–105.Crossref, Google Scholar
- (2019) Beyond A/B testing: Multi-armed bandit experiments. Medium (April 4), https://medium.com/data-science/beyond-a-b-testing-multi-armed-bandit-experiments-1493f709f804.Google Scholar
- (2021) Low-rank generalized linear bandit problems. Banerjee A, Fukumizu K, eds. Proc. Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 460–468.Google Scholar
- (2018) Efficient online recommendation via low-rank ensemble sampling. Pera S, Rendle S, Vlachos M, eds. Proc. 12th ACM Conf. Recommender Systems (ACM, New York), 460–464.Google Scholar
- (1989) Multiproduct monopoly, commodity bundling, and correlation of values. Quart. J. Econom. 104(2):371–383.Crossref, Google Scholar
- (2009) A structured multiarmed bandit problem and the greedy policy. IEEE Trans. Automated Control 54(12):2787–2802.Crossref, Google Scholar
- (2022) Context-based dynamic pricing with online clustering. Production Oper. Management 31(9):3559–3575.Crossref, Google Scholar
- (2015) A UCB-like strategy of collaborative filtering. Bui TD, Ho TB, eds. Proc. Asian Conf. Machine Learn. (PMLR, New York), 315–329.Google Scholar
- (2023) Online low rank matrix completion. Proc. 11th Internat. Conf. Learn. Representations (ICLR, Appleton, WI).Google Scholar
- (2010) Linearly parameterized bandits. Math. Oper. Res. 35(2):395–411.Link, Google Scholar
- (2018) A tutorial on thompson sampling. Foundations Trends® Machine Learn. 11(1):1–96. Crossref, Google Scholar
- (2023) Overcoming the long horizon barrier for sample-efficient reinforcement learning with latent low-rank structure. Proc. ACM Measurment Anal. Comput. Systems 7(2):1–60.Crossref, Google Scholar
- (1984) Gaussian demand and commodity bundling. J. Bus. 57(1):S211–S230.Crossref, Google Scholar
- (1963) United States v. Loew’s Inc.: A note on block-booking. Supreme Court Rev. 1963:152–157.Crossref, Google Scholar
- (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3/4):285–294.Crossref, Google Scholar
- (2020) Solving Bernoulli rank-one bandits with unimodal Thompson sampling. Algorithmic Learning Theory (PMLR, New York), 862–889.Google Scholar
- (2010) Introduction to the non-asymptotic analysis of random matrices. Preprint, submitted November 12, https://arxiv.org/abs/1011.3027.Google Scholar
- (2017) Dropoutnet: Addressing cold start in recommender systems. Adv. Neural Inform. Processing Systems 30:4957–4966.Google Scholar
- (2009) Algorithms for infinitely many-armed bandits. Koller D, Schuurmans D, Bengio Y, Bottou L, eds. Adv. Neural Inform. Processing Systems, vol. 21 (Curran Associates, Inc., Red Hook, NY), 1729–1736.Google Scholar
- (2023) Entry-specific bounds for low-rank matrix completion under highly non-uniform sampling. Proc. IEEE Internat. Sympos. Inform. Theory (IEEE, Piscataway, NJ), 2625–2630.Google Scholar
- (2021) Multitask learning and bandits via robust statistics. Management Sci. 71(9):7752–7773.Google Scholar
- (2023) Cold start to improve market thickness on online advertising platforms: Data-driven algorithms and field experiments. Management Sci. 69(7):3838–3860.Link, Google Scholar
- (2014) Addressing cold start in recommender systems: A semi-supervised co-training algorithm. Berkovsky S, Bennett PN, Murdock V, Moffat A, eds. Proc. 37th Internat. ACM SIGIR Conf. Res. Development Informat. Retrieval (ACM, New York), 73–82.Google Scholar
- (2022) Netease cloud music data. Manufacturing Service Oper. Management 24(1):275–284.Link, Google Scholar
- (2025) Stochastic low-rank tensor bandits for multi-dimensional online decision making. J. Amer. Statist. Assoc. 120(549):198–211.Google Scholar
- (2022) Learning markov models via low-rank optimization. Oper. Res. 70(4):2384–2398.Link, Google Scholar

