Offline Planning and Online Learning Under Recovering Rewards

