Learning and Optimization with Seasonal Patterns

Published Online: https://doi.org/10.1287/opre.2023.0017

References

  • Allesiardo R, Féraud R (2015) Exp3 with drift detection for the switching bandit problem. 2015 IEEE Internat. Conf. Data Science Adv. Anal. (IEEE, Piscataway, NJ) 1–7.
  • Allesiardo R, Féraud R, Maillard OA (2017) The non-stationary stochastic multi-armed bandit problem. Internat. J. Data Sci. Anal. 3(4):267–283.
  • Auer P (2002) Using confidence bounds for exploitation-exploration trade-offs. J. Machine Learn. Res. 3:397–422.
  • Auer P, Gajane P, Ortner R (2019) Adaptively tracking the best bandit arm with an unknown number of distribution changes. Beygelzimer A, Hsu D, eds. Proc. 32nd Conf. Learn. Theory, vol. 99 (PMLR, New York), 138–158.
  • Auer P, Cesa-Bianchi N, Freund Y, Schapire RE (2002) The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1):48–77.
  • Bartlett MS (1948) Smoothing periodograms from time-series with continuous spectra. Nature 161(4096):686–687.
  • Bartlett MS (1963) The spectral analysis of point processes. J. Roy. Statist. Soc. B 25(2):264–281.
  • Besbes O, Sauré D (2014) Dynamic pricing strategies in the presence of demand shifts. Manufacturing Service Oper. Management 16(4):513–528.
  • Besbes O, Zeevi A (2011) On the minimax complexity of pricing in a changing environment. Oper. Res. 59(1):66–79.
  • Besbes O, Gur Y, Zeevi A (2014) Stochastic multi-armed-bandit problem with non-stationary rewards. Ghahramani Z, Welling M, Cortes C, Lawrence N, Weinberger KQ, eds. Advances in Neural Information Processing Systems, vol. 27 (Curran Associates, Inc., Red Hook, NY).
  • Besbes O, Gur Y, Zeevi A (2015) Non-stationary stochastic optimization. Oper. Res. 63(5):1227–1244.
  • Besbes O, Gur Y, Zeevi A (2019) Optimal exploration–exploitation in a multi-armed bandit problem with non-stationary rewards. Stochastic Systems 9(4):319–337.
  • Brigham EO (1988) The Fast Fourier Transform and Its Applications (Prentice-Hall, Inc., Hoboken, NJ).
  • Brillinger DR (1969) Asymptotic properties of spectral estimates of second order. Biometrika 56(2):375–390.
  • Brown L, Gans N, Mandelbaum A, Sakov A, Shen H, Zeltyn S, Zhao L (2005) Statistical analysis of a telephone call center: A queueing-science perspective. J. Amer. Statist. Assoc. 100(469):36–50.
  • Cai H, Cen Z, Leng L, Song R (2021) Periodic-GP: Learning periodic world with gaussian process bandits. Preprint, submitted May 30, https://arxiv.org/abs/2105.14422.
  • Chen N, Lee DKK, Negahban SN (2019a) Super-resolution estimation of cyclic arrival rates. Ann. Statist. 47(3):1754–1775.
  • Chen N, Gurlek R, Lee D, Shen H (2022) Can customer arrival rates be modelled by sine waves? Service Sci. Forthcoming.
  • Chen Y, Wen Z, Xie Y (2019b) Dynamic pricing in an evolving and unknown marketplace. Preprint, submitted May 5, https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3382957.
  • Cheung WC, Simchi-Levi D, Zhu R (2022) Hedging the drift: Learning to optimize under non-stationarity. Management Sci. 68(3):1696–1713.
  • Christensen BJ, Nielsen MO (2006) Asymptotic normality of narrow-band least squares in the stationary fractional cointegration model and volatility forecasting. J. Econometrics 133(1):343–371.
  • den Boer AV (2015a) Dynamic pricing and learning: Historical origins, current research, and new directions. Surveys Oper. Res. Management Sci. 20(1):1–18.
  • den Boer AV (2015b) Tracking the market: Dynamic pricing and learning in a changing environment. Eur. J. Oper. Res. 247(3):914–927.
  • Di Benedetto G, Bellini V, Zappella G (2020) A linear bandit for seasonal environments. Preprint, submitted April 28, https://arxiv.org/abs/2004.13576.
  • Gong XY, Simchi-Levi D (2022) Bandits atop reinforcement learning: Tackling online inventory models with cyclic demands. Management Sci. Forthcoming.
  • Jaksch T, Ortner R, Auer P (2010) Near-optimal regret bounds for reinforcement learning. J. Machine Learn. Res. 11:1563–1600.
  • Karnin ZS, Anava O (2016) Multi-armed bandits: Competing with optimal sequences. Lee D, Sugiyama M, Luxburg U, Guyon I, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 29 (Curran Associates, Inc., Red Hook, NY).
  • Keskin NB, Li M (2020) Selling quality-differentiated products in a Markovian market with unknown transition probabilities. Preprint, submitted November 28, https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3526568.
  • Keskin NB, Zeevi A (2017) Chasing demand: Learning and earning in a changing environment. Math. Oper. Res. 42(2):277–307.
  • Kleinberg R (2004) Nearly tight bounds for the continuum-armed bandit problem. Saul L, Weiss Y, Bottou L, eds. Advances in Neural Information Processing Systems, vol. 17 (MIT Press, Cambridge, MA).
  • Levine N, Crammer K, Mannor S (2017) Rotting bandits. Guyon I, Von Luxburg U, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 30 (Curran Associates, Inc., Red Hook, NY).
  • Li L, Lu Y, Zhou D (2017) Provably optimal algorithms for generalized linear contextual bandits. Precup D, Teh YW, eds. Proc. 34th Internat. Conf. Machine Learn., vol. 70 (JMLR.org, Sydney NSW Australia), 2071–2080.
  • Liu F, Lee J, Shroff N (2018) A change-detection based framework for piecewise-stationary multi-armed bandit problem. Proc. AAAI Conf. Artificial Intelligence (AAAI Press, New Orleans Louisiana).
  • Luo H, Wei CY, Agarwal A, Langford J (2018) Efficient contextual bandits in non-stationary worlds. Bubeck S, Perchet V, Rigollet P, eds. Proc. 31st Conf. Learn. Theory, vol. 75 (PMLR, New York), 1739–1776.
  • Lykouris T, Mirrokni V, Leme RP (2020) Bandits with adversarial scaling. Hal Daumé III, Singh A, eds. Proc. 37th Internat. Conf. Machine Learn., vol. 119 (JMLR.org), 6511–6521.
  • Mao W, Zhang K, Zhu R, Simchi-Levi D, Basar T (2021) Near-optimal model-free reinforcement learning in non-stationary episodic MDPs. Meila M, Zhang T, eds. Proc. 38th Internat. Conf. Machine Learn., vol. 139 (JMLR.org), 7447–7458.
  • Olshen RA (1967) Asymptotic properties of the periodogram of a discrete stationary process. J. Appl. Probab. 4(3):508–528.
  • Raj V, Kalyani S (2017) Taming non-stationary bandits: A Bayesian approach. Preprint, submitted July 31, https://arxiv.org/abs/1707.09727.
  • Shao N, Lii KS (2011) Modelling non-homogeneous Poisson processes with almost periodic intensity functions. J. Roy. Statist. Soc. Ser. B. Statist. Methodology 73(1):99–122.
  • Shao X, Wu WB (2007) Asymptotic spectral theory for nonlinear time series. Ann. Statist. 35(4):1773–1801.
  • Stoica P, Moses R (2005) Spectral Analysis of Signals (Pearson Prentice Hall, Upper Saddle River, NJ).
  • Tracà S, Rudin C, Yan W (2021) Regulating greed over time in multi-armed bandits. J. Machine Learn. Res. 22:1–99.
  • Valko M, Korda N, Munos R, Flaounas I, Cristianini N (2013) Finite-time analysis of kernelised contextual bandits. Proc. 29th Conf. Uncertainty Artificial Intelligence (AUAI Press, Bellevue, WA), 654–663.
  • Vere-Jones D (1982) On the estimation of frequency in point-process data. J. Appl. Probab. 19(A):383–394.
  • Villamediana J, Küster I, Vila N (2019) Destination engagement on Facebook: Time and seasonality. Ann. Tourism Res. 79:102747.
  • Webb M, Coppe V, Huybrechs D (2020) Pointwise and uniform convergence of Fourier extensions. Constructive Approximation 52(1):139–175.
  • Xu L, Jiang C, Qian Y, Zhao Y, Li J, Ren Y (2016) Dynamic privacy pricing: A multi-armed bandit approach with time-variant rewards. IEEE Trans. Inform. Forensics Security 12(2):271–285.
  • Zhang H, Chao X, Shi C (2018) Perishable inventory systems: Convexity results for base-stock policies and learning algorithms under censored demand. Oper. Res. 66(5):1276–1286.
  • Zhou X, Xiong Y, Chen N, Gao X (2021) Regime switching bandits. Ranzato M, Beygelzimer A, Dauphin Y, Liang PS, Wortman Vaughan J, eds. Advances in Neural Information Processing Systems, vol. 34 (Curran Associates, Inc., Red Hook, NY), 4542–4554.
  • Zhou Z, Xu R, Blanchet J (2019) Learning in generalized linear contextual bandits with stochastic delays. Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 32 (Curran Associates, Inc., Red Hook, NY).
  • Zhu F, Zheng Z (2020) When demands evolve larger and noisier: Learning and earning in a growing environment. Hal Daumé III, Singh A, eds. Proc. 37th Internat. Conf. Machine Learn., vol. 119 (PMLR, New York), 11629–11638.