Learning and Optimization with Seasonal Patterns

Ningyuan Chen (corresponding author)
https://orcid.org/0000-0002-3948-1011
Department of Management, University of Toronto Mississauga, Mississauga, Ontario L5L 1C6, Canada; and Rotman School of Management, University of Toronto, Toronto, Ontario M5S 3E6, Canada

Chun Wang
https://orcid.org/0000-0002-0739-2022
School of Economics and Management, Tsinghua University, Beijing 100190, China

Longlin Wang
School of Economics and Management, Tsinghua University, Beijing 100190, China; and Department of Statistics, Harvard University, Cambridge, Massachusetts 02138

Published Online: 4 Oct 2023
https://doi.org/10.1287/opre.2023.0017

Volume 73, Issue 2

March-April 2025

Pages iii-viii, 583-1150, C2-C3

Received: June 10, 2020
Accepted: August 28, 2023
Published Online: October 4, 2023

Cite as

Ningyuan Chen, Chun Wang, Longlin Wang (2023) Learning and Optimization with Seasonal Patterns. Operations Research 73(2):894-909.

https://doi.org/10.1287/opre.2023.0017

Acknowledgments

The authors thank Professor John Birge (editor-in-chief), Professor Amy Ward (area editor), the anonymous associate editor, and the reviewers for their insightful comments and suggestions, which considerably improved the paper.
