Open Access

A Linear Response Bandit Problem

Alexander Goldenshluger
Department of Statistics, University of Haifa, Haifa 31905, Israel

Assaf Zeevi
Graduate School of Business, Columbia University, New York, NY 10027, USA

Published Online: 26 Aug 2013
https://doi.org/10.1287/11-SSY032

References

Auer, P. (2002). Using confidence bounds for exploitation-exploration trade-offs. J. Mach. Learn. Res. 3, 397–422. MR1984023
Auer, P., Cesa-Bianchi, N., and Fischer, P. (2002a). Finite time analysis of the multiarmed bandit problem. Machine Learning 47, 235–256.
Auer, P., Cesa-Bianchi, N., Freund, Y., and Schapire, R. (2002b). The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32, 48–77. MR1954855
Berry, D. A. and Fristedt, B. (1985). Bandit Problems. Chapman and Hall, London. MR0813698
Cesa-Bianchi, N. and Lugosi, G. (2006). Prediction, Learning and Games. Cambridge University Press, Cambridge. MR2409394
Ginebra, J. and Clayton, M. K. (1995). Response surface bandits. J. Roy. Statist. Soc. Ser. B 57, 771–784. MR1354081
Gill, R. D. and Levit, B. Y. (1995). Applications of the Van Trees inequality: A Bayesian Cramér-Rao bound. Bernoulli 1, 59–79. MR1354456
Gittins, J. C. (1989). Multi-Armed Bandit Allocation Indices. Wiley-Interscience Series in Systems and Optimization. John Wiley & Sons, Chichester. MR0996417
Goldenshluger, A. and Zeevi, A. (2009). Woodroofe’s one-armed bandit problem revisited. Ann. Appl. Probab. 19, 1603–1633. MR2538082
Goldenshluger, A. and Zeevi, A. (2011). A note on performance limitations in bandit problems with side information. IEEE Trans. Inf. Theory 57, 1707–1713. MR2815844
Gooley, C. and Lattin, J. (2000). Dynamic customization of marketing messages in interactive media. Research Paper No. 1664, Research Paper Series, Graduate School of Business, Stanford University. Available at https://gsbapps.stanford.edu/researchpapers/Library/RP1664.pdf.
Juditsky, A., Nazin, A., Tsybakov, A., and Vayatis, N. (2008). Gap-free bounds for stochastic multi-armed bandit. IFAC World Congress, 2008.
Lai, T. L. (1987). Adaptive treatment allocation and the multi-armed bandit problem. Ann. Statist. 15, 1091–1114. MR0902248
Lai, T. L. (1988). Asymptotic solutions of bandit problems. Stochastic Differential Systems, Stochastic Control Theory and Applications (Minneapolis, Minn., 1986), 275–292, IMA Vol. Math. Appl., 10, Springer, New York. MR0934729
Lai, T. L. (2001). Sequential analysis: Some classical problems and new challenges. Statist. Sinica 11, 303–408. MR1844531
Lai, T. L. and Robbins, H. (1985). Asymptotically efficient allocation rules. Adv. Applied Math. 6, 4–22. MR0776826
Lai, T. L. and Yakowitz, S. (1995). Machine learning and nonparametric bandit theory. IEEE Trans. Automat. Control 40, 1199–1209. MR1344032
Langford, J. and Zhang, T. (2008). The epoch-greedy algorithm for multiarmed bandits with side information. Advances in Neural Information Processing Systems 20, 817–824, Cambridge, MIT Press.
Lu, T., Pál, D., and Pál, M. (2010). Contextual multi-armed bandits. Proceedings of the 13th International Conference on Artificial Intelligence and Statistics. Available at http://research.google.com/pubs/archive/37042.pdf.
Mersereau, A. J., Rusmevichientong, P., and Tsitsiklis, J. N. (2009). A structured multiarmed bandit problem and the greedy policy. IEEE Trans. Automat. Control 54, 2787–2802. MR2583719
Robbins, H. (1952). Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. 58, 527–535. MR0050246
Rusmevichientong, P. and Tsitsiklis, J. N. (2010). Linearly parametrized bandits. Math. Oper. Res. 35, 395–411. MR2674726
Sarkar, J. (1991). One-armed bandit problems with covariates. Ann. Statist. 19, 1978–2002. MR1135160
Stewart, G. W. and Sun, J. G. (1990). Matrix Perturbation Theory. Academic Press, Inc., Boston, MA. MR1061154
Tsybakov, A. B. (2004). Optimal aggregation of classifiers in statistical learning. Ann. Statist. 32, 135–166. MR2051002
Wang, C.-C., Kulkarni, S., and Poor, H. V. (2005). Bandit problems with side observations. IEEE Trans. Automat. Control 50, 338–355. MR2123095
Woodroofe, M. (1979). A one-armed bandit problem with a concomitant variable. J. Amer. Statist. Assoc. 74, 799–806. MR0556471
Woodroofe, M. (1982). Sequential allocation with covariates. Sankhyā Ser. A 44, 403–414. MR0705463
Yang, Y. and Zhu, D. (2002). Randomized allocation with nonparametric estimation for a multi-armed bandit problem with covariates. Ann. Statist. 30, 100–121. MR1892657

Volume 3, Issue 1

June 2013

Pages 1-321

Article Information


Received: August 01, 2011
Published Online: August 26, 2013

Cite as

Alexander Goldenshluger, Assaf Zeevi (2013) A Linear Response Bandit Problem. Stochastic Systems 3(1):230-261.

https://doi.org/10.1287/11-SSY032
