A Linear Response Bandit Problem
Published Online: 26 Aug 2013
https://doi.org/10.1287/11-SSY032

