Linearly Parameterized Bandits

Published Online: https://doi.org/10.1287/moor.1100.0446

References

  • Abe N., Long P. M. Associative reinforcement learning using linear probabilistic concepts. Proc. 16th Internat. Conf. Machine Learn. (1999) (Morgan Kaufmann, San Francisco) 3–11
  • Agrawal R. Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Adv. Appl. Probab. (1995) 27(4):1054–1078
  • Agrawal R., Teneketzis D., Anantharam V. Asymptotically efficient adaptive allocation schemes for controlled i.i.d. processes: Finite parameter space. IEEE Trans. Automatic Control (1989) 34(3):258–267
  • Auer P. Using confidence bounds for exploitation-exploration trade-offs. J. Machine Learn. Res. (2002) 3(3):397–422
  • Auer P., Cesa-Bianchi N., Fischer P. Finite-time analysis of the multi-armed bandit problem. Machine Learn. (2002) 47(2):235–256
  • Berry D., Fristedt B. Bandit Problems: Sequential Allocation of Experiments (1985) (Chapman and Hall, London)
  • Bertsekas D. Dynamic Programming and Optimal Control (1995) Vol. 1 (Athena Scientific, Belmont, MA)
  • Bertsekas D., Tsitsiklis J. N. Neuro-Dynamic Programming (1996) (Athena Scientific, Belmont, MA)
  • Bertsimas D., Tsitsiklis J. N. Introduction to Linear Optimization (1997) (Athena Scientific, Belmont, MA)
  • Blum J. R. Multidimensional stochastic approximation methods. Ann. Math. Statist. (1954) 25(4):737–744
  • Cicek D., Broadie M., Zeevi A. General bounds and finite-time performance improvement for the Kiefer-Wolfowitz stochastic approximation algorithm. (2009). Working paper, Columbia Graduate School of Business, New York
  • Dani V., Hayes T. P., Kakade S. M. Stochastic linear optimization under bandit feedback. Proc. 21st Annual Conf. Learn. Theory (COLT 2008) (2008a) (Helsinki, Finland) 355–366
  • Dani V., Hayes T. P., Kakade S. M. Stochastic linear optimization under bandit feedback. (2008b). Working paper, University of Chicago, Chicago. http://ttic.uchicago.edu/~sham/papers/ml/bandit_linear_long.pdf
  • Feldman D. Contributions to the “two-armed bandit” problem. Ann. Math. Statist. (1962) 33(3):847–856
  • Fiedler M., Pták V. A new positive definite geometric mean of two positive definite matrices. Linear Algebra Its Appl. (1997) 251(1):1–20
  • Ginebra J., Clayton M. K. Response surface bandits. J. Roy. Statist. Soc. Ser. B (Methodological) (1995) 57(4):771–784
  • Goldenshluger A., Zeevi A. Performance limitations in bandit problems with side observations. (2008). Working paper, Columbia University Graduate School of Business, New York
  • Goldenshluger A., Zeevi A. Woodroofe's one-armed bandit problem revisited. Ann. Appl. Probab. (2009) 19(4):1603–1633
  • Keener R. Further contributions to the “two-armed bandit” problem. Ann. Statist. (1985) 13(1):418–422
  • Kiefer J., Wolfowitz J. Stochastic estimation of the maximum of a regression function. Ann. Math. Statist. (1952) 23(3):462–466
  • Lai T. Stochastic approximation (invited paper). Ann. Statist. (2003) 31(2):391–406
  • Lai T. L. Adaptive treatment allocation and the multi-armed bandit problem. Ann. Statist. (1987) 15(3):1091–1114
  • Lai T. L., Robbins H. Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. (1985) 6(1):4–22
  • Mersereau A. J., Rusmevichientong P., Tsitsiklis J. N. A structured multi-armed bandit problem and the greedy policy. IEEE Trans. Automatic Control (2009) 54(12):2787–2802
  • Pandey S., Chakrabarti D., Agrawal D. Multi-armed bandit problems with dependent arms. Proc. 24th Internat. Conf. Machine Learn. (2007) Corvallis, OR:721–728
  • Polovinkin E. S. Strongly convex analysis. Sbornik: Math. (1996) 187(2):259–286
  • Pressman E. L., Sonin I. N. Sequential Control with Incomplete Information (1990) (Academic Press, London)
  • Robbins H. Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. (1952) 58(5):527–535
  • Robbins H., Monro S. A stochastic approximation method. Ann. Math. Statist. (1951) 22(3):400–407
  • Rusmevichientong P., Tsitsiklis J. N. Linearly parameterized bandits (extended version). (2010). http://arxiv.org/abs/0812.3465
  • Thompson W. R. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika (1933) 25(3):285–294
  • Wang C.-C., Kulkarni S. R., Poor H. V. Bandit problems with side observations. IEEE Trans. Automatic Control (2005a) 50(3):338–355
  • Wang C.-C., Kulkarni S. R., Poor H. V. Arbitrary side observations in bandit problems. Adv. Appl. Math. (2005b) 34(4):903–938