Linearly Parameterized Bandits
Published online: 30 Apr 2010. https://doi.org/10.1287/moor.1100.0446
References
- Associative reinforcement learning using linear probabilistic concepts. Proc. 16th Internat. Conf. Machine Learn. (1999) (Morgan Kaufmann, San Francisco) 3–11
- Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Adv. Appl. Probab. (1995) 27(4):1054–1078
- Asymptotically efficient adaptive allocation schemes for controlled i.i.d. processes: Finite parameter space. IEEE Trans. Automatic Control (1989) 34(3):258–267
- Using confidence bounds for exploitation-exploration trade-offs. J. Machine Learn. Res. (2002) 3(3):397–422
- Finite-time analysis of the multi-armed bandit problem. Machine Learn. (2002) 47(2):235–256
- Bandit Problems: Sequential Allocation of Experiments (1985) (Chapman and Hall, London)
- Dynamic Programming and Optimal Control, Vol. 1 (1995) (Athena Scientific, Belmont, MA)
- Neuro-Dynamic Programming (1996) (Athena Scientific, Belmont, MA)
- Introduction to Linear Optimization (1997) (Athena Scientific, Belmont, MA)
- Multidimensional stochastic approximation methods. Ann. Math. Statist. (1954) 25(4):737–744
- General bounds and finite-time performance improvement for the Kiefer-Wolfowitz stochastic approximation algorithm. (2009). Working paper, Columbia Graduate School of Business, New York
- Stochastic linear optimization under bandit feedback. Proc. 21st Annual Conf. Learn. Theory (COLT 2008) (2008a) (Helsinki, Finland) 355–366
- Stochastic linear optimization under bandit feedback. (2008b). Working paper, University of Chicago, Chicago. http://ttic.uchicago.edu/~sham/papers/ml/bandit_linear_long.pdf
- Contributions to the “two-armed bandit” problem. Ann. Math. Statist. (1962) 33(3):847–856
- A new positive definite geometric mean of two positive definite matrices. Linear Algebra Its Appl. (1997) 251(1):1–20
- Response surface bandits. J. Roy. Statist. Soc. Ser. B (Methodological) (1995) 57(4):771–784
- Performance limitations in bandit problems with side observations. (2008). Working paper, Columbia University Graduate School of Business, New York
- Woodroofe's one-armed bandit problem revisited. Ann. Appl. Probab. (2009) 19(4):1603–1633
- Further contributions to the “two-armed bandit” problem. Ann. Statist. (1985) 13(1):418–422
- Stochastic estimation of the maximum of a regression function. Ann. Math. Statist. (1952) 23(3):462–466
- Stochastic approximation (invited paper). Ann. Statist. (2003) 31(2):391–406
- Adaptive treatment allocation and the multi-armed bandit problem. Ann. Statist. (1987) 15(3):1091–1114
- Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. (1985) 6(1):4–22
- A structured multi-armed bandit problem and the greedy policy. IEEE Trans. Automatic Control (2009) 54(12):2787–2802
- Multi-armed bandit problems with dependent arms. Proc. 24th Internat. Conf. Machine Learn. (2007) (Corvallis, OR) 721–728
- Strongly convex analysis. Sbornik: Math. (1996) 187(2):259–286
- Sequential Control with Incomplete Information (1990) (Academic Press, London)
- Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. (1952) 58(5):527–535
- A stochastic approximation method. Ann. Math. Statist. (1951) 22(3):400–407
- Linearly parameterized bandits (extended version). (2010). http://arxiv.org/abs/0812.3465
- On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika (1933) 25(3):285–294
- Bandit problems with side observations. IEEE Trans. Automatic Control (2005a) 50(3):338–355
- Arbitrary side observations in bandit problems. Adv. Appl. Math. (2005b) 34(4):903–938

