The Knowledge Gradient Algorithm for a General Class of Online Learning Problems
Published Online: 1 Feb 2012
https://doi.org/10.1287/opre.1110.0999

