Linearly Parameterized Bandits

Paat Rusmevichientong
[email protected]
School of Operations Research and Information Engineering, Cornell University, Ithaca, New York 14853

John N. Tsitsiklis
[email protected]
Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139

Published Online: 30 Apr 2010 | https://doi.org/10.1287/moor.1100.0446

References

Abe N., Long P. M. Associative reinforcement learning using linear probabilistic concepts. Proc. 16th Internat. Conf. Machine Learn. (1999) (Morgan Kaufmann, San Francisco) 3–11
Agrawal R. Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Adv. Appl. Probab. (1995) 27(4):1054–1078
Agrawal R., Teneketzis D., Anantharam V. Asymptotically efficient adaptive allocation schemes for controlled i.i.d. processes: Finite parameter space. IEEE Trans. Automatic Control (1989) 34(3):258–267
Auer P. Using confidence bounds for exploitation-exploration trade-offs. J. Machine Learn. Res. (2002) 3(3):397–422
Auer P., Cesa-Bianchi N., Fischer P. Finite-time analysis of the multi-armed bandit problem. Machine Learn. (2002) 47(2):235–256
Berry D., Fristedt B. Bandit Problems: Sequential Allocation of Experiments (1985) (Chapman and Hall, London)
Bertsekas D. Dynamic Programming and Optimal Control (1995) Vol. 1 (Athena Scientific, Belmont, MA)
Bertsekas D., Tsitsiklis J. N. Neuro-Dynamic Programming (1996) (Athena Scientific, Belmont, MA)
Bertsimas D., Tsitsiklis J. N. Introduction to Linear Optimization (1997) (Athena Scientific, Belmont, MA)
Blum J. R. Multidimensional stochastic approximation methods. Ann. Math. Statist. (1954) 25(4):737–744
Cicek D., Broadie M., Zeevi A. General bounds and finite-time performance improvement for the Kiefer-Wolfowitz stochastic approximation algorithm. (2009). Working paper, Columbia Graduate School of Business, New York
Dani V., Hayes T. P., Kakade S. M. Stochastic linear optimization under bandit feedback. Proc. 21st Annual Conf. Learn. Theory (COLT 2008) (2008a) (Helsinki, Finland) 355–366
Dani V., Hayes T. P., Kakade S. M. Stochastic linear optimization under bandit feedback. (2008b). Working paper, University of Chicago, Chicago. http://ttic.uchicago.edu/~sham/papers/ml/bandit_linear_long.pdf
Feldman D. Contributions to the “two-armed bandit” problem. Ann. Math. Statist. (1962) 33(3):847–856
Fiedler M., Pták V. A new positive definite geometric mean of two positive definite matrices. Linear Algebra Its Appl. (1997) 251(1):1–20
Ginebra J., Clayton M. K. Response surface bandits. J. Roy. Statist. Soc. Ser. B (Methodological) (1995) 57(4):771–784
Goldenshluger A., Zeevi A. Performance limitations in bandit problems with side observations. (2008). Working paper, Columbia University Graduate School of Business, New York
Goldenshluger A., Zeevi A. Woodroofe's one-armed bandit problem revisited. Ann. Appl. Probab. (2009) 19(4):1603–1633
Keener R. Further contributions to the “two-armed bandit” problem. Ann. Statist. (1985) 13(1):418–422
Kiefer J., Wolfowitz J. Stochastic estimation of the maximum of a regression function. Ann. Math. Statist. (1952) 23(3):462–466
Lai T. Stochastic approximation (invited paper). Ann. Statist. (2003) 31(2):391–406
Lai T. L. Adaptive treatment allocation and the multi-armed bandit problem. Ann. Statist. (1987) 15(3):1091–1114
Lai T. L., Robbins H. Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. (1985) 6(1):4–22
Mersereau A. J., Rusmevichientong P., Tsitsiklis J. N. A structured multi-armed bandit problem and the greedy policy. IEEE Trans. Automatic Control (2009) 54(12):2787–2802
Pandey S., Chakrabarti D., Agrawal D. Multi-armed bandit problems with dependent arms. Proc. 24th Internat. Conf. Machine Learn. (2007) (Corvallis, OR) 721–728
Polovinkin E. S. Strongly convex analysis. Sbornik: Math. (1996) 187(2):259–286
Pressman E. L., Sonin I. N. Sequential Control with Incomplete Information (1990) (Academic Press, London)
Robbins H. Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. (1952) 58(5):527–535
Robbins H., Monro S. A stochastic approximation method. Ann. Math. Statist. (1951) 22(3):400–407
Rusmevichientong P., Tsitsiklis J. N. Linearly parameterized bandits (extended version). (2010). http://arxiv.org/abs/0812.3465
Thompson W. R. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika (1933) 25(3):285–294
Wang C.-C., Kulkarni S. R., Poor H. V. Bandit problems with side observations. IEEE Trans. Automatic Control (2005a) 50(3):338–355
Wang C.-C., Kulkarni S. R., Poor H. V. Arbitrary side observations in bandit problems. Adv. Appl. Math. (2005b) 34(4):903–938

Cited by
- Statistical inference for online decision making with Lasso loss function: In a contextual multi-armed bandit setting
  European Journal of Operational Research, Vol. 334, No. 1
- Speed Up the Cold-Start Learning in Two-Sided Bandits with Many Arms
  Mohsen Bayati,
  Junyu Cao,
  Wanning Chen
  26 June 2026 | Management Science, Vol. 0, No. 0
- Doubly High-Dimensional Contextual Bandits: An Interpretable Model for Joint Assortment-Pricing
  Junhui Cai,
  Ran Chen,
  Martin J. Wainwright,
  Linda Zhao
  9 June 2026 | Management Science, Vol. 0, No. 0
- An LP-based Sampling Policy for Multi-Armed Bandits with Side-Observations and Stochastic Availability
- Online Facility Location: Running Stores on Wheels with Spatial Demand Learning
  Junyu Cao,
  Wei Qi,
  Yan Zhang
  2 February 2026 | Manufacturing & Service Operations Management, Vol. 28, No. 3
- Anonymous Linear Bandits for Multi-User Systems
  12 April 2026
- Exploration-exploitation trade-off for continuous-time episodic reinforcement learning with linear-convex models
  The Annals of Applied Probability, Vol. 36, No. 2
- Utility Fairness in Contextual Dynamic Pricing with Demand Learning
  Xi Chen,
  David Simchi-Levi,
  Yining Wang
  4 August 2025 | Management Science, Vol. 72, No. 3
- Stochastic Multi-Armed Bandits with Limited Control Variates
- On Statistical Discrimination as a Failure of Social Learning: A Multiarmed Bandit Approach
  Junpei Komiyama,
  Shunya Noda
  29 March 2024 | Management Science, Vol. 72, No. 1
- Thompson Sampling for the Multinomial Logit Bandit
  Shipra Agrawal,
  Vashist Avadhanula,
  Vineet Goyal,
  Assaf Zeevi
  6 March 2025 | Mathematics of Operations Research, Vol. 51, No. 1
- Local and Global Uniform Convexity Conditions
  18 February 2026
- Information maximization for generalized multiarmed bandit games
  20 January 2026 | Physical Review Research, Vol. 8, No. 1
- Contextual Learning with Online Convex Optimization: Theory and Application to Medical Decision-Making
  Esmaeil Keyvanshokooh,
  Mohammad Zhalechian,
  Cong Shi,
  Mark P. Van Oyen,
  Pooyan Kazemian
  2 May 2025 | Management Science, Vol. 71, No. 12
- Collaborative Learning and Decision Making on Pricing and Recommendation: A Simple Framework for Planning
  Junyu Cao
  11 November 2025 | Management Science, Vol. 0, No. 0
- An optimal selection for ensembles of influential projects
  13 February 2020 | Annals of Operations Research, Vol. 354, No. 3
- Optimal sequential stochastic shortest path interdiction
  European Journal of Operational Research, Vol. 326, No. 3
- Context-Based Dynamic Pricing with Separable Demand Models
  Jinzhi Bu,
  David Simchi-Levi,
  Chonghuan Wang
  27 October 2025 | Management Science, Vol. 0, No. 0
- Online Learning with Sample Selection Bias
  Divya Singhvi,
  Somya Singhvi
  19 March 2025 | Operations Research, Vol. 73, No. 5
- Online Learning and Decision Making Under Generalized Linear Model with High-Dimensional Data
  Xue Wang,
  Mike Mingcheng Wei,
  Tao Yao
  13 November 2024 | Management Science, Vol. 71, No. 8
- Brief Announcement: Stochastic Parallel Scheduling with Bandit Feedback
  16 July 2025
- Best-Arm Identification with High-Dimensional Features
- Quantum contextual bandits and recommender systems for quantum data
  12 September 2024 | Quantum Machine Intelligence, Vol. 6, No. 2
- Pricing and Positioning of Horizontally Differentiated Products with Incomplete Demand Information
  Arnoud V. den Boer,
  Boxiao Chen,
  Yining Wang
  29 April 2024 | Operations Research, Vol. 72, No. 6
- To Interfere or Not To Interfere: Information Revelation and Price-Setting Incentives in a Multiagent Learning Environment
  John R. Birge,
  Hongfan (Kevin) Chen,
  N. Bora Keskin,
  Amy Ward
  27 March 2024 | Operations Research, Vol. 72, No. 6
- Linear Bandits With Side Observations on Networks
  IEEE/ACM Transactions on Networking, Vol. 32, No. 5
- On Learning Whittle Index Policy for Restless Bandits With Scalable Regret
  IEEE Transactions on Control of Network Systems, Vol. 11, No. 3
- Influence Maximization via Graph Neural Bandits
  24 August 2024
- Tiered Assortment: Optimization and Online Learning
  Junyu Cao,
  Wei Sun
  4 October 2023 | Management Science, Vol. 70, No. 8
- Convex Methods for Constrained Linear Bandits
- Optimal Learning for Structured Bandits
  Bart Van Parys,
  Negin Golrezaei
  16 August 2023 | Management Science, Vol. 70, No. 6
- Online Learning and Pricing for Service Systems with Reusable Resources
  Huiwen Jia,
  Cong Shi,
  Siqian Shen
  10 November 2022 | Operations Research, Vol. 72, No. 3
- Distributed Linear Bandits With Differential Privacy
  IEEE Transactions on Network Science and Engineering, Vol. 11, No. 3
- Analyzing Queueing Problems via Bandits With Linear Reward & Nonlinear Workload Fairness
  IEEE Transactions on Mobile Computing, Vol. 23, No. 4
- Multi-armed linear bandits with latent biases
  Information Sciences, Vol. 660
- Network Revenue Management With Demand Learning and Fair Resource-Consumption Balancing
  6 March 2024 | Production and Operations Management, Vol. 33, No. 2
- Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection
  27 September 2022 | Journal of the American Statistical Association, Vol. 119, No. 545
- Linear Contextual Bandits with Hybrid Payoff: Revisited
  22 August 2024
- Multi-armed bandits with dependent arms
  20 November 2023 | Machine Learning, Vol. 113, No. 1
- Differentially Private Stochastic Linear Bandits: (Almost) for Free
  IEEE Journal on Selected Areas in Information Theory, Vol. 5
- Nearly Minimax-Optimal Regret for Linearly Parameterized Bandits
  IEEE Transactions on Information Theory, Vol. 70, No. 1
- Data-driven Population Tracking in Large Service Systems
  1 January 2024 | SSRN Electronic Journal, Vol. 57
- Data-Driven Hospital Admission Control: A Learning Approach
  Mohammad Zhalechian,
  Esmaeil Keyvanshokooh,
  Cong Shi,
  Mark P. Van Oyen
  10 August 2023 | Operations Research, Vol. 71, No. 6
- Multi-Armed Bandits With Self-Information Rewards
  IEEE Transactions on Information Theory, Vol. 69, No. 11
- Distributionally Robust Batch Contextual Bandits
  Nian Si,
  Fan Zhang,
  Zhengyuan Zhou,
  Jose Blanchet
  31 March 2023 | Management Science, Vol. 69, No. 10
- A tractable online learning algorithm for the multinomial logit contextual bandit
  European Journal of Operational Research, Vol. 310, No. 2
- Provably Efficient Reinforcement Learning with Linear Function Approximation
  Chi Jin,
  Zhuoran Yang,
  Zhaoran Wang,
  Michael I. Jordan
  28 March 2023 | Mathematics of Operations Research, Vol. 48, No. 3
- Reinforcement Learning and Dynamical Systems
- Orchestrating Energy-Efficient vRANs: Bayesian Learning and Experimental Results
  IEEE Transactions on Mobile Computing, Vol. 22, No. 5
- Robust Stochastic Multi-Armed Bandits with Historical Data
  30 April 2023
- Empirical Gittins index strategies with ε-explorations for multi-armed bandit problems
  Computational Statistics & Data Analysis, Vol. 180
- Doubly High-Dimensional Contextual Bandits: An Interpretable Model for Joint Assortment-Pricing
  1 January 2023 | SSRN Electronic Journal, Vol. 24
- Utility Fairness in Contextual Dynamic Pricing with Demand Learning
  1 January 2023 | SSRN Electronic Journal, Vol. 57
- Online Pricing with Offline Data: Phase Transition and Inverse Square Law
  Jinzhi Bu,
  David Simchi-Levi,
  Yunzong Xu
  17 March 2022 | Management Science, Vol. 68, No. 12
- Compression for Multi-Arm Bandits
  IEEE Journal on Selected Areas in Information Theory, Vol. 3, No. 4
- Satisficing in Time-Sensitive Bandit Learning
  Daniel Russo,
  Benjamin Van Roy
  14 March 2022 | Mathematics of Operations Research, Vol. 47, No. 4
- Online Learning and Optimization for Revenue Management Problems with Add-on Discounts
  David Simchi-Levi,
  Rui Sun,
  Huanan Zhang
  12 January 2022 | Management Science, Vol. 68, No. 10
- Adaptive Sequential Experiments with Unknown Information Arrival Processes
  Yonatan Gur,
  Ahmadreza Momeni
  10 June 2022 | Manufacturing & Service Operations Management, Vol. 24, No. 5
- Online Personalized Assortment Optimization with High-Dimensional Customer Contextual Data
  Sentao Miao,
  Xiuli Chao
  6 July 2022 | Manufacturing & Service Operations Management, Vol. 24, No. 5
- Sublinear regret for learning POMDPs
  1 September 2022 | Production and Operations Management, Vol. 31, No. 9
- Dynamic Pricing and Inventory Control with Fixed Ordering Cost and Incomplete Demand Information
  Boxiao Chen,
  David Simchi-Levi,
  Yining Wang,
  Yuan Zhou
  14 December 2021 | Management Science, Vol. 68, No. 8
- Machine-learning-optimized Cas12a barcoding enables the recovery of single-cell lineages and transcriptional profiles
  Molecular Cell, Vol. 82, No. 16
- Stochastic continuum-armed bandits with additive models: Minimax regrets and adaptive algorithm
  The Annals of Statistics, Vol. 50, No. 4
- Online Resource Allocation with Personalized Learning
  Mohammad Zhalechian,
  Esmaeil Keyvanshokooh,
  Cong Shi,
  Mark P. Van Oyen
  10 May 2022 | Operations Research, Vol. 70, No. 4
- Multi-armed quantum bandits: Exploration versus exploitation when learning properties of quantum states
  29 June 2022 | Quantum, Vol. 6
- Multi-Environment Meta-Learning in Stochastic Linear Bandits
- Representation Learning for Context-Dependent Decision-Making
- Blending Controllers via Multi-Objective Bandits
- Dynamic Learning and Decision Making via Basis Weight Vectors
  Hao Zhang
  9 February 2022 | Operations Research, Vol. 70, No. 3
- Technical note—Knowledge gradient for selection with covariates: Consistency and computation
  7 October 2021 | Naval Research Logistics (NRL), Vol. 69, No. 3
- Hedging the Drift: Learning to Optimize Under Nonstationarity
  Wang Chi Cheung,
  David Simchi-Levi,
  Ruihao Zhu
  12 November 2021 | Management Science, Vol. 68, No. 3
- Meta Dynamic Pricing: Transfer Learning Across Experiments
  Hamsa Bastani,
  David Simchi-Levi,
  Ruihao Zhu
  9 September 2021 | Management Science, Vol. 68, No. 3
- A Bandit-Learning Approach to Multifidelity Approximation
  18 January 2022 | SIAM Journal on Scientific Computing, Vol. 44, No. 1
- Dynamic Assortment Optimization: Beyond MNL Model
  12 April 2022
- Batched Learning in Generalized Linear Contextual Bandits With General Decision Sets
  IEEE Control Systems Letters, Vol. 6
- Non-Stationary Representation Learning in Sequential Linear Bandits
  IEEE Open Journal of Control Systems, Vol. 1
- Increasing Charity Donations: A Bandit Learning Approach
  SSRN Electronic Journal, Vol. 24
- Context-Based Dynamic Pricing with Separable Demand Models
  SSRN Electronic Journal, Vol. 47
- Nonstochastic Bandits with Infinitely Many Experts
- EdgeBOL
  3 December 2021
- Cache Placement Optimization in Mobile Edge Computing Networks With Unaware Environment—An Extended Multi-Armed Bandit Approach
  IEEE Transactions on Wireless Communications, Vol. 20, No. 12
- Ranking and Selection with Covariates for Personalized Decision Making
  Haihui Shen,
  L. Jeff Hong,
  Xiaowei Zhang
  12 February 2021 | INFORMS Journal on Computing, Vol. 33, No. 4
- Multimodal Dynamic Pricing
  Yining Wang,
  Boxiao Chen,
  David Simchi-Levi
  27 January 2021 | Management Science, Vol. 67, No. 10
- Online learning: A comprehensive survey
  Neurocomputing, Vol. 459
- Towards the D-Optimal Online Experiment Design for Recommender Selection
  14 August 2021
- Shrinking the Upper Confidence Bound: A Dynamic Product Selection Problem for Urban Warehouses
  Rong Jin,
  David Simchi-Levi,
  Li Wang,
  Xinshang Wang,
  Sen Yang
  12 January 2021 | Management Science, Vol. 67, No. 8
- Earning and Learning with Varying Cost
  1 August 2021 | Production and Operations Management, Vol. 30, No. 8
- Linear bandits with limited adaptivity and learning distributional optimal design
  15 June 2021
- Uncertainty Quantification for Demand Prediction in Contextual Dynamic Pricing
  1 June 2021 | Production and Operations Management, Vol. 30, No. 6
- Batched Learning in Generalized Linear Contextual Bandits with General Decision Sets
- Tensors in Modern Statistical Learning
  17 May 2021
- Bayesian Online Learning for Energy-Aware Resource Orchestration in Virtualized RANs
- Best arm identification in generalized linear bandits
  Operations Research Letters, Vol. 49, No. 3
- Optimal Bayesian Demand Learning over Short Horizons
  1 April 2021 | Production and Operations Management, Vol. 30, No. 4
- Technical Note—A Note on the Equivalence of Upper Confidence Bounds and Gittins Indices for Patient Agents
  Daniel Russo
  24 December 2020 | Operations Research, Vol. 69, No. 1
- Exploration Methods in Sparse Reward Environments
  3 January 2021
- The Restless Hidden Markov Bandit With Linear Rewards and Side Information
  IEEE Transactions on Signal Processing, Vol. 69
- Safe Linear Thompson Sampling With Side Information
  IEEE Transactions on Signal Processing, Vol. 69
- Regret lower bound and optimal algorithm for high-dimensional contextual linear bandit
  Electronic Journal of Statistics, Vol. 15, No. 2
- An Optimal Greedy Heuristic with Minimal Learning Regret for the Markov Chain Choice Model
  SSRN Electronic Journal, Vol. 67
- To Interfere or Not To Interfere: Information Revelation and Price-Setting Incentives in a Multiagent Learning Environment
  SSRN Electronic Journal, Vol. 17
- Adaptive Sequential Experiments with Unknown Information Arrival Processes
  SSRN Electronic Journal, Vol. 68
- Online Facility Location
  SSRN Electronic Journal, Vol. 33
- Restless Hidden Markov Bandit with Linear Rewards
- In-Field Performance Optimization for mm-Wave Mixed-Signal Doherty Power Amplifiers: A Bandit Approach
  IEEE Transactions on Circuits and Systems I: Regular Papers, Vol. 67, No. 12
- Discontinuous Demand Functions: Estimation and Pricing
  Arnoud V. den Boer,
  N. Bora Keskin
  27 April 2020 | Management Science, Vol. 66, No. 10
- Active Learning of Bayesian Linear Models with High-Dimensional Binary Features by Parameter Confidence-Region Estimation
  Neural Computation, Vol. 32, No. 10
- Learning in Combinatorial Optimization: What and How to Explore
  Sajad Modaresi,
  Denis Sauré,
  Juan Pablo Vielma
  19 June 2020 | Operations Research, Vol. 68, No. 5
- Bandit Algorithms
  4 July 2020 | , Vol. 31
- Self-accelerated Thompson sampling with near-optimal regret upper bound
  Neurocomputing, Vol. 399
- Multi-Armed Bandits on Partially Revealed Unit Interval Graphs
  IEEE Transactions on Network Science and Engineering, Vol. 7, No. 3
- An Improved Regret Bound for Thompson Sampling in the Gaussian Linear Bandit Setting
- Linear Thompson Sampling Under Unknown Linear Constraints
- Generalized Linear Bandits with Safety Constraints
- Online Decision Making with High-Dimensional Covariates
  Hamsa Bastani,
  Mohsen Bayati
  7 November 2019 | Operations Research, Vol. 68, No. 1
- Multi-Armed Bandits
  22 March 2022
- When Demands Evolve Larger and Noisier: Learning and Earning in a Growing Environment
  SSRN Electronic Journal, Vol. 47
- Optimal Learning for Structured Bandits
  SSRN Electronic Journal, Vol. 34
- A multi-armed bandit approach for exploring partially observed networks
  31 May 2019 | Applied Network Science, Vol. 4, No. 1
- From self-tuning regulators to reinforcement learning and back again
- Introduction to Multi-Armed Bandits
  8 November 2019 | Foundations and Trends® in Machine Learning, Vol. 12, No. 1-2
- Orthogonal Projection in Linear Bandits
- Recent Advances in Multiarmed Bandits for Sequential Decision Making
  Shipra Agrawal
  2 October 2019
- MNL-Bandit: A Dynamic Learning Approach to Assortment Selection
  Shipra Agrawal,
  Vashist Avadhanula,
  Vineet Goyal,
  Assaf Zeevi
  10 September 2019 | Operations Research, Vol. 67, No. 5
- Exploring Partially Observed Networks with Nonparametric Bandits
  5 December 2018
- Repeated discrete choices in geographical agent based models with an application to fisheries
  Environmental Modelling & Software, Vol. 111
- Optimal Bayesian Price Fine-Tuning
  SSRN Electronic Journal, Vol. 15
- Conservative Exploration for Semi-Bandits with Linear Generalization: A Product Selection Problem for Urban Warehouses
  SSRN Electronic Journal, Vol. 24
- Multi-Modal Dynamic Pricing
  SSRN Electronic Journal, Vol. 23
- Sequential Choice Bandits: Learning with Marketing Fatigue
  SSRN Electronic Journal, Vol. 310
- Global Bandits
  IEEE Transactions on Neural Networks and Learning Systems, Vol. 29, No. 12
- Online Network Revenue Management Using Thompson Sampling
  Kris Johnson Ferreira,
  David Simchi-Levi,
  He Wang
  21 November 2018 | Operations Research, Vol. 66, No. 6
- A Tutorial on Thompson Sampling
  12 July 2018 | Foundations and Trends® in Machine Learning, Vol. 11, No. 1
- What Are We Really Good At? Product Strategy with Uncertain Capabilities
  Jeanine Miklós-Thal,
  Michael Raith,
  Matthew Selove
  21 March 2018 | Marketing Science, Vol. 37, No. 2
- On Regret-Optimal Learning in Decentralized Multiplayer Multiarmed Bandits
  IEEE Transactions on Control of Network Systems, Vol. 5, No. 1
- Learning to Optimize via Information-Directed Sampling
  Daniel Russo,
  Benjamin Van Roy
  26 October 2017 | Operations Research, Vol. 66, No. 1
- Generalized Global Bandit and Its Application in Cellular Coverage Optimization
  IEEE Journal of Selected Topics in Signal Processing, Vol. 12, No. 1
- Unsupervised cost sensitive predictions with side information
  11 January 2018
- Multiagent Systems: Learning, Strategic Behavior, Cooperation, and Network Formation
- Using Linear Stochastic Bandits to extend traditional offline Designed Experiments to online settings
  Computers & Industrial Engineering, Vol. 115
- Learning to Optimize Under Non-Stationarity
  SSRN Electronic Journal, Vol. 32
- Technical Note—Dynamic Pricing and Demand Learning with Limited Price Experimentation
  Wang Chi Cheung,
  David Simchi-Levi,
  He Wang
  2 August 2017 | Operations Research, Vol. 65, No. 6
- Online learning with side information
- Customer Acquisition via Display Advertising Using Multi-Armed Bandit Experiments
  Eric M. Schwartz,
  Eric T. Bradlow,
  Peter S. Fader
  20 April 2017 | Marketing Science, Vol. 36, No. 4
- Sparse linear contextual bandits via relevance vector machines
- Chasing Demand: Learning and Earning in a Changing Environment
  N. Bora Keskin,
  Assaf Zeevi
  11 November 2016 | Mathematics of Operations Research, Vol. 42, No. 2
- Maximizing a Class of Utility Functions Over the Vertices of a Polytope
  Alper Atamtürk,
  Andrés Gómez
  31 January 2017 | Operations Research, Vol. 65, No. 2
- Nonstochastic Multi-Armed Bandits with Graph-Structured Feedback
  SIAM Journal on Computing, Vol. 46, No. 6
- Thompson Sampling for Online Personalized Assortment Optimization Problems with Multinomial Logit Choice Models
  SSRN Electronic Journal, Vol. 24
- Optimal Learning in Linear Regression with Combinatorial Feature Selection
  Bin Han,
  Ilya O. Ryzhov,
  Boris Defourny
  26 September 2016 | INFORMS Journal on Computing, Vol. 28, No. 4
- Personalized Active Learning for Activity Classification Using Wireless Wearable Sensors
  IEEE Journal of Selected Topics in Signal Processing, Vol. 10, No. 5
- Online Collaborative Filtering on Graphs
  Siddhartha Banerjee,
  Sujay Sanghavi,
  Sanjay Shakkottai
  13 May 2016 | Operations Research, Vol. 64, No. 3
- Finding the needles in the haystack: efficient intelligence processing
  21 December 2017 | Journal of the Operational Research Society, Vol. 67, No. 6
- Efficient algorithms for linear polyhedral bandits
- Linear Bandits in Unknown Environments
  4 September 2016
- Randomized allocation with arm elimination in a bandit problem with covariates
  Electronic Journal of Statistics, Vol. 10, No. 1
- Learning in Combinatorial Optimization: What and How to Explore
  SSRN Electronic Journal, Vol. 33
- Bayesian Reinforcement Learning: A Survey
  26 November 2015 | Foundations and Trends® in Machine Learning, Vol. 8, No. 5-6
- Dynamic pricing and learning: Historical origins, current research, and new directions
  Surveys in Operations Research and Management Science, Vol. 20, No. 1
- Distributed Multi-Agent Online Learning Based on Global Feedback
  IEEE Transactions on Signal Processing, Vol. 63, No. 9
- Efficient detection and localization on graph structured data
- Experiential and Social Learning in Firms: The Case of Hydraulic Fracturing in the Bakken Shale
  SSRN Electronic Journal, Vol. 58
- Online Network Revenue Management Using Thompson Sampling
  SSRN Electronic Journal, Vol. 57
- What are We Really Good At? Product Strategy with Uncertain Capabilities
  SSRN Electronic Journal, Vol. 54
- Mean Square Convergence Rates for Maximum Quasi-Likelihood Estimators
  Arnoud V. den Boer,
  Bert Zwart
  12 January 2014 | Stochastic Systems, Vol. 4, No. 2
- Learning to Optimize via Posterior Sampling
  Daniel Russo,
  Benjamin Van Roy
  23 April 2014 | Mathematics of Operations Research, Vol. 39, No. 4
- Sequential Learning for Multi-Channel Wireless Network Monitoring With Channel Switching Costs
  IEEE Transactions on Signal Processing, Vol. 62, No. 22
- Context adaptation in interactive recommender systems
  6 October 2014
- Networked bandits with disjoint linear payoffs
  24 August 2014
- Dynamic Pricing with Multiple Products and Partially Specified Demand Distribution
  Arnoud V. den Boer
  13 February 2014 | Mathematics of Operations Research, Vol. 39, No. 3
- Stochastic bandits with side observations on networks
  16 June 2014 | ACM SIGMETRICS Performance Evaluation Review, Vol. 42, No. 1
- User Driven Server Selection Algorithm for CDN Architecture
  8 July 2014
- Stochastic bandits with side observations on networks
  16 June 2014
- From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning
  20 January 2014 | Foundations and Trends® in Machine Learning, Vol. 7, No. 1
- Dynamic Pricing and Demand Learning with Limited Price Experimentation
  SSRN Electronic Journal, Vol. 57
- Bandits with budgets
- Bibliography
  6 December 2013
- Learning optimal classifier chains for real-time big data mining
- Retail pricing for stochastic demand with unknown parameters: An online machine learning approach
- Spectrum bandit optimization
- Mixing bandits
  11 August 2013
- A Linear Response Bandit Problem
  Alexander Goldenshluger,
  Assaf Zeevi
  26 August 2013 | Stochastic Systems, Vol. 3, No. 1
- Optimal Sequential Exploration: Bandits, Clairvoyants, and Wildcats
  David B. Brown,
  James E. Smith
  24 May 2013 | Operations Research, Vol. 61, No. 3
- A Time and Space Efficient Algorithm for Contextual Linear Bandits
- Dynamic Pricing and Learning: Historical Origins, Current Research, and New Directions
  SSRN Electronic Journal, Vol. 24
- Customer Acquisition via Display Advertising Using Multi-Armed Bandit Experiments
  SSRN Electronic Journal, Vol. 27
- Chasing Demand: Learning and Earning in a Changing Environment
  SSRN Electronic Journal, Vol. 57
- Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
  12 December 2012 | Foundations and Trends® in Machine Learning, Vol. 5, No. 1
- Linear bandits in high dimension and recommendation systems
- Combinatorial Network Optimization With Unknown Variables: Multi-Armed Bandits With Linear Rewards and Individual Observations
  IEEE/ACM Transactions on Networking, Vol. 20, No. 5
- Bibliography
  25 April 2012
- Parametrized stochastic multi-armed bandits with binary rewards
- Sequential learning for optimal monitoring of multi-channel wireless networks

Mathematics of Operations Research

Volume 35, Issue 2

May 2010

Pages 257-512

Article Information

Received: January 19, 2009
Published Online: April 30, 2010

Cite as

Paat Rusmevichientong, John N. Tsitsiklis (2010) Linearly Parameterized Bandits. Mathematics of Operations Research 35(2):395-411.

https://doi.org/10.1287/moor.1100.0446
