Risk-Aware Linear Bandits: Theory and Applications in Smart Order Routing

Published Online: https://doi.org/10.1287/opre.2024.0771

References

  • Abbasi-Yadkori Y, Pál D, Szepesvári C (2011) Improved algorithms for linear stochastic bandits. Proc. 25th Internat. Conf. Neural Inform. Processing Systems, Advances in Neural Information Processing Systems, vol. 24 (Curran Associates Inc., Red Hook, NY), 2312–2320.
  • Abernethy J, Kale S (2013) Adaptive market making via online learning. Proc. 27th Internat. Conf. Neural Inform. Processing Systems, Advances in Neural Information Processing Systems, vol. 26 (Curran Associates Inc., Red Hook, NY), 2058–2066.
  • Agarwal A, Bartlett P, Dama M (2010) Optimal allocation strategies for the dark pool problem. Proc. 13th Internat. Conf. Artificial Intelligence Statist. (JMLR), 9–16.
  • Agrawal S, Avadhanula V, Goyal V, Zeevi A (2019) MNL-bandit: A dynamic learning approach to assortment selection. Oper. Res. 67(5):1453–1485.
  • Almgren R, Chriss N (2001) Optimal execution of portfolio transactions. J. Risk 3:5–40.
  • Alsabah H, Capponi A, Ruiz Lacedelli O, Stern M (2021) Robo-advising: Learning investors’ risk preferences via portfolio choices. J. Financial Econometrics 19(2):369–392.
  • Auer P (2002) Using confidence bounds for exploitation-exploration trade-offs. J. Machine Learn. Res. 3(Nov):397–422.
  • Baldacci B, Manziuk I (2020) Adaptive trading strategies across liquidity pools. Market Microstructure Liquidity 6(01–04):2050008.
  • Bernasconi M, Martino S, Vittori E, Trovò F, Restelli M (2022) Dark-pool smart order routing: A combinatorial multi-armed bandit approach. Proc. 3rd ACM Internat. Conf. AI Finance (ACM, New York), 352–360.
  • Bernasconi-De-Luca M, Fusco L, Dragić O (2021) martinobdl/itch: Itch50converter. Accessed January 3, 2026, https://zenodo.org/record/5209267.
  • Bistritz I, Zhou Z, Chen X, Bambos N, Blanchet J (2022) No weighted-regret learning in adversarial bandits with delays. J. Machine Learn. Res. 23(1):6205–6247.
  • Blanchet J, Xu R, Zhou Z (2024) Delay-adaptive learning in generalized linear contextual bandits. Math. Oper. Res. 49(1):326–345.
  • Bouchaud J-P (2010) Price impact. Encyclopedia Quant. Finance.
  • Camilleri R, Jamieson K, Katz-Samuels J (2021) High-dimensional experimental design and kernel bandits. Proc. Internat. Conf. Machine Learn. (PMLR, New York), 1227–1237.
  • Cartea Á, Jaimungal S (2016) Incorporating order-flow into optimal execution. Math. Financial Econom. 10:339–364.
  • Cheung WC, Simchi-Levi D, Zhu R (2022) Hedging the drift: Learning to optimize under nonstationarity. Management Sci. 68(3):1696–1713.
  • Chu W, Li L, Reyzin L, Schapire R (2011) Contextual bandits with linear payoff functions. Proc. 14th Internat. Conf. Artificial Intelligence Statist. (JMLR), 208–214.
  • Coache A, Jaimungal S (2024) Reinforcement learning with dynamic convex risk measures. Math. Finance 34(2):557–587.
  • Cont R, Kukanov A (2017) Optimal order placement in limit order markets. Quant. Finance 17(1):21–39.
  • Cont R, Kukanov A, Stoikov S (2014) The price impact of order book events. J. Financial Econometrics 12(1):47–88.
  • Cormen TH, Leiserson CE, Rivest RL, Stein C (2009) Introduction to Algorithms (MIT Press, Cambridge, MA).
  • Dani V, Hayes TP, Kakade SM (2008) Stochastic linear optimization under bandit feedback. 21st Annual Conf. Learn. Theory (Omnipress, Madison, WI), 355–366.
  • Ganchev K, Nevmyvaka Y, Kearns M, Vaughan JW (2010) Censored exploration and the dark pool problem. Comm. ACM 53(5):99–107.
  • Garivier A, Moulines E (2008) On upper-confidence bound policies for non-stationary bandit problems. Preprint, submitted May 22, https://arxiv.org/abs/0805.3415.
  • Hambly B, Xu R, Yang H (2023) Recent advances in reinforcement learning in finance. Math. Finance 33(3):437–503.
  • Hsieh P-C, Liu X, Bhattacharya A, Kumar PR (2019) Stay with me: Lifetime maximization through heteroscedastic linear bandits with reneging. Proc. Internat. Conf. Machine Learn. (PMLR, New York), 2800–2809.
  • Huo X, Fu F (2017) Risk-aware multi-armed bandit problem with application to portfolio selection. Roy. Soc. Open Sci. 4(11):171377.
  • Jaksch T, Ortner R, Auer P (2010) Near-optimal regret bounds for reinforcement learning. J. Machine Learn. Res. 11:1563–1600.
  • Ji J, Xu R, Zhu R (2022) Risk-aware linear bandits with application in smart order routing. Proc. 3rd ACM Internat. Conf. AI Finance (ACM, New York), 334–342.
  • JPMS Frequently Asked Questions–U.S. Equities (2021) J.P. Morgan Securities LLC Electronic Trading: Frequently Asked Questions–US Equities. Accessed January 3, 2026, https://www.jpmorgan.com/content/dam/jpm/cib/complex/content/markets/aqua/US_Electronic_Trading_FAQs.pdf.
  • Kiefer J, Wolfowitz J (1960) The equivalence of two extremum problems. Canadian J. Math. 12:363–366.
  • Laruelle S, Lehalle C-A, Pages G (2011) Optimal split of orders across liquidity pools: A stochastic algorithm approach. SIAM J. Financial Math. 2(1):1042–1076.
  • Lattimore T, Szepesvári C (2020) Bandit Algorithms (Cambridge University Press, Cambridge, UK).
  • Li L, Chu W, Langford J, Schapire RE (2010) A contextual-bandit approach to personalized news article recommendation. Proc. 19th Internat. Conf. World Wide Web (ACM, New York), 661–670.
  • Lin Q, Chen X, Peña J (2015) A trade execution model under a composite dynamic coherent risk measure. Oper. Res. Lett. 43(1):52–58.
  • Markowitz H (1952) Portfolio selection. J. Finance 7(1):77–91.
  • NASDAQ (2017) Tradetalks Nasdaq PSX is the most unique equity exchange. Accessed January 3, 2026, https://www.nasdaq.com/articles/tradetalks-nasdaq-psx-is-the-most-unique-equity-exchange-2017-04-19.
  • NASDAQ (2022a) Introduction to Nasdaq BX. Accessed January 3, 2026, https://www.nasdaq.com/solutions/nasdaq-bx-stock-market.
  • NASDAQ (2022b) Introduction to Nasdaq PSX. Accessed January 3, 2026, https://www.nasdaq.com/solutions/nasdaq-psx-stock-market.
  • NASDAQ ITCH Data (2022) Nasdaq ITCH data. Accessed January 3, 2026, https://emi.nasdaq.com/ITCH/.
  • Pike-Burke C, Agrawal S, Szepesvári C, Grunewalder S (2018) Bandits with delayed, aggregated anonymous feedback. Proc. Internat. Conf. Machine Learn. (PMLR, New York), 4105–4113.
  • Preis T (2011) Price-time priority and pro rata matching in an order book model of financial markets. Econophysics of Order-Driven Markets (Springer, Milano), 65–72.
  • Rosenblatt Securities (2022) Let there be light: US edition: Market structure report. Accessed January 3, 2026, https://www.rblt.com/market-reports/let-there-be-light-us-edition-42.
  • Rubinstein M (2002) Markowitz’s “portfolio selection”: A fifty-year retrospective. J. Finance 57(3):1041–1045.
  • Sani A, Lazaric A, Munos R (2012) Risk-aversion in multi-armed bandits. Advances in Neural Information Processing Systems, vol. 25 (Curran Associates Inc., Red Hook, NY).
  • Saux P, Maillard O (2023) Risk-aware linear bandits with convex loss. Proc. Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 7723–7754.
  • Shen W, Wang J, Jiang Y-G, Zha H (2015) Portfolio choices with orthogonal bandit learning. Proc. 24th Internat. Joint Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 974–980.
  • Agrawal S, Goyal N (2013) Thompson sampling for contextual bandits with linear payoffs. Proc. 30th Internat. Conf. Machine Learn. (PMLR, New York).
  • Si N, Zhang F, Zhou Z, Blanchet J (2023) Distributionally robust batch contextual bandits. Management Sci. 69(10):5772–5793.
  • Simchi-Levi D, Wang C, Zheng Z (2023a) Non-stationary experimental design under structured trends. Preprint, submitted July 25, https://doi.org/10.2139/ssrn.4514568.
  • Simchi-Levi D, Zheng Z, Zhu F (2023b) Stochastic multi-armed bandits: Optimal trade-off among optimality, consistency, and tail risk. Proc. 37th Conf. Neural Inform. Processing Systems (Neural Information Processing Systems Foundation, New Orleans, LA).
  • Tan VYF, Prashanth LA, Jagannathan K (2022) A survey of risk-aware multi-armed bandits. Proc. Thirty-First Internat. Joint Conf. Artificial Intelligence (IJCAI), 5623–5629.
  • Vakili S, Zhao Q (2016) Risk-averse multi-armed bandit problems under mean variance measure. IEEE J. Selected Topics Signal Processing 10(6):1093–1111.
  • Yu JY, Mannor S (2009) Piecewise-stationary bandit problems with side observations. Proc. 26th Annual Internat. Conf. Machine Learn. (ACM, New York), 1177–1184.
  • Yu C, Liu J, Nemati S, Yin G (2021) Reinforcement learning in healthcare: A survey. ACM Comput. Surveys 55(1):1–36.
  • Zhou XY, Yin G (2003) Markowitz’s mean variance portfolio selection with regime switching: A continuous-time model. SIAM J. Control Optim. 42(4):1466–1482.
  • Zhu R, Kveton B (2022) Safe data collection for offline and online policy learning. Preprint, submitted August 4, https://arxiv.org/abs/2111.04835.
  • Zhu Q, Tan VYF (2020) Thompson sampling algorithms for mean variance bandits. Proc. 37th Internat. Conf. Machine Learn. (PMLR, New York), 11599–11608.
  • Zhu F, Zheng Z (2020) When demands evolve larger and noisier: Learning and earning in a growing environment. Proc. Internat. Conf. Machine Learn. (PMLR, New York), 11629–11638.