Open Access

A Bandit-Based Approach to Educational Recommender Systems: Contextual Thompson Sampling for Learner Skill Gain Optimization

Lukas De Kerpel (Corresponding Author)
[email protected]
https://orcid.org/0000-0001-6860-8344
Faculty of Economics and Business Administration, Ghent University, 9000 Ghent, Belgium; and Corelab CVAMO, FlandersMake@UGent, Ghent University, 9000 Ghent, Belgium

Arthur Thuy
[email protected]
https://orcid.org/0000-0001-9107-5646
Faculty of Economics and Business Administration, Ghent University, 9000 Ghent, Belgium; and Corelab CVAMO, FlandersMake@UGent, Ghent University, 9000 Ghent, Belgium

Dries F. Benoit
[email protected]
https://orcid.org/0000-0001-9901-8507
Faculty of Economics and Business Administration, Ghent University, 9000 Ghent, Belgium; and Corelab CVAMO, FlandersMake@UGent, Ghent University, 9000 Ghent, Belgium

Published Online: 10 Mar 2026
https://doi.org/10.1287/ited.2025.0174

INFORMS Transactions on Education

Volume 26, Issue 3

May 2026

Pages 175-253

Article Information

Received: August 30, 2025
Accepted: January 30, 2026
Published Online: March 10, 2026

Cite as

Lukas De Kerpel, Arthur Thuy, Dries F. Benoit (2026) A Bandit-Based Approach to Educational Recommender Systems: Contextual Thompson Sampling for Learner Skill Gain Optimization. INFORMS Transactions on Education 26(3):187–199.

https://doi.org/10.1287/ited.2025.0174
