py-irt: A Scalable Item Response Theory Library for Python

John Patrick Lalor (Corresponding Author)
https://orcid.org/0000-0003-0848-4786
IT, Analytics, and Operations, University of Notre Dame, Notre Dame, Indiana 46556

Pedro Rodriguez
https://orcid.org/0000-0001-8572-0725
Computer Science, University of Maryland, College Park, Maryland 20742

Published Online: 15 Nov 2022
https://doi.org/10.1287/ijoc.2022.1250

References

Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, Corrado GS, et al. (2016) TensorFlow: Large-scale machine learning on heterogeneous distributed systems. Preprint, submitted March 14, https://arxiv.org/abs/1603.04467.
Baker FB (2001) The Basics of Item Response Theory (ERIC Clearinghouse on Assessment and Evaluation, University of Maryland, College Park, MD).
Baker FB, Kim SH (2004) Item Response Theory: Parameter Estimation Techniques, 2nd ed. (CRC Press, New York).
Baldock R, Maennel H, Neyshabur B (2021) Deep learning through the lens of example difficulty. Ranzato M, Beygelzimer A, Dauphin YN, Liang P, Wortman Vaughan J, eds. Adv. Neural Inform. Processing Systems 34: Annual Conf. Neural Inform. Processing Systems 2021 (NeurIPS) (Curran Associates, Inc., Red Hook, NY), 10876–10889.
Battauz M (2015) equateIRT: An R package for IRT test equating. J. Statist. Software 68(1):1–22.
Bergner Y, Halpin P, Vie JJ (2022) Multidimensional item response theory in the style of collaborative filtering. Psychometrika 87(1):266–288.
Bingham E, Chen JP, Jankowiak M, Obermeyer F, Pradhan N, Karaletsos T, Singh R, Szerlip P, Horsfall P, Goodman ND (2019) Pyro: Deep universal probabilistic programming. J. Machine Learn. Res. 20(1):973–978.
Bock RD, Aitkin M (1981) Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika 46(4):443–459.
Carlson JE, von Davier M (2013) Item Response Theory. ETS Research Report Series, vol. 2, i–69.
Carpenter B, Gelman A, Hoffman MD, Lee D, Goodrich B, Betancourt M, Brubaker M, Guo J, Li P, Riddell A (2017) Stan: A probabilistic programming language. J. Statist. Software 76(1):1–32.
Chalmers RP (2012) mirt: A multidimensional item response theory package for the R environment. J. Statist. Software 48:1–29.
Chen S, Xie W (2021) On cluster-aware supervised learning: Frameworks, convergent algorithms, and applications. INFORMS J. Comput. 34(1):481–502.
Chen Y, Li X, Zhang S (2019) Joint maximum likelihood estimation for high-dimensional exploratory item factor analysis. Psychometrika 84(1):124–146.
Cui J, Rosoff H, John RS (2017) A polytomous item response theory model for measuring near-miss appraisal as a psychological trait. Decision Anal. 14(2):75–86.
de Jong MG, Lehmann DR, Netzer O (2012) State-dependence effects in surveys. Marketing Sci. 31(5):838–854.
de Jong MG, Steenkamp JBEM, Veldkamp BP (2009) A model for the construction of country-specific yet internationally comparable short-form marketing scales. Marketing Sci. 28(4):674–689.
Edgeworth FY (1888) The statistics of examinations. J. Roy. Statist. Soc. 51(3):599–635.
Gardner M, Grus J, Neumann M, Tafjord O, Dasigi P, Liu NF, Peters M, Schmitz M, Zettlemoyer LS (2018) AllenNLP: A deep semantic natural language processing platform. Proc. Workshop NLP Open Source Software (NLP-OSS) (Association for Computational Linguistics, Stroudsburg, PA), 1–6.
Gerrish SM, Blei DM (2011) Predicting legislative roll calls from text. Proc. 28th Internat. Conf. Machine Learn. (ICML) (OmniPress, Madison, WI), 489–496.
Hoffman MD, Gelman A (2014) The no-U-turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo. J. Machine Learn. Res. 15(1):1593–1623.
Jordan MI, Ghahramani Z, Jaakkola TS, Saul LK (1998) An introduction to variational methods for graphical models. Jordan MI, ed. Learning in Graphical Models, NATO ASI Series (Springer, Dordrecht, Netherlands), 105–161.
Kim J, Bolt D (2007) Estimating item response theory models using Markov chain Monte Carlo methods. Ed. Measurement Issues Practice 26(4):38–51.
Kingma DP, Welling M (2013) Auto-encoding variational Bayes. Bengio Y, LeCun Y, eds. 2nd Internat. Conf. Learn. Representations (ICLR) 2014. https://dblp.org/rec/journals/corr/KingmaW13.bib.
Krizhevsky A (2009) Learning multiple layers of features from tiny images.
Lalor JP (2020) Learning latent characteristics of data and models using item response theory. Unpublished PhD thesis, University of Massachusetts–Amherst, Amherst, MA.
Lalor JP, Rodriguez P (2022) py-irt version v2022.0061. https://doi.org/10.5281/zenodo.6818509.
Lalor JP, Yu H (2020) Dynamic data selection for curriculum learning via ability estimation. Cohn T, He Y, Liu Y, eds. Findings Assoc. Comput. Linguistics: EMNLP 2020 (Association for Computational Linguistics, Stroudsburg, PA), 545–555.
Lalor JP, Wu H, Yu H (2016) Building an evaluation scale using item response theory. Proc. Conf. Empirical Methods Natural Language Processing, 648–657.
Lalor JP, Wu H, Yu H (2019) Learning latent parameters without human response patterns: Item response theory with artificial crowds. Proc. 2019 Conf. Empirical Methods Natural Language Processing and 9th Internat. Joint Conf. Natural Language Processing (EMNLP-IJCNLP) (Association for Computational Linguistics, Stroudsburg, PA), 4248–4258.
LeCun Y, Cortes C, Burges CJ (1998) MNIST handwritten digit database. http://yann.lecun.com/exdb/mnist/.
Lord FM, Novick MR, Birnbaum A (1968) Statistical theories of mental test scores. https://psycnet.apa.org/fulltext/1968-35040-000.pdf.
Martínez-Plumed F, Prudêncio RBC, Martínez-Usó A, Hernández-Orallo J (2019) Item response theory in AI: Analysing machine learning classifiers at the instance level. Artificial Intelligence 271:18–42.
Martínez-Plumed F, Prudêncio RBC, Martínez-Usó A, Hernández-Orallo J (2016) Making sense of item response theory in machine learning. Frontiers in Artificial Intelligence and Applications, vol. 285 (IOS Press), 1140–1148.
Natesan P, Nandakumar R, Minka T, Rubright JD (2016) Bayesian prior choice in IRT estimation using MCMC and variational Bayes. Frontiers Psych. 7.
Nguyen VA, Boyd-Graber J, Resnik P, Miler K (2015) Tea party in the house: A hierarchical ideal point topic model and its application to Republican legislators in the 112th Congress. Proc. 53rd Annual Meeting Assoc. Comput. Linguistics and 7th Internat. Joint Conf. Natural Language Processing, vol. 1 (Association for Computational Linguistics, Stroudsburg, PA), 1438–1448.
Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, et al. (2019) PyTorch: An imperative style, high-performance deep learning library. Wallach HM, Larochelle H, Beygelzimer A, d'Alché-Buc F, Fox EB, Garnett R, eds. Adv. Neural Inform. Processing Systems 32: Annual Conf. Neural Inform. Processing Systems 2019 (NeurIPS) (Curran Associates, Inc., Red Hook, NY), 8024–8035.
Pearl J (1988) Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference (Morgan Kaufmann Publishers Inc., San Francisco).
Poole KT, Rosenthal H (2017) Ideology & Congress: A Political Economic History of Roll Call Voting, 2nd ed. (Routledge, London).
Reckase MD (2009) Multidimensional item response theory models. Reckase MD, ed. Multidimensional Item Response Theory (Springer, New York), 79–112.
Rizopoulos D (2006) ltm: An R package for latent variable modeling and item response analysis. J. Statist. Software 17(5):1–25.
Rodriguez P, Barrow J, Hoyle AM, Lalor JP, Jia R, Boyd-Graber J (2021) Evaluation examples are not equally informative: How should that change NLP leaderboards? Zong C, Xia F, Li W, Navigli R, eds. Proc. 59th Annual Meeting Assoc. Comput. Linguistics and 11th Internat. Joint Conf. Natural Language Processing, vol. 1 (Association for Computational Linguistics, Stroudsburg, PA), 4486–4503.
Roy A, Qureshi S, Pande K, Nair D, Gairola K, Jain P, Singh S, et al. (2019) Performance comparison of machine learning platforms. INFORMS J. Comput. 31(2):207–225.
Satopää VA, Salikhov M, Tetlock PE, Mellers B (2021) Bias, information, noise: The BIN model of forecasting. Management Sci. 67(12):7599–7618.
Sedoc J, Ungar L (2020) Item response theory for efficient human evaluation of chatbots. Proc. First Workshop Evaluation Comparison NLP Systems, 21–33.
Socher R, Perelygin A, Wu J, Chuang J, Manning CD, Ng A, Potts C (2013) Recursive deep models for semantic compositionality over a sentiment treebank. Proc. 2013 Conf. Empirical Methods Natural Language Processing (Association for Computational Linguistics, Stroudsburg, PA), 1631–1642.
Urban CJ, Bauer DJ (2021) A deep learning algorithm for high-dimensional exploratory item factor analysis. Psychometrika 86(1):1–29.
van Rijn PW, Sinharay S, Haberman SJ, Johnson MS (2016) Assessment of fit of item response theory models used in large-scale educational survey assessments. Large-scale Assessments Ed. 4(1):10.
Vania C, Htut PM, Huang W, Mungra D, Pang RY, Phang J, Liu H, Cho K, Bowman SR (2021) Comparing test sets with item response theory. Zong C, Xia F, Li W, Navigli R, eds. Proc. 59th Annual Meeting Assoc. Comput. Linguistics and 11th Internat. Joint Conf. Natural Language Processing (Association for Computational Linguistics, Stroudsburg, PA), 1141–1158.
Wu M, Davis R, Domingue B, Piech C, Goodman ND (2020) Variational item response theory: Fast, accurate, and expressive. Rafferty AN, Whitehill J, Romero C, Cavalli-Sforza V, eds. Proc. 13th Internat. Conf. Educational Data Mining (International Educational Data Mining Society, Brussels, Belgium).
Zhang X, Chen L, Gendreau M, Langevin A (2021) Learning-based branch-and-price algorithms for the vehicle routing problem with time windows and two-dimensional loading constraints. INFORMS J. Comput. 34(3):1419–1436.

INFORMS Journal on Computing

Volume 35, Issue 1

January-February 2023

Pages 1-264, C2

Article Information

Received: March 02, 2022
Accepted: September 21, 2022
Published Online: November 15, 2022

Cite as

John Patrick Lalor, Pedro Rodriguez (2022) py-irt: A Scalable Item Response Theory Library for Python. INFORMS Journal on Computing 35(1):5-13.

https://doi.org/10.1287/ijoc.2022.1250

Acknowledgments

The authors thank the editors and reviewers for their valuable comments.
