py-irt: A Scalable Item Response Theory Library for Python

Published online: https://doi.org/10.1287/ijoc.2022.1250

References

  • Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, Corrado GS, et al. (2016) TensorFlow: Large-scale machine learning on heterogeneous distributed systems. Preprint, submitted March 14, https://arxiv.org/abs/1603.04467.
  • Baker FB (2001) The Basics of Item Response Theory (ERIC Clearinghouse on Assessment and Evaluation, University of Maryland, College Park, MD).
  • Baker FB, Kim SH (2004) Item Response Theory: Parameter Estimation Techniques, 2nd ed. (CRC Press, New York).
  • Baldock R, Maennel H, Neyshabur B (2021) Deep learning through the lens of example difficulty. Ranzato M, Beygelzimer A, Dauphin YN, Liang P, Wortman Vaughan J, eds. Adv. Neural Inform. Processing Systems 34: Annual Conf. Neural Inform. Processing Systems 2021 (NeurIPS) (Curran Associates, Inc., Red Hook, NY), 10876–10889.
  • Battauz M (2015) equateIRT: An R package for IRT test equating. J. Statist. Software 68(1):1–22.
  • Bergner Y, Halpin P, Vie JJ (2022) Multidimensional item response theory in the style of collaborative filtering. Psychometrika 87(1):266–288.
  • Bingham E, Chen JP, Jankowiak M, Obermeyer F, Pradhan N, Karaletsos T, Singh R, Szerlip P, Horsfall P, Goodman ND (2019) Pyro: Deep universal probabilistic programming. J. Machine Learn. Res. 20(1):973–978.
  • Bock RD, Aitkin M (1981) Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika 46(4):443–459.
  • Carlson JE, von Davier M (2013) Item Response Theory, ETS Research Report Series, vol. 2, i–69.
  • Carpenter B, Gelman A, Hoffman MD, Lee D, Goodrich B, Betancourt M, Brubaker M, Guo J, Li P, Riddell A (2017) Stan: A probabilistic programming language. J. Statist. Software 76(1):1–32.
  • Chalmers RP (2012) mirt: A multidimensional item response theory package for the R environment. J. Statist. Software 48:1–29.
  • Chen S, Xie W (2021) On cluster-aware supervised learning: Frameworks, convergent algorithms, and applications. INFORMS J. Comput. 34(1):481–502.
  • Chen Y, Li X, Zhang S (2019) Joint maximum likelihood estimation for high-dimensional exploratory item factor analysis. Psychometrika 84(1):124–146.
  • Cui J, Rosoff H, John RS (2017) A polytomous item response theory model for measuring near-miss appraisal as a psychological trait. Decision Anal. 14(2):75–86.
  • de Jong MG, Lehmann DR, Netzer O (2012) State-dependence effects in surveys. Marketing Sci. 31(5):838–854.
  • de Jong MG, Steenkamp JBEM, Veldkamp BP (2009) A model for the construction of country-specific yet internationally comparable short-form marketing scales. Marketing Sci. 28(4):674–689.
  • Edgeworth FY (1888) The statistics of examinations. J. Roy. Statist. Soc. 51(3):599–635.
  • Gardner M, Grus J, Neumann M, Tafjord O, Dasigi P, Liu NF, Peters M, Schmitz M, Zettlemoyer LS (2018) AllenNLP: A deep semantic natural language processing platform. Proc. Workshop NLP Open Source Software (NLP-OSS) (Association for Computational Linguistics, Stroudsburg, PA), 1–6.
  • Gerrish SM, Blei DM (2011) Predicting legislative roll calls from text. Proc. 28th Internat. Conf. Machine Learn. (ICML) (OmniPress, Madison, WI), 489–496.
  • Hoffman MD, Gelman A (2014) The no-U-turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo. J. Machine Learn. Res. 15(1):1593–1623.
  • Jordan MI, Ghahramani Z, Jaakkola TS, Saul LK (1998) An introduction to variational methods for graphical models. Jordan MI, ed. Learning in Graphical Models, NATO ASI Series (Springer, Dordrecht, Netherlands), 105–161.
  • Kim J, Bolt D (2007) Estimating item response theory models using Markov chain Monte Carlo methods. Ed. Measurement Issues Practice 26(4):38–51.
  • Kingma DP, Welling M (2013) Auto-encoding variational Bayes. Bengio Y, LeCun Y, eds. 2nd Internat. Conf. Learn. Representations (ICLR) 2014. https://dblp.org/rec/journals/corr/KingmaW13.bib.
  • Krizhevsky A (2009) Learning multiple layers of features from tiny images.
  • Lalor JP (2020) Learning latent characteristics of data and models using item response theory. Unpublished PhD thesis, University of Massachusetts–Amherst, Amherst, MA.
  • Lalor JP, Rodriguez P (2022) py-irt version v2022.0061. https://doi.org/10.5281/zenodo.6818509.
  • Lalor JP, Yu H (2020) Dynamic data selection for curriculum learning via ability estimation. Cohn T, He Y, Liu Y, eds. Findings Assoc. Comput. Linguistics: EMNLP 2020 (Association for Computational Linguistics, Stroudsburg, PA), 545–555.
  • Lalor JP, Wu H, Yu H (2016) Building an evaluation scale using item response theory. Proc. Conf. Empirical Methods Natural Language Processing, 648–657.
  • Lalor JP, Wu H, Yu H (2019) Learning latent parameters without human response patterns: Item response theory with artificial crowds. Proc. 2019 Conf. Empirical Methods Natural Language Processing 9th Internat. Joint Conf. Natural Language Processing (EMNLP-IJCNLP) (Association for Computational Linguistics, Stroudsburg, PA), 4248–4258.
  • LeCun Y, Cortes C, Burges CJ (1998) MNIST handwritten digit database. http://yann.lecun.com/exdb/mnist/.
  • Lord FM, Novick MR, Birnbaum A (1968) Statistical Theories of Mental Test Scores. https://psycnet.apa.org/fulltext/1968-35040-000.pdf.
  • Martínez-Plumed F, Prudêncio RBC, Martínez-Usó A, Hernández-Orallo J (2019) Item response theory in AI: Analysing machine learning classifiers at the instance level. Artificial Intelligence 271:18–42.
  • Martínez-Plumed F, Prudêncio RBC, Martínez-Usó A, Hernández-Orallo J (2016) Making sense of item response theory in machine learning. Frontiers in Artificial Intelligence and Applications, vol. 285 (IOS Press), 1140–1148.
  • Natesan P, Nandakumar R, Minka T, Rubright JD (2016) Bayesian prior choice in IRT estimation using MCMC and variational Bayes. Frontiers Psych. 7.
  • Nguyen VA, Boyd-Graber J, Resnik P, Miler K (2015) Tea party in the house: A hierarchical ideal point topic model and its application to Republican legislators in the 112th Congress. Proc. 53rd Annual Meeting Assoc. Comput. Linguistics and 7th Internat. Joint Conf. Natural Language Processing, vol. 1 (Association for Computational Linguistics, Stroudsburg, PA), 1438–1448.
  • Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, et al. (2019) PyTorch: An imperative style, high-performance deep learning library. Wallach HM, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox EB, Garnett R, eds. Adv. Neural Inform. Processing Systems 32: Annual Conf. Neural Inform. Processing Systems 2019 (NeurIPS) (Curran Associates, Inc., Red Hook, NY), 8024–8035.
  • Pearl J (1988) Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference (Morgan Kaufmann Publishers Inc., San Francisco).
  • Poole KT, Rosenthal H (2017) Ideology & Congress: A Political Economic History of Roll Call Voting, 2nd ed. (Routledge, London).
  • Reckase MD (2009) Multidimensional item response theory models. Reckase, ed. Multidimensional Item Response Theory (Springer, New York), 79–112.
  • Rizopoulos D (2006) ltm: An R package for latent variable modeling and item response analysis. J. Statist. Software 17(5):1–25.
  • Rodriguez P, Barrow J, Hoyle AM, Lalor JP, Jia R, Boyd-Graber J (2021) Evaluation examples are not equally informative: How should that change NLP leaderboards? Zong C, Xia F, Li W, Navigli R, eds. Proc. 59th Annual Meeting Assoc. Comput. Linguistics and 11th Internat. Joint Conf. Natural Language Processing, vol. 1 (Association for Computational Linguistics, Stroudsburg, PA), 4486–4503.
  • Roy A, Qureshi S, Pande K, Nair D, Gairola K, Jain P, Singh S, et al. (2019) Performance comparison of machine learning platforms. INFORMS J. Comput. 31(2):207–225.
  • Satopää VA, Salikhov M, Tetlock PE, Mellers B (2021) Bias, information, noise: The BIN model of forecasting. Management Sci. 67(12):7599–7618.
  • Sedoc J, Ungar L (2020) Item response theory for efficient human evaluation of chatbots. Proc. First Workshop Evaluation Comparison NLP Systems, 21–33.
  • Socher R, Perelygin A, Wu J, Chuang J, Manning CD, Ng A, Potts C (2013) Recursive deep models for semantic compositionality over a sentiment treebank. Proc. 2013 Conf. Empirical Methods Natural Language Processing (Association for Computational Linguistics, Stroudsburg, PA), 1631–1642.
  • Urban CJ, Bauer DJ (2021) A deep learning algorithm for high-dimensional exploratory item factor analysis. Psychometrika 86(1):1–29.
  • van Rijn PW, Sinharay S, Haberman SJ, Johnson MS (2016) Assessment of fit of item response theory models used in large-scale educational survey assessments. Large-scale Assessments Ed. 4(1):10.
  • Vania C, Htut PM, Huang W, Mungra D, Pang RY, Phang J, Liu H, Cho K, Bowman SR (2021) Comparing test sets with item response theory. Zong C, Xia F, Li W, Navigli R, eds. Proc. 59th Annual Meeting Assoc. Comput. Linguistics and 11th Internat. Joint Conf. Natural Language Processing (Association for Computational Linguistics, Stroudsburg, PA), 1141–1158.
  • Wu M, Davis R, Domingue B, Piech C, Goodman ND (2020) Variational item response theory: Fast, accurate, and expressive. Rafferty AN, Whitehill J, Romero C, Cavalli-Sforza V, eds. Proc. 13th Internat. Conf. Educational Data Mining (International Educational Data Mining Society, Brussels, Belgium).
  • Zhang X, Chen L, Gendreau M, Langevin A (2021) Learning-based branch-and-price algorithms for the vehicle routing problem with time windows and two-dimensional loading constraints. INFORMS J. Comput. 34(3):1419–1436.