From Lexicons to Large Language Models: A Holistic Evaluation of Psychometric Text Analysis in Social Science Research
Published Online: 22 Apr 2026. https://doi.org/10.1287/isre.2024.1143
References
- (2018) Text analytics to support sense-making in social media: A language-action perspective. MIS Quart. 42(2):427–464.
- (2024) Pathways for design research on artificial intelligence. Inform. Systems Res. 35(2):441–459.
- (2021) Constructing a psychometric testbed for fair natural language processing. Proc. 2021 Conf. Empirical Methods Natural Language Processing (Association for Computational Linguistics, Stroudsburg, PA), 3748–3758.
- (2018) The impact of user personality traits on word of mouth: Text-mining social media platforms. Inform. Systems Res. 29(3):612–640.
- (2020) A deep learning architecture for psychometric natural language processing. ACM Trans. Inform. Systems 38(1):1–29.
- (2023) Annotating data for fine-tuning a neural ranker? Current active learning strategies are not better than random selection. Proc. Annual Internat. ACM SIGIR-AP ’23 (Association for Computing Machinery, New York), 139–149.
- (2022) Role theory perspectives: Past, present, and future applications of role theories in management research. J. Management 48(6):1469–1502.
- (2021) Bad seeds: Evaluating lexical methods for bias measurement. Proc. 59th Annual Meeting Assoc. Comput. Linguistics and 11th Internat. Joint Conf. Natl. Language Processing, vol. 1: Long Papers (Association for Computational Linguistics, Stroudsburg, PA), 1889–1904.
- (2019) FinBERT: Financial sentiment analysis with pre-trained language models. Preprint, submitted August 27, https://arxiv.org/abs/1908.10063.
- (2022) Machine learning as a model for cultural learning: Teaching an algorithm what it means to be fat. Sociol. Methods Res. 51(4):1484–1539.
- (2010) SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. Proc. Seventh Internat. Conf. Language Resources Evaluation LREC’10 (European Language Resources Association, Paris), 2200–2204.
- (2014) Architecture of fluid intelligence and working memory revealed by lesion mapping. Brain Structure Function 219(2):485–494.
- (2016) Empathy and altruism. Brown KW, Leary MR, eds. The Oxford Handbook of Hypo-Egoic Phenomena (Oxford University Press, Oxford, UK), 161–174.
- (2019) SciBERT: A pretrained language model for scientific text. Preprint, submitted September 10, https://arxiv.org/abs/1903.10676.
- (2021) Investigating gender bias in BERT. Cognitive Comput. 13(4):1008–1018.
- (1966) Role Theory: Concepts and Research (John Wiley & Sons, Hoboken, NJ).
- (2011) Emotions, oral arguments, and Supreme Court decision making. J. Politics 73(2):572–581.
- (2020) Adversarial filters of dataset biases. Preprint, submitted July 11, https://arxiv.org/abs/2002.04108.
- (2020) Language models are few-shot learners. Adv. Neural Inform. Processing Systems 33:1877–1901.
- (2012) Neuro-QOL: Brief measures of health-related quality of life for clinical research in neurology. Neurology 78(23):1860–1867.
- (2016) XGBoost: A scalable tree boosting system. Proc. 22nd ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 785–794.
- (2023) When large language models meet personalization: Perspectives of challenges and opportunities. Preprint, submitted July 31, https://arxiv.org/abs/2307.16376.
- (2024) From persona to personalization: A survey on role-playing language agents. Preprint, submitted October 9, https://arxiv.org/abs/2404.18231.
- (2023) Contrastive chain-of-thought prompting. Preprint, submitted November 15, https://arxiv.org/abs/2311.09277.
- (2022) PaLM: Scaling language modeling with pathways. Preprint, submitted October 5, https://arxiv.org/abs/2204.02311.
- (2007) How emotions inform judgment and regulate thought. Trends Cognitive Sci. 11(9):393–399.
- (1995) Support-vector networks. Machine Learn. 20(3):273–297.
- (2004) The Positive and Negative Affect Schedule (PANAS): Construct validity, measurement properties and normative data in a large non-clinical sample. British J. Clin. Psych. 43(3):245–265.
- (2004) The functional architecture of human empathy. Behav. Cognitive Neurosci. Rev. 3(2):71–100.
- (2021) Empathy: Assessment instruments and psychometric quality—A systematic literature review with a meta-analysis of the past ten years. Front. Psych. 12:781346.
- (2018) Dual Process Theory 2.0 (Routledge/Taylor & Francis Group, New York).
- (2019) BERT: Pre-training of deep bidirectional transformers for language understanding. Proc. 2019 Conf. North Amer. Chapter Assoc. Comput. Linguistics: Human Language Technologies, vol. 1: Long and Short Papers (Association for Computational Linguistics, Stroudsburg, PA), 4171–4186.
- (2018) The Hitchhiker’s guide to testing statistical significance in natural language processing. Gurevych I, Miyao Y, eds. Proc. 56th Annual Meeting Assoc. Comput. Linguistics, vol. 1: Long Papers (Association for Computational Linguistics, Stroudsburg, PA), 1383–1392.
- (2024) First-person fairness in chatbots. Preprint, submitted October 16, https://arxiv.org/abs/2410.19803.
- (2003) In two minds: Dual-process accounts of reasoning. Trends Cognitive Sci. 7(10):454–459.
- (2013) Dual-process theories of higher cognition: Advancing the debate. Perspect. Psych. Sci. 8(3):223–241.
- (2023) Multimodal knowledge graph construction of Chinese traditional operas and sentiment and genre recognition. J. Cultural Heritage 62:32–44.
- (2022) Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity. Preprint, submitted June 16, https://arxiv.org/abs/2101.03961.
- (2009) A new look at emotional intelligence: A dual-process framework. Personality Soc. Psych. Rev. 13(1):21–44.
- (2021) Initial evidence for the hypersensitivity hypothesis: Emotional intelligence as a magnifier of emotional experience. J. Intelligence 9(2):24.
- (2018) A comparative study of fairness-enhancing interventions in machine learning. Preprint, submitted February 12, https://arxiv.org/abs/1802.04422.
- (2024) SubData: A Python library to collect and combine datasets for evaluating LLM alignment on downstream tasks. Preprint, submitted December 21, https://arxiv.org/abs/2412.16783.
- (2019) Responding to bad press: How CEO temporal focus influences the sensitivity to negative media coverage of acquisitions. Acad. Management J. 62(3):918–943.
- (2012) Neuro-QOL: Quality of life item banks for adults with neurological disorders: Item development and calibrations based upon clinical and general population testing. Quality Life Res. 21(3):475–486.
- (2007) Strictly proper scoring rules, prediction, and estimation. J. Amer. Statist. Assoc. 102(477):359–378.
- (2024) The Llama 3 herd of models. Preprint, submitted November 23, https://arxiv.org/abs/2407.21783.
- (1997) Long short-term memory. Neural Comput. 9(8):1735–1780.
- (2021) LoRA: Low-rank adaptation of large language models. Preprint, submitted October 16, https://arxiv.org/abs/2106.09685.
- (2023) FinBERT: A large language model for extracting information from financial text. Contemporary Accounting Res. 40(2):806–841.
- (2024) The tangled webs we weave: Examining the effects of CEO deception on analyst recommendations. Strategic Management J. 45(1):66–112.
- (2009) Parallel coordinates. Liu L, Özsu MT, eds. Encyclopedia of Database Systems (Springer US, New York), 2018–2024.
- (2024) AI alignment: A comprehensive survey. Preprint, submitted May 1, https://arxiv.org/abs/2310.19852.
- (2023) Mistral 7B. Preprint, submitted October 10, https://arxiv.org/abs/2310.06825.
- (2010) Emotional intelligence: An integrative meta-analysis and cascading model. J. Appl. Psych. 95(1):54–78.
- (2011) Thinking, Fast and Slow (Farrar, Straus and Giroux, New York).
- (2002) Representativeness revisited: Attribute substitution in intuitive judgment. Gilovich T, Griffin D, Kahneman D, eds. Heuristics and Biases: The Psychology of Intuitive Judgment (Cambridge University Press, New York), 49–81.
- (2014) Causal reasoning with mental models. Front. Human Neurosci. 8:849.
- (2024) Timely, granular, and actionable: Designing a social listening platform for public health 3.0. MIS Quart. 48(3):899–930.
- (2024) Better zero-shot reasoning with role-play prompting. Preprint, submitted March 14, https://arxiv.org/abs/2308.07702.
- (2020) A blessing and a curse: How CEOs’ trait empathy affects their management of organizational crises. Acad. Management Rev. 45(1):130–153.
- (2024) Explainable deep learning for false information identification: An argumentation theory approach. Inform. Systems Res. 35(2):890–907.
- (2025) Guided Diverse Concept Miner (GDCM): Uncovering relevant constructs for managerial insights from text. Inform. Systems Res. 36(1):370–393.
- (2021) The power of scale for parameter-efficient prompt tuning. Preprint, submitted September 2, https://arxiv.org/abs/2104.08691.
- (2020) Predicting labor market competition: Leveraging interfirm network and employee skills. Inform. Systems Res. 31(4):1443–1466.
- (2020) Finding useful solutions in online knowledge communities: A theory-driven design and multilevel analysis. Inform. Systems Res. 31(3):731–752.
- (2022) Automated detection of emotional and cognitive engagement in MOOC discussions to predict learning achievement. Comput. Ed. 181:104461.
- (2024) Measuring spiritual values and bias of large language models. Preprint, submitted October 15, https://arxiv.org/abs/2410.11647.
- (2011) When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. J. Finance 66(1):35–65.
- (2023) Self-refine: Iterative refinement with self-feedback. Preprint, submitted May 25, https://arxiv.org/abs/2303.17651.
- (2022) A survey on bias and fairness in machine learning. Preprint, submitted January 25, https://arxiv.org/abs/1908.09635.
- (1995) WordNet: A lexical database for English. Comm. ACM 38(11):39–41.
- (2022) Computationally intensive theory construction: A primer for authors and reviewers. MIS Quart. 46(2):3–18.
- (2020) DQI: Measuring data quality in NLP. Preprint, submitted May 2, https://arxiv.org/abs/2005.00816.
- (2024) Resilience messaging: The effect of governors’ social media communications on community compliance during a public health crisis. Inform. Systems Res. 35(2):505–527.
- (2017) Dynamic self-regulation and multiple-goal pursuit. Annual Rev. Organ. Psych. Organ. Behav. 4(1):401–423.
- (2003) At the interface of the affective, behavioral, and cognitive neurosciences: Decoding the emotional feelings of the brain. Brain Cognition 52(1):4–14.
- (1996) Cognitive, emotional, and language processes in disclosure. Cognition Emotion 10(6):601–626.
- (2023) Large language models can infer psychological dispositions of social media users. Preprint, submitted September 13, https://arxiv.org/abs/2309.08631.
- (2003) Common method biases in behavioral research: A critical review of the literature and recommended remedies. J. Appl. Psych. 88(5):879–903.
- (2021) Correcting misclassification bias in regression models with variables generated via data mining. Inform. Systems Res. 32(2):462–480.
- (2023) GPT is an effective tool for multilingual psychological text analysis. Preprint, submitted May 19, https://osf.io/sekf5.
- (2020) Wide range screening of algorithmic bias in word embedding models using large sentiment lexicons reveals underreported bias types. PLoS One 15(4):e0231189.
- (1986) Sequential thought processes in PDP models. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 2: Psychological and Biological Models (MIT Press, Cambridge, MA), 7–57.
- (2021) Real-time brand reputation tracking using social media. J. Marketing 85(4):21–43.
- (1990) Emotional intelligence. Imagination Cognition Personality 9(3):185–211.
- (1975) A vector space model for automatic indexing. Comm. ACM 18(11):613–620.
- (2020) Learning word ratings for empathy and distress from document-level user responses. Proc. Twelfth Language Resources Evaluation Conf. (European Language Resources Association, Paris), 1664–1673.
- (2020) Women’s leadership is associated with fewer deaths during the COVID-19 crisis: Quantitative and qualitative analyses of United States governors. J. Appl. Psych. 105(8):771–783.
- (2017) Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. Preprint, submitted January 23, https://arxiv.org/abs/1701.06538.
- (2024) On the decision-making abilities in role-playing using large language models. Preprint, submitted February 29, https://arxiv.org/abs/2402.18807.
- (2009) The social neuroscience of empathy. Ann. NY Acad. Sci. 1156:81–96.
- (2007) The affect heuristic. Eur. J. Oper. Res. 177(3):1333–1352.
- (1998) Individual differences in rational thought. J. Experiment. Psych. General 127(2):161–188.
- (1966) The General Inquirer: A Computer Approach to Content Analysis (MIT Press, Oxford, UK).
- (2007) SemEval-2007 Task 14: Affective text. Agirre E, Màrquez L, Wicentowski R, eds. Proc. Fourth Internat. Workshop Semantic Evaluations SemEval-2007 (Association for Computational Linguistics, Stroudsburg, PA), 70–74.
- (2022) Positive attentional bias mediates the relationship between trait emotional intelligence and trait affect. Sci. Rep. 12(1):20733.
- (2024) Large language models for data annotation: A survey. Preprint, submitted December 2, https://arxiv.org/abs/2402.13446.
- (2010) The psychological meaning of words: LIWC and computerized text analysis methods. J. Language Soc. Psych. 29(1):24–54.
- (2012) What lies beneath: The linguistic traces of deception in online dating profiles. J. Comm. 62(1):78–97.
- (2018) The power of EI competencies over intelligence and individual performance: A task-dependent model. Front. Psych. 9:1532.
- (2024) Two tales of persona in LLMs: A survey of role-playing and personalization. Preprint, submitted October 5, https://arxiv.org/abs/2406.01171.
- (2017) Attention is all you need. 31st Conf. Neural Inform. Processing Systems (NIPS 2017) (Curran Associates, Inc., Red Hook, NY).
- (2023) Chain-of-thought prompting elicits reasoning in large language models. Preprint, submitted January 10, https://arxiv.org/abs/2201.11903.
- (2022) Emergent abilities of large language models. Preprint, submitted October 26, https://arxiv.org/abs/2206.07682.
- (2011) Psychopathy and lifetime experiences of depression. Criminal Behaviour Mental Health 21(4):279–294.
- Wong CS, Law KS (2002) Wong and Law Emotional Intelligence Scale (WLEIS) [Database record]. APA PsycTests. https://doi.org/10.1037/t07398-000.
- (2023) Large language models are diverse role-players for summarization evaluation. Preprint, submitted September 19, https://arxiv.org/abs/2303.15078.
- (2021) The interplay between online reviews and physician demand: An empirical investigation. Management Sci. 67(12):7344–7361.
- (2023) Parameter-efficient fine-tuning methods for pretrained language models: A critical review and assessment. Preprint, submitted December 19, https://arxiv.org/abs/2312.12148.
- (2023) Getting personal: A deep learning artifact for text-based measurement of personality. Inform. Systems Res. 34(1):194–222.
- (2023) SDTM: A supervised Bayesian deep topic model for text analytics. Inform. Systems Res. 34(1):137–156.
- (2018) Mind the gap: Accounting for measurement error and misclassification in variables generated via data mining. Inform. Systems Res. 29(1):4–24.
- (2023) Tree of thoughts: Deliberate problem solving with large language models. Preprint, submitted December 3, https://arxiv.org/abs/2305.10601.
- (2024) KETCH: A knowledge-enhanced transformer-based approach to suicidal ideation detection from social media content. Inform. Systems Res. 36(1):572–599.
- (2024) When “a helpful assistant” is not really helpful: Personas in system prompts do not improve performances of large language models. Preprint, submitted October 9, https://arxiv.org/abs/2311.10054.

