SUVA: A Probabilistic Framework for Auditing LLMs with an Application to Social Preferences
Abstract
As organizations increasingly use large language models (LLMs) in delegated decision tasks, understanding and auditing their outputs is important for responsible deployment. However, despite LLMs’ widespread adoption, systematic tools for interpreting and auditing their decision outputs are limited. We introduce a probabilistic auditing framework, state–understanding–value–action (SUVA), to evaluate LLM behavior based on textual responses. SUVA assesses LLMs through both their final decisions and the reasoning processes leading to those decisions. We demonstrate SUVA’s utility by analyzing LLM social preferences using canonical behavioral economic games and concepts relevant to LLM users. We focus on eight LLMs, covering both open-source and proprietary models with varying versions and capacities. Our results show that LLMs exhibit prosocial tendencies in addition to purely self-interested ones. They also adjust their behavior in response to social cues, such as reciprocity and group identity, as well as to practical contexts. Moreover, we demonstrate that posttraining alignment methods can systematically reshape these social preferences, enabling behavioral alignment. Overall, SUVA provides a quantitative auditing framework that helps practitioners assess whether observed LLM behavior aligns with their goals and supports iterative audit–alignment cycles. For researchers, it provides a structured lens on how LLM outputs emerge from chain-of-thought reasoning, improving transparency without attributing human-like cognition.
History: Ravi Bapna, Senior Editor; Pallab Sanyal, Associate Editor.
Funding: This research was supported by the University of Texas at Austin Office of the Vice President for Research, Scholarship and Creative Endeavors (OVPR) Research & Creative Grant. Y. Leng was supported by the U.S. National Science Foundation (NSF) [Grant IIS-2153468].
Supplemental Material: The online appendix is available at https://doi.org/10.1287/isre.2024.0857.

