ALARM: Automated MLLM-Based Anomaly Detection in Complex-Environment Monitoring with Uncertainty Quantification
Abstract
Advances in large language models (LLMs) have stimulated research interest in multimodal LLM (MLLM)-based visual anomaly detection (VAD) algorithms that can be deployed in complex environments. The challenge is that anomalies in these environments are sometimes highly contextual and ambiguous, so uncertainty quantification (UQ) is a crucial capability for an MLLM-based VAD system to succeed. In this paper, we introduce a UQ-supported MLLM-based VAD framework called automated MLLM-based anomaly detection in complex-environment monitoring with UQ (ALARM). ALARM integrates UQ with quality-assurance techniques such as reasoning chains, self-reflection, and MLLM ensembles for robust and accurate performance, and it is designed around a rigorous probabilistic inference pipeline and computational process. Extensive empirical evaluations on real-world smart-home benchmark data and wound image classification data show ALARM’s superior performance and its generic applicability across different domains for reliable decision making.
History: Yu Ding and Ningyuan Chen served as senior editors for this article.
Funding: This work was supported in part by a grant from Amazon.
Data Ethics & Reproducibility Note: The code capsule is available at https://github.com/RancyZ/ALARM/tree/main and in the e-Companion to this article (available at https://doi.org/10.1287/ijds.2025.0107).

