ALARM: Automated MLLM-Based Anomaly Detection in Complex-Environment Monitoring with Uncertainty Quantification

Published Online: https://doi.org/10.1287/ijds.2025.0107

References

  • Abshari D, Fu C, Sridhar M (2024) LLM-assisted physical invariant extraction for cyber-physical systems anomaly detection. Preprint, submitted November 17, https://arxiv.org/abs/2411.10918.
  • Adhikari D, Jiang W, Zhan J, Rawat DB, Bhattarai A (2024) Recent advances in anomaly detection in internet of things: Status, challenges, and perspectives. Comput. Sci. Rev. 54:100665.
  • Ali MM (2023) Real-time video anomaly detection for smart surveillance. IET Image Processing 17(5):1375–1388.
  • Alves JV, Leitão D, Jesus S, Sampaio MO, Liébana J, Saleiro P, Figueiredo MA, et al. (2025) A benchmarking framework and data set for learning to defer in human-ai decision-making. Sci. Data 12(1):506.
  • Bharadwaj R, Gani H, Naseer M, Khan FS, Khan S (2024) Vane-bench: Video anomaly evaluation benchmark for conversational LMMs. Preprint, submitted June 14, https://arxiv.org/abs/2406.10326.
  • Bhat A, Mondal A, Tripathy A (2025) LLM agents for internet of things (IoT) applications. Proc. CS598 LLM Agent 2025 Workshop (OpenReview.net).
  • Bui AL, Fonarow GC (2012) Home monitoring for heart failure management. J. Amer. College Cardiology 59(2):97–104.
  • Chen CM (2011) Web-based remote human pulse monitoring system with intelligent data analysis for home health care. Expert Systems Appl. 38(3):2011–2019.
  • Chen J, Mueller J (2023) Quantifying uncertainty in answers from any language model and enhancing their trustworthiness. Preprint, submitted August 30, https://arxiv.org/abs/2308.16175.
  • Chen T, Liu X, Da L, Chen J, Papalexakis V, Wei H (2025) Uncertainty quantification of large language models through multi-dimensional responses. Preprint, submitted February 24, https://arxiv.org/abs/2502.16820.
  • Chow C (2003) On optimum recognition error and reject tradeoff. IEEE Trans. Inform. Theory 16(1):41–46.
  • Da L, Chen T, Cheng L, Wei H (2024) LLM uncertainty quantification through directional entailment graph and claim level response augmentation. Preprint, submitted July 1, https://arxiv.org/abs/2407.00994.
  • Duan H, Zhang J, Zhang L, Wu Y, Lv T, Zeng Y, Cheng X (2025) A home broadband maintenance and installation solution leveraging LLM-agent technology. Proc. IEEE 8th Inform. Tech. Mechatronics Engrg. Conf., vol. 8 (IEEE, Piscataway, NJ), 1–6.
  • D’Incecco M, Squartini S, Zhong M (2019) Transfer learning for non-intrusive load monitoring. IEEE Trans. Smart Grid 11(2):1419–1429.
  • Franc V, Prusa D, Voracek V (2023) Optimal strategies for reject option classifiers. J. Machine Learn. Res. 24(11):1–49.
  • Gao S, Yang P, Liu Y, Chen Y, Zhu H, Zhang X, Huang L (2025) VAGU & GTS: LLM-based benchmark and framework for joint video anomaly grounding and understanding. Preprint, submitted July 29, https://arxiv.org/abs/2507.21507.
  • Geifman Y, El-Yaniv R (2017) Selective classification for deep neural networks. Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 30 (Curran Associates, Inc., Red Hook, NY), 4878–4887.
  • Geifman Y, El-Yaniv R (2019) Selectivenet: A deep neural network with an integrated reject option. Chaudhuri K, Sugiyama M, eds. Proc. 36th Internat. Conf. Machine Learn., vol. 97 (PMLR, New York), 2151–2159.
  • Gong D, Liu L, Le V, Saha B, Mansour MR, Venkatesh S, Avd H (2019) Memorizing normality to detect anomaly: Memory-augmented deep autoencoder for unsupervised anomaly detection. Proc. IEEE/CVF Internat. Conf. Comput. Vision (IEEE, Piscataway, NJ), 1705–1714.
  • Hasan M, Choi J, Neumann J, Roy-Chowdhury AK, Davis LS (2016) Learning temporal regularity in video sequences. Proc. IEEE Conf. Comput. Vision Pattern Recognition (IEEE, Piscataway, NJ), 733–742.
  • He M, Jia T, Duan C, Cai H, Li Y, Huang G (2024) LLMelog: An approach for anomaly detection based on LLM-enriched log events. Proc. IEEE 35th Internat. Sympos. Software Reliability Engrg. (IEEE, Piscataway, NJ), 132–143.
  • Ho J, Salimans T, Gritsenko A, Chan W, Norouzi M, Fleet DJ (2022) Video diffusion models. Oh A, Agarwal A, Belgrave D, Cho K, eds. Adv. Neural Inform. Processing Systems, vol. 35 (Curran Associates, Inc., Red Hook, NY), 8633–8646.
  • Hou B, Liu Y, Qian K, Andreas J, Chang S, Zhang Y (2023) Decomposing uncertainty for large language models through input clarification ensembling. Preprint, submitted November 15, https://arxiv.org/abs/2311.08718.
  • Inan H, Upasani K, Chi J, Rungta R, Iyer K, Mao Y, Tontchev M, et al. (2023) Llama guard: LLM-based input-output safeguard for human-ai conversations. Preprint, submitted December 7, https://arxiv.org/abs/2312.06674.
  • Ionescu RT, Khan FS, Georgescu MI, Shao L (2019) Object-centric auto-encoders and dummy anomalies for abnormal event detection in video. Proc. IEEE/CVF Conf. Comput. Vision Pattern Recognition (IEEE, Piscataway, NJ), 7842–7851.
  • Kiran BR, Thomas DM, Parakkal R (2018) An overview of deep learning based methods for unsupervised and semi-supervised anomaly detection in videos. J. Imaging 4(2):36.
  • Kirchhof M, Kasneci G, Kasneci E (2025) Position: Uncertainty quantification needs reassessment for large-language model agents. Preprint, submitted May 28, https://arxiv.org/abs/2505.22655.
  • Li S, Liu F, Jiao L (2022) Self-training multi-sequence learning with transformer for weakly supervised video anomaly detection. Proc. AAAI Conf. Artificial Intelligence, vol. 36 (AAAI Press, Palo Alto, CA), 1395–1403.
  • Lin Y, Liu S, Huang S (2018) Selective sensing of a heterogeneous population of units with dynamic health conditions. IISE Trans. 50(12):1076–1088.
  • Lin Z, Trivedi S, Sun J (2024) Generating with confidence: Uncertainty quantification for black-box large language models. Trans. Machine Learn. Res.
  • Ling C, Zhao X, Zhang X, Cheng W, Liu Y, Sun Y, Oishi M, et al. (2024) Uncertainty quantification for in-context learning of large language models. Preprint, submitted February 1, https://arxiv.org/abs/2402.10189.
  • Liu J, Xia Y, Tang Z (2021) Privacy-preserving video fall detection using visual shielding information. Visual Comput. 37(2):359–370.
  • Liu W, Luo W, Lian D, Gao S (2018) Future frame prediction for anomaly detection–A new baseline. Proc. IEEE Conf. Comput. Vision Pattern Recognition (IEEE, Piscataway, NJ), 6536–6545.
  • Liu X, Chen T, Da L, Chen C, Lin Z, Wei H (2025) Uncertainty quantification and confidence calibration in large language models: A survey. Proc. 31st ACM SIGKDD Conf. Knowledge Discovery Data Mining V.2 (ACM, New York), 6107–6117.
  • Liu G, Lin Z, Yan S, Sun J, Yu Y, Ma Y (2012) Robust recovery of subspace structures by low-rank representation. IEEE Trans. Pattern Anal. Machine Intelligence 35(1):171–184.
  • Liu P, Yuan W, Fu J, Jiang Z, Hayashi H, Neubig G (2023) Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Comput. Surveys 55(9):1–35.
  • Lopes SI, Pinho P, Marques P, Abreu C, Carvalho NB, Ferreira J (2021) Contactless smart screening in nursing homes: An IoT-enabled solution for the COVID-19 era. Proc. 17th Internat. Conf. Wireless Mobile Comput. Networking Comm. (IEEE, Piscataway, NJ), 145–150.
  • Lopes SI, Silva F, Pinho P, Marques P, Abreu C, Milheiro J, Braga B, et al. (2024) CoViS: A contactless health monitoring system for the nursing home. IEEE Access 12:20802–20821.
  • Lv H, Sun Q (2024) Video anomaly detection and explanation via large language models. Preprint, submitted January 11, https://arxiv.org/abs/2401.05702.
  • Lv H, Chen C, Cui Z, Xu C, Li Y, Yang J (2021) Learning normal dynamics in videos with meta prototype network. Proc. IEEE/CVF Conf. Comput. Vision Pattern Recognition (IEEE, Piscataway, NJ), 15425–15434.
  • Madras D, Pitassi T, Zemel R (2018) Predict responsibly: Improving fairness and accuracy by learning to defer. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 31 (Curran Associates, Inc., Red Hook, NY), 6150–6160.
  • Malone M, Schultz G (2022) Challenges in the diagnosis and management of wound infection. British J. Dermatology 187(2):159–166.
  • Markovitz A, Sharir G, Friedman I, Zelnik-Manor L, Avidan S (2020) Graph embedded pose clustering for anomaly detection. Proc. IEEE/CVF Conf. Comput. Vision Pattern Recognition (IEEE, Piscataway, NJ), 10539–10547.
  • Mehandru N, Golchini N, Bamman D, Zack T, Molina MF, Alaa A (2025) Er-reason: A benchmark data set for llm-based clinical reasoning in the emergency room. Preprint, submitted May 28, https://arxiv.org/abs/2505.22919.
  • Mnih A, Salakhutdinov RR (2007) Probabilistic matrix factorization. Platt JC, Koller D, Singer Y, Roweis S, eds. Advances in Neural Information Processing Systems, vol. 20 (Curran Associates, Inc., Red Hook, NY), 1257–1264.
  • Nayak R, Pati UC, Das SK (2021) A comprehensive review on deep learning-based methods for video anomaly detection. Image Vision Comput. 106:104078.
  • Nikitin A, Kossen J, Gal Y, Marttinen P (2024) Kernel language entropy: Fine-grained uncertainty quantification for LLMs from semantic similarities. Globerson A, Mackey L, Belgrave D, Fan A, Paquet U, Tomczak J, Zhang C, eds. Adv. Neural Inform. Processing Systems, vol. 37 (Curran Associates, Inc., Red Hook, NY), 8901–8929.
  • Ntelopoulos A, Nasrollahi K (2024) CALLM: Cascading autoencoder and large language model for video anomaly detection. Proc. Internat. Conf. Image Processing Theory Tools Appl. (IEEE, Piscataway, NJ).
  • Pan Q, Bao Y, Li H (2023) Transfer learning-based data anomaly detection for structural health monitoring. Structural Health Monitoring 22(5):3077–3091.
  • Pang G, Shen C, Cao L, Hengel AVD (2021) Deep learning for anomaly detection: A review. ACM Comput. Surveys 54(2):1–38.
  • Park T (2024) Enhancing anomaly detection in financial markets with an llm-based multi-agent framework. Preprint, submitted March 28, https://arxiv.org/abs/2403.19735.
  • Park KW, Mirian MS, McKeown MJ (2024) Artificial intelligence-based video monitoring of movement disorders in the elderly: A review on current and future landscapes. Singapore Medical J. 65(3):141–149.
  • Patel S, Lorincz K, Hughes R, Huggins N, Growdon J, Standaert D, Akay M, et al. (2009) Monitoring motor fluctuations in patients with parkinson’s disease using wearable sensors. IEEE Trans. Inform. Tech. Biomedicine 13(6):864–873.
  • Ren J, Xia F, Liu Y, Lee I (2021) Deep video anomaly detection: Opportunities and challenges. Proc. Internat. Conf. Data Mining Workshops (IEEE, Piscataway, NJ), 959–966.
  • Rivkin D, Hogan F, Feriani A, Konar A, Sigal A, Liu X, Dudek G (2024) AIoT smart home via autonomous LLM agents. IEEE Internet Things J.
  • Romano Y, Sesia M, Candes E (2020) Classification with valid and adaptive coverage. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. Adv. Neural Inform. Processing Systems, vol. 33 (Curran Associates, Inc., Red Hook, NY), 3581–3591.
  • Sadinle M, Lei J, Wasserman L (2019) Least ambiguous set-valued classifiers with bounded error levels. J. Amer. Statist. Assoc. 114(525):223–234.
  • Song H, Ji R, Shi N, Lai F, Kontar RA (2025) Inv-entropy: A fully probabilistic framework for uncertainty quantification in language models. Preprint, submitted June 11, https://arxiv.org/abs/2506.09684.
  • Stojkoska BLR, Trivodaliev KV (2017) A review of internet of things for smart home: Challenges and solutions. J. Clean Production 140(3):1454–1464.
  • Sun Y, Ortiz J (2024) An ai-based system utilizing iot-enabled ambient sensors and llms for complex activity tracking. Preprint, submitted July 2, https://arxiv.org/abs/2407.02606.
  • Tian YJ, Felber NA, Pageau F, Schwab DR, Wangmo T (2024) Benefits and barriers associated with the use of smart home health technologies in the care of older persons: A systematic review. BMC Geriatrics 24(1):152.
  • Tian Y, Pang G, Chen Y, Singh R, Verjans JW, Carneiro G (2021) Weakly-supervised video anomaly detection with robust temporal feature magnitude learning. Proc. IEEE/CVF Internat. Conf. Comput. Vision (IEEE, Piscataway, NJ), 4975–4986.
  • Vyas J, Mercangoz M (2025) Autonomous control leveraging LLMs: An agentic framework for next-generation industrial automation. Preprint, submitted July 3, https://arxiv.org/abs/2507.07115.
  • Wang H, Qin J, Bastola A, Chen X, Suchanek J, Gong Z, Razi A (2024) VisionGPT: LLM-assisted real-time anomaly detection for safe visual navigation. Preprint, submitted March 19, https://arxiv.org/abs/2403.12415.
  • Wang X, Wei J, Schuurmans D, Le Q, Chi E, Narang S, Chowdhery A, et al. (2022) Self-consistency improves chain of thought reasoning in language models. Preprint, submitted March 21, https://arxiv.org/abs/2203.11171.
  • Wei J, Wang X, Schuurmans D, Bosma M, Xia F, Chi E, Le QV, et al. (2022) Chain-of-thought prompting elicits reasoning in large language models. Oh A, Agarwal A, Belgrave D, Cho K, eds. Adv. Neural Inform. Processing Systems, vol. 35 (Curran Associates, Inc., Red Hook, NY), 24824–24837.
  • Withanage KI, Lee I, Brinkworth R, Mackintosh S, Thewlis D (2016) Fall recovery subactivity recognition with RGB-D cameras. IEEE Trans. Industrial Inform. 12(6):2312–2320.
  • Wu P, Liu J (2021) Learning causal temporal relation and feature discrimination for anomaly detection. IEEE Trans. Image Processing 30:3513–3527.
  • Xu X, Cao Y, Chen Y, Shen W, Huang X (2024) Customizing visual-language foundation models for multi-modal anomaly detection and reasoning. Preprint, submitted March 17, https://arxiv.org/abs/2403.11083.
  • Yahaya SW, Lotfi A, Mahmud M (2021) Towards a data-driven adaptive anomaly detection system for human activity. Pattern Recognition Lett. 145:200–207.
  • Yang Y, Lee K, Dariush B, Cao Y, Lo SY (2024a) Follow the rules: Reasoning for video anomaly detection with large language models. Leonardis A, Ricci E, Roth S, Russakovsky O, Sattler T, Varol G, eds. Proc. Eur. Conf. Comput. Vision (Springer, Cham), 304–322.
  • Yang YY, Ho MY, Tai CH, Wu RM, Kuo MC, Tseng YJ (2024b) Fasteval parkinsonism: An instant deep learning–assisted video-based online system for parkinsonian motor symptom evaluation. NPJ Digital Medicine 7(1):31.
  • Ye F, Yang M, Pang J, Wang L, Wong D, Yilmaz E, Shi S, Tu Z (2024) Benchmarking LLMs via uncertainty quantification. Globerson A, Mackey L, Belgrave D, Fan A, Paquet U, Tomczak J, Zhang C, eds. Adv. Neural Inform. Processing Systems, vol. 37 (Curran Associates, Inc., Red Hook, NY), 15356–15385.
  • Yuan T, He Z, Dong L, Wang Y, Zhao R, Xia T, Xu L, et al. (2024) R-judge: Benchmarking safety risk awareness for LLM agents. Preprint, submitted January 18, https://arxiv.org/abs/2401.10019.
  • Yuan J, Li H, Ding X, Xie W, Li YJ, Zhao W, Wan K, et al. (2025) Understanding and mitigating numerical sources of nondeterminism in LLM inference. Proc. 39th Annual Conf. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY).
  • Zaheer MZ, Mahmood A, Khan MH, Segu M, Yu F, Lee SI (2022) Generative cooperative learning for unsupervised video anomaly detection. Proc. IEEE/CVF Conf. Comput. Vision Pattern Recognition (IEEE, Piscataway, NJ), 14744–14754.
  • Zanella L, Menapace W, Mancini M, Wang Y, Ricci E (2024) Harnessing large language models for training-free video anomaly detection. Proc. IEEE/CVF Conf. Comput. Vision Pattern Recognition (IEEE, Piscataway, NJ), 18527–18536.
  • Zhang Y, Cao Y, Xu X, Shen W (2024a) Logicode: An LLM-driven framework for logical anomaly detection. IEEE Trans. Automation Sci. Engrg.
  • Zhang H, Xu X, Wang X, Zuo J, Han C, Huang X, Gao C, et al. (2024b) Holmes-VAD: Towards unbiased and explainable video anomaly detection via multi-modal LLM. Preprint, submitted June 18, https://arxiv.org/abs/2406.12235.
  • Zhao X, Zhang C, Guo P, Li W, Chen L, Zhao C, Huang S (2025) Smarthome-bench: A comprehensive benchmark for video anomaly detection in smart homes using multi-modal large language models. Proc. Comput. Vision Pattern Recognition Conf. Workshops (IEEE, Piscataway, NJ), 3975–3985.
  • Zhu S, Chen C, Sultani W (2021) Video anomaly detection for smart surveillance. Ionescu C, Vetterli M, eds. Computer Vision: A Reference Guide (Springer, Cham), 1315–1322.
  • Zhu J, Cai S, Deng F, Ooi BC, Wu J (2024) Do LLMs understand visual anomalies? Uncovering LLM’s capabilities in zero-shot anomaly detection. Proc. 32nd ACM Internat. Conf. Multimedia (ACM, New York), 48–57.