CAAC: Co-attentive Actionability Classification for Assessing Patient Education Videos

Krishna Pothugunta (Corresponding Author)
https://orcid.org/0009-0000-1882-1825
Department of Information Technology, Analytics and Operations, Mendoza College of Business, University of Notre Dame, Notre Dame, Indiana 46556

Xiao Liu
https://orcid.org/0000-0001-8277-4701
Department of Information Systems, W. P. Carey School of Business, Arizona State University, Tempe, Arizona 85287

Anjana Susarla
https://orcid.org/0000-0001-7482-1213
Department of Accounting and Information Systems, Eli Broad College of Business, Michigan State University, East Lansing, Michigan 48824

Rema Padman
https://orcid.org/0000-0003-4250-4357
Heinz College of Information Systems and Public Policy, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213

Published Online: 9 Jun 2026
https://doi.org/10.1287/ijoc.2023.0493

References

Arnab A, Dehghani M, Heigold G, Sun C, Lučić M, Schmid C (2021) ViViT: A video vision transformer. Berg T, Clark J, Matsushita Y, Taylor C, eds. Proc. IEEE/CVF Internat. Conf. Comput. Vision (IEEE Computer Society, Conference Publishing Services, Los Alamitos, CA), 6836–6846.
Ashley C, Tuten T (2015) Creative strategies in social media marketing: An exploratory study of branded social content and consumer engagement. Psych. Marketing 32(1):15–27.
Aysolmaz B, Reijers HA (2021) Animation as a dynamic visualization technique for improving process model comprehension. Inform. Management 58(5):10347.
Baur C, Prue C (2014) The CDC clear communication index is a new evidence-based tool to prepare and review health information. Health Promotion Practice 15(5):629–637.
Beltagy I, Peters ME, Cohan A (2020) Longformer: The long-document transformer. Preprint, submitted April 10, https://arxiv.org/abs/2004.05150.
Berkman ND, Sheridan SL, Donahue KE, Halpern DJ, Crotty K (2011) Low health literacy and health outcomes: An updated systematic review. Ann. Internal Medicine 155(2):97–107.
Chen Z, Wang S, Yan D, Li Y (2024) A spatio-temporal deepfake video detection method based on TimeSformer-CNN. Proc. 2024 Third Internat. Conf. Distributed Comput. Electrical Circuits Electronics (IEEE, Piscataway, NJ), 1–6.
D’Alfonso S (2020) AI in mental health. Curr. Opin. Psychol. 36:112–117.
Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: Pre-training of deep bidirectional transformers for language understanding. Burstein J, Doran C, Solorio T, eds. Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics: Human Language Technologies, Long and Short Papers, vol. 1 (Association for Computational Linguistics, Stroudsburg, PA), 4171–4186.
Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, et al. (2021) An image is worth 16x16 words: Transformers for image recognition at scale. Proc. Internat. Conf. Learn. Representations (OpenReview.net).
Draus P (2020) Impact of student engagement strategies on video content in learning computer programming and attitudes towards video instruction that was developed based on the cognitive theory of multimedia learning. Issues Inform. Systems 21(3):126–134.
Eichler K, Wieser S, Brügger U (2009) The costs of limited health literacy: A systematic review. Int. J. Public Health 54(5):313–324.
Gadzicki K, Khamsehashari R, Zetzsche C (2020) Early vs late fusion in multimodal convolutional neural networks. Proc. IEEE 23rd Internat. Conf. Inform. Fusion (IEEE, Piscataway, NJ), 1–6.
Gat I, Schwartz I, Schwing A, Hazan T (2020) Removing bias in multi-modal classifiers: Regularization by maximizing functional entropies. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin HT, eds. Advances in Neural Information Processing Systems, vol. 33 (Curran Associates, Red Hook, NY), 3197–3208.
Gemino A, Parker D, Kutzschan AO (2005) Investigating coherence and multimedia effects of a technology-mediated collaborative environment. J. Management Inform. Systems 22(3):97–121.
Guo Y, Liu X, Susarla A, Padman R (2023) YouTube videos for public health literacy? A machine learning pipeline to curate COVID-19 videos. Stud. Health Tech. Inform. 310:760–764.
Kang SJ, Lee MS (2019) Assessing of the audiovisual patient educational materials on diabetes care with PEMAT. Public Health Nursing (1931) 36(3):379–387.
Kim J (2012) The institutionalization of YouTube: From user-generated content to professionally generated content. Media Culture Soc. 34(1):53–67.
Kim J, Koh J, Kim Y, Choi J, Hwang Y, Choi JW (2018) Robust deep multi-modal learning based on gated information fusion network. Jawahar CV, Li H, Mori G, Schindler K, eds. Proc. Asian Conf. Comput. Vision (Springer, Cham, Switzerland), 90–106.
Kutner M, Greenburg E, Jin Y, Paulsen C (2006) The health literacy of America’s adults: Results from the 2003 National Assessment of Adult Literacy. NCES 2006-483. National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education, Washington, DC.
Li LH, Yatskar M, Yin D, Hsieh CJ, Chang KW (2020) What does BERT with vision look at? Jurafsky D, Chai J, Schluter N, Tetreault J, eds. Proc. 58th Ann. Meeting Assoc. Comput. Linguistics (Association for Computational Linguistics), 5265–5275.
Li Y, Zhang J, Cheng Y, Huang K, Tan T (2017) Semantics-guided multi-level RGB-D feature fusion for indoor semantic segmentation. Proc. IEEE Internat. Conf. Image Processing (IEEE, Piscataway, NJ), 1262–1266.
Liu X, Susarla A, Padman R (2025) Promoting health literacy with human-in-the-loop video understandability classification of YouTube videos: Development and evaluation study. J. Medical Internet Res. 27:e56080.
Liu X, Zhang B, Susarla A, Padman R (2020) Go to YouTube and call me in the morning: Use of social media for chronic conditions. MIS Quart. 44(1):257–284.
Ma Y, Xu G, Sun X, Yan M, Zhang J, Ji R (2022) X-CLIP: End-to-end multi-grained contrastive learning for video-text retrieval. Magalhães J, Del Bimbo A, Satoh S, Sebe N, Alameda-Pineda X, Jin Q, Oria V, Toni L, eds. Proc. 30th ACM Internat. Conf. Multimedia (Association for Computing Machinery, New York), 638–647.
Mayer RE (2005) Cognitive theory of multimedia learning. Cambridge Handbook Multimedia Learn. 41(1):31–48.
McGloin AF, Eslami S (2015) Digital and social media opportunities for dietary behaviour change. Proc. Nutrition Soc. 74(2):139–148.
Mohamed F, Shoufan A (2024) Users’ experience with health-related content on YouTube: An exploratory study. BMC Public Health 24(1):86.
Notredame CE, Grandgenèvre P, Pauwels N, Morgiève M, Wathelet M, Vaiva G, Séguin M (2018) Leveraging the web and social media to promote access to care among suicidal individuals. Frontiers Psych. 9:1338.
Oquab M, Darcet T, Moutakanni T, Vo H, Szafraniec M, Khalidov V, Fernandez P, et al. (2024) DINOv2: Learning robust visual features without supervision. Trans. Machine Learn. Res. (OpenReview.net).
Patil U, Kostareva U, Hadley M, Manganello JA, Okan O, Dadaczynski K, Massey PM, Agner J, Sentell T (2021) Health literacy, digital health literacy, and COVID-19 pandemic attitudes and behaviors in US college students: Implications for interventions. Internat. J. Environment. Res. Public Health 18(6):3301.
Pothugunta KP (2025) Enhancing digital health education: AI-based assessment of actionable guidance and inclusivity on digital platforms. PhD thesis, Michigan State University, Ann Arbor.
Pothugunta K, Liu X, Susarla A, Padman R (2026) CAAC: Coattentive actionability classification for assessing patient education videos. https://doi.org/10.1287/ijoc.2023.0493.cd, https://github.com/INFORMSJoC/2023.0493.
Rau MA (2020) Comparing multiple theories about learning with physical and virtual representations: Conflicting or complementary effects? Ed. Psych. Rev. 32(2):297–325.
Shoemaker SJ, Wolf MS, Brach C (2014) Development of the patient education materials assessment tool (PEMAT): A new measure of understandability and actionability for print and audiovisual patient information. Patient Ed. Counseling 96(3):395–403.
Song Y, Xu X, Dutta K, Li Z (2024) Improving answer quality using image-text coherence on social Q&A sites. Decision Support Systems 180:114191.
Sun C, Myers A, Vondrick C, Murphy K, Schmid C (2019) VideoBERT: A joint model for video and language representation learning. Proc. IEEE/CVF Internat. Conf. Comput. Vision (IEEE Computer Society, Conference Publishing Services, Los Alamitos, CA), 7464–7473.
Tong Z, Song Y, Wang J, Wang L (2022) VideoMAE: Masked autoencoders are data-efficient learners for self-supervised video pre-training. Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A, eds. Advances in Neural Information Processing Systems, vol. 35 (Curran Associates, Red Hook, NY), 10078–10093.
U.S. Department of Health and Human Services (2010) National Action Plan to Improve Health Literacy (Office of Disease Prevention and Health Promotion, Washington, DC).
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, et al. (2017) Attention is all you need. Guyon I, von Luxburg U, Bengio S, Wallach H, Fergus R, Vishwanathan SVN, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 30 (Curran Associates, Red Hook, NY), 5998–6008.
Vishnevetsky J, Walters CB, Tan KS (2018) Interrater reliability of the patient education materials assessment tool (PEMAT). Patient Ed. Counseling 101(3):490–496.
Wang J, Antonenko PD (2017) Instructor presence in instructional video: Effects on visual attention, recall, and perceived learning. Comput. Human Behav. 71:79–89.
Yerramilli S, Tamarapalli JS, Francis J, Nyberg E (2024) Attribution regularization for multimodal paradigms. Preprint, submitted April 2, https://arxiv.org/abs/2404.02359.
Yu Z, Yu J, Cui Y, Tao D, Tian Q (2019) Deep modular co-attention networks for visual question answering. Proc. IEEE/CVF Conf. Comput. Vision Pattern Recognition (IEEE Computer Society, Conference Publishing Services, Los Alamitos, CA), 6281–6290.
Zeng Z, Cao J, Weng N, Jiang G, Rao Y, Xu Y (2021) Softmax pooling for super visual semantic embedding. Proc. IEEE 12th Ann. Inform. Tech. Electronics Mobile Comm. Conf. (IEEE, Piscataway, NJ), 0258–0265.

INFORMS Journal on Computing, Articles in Advance

Received: December 27, 2023
Accepted: April 11, 2026
Published Online: June 9, 2026

Cite as

Krishna Pothugunta, Xiao Liu, Anjana Susarla, Rema Padman (2026) CAAC: Co-attentive Actionability Classification for Assessing Patient Education Videos. INFORMS Journal on Computing 0(0).

https://doi.org/10.1287/ijoc.2023.0493
