CAAC: Co-attentive Actionability Classification for Assessing Patient Education Videos

Published Online: https://doi.org/10.1287/ijoc.2023.0493

Pothugunta K, Liu X, Susarla A, Padman R (2026) CAAC: Coattentive actionability classification for assessing patient education videos. https://doi.org/10.1287/ijoc.2023.0493.cd, https://github.com/INFORMSJoC/2023.0493.