When Multimodal Interactions Impair Prediction: A Novel Regularized Deep Learning Strategy

Published Online: https://doi.org/10.1287/ijoc.2024.0794

References

  • Aas K, Jullum M, Løland A (2021) Explaining individual predictions when features are dependent: More accurate approximations to Shapley values. Artificial Intelligence 298:103502.
  • Aghasi A, Rai A, Xia Y (2024) A deep learning and image processing pipeline for object characterization in firm operations. INFORMS J. Comput. 36(2):616–634.
  • Baltrusaitis T, Ahuja C, Morency LP (2019) Multimodal machine learning: A survey and taxonomy. IEEE Trans. Pattern Anal. Machine Intelligence 41(2):423–443.
  • Cao B, Sun Y, Zhu P, Hu Q (2023a) Multi-modal gated mixture of local-to-global experts for dynamic image fusion. IEEE/CVF Internat. Conf. Computer Vision (IEEE, New York), 23498–23507.
  • Cao W, Wu Y, Sun Y, Zhang H, Ren J, Gu D, Wang X (2023b) A review on multimodal zero‐shot learning. Data Mining Knowledge Discovery 13(2):e1488.
  • Chen J, Zhang A (2020) HGMF: Heterogeneous graph-based fusion for multimodal data with incompleteness. Proc. 26th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (ACM, New York), 1295–1305.
  • Chen G, Xiao S, Zhang C, Zhao H (2026) When multimodal interactions impair prediction: A novel regularized deep learning strategy. https://doi.org/10.1287/ijoc.2024.0794.cd, https://github.com/INFORMSJoC/2024.0794.
  • Chen F, Ji R, Su J, Cao D, Gao Y (2017) Predicting microblog sentiments via weakly supervised multimodal deep learning. IEEE Trans. Multimedia 20(4):997–1007.
  • Cheng D, Xiang S, Shang C, Zhang Y, Yang F, Zhang L (2020) Spatio-temporal attention-based neural network for credit card fraud detection. Proc. AAAI Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 362–369.
  • Choi J-H, Lee J-S (2019) EmbraceNet: A robust deep learning architecture for multimodal classification. Inform. Fusion 51:259–270.
  • Clara G, Langer S, Schmidt-Hieber J (2024) Dropout regularization versus ℓ2-penalization in the linear model. J. Machine Learn. Res. 25(1):9810–9857.
  • Cogswell M, Ahmed F, Girshick R, Zitnick L, Batra D (2015) Reducing overfitting in deep networks by decorrelating representations. Preprint, submitted November 19, https://arxiv.org/abs/1511.06068.
  • Ding S, Lin L, Wang G, Chao H (2015) Deep feature learning with relative distance comparison for person re-identification. Pattern Recognition 48(10):2993–3003.
  • Dourado IC, Pedronette DCG, da Silva Torres R (2019) Unsupervised graph-based rank aggregation for improved retrieval. Inform. Processing Management 56(4):1260–1279.
  • Du N, Li L, Lu T, Lu X (2020) Prosocial compliance in P2P lending: A natural field experiment. Management Sci. 66(1):315–333.
  • Gao J, Li P, Chen Z, Zhang J (2020) A survey on deep learning for multimodal data fusion. Neural Comput. 32(5):829–864.
  • Ge R, Feng J, Gu B, Zhang P (2017) Predicting and deterring default with social media information in peer-to-peer lending. J. Management Inform. Systems 34(2):401–424.
  • Ghiasi A, Shafahi A, Ardekani R (2023) Improving robustness with adaptive weight decay. Proc. 37th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 79067–79080.
  • Hong D, Gao L, Yokoya N, Yao J, Chanussot J, Du Q, Zhang B (2020) More diverse means better: Multimodal deep learning meets remote-sensing imagery classification. IEEE Trans. Geosci. Remote Sensing 59(5):4340–4354.
  • Hossain MZ, Sohel F, Shiratuddin MF, Laga H (2019) A comprehensive survey of deep learning for image captioning. ACM Comput. Surveys 51(6):1–36.
  • Ioffe S, Szegedy C (2015) Batch normalization: Accelerating deep network training by reducing internal covariate shift. Proc. 32nd Internat. Conf. Machine Learning (JMLR.org), 448–456.
  • Jakulin A, Bratko I (2003) Analyzing attribute dependencies. Lavrac N, Gamberger D, Todorovski L, Blockeel H, eds. Knowledge Discovery in Databases: PKDD 2003 (Springer Nature, London), 229–240.
  • Jiang YG, Wu ZX, Tang JH, Li ZC, Xue XY, Chang SF (2018) Modeling multimodal clues in a hybrid deep learning framework for video classification. IEEE Trans. Multimedia 20(11):3137–3147.
  • Jo Y, Oh AH (2011) Aspect and sentiment unification model for online review analysis. Proc. Fourth ACM Internat. Conf. Web Search Data Mining (ACM, New York), 815–824.
  • Karkehabadi A, Latibari BS, Homayoun H, Sasan A (2024) HLGM: A novel methodology for improving model accuracy using saliency-guided high and low gradient masking. Proc. Fourteenth Internat. Conf. Inform. Sci. Tech. (IEEE, New York), 909–917.
  • Lahat D, Adali T, Jutten C (2015) Multimodal data fusion: An overview of methods, challenges, and prospects. Proc. IEEE 103(9):1449–1477.
  • Lee M, Pavlovic V (2021) Private-shared disentangled multimodal VAE for learning of latent representations. Proc. IEEE/CVF Conf. Comput. Vision Pattern Recognition Workshops (IEEE, New York), 1692–1700.
  • Liu K, Li Y, Xu N, Natarajan P (2018a) Learn to combine modalities in multimodal deep learning. Preprint, submitted May 29, https://arxiv.org/abs/1805.11730.
  • Liu Y, Liu L, Guo YM, Lew MS (2018b) Learning visual and textual representations for multimodal matching and classification. Pattern Recognition 84:51–67.
  • Luo C, Jiang Z, Li X, Yi C, Tucker C (2023) Choosing to discover the unknown: The effects of choice on user attention to online video advertising. Management Sci. 70(10):6983–7003.
  • Mai S, Hu H, Xing S (2020) Modality to modality translation: An adversarial representation learning and graph fusion network for multimodal fusion. Proc. AAAI Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 164–172.
  • Nie F, Huang H, Cai X, Ding CH (2010) Efficient and robust feature selection via joint ℓ2,1-norms minimization. Proc. 24th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 1813–1821.
  • Peng H, Long F, Ding C (2005) Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Machine Intelligence 27(8):1226–1238.
  • Praveen RG, Alam J (2024) Recursive joint cross-modal attention for multimodal fusion in dimensional emotion recognition. 2024 IEEE/CVF Conf. Comput. Vision Pattern Recognition Workshops (IEEE, New York), 4803–4813.
  • Rahim N, El-Sappagh S, Ali S, Muhammad K, Del Ser J, Abuhmed T (2023) Prediction of Alzheimer’s progression based on multimodal deep-learning-based fusion and visual explainability of time-series data. Inform. Fusion 92:363–388.
  • Shi T, Huang S-L (2023) MultiEMO: An attention-based correlation-aware multimodal fusion framework for emotion recognition in conversations. Rogers A, Boyd-Graber J, Okazaki N, eds. Proc. 61st Annual Meeting Assoc. Comput. Linguistics (Association for Computational Linguistics, Stroudsburg, PA), 14752–14766.
  • Shi L, Wang L, Long C, Zhou S, Tang W, Zheng N, Hua G (2023) Representing multimodal behaviors with mean location for pedestrian trajectory prediction. IEEE Trans. Pattern Anal. Machine Intelligence 45(9):11184–11202.
  • Sohn K, Shang WL, Lee H (2014) Improved multimodal deep learning with variation of information. Proc. 28th Internat. Conf. Neural Inform. Processing Systems (MIT Press, Cambridge, MA), 2141–2149.
  • Song W, Shi C, Xiao Z, Duan Z, Xu Y, Zhang M, Tang J (2019) AutoInt: Automatic feature interaction learning via self-attentive neural networks. Proc. 28th ACM Internat. Conf. Inform. Knowledge Management (ACM, New York), 1161–1170.
  • Sun Z, Sarma P, Sethares W, Liang Y (2020) Learning relationships between text, audio, and video via deep canonical correlation for multimodal language analysis. Proc. AAAI Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 8992–8999.
  • Wang Z, Jiang C, Zhao H, Ding Y (2020a) Mining semantic soft factors for credit risk evaluation in peer-to-peer lending. J. Management Inform. Systems 37(1):282–308.
  • Wang Y, Huang W, Sun F, Xu T, Rong Y, Huang J (2020b) Deep multimodal fusion by channel exchanging. Proc. 34th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 4835–4845.
  • Wang L, Wu J, Huang S-L, Zheng L, Xu X, Zhang L, Huang J (2019) An efficient approach to informative feature extraction from multimodal data. Proc. Thirty-Third AAAI Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 5281–5288.
  • Wei Y, Feng R, Wang Z, Hu D (2024) Enhancing multimodal cooperation via sample-level modality valuation. 2024 IEEE/CVF Conf. Comput. Vision Pattern Recognition (IEEE, New York), 27328–27337.
  • Wei Y, Yuan S, Yang R, Shen L, Li Z, Wang L, Chen M (2023) Tackling modality heterogeneity with multi-view calibration network for multimodal sentiment detection. Proc. 61st Annual Meeting Assoc. Comput. Linguistics (Association for Computational Linguistics, Stroudsburg, PA), 5240–5252.
  • Wu L, Long Y, Gao C, Wang Z, Zhang Y (2023) MFIR: Multimodal fusion and inconsistency reasoning for explainable fake news detection. Inform. Fusion 100:101944.
  • Xiao S, Chen Y-J, Tang CS (2022) Customer review provision policies with heterogeneous cluster preferences. Management Sci. 68(7):5025–5048.
  • Xu JJ, Chau M (2018) Cheap talk? The impact of lender-borrower communication on peer-to-peer lending outcomes. J. Management Inform. Systems 35(1):53–85.
  • Xu W, Cao Y, Chen R (2024) A multimodal analytics framework for product sales prediction with the reputation of anchors in live streaming e-commerce. Decision Support Systems 177:114104.
  • Xu N, Mao W, Chen G (2019) Multi-interactive memory network for aspect based multimodal sentiment analysis. Proc. Thirty-Third AAAI Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 371–378.
  • Yu L, Liu H (2004) Efficient feature selection via analysis of relevance and redundancy. J. Machine Learn. Res. 5:1205–1224.
  • Yu H, Qi Z, Jang L, Salakhutdinov R, Morency L-P, Liang PP (2024) MMoE: Enhancing multimodal models with mixtures of multimodal interaction experts. Al-Onaizan Y, Bansal M, Chen Y, eds. Proc. 2024 Conf. Empirical Methods Natural Language Processing (Association for Computational Linguistics, Stroudsburg, PA), 10006–10030.
  • Zeng ZL, Zhang HJ, Zhang R, Yin CX (2015) A novel feature selection method considering feature interaction. Pattern Recognition 48(8):2656–2666.
  • Zhang C, Yang Z, He X, Deng L (2020a) Multimodal intelligence: Representation learning, information fusion, and applications. IEEE J. Selected Topics Signal Processing 14(3):478–493.
  • Zhang Z, Wei X, Zheng X, Li Q, Zeng DD (2022) Detecting product adoption intentions via multiview deep learning. INFORMS J. Comput. 34(1):541–556.
  • Zhang X, Zhang Y, Wang S, Yao Y, Fang B, Philip SY (2018) Improving stock market prediction via heterogeneous information fusion. Knowledge Based Systems 143:236–247.
  • Zhang Y-D, Dong Z, Wang S-H, Yu X, Yao X, Zhou Q, Hu H, et al. (2020b) Advances in multimodal data fusion in neuroimaging: Overview, challenges, and novel orientation. Inform. Fusion 64:149–187.
  • Zhang J, Jiao L, Ma W, Liu F, Liu X, Li L, Chen P, et al. (2023) Transformer based conditional GAN for multimodal image fusion. IEEE Trans. Multimedia 25:8988–9001.
  • Zhao Z, Zhu H, Xue Z, Liu Z, Tian J, Chua MCH, Liu M (2019) An image-text consistency driven multimodal sentiment analysis approach for social media. Inform. Processing Management 56(6):102097.