Supervised Multimodal Fission Learning
Supplemental Material
Software and Data: ijds.2024.0059.sm1.zip
Description of Software and Data
The code and data in the zip file referenced above are a snapshot of the software and data that were used in the research reported in the paper "Supervised Multimodal Fission Learning" by Lingchao Mao, Qi Wang, Yi Su, Fleming Lure, Catherine D Chong, Todd J Schwedt, and Jing Li. This repository is also available via Zenodo, GitHub, or others.
The goal of this repository is to replicate the numerical experiments in the paper.
Computer and Software Environment
The following describes the computer hardware and software environment in which the authors produced the results reported in the paper.
- Python 3.8+
- R 3.6+ (optional, for JIVE and SLIDE R package integration)
Dependencies
- NumPy 1.21.0
- pandas 1.3.0
- SciPy 1.7.0
- scikit-learn 1.0.0
- imbalanced-learn 0.8.0
- Matplotlib 3.4.0
- NetworkX 2.6.0
- tqdm 4.62.0
- rpy2 3.4.0 (optional - requires R installation)
- jupyter 1.0.0
- ipykernel 6.0.0
- json5 0.9.0
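As a quick sanity check before running the notebooks, the installed packages can be compared against the versions listed above. The following is a minimal sketch, not part of the repository. It assumes the listed versions are minimums (requirements.txt remains the authoritative pin) and that the PyPI distribution names are as written in the `REQUIRED` dictionary; the helper names `parse_version` and `check` are hypothetical.

```python
import re
from importlib import metadata

# Versions copied from the dependency list above, treated here as minimums
# (an assumption). Keys are the assumed PyPI distribution names.
REQUIRED = {
    "numpy": "1.21.0",
    "pandas": "1.3.0",
    "scipy": "1.7.0",
    "scikit-learn": "1.0.0",
    "imbalanced-learn": "0.8.0",
    "matplotlib": "3.4.0",
    "networkx": "2.6.0",
    "tqdm": "4.62.0",
}


def parse_version(text):
    """Turn '1.21.0' into (1, 21, 0), keeping only leading digits of each piece."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at a piece with no leading digits, e.g. 'dev0'
        parts.append(int(match.group()))
    while len(parts) < 3:
        parts.append(0)  # pad so '2.0' compares like '2.0.0'
    return tuple(parts)


def check(required, installed_version=metadata.version):
    """Return {name: (installed version or None, meets minimum?)}."""
    report = {}
    for name, minimum in required.items():
        try:
            found = installed_version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (found, parse_version(found) >= parse_version(minimum))
    return report


for name, (found, ok) in check(REQUIRED).items():
    status = "ok" if ok else "missing or older than listed"
    print(f"{name}: {found} ({status})")
```

The optional packages (rpy2 and the R baselines) are left out of `REQUIRED` so that the check does not fail on machines without R.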
Installation
- Step 1. Clone the repository:
git clone https://github.com/yourusername/MMFL.git
cd MMFL
- Step 2. Install Python dependencies:
pip install -r requirements.txt
- Step 3. (Optional) Install R packages for baseline comparisons with JIVE/SJIVE/SLIDE:
# In R console or R command line
install.packages("devtools")
devtools::install_github("irinagain/SLIDE")
devtools::install_github("lockEF/r.jive")
File Structure
MMFL/
├── models/ # Core model implementations
│ ├── MMFL.py # Main MMFL algorithm with rank selection
│ ├── MADDi.py # Multi-modal Attention-based Deep Learning
│ ├── IMLS.py # Incomplete Multi-modality Latent Space
│ └── stagewise.py # Stagewise deep learning models
├── utils/ # Utility functions
│ ├── train.py # Model training and evaluation functions
│ ├── metrics.py # Evaluation metrics (AUC, accuracy, etc.)
│ ├── prepare_dataset.py # Data preprocessing and splitting
│ ├── generate_simulation.py # Synthetic data generation
│ ├── visualization.py # Plotting utilities
│ ├── oversampling.py # SMOTE oversampling for imbalanced data
│ ├── rank_selection.py # Rank selection utilities
│ └── compare_auc_delong_xu.py # Statistical comparison methods
├── experiments/ # Experimental notebooks
│ ├── case_study_adni.ipynb # ADNI dataset experiments
│ ├── case_study_headache.ipynb # Headache dataset experiments
│ └── simulation_study.ipynb # Simulation studies
├── preprocessing/ # Data preprocessing scripts
├── data/ # Data files
│ ├── ADNI_dataset.csv # ADNI dataset
│ ├── ADNI_SNP_fisher_nature_p0.0005.csv # SNP data
│ └── headache_*.csv # Headache study data
└── results/ # Experimental results
Reproducibility Workflow
To reproduce the results in Tables 2, 3, and 4 in Section 4
- Data File: Simulation data are generated by the code file listed below
- Code File: experiments/simulation_study.ipynb
- Output: The values for Tables 2, 3, and 4 are printed in the notebook and exported to results/simA_*.json files
- Run Time at the Above-Specified Computer Conditions: 15 minutes
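The notebooks can be opened and run interactively in Jupyter. For unattended runs, one option is to execute a notebook from the command line with `jupyter nbconvert`. The sketch below only illustrates that option; the repository does not state how the authors ran the notebooks, and the helper names `nbconvert_command` and `run_notebook` and the timeout value are assumptions.

```python
import subprocess


def nbconvert_command(notebook, timeout_s=3600):
    """Build the command that executes a notebook and saves the outputs in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",   # run every cell
        "--inplace",   # overwrite the notebook file with the executed version
        f"--ExecutePreprocessor.timeout={timeout_s}",  # per-cell limit in seconds
        notebook,
    ]


def run_notebook(notebook, timeout_s=3600):
    """Run the notebook; raises CalledProcessError if any cell fails."""
    subprocess.run(nbconvert_command(notebook, timeout_s), check=True)


# Example (not executed here), run from the repository root:
# run_notebook("experiments/simulation_study.ipynb")
print(" ".join(nbconvert_command("experiments/simulation_study.ipynb")))
```

Because `--inplace` overwrites the notebook, keep a copy of the original file if the stored cell outputs matter.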
To reproduce the results in Table 5 and Figure 1
- Data File: data/ADNI_dataset.csv
- Code File: experiments/case_study_adni.ipynb
- Output: The values for Table 5 are printed in the notebook and exported to results/case_*.json files
- Run Time at the Above-Specified Computer Conditions: 25 minutes
To reproduce the results in Table 6
- Data Files:
data/headache_metadata.csv, data/headache_questionnaire.csv, data/headache_mri.csv, data/headache_t2star.csv
- Code File: experiments/case_study_headache.ipynb
- Output: The values for Table 6 are printed in the notebook and exported to results/case2_mri-t2star-questionnaire_wcovariates_*.json files
- Run Time at the Above-Specified Computer Conditions: 25 seconds
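After the notebooks have run, the exported JSON files can be collected for inspection. The sketch below loads every file matching the three file-name patterns named in this section. The internal structure of the JSON files is not documented here, so the code only reports each file's top-level type; the helper name `load_results` is hypothetical.

```python
import glob
import json
import os

# File-name patterns taken from the Output entries above.
PATTERNS = {
    "Tables 2, 3, 4": "simA_*.json",
    "Table 5": "case_*.json",
    "Table 6": "case2_mri-t2star-questionnaire_wcovariates_*.json",
}


def load_results(results_dir, pattern):
    """Map each matching file name to its parsed JSON content."""
    loaded = {}
    for path in sorted(glob.glob(os.path.join(results_dir, pattern))):
        with open(path, "r", encoding="utf-8") as handle:
            loaded[os.path.basename(path)] = json.load(handle)
    return loaded


# Run from the repository root; prints nothing per file if results/ is empty.
for label, pattern in PATTERNS.items():
    results = load_results("results", pattern)
    print(f"{label}: {len(results)} file(s) matching {pattern}")
    for name, content in results.items():
        print(f"  {name}: top-level {type(content).__name__}")
```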
Note
All data files are in the data folder. Running the simulation code overwrites previously saved simulation results. The code saves figures to the results folder. The data used to produce our results are also provided in the data_backup folder, so they remain available if files in the data folder are overwritten when the simulation code is run.
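If files in the data folder have been overwritten, they can be copied back from data_backup. The following is a minimal sketch that assumes data_backup mirrors the layout of data; that layout is not documented here, and the helper name `restore_data` is hypothetical.

```python
import shutil
from pathlib import Path


def restore_data(backup_dir="data_backup", data_dir="data"):
    """Copy every file under backup_dir into data_dir, overwriting existing files."""
    backup, target = Path(backup_dir), Path(data_dir)
    restored = []
    for src in sorted(backup.rglob("*")):
        if src.is_file():
            dst = target / src.relative_to(backup)
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also preserves file timestamps
            restored.append(str(dst))
    return restored


# Example (not executed here), run from the repository root:
# for path in restore_data():
#     print("restored", path)
```

The function is not called automatically because it overwrites files in the target folder.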
Ongoing Development
A Python package named distclust has been developed to perform agglomerative clustering on empirical multivariate distributions and to support further analysis. More information about this package can be found on GitHub.
Cite
To cite the contents of this repository, please cite both the paper and this repository using their respective DOIs.
Article: https://doi.org/10.1287/ijds.2024.0056
Software and Data Repository: https://doi.org/10.1287/ijds.2024.0056.cd
License
Copyright (c) 2025 Lingchao Mao, Qi Wang, Yi Su, Fleming Lure, Catherine D. Chong, Todd J. Schwedt, Jing Li
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

