Ensemble Computational Pipelines for Robust Machine Learning with Applications in Manufacturing

Published Online: https://doi.org/10.1287/ijds.2024.0052

Supplemental Material

Online Appendix: ijds.2024.0052.sm1.pdf

Software and Data: ijds.2024.0052.cd.zip


Description of Software and Data

The code and data in the zip file referenced above are a snapshot of the software and data that were used in the research reported in the paper "Ensemble Computational Pipelines for Robust Machine Learning with Application in Manufacturing" by Yixin Chen, Xiaoyu Chen, Ran Jin, and Meimei Liu. This repository is also available via CodeOcean.

The goal of this repository is to replicate the numerical experiments in the paper.

Computer and Software Environment

The following describes the computer hardware and software environment in which the authors produced the results reported in the paper. All experiments were executed inside a Docker container built from the base image


registry.codeocean.com/codeocean/matlab:2023b-ubuntu22.04

which provides:

  • Operating system: Ubuntu 22.04 (64-bit, x86_64).
  • MATLAB: R2023b (The MathWorks, Inc.) with a valid license.
  • Python: system Python 3 as provided by Ubuntu 22.04, with the scientific Python stack listed below.

Dependencies

The code in this repository requires the following dependencies. Each version number is the version of the package with which the code was tested.

  • MATLAB
    • MATLAB R2023b (for the BPMF benchmark code in code/bpmf/).
  • Python and core tools
    • Python 3 (Ubuntu 22.04 default)
    • pip, setuptools, wheel
  • Python packages (tested versions)
    • cvxopt==1.3.2
    • matplotlib==3.9.2
    • scipy==1.14.1
    • tpot==0.12.2
    • jupyterlab==4.2.5
    • scikit-learn==1.6.1
    • torch==2.6.0
    • torchaudio==2.6.0
    • torchvision==0.21.0
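
The pinned versions above can be compared against the active Python environment before running anything. The following is a minimal sketch using only the standard library; the function name `check_versions` and the `PINNED` table are illustrative and are not part of the repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the dependency list above.
PINNED = {
    "cvxopt": "1.3.2",
    "matplotlib": "3.9.2",
    "scipy": "1.14.1",
    "tpot": "0.12.2",
    "jupyterlab": "4.2.5",
    "scikit-learn": "1.6.1",
    "torch": "2.6.0",
    "torchaudio": "2.6.0",
    "torchvision": "0.21.0",
}


def check_versions(pins):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            # Drop a local build suffix such as "+cpu" before comparing.
            installed = version(name).split("+")[0]
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    problems = check_versions(PINNED)
    for name, (expected, installed) in problems.items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
    if not problems:
        print("all pinned packages match")
```

A mismatch does not necessarily prevent the code from running; it only indicates that the environment differs from the tested one.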

Installation

I. Folder structure

After unzipping the artifact, the main folders are:
  • code/ – all source code:
    • we-blsm.py: main WE-BLSM algorithm and result saving.
    • get_bpmf_rating.py: converts performance matrices to rating triplets for BPMF/NCF.
    • bpmf/: MATLAB files for the BPMF benchmark.
    • ncf.py: Neural Collaborative Filtering benchmark.
    • main.py: collects results and reproduces figures.
    • utils.py, plot.py, config.py: utilities, plotting, and configuration.
  • data/ – intermediate datasets:
    • fdm_boot_results.json, tec_boot_results.json, ajp_boot_results.json
  • environment/ – Dockerfile and build context for the computational environment.
  • results/ – created at runtime to store all output files and figures.

II. Recommended Installation: Docker-based reproduction (exact environment)

  • Install prerequisites
    • Install Docker Community Edition (CE).
    • Ensure you have access to a valid MATLAB license file (e.g., license.lic).
  • Build the Docker image

    From the root of the unzipped project:

cd environment
docker build . --tag dd626b46-5c2d-4707-a47a-fae9dd40f1c9
cd ..

  • Run the capsule

    Ensure you are in the project root (containing code/, data/, environment/). Then run:

mkdir -p results
docker run --platform linux/amd64 --rm \
  --workdir /code \
  --mac-address=YOUR_MAC_ADDRESS \
  --volume "$PWD/license.lic":/MATLAB/licenses/network.lic \
  --volume "$PWD/data":/data \
  --volume "$PWD/code":/code \
  --volume "$PWD/results":/results \
  dd626b46-5c2d-4707-a47a-fae9dd40f1c9 bash run

    Replace YOUR_MAC_ADDRESS with your machine’s MAC address if your MATLAB license setup requires it; otherwise omit the --mac-address line. The run script inside the container calls the relevant Python and MATLAB code to reproduce all results; outputs are written to results/.

III. Optional: Native (non-Docker) Python setup

If you prefer not to use Docker, you can recreate the Python part of the environment on your own machine:
  • Install Python 3 and MATLAB R2023b.
  • From the project root, create and activate a Python environment (e.g., venv or conda), then run:
    pip install \
      cvxopt==1.3.2 \
      matplotlib==3.9.2 \
      scipy==1.14.1 \
      tpot==0.12.2 \
      jupyterlab==4.2.5 \
      scikit-learn==1.6.1 \
      torch==2.6.0 \
      torchaudio==2.6.0 \
      torchvision==0.21.0
    
    
  • Run the Python and MATLAB scripts manually, following the logic in run and main.py (e.g., first generate/load performance matrices and benchmark results, then call we-blsm.py and finally main.py to reproduce figures).

Reproducibility Workflow

To reproduce the results in Figure 5(a), Figure 6, and Figure 7 in the paper
  • Data File: fdm_boot_results.json
  • Code File: run.sh Section 1
  • Output: Four plots: Figure 5a (heatmap_score_fdm_m20.png), Figure 6a (bar3d_fdm_m20.png), Figure 6b (ptp_ranking_fdm_m20.png), and Figure 7 (bar_box_ranking_fdm_m20.png).
  • Run Time at the Above-Specified Computer Conditions: 3 minutes
To reproduce the results in Figure 5(b) and Figure 8 in the paper
  • Data File: tec_boot_results.json
  • Code File: run.sh Section 2
  • Output: Four plots: Figure 5b (heatmap_score_tec_m20.png), Figure 8 (bar_box_ranking_tec_m20.png), and two other plots for the Tecator datasets: bar3d_tec_m20.png and ptp_ranking_tec_m20.png.
  • Run Time at the Above-Specified Computer Conditions: 3 minutes
To reproduce the results in Figure 5(c) and Figure 9 in the paper
  • Data File: ajp_boot_results.json
  • Code File: run.sh Section 3
  • Output: Four plots: Figure 5c (heatmap_score_ajp_m20.png), Figure 9 (bar_box_ranking_ajp_m20.png), and two other plots for the AJP datasets: bar3d_ajp_m20.png and ptp_ranking_ajp_m20.png.
  • Run Time at the Above-Specified Computer Conditions: 2 minutes
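
After a run, one way to confirm that every figure listed above was produced is to check the results/ folder for the expected file names. The sketch below assumes that all plots are written directly into results/ and that every application follows the same `<plot>_<application>_m20.png` naming pattern; the helper names are illustrative and not part of the repository.

```python
from pathlib import Path

# Application tags and plot types taken from the output lists above.
APPLICATIONS = ("fdm", "tec", "ajp")
PLOT_KINDS = ("heatmap_score", "bar3d", "ptp_ranking", "bar_box_ranking")


def expected_outputs():
    """File names implied by the naming pattern (an assumption, see lead-in)."""
    return [f"{kind}_{app}_m20.png" for app in APPLICATIONS for kind in PLOT_KINDS]


def missing_outputs(results_dir):
    """Return the expected file names that are absent from results_dir."""
    root = Path(results_dir)
    return [name for name in expected_outputs() if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_outputs("results")
    if missing:
        print("missing figures:", ", ".join(missing))
    else:
        print("all expected figures found")
```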

Notes on Datasets and Data Preprocessing:

We describe the data we share and how we established the prediction performance matrix. Let N denote the number of datasets and M the number of pipelines (see the paper for details of the datasets and pipelines). To establish the prediction performance matrix, we randomly split each dataset into a training set and a testing set, then train each candidate pipeline on the training set and evaluate it on the testing set of each dataset. The split is fixed across candidate pipelines: once a dataset is split, every pipeline is trained and tested on the same training and testing sets.
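
The splitting protocol described above can be sketched in plain Python. This is an illustration, not the authors' implementation: pipelines are represented by hypothetical fit functions that return a predictor, the 80/20 split and the seeds are arbitrary, and the NRMSE is normalized by the range of the test responses, which is one common convention and may differ from the definition used in the paper.

```python
import math
import random


def split_indices(n_samples, test_fraction, seed):
    """Shuffle the sample indices once and return (train, test) index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def nrmse(y_true, y_pred):
    """RMSE divided by the range of y_true (assumed normalization)."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    spread = max(y_true) - min(y_true)
    return math.sqrt(mse) / spread if spread > 0 else float("nan")


def performance_matrix(datasets, pipelines, test_fraction=0.2, seed=0):
    """Return an N x M nested list; entry (i, j) scores pipeline j on dataset i."""
    matrix = []
    for i, (X, y) in enumerate(datasets):
        # One split per dataset, reused by every pipeline.
        train, test = split_indices(len(y), test_fraction, seed + i)
        X_tr, y_tr = [X[k] for k in train], [y[k] for k in train]
        X_te, y_te = [X[k] for k in test], [y[k] for k in test]
        row = []
        for fit in pipelines:
            predict = fit(X_tr, y_tr)
            row.append(nrmse(y_te, [predict(x) for x in X_te]))
        matrix.append(row)
    return matrix


def fit_mean(X_train, y_train):
    """Toy pipeline: always predict the training mean."""
    mean = sum(y_train) / len(y_train)
    return lambda x: mean


def fit_slope(X_train, y_train):
    """Toy pipeline: least-squares line through the origin on a scalar feature."""
    slope = sum(x * y for x, y in zip(X_train, y_train)) / sum(x * x for x in X_train)
    return lambda x: slope * x


# Two toy datasets with a single scalar feature each.
xs = list(range(1, 11))
toy_datasets = [(xs, [2 * x for x in xs]), (xs, [x * x for x in xs])]
toy_matrix = performance_matrix(toy_datasets, [fit_mean, fit_slope])
for row in toy_matrix:
    print([round(value, 3) for value in row])
```

Because the split is drawn before the loop over pipelines, differences within a row of the matrix reflect the pipelines rather than the random split.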

We share the intermediate datasets (`fdm_boot_results.json`, `tec_boot_results.json`, and `ajp_boot_results.json`) under the `data/` folder. Each JSON file consists of key-value pairs: each key is a unique identifier, and each value holds the data associated with that key. Each file contains the following keys:

  • nrmse: this key contains the evaluation performance matrix. In our FDM and Tecator applications, it contains the Normalized Root Mean Squared Error (NRMSE). In the AJP application, it contains the classification accuracy. The $(i,j)$ entry of the evaluation performance matrix is the NRMSE or the classification accuracy of pipeline $j$ evaluated on dataset $i$. This is an $N\times M$ array (matrix) for $N$ datasets and $M$ pipelines.
  • y (optional): this key contains the true responses in the testing sets of all datasets. In our FDM and Tecator applications, this is an array with shape `[15, 72, 12]`, where 15, 72, and 12 are the number of datasets, the number of pipelines, and the size of the testing set in each dataset, respectively.
  • yhat (optional): this key contains the predicted responses of all pipelines on the testing sets of all datasets. In our FDM application, this is an array with shape `[15, 72, 12]`, where 15, 72, and 12 are the number of datasets, the number of pipelines, and the size of the testing set in each dataset, respectively.
  • pipenames (optional): this key contains the names of all candidate pipelines.
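
To inspect one of the shared files, a short standard-library script is enough. The sketch below assumes that the four keys sit at the top level of the JSON object and that array values are stored as nested lists; `shape_of` and `summarize` are illustrative helper names, not functions from the repository.

```python
import json
from pathlib import Path


def shape_of(nested):
    """Shape of a rectangular nested list, read along the first element."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        if not nested:
            break
        nested = nested[0]
    return shape


def summarize(path):
    """Map each documented key found in the file to the shape of its value."""
    with open(path, encoding="utf-8") as handle:
        results = json.load(handle)
    return {
        key: shape_of(results[key])
        for key in ("nrmse", "y", "yhat", "pipenames")
        if key in results
    }


if __name__ == "__main__":
    data_file = Path("data/fdm_boot_results.json")
    if data_file.exists():
        print(summarize(data_file))
```

Under these assumptions, the first two dimensions reported for `y` and `yhat` should agree with the shape reported for `nrmse`, and the length of `pipenames` should equal the number of pipelines.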

In practice, only the incomplete prediction performance matrix (key `nrmse` in each JSON file) is needed to obtain the ensemble weights of the new WE-BLSM pipeline. However, obtaining the predictions of the WE-BLSM pipeline and evaluating its performance also requires the true and predicted responses. To apply our method to your own datasets and pipelines, generate a prediction performance matrix following Section 3.2 of the paper and create a Python dictionary with the same key-value pairs as ours.
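
As an illustration of the last step, the sketch below assembles a dictionary with the documented keys and writes it to a JSON file. It assumes a complete performance matrix stored as nested lists; how missing entries of an incomplete matrix should be encoded is not specified here, so the sketch does not attempt it. The function names and the output file name are hypothetical.

```python
import json


def build_results(nrmse, pipenames=None, y=None, yhat=None):
    """Assemble a dictionary with the keys nrmse, y, yhat, and pipenames."""
    if not nrmse:
        raise ValueError("the performance matrix needs at least one dataset row")
    n_pipelines = len(nrmse[0])
    if any(len(row) != n_pipelines for row in nrmse):
        raise ValueError("every dataset row must have one entry per pipeline")
    results = {"nrmse": nrmse}
    if pipenames is not None:
        if len(pipenames) != n_pipelines:
            raise ValueError("pipenames must have one name per pipeline")
        results["pipenames"] = list(pipenames)
    # The optional response arrays are stored unchanged.
    if y is not None:
        results["y"] = y
    if yhat is not None:
        results["yhat"] = yhat
    return results


def save_results(results, path):
    """Write the dictionary as JSON so it can be loaded like the shared files."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(results, handle)


# Toy example: 2 datasets, 3 hypothetical pipelines.
toy_results = build_results(
    [[0.12, 0.30, 0.25], [0.40, 0.18, 0.22]],
    pipenames=["pipe_a", "pipe_b", "pipe_c"],
)
print(sorted(toy_results))
```

Calling `save_results(toy_results, "my_boot_results.json")` would then write the file; the optional `y` and `yhat` arrays are only needed when the ensemble predictions are to be evaluated.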

Cite

To cite the contents of this repository, please cite both the paper and this repository using their respective DOIs.

Article: https://doi.org/10.1287/ijds.2024.0052
Software and Data Repository: https://doi.org/10.1287/ijds.2024.0052.cd

License

Copyright (c) (Chen, Chen, Jin, and Liu)

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
