Dimension Reduction in Importance Sampling: Balancing Concentration and Exploration Through Variable Selection

Published Online: https://doi.org/10.1287/ijds.2025.0066

Supplemental Material

Online Appendix: ijds.2025.0066.sm1.pdf

Software and Data: ijds.2025.0066.cd.zip


Description of Software and Data

The code and data in the zip file referenced above are a snapshot of the software and data that were used in the research reported in the paper "Dimension Reduction in Importance Sampling: Balancing Concentration and Exploration through Variable Selection" by Chenfei Li, Jaeshin Park, and Eunshin Byon. This repository is also available via GitHub.

The goal of this repository is to replicate the numerical experiments in the paper.

Computer and Software Environment

The following describes the computer hardware and software environment in which the authors produced the results reported in the paper.

The numerical experiments were conducted on a Windows-based laboratory server equipped with Intel Xeon E5-2697 v3 CPUs (28 cores, 56 logical processors) and 128 GB of RAM. The experiments were implemented in R (version 4.2.2) and Python (version 3.11). Parallel computing was used when applicable.

Dependencies

The code in this repository requires the following dependencies. The dependency version number corresponds to the version of the package with which the code was tested.

R (version 4.2.2) with the following packages:

  • doParallel 1.0.17
  • truncnorm 1.0.9
  • splines 4.2.2
  • rootSolve 1.8.2.3
  • rmutil 1.1.10
  • pracma 2.4.2
  • mvtnorm 1.2.1
  • cubature 2.0.4.6
  • caret 6.0.94
  • stats 4.2.2
  • lhs 1.1.6
  • LHD 1.4.1
  • R.utils 2.13.0
  • crch 1.1.2
  • loo 2.8.0
  • GauPro 0.2.11
  • DiceKriging 1.6.0
  • kernlab 0.9.32
  • SAM 1.1.3
  • glmnet 4.1.8
  • randomForest 4.7.1.1

Installation

The code and data are provided as a single compressed (.zip) archive. After downloading and extracting the archive, the directory structure is organized as follows:

  • Codes/R_methods: R source code for numerical experiments and analysis of all methods except ECL.
  • Codes/Python_ECL: Python source code for numerical experiments and analysis of ECL.
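Before starting the long runs, it can help to confirm that the archive was extracted completely. The following is a minimal sketch, not part of the repository: the helper name `missing_paths` is hypothetical, the file names are the ones this README lists in its reproducibility workflow, and it assumes the `Codes` folder sits directly under the extraction directory you pass in.

```python
from pathlib import Path

# Relative paths named in this README; the archive may contain more files.
EXPECTED_PATHS = [
    "Codes/R_methods/main.R",
    "Codes/Python_ECL/ecl_example1.ipynb",
    "Codes/Python_ECL/ecl_example2.ipynb",
    "Codes/Python_ECL/ecl_example3.ipynb",
]


def missing_paths(root, expected=EXPECTED_PATHS):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # "." is a placeholder: pass the folder into which the archive was extracted.
    missing = missing_paths(".")
    print("All expected files found." if not missing else f"Missing: {missing}")
```

If the archive unpacks into an extra top-level folder, pass that folder as `root` instead of the current directory.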

The required R packages can be installed from CRAN using:


# "splines" and "stats" ship with base R and are not installed from CRAN.
install.packages(c("doParallel", "truncnorm", "rootSolve", "rmutil",
                   "pracma", "mvtnorm", "cubature", "caret", "lhs",
                   "LHD", "R.utils", "crch", "loo", "GauPro", "DiceKriging",
                   "kernlab", "SAM", "glmnet", "randomForest"))

The required Python packages can be installed using pip, for example:


pip install numpy pandas pyDOE scipy scikit-learn matplotlib dill 
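After installation, a quick import check can catch a missing package before a multi-day notebook run. This is a small sketch rather than anything shipped with the repository: the helper `missing_packages` is hypothetical, and the mapping assumes the usual import names for these distributions (in particular, `scikit-learn` is imported as `sklearn`).

```python
import importlib.util

# pip distribution name -> module name used in "import ...".
PIP_TO_IMPORT = {
    "numpy": "numpy",
    "pandas": "pandas",
    "pyDOE": "pyDOE",
    "scipy": "scipy",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "dill": "dill",
}


def missing_packages(pip_to_import=PIP_TO_IMPORT):
    """Return the pip names whose modules cannot be located for import."""
    return [
        pip_name
        for pip_name, module_name in pip_to_import.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    print("All packages importable." if not missing else f"Missing: {missing}")
```

The check only confirms that each module can be found; it does not verify versions, which this README does not specify for the Python packages.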

Reproducibility Workflow

To reproduce the results in Table 1 (all methods except ECL), Examples 1–4:
  • Data File: Simulated internally by the code (no external data file)
  • Code File: Codes/R_methods/main.R
  • Output: POE estimates (mean and standard error) for the IS-VS, IS-CE, IS-Pareto, WAMK-SIS, Lasso, SpAM, RF-RFE, and OptiTreeStrat methods across Examples 1–4
  • Run Time at the Above-Specified Computer Conditions: Around 14 hours

To reproduce the results in Table 1 (ECL method), Example 1:
  • Data File: Simulated internally by the code (no external data file)
  • Code File: Codes/Python_ECL/ecl_example1.ipynb
  • Output: POE estimates (mean and standard error) for the ECL method in Example 1
  • Run Time at the Above-Specified Computer Conditions: Around 2 days

To reproduce the results in Table 1 (ECL method), Example 2:
  • Data File: Simulated internally by the code (no external data file)
  • Code File: Codes/Python_ECL/ecl_example2.ipynb
  • Output: POE estimates (mean and standard error) for the ECL method in Example 2
  • Run Time at the Above-Specified Computer Conditions: Around 2 days

To reproduce the results in Table 1 (ECL method), Example 3:
  • Data File: Simulated internally by the code (no external data file)
  • Code File: Codes/Python_ECL/ecl_example3.ipynb
  • Output: POE estimates (mean and standard error) for the ECL method in Example 3
  • Run Time at the Above-Specified Computer Conditions: Around 2 days
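To make the reported output format concrete, the sketch below shows how a mean and standard error over repeated importance-sampling runs could be computed. It is a toy illustration only, not the code or the examples from the paper. It assumes that POE denotes an exceedance probability, uses a standard normal input with a mean-shifted normal proposal, and computes the standard error as the sample standard deviation across replications divided by the square root of their number. The function names, threshold, sample size, and replication count are all hypothetical.

```python
import math
import random
import statistics


def is_poe_estimate(threshold, n_samples, rng):
    """One importance-sampling estimate of P(X > threshold) for X ~ N(0, 1)."""
    total = 0.0
    for _ in range(n_samples):
        # Draw from the proposal N(threshold, 1), which concentrates on the tail.
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            # Likelihood ratio phi(x) / phi(x - threshold).
            total += math.exp(-threshold * x + 0.5 * threshold ** 2)
    return total / n_samples


def summarize(estimates):
    """Mean and standard error across independent replications."""
    mean = statistics.fmean(estimates)
    std_err = statistics.stdev(estimates) / math.sqrt(len(estimates))
    return mean, std_err


# Toy settings (assumptions): threshold 3, 2000 samples, 20 replications.
rng = random.Random(0)
estimates = [is_poe_estimate(3.0, 2000, rng) for _ in range(20)]
poe_mean, poe_se = summarize(estimates)
print(f"POE mean = {poe_mean:.5f}, standard error = {poe_se:.6f}")
```

For this toy problem the exact value is P(X > 3), roughly 0.00135, so the printed mean can be checked against a known answer; the experiments in the repository involve more complex models where no such closed form is available.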

Note

The implementation of the ECL baseline method is adapted from publicly available reference implementations associated with prior work, with modifications to ensure consistency with the experimental setup used in this paper.

Cite

To cite the contents of this repository, please cite both the paper and this repository using their respective DOIs.

Article: https://doi.org/10.1287/ijds.2025.0066
Software and Data Repository: https://doi.org/10.1287/ijds.2025.0066.cd

License

Copyright (c) 2025 Li, Park, Byon

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
