Using Operational Data Analytics for Planning Decisions Under Uncertainty

Published Online: https://doi.org/10.1287/ijds.2024.0051

Supplemental Material

Software and Data: ijds.2024.0051.sm1.zip


Description of Software and Data

The software and data in the zip file referenced above are a snapshot of the software and data that were used in the research reported in the paper "Using Operational Data Analytics for Planning Decisions Under Uncertainty" by Natarajan Gautam. This repository is also available on CodeOcean.

The goal of this repository is to replicate the numerical experiments in the paper.

Computer and Software Environment

The following describes the computer hardware and software environment in which the author produced the results reported in the paper.

MacBook Pro (Apple M1 Pro with 16 GB memory, running macOS Sequoia version 15.6.1). The original code was developed and tested in Jupyter Notebook and runs on Python.

Dependencies

The code in this repository requires the following dependencies. The dependency version number corresponds to the version of the package with which the code was tested.

The following (rather standard) Python libraries need to be imported, and must be installed if unavailable:

Installation

The program can be run in one of two ways: directly on Code Ocean, by creating a file called requirements.txt with the appropriate versions of the dependencies described above, or locally, by downloading the various ipynb files. The crucial point is that N_val in NV_like_pastabilities_gam_z_v4.ipynb is set to 10 (that is, 10 example runs); for a full run, it needs to be changed to a value such as 2000 or 5000. All other notebooks can be run as is. Many of them read output Excel files that were obtained by running the full-blown versions. Also, set k_or = 1 if you want an exponential distribution.
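The remark about k_or = 1 can be checked numerically. The sketch below assumes that k_or acts as a gamma shape parameter (an assumption suggested by "gam" in the notebook name, not confirmed by this README) and that θ acts as the scale. Under those assumptions, shape 1 reduces the gamma density to the exponential density with mean θ. The function names are illustrative and do not come from the notebooks.

```python
import math


def gamma_pdf(x, shape, scale):
    """Gamma density with the given shape and scale, for x > 0."""
    return (x ** (shape - 1) * math.exp(-x / scale)) / (math.gamma(shape) * scale ** shape)


def exponential_pdf(x, mean):
    """Exponential density with the given mean, for x >= 0."""
    return math.exp(-x / mean) / mean


theta = 6.0  # value of θ used in the workflow; treating it as a scale is an assumption
k_or = 1.0   # assumed here to be the gamma shape parameter

# With shape 1, the two densities agree at every point checked.
points = [0.5, 1.0, 3.0, 6.0, 12.0]
max_gap = max(abs(gamma_pdf(x, k_or, theta) - exponential_pdf(x, theta)) for x in points)
print(f"largest difference between densities: {max_gap:.2e}")
```

With any other shape value, such as the 1.33 or 1.67 used in some runs, the two densities no longer coincide.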

Reproducibility Workflow

| Which Results to Reproduce | Data File | Code File | Output | Run Time at the Above-Specified Computer Conditions |
| --- | --- | --- | --- | --- |
| Figure 2 in the paper | None, but simulated | NV_like_pastabilities_gam_z_v4.ipynb | Save outputs by running with N = 50, n = 5, h = 4, k_or = 1, p = 10, and θ = 6. Then plot. | 2 minutes |
| Figures 3 and 4 in the paper | None, but simulated | NV_like_pastabilities_gam_z_v4.ipynb | Save outputs by running with n = 6, k_or = 1.33, and θ = 6. Use N_val = 10000. Then plot using Excel. | 5 minutes |
| Tables 2 and 3 | experiments.csv | Take averages in Excel | Results | Less than 1 minute |
| Figure 5 | experiments.csv | EDA_viz_expt_data.ipynb | Figure is an output that can be saved. To produce the experiments.csv file, run NV_like_pastabilities_gam_z_v4.ipynb using N_val = 100000, the choices of p and h given in the paper (these are given in the code as well), k_or = 1, and the choices of N, n, and θ given in the paper. | 1 minute once experiments.csv is available (but that takes a long time) |
| Tables 4 and 5 | full_initial_run_fixed_params_v4.csv | Take averages in Excel | Results | Less than 1 minute |
| Results in section 6.2 | full_initial_run_fixed_params_v4.csv | Data_analysis_v4.ipynb | Figure is an output that can be saved. To produce the full_initial_run_fixed_params_v4.csv file, run NV_like_pastabilities_gam_z_v4.ipynb using N_val = 5000, N = 50, n = 7, h = 6, p = 10, k_or = 1.67, and θ = 6. | 1 minute once full_initial_run_fixed_params_v4.csv is available (but that takes overnight to complete) |
| Raw data for section 6.3 | metro-trips-2025-q1.csv | bike_share.ipynb | Does a cluster analysis; we select cluster 4 | 2 minutes |
| ETA for section 6.3 | cluster4_bikes.csv | Bike_data_code.ipynb | Uses cluster 4 and does a preliminary data analysis | Less than 1 minute |
| Comparison of methods used in section 6.3 | cluster4_bikes.csv | Bike_data_pred_v2.ipynb | Can save df_full as bike_data_results_v3.csv | 10 minutes |
| Table 6 | bike_data_results_v3.csv | Take averages in Excel | Results | Less than 1 minute |
| Figures 6, 7, and 8 | bike_data_results_v3.csv | Data_analysis_bike_v2.ipynb | Figures shown in the output | 1 minute |
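Several steps in the workflow take averages in Excel. For readers who prefer to stay in Python, the sketch below shows one possible equivalent using only the standard library. It is not part of the repository: the function name column_means is hypothetical, and it assumes only that the CSV file has a header row, without relying on any particular column names.

```python
import csv
from statistics import mean


def column_means(csv_path):
    """Return {column name: mean} for every column whose non-empty cells are all numeric."""
    with open(csv_path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    means = {}
    if not rows:
        return means
    for name in rows[0].keys():
        if name is None:
            continue  # ignore cells that overflow past the header row
        values = []
        numeric = True
        for row in rows:
            cell = (row.get(name) or "").strip()
            if cell == "":
                continue  # skip blank cells
            try:
                values.append(float(cell))
            except ValueError:
                numeric = False  # a text column, so no average is reported
                break
        if numeric and values:
            means[name] = mean(values)
    return means
```

Calling column_means("experiments.csv") would then return the average of each numeric column. Averages over subsets of rows, if the tables in the paper need them, would require filtering the rows first.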

 

Note

Some of the data are generated, while others are sourced and curated. The code as provided does not produce the whole data set in a single run; it would have to be tweaked. Also, the main code has to be rerun every time a set of data is to be simulated. The remaining notebooks use the output data for analysis.
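Because the analysis notebooks depend on files produced by earlier steps, it can help to execute them in a fixed order. The sketch below is one possible way to do this from Python with the jupyter nbconvert command-line tool. It is an illustration, not part of the repository: the helper names are hypothetical, and it assumes that Jupyter is installed and that the notebooks sit in the current directory. Intermediate files such as cluster4_bikes.csv and bike_data_results_v3.csv may still need to be saved by hand between steps.

```python
import subprocess


def nbconvert_command(notebook, output_name):
    """Build a `jupyter nbconvert` command that executes a notebook and saves the result."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute", notebook,
        "--output", output_name,
    ]


def run_in_order(notebooks):
    """Execute notebooks one after another, stopping at the first failure."""
    for notebook in notebooks:
        executed_name = notebook.replace(".ipynb", "_executed.ipynb")
        subprocess.run(nbconvert_command(notebook, executed_name), check=True)


# Section 6.3 notebooks, in the order the Reproducibility Workflow lists them.
SECTION_6_3_NOTEBOOKS = [
    "bike_share.ipynb",
    "Bike_data_code.ipynb",
    "Bike_data_pred_v2.ipynb",
    "Data_analysis_bike_v2.ipynb",
]
```

Calling run_in_order(SECTION_6_3_NOTEBOOKS) would execute the four notebooks in sequence and write an executed copy of each, leaving the originals unchanged.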

Cite

To cite the contents of this repository, please cite both the paper and this repository using their respective DOIs.

Article: https://doi.org/10.1287/ijds.2024.0051
Software and Data Repository: https://doi.org/10.1287/ijds.2024.0051.cd

License

Copyright (c) 2025 Gautam [MIT License]

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
