Using Operational Data Analytics for Planning Decisions Under Uncertainty
Supplemental Material
Software and Data: ijds.2024.0051.sm1.zip
Description of Software and Data
The software and data in the zip file referenced above are a snapshot of the software and data that were used in the research reported in the paper "Using Operational Data Analytics for Planning Decisions Under Uncertainty" by Natarajan Gautam. This repository is also available on CodeOcean.
The goal of this repository is to replicate the numerical experiments in the paper.
Computer and Software Environment
The following describes the computer hardware and software environment in which the authors produced the results reported in the paper.
MacBook Pro (Apple M1 Pro with 16 GB memory, running macOS Sequoia version 15.6.1). The original code was developed and tested in Jupyter Notebook and runs on Python.
Dependencies
The code in this repository requires the following dependencies. The dependency version number corresponds to the version of the package with which the code was tested.
The following (rather standard) Python libraries need to be imported, and must be installed if unavailable:
- pandas
- NumPy
- matplotlib.pyplot
- sklearn.linear_model
- sklearn
- sklearn.tree
- sklearn.ensemble
- XGBoost
- plotly
- plotly.graph_objects
- seaborn
- scipy.optimize
- scipy.special
- scipy.integrate
- scipy.stats
- sklearn.model_selection
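Before opening the notebooks, it can help to confirm that the libraries listed above are importable. The following is a minimal sketch, not part of the repository: the function name `find_missing` and the list `REQUIRED_MODULES` are illustrative, and the list contains only the top-level import names inferred from the entries above.

```python
import importlib.util

# Top-level import names for the libraries listed above. The import name can
# differ from the pip name (sklearn is installed as scikit-learn).
REQUIRED_MODULES = [
    "pandas", "numpy", "matplotlib", "sklearn",
    "xgboost", "plotly", "seaborn", "scipy",
]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules are available.")
```

Any module reported as missing would need to be installed before the corresponding notebook is run.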
Installation
The program can either be run directly on Code Ocean, by creating a file called requirements.txt with the appropriate versions of the dependencies described above, or the various ipynb files can be downloaded and run locally. The crucial difference from a full run is that N_val in NV_like_pastabilities_gam_z_v4.ipynb is set to 10 (that is, 10 example runs); for a full run, change it to a value such as 2000 or 5000. All other notebooks can be run as is. Many of them work from output Excel files that were obtained by running the full versions. Also, select k_or = 1 if you want the exponential distribution.
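One possible way to draft the requirements.txt file is to pin whatever versions are installed in a local environment where the notebooks already run. The sketch below is an assumption-laden helper, not part of the repository: `PIP_NAMES`, `requirement_lines`, and `write_requirements` are illustrative names, and the pinned versions are those of the local machine, which are not necessarily the versions the authors tested with.

```python
from importlib.metadata import PackageNotFoundError, version

# pip distribution names for the libraries listed under Dependencies
# (assumed mapping: sklearn is distributed as scikit-learn).
PIP_NAMES = ["pandas", "numpy", "matplotlib", "scikit-learn",
             "xgboost", "plotly", "seaborn", "scipy"]


def requirement_lines(names):
    """Build 'name==version' lines for installed packages; skip missing ones."""
    lines = []
    for name in names:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            # Not installed here, so there is no version to pin.
            continue
    return lines


def write_requirements(path="requirements.txt", names=PIP_NAMES):
    """Write the pinned lines to `path` and return them."""
    lines = requirement_lines(names)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + ("\n" if lines else ""))
    return lines
```

Calling `write_requirements()` in the working directory would produce a file that can be reviewed and edited by hand before it is uploaded.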
Reproducibility Workflow
| Which Results to Reproduce | Data File | Code File | Output | Run Time at the Above-Specified Computer Conditions |
|---|---|---|---|---|
| Figure 2 in the paper | None, but simulated | NV_like_pastabilities_gam_z_v4.ipynb | Save outputs by running with N = 50, n = 5, h = 4, k_or = 1, p = 10, and θ = 6. Then plot. | 2 minutes |
| Figures 3 and 4 in the paper | None, but simulated | NV_like_pastabilities_gam_z_v4.ipynb | Save outputs by running with n = 6, k_or = 1.33, and θ = 6. Use N_val = 10000. Then plot using excel. | 5 minutes |
| Table 2 and 3 | experiments.csv | Take averages in excel | Results | Less than 1 minute |
| Figure 5 | experiments.csv | EDA_viz_expt_data.ipynb | Figure is an output that can be saved. To produce the experiments.csv file, run NV_like_pastabilities_gam_z_v4.ipynb using N_val = 100000, choices of p and h given in paper (these are given in the code as well), k_or = 1, choice of N, n, and θ given in paper. | 1 minute once experiments.csv is available (but that takes a long time) |
| Table 4 and 5 | full_initial_run_fixed_params_v4.csv | Take averages in excel | Results | Less than 1 minute |
| Results in section 6.2 | full_initial_run_fixed_params_v4.csv | Data_analysis_v4.ipynb | Figure is an output that can be saved. To produce the full_initial_run_fixed_params_v4.csv file, run NV_like_pastabilities_gam_z_v4.ipynb using N_val = 5000, N = 50, n = 7, h = 6, p = 10, k_or = 1.67, and θ = 6. | 1 minute once full_initial_run_fixed_params_v4.csv is available (but that takes overnight to complete) |
| Raw data for section 6.3 | metro-trips-2025-q1.csv | bike_share.ipynb | Does a cluster analysis and we select cluster 4 | 2 minutes |
| ETA for section 6.3 | cluster4_bikes.csv | Bike_data_code.ipynb | Uses cluster 4 and does a preliminary data analysis | Less than 1 minute |
| Comparison of methods used in section 6.3 | cluster4_bikes.csv | Bike_data_pred_v2.ipynb | Can save df_full as bike_data_results_v3.csv | 10 minutes |
| Table 6 | bike_data_results_v3.csv | Take averages on excel | Results | Less than 1 minute |
| Figures 6, 7, 8 | bike_data_results_v3.csv | Data_analysis_bike_v2.ipynb | Figures shown in the output | 1 minute |
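Several rows above call for taking averages in Excel. For readers who prefer to stay in Python, the same step could be sketched as below. This is only an illustration: `column_averages` is a hypothetical helper, it averages every fully numeric column without any grouping, and which columns or groupings the paper's tables actually use is not stated here. It relies only on the standard library so that it runs without the other dependencies; with pandas installed, `pandas.read_csv(path).mean(numeric_only=True)` would give a comparable result.

```python
import csv
from statistics import mean


def column_averages(csv_path):
    """Return {column: mean} for every column whose values are all numeric."""
    with open(csv_path, newline="", encoding="utf-8") as fh:
        rows = list(csv.DictReader(fh))
    averages = {}
    if not rows:
        return averages
    for column in rows[0].keys():
        try:
            values = [float(row[column]) for row in rows]
        except (TypeError, ValueError):
            # Skip columns with missing or non-numeric entries.
            continue
        averages[column] = mean(values)
    return averages
```

For example, `column_averages("experiments.csv")` would return a dictionary mapping each numeric column name to its average over all rows of that file.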
Note
Some of the data are generated, while others are sourced and curated. The code does not produce the whole data as is; it would have to be tweaked. Also, the main code has to be run every time a set of data is to be simulated. The remaining code uses the output data to do analysis.
Cite
To cite the contents of this repository, please cite both the paper and this repository using their respective DOIs.
Article: https://doi.org/10.1287/ijds.2024.0051
Software and Data Repository: https://doi.org/10.1287/ijds.2024.0051.cd
License
Copyright (c) 2025 Gautam [MIT License]
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

