How to Purchase Labels? A Cost-Effective Approach Using Active Learning Markets
Supplemental Material
Software and Data: ijds.2025.0093.cd.zip
Description of Software and Data
The code and data in the zip file referenced above are a snapshot of the software and data that were used in the research reported in the paper "How to Purchase Labels? A Cost-effective Approach Using Active Learning Markets" by Xiwen Huang and Pierre Pinson. This repository is also available on GitHub (https://github.com/xiwenhuang123/Active_learning_market_IJDS).
The goal of this repository is to replicate the numerical experiments in the paper.
Computer and Software Environment
All results in the paper were produced on the following hardware and software environment:
Hardware
Device: MacBook Pro (14-inch, Nov 2023)
Chip: Apple M3
Memory: 8 GB unified memory
Internal storage: 1 TB SSD
Operating System
macOS Tahoe 26.1
Software Environment
Python 3.11.7
Dedicated Conda environment:
conda create -n project_env python=3.11.7
conda activate project_env
All required Python packages are listed in requirements.txt and installed using:
pip install -r requirements.txt
This setup was used to generate every figure, table, and experiment reported in the paper.
Dependencies
The code in this repository was tested with the following software versions:
- Python 3.11.7
- NumPy 1.26.4
- pandas 2.1.4
- scikit-learn 1.2.2
- Matplotlib 3.8.0
- ucimlrepo 0.0.7
- SciPy 1.11.4 (used for the Wilcoxon tests)
- openpyxl 3.1.2 (for reading/writing Excel files, if needed)
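To see how far a local environment is from the tested one, the versions above can be compared with what is installed. The sketch below is not part of the repository; the helper name version_mismatches is hypothetical, and it assumes the PyPI distribution names used here (e.g. "scikit-learn") correspond to the packages listed above.

```python
# Hypothetical pre-flight check (not part of the repository): compare installed
# package versions with those listed under "Dependencies" in this README.
# Assumption: the PyPI distribution names below match the listed packages.
import sys
from importlib.metadata import PackageNotFoundError, version

TESTED_PYTHON = (3, 11, 7)
TESTED_VERSIONS = {
    "numpy": "1.26.4",
    "pandas": "2.1.4",
    "scikit-learn": "1.2.2",
    "matplotlib": "3.8.0",
    "ucimlrepo": "0.0.7",
    "scipy": "1.11.4",
    "openpyxl": "3.1.2",
}


def version_mismatches(tested=TESTED_VERSIONS):
    """Return {package: (tested_version, installed_version_or_None)} for each mismatch."""
    mismatches = {}
    for name, expected in tested.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    current = sys.version_info[:3]
    if current != TESTED_PYTHON:
        print("Python", ".".join(map(str, current)), "differs from tested 3.11.7")
    for name, (expected, installed) in version_mismatches().items():
        print(f"{name}: tested {expected}, installed {installed or 'missing'}")
```

The check only reports differences; newer versions may well produce the same results, but matching the listed versions is the closest way to reproduce the reported numbers.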
Installation
- Clone or download the repository
git clone https://github.com/xiwenhuang123/Active_learning_market_IJDS.git
cd Active_learning_market_IJDS
- Create and activate a Python environment (recommended)
Using Conda:
conda create -n project_env python=3.11.7
conda activate project_env
Using venv:
python -m venv project_env
source project_env/bin/activate    # Windows: project_env\Scripts\activate
- Install the required Python packages
If a requirements.txt file is present:
pip install -r requirements.txt
Otherwise, install the core dependencies directly:
pip install numpy pandas scikit-learn matplotlib scipy openpyxl
- Data and folder structure
The datasets are expected at the following locations:
  - data/Real_estate_valuation/*.csv
  - data/Hog_buildings/Hog_education_Madge.csv
  - data/Hog_buildings/Hog_education_Rachael.csv
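Before running the experiments, one can check that these files are in place. This is a minimal sketch, assuming it is run from the repository root; the names EXPECTED_DATA and missing_data are hypothetical. Note that the Reproducibility Workflow section names the data files differently (an .xlsx file and Hog_industrial_*.csv), so the patterns are an assumption to adjust to whatever the scripts actually read.

```python
# Hypothetical data check (not part of the repository).
# Assumption: run from the repository root; patterns copied from the README list.
from pathlib import Path

EXPECTED_DATA = [
    "data/Real_estate_valuation/*.csv",  # glob pattern: any CSV in this folder
    "data/Hog_buildings/Hog_education_Madge.csv",
    "data/Hog_buildings/Hog_education_Rachael.csv",
]


def missing_data(root=".", patterns=EXPECTED_DATA):
    """Return the patterns that match no file under `root`."""
    root = Path(root)
    return [p for p in patterns if not any(root.glob(p))]


if __name__ == "__main__":
    missing = missing_data()
    if missing:
        print("Missing data files:", *missing, sep="\n  ")
    else:
        print("All expected data files found.")
```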
After installation, the repository is organised as follows (main folders):
Active_learning_market_IJDS/
├── IJDS_Variance.py # Variance-based scenario (real-estate data)
├── IJDS_MSE.py # Predictive-ability (MSE) scenario
├── Monte_carlo_variance_scenario.py # Monte Carlo variance robustness
├── Monte_carlo_MSE_scenario.py # Monte Carlo MSE robustness
├── Plots/
│ ├── VarianceScenario/ # Plots for the variance-based scenario
│ ├── MSEScenario/ # Plots for the MSE-based scenario
│ └── Sensitivity_Results/ # CSV files for sensitivity tables
├── data/
│ ├── Real_estate_valuation/ # Real estate valuation data
│ └── Hog_buildings/ # Building energy (Hog) datasets
├── README.md
└── requirements.txt
Make sure the subfolders in data/ contain the corresponding data files listed above.
The plotting and results folders (Plots/VarianceScenario, Plots/MSEScenario, Plots/Sensitivity_Results) will be created automatically by the scripts if they do not already exist.
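The scripts handle folder creation themselves; if the folders are needed beforehand (for example, to run only one script and save extra outputs alongside), the same effect can be obtained with pathlib. This is a sketch only: how the scripts create the folders internally is not shown in this README, and ensure_output_dirs is a hypothetical helper.

```python
# Hypothetical helper (not part of the repository): create the output folders
# named in the README. exist_ok=True leaves existing folders and files untouched.
from pathlib import Path

OUTPUT_DIRS = (
    "Plots/VarianceScenario",
    "Plots/MSEScenario",
    "Plots/Sensitivity_Results",
)


def ensure_output_dirs(root="."):
    """Create each output folder (and its parents) if missing; return all paths."""
    paths = []
    for rel in OUTPUT_DIRS:
        path = Path(root) / rel
        path.mkdir(parents=True, exist_ok=True)
        paths.append(path)
    return paths
```

Calling ensure_output_dirs() from the repository root is idempotent, so it is safe to run before or after the scripts.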
Reproducibility Workflow
Figures 3–5, Figures 8a–8f and Table 2 in the paper, and Tables 4–5 in the Appendix
- Data File: Real estate valuation data set.xlsx
- Code File: IJDS_Variance.py
- Output: PDF plots for Figures 3–6 and Figures 9a–9f, a .txt file for Table 2 (saved in Plots/VarianceScenario/), and sensitivity Tables 4–5 as CSV files in Plots/Sensitivity_Results/.
- Run Time at the Above-Specified Computer Conditions: 5.5 minutes
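The sensitivity tables are written as CSV files, so they can be inspected without rerunning the experiments. The README does not document file names or columns, so the sketch below (with the hypothetical helper load_sensitivity_tables) reads every CSV in the folder generically, using only the standard library; pandas.read_csv would be an equally valid choice given that pandas is already a dependency.

```python
# Hypothetical loader (not part of the repository): read every CSV written to
# Plots/Sensitivity_Results/ into a list of row dicts keyed by the header row.
import csv
from pathlib import Path


def load_sensitivity_tables(folder="Plots/Sensitivity_Results"):
    """Return {file stem: list of row dicts} for each CSV file in `folder`."""
    tables = {}
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="") as fh:
            tables[path.stem] = list(csv.DictReader(fh))
    return tables


if __name__ == "__main__":
    for name, rows in load_sensitivity_tables().items():
        print(f"{name}: {len(rows)} rows")
```

All values are read as strings; convert columns to numbers as needed.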
Figures 6–7 in Section 4.1.5
- Data File: Real estate valuation data set.xlsx
- Code File: Monte_carlo_variance_scenario.py
- Output: PDF plots for Figures 7–8, saved in Plots/VarianceScenario/.
- Run Time at the Above-Specified Computer Conditions: 4.2 minutes
Figures 9–10, Figures 13a–13f and Table 3 in the paper, and Table 5 in the Appendix
- Data Files: Hog_industrial_Rachael.csv, Hog_industrial_Madge.csv
- Code File: IJDS_MSE.py
- Output: PDF plots for Figures 10–11 and Figures 14a–14f (saved in Plots/MSEScenario/), and sensitivity Tables 6–7 as CSV files in Plots/Sensitivity_Results/.
- Run Time at the Above-Specified Computer Conditions: 35.88 minutes
Figures 11–12 in Section 4.2.5
- Data Files: Hog_industrial_Rachael.csv, Hog_industrial_Madge.csv
- Code File: Monte_carlo_MSE_scenario.py
- Output: PDF plots for Figures 12–13, saved in Plots/MSEScenario/.
- Run Time at the Above-Specified Computer Conditions: 34.59 minutes
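The four scripts above can be run one after another from the repository root. The reported run times add up to 80.17 minutes (about 1 h 20 min) on the stated hardware. This is a sketch, assuming the scripts take no command-line arguments and are launched with the interpreter of the active environment; the helper names are hypothetical. By default it only prints the plan, and --run executes the scripts.

```python
# Hypothetical driver (not part of the repository): run the four reproduction
# scripts in the order listed in the README and time each one.
# Assumption: the scripts need no arguments and are run from the repository root.
import subprocess
import sys
import time

# (script, run time in minutes reported in the README on the stated MacBook Pro)
SCRIPTS = [
    ("IJDS_Variance.py", 5.5),
    ("Monte_carlo_variance_scenario.py", 4.2),
    ("IJDS_MSE.py", 35.88),
    ("Monte_carlo_MSE_scenario.py", 34.59),
]


def build_commands(python=sys.executable):
    """Return one argument list per script, using the given interpreter."""
    return [[python, script] for script, _ in SCRIPTS]


def expected_total_minutes():
    """Sum of the run times reported in the README."""
    return round(sum(minutes for _, minutes in SCRIPTS), 2)


def run_all():
    """Run every script; check=True stops at the first script that fails."""
    for cmd in build_commands():
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        elapsed = (time.perf_counter() - start) / 60
        print(f"{cmd[1]} finished in {elapsed:.2f} min")


if __name__ == "__main__":
    print(f"Reported total run time: {expected_total_minutes()} min")
    if "--run" in sys.argv[1:]:
        run_all()
    else:
        for cmd in build_commands():
            print("would run:", " ".join(cmd))
```

Running the scripts sequentially keeps memory use modest on an 8 GB machine; actual timings will vary with hardware.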
Cite
To cite the contents of this repository, please cite both the paper and this repository using their respective DOIs.
Article: https://doi.org/10.1287/ijds.2025.0093
Software and Data Repository: https://doi.org/10.1287/ijds.2025.0093.cd
License
Copyright (c) Huang and Pinson
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

