Capacity Planning in Stable Matching

Published Online:https://doi.org/10.1287/opre.2023.0386

Supplemental Material

Online Appendix: opre.2023.0386.sm2.pdf

Software and Data: opre.2023.0386.cd.zip

Supplemental Review: The empirical results in this paper were replicated. The code, data, and files required to reproduce the results were reviewed and are available via the above zip file.


Description of Software and Data

The software and data in the zip file referenced above are a snapshot of the software and data that were used in the research reported in the paper "Capacity Planning in Stable Matching" by Federico Bobbio, Margarida Carvalho, Andrea Lodi, Ignacio Rios, Alfredo Torrico. This repository is also available via GitHub at https://github.com/ORJournal/2023.0386.

The goal of this repository is to replicate the numerical experiments in the paper.

Description

Julia package for the computation of the optimal capacity expansion and corresponding student-optimal stable matching for the model proposed in [1].

This project also involves two Python scripts that use various libraries (such as numpy, gurobipy, and matplotlib among others) to generate the plots and the table in the paper; information about how to run these scripts are found after the description of the Julia code. Our results were obtained on an Intel® Xeon® Gold 6226 CPU clocked at 2.70GHz and 384 GB of RAM, using Gurobi 11.0.0 to solve optimization problems with code written in Julia 1.9.4, Python 3.11.4, and R 4.4.2. The total time to run all instances and methods was approximately 240 hours. By using 5 threads in parallel (each solving one instance at a time with a specific method), the time to replicate the experiments was approximately 48 hours.

Installation

The package is not available on the registry. Please clone the repository. You also need to add JuMP and Gurobi.

The next step is to activate the Julia environment, i.e., to use the exact same package versions as those with which our code was developed. To that end, in the folder containing Manisfest.toml and Project.toml which specify the dependencies, open Julia and do (note: ] is a command that opens the package manager mode within the Julia REPL):

] activate .

Otherwise, you need to indicate the path to the folder as follows:

] activate /path/to/2023.0386

Next, you need to install the recorded dependencies by the environment:

using Pkg
Pkg.instantiate()

From now on, when you use this environment on your machine, all you need to do is activate it. Instantiation of the environment is no longer necessary.

Basic Usage

In the following, we essentially go over instances format, reading instances and solving them.

Instances File

Examples of instances can be found in the folder ./data/. In the folder ./data/Antofagasta/, we have instances of the Antofagasta region corresponding to the admissions process for Pre-K in 2023-2024 (the one used in the paper), 2022-2023, 2021-2022, 2020-2019, 2019-2020, 2018-2019. The naming of the instances shows the first year of the admission process and the considered capacity budget B. For example, instance_2023_1.txt corresponds to the 2023-2024 admissions process and the budget is 1. Below is an example of an instance file with 7 students (s0,s1,s2,s3,s4,s5,s6), 4 schools (c0,c1,c2,c3) and their corresponding capacity, budget B=5, the ranking of each acceptable school by each student (e.g., student s4 ranks 4th school c1) and the preferences of the schools (e.g., s0 is the most preferred student by school c1).

# Num. students:7
# Num. colleges:4
# Budget:5
# Students:s0,s1,s2,s3,s4,s5,s6
# Colleges:c0,c1,c2,c3
# Capacities:
c0 1
c1 2
c2 1
c3 1
# Student preferences:
s0 (1,c2) (2,c0) (3,c3) (4,c1)
s1 (1,c3) (2,c1) (3,c0) (4,c2)
s2 (1,c0) (2,c1) (3,c3) (4,c2)
s3 (1,c1) (2,c3) (3,c2) (4,c0)
s4 (1,c2) (2,c0) (3,c3) (4,c1)
s5 (1,c3) (2,c0) (3,c2) (4,c1)
s6 (1,c1) (2,c2) (3,c0) (4,c3)
# College priorities:
c0 (1,s4) (2,s3) (3,s1) (4,s6) (5,s5) (6,s2) (7,s0)
c1 (1,s0) (2,s3) (3,s1) (4,s6) (5,s5) (6,s2) (7,s4)
c2 (1,s1) (2,s3) (3,s1) (4,s6) (5,s5) (6,s2) (7,s4)
c3 (1,s3) (2,s4) (3,s1) (4,s6) (5,s5) (6,s2) (7,s0)

In the ./Dummy/ folder, there is a synthetically generated instance that is used when the code is run for the first time. This is because the first execution of the code involves compilation, which makes the initial run slower than subsequent ones.

Read Instances

In the folder ./src/, we have Instances.jl with all the necessary code to read instances (and also to save results). So, we load and execute the contents of that Julia file into the current session:

include("src/Instances.jl")

Now, we read the instance of the 2023-2024 admissions process in the Antofagasta region (Pre-K) with one unit of budget. The second command may take a few seconds since this is a large instance:

filename = "data/Antofagasta/instance_2023_1.txt"
SC = read_game_from_txt(filename);

To access the information of the instance just do SC.the_element you wish to access. More concretely,

# access the budget
SC.B
# access the number of students
SC.numbS
# access the number of schools
SC.numbC
# access the set of students
SC.students
# access the set of schools
SC.colleges
# access the preferences of the first student in SC.students
SC.pref[SC.students[1]]
# access the preferences of the first school in SC.schools
SC.pref[SC.colleges[1]]
# access the quota of the last school in SC.schools
SC.cap[SC.colleges[end]]
# access the set of schools that the first student in SC.student prefers over the first school in SC.colleges (that school will be included if the student deems it acceptable)
SC.S[SC.students[1]][SC.colleges[1]]
# access the set of students that the first school in SC.collegest prefers over the first student in SC.students (that student will be included if the school deems them acceptable)
SC.T[SC.students[1]][SC.colleges[1]]
Solving Tools

In the folder ./src/, we have algorithms.jl containing the methods designated by Quad, Lin, Comb, Hybrid, Greedy and LPH in the computational section of the paper [1] (more precisely, Section 5.3.4). Let us load the file content:

include("src/algorithms.jl")

Next, we show how to run the methods of our optimization toolbox. To this end, we first read an instance to which the methods are applied.

filename = "data/Antofagasta/instance_2023_1.txt"
SC = read_game_from_txt(filename);

The method Quad, i.e., the solving of the quadratic formulation by Gurobi, needs as input an instance (SC), a penalty for unassigned students, a time limit and a MipGap tolerance.

penalty = "last_pref" # penalty equal to the size of the student preference list plus 1
timelimit = 3600
tolerance = 1e-4
outputs = Quad(SC, penalty, timelimit, tolerance);

The object outputs is a dictionary with information regarding the solving of the formulation and solutions found. We save the results in a txt file by doing as follows:

# naming of the file saving the result
outputfile = string("results/example_quad_B=", string(SC.B),".txt")
# the following function is part of Intances.jl
write_output(outputs, outputfile, timelimit)

By opening the file results/example_quad_B=1.txt, the information is fairly self-explanatory; information concerning callbacks should be ignored, as this method does not use callbacks.

The method Lin, i.e., the solving of the compact formulation with L-constraints by Gurobi, needs as input an instance (SC), a penalty for unassigned students, a time limit and a MipGap tolerance.

penalty = "last_pref" # penalty equal to the size of the student preference list plus 1
timelimit = 3600
tolerance = 1e-4
outputs = Linearization(SC, penalty, timelimit, tolerance);

Exactly as before, the object outputs is a dictionary with information regarding the solving of the formulation and solutions found. We save the results in a txt file by doing as follows:

# naming of the file saving the result
outputfile = string("results/example_lin_B=", string(SC.B),".txt")
# the following function is part of Intances.jl
write_output(outputs, outputfile, timelimit)

The file results/example_lin_B=1.txt has the necessary results information; again, information concerning callbacks should be ignored, as this method does not use callbacks.

The method Greedy needs as input an instance (SC) and a penalty for unassigned students.

penalty = "last_pref" # penalty equal to the size of the student preference list plus 1
outputs = Greedy(SC, penalty);

Exactly as before, the object outputs is a dictionary with information regarding the performance of the method and solutions found. We save the results in a txt file by doing as follows:

# naming of the file saving the result
outputfile = string("results/example_greedy_B=", string(SC.B),".txt")
# the following function is part of Intances.jl
write_output(outputs, outputfile, timelimit)

The file results/example_greedy_B=1.txt has the necessary results information; information concerning callbacks and nodes explored should be ignored, as this method is an heuristic.

The method LPH needs as input an instance (SC) and a penalty for unassigned students. The time limit and a MipGap tolerance for the relaxation solved by this method is 3600 seconds and 1e-4.

penalty = "last_pref" # penalty equal to the size of the student preference list plus 1
outputs = LPH(SC, penalty);

Exactly as before, the object outputs is a dictionary with information regarding the performance of the method and solutions found. We save the results in a txt file by doing as follows:

# naming of the file saving the result
outputfile = string("results/example_lph_B=", string(SC.B),".txt")
# the following function is part of Intances.jl
write_output(outputs, outputfile, timelimit)

The file results/example_lph_B=1.txt has the necessary results information; information concerning callbacks and nodes explored should be ignored, as this method is an heuristic.

Finally, we apply Comb and Hybrid that use the same Julia function CPP (cutting-plane) but different inputs. This is because Comb is an extreme of Hybrid where no L-constraints are added. The CPP function has the following input:

  1. An instance SC (struct as in Instances.jl).
  2. (optional; default is "last_pref") the penalty for unassigned students.
  3. (optional; default is 36000) the time limit.
  4. (optional; default is 1e-4) the MipGap tolerance.
  5. (optional; default is true) boolean indicating whether we want to add L-constraints, i.e., $q_c+\sum_{k=1}^B y_c^k -(q_c+B) \sum_{c' \geq_s c} x_{s,c'} \leq \sum_{s' >_c s} x_{s',c}$. This should be false if we want to run Comb.
  6. (optional; default is -1) parameter γ of the hybrid approach. If γ<0 either no L-constraints are added and we have the Comb method (input 5 equal false) or all L-constraints are added and we have the Lin method (input 5 equal to true). If γ≥0 and input 5 is true, then Hybrid is applied.
  7. (optional; default is false) boolean indicating whether we use the warmstart of LPH or not.
  8. (optional; default is true) boolean indicating whether we add the valid inequalities indicating that each student must do better or as good as in the matching with no extra capacities.
  9. (optional; default is false) boolean indicating whether we want to add the combs with base (s,c) such that rank_c(s)=q_c. We always use false in the results of our paper.
  10. (optional; default is true) boolean indicating whether the matching variables x_{s,c} are considered continuous (true) or binary (false). Note that we can compute combs when x' is continuous.
  11. (optional; default is false) boolean indicating whether we want to run the code in verbose mode.
  12. (optional; default is false) boolean indicating whether we add at the start some modified combs. We always use false in the results of our paper.

Hence, to apply Comb, we do

penalty = "last_pref"
timelimit_dummy = 3600
tolerance_dummy = 1e-4
Lconstraint = false
gamma = 0 # this value does not matter as the boolean above means that only combs are considered
use_warmstart = true
add_DA_with_no_budget_constraints = true
add_hybrid_initialization = false
x_continuous = false # it can be true if we wish to separate fractional matchings
verbose_active = false
extra_DA_combs = false
outputs = CPP(SC, penalty,timelimit,tolerance,Lconstraint,gamma,use_warmstart,add_DA_with_no_budget_constraints,add_hybrid_initialization,x_continuous,verbose_active, extra_DA_combs);
# naming of the file saving the result
outputfile = string("results/example_comb_B=", string(SC.B),".txt")
# the following function is part of Intances.jl
write_output(outputs, outputfile, timelimit)

The file results/example_comb_B=1.txt has the necessary results information.

To apply Hybrid with, for example, γ=0, we do as follows:

penalty = "last_pref"
timelimit_dummy = 3600
tolerance_dummy = 1e-4
Lconstraint = true
gamma = 0 # since the boolean above is true, the gamma value now matters
use_warmstart = true
add_DA_with_no_budget_constraints = true
add_hybrid_initialization = false
x_continuous = false # it can be true if we wish to separate fractional matchings
verbose_active = false
extra_DA_combs = false
outputs = CPP(SC, penalty,timelimit,tolerance,Lconstraint,gamma,use_warmstart,add_DA_with_no_budget_constraints,add_hybrid_initialization,x_continuous,verbose_active, extra_DA_combs);
# naming of the file saving the result
outputfile = string("results/example_hybrid_B=", string(SC.B),".txt")
# the following function is part of Intances.jl
write_output(outputs, outputfile, timelimit)

The file results/example_hybrid_B=1.txt has the necessary results information.

Remark: The default MipGap of 1e-4 (tolerance=1e-4) can be sufficiently large for some instances so that the recorded optimal values may slightly vary per method. This is possible for instances where the penalty is large (the number of schools plus 1).

It is possible to run Hybrid (and thus, Comb) such that, if the time limit is reached, the best feasible solution found is returned. Moreover, it is possible to solve the problem including an upper bound on the number of extra-capacities assigned to each school. To this end, the function CPP_ExtraTime must be used. This function is equal to CPP, except that it recovers not only the best primal objective but also the best primal solutions, and it allows to bound the number of extra seats per school.

For example, the command RunSimulationsExtraTime("2023",ub) specifying the year of the Antofagasta instance we are interested in and the upper bound on extra capacities for each school (-1 being no bound) uses the Hybrid method with a time limit of 10 hours. Moreover, it employs CPP_ExtraTime, which recovers the best feasible solution found for the dataset from the admission process 2023-2024.

year = "2023"
ub = -1 # no upper bound on the number of extra capacities each school can receive
RunSimulationsExtraTime(year,ub)

Results

In the folder results, we provide the results concerning our methods' computational performance on the Antofagasta instances used in our paper. For example, the results of Section 5.3.4 are in the folder results/Comparison/ and of Appendix D.1 in the folder results/Gamma/. More concretely, in results/Comparison/hybrid/num_schools/ are all the results concerning the hybrid approach with γ=0 for our instance of 2023-2024 with penalty equal to the number of schools plus 1 and 1 hour time limit. In the folder Comparison_extratime are the results for the Hybrid with a time-limit of 10 hours for the admission processes of 2022-2023 and 2023-2024. To obtain these results the script scripts/RunParallel.jl must be executed, for example in the following way:

julia RunParallel.jl

Remark: It is possible to specify the number of threads to run in parallel when solving the instances. However, note that each model (methodology) runs on a single thread, so the solving times for our methodologies reflect single-thread performance. The number of threads may impact the optimal solution found as multiple may exist (see Appendix A.2 for an example and the alternative optimal solutions reported in results/Compaison_extratie_all_sol/72 for the high penalty case).

Python Scripts

In the src directory it can be found a Python directory containing two Python scripts: generate_inputs.py and simulations.py, that serve to analyze the output data and to visualize it, respectively.

Installation

To run these Python scripts, you will need Python 3.x installed on your system.

Furthermore, the Python scripts require the following packages:

numpy
matplotlib
seaborn

which can be installed by running the command:

pip install numpy matplotlib seaborn pickle random itertools re argparse

We also created a requirements.txt that can be installed to create the environment with the required packages by running the following command:

pip install -r requirements.txt
Solving Tools and Results

The script generate_inputs.py contains algorithms needed to preprocess the results from the outputs of the Julia simulations. The script simulations.py produces the plots and the table in the paper, and it can be run with the following command:

python simulations.py

The plots will be saved in the directory plots, and the table will be saved in the directory table, where both directories are subdirectories of results.

Remark: You may need to adapt the lines 13-14 of simulations.py to use the appropriate directory.

R Scripts

In the src directory it can be found a R directory containing one script: prepare_data.py. This file serves two purposes: generate tables with summary statistics (Tables 1 and 2 in the main paper) and also preprocess the data to obtain the instance to be used in the numerical experiments.

Data

To run this script, you will first need to download the data at the https://zenodo.org/records/15220293 and unzip it within the folder where the repository is contained. Then, run the following command:

Rscript -e "rmarkdown::render('prepare_data.Rmd', output_format='pdf_document')"

You also can get the data from the website https://datosabiertos.mineduc.cl/ of the Ministerio de Educación de Chile. This data is subject to the original Creative Commons Attribution 2.0 Chile (CC BY 2.0 CL) (https://creativecommons.org/licenses/by/2.0/cl/) license.

Installation

To run this script, you need an (any recent) R distribution installed on your system, and also you will need the capacity to render Latex (i.e., the command pdflatex must be available on the terminal). Moreover, you need pandoc installed to render markdown to pdf. Finally, the R file requires the following packages:

dplyr 1.0.7
tidyr 1.1.4
magrittr 2.0.1
knitr 1.46
kableExtra 1.3.1

After this, you can execute the R script using the command:

Rscript -e "rmarkdown::render('prepare_data.Rmd', output_format='pdf_document')"

This will produce a pdf file named prepare_data.pdf, which contains Tables 1 and 2 in the main paper.

Reference

[1] Bobbio F, Carvalho M, Lodi A, Rios I, Torrico A (2024) Capacity planning in stable matching. Preprint submitted August 7, https://arxiv.org/abs/2110.00734.

Cite

To cite the contents of this repository, please cite both the paper and this repository using their respective DOIs.

Article: https://doi.org/10.1287/opre.2023.0386
Software and Data Repository: https://doi.org/10.1287/opre.2023.0386.cd

License

Copyright (c) 2024 Federico Bobbio, Margarida Carvalho, Andrea Lodi, Ignacio Rios and Alfredo Torrico

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Third-Party Data:
This project includes datasets obtained from the Datos Abiertos del Ministerio de Educación de Chile (https://datosabiertos.mineduc.cl/), which are made available under the Creative Commons Atribución 2.0 Chile (CC BY 2.0 CL) (https://creativecommons.org/licenses/by/2.0/cl/) license.

INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.