OptiChat: Bridging Optimization Models and Practitioners with Large Language Models
Supplemental Material
Software and Data: ijds.2025.0074.cd.zip
Description of Software and Data
The software and data in the zip file referenced above are a snapshot of the software and data that were used in the research reported in the paper "OptiChat: Bridging Optimization Models and Practitioners with Large Language Models" by Hao Chen, Gonzalo Esteban Constante Flores, Krishna Sri Ipsit Mantri, Sai Madhukiran Kompalli, Akshdeep Singh Ahluwalia, Can Li. This repository is also available at https://github.com/li-group/OptiChat.
The goal of this repository is to replicate the numerical experiments in the paper.
Dependencies
The code in this repository requires the following dependencies. The dependency version number corresponds to the version of the package with which the code was tested.
- Matplotlib 3.10.1
- NetworkX 3.4.2
- NumPy 2.2.4
- OpenAI 1.66.5
- Pyomo 6.9.1
- python-dotenv 1.0.1
- Streamlit 1.43.2
- tiktoken 0.9.0
- gurobipy 12.0.1
- PySide6 6.8.2.1
- pytest 8.3.3
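To confirm that a local environment matches the tested versions listed above, a quick check along the following lines may help. This is a minimal sketch and is not part of the repository. The helper names (TESTED_VERSIONS, compare_versions, mismatches) are hypothetical, and the check relies only on the standard-library importlib.metadata module.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Versions with which the code was tested, as listed in this README.
# Keys are the distribution names as published on PyPI.
TESTED_VERSIONS: Dict[str, str] = {
    "matplotlib": "3.10.1",
    "networkx": "3.4.2",
    "numpy": "2.2.4",
    "openai": "1.66.5",
    "Pyomo": "6.9.1",
    "python-dotenv": "1.0.1",
    "streamlit": "1.43.2",
    "tiktoken": "0.9.0",
    "gurobipy": "12.0.1",
    "PySide6": "6.8.2.1",
    "pytest": "8.3.3",
}

Report = Dict[str, Tuple[str, Optional[str]]]


def compare_versions(tested: Dict[str, str] = TESTED_VERSIONS) -> Report:
    """Map each package to (tested version, installed version or None if missing)."""
    report: Report = {}
    for name, expected in tested.items():
        try:
            installed: Optional[str] = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (expected, installed)
    return report


def mismatches(report: Report) -> Report:
    """Keep only packages that are missing or installed at a different version."""
    return {name: pair for name, pair in report.items() if pair[0] != pair[1]}


if __name__ == "__main__":
    for name, (expected, installed) in mismatches(compare_versions()).items():
        print(f"{name}: tested {expected}, installed {installed or 'not found'}")
```

A version mismatch does not necessarily break the code; it only means the combination was not the one tested.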
Installation
- Install Python 3.10.16.
- Install the Python packages:
pip install -r requirements.txt
- Install Gurobi following the instructions at "How do I install Gurobi Optimizer?". On Windows without administrator access, follow the instructions at "How do I install Gurobi without administrator credentials?"
- Obtain an OpenAI API key at https://platform.openai.com/docs/overview and add it to your environment variables as
OPENAI_API_KEY
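Before launching a long experiment, it can be worth failing fast when the key is missing. The following is a minimal sketch, assuming the key is read from the OPENAI_API_KEY environment variable as described above. The helper name get_openai_api_key is hypothetical, and this README does not say whether run_exp.py itself reads a .env file; the optional python-dotenv call below is an assumption based only on the dependency list.

```python
import os


def get_openai_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the OpenAI API key from the environment, or fail with a clear message."""
    # Assumption: python-dotenv (listed above) may be used to read a local .env file.
    # load_dotenv() does not override variables that are already set in the environment.
    try:
        from dotenv import load_dotenv

        load_dotenv()
    except ImportError:
        pass
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it (or add it to a .env file) before running run_exp.py."
        )
    return key
```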
Reproducibility Workflow
| Which Results to Reproduce | Data Files | Code File | Output | Run Time (LLMs Used) |
|---|---|---|---|---|
| Table 1 | Feas/, Infeas/ | run_exp.py (interpreter_experiment=True) | stats.json, including the times reported in Table 1 | 20 minutes ("gpt-4.1") |
| Table 2, Diagnosing query | Infeas/, test_set/tool_testset/ | run_exp.py (internal_experiment=True) | stats.csv (accuracy and time of the Diagnosing query in Table 2); stats_detail.pkl (LLM-generated answers) | 40 minutes ("gpt-4.1" + "gpt-4o-mini" + "gpt-4o" + "o3") |
| Table 2, Retrieval, Sensitivity, What-if queries | Feas/, test_set/tool_testset/ | run_exp.py (internal_experiment=True) | stats.csv (accuracy and time of the Retrieval, Sensitivity, and What-if queries in Table 2); stats_detail.pkl (LLM-generated answers) | 70 minutes ("gpt-4.1" + "gpt-4o-mini" + "gpt-4o" + "o3") |
| Table 2, Why-not query | Feas/, test_set/code_testset/ | run_exp.py (external_experiment=True) | stats.csv (accuracy and time of the Why-not query in Table 2); stats_detail.pkl (LLM-generated answers) | 70 minutes ("gpt-4.1" + "gpt-4o-mini" + "gpt-4o" + "o3") |
| Table 4, Diagnosing query, w/o Predefined Functions | Infeas/, test_set/tool_testset/ | run_exp.py (internal_experiment=True, ablation=True) | stats.csv (accuracy and time); stats_detail.pkl (LLM-generated answers) | 70 minutes ("gpt-4.1" + "o3") |
| Table 4, Diagnosing query, w/o Syntax Reminders | Infeas/, test_set/tool_testset/ | run_exp.py (internal_experiment=True, skip_syntax=True) | stats.csv (accuracy and time); stats_detail.pkl (LLM-generated answers) | 20 minutes ("gpt-4.1" + "o3") |
| Table 4, Diagnosing query, w/o Illustrator | Infeas/, test_set/tool_testset/ | run_exp.py (internal_experiment=True, skip_description=True) | stats.csv (accuracy and time); stats_detail.pkl (LLM-generated answers) | 40 minutes ("gpt-4.1" + "o3") |
| Table 4, Retrieval, Sensitivity, What-if queries, w/o Predefined Functions | Feas/, test_set/tool_testset/ | run_exp.py (internal_experiment=True, ablation=True) | stats.csv (accuracy and time); stats_detail.pkl (LLM-generated answers) | 90 minutes ("gpt-4.1" + "o3") |
| Table 4, Retrieval, Sensitivity, What-if queries, w/o Syntax Reminders | Feas/, test_set/tool_testset/ | run_exp.py (internal_experiment=True, skip_syntax=True) | stats.csv (accuracy and time); stats_detail.pkl (LLM-generated answers) | 40 minutes ("gpt-4.1" + "o3") |
| Table 4, Retrieval, Sensitivity, What-if queries, w/o Illustrator | Feas/, test_set/tool_testset/ | run_exp.py (internal_experiment=True, skip_description=True) | stats.csv (accuracy and time); stats_detail.pkl (LLM-generated answers) | 80 minutes ("gpt-4.1" + "o3") |
| Table 4, Why-not query, w/o Illustrator | Feas/, test_set/code_testset/ | run_exp.py (external_experiment=True, skip_description=True) | stats.csv (accuracy and time); stats_detail.pkl (LLM-generated answers) | 80 minutes ("gpt-4.1" + "o3") |
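Because each row of the reproducibility table is a separate run of run_exp.py, the listed run times add up when the rows are executed one after another. The sketch below simply encodes the table as data and totals the times per paper table. RUNS and minutes_per_table are illustrative names, not part of the repository, and actual run times will vary with API latency and load.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Run = Tuple[str, str, Tuple[str, ...], int]

# (paper table, row, run_exp.py flags set to True, approximate minutes), copied from the table above.
RUNS: List[Run] = [
    ("Table 1", "all queries", ("interpreter_experiment",), 20),
    ("Table 2", "Diagnosing", ("internal_experiment",), 40),
    ("Table 2", "Retrieval, Sensitivity, What-if", ("internal_experiment",), 70),
    ("Table 2", "Why-not", ("external_experiment",), 70),
    ("Table 4", "Diagnosing, w/o Predefined Functions", ("internal_experiment", "ablation"), 70),
    ("Table 4", "Diagnosing, w/o Syntax Reminders", ("internal_experiment", "skip_syntax"), 20),
    ("Table 4", "Diagnosing, w/o Illustrator", ("internal_experiment", "skip_description"), 40),
    ("Table 4", "Retrieval, Sensitivity, What-if, w/o Predefined Functions", ("internal_experiment", "ablation"), 90),
    ("Table 4", "Retrieval, Sensitivity, What-if, w/o Syntax Reminders", ("internal_experiment", "skip_syntax"), 40),
    ("Table 4", "Retrieval, Sensitivity, What-if, w/o Illustrator", ("internal_experiment", "skip_description"), 80),
    ("Table 4", "Why-not, w/o Illustrator", ("external_experiment", "skip_description"), 80),
]


def minutes_per_table(runs: List[Run] = RUNS) -> Dict[str, int]:
    """Sum the approximate run times of all rows that belong to the same paper table."""
    totals: Dict[str, int] = defaultdict(int)
    for table, _row, _flags, minutes in runs:
        totals[table] += minutes
    return dict(totals)


if __name__ == "__main__":
    per_table = minutes_per_table()
    total = sum(per_table.values())
    print(per_table)  # {'Table 1': 20, 'Table 2': 180, 'Table 4': 420}
    print(f"{total} minutes, about {total / 60:.1f} hours if run sequentially")
    # 620 minutes, about 10.3 hours if run sequentially
```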
Note
To reproduce the results reported in Tables 1, 2, and 4, configure the following parameters in run_exp.py to match the experiment type and setting:
- folder_name: set to "Infeas" for Diagnosing queries; set to "Feas" for Retrieval, Sensitivity, What-if, and Why-not queries.
- interpreter_experiment: set to True for Table 1.
- internal_experiment: set to True for Diagnosing, Retrieval, Sensitivity, and What-if queries in Tables 2 and 4.
- external_experiment: set to True for Why-not queries in Tables 2 and 4.
- ablation: set to True for the "w/o Predefined Functions" rows in Table 4.
- skip_syntax: set to True for the "w/o Syntax Reminders" rows in Table 4.
- skip_description: set to True for the "w/o Illustrator" rows in Table 4.
- gpt_model: set to "gpt-4.1", "gpt-4o-mini", "gpt-4o", or "o3" to reproduce the results for a specific LLM in Tables 2 and 4.
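The parameter rules in this note can be collected into a single lookup, which makes it harder to combine incompatible flags by hand. This is a hypothetical helper, not code from run_exp.py: QUERY_SETTINGS, ABLATION_FLAGS, and build_config are illustrative names, and how the resulting values are applied (for example, by editing the corresponding variables in run_exp.py) is an assumption to check against the script itself.

```python
from typing import Dict, Optional, Tuple, Union

GPT_MODELS = ("gpt-4.1", "gpt-4o-mini", "gpt-4o", "o3")

# Query type -> (folder_name, experiment flag), following the parameter notes.
QUERY_SETTINGS: Dict[str, Tuple[str, str]] = {
    "diagnosing": ("Infeas", "internal_experiment"),
    "retrieval": ("Feas", "internal_experiment"),
    "sensitivity": ("Feas", "internal_experiment"),
    "what-if": ("Feas", "internal_experiment"),
    "why-not": ("Feas", "external_experiment"),
}

# Table 4 row label -> flag that removes the corresponding component.
ABLATION_FLAGS: Dict[str, str] = {
    "w/o Predefined Functions": "ablation",
    "w/o Syntax Reminders": "skip_syntax",
    "w/o Illustrator": "skip_description",
}


def build_config(
    query: str, gpt_model: str, ablation_row: Optional[str] = None
) -> Dict[str, Union[str, bool]]:
    """Return parameter values for one Table 2 or Table 4 row (hypothetical helper)."""
    if gpt_model not in GPT_MODELS:
        raise ValueError(f"unsupported gpt_model: {gpt_model!r}")
    if query not in QUERY_SETTINGS:
        raise ValueError(f"unknown query type: {query!r}")
    if ablation_row is not None and ablation_row not in ABLATION_FLAGS:
        raise ValueError(f"unknown ablation row: {ablation_row!r}")
    # Table 4 reports only the "w/o Illustrator" ablation for Why-not queries.
    if query == "why-not" and ablation_row not in (None, "w/o Illustrator"):
        raise ValueError("Table 4 has no such ablation row for Why-not queries")

    folder_name, experiment_flag = QUERY_SETTINGS[query]
    config: Dict[str, Union[str, bool]] = {
        "folder_name": folder_name,
        "gpt_model": gpt_model,
        "interpreter_experiment": False,
        "internal_experiment": False,
        "external_experiment": False,
        "ablation": False,
        "skip_syntax": False,
        "skip_description": False,
    }
    config[experiment_flag] = True
    if ablation_row is not None:
        config[ABLATION_FLAGS[ablation_row]] = True
    return config
```

For example, build_config("why-not", "o3", "w/o Illustrator") sets folder_name to "Feas" and turns on external_experiment and skip_description while leaving every other flag False. Table 1 is left out because it only needs interpreter_experiment=True.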
The stats_detail.pkl files that store LLM-generated answers are used to analyze the error distribution in Table 3.
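This README does not document the object stored inside stats_detail.pkl, so before analyzing the error distribution it can help to load a file and inspect its top-level structure. The sketch below uses only the standard library. load_stats_detail, describe, and tally_labels are hypothetical helpers, and tally_labels assumes that you have attached an error label to each answer record under a dict key such as "error_type" (also an assumption). Only unpickle files you trust, since loading a pickle can execute arbitrary code.

```python
import pickle
from collections import Counter
from pathlib import Path
from typing import Any, Dict, Iterable, Union


def load_stats_detail(path: Union[str, Path]) -> Any:
    """Load a stats_detail.pkl file. Only do this for files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


def describe(obj: Any, max_items: int = 3) -> Dict[str, Any]:
    """Summarize the top-level structure of an unpickled object."""
    if isinstance(obj, dict):
        return {"type": "dict", "size": len(obj), "sample_keys": list(obj)[:max_items]}
    if isinstance(obj, (list, tuple)):
        first = type(obj[0]).__name__ if obj else None
        return {"type": type(obj).__name__, "size": len(obj), "first_item_type": first}
    return {"type": type(obj).__name__}


def tally_labels(records: Iterable[Dict[str, Any]], key: str = "error_type") -> Counter:
    """Count error labels, assuming each record is a dict annotated under `key`."""
    return Counter(record[key] for record in records if key in record)
```

A typical first step is print(describe(load_stats_detail("stats_detail.pkl"))), adjusting the path to wherever run_exp.py wrote the file.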
Ongoing Development
This code is under ongoing development in the authors' OptiChat repository (https://github.com/li-group/OptiChat). The source code in this snapshot corresponds to v0.1.
Cite
Article: https://doi.org/10.1287/ijds.2025.0074
Software and Data Repository: https://doi.org/10.1287/ijds.2025.0074.cd
License
Copyright (c) 2025 Chen, Constante Flores, Mantri, Kompalli, Ahluwalia, Li
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

