
# README for Replication of “Online Learning with Sample Selection Bias”
by Divya Singhvi, Somya Singhvi

This repository contains Jupyter notebooks that replicate the experiments and figures presented in “Online Learning with Sample Selection Bias”. The manuscript contains 7 figures. The notebooks can be used to generate these figures or to run the baseline algorithms used in the analysis. The files and their purposes are described below.

## File Descriptions

### Figure Files
These files are used to generate the figures in the paper. Each notebook corresponds to specific figures, as indicated by the file names:

- **Figure_1.ipynb**: Generates Figure 1 in the paper.
- **Figure_3_4_5.ipynb**: Generates Figures 3, 4, and 5 in the paper.
- **Figure_6.ipynb**: Generates Figure 6 in the paper.
- **Figure_7.ipynb**: Generates Figure 7 in the paper.

NOTE: Figure 2 in the manuscript is a pictorial representation of the algorithm and therefore requires no data or code for replication.

### Run Files
The following notebooks implement various baseline algorithms and helper functions used in the experiments:

- **run_sample.ipynb**: Provides a function that samples from the empirical distribution of features. It is used primarily for replicating Figure 7.
- **rungreedybandit_donate.ipynb**: Implements the greedy bandit baseline algorithm.
- **runOFUL_donate.ipynb**: Implements the Optimism in the Face of Uncertainty for Linear models (OFUL) algorithm.
- **runpriordependentTS_donate.ipynb**: Implements a prior-dependent Thompson Sampling (TS) algorithm.
- **runpriorfreeTS_donate.ipynb**: Implements a prior-free Thompson Sampling (TS) algorithm.
- **runSSB_donate.ipynb**: Implements the proposed Sample Selection Bandit (SSB) algorithm.
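Sampling from an empirical distribution of features, as described for run_sample.ipynb, can be illustrated with a minimal sketch. This is not the code from the notebook: the function name `sample_empirical`, its parameters, and the toy feature vectors are hypothetical, and the sketch assumes that each observed feature vector is drawn uniformly with replacement. It uses only the standard library so that it runs without extra dependencies.

```python
import random


def sample_empirical(rows, n_samples, seed=None):
    """Draw n_samples feature vectors uniformly, with replacement, from the observed rows.

    Hypothetical helper for illustration; not taken from run_sample.ipynb.
    """
    if not rows:
        raise ValueError("rows must contain at least one observation")
    rng = random.Random(seed)  # a private generator keeps draws reproducible
    return [rng.choice(rows) for _ in range(n_samples)]


# Toy feature vectors (made up for this example, not from the donations data).
observed = [(1.0, 0.0), (0.5, 2.0), (3.0, 1.5)]
draws = sample_empirical(observed, n_samples=5, seed=0)
print(draws)
```

Fixing the seed makes repeated runs return the same draws, which helps when comparing figures across runs.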

### Data Files
- **Source_data_file_A.csv**: Donations data from the GoFundMe platform, collected by Sisco and Weber (2019).

## Usage Instructions
- **File Placement**: Ensure that all the notebooks are placed in the same directory.
- **Dependencies**: Install the required Python libraries, including numpy, pandas, and matplotlib.
- **Generating Figures**: Run the Figure_*.ipynb files to generate the corresponding figures from the paper. Where applicable, these files call the helper notebooks to evaluate the benchmark algorithms, so the helper notebooks must be present to reproduce the experimental results.
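Before running the figure notebooks, the setup can be checked with a short script such as the sketch below. The notebook names and package names are those listed in this README; the helper names `missing_files` and `missing_packages` are hypothetical, and the script assumes it is run from the directory that holds the notebooks.

```python
import importlib.util
from pathlib import Path

# Notebook names as listed in this README.
NOTEBOOKS = [
    "Figure_1.ipynb",
    "Figure_3_4_5.ipynb",
    "Figure_6.ipynb",
    "Figure_7.ipynb",
    "run_sample.ipynb",
    "rungreedybandit_donate.ipynb",
    "runOFUL_donate.ipynb",
    "runpriordependentTS_donate.ipynb",
    "runpriorfreeTS_donate.ipynb",
    "runSSB_donate.ipynb",
]
PACKAGES = ["numpy", "pandas", "matplotlib"]


def missing_files(directory, names=NOTEBOOKS):
    """Return the names that are not present as files in the given directory."""
    directory = Path(directory)
    return [name for name in names if not (directory / name).is_file()]


def missing_packages(packages=PACKAGES):
    """Return the packages that cannot be found by the import system."""
    return [pkg for pkg in packages if importlib.util.find_spec(pkg) is None]


print("Missing notebooks:", missing_files("."))
print("Missing packages:", missing_packages())
```

Both lists should be empty before a figure notebook is run.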

## Output
Each notebook saves its figures to an Output directory when it is executed. Make sure this directory exists, or that the notebook is able to create it.
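If the Output directory is not created automatically, it can be created ahead of time. The sketch below assumes the directory is named `Output` and sits next to the notebooks; the helper name `ensure_output_dir` and the file name `Figure_1.png` are hypothetical.

```python
from pathlib import Path


def ensure_output_dir(name="Output"):
    """Create the output directory if it does not exist and return its path."""
    out = Path(name)
    out.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return out


out_dir = ensure_output_dir()
figure_path = out_dir / "Figure_1.png"  # hypothetical output file name
# With matplotlib, a figure object could then be saved with: fig.savefig(figure_path)
```

Because `exist_ok=True` is set, calling the helper repeatedly is safe.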

## Notes
- Results may vary slightly depending on the Python version. To ensure smooth execution, use Python 3.0.
- If additional datasets or configuration files are required, their details will be provided in the notebooks.
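Since results may vary with the Python version, it can help to record the environment alongside the generated figures. The sketch below is one way to do so; the function name `environment_report` is hypothetical, and it assumes Python 3.8 or later, where `importlib.metadata` is part of the standard library.

```python
import sys
from importlib import metadata


def environment_report(packages=("numpy", "pandas", "matplotlib")):
    """Return the Python version and the installed version of each package (None if absent)."""
    report = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # package is not installed in this environment
    return report


for name, version in environment_report().items():
    print(f"{name}: {version}")
```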

## References
Sisco, Matthew R., and Elke U. Weber. 2019. Examining charitable giving in real-world online donations. Nature Communications 10(1), 1–8.