On Statistical Discrimination as a Failure of Social Learning: A Multiarmed Bandit Approach
Abstract
We analyze statistical discrimination in hiring markets using a multiarmed bandit model. Myopic firms face workers arriving with heterogeneous observable characteristics. The association between the worker’s skill and characteristics is unknown ex ante; thus, firms need to learn it. Laissez-faire causes perpetual underestimation: minority workers are rarely hired, and therefore, the underestimation tends to persist. Even a marginal imbalance in the population ratio frequently results in perpetual underestimation. We demonstrate that a subsidy rule that is implemented as temporary affirmative action effectively alleviates discrimination stemming from insufficient data.
This paper was accepted by Nicolas Stier-Moses, Management Science Special Issue on The Human-Algorithm Connection.
Funding: This work was supported by the Social Sciences and Humanities Research Council of Canada [Grant 430-2020-00088] and JST ERATO [Grant JPMJER2301], Japan.
Supplemental Material: The online appendix and data files are available at https://doi.org/10.1287/mnsc.2022.00893.
1. Introduction
Statistical discrimination refers to discrimination against minorities practiced by fully rational and nonprejudiced agents. Previous studies have shown that, even in the absence of prejudice, discrimination can persist for various reasons, including the discouragement of human capital investment (Arrow 1973, Foster and Vohra 1992, Coate and Loury 1993, Moro and Norman 2004), information friction (Phelps 1972, Cornell and Welch 1996, Bardhi et al. 2020), and search friction (Mailath et al. 2000, Che et al. 2019). The literature has proposed various affirmative-action policies to remedy statistical discrimination, many of which have been implemented in practice.
This paper demonstrates that statistical discrimination may appear as a failure of social learning. We endogenize the evolution of biased beliefs and analyze their consequences. Our model assumes that (i) all firms (decision makers) are fully rational and nonprejudiced (i.e., attempt to hire the most productive worker), and (ii) all workers are ex ante symmetric. In such an environment, an unbiased decision policy—hiring workers with superior skills—satisfies numerous fairness notions articulated in scholarly literature, including equalized odds and demographic parity. It also achieves efficiency by maximizing each firm’s payoff. However, the long-term persistence of biased beliefs could still occur. This paper demonstrates that temporary affirmative actions can effectively enhance both welfare and equality.
Although our model applies more broadly, we use the terminology of hiring markets to describe our model. We develop a multiarmed bandit model of social learning, in which many myopic and short-lived firms sequentially make hiring decisions. In each round, a firm hires one worker from a set of candidates. Each firm’s utility is determined by the hired worker’s skill, which cannot be observed directly until employment. However, as in the standard statistical discrimination model, each worker also has observable characteristics associated with their unobservable skills. Firms learn the statistical association between characteristics and skills using data pertaining to past hiring cases (shared through, e.g., private communication, social media, and recommendation letters) and use the estimators to predict the skills of candidates.
Each worker belongs to a group that represents, for example, their gender, race, and ethnicity. We assume that the characteristics of workers who belong to different groups should be interpreted differently. This assumption is realistic. First, previous studies have revealed that underrepresented groups receive unfairly low evaluations.1 When these evaluations are used as the observable characteristics, firms should be aware of the potential bias. Second, evaluations may reflect differences in cultures, living environments, and social systems (Precht 1998, Al-Ali 2004). For instance, firms need to be conversant with the norms of drafting recommendation letters to interpret them accurately. Therefore, observable characteristics, such as curriculum vitae, exam scores, grading reports, recommendation letters, and so forth, might convey starkly different implications despite their similar presentations. If firms are unbiased and cognizant of these potential biases, they should adapt their interpretation methods for these characteristics, applying varied statistical models to different groups.
When firms learn the statistical association from data, with some probability, the minority group is underestimated because of a large estimation error raised by insufficient data. Once the minority group is underestimated, it is difficult for a minority worker to appear to be the best candidate—even if he has the greatest skill among the candidates, the firm often dismisses this fact and tends to hire a majority worker. As long as firms hire only majority workers, society cannot learn about the minority group; thus, the imbalance persists even in the long run. We call this phenomenon perpetual underestimation.
We use a linear contextual bandit model to analyze the consequences of social learning. To gauge policy performance, we utilize regret, a widely adopted measure in machine-learning literature that assesses welfare loss relative to the optimal decision rule. Regret arises if proficient minority workers are overlooked because of biased estimates by firms; hence, regret not only signifies efficiency but also encapsulates fairness. We discuss the relationship between regret and fairness criteria in Section 7.
We focus on how regret grows as the total number of firms (denoted by N) increases. When regret is sublinear in N, firms make fair and efficient decisions in the long run. We first analyze the equilibrium consequence of laissez-faire (LF) (no policy intervention). When the groups are ex ante symmetric and the population ratio is equal, laissez-faire results in sublinear regret.2 However, when the population ratio is unbalanced, this no longer holds, and the expected regret is nearly linear in N.
Toward fair and efficient social learning, we propose a subsidy rule based on the idea of the upper confidence bound (UCB) method. UCB is an effective solution for balancing exploration and exploitation (Lai and Robbins 1985, Auer et al. 2002). By incentivizing firms to take actions that are consistent with the recommendations of the UCB, social learning can achieve sublinear regret in the long run. We demonstrate that the UCB mechanism has an expected regret of $\tilde{O}(\sqrt{N})$. The subsidy required to implement the UCB mechanism is also $\tilde{O}(\sqrt{N})$.
Improving on the UCB mechanism, this paper proposes a hybrid mechanism, which lifts affirmative action once a sufficiently rich data set has been collected. The hybrid mechanism takes advantage of spontaneous exploration: once firms obtain a certain amount of data, the diversity of workers’ characteristics naturally promotes learning about the minority group. As a result, the hybrid mechanism attains sublinear regret while requiring a substantially smaller subsidy.
We also analyze the Rooney Rule, which requires each firm to interview at least one minority candidate as a finalist for each job opening. We demonstrate that although the Rooney Rule resolves perpetual underestimation by increasing the probability of hiring minority candidates, it sometimes unjustly deprives productive majority workers of employment opportunities, suggesting reverse discrimination.
This paper is framed as a positive analysis elucidating how discrimination arises from social learning conducted by small, rational, and unbiased firms. Alternatively, our study could be viewed as a normative analysis showcasing an efficient hiring policy targeting the long-term average skill of workers hired by a large firm (as explored by Li et al. 2020). For this latter scenario, our results for the hybrid mechanism indicate that a firm can cease affirmative action once it has accumulated reasonably comprehensive information about minority workers.
2. Related Literature
2.1. Statistical Discrimination
Various studies have analyzed statistical discrimination both theoretically (Phelps 1972, Arrow 1973, Foster and Vohra 1992, Coate and Loury 1993, Cornell and Welch 1996, Mailath et al. 2000) and experimentally (Neumark 2018 is an excellent survey). We contribute to this literature by articulating a new channel of discrimination: endogenous data imbalance and insufficiency. Similar to previous studies, we assume otherwise ex ante identical individuals from different groups to demonstrate how discrimination evolves and persists. Meanwhile, our results provide further indication that demographic minorities suffer from discrimination as an inevitable consequence of laissez-faire.
Hu and Chen (2018) examine a dynamic reputation model in a labor market, where workers can endogenously select their skill level. As highlighted by Foster and Vohra (1992) and Coate and Loury (1993), statistical discrimination can potentially discourage minorities from enhancing their skills. Implementing a fairness constraint through affirmative action at the entry level may rectify inequalities within the entire labor market. Our study adds to this body of literature by demonstrating that short-term affirmative action successfully tackles inefficiency and inequality, even when the skill level is fixed.
Kannan et al. (2019) study how a college can design an admission and grading policy to achieve fair employment, assuming employers form a Bayesian belief about students’ skills based on the information provided by the college. We also consider a government that introduces an affirmative-action policy taking into account endogenous responses. Che et al. (2019) examine a rating-guided market, demonstrating that feedback loops can cause discriminatory inferences concerning social groups. We identify endogenously created informational disparities because of feedback loops within a distinct model inspired by a hiring market, deliberate on the underlying causes (demographic imbalance) that instigate discrimination, and propose policy solutions.
Bohren et al. (2019, 2023) and Monachou and Ashlagi (2019) have demonstrated how misspecified beliefs about groups generate discrimination. Thus far, this literature has attributed belief misspecification to bounded rationality. In contrast, we demonstrate that misspecified beliefs may evolve and persist endogenously. Through a laboratory experiment, Dianat et al. (2022) reveal that affirmative action’s impact becomes fleeting if the measure is discontinued before beliefs undergo transformation. Our hybrid mechanism offers a resolution by optimally choosing the timing to terminate the program.
2.2. Social Learning
The economics literature has extensively studied herding, information cascades, and social learning (e.g., Banerjee 1992, Bikhchandani et al. 1992, Smith and Sørensen 2000). Additionally, various papers have studied improvements to social welfare through subsidies for exploration (e.g., Frazier et al. 2014, Kannan et al. 2017) and selective information disclosure (e.g., Kremer et al. 2014, Papanastasiou et al. 2018, Immorlica et al. 2020, Mansour et al. 2020). We propose novel policy interventions that improve the fairness and efficiency of social learning.
2.3. Multiarmed Bandit
A multiarmed bandit problem stems from the literature of statistics (Thompson 1933, Robbins 1952). This problem is driven by the question of how a single long-lived decision maker can maximize his payoff by balancing exploration and exploitation. More recently, the machine-learning community has proposed the contextual bandit framework, in which payoffs associated with “arms” (actions) depend not only on the hidden state but also on additional information, referred to as “contexts” (Abe and Long 1999, Langford and Zhang 2008).3
Several previous studies have considered a linear contextual bandit problem and studied the performance of a “greedy” algorithm, which makes decisions myopically in accordance with the current information. Because firms take greedy actions under laissez-faire, their results are also relevant to our model. Bastani et al. (2021) and Kannan et al. (2018) have shown that a greedy algorithm leads to sublinear regret if the contexts are diverse enough. We characterize the relationship between the diversity of contexts and the rate of learning. Moreover, we show that the population ratio is crucial to the regret rate. As an efficient intervention, Kannan et al. (2017) consider a contextually fair UCB-based subsidy rule. Although our subsidy policy also originates from the idea of UCB, we establish the hybrid mechanism that reduces budget expenditure by utilizing spontaneous exploration.
The multiarmed bandit approach has recently found applications in labor market analyses. Bardhi et al. (2020) demonstrate that a minor difference in initial beliefs about each worker’s type can ultimately yield a substantial disparity in workers’ payoffs. Johari et al. (2018) examine how a labor platform can discern workers’ skills to attain an optimal worker assignment when the platform only observes the outcomes generated by teams, not individual workers.
Li et al. (2020) portray a large firm’s hiring process as a multiarmed bandit problem and empirically compare the performance of the status quo (screening via manual work), a greedy policy, and a UCB method. They reveal that a UCB method not only screens job applicants efficiently but also preserves diversity. Their findings suggest that a UCB method is both fairer and more efficient when implemented by a large firm. Interpreting this paper as a study of an efficient hiring policy by a large firm, our results enhance Li et al. (2020) by providing theoretical foundations that outline the performances of a greedy policy (corresponding to laissez-faire) and a UCB method. Moreover, we illustrate that affirmative action can be discontinued shortly by characterizing the performance of a hybrid mechanism.
2.4. Algorithmic Fairness
The algorithmic-fairness literature has implicitly assumed exogenous asymmetry in worker skills and pursued approaches to correct between-group inequality. To this end, “discrimination-aware” constraints such as equalized odds (Hardt et al. 2016) and demographic parity (Pedreschi et al. 2008, Calders and Verwer 2010) have been proposed, with several papers applying these constraints in the context of multiarmed bandit problems (Joseph et al. 2016) or more general sequential learning (Raghavan et al. 2018, Bechavod et al. 2019, Chen et al. 2020). Although these fairness goals conflict in general, we analyze an environment in which many fairness goals are aligned and demonstrate how affirmative action improves them.
3. Model
3.1. Basic Setting
We develop a linear contextual bandit problem with myopic firms (agents). We consider a situation where N firms (indexed by $n = 1, \dots, N$) sequentially hire one worker each.4 In each round n, a set of workers I(n) (i.e., arms) arrives. Each worker takes no action, and firm n hires exactly one worker $i \in I(n)$. Both firms and workers are short-lived. Upon round n ending, firm n’s payoff is finalized, and all rejected workers leave the market.5
Each worker belongs to a group $g \in G$. We assume that the population ratio is fixed: for every round n, the number of candidates belonging to group g is a constant $K_g$, and $K := \sum_{g \in G} K_g$. Slightly abusing the notation, we denote the group worker i belongs to by g(i). Each worker i also has observable characteristics $x_i \in \mathbb{R}^d$, with d as their dimension. Finally, each worker i also has a skill $y_i \in \mathbb{R}$ that is not observable until worker i is hired. The characteristics and skills are random variables.
Because each firm’s payoff is equal to the hired worker’s skill $y_i$ (plus the subsidy assigned to worker i as affirmative action, if any), firms want to predict the skill $y_i$ based on the characteristics $x_i$. We assume that characteristics and skills are associated as $y_i = x_i^\top \theta_{g(i)} + \epsilon_i$, where $\theta_g \in \mathbb{R}^d$ is a coefficient parameter, and $\epsilon_i$ is an i.i.d. unpredictable error term. We assume $\|\theta_g\|_2 \le S$ for some $S > 0$, where $\|\cdot\|_2$ is the standard L2-norm. Because $\epsilon_i$ is unpredictable, $q_i := x_i^\top \theta_{g(i)}$ is the best predictor of worker i’s skill $y_i$.
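The data-generating process above can be sketched in a few lines of code. The dimension, coefficient values, and the normal error distribution below are illustrative assumptions for this sketch, not primitives of the model:

```python
import numpy as np

# Illustrative sketch of the linear skill model y_i = x_i' theta_g(i) + eps_i.
# d, the coefficient values, and sigma_eps are hypothetical choices.
rng = np.random.default_rng(0)
d = 3
theta = {1: np.array([1.0, 0.5, -0.2]),   # group-1 coefficient (assumed)
         2: np.array([1.0, 0.5, -0.2])}   # symmetric groups: theta_1 = theta_2

def draw_worker(group, sigma_eps=1.0):
    """Draw one worker: characteristics x, best predictor q, realized skill y."""
    x = rng.normal(size=d)                # observable characteristics
    q = theta[group] @ x                  # q_i = x_i' theta_g(i): best predictor
    y = q + rng.normal(scale=sigma_eps)   # skill, observed only after hiring
    return x, q, y
```

A firm observing x can predict q but never the error term, which is why q_i is the relevant benchmark throughout.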
The coefficient parameters $(\theta_g)_{g \in G}$ are initially unknown. Hence, unless firms share information about past hires, firms are unable to predict each worker’s skill $y_i$. We assume that firms share information about past hiring cases.6 Accordingly, when firm n makes a decision, in addition to the characteristics and groups of the current candidates $(x_i, g(i))_{i \in I(n)}$, firm n observes the characteristics, groups, and skills of previously hired workers. We refer to all realizations of these variables as the history in round n and denote it by h(n). Formally, h(n) is given by
$h(n) = \big( (x_{\iota(m)}, g(\iota(m)), y_{\iota(m)})_{m < n}, \ (x_i, g(i))_{i \in I(n)} \big),$
where $\iota(m)$ denotes the worker hired in round m.
Note that h(n) does not include information about (i) the worker hired by firm n, or (ii) that worker’s actual skill. This is because the notation h(n) represents the information set firm n faces when it makes a hiring decision. We denote the set of all the possible histories in round n by H(n). The firm’s decision rule for hiring and the government’s subsidy rule are defined as a function that maps a history to a hiring decision and the subsidy amount (described later). For notational convenience, we often omit h(n).
3.2. Prediction
We assume that firms are not Bayesian but frequentists. Hence, firms do not have a prior belief about the parameter but estimate it only using the available data set.7
We assume that each firm predicts skill using ridge regression (L2-regularized least squares).8 Let $N_g(n)$ be the number of rounds at which group-g workers are hired before round n. Let $X_g(n) \in \mathbb{R}^{N_g(n) \times d}$ be a matrix that lists the characteristics of the group-g workers hired by round n, and let $Y_g(n) \in \mathbb{R}^{N_g(n)}$ be a vector that lists their skills. For a regularization parameter $\lambda > 0$, we define $V_g(n) := X_g(n)^\top X_g(n) + \lambda I_d$, where $I_d$ denotes the d × d identity matrix. Firm n estimates the parameter as follows:
$\hat{\theta}_g(n) := V_g(n)^{-1} X_g(n)^\top Y_g(n).$
Firm n predicts worker i’s skill by substituting the estimated coefficient for the true one: $\hat{q}_i := x_i^\top \hat{\theta}_{g(i)}(n)$, whereas the true predicted skill is $q_i = x_i^\top \theta_{g(i)}$. Hence, $\hat{\theta}_g(n)$ and $\hat{q}_i$ depend on the history h(n). The ordinary least squares (OLS) estimator corresponds to the ridge estimator with λ = 0. We use the ridge estimator instead of the OLS estimator to stabilize small-sample inference. For example, for some histories, $X_g(n)^\top X_g(n)$ may not have full rank, and the OLS estimator may not be well defined. Even for such histories, the ridge estimator is always well defined.
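A minimal implementation of the group-wise ridge estimator illustrates why it is always well defined; the data below are hypothetical:

```python
import numpy as np

def ridge_estimate(X, Y, lam=1.0):
    """Ridge estimator: theta_hat = (X'X + lam*I_d)^{-1} X'Y.

    Unlike OLS (lam = 0), this is well defined even when X'X is rank
    deficient, e.g., when only a few group-g workers have been hired."""
    d = X.shape[1]
    V = X.T @ X + lam * np.eye(d)      # the matrix V_g(n) in the text
    return np.linalg.solve(V, X.T @ Y)
```

With a well-conditioned design and a small λ, the estimate approaches the true coefficient, matching the OLS solution in the limit.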
For analytical tractability, we assume that for the first $n_0$ rounds, each firm n must hire from a prespecified group, $g_n$. We refer to these first $n_0$ rounds as the initial sampling phase. We assume $n_0$ to be small and treat it as a constant.9 Let $n_0^g := \sum_{n \le n_0} \mathbb{1}[g_n = g]$ be the data size of initial sampling for group g, where $\mathbb{1}[\cdot]$ equals one if the event holds and zero otherwise. The initial sampling phase is exogenous. That is, we ignore the incentives and payoffs of firms and assume that the characteristics x of the hired candidate constitute an i.i.d. sample of the corresponding group. We analyze the mechanism, social welfare, and budget after round $n_0$. The initial sampling phase can be interpreted as data that have already been produced in the past: the welfare cost is already sunk, and the government can no longer intervene in events that have already occurred.
3.2.1. Mechanism.
In addition to worker skills, firms are also concerned about subsidies. We assume that firm preferences are risk neutral and quasilinear. Hence, if firm n hires worker i, its payoff (von Neumann-Morgenstern utility) is given by $y_i + s_i$, where $s_i$ denotes the amount of the subsidy assigned to worker i.
In the beginning of the game, the government commits to a subsidy rule, which maps a history to a subsidy amount for each worker. Hence, once a history h(n) is specified, firm n can identify the subsidy $s_i(h(n))$ assigned to each worker $i \in I(n)$. Firm n attempts to maximize its expected payoff, $\hat{q}_i + s_i(h(n))$.
Firm n’s decision rule $\iota$ specifies the worker that firm n hires after history h(n). We say that a decision rule ι is implemented by a subsidy rule $s_i$ if for all n and h(n), we have
$\iota(h(n)) \in \arg\max_{i \in I(n)} \big( \hat{q}_i + s_i(h(n)) \big).$
Throughout this paper, any ties are broken arbitrarily. We call a pair of a decision rule and subsidy rule a mechanism. We often drop h(n) from the input of decision rule ι when it does not cause confusion.
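As a sanity check in simulations, implementability is straightforward to verify: the hired worker must maximize estimated skill plus subsidy. The helper below is a hypothetical utility, not part of the paper's formalism:

```python
import numpy as np

def is_implemented(choice, est_skill, subsidy, tol=1e-12):
    """Check the implementability condition for one round: the chosen
    worker must maximize estimated skill plus subsidy among candidates."""
    payoff = np.asarray(est_skill) + np.asarray(subsidy)
    return bool(payoff[choice] >= payoff.max() - tol)
```

For example, a subsidy of 1.5 to a worker with estimated skill 1.0 makes hiring that worker consistent when the rival's estimated skill is 2.0.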
3.3. Regret
Regret is a standard measure for evaluating the performance of algorithms in multiarmed bandit models:
$\mathrm{Reg}(N) := \sum_{n=1}^{N} \Big( \max_{i \in I(n)} q_i - q_{\iota(h(n))} \Big).$
Because ϵi is unpredictable, it is natural to evaluate the performance of the algorithm (or the equilibrium consequence of the policy intervention) by comparing it with qi. If the parameter were known, each firm could easily calculate qi for each worker i and hire the best worker, $\arg\max_{i \in I(n)} q_i$. In this case, regret would be zero. The goal of the policy design is to establish a mechanism that minimizes the expected regret $\mathbb{E}[\mathrm{Reg}(N)]$, where the expectation is taken over random draws of workers. This aim is equivalent to maximizing the sum of the skills of the workers hired.
Following the literature, we often evaluate performance by the limiting behavior (order) of the expected regret. A decision rule ι is said to have sublinear regret if $\mathbb{E}[\mathrm{Reg}(N)] = O(N^a)$ for some a < 1. Small regret implies not only efficiency but also fairness: regret measures the disparate impact that is not justified by skill disparity, and sublinear regret is achieved only if firms hire the most skillful workers without regard to group membership. The relationship between regret and fairness is discussed further in Section 7.
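In simulation, per-round regret can be accumulated directly from the predicted skills. This sketch assumes access to the true q_i, which is available to the analyst but not to firms:

```python
import numpy as np

def cumulative_regret(q_rounds, hires):
    """Total regret: sum over rounds of (best predicted skill - hired one).

    q_rounds: list of arrays, q_rounds[n][i] = q_i for round-n candidates.
    hires: list of hired candidate indices, one per round."""
    return float(sum(np.max(q) - q[h] for q, h in zip(q_rounds, hires)))
```

Regret is zero exactly when the best candidate is hired in every round, which is the benchmark the mechanisms below are measured against.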
3.4. Budget
Some of the policies we study incentivize exploration through subsidies. The total budget required by a subsidy rule is also an important policy concern. The total amount of the subsidy is given by $\sum_{n=1}^{N} s_{\iota(h(n))}(h(n))$.
4. Laissez-Faire
This section analyzes the equilibrium under laissez-faire, that is, the consequence of social learning in the absence of policy intervention.
Definition (Laissez-Faire). Under laissez-faire, the subsidy is zero for every worker, and each firm n hires a worker with the greatest estimated skill: $\iota^{\mathrm{LF}}(h(n)) \in \arg\max_{i \in I(n)} \hat{q}_i$.
Laissez-faire makes no intervention. Each firm hires the worker with the greatest estimated skill, as predicted by the current data set. The multiarmed bandit literature refers to the laissez-faire decision rule as the greedy algorithm.
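The greedy (laissez-faire) decision rule reduces to an argmax over estimated skills. A minimal sketch, with hypothetical group estimates:

```python
import numpy as np

def greedy_hire(candidates, theta_hat):
    """Laissez-faire (greedy) decision rule: hire the candidate with the
    greatest estimated skill x_i' theta_hat_g(i); ties broken by index.

    candidates: list of (group, x) pairs; theta_hat: dict group -> estimate."""
    scores = [theta_hat[g] @ x for g, x in candidates]
    return int(np.argmax(scores))
```

Note that a minority candidate with a higher true skill can still lose whenever the group-2 estimate is biased downward, which is the feedback loop studied below.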
4.1. Symmetry and Diverse Characteristics
To illustrate a failure of social learning, we make three assumptions. First, as a minimal environment to analyze discrimination, we focus on the two-group case.
Assumption 1 (Two Groups). There are two groups: $G = \{1, 2\}$.
When we consider asymmetric equilibria, we refer to group 1 as the majority (unprotected) group and group 2 as the minority (protected) group. The two-group assumption enables the elucidation of how the minority group is discriminated against.
Second, we assume that groups are symmetric.
Assumption 2 (Group Symmetry). The groups are symmetric: $\theta_1 = \theta_2$, and the distribution of characteristics is identical across the two groups.
Note that although we assume that groups are symmetric, firms do not know the true parameters and therefore apply different statistical models to different groups. That is, even though the true coefficients are identical ($\theta_1 = \theta_2$), firms estimate them separately; thus, the values of the estimated coefficients typically differ ($\hat{\theta}_1(n) \neq \hat{\theta}_2(n)$).
Although Assumption 2 is unrealistic (because the characteristics should evidently be interpreted differently), it is useful for elucidating how laissez-faire nourishes statistical discrimination. Under Assumption 2, agents are ex ante identical (as assumed in Arrow 1973, Foster and Vohra 1992, Coate and Loury 1993, Moro and Norman 2004), and therefore the differences we observe in the equilibrium are entirely attributed to social learning.
Furthermore, when groups are symmetric, disparate impact is unambiguously unfair. It is well known that popular fairness notions aim at different goals and are compatible with each other only in highly constrained special cases (see, e.g., Kleinberg et al. 2017). The symmetric environment specified by Assumption 2 is one such exception: in this environment, sublinear regret implies many fairness goals simultaneously, even though they conflict with each other in general (see Section 7). Because this paper’s focus is not to debate which of the various fairness notions should be respected, we concentrate on the symmetric environment.10
Third, we assume that characteristics are normally distributed, and therefore, the distribution is nondegenerate. This assumption captures the diversity of workers.
Assumption 3 (Diverse Characteristics). Each worker i’s characteristics $x_i$ are drawn i.i.d. from a (nondegenerate) multivariate normal distribution.
We expect essentially the same results to hold more generally as long as the characteristics are sufficiently diverse. Note that when we have both Assumptions 2 and 3, there exist $\mu$ and $\Sigma$ such that $\mu_g = \mu$ and $\Sigma_g = \Sigma$ for all $g \in G$. Hence, $x_i \sim \mathcal{N}(\mu, \Sigma)$ for all i.
4.2. Perpetual Underestimation
To determine whether social learning incurs linear expected regret, it is useful to check whether it results in perpetual underestimation with a significant probability.
Definition (Perpetual Underestimation). Group $g_0$ is perpetually underestimated if no worker from group $g_0$ is hired in any round after the initial sampling phase.
When group g0 is perpetually underestimated, no worker from group g0 is hired after the initial sampling phase. If social learning generates perpetual underestimation with a significant probability, then linear expected regret often results. In particular, under Assumption 2, perpetual underestimation against any group implies that firms fail to hire the best candidate in a constant fraction of rounds, which is linear in N. Hence, a constant probability of perpetual underestimation (independent of N) precipitates linear expected regret.
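Given a simulated hiring history, perpetual underestimation is straightforward to detect. In this sketch, n0 stands for the (assumed) length of the initial sampling phase:

```python
def perpetually_underestimated(hired_groups, group, n0):
    """True iff `group` is never hired after the initial sampling phase.

    hired_groups: sequence of hired workers' groups, one per round
    (0-indexed; the first n0 rounds are the initial sampling phase)."""
    return all(g != group for g in hired_groups[n0:])
```

Running many such simulations and averaging this indicator estimates the probability of perpetual underestimation, the quantity bounded in Theorems 1 and 2.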
4.3. Sublinear Regret with Balanced Population
This section analyzes the case of only one candidate arriving from each group during each period. The contextual variation implicitly urges firms to explore all the groups with some frequency. Consequently, laissez-faire has sublinear regret, implying that statistical discrimination is eventually resolved.
Theorem 1. Suppose that Assumptions 1–3 hold and that exactly one candidate arrives from each group in every round. Then, laissez-faire has sublinear expected regret.
Let $\Phi$ be the cumulative distribution function of the standard normal distribution. The constant factor of the regret bound is inversely proportional to the probability of spontaneous exploration, which is expressed through $\Phi$.
All proofs are presented in Online Appendix E.
To prove Theorem 1, we characterize the condition under which underestimation is spontaneously resolved. Let indices $i_1$ and $i_2$ denote the majority candidate and the minority candidate. With a constant (i.e., independent of N) probability, the minority group is underestimated in early rounds because of a bad realization of the error term (i.e., $\hat{\theta}_2$ is misestimated such that minority skills are systematically underpredicted). Even in such a case, there is some probability of the minority candidate being hired. Because characteristics are diverse, with some probability, the majority candidate $i_1$ is not very good (i.e., $q_{i_1}$ is small). In such a round, $\hat{q}_{i_2} > \hat{q}_{i_1}$ holds despite group 2 being underestimated, and the minority candidate $i_2$ is hired. In that case, firms update their belief about the minority group, leading to a resolution of underestimation. Such events occur more frequently when workers have more diverse characteristics.
As anticipated by the theory of least squares, the standard deviation of the estimator $\hat{\theta}_g(n)$ is proportional to $1/\sqrt{\lambda_{\min}(V_g(n))}$, where $\lambda_{\min}(\cdot)$ denotes the minimum eigenvalue of a matrix, and we demonstrate that the diameter of the confidence region shrinks at this rate. The regret incurred per round is of this order, which yields the total regret bound.
Theorem 1 indicates that statistical discrimination is resolved spontaneously when candidate variation is large. At first glance, this appears to contradict widely known results stating that laissez-faire (greedy) may lead to suboptimal outcomes in bandit problems because of underexploration. However, the variation in characteristics naturally incentivizes selfish agents to explore the underestimated group, and therefore, with some additional conditions, the probability of perpetual underestimation is bounded.
In Theorem 1, we assumed that there is one candidate from each group for tractability. If we instead assume a larger but balanced population, the analysis becomes significantly more challenging because the maximum of normally distributed variables is not normally distributed. However, we conjecture that a similar result holds more generally because the variance of the expected skill of the best candidate in each group decreases only slowly as the per-group candidate pool K increases.11
Theorem 1 shares certain intuitions with previous research (Kannan et al. 2018, Bastani et al. 2021) demonstrating that variation in contexts (characteristics) improves the performance of the greedy algorithm (laissez-faire) in contextual multiarmed bandit problems. However, our theorem makes no assumptions regarding the length of the initial sampling phase. Theorem 1 in Bastani et al. (2021) corresponds to our paper’s Theorem 1, and we further characterize the constant factor of the regret as a function of the diversity of characteristics.
4.4. Large Regret with Unbalanced Population
Although Theorem 1 implies that statistical discrimination is spontaneously resolved in the long run, it crucially relies on one unrealistic assumption—the balanced population ratio. In many real-world problems, the population ratio is unbalanced, and the protected group is often a demographic minority in the relevant market. We indeed find that the population ratio crucially impacts the equilibrium consequence under laissez-faire.
Theorem 2. Suppose that Assumptions 1–3 hold and that the population ratio is unbalanced, with multiple majority candidates and a single minority candidate in each round. Then, perpetual underestimation of the minority group occurs with a probability that vanishes only polylogarithmically in N, and laissez-faire incurs nearly linear expected regret.
In the proof of Theorem 2, we evaluate the probability that the following two events occur: (i) group 2 is underestimated, and (ii) the characteristics and skills of the hired majority workers are not very bad throughout the rounds (i.e., they remain above a constant threshold c > 0). The probability of (i) vanishes only polylogarithmically in N, and so does the probability that (ii) holds consistently for all rounds. When both (i) and (ii) occur, we always have $\hat{q}_{i_1} > \hat{q}_{i_2}$ (where $i_2$ is the unique minority candidate of round n); thus, the minority worker is never hired. Note that the majority group does not suffer from perpetual underestimation (with a significant probability) because the event that all the minority workers are bad occasionally occurs.
Theorem 2 indicates that we should not be too optimistic about the consequences of laissez-faire. Even a small imbalance in the population ratio could lead to a substantially unfair job allocation. Once the minority group is underestimated and the majority candidate pool is reasonably large, the minority group is afforded no hiring opportunity, perpetuating the underestimation. This insight applies to many real-world problems because unbalanced populations are commonplace.
We conjecture that perpetual underestimation arises with substantial probability in environments broader than the premise of Theorem 2. Specifically, assumptions such as d = 1 are made only for analytical tractability, and (approximately) linear regret should obtain under a weaker set of assumptions. Theorem 2 (i) focuses on perpetual underestimation, which is an extreme form of statistical discrimination, and (ii) evaluates the probability of perpetual underestimation occurring only loosely. In Section 9, we demonstrate that perpetual underestimation occurs with a significant probability even under d = 5, where the premise of Theorem 2 does not hold.
5. The Upper Confidence Bound Mechanism
Section 4 discussed the equilibrium consequences of laissez-faire. We observed that an unbalanced population ratio leads to a substantial probability that underestimation is perpetuated. Policy intervention is needed to improve the social welfare and fairness of the hiring market.
This section proposes a subsidy rule to resolve underestimation. We employ the idea of the UCB algorithm (Lai and Robbins 1985, Auer et al. 2002), which has widely been used in the literature on the bandit problem. The UCB algorithm balances exploration and exploitation by developing a confidence interval for the true reward and evaluating each arm’s performance according to its upper confidence bound to achieve this balance. Firms are generally unwilling to follow the UCB decision rule voluntarily; therefore, the government needs to provide a subsidy to incentivize firms to hire a candidate with the greatest UCB index. This section establishes a UCB-based subsidy rule and evaluates its performance.
The adaptive selection of candidates based on history can induce bias, meaning that the standard confidence bound no longer applies. To overcome this issue, we use martingale inequalities (Peña et al. 2008, Rusmevichientong and Tsitsiklis 2010, Abbasi-Yadkori et al. 2011). We now introduce the confidence set for the true coefficient parameter, $\theta_g$.
(
Abbasi-Yadkori et al. (2011) study the properties of this confidence interval and prove that the true parameter lies in with probability (Lemma 19). By choosing a sufficiently small δ,12 it is “safe” to assess that worker i’s skill is at most
We call the UCB index of worker i’s skill. Intuitively, is worker i’s skill in the most optimistic scenario. The confidence interval shrinks as we obtain more data about group g. Hence, the UCB index converges to true predicted skill as the size of the data set increases.
(
The UCB index is close to the point estimate when society has rich data about group g(i), because is small in such cases. However, when information about group g(i) is insufficient, is much larger than , because the firm is unsure about worker i’s true skill and is large. In this sense, the UCB decision rule provides affirmative action toward underexplored groups.
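To make the construction concrete, the following sketch computes a LinUCB-style index from group-level data: a ridge estimate plus a confidence width of the form x⊤V⁻¹x under the square root. The function name, the fixed width scale `alpha`, and the data layout are our illustrative assumptions; in the paper, the width scale depends on δ, d, and the sample size via the self-normalized martingale bound.

```python
import numpy as np

def ucb_index(x, X_g, y_g, lam=1.0, alpha=1.0):
    """UCB index for a worker with context x from group g.

    X_g, y_g: contexts and realized skills of previously hired
    group-g workers; lam: ridge penalty; alpha: confidence-width
    scale (a stand-in for the delta- and d-dependent scale in the
    self-normalized bound).  Returns (index, point estimate, width).
    """
    d = x.shape[0]
    V = lam * np.eye(d) + X_g.T @ X_g            # regularized Gram matrix
    theta_hat = np.linalg.solve(V, X_g.T @ y_g)  # ridge estimate of coefficients
    point = x @ theta_hat                        # estimated skill
    width = alpha * np.sqrt(x @ np.linalg.solve(V, x))  # confidence width
    return point + width, point, width
```

Note how the width shrinks as group-g data accumulate, so the index converges to the point estimate, mirroring the discussion above.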
The subsidy amount is proportional to the uncertainty surrounding the candidate’s characteristics, which is represented by the confidence interval for . The magnitude of the confidence interval is inversely proportional to .13 Hence, if the data do not vary substantially along a particular dimension of , then that dimension’s prediction can be inaccurate. In such cases, the UCB decision rule recommends hiring a candidate who contributes data in that dimension. For example, when a candidate possesses skills that previous hires lack, the candidate’s UCB index tends to be large.
The UCB decision rule efficiently balances exploration and exploitation. Accordingly, it has sublinear regret in general environments.
(
Three remarks are in order. First, regret is the optimal rate for these sequential optimization problems under partial feedback (Chu et al. 2011). Hence, Theorem 3 states that the UCB decision rule effectively prevents perpetual underestimation and is asymptotically efficient. Second, Theorem 3 relies only on Assumption 3; therefore, the regret under UCB is sublinear even when groups have fundamental disparities beyond their sizes. Third, in contrast to laissez-faire, where the factor depends on the variation of the context (Theorem 1), Theorem 3 provides a reasonably small regret bound even when σx is very small.
To implement the UCB decision rule, we need to satisfy the firms’ obedience condition (1) in conjunction with the UCB decision rule (2). In the following, we propose one of the most straightforward subsidy rules.
(
The UCB index subsidy rule aligns each firm’s incentive with the maximization of the UCB index, thereby incentivizing firms to follow the UCB decision rule.
(
The UCB index subsidy rule is an index policy in the sense that the subsidy amount is independent of the information about rejected workers. In Online Appendix A, we consider a nonindex subsidy rule that implements the UCB decision rule with a smaller budget.
6. The Hybrid Mechanism
The UCB mechanism has one drawback: it continues subsidies in perpetuity. Even for a large n, there remains a gap between the estimated skill and the UCB index . This is undesirable for several reasons. First, introducing a permanent policy is often more politically difficult than introducing a temporary one. Second, long-term distribution of subsidies tends to increase the required budget. Third, the permanent allocation of the subsidy incurs (unmodeled) administrative costs.
To overcome these limitations, we propose the hybrid mechanism, which initially uses the UCB mechanism but switches to laissez-faire by terminating the subsidy at some point. We abandon the UCB phase upon receiving sufficient minority-group data to induce spontaneous exploration. Similar to the UCB mechanism, our hybrid mechanism has regret. Furthermore, its expected total subsidy amount is only , whereas the UCB mechanism needs subsidy.
The construction of the hybrid mechanism is as follows. Let be the size of the confidence bound. Note that corresponds to the amount of the subsidy allocated by the UCB index subsidy rule (Definition 5). The hybrid index is defined as
The hybrid index is literally a “hybrid” of estimated skill and the UCB index . If the difference between the UCB index and estimated skill surpasses the threshold (i.e., ), then the hybrid index is equal to the UCB index . The confidence bound is large when society has insufficient knowledge about group g(i), which is typically the case during early stages of the game. Once this gap falls below the threshold (i.e., ), then the hybrid index switches to the estimated skill .
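This switching logic can be sketched in a few lines. The threshold form C·n^(−a/2), the constant C, and the function name are our stand-ins for the paper’s exact threshold; `width` is the confidence width, which also equals the per-hire subsidy under the UCB index rule.

```python
def hybrid_index(point, width, n, a=0.5, C=1.0):
    """Hybrid index: the UCB index while the confidence width is
    large, the point estimate once the width falls below a
    vanishing threshold (illustrative form C * n**(-a/2))."""
    threshold = C * n ** (-a / 2)
    if width >= threshold:       # early rounds: optimistic (UCB) phase
        return point + width
    return point                 # late rounds: laissez-faire phase
```

Once the data set for a group is rich enough that its width stays below the threshold, the rule behaves exactly like laissez-faire and no subsidy is paid.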
The hybrid decision rule hires the worker who has the greatest hybrid index.
(
Because the hybrid decision rule is a hybrid of the UCB decision rule and the laissez-faire decision rule, it can be implemented by mixing the laissez-faire subsidy rule and the UCB index subsidy rule.
(
The following theorem characterizes the regret and the total subsidies associated with the hybrid mechanism.
(
Furthermore, for any a > 0, the total amount of the subsidy under the hybrid index subsidy rule () is bounded as
The order of the regret under the hybrid decision rule is the same as that of the original UCB, and the subsidy amount is reduced to (with respect to N). This is a substantial improvement over the UCB mechanism, which requires the subsidy.
The threshold for switching from the UCB mechanism to laissez-faire is crucial for guaranteeing the performance of the hybrid mechanism. Our threshold, , is determined such that the hybrid decision rule satisfies proportionality, a new concept that this paper establishes. We prove that the amount of exploration exerted by the hybrid decision rule is proportional to that of the UCB decision rule. This property guarantees that the hybrid rule resolves underestimation and secures the expected regret of . The formal statement appears in Lemma 28 in Online Appendix E.5.
7. Regret and Fairness Criteria
This section analyzes the connection between sublinear-regret social learning and equalized odds and demographic parity, two fairness notions prevalent in the fair machine-learning literature (see, e.g., the survey by Makhlouf et al. 2021). For clarity, we invoke Assumption 1, specifying that group 1 is the majority (unprotected) group and group 2 is the minority (protected) group.
7.1. Sublinear Regret
Thus far, we have mainly used regret as a fairness notion. Recall that a decision rule ι is said to have sublinear regret if for some a < 1. This implies that the decision rule eventually makes an unbiased decision in the sense that each firm hires based on accurately estimated skill predictors, qi, uninfluenced by the workers’ group affiliation. Consequently, sublinear regret implies asymptotic unbiasedness in firms’ decision-making processes.
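In simulations, regret is simply the running sum of per-round skill gaps, and sublinearity means this sum grows slower than n. A minimal helper (the function and argument names are ours):

```python
def cumulative_regret(best_skills, hired_skills):
    """Cumulative regret after each round: the running sum of the
    gap between the best available skill and the hired worker's
    skill.  A decision rule has sublinear regret when this sum
    grows slower than linearly in the number of rounds."""
    total, out = 0.0, []
    for best, hired in zip(best_skills, hired_skills):
        total += best - hired
        out.append(total)
    return out
```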
7.2. Equalized Odds
Equalized odds, defined below, is the fairness notion most directly related to sublinear regret.
(
Equalized odds requires that firms’ hiring practices perform equitably across groups. This implies that if a worker possesses the highest skill predictor qi in a given round, then the probability of her being hired is independent of her group affiliation. Because we are dealing with a social learning problem, imposing this condition for all rounds would be overly restrictive; hence, we only enforce it asymptotically.
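As an illustration, the empirical analog of this criterion compares, across groups, how often the round’s top-skilled candidate is actually hired. The record format and function name below are hypothetical:

```python
def eo_disparity(records):
    """Empirical equalized-odds disparity.

    records: iterable of (group, was_top, was_hired) tuples, one per
    candidate, where was_top flags the candidate with the highest
    true skill predictor q_i in her round.  Returns the absolute
    gap between groups 1 and 2 in P(hired | top-skilled).
    """
    rates = {}
    for g in (1, 2):
        top = [hired for grp, was_top, hired in records if grp == g and was_top]
        rates[g] = sum(top) / len(top) if top else 0.0
    return abs(rates[1] - rates[2])
```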
The following theorem characterizes the relationship between sublinear regret and equalized odds.
(
The intuition is as follows. The violation of equalized odds leads to persistent biased decisions by firms. Consequently, society experiences enduring, nondiminishing regret, which signifies a failure in sublinear-regret learning. Therefore, the realization of sublinear regret inherently necessitates the fulfillment of equalized odds.
7.3. Demographic Parity
Demographic parity is defined as follows.
(
Demographic parity requires the hiring probability to be independent of affiliation with the minority group. When this criterion is met, the proportion of hired workers from group g aligns with the overall population ratio of group-g workers. Analogous to Definition 8, imposing this condition for all rounds would be excessively restrictive; hence, we impose it as an asymptotic condition.
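The empirical analog compares the minority share among hires with the minority’s population share. Again, the record format and function name are hypothetical:

```python
def dp_disparity(records, pop_share_2):
    """Empirical demographic-parity disparity.

    records: iterable of (group, was_hired) pairs; pop_share_2:
    population share of minority (group-2) workers.  Returns the
    absolute gap between the minority share among hires and the
    minority's population share.
    """
    hires = [g for g, hired in records if hired]
    share_2 = sum(1 for g in hires if g == 2) / len(hires) if hires else 0.0
    return abs(share_2 - pop_share_2)
```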
It is widely recognized that numerous fairness notions in machine learning are often at odds with each other and coexist only within very limited contexts (see Kleinberg et al. 2017). Demographic parity and equalized odds are prime examples of this conflict. Demographic parity necessitates equal treatment across groups, ignoring the individual skills of each worker. In contrast, equalized odds demands that firms hire proficient workers based on their skills, uninfluenced by their groups. Clearly, these two objectives are incompatible when the distribution of workers’ skills differs among groups. Because sublinear regret aligns with equalized odds, it generally contradicts demographic parity within general environments.
This paper does not aim to adjudicate between conflicting fairness notions, which cannot be simultaneously satisfied. Therefore, the core discussion assumes that groups have no significant disparities beyond their sizes (Assumption 2). Under this assumption, the successful hiring of the most skilled workers will naturally result in the hired group’s composition reflecting the population ratio. Consequently, sublinear regret, equalized odds, and demographic parity align. The theorem presented subsequently formalizes this.
(
Theorem 7 elucidates that in instances of group symmetry, equalized odds and demographic parity are compatible, eliminating the contention over the choice of fairness notions. Moreover, these two properties are guaranteed by sublinear regret.
Although our discussion has centered on equalized odds and demographic parity, the machine-learning literature has developed numerous other fairness notions. It is anticipated that many of them align within symmetric groups, but typically exhibit conflicts with sublinear regret and equalized odds in more general environments.
8. Rooney Rule
Subsidy rules can address statistical discrimination, but practical implementation challenges persist. In Online Appendix B, we examine the Rooney Rule, which mandates that firms interview at least one candidate from every group. This rule sidesteps the complexities of subsidies and hiring quotas, making it straightforward to apply. We summarize our key findings below.
Our two-stage model facilitates the analysis of the Rooney Rule. In this model, a firm only invites a limited number of candidates for interviews and receives additional signals on their skills. This rule compels firms to interview at least one representative from all groups. The outcomes are mixed. The Rooney Rule reduces the underexploration of minority groups by boosting their hiring probability, thus averting persistent underestimation. Yet, it occasionally and undesirably excludes qualified majority candidates in the first stage. Simulations suggest that phasing out the Rooney Rule can mitigate this drawback. For a comprehensive discussion, refer to Online Appendix B.
9. Simulation
This section presents the outcomes of our simulations.14 Unless otherwise specified, model parameters are set as , λ = 1, and N = 1,000. Group sizes are set to be . The initial sample size is , and the sample size for each group is equal to its population ratio: . We draw 4,000 paths independently for each simulation scenario. The value of δ in the confidence bound is set to 0.1.
9.1. The Effects of Population Ratio
We test how the population ratio impacts the frequency of perpetual underestimation. The decision rule is fixed to LF. We fix the number of minority candidates in each round to two (i.e., ) and vary the number of majority candidates ().
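The dynamic underlying this experiment can be sketched as follows: in each round, a myopic firm hires the candidate with the highest ridge-estimated skill, and only the hired worker’s realized outcome updates her group’s data. This is a simplified stand-in for the published simulation code, not a reproduction of it; the parameter names, the warm-start size `n0`, and the symmetric true coefficients are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_lf(K1, K2=2, N=1000, d=1, lam=1.0, n0=10):
    """One laissez-faire path with K1 majority and K2 minority
    candidates per round.  Returns the number of minority hires.
    Few minority hires over many rounds is the mechanism behind
    perpetual underestimation."""
    theta = np.ones(d)                         # common true coefficients
    # per-group sufficient statistics, warm-started with a few samples
    V = {g: lam * np.eye(d) for g in (1, 2)}
    b = {g: np.zeros(d) for g in (1, 2)}
    for g, m in ((1, n0), (2, max(1, n0 * K2 // K1))):
        for _ in range(m):
            x = rng.normal(size=d)
            V[g] += np.outer(x, x)
            b[g] += x * (x @ theta + rng.normal())
    hires2 = 0
    for _ in range(N):
        groups = [1] * K1 + [2] * K2
        xs = rng.normal(size=(K1 + K2, d))
        est = [xs[i] @ np.linalg.solve(V[g], b[g]) for i, g in enumerate(groups)]
        i = int(np.argmax(est))                # myopic, greedy hiring choice
        g, x = groups[i], xs[i]
        V[g] += np.outer(x, x)                 # only the hire generates data
        b[g] += x * (x @ theta + rng.normal())
        hires2 += (g == 2)
    return hires2
```

Counting paths on which minority hires stay near zero, over many independent runs and a grid of K1 values, yields a frequency curve of the kind reported in Figure 1.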
Figure 1 exhibits the simulation result. Consistent with our theoretical analyses, we observe that (i) as indicated by Theorem 1, laissez-faire rarely produces perpetual underestimation if the population is balanced (i.e., K1 is close to ), and (ii) as indicated by Theorem 2, the larger the population of majority workers (i.e., K1 increases), the more frequently perpetual underestimation occurs. With , perpetual underestimation occurs in more than 2% of runs, which is large enough to ensure that laissez-faire produces (approximately) linear regret.

Notes. Across 4,000 runs. The error bars represent the two-sigma binomial confidence intervals. PU, perpetual underestimation.
9.2. Laissez-Faire vs. the UCB Mechanism
Figure 2 compares the regret associated with the LF decision rule and the UCB decision rule. As indicated by Theorem 2, our simulation shows that laissez-faire has a significant probability of underestimating the minority group. Consequently, laissez-faire sometimes causes perpetual underestimation, and regret grows (approximately) linearly to n. Furthermore, because of the possibility of perpetual underestimation, the confidence intervals of the sample paths (denoted by the red area) are very large, indicating the highly uncertain performance of laissez-faire. In contrast, consistent with Theorem 3, the UCB decision rule performs much more stably. Because the UCB rule avoids underexploration, it does not cause perpetual underestimation.

Note. The lines are averages over sample paths, the areas cover between the 5th and 95th percentiles of runs, and the error bars at N = 1,000 are the two-sigma confidence intervals.
9.3. The UCB Mechanism vs. the Hybrid Mechanism
Next, we compare the performance of the UCB and hybrid mechanisms. The parameter of the hybrid mechanism is set to a = 0.5. Figure 3 shows the associated regret. As Theorems 3 and 5 anticipate, the regrets associated with the two decision rules are similar (the two decision rules have the same order: ). Figure 4 compares the subsidy rules. As Theorems 4 and 5 predict, the subsidy required by the UCB index rule grows at the rate of , whereas the hybrid index subsidy rule requires only a constant subsidy, implying that the policy intervention can be terminated at some point. Furthermore, the hybrid index subsidy rule requires a much smaller budget than the UCB index subsidy rule. In summary, the hybrid mechanism produces regret similar to the UCB mechanism with a much smaller budget.15

Note. The lines are averages over sample paths, the areas cover between the 5th and 95th percentiles of runs, and the error bars at N = 1,000 are the two-sigma confidence intervals.

Note. The lines are averages over sample paths, the areas cover between the 5th and 95th percentiles of runs, and the error bars at N = 1,000 are the two-sigma confidence intervals.
In Online Appendix G.1, we demonstrate that a nonindex subsidy rule implements the UCB decision rule with a smaller total budget, although its total subsidy cannot be bounded by a constant.
9.4. Fairness Metrics
Figure 5 illustrates the disparity in equalized odds, representing the empirical analog of the sum of the terms on the left-hand side of Equation (3) and Equation (4). Similarly, Figure 6 displays the demographic disparity, corresponding to the empirical analog of the term on the left-hand side of Equation (9).16 For every n, both the UCB and hybrid decision rules outperform laissez-faire concerning the two fairness metrics. Notably, with our simulation setting, the differences in performance related to equalized odds are especially pronounced.

Note. The lines are averages over sample paths, and the error bars at N = 1,000 are the two-sigma confidence intervals.

Note. The lines are averages over sample paths, and the error bars at N = 1,000 are the two-sigma confidence intervals.
10. Conclusion
We have studied statistical discrimination using a contextual multiarmed bandit model. Our dynamic model articulates how a failure of social learning produces statistical discrimination. In our model, the insufficiency of data about minority groups is endogenously generated. This data shortage prevents firms from accurately estimating the skill of minority candidates. Consequently, firms tend to prefer hiring majority candidates, leading the data insufficiency to persist. We have demonstrated that under an unbalanced population ratio, laissez-faire tends toward perpetual underestimation, an unfair and inefficient consequence.
We analyzed two possible policy interventions. One is subsidy rules that incentivize firms to hire minority candidates. Our hybrid mechanism achieves regret with subsidy. Another intervention is the Rooney Rule, which requires firms to interview at least one minority candidate. Our result indicates that terminating the Rooney Rule at an appropriate point would resolve statistical discrimination while maintaining the social welfare level. These results contrast with some of the previous studies (e.g., Foster and Vohra 1992, Coate and Loury 1993, Moro and Norman 2004) demonstrating the possible counterproductivity of affirmative-action policies.
Our analyses of the two interventions provide a consistent policy implication: Affirmative actions effectively resolve statistical discrimination caused by data insufficiency, but such actions should be lifted upon acquiring sufficient information. Accordingly, a temporary affirmative action constitutes the best approach to resolving statistical discrimination as a social learning failure.
1 For instance, Trix and Psenka (2003) analyze letters of recommendation for medical faculty, finding systematic differences between those written for female and male applicants. Hanna and Linden (2012) postulate that students belonging to lower castes in India tend to receive unjustifiably lower exam scores. In the context of teaching evaluations, MacNell et al. (2015) and Mitchell and Martin (2018) illustrate that students rate male identities significantly higher than female ones. In a study of online freelance marketplaces, Hannák et al. (2017) establish that gender and race significantly correlate with worker evaluations.
2 , and are Landau notations that ignore polylogarithmic factors. We often treat polylogarithmic factors as if they were constant because these factors grow very slowly ( for any exponent ).
3 The trade-off between exploration and exploitation presents itself in a wider context. For example, Owen and Varian (2020) propose “tie-breaker designs” which are hybrids of randomized controlled trials and regression discontinuity designs, and solve the optimal trade-off between information gain (exploration) and efficiency in the treatment allocation (exploitation).
4 Although real-world firms are long-lived and hire multiple workers, the number of workers hired by one firm is typically much smaller than the total number of workers hired in a hiring market. Accordingly, even if we allowed firms to hire multiple (but a small number of) workers, the conclusion would not change qualitatively. Note also that various seminal papers within the social learning literature (such as Banerjee 1992, Bikhchandani et al. 1992, Smith and Sørensen 2000) have made the same assumption.
5 This assumption is for simplicity. Because firms have no private information, the fact that a worker was previously rejected by another firm does not influence the worker’s evaluation; thus, entrant workers and incumbent workers have no informational difference. Accordingly, even if workers stay for multiple periods, our conclusion will not be changed qualitatively.
6 Alternatively, we can assume that firms only share information about a certain fraction of workers. We expect that, under this assumption, (i) the results would not change qualitatively, and (ii) the statistical discrimination would become more severe because it becomes more difficult to accumulate information about the minority group.
7 We expect that essentially the same results will be obtained with Bayesian firms (see Online Appendix C).
8 For the properties of the ridge estimator, see Kennedy (2008), for example.
9 The required size of is specified by Equation (36) in the online appendix.
10 We confirmed through simulations that the proposed mechanisms are effective in a broad class of asymmetric environments. See Section 7 and Online Appendix G.4.
11 Lemma 22 in the online appendix implies that the variance is in the order of .
12 We typically choose to make the confidence interval asymptotically correct in the limit of .
13 The standard OLS has a confidence bound of the form and thus . The price of adaptivity causes the martingale confidence bound to be larger than the OLS confidence bound for two factors: (i) the factor and (ii) the factor. As discussed in Xu et al. (2018), the factor unnecessarily overestimates the confidence bound in most cases.
14 The source code of the simulations is available at https://github.com/jkomiyama/FairSocialLearning.
15 In Online Appendix G.2, we confirm that this observation is robust against the parameter value of a.
16 The precise definition is shown in Online Appendix F.
References
- (2011) Improved algorithms for linear stochastic bandits. Taylor JS, Zemel RS, Bartlett PL, Pereira FCN, Weinberger KQ, eds. Proc. Twenty-Fourth Conf. Neural Inform. Processing Systems (Curran Associates, New York), 2312–2320.
- (1999) Associative reinforcement learning using linear probabilistic concepts. Bratko I, Dzeroski S, eds. Proc. Sixteenth Internat. Conf. Machine Learn. (Morgan Kaufmann, San Francisco), 3–11.
- (2004) How to get yourself on the door of a job: A cross-cultural contrastive study of Arabic and English job application letters. J. Multilingual Multicultural Development 25(1):1–23.
- (1973) The theory of discrimination. Ashenfelter O, Rees A, eds. Discrimination in Labor Markets (Princeton University Press, Princeton, NJ), 3–33.
- (2002) Finite-time analysis of the multi-armed bandit problem. Machine Learn. 47(2):235–256.
- (1992) A simple model of herd behavior. Quart. J. Econom. 107(3):797–817.
- (2020) Early-career discrimination: Spiraling or self-correcting? Working paper, Duke University, Durham, NC.
- (2021) Mostly exploration-free algorithms for contextual bandits. Management Sci. 67(3):1329–1349.
- (2019) Equal opportunity in online classification with partial feedback. Wallach HM, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox EB, Garnett R, eds. Proc. Thirty-Third Conf. Neural Inform. Processing Systems (Curran Associates, New York), 8972–8982.
- (1992) A theory of fads, fashion, custom, and cultural change as informational cascades. J. Political Econom. 100(5):992–1026.
- (2019) The dynamics of discrimination: Theory and evidence. Amer. Econom. Rev. 109(10):3395–3436.
- (2023) Inaccurate statistical discrimination: An identification problem. Rev. Econom. Statist., 1–45.
- (2010) Three naive Bayes approaches for discrimination-free classification. Data Min. Knowl. Discov. 21(2):277–292.
- (2019) Statistical discrimination in ratings-guided markets. Working paper, Columbia University, New York.
- (2020) The fair contextual multi-armed bandit. Jonas P, David S, eds. Proc. Nineteenth Internat. Conf. Autonomous Agents Multiagent Systems (Journal of Machine Learning Research), 1810–1812.
- (2011) Contextual bandits with linear payoff functions. Geoffrey G, David D, Miroslav D, eds. Proc. Fourteenth Internat. Conf. Artificial Intelligence Statist. (Journal of Machine Learning Research), 208–214.
- (1993) Will affirmative-action policies eliminate negative stereotypes? Amer. Econom. Rev. 83(5):1220–1240.
- (1996) Culture, information, and screening discrimination. J. Political Econom. 104(3):542–571.
- (2022) Statistical discrimination and affirmative action in the laboratory. Games Econom. Behav. 132:41–58.
- (1992) An economic argument for affirmative action. Rationality Soc. 4(2):176–188.
- (2014) Incentivizing exploration. Babaioff M, Conitzer V, Easley DA, eds. Proc. Fifteenth ACM Conf. Econom. Comput. (Association for Computing Machinery, New York), 5–22.
- (2012) Discrimination in grading. Amer. Econom. J. Econom. Policy 4(4):146–168.
- (2017) Bias in online freelance marketplaces: Evidence from TaskRabbit and Fiverr. Lee CP, Poltrock SE, Barkhuus L, Borges M, Kellogg WA, eds. Proc. 2017 ACM Conf. Comput. Supported Cooperative Work Soc. Comput. (Association for Computing Machinery, New York), 1914–1933.
- (2016) Equality of opportunity in supervised learning. Lee DD, Sugiyama M, von Luxburg U, Guyon I, Garnett R, eds. Proc. Twenty-Ninth Conf. Neural Inform. Processing Systems (Curran Associates, New York), 3315–3323.
- (2018) A short-term intervention for long-term fairness in the labor market. Champin PA, Gandon G, Lalmas M, Ipeirotis PG, eds. Proc. 2018 World Wide Web Conf. (Association for Computing Machinery, New York), 1389–1398.
- (2020) Incentivizing exploration with selective data disclosure. Biró P, Hartline JD, Ostrovsky M, Procaccia AD, eds. Proc. Twenty-First ACM Conf. Econom. Comput. (Association for Computing Machinery, New York), 647–648.
- (2018) Exploration vs. exploitation in team formation. Christodoulou G, Harks T, eds. Proc. Fourteenth Conf. Web Internet Econom., vol. 11316 (Springer, Berlin, Heidelberg), 452.
- (2016) Fairness in learning: Classic and contextual bandits. Lee DD, Sugiyama M, von Luxburg U, Guyon I, Garnett R, eds. Proc. Twenty-Ninth Conf. Neural Inform. Processing Systems (Curran Associates, New York), 325–333.
- (2019) Downstream effects of affirmative action. Boyd D, Morgenstern JH, eds. Proc. Second Conf. Fairness Accountability Transparency (Association for Computing Machinery, New York), 240–248.
- (2018) A smoothed analysis of the greedy algorithm for the linear contextual bandit problem. Bengio S, Wallach HM, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Proc. Thirty-Second Conf. Neural Inform. Processing Systems (Curran Associates, New York), 2227–2236.
- (2017) Fairness incentives for myopic agents. Proc. 2017 ACM Conf. Econom. Comput. (Association for Computing Machinery, New York), 369–386.
- (2008) A Guide to Econometrics, 6th ed. (Wiley-Blackwell, Hoboken, NJ), 192–202.
- (2017) Inherent trade-offs in the fair determination of risk scores. Papadimitriou CH, ed. Proc. Eighth Conf. Innovations Theoret. Comput. Sci. (Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl), 43:1–43:23.
- (2014) Implementing the ‘wisdom of the crowd’. J. Political Econom. 122(5):988–1012.
- (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1):4–22.
- (2008) The epoch-greedy algorithm for contextual multi-armed bandits. Proc. Twentieth Conf. Neural Inform. Processing Systems (Curran Associates, New York), 817–824.
- (2020) Hiring as exploration. NBER Working Paper No. 27736, National Bureau of Economic Research, Cambridge, MA.
- (2015) What’s in a name: Exposing gender bias in student ratings of teaching. Innovative Higher Ed. 40(4):291–303.
- (2000) Endogenous inequality in integrated labor markets with two-sided search. Amer. Econom. Rev. 90(1):46–72.
- (2021) Machine learning fairness notions: Bridging the gap with real-world applications. Inform. Processing Management 58(5):102642.
- (2020) Bayesian incentive-compatible bandit exploration. Oper. Res. 68(4):1132–1161.
- (2018) Gender bias in student evaluations. PS Political Sci. Politics 51(3):648–652.
- (2019) Discrimination in online markets: Effects of social bias on learning from reviews and policy design. Wallach HM, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox EB, Garnett R, eds. Proc. Thirty-Second Conf. Neural Inform. Processing Systems (Curran Associates, New York), 2145–2155.
- (2004) A general equilibrium model of statistical discrimination. J. Econom. Theory 114(1):1–30.
- (2018) Experimental research on labor market discrimination. J. Econom. Literature 56(3):799–866.
- (2020) Optimizing the tie-breaker regression discontinuity design. Electronic J. Statist. 14(2):4004–4027.
- (2018) Crowdsourcing exploration. Management Sci. 64(4):1727–1746.
- (2008) Discrimination-aware data mining. Li Y, Liu B, Sarawagi S, eds. Proc. Fourteenth ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 560–568.
- (2008) Self-Normalized Processes: Limit Theory and Statistical Applications (Springer Science & Business Media, Berlin).
- (1972) The statistical theory of racism and sexism. Amer. Econom. Rev. 62(4):659–661.
- (1998) A cross-cultural comparison of letters of recommendation. English Specific Purposes 17(3):241–265.
- (2018) The externalities of exploration and how data diversity helps exploitation. Bubeck S, Perchet V, Rigollet P, eds. Proc. Machine Learn. Res., vol. 75 (PMLR), 1724–1738.
- (1952) Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. (N.S.) 58(5):527–535.
- (2010) Linearly parameterized bandits. Math. Oper. Res. 35(2):395–411.
- (2000) Pathological outcomes of observational learning. Econometrica 68(2):371–398.
- (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3–4):285–294.
- (2003) Exploring the color of glass: Letters of recommendation for female and male medical faculty. Discourse Soc. 14(2):191–220.
- (2018) A fully adaptive algorithm for pure exploration in linear bandits. Perez-Cruz, ed. Proc. Twenty-First Internat. Conf. Artificial Intelligence Statist. (Journal of Machine Learning Research), 843–851.