In a field experiment, we find that a mandatory mentorship program raises worker productivity, whereas a voluntary version of the program does not. A significant reason why the mandatory program results in larger gains is that the lowest-productivity employees do not participate when the program is voluntary despite their having the greatest treatment benefits. A nationally representative survey of U.S. workers shows wide variation in human capital development program participation, suggesting that understanding self-selection is important for firms’ returns on these programs across a variety of settings. Our findings have implications for resource allocation, experimental design, productivity dispersion, and inequality.

This paper was accepted by Maria Guadalupe, business strategy.

Supplemental Material: The online appendix and data files are available at https://doi.org/10.1287/mnsc.2024.07524.

1. Introduction

Organizations actively train, educate, and upskill their employees through human capital development programs. These programs represent an increasingly large component of corporate spending and are a growing driver of firm value (Zingales 2000, Edmans 2011, Bloom et al. 2016, Rouen and Regier 2022, Nishesh et al. 2024). Although U.S. firms’ training investments alone now total more than $100 billion annually (Statista 2022), relatively little research addresses how organizations choose to allocate development resources across workers.¹ We conduct a novel field experiment to estimate the tradeoffs involved in a ubiquitous allocation decision: whether to make employee development programs mandatory or voluntary. Whether a mandatory or voluntary program is more effective, and allocates resources better, hinges on workers’ self-selection. If those who benefit most opt into voluntary programs, then self-selection enables firms to target resources to employees with the highest return. If, instead, workers volunteer at random (or worse, if participation is negatively correlated with program gains), then mandates may be needed to efficiently allocate resources among workers.

We investigate the effectiveness of a mentorship program and the implications of making it mandatory or voluntary in a field experiment involving a large U.S.-based sales organization.² Salespeople at the firm answer incoming calls to sell digital subscriptions (e.g., television, Internet, and cellular services), and their incentives to increase their own human capital are likely aligned with the firm’s (Zivin et al. 2021), because commissions make up more than one-third of the median employee’s compensation.³ This setting is well-suited to evaluate the design of human capital development programs for at least five reasons: (i) Sales agents work independently of each other, (ii) we have individual, daily sales performance data for all workers, (iii) inbound calls are randomly assigned to sales agents, (iv) the firm regularly hires new sales agents in cohorts that train together, allowing the program to be administered under different conditions for similar groups, and (v) there is variation in productivity across new hires (e.g., new agents at the 75th percentile of the sales revenue distribution generate twice as much revenue as those at the 25th percentile), allowing us to characterize differences in who selects into the program and who benefits most from it across the productivity distribution. Although these features are attractive for estimation, the evidence likely generalizes to many other work settings. There are more than four million jobs in sales and customer service occupations in the United States where workers do tasks similar to those at the study firm (Ruggles et al. 2025). In addition, a nationally representative survey that we fielded confirms that (a) the program that we test exists in many companies, (b) companies vary in whether they make their programs mandatory or voluntary, and (c) many workers choose not to participate in voluntary programs.

Our experiment entailed two levels of randomization; the first (high level) is at the cohort level for training classes of new hires, and the second (low level) is at the worker level within a cohort. At the high level, new-hire cohorts were randomized into one of two groups, labeled the Mandatory-Condition and the Voluntary-Condition. In Mandatory-Condition cohorts, the lower-level treatment involved randomly assigning agents to have a mentor or not; that is, agents were not first asked whether they wanted to participate in the program. For Voluntary-Condition cohorts, on the first day of training, the firm’s staff briefly described the mentorship program and asked each new hire to privately indicate whether they had an interest in participating in the program. We label those who indicated an interest in participating as having “opted in”; for those who opted in, the lower-level treatment involved assigning a random subset to receive a mentor. Agents who “opted out” of participation were not assigned a mentor. All mentor assignments were randomly drawn from a pool of established sales agents who had no formal authority over those they mentored (i.e., mentors were more experienced, lateral peers without supervisory responsibilities). Mentors volunteered for the program with the understanding that participation would be viewed favorably in future promotion decisions. Matched mentor-protégé pairs were asked to meet for 30 minutes each week for four weeks and to record what they discussed on a worksheet. Mentors were not informed of which condition, mandatory or voluntary, their matched protégé came from.

We find that mentorship had a positive and statistically significant effect on workers’ productivity in the Mandatory-Condition. Specifically, workers who received a mentor increased their individual output by about 0.145 standard deviations when using a combined index of four productivity measures. For mentored workers, daily sales revenue and revenue-per-call (the firm’s two focal performance measures) increased by 19% and 12%, respectively, compared with non-mentored agents over their first two months of tenure. The primary mechanism appears to be knowledge transfer, because about 45% of the treatment effects persist through agents’ first six months of tenure, well after the program’s conclusion. A follow-up survey indicated that the program allowed treated agents to ask questions and receive help, consistent with improved psychological safety and subsequent knowledge transmission (Edmondson and Lei 2014, Chandrasekhar et al. 2018, Castro et al. 2022). In addition, worksheets that mentor-protégé pairs completed during the program indicate that knowledge exchange occurred.
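The 0.145 standard deviation effect is expressed on a combined index of four productivity measures. This section does not spell out how the index is built, so the sketch below assumes one common construction: each measure is z-scored against the control (non-mentored) group's mean and standard deviation, the z-scores are averaged with equal weights, and the treatment effect is the treated-minus-control difference in the mean index. The function names, the two measure names, and the toy data are hypothetical.

```python
from statistics import mean, stdev


def standardized_index(measures, treated):
    """Equal-weighted average of z-scores, each standardized against the control group.

    measures: dict mapping a (hypothetical) measure name to per-agent values
    treated:  per-agent booleans, True if the agent was assigned a mentor
    """
    z_by_measure = []
    for values in measures.values():
        control = [v for v, d in zip(values, treated) if not d]
        mu, sd = mean(control), stdev(control)
        z_by_measure.append([(v - mu) / sd for v in values])
    # Average the z-scores across measures, agent by agent.
    return [mean(z[i] for z in z_by_measure) for i in range(len(treated))]


def effect_in_sd(index, treated):
    """Treated-minus-control difference in the mean index (control-group SD units)."""
    t = [x for x, d in zip(index, treated) if d]
    c = [x for x, d in zip(index, treated) if not d]
    return mean(t) - mean(c)


# Hypothetical toy data: two measures for four agents, the first two mentored.
treated = [True, True, False, False]
measures = {
    "daily_revenue": [120, 110, 100, 90],
    "revenue_per_call": [6, 5, 5, 4],
}
index = standardized_index(measures, treated)
effect = effect_in_sd(index, treated)
```

By construction the control group's index averages to zero, so the treated group's mean index can be read directly as an effect in standard deviation units.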

Next, we test whether firms can enhance average program outcomes by making the program voluntary. If workers who benefit most participate when the mentorship program is optional, then a voluntary program allows the firm to target those with the highest gains. Alternatively, a voluntary program could have no difference in effectiveness if treatment effects are uniform across workers, or it could even dampen the benefits if there are heterogeneous treatment effects and self-selection is negatively correlated with treatment gains. We may also find different effects in the voluntary program for other reasons, such as program framing. For example, workers could be more engaged with the program if they were the ones to volunteer for it, or they could be less engaged if the program’s voluntary nature signals a lack of importance.

We find that mentorship did not affect the productivity of workers who opted into the program in the Voluntary-Condition. Specifically, those who opted into the program and were randomly assigned a mentor had similar levels of productivity to those who opted in and were not assigned a mentor.

To assess why treatment effects differ between the Mandatory- and Voluntary-Conditions, we next quantify the relative importance of self-selection, heterogeneous treatment effects, and other mechanisms, such as program framing. We quantify the degree of self-selection as a function of realized productivity by comparing sales output for those who opt into the program and are not assigned a mentor with that of those who opt out. Workers who opt out are much less productive, with revenue between 23% and 30% lower than that of untreated workers who opt into the program. The strongest predictor of opting out is a pre-hire assessment score, which is given by interviewers during the recruitment process. Workers with low pre-hire assessments are much more likely than others to opt out of program participation. In contrast, workers’ demographics, work history, and personality characteristics (collected via surveys) have little predictive power for the opt-out decision.⁴ We use these factors to predict a propensity score for whether agents in the Mandatory-Condition would have been more or less likely to opt out of the program had they been given the choice.
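The propensity-score step can be sketched as a model fitted on Voluntary-Condition agents, whose opt-out choices are observed, and then applied to Mandatory-Condition agents, who were never asked. The sketch assumes a logistic model with a single standardized predictor (the pre-hire assessment score) fitted by plain gradient ascent; the authors use several predictors, and their exact estimator is not described in this section. All data and names below are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(x, y, lr=0.1, steps=5000):
    """Fit P(opt out | x) = sigmoid(b0 + b1 * x) by batch gradient ascent on the log-likelihood."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            resid = yi - sigmoid(b0 + b1 * xi)  # observed minus predicted opt-out
            g0 += resid
            g1 += resid * xi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Hypothetical Voluntary-Condition data: standardized pre-hire assessment
# scores and opt-out indicators (1 = opted out).
vol_scores = [-1.5, -1.0, -0.5, -0.2, 0.0, 0.3, 0.8, 1.2]
opted_out = [1, 1, 0, 1, 0, 0, 0, 0]
b0, b1 = fit_logit(vol_scores, opted_out)

# Predicted opt-out propensities for hypothetical Mandatory-Condition agents,
# who were never offered the choice.
mand_scores = [-1.0, 0.0, 1.0]
propensity = [sigmoid(b0 + b1 * s) for s in mand_scores]
```

With these toy data the fitted slope is negative, so agents with lower assessment scores receive higher predicted opt-out propensities, mirroring the pattern described in the text.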

When we quantify heterogeneous treatment gains in the mandatory program based on agents’ propensity to opt out of mentorship, we find that workers who are the least likely to participate in the program benefit most. We estimate separate treatment effects for agents in the top tercile of the opt-out propensity score distribution (i.e., those with a high likelihood of opting out) and for those in the bottom two terciles (i.e., those with a low likelihood of opting out).⁵ The mentorship treatment significantly raised the productivity of those agents in the top tercile (labeled likely-to-opt-out agents), whereas the effect of mentorship was significantly weaker among agents who were less likely to opt out. Furthermore, likely-to-opt-out agents generate significantly less revenue than those in the bottom two terciles when untreated, consistent with the previously discussed selection effects. This exercise suggests that self-selection and heterogeneous treatment effects explain about one-third of the overall gap in treatment effects between conditions.⁶
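The tercile comparison can be illustrated with a raw difference in means: agents above the two-thirds quantile of the opt-out propensity distribution form the likely-to-opt-out group, and the mentored-minus-unmentored revenue gap is computed separately in each group. This is a simplified sketch; the paper's estimates may include controls and other adjustments not shown here. The function name and the toy data are hypothetical.

```python
from statistics import mean, quantiles


def tercile_effects(propensity, treated, revenue):
    """Mentored-minus-unmentored mean revenue, by opt-out propensity group."""
    # quantiles(..., n=3) returns the two tercile cut points; the second one
    # marks the start of the top tercile.
    top_cut = quantiles(propensity, n=3)[1]
    groups = {"likely_to_opt_out": [], "less_likely": []}
    for p, d, y in zip(propensity, treated, revenue):
        key = "likely_to_opt_out" if p > top_cut else "less_likely"
        groups[key].append((d, y))
    effects = {}
    for key, rows in groups.items():
        mentored = [y for d, y in rows if d]
        unmentored = [y for d, y in rows if not d]
        effects[key] = mean(mentored) - mean(unmentored)
    return effects


# Hypothetical toy data for six Mandatory-Condition agents.
propensity = [0.1, 0.2, 0.3, 0.4, 0.7, 0.8]
treated = [True, False, True, False, True, False]
revenue = [105, 100, 103, 100, 130, 90]
effects = tercile_effects(propensity, treated, revenue)
```

In this toy example the mentorship gap is 40 in the likely-to-opt-out group and 4 among the rest, the same qualitative pattern (larger gains for likely opt-outs) that the text reports.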

The remaining difference in treatment effects between the Mandatory- and Voluntary-Conditions is due to other factors, such as program framing. Program framing can impact overall buy-in and engagement, including that from workers who would have participated in the voluntary program were they proffered the choice. We find evidence suggesting that framing is likely important because treated agents in the Mandatory-Condition who were likely to participate in the program had greater sales gains because of mentorship than similar workers in the Voluntary-Condition. In addition, treated agents in the Mandatory-Condition were more likely to meet with their mentors than were treated agents in the Voluntary-Condition. Although we note that the program framing explanation is indirect and based on suggestive evidence, other channels are less likely to explain the difference in treatment effects across conditions. For instance, in both conditions, the mentorship program had no impact on retention, and three empirical exercises suggest that retention differences do not explain the sales revenue treatment effects. Furthermore, we designed the experiment to test for information leakages, other violations of the Stable Unit Treatment Value Assumption (SUTVA), and crowding out of organic mentorship. We find no evidence for these channels.

The firm realized significant benefits from implementing the mandatory version of the mentorship program. To quantify the gains, we calculate the return on investment across all mentoring relationships over a six-month horizon from treatment. We account for worker turnover by filling in a random, non-mentored replacement agent when we observe a worker leave the firm (either treated workers or those in the control group). Using this approach, we find that the firm gained $536,000 in revenues from treating 127 agents in the Mandatory-Condition over a six-month posttreatment horizon. The total costs, including overhead costs, associated with the mandatory program were $97,000. As such, the firm realized a $439,000 return on a $97,000 investment by implementing the mandatory mentorship program.
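The return calculation in this paragraph reduces to simple arithmetic on the reported totals. The sketch below uses only the figures stated above ($536,000 in revenue gains, $97,000 in total costs, 127 treated agents); the per-agent figures are straightforward divisions that the text does not report, and the function name is hypothetical.

```python
def program_returns(revenue_gain, total_cost, n_treated):
    """Net return and simple ratios for a development program."""
    net = revenue_gain - total_cost
    return {
        "net_return": net,
        "return_per_dollar": net / total_cost,
        "gain_per_treated_agent": revenue_gain / n_treated,
        "cost_per_treated_agent": total_cost / n_treated,
    }


# Figures reported for the Mandatory-Condition over the six-month horizon.
r = program_returns(536_000, 97_000, 127)
```

The net return is $439,000, or roughly $4.53 of net revenue per dollar spent; per treated agent, the gain is about $4,220 against a cost of about $764.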

The program implementation had substantial implications for returns to the firm. Had the voluntary program instead been mandatory, the firm would have gained an additional $207,000, assuming that the one-third of the treatment effect difference attributable to self-selection and heterogeneous effects carries over from the Mandatory-Condition. Although our results suggest that firms cannot always rely on self-selection to allocate the workers with the highest returns into human capital development programs, in some settings alternative allocation approaches may be superior to mandatory rules that draw in all workers (Johnson et al. 2023, Li et al. 2025). In those cases, testing a program without selection concerns, as in the Mandatory-Condition, can inform refinements to resource allocation decisions. In practice, firms face a tradeoff between gathering data up front for better targeting and the horizon over which treatment gains are realized. In workplaces with high rates of attrition, the delay costs of gathering data may outweigh the misallocation costs of treating too many workers.

Our findings speak to the efficacy of human capital development programs, who benefits from these programs, how firms choose to deploy them, and how data from pilots that evaluate them should be interpreted. First, we provide evidence on the effectiveness of mentorship programs, advancing the understanding of whether widespread adoption of mentoring is justified (Hilmer and Hilmer 2007, Gutner 2009, Lyle and Smith 2014, Ginther et al. 2020). Addressing this question inside firms has been challenging because of nonrandom selection in most mentorship settings (Allen et al. 2017). Related work has studied the efficacy of other types of workplace programs, like purpose workshops and wellness programs (Gubler et al. 2018, Jones et al. 2019, Ashraf et al. 2025). Studying who responds to human capital development opportunities within the workplace may also be informative for the administration of public and social-sector training programs, which often report difficulty in attracting participation (Delfino et al. 2024).

Second, our results offer a proof of concept that the greatest beneficiaries of human capital development programs often fail to take advantage of the resources available to them.⁷ When some types of workers are less likely to engage with workplace programs, recruiting the right people is likely complementary to programs that rely on self-selection (Oyer and Schaefer 2011, Del Carpio and Guadalupe 2022).

What is clear from our results is that treatment effect heterogeneity is significant. Weaker employees’ failure to participate in programs likely contributes to the long-term, widespread productivity dispersion within firms that has been documented across many other settings (for an overview, see Hoffman and Stanton 2024; for healthcare, see Finkelstein et al. 2016, Currie and MacLeod 2017, Currie and MacLeod 2020, and Chan et al. 2022; for judges, see Coviello et al. 2014; for teachers, see Chetty et al. 2014; and for services, see Lazear et al. 2015, 2016). Prior research has identified key drivers of differences in productivity, innovation, and compensation across workers and firms, such as management practices, managerial talent, capital formation, labor market concentration, and firm size (Bertrand and Schoar 2003, Bloom and Van Reenen 2007, Bloom et al. 2014, Larrain 2015, Lazear et al. 2015, Benson et al. 2019, Custódio et al. 2019, Bandiera et al. 2020, Benmelech et al. 2022, Friebel et al. 2022, Metcalfe et al. 2023, Benson et al. 2024). We show that variation in the administration of workplace programs may have profound consequences for the development and performance outcomes of workers in the lower tail of the productivity distribution.⁸ This heterogeneity also speaks to the emerging evidence on the importance of managers for encouraging workers’ career development and training (Hoffman and Tadelis 2021, Minni 2023, Diaz et al. 2025), in addition to the role of structured management practices in both attracting and retaining top workers (Cornwell et al. 2021).

Third, our findings suggest that firms must consider the selection margin when testing workplace programs. Whereas most work on selection underscores that firms can use design features to get advantageous selection or to exploit workers’ behavioral biases (Larkin and Leider 2012, Carter et al. 2019, Hoffman and Burks 2020, Huffman et al. 2022, Englmaier et al. 2025), our findings suggest that selection effects are (a) significant and (b) difficult to predict ex ante. This highlights the importance of running pilots with different recruitment and selection criteria ahead of broad deployment. Iconic recommendations on the econometrics of program evaluation suggest that randomization among program applicants is close to ideal for understanding potential outcomes (Heckman et al. 1997). Our results demonstrate that selection on who applies to a program can impact inference if the applicant composition changes when a program is deployed widely (List 2022).⁹

2. Firm Setting

The study firm operates inbound sales call centers on behalf of several companies and brands, most of which are television, phone, and Internet providers. Participants in the experiment are broadly representative of the four million U.S. workers in similar occupations. For example, average hourly earnings at the firm were about $21 per hour in 2019, whereas customer service representatives, telemarketers, and miscellaneous sales representatives earned about $23 per hour nationally and $20 per hour in Utah, where the firm is located.¹⁰

The mentoring program occurred from January to December 2019. Our data track new hires’ performance on the job after the conclusion of the program through early 2020. Sales agents answer incoming calls from potential customers and sell digital services with the goal of closing sales and upselling premium service packages. Firm insiders report that learning the sales process (e.g., how to run credit checks for equipment lease compliance or determine whether callers qualify for regional sales promotions) and how to upsell can be challenging for new hires.

When hired, sales agents begin a two-week training program, where they learn the sales process through lectures and by listening to other agents’ live calls. Once agents complete their two-week training, they are allocated to a team and begin answering inbound sales inquiries. Teams are typically composed of 10–15 individuals overseen by a (direct) sales manager, who is responsible for monitoring performance and troubleshooting issues faced by the agents. Individuals from the same hiring cohort can be allocated to different teams after training; however, cohorts are recruited in service of selling a particular company’s products. Agents eligible for the mentorship program were spread across seven different sales divisions, corresponding to different companies’ brands or products.

This setting has several attractive features for studying the efficacy of mentorship. Most importantly, the firm provided us with individual-level performance measures for each sales agent. Sales agents work independently on a call from start to finish, without subsequent handoffs. Incoming calls are allocated to the next available agent within the appropriate division (each division receives calls from different phone numbers, depending on the service being sold and the location of the callers, and the opportunities are then randomly allocated to agents in the division through the firm’s call routing software). As such, agents do not have prior information about which calls may be more or less lucrative; that is, they cannot sort into better opportunities. Agents generate revenue through each sale they make. The firm’s focal productivity measure is revenue per call (RPC) because it allows managers to remove demand variation when comparing performance across workers. In addition, total revenue is important for workers because the absolute amount of revenue generated impacts workers’ commission pay. At the end of each week, the total amount of revenue generated is multiplied by an agent’s commission rate. The commission rate is a coarse function of the agent’s selling efficiency (determined by RPC and revenue per hour worked) relative to other agents in the same division. Commission rates range from 3% to 8%.¹¹ Multiplying the worker’s revenue and commission rate determines his or her weekly commission pay. Sales agents also earn an hourly wage that begins above the federal minimum wage and increases with tenure.
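The two pay-relevant quantities described above can be sketched as follows: revenue per call (RPC) divides revenue by calls handled, and weekly commission multiplies weekly revenue by the agent's commission rate. The text says the rate is a coarse function of relative selling efficiency within a division but does not give the mapping, so the sketch takes the rate as an input and only checks that it lies in the stated 3% to 8% range. Function names and the example numbers are hypothetical.

```python
def revenue_per_call(revenue, calls):
    """Revenue per call (RPC), the firm's focal productivity measure."""
    if calls <= 0:
        raise ValueError("calls must be positive")
    return revenue / calls


def weekly_commission(weekly_revenue, commission_rate):
    """Weekly commission pay = weekly revenue x commission rate.

    The mapping from relative selling efficiency to a specific rate is not
    described in the text, so the rate is passed in directly.
    """
    if not 0.03 <= commission_rate <= 0.08:
        raise ValueError("commission rate outside the 3%-8% range")
    return weekly_revenue * commission_rate


# Hypothetical agent: $12,000 of weekly revenue from 300 calls at a 5% rate.
rpc = revenue_per_call(12_000, 300)
pay = weekly_commission(12_000, 0.05)
```

For this hypothetical agent, RPC is $40 and weekly commission pay is $600, on top of the hourly wage.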

3. Experimental Design

The experiment involves two high-level treatment conditions that were first assigned at the new-hire training class (cohort) level. Lower-level subtreatments involving the assignment of mentors then occurred within each cohort. Training cohorts are specific to an office location and division. Cohorts joined the firm on a rolling basis during the experiment. We randomly assigned each cohort to either the Mandatory-Condition (probability 40%) or the Voluntary-Condition (probability 60%). Agents in the Voluntary-Condition were given the option to opt in or out of mentoring. Those who opted out did not receive a mentor. Agents in the Mandatory-Condition and those in the Voluntary-Condition who opted in were randomly assigned a mentor, or not, according to the following rule: if the supply of available mentors was greater than 50% of the cohort size, then approximately half of the agents would be assigned a mentor (the firm requested that we randomly allocate mentors to more agents when possible, e.g., rounding up for an odd number of agents in a cohort); otherwise, the available mentors would be assigned at random to those eligible to receive a mentor.¹² The pairing of mentors and new hires always occurred at random.
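The two-level assignment procedure can be sketched as follows. The sketch assumes that "cohort size" means the number of agents eligible for a mentor (all agents in a Mandatory-Condition cohort, or the opt-ins in a Voluntary-Condition cohort), and it uses Python's `random` module as a stand-in for whatever randomization tool the research team actually used. The function names and the seed are hypothetical. Note that the rule is equivalent to treating min(available mentors, ceil(n / 2)) agents.

```python
import math
import random


def assign_condition(rng, p_mandatory=0.4):
    """Cohort-level draw: Mandatory-Condition with probability 40%."""
    return "Mandatory" if rng.random() < p_mandatory else "Voluntary"


def assign_mentors(rng, eligible_ids, n_mentors):
    """Return the set of eligible agents who receive a mentor."""
    n = len(eligible_ids)
    if n_mentors > n / 2:
        # Enough mentors: treat about half, rounding up for odd cohort sizes.
        k = math.ceil(n / 2)
    else:
        # Mentor supply binds: every available mentor is used.
        k = n_mentors
    return set(rng.sample(sorted(eligible_ids), k))


def pair_mentors(rng, protege_ids, mentor_ids):
    """Randomly pair each mentored agent with a distinct mentor."""
    mentors = rng.sample(sorted(mentor_ids), len(protege_ids))
    return dict(zip(sorted(protege_ids), mentors))


# Example: a hypothetical cohort of 9 eligible agents and 6 available mentors.
rng = random.Random(2019)  # arbitrary seed for reproducibility
condition = assign_condition(rng)
mentored = assign_mentors(rng, range(9), 6)  # 6 > 4.5, so ceil(9 / 2) = 5 agents
pairs = pair_mentors(rng, mentored, [f"mentor_{i}" for i in range(6)])
```

Whenever the first branch applies, the number of mentors exceeds half the cohort and is therefore at least ceil(n / 2), so every mentored agent can be paired with a distinct mentor.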

Figure 1 displays the allocation of cohorts and agents to the different conditions and treatments in the experiment. There were 591 program-eligible sales agents spread across 52 new hire cohorts.¹³ Twenty-one cohorts and their 264 sales agents were allocated to the Mandatory-Condition, whereas the other 31 cohorts and 327 sales agents were allocated to the Voluntary-Condition. Among the agents in the Mandatory-Condition, 127 agents (48%) were randomized to receive a mentor, and the remaining 137 were not. In the Voluntary-Condition, 272 agents (83%) chose to opt in, of which 155 agents (57%) were randomized to receive a mentor, and the remaining 117 were not. The remaining 55 agents (17%) in the Voluntary-Condition chose to opt out of receiving a mentor.¹⁴
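The counts and percentages in this paragraph fit together as follows. The sketch uses only the numbers reported above; note that the 57% figure is a share of the agents who opted in, not of all Voluntary-Condition agents.

```python
# Counts reported in the text (Figure 1).
mandatory = {"cohorts": 21, "mentored": 127, "not_mentored": 137}
voluntary = {"cohorts": 31, "mentored": 155, "not_mentored": 117, "opted_out": 55}

n_mandatory = mandatory["mentored"] + mandatory["not_mentored"]
n_opt_in = voluntary["mentored"] + voluntary["not_mentored"]
n_voluntary = n_opt_in + voluntary["opted_out"]

shares_pct = {
    # Share of Mandatory-Condition agents randomized to a mentor.
    "mandatory_mentored": round(100 * mandatory["mentored"] / n_mandatory),
    # Share of Voluntary-Condition agents who opted in.
    "voluntary_opt_in": round(100 * n_opt_in / n_voluntary),
    # Share of opt-ins (not of all Voluntary-Condition agents) given a mentor.
    "opt_in_mentored": round(100 * voluntary["mentored"] / n_opt_in),
    # Share of Voluntary-Condition agents who opted out.
    "voluntary_opt_out": round(100 * voluntary["opted_out"] / n_voluntary),
}
total_agents = n_mandatory + n_voluntary
total_cohorts = mandatory["cohorts"] + voluntary["cohorts"]
```

The totals reproduce the 591 program-eligible agents in 52 cohorts, and the rounded shares match the 48%, 83%, 57%, and 17% given in the text.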

Figure 1. Allocation of Cohorts and Agents to Treatment Conditions
*Notes.* This figure displays the allocation of the 52 mentor-eligible cohorts to either the Mandatory-Condition or the Voluntary-Condition, our first level of variation. It then shows the allocation of the 591 mentor-eligible agents within these cohorts into different treatment conditions, our second level of variation. This is based on agents who complete training and are observed to have post-training productivity data.

3.1. Timeline for Administering the Program and Communicating Treatment Allocations

Prior to starting the two-week training protocol, each cohort was allocated to either the Mandatory- or the Voluntary-Condition, and the staff administering the program was made aware of the cohort’s assignment. All new hires were asked to complete a survey on the first day of training, which asked about their personality traits, work styles, and work experiences (specifically, whether they had call center and/or sales work experience). We use these survey responses to identify the characteristics of individuals who opted into versus opted out of mentoring.

For cohorts in the Mandatory-Condition, agents were either randomly assigned a mentor or not based on the assignment rule described above. For cohorts in the Voluntary-Condition, the staff described the mentoring program to the newly hired agents and told them they could either opt in or opt out of participating. The agents were told that a randomly selected subset of those who opted in would receive a mentor at the end of the training period. The staff explained that the supply of mentors was limited and that an outside research team would help with the randomization to ensure fairness in the assignment.¹⁵ To avoid peer influence in program participation (Dahl et al. 2014), agents were asked to write on a piece of paper whether they wanted to opt in or out of the mentoring program, making their decision anonymous to their peers. Among those who opted in, agents were either randomly assigned a mentor or not based on the assignment rule described above. Agents assigned a mentor were informed of this assignment by the within-firm staff during the last days of their training. To reduce the possibility of discouragement among agents in the Mandatory-Condition who were not assigned a mentor, the staff did not initially inform them about the mentorship program. If agents inquired about why they were or were not assigned a mentor, the staff explained that the mentor supply was limited and that available mentors were randomly allocated to new hires.¹⁶

Across all treatment conditions, the two weeks of training remained exactly the same for all agents, regardless of their treatment assignment. After the two weeks of training, new hires graduated to work as regular agents, began taking customer calls, and had measurable sales productivity metrics. It was only then that meetings with mentors commenced. To facilitate meeting coordination, the firm built specific times to meet into mentors’ and protégés’ schedules. The mentoring relationships lasted for four weeks in most cohorts (the study’s pilot program used a six-week design, which we discuss in Section 3.4).

Mentors and protégés met once per week for approximately 30 minutes and completed a worksheet. They were free to discuss any topic, but the worksheet had to be completed for the mentor to receive credit for the meeting (as described below). Records of meeting occurrences and completed worksheets were kept by the staff and given to us. Shortly after their final week of meetings, protégés were asked to complete a post-mentorship survey about their experience. Although completion rates for the final survey were low, we use the data to provide insight into whether meetings continued after the formal program and whether agents viewed the experience as beneficial.

3.2. Identifying Mentors

The firm’s staff sourced mentors by announcing to incumbent sales agents that a mentoring program for new hires would occur and that agents could volunteer to be a mentor. The staff directly asked some promising candidates to participate. Agents who the staff felt were not suitable to be mentors were excluded. Mentors were given two main incentives to participate. First, in exchange for each prescheduled, confirmed meeting they held with their protégé, they received internal currency (“kudos” dollars) worth approximately $10. Second, incumbent sales agents were told that effective mentoring would help demonstrate leadership potential for future promotion considerations. It is important to note that mentors in this setting had no formal supervisory role; they were more experienced peers who had proven track records of sales success.

Mentors were always randomly assigned to protégés. Online Appendix Table 1.A.1 shows that the observable characteristics of the mentors (age, gender, marital status, and tenure) are similar across the Voluntary- and Mandatory-Conditions, meaning that neither endogenous matching of mentors to protégés nor homophily explains differences in performance across the two high-level treatment conditions. Mentors were not informed about which condition their protégés were in.¹⁷

Table 1. Balance Tests for Treatment Assignment

Panel A: Cohort-level balance in agent characteristics

| | Mandatory-Condition (1) | Voluntary-Condition (2) | p-value, (2) − (1) |
|---|---|---|---|
| Age (years) | 22.70 (2.40) | 22.80 (2.34) | 0.887 |
| Female | 0.43 | 0.40 | 0.624 |
| Married | 0.13 | 0.16 | 0.522 |
| Hiring score | 0.83 (0.04) | 0.85 (0.04) | 0.207 |
| Adjusted hiring score | 0.84 (0.03) | 0.86 (0.03) | 0.029 |
| Referral | 0.57 | 0.58 | 0.746 |
| Number of cohorts | 21 | 31 | |

Panel B: Balance in agent characteristics for those eligible for mentor assignment

| | Mandatory: Mentored (1) | Mandatory: Non-mentored (2) | p-value, (2) − (1) | Voluntary: Mentored (3) | Voluntary: Non-mentored (4) | p-value, (4) − (3) |
|---|---|---|---|---|---|---|
| Age (years) | 22.40 (4.46) | 23.51 (8.60) | 0.193 | 22.47 (5.54) | 22.51 (6.18) | 0.945 |
| Female | 0.46 | 0.40 | 0.303 | 0.45 | 0.38 | 0.318 |
| Married | 0.09 | 0.15 | 0.150 | 0.15 | 0.17 | 0.722 |
| Hiring score | 0.83 (0.09) | 0.84 (0.08) | 0.432 | 0.85 (0.08) | 0.86 (0.08) | 0.508 |
| Adjusted hiring score | 0.83 (0.08) | 0.84 (0.07) | 0.322 | 0.86 (0.07) | 0.86 (0.07) | 0.522 |
| Referral | 0.58 | 0.55 | 0.649 | 0.56 | 0.60 | 0.543 |
| Number of agents | 127 | 137 | | 155 | 117 | |

Notes. This table presents balance tests. Most characteristics are self-explanatory, other than the hiring score, which is a recruiter-assigned measure of fit with the job, ranging from 0 to 1. The adjusted hiring score accounts for individual recruiter leniency, estimated using the productivity of non-mentor-eligible agents outside of the experiment, as described in Section 3.5. In panel A, we report average agent characteristics at the cohort level to test for assignment balance between the Mandatory- and Voluntary-Conditions. In panel B, we test for balance in subtreatment assignment to mentors. In panel B, the Voluntary-Condition sample is not comparable to the Mandatory-Condition sample because of selection into the program (Table 4 compares the characteristics of those who opt in and opt out in the Voluntary-Condition). Standard deviations are in parentheses for continuous variables. The p-values come from difference-in-means tests across high-level treatment conditions in panel A and for agents who do and do not receive mentors among those eligible for assignment in panel B.
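For readers who want to see how a difference-in-means test works from the summary statistics in a table like this, the sketch below computes a Welch-type t-statistic and a two-sided p-value from the large-sample normal approximation. It is an illustration only: the exact test behind the reported p-values (for example, whether it uses a t distribution, regression adjustments, or clustered standard errors) is not stated here, so results need not match the table exactly. The function name is hypothetical.

```python
import math


def welch_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Welch t-statistic for mean2 - mean1 and a normal-approximation two-sided p-value."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    t = (mean2 - mean1) / se
    # Two-sided p-value: 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2)).
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p


# Panel B, age, Mandatory-Condition: mentored (1) versus non-mentored (2).
t_age, p_age = welch_from_summary(22.40, 4.46, 127, 23.51, 8.60, 137)
```

This approximation gives t ≈ 1.33 and p ≈ 0.18, close to but not identical with the reported 0.193, which is consistent with the table relying on a somewhat different test specification.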

3.3. Hold-Out Cohorts to Test for SUTVA Violations

There were 217 agents hired throughout the experiment in cohorts that were ineligible for the mentorship program. Ineligibility arose largely because these cohorts entered at times when mentor supply was lacking. Insufficient mentor supply typically occurred when the firm hired a new cohort soon after another one finished training, but in some cases projected call volumes relative to available staffing meant that potential mentors would not have time to meet with new hires. Agents in these cohorts formed holdout groups that were not informed about the mentorship program. Variation in treatment eligibility at the cohort level allows us to test for discouragement effects in the control group and other possible violations of the Stable Unit Treatment Value Assumption (SUTVA). Although these holdout cohorts were not randomly assigned, they had similar characteristics to program-eligible cohorts in the same division and office. We leverage these holdout cohorts to compare the productivity of holdout new hires to the productivity of nontreated agents in program-eligible cohorts, showing that SUTVA violations were unlikely (see Online Appendix Section I.A).

3.4. Pilot Data

We piloted our design in the firm from January to May of 2019 to ensure that we could logistically implement the program. The pilot surfaced several virtues of the program while assuaging feasibility concerns: (i) There was sufficient interest among seasoned agents to mentor new hires, (ii) the firm could schedule meetings between mentors and protégés, (iii) mentors and protégés would engage with the protocol as designed, (iv) anecdotal evidence indicated that protégés felt that they benefited from the mentorship, and (v) there were no indications of discouragement among non-mentored agents.

As a result, we moved forward with the experimental design described thus far, which varied from the design of the pilot in only two ways. First, to accommodate scheduling, we changed the duration of the mentorship program from five meetings over six weeks (with a gap in week five) to four meetings over four weeks. Second, at the beginning of the pilot, the allocation of cohorts to the Mandatory-Condition and Voluntary-Condition was determined by the location of each cohort; that is, all cohorts at one office were allocated to one condition, and those at the second office were allocated to the other. This allocation was chosen to limit potential spillovers between the Mandatory- and Voluntary-Conditions (e.g., workers potentially talking about the choice to opt in). Within each condition, the firm’s staff observed no discussion of program logistics among new hires or spillover effects within or across cohorts. There were also no complaints from agents in the Voluntary-Condition who requested but did not receive a mentor. Accordingly, we determined that the risk of spillovers across conditions was small and that the logistics were feasible such that we could randomize Mandatory- and Voluntary-Condition assignment within offices as well.¹⁸

No other changes were made between the pilot period and the later cohorts. The preregistration text was finalized after the pilot and is documented in the Online Appendix (Section I.B).¹⁹

Based on power calculations and the hiring projections given to us by the firm, we expected the firm to hire 619 agents across 46 cohorts after the pilot period (May to December of 2019). The actual hiring at the firm was much less frequent and intense, with the firm bringing on only 276 agents across 27 cohorts that were eligible for the mentorship program. We were not able to extend the mentorship program into 2020 because COVID-19 forced all employees to work remotely. Because the firm’s actual hiring behavior was substantially less intense than expected, and given the similarity between the experimental design in the pilot period and the preregistered period, our empirical analyses include the 315 agents and 25 cohorts from the pilot to improve statistical power. We detect no differences in treatment effects or imbalance in worker characteristics between the pilot cohorts and those from the post-pilot period (see Table 1.A.2 in the Online Appendix).

3.5. Balance Across Treatments

Agent characteristics are balanced across the conditions of the experiment and across the treatment statuses within each condition for those agents eligible for randomization. Table 1, panel A, displays cohort-level balance tests for the Mandatory-Condition compared with the Voluntary-Condition (the top level of randomization). There are no significant between-condition differences in average agent age, gender, marital status, hiring score (recruiters’ evaluation of the worker’s suitability for the position), and referral status. The average agent age in both groups is about 23 years old, women make up 43% of the agents in the Mandatory-Condition and 40% of agents in the Voluntary-Condition, and 13% to 16% of agents are married in the two groups. The average hiring scores (which have a maximum value of 1) are 0.83 and 0.85, respectively. These scores are based on the recruiters’ perceptions of applicants’ sales experience, ability to adhere to the sales process, self-awareness, competitiveness, and personal motivation. We also report adjusted hiring scores, which take into account some recruiters’ relative scoring leniency compared with others—akin to curving grades received from one professor versus another. Throughout our analysis, we use the adjusted hiring score because it is a better predictor of opting out of the program relative to the raw hiring scores, but our results are not sensitive to the use of raw hiring scores, which we discuss in Section 4.3.1.²⁰

Table 1, panel B, considers the second level of randomization, the allocation of mentors to new hires within the Mandatory-Condition or Voluntary-Condition. Columns (1) and (2) show the agent-level average characteristics in the Mandatory-Condition for those who did and did not receive a mentor, respectively. These two groups are similar in age, gender, marital status, hiring scores, adjusted hiring scores, and referral status. Columns (3) and (4) and the associated p-values show that, among Voluntary-Condition agents who opted into the program, those assigned mentors and those who were not are also similar across these observable characteristics.²¹ We defer the discussion of differences between agents who opt into and out of the program to Section 4.3.1.

4. Estimation and Results

4.1. Treatment Effects on Productivity and Selection Into Mentoring

We estimate differences in productivity by high-level treatment condition (Mandatory or Voluntary) and low-level subtreatment cell (assigned a mentor, not assigned a mentor, or opted out). We refer to agents assigned a mentor as “mentored,” which we use to denote treatment assignment in an intention-to-treat framework. Our main productivity outcomes of interest, $y_{i, t}$ , are total daily sales revenue (Revenue) and daily revenue per call (RPC). Total daily revenue relates directly to the firm’s profitability while accounting for the opportunity cost of time spent meeting a mentor. RPC captures selling efficiency on a given opportunity. We also form a composite index of productivity measures that incorporates two additional—albeit less central—performance measures that are tracked by the firm. These are adherence, which captures on a zero-to-one scale how closely agents adhere to their preset schedules (i.e., are available to take calls when they are supposed to be on the phones), and revenue per hour, which scales total revenue by hours worked. We preregistered a natural specification to capture percentage changes in revenue and RPC within a cohort. Cohort fixed effects sweep out division-level differences in baseline revenue and RPC (because cohorts are assigned to a single division). To account for the fact that some agents have days with zero sales revenue, we use the inverse hyperbolic sine transformation (IHS). Parameter estimates can be interpreted as approximate percentage changes.²² We use a sample of agent-day productivity data for all program-eligible agents in their first two months on the job after completing training (estimates for months 3–6 are discussed in Section 4.1.4).
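To see why coefficients on IHS outcomes read as approximate percentage changes, recall that $\operatorname{arcsinh}(x) = \ln(x + \sqrt{x^{2} + 1})$. Unlike $\ln(x)$, it is defined at zero, and for large $x$ it is close to $\ln(2x)$, so differences in IHS(Revenue) behave like log differences on agent-days with substantial revenue. The minimal check below uses the standard library's `math.asinh`; the revenue values are illustrative, not data from the experiment.

```python
import math

def ihs(x: float) -> float:
    """Inverse hyperbolic sine, ln(x + sqrt(x**2 + 1)); finite at x = 0, unlike ln(x)."""
    return math.asinh(x)

# Illustrative daily revenue values (not experimental data).
for revenue in (0.0, 10.0, 100.0, 1000.0):
    log_approx = math.log(2 * revenue) if revenue > 0 else float("-inf")
    print(f"revenue={revenue:>7.1f}  ihs={ihs(revenue):8.4f}  ln(2x)={log_approx:8.4f}")

# A 10% revenue increase on a high-revenue day shifts IHS by about ln(1.1).
print(ihs(1100.0) - ihs(1000.0), math.log(1.1))
```

The approximation is poor for small values (for example, $\operatorname{arcsinh}(1) \approx 0.88$ while $\ln 2 \approx 0.69$), so the percentage reading applies to days with meaningful revenue.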

Our first specification is a linear regression, estimated separately for the Mandatory- and Voluntary-Conditions on the sample of agents who were eligible to be assigned a mentor (i.e., agents who did not opt out of the program):

$$y_{i,t} = α + β_{1}\,\text{Mentored}_{i} + γ_{j} + ε_{i,t} \tag{1}$$

The variable $\text{Mentored}_{i}$ is an indicator taking the value of one for agents who were randomly assigned to receive a mentor. The t subscript denotes the calendar date, and $γ_{j}$ is a cohort fixed effect at the unit of randomization that absorbs product- and brand-level differences. We cluster standard errors by cohort for workers entering the experiment after the pilot period and by office for workers entering during the pilot (recall that the pilot assigned the Mandatory- and Voluntary-Conditions at the office level).
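A minimal sketch of how Equation (1) can be estimated with a single treatment indicator: by the Frisch-Waugh-Lovell theorem, demeaning both the outcome and the Mentored indicator within cohort and regressing one on the other reproduces $β_{1}$ from the regression with cohort fixed effects. The `cluster_id` helper encodes the clustering rule just described (office for pilot-period agents, cohort afterward). The standard error is the basic CR0 cluster-robust form, without the small-sample corrections that statistical packages usually apply, and it is unreliable with very few clusters. All function names and toy rows are hypothetical; in practice one would use a package such as `statsmodels` (`smf.ols(...).fit(cov_type="cluster", cov_kwds={"groups": ...})`).

```python
import math
from collections import defaultdict

def cluster_id(is_pilot, office, cohort):
    """Clustering rule: pilot-period agents by office, later agents by cohort."""
    return f"office-{office}" if is_pilot else f"cohort-{cohort}"

def demean_by(values, groups):
    """Within transformation: subtract each group's mean (absorbs group fixed effects)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]

def fe_ols(y, mentored, cohorts, clusters):
    """OLS of y on one indicator plus cohort fixed effects (via demeaning).

    Returns (beta_1, CR0 cluster-robust standard error with no small-sample correction).
    """
    y_w = demean_by(y, cohorts)
    x_w = demean_by(mentored, cohorts)
    sxx = sum(x * x for x in x_w)
    beta = sum(x * v for x, v in zip(x_w, y_w)) / sxx
    # Sum the score x_i * e_i within each cluster, then square and add across clusters.
    scores = defaultdict(float)
    for x, v, c in zip(x_w, y_w, clusters):
        scores[c] += x * (v - beta * x)
    se = math.sqrt(sum(s * s for s in scores.values())) / sxx
    return beta, se

# Hypothetical agent-day rows: (cohort, is_pilot, office, mentored, daily_revenue).
rows = [
    ("c1", False, 1, 1, 120.0), ("c1", False, 1, 0, 100.0),
    ("c1", False, 1, 1, 130.0), ("c1", False, 1, 0, 90.0),
    ("p1", True, 2, 1, 80.0), ("p1", True, 2, 0, 70.0),
    ("p1", True, 2, 1, 95.0), ("p1", True, 2, 0, 0.0),
]
y = [math.asinh(r[4]) for r in rows]  # IHS(Revenue)
mentored = [r[3] for r in rows]
cohorts = [r[0] for r in rows]
clusters = [cluster_id(r[1], r[2], r[0]) for r in rows]
beta_1, se = fe_ols(y, mentored, cohorts, clusters)
print(f"beta_1 = {beta_1:.3f}, clustered SE = {se:.3f}")
```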

4.1.1. Treatment Effects in the Mandatory-Condition.

Because of the random assignment of mentors to some Mandatory-Condition agents and not to others, when estimating Equation (1), the parameter $β_{1}$ is the average treatment effect of receiving a mentor across the entire population, not the treatment effect conditional on opting into the program. We tabulate the results in columns (1) and (2) of Table 2 for IHS(Revenue) and IHS(RPC), respectively. In both columns, we estimate positive effects that are statistically significantly different from zero. The estimate in column (1) implies that mentored agents in the Mandatory-Condition generated 18.6% ( $= e^{0.171} - 1$ , p-value = 0.002) more daily sales revenue than their non-mentored peers. The estimate in column (2) implies that mentored agents generated 11.9% ( $= e^{0.112} - 1$ , p-value = 0.003) more in revenue per call.²³ To understand the magnitude of the estimates, we compare them to the baseline gap between new hires and experienced agents. In our setting, the average experienced worker is about 30% more productive than the average new hire (see Table 1.A.5 in the Online Appendix), suggesting that the program accelerates on-the-job learning, although it fails to fully close the average productivity gap between new hires and experienced agents.²⁴
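The percentage effects quoted in this subsection, and the opt-out gaps in Section 4.1.3, follow from exponentiating coefficients on IHS outcomes: percent change $= 100 \times (e^{β} - 1)$. A short Python check using the point estimates from Table 2:

```python
import math

def pct_change(beta: float) -> float:
    """Approximate percentage change implied by a coefficient on an IHS (or log) outcome."""
    return 100 * (math.exp(beta) - 1)

estimates = [
    ("Mandatory, mentored, IHS(Revenue), col. (1)", 0.171),   # prints 18.6%
    ("Mandatory, mentored, IHS(RPC), col. (2)", 0.112),       # prints 11.9%
    ("Voluntary opt-out, IHS(Revenue), col. (5)", -0.369),    # prints -30.9%
    ("Voluntary opt-out, IHS(RPC), col. (6)", -0.264),        # prints -23.2%
]
for label, beta in estimates:
    print(f"{label}: {pct_change(beta):.1f}%")
```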

Table 2. Treatment and Selection Effects of Mentoring on Productivity

	Mandatory-Condition		Voluntary-Condition		Voluntary-Condition		Both Conditions
	(All agents)		(Opt-in agents)		(Non-mentored agents)		(All agents)
	IHS(Rev)	IHS(RPC)	IHS(Rev)	IHS(RPC)	IHS(Rev)	IHS(RPC)	IHS(Rev)	IHS(RPC)	Index
	(1)	(2)	(3)	(4)	(5)	(6)	(7)	(8)	(9)
Mentored	0.171***	0.112***	−0.084	−0.084			0.171***	0.112***	0.145***
Standard error	(0.039)	(0.027)	(0.080)	(0.054)			(0.038)	(0.026)	(0.052)
Sharpened q-value							[0.001]	[0.002]	[0.013]
Voluntary Opt-out					−0.369***	−0.264***	−0.277***	−0.161***	−0.141***
Standard error					(0.109)	(0.068)	(0.098)	(0.053)	(0.028)
Sharpened q-value							[0.013]	[0.010]	[0.001]
Mentored $\times$ Voluntary							−0.272***	−0.207***	−0.203***
Standard error							(0.086)	(0.062)	(0.054)
Sharpened q-value							[0.009]	[0.007]	[0.003]

Cohort fixed effects	✓	✓	✓	✓	✓	✓	✓	✓	✓
Adjusted R²	0.028	0.032	0.020	0.046	0.047	0.066	0.026	0.040	0.059
Observations	6,725	6,725	7,569	7,569	4,734	4,734	15,670	15,670	15,670
p-value: Mentored + Mentored $\times$ Voluntary							0.241	0.124	0.155

Notes. This table reports estimates of the different treatment and selection effects from the mentorship program. The sample is composed of agent-day productivity data across agents’ first two months on the job after they complete training. IHS(.) indicates a variable that is transformed by the inverse hyperbolic sine. Revenue (“Rev”) is daily total revenue, RPC is revenue per call, and RPH is revenue per hour. Mentored equals one for agents who were randomized to receive an available mentor and zero otherwise, Voluntary equals one for agents in the Voluntary-Condition and zero otherwise, and Voluntary Opt-Out equals one for agents who chose to opt out of possibly receiving a mentor and zero otherwise. The specifications in columns (1) and (2) include all agents in the Mandatory-Condition. The specifications in columns (3) and (4) include agents in the Voluntary-Condition who signaled their interest in receiving a mentor (i.e., those who opted in). The specifications in columns (5) and (6) include agents in the Voluntary-Condition who were not assigned a mentor, including those who opted out of the program. The specifications in columns (7)–(9) include all agents from both conditions. The dependent variable in column (9), Index, is the standardized weighted index of IHS(Revenue), IHS(RPC), IHS(RPH), and Adherence normalized using data from the non-mentored agents in the Mandatory-Condition (see the text for additional details). We estimate ordinary least squares regressions with cohort fixed effects in all columns. Standard errors are clustered by cohort for workers entering the experiment after the pilot period and by office for workers entering during the pilot (because the pilot assigned the Mandatory- and Voluntary-Conditions at the office level) and are reported in parentheses. Sharpened q-values that adjust for the false discovery rate are presented in brackets, following Anderson (2008). The bottom row reports the p-values from post-estimation tests that the sum of the coefficients on Mentored and Mentored $\times$ Voluntary equals zero.

*Statistical significance at the 10% level; **statistical significance at the 5% level; ***statistical significance at the 1% level.

4.1.2. Treatment Effects in the Voluntary-Condition.

Next, we estimate Equation (1) on the sample of Voluntary-Condition agents who opted into the program. If the employees who are likely to benefit the most from mentorship are also those who opt into the program, then we would expect to see treatment effects in the Voluntary-Condition that exceed those in the Mandatory-Condition. If, however, the employees who are likely to benefit the most are also those who opt out of the program, then treatment effects in the Voluntary-Condition will be smaller than those in the Mandatory-Condition. Because of the random assignment of mentors to some opt-in agents and not to others, the parameter $β_{1}$ in the Voluntary-Condition is the average treatment effect of receiving a mentor conditional on selection into program participation. The results are in columns (3) and (4) of Table 2 for IHS(Revenue) and IHS(RPC), respectively. In both columns, the estimated effects are statistically indistinguishable from zero. Random assignment to be mentored had a negligible effect on the productivity of workers in the Voluntary-Condition conditional on their opting into the program. These estimates are much smaller than those in the Mandatory-Condition. Had the analysis been conducted only among those who selected into randomization, which is typical for many RCTs across disciplines ranging from medicine to economics, we would have falsely concluded that the program was not effective in the population. Instead, these results suggest that different procedures for administering the program can change inference. We turn now to assessing why estimates differ across conditions.

4.1.3. Self-Selection in the Voluntary-Condition.

How much of the difference in treatment effects between the Mandatory- and Voluntary-Conditions arises from selection into participation? To provide color on just how much selection bias may be present, we estimate how workers’ baseline productivity varies as a function of whether they volunteer to participate. Specifically, we compare non-mentored agents in the Voluntary-Condition who opt into the program with those who opt out using the following regression:

$$y_{i,t} = α + β_{1}\,\text{Voluntary Opt-Out}_{i} + γ_{j} + ε_{i,t} \tag{2}$$

The variable Voluntary Opt-Out $_{i}$ is an indicator for agents who opted out of the mentorship program when given the opportunity. The parameter $β_{1}$ captures the difference in productivity between agents who did and those who did not opt out of the program. Results for revenue and revenue per call are reported in columns (5) and (6) of Table 2, respectively. The estimates imply that opt-out agents generated 30.9% ( $= e^{- 0.369} - 1$ , p-value = 0.003) less revenue per day than non-mentored, opt-in agents and had 23.2% ( $= e^{- 0.264} - 1$ , p-value = 0.001) lower productivity on a per-call basis. The agents who opted into program participation were significantly more productive, on average, than those who opted out.

4.1.4. Pooled Estimates, Additional Productivity Measures, Multiple Tests, and Long-Term Outcomes.

In columns (7)–(9) of Table 2, we estimate all three effects of interest simultaneously in a single model that includes all mentor-eligible agents across both the Mandatory-Condition and the Voluntary-Condition. The model is

$$y_{i,t} = α + β_{1}\,\text{Mentored}_{i} + β_{2}\,\text{Mentored}_{i} \times \text{Voluntary}_{i} + β_{3}\,\text{Voluntary Opt-Out}_{i} + γ_{j} + ε_{i,t}, \tag{3}$$

where $\text{Mentored}_{i}$ indicates that the agent was randomly assigned to receive a mentor, $\text{Voluntary}_{i}$ equals one for agents in the Voluntary-Condition, and $\text{Voluntary Opt-Out}_{i}$ equals one for agents in the Voluntary-Condition who opted out of the mentorship program. In this model, $β_{1}$ captures the treatment effect of mentorship among agents in the Mandatory-Condition, $β_{2}$ captures the difference in treatment effects among opt-in agents in the Voluntary-Condition relative to mentored agents in the Mandatory-Condition, and $β_{3}$ captures the selection effect among non-mentored agents in the Voluntary-Condition. The baseline effects for the Mandatory- and Voluntary-Conditions are absorbed by the cohort fixed effects, which also control for differences in productivity that are specific to the time when agents entered the firm and the differing products sold.²⁵ Pooling the models allows us to test whether treatment effects differ between conditions.

The pooled model results for IHS(Revenue) and IHS(RPC) are reported in columns (7) and (8) of Table 2, respectively. In the top row, the productivity treatment effects for the Mandatory-Condition are identical to the prior estimates. In the second row, the point estimates of productivity differences for those who opt out in the Voluntary-Condition are similar to the prior estimates, but they are not identical because the sample changes relative to the columns that focus only on comparing unmentored agents. The third row shows that the treatment effect of receiving a mentor in the Voluntary-Condition is statistically different from the treatment effect of receiving a mentor in the Mandatory-Condition. Because the treatment effect for agents mentored in the Voluntary-Condition equals the sum of the coefficients on Mentored and Mentored $\times$ Voluntary, the bottom row of Table 2 reports tests of the null hypothesis that this sum is zero; none of these tests rejects the null at conventional levels (p-values of 0.124 to 0.241).
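To make the mapping between Equation (3) and the rows of Table 2 explicit, the sketch below lists the regressor values for each experimental cell (the `CELLS` table and helper names are hypothetical) and recovers the implied Voluntary-Condition treatment effect, $β_{1} + β_{2}$, from the point estimates in columns (7) and (8). Cohort fixed effects are omitted because mentored and non-mentored opt-in agents are compared within the same cohorts. The point estimates alone do not give the standard error of the sum, which also requires the covariance between the two coefficients; that is why the table reports a post-estimation test instead.

```python
# Regressor values in Equation (3) for each experimental cell (cohort fixed effects omitted).
CELLS = {
    "mandatory_not_mentored":       {"Mentored": 0, "Mentored x Voluntary": 0, "Voluntary Opt-Out": 0},
    "mandatory_mentored":           {"Mentored": 1, "Mentored x Voluntary": 0, "Voluntary Opt-Out": 0},
    "voluntary_optin_not_mentored": {"Mentored": 0, "Mentored x Voluntary": 0, "Voluntary Opt-Out": 0},
    "voluntary_optin_mentored":     {"Mentored": 1, "Mentored x Voluntary": 1, "Voluntary Opt-Out": 0},
    "voluntary_optout":             {"Mentored": 0, "Mentored x Voluntary": 0, "Voluntary Opt-Out": 1},
}

def linear_index(cell, coefs):
    """Contribution of the three indicators to the predicted outcome for an agent in `cell`."""
    return sum(coefs[name] * value for name, value in CELLS[cell].items())

def contrast(cell_a, cell_b, coefs):
    """Predicted outcome difference between two cells observed in the same cohort."""
    return linear_index(cell_a, coefs) - linear_index(cell_b, coefs)

# Point estimates from Table 2, columns (7) and (8).
col7 = {"Mentored": 0.171, "Mentored x Voluntary": -0.272, "Voluntary Opt-Out": -0.277}
col8 = {"Mentored": 0.112, "Mentored x Voluntary": -0.207, "Voluntary Opt-Out": -0.161}

for label, coefs in (("IHS(Revenue)", col7), ("IHS(RPC)", col8)):
    mandatory_te = contrast("mandatory_mentored", "mandatory_not_mentored", coefs)
    voluntary_te = contrast("voluntary_optin_mentored", "voluntary_optin_not_mentored", coefs)
    print(f"{label}: mandatory {mandatory_te:+.3f}, voluntary {voluntary_te:+.3f}")
```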

To capture the suite of productivity measures, in column (9) we repeat the pooled analysis with an alternative dependent variable that factors in adherence to schedule and revenue per hour as additional outcomes. We construct a standardized, weighted summary Index of all performance metrics (see Anderson 2008): IHS(Revenue), IHS(RPC), IHS(RPH), and Adherence. The measure is normalized to have mean zero and unit standard deviation for non-mentored agents in the Mandatory-Condition.²⁶ We continue to find that the program generally raised productivity when it was mandatory, that the program had no detectable effect in the Voluntary-Condition, and that Voluntary-Condition participants who opted in were stronger than those who opted out. The economic magnitudes of the point estimates in column (9) must be interpreted differently from the other columns because they are in standard deviation units relative to the control group. Thus, treatment in the Mandatory-Condition raised overall productivity by 0.145 standard deviations, and agents in the Voluntary-Condition who opted out had productivity that was 0.141 standard deviations lower than that of non-mentored agents who opted in.
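The construction of the Index can be sketched as follows, assuming the inverse-covariance weighting of Anderson (2008): standardize each outcome by the mean and standard deviation of the non-mentored Mandatory-Condition agents, weight each standardized outcome by the corresponding row sum of the inverted covariance matrix, average, and re-standardize against the same control group. Which sample the covariance matrix is estimated on, and how missing outcomes are handled, are assumptions here (the full sample, no missing values); this is an illustration, not the authors' code. It uses only the standard library plus a small Gauss-Jordan matrix inverse.

```python
import statistics

def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (for small, well-conditioned matrices)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def summary_index(outcomes, is_control):
    """Inverse-covariance-weighted index of standardized outcomes (after Anderson 2008).

    outcomes: one list per agent, e.g. [ihs_rev, ihs_rpc, ihs_rph, adherence].
    is_control: True for the normalization group (non-mentored, Mandatory-Condition).
    """
    n, k = len(outcomes), len(outcomes[0])
    # 1. Standardize each outcome by the control group's mean and SD.
    z = []
    for col in zip(*outcomes):
        ctrl = [v for v, c in zip(col, is_control) if c]
        mu, sd = statistics.mean(ctrl), statistics.stdev(ctrl)
        z.append([(v - mu) / sd for v in col])
    # 2. Weights are row sums of the inverse covariance matrix of the standardized outcomes.
    means = [statistics.mean(col) for col in z]
    cov = [[sum((a - means[p]) * (b - means[q]) for a, b in zip(z[p], z[q])) / (n - 1)
            for q in range(k)] for p in range(k)]
    weights = [sum(row) for row in invert(cov)]
    # 3. Weighted average of the standardized outcomes for each agent.
    raw = [sum(w * z[p][i] for p, w in enumerate(weights)) / sum(weights) for i in range(n)]
    # 4. Re-standardize so the control group has mean 0 and SD 1.
    ctrl_raw = [v for v, c in zip(raw, is_control) if c]
    mu, sd = statistics.mean(ctrl_raw), statistics.stdev(ctrl_raw)
    return [(v - mu) / sd for v in raw]
```

Outcomes that are highly correlated with the others receive less weight, so measures carrying overlapping information are not double counted in the Index.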

We also correct for multiple hypothesis testing using a second suggestion by Anderson (2008): we report sharpened q-values, which are analogous to p-values after adjusting for the false discovery rate (FDR). The q-values, reported in brackets below the standard errors in Table 2, indicate that inference on our main point estimates is robust to controlling the expected proportion of false positives among rejected hypotheses.²⁷
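Sharpened q-values come from the two-stage procedure of Benjamini, Krieger, and Yekutieli (2006) as applied in Anderson (2008): for each candidate level q on a grid from 1 down to 0.001, run a Benjamini-Hochberg step at q/(1 + q), use the number of first-stage rejections to estimate the number of true nulls, rerun the step at the correspondingly inflated level, and record the smallest q at which each hypothesis is rejected. The Python sketch below is our transcription of that logic, not the authors' code, and the p-values in the example are hypothetical.

```python
def sharpened_qvalues(pvals):
    """Two-stage (BKY 2006) sharpened q-values on a 0.001 grid, after Anderson (2008)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    rank = [0] * m
    for r, i in enumerate(order, start=1):
        rank[i] = r
    qvals = [1.0] * m
    for k in range(1000, 0, -1):          # q = 1.000, 0.999, ..., 0.001
        q = k / 1000
        q1 = q / (1 + q)                  # first-stage level q' = q / (1 + q)
        # Benjamini-Hochberg step-up: largest rank r with p_(r) <= q' * r / m.
        r1 = max((rank[i] for i in range(m) if pvals[i] <= q1 * rank[i] / m), default=0)
        if r1 == m:
            r2 = m                        # every hypothesis rejected in the first stage
        else:
            q2 = q1 * m / (m - r1)        # second stage: scale by m / (estimated true nulls)
            r2 = max((rank[i] for i in range(m) if pvals[i] <= q2 * rank[i] / m), default=0)
        for i in range(m):
            if rank[i] <= r2:
                qvals[i] = q              # smallest q so far at which hypothesis i is rejected
    return qvals

# Hypothetical p-values, for illustration only.
print(sharpened_qvalues([0.001, 0.004, 0.02, 0.3]))
```

Because the grid is scanned from the top down and a rejection at rank r implies rejection of all lower ranks, the resulting q-values are weakly increasing in the p-values.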

In Table 1.A.6 in the Online Appendix, we show that about 45% of the magnitude of the point estimates from months 1 to 2 persists through months 3–6 for mentored workers in the Mandatory-Condition, whereas the effect of having a mentor in the Voluntary-Condition remains close to zero. The longer-term point estimates have larger standard errors than the effects at one to two months of tenure for two primary reasons: (a) residual variation increases as agents gain experience, causing productivity to fan out, and (b) fewer agents remain at the firm over longer time horizons. Although we lose precision, the pattern of estimates suggests that the mentorship program helped treated workers in the Mandatory-Condition over the longer term.

Table 6. Meeting Completion Rates Across Conditions in the Experiment

	Mandatory-	Voluntary-	Diff.	Mandatory	Diff.
	Condition	Condition	p-value	Low $_{Opt}$	p-value
	(1)	(2)	(3)	(4)	(5)
Number of agents	127	155		89
At least one recorded meeting	109	130		77
No recorded meeting	18	25		12
Number of recorded meetings (avg.)	2.31	2.11	0.260	2.42	0.115
	(1.58)	(1.36)		(1.61)
Meeting completion ratio (avg.)	0.74	0.64	0.031	0.77	0.014
	(0.38)	(0.38)		(0.36)

Notes. In this table, we report the mentor meeting completion details of protégés in the Mandatory-Condition and the Voluntary-Condition and among protégés in the Mandatory-Condition with low opt-out propensity scores. No Recorded Meeting indicates that there is no record that the mentor-protégé pair ever met with one another. The Meeting Completion Ratio measure is based on the number of possible meetings the mentor-protégé pair could have had. Although the preregistered mentoring protocol called for one meeting per week for four weeks, there were instances in which either a mentor or protégé or both were absent from work for an extended period of time (e.g., on vacation), reducing the number of possible scheduled meetings from four to three (or fewer, in some cases). As such, the denominator of the meeting completion ratio is occasionally less than four. Column (4) considers agents in the Mandatory-Condition with opt-out propensity scores in the bottom two terciles. The p values in column (3) are from difference-in-means comparisons of the values in columns (1) and (2). The p values in column (5) are from difference-in-means comparisons of the values in columns (2) and (4).

4.2. The Mentorship Program Did Not Impact Worker Retention

Call centers have notoriously high levels of attrition (Hoffman et al. 2017), and retention is a key performance metric for the HR executives at the firm. To estimate retention effects from the mentorship program, we use data with a single observation per unique mentor-eligible agent among those who completed training,²⁸ and we create an indicator variable Tenure₃₀ (Tenure₆₀ ) that equals one for agents who remain with the firm for at least 30 (60) days after their hire date and zero otherwise. We then re-estimate each of the models specified by Equations (1)–(3) with these two tenure achievement indicators as the dependent variables.
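A minimal sketch of the tenure indicators, assuming each agent has a hire date and either a separation date or is still employed at the end of the observation window. The day-count convention (calendar days from hire to separation, with at least 30 or 60 required) and the function name are assumptions; agents hired fewer than 60 days before the observation cutoff would be right-censored and would need separate handling.

```python
from datetime import date

def tenure_indicators(hire_date, separation_date, as_of):
    """Return hypothetical Tenure30 / Tenure60 indicators for one agent.

    separation_date is None if the agent is still employed on `as_of`, the end of the
    observation window; the window should extend at least 60 days past hire.
    """
    end = separation_date if separation_date is not None else as_of
    days = (end - hire_date).days
    return {"Tenure30": int(days >= 30), "Tenure60": int(days >= 60)}

# Hired June 1, separated July 15 (44 days later).
print(tenure_indicators(date(2019, 6, 1), date(2019, 7, 15), date(2020, 3, 1)))
# prints {'Tenure30': 1, 'Tenure60': 0}
```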

In Table 3, we find no evidence that mentorship affected agents’ retention. Agents who opt out of the program in the Voluntary-Condition are less likely to achieve 60 days of tenure than non-mentored agents who opt in, although the estimate is noisy (significant only at the 10% level, in the pooled specification in column (8)). There are no discernible retention effects among agents who were mentored relative to those who were not at these horizons or at longer horizons (see Table 1.A.7 in the Online Appendix).²⁹

Table 3. Treatment Effects on Retention

	Mandatory-Condition		Voluntary-Condition		Voluntary-Condition		Both conditions
	(All agents)		(Opt-in agents)		(Non-mentored agents)		(All agents)
	Tenure₃₀	Tenure₆₀	Tenure₃₀	Tenure₆₀	Tenure₃₀	Tenure₆₀	Tenure₃₀	Tenure₆₀
	(1)	(2)	(3)	(4)	(5)	(6)	(7)	(8)
Mentored	−0.001	−0.009	−0.001	−0.072			−0.001	−0.009
	(0.036)	(0.112)	(0.046)	(0.060)			(0.035)	(0.108)
Voluntary Opt-out					−0.117	−0.190	−0.054	−0.196*
					(0.079)	(0.126)	(0.058)	(0.104)
Mentored $\times$ Voluntary							−0.018	−0.072
							(0.062)	(0.147)

Cohort fixed effects	✓	✓	✓	✓	✓	✓	✓	✓
Adjusted R²	0.041	0.009	0.003	0.066	0.036	0.084	0.034	0.040
Observations	264	264	272	272	172	172	591	591
Mean value of tenure $_{i}$	0.86	0.61	0.92	0.67	0.91	0.65	0.89	0.63
p-value: Mentored + Mentored $\times$ Voluntary							0.651	0.194

Notes. The sample used is composed of a single observation per agent among all mentor-eligible agents with post-training productivity data. ${Tenure}_{30}$ (${Tenure}_{60}$) equals one for agents who remain with the firm for at least 30 (60) days after their hire date and zero otherwise. Mentored equals one for agents who were randomized to receive an available mentor and zero otherwise, Voluntary equals one for agents in the Voluntary-Condition and zero otherwise, and Voluntary Opt-Out equals one for agents who chose to opt out of possibly receiving a mentor and zero otherwise. The specifications in columns (1) and (2) include all agents in the Mandatory-Condition. The specifications in columns (3) and (4) include agents in the Voluntary-Condition who signaled their interest in receiving a mentor (i.e., those who opted in). The specifications in columns (5) and (6) include agents in the Voluntary-Condition who were not assigned a mentor, including those who opted out of the program. The specifications in columns (7) and (8) include agents from both conditions. We estimate ordinary least squares regressions with cohort fixed effects in all columns. Standard errors are clustered by cohort for workers entering the experiment after the pilot period and by office for workers entering during the pilot (because the pilot assigned the Mandatory- and Voluntary-Conditions at the office level) and are reported in parentheses. The penultimate row reports the average value of Tenure $_{i}$ for the sample of agents used in the specification within that column. The bottom row reports the p-values from post-estimation tests that the sum of the coefficients on Mentored and Mentored $\times$ Voluntary equals zero.

*Statistical significance at the 10% level; **statistical significance at the 5% level; ***statistical significance at the 1% level.

Table 7. Survey Data on the Characteristics of Human Capital Development Programs and Participation in Voluntary Programs

	Is the program	If it is offered,	If it is voluntary,
Program type:	offered?	is it voluntary?	do you not participate?
Formal mentorship	0.45	0.59	0.27
	(0.01)	(0.01)	(0.02)
New-hire training	0.87	0.22	0.21
	(0.01)	(0.01)	(0.02)
Ongoing training or cont. ed.	0.80	0.43	0.28
	(0.01)	(0.01)	(0.01)
N = 3,191

Notes. This table displays summary statistics on the prevalence and administrative choices for different human capital development programs. Means and standard errors (in parentheses) are reported. Data come from a nationally representative online survey conducted through the Lucid platform in June 2022. The survey was restricted to respondents currently employed by others. Respondents were asked about whether their employer offers a particular program and whether it is voluntary or mandatory with the following question: “Consider your current employer. Which of the following programs does your employer offer to you personally? If offered, are you required to participate (required/mandatory) or can you choose to participate or not (optional/voluntary)?” For each program, respondents chose between “Required or Mandatory,” “Optional or Voluntary,” or “Not offered.” For the three core programs—mentorship, new-hire training, and continuing education—if a respondent indicated that a program was voluntary, then follow-up questions were asked about their participation and the reasons for their lack of participation, if applicable. As reported in the text, the survey also asked about workplace wellness programs to benchmark responses against other sources.

To further analyze the relation between mentorship and retention, we plot the distribution of completed tenure for mentored agents in Figure 1.A.2 in the Online Appendix. Specifically, we plot the distribution of completed tenure, in years, for each mentored agent in the Mandatory-Condition (solid line) and for each mentored agent in the Voluntary-Condition (dashed line).³⁰ Comparing the distributions, we see that mentored agents in the Mandatory-Condition realize slightly higher levels of retention than mentored agents in the Voluntary-Condition, but formal tests of differences in means, standard deviations, and distributions do not reject the null that the two groups realize the same tenure outcomes.³¹ Because the distribution of completed tenure among mentored agents does not differ significantly between the two conditions, and because mentorship does not appear to affect agents’ retention, it is unlikely that the differences in productivity treatment effects between conditions are driven by differences in attrition, a point we return to in Section 4.4.2.
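The specific tests used are given in footnote 31. As an illustration only, one standard test of distribution equality is the two-sample Kolmogorov-Smirnov test, whose statistic is the largest vertical gap between the two empirical CDFs. The sketch computes that statistic in plain Python; the tenure values are hypothetical, and for p-values one would typically call `scipy.stats.ks_2samp`, which returns both the statistic and a p-value.

```python
from bisect import bisect_right

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    # The supremum of |F_a - F_b| is attained at one of the pooled sample points.
    for x in a + b:
        f_a = bisect_right(a, x) / len(a)
        f_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(f_a - f_b))
    return gap

# Hypothetical completed-tenure values in years, for illustration only.
mandatory = [0.2, 0.5, 0.9, 1.4, 2.0, 2.6]
voluntary = [0.1, 0.4, 0.8, 1.1, 1.7, 2.2]
print(ks_statistic(mandatory, voluntary))
```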

4.3. Selection and Treatment Effect Heterogeneity

We study the mechanisms underlying the differing productivity treatment effects between the Mandatory-Condition and the Voluntary-Condition throughout the rest of the paper. Although our initial evidence suggests that heterogeneous treatment effects and self-selection might be the cause of the differences, these channels are likely not big enough to explain the totality of productivity treatment effect differences between the two conditions. We also explore several alternative channels that could have caused the treatment effects to differ. Because our experiment was done in the field, the greater effectiveness of mentorship in the Mandatory-Condition could have been caused by logistical differences, framing issues, or other unobserved factors—all of which could be expected to (similarly) surface in any real-world intervention or test involving the implementation of a mandatory program versus a voluntary program. We will use differences in outcomes for agents within the Mandatory-Condition to show the presence of treatment effect heterogeneity that varies with selection probabilities, and our evidence on other channels is suggestive of a program framing effect.

If differences in treatment effects across conditions are driven by self-selection, then there must be treatment effect heterogeneity such that those who are most likely to opt out of the program have the largest gains when they receive mentorship. We now turn to understanding selection before evaluating the other explanations.

4.3.1. Differences Between Agents Who Opt Out and Those Who Opt In.

We first consider how agents who opt out differ from those who opt into the program. We restrict the sample to the 365 agents in the Voluntary-Condition who were given the choice to opt in or out, and we estimate logistic regressions of a Voluntary Opt-Out indicator on worker characteristics.³²

The main conclusion from this analysis is that low hiring scores, assigned during the interview stage, are the best predictors of opting out of the program. Worker demographics from the firm’s personnel records and personality traits (obtained via onboarding surveys) do little to explain program participation. Failure to complete the onboarding surveys also predicts opting out of the program, but this is not a characteristic that is measurable up front. These results can be seen in Table 4, columns (1)–(4), which report marginal effects from logit models predicting the opt-out decision. In column (1), we find no difference in age or marital status between agents who opt in and those who opt out. The impact of gender is significant at the 5% level in column (1), but it loses explanatory power and decreases substantially in magnitude when controlling for additional covariates. Participation decisions do not depend on an agent’s location (a fixed effect for one office compared with the other) or whether an existing employee referred the agent (following Friebel et al. 2023). Participation decisions also do not depend on whether the agent had prior sales experience (which we collected from the new-hire survey for 341 agents) or fixed effects for the agents’ assigned division. Personality characteristics, also collected from the new-hire survey, are weak predictors of the opt-out decision. However, we find that agents who did not complete the new-hire survey and those with prior call center experience have a higher propensity to opt out, as reported in column (4).

Table 4. Determinants of Program Opt-Out and the Relationship Between Opting Out, Productivity, and Worker Characteristics


Dep. variable	= 1 if Opted Out				IHS(Revenue)		IHS(RPC)
	(1)	(2)	(3)	(4)	(5)	(6)	(7)
Age	0.002	0.001	0.002	−0.001	−0.002	−0.004	−0.001
	(0.002)	(0.002)	(0.001)	(0.003)	(0.005)	(0.005)	(0.003)
Female	−0.064**	−0.011	−0.007	−0.019	−0.104	−0.093	−0.084
	(0.029)	(0.030)	(0.032)	(0.032)	(0.136)	(0.128)	(0.084)
Married	−0.012	−0.019	−0.017	0.004	0.067	0.119	0.105
	(0.049)	(0.058)	(0.055)	(0.049)	(0.156)	(0.185)	(0.124)
Adjusted hiring score	−0.693***	−0.666***	−0.669***	−0.664***	2.703***	2.931***	2.088***
	(0.249)	(0.223)	(0.206)	(0.200)	(0.852)	(0.818)	(0.515)
Location 1	−0.051	0.046	0.037	0.026
	(0.078)	(0.053)	(0.053)	(0.052)
Referral	−0.023	0.019	0.016	0.011	−0.188	−0.183	−0.079
	(0.037)	(0.048)	(0.051)	(0.043)	(0.120)	(0.112)	(0.062)
Call center exp.		0.093	0.093*	0.099*		0.397**	0.313**
		(0.060)	(0.052)	(0.052)		(0.187)	(0.112)
Sales experience		0.006	0.006	0.005		−0.017	−0.050
		(0.059)	(0.056)	(0.057)		(0.223)	(0.123)
High extroversion			−0.025	−0.025		0.205	0.163
			(0.041)	(0.042)		(0.131)	(0.098)
High agreeableness			−0.040	−0.040		−0.206*	−0.109
			(0.026)	(0.025)		(0.106)	(0.088)
High conscientiousness			−0.059	−0.056		−0.035	−0.051
			(0.047)	(0.045)		(0.110)	(0.076)
High emotional stability			0.042	0.041		−0.146	−0.056
			(0.052)	(0.053)		(0.099)	(0.054)
High openness			0.024	0.027		0.039	0.062
			(0.037)	(0.037)		(0.138)	(0.088)
Missing survey				0.288***		−0.457**	−0.298**
				(0.076)		(0.180)	(0.121)
Division fixed effects		✓	✓	✓
Cohort fixed effects					✓	✓	✓
(Pseudo) R²	0.036	0.070	0.085	0.209	0.061	0.073	0.094
Observations	365	341	341	365	4,734	4,734	4,734

Notes. The sample in columns (1)–(4) is restricted to the 365 agents in the Voluntary-Condition, including those who quit before they completed training; columns (2) and (3) use the 341 agents who completed the new-hire survey. The dependent variable is an indicator that equals one if the agent opted out of the program. The coefficients are marginal effects of a unit change in each regressor from logistic regressions of the choice to opt out on different predictors. Experience and personality factors were collected via survey. We split personality scores at the sample median. Column (4) includes agents who did not complete the new-hire survey, which we account for with a Missing Survey indicator. In columns (5) and (6), we use the sample of agents in the Voluntary-Condition who were not mentored, and we regress IHS(Revenue), the inverse hyperbolic sine of daily revenue, on agents’ characteristics. In column (7), we use the same sample and regress IHS(RPC), the inverse hyperbolic sine of daily revenue per call, on agents’ characteristics. In columns (2)–(4), the Division Fixed Effects indicators reflect the inclusion of controls for whether the agent works in the largest division, the second-largest division, or one of the smaller divisions. Standard errors are clustered by cohort for workers entering the experiment after the pilot period and by pilot period by office for workers entering during the pilot (because the pilot assigned the Mandatory- and Voluntary-Conditions at the office level). We report marginal effects and delta-method standard errors in parentheses in columns (1)–(4).

*Statistical significance at the 10% level; **statistical significance at the 5% level; ***statistical significance at the 1% level.
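The clustering rule in the notes can be made concrete by building one cluster identifier per agent. The sketch below is a hypothetical illustration: the `Agent` fields, the labels, and the tuple encoding are assumptions. The resulting IDs would then be passed to whichever cluster-robust variance estimator is used.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Agent:
    agent_id: int
    cohort: str                  # hiring cohort label (hypothetical format)
    office: str                  # office label (hypothetical)
    pilot_period: Optional[str]  # None if the agent entered after the pilot


def cluster_id(agent: Agent) -> Tuple[str, ...]:
    """Cluster label for standard errors, following the rule in the table notes.

    Post-pilot entrants are clustered by cohort. Pilot entrants are clustered by
    pilot period x office, because the pilot assigned conditions at the office level.
    """
    if agent.pilot_period is None:
        return ("cohort", agent.cohort)
    return ("pilot", agent.pilot_period, agent.office)


agents = [
    Agent(1, "c01", "office_1", "pilot"),
    Agent(2, "c01", "office_2", "pilot"),
    Agent(3, "c05", "office_1", None),
    Agent(4, "c05", "office_2", None),
]
clusters = {cluster_id(a) for a in agents}
print(len(clusters))  # pilot agents split by office; post-pilot agents share a cohort cluster
```

Under this rule, post-pilot agents in the same cohort share a cluster regardless of office. Pilot agents are separated by office, matching the office-level assignment during the pilot.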

The best predictor of the opt-out decision is contained in recruiters’ assessments of the new hires’ suitability for the job. We find that agents with higher adjusted hiring scores (interview scores net of recruiter leniency; see Section 3.5) are more likely to opt into the program. Because an agent’s hiring score is assigned by the recruiter who interviewed them for the job, this suggests that recruiters’ assessments of agents’ suitability for the job can predict their program engagement.³³ Because the table reports logit marginal effects for a unit change in each regressor, an increase in the adjusted hiring score of 0.10 (approximately the interquartile range in the sample) is associated with a 6.9 percentage point (= 0.693 × 0.10) decrease in the likelihood that the agent opts out of the program.
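The step from the logit marginal effect to the 6.9 percentage point figure is a linear rescaling. The sketch below assumes average marginal effects; the notes do not say whether effects are averaged over observations or evaluated at covariate means. It shows how a logit marginal effect is formed and then rescales the reported estimate to a 0.10 change. The coefficient values used in the helper's check are hypothetical.

```python
import math
from typing import Sequence


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def average_marginal_effect(beta: Sequence[float], X: Sequence[Sequence[float]], k: int) -> float:
    """Average over observations of dP/dx_k = beta_k * p * (1 - p) in a logit model."""
    total = 0.0
    for row in X:
        p = logistic(sum(b * x for b, x in zip(beta, row)))
        total += beta[k] * p * (1.0 - p)
    return total / len(X)


# First-order rescaling of a reported marginal effect to a 0.10-unit change.
reported_me = -0.693   # Table 4, column (1), adjusted hiring score
delta = 0.10           # approximately the interquartile range of the score
change_pp = reported_me * delta * 100
print(f"{change_pp:.1f} percentage points")
```

Scaling a marginal effect linearly is a local approximation. For changes large enough to move fitted probabilities substantially, recomputing predicted probabilities at the new score would be more accurate.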

We also assess the extent to which the predictors of program opt-out explain variation in agent productivity. Using a sample of agents from the Voluntary-Condition who were not mentored, we regress realized productivity on the factors that potentially explain program participation. In Table 4, column (5) displays the baseline productivity regression results controlling only for agent demographics, hiring scores, referral status, and cohort fixed effects (which absorb the location dummy). The coefficient on Adjusted Hiring Score is positive and statistically significant: a one-standard-deviation increase in the adjusted hiring score (approximately 0.07 units) is associated with roughly a 19% increase in revenue (2.703 × 0.07 ≈ 0.19). This suggests that both the opt-out decision and the observable agent characteristics that predict opting out help to explain on-the-job productivity. Column (6) adds data from the new-hire survey and the Missing Survey dummy; the coefficient on Adjusted Hiring Score is larger in magnitude even when these characteristics are included. The results are similar in column (7), where the dependent variable is IHS(RPC), showing that hiring scores predict on-the-job performance. At this firm, and likely at others, workers with low pre-hire assessments are less productive than workers with more favorable evaluations. As we show later, human capital development programs can help remediate this lower level of initial productivity.³⁴
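IHS(Revenue) keeps days with zero revenue in the sample while behaving like a logarithm for large values. The standard-library sketch below illustrates that property with illustrative revenue values, then reproduces the 19% figure as 2.703 × 0.07 ≈ 0.19 log points. Converting with $e^{b} - 1$ gives a somewhat larger percentage, which is worth keeping in mind when reading IHS coefficients as percentage changes.

```python
import math


def ihs(x: float) -> float:
    """Inverse hyperbolic sine, ln(x + sqrt(x^2 + 1)); unlike ln, it is defined at 0."""
    return math.asinh(x)


# For large x, ihs(x) is close to ln(2x), so IHS differences approximate log differences.
doubling = ihs(500.0) - ihs(250.0)
print(round(doubling, 4), round(math.log(2), 4))

# Hiring-score coefficient on IHS(Revenue), Table 4, column (5), for a 0.07 change.
b, sd = 2.703, 0.07
log_points = b * sd                            # about 0.19
exact_pct = 100 * (math.exp(log_points) - 1)   # percent change under exact conversion
print(round(log_points, 3), round(exact_pct, 1))
```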

4.3.2. Heterogeneous Treatment Effects for Agents Who Are Likely to Opt Out of the Program.

Next, we conduct tests of heterogeneous treatment effects within the Mandatory-Condition. These tests are robust to framing or logistical differences that may drive other variation between the Mandatory- and Voluntary-Conditions. Here we ask whether the largest individual treatment effects accrued to mentored workers in the Mandatory-Condition with the greatest likelihood of opting out of the program. To do this, we apply the coefficient estimates from column (1) of Table 4 to the characteristics of workers in the Mandatory-Condition to impute their opt-out propensity scores. We classify agents in the Mandatory-Condition as High$_{Opt}$ if their opt-out propensity score is in the top tercile of the distribution and as Low$_{Opt}$ if it is in the bottom two terciles. We use terciles rather than the 17% opt-out rate in the Voluntary-Condition for two reasons: (i) individual propensity scores are less than 1, so more workers are needed to account for the total number of those who opt out in the Voluntary-Condition, and (ii) using fewer agents or a finer partition leaves samples too small for adequate statistical power. If treatment effects are monotonic in the propensity to opt out of treatment, then these choices are conservative. We then estimate Equation (1) on these subsets of the data, alongside pooled models that test for differences in treatment effects between workers with high and low propensities to opt out of participation.
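The imputation and tercile split can be sketched as follows. The sketch assumes the usual linear logit index. The intercept, coefficient values, and covariate names are hypothetical stand-ins, because the raw logit coefficients behind Table 4, column (1) are not reported in this section (the table shows marginal effects).

```python
import math
from typing import Dict, List


def opt_out_propensity(intercept: float, coefs: Dict[str, float], x: Dict[str, float]) -> float:
    """Predicted logit probability of opting out for one worker."""
    z = intercept + sum(coefs[name] * x[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-z))


def classify_top_tercile(scores: List[float]) -> List[str]:
    """Label the top third of propensity scores 'High' and the rest 'Low'.

    Ties are broken by position; n // 3 workers are labeled 'High'.
    """
    n = len(scores)
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    high = set(ranked[: n // 3])
    return ["High" if i in high else "Low" for i in range(n)]


# Hypothetical logit coefficients and Mandatory-Condition workers.
intercept = 1.0
coefs = {"hiring_score": -5.0, "female": -0.3}
workers = [
    {"hiring_score": s, "female": f}
    for s, f in [(0.40, 0), (0.55, 1), (0.62, 0), (0.70, 1), (0.48, 1), (0.66, 0)]
]
scores = [opt_out_propensity(intercept, coefs, w) for w in workers]
labels = classify_top_tercile(scores)
print(list(zip([round(s, 3) for s in scores], labels)))
```

Because the hypothetical hiring-score coefficient is negative, the workers with the lowest scores receive the highest opt-out propensities and land in the High group, mirroring the direction of the Table 4 estimates.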

We find that agents in the Mandatory-Condition with a high estimated likelihood of opting out had a significantly greater treatment effect of mentorship than did their peers who were less likely to opt out, as reported in Table 5. The estimate in column (1) shows that agents who were most likely to opt out of the program had revenue gains of more than 38% (= $e^{0.324} - 1$), whereas the estimate in column (2) shows that other agents had estimated gains of about 7%. The pooled estimate on Mentored × High$_{Opt}$ in column (3) rejects equality of the treatment gains between high and low opt-out agents within the Mandatory-Condition, providing evidence of heterogeneous treatment effects. Column (3) also shows that agents in the highest tercile of the opt-out propensity distribution are about 21% (= $e^{-0.239} - 1$) less productive than other agents in the Mandatory-Condition. Columns (4)–(6) show similar patterns when IHS(RPC) is the dependent variable.³⁵ The high opt-out propensity agents here are again less productive than those who are more likely to participate in the program, consistent with self-selection.
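The percentage effects quoted above follow from applying $e^{b} - 1$ to the IHS coefficients, treating IHS(Revenue) and IHS(RPC) as approximately logarithmic, as the text does. This sketch recomputes them from the Table 5 estimates.

```python
import math


def pct_effect(b: float) -> float:
    """Percentage change implied by coefficient b on a log-like (IHS) outcome."""
    return 100.0 * (math.exp(b) - 1.0)


estimates = {
    "Mentored, High_Opt, revenue (col. 1)": 0.324,
    "Mentored, Low_Opt, revenue (col. 2)": 0.069,
    "High_Opt level, revenue (col. 3)": -0.239,
    "Mentored, High_Opt, RPC (col. 4)": 0.262,
}
for label, b in estimates.items():
    print(f"{label}: {pct_effect(b):+.1f}%")
```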

Table 5. Treatment Effects of Mentoring in the Mandatory-Condition by Predicted Opt-Out Propensity


	IHS(Revenue)			IHS(RPC)
	High$_{Opt}$	Low$_{Opt}$	All	High$_{Opt}$	Low$_{Opt}$	All
	(1)	(2)	(3)	(4)	(5)	(6)
Mentored	0.324**	0.069**	0.063*	0.262**	0.026	0.022
	(0.138)	(0.023)	(0.029)	(0.094)	(0.021)	(0.021)
Mentored × High$_{Opt}$			0.342**			0.285***
			(0.128)			(0.078)
High$_{Opt}$			−0.239**			−0.163***
			(0.085)			(0.049)
Cohort fixed effects	✓	✓	✓	✓	✓	✓
Adj. R²	0.038	0.036	0.030	0.044	0.035	0.035
Observations	2,244	4,481	6,725	2,244	4,481	6,725

Notes. This table reports heterogeneous treatment effect estimates for agents in the Mandatory-Condition. We estimate agents’ opt-out propensity scores as described in Section 4.3.2. We then place agents into High$_{Opt}$ if their propensity score of opting out is in the top 33.3% of the propensity score distribution, and we place agents with a propensity score in the bottom 66.7% into Low$_{Opt}$, indicating a low likelihood of opting out. We use a larger threshold than the opt-out rate in the Voluntary-Condition because (i) the individual propensity scores are less than 1, implying that we need more workers to approximate the total number of those who opt out in the Voluntary-Condition, and (ii) we want a sample that is large enough for reliable inference. We then estimate Equation (1) within these subsets of the data with either IHS(Revenue) or IHS(RPC) as the dependent variable. To determine whether the effect of mentorship differs significantly between High$_{Opt}$ and Low$_{Opt}$ agents, we pool the samples in columns (3) and (6) and include a one-zero indicator for High$_{Opt}$ along with its interaction with Mentored. Standard errors are clustered by cohort for workers entering the experiment after the pilot period and by pilot period by office for workers entering during the pilot (because the pilot assigned the Mandatory- and Voluntary-Conditions at the office level) and are reported in parentheses.

*Statistical significance at the 10% level; **statistical significance at the 5% level; ***statistical significance at the 1% level.
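A stylized version of the pooled specification in columns (3) and (6), as described in the notes, is given below. It assumes that Equation (1) regresses the IHS outcome on a mentorship indicator with cohort fixed effects. Any further controls in Equation (1) are omitted, and the subscripts (agent $i$, day $t$, cohort $c(i)$) are notation introduced only for this sketch.

$$
\operatorname{IHS}(y_{it}) = \beta\,\text{Mentored}_i + \gamma\,\bigl(\text{Mentored}_i \times \text{High}_{Opt,i}\bigr) + \delta\,\text{High}_{Opt,i} + \mu_{c(i)} + \varepsilon_{it}
$$

Here $\beta$ is the mentorship effect for Low$_{Opt}$ agents, $\beta + \gamma$ is the effect for High$_{Opt}$ agents, $\delta$ is the level difference for untreated High$_{Opt}$ agents, and the test of equal treatment gains is $H_0\colon \gamma = 0$.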

The estimates in Table 5 imply that the inclusion of workers who would have likely opted out of the voluntary program raised aggregate treatment gains in the Mandatory-Condition by 6% for revenue and by 5% for revenue per call. These figures come from taking the treatment effects of the program for top-tercile agents (38% revenue gains and 30% gains in RPC) and multiplying them by the actual opt-out rate in the Voluntary-Condition, 17%. If we assume that the actual treatment effect in the Voluntary-Condition among opt-in agents is zero, then adding opt-out workers to those eligible for treatment would close approximately 34% (43%) of the gap in treatment effects between the Mandatory- and Voluntary-Conditions for revenue (RPC).³⁶
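The back-of-the-envelope arithmetic can be reproduced directly from the rounded figures in the text. The sketch also inverts the reported closure shares to show the gap they imply. For revenue, 0.0646 / 0.34 ≈ 0.19, consistent with the roughly 19% Mandatory-Condition revenue gain reported in the conclusion.

```python
def contribution(effect_high: float, opt_out_rate: float) -> float:
    """First-order addition to aggregate gains from treating likely opt-outs."""
    return effect_high * opt_out_rate


opt_out_rate = 0.17                        # actual opt-out rate in the Voluntary-Condition
rev = contribution(0.38, opt_out_rate)     # top-tercile revenue effect
rpc = contribution(0.30, opt_out_rate)     # top-tercile RPC effect

# If opt-in agents' voluntary effect is zero, the gap equals the Mandatory effect,
# so dividing each contribution by its closure share recovers the implied gap.
implied_gap_rev = rev / 0.34
implied_gap_rpc = rpc / 0.43
print(round(rev, 4), round(rpc, 4), round(implied_gap_rev, 3), round(implied_gap_rpc, 3))
```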

To summarize, we find evidence of both self-selection (higher-productivity workers are more likely to opt into the mentorship program) and heterogeneous treatment effects (the mentorship program helps some agents more than others). These channels explain part, but not all, of the differences in treatment effects between the mandatory and voluntary programs. In Section 4.4, we argue that framing effects provide the most likely explanation for the remaining differences between conditions.

4.4. Framing Effects and Other Explanations for the Remaining Differences in Treatment Effects

Here, we consider several potential alternative explanations for the remaining differences in treatment effects between the mandatory and voluntary programs. Framing effects appear to be the most likely of the candidates we consider.

4.4.1. Framing Effects.

It is possible that the mandatory framing of the program in the Mandatory-Condition caused agents to infer something about its value and “buy in” or engage, whereas agents in the Voluntary-Condition may have perceived the program as optional, reducing buy-in and engagement. To test for differences in compliance or buy-in between the Mandatory- and Voluntary-Conditions, we tabulate meeting completion rates between mentor-protégé pairs in Table 6. Of the 127 agents assigned to mentorship in the Mandatory-Condition, 18 never completed a recorded meeting with a mentor, whereas 25 of the 155 treated agents in the Voluntary-Condition never met with their mentor. Mandatory-Condition protégés both completed more of their scheduled meetings (2.31 vs. 2.11) and had a higher meeting completion ratio (74% vs. 64%).³⁷ Column (4) of Table 6 shows that meeting completion rates are even higher among agents in the Mandatory-Condition who likely would have opted into the voluntary program (those with low opt-out propensity scores), suggesting that similar agents engage at different levels across the two conditions.
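Restating the zero-meeting counts as shares makes the comparison easier to read. About 14.2% of assigned protégés in the Mandatory-Condition never met their mentor, compared with about 16.1% in the Voluntary-Condition. The snippet below only recomputes these ratios from the counts in the text.

```python
def share(count: int, total: int) -> float:
    return count / total


never_met_mandatory = share(18, 127)   # Mandatory-Condition protégés with no recorded meeting
never_met_voluntary = share(25, 155)   # Voluntary-Condition treated agents with no meeting
print(f"never met: {never_met_mandatory:.1%} (Mandatory) vs {never_met_voluntary:.1%} (Voluntary)")
```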

These differences could arise because the opt-in framing in the Voluntary-Condition may have portrayed the program as optional rather than as a job requirement (Hossain and List 2012, Hong et al. 2015). However, meeting rates are still relatively high even in the Voluntary-Condition, suggesting that differences in meeting rates alone are unlikely to explain the full remaining gap in treatment effects across conditions.

Therefore, we attempt to test whether the quality of meetings differs across conditions using worksheet contents. Using two approaches, we find only minimal differences. First, we consider the amount of content transcribed on the mentor-protégé worksheets by counting the total number of words written. Although this is an imperfect measure of the quality of the mentor-protégé meetings, it proxies for the agents’ level of engagement. In our second approach, which is motivated by the worksheet analysis in Sandvik et al. (2020), we use a bag of words to determine how much of a response’s content is focused on job-specific skills and knowledge relative to how much is focused on receiving support or encouragement.³⁸
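A minimal sketch of a dictionary (bag-of-words) scoring step is below. It assumes a simple lowercase tokenizer and two word lists. The word lists are hypothetical placeholders; the dictionaries actually used, following Sandvik et al. (2020), are not reproduced in this section.

```python
import re
from collections import Counter

# Hypothetical dictionaries for illustration only.
SKILL_WORDS = {"close", "upsell", "script", "objection", "product", "offer"}
SUPPORT_WORDS = {"encourage", "confidence", "support", "motivated", "help", "great"}


def score_response(text: str) -> dict:
    """Count total words and dictionary hits in one worksheet response."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return {
        "total": len(tokens),
        "skill": sum(counts[w] for w in SKILL_WORDS),
        "support": sum(counts[w] for w in SUPPORT_WORDS),
    }


print(score_response("Work on the close and handle each objection. Great job, stay motivated!"))
```

Per-response counts like these can then be compared across conditions, which corresponds to the total-word and category-word comparisons described in the next paragraph.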

In comparing the worksheet content of Mandatory-Condition agents and Voluntary-Condition agents (reported in Table I.A.9 in the Online Appendix), we do not find statistically significant differences in the number of total words or words related to sales skills or knowledge. Mandatory-Condition agents do use about 0.13 more support words than do Voluntary-Condition agents, but this effect is only marginally significant. To the extent that support words signal engagement or encouragement around both the program and the work, this evidence is mildly supportive of the framing channel. However, we have no direct evidence that this interpretation is valid, and we caution that these worksheets are an incomplete record of sentiment or buy-in.³⁹

4.4.2. Other Alternative Explanations.

A number of other factors, such as SUTVA violations, crowd-out, retention differences, or perceptions of differential treatment, could potentially explain differences between the Mandatory- and Voluntary-Conditions. We do not find evidence for these alternative explanations. For brevity, we discuss these explanations and their associated tests in detail in Section I.A of the Online Appendix and summarize only the conclusions here. In Section I.A.1 of the Online Appendix, we do not find evidence that our results are driven by experimenter demand effects, Hawthorne effects, discouragement from treatment status, or information leakage. Evidence from holdout cohorts suggests that violations of the Stable Unit Treatment Value Assumption (SUTVA) are not a major concern for our design. In Section I.A.2 of the Online Appendix, we show that the program does not appear to crowd out organic mentoring that may have occurred in its absence, because nontreated agents in experimental cohorts had productivity similar to that of agents entering the firm before the program existed.

In Section 4.2, we showed that, across both conditions, the mentorship program had no impact on retention, so retention cannot explain the observed sales revenue treatment effects. In Section I.A.3 of the Online Appendix, we further show that productivity gains remain (i) when accounting for nonrandom attrition by filling in missing data after separations with the average productivity of replacements and (ii) when using Lee (2009) bounds estimators. The bounding estimator trims the group with the higher retention rate by its excess share of retained workers, dropping either the top or the bottom of that group’s productivity distribution to obtain lower and upper bounds. If attrition is nonrandom with respect to the underlying sales measures, this exercise yields treatment effect bounds that are not driven by differential retention of heterogeneous workers across conditions. The bounded estimates of mentorship are never positive for the Voluntary-Condition and remain positive and statistically significant for the Mandatory-Condition.
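A minimal sketch of the trimming logic behind Lee (2009) bounds is below. It assumes monotonic attrition and uses a simple rounding rule in place of exact quantile trimming. The function and argument names are hypothetical, and the sketch produces point bounds only, without the standard errors an applied analysis would also report.

```python
from typing import List, Tuple


def _mean(xs: List[float]) -> float:
    return sum(xs) / len(xs)


def _trimmed_means(ys: List[float], p: float) -> Tuple[float, float]:
    """Means after dropping the top share p and after dropping the bottom share p."""
    ys = sorted(ys)
    k = int(round(p * len(ys)))   # discrete approximation to the trimming quantile
    return _mean(ys[: len(ys) - k]), _mean(ys[k:])


def lee_bounds(y_t: List[float], n_t: int, y_c: List[float], n_c: int) -> Tuple[float, float]:
    """Lee (2009) lower and upper bounds on the treatment effect.

    y_t, y_c: outcomes observed for retained treated and control workers.
    n_t, n_c: numbers originally assigned to treatment and control.
    Assumes monotonic attrition: treatment shifts retention in one direction only.
    """
    q_t, q_c = len(y_t) / n_t, len(y_c) / n_c
    if q_t >= q_c:
        # Trim the treated group, which retains an excess share of workers.
        p = (q_t - q_c) / q_t
        low_t, high_t = _trimmed_means(y_t, p)
        return low_t - _mean(y_c), high_t - _mean(y_c)
    # Otherwise trim the control group; the bounds flip because it enters with a minus sign.
    p = (q_c - q_t) / q_c
    low_c, high_c = _trimmed_means(y_c, p)
    return _mean(y_t) - high_c, _mean(y_t) - low_c


print(lee_bounds([1.0, 2.0, 3.0, 4.0, 5.0], 5, [1.0, 2.0, 3.0, 4.0], 5))
```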

We show in Section I.A.4 of the Online Appendix that perceptions of differential treatment (agents buying in more because they felt special for receiving a mentor) are unlikely to be at play. We conclude in Section I.A.5 of the Online Appendix with a discussion of the challenges in directly calibrating belief- or preference-based explanations for agents’ opt-in or opt-out behavior. Accordingly, in the next section (Section 5), we rely on indirect evidence from a nationally representative survey that provides context on how heterogeneity in beliefs and preferences shapes participation decisions.

5. Returns to the Program, Program Prevalence, and External Validity

In this section, we value the program for the firm and discuss the costs of misallocating mentors to agents with relatively small treatment gains. Then, we discuss the results of a nationally representative survey that we conducted, highlighting the widespread prevalence and design variation of human capital development programs. To finish, we comment on the external validity of our findings.

5.1. Net Present Value of Mandatory Mentorship and the Costs of Misallocation

The net present value of the mandatory mentorship program to the firm is equal to approximately $439,000. To arrive at this estimate, we calculate additional revenues of approximately $536,000 in present discounted value over a six-month period. The revenue estimates come from an analysis that considers the additional revenue gain to the firm from each investment in mentorship (or each program slot). We track workers over an entire six-month period after being allocated to receive a mentor or to the control group. If the worker leaves the firm prior to the six-month horizon, we account for the productivity of replacements by filling in a random draw from the distribution of new agents. In this way, we track both long-term revenue gains and any potential impact on retention from the program. We then subtract $97,000 of costs, which include costs of the firm’s staff to administer the program and each mentor’s opportunity cost of lost revenue from engaging in meetings rather than answering calls. We provide details about the calculations in Section I.F of the Online Appendix.
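The valuation steps can be sketched as follows, assuming monthly flows discounted at a constant rate. The discount rate, revenue values, and replacement pool are hypothetical, because the calculation details are in Section I.F of the Online Appendix. Only the final subtraction uses figures from the text.

```python
import random
from typing import List, Optional


def fill_after_separation(revenue: List[Optional[float]], new_agent_pool: List[float],
                          rng: random.Random) -> List[float]:
    """Replace post-separation months (None) with random draws from new-agent revenue."""
    return [r if r is not None else rng.choice(new_agent_pool) for r in revenue]


def present_value(flows: List[float], monthly_rate: float) -> float:
    """Discount month-end flows: sum over t = 1..T of flow_t / (1 + r)^t."""
    return sum(f / (1.0 + monthly_rate) ** t for t, f in enumerate(flows, start=1))


def net_present_value(pv_revenue_gain: float, costs: float) -> float:
    return pv_revenue_gain - costs


# Hypothetical six-month path for one program slot: the worker leaves after month 4.
# The same fill-in would be applied to treated and control paths before differencing.
rng = random.Random(0)
path = fill_after_separation([900.0, 1100.0, 1200.0, 1250.0, None, None],
                             new_agent_pool=[700.0, 800.0, 950.0], rng=rng)
pv_slot = present_value(path, monthly_rate=0.01)

# Aggregates reported in the text: $536,000 of discounted revenue less $97,000 of costs.
print(net_present_value(536_000, 97_000))
```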

Had the firm allocated all workers to the Mandatory-Condition and had the treatment effects been the same across workers, the firm would have gained an additional $620,000. If, instead, only about one-third of the treatment gains are due to selection and heterogeneous treatment effects (what our back-of-the-envelope calculation based on the opt-out propensity score recovers), the firm still would have gained approximately $207,000 in additional revenue had the mentorship treatment been mandatory.
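The $207,000 figure follows from scaling the homogeneous-effect counterfactual by the roughly one-third share attributed to selection and heterogeneous treatment effects in Section 4.3.2. The sketch makes that scaling explicit. Using the 34% revenue closure share instead of exactly one-third would give about $211,000.

```python
full_gain = 620_000          # additional revenue if treatment effects were homogeneous
selection_share = 1 / 3      # "about one-third" attributed to selection and heterogeneity
selection_gain = full_gain * selection_share
alt_gain = full_gain * 0.34  # same calculation with the 34% revenue closure share
print(round(selection_gain, -3), round(alt_gain, -3))
```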

5.2. Prevalence of Human Capital Development Programs

Beyond our study firm, we conducted a nationally representative worker survey to provide background context about human capital development programs, with a focus on three questions: how prevalent are they, how is their participation determined (i.e., mandatory or voluntary), and how often do workers participate when programs are voluntary?

We administered the survey through the Lucid platform in June 2022 and compensated respondents between $1 and $4. The survey took between seven and 10 minutes to complete. Respondents had to be employed and pass attention checks to proceed through the survey. We asked respondents whether their current employer offers the following programs: (i) mentorship, (ii) training for new hires, and (iii) ongoing training or continuing education. We also asked whether the programs were required/mandatory or optional/voluntary and, if voluntary, whether they participated. We then probed for the reasons for their participation decisions. We present the results from this survey and details about the survey instrument in Table 7.⁴⁰

The survey responses provide three main takeaways: (1) human capital development programs are ubiquitous, (2) many are voluntary, and (3) many employees do not participate in voluntary programs. Specifically, 45% of respondents said their employer offers a mentorship program, 87% said it offers new-hire training, and 80% said it offers ongoing training or continuing education. About 59% of the mentorship programs and 43% of the continuing education programs offered are voluntary. New-hire training is much more likely than the other programs to be mandatory. The last column in Table 7 shows substantial nonparticipation rates in voluntary programs. Roughly 27% to 28% of respondents did not participate in their employer’s voluntary mentorship or ongoing training/continuing education programs. Even for new-hire training, rates of nonparticipation exceed 20% when training is optional. Time commitments and doubts about personal program benefits are the most common reasons workers cite for their lack of take-up.⁴¹ These survey results highlight the importance of the mandatory versus voluntary participation design choice that many managers face when they implement a new human capital development program.
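The three survey quantities nest within one another: availability among all respondents, voluntary status among programs offered, and nonparticipation among voluntary programs. The sketch below tabulates them from hypothetical, unweighted response records. The record fields are assumptions, and the reported estimates may use survey weights or different denominators.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Response:
    offered: bool                        # employer offers the program
    voluntary: Optional[bool] = None     # asked only if offered
    participated: Optional[bool] = None  # asked only if voluntary


def summarize(responses: List[Response]) -> dict:
    offered = [r for r in responses if r.offered]
    voluntary = [r for r in offered if r.voluntary]
    nonparticipants = [r for r in voluntary if r.participated is False]
    return {
        "offered_share": len(offered) / len(responses),
        "voluntary_share_of_offered": len(voluntary) / len(offered) if offered else float("nan"),
        "nonparticipation_among_voluntary": (
            len(nonparticipants) / len(voluntary) if voluntary else float("nan")
        ),
    }


sample = [
    Response(offered=False),
    Response(offered=True, voluntary=False),
    Response(offered=True, voluntary=True, participated=True),
    Response(offered=True, voluntary=True, participated=False),
]
print(summarize(sample))
```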

5.3. External Validity

As part of the first wave of evidence on mandatory versus voluntary programs, we made multiple decisions to give us high internal validity (List 2020). The tasks that agents performed in the mentorship program—reflecting on their work, sharing these thoughts with mentors, and acting on their mentors’ advice—were a natural extension of their day-to-day activities. Our intervention intentionally included features that would allow the treatment to be deployed at scale (permanently) both at the focal firm and in organizations more broadly.

Several additional points suggest that our results are likely to be externally valid for workers in other frontline or entry-level jobs. In particular, in data from the Census Bureau’s 2019 American Community Survey, sales and related occupations are the second-most common entry-level jobs for workers under 25 years old, following food services occupations. Although our results may not speak to human capital development programs in stable, professional occupations, they are applicable to the development decisions of firms that recruit and onboard workers into a substantial share of domestic entry-level jobs. In addition, our representative worker survey found substantial rates of nonparticipation in human capital development programs. Because older workers and those with a bachelor’s degree or higher have slightly higher rates of nonparticipation in voluntary programs than the overall sample average, nonparticipation appears to be a general phenomenon that extends beyond entry-level occupations.

6. Conclusion

Many firms make considerable investments in human capital development programs, such as training and mentorship programs. But an important and understudied question is whether human capital development resources are allocated to the right workers. We consider a ubiquitous decision that managers face when deciding how to allocate human capital development resources: whether they should make development programs mandatory or voluntary. We investigate the implications of this mandatory versus voluntary design choice by conducting a field experiment on mentorship in a U.S.-based inbound sales call center.

We find that a mandatory version of the mentorship program significantly raised workers’ productivity, with average sales gains of about 19% over new hires’ first two months on the job. By contrast, treatment gains were approximately zero when the program was voluntary. A substantial part of the difference in the efficacy of the mandatory and voluntary programs arises because program treatment effects are negatively correlated with the propensity to participate in the program. Our findings indicate that the decision to make a human capital development program mandatory or voluntary is not trivial, because the returns to the program are determined largely by selection and heterogeneous treatment effects. As such, our findings shed additional light on why wage inequality and performance differences may persist across workers and firms. That these differences exist even in the presence of high-powered incentive pay suggests that managers may need to mandate worker participation in human capital development programs.

In our setting, training that leverages know-how from coworkers can improve the productivity of lower-performing workers, but low-ability workers may be the least likely to seek out such help. As such, an employee’s voluntary decision whether to participate in human capital development programs may be a useful signal to managers, in this setting and possibly others, about who will benefit the most from additional help. Other allocation rules may also be feasible, but these likely entail waiting to collect performance data and subsequently staging a performance improvement intervention. In high-turnover frontline jobs, firms face a tradeoff between delaying upskilling to improve program allocation through targeting and offering broader training more quickly.

In general, frictions around program participation deserve further investigation because nonparticipation is a pervasive feature found in the focal organization and, as shown in our national survey, in other public and private firms. Selection bias in program recruitment can distort program efficacy and inferences, as demonstrated by negative sorting on gains in charter school enrollment (Walters 2018) and the site selection bias identified by Allcott (2015). With remote work, these allocation questions may become more pronounced (Bojinov et al. 2021, Emanuel et al. 2023), making human capital development allocation decisions even more salient and challenging. Furthermore, the implications of the mandatory versus voluntary program design choice may vary depending on other features of the setting (e.g., how well-defined the content of the training is). These considerations should motivate future research to understand how to allocate scarce human capital development resources.

Acknowledgments

The authors thank Emily Beam, Jasmijn Bol, Laura Boudreau (discussant), Zoe Cullen, Florian Englmaier (discussant), Guido Friebel, Robert Garlick, Robert Gary-Bobo (discussant), Isaac Hacamo (discussant), Jessica Hoel, Mitch Hoffman, Lisa LaViers, John List, Michelle Lowry, Bentley MacLeod, Robert Metcalfe, Paige Ouimet, Dimitris Papanikolaou, Raffaella Sadun, Elena Simintzi, Jason Snyder, Harish Sujan, Brian Waters (discussant), Michael Weisbach, seminar participants at Harvard Business School, MIT, the University of Michigan’s Ross Strategy Brownbag series, the Indian Institute of Management Ahmedabad, the University of Arizona, and conference participants at the CEPR IMO, the Econometric Society Latin American Meeting, the FMA, the Advances in Field Experiments Meeting, the Accounting and Economics Society, the 2022 Labor and Finance Meeting, the 2022 Strategy Science Conference, the 2022 NBER Summer Institute, the 2022 NBER Organizational Economics Meeting, and the 2023 AEA Annual Meeting for helpful comments.

Endnotes

¹ The prior literature on human capital development has focused largely on whether firms can rationalize training investments (Becker 1975, Acemoglu and Pischke 1999, Fudenberg and Rayo 2019, Starr 2019) rather than how program features influence returns.

² Workplace mentorship programs are themselves ubiquitous in the United States, with approximately 70% of the Fortune 500 firms offering such programs (Gutner 2009).

³ Commission rates rise with performance, with the most (least) productive workers earning commissions equal to 8% (3%) of their total sales revenue each week.

⁴ We do not evaluate the extent to which workers’ beliefs about the efficacy of the program predicted their opt-out decisions because eliciting subjects’ ex ante beliefs about treatment effects could have potentially swayed their participation decisions. To get at belief-related mechanisms behind the opt-out decision, we rely on evidence from a nationally representative survey of workers. In that survey, we find wide variation in firms’ practices regarding whether programs are mandatory or voluntary, with substantial rates of nonparticipation in voluntary programs. Workers cite time constraints and inconvenience as the primary reasons for nonparticipation; these issues are less applicable to the experiment because the program was conducted during work hours. The next most common reasons highlight skepticism about personal benefits and the desire to avoid interacting with coworkers or bosses. Intimidation around interacting with more productive coworkers was a theme that emerged in interviews during our prior work in this firm (Sandvik et al. 2020), but to the extent that variation in personality characteristics may pick up differential propensity for intimidation, we do not detect evidence for it. A related possibility is that some workers may believe that program enrollment signals something about their competence, which has been shown to impact advice seeking at work (Heursen et al. 2023).

⁵ We use terciles because the propensity to opt out is noisily estimated, and we lose power when using finer partitions of the sample.

⁶ The gains in the Mandatory-Condition for workers who are likely to be lower performers suggest that any incentive conflict between the firm and these workers (who have relatively lower anticipated commission rates than higher performers) can be overcome with the firm’s guidance to use program resources for improvement.

⁷ Our survey also provides new context around the prevalence and characteristics of workplace programs, complementing studies of programs in particular contexts or industries (Rockoff 2008, Jones et al. 2019, Chatterji et al. 2019, Reif et al. 2020).

⁸ The early work in this area tended to be motivated by across-firm variation in management practices contributing to differences in TFP (Bloom and Van Reenen 2007, Syverson 2011, Gibbons and Henderson 2013). Intra-firm experiments show that small changes in practices or incentives can lead to profound differences in output, which may contribute to across-firm variation (Friebel et al. 2017, Gosnell et al. 2020). These effects are likely even more pronounced when they interact with spillovers inside organizations (Mas and Moretti 2009, Bandiera et al. 2013, Carrell et al. 2013, Herbst and Mas 2015, Lazear et al. 2015, Cornelissen et al. 2017). Our findings contribute to understanding how differences in firms’ practices contribute to performance heterogeneity (Englmaier et al. 2018, Bloom et al. 2019).

⁹ Several earlier papers have examined endogenous entry/participation across different contexts (Karlan and Zinman 2009, DellaVigna et al. 2012, Lazear et al. 2012), but many of these designs may be difficult to implement inside firms. Our design instead allows for simple variation in program recruitment procedures that enable tests of how treatment effects vary across selected samples.

¹⁰ These figures come from the 2019 five-year American Community Survey for SOC codes 43405, 41904, and 41309. To construct hourly earnings in the ACS data, we divide total individual income by the product of weeks worked last year and usual hours per week.
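
A minimal sketch of the hourly-earnings construction, assuming one record per person with annual income, weeks worked last year, and usual weekly hours already extracted. In IPUMS USA such inputs typically come from variables like INCTOT, WKSWORK1, and UHRSWORK, although which variables the authors used is an assumption here; the function name is hypothetical.

```python
def acs_hourly_earnings(total_income, weeks_worked, usual_hours_per_week):
    """Hourly earnings = total income / (weeks worked last year x usual hours per week)."""
    annual_hours = weeks_worked * usual_hours_per_week
    if annual_hours <= 0:
        return None  # undefined without positive hours; excluding such records is an assumption
    return total_income / annual_hours


wage = acs_hourly_earnings(41_600, 52, 40)  # 41,600 / 2,080 hours
```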

¹¹ There is mild relative performance evaluation in this setting, and commissions increase at each quintile of selling efficiency. Helping another agent is unlikely to change relative rankings across quintiles because the probability is small that any two agents are pivotal at the commission rate kink. Relative to settings with longer sales cycles (see, e.g., Oyer 1998 and Larkin 2014), incentives based on relative performance reset weekly.

¹² When mentor supply fell below 50% of the number of eligible new hires, the most common reason was conflicting obligations to mentor other cohorts in the same division or office.

¹³ Our prior working paper version reports 53 cohorts assigned to treatments. We erroneously coded one cohort that had no available mentors as eligible for the experiment.

¹⁴ Across months, opt-out rates range from 6% to 43%, with no obvious time trend.

¹⁵ The staff members were asked to read the following statement to new hires in the Voluntary-Condition: “We have recently begun a mentorship program to help newly hired sales agents when they begin working on the sales floor. Agents who opt into the program and are chosen by [the research team] will be assigned a mentor. Your mentor will approach you during your first week on the sales floor to initiate the mentoring relationship. The program will run from your first week on the sales floor to your fourth week on the sales floor, and you and your mentor will meet once a week to discuss your progress.”

¹⁶ The staff reported to the authors on multiple check-in calls that they found no evidence of discouragement among the agents who did not receive a mentor.

¹⁷ Mentors were not designated exclusively to either the Mandatory- or Voluntary-Condition, so their first protégé could have been in one condition and their second protégé could have been in the other condition. Mentors generally mentored only a single protégé at a time, but there were instances where a mentor was assigned to multiple protégés at once. This occurred only when mentors were in short supply and the firm’s internal staff felt that the mentors could effectively handle the assignment. In all cases, though, to facilitate meeting coordination, the firm built specific times to meet into mentors’ and protégés’ schedules, and mentor-protégé pairs always met individually, meaning that the protocol was exactly the same from the point of view of the protégé.

¹⁸ In our previous experience conducting experiments within this setting, we found no evidence of spillovers from one treatment group to another among sales agents within the same office (Sandvik et al. 2020). In particular, we leveraged data from sales agents in a separate office that was not part of (or informed of) the experiment, and we found that their trends in sales performance mirrored those of the agents in the control group who were aware of the experiment (those located in the two participating offices) but who were not treated with any stimulus to alter their behavior. In addition, in that setting we found significant differences in treatment effects between the conditions that nudged agents to share best practices and those that did not, even though the printed prompts to share best practices (a physical worksheet with questions) could have easily been disseminated across treatment groups. In the Online Appendix, we present tests for contamination and spillovers outside of the experimental treatments.

¹⁹ Instructions given to mentors and the mentor-protégé worksheet can be found in the Online Appendix (see Section I.C).

²⁰ There are 15 recruiters in the data. Some recruiters systematically give higher scores than others conditional on the performance of the workers they evaluate. We find this relationship for workers who are not part of the experiment, and we account for it using a procedure that adjusts for the stringency or leniency of each recruiter. Using data on workers outside of the experiment, we recover recruiter relative leniency by regressing raw hiring scores on productivity (specifically, the inverse hyperbolic sine of revenue and the inverse hyperbolic sine of revenue per call), recruiter fixed effects, brand fixed effects, and time fixed effects. We then shrink the recruiter fixed effects (that are net of the productivity adjustment) using the procedure in Lazear et al. (2015). We subtract the adjusted recruiter fixed effects from the raw hiring scores of workers in the experiment to return the adjusted hiring scores.
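
The leniency adjustment can be sketched with ordinary least squares followed by a generic empirical Bayes shrinkage step. This is an illustration under stated assumptions, not the authors' code: the shrinkage rule (scale each centered recruiter effect by an estimated signal-to-noise ratio) is a standard choice that may differ in detail from the Lazear et al. (2015) procedure, and all function names, data, and effect sizes below are hypothetical and synthetic.

```python
import numpy as np


def one_hot(codes, drop_first):
    """Indicator columns for each level of `codes`, optionally omitting the first level."""
    codes = np.asarray(codes)
    levels = np.unique(codes)
    if drop_first:
        levels = levels[1:]
    return (codes[:, None] == levels[None, :]).astype(float)


def shrunken_recruiter_effects(score, productivity, recruiter, brand, period):
    """Shrunken recruiter leniency effects, ordered as np.unique(recruiter)."""
    R = one_hot(recruiter, drop_first=False)  # one column per recruiter; no separate intercept
    X = np.column_stack([R, productivity, one_hot(brand, True), one_hot(period, True)])
    beta, *_ = np.linalg.lstsq(X, score, rcond=None)
    resid = score - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    var_beta = sigma2 * np.diag(np.linalg.inv(X.T @ X))
    n_r = R.shape[1]
    fe = beta[:n_r] - beta[:n_r].mean()       # leniency relative to the average recruiter
    se2 = var_beta[:n_r]                      # sampling variance of each recruiter effect
    signal = max(fe.var() - se2.mean(), 0.0)  # variance of true leniency net of estimation noise
    reliability = signal / (signal + se2)     # per-recruiter shrinkage factor in [0, 1)
    return reliability * fe


def adjust_scores(raw_score, recruiter, levels, effects):
    """Subtract the evaluating recruiter's shrunken leniency from each raw hiring score."""
    lookup = dict(zip(levels, effects))
    return np.array([s - lookup[r] for s, r in zip(raw_score, recruiter)])


# Synthetic data: 15 recruiters with known leniency and two productivity measures
rng = np.random.default_rng(0)
n = 3000
recruiter = rng.integers(0, 15, n)
brand = rng.integers(0, 3, n)
period = rng.integers(0, 12, n)
productivity = rng.normal(size=(n, 2))  # stand-ins for asinh(revenue) and asinh(RPC)
true_leniency = np.linspace(-1.0, 1.0, 15)
score = (productivity @ np.array([0.5, 0.3]) + true_leniency[recruiter]
         + rng.normal(scale=1.0, size=n))

effects = shrunken_recruiter_effects(score, productivity, recruiter, brand, period)
levels = np.unique(recruiter)
adjusted = adjust_scores(score[:5], recruiter[:5], levels, effects)
```

In the footnote's design, the effects are estimated only on workers outside the experiment and then subtracted from experimental workers' scores; the last line applies them to a few synthetic rows purely to show the lookup.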

²¹ We also check for balance across assignment to divisions based on estimated division-level productivity for workers outside the experiment. Table I.A.3 in the Online Appendix shows that assignment is balanced on the productivity metrics of non-mentor-eligible new hires (from holdout cohorts) across divisions.

²² The inverse hyperbolic sine transformation was not under consideration when we preregistered using dependent variables in logarithms, because at that time we were unaware that workers occasionally had zero-revenue days, for which the logarithm is undefined. Our results are qualitatively unchanged if we instead use the natural logarithm of one plus revenue or one plus RPC; we report these estimates in Table I.A.4 in the Online Appendix.
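
A short comparison of the transformations, relying only on their definitions: the inverse hyperbolic sine, asinh(x) = ln(x + sqrt(x² + 1)), is finite at zero, and for large x it approaches ln(x) + ln 2, so coefficients on IHS-transformed outcomes read roughly like log-point changes when revenue is well above zero. The revenue values are hypothetical.

```python
import math

revenue = [0.0, 50.0, 400.0, 2500.0]  # hypothetical daily revenue values, including a zero day

rows = []
for x in revenue:
    ihs = math.asinh(x)        # ln(x + sqrt(x^2 + 1)), equal to 0 at x = 0
    log_one_plus = math.log1p(x)  # ln(1 + x), the alternative transformation
    rows.append((x, ihs, log_one_plus))
    print(f"{x:8.1f}  asinh={ihs:.4f}  ln(1+x)={log_one_plus:.4f}")
```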

²³ Differences between revenue and RPC estimates arise from differences in hours and/or calls. Mentored agents in the Mandatory-Condition, on average, field 0.4 more calls per day (p-value = 0.134), and they work 0.2 more hours per day (p-value < 0.01) than their non-mentored peers.

²⁴ Our effects are smaller than those documented for some technology introductions; for example, Brynjolfsson et al. (2023) found that access to a generative AI-based tool increased the productivity of newly hired call center agents doing customer service work by 34%. However, our effect sizes are roughly equal in magnitude to the 13% lift associated with remote work found in Bloom et al. (2015).

²⁵ All of our preregistered specifications include cohort fixed effects because we expected that, without them, between-cohort variation would substantially increase minimum detectable effect sizes. With cohort fixed effects, calendar time and elapsed time since hire are collinear: in a balanced panel with a short time window, cohort fixed effects absorb time fixed effects. We show in Figure I.A.1 of the Online Appendix that our results are robust to the inclusion of date fixed effects as well as to several other alternative specifications.
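
A toy design matrix illustrates the collinearity. Assume three hypothetical cohorts that start one week apart and are observed daily: calendar date equals the cohort start date plus tenure, so once cohort indicators and tenure are in the model, a calendar-time term adds no independent variation. The sketch uses linear time terms for simplicity; the same linear dependence arises with full sets of cohort, tenure, and date dummies.

```python
import numpy as np

cohort_start = np.array([0, 7, 14])  # hypothetical start days, one week apart
days_observed = 20
cohort = np.repeat(np.arange(3), days_observed)
tenure = np.tile(np.arange(days_observed), 3)
date = cohort_start[cohort] + tenure  # calendar day = start day + days since hire

cohort_fe = (cohort[:, None] == np.arange(3)[None, :]).astype(float)
X_base = np.column_stack([cohort_fe, tenure])
X_full = np.column_stack([cohort_fe, tenure, date])

rank_base = np.linalg.matrix_rank(X_base)
rank_full = np.linalg.matrix_rank(X_full)
print(rank_base, rank_full, X_full.shape[1])  # 4 4 5: the date column adds no rank
```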

²⁶ The summary index approach has been used to evaluate education interventions when there are multiple potential outcomes (Deming et al. 2014). The procedure first demeans each individual outcome and standardizes it by the control group standard deviation (in this case, that of non-mentored agents in the Mandatory-Condition). The index is then a weighted sum of the standardized outcomes, where the weights come from the inverse of the covariance matrix of the standardized measures, akin to the approach in generalized least squares. Anderson (2008) argued that this approach has three advantages: (i) it allows for a single test rather than multiple tests across different outcomes, (ii) it tests whether a program has a general effect, and (iii) the test is potentially more powerful than multiple tests with marginal significance.
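
A minimal sketch of an inverse-covariance-weighted index in the spirit of Anderson (2008), assuming an outcome matrix with one column per measure (each oriented so that higher is better) and a boolean mask for the control group. Two details are assumptions of this sketch rather than statements about the authors' implementation: the covariance of the standardized measures is computed on the full sample, and the index is not re-standardized afterward. The function name and data are hypothetical.

```python
import numpy as np


def anderson_index(Y, control):
    """Inverse-covariance-weighted summary index of the k outcomes in the columns of Y."""
    Y = np.asarray(Y, dtype=float)
    control = np.asarray(control, dtype=bool)
    mu = Y[control].mean(axis=0)
    sd = Y[control].std(axis=0, ddof=1)
    Z = (Y - mu) / sd                      # demean and scale with control-group moments
    sigma_inv = np.linalg.inv(np.cov(Z, rowvar=False))
    w = sigma_inv.sum(axis=1)              # weight on each outcome: row sum of inverse covariance
    return Z @ w / w.sum()                 # weighted average of standardized outcomes


# Synthetic example: three correlated outcomes; treatment shifts each by 0.5
rng = np.random.default_rng(1)
cov = np.array([[1.0, 0.6, 0.3], [0.6, 1.0, 0.3], [0.3, 0.3, 1.0]])
Y = rng.multivariate_normal(np.zeros(3), cov, size=400)
control = np.arange(400) < 200
Y[~control] += 0.5
index = anderson_index(Y, control)
```

Highly correlated outcomes receive less weight, because they carry overlapping information; this is the GLS analogy the footnote draws.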

²⁷ The q-values are adjusted for tests on all regressors reported in columns (7)–(9) of Table 2, as well as all regressors reported in columns (7)–(9) of Table I.A.6 in the Online Appendix, which capture the long-term treatment and selection effects of mentorship in months 3–6 of agents’ tenure with the firm. The estimated sharpened q-values are conservative in our case because they do not account for the positive correlations across tests.
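
Sharpened q-values can be sketched with the two-stage Benjamini, Krieger, and Yekutieli (2006) procedure, evaluated on a grid of q levels (1.000, 0.999, ..., 0.001) in the way popularized by Anderson (2008). This is a generic reimplementation for illustration, not the authors' code; the function name and p-values are hypothetical.

```python
import numpy as np


def sharpened_qvalues(pvals):
    """Two-stage BKY (2006) sharpened q-values via a grid search over q levels."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p, kind="stable")
    ranks = np.empty(m, dtype=int)
    ranks[order] = np.arange(1, m + 1)         # rank 1 = smallest p-value
    q_out = np.ones(m)
    for k in range(1000, 0, -1):               # q = 1.000, 0.999, ..., 0.001
        q = k / 1000
        q1 = q / (1 + q)                       # first-stage level
        passed1 = p <= q1 * ranks / m
        r1 = ranks[passed1].max() if passed1.any() else 0
        if r1 == m:
            r2 = m                             # every hypothesis rejected in the first stage
        else:
            q2 = q1 * m / (m - r1)             # rescale by the estimated number of true nulls
            passed2 = p <= q2 * ranks / m
            r2 = ranks[passed2].max() if passed2.any() else 0
        q_out[ranks <= r2] = q                 # keep the smallest q at which each test is rejected
    return q_out


qs = sharpened_qvalues([0.001, 0.01, 0.04, 0.2, 0.6])
```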

²⁸ The results are similar if we include individuals who did not complete training in the retention estimations.

²⁹ One possible, albeit speculative, explanation for the lack of a retention effect among agents in the Mandatory-Condition—despite their improved productivity—is that exposure to a top performer (i.e., their mentor) may have increased their awareness of how much they had yet to learn. This could have discouraged them about their long-term career prospects at the firm, even if their performance was accelerating faster than that of their non-mentored peers.

³⁰ For all agents, the completed tenure is calculated as the difference between their hire date and the date of the last day they are observed in the data, divided by 365.
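
A minimal sketch of the tenure formula using Python's standard `datetime` module (the function name is hypothetical):

```python
from datetime import date


def completed_tenure_years(hire_date, last_observed):
    """Completed tenure: days between hire and last observed day, divided by 365."""
    return (last_observed - hire_date).days / 365


print(completed_tenure_years(date(2021, 1, 1), date(2022, 1, 1)))  # 1.0
```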

³¹ A t-test on completed tenure between the two groups yields averages of 0.391 and 0.394 (p-value = 0.9581). Similarly, we cannot reject the null that the standard deviations of the two distributions are equal (p-value = 0.2446), nor can we reject, using the rank-sum test, the null that the two distributions are the same (p-value = 0.2047).

³² In this analysis, we include agents who did not complete training, in which case they do not have productivity data. This accounts for the difference in unique worker counts between this sample and that reported in Figure 1. Results are similar when we use only workers who completed training to examine the determinants of opting out.

³³ We are missing hiring score data for 25 agents, so we set their hiring scores to zero and include an indicator variable for missing hiring scores. We find similar results in Table I.A.8 of the Online Appendix when using raw (non-adjusted) hiring scores.
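
This is the standard missing-indicator approach. A minimal sketch, assuming missing hiring scores are stored as NaN (the function name and values are hypothetical):

```python
import numpy as np


def fill_with_missing_indicator(scores):
    """Replace missing (NaN) hiring scores with 0 and return a 0/1 missing-data indicator."""
    scores = np.asarray(scores, dtype=float)
    missing = np.isnan(scores)
    filled = np.where(missing, 0.0, scores)
    return filled, missing.astype(float)


filled, is_missing = fill_with_missing_indicator([1.2, np.nan, -0.4])
# filled -> [1.2, 0.0, -0.4]; is_missing -> [0.0, 1.0, 0.0]
```

Both columns enter the regression together, so the indicator absorbs any average difference in outcomes for agents whose scores are missing rather than letting the zero fill-in bias the score coefficient.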

³⁴ Readers may wonder why the firm would hire applicants with low interview scores. The seasonal nature of subscription sales requires immediate workforce capacity, and the firm often would need to take a set of applicants as given and pick the best among them rather than continue to recruit better agents to fill out a training class.

³⁵ We also preregistered a procedure for estimating heterogeneous treatment effects that yields larger estimates for those who opt out. We discuss this procedure in the Online Appendix (see Section I.D). The results of the preregistered estimations provide additional evidence that the treatment effects of mentorship are greatest among agents who are most likely to have opted out of the program. Our preregistered estimates of treatment gains for opt-out agents are larger than those given here because that estimator imposes that the treatment effects for agents who opt in are constant across the mandatory and voluntary programs. The approach in Table 5 allows the treatment effects to differ across the treatment conditions for agents who are likely to opt in, and we find that the treatment effects for these agents are modestly positive in the Mandatory-Condition. The preregistered approach is sensitive to this variation among agents who are likely to opt in, so we prioritize the estimator based on the propensity score that does not impose this restriction.

³⁶ For revenue, $(38\% \times 17\%)/19\% = 0.340$. For RPC, $(30\% \times 17\%)/12\% = 0.425$. These are conservative estimates: because the propensity scores are measured with error, we use estimates that average effects over the top third of the distribution of scores, which covers more workers than the actual opt-out rate.

³⁷ Although the preregistered mentoring protocol called for one meeting per week for four weeks, there were instances in which a mentor, a protégé, or both were absent from work for an extended period (e.g., on vacation), reducing the number of possible scheduled meetings from four to three (or fewer, in some cases). As such, the denominator of the meeting completion ratio is occasionally less than four.

³⁸ Specifically, we tabulate the number of “skill” words an agent uses in his or her responses, and we do the same for the number of “support” words. Words classified as neither skill words nor support words, including stop words, are categorized as “other.” We list the words in each category in the Online Appendix (see Section I.E), along with multiple example responses.
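
A minimal sketch of the tabulation, assuming simple lowercase tokenization and exact word matching (no stemming). The short word lists below are hypothetical placeholders, not the lists reported in the Online Appendix.

```python
import re
from collections import Counter

# Hypothetical placeholder lists for illustration only
SKILL_WORDS = {"close", "pitch", "objection", "script", "upsell"}
SUPPORT_WORDS = {"confidence", "encourage", "listen", "support", "motivation"}


def categorize_words(response):
    """Count skill, support, and other words in one free-text response."""
    tokens = re.findall(r"[a-z']+", response.lower())
    counts = Counter()
    for tok in tokens:
        if tok in SKILL_WORDS:
            counts["skill"] += 1
        elif tok in SUPPORT_WORDS:
            counts["support"] += 1
        else:
            counts["other"] += 1  # everything else, including stop words
    return counts


c = categorize_words("My mentor helped me handle every objection and gave me confidence")
```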

³⁹ Two weeks after mentors and protégés completed their final meeting, the staff asked protégés to complete a post-mentorship survey. The completion rates for this survey were quite low (less than 10%) because the firm did not monitor or provide incentives for completion. Figure I.A.3 in the Online Appendix shows that protégés, on average, felt like they benefited from the program. The average respondent reported that mentorship helped them to learn selling tactics and that the program increased their day-to-day satisfaction at work.

⁴⁰ We also included workplace wellness programs as a validation check. Sixty-five percent of our respondents indicated that their workplace has a wellness program. This is roughly comparable to numbers cited by Jones et al. (2019) from a 2016 Kaiser Family Foundation report, indicating that 53% of firms with more than 200 employees do biometric screening, 59% assess lifestyle health habits, and 83% have programs that encourage healthy lifestyles.

⁴¹ Forty-seven percent of nonparticipants in mentorship, 36% in new hire training, and 42% in ongoing training cite time constraints or the inconvenience of program offerings as one of their reasons for not participating. “Didn’t believe these programs would benefit me” (26% for mentorship, 28% for new-hire training, and 31% for ongoing training) is the next most common reason. Other options such as, “Didn’t plan to stay at the firm, so didn’t invest,” “Wanted to avoid interaction with coworkers or bosses,” and “Felt the program would benefit my employer more than it would benefit me” were selected by 8% to 13% of the respondents.

References

Acemoglu D, Pischke JS (1999) The structure of wages and investment in general training. J. Political Econom. 107(3):539–572.
Allcott H (2015) Site selection bias in program evaluation. Quart. J. Econom. 130(3):1117–1165.
Allen TD, Eby LT, Chao GT, Bauer TN (2017) Taking stock of two relational aspects of organizational life: Tracing the history and shaping the future of socialization and mentoring research. J. Appl. Psych. 102(3):324–337.
Anderson ML (2008) Multiple inference and gender differences in the effects of early intervention: A reevaluation of the Abecedarian, Perry Preschool, and Early Training Projects. J. Amer. Statist. Assoc. 103(484):1481–1495.
Ashraf N, Bandiera O, Minni V, Zingales L (2025) Meaning at work. NBER Working Paper No. 33843, National Bureau of Economic Research, Cambridge, MA.
Bandiera O, Barankay I, Rasul I (2013) Team incentives: Evidence from a firm-level experiment. J. Eur. Econom. Assoc. 11(5):1079–1114.
Bandiera O, Prat A, Hansen S, Sadun R (2020) CEO behavior and firm performance. J. Political Econom. 128(4):1325–1369.
Becker GS (1975) Human Capital: A Theoretical and Empirical Analysis, with Special Reference to Education (University of Chicago Press, Chicago).
Benmelech E, Bergman N, Kim H (2022) Strong employers and weak employees: How does employer concentration affect wages? J. Hum. Res. 57(S):S200–S250.
Benson A, Li D, Shue K (2019) Promotions and the Peter principle. Quart. J. Econom. 134(4):2085–2134.
Benson A, Li D, Shue K (2024) Potential and the gender promotions gap. Preprint, submitted April 2, http://dx.doi.org/10.2139/ssrn.4747175.
Bertrand M, Schoar A (2003) Managing with style: The effect of managers on firm policies. Quart. J. Econom. 118(4):1169–1208.
Bloom N, Van Reenen J (2007) Measuring and explaining management practices across firms and countries. Quart. J. Econom. 122(4):1351–1408.
Bloom N, Sadun R, Van Reenen J (2016) Management as a technology? NBER Working Paper No. 22327, National Bureau of Economic Research, Cambridge, MA.
Bloom N, Liang J, Roberts J, Ying ZJ (2015) Does working from home work? Evidence from a Chinese experiment. Quart. J. Econom. 130(1):165–218.
Bloom N, Lemos R, Sadun R, Scur D, Van Reenen J (2014) JEEA-FBBVA lecture 2013: The new empirical economics of management. J. Eur. Econom. Assoc. 12(4):835–876.
Bloom N, Brynjolfsson E, Foster L, Jarmin R, Patnaik M, Saporta-Eksten I, Van Reenen J (2019) What drives differences in management practices? Amer. Econom. Rev. 109(5):1648–1683.
Bojinov I, Choudhury PN, Lane J (2021) Virtual watercoolers: A field experiment on virtual synchronous interactions and performance of organizational newcomers. Harvard Business School Technology & Operations Mgt. Unit Working Paper (21-125).
Brynjolfsson E, Li D, Raymond LR (2023) Generative AI at work. NBER Working Paper No. 31161, National Bureau of Economic Research, Cambridge, MA.
Carrell SE, Sacerdote BI, West JE (2013) From natural variation to optimal policy? The importance of endogenous peer group formation. Econometrica 81(3):855–882.
Carter SP, Dudley W, Lyle DS, Smith JZ (2019) Who’s the boss? The effect of strong leadership on employee turnover. J. Econom. Behav. Organ. 159:323–343.
Castro S, Englmaier F, Guadalupe M (2022) Fostering psychological safety in teams: Evidence from an RCT. Preprint, submitted June 20, http://dx.doi.org/10.2139/ssrn.4141538.
Chan DC, Gentzkow M, Yu C (2022) Selection with variation in diagnostic skill: Evidence from radiologists. Quart. J. Econom. 137(2):729–783.
Chandrasekhar AG, Golub B, Yang H (2018) Signaling, shame, and silence in social learning. NBER Working Paper No. 25169, National Bureau of Economic Research, Cambridge, MA.
Chatterji A, Delecourt S, Hasan S, Koning R (2019) When does advice impact startup performance? Strategic Management J. 40(3):331–356.
Chetty R, Friedman JN, Rockoff JE (2014) Measuring the impacts of teachers II: Teacher value-added and student outcomes in adulthood. Amer. Econom. Rev. 104(9):2633–2679.
Cornelissen T, Dustmann C, Schönberg U (2017) Peer effects in the workplace. Amer. Econom. Rev. 107(2):425–456.
Cornwell C, Schmutte IM, Scur D (2021) Building a productive workforce: The role of structured management practices. Management Sci. 67(12):7308–7321.
Coviello D, Ichino A, Persico N (2014) Time allocation and task juggling. Amer. Econom. Rev. 104(2):609–623.
Currie J, MacLeod WB (2017) Diagnosing expertise: Human capital, decision making, and performance among physicians. J. Labor Econom. 35(1):1–43.
Currie JM, MacLeod WB (2020) Understanding doctor decision making: The case of depression treatment. Econometrica 88(3):847–878.
Custódio C, Ferreira MA, Matos P (2019) Do general managerial skills spur innovation? Management Sci. 65(2):459–476.
Dahl GB, Løken KV, Mogstad M (2014) Peer effects in program participation. Amer. Econom. Rev. 104(7):2049–2074.
Del Carpio L, Guadalupe M (2022) More women in tech? Evidence from a field experiment addressing social identity. Management Sci. 68(5):3196–3218.
Delfino A, Garnero A, Inferrera S, Leonardi M, Sadun R (2024) Unwilling to reskill? Evidence from a survey experiment with jobseekers.
DellaVigna S, List JA, Malmendier U (2012) Testing for altruism and social pressure in charitable giving. Quart. J. Econom. 127(1):1–56, https://doi.org/10.1093/qje/qjr050.
Deming DJ, Hastings JS, Kane TJ, Staiger DO (2014) School choice, school quality, and postsecondary attainment. Amer. Econom. Rev. 104(3):991–1013.
Diaz B, Neyra-Nazarrett A, Ramirez J, Sadun R, Tamayo J (2025) Training within firms. NBER Working Paper No. 33670, National Bureau of Economic Research, Cambridge, MA.
Edmans A (2011) Does the stock market fully value intangibles? Employee satisfaction and equity prices. J. Financial Econom. 101(3):621–640.
Edmondson AC, Lei Z (2014) Psychological safety: The history, renaissance, and future of an interpersonal construct. Annual Rev. Org. Psych. Org. Behav. 1(1):23–43.
Emanuel N, Harrington E, Pallais A (2023) The power of proximity to coworkers: Training for tomorrow or productivity today? NBER Working Paper No. 31880, National Bureau of Economic Research, Cambridge, MA.
Englmaier F, Foss NJ, Knudsen T, Kretschmer T (2018) Organization design and firm heterogeneity: Towards an integrated research agenda for strategy. Organization Design (Emerald Publishing Limited, Bingley, UK), 229–252.
Englmaier F, Grimm S, Grothe D, Schindler D, Schudy S (2025) The value of leadership: Evidence from a large-scale field experiment. Leadership Quart. 36(3):101869.
Finkelstein A, Gentzkow M, Williams H (2016) Sources of geographic variation in health care: Evidence from patient migration. Quart. J. Econom. 131(4):1681–1726.
Friebel G, Heinz M, Zubanov N (2022) Middle managers, personnel turnover, and performance: A long-term field experiment in a retail chain. Management Sci. 68(1):211–229.
Friebel G, Heinz M, Hoffman M, Zubanov N (2023) What do employee referral programs do? Measuring the direct and overall effects of a management practice. J. Political Econom. 131(3):633–686.
Friebel G, Heinz M, Krueger M, Zubanov N (2017) Team incentives and performance: Evidence from a retail chain. Amer. Econom. Rev. 107(8):2168–2203.
Fudenberg D, Rayo L (2019) Training and effort dynamics in apprenticeship. Amer. Econom. Rev. 109(11):3780–3812.
Gibbons R, Henderson R (2013) What Do Managers Do? Exploring Persistent Performance Differences Among Seemingly Similar Enterprises (Princeton University Press, Princeton, NJ), 680–731.
Ginther DK, Currie JM, Blau FD, Croson RT (2020) Can mentoring help female assistant professors in economics? An evaluation by randomized trial. AEA Papers Proc. 110:205–209.
Gosnell GK, List JA, Metcalfe RD (2020) The impact of management practices on employee productivity: A field experiment with airline captains. J. Political Econom. 128(4):1195–1233.
Gubler T, Larkin I, Pierce L (2018) Doing well by making well: The impact of corporate wellness programs on employee productivity. Management Sci. 64(11):4967–4987.
Gutner T (2009) Finding anchors in the storm: Mentors. Wall Street J. (January 27), https://www.wsj.com/articles/SB123301451869117603?msockid=11735632ab496b95126d43ceaa2b6a03.
Heckman JJ, Ichimura H, Todd PE (1997) Matching as an econometric evaluation estimator: Evidence from evaluating a job training programme. Rev. Econom. Stud. 64(4):605–654.
Herbst D, Mas A (2015) Peer effects on worker output in the laboratory generalize to the field. Science 350(6260):545–549.
Heursen L, Friess S, Chugunova M (2023) Reputational concerns and advice-seeking at work. Max Planck Institute for Innovation & Competition Research Paper, No. 23-17, Max Planck Institute for Innovation and Competition, Munich, Germany.
Hilmer C, Hilmer M (2007) Women helping women, men helping women? Same-gender mentoring, initial job placements, and early career publishing success for economics PhDs. Amer. Econom. Rev. 97(2):422–426.
Hoffman M, Burks SV (2020) Worker overconfidence: Field evidence and implications for employee turnover and firm profits. Quant. Econom. 11(1):315–348.
Hoffman M, Stanton CT (2024) People, practices, and productivity: A review of new advances in personnel economics. NBER Working Paper No. 32849, National Bureau of Economic Research, Cambridge, MA.
Hoffman M, Tadelis S (2021) People management skills, employee attrition, and manager rewards: An empirical analysis. J. Political Econom. 129(1):243–285.
Hoffman M, Kahn LB, Li D (2017) Discretion in hiring. Quart. J. Econom. 133(2):765–800.
Hong F, Hossain T, List JA (2015) Framing manipulations in contests: A natural field experiment. J. Econom. Behav. Organ. 118:372–382.
Hossain T, List JA (2012) The behavioralist visits the factory: Increasing productivity using simple framing manipulations. Management Sci. 58(12):2151–2167.
Huffman D, Raymond C, Shvets J (2022) Persistent overconfidence and biased memory: Evidence from managers. Amer. Econom. Rev. 112(10):3141–3175.
Johnson MS, Levine DI, Toffel MW (2023) Improving regulatory effectiveness through better targeting: Evidence from OSHA. Amer. Econom. J. Appl. Econom. 15(4):30–67.
Jones D, Molitor D, Reif J (2019) What do workplace wellness programs do? Evidence from the Illinois Workplace Wellness Study. Quart. J. Econom. 134(4):1747–1791.
Karlan D, Zinman J (2009) Observing unobservables: Identifying information asymmetries with a consumer credit field experiment. Econometrica 77(6):1993–2008.
Larkin I (2014) The cost of high-powered incentives: Employee gaming in enterprise software sales. J. Labor Econom. 32(2):199–227.
Larkin I, Leider S (2012) Incentive schemes, sorting, and behavioral biases of employees: Experimental evidence. Amer. Econom. J. Microeconom. 4(2):184–214.
Larrain M (2015) Capital account opening and wage inequality. Rev. Financial Stud. 28(6):1555–1587.
Lazear EP, Malmendier U, Weber RA (2012) Sorting in experiments with application to social preferences. Amer. Econom. J. Appl. Econom. 4(1):136–163.
Lazear EP, Shaw KL, Stanton C (2016) Making do with less: Working harder during recessions. J. Labor Econom. 34(S1):S333–S360.
Lazear EP, Shaw KL, Stanton CT (2015) The value of bosses. J. Labor Econom. 33(4):823–861.
Lee DS (2009) Training, wages, and sample selection: Estimating sharp bounds on treatment effects. Rev. Econom. Stud. 76(3):1071–1102.
Li D, Raymond LR, Bergman P (2025) Hiring as exploration. Rev. Econom. Stud. rdaf040.
List JA (2020) Non est disputandum de generalizability? A glimpse into the external validity trial. NBER Working Paper No. 27535, National Bureau of Economic Research, Cambridge, MA.
List JA (2022) The Voltage Effect: How to Make Good Ideas Great and Great Ideas Scale (Crown Currency, New York).
Lyle DS, Smith JZ (2014) The effect of high-performing mentors on junior officer promotion in the U.S. Army. J. Labor Econom. 32(2):229–258.
Mas A, Moretti E (2009) Peers at work. Amer. Econom. Rev. 99(1):112–145.
Metcalfe RD, Sollaci AB, Syverson C (2023) Managers and productivity in retail. NBER Working Paper No. 31192, National Bureau of Economic Research, Cambridge, MA.
Minni V (2023) Making the invisible hand visible: Managers and the allocation of workers to jobs. Working paper, Chicago Booth.
Nishesh N, Ouimet P, Simintzi E (2024) Labor and corporate finance. Handbook of Corporate Finance (Edward Elgar Publishing, Cheltenham, UK), 647–673.
Oyer P (1998) Fiscal year ends and nonlinear incentive contracts: The effect on business seasonality. Quart. J. Econom. 113(1):149–185.
Oyer P, Schaefer S (2011) Personnel economics: Hiring and incentives. Handbook of Labor Econom. 4:1769–1823.
Reif J, Chan D, Jones D, Payne L, Molitor D (2020) Effects of a workplace wellness program on employee health, health beliefs, and medical use: A randomized clinical trial. JAMA Internal Med. 180(7):952–960.
Rockoff JE (2008) Does mentoring reduce turnover and improve skills of new employees? Evidence from teachers in New York City. NBER Working Paper No. 13868, National Bureau of Economic Research, Cambridge, MA.
Rouen E, Regier M (2022) The stock market value of human capital creation. Harvard Business School Accounting & Management Unit Working Paper (21-047).
Ruggles S, Flood S, Sobek M, Backman D, Cooper G, Rivera Drew JA, Richards S, Rogers R, Schroeder J, Williams KCW (2025) IPUMS USA: Version 16.0 [American Community Survey dataset], IPUMS, Minneapolis.
Sandvik JJ, Saouma RE, Seegert NT, Stanton CT (2020) Workplace knowledge flows. Quart. J. Econom. 135(3):1635–1680.
Starr E (2019) Consider this: Training, wages, and the enforceability of covenants not to compete. ILR Rev. 72(4):783–817.
Statista (2022) Total training expenditures in the United States from 2012 to 2022 (in billion U.S. dollars) [graph]. Accessed January 29, 2023, https://www-statista-com.ezp-prod1.hul.harvard.edu/statistics/788521/training-expenditures-united-states/.
Syverson C (2011) What determines productivity? J. Econom. Lit. 49(2):326–365.
Walters CR (2018) The demand for effective charter schools. J. Political Econom. 126(6):2179–2223.
Zingales L (2000) In search of new foundations. J. Finance 55(4):1623–1653.
Zivin JG, Kahn LB, Neidell M (2021) Incentivizing learning-by-doing: The role of compensation schemes. Workplace Productivity and Management Practices, 139–178 (Emerald Publishing Limited, Leeds, UK).


Received: September 1, 2024
Accepted: January 1, 2025
Published Online: September 15, 2025

Cite as

Jason Sandvik, Richard Saouma, Nathan Seegert, Christopher Stanton (2025) Should Human Capital Development Programs be Mandatory or Voluntary? Evidence from a Field Experiment on Mentorship. Management Science 0(0).

https://doi.org/10.1287/mnsc.2024.07524


