The Design and Targeting of Compliance Promotions
Abstract
This paper considers an experiment-based approach to the optimal design and targeting of compliance promotions. Compliance promotions involve optional participation on the part of customers. For example, physicians must consent to see detailers, and consumers must redeem coupons to obtain discounts. Individual compliance decisions affect the mix of customers participating in the promotion and, therefore, how the promotion affects sales. Optional compliance is an especially acute problem in the context of field experiments as policy optimization often necessitates extrapolation beyond the observed cells of the experiment to a different mix of complying customers. Our approach to optimizing the design and targeting of compliance promotions involves (i) an experiment to exogenously vary promotion features; (ii) a means to identify which promotion features can be causally extrapolated; (iii) an approach to extrapolate those causal effects; and (iv) an optimization over the promotion features, conditioned on the extrapolation. The approach is easy to estimate, accommodates two-sided noncompliance due to unobserved heterogeneity, and establishes partial identification bounds of causal effects. When applying the approach to a hotel loyalty promotion, wherein customers must visit enough hotels to earn bonus loyalty points, we find profits are improved considerably.
History: Olivier Toubia served as the senior editor for this article.
Supplemental Material: The data files are available at https://doi.org/10.1287/mksc.2022.1420.
1. Introduction
1.1. Overview
Consider a credit card retention offer designed to prevent customers from attriting. The card issuer faces two major questions when developing this promotion: (i) who should receive the offer (targeting), and (ii) what the level of the retention offer should be (design). To aid these decisions, firms often conduct field experiments to ascertain how different consumer segments respond to promotions with various features (Ascarza 2018). Upon observing customer responses in the field experiment, the firm can simply choose, for each segment, the offer with the highest expected customer profit, net of reward costs. For example, a common credit card promotion offers a retention bonus to customers who renew, in the hopes that the subset of customers encouraged by the bonus to renew then spends enough to offset the total cost of the retention bonuses. Using an experiment, the credit card issuer might learn that a $50 retention bonus generates higher net profit than a $25 bonus. However, this simple example raises the question of whether a lower ($20), intermediate ($40), or higher ($60) retention bonus might be even more profitable. Unless the card issuer includes $20, $40, and $60 reward levels as part of the original experiment, it is necessary to interpolate or extrapolate customer spending and promotion costs to these new reward levels.1
One option available to the card issuer is to regress customer spending on the reward levels included in the experiment and use a linear extrapolation to predict spending at other reward levels. But such an approach might ignore how different reward levels affect the mix of customers retained by the promotion and how average spending after renewal varies (i) across different mixes of customers for the same promotion and (ii) within the same mix of customers for different promotions. For example, customers who renew their cards when offered larger bonuses might be more price-sensitive than those who renew when offered smaller bonuses. If retention rates and the average spending among retained customers are both linear functions of the reward offered, then a linear extrapolation might make sense. But even small changes to the mix of retained customers might correspond to large differences in postretention spending, contradicting the assumed linearity and thereby biasing predictions. Moreover, unobserved factors that simultaneously affect (i) which customers satisfy the terms of promotion (e.g., renewal) and (ii) the outcome of interest (e.g., postrenewal spending) can further complicate the task of extrapolation from experimental results and, thus, the optimization of the promotion.
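The risk described above can be made concrete with a small numerical sketch. Everything in it is a hypothetical assumption chosen for illustration (the linear renewal rate, the spending curve 400(1 - u)^2, and the $25, $50, and $60 bonus levels borrowed from the running example); none of it is estimated from data. Customers with higher unobserved resistance u renew only at larger bonuses and spend less after renewal, so the mix of retained customers shifts with the bonus.

```python
# Hypothetical illustration: every number and functional form below is an
# assumption made for this sketch, not an estimate from the paper.

def renewal_rate(reward):
    """Share of customers who renew at a given bonus (assumed linear)."""
    return 0.20 + 0.01 * reward


def spend_per_offered_customer(reward):
    """Expected post-renewal spend per customer offered `reward`.

    Customers are indexed by an unobserved resistance u ~ Uniform(0, 1) and
    renew when u <= renewal_rate(reward). A renewing customer with resistance
    u is assumed to spend 400 * (1 - u)**2, so customers drawn in by larger
    bonuses spend less. Integrating that spend over u in [0, p] gives the
    closed form below.
    """
    p = renewal_rate(reward)
    return (400.0 / 3.0) * (1.0 - (1.0 - p) ** 3)


def linear_extrapolation(x0, y0, x1, y1, x_new):
    """Extend the straight line through two experimental cells to x_new."""
    slope = (y1 - y0) / (x1 - x0)
    return y1 + slope * (x_new - x1)


# Two cells "observed" in the experiment, then an out-of-sample bonus level.
observed = {r: spend_per_offered_customer(r) for r in (25, 50)}
predicted_60 = linear_extrapolation(25, observed[25], 50, observed[50], 60)
actual_60 = spend_per_offered_customer(60)

print(f"linear prediction at $60: {predicted_60:.2f}")
print(f"model value at $60:       {actual_60:.2f}")
```

Under these assumptions the straight line overstates spending at the $60 bonus, even though the renewal rate itself is exactly linear in the bonus, because the marginal customers attracted by the larger bonus spend less than those retained at smaller bonuses.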
The goal of this paper is to clarify the nature of these complications, demonstrate how ignoring them can lead to suboptimal promotions, provide practical guidance for how to overcome these issues, and illustrate our approach with an empirical application. We consider the optimal design of promotions in settings in which (i) the promotion’s features and unobserved customer heterogeneity jointly determine the mix of customers whose behavior satisfies the terms of the promotion, and (ii) this particular mix of customers determines the incremental, net profit generated by the promotion. We refer to such promotions as compliance promotions because they are characterized by what is termed two-sided noncompliance in field experiments: customers who are not offered the promotion may nevertheless behave in a way that satisfies its terms, and customers who are offered the promotion might not take advantage of it.
Compliance promotions in marketing are commonplace. The credit card retention offer is one example. Another is coupons: only a fraction of consumers who receive coupons end up using them, those who do redeem coupons may differ from those who do not, and some customers who redeem coupons might have purchased even without the coupon (Venkatesan and Farris 2012, Noble et al. 2017, Ghose et al. 2019). Another example is pharmaceutical detailing, in which physicians who consent to meet with detailers might already exhibit prescribing behavior that differs from physicians who decline to meet (Narayanan and Manchanda 2009, Montoya et al. 2010). Loyalty reward promotions are yet another example, as only a small portion of customers reach the promotion’s purchase threshold as a consequence of being offered a reward, with the remainder reaching or not reaching the threshold irrespective of the reward offer (Kumar and Reinartz 2018). Our empirical application (Section 2) is based on a major hotel chain’s field experiment of loyalty reward promotions.
For these and other compliance promotions, the incremental margin (revenue less selling costs) from the promotion is generated entirely (or nearly so) by the subset of customers who would satisfy the terms of the promotion if they are offered the coupon, meeting, or reward but who, without the offer, would not satisfy the terms of the promotion. In the literature on experiments with two-sided noncompliance, these individuals are referred to as compliers.2 Customers who meet the terms of the promotion regardless of whether they are offered it are referred to as always-takers in the literature. Although their behavior does not affect the gains from the promotion, it may contribute to the promotion’s fixed costs (e.g., if all customers making a purchase receive a reward).
Thus, central to the task of optimizing most compliance promotions with data from field experiments is (i) obtaining consistent estimates of the proportions of compliers and always-takers, and the average change in the outcome variable (e.g., margin from purchases) among compliers, and (ii) extrapolating these estimates to promotions that were not part of the original experiment. We outline an approach to identify when the extrapolation of these causal effects is feasible. We further show that, whereas some approaches to extrapolation may be appropriate when compliance is not a concern, they are usually ill-suited for compliance promotions. Such approaches are typically predicated on an unconfoundedness assumption: that there are no unobserved factors that simultaneously affect (i) whether a customer satisfies the terms of the promotion, and (ii) the (potential) outcomes that arise if that customer does or does not satisfy the promotion’s terms. We show, theoretically and empirically, that failing to account for these unobserved factors can lead to biased predictions and suboptimal promotion designs.
There are situations in which optimizing compliance promotions using data from a field experiment is not as complicated as we have characterized. First, if one can observe (in data) all of the factors that are believed to jointly affect whether a customer meets the terms of a promotion and their potential outcomes, then the unconfoundedness assumption is appropriate. One can then apply a wide array of tools to estimate (and extrapolate) causal effects and use these to find a set of optimal promotions (see Powers et al. 2018, Jacob 2020 for recent summaries). Second, in settings in which the optimality of a promotion is invariant to the mix of customers (within each segment) who meet the terms, then one can sidestep issues of two-sided noncompliance entirely by focusing on intent-to-treat (ITT) effects. Hitsch and Misra (2018), for example, use a randomized experiment to consider the effect of catalog targeting on profits across customer segments. Whether customers use a catalog is likely confounded with how much profit they generate, but due to random assignment within the experiment, the intent-to-treat effect of mailing catalogs is not confounded with profits. Moreover, the retailer incurs the cost of the promotion when it mails a catalog, not when the customer uses it to make a purchase. Because the effect of catalog use on compliers’ spending is accounted for by the ITT effects and the promotion’s costs are independent of which customers are compliers, the task of finding an optimal promotion can proceed by (i) ignoring compliance altogether by assuming that the unobserved factors driving catalog use have the same effect on spending for all customers, and (ii) extrapolating incremental intent-to-treat effects to new groups of customers. Third, and perhaps trivially, on rare occasions, there may be only a finite set of promotions under consideration. 
If one can design a field experiment measuring the net profit from each promotion (plus a control condition of no promotion offer), then the optimal design can be derived directly from the experimental data without the need for extrapolation. In all other settings—in which unobserved customer heterogeneity jointly affects who meets the terms of the promotion and their outcome of interest, the profitability of the promotion depends on this mix of customers, and extrapolation outside the cells of a field experiment is desired—the complications just described are consequential.
1.2. Challenges to Optimizing Compliance Promotions
To illustrate our approach, consider a hypothetical supermarket loyalty promotion. In this promotion, customers who spend at least $50 during a shopping trip in the next week earn a $10 discount on tickets to a popular, local theme park. The net profit from the promotion depends on two factors. One is the increase in store margin generated by compliers—customers who spend $50 or more, but only because they are offered the promotion (otherwise, they would spend less than $50). The other factor is the cost of the ticket discounts, which depends on the total number of customers spending $50 or more regardless of whether they are compliers or always-takers. The optimal design of this compliance promotion considers (i) the minimum level of spending (the hurdle) and (ii) the size of the discount on tickets (the reward). The optimal targeting of this promotion considers which designs are best for which customers. The objective of the firm is typically to first choose the target and then optimize the design of the promotion offered to each target in order to maximize short-term profits.
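The profit accounting in this example can be sketched as a short calculation. The function and all input values are hypothetical; in particular, the sketch assumes that the store bears the full $10 cost of each ticket discount and that margin gains come only from compliers.

```python
def net_profit_per_offered_customer(complier_share, always_taker_share,
                                    incremental_margin, reward_cost):
    """Expected incremental profit per customer offered the promotion.

    complier_share:      share who reach the hurdle only because of the offer
    always_taker_share:  share who reach the hurdle with or without the offer
    incremental_margin:  average extra store margin generated by a complier
    reward_cost:         cost to the store of one earned reward
    """
    margin_gain = complier_share * incremental_margin
    # Every customer who reaches the hurdle earns the reward, so the cost
    # applies to compliers and always-takers alike.
    reward_outlay = (complier_share + always_taker_share) * reward_cost
    return margin_gain - reward_outlay


# Hypothetical inputs: the same complier response under two different
# shares of always-takers.
many_always_takers = net_profit_per_offered_customer(0.10, 0.30, 25.0, 10.0)
few_always_takers = net_profit_per_offered_customer(0.10, 0.10, 25.0, 10.0)
print(many_always_takers, few_always_takers)
```

With identical complier behavior, the promotion in this sketch loses money when always-takers are common and earns money when they are rare, which is why both shares matter for design and targeting.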
Our approach assumes that the firm is able to conduct a randomized field experiment among a random sample of (potentially) targeted customers. In this field experiment, customers are randomized so that some are offered a promotion (the offer group) and the remainder are not offered a promotion (the control group). Within the offer group, customers are further randomized to be offered promotions with different features. In the supermarket example, the store might randomize the hurdle (minimum spend) over one set of values and the reward (ticket discount) over another set of values. After the experiment, the store observes for each customer (i) which promotion they were offered, if any, (ii) their promotion status, that is, whether their behavior met the terms of the promotion (spend hurdle, D = 1) or not (D = 0), regardless of whether they were in the offer or control group, and (iii) their outcome, that is, how much store margin they generated during the promotion, Y. The causal effects of primary interest are, for each promotion design, the changes in margin among compliers and the total cost of the rewards. Furthermore, the store wants to extrapolate these causal effects to hurdles and rewards that were not in the experiment. As we show, the supermarket in this example faces several challenges when attempting to extrapolate its outcomes.
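For a single promotion design that was included in the experiment, the quantities just described can be computed with the standard instrumental-variables (Wald) estimator. The sketch below is a textbook calculation rather than the MTE approach developed in this paper; it assumes that the offer is randomly assigned, that no customer is discouraged from reaching the hurdle by the offer (no defiers), and that the offer affects margin only through promotion status. The function name and the toy records are hypothetical.

```python
from statistics import mean


def wald_estimates(records):
    """Estimate type shares and the complier effect for one promotion design.

    records: iterable of (offered, met_terms, margin) tuples, where offered
    and met_terms are 0/1 indicators (Z and D) and margin is the outcome Y.
    """
    records = list(records)
    offer = [(d, y) for z, d, y in records if z == 1]
    control = [(d, y) for z, d, y in records if z == 0]

    reach_offer = mean(d for d, _ in offer)      # P(D = 1 | offered)
    reach_control = mean(d for d, _ in control)  # P(D = 1 | not offered)
    complier_share = reach_offer - reach_control

    # Intent-to-treat effect of the offer on margin.
    itt = mean(y for _, y in offer) - mean(y for _, y in control)

    return {
        "always_taker_share": reach_control,
        "complier_share": complier_share,
        "itt_margin": itt,
        # Average margin change among compliers (undefined without compliers).
        "complier_margin_effect": (itt / complier_share
                                   if complier_share else float("nan")),
    }


# Hypothetical toy data: 10 offered and 10 control customers.
toy = ([(1, 1, 60), (1, 1, 60), (1, 1, 55), (1, 1, 55)] + [(1, 0, 20)] * 6
       + [(0, 1, 60), (0, 1, 60)] + [(0, 0, 20)] * 8)
estimates = wald_estimates(toy)
print(estimates)
```

These estimates apply only to the design actually offered; they say nothing about hurdles or rewards outside the experiment, which is the extrapolation problem the remainder of this section addresses.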
The first challenge arises due to the unobserved factors that affect both promotion status (D) and outcome (Y), as this unobserved confounding raises the risk of selection bias. To address this issue, we use methods based on marginal treatment effects (MTEs) and instrumental variables for the estimation and extrapolation of causal effects in the face of unobserved heterogeneity and experimental noncompliance (Heckman and Vytlacil 1999, 2005; Mogstad and Torgovitsky 2018; Mogstad et al. 2018). An MTE is the expected causal effect of a treatment (e.g., a change in promotion status) on an outcome of interest (e.g., spending) conditioned on a specific realized level of observed and unobserved customer heterogeneity. The MTE serves as a building block for defining, estimating, and extrapolating more complex treatment effects (Heckman and Vytlacil 2007). A causal effect that is especially relevant for compliance promotions is the policy-relevant treatment effect (PRTE; Heckman and Vytlacil 2001). The PRTE of interest in the supermarket example is the incremental net profit from offering a promotion with a particular hurdle and reward (versus not offering a promotion). This PRTE is increased by the incremental margin among customers who comply with the promotion offer and decreased by the reward costs of the promotion.3 By comparing (extrapolated) PRTEs for promotions with different features, we can identify a subset of promotions that yield the greatest expected incremental profit.
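For reference, the MTE and its link to effects among compliers can be written in the standard notation of Heckman and Vytlacil. The symbols below (covariates X, instrument Z, unobserved resistance U normalized to be uniform, and propensity p) are assumed here for illustration and may differ from the notation adopted in later sections.

```latex
\begin{align*}
D &= \mathbf{1}\{U \le p(X, Z)\}, \qquad U \mid X, Z \sim \mathrm{Uniform}(0, 1), \\
\mathrm{MTE}(x, u) &= E\left[\, Y(1) - Y(0) \mid X = x,\ U = u \,\right], \\
E\left[\, Y(1) - Y(0) \mid X = x,\ p(x, z) < U \le p(x, z') \,\right]
  &= \frac{1}{p(x, z') - p(x, z)} \int_{p(x, z)}^{p(x, z')} \mathrm{MTE}(x, u)\, du .
\end{align*}
```

The last line shows why the MTE is a building block: the average effect among customers induced to change promotion status when the instrument moves from z to z' is an average of the MTE over the corresponding range of U, so an estimate of the MTE curve can be reweighted to obtain effects for instrument values that were not observed.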
A second challenge is that, for some features of a promotion, it might not be possible to extrapolate causal effects to levels that were not part of the original experiment. In particular, it is only possible to extrapolate causal effects over promotion features that are valid instrumental variables for compliance status. That is, these promotion features need to satisfy an exclusion restriction, meaning they have a direct causal effect on which customers comply with the promotion but, otherwise, no direct effect on the outcome. In the supermarket example, the size of the ticket discount arguably has such a property: it provides an incentive to spend enough to reach the hurdle but, otherwise, has no meaningful direct effect on spending. For example, we do not expect customers willing to spend $50 to receive a $10 theme park discount to spend more than $50 if they were instead offered a $15 theme park discount. By contrast, the spending hurdle in this example does not have this property: it not only affects the incentive to meet the terms of the promotion, but also has a direct effect on how much customers spend. For example, some customers who spend $50 to receive the tickets might spend $60 if their offer had $60 as the hurdle.
One key insight about this second challenge that emerges from our approach is that it is not possible to extrapolate causal effects for promotion features that have a direct effect on the outcome of interest, even if the levels of those features are manipulated experimentally. Our approach, thus, entails extrapolating causal effects over promotion features that are (conditionally) excludable from the outcome (i.e., valid instruments) to levels not observed in the original experiment. Yet these extrapolations are, by necessity, conditional on a particular instance of all other variables that do not satisfy the exclusion restriction. In the supermarket example, the store can estimate, conditional on any spending hurdle found in the experiment, an extrapolated PRTE at reward levels not included in the original experiment. Moreover, it can compare these PRTEs across different spending hurdles to obtain the promotion with the highest expected profit. But it cannot use the results of the experiment to extrapolate the profitability of promotions with hurdles that were not in the original experiment.
Given the subtlety of this restriction on which promotion features can be used to extrapolate causal effects, we believe it is frequently violated in practice. To underscore this point, we offer two examples of experimental contexts in which extrapolation without an instrumental variable is employed under the assumption of unconfoundedness. The first example comes from Danaher (2002), who considers a subscription plan for a telecommunications product. Analyzing a field experiment to assess the effects of access fees and per-use charges on retention and usage levels, Danaher (2002) extrapolates both beyond the manipulated levels to optimize plan pricing. In this context, retention corresponds with D and postretention usage with Y; access fees and per-use charges are the randomized promotion features. The unconfoundedness assumption translates to no unobserved variable affecting both retention and postretention usage. If unconfoundedness holds, extrapolation over promotion features is possible. If unconfoundedness does not hold, extrapolation can still be valid over promotion features that are valid instruments. Whereas the access fee might well be a valid instrument (affecting retention but not directly affecting usage), one can argue that the per-use charge has a direct effect on usage, in which case the per-use charge is not a valid instrument and, therefore, unsuitable for extrapolation. A more recent example comes from Tian and Feinberg (2020), who explore the effect of duration-based discounts for subscriptions to an online dating site, on which the price levels for different duration subscription plans are manipulated experimentally. In their context, the subscribing decision is D, and the duration of the plan that is purchased is Y. The unconfoundedness assumption translates to no unobserved variable affecting both the decision to subscribe and the subscription plan that is purchased. 
If unconfoundedness does not hold, extrapolation can still be valid over promotion features that affect the purchase decision but not the chosen plan. But, if the price schedule has a direct effect on the chosen plan, then it is not a valid instrument and, therefore, unsuitable for extrapolation. In both examples, causal effects are extrapolated over variables that potentially alter (i) the composition of customers complying with the promotion (via D) and (ii) the causal effect among that same group of customers (via Y). Experiments with noncompliance due to unobserved heterogeneity cannot separately identify these extrapolated effects without assuming away the unobserved heterogeneity, which is a very strong behavioral assumption.
A second set of insights from our approach pertains to the first challenge regarding the role of unobserved heterogeneity in optimal promotional design. Our empirical application illustrates how ignoring unobserved heterogeneity in treatment effects leads to recommended incentives that are as much as 50% too large and predicted profits that are 100% too high. Likewise, commonly invoked assumptions, such as (i) outcomes being monotone increasing with unobservables or (ii) the marginal effects of the unobservables on outcomes being independent of the observables (e.g., Heckman 1979, Lee 1982), are not supported by the data. Invoking either of these assumptions also leads to suboptimal promotional design.
In sum, our approach consists of conducting an experiment to vary the promotional design, specifying the design parameters that are conditionally excludable and, thus, can be extrapolated, applying an MTE approach to accommodate unobserved confounding, and using the resulting causal estimates to design optimal promotions for each target. We describe this approach and illustrate its value with an empirical example in which an international hotel chain seeks to optimize a compliance promotion using data from a field experiment.
1.3. Related Literature
A growing literature in marketing pertains to the extrapolation of causal effects for targeted promotions, whereby causal effects are estimated from field experiments and then extrapolated to customer segments not directly represented in the original experiment (Ascarza 2018; Hitsch and Misra 2018; Simester et al. 2020a, b; Dubé and Misra 2022). Our focus is instead on extrapolating causal effects to new promotion designs in settings in which there is unobserved customer heterogeneity and compliance with the promotion is not mandatory.
We apply our approach to optimizing compliance promotions in the context of loyalty promotions (see Kumar and Reinartz 2018 for a summary). Overall, our research differs from this stream of loyalty promotion research on a number of dimensions. First, our emphasis is on optimizing the design and targeting of short-term loyalty promotions within a loyalty program rather than optimizing the terms of long-term loyalty programs. Second, our approach relies on field experiments to estimate the effects of different promotion features on outcomes for which, owing to the practical limitations of such experiments, extrapolation is necessary.
Our empirical application is most similar to that of Wang et al. (2016), who study the effects of promotion compliance on purchase behavior after the promotion has ended. Their analysis entails measuring how different spending hurdles affect the total length of hotel stays during and after the promotion period. They experimentally manipulate spending hurdles and recover causal effects on hotel stays using (i) a regression model of promotion status as a function of customer demographics and (ii) a Tobit model of nights stayed as a function of promotion status. Our application instead focuses on optimizing promotions using similar experimental data and with a more robust approach to extrapolation (e.g., an implicit assumption in Wang et al. (2016) is that the effect of unobservables on outcomes is independent of hurdle levels, implying that the marginal customers attracted by different hurdle levels are identical). In addition, our empirical application emphasizes the optimization of rewards, conditional on hurdles, and we argue that the extrapolation over hurdles in Wang et al. (2016) is not valid in this setting.
A third stream of relevant research relates to the estimation of MTEs using instrumental variables (Heckman and Vytlacil 1999, Mogstad et al. 2018). We build on this research in several ways. First, we outline conditions under which it is possible to extrapolate causal effects from field experiments involving compliance promotions and illustrate these using Pearl’s (2009) causal framework. Second, we adapt MTE estimation approaches to the context of compliance promotions and demonstrate the suitability of these approaches in these contexts. We find that approaches predicated on more restrictive assumptions about the impact of unobserved factors on outcomes (analogous to the restrictions embedded in the Heckman selection model; Heckman 1979, Lee 1982) perform poorly in our empirical context, incorrectly doubling predicted profit lifts. Third, we couple the MTE estimation of causal effects with optimization in the context of compliance promotions to enable their use in marketing settings in which not only customer margins, but also promotion costs depend on the mix of customers who comply with the promotion. Fourth, we prove that commonly used linear extrapolation approaches in experimental contexts, such as linear regression assuming unconfoundedness or linear extrapolation of nonparametric intent-to-treat effects, are likely to be badly misspecified in all but the most exceptional circumstances.
An alternative method for extrapolation and optimization is to specify a structural model of customer behavior and estimate it with experimental data (e.g., Dubé et al. 2017, Tian and Feinberg 2020, Dubé and Misra 2022; see also the discussion in Nelson et al. 2020). Compared with the MTE approach, structural estimation can lead to greater generalizability by enabling causal extrapolation over all features of the promotion to new environments and even to new populations. But this increase in generalizability can come at the cost of internal validity, as the “structure” in a structural model typically entails strong assumptions about causal mechanisms. There is also the risk that causal effects estimated from a structural model might depend critically on assumptions that are neither obviously correct nor central to the main research question. Because the MTE approach, with its focus on selection, does not make explicit the types of assumptions needed to ensure policy invariance to new populations and environments, it can lack external validity. However, the MTE approach’s internal validity can be higher because its implied policy insights are not dependent on the appropriateness of such assumptions, which often cannot be tested (Heckman and Vytlacil 2005). If the goal is to optimize a particular compliance promotion, generalizability is less important than internal validity, and the MTE approach has an advantage owing to its weaker assumptions.
In summary, the marketing literature is replete with field experiments involving compliance promotions but bereft of a general framework to analyze them. Studies that optimize compliance promotions by extrapolating causal effects outside the bounds of the experiment do not discuss whether the data from the field experiment can identify the extrapolated effects. Even in cases in which extrapolation is potentially valid, causal effects are often estimated using restrictive assumptions about the nature of unobserved customer heterogeneity. This study advances our understanding of compliance promotions and provides concrete guidance for how to estimate and extrapolate causal effects in this common but complex setting.
2. Data and Descriptive Analysis
Throughout the paper, we use a field experiment involving a compliance promotion offered by a major hotel brand to exemplify the extrapolation of causal effects. Our approach is intended to generalize to most compliance promotion settings, so the context is predominantly illustrative of the approach. This section first details the empirical setting and field experiment, and reports descriptive analyses showing that the promotional design has a substantial effect on customers’ purchases. We then apply a simple, nonparametric approach to optimizing promotional designs within the cells of the experiment, showing profit gains across the experimental cells. The findings from this analysis serve to motivate the potential to improve profits further via a valid extrapolation approach, which we describe in the following sections of the paper.
2.1. Example Empirical Setting
We exemplify our extrapolation procedure with a reward promotion conducted by InterContinental® Hotels Group (IHG®), a prominent multinational hospitality company with $4.6B in annual revenue during 2019.4 IHG’s 16 brands include Holiday Inn®, Crowne Plaza®, and InterContinental. IHG offers a loyalty program that customers may join for free.
Customers in this program are sorted into four categorical loyalty tiers, T, based on their past stay and spending behaviors across all IHG brands, with higher tier customers receiving premium services. In addition, for the experiment period, IHG uses a proprietary algorithm to predict an integer number, B, of each customer’s expected stays at a specific IHG hotel brand (henceforth hotel chain A) in the absence of the promotion, called a “baseline.” Whereas tier T categorizes overall spending at the IHG brand, baseline B categorizes stays at hotel chain A. Both are used to customize promotional offers as we describe shortly. Among the various specific promotional offers available to its loyalty program members, we consider compliance promotions wherein customers are offered R bonus reward points if they reach a hurdle of staying at H (or more) different hotels of hotel chain A within a defined time period. The bonus points can be redeemed for benefits such as future stays, digital goods such as ebooks or games, and merchandise (each 1,000 points are worth about $6.80). Customers self-select into meeting the terms of the promotion by reaching the hurdle (D = 1) or not (D = 0).
2.2. Field Experiment
In this section, we describe the experimental design and summarize the outcomes in terms of stays and profits.
2.2.1. Experimental Conditions and Randomization.
In 2017, IHG launched a four-month field experiment by assigning 23,583 randomly selected customers into an offer group (16,034 customers) and a control group (7,549 customers).
2.2.1.1. Offer Group.
Customers in the offer group are offered a reward promotion of the form “Receive R bonus points for staying at H different hotels of hotel chain A within the next four months.” These customers are further randomized within experimental blocks defined by their combined values of baseline B and tier T to receive different promotions. Promotions differ in their values of H and R, and within each block of baseline and tier, up to four promotions are offered. For example, a customer in tier T = 2 with predicted baseline visits of B = 3 might be offered a promotion with H = 4 and R = 13,200, meaning they receive 13,200 bonus points if they stay at four or more different hotels of hotel chain A over the four-month duration of the experiment. As Figure 1 illustrates, customers within the higher baseline groups are offered promotions with higher hurdles (and correspondingly more bonus points) as it would otherwise be too easy for them to surpass the hurdle and earn the reward.

Note. Multiple circles are portrayed side by side in cases wherein the same hurdle and reward are offered to customers with different baselines and/or loyalty tiers.
2.2.1.2. Control Group.
Customers in the control group are not offered a reward promotion, so R = 0, and H is undefined. For the purpose of comparing customers in the offer and control groups, however, it is advantageous to define a value of H for customers in the control group. Thus, we randomly sample values of H for customers in the control group, using the empirical distribution of hurdle levels among customers in the offer group with the same baseline and tier. Conceptually, customers in the control group meet the terms of the promotion if they stay at H or more hotels, but they receive R = 0 points.
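The assignment of placeholder hurdles to control customers can be sketched as follows. The record layout (dictionaries with `group`, `baseline`, `tier`, `hurdle`, and `reward` keys), the function name, and the seed are hypothetical choices for illustration and do not describe the code used in the study.

```python
import random
from collections import defaultdict


def assign_control_hurdles(customers, seed=0):
    """Give each control customer a hurdle drawn from the offer group.

    customers: list of dicts with keys 'group' ('offer' or 'control'),
    'baseline', 'tier', 'hurdle' (None for control), and 'reward'.
    Returns a new list; the input records are not modified.
    """
    rng = random.Random(seed)

    # Collect the hurdles offered within each baseline-tier block.
    offered = defaultdict(list)
    for c in customers:
        if c["group"] == "offer":
            offered[(c["baseline"], c["tier"])].append(c["hurdle"])

    result = []
    for c in customers:
        c = dict(c)
        if c["group"] == "control":
            # A uniform draw from the list of offered hurdles reproduces
            # their empirical frequencies within the block. This raises
            # IndexError if a block has no offer-group customers.
            c["hurdle"] = rng.choice(offered[(c["baseline"], c["tier"])])
            c["reward"] = 0  # control customers receive no bonus points
        result.append(c)
    return result


# Hypothetical records for one block (baseline 3, tier 2).
sample = [
    {"group": "offer", "baseline": 3, "tier": 2, "hurdle": 4, "reward": 13200},
    {"group": "offer", "baseline": 3, "tier": 2, "hurdle": 5, "reward": 16000},
    {"group": "control", "baseline": 3, "tier": 2, "hurdle": None, "reward": 0},
    {"group": "control", "baseline": 3, "tier": 2, "hurdle": None, "reward": 0},
]
completed = assign_control_hurdles(sample)
```

With a hurdle defined for every customer, promotion status D can be computed identically in both groups as an indicator that stays at different hotels reach H.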
2.2.1.3. Randomization.
Approximately 68% of the sample is assigned to the offer group within each of the 51 experimental blocks defined by unique pairs of baseline and tier. As a randomization check, we conduct a test of the null hypothesis that the distribution of baseline-tier blocks is the same within the offer and control groups. The test cannot reject the null hypothesis, showing no evidence of an imbalance in random assignment (p = 0.0902).
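One common way to carry out such a balance check is a Pearson chi-square test of homogeneity on the 2 × K table of group by block counts. The text does not state which test was used, so the sketch below is an assumption about the form of the check, and the function name and toy counts are hypothetical.

```python
def block_balance_chi_square(offer_counts, control_counts):
    """Pearson chi-square statistic for a 2 x K table of block counts.

    offer_counts, control_counts: dicts mapping a block label, such as a
    (baseline, tier) pair, to the number of customers in that block.
    Returns (statistic, degrees_of_freedom).
    """
    blocks = sorted(set(offer_counts) | set(control_counts))
    n_offer = sum(offer_counts.values())
    n_control = sum(control_counts.values())
    n_total = n_offer + n_control

    statistic = 0.0
    for b in blocks:
        o = offer_counts.get(b, 0)
        c = control_counts.get(b, 0)
        block_total = o + c
        for observed, group_total in ((o, n_offer), (c, n_control)):
            # Expected count if block membership is independent of group.
            expected = block_total * group_total / n_total
            statistic += (observed - expected) ** 2 / expected

    return statistic, len(blocks) - 1  # (2 - 1) * (K - 1) degrees of freedom


# Hypothetical counts for two blocks.
stat, dof = block_balance_chi_square({(0, 1): 30, (1, 1): 30},
                                     {(0, 1): 10, (1, 1): 20})
print(stat, dof)
```

The p-value is the upper tail of a chi-square distribution with the returned degrees of freedom; if SciPy is available, `scipy.stats.chi2.sf(stat, dof)` returns it.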
Table 1 presents key statistics for baseline, tier, hurdle, and reward. With coefficients of variation around one, there appears to be sufficient variation to assess how these variables affect the outcome variable, stays.
Table 1. Summary Statistics for Stays, Baseline, Tier, Hurdle, and Reward
Variable | Mean | Standard deviation | Minimum | Maximum |
---|---|---|---|---|
Baseline, B | 2.67 | 2.04 | 0 | 17 |
Tier, T | 2.36 | 0.89 | 1 | 4 |
Hurdle, H | 3.69 | 2.03 | 2 | 19 |
Reward (1,000s points), R | 6.22 | 5.54 | 0 | 54 |
Notes. Tiers 1–4 contain 14.7%, 47.9%, 24.1%, and 13.3% of customers, respectively. The value R = 0 corresponds with the absence of an offered promotion in the control group.
2.2.2. Experimental Outcomes for Hurdle Achievement and Stays.
Across the sample, the average number of stays is 2.02, but average stays vary with experimental assignment and baseline–tier block. To illustrate this point, Figure 2 depicts the difference in average stays and rates of hurdle achievement between offer and control groups for each baseline–tier–hurdle combination. The preponderance of positive values in Figure 2 indicates that the offer group is more likely to reach the hurdle than the control group (left), and accordingly, the offer group has a higher average number of stays (right). Controlling for each baseline–tier experimental block, the average rate of hurdle achievement is 2.6 percentage points higher in the offer group (two-sided z = 4.825), and average stays are 0.09 higher (two-sided z = 3.341, p = 0.0008).

Notes. The y-axis shows the difference in proportion of customers reaching the hurdle (left) and average stays (right) between the offer group (O) and control group (C). The x-axis groups observations by the difference between the hurdle and the baseline (H – B). Points are horizontally jittered.
2.2.3. Profits from Promotions.
We next illustrate the extent to which the experimental outcomes can be used to improve the profitability of loyalty promotions. We first construct a PRTE. As shown in Appendix A, this PRTE is derived from a more primitive conditional local average treatment effect (CLATE) among customers with baseline B and tier T who comply with a promotion with hurdle H and reward R. Focusing for now on values of R and H that were included in the experiment, this PRTE is (i) the expected change in margin resulting from customers complying with a promotion with hurdle H and reward R minus (ii) the expected cost of the reward points for the promotion. Conditional on a target set of customers and hurdle, denoted X = x, the PRTE for a promotion with a reward of R = r is
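One expression consistent with this verbal description is sketched below. It is an illustration rather than the exact estimator: it assumes that mgn is the margin per stay, that cpp is the cost per reward point, that every customer with D = 1 is paid r points, and that expectations are cell means from the experiment.

$$\mathrm{PRTE}(r \mid x) = mgn \cdot \big(E[Y \mid X = x, R = r] - E[Y \mid X = x, R = 0]\big) - cpp \cdot r \cdot E[D \mid X = x, R = r].$$

Under these assumptions, the first term equals mgn times the CLATE times the change in the share of customers meeting the hurdle, $E[D \mid X = x, R = r] - E[D \mid X = x, R = 0]$, which is how the PRTE connects to the more primitive CLATE.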
The steps involved in computing this estimator are as follows:
1. Obtain expected margins per stay, mgn, and the expected cost per reward point, cpp, from IHG.
2. Estimate the total profits for each X = x and R = r in the experiment (including R = 0, using observations from the control group). These estimates are based on average stays, Y, and promotion status, D, observed in the data.
3. For each cohort of customers with the same X = x, compare the average profit across reward levels, R, to ascertain which reward level from the experiment yields the highest expected profit for the target.
4. Compute the counterfactual total profits if all customers were assigned to the most profitable reward level for their cohort.
5. Finally, compare the profit levels obtained from step 4 to the profit levels if all customers were assigned to the control group to evaluate the profit lift.
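The steps above can be sketched for a single cohort as follows. This is a minimal illustration, not the computation used to produce the reported results: the profit function (margin on stays minus the cost of points paid to customers with D = 1), the function names, and the values of `mgn` and `cpp` are assumptions, and the cell means are the rounded values reported for the B = 2, T = 2, H = 3 cohort.

```python
def expected_profit(mean_stays, share_meeting_hurdle, reward, mgn, cpp):
    """Per-customer profit: margin on stays minus expected reward cost.
    Assumes the reward is paid to every customer with D = 1."""
    return mgn * mean_stays - cpp * reward * share_meeting_hurdle


def best_reward_and_lift(cells, mgn, cpp):
    """cells maps reward r -> (mean stays, share with D = 1) for one cohort
    X = x and must include the control cell r = 0."""
    profits = {
        r: expected_profit(y, d, r, mgn, cpp) for r, (y, d) in cells.items()
    }
    best_r = max(profits, key=profits.get)
    control_profit = profits[0.0]
    gain = profits[best_r] - control_profit
    # The relative lift is undefined when control profit is zero.
    lift = gain / control_profit if control_profit != 0 else None
    return best_r, gain, lift


# Rewards are in 1,000s of points; mgn and cpp are hypothetical.
cells = {0.0: (1.30, 0.147), 7.2: (1.34, 0.166), 10.8: (1.36, 0.185)}
best_r, gain, lift = best_reward_and_lift(cells, mgn=30.0, cpp=0.3)
```

Repeating this for every cohort and weighting by cohort size would give a sample-wide lift in the spirit of steps 4 and 5.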
Table 2 presents the results of this analysis in terms of profit lift. With 85 baseline–tier–hurdle cohorts, we focus on 10 cohorts to facilitate exposition and average the results across the remaining baseline–tier–hurdle cohorts (weighted by the number of customers in each cohort). Overall, redesigning the reward promotion yields a profit gain of 7.73%, showing substantial potential for profit optimization within the range of the experimental variation.
Table 2. Profit Gains from Reassigning Customers to the Most Profitable Reward Levels
Baseline | Tier | Hurdle | Profit gain percentage | Number of customers |
---|---|---|---|---|
1 | 1 | 2 | 2.77 | 2,629 |
1 | 2 | 2 | 2.32 | 4,141 |
1 | 3 | 2 | 14.30 | 725 |
1 | 4 | 2 | 0.00 | 127 |
2 | 3 | 2 | 23.45 | 152 |
2 | 4 | 2 | 9.80 | 114 |
2 | 1 | 3 | 0.00 | 705 |
2 | 2 | 3 | 1.84 | 4,297 |
2 | 3 | 3 | 9.45 | 1,255 |
2 | 4 | 3 | 18.59 | 285 |
Summary | | | Weighted profit gain percentage | Number of customers |
Baselines 1 and 2 only | | | 3.95 | 14,430 |
Other baselines | | | 13.71 | 9,153 |
All customers (a) | | | 7.73 | 23,567 |
Notes. Detailed results are shown for 14,430 customers with the two most common values of baseline (1 and 2), whose average profit gain is 3.95%.
(a) Nine cells comprising 16 customers are not included because of undefined lift statistics.
To further illustrate the insights from the nonparametric PRTEs, we focus on the case of B = 2, T = 2, and H = 3 (the modal cohort in the experiment). The results from this target set of customers are captured in Table 3. Comparing the two rows, we observe that offering 10,800 points (instead of 7,200 points) yields higher profits ($1.36 per customer in the segment), stays (1.36 visits per customer), and the proportion of customers meeting the hurdle (18.5%).6
Table 3. Profit Lift for B = 2, T = 2, and H = 3
Points in 1,000 (r) | $E[Y \mid R = r]$ | $E[Y \mid R = 0]$ | $E[D \mid R = r]$ | $E[D \mid R = 0]$ | CLATE | PRTE |
---|---|---|---|---|---|---|
7.2 | 1.34 | 1.30 | 0.166 | 0.147 | 1.96 | $0.95 |
10.8 | 1.36 | 1.30 | 0.185 | 0.147 | 1.54 | $1.36 |
Notes. All expectations and causal effects are nonparametric estimates. CLATE is the conditional local average treatment effect in total stays from offering R = r points versus R = 0 points. The CLATE is defined as $(E[Y \mid R = r] - E[Y \mid R = 0]) / (E[D \mid R = r] - E[D \mid R = 0])$, with conditioning on B = 2, T = 2, and H = 3 suppressed in the notation. The PRTE is expressed per customer in the segment with mgn and cpp scaled to preserve anonymity.
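As a worked check, the CLATE can be recomputed as a ratio of differences in cell means. The sketch below assumes the four middle columns of Table 3 are mean stays and hurdle-achievement rates under R = r and R = 0; the function name is hypothetical. Because the inputs are the rounded values printed in the table, the ratios only approximate the reported CLATEs of 1.96 and 1.54.

```python
def wald_clate(mean_y_offer, mean_y_control, mean_d_offer, mean_d_control):
    """Ratio of the difference in mean stays to the difference in the share
    of customers meeting the hurdle, offer versus control."""
    return (mean_y_offer - mean_y_control) / (mean_d_offer - mean_d_control)


# Rounded cell means for B = 2, T = 2, H = 3.
clate_7200 = wald_clate(1.34, 1.30, 0.166, 0.147)
clate_10800 = wald_clate(1.36, 1.30, 0.185, 0.147)
```

With rounded inputs the ratio is sensitive to the third decimal of each mean, which presumably accounts for the gap between these values and the table entries.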
Yet the range of reward points in the experiment is limited, and the findings raise the question of whether profits would continue to increase if points were increased beyond 10,800. Alternatively, it may be the case that the optimal level of rewards lies between 7,200 and 10,800 bonus points. There may be considerable potential to further improve outcomes via extrapolation, making it necessary to determine when and how extrapolation can be used in experimental contexts. We discuss extrapolation in the context of general compliance promotions next.
3. Approach
In this section, we describe our approach to extrapolating causal effects from compliance promotions and illustrate its application in the context of IHG’s loyalty reward promotion experiment. We begin with a general description of compliance promotions as viewed through the lens of MTEs (Section 3.1). Our purpose is twofold: (i) to explain the general structure of the problem and our approach to modeling it and (ii) to acquaint readers who are not already familiar with MTEs with this approach. We then discuss how we apply this procedure to IHG’s experiment (Section 3.2). IHG’s experiment provides a canonical example of how some variables are valid for extrapolating causal effects and others are not, so we illustrate these general concepts in that empirical context. We then further specify the empirical model, which is used to estimate and extrapolate causal effects outside the bounds of the original experiment, and detail how to use these extrapolations to find the most profitable promotional design for any given target segment (Section 3.3). We conclude this section by considering the conditions under which a linear extrapolation of intent-to-treat effects might be valid (Section 3.4).
3.1. Marginal Treatment Effects
We model a customer’s response to an offer under a compliance promotion in terms of an outcome of interest, Y; a promotion status indicating whether the terms of the promotion are met, D; and a variable representing unobserved customer heterogeneity, U. In presenting this model, we adapt and describe the MTE approach from Mogstad et al. (2018) to the context of compliance promotions (interested readers are referred to Mogstad and Torgovitsky 2018, Mogstad et al. 2018 for a more complete treatment of the MTE approach).
3.1.1. Selection into Promotion Status and Effect on Outcomes.
We assume that meeting the terms of a compliance promotion is a binary state with D = 1 when the terms are met and D = 0 otherwise. We also reemphasize here that a customer who is not offered a promotion can nevertheless have a promotion status of D = 1 if the customer’s behavior satisfies the terms of the promotion. We consider two unobserved potential outcomes of interest (Rubin 1974): Y1, the outcome observed if D = 1, and Y0, the outcome observed if D = 0. Every customer’s observed outcome, Y, is then related to these potential outcomes through a switching equation:
$$Y = D\,Y_1 + (1 - D)\,Y_0. \tag{2}$$
Only Y1 or Y0 is observed for any customer—never both. Thus, individual-level causal effects are not observed in the data, and we focus on estimating and extrapolating average causal effects among subsets of customers.
The selection equation for promotion status can be written compactly as a function of a customer’s unobserved heterogeneity and a propensity function reflecting the customer’s conditional probability of meeting the promotion’s terms (Heckman and Vytlacil 1999, 2007):
$$D = \mathbb{1}[U \le p(X, Z)]. \tag{3}$$
The propensity function, p(X, Z), depends on two sets of variables, X and Z. First, X is a set of variables affecting both promotion status and outcomes. Second, Z is a set of variables that can weakly increase (or decrease) the likelihood of meeting the promotional terms, but otherwise has no direct effect on the outcome (i.e., Z contains instruments that are conditionally excludable from Y given X). As is standard in the literature, the variable U is normalized (for identification purposes) to be uniformly distributed between zero and one, conditional on X (see Heckman and Vytlacil 2007, section 3, for further discussion). In a field experiment, X includes observed sources of customer heterogeneity plus any randomized promotion features that can have a direct effect on D and Y, and Z includes experimentally manipulated promotion features whose effects on Y are fully mediated through D.
Under the no-defiers assumption, Equation (3) implies that higher values of U correspond with lower rates of meeting the promotion terms, as is standard in the literature on experiments with two-sided noncompliance.7 In the context of compliance promotions, the unobserved U typically affects both the promotion status and the outcome, as customers who meet the terms of a promotion are typically different from those who do not. As discussed previously, in such a case, customers’ promotion statuses are confounded with their outcomes.8
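The selection rule can be made concrete with a short sketch. It assumes a threshold form in which D = 1 whenever U does not exceed the propensity, and it classifies a customer by comparing U with the propensities without and with an offer. The function names are hypothetical, and the two propensities are the rounded hurdle-achievement rates from Table 3, used purely for illustration.

```python
def promotion_status(u, propensity):
    """D = 1 when the unobserved index u does not exceed the propensity."""
    return 1 if u <= propensity else 0


def compliance_type(u, p_without_offer, p_with_offer):
    """Classify a customer under the no-defiers assumption, i.e., the offer
    weakly raises the probability of meeting the promotion terms."""
    if p_with_offer < p_without_offer:
        raise ValueError("no-defiers requires p_with_offer >= p_without_offer")
    if u <= p_without_offer:
        return "always-taker"  # meets the terms even without an offer
    if u <= p_with_offer:
        return "complier"      # meets the terms only when offered
    return "never-taker"       # does not meet the terms either way


types = [compliance_type(u, 0.147, 0.185) for u in (0.10, 0.16, 0.50)]
```

Because low values of U select into D = 1 under both offers, noncompliance is two-sided: some customers in the control group reach the hurdle, and many customers in the offer group do not.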
3.1.2. Marginal Treatment Effects and Marginal Treatment Response Functions.
The MTE function provides the foundation for the extrapolation approach we use. The marginal treatment effect (Heckman and Vytlacil 1999, 2005) is defined as
$$\mathrm{MTE}(x, u) = E[Y_1 - Y_0 \mid X = x, U = u]. \tag{4}$$
It is the expected difference in the outcome resulting from a change in promotion status for a customer with observed characteristics X = x and unobserved characteristics U = u. Because U varies continuously, Equation (4) implies that causal effects can vary on the margin for any customer represented by a realized value of the pair {x, u}.
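The MTE function can be rewritten as the difference in expected potential outcomes, Y1 and Y0. These expectations are called marginal treatment response (MTR) functions and are defined as
$$m_d(x, u) = E[Y_d \mid X = x, U = u], \quad d \in \{0, 1\}, \tag{5}$$
so that $\mathrm{MTE}(x, u) = m_1(x, u) - m_0(x, u)$.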
3.1.3. Target Parameters.
We next discuss how MTRs are related to the causal effects we want to estimate and the data generated by the experiment. Specifically, MTRs can be seen as building blocks for constructing target parameters, which include many causal effects (e.g., typical estimands, such as the average treatment effect, conditional average treatment effect (CATE), local average treatment effect (LATE), and PRTE) and empirical moments (e.g., typical estimators, such as treatment coefficients from ordinary least squares (OLS) and instrumental variable (IV) regressions). Denoting these target parameters β and indexing them according to their respective estimators or estimands by s, all of the causal effects and empirical moments discussed in this paper can be expressed using the following weighted function of the MTRs:9
$$\beta_s = E\left[\int_0^1 m_0(X, u)\,\omega_{0s}(X, Z, u)\,du\right] + E\left[\int_0^1 m_1(X, u)\,\omega_{1s}(X, Z, u)\,du\right]. \tag{6}$$
The choice of weights, $\omega_{0s}$ and $\omega_{1s}$, determines the causal effect or empirical estimand, s, that is represented by βs.
To provide intuition on the role the weights play in defining the target parameters, consider the example of the average treatment effect on the treated (ATT), $E[Y_1 - Y_0 \mid D = 1]$. The weights for the ATT are $\omega_{1,\mathrm{ATT}}(x, z, u) = \mathbb{1}[u \le p(x, z)] / P(D = 1)$ and $\omega_{0,\mathrm{ATT}}(x, z, u) = -\omega_{1,\mathrm{ATT}}(x, z, u)$ (Heckman and Vytlacil 2005). As u decreases, meeting the terms of the promotion becomes more likely, and the numerator of $\omega_{1,\mathrm{ATT}}$, the indicator $\mathbb{1}[u \le p(x, z)]$, is more likely to equal one. Correspondingly, the weight placed on m1 increases. Accordingly, the ATT target parameter places greater weight on the expected outcomes among those who are more likely to self-select into treatment, as one expects.
Most common causal effects and empirical estimands can be expressed as weighted averages of the MTR functions using Equation (6). This suggests a way to construct bounds or estimates for the MTRs. By matching various estimators, such as estimated IV coefficients, with the definition of their estimands based on Equation (6), one can bound or estimate the underlying MTRs. The recovered MTRs can then be used to construct bounds or estimates of causal effects of interest, such as the CLATE or PRTE.10
To illustrate, consider the IV estimator for D obtained by regressing Y on D with a scalar instrumental variable Z. Expressing the IV regression coefficient for D as a target parameter yields
$$\beta_{IV} = E\left[\int_0^1 m_0(X, u)\,\frac{(Z - E[Z])\,\mathbb{1}[u > p(X, Z)]}{\mathrm{Cov}(D, Z)}\,du\right] + E\left[\int_0^1 m_1(X, u)\,\frac{(Z - E[Z])\,\mathbb{1}[u \le p(X, Z)]}{\mathrm{Cov}(D, Z)}\,du\right]. \tag{7}$$
The expectation of Z and the covariance of D and Z can be estimated from the data, the indicators $\mathbb{1}[u \le p(X, Z)]$ and $\mathbb{1}[u > p(X, Z)]$ can be computed from an estimated propensity function, and the term on the left-hand side can be estimated from an IV regression. Hence, the only unknowns in Equation (7) are the MTR functions, m0 and m1, which can, in many cases, be recovered by matching the estimated value of $\beta_{IV}$ with its definition in Equation (7). As we discuss next, multiple empirical estimators can be used to construct moment equations for estimating or placing restrictions on the MTRs.
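The link between the IV coefficient and the MTRs can be checked numerically in a stylized case. The sketch below assumes a binary instrument, no covariates, and hypothetical linear MTR functions and propensities. It evaluates the weighted MTR integral with weights proportional to (z − E[Z]) / Cov(D, Z), and compares the result with the average of m1 − m0 over the compliers' range of u.

```python
def integrate(f, lower=0.0, upper=1.0, n=1000):
    """Midpoint-rule integral of f over [lower, upper]."""
    width = (upper - lower) / n
    return sum(f(lower + (i + 0.5) * width) for i in range(n)) * width


# Hypothetical ingredients (not estimates from the experiment).
def m0(u):
    return 1.0 - 0.5 * u


def m1(u):
    return 3.0 - 2.0 * u


propensity = {0: 0.2, 1: 0.6}  # p(z) for z = 0 and z = 1
prob_z = {0: 0.5, 1: 0.5}      # distribution of the instrument

mean_z = sum(z * q for z, q in prob_z.items())
mean_d = sum(propensity[z] * q for z, q in prob_z.items())
cov_dz = sum(z * propensity[z] * q for z, q in prob_z.items()) - mean_d * mean_z


def iv_estimand():
    """Weighted MTR integral with the IV weights, averaged over Z."""
    total = 0.0
    for z, q in prob_z.items():
        scale = (z - mean_z) / cov_dz
        p = propensity[z]
        treated = integrate(lambda u: m1(u) if u <= p else 0.0)
        untreated = integrate(lambda u: m0(u) if u > p else 0.0)
        total += q * scale * (treated + untreated)
    return total


beta_iv = iv_estimand()
# Average of m1 - m0 over the compliers' range of u, (0.2, 0.6].
late = integrate(lambda u: m1(u) - m0(u), 0.2, 0.6) / (0.6 - 0.2)
```

In this stylized case the two quantities coincide, which is the sense in which an estimated IV coefficient restricts the MTRs over the range of u spanned by the instrument.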
3.1.4. Relating Target Parameters Representing Empirical Moments and Causal Effects.
Given an empirically restricted subset of MTRs, m0 and m1, and an estimated propensity function, p(x, z), it becomes possible to estimate or bound many common causal effects. For typical compliance promotions, the causal effect of interest is the local average treatment effect conditional on X = x (CLATE). The CLATE is especially relevant for compliance promotions because it represents the efficacy of the promotion among customers affected by the promotion: those who are sensitive to a change in the promotion feature Z from z0 (the value Z takes on in the absence of a promotion) to a different level z (under the no-defiers assumption of $p(x, z) \ge p(x, z_0)$). The CLATE is equal to $E[Y_1 - Y_0 \mid X = x,\ p(x, z_0) < U \le p(x, z)]$, where $p(x, z_0) < U \le p(x, z)$ defines the marginal subset of customers in a specific range of unobserved heterogeneity, U, who select into becoming compliers with a change in Z from z0 to z. That is, this group does not meet the promotion terms when Z = z0, but does when Z = z. Importantly, this subset excludes all noncomplying customers, who do not contribute to the incremental margin from the promotion as they have either (i) $U \le p(x, z_0)$, meaning they will meet the promotion terms anyway (always-takers), or (ii) $U > p(x, z)$, meaning they will not meet the promotion terms even when Z = z (never-takers). The CLATE can also be expressed as a target parameter in the form of Equation (6) (Heckman and Vytlacil 1999):
$$\mathrm{CLATE}(x; z_0, z) = \frac{1}{p(x, z) - p(x, z_0)} \int_{p(x, z_0)}^{p(x, z)} \big[m_1(x, u) - m_0(x, u)\big]\,du. \tag{8}$$
Recall, in Equation (7), the target parameter (the IV coefficient for D) was known, but the MTRs were not known. In Equation (8), the target parameter (CLATE) is not known, but the MTRs are known (in the sense that the MTRs are restricted using, e.g., Equation (7)). In sum, the MTE approach first uses various estimators to recover or restrict the MTR functions and then uses the recovered MTRs to estimate or bound causal effects. And, perhaps most importantly, when estimating local average treatment effects, the values of z0 and z, which define the subset of complying customers for the promotion with Z = z, do not need to have been included in the original experiment. Rather, these values can represent interpolations or extrapolations of Z.
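The extrapolation step can be illustrated with a minimal sketch. It takes MTR functions and two propensities as given and averages m1 − m0 over the compliers' range of u. The linear MTRs and the propensity values are hypothetical; in practice the MTRs are only restricted by the data, and the propensities would come from the estimated propensity function evaluated at the old and new values of Z.

```python
def clate(m0, m1, p_z0, p_z, n=10_000):
    """Average of m1(u) - m0(u) over p_z0 < u <= p_z (midpoint rule)."""
    if not p_z > p_z0:
        raise ValueError("the new propensity must exceed the baseline one")
    width = (p_z - p_z0) / n
    total = 0.0
    for i in range(n):
        u = p_z0 + (i + 0.5) * width
        total += m1(u) - m0(u)
    return total / n


# Hypothetical MTRs (not estimates from the experiment).
def m0(u):
    return 1.0 - 0.5 * u


def m1(u):
    return 3.0 - 2.0 * u


# A propensity pair inside the experimental range, and a pair in which the
# upper propensity corresponds to a reward level that was not tested.
clate_in_sample = clate(m0, m1, p_z0=0.15, p_z=0.19)
clate_extrapolated = clate(m0, m1, p_z0=0.15, p_z=0.25)
```

With these particular MTRs the effect per complier falls as the reward draws in customers with higher values of u, which illustrates why the mix of compliers matters when moving beyond the tested reward levels.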
3.1.5. Estimation or Bounding of Causal Effects.
Causal effects can be estimated by applying the approach described in Mogstad et al. (2018). This procedure uses a numerical optimization routine to recover point estimates, or estimated upper and lower bounds, of causal effects such as the CLATE. These bound (or point) estimates must respect (i) the restrictions that are placed on the MTRs by matching them to empirical moment estimators, such as the coefficients from IV and OLS regressions, and accounting for sampling error in these empirical moments, and (ii) the definition of the propensity function and its coefficient estimates.
Given an assumed functional form for the MTRs (which depends on the unknown parameter vector θ), estimating upper and lower bounds for a causal effect requires us to choose a set of empirical moments (e.g., regression coefficients) that the MTRs must be able to generate (subject to minimal sampling error) through Equation (6). As discussed previously, this moment matching step restricts the MTRs to a subset of parameterizations, Θ. Just as with any method of moments procedure, by using multiple regression coefficients, one can make more information available to restrict the MTRs and the causal effect of interest.
After restricting the MTRs to a subset whose parameters lie in Θ, the upper bound for the (conditional on X) target parameter (i.e., a causal effect of interest, such as the CLATE) is estimated by maximizing that target parameter over the parameterizations in Θ; the lower bound is obtained analogously by minimization.
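In notation assumed here for illustration, with $\omega_0^\star$ and $\omega_1^\star$ denoting the weights of the causal effect of interest and $m_d(x, u; \theta)$ the parameterized MTRs, the bounding problem can be sketched as follows. This is a schematic form in the spirit of Mogstad et al. (2018), not a reproduction of the paper's own equations.

$$\bar{\beta}^\star(x) = \max_{\theta \in \Theta} \left\{ \int_0^1 m_0(x, u; \theta)\,\omega_0^\star(x, u)\,du + \int_0^1 m_1(x, u; \theta)\,\omega_1^\star(x, u)\,du \right\}$$

For the CLATE, the weights implied by Equation (8) are $\omega_1^\star(x, u) = \mathbb{1}[p(x, z_0) < u \le p(x, z)] / (p(x, z) - p(x, z_0))$ and $\omega_0^\star = -\omega_1^\star$. When the MTRs are linear in θ, both the objective and the moment restrictions defining Θ are linear in θ, so each bound can be computed as a linear program.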
3.2. Application and Model Specification
In this section, we apply the MTE approach to our hotel loyalty example to illustrate its utility in the context of compliance promotions. We first discuss issues related to identification of causal effects in the context of compliance promotion experiments, using IHG’s experiment as a canonical example. We then specify the functional form and variables used in the MTRs and the propensity function. We conclude by detailing the specification of and rationale behind the empirical moments used to restrict the MTRs.
3.2.1. Identification
3.2.1.1. Overview.
To optimize the compliance promotion through extrapolation, we must map the variables X and Z introduced in Section 3.1.1 to IHG’s field experiment. Recall that X contains variables describing customers and promotion features that directly affect the promotion status D and the outcome Y, and Z contains any experimentally manipulated promotion features that are valid instruments for D (directly affecting promotion status but not the outcome). Because the CLATE in Equation (8) is defined over a change in Z given a value of X, it is not possible to extrapolate causal effects over variables contained in X.
Figure 3 depicts a directed acyclic graph (DAG; Pearl 2009) for the data in our application with arrows indicating the possibility of a direct causal effect and the lack of an arrow between nodes encoding an assumed absence of a direct effect. All variables shown in the DAG are affected by (i) customers’ predicted baseline stays in the absence of a promotion, B, and (ii) customers’ loyalty tiers, T. To simplify Figure 3, these implicit dependencies on B and T are not depicted. All variables in Figure 3 are observed except for U, which is depicted with an open circle. The unobserved confounding due to U is represented by the dashed arrows from U to D and Y. The number of bonus reward points R and the spending hurdle H can both have direct effects on promotion status D. Moreover, because the hurdle H establishes upper and lower bounds on the number of hotel stays for each promotion status, there is an arrow from H to Y, as Y is directly affected by H.

Notes. All causal effects are conditional on baseline B and tier T; these dependencies are not depicted. The variable U represents unobserved confounding that affects both promotion status D and the outcome Y. The effect of the reward R on Y is fully mediated through D and, thus, indirect. R is, therefore, a valid instrument for D, and extrapolation is possible to new values of R. The total effect of the hurdle H on Y includes an effect mediated through D as well as a direct effect. H is, therefore, not a valid instrument for D and so cannot be extrapolated.
3.2.1.2. Reward Bonus Points Are Assumed to Be Excludable.
There is no arrow pointing from R to Y because, in this empirical setting, bonus reward points, R, are plausibly excludable from customers’ potential outcomes, Y1 and Y0, conditional on the spending hurdle. Consider the potential outcome for hotel stays when a customer meets or exceeds the terms of the promotion, Y1. Regardless of whether the customer visits exactly H hotels or exceeds the hurdle by visiting more than H hotels, the customer still receives the same reward of R bonus points. Therefore, conditional on visiting at least H hotels, increasing or decreasing the reward has no economic effect on how many hotels customers visit. Similarly, the potential outcome for hotel stays when a customer does not meet the terms of the promotion, Y0, should also be unaffected by the points offered because a customer who does not reach the hurdle does not receive the reward. Conditional on not receiving the reward, increasing or decreasing the reward should have no impact on hotel stays. Rewards are, therefore, assumed to be a valid instrument for promotion status. We describe and address several potential threats to the excludability assumption in the online appendix.
3.2.1.3. Spending Hurdles Are Not Excludable.
As mentioned, the potential outcomes Y1 and Y0 are directly affected by the spending hurdle, with Y1 ≥ H and Y0 < H. Thus, H cannot serve as an instrument for promotion status, and we cannot estimate or extrapolate CLATEs comparing promotions with different spending hurdles. Rather, we can only estimate CLATEs for changes in bonus points, conditional on a particular spending hurdle. Even though one can ascertain intent-to-treat effects of hurdles on stays over the observed experimental cells, it is not possible to extrapolate these effects to spending hurdles that did not appear in the original experiment, even though spending hurdles were manipulated experimentally. Randomizing a promotion feature does not guarantee that it will be excludable from potential outcomes.
3.2.1.4. Only Instruments Are Valid for Extrapolation.
If there is a manipulated variable that is not conditionally excludable from customers’ potential outcomes (i.e., it is not fully mediated by promotion status, D), then one cannot extrapolate causal effects related to that variable because the experiment does not generate any information relevant to the extrapolation. Stated differently, unless the analyst is willing to assume a promotion feature has a zero (or negligible) direct effect on the outcome or to impose a model of behavior that does not depend at all on the outcome of the experiment, then extrapolation of causal effects is not possible for that feature of the promotion.
To illustrate, consider an experiment with two experimentally manipulated hurdles, H = 2 and H = 4, and two experimentally manipulated rewards, R = 0 (no promotion) and R = 10,000. When extrapolating over R, it is the excludability assumption that enables extrapolation; in particular, the assumption that the value of Y1 for a customer who reaches the hurdle at the observed levels of R = 0 and R = 10,000 is the same as their value of Y1 at any other counterfactual value of R. In contrast, one cannot extrapolate a LATE for a promotion with H = 3 and R = 10,000 (or any other counterfactual value of H) even if one can predict the probability of reaching the H = 3 hurdle from an estimated propensity function. To see this, consider two relevant groups of customers who contribute to the LATE for H = 3. One group comprises customers who reach the H = 4 hurdle, with an average value of Y that is observed in the experiment. We can reasonably assume these customers would also reach the H = 3 hurdle, but their expected value of Y1 under H = 3 likely differs from that under H = 4 because fewer minimum stays are required to obtain the same reward. The experiment does not identify this counterfactual expectation of Y1 at the lower hurdle. Another group comprises customers who reach the H = 2 hurdle and would also reach the H = 3 hurdle but not the H = 4 hurdle. Whereas the experiment provides information about this group’s outcomes under the observed hurdles, there is no way to estimate their expected value of Y1 under H = 3. Furthermore, the relative proportions of these two groups among customers who would reach the H = 3 hurdle are also unknown. Appendix B provides a more detailed version of this example that further outlines the intuition behind the role of excludability in extrapolation and helps to illustrate a general principle for field experiments involving compliance promotions.
When designing experiments around compliance promotions, researchers must account for the restriction that extrapolation is only possible over variables that are valid instruments for compliance. In the example of IHG’s compliance promotion, this might mean (i) fully saturating experiments to include all spending hurdles under consideration and (ii) possibly omitting levels of bonus point rewards because these can be extrapolated.
3.2.2. MTR Functions.
As discussed, the hurdle H places a natural restriction on the upper and lower values of the potential outcomes for hotel visits: Y1 ≥ H and Y0 ≤ H − 1 (recall Y is integer valued). In order to incorporate this information during estimation, we model a transformed outcome variable, which leads to corresponding restrictions on the transformed potential outcomes; we impose these during estimation as restrictions on the domains of the MTR functions.
To facilitate presentation of the model and without loss of generality, we decompose the MTR functions, md for d ∈ {0, 1}, into two additively separable functions,
$$m_d(x, u) = \mu_d(x) + \nu_d(x, u). \tag{11}$$
Here, we mention two nested models (discussed in detail in Section 4) that we use to compare our approach to simpler but more restrictive alternatives. The first nested model ignores unobserved heterogeneity by assuming unconfoundedness. This nested model can be represented by defining $\nu_d(x, u) = 0$, in which case $m_d(x, u) = \mu_d(x)$. The second nested model accounts for unobserved heterogeneity but restricts its effects to be additively separable from effects due to the observed variables. This nested model can be represented by defining $\nu_d(x, u) = \nu_d(u)$, in which case $m_d(x, u) = \mu_d(x) + \nu_d(u)$.12
We specify the first component of the MTR functions, μd, as
This specification is linear in baseline stays b and hurdle h and includes fixed effects for each loyalty tier (with tier 1 as the normalized reference level). The MTR for d = 0 also includes a fixed effect for the special case when b = 0 to account for a feature of the data: hotel stays Y and baseline predictions B are both bounded below at zero. Hence, for the subset of customers with zero predicted baseline stays, their realized average hotel stays are (weakly) greater than their baseline prediction. Without this dummy variable for the case when b = 0, regression coefficients for B, which are used to restrict the set of feasible values of the baseline coefficient in the MTR, underestimate the (positive) relationship between B and Y. We include this dummy in m0 as well as in the regression equations (which we specify in Section 3.2.5) but not in m1, as Y1 ≥ H and the minimum value of H in the experiment is two.
Turning to the second component of the MTRs, νd, we specify a polynomial basis function of order three for u (French and Song 2014) with all powers of u interacted with baseline, hurdle, and tier,
The structure of this function is motivated by two considerations. First, the numerical optimization procedure of Mogstad et al. (2018) and the accompanying ivmte R package rely on the polynomial structure of the MTRs to efficiently calculate the integral in Equation (10) (Mogstad et al. 2018, Shea and Torgovitsky 2021, Torgovitsky and Shea 2021). Second, a third-degree polynomial of u is highly flexible. It allows for nonmonotonic marginal effects in the unobservable u and is capable of approximating a normally distributed unobservable, a common assumption in Heckman selection and switching regressions (French and Song 2014, Brinch et al. 2017).
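The computational convenience of a polynomial in u can be seen in a short sketch: the integral of a polynomial over any range of u has a closed form, so target parameters that integrate the MTRs over a range of u need no numerical quadrature. The coefficients below are hypothetical, and the code is an illustration of the idea rather than the routine used by the ivmte package.

```python
def poly_eval(coefs, u):
    """Evaluate sum_k coefs[k] * u**k."""
    return sum(c * u**k for k, c in enumerate(coefs))


def poly_integral(coefs, lower, upper):
    """Closed-form integral of the polynomial over [lower, upper]."""
    def antiderivative(u):
        return sum(c * u ** (k + 1) / (k + 1) for k, c in enumerate(coefs))

    return antiderivative(upper) - antiderivative(lower)


# Hypothetical cubic in u (constant, linear, quadratic, cubic terms).
coefs = [0.5, -1.0, 0.8, -0.3]

closed_form = poly_integral(coefs, 0.15, 0.25)

# Midpoint-rule check of the closed form.
n = 1000
width = (0.25 - 0.15) / n
numeric = sum(poly_eval(coefs, 0.15 + (i + 0.5) * width) for i in range(n)) * width
```

A cubic can bend twice over the unit interval, which is what permits nonmonotonic marginal effects in u.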
The MTRs defined by Equations (11)–(13) balance parsimony with variation in the moments. To illustrate this trade-off, consider that Equation (12) does not include interactions between observed variables, such as baseline and tier. There are two considerations leading to this decision. First, the baseline term already contains information about tier because it is a prediction of expected stays in the absence of a promotion that is predicated on all available customer information. Hence, we do not expect that any residual variation in stays that would be accounted for by interacting baseline and tier is a first order feature of the data. Second, identification of two-way interactions in the MTRs requires us to empirically restrict the MTRs by including three-way interactions in the OLS and IV regressions (e.g., a two-way interaction between variables a and b in m1 is identified by a three-way interaction between a, b, and D in a regression; we describe these regressions in Section 3.2.5). As higher order interaction terms can overfit the data, we omit them from the model.
3.2.3. Propensity Function.
The propensity function isolates the effect of the reward, Z = R, on meeting the terms of the promotion, D = 1, conditional on X = (B, T, H). We specify the following logit propensity function:
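A logit propensity function of this general kind can be sketched as follows. The linear index, the coefficient names, and the coefficient values are all hypothetical; they are not the specification or the estimates from the experiment, and serve only to show how a fitted function can be evaluated at reward levels that were not tested.

```python
import math


def propensity(baseline, tier, hurdle, reward, coef):
    """Logit probability of meeting the promotion terms (D = 1).
    The index is a hypothetical linear form in baseline, hurdle, and
    reward with a tier-specific shift."""
    index = (
        coef["intercept"]
        + coef["baseline"] * baseline
        + coef["hurdle"] * hurdle
        + coef["tier"][tier]
        + coef["reward"] * reward
    )
    return 1.0 / (1.0 + math.exp(-index))


# Hypothetical coefficients; reward is measured in 1,000s of points.
coef = {
    "intercept": -1.5,
    "baseline": 0.6,
    "hurdle": -0.5,
    "tier": {1: 0.0, 2: 0.1, 3: 0.2, 4: 0.3},
    "reward": 0.03,
}

p_control = propensity(2, 2, 3, 0.0, coef)
p_tested = propensity(2, 2, 3, 10.8, coef)
p_untested = propensity(2, 2, 3, 15.0, coef)
```

With a positive reward coefficient, the propensity rises with the reward, which is the monotonicity that the no-defiers assumption requires of an instrument.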
3.2.4. Summary of the MTE Model Specification.
The full model can be succinctly summarized as follows:
1. There is an unobserved variable, U, that affects (i) how many hotels would be visited if the hurdle were to be met, Y1; (ii) how many hotels would be visited if the hurdle were not to be met, Y0; and (iii) whether the hurdle is in fact reached, D (Equations (2) and (3)).
2. The expected change in hotel stays resulting from a customer with X = x and U = u complying with the promotion (the marginal treatment effect) is the difference in their marginal treatment responses when meeting versus not meeting the promotion’s terms, $m_1(x, u) - m_0(x, u)$ (Equations (4), (5), and (11)–(13)).
3. Whether the hurdle is reached further depends on the propensity function, p(X, Z), where X contains the baseline and tier customer variables and the spending hurdle, and Z contains the reward level (Equations (3) and (14)).
We make two remarks about the model. First, another way to characterize the first point is that each customer’s observed outcome depends on (i) a realization of the random triple (Y0, Y1, U), whose distribution depends on X, and (ii) a realization of the promotion feature Z, which is randomly assigned. Second, whether U reflects static customer traits or transient purchase propensities is irrelevant to the econometric specification; U simply represents an indexable agglomeration of unobserved factors that arise simultaneously with potential outcomes.
3.2.5. Specification of Empirical Moments.
The model we just described encodes our assumptions about customers’ behaviors during the field experiment and, thus, the empirical data that the experiment produced. Next, we specify which features of the data (i.e., regression coefficients) should be used to restrict the MTRs and, thus, place bounds on the causal effects we seek to recover.
We consider empirical moments produced by three regression equations. The moments that are used to restrict the MTRs are a subset of coefficients from these regressions (some coefficients are nearly collinear across regressions; in such cases, we include only one). The first two regression equations are closely related: one is an IV regression and the other is its OLS counterpart; both are estimated using the entirety of the experimental data. The third regression seeks to characterize the behavior of customers in the absence of a promotion and is estimated via OLS, using only data from customers in the control condition.
The IV and OLS regressions share the following (second stage) structure:
This regression formula includes a main effect for baseline (as well as the dummy for b = 0), but it does not include main effects for customers’ loyalty tiers, as any effects of loyalty tier on hotel visits when not meeting the spending hurdle are already accounted for by the baseline prediction. This regression equation also includes a main effect for hurdle in order to account for the transformation . All other variables in Equation (15) are interacted with promotion status, d, and, thus, reflect heterogeneous effects (based on observed covariates) varying by hurdle, baseline, and tier (with tier 1 normalized at ).
In the IV regression, the endogenous variable d is instrumented using the variables defined on the right-hand side of the propensity function in Equation (14). The OLS version of the IV regression estimates the coefficients in Equation (15) without instrumenting for promotion status. Although the coefficients from an OLS regression are typically biased as causal estimates, they are nevertheless functions of the MTRs that can be observed in data and, thus, provide information about the MTR functions. Indeed, any discrepancy between an OLS and IV coefficient for the same treatment variable can potentially constrain the MTR functions at different levels of the unobserved variable, u.
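The contrast between OLS and IV moments can be illustrated with a small simulation. The sketch below is not the paper's specification: it assumes a single randomized binary instrument `z`, a uniform unobservable `u` such that low-`u` customers both comply more readily and stay more often regardless of the promotion, and a constant treatment effect of one stay. All function names and parameter values are hypothetical.

```python
import random


def cov(a, b):
    """Population covariance of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


def simulate_selection(n=20000, seed=0):
    """Draw (z, d, y) with selection on an unobservable u (hypothetical values)."""
    rng = random.Random(seed)
    z, d, y = [], [], []
    for _ in range(n):
        zi = float(rng.randint(0, 1))            # randomized instrument
        u = rng.random()                         # unobserved resistance to complying
        di = 1.0 if u < 0.2 + 0.3 * zi else 0.0  # low-u customers reach the hurdle
        # Low-u customers stay more often anyway; the true effect of d is 1.0.
        yi = 2.0 * (1.0 - u) + 1.0 * di + rng.gauss(0.0, 0.5)
        z.append(zi)
        d.append(di)
        y.append(yi)
    return z, d, y


def ols_slope(d, y):
    """Slope from regressing y on d without instrumenting."""
    return cov(d, y) / cov(d, d)


def iv_slope(z, d, y):
    """Single-instrument IV estimate: cov(z, y) / cov(z, d)."""
    return cov(z, y) / cov(z, d)


z, d, y = simulate_selection()
print("OLS:", round(ols_slope(d, y), 2), "IV:", round(iv_slope(z, d, y), 2))
```

Under these assumptions the IV ratio stays close to the true effect while the OLS slope absorbs the selection on `u`, which is the kind of discrepancy the text describes as informative about the MTR functions.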
The third regression, also estimated with OLS, seeks to capture the relationship between baseline predictions and hotel visits in the absence of a promotion (i.e., among customers in the control group). This regression is specified as
As in Equation (15), we include a dummy for B = 0 and hurdle because of the transformation . This regression is estimated using the subset of observations from the control group. We include this regression because the control group in the experiment is small relative to the offer group. Estimates from the other two regressions use the entire sample and need to account for differences among individuals in the offer group with different promotion statuses. These estimates by themselves might not adequately rationalize outcomes in the control group.
3.3. Optimization of Reward Points
Given estimates of the CLATEs, we seek to use them to optimize reward points. Recall from Equation (1) that the nonparametric estimate of profit from a promotion, as a function of reward bonus points and conditional on X, can be written as
This expression highlights a number of ideas. One is that the policy-relevant treatment effect can be decomposed into three parts: (i) the number of compliers affected by the promotion, (ii) the average increase in margin due to these compliers meeting the terms of the promotion, and (iii) the total cost of the promotion (which depends on the total number of customers reaching the hurdle when offered R = r points, regardless of whether they are compliers or always-takers).
Equation (17) also emphasizes that the primitive causal effect relevant to the promotion is the CLATE. Extrapolation of the PRTE depends on an appropriate extrapolation of the CLATE that is consistent with changes in the proportion of compliers at new values of R, as well as the cost of the promotion. Both of these depend on the same propensity function (per Equations (8) and (14)). Expressing the PRTE as a function of the ITT as in Equation (1) obscures the need for internal consistency between (i) the extrapolated margin lift from complying customers and (ii) the extrapolated cost of the promotion.
For values of R that were used in the experiment, one can calculate the PRTE nonparametrically using Equation (1) (as described in Section 2.2.3). To extrapolate the PRTE to values of R = r outside the experiment, we use (i) the MTE approach to estimate and (ii) logistic regression of the propensity function to estimate and . We perform this estimation for many values of reward points in a set of candidate values, . If the estimated PRTEs are point identified, then conditional on X = x, the level of rewards that leads to the highest profits can be found within this set,
If all the PRTEs are negative, then not offering a promotion is optimal. Should the MTE approach produce upper and lower bounds on instead of point estimates, then one can use a decision rule (e.g., minimax; see Section 4.4) to determine the optimal reward level.
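The point-identified case of this search can be sketched as a grid search over candidate reward levels. The block below assumes the three-part profit form described in words above (complier share times margin times CLATE, less the cost of points paid to everyone who reaches the hurdle); it does not reproduce Equation (17). The logit coefficients, the margin `mgn`, the cost per point `cpp`, and the constant CLATE are hypothetical values chosen for illustration.

```python
import math


def propensity(r, a=-1.5, b=0.00003):
    """Hypothetical logit propensity to reach the hurdle when offered r points."""
    return 1.0 / (1.0 + math.exp(-(a + b * r)))


def prte(r, clate, mgn, cpp):
    """Assumed profit form: margin lift from compliers minus cost of all rewards paid."""
    compliers = propensity(r) - propensity(0.0)   # share induced to reach the hurdle
    return mgn * clate(r) * compliers - cpp * r * propensity(r)


def best_reward(candidates, clate, mgn, cpp):
    """Return (reward, profit); (0.0, 0.0) means no promotion is offered."""
    profits = {r: prte(r, clate, mgn, cpp) for r in candidates}
    r_star = max(profits, key=profits.get)
    if profits[r_star] <= 0.0:
        return 0.0, 0.0
    return r_star, profits[r_star]


candidates = [2500.0 * k for k in range(81)]      # 0 to 200,000 points
r_star, profit = best_reward(candidates, lambda r: 2.0, mgn=100.0, cpp=0.0005)
print(r_star, round(profit, 2))
```

When the CLATE is zero, every positive reward only adds cost, so the function falls back to offering no promotion.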
We note several implicit assumptions and approximations underpinning the PRTE in Equation (17). First, consistent with IHG’s practice, we assume that the cost per reward point, cpp, is independent of promotion status, potential outcomes, and background variables (baseline and loyalty tier). Points can be redeemed for merchandise and future stays, and IHG assigns the same cost for reward points regardless of how they are redeemed. Of course, a generalization of Equation (17) could, for example, replace the terms with a conditional on X expected cost function. Second, and again consistent with IHG’s practice, we assume a similar set of independences for the margin per customer stay, mgn, and in particular, we assume that the margin per hotel visit is not endogenous with promotion status. This rules out situations whereby customers stay in cheaper hotels than they otherwise would during the promotion in order to meet its terms. Again, IHG uses a constant expected margin per stay for its internal calculations, but it is possible to generalize this approach by changing the outcome variable from hotel stays to total margin per customer.15 The degree of bias in our estimates resulting from these assumptions depends on the extent of heterogeneity in these margin and cost terms. If customers choose to stay in cheaper hotels, for example, we overestimate the profit effects of compliance.
3.4. When Linear Interpolation is Valid
A common practice in industry, as noted in the motivating example, is to perform a linear interpolation of intent-to-treat effects in order to optimize promotion features. As we show, linear interpolation is not generally a valid substitute for the use of MTEs and should be used with caution when designing compliance reward promotions.
Linear interpolation is most easily accomplished by using OLS to regress total stays Y on reward points R (conditional on X). Suppressing the dependence on X in the discussion that follows, such a regression might take the form . This approximation implies that, under linear interpolation, the intent-to-treat effect of reward points on stays is . In Appendix C, we show that, in the case of (i) a logit propensity function of the form and (ii) an arbitrary set of MTRs, we can also express as
Comparing the two expressions for ITT suggests that they are equal to each other when (i) there is no effect of reward points on total stays, , or (ii) the integrand, , is a constant for any reward level ϱ. The latter of these two conditions might occur when changes in ϱ are so small that the and functions are locally linear in ϱ—that is, their ratio is constant in ϱ. In other words, linear interpolation approximates the ITT only for very small differences in reward. Because one of the end points in is at R = 0, it is unlikely that local linearity holds for any values of R = r that are managerially meaningful.
Another way of considering when a linear interpolation might be valid is to (i) specify flexible MTR functions and (ii) allow for an arbitrary propensity function (as opposed to the preceding paragraph with a logit propensity function and arbitrary MTR functions). In Appendix C, we show that, if is specified as a linear polynomial function of u given x, then we obtain
Again contrasting this expression with from an OLS regression, we see that is potentially linear in r when (i) compliance p(x, r) is linear in r and (ii) there is no unobserved heterogeneity in the causal effect of promotion status on hotel stays, K = 0. Linearity in p is plausible for small changes in r, but if propensities are estimated using a linear regression, there is a risk of violating the overlap condition (propensities cannot be exactly zero or one). Note, however, that, if K = 0 (i.e., there is unconfoundedness as U does not enter the MTR functions), then the expression for the CLATE becomes —a constant—implying that causal effects are identical among customers who might be very different, for example, among those who require very few and very many points to comply with the promotion.
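A small numerical check makes the linearity conditions concrete. The sketch below assumes the simplest case discussed here, K = 0, so that the ITT is a constant effect times the change in compliance; it does not reproduce the Appendix C expressions. The two propensity functions and their coefficients are hypothetical.

```python
import math


def logit_p(r, a=-1.5, b=0.00005):
    """Hypothetical logit propensity in reward points r."""
    return 1.0 / (1.0 + math.exp(-(a + b * r)))


def linear_p(r, a=0.18, b=0.000005):
    """Hypothetical propensity that is exactly linear in r."""
    return a + b * r


def itt(r, p, effect=2.0):
    """ITT under K = 0: constant effect times the change in compliance from R = 0."""
    return effect * (p(r) - p(0.0))


def interpolation_error(p, r_lo, r_hi, r_mid):
    """Linearly interpolated ITT at r_mid minus the ITT implied by p at r_mid."""
    w = (r_mid - r_lo) / (r_hi - r_lo)
    interpolated = (1.0 - w) * itt(r_lo, p) + w * itt(r_hi, p)
    return interpolated - itt(r_mid, p)


print(interpolation_error(linear_p, 0.0, 40000.0, 20000.0))
print(interpolation_error(logit_p, 0.0, 40000.0, 20000.0))
```

With a linear propensity the interpolation error is zero up to floating-point rounding; with the logit propensity it is not, even though the treatment effect itself is constant.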
4. Results
In this section, we begin by discussing estimates from the propensity model and reporting the findings of our MTE model of stays. Within our discussion of stays, we outline a null model of conditional average treatment effects (CATE) under an assumption of unconfoundedness and compare it to the CLATEs estimated via the MTE approach. We conclude this section with a discussion of how optimal promotional designs vary across target segments and the profit implications of optimizing compliance promotions using more restrictive models.
4.1. Propensity Function
Overall, the propensity model fits the data well. Figure 4 shows estimated propensities from the regression model and compares these with nonparametric estimates. Specifically, it compares (i) values of estimated nonparametrically from the data and (ii) fitted values of p(X, Z) obtained from the logistic regression. The mean absolute difference in these estimates is 0.11 (median = 0.05).

Notes. Includes customers in the control group, whose stays are compared against the hypothetical hurdles they would be assigned if offered a promotion. A 45° line is shown for reference. A sample-weighted least squares regression of nonparametric estimates (x-axis) on the regression estimates (y-axis) yields an intercept of 0 and a slope of 1.001 (0.03) with an R2 of 0.83, suggesting unbiased propensity estimates that fit the data well.
Both rewards and hurdles affect propensities. For tiers 1–4, the expected increase in compliance from raising the reward from 0 to 10,000 points is 0.12%, 3.4%, 4.1%, and 2.8%, respectively; an additional 10,000 points is expected to raise compliance further by 0.085%, 2.7%, 3.0%, and 2.0%, respectively. On average, an increase in H from H = B to H = B + 1 decreases the estimated propensity to reach the hurdle by 0.11, and a decrease in H from H = B to H = B − 1 increases the estimated propensity by 0.18. For comparison, the average baseline propensity to reach the hurdle in the control group is 0.18, meaning a unit hurdle decrease nearly doubles propensities.16
4.2. Model Results and Comparisons
In this section, we consider a number of alternative models to assess which features of the proposed model are most helpful for explaining the data and to help refine the model specification to be used in optimizing promotional design. We begin by comparing nested models based on restricted versions of the MTR functions and then define and contrast our model against an inverse propensity weighting approach that makes the common unconfoundedness assumption.
4.2.1. MTR Comparisons.
A common approach to estimating treatment effects for compliance promotions is to use Heckman’s correction model if D = 0 leads to censoring (Heckman 1979), or a switching model if D = 0 and D = 1 produce different distributions of outcomes (Lee 1982) (see also Endnote 12). These standard approaches, however, embed a number of strong assumptions that may be difficult to justify in the context of compliance promotions. One is the assumption that the effect on Y due to the unobserved variable, U, is additively separable from the effects due to other observed factors, X. Another assumption is that the effect of U on Y is (weakly) monotonically increasing or decreasing in U. Section 4.4 discusses the assumption of a monotone effect in U, whereas this section suggests the separability assumption is not tenable in our context.
Separability refers to the assumption that the MTE can be expressed in the general form . This allows for the unobservable variable u to influence both promotion status, D, as well as hotel stays, Y. But separability also requires that the same unobserved variable that makes it more or less likely to meet the spending hurdle, H, also has an effect on hotel stays that does not vary with the level of the hurdle (or any other variables in X). Rather, separability implies that the marginal effect of U = u on hotel stays is the same for all customers regardless of their predicted baseline stays, loyalty tier, or spending hurdle. Separability is violated in our context if leisure travelers, for example, are both (i) more likely to comply with the hurdle (perhaps rewards are worth more to the more price-sensitive leisure segment) and (ii) more sensitive to hurdles in their stays conditional on reaching the hurdle (perhaps due to the more limited nature of leisure travel).
Separability is nested within our MTRs but not required. To explore the impact of requiring separability on the estimated treatment effects, as well as the impact of assuming selection on the observables only (unconfoundedness), we consider the two alternative specifications for the MTR function introduced in Section 3.2.2. Recall that the first of these assumes that there is no effect of the unobserved variable U on stays ( in Equation (11); hence, ). This selection on observables model is roughly comparable to an inverse propensity weighted (IPW) regression with regression weights derived from the propensity regression. Causal estimates from this model are constants for all values of reward points; thus, the estimates are conditional on X average treatment effects (CATE) and not CLATEs as in the MTE approach. The second nested model assumes additive separability, meaning there are no interactions between observables and unobservables in the MTRs (i.e., the s, s, and s in Equation (13) are all equal to zero; hence, ).
Table 4 reports, for each matched moment , the coefficient estimates from the IV and OLS regressions and the differences between these moments and their MTE counterparts (computed at the upper and lower treatment effect bounds— and from Equation (10)) for each of the three MTE specifications: (i) the full model, (ii) the model with separability (“no interactions”), and (iii) the model with unconfoundedness (“no unobservables”). Figure 5 depicts the coefficient estimates and their standard errors from the three regression models, which allows a more direct comparison of the coefficient estimates.
Table 4. Moment Error Bounds for the Full and Nested MTE Models
Regression | Coefficient, s | Estimate17 | Full model | Full model | No interactions | No interactions | No unobservables | No unobservables |
---|---|---|---|---|---|---|---|---|
IV | Intercept*, | −0.16 | ||||||
IV | Baseline 0 Dummy*, | 0.36 | ||||||
IV | Baseline*, | 0.61 | ||||||
IV | Hurdle, | −1.02 | 0. | 0. | −0.17 | −0.15 | 0. | 0. |
IV | Promotion Status, | −1.03 | 0. | 0. | 0. | 0.12 | 0.90 | 0.90 |
IV | Promotion Status×Baseline, | −0.69 | 0. | 0. | 0.07 | 0.05 | 0.46 | 0.46 |
IV | Promotion Status×Hurdle, | 1.35 | 0. | 0. | 0. | 0. | −0.48 | −0.48 |
IV | Promotion Status×Tier 2, | 0.70 | 0. | 0. | 0. | 0. | 0. | 0. |
IV | Promotion Status×Tier 3, | 0.92 | 0. | 0. | 0. | 0. | 0. | 0. |
IV | Promotion Status×Tier 4, | 1.25 | 0. | 0. | 0. | 0. | 0. | 0. |
OLS | Intercept, | −0.43 | 0. | 0. | −0.02 | −0.02 | 0.01 | 0.01 |
OLS | Baseline 0 Dummy, | 0.33 | 0. | 0. | 0. | 0. | 0.001 | 0. |
OLS | Baseline, | 0.47 | −0.03 | −0.03 | −0.05 | −0.05 | −0.02 | −0.03 |
OLS | Hurdle*, | −0.88 | ||||||
OLS | Promotion Status, | 0.72 | 0. | 0. | 0. | 0. | 0. | 0. |
OLS | Promotion Status×Baseline, | −0.19 | 0. | 0. | −0.28 | −0.27 | −0.05 | −0.05 |
OLS | Promotion Status×Hurdle, | 0.83 | 0. | 0. | 0.34 | 0.33 | 0.05 | 0.05 |
OLS | Promotion Status×Tier 2, | 0.17 | 0. | 0. | 0. | 0. | 0. | 0. |
OLS | Promotion Status×Tier 3, | 0.16 | 0. | 0. | −0.07 | 0. | 0. | 0. |
OLS | Promotion Status×Tier 4, | 0.13 | 0. | 0. | 0. | 0. | 0. | 0. |
OLS (control) | Intercept, | −0.03 | 0. | 0. | 0. | 0. | 0. | −0.002 |
OLS (control) | Baseline 0 Dummy, | 0.69 | 0. | 0. | 0. | 0. | 0. | 0. |
OLS (control) | Baseline, | 1.01 | 0. | 0. | 0. | 0. | 0. | 0. |
OLS (control) | Hurdle*, | −1.21 | ||||||
Maximum sum of absolute errors (κ) | | | 0.03 | | 1.00 | | 1.98 | |
Notes. Coefficients from the three regressions used in the MTE estimation are specified in Section 3.2.5, and their coefficient estimates (moments) are given herein. Per Equation (10), the two errors reported for each model are the differences between moments obtained from OLS or IV regressions and their MTR counterparts at the lower and upper bounds of the estimated treatment effect, respectively, and κ is the maximum sum of absolute errors (due to sampling error) allowed when determining the upper and lower bounds of CLATEs. Because of collinearity, moments in italics and marked with an asterisk are not used in the MTE procedure. When there are MTRs that can fully reproduce the data moments, the error is zero (this does not mean that the CLATE is point-identified, however).

Notes. Estimates from different regression models are grouped along the x-axis by their covariates. Along the x-axis, b0 indicates the dummy for b = 0, and t2, t3, and t4 indicate coefficients for tiers 2–4.
Several insights emerge from Table 4 and Figure 5. First, Table 4 shows that neither the no-interactions model nor the no-unobservables model can recover most of the moments without considerable error. This implies that the full model, which accounts for unobservables and their dependence on the observed data, reproduces the moments of the observed data substantially better than the restricted models. The improvement in fit is also indicated by the sum of absolute errors across moment conditions, κ, which is lowest in the full model in spite of having more parameters and, thus, more absolute errors to sum. Second, some of the discrepancies between regression estimates and their MTE counterparts in the restricted models are quite large. For example, the discrepancy in the no-unobservables model for the IV promotion status moment (corresponding with tier T = 1) is 0.90, or 87% of the parameter value. Third, the discrepancies are particularly large for the IV moments under the no-unobservables model because the assumption that unobservables do not affect potential outcomes makes it difficult to reconcile the MTRs with an IV regression that does not make this assumption. The unconfoundedness assumption of the no-unobservables model is clearly problematic. For example, the IV estimate for tier 1 promotion status (−1.03) has the reverse sign of the OLS estimate (0.72). This suggests that promotion status is endogenous with stays. Because the no-unobservables model assumes this endogeneity away, it cannot reproduce both the IV and OLS parameter estimates and, thus, produces estimates that fare well for neither. In sum, the full model improves fit with the moments of the data substantially compared with the nested models that ignore or incompletely account for endogeneity and (lack of) separability.18
Next, Figure 6 compares estimated causal effects for the full, no interaction, and no unobservables MTR specifications. Additional insights emerge. First, the estimated causal effects are quite different across the models. We interpret this discrepancy as evidence that the nested models, which fail to fit the empirical moments, also produce biased estimates of the CLATEs. For example, the CLATE estimates for T = 2, B = 2, and H = 3 vary widely across models, differing by nearly 1.5 stays. Second, the simplest model (no unobservables) has the narrowest estimated bounds on the CLATEs. With fewer terms in the MTR functions, the parameter space that comes closest to satisfying the moment constraint shrinks (see Equation (10)). This suggests a trade-off. On the one hand, a more complex MTR can more easily meet the moment constraints, but on the other hand a poorly specified MTR or too many parameters can lead to large bounds, making the MTE model less useful for policy analysis. Specifically, as the bounds grow, so too does the range of potential optimal promotions, making it less clear which, among the set of optimal policies, to choose.

Notes. Causal effects are in units of stays and consider a change from D = 0 to D = 1 with R equal to the smallest reward used in each baseline–hurdle–tier experimental cell. Vertical bars show upper and lower bounds estimates. In cases in which the bounds are very narrow (CLATEs) or point estimates (CATEs), points are plotted at the midpoint of the bounds or value of the point estimate.
4.2.2. Doubly Robust Inverse Propensity Weighting (DR IPW) Comparison.
To afford another comparison of the MTE approach to more restricted models, we next describe a null model of CATEs that assumes unconfoundedness (as is common in the experimental literature in marketing). This means that the unobserved heterogeneity affecting compliance does not have an impact on stays. We use a doubly robust regression estimator (Bang and Robins 2005). First, observations are weighted by the inverse of their propensities (IPW), thus . Second, covariates that appear in the propensity function but not in the MTRs are also included in the regression equation to further adjust outcomes for these determinants of compliance (we do not include reward points in the regression). The regression equation is, thus,
Figure 6 reports the estimated CATEs, labeled “DR IPW.” The CATEs estimated with this approach are nearly identical to the no-unobservables MTR model, which also assumes unconfoundedness. The slight differences between the two estimators are due to the use of IV moments and domain restrictions for inference on m1 and m0 in the MTE approach, and the inclusion of additional adjustment covariates in the DR IPW approach. A key conclusion is that the simple DR IPW approach suffers the same limitations as the no unobservables MTR model.
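The weighting step of such an estimator can be sketched as follows. This is only the inverse propensity weighting part, assuming the conventional weights of 1/p for customers who reach the hurdle and 1/(1 − p) for those who do not; the regression adjustment that makes the estimator doubly robust is omitted. The simulated data, in which compliance depends only on an observed binary covariate, and all names are hypothetical.

```python
import random


def ipw_effect(d, y, p):
    """Weighted difference in mean outcomes using inverse propensity weights."""
    w1 = [di / pi for di, pi in zip(d, p)]                # weights for d = 1
    w0 = [(1 - di) / (1 - pi) for di, pi in zip(d, p)]    # weights for d = 0
    mean1 = sum(w * yi for w, yi in zip(w1, y)) / sum(w1)
    mean0 = sum(w * yi for w, yi in zip(w0, y)) / sum(w0)
    return mean1 - mean0


def naive_effect(d, y):
    """Unweighted difference in means, which ignores selection on x."""
    treated = [yi for di, yi in zip(d, y) if di == 1]
    untreated = [yi for di, yi in zip(d, y) if di == 0]
    return sum(treated) / len(treated) - sum(untreated) / len(untreated)


def simulate_confounded(n=20000, seed=1):
    """Compliance and outcomes both depend on an observed covariate x only."""
    rng = random.Random(seed)
    d, y, p = [], [], []
    for _ in range(n):
        x = rng.randint(0, 1)
        pi = 0.8 if x == 1 else 0.2                       # known propensity given x
        di = 1 if rng.random() < pi else 0
        yi = 1.0 * di + 2.0 * x + rng.gauss(0.0, 0.5)     # true effect of d is 1.0
        d.append(di)
        y.append(yi)
        p.append(pi)
    return d, y, p


d_obs, y_obs, p_obs = simulate_confounded()
print(round(naive_effect(d_obs, y_obs), 2), round(ipw_effect(d_obs, y_obs, p_obs), 2))
```

The weighting recovers the effect here only because selection runs entirely through the observed covariate; when unobservables also drive compliance, as the MTE results suggest, the same weights no longer remove the bias.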
4.3. Validation Against Nonparametric Estimates
Figure 7 compares the MTE estimates of CLATEs from the full model to nonparametric estimates based on the Wald estimator. As the nonparametric estimates can be computed only for the observed cells, we plot treatment effects for each baseline–hurdle–reward combination in the experiment with hurdles less than or equal to five (corresponding with 84.8% of observations). Ten of the nonparametric estimates are not shown because, perhaps as a consequence of a limited range of integer outcomes, individual cells exhibit a high degree of finite sample bias and, thus, produce Wald estimates that are negative or undefined. The MTE estimator does not suffer from this limitation. Overall, the figure suggests that the chosen moments and functional forms for the MTRs generate estimated CLATEs that are consistent with the bulk of their nonparametric counterparts.

Notes. Points are horizontally jittered. MTE bounds estimates are plotted at the midpoint between the upper and lower bounds. Wald estimates are not shown if they are negative or violate the no-defiers assumption.
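For reference, a Wald estimate for a pair of randomized cells can be computed as the difference in mean outcomes divided by the difference in compliance rates. The sketch below uses the textbook form of the estimator with hypothetical function and argument names; it returns `None` when the compliance difference is zero or negative, which is where the estimate is undefined or inconsistent with the no-defiers assumption.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def wald_estimate(y_hi, d_hi, y_lo, d_lo):
    """Wald ratio for two cells that differ only in the randomized reward.

    y_* are stays and d_* are 0/1 indicators for reaching the hurdle in the
    higher-reward (hi) and lower-reward (lo) cells.
    """
    outcome_diff = mean(y_hi) - mean(y_lo)
    compliance_diff = mean(d_hi) - mean(d_lo)
    if compliance_diff <= 0:
        return None   # undefined, or compliance fell as the reward rose
    return outcome_diff / compliance_diff


# Toy cells: one of four customers is induced to comply and adds two stays.
print(wald_estimate([3, 1, 1, 1], [1, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0]))
```

In small cells with integer outcomes, the numerator and denominator are both noisy, which is one way the negative or undefined estimates mentioned above can arise.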
4.4. The Optimal Design and Targeting of Promotional Reward Structures
4.4.1. Optimizing Over Rewards.
Next, we consider a key goal of this research: extrapolating PRTEs outside the range of the original field experiment to infer the optimal promotion design. Figure 8 depicts the estimated effect of rewards on incremental stays, promotion costs, and net profit for the baseline–tier–hurdle combinations with the greatest number of observations in each of the four loyalty tiers19 and for the full and nested MTE models described in Section 4.2. In each plot, we depict the optimal reward level for each model as a vertical line. Because we estimate the profit function using bounds estimators, we select the optimal reward level using a minimax criterion (Handel et al. 2013).20

Notes. For each tier, the combination of B and H with the most observations is chosen for extrapolation and optimization, and is shown in its own panel. Within each panel, expected outcomes for promotions offering R bonus points (shown along the horizontal axis) are given. The subpanel labeled “Propensity” shows the proportion of customers expected to reach the hurdle, “CLATE” shows the expected increase in hotel stays among compliers, and “Profit” shows the expected incremental value from the promotion (margin from increased visits less the cost of bonus points). Profits on the vertical axes are scaled to preserve confidentiality. Upper and lower bounds are shown for CLATE and profit estimates. The optimal reward points under a minimax decision rule are indicated with arrows. Randomized reward point levels from the experiment are shown as inverted triangles.
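One common way to apply a minimax rule to interval estimates is to minimize maximum regret. The sketch below is a simplification and may differ from the exact criterion used in the paper: it treats each candidate's profit bounds as an independent interval, so the worst case for a candidate is that it attains its lower bound while the best alternative attains its upper bound. The function name and the example bounds are hypothetical.

```python
def minimax_regret_choice(bounds):
    """Pick the reward level with the smallest maximum regret.

    bounds maps a reward level to (lower, upper) profit bounds. Offering no
    promotion (reward 0, profit 0) is always added as an option.
    """
    options = dict(bounds)
    options.setdefault(0, (0.0, 0.0))

    def max_regret(r):
        lower_r = options[r][0]
        # Best case among the alternatives, evaluated at their upper bounds.
        best_alternative = max(
            (options[k][1] for k in options if k != r), default=lower_r
        )
        return max(0.0, best_alternative - lower_r)

    return min(options, key=max_regret)


example = {10000: (3.0, 6.0), 20000: (4.0, 4.5), 30000: (1.0, 7.0)}
print(minimax_regret_choice(example))
```

In this example the rule selects the 20,000-point option, whose bounds are narrow, over options with higher upper bounds but lower worst cases.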
We first consider the upper left panel in Figure 8, in which T = 1, B = 1, and H = 2. As the baseline and tier are low, these customers are infrequent visitors, and one might expect them to be relatively insensitive to rewards. Consistent with this, our results suggest that rewards have little effect on compliance or stays. As the incremental margin from the promotion depends on the share of compliers—customers whose promotion status is changed by the offered reward—these promotions generate almost no incremental revenue. Reward points, however, must be awarded to any customers who meet or exceed the hurdle regardless of whether they were motivated by the promised reward. As rewards are costly and have little effect on stays, their marginal value is negative, and the optimal reward level is R = 0, meaning it is optimal not to offer this group any promotion at all.
Turning next to the upper right panel, in which T = 2, B = 2, and H = 3, we find that higher rewards drive greater compliance. The effect of rewards on stays differs across the nested models. Recall that rewards are excludable from potential outcomes, so the only effect rewards have on stays is indirect by inducing customers with different values of U to select into compliance with the promotion. When we assume unconfoundedness (that is, that there are no unobservables in the MTR functions), the estimated value of the CLATE does not depend on the number of bonus points offered (i.e., the estimand reduces to a CATE). This can be seen in the flat line for the CLATE with no unobservables. The profits under no unobservables, therefore, are driven only by changes in compliance, with an optimal level of 40,000 points. Comparing the full model to the two restricted models, we observe that the expected profits in the restricted models are off by nearly 100%, and they are far too optimistic (because their estimated CLATEs are much too large). The model with no interactions predicts (scaled) profits of $8.53–$8.99 per customer in this segment, whereas the full model predicts $3.84–$4.20 per customer, an error of roughly 100%. Differences in the cost of the promotion can also be substantial as the optimal reward for this segment is 40,000 points according to the model with no unobservables but only 27,500 points in the full model, an error of roughly 50%. In other words, the flexible MTR specifications in the full model make a material difference to promotion design and profit, which cannot be captured by models that assume separability (e.g., the classic Heckman correction model) or unconfoundedness (e.g., IPW regression).
Next, we consider T = 3, B = 2, and H = 3 in the lower left panel. Again, we note that rewards affect compliance, but their effect on outcomes only matters when accounting for unobservables. Unlike the case of T = 2, B = 2, and H = 3, differences in the profit functions are small because estimates for the CLATEs are similar across specifications (and all three models share the same propensity regression model). Nevertheless, the optimal number of bonus points varies across models owing to (i) the large estimation bounds and the use of the minimax criterion; (ii) the high margin per visit, which amplifies small differences in the CLATE for visits into large differences in profit; and (iii) the relative flatness of the profit curve over a potentially wide range of bonus points. As a result, even though the optimal design differs across models, the predicted profits at the different optima are roughly the same. Even if the wrong model were to be used in this instance, profits would not be substantially affected (ranging between $5.10 and $5.74 per customer over the range of 17,500–30,000 points).
The last case is presented in the lower right panel with T = 4, B = 3, and H = 4, and represents results for members of the highest loyalty tier: those who stay at the hotel chain most often. As in the case of the upper right panel, in which T = 2, B = 2, and H = 3, the choice of model again has a substantial effect on estimates. However, in this case, the effect is reversed; the full model in the lower right panel predicts higher rather than lower profits, whereas the full model in the upper right panel (for tier T = 2) predicts lower rather than higher profits. This result suggests that a model in which unobservables can interact with observables is able to capture many patterns of behavior (that is, the model is not restricted to assume that the marginal consumer at higher rewards is always the same). The scaled profit in the model without unobservables when T = 4, B = 3, and H = 4 is about $3.11 per customer compared with the full model’s prediction of $4.46–$4.57, an error of about 30%. The optimal reward levels are similar, however, because marginal revenues are roughly proportional across models.
Finally, as Section 4.2 notes, the Heckman correction model assumes separability and a monotone effect of unobservables on outcomes, and the previous discussion offers evidence that the separability assumption is not supported by the data. Here, we turn to the monotone treatment effect assumption. This assumption is also violated because, in the full model, the CLATE for T = 2, B = 2, and H = 3 is increasing; the CLATE for T = 3, B = 2, and H = 3 is flat; and the CLATE for T = 4, B = 3, and H = 4 is decreasing. By contrast, the no-interactions model generates a monotonic effect, which can be seen in the decreasing CLATEs for this model in Figure 8.
These examples generate a couple of insights. First, the assumptions of separability and selection into compliance can be materially important when setting rewards and forecasting returns, even though they are often ignored in the literature on compliance promotions. Second, the MTE analysis can yield guidance regarding the design of subsequent experiments. For example, the optimal level of reward points greatly exceeds the maximum points that were randomized in the experiment in many cases. Thus, it seems prudent to generate experimental variation over higher levels of reward points in our example. Information on enhancing subsequent experimental designs in this fashion would not be available without extrapolation.
4.4.2. Optimizing Jointly over Rewards and Hurdles.
Although it is not possible to extrapolate over hurdles, it is possible to jointly optimize over the observed hurdle and extrapolated reward levels. Doing so involves comparing the extrapolation of CLATEs and PRTEs (over R) for all values of H that were randomized to customers with the same baseline and tier. The minimax criterion (or any decision rule) is then applied to the full set of estimated upper and lower bounds of extrapolated PRTEs, both within and across observed values of H, implying that the respective optimal values of R for each value of H may differ from the optimal value of R across all values of H.
The minimax criterion tends to favor certainty (narrower bounds) as this often leads to lower maximum regret (expected loss). IHG’s block randomization experimental design offered fewer customers promotions with H = B than with H = B + 1, so the CLATE bounds are correspondingly narrower for promotions with H = B + 1 relative to H = B. Thus, PRTEs with H = B + 1 tend to have lower maximum regret and are accordingly selected by the minimax rule. This can be seen in Table 5, which shows optimal hurdles and rewards for a subset of customers with T = 3. In all cases, the promotion with H = B + 1 was preferred over the promotion with H = B by the minimax criterion. A key insight from this analysis is that designs reflecting experimental cells with more uncertainty (wider bounds) are less likely to be chosen, and that firms seeking to optimize over nonexcludable variables (such as hurdle) can benefit from using experimental designs with more balanced sample sizes.
Table 5. Joint Optimization of Hurdle and Reward Across B for T = 3
Baseline (B) | Optimal hurdle | Optimal points, 1000s | Expected increase in stays | Expected profit |
---|---|---|---|---|
2 | 3 | 22.5^a | [2.90, 2.94] | [5.52, 5.65] |
3 | 4 | 40.0 | [3.64, 3.70] | [9.02, 9.37] |
4 | 5 | 32.5 | [4.45, 4.54] | [14.29, 14.78] |
5 | 6 | 40.0 | [5.17, 5.29] | [18.84, 19.57] |
6 | 7 | 47.5 | [6.20, 6.27] | [27.11, 27.59] |
7 | 8 | 47.5 | [6.88, 7.00] | [31.88, 32.75] |
Notes. Within each baseline in these selected cells, hurdle was manipulated to be either H = B or H = B + 1. Profits are in units of dollars, and both the stays and profit columns report the respective bounds for the CLATE and PRTE.
^a The optimal promotion for B = 2 has H = 3 and R = 22,500, whereas the optimum shown in the corresponding panel of Figure 8 lies at R = 30,000. The discrepancy is due to Figure 8 showing an optimum over all possible outcomes conditional on H = 3 and this result showing an optimum over all outcomes with either H = 2 or H = 3.
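A small worked example may help show why the reward that is optimal within one hurdle can differ from the jointly optimal reward. The profit bounds below are invented for illustration and are not IHG estimates. The regret formula is one plausible reading of the minimax rule, stated here as an assumption: the maximum regret of a promotion is the largest upper bound among rival promotions minus the promotion's own lower bound, floored at zero.

```latex
\begin{aligned}
&\text{Hypothetical profit bounds:}\quad
a=(H{=}3,\,R{=}22.5):[5.5,\,5.6],\quad
b=(H{=}3,\,R{=}30):[5.4,\,5.8],\quad
c=(H{=}2,\,R{=}30):[4.0,\,6.0]\\[4pt]
&\text{Within } H=3:\quad
\mathrm{regret}(a)=5.8-5.5=0.3,\quad
\mathrm{regret}(b)=5.6-5.4=0.2
\;\Rightarrow\; b \text{ is chosen}\\[4pt]
&\text{Across } H\in\{2,3\}:\quad
\mathrm{regret}(a)=6.0-5.5=0.5,\quad
\mathrm{regret}(b)=6.0-5.4=0.6,\quad
\mathrm{regret}(c)=5.8-4.0=1.8
\;\Rightarrow\; a \text{ is chosen}
\end{aligned}
```

In this sketch, the wide bounds of the H = 2 cell raise the maximum regret of every rival promotion, so the promotion with the highest lower bound becomes the joint choice even though it is not the choice within H = 3.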
5. Conclusion
In this paper, we outline an approach for designing and targeting compliance promotions and apply it to a hotel’s loyalty rewards promotion. Given the rapid growth of field experiments and methods for policy evaluation in marketing, the utility of extrapolating beyond experimental cells to improve promotional outcomes (for example, deciding whom to target with a promotion and what the optimal design parameters of that promotion should be) is growing rapidly. Often these optimal designs do not align exactly with the values used in the randomized field experiment. Yet there is little to date in the marketing literature that can guide these targeting and design decisions, particularly in experimental contexts in which compliance is not guaranteed.
The approach we outline offers several advantages over past approaches used in marketing. First, it does not assume away unobservable factors that can affect compliance and outcomes (that is, it does not assume compliance is unconfounded). Accounting for these unobservable factors complicates the task of extrapolation, yet assuming their effect away during extrapolation biases causal estimates and leads to suboptimal promotions or other policy outcomes. Second, when unobservables are acknowledged to be an important factor for a compliance promotion, we argue that causal extrapolation is only valid for promotion features that can function as valid instruments for promotion status, in other words, promotion features whose impact on the outcome is fully mediated by whether customers meet the terms of the promotion. Moreover, this requirement is true even if the design parameters are manipulated experimentally. This observation is relevant in marketing as prior research has extrapolated non-fully mediated manipulations, perhaps assuming randomization alone is sufficient. Third, for features whose effects on the outcome are assumed to be fully mediated through promotion status, we show how the MTE approach can be used to extrapolate causal effects in the context of compliance promotions. This approach is easy to use (as it is implemented in a few lines of R code) and does not require strong assumptions on the error structure or functional form of the outcome equation as with some prior approaches. In our context, the separability and monotone effect assumptions are not supported by the data. Rather, flexibility in the MTR functions is apparently necessary to fully rationalize observed patterns in the data. 
Fourth, we show that simple linear interpolation methods, wherein levels of outcomes are interpolated between observed experimental manipulations, and which are commonly used in practice, are only locally valid for very small changes in design parameters away from the values used in the experiment.
We apply this approach to a loyalty reward promotion experiment implemented by IHG and find that extrapolation is valid for rewards but not hurdle levels, and traditional approaches to extrapolation can overestimate or underestimate the profit of compliance promotions. Further, we demonstrate how to use this approach to optimize promotions for specific customer target segments. Findings suggest that more standard and restricted approaches can lead to large prediction errors in profits (on the order of 100%) and reward points that are nearly 50% too large.
Given the growth in machine learning approaches used to estimate heterogeneous treatment effects for purposes of targeting in the face of a large number of observable covariates, an obvious next step is to integrate MTE approaches with machine learning in order to enable promotional design and targeting in the face of unobservable customer heterogeneity.
Another interesting topic for future consideration pertains to the Heckman correction model widely applied within and beyond marketing (Heckman 1979, Lee 1982). The key difference between it and our considered switching regression is whether D = 0 implies censoring (see Endnote 12 and Section 4.2.1 for more detail). Yet both the Heckman correction and switching regression models presume separability and monotonicity in the effect of unobserved heterogeneity on the uncensored outcome. Not only are these assumptions not supported in our application, they are rather restrictive as a general principle. Why should a change in X bring about an identical change in the uncensored outcome for different marginal consumers (e.g., why presume that people with very different educations respond to employment incentives in exactly the same manner)? Our results yield substantially incorrect forecasts when invoking these restrictive assumptions, raising the possibility that many prior studies using the Heckman correction model for policy extrapolation might have misleading conclusions.
All in all, we hope this research is an initial step to enable marketers and researchers to extrapolate for policy evaluation as well as develop new approaches for the targeting and design of compliance promotions.
Acknowledgments
The authors thank seminar participants at the 2022 Frank M. Bass University of Texas at Dallas Frontiers of Research in Marketing Science Conference, Boston College, Boston University, Duke University, Johns Hopkins University, the 2021 INFORMS Society for Marketing Science Marketing Science Conference, KU Leuven, the 2022 Summer Institute in Competitive Strategy, the University of Chicago, and the University of Washington as well as Bryan Bollinger, Andrey Fradkin, Xiang Hui, Przemyslaw Jeziorski, Garrett Johnson, Byungyeon Kim, Aurélie Lemmens, Martina Pocchiari, Kosuke Uetake, and Levin Zhu for helpful comments and suggestions; Marketing Science Institute for their support; and Siddharth Prusty and Donggwan Kim for their research assistance. Authors are listed alphabetically. Competing interests: Carl F. Mela is an associate editor at Marketing Science. Song Yao is a member of the editorial board at Marketing Science. Jim Sprigg was a full-time salaried employee at InterContinental Hotel Group from the time the data were collected through manuscript submission. The authors agreed the research would not use or disclose personal or private company information but are not generally otherwise restricted.
Appendix A. Policy-Relevant Treatment Effect
The PRTE for a promotion offering R bonus points is derived from the more primitive CLATE of meeting the terms of the promotion by visiting H or more hotels (D = 1) on the outcome Y, among customers with baseline B and tier T who are compliers when offered R = r points. For these compliers, D = 1 when R = r, and D = 0 when R = 0 (conditional on X); we denote this definition as . Using the potential outcomes notation introduced in Section 3.1, we can write this causal effect as
Let denote the proportion of customers who are compliers when offered R = r points and denote the proportion of customers who are always-takers (those who would have D = 1 even without a promotion). Further, let N, mgn, and cpp be the number of targeted customers, the expected margin per stay, and expected cost per point as defined in the main text. We can use these terms to define the expected, incremental profit from this promotion:
Next, we note two identities. First, is equal to the ratio of the (conditional on X) intent-to-treat effect on Y from offering a reward of R = r points to the proportion of compliers when R=r: . Second, is equal to the expected proportion of customers in the offer group who meet the terms of the promotion (D = 1) when offered R = r points. Thus, we can rewrite the PRTE as
Given random assignment of R (conditional on X), a nonparametric estimator of this PRTE is available at values of R observed in the experiment.
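The displayed equations of this appendix can be sketched from the verbal definitions above. The symbols are assumptions introduced for this sketch and may differ from the paper's notation: Y_1 and Y_0 are potential outcomes with and without meeting the promotion's terms, D(r) is promotion status when offered r points, π_c(r) is the complier share, and π_a is the always-taker share.

```latex
\begin{aligned}
\mathrm{CLATE}(r) &= E\bigl[\,Y_1 - Y_0 \mid X,\; D(r)=1,\; D(0)=0\,\bigr] \\[4pt]
\mathrm{PRTE}(r) &= N\Bigl[\, mgn \cdot \pi_c(r)\cdot \mathrm{CLATE}(r) \;-\; cpp \cdot r \cdot \bigl(\pi_c(r)+\pi_a\bigr) \Bigr] \\[4pt]
\mathrm{CLATE}(r) &= \frac{E[\,Y \mid R=r, X\,] - E[\,Y \mid R=0, X\,]}{\pi_c(r)},
\qquad \pi_c(r)+\pi_a = E[\,D \mid R=r, X\,] \\[4pt]
\mathrm{PRTE}(r) &= N\Bigl[\, mgn \cdot \bigl(E[\,Y \mid R=r, X\,] - E[\,Y \mid R=0, X\,]\bigr) \;-\; cpp \cdot r \cdot E[\,D \mid R=r, X\,] \Bigr]
\end{aligned}
```

Under this reading, incremental margin comes only from compliers, whereas the point cost is paid to everyone who meets the terms, including always-takers. Every term in the last line is a conditional mean that can be computed directly in an experimental cell with R = r.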
Appendix B. Extrapolation Validity Example
To understand why extrapolation of causal effects over the spending hurdle is not valid, consider that, as the value of the hurdle changes, two things happen: (i) the mix of customers who comply with the promotion changes, and (ii) the average number of hotel visits among complying and noncomplying customers also changes. We can illustrate the problems this generates for extrapolation by considering a hypothetical experiment in which customers in the offer condition are randomized into one of two promotions, one with a hurdle of H = 4 and one with H = 2, with both promotions offering R = 10,000 bonus points as a reward. Table B.1 summarizes what we might observe from this experiment. In the cell with H = 4, 10% of customers reach the hurdle, and they visit 4.5 hotels on average. In the cell with H = 2, 20% of customers reach the hurdle, and they visit an average of three hotels. We want to extrapolate the results of the experiment to predict average stays among those reaching a hurdle of H = 3, which was not part of the original experiment.
Table B.1. Illustration of Why Extrapolation over Nonexcludable Variables Is Invalid
Hurdle | Proportion with D = 1 | Average stays if D = 1 (data) | Average stays if D = 1 (group A) | Average stays if D = 1 (group B) | Average stays if D = 1 (groups A and B combined) |
---|---|---|---|---|---|
4 | 0.10 | 4.5 | 4.5 | – | – |
2 | 0.20 | 3.0 | | | 3.0 |
Note. Bonus points are set at R = 10,000 and do not vary over promotions.
From Table B.1, we can infer a few things about this hypothetical population of customers. First, if we offer all customers the same promotion with a spending hurdle of H = 4, we expect 10% of them to reach the hurdle. We can call this subset of customers reaching the hurdle group A and infer from the experiment that their average hotel stays would be 4.5. Second, if we offer all customers a promotion with a hurdle of H = 2, we expect all of the customers in group A to reach this lower hurdle because the reward is unchanged. Moreover, based on the results of the experiment, we also expect an additional 10% of customers to reach this lower hurdle. We can call this new subset of customers group B. Third, we know from the experiment that, when we offer a promotion with a hurdle of H = 2, average stays in groups A and B combined should be three. And, fourth, we know that average stays for customers in group A should be lower when they are offered the H = 2 versus the H = 4 hurdle because (i) groups A and B are the same size, thus, their combined average of three stays is the simple average of their respective group averages, and (ii) group B’s average stays cannot be less than the minimum spending hurdle of H = 2; hence, group A’s average cannot be greater than four.
What if one seeks to infer the average stays among those reaching a hurdle of H = 3? For one, we need to predict the proportion of customers in group B who will not reach the H = 3 hurdle. One can generate such a prediction from the estimated propensity function, p(x,z), but unfortunately, one can do no more. Specifically, one cannot infer (i) how average stays in group A change when offered a hurdle lower than four or (ii) how average stays in group B change when offered a hurdle greater than two. We can conceive of a model to predict these missing values, but the experiment does not generate any information that can be used to estimate that model. One needs to impose the extrapolation model by fiat.
By contrast, extrapolation on R is valid as long as we are willing to assume that changes in R affect the mix of customers who comply with the promotion, but otherwise have no effect on stays. For example, imagine we included a third promotion in the experiment from the previous example and this new cell had a promotion with H = 4 and R = 15,000 (see Table B.2). As in the previous example, when we offer a reward of 15,000 bonus points, we expect all of the customers in group A to reach the hurdle and to be joined by a new subset of customers, which we call group C. In total, 15% of customers in this cell reach the hurdle, visiting an average of 4.4 hotels. In the previous example, we could not infer average stays for group A when offered a lower hurdle. Here, we need to infer group A’s average stays when offered 15,000 bonus points, and thankfully, we can infer this: it is 4.5, the same as when they are offered 10,000 bonus points. Furthermore, because we know the sizes of groups A and C, we can do the same for group C; they visit 4.2 hotels on average ((0.15 × 4.4 − 0.10 × 4.5)/0.05 = 4.2). As before, we can conceive of a model for how average stays are related to propensities to comply with the promotion. In this case, however, the experiment does contain enough information to identify the parameters of that model.
Table B.2. Illustration of Why Extrapolation over Excludable Variables Is Valid
Reward | Proportion with D = 1 | Average stays if D = 1 (data) | Average stays if D = 1 (group A) | Average stays if D = 1 (group C) | Average stays if D = 1 (groups A and C combined) |
---|---|---|---|---|---|
10,000 | 0.10 | 4.5 | 4.5 | – | – |
15,000 | 0.15 | 4.4 | 4.5 | 4.2 | 4.4 |
Note. Hurdles are set at H = 4 and do not vary over promotions.
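The arithmetic behind the two hypothetical experiments can be checked with a short sketch. The function and variable names are illustrative and not from the paper. The reward case backs out the average stays of the customers who newly reach the hurdle, assuming the original compliers keep their average. The hurdle case can only produce a bound.

```python
def implied_new_group_average(share_all, avg_all, share_old, avg_old):
    """Back out the average outcome of customers who newly reach the hurdle.

    Assumes the previously complying customers keep their old average,
    which is the exclusion restriction discussed in the text.
    """
    share_new = share_all - share_old
    return (share_all * avg_all - share_old * avg_old) / share_new


# Reward example: raising R from 10,000 to 15,000 while holding H = 4.
new_complier_avg = implied_new_group_average(0.15, 4.4, 0.10, 4.5)
print(round(new_complier_avg, 2))

# Hurdle example: lowering H from 4 to 2 while holding R = 10,000.
# Exclusion fails here, so only a bound is available: the new group must
# average at least H = 2 stays, which caps the original group's average.
old_share, new_share, combined_avg, new_hurdle = 0.10, 0.10, 3.0, 2
old_group_upper_bound = (
    (old_share + new_share) * combined_avg - new_share * new_hurdle
) / old_share
print(round(old_group_upper_bound, 2))
```

The first calculation gives about 4.2 stays for the new compliers, and the second gives an upper bound of about 4 stays for the original compliers under the lower hurdle, in line with the discussion above.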
Appendix C. Extrapolation
C.1. Logit Propensity and General MTR Function
Conditional on X (and suppressing X in the notation), the ITT effect of a promotion with R = r points (versus R = 0 points—i.e., no promotion) is given by
Change the variable of integration from u to ϱ, where and :
Specify the propensity function as . The derivative of the propensity function with respect to ϱ is
Thus, the expression becomes
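The steps of this subsection can be sketched as follows. The notation is assumed for illustration and may not match the paper's: U is uniform on [0, 1], D = 1 when U ≤ p(R), m_d(u) = E[Y_d | U = u] are the MTR functions, and the reward enters the logit index linearly with hypothetical coefficients γ_0 and γ_1 (the paper applies a transformation to reward points that is omitted here).

```latex
\begin{aligned}
\mathrm{ITT}(r) &= E[\,Y \mid R=r\,] - E[\,Y \mid R=0\,]
 = \int_{p(0)}^{p(r)} \bigl[m_1(u) - m_0(u)\bigr]\,du \\[4pt]
&= \int_{0}^{r} \bigl[m_1(p(\varrho)) - m_0(p(\varrho))\bigr]\,p'(\varrho)\,d\varrho,
 \qquad u = p(\varrho),\quad du = p'(\varrho)\,d\varrho \\[4pt]
p(\varrho) &= \frac{\exp(\gamma_0 + \gamma_1 \varrho)}{1 + \exp(\gamma_0 + \gamma_1 \varrho)},
 \qquad p'(\varrho) = \gamma_1\,p(\varrho)\,\bigl[1 - p(\varrho)\bigr] \\[4pt]
\mathrm{ITT}(r) &= \gamma_1 \int_{0}^{r} \bigl[m_1(p(\varrho)) - m_0(p(\varrho))\bigr]\,p(\varrho)\,\bigl[1 - p(\varrho)\bigr]\,d\varrho
\end{aligned}
```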
C.2. General Propensity and Polynomial MTR Function
Let the MTRs be given by the polynomial function Then (again, suppressing X in the notation), the ITT can be expressed as
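With polynomial MTRs, the integral over the propensity has a closed form. The sketch below checks that form numerically. It assumes m_d(u) = Σ_k θ_dk u^k, so that ITT(r) = Σ_k (θ_1k − θ_0k) [p(r)^(k+1) − p(0)^(k+1)] / (k + 1). It then compares this with a change-of-variable integral under a hypothetical logit propensity. All coefficients and names are invented for illustration.

```python
import math

G0, G1 = -2.0, 0.05  # hypothetical logit coefficients (reward in 1000s of points)


def logit_propensity(rho):
    """Hypothetical propensity p(rho) = 1 / (1 + exp(-(G0 + G1 * rho)))."""
    return 1.0 / (1.0 + math.exp(-(G0 + G1 * rho)))


def poly(coefs, u):
    """Evaluate sum_k coefs[k] * u**k."""
    return sum(c * u**k for k, c in enumerate(coefs))


def itt_closed_form(theta1, theta0, p_lo, p_hi):
    """Integrate the polynomial MTE between two propensities analytically.

    theta1 and theta0 must have the same length.
    """
    return sum(
        (t1 - t0) * (p_hi ** (k + 1) - p_lo ** (k + 1)) / (k + 1)
        for k, (t1, t0) in enumerate(zip(theta1, theta0))
    )


def itt_change_of_variable(theta1, theta0, r, steps=20000):
    """Midpoint-rule integral over rho in [0, r] of MTE(p(rho)) * p'(rho)."""
    h = r / steps
    total = 0.0
    for i in range(steps):
        p = logit_propensity((i + 0.5) * h)
        mte = poly(theta1, p) - poly(theta0, p)
        total += mte * G1 * p * (1.0 - p) * h  # p'(rho) = G1 * p * (1 - p)
    return total


theta1 = [2.0, -1.0, 0.5]  # hypothetical MTR coefficients for D = 1
theta0 = [1.0, 0.5, 0.0]   # hypothetical MTR coefficients for D = 0
r = 30.0
closed = itt_closed_form(theta1, theta0, logit_propensity(0.0), logit_propensity(r))
numeric = itt_change_of_variable(theta1, theta0, r)
print(closed, numeric)
```

The two numbers should agree up to integration error, which illustrates that the polynomial form needs only the propensities at the two reward levels, whatever functional form produced them.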
1 Hereafter, we use “extrapolation” to refer to both interpolation and extrapolation, and “interpolation” when referring exclusively to interpolation.
2 Here and throughout this paper, we make the standard assumption that there are no defiers: customers who satisfy the promotional terms if they are not offered the promotion but do not satisfy the terms if they are offered the promotion.
3 The term comply here refers specifically to the pattern of behavior whereby (i) a customer is offered a promotion and (ii) the customer’s behavior meets the terms of the promotion (D = 1) but (iii) the customer would not have met the promotion’s terms (D = 0) had the customer not been offered the promotion. When we refer to the causal effect of compliance on an outcome, we mean the difference between the outcomes that arise for one of these complying customers at D = 1 if offered the promotion versus their outcome at D = 0 if not offered the promotion.
4 See https://www.ihgplc.com/en/investors/2019-annual-report, accessed May 9, 2021.
5 Our PRTE example considers only changes in reward points but not changes in hurdles. As we explain in Section 3.2, extrapolations of the PRTE for changes in hurdles are not identified in our data context. More generally, identification of extrapolated PRTEs requires that the policy-relevant treatment variables are conditionally excludable from outcomes. See Section 3.2.1.
6 Actual profit levels are scaled to preserve confidentiality.
7 In typical marketing applications, higher values of an unobservable variable ϵ correspond with a higher choice likelihood. Thus, it might help intuition to think of .
8 Depending on the empirical setting, Y and D might be observed simultaneously or sequentially, and the model accommodates either case.
9 The expectation in (6) is taken with respect to X and Z (i.e., observational units in the experiment), but one can construct target parameters that are conditional on X = x by modifying the expectation operator. An example of the latter is the conditional on X = x CLATE, and an example of the former is the coefficient for D in the OLS regression .
10 Point identification of causal effects, such as CLATEs and PRTEs, typically requires policies to be observed in the cells of the original experimental data (i.e., without the need to extrapolate), or a strong assumption about the distribution of unobservables (e.g., rank invariance). In practice, point-like estimators can arise when estimated upper and lower bounds are extremely close. We still consider these to be bounds estimators, owing to the lack of an affirmative proof of point identification.
11 For any finite sample, the value of κ is bounded below by the set of MTRs that minimize the left-hand side of Equation (10), (for the upper bound of the target parameter). κ typically is greater than zero due to sampling error. Allowing κ to be slightly greater than this infimum expands the set of MTRs that are deemed to be consistent with the data, in turn, allowing the estimated upper and lower bounds for the target parameter to be wider (i.e., more conservative). For this purpose, we allow κ to be 1.0001 times the value of in our empirical application.
12 Heckman correction and switching models (Heckman 1979, Lee 1982) are nested within the more general MTE framework (Heckman et al. 2003). The more general MTE approach does not require additive separability in outcomes (that is, the marginal effect of the unobservables on outcomes can depend on the observables) and does not limit the effect of unobserved factors on outcomes to be monotonically increasing or decreasing.
13 This transformation encodes our assumption that (i) there is a nonlinear response to more reward points on promotion status—if the reward is too small, it is not motivating, and there are diminishing marginal returns to larger rewards—and (ii) IHG designed the experiment with an understanding (perhaps implicit) of this response function. Accordingly, we chose the scaling factor of to rationalize the observed reward levels used in the experiment.
14 Logistic regression is not a requirement for the MTE procedure; any consistent estimates of the propensity function, p(x,z), can be used.
15 According to IHG’s management, those who register for the loyalty promotion typically redeem almost all of the points earned, and margins do not vary much by baseline and tier.
16 The propensity function parameter estimates are reported in the online appendix.
17 Standard errors are shown in Figure 5. Although moments are estimated with statistical error, Mogstad et al. (2018) show that the MTR estimator is consistent.
18 To better understand the role of the moments from the IV specifications versus those from OLS, we estimate versions of the models matching only the OLS moments from Equation (15). This leads to much wider bounds on the estimated CLATEs owing to more limited information to fit the model. Matching only OLS moments does not bias the estimates because the OLS moments are still valid functions of the underlying MTE.
19 We focus on four cases from our example application—one from each loyalty tier—because customers in each tier differ in their sensitivity to reward points, and these differences illustrate how the dependence of U on X affects optimal rewards.
20 For each model, customer segment, and hurdle, we first calculate the maximum lower bound on expected profit, over the set of rewards under consideration. We then consider the subset of reward levels whose upper bounds on expected profit exceed . Next, for each reward level , we calculate the maximum regret, , from earning profit in the range when the true optimal profit level is contained in the set . To calculate maximum regret, we use an absolute loss function over expected and realized profits: . Finally, we find the optimal reward as the level that minimizes maximum regret: . If is not unique, we set to be the smallest value from the set of values that minimize maximum regret.
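The selection rule in Endnote 20 can be sketched in a few lines. This is one reading of the rule and not the authors' implementation. It assumes that the maximum regret of a reward is the largest upper bound among rival candidate rewards minus the reward's own lower bound, floored at zero. The function name and the example bounds are hypothetical.

```python
import math


def minimax_regret_reward(bounds):
    """Pick a reward level by minimax regret.

    bounds maps each reward level to a (lower, upper) bound on expected profit.
    Returns the chosen reward and the maximum regret of each candidate reward.
    """
    # Step 1: the largest lower bound is the best profit that can be guaranteed.
    best_lower = max(lower for lower, _ in bounds.values())

    # Step 2: keep rewards whose upper bound reaches that guarantee; the weak
    # inequality keeps a point-identified best reward in the candidate set.
    candidates = {r: b for r, b in bounds.items() if b[1] >= best_lower}

    # Step 3: the worst case for reward r is earning its own lower bound while
    # the best rival earns its upper bound (absolute loss, floored at zero).
    regrets = {}
    for r, (lower, _) in candidates.items():
        rival_uppers = [up for q, (_, up) in candidates.items() if q != r]
        regrets[r] = max([0.0] + [up - lower for up in rival_uppers])

    # Step 4: minimize maximum regret; break ties with the smallest reward.
    smallest = min(regrets.values())
    chosen = min(r for r, v in regrets.items() if math.isclose(v, smallest))
    return chosen, regrets


# Hypothetical profit bounds by reward level (1000s of points).
example_bounds = {
    20.0: (5.00, 5.40),
    30.0: (5.50, 5.80),
    40.0: (5.52, 5.65),
    50.0: (3.00, 4.90),
}
chosen, regrets = minimax_regret_reward(example_bounds)
print(chosen, regrets)
```

In this example, the rewards of 20 and 50 are screened out because their upper bounds fall below the best guaranteed profit of 5.52. The reward of 30 is then chosen because its maximum regret is smaller than that of the reward of 40.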
References
- (2018) Retention futility: Targeting high-risk customers might be ineffective. J. Marketing Res. 55(1):80–98.
- (2005) Doubly robust estimation in missing data and causal inference models. Biometrics 61(4):962–973.
- (2017) Beyond LATE with a discrete instrument. J. Political Econom. 125(4):985–1039.
- (2002) Optimal pricing of new subscription services: Analysis of a market experiment. Marketing Sci. 21(2):119–138.
- (2022) Personalized pricing and customer welfare. Preprint, submitted June 19, https://ssrn.com/abstract=3035110.
- (2017) Self-signaling and prosocial behavior: A cause marketing experiment. Marketing Sci. 36(2):161–186.
- (2014) The effect of disability insurance receipt on labor supply. Amer. Econom. J. Econom. Policy 6(2):291–337.
- (2019) Mobile targeting using customer trajectory patterns. Management Sci. 65(11):5027–5049.
- (2013) Robust firm pricing with panel data. J. Econometrics 174(2):165–185.
- (1979) Sample selection bias as a specification error. Econometrica 47(1):153–161.
- (1999) Local instrumental variables and latent variable models for identifying and bounding treatment effects. Proc. Natl. Acad. Sci. USA 96(8):4730–4734.
- (2001) Policy-relevant treatment effects. Amer. Econom. Rev. 91(2):107–111.
- (2005) Structural equations, treatment effects, and econometric policy evaluation. Econometrica 73(3):669–738.
- (2007) Econometric evaluation of social programs, Part I: Causal models, structural models and econometric policy evaluation. Heckman JJ, Leamer EE, eds. Handbook of Econometrics, vol. 6 (Elsevier, Amsterdam), 4779–4874.
- (2003) Simple estimators for treatment parameters in a latent-variable framework. Rev. Econom. Statist. 85(3):748–755.
- (2018) Heterogeneous treatment effects and optimal targeting policy evaluation. Preprint, submitted February 6, https://dx.doi.org/10.2139/ssrn.3111957.
- (2020) Group average treatment effects for observational studies. Preprint, submitted March 27, https://arxiv.org/abs/1911.02688v5.
- (2018) Customer Relationship Management: Concept, Strategy, and Tools, Chapter 10, 3rd ed. (Springer, Berlin), 179–205.
- (1982) Some approaches to the correction of selectivity bias. Rev. Econom. Stud. 49(3):355–372.
- (2018) Identification and extrapolation of causal effects with instrumental variables. Annual Rev. Econom. 10(1):577–613.
- (2018) Using instrumental variables for inference about policy relevant treatment parameters. Econometrica 86(5):1589–1619.
- (2010) Dynamic allocation of pharmaceutical detailing and sampling for long-term profitability. Marketing Sci. 29(5):909–924.
- (2009) Heterogeneous learning and the targeting of marketing communication for new products. Marketing Sci. 28(3):424–441.
- (2020) Introduction to the special issue on marketing science and field experiments. Marketing Sci. 39(6):1033–1038.
- (2017) Coupon clipping by impoverished consumers: Linking demographics, basket size, and coupon redemption rates. Internat. J. Res. Marketing 34(2):553–571.
- (2009) Causality, 2nd ed. (Cambridge University Press, Cambridge, UK).
- (2018) Some methods for heterogeneous treatment effect estimation in high dimensions. Statist. Medicine 37(11):1767–1787.
- (1974) Estimating causal effects of treatments in randomized and nonrandomized studies. J. Ed. Psych. 66(5):688–701.
- Shea J, Torgovitsky A (2021) ivmte: An R package for implementing marginal treatment effect methods. Preprint, submitted September 7, https://dx.doi.org/10.2139/ssrn.3516114.
- (2020a) Efficiently evaluating targeting policies: Improving on champion vs. challenger experiments. Management Sci. 66(8):3412–3424.
- (2020b) Targeting prospective customers: Robustness of machine-learning methods to typical data challenges. Management Sci. 66(6):2495–2522.
- (2020) Optimizing price menus for duration discounts: A subscription selectivity field experiment. Marketing Sci. 39(6):1181–1198.
- (2021) ivmte: Instrumental variables: Extrapolation by marginal treatment effects, R package version 1.4.0. Accessed April 12, 2022, https://CRAN.R-project.org/package=ivmte.
- (2012) Measuring and managing returns from retailer-customized coupon campaigns. J. Marketing 76(1):76–94.
- (2016) Enduring effects of goal achievement and failure within customer loyalty programs: A large-scale field experiment. Marketing Sci. 35(4):565–575.