Participation vs. Effectiveness in Sponsored Tweet Campaigns: A Quality-Quantity Conundrum

Published Online: https://doi.org/10.1287/mnsc.2019.01897

Abstract

We investigate the participation and effectiveness of paid endorsers in sponsored tweet campaigns. We manipulate the financial pay rate offered to endorsers on a Chinese paid endorsement platform, where payouts are contingent on participation rather than engagement outcomes. Hence, our design can distinguish between variation in participation and variation in outcomes, even if people select to endorse only specific tweets. Also, the lack of compensation for effort allows one to attribute differences in outcomes to precontractual selection rather than postcontractual behavior. The main finding is that endorsers exhibited adverse selection. Several observed and unobserved endorser characteristics associated with a higher propensity to participate had a negative association with being an effective endorser given participation. This adverse selection results in a conundrum when trying to recruit a sizable number of high-quality endorsers. Only 9.5%–11.8% of the endorsers were above the median in both the propensity to participate and the propensity to be effective compared to a benchmark of 25% in the absence of any association. A simulation analysis of various targeting approaches that leverages our data of actual endorsements and outcomes shows that targeting candidate endorsers by scoring and ranking them using models taking into account adverse selection on observables improves campaign outcomes by 13%–55% compared to models ignoring adverse selection.

This paper was accepted by Eric Anderson, marketing.

Funding: This project was made possible by financial support extended to J. Peng through a Penn Lauder Center for International Business Education & Research PhD Grant and a Wharton Baker Retailing Center PhD Research Grant.

Supplemental Material: The online appendix and data files are available at https://doi.org/10.1287/mnsc.2019.01897.

1. Introduction

Marketers are quite keen on leveraging word of mouth (WOM) and other viral processes. Rather than simply relying on organically occurring WOM, many seek to boost their odds of success through structured WOM programs. Practices like targeting opinion leaders or designing messages with above-average virality have gained much attention (e.g., Berger 2013). Many companies, however, go beyond trying to steer organic WOM and turn to incentivized WOM (i.e., actively compensating customers to engage in viral activities).

Some incentivized WOM programs are run in-house. Examples are speaker programs run by pharmaceutical companies and customer referral programs run by a wide gamut of companies (e.g., Berman 2016, Van den Bulte et al. 2018). However, oftentimes marketers turn to viral-for-hire platforms to recruit customers willing to engage in WOM for compensation. BzzAgent is such a platform for face-to-face incentivized WOM (Godes and Mayzlin 2009, Berger and Schwartz 2011). Quite a few platforms exist for online incentivized WOM, allowing marketers to recruit and incentivize people to post content online. Goodreads and NetGalley, for instance, provide free copies of books with the expectation of getting an online review. ReviewMe and PayPerPost facilitate sponsored reviews for many other categories. Yet other services facilitate sponsored content on social media platforms, like Twitter, Facebook, Instagram, or YouTube. For instance, IZEA, TapInfluence, and Twitter for Business facilitate sponsored tweets (i.e., tweets containing specific content chosen by the sponsor/advertiser).

Marketers using sponsored tweets and other incentivized WOM efforts face several key decisions (Haenlein and Libai 2017). One question is who to target and recruit. Another question is how much to pay. The first decision is challenging, especially if endorsers exhibit adverse selection, where people who are less likely to sign up and participate in viral-for-hire campaigns are more effective in generating a response from their followers (Fang and Liu 2018, Khern-am-nuai et al. 2018, Viswanathan et al. 2018). The second decision can be a challenge too because viral-for-hire endorsers are often motivated by more than financial gain.

We investigate two questions related to those decisions. The primary question is whether we can detect adverse selection in observable or unobservable endorser characteristics and quantify the gains in campaign effectiveness that might be achieved by taking this quality-quantity conundrum into account when deciding which endorsers to target. The secondary question is to what extent the pay rate affects participation.

We investigate these questions by means of a field experiment on a paid endorsement platform. Like many referral programs and sponsored tweet campaigns, endorsers are paid for participating regardless of the response they generate (likes, comments, retweets). This feature of the research setting implies that participating endorsers receive no incentive to exert effort and hence allows us to use the pay rate, which we manipulate, as the exclusion restriction to distinguish between participation and effectiveness in a selection model. Manipulating the pay rate also enables us to estimate the causal effect of the pay rate on participation and, indirectly, on effectiveness.

The main contribution consists of documenting adverse selection in both observables and unobservables among endorsers in a sponsored tweet campaign and of quantifying the resulting quality-quantity conundrum when deciding who to target as endorsers. Several observed and unobserved endorser characteristics associated with a higher propensity to participate have a negative association with being an effective endorser given participation. This adverse selection results in a quality-quantity conundrum, making it challenging for managers to recruit a sizable number of high-quality endorsers. Only 9.5%–11.8% of the endorsers were above the median in both the propensity to participate and the propensity to be effective compared to a benchmark of 25% in the absence of any association between participation and effectiveness. A simulation of various targeting approaches that leverages our data of actual endorsements and outcomes shows that scoring and targeting candidate endorsers using models that account for adverse selection on observables improves campaign outcomes by 13%–55% compared to models that ignore adverse selection.

The presence of a sizable quality-quantity conundrum is especially noteworthy considering that 93% of endorsers in our experiment had fewer than 5,000 verified followers and that 97% had fewer than 10,000. Even though such nanoinfluencers are believed to be more effective on a per-follower basis than endorsers with more followers,1 we find that they still present marketers with a notable quality-quantity conundrum.

A secondary contribution is to document an instance where the absence of pay depressed participation dramatically but offering a low or high pay rate did not affect participation. Although surprising, the latter is consistent with the notion that monetary rewards are not the only factor motivating people to participate in sponsored tweet and other viral campaigns. However, the data refute the notion that endorsers simply did not care about financial rewards. The effects of both the minimum and higher rates (as opposed to the zero rate) are higher among endorsers who have more to gain and who have participated in more campaigns in the past. Because 87% of the campaigns on the platform used the minimum rate, it is possible that the minimum was already more than high enough to motivate any marginal endorser.

In what follows, we first discuss some recent literature relevant to the questions at hand. Next, we describe the research setting and design. We then report evidence of adverse selection in both observables and unobservables, quantify how few potential endorsers score high on both the propensity to participate and the propensity to be effective, quantify how scoring models taking adverse selection into account can improve campaign outcomes, and investigate the causal effect of the pay rate on participation. We then present validity checks and assess the plausibility of alternative explanations to adverse selection, including platform policies, moral hazard, and repetition wear out. We conclude with implications for practice and research.

2. Conceptual and Empirical Insights from Recent Literature

2.1. Participation vs. Effectiveness and Adverse Selection

There are many reasons to expect the odds of engaging in WOM and of being effective when doing so to be positively correlated. For instance, involvement and expertise in a high-risk product category increase the probabilities of trying a new product, engaging in WOM, and doing so effectively. In a study of organic (nonincentivized) peer influence for a new drug to treat a potentially lethal medical condition, Iyengar et al. (2011) find that heavy prescribers in the category not only tended to adopt earlier than other physicians but also tended to exert greater peer influence after adoption. This implies a positive correlation between the tendency to engage in peer influence and the efficacy of doing so. People’s enthusiasm for a product or brand also likely correlates positively with both engaging in WOM and doing so extensively and persuasively. As another example, someone motivated to help others by retweeting great deals will be more motivated to seek out and share deals if they have a large network of followers and will likely have a greater impact when they do so than someone with fewer followers.

However, this positive association between engaging in WOM and doing so effectively is less likely to hold in incentivized WOM campaigns. In a field study of such a campaign by a restaurant chain, Godes and Mayzlin (2009) found that incentivized WOM by heavy users tended to be less effective than WOM by light users. They replicated this finding in a laboratory experiment, where they also found that heavy users tended to be more likely to engage in WOM. Assuming that no other observable or unobservable has a positive or negative association with both participation and effectiveness, this implies a negative correlation between the tendency to engage in peer influence and the efficacy of doing so. The authors’ explanation involves unintentional adverse selection through homophily, where friends of heavy users likely already know and intend to use the product or service and hence are less responsive to WOM.

Other scenarios with unintentional adverse selection may occur. For example, in a study of an incentivized referral program at a retail bank, Van den Bulte et al. (2018) find that the profitability of a referred customer correlates positively with that of the referring customer. To the extent that less wealthy and less profitable customers are more eager to earn a 25-euro referral fee, one’s tendency to participate in the referral program and one’s effectiveness in bringing valuable customers would be negatively correlated. More subtle and difficult-to-observe traits may matter as well. For instance, Viswanathan et al. (2018) found that extravert customers tended to generate more referrals but that these were of lower than average profitability.

Of course, adverse selection may also be intentional. People more interested in monetary rewards than in helping others discover products or services that fit their needs are more likely to participate in an incentivized campaign and less likely to exert diligence in selecting or educating the people they contact. Such intentional adverse selection is a definite possibility in most incentivized campaigns because the reward is typically not contingent on the quality or effectiveness of the participation.

The potential for unintentional and intentional adverse selection raises the concern that larger rewards may not only fail to boost but even depress the profitability of customers acquired through incentivized customer referral programs (Van den Bulte et al. 2018). In a field experiment with an online bank and an observational study with a telecom service provider, Wolters et al. (2020) indeed found that offering larger rewards increased the number of acquired customers but decreased the average value of these customers. Notably, the bank customers in the high versus low reward condition who made a referral did not differ markedly in profitability, tenure, number of log-ins in the prior year, gender, or age, suggesting a lack of selection on these observables. Nor was there a significant difference in the number of referrals made in the prior year. This led the authors to conclude that the negative effect of reward on the quality of referrals likely reflected induced changes in the referral behavior and the resulting quality of referrals and not adverse selection on those observed traits. Such adverse selection is what the present study investigates while accounting for the possibility that selection (also) operates through unobserved traits, like personality (Viswanathan et al. 2018).

Adverse selection may stem from differences in referrer characteristics even with a single reward size. For example, people especially keen on only making recommendations that are useful to their connections—out of altruism, a sense of duty to their community, or fear of creating a negative impression when making poor recommendations—may be less likely to participate in a campaign yet be more effective if and when they do participate (Kornish and Li 2010, Xiao et al. 2011, Van den Bulte et al. 2018). In their analysis of a telecom referral program, Wolters et al. (2020) indeed find that, after controlling for reward size, having made more referrals in the past is associated with making more but less profitable referrals now.

Hence, managers of referral programs and social advertising campaigns should be concerned about adverse selection in both observables and unobservables even at a single reward level. Also, knowing what observable traits have opposite associations with participation versus effectiveness would be useful when deciding who to attract and who to avoid as paid referrers or endorsers.

Overall, extant theory and research suggest at least three broad categories of customer traits that may be associated with adverse selection. First, people with a large network of contacts are in a position to reach out to more people and earn greater rewards in campaigns where rewards increase with the number but not quality of responses. Hence, simply having a greater network may entice opportunistic participation by people who do not expect their contacts to be good matches for the product or service. Adverse selection may also be unintentional as a large network typically exhibits greater variety in needs and preferences and more weak than strong ties (Roberts et al. 2009, Katona et al. 2011, Barasch and Berger 2014). Consequently, the endorsed product is less likely to be a good match for the average follower the larger one’s network is.

Second, some people may simply be very selective in how often and what they endorse, whereas others may be closer to “serial endorsers.” Again, this may be associated with intentional opportunism but may also be associated with having an ebullient extravert personality (Viswanathan et al. 2018) or fear of creating a negative impression when making poor recommendations (Toubia and Stephen 2013). The latter may explain why community members with more followers have been observed to make fewer endorsements (Lanz et al. 2019). Either way, having engaged in more campaigns in the past may be predictive of greater participation but lower effectiveness (Wolters et al. 2020).

Third, people with a stronger sense of altruism or civic duty toward the members of their networks are more likely to be keen on only making recommendations that are useful to consumers. People who view their fellow endorsers more as friends than competitors or strangers and see themselves more embedded in the community of endorsers, the argument suggests, may dispositionally care more about being helpful to consumers rather than reaping financial rewards. If so, they would be less likely to participate in an incentivized campaign but more effective when they do participate.

In short, the size of one’s network, one’s embeddedness in the endorser community, and one’s level of participation in prior campaigns are observable traits likely associated with adverse selection. Their relevance would be consistent not only with the prior research findings briefly discussed here but also with the better-matching mechanism affecting the quality of referrals (e.g., Van den Bulte et al. 2018) and the notion that intrinsic- and image-related concerns motivate engagement in WOM (e.g., Toubia and Stephen 2013). The list of factors is not exhaustive, of course, and additional elements may be at work.

2.2. Contingent vs. Noncontingent Reward and Adverse Selection

Standard economic theory implies that whether self-selection creates a positive or negative association between participation and performance in incentivized WOM programs likely also depends on the nature of the incentive. If the incentive is contingent on performance and if exerting effort to boost performance is costly to the program participant, then a positive association is likely. In contrast, if the incentive is contingent on participation rather than performance, as is true of the very great majority of sponsored tweet campaigns, then higher incentives are likely to draw in marginal participants with little intrinsic motivation or ability to produce positive outcomes for the firm. Such negative self-selection would present marketers with a quantity-quality conundrum in who to target when designing viral-for-hire campaigns.

2.3. Reward Size and Participation

Unsurprisingly, several studies document instances where increasing referral rewards increased referral rates (e.g., Hinz et al. 2011, Wolters et al. 2020). However, even in viral-for-hire campaigns, endorsers may be motivated by more than financial gain. People share online content for other reasons as well, including a desire to maintain status or an altruistic desire to share information and suggestions that others will deem useful (e.g., Toubia and Stephen 2013, Barasch and Berger 2014, Fang and Liu 2018, Khern-am-nuai et al. 2018). This mixture of intrinsic and extrinsic motives in sharing online content, combined with the well-documented tendency for extrinsic incentives to depress intrinsic motivation (Deci et al. 1999), implies that higher financial incentives need not translate into higher participation or higher effectiveness of viral-for-hire agents (Jenkins et al. 1998, Sun et al. 2017, Burtch et al. 2018, Jung et al. 2020). This mechanism involves the potential endorsers but not the final audience, and hence, it operates even when audience members do not know that the message is sponsored and the credibility, persuasiveness, or legitimacy of endorsers is unaffected by the incentive.

Furthermore, it is possible that once financial rewards meet a particular level, increasing reward size further is ineffective. Several reasons may generate this phenomenon. One is satisficing behavior resulting in financial motivation being swamped by other motives, including altruism and status considerations, once the reward threshold is met. Other reasons include decreasing marginal utility of reward size combined with an already very generous minimum or with higher reward sizes that are still too low to result in a sufficiently large difference in utility (Camerer and Hogarth 1999).

In short, higher incentives may but need not translate into higher participation or effectiveness of viral-for-hire campaigns. Whether and at what level they do are empirical questions. However, when they do, it is likely that the three factors associated with adverse selection moderate the effect. These factors may affect participation in organic WOM but are expected to matter even more when incentives are offered.

3. Research Setting and Design

3.1. Paid Endorsement and Sponsored Tweets

Sponsored tweets are a form of online incentivized WOM or paid endorsement that allows companies to recruit individual endorsers of their own choice. Specifically, advertisers post so-called “tasks” on a paid endorsement platform (a broker website similar to Amazon Mechanical Turk), and microbloggers registered on the platform can take on the tasks requiring them to post or retweet some ad for monetary rewards. Paid endorsement has gained particular popularity in China, where many websites act as platforms. Weibo (weibo.com), the largest Chinese microblog site, launched its official paid endorsement platform in 2012. The success of a paid endorsement campaign depends on how many endorsers participate and on how well they expand reach (i.e., views), generate engagement (i.e., likes, comments, and retweets), increase traffic (i.e., clicks), and boost sales.

3.2. Research Setting

We conducted a field experiment on Weituitui (weituitui.com),2 a paid endorsement platform with more than 40,000 registered endorsers with an account on Weibo. Weituitui enables advertisers to recruit endorsers at prespecified pay rates for their sponsored tweet campaigns. To initiate a campaign, an advertiser posts a task describing her needs on Weituitui. In the task, the advertiser also specifies how much an endorser will be paid as a linear or stepwise linear function of the endorser’s number of followers on Weibo. To penalize robot followers and inactive followers, Weituitui uses the number of verified followers to calculate the reward for an endorser. The platform has an internal algorithm to compute the percentage of verified followers based on how actively an endorser’s followers engage with her past tweets. Note that payment for endorsing a sponsored tweet is based on participation and the number of verified followers, not on the engagement level achieved.

As a paid endorsement platform featuring nanoinfluencers, Weituitui has several policies in place to ensure that the rewards are large enough to encourage endorsers with small numbers of followers to participate. The rewards for endorsers with fewer than 1,000 verified followers are fixed regardless of the reward structure (10–49 followers: 0.1 renminbi (RMB); 50–99: 0.2 RMB; 100–499: 0.3 RMB; 500–999: 0.5 RMB). The RMB is the currency used in China; 1 RMB is roughly 0.15 U.S. dollars. Endorsers with fewer than 10 verified followers are not allowed to participate. The reward for an endorser with more than 1,000 verified followers is no less than 0.5 RMB.
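A minimal sketch of this reward schedule may make the tiers concrete. This is our own illustration, assuming the 0.5 RMB floor applies from 1,000 verified followers upward; the platform's actual implementation is not public, and the function name is ours:

```python
def task_reward(verified_followers: int, rate: float) -> float:
    """Reward in RMB for one approved participation, per the schedule above.

    `rate` is the task's linear pay rate in RMB per verified follower
    (e.g., 0.0002). Tiers below 1,000 verified followers are fixed by
    the platform regardless of the rate; endorsers with fewer than 10
    verified followers may not participate at all.
    """
    if verified_followers < 10:
        raise ValueError("fewer than 10 verified followers: not allowed to participate")
    if verified_followers < 50:
        return 0.1
    if verified_followers < 100:
        return 0.2
    if verified_followers < 500:
        return 0.3
    if verified_followers < 1000:
        return 0.5
    # From 1,000 verified followers up (our assumption for the exact
    # boundary): linear in followers, but never below 0.5 RMB.
    return max(rate * verified_followers, 0.5)
```

For example, an endorser with 700 verified followers earns the fixed 0.5 RMB tier at any rate, whereas one with 2,500 verified followers earns the linear amount.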

In a task, the advertiser provides the URL of the target tweet containing the ad. The advertiser can impose some requirements for the task such as how long the endorser should keep the retweet on her timeline and the minimal length of the comment in the retweet. Furthermore, the advertiser can specify who is eligible for the task. Adherence to eligibility requirements is checked automatically by the platform for a few requirements (e.g., retweeting between 9 a.m. and 9 p.m.) but typically needs to be verified manually by the advertiser. If an endorser decides to participate, she needs to retweet the given tweet, fulfill the requirements, and then submit the URL of her retweet. The duration of a task ranges from three to five days. Once the task ends, the advertiser has three days to manually approve or disapprove the submissions depending on whether the endorser has truly retweeted the given tweet and fulfilled the requirements. All remaining submissions are approved automatically by the platform after the three-day window. Because of this autoapproval policy, opportunistic endorsers or scammers may submit a random URL even if they have not retweeted the tweet. For approved tasks, the endorsers are paid, and Weituitui charges a 30% commission fee. That fee is reduced to 15% for an extremely small proportion (0.3%) of endorsers who have also posted tasks themselves and have spent more than 1,000 RMB on Weituitui.

3.3. Experiment Design

To induce exogenous variation in participation, we post a pair of identical tasks offered at three different pay rates. We manipulate whether a specific endorser is eligible for any compensation for a specific task (i.e., whether the pay rate is zero or nonzero). We use two nonzero pay rates, applying a linear pricing scheme as it is easy to implement and understand. The two nonzero pay rates are 0.0002 RMB and 0.0004 RMB per verified follower, respectively. The former is the lowest linear rate allowed by the platform and the most common one,3 used in as many as 87% of all tasks, whereas the latter is at the 96th percentile of all linear rates used on Weituitui. Figure 1 plots the payment curves for the two nonzero pay rates, showing how the number of verified followers maps into the financial rewards at the low and high pay rates. The percentile of endorsers is given along the top of the figure (e.g., 59% of endorsers have fewer than 500 verified followers). Note how rewards at the two nonzero pay rates differ only for endorsers with more than 1,250 verified followers or about 23% of all endorsers in our experiment. Being offered the high rather than the low rate doubles the financial rewards for endorsers with more than 2,500 verified followers, who account for 13% of all endorsers and for 57% of those with more than 1,250 verified followers. Another 16% of the latter receive between 50% and 100% higher financial rewards from the high rate than from the low rate. Whether the pay rate is zero or higher affects the payment to all endorsers, not only the top 23%.
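The divergence between the two payment curves follows directly from the 0.5 RMB floor. A back-of-the-envelope sketch (our own illustration, applying only the linear-with-floor rule for endorsers with at least 1,000 verified followers; names are ours, not the platform's):

```python
# Reward rule for endorsers with at least 1,000 verified followers,
# per the text: linear in verified followers, floored at 0.5 RMB.
LOW, HIGH = 0.0002, 0.0004  # RMB per verified follower

def reward(verified_followers: int, rate: float) -> float:
    return max(rate * verified_followers, 0.5)

# Ratio of high-rate to low-rate reward at several follower counts.
for n in (1000, 1250, 2000, 2500, 5000):
    ratio = reward(n, HIGH) / reward(n, LOW)
    print(f"{n:>5} verified followers: high/low reward ratio = {ratio:.2f}")
```

The ratio stays at 1 up to 1,250 followers (both rates hit the 0.5 RMB floor), rises between 1,250 and 2,500 followers, and plateaus at 2 thereafter, consistent with the percentages quoted above.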

Figure 1. The Payment Curves at Low and High Pay Rates

We use the last two digits of endorsers’ six-digit Weituitui identification numbers, which are independent of endorsers’ characteristics, to randomize the pay rates offered. Specifically, we do so by posting two identical tasks at the low and high pay rates (i.e., 0.0002 versus 0.0004 RMB per follower, respectively). Then, in each task, we impose an eligibility restriction such that only endorsers whose Weituitui identification numbers end in a certain two-digit range (e.g., 00–24) are eligible to participate in the task. In such a setup, those endorsers who are eligible for the high (low) pay rate tasks are offered the high (low) pay rate. Those who are not eligible for either of them are offered a pay rate of zero because they will not be paid even if they participate in the two tasks. To ensure that the two tasks are identical except for the pay rate, we post two identical tweets with the same Weibo account and then invite endorsers to retweet these two tweets in the two tasks. We call these two tasks a task pair.
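The eligibility-based randomization can be sketched as follows. This is a hypothetical illustration; the function name and range arguments are ours, and the digit windows are examples taken from the text:

```python
def assigned_rate(weituitui_id: str, low_ids: range, high_ids: range) -> float:
    """Pay rate implied by the eligibility restrictions of a task pair.

    The last two digits of the six-digit Weituitui ID (independent of
    endorser characteristics) determine which task, if any, pays the
    endorser. Endorsers eligible for neither task face a zero pay rate.
    """
    last_two = int(weituitui_id[-2:])
    if last_two in low_ids:
        return 0.0002  # eligible only for the low-rate task
    if last_two in high_ids:
        return 0.0004  # eligible only for the high-rate task
    return 0.0         # ineligible for both tasks in the pair

# Example: an ID ending in 07 falls in the 00-24 window of the low-rate task.
rate = assigned_rate("123407", low_ids=range(0, 25), high_ids=range(25, 50))
```

With the windows shown, an endorser whose ID ends in 07 is offered the low rate; one whose ID ends in 88 would not be paid even if she completed either task.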

We repeat this experimental procedure 16 times, posting two pairs of tasks per week in 8 different weeks within a span of 12 weeks (February to April). Each task in a pair pertains to exactly the same product, vendor, and price. We divide endorsers into four groups based on their identification numbers (i.e., 00–24, 25–49, 50–74, 75–99) such that each endorser is eligible for exactly one of the four tasks at a nonzero pay rate in that week. The four tasks are posted within seconds of each other so that they show up right next to each other on the platform. To avoid within-pair order artifacts, we rotate across weeks whether the task listed first offers a low or high pay rate. Our tasks rotate over six products from two vendors on taobao.com. Each task is open for participation for at least 72 hours and does not impose any particular effort-related requirements beyond retweeting and liking the tweet. Table 1 shows the pay rates and eligibility restrictions of the four tasks posted in a given week (A1 and A2 featuring product A and B1 and B2 featuring product B).

Table 1. Assignment of Endorsers to the Treatment Conditions in a Given Week

Pay rate                        Product A     Product B
0.0002 RMB/verified follower    00–24 (A1)    50–74 (B1)
0.0004 RMB/verified follower    25–49 (A2)    75–99 (B2)
Zero                            50–99         00–49

This design mimics a real-world scenario where tasks with different pay rates and eligibility requirements are visible to endorsers. In each of the eight weeks of the experiment, any endorser logging into the platform can see exactly four experimental tasks posted—apart from all the other tasks posted organically. Only one of these four tasks provides a specific endorser with a nonzero pay rate, which is either low or high.

We rotate the nonzero pay rate for the same endorsers across weeks. This allows us to observe the behavior of the same person under different nonzero pay rates. Overall, we have between-subject variation across all three pay rates in the same week, within-subject variation in zero versus one of the two nonzero pay rates in the same week, and within-subject variation across all three pay rates across weeks.

We register a new account on Weibo for this experiment. Because the new account has no followers, all the observed engagements on the sponsored tweet come from the paid endorsers and their followers.

3.4. Data

The data are collected and analyzed at the endorser-task level. We focus on the 8,351 active endorsers who participated in at least one paid endorsement task in the 6 months before our experiment started or in the 6 months after our experiment ended (a 15-month window in total), excluding those who were not yet registered by the time a task was closed for participation. Excluding nonactive endorsers who are very unlikely to respond to any campaign limits our inferences to those potential endorsers whom advertisers care about. In every task, we record whether each of the active endorsers participates and how many engagements she generates. The number of engagements is collected for each retweeter (endorser) using the application programming interface provided by Weibo. Herein, participation means that an endorser has actually retweeted the message. The participation statistics are summarized in Table 2. Excluding 1 task for which we failed to track the engagement because of a technical issue, the 31 tasks we posted attracted 2,241 participations from 1,016 endorsers. Some endorsers joined the platform or met the activity requirements only during the 12-week window and hence were not exposed to all tasks, which explains why the total number of observations is less than 8,351 × 31.

Table 2. Experiment Statistics

                                           No.
Number of weeks                            8
Number of products                         6
Number of tasks                            31
Number of endorsers                        8,351
Number of participating endorsers          1,016
Number of participations                   2,241
Number of <task, endorser> observations    230,259

Detailed task-by-task participation and engagement statistics are reported in Table OA1 in Online Appendix 1. The number of potential endorsers participating in a campaign is fairly stable across weeks, but there is a slight decline in the number of likes over time and a marked decline in the number of comments and retweets. Decreasing engagement may reflect temporal variation in tweeting volume, an increase in the number of other tweets posted by the participants, or a loss of “freshness” of the tweets in our own tasks.

To avoid having tasks running out of budget before they were closed for participation, we tried different budgets (100, 200, and 300 RMB) in the first four weeks and found that 200 RMB (higher than or equal to the budgets of 92% of sponsored tweet campaigns on the platform) was more than enough. Therefore, the budgets for all the tasks in the subsequent weeks were set to 200 RMB.

We collected data on the characteristics of endorsers by scraping their profiles on Weituitui, which includes profile information on Weibo. The information on Weituitui includes the number of verified followers, the number of tasks participated in, the total reward from referring new users, the number of friends on Weituitui’s internal social network, and their tenure on Weituitui. The information on Weibo includes the number of followers and the number of tweets (including retweets). We also computed the monetary reward from participating in the campaign, which is a function of both observed endorser characteristics and the manipulated pay rate. All the measured variables can vary throughout our experiment, but cross-time within-endorser variation amounts to less than 3% of the total variation in each variable. Table 3 organizes these variables into four categories and also reports the two dummies used to capture the three pay rate treatment levels (with 0.0002 RMB per follower or low being the baseline). All these variables can be set or be observed by advertisers for targeting purposes.

Table 3. Description of Independent Variables

Variable             Description
Pay rate (manipulated)
  Zero               Whether an endorser is offered zero financial incentive
  High               Whether an endorser is offered 0.0004 RMB per verified follower
Financial reward
  Reward             Financial reward of an endorser for participating (Figure 1)
Social media fan base
  followers          Number of followers on Weibo
  verifiedRatio      Percentage of verified followers among all followers
Prior activity level
  tweets             Number of tweets posted on Weibo
  tasks              Total number of tasks participated in the past
  approvalRate       Percentage of tasks approved in the past
Community embeddedness
  tenure             Number of days an endorser was registered on Weituitui (rescaled to [0,1])
  referralReward     Total reward received for referring others to register on Weituitui
  friends            Number of friends an endorser has on Weituitui’s internal social network

Table 4 reports the means of the endorser traits for the entire data set as well as for the three pay rate levels. Comparisons of cell means across treatment levels document proper randomization.4

Table 4. Randomization Check

                     Cell means by pay rate               ANOVA test (p-value)
                     All      Zero     Low     High       All      Low vs. zero  Low vs. high
log(followers)       6.93     6.93     6.93    6.92       0.97     0.98          0.87
verifiedRatio        0.44     0.44     0.44    0.44       0.85     0.76          0.57
log(tweets)          5.90     5.90     5.91    5.90       0.98     0.85          0.90
log(tasks)           2.48     2.48     2.48    2.47       0.90     0.70          0.67
approvalRate         0.74     0.74     0.74    0.74       0.46     0.38          0.21
tenure               0.21     0.21     0.21    0.21       0.81     0.56          0.56
log(referralReward)  −3.66    −3.66    −3.66   −3.67      0.75     0.64          0.45
log(friends)         0.38     0.38     0.38    0.38       0.86     0.65          0.59
Observations         230,259  172,725  29,642  27,892     230,259  202,367       57,534


Note. ANOVA, analysis of variance.

We quantify effectiveness using three outcome measures: likes, comments, and retweets. However, because the model-based results and substantive insights are very similar across these outcomes, for most analyses we simply report the results for the sum of all three, which we label actions.

4. Adverse Selection and the Quality-Quantity Conundrum

We now investigate whether potential endorsers with particular characteristics are more likely to participate in a sponsored tweet campaign but less likely to generate engagements from their followers. We also quantify the resulting quality-quantity conundrum that managers face.

We investigate the presence of selection in both observables and unobservables by means of a selection model. To account for the presence of multiple observations per potential endorser, we extend the standard model with endorser-specific random effects that are allowed to be correlated across selection and outcome equations. To account for the fact that our engagement data are counts, we use a Poisson-normal model structure rather than the more common linear-normal structure. To strengthen model identification, we use an exclusion restriction rather than relying merely on functional form (Puhani 2000, d’Haultfoeuille 2010).5 Specifically, we exploit the fact that the platform rewards participation rather than outcomes and use the exogenously manipulated pay rates as shifters of participation but not outcome given participation.

4.1. Model

Although sample selection and repeated observations are both common in research and can each be addressed effectively when they appear separately, little has been done to address both challenges jointly, especially when the dependent variable is a count. We propose a model that handles both challenges simultaneously.

Following the standard sample selection model, we use a probit model for the selection equation. Letting zit indicate whether endorser i participates in task t, the participation decision is given by

$$z_{it} = \mathbb{1}\left(\alpha w_{it} + \delta u_i + \xi_{it} > 0\right), \quad (1)$$
where the row vector $w_{it}$ includes an intercept and the sets of variables that affect the participation decision of endorser $i$ in task $t$. The variables in $w_{it}$ include characteristics of endorser $i$, characteristics of task $t$, characteristics specific to the endorser-task combination, and the pay rate dummy variables. They also include 15 dummy variables, one for each pair of tasks that differ only in pay rate (the intercept captures the 16th pair). These task pair dummies absorb any task-specific effect apart from pay rate, including characteristics of the product featured, characteristics of our post on Weituitui, and temporal shocks common across endorsers. Two paired tasks share the same fixed effect because they are identical except for the pay rate offered. The random terms $u_i \sim N(0,1)$ and $\xi_{it} \sim N(0,1)$ capture endorser and endorser-task unobserved characteristics that affect the participation decision. The selection equation is thus a probit model with random effects.

Because the engagement or outcome data (likes, comments, and retweets) are all counts and feature both overdispersion and repeated observations, we use a Poisson-normal model with random effects for the outcome equation. Let $y_{it}^*$ be the potential outcome of endorser $i$ on task $t$. The outcome equation is

$$E[y_{it}^* \mid x_{it}, \varepsilon_i, \epsilon_{it}] = \lambda_{it} = \exp\left(\beta x_{it} + \sigma \varepsilon_i + \gamma \epsilon_{it}\right), \quad (2)$$
where $x_{it}$ includes an intercept and the set of variables that affects the potential engagement generated by endorser $i$ for task $t$. The difference between $x_{it}$ and $w_{it}$ is that only $w_{it}$ includes the pay rate dummies and financial rewards. The random terms $\varepsilon_i \sim N(0,1)$ and $\epsilon_{it} \sim N(0,1)$ capture the effect of unobserved endorser and endorser-task characteristics, respectively, and both account for overdispersion in the counts. As usual in selection models, the outcome equation pertains to potential outcomes (i.e., outcomes generated if one participates, regardless of whether one actually participated).

The random shocks in the selection and outcome equations need not be independent. Specifically, the endorser-level unobserved characteristics that affect selection or participation may also affect outcomes and so may endorser-task–level unobserved characteristics. As a result, we further assume that the endorser and endorser-task shocks are bivariate normally distributed, with a correlation of ρ and τ, respectively:

$$\begin{pmatrix} u_i \\ \varepsilon_i \end{pmatrix} \sim N\!\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right), \qquad \begin{pmatrix} \xi_{it} \\ \epsilon_{it} \end{pmatrix} \sim N\!\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \tau \\ \tau & 1 \end{pmatrix}\right).$$

Unlike prior sample selection models (Heckman 1979, Winkelmann 2007, Greene 2009), our model allows for random effects that are correlated across equations. This enables us to identify adverse selection in unobservables. Online Appendix 2 discusses the likelihood function of the proposed model, its connection with existing models, and how to compute the relative partial effect of a variable on the expected potential outcome $E[y_{it}^* \mid x_{it}]$ (i.e., the expected outcome of an endorser regardless of whether she actually participated) and the expected actual outcome $E[y_{it} \mid x_{it}, w_{it}]$, where $y_{it} = z_{it} y_{it}^*$ (i.e., the expected outcome taking participation into consideration). Online Appendix 3 further discusses how to estimate the parameters in the model, how to estimate the standard errors of the relative partial effects, and how to derive the expression of the expected actual outcome. Our estimation procedure is available as an open-source R package at https://CRAN.R-project.org/package=PanelCount.
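To illustrate the joint structure in Equations (1) and (2), the following sketch simulates the data-generating process in Python. The parameter values are loosely inspired by Table 9 but purely hypothetical, and this is not the estimation code in the PanelCount package:

```python
import numpy as np

rng = np.random.default_rng(0)
n_endorsers, n_tasks = 1000, 31
rho, tau = -0.2, 0.2                  # correlations of selection vs. outcome shocks
alpha0, beta0 = -2.0, 0.5             # illustrative intercepts
delta, sigma, gamma = 1.3, 1.9, 1.0   # heterogeneity scales (hypothetical)

# Correlated endorser-level shocks (u_i, eps_i)
shocks_i = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n_endorsers)
u, eps = shocks_i[:, 0], shocks_i[:, 1]

# Correlated endorser-task shocks (xi_it, e_it)
shocks_it = rng.multivariate_normal([0, 0], [[1, tau], [tau, 1]],
                                    size=(n_endorsers, n_tasks))
xi, e = shocks_it[..., 0], shocks_it[..., 1]

# Participation (probit selection, Eq. 1)
z = (alpha0 + delta * u[:, None] + xi > 0).astype(int)

# Potential outcome (Poisson-normal, Eq. 2) and actual outcome
lam = np.exp(beta0 + sigma * eps[:, None] + gamma * e)
y_star = rng.poisson(lam)
y = z * y_star  # observed engagement is zero unless the endorser participates
```

With ρ < 0, endorsers whose unobserved traits raise participation tend to have lower potential outcomes, which is exactly the adverse selection in unobservables that the correlated random effects are designed to capture.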

4.2. Validity of Exclusion Restriction

To strengthen model identification beyond merely relying on functional form, we use the exogenously manipulated pay rates and the financial rewards offered as shifters of participation but not outcome given participation. This exclusion restriction is justified because the Weituitui platform rewards participation rather than outcomes. However, to be valid, the high and low pay rates must also actually shift participation. The cell means in Table 5 clearly show that they do. It is notable that the high pay rate does not boost participation more than the low pay rate does, but this does not affect their validity as shifters. Table 6 shows that the pay rates also shift the outcomes but that they do so only through shifting participation. As Table 6 shows, conditional on participation, the pay rate does not shift outcome levels beyond the outcomes observed under zero pay. This further justifies the restriction of excluding pay rate from the outcome equation. Additionally, regressing participation on log(reward) and log(followers) for all endorsers as well as for the top 23% and the bottom 77% separately and doing the same for log(1 + Actions) conditional on participation also show that financial rewards shift participation upward (all three p-values are <0.001) but do not shift effectiveness given participation (all three p-values are >0.200).

Table 5. Participation Rates by Pay Rate (Cell Means)

                                   Pay rate                    ANOVA test (p-value)
Set of task-endorser observations  Zero, %  Low, %  High, %    All     Zero vs. low  Low vs. high
All endorsers                      0.05     3.71    3.77       <0.001  <0.001        0.702
Top 23% endorsers only             0.08     6.89    6.84       <0.001  <0.001        0.920
Bottom 77% endorsers only          0.04     2.72    2.81       <0.001  <0.001        0.559


Notes. Total number of observations in each row: 230,259 (all), 54,754 (top 23%), and 175,505 (bottom 77%). ANOVA, analysis of variance.

Table 6. Outcome Levels by Pay Rate (Cell Means)

                                   Pay rate                     ANOVA test (p-value)
Set of task-endorser observations  Zero     Low    High         All     Zero vs. low  Low vs. high
All endorsers
  likes                            0.00005  0.004  0.004        <0.001  <0.001        0.981
  comments                         0.00017  0.010  0.006        <0.001  <0.001        0.115
  retweets                         0.00022  0.010  0.007        <0.001  <0.001        0.285
  actions                          0.00044  0.023  0.016        <0.001  <0.001        0.198
Top 23% endorsers only
  likes                            0.00007  0.009  0.008        <0.001  <0.001        0.825
  comments                         0.00019  0.014  0.002        <0.001  <0.001        0.053
  retweets                         0.00027  0.012  0.006        <0.001  <0.001        0.332
  actions                          0.00054  0.035  0.016        <0.001  <0.001        0.162
Bottom 77% endorsers only
  likes                            0.00004  0.002  0.002        <0.001  <0.001        0.671
  comments                         0.00017  0.009  0.007        <0.001  <0.001        0.535
  retweets                         0.00021  0.009  0.007        <0.001  <0.001        0.521
  actions                          0.00041  0.020  0.016        <0.001  <0.001        0.552
Participations only
  likes                            0.088    0.101  0.100        0.972   0.785         0.960
  comments                         0.330    0.270  0.159        0.213   0.785         0.101
  retweets                         0.418    0.258  0.176        0.295   0.485         0.260
  actions                          0.835    0.630  0.435        0.280   0.651         0.173


Notes. Total number of observations in each row: 230,259 (all), 54,754 (top 23%), 175,505 (bottom 77%), and 2,241 (participations only). actions = likes + comments + retweets. ANOVA, analysis of variance.

As Table 7 shows, the modal engagement level generated by a sponsored tweet is zero. Because sponsored tweets are retweets of an original, the third row in Table 7 is consistent with the finding by Goel et al. (2016) that only a very small fraction of first-order retweets generates further retweets. The average number of further retweets per sponsored tweet in our data (0.23) is very similar to that in their data (0.3).

Table 7. Amount of Engagement Generated by the 2,241 Sponsored Tweets

Engagement type  0      1    2   3   4   5–10  >10  Mean  SD    Max
likes            2,072  145  14  4   4   1     1    0.10  0.50  15
comments         2,104  69   21  15  5   16    11   0.22  1.59  34
retweets         2,130  50   9   15  7   15    15   0.23  1.71  36
actions          1,924  193  44  12  11  29    28   0.55  3.33  73


Note. SD, standard deviation.

4.3. Evidence of Adverse Selection: Participation vs. Potential Effectiveness

Table 8 presents model-free evidence of adverse selection on observables. For each of the eight endorser characteristics, we split the data at the median value and compare the participation across the two halves as well as the engagement achieved across the two halves, where engagement is operationalized as actions (the sum of likes, comments, and retweets) conditional on participation. For six of eight characteristics, the top half of endorsers has a higher participation rate but lower engagement than the bottom half. The differences are significantly positive for participation and significantly negative for engagement for four of the characteristics. Using correlations rather than median splits leads to the same conclusion. For six characteristics, their correlation is positive with participation but negative with engagement, and for four of them, the two correlations are significantly different from zero with opposite signs.

Table 8. Model-Free Evidence of Adverse Selection

                     Difference between top and bottom halves   Correlation
Variable             Participation rate, %  Engagements         Participation  Engagements
log(followers)       0.802***               −0.432**            0.047***       −0.050*
verifiedRatio        0.006                  0.051               0.001          0.020
log(tweets)          0.575***               −0.694***           0.038***       −0.110***
log(tasks)           1.496***               −1.627***           0.109***       −0.159***
approvalRate         0.734***               −0.357*             0.032***       −0.038
tenure               −0.092*                −0.134              0.002          −0.046*
log(referralReward)  0.679***               −0.293              0.028***       −0.043*
log(friends)         0.486***               −0.203              0.031***       −0.024


Notes. Top and bottom halves are split by the median of each covariate. Engagements are conditional on participation.

 *p < 0.05; **p < 0.01; ***p < 0.001.
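The median-split comparison behind Table 8 can be sketched as follows (a hypothetical Python illustration on synthetic data; the variable names and parameter values are ours, not the actual data files):

```python
import numpy as np

def median_split_evidence(trait, participated, engagements):
    """Difference between top and bottom halves of one endorser trait in
    participation rate and in engagement conditional on participation."""
    top = trait > np.median(trait)
    participation_diff = participated[top].mean() - participated[~top].mean()
    engagement_diff = (engagements[top & participated].mean()
                       - engagements[~top & participated].mean())
    return participation_diff, engagement_diff

# Synthetic adverse selection: the trait raises participation but lowers
# engagement among participants.
rng = np.random.default_rng(1)
n = 5000
trait = rng.normal(size=n)
participated = rng.random(n) < 0.2 + 0.1 * (trait > 0)
engagements = rng.poisson(np.exp(0.5 - 0.5 * trait))
part_diff, eng_diff = median_split_evidence(trait, participated, engagements)
```

In this synthetic setup, the trait's top half participates more (positive participation difference) yet generates less engagement given participation (negative engagement difference), mirroring the pattern in Table 8.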

We now proceed to a model-based analysis, which not only controls for financial rewards and accounts for the correlations among the observables but also can detect selection on unobservables. Table 9 presents the selection model estimates where engagement outcomes are measured as the sum of likes, comments, and retweets. The first column presents the estimates of α in the probit equation for participation (Equation (1)). The second column presents the estimates of β in the Poisson-normal equation for the potential outcomes (Equation (2)). Each such coefficient measures the relative partial “effect” of the predictor variable on potential outcomes (i.e., the proportional change in the potential number of outcomes associated with a one-unit increase in the predictor). The third column reports the estimates of the population-averaged relative partial “effect” of each predictor on the actual rather than potential outcomes. These estimates, which in our data roughly equal 2.2α + β, measure the proportional change in the actual number of outcomes associated with a one-unit increase in the predictor. The precise expression for this third type of effect is presented in Equations (A7)–(A10) in Online Appendix 2.

Table 9. Joint Model of Participation and Effectiveness for Overall Engagements (Actions)

                       Participation  Potential effectiveness  Actual effectiveness
Social media fan base
  log(followers)       0.000          0.067                    0.067
                       (0.019)        (0.039)                  (0.057)
  verifiedRatio        0.150          0.191                    0.519
                       (0.107)        (0.323)                  (0.386)
Prior activity level
  log(tweets)          0.052**        −0.364***                −0.249***
                       (0.020)        (0.055)                  (0.071)
  log(tasks)           0.542***       −0.395***                0.793***
                       (0.022)        (0.059)                  (0.084)
  approvalRate         0.014          0.665                    0.695
                       (0.116)        (0.376)                  (0.455)
Community embeddedness
  tenure               −4.239***      2.157***                 −7.139***
                       (0.201)        (0.569)                  (0.750)
  log(referralReward)  −0.027*        0.021                    −0.038
                       (0.012)        (0.048)                  (0.054)
  log(friends)         −0.098**       0.187                    −0.029
                       (0.037)        (0.124)                  (0.143)
Incentive
  log(reward)          0.292***                                0.640***
                       (0.028)                                 (0.067)
  payRate = zero       −1.509***                               −3.310***
                       (0.109)                                 (0.279)
  payRate = high       −0.030                                  −0.066
                       (0.036)                                 (0.079)
Heterogeneity
  δ                    1.330***
                       (0.047)
  σ                    1.879***
                       (0.126)
  γ                    0.952***
                       (0.110)
Correlation
  ρ                    −0.213***
                       (0.041)
  τ                    0.233*
                       (0.092)
Fit
  Log likelihood                  −7,963.2
  Akaike information criterion    16,038.4
  Bayesian information criterion  16,617.8


Notes. N = 230,259. The coefficients of the intercept and the 15 task pair dummies are omitted for clarity. Similar results for individual engagements (i.e., likes, comments, and retweets) are reported in Table OA2 in Online Appendix 1. Table OA3 in Online Appendix 1 reports nearly identical results for the eight endorser traits in a model where the high pay rate is restricted to have an effect only for endorsers with more than 1,250 verified followers.

 *p < 0.05; **p < 0.01; ***p < 0.001.
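As a rough consistency check, one can plug the coefficients reported in Table 9 into the 2.2α + β approximation mentioned above (a back-of-the-envelope sketch; the exact expressions are Equations (A7)–(A10) in Online Appendix 2):

```python
# (alpha, beta, reported actual effect) from Table 9 for predictors that are
# significant in all three columns
rows = {
    "log(tweets)": (0.052, -0.364, -0.249),
    "log(tasks)": (0.542, -0.395, 0.793),
    "tenure": (-4.239, 2.157, -7.139),
}
for name, (alpha, beta, actual) in rows.items():
    approx = 2.2 * alpha + beta
    print(f"{name}: 2.2*alpha + beta = {approx:.3f}, reported = {actual:.3f}")
```

For example, for log(tasks) the approximation gives 2.2 × 0.542 − 0.395 ≈ 0.797, close to the reported actual-effectiveness estimate of 0.793.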

Table 9 also reports the standard deviations of the random effects capturing unobserved endorser-level heterogeneity in the propensity to participate (δ) and to have high potential outcomes (σ), the standard deviation in the Poisson-normal term accounting for endorser-task–level heterogeneity in the outcomes (γ), and the correlations between unobservables operating at the participation versus outcome stages (ρ and τ).

We first discuss how several endorser characteristics have an opposite association with participation than with potential outcomes.6 This is true not only for observed characteristics but also for unobserved traits.

4.3.1. Social Media Fan Base.

Table 9 suggests that endorsers with a large number of followers and a high ratio of verified followers were neither more nor less likely to participate in our campaigns. However, this is after controlling for the financial rewards from participation, which themselves increase with the number of verified followers. Excluding financial rewards from the model results in both fan base variables having a significant positive association with participation. More notable is that potential effectiveness (i.e., the expected outcome regardless of participation) did not vary with either the number of followers or the ratio of verified followers, even though financial rewards are excluded from the effectiveness equation. Overall, the conclusion is that people with a large base of verified followers are more likely to participate (at a greater cost to the marketer) but not markedly more likely to generate engagement than other endorsers (setting aside participation).

4.3.2. Prior Activity Level.

Endorsers who tweeted more on microblogs were more likely to participate, probably because of greater interest or enthusiasm in sharing online content. Endorsers who participated in more tasks in the past were also more likely to participate in the tasks we posted. Such persistence in behavior is far from surprising. What is notable, rather, is that both forms of prior activity level were negatively associated with potential outcomes. People who were more involved in microblogging and in prior sponsored tweet campaigns were more likely to participate in our campaign but less likely to generate positive outcomes compared with other endorsers (setting aside participation).

The most straightforward explanation is that some people are “serial endorsers” who post regardless of whether the content is a good match for their followers. This would be quite consistent with financial motivations but inconsistent with altruistic motivations unless endorsers are extremely miscalibrated about what their followers value. Setting intentions aside, it is also possible that endorsers who have been more active in blogging and sponsored tweeting in the past posted more irrelevant content, hurting their reputation in online communities (Bock et al. 2005, Barasch and Berger 2014) and their effectiveness in our campaigns.

4.3.3. Community Embeddedness.

We find a similar pattern for the variables reflecting how well someone is embedded in the Weituitui community of endorsers. Endorsers who registered longer ago (higher tenure), had more friends on Weituitui’s internal social network, or were active referrers were less likely to participate in the tasks we posted, even though they were not less effective in generating outcomes (setting aside participation). One likely reason is that endorsers with a long presence and central position in the community of endorsers are more deliberate and mindful about their followers’ interests when deciding which campaigns to participate in. This pattern is consistent with endorsements being motivated by altruism and status rather than only financial gain.

4.3.4. Unobservables.

The estimates of δ and σ indicate the presence of significant endorser-specific random effects in both participation and outcomes. Of greater interest is that these unobserved endorser-specific propensities to participate and be effective are negatively correlated, as indicated by the negative value of ρ. Hence, the unobserved propensities tend to have opposite associations with participation versus effectiveness, just like several observed characteristics do. For example, endorsers with a strong reputation for being cool or discerning, which is not observed in our study, might be less likely to participate but more likely to be effective when they do.

The positive correlation τ indicates that the unobserved characteristics at the endorser-task level have directionally consistent effects on participation and effectiveness. This suggests that endorsers self-select into specific tasks wherein they are more likely to generate a high level of engagements, counteracting in part the adverse selection operating at the endorser rather than endorser-task level.

4.3.5. Summary.

The main conclusion from Table 9 is the presence of adverse selection, enriching the model-free analysis in Table 8. Several characteristics—both observed and unobserved—associated with higher participation are associated with lower effectiveness, and none are associated positively with both participation and effectiveness.

4.4. Predictors of Actual Effectiveness

The main insight from the first two columns in Table 9 is that observed and unobserved endorser characteristics have different and often opposite associations with participation versus potential outcomes (i.e., regardless of participation status). The third column in Table 9 reports the associations of observed characteristics with actual outcomes, which are zero if an endorser does not participate. Note how the associations with participation dominate those with potential outcomes in terms of generating actual engagements.

4.5. Quantifying the Resulting Quality-Quantity Conundrum for Marketers

Having documented the presence of adverse selection, we now turn to the quality-quantity conundrum this poses for marketers. We do so by categorizing endorsers into four cells based on whether their predicted responsiveness and their predicted potential effectiveness are above or below the median (or mean) and quantifying how few endorsers score highly on both dimensions.

The expressions for predicted responsiveness and predicted potential effectiveness are given in Equations (A6) and (A2) in Online Appendix 2. Here, we just note five elements of our procedure. First, we label an endorser as effective (responsive) if her predicted potential to generate engagements (predicted probability to participate) is above the median (or mean), and we assess potential effectiveness for each type of outcome separately. Second, when making predictions, we assume that every endorser is eligible for every task and that the pay rate is the most common rate on the platform (i.e., the low rate of 0.0002 RMB per follower). This controls for variation induced by pay rate and focuses on the effects of endorsers’ characteristics available for targeting. The results are very similar if we assume that every endorser is paid at the high rate. Third, the calculations involve only the data and coefficients on observable endorser characteristics available to marketers for making targeting decisions. Fourth, we first score each endorser for each individual task and then categorize over all tasks. Finally, we use both median splits and mean splits for categorization. The former has the benefit of a clear baseline. In a two-by-two matrix where both participation and potential effectiveness are split at the median, the high-high cell should count 50% of the endorsers if the two variables are perfectly positively correlated, 25% if they are uncorrelated, and 0% if they are perfectly negatively correlated.
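The categorization step can be sketched as follows (hypothetical Python; in the actual analysis the two scores come from the fitted selection model, whereas here they are simulated with a negative correlation for illustration):

```python
import numpy as np

def quadrant_shares(responsiveness, effectiveness, cutoff="median"):
    """Share of endorsers in each cell of the two-by-two split of Table 10."""
    split = np.median if cutoff == "median" else np.mean
    hi_r = responsiveness > split(responsiveness)
    hi_e = effectiveness > split(effectiveness)
    return {
        "responsive & effective": float(np.mean(hi_r & hi_e)),
        "responsive & ineffective": float(np.mean(hi_r & ~hi_e)),
        "unresponsive & effective": float(np.mean(~hi_r & hi_e)),
        "unresponsive & ineffective": float(np.mean(~hi_r & ~hi_e)),
    }

# Negatively correlated scores push the high-high cell below the 25% benchmark
rng = np.random.default_rng(2)
scores = rng.multivariate_normal([0, 0], [[1, -0.6], [-0.6, 1]], size=8351)
shares = quadrant_shares(scores[:, 0], scores[:, 1])
```

With uncorrelated scores the high-high cell would hold about 25% of endorsers; a negative correlation between the two scores shrinks it, which is the pattern documented next.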

As the percentages in the upper rows of Table 10 show, the high-high and low-low cells count markedly fewer than 25% of the endorsers: only 11.8% for likes, 10.0% for comments, 9.5% for retweets, and 11.1% for all actions. These numbers convey in managerially relevant terms the importance of the quality-quantity conundrum resulting from the contrasting associations with participation versus potential effectiveness reported in Table 9.

Table 10. Distribution of Endorsers by Responsiveness and Potential Effectiveness

                Likes, %                Comments, %             Retweets, %             Actions, %
                Effective  Ineffective  Effective  Ineffective  Effective  Ineffective  Effective  Ineffective
Median cutoff
  Responsive    11.8       38.2         10.0       40.0         9.5        40.5         11.1       38.9
  Unresponsive  38.2       11.8         40.0       10.0         40.5       9.5          38.9       11.1
Mean cutoff
  Responsive    2.3        29.0         0.4        30.9         0.4        30.9         1.1        30.2
  Unresponsive  34.9       33.7         31.7       37.1         28.9       39.8         32.1       36.6


Note. The four cells do not always sum up to 100% because of rounding errors.

The numbers in the lower rows of Table 10 using mean splits are more dramatic, showing no more than 3% of the endorsers in the high-high cell. When responsiveness and effectiveness are uncorrelated, the percentages of endorsers in the high-high cell should be 11.6% for likes, 10.0% for comments, 9.2% for retweets, and 10.4% for actions in general. The values in the high-high cells using mean splits are all notably lower than those benchmarks, again because of the negative association between the propensity to participate and the potential effectiveness.7

The conundrum is more acute for higher-effort engagement (i.e., comments and retweets). The results of the selection model estimated for each outcome separately (Table OA2 in Online Appendix 1) explain why. For likes, there are two predictors with a significant association in the opposite direction as for participation, whereas there are three for comments and four for retweets. It is harder to find endorsers who score high on both participation and outcomes when the latter is measured as comments or retweets rather than likes.

5. Implications for Targeting

Taking into account adverse selection on observable characteristics will help marketers make better decisions on which endorsers to target and recruit. To quantify these benefits, we compare the performance of six model-based approaches to rank prospective endorsers: three single-equation models that ignore the differences between participation and effectiveness and three models that do take into account the differences.

The first approach uses a probit model with random effects to predict the probability of participating in a campaign and then divides the predicted participation probability by the offered reward size (based on the platform’s policy shown in Figure 1) to obtain the recruiting efficiency. Prospects are then ranked by their recruiting efficiency and targeted going down starting from the top until the budget constraint is met.

In contrast, the other two single-equation approaches consider only effectiveness among participants, using either a Poisson model with random effects or a zero-inflated negative binomial model. The predicted effectiveness given participation is then divided by the offered reward size, after which prospective endorsers are ranked on their expected efficiency and targeted from the top down until the budget constraint is met.

The three two-equation approaches consider that observed predictors may have different associations with participation versus effectiveness. The first two combine the single-equation models into a “probit RE + Poisson RE” specification and a “probit RE + zero-inflated NB” specification, both of which are estimated without taking into account correlated unobservables. The last model is the sample selection model used in the main analysis (Table 9). For each model, the predicted effectiveness is divided by the offered reward, after which prospects are ranked and selected until the budget runs out.

We use 10-fold cross-validation to evaluate the performance of the six ranking and targeting approaches. Specifically, we randomly divide the endorsers into 10 folds, use the data in 9 folds to train models for targeting, and then evaluate their performance on the remaining fold. This process is repeated 10 times in round-robin fashion so that every fold serves once as the holdout test set. For a fair comparison, we use the same budget to select endorsers for each targeting model. Based on the model obtained on the training data, we predict the targeting metric of the test endorsers, rank them, and select the top endorsers until the budget is exhausted, assuming that each selected endorser will actually participate. The latter assumption is conservative but ensures that the marketer can meet their financial commitments. We allow the weight of the last selected endorser to be fractional to ensure that the budget is fully used. After selecting the endorsers, we sum up their actual engagements observed in the data. Note that the actual engagements are zero if a selected endorser did not participate in a task.
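For concreteness, the ranking-and-selection step within each holdout fold can be sketched as follows (hypothetical Python; the predicted metrics and rewards are simulated placeholders, not our data):

```python
import numpy as np

def select_endorsers(predicted_metric, reward, budget):
    """Rank prospects by predicted metric per RMB and select from the top,
    giving the last selected endorser a fractional weight so the budget is
    exactly exhausted."""
    efficiency = predicted_metric / reward
    order = np.argsort(-efficiency)  # best first
    weights = np.zeros(len(reward))
    remaining = budget
    for i in order:
        if reward[i] <= remaining:
            weights[i] = 1.0
            remaining -= reward[i]
        else:
            weights[i] = remaining / reward[i]  # fractional last endorser
            break
    return weights

rng = np.random.default_rng(3)
reward = rng.uniform(0.5, 5.0, size=200)   # offered reward per prospect (RMB)
predicted = rng.gamma(2.0, 1.0, size=200)  # predicted engagements per prospect
weights = select_endorsers(predicted, reward, budget=50.0)
spent = float(weights @ reward)            # equals the budget when it binds
```

In the evaluation, each selected endorser's actual engagements observed in the holdout data would then be summed using these weights.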

Table 11 summarizes the average actual engagements achieved under each of the six approaches. It does so for four different budget sizes, assuming the low rate of 0.0002 RMB per follower. Using the high rate of 0.0004 RMB per follower does not affect the main conclusion, which is that two-equation models reflecting that the same observable predictor may affect participation and effectiveness in different directions lead to better campaign outcomes. Using the best two-equation model rather than the best single-equation model leads to an improvement of 13%–55%, with the gap being larger for smaller budgets. The bottom row of Table 11 reports that more than 87% of the recruited endorsers have fewer than 1,250 followers, which is more than their population proportion of 77%. In other words, the targeting favors endorsers with fewer followers.

Table 11. Actual Engagements of Selected Eligible Endorsers by Different Targeting Models

                                      Budget (RMB)
Type          Targeting model                                   25     50     100    200
One equation  Probit RE                                         4.49   8.53   10.60  15.50
              Poisson RE                                        4.80   8.53   12.60  17.53
              Zero-inflated NB                                  4.70   8.50   11.77  18.42
Two equation  Probit RE + Poisson RE                            6.76   8.83   17.03  20.75
              Probit RE + zero-inflated NB                      6.16   9.21   15.48  20.83
              Sample selection model                            7.42   9.81   16.96  19.20
Statistics of sample selection model
              % Eligible users recruited in holdout sample      7      12     22     37
              % Recruited eligible users with <1,250
                verified followers                              96     94     91     87


Notes. The values of the best one-equation model and the best two-equation model in each column are in bold. The pay rate offered is 0.0002 RMB per verified follower. RE, random effects; NB, negative binomial.

6. The Causal Effect of Pay Rate

Because we exogenously manipulated the pay rate, the shifts in participation and outcomes induced by changing pay rates reported in the model-free analyses in Tables 5 and 6 should, one would presume, amount to average treatment effects of being offered a specific pay rate on endorsement behavior.

In this section, we first specify more precisely under what conditions the causal interpretation of these shifts is valid. Then, we investigate to what extent the effect of pay rate on participation is moderated by the three categories of variables used in the selection model. We do not present moderator analyses for outcomes because Table 6 clearly shows that the pay rate shifts outcomes only through participation.

6.1. Causal Identification

Endorsers saw four tasks being offered at the same time on the platform. For one task, they qualified at a nonzero pay rate, whereas three quarters of randomly selected endorsers did not. The other three tasks offered them a zero pay rate while offering a quarter of randomly selected others a nonzero pay rate. Consequently, the treatments consist of “being offered a specific pay rate (zero; low; high) while knowing that others are being offered the same pay rate and yet others are being offered a different but well-specified pay rate for the same task.” Note that there is an invariant mapping between the pay rate offered to the focal person and the pay rates offered to others. Similarly, the fraction of people being offered each pay rate is constant. Hence, there is only a single version of each treatment, as required by the stable unit treatment value assumption (SUTVA). To the extent that endorsers make sideways comparisons and are envious when others get more than them or feel good when others get less than them (e.g., Fehr and Schmidt 1999, Ho et al. 2006), the pay rate effects we observe will be larger than effects without such sideways comparisons.

Participation involves only the behavior of the endorser who is being given the offer, and hence, it is not subject to the standard concern about interference in network settings violating the SUTVA. In contrast, the engagement generated by an endorser involves the behavior of their followers. To the extent that endorsers have overlapping networks of followers, the differences in outcomes observed across treatment levels may suffer from interference. However, the differences in effectiveness across conditions are overwhelmingly driven by differences in participation (Tables 6 and 8) rather than in effectiveness given participation. Because participation itself is free of network interference, the risk of interference invalidating an interpretation of the observed shifts in outcome levels across pay rates as average treatment effects is very likely ignorable.

In short, pay rate effects on participation can be interpreted as traditional average treatment effects. The shifts in engagement can be as well if one deems the threat of network interference ignorable. More conservatively, given the large number of observations, the effects of pay rate on engagement can be interpreted as how much, on average, a particular endorser’s engagement outcome is affected when only that endorser’s pay rate is changed (Sävje et al. 2021).

6.2. Causal Effects on Participation

The causal interpretation of the results in Table 5 is clear cut. Providing the minimum allowed nonzero pay rate boosts participation compared with offering no pay, but increasing the pay rate further from low to high (i.e., from the minimum to the 96th percentile) does not boost participation. This is true both for the entire set of observations and for the two subsets of endorsers identified by whether they have more than 1,250 verified followers and therefore receive a different payment amount under the high and low pay rate conditions. In short, whether there is an incentive matters, but the size of the pay rate does not. This echoes previous findings about how incentives affect participation in surveys (Singer and Ye 2013), but observing the same pattern on a platform that people join to engage in viral-for-hire activities is notable. Also, consistent with financial gain being a motivating factor, the causal effect of being offered a nonzero pay rate is larger (6.9% versus 2.8%) among those in the top 23% of the network size distribution who, at the same pay rate, receive larger payments than people in the bottom 77%.

One small disadvantage of the model-free analysis of cell means in Table 5 is that it does not account for the fact that we have multiple observations over the same 8,351 endorsers and over the same 16 pairs of tasks involving the same product and posted in the same week. Because treatment levels were assigned exogenously, such interdependencies among observations should not bias the effect size estimates but are likely to inflate significance levels.

We, therefore, also analyze the differences in participation using a linear probability model estimated using ordinary least squares (OLS) in which we control for endorser fixed effects and task pair fixed effects and allow for the errors to be heteroscedastic and arbitrarily correlated within endorsers, thereby accounting for possible dependencies of an endorser’s decisions across tasks and weeks. We also estimate a binary logit model with fixed effects for endorser and task pairs. That model is estimated using conditional maximum likelihood (e.g., Chamberlain 1980) such that all observations from endorsers who did not participate in any task are excluded, resulting in a data set of 26,622 observations. Finally, we also estimate a Cox proportional hazard model with stratification on the endorser (i.e., an endorser-level fixed effect on the baseline hazard) and with task pair fixed effects. This model analyzes how long it takes a potential endorser to participate. Because we consider the duration data to be censored at the time we pull the task from the platform or the budget runs out (which happened in 2 of the 31 tasks, both in week 1), this hazard analysis accounts for the fact that some endorsers might not have been able to participate because they were too slow in doing so.

Each model contains only two dummy regressors. The first is Paid, and it equals one for both the low and high pay rates and zero for the zero pay rate. The second is High and equals one for the high pay rate only. Hence, in a model with both dummies, the coefficient of Paid captures the effect of the low rate and that of High captures to what extent the effect of the high rate differs from that of the low rate.
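This dummy coding can be sketched as follows. The coefficient values plugged in are the OLS estimates for all endorsers reported in Table 12 (Paid = 0.037, High = 0.001); the helper names `code_condition` and `predicted_lift` are illustrative, not from the paper.

```python
# Sketch of the Paid/High dummy coding. Paid = 1 for any nonzero pay rate;
# High = 1 only for the high rate. In a regression with both dummies, the
# Paid coefficient is the low-rate effect and the High coefficient is the
# incremental effect of the high rate over the low rate.

def code_condition(pay_rate):
    """Map a pay-rate condition ('zero', 'low', 'high') to (Paid, High)."""
    paid = 1 if pay_rate in ("low", "high") else 0
    high = 1 if pay_rate == "high" else 0
    return paid, high

def predicted_lift(pay_rate, b_paid, b_high):
    """Participation lift over the zero-pay baseline implied by the dummies."""
    paid, high = code_condition(pay_rate)
    return b_paid * paid + b_high * high

# With the Table 12 OLS estimates for all endorsers, the implied lifts are
# 0 (zero rate), 0.037 (low rate), and 0.038 (high rate).
lifts = {r: predicted_lift(r, 0.037, 0.001) for r in ("zero", "low", "high")}
```

The tiny 0.001 increment for the high rate mirrors the paper's finding that the high rate buys essentially no additional participation.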

Table 12 reports the estimates of the three models for three sets of endorsers: all, bottom 77%, and top 23%. These analyses confirm the conclusions from the model-free analyses, including the significance levels after controlling for lack of independence across observations.

Table 12. The Effect of Pay Rate on Participation: Model Estimates

| | All: OLS | All: CLogit | All: Cox | Bottom 77%: OLS | Bottom 77%: CLogit | Bottom 77%: Cox | Top 23%: OLS | Top 23%: CLogit | Top 23%: Cox |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Paid | 0.037*** (0.002) | 5.183*** (0.125) | 4.699*** (0.114) | 0.027*** (0.001) | 4.967*** (0.157) | 4.485*** (0.142) | 0.068*** (0.004) | 5.547*** (0.212) | 5.018*** (0.194) |
| High | 0.001 (0.001) | 0.029 (0.059) | 0.040 (0.047) | 0.001 (0.001) | 0.031 (0.077) | 0.049 (0.063) | −0.0001 (0.003) | 0.007 (0.097) | 0.011 (0.072) |
| Endorser FE | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Pair FE | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Observations | 230,259 | 26,622 | 26,622 | 175,505 | 16,504 | 16,504 | 54,754 | 9,752 | 9,752 |
| Within R2 | 0.030 | | | 0.022 | | | 0.057 | | |
| R2 | 0.149 | | | 0.134 | | | 0.185 | | |
| Log likelihood | | −2,599 | −3,867 | | −1,545 | −2,160 | | −977 | −1,602 |


Notes. In the OLS model, the empirical standard errors reported in parentheses are clustered on endorsers. In the conditional logit (CLogit) and Cox hazard models with endorser-level fixed effects (FEs), endorsers who do not participate in any tasks in our experiment do not contribute to the likelihood function.

 ***p < 0.001.

Next, we investigate systematic variation in the treatment effect. Instead of regressing participation only on the pay rate dummies, we also include product terms of pay rate dummies and moderators of interest. Because moderator effects or interactions are easier to interpret in linear models, we conduct these analyses only with the linear probability model estimated using OLS. We mean center the moderators before creating the product terms such that the linear effects of pay rate are identical to the average effects in the OLS models without interactions in Table 12.
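The construction of these mean-centered interaction terms can be sketched as follows, with hypothetical data; `center` and `interaction` are illustrative helper names.

```python
# Sketch of moderator construction: mean-center each moderator before
# forming product terms with the pay-rate dummies, so that the linear
# Paid/High coefficients retain their interpretation as average effects.
# All values below are hypothetical.

def center(values):
    """Subtract the sample mean from each value."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def interaction(dummy, moderator_centered):
    """Elementwise product of a treatment dummy and a centered moderator."""
    return [d * x for d, x in zip(dummy, moderator_centered)]

paid = [0, 1, 1, 0]                    # Paid dummy per observation
log_followers = [4.0, 6.0, 8.0, 6.0]  # hypothetical moderator

x_centered = center(log_followers)     # mean is 6.0 -> [-2, 0, 2, 0]
paid_x = interaction(paid, x_centered)
```

Because the moderator has mean zero, the Paid coefficient in the interacted model equals the effect at the average moderator value, matching the average effect in Table 12.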

We first investigate whether the null effect of high versus low stems from not accounting for the fact that the financial rewards offered are a function of both the pay rate and the number of verified followers. We do so by interacting the two pay rate dummies with log(verifiedFollowers). The results in Table 13 are clear cut. The interactions with Paid indicate that the effect of being offered a nonzero pay rate is greater among endorsers with a larger number of verified followers, to be expected if endorsers are motivated by financial gain. However, the main and interaction effects of High indicate that the effect of being offered the high versus low rate is essentially nil regardless of the number of verified followers.

Table 13. How Pay Rate Effects on Participation Vary by the Number of Verified Followers

| | All endorsers | Bottom 77% endorsers | Top 23% endorsers |
| --- | --- | --- | --- |
| Paid | 0.037*** (0.002) | 0.027*** (0.001) | 0.068*** (0.004) |
| Paid × log(verifiedFollowers) | 0.012*** (0.001) | 0.009*** (0.001) | 0.014** (0.005) |
| High | 0.001 (0.001) | 0.001 (0.001) | −0.00004 (0.003) |
| High × log(verifiedFollowers) | −0.0004 (0.001) | −0.001 (0.001) | 0.001 (0.003) |
| Endorser FE | Yes | Yes | Yes |
| Pair FE | Yes | Yes | Yes |
| Observations | 230,259 | 175,505 | 54,754 |
| Within R2 | 0.040 | 0.026 | 0.059 |
| R2 | 0.158 | 0.137 | 0.187 |


Notes. Robust standard errors clustered by endorsers are in parentheses. FE, fixed effect.

 **p < 0.01; ***p < 0.001.

Next, we investigate to what extent the effect of pay rate is moderated by the endorser characteristics. We interact the two pay rate dummies (Paid and High) with the three sets of endorser characteristics (fan base, embeddedness in the endorser community, and level of participation in prior campaigns). To facilitate comparison, the main and interaction effects on Paid and High are reported in side-by-side columns. A quick eyeballing of Table 14 indicates that many of these characteristics indeed act as moderators, but only for being paid a nonzero rate rather than for being paid the high versus low rate, and that the moderator effects are larger for the top 23% of endorsers, who receive larger payments than the bottom 77%.

Table 14. How Pay Rate Effects on Participation Vary by Endorser Characteristics

| | All: Paid | All: High | Bottom 77%: Paid | Bottom 77%: High | Top 23%: Paid | Top 23%: High |
| --- | --- | --- | --- | --- | --- | --- |
| Main effect | 0.036*** (0.001) | 0.001 (0.001) | 0.027*** (0.001) | 0.001 (0.001) | 0.068*** (0.004) | −0.0002 (0.003) |
| Interaction with log(followers) | 0.005*** (0.001) | 0.0003 (0.001) | 0.004*** (0.001) | 0.001 (0.001) | 0.016*** (0.004) | 0.002 (0.003) |
| Interaction with verifiedRatio | 0.029*** (0.007) | −0.009 (0.005) | 0.017* (0.007) | −0.011 (0.006) | 0.105*** (0.023) | 0.004 (0.018) |
| Interaction with log(tweets) | 0.004*** (0.001) | 0.0003 (0.001) | 0.003** (0.001) | 0.0001 (0.001) | 0.006** (0.002) | 0.0002 (0.002) |
| Interaction with log(tasks) | 0.027*** (0.001) | −0.001 (0.001) | 0.022*** (0.001) | −0.001 (0.001) | 0.042*** (0.003) | 0.001 (0.002) |
| Interaction with approvalRate | 0.015*** (0.003) | −0.000 (0.003) | 0.012*** (0.002) | 0.002 (0.003) | 0.028** (0.009) | −0.010 (0.008) |
| Interaction with tenure | −0.139*** (0.014) | −0.018 (0.011) | −0.116*** (0.015) | −0.013 (0.013) | −0.195*** (0.030) | −0.030 (0.023) |
| Interaction with log(referralReward) | −0.003* (0.001) | 0.0004 (0.001) | −0.002 (0.001) | 0.0004 (0.001) | −0.004 (0.002) | 0.001 (0.002) |
| Interaction with log(friends) | −0.005 (0.003) | 0.004* (0.002) | −0.005 (0.003) | 0.003 (0.002) | −0.003 (0.009) | 0.006 (0.006) |
| Endorser FE | Yes | | Yes | | Yes | |
| Pair FE | Yes | | Yes | | Yes | |
| Observations | 230,259 | | 175,505 | | 54,754 | |
| Within R2 | 0.082 | | 0.059 | | 0.129 | |
| R2 | 0.195 | | 0.167 | | 0.247 | |


Notes. Robust standard errors clustered by endorsers are in parentheses. The linear effects of endorser characteristics are not included in the models because they are absorbed into the fixed effects (FEs).

 *p < 0.05; **p < 0.01; ***p < 0.001.

The pattern across the various moderator effects is consistent with financial motivation. People with larger networks and with a greater proportion of verified followers (two factors that, combined with the pay rate, determine actual payment amounts) exhibit a greater effect of being offered a nonzero pay rate. So do people who have tweeted more, have participated in more tasks in the past, and have joined the platform more recently. Hence, apart from the insensitivity to low versus high rates, the patterns of interactions in Tables 13 and 14 are consistent with endorsers being motivated by financial gains.

Our various analyses imply that even though endorsers were financially motivated, offering the high pay rate was a waste of money. Campaign managers posting tasks on the platform seem to agree with that conclusion, as the overwhelming majority (87%) of all campaigns using a linear pricing scheme offered the low rate.

7. Validity Concerns and Checks

There are several features of the data that may potentially undermine the validity of our findings and conclusions. We discuss them in detail.

7.1. Platform Policies

One important concern when conducting experiments on digital platforms is whether, and to what extent, platform policies and algorithms affect the assignment to treatment. One specific concern is whether Weituitui’s policies affected exposure to tasks at particular pay rates. The short answer is no. The platform does not send notifications of new tasks, let alone of tasks at specific pay rates or specific eligibility requirements determining whether one qualifies for a nonzero pay rate. All tasks are listed in the reverse chronological order they are posted, and members of a pair of tasks (with high versus low pay rate) appear right next to each other because we posted them within seconds. The platform does allow endorsers to sort tasks by overall budget and remaining budget, but the overall budget was held constant within each task pair in our experiment. We address insufficient remaining budget as a separate point.

Another concern is that Weituitui’s verification approach likely takes into consideration the activity and response level of followers in the past. However, it is not clear how this would bias the coefficients of verifiedRatio toward a pattern implying adverse selection during the experiment. If anything, it would make the estimates in the effectiveness equation more positive and closer to those in the selection equation. Because the model includes verifiedRatio, we do not see why the other estimates would be affected notably toward a pattern consistent with adverse selection.

Weibo’s algorithm may give greater prominence to tweets of more active microbloggers and possibly more active endorsers. This would boost the effectiveness of frequent bloggers and participants such that our negative estimates of adverse selection through log(tweets) and log(tasks) in the outcome equation in Table 9 partly reflect these policies. This would underestimate the extent of adverse selection associated with these two endorser characteristics. Conversely, if Weibo displays only a portion of all tweets by very active microbloggers and possibly very active endorsers, then our estimates would overestimate the extent of adverse selection associated with these two endorser characteristics as opposed to platform policies. However, the overall impact on our research conclusions is likely very localized. First, the number of prior tasks is unlikely to affect Weibo’s algorithmic decisions. Of those participating in any of our tasks, 89% posted more than 100 tweets and 44% posted more than 1,000 during our experiment. Because the average number of retweeted tasks among those participants was only two, it is fair to assume that the number of prior tasks is swamped by the number of prior tweets. Hence, Weibo’s algorithm is unlikely to have produced large overestimates or underestimates in the difference in the coefficient of log(tasks) in selection versus effectiveness. Second, the selection model controls for both the number of prior tweets and prior tasks. Hence, there is no reason to expect estimates of the other coefficients or the cross-equation correlation in unobservables to be affected notably. Finally, regardless of the net effect of Weibo’s algorithm on our estimates of adverse selection, the ensuing quality-quantity conundrum for marketers would still be correctly quantified given the algorithm in place.

7.2. Research Participation Effect

Our use of the last two digits of the endorser's identification number as an eligibility requirement rather than more common bases, like their number of followers or a specific level of (in)activity over a specific time window, likely struck the platform members as uncommon. This could have made them suspect they were part of a study and may have affected their decision to participate in the campaigns we posted. This scenario is plausible a priori but is inconsistent with the fact that the high pay rate was just as effective as the low pay rate in boosting participation. If people knew they were part of an experiment and acted strategically with an eye on maximizing their long-term remuneration, then they would have exhibited a markedly greater response to the high pay rate compared with the low pay rate. Instead, they were equally responsive. In short, our results are inconsistent with a research participation effect affecting our pay rate effect estimates and, by implication, the decision to participate in our campaigns. If endorsers were driven by a sense of altruism toward the experimenters, then this might have boosted participation levels across the board and perhaps more so among people who knew they were especially effective at generating engagement. Neither of these possibilities can account for the presence of adverse selection. Furthermore, the first possibility is inconsistent with the participation rate being merely 0.05% at pay rate zero, and assuming the second possibility is true implies that adverse selection is even more pronounced than what we measured.

7.3. Budget Limit

Every task posted on Weituitui has a preset budget. The budget keeps decreasing as more and more endorsers participate, even if their responses are not yet approved. Once the budget is exhausted or is insufficient to pay the reward for an endorser signing up to participate, that endorser is automatically prevented from participating in the task. Therefore, as the budget depletes, there might be some endorsers willing to participate who are excluded from doing so.

We find that, in weeks 2–8, the 200- to 300-RMB budgets were always large enough to accommodate at least 98.5% and on average, 99.6% of all endorsers who might have been willing to participate at any time. However, the 100-RMB campaign budgets we had set in week 1 proved to be too low to accommodate more than 95% of all possibly interested endorsers for three of the four tasks posted in that week. This inability of interested potential endorsers to participate could produce a bias toward zero in the estimates of treatment effects on participation or the estimates of associations of endorser traits with participation.

We addressed this concern in several ways. First, as reported in Table 12, a hazard model that explicitly accounts for the inability to participate because of budget restrictions leads to the same conclusions about the causal effect of pay rates on participation as a linear probability model or a conditional logit model that assumes that everyone who wanted to participate could do so. Second, re-estimating the joint selection model of participation and potential outcomes after removing data from the first week leads to the same substantive insights (compare Table 9 with Table OA4 in Online Appendix 1).

Finally, very few endorsers were likely affected by budget exhaustion in week 1. Because tasks posted in week 2 may have exhibited exceptionally high participation (they were posted immediately after Chinese New Year) and because participation levels in weeks 3–8 are remarkably stable at around 60–80 per task, we use the latter as the baseline. The number of actual participants in weeks 3–8 suggests that the four tasks would, in expectation, have attracted about 75 × 4 = 300 participants in week 1. Instead, they attracted only 134 participants (Table OA1 in Online Appendix 1). This suggests that about 170 willing endorsers were unable to participate. In other words, budget exhaustion in week 1 is likely to have affected only about 170 of the more than 8,000 endorsers on the platform.

7.4. Attribution

There are 16 occasions where one and the same endorser participated in two tasks in the same week for the same product but at two different pay rates (one zero and one nonzero). This creates a potential attribution problem between the two sponsored retweets on the same product by the same endorser. However, only 6 of the 96 engagement outcomes observed for those participations (16 endorsers, two tasks, three types of outcomes) have a nonzero value. Of those six nonzero outcomes, the highest recorded value is three, and the median is one. Hence, the attribution problem is extremely minor and too small to bias our estimates notably.

7.5. Validity of Exclusion Restriction

The policy of the platform to reward participation rather than outcomes justifies the exclusion of the pay rate indicators and financial rewards from the outcome equations. This exclusion restriction is further supported by the model-free analysis in Table 6 documenting that pay rate has no direct effect on effectiveness given participation. Nevertheless, it is conceivable that offering a higher pay rate motivated the participating endorsers to expend higher effort and hence affected their effectiveness indirectly. In sponsored tweet campaigns like ours, the only avenue for endorsers to expend differential effort is the text included in their retweet. Two metrics that reflect such effort are the number of words and the use of emojis. The former metric is commonly used to measure the effort level of survey respondents (Singer and Ye 2013). Analyses reported in Table OA5 in Online Appendix 1 show no evidence that pay rate affected word count or the use of emojis. Hence, the data are inconsistent with the concern that the pay rate might have boosted effectiveness through effort expended and that the exclusion restriction might therefore not be warranted. Furthermore, evidence of adverse selection persists even after including the pay rate indicators and financial rewards jointly or separately in the effectiveness equation (Table OA6 in Online Appendix 1). We do not use this as our main specification because, without an exclusion restriction, the identification of the selection model relies solely on the parametric assumption about the random shocks. Moreover, the log likelihood of the model barely changes when pay rate indicators and financial rewards are both included in the outcome equation, indicating that their inclusion amounts to overparameterization.
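The two effort proxies can be sketched as simple text metrics. This is an illustrative implementation, not the paper's exact coding scheme; in particular, the emoji check below is a simplified Unicode-range heuristic.

```python
# Hedged sketch of the effort proxies: word count and emoji use in a
# retweet's text. The Unicode ranges are a rough heuristic covering the
# miscellaneous-symbol and emoticon blocks, not a complete emoji detector.

def word_count(text):
    """Number of whitespace-separated tokens in the retweet text."""
    return len(text.split())

def uses_emoji(text):
    """Rough check for characters in common emoji/symbol code-point ranges."""
    return any(0x1F300 <= ord(ch) <= 0x1FAFF or 0x2600 <= ord(ch) <= 0x27BF
               for ch in text)
```

Comparing the distribution of these metrics across pay-rate conditions is the kind of check reported in Table OA5.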

7.6. Adverse Selection vs. Moral Hazard

Our study documents a negative association between the propensity to participate as an endorser and the tendency to be effective when endorsing. We interpret this pattern in the data as stemming from adverse selection (i.e., a precontractual difference in types between those likely to participate and those likely to be effective). However, in principal-agent settings like the one we study, moral hazard often forms an alternative explanation for such negative associations between the agent’s propensity to enter a contract and the outcomes for the principal.

Whereas adverse selection is predicated on differences in "precontractual types" (i.e., who the agent is), moral hazard is predicated on "postcontractual actions" (i.e., what the agent does). The notion of some postcontractual action by one party that detrimentally affects the outcomes for the other party is essential to the concept of moral hazard (Williamson 1985, Tirole 1988, Kreps 1990, Bolton and Dewatripont 2005, Salanié 2005). In economics, the basic assumption is that (1) the party taking the postcontractual action does so purposively (i.e., does so only if it benefits them somehow). Salanié (2005, pp. 4 and 119) notes three additional necessary conditions for moral hazard: (2) the agent takes a decision (action) after the contract is signed that affects both their utility and that of the principal, (3) the action that the agent would choose spontaneously is not Pareto optimal, and (4) the principal designing the contract only observes the outcome and not the action taken. The Williamson (1985, p. 47) short definition of moral hazard as "ex post opportunism" emphasizes that for moral hazard to exist, the action must occur after the contract is agreed upon and be motivated by self-interest seeking.

The second and third necessary conditions for moral hazard to operate are not satisfied in our research setting. Specifically, the potential endorsers have no reason to engage in actions after the contract is signed that would decrease the effectiveness of their retweet. For instance, agents are not contractually expected by the principal to engage in any effortful action beyond the mere act of retweeting, and agents who care only about the financial incentive have no opportunity to shirk on effort or diligence at the detriment of the principal. Of course, as we noted earlier, it is possible for agents to also have nonfinancial motives for participating in sponsored tweet campaigns, like being useful or boosting their status. However, these motivations for sharing a sponsored tweet among one’s followers offer no reason to engage in actions that would also reduce the engagement that the sponsoring principal cares about.

Not only the research setting but also the actual data are inconsistent with moral hazard being the driver for the negative association between participation and effectiveness we document. Whereas distinguishing between adverse selection and moral hazard is often challenging in cross-sectional observational data, the same is not true when the data include multiple contracts per person (e.g., Abbring et al. 2003) or feature exogenous variation in contract offerings (e.g., Karlan and Zinman 2009, Powell and Goldman 2021). Our data have both characteristics and allow us to perform three sets of informative tests (seven tests in total).

The first three tests involve the lack of a direct correlation between financial incentives and effectiveness, net of participation. As already reported in Table 6, the pay rate is unrelated to effectiveness conditional on participation. Nor is it related to the potential effectiveness separate from participation (Table OA6 in Online Appendix 1). The latter is also true for the actual financial reward offered (Table OA6 in Online Appendix 1). All this supports the notion that in our setting, participating endorsers had no financial incentive to engage in moral hazard.

The next test leverages the fact that we offered multiple contracts per person, with exogenous variation in incentives. The fact that a nonzero pay rate shifts the participation rate even after controlling for individual-level fixed effects (Table 12) is inconsistent with the notion that the heterogeneity involved in the adverse selection consists merely of heterogeneity in the propensity to engage in moral hazard.

The final three tests leverage the fact that, as we noted earlier, the only place for endorsers to exert effort affecting engagement with a retweet is the text included in that retweet. First, the quality-quantity conundrum persists even after controlling for the number of words and the use of emojis in the outcome equation (Table OA7 in Online Appendix 1). Second, these effort variables do not have a statistically significant impact on the outcome. Third, if the negative association between participation and effectiveness stemmed from moral hazard through lack of effortful actions, then endorsers who are more likely to participate (e.g., those who tweet more often, have participated in more tasks, or have a shorter tenure) should exert significantly lower effort. However, endorser characteristics associated with higher participation are not associated with lower effort exerted (Table OA5 in Online Appendix 1).

In short, neither the institutional details of the research setting nor the seven tests provide any indication that the quality-quantity conundrum we document resulted from moral hazard.

7.7. Adverse Selection vs. Repetition Wear out

A final potential alternative explanation for the negative association between participation and effectiveness in our data is repetition wear out (Naik et al. 1998, Braun and Moe 2013) or repetition weariness (Chae et al. 2019), especially because tweets do not allow for much variation or emotionality in copy content (MacInnis et al. 2002, Bass et al. 2007, Bruce 2008). The mechanism here would be that prior retweet frequency causally depresses current retweet effectiveness because followers exposed to multiple retweets for the same sponsor from the same endorser become disinterested, bored, or even irritated by that endorser’s retweets. Hence, prior participation would lower the effectiveness of current participation such that endorsers’ average participation rate correlates negatively with their average effectiveness. In other words, the negative correlation we document would stem not from heterogeneity-based adverse selection but from a negative cross-equation carryover effect.

To assess this alternative explanation, we re-estimate our sample selection model on subsamples excluding repeated participations. First, we exclude a user from all subsequent tasks involving a given product once that user has participated in a task featuring that product. In a more conservative robustness check, we exclude a user from all subsequent tasks once that user has participated in any of our tasks. The negative correlation in both observable and unobservable endorser characteristics continues to hold (Tables OA8 and OA9 in Online Appendix 1). The same is true when using all observations but controlling for the number of prior participations as a covariate in both the participation and effectiveness equations (Tables OA10 and OA11 in Online Appendix 1). In short, none of the four tests provide evidence that the negative association between participation and effectiveness stemmed from repetition wear out.
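The two subsampling rules can be sketched as a simple filter over (user, product, week) observations. The data and the helper name `exclude_after_first` are illustrative; the paper's re-estimation applies the same logic to the actual panel.

```python
# Hedged sketch of the wear-out robustness subsamples: drop an endorser's
# observations for all later tasks once they have participated, either per
# product (first rule) or across all tasks (more conservative rule).
# Observations are hypothetical (user, product, week, participated) tuples.

def exclude_after_first(observations, per_product=True):
    """Keep an observation only if the user has not yet participated
    (in the same product if per_product, in any task otherwise)."""
    seen = set()
    kept = []
    for user, product, week, participated in sorted(observations,
                                                    key=lambda o: o[2]):
        key = (user, product) if per_product else user
        if key not in seen:
            kept.append((user, product, week, participated))
            if participated:
                seen.add(key)
    return kept

# u1 participates in product A in week 1, so their week-2 observations
# are dropped per the rule in force.
obs = [("u1", "A", 1, True), ("u1", "A", 2, True), ("u1", "B", 2, False)]
```

Under the per-product rule the week-2 product-B observation survives; under the any-task rule it is dropped as well.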

8. Conclusions

Sponsored tweeting has gained popularity in recent years, especially among small businesses. A field experiment provides new insights into how financial incentives and endorser characteristics affect both participation and effectiveness. There are two main insights.

First, endorsers exhibit adverse selection in both observables and unobservables, which creates a quality-quantity conundrum for marketers when deciding who to target as endorsers. Several observed and unobserved endorser characteristics associated with a higher propensity to participate have a negative association with being an effective endorser given participation. This makes it a challenge for managers to recruit a sizable number of high-quality endorsers. Only 9.5%–11.8% of the endorsers were above the median in both the propensity to participate and the propensity to be effective compared with a benchmark of 25% in the absence of any association. A simulation of various targeting approaches shows that targeting candidate endorsers by scoring and ranking them using models taking adverse selection on observables into account improves campaign outcomes by 13%–55% compared to methods ignoring adverse selection.

Second, increasing the pay rate beyond the minimum required by the platform had no effect on participation. Two possible reasons are an already very generous minimum combined with a decreasing marginal utility of reward size and conversely, a high pay rate that is still too low to generate a sufficiently large difference in financial rewards (Camerer and Hogarth 1999). However, the effects of offering a nonzero pay rate were higher among endorsers who had more to gain and those who had participated in many campaigns in the past. This rejects the notion that endorsers simply did not care about financial rewards.

A few additional findings are noteworthy. Unlike Wolters et al. (2020) in the context of customer referral programs, we (i) do find evidence of adverse selection in both observables and unobservables, (ii) do so in a setting where paid endorsers do not react differently to high versus low incentives, and (iii) find that the adverse selection cannot be explained by changes in endorsement behavior induced by varying the level of incentives. Unlike what Godes and Mayzlin (2009) reported in the context of a restaurant chain, the pattern of adverse selection we document is fairly general rather than hinging on the particular combination of heavy users being more likely to engage in WOM but their network contacts being less responsive because they already know and patronize the product or service. Unlike Lanz et al. (2019) in a context of unpaid endorsement requests, we do not find that people with more followers are less likely to endorse, even after we control for the fact that rewards increase with the number of followers. Unlike Iyengar et al. (2011) and Katona et al. (2011) in the context of contagion, we do not find evidence that people who are more central in the network are either more or less influential on a per-follower basis.

8.1. Implications for Practice

Given the presence of adverse selection and the resulting quality-quantity conundrum, marketers should not assess and target endorsers solely based on the actual engagement they generated in the campaigns they participated in previously. As the analysis shows, endorsers observed to generate high engagement are not necessarily the most effective ones that could have been recruited. The reason is that the potentially most effective endorsers tend to participate in fewer campaigns. This type of "latent gold" endorser may easily be overlooked by marketers and analysts who do not distinguish between participation and effectiveness or between actual and potential effectiveness. As we illustrated, the output of a selection model can be used to categorize available endorsers into a simple two-by-two matrix to inform the decision of who to target, and targeting endorsers using scoring and ranking models reflecting adverse selection on observables can lead to better campaign outcomes.
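The scoring-and-ranking step behind such a two-by-two matrix can be sketched as follows. The feature names, coefficient values, and candidate profiles below are hypothetical stand-ins, not the paper's estimates; the point is only the mechanics of scoring candidates under separate participation and effectiveness models and splitting at the medians:

```python
from statistics import median

# Hypothetical coefficients from two fitted models: a participation model
# and an effectiveness-given-participation model. Adverse selection shows
# up as opposite signs on the activity covariate (log_past_tasks).
PART_COEF = {"log_followers": 0.30, "log_past_tasks": 0.80, "intercept": -1.0}
EFF_COEF  = {"log_followers": 0.10, "log_past_tasks": -0.50, "intercept": 0.5}

def score(coef, x):
    """Linear index of a candidate endorser under one model."""
    return coef["intercept"] + sum(coef[k] * v for k, v in x.items())

def classify(candidates):
    """Assign each candidate to a cell of the participation-by-
    effectiveness two-by-two matrix, splitting at the score medians."""
    p = [score(PART_COEF, x) for x in candidates]
    e = [score(EFF_COEF, x) for x in candidates]
    mp, me = median(p), median(e)
    return [("high" if pi > mp else "low", "high" if ei > me else "low")
            for pi, ei in zip(p, e)]

candidates = [
    {"log_followers": 7.0, "log_past_tasks": 4.0},  # very active endorser
    {"log_followers": 8.0, "log_past_tasks": 0.5},  # rarely participates
    {"log_followers": 6.0, "log_past_tasks": 3.0},
    {"log_followers": 7.5, "log_past_tasks": 1.0},
]
for x, cell in zip(candidates, classify(candidates)):
    print(x, "-> participation:", cell[0], "| effectiveness:", cell[1])
```

With these illustrative numbers, the most active candidate lands in the high-participation/low-effectiveness cell, while the rarely participating one is the "latent gold" case (low participation, high effectiveness) that a ranking based on observed past engagement alone would miss.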

Of course, better targeting would not be the solution to the quality-quantity conundrum if it stemmed from moral hazard or repetition wear out rather than adverse selection. If the conundrum stems from moral hazard, then using performance-based rewards, possibly with convex payouts, should be considered. If the conundrum stems from wear out, then enforcing a minimum delay before the same endorser can participate in two similar campaigns should be considered.

Among the endorsers in our experiment, 93% had fewer than 5,000 verified followers, and 97% had fewer than 10,000. The presence of adverse selection among such nanoinfluencers implies that—even though they may be more effective on a per-follower basis than influencers with more followers—marketers relying on nanoinfluencers still face a notable quality-quantity conundrum.

Marketers should actually assess whether, rather than simply assume that, offering higher rewards translates into higher participation. Perhaps the minimum level set by the platform is already so generous that further increases will be ineffective. Campaign managers posting tasks on the platform we studied might have thought so, as 87% of all campaigns using a linear pricing scheme offered the lowest possible rate. Perhaps some sought-after endorsers are not greatly affected by varying pay rates. If so, marketers should seek to understand what else motivates them to participate in sponsored tweet campaigns and how campaigns can be developed to appeal to endorsers on the dimensions they care about. For instance, an experiment on social advertising on WeChat (Huang et al. 2020) found that displaying the status and expertise of the endorsing user in the message affected the campaign's virality, especially so for status products. Does this happen only because recipients care about the endorsers' status or also because better endorsers are more likely to participate if such tokens of recognition are given (Toubia and Stephen 2013)? Just as important, what constitutes an effective reward structure when endorsers care about more than money alone, such as being helpful or having high status (Xiao et al. 2011)? Although our study does not provide answers, the evidence of adverse selection certainly indicates that companies seeking to boost participation among the more effective endorsers would benefit from raising and answering these questions.

Practitioners and consultants may seek to identify the optimal pay rate, which would require using more than the two rates we deployed. Using only two rates is appropriate for a study in which the pay rate is manipulated to strengthen identification of the selection model, but it is obviously a nonstarter when the goal is pay rate optimization based on a set of finely graduated causal effects.

8.2. Implications for Research

Helping companies and platforms address the challenges posed by adverse selection and the possibly quirky effect of financial and nonfinancial incentives is a promising avenue for future research. Our findings also raise the question of the extent to which adverse selection can account for the facts that very few tweets are re-retweeted and that the vast majority of retweets do not spread beyond one additional remove into the network (Goel et al. 2016). For instance, Taylor et al. (2013) found in the context of Facebook Offers that the product-level characteristics that predicted sharing versus adoption by followers were only mildly correlated, and they offered this as a likely explanation for why campaigns rarely go viral. Our own study, documenting negative associations in both observables and unobservables, lends even stronger credence to this possible explanation for the low virality of many online campaigns. Given the interest in making viral marketing more effective, there is much promise in further investigating the different predictors and causes of participation versus effectiveness.

Two other questions worth investigating are to what extent adverse selection is more pronounced among influencers with more followers than among the predominantly "nanoinfluencer" endorsers we studied and to what extent this contributes to the higher per-follower effectiveness the latter are believed to have.

As noted, our evidence of adverse selection in observables and of the number of followers being unrelated to both the probability of endorsement (after controlling for financial rewards) and the effectiveness given participation stands at variance with other studies conducted in other contexts (Iyengar et al. 2011, Katona et al. 2011, Lanz et al. 2019, Wolters et al. 2020). This suggests that important contingencies are likely at work. Until those are understood, scholars and practitioners should be wary of making unwarranted generalizations about adverse selection across the many domains of influencer marketing.

Our study documents adverse selection and quantifies the resulting quality-quantity conundrum in a specific setting with mostly unbranded products and with a specific set of observed endorser characteristics. It would be valuable to investigate the phenomenon in other settings where brand involvement and person-brand fit are likely to affect the endorsers’ participation and their followers’ engagement. A selection modeling approach incorporating correlated random effects should be valuable there as well.

Acknowledgments

The authors benefited from feedback from the review team and from session participants at the 2015 Marketing Science Conference, the 2015 Workshop on Information in Networks, and the 2015 INFORMS Annual Meeting.

Endnotes

1 See https://influencermarketinghub.com/influencer-marketing-benchmark-report-2021/ (accessed January 30, 2023).

2 An archive of the website is available at https://web.archive.org/web/20211130081611/https://www.weituitui.com/ (accessed January 30, 2023).

3 The minimum rate had been set to 0.0002 RMB per follower more than two years before we ran the experiment.

4 We add 1 (0.01) before applying the log transformation to integer (continuous) variables.

5 If the regressors in the selection and outcome equations are identical and the outcome equation is linear, the identification of a sample selection model relies solely on the nonlinearity of the inverse Mills ratio. However, the inverse Mills ratio is quasilinear over a wide range of its arguments, especially when the probability of selection is low, which can result in ill conditioning (e.g., Puhani 2000). When the outcome equation of a sample selection model is nonlinear, no analytical result is available for identification, but it can be shown with simulations that identification may suffer from the same problem if the regressors are identical in both equations.
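The quasilinearity of the inverse Mills ratio noted above can be verified numerically. The sketch below (an illustration, not part of the paper's estimation code) computes λ(z) = φ(z)/Φ(z) over a range where the selection probability Φ(z) is low and shows that an ordinary least squares line fits it almost perfectly, which is exactly the ill-conditioning concern:

```python
import math

def norm_pdf(z):
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def norm_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def inverse_mills(z):
    """Inverse Mills ratio lambda(z) = phi(z) / Phi(z)."""
    return norm_pdf(z) / norm_cdf(z)

# Grid over z in [-3, -1], where Phi(z) (the selection probability)
# ranges from about 0.1% to 16%.
zs = [-3.0 + 0.1 * i for i in range(21)]
ys = [inverse_mills(z) for z in zs]

# OLS fit of lambda(z) on z: near-perfect fit means lambda is
# quasilinear, leaving little nonlinearity for identification.
n = len(zs)
zbar, ybar = sum(zs) / n, sum(ys) / n
slope = (sum((z - zbar) * (y - ybar) for z, y in zip(zs, ys))
         / sum((z - zbar) ** 2 for z in zs))
resid = [y - (ybar + slope * (z - zbar)) for z, y in zip(zs, ys)]
r2 = 1 - sum(r * r for r in resid) / sum((y - ybar) ** 2 for y in ys)
print(f"slope = {slope:.3f}, R^2 = {r2:.6f}")  # R^2 extremely close to 1
```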

6 Because platform members self-select to log onto Weituitui and check offers, variations in participation reflect variations in both exposure and in accepting tasks given exposure. Because exposure likely varies with some of the measured covariates like log(tasks), this affects the interpretation of the coefficients of these covariates in the participation equation in Table 9 and their interactions in Table 14.

7 These managerially oriented analyses involve only the observed endorser characteristics and their coefficients. Incorporating point estimates for each endorser’s unobservable random effects, information not available to a typical marketer planning a new campaign, would further depress the values in the high-high cells.

References

  • Abbring JH, Heckman JJ, Chiappori PA, Pinquet J (2003) Adverse selection and moral hazard in insurance: Can dynamic data help to distinguish? J. Eur. Econom. Assoc. 1(2–3):512–521.
  • Barasch A, Berger J (2014) Broadcasting and narrowcasting: How audience size affects what people share. J. Marketing Res. 51(3):286–299.
  • Bass FM, Bruce N, Majumdar S, Murthi BPS (2007) Wearout effects of different advertising themes: A dynamic Bayesian model of the advertising-sales relationship. Marketing Sci. 26(2):179–195.
  • Berger J (2013) Contagious: Why Things Catch On (Simon and Schuster, New York).
  • Berger J, Schwartz EM (2011) What drives immediate and ongoing word of mouth? J. Marketing Res. 48(5):869–880.
  • Berman B (2016) Referral marketing: Harnessing the power of your customers. Bus. Horizons 59(1):19–28.
  • Bock G, Zmud RW, Kim Y, Lee J (2005) Behavioral intention formation in knowledge sharing: Examining the roles of extrinsic motivators, social-psychological forces, and organizational climate. MIS Quart. 29(1):87–111.
  • Bolton P, Dewatripont M (2005) Contract Theory (MIT Press, Cambridge, MA).
  • Braun M, Moe WW (2013) Online display advertising: Modeling the effects of multiple creatives and individual impression histories. Marketing Sci. 32(5):753–767.
  • Bruce NI (2008) Pooling and dynamic forgetting effects in multitheme advertising: Tracking the advertising sales relationship with particle filters. Marketing Sci. 27(4):659–673.
  • Burtch G, Hong Y, Bapna R, Griskevicius V (2018) Stimulating online reviews by combining financial incentives and social norms. Management Sci. 64(5):2065–2082.
  • Camerer CF, Hogarth RM (1999) The effects of financial incentives in experiments: A review and capital-labor-production framework. J. Risk Uncertainty 19(1/3):7–42.
  • Chae I, Bruno HA, Feinberg FM (2019) Wearout or weariness? Measuring potential negative consequences of online ad volume and placement on website visits. J. Marketing Res. 56(1):57–75.
  • Chamberlain G (1980) Analysis of covariance with qualitative data. Rev. Econom. Stud. 47(1):225–238.
  • d’Haultfoeuille X (2010) A new instrumental method for dealing with endogenous selection. J. Econometrics 154(1):1–15.
  • Deci EL, Koestner R, Ryan RM (1999) A meta-analytic review of experiments examining the effects of extrinsic rewards on intrinsic motivation. Psych. Bull. 125(6):627–668.
  • Fang B, Liu X (2018) Do money-based incentives improve user effort and UGC quality? Evidence from a travel blog platform. PACIS 2018 Proc., vol. 132 (Association for Information Systems, Atlanta).
  • Fehr E, Schmidt KM (1999) A theory of fairness, competition, and cooperation. Quart. J. Econom. 114(3):817–868.
  • Godes D, Mayzlin D (2009) Firm-created word-of-mouth communication: Evidence from a field test. Marketing Sci. 28(4):721–739.
  • Goel S, Anderson A, Hofman J, Watts DJ (2016) The structural virality of online diffusion. Management Sci. 62(1):180–196.
  • Greene W (2009) Models for count data with endogenous participation. Empir. Econom. 36(1):133–173.
  • Haenlein M, Libai B (2017) Seeding, referral, and recommendation: Creating profitable word-of-mouth programs. Calif. Management Rev. 59(2):68–91.
  • Heckman JJ (1979) Sample selection bias as a specification error. Econometrica 47(1):153–161.
  • Hinz O, Skiera B, Barrot C, Becker JU (2011) Seeding strategies for viral marketing: An empirical comparison. J. Marketing 75(6):55–71.
  • Ho TH, Lim N, Camerer CF (2006) Modeling the psychology of consumer and firm behavior with behavioral economics. J. Marketing Res. 43(3):307–331.
  • Huang S, Aral S, Hu YJ, Brynjolfsson E (2020) Social advertising effectiveness across products: A large-scale field experiment. Marketing Sci. 39(6):1142–1165.
  • Iyengar R, Van den Bulte C, Valente TW (2011) Opinion leadership and social contagion in new product diffusion. Marketing Sci. 30(2):195–212.
  • Jenkins GD Jr, Mitra A, Gupta N, Shaw JD (1998) Are financial incentives related to performance? A meta-analytic review of empirical research. J. Appl. Psych. 83(5):777–787.
  • Jung J, Bapna R, Golden J, Sun T (2020) Words matter! Toward pro-social call-to-action for online referral: Evidence from two field experiments. Inform. Systems Res. 31(1):16–36.
  • Karlan D, Zinman J (2009) Observing unobservables: Identifying information asymmetries with a consumer credit field experiment. Econometrica 77(6):1993–2008.
  • Katona Z, Zubcsek PP, Sarvary M (2011) Network effects and personal influences: The diffusion of an online social network. J. Marketing Res. 48(3):425–443.
  • Khern-am-nuai W, Kannan K, Ghasemkhani H (2018) Extrinsic vs. intrinsic rewards for contributing reviews in an online platform. Inform. Systems Res. 29(4):871–892.
  • Kornish LJ, Li Q (2010) Optimal referral bonuses with asymmetric information: Firm-offered and interpersonal incentives. Marketing Sci. 29(1):108–121.
  • Kreps DM (1990) A Course in Microeconomic Theory (Princeton University Press, Princeton, NJ).
  • Lanz A, Goldenberg J, Shapira D, Stahl F (2019) Climb or jump: Status-based seeding in user-generated content networks. J. Marketing Res. 56(3):361–378.
  • MacInnis DJ, Rao AG, Weiss AM (2002) Assessing when increased media weight of real-world advertisements helps sales. J. Marketing Res. 39(4):391–407.
  • Naik PA, Mantrala MK, Sawyer AG (1998) Planning media schedules in the presence of dynamic advertising quality. Marketing Sci. 17(3):214–235.
  • Powell D, Goldman D (2021) Disentangling moral hazard and adverse selection in private health insurance. J. Econometrics 222(1):141–160.
  • Puhani PA (2000) The Heckman correction for sample selection and its critique. J. Econom. Surveys 14(1):53–68.
  • Roberts SGB, Dunbar RIM, Pollet TV, Kuppens T (2009) Exploring variation in active network size. Soc. Networks 31(2):138–146.
  • Salanié B (2005) The Economics of Contracts: A Primer, 2nd ed. (MIT Press, Cambridge, MA).
  • Sävje F, Aronow PM, Hudgens MG (2021) Average treatment effects in the presence of unknown interference. Ann. Statist. 49(2):673–701.
  • Singer E, Ye C (2013) The use and effects of incentives in surveys. Ann. Amer. Acad. Political Soc. Sci. 645(1):112–141.
  • Sun Y, Dong X, McIntyre S (2017) Motivation of user-generated content: Social connectedness moderates the effects of monetary rewards. Marketing Sci. 36(3):329–337.
  • Taylor SJ, Bakshy E, Aral S (2013) Selection effects in online sharing: Consequences for peer adoption. Proc. 14th ACM Conf. Electronic Commerce (Association for Computing Machinery, New York), 821–836.
  • Tirole J (1988) The Theory of Industrial Organization (MIT Press, Cambridge, MA).
  • Toubia O, Stephen AT (2013) Intrinsic vs. image-related utility in social media: Why do people contribute content to Twitter? Marketing Sci. 32(3):368–392.
  • Van den Bulte C, Bayer E, Skiera B, Schmitt P (2018) How customer referral programs turn social capital into economic capital. J. Marketing Res. 55(1):132–146.
  • Viswanathan V, Tillmanns S, Krafft M, Asselmann D (2018) Understanding the quality–quantity conundrum of customer referral programs: Effects of contribution margin, extraversion, and opinion leadership. J. Acad. Marketing Sci. 46(6):1108–1132.
  • Williamson OE (1985) The Economic Institutions of Capitalism (Free Press, New York).
  • Winkelmann R (2007) Count data models with selectivity. Econometric Rev. 17(4):339–359.
  • Wolters HM, Schulze C, Gedenk K (2020) Referral reward size and new customer profitability. Marketing Sci. 39(6):1166–1180.
  • Xiao P, Tang C, Wirtz J (2011) Optimizing referral reward programs under impression management consideration. Eur. J. Oper. Res. 215(3):730–739.