Part-Time Bayesians: Incentives and Behavioral Heterogeneity in Belief Updating
Abstract
Decisions in management and finance rely on information that often includes win-lose feedback (e.g., gains and losses, success and failure). Simple reinforcement then suggests blindly repeating choices if they led to success in the past and changing them otherwise, which might conflict with Bayesian updating of beliefs. We use finite mixture models and hidden Markov models, adapted from machine learning, to uncover behavioral heterogeneity in the reliance on different behavioral rules across and within individuals in a belief-updating experiment. Most decision makers rely both on Bayesian updating and reinforcement. Paradoxically, an increase in incentives increases the reliance on reinforcement because the win-lose cues become more salient.
This paper was accepted by Gustavo Manso, finance.
Funding: C. Alós-Ferrer gratefully acknowledges financial support from the Deutsche Forschungsgemeinschaft under [Grant AL1169/4], part of the research unit “Psychoeconomics” (FOR 1882).
Supplemental Material: The data files and online appendices are available at https://doi.org/10.1287/mnsc.2022.4584.
1. Introduction
Overwhelming evidence shows that human decision makers have a limited grasp of probabilities and, especially, of how beliefs should be updated in the face of new information. Previous research has identified a veritable catalogue of deviations from Bayesian updating, most of them taking the form of heuristics and biases that become relevant when certain informational triggers are present (Kahneman and Tversky 1972, Grether 1980, Camerer 1987). Numerous studies have demonstrated that those deviations affect and distort financial decisions (Shleifer 2000, Shiller 2005, Thaler 2005, Asparouhova et al. 2015, Frydman and Camerer 2016). In this work, we are interested in a kind of deviation from Bayesian updating that is particularly relevant for management and finance. Decisions in those fields often rely on information that includes win-lose feedback: gains and losses, success and failure, beating the competition or not, and so on. Indeed, as shown by Knutson and Bossaerts (2007), gains and losses are crucial to understanding the neural foundations of financial decisions.
Whenever information carries a win-lose component, elementary reinforcement behavior (Thorndike 1911, Schultz et al. 1997, Sutton and Barto 1998) dictates repeating choices if they led to success in the past and changing them if they led to failure.^{1} This kind of simple “win-stay, lose-shift” behavior (Thorndike’s “law of effect”) is in line with model-free reinforcement learning, which has been studied in recent contributions in neuroscience as opposed to model-based (reinforcement) learning, which itself can be construed as incorporating Bayesian behavior (Daw et al. 2005, Doll et al. 2012, Feher da Silva and Hare 2020). For example, Daw et al. (2011) show that, in a two-stage Markov decision task, subjects’ choices seem to follow a mixture of model-based and model-free reinforcement learning. In a related task, Gläscher et al. (2010) find support for the coexistence of two distinct neural signatures corresponding to these two forms of learning. Crucially for our purposes, Beierholm et al. (2011) study switching between model-based and model-free reinforcement and estimate a high within-individual switching rate (of around one-third of the time).
We hypothesized that model-free reinforcement (win-stay, lose-shift behavior) is the main reason that financial actors often fail to conform with Bayesian updating in decision problems where new information allows prior beliefs to be updated and that information contains a win-lose component. In view of the neuroscience literature, we carried out a decision-making experiment to explore the idea that there is substantial heterogeneity across individuals, with some agents being “more Bayesian” than others, and within each given individual (as suggested by Beierholm et al. 2011). Specifically, our hypothesis was that decision makers mostly follow either noisy versions of Bayesian updating or model-free reinforcement, but we did not exclude a priori the involvement of other candidates that the literature has proposed as playing a potential role.
Specifically, we consider two additional behavioral phenomena. The first is conservatism bias or non-updating behavior, which reflects a general failure to update beliefs and, taken to the extreme, identifies the posterior with the prior (Edwards 1968, Navon 1978, El-Gamal and Grether 1995). For example, in financial markets, Barberis et al. (1998) argued that conservatism might explain underreaction to news. The second is decision inertia, which refers to the repetition of previous choices independently of feedback (Pitz and Geller 1970, Alós-Ferrer et al. 2016). For instance, it has been shown that a large fraction of stock account owners exhibit portfolio inertia (Bilias et al. 2010). Although these phenomena represent broad behavioral tendencies, in the framework of our experiment they will correspond to precisely defined behavioral rules, which we take as representative of behavior corresponding to neither Bayesian updating nor model-free reinforcement. We are particularly interested in these rules because they both capture feedback-independent deviations from Bayesian updating (hence contrasting with model-free reinforcement) involving low cognitive costs.
The second question we were interested in is whether, and if so how, our classifications would be affected by monetary incentives and their magnitude. On the one hand, increasing incentives should motivate decision makers to spend more cognitive effort on the task, thereby behaving more in accordance with the prescriptions of Bayes’ rule. On the other hand, reinforcement tendencies are triggered by win-lose cues, and if incentives are increased, both wins and losses become larger in magnitude, hence making the triggers more salient. In particular, in an experiment on belief updating, Achtziger and Alós-Ferrer (2014) showed that participants rely less on win-stay/lose-shift when the cue which triggers reinforcement is not paid, compared with when it is incentivized. However, Asparouhova et al. (2015) showed that, in a market situation, participants sometimes avoided belief updating under higher incentives, possibly for fear of making (costly) mistakes, relying on simpler behavioral rules instead. Thus, we will specifically ask how the reliance on different behavioral rules is affected by the magnitude of incentives.
To answer the questions posed previously, we collected behavioral data using a novel experimental paradigm designed to disentangle four decision rules in belief updating (Bayes’ rule, reinforcement, conservatism, and decision inertia) and varied the level of monetary incentives. To obtain a classification of decision makers and study its dependence on incentives, we follow two complementary approaches. First, we rely on finite mixture models (FMM) from statistics (Frühwirth-Schnatter 2006), which provide classifications on the basis of choice data accounting for heterogeneity, unobservable determinants of behavior, and rule-specific errors. These models have received increasing attention for the analysis of heterogeneity in economic behavior (Costa-Gomes et al. 2001, Bellemare et al. 2008, Bruhin et al. 2018, Barron 2021). In agreement with our hypothesis, we find that roughly half of our sample follows non-Bayesian rules of behavior, with model-free reinforcers being the most numerous group among those. The other half is classified as mostly following Bayes’ rule, but with significant error rates. Additionally, we find no performance improvement with higher incentives for Bayesian participants, but non-Bayesian decision makers do become significantly better. This suggests a heterogeneous effect of incentives. For half of our sample, low incentives suffice to spark cognitive deliberation and the use of noisy approximations of Bayes’ rule. For these individuals, higher incentives do not result in additional improvements, in agreement with a ceiling effect. The rest of the sample, however, follows mostly model-free reinforcement, resulting in lower performance. For these participants, higher incentives do result in increased reliance on Bayes’ rule (but not enough to result in the subjects’ reclassification as mostly Bayesians) and hence an increase in performance.
Although this first strategy of analysis concentrates on heterogeneity across individuals, our second approach turns to heterogeneity within each individual and, specifically, the temporal dynamics of the data. We ask whether a given decision maker might rely on different decision rules over time and whether incentives affect the balance. For this purpose, we implement an identification strategy adapted from the machine learning literature. Specifically, we rely on hidden Markov models (HMM; Rabiner 1990, Frühwirth-Schnatter et al. 2019), which have been previously used in economics to, for example, identify switching among learning rules in repeated games (Ansari et al. 2012, Shachat et al. 2015) or to study bidding heuristics in auctions (Shachat and Wei 2012).
In our setting, the idea is that each behavioral rule corresponds to an unobservable (hence “hidden”) state of an individual-level Markov chain whose transition probabilities capture the dynamics of behavior over time. Thus, we estimate the transition probabilities, which determine the long-run probabilities of the behavioral rules. We find that most subjects exhibit relatively large probabilities for both Bayes’ rule and model-free reinforcement, as well as relatively large transition probabilities between those. Overall, a picture arises where some subjects rely mostly on Bayes’ rule even if incentives are low, but occasionally follow model-free reinforcement and other rules, and either hit a ceiling when incentives are increased or even suffer detrimental effects due to increased reliance on model-free reinforcement as the win-lose cues become more salient. Other subjects rely predominantly on model-free reinforcement but occasionally follow Bayes’ rule. For those, an increase in incentives is essentially detrimental. Last, some subjects mostly follow rules other than Bayes’ rule and (model-free) reinforcement and generally achieve low levels of performance. For those, an increase in incentives is beneficial because it increases the transition probabilities toward Bayes’ rule.
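To illustrate the mechanics of this approach (with made-up transition probabilities, purely for exposition, not the estimates reported later), the long-run probability of each rule is the stationary distribution of the estimated transition matrix:

```python
import numpy as np

# Hypothetical three-state transition matrix over behavioral rules
# (rows = current rule, columns = next rule). These numbers are
# illustrative only, not the estimates reported in this paper.
rules = ["Bayes", "Reinforcement", "Other"]
P = np.array([
    [0.80, 0.15, 0.05],   # transitions from Bayes
    [0.25, 0.70, 0.05],   # transitions from Reinforcement
    [0.30, 0.30, 0.40],   # transitions from Other
])

def stationary(P, tol=1e-12):
    """Long-run rule probabilities: the left eigenvector of P for
    eigenvalue 1, found here by simple power iteration."""
    pi = np.full(P.shape[0], 1.0 / P.shape[0])
    while True:
        nxt = pi @ P
        if np.abs(nxt - pi).max() < tol:
            return nxt
        pi = nxt

pi = stationary(P)
print(dict(zip(rules, np.round(pi, 3))))
```

For this example chain, the decision maker spends roughly 56% of trials in the Bayes state and 36% in the reinforcement state in the long run, even though every single trial is generated by exactly one rule.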
Our work builds on previous literature on the classification of decision makers in terms of behavioral rules in the domain of belief updating. This includes both the distinction between model-free and model-based reinforcement learning (Daw et al. 2005, Doll et al. 2012) and previous work using finite mixture models. For example, in a belief-updating experiment, El-Gamal and Grether (1995) showed that different decision makers favored different behavioral rules, but Bayes’ rule was the most frequently used at the population level. That is, humans generally fail to perfectly follow Bayes’ rule, but the latter is still a good representation of the behavior of a majority of people for a large proportion of the time. Of course, both observations are only compatible if there is heterogeneity in how people react to new information and update their beliefs.
We contribute to the related literature on heterogeneity in belief updating in three ways. First, compared with existing paradigms (Charness and Levin 2005, Knutson and Bossaerts 2007), our new experimental task allows us to explicitly consider and disentangle more than two behavioral rules (and, in particular, to contrast model-free reinforcement learning with feedback-independent deviations from Bayesian updating, which have been found to be relevant in the financial decision-making literature). Second, compared with previous classification contributions such as El-Gamal and Grether (1995), we take a step further in the identification of heterogeneity by investigating potential temporal dynamics, that is, allowing for within-subject heterogeneity in the reliance on different behavioral rules within the course of the experiment. Third, we explicitly target the effect of incentives on heterogeneity, that is, whether higher incentives affect reliance on one particular rule.
The contribution most closely related to ours is Payzan-LeNestour and Bossaerts (2015), which relied on a multi-armed bandit task. Participants mostly followed model-free reinforcement but switched to Bayesian learning when nudged into paying attention to crucial statistics of the environment. We share with this work a focus on the comparison between Bayesian behavior and model-free reinforcement and an interest in switching behavior. The main difference is that Payzan-LeNestour and Bossaerts (2015) evaluate which of several candidate models describes the data better, whereas we consider within-individual heterogeneity; that is, we examine to what extent decision makers are both Bayesians and reinforcers. Other differences concern the specific implementation and task complexity. For instance, our experimental task belongs to the class of static settings typically used to study heuristics and biases, where priors are reset after each relevant decision and each trial is independent of (and equivalent to) the others. In contrast, as in many contributions contrasting model-free and model-based reinforcement, the task of Payzan-LeNestour and Bossaerts (2015) is inherently dynamic, with the expected payoffs of the bandit’s arms changing over time.
Our contribution is further related to a strand of papers using a different, simpler paradigm contrasting win-stay, lose-shift behavior and Bayesian updating, where, as in our case, information is endowed with a win-lose cue. In this paradigm, Charness and Levin (2005) found very high error rates that would be consistent with model-free reinforcement behavior. By examining response times in the same paradigm, Achtziger and Alós-Ferrer (2014) argued that the data could be well explained by a dual-process model where simple reinforcement conflicts and interacts with more deliberative belief updating. Achtziger et al. (2015) examined neural evidence for simple reinforcement in this task using electroencephalography (EEG) and found that subjects with higher error rates under high incentives exhibited larger amplitudes in (extremely early) brain potentials linked to reinforcement learning. A possible interpretation in agreement with our results is that, even if larger monetary rewards increase effort, they also increase the salience of the win-lose cues which trigger reinforcement, hence creating a “reinforcement paradox” where higher incentives increase reliance on this alternative process instead of on Bayesian behavior. In line with this paradox, a pupil-dilation study (Alós-Ferrer et al. 2021b) has recently shown that higher incentives in this paradigm do increase cognitive effort while failing to result in generally increased performance.
The paper is structured as follows. Section 2 discusses the experimental design and the behavioral rules. Section 3 provides a descriptive overview of the data. Section 4 applies finite mixture modeling to obtain a classification of decision makers, studies the effects of incentives on that classification, and briefly discusses response times. Section 5 applies an HMM to study the temporal dynamics, the effects of incentives, and the differences across behavioral types. Section 6 concludes.
2. Design and Procedures
2.1. Behavioral Rules and Experimental Design
We designed a novel belief-updating paradigm with the explicit objective of disentangling different behavioral rules when new information carries a win-lose component. Because reinforcement is relevant for a wide range of conceptually different decisions, we developed a frame-free, abstract paradigm to study behavioral heterogeneity free of potential confounds that could arise from particularities present in one application but absent in others. In view of our motivation, it was important to focus on binary decisions that would provide simple win-lose feedback while giving an opportunity to update beliefs on an underlying state of the world. An additional concern was to develop a paradigm rich enough to allow disentangling Bayesian updating and (model-free) reinforcement from the other two rules of interest (conservatism and decision inertia). The paradigm we developed belongs to the larger class of urn tasks, which have been extensively used to study biases in belief updating, for example, for the case of representativeness and conservatism (Grether 1980, 1992) or for the comparison of Bayesian updating and reinforcement rules (Charness and Levin 2005, Achtziger and Alós-Ferrer 2014, Achtziger et al. 2015).^{2}
The essence of the paradigm is as follows. Participants are presented with three covered urns, each containing balls of two possible colors (black or white) in different but known proportions. One of the three urns is chosen at random, and a single ball is randomly extracted from it. Participants know the proportion of balls in each urn, that the actual urn is one of the three described ones, and that the urn has been selected randomly, with equal probabilities for each urn. Crucially, the ball is not replaced after the first extraction. Then, a second ball is extracted at random from the same urn. The participant’s decisions are bets. Before each of the two extractions, participants bet on the color of the ball to be extracted. We are interested in the second betting choice because the color of the first extracted ball allows updating the belief regarding the urn from which the balls are extracted.^{3}
Incentives are straightforward. After each extraction, participants are paid a constant amount if and only if the color of the ball matches their bet. After two extractions, the trial ends and all balls are replaced in the urns before a new trial starts. Participants are aware that all trials are independent of each other, so the urn from which the two balls are extracted is randomly and independently determined according to the same uniform prior in each of the 60 repetitions.
It is important to note that our focus is on distinguishing behavioral rules and not on dynamic learning, in contrast to the literature investigating model-free and model-based reinforcement learning (Payzan-LeNestour and Bossaerts 2015). By design, there is limited scope for learning in our paradigm, as trials are independent, and hence the situation “resets” after each choice, which is very different from the two-stage or multi-armed bandit tasks that are commonly used in that literature.
To be able to disentangle our candidate behavioral rules, there are two types of trials using two different urn compositions, as depicted in Figure 1(a). In one type, each urn contains four balls. In the other type, each urn contains six balls. In both cases, one of the urns contains exactly one black ball, another contains exactly one white ball, and the third contains black and white balls in equal numbers. Although the alternative urn compositions are superficially similar and are both easy to understand, the prescriptions of the candidate rules (in particular, of Bayesian updating of beliefs) are quite different across the two kinds of trials, which in turn allows us to discriminate among them. In the experiment, participants were reminded of the composition of the urns and the number of balls in each urn at all times, as this information was prominently displayed in each trial.
The first of the rules we are interested in is optimization following Bayesian updating of the prior, or simply Bayes’ rule for short. Bayes’ rule captures the normatively correct way to integrate new information with prior beliefs, but it has been widely shown to perform poorly as a descriptive rule. Empirically documented violations in experiments involving conditional probability judgments (Grether 1980, Charness and Levin 2005) have shown that human beings are simply not Bayesian optimizers, although Bayes’ rule is sometimes a reasonable approximation of behavior (El-Gamal and Grether 1995, Griffiths and Tenenbaum 2006). We use Bayes’ rule as a benchmark, as it describes the normatively optimal behavior.
Straightforward computations show that a subject who used Bayes’ rule in four-ball trials and chose black for the first bet should, for the second bet, shift to white if she won the first bet and stay with black if she lost the first bet (and symmetrically if she bet on white the first time). In contrast, in six-ball trials, Bayes’ rule prescribes the exact opposite: stay with the same color for the second bet if she won the first and shift if she lost.^{4} The prescriptions of Bayes’ rule are summarized graphically in the column “Bayesian” in Figure 1(b).
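These computations can be verified numerically. The sketch below is our own illustration (the function name is ours, not from the paper): it computes the Bayesian posterior over urns after observing the color of the first ball and the probability that the second ball, drawn without replacement, has the same color.

```python
from fractions import Fraction

def prob_second_matches_first(urns):
    """urns: list of (k, m) pairs, where k is the number of balls of
    the first drawn color and m the number of the other color; all
    urns contain the same total and have uniform prior probability.
    Returns P(second ball has the drawn color) under Bayesian
    updating without replacement, in exact arithmetic."""
    total = sum(urns[0])
    # Posterior over urns is proportional to the likelihood of the draw.
    likelihoods = [Fraction(k, total) for k, _ in urns]
    posterior = [l / sum(likelihoods) for l in likelihoods]
    # One ball of the drawn color is now missing from the chosen urn.
    return sum(p * Fraction(k - 1, total - 1)
               for p, (k, _) in zip(posterior, urns))

# Four-ball trials: urns hold 1, 3, and 2 balls of the drawn color.
four = prob_second_matches_first([(1, 3), (3, 1), (2, 2)])
# Six-ball trials: urns hold 1, 5, and 3 balls of the drawn color.
six = prob_second_matches_first([(1, 5), (5, 1), (3, 3)])
print(four, six)  # 4/9 and 26/45
```

Because 4/9 < 1/2, a Bayesian bets against the first ball’s color in four-ball trials; because 26/45 > 1/2, she bets on it in six-ball trials. Combined with whether the first bet matched the drawn ball, this yields exactly the win-shift/lose-stay and win-stay/lose-shift patterns described above.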
The main alternative rule we are interested in is (model-free) reinforcement learning. This is the natural candidate in any setting where information comes with win-lose feedback. Reinforcement is a basic component of human behavior (Thorndike 1911, Sutton and Barto 1998) and refers to the tendency to repeat whatever action yielded a positive result in the past and avoid those which led to failure. In this paradigm, we assume the simplest form of reinforcement, that is, win-stay, lose-shift behavior. This rule of thumb has been shown to explain deviations from Bayesian behavior in belief-updating paradigms where information is extracted from previous wins and losses, as in many economic settings (Charness and Levin 2005, Achtziger and Alós-Ferrer 2014).
The prescriptions of this rule are particularly simple (see column “Reinforcement” in Figure 1(b)). In case of success, participants following (model-free) reinforcement would repeat the same choice, whereas they would shift to the other choice after failure. As a consequence, in our binary setting, this rule always prescribes placing the second bet on the color of the actually extracted first ball, that is, chasing after the previous winner. In particular, model-free reinforcement makes identical prescriptions for four-ball and six-ball trials, which means that it coincides with Bayes’ rule for six-ball trials but is completely opposed to it for four-ball trials.
The third behavioral rule we are interested in is inertia, which is the tendency to repeat previous choices independently of the outcome (Pitz and Geller 1970, Akaishi et al. 2014, Alós-Ferrer et al. 2016) and has been linked to status quo bias (Ritov and Baron 1992). For instance, Erev and Haruvy (2016) argue that humans exhibit a strong tendency to simply repeat the most recent decision, and this tendency is sometimes stronger than the tendency to react optimally to the most recent outcome.
In our paradigm, inertia prescribes that participants ignore the results of the first bet and simply repeat the previous decision, that is, bet black after betting black the first time and white after white (see column “Inertia” in Figure 1(b)). Again, the pattern is identical for four-ball and six-ball trials, and in both cases, it differs from Bayes’ rule and from reinforcement.
We term the last behavioral rule we focus on “non-updating.” This rule postulates that the prior is not updated and that the decision maker uses a posterior identical to the prior (hence uniform). However, in our setting, there is a transparent change between the first and the second ball extraction, which is independent of probability updating: because there is no replacement, for the second bet the selected urn contains one ball less (three for four-ball trials and five for six-ball trials), namely the one extracted after the first bet. This poses an incorrect but simple optimization problem. That is, this rule prescribes that participants take into account the rather obvious fact that after the first extraction, one ball is missing, but they do not engage in belief updating at all.
Direct computations show that a subject who used the non-updating rule and won would shift to the opposite color for the second bet but would stay with the same color if she lost the first bet (column “Non-updating” in Figure 1(b)). The prediction is the same for four-ball and six-ball trials. That is, by design, in our paradigm, non-updaters would always behave in direct opposition to win-stay, lose-shift reinforcement, and hence, the prescriptions of this rule coincide with those of Bayes’ rule for four-ball trials and are the opposite for six-ball trials.
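The non-updating prescription can be checked the same way, keeping the prior over urns uniform and only accounting for the removed ball (again our own illustration, not code from the paper):

```python
from fractions import Fraction

def nonupdating_prob_same_color(urns):
    """urns: list of (k, m) pairs as before, with k the number of
    balls of the first drawn color. The posterior over urns is
    (incorrectly) kept uniform, but the removal of the drawn ball
    is taken into account."""
    total = sum(urns[0])
    prior = Fraction(1, len(urns))
    return sum(prior * Fraction(k - 1, total - 1) for k, _ in urns)

four = nonupdating_prob_same_color([(1, 3), (3, 1), (2, 2)])
six = nonupdating_prob_same_color([(1, 5), (5, 1), (3, 3)])
print(four, six)  # 1/3 and 2/5
```

Both probabilities are below one-half, so a non-updater always bets against the first drawn color: she shifts after a win and stays after a loss, the exact opposite of win-stay, lose-shift reinforcement, as noted above.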
In summary, and as shown in Figure 1(b), using just two binary choices and two urn compositions with three urns each, we can fully disentangle four different behavioral rules. For four-ball trials, Bayes’ rule and non-updating prescribe the same decisions, which are in direct opposition to the prescriptions of (model-free) reinforcement. For six-ball trials, Bayes’ rule and (model-free) reinforcement prescribe the same decisions, which are in direct opposition to the prescriptions of non-updating. In both cases, inertia always coincides with exactly half of the prescriptions of each other rule and is opposed to it for the other half.
2.2. Procedures
A total of n = 268 university students (142 females; age range, 18–43 years; mean = 24.07 years) were recruited using the Online Recruitment System for Economic Experiments (Greiner 2015) from the preregistered pool of the Cologne Laboratory for Economic Research (CLER), excluding students majoring in psychology or economics because they could have been familiar with similar paradigms. The experiment was programmed using z-Tree (Fischbacher 2007). Each participant made decisions in 60 trials, 30 with the four-ball design and 30 with the six-ball design. To avoid order effects, half of the participants (randomly assigned) worked on four-ball trials first and six-ball trials later, and the remaining participants followed the inverse order.
Participants received a performance-based payment plus a show-up fee of 2.50 Euros. To study the effects of incentives on our classifications, there were two different treatments (with data collected in different sessions). Under low incentives, each successful bet was rewarded with 18 Eurocents (n = 128). Under high incentives, the payoff was 30 Eurocents (n = 140) for each successful bet. The size of the incentives remained constant throughout the experiment and was common knowledge.^{5} All (successful) decisions were paid. In our context, this payment mechanism is incentive compatible under mild assumptions on individual preferences, as shown by Azrieli et al. (2018).^{6}
Participants received detailed written instructions and answered nine control questions regarding the replacement of balls, the states of the world, and trial independence. To ensure that participants understood the task, those who answered any control question incorrectly were provided with further information by the experimenter until a sufficient understanding of the task was reached. There were no practice trials. No time limit was imposed during the experiment; participants were free to use as much time as they needed. At the end of the experiment, participants completed some (non-incentivized) questionnaires and provided demographic information (gender, age, and field of studies). A session lasted about 90 minutes, and average earnings were 17.93 Euros (SD = 1.74).
3. Descriptive View of the Data
3.1. Choice Frequencies
The first bet is not of interest in itself because, given the prior, both choices are equally likely to result in a payoff, and given the symmetry of the design, no behavioral rule makes a specific prescription. We are interested in the second bet within each trial. For those bets, the rules generally make different prescriptions, which depend on the first bet and on the outcome of the first extraction.
Participants failed to consistently make optimal decisions (those prescribed by Bayes’ rule). For the second bet within each trial, the average across individuals of the percentage of correct decisions was 61.23% (median = 58.33%, SD = 15.53, minimum = 20.00%, maximum = 100.00%), which was significantly different from random performance according to a Wilcoxon signed-rank (WSR) test (n = 268, z = 14.193, p < 0.001).
At the aggregate level, there were no significant differences in the percentage of correct answers across incentive levels according to a Mann-Whitney-Wilcoxon test (MWW; low incentives, average 62.54%; high incentives, 59.80%; n = 268, z = 1.242, p = 0.214). The percentage of correct answers in four-ball trials (average, 58.22%) was lower than in six-ball trials (average, 64.24%), although the difference did not reach significance (WSR test, within subject, n = 268, z = 1.626, p = 0.104). The (counterbalanced) order did not affect the percentage of correct answers (four-ball trials first: average, 62.22%; six-ball trials first: average, 57.25%; MWW test, n = 268, z = 0.418, p = 0.676).
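Nonparametric tests of this kind are available in SciPy. The sketch below runs them on purely synthetic stand-in data (not the experimental dataset), simply to show the mechanics:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic per-subject accuracies (fraction of correct second bets);
# an illustrative stand-in for the 268 observed values.
acc = rng.beta(6, 4, size=268)  # mean around 0.6

# Wilcoxon signed-rank test of accuracy against chance performance
# (0.5), i.e., a test on the differences acc - 0.5.
w = stats.wilcoxon(acc - 0.5)
print("WSR:", w.statistic, w.pvalue)

# Mann-Whitney-Wilcoxon test comparing two independent groups, as in
# the low- vs. high-incentive comparison (here an arbitrary split).
low, high = acc[:128], acc[128:]
u = stats.mannwhitneyu(low, high)
print("MWW:", u.statistic, u.pvalue)
```

With the synthetic accuracies centered well above 0.5, the signed-rank test rejects chance performance, while the two arbitrary subgroups come from the same distribution, mirroring the structure (though not the values) of the comparisons reported above.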
There is no evidence that the subjects “figured out” how to solve the task during the course of the experiment. In particular, there is no discernible trend for the low-incentive treatment (random-effects probit regression of the probability of a correct choice on the trial number, coefficient = 0.001, t = 0.50, p = 0.616), indicating that there was no learning at the aggregate level. For the high-incentive treatment, we observe a marginally significant but very modest positive trend (random-effects regression, coefficient = 0.002, t = 1.87, p = 0.061), which should not be overinterpreted.
3.2. Naïve Classification
As an illustration, we start with a simple but naïve classification approach based on observable data, similar to previous approaches in the literature, before we turn to our finite-mixture and hidden-Markov analyses. We examine choices at the individual level and classify a subject as following a particular behavioral rule if his or her decisions coincide with the prescriptions of that rule more often than with those of the others (e.g., a subject is classified as Bayesian if she follows Bayes’ rule more frequently than any of the alternative rules).
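In code, this majority-agreement procedure is straightforward. The sketch below is a hypothetical illustration (the prescription arrays are made up purely to show the mechanics; in the experiment they would be derived from each trial’s first bet, drawn ball, and urn composition):

```python
import numpy as np

RULES = ["Bayes", "Reinforcement", "Inertia", "Non-updating"]

def classify(choices, prescriptions):
    """choices: array of 0/1 second bets, one per trial.
    prescriptions: dict mapping each rule to the array of 0/1 bets it
    prescribes in each trial. Returns the rule with the highest
    agreement rate (ties broken by the order in RULES)."""
    agreement = {r: np.mean(choices == prescriptions[r]) for r in RULES}
    return max(RULES, key=lambda r: agreement[r])

# Tiny made-up example with six trials.
choices = np.array([1, 0, 1, 1, 0, 1])
prescriptions = {
    "Bayes":         np.array([1, 0, 1, 0, 0, 1]),  # 5/6 agreement
    "Reinforcement": np.array([0, 1, 0, 1, 1, 0]),  # 1/6 agreement
    "Inertia":       np.array([1, 1, 0, 0, 1, 1]),  # 2/6 agreement
    "Non-updating":  np.array([1, 0, 1, 0, 1, 0]),  # 3/6 agreement
}
print(classify(choices, prescriptions))  # -> Bayes
```

As the text notes, a subject is assigned to a rule even when a sizable share of her choices deviates from it, which is precisely the limitation that motivates the finite-mixture analysis.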
This simple procedure classifies 31.72% of subjects as following Bayes’ rule, 22.76% as reinforcers, 29.48% as non-updaters, and only 21.27% as relying on inertia. Figure 2 displays the proportion of subjects classified as following each behavioral rule depending on the incentive condition. In this simple classification, however, incentives seem to have limited effects, in the sense that there are no significant differences in the proportions assigned to each rule across incentive treatments for most rules (tests of proportions; reinforcers, p = 0.801; inertia, p = 0.259; non-updaters, p = 0.733), with the notable exception of Bayes’ rule. A larger proportion of subjects is classified as following the normative prescriptions in the high-incentive (37.14%) than in the low-incentive (25.78%) treatment (test of proportions, z = 1.996, p = 0.022).
Unsurprisingly, subjects classified as following Bayes’ rule earned more on average in the task (mean, 9.31) than other subjects (mean, 9.04; MWW test, n = 268, z = 2.759, p = 0.006). These results are a first indication that, in this paradigm, being Bayesian actually paid off, but increasing incentives might actually have a negative effect on performance, at least for some people. The naïve classification provides a first glimpse of behavioral heterogeneity, but it is obviously of limited interest. The reason is that it adopts the unrealistic view that decision makers follow exactly one behavioral rule and remains silent on deviations from a given rule: a subject is still classified as following a rule even if a large part of his or her decisions deviate from it. Hence, in the following section, we turn to a finite-mixture modeling approach.
4. Behavioral Types and Finite Mixture Modeling
We are interested in individual heterogeneity, and hence we now turn to a different method of classification of decision makers according to the postulated behavioral rules. The task of classifying the observed choices according to behavioral rules is not trivial, because which behavioral rule was followed by each participant for each decision is not directly observable. Hence, any inference on which rule determined which decision must rely on a comparison between observed choices and the prescriptions of the rules. This is complicated by the fact that different behavioral rules might prescribe the same choice part of the time. Furthermore, in view of overwhelming evidence on decision errors and stochastic choice, it is unrealistic to assume that behavioral rules are purely deterministic.
To solve these problems, we explicitly introduce the possibility that subjects make errors during the experiment. In the model, an error occurs whenever a subject is following a specific behavioral rule but the actual choice deviates from the rule’s prescriptions. For example, a subject might follow Bayes’ rule but make a (maybe computational) mistake and choose the incorrect option. In other words, we consider noisy versions of the behavioral rules of interest. Given strictly positive error probabilities, each and every choice might a priori have been generated by each and every candidate behavioral rule. For instance, in a situation where reinforcement and Bayes’ rule prescribe different answers, a given (erroneous) answer might be due to the subject following reinforcement or to the subject making a mistake while actually following Bayes’ rule. To untangle behavior, we now turn to a finite mixture modeling approach, which accounts for unobservable determinants of behavior, potential heterogeneity among individuals, and the possibility of errors. In the following sections, we describe how we implement finitemixture modeling for our data and report the actual estimation.
We adopt the supervised approach to finite mixture modeling, where we specify the number and characteristics of types ex ante. This is appropriate in our case because we have clearly defined candidates, as the experiment was designed to disentangle precisely the four behavioral rules we consider. In this sense, we tie our hands ex ante to the set of relevant behavioral rules and examine whether the evidence supports that choice.^{7}
4.1. Finite Mixture Model for Noisy Behavioral Rules
Let $B=\{0,1\}$ denote the set of possible bets or outcomes, where zero stands for “black” and one for “white.” For each trial and participant in our experiment, the second bet $b\in B$ is made after the first, $a\in B$, and after observing the color of the first extracted ball, $x\in B$. Furthermore, let $t\in T=\{4,6\}$ indicate whether the trial corresponds to the four-ball or the six-ball design. Before the decision b is made, the relevant characteristics of a trial are completely identified by $\omega =(t,a,x)\in \Omega =T\times B^{2}$. With this notation, a behavioral rule in our setting is a mapping $\beta_k:\Omega \to B$, assigning to each situation $\omega$ the prescribed second bet $\beta_k(\omega)$.
Because the behavioral rules are deterministic and there are no ties in our design, the choice probabilities induced by each rule $\beta_k$ are given by $\text{Prob}(b \mid \omega ,\beta_k)=1$ if $b=\beta_k(\omega)$ and $0$ otherwise.
The observations $m=1,\dots ,M$ in our data set are of the form $(\omega ,b)\in \Omega \times B$. A finite mixture model assumes that a distribution of types (in our case, behavioral rules) generates the actual observations, with $\eta_k$ ($k=1,\dots ,4$) being the probability of type k.
To take errors into account, we consider noisy behavioral rules by introducing an error or “trembling” probability $\epsilon_k$, analogously to Bruhin et al. (2010). Because we deal with binary choice, the interpretation is simple: given ω and assuming that $\beta_k$ generated the observed choice, $1-\epsilon_k$ is the probability with which the choice prescribed by $\beta_k$ actually obtains, whereas with probability $\epsilon_k$ the opposite choice is made. Formally,

$p_k(b \mid \omega)=\begin{cases}1-\epsilon_k & \text{if } b=\beta_k(\omega),\\ \epsilon_k & \text{otherwise.}\end{cases}$  (1)
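As an illustration, the two simplest behavioral rules and the noisy-choice probability just described can be coded directly. This is a sketch: inertia and reinforcement follow their verbal definitions (always repeat the first bet; win-stay/lose-shift), while Bayes’ rule and non-updating are omitted because their prescriptions depend on the urn compositions of the design, which are not restated here.

```python
def inertia(t, a, x):
    """Inertia: always repeat the first bet a."""
    return a

def reinforcement(t, a, x):
    """Win-stay/lose-shift: repeat the first bet if the observed ball x
    matched it (a win), switch otherwise."""
    return a if x == a else 1 - a

def noisy_choice_prob(rule, omega, b, eps):
    """Probability of the second bet b under the noisy version of a rule:
    the prescribed bet obtains with probability 1 - eps, the opposite
    bet with probability eps."""
    return 1 - eps if b == rule(*omega) else eps
```

For instance, after betting white (a = 1) and seeing a black ball (x = 0), noisy reinforcement bets white again only with the trembling probability eps.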
Our estimation assumes that each individual $j=1,\dots ,268$ has independent probabilities $\eta_k^j$ of relying on each rule plus individual-level error rates $\epsilon_k^j$ ($k=1,\dots ,4$). Hence, the likelihood of the subdata set containing j’s decisions, $D_j=\{(\omega_1^j,b_1^j),\dots ,(\omega_{60}^j,b_{60}^j)\}$, is

$\text{Prob}(D_j)=\prod_{m=1}^{60}\sum_{k=1}^{4}\eta_k^j\, p_k(b_m^j \mid \omega_m^j), \quad \text{with } \sum_{k=1}^{4}\eta_k^j=1,$

where each $p_k$ is evaluated at the individual error rate $\epsilon_k^j$.
Because $\ln \text{Prob}(D)={\sum}_{j=1}^{268}\ln \text{Prob}({D}_{j})$ and $\text{Prob}({D}_{j})$ depends only on the variables ${\eta}_{1}^{j},\dots ,{\eta}_{4}^{j}$ and ${\epsilon}_{1}^{j},\dots ,{\epsilon}_{4}^{j}$, which in turn play no role for $\text{Prob}({D}_{\ell})$ with $\ell \ne j$, maximum likelihood estimation can be done independently for each participant j. The estimation then delivers which type $k=1,\dots ,4$ each participant j most likely belongs to, plus the individual error rate ${\epsilon}_{k}^{j}$ of the corresponding rule.
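The per-subject estimation problem can be sketched as follows. The two lambda rules (inertia and win-stay/lose-shift reinforcement) and the three example trials are illustrative stand-ins for the four rules and the 60 trials of the experiment:

```python
from math import log

def subject_log_likelihood(decisions, etas, epss, rules):
    """Log-likelihood of one subject's decisions under the finite mixture
    of noisy rules: Prob(b | omega) = sum_k eta_k * p_k(b | omega)."""
    ll = 0.0
    for omega, b in decisions:
        prob = sum(eta * ((1 - eps) if b == rule(*omega) else eps)
                   for eta, eps, rule in zip(etas, epss, rules))
        ll += log(prob)
    return ll

# Two stand-in rules: inertia (repeat the first bet) and win-stay/lose-shift
# reinforcement; the full model uses all four rules of the design.
rules = [lambda t, a, x: a,
         lambda t, a, x: a if x == a else 1 - a]

# A subject who always repeats the first bet is better explained by putting
# all mixture weight on inertia than on reinforcement:
data = [((4, 1, 0), 1), ((4, 1, 1), 1), ((6, 0, 1), 0)]
ll_inertia = subject_log_likelihood(data, [1.0, 0.0], [0.1, 0.1], rules)
ll_reinforcement = subject_log_likelihood(data, [0.0, 1.0], [0.1, 0.1], rules)
```

Maximizing this function over the weights and error rates (subject to the weights summing to one), separately for each subject, yields the individual-level estimates described in the text.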
4.2. Estimation Results
We are interested in how many subjects consistently follow each (noisy) behavioral rule in this paradigm and in how precise these rules are (i.e., how large the ${\epsilon}_{k}$ are). To answer this question, we estimate the finite mixture model described previously separately for each subject. This type of individual-level analysis has been shown to be a useful tool in a number of other contexts (El-Gamal and Grether 1995, Costa-Gomes et al. 2001, Iriberri and Rey-Biel 2013, Bruhin et al. 2018).^{8}
Following a common approach in the relevant statistics literature (Redner and Walker 1984), we adopt a maximum likelihood (ML) approach for parameter estimation in our finite mixture model. Specifically, we rely on the Davidon-Fletcher-Powell (DFP) algorithm (Fletcher and Powell 1963, Davidon 1991).
Table 1 summarizes the distribution of subjects classified as following each behavioral rule. ML estimation converged for n = 266 of our 268 subjects. The columns Est. weight and Est. error rate report the average estimated parameters corresponding to the probability ${\eta}_{k}^{j}$ of each rule and the corresponding error rate ${\epsilon}_{k}^{j}$. The columns Classified as and Percentage report the number and percentage of subjects classified as most likely following each rule, respectively, that is, those subjects for which that rule had the highest probability. This classification is depicted in Figure 3, which shows that a slight majority of our subjects are classified as non-Bayesian, but Bayes’ rule (at 49%) is the most frequently followed rule, in line with previous observations by El-Gamal and Grether (1995). Confirming our basic hypothesis that reinforcement is the main determinant of deviations from Bayes’ rule in our setting, this rule ranks second in terms of the number of participants mostly following it: around 28%. In contrast, inertia and non-updating describe just around 17% and 6% of our sample, respectively. This is in sharp contrast with the naïve classification reported previously.

Estimation Summary: Finite Mixture Model
Behavioral rule  Est. weight  Est. error rate  Classified as  Percentage  Cond. mean  Cond. error rate 

Bayes  48.15  13.62  130  48.87  97.09  12.07 
(48.71)  (13.34)  (10.34)  (11.84)  
Reinforcement  28.28  24.52  75  28.20  98.09  26.06 
(16.14)  (15.51)  (7.81)  (16.14)  
Inertia  15.48  41.62  44  16.54  84.74  41.61 
(32.55)  (11.63)  (19.16)  (11.26)  
Non-updating  8.09  47.45  17  6.39  88.35  35.16 
(23.26)  (23.32)  (16.72)  (19.29) 
Notes. Standard deviation is in parentheses. n = 266.
A good classification should be as unambiguous as possible. Ideally, to declare a decision maker as, say, a Bayesian or a reinforcer, one would like to have a probability ${\eta}_{k}^{j}$ close to one for the corresponding rule. The columns Cond. mean and Cond. error rate of Table 1 report the estimated average parameters restricted to those subjects who are actually classified as most likely following the corresponding rule. The column Cond. mean shows that, in our estimation, most estimated values of ${\eta}_{k}^{j}$ are close to either zero or one (the distributions are bimodal). In other words, almost all subjects are unambiguously classified as following one of the four behavioral rules, which indicates a good classification. At the same time, our assumption of noisy behavioral rules seems justified. In particular, the estimated error rates (conditional or not) differ across rules and are smallest for the two most common ones. Specifically, Bayes’ rule has the lowest estimated error rate, around 12%, followed by reinforcement at around 26%. In contrast, the two less-frequent rules, inertia and non-updating, appear to be associated with relatively large error rates.^{9}
We can further explore possible correlations between the classification and demographic characteristics. Females are not more likely than males to be classified as following Bayes’ rule or reinforcement (MWW tests: Bayes, n = 266, z = 0.348, p = 0.728; reinforcement, n = 266, z = 0.688, p = 0.491). Moreover, there is no significant correlation between age and being classified into these types (Spearman’s correlation: Bayes, n = 266, $\rho =0.018$, p = 0.766; reinforcement, n = 266, $\rho =0.016$, p = 0.798).^{10}
The bottom line of the classification is that more than three quarters of our sample seem to have mostly (and consistently) followed either Bayes’ rule or reinforcement during the entire experiment, once one accounts for occasional errors. That is, Bayes’ rule explains part of the data, but reinforcement is an important driver of behavior and, in our classification, represents the most common deviation from Bayesian behavior. Our results further highlight that analyses of aggregate behavior can potentially be improved by considering the underlying heterogeneity in decision-making processes across individuals.
4.3. Behavioral Rules and Incentives
Participants in our study performed the beliefupdating task for different incentive levels, with n = 128 participating under low incentives and n = 140 under high incentives. Because our estimation was conducted at the individual level and incentive levels were varied between subjects (and not within), it is possible to further differentiate the classification according to the level of incentives. Table 2 reports the proportion of individuals in each treatment classified as mostly following each behavioral rule (left), which are depicted in the left panel of Figure 4. The right side of Table 2 displays the average error rates for each behavioral rule, restricted to the individuals classified as mostly following it, and includes MWW tests for those error rates across treatments. As was the case for the naïve classification, there are no treatment effects for the percentage of individuals classified under each rule. Moreover, the (conditional) error rates for the different behavioral rules are not significantly different across treatments.

Treatment Effect at the Individual Level
Rule  Proportion (%)  Test of proportions  Estimated error rate (%)  MWW test 

  Low  High  z  p value  Low  High  z  p value  
Bayes  45.32  52.76  −1.212  0.113  13.05  11.14  0.553  0.580 
Reinforcement  30.94  25.20  1.042  0.149  25.21  27.20  −0.397  0.692 
Inertia  16.55  16.53  0.004  0.997  40.31  43.04  −0.200  0.841 
Non-updating  7.19  5.51  0.563  0.574  35.77  34.28  0.098  0.922 
n = 266  n_{low} = 126  n_{high} = 140 
For participants classified as mostly following Bayes’ rule, by definition, the error rates (${\epsilon}_{1}^{j}$) refer to deviations from normatively optimal behavior. Hence, the first test on the right side of Table 2 shows that Bayesian decision makers are not significantly better (in a normative sense) under higher incentives (88.86% correct answers under high incentives versus 86.95% under low incentives; MWW, n = 130, z = 0.553, p = 0.580). However, non-Bayesian decision makers do become significantly better in this sense. Specifically, as illustrated in the right panel of Figure 4, if we pool all subjects classified as following either reinforcement, inertia, or non-updating into a non-Bayesian category, we find a significant increase in the percentage of normatively correct answers under high incentives (64.06%) compared with low incentives (60.19%; MWW, n = 136, z = 1.969, p = 0.049).^{11}
This result has a simple interpretation and speaks in favor of a heterogeneous effect of incentives. For part of the population (roughly half), low incentives suffice to engage in cognitive deliberation and decide following noisy, error-prone approximations of Bayes’ rule. Higher incentives do not result in additional improvements for these individuals; in particular, error rates are not further reduced, suggesting ceiling effects. Another large part of the population, however, follows non-Bayesian behavioral rules most of the time (mostly reinforcement) and accordingly achieves a lower performance. Taken as a whole, however, the latter subjects do react to incentives and make more choices in alignment with Bayes’ rule under high incentives.
The increase in Bayesian choices under high incentives for subjects classified as non-Bayesians, however, is not of sufficient magnitude to be reflected in a change in the classification or even in a significant increase of the fraction of the time that those subjects are classified as relying on Bayes’ rule (${\eta}_{1}^{j}$), mainly because the latter is very close to zero. Indeed, the estimated average weight for Bayes’ rule for non-Bayesian subjects is merely 0.74% (minimum = 0%, maximum = 38.56%) under low incentives and 1.89% (minimum = 0%, maximum = 44.14%) under high incentives (MWW, n = 136, z = 0.132, p = 0.895).
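The MWW comparisons used throughout this section admit a compact implementation via the rank-sum statistic under the normal approximation. The sketch below uses midranks for ties and applies no tie or continuity corrections, so its values can differ slightly from those of standard statistical packages:

```python
from math import sqrt

def mww_z(sample_a, sample_b):
    """Mann-Whitney-Wilcoxon rank-sum z-statistic (normal approximation,
    midranks for ties, no tie or continuity correction)."""
    combined = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])
    rank_sum_a = 0.0
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1                      # j is one past the tie group
        midrank = (i + 1 + j) / 2       # average of ranks i+1, ..., j
        rank_sum_a += midrank * sum(1 for k in range(i, j) if combined[k][1] == 0)
        i = j
    na, nb = len(sample_a), len(sample_b)
    u = rank_sum_a - na * (na + 1) / 2
    mean_u = na * nb / 2
    sd_u = sqrt(na * nb * (na + nb + 1) / 12)
    return (u - mean_u) / sd_u
```

A positive z indicates that the first sample tends to take larger values; a two-sided p value then follows from the standard normal distribution.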
4.4. Response Times and Incentives
In our experiment, we recorded both choices and their response times. The analysis of the response times reveals a very large, highly significant difference between incentive treatments. As illustrated in Figure 5 (left), decisions are considerably faster under high incentives (average, 1.627 seconds) compared with low ones (3.402 seconds; MWW, n = 268, z = 7.868, p < 0.001), with an average decrease of about 52%. This difference is independent of the classification. In particular, it holds both for subjects classified as most likely following Bayes’ rule (Figure 5, right; high incentives, 1.592 seconds; low incentives, 3.183 seconds; MWW, n = 130, z = 3.957, p < 0.001) and for all those classified as nonBayesian (high incentives, 1.594 seconds; low incentives, 3.335 seconds; MWW, n = 136, z = 7.758, p < 0.001).
This effect demonstrates that participants did react to the different levels of incentives; that is, there were differences in the perception of high versus low incentives for our task. However, the interpretation of response times is nuanced, especially when different behavioral rules codetermine behavior (Achtziger and Alós-Ferrer 2014, Spiliopoulos and Ortmann 2018). Some authors have suggested an interpretation of response times as a proxy for cognitive effort in certain settings (Moffatt 2005, Enke and Zimmermann 2019), for example, iterative thinking in games (Alós-Ferrer and Buckenmaier 2021). However, this interpretation would be unwarranted in the simple decisions we study here.
Instead, we suggest that the interpretation of the effect of incentives on response times in our context is straightforward. One of the most stable and most firmly established regularities involving response times is the chronometric effect: easier choice problems (where the alternatives’ evaluations differ greatly) take less time to respond to than harder problems (Dashiell 1937, Moyer and Landauer 1967). Hence, deliberation times are longer for alternatives that are more similar, either in terms of preference or along a predefined scale. This effect has been shown explicitly in various economic settings such as intertemporal choice (Chabris et al. 2009), decisions under risk (Alós-Ferrer and Garagnani 2022), consumer choice (Krajbich et al. 2010), and dictator and ultimatum games (Krajbich et al. 2015).^{12} In our case, increasing incentives enlarges the payoff difference between a correct and an incorrect choice, which, by the chronometric effect, results in shorter response times. This observation is enough to explain the effect we find.
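The chronometric effect can be illustrated with a toy evidence-accumulation model: a random walk drifts toward a decision threshold at a speed proportional to the value difference between options. All parameters below are hypothetical; the simulation only illustrates that a larger drift (a larger payoff difference) yields shorter decision times.

```python
import random

def mean_decision_time(drift, trials=1000, threshold=10, seed=0):
    """Average number of accumulation steps before the evidence total
    reaches +threshold or -threshold.  Each step is +1 with probability
    (1 + drift) / 2 and -1 otherwise, so `drift` plays the role of the
    value difference between the two options."""
    rng = random.Random(seed)
    total_steps = 0
    for _ in range(trials):
        evidence = steps = 0
        while abs(evidence) < threshold:
            evidence += 1 if rng.random() < (1 + drift) / 2 else -1
            steps += 1
        total_steps += steps
    return total_steps / trials

# A larger value difference (e.g., higher incentives making correct and
# incorrect payoffs more distinct) produces faster decisions:
fast = mean_decision_time(0.4)
slow = mean_decision_time(0.1)
```

This is only a sketch of the mechanism, not a fitted model of the response-time data.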
4.5. Alternative Classifications
Our classification through an FMM follows the supervised approach because our experiment was designed to disentangle four specific behavioral rules. It specifies noisy behavioral rules where error rates can be estimated because those provide a measure of the accuracy of the classification. We conduct the estimation at the individual level, where the FMM is estimated separately for every participant, because we are especially interested in heterogeneity across and within individuals. Unfortunately, all three choices make the analysis computationally intensive.
The online appendices report a number of additional alternative classifications. First, we implemented an unsupervised approach, which is detailed in Online Appendix A. In this approach, we compared models with two, three, or four components without specifying their characteristics. The results show that the unsupervised approach is not appropriate for our data because it loses track of the individual and essentially tries to fit the data through rules that always bet on black or always bet on white.^{13}
Second, we estimated an FMM at the aggregate level, that is, as if the entire data set came from a representative decision maker. This analysis (assuming noisy behavioral rules as in our main analysis) is detailed in Online Appendix B. As in the model described previously, this estimation puts the largest weights on Bayesian behavior and model-free reinforcement, with a sizeable weight on inertia (18.5%). However, the error rates are very high for all rules (close to 50%), rendering them essentially indistinguishable from random behavior. Again, this suggests that this kind of analysis is inappropriate for our data. The reason is that the aggregate-level analysis treats all observations as equivalent and effectively ignores heterogeneity across individuals.
Third, we conducted a number of model comparisons estimating the model at the aggregate level, assuming mistake-free rules and varying the number of considered rules (two, three, or four; see Online Appendix C). The best model (according to criteria that penalize additional parameters) consists of only Bayes’ rule and (model-free) reinforcement. However, it is difficult to evaluate the fit of the classification in the absence of estimated error rates.
Fourth, as an additional model comparison, we estimated an FMM with noisy rules and at the individual level but only three types: Bayes’ rule, reinforcement, and a pure-noise rule that simply randomizes between both alternatives. This estimation was very inaccurate, with conditional means between 55% and 61% (see Online Appendix D for details).
5. Temporal Dynamics and Hidden Markov Modeling
The finite mixture approach detailed in the previous section has shown that there is substantial heterogeneity across individuals, with half of our sample mostly following error-prone approximations of Bayes’ rule and the other half mostly following non-Bayesian rules. Thus far, however, we have ignored the temporal dimension. In particular, the previous analysis does not rule out possible dynamic relations across behavioral rules.^{14} For example, thus far, we do not know whether a Bayesian choice might be more or less likely after having relied on the reinforcement rule. To answer these questions, this section investigates temporal effects by relying on a simple machine learning methodology: hidden Markov modeling (HMM). While the previous section concentrated on identifying the behavioral rules that subjects most likely followed during the entire experiment (interindividual heterogeneity), in this section, we focus on possible dynamic patterns where decision makers may switch from one behavioral rule to another over time (intraindividual heterogeneity).
5.1. HMMs
To study the temporal interplay between behavioral rules, we need to classify each choice for each subject and trial, based on the rule the subject most likely followed in that trial, while conditioning on the most likely behavior in previous rounds of the experiment. With this aim, we propose a different model specification that goes beyond the finite mixture modeling approach used thus far.
We rely on a simple machine learning algorithm known as HMM (MacDonald and Zucchini 1997, Bhar and Hamori 2004, Frühwirth-Schnatter et al. 2019). The basic idea is as follows. Because we aim to investigate the dynamic dependence across different behavioral rules, we cannot assume that the latent variables are independently and identically distributed, as in finite mixture models. Rather, an HMM postulates that the latent variables are connected through a Markov chain. Specifically, in our setting, each state of the Markov chain represents a behavioral rule. Because we cannot directly observe which rule a participant actually uses in a given trial, these states are “hidden.” Thus, instead of estimating static probabilities for each state, one needs to estimate transition probabilities across them.
Specifically, let the state of subject j’s Markov chain at time t be described by ${X}_{t}^{j}\in \{1,2,3,4\}$, where ${X}_{t}^{j}=k$ means that, at time t, the subject follows the noisy behavioral rule $p_k$ given by (1). That is, as in Section 4, we maintain our assumption that behavioral rules are noisy and mistakes are possible. In particular, a state does not correspond to a deterministic behavioral rule but rather to its noisy counterpart, and thus we need to estimate the corresponding error probability for each state (which, in HMM terms, is the complement of the emission probability).
For our data, HMM reduces to the estimation of the individual-level, state-dependent error rates ${\epsilon}_{k}^{j}$, $k=1,2,3,4$, $j=1,\dots ,268$, and the stationary transition probabilities

$P_{kl}^{j}=\text{Prob}(X_{t+1}^{j}=l \mid X_{t}^{j}=k), \quad k,l\in \{1,2,3,4\}.$
Following the HMM literature (Rabiner 1990, Durbin et al. 1998), we adopt a standard approach to estimate the parameters of the HMM, namely a special case of expectation maximization (Dempster et al. 1977) known as the Baum-Welch algorithm (Baum et al. 1970). This algorithm is also known as the forward-backward (or α, β, γ) algorithm because it loops among three well-differentiated tasks. Intuitively, the first task is to efficiently evaluate the probability of the observations given the model (following a top-down procedure, in our case from first to last trial). The second task is to identify the optimal state sequence, that is, which state sequence among all theoretically possible ones is most likely to have occurred (now following a bottom-up or last-to-first procedure). The third task is to adjust the model’s parameters in a meaningful way, given the most likely sequence and the estimated probability of the observations given the model. We refer the reader to Frühwirth-Schnatter (2006) or Frühwirth-Schnatter et al. (2019) for details on the algorithm. We declare that the algorithm has converged when the change in log likelihood, normalized by the total number of possible sequences and by sequence length, is smaller than $10^{-7}$, as is standard in the literature (Bhar and Hamori 2004, Mamon and Elliott 2014). To determine the initial values for the algorithm, we used the values (${\eta}_{k}^{j}$ and ${\epsilon}_{k}^{j}$) obtained in the FMM of Section 4.2 (and a flat prior for the two subjects for whom that estimation did not converge).
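For concreteness, the first two tasks (the E-step of the Baum-Welch loop) can be sketched for a generic discrete-emission HMM. This is a minimal illustration with per-step scaling to avoid numerical underflow, not the exact implementation used here; the M-step, which re-estimates the transition and error parameters from the resulting posteriors, is omitted.

```python
from math import log

def forward_backward(obs, pi, trans, emit):
    """E-step of Baum-Welch for a discrete-emission HMM, with scaling.
    obs[t]: observation index; pi[k]: initial state probability;
    trans[k][l]: transition probability; emit[k][o]: emission probability.
    Returns the per-trial state posteriors (gamma) and the log-likelihood."""
    T, K = len(obs), len(pi)
    # forward pass (first to last trial), scaled so each alpha[t] sums to 1
    alpha, scale = [], []
    a0 = [pi[k] * emit[k][obs[0]] for k in range(K)]
    scale.append(sum(a0))
    alpha.append([v / scale[0] for v in a0])
    for t in range(1, T):
        at = [emit[k][obs[t]] * sum(alpha[t - 1][j] * trans[j][k] for j in range(K))
              for k in range(K)]
        scale.append(sum(at))
        alpha.append([v / scale[t] for v in at])
    # backward pass (last to first trial), using the same scale factors
    beta = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for k in range(K):
            beta[t][k] = sum(trans[k][j] * emit[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(K)) / scale[t + 1]
    # state posteriors: gamma[t][k] proportional to alpha[t][k] * beta[t][k]
    gamma = []
    for t in range(T):
        g = [alpha[t][k] * beta[t][k] for k in range(K)]
        s = sum(g)
        gamma.append([v / s for v in g])
    return gamma, sum(log(s) for s in scale)

# Tiny two-state, binary-observation example:
pi = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = [[0.8, 0.2], [0.2, 0.8]]
gamma, loglik = forward_backward([0, 0, 1], pi, trans, emit)
```

In our setting, the states would be the four noisy behavioral rules and the emission probabilities would be built from the rule prescriptions and the error rates, one model per participant.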
5.2. Temporal Dynamics: Estimation Results
We estimated the parameters of the HMM for all 268 participants, thereby obtaining a transition probability matrix and a vector of error terms for the hidden states for each individual. From these matrices, we computed all individual invariant distributions (μ such that $\mu =\mu P$), which, by the fundamental theorem of Markov chains, summarize the long-run probabilities of the states and, by the ergodic theorem, reflect the average time that the system spends at each state (Karlin and Taylor 1975).^{15}
Table 3 displays the average results of the estimation for this type of analysis, pooling across incentive treatments. The From column indicates the behavioral rule most likely followed in the previous trial. The columns under the To label indicate the behavioral rule to which the Markov process transitions. That is, the given matrices represent average transition probability matrices. The last column of the table (Error) indicates the estimated probability of making an error (the complement of the emission probability) at each state. The last row of the table (Invariant) reports the probabilities in the corresponding invariant distribution.

Temporal Dynamics: Individual Averages
From  To  

Bayes  RL  Inertia  Non-updating  Error  
Bayes  42.75%  34.73%  14.63%  7.88%  12.74% 
(22.53)  (17.70)  (21.60)  (13.02)  (13.02)  
RL  42.63%  34.86%  14.56%  7.96%  22.85% 
(22.56)  (17.66)  (21.80)  (12.90)  (15.35)  
Inertia  42.33%  35.52%  14.62%  7.53%  39.21% 
(22.52)  (18.70)  (21.69)  (12.54)  (11.64)  
Non-updating  42.64%  34.82%  14.88%  7.66%  42.93% 
(24.91)  (20.89)  (22.09)  (22.09)  (23.20)  
Invariant  42.64%  34.90%  14.62%  7.84% 
Notes. Standard deviation is in parentheses. n = 268.
Even at this level of aggregation, the results of the temporal dynamics analysis suggest an oscillatory pattern in behavior. This is made clear by Figure 6. In this and the following figures, circles correspond to the states (rules of behavior) and arrows to transitions. The areas of the circles and the thickness of the arrows are proportional to the weights in the corresponding invariant distributions and to the transition probabilities, respectively. Participants gravitate mostly toward Bayes’ rule and reinforcement and continue using those rules in most cases, but transition probabilities across rules are typically large, especially between Bayes’ rule and reinforcement. We observe that, at the aggregate level, the rows of the transition probability matrix are similar to each other, indicating that the average dynamics might be close to an independent and identically distributed (i.i.d.) process. This, however, is not necessarily true at the individual level. Individual transition probability matrices display considerable variance and are often quite far from reflecting i.i.d. processes. Online Appendix F provides a more detailed examination of this point, including a measure of heterogeneity in transition probability matrices (Figure F.1) and several examples at the individual level (Table F.1).
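The invariant distributions reported in Table 3 (and in the individual-level analysis) can be recovered from the transition matrices by power iteration on $\mu = \mu P$. The sketch below applies this to the rounded average matrix of Table 3 and assumes ergodicity of the chain:

```python
def invariant_distribution(P, tol=1e-12, max_iter=10000):
    """Long-run distribution mu solving mu = mu * P by power iteration;
    assumes the chain is ergodic, so the limit is unique."""
    K = len(P)
    P = [[x / sum(row) for x in row] for row in P]   # renormalize rows
    mu = [1.0 / K] * K
    for _ in range(max_iter):
        nxt = [sum(mu[k] * P[k][l] for k in range(K)) for l in range(K)]
        done = max(abs(a - b) for a, b in zip(nxt, mu)) < tol
        mu = nxt
        if done:
            break
    return mu

# Rounded average transition matrix of Table 3; rows/columns ordered as
# Bayes, reinforcement, inertia, non-updating.
P = [[0.4275, 0.3473, 0.1463, 0.0788],
     [0.4263, 0.3486, 0.1456, 0.0796],
     [0.4233, 0.3552, 0.1462, 0.0753],
     [0.4264, 0.3482, 0.1488, 0.0766]]
mu = invariant_distribution(P)
```

The result closely matches the Invariant row of Table 3 (42.64%, 34.90%, 14.62%, 7.84%); the dynamic classification of Section 5.4 then simply picks the state with the largest invariant weight.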
5.3. Temporal Dynamics and Incentives
Table 4 displays the average results of the HMM estimation (transition probability matrices, state-dependent error rates, and invariant distributions) separately by treatment (low versus high incentives), following the conventions in Table 3. Figure 7 depicts the results, again separately by treatment (conventions are as in Figure 6). The overall pattern is confirmed, with participants mostly relying on Bayes’ rule and reinforcement but displaying relatively large transition probabilities. However, the effects of incentives are now clearer.

Temporal Dynamics by Treatment: Individual Averages
Low incentives  High incentives  

From  To  From  To  
Bayes  RL  Inertia  Non-updating  Error  Bayes  RL  Inertia  Non-updating  Error  
Bayes  44.10%  31.68%  15.12%  9.10%  11.36%  Bayes  41.29%  38.07%  14.09%  6.56%  14.01% 
(24.73)  (16.58)  (22.55)  (14.99)  (12.39)  (19.84)  (18.33)  (20.54)  (10.34)  (13.49)  
RL  44.11%  31.62%  14.91%  9.37%  23.30%  RL  41.00%  38.40%  14.17%  6.42%  22.44% 
(24.93)  (16.62)  (22.47)  (15.17)  (15.76)  (19.60)  (18.13)  (21.13)  (9.66)  (15.01)  
Inertia  43.13%  32.79%  15.25%  8.83%  39.45%  Inertia  41.45%  38.49%  13.93%  6.12%  38.98% 
(24.41)  (17.36)  (22.67)  (14.72)  (13.11)  (20.32)  (19.70)  (20.64)  (9.46)  (10.15)  
Non-updating  44.28%  31.26%  15.49%  8.97%  43.71%  Non-updating  40.85%  38.72%  14.22%  6.22%  42.21% 
(26.60)  (18.57)  (23.25)  (15.46)  (24.15)  (22.89)  (21.81)  (20.81)  (9.83)  (22.36)  
Invariant  43.97%  31.79%  15.11%  9.13%  Invariant  41.17%  38.30%  14.11%  6.42% 
Notes. Standard deviation is in parentheses. n = 128 for low incentives and n = 140 for high incentives.
Specifically, comparing the two conditions, we observe different dynamics among behavioral rules. The most striking result is that increasing incentives shifts behavior toward reinforcement and away from Bayes’ rule. After using Bayes’ rule, participants under high incentives (compared with low incentives) are less likely to stay with it and use it again (low, 44.10%; high, 41.29%; MWW, n = 268, z = 2.959, p = 0.003) and more likely to switch to reinforcement (low, 31.68%; high, 38.07%; MWW, n = 268, z = −4.329, p < 0.0001). In contrast, after using reinforcement, participants under high incentives are more likely to stay with this rule (low, 31.62%; high, 38.40%; MWW, n = 268, z = −4.749, p < 0.0001) and less likely to switch to Bayes’ rule (low, 44.11%; high, 41.00%; MWW, n = 268, z = 2.927, p = 0.003).^{16}
In other words, we observe a significant treatment effect in the temporal dynamics in the form of a clear shift toward reinforcement and away from Bayes’ rule with larger incentives. This effect can only be observed clearly by examining the temporal dynamics. Comparing individual invariant distributions, we observe that, in accordance with the previous results, the overall probability of relying on reinforcement is larger under high incentives (low, 31.79%; high, 38.30%; MWW, n = 268, z = −4.756, p < 0.0001). The long-run probability of relying on Bayes’ rule is smaller under high incentives, but the difference is not significant (low, 43.97%; high, 41.17%; MWW, n = 268, z = 1.018, p = 0.310).
These results have a twofold interpretation. On the one hand, they speak in favor of a ceiling effect of incentives, whereby a higher reward for each correct choice neither decreases the error rates nor increases the overall reliance on Bayes’ rule. On the other hand, higher incentives seem to increase the appeal of simple reinforcement. The latter is compatible with the EEG study by Achtziger et al. (2015), which pointed out that increasing incentives increases the salience of the received feedback and makes simple reinforcement processes more prominent: when monetary incentives are increased, the win-lose cues that activate reinforcement processes become more salient, and hence reliance on reinforcement increases.
We remark that these results can only be obtained once we rely on the HMM analysis. The previous FMM analysis neglects the temporal dynamics, and it is only by allowing for the latter that a clearer picture of the effects of incentives on actual behavior starts to emerge. We will further examine heterogeneity in the reaction to incentives in this context in Section 5.5.
5.4. Dynamic Classification
Our FMM analysis in Section 4.2 allowed us to classify participants according to the rule they mostly followed under an i.i.d. assumption (Figure 3). Because our HMM analysis delivers the individual invariant distributions, which summarize the long-run proportion of time spent in each state, that is, using each rule, we can derive an alternative classification from those invariant distributions, dispensing with the i.i.d. assumption. Specifically, we classify individuals based on the state displaying the largest probability in their respective individual invariant distributions. That is, similarly to Section 4.2, a subject is considered Bayesian if Bayes’ rule is given the largest probability in the invariant distribution derived from her individual transition probability matrix.
The result of this classification turns out to be empirically identical to the one obtained in Section 4.2 and depicted in Figure 3. That is, although individual weights for the hidden states are quite different from the rule weights obtained in the FMM, every single individual retains the same classification in the HMM as in the FMM case, except for the two previously unclassified subjects (who are now classified as non-updaters). That is, the previous classification according to the most-used behavioral rules turns out to be stable. Although the objective of the HMM analysis is to examine the temporal dynamics and not this derived classification, we view the fact that the latter agrees with the one derived in the previous section as a validation of the approach.
Table 5 summarizes the results of the HMM classification. Although the classification in terms of the rule followed most of the time is almost identical to the one derived from the FMM, the conditional means are quite different. In the FMM case, the conditional means were above 97% for Bayesians and reinforcers and above 84% for non-updaters and subjects relying mostly on inertia. In the HMM classification, by contrast, conditional means have a different interpretation because they are derived from the invariant distributions. As Table 5 shows, in this case the conditional means lie between 51% and 65%. That is, when one neglects the possibility of dynamic dependence among behavioral rules, the FMM classification reliably identifies the most-used rule for each individual but does not offer a good explanation for deviations from that rule at the individual level. Taking the temporal dynamics into account, the HMM classification instead uncovers heterogeneity within each individual: subjects who mostly follow one behavioral rule also rely on other rules a substantial proportion of the time, as reflected by the weights in the invariant distribution. The conditional means in the HMM case thus reveal that within-individual heterogeneity in behavioral rules is of sizeable magnitude.

Table 5. Estimation Summary: HMM Conditional on Subjects Classified as Mostly Following the Respective Behavioral Rule

Behavioral rule   Classified as   Percentage   Cond. mean      Cond. error rate
Bayes             130             48.51        64.78 (5.51)    10.07 (11.39)
Reinforcement      75             27.99        59.28 (5.25)    24.33 (16.17)
Inertia            44             16.42        62.52 (3.31)    39.18 (10.98)
Non-updating       19              7.09        51.44 (10.14)   35.16 (18.51)

Notes. Standard deviations are in parentheses. n = 268.
5.5. Heterogeneity in the Effects of Incentives
We now refine the results on the effects of incentives (Section 5.3) by accounting for heterogeneity, relying on the classification obtained previously. Table 6 displays the average transition probability matrices conditional on subjects classified as Bayesians (130) or non-Bayesians (138), respectively. As in Table 4, the rightmost columns list the estimated error rates, and the bottom rows detail the (average) invariant distributions. The two panels of Figure 8 give a graphical representation of the results. Each panel is analogous to Figure 7 but restricted to subjects classified as Bayesians or non-Bayesians, respectively. The significance of tests for each transition probability (high vs. low incentives) and the direction of any significant change (+/−) are highlighted.

Table 6. Temporal Dynamics Distinguishing Subjects Classified as Most Likely Following Bayes’ Rule from Others

Bayesian subjects (n = 130)
From \ To       Bayes           RL              Inertia         Non-updating    Error
Bayes           64.85 (5.55)    30.15 (5.60)     2.97 (1.10)     2.02 (1.13)    10.07 (11.39)
RL              64.74 (6.24)    30.35 (6.16)     2.91 (13.44)    2.00 (1.18)    22.18 (14.95)
Inertia         63.41 (10.49)   31.61 (10.23)    3.21 (3.55)     1.77 (2.54)    39.44 (12.73)
Non-updating    64.93 (15.44)   29.90 (15.21)    3.17 (5.19)     2.01 (3.37)    45.16 (23.92)
Invariant       64.78           30.25            2.97            2.01

Non-Bayesian subjects (n = 138)
From \ To       Bayes           RL              Inertia         Non-updating    Error
Bayes           21.94 (7.79)    39.05 (23.29)   25.61 (25.65)   13.41 (16.30)   14.38 (14.27)
RL              21.79 (7.33)    39.10 (23.12)   25.53 (25.98)   13.58 (16.05)   23.50 (15.75)
Inertia         22.47 (8.14)    39.20 (23.56)   25.37 (25.79)   12.96 (15.46)   38.98 (10.53)
Non-updating    21.64 (8.33)    39.46 (23.57)   25.92 (25.93)   12.98 (16.32)   40.79 (22.79)
Invariant       21.98           39.16           25.56           13.30

Notes. Entries are percentages; individual averages with standard deviations in parentheses.
Increasing incentives has different effects for different subjects. For those relying mostly on Bayes’ rule, higher incentives seem to be slightly detrimental, in the sense that the long-run probability of Bayes’ rule (in the invariant distribution) is smaller under high incentives (low, 70.24%; high, 59.65%; MWW, n = 130, z = 9.833, p < 0.001). This is mostly due to a decrease in the probability of staying with Bayes’ rule after using it (low, 70.17%; high, 59.84%; n = 130, z = 9.832, p < 0.001) and an increase in the probability of transitioning from Bayes’ rule to reinforcement (low, 24.75%; high, 35.24%; MWW, n = 130, z = −9.832, p < 0.001), in line with the increased appeal of reinforcement under high incentives mentioned previously. This is further supported by a similarly large increase in the probability of staying with the reinforcement rule after using it (low, 24.62%; high, 35.74%; MWW, n = 130, z = −9.813, p < 0.001) and a reduction in the probability of transitioning from reinforcement to Bayes’ rule (low, 70.50%; high, 59.32%; MWW, n = 130, z = 9.813, p < 0.001). We remark that the changes in probabilities across incentive levels are substantial, especially compared with those at the aggregate level.
For subjects classified as mostly following some non-Bayesian rule, the long-run probability of Bayes’ rule is slightly larger under high incentives than under low ones (low, 21.13%; high, 22.67%; n = 138, z = 2.816, p = 0.005). This results from a small increase in the probability of staying with Bayes’ rule after using it (low, 20.90%; high, 22.77%; n = 138, z = −1.814, p = 0.070). However, there is also a significant increase in the probability of switching from Bayes’ rule to reinforcement (low, 37.36%; high, 41.17%; n = 138, z = −2.596, p = 0.009). Furthermore, in line with the increased appeal of reinforcement under higher incentives, the probability of staying with this rule after using it is larger under high incentives than under low ones (low, 37.33%; high, 41.33%; MWW, n = 138, z = 2.673, p = 0.007).
Aggregating all 138 subjects classified as non-Bayesians, however, again loses valuable information. Online Appendix G reports the temporal dynamics (average transition matrices and invariant distributions; Table G.1 and Figure G.1) for all four types. Subjects classified as following a specific behavioral rule are generally quite consistent and exhibit large probabilities of returning to that rule after a deviation. However, the effects of incentives differ markedly across types, in particular between reinforcers and (non-Bayesian) non-reinforcers.
The majority of non-Bayesians are classified as reinforcers (75). For those, increasing incentives results in a large increase in the long-run probability of model-free reinforcement (low, 54.89%; high, 65.03%; MWW, n = 75, z = 9.834, p < 0.001) and a sizeable decrease in the probability of using Bayes’ rule (low, 30.08%; high, 20.09%; MWW, n = 75, z = 7.371, p < 0.001). This is accompanied by a large increase in the probability of switching from Bayes’ rule to reinforcement (low, 54.70%; high, 65.34%; n = 75, z = −7.220, p < 0.001) and a large reduction in the probability of transitioning from reinforcement to Bayes’ rule (low, 29.87%; high, 20.18%; MWW, n = 75, z = 7.370, p < 0.001).
For non-Bayesian subjects classified as mostly following inertia (44) or non-updating (19), results are quite different. Higher incentives lead to an increase in the long-run probability of Bayes’ rule (inertia: low, 14.95%; high, 25.11%; MWW, n = 44, z = 5.677, p < 0.001; non-updating: low, 9.81%; high, 14.86%; MWW, n = 19, z = 3.636, p < 0.001), accompanied by a decrease in the long-run probability of the subjects’ own respective rule (inertia: low, 65.16%; high, 59.72%; MWW, n = 44, z = 5.631, p < 0.001; non-updating: low, 59.95%; high, 40.07%; MWW, n = 19, z = 3.638, p < 0.001).^{17}
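The rank-sum z statistics reported throughout this section could be computed as follows; this is a minimal pure-Python sketch of the Mann–Whitney–Wilcoxon normal approximation (mid-ranks for ties, no tie correction in the variance), with hypothetical data, where in practice one would use a statistics package:

```python
import math

def mww_z(x, y):
    """Mann-Whitney-Wilcoxon rank-sum z statistic via the normal
    approximation; negative z means x tends to be smaller than y."""
    data = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = len(data)
    ranks = [0.0] * n
    i = 0
    while i < n:                     # assign average ranks to tied blocks
        j = i
        while j + 1 < n and data[j + 1][0] == data[i][0]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, g) in zip(ranks, data) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u1 - mu) / sigma

# Hypothetical per-subject long-run probabilities (illustration only).
low = [0.72, 0.68, 0.71, 0.69]
high = [0.61, 0.58, 0.63, 0.60]
z = mww_z(low, high)  # positive: low-incentive values tend to be larger
```

The sign convention matches the reported tests: positive z indicates the first sample stochastically dominates the second.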
These results clarify the mechanisms linking incentives to performance in our paradigm while taking behavioral heterogeneity into account. The effect of increasing incentives is double-edged. On the one hand, higher incentives do have a positive, presumably motivational effect for some subjects, leading to a higher reliance on Bayes’ rule. On the other hand, higher incentives seem to generally increase the reliance on reinforcement, in agreement with an increased salience of the win-lose cues that activate that behavioral rule (Section 5.3).
The overall picture is hence as follows. For subjects classified as mostly Bayesian, low incentives suffice to spark the use of Bayes’ rule, presumably because they provide enough motivation to overcome its cognitive costs. For those subjects, an increase in incentives produces no improvement, suggesting a ceiling effect. The effects might even be slightly detrimental, in line with the interpretation that higher incentives make model-free reinforcement more appealing. For subjects classified as mostly reinforcers, the win-lose monetary cues spark reliance on reinforcement already under low incentives, and an increase in incentives simply enhances those cues and is hence mostly detrimental. For subjects classified neither as Bayesians nor as reinforcers, low incentives do not suffice to trigger the use of Bayes’ rule, but increasing incentives creates a significant shift toward it, which is of large magnitude in relative terms. Although their performance is generally worse than that of Bayesians, these subjects do respond to incentives as standard economic theory would suggest.
Thus, the analysis of the temporal dynamics sheds light on the mechanisms underlying the effects of incentives on performance in belief-updating tasks and specifically shows that those effects are highly heterogeneous. Furthermore, our HMM analysis shows that the underlying heterogeneity is well captured by the view that reinforcement is the main driver of deviations from Bayesian updating, with alternative rules playing a comparatively small role.
6. Conclusion
We designed a novel belief-updating experimental paradigm to disentangle alternative rules of behavior when an existing prior can be updated in the face of new information and that information carries win-lose feedback, as is often the case for financial and managerial decisions. The analysis shows that, in such cases, model-free reinforcement is the main driver of deviations from Bayesian updating, with alternative rules such as decision inertia or non-updating playing a smaller role.
We use a multilayered identification strategy. First, applying finite mixture models, we find large levels of heterogeneity, with around half of the population relying mostly on Bayesian updating, over a quarter on reinforcement, and the rest on the remaining rules. At this level of analysis, increasing incentives results in a performance increase for subjects classified as non-Bayesians, whereas the evidence for Bayesians is compatible with ceiling effects.
Second, we use HMMs to examine the temporal dynamics across different behavioral rules. We find considerable heterogeneity within individuals, with most of them relying on several different behavioral rules over time, especially Bayes’ rule and model-free reinforcement. Thus, even subjects classified as Bayesians are only “part-time Bayesians.” The analysis of these patterns shows significant effects of incentives with relatively large probability shifts. Subjects who rely mostly on reinforcement increase the frequency of this rule under higher incentives. Subjects who rely mostly on Bayes’ rule experience detrimental effects of incentives, due to an analogously increased frequency of reinforcement. These results might reflect a “reinforcement paradox,” where model-free reinforcement is triggered more often when the monetary value attached to the win-lose cues increases. Last, subjects who rely on either inertia or non-updating benefit from increased incentives, as they react by shifting toward Bayes’ rule, improving their performance.
Our results go beyond the wellknown fact that human decision makers are not Bayesian. On the one hand, decision makers can be fruitfully classified according to the decision rules they mostly rely on. Furthermore, the resulting heterogeneity extends to the effects of financial incentives. Although some decision makers experience ceiling effects or even detrimental consequences, others do react positively to incentives. On the other hand, decision makers exhibit nontrivial temporal patterns and rely on different behavioral rules over time. When it comes to belief updating, it is not only that one size does not fit all, but rather, that one size might not even fit one.
^{1} Charness and Levin (2005) provide an intuitive example illustrating reinforcement behavior in managerial decisions. Imagine a rookie employee is sent to close a business deal because a more experienced negotiator is unavailable and achieves good results. Next time a similar deal has to be closed, arguments like “never change a winning horse” might prompt the CEO to send the rookie again. This, however, neglects the possible informational content of the previous outcome, which might imply that a more experienced negotiator could achieve even better results.
^{2} Received belief-updating tasks from the decision-making literature are typically simpler and insufficient for our purposes. For example, in the task of Charness and Levin (2005), Achtziger and Alós-Ferrer (2014), and Achtziger et al. (2015), in half of the possible cases the Bayesian prescription is “obvious” (and error rates are extremely low) because it coincides with the prescription of reinforcement, whereas in the remaining cases the two rules conflict (and error rates are extremely high). Furthermore, the task produces only four possible decision cases in total.
^{3} The design is related to Asparouhova et al. (2015), who investigate belief updating based on sampling without replacement. As in our case, in that paradigm, the first draw is of little interest in itself.
^{4} For example, after observing a white ball in the first draw of a four-ball trial, a Bayesian should update the probabilities of the three urns to 1/2, 1/3, and 1/6, respectively; hence, the probability of extracting a black ball in the second draw, given that there is one white ball less in the urn, is (1/3)·(1/2) + (2/3)·(1/3) + 1·(1/6) = 5/9 > 1/2, leading to an optimal bet on black. In contrast, if a white ball is extracted in the first draw of a six-ball trial, the updated probabilities of the urns are 5/9, 3/9, and 1/9, respectively, and the probability of a black ball in the second draw is (1/5)·(5/9) + (3/5)·(3/9) + 1·(1/9) = 19/45 < 1/2, leading to an optimal bet on white.
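The footnote’s arithmetic can be verified exactly with Python’s fractions module. The urn compositions used below (3W1B, 2W2B, 1W3B for four-ball trials; 5W1B, 3W3B, 1W5B for six-ball trials, with equal priors) are inferred from the stated posteriors rather than quoted from the design section:

```python
from fractions import Fraction as F

def posterior_and_prediction(whites, total):
    """Posterior over the three urns after a white first draw, and
    the predictive probability of black on the second draw (sampling
    without replacement)."""
    priors = [F(1, 3)] * 3
    p_white = [F(w, total) for w in whites]
    post = [p * q for p, q in zip(priors, p_white)]
    s = sum(post)
    post = [p / s for p in post]
    # one white ball removed from whichever urn was actually used
    p_black_next = [F(total - w, total - 1) for w in whites]
    return post, sum(p * q for p, q in zip(post, p_black_next))

post4, pred4 = posterior_and_prediction([3, 2, 1], 4)  # 1/2, 1/3, 1/6
post6, pred6 = posterior_and_prediction([5, 3, 1], 6)  # 5/9, 3/9, 1/9

# pred4 = 5/9 > 1/2 (bet black); pred6 = 19/45 < 1/2 (bet white)
```

Exact rational arithmetic avoids any rounding ambiguity when checking that 5/9 and 19/45 fall on opposite sides of 1/2.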
^{5} We implemented a between-subject treatment of incentives, as is common in the literature (Barron 2021), to avoid order effects and other confounds.
^{6} Specifically, the pay-all mechanism is incentive compatible whenever subjects’ preferences fulfill a condition called “no complementarities at the top” when evaluating bundles of outcomes (Azrieli et al. 2018). In our case, this assumption is immediately fulfilled as long as participants prefer more money over less (but it might impose stronger restrictions in more complex environments). In the second decision within every trial, one option is dominated by the other, and participants always have a strict incentive to choose the urn they believe to be more likely. Then, choosing the correct option in all trials dominates any other bundle of choices, making this payment mechanism incentive compatible according to Azrieli et al. (2018).
^{7} The alternative unsupervised approach attempts to identify a predetermined number of different types of individuals in the population. In Online Appendix A (see Section 4.5), we implement an unsupervised approach and show that this analysis is inappropriate for our data.
^{8} Online Appendix A (see Section 4.5) reports an additional analysis at the aggregate level.
^{9} Online Appendices C and D (see Section 4.5) report on the estimation of an FMM with only three (noisy) rules and of FMMs at the aggregate level with varying numbers of mistakefree rules.
^{10} Additionally, there is no correlation between the self-reported difficulty of the task and the classification (measured on a Likert scale [0, 10], with 10 being “very difficult”; mean 2.78, median 2; Spearman’s correlation: Bayes, n = 266, ρ = 0.011, p = 0.860; reinforcement, n = 266, ρ = 0.017, p = 0.777).
^{11} A similar result is obtained by comparing the estimated error rates for Bayes’ rule, $\epsilon_1^j$, for non-Bayesian subjects across incentive levels. The average estimated error rate is 16.48% under low incentives and 13.38% under high incentives (MWW, n = 136, z = 1.849, p = 0.064).
^{12} Alós-Ferrer et al. (2021a) rely on the chronometric effect to obtain results on preference revelation in the absence of assumptions on underlying utility noise.
^{13} We further applied the unsupervised approach to a simulated data set in which all subjects switch uniformly across our four rules (the data set is described in Online Appendix E). Again, the approach fits the data to rules that always bet on the same color.
^{14} Subjects classified as mostly Bayesian in the FMM made fewer correct choices (55.03%) in four-ball decisions, where reinforcement prescribes an error, than in six-ball decisions, where reinforcement points to the correct response (65.21%; WSR test, n = 130, z = 2.155, p = 0.031). This would be compatible with those subjects relying on reinforcement as an alternative rule at least part of the time. The FMM classification, however, does not reflect this observation, in view of the large conditional means in Table 1.
^{15} Online Appendix E reports a parameter recovery exercise, where the data were simulated with a known generating process.
^{16} Increasing incentives decreases the probability of staying with non-updating (low, 8.97%; high, 6.22%; MWW, n = 268, z = 2.141, p = 0.032) but not with inertia (low, 15.25%; high, 13.93%; MWW, n = 268, z = 0.801, p = 0.423).
^{17} For subjects relying mostly on inertia, there was no significant difference in the probability of reinforcement (low, 10.21%; high, 9.03%; MWW, n = 44, z = 0.530, p = 0.604). For subjects relying mostly on non-updating, higher incentives increased the reliance on reinforcement (low, 25.00%; high, 29.78%; MWW, n = 19, z = 3.635, p < 0.001).
References
(2014) Fast or rational? A response-times study of Bayesian updating. Management Sci. 60(4):923–938.
(2015) Higher incentives can impair performance: Neural evidence on reinforcement and rationality. Soc. Cognitive Affective Neurosci. 10(11):1477–1483.
(2014) Autonomous mechanism of internal choice estimate underlies decision inertia. Neuron 81(1):195–206.
(2021) Cognitive sophistication and deliberation times. Experiment. Econom. 24(2):558–592.
(2022) Strength of preference and decisions under risk. J. Risk Uncertainty 64(3):309–329.
(2021a) Time will tell: Recovering preferences when choices are noisy. J. Political Econom. 129(6):1828–1877.
(2016) Inertia and decision making. Frontiers Psych. 7(169):1–9.
(2021b) Effortful Bayesian updating: A pupil-dilation study. J. Risk Uncertainty 63(1):81–102.
(2012) Dynamic learning in behavioral games: A hidden Markov mixture of experts approach. Quant. Marketing Econom. 10(4):475–503.
(2015) Asset pricing and asymmetric reasoning. J. Political Econom. 123(1):66–122.
(2018) Incentives in experiments: A theoretical analysis. J. Political Econom. 126(4):1472–1503.
(1998) A model of investor sentiment. J. Financial Econom. 49(3):307–343.
(2021) Belief updating: Does the good-news, bad-news asymmetry extend to purely financial domains? Experiment. Econom. 24(1):31–58.
(1970) A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann. Math. Statist. 41(1):164–171.
(2011) Separate encoding of model-based and model-free valuations in the human brain. Neuroimage 58(3):955–962.
(2008) Measuring inequity aversion in a heterogeneous population using experimental decisions and subjective probabilities. Econometrica 76(4):815–839.
(2010) Portfolio inertia and stock market fluctuations. J. Money Credit Bank. 42(4):715–742.
(2018) The many faces of human sociality: Uncovering the distribution and stability of social preferences. J. Eur. Econom. Assoc. 17(4):1025–1069.
(2010) Risk and rationality: Uncovering heterogeneity in probability distortion. Econometrica 78(4):1375–1412.
(1987) Do biases in probability judgment matter in markets? Experimental evidence. Amer. Econom. Rev. 77(5):981–997.
(2009) The allocation of time in decision-making. J. Eur. Econom. Assoc. 7(2–3):628–637.
(2005) When optimal choices feel wrong: A laboratory study of Bayesian updating, complexity, and affect. Amer. Econom. Rev. 95(4):1300–1309.
(2001) Cognition and behavior in normal-form games: An experimental study. Econometrica 69(5):1193–1235.
(1937) Affective value-distances as a determinant of aesthetic judgment-times. Amer. J. Psych. 50:57–67.
(1991) Variable metric method for minimization. SIAM J. Optim. 1(1):1–17.
(2011) Model-based influences on humans’ choices and striatal prediction errors. Neuron 69(6):1204–1215.
(2005) Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nature Neurosci. 8(12):1704–1711.
(1977) Maximum likelihood from incomplete data via the EM algorithm. J. Royal Statist. Soc. B 39(1):1–38.
(2012) The ubiquity of model-based reinforcement learning. Curr. Opin. Neurobiology 22(6):1075–1081.
(1998) Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, vol. 5 (Cambridge University Press, Cambridge, UK).
(1968) Conservatism in human information processing. Kleinmuntz B, ed. Formal Representation of Human Judgment (Wiley, New York), 17–52.
(1995) Are people Bayesian? Uncovering behavioral strategies. J. Amer. Statist. Assoc. 90(432):1137–1145.
(2019) Correlation neglect in belief formation. Rev. Econom. Stud. 86(1):313–332.
(2016) Learning and the economics of small decisions. Kagel J, Roth AE, eds. Handbook of Experimental Economics, vol. 2 (Princeton University Press, Princeton, NJ), 638–716.
(2020) Humans primarily use model-based inference in the two-stage task. Nature Human Behav. 4(10):1053–1066.
(2007) z-Tree: Zurich toolbox for ready-made economic experiments. Experiment. Econom. 10(2):171–178.
(1963) A rapidly convergent descent method for minimization. Comput. J. 6(2):163–168.
(2006) Finite Mixture and Markov Switching Models (Springer Series in Statistics, New York).
(2019) Handbook of Mixture Analysis (Chapman and Hall, New York).
(2016) The psychology and neuroscience of financial decision making. Trends Cognitive Sci. 20(9):661–675.
(2010) States vs. rewards: Dissociable neural prediction error signals underlying model-based and model-free reinforcement learning. Neuron 66(4):585–595.
(2015) Subject pool recruitment procedures: Organizing experiments with ORSEE. J. Econom. Sci. Assoc. 1:114–125.
(1980) Bayes rule as a descriptive model: The representativeness heuristic. Quart. J. Econom. 95:537–557.
(1992) Testing Bayes rule and the representativeness heuristic: Some experimental evidence. J. Econom. Behav. Organ. 17:31–57.
(2006) Optimal predictions in everyday cognition. Psych. Sci. 17(9):767–773.
(2013) Elicited beliefs and social information in modified dictator games: What do dictators believe other dictators do? Quant. Econom. 4(3):515–547.
(1972) Subjective probability: A judgment of representativeness. Cognitive Psych. 3:430–454.
(1975) A First Course in Stochastic Processes, 2nd ed. (Academic Press, San Diego).
(2007) Neural antecedents of financial decisions. J. Neurosci. 27(31):8174–8177.
(2010) Visual fixations and the computation and comparison of value in simple choice. Nature Neurosci. 13(10):1292–1298.
(2015) Rethinking fast and slow based on a critique of reaction-time reverse inference. Nature Comm. 6(7455):1–9.
(1997) Hidden Markov and Other Models for Discrete-Valued Time Series (Chapman & Hall/CRC Press, Boca Raton, FL).
(2014) Hidden Markov Models in Finance: Further Developments and Applications, vol. II (Springer, New York).
(2005) Stochastic choice and the allocation of cognitive effort. Experiment. Econom. 8(4):369–388.
(1967) Time required for judgements of numerical inequality. Nature 215(5109):1519–1520.
(1978) The importance of being conservative: Some reflections on human Bayesian behaviour. British J. Math. Statist. Psych. 31:33–48.
(2015) Learning about unstable, publicly unobservable payoffs. Rev. Financial Stud. 28(7):1874–1913.
(1970) Revision of opinion and decision times in an information-seeking task. J. Experiment. Psych. 83(3):400–405.
(1990) A tutorial on hidden Markov models and selected applications in speech recognition. Setti G, ed. Readings in Speech Recognition (Elsevier, Amsterdam, Netherlands), 267–296.
(2004) Hidden Markov Models: Applications to Financial Economics (Springer Science & Business Media, Berlin).
(1984) Mixture densities, maximum likelihood, and the EM algorithm. SIAM Rev. 26(2):195–239.
(1992) Status-quo and omission biases. J. Risk Uncertainty 5(1):49–61.
(1997) A neural substrate of prediction and reward. Science 275(5306):1593–1599.
(2012) Procuring commodities: First-price sealed-bid or English auctions? Marketing Sci. 31(2):317–333.
(2015) A hidden Markov model for the detection of pure and mixed strategy play in games. Econometric Theory 31(4):729–752.
(2005) Irrational Exuberance, 2nd ed. (Princeton University Press, Princeton, NJ).
(2000) Inefficient Markets: An Introduction to Behavioural Finance (Oxford University Press, Oxford, UK).
(2018) The BCD of response time analysis in experimental economics. Experiment. Econom. 21(2):383–433.
(1998) Reinforcement Learning: An Introduction (MIT Press, Cambridge, MA).
(2005) Advances in Behavioral Finance, vol. 2 (Princeton University Press, Princeton, NJ).
(1911) Animal Intelligence: Experimental Studies (MacMillan, New York).