Part-Time Bayesians: Incentives and Behavioral Heterogeneity in Belief Updating

Decisions in management and finance rely on information that often includes win-lose feedback (e.g., gains and losses, success and failure). Simple reinforcement then suggests blindly repeating choices that led to success in the past and changing them otherwise, which might conflict with Bayesian updating of beliefs. We use finite mixture models and hidden Markov models, adapted from machine learning, to uncover behavioral heterogeneity in the reliance on different behavioral rules across and within individuals in a belief-updating experiment. Most decision makers rely both on Bayesian updating and reinforcement. Paradoxically, an increase in incentives increases the reliance on reinforcement because the win-lose cues become more salient.

This paper was accepted by Gustavo Manso, finance.

Funding: C. Alós-Ferrer gratefully acknowledges financial support from the Deutsche Forschungsgemeinschaft under [Grant AL1169/4], part of the research unit “Psychoeconomics” (FOR 1882).

Supplemental Material: The data files and online appendices are available at

1. Introduction

Overwhelming evidence shows that human decision makers have a limited grasp of probabilities and, especially, of how beliefs should be updated in the face of new information. Previous research has identified a veritable catalogue of deviations from Bayesian updating, most of them taking the form of heuristics and biases that become relevant when certain informational triggers are present (Kahneman and Tversky 1972, Grether 1980, Camerer 1987). Numerous studies have demonstrated that those deviations affect and distort financial decisions (Shleifer 2000, Shiller 2005, Thaler 2005, Asparouhova et al. 2015, Frydman and Camerer 2016). In this work, we are interested in a kind of deviation from Bayesian updating that is particularly relevant for management and finance. Decisions in those fields often rely on information that includes win-lose feedback: gains and losses, success and failure, beating the competition or not, and so on. Indeed, as shown by Knutson and Bossaerts (2007), gains and losses are crucial to understand the neural foundations of financial decisions.

Whenever information carries a win-lose component, elementary reinforcement behavior (Thorndike 1911, Schultz et al. 1997, Sutton and Barto 1998) dictates repeating choices that led to success in the past and changing those that led to failure.1 This kind of simple “win-stay, lose-shift” behavior (Thorndike’s “law of effect”) is in line with model-free reinforcement learning, which has been studied in recent contributions in neuroscience as opposed to model-based (reinforcement) learning, which itself can be construed as incorporating Bayesian behavior (Daw et al. 2005, Doll et al. 2012, Feher da Silva and Hare 2020). For example, Daw et al. (2011) show that, in a two-stage Markov decision task, subjects’ choices seem to follow a mixture of model-based and model-free reinforcement learning. In a related task, Gläscher et al. (2010) find support for the coexistence of two distinct neural signatures corresponding to these two unique forms of learning. Crucially for our purposes, Beierholm et al. (2011) study switching between model-based and model-free reinforcement and estimate a high within-individual switching rate (of around one-third of the time).

We hypothesized that model-free reinforcement (win-stay, lose-shift behavior) is the main reason that financial actors often fail to conform to Bayesian updating in decision problems where new information allows prior beliefs to be updated and that information contains a win-lose component. In view of the neuroscience literature, we carried out a decision-making experiment to explore the idea that there is substantial heterogeneity across individuals, with some agents being “more Bayesian” than others, and within each given individual (as suggested by Beierholm et al. 2011). Specifically, our hypothesis was that decision makers mostly follow either noisy versions of Bayesian updating or model-free reinforcement, but we did not exclude a priori the involvement of other candidates that the literature has proposed as playing a potential role.

Specifically, we consider two additional behavioral phenomena. The first is conservatism bias or non-updating behavior, which reflects a general failure to update beliefs, and, taken to the extreme, identifies the posterior with the prior (Edwards 1968, Navon 1978, El-Gamal and Grether 1995). For example, in financial markets, Barberis et al. (1998) argued that conservatism might explain underreaction to news. The second is decision inertia, which refers to the repetition of previous choices independently of feedback (Pitz and Geller 1970, Alós-Ferrer et al. 2016). For instance, it has been shown that a large fraction of stock account owners exhibit portfolio inertia (Bilias et al. 2010). Although these phenomena represent broad behavioral tendencies, in the framework of our experiment, they will correspond to precisely defined behavioral rules, which we take as representative of behavior corresponding to neither Bayesian updating nor model-free reinforcement. We are particularly interested in these rules because they both capture feedback-independent deviations from Bayesian updating (hence contrasting with model-free reinforcement) involving low cognitive costs.

The second question we were interested in is whether, and if so, how our classifications would be affected by monetary incentives and their magnitude. On the one hand, increasing incentives should motivate decision makers to spend more cognitive effort on the task, thereby behaving more in accordance with the prescriptions of Bayes’ rule. On the other hand, reinforcement tendencies are triggered by win-lose cues, and if incentives are increased, both wins and losses become larger in magnitude, hence making the triggers more salient. In particular, in an experiment on belief updating, Achtziger and Alós-Ferrer (2014) showed that when the cue which triggers reinforcement is not paid, compared with when it is incentivized, participants rely less on win-stay/lose-shift. Similarly, Asparouhova et al. (2015) showed that, in a market situation, participants sometimes avoided belief updating under higher incentives, possibly for fear of making (costly) mistakes, relying on simpler behavioral rules instead. Thus, we will specifically ask how the reliance on different behavioral rules is affected by the magnitude of incentives.

To answer the questions posed previously, we collected behavioral data using a novel experimental paradigm designed to disentangle four decision rules in belief updating (Bayes’ rule, reinforcement, conservatism, and decision inertia) and varied the level of monetary incentives. To obtain a classification of decision makers and study its dependence on incentives, we follow two complementary approaches. First, we rely on finite mixture models (FMM) from statistics (Frühwirth-Schnatter 2006), which provide classifications on the basis of choice data accounting for heterogeneity, unobservable determinants of behavior, and rule-specific errors. These models have received increasing attention for the analysis of heterogeneity in economic behavior (Costa-Gomes et al. 2001, Bellemare et al. 2008, Bruhin et al. 2018, Barron 2021). In agreement with our hypothesis, we find that roughly half of our sample follows non-Bayesian rules of behavior, with model-free reinforcers being the most numerous group among those. The other half is classified as mostly following Bayes’ rule, but with significant error rates. Additionally, we find no performance improvement with higher incentives for Bayesian participants, but non-Bayesian decision makers do become significantly better. This suggests a heterogeneous effect of incentives. For half of our sample, low incentives suffice to spark cognitive deliberation and the use of noisy approximations of Bayes’ rule. For these individuals, higher incentives do not result in additional improvements, in agreement with a ceiling effect. The rest of the sample, however, follows mostly model-free reinforcement, resulting in a lower performance. For these participants, higher incentives do result in increased reliance on Bayes’ rule (but not enough to result in the subjects’ reclassification as mostly Bayesians) and hence an increase in performance.

Although this first strategy of analysis concentrates on heterogeneity across individuals, our second approach turns to heterogeneity within each individual and specifically the temporal dynamics of the data. We ask the question of whether a given decision maker might rely on different decision rules over time and whether incentives do affect the balance. For this purpose, we implement an identification strategy adapted from the machine learning literature. Specifically, we rely on hidden Markov models (HMM; Rabiner 1990, Frühwirth-Schnatter et al. 2019), which have been previously used in economics to, for example, identify switching among learning rules in repeated games (Ansari et al. 2012, Shachat et al. 2015) or to study bidding heuristics in auctions (Shachat and Wei 2012).

In our setting, the idea is that each behavioral rule corresponds to an unobservable (hence “hidden”) state of an individual-level Markov chain whose transition probabilities capture the dynamics of behavior over time. Thus, we estimate the transition probabilities, which determine the long-run probabilities of the behavioral rules. We find that most subjects exhibit relatively large probabilities for both Bayes’ rule and model-free reinforcement, as well as relatively large transition probabilities between those. Overall, a picture arises where some subjects rely mostly on Bayes’ rule even if incentives are low, but occasionally follow model-free reinforcement and other rules, and either hit a ceiling when incentives are increased or even suffer detrimental effects due to increased reliance on model-free reinforcement as the win-lose cues become more salient. Other subjects rely predominantly on model-free reinforcement but occasionally follow Bayes’ rule. For those, an increase in incentives is essentially detrimental. Last, some subjects mostly follow rules other than Bayes’ rule and (model-free) reinforcement and generally achieve low levels of performance. For those, an increase in incentives is beneficial because it increases the transition probabilities toward Bayes’ rule.

Our work follows on previous literature on the classification of decision makers in terms of behavioral rules in the domain of belief updating. This includes both the distinction between model-free and model-based reinforcement learning (Daw et al. 2005, Doll et al. 2012) and previous work using finite mixture models. For example, in a belief-updating experiment, El-Gamal and Grether (1995) showed that different decision makers favored different behavioral rules, but Bayes’ rule was the most frequently used at the population level. That is, humans generally fail to perfectly follow Bayes’ rule, but the latter is still a good representation of the behavior of a majority of people for a large proportion of the time. Of course, both observations are only compatible if there is heterogeneity in the reaction to new information and how beliefs are updated.

We contribute to the related literature on heterogeneity in belief updating in three ways. First, compared with existing paradigms (Charness and Levin 2005, Knutson and Bossaerts 2007), our new experimental task allows us to explicitly consider and disentangle more than two behavioral rules (and, in particular, to contrast model-free reinforcement learning with feedback-independent deviations from Bayesian updating that have been found to be relevant in the financial decision-making literature). Second, compared with previous classification contributions such as El-Gamal and Grether (1995), we take a step further in the identification of heterogeneity by investigating potential temporal dynamics, that is, allowing for within-subject heterogeneity in the reliance on different behavioral rules within the course of the experiment. Third, we explicitly target the effect of incentives on heterogeneity, that is, whether higher incentives affect reliance on one particular rule.

The most closely related contribution to ours is Payzan-LeNestour and Bossaerts (2015), which relied on a multiarmed bandit task. Participants mostly followed model-free reinforcement but switched to Bayesian learning when nudged into paying attention to crucial statistics of the environment. We share with this work a focus on the comparison between Bayesian behavior and model-free reinforcement and an interest in switching behavior. The main difference is that Payzan-LeNestour and Bossaerts (2015) evaluate which of different candidate models describes the data better, whereas we consider within-individual heterogeneity; that is, we examine to what extent decision makers are both Bayesians and reinforcers. Other differences concern the specific implementation and task complexity. For instance, our experimental task belongs to the class of static settings typically used to study heuristics and biases, where priors are reset after each relevant decision and each trial is independent of (and equivalent to) the others. In contrast, as in many contributions contrasting model-free and model-based reinforcement, the task of Payzan-LeNestour and Bossaerts (2015) is inherently dynamic, with the expected payoffs of the bandit’s arms changing over time.

Our contribution is further related to a strand of papers using a different, simpler paradigm contrasting win-stay, lose-shift behavior and Bayesian updating, where, as in our case, information is endowed with a win-lose cue. In this paradigm, Charness and Levin (2005) found very high error rates that would be consistent with model-free reinforcement behavior. By examining response times in the same paradigm, Achtziger and Alós-Ferrer (2014) argued that the data could be well explained by a dual-process model where simple reinforcement conflicts and interacts with more deliberative belief updating. Achtziger et al. (2015) examined neural evidence for simple reinforcement in this task using the electroencephalograph (EEG) and found that subjects with higher error rates under high incentives exhibited larger amplitudes in (extremely early) brain potentials linked to reinforcement learning. A possible interpretation in agreement with our results is that, even if larger monetary rewards increase effort, they also increase the salience of the win-lose cues which trigger reinforcement, hence creating a “reinforcement paradox” where higher incentives increase reliance on this alternative process instead of on Bayesian behavior. In line with this paradox, a pupil-dilation study (Alós-Ferrer et al. 2021b) has recently shown that higher incentives in this paradigm do increase cognitive effort while failing to result in generally increased performance.

The paper is structured as follows. Section 2 discusses the experimental design and the behavioral rules. Section 3 provides a descriptive overview of the data. Section 4 applies finite mixture modeling to obtain a classification of decision makers, studies the effects of incentives on that classification, and briefly discusses response times. Section 5 applies an HMM to study the temporal dynamics, the effects of incentives, and the differences across behavioral types. Section 6 concludes.

2. Design and Procedures

2.1. Behavioral Rules and Experimental Design

We designed a novel belief-updating paradigm with the explicit objective of disentangling different behavioral rules when new information carries a win-lose component. Because reinforcement is relevant for a wide range of conceptually different decisions, we developed a frame-free, abstract paradigm to study behavioral heterogeneity free of potential confounds that could arise from particularities present in one application but absent in others. In view of our motivation, it was important to focus on binary decisions that would provide simple win-lose feedback while giving an opportunity to update beliefs on an underlying state of the world. An additional concern was to develop a paradigm rich enough to allow disentangling Bayesian updating and (model-free) reinforcement from the other two rules of interest (conservatism and decision inertia). The paradigm we developed belongs to the larger class of urn tasks, which have been extensively used to study biases in belief updating, for example, for the case of representativeness and conservatism (Grether 1980, 1992) or for the comparison of Bayesian updating and reinforcement rules (Charness and Levin 2005, Achtziger and Alós-Ferrer 2014, Achtziger et al. 2015).2

The essence of the paradigm is as follows. Participants are presented with three covered urns, each containing balls of two possible colors (black or white) in different but known proportions. One of the three urns is chosen at random, and a single ball is randomly extracted from it. Participants know the proportion of balls in each urn, that the actual urn is one of the three described ones, and that the urn has been selected randomly, with equal probabilities for each urn. Crucially, the ball is not replaced after the first extraction. Then, a second ball is extracted at random from the same urn. The participant’s decisions are bets. Before each of the two extractions, participants bet on the color of the extracted ball. We are interested in the second betting choice because the color of the first extracted ball allows updating the belief regarding the urn from which the balls are extracted.3

Incentives are straightforward. After each extraction, participants are paid a constant amount if and only if the color of the ball matches their bet. After two extractions, the trial ends and all balls are replaced in the urns before a new trial starts. Participants are aware that all trials are independent from each other, so the urn from which the two balls are extracted is randomly and independently determined according to the same uniform prior in each of the 60 repetitions.

It is important to note that our focus is on distinguishing behavioral rules and not on dynamic learning, in contrast to the literature investigating model-free and model-based reinforcement learning (Payzan-LeNestour and Bossaerts 2015). By design, there is limited scope for learning in our paradigm, as trials are independent, and hence the situation “resets” after each choice, which is very different from the two-stage or multiarmed-bandit tasks that are commonly used in that literature.

To be able to disentangle our candidate behavioral rules, there are two types of trials using two different urn compositions, as depicted in Figure 1(a). In one type, each urn contains four balls. In the other type, each urn contains six balls. In both cases, one of the urns contains exactly one black ball, another contains exactly one white ball, and the third urn contains half each of black and white balls. Although the alternative urn compositions are superficially similar and are both easy to understand, the prescriptions of the candidate rules (in particular, of Bayesian updating of beliefs) are quite different across the two kinds of trials, which in turn allows us to discriminate among them. In the experiment, participants were reminded of the composition of the urns and the number of balls in each urn at all times, as this information was prominently displayed in each trial.

Figure 1. Experimental Design
Notes. (a) Four-ball (top) and six-ball (bottom) trials. (b) Prescriptions of the four candidate behavioral rules for the second bet, classified by first bet and received stimulus (● = black ball extracted/bet on black ball, ○ = white ball extracted/bet on white ball).

The first of the rules we are interested in is optimization following Bayesian updating of the prior, or simply Bayes’ rule for short. Bayes’ rule captures the normatively correct way to integrate new information with prior beliefs, but it has been widely shown to perform poorly as a descriptive rule. Empirically documented violations in experiments involving conditional probability judgments (Grether 1980, Charness and Levin 2005) have shown that human beings are simply not Bayesian optimizers, although Bayes’ rule is sometimes a reasonable approximation of behavior (El-Gamal and Grether 1995, Griffiths and Tenenbaum 2006). We use Bayes’ rule as a benchmark, as it describes the normatively optimal behavior.

Straightforward computations show that a subject who used Bayes’ rule in four-ball trials and chose black for the first bet should, for the second bet, shift to white if she won the first bet and stay with black if she lost the first bet (and symmetrically if she bet on white the first time). In contrast, in six-ball trials, Bayes’ rule prescribes the exact opposite: stay with the same color for the second bet if she won the first and shift if she lost.4 The prescriptions of Bayes’ rule are summarized graphically in the column “Bayesian” in Figure 1(b).
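The computation behind these prescriptions can be sketched with exact fractions. The following is a minimal illustration, not the authors' code; the function name `prob_second_black` and the encoding of the urns by their number of black balls are assumptions made for the example. It computes the posterior predictive probability that the second ball is black, given the color of the first ball, a uniform prior over the three urns, and extraction without replacement.

```python
from fractions import Fraction

def prob_second_black(n_balls: int, first_is_black: bool) -> Fraction:
    """P(second ball is black | color of first ball), drawing without replacement."""
    # Assumed encoding: black-ball counts of the three urns
    # (one black ball, one white ball, half black and half white).
    black_counts = [1, n_balls - 1, n_balls // 2]
    prior = Fraction(1, 3)  # uniform prior over urns
    weighted = []
    for b in black_counts:
        drawn_color_count = b if first_is_black else n_balls - b
        likelihood = Fraction(drawn_color_count, n_balls)
        black_left = b - 1 if first_is_black else b
        next_black = Fraction(black_left, n_balls - 1)
        weighted.append((prior * likelihood, next_black))
    evidence = sum(w for w, _ in weighted)
    # Bayes' rule: average the per-urn predictions with posterior weights.
    return sum(w * p for w, p in weighted) / evidence

for n in (4, 6):
    p = prob_second_black(n, first_is_black=True)
    print(n, p, "bet black" if p > Fraction(1, 2) else "bet white")
```

After a black first ball, the probability of a second black ball is 4/9 in four-ball trials (so the optimal second bet is white) and 26/45 in six-ball trials (so the optimal second bet is black), which is the reversal described above.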

The main alternative rule we are interested in is (model-free) reinforcement learning. This is the natural candidate in any setting where information comes with win-lose feedback. Reinforcement is a basic component of human behavior (Thorndike 1911, Sutton and Barto 1998) and refers to the tendency to repeat whatever action yielded a positive result in the past and avoid those which led to failure. In this paradigm, we assume the simplest form of reinforcement, that is, win-stay, lose-shift behavior. This rule of thumb has been shown to explain deviations from Bayesian behavior in belief-updating paradigms where information is extracted from previous wins and losses, as in many economic settings (Charness and Levin 2005, Achtziger and Alós-Ferrer 2014).

The prescriptions of this rule are particularly simple (see column “Reinforcement” in Figure 1(b)). In case of success, participants following (model-free) reinforcement would repeat the same choice, whereas they would shift to the other choice after failure. As a consequence, in our binary setting, this rule always prescribes placing the second bet on the color of the actually extracted first ball, that is, chasing after the previous winner. In particular, model-free reinforcement makes identical prescriptions for four-ball and six-ball trials, which means that it coincides with Bayes’ rule for six-ball trials but is completely opposed to it for four-ball trials.

The third behavioral rule we are interested in is inertia, which is the tendency to repeat previous choices independently of the outcome (Pitz and Geller 1970, Akaishi et al. 2014, Alós-Ferrer et al. 2016) and has been linked to status quo bias (Ritov and Baron 1992). For instance, Erev and Haruvy (2016) argue that humans exhibit a strong tendency to simply repeat the most recent decision, and this tendency is sometimes stronger than the tendency to react optimally to the most recent outcome.

In our paradigm, inertia prescribes that participants ignore the results of the first bet and simply repeat the previous decision, that is, bet black after betting black the first time and white after white (see column “Inertia” in Figure 1(b)). Again, the pattern is identical for four-ball and six-ball trials, and in both cases, it differs from Bayes’ rule and from reinforcement.

We term the last behavioral rule we focus on “non-updating.” This rule postulates that the prior is not updated, and that the decision maker uses a posterior identical to the prior (hence uniform). However, in our setting, there is a transparent change between the first and the second ball extraction, which is independent of probability updating: because there is no replacement, for the second bet the selected urn contains one ball fewer (three for four-ball trials and five for six-ball trials), the missing ball being the one extracted after the first bet. This poses an incorrect but simple optimization problem. That is, this rule prescribes that participants take into account the rather obvious fact that after the first extraction, one ball is missing, but they do not engage in belief updating at all.

Direct computations show that a subject who used the non-updating rule and won would shift to the opposite color for the second bet but would stay with the same color if she lost the first bet (column “Nonupdating” in Figure 1(b)). The prediction is the same for four-ball and six-ball trials. That is, by design, in our paradigm, non-updaters would always behave in direct opposition to win-stay, lose-shift reinforcement, and hence, the prescriptions of this rule coincide with those of Bayes’ rule for four-ball trials and are the opposite for six-ball trials.
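A minimal sketch of this computation follows, under the assumption that a non-updater averages the per-urn probabilities with equal weights after removing the extracted ball from every urn. The function name `nonupdating_prob_black` is hypothetical.

```python
from fractions import Fraction

def nonupdating_prob_black(n_balls: int, first_is_black: bool) -> Fraction:
    """P(second ball black) with a uniform belief over urns and one ball removed."""
    # Assumed encoding: black-ball counts of the three urns.
    black_counts = [1, n_balls - 1, n_balls // 2]
    per_urn = [
        Fraction(b - 1 if first_is_black else b, n_balls - 1)
        for b in black_counts
    ]
    # No belief updating: each urn keeps its prior weight of 1/3.
    return sum(per_urn) / len(per_urn)

for n in (4, 6):
    print(n, nonupdating_prob_black(n, first_is_black=True))
```

After a black first ball, the resulting probability of a second black ball is 1/3 in four-ball trials and 2/5 in six-ball trials. Both are below one half, so in both trial types the non-updater bets on the color opposite to the extracted one.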

In summary, and as shown in Figure 1(b), using just two binary choices and two urn compositions with three urns each, we can fully disentangle four different behavioral rules. For four-ball trials, Bayes’ rule and non-updating prescribe the same decisions, which are in direct opposition to the prescriptions of (model-free) reinforcement. For six-ball trials, Bayes’ rule and (model-free) reinforcement prescribe the same decisions, which are in direct opposition to the prescriptions of non-updating. In both cases, inertia always coincides with exactly half of the prescriptions of each other rule and is opposed to it for the other half.
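The pattern of agreement and opposition among the four rules can be checked by enumerating the four possible combinations of first bet and first extracted ball. The sketch below is illustrative only; the rule names and the function `prescribe` are labels chosen for this example, and the Bayesian prescriptions are hard-coded from the description above rather than recomputed.

```python
from itertools import product

COLORS = ("black", "white")
OTHER = {"black": "white", "white": "black"}

def prescribe(rule, n_balls, first_bet, first_ball):
    """Second-bet color prescribed by `rule` (rule names are illustrative)."""
    if rule == "reinforcement":   # win-stay, lose-shift: bet on the drawn color
        return first_ball
    if rule == "nonupdating":     # always the opposite of reinforcement
        return OTHER[first_ball]
    if rule == "inertia":         # repeat the first bet, ignore feedback
        return first_bet
    if rule == "bayes":           # prescription flips with the urn composition
        return OTHER[first_ball] if n_balls == 4 else first_ball
    raise ValueError(rule)

def agreement(rule_a, rule_b, n_balls):
    """Number of the four (first bet, first ball) cases in which two rules agree."""
    return sum(
        prescribe(rule_a, n_balls, bet, ball) == prescribe(rule_b, n_balls, bet, ball)
        for bet, ball in product(COLORS, COLORS)
    )

for n in (4, 6):
    for other in ("reinforcement", "nonupdating", "inertia"):
        print(n, "bayes vs", other, agreement("bayes", other, n), "of 4")
```

In four-ball trials Bayes' rule agrees with non-updating in all four cases and with reinforcement in none; in six-ball trials the roles are reversed. Inertia agrees with every other rule in exactly two of the four cases.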

2.2. Procedures

A total of n = 268 university students (142 females; age range, 18–43 years; mean = 24.07 years) were recruited using the Online Recruitment System for Economic Experiments (Greiner 2015) from the preregistered pool of the Cologne Laboratory for Economic Research (CLER), excluding students majoring in psychology or economics because they could have been familiar with similar paradigms. The experiment was programmed using z-Tree (Fischbacher 2007). Each participant made decisions in 60 trials, 30 with the four-ball design and 30 with the six-ball design. To avoid order effects, half of the participants (randomly assigned) worked on four-ball trials first and six-ball trials later, and the remaining participants followed the inverse order.

Participants received a performance-based payment plus a show-up fee of 2.50 Euro. To study the effects of incentives in our classifications, there were two different treatments (with data collected in different sessions). Under low incentives, each successful bet was rewarded with 18 Euro-cents (n = 140). Under high incentives, the payoff was 30 Euro-cents (n = 128) for each successful bet. The size of the incentives remained constant throughout the experiment and was common knowledge.5 All (successful) decisions were paid. In our context, this payment mechanism is incentive compatible under mild assumptions on individual preferences, as shown by Azrieli et al. (2018).6

Participants received detailed written instructions and answered nine control questions regarding the replacement of balls, states of the world, and trial independence. To ensure that participants understood the task, those who got any control questions wrong were provided with further information by the experimenter, until a sufficient understanding of the task was reached. There were no practice trials. No time limit was imposed during the experiment; participants were free to use as much time as they needed. At the end of the experiment, participants completed some (nonincentivized) questionnaires and provided demographic information (gender, age, and field of studies). A session lasted about 90 minutes, and average earnings were 17.93 Euros (standard deviation SD = 1.74).

3. Descriptive View of the Data

3.1. Choice Frequencies

The first bet is never of interest in itself, as given the prior both choices are equally likely to result in a payoff, and given the symmetry of the design, no behavioral rule makes a specific prescription. We are interested in the second bet within each trial. For those, rules generally make different prescriptions, which depend on the first bet and on the outcome of the first extraction.

Participants failed to consistently make optimal decisions (those prescribed by Bayes’ rule). For the second bet within each trial, the average across individuals of the percentage of correct decisions was 61.23% (median = 58.33%, SD = 15.53, minimum = 20.00%, maximum = 100.00%), which was significantly different from random performance according to a Wilcoxon signed-rank (WSR) test (n = 268, z = 14.193, p < 0.001).

At the aggregate level, there were no significant differences in the percentage of correct answers across incentive levels according to a Mann-Whitney-Wilcoxon test (MWW; low incentives, average 62.54%; high incentives, 59.80%; n = 268, z = 1.242, p = 0.214). The percentage of correct answers in four-ball trials (average, 58.22%) was lower than in six-ball trials (average, 64.24%), although the difference did not reach significance (WSR test, within subject, n = 268, z = 1.626, p = 0.104). The (counterbalanced) order did not affect the percentage of correct answers (four-ball trials first: average, 62.22%; six-ball trials first: average, 57.25%; MWW test, n = 268, z = 0.418, p = 0.676).

There is no evidence that the subjects “figured out” how to solve the task during the course of the experiment. In particular, there is no discernible trend for the low-incentive treatment (random effects Probit regression on the probability of a correct choice regressed against the trial number, coefficient = 0.001, t = 0.50, p = 0.616), indicating that there was no learning at the aggregate level. For the high-incentive treatment, we observe a marginally significant but very modest positive trend (random effects regression, coefficient = 0.002, t = 1.87, p = 0.061), which should not be overinterpreted.

3.2. Naïve Classification

As an illustration, we start with a simple but naïve classification approach based on observable data, similar to previous approaches in the literature, before we turn to our finite-mixture and hidden-Markov analyses. We examine choices at the individual level and classify a subject as following a particular behavioral rule if his or her decisions coincide with the prescriptions of that rule more often than with those of the others (e.g., if she follows Bayes’ rule more frequently than any of the alternative rules).
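A classification of this kind can be sketched as a simple count of matches per rule. The code below is an illustration under stated assumptions: the trial encoding, the function name `naive_classify`, and the treatment of ties (all rules with the maximal count are returned) are choices made for this example and are not taken from the paper.

```python
from collections import Counter

OTHER = {"black": "white", "white": "black"}

# Second-bet prescriptions as functions of (n_balls, first_bet, first_ball).
RULES = {
    "bayes": lambda n, bet, ball: OTHER[ball] if n == 4 else ball,
    "reinforcement": lambda n, bet, ball: ball,
    "nonupdating": lambda n, bet, ball: OTHER[ball],
    "inertia": lambda n, bet, ball: bet,
}

def naive_classify(trials):
    """trials: iterable of (n_balls, first_bet, first_ball, second_bet)."""
    hits = Counter({rule: 0 for rule in RULES})
    for n_balls, first_bet, first_ball, second_bet in trials:
        for rule, prescription in RULES.items():
            if prescription(n_balls, first_bet, first_ball) == second_bet:
                hits[rule] += 1
    best = max(hits.values())
    # Assumption: ties are reported as a list of all top-scoring rules.
    return [rule for rule, count in hits.items() if count == best], dict(hits)

# Hypothetical subject who always bets on the color of the first extracted ball.
trials = [
    (4, "black", "white", "white"), (4, "white", "white", "white"),
    (6, "black", "black", "black"), (6, "white", "black", "black"),
]
print(naive_classify(trials))
```

For the hypothetical subject above, reinforcement matches all four second bets, Bayes' rule and inertia match two each, and non-updating matches none, so the subject is classified as a reinforcer.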

This simple procedure classifies 31.72% of subjects as following Bayes’ rule, 22.76% as reinforcers, 29.48% as non-updaters, and only 21.27% as relying on inertia. Figure 2 displays the proportion of subjects classified as following each behavioral rule depending on the incentive condition. However, in this simple classification, incentives seem to have limited or even detrimental effects, in the sense that there are no significant differences in the proportions assigned to each rule across incentive treatments for most rules (test of proportions; reinforcers, p = 0.801; inertia, p = 0.259; non-updaters, p = 0.733) with the notable exception of Bayes’ rule. Fewer subjects are classified as following the normative prescriptions in the higher (25.78%) than in the lower (37.14%) incentive treatment (test of proportions, z = 1.996, p = 0.046).

Figure 2. Naïve Classification of Subjects by Treatment

Unsurprisingly, subjects classified as following Bayes’ rule earned more on average in the task (mean, 9.31) than other subjects (mean, 9.04; MWW, n = 268, z = 2.759, p = 0.006). These results are a first indication that, in this paradigm, being Bayesian actually paid off, but increasing incentives might actually have a negative effect on performance, at least for some people. The naïve classification provides a first glimpse at behavioral heterogeneity, but it is obviously of limited interest. The reason is that it adopts the unrealistic view that decision makers follow exactly one behavioral rule and remains silent on deviations with respect to a given rule, even though a subject is still classified as following a rule if a large part of his or her decisions deviate from it. Hence, in the following section, we turn to a finite-mixture modeling approach.

4. Behavioral Types and Finite Mixture Modeling

We are interested in individual heterogeneity, and hence we now turn to a different method of classification of decision makers according to the postulated behavioral rules. The task of classifying the observed choices according to behavioral rules is not trivial, because which behavioral rule was followed by each participant for each decision is not directly observable. Hence, any inference on which rule determined which decision must rely on a comparison between observed choices and the prescriptions of the rules. This is complicated by the fact that different behavioral rules might prescribe the same choice part of the time. Furthermore, in view of overwhelming evidence on decision errors and stochastic choice, it is unrealistic to assume that behavioral rules are purely deterministic.

To solve these problems, we explicitly introduce the possibility that subjects make errors during the experiment. In the model, an error occurs whenever a subject is following a specific behavioral rule but the actual choice deviates from the rule’s prescriptions. For example, a subject might follow Bayes’ rule but make a (maybe computational) mistake and choose the incorrect option. In other words, we consider noisy versions of the behavioral rules of interest. Given strictly positive error probabilities, each and every choice might a priori have been generated by each and every candidate behavioral rule. For instance, in a situation where reinforcement and Bayes’ rule prescribe different answers, a given (erroneous) answer might be due to the subject following reinforcement or to the subject making a mistake while actually following Bayes’ rule. To untangle behavior, we now turn to a finite mixture modeling approach, which accounts for unobservable determinants of behavior, potential heterogeneity among individuals, and the possibility of errors. In the following sections, we describe how we implement finite-mixture modeling for our data and report the actual estimation.

We adopt the supervised approach to finite-mixture modeling, where we specify the number and characteristics of types ex ante. This is appropriate in our case because we have clearly defined candidates, as the experiment was designed to disentangle precisely the four behavioral rules we consider. In this sense, we tie our hands ex ante to the set of relevant behavioral rules and examine whether the evidence supports that choice.7

4.1. Finite Mixture Model for Noisy Behavioral Rules

Let $B=\{0,1\}$ denote the set of possible bets or outcomes, where zero stands for “black” and one for “white.” For each trial and participant in our experiment, the second bet $b\in B$ is made after the first, $a\in B$, and after observing the color of the first extracted ball, $x\in B$. Furthermore, let $t\in T=\{4,6\}$ indicate whether the trial corresponds to the four-ball or the six-ball design. Before the decision $b$ is made, the relevant characteristics of a trial are completely identified by $\omega=(t,a,x)\in\Omega=T\times B^2$. With this notation, a behavioral rule in our setting is a mapping

$$\beta:\Omega\to B,$$

and the four behavioral rules $\beta_1,\ldots,\beta_4$ that we consider can be written down in this format simply by translating the corresponding columns in Figure 1(b). For example, if $k=1$ is Bayes’ rule, then $\beta_1(4,0,0)=1$, $\beta_1(4,0,1)=0$, $\ldots$, $\beta_1(6,1,0)=0$.

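Because the behavioral rules are deterministic and there are no ties in our design, the choice probabilities induced by each rule $\beta_k$ are given by

$$\mathrm{Prob}(b\mid\omega,\beta_k)=\begin{cases}1 & \text{if } b=\beta_k(\omega),\\ 0 & \text{otherwise.}\end{cases}$$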


The observations $m=1,\ldots,M$ in our data set are of the form $(\omega,b)\in\Omega\times B$. A finite mixture model assumes that a distribution of types (in our case, behavioral rules) generates the actual observations, with $\eta_k$ ($k=1,\ldots,4$) being the probability of type $k$.

To take errors into account, we consider noisy behavioral rules by introducing an error or “trembling” probability $\varepsilon_k$, analogously to Bruhin et al. (2010). Because we deal with binary choice, the interpretation is simple: given $\omega$ and assuming that $\beta_k$ generated the observed choice, $1-\varepsilon_k$ is the probability with which the choice prescribed by $\beta_k$ actually obtains, whereas with probability $\varepsilon_k$ the opposite choice is made. Formally,

$$p_k(b\mid\omega)=\begin{cases}1-\varepsilon_k & \text{if } b=\beta_k(\omega),\\ \varepsilon_k & \text{otherwise.}\end{cases}\tag{1}$$
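A noisy behavioral rule of this kind can be sketched as follows. The example is illustrative only: the function name `noisy_choice_prob` and the rule `repeat_first_bet` are hypothetical, and the rule is deliberately made up rather than copied from Figure 1(b).

```python
from itertools import product

# Trial characteristics omega = (t, a, x): design t in {4, 6},
# first bet a in {0, 1}, and color x in {0, 1} of the first extracted ball.
OMEGA = list(product((4, 6), (0, 1), (0, 1)))

def noisy_choice_prob(b, omega, rule, eps):
    """Probability that the second bet is b, given omega, under a noisy rule.

    rule maps each omega to the prescribed bet; eps is the trembling probability.
    """
    return 1.0 - eps if b == rule[omega] else eps

# Made-up rule for illustration only (NOT one of the columns of Figure 1(b)):
# repeat the first bet regardless of the observed color.
repeat_first_bet = {omega: omega[1] for omega in OMEGA}

for omega in OMEGA:
    probs = [noisy_choice_prob(b, omega, repeat_first_bet, eps=0.12) for b in (0, 1)]
    print(omega, probs)
```

With a strictly positive trembling probability, both bets receive positive probability in every trial type, which is why every observed choice is compatible with every candidate rule.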


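Our estimation assumes that each individual $j=1,\ldots,268$ has independent probabilities $\eta_k^j$ of relying on each rule plus individual-level error rates $\varepsilon_k^j$ ($k=1,\ldots,4$). Hence, the likelihood of the sub-data set containing $j$’s decisions, $D_j=\{(\omega_1^j,b_1^j),\ldots,(\omega_{60}^j,b_{60}^j)\}$, is

$$\mathrm{Prob}(D_j)=\prod_{m=1}^{60}\sum_{k=1}^{4}\eta_k^j\left[(1-\varepsilon_k^j)\,\mathbf{1}\{b_m^j=\beta_k(\omega_m^j)\}+\varepsilon_k^j\,\mathbf{1}\{b_m^j\neq\beta_k(\omega_m^j)\}\right],$$

and the likelihood of the entire data set $D=\bigcup_{j=1}^{268}D_j$ is

$$\mathrm{Prob}(D)=\prod_{j=1}^{268}\mathrm{Prob}(D_j).$$

Because $\ln\mathrm{Prob}(D)=\sum_{j=1}^{268}\ln\mathrm{Prob}(D_j)$ and $\mathrm{Prob}(D_j)$ depends only on the variables $\eta_1^j,\ldots,\eta_4^j$ and $\varepsilon_1^j,\ldots,\varepsilon_4^j$, which in turn play no role for $\mathrm{Prob}(D_\ell)$ with $\ell\neq j$, maximum likelihood estimation can be done independently for each participant $j$. The estimation then delivers which type $k=1,\ldots,4$ each participant $j$ is more likely to belong to plus the individual error rate $\varepsilon_k^j$ of the corresponding rule.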
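The individual log likelihood can be written down directly from this description. The sketch below is a hedged illustration: the function name `log_likelihood`, the two made-up rules, and the four hypothetical observations are assumptions and do not come from the experiment.

```python
import math

def log_likelihood(data, rules, eta, eps):
    """Log likelihood of one subject's decisions under a mixture of noisy rules.

    data:  list of (omega, b) pairs for this subject.
    rules: list of dicts mapping omega to the prescribed bet.
    eta:   rule weights (nonnegative, summing to one).
    eps:   error rates, one per rule.
    """
    total = 0.0
    for omega, b in data:
        mixture = 0.0
        for rule, weight, err in zip(rules, eta, eps):
            mixture += weight * ((1.0 - err) if b == rule[omega] else err)
        total += math.log(mixture)
    return total

# Two made-up rules on a reduced set of trial types (illustration only).
rule_a = {(4, 0, 0): 0, (4, 0, 1): 0, (4, 1, 0): 1, (4, 1, 1): 1}  # repeat first bet
rule_b = {(4, 0, 0): 0, (4, 0, 1): 1, (4, 1, 0): 0, (4, 1, 1): 1}  # bet on observed color
data = [((4, 0, 0), 0), ((4, 0, 1), 0), ((4, 1, 0), 1), ((4, 1, 1), 0)]

ll = log_likelihood(data, [rule_a, rule_b], eta=[0.8, 0.2], eps=[0.1, 0.3])
print(ll)
```

Because the log likelihood of the full data set is the sum of such individual terms, each with its own parameters, maximizing each term separately is equivalent to maximizing the sum.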

4.2. Estimation Results

We are interested in how many subjects consistently follow each (noisy) behavioral rule in this paradigm and in how precise these rules are (i.e., how large the $\varepsilon_k$ are). To answer this question, we estimate the finite mixture model described previously separately for each subject. This type of individual-level analysis has been shown to be a useful tool in a number of other contexts (El-Gamal and Grether 1995, Costa-Gomes et al. 2001, Iriberri and Rey-Biel 2013, Bruhin et al. 2018).8

Following a common approach in the relevant statistics literature (Redner and Walker 1984), we adopt a maximum likelihood (ML) approach for parameter estimation in our finite mixture model. Specifically, we rely on the Davidon-Fletcher-Powell (DFP) algorithm (Fletcher and Powell 1963, Davidon 1991).
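The sketch below does not reproduce the DFP routine used for the estimation. It swaps in expectation maximization (EM), a different iterative way of maximizing a mixture likelihood, because EM is short enough to show in full. Everything in it is an assumption made for illustration: the function name `em_noisy_rules`, the two made-up rules, the starting values, the clamping of error rates away from zero and one, and the synthetic 40-trial data set.

```python
import math
from itertools import product

def em_noisy_rules(data, rules, n_iter=200):
    """EM for a mixture of noisy rules, for one subject (illustrative swap-in for DFP)."""
    n_rules = len(rules)
    eta = [1.0 / n_rules] * n_rules
    eps = [0.25] * n_rules
    # deviates[m][k] is True if choice m deviates from the prescription of rule k.
    deviates = [[b != rule[omega] for rule in rules] for omega, b in data]
    history = []
    for _ in range(n_iter):
        # E-step: posterior probability that rule k generated choice m.
        resp, loglik = [], 0.0
        for d in deviates:
            w = [eta[k] * (eps[k] if d[k] else 1.0 - eps[k]) for k in range(n_rules)]
            s = sum(w)
            loglik += math.log(s)
            resp.append([x / s for x in w])
        history.append(loglik)
        # M-step: update weights and error rates from the posteriors.
        for k in range(n_rules):
            mass = sum(r[k] for r in resp)
            eta[k] = mass / len(data)
            if mass > 0.0:
                err = sum(r[k] for r, d in zip(resp, deviates) if d[k]) / mass
                eps[k] = min(max(err, 1e-6), 1.0 - 1e-6)
    return eta, eps, history

# Synthetic data: two made-up rules, 40 choices mostly following rule_a.
OMEGA = list(product((4, 6), (0, 1), (0, 1)))
rule_a = {w: w[1] for w in OMEGA}   # repeat the first bet
rule_b = {w: w[2] for w in OMEGA}   # bet on the observed color
data = [(w, rule_a[w]) for w in OMEGA] * 5
flipped = 0
for i, (w, b) in enumerate(data):
    # Let three choices follow rule_b on trials where the rules disagree.
    if rule_a[w] != rule_b[w] and flipped < 3:
        data[i] = (w, rule_b[w])
        flipped += 1

eta, eps, history = em_noisy_rules(data, [rule_a, rule_b])
print(eta, eps, history[-1])
```

Like any local optimizer, EM can stop at a local maximum, so in practice one would compare several starting values; the log likelihood stored in `history` never decreases from one iteration to the next.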

Table 1 summarizes the distribution of subjects classified as following each behavioral rule. ML estimation converged for n = 266 of our 268 subjects. The columns Est. Weight and Est. Error rate report the average estimated parameters corresponding to the probability $\eta_k^j$ of each rule and the corresponding error rate $\varepsilon_k^j$. “Classified as” and “Percentage” report the number and percentage of subjects classified as most likely following each rule, respectively, that is, those subjects for which that rule had the highest probability. This classification is depicted in Figure 3, which shows that a slight majority of our subjects are classified as non-Bayesian, but Bayes’ rule (at 49%) is the most frequently followed rule, a result in alignment with previous observations by El-Gamal and Grether (1995). Confirming our basic hypothesis that reinforcement is the main determinant of deviations from Bayes’ rule in our setting, reinforcement is the second most common rule, with around 28% of participants mostly following it. In contrast, inertia and non-updating describe just around 17% and 6% of our sample, respectively. This is in sharp contrast with the naïve classification reported previously.


Table 1. Estimation Summary: Finite Mixture Model


Columns: Behavioral rule; Est. Weight; Est. Error rate; Classified as; Percentage; Cond. Mean; Cond. Error rate.

Notes. Standard deviation is in parentheses. n = 266.

Figure 3. Proportions of Individuals Classified into Each Behavioral Rule

A good classification should be as unambiguous as possible. Ideally, to declare a decision maker as, say, a Bayesian or a reinforcer, one would like to have a probability $\eta_k^j$ close to one for the corresponding rule. The columns Cond. Mean and Cond. Error rate of Table 1 report the estimated average parameters restricted to those subjects who are actually classified as most likely following the corresponding rule. The column Cond. Mean shows that, for our estimation, most estimated values of $\eta_k^j$ are close to either zero or one (the distributions are bimodal). In other words, almost all subjects are unambiguously classified as following one of the four behavioral rules, which is an indication of the goodness of the estimation. At the same time, our assumption of noisy behavioral rules seems justified. In particular, the estimated error rates (conditional or not) differ across rules and are smallest for the two most common ones. Specifically, Bayes’ rule has the lowest estimated error rate, around 12%, followed by reinforcement at around 26%. In contrast, the two less-frequent rules, inertia and non-updating, appear to be associated with relatively large error rates.9

We can further explore possible correlations between the classification and demographic characteristics. Females are not more likely to be classified as following Bayes’ rule or reinforcement than males (MWW tests: Bayes, n = 266, z = 0.348, p = 0.728; reinforcement, n = 266, z = 0.688, p = 0.491). Moreover, there is no significant correlation between age and being classified into these types (Spearman’s correlation: Bayes, n = 266, ρ = 0.018, p = 0.766; reinforcement, n = 266, ρ = 0.016, p = 0.798).10

The bottom line of the classification is that more than three quarters of our sample seem to have mostly (and consistently) followed either Bayes’ rule or reinforcement during the entire experiment, once one accounts for occasional errors. That is, Bayes’ rule explains part of the data, but reinforcement is an important driver of behavior and, in our classification, represents the most common deviation from Bayesian behavior. Our results further highlight that analyses of aggregate behavior can be potentially improved by considering the underlying heterogeneity in decision-making processes across individuals.

4.3. Behavioral Rules and Incentives

Participants in our study performed the belief-updating task for different incentive levels, with n = 128 participating under low incentives and n = 140 under high incentives. Because our estimation was conducted at the individual level and incentive levels were varied between subjects (and not within), it is possible to further differentiate the classification according to the level of incentives. Table 2 reports the proportion of individuals in each treatment classified as mostly following each behavioral rule (left), which are depicted in the left panel of Figure 4. The right side of Table 2 displays the average error rates for each behavioral rule, restricted to the individuals classified as mostly following it, and includes MWW tests for those error rates across treatments. As was the case for the naïve classification, there are no treatment effects for the percentage of individuals classified under each rule. Moreover, the (conditional) error rates for the different behavioral rules are not significantly different across treatments.


Table 2. Treatment Effect at the Individual Level


Columns: Rule; Proportion (Low, High, Test of proportion: z, p value); Estimated error rate (Low, High, MWW test: z, p value).
n = 266; n_low = 126; n_high = 140.

Figure 4. Treatment Effect on the Finite Mixture Model Classification
Notes. (Left) Individuals classified into each behavioral rule by treatment. (Right) Percentage of correct answers for subjects classified as following Bayes’ rule and other behavioral rules by incentive treatment. Stars indicate MWW tests, **p<.05.

For participants classified as mostly following Bayes’ rule, by definition, the error rates ($\varepsilon_1^j$) refer to deviations from normatively optimal behavior. Hence, the first test on the right side of Table 2 shows that Bayesian decision makers are not significantly better (in a normative sense) under higher incentives (88.86% correct answers under high incentives versus 86.95% under low incentives; MWW, n = 130, z = 0.553, p = 0.580). However, non-Bayesian decision makers do become significantly better in this sense. Specifically, as illustrated in the right panel of Figure 4, if we pool all subjects classified as following either reinforcement, inertia, or non-updating into a non-Bayesian category, we find a significant increase in the percentage of normatively correct answers under high incentives (64.06%) compared with low incentives (60.19%; MWW, n = 136, z = 1.969, p = 0.049).11
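The MWW comparisons reported throughout are rank-sum tests. As a minimal sketch of how such a z statistic can be obtained, the function below uses the normal approximation with a tie correction and no continuity correction. The function name `mww_z`, the sign convention (negative when the first sample tends to be smaller), and the two tiny samples are assumptions for illustration; statistical packages may differ in these conventions.

```python
import math

def mww_z(sample_a, sample_b):
    """Mann-Whitney-Wilcoxon z statistic (normal approximation, tie-corrected)."""
    pooled = sorted((v, g) for g, s in ((0, sample_a), (1, sample_b)) for v in s)
    n_a, n_b = len(sample_a), len(sample_b)
    n = n_a + n_b
    rank_sum_a, tie_term = 0.0, 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0          # average of ranks i+1, ..., j
        t = j - i                             # size of this group of ties
        tie_term += t ** 3 - t
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_a = rank_sum_a - n_a * (n_a + 1) / 2.0
    mean_u = n_a * n_b / 2.0
    var_u = n_a * n_b / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u_a - mean_u) / math.sqrt(var_u)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided

# Hypothetical shares of correct answers for two small groups (not experimental data).
z, p = mww_z([0.55, 0.60, 0.62], [0.64, 0.66, 0.70])
print(round(z, 3), round(p, 3))
```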

This result has a simple interpretation and speaks in favor of a heterogeneous effect of incentives. For part of the population (roughly half), low incentives suffice to engage in cognitive deliberation and decide following noisy, error-prone approximations of Bayes’ rule. Higher incentives do not result in additional improvements for these individuals, and in particular, error rates cannot be reduced, suggesting ceiling effects. Another large part of the population, however, follows non-Bayesian behavioral rules most of the time (mostly reinforcement) and accordingly achieves a lower performance. Taken as a whole, the latter subjects, however, do react to incentives and make more choices in alignment with Bayes’ rule under high incentives.

The increase in Bayesian choices under high incentives for subjects classified as non-Bayesians, however, is not of sufficient magnitude to be reflected in a change in the classification or even in a significant increase of the fraction of the time that those subjects are classified as relying on Bayes’ rule ($\eta_1^j$), mainly because the latter is very close to zero. Indeed, the estimated average weight for Bayes’ rule for non-Bayesian subjects is merely 0.74% (minimum = 0, maximum = 38.56%) under low incentives and 1.89% (minimum = 0, maximum = 44.14%) under high incentives (MWW, n = 136, z = 0.132, p = 0.895).

4.4. Response Times and Incentives

In our experiment, we recorded both choices and their response times. The analysis of the response times reveals a very large, highly significant difference between incentive treatments. As illustrated in Figure 5 (left), decisions are considerably faster under high incentives (average, 1.627 seconds) compared with low ones (3.402 seconds; MWW, n = 268, z = 7.868, p < 0.001), with an average decrease of about 52%. This difference is independent of the classification. In particular, it holds both for subjects classified as most likely following Bayes’ rule (Figure 5, right; high incentives, 1.592 seconds; low incentives, 3.183 seconds; MWW, n = 130, z = 3.957, p < 0.001) and for all those classified as non-Bayesian (high incentives, 1.594 seconds; low incentives, 3.335 seconds; MWW, n = 136, z = 7.758, p < 0.001).
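The relative decrease in response times follows directly from the reported averages. The short computation below uses only the averages stated in the text; the helper name `relative_decrease` is hypothetical.

```python
# Average response times in seconds as (low incentives, high incentives),
# taken from the averages reported in the text.
response_times = {
    "pooled": (3.402, 1.627),
    "Bayesian": (3.183, 1.592),
    "non-Bayesian": (3.335, 1.594),
}

def relative_decrease(low, high):
    """Share by which the average response time falls from low to high incentives."""
    return (low - high) / low

for label, (low, high) in response_times.items():
    print(f"{label}: {relative_decrease(low, high):.1%}")
```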

Figure 5. Response Times by Treatment
Note. Distribution of response times by treatment, pooled (left) and separately for subjects classified as Bayesians and non-Bayesians (right).

This effect demonstrates that participants did react to the different levels of incentives; that is, there were differences in the perception of high versus low incentives for our task. However, the interpretation of response times is nuanced, especially when different behavioral rules codetermine behavior (Achtziger and Alós-Ferrer 2014, Spiliopoulos and Ortmann 2018). Some authors have suggested an interpretation of response times as a proxy for cognitive effort in certain settings (Moffatt 2005, Enke and Zimmermann 2019), for example, iterative thinking in games (Alós-Ferrer and Buckenmaier 2021). However, this interpretation would be unwarranted in the simple decisions we study here.

On the contrary, we suggest that the interpretation of the effect of incentives on response times in our context is straightforward. One of the most stable and most firmly established regularities involving response times is the chronometric effect, which boils down to the extremely robust observation that easier choice problems (where alternatives’ evaluations show large differences) take less time to respond to than harder problems (Dashiell 1937, Moyer and Landauer 1967). Hence, deliberation times are longer for alternatives that are more similar, either in terms of preference or along a predefined scale. This effect has explicitly been shown in various economic settings such as intertemporal choice (Chabris et al. 2009), decisions under risk (Alós-Ferrer and Garagnani 2022), consumer choice (Krajbich et al. 2010), and dictator and ultimatum games (Krajbich et al. 2015).12 In our case, increasing incentives enlarges the payoff difference between a correct and an incorrect choice, which, by the chronometric effect, results in shorter response times. This observation is enough to explain the effect we find.

4.5. Alternative Classifications

Our classification through an FMM follows the supervised approach because our experiment was designed to disentangle four specific behavioral rules. It specifies noisy behavioral rules where error rates can be estimated because those provide a measure of the accuracy of the classification. We conduct the estimation at the individual level, where the FMM is estimated separately for every participant, because we are especially interested in heterogeneity across and within individuals. Unfortunately, all three choices make the analysis computationally intensive.

The online appendices report a number of additional alternative classifications. First, we implemented an unsupervised approach, which is detailed in Online Appendix A. In this approach, we compared models with two, three, or four components without specifying their characteristics. The results show that the unsupervised approach is not appropriate for our data because it loses track of the individual and essentially tries to fit the data through rules that always bet on black or always bet on white.13

Second, we estimated an FMM at the aggregate level, that is, as if the entire data set came from a representative decision maker. This analysis (assuming noisy behavioral rules as in our main analysis) is detailed in Online Appendix B. As in the model described previously, this estimation puts the largest weights on Bayesian behavior and model-free reinforcement, with a sizeable weight on inertia (18.5%). However, the error rates are very high for all rules (close to 50%), which turns them into essentially random behavior. Again, this suggests that this kind of analysis is inappropriate for our data. The reason is that the aggregate-level analysis treats all observations as equivalent and effectively ignores heterogeneity across individuals.

Third, we conducted a number of model comparisons estimating the model at the aggregate level, assuming mistake-free rules and varying the number of considered rules (two, three, or four; see Online Appendix C). The best model (according to criteria that penalize additional parameters) consists of only Bayes’ rule and (model-free) reinforcement. However, it is difficult to evaluate the fit of the classification in the absence of estimated error rates.

Fourth, as an additional model comparison, we estimated an FMM with noisy rules and at the individual level but only three types: Bayes’ rule, reinforcement, and a pure-noise rule that simply randomizes between both alternatives. This estimation was very inaccurate, with conditional means between 55% and 61% (see Online Appendix D for details).

5. Temporal Dynamics and Hidden Markov Modeling

The finite mixture approach detailed in the previous section has shown that there is substantial heterogeneity across individuals, with half of our sample mostly following error-prone approximations of Bayes’ rule and the other half mostly following non-Bayesian rules. Thus far, however, we have ignored the temporal dimension. In particular, the previous analysis does not rule out possible dynamic relations across behavioral rules.14 For example, thus far, we do not know whether a Bayesian choice might be more or less likely after having relied on the reinforcement rule. To answer these questions, this section investigates temporal effects by relying on a simple machine learning methodology: hidden Markov models (HMMs). While the previous section concentrated on identifying the behavioral rules that subjects most likely followed during the entire experiment (interindividual heterogeneity), in this section, we focus on possible dynamic patterns where decision makers may switch from one behavioral rule to another over time (intraindividual heterogeneity).

5.1. HMMs

To study the temporal interplay between behavioral rules, we need to classify each choice for each subject and trial, based on the rule the subject most likely followed in that trial, while conditioning on the most likely behavior in previous rounds of the experiment. With this aim, we propose a different model specification that goes beyond the finite mixture modeling approach used thus far.

We rely on a simple machine learning technique known as the HMM (MacDonald and Zucchini 1997, Ramaprasad and Shigeyuki 2004, Frühwirth-Schnatter et al. 2019). The basic idea is as follows. Because we aim to investigate the dynamic dependence across different behavioral rules, we cannot assume that the latent variables are independently and identically distributed, as in finite mixture models. Rather, an HMM postulates that the latent variables are connected through a Markov chain. Specifically, in our setting, each state of the Markov chain represents a behavioral rule. Because we cannot directly observe which rule a participant actually uses in a given trial, these states are “hidden.” Thus, instead of estimating static probabilities for each state, one needs to estimate transition probabilities across them.

Specifically, let the state of subject $j$’s Markov chain at time $t$ be described by $X_t^j\in\{1,2,3,4\}$, where $X_t^j=k$ means that, at time $t$, the subject follows the noisy behavioral rule $p_k$ given by (1). That is, as in Section 4, we maintain our assumption that behavioral rules are noisy and mistakes are possible. In particular, a state does not correspond to a deterministic behavioral rule but rather to its noisy counterpart, and thus we need to estimate the corresponding error probability for each state (which, in terms of HMMs, is the complement of the emission probability).

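For our data, HMM reduces to the estimation of the individual-level, state-dependent error rates $\varepsilon_k^j$, $k=1,2,3,4$, $j=1,\ldots,268$, and the stationary transition probabilities

$$P^j_{k\ell}=\mathrm{Prob}(X^j_{t+1}=\ell\mid X^j_t=k),\qquad k,\ell=1,\ldots,4,$$

which form a $4\times 4$ matrix $P^j$ for each subject $j$. Given a deterministic initial condition $X_0^j$, or a probabilistic one $(\mathrm{Prob}(X_0^j=k))_{k=1}^{4}$, this transition probability matrix fully determines the probabilities of the events $X_t^j=k$ for all $t$ and $k$ by

$$(\mathrm{Prob}(X_t^j=k))_{k=1}^{4}=(\mathrm{Prob}(X_0^j=k))_{k=1}^{4}\cdot(P^j)^t.$$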
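How a transition probability matrix and an initial condition pin down the state probabilities in every trial can be illustrated with a short sketch. The matrix `P_example`, the function names, and the ordering of the four states are hypothetical and are not estimates from the paper.

```python
def step(dist, P):
    """One step of the chain: row vector of state probabilities times P."""
    k = len(P)
    return [sum(dist[i] * P[i][j] for i in range(k)) for j in range(k)]

def distribution_at(t, initial, P):
    """State probabilities after t transitions, starting from `initial`."""
    dist = list(initial)
    for _ in range(t):
        dist = step(dist, P)
    return dist

# Hypothetical 4 x 4 transition matrix (each row sums to one); not an estimate.
P_example = [
    [0.5, 0.3, 0.1, 0.1],
    [0.4, 0.4, 0.1, 0.1],
    [0.4, 0.3, 0.2, 0.1],
    [0.4, 0.3, 0.1, 0.2],
]

start_in_first_state = [1.0, 0.0, 0.0, 0.0]   # deterministic initial condition
print(distribution_at(1, start_in_first_state, P_example))
print(distribution_at(10, start_in_first_state, P_example))
```

After one transition from a deterministic start, the state probabilities are simply the corresponding row of the matrix; further transitions apply the matrix repeatedly.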

Following the HMM literature (Rabiner 1990, Durbin et al. 1998), we adopt a standard approach to estimate the parameters of the HMM, namely a special case of expectation maximization (Dempster et al. 1977) known as the Baum-Welch algorithm (Baum et al. 1970). This algorithm is also known as the forward-backward (or α, β, γ) algorithm because it loops among three well-differentiated parts or tasks. Intuitively, the first task is to efficiently evaluate the probability of the observations given the model (following a top-down procedure, in our case from first to last trial). The second task is to identify the optimal state sequence, that is, which state sequence among all theoretically possible ones is most likely to have occurred (now following a bottom-up or last-to-first procedure). The third task is to adjust the model’s parameters in a meaningful way, given the most-likely sequence and the estimated probability of observations given the model. We refer the reader to Frühwirth-Schnatter (2006) or Frühwirth-Schnatter et al. (2019) for details on the algorithm. We declare that the algorithm has converged when the change in log likelihood, normalized by the total number of possible sequences and by sequence length, is smaller than $10^{-7}$, as is standard in the literature (Ramaprasad and Shigeyuki 2004, Mamon and Elliott 2014). To determine the initial values for the algorithm, we used the values ($\eta_k^j$ and $\varepsilon_k^j$) obtained in the FMM of Section 4.2 (and a flat prior for the two subjects for whom that estimation did not converge).
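Only the first of the three tasks, evaluating the probability of the observations given the model, is sketched below; this is a scaled forward pass, not a full Baum-Welch implementation. The function names, the two-state example, and all parameter values are hypothetical. Each observation is encoded by whether the observed choice deviates from the prescription of each state's rule, so the emission probability is the error rate for a deviation and its complement otherwise.

```python
import math

def emission(deviates, eps):
    """Probability of the observed choice in a state with error rate eps."""
    return eps if deviates else 1.0 - eps

def forward_log_likelihood(dev, initial, P, eps):
    """Log probability of a choice sequence under an HMM with noisy-rule states.

    dev[t][k] is True if the choice in trial t deviates from the rule of state k.
    initial is the distribution of the first hidden state; P is the transition matrix.
    """
    n_states = len(initial)
    alpha = [initial[k] * emission(dev[0][k], eps[k]) for k in range(n_states)]
    loglik = 0.0
    for t in range(len(dev)):
        if t > 0:
            alpha = [
                sum(alpha[i] * P[i][k] for i in range(n_states))
                * emission(dev[t][k], eps[k])
                for k in range(n_states)
            ]
        scale = sum(alpha)                 # rescale to avoid numerical underflow
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik

# Hypothetical two-state example with identical rows, i.e., an i.i.d. special case.
initial = [0.6, 0.4]
P_iid = [[0.6, 0.4], [0.6, 0.4]]
eps = [0.1, 0.2]
dev = [[False, True], [True, True], [False, False]]
print(forward_log_likelihood(dev, initial, P_iid, eps))
```

When all rows of the transition matrix coincide with the initial distribution, the hidden states are i.i.d. and the forward pass returns the log likelihood of a finite mixture with those weights, which gives a simple consistency check between the two models.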

5.2. Temporal Dynamics: Estimation Results

We estimated the parameters of the HMM for all 268 participants, thereby obtaining a transition probability matrix and a vector of error terms for the hidden states for each individual. From these matrices, we computed all individual invariant distributions (μ such that μ=μ·P), which, by the fundamental theorem of Markov chains, summarize the long-run probabilities of the states, and, by the ergodic theorem, reflect the average time that the system spends at each state (Karlin and Taylor 1975).15
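An invariant distribution can be approximated by applying the transition matrix repeatedly until the state probabilities stop changing. The sketch below uses this power iteration; the function name `invariant_distribution`, the tolerance, and the matrix `P_example` are hypothetical and are not estimates from the paper. Convergence of the iteration presupposes a well-behaved chain, for instance one in which all transition probabilities are strictly positive.

```python
def invariant_distribution(P, tol=1e-12, max_iter=10_000):
    """Approximate mu with mu = mu * P by power iteration from the uniform vector."""
    k = len(P)
    mu = [1.0 / k] * k
    for _ in range(max_iter):
        nxt = [sum(mu[i] * P[i][j] for i in range(k)) for j in range(k)]
        if max(abs(a - b) for a, b in zip(nxt, mu)) < tol:
            return nxt
        mu = nxt
    return mu

# Hypothetical 4 x 4 transition matrix (each row sums to one); not an estimate.
P_example = [
    [0.5, 0.3, 0.1, 0.1],
    [0.4, 0.4, 0.1, 0.1],
    [0.4, 0.3, 0.2, 0.1],
    [0.4, 0.3, 0.1, 0.2],
]
mu = invariant_distribution(P_example)
print([round(m, 4) for m in mu])
```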

Table 3 displays the average results of the estimation for this type of analysis, pooling across incentive treatments. The From column indicates the behavioral rule that was most likely followed in the previous trial. The columns under the To label indicate the behavioral rule to which the Markov process moves. That is, the given matrices represent average transition probability matrices. The last column of the table (Error) indicates the estimated probability of making an error (the complement of the emission probability) at each state. The last row of the table (Invariant) reports the probabilities in the corresponding invariant distribution.


Table 3. Temporal Dynamics: Individual Averages



Notes. Standard deviation is in parentheses. n = 268.

Even at this level of aggregation, the results of the temporal dynamics analysis suggest an oscillatory pattern in behavior. This is made clear by Figure 6. In this and the following figures, circled areas correspond to the states (rules of behavior) and arrows to transitions. The areas of the circles and the thickness of the arrows are proportional to the weights in the corresponding invariant distributions and to the transition probabilities, respectively. Participants gravitate mostly toward Bayes’ rule and reinforcement and continue using those rules in most cases, but transition probabilities across rules are typically large, especially between Bayes’ rule and reinforcement. We observe that, at the aggregate level, the rows of the transition probability matrix are similar to each other, indicating that the average dynamics might be close to an independent and identically distributed (i.i.d.) process. This, however, is not necessarily true at the individual level. Individual transition probability matrices display considerable variance and are often quite far from reflecting i.i.d. processes. Online Appendix F provides a more detailed examination of this point, including a measure of heterogeneity in transition probability matrices (Figure F.1) and several examples at the individual level (Table F.1).

Figure 6. Results of the HMM Estimation
Notes. Circle sizes are proportional to weights in the invariant distribution. Arrow thickness is proportional to transition probabilities among states. Probabilities themselves are rounded for graphical illustration.

5.3. Temporal Dynamics and Incentives

Table 4 displays the average results of the HMM estimation (transition probability matrices, state-dependent error rates, and invariant distributions) separately by treatment (low versus high incentives), following the conventions in Table 3. Figure 7 depicts the results, again separately by treatment (conventions are as in Figure 6). The overall pattern is confirmed, with participants mostly relying on Bayes’ rule and reinforcement but displaying relatively large transition probabilities. However, the effects of incentives are now clearer.


Table 4. Temporal Dynamics by Treatment: Individual Averages


Panels: Low incentives; High incentives.

Notes. Standard deviation is in parentheses. n = 128 for low incentives and n = 140 for high incentives.

Figure 7. Results of the HMM Estimation by Treatment
Notes. Circle sizes are proportional to weights in the invariant distribution. Arrow thickness is proportional to transition probabilities among states. Probabilities themselves are rounded for graphical illustration. (a) Low incentives. (b) High incentives.

Specifically, comparing the two conditions, we observe different dynamics among behavioral rules. The most striking result is that increasing incentives shifts behavior toward reinforcement and away from Bayes’ rule. After using Bayes’ rule, subjects under high incentives (compared with low incentives) are less likely to stay with it and use it again (low, 44.10%; high, 41.29%; MWW, n = 268, z = 2.959, p = 0.003) and more likely to switch to reinforcement (low, 31.68%; high, 38.07%; MWW, n = 268, z = −4.329, p < 0.0001). In contrast, after using reinforcement, subjects under high incentives are more likely to stay with this rule (low, 31.62%; high, 38.40%; MWW, n = 268, z = −4.749, p < 0.0001) and less likely to switch to Bayes’ rule (low, 44.11%; high, 41.00%; MWW, n = 268, z = 2.927, p = 0.003).16

In other words, we observe a significant treatment effect in the temporal dynamics in the form of a clear shift toward reinforcement and away from Bayes’ rule with larger incentives. This effect can only be observed clearly by examining the temporal dynamics. Comparing individual invariant distributions, we observe that, in accordance with the previous results, the overall probability of relying on reinforcement is larger under high incentives (low, 31.79%; high, 38.30%; MWW, n = 268, z = −4.756, p < 0.0001). The long-run probability of relying on Bayes’ rule is smaller under high incentives, but the difference is not significant (low, 43.97%; high, 41.17%; MWW, n = 268, z = 1.018, p = 0.310).

These results have a twofold interpretation. On the one hand, they speak in favor of a ceiling effect of incentives where a higher reward for each correct choice does not decrease the error rates and does not increase the overall reliance on Bayes’ rule. On the other hand, higher incentives seem to increase the appeal of simple reinforcement. The latter is compatible with the EEG study by Achtziger et al. (2015), which pointed out that increasing incentives increases the salience of the received feedback and makes simple reinforcement processes more prominent. This is simply because if monetary incentives are increased, the win/loss cues that lead to an activation of reinforcement processes become more salient, and hence reliance on reinforcement increases.

We remark that these results can only be obtained once we rely on the HMM analysis. The previous FMM analysis neglects the temporal dynamics, and it is only by allowing for the latter that a clearer picture of the effects of incentives on actual behavior starts to emerge. We will further examine heterogeneity in the reaction to incentives in this context in Section 5.5.

5.4. Dynamic Classification

Our FMM analysis in Section 4.2 allowed us to classify participants according to the rule they mostly followed under an i.i.d. assumption (Figure 3). Because our HMM analysis delivers the individual invariant distributions, which summarize the long-run proportion of time spent in each state, that is, using each rule, we can derive an alternative classification from those invariant distributions dispensing with the i.i.d. assumption. Specifically, we classify individuals based on the state displaying the largest probability in their respective individual invariant distributions. That is, similarly to Section 4.2, a subject is considered Bayesian if Bayes’ rule is given the largest probability in the invariant distribution derived from her individual transition probability matrix.
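The classification step amounts to picking the state with the largest weight. A minimal sketch follows; the function name `classify`, the ordering of the rule labels beyond Bayes’ rule being listed first, and the example weights are assumptions for illustration and are not estimates from the paper.

```python
# Assumed ordering of the rules; only Bayes' rule as the first rule follows the text.
RULES = ("Bayes", "reinforcement", "inertia", "non-updating")

def classify(weights, labels=RULES):
    """Return the label with the largest weight, together with that weight."""
    best = max(range(len(weights)), key=lambda k: weights[k])
    return labels[best], weights[best]

# Hypothetical invariant distribution of one subject (not an estimate).
example_invariant = [0.44, 0.35, 0.12, 0.09]
label, weight = classify(example_invariant)
print(label, weight)
```

The same function applies to the rule weights of a finite mixture model, so the two classifications differ only in which vector of weights is passed in.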

The result of this classification turns out to be empirically identical to the one obtained in Section 4.2 and depicted in Figure 3. That is, although individual weights for the hidden states are quite different from the rule weights obtained in the FMM model, every single individual retains the same classification in the HMM as in the FMM case, except for the two unclassified subjects (who are reclassified as non-updaters). That is, the previous classification according to the most-used behavioral rules turns out to be stable. Although the objective of the HMM analysis is to examine the temporal dynamics and not this derived classification, we view the fact that the latter agrees with the ones derived in the previous section as a validation of the approach.

Table 5 summarizes the results of the HMM classification. Although the classification in terms of which rule is followed most of the time is almost identical to the one derived from the FMM, the conditional means are quite different. In the FMM case, the conditional means were above 97% for Bayesians and reinforcers and above 84% for non-updaters and subjects relying mostly on inertia. In contrast, in the HMM classification conditional means have a different interpretation because they are derived from invariant distributions. As can be seen in the third column of Table 5, in this case, the conditional means are between 51% and 65%. That is, when one neglects the possibility of dynamic dependence among behavioral rules, the FMM classification reliably identifies the most-used rule for each individual but does not offer a good explanation for deviations from that rule at the individual level. In contrast, taking into account the possible temporal dynamics, the HMM classification uncovers heterogeneity within each individual, in the sense that subjects who mostly follow a behavioral rule actually also rely on different rules a significant proportion of the time, as reflected by the weights in the invariant distribution. Thus, the conditional means in the HMM case reveal that within-individual heterogeneity in behavioral rules is of a sizeable magnitude.
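To make the notion of a conditional mean concrete, the sketch below groups subjects by their most likely rule and averages the weight that each subject's invariant distribution places on that rule. This reading of "conditional mean" is an interpretation of the text above, and the rule labels, the function name, and the three example distributions are hypothetical; none of the numbers are taken from Table 5.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List

# Labels for the four behavioral rules (naming assumed for illustration).
RULES = ["Bayes", "reinforcement", "inertia", "non-updating"]


def conditional_means(distributions: List[List[float]]) -> Dict[str, float]:
    """Average weight of the assigned rule among subjects assigned to it."""
    weights_by_rule = defaultdict(list)
    for pi in distributions:
        k = max(range(len(pi)), key=pi.__getitem__)  # most likely rule
        weights_by_rule[RULES[k]].append(pi[k])
    return {rule: mean(ws) for rule, ws in weights_by_rule.items()}


# Hypothetical invariant distributions for three subjects.
example = [
    [0.60, 0.30, 0.05, 0.05],
    [0.70, 0.20, 0.05, 0.05],
    [0.30, 0.55, 0.10, 0.05],
]

if __name__ == "__main__":
    print(conditional_means(example))
```

A conditional mean well below 100% in such a computation indicates that subjects assigned to a rule still spend a substantial share of the time on other rules.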


Table 5. Estimation Summary: HMM Conditional on Subjects Classified as Mostly Following the Relative Behavioral Rule

Behavioral rule | Classified as | Percentage | Cond. mean | Cond. error rate

Notes. Standard deviation is in parentheses. n = 268.

5.5. Heterogeneity in the Effects of Incentives

We now proceed to refine the results on the effects of incentives (Section 5.3) by accounting for heterogeneity, relying on the classification obtained previously. Table 6 displays the average transition probability matrices conditional on subjects classified as Bayesians (130) or non-Bayesians (138), respectively. As in Table 4, the right-most columns list the estimated error rates, and the bottom rows detail the (average) invariant distributions. The two panels of Figure 8 give a graphical representation of the results. Each picture is analogous to the one in Figure 7 but restricted to subjects classified as Bayesians or non-Bayesians, respectively. The significance of tests for each transition probability (high versus low incentives) and the direction of the significant change (+/−) is highlighted.


Table 6. Temporal Dynamics Distinguishing Subjects Classified as Most Likely Following Bayes’ Rule from Others

Bayesian subjects | Non-Bayesian subjects

Notes. Individual averages with standard deviation are in parentheses. n = 130 for Bayesian subjects and n = 138 for non-Bayesian subjects.

Figure 8. Average Probabilities of Switching from One Behavioral Rule to Another Separating Subjects Classified as Mostly Following Bayes’ Rule from Others
Notes. Stars indicate MWW tests between low and high incentives, *** = p < .01, ** = p < .05, * = p < .10; [+/−] indicates a significant increase from low to high [+] or vice versa [−]. Arrow thickness indicates the probability to transition from one state to another. Size of the circle indicates the ergodic distribution. Numbers are approximated to the nearest integer for graphical illustration. (a) Bayesian subjects. (b) Non-Bayesian subjects.

Increasing incentives has different effects for different subjects. For those relying mostly on Bayes’ rule, higher incentives seem to be slightly detrimental, in the sense that the long-run probability of Bayes’ rule (in the invariant distribution) is smaller under high incentives (low, 70.24%; high, 59.65%; MWW, n = 130, z = 9.833, p < 0.001). This is mostly due to a decrease in the probability to stay with Bayes’ rule after using it (low, 70.17%; high, 59.84%; n = 130, z = 9.832, p < 0.001) and an increase in the probability of transition from Bayes’ rule to reinforcement (low, 24.75%; high, 35.24%; MWW, n = 130, z = −9.832, p < 0.001), in line with the increased appeal of reinforcement under high incentives mentioned previously. This is further supported by a similarly large increase in the probability of staying with the reinforcement rule after using it (low, 24.62%; high, 35.74%; MWW, n = 130, z = −9.813, p < 0.001) and a reduction in the probability of transition from reinforcement to Bayes’ rule (low, 70.50%; high, 59.32%; MWW, n = 130, z = 9.813, p < 0.001). We remark that the changes in probabilities across incentives are substantial, especially compared with those at the aggregate level.
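For readers who want to see how a z statistic of this kind arises, the following sketch implements a Mann-Whitney-Wilcoxon rank-sum test with the usual normal approximation. It is a generic textbook version under stated assumptions (tie correction, no continuity correction, two-sided p-value); the exact variant, software, and sign convention behind the reported statistics are not specified here, and the function name and sample data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def mww_z(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (z, two-sided p) for a rank-sum test of x against y.

    z is positive when the values in x tend to be larger than those in y.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    # Assign midranks to tied values and accumulate the tie correction.
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of ranks i+1, ..., j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    expected = n1 * (n + 1) / 2
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        raise ValueError("all observations are identical; z is undefined")
    z = (rank_sum_x - expected) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


if __name__ == "__main__":
    # Hypothetical per-subject probabilities under low and high incentives.
    low = [0.72, 0.69, 0.71, 0.70]
    high = [0.60, 0.58, 0.61, 0.59]
    print(mww_z(low, high))
```

With a between-subject design, each subject contributes one value (for example, an individual long-run probability of Bayes’ rule) to either the low-incentive or the high-incentive sample.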

For subjects classified as following mostly some non-Bayesian rule, the long-run probability of Bayes’ rule is slightly larger under high incentives than under low ones (low, 21.13%; high, 22.67%; n = 138, z=2.816, p = 0.005). This results from a small increase in the probability to stay with Bayes’ rule after using it (low, 20.90%; high, 22.77%; n = 138, z = −1.814, p = 0.070). However, there is a significant increase in the probability of switching from Bayes’ rule to reinforcement (low, 37.36%; high, 41.17%; n = 138, z = −2.596, p = 0.009). Furthermore, in line with the increased appeal of reinforcement under higher incentives, the probability of staying with this rule after using it is larger under high incentives than under low ones (low, 37.33%; high, 41.33%; MWW, n = 138, z = 2.673, p = 0.007).

Aggregating all subjects classified as non-Bayesians together (138), however, again loses valuable information. Online Appendix G reports the temporal dynamics (average transition matrices and invariant distributions, Table G.1 and Figure G.1) for all four types. Subjects classified as following a specific behavioral rule are in general quite consistent and exhibit large probabilities for returning to that rule after a deviation. However, the effects of incentives are quite different across types and in particular when comparing reinforcers and (non-Bayesian) nonreinforcers.

The majority of non-Bayesians are classified as reinforcers (75). For those, increasing incentives results in a large increase in the long-run probability of model-free reinforcement (low, 54.89%; high, 65.03%; MWW, n = 75, z=9.834, p < 0.001) and a sizeable decrease in the probability of using Bayes’ rule (low, 30.08%; high, 20.09%; MWW, n = 75, z = 7.371, p < 0.001). This is accompanied by a large increase in the probability of switching from Bayes’ rule to reinforcement (low, 54.70%; high, 65.34%; n = 75, z = −7.220, p < 0.001) and a large reduction in the probability of transition from reinforcement to Bayes’ rule (low, 29.87%; high, 20.18%; MWW, n = 75, z = 7.370, p < 0.001).

For non-Bayesian subjects classified as mostly following inertia (44) or non-updating (19), results are quite different. Higher incentives lead to an increase in the long-run probability of Bayes’ rule (inertia: low, 14.95%; high, 25.11%; MWW, n = 44, z=5.677, p < 0.001; non-updating: low, 9.81%; high, 14.86%; MWW, n = 19, z=3.636, p < 0.001) accompanied by a decrease in the long-run probability of the own respective rule (inertia: low, 65.16%; high, 59.72%; MWW, n = 44, z = 5.631, p < 0.001; non-updating: low, 59.95%; high, 40.07%; MWW, n = 19, z = 3.638, p < 0.001).17

These results clarify the mechanisms linking incentives to performance in our paradigm while taking into account behavioral heterogeneity. The effect of increasing incentives is double-edged. On the one hand, higher incentives do have a positive, presumably motivational effect for some subjects, leading to a higher reliance on Bayes’ rule. On the other hand, higher incentives seem to generally increase the reliance on reinforcement, in agreement with a resulting increased salience of the win-lose cues that activate that behavioral rule (Section 5.3).

The overall picture is hence as follows. For subjects classified as mostly Bayesian, low incentives suffice to spark the use of Bayes’ rule, presumably because those are sufficient motivation to overcome their cognitive costs. For those, an increase in incentives produces no improvement, suggesting a ceiling effect. The effects might even be slightly detrimental, in line with the interpretation that higher incentives make model-free reinforcement more appealing. For subjects classified as mostly reinforcers, the win-lose monetary cues spark reliance on reinforcement already for low incentives, and an increase in incentives simply enhances those cues and is hence mostly detrimental. For subjects classified neither as Bayesians nor as reinforcers, low incentives do not suffice to trigger the use of Bayes’ rule, but increasing incentives creates a significant shift toward it, which is of a large magnitude in relative terms. Although their performance is generally worse than that of Bayesians, these subjects do respond to incentives as standard economic theory would suggest.

Thus, the analysis of the temporal dynamics casts light on the mechanisms underlying the effects of incentives on performance for belief updating tasks and specifically shows that those effects are highly heterogeneous. Furthermore, our analysis through HMM shows that the underlying heterogeneity is well captured by the view that reinforcement is the main driver behind deviations from Bayesian updating, with alternative rules playing a comparatively small role.

6. Conclusion

We designed a novel belief-updating experimental paradigm to disentangle alternative rules of behavior when an existing prior can be updated in the face of new information and that information carries a win-lose feedback, as is often the case for financial and managerial decisions. The analysis shows that, in such cases, model-free reinforcement is the main driver of deviations from Bayesian updating, with alternative rules such as decision inertia or non-updating playing a smaller role.

We use a multilayered identification strategy. First, by applying finite mixture models, we find large levels of heterogeneity, with around half of the population relying mostly on Bayesian updating, over a quarter relying on reinforcement, and the rest on the remaining rules. At this level of analysis, we find that increasing incentives results in a performance increase for subjects classified as non-Bayesians, whereas evidence for Bayesians is compatible with ceiling effects.

Second, we use HMMs to examine the temporal dynamics across different behavioral rules. We find considerable heterogeneity within individuals, with most of them relying on several different behavioral rules over time, especially Bayes’ rule and model-free reinforcement. Thus, even subjects classified as Bayesians are only “part-time Bayesians.” The analysis of these patterns shows significant effects of incentives with relatively large probability shifts. Subjects who rely mostly on reinforcement increase the frequency of this rule under higher incentives. Subjects who rely mostly on Bayes’ rule experience detrimental effects of incentives, due to an analogously increased frequency of reinforcement. These results might reflect a “reinforcement paradox” where model-free reinforcement is triggered more often when the monetary values attached to the win-lose cues increase. Last, subjects who rely on either inertia or non-updating benefit from increased incentives, as they react by shifting toward Bayes’ rule, improving their performance.

Our results go beyond the well-known fact that human decision makers are not Bayesian. On the one hand, decision makers can be fruitfully classified according to the decision rules they mostly rely on. Furthermore, the resulting heterogeneity extends to the effects of financial incentives. Although some decision makers experience ceiling effects or even detrimental consequences, others do react positively to incentives. On the other hand, decision makers exhibit nontrivial temporal patterns and rely on different behavioral rules over time. When it comes to belief updating, it is not only that one size does not fit all, but rather, that one size might not even fit one.


1 Charness and Levin (2005) provide an intuitive example illustrating reinforcement behavior in managerial decisions. Imagine a rookie employee is sent to close a business deal because a more experienced negotiator is unavailable and achieves good results. Next time a similar deal has to be closed, arguments like “never change a winning horse” might prompt the CEO to send the rookie again. This, however, neglects the possible informational content of the previous outcome, which might imply that a more experienced negotiator could achieve even better results.

2 Received belief updating tasks from the decision-making literature are typically simpler and insufficient for our purposes. For example, in the task of Charness and Levin (2005), Achtziger and Alós-Ferrer (2014), and Achtziger et al. (2015), in half of the possible cases, the Bayesian prescription is “obvious” (and error rates are extremely low) because actually it coincides with the prescription of reinforcement, although in the remaining cases, the two rules conflict (and error rates are extremely high). Furthermore, the task produces only four possible decision cases in total.

3 The design is related to Asparouhova et al. (2015), who investigate belief updating based on sampling without replacement. As in our case, in that paradigm, the first draw is of little interest in itself.

4 For example, after observing a white ball in the first draw in a four-ball trial, a Bayesian should update the probabilities of the three urns to 1/2, 1/3, and 1/6, respectively; hence, the probability of extracting a black ball in the second draw, given that there is one white ball less in the urn, is (1/3)·(1/2)+(2/3)·(1/3)+1·(1/6)=5/9>1/2, leading to an optimal bet on black. In contrast, if a white ball is extracted in the first draw of a six-ball trial, the updated probabilities of the urns are 5/9, 3/9, and 1/9, respectively, and the probability of a black ball in the second draw is (1/5)·(5/9)+(3/5)·(3/9)+1·(1/9)=19/45<1/2, leading to an optimal bet on white.

5 We implemented a between-subject treatment of incentives, as is common in the literature (Barron 2021), to avoid order effects and other confounds.

6 Specifically, the pay-all mechanism is incentive compatible whenever subjects’ preferences fulfill a condition called “no complementarities at the top” when evaluating bundles of outcomes (Azrieli et al. 2018). In our case, this assumption is immediately fulfilled as long as participants prefer more money over less (but the assumption might impose stronger restrictions in more complex environments). In the second decision within every trial, one option is dominated by the other, and participants always have a strict incentive to choose the urn they believe to be more likely. Then, choosing the correct option in all trials dominates any other alternative bundle of choices, hence making this payment mechanism incentive-compatible according to Azrieli et al. (2018).

7 The alternative unsupervised approach attempts to identify a pre-determined number of different types of individuals in the population. In Online Appendix A (see Section 4.5), we implement an unsupervised approach and show that this analysis is inappropriate for our data.

8 Online Appendix A (see Section 4.5) reports an additional analysis at the aggregate level.

9 Online Appendices C and D (see Section 4.5) report on the estimation of an FMM with only three (noisy) rules and of FMMs at the aggregate level with varying numbers of mistake-free rules.

10 Additionally, there is no correlation between the self-reported difficulty of the task and the classification (measured according to a Likert scale [0,10] with 10 being “very difficult”; mean 2.78, median 2; Spearman’s correlation: Bayes, n = 266, ρ=0.011, p = 0.860; reinforcement, n = 266, ρ=0.017, p = 0.777).

11 A similar result is obtained by comparing the estimated error rates for Bayes’ rule, ε1j, for non-Bayesian subjects across incentive levels. The average estimated error rate is 16.48% for low incentives and 13.38% for high incentives (MWW, n = 136, z = 1.849, p = 0.064).

12 Alós-Ferrer et al. (2021a) rely on the chronometric effect to obtain results on preference revelation in the absence of assumptions on underlying utility noise.

13 We further applied the unsupervised approach to a simulated data set where all subjects switch uniformly across our four rules (the data set is described in Online Appendix E). Again, the approach fits the data with rules that always bet on the same color.

14 Subjects classified as mostly Bayesian in the FMM made fewer correct choices (55.03%) for four-ball decisions, where reinforcement prescribes an error, than for six-ball decisions, where reinforcement points to the correct response (65.21%; WSR test, n = 130, z = 2.155, p = 0.031). This would be compatible with those subjects relying on reinforcement as an alternative rule at least part of the time. The FMM classification, however, does not reflect this observation, in view of the large conditional means in Table 1.

15 Online Appendix E reports a parameter recovery exercise, where the data were simulated with a known generating process.

16 Increasing incentives decreases the probability of staying with non-updating (low, 8.97%; high, 6.22%; MWW, n = 268, z = 2.141, p = 0.0323) but not with inertia (low, 15.25%; high, 13.93%; MWW, n = 268, z = 0.801, p = 0.423).

17 For subjects relying mostly on inertia, there was no significant difference in the probability of reinforcement (low, 10.21%; high, 9.03%; MWW, n = 44, z = 0.530, p = 0.604). For subjects relying mostly on non-updating, higher incentives increased the reliance on reinforcement (low, 25.00%; high, 29.78%; MWW, n = 19, z=3.635, p < 0.001).
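
The arithmetic in note 4 can be checked with exact fractions. The sketch below assumes a uniform prior over the three urns and urn compositions of (3 white, 1 black), (2 white, 2 black), (1 white, 3 black) for four-ball trials and (5, 1), (3, 3), (1, 5) for six-ball trials. These compositions are inferred from the probabilities quoted in the note rather than stated there, and the function and variable names are hypothetical.

```python
from fractions import Fraction
from typing import List, Tuple

Urn = Tuple[int, int]  # (white balls, black balls)

# Urn compositions inferred from note 4 (an assumption, see lead-in).
FOUR_BALL_URNS: List[Urn] = [(3, 1), (2, 2), (1, 3)]
SIX_BALL_URNS: List[Urn] = [(5, 1), (3, 3), (1, 5)]


def after_white_draw(urns: List[Urn]) -> Tuple[List[Fraction], Fraction]:
    """Posterior over urns after a white first draw, and P(black second).

    Assumes a uniform prior over urns and sampling without replacement.
    """
    prior = Fraction(1, len(urns))
    likelihood = [Fraction(w, w + b) for w, b in urns]
    evidence = sum(prior * lik for lik in likelihood)
    posterior = [prior * lik / evidence for lik in likelihood]
    # One white ball has been removed, so w + b - 1 balls remain.
    p_black = sum(post * Fraction(b, w + b - 1)
                  for post, (w, b) in zip(posterior, urns))
    return posterior, p_black


if __name__ == "__main__":
    for name, urns in [("four-ball", FOUR_BALL_URNS), ("six-ball", SIX_BALL_URNS)]:
        posterior, p_black = after_white_draw(urns)
        bet = "black" if p_black > Fraction(1, 2) else "white"
        print(name, posterior, p_black, "-> bet on", bet)
```

Under these assumptions the computation reproduces the values 5/9 and 19/45 given in note 4, and hence the opposite optimal bets in the two trial types.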


  • Achtziger A, Alós-Ferrer C (2014) Fast or rational? A response-times study of Bayesian updating. Management Sci. 60(4):923–938.
  • Achtziger A, Alós-Ferrer C, Hügelschäfer S, Steinhauser M (2015) Higher incentives can impair performance: Neural evidence on reinforcement and rationality. Soc. Cognitive Affective Neurosci. 10(11):1477–1483.
  • Akaishi R, Umeda K, Nagase A, Sakai K (2014) Autonomous mechanism of internal choice estimate underlies decision inertia. Neuron 81(1):195–206.
  • Alós-Ferrer C, Buckenmaier J (2021) Cognitive sophistication and deliberation times. Experiment. Econom. 24(2):558–592.
  • Alós-Ferrer C, Garagnani M (2022) Strength of preference and decisions under risk. J. Risk Uncertainty 64(3):309–329.
  • Alós-Ferrer C, Fehr E, Netzer N (2021a) Time will tell: Recovering preferences when choices are noisy. J. Political Econom. 129(6):1828–1877.
  • Alós-Ferrer C, Hügelschäfer S, Li J (2016) Inertia and decision making. Frontiers Psych. 7(169):1–9.
  • Alós-Ferrer C, Jaudas A, Ritschel A (2021b) Effortful Bayesian updating: A pupil-dilation study. J. Risk Uncertainty 63(1):81–102.
  • Ansari A, Montoya R, Netzer O (2012) Dynamic learning in behavioral games: A hidden Markov mixture of experts approach. Quant. Marketing Econom. 10(4):475–503.
  • Asparouhova E, Bossaerts P, Eguia J, Zame W (2015) Asset pricing and asymmetric reasoning. J. Political Econom. 123(1):66–122.
  • Azrieli Y, Chambers CP, Healy PJ (2018) Incentives in experiments: A theoretical analysis. J. Political Econom. 126(4):1472–1503.
  • Barberis N, Shleifer A, Vishny R (1998) A model of investor sentiment. J. Financial Econom. 49(3):307–343.
  • Barron K (2021) Belief updating: Does the good-news, bad-news asymmetry extend to purely financial domains? Experiment. Econom. 24(1):31–58.
  • Baum LE, Petrie T, Soules G, Weiss N (1970) A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann. Math. Statist. 41(1):164–171.
  • Beierholm UR, Anen C, Quartz S, Bossaerts P (2011) Separate encoding of model-based and model-free valuations in the human brain. Neuroimage 58(3):955–962.
  • Bellemare C, Kröger S, Van Soest A (2008) Measuring inequity aversion in a heterogeneous population using experimental decisions and subjective probabilities. Econometrica 76(4):815–839.
  • Bilias Y, Georgarakos D, Haliassos M (2010) Portfolio inertia and stock market fluctuations. J. Money Credit Bank. 42(4):715–742.
  • Bruhin A, Fehr E, Schunk D (2018) The many faces of human sociality: Uncovering the distribution and stability of social preferences. J. Eur. Econom. Assoc. 17(4):1025–1069.
  • Bruhin A, Fehr-Duda H, Epper T (2010) Risk and rationality: Uncovering heterogeneity in probability distortion. Econometrica 78(4):1375–1412.
  • Camerer CF (1987) Do biases in probability judgment matter in markets? Experimental evidence. Amer. Econom. Rev. 77(5):981–997.
  • Chabris CF, Morris CL, Taubinsky D, Laibson D, Schuldt JP (2009) The allocation of time in decision-making. J. Eur. Econom. Assoc. 7(2-3):628–637.
  • Charness G, Levin D (2005) When optimal choices feel wrong: A laboratory study of Bayesian updating, complexity, and affect. Amer. Econom. Rev. 95(4):1300–1309.
  • Costa-Gomes MA, Crawford VP, Broseta B (2001) Cognition and behavior in normal-form games: An experimental study. Econometrica 69(5):1193–1235.
  • Dashiell JF (1937) Affective value-distances as a determinant of aesthetic judgment-times. Amer. J. Psych. 50:57–67.
  • Davidon WC (1991) Variable metric method for minimization. SIAM J. Optim. 1(1):1–17.
  • Daw ND, Gershman SJ, Seymour B, Dayan P, Dolan RJ (2011) Model-based influences on humans’ choices and striatal prediction errors. Neuron 69(6):1204–1215.
  • Daw ND, Niv Y, Dayan P (2005) Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nature Neurosci. 8(12):1704–1711.
  • Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J. Royal Statist. Soc. B 39(1):1–38.
  • Doll BB, Simon DA, Daw ND (2012) The ubiquity of model-based reinforcement learning. Curr. Opin. Neurobiology 22(6):1075–1081.
  • Durbin R, Eddy SR, Krogh A, Mitchison G (1998) Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, vol. 5 (Cambridge University Press, Cambridge, UK).
  • Edwards W (1968) Conservatism in human information processing. Kleinmuntz B, ed. Formal Representation of Human Judgment (Wiley, New York), 17–52.
  • El-Gamal MA, Grether DM (1995) Are people Bayesian? Uncovering behavioral strategies. J. Amer. Statist. Assoc. 90(432):1137–1145.
  • Enke B, Zimmermann F (2019) Correlation neglect in belief formation. Rev. Econom. Stud. 86(1):313–332.
  • Erev I, Haruvy E (2016) Learning and the economics of small decisions. Kagel J, Roth AE, eds. Handbook of Experimental Economics, vol. 2 (Princeton University Press, Princeton, NJ), 638–716.
  • Feher da Silva C, Hare TA (2020) Humans primarily use model-based inference in the two-stage task. Nature Human Behav. 4(10):1053–1066.
  • Fischbacher U (2007) z-Tree: Zurich toolbox for ready-made economic experiments. Experiment. Econom. 10(2):171–178.
  • Fletcher R, Powell MJ (1963) A rapidly convergent descent method for minimization. Comput. J. 6(2):163–168.
  • Frühwirth-Schnatter S (2006) Finite Mixture and Markov Switching Models (Springer Series in Statistics, New York).
  • Frühwirth-Schnatter S, Celeux G, Robert CP (2019) Handbook of Mixture Analysis (Chapman and Hall, New York).
  • Frydman C, Camerer CF (2016) The psychology and neuroscience of financial decision making. Trends Cognitive Sci. 20(9):661–675.
  • Gläscher J, Daw N, Dayan P, O’Doherty JP (2010) States vs. rewards: Dissociable neural prediction error signals underlying model-based and model-free reinforcement learning. Neuron 66(4):585–595.
  • Greiner B (2015) Subject pool recruitment procedures: Organizing experiments with ORSEE. J. Econom. Sci. Assoc. 1:114–125.
  • Grether DM (1980) Bayes rule as a descriptive model: The representativeness heuristic. Quart. J. Econom. 95:537–557.
  • Grether DM (1992) Testing Bayes rule and the representativeness heuristic: Some experimental evidence. J. Econom. Behav. Organ. 17:31–57.
  • Griffiths TL, Tenenbaum JB (2006) Optimal predictions in everyday cognition. Psych. Sci. 17(9):767–773.
  • Iriberri N, Rey-Biel P (2013) Elicited beliefs and social information in modified dictator games: What do dictators believe other dictators do? Quant. Econom. 4(3):515–547.
  • Kahneman D, Tversky A (1972) Subjective probability: A judgment of representativeness. Cognitive Psych. 3:430–454.
  • Karlin S, Taylor HM (1975) A First Course in Stochastic Processes, 2nd ed. (Academic Press, San Diego).
  • Knutson B, Bossaerts P (2007) Neural antecedents of financial decisions. J. Neurosci. 27(31):8174–8177.
  • Krajbich I, Armel C, Rangel A (2010) Visual fixations and the computation and comparison of value in simple choice. Nature Neurosci. 13(10):1292–1298.
  • Krajbich I, Bartling B, Hare T, Fehr E (2015) Rethinking fast and slow based on a critique of reaction-time reverse inference. Nature Comm. 6(7455):1–9.
  • MacDonald IL, Zucchini W (1997) Hidden Markov and Other Models for Discrete-Valued Time Series (Chapman & Hall/CRC Press, Boca Raton, FL).
  • Mamon RS, Elliott RJ (2014) Hidden Markov Models in Finance: Further Developments and Applications, vol. II (Springer, New York).
  • Moffatt PG (2005) Stochastic choice and the allocation of cognitive effort. Experiment. Econom. 8(4):369–388.
  • Moyer RS, Landauer TK (1967) Time required for judgements of numerical inequality. Nature 215(5109):1519–1520.
  • Navon D (1978) The importance of being conservative: Some reflections on human Bayesian behaviour. British J. Math. Statist. Psych. 31:33–48.
  • Payzan-LeNestour E, Bossaerts P (2015) Learning about unstable, publicly unobservable payoffs. Rev. Financial Stud. 28(7):1874–1913.
  • Pitz GF, Geller ES (1970) Revision of opinion and decision times in an information-seeking task. J. Experiment. Psych. 83(3):400–405.
  • Rabiner LR (1990) A tutorial on hidden Markov models and selected applications in speech recognition. Setti G, ed. Readings in Speech Recognition (Elsevier, Amsterdam, Netherlands), 267–296.
  • Ramaprasad B, Shigeyuki H (2004) Hidden Markov Models: Applications to Financial Economics (Springer Science & Business Media, Berlin).
  • Redner RA, Walker HF (1984) Mixture densities, maximum likelihood, and the EM algorithm. SIAM Rev. 26(2):195–239.
  • Ritov I, Baron J (1992) Status-quo and omission biases. J. Risk Uncertainty 5(1):49–61.
  • Schultz W, Dayan P, Montague PR (1997) A neural substrate of prediction and reward. Science 275(5306):1593–1599.
  • Shachat J, Wei L (2012) Procuring commodities: First-price sealed-bid or English auctions? Marketing Sci. 31(2):317–333.
  • Shachat J, Swarthout JT, Wei L (2015) A hidden Markov model for the detection of pure and mixed strategy play in games. Econometric Theory 31(4):729–752.
  • Shiller RJ (2005) Irrational Exuberance, 2nd ed. (Princeton University Press, Princeton, NJ).
  • Shleifer A (2000) Inefficient Markets: An Introduction to Behavioural Finance (OUP Oxford, Oxford, UK).
  • Spiliopoulos L, Ortmann A (2018) The BCD of response time analysis in experimental economics. Experiment. Econom. 21(2):383–433.
  • Sutton RS, Barto AG (1998) Reinforcement Learning: An Introduction (MIT Press, Cambridge, MA).
  • Thaler RH (2005) Advances in Behavioral Finance, vol. 2 (Princeton University Press, Princeton, NJ).
  • Thorndike EL (1911) Animal Intelligence: Experimental Studies (MacMillan, New York).