Chapter 2:
Models of Quantitative Estimations: Rule-Based and Exemplar-Based Processes Compared

Abstract

In the area of categorization it has been argued that explicit, rule-based processes and implicit, similarity-based processes compete to control behavior. A similar division of labor has been suggested for multiple-cue judgment and estimation tasks (Juslin, Karlsson, & Olsson, in press). Recently, however, Helversen and Rieskamp (in press) proposed a simple rule-based model, the mapping model, that outperformed the exemplar model in a task that was thought to promote exemplar-based processing. This raised the question of the circumstances under which a shift to exemplar-based processing can be observed. In the present research we investigate the impact of task structure on two core assumptions of the mapping model: the establishment of an exemplar memory base and the abstraction of explicit knowledge about the task. Our results indicate that knowledge about the cues is decisive. When such knowledge was available, the mapping model was the best model; however, when knowledge about the cues was difficult to abstract, participants’ estimations were best described by an exemplar model.

Models of Quantitative Estimations:
Rule-Based and Exemplar-Based Processes Compared

How do people estimate a continuous quantity, such as the selling price of their house or the quality of a job candidate? In many cases people base their estimations on cues that are probabilistically related to the quantity being estimated. For example, when estimating the selling price of a house people could rely on information such as the size of the house, the attractiveness of the neighborhood, or the presence of a deck. Cognitive models of estimation try to explain which cues people use and how they integrate them to estimate a continuous criterion, that is, the quantity of interest.

Previous research has been dominated by the use of linear additive models for describing people’s estimations, such as multiple linear regression. Recently new estimation models have been successfully introduced as alternatives to the standard regression approach. First, Juslin, Karlsson, and Olsson (in press) have argued that people frequently do not rely on rules when making estimations, but on an exemplar-based process. According to the exemplar model people estimate the criterion of an object by retrieving the criterion values of similar exemplars from memory. Second, Helversen and Rieskamp (in press) have argued that people follow a rule-based process, which differs considerably from the process assumed by linear additive approaches. Introducing the mapping model, Helversen and Rieskamp proposed that people estimate the criterion value of an object by first categorizing the object by the number of positive cue values and then using a typical criterion value of past objects with the same number of positive cues as an estimate. Although the exemplar model and the mapping model argue for conceptually different estimation processes, both models have been proposed for estimation tasks in which the standard regression approach did not provide a good account of people’s estimations. The goal of the present article is to test these two models rigorously against each other to examine in more detail in which situations people follow an exemplar-based or a rule-based process for making quantitative estimations.

Models of Estimation

Consistent with the widespread assumption that human cognition comprises multiple competing systems (Ashby, Alfonso-Reese, Turken, & Waldron, 1998; Hahn & Chater, 1998; Nosofsky, Palmeri, & McKinley, 1994), models of quantitative estimation can be broadly classified by the underlying processes they assume. In general, explicit, rule-based processes are distinguished from more implicit, similarity-based processes (Hahn & Chater, 1998; Juslin, Olsson, & Olsson, 2003; Olsson, Enkvist, & Juslin, 2006; Patalano, Smith, Jonides, & Koeppe, 2001; Nosofsky et al., 1994). The dominant approach to quantitative estimation falls clearly into the category of rule-based models. Accordingly, estimation is conceptualized as a process of weighting and adding information, which can be described by linear additive models such as multiple linear regression (Anderson, 1981; Brehmer, 1994; Brunswik, 1952; Hammond, 1955; Hammond & Stewart, 2001). Regression models assume that for each cue, the relation between cue and criterion is abstracted and explicitly represented as a cue weight; the judgment is then made by summing the weighted cue values. The cue weights that best describe the judgment policy are found by a regression analysis (Cooksey, 1996; Doherty & Brehmer, 1997). In this vein, linear regression has been successfully applied to analyze judgments in many areas, such as clinical diagnostics (e.g. Harries & Harries, 2001), legal and medical decision making (Ebbesen & Konecni, 1975; Wigton, 1996), and personality evaluations (e.g. Zedeck & Kafry, 1977; for a review, see Brehmer & Brehmer, 1988).

Lately, however, alternative models have been suggested to describe the estimation process (Helversen & Rieskamp, in press; Hertwig, Hoffrage, & Martignon, 1999; Juslin et al., in press). Following the idea that cognitive processes are largely a function of the characteristics of the task environment (Ashby & Maddox, 2005; Erickson & Kruschke, 1998; Gigerenzer & Todd, 1999; Juslin, Jones, Olsson, & Winman, 2003; Payne, Bettman, & Johnson, 1993; Rieskamp, 2006; Rieskamp, Busemeyer, & Laine, 2003; Rieskamp & Otto, 2006), Juslin et al. (in press) suggested a shift to exemplar-based processing in nonlinear decision tasks. In contrast to a linear estimation task, where the criterion is a linear function of the cues, a task is assumed to be nonlinear if the criterion follows from a nonlinear combination of the cues. Recently Helversen and Rieskamp (in press) proposed a new rule-based model for quantitative estimation, the mapping model, that also outperformed linear regression in a nonlinear decision task. Even more puzzling, testing the two models against each other led to inconsistent results. In the third experimental study of Helversen and Rieskamp the mapping model was clearly superior to the exemplar model, but when the mapping model was tested against the exemplar model in a reanalysis of the first experimental study of Juslin and colleagues, the results were contradictory. In our past work (Helversen & Rieskamp, in press) we presented substantial evidence to support the idea that the mapping model is a strong competitor to the exemplar model and generated some expectations about the task characteristics leading to exemplar-based or rule-based estimation processes.

In the present article we venture to test these expectations. We claim that the nonlinearity of the environment, although important, is not a sufficient factor to trigger exemplar-based estimation processes. Instead, we argue that the two models make specific assumptions about the cognitive processes underlying estimation. Following these assumptions, two cognitive components of the estimation process are essential: For an exemplar-based process, the quality of the exemplar memory is essential, whereas for a rule-based process, the abstraction of explicit task knowledge is decisive. In the following, we will introduce the two models of estimation and then discuss how the structure of the estimation task affects the essential cognitive component of each model and consequently explains the diverging results reported by Helversen and Rieskamp (in press) and Juslin et al. (in press; Olsson et al., 2006).

Competing Theories

Both Helversen and Rieskamp (in press) and Juslin and colleagues (2003; in press) argued that linear additive approaches such as linear regression can predict participants’ behavior in a linear estimation task, but not in a nonlinear task. For nonlinear environments they suggested competing theories. Helversen and Rieskamp proposed the rule-based mapping model, whereas Juslin et al. suggested a similarity-based exemplar model.

The Mapping Model

The mapping model assumes a simple rule-based estimation process. Accordingly, people estimate the criterion of an object by first categorizing the object by the number of positive cue values and then using the typical criterion value of past objects with the same number of positive cues as an estimate. For example, when estimating the price of a house, the mapping model assumes that people first count the number of positive features of the house that favor a high price (e.g. great location, a deck, a swimming pool). Then the number of positive features is used to categorize the house into a certain price class, and the typical price for houses within this price class is used as an estimate.

The mapping model is inspired by the framework for quantitative estimation developed by Brown and Siegler (1993). Brown and Siegler proposed that two types of information are necessary for an estimation: knowledge about the mappings, that is, the ordinal relation of the objects according to the criterion of interest; and knowledge about the metrics, that is, the numeric properties of the objects, such as the distribution, the range, or the mean of possible estimates. The mapping model relies on binary cue information; each cue is coded as having a positive or a negative cue value and all cues are coded so that they correlate positively with the criterion.

In a first step, knowledge about the mappings is inferred from the cue values by counting the number of positive cue values and grouping objects together according to their cue sums. This implies that all cues are weighted equally. In a second step, knowledge about the metric properties is derived by abstracting a typical estimate for each category, represented by the median criterion values of the objects falling into the same category. When estimating the criterion value for a new object, first the category it falls in is determined by counting the number of positive cue values, and then the typical estimate for this category is abstracted and given as an estimate.
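The two steps can be sketched in a few lines of Python. The training data below are hypothetical, and cue profiles are assumed to be coded as tuples of 0s (negative) and 1s (positive), as in the text:

```python
from statistics import median

def train_mapping(training_items):
    """Step 1: group training objects by their cue sum (equal cue weights);
    step 2: store the median criterion value of each group as that
    category's typical estimate."""
    categories = {}
    for cues, criterion in training_items:
        categories.setdefault(sum(cues), []).append(criterion)
    return {s: median(vals) for s, vals in categories.items()}

def mapping_estimate(categories, cues):
    """Estimate a new object's criterion: count its positive cue values
    and return the typical (median) value of the matching category."""
    return categories[sum(cues)]

# Hypothetical training data: (cue profile, criterion value)
training = [((1, 1, 0), 20), ((1, 0, 1), 24), ((1, 0, 0), 10),
            ((0, 1, 0), 12), ((0, 0, 0), 5)]
cats = train_mapping(training)
print(mapping_estimate(cats, (0, 1, 1)))  # cue sum 2 -> median(20, 24) = 22
```

Note that the model has no free parameters: both the categories and their typical values follow directly from the training sample.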

The Exemplar Model

In contrast, the exemplar model assumes a similarity-based process. According to the exemplar model people estimate the criterion of an object by retrieving the criterion values of similar exemplars in memory. For example, when estimating the price of a house, the exemplar model assumes that people recall the selling prices of similar houses that were sold in the vicinity and use them to estimate the selling price for the house under evaluation. Exemplar models have been successfully employed to explain human behavior in categorization (Juslin et al., 2003; Kruschke, 1992; Nosofsky & Johansen, 2000). As a result of this success they were recently extended to the area of quantitative estimation (Juslin et al., 2003, in press; Olsson et al., 2006).

Exemplar models assume that estimations rely on the similarity of an object to previously encountered objects that are stored in memory. To make an estimation, these previously encountered exemplars are retrieved and compared to the probe, that is, the object under evaluation. The more the probe resembles a retrieved exemplar, the closer the estimate for the probe will be to the exemplar’s criterion value. More specifically, the estimate consists of the average criterion values of the retrieved exemplars, weighted by their similarity to the probe:

\hat{c}_p = \frac{\sum_{i=1}^{I} S(p, i) \, x_i}{\sum_{i=1}^{I} S(p, i)}    (1)

where \hat{c}_p is the estimated criterion value for the probe p; S(p, i) is the similarity of the probe to the stored exemplar i; x_i is the criterion value of the exemplar i; and I is the number of stored exemplars in memory. The similarity S between a stored exemplar and the probe depends on how many features the exemplar and the probe share. It is calculated using the multiplicative similarity rule of the context model (cf. Medin & Schaffer, 1978), defined as

S(p, i) = \prod_{j=1}^{J} d_j    (2)

For each cue j it is determined whether the cue values of the probe p and the stored exemplar i match. If they match, d_j equals one, and if they do not match, d_j equals the attention parameter s_j, which captures the impact of a cue on the overall similarity and varies between zero and one. The closer s_j is to zero, the more important the cue. If s_j = 1, this implies that the cue j is irrelevant for the evaluation of the overall similarity. The original exemplar model assumes a separate s_j parameter for each cue j (Juslin et al., 2003; Medin & Schaffer, 1978). However, as the original exemplar model seems to be prone to overfitting, we additionally considered a simplified version with one single attention parameter s for all cues (Helversen & Rieskamp, in press). In this case, s is an attention parameter indicating how closely a retrieved exemplar needs to resemble the probe to be considered for the estimation. The closer s is to zero, the more similar an exemplar has to be to the probe to have an impact on the estimation.
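Equations 1 and 2 can be combined in a short sketch. The memory base and the attention parameters below are hypothetical, chosen only to illustrate the computation:

```python
def similarity(probe, exemplar, s):
    """Multiplicative similarity rule (Equation 2): multiply 1 for each
    matching cue and the attention parameter s_j for each mismatch."""
    sim = 1.0
    for p_j, e_j, s_j in zip(probe, exemplar, s):
        sim *= 1.0 if p_j == e_j else s_j
    return sim

def exemplar_estimate(probe, exemplars, s):
    """Equation 1: similarity-weighted average of the stored exemplars'
    criterion values."""
    sims = [similarity(probe, cues, s) for cues, _ in exemplars]
    weighted = sum(sim * crit for sim, (_, crit) in zip(sims, exemplars))
    return weighted / sum(sims)

# Hypothetical memory base: (cue profile, criterion value)
memory = [((1, 1, 0), 20), ((0, 0, 1), 8)]
s = [0.2, 0.2, 0.2]  # simplified version: one attention parameter for all cues
est = exemplar_estimate((1, 1, 1), memory, s)
# (1,1,1) vs (1,1,0): one mismatch -> sim 0.2; vs (0,0,1): two mismatches -> 0.04
print(round(est, 1))  # 18.0: the estimate is pulled toward the closer exemplar
```

Because mismatches enter multiplicatively, a single very dissimilar exemplar contributes almost nothing to the estimate, which is what gives the model its similarity-graded behavior.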

Model Competition

Both the exemplar model and the mapping model provide new and successful modeling approaches to quantitative estimation. However, both models were proposed to explain estimation processes in nonlinear estimation environments. Furthermore, two previous experimental studies led to rather conflicting results regarding which model provided a better account of observed estimations. In the third experimental study reported by Helversen and Rieskamp (in press), the mapping model clearly outperformed the exemplar model in predicting participants’ estimations. In contrast, the reanalysis of the first experiment of Juslin et al. (in press) as reported in Helversen and Rieskamp revealed an advantage of the exemplar model over the mapping model in predicting estimations. Surprisingly, the studies were very similar: In both studies, participants estimated a continuous criterion based on multiple binary cues. The criterion was a multiplicative function of the cues, and the participants received outcome feedback to learn the task. What factors led to these conflicting results, and how can an illuminative test of the two models be performed?

In general, human cognition can be understood as an adaptation to different environments (Ashby & Maddox, 2005; Gigerenzer & Todd, 1999; Payne et al., 1993; Rieskamp & Otto, 2006). From this view it follows that estimation processes will differ depending on the estimation situation. We argue that the conflicting results found by Helversen and Rieskamp (in press) can be explained by characteristics of the task that affect two essential cognitive components of the models: exemplar memory and knowledge abstraction. Our goal is to test the importance of these components for estimation processes.

Exemplar memory. Exemplar models assume the retrieval of previously encountered exemplars from memory. Thus the quality of memory traces for encountered exemplars plays a key role. Exemplar models can only be successfully applied if memory traces for exemplars exist and can be accurately retrieved. Differences in the quality of exemplar memory could explain the contradictory results reported in the two studies by Helversen and Rieskamp (in press) and Juslin et al. (in press). The studies differed in some aspects that potentially affected the quality of exemplar memory. For one, in the study by Juslin and colleagues the training objects were repeated twice as often as in the study by Helversen and Rieskamp. In addition, a lower number of exemplars (i.e. 11 vs. 16) and less complex exemplars (4 vs. 5 cue dimensions) were used in the training phase of the study by Juslin et al. in comparison to the study by Helversen and Rieskamp. The more often each exemplar is repeated, the better participants should be able to establish accurate memory traces. Furthermore, the fewer training objects that exist and the fewer cue values that have to be stored, the easier it should be to accurately encode and retrieve the training exemplars. Thus both factors could enhance an exemplar-based estimation process.

In line with this argument, Smith and Minda (1998) found that in a categorization task exemplar-based processes occurred later in training, while at the beginning of training participants were better described by an additive prototype model (i.e. a rule-based process). Moreover, they found that the exemplar model performed better when it learned a small category with few dimensions than a large category with more dimensions (Minda & Smith, 2001), suggesting that exemplar-based processes become more accessible the fewer training exemplars have to be learned and the more frequently training exemplars are encountered.

Knowledge abstraction. Differences in the availability of task knowledge could also explain the diverging results reported by Helversen and Rieskamp (in press) and Juslin et al. (in press). While the exemplar model relies on accurate representation of encountered exemplars in memory, the mapping model requires the abstraction of explicit task knowledge.

In the third experimental study of Helversen and Rieskamp participants were informed about the directions of the cues, providing explicit knowledge that could be directly applied in the estimation task. However, in the study by Juslin and colleagues, no prior information about the cues was given to the participants, making it more difficult to form explicit knowledge of the cue directions. For the mapping model, knowledge about the predictability and the directions of the cues is crucial. Objects can only be grouped into meaningful categories if the valid cues are used and the directions of the cues are known. Furthermore, prior knowledge about the cue directions decreases the computational demands of the mapping model and could thus foster rule-based processing. In contrast, the exemplar model, relying on the similarity relations of the objects, does not depend on any knowledge about the cues but can be applied successfully as long as objects are sufficiently differentiable. Thus, if no prior knowledge about the cues exists, this could cause a shift in the direction of exemplar-based processing. In particular, if cue directions are difficult to learn, it might be equally demanding to abstract the cue directions as to store the exemplars in memory.

In sum, the mapping model relies on the abstraction of knowledge about the cues and should profit more than the exemplar model from explicit information about the cue directions being provided. In contrast, the exemplar model seems to be particularly suited to capturing the estimation process when it is difficult to gain explicit knowledge about the cues, but intensive training and less complex training material make it possible to establish accurate exemplar memory. We investigated the influence of these factors in two studies, manipulating the quality of the memory traces established as well as the access to knowledge about the cues by varying features of the task.

Methods of Model Selection and Qualitative Tests of Models

Model selection can be a challenging task. For one, the complexity of the models needs to be taken into account. Although more complex models are better at fitting data, they run the risk of overfitting; that is, they not only capture systematic variance but also fit unsystematic variance in the data. Second, models often make very similar predictions, which makes it difficult to devise tests that reliably differentiate between the models.

We addressed the problem of model selection with a twofold approach. First, similar to Helversen and Rieskamp (in press), we used a generalization test (Busemeyer & Wang, 2000). To take model complexity into account, we first estimated the models’ parameters by using the data of a training phase. The estimated parameter values were then used to predict participants’ estimations for new test objects. Second, we devised a qualitative test. Qualitative tests are preferable to purely quantitative model tests (Pitt, Kim, Navarro, & Myung, 2006). They are less dependent on specific parameter values and provide a critical test of the models’ assumptions, showing whether the pattern in the data corresponds to the model predictions. Therefore, we aimed to find qualitative predictions that were specific to each model and could not be derived from the competing model.

For this purpose we focused on the assumptions the models make about which objects should be treated similarly and for which objects the estimations should differ. The mapping model groups objects according to their cue sums, ignoring which specific cue has a positive value. This implies that the model treats all objects with the same cue sum alike and makes the same estimations, whereas objects with different cue sums will be treated differently and estimations will differ. The exemplar model, on the other hand, relies on the similarity relations of the objects to the stored exemplars. Thus two objects that are maximally different should also differ in which exemplars they resemble and thus in the criterion values estimated. This opens the possibility of qualitatively differentiating between the models. For example, while the mapping model will predict the same value for two objects that share the same number of positive cue values but do not match on a single cue, the exemplar model will differ in its estimations (across a wide range of parameter values).1 We used these assumptions of the models to design qualitative model comparison tests in addition to the quantitative model comparison tests.
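The logic of this qualitative contrast can be illustrated with a small simulation. The four-cue training set and the attention parameter below are hypothetical, not taken from the studies; they serve only to show that the mapping model is blind to reversed cue profiles with equal cue sums while the exemplar model is not:

```python
from statistics import median

# Hypothetical four-cue training set: (cue profile, criterion value)
training = [((1, 1, 1, 0), 30), ((1, 1, 0, 0), 14), ((1, 0, 0, 0), 6),
            ((0, 0, 0, 1), 4), ((0, 0, 1, 1), 10), ((0, 1, 1, 1), 22)]

# Mapping model: one typical (median) criterion value per cue sum
cats = {}
for cues, crit in training:
    cats.setdefault(sum(cues), []).append(crit)
typical = {k: median(v) for k, v in cats.items()}

# Exemplar model with a single attention parameter s for all cues
def exemplar(probe, s=0.1):
    sims = [s ** sum(p != e for p, e in zip(probe, cues))
            for cues, _ in training]
    return sum(w * c for w, (_, c) in zip(sims, training)) / sum(sims)

# Reversed cue profiles: same cue sum (2), but no matching cue value
a, b = (1, 1, 0, 0), (0, 0, 1, 1)
print(typical[sum(a)] == typical[sum(b)])          # mapping: identical estimates
print(round(exemplar(a), 1), round(exemplar(b), 1))  # exemplar: estimates differ
```

The same setup, run on objects with cue sums of 2 and 4, shows the mirror-image prediction: the mapping model necessarily produces different estimates, while the exemplar model can produce very similar ones.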

Study 1

The goal of Study 1 was to investigate the influence of exemplar memory on model performance. We manipulated the ease with which exemplars could be stored in memory by varying the number of training exemplars. In a multiple-cue estimation task participants evaluated job candidates according to the policy of their company. Each job candidate was characterized by six cues, which the participants could use for their evaluation. In a training phase participants were presented with a number of candidates who had been evaluated by their supervisors. Based on this training sample they could learn how their company evaluated job candidates. In a subsequent test phase, we tested how well they generalized this knowledge to new job candidates and which model was best at predicting their evaluations. We manipulated the size of the training set: In the first condition participants encountered a large number of training exemplars (24); in the second condition, a small number of training exemplars (8).

Method

Participants. In Study 1, 40 participants took part, 20 in each condition. The majority of participants were students from one of the Berlin universities, with an average age of 24 years (SD = 4); 30% of the participants were male. Participants were randomly assigned to one of the experimental conditions, balanced for gender. The study lasted for about 1 h 45 min and participants were paid an average of €16 for their participation. One participant in the condition with a low number of training exemplars was excluded from the analysis as he did not improve in evaluating the training candidates during the training phase.

Procedure and material. The study was conducted as a computer-based experiment. The task of the participants was to evaluate the quality of job candidates for an IT position on a scale of 1 to 100 points. The more points a job candidate received the more suited he or she would be for the position. Participants received information about the job candidates on six cues and each cue could have two possible characteristics (i.e. cue values). The six cues and their binary characteristics were knowledge of programming languages (C++ vs. Java), knowledge of foreign languages (French vs. Turkish), additional skills (SAP (a software system) vs. web design), previous work experience (software development vs. system administration), previous employment area (business vs. academia), and knowledge of operating systems (UNIX vs. Windows).

Participants were told which of the two possible characteristics of the cues matched the company’s demands; characteristics that matched the company’s preferences were marked in green, while characteristics that did not meet the company’s requirements were marked in red. During training participants learned how many points job candidates with different combinations on the six cues had been awarded in previous assessments. The criterion, that is, how many points a job candidate received, was determined as a multiplicative function of the cue values (Helversen & Rieskamp, in press; Juslin et al., in press):

(12)  

where C is the points the job candidates received and c_1 to c_6 the values on the six dimensions. A positive characteristic of a cue was coded with a cue value of 1 and a negative characteristic was coded with a cue value of zero. The assignment of the weights to the cues, which characteristic of a cue was coded as positive or negative, as well as the order of the cues on the screen was randomly determined for each participant.

The study consisted of two parts, a training phase and a test phase. During the training phase participants could learn the company’s evaluation policy by judging job candidates who had previously been evaluated by their supervisors. In each trial participants saw and were asked to evaluate one job candidate. After each trial they received feedback about the number of points this candidate had received from his or her supervisor, how close their estimate had been, and how many points they earned in this trial (see below). Then the next candidate appeared. All training candidates were repeated 10 times, structured in 10 blocks; the order of appearance in each block was randomly determined.

We manipulated the number of training candidates in this study: In one condition the training set consisted of a large number (24) of different training candidates; in the other condition the training set comprised a small number (8) of training objects. After the training phase participants continued with a test phase in which they had to evaluate 30 more job candidates. The test phase was similar to the training phase, with the difference that participants did not receive immediate feedback about the accuracy of their evaluations and only learned how many points they had earned after they had finished the test phase. The 30 test candidates were evaluated twice. Eight of the candidates in the test phase had also appeared during training, and 22 were new candidates that participants had not encountered before.

Participants’ payment was based on their performance. In each trial participants could earn up to 100 points depending on how accurately they estimated the quality of the job candidates. The more they deviated from the criterion the fewer points they earned. The exact number of points subtracted for a given deviation was calculated by a feedback algorithm based on the squared deviation of the estimate from the criterion. This resulted in a rapidly decreasing number of points with less accurate estimations. Additionally, the feedback algorithm incorporated a correction term that determined the deviation that would result in a payoff of zero. It was calculated on the basis of a baseline model that always estimated the average criterion value. Any deviation exceeding the correction term led to the subtraction of points. To exclude the subtraction of a high number of points due to a typing error, the feedback algorithm was truncated: Any deviation larger than 50 was treated as a deviation of 50. A similar feedback algorithm had been successfully used by Helversen and Rieskamp (in press) to create a moderately exacting feedback environment (Hogarth, Gibbs, McKenzie, & Marquis, 1991). After the experiment, points were exchanged for euros at a rate of €0.1 for every 150 points.
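A minimal sketch of such a feedback rule, assuming a quadratic scaling that reaches zero at the baseline-derived cutoff; the quadratic form toward the cutoff is an assumption consistent with the description above (squared deviation, zero payoff at the correction term, truncation at 50), not the exact implementation:

```python
def payoff(estimate, criterion, cutoff):
    """Hypothetical feedback rule: points fall with the squared deviation
    of the estimate from the criterion, scaled so that a deviation equal
    to `cutoff` (the error of a baseline model that always estimates the
    mean criterion) earns zero points. Deviations beyond `cutoff` subtract
    points, and any deviation larger than 50 counts as 50 (typo guard)."""
    deviation = min(abs(estimate - criterion), 50)
    return 100 * (1 - (deviation / cutoff) ** 2)

print(payoff(20, 20, cutoff=15))      # perfect estimate: 100.0 points
print(payoff(35, 20, cutoff=15))      # deviation equals the cutoff: 0.0 points
print(payoff(90, 20, cutoff=15) < 0)  # beyond the cutoff: points are subtracted
```

With this shape, small errors are forgiven almost entirely while large errors are punished sharply, which is what makes the environment "moderately exacting."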

Selection of training and test sets. To test which model could explain the participants’ behavior best we relied on a generalization test (see Busemeyer & Wang, 2000). That is, we compared which model was better at predicting the participants’ estimations for a test set consisting of objects they had not encountered during training. However, we did not just compare the quantitative fits of the models, but additionally conducted a qualitative test of the models’ assumptions (see Pitt et al., 2006). Qualitative tests have the advantage that they provide a critical test of model assumptions and can be constructed to be largely independent of model parameters. For this purpose we focused on two qualitative predictions that were derived from the models’ assumptions about the estimation process. Due to these different assumptions of the models it was possible to derive different ordinal predictions, that is, predicted patterns of results that are qualitatively different.

First, according to the mapping model, the same value is estimated for any two objects with an equal number of positive cues, regardless of the similarity of the two objects. In contrast, if two objects are very dissimilar, that is, if they do not match on a single cue, the exemplar model’s estimations should differ. For the experimental task with six cues, an estimation situation in which the mapping model makes identical predictions and the exemplar model makes different predictions occurs for objects with a cue sum of three. To clarify, for any cue profile with three positive and three negative cues (e.g. 111000, with each number representing the cue value of one cue), the mapping model makes the same prediction for an object with the reversed cue profile (i.e. 000111). In contrast, the exemplar model will most likely make different estimation predictions, because these two objects are maximally dissimilar—that is, they do not share any cue values.

Second, we devised an additional experimental situation in which the exemplar model made similar predictions, and the mapping model made different predictions for the test objects. The mapping model makes different predictions for objects when they have different cue sums, for instance, objects with cue sums of 2 and 4. In contrast, for these objects, which necessarily share some cue values, the exemplar model can make very similar estimations. For the test phase we selected test objects with cue sums of 2 and 4 for which the exemplar model indeed made similar predictions.

To summarize, our qualitative test comprised two conditions in which the exemplar model and the mapping model made qualitatively different predictions. While the mapping model predicted a difference between the estimations for objects with a cue sum of 4 and a cue sum of 2 and no difference for objects with a cue sum of 3, the exemplar model made the opposite predictions. However, the strength of the qualitative predictions depends on the specific training and test objects. For instance, if all training objects had the same criterion value, it would be impossible to differentiate between the models. Accordingly, we aimed at selecting training set–test set combinations where the qualitative predictions of the two models would differ as widely as possible.

We first selected the training set–test set combination for the condition with 24 exemplars. To ensure that the training set would well represent the total set, we constrained the selection of training objects to contain objects with all possible cue sums approximately in proportion to the frequency in the whole set: Each sample had to contain one object with a cue sum of 0, two with a cue sum of 1, five with a cue sum of 2, eight with a cue sum of 3, five with a cue sum of 4, two with a cue sum of 5, and one with a cue sum of 6.

To find a training set–test set combination for which the models made qualitatively different predictions, we generated 100 different training samples. Next, we calculated model predictions for the remaining objects based on the respective training samples. For the mapping model with no free parameters the predictions could be directly determined from the training samples. In contrast, for the exemplar model we first calculated the optimal parameters to predict the criterion of the training set and then made predictions based on these parameter values. From the 100 samples we selected the training set–test set combination for which the models differed most in their qualitative predictions. For the test set we included objects with cue sums of 2 and 4 for which the mapping model made widely different estimations but the exemplar model made similar estimations. Further, we included pairs of objects with a cue sum of 3 for which the mapping model made identical predictions but the exemplar model made different predictions. Additionally, we included some extra objects on which the models differed strongly in their predictions to enhance quantitative comparisons (see Table 8 for the test set, and see Appendix A for the training set of Study 1). Lastly, we included 8 objects in the test set that had appeared in the training set. In total, the test set consisted of 30 objects, 22 new objects selected for the qualitative tests and the additional 8 old objects.


Table 8: New test objects in the condition with a large number of training objects

| Objects    | Cue 1 | Cue 2 | Cue 3 | Cue 4 | Cue 5 | Cue 6 | Criterion | Mapping | Exemplar |
|------------|-------|-------|-------|-------|-------|-------|-----------|---------|----------|
| Test 2     | 0 | 0 | 0 | 1 | 1 | 0 | 3  | 3  | 7   |
| Test 2     | 0 | 0 | 1 | 0 | 1 | 0 | 3  | 3  | 8   |
| Test 2     | 0 | 0 | 1 | 1 | 0 | 0 | 3  | 3  | 8   |
| Test 2     | 1 | 0 | 1 | 0 | 0 | 0 | 5  | 3  | 7   |
| Test 4     | 1 | 0 | 0 | 1 | 1 | 1 | 16 | 24 | 8   |
| Test 4     | 1 | 0 | 1 | 1 | 0 | 1 | 18 | 24 | 9   |
| Test 4     | 1 | 1 | 0 | 0 | 1 | 1 | 20 | 24 | 8   |
| Test 4     | 1 | 1 | 0 | 1 | 0 | 1 | 21 | 24 | 8   |
| Test 3a    | 0 | 0 | 0 | 1 | 1 | 1 | 5  | 8  | 6   |
| Test 3a    | 1 | 1 | 1 | 0 | 0 | 0 | 13 | 8  | 26  |
| Test 3b    | 0 | 0 | 1 | 0 | 1 | 1 | 6  | 8  | 2   |
| Test 3b    | 1 | 1 | 0 | 1 | 0 | 0 | 12 | 8  | 25  |
| Test 3c    | 0 | 0 | 1 | 1 | 0 | 1 | 6  | 8  | 3   |
| Test 3c    | 1 | 1 | 0 | 0 | 1 | 0 | 11 | 8  | 14  |
| Test 3d    | 0 | 1 | 0 | 1 | 1 | 0 | 8  | 8  | 14  |
| Test 3d    | 1 | 0 | 1 | 0 | 0 | 1 | 9  | 8  | 24  |
| Test 3e    | 1 | 0 | 0 | 0 | 1 | 1 | 7  | 8  | 3   |
| Test 3e    | 0 | 1 | 1 | 1 | 0 | 0 | 9  | 8  | 27  |
| Test/extra | 1 | 0 | 1 | 0 | 1 | 1 | 17 | 24 | 10  |
| Test/extra | 0 | 1 | 1 | 0 | 1 | 1 | 16 | 24 | 16  |
| Test/extra | 0 | 0 | 0 | 0 | 1 | 0 | 1  | 2  | 3   |
| Test/extra | 1 | 0 | 1 | 1 | 1 | 1 | 37 | 44 | 100 |

Note. Test 2 denotes objects with a cue sum of 2; Test 3 denotes objects with a cue sum of 3, where pairs with the same letter indicate opposite cue profiles; Test 4 denotes objects with a cue sum of 4. Test/extra indicates objects that were additionally included in the test set to increase the differences in model predictions. Mapping and Exemplar give the respective models' predictions.

To select the training set–test set combination for the condition with eight exemplars we repeated the procedure described above. To make the conditions more comparable, the 100 training sets of 8 training objects were randomly drawn from the condition with 24 training objects, with the restriction that each training sample contained one object each with cue sums of 0, 1, 2, 4, 5, and 6, and two objects with a cue sum of 3. Again, we obtained model predictions for the remaining objects and selected a test set that maximized the differences in the qualitative predictions. As before, the test set consisted of 22 new objects that were not included in the training set and the 8 known objects from the training phase. The training and test sets are reported in Appendix A.

Finally, we explored the models' predictions across the parameter space, to determine the range of parameter values for which the models make qualitatively different predictions. For the mapping model this is simple: It has no free parameters, so it makes a single prediction for each object. In contrast, the exemplar model's predictions depend on the value of its attention parameter. We covered the parameter space of the attention parameter s by using the values .001, .1, .2, .5, .7, and .9. Figure 3 illustrates how the predictions of the exemplar model change with increasing parameter values.


Figure 3: Qualitative model predictions. The models' predictions for the two qualitative tests when varying the values of the exemplar model's attention parameter s. "4 vs. 2" denotes the predicted average difference in estimations between test objects with a cue sum of 4 and test objects with a cue sum of 2. "3" refers to the predicted average difference in estimations between the pair of test objects with a cue sum of 3 and maximally different cue profiles (e.g., 111000 and 000111). (A) The predictions for the condition with a small number of training objects. (B) The predictions for the condition with a large number of training objects.

With small parameter values a clear difference in the models' qualitative predictions can be observed: The mapping model predicts large differences in the estimations for objects with a cue sum of 4 versus 2, substantially larger than the zero difference it predicts for pairs of objects with a cue sum of 3. In contrast, the exemplar model predicts small differences in estimations for objects with a cue sum of 4 versus 2, smaller than the differences it predicts for pairs of objects with a cue sum of 3. Small values of the attention parameter are the most plausible, because they are exactly the ones Helversen and Rieskamp (in press) found to be the best estimates for the exemplar model (i.e., the average estimated parameter values varied between .001 and .17). Thus, when assuming the small attention parameter values that perform best in predicting participants' estimations, the two models make distinct ordinal predictions. Moreover, the results show that over the whole range of parameter values the models' predictions do not overlap, and that even for parameter values for which the exemplar model predicts the same ordinal pattern as the mapping model, strong quantitative differences are to be expected.
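This divergence can be illustrated with a toy computation in Python (invented stimuli, not the experimental set; the function implements the standard similarity-weighted exemplar prediction so the sketch is self-contained). For a pair of opposite cue profiles with the same cue sum, the mapping model necessarily predicts a zero difference at any parameter value, whereas the exemplar model's predicted difference is large for small s and shrinks as s approaches 1:

```python
def exemplar_predict(train, probe, s):
    # Similarity = product over cues of 1 (match) or s (mismatch).
    num = den = 0.0
    for x, y in train:
        sim = 1.0
        for a, b in zip(x, probe):
            sim *= 1.0 if a == b else s
        num += sim * y
        den += sim
    return num / den

# Invented training objects whose criterion tracks the first three cues.
train = [((1, 1, 1, 0, 0, 0), 26.0), ((0, 0, 0, 1, 1, 1), 6.0),
         ((1, 1, 0, 0, 0, 0), 18.0), ((0, 0, 0, 0, 1, 1), 4.0)]
# Opposite cue profiles, both with a cue sum of 3.
a, b = (1, 1, 1, 0, 0, 0), (0, 0, 0, 1, 1, 1)

for s in (0.001, 0.1, 0.5, 0.9):
    d = exemplar_predict(train, a, s) - exemplar_predict(train, b, s)
    print(f"s = {s}: exemplar-predicted difference = {d:.1f}")
# The mapping model predicts a difference of 0 for this pair at any s:
# both profiles fall into the same cue-sum category.
```

With s near 0 each probe is dominated by its identical training exemplar, so the predicted difference approaches the exemplars' criterion difference; with s near 1 all exemplars are weighted almost equally and the difference collapses.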

Results

Overall, the mapping model predicted participants’ estimations significantly better than the exemplar model in both conditions. Somewhat unexpectedly, the advantage of the mapping model was higher in the condition with a small number of training objects than in the condition with a large number of training objects. However, before we come to the model comparisons, we first report the participants’ accuracy.


Participants’ accuracy. Participants learned to evaluate the training candidates fairly well in both conditions. We measured the participants’ accuracy as the root mean square deviation (RMSD) between the criterion values and the participants’ estimations. In the condition with a large number of training objects the RMSD dropped from 15.56, SD = 5.62, in the first block to 3.86, SD = 2.07, in the 10th block. Similarly, the RMSD in the condition with a small number of training objects dropped from 22.97, SD = 6.76, in the first block to 3.04, SD = 4.04, in the 10th block. The participants’ accuracy in the test phase did not differ between the two conditions, RMSD large = 5.84, SD = 1.87, versus RMSD small = 7.42, SD = 3.39; U = 137, p = .14. However, in both conditions accuracy in the test phase was worse than in the training phase, RMSD training = 3.98, SD = 2.32, versus RMSD test = 6.61, SD = 2.80; Z = −4.16, p < .001. Participants were more accurate in the test phase in estimating the old objects known from the training phase than the new objects, RMSD old = 4.49, SD = 6.10, versus RMSD new = 6.69, SD = 2.21; Z = −3.99, p < .001.

Overall, participants were quite consistent in their estimations. Consistency was measured as the Pearson correlation between the first and the second presentation of the test objects. In both conditions consistency was similarly high, r large(20) = .95, SD = .06, versus r small(19) = .94, SD = .06; U = 137, p = .14. Overall, participants were more consistent in estimating old objects than new objects, r old = .98, SD = .05, versus r new = .85, SD = .15; Z = −4.88, p < .001.

Model parameters. To test which model predicted participants’ estimations best, we first fitted both models for each participant individually on the last blocks of the training phase. In the condition with a large number of training objects we used the last three blocks and in the condition with a small number of training objects the last four blocks, to fit the models on a sufficient number of training objects. Based on the estimated parameters we made predictions for the test phase. Goodness of fit was determined as the RMSD of the model predictions from the participants’ estimations. Additionally, we report the coefficient of determination r². The exemplar model’s parameter was estimated from participants’ estimations in the last blocks of the training phase, with a knowledge base consisting of the objects from the training phase with their correct criterion values. The best value for its free attention parameter was found by a grid search followed by a nonlinear least-squares method (as implemented in MATLAB). For the condition with a large number of training objects a mean attention parameter value of s = .01 (SD = .01) was estimated; likewise, a mean attention parameter value of s = .01 (SD = .05) was estimated for the condition with a small number of training objects. As expected, these values are rather small and correspond to the findings of Helversen and Rieskamp (in press). For the mapping model no parameters needed to be estimated: We simply calculated the typical criterion value of all training objects with the same cue sum (using the objects’ correct criterion values).
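The fitting step can be sketched as follows. This is an illustrative Python reconstruction under stated assumptions: a pure grid search stands in for the paper's grid search plus MATLAB nonlinear least-squares refinement, the exemplar prediction is restated so the sketch is self-contained, and all data and function names are invented.

```python
import math

def exemplar_predict(train, probe, s):
    # Similarity-weighted average of training criteria; similarity is
    # the product over cues of 1 (match) or s (mismatch).
    num = den = 0.0
    for x, y in train:
        sim = 1.0
        for a, b in zip(x, probe):
            sim *= 1.0 if a == b else s
        num += sim * y
        den += sim
    return num / den

def rmsd(preds, obs):
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(preds, obs)) / len(obs))

def fit_attention(train, judged):
    """Pick the attention parameter s in (0, 1) that minimizes the RMSD
    between model predictions and a participant's training estimations.
    `judged` is a list of (cue profile, participant's estimation)."""
    grid = [i / 1000 for i in range(1, 1000)]
    return min(grid, key=lambda s: rmsd(
        [exemplar_predict(train, x, s) for x, _ in judged],
        [j for _, j in judged]))

# Invented check: a participant whose estimations exactly match the
# model at s = .2 should be fitted with s = .2.
train = [((1, 1), 10.0), ((0, 0), 2.0)]
judged = [(x, exemplar_predict(train, x, 0.2)) for x, _ in train]
print(fit_attention(train, judged))
```

The fitted parameter is then held fixed, and the model's test-phase predictions are compared against estimations for objects not used in fitting.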


In addition to the reported model comparisons, we also tested two further models to rule out that they would predict participants’ behavior better than the mapping model or the simplified exemplar model we report. We included a standard exemplar model with a free parameter for every cue, as this is the exemplar model originally suggested by Juslin and colleagues (2003; see also Medin & Schaffer, 1978). We also tested a linear regression model, as it has been shown to describe participants’ behavior well in other estimation tasks (Brehmer, 1994; Helversen & Rieskamp, in press; Juslin et al., in press). Both models performed worse than the mapping model and the simplified exemplar model.2 

Quantitative model comparison. In the training set both models described participants’ estimations fairly well (for means see Table 9). We used the nonparametric Wilcoxon test to analyze which model explained participants’ estimations better. For the training phase the exemplar model performed better than the mapping model in both conditions, Z small = −2.20, p = .03; Z large = −3.21, p < .01. However, the better fit of the exemplar model during training can be explained by its greater flexibility and should not be decisive for model selection. The crucial test is how well the models predict participants’ estimations in the test phase for the new objects not encountered during training.

Table 9: Model accuracies in Study 1

| Measure | Large: Mapping | Large: Exemplar | Small: Mapping | Small: Exemplar |
|---|---|---|---|---|
| Training set: RMSD | 5.37 | 4.38 | 3.63 | 3.53 |
| Training set: SD RMSD | 1.12 | 1.66 | 2.77 | 2.81 |
| Training set: r² | 0.94 | 0.95 | 0.97 | 0.98 |
| Training set: SD r² | 0.03 | 0.03 | 0.05 | 0.24 |
| Test set (old): RMSD | 5.28 | 3.27 | 5.94 | 5.85 |
| Test set (old): SD RMSD | 1.70 | 2.54 | 8.24 | 8.21 |
| Test set (old): r² | 0.97 | 0.98 | 0.91 | 0.92 |
| Test set (old): SD r² | 0.02 | 0.02 | 0.22 | 0.22 |
| Test set (new): RMSD | 5.87 | 15.45 | 5.74 | 22.63 |
| Test set (new): SD RMSD | 2.32 | 2.37 | 3.51 | 1.82 |
| Test set (new): r² | 0.75 | 0.39 | 0.77 | 0.41 |
| Test set (new): SD r² | 0.19 | 0.16 | 0.16 | 0.10 |
| Test set (total): RMSD | 5.80 | 13.39 | 6.63 | 20.00 |
| Test set (total): SD RMSD | 1.93 | 2.10 | 4.02 | 2.06 |
| Test set (total): r² | 0.91 | 0.67 | 0.86 | 0.52 |
| Test set (total): SD r² | 0.06 | 0.10 | 0.18 | 0.09 |

Note. N Total = 39, with N = 20 in the high training condition and N = 19 in the low training condition. RMSD = root mean squared deviation


Here, the mapping model clearly outperformed the exemplar model in both conditions. In the condition with a large number of training objects it reached an RMSD of 5.87, SD = 2.32, compared to the exemplar model’s RMSD of 15.45, SD = 2.37; Z = −3.92, p < .01. Also in the condition with a small number of training objects the mapping model was clearly superior, RMSD mapping = 5.74, SD = 3.52, versus RMSD exemplar = 22.63, SD = 1.82; Z = −3.82, p < .01 (see also Table 9). Somewhat unexpectedly, the exemplar model performed better in the condition with a large number of training objects than in the condition with a small number of training objects, U = 5, p < .01. This appears to contradict the prediction that the exemplar model’s performance should improve with fewer training objects. However, these results have in fact no implication for this prediction, because the mapping model outperformed the exemplar model in both conditions, which suggests that participants did not rely on exemplar-based processes in either condition. Consistent with this interpretation, the mapping model performed equally well in both conditions, U = 167, p = .53.

Qualitative model comparison. Though the quantitative model comparison already indicated that the mapping model was better suited to predict participants’ estimations, we additionally relied on a qualitative test. The qualitative test was designed to specifically test the models’ assumptions about the cognitive process underlying estimations. To test the models’ predictions, we determined for each participant and model the mean difference between the estimations for the objects with a cue sum of 2 and 4 and for the pair of objects with a cue sum of 3. As expected from the parameter space analysis illustrated in Figure 3, for both experimental conditions the models made clearly distinct qualitative predictions, as illustrated in Figure 4.

Figure 4: Qualitative test in Study 1. (A) Qualitative predictions of the models and the participants’ estimations in the condition with a large number of training objects (N = 20). (B) Qualitative predictions of the models and the participants’ estimations in the condition with a small number of training objects (N = 19). Sum of cue values 3 gives the average difference in estimations for the criterion values of the pair of test objects with a cue sum of 3 with maximally different cue profiles. Sum of cue values 4 vs. 2 gives the average difference in estimations for the criterion values of test objects with a cue sum of 4 and test objects with a cue sum of 2; error bars denote ±1 SD.


In the condition with a small number of training objects, the exemplar model predicted a small difference of 1.2 points for test objects with cue sums of 2 and 4, whereas the mapping model predicted a difference of 22 points. In contrast, for the pairs of objects with a cue sum of 3, the mapping model predicted no difference, while the exemplar model predicted that estimations would differ by 18.4 points. Although not quite as pronounced, the same interaction was predicted in the condition with a large number of training objects. The predictions of the mapping model were clearly supported by the data. In both conditions participants’ estimations differed strongly between the objects with a cue sum of 4 and those with a cue sum of 2. With a mean difference of 18.1 points (SD = 4.5) in the condition with a small number of training objects and 17.2 points (SD = 5.3) in the condition with a large number of training objects, they were close to the difference predicted by the mapping model. Likewise, the participants’ estimations for the objects with the same cue sum but maximally different cue profiles corresponded to the assumptions of the mapping model. The difference in estimations between pairs of objects with the same cue sum was on average M = 1.3 (SD = 2.1) in the condition with 24 training objects and M = −1.8 (SD = 3.2) in the condition with 8 training objects.
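The two qualitative statistics can be computed from a participant's test-phase estimations as sketched below; the Python function and the data are illustrative assumptions, not the experimental stimuli or the authors' code.

```python
def qualitative_stats(estimates, pairs3):
    """estimates: dict mapping a cue profile (tuple of 0/1) to a
    participant's estimation. pairs3: (a, b) pairs of opposite cue
    profiles, each with a cue sum of 3.
    Returns (mean cue-sum-4 estimation minus mean cue-sum-2 estimation,
             mean within-pair difference for the cue-sum-3 pairs)."""
    by_sum = {}
    for profile, est in estimates.items():
        by_sum.setdefault(sum(profile), []).append(est)
    mean = lambda xs: sum(xs) / len(xs)
    diff_4_vs_2 = mean(by_sum[4]) - mean(by_sum[2])
    diff_pairs3 = mean([estimates[a] - estimates[b] for a, b in pairs3])
    return diff_4_vs_2, diff_pairs3

# Invented estimations showing the mapping-model pattern: a large
# 4-vs-2 difference and a small within-pair difference at cue sum 3.
estimates = {(1, 0, 0, 1, 1, 1): 21.0, (1, 1, 0, 1, 0, 1): 23.0,
             (0, 0, 0, 1, 1, 0): 4.0, (1, 0, 1, 0, 0, 0): 6.0,
             (1, 1, 1, 0, 0, 0): 12.0, (0, 0, 0, 1, 1, 1): 10.0}
pairs3 = [((1, 1, 1, 0, 0, 0), (0, 0, 0, 1, 1, 1))]
print(qualitative_stats(estimates, pairs3))
```

A mapping-model pattern yields a large first statistic and a near-zero second one; an exemplar-model pattern reverses this ordering.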

Discussion of Study 1

Study 1 supported the mapping model in an estimation task with multiple predictive cues and a nonlinear criterion. The mapping model predicted well how participants estimated values for objects they had not seen during training, apparently capturing the process underlying the estimations. In comparison, the exemplar model performed quite poorly: Although it accurately described the estimations during training, it could not predict the estimations in the test phase. These results indicate that the number of training objects is not, on its own, a crucial factor for model performance.

However, one reason we did not find an effect of the number of training objects could be that the establishment of a stable exemplar memory requires more training, even if the number of exemplars is rather small. In our study every training object was repeated 10 times, leading to a quite accurate performance of the participants in the estimation task. Nevertheless, studies investigating exemplar-based approaches often provide more training. For instance, Minda and Smith (2001) presented training objects up to 60 times each and Juslin et al. (in press) presented each object 20 times. Furthermore, Smith and Minda (1998) suggested that exemplar-based processes only occur later in training. Thus a higher amount of training could be necessary to detect a shift in processing.


A second possible reason why the mapping model outperformed the exemplar model in both conditions is that we provided knowledge about the cue directions. This knowledge could trigger rule-based processing in accordance with the mapping model. If cue directions are known, the processes assumed by the mapping model require only a minimum of computation. If the cue directions first need to be learned, however, the knowledge abstraction the mapping model requires becomes considerably more effortful. In contrast, for an exemplar-based estimation it is not necessary to know the direction of a cue; the amount of computation is the same regardless of whether the cue directions are known. Accordingly, exemplar-based processes could be favored if participants do not know the cue directions. We tested these predictions in Study 2.

Study 2

Study 1 failed to elicit a shift to exemplar-based processing. In Study 2 we addressed two possible reasons for the poor performance of the exemplar model in Study 1. For one, establishing a reliable exemplar memory could require extensive training. Thus, we increased the training to 20 blocks to ensure that stable memory traces could be established. Second, the availability of explicit knowledge about the cues could have primed rule-based processing in Study 1. Because the exemplar model’s performance is largely independent of explicit task knowledge, providing no information about the cues should create conditions favorable for the exemplar model. However, a shift to exemplar-based processing might depend not only on the availability of knowledge, but also on the ease with which knowledge can be abstracted. If picking up the cue directions during training is easy, the mapping model could still prevail. In Study 1 all cues correlated substantially with the criterion (see Table 10), which should make it fairly easy to pick up the cues’ directions (Hoffman & Murphy, 2006; Klayman, 1988a). Thus, we additionally manipulated how demanding it was to detect the correct directions of the cues. For this purpose we created a training set in which only half of the cues were predictive whereas the other half were useless for estimating the criterion values. This should increase the difficulty of inferring the cues’ directions for predicting the criterion (Brehmer, 1973).

Table 10: Cue–criterion correlations in Study 2

| | Cue 1 | Cue 2 | Cue 3 | Cue 4 | Cue 5 | Cue 6 |
|---|---|---|---|---|---|---|
| Criterion (six predictive cues) | 0.37 | 0.60 | 0.63 | 0.60 | 0.47 | 0.43 |
| Criterion (three predictive cues) | 0.79 | 0.15 | 0.17 | 0.58 | 0.56 | 0.11 |


In addition, an estimation situation in which only a few cues are predictive is a difficult one for the mapping model, which assumes that all cues are included in the estimation process. If participants learn that only a few cues are predictive and that the others can be ignored, the predictions of the mapping model, which uses all cues, can become completely wrong. Thus, if no knowledge about the cues is available and it is additionally demanding to abstract this knowledge in the training phase, this should provide optimal conditions for observing a shift from a rule-based to an exemplar-based estimation process.

Method

Participants. In Study 2, 80 students from one of the Berlin universities participated (average age = 25 years, SD = 3); 33% of the participants were male. Participants were randomly assigned to one of the four experimental conditions, balanced for gender. The study lasted for about 1 h 30 min and participants were paid on average €14 for their participation.

Design, procedure, and material. In Study 2 we extended the training phase, providing twice as many learning trials as in Study 1. In addition, we manipulated prior knowledge about the directions of the cues and the ease with which the cues’ directions could be learned with two between-subjects factors, yielding a 2 × 2 experimental design. Material similar to that of Study 1 was used. Again, participants were asked to evaluate the quality of job candidates based on the six binary cues described in Study 1. However, in Study 2 only half of the participants were told which cue values were regarded as positive and which as negative. The other half needed to discover the cues’ directions during the training phase. Additionally, we manipulated how easily the cues’ directions could be learned. One half of the participants received the identical set of training objects used in the training phase of the condition with 8 training objects in Study 1. For this set of training objects all cues correlated substantially with the criterion (in all cases r > .35). For the other half of the participants we used a different set of training objects, in which three cues correlated highly with the criterion (r > .5) and three correlated poorly (r < .2). The exact cue–criterion correlations are reported in Table 10. The objects for the training and test phases of the second condition were selected in the same way as in Study 1, with the additional constraint on the cue–criterion correlations and the exclusion of extreme profiles (all positive or all negative cue values), which was necessary to achieve the desired cue–criterion correlations.


As in Study 1, Study 2 consisted of a training phase and a test phase. In the training phase participants could learn the companies’ policies for evaluating job candidates by observing how many points the training candidates had received from their supervisors. The training sets in both conditions consisted of eight training exemplars. In comparison to Study 1, we increased the duration of the training to 20 trials per candidate, structured in 20 blocks. In each block the eight training candidates were presented in a random order. Participants were paid contingent on their performance, based on the same feedback algorithm used in Study 1. As in Study 1, we truncated the feedback algorithm to prevent participants from becoming discouraged by overly negative feedback at the beginning of the study. However, to counteract the higher difficulty of the conditions with no prior information, we decreased the maximum deviation: In Study 2 any deviation larger than 30 was treated like a deviation of 30. The training phase was followed by a test phase consisting of 30 objects, 22 new and 8 old, each of which participants evaluated twice. The test objects were selected in the same way as in Study 1 to allow a qualitative test of the models. The training and test sets are reported in Appendix A (Tables A2 and A3). After the test phase, participants who had not been informed about the cue directions were asked to indicate which cue values went with higher criterion values.

Results

As in Study 1, the mapping model outperformed the exemplar model when the direction of the cues was known to the participants. However, when the cue direction had to be learned during training, which model predicted the participants’ estimations best depended on the number of predictive cues, that is, cues that correlated substantially with the criterion. In the condition in which all cues were predictive, the mapping model was still the best model in predicting the estimations. Only in the condition in which the direction of the cues was unknown to the participants and only three cues were predictive did the exemplar model outperform the mapping model.

Participant performance. Participants learned to evaluate the job candidates correctly in all conditions, with the average RMSD dropping from 27.31, SD = 12.61, in the first block to 3.77, SD = 5.90, in the 20th block. However, training accuracy depended on knowledge of the cue directions. Participants were more accurate in their estimations when they knew the cue directions (RMSD = 2.07, SD = 2.03) than when they did not (RMSD = 7.43, SD = 6.79; U = 364, p < .01). If the cue directions were known, participants did better when all cues were predictive (RMSD = 1.40, SD = 2.03) than when only half were predictive (RMSD = 2.75, SD = 1.82; U = 94, p < .01). If the cue directions were not known, however, participants performed equally well in both cases (RMSD three predictive cues = 6.11, SD = 4.09, vs. RMSD six predictive cues = 8.74, SD = 8.62; U = 193, p = .86). Overall, participants’ estimation accuracy was better in the training phase than in the test phase (RMSD training = 4.75, SD = 5.66, vs. RMSD test = 11.82, SD = 5.79; Z = −7.62, p < .01).


To measure the consistency of participants’ estimations we calculated the Pearson correlation between the two judgments of the same objects during the test phase. A similar pattern to that found for participants’ accuracy emerged: Participants were more consistent when they knew the cue directions (r = .92, SD = .11) than when they learned them during training (r = .81, SD = .17; U = 448, p < .01). When the participants knew the cue directions, the number of predictive cues did not matter (r three predictive cues = .92, SD = .11 vs. r six predictive cues = .92, SD = .10, U = 193, p = .86). However, when the cue directions were learned during training, participants were more consistent when all cues were predictive (r = .86, SD = .15) than when only three cues were predictive (r = .76, SD = .17, U = 122, p = .04). Overall, participants were more consistent in estimating the old objects than estimating the new objects (r old = .93, SD = .14 vs. r new = .79, SD = .22; Z = −5.50, p < .01).

Knowledge of cue directions. To examine whether our manipulation of the ease with which the cue directions could be learned had an effect, we compared how many mistakes participants made in reporting the correct directions of the cues. As expected, participants performed better when all six cues were predictive (i.e., correlated substantially with the criterion) than when only three cues were predictive. When all cues were predictive, 7 participants (35%) indicated an incorrect direction for at least one cue, whereas when only three cues were predictive, 14 participants (70%) made at least one mistake. In particular, participants had difficulty correctly reporting the direction of the low-quality cues (i.e., those that correlated only slightly with the criterion), with a total of 16 mistakes compared to only 8 mistakes for the high-quality cues.

Quantitative model comparison. As in Study 1, we used the last four blocks of the training phase to estimate the exemplar model’s attention parameter individually for each participant. Furthermore, we used the objects’ correct criterion values in the training phase to determine the median estimates for the mapping model’s estimation categories. The categories were formed on the basis of all six cues.3 In this way we determined the models’ predictions for the new objects in the test phase. Model performance was measured as the RMSD between model predictions and participants’ estimations. Additionally, we report r² as a second measure of goodness of fit. Again, we also included in the comparison a version of the exemplar model with a free parameter for every cue and a linear regression model. As neither was the best model in any condition, we do not report their fits here; although these results do not affect our conclusions, they are reported in Appendix B to provide a complete picture.


Again, the exemplar model (RMSD total = 4.51, SD = 5.29) outperformed the mapping model (RMSD total = 4.89, SD = 5.54) in describing participants’ estimations during training, Z = −5.25, p < .01. This was expected, as the exemplar model is more flexible than the mapping model. Interestingly, both models fitted participants’ estimations better when participants knew the cue directions than when they did not, U mapping = 375, U exemplar = 364, both p < .01, following a pattern similar to that of the participants’ accuracy. Table 11 reports the mean RMSDs and SDs for all conditions.

Table 11: Model accuracies in Study 2

| Measure | 6 cues, directions known: Mapping | 6 cues, known: Exemplar | 6 cues, unknown: Mapping | 6 cues, unknown: Exemplar | 3 cues, known: Mapping | 3 cues, known: Exemplar | 3 cues, unknown: Mapping | 3 cues, unknown: Exemplar |
|---|---|---|---|---|---|---|---|---|
| Training set: RMSD | 1.77 | 1.40 | 8.86 | 8.40 | 2.89 | 2.68 | 6.03 | 5.61 |
| Training set: SD RMSD | 1.77 | 2.03 | 8.56 | 8.07 | 1.75 | 1.78 | 3.94 | 3.63 |
| Training set: r² | .99 | .99 | .88 | .87 | .93 | .94 | .74 | .74 |
| Training set: SD r² | .01 | .01 | .17 | .20 | .07 | .07 | .27 | .26 |
| Test set (old): RMSD | 3.44 | 3.25 | 8.39 | 8.92 | 3.12 | 2.99 | 6.35 | 6.02 |
| Test set (old): SD RMSD | 3.73 | 3.90 | 7.94 | 8.05 | 2.46 | 2.44 | 3.22 | 3.03 |
| Test set (old): r² | .98 | .98 | .89 | .87 | .92 | .92 | .72 | .72 |
| Test set (old): SD r² | .06 | .06 | .19 | .20 | .11 | .11 | .23 | .23 |
| Test set (new): RMSD | 6.34 | 23.50 | 16.34 | 22.22 | 10.36 | 14.78 | 12.24 | 8.71 |
| Test set (new): SD RMSD | 4.00 | 2.85 | 7.36 | 4.29 | 4.50 | 3.47 | 2.21 | 1.92 |
| Test set (new): r² | .71 | .37 | .43 | .40 | .66 | .24 | .17 | .39 |
| Test set (new): SD r² | .27 | .17 | .25 | .23 | .17 | .09 | .16 | .16 |
| Test set (total): RMSD | 5.98 | 20.29 | 14.88 | 19.90 | 9.11 | 12.80 | 11.09 | 8.16 |
| Test set (total): SD RMSD | 3.47 | 2.48 | 7.02 | 4.20 | 3.86 | 3.01 | 1.95 | 1.94 |
| Test set (total): r² | .89 | .51 | .64 | .52 | .69 | .36 | .28 | .49 |
| Test set (total): SD r² | .12 | .11 | .23 | .19 | .16 | .09 | .16 | .18 |

Note. N Total = 80, with N = 20 in each condition. RMSD = root mean squared deviation

Again, the crucial model comparison test was how well the two models predicted participants’ estimations for the new, independent objects of the test phase. Figure 5 shows that in the condition replicating Study 1’s condition with a small number of training objects (in which the participants knew the cue directions and all cues were predictive), with the only difference being the larger number of training trials, the mapping model again clearly outperformed the exemplar model, RMSD mapping = 6.33, SD = 4.00, versus RMSD exemplar = 23.50, SD = 2.85, Z = −3.92, p < .001. Thus, more training alone did not lead participants to switch to an exemplar-based estimation process. Similarly, when the cue directions were known but only half of the cues were predictive, the mapping model predicted the participants’ estimations better than the exemplar model, RMSD mapping = 10.36, SD = 4.50, versus RMSD exemplar = 14.78, SD = 3.47, Z = −3.92, p < .01. Furthermore, the mapping model was still the superior model when the participants had to learn the directions of the cues and all cues were predictive, RMSD mapping = 16.34, SD = 7.36, versus RMSD exemplar = 22.22, SD = 4.29, Z = −2.80, p < .01. However, when the participants needed to abstract the directions of the cues during training and this was difficult because only three cues were predictive, the exemplar model outperformed the mapping model, RMSD mapping = 12.24, SD = 2.20, versus RMSD exemplar = 8.71, SD = 1.92, Z = −3.62, p < .01.


Figure 5: Models’ accuracy in predicting the participants’ estimations for the new objects in the test phase of Study 2. (A) Models’ accuracy when the cues’ directions were known (N = 40; 20 for each condition). (B) Models’ accuracy when the cues’ directions were not known (N = 40; 20 in each condition).

An additional analysis of the correlation between the models’ accuracy and the number of mistakes participants made when indicating the cue directions provided further evidence for a shift in processing. In the condition with unknown cue directions and six predictive cues, the mapping model performed worse the more cue directions a participant had indicated incorrectly, r(20) = .64, p < .01, suggesting that the difference in performance between the conditions with known and unknown cue directions was at least partly due to the failure of some participants to learn the cue directions. In contrast, in the condition with only three predictive cues this relation was not significant, r(20) = .21, p = .38, suggesting a shift in processing.

Qualitative model comparison. As in Study 1, we tested which of the models’ qualitatively different predictions were in line with the observed estimations. Again, we compared the predictions of the exemplar model and the mapping model by taking the difference in estimations for the pairs of objects with a cue sum of 3 and for the objects with cue sums of 2 and 4. For the pairs of objects with a cue sum of 3 the mapping model made identical predictions whereas the exemplar model made different predictions. Conversely, for the objects with cue sums of 2 and 4 the mapping model made different predictions and the exemplar model made similar predictions.


Figure 6 shows that the results of the qualitative tests clearly supported the quantitative model comparisons. When the participants knew the cue directions, their estimations were in line with the mapping model’s predictions. Similarly, when the participants did not know the cue directions but all cues were predictive, the estimations showed a pattern similar to that predicted by the mapping model. Only in the condition in which the participants did not know the cue directions and only three cues were predictive was the qualitative pattern of the estimations consistent with the exemplar model’s predictions.

Figure 6: Qualitative tests in Study 2. (A) Condition with known cue directions and three predictive cues. (B) Condition with known cue directions and six predictive cues. (C) Condition with unknown cue directions and three predictive cues. (D) Condition with unknown cue directions and six predictive cues. Sum of cue values 3 gives the average difference in estimations for the pairs of test objects with a cue sum of 3 and maximally different cue profiles. Sum of cue values 4 vs. 2 gives the average difference in estimations between test objects with a cue sum of 4 and test objects with a cue sum of 2. Error bars denote ±1 SD; N = 20 in each panel.

Discussion of Study 2

Study 2 confirmed our prediction that a rule-based process as described by the mapping model depends on accurate knowledge abstraction, which is not necessary for the exemplar model. In line with this theoretical analysis, which model best predicted participants’ estimations in Study 2 depended crucially on accurate knowledge about the cue directions. In the two conditions in which participants were told which cue values counted as positive evidence, the mapping model was clearly better at explaining participants’ behavior. However, when the participants had to learn the cue directions during training and this was difficult, because only three cues substantially correlated with the criterion, the exemplar model was superior. These results are consistent with those reported by Helversen and Rieskamp (in press) and Juslin et al. (in press) and shed light on why these authors found support for the mapping model in one study while the exemplar model was superior in another.


Although the mapping model clearly outperformed the exemplar model in the two conditions in which all cues were predictive, it should be noted that it predicted the estimations less accurately when the participants had to learn the cue directions than when they were informed about them. This result is partly attributable to some participants who failed to learn the cue directions. Furthermore, the condition in which the cue directions had to be learned was apparently also quite difficult, as indicated by the high variance between participants’ estimations and the relatively poor performance during training. Thus, although participants managed to learn most of the cue directions, this came at the expense of accuracy.

General Discussion

Past research has proposed that multiple distinct processing systems control human cognitive behavior and that which system wins out depends on the structure of the task (e.g., Ashby et al., 1998; Juslin et al., in press). For instance, explicit, rule-based processes are assumed to be restricted to tasks in which stimulus dimensions are separable and can be selectively attended to, whereas implicit, similarity-based processes take over if the stimulus dimensions are integral (Ashby et al., 1998). Likewise, Erickson and Kruschke (1998; see also Ashby et al., 1998) argued that rule-based processing in categorization is restricted to easily verbalizable, unidimensional rules.

Following up on this line of research, our goal was to test how two recent models of quantitative estimation, the rule-based mapping model (Helversen & Rieskamp, in press) and a similarity-based exemplar model (Juslin et al., 2003, in press), are affected by different task structures. This test was based on theoretical considerations of the cognitive components essential for the two models: accurate knowledge abstraction for the mapping model and accurate exemplar memory for the exemplar model. This theoretical grounding allowed us to investigate the link between cognitive processing and task characteristics. Accordingly, we predicted that the mapping model would describe participants’ estimations well when knowledge about the task was available or could be easily abstracted during the task. In contrast, exemplar-based processes should be triggered when knowledge abstraction is difficult but the stimulus material allows the accurate storage and retrieval of training exemplars. Our results supported these predictions. The mapping model performed best when the participants were informed about the cues’ directions or could abstract them during training. However, when abstracting knowledge about the cues was difficult but exemplar memory could be used for accurate estimation, the exemplar model best predicted participants’ estimations. In the following we discuss in more detail the relevance of establishing accurate knowledge abstraction and accurate exemplar memory for quantitative estimations.

Exemplar Memory: Number of Training Trials and Number of Objects


Our results showed that simply increasing the amount of training is not sufficient to trigger an exemplar-based estimation process. Even after we doubled the amount of training in Study 2 and used a small number of objects that had to be learned in the training phase, the mapping model still outperformed the exemplar model in predicting participants’ estimations, supporting a rule-based estimation process. Thus, when the participants had access to explicit task knowledge, no shift to exemplar-based processing occurred even when the training intensity was increased. However, these results hold only for the conditions in which participants were informed about the cue directions. This suggests that the opportunity to establish stable memory traces of the exemplars does not necessarily lead to reliance on exemplar-based processes. Rather, the processes underlying estimation seem to be set early on: in situations in which sufficient knowledge about the task structure is provided, people start with a rule-based estimation process and do not necessarily switch to an exemplar-based process even if extensive training is provided. In contrast, when only little knowledge about the task structure is available, exemplar-based processes might become more frequent and are reinforced by an increased amount of training and a smaller number of training instances (Smith & Minda, 1998).

Knowledge Abstraction 

Providing explicit knowledge about the cues had a strong effect on the estimation process. The mapping model, which relies on the abstraction of explicit knowledge about the task, clearly suffered when no knowledge about the cue directions was given to the participants prior to the task. Furthermore, in both conditions with unknown cue directions the participants performed worse during training, indicating that learning can be impeded if knowledge about the task has to be acquired during training.

However, the exemplar model was better in predicting participants’ estimations only when just a subset of the cues substantially correlated with the criterion. This suggests that lacking knowledge about the cue directions is not by itself sufficient to trigger exemplar-based processing; rather, the difficulty of employing a rule-based estimation process accurately played an important role in whether a shift to exemplar-based processing occurred (see also Ashby et al., 1998; Juslin et al., in press; Olsson et al., 2006).


The condition with no prior information about the cue directions and only three predictive cues was especially problematic for the mapping model, because it affected two of its core assumptions. First, the mapping model assumes that explicit knowledge about the cues is abstracted. This was difficult to achieve, as no information about the cues was available and the cue directions were hard to pick up. Second, the mapping model assumes that all cues are equally important, whereas in this task only three cues were substantially correlated with the criterion. Thus, if participants learned to ignore the less valid cues (Castellan, 1973; Klayman, 1988b), the mapping model should not be able to predict participants’ estimations accurately.

This raises the question of why the mapping model performed well when only three cues were predictive but information about the cue directions was available. The good performance of the mapping model in this condition implies that participants regarded all cues as equally important for making the estimations. Thus, following a rule-based process as described by the mapping model in this condition implies that the participants did not accurately learn the task structure; to improve their estimation accuracy it would have been advantageous to use only the predictive cues. Apparently, providing the participants with explicit knowledge about the direction of the cues led to the inference that all cues were relevant for predicting the criterion and thereby triggered a rule-based process. This finding is consistent with the “rule bias” documented by Ashby and colleagues (1998; see also Olsson et al., 2006), which implies an initial preference for rule-based over more implicit processing, such as exemplar-based processes. In sum, our results indicate that in quantitative estimation problems, too, people follow an exemplar-based process mainly when a rule-based process does not provide an accurate solution to the estimation task. Theoretical consideration of the cognitive components essential for the two models, namely, accurate knowledge abstraction for the mapping model and accurate exemplar memory for the exemplar model, allows us to predict under which conditions a shift to exemplar-based processing can be expected.

Conclusion

Previous research has described estimation processes almost exclusively with multiple linear regression models. Recently, new cognitively motivated models, such as the exemplar model by Juslin et al. (in press) and the mapping model by Helversen and Rieskamp (in press; see also Brown & Siegler, 1993), have been proposed to model estimation processes. Interestingly, these models represent two different views of estimation processes: while the exemplar model proposes an implicit, similarity-based process, the mapping model assumes a rule-based process (Ashby et al., 1998; Hahn & Chater, 1998; Juslin et al., 2003, in press). Consistent with previous research on the interplay of rule-based and similarity-based systems in categorization problems, we found evidence for an initial preference for rule-based processes in quantitative estimation tasks. Furthermore, the experimental studies reported in the present article illustrate the link between the cognitive processes assumed by the models and the structure of the environments. We showed that the models’ assumptions about the estimation process were directly affected by different structures of the estimation task, which in turn determined which estimation process prevailed. This highlights not only the impact of task characteristics on information processing but also the importance of explicit assumptions about the cognitive process for computational modeling approaches.

Appendices


Appendix A

Training and test sets for Studies 1 and 2

The following tables describe the sets of items that were used in Study 1 and Study 2. Table A1 describes the set of items for the training phase of Study 1. Table A2 describes the set of items for the training and test phases of the condition with a low number of training objects in Study 1 and for the condition with six predictive cues in Study 2. Table A3 describes the set of items for the training and test phases of the conditions with three predictive cues in Study 2.


Table A1: Sets of objects for the training phases of Study 1

Training condition   Cue 1   Cue 2   Cue 3   Cue 4   Cue 5   Cue 6   Criterion
A & B                    0       0       0       0       0       0           1
A                        0       1       0       0       0       0           2
A & B                    1       0       0       0       0       0           2
A & B                    0       0       0       0       1       1           2
A                        0       0       0       1       0       1           3
A                        0       1       0       0       0       1           3
A                        0       1       0       0       1       0           4
A                        1       0       0       0       1       0           4
A                        0       0       1       1       1       0           7
A                        0       1       0       0       1       1           7
A & B                    0       1       0       1       0       1           7
A                        0       1       1       0       1       0           9
A                        1       0       0       1       0       1           8
A & B                    1       0       1       0       1       0          10
A                        1       0       1       1       0       0          10
A                        1       1       0       0       0       1          10
A                        0       1       0       1       1       1          14
A & B                    1       1       0       1       1       0          24
A                        1       1       1       0       0       1          24
A                        1       1       1       0       1       0          26
A                        1       1       1       1       0       0          27
A & B                    0       1       1       1       1       1          33
A                        1       1       1       1       1       0          55
A & B                    1       1       1       1       1       1         100

Note. A & B = objects that were used for the training condition (A) with a large number of training objects and for the training condition (B) with a small number of training objects. A = objects that were additionally used in the training condition (A) with a large number of training objects.

Table A2: Sets of objects for the training and test phases of Study 1 for the condition with a small number of training objects and of Study 2 for the condition with six predictive cues

Objects         Cue 1   Cue 2   Cue 3   Cue 4   Cue 5   Cue 6   Criterion   Mapping   Exemplar
Test/training       0       0       0       0       0       0           1         1          1
Test/training       1       0       0       0       0       0           2         2          2
Test/training       0       0       0       0       1       1           2         2          2
Test/training       0       1       0       1       0       1           7         8          7
Test/training       1       0       1       0       1       0          10         8         10
Test/training       1       1       0       1       1       0          24        24         24
Test/training       0       1       1       1       1       1          33        33         33
Test/training       1       1       1       1       1       1         100       100        100
Test 2              0       0       0       1       0       1           3         2          7
Test 2              0       0       0       1       1       0           3         2          9
Test 2              0       0       1       0       1       0           3         2         10
Test 2              0       1       0       0       0       1           3         2          7
Test 2              0       1       0       0       1       0           4         2          9
Test 2              0       1       0       1       0       0           4         2          7
Test 3a             0       0       1       0       1       1           6         8          2
Test 3a             1       1       0       1       0       0          12         8         24
Test 3b             1       0       1       0       0       1           9         8          6
Test 3b             0       1       0       1       1       0           8         8         24
Test 3c             1       0       0       0       1       1           7         8          2
Test 3c             0       1       1       1       0       0           9         8         20
Test 3d             1       0       0       1       0       1           8         8          5
Test 3d             0       1       1       0       1       0           9         8         21
Test 3e             1       1       0       0       0       1          10         8          5
Test 3e             0       0       1       1       1       0           7         8         21
Test 4              1       0       1       0       1       1          17        24         10
Test 4              1       0       1       1       1       0          20        24         10
Test 4              1       1       0       1       0       1          21        24          7
Test 4              1       1       1       0       1       0          26        24         10
Test/extra          1       0       1       1       1       1          37        33        100
Test/extra          1       1       1       1       0       1          50        33        100

Note. Test/training indicates the eight objects that constituted the training set in the condition with a small number of training objects in Study 1 and the two conditions with six predictive cues in Study 2. These eight objects also appeared in the respective test sets. Test 2 denotes objects with a cue sum of 2, Test 3 objects with a cue sum of 3, where pairs with the same letter indicate opposite cue profiles, and Test 4 objects with a cue sum of 4. Test/extra indicates objects that were additionally included in the test set to increase the differences in model predictions.

Table A3: Sets of objects for the training and test phases of Study 2 for the condition with three predictive cues

Objects         Cue 1   Cue 2   Cue 3   Cue 4   Cue 5   Cue 6   Criterion   Exemplar   Mapping
Test/training       0       0       1       0       0       0           2          2         2
Test/training       0       0       0       1       0       1           3          3         3
Test/training       0       1       1       0       0       0           4          4         3
Test/training       0       1       1       0       1       0           9          9         8
Test/training       1       0       0       1       0       1           8          8         8
Test/training       1       1       0       1       1       0          24         24        25
Test/training       1       1       1       1       0       0          27         27        25
Test/training       1       0       1       1       1       1          37         37        37
Test 2              0       1       0       0       1       0           4          9         3
Test 2              1       0       0       0       0       1           4          8         3
Test/extra          1       0       0       0       1       0           4         24         3
Test 2              1       0       0       1       0       0           4          8         3
Test/extra          1       1       0       0       0       0           6         18         3
Test 3a             0       0       0       1       1       1           5          3         8
Test 3a             1       1       1       0       0       0          13         16         8
Test 3b             0       0       1       1       0       1           6          3         8
Test 3b             1       1       0       0       1       0          11         24         8
Test 3c             0       1       0       1       0       1           7          3         8
Test 3c             1       0       1       0       1       0          10         16         8
Test 3d             0       1       1       0       0       1           8          4         8
Test 3d             1       0       0       1       1       0           9         24         8
Test 3e             0       1       0       0       1       1           7          9         8
Test 3e             1       0       1       1       0       0          10         27         8
Test/extra          0       1       0       1       1       1          14         13        25
Test 4              0       1       1       0       1       1          16          9        25
Test 4              0       1       1       1       1       0          18          9        25
Test 4              1       1       0       1       0       1          21          8        25
Test 4              1       1       1       0       1       0          26          9        25
Test/extra          1       1       1       1       1       0          55         25        37
Test/extra          1       1       1       1       1       1         100         37        37

Note. Test/training indicates the eight objects that constituted the training set in the two conditions with three predictive cues in Study 2. These eight objects also appeared in the respective test sets. Test 2 denotes objects with a cue sum of 2, Test 3 objects with a cue sum of 3, where pairs with same letters indicate opposite cue profiles, and Test 4 objects with a cue sum of 4. Test/extra indicates objects that were additionally included in the test set to increase the differences in model predictions.


Appendix B

Accuracies of the Regression Model and the Standard Exemplar Model  

In the following we report the performance of the regression model and the standard exemplar model in predicting participants’ estimations. For Study 2 we additionally tested a regression model and a standard exemplar model with a free attention parameter for every cue. The predictions of the regression model were obtained by running a multiple linear regression with the cues of the training phase as predictors and participants’ estimations in the last four blocks as the dependent variable. On the basis of the obtained cue weights, predictions for the test phase were made. The standard exemplar model was fitted in the same way as the simplified exemplar model, but s was allowed to vary freely for each cue. The regression model and the standard exemplar model performed worse than the mapping model and the simplified exemplar model when all cues were predictive. Only when half of the cues were predictive and cue directions were known was the mapping model better than the regression model and the standard exemplar model (both Z = −3.92, p < .01); here the regression model outperformed the simplified exemplar model (Z = −2.95, p < .01). In the condition of Study 2 in which the cue directions were unknown and only three cues were predictive, the regression model (Z = −2.95, p < .01) and the standard exemplar model (Z = −2.24, p = .02) performed better than the mapping model and as well as the exemplar model, Z regression = −1.53, p = .13; Z exemplar = −1.57, p = .12. The standard exemplar model predicted the estimations of 5 participants (25%) best, the regression model of 3 (15%), and the simplified exemplar model of 12 (60%). An overview of the accuracies of the regression model and the standard exemplar model in Study 2 is given in Table B1.
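The regression model's fitting step described above can be sketched as follows. The cue profiles are the training objects of Table A2; the participant's estimations here are hypothetical placeholder values, since the individual data are not reproduced in the text:

```python
import numpy as np

# Cue profiles of the eight training objects (Table A2, Test/training rows).
X = np.array([
    [0, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0], [0, 0, 0, 0, 1, 1],
    [0, 1, 0, 1, 0, 1], [1, 0, 1, 0, 1, 0], [1, 1, 0, 1, 1, 0],
    [0, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1],
], dtype=float)
# One participant's mean estimations in the last training blocks
# (hypothetical values for illustration only).
y = np.array([1, 3, 2, 8, 9, 22, 35, 95], dtype=float)

# Fit an intercept plus one weight per cue by ordinary least squares.
design = np.column_stack([np.ones(len(X)), X])
weights, *_ = np.linalg.lstsq(design, y, rcond=None)

# Apply the fitted weights unchanged to a new test object's cue profile.
probe = np.array([1, 0, 1, 1, 1, 0], dtype=float)
prediction = weights[0] + probe @ weights[1:]
print(round(float(prediction), 2))
```

Note that with eight training objects and seven free parameters this toy fit is nearly saturated; in the studies the weights were estimated on many repeated training trials and then applied unchanged to the test objects.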


Table B1: Accuracies of the regression model and the standard exemplar model in predicting participants’ estimations

                                Six predictive cues                       Three predictive cues
Cue directions             Known               Unknown               Known               Unknown
                     Regression Exemplar Regression Exemplar  Regression Exemplar Regression Exemplar
Training set
  RMSD                    12.89     1.20      14.70     7.31        2.92     2.45       4.24     4.53
  SD RMSD                   .30     1.72       2.54     6.96        1.04     1.69       1.69     2.58
  r                         .88     1.00        .81      .91         .94      .95        .85      .85
  SD                       .003      .01        .12      .14         .05      .06        .13      .14
Test set: Old
  RMSD                    13.33     3.19      15.61     8.35        3.56     3.07       5.63     5.35
  SD RMSD                   .92     3.82       3.04     6.30        1.90     2.35       2.72     2.49
  r                         .86      .98        .79      .90         .91      .92        .76      .77
  SD                        .05      .06        .11      .15         .10      .11        .23      .20
Test set: New
  RMSD                    27.03    29.94      27.94    28.18       13.90    16.57      10.01    10.05
  SD RMSD                  2.57     8.20       3.52     6.68        3.57     3.30       2.54     3.10
  r                         .39      .35        .24      .33         .36      .14        .39      .33
  SD                        .15      .18        .14      .20         .09      .11        .22      .27
Test set: Total
  RMSD                    24.16    25.75      25.28    24.72       12.08    14.31       9.14     9.17
  SD RMSD                  2.15     7.10       3.10     5.72        3.06     2.88       2.22     2.49
  r                         .37      .43        .33      .43         .45      .25        .47      .42
  SD                        .09      .15        .11      .16         .10      .12        .21      .24

Note. N Total = 80, with N = 20 in each condition. RMSD = root mean squared deviation

Footnotes

1. Although this pattern holds true for a wide range of parameter values it should be noted that the strength of the qualitative differences in model predictions depends on the composition of the training set as well as on the parameter values for the cues. Therefore we selected training set–test set combinations where strong qualitative results should be expected.


2. For the model comparisons we used the nonparametric Wilcoxon test. The standard exemplar model performed significantly worse than the mapping model in the test phase and worse than the simplified exemplar model in both conditions, in all cases Z < −4.3, p < .01. The regression model was significantly worse than the mapping model (Z = −5.44, p < .01, for both conditions) but performed as well as the simplified exemplar model (in all cases Z = −1.02, p = .32).

3. We also tested a version of the mapping model that included only the three cues that were substantially correlated with the criterion. However, overall this model did not perform better than a mapping model that considered all cues.

Authors’ Note


Bettina von Helversen and Jörg Rieskamp, Max Planck Institute for Human Development, Berlin, Germany. We would like to thank Anita Todd for editing a draft of this manuscript. This work has been supported by a doctoral fellowship of the International Max Planck Research School LIFE to the first author. Correspondence concerning this article should be addressed to Bettina von Helversen.

Bettina von Helversen
Max Planck Institute for Human Development
Lentzeallee 94, 14195 Berlin, Germany
Phone: +49-30-82406699
Fax: +49-30-82406394
Email: vhelvers@mpib-berlin.mpg.de


© The compilation and presentation of the content of this publication, as well as its electronic processing, are protected by copyright. Any use not expressly permitted by copyright law requires prior consent. This applies in particular to reproduction, adaptation, storage, and processing in electronic systems.
DiML DTD Version 4.0. Certified document server of Humboldt-Universität zu Berlin. HTML version created: 07.02.2008.