The goal of this research was to investigate the processes underlying estimation from multiple cues and to examine whether a heuristic model can be formalized that describes the estimation process. I proposed the mapping model as a possible cognitive theory for estimation and tested it in multiple experiments against several models of estimation put forward in the literature. Overall, the results clearly supported the mapping model as a realistic model of human cognition. It described participants’ estimations in diverse laboratory tasks, ranging from estimating the toxicity of bugs and the probability of cure from a disease to the evaluation scores of job candidates. Furthermore, it was well suited to capture prosecutors’ sentence recommendations for low-level crimes. Additionally, the research provided evidence that estimation processes are adapted to the environment: how well the mapping model described participants’ estimations depended on the characteristics of the task. In the following, I will discuss under which conditions the mapping model performed well, when other models might be better suited to describe people’s estimations, and the resulting implications for the estimation process. Further, I will discuss the problems of model selection and the methods I used to address them. Finally, I will consider possible extensions and limitations of the approach and its generalizability to other areas of research.
Overall, the mapping model proved to be a suitable model for human estimation. In several studies it described participants’ estimations as well as or better than a linear regression model, an exemplar model, or QuickEst, a noncompensatory heuristic. Even though the mapping model is a deterministic model that ignores interindividual differences between participants, it was surprisingly good at predicting individual estimations.
Consistent with the idea that cognitive processes are adapted to the structure of the environment, the model’s success was clearly dependent on the structure of the estimation task. In my dissertation I specified conditions under which the mapping model predicts participants’ estimations well. The first chapter provided evidence that the mapping model is influenced by the statistical structure of the estimation environment. It performed well in nonlinear environments, that is, if the criterion was a nonlinear function of the cues or followed a J-shaped distribution. The second chapter highlighted the importance of another aspect of the task: the availability of explicit task knowledge, and the ease with which it can be acquired. If participants were informed about the cue directions, the mapping model was clearly the best model to predict participants’ estimations. Furthermore, if participants could easily abstract explicit knowledge about the cues during training, the mapping model likewise outperformed the other models. Last but not least, the mapping model did not only describe participants’ estimations in highly constrained experimental tasks; the laboratory results of the first two chapters were corroborated by the third chapter, which demonstrated the model’s success in a real-world setting.
In the first chapter, I tested the mapping model against a multiple linear regression model, as the predominant model for quantitative judgment in the literature. Regression models have been widely and successfully used to capture judgment policies in many areas of social research (e.g., Hammond, 1996; Brehmer & Brehmer, 1988). However, researchers have questioned whether people possess the cognitive capacities to perform the rather complex calculations required by a regression analysis. In this vein, it has been contended that regression models do not capture the process underlying a decision, even if they accurately predict its outcome (Hoffman, 1960; Gigerenzer & Todd, 1999; see also Doherty & Brehmer, 1997). However, many researchers have argued that this criticism does not necessarily affect the idea that judgments follow a linear additive estimation process. According to this argument, participants assign each cue a subjective weight and then add the weighted cues (Einhorn et al., 1979; Brehmer, 1994; Juslin et al., 2003; Juslin et al., in press). Multiple linear regression analysis is employed to estimate the subjective weights that participants assigned to the cues, but without the claim that it reflects the process by which participants determined the weights.1
Consistent with this literature supporting the success of linear additive models in describing human judgment (Juslin et al., in press; Juslin et al., 2003; Brehmer, 1994; Einhorn et al., 1979; Kalish, Lewandowsky, & Kruschke, 2004; Anderson, 1981; Hammond, 1996), the results in Chapter 1 (Study 3) showed that a linear regression model described participants’ estimations well if the criterion was a linear additive function of the cues.
Interestingly, in this task the correct cue weights could be abstracted easily, which could have enhanced reliance on a linear additive strategy. Due to the deterministic nature of the task, the correct weight of a cue could be estimated from any two training objects differing only on that cue (Juslin et al., in press). This was not possible in the tasks with skewed criteria or a nonlinear relation between the cues and the criterion, which may have signaled to participants that a linear additive strategy could not be successfully applied in these tasks.
In Chapters 1 and 2, I tested the mapping model against an exemplar-based model (Juslin et al., 2003). Exemplar models have been quite successful in describing behavior in categorization (Nosofsky, 1986; Kruschke, 1992), and were recently successfully put forward as a model for quantitative estimation (Juslin et al., in press). One advantage of exemplar models is that they can offer an accurate solution to tasks that cannot be successfully solved by rule-based processes (Juslin et al., in press; Olsson, Enkvist, & Juslin, 2006). In this vein, Juslin and colleagues argued for a shift to exemplar-based processing if the criterion is a nonlinear function of the cues, and thus a linear additive model, such as multiple linear regression, could not successfully predict the criterion. However, the results in my studies suggested that these claims need to be further specified. The exemplar model was the best model to describe participants’ estimations if participants had no prior knowledge about the cues and could not easily acquire knowledge during the training phase. If these requirements were met, I found, consistent with Juslin and colleagues (in press), support for a spontaneous shift to exemplar-based processing. This is notable, as Olsson et al. (2006) had recently found a shift to an exemplar-based strategy only if participants were explicitly instructed to use this strategy. Olsson and colleagues argued that a shift only occurs spontaneously if reliance on exemplar knowledge promises accurate performance at the beginning of the training phase. In Study 2 (Chapter 2), relying on exemplars led to accurate performance during training, which could have increased the accessibility of exemplar-based strategies.
Overall, the results suggested that exemplar models in fact offer a valid description of human estimation processes, but that the situations in which they are applied are rather specific. More precisely, people seem to only fall back on exemplar-based strategies if other rule-based models are not suited to solve the task.
A second point worth noting concerns the parameterization of the exemplar model. Throughout my dissertation, I considered two versions of the exemplar model: a complex version with a free parameter for every cue and a simplified version assuming that all cues are weighted equally. In the majority of the tasks, the simplified exemplar model was better at predicting participants’ estimations than the more complex standard version, indicating that the original version of the exemplar model is prone to overfitting, and that the attention parameters need to be interpreted with caution. However, in the reanalysis of Experiment 1 by Juslin et al. (in press) and the second study in Chapter 2, there was a stable minority of participants best described by the complex exemplar model. This suggests that the additional parameters can sometimes prove necessary and informative, especially if not all cues are predictive and thus not used for the estimation. Likewise, Rehder and Hoffman (2005a, 2005b) showed that the attention parameters of exemplar models (see also Kruschke, 1992; Nosofsky, 1986) match the actual attention that participants allocate to the cues. However, this leaves open the questions of how many parameters should be assumed a priori and under which conditions the parameters can be reliably interpreted as reflecting the attention given to a cue (Medin & Schaffer, 1978; Juslin et al., 2003). Results by Rehder and Hoffman (2005b) indicated that this could also depend on the duration of training, as they found a learning pattern where initially all cues were considered, but attention gradually concentrated on the relevant features.
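To make the contrast between the two parameterizations concrete, the following sketch implements a context-model-style exemplar estimate. The multiplicative mismatch-penalty form of the similarity follows the general logic of Medin and Schaffer (1978) and Juslin et al. (2003), but the function names and parameter values here are purely illustrative, not the dissertation’s actual implementation. Passing one shared attention value for every cue yields the simplified version; a vector of distinct per-cue values yields the complex version.

```python
import numpy as np

def exemplar_estimate(probe, exemplars, criteria, attention):
    """Similarity-weighted mean of the training criteria (context-model style).

    attention[i] in (0, 1] is the penalty applied when cue i mismatches;
    smaller values make a mismatch on that cue count more. Using the same
    value for all cues gives the simplified, equal-attention version."""
    criteria = np.asarray(criteria, dtype=float)
    sims = []
    for ex in exemplars:
        s = 1.0
        for p, e, a in zip(probe, ex, attention):
            if p != e:          # each mismatching cue shrinks similarity
                s *= a
        sims.append(s)
    sims = np.asarray(sims)
    return float(np.sum(sims * criteria) / np.sum(sims))
```

Because the estimate is a weighted average of training criteria, it can never leave the range of criterion values encountered during training, one of the properties discussed below.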
In the first chapter, I also considered the heuristic QuickEst as a competitor for the mapping model and a possible alternative model for quantitative estimation (Hertwig et al., 1999). In particular, in the J-shaped condition, a good performance of QuickEst would have been expected. However, the overall support for QuickEst was rather weak. Though in the very first study about 20% of the participants were best described by QuickEst, it did not perform well in the second study. Likewise, an empirical study by Hausmann et al. (2007) did not find any support for QuickEst as a model of human behavior. However, in the studies presented here, one reason for the lack of support for QuickEst could lie in the design of the task. Participants were presented with all relevant information simultaneously and free of charge. Research on inferential decision making in pair comparisons, however, indicated that noncompensatory strategies, like QuickEst, could be favored under time pressure or if information search is costly (Rieskamp & Hoffrage, in press; Payne, Bettman, & Johnson, 1988; Bröder & Schiffer, 2003). Like “tally,” the mapping model is an information-intensive strategy including all cues that are considered relevant. QuickEst, on the other hand, ignores a large part of the information, and thus might be more readily employed if information has to be retrieved from memory or search involves other costs (Bröder & Schiffer, 2003).
The mapping model makes explicit assumptions about the processes underlying estimation. Thus, its success in describing participants’ estimations has specific implications for how humans estimate. According to the mapping model, people group objects together into a few categories based on the amount of evidence provided by the cues, and then select a typical estimate for each category, reflecting the central tendency of the objects falling into this category. More specifically, it implies that all cues are weighted equally, and that similar estimates follow from grouping objects together at an abstract level and not by similar configurations of cues. In these assumptions, the mapping model differs from the other models of estimation.
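These assumptions can be sketched in a few lines of code. The minimal version below groups training objects by their cue sum and stores one typical value per category; taking the median as the measure of central tendency is one plausible choice, and the function names are mine, not the dissertation’s.

```python
import statistics

def fit_mapping_model(train_cues, train_criteria):
    """Group training objects by their cue sum (the summed evidence) and
    store one typical estimate per category (here: the median criterion)."""
    categories = {}
    for cues, crit in zip(train_cues, train_criteria):
        categories.setdefault(sum(cues), []).append(crit)
    return {s: statistics.median(vals) for s, vals in categories.items()}

def mapping_estimate(cues, typical):
    """Map an object's cue sum onto its category's typical value.
    Unseen cue sums fall back on the nearest learnt category, so the
    model never extrapolates beyond the learnt values."""
    s = sum(cues)
    if s in typical:
        return typical[s]
    return typical[min(typical, key=lambda k: abs(k - s))]
```

Note that which particular cues are positive never enters the estimate, only how many; this is exactly the point of contrast with the exemplar model discussed below.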
With the assumption that all relevant cues are weighted equally, the mapping model differs from the regression model and the standard version of the exemplar model, which both propose that cues are weighted according to their importance. Overall, the results of my dissertation provided evidence for the equal weight approach assumed by the mapping model in nonlinear environments, but not in linear environments. In nonlinear estimation tasks, the mapping model accurately predicted participants’ estimations in the laboratory. Likewise, the mapping model outperformed the regression model in a naturalistic estimation task. Furthermore, the exemplar model with a single attention parameter for all cues performed better than an exemplar model that potentially ignored cues, resonating with research highlighting the good performance of unit weight models in prediction tasks (Dawes, 1979). Although the unit weight approach of the mapping model was successful even if the cues differed clearly in their predictiveness (Chapter 1), the results in Chapter 2 indicated that this might be limited to situations where knowledge about the cues is available. When the cues differed substantially in their predictiveness and participants had no prior knowledge, at least some participants were better captured by an exemplar model allowing for differential weighting of the cues. This is also consistent with results by Rehder and Hoffman (2005b), who reported a shift of attention, measured by eye movements, to relevant cues during training.
A second, related assumption of the mapping model is that an object’s category membership is computed at the level of the summed evidence, ignoring which specific cue contributed to the cue sum. Here, the mapping model differs from the exemplar model. While the exemplar model determines the similarity of two objects by the number of matches on the cues, and thus puts emphasis on specific configurations of cues, the mapping model assumes that objects are grouped together based on the total evidence provided by the cues, regardless of how many cues actually match. The clear qualitative pattern in the studies in Chapter 2 provided evidence that, at least if knowledge about the cue directions is available, participants’ behavior corresponded to the assumptions of the mapping model: they estimated similar values for objects with the same number of positive cues, regardless of which dimensions those cues were on. This also relates to Brunswik’s (1952) idea of vicarious functioning, that is, cues can substitute for each other to reach the same judgment.
A further clear result from the experimental studies was that no single model could explain participants’ behavior in all situations. Instead, the results indicated that which model was most successful in describing participants’ estimations was a function of the environment. Similar to pair comparison tasks, participants adaptively shifted their estimation strategies to match the structure of the task (Todd & Gigerenzer, 2007; Rieskamp & Otto, 2006; Rieskamp, Busemeyer, & Laine, 2003; Payne et al., 1993; Juslin et al., in press). Consistent with the simulation study in Chapter 1, the mapping model performed better than or as well as the other models if the criterion followed a skewed or linear distribution. Likewise, the mapping model or the exemplar model was best if the criterion was a nonlinear function of the cues and thus a linear regression model was not suited to solve the task. However, if the criterion was a linear function of the cues, and thus linear regression was the optimal model to solve the task, participants’ estimations were consistently best described by a regression model. Moreover, the last paper indicated that the adaptive match between models and task structures was not just a result of the artificial nature of the task environment. In a real-world estimation task with a skewed criterion, predicting sentence magnitudes for low-level crimes, the mapping model also outperformed the regression model.
Likewise, the shift from the mapping model to exemplar-based processing, as reported in Chapter 2, can be regarded as adaptive. As knowledge about the correct cue directions is indispensable for the accurate performance of the mapping model, the ease with which the mapping model can be applied is closely tied to prior knowledge about the cues. If the cue directions are clear, the mapping model only demands minimal computation and can be correctly executed with little training. Thus, relying on the mapping model leads to a computational advantage if it can be applied to master the estimation task from the beginning. However, if the mapping model was not easily applicable because detecting the correct cue directions was difficult and a linear additive model could not successfully be applied, shifting to an exemplar-based estimation process can be considered an adaptive response to the task.
Though my results indicate an adaptive shift in processing dependent on the task structure, it remains unclear if the shift is due to an automatic error-driven learning process (e.g., Rieskamp & Otto, 2006; Erickson & Kruschke, 1998; Ashby, Alfonso-Reese, Turken, & Waldron, 1998) or to deliberate and voluntary processing (Haider, Frensch, & Joram, 2005). Although I did not model the learning process, it seems reasonable to assume that participants did not commit to one type of processing at the onset of the task, but learnt during the task which type of processing was most successful (Rehder & Hoffman, 2005b). However, it seems probable that deliberative and controlled processes are also involved. Recently, Haider and colleagues (2005) suggested that explicit knowledge stems from voluntary inferential processes. This suggests that learning the cue directions, and thus the application of the mapping model when no prior knowledge is available, could depend on a voluntary inferential effort by the participants to acquire this information.
The goal of my dissertation was to determine which model was best suited to describe the participants’ estimations. However, this raises methodological concerns, especially if models of differing complexity are compared, as simply selecting the best-fitting model often leads to the wrong choice (Roberts & Pashler, 2000; Pitt, Myung, & Zhang, 2002). Although flexible models, that is, models with more free parameters, are better able to fit a specific dataset, they run the risk of overfitting the data. That is, they not only capture the systematic variance due to the underlying process but also fit random variance in the data. Thus, the best-fitting models are not necessarily the best at predicting new data, making it indispensable to take model complexity into account in model selection. In my research, I addressed the problem of model selection with different methodologies.
In Chapters 1 and 2, I implemented a generalization test (Busemeyer & Wang, 2000). In a first step, I fitted the models’ free parameters to a training set, thereby equating the models’ flexibilities. Next, I predicted participants’ estimates for a test set by computing the test objects’ criterion values based on the obtained parameter values. The test set consisted of “old” exemplars, that is, test objects with the same cue values as the training objects, and of “new” objects, that is, test objects that participants did not encounter during training, forcing the models to make out-of-sample predictions. Generalization tests go beyond pure cross-validation: they not only ensure that only a model capturing the process underlying the estimations can make accurate predictions, they also warrant that good model performance is not restricted to the objects encountered during training but generalizes beyond the tested sample. However, they rest on the assumption that the same processes govern the generation of data in both samples.
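The logic of the generalization test can be sketched as a small harness: parameters are estimated on the training set only and then frozen before scoring. The `MeanBaseline` model and the RMSE score below are illustrative placeholders, not the candidate models or error measure used in the studies.

```python
import numpy as np

class MeanBaseline:
    """Trivial reference model: always predicts the training mean."""
    def fit(self, x, y):
        self._mean = float(np.mean(y))
    def predict(self, x):
        return np.full(len(x), self._mean)

def generalization_test(models, train_x, train_y, test_x, test_y):
    """Fit each candidate's free parameters on the training set only,
    then score its predictions on a test set mixing old and new cue
    patterns; lower RMSE means better generalization."""
    test_y = np.asarray(test_y, dtype=float)
    scores = {}
    for name, model in models.items():
        model.fit(train_x, train_y)     # parameters are fixed here ...
        pred = model.predict(test_x)    # ... and never refit on the test data
        scores[name] = float(np.sqrt(np.mean((test_y - pred) ** 2)))
    return scores
```

Because every model must commit to its parameters before seeing the test set, extra flexibility no longer yields an automatic advantage.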
Although quantitative measures of model performance are informative and allow a first test of whether a model can capture human behavior, they do not offer any insight into whether the model assumptions actually correspond to the cognitive process generating the data. Furthermore, models often make very similar predictions, making it difficult to differentiate between them on a purely quantitative level. In the first chapter, I addressed this problem by selecting test objects on which the models differed in their predictions, to increase the chance of differentiating between the models. In the second chapter, I went one step further and tested qualitative predictions to underpin the quantitative model test. Qualitative tests are highly desirable because they can be constructed to be largely independent of model parameters, and they allow a better test of the models’ assumptions (Pitt, Kim, Navarro, & Myung, 2006). My goal was to provide some evidence that the participants’ behavior actually corresponds to the model assumptions about the estimation process. For this I focused on differences in the core assumptions the models make about the estimation process. For one, the mapping model assumes that objects with the same cue sum are grouped together and receive the same criterion value as an estimate. Objects with differing cue sums, however, are assigned to different categories and thus receive differing estimates. In contrast, the exemplar model relies on the similarity relations of the test objects to the training objects, which can be similar for two objects with differing cue sums. However, if two objects are maximally different, that is, they do not match on a single cue, it is probable that they will also differ in their similarity relations to the training objects and thus in their estimates for the criterion. Based on these model assumptions, I constructed test conditions in which the models differed in their ordinal predictions, largely independently of the model parameters.
Thus, when the participants’ estimations matched the model predictions, this gave a strong indication that the model in fact captured the cognitive process underlying the estimations.
In the third paper, I relied on a different methodology for model selection. Similar to Chapters 1 and 2, one methodological problem was that I was comparing models of differing complexity, and thus with different abilities to fit a data set. However, this study posed a further methodological problem, as the relevant cues for the estimation task were not clear; one goal of the analysis was to identify which predictors reliably influenced sentencing. In regression analysis, methods based on significance testing, such as stepwise regression procedures, are often used to find the best model to describe the data and to quantify the impact that the cues have on the estimation. However, these methods are often unreliable, potentially leading to different results depending on whether cues are included or excluded stepwise. Furthermore, focusing on a single model ignores the uncertainty involved in model selection and limits the conclusions that can be drawn. To address these problems, I chose the Bayesian model averaging (BMA) method by Raftery (1995). Based on the BIC approximation to the Bayes factor (Schwarz, 1978; Raftery, 1995), the BMA method can be used to more reliably identify which models most probably underlie the data, and takes model complexity into account by penalizing a model for its number of free parameters. Moreover, it takes model uncertainty fully into account to determine which predictors have a significant impact on the estimation. Thus, it seemed to be a more reliable methodology to analyze the data.
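A minimal sketch of the BIC-based BMA idea follows, assuming ordinary-least-squares submodels over all cue subsets; the actual approximations and priors in Raftery (1995) are richer than this illustration, and the function names are mine.

```python
import itertools
import numpy as np

def bic_linear(y, X):
    """BIC of an ordinary-least-squares fit with intercept (lower = better)."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X]) if X.size else np.ones((n, 1))
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    sigma2 = np.mean((y - Xd @ beta) ** 2)
    return n * np.log(sigma2) + Xd.shape[1] * np.log(n)  # complexity penalty

def bma_inclusion(y, X, cue_names):
    """Posterior inclusion probability of each cue, averaging over all
    cue subsets with BIC-approximated posterior model weights."""
    p = X.shape[1]
    subsets, bics = [], []
    for r in range(p + 1):
        for sub in itertools.combinations(range(p), r):
            subsets.append(sub)
            bics.append(bic_linear(y, X[:, list(sub)]))
    bics = np.asarray(bics)
    w = np.exp(-0.5 * (bics - bics.min()))   # relative model weights
    w /= w.sum()
    return {cue_names[i]: float(sum(wk for wk, s in zip(w, subsets) if i in s))
            for i in range(p)}
```

Rather than committing to one stepwise-selected model, each cue’s influence is summed over every model that contains it, weighted by how well that model trades fit against complexity.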
Though the mapping model was quite successful in describing participants’ estimations, there are limits to its applicability. In the following, I will sketch some of the boundary conditions for the mapping model and how it could possibly be extended in the future.
The mapping model does not incorporate a mechanism to decide which cues should be included or to stop the search for further information, but works on the assumption that all relevant cues are included in the estimation. It thus assumes that prior knowledge about which cues are important is available and can be incorporated into the analysis, or that the cues are preselected for their relevance (Brehmer, 1994). Thus, in real-world estimation tasks, the applicability of the mapping model could be limited, because often an enormous number of potentially relevant cues can be identified, and knowledge about the cues’ quality is not easily available (Brehmer & Brehmer, 1988). One way to solve this problem is to employ statistical methods, such as BMA (Raftery, 1995), to identify which cues influenced the estimation processes.
However, a second possibility would be to implement a search and a stopping rule to model how the decision to include or exclude a cue is made (Gigerenzer & Todd, 1999).
The mapping model assumes that all cues are weighted equally, an assumption which was largely supported by the data in my dissertation. In a similar vein, unit weight linear models have been found to be as good as or better than proper regression models, providing evidence for the robustness of a unit weight approach (Dawes, 1979; Einhorn & Hogarth, 1975). In this vein, Dawes and Corrigan (1974) wrote:
“The whole trick is to decide what variables to look at and then know how to add” (p. 105).
However, the assumption of equal weights is a simplification which will not hold in all situations. It has been repeatedly shown that, in tasks with few cues, participants are able to weight cues differentially and can learn to ignore irrelevant cues (e.g., Castellan, 1973; Brehmer, 1973; Klayman, 1988; Kruschke & Johansen, 1999). Furthermore, Rehder and Hoffman (2005b) showed that spatial attention, measured by eye movements, was eventually restricted to relevant cues. If several predictive cues are available, it seems unrealistic that the correct weights of all cues are learnt, making a unit weight approach more probable. However, if only two or three cues are available and they furthermore differ strongly in their validity, it seems probable that humans would learn not to rely on all cues, but concentrate their attention on the cues offering predictive information. Thus, in this situation, the simplification made by the mapping model might lead to worse performance and not reflect the behavior of the participants, restricting the model to situations with several predictive cues.
Similar to the exemplar model, the mapping model does not extrapolate beyond the range of criterion values encountered during training. This seems to be a reasonable assumption if multiple cues are available and the environment is nonlinear (Juslin et al., 2003; Juslin et al., in press). Likewise, the results of Study 3 (Chapter 1) only provided evidence for extrapolation if the criterion was a linear additive function of the cues. However, research has shown that people are able to extrapolate in one-dimensional function learning tasks (e.g., DeLosh, Busemeyer, & McDaniel, 1997; Kalish et al., 2004). Moreover, in the second study (Chapter 2), participants extrapolated beyond the experienced range in a condition in which their estimations were otherwise well described by the mapping model. This suggests that it could be plausible to consider an extrapolation mechanism for the mapping model. For instance, if the maximum of the criterion is known and an object falling outside the existing categories is encountered, a new category could be formed and a typical criterion value between the estimate of the closest category and the known maximum could be assigned.
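One way such an extrapolation mechanism could be formalized is sketched below; taking the midpoint between the closest learnt estimate and the known maximum is an arbitrary, purely illustrative choice, not a claim about how the extension would actually be specified.

```python
def extrapolated_estimate(cue_sum, typical, known_max):
    """Mapping-model estimate with a tentative extrapolation step.

    `typical` maps learnt cue-sum categories to their typical values.
    An object beyond the highest learnt category opens a new category
    whose typical value lies midway between the closest learnt estimate
    and the known maximum of the criterion (midpoint = illustrative)."""
    top = max(typical)
    if cue_sum <= top:
        nearest = min(typical, key=lambda k: abs(k - cue_sum))
        return typical[nearest]
    return (typical[top] + known_max) / 2.0
```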
In the first two papers, I presented a version of the mapping model relying on binary cues. However, cues often provide more finely graded information which can be used for the estimation. It seems sensible to assume that the continuous information is used and not just reduced to binary information. Thus, in the third chapter, I extended the mapping model to apply it to continuous cues. This extension differed in some respects from the binary version in Chapters 1 and 2, even though the model predictions are equivalent. More specifically, the continuous version of the mapping model assumes that, consistent with range frequency theory (Parducci, 1974), the perception of the magnitude of the cues’ values is normalized, a mechanism which was not necessary in the binary version. Second, in the continuous version of the mapping model, the cues are integrated by averaging the cue values instead of adding positive cue information. Furthermore, in the first two papers, the number of categories and category membership are determined by the number of positive cues, but in the continuous version, the number of categories is set to seven (Miller, 1956), which are formed by dividing the range of averaged cue values into equally sized categories.
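Under the stated assumptions (range normalization, averaging, seven equal-width categories, and, as one plausible choice of typical value, the median), this continuous version can be sketched as follows; the function names and data structures are mine, not the chapter’s implementation.

```python
import numpy as np

def continuous_mapping_fit(train_cues, train_criteria, n_categories=7):
    """Normalize each cue to [0, 1] over its observed range, average the
    normalized cues, split the range of averages into equal-width
    categories, and learn each category's typical (median) criterion."""
    X = np.asarray(train_cues, dtype=float)
    crit = np.asarray(train_criteria, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)        # guard constant cues
    avg = ((X - lo) / span).mean(axis=1)          # normalize, then average
    edges = np.linspace(avg.min(), avg.max(), n_categories + 1)
    idx = np.digitize(avg, edges[1:-1])           # category index 0..n-1
    typical = {c: float(np.median(crit[idx == c]))
               for c in range(n_categories) if np.any(idx == c)}
    return {"lo": lo, "span": span, "edges": edges, "typical": typical}

def continuous_mapping_predict(cues, model):
    """Assign a new object to a category via its averaged normalized cue
    values; empty categories fall back on the nearest learnt one."""
    z = (np.asarray(cues, dtype=float) - model["lo"]) / model["span"]
    avg = float(np.clip(z, 0.0, 1.0).mean())
    idx = int(np.digitize(avg, model["edges"][1:-1]))
    typical = model["typical"]
    if idx not in typical:
        idx = min(typical, key=lambda c: abs(c - idx))
    return typical[idx]
```

With binary cues, the averaged normalized values and the equal-width categories reduce to the cue-sum categories of the original version, which is why the two versions make equivalent predictions in the earlier tasks.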
As the continuous version of the mapping model made identical predictions in the tasks reported in Chapters 1 and 2, it was impossible to evaluate the two versions against each other in this work. However, from a theoretical perspective, relying on an averaging approach seems plausible, resonating with research on information integration (Anderson, 1965, 1967; Juslin et al., in press). Similarly, range frequency theory provides a psychologically plausible account of how continuous cues are perceived (Parducci, 1974). Moreover, the results from Chapter 3 can be seen as first support that the mapping model can be successfully extended to continuous cues. However, a more rigorous, experimental examination of a continuous version is certainly necessary.
The success of the mapping model in predicting sentencing recommendations indicated that it can serve as a model of quantitative estimation in real-world tasks, in particular if conditions similar to those identified in the first two chapters are encountered. This could be more frequent than appears at first glance: In many estimation tasks we encounter in our daily lives, we possess explicit knowledge about the task, in particular about the cues. Knowledge about the task can be acquired not only through personal experience but also through social transmission. For example, legal or medical education consists, to a large part, of transmitting knowledge about which cues are predictive in a specific task, such as diagnosing a specific disease or deciding if a defendant violated the law. Furthermore, skewed distributions are frequent. Because general growth processes commonly generate power law distributions (Gabaix, 1999), diverse phenomena ranging from city sizes to record sales or the size of computer files follow J-shaped distributions (for a review, see Schroeder, 1991). Thus, the conditions for the successful application of the mapping model could frequently be at hand. In a similar vein, consistent with the general idea of the mapping model, research in diverse areas of psychology has shown that unit weight summary indices of risk and protective factors are often the most reliable predictors for assessing the risk of juvenile delinquency or children’s intelligence scores (e.g., Sameroff, Seifer, Baldwin, & Baldwin, 1993). In sum, the applicability of the mapping model is not restricted to laboratory or legal decision-making tasks; it can easily be employed to model quantitative estimations in a variety of areas.
Past research on quantitative estimation has almost exclusively relied on linear regression models to model human estimation processes. In spite of the success of regression models in predicting the outcomes of estimation, these models have been criticized for not capturing the cognitive process underlying estimations (Gigerenzer & Todd, 1999; Hoffman, 1960). Recently, alternative computational models were proposed for the area of quantitative judgment and estimation (e.g., Juslin et al., 2003; in press). My dissertation makes an important contribution to this literature, proposing a new cognitively inspired theory for quantitative estimation that outperformed current models of estimation in capturing people’s behavior. In particular, in situations in which linear regression did not capture human behavior, the mapping model offered a plausible alternative. Furthermore, the mapping model explained people’s estimations not only in several laboratory studies but also in a real-world environment. This suggests that its applicability is not restricted to laboratory tasks; it can potentially be employed in a diverse set of tasks. Thus, the mapping model offers an interesting extension of the adaptive toolbox (Gigerenzer & Todd, 1999), providing a further tool for quantitative estimation.
A second contribution of my research concerns the link between environmental structures and cognitive processing, extending existing research on the adaptive nature of human decision making to quantitative estimation (Payne et al., 1993; Gigerenzer, Todd, & the ABC Research Group, 1999). More specifically, my work showed that specific task structures differentially affected cognitive components that were essential for the models’ assumptions about the estimation task. Consequently, model performance was, to a high degree, a function of the environment. In sum, my research highlights not only the impact of the environment on cognitive processing but also the importance of precise assumptions about cognitive processes for psychological research.
1 However, it should be noted that the assumption that the estimation process in fact follows a linear additive combination of the cue information makes it reasonable to impose restrictions on the regression model, for example, see the constraints assumed by the cue abstraction module of the Sigma model (Juslin et al., in press).