Chapter 3:
Predicting Sentencing for Low-Level Crimes:
A Cognitive Modeling Approach

Abstract

Laws and guidelines regulating legal decision making are often imposed without taking the cognitive processes of the legal decision maker into account. In the case of sentencing, this raises the question of to what extent the sentencing decisions of prosecutors and judges are consistent with legal policy. Especially in handling low-level crimes, legal personnel suffer from high case loads and time pressure, which can make it difficult to comply with the often complex rulings of the law. To understand sentencing decisions it is beneficial to consider the cognitive processes underlying the decision. An analysis of fining and incarceration decisions in cases of larceny, fraud, and forgery showed that prosecutors’ sentence recommendations were not consistent with legal policy. Instead they were well described by a cognitive theory of quantitative estimation that assumes sentence recommendations rely on a categorization of cases based on their characteristics.

How are criminal sentences determined? Although legal systems differ from country to country, judges worldwide struggle with the problem of determining which factors should be considered and how they should be combined to form appropriate and just sentences. Even if the legal system provides guidelines to regulate the sentencing process, the question remains how well judges and other legal personnel follow the prescribed policies (Ruback & Wroblewski, 2001). Research on sentencing has a long tradition of identifying deviations from legal policy: Extralegal factors such as race or gender have been found to influence sentencing, and in some cases legal factors are not properly taken into account (e.g., Davis, Severy, Kraus, & Whitaker, 1993; Ebbesen & Konečni, 1975; ForsterLee, ForsterLee, Horowitz, & King, 2006; Henning & Feder, 2005; Johnson, 2006; Ojmarrh, 2005). This indicates that the cognitive processes of legal professionals do not always lead to sentencing that is consistent with the sentencing policy specified by the law (Dhami & Ayton, 2001; Ebbesen & Konečni, 1975; Hertwig, 2006; Tata, 1997; Van Duye, 1987).

The goal of this article is to investigate to what extent sentencing decisions deviate from legal regulations and how these deviations can be explained by cognitive models of the sentencing process. For this purpose, we test whether prosecutors' sentence recommendations can be better explained by a cognitive model or by adherence to legal policy. Additionally, we examine whether the same cognitive processes underlie both fining and incarceration decisions.

Heuristics in Legal Decision Making

The legal decision environment is highly complex, and the workload of legal personnel is heavy; decisions need to be made under time pressure, and often little or no feedback on the quality of the decision is available (Gigerenzer, 2006). Even if specific rules exist to guide the decision process, they are often too complex to be executed in the allotted time (Ruback & Wroblewski, 2001). Not surprisingly, then, research on sentencing has found that often only a small part of the available information is used to determine a sentence (Ebbesen & Konečni, 1975, 1981).

Heuristics are simple strategies that allow decisions to be made without much information or complex computations. Although there is disagreement about the extent to which heuristics allow good decisions and how they should be formalized (Gigerenzer, 1996; Kahneman & Tversky, 1996), there is converging evidence that heuristics provide good accounts of people's decision processes (e.g., Bröder & Schiffer, 2003; Payne, Bettman, & Johnson, 1993; Rieskamp & Otto, 2006). In particular, reliance on heuristics increases when people make complex decisions under time pressure (Payne, Bettman, & Johnson, 1988; Rieskamp & Hoffrage, in press), making the legal domain an area conducive to decision-making heuristics. In fact, reliance on heuristics has been shown in several areas of legal decision making, such as bail decisions (Dhami & Ayton, 2001; Dhami, 2003; Leiser & Pachman, 2007), tort law (Guthrie, Rachlinski, & Wistrich, 2001), and sentencing (Englich, Mussweiler, & Strack, 2006; for an overview see Colwell, 2005; Engel & Gigerenzer, 2006).

Especially for the domain of low-level offenses where the decision situation can be relatively transparent and the costs of wrong decisions low, reliance on heuristics might be a way to deal with the immense workload involved. Although regrettably widely ignored by research (for an exception, see Albrecht, 1980), the majority of the cases in courts are low-level crimes and petty offenses. For example, in Germany, about 80% of the cases are punished with a fine (Langer, 1994; Meier, 2001), an alternative to incarceration that can only be imposed in minor cases. Thus, particularly in cases sentenced with a fine, heuristics might be prevalent.

Sentencing Decisions by the Prosecution

As in most legal systems, in Germany the sentence is determined by the judge. However, the judge makes this decision after hearing sentencing recommendations from both the prosecution and the defense. Research has shown that the sentencing recommendation of the prosecution is the single most important factor influencing the decision of the judge (Ebbesen & Konečni, 1975; Schünemann, 1988). For instance, Englich and Mussweiler (2001) found that, all things being equal, the recommendation of the prosecution significantly influenced a criminal’s sentence; similarly Dhami and Ayton (2001) showed that in bail decisions, British magistrates followed almost without exception the recommendation of the prosecution. Additionally the prosecution can directly impose fines by penalty order. If the defendant accepts the fine, the case never goes to trial. These findings indicate that to understand which factors influence a sentence’s magnitude, it is indispensable to first investigate the process by which the prosecution determines the sentence recommendation.

How should the prosecution do this? German sentencing is regulated by the German penal code (Strafgesetzbuch, StGB; Tröndle & Fischer, 2007), more specifically by articles 21, 23, 46, 47, and 49, and by decisions of the German Federal Court of Justice. Both judge and prosecution are bound by the same legal regulations. The general goal is to achieve an appropriate sentence that is proportional to the guilt of the offender. For each offense there is a sentencing range that establishes the minimum and maximum sentence that can be imposed. Within these often rather broad ranges, the placement of the sentence depends on the seriousness of the case and is largely left to the discretion of the judge. The task of the judge, as well as of the prosecution, is to evaluate the factors mitigating or aggravating the guilt of the offender and to determine the sentence accordingly. Which factors should be considered mitigating or aggravating is specified in the penal code. Article 46 of the StGB alone lists over 20 factors relevant to the sentencing decision, although it cautions that the list is not exhaustive.¹

What the German penal code (§ 46) does not provide are explicit guidelines on how the factors should be combined. However, the German Federal Court of Justice recommends that mitigating and aggravating factors be balanced in an integrative evaluation of the overall picture (Schäfer, 2001). According to the predominant opinion in the legal literature, this is best accomplished with a three-step sentencing process: All relevant factors are evaluated according to the direction of their effect on the sentence (aggravating or mitigating), then weighted by their importance, and finally added up to form the sentence (Bruns, 1985, 1988; Foth, 1985; Schäfer, 2001; but see Mösl, 1981, 1983; Theune, 1985a, 1985b). Thus, the legal prescription calls for a linear additive decision process.

Models of Sentence Magnitude

How can the cognitive process underlying sentencing decisions be described? In many areas of psychology, multiple linear regression models are applied to analyze decision policies (Brehmer, 1994; Cooksey, 1996; Doherty & Kurz, 1996). Likewise, in the legal domain these have been the predominant models used to analyze sentencing policies and to identify which factors influence sentence magnitude (Engen & Gainey, 2000; Johnson, 2006; Kautt, 2002; Kautt & Spohn, 2002). Regression models are especially attractive for modeling sentencing, as the three-step model is consistent with their linear additive approach (Brehmer, 1994; Hammond, 1996). More specifically, regression models assume that quantitative judgments, such as determining the magnitude of a sentence, can be modeled as a process of weighting and adding information (Doherty & Brehmer, 1997; Einhorn, Kleinmuntz, & Kleinmuntz, 1979; Juslin, Karlsson, & Olsson, in press). Each factor is weighted according to its importance, and the judgment is determined as the sum of the weighted factor values. The weights that best characterize the sentencing process are found by minimizing the squared deviation between the actual and the estimated sentences (cf. Cohen, Cohen, West, & Aiken, 2003; Cooksey, 1996):

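$$\min_{\beta_0, \ldots, \beta_J} \sum_{p} \left( y_p - \hat{y}_p \right)^2, \qquad \hat{y}_p = \beta_0 + \sum_{j=1}^{J} \beta_j c_{pj} \tag{1}$$
where $y_p$ is the actual sentence for case $p$, and the estimate $\hat{y}_p$ is given by the sum of the products of the factor values $c_{pj}$ of the $J$ factors with their respective weights $\beta_j$, plus an intercept $\beta_0$.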
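Equation (1) is ordinary least squares. The following minimal Python sketch (using NumPy) fits the weights to a hypothetical toy data set generated by a known linear policy; the function names `fit_linear_policy` and `predict` are illustrative and not part of the original analysis.

```python
import numpy as np


def fit_linear_policy(C: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Return [beta_0, beta_1, ..., beta_J] minimizing sum_p (y_p - yhat_p)^2.

    C is an (n_cases x J) matrix of factor values c_pj; y holds the sentences y_p.
    """
    X = np.column_stack([np.ones(len(y)), C])  # prepend a column of ones for beta_0
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def predict(beta: np.ndarray, C: np.ndarray) -> np.ndarray:
    """yhat_p = beta_0 + sum_j beta_j * c_pj."""
    return beta[0] + C @ beta[1:]


# Hypothetical toy data: two factors (e.g., prior convictions, no. of charges)
C = np.array([[0, 1], [2, 1], [1, 3], [4, 2], [3, 5]], dtype=float)
y = 10 + 5 * C[:, 0] + 2 * C[:, 1]  # sentences produced by a known linear policy
beta = fit_linear_policy(C, y)
print(np.round(beta, 6))  # should recover intercept 10 and weights 5 and 2
```

Because the toy sentences are generated without noise, the fitted weights reproduce the generating policy; with real sentencing data the residuals would be nonzero.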

If prosecutors and judges in fact weigh mitigating and aggravating factors against each other and then add up the weighted factor values to arrive at a final sentence, sentencing should be well captured by multiple regression. In this case multiple regression allows us to identify the factors that influenced the sentencing decision. Furthermore, if the sentencing policy corresponds to the law, all legally relevant factors should make a significant contribution, whereas extralegal factors should not be considered. Thus, analyzing sentencing with a multiple linear regression approach allows us to compare the judges’ and prosecutors’ sentencing policies to the policy required by law.

The Mapping Model: A Cognitive Theory of Quantitative Estimation 

Even though multiple regression can capture decision outcomes, its value as a model of human judgment processes is debatable. Researchers have doubted that people actually perform the relatively complex calculations required by multiple regression and have therefore argued that multiple regression does not provide a valid description of the cognitive process underlying a decision (Brehmer, 1994; Einhorn et al., 1979; Gigerenzer & Todd, 1999; Hoffman, 1960). In response to this criticism, we have proposed the mapping model, which we consider a psychologically plausible alternative to multiple regression. The mapping model provides a cognitive theory of quantitative judgment and has been successful in predicting people's estimates (Helversen & Rieskamp, in press).

Generally, the mapping model assumes that when people make a judgment about a case or object, they assign the object to a category and use a typical criterion value for this category as an estimate. Categories are formed on the basis of previously encountered objects, and category membership is defined by the objects' characteristics or features. The typical criterion value of a category is represented by the median criterion value of all cases belonging to this category. For example, to estimate the selling price of a house, the mapping model assumes that one would consider the house's features that speak in favor of a high price (e.g., great location, a deck, a swimming pool), categorize the house into a certain price class according to its average value on these features, and estimate a price that is typical for houses within this price class, that is, the median price at which houses in this category were sold.

Helversen and Rieskamp (in press) showed that, compared with multiple regression, the mapping model was particularly suitable for predicting people's estimates when the cases' criterion values followed a skewed distribution, which is typical of sentencing decisions (Meier, 2001). Although Helversen and Rieskamp (in press) tested the mapping model in highly controlled experimental settings, these conditions resemble the conditions of sentencing decisions, suggesting that the mapping model might be a good model for sentencing decisions.

How is the mapping model applied to sentencing decisions? Commonly, each case is described by several characteristics or factors relevant for sentencing. To apply the mapping model, cases are first categorized according to their mean value on these factors.² To allow comparisons of factors with different dispersions, all factors are normalized by applying range frequency theory (Parducci, 1974). Using range frequency theory for normalization instead of a purely statistical technique (e.g., z-transformation) has the advantage of providing a psychologically more plausible representation of how an individual subjectively perceives the magnitude of a factor value (for details see Appendix A). After all factors are normalized, the mean factor value of each encountered case is determined; this mean value represents the seriousness of the case. Next, the minimum and the maximum seriousness across cases are determined, and this range is divided into seven equally sized categories; that is, category boundaries are chosen so that the distance between boundaries is the same for all categories. Because of humans' limited cognitive capacities (see also Miller, 1956), only a limited number of categories is assumed. Next, the typical sentence for each category is computed by taking the median sentence of all previously encountered cases that fall into that category. The sentence for a new case is then simply determined by establishing its category membership and using the typical sentence of that category as the sentence for the new case. Figure 7 gives an overview of the processing steps assumed by the mapping model.
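The steps above can be sketched in Python as follows. This is a minimal illustration under several assumptions that the text does not spell out: range frequency theory is taken in its standard form J = wR + (1 − w)F with a single weight w shared by all factors, tied values receive mid-ranks, all factors are coded so that higher values mean greater severity, and a new case that lands in a category without any encountered cases borrows the typical sentence of the nearest filled category. The class `MappingModel`, the helper `rft_scores`, and the toy data are hypothetical.

```python
import numpy as np

N_CATEGORIES = 7  # number of equally wide categories assumed in the text


def rft_scores(context, values, w):
    """Perceived magnitude of `values` relative to the encountered cases (`context`).

    Assumes the standard range-frequency form J = w*R + (1 - w)*F;
    the variant used in Appendix A may differ.
    """
    lo, hi = context.min(), context.max()
    rng = (values - lo) / (hi - lo) if hi > lo else np.zeros(len(values))
    less = np.array([(context < v).sum() for v in values])
    ties = np.array([(context == v).sum() for v in values])
    # Frequency principle: (mid-)rank of each value among the encountered cases
    freq = (less + 0.5 * np.maximum(ties - 1, 0)) / max(len(context) - 1, 1)
    return w * np.clip(rng, 0, 1) + (1 - w) * np.clip(freq, 0, 1)


class MappingModel:
    """Hypothetical sketch; factors are assumed coded so that higher = more severe."""

    def __init__(self, w=0.5, n_categories=N_CATEGORIES):
        self.w, self.k = w, n_categories

    def _seriousness(self, C):
        # Every factor gets the same weight: seriousness is the mean perceived score
        scores = [rft_scores(self.C_train[:, j], C[:, j], self.w) for j in range(C.shape[1])]
        return np.mean(scores, axis=0)

    def _category(self, s):
        # Inner boundaries split [min, max] into k equal intervals -> indices 0..k-1
        return np.digitize(s, self.edges[1:-1])

    def fit(self, C, y):
        self.C_train = np.asarray(C, dtype=float)
        y = np.asarray(y, dtype=float)
        s = self._seriousness(self.C_train)
        self.edges = np.linspace(s.min(), s.max(), self.k + 1)
        cats = self._category(s)
        # Typical sentence of a category = median sentence of its encountered cases
        self.typical = {int(c): float(np.median(y[cats == c])) for c in np.unique(cats)}
        return self

    def predict(self, C):
        cats = self._category(self._seriousness(np.asarray(C, dtype=float)))
        filled = np.array(sorted(self.typical))
        # Assumption: a case in an empty category uses the nearest filled category
        return np.array([self.typical[int(filled[np.argmin(np.abs(filled - c))])] for c in cats])


# Hypothetical toy data: two factors (e.g., prior convictions, no. of charges)
C = np.array([[0, 1], [1, 1], [2, 2], [3, 3], [5, 4], [8, 6]])
y = np.array([10, 15, 20, 30, 45, 60])
model = MappingModel(w=0.5).fit(C, y)
print(model.predict(C))
```

Note that the estimate for every case is a median of encountered sentences, so the model can only produce sentences it has already seen; this is one way it differs from a regression equation.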

Figure 7: The processing steps of the mapping model. In the first step, the relevant cues are evaluated and rated according to their severity. In the second step, the cues are integrated by establishing the average severity score. Then the case is categorized according to its average score, and the typical criterion value for this category (that is, the typical sentence) is retrieved. In the last step, the retrieved criterion value is used as an estimate.

To illustrate how the mapping model can be applied to sentencing, we describe how a typical case would be sentenced according to the model. Imagine a case of shoplifting: The defendant has confessed to stealing minor goods in five cases. The net worth of the stolen goods amounts to $100, and the defendant has three prior convictions for theft. In the first step, the prosecutor considers the cues, that is, the characteristics of the case relevant for sentencing, such as the number of charges, the amount of money stolen, and the number of prior convictions. Next, she rates the severity of each cue (for instance, the amount of money stolen is low, whereas three prior convictions are of medium severity, and so forth), thereby standardizing the cues. After that, she forms an overall impression of the case by taking the average of the cues' severity scores. Based on this average score, she categorizes the case as a theft of medium seriousness and retrieves the typical sentence for this category. In the last step, she determines the sentence recommendation based on the retrieved category value.
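The shoplifting example can be traced numerically. In this sketch, the severity ratings, the category boundaries, and the typical sentences (in days of fine) are all hypothetical values chosen for illustration; only the sequence of steps (average the ratings, find the category, retrieve its typical sentence) follows the text.

```python
from bisect import bisect_right
from statistics import mean

# Hypothetical severity ratings on a 0-1 scale (0 = least, 1 = most severe)
ratings = {"number_of_charges": 0.8, "amount_stolen": 0.2, "prior_convictions": 0.5}

# Hypothetical inner boundaries of seven equally wide categories on [0, 1]
boundaries = [i / 7 for i in range(1, 7)]
# Hypothetical typical sentence (days of fine) for categories 1 to 7
typical_days = [10, 20, 30, 40, 60, 90, 120]

seriousness = mean(list(ratings.values()))        # overall impression of the case
category = bisect_right(boundaries, seriousness)  # 0-based index of the category
recommendation = typical_days[category]           # retrieved typical sentence
print(f"seriousness={seriousness:.2f}, category={category + 1}, "
      f"recommendation={recommendation} days")
```

With these values the case lands in category 4, the middle of the seven categories, and the retrieved recommendation is 40 days.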

When comparing the mapping model with the regression model, two important differences stand out. First, unlike the regression model, the mapping model gives all factors the same weight when assigning a case to a category. Second, in contrast to the regression model, the influence of a single factor in the mapping model can interact with the other factors. Because the factors jointly determine which category is used to make an estimate, how an estimate changes when one factor provides positive rather than negative evidence depends on the evidence provided by the other factors. Here the mapping model differs substantially from the regression model, in which a factor's impact on the estimate is independent of the other factors.
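The interaction can be made concrete with two factors scaled to [0, 1]. In this sketch, the regression weights and the typical sentences of the seven categories are hypothetical; the point is only that raising factor A by the same amount changes the regression estimate by a constant, whereas the change in the mapping estimate depends on the level of factor B.

```python
import numpy as np

# Hypothetical fitted models over two factors already scaled to [0, 1]
BETA = np.array([5.0, 30.0, 30.0])                # regression: intercept, weight A, weight B
EDGES = np.linspace(0, 1, 8)[1:-1]                # inner boundaries of 7 equal categories
TYPICAL = np.array([5, 10, 15, 30, 45, 60, 90])   # hypothetical typical sentences


def regression(a, b):
    return BETA[0] + BETA[1] * a + BETA[2] * b


def mapping(a, b):
    seriousness = (a + b) / 2  # both factors weighted equally
    return TYPICAL[np.digitize(seriousness, EDGES)]


for b in (0.0, 0.5, 1.0):
    d_reg = regression(0.6, b) - regression(0.4, b)
    d_map = mapping(0.6, b) - mapping(0.4, b)
    print(f"B={b:.1f}: regression effect={d_reg:.1f}, mapping effect={d_map}")
```

With these values, the regression effect of A is 6 at every level of B, whereas the mapping effect is 5, 0, and 15 for B = 0, 0.5, and 1: the same change in A crosses a category boundary, stays within one category, or crosses a boundary between categories whose typical sentences lie further apart.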

Fines versus Incarceration 

The second goal of this article is to investigate differences between fines and incarceration sentences. Low-level offenses can be sentenced with either. Although much research has examined which factors influence sentence length in incarceration sentences (ForsterLee et al., 2006; Johnson, 2006; Langer, 1994; Oswald, 1994; Schünemann, 1988), to our knowledge there is little research on fining and on the differences between incarceration and fining decisions (for exceptions, see Albrecht, 1980; Oswald, 1994). However, fining and incarceration are often viewed as serving different sentencing goals (Schäfer, 2001). This suggests that fining and incarceration decisions could be based on different factors and that the cognitive processes underlying the decisions could also differ.

Fining decisions could be especially likely to induce heuristic decision making. As fines constitute the majority of the sentences (Meier, 2001), they represent the biggest proportion of the prosecution’s workload. More serious cases might be allotted more time and be processed more systematically, as they are less frequent, incur more public interest, and have a higher probability of appeal. Thus cases sentenced by a fine could differ systematically from cases that are sentenced with incarceration. To investigate these questions, we conducted an analysis of trial records for three common offenses.

Study: Analysis of Trial Records

The first goal of the study was to model sentencing decisions for common minor offenses, investigating how well the sentencing procedure corresponds to legal policy and whether sentencing decisions are best described by a cognitive theory of quantitative estimation. The second goal was to examine whether fining decisions differ systematically from incarceration decisions and which factors influence sentence magnitude in each type of decision.

We approached these goals by conducting an analysis of trial records. In comparison to an experimental approach, this type of analysis has the advantage that it is based on real cases and does not need to be limited to the small number of factors that can be manipulated in an experimental study. Furthermore, the complexity of the real cases as well as the time pressure of the daily case load could be decisive for the cognitive process underlying the sentencing decisions, favoring the analysis of real case data.

Method

We focused on three common offenses against property, namely theft, fraud, and forgery. This allowed us to include different offenses while measuring the severity of the offense on a common scale (money) and keeping the sentencing range equal (0–5 years for a common case and 3–6 months to 10 years for an aggravated case). To investigate the sentencing process, we collected trial records from a small court in Brandenburg (the Amtsgericht Bad Freienwalde) for the years 2003 to 2005. All records with a main charge of theft, forgery, or fraud (§§ 242, 243, 244, 248, 263, and 267) were included in the analysis. Trial records included the indictment, the transcript of the trial, orders by the prosecution, and the verdict. Based on these documents, we identified offense and offender characteristics relevant for sentencing, the sentencing range, and the recommendations of the prosecution and the defense.

Categorization system. Offense and offender characteristics were classified by a categorization system that was based on the German penal code (§§ 46, 47, 52, 53, 242, 243, 244, 248, 263, and 267) in close cooperation with legal experts in the area of sentencing. Classification of a factor rested upon the indictment, the trial transcripts, and the verdict. Besides the legal factors, the categorization system also included extralegal factors that have been found to affect sentencing (e.g., Ebbesen & Konečni, 1975; ForsterLee et al., 2006). Table 12 provides an overview and a description of the factors.

Table 12: Overview of the categorization system

| Factor | Description | Values |
|---|---|---|
| Offender information | | |
| Gender | Male vs. female | 0 vs. 1 |
| Nationality | German vs. non-German | 0 vs. 1 |
| Age | | 20–80 years |
| Family status | Married or single with kids vs. single and no kids | 0 vs. 1 |
| Occupational status | Employed, apprenticed, or student vs. unemployed | 0 vs. 1 |
| Economic status | Above poverty line vs. below poverty line (ca. €900 per month) | 0 vs. 1 |
| Diminished capacity | No diminished capacity vs. diminished capacity (diminished capacity was assumed if the defendant had a psychological or medical diagnosis of a mental or organic disorder) | 0 vs. 1 |
| No. of prior convictions | | 0–14 |
| Type of last sentence | Fine, incarceration, or incarceration with probation | Dummy coded |
| Probation status | Offender was not on probation when the offense was committed vs. was on probation | 0 vs. 1 |
| Offense characteristics | | |
| Net worth of property violated | | €0–80,000 |
| No. of charges | | 1–112 |
| No. of offenders | | 1–3 |
| Mitigating evidence I | Coded as a summary factor; one point was added if there was external pressure to commit the crime (e.g., an emergency situation or blackmail), the crime was a failed attempt, the offender's role was secondary, or the offender's capacity was diminished due to alcohol | 0–2 |
| Mitigating evidence II | One point was added if the offender had no prior convictions or the net worth of property violated was below €30 | 0–2 |
| Remorse | Defendant showed no remorse vs. showed remorse, offered reparation or amends | 0 vs. 1 |
| Confession | Defendant did not confess vs. defendant confessed | 0 vs. 1 |
| Aggravating evidence | One point was added if any of the following conditions was fulfilled: a high number of offenses (> 5); offenses over a long period of time (> 6 months); the offense was carefully planned; perseverance in the face of obstacles; incited others to commit the crime; used unnecessary violence | 0–2 |
| Legal regulations | | |
| Offense type | Theft, fraud, or forgery | Dummy coded |
| Summary penalty | A summary penalty was not given vs. a summary penalty was given | 0 vs. 1 |
| Penalty order | Sentencing by trial vs. sentencing by penalty order | 0 vs. 1 |
| Sentencing range | Max. sentence 5 years vs. max. sentence 10 years | 0 vs. 1 |

The categorization system included personal information on the offender, as well as legally relevant factors concerning the offender's criminal and personal history. To capture the severity of the crime, several characteristics of the offense were coded, such as the number of charges and the net worth of property violated. The presence of mitigating and aggravating factors concerning the conduct of the crime was coded in two summary factors capturing the amount of mitigating and aggravating evidence. If the description of a case in the indictment and the trial transcripts left doubt about the presence of a mitigating or aggravating factor, the verdict was used as a reference: Only if the behavior in question was mentioned in the rationale of the verdict was it considered mitigating or aggravating evidence. Additionally, the presence of a confession and of mitigating behavior after the crime, such as remorse, were coded as two separate factors. A further mitigating summary factor coded whether the net worth of property violated was low enough to count as a less severe case (§ 248) and whether the offender had no prior record; these two characteristics are specifically identified by the German penal code as mitigating the sentence, regardless of the overall impact of the property violated or of any prior record. Additionally, we included three factors concerning legal regulations, such as the sentencing range applied. Finally, we did not include the recommendation of the defense in the analysis, because in most cases the defendant did not have a defense attorney present during the trial.

For most variables, a nominal or ordinal level of measurement was assumed. Nominal variables were binary coded, indicating the presence or absence of a factor; ordinal variables were dichotomized by a median split. For the number of charges, the number of offenders, the number of prior convictions, the amount of mitigating or aggravating evidence, and the net worth of property, an interval scale was assumed. Two independent raters coded the cases. The raters' agreement was satisfactory on all subjectively rated factors (r = .77, SD = .12). Missing data were analyzed for non-random patterns; because no effect on the dependent variable was found and the overall number of cases was rather small, missing values were substituted with the mean of the variable.
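The two data-preparation steps can be sketched as follows, assuming NumPy and NaN-coded missing values. The tie rule of the median split (values equal to the median go to the lower group) is an assumption, since the text does not specify it; the function names are hypothetical.

```python
import numpy as np


def mean_impute(x):
    """Replace missing values (NaN) with the mean of the observed values."""
    x = np.asarray(x, dtype=float)
    return np.where(np.isnan(x), np.nanmean(x), x)


def median_split(x):
    """Dichotomize an ordinal variable: 1 if above the median, else 0 (assumed tie rule)."""
    x = np.asarray(x, dtype=float)
    return (x > np.median(x)).astype(int)


print(mean_impute([1.0, np.nan, 3.0]))  # the missing value becomes the mean of 1 and 3
print(median_split([1, 2, 3, 4]))       # median is 2.5, so the two upper values are coded 1
```

Imputing before splitting keeps NaN values from silently falling into the lower group.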

Dependent variables. The dependent variables were the type of sentence (fine or incarceration) and, as recommended by the prosecution and as imposed in the verdict, the number and magnitude of daily payments (for fines) and the length of the prison term in months (for incarceration). In the German legal system, a fine consists of a number of daily payments of a certain magnitude. The number of payments is determined by the severity of the crime, whereas their magnitude depends on the income of the defendant. Because the aim of this study was to compare sentencing for prison terms and fines, we used the number of daily payments, which corresponds to the length of a prison sentence, as the dependent variable for fines. The number of daily payments can vary between 5 and 365; more severe offenses are sentenced with incarceration. The dependent variable for incarceration was the number of months of the prison sentence, irrespective of whether the offender was let off with probation. To identify the differences between fines and incarceration, we analyzed the sentences for fines and incarceration separately.
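The structure of a German fine amounts to a simple product. The sketch below multiplies the number of daily payments by the daily rate and enforces the 5 to 365 range stated above; the function name `total_fine` and the daily rate used in the example are hypothetical.

```python
def total_fine(days: int, daily_rate_eur: float) -> float:
    """Total fine = number of daily payments x magnitude of each payment."""
    if not 5 <= days <= 365:
        raise ValueError("number of daily payments must lie between 5 and 365")
    return days * daily_rate_eur


# The sample's mean of 48 daily payments at a hypothetical rate of EUR 15 per day
print(total_fine(48, 15.0))  # 720.0
```

Only the first argument reflects the severity of the offense, which is why the analysis uses the number of daily payments rather than the total amount.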

Description of the court, the offenses, and the offenders. The Amtsgericht Bad Freienwalde is a small court in the Brandenburg district of Märkisch-Oderland, close to the Polish border, under the jurisdiction of the Frankfurt (Oder) district attorney's office. The city of Bad Freienwalde has a population of 13,000 and an unemployment rate of 12%. Overall, 99 cases of theft, fraud, and forgery were tried in this court during 2003 and 2004. Of these 99 cases, 15 were excluded because the main charge was none of the offenses under consideration, juvenile law was applied, or the case did not lead to a conviction. Of the remaining 84 cases, 82% were tried by the same judge. The 84 cases were prosecuted by 45 different attorneys, with a maximum of 5 cases per attorney. In 49 cases the main charge was theft, in 20 it was fraud, and in 15 it was forgery. On average, property worth €2,497 was violated (SD = €8,826). The offenders were predominantly German men: 69 were men and 15 women, and 8 offenders did not have German citizenship. The mean age of the offenders was 36 years (range: 20 to 80 years). About half of the offenders were sentenced to a fine (M = 48 days; SD = 27) and half to a prison term (M = 8 months; SD = 6).

Model selection. The main goal of our study was to identify the cognitive process underlying sentencing and to determine if a cognitive model of sentencing could predict the magnitude of a sentence. For this purpose we tested which theory describes the sentencing process better: legal policy as modeled by a multiple linear regression model (e.g., Cooksey, 1996) or the mapping model, a cognitive theory for quantitative estimation (Helversen & Rieskamp, in press).

Testing these two models on the data of real cases raised two crucial methodological problems. First, real cases involve an enormous number of factors that could potentially predict the sentence; in our cases we recorded 22 factors that could influence the sentencing decision. How can we find out which factors have a substantial effect? One common technique for identifying important factors in regression models relies on significance tests. Because the estimated impact of a factor can depend on the other factors included in the regression equation, stepwise procedures are often used, in which factors are successively included in or excluded from the regression equation (cf. Cohen et al., 2003). However, when a larger number of factors is considered, this procedure is unsatisfactory, because factors that were added to the equation early in a stepwise forward procedure might not have been added had other factors already been included. Therefore, different statistical procedures applied to the same original set of factors often lead to inconsistent results (i.e., different regression equations), which can lead to very different conclusions.

The second methodological problem concerns the models' complexity, that is, their flexibility in describing different results. In particular, we were interested in testing the regression model against the mapping model; these models differ in their number of free parameters and therefore in their potential to describe different processes. We therefore sought a methodology that would take the models' complexity into account when testing them against each other.

To tackle these two methodological problems, we followed a Bayesian approach, specifically the Bayesian model averaging (BMA) method (see Raftery, 1995; Raftery, Madigan, & Hoeting, 1997). This Bayesian method identifies the model or models that are most probable given the data. Furthermore, BMA provides reliable estimates of the predictors' influence on the dependent variable, and it allows models of different complexity to be compared by taking their free parameters into account. BMA was proposed especially for examining the uncertainty of parameter estimates and for model selection. To identify the most probable models, the method calculates the posterior probability of each model given the observed data. In practice, this is done by determining the Bayesian information criterion (BIC), which approximates the so-called Bayes factor (Raftery, 1995; Schwarz, 1978). The method additionally allows one to specify the probability that a factor has an impact on the dependent variable: Taking model uncertainty fully into account, the evidence for an effect of a factor is determined by summing the posterior probabilities of all models that include this factor (for details see Appendix B).
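The two BMA quantities described here can be sketched in a few lines of Python. Under equal prior probabilities, the posterior probability of model M_k is approximately proportional to exp(−BIC_k/2); because BIC′ differs from BIC only by a constant for a given data set, BIC′ values can be used directly. The model set, the factor names, and the BIC′ values below are hypothetical.

```python
import math


def posterior_probs(bics):
    """Approximate p(M_k | D) from BIC (or BIC') values, assuming equal priors."""
    best = min(bics)  # subtract the minimum for numerical stability
    weights = [math.exp(-(b - best) / 2) for b in bics]
    total = sum(weights)
    return [w / total for w in weights]


def inclusion_prob(models, probs, factor):
    """Sum of the posterior probabilities of all models that contain `factor`."""
    return sum(p for m, p in zip(models, probs) if factor in m)


# Hypothetical BIC' values for four candidate models (not the study's results)
models = [frozenset(), frozenset({"priors"}), frozenset({"charges"}),
          frozenset({"priors", "charges"})]
bics = [0.0, -12.0, -4.0, -10.0]
probs = posterior_probs(bics)
print([round(p, 3) for p in probs])
print("P(priors in model | D) =", round(inclusion_prob(models, probs, "priors"), 3))
```

The lowest BIC′ receives the largest posterior probability, and a factor that appears in all well-supported models accumulates an inclusion probability close to 1.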

The most reliable method for model selection, according to Raftery (1995), is to construct all possible models that can be built with the available factors and then select the models with the highest posterior probability given the data. However, including all candidate predictor variables would result in an enormous number of possible models: 15 predictor variables alone yield 32,768 (2^15) models. We therefore reduced the number of factors, first by including all factors that substantially correlated with the dependent variable (i.e., r > .3), and then by adding factors, such as confession or remorse, that were not necessarily correlated with sentence magnitude but are of special theoretical importance because they frequently appear as mitigating reasons in the rationale of the verdict. For the fines we included 11 factors and for the incarceration decisions 9 factors (see Tables 2 and 3).
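The screening rule can be sketched as follows. Whether the criterion r > .3 was applied to the signed or to the absolute correlation is not stated; the sketch assumes the absolute value, and the function name, factor names, and toy data are hypothetical.

```python
import numpy as np


def screen_factors(X, y, names, always_include=(), threshold=0.3):
    """Return the names of the factors kept for model enumeration.

    A factor is kept if |r| with the sentence exceeds `threshold` (absolute value
    is an assumption) or if it is listed in `always_include` for theoretical reasons.
    """
    kept = []
    for j, name in enumerate(names):
        r = np.corrcoef(X[:, j], y)[0, 1]
        if abs(r) > threshold or name in always_include:
            kept.append(name)
    return kept


# Hypothetical toy data: 5 cases, 3 candidate factors
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([2 * y, [1, -1, 1, -1, 1], [1, 0, 1, 0, 1]])
print(screen_factors(X, y, ["charges", "noise", "confession"],
                     always_include=("confession",)))
```

Here "confession" is uncorrelated with the toy sentences but is retained because it is listed as theoretically important, mirroring the treatment of confession and remorse described above.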

Next, we calculated the BIC values for all models resulting from all possible combinations of the factors. For each model class (mapping models and regression models), this amounted to 2,048 models for the fining decisions and 512 models for the incarceration decisions. We first ran the analysis separately for the two model classes to investigate whether the factors identified by the two types of models would differ. Then we included all models in the comparison to identify which class of models most probably underlies the decision behavior given the observed data.
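Enumerating all combinations is straightforward with the Python standard library; the sketch below confirms the model counts for 11 and 9 factors, including the null model. The factor names are placeholders.

```python
from itertools import combinations


def all_models(factors):
    """Every subset of the candidate factors, from the null model to the full model."""
    return [frozenset(c) for k in range(len(factors) + 1) for c in combinations(factors, k)]


fining_factors = [f"f{i}" for i in range(11)]        # placeholders for the 11 fining factors
incarceration_factors = [f"g{i}" for i in range(9)]  # placeholders for the 9 incarceration factors
print(len(all_models(fining_factors)), len(all_models(incarceration_factors)))  # 2048 512
```

Each subset is fitted once as a regression model and once as a mapping model, so every count applies per model class.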

For all models we calculated the BIC′ value based on the proportion of variance explained (R²), as a measure of the model’s goodness of fit, and on the number of free parameters (see Raftery, 1995). Details of the computation and the equations can be found in Appendix B. The BIC′ value approximates, on a logarithmic scale, the odds with which a specific model is preferred to a baseline model. In regression, the null model is usually chosen as the baseline. The null model includes only an intercept (i.e., it estimates the mean criterion value for all objects) and no predictor (i.e., no free parameter). It explains none of the variance in the data, and its BIC′ is zero (see Equation 2). The BIC′_k of a specific model M_k is defined so that a positive BIC′_k value indicates that the null model is preferred, whereas a negative BIC′_k value provides evidence for the model M_k under consideration. The lower the BIC′_k value, the more strongly the model is supported by the data.


(2) $\mathrm{BIC}'_k = n \log(1 - R_k^2) + q_k \log n$,
where R²_k is the value of R² for model M_k, q_k is the number of free parameters for that model, and n is the number of data points.
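Equation 2 is straightforward to compute. The Python sketch below assumes natural logarithms and leaves it to the caller to count a model’s free parameters q_k. The function name and the second set of example inputs are illustrative only and are not values taken from the study.

```python
import math


def bic_prime(r_squared: float, q: int, n: int) -> float:
    """BIC'_k = n * log(1 - R_k^2) + q_k * log(n), relative to the null model.

    Negative values favor model M_k over the intercept-only null model;
    the lower the value, the stronger the support.
    """
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("R^2 must lie in [0, 1) for the logarithm to be defined")
    return n * math.log(1.0 - r_squared) + q * math.log(n)


print(bic_prime(0.0, 0, 44))             # null model: 0.0
print(round(bic_prime(0.75, 4, 40), 1))  # illustrative inputs: -40.7
```

The q_k log n term acts as a complexity penalty: With n = 40, each additional free parameter must lower n log(1 − R²_k) by about 3.7 (= ln 40) before it improves BIC′_k.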

For the regression models, a least squares regression was run with the factors as predictor variables and the sentence recommendation of the prosecution as the dependent variable. For the mapping models, the category borders and the typical sentence for each category were estimated from the data. First, the perceived factor scores were calculated based on range frequency theory, with one free parameter for all factors capturing the relative importance of range and frequency information (for details see Appendix A). Then case seriousness was computed by averaging the factor scores over all factors included in the model; the minimum and maximum case seriousness were determined, and this range was divided into seven equally sized categories. For each category, the typical sentence was calculated as the median sentence of all cases that fell into this category. Each case was then assigned the typical sentence of its category as the model’s prediction, and the proportion of variance in the prosecution’s sentence recommendations explained by these predictions (R²) was computed. Based on the BIC′ values we calculated the posterior probability of each model, assuming equal priors for all models. Additionally, we computed the probability of each factor being included and an approximation of a Bayesian point estimator of the beta weights and standard errors for each factor (see Appendix B).
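The mapping-model procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors’ implementation: the function names are hypothetical; w is passed in rather than estimated (in the study it was a free parameter fitted to the data); tied factor values share the lowest rank when the frequency value is computed; and all factors are assumed to be coded so that higher values indicate a more serious case.

```python
from statistics import median


def rft_scores(values, w):
    """Range-frequency judgments J_i = w * R_i + (1 - w) * F_i for one factor.

    Assumption: tied values share the lowest rank, so F_i is the share of the
    other cases with a strictly smaller value.
    """
    n = len(values)
    lo, hi = min(values), max(values)
    scores = []
    for s in values:
        r = (s - lo) / (hi - lo) if hi > lo else 0.0  # range value R_i
        f = sum(v < s for v in values) / (n - 1)       # frequency value F_i
        scores.append(w * r + (1 - w) * f)
    return scores


def mapping_model_predictions(cases, sentences, w=0.5, n_categories=7):
    """Predict each case's sentence as the median sentence of its seriousness category.

    cases: one row per case, one value per factor in the model, coded so that
           higher values mean a more serious case.
    sentences: the observed sentence recommendation for each case.
    """
    n_factors = len(cases[0])
    columns = [rft_scores([row[j] for row in cases], w) for j in range(n_factors)]
    # Case seriousness: mean perceived score over the factors in the model.
    seriousness = [sum(col[i] for col in columns) / n_factors for i in range(len(cases))]
    lo, hi = min(seriousness), max(seriousness)
    width = (hi - lo) / n_categories
    # Equal-width categories; the most serious case falls into the top category.
    if width > 0:
        cats = [min(int((s - lo) / width), n_categories - 1) for s in seriousness]
    else:
        cats = [0] * len(cases)
    # Typical sentence per category: median of the observed sentences in it.
    typical = {c: median([y for y, k in zip(sentences, cats) if k == c]) for c in set(cats)}
    return [typical[c] for c in cats]


def r_squared(observed, predicted):
    """Proportion of variance in `observed` accounted for by `predicted`."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Toy data (not from the study): one factor, seven cases, one per category.
cases = [[k] for k in range(7)]
sentences = [10, 20, 30, 40, 50, 60, 70]
preds = mapping_model_predictions(cases, sentences, w=1.0)
print(preds, r_squared(sentences, preds))  # [10, 20, 30, 40, 50, 60, 70] 1.0
```

Because predictions are category medians, the model can only output as many distinct sentences as there are occupied categories. Estimating w could, for example, be done by a grid search over (0, 1) for the value that maximizes R²; that fitting criterion is an assumption of this sketch.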

Results

Overall, the more parsimonious mapping models offered the more probable description of the data, and both model types identified largely the same factors as influencing sentencing. Although sentencing decisions for fines and prison terms were both based on the factors net worth of property and number of charges, the role of mitigating and aggravating evidence differed for the two sentence types: Fining decisions were more influenced by aggravating evidence and the number of prior convictions, whereas incarceration length was more affected by mitigating evidence II. Neither for fines nor for incarceration decisions did extralegal factors such as sex, age, or nationality play a role. In the following we report the results of the analyses for fines and incarceration separately.


Magnitude of fines. Overall, the verdict could be almost perfectly predicted by the recommendation of the prosecution (r = .99), as illustrated in Figure 8.

Figure 8: Scatter plot of the sentence recommendation for fines by the prosecution and the corresponding verdict by the judge. The magnitude of the fines is given in number of days a payment has to be made.

Accordingly, we concentrated on the recommendation of the prosecution as the more interesting dependent variable. The recommended sentence, in turn, correlated significantly with a number of offense and offender characteristics (see Table 13). As expected, the presence of a confession and mitigating evidence II (coding a low worth of property violated and no prior record) correlated negatively with the recommended sentence. The net worth of the property violated, the number of prior convictions, the number of charges, and the amount of aggravating evidence correlated positively with the magnitude of the sentence, as did sentencing by penalty order, summary penalty, age, and nationality. The remaining factors did not correlate significantly with sentence length or showed no variation in the sample.


Table 13: Results of correlation analysis and model comparison for fines

Dependent variable: fines (no. of days)

| Factor | Pearson correlation (p value) | Probability | Beta | SD |
|---|---|---|---|---|
| Age | .34 (.02) | .08 | .03 | .11 |
| No. of prior convictions | .32 (.03) | .56 | .19 | .10 |
| No. of charges | .36 (.02) | .64 | .16 | .17 |
| Net worth of property | .46 (.001) | .97 | .36 | .11 |
| Confession | -.50 (.001) | .15 | -.08 | .19 |
| Penalty order | .50 (.001) | .91 | .53 | .14 |
| Summary penalty | .53 (.001) | .46 | .24 | .12 |
| Aggravating evidence | .39 (.01) | .59 | .32 | .13 |
| Mitigating evidence II | -.48 (.001) | .25 | .15 | .12 |
| Remorse | -.20 (.20) | .13 | -.16 | .11 |
| Nationality | .32 (.03) | .14 | .08 | .12 |

| Five best models | Mapping model 1 | Mapping model 2 | Mapping model 3 | Mapping model 4 | Mapping model 5 |
|---|---|---|---|---|---|
| PMP | .15 | .13 | .09 | .07 | .06 |
| BIC′ | -56 | -55 | -55 | -54 | -54 |
| R² | .74 | .74 | .73 | .73 | .73 |

Note: N = 44. Probability denotes the probability that the factor had an effect and is given by Equation B3 (Appendix B). BIC′ denotes the Bayesian Information Criterion. PMP denotes the posterior model probability. An open circle denotes that a factor was not included in the model; a solid circle denotes that a factor is included in the model. For the analyses, the factors confession, remorse, and mitigating evidence II were recoded so that they correlated positively with sentence magnitude. The five best models all belonged to the class of mapping models.

Modeling—critical factors. Model analysis showed that a few factors are sufficient to describe the data. BMA for the two model classes gave a similar picture of which factors influence sentencing. Altogether 11 factors were considered (see Table 13), resulting in 2,048 possible models for each model class; thus the prior probability of each model was about .0005. Of the 2,048 linear regression models under evaluation, 95% had a posterior probability below .002, with the two best models reaching posterior probabilities of 5% and 6% and explaining 64% and 68% of the variance in the sentence recommendations, respectively. There was strong evidence that the factors net worth of property, penalty order, and aggravating evidence affected the sentence recommendation. Additionally, there was weak evidence for the factors summary penalty and number of prior convictions. The estimated beta weights can be found in Table 13.

Applying the BMA method to the class of mapping models similarly led to discarding a large proportion of models: 96% had a posterior probability below .001. However, the two best models reached posterior probabilities of 15% and 13%, respectively. Both explained considerably more variance (74%) in the sentence recommendations than the best regression models. As with the regression models, there was strong evidence for the factors net worth of property and penalty order. The factors aggravating evidence, number of prior convictions, and number of charges were supported by some evidence. In contrast to the regression models, the factor summary penalty received less support.


In sum, the BMA analyses of the two model classes showed that the choice of model class had only a slight influence on which factors were identified as important. In both model classes, the most important factors were net worth of property, whether the sentence was recommended by a penalty order or after a trial, and the presence of aggravating evidence. Additionally, the number of prior convictions, the number of charges, and whether the sentence was a summary penalty played a role, whereas age, nationality, confession, and other mitigating evidence did not influence the sentence recommendation. This is clearly inconsistent with the legal requirement that all legally relevant factors be taken into account. Particularly surprising is that confession and remorse were not considered, as they are usually mentioned as extenuating factors in the rationale for the verdict.

Model comparison. After examining which factors influenced sentencing in cases punished with a fine, we tested which type of model, mapping or regression, better explained the decision process underlying fining. For this comparison we included all models and calculated their posterior probabilities, assuming that all models have the same prior probability. This resulted in a comparison of 4,096 models, each with a prior probability of about .0002. Of all models, 17 reached a posterior probability above .01, summing to a joint probability of .74, compared with a joint probability of .26 for the remaining 4,079 models. All 17 belonged to the class of mapping models (see Table 13 for the five best models). Overall, the mapping models reached a much higher posterior probability: The joint posterior probability of all mapping models was .99999, compared with .00001 for the regression models. This is illustrated by Figure 9, which shows the posterior probabilities of the best 1,500 models; the majority clearly belong to the mapping model class.

Figure 9: The posterior model probability of the best 1,500 of all 4,096 models to describe the fining process, differentiated by model class. Of the 1,500 best models, 99% belong to the class of mapping models and 1% to the class of regression models.


Incarceration length. As with the fines, the recommendation of the prosecution was the best predictor of sentence length (r = .95). Accordingly, we again focused on the sentence recommendation of the prosecution as the main dependent variable. Altogether we considered nine factors. Seven offense or offender characteristics correlated with the length of the prison sentence with an absolute value above .3. As expected, the factors net worth of property violated, summary penalty, aggravating evidence, number of charges, and number of offenders correlated positively with sentence length, while the second mitigating factor (coding a low worth of property violated and no prior record) correlated negatively with the recommended sentence length (see Table 14). The factor penalty order was not applicable, as a sentence by penalty order is not allowed for prison sentences. Somewhat unexpectedly, the presence of a confession and special circumstances leading to diminished capacity correlated positively with sentence length. This effect, however, is probably due to the comparatively serious nature of these cases and does not reflect a negative evaluation of these factors for sentencing. Although remorse did not correlate with sentence length, we additionally included it in the analysis.

Table 14: Results of correlation analysis and model comparisons for incarceration

Dependent variable: incarceration (no. of months)

| Factor | Pearson correlation (p value) | Probability | Beta | SD |
|---|---|---|---|---|
| No. of charges | .40 (.01) | .32 | .31 | .01 |
| Diminished capacity | .41 (.01) | .45 | .41 | .01 |
| Net worth of property | .62 (.001) | .91 | .57 | .01 |
| Summary penalty | .65 (.001) | .61 | .20 | .02 |
| Aggravating evidence | .58 (.001) | .63 | -.10 | .02 |
| Mitigating evidence II | -.41 (.01) | .78 | .24 | .01 |
| No. of offenders | .31 (.05) | .04 | -.01 | .01 |
| Confession | .29 (.07) | .03 | -.09 | .01 |
| Remorse | -.01 (.98) | .58 | .07 | .01 |

| Five best models | Mapping model 1 | Mapping model 2 | Mapping model 3 | Mapping model 4 | Mapping model 5 |
|---|---|---|---|---|---|
| PMP | .53 | .10 | .10 | .07 | .06 |
| BIC′ | -49 | -46 | -46 | -45 | -45 |
| R² | .82 | .76 | .76 | .78 | .75 |

Note: N = 40. Probability denotes the probability that the factor has an effect and is given by Equation B3 (Appendix B). BIC′ denotes the Bayesian Information Criterion. PMP denotes the posterior model probability. A solid circle denotes that a factor was included in the model; an open circle denotes that a factor was not included in the model. For the analyses, the factors mitigating evidence II and remorse were recoded so that they correlated positively with sentence magnitude. The five best models all belonged to the class of mapping models.

Modeling—critical factors. Similarly to the analysis of the fining decisions, we used the BMA method to determine the factors with the highest probability of influencing sentence length. The factors included in the models were number of charges, net worth of property, diminished capacity, mitigating (II) and aggravating evidence, confession, summary penalty, number of offenders, and remorse, resulting in 512 models per model class with a prior probability of .002.


Five regression models reached a posterior probability above .05, with the best model clearly superior to the other models with a probability of .28, compared to the second best model with a probability of .11. The best model explained 75% of the variance in the recommended incarceration length and reached a BIC′ value of –41. There was strong evidence for the effect of the factors number of charges, net worth of property, and diminished capacity. Additionally, there was some support that mitigating evidence II influenced sentencing recommendations for prison terms. The corresponding beta weights can be found in Table 14.

Among the mapping models, five models also reached a probability above .05. The best model reached a probability of .55 and explained 82% of the variance in sentence length, much more than the best regression model or the second-best mapping model (posterior probability of .10, R² = .76). However, the factors supported by the mapping models differed from those supported by the regression models. As with the regression models, net worth of property received strong support and mitigating evidence II some support. However, there was hardly any evidence for number of charges, and diminished capacity was somewhat less important. Instead, there was additional evidence for summary penalty, aggravating evidence, and remorse.

In sum, the analyses consistently showed that, despite the stipulations of the law, only a few factors were necessary to describe sentencing. However, which factors were considered important differed between the two model classes. Both model classes supported the factors net worth of property violated, mitigating evidence II, and diminished capacity, but the regression models provided evidence for the factor number of charges, whereas the mapping models pointed to the factors summary penalty, aggravating evidence, and remorse.


Model comparison. To find out which class of models was better suited to explain incarceration decisions, we again entered all models in a joint comparison. The final analysis comparing 1,024 models from both model classes supported the mapping model as the superior type of model. The best five models belonged to this model class (see Table 14). The posterior probabilities of these models added up to a joint probability of .86, compared with a probability of .14 for the remaining 1,019 models. Again, the class of mapping models was more strongly supported than the regression models. The posterior probability of all mapping models added up to .96, compared to .04 for the regression models. This is illustrated by Figure 10, depicting the posterior probabilities of the best 100 models.

Figure 10: The posterior model probability of the best 100 models describing the incarceration decisions, differentiated by model class. Of the 100 best models, 65% belong to the class of mapping models and 35% to the class of regression models

The joint model comparison also supported the evaluation of the factors’ importance by the mapping model (see Table 14). There was strong evidence for the factors net worth of property and mitigating evidence (II), and some support for summary penalty, aggravating evidence, remorse, and diminished capacity.

Discussion


There are two ways in which sentencing decisions can deviate from the law: First, the decision can be based on a different set of factors than required by the law; second, the way these factors lead to a sentence can be inconsistent with the prescribed legal policy. The present article examined both routes by testing two different models of decision making and by identifying the crucial factors influencing sentencing, following a Bayesian approach.

The model comparison test allowed us to identify which type of model—one consistent with the legal theory or one derived from cognitive psychology—captured the sentencing decisions best. Furthermore we were able to identify the factors that were crucial for each model class to predict the sentencing. Our results show that the prosecutors neither considered all factors required by law nor exhibited decision processes consistent with the policy assumed by the legal literature. Instead, the decisions of the prosecutors were best described by a heuristic for quantitative estimation, the mapping model (Helversen & Rieskamp, in press). In the following we will first discuss the results on which factors predicted sentence recommendations and differences between fines and incarceration sentences. Then we will turn to the model comparison and the significance of the BMA method for sentencing research. Finally we will discuss limitations of the current study.

Predictors of Sentencing Decisions

Prosecutors clearly deviated from the law concerning the factors that had an impact on sentence length. According to the law, all legally relevant factors in the analysis should have affected the sentence recommendation. However, in both types of sentencing decisions only a few factors were sufficient to predict the prosecutors’ recommendations. It is particularly surprising that factors such as confession or remorse did not always lead to lower sentences, as they are usually stated as mitigating factors in the rationale for the verdict. The results are, however, in line with psychological research on judgment and decision making, which has repeatedly shown that people often lack insight into their judgment policies (Brehmer & Brehmer, 1988) and tend to base their decisions on only a few factors (Brehmer, 1994).


Interestingly, the factors influencing fining and incarceration decisions varied substantially. For one, the magnitude of the fine was higher if the sentence was imposed via penalty order than after a trial, whereas incarceration length was influenced by diminished capacity. However, as there were no cases of diminished capacity among the cases receiving fines, and sentencing by penalty order is not allowed for incarceration sentences, these differential effects are not very surprising. More interestingly, fines were influenced by prior record and aggravating evidence, but not by mitigating evidence. This suggests that the prosecution, when deciding which factors were relevant, might have relied on an image of a “typical” case: Factors that indicated deviation from the norm were considered for the sentence, whereas factors that constituted the “normal” case were not (Mösl, 1983; Tata, 1997). Fines are usually imposed in less serious cases. Thus, in cases punished by a fine the prosecution might already have “used up” the influence of any mitigating information by sparing the offender an incarceration sentence, whereas in cases punished with incarceration the mitigating information was taken into account, reducing sentence length.

Model Comparison

In both types of sentencing decisions, our analyses clearly showed that the cognitively derived mapping model provided a much better explanation of the sentencing process than the regression model that is consistent with legal regulations. For the fining decisions, just about any mapping model was more probable than a regression model. Even in the incarceration decisions, the five best models belonged to the mapping model class. These results are in line with those of Helversen and Rieskamp (in press), who demonstrated the success of the mapping model in comparison with the regression model in a laboratory estimation task. The regression model’s poorer performance suggests that prosecutors do not weigh each factor individually and sum up the weighted evidence, as one would expect from standard legal procedure. Instead, the cognitive process underlying sentencing decisions was more in line with the mapping model. When making sentencing decisions, prosecutors apparently use the evidence provided to group cases of similar seriousness together, where the seriousness of a case depends on its average value on the factors considered relevant. A typical sentence is stored for each category and used to determine the sentence for a new case.

The finding that cognitive models are more suitable for predicting legal decision making is consistent with previous findings indicating that legal decision-making processes often do not concur with the procedures assumed by the law (e.g., Dhami & Ayton, 2001; Hertwig, 2006; Van Duyne, 1987). However, although our study illustrates that a cognitive model was more suitable for predicting sentencing than a model consistent with standard legal procedure, we emphasize that following the mapping model to make sentencing decisions does not necessarily represent a case of biased decision making. On the contrary, Helversen and Rieskamp (in press) showed that in situations in which the criterion is nonlinearly distributed, the mapping model was more accurate in predicting the criterion than a regression model. Thus, in sentencing situations in which the distribution of the cases’ seriousness is highly skewed, the mapping model might in fact be more suitable than a regression model for making sentencing decisions. Particularly for low-level crimes, where legal decision makers operate under severe time constraints, making sentencing decisions according to the mapping model could be an adaptive response.


Nevertheless, making a decision according to the mapping model compared to a weighted additive model will often lead to different sentences. This raises the question of which process sentencing should follow. It also resonates with a discussion in the German legal literature in the 1980s. Instigated by a decision of the German Federal Court of Justice, the relevance of “normal” and “average” cases as reference points for sentencing was discussed (see Bruns, 1988; Mösl, 1981, 1983; Theune, 1985a, 1985b). Likewise in England, similarity-based decision aids for sentencing have been under discussion (e.g., Tata, 1998). In principle, because the German penal code does not regulate how the relevant factors should be integrated, processes as assumed by the mapping model might be legally justifiable. Although this is ultimately a legal question, psychological insights into the cognitive processes underlying legal decisions could inform a legal discussion on sentencing laws and might provide valuable input for the development of institutions.

Bayesian Approach

The way we analyzed the data and tested the two competing models differs substantially from the standard approach taken in policy-capturing research (e.g., Cooksey, 1996). In the standard approach, a single regression model is estimated by applying a specific statistical test procedure. This has the disadvantage that results and conclusions can differ considerably depending on the statistical procedure chosen. Moreover, the influence of single factors is difficult to interpret, because it depends on which other factors are included in the equation.

In contrast, the Bayesian approach we followed led us to consider all possible models that could be constructed with the available predictors and to estimate the posterior probability of each. The two competing model classes were tested against each other by considering all models of each class, not simply the single best model. This model comparison provided very strong empirical support for the mapping model. Moreover, by considering which factors were included in models with high posterior probabilities, it was possible to draw more reliable conclusions about the factors that are important for sentencing decisions.

Limitations of the Study


Our study focused on a single German court. This naturally raises the question of how well the results generalize. Many studies have shown the importance of location and of the legal culture of a jurisdictional district (e.g., Johnson, 2006; Kautt, 2002; Langer, 1994). In particular, the factors that influence sentence magnitude could differ between districts, so our results concerning the importance of individual factors should be treated with caution. Furthermore, our results were based on a rather small sample, which could limit their generalizability even within the jurisdictional district. Nevertheless, for this restricted data set we could illustrate the benefits of a cognitively inspired approach to legal decision making. Future research is necessary to test whether these results replicate with larger samples and across a wider range of jurisdictional districts. Although this needs to be tested, we have no reason to assume that prosecutors from Brandenburg differ in their cognitive processes from prosecutors in other parts of Germany. If anything, a higher case load and more time pressure should be expected.

Beyond Germany, similar results might be anticipated, given that the general features of the task remain the same. That is, as long as the prosecutor or the judge has to integrate several factors to determine a final sentence, the mapping model could offer a valid description of the process. However, legal systems in which sentencing is strictly regulated by sentencing guidelines, as, for instance, in the United States, could provide exceptions. Further studies are therefore needed to investigate how well the mapping model explains sentencing in other legal systems.

In a similar vein, it is important to note that this study focused on low-level offenses. It is an open question whether the same cognitive processes underlie the sentencing of more severe cases, such as capital crimes. It appears plausible that more factors are taken into account when sentencing more severe cases, so that these decisions might be more in line with legal policy.

Conclusion and Outlook


This paper provides evidence that in sentencing, cognitive models are necessary to understand the decision process. Our results suggest that the sentence recommendations of prosecutors were not consistent with the requirements of the law; instead, sentence recommendations were well described by the mapping model, a cognitive theory for quantitative estimation (Helversen & Rieskamp, in press). This study joins a growing body of research questioning the ability of decision makers to comply with legal regulations and emphasizes the importance of understanding cognitive processes for the development of institutions.

Appendices

Appendix A

Range Frequency Theory


According to range frequency theory (Parducci, 1974), human judgments of magnitudes and size are context dependent, that is, they depend on the range of the stimulus values as well as on the frequency with which a stimulus value appears. The judged magnitude J of a stimulus i is given by the weighted sum of the range value R and the frequency value F (cf. Parducci, 1974, p. 209):

(A1) $J_i = w R_i + (1 - w) F_i$,
with 0 < w < 1. The range value R_i represents the proportion of the current range below the current stimulus S_i:

(A2) $R_i = (S_i - S_{\min}) / (S_{\max} - S_{\min})$,
where S_i denotes the current stimulus value, and S_min and S_max are the smallest and the largest stimulus in the set, respectively.

The frequency value F_i represents the proportion of all current values below the current stimulus:

(A3) $F_i = (r_i - 1) / (N - 1)$,
where F_i represents the frequency value of stimulus i, r_i is the rank of stimulus i, and N is the number of stimuli in the set.
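A small worked example illustrates how the frequency value shifts a judgment when the stimulus distribution is skewed. The stimulus set and the weight w = .5 are invented for illustration and are not taken from the study.

```latex
\begin{aligned}
&\text{Set } \{1, 2, 3, 10\},\quad N = 4,\quad S_i = 3 \text{ (rank } r_i = 3\text{)},\quad w = .5 \\
&R_i = \frac{3 - 1}{10 - 1} = \frac{2}{9} \approx .22 \\
&F_i = \frac{3 - 1}{4 - 1} = \frac{2}{3} \approx .67 \\
&J_i = .5 \cdot \frac{2}{9} + .5 \cdot \frac{2}{3} = \frac{4}{9} \approx .44
\end{aligned}
```

Because most values cluster at the low end of the range, the frequency value places the stimulus 3 well above the position that its place in the range alone would assign it.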

Appendix B 


Bayesian Model Averaging

The Bayesian information criterion (BIC) approximates, on a logarithmic scale, the odds with which a specific model is preferred to a baseline model. To calculate a model’s BIC value, we compared it with the null model (a baseline model with no independent variables), following Raftery (1995, Equation 26, p. 135):

(B1) $\mathrm{BIC}'_k = n \log(1 - R_k^2) + q_k \log n$,
where R²_k is the value of R² for model M_k, q_k is the number of free parameters for that model, and n is the number of data points. BIC′_k gives the BIC value of model M_k relative to the null model; the BIC′ of the null model is zero. Accordingly, if BIC′_k is positive, the null model is preferred to model M_k. If BIC′_k is negative, model M_k is preferred to the null model, and the smaller BIC′_k, the more M_k is supported by the data.


The posterior probability of a model is approximated by:

(B2) $p(M_k \mid D) \approx \dfrac{\exp(-\tfrac{1}{2}\mathrm{BIC}'_k)}{\sum_{l=1}^{K} \exp(-\tfrac{1}{2}\mathrm{BIC}'_l)}$,
(cf. Raftery, 1995, Equation 35, p. 145), where p(M_k | D) gives the probability of model M_k given the data D relative to all K models in the set, assuming an equal prior probability of 1/K for all models.

The posterior probability that a factor has an effect, that is, that its beta weight β differs from zero, is given by the sum of the posterior probabilities of all models that include the factor, here referred to as model set A:

(B3) $\Pr(\beta \neq 0 \mid D) = \sum_{M_k \in A} p(M_k \mid D)$,
(cf. Raftery, 1995, Equation 36, p. 145).
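Equations B2 and B3 translate directly into code. The Python sketch below uses hypothetical function names, represents each model as a set of factor names, and uses invented BIC′ values in its example. Subtracting the smallest BIC′ before exponentiating leaves the normalized probabilities unchanged but avoids numerical underflow when thousands of models are compared.

```python
import math


def posterior_model_probs(bic_primes):
    """Eq. B2: p(M_k | D) proportional to exp(-BIC'_k / 2), with equal priors."""
    best = min(bic_primes)  # shift by the minimum for numerical stability
    weights = [math.exp(-0.5 * (b - best)) for b in bic_primes]
    total = sum(weights)
    return [wt / total for wt in weights]


def inclusion_probability(models, probs, factor):
    """Eq. B3: summed posterior probability of all models that contain `factor`."""
    return sum(p for model, p in zip(models, probs) if factor in model)


# Toy example (invented BIC' values): the null model and three models over factors "a" and "b".
models = [frozenset(), frozenset({"a"}), frozenset({"b"}), frozenset({"a", "b"})]
probs = posterior_model_probs([0.0, -10.0, -2.0, -8.0])
print([round(p, 3) for p in probs])
print(round(inclusion_probability(models, probs, "a"), 3))
```

The joint probability of a whole model class, such as all mapping models in the comparisons above, is obtained the same way: Sum the posterior probabilities of the models that belong to that class.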
The beta weights and their standard errors can be estimated by an approximation to a Bayesian point estimator and an analogue of the standard error. The approximations are given by:

(B4) $E[\beta \mid D] \approx \sum_{M_k \in A} \hat{\beta}_k \, p(M_k \mid D)$
(cf. Raftery, 1995, Equations 38 and 39, p. 146), where E denotes the expected value of the beta weight β, and $\hat{\beta}_k$ is the maximum likelihood estimator of β under model M_k.
Correspondingly, the standard error can be approximated by:

(B5) $\mathrm{SE}[\beta \mid D] \approx \sqrt{\sum_{M_k \in A} \left(\mathrm{se}_k^2 + \hat{\beta}_k^2\right) p(M_k \mid D) - E[\beta \mid D]^2}$,
where se_k is the standard error of $\hat{\beta}_k$ under model M_k (cf. Raftery, 1995, p. 146).
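Equations B4 and B5 can be sketched in the same way. In this hypothetical helper, models that exclude the factor are given a beta weight and a standard error of zero; they then contribute nothing to either sum, so summing over all models is equivalent to summing over set A.

```python
import math


def bma_beta(betas, ses, probs):
    """Approximate posterior mean and SD of one beta weight (Eqs. B4 and B5).

    betas, ses: ML estimate and standard error of the beta weight in each model
                (use 0.0 for both in models that exclude the factor).
    probs: posterior model probabilities p(M_k | D), summing to 1.
    """
    mean = sum(b * p for b, p in zip(betas, probs))
    second_moment = sum((se ** 2 + b ** 2) * p for b, se, p in zip(betas, ses, probs))
    # Guard against tiny negative values caused by floating-point rounding.
    return mean, math.sqrt(max(second_moment - mean ** 2, 0.0))


# Toy example (invented numbers): the factor appears in models 2 and 4 only.
print(bma_beta([0.0, 0.4, 0.0, 0.3], [0.0, 0.1, 0.0, 0.12], [0.1, 0.5, 0.1, 0.3]))
```

With a single model of posterior probability 1, the estimates reduce to that model’s beta weight and standard error; spreading probability over models with different estimates widens the standard error.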


Authors’ Note

Bettina von Helversen and Jörg Rieskamp, Max Planck Institute for Human Development, Berlin, Germany. We would like to thank attorney M. Neff and the prosecution authority of Eberswalde for providing us access to the trial records. We are very grateful to Christoph Engel, Stefan Bechthold, Stefan Tontrupp, Andreas van den Eikel, and Tobias Lubitz for their help and advice in devising the categorization system. We gratefully acknowledge Patrizia Ianiro, Daria Antonenko, and Cornelia Büchling’s commitment and helpful ideas in coding and analyzing the data. We would like to thank Anita Todd for editing a draft of this manuscript. This work has been supported by a doctoral fellowship of the International Max Planck Research School LIFE to the first author. Correspondence concerning this article should be addressed to Bettina von Helversen.

Bettina von Helversen
Max Planck Institute for Human Development
Lentzeallee 94, 14195 Berlin, Germany
Phone: (+49 30) 82406 699
Fax: (+49 30) 82406 394
Email: vhelvers@mpib-berlin.mpg.de


Footnotes

1. Besides the factors stated in § 46, German law allows sentence adjustments to achieve general prevention as well as specific prevention objectives (Meier, 2001; Schäfer, 2001). Furthermore, the sentencing range can be lowered if mitigating reasons as specified in articles 21, 23, and 49 exist. As our sample did not include mitigated sentencing ranges according to these articles, we relied on the sentencing ranges specified for common and aggravated cases of theft (§ 242 ff.), fraud (§ 263), and forgery (§ 267).

2. To simplify the statistical analysis we inverted all factors that were negatively correlated with sentence magnitude, so that after inversion all factors were positively correlated with sentence magnitude. Please note that this is only a statistical simplification; alternatively the difference between the mean score on aggravating factors and the mean score on mitigating factors could be taken.


© The compilation and presentation of the content of this publication, as well as its electronic processing, are protected by copyright. Any use not expressly permitted by copyright law requires prior consent. This applies in particular to reproduction, adaptation, and storage and processing in electronic systems.
DiML DTD Version 4.0
Certified document server of the Humboldt-Universität zu Berlin
HTML version created on: 07.02.2008